From david.holmes at oracle.com Mon Jun 1 01:34:00 2020
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 1 Jun 2020 11:34:00 +1000
Subject: RFR: JDK-8225056 VM support for sealed classes
In-Reply-To: <9da783ba-edd9-b5fe-0476-644ba7d01990@oracle.com>
References: <7b3430e1-f821-1e2d-2c8b-f1c621f059da@oracle.com> <9d7da8af-cda3-693d-1ea1-1db5069fea97@oracle.com> <9b32addd-d576-268e-61ab-0ac4921d22f5@oracle.com> <151289f6-820c-08d1-c2f9-85b18d1bcaf5@oracle.com> <0749bff1-02ac-841e-4bd7-4a511a90be9d@oracle.com> <9da783ba-edd9-b5fe-0476-644ba7d01990@oracle.com>
Message-ID:

Hi Harold,

On 1/06/2020 8:57 am, Harold Seigel wrote:
> Thanks for the comments.
>
> Here's version 3 of the JDK and VM changes for sealed classes.
>
> full webrev:
> http://cr.openjdk.java.net/~hseigel/sealedClasses.8225056.3/webrev/
>
> The new webrev contains just the following three changes:
>
> 1. The sealed classes APIs in Class.java (permittedSubclasses() and
>    isSealed()) were revised and, in particular, API
>    permittedSubclasses() no longer uses reflection.

For those following along we have presently abandoned the attempt to cache the array in ReflectionData.

Current changes look okay. But I note from the CSR there appears to be a further minor update to the javadoc coming.

> 2. An unneeded 'if' statement was removed from
>    JVM_GetPermittedSubclasses() (pointed out by David.)

Looks good.

> 3. VM runtime test files SealedUnnamedModuleIntfTest.java and
>    Permitted.java were changed to add a test case for a non-public
>    permitted subclass and its sealed superclass being in the same
>    module and package.

Looks good.

> Additionally, two follow on RFEs will be filed. One to add additional
> VM sealed classes tests

Thanks. I think there is a more mechanical approach to testing here that will allow the complete matrix to be easily covered with minimal duplication between testing for named and unnamed modules. 
> and one to improve the implementations of the
> sealed classes APIs in Class.java.

Thanks.

David
-----

> Thanks, Harold
>
> On 5/28/2020 8:30 PM, David Holmes wrote:
>>
>> Hi Harold,
>>
>> Sorry Mandy's comment raised a couple of issues ...
>>
>> On 29/05/2020 7:12 am, Mandy Chung wrote:
>>> Hi Harold,
>>>
>>> On 5/27/20 1:35 PM, Harold Seigel wrote:
>>>>
>>>> Incremental webrev:
>>>> http://cr.openjdk.java.net/~hseigel/sealedClasses.8225056.incr.2/
>>>>
>>>> full webrev:
>>>> http://cr.openjdk.java.net/~hseigel/sealedClasses.8225056.2/webrev/
>>>>
>>> Class.java
>>>
>>> 4406 ReflectionData rd = reflectionData();
>>> 4407 ClassDesc[] tmp = rd.permittedSubclasses;
>>> 4408 if (tmp != null) {
>>> 4409     return tmp;
>>> 4410 }
>>> 4411
>>> 4412 if (isArray() || isPrimitive()) {
>>> 4413     rd.permittedSubclasses = new ClassDesc[0];
>>> 4414     return rd.permittedSubclasses;
>>> 4415 }
>>>
>>> This causes an array class or primitive type to create a
>>> ReflectionData. It should first check if this is a non-sealed class
>>> and return a constant empty array.
>>
>> It can't check if this is a non-sealed class as the isSealed() check
>> calls the above code! But for arrays and primitives which can't be
>> sealed we should just do:
>>
>> 4412 if (isArray() || isPrimitive()) {
>> 4413     return new ClassDesc[0];
>> 4414 }
>>
>> But this then made me realize that we need to be creating defensive
>> copies of the returned arrays, as happens with other APIs that use
>> ReflectionData.
>>
>> Backing up a bit I complained that:
>>
>> public boolean isSealed() {
>>     return permittedSubclasses().length != 0;
>> }
>>
>> is a very inefficient way to answer the question as to whether a class
>> is sealed, so I suggested that the result of permittedSubclasses() be
>> cached. Caching is not without its own issues as we are discovering,
>> and when you add in defensive copies this seems to be trading one
>> inefficiency for another. 
For nestmates we don't cache
>> getNestMembers() because we don't think it will be called often - it
>> is there to complete the introspection API of Class rather than being
>> anticipated as used in a regular programmatic sense. I expect the same
>> is true for permittedSubclasses(). Do we expect isSealed() to be used
>> often or is it too just there for completeness? If just for
>> completeness then perhaps a VM query would be a better compromise on
>> the efficiency front? Otherwise I can accept the current
>> implementation of isSealed(), and a non-caching permittedSubclasses() for
>> this initial implementation of sealed classes. If efficiency turns out
>> to be a problem for isSealed() then we can revisit it then.
>>
>> Thanks,
>> David
>>
>>
>>> In fact, ReflectionData caches the derived names and reflected
>>> members for performance and also they may be invalidated when the
>>> class is redefined. It might be okay to add
>>> ReflectionData::permittedSubclasses while `PermittedSubclasses`
>>> attribute can't be redefined and getting this attribute is not
>>> performance sensitive. For example, the result of `getNestMembers`
>>> is not cached in ReflectionData. It may be better not to add it in
>>> ReflectionData for modifiable and performance-sensitive data.
>>>
>>>
>>> 4421 tmp = new ClassDesc[subclassNames.length];
>>> 4422 int i = 0;
>>> 4423 for (String subclassName : subclassNames) {
>>> 4424     try {
>>> 4425         tmp[i++] = ClassDesc.of(subclassName.replace('/', '.'));
>>> 4426     } catch (IllegalArgumentException iae) {
>>> 4427         throw new InternalError("Invalid type in permitted subclasses information: " + subclassName, iae);
>>> 4428     }
>>> 4429 }
>>> Nit: rename tmp to some other name e.g. descs
>>>
>>> I read the JVMS but it isn't clear to me that the VM will validate
>>> the names in `PermittedSubclasses` attribute are valid class
>>> descriptors. 
I see ConstantPool::is_klass_or_reference check but
>>> can't find where it validates the name is a valid class descriptor -
>>> can you point me there? (otherwise, maybe define it to be unspecified?)
>>>
>>>
>>> W.r.t. the APIs. I don't want to delay this review. I see that you
>>> renamed the method to new API style as I brought up. OTOH, I expect
>>> more discussion is needed. Below is a recent comment from John on
>>> this topic [1]
>>>
>>>> One comment, really for the future, regarding the shape of the Java
>>>> API here: It uses Optional and omits the "get" prefix on accessors.
>>>> This is the New Style, as opposed to the Classic Style using null
>>>> (for "empty" results) and a "get" prefix ("getComponentType") to get
>>>> a related type. We may choose to use the New Style for new
>>>> reflection API points, and if so let's not choose N different New
>>>> Styles, but one New Style. Personally I like removing "get"; I
>>>> accept Optional instead of null; and I also suggest that arrays (if
>>>> any) be replaced by (immutable) Lists in the New Style
>>>
>>> There are a few existing Class APIs that use the new API style:
>>> Class arrayClass();
>>> Optional describeConstable();
>>> String descriptorString();
>>>
>>> This will set up a precedent of the new API style in this class.
>>> Should this new permittedSubclasses method return a List instead of
>>> an array? It's okay with me if you prefer to revert back to the old
>>> API style and follow this up after integration.
>>>
>>> 4442 * Returns true if this {@linkplain Class} is sealed.
>>> 4443 *
>>> 4444 * @return returns true if this class is sealed
>>>
>>> Nit: {@code true} instead of true - consistent with the style this
>>> class uses (in most methods)
>>>
>>> test/jdk/java/lang/reflect/sealed_classes/SealedClassesReflectionTest.java
>>>
>>> Nit: s/sealed_classes/sealedClasses/
>>> - the test directory/file naming convention uses camel case or java
>>> variable name convention. 
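The defensive-copy concern and the array-versus-List question raised in this thread can be sketched in plain Java. This is only an illustration under stated assumptions, not the Class.java code under review: `PermittedHolder`, its field, and its method bodies are hypothetical, and plain strings stand in for ClassDesc.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical holder used only for illustration; not the JDK's Class or ReflectionData.
final class PermittedHolder {
    private static final String[] EMPTY = new String[0];

    // Internal cached array; it is never handed out directly.
    private final String[] cached;

    PermittedHolder(String... names) {
        this.cached = names.length == 0 ? EMPTY : names.clone();
    }

    // Classic style: return a defensive copy so callers cannot corrupt the cache.
    // A zero-length array has no elements to mutate, so EMPTY can be shared.
    String[] getPermittedSubclasses() {
        return cached.length == 0 ? EMPTY : cached.clone();
    }

    // New style: no "get" prefix, and an unmodifiable List over a private copy.
    List<String> permittedSubclasses() {
        return Collections.unmodifiableList(Arrays.asList(cached.clone()));
    }

    // Answers the "is it sealed" question without copying anything.
    boolean isSealed() {
        return cached.length != 0;
    }
}
```

The array-returning accessor pays for a copy on every call, which is the inefficiency discussed above; the List-returning variant still copies here, but its result could safely be shared if the list were built once.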
>>> >>> Thanks >>> [1] https://github.com/openjdk/valhalla/pull/53#issuecomment-633116043 From david.holmes at oracle.com Mon Jun 1 06:14:50 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 1 Jun 2020 16:14:50 +1000 Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime In-Reply-To: References: <4de9bb9c-e83d-f33b-fc50-3431f69e46aa@oracle.com> <8fe912f1-8407-df1d-0c1f-cf37f08363db@oracle.com> <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> Message-ID: <91767eae-4486-8ceb-5452-c6afa4178af1@oracle.com> Hi Jiangli, On 29/05/2020 9:02 am, Jiangli Zhou wrote: > (Looping in serviceability-dev at openjdk.java.net ...) > > Hi David and Ioi, > > On Wed, May 27, 2020 at 11:15 PM David Holmes wrote: >> >> Hi Jiangli, >> >> On 28/05/2020 11:35 am, Ioi Lam wrote: >>> >>> >>> On 5/27/20 6:17 PM, Jiangli Zhou wrote: >>>> On Wed, May 27, 2020 at 1:56 PM Ioi Lam wrote: >>>>> On 5/26/20 6:21 PM, Jiangli Zhou wrote: >>>>> >>>>>> Focusing on the link state for archived classes in this thread, I >>>>>> updated the webrev to only set archived boot classes to 'linked' state >>>>>> at restore time. More investigations can be done for archived classes >>>>>> for other builtin loaders. >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8232222 >>>>>> http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/ >>>>>> >>>>>> Please let me know if there is any additional concerns to the change. >>>>>> >>>>>> Best regards, >>>>>> Jiangli >>>>>> >>>>> Hi Jiangli, >>>>> >>>>> I think the change is fine. 
I am wondering if this >>>>> >>>>> 2530 if (!BytecodeVerificationLocal && >>>>> 2531 loader_data->is_the_null_class_loader_data()) { >>>>> 2532 _init_state = linked; >>>>> 2533 } >>>>> >>>>> >>>>> can be changed to >>>>> >>>>> if (!BytecodeVerificationLocal && >>>>> loader_data->is_the_null_class_loader_data() && >>>>> !JvmtiExport::should_post_class_prepare()) >>>>> >>>>> That way, there's no need to change systemDictionary.cpp. >>>>> >>>>> >>>> I was going to take the suggestion, but realized that it would add >>>> unnecessary complications for archived boot classes with class >>>> pre-initialization support. Some agents may set >>>> JvmtiExport::should_post_class_prepare(). It's worthwhile to support >>>> class pre-init uniformly for archived boot classes with >>>> JvmtiExport::should_post_class_prepare() enabled or disabled. >>> >>> This would introduce behavioral changes when JVMTI is enabled: >>> >>> + The order of JvmtiExport::post_class_prepare is different than before >>> + JvmtiExport::post_class_prepare may be called for a class that was not >>> called before (if the class is never linked during run time) >>> + JvmtiExport::post_class_prepare was called inside the init_lock, now >>> it's called outside of the init_lock >> >> I have to say I share Ioi's concerns here. This change will impact JVM >> TI agents in a way we can't be sure of. From a specification perspective >> I think we are fine as linking can be lazy or eager, so there's no >> implied order either. But this would be a behavioural change that will >> be observable by agents. (I'm less concerned about the init_lock >> situation as it seems potentially buggy to me to call out to an agent >> with the init_lock held in the first place! I find it hard to imagine an >> agent only working correctly if the init_lock is held.) >> > > Totally agree that we need to be very careful here (that's also part > of the reason why I separated this into an individual RFE for the > dedicated discussion). 
David, thanks for the analysis from the spec > perspective! Agreed with the init_lock comment also. In the future, I > think we can even get rid of the needs for init_lock completely for > some of the pre-initialized classes. > > This change has gone through extensive testing since the later part of > last year and has been in use (with the default CDS) with agents that > do post_class_prepare. Hopefully that would ease some of the concerns. That is good to know, but that is just one sample of a set of agents. >> This would need a CSR request and involvement of the serviceabilty folk, >> to work through any potential issues. >> > > I've looped in serviceability-dev at openjdk.java.net for this > discussion. Chris or Serguei could you please take a look of the > change, http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/, > specifically the JvmtiExport::post_class_prepare change in > systemDictionary.cpp. > > Filing a CSR request sounds good to me. The CSR looks after source, > binary, and behavioral compatibility. From a behavior point of view, > the change most likely does not cause any visible effects to a JVMTI > agent (based on what's observed in testing and usages). What should be > included in the CSR? The CSR request should explain the behavioural change that will be observable by agents, and all of the potential compatibility issues that might arise from that - pointing out of course that as the spec (JVMS 5.4**) allows for eager or lazy linking, agents shouldn't be relying on the exact timing or order of events. ** I note this section has some additional constraints regarding dynamically computed constants that might also come into play with this pre-linking for CDS classes. Cheers, David ----- >> Ioi's suggestion avoids this problem, but, as you note, at the expense >> of disabling this optimisation if an agent is attached and wants class >> prepare events. 
>>
>
> Right, if we handle that case conditionally, we would always need to
> store the cached static field values separately since the dump time
> cannot foresee if the runtime can set boot classes in 'linked' state
> (and 'fully_initialized' state with the planned changes) at restore
> time. As a result, we need to handle all pre-initialized static fields
> like what we are doing today, which is storing them in the archived
> class_info_records then installing them to the related fields at
> runtime. That causes both unwanted memory and CPU overhead at runtime.
>
> I also updated the webrev.02 in place with typo fixes. Thanks!
>
> Best regards,
> Jiangli
>
>> Thanks,
>> David
>>
>>> Thanks
>>> - Ioi
>>>
>>>>
>>>>> BTW, I was wondering where the performance came from, so I wrote an
>>>>> investigative patch:
>>>>>
>>>>> diff -r 0702191777c9 src/hotspot/share/oops/instanceKlass.cpp
>>>>> --- a/src/hotspot/share/oops/instanceKlass.cpp Thu May 21 15:56:27 2020 -0700
>>>>> +++ b/src/hotspot/share/oops/instanceKlass.cpp Wed May 27 10:48:57 2020 -0700
>>>>> @@ -866,6 +866,13 @@
>>>>>      return true;
>>>>>    }
>>>>>
>>>>> +  if (UseSharedSpaces && !BytecodeVerificationLocal && is_shared_boot_class()) {
>>>>> +    Handle h_init_lock(THREAD, init_lock());
>>>>> +    ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL);
>>>>> +    set_init_state(linked);
>>>>> +    return true;
>>>>> +  }
>>>>> +
>>>>>    // trace only the link time for this klass that includes
>>>>>    // the verification time
>>>>>    PerfClassTraceTime vmtimer(ClassLoader::perf_class_link_time(),
>>>>>
>>>>>
>>>>> Benchmarking results (smaller numbers are better):
>>>>>
>>>>> (baseline vs your patch)
>>>>>
>>>>>       baseline  jiangli                    baseline  jiangli
>>>>>   1:  58514375  57755638  (-758737) -----    40.266   40.135  ( -0.131) -
>>>>>   2:  58506426  57754623  (-751803) -----    40.367   39.417  ( -0.950) -----
>>>>>   3:  58498554  57759735  (-738819) -----    40.513   39.970  ( -0.543) ---
>>>>>   4:  58491265  57751296  (-739969) -----    40.439   40.268  ( -0.171) -
>>>>>   5:  58500588  57750975  (-749613) -----    40.569   40.080  ( -0.489) --
>>>>>   6:  58497015  57744418  (-752597) -----    41.097   40.147  ( -0.950) -----
>>>>>   7:  58494335  57749909  (-744426) -----    39.983   40.214  (  0.231) +
>>>>>   8:  58500401  57750305  (-750096) -----    40.235   40.417  (  0.182) +
>>>>>   9:  58490728  57767463  (-723265) -----    40.354   39.928  ( -0.426) --
>>>>>  10:  58497858  57746557  (-751301) -----    40.756   39.706  ( -1.050) -----
>>>>> ============================================================
>>>>>       58499154  57753091  (-746062) -----    40.457   40.027  ( -0.430) --
>>>>> instr delta = -746062 -1.2753%
>>>>> time delta = -0.430 ms -1.0619%
>>>>>
>>>>>
>>>>> (baseline vs my patch)
>>>>>
>>>>>       baseline  ioi                        baseline  ioi
>>>>>   1:  58503574  57821124  (-682450) -----    40.554   39.783  ( -0.771) -----
>>>>>   2:  58499325  57819459  (-679866) -----    40.092   40.325  (  0.233) ++
>>>>>   3:  58492362  57811978  (-680384) -----    40.546   39.826  ( -0.720) -----
>>>>>   4:  58488655  57828878  (-659777) -----    40.270   40.550  (  0.280) ++
>>>>>   5:  58501567  57830179  (-671388) -----    40.382   40.145  ( -0.237) --
>>>>>   6:  58496552  57808774  (-687778) -----    40.702   40.527  ( -0.175) -
>>>>>   7:  58482701  57808925  (-673776) -----    40.268   39.849  ( -0.419) ---
>>>>>   8:  58493831  57807810  (-686021) -----    40.396   39.940  ( -0.456) ---
>>>>>   9:  58489388  57811354  (-678034) -----    40.575   40.078  ( -0.497) ---
>>>>>  10:  58482512  57795489  (-687023) -----    40.084   40.247  (  0.163) +
>>>>> ============================================================
>>>>>       58493046  57814396  (-678650) -----    40.386   40.126  ( -0.260) --
>>>>> instr delta = -678650 -1.1602%
>>>>> time delta = -0.260 ms -0.6445%
>>>>>
>>>>>
>>>>> (your patch vs my patch)
>>>>>
>>>>>       jiangli   ioi                        jiangli   ioi
>>>>>   1:  57716711  57782622  (  65911) ++++     41.042   40.302  ( -0.740) -----
>>>>>   2:  57709666  57780196  (  70530) ++++     40.334   40.965  (  0.631) ++++
>>>>>   3:  57716074  57803315  (  87241) +++++    40.239   39.823  ( -0.416) ---
>>>>>   4:  57725152  57782719  (  57567) +++      40.430   39.805  ( -0.625) ----
>>>>>   5:  57719799  57787187  (  67388) ++++     40.138   40.003  ( -0.135) -
>>>>>   6:  57721922  57769193  (  47271) +++      40.324   40.207  ( -0.117) -
>>>>>   7:  57716438  57785212  (  68774) ++++     39.978   40.149  (  0.171) +
>>>>>   8:  57713834  57778797  (  64963) ++++     40.359   40.210  ( -0.149) -
>>>>>   9:  57711272  57786376  (  75104) ++++     40.575   40.724  (  0.149) +
>>>>>  10:  57711660  57780548  (  68888) ++++     40.291   40.091  ( -0.200) -
>>>>> ============================================================
>>>>>       57716252  57783615  (  67363) ++++     40.370   40.226  ( -0.144) -
>>>>> instr delta = 67363 0.1167%
>>>>> time delta = -0.144 ms -0.3560%
>>>>>
>>>>>
>>>>> These numbers show that the majority of the time (678650
>>>>> instructions) inside InstanceKlass::link_class_impl is spent in
>>>>> PerfClassTraceTime. Walking the class hierarchy and taking the
>>>>> h_init_lock only takes about 67363 instructions.
>>>>>
>>>>> Due to this finding, I filed two more RFEs:
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8246019
>>>>> PerfClassTraceTime slows down VM start-up
>>>>>
>>>> It's related to JDK-8246020, and I've commented on the bug (see
>>>> JDK-8246020 comments). UsePerfData for perf data collection is common
>>>> in cloud usages. It's better to keep UsePerfData enabled by default.
>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8246015
>>>>> Method::link_method is called twice for CDS methods
>>>>
>>>> That was addressed as part of the initial change for JDK-8232222:
>>>> http://cr.openjdk.java.net/~jiangli/8232222/weberv.02/src/hotspot/share/oops/instanceKlass.cpp.frames.html
>>>>
>>>>
>>>> It's cleaner to handle it separately, so I removed it from the latest
>>>> version. 
I've assigned JDK-8246015 to myself and will address it >>>> separately. Thanks for recording the separate bug. >>>> >>>> Thanks! >>>> Jiangli >>>> >>>>> >>>>> Thanks >>>>> - Ioi >>> From serguei.spitsyn at oracle.com Mon Jun 1 06:16:18 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Sun, 31 May 2020 23:16:18 -0700 Subject: RFR(XS): 8245126 Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods In-Reply-To: <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> References: <62e34586-eaac-3200-8f5a-ee12ad654afa@oracle.com> <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> Message-ID: <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Mon Jun 1 06:20:22 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Sun, 31 May 2020 23:20:22 -0700 Subject: RFR(XS): 8245126 Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods In-Reply-To: <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> References: <62e34586-eaac-3200-8f5a-ee12ad654afa@oracle.com> <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> Message-ID: <7bff2ddc-cb63-1eb4-3ac1-c0b0e2f7a0da@oracle.com> An HTML attachment was scrubbed... URL: From david.holmes at oracle.com Mon Jun 1 07:10:51 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 1 Jun 2020 17:10:51 +1000 Subject: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently In-Reply-To: References: Message-ID: <814ceeca-d700-500e-de82-7d5fd3c05192@oracle.com> Hi Daniil, On 30/05/2020 10:07 am, Daniil Titov wrote: > Please review a change [1] that fixes an intermittent test timeout. 
>
> The main logic of the test has this basic structure:
>
> try {
>   // lots of thread state manipulation of target
> }
> finally {
>   thread.getLog();
> }
>
> and as David noticed in his comment (the last comment in [2]), if an exception occurs anywhere
> in the try block we can hang waiting for the join() in getLog() because we haven't executed the logic that
> tells the thread to terminate.

So the fix puts a timeout on the join(), which means the test will no longer time out, but it will still fail when whatever was leading to the timeout now happens. So as a diagnostic fix this seems fine. Hopefully the logger will show what we need to see to determine the real underlying problem.

Thanks,
David
-----

> Testing: Running a modified test that explicitly throws a runtime exception inside the try block shows the fix solves the problem.
> Mach5 tier1-tier3 tests passed. Mach5 tier4-tier5 tests are in progress.
>
> [1] http://cr.openjdk.java.net/~dtitov/8081652/webrev.01/
> [2] https://bugs.openjdk.java.net/browse/JDK-8081652
>
> Thank you,
> Daniil
>
>
>

From david.holmes at oracle.com Mon Jun 1 07:46:45 2020
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 1 Jun 2020 17:46:45 +1000
Subject: RFR(XS): 8234882: JVM TI StopThread should only allow ThreadDeath
In-Reply-To: <6ebc70ce-787d-7f13-66f4-14ad8c8102d6@oracle.com>
References: <12cd04f9-c3f9-654f-fff2-1c4e315b6eeb@oracle.com> <3feb9c3f-4f61-f4b7-160f-c6b328305111@oracle.com> <40f21609-f086-722a-1af4-3f281c9b8963@oracle.com> <7b272791-4c47-27b0-9313-391a9e620295@oracle.com> <38db06ac-6e4e-029a-9376-ee577afe64d7@oracle.com> <2ce42985-9325-1c74-fa8d-c2a5049ec011@oracle.com> <0f1ec272-4410-f7e5-1c11-1238c0079b00@oracle.com> <3120b170-8d0f-7915-7224-f44523bdae6e@oracle.com> <586c3878-d175-2f8e-6ce8-95a187965de6@oracle.com> <2586bb75-f560-f905-1937-b778b7faba59@oracle.com> <6ebc70ce-787d-7f13-66f4-14ad8c8102d6@oracle.com>
Message-ID: <25f4a64a-10ca-2695-6748-ccd24d84ef22@oracle.com>

Hi Serguei,

Sorry, I 
think we have to re-think this change. As Dan flags in the CSR request debuggers directly expose this API as part of the debugger interface, so any change here will directly impact those tools. At a minimum I think we would need to consult with the tool developers about the impact of making this change, as well as whether it makes any practical difference in the sense that there may be other (less convenient but still available) mechanisms to achieve the same goal in a debugger or agent. David On 31/05/2020 5:50 pm, serguei.spitsyn at oracle.com wrote: > Hi David, > > Also jumping to end. > > On 5/30/20 06:50, David Holmes wrote: >> Hi Serguei, >> >> Jumping to the end for now ... >> >> On 30/05/2020 5:50 am, serguei.spitsyn at oracle.com wrote: >>> Hi David and reviewers, >>> >>> The updated webrev version is: >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.2/src/ >>> >>> >>> This update adds testing that StopThread can return >>> JVMTI_ERROR_INVALID_OBJECT error code. >>> >>> The JVM TI StopThread spec is: >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.2/docs/specs/jvmti.html#StopThread >>> >>> >>> >>> There is a couple of comments below. >>> >>> >>> On 5/29/20 06:18, David Holmes wrote: >>>> On 29/05/2020 6:24 pm, serguei.spitsyn at oracle.com wrote: >>>>> On 5/29/20 00:56, serguei.spitsyn at oracle.com wrote: >>>>>> On 5/29/20 00:42, serguei.spitsyn at oracle.com wrote: >>>>>>> Hi David, >>>>>>> >>>>>>> Thank you for reviewing this! >>>>>>> >>>>>>> On 5/28/20 23:57, David Holmes wrote: >>>>>>>> Hi Serguei, >>>>>>>> >>>>>>>> On 28/05/2020 3:12 pm, serguei.spitsyn at oracle.com wrote: >>>>>>>>> Hi David, >>>>>>>>> >>>>>>>>> I've updated the CSR and webrev in place. 
>>>>>>>>>
>>>>>>>>> The changes are:
>>>>>>>>>   - addressed David's suggestion to rephrase StopThread
>>>>>>>>>     description change
>>>>>>>>>   - replaced JVMTI_ERROR_INVALID_OBJECT with
>>>>>>>>>     JVMTI_ERROR_ILLEGAL_ARGUMENT
>>>>>>>>>   - updated the implementation in jvmtiEnv.cpp to return
>>>>>>>>>     JVMTI_ERROR_ILLEGAL_ARGUMENT
>>>>>>>>>   - updated one of the nsk.jvmti StopThread tests to check
>>>>>>>>>     error case with the JVMTI_ERROR_ILLEGAL_ARGUMENT
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm reposting the links for convenience.
>>>>>>>>>
>>>>>>>>> Enhancement:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234882
>>>>>>>>>
>>>>>>>>> CSR draft:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245853
>>>>>>>>
>>>>>>>> Spec updates are good - thanks.
>>>>>>>
>>>>>>> Thank you for the CSR review.
>>>>>>>
>>>>>>>>> Webrev:
>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/src/
>>>>>>>>>
>>>>>>>>
>>>>>>>> src/hotspot/share/prims/jvmtiEnv.cpp
>>>>>>>>
>>>>>>>> The ThreadDeath check is fine but I'm a bit confused about the
>>>>>>>> additional null check that leads to JVMTI_ERROR_INVALID_OBJECT.
>>>>>>>> I can't see how resolve_external_guard can return NULL when not
>>>>>>>> passed in NULL. Nor why that would result in
>>>>>>>> JVMTI_ERROR_INVALID_OBJECT rather than JVMTI_ERROR_NULL_POINTER.
>>>>>>>> And I note JVMTI_ERROR_NULL_POINTER is not even a listed error
>>>>>>>> for StopThread! This part of the change seems unrelated to this
>>>>>>>> issue.
>>>>>>>
>>>>>>> I was also surprised with the JVMTI_ERROR_NULL_POINTER and
>>>>>>> JVMTI_ERROR_INVALID_OBJECT error codes.
>>>>>>> The JVM TI spec automatic generation adds these two error codes
>>>>>>> for a jobject parameter. 
>>>>>>>
>>>>>>> Also, they both are from the Universal Errors section:
>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#universal-error
>>>>>>>
>>>>>>>
>>>>>>> You can find a link to this section at the start of the Error
>>>>>>> section:
>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread
>>>>>>>
>>>>>>>
>>>>>>> My understanding (not sure it is right) is that NULL has to be
>>>>>>> reported with JVMTI_ERROR_NULL_POINTER and a bad
>>>>>>> jobject (for instance, a WeakReference with a GC-ed target) has
>>>>>>> to be reported with JVMTI_ERROR_INVALID_OBJECT.
>>>>>>> At least, I was not able to construct a test case to get this
>>>>>>> error code returned.
>>>>>>> So, I'm puzzled with this. I'll try to find some examples with
>>>>>>> JVMTI_ERROR_NULL_POINTER errors.
>>>>>>
>>>>>> Found the explanation.
>>>>>> The JDI file:
>>>>>> src/jdk.jdi/share/classes/com/sun/tools/jdi/JDWPException.java
>>>>>>
>>>>>> has a fragment that translates the INVALID_OBJECT error to the
>>>>>> ObjectCollectedException:
>>>>>>     RuntimeException toJDIException() {
>>>>>>         switch (errorCode) {
>>>>>>             case JDWP.Error.INVALID_OBJECT:
>>>>>>                 return new ObjectCollectedException();
>>>>>>
>>>>>> So, the INVALID_OBJECT is for a jobject handle that is referencing
>>>>>> a collected object.
>>>>>> It means that previous implementation incorrectly returned
>>>>>> JVMTI_ERROR_NULL_POINTER error code.
>>>>>
>>>>> I should create and delete local or global ref to construct a test
>>>>> case for this.
>>>>>
>>>>> Interesting that the JDWPException::toJDIException() does not
>>>>> convert the ILLEGAL_ARGUMENT error code to an
>>>>> IllegalArgumentException.
>>>>> I've just added this conversion.
>>>>
>>>> Given the definition of JDWP INVALID_OBJECT then obviously JDI
>>>> converts it to ObjectCollectedException. 
>>>>
>>>> So reading further in JNI spec:
>>>>
>>>> "Weak global references are a special kind of global reference.
>>>> Unlike normal global references, a weak global reference allows the
>>>> underlying Java object to be garbage collected. Weak global
>>>> references may be used in any situation where global or local
>>>> references are used."
>>>>
>>>> So it seems that any function that takes a jobject could in fact
>>>> accept a jweak, in which case JVMTI_ERROR_INVALID_OBJECT is a
>>>> possibility in all cases. So IIUC JNIHandles::resolve_external_guard
>>>> can return NULL if a weak reference has been collected. So the new
>>>> code you propose seems correct.
>>>
>>> You are right about weak global references.
>>> I was able to construct a test case for JVMTI_ERROR_INVALID_OBJECT.
>>> The JNI NewGlobalRef and DeleteGlobalRef are used for it.
>>> You can find it in the updated webrev version.
>>>
>>>> However, this still is unrelated to the current issue and I do not
>>>> see other JVM TI doing checks for this case. So this seems to be a
>>>> much broader issue.
>>> There are many such checks in JVM TI.
>>> For instance, there are checks like the following in jvmtiEnv.cpp:
>>> NULL_CHECK(o, JVMTI_ERROR_INVALID_OBJECT)
>>
>> Yes but they are incorrect IMO e.g.
>>
>> JvmtiEnv::GetObjectSize(jobject object, jlong* size_ptr) {
>>   oop mirror = JNIHandles::resolve_external_guard(object);
>>   NULL_CHECK(mirror, JVMTI_ERROR_INVALID_OBJECT);
>>
>> The NULL_CHECK will fail if either object is NULL or object is a jweak
>> that has been cleared. In the first case it should report
>> JVMTI_ERROR_NULL_POINTER.
>>
>> The correct pattern is what you proposed with this fix:
>>
>> +   NULL_CHECK(exception, JVMTI_ERROR_NULL_POINTER);
>>     oop e = JNIHandles::resolve_external_guard(exception);
>> +   // the exception must be a valid jobject
>> +   if (e == NULL) {
>> +     return JVMTI_ERROR_INVALID_OBJECT;
>> +   }
>>
>
> I see your point, thanks! 
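The distinction drawn above, a NULL handle versus a handle whose referent has been collected, has a rough analogue in plain Java. The sketch below is only an analogy under stated assumptions: `HandleCheck`, `Result`, and `check` are hypothetical names, `WeakReference` stands in for a jweak, and nothing here is HotSpot or JVM TI code.

```java
import java.lang.ref.WeakReference;

// Hypothetical illustration only; a Java analogy, not the jvmtiEnv.cpp logic.
final class HandleCheck {
    // Hypothetical result codes mirroring the two JVM TI errors named in the thread.
    enum Result { OK, NULL_POINTER, INVALID_OBJECT }

    static Result check(WeakReference<Object> handle) {
        // First check the handle itself: the caller passed nothing at all.
        if (handle == null) {
            return Result.NULL_POINTER;
        }
        // Then resolve it: a non-null handle can still refer to a cleared target.
        Object target = handle.get();
        if (target == null) {
            return Result.INVALID_OBJECT;
        }
        return Result.OK;
    }
}
```

Checking only the resolved value would fold both failures into one error code, which is the problem pointed out for the single NULL_CHECK on the resolved oop.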
> I'll check these cases and file a bug if necessary.
>
>> Though not sure why you didn't use a second NULL_CHECK
>
> I've already replaced it with:
>   NULL_CHECK(e, JVMTI_ERROR_INVALID_OBJECT);
>
> You, probably, need to refresh the webrev page.
>
> Thanks,
> Serguei
>
>
>> David
>> -----
>>
>>> Thanks,
>>> Serguei
>>>
>>>>
>>>> David
>>>> -----
>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>>>> test/hotspot/jtreg/vmTestbase/nsk/jvmti/StopThread/stopthrd006/TestDescription.java
>>>>>>>>
>>>>>>>>
>>>>>>>> The copyright year should be changed to "2018, 2020,".
>>>>>>> Thank you for the catch.
>>>>>>> I planned to update the copyright comments.
>>>>>>>
>>>>>>>> I'm a little surprised the test doesn't actually check that a
>>>>>>>> valid call doesn't produce an error. But that's an existing
>>>>>>>> quirk of the test and not something you need to address here (if
>>>>>>>> indeed it needs addressing - perhaps there is another test for
>>>>>>>> that).
>>>>>>>
>>>>>>> There are plenty of other nsk.jvmti tests which check valid calls. 
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Updated JVM TI StopThread spec:
>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The old webrev and spec are here:
>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.0/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>> On 5/27/20 18:03, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 5/27/20 02:00, David Holmes wrote:
>>>>>>>>>>> On 27/05/2020 6:36 pm, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 5/27/20 00:47, David Holmes wrote:
>>>>>>>>>>>>> Hi Serguei,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 27/05/2020 1:01 pm, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>>>> Please, review a fix for:
>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234882
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> CSR draft (one CSR reviewer is needed before finalizing it):
>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245853
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have some thoughts on the wording which I will add to the
>>>>>>>>>>>>> CSR.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you a lot for looking at this!
>>>>>>>>>>>>
>>>>>>>>>>>>> Also on reflection I think JVMTI_ERROR_ILLEGAL_ARGUMENT
>>>>>>>>>>>>> would be the best error to use, and it has an equivalent in
>>>>>>>>>>>>> JDWP and at the Java level for JDI.
>>>>>>>>>>>>
>>>>>>>>>>>> This is an interesting variant, thanks!
>>>>>>>>>>>> We need to balance on several criteria:
>>>>>>>>>>>>   1) Compatibility: keep returning error as close as
>>>>>>>>>>>>      possible to the current spec
>>>>>>>>>>>
>>>>>>>>>>> If you are adding a new error condition I don't understand
>>>>>>>>>>> what you mean by "close to the current spec" ?? 
>>>>>>>>>> >>>>>>>>>> If the JVMTI_ERROR_INVALID_OBJECT is returned than the JDWP >>>>>>>>>> agent does not need any new error handling. >>>>>>>>>> The same can be true in the JDI if the JDWP returns the same >>>>>>>>>> error as it returned before. >>>>>>>>>> In this case we do not add new error code but extend the >>>>>>>>>> existing to cover new error condition. >>>>>>>>>> >>>>>>>>>> But, in fact (especially, after rethinking), I do not like the >>>>>>>>>> JVMTI_ERROR_INVALID_OBJECT >>>>>>>>>> error code as it normally means something different. >>>>>>>>>> So, let's avoid using it and skip this criteria. >>>>>>>>>> Then we need new error code to cover new error condition. >>>>>>>>>> >>>>>>>>>>>> ??2) Best error naming match between JVM TI and JDI/JDWP >>>>>>>>>>>> ??3) Best practice in errors naming >>>>>>>>>>> >>>>>>>>>>> If the argument is not a ThreadDeath instance then it is an >>>>>>>>>>> illegal argument - perfect fit semantically all the specs >>>>>>>>>>> involved have an "illegal argument" error form. >>>>>>>>>> >>>>>>>>>> I agree with this. >>>>>>>>>> It is why I like this suggestion. :) >>>>>>>>>> The JDWP equivalent is: ILLEGAL_ARGUMENT. >>>>>>>>>> The JDI equivalent is:? IllegalArgumentException >>>>>>>>>> >>>>>>>>>> I'll prepare and send the update. >>>>>>>>>> >>>>>>>>>> Thanks! >>>>>>>>>> Serguei >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>>>> I think the #1 is most important but will look at it once more. 
>>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Serguei >>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> David >>>>>>>>>>>>> >>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/src/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Updated JVM TI StopThread spec: >>>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Summary: >>>>>>>>>>>>>> >>>>>>>>>>>>>> ?? The JVM TI StopThread method mirrored the functionality >>>>>>>>>>>>>> of the >>>>>>>>>>>>>> ?? java.lang.Thread::stop(Throwable t) method, in that it >>>>>>>>>>>>>> allows any exception >>>>>>>>>>>>>> ?? type to be installed as an asynchronous exception in >>>>>>>>>>>>>> the target thread. >>>>>>>>>>>>>> ?? However, the java.lang.Thread::stop(Throwable t) method >>>>>>>>>>>>>> was inherently unsafe >>>>>>>>>>>>>> ?? and in Java 8 (under JDK-7059085) it was "retired" so >>>>>>>>>>>>>> that it always threw >>>>>>>>>>>>>> ?? UnsupportedOperationException. >>>>>>>>>>>>>> ?? The updated JVM TI StopThread spec disallows an >>>>>>>>>>>>>> arbitrary Throwable from being passed, >>>>>>>>>>>>>> ?? and instead restricts the argument to being an instance >>>>>>>>>>>>>> of ThreadDeath, thus >>>>>>>>>>>>>> ?? mirroring the (deprecated but still functional) >>>>>>>>>>>>>> java.lang.Thread::stop() method. >>>>>>>>>>>>>> ?? The error JVMTI_ERROR_INVALID_OBJECT is returned if the >>>>>>>>>>>>>> exception argument >>>>>>>>>>>>>> ?? is not an instance of ThreadDeath. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ?? Also, I will file similar RFE and CSR on the JDI and >>>>>>>>>>>>>> JDWP spec. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Testing: >>>>>>>>>>>>>> ?? Built docs and checked the doc has been generated as >>>>>>>>>>>>>> expected. >>>>>>>>>>>>>> ?? Will run the nsk.jvmti tests locally. >>>>>>>>>>>>>> ?? 
Will submit hs-tiers1-3 to make sure there are no >>>>>>>>>>>>>> regressions in the JVM TI and JDI tests. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Serguei >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>> >>>>> >>> > From fairoz.matte at oracle.com Mon Jun 1 08:27:23 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Mon, 1 Jun 2020 01:27:23 -0700 (PDT) Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken Message-ID: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> Hi, Please review this small test infra change to identify at runtime the JFR is active or not. JBS - https://bugs.openjdk.java.net/browse/JDK-8243451 Webrev - http://cr.openjdk.java.net/~fmatte/8243451/webrev.00/ Thanks, Fairoz From erik.gahlin at oracle.com Mon Jun 1 09:00:55 2020 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Mon, 1 Jun 2020 11:00:55 +0200 Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> Message-ID: <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> Hi Fairoz, If the test needs to run with builds where the JFR module is not present(?), you need to do the check using reflection. If not, looks good. Erik > On 1 Jun 2020, at 10:27, Fairoz Matte wrote: > > Hi, > > Please review this small test infra change to identify at runtime the JFR is active or not. 
> > JBS - https://bugs.openjdk.java.net/browse/JDK-8243451 > Webrev - http://cr.openjdk.java.net/~fmatte/8243451/webrev.00/ > > Thanks, > Fairoz From fairoz.matte at oracle.com Mon Jun 1 10:30:29 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Mon, 1 Jun 2020 03:30:29 -0700 (PDT) Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> Message-ID: Hi Erik, Thanks for your quick response, Below is the updated webrev to handle if jfr module is not present http://cr.openjdk.java.net/~fmatte/8243451/webrev.01/ Thanks, Fairoz > -----Original Message----- > From: Erik Gahlin > Sent: Monday, June 1, 2020 2:31 PM > To: Fairoz Matte > Cc: serviceability-dev at openjdk.java.net > Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect > and corresponsing logic seems to be broken > > Hi Fairoz, > > If the test needs to run with builds where the JFR module is not present(?), you > need to do the check using reflection. > > If not, looks good. > > Erik > > > On 1 Jun 2020, at 10:27, Fairoz Matte wrote: > > > > Hi, > > > > Please review this small test infra change to identify at runtime the JFR is > active or not. 
> >
> > JBS - https://bugs.openjdk.java.net/browse/JDK-8243451
> > Webrev - http://cr.openjdk.java.net/~fmatte/8243451/webrev.00/
> >
> > Thanks,
> > Fairoz
>
From erik.gahlin at oracle.com Mon Jun 1 10:55:31 2020
From: erik.gahlin at oracle.com (Erik Gahlin)
Date: Mon, 1 Jun 2020 12:55:31 +0200
Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken
In-Reply-To:
References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com>
Message-ID:

Hi Fairoz,

What I think you need to do is something like this:

        if (className.equals("java.lang.Thread")) {
            return !isJfrInitialized();
        }

...

    private static boolean isJfrInitialized() {
        try {
            Class clazz = Class.forName("jdk.jfr.FlightRecorder");
            Method method = clazz.getDeclaredMethod("isInitialized", new Class[0]);
            return (boolean) method.invoke(null, new Object[0]);
        } catch (Exception e) {
            return false;
        }
    }

Erik

On 2020-06-01 12:30, Fairoz Matte wrote:
> Hi Erik,
>
> Thanks for your quick response,
> Below is the updated webrev to handle if jfr module is not present
> http://cr.openjdk.java.net/~fmatte/8243451/webrev.01/
>
> Thanks,
> Fairoz
>
>> -----Original Message-----
>> From: Erik Gahlin
>> Sent: Monday, June 1, 2020 2:31 PM
>> To: Fairoz Matte
>> Cc: serviceability-dev at openjdk.java.net
>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect
>> and corresponsing logic seems to be broken
>>
>> Hi Fairoz,
>>
>> If the test needs to run with builds where the JFR module is not present(?), you
>> need to do the check using reflection.
>>
>> If not, looks good.
>>
>> Erik
>>
>>> On 1 Jun 2020, at 10:27, Fairoz Matte wrote:
>>>
>>> Hi,
>>>
>>> Please review this small test infra change to identify at runtime the JFR is
>> active or not.
>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8243451
>>> Webrev - http://cr.openjdk.java.net/~fmatte/8243451/webrev.00/
>>>
>>> Thanks,
>>> Fairoz
From harold.seigel at oracle.com Mon Jun 1 13:07:04 2020
From: harold.seigel at oracle.com (Harold Seigel)
Date: Mon, 1 Jun 2020 09:07:04 -0400
Subject: RFR: JDK-8225056 VM support for sealed classes
In-Reply-To:
References: <7b3430e1-f821-1e2d-2c8b-f1c621f059da@oracle.com> <9d7da8af-cda3-693d-1ea1-1db5069fea97@oracle.com> <9b32addd-d576-268e-61ab-0ac4921d22f5@oracle.com> <151289f6-820c-08d1-c2f9-85b18d1bcaf5@oracle.com> <0749bff1-02ac-841e-4bd7-4a511a90be9d@oracle.com> <9da783ba-edd9-b5fe-0476-644ba7d01990@oracle.com>
Message-ID:

Hi David,

Thanks for reviewing the latest changes.

I'll create the follow on RFE's once the sealed classes code is in mainline.

Harold

On 5/31/2020 9:34 PM, David Holmes wrote:
> Hi Harold,
>
> On 1/06/2020 8:57 am, Harold Seigel wrote:
>> Thanks for the comments.
>>
>> Here's version 3 of the JDK and VM changes for sealed classes.
>>
>> full webrev:
>> http://cr.openjdk.java.net/~hseigel/sealedClasses.8225056.3/webrev/
>>
>> The new webrev contains just the following three changes:
>>
>>  1. The sealed classes API's in Class.java (permittedSubclasses() and
>>     isSealed()) were revised and, in particular, API
>>     permittedSubclasses() no longer uses reflection.
>
> For those following along we have presently abandoned the attempt to
> cache the array in ReflectionData.
>
> Current changes look okay. But I note from the CSR there appears to be
> a further minor update to the javadoc coming.
>
>>  2. An unneeded 'if' statement was removed from
>>     JVM_GetPermittedSubclasses() (pointed out by David.)
>
> Looks good.
>
>>  3. VM runtime test files SealedUnnamedModuleIntfTest.java and
>>     Permitted.java were changed to add a test case for a non-public
>>     permitted subclass and its sealed superclass being in the same
>>     module and package.
>
> Looks good.
> >> Additionally, two follow on RFE's will be filed.? One to add >> additional VM sealed classes tests > > Thanks. I think there is a more mechanical approach to testing here > that will allow the complete matrix to be easily covered with minimal > duplication between testing for named and unnamed modules. > >> and one to improve the implementations of the sealed classes API's in >> Class.java. > > Thanks. > > David > ----- > >> Thanks, Harold >> >> On 5/28/2020 8:30 PM, David Holmes wrote: >>> >>> Hi Harold, >>> >>> Sorry Mandy's comment raised a couple of issues ... >>> >>> On 29/05/2020 7:12 am, Mandy Chung wrote: >>>> Hi Harold, >>>> >>>> On 5/27/20 1:35 PM, Harold Seigel wrote: >>>>> >>>>> Incremental webrev: >>>>> http://cr.openjdk.java.net/~hseigel/sealedClasses.8225056.incr.2/ >>>>> >>>>> full webrev: >>>>> http://cr.openjdk.java.net/~hseigel/sealedClasses.8225056.2/webrev/ >>>>> >>>> Class.java >>>> >>>> 4406 ReflectionData rd = reflectionData(); >>>> 4407 ClassDesc[] tmp = rd.permittedSubclasses; >>>> 4408 if (tmp != null) { >>>> 4409 return tmp; >>>> 4410 } >>>> 4411 >>>> 4412 if (isArray() || isPrimitive()) { >>>> 4413 rd.permittedSubclasses = new ClassDesc[0]; >>>> 4414 return rd.permittedSubclasses; >>>> 4415 } >>>> >>>> This causes an array class or primitive type to create a >>>> ReflectionData.?? It should first check if this is non-sealed class >>>> and returns a constant empty array. >>> >>> It can't check if this is a non-sealed class as the isSealed() check >>> calls the above code! But for arrays and primitives which can't be >>> sealed we should just do: >>> >>> 4412 if (isArray() || isPrimitive()) { >>> 4413 return new ClassDesc[0]; >>> 4414 } >>> >>> But this then made me realize that we need to be creating defensive >>> copies of the returned arrays, as happens with other APIs that use >>> ReflectionData. 
>>>
>>> Backing up a bit I complained that:
>>>
>>> public boolean isSealed() {
>>>     return permittedSubclasses().length != 0;
>>> }
>>>
>>> is a very inefficient way to answer the question as to whether a
>>> class is sealed, so I suggested that the result of
>>> permittedSubclasses() be cached. Caching is not without its own
>>> issues as we are discovering, and when you add in defensive copies
>>> this seems to be trading one inefficiency for another. For nestmates
>>> we don't cache getNestMembers() because we don't think it will be
>>> called often - it is there to complete the introspection API of
>>> Class rather than being anticipated as used in a regular
>>> programmatic sense. I expect the same is true for
>>> permittedSubclasses(). Do we expect isSealed() to be used often or
>>> is it too just there for completeness? If just for completeness then
>>> perhaps a VM query would be a better compromise on the efficiency
>>> front? Otherwise I can accept the current implementation of
>>> isSealed(), and a non-caching permittedSubclasses() for this initial
>>> implementation of sealed classes. If efficiency turns out to be a
>>> problem for isSealed() then we can revisit it then.
>>>
>>> Thanks,
>>> David
>>>
>>>
>>>> In fact, ReflectionData caches the derived names and reflected
>>>> members for performance and also they may be invalidated when the
>>>> class is redefined. It might be okay to add
>>>> ReflectionData::permittedSubclasses while `PermittedSubclasses`
>>>> attribute can't be redefined and getting this attribute is not
>>>> performance sensitive. For example, the result of
>>>> `getNestMembers` is not cached in ReflectionData. It may be better
>>>> not to add it in ReflectionData for modifiable and
>>>> performance-sensitive data.
>>>> >>>> >>>> 4421 tmp = new ClassDesc[subclassNames.length]; >>>> 4422 int i = 0; >>>> 4423 for (String subclassName : subclassNames) { >>>> 4424 try { >>>> 4425 tmp[i++] = ClassDesc.of(subclassName.replace('/', '.')); >>>> 4426 } catch (IllegalArgumentException iae) { >>>> 4427 throw new InternalError("Invalid type in permitted subclasses >>>> information: " + subclassName, iae); >>>> 4428 } >>>> 4429 } >>>> Nit: rename tmp to some other name e.g. descs >>>> >>>> I read the JVMS but it isn't clear to me that the VM will validate >>>> the names in `PermittedSubclasses`attribute are valid class >>>> descriptors.?? I see ConstantPool::is_klass_or_reference check but >>>> can't find where it validates the name is a valid class descriptor >>>> - can you point me there??? (otherwise, maybe define it to be >>>> unspecified?) >>>> >>>> >>>> W.r.t. the APIs. I don't want to delay this review.? I see that you >>>> renamed the method to new API style as I brought up.? OTOH,? I >>>> expect more discussion is needed.? Below is a recent comment from >>>> John on this topic [1] >>>> >>>>> One comment, really for the future, regarding the shape of the >>>>> Java API here: It uses Optional and omits the "get" prefix on >>>>> accessors. This is the New Style, as opposed to the Classic Style >>>>> using null (for "empty" results) and a "get" prefix >>>>> ("getComponentType") to get a related type. We may choose to to >>>>> use the New Style for new reflection API points, and if so let's >>>>> not choose N different New Styles, but one New Style. Personally I >>>>> like removing "get"; I accept Optional instead of null; and I also >>>>> suggest that arrays (if any) be replaced by (immutable) Lists in >>>>> the New Style >>>> >>>> There are a few existing Class APIs that use the new API style: >>>> Class arrayClass(); >>>> Optional describeConstable(); >>>> String descriptorString(); >>>> >>>> This will set up a precedence of the new API style in this class.? 
>>>> Should this new permittedSubclasses method return a List instead of
>>>> an array? It's okay with me if you prefer to revert back to the
>>>> old API style and follow this up after integration.
>>>>
>>>> 4442 * Returns true if this {@linkplain Class} is sealed.
>>>> 4443 *
>>>> 4444 * @return returns true if this class is sealed
>>>>
>>>> Nit: {@code true} instead of true - consistent with the style this
>>>> class uses (in most methods)
>>>>
>>>> test/jdk/java/lang/reflect/sealed_classes/SealedClassesReflectionTest.java
>>>>
>>>>
>>>> Nit: s/sealed_classes/sealedClasses/
>>>> - the test directory/file naming convention uses camel case or java
>>>> variable name convention.
>>>>
>>>> Thanks
>>>> [1] https://github.com/openjdk/valhalla/pull/53#issuecomment-633116043
From serguei.spitsyn at oracle.com Mon Jun 1 15:30:57 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 1 Jun 2020 08:30:57 -0700
Subject: RFR(XS): 8234882: JVM TI StopThread should only allow ThreadDeath
In-Reply-To: <25f4a64a-10ca-2695-6748-ccd24d84ef22@oracle.com>
References: <12cd04f9-c3f9-654f-fff2-1c4e315b6eeb@oracle.com> <3feb9c3f-4f61-f4b7-160f-c6b328305111@oracle.com> <40f21609-f086-722a-1af4-3f281c9b8963@oracle.com> <7b272791-4c47-27b0-9313-391a9e620295@oracle.com> <38db06ac-6e4e-029a-9376-ee577afe64d7@oracle.com> <2ce42985-9325-1c74-fa8d-c2a5049ec011@oracle.com> <0f1ec272-4410-f7e5-1c11-1238c0079b00@oracle.com> <3120b170-8d0f-7915-7224-f44523bdae6e@oracle.com> <586c3878-d175-2f8e-6ce8-95a187965de6@oracle.com> <2586bb75-f560-f905-1937-b778b7faba59@oracle.com> <6ebc70ce-787d-7f13-66f4-14ad8c8102d6@oracle.com> <25f4a64a-10ca-2695-6748-ccd24d84ef22@oracle.com>
Message-ID: <1190375b-d7da-47c4-61d4-121f4d0ba33a@oracle.com>

Hi David,

I'll check with JetBrains on this.
Thank you to Dan and you for raising this concern.
The JetBrains use case you posted in the CSR looks valid and useful.
Thanks, Serguei On 6/1/20 00:46, David Holmes wrote: > Hi Serguei, > > Sorry, I think we have to re-think this change. As Dan flags in the > CSR request debuggers directly expose this API as part of the debugger > interface, so any change here will directly impact those tools. At a > minimum I think we would need to consult with the tool developers > about the impact of making this change, as well as whether it makes > any practical difference in the sense that there may be other (less > convenient but still available) mechanisms to achieve the same goal in > a debugger or agent. > > David > > On 31/05/2020 5:50 pm, serguei.spitsyn at oracle.com wrote: >> Hi David, >> >> Also jumping to end. >> >> On 5/30/20 06:50, David Holmes wrote: >>> Hi Serguei, >>> >>> Jumping to the end for now ... >>> >>> On 30/05/2020 5:50 am, serguei.spitsyn at oracle.com wrote: >>>> Hi David and reviewers, >>>> >>>> The updated webrev version is: >>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.2/src/ >>>> >>>> >>>> This update adds testing that StopThread can return >>>> JVMTI_ERROR_INVALID_OBJECT error code. >>>> >>>> The JVM TI StopThread spec is: >>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.2/docs/specs/jvmti.html#StopThread >>>> >>>> >>>> >>>> There is a couple of comments below. >>>> >>>> >>>> On 5/29/20 06:18, David Holmes wrote: >>>>> On 29/05/2020 6:24 pm, serguei.spitsyn at oracle.com wrote: >>>>>> On 5/29/20 00:56, serguei.spitsyn at oracle.com wrote: >>>>>>> On 5/29/20 00:42, serguei.spitsyn at oracle.com wrote: >>>>>>>> Hi David, >>>>>>>> >>>>>>>> Thank you for reviewing this! >>>>>>>> >>>>>>>> On 5/28/20 23:57, David Holmes wrote: >>>>>>>>> Hi Serguei, >>>>>>>>> >>>>>>>>> On 28/05/2020 3:12 pm, serguei.spitsyn at oracle.com wrote: >>>>>>>>>> Hi David, >>>>>>>>>> >>>>>>>>>> I've updated the CSR and webrev in place. 
>>>>>>>>>> >>>>>>>>>> The changes are: >>>>>>>>>> ??- addressed David's suggestion to rephrase StopThread >>>>>>>>>> description change >>>>>>>>>> ??- replaced JVMTI_ERROR_INVALID_OBJECT with >>>>>>>>>> JVMTI_ERROR_ILLEGAL_ARGUMENT >>>>>>>>>> ??- updated the implementation in jvmtiEnv.cpp to return >>>>>>>>>> JVMTI_ERROR_ILLEGAL_ARGUMENT >>>>>>>>>> ??- updated one of the nsk.jvmti StopThread tests to check >>>>>>>>>> error case with the JVMTI_ERROR_ILLEGAL_ARGUMENT >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I'm reposting the links for convenience. >>>>>>>>>> >>>>>>>>>> Enhancement: >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234882 >>>>>>>>>> >>>>>>>>>> CSR draft: >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245853 >>>>>>>>> >>>>>>>>> Spec updates are good - thanks. >>>>>>>> >>>>>>>> Thank you for the CSR review. >>>>>>>> >>>>>>>>>> Webrev: >>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/src/ >>>>>>>>>> >>>>>>>>> >>>>>>>>> src/hotspot/share/prims/jvmtiEnv.cpp >>>>>>>>> >>>>>>>>> The ThreadDeath check is fine but I'm a bit confused about the >>>>>>>>> additional null check that leads to >>>>>>>>> JVMTI_ERROR_INVALID_OBJECT. I can't see how >>>>>>>>> resolve_external_guard can return NULL when not passed in >>>>>>>>> NULL. Nor why that would result in JVMTI_ERROR_INVALID_OBJECT >>>>>>>>> rather than JVMTI_ERROR_NULL_POINTER. And I note >>>>>>>>> JVMTI_ERROR_NULL_POINTER is not even a listed error for >>>>>>>>> StopThread! This part of the change seems unrelated to this >>>>>>>>> issue. >>>>>>>> >>>>>>>> I was also surprised with the JVMTI_ERROR_NULL_POINTER and >>>>>>>> JVMTI_ERROR_INVALID_OBJECT error codes. >>>>>>>> The JVM TI spec automatic generation adds these two error codes >>>>>>>> for a jobject parameter. 
>>>>>>>> >>>>>>>> Also, they both are from the Universal Errors section: >>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#universal-error >>>>>>>> >>>>>>>> >>>>>>>> You can find a link to this section at the start of the Error >>>>>>>> section: >>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread >>>>>>>> >>>>>>>> >>>>>>>> My understanding (not sure, it is right) is that NULL has to be >>>>>>>> reported with JVMTI_ERROR_NULL_POINTER and a bad >>>>>>>> jobject (for instance, a WeakReference with a GC-ed target) has >>>>>>>> to be reported with JVMTI_ERROR_INVALID_OBJECT. >>>>>>>> At least, I was not able to construct a test case to get this >>>>>>>> error code returned. >>>>>>>> So, I'm puzzled with this. I'll try to find some examples with >>>>>>>> JVMTI_ERROR_NULL_POINTER errors. >>>>>>> >>>>>>> Found the explanation. >>>>>>> The JDI file: >>>>>>> src/jdk.jdi/share/classes/com/sun/tools/jdi/JDWPException.java >>>>>>> >>>>>>> has a fragment that translate the INVALID_OBJECT error to the >>>>>>> ObjectCollectedException: >>>>>>> ??? RuntimeException toJDIException() { >>>>>>> ??????? switch (errorCode) { >>>>>>> ??????????? case JDWP.Error.INVALID_OBJECT: >>>>>>> ??????????????? return new ObjectCollectedException(); >>>>>>> >>>>>>> So, the INVALID_OBJECT is for a jobject handle that is >>>>>>> referencing a collected object. >>>>>>> It means that previous implementation incorrectly returned >>>>>>> JVMTI_ERROR_NULL_POINTER error code. >>>>>> >>>>>> I should create and delete local or global ref to construct a >>>>>> test case for this. >>>>>> >>>>>> Interesting that the JDWPException::toJDIException() does not >>>>>> convert the ILLEGAL_ARGUMENT error code to an >>>>>> IllegalArgumentException. >>>>>> I've just added this conversion. >>>>> >>>>> Given the definition of JDWP INVALID_OBJECT then obviously JDI >>>>> converts it to ObjectCollectedException. 
>>>>>
>>>>> So reading further in JNI spec:
>>>>>
>>>>> "Weak global references are a special kind of global reference.
>>>>> Unlike normal global references, a weak global reference allows
>>>>> the underlying Java object to be garbage collected. Weak global
>>>>> references may be used in any situation where global or local
>>>>> references are used."
>>>>>
>>>>> So it seems that any function that takes a jobject could in fact
>>>>> accept a jweak, in which case JVMTI_ERROR_INVALID_OBJECT is a
>>>>> possibility in all cases. So IIUC
>>>>> JNIHandles::resolve_external_guard can return NULL if a weak
>>>>> reference has been collected. So the new code you propose seems
>>>>> correct.
>>>>
>>>> You are right about weak global references.
>>>> I was able to construct a test case for JVMTI_ERROR_INVALID_OBJECT.
>>>> The JNI NewGlobalRef and DeleteGlobalRef are used for it.
>>>> You can find it in the updated webrev version.
>>>>
>>>>> However, this still is unrelated to the current issue and I do not
>>>>> see other JVM TI doing checks for this case. So this seems to be a
>>>>> much broader issue.
>>>> There are many such checks in JVM TI.
>>>> For instance, there are checks like the following in jvmtiEnv.cpp:
>>>> NULL_CHECK(o, JVMTI_ERROR_INVALID_OBJECT)
>>>
>>> Yes but they are incorrect IMO e.g.
>>>
>>> JvmtiEnv::GetObjectSize(jobject object, jlong* size_ptr) {
>>>   oop mirror = JNIHandles::resolve_external_guard(object);
>>>   NULL_CHECK(mirror, JVMTI_ERROR_INVALID_OBJECT);
>>>
>>> The NULL_CHECK will fail if either object is NULL or object is a
>>> jweak that has been cleared. In the first case it should report
>>> JVMTI_ERROR_NULL_POINTER.
>>>
>>> The correct pattern is what you proposed with this fix:
>>>
>>> +   NULL_CHECK(exception, JVMTI_ERROR_NULL_POINTER);
>>>     oop e = JNIHandles::resolve_external_guard(exception);
>>> +   // the exception must be a valid jobject
>>> +   if (e == NULL) {
>>> +     return JVMTI_ERROR_INVALID_OBJECT;
>>> +   }
>>>
>>
>> I see your point, thanks!
>> I'll check these cases and file a bug if necessary.
>>
>>> Though not sure why you didn't use a second NULL_CHECK
>>
>> I've already replaced it with:
>>    NULL_CHECK(e, JVMTI_ERROR_INVALID_OBJECT);
>>
>> You probably need to refresh the webrev page.
>>
>> Thanks,
>> Serguei
>>
>>
>>> David
>>> -----
>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>
>>>>>>>>> test/hotspot/jtreg/vmTestbase/nsk/jvmti/StopThread/stopthrd006/TestDescription.java
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The copyright year should be changed to "2018, 2020,".
>>>>>>>> Thank you for the catch.
>>>>>>>> I planned to update the copyright comments.
>>>>>>>>
>>>>>>>>> I'm a little surprised the test doesn't actually check that a
>>>>>>>>> valid call doesn't produce an error. But that's an existing
>>>>>>>>> quirk of the test and not something you need to address here
>>>>>>>>> (if indeed it needs addressing - perhaps there is another test
>>>>>>>>> for that).
>>>>>>>>
>>>>>>>> There are plenty of other nsk.jvmti tests which check valid calls.
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> Serguei >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Updated JVM TI StopThread spec: >>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> The old webrev and spec are here: >>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.0/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Serguei >>>>>>>>>> >>>>>>>>>> On 5/27/20 18:03, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>> Hi David, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 5/27/20 02:00, David Holmes wrote: >>>>>>>>>>>> On 27/05/2020 6:36 pm, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>> Hi David, >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 5/27/20 00:47, David Holmes wrote: >>>>>>>>>>>>>> Hi Serguei, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 27/05/2020 1:01 pm, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>>> Please, review a fix for: >>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234882 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> CSR draft (one CSR reviewer is needed before finalizing >>>>>>>>>>>>>>> it): >>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245853 >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have some thoughts on the wording which I will add to >>>>>>>>>>>>>> the CSR. >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you a lot for looking at this! >>>>>>>>>>>>> >>>>>>>>>>>>>> Also on reflection I think JVMTI_ERROR_ILLEGAL_ARGUMENT >>>>>>>>>>>>>> would the best error to use, and it has an equivalent in >>>>>>>>>>>>>> JDWP and at the Java level for JDI. >>>>>>>>>>>>> >>>>>>>>>>>>> This is an interesting variant, thanks! 
>>>>>>>>>>>>> We need to balance on several criteria: >>>>>>>>>>>>> ??1) Compatibility: keep returning error as close as >>>>>>>>>>>>> possible to the current spec >>>>>>>>>>>> >>>>>>>>>>>> If you are adding a new error condition I don't understand >>>>>>>>>>>> what you mean by "close to the current spec" ?? >>>>>>>>>>> >>>>>>>>>>> If the JVMTI_ERROR_INVALID_OBJECT is returned than the JDWP >>>>>>>>>>> agent does not need any new error handling. >>>>>>>>>>> The same can be true in the JDI if the JDWP returns the same >>>>>>>>>>> error as it returned before. >>>>>>>>>>> In this case we do not add new error code but extend the >>>>>>>>>>> existing to cover new error condition. >>>>>>>>>>> >>>>>>>>>>> But, in fact (especially, after rethinking), I do not like >>>>>>>>>>> the JVMTI_ERROR_INVALID_OBJECT >>>>>>>>>>> error code as it normally means something different. >>>>>>>>>>> So, let's avoid using it and skip this criteria. >>>>>>>>>>> Then we need new error code to cover new error condition. >>>>>>>>>>> >>>>>>>>>>>>> ??2) Best error naming match between JVM TI and JDI/JDWP >>>>>>>>>>>>> ??3) Best practice in errors naming >>>>>>>>>>>> >>>>>>>>>>>> If the argument is not a ThreadDeath instance then it is an >>>>>>>>>>>> illegal argument - perfect fit semantically all the specs >>>>>>>>>>>> involved have an "illegal argument" error form. >>>>>>>>>>> >>>>>>>>>>> I agree with this. >>>>>>>>>>> It is why I like this suggestion. :) >>>>>>>>>>> The JDWP equivalent is: ILLEGAL_ARGUMENT. >>>>>>>>>>> The JDI equivalent is: IllegalArgumentException >>>>>>>>>>> >>>>>>>>>>> I'll prepare and send the update. >>>>>>>>>>> >>>>>>>>>>> Thanks! >>>>>>>>>>> Serguei >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> David >>>>>>>>>>>> >>>>>>>>>>>>> I think the #1 is most important but will look at it once >>>>>>>>>>>>> more. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Serguei >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> David >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/src/ >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Updated JVM TI StopThread spec: >>>>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Summary: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ?? The JVM TI StopThread method mirrored the >>>>>>>>>>>>>>> functionality of the >>>>>>>>>>>>>>> ?? java.lang.Thread::stop(Throwable t) method, in that >>>>>>>>>>>>>>> it allows any exception >>>>>>>>>>>>>>> ?? type to be installed as an asynchronous exception in >>>>>>>>>>>>>>> the target thread. >>>>>>>>>>>>>>> ?? However, the java.lang.Thread::stop(Throwable t) >>>>>>>>>>>>>>> method was inherently unsafe >>>>>>>>>>>>>>> ?? and in Java 8 (under JDK-7059085) it was "retired" so >>>>>>>>>>>>>>> that it always threw >>>>>>>>>>>>>>> ?? UnsupportedOperationException. >>>>>>>>>>>>>>> ?? The updated JVM TI StopThread spec disallows an >>>>>>>>>>>>>>> arbitrary Throwable from being passed, >>>>>>>>>>>>>>> ?? and instead restricts the argument to being an >>>>>>>>>>>>>>> instance of ThreadDeath, thus >>>>>>>>>>>>>>> ?? mirroring the (deprecated but still functional) >>>>>>>>>>>>>>> java.lang.Thread::stop() method. >>>>>>>>>>>>>>> ?? The error JVMTI_ERROR_INVALID_OBJECT is returned if >>>>>>>>>>>>>>> the exception argument >>>>>>>>>>>>>>> ?? is not an instance of ThreadDeath. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ?? Also, I will file similar RFE and CSR on the JDI and >>>>>>>>>>>>>>> JDWP spec. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Testing: >>>>>>>>>>>>>>> ?? Built docs and checked the doc has been generated as >>>>>>>>>>>>>>> expected. >>>>>>>>>>>>>>> ?? Will run the nsk.jvmti tests locally. >>>>>>>>>>>>>>> ?? 
Will submit hs-tiers1-3 to make sure there are no >>>>>>>>>>>>>>> regressions in the JVM TI and JDI tests. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Serguei >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >> From fairoz.matte at oracle.com Mon Jun 1 15:52:08 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Mon, 1 Jun 2020 08:52:08 -0700 (PDT) Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> Message-ID: Hi Erik, Thanks for the review, below is the updated webrev. http://cr.openjdk.java.net/~fmatte/8243451/webrev.02/ Thanks, Fairoz > -----Original Message----- > From: Erik Gahlin > Sent: Monday, June 1, 2020 4:26 PM > To: Fairoz Matte > Cc: serviceability-dev at openjdk.java.net > Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect > and corresponsing logic seems to be broken > > Hi Fairoz, > > What I think you need to do is something like this: > > if (className.equals("java.lang.Thread")) { > return !isJfrInitialized(); > } > > ... > > private static boolean isJfrInitialized() { > try { > Class clazz = Class.forName("jdk.jfr.FlightRecorder"); > Method method = clazz.getDeclaredMethod("isInitialized", > new Class[0]); > return (boolean) method.invoke(null, new Object[0]); > } catch (Exception e) { > return false; > } >
} > > Erik > > On 2020-06-01 12:30, Fairoz Matte wrote: > > Hi Erik, > > > > Thanks for your quick response, > > Below is the updated webrev to handle if jfr module is not present > > http://cr.openjdk.java.net/~fmatte/8243451/webrev.01/ > > > > Thanks, > > Fairoz > > > >> -----Original Message----- > >> From: Erik Gahlin > >> Sent: Monday, June 1, 2020 2:31 PM > >> To: Fairoz Matte > >> Cc: serviceability-dev at openjdk.java.net > >> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is > >> incorrect and corresponsing logic seems to be broken > >> > >> Hi Fairoz, > >> > >> If the test needs to run with builds where the JFR module is not > >> present(?), you need to do the check using reflection. > >> > >> If not, looks good. > >> > >> Erik > >> > >>> On 1 Jun 2020, at 10:27, Fairoz Matte wrote: > >>> > >>> Hi, > >>> > >>> Please review this small test infra change to identify at runtime > >>> the JFR is > >> active or not. > >>> JBS - https://bugs.openjdk.java.net/browse/JDK-8243451 > >>> Webrev - http://cr.openjdk.java.net/~fmatte/8243451/webrev.00/ > >>> > >>> Thanks, > >>> Fairoz From jianglizhou at google.com Mon Jun 1 16:07:54 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Mon, 1 Jun 2020 09:07:54 -0700 Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime In-Reply-To: <91767eae-4486-8ceb-5452-c6afa4178af1@oracle.com> References: <4de9bb9c-e83d-f33b-fc50-3431f69e46aa@oracle.com> <8fe912f1-8407-df1d-0c1f-cf37f08363db@oracle.com> <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <91767eae-4486-8ceb-5452-c6afa4178af1@oracle.com> Message-ID: Hi David, Thanks a lot for the guidance on CSR. I'll work on it. 
Best regards, Jiangli On Sun, May 31, 2020 at 11:17 PM David Holmes wrote: > > Hi Jiangli, > > On 29/05/2020 9:02 am, Jiangli Zhou wrote: > > (Looping in serviceability-dev at openjdk.java.net ...) > > > > Hi David and Ioi, > > > > On Wed, May 27, 2020 at 11:15 PM David Holmes wrote: > >> > >> Hi Jiangli, > >> > >> On 28/05/2020 11:35 am, Ioi Lam wrote: > >>> > >>> > >>> On 5/27/20 6:17 PM, Jiangli Zhou wrote: > >>>> On Wed, May 27, 2020 at 1:56 PM Ioi Lam wrote: > >>>>> On 5/26/20 6:21 PM, Jiangli Zhou wrote: > >>>>> > >>>>>> Focusing on the link state for archived classes in this thread, I > >>>>>> updated the webrev to only set archived boot classes to 'linked' state > >>>>>> at restore time. More investigations can be done for archived classes > >>>>>> for other builtin loaders. > >>>>>> > >>>>>> https://bugs.openjdk.java.net/browse/JDK-8232222 > >>>>>> http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/ > >>>>>> > >>>>>> Please let me know if there is any additional concerns to the change. > >>>>>> > >>>>>> Best regards, > >>>>>> Jiangli > >>>>>> > >>>>> Hi Jiangli, > >>>>> > >>>>> I think the change is fine. I am wondering if this > >>>>> > >>>>> 2530 if (!BytecodeVerificationLocal && > >>>>> 2531 loader_data->is_the_null_class_loader_data()) { > >>>>> 2532 _init_state = linked; > >>>>> 2533 } > >>>>> > >>>>> > >>>>> can be changed to > >>>>> > >>>>> if (!BytecodeVerificationLocal && > >>>>> loader_data->is_the_null_class_loader_data() && > >>>>> !JvmtiExport::should_post_class_prepare()) > >>>>> > >>>>> That way, there's no need to change systemDictionary.cpp. > >>>>> > >>>>> > >>>> I was going to take the suggestion, but realized that it would add > >>>> unnecessary complications for archived boot classes with class > >>>> pre-initialization support. Some agents may set > >>>> JvmtiExport::should_post_class_prepare(). 
It's worthwhile to support > >>>> class pre-init uniformly for archived boot classes with > >>>> JvmtiExport::should_post_class_prepare() enabled or disabled. > >>> > >>> This would introduce behavioral changes when JVMTI is enabled: > >>> > >>> + The order of JvmtiExport::post_class_prepare is different than before > >>> + JvmtiExport::post_class_prepare may be called for a class that was not > >>> called before (if the class is never linked during run time) > >>> + JvmtiExport::post_class_prepare was called inside the init_lock, now > >>> it's called outside of the init_lock > >> > >> I have to say I share Ioi's concerns here. This change will impact JVM > >> TI agents in a way we can't be sure of. From a specification perspective > >> I think we are fine as linking can be lazy or eager, so there's no > >> implied order either. But this would be a behavioural change that will > >> be observable by agents. (I'm less concerned about the init_lock > >> situation as it seems potentially buggy to me to call out to an agent > >> with the init_lock held in the first place! I find it hard to imagine an > >> agent only working correctly if the init_lock is held.) > >> > > > > Totally agree that we need to be very careful here (that's also part > > of the reason why I separated this into an individual RFE for the > > dedicated discussion). David, thanks for the analysis from the spec > > perspective! Agreed with the init_lock comment also. In the future, I > > think we can even get rid of the needs for init_lock completely for > > some of the pre-initialized classes. > > > > This change has gone through extensive testing since the later part of > > last year and has been in use (with the default CDS) with agents that > > do post_class_prepare. Hopefully that would ease some of the concerns. > > That is good to know, but that is just one sample of a set of agents. 
> > >> This would need a CSR request and involvement of the serviceabilty folk, > >> to work through any potential issues. > >> > > > > I've looped in serviceability-dev at openjdk.java.net for this > > discussion. Chris or Serguei could you please take a look of the > > change, http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/, > > specifically the JvmtiExport::post_class_prepare change in > > systemDictionary.cpp. > > > > Filing a CSR request sounds good to me. The CSR looks after source, > > binary, and behavioral compatibility. From a behavior point of view, > > the change most likely does not cause any visible effects to a JVMTI > > agent (based on what's observed in testing and usages). What should be > > included in the CSR? > > The CSR request should explain the behavioural change that will be > observable by agents, and all of the potential compatibility issues that > might arise from that - pointing out of course that as the spec (JVMS > 5.4**) allows for eager or lazy linking, agents shouldn't be relying on > the exact timing or order of events. > > ** I note this section has some additional constraints regarding > dynamically computed constants that might also come into play with this > pre-linking for CDS classes. > > Cheers, > David > ----- > > >> Ioi's suggestion avoids this problem, but, as you note, at the expense > >> of disabling this optimisation if an agent is attached and wants class > >> prepare events. > >> > > > > Right, if we handle that case conditionally, we would alway need to > > store the cached static field values separately since the dump time > > cannot foresee if the runtime can set boot classes in 'linked' state > > (and 'fully_initialized' state with the planned changes) at restore > > time. As a result, we need to handle all pre-initialized static fields > > like what we are doing today, which is storing them in the archived > > class_info_records then installing them to the related fields at > > runtime. 
That causes both unwanted memory and CPU overhead at runtime. > > > > I also updated the webrev.02 in place with typo fixes. Thanks! > > > > Best regards, > > Jiangli > > > >> Thanks, > >> David > >> > >>> Thanks > >>> - Ioi > >>> > >>>> > >>>>> BTW, I was wondering where the performance came from, so I wrote an > >>>>> investigative patch: > >>>>> > >>>>> diff -r 0702191777c9 src/hotspot/share/oops/instanceKlass.cpp > >>>>> --- a/src/hotspot/share/oops/instanceKlass.cpp Thu May 21 15:56:27 > >>>>> 2020 -0700 > >>>>> +++ b/src/hotspot/share/oops/instanceKlass.cpp Wed May 27 10:48:57 > >>>>> 2020 -0700 > >>>>> @@ -866,6 +866,13 @@ > >>>>> return true; > >>>>> } > >>>>> > >>>>> + if (UseSharedSpaces && !BytecodeVerificationLocal && > >>>>> is_shared_boot_class()) { > >>>>> + Handle h_init_lock(THREAD, init_lock()); > >>>>> + ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL); > >>>>> + set_init_state(linked); > >>>>> + return true; > >>>>> + } > >>>>> + > >>>>> // trace only the link time for this klass that includes > >>>>> // the verification time > >>>>> PerfClassTraceTime vmtimer(ClassLoader::perf_class_link_time(), > >>>>> > >>>>> > >>>>> Benchmarking results (smaller numbers are better): > >>>>> > >>>>> (baseline vs your patch) > >>>>> > >>>>> baseline jiangli baseline > >>>>> jiangli > >>>>> 1: 58514375 57755638 (-758737) ----- 40.266 > >>>>> 40.135 ( > >>>>> -0.131) - > >>>>> 2: 58506426 57754623 (-751803) ----- 40.367 > >>>>> 39.417 ( > >>>>> -0.950) ----- > >>>>> 3: 58498554 57759735 (-738819) ----- 40.513 > >>>>> 39.970 ( > >>>>> -0.543) --- > >>>>> 4: 58491265 57751296 (-739969) ----- 40.439 > >>>>> 40.268 ( > >>>>> -0.171) - > >>>>> 5: 58500588 57750975 (-749613) ----- 40.569 > >>>>> 40.080 ( > >>>>> -0.489) -- > >>>>> 6: 58497015 57744418 (-752597) ----- 41.097 > >>>>> 40.147 ( > >>>>> -0.950) ----- > >>>>> 7: 58494335 57749909 (-744426) ----- 39.983 40.214 > >>>>> ( 0.231) + > >>>>> 8: 58500401 57750305 (-750096) ----- 40.235 40.417 > >>>>> ( 
0.182) + > >>>>> 9: 58490728 57767463 (-723265) ----- 40.354 > >>>>> 39.928 ( > >>>>> -0.426) -- > >>>>> 10: 58497858 57746557 (-751301) ----- 40.756 > >>>>> 39.706 ( > >>>>> -1.050) ----- > >>>>> ============================================================ > >>>>> 58499154 57753091 (-746062) ----- 40.457 > >>>>> 40.027 ( > >>>>> -0.430) -- > >>>>> instr delta = -746062 -1.2753% > >>>>> time delta = -0.430 ms -1.0619% > >>>>> > >>>>> > >>>>> (baseline vs my patch) > >>>>> > >>>>> baseline ioi baseline ioi > >>>>> 1: 58503574 57821124 (-682450) ----- 40.554 39.783 ( > >>>>> -0.771) ----- > >>>>> 2: 58499325 57819459 (-679866) ----- 40.092 40.325 > >>>>> ( 0.233) ++ > >>>>> 3: 58492362 57811978 (-680384) ----- 40.546 > >>>>> 39.826 ( > >>>>> -0.720) ----- > >>>>> 4: 58488655 57828878 (-659777) ----- 40.270 40.550 > >>>>> ( 0.280) ++ > >>>>> 5: 58501567 57830179 (-671388) ----- 40.382 > >>>>> 40.145 ( > >>>>> -0.237) -- > >>>>> 6: 58496552 57808774 (-687778) ----- 40.702 > >>>>> 40.527 ( > >>>>> -0.175) - > >>>>> 7: 58482701 57808925 (-673776) ----- 40.268 > >>>>> 39.849 ( > >>>>> -0.419) --- > >>>>> 8: 58493831 57807810 (-686021) ----- 40.396 > >>>>> 39.940 ( > >>>>> -0.456) --- > >>>>> 9: 58489388 57811354 (-678034) ----- 40.575 > >>>>> 40.078 ( > >>>>> -0.497) --- > >>>>> 10: 58482512 57795489 (-687023) ----- 40.084 40.247 > >>>>> ( 0.163) + > >>>>> ============================================================ > >>>>> 58493046 57814396 (-678650) ----- 40.386 > >>>>> 40.126 ( > >>>>> -0.260) -- > >>>>> instr delta = -678650 -1.1602% > >>>>> time delta = -0.260 ms -0.6445% > >>>>> > >>>>> > >>>>> (your patch vs my patch) > >>>>> > >>>>> jiangli ioi jiangli ioi > >>>>> 1: 57716711 57782622 ( 65911) ++++ 41.042 40.302 ( > >>>>> -0.740) ----- > >>>>> 2: 57709666 57780196 ( 70530) ++++ 40.334 40.965 ( > >>>>> 0.631) ++++ > >>>>> 3: 57716074 57803315 ( 87241) +++++ 40.239 39.823 ( > >>>>> -0.416) --- > >>>>> 4: 57725152 57782719 ( 57567) +++ 40.430 39.805 ( > >>>>> -0.625) 
---- > >>>>> 5: 57719799 57787187 ( 67388) ++++ 40.138 40.003 ( > >>>>> -0.135) - > >>>>> 6: 57721922 57769193 ( 47271) +++ 40.324 40.207 ( > >>>>> -0.117) - > >>>>> 7: 57716438 57785212 ( 68774) ++++ 39.978 40.149 ( > >>>>> 0.171) + > >>>>> 8: 57713834 57778797 ( 64963) ++++ 40.359 40.210 ( > >>>>> -0.149) - > >>>>> 9: 57711272 57786376 ( 75104) ++++ 40.575 40.724 ( > >>>>> 0.149) + > >>>>> 10: 57711660 57780548 ( 68888) ++++ 40.291 40.091 ( > >>>>> -0.200) - > >>>>> ============================================================ > >>>>> 57716252 57783615 ( 67363) ++++ 40.370 40.226 ( > >>>>> -0.144) - > >>>>> instr delta = 67363 0.1167% > >>>>> time delta = -0.144 ms -0.3560% > >>>>> > >>>>> > >>>>> These numbers show that the majority of the time spent (678650 > >>>>> instructions) inside InstanceKlass::link_class_impl is spent from the > >>>>> PerfClassTraceTime. Walking of the class hierarchy and taking the > >>>>> h_init_lock only takes about 67363 instructions). > >>>>> > >>>>> Due to this finding, I filed two more RFEs: > >>>>> > >>>>> https://bugs.openjdk.java.net/browse/JDK-8246019 > >>>>> PerfClassTraceTime slows down VM start-up > >>>>> > >>>> It's related to JDK-8246020, and I've commented on the bug (see > >>>> JDK-8246020 comments). UsePerfData for perf data collection is common > >>>> in cloud usages. It's better to keep UsePerfData enabled by default. > >>>> > >>>>> https://bugs.openjdk.java.net/browse/JDK-8246015 > >>>>> Method::link_method is called twice for CDS methods > >>>> > >>>> That was addressed as part of the initial change for JDK-8232222: > >>>> http://cr.openjdk.java.net/~jiangli/8232222/weberv.02/src/hotspot/share/oops/instanceKlass.cpp.frames.html > >>>> > >>>> > >>>> It's cleaner to handle it separately, so I removed it from the latest > >>>> version. I've assigned JDK-8246015 to myself and will address it > >>>> separately. Thanks for recording the separate bug. > >>>> > >>>> Thanks! 
> >>>> Jiangli > >>>> > >>>>> > >>>>> Thanks > >>>>> - Ioi > >>> From chris.plummer at oracle.com Mon Jun 1 18:31:57 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Mon, 1 Jun 2020 11:31:57 -0700 Subject: RFR(XS): 8221306: JVMTI spec for FramePop(), MethodExit(), and MethodEnter() could use some cleanup In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Mon Jun 1 18:56:50 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 1 Jun 2020 11:56:50 -0700 Subject: RFR(XS): 8221306: JVMTI spec for FramePop(), MethodExit(), and MethodEnter() could use some cleanup In-Reply-To: References: Message-ID: <4638f9fc-17d7-1ead-e838-a3bf55612e5a@oracle.com> Thanks, Chris! Serguei On 6/1/20 11:31, Chris Plummer wrote: > Hi Serguei, > > Looks good. > > thanks, > > Chris > > On 5/31/20 1:11 AM, serguei.spitsyn at oracle.com wrote: >> Please, review a fix for small spec bug: >> https://bugs.openjdk.java.net/browse/JDK-8221306 >> >> Webrev: >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmt-funcs-cleanup.1/src/ >> >> Updated JVM TI spec for the FramePop, MethodEntry and MethodExit events: >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmt-funcs-cleanup.1/docs/specs/jvmti.html#FramePop >> >> Summary: >> It is a minor spec cleanup for JVM TI events >> FramePop/MethodEntry/MethodExit: >> - added small clarification that GetFrameLocation needs to be >> asked for frame at depth 0 >> - removed partly unneeded and partly incorrect statements about >> MethodExit event argument >> >> Testing: >> Manually verified the generated jvmti.html. >> >> I think, there is no need to file a CSR for this spec update as it is >> just minor cleanup.
>> >> Thanks, >> Serguei > From serguei.spitsyn at oracle.com Mon Jun 1 19:30:00 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 1 Jun 2020 12:30:00 -0700 Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> Message-ID: <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com> An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Mon Jun 1 20:36:18 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 1 Jun 2020 13:36:18 -0700 Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently In-Reply-To: References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com> <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com> Message-ID: <6225a47a-c777-ac27-07b6-19e21318cae8@oracle.com> Hi Daniil, LGTM. Thanks, Serguei On 5/29/20 16:28, Daniil Titov wrote: > Hi Alex and Serguei, > > Please review a new version of the change [1] that makes sure that the test counts > only the threads it creates and ignores Internal threads VM might create or destroy. > > Testing: Running this test in Mach5 with Graal on several hundred times , > tier1-tier3 tests are in progress. > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.02/ > [2] https://bugs.openjdk.java.net/browse/JDK-8131745 > > Thank you, > Daniil > > ?On 5/22/20, 10:26 AM, "Alex Menkov" wrote: > > Hi Daniil, > > I'm not sure all this retry logic is a good way. > As mentioned in jira the most important part of the testing is ensuring > that you find all the created threads when they are alive, and you don't > find them when they are dead. The actual thread count checking is not > that important. > I agree with this and I'd just simplify the test by removing checks for > thread count. 
VM may create and destroy internal threads when it needs it. > > --alex > > On 05/18/2020 10:31, Daniil Titov wrote: > > Please review the change [1] that fixes an intermittent failure of the test. > > > > This test creates and destroys a given number of daemon/user threads and validates the count of those started/stopped threads against values returned from ThreadMXBean thread counts. The problem here is that if some internal thread is started ( e.g. " HotSpotGraalManagement Bean Registration"), or destroyed (e.g. "JVMCI CompilerThread ") the test hangs waiting for expected number of live threads. > > > > The fix limits the time the test is waiting for desired number of live threads and in case if this limit is exceeded the test repeats itself. > > > > Testing. Test with Graal on and Mach5 tier1-tier7 test passed. > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.01 > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745 > > > > Thank you, > > Daniil > > > > > > > > From dean.long at oracle.com Mon Jun 1 22:10:53 2020 From: dean.long at oracle.com (Dean Long) Date: Mon, 1 Jun 2020 15:10:53 -0700 Subject: RFR(XS): 8245126 Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods In-Reply-To: <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> References: <62e34586-eaac-3200-8f5a-ee12ad654afa@oracle.com> <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> Message-ID: <42d73a82-b70e-1a0d-312d-303409840392@oracle.com> On 5/31/20 11:16 PM, serguei.spitsyn at oracle.com wrote: > Hi Dean, > > To check the is_old as you suggest the target method has to be passed > to the cache_jvmti_state() as argument. Is it what you are suggesting? I believe you can use _task->method()->is_old(), as the ciEnv already has the task. > Just want to make sure I understand you correctly.
> > The cache_jvmti_state() and cache_dtrace_flags() are called in the > CompileBroker::init_compiler_runtime() for a ciEnv with the NULL > CompileTask > which looks unnecessary (or I don't understand it): > > bool CompileBroker::init_compiler_runtime() { > CompilerThread* thread = CompilerThread::current(); > . . . > ciEnv ci_env((CompileTask*)NULL); > // Cache Jvmti state > ci_env.cache_jvmti_state(); > // Cache DTrace flags > ci_env.cache_dtrace_flags(); > These calls look unnecessary to me, as the ci_env will cache these again before compiling a method. I suggest removing these calls. We should make sure the cache fields are initialized to sane values in the ciEnv ctor. > The JVMCI has a separate implementation for ciEnv which is jvmciEnv and > its own set of cache_jvmti_state() and jvmti_state_changed() functions. > Both are not called in the JVMCI case. > So, these checks look as broken in JVMCI now. > JVMCI is in better shape, because it doesn't transition out of _thread_in_vm state, but yes it needs similar changes. > Not sure, I have enough compiler knowledge to fix this at this stage > of release. > Would it be better to file a separate hotspot/compiler RFE targeted to 16? > It can be assigned to me if it helps. > This is a P3 so I believe we have time to fix it for 15. Please go ahead and let's see if we can get it in. I can help with the JVMCI changes if they are not straightforward. dl > Thanks, > Serguei > > > On 5/28/20 10:54, Dean Long wrote: >> Sure, you could just have cache_jvmti_state() return a boolean to >> bail out immediately for is_old. >> >> dl >> >> On 5/28/20 7:23 AM, serguei.spitsyn at oracle.com wrote: >>> Hi Dean, >>> >>> Thank you for looking at this! >>> Okay. Let me check what can be done in this direction. >>> There is no point to cache is_old. The compilation has to bail out >>> if it is discovered to be true.
>>> >>> Thanks, >>> Serguei >>> >>> >>> On 5/28/20 00:59, Dean Long wrote: >>>> This seems OK as long as the memory barriers in the thread state >>>> transitions prevent the C++ compiler from doing something like >>>> reading is_old before reading redefinition_count. I would feel >>>> better if both JVMCI and C1/C2 cached is_old and redefinition_count >>>> at the same time (making sure to be in the _thread_in_vm state), >>>> then bail out based on the cached value of is_old. >>>> >>>> dl >>>> >>>> On 5/26/20 12:04 AM, serguei.spitsyn at oracle.com wrote: >>>>> On 5/25/20 23:39, serguei.spitsyn at oracle.com wrote: >>>>>> Please, review a fix for: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8245126 >>>>>> >>>>>> Webrev: >>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.1/ >>>>>> >>>>>> >>>>>> Summary: >>>>>> The Kitchensink stress test with the Instrumentation module >>>>>> enabled does >>>>>> a lot of class retransformations in parallel with all other >>>>>> stressing. >>>>>> It provokes the assert at the compiled code installation time: >>>>>> assert(!method->is_old()) failed: Should not be installing >>>>>> old methods >>>>>> >>>>>> The problem is that the >>>>>> CompileBroker::invoke_compiler_on_method in C2 version >>>>>> (non-JVMCI tiered compilation) is missing the check that exists >>>>>> in the JVMCI >>>>>> part of implementation: >>>>>> 2148 // Skip redefined methods >>>>>> 2149 if (target_handle->is_old()) { >>>>>> 2150 failure_reason = "redefined method"; >>>>>> 2151 retry_message = "not retryable"; >>>>>> 2152 compilable = ciEnv::MethodCompilable_never; >>>>>> 2153 } else { >>>>>> . . . >>>>>> 2168 } >>>>>> >>>>>> The fix is to add this check. >>>>> >>>>> Sorry, forgot to explain one thing. >>>>> Compiler code has a special mechanism to ensure the JVMTI class >>>>> redefinition did >>>>> not happen while the method was compiled, so all the assumptions >>>>> remain correct.
>>>>> 2190 // Cache Jvmti state >>>>> 2191 ci_env.cache_jvmti_state(); >>>>> Part of this is a check that the value of >>>>> JvmtiExport::redefinition_count() is >>>>> cached in ciEnv variable: _jvmti_redefinition_count. >>>>> The JvmtiExport::redefinition_count() value change means a class >>>>> redefinition >>>>> happened which also implies some of methods may become old. >>>>> However, the method being compiled can be already old at the point >>>>> where the >>>>> redefinition counter is cached, so the redefinition counter check >>>>> does not help much. >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>>> Testing: >>>>>> Ran Kitchensink test with the Instrumentation module enabled in mach5 >>>>>> ?multiple times for 100 times. Without the fix the test normally fails >>>>>> a couple of times in 200 runs. It does not fail with the fix anymore. >>>>>> Will also submit hs tiers1-5. >>>>>> >>>>>> Thanks, >>>>>> Serguei >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ioi.lam at oracle.com Mon Jun 1 22:56:34 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Mon, 1 Jun 2020 15:56:34 -0700 Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime In-Reply-To: <91767eae-4486-8ceb-5452-c6afa4178af1@oracle.com> References: <4de9bb9c-e83d-f33b-fc50-3431f69e46aa@oracle.com> <8fe912f1-8407-df1d-0c1f-cf37f08363db@oracle.com> <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <91767eae-4486-8ceb-5452-c6afa4178af1@oracle.com> Message-ID: On 5/31/20 11:14 PM, David Holmes wrote: > Hi Jiangli, > > On 29/05/2020 9:02 am, Jiangli Zhou wrote: >> (Looping in serviceability-dev at openjdk.java.net ...) 
>> >> Hi David and Ioi, >> >> On Wed, May 27, 2020 at 11:15 PM David Holmes >> wrote: >>> >>> Hi Jiangli, >>> >>> On 28/05/2020 11:35 am, Ioi Lam wrote: >>>> >>>> >>>> On 5/27/20 6:17 PM, Jiangli Zhou wrote: >>>>> On Wed, May 27, 2020 at 1:56 PM Ioi Lam wrote: >>>>>> On 5/26/20 6:21 PM, Jiangli Zhou wrote: >>>>>> >>>>>>> Focusing on the link state for archived classes in this thread, I >>>>>>> updated the webrev to only set archived boot classes to 'linked' >>>>>>> state >>>>>>> at restore time. More investigations can be done for archived >>>>>>> classes >>>>>>> for other builtin loaders. >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8232222 >>>>>>> http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/ >>>>>>> >>>>>>> Please let me know if there is any additional concerns to the >>>>>>> change. >>>>>>> >>>>>>> Best regards, >>>>>>> Jiangli >>>>>>> >>>>>> Hi Jiangli, >>>>>> >>>>>> I think the change is fine. I am wondering if this >>>>>> >>>>>> 2530?? if (!BytecodeVerificationLocal && >>>>>> 2531 loader_data->is_the_null_class_loader_data()) { >>>>>> 2532???? _init_state = linked; >>>>>> 2533?? } >>>>>> >>>>>> >>>>>> can be changed to >>>>>> >>>>>> ????????? if (!BytecodeVerificationLocal && >>>>>> loader_data->is_the_null_class_loader_data() && >>>>>> ????????????? !JvmtiExport::should_post_class_prepare()) >>>>>> >>>>>> That way, there's no need to change systemDictionary.cpp. >>>>>> >>>>>> >>>>> I was going to take the suggestion, but realized that it would add >>>>> unnecessary complications for archived boot classes with class >>>>> pre-initialization support. Some agents may set >>>>> JvmtiExport::should_post_class_prepare(). It's worthwhile to support >>>>> class pre-init uniformly for archived boot classes with >>>>> JvmtiExport::should_post_class_prepare() enabled or disabled. 
>>>> >>>> This would introduce behavioral changes when JVMTI is enabled: >>>> >>>> + The order of JvmtiExport::post_class_prepare is different than >>>> before >>>> + JvmtiExport::post_class_prepare may be called for a class that >>>> was not >>>> called before (if the class is never linked during run time) >>>> + JvmtiExport::post_class_prepare was called inside the init_lock, now >>>> it's called outside of the init_lock >>> >>> I have to say I share Ioi's concerns here. This change will impact JVM >>> TI agents in a way we can't be sure of. From a specification >>> perspective >>> I think we are fine as linking can be lazy or eager, so there's no >>> implied order either. But this would be a behavioural change that will >>> be observable by agents. (I'm less concerned about the init_lock >>> situation as it seems potentially buggy to me to call out to an agent >>> with the init_lock held in the first place! I find it hard to >>> imagine an >>> agent only working correctly if the init_lock is held.) >>> >> >> Totally agree that we need to be very careful here (that's also part >> of the reason why I separated this into an individual RFE for the >> dedicated discussion). David, thanks for the analysis from the spec >> perspective! Agreed with the init_lock comment also. In the future, I >> think we can even get rid of the needs for init_lock completely for >> some of the pre-initialized classes. >> >> This change has gone through extensive testing since the later part of >> last year and has been in use (with the default CDS) with agents that >> do post_class_prepare. Hopefully that would ease some of the concerns. > > That is good to know, but that is just one sample of a set of agents. > >>> This would need a CSR request and involvement of the serviceabilty >>> folk, >>> to work through any potential issues. >>> >> >> I've looped in serviceability-dev at openjdk.java.net for this >> discussion. 
Chris or Serguei could you please take a look of the >> change, http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/, >> specifically the JvmtiExport::post_class_prepare change in >> systemDictionary.cpp. >> >> Filing a CSR request sounds good to me. The CSR looks after source, >> binary, and behavioral compatibility. From a behavior point of view, >> the change most likely does not cause any visible effects to a JVMTI >> agent (based on what's observed in testing and usages). What should be >> included in the CSR? > > The CSR request should explain the behavioural change that will be > observable by agents, and all of the potential compatibility issues > that might arise from that - pointing out of course that as the spec > (JVMS 5.4**) allows for eager or lazy linking, agents shouldn't be > relying on the exact timing or order of events. > > ** I note this section has some additional constraints regarding > dynamically computed constants that might also come into play with > this pre-linking for CDS classes. > I think the CSR should also include the benefit of doing this. It's not a lot of code change, but now we have to maintain two different code paths for post_class_prepare to be called. JVMTI agents will typically introduce quite a bit of overhead in start-up, so a reduction in the range of 0.2~0.4ms seems a drop to the bucket. I'd rather keep the VM simple unless we have a strong reason to make it more complicated. Thanks - Ioi > Cheers, > David > ----- > >>> Ioi's suggestion avoids this problem, but, as you note, at the expense >>> of disabling this optimisation if an agent is attached and wants class >>> prepare events. >>> >> >> Right, if we handle that case conditionally, we would alway need to >> store the cached static field values separately since the dump time >> cannot foresee if the runtime can set boot classes in 'linked' state >> (and 'fully_initialized' state with the planned changes) at restore >> time. 
>> As a result, we need to handle all pre-initialized static fields
>> like what we are doing today, which is storing them in the archived
>> class_info_records then installing them to the related fields at
>> runtime. That causes both unwanted memory and CPU overhead at runtime.
>>
>> I also updated the webrev.02 in place with typo fixes. Thanks!
>>
>> Best regards,
>> Jiangli
>>
>>> Thanks,
>>> David
>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>>
>>>>>> BTW, I was wondering where the performance came from, so I wrote an
>>>>>> investigative patch:
>>>>>>
>>>>>> diff -r 0702191777c9 src/hotspot/share/oops/instanceKlass.cpp
>>>>>> --- a/src/hotspot/share/oops/instanceKlass.cpp    Thu May 21 15:56:27 2020 -0700
>>>>>> +++ b/src/hotspot/share/oops/instanceKlass.cpp    Wed May 27 10:48:57 2020 -0700
>>>>>> @@ -866,6 +866,13 @@
>>>>>>      return true;
>>>>>>    }
>>>>>>
>>>>>> +  if (UseSharedSpaces && !BytecodeVerificationLocal && is_shared_boot_class()) {
>>>>>> +    Handle h_init_lock(THREAD, init_lock());
>>>>>> +    ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL);
>>>>>> +    set_init_state(linked);
>>>>>> +    return true;
>>>>>> +  }
>>>>>> +
>>>>>>    // trace only the link time for this klass that includes
>>>>>>    // the verification time
>>>>>>    PerfClassTraceTime vmtimer(ClassLoader::perf_class_link_time(),
>>>>>>
>>>>>>
>>>>>> Benchmarking results (smaller numbers are better):
>>>>>>
>>>>>> (baseline vs your patch)
>>>>>>
>>>>>>              baseline     jiangli                     baseline  jiangli
>>>>>>       1:     58514375    57755638 (-758737) -----       40.266   40.135 (-0.131) -
>>>>>>       2:     58506426    57754623 (-751803) -----       40.367   39.417 (-0.950) -----
>>>>>>       3:     58498554    57759735 (-738819) -----       40.513   39.970 (-0.543) ---
>>>>>>       4:     58491265    57751296 (-739969) -----       40.439   40.268 (-0.171) -
>>>>>>       5:     58500588    57750975 (-749613) -----       40.569   40.080 (-0.489) --
>>>>>>       6:     58497015    57744418 (-752597) -----       41.097   40.147 (-0.950) -----
>>>>>>       7:     58494335    57749909 (-744426) -----       39.983   40.214 ( 0.231) +
>>>>>>       8:     58500401    57750305 (-750096) -----       40.235   40.417 ( 0.182) +
>>>>>>       9:     58490728    57767463 (-723265) -----       40.354   39.928 (-0.426) --
>>>>>>      10:     58497858    57746557 (-751301) -----       40.756   39.706 (-1.050) -----
>>>>>> ============================================================
>>>>>>              58499154    57753091 (-746062) -----       40.457   40.027 (-0.430) --
>>>>>> instr delta =      -746062    -1.2753%
>>>>>> time  delta =       -0.430 ms -1.0619%
>>>>>>
>>>>>>
>>>>>> (baseline vs my patch)
>>>>>>
>>>>>>              baseline         ioi                     baseline      ioi
>>>>>>       1:     58503574    57821124 (-682450) -----       40.554   39.783 (-0.771) -----
>>>>>>       2:     58499325    57819459 (-679866) -----       40.092   40.325 ( 0.233) ++
>>>>>>       3:     58492362    57811978 (-680384) -----       40.546   39.826 (-0.720) -----
>>>>>>       4:     58488655    57828878 (-659777) -----       40.270   40.550 ( 0.280) ++
>>>>>>       5:     58501567    57830179 (-671388) -----       40.382   40.145 (-0.237) --
>>>>>>       6:     58496552    57808774 (-687778) -----       40.702   40.527 (-0.175) -
>>>>>>       7:     58482701    57808925 (-673776) -----       40.268   39.849 (-0.419) ---
>>>>>>       8:     58493831    57807810 (-686021) -----       40.396   39.940 (-0.456) ---
>>>>>>       9:     58489388    57811354 (-678034) -----       40.575   40.078 (-0.497) ---
>>>>>>      10:     58482512    57795489 (-687023) -----       40.084   40.247 ( 0.163) +
>>>>>> ============================================================
>>>>>>              58493046    57814396 (-678650) -----       40.386   40.126 (-0.260) --
>>>>>> instr delta =      -678650    -1.1602%
>>>>>> time  delta =       -0.260 ms -0.6445%
>>>>>>
>>>>>>
>>>>>> (your patch vs my patch)
>>>>>>
>>>>>>               jiangli         ioi                      jiangli      ioi
>>>>>>       1:     57716711    57782622 ( 65911) ++++         41.042   40.302 (-0.740) -----
>>>>>>       2:     57709666    57780196 ( 70530) ++++         40.334   40.965 ( 0.631) ++++
>>>>>>       3:     57716074    57803315 ( 87241) +++++        40.239   39.823 (-0.416) ---
>>>>>>       4:     57725152    57782719 ( 57567) +++          40.430   39.805 (-0.625) ----
>>>>>>       5:     57719799    57787187 ( 67388) ++++         40.138   40.003 (-0.135) -
>>>>>>       6:     57721922    57769193 ( 47271) +++          40.324   40.207 (-0.117) -
>>>>>>       7:     57716438    57785212 ( 68774) ++++         39.978   40.149 ( 0.171) +
>>>>>>       8:     57713834    57778797 ( 64963) ++++         40.359   40.210 (-0.149) -
>>>>>>       9:     57711272    57786376 ( 75104) ++++         40.575   40.724 ( 0.149) +
>>>>>>      10:     57711660    57780548 ( 68888) ++++         40.291   40.091 (-0.200) -
>>>>>> ============================================================
>>>>>>              57716252    57783615 ( 67363) ++++         40.370   40.226 (-0.144) -
>>>>>> instr delta =        67363     0.1167%
>>>>>> time  delta =       -0.144 ms -0.3560%
>>>>>>
>>>>>>
>>>>>> These numbers show that the majority of the time spent (678650
>>>>>> instructions) inside InstanceKlass::link_class_impl is spent from the
>>>>>> PerfClassTraceTime. Walking of the class hierarchy and taking the
>>>>>> h_init_lock only takes about 67363 instructions.
>>>>>>
>>>>>> Due to this finding, I filed two more RFEs:
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246019
>>>>>> PerfClassTraceTime slows down VM start-up
>>>>>>
>>>>> It's related to JDK-8246020, and I've commented on the bug (see
>>>>> JDK-8246020 comments). UsePerfData for perf data collection is common
>>>>> in cloud usages. It's better to keep UsePerfData enabled by default.
>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246015
>>>>>> Method::link_method is called twice for CDS methods
>>>>>
>>>>> That was addressed as part of the initial change for JDK-8232222:
>>>>> http://cr.openjdk.java.net/~jiangli/8232222/weberv.02/src/hotspot/share/oops/instanceKlass.cpp.frames.html
>>>>>
>>>>> It's cleaner to handle it separately, so I removed it from the latest
>>>>> version. I've assigned JDK-8246015 to myself and will address it
>>>>> separately. Thanks for recording the separate bug.
>>>>>
>>>>> Thanks!
>>>>> Jiangli >>>>> >>>>>> >>>>>> Thanks >>>>>> - Ioi >>>> From david.holmes at oracle.com Mon Jun 1 23:22:05 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 2 Jun 2020 09:22:05 +1000 Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime In-Reply-To: References: <4de9bb9c-e83d-f33b-fc50-3431f69e46aa@oracle.com> <8fe912f1-8407-df1d-0c1f-cf37f08363db@oracle.com> <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <91767eae-4486-8ceb-5452-c6afa4178af1@oracle.com> Message-ID: <13b1698c-e000-783e-b40f-02a048dac9e3@oracle.com> Hi Ioi, On 2/06/2020 8:56 am, Ioi Lam wrote: > > > On 5/31/20 11:14 PM, David Holmes wrote: >> Hi Jiangli, >> >> On 29/05/2020 9:02 am, Jiangli Zhou wrote: >>> (Looping in serviceability-dev at openjdk.java.net ...) >>> >>> Hi David and Ioi, >>> >>> On Wed, May 27, 2020 at 11:15 PM David Holmes >>> wrote: >>>> >>>> Hi Jiangli, >>>> >>>> On 28/05/2020 11:35 am, Ioi Lam wrote: >>>>> >>>>> >>>>> On 5/27/20 6:17 PM, Jiangli Zhou wrote: >>>>>> On Wed, May 27, 2020 at 1:56 PM Ioi Lam wrote: >>>>>>> On 5/26/20 6:21 PM, Jiangli Zhou wrote: >>>>>>> >>>>>>>> Focusing on the link state for archived classes in this thread, I >>>>>>>> updated the webrev to only set archived boot classes to 'linked' >>>>>>>> state >>>>>>>> at restore time. More investigations can be done for archived >>>>>>>> classes >>>>>>>> for other builtin loaders. >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8232222 >>>>>>>> http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/ >>>>>>>> >>>>>>>> Please let me know if there is any additional concerns to the >>>>>>>> change. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Jiangli >>>>>>>> >>>>>>> Hi Jiangli, >>>>>>> >>>>>>> I think the change is fine. I am wondering if this >>>>>>> >>>>>>> 2530?? 
if (!BytecodeVerificationLocal && >>>>>>> 2531 loader_data->is_the_null_class_loader_data()) { >>>>>>> 2532???? _init_state = linked; >>>>>>> 2533?? } >>>>>>> >>>>>>> >>>>>>> can be changed to >>>>>>> >>>>>>> ????????? if (!BytecodeVerificationLocal && >>>>>>> loader_data->is_the_null_class_loader_data() && >>>>>>> ????????????? !JvmtiExport::should_post_class_prepare()) >>>>>>> >>>>>>> That way, there's no need to change systemDictionary.cpp. >>>>>>> >>>>>>> >>>>>> I was going to take the suggestion, but realized that it would add >>>>>> unnecessary complications for archived boot classes with class >>>>>> pre-initialization support. Some agents may set >>>>>> JvmtiExport::should_post_class_prepare(). It's worthwhile to support >>>>>> class pre-init uniformly for archived boot classes with >>>>>> JvmtiExport::should_post_class_prepare() enabled or disabled. >>>>> >>>>> This would introduce behavioral changes when JVMTI is enabled: >>>>> >>>>> + The order of JvmtiExport::post_class_prepare is different than >>>>> before >>>>> + JvmtiExport::post_class_prepare may be called for a class that >>>>> was not >>>>> called before (if the class is never linked during run time) >>>>> + JvmtiExport::post_class_prepare was called inside the init_lock, now >>>>> it's called outside of the init_lock >>>> >>>> I have to say I share Ioi's concerns here. This change will impact JVM >>>> TI agents in a way we can't be sure of. From a specification >>>> perspective >>>> I think we are fine as linking can be lazy or eager, so there's no >>>> implied order either. But this would be a behavioural change that will >>>> be observable by agents. (I'm less concerned about the init_lock >>>> situation as it seems potentially buggy to me to call out to an agent >>>> with the init_lock held in the first place! I find it hard to >>>> imagine an >>>> agent only working correctly if the init_lock is held.) 
>>>> >>> >>> Totally agree that we need to be very careful here (that's also part >>> of the reason why I separated this into an individual RFE for the >>> dedicated discussion). David, thanks for the analysis from the spec >>> perspective! Agreed with the init_lock comment also. In the future, I >>> think we can even get rid of the needs for init_lock completely for >>> some of the pre-initialized classes. >>> >>> This change has gone through extensive testing since the later part of >>> last year and has been in use (with the default CDS) with agents that >>> do post_class_prepare. Hopefully that would ease some of the concerns. >> >> That is good to know, but that is just one sample of a set of agents. >> >>>> This would need a CSR request and involvement of the serviceabilty >>>> folk, >>>> to work through any potential issues. >>>> >>> >>> I've looped in serviceability-dev at openjdk.java.net for this >>> discussion. Chris or Serguei could you please take a look of the >>> change, http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/, >>> specifically the JvmtiExport::post_class_prepare change in >>> systemDictionary.cpp. >>> >>> Filing a CSR request sounds good to me. The CSR looks after source, >>> binary, and behavioral compatibility. From a behavior point of view, >>> the change most likely does not cause any visible effects to a JVMTI >>> agent (based on what's observed in testing and usages). What should be >>> included in the CSR? >> >> The CSR request should explain the behavioural change that will be >> observable by agents, and all of the potential compatibility issues >> that might arise from that - pointing out of course that as the spec >> (JVMS 5.4**) allows for eager or lazy linking, agents shouldn't be >> relying on the exact timing or order of events. >> >> ** I note this section has some additional constraints regarding >> dynamically computed constants that might also come into play with >> this pre-linking for CDS classes. 
>>
> I think the CSR should also include the benefit of doing this. It's not
> a lot of code change, but now we have to maintain two different code
> paths for post_class_prepare to be called.

The CSR is concerned only with the compatibility aspects of a change.
The cost:benefit ratio is an engineering decision that should be
discussed here in the RFR.

David
-----

> JVMTI agents will typically introduce quite a bit of overhead in
> start-up, so a reduction in the range of 0.2~0.4 ms seems like a drop in
> the bucket. I'd rather keep the VM simple unless we have a strong reason
> to make it more complicated.
>
> Thanks
> - Ioi
>
>> Cheers,
>> David
>> -----
>>
>>>> Ioi's suggestion avoids this problem, but, as you note, at the expense
>>>> of disabling this optimisation if an agent is attached and wants class
>>>> prepare events.
>>>>
>>>
>>> Right, if we handle that case conditionally, we would always need to
>>> store the cached static field values separately since the dump time
>>> cannot foresee if the runtime can set boot classes in 'linked' state
>>> (and 'fully_initialized' state with the planned changes) at restore
>>> time. As a result, we need to handle all pre-initialized static fields
>>> like what we are doing today, which is storing them in the archived
>>> class_info_records then installing them to the related fields at
>>> runtime. That causes both unwanted memory and CPU overhead at runtime.
>>>
>>> I also updated the webrev.02 in place with typo fixes. Thanks!
>>>
>>> Best regards,
>>> Jiangli
>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>>
>>>>>>> BTW, I was wondering where the performance came from, so I wrote an
>>>>>>> investigative patch:
>>>>>>>
>>>>>>> diff -r 0702191777c9 src/hotspot/share/oops/instanceKlass.cpp
>>>>>>> --- a/src/hotspot/share/oops/instanceKlass.cpp    Thu May 21 15:56:27 2020 -0700
>>>>>>> +++ b/src/hotspot/share/oops/instanceKlass.cpp
Wed May 27 10:48:57 2020 -0700
>>>>>>> @@ -866,6 +866,13 @@
>>>>>>>      return true;
>>>>>>>    }
>>>>>>>
>>>>>>> +  if (UseSharedSpaces && !BytecodeVerificationLocal && is_shared_boot_class()) {
>>>>>>> +    Handle h_init_lock(THREAD, init_lock());
>>>>>>> +    ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL);
>>>>>>> +    set_init_state(linked);
>>>>>>> +    return true;
>>>>>>> +  }
>>>>>>> +
>>>>>>>    // trace only the link time for this klass that includes
>>>>>>>    // the verification time
>>>>>>>    PerfClassTraceTime vmtimer(ClassLoader::perf_class_link_time(),
>>>>>>>
>>>>>>>
>>>>>>> Benchmarking results (smaller numbers are better):
>>>>>>>
>>>>>>> (baseline vs your patch)
>>>>>>>
>>>>>>>              baseline     jiangli                     baseline  jiangli
>>>>>>>       1:     58514375    57755638 (-758737) -----       40.266   40.135 (-0.131) -
>>>>>>>       2:     58506426    57754623 (-751803) -----       40.367   39.417 (-0.950) -----
>>>>>>>       3:     58498554    57759735 (-738819) -----       40.513   39.970 (-0.543) ---
>>>>>>>       4:     58491265    57751296 (-739969) -----       40.439   40.268 (-0.171) -
>>>>>>>       5:     58500588    57750975 (-749613) -----       40.569   40.080 (-0.489) --
>>>>>>>       6:     58497015    57744418 (-752597) -----       41.097   40.147 (-0.950) -----
>>>>>>>       7:     58494335    57749909 (-744426) -----       39.983   40.214 ( 0.231) +
>>>>>>>       8:     58500401    57750305 (-750096) -----       40.235   40.417 ( 0.182) +
>>>>>>>       9:     58490728    57767463 (-723265) -----       40.354   39.928 (-0.426) --
>>>>>>>      10:     58497858    57746557 (-751301) -----       40.756   39.706 (-1.050) -----
>>>>>>> ============================================================
>>>>>>>              58499154    57753091 (-746062) -----       40.457   40.027 (-0.430) --
>>>>>>> instr delta =      -746062    -1.2753%
>>>>>>> time  delta =       -0.430 ms -1.0619%
>>>>>>>
>>>>>>>
>>>>>>> (baseline vs my patch)
>>>>>>>
>>>>>>>              baseline         ioi                     baseline      ioi
>>>>>>>       1:     58503574    57821124 (-682450) -----       40.554   39.783 (-0.771) -----
>>>>>>>       2:     58499325    57819459 (-679866) -----       40.092   40.325 ( 0.233) ++
>>>>>>>       3:     58492362    57811978 (-680384) -----       40.546   39.826 (-0.720) -----
>>>>>>>       4:     58488655    57828878 (-659777) -----       40.270   40.550 ( 0.280) ++
>>>>>>>       5:     58501567    57830179 (-671388) -----       40.382   40.145 (-0.237) --
>>>>>>>       6:     58496552    57808774 (-687778) -----       40.702   40.527 (-0.175) -
>>>>>>>       7:     58482701    57808925 (-673776) -----       40.268   39.849 (-0.419) ---
>>>>>>>       8:     58493831    57807810 (-686021) -----       40.396   39.940 (-0.456) ---
>>>>>>>       9:     58489388    57811354 (-678034) -----       40.575   40.078 (-0.497) ---
>>>>>>>      10:     58482512    57795489 (-687023) -----       40.084   40.247 ( 0.163) +
>>>>>>> ============================================================
>>>>>>>              58493046    57814396 (-678650) -----       40.386   40.126 (-0.260) --
>>>>>>> instr delta =      -678650    -1.1602%
>>>>>>> time  delta =       -0.260 ms -0.6445%
>>>>>>>
>>>>>>>
>>>>>>> (your patch vs my patch)
>>>>>>>
>>>>>>>               jiangli         ioi                      jiangli      ioi
>>>>>>>       1:     57716711    57782622 ( 65911) ++++         41.042   40.302 (-0.740) -----
>>>>>>>       2:     57709666    57780196 ( 70530) ++++         40.334   40.965 ( 0.631) ++++
>>>>>>>       3:     57716074    57803315 ( 87241) +++++        40.239   39.823 (-0.416) ---
>>>>>>>       4:     57725152    57782719 ( 57567) +++          40.430   39.805 (-0.625) ----
>>>>>>>       5:     57719799    57787187 ( 67388) ++++         40.138   40.003 (-0.135) -
>>>>>>>       6:     57721922    57769193 ( 47271) +++          40.324   40.207 (-0.117) -
>>>>>>>       7:     57716438    57785212 ( 68774) ++++         39.978   40.149 ( 0.171) +
>>>>>>>       8:     57713834    57778797 ( 64963) ++++         40.359   40.210 (-0.149) -
>>>>>>>       9:     57711272    57786376 ( 75104) ++++         40.575   40.724 ( 0.149) +
>>>>>>>      10:     57711660    57780548 ( 68888) ++++         40.291   40.091 (-0.200) -
>>>>>>> ============================================================
>>>>>>>              57716252    57783615 ( 67363) ++++         40.370   40.226 (-0.144) -
>>>>>>> instr delta =        67363     0.1167%
>>>>>>> time  delta =       -0.144 ms -0.3560%
>>>>>>>
>>>>>>>
>>>>>>> These numbers show that the majority of the time spent (678650
>>>>>>> instructions) inside InstanceKlass::link_class_impl is spent from the
>>>>>>> PerfClassTraceTime. Walking of the class hierarchy and taking the
>>>>>>> h_init_lock only takes about 67363 instructions.
>>>>>>>
>>>>>>> Due to this finding, I filed two more RFEs:
>>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246019
>>>>>>> PerfClassTraceTime slows down VM start-up
>>>>>>>
>>>>>> It's related to JDK-8246020, and I've commented on the bug (see
>>>>>> JDK-8246020 comments). UsePerfData for perf data collection is common
>>>>>> in cloud usages. It's better to keep UsePerfData enabled by default.
>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246015
>>>>>>> Method::link_method is called twice for CDS methods
>>>>>>
>>>>>> That was addressed as part of the initial change for JDK-8232222:
>>>>>> http://cr.openjdk.java.net/~jiangli/8232222/weberv.02/src/hotspot/share/oops/instanceKlass.cpp.frames.html
>>>>>>
>>>>>> It's cleaner to handle it separately, so I removed it from the latest
>>>>>> version. I've assigned JDK-8246015 to myself and will address it
>>>>>> separately. Thanks for recording the separate bug.
>>>>>>
>>>>>> Thanks!
>>>>>> Jiangli
>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>
>

From alexey.menkov at oracle.com Tue Jun 2 00:07:18 2020
From: alexey.menkov at oracle.com (Alex Menkov)
Date: Mon, 1 Jun 2020 17:07:18 -0700
Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
In-Reply-To:
References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com> <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com>
Message-ID:

Hi Daniil,

1. Before the fix, checkLiveThreads() tested
ThreadMXBean.getThreadCount(), but now, as far as I can see, it tests
Thread.getAllStackTraces().

2.
    private static void checkThreadIds() throws InterruptedException {
        long[] list = mbean.getAllThreadIds();

        waitTillEquals(
            list.length,
            ()->(long)mbean.getThreadCount(),
            "Array length returned by " +
                "getAllThreadIds() = %1$d not matched count = ${provided}",
            ()->list.length
        );
    }

I suppose the purpose of waitTillEquals() is to handle
creation/termination of VM internal threads.
But if some internal thread terminates after mbean.getAllThreadIds() and
before the first mbean.getThreadCount() call, and the VM then does not
need to restart it, waitTillEquals will wait forever.
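
For illustration only, the unbounded wait described in point 2 could be avoided by polling against a deadline. The following is a minimal sketch, not the test's actual waitTillEquals(): the class BoundedWait, the method waitUntilEquals() and its parameters are hypothetical names chosen for this example, and the real helper in the webrev may have a different signature.

```java
import java.util.function.LongSupplier;

// Hypothetical helper (not part of AllThreadIds.java): polls a value
// until it matches or a deadline passes, so the caller can never hang.
class BoundedWait {

    /**
     * Returns true if actual.getAsLong() equals expected before the
     * timeout expires, false on timeout or if the thread is interrupted.
     */
    public static boolean waitUntilEquals(long expected, LongSupplier actual,
                                          long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (actual.getAsLong() == expected) {
                return true;
            }
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // The supplied value never matches, so this returns after ~200 ms.
        boolean matched = waitUntilEquals(3, () -> 2, 200, 10);
        System.out.println("matched = " + matched); // prints "matched = false"
    }
}
```

On a false return the caller could take a fresh getAllThreadIds() snapshot and retry, or report a failure, instead of blocking indefinitely.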
--alex

On 05/29/2020 16:28, Daniil Titov wrote:
> Hi Alex and Serguei,
>
> Please review a new version of the change [1] that makes sure that the test counts
> only the threads it creates and ignores internal threads the VM might create or destroy.
>
> Testing: Ran this test in Mach5 with Graal on several hundred times;
> tier1-tier3 tests are in progress.
>
> [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.02/
> [2] https://bugs.openjdk.java.net/browse/JDK-8131745
>
> Thank you,
> Daniil
>
> On 5/22/20, 10:26 AM, "Alex Menkov" wrote:
>
>     Hi Daniil,
>
>     I'm not sure all this retry logic is a good way.
>     As mentioned in jira the most important part of the testing is ensuring
>     that you find all the created threads when they are alive, and you don't
>     find them when they are dead. The actual thread count checking is not
>     that important.
>     I agree with this and I'd just simplify the test by removing checks for
>     thread count. The VM may create and destroy internal threads when it needs to.
>
>     --alex
>
>     On 05/18/2020 10:31, Daniil Titov wrote:
>     > Please review the change [1] that fixes an intermittent failure of the test.
>     >
>     > This test creates and destroys a given number of daemon/user threads and validates the count of those started/stopped threads against values returned from ThreadMXBean thread counts. The problem here is that if some internal thread is started (e.g. "HotSpotGraalManagement Bean Registration") or destroyed (e.g. "JVMCI CompilerThread"), the test hangs waiting for the expected number of live threads.
>     >
>     > The fix limits the time the test waits for the desired number of live threads; if this limit is exceeded, the test repeats itself.
>     >
>     > Testing: the test with Graal on and Mach5 tier1-tier7 tests passed.
> >
> > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.01
> > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> >
> > Thank you,
> > Daniil
> >
> >
>
>

From serguei.spitsyn at oracle.com Tue Jun 2 02:26:35 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 1 Jun 2020 19:26:35 -0700
Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken
In-Reply-To: <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com>
References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com>
Message-ID: <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com>

An HTML attachment was scrubbed...
URL:

From leonid.mesnik at oracle.com Tue Jun 2 04:33:57 2020
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Mon, 1 Jun 2020 21:33:57 -0700
Subject: RFR: 8242891: vmTestbase/nsk/jvmti/ test should be fixed to fail early if JVMTI function return error
Message-ID:

Hi

Could you please review the following fix, which stops test execution if
a JVMTI function returns an error. The test fails anyway; however, using
potentially bad data in a JVMTI function might cause misleading crash
failures. The hs_err file will then contain a stack trace that points not
at the problem function but at the function called with corrupted data.
Most of the tests already behave this way, but not all. I also fixed a
couple of tests to finish if they haven't managed to suspend a thread.

I've updated only the tests which try to use corrupted data in JVMTI
functions after errors. I haven't updated tests which just compare/print
values from erroring JVMTI functions. A crash in strcmp/println is not so
misleading and might point to a real issue.
webrev: http://cr.openjdk.java.net/~lmesnik/8242891/webrev.00/ bug: https://bugs.openjdk.java.net/browse/JDK-8242891 Leonid From serguei.spitsyn at oracle.com Tue Jun 2 06:40:00 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 1 Jun 2020 23:40:00 -0700 Subject: RFR(XS): 8245126 Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods In-Reply-To: <42d73a82-b70e-1a0d-312d-303409840392@oracle.com> References: <62e34586-eaac-3200-8f5a-ee12ad654afa@oracle.com> <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> <42d73a82-b70e-1a0d-312d-303409840392@oracle.com> Message-ID: <64694163-fdd5-5ccb-3ffb-2027b05a3719@oracle.com> An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Tue Jun 2 16:54:42 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 2 Jun 2020 09:54:42 -0700 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: Hi Richard, This looks good to me. Thanks, Serguei On 5/28/20 09:02, Vladimir Kozlov wrote: > Vladimir Ivanov is on break currently. > It looks good to me. > > Thanks, > Vladimir K > > On 5/26/20 7:31 AM, Reingruber, Richard wrote: >> Hi Vladimir, >> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> >>> Not an expert in JVMTI code base, so can't comment on the actual >>> changes. >> >>> ? From JIT-compilers perspective it looks good. 
>> >> I put out webrev.1 a while ago [1]: >> >> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >> Webrev(delta): >> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >> >> You originally suggested to use a handshake to switch a thread into >> interpreter mode [2]. I'm using >> a direct handshake now, because I think it is the best fit. >> >> May I ask if webrev.1 still looks good to you from JIT-compilers >> perspective? >> >> Can I list you as (partial) Reviewer? >> >> Thanks, Richard. >> >> [1] >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >> [2] >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >> >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Freitag, 7. Februar 2020 09:19 >> To: Reingruber, Richard ; >> serviceability-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(S) 8238585: Use handshake for >> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >> compiled methods on stack not_entrant >> >> >>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> >> Not an expert in JVMTI code base, so can't comment on the actual >> changes. >> >> ? From JIT-compilers perspective it looks good. >> >> Best regards, >> Vladimir Ivanov >> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>> >>> The change avoids making all compiled methods on stack not_entrant >>> when switching a java thread to >>> interpreter only execution for jvmti purposes. It is sufficient to >>> deoptimize the compiled frames on stack. >>> >>> Additionally a handshake is used instead of a vm operation to walk >>> the stack and do the deoptimizations. >>> >>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>> release builds on all platforms. >>> >>> Thanks, Richard. 
>>> >>> See also my question if anyone knows a reason for making the >>> compiled methods not_entrant: >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>> >>> From richard.reingruber at sap.com Tue Jun 2 17:57:26 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 2 Jun 2020 17:57:26 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: Hi Serguei, > This looks good to me. Thanks! From an earlier mail: > I'm thinking it would be more safe to run full tier5. I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would like to push. Thanks, Richard. -----Original Message----- From: serguei.spitsyn at oracle.com Sent: Dienstag, 2. Juni 2020 18:55 To: Vladimir Kozlov ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant Hi Richard, This looks good to me. Thanks, Serguei On 5/28/20 09:02, Vladimir Kozlov wrote: > Vladimir Ivanov is on break currently. > It looks good to me. > > Thanks, > Vladimir K > > On 5/26/20 7:31 AM, Reingruber, Richard wrote: >> Hi Vladimir, >> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> >>> Not an expert in JVMTI code base, so can't comment on the actual >>> changes. >> >>> ? From JIT-compilers perspective it looks good. 
>> >> I put out webrev.1 a while ago [1]: >> >> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >> Webrev(delta): >> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >> >> You originally suggested to use a handshake to switch a thread into >> interpreter mode [2]. I'm using >> a direct handshake now, because I think it is the best fit. >> >> May I ask if webrev.1 still looks good to you from JIT-compilers >> perspective? >> >> Can I list you as (partial) Reviewer? >> >> Thanks, Richard. >> >> [1] >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >> [2] >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >> >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Freitag, 7. Februar 2020 09:19 >> To: Reingruber, Richard ; >> serviceability-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(S) 8238585: Use handshake for >> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >> compiled methods on stack not_entrant >> >> >>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> >> Not an expert in JVMTI code base, so can't comment on the actual >> changes. >> >> ? From JIT-compilers perspective it looks good. >> >> Best regards, >> Vladimir Ivanov >> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>> >>> The change avoids making all compiled methods on stack not_entrant >>> when switching a java thread to >>> interpreter only execution for jvmti purposes. It is sufficient to >>> deoptimize the compiled frames on stack. >>> >>> Additionally a handshake is used instead of a vm operation to walk >>> the stack and do the deoptimizations. >>> >>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>> release builds on all platforms. >>> >>> Thanks, Richard. 
>>>
>>> See also my question if anyone knows a reason for making the
>>> compiled methods not_entrant:
>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html
>>>
>>>

From serguei.spitsyn at oracle.com Tue Jun 2 18:01:42 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 2 Jun 2020 11:01:42 -0700
Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant
In-Reply-To:
References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com>
Message-ID: <4c92e183-6a3c-f8da-4330-c297ad2afef6@oracle.com>

Hi Richard,

On 6/2/20 10:57, Reingruber, Richard wrote:
> Hi Serguei,
>
>> This looks good to me.
> Thanks!
>
> From an earlier mail:
>
>> I'm thinking it would be more safe to run full tier5.
> I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would
> like to push.

Okay, I'll submit a mach5 job with your fix and let you know about the results.

Thanks,
Serguei

> Thanks, Richard.

From alexey.menkov at oracle.com Tue Jun 2 19:04:21 2020
From: alexey.menkov at oracle.com (Alex Menkov)
Date: Tue, 2 Jun 2020 12:04:21 -0700
Subject: RFR(XS): 8221306: JVMTI spec for FramePop(), MethodExit(), and MethodEnter() could use some cleanup
In-Reply-To:
References:
Message-ID: <85155e97-d459-36ab-8a52-746df2caa1e9@oracle.com>

+1

--alex

On 06/01/2020 11:31, Chris Plummer wrote:
> Hi Serguei,
>
> Looks good.
>
> thanks,
>
> Chris
>
> On 5/31/20 1:11 AM, serguei.spitsyn at oracle.com wrote:
>> Please, review a fix for small spec bug:
>> https://bugs.openjdk.java.net/browse/JDK-8221306
>>
>> Webrev:
>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmt-funcs-cleanup.1/src/
>>
>> Updated JVM TI spec for the FramePop, MethodEntry and MethodExit events:
>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmt-funcs-cleanup.1/docs/specs/jvmti.html#FramePop
>>
>> Summary:
>>   It is a minor spec cleanup for JVM TI events
>> FramePop/MethodEntry/MethodExit:
>>    - added small clarification that GetFrameLocation needs to be asked
>> for frame at depth 0
>>    - removed partly unneeded and partly incorrect statements about
>> MethodExit event argument
>>
>> Testing:
>>   Manually verified the generated jvmti.html.
>>
>> I think, there is no need to file a CSR for this spec update as it is
>> just minor cleanup.
>>
>> Thanks,
>> Serguei
>

From richard.reingruber at sap.com Tue Jun 2 19:14:08 2020
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Tue, 2 Jun 2020 19:14:08 +0000
Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant
In-Reply-To: <4c92e183-6a3c-f8da-4330-c297ad2afef6@oracle.com>
References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> <4c92e183-6a3c-f8da-4330-c297ad2afef6@oracle.com>
Message-ID:

Excellent. Thanks!

Richard.

From serguei.spitsyn at oracle.com Tue Jun 2 19:22:04 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 2 Jun 2020 12:22:04 -0700
Subject: RFR(XS): 8221306: JVMTI spec for FramePop(), MethodExit(), and MethodEnter() could use some cleanup
In-Reply-To: <85155e97-d459-36ab-8a52-746df2caa1e9@oracle.com>
References: <85155e97-d459-36ab-8a52-746df2caa1e9@oracle.com>
Message-ID: <4e4b7a6d-5932-4a23-451e-315b2e3e42ce@oracle.com>

Thank you, Alex!
Serguei

On 6/2/20 12:04, Alex Menkov wrote:
> +1
>
> --alex

From jianglizhou at google.com Wed Jun 3 00:19:17 2020
From: jianglizhou at google.com (Jiangli Zhou)
Date: Tue, 2 Jun 2020 17:19:17 -0700
Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime
In-Reply-To:
References: <4de9bb9c-e83d-f33b-fc50-3431f69e46aa@oracle.com> <8fe912f1-8407-df1d-0c1f-cf37f08363db@oracle.com> <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <91767eae-4486-8ceb-5452-c6afa4178af1@oracle.com>
Message-ID:

Here is the CSR: https://bugs.openjdk.java.net/browse/JDK-8246289.

David, I described that the JVM spec allows for eager or lazy linking
and agents shouldn't rely on the timing/ordering, as you suggested.
Please review the CSR. It's been a while since I've worked on a CSR,
could you please remind me if the CSR should be proposed before
reviewing? I can revert it to draft state if draft is the correct
state before reviewing. Thanks!

Best regards,
Jiangli

On Mon, Jun 1, 2020 at 9:07 AM Jiangli Zhou wrote:
>
> Hi David,
>
> Thanks a lot for the guidance on CSR. I'll work on it.
>
> Best regards,
>
> Jiangli
>
> On Sun, May 31, 2020 at 11:17 PM David Holmes wrote:
> >
> > Hi Jiangli,
> >
> > On 29/05/2020 9:02 am, Jiangli Zhou wrote:
> > > (Looping in serviceability-dev at openjdk.java.net ...)
> > >
> > > Hi David and Ioi,
> > >
> > > On Wed, May 27, 2020 at 11:15 PM David Holmes wrote:
> > >>
> > >> Hi Jiangli,
> > >>
> > >> On 28/05/2020 11:35 am, Ioi Lam wrote:
> > >>>
> > >>>
> > >>> On 5/27/20 6:17 PM, Jiangli Zhou wrote:
> > >>>> On Wed, May 27, 2020 at 1:56 PM Ioi Lam wrote:
> > >>>>> On 5/26/20 6:21 PM, Jiangli Zhou wrote:
> > >>>>>
> > >>>>>> Focusing on the link state for archived classes in this thread, I
> > >>>>>> updated the webrev to only set archived boot classes to 'linked' state
> > >>>>>> at restore time. More investigations can be done for archived classes
> > >>>>>> for other builtin loaders.
> > >>>>>>
> > >>>>>> https://bugs.openjdk.java.net/browse/JDK-8232222
> > >>>>>> http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/
> > >>>>>>
> > >>>>>> Please let me know if there is any additional concerns to the change.
> > >>>>>>
> > >>>>>> Best regards,
> > >>>>>> Jiangli
> > >>>>>>
> > >>>>> Hi Jiangli,
> > >>>>>
> > >>>>> I think the change is fine. I am wondering if this
> > >>>>>
> > >>>>>     if (!BytecodeVerificationLocal &&
> > >>>>>         loader_data->is_the_null_class_loader_data()) {
> > >>>>>       _init_state = linked;
> > >>>>>     }
> > >>>>>
> > >>>>>
> > >>>>> can be changed to
> > >>>>>
> > >>>>>     if (!BytecodeVerificationLocal &&
> > >>>>>         loader_data->is_the_null_class_loader_data() &&
> > >>>>>         !JvmtiExport::should_post_class_prepare())
> > >>>>>
> > >>>>> That way, there's no need to change systemDictionary.cpp.
> > >>>>>
> > >>>>>
> > >>>> I was going to take the suggestion, but realized that it would add
> > >>>> unnecessary complications for archived boot classes with class
> > >>>> pre-initialization support. Some agents may set
> > >>>> JvmtiExport::should_post_class_prepare(). It's worthwhile to support
> > >>>> class pre-init uniformly for archived boot classes with
> > >>>> JvmtiExport::should_post_class_prepare() enabled or disabled.
> > >>>
> > >>> This would introduce behavioral changes when JVMTI is enabled:
> > >>>
> > >>> + The order of JvmtiExport::post_class_prepare is different than before
> > >>> + JvmtiExport::post_class_prepare may be called for a class that was not
> > >>> called before (if the class is never linked during run time)
> > >>> + JvmtiExport::post_class_prepare was called inside the init_lock, now
> > >>> it's called outside of the init_lock
> > >>
> > >> I have to say I share Ioi's concerns here. This change will impact JVM
> > >> TI agents in a way we can't be sure of. From a specification perspective
> > >> I think we are fine as linking can be lazy or eager, so there's no
> > >> implied order either. But this would be a behavioural change that will
> > >> be observable by agents. (I'm less concerned about the init_lock
> > >> situation as it seems potentially buggy to me to call out to an agent
> > >> with the init_lock held in the first place! I find it hard to imagine an
> > >> agent only working correctly if the init_lock is held.)
> > >>
> > >
> > > Totally agree that we need to be very careful here (that's also part
> > > of the reason why I separated this into an individual RFE for the
> > > dedicated discussion). David, thanks for the analysis from the spec
> > > perspective! Agreed with the init_lock comment also. In the future, I
> > > think we can even get rid of the needs for init_lock completely for
> > > some of the pre-initialized classes.
> > >
> > > This change has gone through extensive testing since the later part of
> > > last year and has been in use (with the default CDS) with agents that
> > > do post_class_prepare. Hopefully that would ease some of the concerns.
> >
> > That is good to know, but that is just one sample of a set of agents.
> >
> > >> This would need a CSR request and involvement of the serviceabilty folk,
> > >> to work through any potential issues.
> > >>
> > >
> > > I've looped in serviceability-dev at openjdk.java.net for this
> > > discussion. Chris or Serguei could you please take a look of the
> > > change, http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/,
> > > specifically the JvmtiExport::post_class_prepare change in
> > > systemDictionary.cpp.
> > >
> > > Filing a CSR request sounds good to me. The CSR looks after source,
> > > binary, and behavioral compatibility. From a behavior point of view,
> > > the change most likely does not cause any visible effects to a JVMTI
> > > agent (based on what's observed in testing and usages). What should be
> > > included in the CSR?
> >
> > The CSR request should explain the behavioural change that will be
> > observable by agents, and all of the potential compatibility issues that
> > might arise from that - pointing out of course that as the spec (JVMS
> > 5.4**) allows for eager or lazy linking, agents shouldn't be relying on
> > the exact timing or order of events.
> >
> > ** I note this section has some additional constraints regarding
> > dynamically computed constants that might also come into play with this
> > pre-linking for CDS classes.
> >
> > Cheers,
> > David
> > -----
> >
> > >> Ioi's suggestion avoids this problem, but, as you note, at the expense
> > >> of disabling this optimisation if an agent is attached and wants class
> > >> prepare events.
> > >>
> > >
> > > Right, if we handle that case conditionally, we would alway need to
> > > store the cached static field values separately since the dump time
> > > cannot foresee if the runtime can set boot classes in 'linked' state
> > > (and 'fully_initialized' state with the planned changes) at restore
> > > time. As a result, we need to handle all pre-initialized static fields
> > > like what we are doing today, which is storing them in the archived
> > > class_info_records then installing them to the related fields at
> > > runtime.
> > > That causes both unwanted memory and CPU overhead at runtime.
> > >
> > > I also updated the webrev.02 in place with typo fixes. Thanks!
> > >
> > > Best regards,
> > > Jiangli
> > >
> > >> Thanks,
> > >> David
> > >>
> > >>> Thanks
> > >>> - Ioi
> > >>>
> > >>>>
> > >>>>> BTW, I was wondering where the performance came from, so I wrote an
> > >>>>> investigative patch:
> > >>>>>
> > >>>>> diff -r 0702191777c9 src/hotspot/share/oops/instanceKlass.cpp
> > >>>>> --- a/src/hotspot/share/oops/instanceKlass.cpp Thu May 21 15:56:27 2020 -0700
> > >>>>> +++ b/src/hotspot/share/oops/instanceKlass.cpp Wed May 27 10:48:57 2020 -0700
> > >>>>> @@ -866,6 +866,13 @@
> > >>>>>      return true;
> > >>>>>    }
> > >>>>>
> > >>>>> +  if (UseSharedSpaces && !BytecodeVerificationLocal && is_shared_boot_class()) {
> > >>>>> +    Handle h_init_lock(THREAD, init_lock());
> > >>>>> +    ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL);
> > >>>>> +    set_init_state(linked);
> > >>>>> +    return true;
> > >>>>> +  }
> > >>>>> +
> > >>>>>    // trace only the link time for this klass that includes
> > >>>>>    // the verification time
> > >>>>>    PerfClassTraceTime vmtimer(ClassLoader::perf_class_link_time(),
> > >>>>>
> > >>>>>
> > >>>>> Benchmarking results (smaller numbers are better):
> > >>>>>
> > >>>>> (baseline vs your patch)
> > >>>>>
> > >>>>>        baseline    jiangli                   baseline  jiangli
> > >>>>>    1:  58514375   57755638 (-758737) -----     40.266   40.135 ( -0.131) -
> > >>>>>    2:  58506426   57754623 (-751803) -----     40.367   39.417 ( -0.950) -----
> > >>>>>    3:  58498554   57759735 (-738819) -----     40.513   39.970 ( -0.543) ---
> > >>>>>    4:  58491265   57751296 (-739969) -----     40.439   40.268 ( -0.171) -
> > >>>>>    5:  58500588   57750975 (-749613) -----     40.569   40.080 ( -0.489) --
> > >>>>>    6:  58497015   57744418 (-752597) -----     41.097   40.147 ( -0.950) -----
> > >>>>>    7:  58494335   57749909 (-744426) -----     39.983   40.214 (  0.231) +
> > >>>>>    8:  58500401   57750305 (-750096) -----     40.235   40.417 (  0.182) +
> > >>>>>    9:  58490728   57767463 (-723265) -----     40.354   39.928 ( -0.426) --
> > >>>>>   10:  58497858   57746557 (-751301) -----     40.756   39.706 ( -1.050) -----
> > >>>>> ============================================================
> > >>>>>        58499154   57753091 (-746062) -----     40.457   40.027 ( -0.430) --
> > >>>>> instr delta = -746062 -1.2753%
> > >>>>> time delta = -0.430 ms -1.0619%
> > >>>>>
> > >>>>>
> > >>>>> (baseline vs my patch)
> > >>>>>
> > >>>>>        baseline        ioi                   baseline      ioi
> > >>>>>    1:  58503574   57821124 (-682450) -----     40.554   39.783 ( -0.771) -----
> > >>>>>    2:  58499325   57819459 (-679866) -----     40.092   40.325 (  0.233) ++
> > >>>>>    3:  58492362   57811978 (-680384) -----     40.546   39.826 ( -0.720) -----
> > >>>>>    4:  58488655   57828878 (-659777) -----     40.270   40.550 (  0.280) ++
> > >>>>>    5:  58501567   57830179 (-671388) -----     40.382   40.145 ( -0.237) --
> > >>>>>    6:  58496552   57808774 (-687778) -----     40.702   40.527 ( -0.175) -
> > >>>>>    7:  58482701   57808925 (-673776) -----     40.268   39.849 ( -0.419) ---
> > >>>>>    8:  58493831   57807810 (-686021) -----     40.396   39.940 ( -0.456) ---
> > >>>>>    9:  58489388   57811354 (-678034) -----     40.575   40.078 ( -0.497) ---
> > >>>>>   10:  58482512   57795489 (-687023) -----     40.084   40.247 (  0.163) +
> > >>>>> ============================================================
> > >>>>>        58493046   57814396 (-678650) -----     40.386   40.126 ( -0.260) --
> > >>>>> instr delta = -678650 -1.1602%
> > >>>>> time delta = -0.260 ms -0.6445%
> > >>>>>
> > >>>>>
> > >>>>> (your patch vs my patch)
> > >>>>>
> > >>>>>         jiangli        ioi                    jiangli      ioi
> > >>>>>    1:  57716711   57782622 (  65911) ++++      41.042   40.302 ( -0.740) -----
> > >>>>>    2:  57709666   57780196 (  70530) ++++      40.334   40.965 (  0.631) ++++
> > >>>>>    3:  57716074   57803315 (  87241) +++++     40.239   39.823 ( -0.416) ---
> > >>>>>    4:  57725152   57782719 (  57567) +++       40.430   39.805 ( -0.625) ----
> > >>>>>    5:  57719799   57787187 (  67388) ++++      40.138   40.003 ( -0.135) -
> > >>>>>    6:  57721922   57769193 (  47271) +++       40.324   40.207 ( -0.117) -
> > >>>>>    7:  57716438   57785212 (  68774) ++++      39.978   40.149 (  0.171) +
> > >>>>>    8:  57713834   57778797 (  64963) ++++      40.359   40.210 ( -0.149) -
> > >>>>>    9:  57711272   57786376 (  75104) ++++      40.575   40.724 (  0.149) +
> > >>>>>   10:  57711660   57780548 (  68888) ++++      40.291   40.091 ( -0.200) -
> > >>>>> ============================================================
> > >>>>>        57716252   57783615 (  67363) ++++      40.370   40.226 ( -0.144) -
> > >>>>> instr delta = 67363 0.1167%
> > >>>>> time delta = -0.144 ms -0.3560%
> > >>>>>
> > >>>>>
> > >>>>> These numbers show that the majority of the time spent (678650
> > >>>>> instructions) inside InstanceKlass::link_class_impl is spent from the
> > >>>>> PerfClassTraceTime. Walking of the class hierarchy and taking the
> > >>>>> h_init_lock only takes about 67363 instructions).
> > >>>>>
> > >>>>> Due to this finding, I filed two more RFEs:
> > >>>>>
> > >>>>> https://bugs.openjdk.java.net/browse/JDK-8246019
> > >>>>> PerfClassTraceTime slows down VM start-up
> > >>>>>
> > >>>> It's related to JDK-8246020, and I've commented on the bug (see
> > >>>> JDK-8246020 comments). UsePerfData for perf data collection is common
> > >>>> in cloud usages. It's better to keep UsePerfData enabled by default.
> > >>>>
> > >>>>> https://bugs.openjdk.java.net/browse/JDK-8246015
> > >>>>> Method::link_method is called twice for CDS methods
> > >>>>
> > >>>> That was addressed as part of the initial change for JDK-8232222:
> > >>>> http://cr.openjdk.java.net/~jiangli/8232222/weberv.02/src/hotspot/share/oops/instanceKlass.cpp.frames.html
> > >>>>
> > >>>>
> > >>>> It's cleaner to handle it separately, so I removed it from the latest
> > >>>> version. I've assigned JDK-8246015 to myself and will address it
> > >>>> separately. Thanks for recording the separate bug.
> > >>>>
> > >>>> Thanks!
> > >>>> Jiangli
> > >>>>
> > >>>>>
> > >>>>> Thanks
> > >>>>> - Ioi
> > >>>

From daniil.x.titov at oracle.com Wed Jun 3 04:00:18 2020
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Tue, 02 Jun 2020 21:00:18 -0700
Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
In-Reply-To:
References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com> <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com>
Message-ID:

Hi Alex, Serguei, and Martin,

Thank you for your comments. Please review a new version of the fix that addresses them, specifically:

1) Replaces the double loop in checkAllThreadsAlive() with code that uses collections and the containsAll() method.
2) Restores the checks for other ThreadMXBean methods (getThreadCount(), getTotalStartedThreadCount(), getPeakThreadCount()) but with more relaxed conditions.
3) Relaxes the check inside the checkThreadIds() method.

[1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.03/
[2] https://bugs.openjdk.java.net/browse/JDK-8131745

Thank you,
Daniil

On 6/1/20, 5:06 PM, "Alex Menkov" wrote:

    Hi Daniil,

    1. before the fix checkLiveThreads() tested
    ThreadMXBean.getThreadCount(), but now as far as I see it tests
    Thread.getAllStackTraces();

    2.
        private static void checkThreadIds() throws InterruptedException {
            long[] list = mbean.getAllThreadIds();

            waitTillEquals(
                list.length,
                ()->(long)mbean.getThreadCount(),
                "Array length returned by " +
                    "getAllThreadIds() = %1$d not matched count = ${provided}",
                ()->list.length
            );
        }

    I suppose purpose of waitTillEquals() is to handle creation/termination
    of VM internal threads.
    But if some internal thread terminates after mbean.getAllThreadIds() and
    before 1st mbean.getThreadCount() call and then VM does not need to
    restart it, waitTillEquals will wait forever.

    --alex

    On 05/29/2020 16:28, Daniil Titov wrote:
    > Hi Alex and Serguei,
    >
    > Please review a new version of the change [1] that makes sure that the test counts
    > only the threads it creates and ignores Internal threads VM might create or destroy.
    >
    > Testing: Running this test in Mach5 with Graal on several hundred times ,
    > tier1-tier3 tests are in progress.
    >
    > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.02/
    > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
    >
    > Thank you,
    > Daniil
    >
    > On 5/22/20, 10:26 AM, "Alex Menkov" wrote:
    >
    >     Hi Daniil,
    >
    >     I'm not sure all this retry logic is a good way.
    >     As mentioned in jira the most important part of the testing is ensuring
    >     that you find all the created threads when they are alive, and you don't
    >     find them when they are dead. The actual thread count checking is not
    >     that important.
    >     I agree with this and I'd just simplify the test by removing checks for
    >     thread count. VM may create and destroy internal threads when it needs it.
    >
    >     --alex
    >
    >     On 05/18/2020 10:31, Daniil Titov wrote:
    >     > Please review the change [1] that fixes an intermittent failure of the test.
    >     >
    >     > This test creates and destroys a given number of daemon/user threads and validates the count of those started/stopped threads against values returned from ThreadMXBean thread counts. The problem here is that if some internal threads is started ( e.g. " HotSpotGraalManagement Bean Registration"), or destroyed (e.g. "JVMCI CompilerThread ") the test hangs waiting for expected number of live threads.
    >     >
    >     > The fix limits the time the test is waiting for desired number of live threads and in case if this limit is exceeded the test repeats itself.
    >     >
    >     > Testing. Test with Graal on and Mach5 tier1-tier7 test passed.
    >     >
    >     > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.01
    >     > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
    >     >
    >     > Thank you,
    >     > Daniil
    >     >
    >     >
    >     >
    >     >

From ioi.lam at oracle.com Wed Jun 3 04:34:22 2020
From: ioi.lam at oracle.com (Ioi Lam)
Date: Tue, 2 Jun 2020 21:34:22 -0700
Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime
In-Reply-To:
References: <8fe912f1-8407-df1d-0c1f-cf37f08363db@oracle.com> <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <91767eae-4486-8ceb-5452-c6afa4178af1@oracle.com>
Message-ID: <211756df-4667-1e76-936d-ca8e72c13bf1@oracle.com>

Hi Jiangli,

Before we spend time on the CSR review, do you have any data that shows
the actual benefit of doing this? I am specifically asking about the
benefit to JVMTI agents.

As I mentioned before, there's an alternative, which is to not use the
optimization when JVMTI is enabled. I don't think we should spend time
worrying about the impact to JVMTI agents unless there's a compelling
reason to do so.

Thanks
- Ioi

On 6/2/20 5:19 PM, Jiangli Zhou wrote:
> Here is the CSR: https://bugs.openjdk.java.net/browse/JDK-8246289.
>
> David, I described that the JVM spec allows for eager or lazy linking
> and agents shouldn't rely on the timing/ordering, as you suggested.
> Please review the CSR.
> It's been a while since I've worked on a CSR; could you please remind me if the CSR should be proposed before reviewing? I can revert it to draft state if draft is the correct state before reviewing. Thanks!
>
> Best regards,
> Jiangli
>

From goetz.lindenmaier at sap.com Wed Jun 3 13:14:40 2020
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Wed, 3 Jun 2020 13:14:40 +0000
Subject: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump
In-Reply-To:
References: <0343dfac-61f7-1b1c-ee96-bdee130578ad@oracle.com> <2363c58d-38c1-ae19-ed34-c82af6304780@oracle.com>
Message-ID:

Hi Ralf,

I had a look at your change, webrev.3. Thanks for contributing this! Overall, a nicely engineered piece of work. Thus, my comments are mostly minor details:

diagnosticCommand.hpp
ok.

diagnosticCommand.cpp:
l.510 I would be a bit more precise in the comment: ..."9 the slowest level with the best compression." or maybe "strongest compression"?
l.528 I would appreciate it if you fixed the language of the existing comment:
  // Request a full GC before dumping the heap if _all is false.
  // This helps reduce the amount of unreachable objects in the dump

heapDumper.hpp
ok.

heapDumper.cpp
Error messages are now recorded in _backend. ok.
Not overwriting the file is moved to FileWriter, ok.
I like how you split the existing code with few changes to distribute the work to the thread gang, nice!
l.1808 // Now we clear the global variables, so that a future dumper might run.
  Is "might" correct? Isn't it "can"?
l.1819 // Write the file header - we always use 1.0.
  You lost the ".2" from 1.0.2.

heapDumperCompression.hpp
Usually, in the include guards, only '/' are replaced by '_'.
l.31 Extra whitespace before "implementation".
l.36 Initialized --> Initializes
  Return --> Returns
  it initialized --> initializes
l.119 works --> WriteWorks ... I had to think about this for a while to figure out that it's not a typo of 'work' but short for WriteWork instances. But the term is used throughout the code, so maybe leave it as-is.
l.163 Remove "to".
l.165 returns the old --> commits the old ... or the like.
l.210 typo: maxiumum

heapDumperCompression.cpp
It's a bit confusing that the static variable is called gzip_func (referring to a dedicated function), while there is a method load_gzip_func that loads any function from the gzip library. What about gzip_zip_func for the variable?
l.113 What's the point of increasing needed_out_size after the call? You increment the pointer?
l.125 add "of the": good choice of the buffer sizes
CompressionBackend(): The check not to overwrite the initial, first error is in set_error(). ok.
l.224 I think the comment should say "write the last remaining partially...."
l.400 I had one overall question, which I think is answered here at least partially: As I understand it, writing the dump now needs more buffer memory, as there are several WriteWorks held at the same time. Are they smaller than the buffer used before, so no additional memory is needed, or is there a fallback if only a few can be allocated? Is the fallback implemented here implicitly? That is, if there is no memory for more works, the algorithm uses the ones it could allocate, which might result in some idle threads as there are fewer works than threads? This makes it more flexible with respect to available memory than the implementation before, right?
l.441 indentation
l.458 I can't understand why this variable is named "left". Is this the past tense of "to leave"? Or do you mean the left, filled side of the buffer?

Another question. The basic dumping is done sequentially, right? The compression is parallel. Is there a tradeoff in the number of threads where the compression is faster than writing?

zip_util.c
Looks good. I appreciate the precise error message handling you are doing. Could you please add comments that these functions are used for heap dump compression?

HprofReader.java
ok.

Reader.java
Should you close in and in2 in case of error?
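To illustrate the Reader.java question, here is a minimal sketch of how try-with-resources would close both streams on the error path. It is not the actual Reader.java code: the class name `CloseOnErrorSketch`, the method `readFirstInt`, and the two-byte magic check are assumptions made up for this example; only the stream names `in` and `in2` come from the review comment.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not the actual Reader.java code.
class CloseOnErrorSketch {
    // The first two bytes of every gzip stream are 0x1f 0x8b.
    static final int GZIP_MAGIC = 0x1f8b;

    // Returns the first int of the payload, unzipping gzip files transparently.
    static int readFirstInt(File file) throws IOException {
        // First pass: 'raw' and 'in' are closed when the try block exits,
        // whether it exits normally or with an exception.
        try (InputStream raw = new FileInputStream(file);
             DataInputStream in = new DataInputStream(new BufferedInputStream(raw))) {
            int first = in.readInt();
            if ((first >>> 16) != GZIP_MAGIC) {
                return first;
            }
        }
        // Second pass: 'raw2' is declared as its own resource so that it is
        // closed even if the GZIPInputStream constructor throws.
        try (InputStream raw2 = new FileInputStream(file);
             DataInputStream in2 = new DataInputStream(
                     new GZIPInputStream(new BufferedInputStream(raw2)))) {
            return in2.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        File gz = File.createTempFile("dump", ".hprof.gz");
        gz.deleteOnExit();
        try (DataOutputStream out = new DataOutputStream(
                new GZIPOutputStream(new FileOutputStream(gz)))) {
            out.writeInt(0x4A415641); // "JAVA", the start of an hprof header
        }
        System.out.println(Integer.toHexString(readFirstInt(gz)));
    }
}
```

Declaring the underlying file stream as a separate resource is the design choice that matters here: a wrapper constructor that throws would otherwise leave the file handle open.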
GzipRandomAccess.java
l.146 closes -> close
l.158 "the the"
This file nicely demonstrates how to read the zipped hprof. Maybe you can add a hint in the JBS issue to this file?

HeapDumpCompressedTest.java
ok.

The other Tests:
Please merge them all into HeapDumpCompressedTest by using repeated @test comments. You might not be aware this is supported by jtreg. See test/hotspot/jtreg/runtime/exceptionMsgs/NullPointerException/NullPointerExceptionTest.java for an example. It will run each @test block separately and evaluate the @requires as expected.

Best regards,
Goetz.

-----Original Message-----
From: serviceability-dev On Behalf Of Schmelter, Ralf
Sent: Monday, 18 May 2020 09:23
To: Langer, Christoph
Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: [CAUTION] RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Hi Christoph,

I've updated the webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.3/

The significant changes are moving most of the new compression code to its own file, changing to use a single option (see CSR) called -gz with a mandatory compression level, and loading the zlib only once (analogous to the new class loader code). Additionally, I've removed some long lines.

Best regards,
Ralf

-----Original Message-----
From: Langer, Christoph
Sent: Friday, 1 May 2020 18:46
To: Schmelter, Ralf
Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; coleen.phillimore at oracle.com
Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Hi Ralf,

while I'm reviewing your change I think extracting the compression code to its own file would be a good idea. Maybe you could name it heapDumpCompression.cpp?

When looking at the webrev I also noticed that there are some very long lines (beyond 90 chars or so). Maybe you could have a look if you could shorten some of them and break a few of these long lines?
More detailed review to follow.

Best regards
Christoph

> -----Original Message-----
> From: coleen.phillimore at oracle.com
> Sent: Monday, 20 April 2020 14:13
> To: Reingruber, Richard; Schmelter, Ralf; Ioi Lam; Langer, Christoph; Yasumasa Suenaga; serguei.spitsyn at oracle.com; hotspot-runtime-dev at openjdk.java.net
> Cc: serviceability-dev at openjdk.java.net
> Subject: Re: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump
>
> Hi, I don't want to review this but could you put this new code in its own file? heapDumper only needs CompressionBackend to be exported, from what I can tell.
>
> Thanks,
> Coleen
>
> On 4/20/20 6:12 AM, Reingruber, Richard wrote:
> > Hi Ralf,
> >
> >>> 767: I think _current->in_used doesn't take the final 9 bytes into account that are written in
> >>> DumperSupport::end_of_dump() after the last dump segment has been finished.
> >>> You could call get_new_buffer() instead of the if clause.
> >> Wow, how did you find this? I've fixed it by making sure we flush the DumpWriter before calling the deactivate method.
> > Spending long hours on the review ;)
> > Ok with the fix.
> >
> >>> ### src/java.base/share/native/libzip/zip_util.c
> >>> 1610: Will be hard to beat zlib_block_alloc() and zlib_block_free() performance wise. But have you
> >>> measured the performance gain? In other words: is it worth it? :)
> >> This is not done for performance, but to make sure the allocation will not fail midway during writing the dump. Maybe it is not worth it, though.
> > Understood. The heap dump will succeed if you can allocate at least one WriteWork instance. Without
> > that you could get out of memory errors in the zlib which would make the dump fail. Ok!
> >
> >> http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.2/
> > Thanks for the clarifications and the changes in the new webrev.
> > Webrev.2 looks good to me.
> >
> > Cheers, Richard.
> >
> > -----Original Message-----
> > From: Schmelter, Ralf
> > Sent: Monday, 20 April 2020 10:14
> > To: Reingruber, Richard; Ioi Lam; Langer, Christoph; Yasumasa Suenaga; serguei.spitsyn at oracle.com; hotspot-runtime-dev at openjdk.java.net
> > Cc: serviceability-dev at openjdk.java.net
> > Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump
> >
> > Hi Richard,
> >
> > thanks for the review. I have incorporated your remarks into a new webrev:
> > http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.2/
> >
> > Some remarks to specific points:
> >
> >> ### src/hotspot/share/services/heapDumper.cpp
> >> 762: assert(_active, "Must be active");
> >>
> >> It appears to me that the assertion would fail, if an error occurred creating the CompressionBackend.
> > You are supposed to check for errors after creating the DumpWriter (which creates the CompressionBackend). And in case of an error, you directly destruct the object. I've added a comment to make that clear.
> >
> >> 767: I think _current->in_used doesn't take the final 9 bytes into account that are written in
> >> DumperSupport::end_of_dump() after the last dump segment has been finished.
> >> You could call get_new_buffer() instead of the if clause.
> > Wow, how did you find this? I've fixed it by making sure we flush the DumpWriter before calling the deactivate method.
> >
> >> 1064: DumpWriter::DumpWriter()
> >>
> >> There doesn't seem to be enough error handling if _buffer cannot be allocated.
> >> E.g. DumpWriter::write_raw() at line 1091 will enter an endless loop.
> > As described above, this will not happen if we check for errors after constructing the DumpWriter.
> >
> >> ### src/java.base/share/native/libzip/zip_util.c
> >> 1610: Will be hard to beat zlib_block_alloc() and zlib_block_free() performance wise. But have you
> >> measured the performance gain? In other words: is it worth it? :)
> > This is not done for performance, but to make sure the allocation will not fail midway during writing the dump. Maybe it is not worth it, though.
> >
> >> 1655: The result of deflateBound() seems to depend on the header comment, which is not given
> >> here. Could this be an issue, because ZIP_GZip_Fully() can take a comment?
> > I've added an additional 1024 bytes to avoid the problem.
> >
> >> ### test/lib/jdk/test/lib/hprof/parser/Reader.java
> >>
> >> 93: is the created GzipRandomAccess instance closed somewhere?
> > The object is not closed since it is still used by the Snapshot returned.
> >
> > Best regards,
> > Ralf
> >
> >
> > -----Original Message-----
> > From: Reingruber, Richard
> > Sent: Tuesday, 14 April 2020 10:30
> > To: Schmelter, Ralf; Ioi Lam; Langer, Christoph; Yasumasa Suenaga; serguei.spitsyn at oracle.com; hotspot-runtime-dev at openjdk.java.net
> > Cc: serviceability-dev at openjdk.java.net
> > Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump
> >
> > Hi Ralf,
> >
> > thanks for providing this enhancement to parallel gzip-compress heap dumps!
> >
> > I reckon it's safe to say that the coding is sophisticated. It would be awesome if you could sketch
> > the idea of how HeapDumper, DumpWriter and CompressionBackend work together to produce the gzipped
> > dump in a source code comment. Just enough to get started if somebody should ever have to track down
> > a bug -- an unlikely event, I know ;)
> >
> > Please find the details of my review below.
> >
> > Thanks, Richard.
> > // Not Reviewer
> >
> > --
> >
> > ### src/hotspot/share/services/diagnosticCommand.cpp
> >
> > 510 _gzip_level("-gz-level", "The compression level from 0 (store) to 9 (best) when writing in gzipped format.",
> > 511 "INT", "FALSE", "1") {
> >
> > "FALSE" should probably be false.
> >
> > ### src/hotspot/share/services/diagnosticCommand.hpp
> > Ok.
> >
> > ### src/hotspot/share/services/heapDumper.cpp
> >
> > 390: Typo: initized
> >
> > 415: Typo: GZipComressor
> >
> > 477: Could you please add a comment explaining how the "HPROF BLOCKSIZE" comment is helpful?
> >
> > 539: Member variables of WriteWork are missing the '_' prefix.
> >
> > 546: Just a comment: WriteWork::in_max is actually a compile time constant. Would be nice if it could be
> > declared so. One could use templates for this, but then my favourite ide (eclipse cdt) doesn't
> > show me references and call hierarchies anymore. So I don't think it is worth it.
> >
> > 591: Typo: Removes the first element. Returns NULL is empty.
> >
> > 663: _writer, _compressor, _lock could be const.
> >
> > 762: assert(_active, "Must be active");
> >
> > It appears to me that the assertion would fail, if an error occurred creating the CompressionBackend.
> >
> > 767: I think _current->in_used doesn't take the final 9 bytes into account that are written in
> > DumperSupport::end_of_dump() after the last dump segment has been finished.
> > You could call get_new_buffer() instead of the if clause.
> >
> > 903: Typo: Check if we don not waste more than _max_waste
> >
> > 1064: DumpWriter::DumpWriter()
> >
> > There doesn't seem to be enough error handling if _buffer cannot be allocated.
> > E.g. DumpWriter::write_raw() at line 1091 will enter an endless loop.
> >
> > 2409: A comment, why Shenandoah is not supported, would be good.
> > In general I'd say it is good and natural to use the GC work threads.
> >
> > ### src/hotspot/share/services/heapDumper.hpp
> > Ok.
> >
> > ### src/java.base/share/native/libzip/zip_util.c
> >
> > I'm not familiar with zlib, but here are my .02? :)
> >
> > 1610: Will be hard to beat zlib_block_alloc() and zlib_block_free() performance wise. But have you
> > measured the performance gain? In other words: is it worth it?
:)
> >
> > 1655: The result of deflateBound() seems to depend on the header comment, which is not given
> > here. Could this be an issue, because ZIP_GZip_Fully() can take a comment?
> >
> > 1658: deflateEnd() should not be called if deflateInit2Wrapper() failed. I think this can lead
> > otherwise to a double free() call.
> >
> > ### test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java
> >
> > 66: Maybe additionally check the exit value?
> >
> > 73: It's unclear to me, why this fails. Because the dump already exists? Because the level is
> > invalid? Reading the comment I'd expect success, not failure.
> >
> > ### test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTestEpsilon.java
> > Ok.
> >
> > ### test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTestShenandoah.java
> > Ok.
> >
> > ### test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTestZ.java
> > Ok.
> >
> > ### test/lib/jdk/test/lib/hprof/parser/GzipRandomAccess.java
> > Ok.
> >
> > ### test/lib/jdk/test/lib/hprof/parser/HprofReader.java
> > Ok.
> >
> > ### test/lib/jdk/test/lib/hprof/parser/Reader.java
> >
> > 93: is the created GzipRandomAccess instance closed somewhere?

From coleen.phillimore at oracle.com Wed Jun 3 15:49:38 2020
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Wed, 3 Jun 2020 11:49:38 -0400
Subject: RFR(XS): 8222005: ClassRedefinition crashes with: guarantee(false) failed: OLD and/or OBSOLETE method(s) found
In-Reply-To: <9b75fa4e-f579-e4a7-7996-bc307d001972@oracle.com>
References: <5942b42c-b9b3-f1d4-6c13-774649fca32b@oracle.com> <2f9aa92c-18f5-1203-1523-3c1fd9ba9ad1@oracle.com> <52ba0f0f-a705-2043-1c1d-15ba4a441aba@oracle.com> <31ca58d7-99ac-c53d-461f-680461fb5698@oracle.com> <9b75fa4e-f579-e4a7-7996-bc307d001972@oracle.com>
Message-ID:

On 5/28/20 5:44 PM, serguei.spitsyn at oracle.com wrote:
> Hi Coleen,
>
> Thank you a lot for reviewing this!
>
>
> On 5/28/20 12:48, coleen.phillimore at oracle.com wrote:
>> Hi Serguei,
>> Sorry for the delay reviewing this again.
>>
>> On 5/18/20 3:30 AM, serguei.spitsyn at oracle.com wrote:
>>> Hi Coleen and potential reviewers,
>>>
>>> Now, the webrev:
>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.2/
>>>
>>> has a complete fix for all three failure modes related to the guarantee about OLD and OBSOLETE methods.
>>>
>>> The root causes are the following optimizations:
>>>
>>>  1) Optimization based on the flag ik->is_being_redefined():
>>>     The problem is that the cpcache method entries of such classes are not being adjusted.
>>>     It is explained below in the initial RFR summary.
>>>     The fix is to get rid of this optimization.
>>
>> This seems like a good thing to do even though (actually especially because) I can't re-imagine the logic that went into this optimization.
>
> Probably, I've not explained it well enough.
> The logic was that the class marked as is_being_redefined was considered as being redefined in the current redefinition operation.
> For classes redefined in the current redefinition the cpcache is empty, so there is nothing to adjust.
> The problem is that classes can be marked as is_being_redefined by doit_prologue of one of the following redefinition operations.
> In such a case, the VM_RedefineClasses::CheckClass::do_klass fails with this guarantee.
> It is because the VM_RedefineClasses::CheckClass::do_klass does not have this optimization
> and does not skip such classes as the VM_RedefineClasses::AdjustAndCleanMetadata::do_class.
> Without this catch this issue could have unknown consequences in the future execution far away from the root cause.

Yes this makes sense. Two threads are redefining a set of classes in parallel, not at a safepoint:

t1: class A, B, C => marks them all as is_being_redefined
t2: class D, E, F => marks these as is_being_redefined

safepoint: classes A, B, C are finishing redefinition in doit() so have their Methods replaced, and with is_being_redefined set for D, E, F the optimization was skipping replacing their Methods. One of these classes D could have had a B::foo() in the vtable or cpCache.

crash in the check_classes!

>
>>>
>>>  2) Optimization for array classes based on the flag _has_redefined_Object.
>>>     The problem is that the vtable method entries are not adjusted for array classes.
>>>     The array classes have to be adjusted even if the java.lang.Object was redefined
>>>     by one of the previous VM_RedefineClasses operations, not only if it was redefined in
>>>     the current VM_RedefineClasses operation. The fix is to follow this requirement.
>>
>> This I can't understand. The redefinitions are serialized in safepoints, so why would you need to replace vtable entries for arrays if java.lang.Object isn't redefined in this safepoint?
> The VM_RedefineClasses::CheckClass::do_klass fails with the same guarantee because of this.
> It never fails this way with this optimization relaxed.
> I've already broken my head trying to understand it.
> It can be because of another bug we don't know yet.

Me neither but that's fine. Remove the optimization!

Coleen
>
>>>
>>>  3) Optimization based on the flag _has_null_class_loader which assumes that the Hotspot
>>>     does not support delegation from the bootstrap class loader to a user-defined class
>>>     loader. The assumption is that if the current class being redefined has a user-defined
>>>     classloader as its defining class loader, then all classes loaded by the bootstrap
>>>     class loader can be skipped for vtable/itable method entries adjustment.
>>>     The problem is that this assumption is not really correct. There are classes that
>>>     still need the adjustment. For instance, the class java.util.IdentityHashMap$KeyIterator
>>>     loaded by the bootstrap class loader has the vtable/itable references to the method:
>>>       java.util.Iterator.forEachRemaining(java.util.function.Consumer)
>>>     The class java.util.Iterator is defined by a user-defined class loader.
>>>     The fix is to get rid of this optimization.
>>
>> Also with this optimization, I'm not sure what the logic was that determined that this was safe, so it's best to remove it. Above makes sense.
>
> I don't know the full theory behind this optimization. We only have a comment.
>
>
>>> All three failure modes are observed with the -Xcomp flag.
>>> With all three fixes above in place, the Kitchensink does not fail with this guarantee anymore.
>>
>>
>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.2/src/hotspot/share/oops/cpCache.cpp.udiff.html
>>
>> For logging, the log_trace function will also repeat the 'if' statement and not allocate the external_name() if logging isn't specified, so you don't need the 'if' statement above.
>>
>> + if (log_is_enabled(Trace, redefine, class, update)) {
>> + log_trace(redefine, class, update, constantpool)
>> + ("cpc %s entry update: %s", entry_type, new_method->external_name());
>>
>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.2/src/hotspot/share/oops/klassVtable.cpp.udiff.html
>>
>> Same in two cases here, and you could move the ResourceMark outside the loop at the top.
>
> Good suggestions, taken.
>
> Thanks!
> Serguei
>
>>
>> Thanks,
>> Coleen
>>>
>>> There is still a JIT compiler related failure:
>>> https://bugs.openjdk.java.net/browse/JDK-8245128
>>>     Kitchensink fails with: assert(destination == (address)-1 || destination == entry) failed: b) MT-unsafe modification of inline cache
>>>
>>> I also saw this failure but just once:
>>> https://bugs.openjdk.java.net/browse/JDK-8245126
>>>     Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 5/15/20 15:14, serguei.spitsyn at oracle.com wrote:
>>>> Hi Coleen,
>>>>
>>>> Thanks a lot for the review!
>>>> Good suggestion, will use it.
>>>>
>>>> In fact, I've found two more related problems with the same guarantee.
>>>> One is with vtable method entries adjustment and another with itable.
>>>> This webrev version includes a fix for the vtable related issue:
>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.2/
>>>>
>>>> I'm still investigating the itable related issue.
>>>>
>>>> It is interesting that the Kitchensink with Instrumentation modules enabled is like a Pandora's box full of surprises.
>>>> New problems are getting discovered after some road blocks are removed.
>>>> I've just filed a couple of compiler bugs discovered in this mode of testing:
>>>> https://bugs.openjdk.java.net/browse/JDK-8245126
>>>>     Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8245128
>>>>     Kitchensink fails with: assert(destination == (address)-1 || destination == entry) failed: b) MT-unsafe modification of inline cache
>>>>
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>
>>>> On 5/15/20 05:12, coleen.phillimore at oracle.com wrote:
>>>>>
>>>>> Serguei,
>>>>>
>>>>> Good find!! The fix looks good. I'm sure the optimization wasn't noticeable and thank you for the additional comments.
>>>>>
>>>>> There is a Method::external_name() function that I believe prints all the things you want in the logging here:
>>>>>
>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.1/src/hotspot/share/oops/cpCache.cpp.udiff.html
>>>>>
>>>>> I don't need to see another webrev if you make this change.
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>> On 5/14/20 12:26 PM, serguei.spitsyn at oracle.com wrote:
>>>>>> Please, review a fix for The Kitchensink bug:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8222005
>>>>>>
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.1/
>>>>>>
>>>>>> Summary:
>>>>>>   The VM_RedefineClasses::doit() uses two helper classes to walk all VM classes.
>>>>>>   First is AdjustAndCleanMetadata to adjust method entries in the vtables/itables/cpcaches.
>>>>>>   Second is CheckClass to check that adjustments for all method entries are correct.
>>>>>>   The Kitchensink test is failing with two modes:
>>>>>>     - guarantee(false) failed: OLD and/or OBSOLETE method(s) found in the
>>>>>>       VM_RedefineClasses::CheckClass::do_klass()
>>>>>>     - SIGSEGV in the ConstantPoolCacheEntry::get_interesting_method_entry() in context
>>>>>>       of VM_RedefineClasses::CheckClass::do_klass() execution
>>>>>>
>>>>>>   The second failure mode is rare. It is before the first one in the code path.
>>>>>>   The root cause of both is that the VM_RedefineClasses::AdjustAndCleanMetadata::do_klass()
>>>>>>   is skipping the cpcache update for classes that are being redefined assuming they are
>>>>>>   being redefined by the current VM_RedefineClasses operation. In such cases, the adjustment
>>>>>>   is not needed as the cpcache is empty. The problem is that the assumption above is wrong.
>>>>>>   The class can also be redefined by another VM_RedefineClasses operation which has already
>>>>>>   executed its doit_prologue. The cpcache adjustment for such a class is necessary.
>>>>>>   The fix is to always call the cp_cache->adjust_method_entries() even if the class is
>>>>>>   being redefined by the current VM_RedefineClasses operation. It is possible to skip it
>>>>>>   but it will add extra complexity to the code.
The fix also includes minor tweak in the cpCache.cpp to include >>>>>> method's class name to >>>>>> ? the redefinition cpcache log. >>>>>> >>>>>> Testing: >>>>>> ? Ran Kitchensink test locally on a Linux server with the >>>>>> Instrumentation module enabled. >>>>>> ? The test does not fail anymore. >>>>>> ? In progress, a mach5 tiers 1-5 and runs and separate mach5 >>>>>> Kitchensink run. >>>>>> >>>>>> Thanks, >>>>>> Serguei >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From coleen.phillimore at oracle.com Wed Jun 3 15:50:08 2020 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 3 Jun 2020 11:50:08 -0400 Subject: RFR(XS): 8222005: ClassRedefinition crashes with: guarantee(false) failed: OLD and/or OBSOLETE method(s) found In-Reply-To: <3a497901-7a05-e87a-33e6-6f1011c32b8b@oracle.com> References: <5942b42c-b9b3-f1d4-6c13-774649fca32b@oracle.com> <2f9aa92c-18f5-1203-1523-3c1fd9ba9ad1@oracle.com> <52ba0f0f-a705-2043-1c1d-15ba4a441aba@oracle.com> <31ca58d7-99ac-c53d-461f-680461fb5698@oracle.com> <9b75fa4e-f579-e4a7-7996-bc307d001972@oracle.com> <3a497901-7a05-e87a-33e6-6f1011c32b8b@oracle.com> Message-ID: <94e924e7-bbb6-5686-4cac-abd7e7d57f2e@oracle.com> Hi Serguei, This change looks great.? Thank you for fixing this! Coleen On 5/28/20 7:16 PM, serguei.spitsyn at oracle.com wrote: > Hi Coleen, > > The updated webrev version is: > http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.3/ > > It has your suggestions addressed: > ?- remove log_is_enabled conditions > ?- move ResourceMark's out of loops > > Thanks, > Serguei > > > On 5/28/20 14:44, serguei.spitsyn at oracle.com wrote: >> Hi Coleen, >> >> Thank you a lot for reviewing this! >> >> >> On 5/28/20 12:48, coleen.phillimore at oracle.com wrote: >>> Hi Serguei, >>> Sorry for the delay reviewing this again. 
>>> >>> On 5/18/20 3:30 AM, serguei.spitsyn at oracle.com wrote: >>>> Hi Coleen and potential reviewers, >>>> >>>> Now, the webrev: >>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.2/ >>>> >>>> has a complete fix for all three failure modes related to the >>>> guarantee about OLD and OBSOLETE methods. >>>> >>>> The root cause are the following optimizations: >>>> >>>> ?1) Optimization based on the flag ik->is_being_redefined(): >>>> ??? The problem is that the cpcache method entries of such classes >>>> are not being adjusted. >>>> ??? It is explained below in the initial RFR summary. >>>> ??? The fix is to get rid of this optimization. >>> >>> This seems like a good thing to do even though (actually especially >>> because) I can't re-imagine the logic that went into this optimization. >> >> Probably, I've not explained it well enough. >> The logic was that the class marked as is_being_redefined was >> considered as being redefined in the current redefinition operation. >> For classes redefined in current redefinition the cpcache is empty, >> so there is? nothing to adjust. >> The problem is that classes can be marked as is_being_redefined by >> doit_prologue of one of the following redefinition operations. >> In such a case, the VM_RedefineClasses::CheckClass::do_klass fails >> with this guarantee. >> It is because the VM_RedefineClasses::CheckClass::do_klass does not >> have this optimization >> and does not skip such classes as the >> VM_RedefineClasses::AdjustAndCleanMetadata::do_class. >> Without this catch this issue could have unknown consequences in the >> future execution far away from the root cause. >> >>>> >>>> ?2) Optimization for array classes based on the flag >>>> _has_redefined_Object. >>>> ??? The problem is that the vtable method entries are not adjusted >>>> for array classes. >>>> ??? The array classes have to be adjusted even if the >>>> java.lang.Object was redefined >>>> ??? 
by one of previous VM_RedefineClasses operation, not only if it >>>> was redefined in >>>> ??? the current VM_RedefineClasses operation. The fix is is follow >>>> this requirement. >>> >>> This I can't understand.? The redefinitions are serialized in >>> safepoints, so why would you need to replace vtable entries for >>> arrays if java.lang.Object isn't redefined in this safepoint? >> The VM_RedefineClasses::CheckClass::do_klass fails with the same >> guarantee because of this. >> It never fails this way with this optimization relaxed. >> I've already broke my head trying to understand it. >> It can be because of another bug we don't know yet. >> >>>> >>>> ?3) Optimization based on the flag _has_null_class_loader which >>>> assumes that the Hotspot >>>> ??? does not support delegation from the bootstrap class loader to >>>> auser-defined class >>>> ? ? loader.The assumption is that if the current class being >>>> redefined has a user-defined >>>> ??? classloader as its defining class loader, then allclasses >>>> loaded by the bootstrap >>>> ? ? class loader can be skipped for vtable/itable method entries >>>> adjustment. >>>> ??? The problem is that this assumption is not really correct. >>>> There are classes that >>>> ??? still need the adjustment. For instance, the class >>>> java.util.IdentityHashMap$KeyIterator >>>> ??? loaded by the bootstrap class loader has the vtable/itable >>>> references to the method: >>>> java.util.Iterator.forEachRemaining(java.util.function.Consumer) >>>> ??? The class java.util.Iterator is defined by a user-defined class >>>> loader. >>>> ??? The fix is to get rid of this optimization. >>> >>> Also with this optimization, I'm not sure what the logic was that >>> determined that this was safe, so it's best to remove it.? Above >>> makes sense. >> >> I don't know the full theory behind this optimization. We only have a >> comment. >> >> >>>> All three failure modes are observed with the -Xcomp flag. 
>>>> With all three fixes above in place, the Kitchensink does not fail >>>> with this guarantee anymore. >>> >>> >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.2/src/hotspot/share/oops/cpCache.cpp.udiff.html >>> >>> For logging, the log_trace function will also repeat the 'if' >>> statement and not allocate the external_name() if logging isn't >>> specified, so you don't need the 'if' statement above. >>> >>> + if (log_is_enabled(Trace, redefine, class, update)) { >>> + log_trace(redefine, class, update, constantpool) >>> + ("cpc %s entry update: %s", entry_type, new_method->external_name()); >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.2/src/hotspot/share/oops/klassVtable.cpp.udiff.html >>> >>> Same in two cases here, and you could move the ResourceMark outside >>> the loop at the top. >> >> Good suggestions, taken. >> >> Thanks! >> Serguei >> >>> >>> Thanks, >>> Coleen >>>> >>>> There is still a JIT compiler relted failure: >>>> https://bugs.openjdk.java.net/browse/JDK-8245128 >>>> ??? Kitchensink fails with: assert(destination == (address)-1 || >>>> destination == entry) failed: b) MT-unsafe modification of inline cache >>>> >>>> I also saw this failure but just once: >>>> https://bugs.openjdk.java.net/browse/JDK-8245126 >>>> ??? Kitchensink fails with: assert(!method->is_old()) failed: >>>> Should not be installing old methods >>>> >>>> Thanks, >>>> Serguei >>>> >>>> >>>> On 5/15/20 15:14, serguei.spitsyn at oracle.com wrote: >>>>> Hi Coleen, >>>>> >>>>> Thanks a lot for review! >>>>> Good suggestion, will use it. >>>>> >>>>> In fact, I've found two more related problems with the same guarantee. >>>>> One is with vtable method entries adjustment and another with itable. >>>>> This webrev version includes a fix for the vtable related issue: >>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.2/ >>>>> >>>>> I'm still investigating the itable related issue. 
>>>>> >>>>> It is interesting that the Kitchensink with Instrumentation >>>>> modules enabled is like a Pandora box full of surprises. >>>>> New problems are getting discovered after some road blocks are >>>>> removed. >>>>> I've just filed a couple of compiler bugs discovered in this mode >>>>> of testing: >>>>> https://bugs.openjdk.java.net/browse/JDK-8245126 >>>>> ??? Kitchensink fails with: assert(!method->is_old()) failed: >>>>> Should not be installing old methods >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8245128 >>>>> ??? Kitchensink fails with: assert(destination == (address)-1 || >>>>> destination == entry) failed: b) MT-unsafe modification of inline >>>>> cache >>>>> >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>> >>>>> On 5/15/20 05:12, coleen.phillimore at oracle.com wrote: >>>>>> >>>>>> Serguei, >>>>>> >>>>>> Good find!!? The fix looks good.? I'm sure the optimization >>>>>> wasn't noticeable and thank you for the additional comments. >>>>>> >>>>>> There is a Method::external_name() function that I believe prints >>>>>> all the things you want in the logging here: >>>>>> >>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.1/src/hotspot/share/oops/cpCache.cpp.udiff.html >>>>>> >>>>>> I don't need to see another webrev if you make this change. >>>>>> >>>>>> Thanks, >>>>>> Coleen >>>>>> >>>>>> On 5/14/20 12:26 PM, serguei.spitsyn at oracle.com wrote: >>>>>>> Please, review a fix for The Kitchensink bug: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8222005 >>>>>>> >>>>>>> Webrev: >>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.1/ >>>>>>> >>>>>>> Summary: >>>>>>> ? The VM_RedefineClasses::doit() uses two helper classes to walk >>>>>>> all VM classes. >>>>>>> ? First is AdjustAndCleanMetadata to adjust method entries in >>>>>>> the vtables/itables/cpcaches. >>>>>>> ? Second is CheckClass to check that adjustments for all method >>>>>>> entries are correct. >>>>>>> ? 
The Kitchensink test is failing in two modes:
>>>>>>>     - guarantee(false) failed: OLD and/or OBSOLETE method(s) found in the
>>>>>>>       VM_RedefineClasses::CheckClass::do_klass()
>>>>>>>     - SIGSEGV in the ConstantPoolCacheEntry::get_interesting_method_entry() in context
>>>>>>>       of VM_RedefineClasses::CheckClass::do_klass() execution
>>>>>>>
>>>>>>>   The second failure mode is rare. It is before the first one in the code path.
>>>>>>>   The root cause of both is that the VM_RedefineClasses::AdjustAndCleanMetadata::do_klass()
>>>>>>>   is skipping the cpcache update for classes that are being redefined assuming they are
>>>>>>>   being redefined by the current VM_RedefineClasses operation. In such cases, the adjustment
>>>>>>>   is not needed as the cpcache is empty. The problem is that the assumption above is wrong.
>>>>>>>   The class can also be redefined by another VM_RedefineClasses operation which has already
>>>>>>>   executed its doit_prologue. The cpcache adjustment for such a class is necessary.
>>>>>>>   The fix is to always call the cp_cache->adjust_method_entries() even if the class is
>>>>>>>   being redefined by the current VM_RedefineClasses operation. It is possible to skip it
>>>>>>>   but it will add extra complexity to the code.
>>>>>>>   The fix also includes a minor tweak in cpCache.cpp to include the method's class name in
>>>>>>>   the redefinition cpcache log.
>>>>>>>
>>>>>>> Testing:
>>>>>>>   Ran Kitchensink test locally on a Linux server with the Instrumentation module enabled.
>>>>>>>   The test does not fail anymore.
>>>>>>>   In progress: mach5 tiers 1-5 runs and a separate mach5 Kitchensink run.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>
>>>>>
>>>>
>>>
>>
>
From jianglizhou at google.com Wed Jun 3 16:34:28 2020
From: jianglizhou at google.com (Jiangli Zhou)
Date: Wed, 3 Jun 2020 09:34:28 -0700
Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime
In-Reply-To: <211756df-4667-1e76-936d-ca8e72c13bf1@oracle.com>
References: <8fe912f1-8407-df1d-0c1f-cf37f08363db@oracle.com> <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <91767eae-4486-8ceb-5452-c6afa4178af1@oracle.com> <211756df-4667-1e76-936d-ca8e72c13bf1@oracle.com>
Message-ID:

Hi Ioi,

Monitoring agents are always enabled in cloud production environments. The costs for agents are constant and always exist. The main motivation for the CDS work during the last several years was for cloud environments. Could you please explain why you think CDS should not be used for startup saving with JVMTI agents in cloud? Or should this and related future optimizations not be enabled in that case?

The majority of the Java startup improvement since OpenJDK 9 was achieved by small incremental improvements. Each such change has been a small saving only. Some of them were small enough to be measurable only by instruction counts. However, they were all worth the work and have been submitted to OpenJDK. As a result, we are seeing a good total startup improvement today with CDS enabled.

This change is no exception. Even though the saving is small, it should still be done. Although I don't have data with an agent enabled, I have provided performance data for before and after the change since the very beginning. In addition, I have also explained a few times that this change enables future optimizations for a more general class pre-initialization approach. This is an important step for future work. So doing it right is crucial.
Regards, Jiangli On Tue, Jun 2, 2020 at 9:34 PM Ioi Lam wrote: > > Hi Jiangli, > > Before we spend time on the CSR review, do you have any data that shows > the actual benefit of doing this? I am specifically asking about the > benefit to JVMTI agents. > > As I mentioned before, there's an alternative, which is to not use the > optimization when JVMTI is enabled. I don't think we should spend time > worrying about the impact to JVMTI agents unless there's a compelling > reasons to do so. > > Thanks > - Ioi > > > > On 6/2/20 5:19 PM, Jiangli Zhou wrote: > > Here is the CSR: https://bugs.openjdk.java.net/browse/JDK-8246289. > > > > David, I described that the JVM spec allows for eager or lazy linking > > and agents shouldn't rely on the timing/ordering, as you suggested. > > Please review the CSR. It's been a while since I've worked on a CSR, > > could you please remind me if the CSR should be proposed before > > reviewing? I can revert it to draft state if draft is the correct > > state before reviewing. Thanks! > > > > Best regards, > > Jiangli > > > From serguei.spitsyn at oracle.com Wed Jun 3 16:52:28 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 3 Jun 2020 09:52:28 -0700 Subject: RFR(XS): 8222005: ClassRedefinition crashes with: guarantee(false) failed: OLD and/or OBSOLETE method(s) found In-Reply-To: <94e924e7-bbb6-5686-4cac-abd7e7d57f2e@oracle.com> References: <5942b42c-b9b3-f1d4-6c13-774649fca32b@oracle.com> <2f9aa92c-18f5-1203-1523-3c1fd9ba9ad1@oracle.com> <52ba0f0f-a705-2043-1c1d-15ba4a441aba@oracle.com> <31ca58d7-99ac-c53d-461f-680461fb5698@oracle.com> <9b75fa4e-f579-e4a7-7996-bc307d001972@oracle.com> <3a497901-7a05-e87a-33e6-6f1011c32b8b@oracle.com> <94e924e7-bbb6-5686-4cac-abd7e7d57f2e@oracle.com> Message-ID: An HTML attachment was scrubbed... 
URL: From serguei.spitsyn at oracle.com Wed Jun 3 17:06:59 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 3 Jun 2020 10:06:59 -0700 Subject: RFR(XS): 8245126 Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods In-Reply-To: <64694163-fdd5-5ccb-3ffb-2027b05a3719@oracle.com> References: <62e34586-eaac-3200-8f5a-ee12ad654afa@oracle.com> <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> <42d73a82-b70e-1a0d-312d-303409840392@oracle.com> <64694163-fdd5-5ccb-3ffb-2027b05a3719@oracle.com> Message-ID: An HTML attachment was scrubbed... URL: From chris.plummer at oracle.com Wed Jun 3 18:40:10 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 3 Jun 2020 11:40:10 -0700 Subject: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently In-Reply-To: <814ceeca-d700-500e-de82-7d5fd3c05192@oracle.com> References: <814ceeca-d700-500e-de82-7d5fd3c05192@oracle.com> Message-ID: <9ded6af1-d7fc-775a-eb66-b66f3147186e@oracle.com> An HTML attachment was scrubbed... URL: From ioi.lam at oracle.com Wed Jun 3 18:46:02 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Wed, 3 Jun 2020 11:46:02 -0700 Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime In-Reply-To: References: <8fe912f1-8407-df1d-0c1f-cf37f08363db@oracle.com> <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <91767eae-4486-8ceb-5452-c6afa4178af1@oracle.com> <211756df-4667-1e76-936d-ca8e72c13bf1@oracle.com> Message-ID: Hi Jiangli, My point is, if we are introducing a behavioral change in an optimization, it should be carefully reviewed. That's a rule we have followed all along. 
Can we first push your change without the JVMTI behavioral change, and discuss the JVMTI behavioral change separately? If I understand your proposal correctly, in this RFE, we will first set boot classes to "linked" during restore_unsharable_info(). Subsequently, we will do the same thing for other classes. And after that, we will set classes to 'initialized' during restore_unsharable_info(). If that's the proposal, then we will be introducing more and more behavioral changes that are not only observable by JVMTI but also by Java code. I think we should discuss all these together to see if this is indeed the direction we want to take. Project Leyden is the right forum. Thanks - Ioi On 6/3/20 9:34 AM, Jiangli Zhou wrote: > Hi Ioi, > > Monitoring agents are alway enabled in cloud production environments. > The costs for agents are constant and always exist. The main > motivation for the CDS work during the last several years was for > cloud environments. Could you please explain why you think CDS should > not be used for startup saving with JVMTI agents in cloud? Or, this > and related future optimizations should not be enabled in that case? > > Majority of the Java startup improvement since OpenJDK 9 was achieved > by small incremental improvements. Each such change has been a small > saving only. Some of them were small enough and only measurable by > instruction counts. However they were all worth the work and have been > submitted to OpenJDK. As a result, we are seeing a good total startup > improvement today with CDS enabled. > > This change is no exception. Even the saving is small, but it still > should be done. Although I don't have data with agent enabled, I have > provided performance data for before and after the change since the > very beginning. In addition, I have also explained a few times that > this change enables future optimizations for more general class > pre-initialization approach. This is an important step for future > work. 
So doing it right is crucial. > > Regards, > Jiangli > > On Tue, Jun 2, 2020 at 9:34 PM Ioi Lam wrote: >> Hi Jiangli, >> >> Before we spend time on the CSR review, do you have any data that shows >> the actual benefit of doing this? I am specifically asking about the >> benefit to JVMTI agents. >> >> As I mentioned before, there's an alternative, which is to not use the >> optimization when JVMTI is enabled. I don't think we should spend time >> worrying about the impact to JVMTI agents unless there's a compelling >> reasons to do so. >> >> Thanks >> - Ioi >> >> >> >> On 6/2/20 5:19 PM, Jiangli Zhou wrote: >>> Here is the CSR: https://bugs.openjdk.java.net/browse/JDK-8246289. >>> >>> David, I described that the JVM spec allows for eager or lazy linking >>> and agents shouldn't rely on the timing/ordering, as you suggested. >>> Please review the CSR. It's been a while since I've worked on a CSR, >>> could you please remind me if the CSR should be proposed before >>> reviewing? I can revert it to draft state if draft is the correct >>> state before reviewing. Thanks! >>> >>> Best regards, >>> Jiangli >>> From daniil.x.titov at oracle.com Wed Jun 3 19:08:14 2020 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Wed, 03 Jun 2020 12:08:14 -0700 Subject: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently In-Reply-To: <9ded6af1-d7fc-775a-eb66-b66f3147186e@oracle.com> References: <814ceeca-d700-500e-de82-7d5fd3c05192@oracle.com> <9ded6af1-d7fc-775a-eb66-b66f3147186e@oracle.com> Message-ID: Hi Chris, I was not able to reproduce the original issue anymore in Mach5. However, the test itself has a potential for a deadlock (that was also reported) and in the proposed change we fix it. ?The log still should be printed and the expectation is that we will be able to see the underlying problem in it if it ever reproduced. 
I could create a separate bug (I'm not sure a subtask is a good fit here, since the change fixes a problem in the test itself) and close the current one as not reproducible, if you think that is a better approach.

Regarding the Thread.suspend() and Thread.resume() methods: the test also checks the thread state after these methods are invoked, and since these deprecated methods are still in the API I don't think we should exclude them from being tested.

Best regards,
Daniil

From: Chris Plummer
Date: Wednesday, June 3, 2020 at 11:40 AM
To: David Holmes, Daniil Titov, serviceability-dev
Subject: Re: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently

On 6/1/20 12:10 AM, David Holmes wrote:

Hi Daniil,

On 30/05/2020 10:07 am, Daniil Titov wrote:

Please review a change [1] that fixes an intermittent test timeout.

The main logic of the test has this basic structure:

    try {
        // lots of thread state manipulation of target
    } finally {
        thread.getLog();
    }

and as David noticed in his comment (the last comment in [2]) if an exception occurs anywhere in the try block we can hang waiting for the join() in getLog() because we haven't executed the logic that tells the thread to terminate.

So the fix puts a timeout on the join() which means the test will no longer time out but it will still fail when whatever was leading to the timeout now happens. So as a diagnostic fix this seems fine. Hopefully the logger will show what we need to see and determine the real underlying problem.

If this change is really just diagnostic in nature, then it should be a subtask. However, it seems to me it will actually hide the failure. The test won't get a timeout and won't print the log. Am I missing something?

Also, after reading through the bug comments it looks like the getLog()/join() timeout issue is different from the main issue that caused the CR to be filed in the first place.
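The try/finally hang described above, and the effect of bounding the join(), can be sketched in plain Java. This is only an illustration under stated assumptions: `BoundedJoinSketch`, `Worker`, `requestStop()`, `awaitExit()` and `runScenario()` are hypothetical names, not code from ThreadMXBeanStateTest.java or from the webrev.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical sketch; not the actual ThreadMXBeanStateTest code. */
final class BoundedJoinSketch {

    /** Stand-in worker that only exits once the test logic tells it to. */
    static final class Worker extends Thread {
        private final CountDownLatch stop = new CountDownLatch(1);

        Worker() {
            setDaemon(true); // a stuck worker must not keep the JVM alive
        }

        @Override
        public void run() {
            try {
                stop.await(); // blocks until requestStop() is called
            } catch (InterruptedException ignored) {
                // fall through and exit
            }
        }

        void requestStop() {
            stop.countDown();
        }

        /** Waits at most timeoutMillis (must be > 0); true only if the thread has terminated. */
        boolean awaitExit(long timeoutMillis) {
            try {
                join(timeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return !isAlive();
        }
    }

    /**
     * Mimics the test structure: if the try block fails before the worker is
     * told to stop, an unbounded join() in finally would hang forever, while
     * the bounded join returns after timeoutMillis and reports the problem.
     */
    static boolean runScenario(boolean failEarly, long timeoutMillis) {
        Worker worker = new Worker();
        worker.start();
        boolean exited = false;
        try {
            if (failEarly) {
                throw new IllegalStateException("simulated failure in the try block");
            }
            worker.requestStop();
        } catch (IllegalStateException e) {
            // Swallowed only so the sketch can return a result; a real test would let it propagate.
        } finally {
            exited = worker.awaitExit(timeoutMillis);
        }
        return exited;
    }

    public static void main(String[] args) {
        System.out.println("normal path, worker exited: " + runScenario(false, 2000));
        System.out.println("failure path, worker exited: " + runScenario(true, 200));
    }
}
```

Returning the outcome of the bounded wait lets the caller fail explicitly and dump whatever log is available, which is one possible way to address the concern that a silent timeout would hide the failure.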
Comments regarding the initial problem are: "According to the stack trace the test seems to hang on trying to load the 'java.lang.Math' class concurrently. " "Need to see some native stacks to understand why the classloading thread is not proceeding even though RUNNABLE." "I should have looked at the test first - it uses Thread.suspend and Thread.resume and so is inherently deadlock prone." Does this issue no longer exist, or have we decided that since the test is expected to be deadlock prone to just ignore it. thanks, Chris Thanks, David ----- Testing: Running a modified test that explicitly throws a runtime exception inside the try block shows the fix solves the problem. Mach5 tier1-tier3 tests passed. Mach5 tier4-tier5 tests are in progress. [1] http://cr.openjdk.java.net/~dtitov/8081652/webrev.01/ [2] https://bugs.openjdk.java.net/browse/JDK-8081652 Thank you, Daniil -------------- next part -------------- An HTML attachment was scrubbed... URL: From jianglizhou at google.com Wed Jun 3 19:29:49 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Wed, 3 Jun 2020 12:29:49 -0700 Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime In-Reply-To: References: <8fe912f1-8407-df1d-0c1f-cf37f08363db@oracle.com> <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <91767eae-4486-8ceb-5452-c6afa4178af1@oracle.com> <211756df-4667-1e76-936d-ca8e72c13bf1@oracle.com> Message-ID: On Wed, Jun 3, 2020 at 11:48 AM Ioi Lam wrote: > > Hi Jiangli, > > My point is, if we are introducing a behavioral change in an > optimization, it should be carefully reviewed. That's a rule we have > followed all along. > Totally in agreement. It's another reason why David's suggestion of creating the CSR is a great idea. Please see my analysis about boot class loading in the previous email. 
> Can we first push your change without the JVMTI behavioral change, and
> discuss the JVMTI behavioral change separately?
>

I've already waited half a year, so I can wait a few more days. Let's get this class_prepare event sorted out first, so we set the right tone for future optimizations.

> If I understand your proposal correctly, in this RFE, we will first set
> boot classes to "linked" during restore_unsharable_info(). Subsequently,
> we will do the same thing for other classes. And after that, we will set
> classes to 'initialized' during restore_unsharable_info().

That's what I think would be the right thing to do, and I plan to do so. Let's focus on the 'linked' state for the archived boot classes in the current scope, otherwise we will not be able to make any meaningful progress.

>
> If that's the proposal, then we will be introducing more and more
> behavioral changes that are not only observable by JVMTI but also by
> Java code. I think we should discuss all these together to see if this
> is indeed the direction we want to take. Project Leyden is the right forum.

Sorry to repeat myself: the class pre-initialization should be discussed as part of Leyden. The 'linked' state for the archived boot classes can be discussed and moved forward now.

Thanks!
Jiangli

>
> Thanks
> - Ioi
>
> On 6/3/20 9:34 AM, Jiangli Zhou wrote:
> > Hi Ioi,
> >
> > Monitoring agents are always enabled in cloud production environments.
> > The costs for agents are constant and always exist. The main
> > motivation for the CDS work during the last several years was for
> > cloud environments. Could you please explain why you think CDS should
> > not be used for startup saving with JVMTI agents in cloud? Or, this
> > and related future optimizations should not be enabled in that case?
> >
> > Majority of the Java startup improvement since OpenJDK 9 was achieved
> > by small incremental improvements. Each such change has been a small
> > saving only.
Some of them were small enough and only measurable by > > instruction counts. However they were all worth the work and have been > > submitted to OpenJDK. As a result, we are seeing a good total startup > > improvement today with CDS enabled. > > > > This change is no exception. Even the saving is small, but it still > > should be done. Although I don't have data with agent enabled, I have > > provided performance data for before and after the change since the > > very beginning. In addition, I have also explained a few times that > > this change enables future optimizations for more general class > > pre-initialization approach. This is an important step for future > > work. So doing it right is crucial. > > > > Regards, > > Jiangli > > > > On Tue, Jun 2, 2020 at 9:34 PM Ioi Lam wrote: > >> Hi Jiangli, > >> > >> Before we spend time on the CSR review, do you have any data that shows > >> the actual benefit of doing this? I am specifically asking about the > >> benefit to JVMTI agents. > >> > >> As I mentioned before, there's an alternative, which is to not use the > >> optimization when JVMTI is enabled. I don't think we should spend time > >> worrying about the impact to JVMTI agents unless there's a compelling > >> reasons to do so. > >> > >> Thanks > >> - Ioi > >> > >> > >> > >> On 6/2/20 5:19 PM, Jiangli Zhou wrote: > >>> Here is the CSR: https://bugs.openjdk.java.net/browse/JDK-8246289. > >>> > >>> David, I described that the JVM spec allows for eager or lazy linking > >>> and agents shouldn't rely on the timing/ordering, as you suggested. > >>> Please review the CSR. It's been a while since I've worked on a CSR, > >>> could you please remind me if the CSR should be proposed before > >>> reviewing? I can revert it to draft state if draft is the correct > >>> state before reviewing. Thanks! 
> >>> > >>> Best regards, > >>> Jiangli > >>> > From chris.plummer at oracle.com Wed Jun 3 19:46:14 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 3 Jun 2020 12:46:14 -0700 Subject: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently In-Reply-To: References: <814ceeca-d700-500e-de82-7d5fd3c05192@oracle.com> <9ded6af1-d7fc-775a-eb66-b66f3147186e@oracle.com> Message-ID: <2aed6958-0f12-f632-ebaa-80ddb732b5ec@oracle.com> An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Wed Jun 3 20:41:32 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 3 Jun 2020 13:41:32 -0700 Subject: RFR(XS): 8234882: JVM TI StopThread should only allow ThreadDeath In-Reply-To: <1190375b-d7da-47c4-61d4-121f4d0ba33a@oracle.com> References: <12cd04f9-c3f9-654f-fff2-1c4e315b6eeb@oracle.com> <3feb9c3f-4f61-f4b7-160f-c6b328305111@oracle.com> <40f21609-f086-722a-1af4-3f281c9b8963@oracle.com> <7b272791-4c47-27b0-9313-391a9e620295@oracle.com> <38db06ac-6e4e-029a-9376-ee577afe64d7@oracle.com> <2ce42985-9325-1c74-fa8d-c2a5049ec011@oracle.com> <0f1ec272-4410-f7e5-1c11-1238c0079b00@oracle.com> <3120b170-8d0f-7915-7224-f44523bdae6e@oracle.com> <586c3878-d175-2f8e-6ce8-95a187965de6@oracle.com> <2586bb75-f560-f905-1937-b778b7faba59@oracle.com> <6ebc70ce-787d-7f13-66f4-14ad8c8102d6@oracle.com> <25f4a64a-10ca-2695-6748-ccd24d84ef22@oracle.com> <1190375b-d7da-47c4-61d4-121f4d0ba33a@oracle.com> Message-ID: <5acfaf13-b3c1-d854-28a7-378e0bb5926e@oracle.com> Hi David, The JetBrains confirmed: ? Ability to select the exception is a valuable feature they provide. ? Throwing only ThreadDeath is almost useless. So, should I close this and related JDI/JDWP enhancements as WNF? Thanks, Serguei On 6/1/20 08:30, serguei.spitsyn at oracle.com wrote: > Hi David, > > I'll check with JetBrains on this. > Thank you to Dan and you for raising this concern. 
> The JetBrains use case you posted in the CSR looks like valid and useful. > > Thanks, > Serguei > > > On 6/1/20 00:46, David Holmes wrote: >> Hi Serguei, >> >> Sorry, I think we have to re-think this change. As Dan flags in the >> CSR request debuggers directly expose this API as part of the >> debugger interface, so any change here will directly impact those >> tools. At a minimum I think we would need to consult with the tool >> developers about the impact of making this change, as well as whether >> it makes any practical difference in the sense that there may be >> other (less convenient but still available) mechanisms to achieve the >> same goal in a debugger or agent. >> >> David >> >> On 31/05/2020 5:50 pm, serguei.spitsyn at oracle.com wrote: >>> Hi David, >>> >>> Also jumping to end. >>> >>> On 5/30/20 06:50, David Holmes wrote: >>>> Hi Serguei, >>>> >>>> Jumping to the end for now ... >>>> >>>> On 30/05/2020 5:50 am, serguei.spitsyn at oracle.com wrote: >>>>> Hi David and reviewers, >>>>> >>>>> The updated webrev version is: >>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.2/src/ >>>>> >>>>> >>>>> This update adds testing that StopThread can return >>>>> JVMTI_ERROR_INVALID_OBJECT error code. >>>>> >>>>> The JVM TI StopThread spec is: >>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.2/docs/specs/jvmti.html#StopThread >>>>> >>>>> >>>>> >>>>> There is a couple of comments below. >>>>> >>>>> >>>>> On 5/29/20 06:18, David Holmes wrote: >>>>>> On 29/05/2020 6:24 pm, serguei.spitsyn at oracle.com wrote: >>>>>>> On 5/29/20 00:56, serguei.spitsyn at oracle.com wrote: >>>>>>>> On 5/29/20 00:42, serguei.spitsyn at oracle.com wrote: >>>>>>>>> Hi David, >>>>>>>>> >>>>>>>>> Thank you for reviewing this! 
>>>>>>>>> >>>>>>>>> On 5/28/20 23:57, David Holmes wrote: >>>>>>>>>> Hi Serguei, >>>>>>>>>> >>>>>>>>>> On 28/05/2020 3:12 pm, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>> Hi David, >>>>>>>>>>> >>>>>>>>>>> I've updated the CSR and webrev in place. >>>>>>>>>>> >>>>>>>>>>> The changes are: >>>>>>>>>>> ??- addressed David's suggestion to rephrase StopThread >>>>>>>>>>> description change >>>>>>>>>>> ??- replaced JVMTI_ERROR_INVALID_OBJECT with >>>>>>>>>>> JVMTI_ERROR_ILLEGAL_ARGUMENT >>>>>>>>>>> ??- updated the implementation in jvmtiEnv.cpp to return >>>>>>>>>>> JVMTI_ERROR_ILLEGAL_ARGUMENT >>>>>>>>>>> ??- updated one of the nsk.jvmti StopThread tests to check >>>>>>>>>>> error case with the JVMTI_ERROR_ILLEGAL_ARGUMENT >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I'm reposting the links for convenience. >>>>>>>>>>> >>>>>>>>>>> Enhancement: >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234882 >>>>>>>>>>> >>>>>>>>>>> CSR draft: >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245853 >>>>>>>>>> >>>>>>>>>> Spec updates are good - thanks. >>>>>>>>> >>>>>>>>> Thank you for the CSR review. >>>>>>>>> >>>>>>>>>>> Webrev: >>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/src/ >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> src/hotspot/share/prims/jvmtiEnv.cpp >>>>>>>>>> >>>>>>>>>> The ThreadDeath check is fine but I'm a bit confused about >>>>>>>>>> the additional null check that leads to >>>>>>>>>> JVMTI_ERROR_INVALID_OBJECT. I can't see how >>>>>>>>>> resolve_external_guard can return NULL when not passed in >>>>>>>>>> NULL. Nor why that would result in JVMTI_ERROR_INVALID_OBJECT >>>>>>>>>> rather than JVMTI_ERROR_NULL_POINTER. And I note >>>>>>>>>> JVMTI_ERROR_NULL_POINTER is not even a listed error for >>>>>>>>>> StopThread! This part of the change seems unrelated to this >>>>>>>>>> issue. >>>>>>>>> >>>>>>>>> I was also surprised with the JVMTI_ERROR_NULL_POINTER and >>>>>>>>> JVMTI_ERROR_INVALID_OBJECT error codes. 
>>>>>>>>> The JVM TI spec automatic generation adds these two error >>>>>>>>> codes for a jobject parameter. >>>>>>>>> >>>>>>>>> Also, they both are from the Universal Errors section: >>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#universal-error >>>>>>>>> >>>>>>>>> >>>>>>>>> You can find a link to this section at the start of the Error >>>>>>>>> section: >>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread >>>>>>>>> >>>>>>>>> >>>>>>>>> My understanding (not sure, it is right) is that NULL has to >>>>>>>>> be reported with JVMTI_ERROR_NULL_POINTER and a bad >>>>>>>>> jobject (for instance, a WeakReference with a GC-ed target) >>>>>>>>> has to be reported with JVMTI_ERROR_INVALID_OBJECT. >>>>>>>>> At least, I was not able to construct a test case to get this >>>>>>>>> error code returned. >>>>>>>>> So, I'm puzzled with this. I'll try to find some examples with >>>>>>>>> JVMTI_ERROR_NULL_POINTER errors. >>>>>>>> >>>>>>>> Found the explanation. >>>>>>>> The JDI file: >>>>>>>> src/jdk.jdi/share/classes/com/sun/tools/jdi/JDWPException.java >>>>>>>> >>>>>>>> has a fragment that translate the INVALID_OBJECT error to the >>>>>>>> ObjectCollectedException: >>>>>>>> ??? RuntimeException toJDIException() { >>>>>>>> ??????? switch (errorCode) { >>>>>>>> ??????????? case JDWP.Error.INVALID_OBJECT: >>>>>>>> ??????????????? return new ObjectCollectedException(); >>>>>>>> >>>>>>>> So, the INVALID_OBJECT is for a jobject handle that is >>>>>>>> referencing a collected object. >>>>>>>> It means that previous implementation incorrectly returned >>>>>>>> JVMTI_ERROR_NULL_POINTER error code. >>>>>>> >>>>>>> I should create and delete local or global ref to construct a >>>>>>> test case for this. >>>>>>> >>>>>>> Interesting that the JDWPException::toJDIException() does not >>>>>>> convert the ILLEGAL_ARGUMENT error code to an >>>>>>> IllegalArgumentException. 
>>>>>>> I've just added this conversion. >>>>>> >>>>>> Given the definition of JDWP INVALID_OBJECT then obviously JDI >>>>>> converts it to ObjectCollectedException. >>>>>> >>>>>> So reading further in JNI spec: >>>>>> >>>>>> "Weak global references are a special kind of global reference. >>>>>> Unlike normal global references, a weak global reference allows >>>>>> the underlying Java object to be garbage collected. Weak global >>>>>> references may be used in any situation where global or local >>>>>> references are used." >>>>>> >>>>>> So it seems that any function that takes a jobject cxould in fact >>>>>> accept a jweak, in which case JVMTI_ERROR_INVALID_OBJECT is a >>>>>> possibility in all cases. So IIUC >>>>>> JNIHandles::resolve_external_guard can return NULL if a weak >>>>>> reference has been collected. So the new code you propose seems >>>>>> correct. >>>>> >>>>> You are right about weak global references. >>>>> I was able to construct a test case for JVMTI_ERROR_INVALID_OBJECT. >>>>> The JNI NewGlobalRef and DeleteGlobalRef are used for it. >>>>> You can find it in the updated webrev version. >>>>> >>>>>> However, this still is unrelated to the current issue and I do >>>>>> not see other JVM TI doing checks for this case. So this seems to >>>>>> be a much broader issue. >>>>> There are many such checks in JVM TI. >>>>> For instance, there are checks like the following in jvmtiEnv.cpp: >>>>> NULL_CHECK(o, JVMTI_ERROR_INVALID_OBJECT) >>>> >>>> Yes but they are incorrect IMO e.g. >>>> >>>> JvmtiEnv::GetObjectSize(jobject object, jlong* size_ptr) { >>>> ? oop mirror = JNIHandles::resolve_external_guard(object); >>>> ? NULL_CHECK(mirror, JVMTI_ERROR_INVALID_OBJECT); >>>> >>>> The NULL_CHECK will fail if either object is NULL or object is a >>>> jweak that has been cleared. In the first case it should report >>>> JVMTI_ERROR_NULL_POINTER. >>>> >>>> The correct pattern is what you proposed with this fix: >>>> >>>> +?? 
NULL_CHECK(exception, JVMTI_ERROR_NULL_POINTER); >>>> ??? oop e = JNIHandles::resolve_external_guard(exception); >>>> +?? // the exception must be a valid jobject >>>> +?? if (e == NULL) { >>>> +???? return JVMTI_ERROR_INVALID_OBJECT; >>>> +?? } >>>> >>> >>> I see your point, thanks! >>> I'll check these cases and file a bug if necessary. >>> >>>> Though not sure why you didn't use a second NULL_CHECK >>> >>> I've already replaced it with: >>> ?? NULL_CHECK(e, JVMTI_ERROR_INVALID_OBJECT); >>> >>> You, probably, need to refresh the webrev page. >>> >>> Thanks, >>> Serguei >>> >>> >>>> David >>>> ----- >>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>>> >>>>>> David >>>>>> ----- >>>>>> >>>>>>> Thanks, >>>>>>> Serguei >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Thanks, >>>>>>>> Serguei >>>>>>>> >>>>>>>>>> test/hotspot/jtreg/vmTestbase/nsk/jvmti/StopThread/stopthrd006/TestDescription.java >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> The copyright year should be change to "2018, 2020,". >>>>>>>>> Thank you for the catch. >>>>>>>>> I planned to update the copyright comments. >>>>>>>>> >>>>>>>>>> I'm a little surprised the test doesn't actually check that a >>>>>>>>>> valid call doesn't produce an error. But that's an existing >>>>>>>>>> quirk of the test and not something you need to address here >>>>>>>>>> (if indeed it needs addressing - perhaps there is another >>>>>>>>>> test for that). >>>>>>>>> >>>>>>>>> There are plenty of other nsk.jvmti tests which check valid >>>>>>>>> calls. 
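The two-step check quoted earlier in this message (a NULL handle versus a handle whose target has been collected) can be modeled in plain Java as a rough analogy. This is a sketch under assumptions: `HandleCheckSketch`, `Status` and `check()` are hypothetical names, and a `WeakReference` merely stands in for a jweak; this is not HotSpot code.

```java
import java.lang.ref.WeakReference;

/** Hypothetical model of the NULL_POINTER vs INVALID_OBJECT distinction; not HotSpot code. */
final class HandleCheckSketch {

    /** Names chosen to echo the JVM TI error codes discussed in the thread. */
    enum Status { NONE, NULL_POINTER, INVALID_OBJECT }

    /**
     * Step 1: a missing handle is a null-pointer error.
     * Step 2: a handle that no longer resolves to an object (for example a
     * cleared weak reference) is an invalid-object error.
     */
    static Status check(WeakReference<Object> handle) {
        if (handle == null) {
            return Status.NULL_POINTER;
        }
        Object target = handle.get(); // loosely analogous to resolving the handle
        if (target == null) {
            return Status.INVALID_OBJECT;
        }
        return Status.NONE;
    }

    public static void main(String[] args) {
        Object live = new Object();
        WeakReference<Object> ref = new WeakReference<>(live);
        System.out.println(check(null));
        System.out.println(check(ref));
        ref.clear(); // simulates the target having been collected
        System.out.println(check(ref));
    }
}
```

Checking the handle itself before resolving it is what keeps the two error conditions apart; a single null check after resolution would report both cases with the same code.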
>>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Serguei >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Updated JVM TI StopThread spec: >>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> The old webrev and spec are here: >>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.0/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Serguei >>>>>>>>>>> >>>>>>>>>>> On 5/27/20 18:03, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>> Hi David, >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 5/27/20 02:00, David Holmes wrote: >>>>>>>>>>>>> On 27/05/2020 6:36 pm, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 5/27/20 00:47, David Holmes wrote: >>>>>>>>>>>>>>> Hi Serguei, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 27/05/2020 1:01 pm, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>>>> Please, review a fix for: >>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234882 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> CSR draft (one CSR reviewer is needed before finalizing >>>>>>>>>>>>>>>> it): >>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245853 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have some thoughts on the wording which I will add to >>>>>>>>>>>>>>> the CSR. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you a lot for looking at this! >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Also on reflection I think JVMTI_ERROR_ILLEGAL_ARGUMENT >>>>>>>>>>>>>>> would the best error to use, and it has an equivalent in >>>>>>>>>>>>>>> JDWP and at the Java level for JDI. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is an interesting variant, thanks! 
>>>>>>>>>>>>>> We need to balance on several criteria: >>>>>>>>>>>>>> ??1) Compatibility: keep returning error as close as >>>>>>>>>>>>>> possible to the current spec >>>>>>>>>>>>> >>>>>>>>>>>>> If you are adding a new error condition I don't understand >>>>>>>>>>>>> what you mean by "close to the current spec" ?? >>>>>>>>>>>> >>>>>>>>>>>> If the JVMTI_ERROR_INVALID_OBJECT is returned than the JDWP >>>>>>>>>>>> agent does not need any new error handling. >>>>>>>>>>>> The same can be true in the JDI if the JDWP returns the >>>>>>>>>>>> same error as it returned before. >>>>>>>>>>>> In this case we do not add new error code but extend the >>>>>>>>>>>> existing to cover new error condition. >>>>>>>>>>>> >>>>>>>>>>>> But, in fact (especially, after rethinking), I do not like >>>>>>>>>>>> the JVMTI_ERROR_INVALID_OBJECT >>>>>>>>>>>> error code as it normally means something different. >>>>>>>>>>>> So, let's avoid using it and skip this criteria. >>>>>>>>>>>> Then we need new error code to cover new error condition. >>>>>>>>>>>> >>>>>>>>>>>>>> ??2) Best error naming match between JVM TI and JDI/JDWP >>>>>>>>>>>>>> ??3) Best practice in errors naming >>>>>>>>>>>>> >>>>>>>>>>>>> If the argument is not a ThreadDeath instance then it is >>>>>>>>>>>>> an illegal argument - perfect fit semantically all the >>>>>>>>>>>>> specs involved have an "illegal argument" error form. >>>>>>>>>>>> >>>>>>>>>>>> I agree with this. >>>>>>>>>>>> It is why I like this suggestion. :) >>>>>>>>>>>> The JDWP equivalent is: ILLEGAL_ARGUMENT. >>>>>>>>>>>> The JDI equivalent is: IllegalArgumentException >>>>>>>>>>>> >>>>>>>>>>>> I'll prepare and send the update. >>>>>>>>>>>> >>>>>>>>>>>> Thanks! >>>>>>>>>>>> Serguei >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> David >>>>>>>>>>>>> >>>>>>>>>>>>>> I think the #1 is most important but will look at it once >>>>>>>>>>>>>> more. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/src/ >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Updated JVM TI StopThread spec: >>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Summary: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ?? The JVM TI StopThread method mirrored the >>>>>>>>>>>>>>>> functionality of the >>>>>>>>>>>>>>>> ?? java.lang.Thread::stop(Throwable t) method, in that >>>>>>>>>>>>>>>> it allows any exception >>>>>>>>>>>>>>>> ?? type to be installed as an asynchronous exception in >>>>>>>>>>>>>>>> the target thread. >>>>>>>>>>>>>>>> ?? However, the java.lang.Thread::stop(Throwable t) >>>>>>>>>>>>>>>> method was inherently unsafe >>>>>>>>>>>>>>>> ?? and in Java 8 (under JDK-7059085) it was "retired" >>>>>>>>>>>>>>>> so that it always threw >>>>>>>>>>>>>>>> ?? UnsupportedOperationException. >>>>>>>>>>>>>>>> ?? The updated JVM TI StopThread spec disallows an >>>>>>>>>>>>>>>> arbitrary Throwable from being passed, >>>>>>>>>>>>>>>> ?? and instead restricts the argument to being an >>>>>>>>>>>>>>>> instance of ThreadDeath, thus >>>>>>>>>>>>>>>> ?? mirroring the (deprecated but still functional) >>>>>>>>>>>>>>>> java.lang.Thread::stop() method. >>>>>>>>>>>>>>>> ?? The error JVMTI_ERROR_INVALID_OBJECT is returned if >>>>>>>>>>>>>>>> the exception argument >>>>>>>>>>>>>>>> ?? is not an instance of ThreadDeath. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ?? Also, I will file similar RFE and CSR on the JDI and >>>>>>>>>>>>>>>> JDWP spec. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Testing: >>>>>>>>>>>>>>>> ?? Built docs and checked the doc has been generated as >>>>>>>>>>>>>>>> expected. >>>>>>>>>>>>>>>> ?? Will run the nsk.jvmti tests locally. 
>>>>>>>>>>>>>>>> ?? Will submit hs-tiers1-3 to make sure there are no >>>>>>>>>>>>>>>> regressions in the JVM TI and JDI tests. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>> > From serguei.spitsyn at oracle.com Wed Jun 3 20:46:07 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 3 Jun 2020 13:46:07 -0700 Subject: PING: Re: RFR(XS): 8222005: ClassRedefinition crashes with: guarantee(false) failed: OLD and/or OBSOLETE method(s) found In-Reply-To: References: <5942b42c-b9b3-f1d4-6c13-774649fca32b@oracle.com> <2f9aa92c-18f5-1203-1523-3c1fd9ba9ad1@oracle.com> <52ba0f0f-a705-2043-1c1d-15ba4a441aba@oracle.com> <31ca58d7-99ac-c53d-461f-680461fb5698@oracle.com> <9b75fa4e-f579-e4a7-7996-bc307d001972@oracle.com> <3a497901-7a05-e87a-33e6-6f1011c32b8b@oracle.com> <94e924e7-bbb6-5686-4cac-abd7e7d57f2e@oracle.com> Message-ID: <4f8993f3-4d20-5f89-843a-4eafb4534beb@oracle.com> An HTML attachment was scrubbed... URL: From daniil.x.titov at oracle.com Wed Jun 3 21:02:27 2020 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Wed, 03 Jun 2020 14:02:27 -0700 Subject: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently In-Reply-To: <2aed6958-0f12-f632-ebaa-80ddb732b5ec@oracle.com> References: <814ceeca-d700-500e-de82-7d5fd3c05192@oracle.com> <9ded6af1-d7fc-775a-eb66-b66f3147186e@oracle.com> <2aed6958-0f12-f632-ebaa-80ddb732b5ec@oracle.com> Message-ID: <18C91237-F1DA-4E86-ABAA-040C4ED71394@oracle.com> Hi Chris, > Do you think 60 seconds is a bit long? Isn't the expectation that the join should happen almost immediately or not at all? In case if an exception is thrown in the try block after the thread is started and before it is moved in the terminated state the join never happens at all. And in other cases the join should happen immediately. 
I agree that 60 seconds looks a bit long, but I just wanted to minimize the odds that we miss the log and the root cause if the issue is reproduced on some slow machine with VM options such as, say, -Xcomp.

> I don't think you need a separate bug, but you should document in the bug what currently can and can't be reproduced and what is being fixed.

I will update the bug with this information.

Best regards,
Daniil

From: Chris Plummer
Date: Wednesday, June 3, 2020 at 12:46 PM
To: Daniil Titov, David Holmes, serviceability-dev
Subject: Re: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently

Hi Daniil,

Ok, I misread getLog() and see that it now always returns the log. Do you think 60 seconds is a bit long? Isn't the expectation that the join should happen almost immediately or not at all?

I don't think you need a separate bug, but you should document in the bug what currently can and can't be reproduced and what is being fixed.

thanks,
Chris

On 6/3/20 12:08 PM, Daniil Titov wrote:

Hi Chris,

I was not able to reproduce the original issue anymore in Mach5. However, the test itself has a potential for a deadlock (that was also reported) and in the proposed change we fix it. The log should still be printed, and the expectation is that we will be able to see the underlying problem in it if it is ever reproduced.

I could create a separate bug (not sure if a subtask is a good fit here since the change fixes some problem in the test) and close the current one as not reproducible if you think it is a better approach.

Regarding the Thread.suspend() and Thread.resume() methods: the test also checks the thread state after these methods are invoked, and since these deprecated methods are still in the API I don't think we should exclude them from being tested.
Best regards,
Daniil

From: Chris Plummer
Date: Wednesday, June 3, 2020 at 11:40 AM
To: David Holmes, Daniil Titov, serviceability-dev
Subject: Re: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently

On 6/1/20 12:10 AM, David Holmes wrote:

Hi Daniil,

On 30/05/2020 10:07 am, Daniil Titov wrote:

Please review a change [1] that fixes an intermittent test timeout.

The main logic of the test has this basic structure:

    try {
        // lots of thread state manipulation of target
    } finally {
        thread.getLog();
    }

and as David noticed in his comment (the last comment in [2]), if an exception occurs anywhere in the try block we can hang waiting for the join() in getLog() because we haven't executed the logic that tells the thread to terminate.

So the fix puts a timeout on the join(), which means the test will no longer time out but it will still fail when whatever was leading to the timeout now happens. So as a diagnostic fix this seems fine. Hopefully the logger will show what we need to see and determine the real underlying problem.

If this change is really just diagnostic in nature, then it should be a subtask. However, it seems to me it will actually hide the failure. The test won't get a timeout and won't print the log. Am I missing something?

Also, after reading through the bug comments it looks like the getLog()/join() timeout issue is different from the main issue that caused the CR to be filed in the first place. Comments regarding the initial problem are:

"According to the stack trace the test seems to hang on trying to load the 'java.lang.Math' class concurrently."

"Need to see some native stacks to understand why the classloading thread is not proceeding even though RUNNABLE."

"I should have looked at the test first - it uses Thread.suspend and Thread.resume and so is inherently deadlock prone."

Does this issue no longer exist, or have we decided that, since the test is expected to be deadlock prone, we should just ignore it?
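[Editor's illustration] The hang and the timed join() discussed above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code of ThreadMXBeanStateTest: the class `LoggingThread`, its `terminate()` method, and the timeout values are hypothetical names and numbers chosen for the example (the review itself discusses 60 seconds).

```java
class TimedJoinSketch {
    /** Hypothetical stand-in for the test's worker thread; not the real test class. */
    static class LoggingThread extends Thread {
        private final StringBuffer log = new StringBuffer();
        private volatile boolean done;

        LoggingThread() {
            setDaemon(true); // so a never-terminated worker cannot keep the JVM alive
        }

        @Override
        public void run() {
            log.append("started\n");
            while (!done) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    break;
                }
            }
            log.append("terminated\n");
        }

        /** The "logic that tells the thread to terminate". */
        void terminate() {
            done = true;
        }

        /** Waits at most timeoutMillis for the worker, then returns whatever was logged. */
        String getLog(long timeoutMillis) {
            try {
                join(timeoutMillis); // note: join(0) would wait forever
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return log.toString();
        }
    }

    /** Mirrors the try/finally shape from the review; failEarly simulates an exception in the try block. */
    static String runScenario(boolean failEarly, long timeoutMillis) {
        LoggingThread t = new LoggingThread();
        t.start();
        String log = null;
        try {
            if (failEarly) {
                throw new IllegalStateException("simulated failure before terminate()");
            }
            t.terminate();
        } catch (IllegalStateException expected) {
            // swallowed only so the sketch can return the log to the caller
        } finally {
            log = t.getLog(timeoutMillis); // bounded wait instead of an unbounded join()
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.print(runScenario(false, 5000));
        System.out.print(runScenario(true, 200));
    }
}
```

With an unbounded join() the second scenario would never return, because terminate() is skipped; with the bounded join() the log is still returned, which is the diagnostic value the thread is debating.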
thanks,
Chris

Thanks,
David
-----

Testing: Running a modified test that explicitly throws a runtime exception inside the try block shows the fix solves the problem. Mach5 tier1-tier3 tests passed. Mach5 tier4-tier5 tests are in progress.

[1] http://cr.openjdk.java.net/~dtitov/8081652/webrev.01/
[2] https://bugs.openjdk.java.net/browse/JDK-8081652

Thank you,
Daniil

From igor.ignatyev at oracle.com Wed Jun 3 21:30:52 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 3 Jun 2020 14:30:52 -0700
Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property
Message-ID:

http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00
> 70 lines changed: 66 ins; 0 del; 4 mod

Hi all,

could you please review the patch which introduces a new @requires property to filter out the tests which ignore externally provided JVM flags?

the idea behind this patch is to have a way to clearly mark tests which ignore flags, so
a) it's obvious that they don't execute a flag-guarded code/feature, and extra care should be taken to use them to verify any flag-guarded changes;
b) they can be easily excluded from runs w/ flags.

@requires and VMProps allow us to achieve both, so it's been decided to add a new property `vm.flagless`. `vm.flagless` is set to false if there are any XX flags other than `-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or any X flags other than `-Xmixed`; in other words any tests w/ `@requires vm.flagless` will be excluded from runs w/ any other X / XX flags passed via `-vmoption` / `-javaoption`. in rare cases, when one still wants to run the tests marked by `vm.flagless` w/ external flags, `vm.flagless` can be forcefully set to true by setting any value to `TEST_VM_FLAGLESS` env. variable.

this patch adds necessary common changes and marks common tests, namely Scimark, GTestWrapper and TestNativeProcessBuilder.
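[Editor's illustration] The rule for `vm.flagless` described above can be sketched as a small predicate. This is only a sketch under stated assumptions and is not the VMProps code from the webrev: the class `FlaglessSketch`, the method `isFlagless`, and the way flags and the environment are passed in are all hypothetical, and treating an empty `TEST_VM_FLAGLESS` value as "set" is a guess.

```java
import java.util.List;
import java.util.Map;

class FlaglessSketch {
    // XX flags the message describes as "known to be set almost always", hence ignored.
    private static final List<String> IGNORED_XX =
            List.of("MaxRAMPercentage", "CreateCoredumpOnCrash");

    /**
     * Returns true when, per the rule in the message, no external flag other than the
     * ignored ones is present, or when TEST_VM_FLAGLESS is set (assumed: to any value).
     */
    static boolean isFlagless(List<String> vmFlags, Map<String, String> env) {
        if (env.containsKey("TEST_VM_FLAGLESS")) {
            return true; // forced override described in the message
        }
        for (String flag : vmFlags) {
            if (flag.startsWith("-XX:")) {
                String name = flag.substring("-XX:".length());
                if (name.startsWith("+") || name.startsWith("-")) {
                    name = name.substring(1); // boolean flag: -XX:+Name / -XX:-Name
                }
                int eq = name.indexOf('=');
                if (eq >= 0) {
                    name = name.substring(0, eq); // valued flag: -XX:Name=value
                }
                if (!IGNORED_XX.contains(name)) {
                    return false;
                }
            } else if (flag.startsWith("-X")) {
                if (!flag.equals("-Xmixed")) {
                    return false;
                }
            }
            // anything else (e.g. -D properties) is not covered by the rule and is ignored here
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isFlagless(List.of("-Xmixed", "-XX:MaxRAMPercentage=25"), Map.of()));
        System.out.println(isFlagless(List.of("-Xcomp"), Map.of()));
    }
}
```

Under this reading, a run with `-Xcomp` would skip every test marked `@requires vm.flagless` unless `TEST_VM_FLAGLESS` is set in the environment.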
Component-specific tests will be marked separately by the corresponding subtasks of 8151707[1]. please note, the patch depends on CODETOOLS-7902336[2], which will be included in the next jtreg version, so this patch is to be integrated only after jtreg5.1 is promoted and we switch to use it by 8246387[3]. JBS: https://bugs.openjdk.java.net/browse/JDK-8246494 webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 testing: marked tests w/ different XX and X flags w/ and w/o TEST_VM_FLAGLESS env. var, and w/o any flags [1] https://bugs.openjdk.java.net/browse/JDK-8151707 [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336 [3] https://bugs.openjdk.java.net/browse/JDK-8246387 Thanks, -- Igor From dean.long at oracle.com Wed Jun 3 21:56:07 2020 From: dean.long at oracle.com (Dean Long) Date: Wed, 3 Jun 2020 14:56:07 -0700 Subject: RFR(XS): 8245126 Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods In-Reply-To: References: <62e34586-eaac-3200-8f5a-ee12ad654afa@oracle.com> <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> <42d73a82-b70e-1a0d-312d-303409840392@oracle.com> <64694163-fdd5-5ccb-3ffb-2027b05a3719@oracle.com> Message-ID: <6e9233a4-b743-5e66-328f-7f91c6a7b292@oracle.com> Hi Serguei, I like the latest changes so that JVMCI matches C2. Please get another review because this is not a trivial change. dl On 6/3/20 10:06 AM, serguei.spitsyn at oracle.com wrote: > Hi Dean, > > The updated webrev is: > http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.3/ > > Probably, the JVMCI part can be simplified. 
> Only the compile_state line has to be moved up: > + JVMCICompileState compile_state(task); > // Skip redefined methods > - if (target_handle->is_old()) { > + if (compile_state.target_method_is_old()) { > failure_reason = "redefined method"; > retry_message = "not retryable"; > compilable = ciEnv::MethodCompilable_never; > } else { > - JVMCICompileState compile_state(task); > Fixes in the jvmciEnv.?pp are not really needed > > Please, let me know what do you think. > > This version does not fail at all (in 300 runs for both C2 and JVMCI). > It seems, other two issues disappeared as well: > > This was seen with the C2: > https://bugs.openjdk.java.net/browse/JDK-8245128 > > This was seen with the JVMCI: > https://bugs.openjdk.java.net/browse/JDK-8245446 > > Thanks, > Serguei > > > On 6/1/20 23:40, serguei.spitsyn at oracle.com wrote: >> Hi Dean, >> >> Thank you for the reply. >> >> The problem is I do not fully understand your suggestion, especially >> the part >> about caching the method,is_old() value in the cache_jvmti_state(). >> >> This is a preliminary webrev where I tried to implement your suggestion: >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.2/ >> >> This variant is failing in half of test runs for both C1/C2 and JVMCI. >> I think, the root cause is a safepoint in a ThreadInVMfromNative >> desctructor. >> Here: >> ?232 void ciEnv::cache_jvmti_state() { >> ?233 VM_ENTRY_MARK; >> >> Then we check for the target_method_is_old() value which is not >> up-to-date any more. >> I feel, it was correct and more simple before introducing this approach. >> Probably, I'm missing something here. 
>> >> >> I also have a question about the update fragment: >> 1696 { >> 1697 // Must switch to native to allocate ci_env >> 1698 ThreadToNativeFromVM ttn(thread); >> 1699 ciEnv ci_env((CompileTask*)NULL); >> 1700 >> 1701 // Switch back to VM state to do compiler initialization >> 1702 ThreadInVMfromNative tv(thread); >> 1703 ResetNoHandleMark rnhm; >> 1704 >> 1705 // Perform per-thread and global initializations >> 1706 comp->initialize(); >> 1707 } >> Can we remove the ciEnv object initialization above with the state >> transitions? >> Or it has some side effects? >> >> Please, let me know what you think. >> >> Thanks, >> Serguei >> >> >> On 6/1/20 15:10, Dean Long wrote: >>> On 5/31/20 11:16 PM, serguei.spitsyn at oracle.com wrote: >>>> Hi Dean, >>>> >>>> To check the is_old as you suggest the target method has to be passed >>>> to the cache_jvmti_state() as argument. Is it what you are suggesting? >>> >>> I believe you can use use _task->method()->is_old(), as the ciEnv >>> already has the task. >>> >>>> Just want to make sure I understand you correctly. >>>> >>>> The cache_jvmti_state() and cache_dtrace_flags() are called in the >>>> CompileBroker::init_compiler_runtime() for a ciEnv with the NULL >>>> CompileTask >>>> which looks unnecessary (or I don't understand it): >>>> >>>> bool CompileBroker::init_compiler_runtime() { >>>> ? CompilerThread* thread = CompilerThread::current(); >>>> ? . . . >>>> ??? ciEnv ci_env((CompileTask*)NULL); >>>> ??? // Cache Jvmti state >>>> ??? ci_env.cache_jvmti_state(); >>>> ??? // Cache DTrace flags >>>> ??? ci_env.cache_dtrace_flags(); >>>> >>> >>> These calls look unnecessary to me, as the ci_env will cache these >>> again before compiling a method. >>> I suggest removing these calls.? We should make sure the cache >>> fields are initialized to sane values >>> in the ciEnv ctor. 
>>> >>>> The JVMCI has a separate implementation for ciEnv which is jvmciEnv and >>>> its own set of cache_jvmti_state() and jvmti_state_changed() functions. >>>> Both are not called in the JVMCI case. >>>> So, these checks look as broken in JVMCI now. >>>> >>> JVMCI is in better shape, because it doesn't transition out of >>> _thread_in_vm state, >>> but yes it needs similar changes. >>> >>>> Not sure, I have enough compiler knowledge to fix this at this >>>> stage of release. >>>> Would it better to file a separate hotspot/compiler RFE targeted to 16? >>>> It can be assigned to me if it helps. >>>> >>> >>> This is a P3 so I believe we have time to fix it for 15. Please go >>> ahead and let's see if >>> we can get it in.? I can help with the JVMCI changes if they are not >>> straightforward. >>> >>> dl >>> >>>> Thanks, >>>> Serguei >>>> >>>> >>>> On 5/28/20 10:54, Dean Long wrote: >>>>> Sure, you could just have cache_jvmti_state() return a boolean to >>>>> bail out immediately for is_old. >>>>> >>>>> dl >>>>> >>>>> On 5/28/20 7:23 AM, serguei.spitsyn at oracle.com wrote: >>>>>> Hi Dean, >>>>>> >>>>>> Thank you for looking at this! >>>>>> Okay. Let me check what cab be done in this direction. >>>>>> There is no point to cache is_old. The compilation has to bail >>>>>> out if it is discovered to be true. >>>>>> >>>>>> Thanks, >>>>>> Serguei >>>>>> >>>>>> >>>>>> On 5/28/20 00:59, Dean Long wrote: >>>>>>> This seems OK as long as the memory barriers in the thread state >>>>>>> transitions prevent the C++ compiler from doing something like >>>>>>> reading is_old before reading redefinition_count.? I would feel >>>>>>> better if both JVMCI and C1/C2 cached is_old and >>>>>>> redefinition_count at the same time (making sure to be in the >>>>>>> _thread_in_vm state), then bail out based on the cached value of >>>>>>> is_old. 
>>>>>>> >>>>>>> dl >>>>>>> >>>>>>> On 5/26/20 12:04 AM, serguei.spitsyn at oracle.com wrote: >>>>>>>> On 5/25/20 23:39, serguei.spitsyn at oracle.com wrote: >>>>>>>>> Please, review a fix for: >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245126 >>>>>>>>> >>>>>>>>> Webrev: >>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.1/ >>>>>>>>> >>>>>>>>> >>>>>>>>> Summary: >>>>>>>>> ? The Kitchensink stress test with the Instrumentation module >>>>>>>>> enabled does >>>>>>>>> ? a lot of class retransformations in parallel with all other >>>>>>>>> stressing. >>>>>>>>> ? It provokes the assert at the compiled code installation time: >>>>>>>>> ??? assert(!method->is_old()) failed: Should not be installing >>>>>>>>> old methods >>>>>>>>> >>>>>>>>> ? The problem is that the >>>>>>>>> CompileBroker::invoke_compiler_on_method in C2 version >>>>>>>>> ? (non-JVMCI tiered compilation) is missing the check that >>>>>>>>> exists in the JVMCI >>>>>>>>> ? part of implementation: >>>>>>>>> 2148 // Skip redefined methods >>>>>>>>> 2149 if (target_handle->is_old()) { >>>>>>>>> 2150 failure_reason = "redefined method"; >>>>>>>>> 2151 retry_message = "not retryable"; >>>>>>>>> 2152 compilable = ciEnv::MethodCompilable_never; >>>>>>>>> 2153 } else { >>>>>>>>> . . . >>>>>>>>> 2168 } >>>>>>>>> >>>>>>>>> ? The fix is to add this check. >>>>>>>> >>>>>>>> Sorry, forgot to explain one thing. >>>>>>>> Compiler code has a special mechanism to ensure the JVMTI class >>>>>>>> redefinition did >>>>>>>> not happen while the method was compiled, so all the >>>>>>>> assumptions remain correct. >>>>>>>> 2190 // Cache Jvmti state >>>>>>>> 2191 ci_env.cache_jvmti_state(); >>>>>>>> Part of this is a check that the value of >>>>>>>> JvmtiExport::redefinition_count() is >>>>>>>> cached in ciEnv variable: _jvmti_redefinition_count. 
>>>>>>>> The JvmtiExport::redefinition_count() value change means a >>>>>>>> class redefinition >>>>>>>> happened which also implies some of methods may become old. >>>>>>>> However, the method being compiled can be already old at the >>>>>>>> point where the >>>>>>>> redefinition counter is cached, so the redefinition counter >>>>>>>> check does not help much. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Serguei >>>>>>>> >>>>>>>>> Testing: >>>>>>>>> Ran Kitchensink test with the Instrumentation module enabled in mach5 >>>>>>>>> ?multiple times for 100 times. Without the fix the test normally fails >>>>>>>>> a couple of times in 200 runs. It does not fail with the fix anymore. >>>>>>>>> Will also submit hs tiers1-5. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Serguei >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexey.menkov at oracle.com Wed Jun 3 22:09:05 2020 From: alexey.menkov at oracle.com (Alex Menkov) Date: Wed, 3 Jun 2020 15:09:05 -0700 Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently In-Reply-To: References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com> <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com> Message-ID: <26868807-1c9f-8378-c201-5992bff3de20@oracle.com> Hi Daniil, couple notes: 198 waitForThreads(numNewThreads, numTerminatedThreads); You don't actually need any wait here. 
Test cases wait until all threads are in desired state (checkAllThreadsAlive uses startupCheck, checkDaemonThreadsDead and checkAllThreadsDead use join()) 205 private static void checkLiveThreads(int numNewThreads, 206 int numTerminatedThreads) { 207 int diff = numNewThreads - numTerminatedThreads; 208 long threadCount = mbean.getThreadCount(); 209 long expectedThreadCount = prevLiveTestThreadCount + diff; 210 if (threadCount < expectedThreadCount) { if some internal thread terminates, we'll get failure here --alex On 06/02/2020 21:00, Daniil Titov wrote: > Hi Alex, Serguei, and Martin, > > Thank you for your comments. Please review a new version of the fix that addresses them, specifically: > 1) Replaces a double loop in checkAllThreadsAlive() with a code that uses collections and containsAll() method. > 2) Restores the checks for other ThreadMXBean methods (getThreadCount(), getTotalStartedThreadCount(), getPeakThreadCount()) but with more relaxed conditions. > 3) Relaxes the check inside checkThreadIds() method > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.03/ > [2] https://bugs.openjdk.java.net/browse/JDK-8131745 > > Thank you, > Daniil > > ?On 6/1/20, 5:06 PM, "Alex Menkov" wrote: > > Hi Daniil, > > 1. before the fix checkLiveThreads() tested > ThreadMXBean.getThreadCount(), but now as far as I see it tests > Thread.getAllStackTraces(); > > 2. > 237 private static void checkThreadIds() throws InterruptedException { > 238 long[] list = mbean.getAllThreadIds(); > 239 > 240 waitTillEquals( > 241 list.length, > 242 ()->(long)mbean.getThreadCount(), > 243 "Array length returned by " + > 244 "getAllThreadIds() = %1$d not matched count = > ${provided}", > 245 ()->list.length > 246 ); > 247 } > > I suppose purpose of waitTillEquals() is to handle creation/termination > of VM internal threads. 
> But if some internal thread terminates after mbean.getAllThreadIds() and > before 1st mbean.getThreadCount() call and then VM does not need to > restart it, waitTillEquals will wait forever. > > --alex > > > On 05/29/2020 16:28, Daniil Titov wrote: > > Hi Alex and Serguei, > > > > Please review a new version of the change [1] that makes sure that the test counts > > only the threads it creates and ignores Internal threads VM might create or destroy. > > > > Testing: Running this test in Mach5 with Graal on several hundred times , > > tier1-tier3 tests are in progress. > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.02/ > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745 > > > > Thank you, > > Daniil > > > > ?On 5/22/20, 10:26 AM, "Alex Menkov" wrote: > > > > Hi Daniil, > > > > I'm not sure all this retry logic is a good way. > > As mentioned in jira the most important part of the testing is ensuring > > that you find all the created threads when they are alive, and you don't > > find them when they are dead. The actual thread count checking is not > > that important. > > I agree with this and I'd just simplify the test by removing checks for > > thread count. VM may create and destroy internal threads when it needs it. > > > > --alex > > > > On 05/18/2020 10:31, Daniil Titov wrote: > > > Please review the change [1] that fixes an intermittent failure of the test. > > > > > > This test creates and destroys a given number of daemon/user threads and validates the count of those started/stopped threads against values returned from ThreadMXBean thread counts. The problem here is that if some internal threads is started ( e.g. " HotSpotGraalManagement Bean Registration"), or destroyed (e.g. "JVMCI CompilerThread ") the test hangs waiting for expected number of live threads. > > > > > > The fix limits the time the test is waiting for desired number of live threads and in case if this limit is exceeded the test repeats itself. > > > > > > Testing. 
Test with Graal on and Mach5 tier1-tier7 test passed. > > > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.01 > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745 > > > > > > Thank you, > > > Daniil > > > > > > > > > > > > > > > From david.holmes at oracle.com Wed Jun 3 23:02:28 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 Jun 2020 09:02:28 +1000 Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property In-Reply-To: References: Message-ID: <4584c046-ed5b-e1b9-f16b-3d4383cf1001@oracle.com> Hi Igor, On 4/06/2020 7:30 am, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >> 70 lines changed: 66 ins; 0 del; 4 mod > > Hi all, > > could you please review the patch which introduces a new @requires property to filter out the tests which ignore externally provided JVM flags? > > the idea behind this patch is to have a way to clearly mark tests which ignore flags, so > a) it's obvious that they don't execute a flag-guarded code/feature, and extra care should be taken to use them to verify any flag-guarded changed; > b) they can be easily excluded from runs w/ flags. So all such tests should be using driver mode, and further the VMs they then exec don't use any of the APIs that include the jtreg test arguments. Okay this seems reasonable in what it does. Thanks, David > @requires and VMProps allows us to achieve both, so it's been decided to add a new property `vm.flagless`. `vm.flagless` is set to false if there are any XX flags other than `-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or any X flags other `-Xmixed`; in other words any tests w/ `@requires vm.flagless` will be excluded from runs w/ any other X / XX flags passed via `-vmoption` / `-javaoption`. in rare cases, when one still wants to run the tests marked by `vm.flagless` w/ external flags, `vm.flagless` can be forcefully set to true by setting any value to `TEST_VM_FLAGLESS` env. 
variable. > > this patch adds necessary common changes and marks common tests, namely Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific tests will be marked separately by the corresponding subtasks of 8151707[1]. > > please note, the patch depends on CODETOOLS-7902336[2], which will be included in the next jtreg version, so this patch is to be integrated only after jtreg5.1 is promoted and we switch to use it by 8246387[3]. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8246494 > webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 > testing: marked tests w/ different XX and X flags w/ and w/o TEST_VM_FLAGLESS env. var, and w/o any flags > > [1] https://bugs.openjdk.java.net/browse/JDK-8151707 > [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336 > [3] https://bugs.openjdk.java.net/browse/JDK-8246387 > > Thanks, > -- Igor > From serguei.spitsyn at oracle.com Wed Jun 3 23:05:28 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 3 Jun 2020 16:05:28 -0700 Subject: RFR(XS): 8245126 Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods In-Reply-To: <6e9233a4-b743-5e66-328f-7f91c6a7b292@oracle.com> References: <62e34586-eaac-3200-8f5a-ee12ad654afa@oracle.com> <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> <42d73a82-b70e-1a0d-312d-303409840392@oracle.com> <64694163-fdd5-5ccb-3ffb-2027b05a3719@oracle.com> <6e9233a4-b743-5e66-328f-7f91c6a7b292@oracle.com> Message-ID: An HTML attachment was scrubbed... 
URL: From david.holmes at oracle.com Wed Jun 3 23:10:52 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 Jun 2020 09:10:52 +1000 Subject: RFR(XS): 8234882: JVM TI StopThread should only allow ThreadDeath In-Reply-To: <5acfaf13-b3c1-d854-28a7-378e0bb5926e@oracle.com> References: <12cd04f9-c3f9-654f-fff2-1c4e315b6eeb@oracle.com> <3feb9c3f-4f61-f4b7-160f-c6b328305111@oracle.com> <40f21609-f086-722a-1af4-3f281c9b8963@oracle.com> <7b272791-4c47-27b0-9313-391a9e620295@oracle.com> <38db06ac-6e4e-029a-9376-ee577afe64d7@oracle.com> <2ce42985-9325-1c74-fa8d-c2a5049ec011@oracle.com> <0f1ec272-4410-f7e5-1c11-1238c0079b00@oracle.com> <3120b170-8d0f-7915-7224-f44523bdae6e@oracle.com> <586c3878-d175-2f8e-6ce8-95a187965de6@oracle.com> <2586bb75-f560-f905-1937-b778b7faba59@oracle.com> <6ebc70ce-787d-7f13-66f4-14ad8c8102d6@oracle.com> <25f4a64a-10ca-2695-6748-ccd24d84ef22@oracle.com> <1190375b-d7da-47c4-61d4-121f4d0ba33a@oracle.com> <5acfaf13-b3c1-d854-28a7-378e0bb5926e@oracle.com> Message-ID: Hi Serguei, On 4/06/2020 6:41 am, serguei.spitsyn at oracle.com wrote: > Hi David, > > The JetBrains confirmed: > ? Ability to select the exception is a valuable feature they provide. > ? Throwing only ThreadDeath is almost useless. > > So, should I close this and related JDI/JDWP enhancements as WNF? Yes. Sorry about the wasted work here. Thanks, David ----- > Thanks, > Serguei > > > On 6/1/20 08:30, serguei.spitsyn at oracle.com wrote: >> Hi David, >> >> I'll check with JetBrains on this. >> Thank you to Dan and you for raising this concern. >> The JetBrains use case you posted in the CSR looks like valid and useful. >> >> Thanks, >> Serguei >> >> >> On 6/1/20 00:46, David Holmes wrote: >>> Hi Serguei, >>> >>> Sorry, I think we have to re-think this change. As Dan flags in the >>> CSR request debuggers directly expose this API as part of the >>> debugger interface, so any change here will directly impact those >>> tools. 
At a minimum I think we would need to consult with the tool >>> developers about the impact of making this change, as well as whether >>> it makes any practical difference in the sense that there may be >>> other (less convenient but still available) mechanisms to achieve the >>> same goal in a debugger or agent. >>> >>> David >>> >>> On 31/05/2020 5:50 pm, serguei.spitsyn at oracle.com wrote: >>>> Hi David, >>>> >>>> Also jumping to end. >>>> >>>> On 5/30/20 06:50, David Holmes wrote: >>>>> Hi Serguei, >>>>> >>>>> Jumping to the end for now ... >>>>> >>>>> On 30/05/2020 5:50 am, serguei.spitsyn at oracle.com wrote: >>>>>> Hi David and reviewers, >>>>>> >>>>>> The updated webrev version is: >>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.2/src/ >>>>>> >>>>>> >>>>>> This update adds testing that StopThread can return >>>>>> JVMTI_ERROR_INVALID_OBJECT error code. >>>>>> >>>>>> The JVM TI StopThread spec is: >>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.2/docs/specs/jvmti.html#StopThread >>>>>> >>>>>> >>>>>> >>>>>> There is a couple of comments below. >>>>>> >>>>>> >>>>>> On 5/29/20 06:18, David Holmes wrote: >>>>>>> On 29/05/2020 6:24 pm, serguei.spitsyn at oracle.com wrote: >>>>>>>> On 5/29/20 00:56, serguei.spitsyn at oracle.com wrote: >>>>>>>>> On 5/29/20 00:42, serguei.spitsyn at oracle.com wrote: >>>>>>>>>> Hi David, >>>>>>>>>> >>>>>>>>>> Thank you for reviewing this! >>>>>>>>>> >>>>>>>>>> On 5/28/20 23:57, David Holmes wrote: >>>>>>>>>>> Hi Serguei, >>>>>>>>>>> >>>>>>>>>>> On 28/05/2020 3:12 pm, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>> Hi David, >>>>>>>>>>>> >>>>>>>>>>>> I've updated the CSR and webrev in place. 
>>>>>>>>>>>>
>>>>>>>>>>>> The changes are:
>>>>>>>>>>>>   - addressed David's suggestion to rephrase StopThread description change
>>>>>>>>>>>>   - replaced JVMTI_ERROR_INVALID_OBJECT with JVMTI_ERROR_ILLEGAL_ARGUMENT
>>>>>>>>>>>>   - updated the implementation in jvmtiEnv.cpp to return JVMTI_ERROR_ILLEGAL_ARGUMENT
>>>>>>>>>>>>   - updated one of the nsk.jvmti StopThread tests to check error case with the JVMTI_ERROR_ILLEGAL_ARGUMENT
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I'm reposting the links for convenience.
>>>>>>>>>>>>
>>>>>>>>>>>> Enhancement:
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234882
>>>>>>>>>>>>
>>>>>>>>>>>> CSR draft:
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245853
>>>>>>>>>>>
>>>>>>>>>>> Spec updates are good - thanks.
>>>>>>>>>>
>>>>>>>>>> Thank you for the CSR review.
>>>>>>>>>>
>>>>>>>>>>>> Webrev:
>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/src/
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> src/hotspot/share/prims/jvmtiEnv.cpp
>>>>>>>>>>>
>>>>>>>>>>> The ThreadDeath check is fine but I'm a bit confused about
>>>>>>>>>>> the additional null check that leads to
>>>>>>>>>>> JVMTI_ERROR_INVALID_OBJECT. I can't see how
>>>>>>>>>>> resolve_external_guard can return NULL when not passed in
>>>>>>>>>>> NULL. Nor why that would result in JVMTI_ERROR_INVALID_OBJECT
>>>>>>>>>>> rather than JVMTI_ERROR_NULL_POINTER. And I note
>>>>>>>>>>> JVMTI_ERROR_NULL_POINTER is not even a listed error for
>>>>>>>>>>> StopThread! This part of the change seems unrelated to this
>>>>>>>>>>> issue.
>>>>>>>>>>
>>>>>>>>>> I was also surprised with the JVMTI_ERROR_NULL_POINTER and
>>>>>>>>>> JVMTI_ERROR_INVALID_OBJECT error codes.
>>>>>>>>>> The JVM TI spec automatic generation adds these two error
>>>>>>>>>> codes for a jobject parameter.
>>>>>>>>>>
>>>>>>>>>> Also, they both are from the Universal Errors section:
>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#universal-error
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You can find a link to this section at the start of the Error
>>>>>>>>>> section:
>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> My understanding (not sure, it is right) is that NULL has to
>>>>>>>>>> be reported with JVMTI_ERROR_NULL_POINTER and a bad
>>>>>>>>>> jobject (for instance, a WeakReference with a GC-ed target)
>>>>>>>>>> has to be reported with JVMTI_ERROR_INVALID_OBJECT.
>>>>>>>>>> At least, I was not able to construct a test case to get this
>>>>>>>>>> error code returned.
>>>>>>>>>> So, I'm puzzled with this. I'll try to find some examples with
>>>>>>>>>> JVMTI_ERROR_NULL_POINTER errors.
>>>>>>>>>
>>>>>>>>> Found the explanation.
>>>>>>>>> The JDI file:
>>>>>>>>> src/jdk.jdi/share/classes/com/sun/tools/jdi/JDWPException.java
>>>>>>>>>
>>>>>>>>> has a fragment that translate the INVALID_OBJECT error to the
>>>>>>>>> ObjectCollectedException:
>>>>>>>>>     RuntimeException toJDIException() {
>>>>>>>>>         switch (errorCode) {
>>>>>>>>>             case JDWP.Error.INVALID_OBJECT:
>>>>>>>>>                 return new ObjectCollectedException();
>>>>>>>>>
>>>>>>>>> So, the INVALID_OBJECT is for a jobject handle that is
>>>>>>>>> referencing a collected object.
>>>>>>>>> It means that previous implementation incorrectly returned
>>>>>>>>> JVMTI_ERROR_NULL_POINTER error code.
>>>>>>>>
>>>>>>>> I should create and delete local or global ref to construct a
>>>>>>>> test case for this.
>>>>>>>>
>>>>>>>> Interesting that the JDWPException::toJDIException() does not
>>>>>>>> convert the ILLEGAL_ARGUMENT error code to an
>>>>>>>> IllegalArgumentException.
>>>>>>>> I've just added this conversion.
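
The error handling described in the quoted text above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the JVM TI or JDI source: a handle is modelled with java.lang.ref.WeakReference as a stand-in for a JNI jobject/jweak, and the names HandleErrorSketch, Code, classify, toException and ObjectCollected are hypothetical. The check order is NULL first, then a collected target, then a type check, and the exception mapping mirrors the two cases named in the thread (INVALID_OBJECT and ILLEGAL_ARGUMENT).

```java
import java.lang.ref.WeakReference;

class HandleErrorSketch {
    // Hypothetical stand-ins for the error codes named in the thread.
    enum Code { NONE, NULL_POINTER, INVALID_OBJECT, ILLEGAL_ARGUMENT }

    // Stand-in for com.sun.jdi.ObjectCollectedException, so the sketch
    // does not depend on the jdk.jdi module.
    static class ObjectCollected extends RuntimeException { }

    // Classifies a handle: a null handle, a handle whose target has been
    // collected (or cleared), and a target of the wrong type are three
    // distinct error conditions.
    static Code classify(WeakReference<?> handle, Class<?> requiredType) {
        if (handle == null) {
            return Code.NULL_POINTER;
        }
        Object target = handle.get();
        if (target == null) {
            return Code.INVALID_OBJECT;
        }
        if (!requiredType.isInstance(target)) {
            return Code.ILLEGAL_ARGUMENT;
        }
        return Code.NONE;
    }

    // Maps an error code to an exception; only the two mappings mentioned
    // in the thread are modelled, everything else is treated as unexpected.
    static RuntimeException toException(Code code) {
        switch (code) {
            case INVALID_OBJECT:
                return new ObjectCollected();
            case ILLEGAL_ARGUMENT:
                return new IllegalArgumentException();
            default:
                return new IllegalStateException("unexpected code: " + code);
        }
    }
}
```

The point of keeping the null test separate from the collected-target test is that a single combined check cannot tell the two conditions apart.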
>>>>>>>
>>>>>>> Given the definition of JDWP INVALID_OBJECT then obviously JDI
>>>>>>> converts it to ObjectCollectedException.
>>>>>>>
>>>>>>> So reading further in JNI spec:
>>>>>>>
>>>>>>> "Weak global references are a special kind of global reference.
>>>>>>> Unlike normal global references, a weak global reference allows
>>>>>>> the underlying Java object to be garbage collected. Weak global
>>>>>>> references may be used in any situation where global or local
>>>>>>> references are used."
>>>>>>>
>>>>>>> So it seems that any function that takes a jobject could in fact
>>>>>>> accept a jweak, in which case JVMTI_ERROR_INVALID_OBJECT is a
>>>>>>> possibility in all cases. So IIUC
>>>>>>> JNIHandles::resolve_external_guard can return NULL if a weak
>>>>>>> reference has been collected. So the new code you propose seems
>>>>>>> correct.
>>>>>>
>>>>>> You are right about weak global references.
>>>>>> I was able to construct a test case for JVMTI_ERROR_INVALID_OBJECT.
>>>>>> The JNI NewGlobalRef and DeleteGlobalRef are used for it.
>>>>>> You can find it in the updated webrev version.
>>>>>>
>>>>>>> However, this still is unrelated to the current issue and I do
>>>>>>> not see other JVM TI doing checks for this case. So this seems to
>>>>>>> be a much broader issue.
>>>>>> There are many such checks in JVM TI.
>>>>>> For instance, there are checks like the following in jvmtiEnv.cpp:
>>>>>> NULL_CHECK(o, JVMTI_ERROR_INVALID_OBJECT)
>>>>>
>>>>> Yes but they are incorrect IMO e.g.
>>>>>
>>>>> JvmtiEnv::GetObjectSize(jobject object, jlong* size_ptr) {
>>>>>   oop mirror = JNIHandles::resolve_external_guard(object);
>>>>>   NULL_CHECK(mirror, JVMTI_ERROR_INVALID_OBJECT);
>>>>>
>>>>> The NULL_CHECK will fail if either object is NULL or object is a
>>>>> jweak that has been cleared. In the first case it should report
>>>>> JVMTI_ERROR_NULL_POINTER.
>>>>>
>>>>> The correct pattern is what you proposed with this fix:
>>>>>
>>>>> +   NULL_CHECK(exception, JVMTI_ERROR_NULL_POINTER);
>>>>>     oop e = JNIHandles::resolve_external_guard(exception);
>>>>> +   // the exception must be a valid jobject
>>>>> +   if (e == NULL) {
>>>>> +     return JVMTI_ERROR_INVALID_OBJECT;
>>>>> +   }
>>>>>
>>>>
>>>> I see your point, thanks!
>>>> I'll check these cases and file a bug if necessary.
>>>>
>>>>> Though not sure why you didn't use a second NULL_CHECK
>>>>
>>>> I've already replaced it with:
>>>>    NULL_CHECK(e, JVMTI_ERROR_INVALID_OBJECT);
>>>>
>>>> You, probably, need to refresh the webrev page.
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Serguei
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>>>> test/hotspot/jtreg/vmTestbase/nsk/jvmti/StopThread/stopthrd006/TestDescription.java
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The copyright year should be changed to "2018, 2020,".
>>>>>>>>>> Thank you for the catch.
>>>>>>>>>> I planned to update the copyright comments.
>>>>>>>>>>
>>>>>>>>>>> I'm a little surprised the test doesn't actually check that a
>>>>>>>>>>> valid call doesn't produce an error. But that's an existing
>>>>>>>>>>> quirk of the test and not something you need to address here
>>>>>>>>>>> (if indeed it needs addressing - perhaps there is another
>>>>>>>>>>> test for that).
>>>>>>>>>>
>>>>>>>>>> There are plenty of other nsk.jvmti tests which check valid
>>>>>>>>>> calls.
>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Serguei >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Updated JVM TI StopThread spec: >>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> The old webrev and spec are here: >>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.0/ >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Serguei >>>>>>>>>>>> >>>>>>>>>>>> On 5/27/20 18:03, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>> Hi David, >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 5/27/20 02:00, David Holmes wrote: >>>>>>>>>>>>>> On 27/05/2020 6:36 pm, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 5/27/20 00:47, David Holmes wrote: >>>>>>>>>>>>>>>> Hi Serguei, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 27/05/2020 1:01 pm, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>>>>> Please, review a fix for: >>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234882 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> CSR draft (one CSR reviewer is needed before finalizing >>>>>>>>>>>>>>>>> it): >>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245853 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have some thoughts on the wording which I will add to >>>>>>>>>>>>>>>> the CSR. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you a lot for looking at this! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Also on reflection I think JVMTI_ERROR_ILLEGAL_ARGUMENT >>>>>>>>>>>>>>>> would the best error to use, and it has an equivalent in >>>>>>>>>>>>>>>> JDWP and at the Java level for JDI. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is an interesting variant, thanks! 
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We need to balance on several criteria:
>>>>>>>>>>>>>>>   1) Compatibility: keep returning error as close as
>>>>>>>>>>>>>>> possible to the current spec
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you are adding a new error condition I don't understand
>>>>>>>>>>>>>> what you mean by "close to the current spec" ??
>>>>>>>>>>>>>
>>>>>>>>>>>>> If the JVMTI_ERROR_INVALID_OBJECT is returned then the JDWP
>>>>>>>>>>>>> agent does not need any new error handling.
>>>>>>>>>>>>> The same can be true in the JDI if the JDWP returns the
>>>>>>>>>>>>> same error as it returned before.
>>>>>>>>>>>>> In this case we do not add new error code but extend the
>>>>>>>>>>>>> existing to cover new error condition.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But, in fact (especially, after rethinking), I do not like
>>>>>>>>>>>>> the JVMTI_ERROR_INVALID_OBJECT
>>>>>>>>>>>>> error code as it normally means something different.
>>>>>>>>>>>>> So, let's avoid using it and skip this criterion.
>>>>>>>>>>>>> Then we need new error code to cover new error condition.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   2) Best error naming match between JVM TI and JDI/JDWP
>>>>>>>>>>>>>>>   3) Best practice in errors naming
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If the argument is not a ThreadDeath instance then it is
>>>>>>>>>>>>>> an illegal argument - perfect fit semantically all the
>>>>>>>>>>>>>> specs involved have an "illegal argument" error form.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree with this.
>>>>>>>>>>>>> It is why I like this suggestion. :)
>>>>>>>>>>>>> The JDWP equivalent is: ILLEGAL_ARGUMENT.
>>>>>>>>>>>>> The JDI equivalent is: IllegalArgumentException
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'll prepare and send the update.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> Serguei
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think the #1 is most important but will look at it once
>>>>>>>>>>>>>>> more.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Serguei
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/src/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Updated JVM TI StopThread spec:
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    The JVM TI StopThread method mirrored the functionality of the
>>>>>>>>>>>>>>>>>    java.lang.Thread::stop(Throwable t) method, in that it allows any exception
>>>>>>>>>>>>>>>>>    type to be installed as an asynchronous exception in the target thread.
>>>>>>>>>>>>>>>>>    However, the java.lang.Thread::stop(Throwable t) method was inherently unsafe
>>>>>>>>>>>>>>>>>    and in Java 8 (under JDK-7059085) it was "retired" so that it always threw
>>>>>>>>>>>>>>>>>    UnsupportedOperationException.
>>>>>>>>>>>>>>>>>    The updated JVM TI StopThread spec disallows an arbitrary Throwable from being passed,
>>>>>>>>>>>>>>>>>    and instead restricts the argument to being an instance of ThreadDeath, thus
>>>>>>>>>>>>>>>>>    mirroring the (deprecated but still functional) java.lang.Thread::stop() method.
>>>>>>>>>>>>>>>>>    The error JVMTI_ERROR_INVALID_OBJECT is returned if the exception argument
>>>>>>>>>>>>>>>>>    is not an instance of ThreadDeath.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    Also, I will file similar RFE and CSR on the JDI and JDWP spec.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Testing:
>>>>>>>>>>>>>>>>>    Built docs and checked the doc has been generated as expected.
>>>>>>>>>>>>>>>>>    Will run the nsk.jvmti tests locally.
>>>>>>>>>>>>>>>>>    Will submit hs-tiers1-3 to make sure there are no regressions in the JVM TI and JDI tests.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Serguei
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
>

From daniil.x.titov at oracle.com Wed Jun 3 23:31:04 2020
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Wed, 03 Jun 2020 16:31:04 -0700
Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
In-Reply-To: <26868807-1c9f-8378-c201-5992bff3de20@oracle.com>
References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com>
 <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com>
 <26868807-1c9f-8378-c201-5992bff3de20@oracle.com>
Message-ID:

Hi Alex,

Thanks for this suggestion. You are right, we actually don't need this waitForAllThreads() method.

I will include this change in the new version of the webrev.

> 207 int diff = numNewThreads - numTerminatedThreads;
> 208 long threadCount = mbean.getThreadCount();
> 209 long expectedThreadCount = prevLiveTestThreadCount + diff;
> 210 if (threadCount < expectedThreadCount) {
> if some internal thread terminates, we'll get failure here

The failure will not happen. Please note that prevLiveTestThreadCount counts only *test* threads. Thus even if some Internal threads terminated the value mbean.getThreadCount() returns should still be no less than the expected number of live test threads.

310 prevLiveTestThreadCount = getTestThreadCount();

Best regards,
Daniil

On 6/3/20, 3:08 PM, "Alex Menkov" wrote:

Hi Daniil,

couple notes:

198 waitForThreads(numNewThreads, numTerminatedThreads);

You don't actually need any wait here.
Test cases wait until all threads are in desired state
(checkAllThreadsAlive uses startupCheck, checkDaemonThreadsDead and
checkAllThreadsDead use join())


205 private static void checkLiveThreads(int numNewThreads,
206 int numTerminatedThreads) {
207 int diff = numNewThreads - numTerminatedThreads;
208 long threadCount = mbean.getThreadCount();
209 long expectedThreadCount = prevLiveTestThreadCount + diff;
210 if (threadCount < expectedThreadCount) {

if some internal thread terminates, we'll get failure here


--alex

On 06/02/2020 21:00, Daniil Titov wrote:
> Hi Alex, Serguei, and Martin,
>
> Thank you for your comments. Please review a new version of the fix that addresses them, specifically:
> 1) Replaces a double loop in checkAllThreadsAlive() with a code that uses collections and containsAll() method.
> 2) Restores the checks for other ThreadMXBean methods (getThreadCount(), getTotalStartedThreadCount(), getPeakThreadCount()) but with more relaxed conditions.
> 3) Relaxes the check inside checkThreadIds() method
>
>
> [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.03/
> [2] https://bugs.openjdk.java.net/browse/JDK-8131745
>
> Thank you,
> Daniil
>
> On 6/1/20, 5:06 PM, "Alex Menkov" wrote:
>
> Hi Daniil,
>
> 1. before the fix checkLiveThreads() tested
> ThreadMXBean.getThreadCount(), but now as far as I see it tests
> Thread.getAllStackTraces();
>
> 2.
> 237 private static void checkThreadIds() throws InterruptedException {
> 238 long[] list = mbean.getAllThreadIds();
> 239
> 240 waitTillEquals(
> 241 list.length,
> 242 ()->(long)mbean.getThreadCount(),
> 243 "Array length returned by " +
> 244 "getAllThreadIds() = %1$d not matched count = ${provided}",
> 245 ()->list.length
> 246 );
> 247 }
>
> I suppose purpose of waitTillEquals() is to handle creation/termination
> of VM internal threads.
> But if some internal thread terminates after mbean.getAllThreadIds() and
> before 1st mbean.getThreadCount() call and then VM does not need to
> restart it, waitTillEquals will wait forever.
>
> --alex
>
>
> On 05/29/2020 16:28, Daniil Titov wrote:
> > Hi Alex and Serguei,
> >
> > Please review a new version of the change [1] that makes sure that the test counts
> > only the threads it creates and ignores Internal threads VM might create or destroy.
> >
> > Testing: Running this test in Mach5 with Graal on several hundred times ,
> > tier1-tier3 tests are in progress.
> >
> > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.02/
> > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> >
> > Thank you,
> > Daniil
> >
> > On 5/22/20, 10:26 AM, "Alex Menkov" wrote:
> >
> > Hi Daniil,
> >
> > I'm not sure all this retry logic is a good way.
> > As mentioned in jira the most important part of the testing is ensuring
> > that you find all the created threads when they are alive, and you don't
> > find them when they are dead. The actual thread count checking is not
> > that important.
> > I agree with this and I'd just simplify the test by removing checks for
> > thread count. VM may create and destroy internal threads when it needs it.
> >
> > --alex
> >
> > On 05/18/2020 10:31, Daniil Titov wrote:
> > > Please review the change [1] that fixes an intermittent failure of the test.
> > >
> > > This test creates and destroys a given number of daemon/user threads and validates the count of those started/stopped threads against values returned from ThreadMXBean thread counts. The problem here is that if some internal threads is started ( e.g. " HotSpotGraalManagement Bean Registration"), or destroyed (e.g. "JVMCI CompilerThread ") the test hangs waiting for expected number of live threads.
> > >
> > > The fix limits the time the test is waiting for desired number of live threads and in case if this limit is exceeded the test repeats itself.
> > >
> > > Testing.
> > > Test with Graal on and Mach5 tier1-tier7 test passed.
> > >
> > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.01
> > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> > >
> > > Thank you,
> > > Daniil
> > >
> > >
> > >
> > >
> > >

From alexey.menkov at oracle.com Wed Jun 3 23:42:45 2020
From: alexey.menkov at oracle.com (Alex Menkov)
Date: Wed, 3 Jun 2020 16:42:45 -0700
Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
In-Reply-To:
References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com>
 <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com>
 <26868807-1c9f-8378-c201-5992bff3de20@oracle.com>
Message-ID: <896d46c7-38fc-3cc3-3f06-4154108ac5fb@oracle.com>

Hi again Daniil,

On 06/03/2020 16:31, Daniil Titov wrote:
> Hi Alex,
>
> Thanks for this suggestion. You are right, we actually don't need this waitForAllThreads() method.
>
> I will include this change in the new version of the webrev.
>
>> 207 int diff = numNewThreads - numTerminatedThreads;
>> 208 long threadCount = mbean.getThreadCount();
>> 209 long expectedThreadCount = prevLiveTestThreadCount + diff;
>> 210 if (threadCount < expectedThreadCount) {
>> if some internal thread terminates, we'll get failure here
>
> The failure will not happen. Please note that prevLiveTestThreadCount counts only *test* threads. Thus even if some Internal threads terminated the value mbean.getThreadCount() returns should still be no less than the expected number of live test threads.
>
> 310 prevLiveTestThreadCount = getTestThreadCount();

Oh, yes, I missed it.
LGTM.

--alex

>
> Best regards,
> Daniil
>
>
> On 6/3/20, 3:08 PM, "Alex Menkov" wrote:
>
> Hi Daniil,
>
> couple notes:
>
> 198 waitForThreads(numNewThreads, numTerminatedThreads);
>
> You don't actually need any wait here.
> Test cases wait until all threads are in desired state > (checkAllThreadsAlive uses startupCheck, checkDaemonThreadsDead and > checkAllThreadsDead use join()) > > > 205 private static void checkLiveThreads(int numNewThreads, > 206 int numTerminatedThreads) { > 207 int diff = numNewThreads - numTerminatedThreads; > 208 long threadCount = mbean.getThreadCount(); > 209 long expectedThreadCount = prevLiveTestThreadCount + diff; > 210 if (threadCount < expectedThreadCount) { > > if some internal thread terminates, we'll get failure here > > > --alex > > On 06/02/2020 21:00, Daniil Titov wrote: > > Hi Alex, Serguei, and Martin, > > > > Thank you for your comments. Please review a new version of the fix that addresses them, specifically: > > 1) Replaces a double loop in checkAllThreadsAlive() with a code that uses collections and containsAll() method. > > 2) Restores the checks for other ThreadMXBean methods (getThreadCount(), getTotalStartedThreadCount(), getPeakThreadCount()) but with more relaxed conditions. > > 3) Relaxes the check inside checkThreadIds() method > > > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.03/ > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745 > > > > Thank you, > > Daniil > > > > ?On 6/1/20, 5:06 PM, "Alex Menkov" wrote: > > > > Hi Daniil, > > > > 1. before the fix checkLiveThreads() tested > > ThreadMXBean.getThreadCount(), but now as far as I see it tests > > Thread.getAllStackTraces(); > > > > 2. > > 237 private static void checkThreadIds() throws InterruptedException { > > 238 long[] list = mbean.getAllThreadIds(); > > 239 > > 240 waitTillEquals( > > 241 list.length, > > 242 ()->(long)mbean.getThreadCount(), > > 243 "Array length returned by " + > > 244 "getAllThreadIds() = %1$d not matched count = > > ${provided}", > > 245 ()->list.length > > 246 ); > > 247 } > > > > I suppose purpose of waitTillEquals() is to handle creation/termination > > of VM internal threads. 
> > But if some internal thread terminates after mbean.getAllThreadIds() and > > before 1st mbean.getThreadCount() call and then VM does not need to > > restart it, waitTillEquals will wait forever. > > > > --alex > > > > > > On 05/29/2020 16:28, Daniil Titov wrote: > > > Hi Alex and Serguei, > > > > > > Please review a new version of the change [1] that makes sure that the test counts > > > only the threads it creates and ignores Internal threads VM might create or destroy. > > > > > > Testing: Running this test in Mach5 with Graal on several hundred times , > > > tier1-tier3 tests are in progress. > > > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.02/ > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745 > > > > > > Thank you, > > > Daniil > > > > > > ?On 5/22/20, 10:26 AM, "Alex Menkov" wrote: > > > > > > Hi Daniil, > > > > > > I'm not sure all this retry logic is a good way. > > > As mentioned in jira the most important part of the testing is ensuring > > > that you find all the created threads when they are alive, and you don't > > > find them when they are dead. The actual thread count checking is not > > > that important. > > > I agree with this and I'd just simplify the test by removing checks for > > > thread count. VM may create and destroy internal threads when it needs it. > > > > > > --alex > > > > > > On 05/18/2020 10:31, Daniil Titov wrote: > > > > Please review the change [1] that fixes an intermittent failure of the test. > > > > > > > > This test creates and destroys a given number of daemon/user threads and validates the count of those started/stopped threads against values returned from ThreadMXBean thread counts. The problem here is that if some internal threads is started ( e.g. " HotSpotGraalManagement Bean Registration"), or destroyed (e.g. "JVMCI CompilerThread ") the test hangs waiting for expected number of live threads. 
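
The approach the thread converges on (count only the threads the test itself created, and require the ThreadMXBean count to be no less than that) can be sketched as below. This is a minimal illustration, not the code from AllThreadIds.java: the class TestThreadCountSketch, the name prefix and both method names are hypothetical, and recognizing test threads by name is an assumption made for the sketch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class TestThreadCountSketch {
    // Hypothetical name prefix used to recognize threads created by this sketch.
    static final String PREFIX = "sketch-test-thread-";

    // Counts only live threads whose names carry the sketch's prefix,
    // so VM-internal threads never influence the result.
    static long liveTestThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(PREFIX))
                .count();
    }

    // Starts numThreads daemon threads, then checks that the MXBean count
    // is at least the number of live test threads (a lower bound only).
    static boolean lowerBoundHolds(int numThreads) {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        CountDownLatch started = new CountDownLatch(numThreads);
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        try {
            for (int i = 0; i < numThreads; i++) {
                Thread t = new Thread(() -> {
                    started.countDown();
                    try {
                        release.await();   // stay alive until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }, PREFIX + i);
                t.setDaemon(true);
                t.start();
                threads.add(t);
            }
            started.await();               // all test threads are running
            long testThreads = liveTestThreads();
            long reported = mbean.getThreadCount();
            return testThreads == numThreads && reported >= testThreads;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            release.countDown();           // let the test threads finish
            for (Thread t : threads) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }
}
```

Because the comparison is a lower bound rather than an equality, an internal thread that starts or terminates between the two reads does not make the check fail or wait.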
> > > >
> > > > The fix limits the time the test is waiting for desired number of live threads and in case if this limit is exceeded the test repeats itself.
> > > >
> > > > Testing. Test with Graal on and Mach5 tier1-tier7 test passed.
> > > >
> > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.01
> > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> > > >
> > > > Thank you,
> > > > Daniil
> > > >
> > > >
> > > >
> > > >
> > > >

From serguei.spitsyn at oracle.com Thu Jun 4 00:09:36 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Wed, 3 Jun 2020 17:09:36 -0700
Subject: RFR(XS): 8234882: JVM TI StopThread should only allow ThreadDeath
In-Reply-To:
References: <12cd04f9-c3f9-654f-fff2-1c4e315b6eeb@oracle.com>
 <3feb9c3f-4f61-f4b7-160f-c6b328305111@oracle.com>
 <40f21609-f086-722a-1af4-3f281c9b8963@oracle.com>
 <7b272791-4c47-27b0-9313-391a9e620295@oracle.com>
 <38db06ac-6e4e-029a-9376-ee577afe64d7@oracle.com>
 <2ce42985-9325-1c74-fa8d-c2a5049ec011@oracle.com>
 <0f1ec272-4410-f7e5-1c11-1238c0079b00@oracle.com>
 <3120b170-8d0f-7915-7224-f44523bdae6e@oracle.com>
 <586c3878-d175-2f8e-6ce8-95a187965de6@oracle.com>
 <2586bb75-f560-f905-1937-b778b7faba59@oracle.com>
 <6ebc70ce-787d-7f13-66f4-14ad8c8102d6@oracle.com>
 <25f4a64a-10ca-2695-6748-ccd24d84ef22@oracle.com>
 <1190375b-d7da-47c4-61d4-121f4d0ba33a@oracle.com>
 <5acfaf13-b3c1-d854-28a7-378e0bb5926e@oracle.com>
Message-ID:

On 6/3/20 16:10, David Holmes wrote:
> Hi Serguei,
>
> On 4/06/2020 6:41 am, serguei.spitsyn at oracle.com wrote:
>> Hi David,
>>
>> The JetBrains confirmed:
>>    Ability to select the exception is a valuable feature they provide.
>>    Throwing only ThreadDeath is almost useless.
>>
>> So, should I close this and related JDI/JDWP enhancements as WNF?
>
> Yes. Sorry about the wasted work here.

No problem, David.
Thanks!
Serguei > > Thanks, > David > ----- > >> Thanks, >> Serguei >> >> >> On 6/1/20 08:30, serguei.spitsyn at oracle.com wrote: >>> Hi David, >>> >>> I'll check with JetBrains on this. >>> Thank you to Dan and you for raising this concern. >>> The JetBrains use case you posted in the CSR looks like valid and >>> useful. >>> >>> Thanks, >>> Serguei >>> >>> >>> On 6/1/20 00:46, David Holmes wrote: >>>> Hi Serguei, >>>> >>>> Sorry, I think we have to re-think this change. As Dan flags in the >>>> CSR request debuggers directly expose this API as part of the >>>> debugger interface, so any change here will directly impact those >>>> tools. At a minimum I think we would need to consult with the tool >>>> developers about the impact of making this change, as well as >>>> whether it makes any practical difference in the sense that there >>>> may be other (less convenient but still available) mechanisms to >>>> achieve the same goal in a debugger or agent. >>>> >>>> David >>>> >>>> On 31/05/2020 5:50 pm, serguei.spitsyn at oracle.com wrote: >>>>> Hi David, >>>>> >>>>> Also jumping to end. >>>>> >>>>> On 5/30/20 06:50, David Holmes wrote: >>>>>> Hi Serguei, >>>>>> >>>>>> Jumping to the end for now ... >>>>>> >>>>>> On 30/05/2020 5:50 am, serguei.spitsyn at oracle.com wrote: >>>>>>> Hi David and reviewers, >>>>>>> >>>>>>> The updated webrev version is: >>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.2/src/ >>>>>>> >>>>>>> >>>>>>> This update adds testing that StopThread can return >>>>>>> JVMTI_ERROR_INVALID_OBJECT error code. >>>>>>> >>>>>>> The JVM TI StopThread spec is: >>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.2/docs/specs/jvmti.html#StopThread >>>>>>> >>>>>>> >>>>>>> >>>>>>> There is a couple of comments below. 
>>>>>>>
>>>>>>>
>>>>>>> On 5/29/20 06:18, David Holmes wrote:
>>>>>>>> On 29/05/2020 6:24 pm, serguei.spitsyn at oracle.com wrote:
>>>>>>>>> On 5/29/20 00:56, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>> On 5/29/20 00:42, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>> Hi David,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for reviewing this!
>>>>>>>>>>>
>>>>>>>>>>> On 5/28/20 23:57, David Holmes wrote:
>>>>>>>>>>>> Hi Serguei,
>>>>>>>>>>>>
>>>>>>>>>>>> On 28/05/2020 3:12 pm, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've updated the CSR and webrev in place.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The changes are:
>>>>>>>>>>>>>   - addressed David's suggestion to rephrase StopThread description change
>>>>>>>>>>>>>   - replaced JVMTI_ERROR_INVALID_OBJECT with JVMTI_ERROR_ILLEGAL_ARGUMENT
>>>>>>>>>>>>>   - updated the implementation in jvmtiEnv.cpp to return JVMTI_ERROR_ILLEGAL_ARGUMENT
>>>>>>>>>>>>>   - updated one of the nsk.jvmti StopThread tests to check error case with the JVMTI_ERROR_ILLEGAL_ARGUMENT
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm reposting the links for convenience.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Enhancement:
>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234882
>>>>>>>>>>>>>
>>>>>>>>>>>>> CSR draft:
>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245853
>>>>>>>>>>>>
>>>>>>>>>>>> Spec updates are good - thanks.
>>>>>>>>>>>
>>>>>>>>>>> Thank you for the CSR review.
>>>>>>>>>>>
>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/src/
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> src/hotspot/share/prims/jvmtiEnv.cpp
>>>>>>>>>>>>
>>>>>>>>>>>> The ThreadDeath check is fine but I'm a bit confused about
>>>>>>>>>>>> the additional null check that leads to
>>>>>>>>>>>> JVMTI_ERROR_INVALID_OBJECT. I can't see how
>>>>>>>>>>>> resolve_external_guard can return NULL when not passed in
>>>>>>>>>>>> NULL.
>>>>>>>>>>>> Nor why that would result in
>>>>>>>>>>>> JVMTI_ERROR_INVALID_OBJECT rather than
>>>>>>>>>>>> JVMTI_ERROR_NULL_POINTER. And I note
>>>>>>>>>>>> JVMTI_ERROR_NULL_POINTER is not even a listed error for
>>>>>>>>>>>> StopThread! This part of the change seems unrelated to this
>>>>>>>>>>>> issue.
>>>>>>>>>>>
>>>>>>>>>>> I was also surprised with the JVMTI_ERROR_NULL_POINTER and
>>>>>>>>>>> JVMTI_ERROR_INVALID_OBJECT error codes.
>>>>>>>>>>> The JVM TI spec automatic generation adds these two error
>>>>>>>>>>> codes for a jobject parameter.
>>>>>>>>>>>
>>>>>>>>>>> Also, they both are from the Universal Errors section:
>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#universal-error
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> You can find a link to this section at the start of the
>>>>>>>>>>> Error section:
>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> My understanding (not sure, it is right) is that NULL has to
>>>>>>>>>>> be reported with JVMTI_ERROR_NULL_POINTER and a bad
>>>>>>>>>>> jobject (for instance, a WeakReference with a GC-ed target)
>>>>>>>>>>> has to be reported with JVMTI_ERROR_INVALID_OBJECT.
>>>>>>>>>>> At least, I was not able to construct a test case to get
>>>>>>>>>>> this error code returned.
>>>>>>>>>>> So, I'm puzzled with this. I'll try to find some examples
>>>>>>>>>>> with JVMTI_ERROR_NULL_POINTER errors.
>>>>>>>>>>
>>>>>>>>>> Found the explanation.
>>>>>>>>>> The JDI file:
>>>>>>>>>> src/jdk.jdi/share/classes/com/sun/tools/jdi/JDWPException.java
>>>>>>>>>>
>>>>>>>>>> has a fragment that translate the INVALID_OBJECT error to the
>>>>>>>>>> ObjectCollectedException:
>>>>>>>>>>     RuntimeException toJDIException() {
>>>>>>>>>>         switch (errorCode) {
>>>>>>>>>>             case JDWP.Error.INVALID_OBJECT:
>>>>>>>>>>                 return new ObjectCollectedException();
>>>>>>>>>>
>>>>>>>>>> So, the INVALID_OBJECT is for a jobject handle that is
>>>>>>>>>> referencing a collected object.
>>>>>>>>>> It means that previous implementation incorrectly returned
>>>>>>>>>> JVMTI_ERROR_NULL_POINTER error code.
>>>>>>>>>
>>>>>>>>> I should create and delete local or global ref to construct a
>>>>>>>>> test case for this.
>>>>>>>>>
>>>>>>>>> Interesting that the JDWPException::toJDIException() does not
>>>>>>>>> convert the ILLEGAL_ARGUMENT error code to an
>>>>>>>>> IllegalArgumentException.
>>>>>>>>> I've just added this conversion.
>>>>>>>>
>>>>>>>> Given the definition of JDWP INVALID_OBJECT then obviously JDI
>>>>>>>> converts it to ObjectCollectedException.
>>>>>>>>
>>>>>>>> So reading further in JNI spec:
>>>>>>>>
>>>>>>>> "Weak global references are a special kind of global reference.
>>>>>>>> Unlike normal global references, a weak global reference allows
>>>>>>>> the underlying Java object to be garbage collected. Weak global
>>>>>>>> references may be used in any situation where global or local
>>>>>>>> references are used."
>>>>>>>>
>>>>>>>> So it seems that any function that takes a jobject could in
>>>>>>>> fact accept a jweak, in which case JVMTI_ERROR_INVALID_OBJECT
>>>>>>>> is a possibility in all cases. So IIUC
>>>>>>>> JNIHandles::resolve_external_guard can return NULL if a weak
>>>>>>>> reference has been collected. So the new code you propose seems
>>>>>>>> correct.
>>>>>>>
>>>>>>> You are right about weak global references.
>>>>>>> I was able to construct a test case for JVMTI_ERROR_INVALID_OBJECT.
>>>>>>> The JNI NewGlobalRef and DeleteGlobalRef are used for it.
>>>>>>> You can find it in the updated webrev version.
>>>>>>>
>>>>>>>> However, this still is unrelated to the current issue and I do
>>>>>>>> not see other JVM TI doing checks for this case. So this seems
>>>>>>>> to be a much broader issue.
>>>>>>> There are many such checks in JVM TI.
>>>>>>> For instance, there are checks like the following in jvmtiEnv.cpp:
>>>>>>> NULL_CHECK(o, JVMTI_ERROR_INVALID_OBJECT)
>>>>>>
>>>>>> Yes but they are incorrect IMO e.g.
>>>>>>
>>>>>> JvmtiEnv::GetObjectSize(jobject object, jlong* size_ptr) {
>>>>>>   oop mirror = JNIHandles::resolve_external_guard(object);
>>>>>>   NULL_CHECK(mirror, JVMTI_ERROR_INVALID_OBJECT);
>>>>>>
>>>>>> The NULL_CHECK will fail if either object is NULL or object is a
>>>>>> jweak that has been cleared. In the first case it should report
>>>>>> JVMTI_ERROR_NULL_POINTER.
>>>>>>
>>>>>> The correct pattern is what you proposed with this fix:
>>>>>>
>>>>>> +   NULL_CHECK(exception, JVMTI_ERROR_NULL_POINTER);
>>>>>>     oop e = JNIHandles::resolve_external_guard(exception);
>>>>>> +   // the exception must be a valid jobject
>>>>>> +   if (e == NULL) {
>>>>>> +     return JVMTI_ERROR_INVALID_OBJECT;
>>>>>> +   }
>>>>>>
>>>>>
>>>>> I see your point, thanks!
>>>>> I'll check these cases and file a bug if necessary.
>>>>>
>>>>>> Though not sure why you didn't use a second NULL_CHECK
>>>>>
>>>>> I've already replaced it with:
>>>>>    NULL_CHECK(e, JVMTI_ERROR_INVALID_OBJECT);
>>>>>
>>>>> You probably need to refresh the webrev page.
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Serguei
>>>>>>>>>>
>>>>>>>>>>>> test/hotspot/jtreg/vmTestbase/nsk/jvmti/StopThread/stopthrd006/TestDescription.java
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The copyright year should be changed to "2018, 2020,".
>>>>>>>>>>> Thank you for the catch.
>>>>>>>>>>> I planned to update the copyright comments.
>>>>>>>>>>>
>>>>>>>>>>>> I'm a little surprised the test doesn't actually check that
>>>>>>>>>>>> a valid call doesn't produce an error.
But that's an >>>>>>>>>>>> existing quirk of the test and not something you need to >>>>>>>>>>>> address here (if indeed it needs addressing - perhaps there >>>>>>>>>>>> is another test for that). >>>>>>>>>>> >>>>>>>>>>> There are plenty of other nsk.jvmti tests which check valid >>>>>>>>>>> calls. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Serguei >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Updated JVM TI StopThread spec: >>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> The old webrev and spec are here: >>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.0/ >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Serguei >>>>>>>>>>>>> >>>>>>>>>>>>> On 5/27/20 18:03, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 5/27/20 02:00, David Holmes wrote: >>>>>>>>>>>>>>> On 27/05/2020 6:36 pm, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 5/27/20 00:47, David Holmes wrote: >>>>>>>>>>>>>>>>> Hi Serguei, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 27/05/2020 1:01 pm, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>>>>>> Please, review a fix for: >>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234882 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> CSR draft (one CSR reviewer is needed before >>>>>>>>>>>>>>>>>> finalizing it): >>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245853 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have some thoughts on the wording which I will add >>>>>>>>>>>>>>>>> to the CSR. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you a lot for looking at this! 
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Also on reflection I think
>>>>>>>>>>>>>>>>> JVMTI_ERROR_ILLEGAL_ARGUMENT would be the best error to
>>>>>>>>>>>>>>>>> use, and it has an equivalent in JDWP and at the Java
>>>>>>>>>>>>>>>>> level for JDI.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is an interesting variant, thanks!
>>>>>>>>>>>>>>>> We need to balance on several criteria:
>>>>>>>>>>>>>>>>   1) Compatibility: keep returning error as close as
>>>>>>>>>>>>>>>> possible to the current spec
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you are adding a new error condition I don't
>>>>>>>>>>>>>>> understand what you mean by "close to the current spec" ??
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If JVMTI_ERROR_INVALID_OBJECT is returned then the
>>>>>>>>>>>>>> JDWP agent does not need any new error handling.
>>>>>>>>>>>>>> The same can be true in the JDI if the JDWP returns the
>>>>>>>>>>>>>> same error as it returned before.
>>>>>>>>>>>>>> In this case we do not add a new error code but extend the
>>>>>>>>>>>>>> existing one to cover the new error condition.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But, in fact (especially after rethinking), I do not
>>>>>>>>>>>>>> like the JVMTI_ERROR_INVALID_OBJECT
>>>>>>>>>>>>>> error code as it normally means something different.
>>>>>>>>>>>>>> So, let's avoid using it and skip this criterion.
>>>>>>>>>>>>>> Then we need a new error code to cover the new error condition.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   2) Best error naming match between JVM TI and JDI/JDWP
>>>>>>>>>>>>>>>>   3) Best practice in errors naming
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If the argument is not a ThreadDeath instance then it is
>>>>>>>>>>>>>>> an illegal argument - perfect fit semantically all the
>>>>>>>>>>>>>>> specs involved have an "illegal argument" error form.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree with this.
>>>>>>>>>>>>>> It is why I like this suggestion. :)
>>>>>>>>>>>>>> The JDWP equivalent is: ILLEGAL_ARGUMENT.
>>>>>>>>>>>>>> The JDI equivalent is: IllegalArgumentException >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'll prepare and send the update. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think the #1 is most important but will look at it >>>>>>>>>>>>>>>> once more. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/src/ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Updated JVM TI StopThread spec: >>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-stop-thread.1/docs/specs/jvmti.html#StopThread >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Summary: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ?? The JVM TI StopThread method mirrored the >>>>>>>>>>>>>>>>>> functionality of the >>>>>>>>>>>>>>>>>> java.lang.Thread::stop(Throwable t) method, in that >>>>>>>>>>>>>>>>>> it allows any exception >>>>>>>>>>>>>>>>>> ?? type to be installed as an asynchronous exception >>>>>>>>>>>>>>>>>> in the target thread. >>>>>>>>>>>>>>>>>> ?? However, the java.lang.Thread::stop(Throwable t) >>>>>>>>>>>>>>>>>> method was inherently unsafe >>>>>>>>>>>>>>>>>> ?? and in Java 8 (under JDK-7059085) it was "retired" >>>>>>>>>>>>>>>>>> so that it always threw >>>>>>>>>>>>>>>>>> UnsupportedOperationException. >>>>>>>>>>>>>>>>>> ?? The updated JVM TI StopThread spec disallows an >>>>>>>>>>>>>>>>>> arbitrary Throwable from being passed, >>>>>>>>>>>>>>>>>> ?? and instead restricts the argument to being an >>>>>>>>>>>>>>>>>> instance of ThreadDeath, thus >>>>>>>>>>>>>>>>>> ?? mirroring the (deprecated but still functional) >>>>>>>>>>>>>>>>>> java.lang.Thread::stop() method. >>>>>>>>>>>>>>>>>> ?? 
The error JVMTI_ERROR_INVALID_OBJECT is returned >>>>>>>>>>>>>>>>>> if the exception argument >>>>>>>>>>>>>>>>>> ?? is not an instance of ThreadDeath. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ?? Also, I will file similar RFE and CSR on the JDI >>>>>>>>>>>>>>>>>> and JDWP spec. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Testing: >>>>>>>>>>>>>>>>>> ?? Built docs and checked the doc has been generated >>>>>>>>>>>>>>>>>> as expected. >>>>>>>>>>>>>>>>>> ?? Will run the nsk.jvmti tests locally. >>>>>>>>>>>>>>>>>> ?? Will submit hs-tiers1-3 to make sure there are no >>>>>>>>>>>>>>>>>> regressions in the JVM TI and JDI tests. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> >> From igor.ignatyev at oracle.com Thu Jun 4 01:05:07 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 3 Jun 2020 18:05:07 -0700 Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property In-Reply-To: <4584c046-ed5b-e1b9-f16b-3d4383cf1001@oracle.com> References: <4584c046-ed5b-e1b9-f16b-3d4383cf1001@oracle.com> Message-ID: <5430D545-BE0C-4022-9468-D6EAFF7BAC78@oracle.com> Hi David, > So all such tests should be using driver mode, and further the VMs they then exec don't use any of the APIs that include the jtreg test arguments. correct, and 8151707's subtasks are going to mark only such tests (and tests which should be using driver-mode, but can't due to external factors, remember these follow-up fixes for my use driver-mode? ;) ). there are two more (a bit controversial) use cases where we can consider usage of vm.flagless: - some of debugger-debuggee tests have debugger executed w/ external flags, but don't pass these flags to debuggee; and in most cases, it doesn't seem to be right, so arguable all such tests should be updated to use driver mode to run debugger and then marked w/ vm.flagless. 
I know that svc team was doing some cleanup in this area recently, and given it's require more investigation w.r.t the tests' intent, I don't plan to do it as a part of 8151707, and instead will create follow up RFEs/tasks. - a unit-like tests which don't ignore flags, but weren't designed to be run w/ external flags; most of jfr tests can be used as an example: you can run w/ any flags, but they might fail as they assert things which happen only in certain configurations and these configurations are written in jtreg test descriptions. currently, these tests are marked w/ jfr k/w and it's advised not to run them w/ any external flags, yet I know that some people successfully do that to test their configurations. given the set of configurations which satisfies needs of jfr tests is much bigger than the configurations listed in the tests, I kinda feel sympathetic to people doing that, on the other hand, it's unsupported and I'd prefer us to express (and enforce) that more clearly. again, given the possible controversiality and need for a broader discussion, I'm planning to file an issue for jfr tests and follow up later w/ interested parties. to sum up, 8151707's subtasks are going to mark *only* obvious and non-controversial cases. for all other cases, the JBS entries are to be filed and followed up on. Cheers, -- Igor > On Jun 3, 2020, at 4:02 PM, David Holmes wrote: > > Hi Igor, > > On 4/06/2020 7:30 am, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >>> 70 lines changed: 66 ins; 0 del; 4 mod >> Hi all, >> could you please review the patch which introduces a new @requires property to filter out the tests which ignore externally provided JVM flags? 
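A minimal sketch of the vm.flagless rule discussed in this thread, written only to make the stated rule concrete. It assumes the VM options arrive as a plain list of strings; the class and method names (FlaglessSketch, isFlagless) and the forcedByEnv parameter, which stands in for the TEST_VM_FLAGLESS environment variable being set, are hypothetical and are not taken from the webrev.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the VMProps code from the webrev.
final class FlaglessSketch {
    // -XX flags the thread describes as "known to be set almost always".
    private static final Set<String> IGNORED_XX =
            Set.of("MaxRAMPercentage", "CreateCoredumpOnCrash");
    // The only -X flag the thread says is tolerated.
    private static final Set<String> IGNORED_X = Set.of("-Xmixed");

    // Returns true when no "interesting" -X / -XX flag is present,
    // or when the property is forced (assumed: TEST_VM_FLAGLESS is set).
    static boolean isFlagless(List<String> vmOptions, boolean forcedByEnv) {
        if (forcedByEnv) {
            return true;
        }
        for (String opt : vmOptions) {
            if (opt.startsWith("-XX:")) {
                String name = opt.substring("-XX:".length());
                // Drop a leading +/- (boolean flags) and any "=value" suffix.
                if (name.startsWith("+") || name.startsWith("-")) {
                    name = name.substring(1);
                }
                int eq = name.indexOf('=');
                if (eq >= 0) {
                    name = name.substring(0, eq);
                }
                if (!IGNORED_XX.contains(name)) {
                    return false;
                }
            } else if (opt.startsWith("-X")) {
                if (!IGNORED_X.contains(opt)) {
                    return false;
                }
            }
            // Anything else (e.g. -D properties) is not considered here.
        }
        return true;
    }
}
```

Under these assumptions, a run with only -XX:MaxRAMPercentage=25 and -Xmixed would still count as flagless, while adding -Xcomp or -XX:+UseG1GC would not.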
>> the idea behind this patch is to have a way to clearly mark tests which ignore flags, so >> a) it's obvious that they don't execute a flag-guarded code/feature, and extra care should be taken to use them to verify any flag-guarded changed; >> b) they can be easily excluded from runs w/ flags. > > So all such tests should be using driver mode, and further the VMs they then exec don't use any of the APIs that include the jtreg test arguments. > > Okay this seems reasonable in what it does. > > Thanks, > David > >> @requires and VMProps allows us to achieve both, so it's been decided to add a new property `vm.flagless`. `vm.flagless` is set to false if there are any XX flags other than `-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or any X flags other `-Xmixed`; in other words any tests w/ `@requires vm.flagless` will be excluded from runs w/ any other X / XX flags passed via `-vmoption` / `-javaoption`. in rare cases, when one still wants to run the tests marked by `vm.flagless` w/ external flags, `vm.flagless` can be forcefully set to true by setting any value to `TEST_VM_FLAGLESS` env. variable. >> this patch adds necessary common changes and marks common tests, namely Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific tests will be marked separately by the corresponding subtasks of 8151707[1]. >> please note, the patch depends on CODETOOLS-7902336[2], which will be included in the next jtreg version, so this patch is to be integrated only after jtreg5.1 is promoted and we switch to use it by 8246387[3]. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8246494 >> webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >> testing: marked tests w/ different XX and X flags w/ and w/o TEST_VM_FLAGLESS env. 
var, and w/o any flags >> [1] https://bugs.openjdk.java.net/browse/JDK-8151707 >> [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336 >> [3] https://bugs.openjdk.java.net/browse/JDK-8246387 >> Thanks, >> -- Igor From serguei.spitsyn at oracle.com Thu Jun 4 02:07:20 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 3 Jun 2020 19:07:20 -0700 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: Hi Richard, The mach5 test run is good. Thanks, Serguei On 6/2/20 10:57, Reingruber, Richard wrote: > Hi Serguei, > >> This looks good to me. > Thanks! > > From an earlier mail: > >> I'm thinking it would be more safe to run full tier5. > I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would > like to push. > > Thanks, Richard. > > -----Original Message----- > From: serguei.spitsyn at oracle.com > Sent: Dienstag, 2. Juni 2020 18:55 > To: Vladimir Kozlov ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > This looks good to me. > > Thanks, > Serguei > > > On 5/28/20 09:02, Vladimir Kozlov wrote: >> Vladimir Ivanov is on break currently. >> It looks good to me. >> >> Thanks, >> Vladimir K >> >> On 5/26/20 7:31 AM, Reingruber, Richard wrote: >>> Hi Vladimir, >>> >>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>> Not an expert in JVMTI code base, so can't comment on the actual >>>> changes. >>>> ? 
From JIT-compilers perspective it looks good. >>> I put out webrev.1 a while ago [1]: >>> >>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >>> Webrev(delta): >>> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >>> >>> You originally suggested to use a handshake to switch a thread into >>> interpreter mode [2]. I'm using >>> a direct handshake now, because I think it is the best fit. >>> >>> May I ask if webrev.1 still looks good to you from JIT-compilers >>> perspective? >>> >>> Can I list you as (partial) Reviewer? >>> >>> Thanks, Richard. >>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >>> [2] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >>> >>> -----Original Message----- >>> From: Vladimir Ivanov >>> Sent: Freitag, 7. Februar 2020 09:19 >>> To: Reingruber, Richard ; >>> serviceability-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S) 8238585: Use handshake for >>> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >>> compiled methods on stack not_entrant >>> >>> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>> Not an expert in JVMTI code base, so can't comment on the actual >>> changes. >>> >>> ? From JIT-compilers perspective it looks good. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>>> >>>> The change avoids making all compiled methods on stack not_entrant >>>> when switching a java thread to >>>> interpreter only execution for jvmti purposes. It is sufficient to >>>> deoptimize the compiled frames on stack. >>>> >>>> Additionally a handshake is used instead of a vm operation to walk >>>> the stack and do the deoptimizations. >>>> >>>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>>> release builds on all platforms. >>>> >>>> Thanks, Richard. 
>>>> >>>> See also my question if anyone knows a reason for making the >>>> compiled methods not_entrant: >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>>> >>>> From daniil.x.titov at oracle.com Thu Jun 4 03:42:57 2020 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Wed, 03 Jun 2020 20:42:57 -0700 Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently In-Reply-To: <896d46c7-38fc-3cc3-3f06-4154108ac5fb@oracle.com> References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com> <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com> <26868807-1c9f-8378-c201-5992bff3de20@oracle.com> <896d46c7-38fc-3cc3-3f06-4154108ac5fb@oracle.com> Message-ID: <09505C40-A14F-44D0-99CB-F72D0DC914FD@oracle.com> Hi Alex, Please review a new version of the webrev [1] that no longer uses waitTillEquals() method. [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.04/ [2] https://bugs.openjdk.java.net/browse/JDK-8131745 Thank you, Daniil ?On 6/3/20, 4:42 PM, "Alex Menkov" wrote: Hi again Daniil, On 06/03/2020 16:31, Daniil Titov wrote: > Hi Alex, > > Thanks for this suggestion. You are right, we actually don't need this waitForAllThreads() method. > > I will include this change in the new version of the webrev. > >> 207 int diff = numNewThreads - numTerminatedThreads; >> 208 long threadCount = mbean.getThreadCount(); >> 209 long expectedThreadCount = prevLiveTestThreadCount + diff; >> 210 if (threadCount < expectedThreadCount) { >> if some internal thread terminates, we'll get failure here > > The failure will not happen. Please note that prevLiveTestThreadCount counts only *test* threads. Thus even if some Internal threads terminated the value mbean.getThreadCount() returns should still be no less than the expected number of live test threads. > > 310 prevLiveTestThreadCount = getTestThreadCount(); Oh, yes, I missed it. LGTM. 
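The lower-bound check discussed above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the webrev: the class and method names (LiveThreadCheckSketch, liveCountIsPlausible) are hypothetical, and only ThreadMXBean.getThreadCount() is a real API. The point is that the expected value counts test threads only, so internal VM threads coming or going cannot push the reported count below it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only; not the AllThreadIds test itself.
final class LiveThreadCheckSketch {

    // The reported live count includes VM-internal threads, so it is only
    // required to be no less than the number of live *test* threads.
    static boolean liveCountIsPlausible(long reportedLiveThreads,
                                        int prevLiveTestThreads,
                                        int numNewThreads,
                                        int numTerminatedThreads) {
        long expectedTestThreads =
                (long) prevLiveTestThreads + numNewThreads - numTerminatedThreads;
        return reportedLiveThreads >= expectedTestThreads;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        int numNewThreads = 3;
        CountDownLatch started = new CountDownLatch(numNewThreads);
        CountDownLatch release = new CountDownLatch(1);

        for (int i = 0; i < numNewThreads; i++) {
            Thread t = new Thread(() -> {
                started.countDown();          // signal that this thread is running
                try {
                    release.await();          // stay alive until the check is done
                } catch (InterruptedException ignored) {
                    // exit quietly
                }
            });
            t.setDaemon(true);
            t.start();
        }

        started.await();                      // all test threads are alive now
        boolean ok = liveCountIsPlausible(mbean.getThreadCount(), 0, numNewThreads, 0);
        release.countDown();
        System.out.println("live thread count plausible: " + ok);
    }
}
```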
--alex > > Best regards, > Daniil > > > ?On 6/3/20, 3:08 PM, "Alex Menkov" wrote: > > Hi Daniil, > > couple notes: > > 198 waitForThreads(numNewThreads, numTerminatedThreads); > > You don't actually need any wait here. > Test cases wait until all threads are in desired state > (checkAllThreadsAlive uses startupCheck, checkDaemonThreadsDead and > checkAllThreadsDead use join()) > > > 205 private static void checkLiveThreads(int numNewThreads, > 206 int numTerminatedThreads) { > 207 int diff = numNewThreads - numTerminatedThreads; > 208 long threadCount = mbean.getThreadCount(); > 209 long expectedThreadCount = prevLiveTestThreadCount + diff; > 210 if (threadCount < expectedThreadCount) { > > if some internal thread terminates, we'll get failure here > > > --alex > > On 06/02/2020 21:00, Daniil Titov wrote: > > Hi Alex, Serguei, and Martin, > > > > Thank you for your comments. Please review a new version of the fix that addresses them, specifically: > > 1) Replaces a double loop in checkAllThreadsAlive() with a code that uses collections and containsAll() method. > > 2) Restores the checks for other ThreadMXBean methods (getThreadCount(), getTotalStartedThreadCount(), getPeakThreadCount()) but with more relaxed conditions. > > 3) Relaxes the check inside checkThreadIds() method > > > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.03/ > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745 > > > > Thank you, > > Daniil > > > > ?On 6/1/20, 5:06 PM, "Alex Menkov" wrote: > > > > Hi Daniil, > > > > 1. before the fix checkLiveThreads() tested > > ThreadMXBean.getThreadCount(), but now as far as I see it tests > > Thread.getAllStackTraces(); > > > > 2. 
> > 237 private static void checkThreadIds() throws InterruptedException { > > 238 long[] list = mbean.getAllThreadIds(); > > 239 > > 240 waitTillEquals( > > 241 list.length, > > 242 ()->(long)mbean.getThreadCount(), > > 243 "Array length returned by " + > > 244 "getAllThreadIds() = %1$d not matched count = > > ${provided}", > > 245 ()->list.length > > 246 ); > > 247 } > > > > I suppose purpose of waitTillEquals() is to handle creation/termination > > of VM internal threads. > > But if some internal thread terminates after mbean.getAllThreadIds() and > > before 1st mbean.getThreadCount() call and then VM does not need to > > restart it, waitTillEquals will wait forever. > > > > --alex > > > > > > On 05/29/2020 16:28, Daniil Titov wrote: > > > Hi Alex and Serguei, > > > > > > Please review a new version of the change [1] that makes sure that the test counts > > > only the threads it creates and ignores Internal threads VM might create or destroy. > > > > > > Testing: Running this test in Mach5 with Graal on several hundred times , > > > tier1-tier3 tests are in progress. > > > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.02/ > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745 > > > > > > Thank you, > > > Daniil > > > > > > ?On 5/22/20, 10:26 AM, "Alex Menkov" wrote: > > > > > > Hi Daniil, > > > > > > I'm not sure all this retry logic is a good way. > > > As mentioned in jira the most important part of the testing is ensuring > > > that you find all the created threads when they are alive, and you don't > > > find them when they are dead. The actual thread count checking is not > > > that important. > > > I agree with this and I'd just simplify the test by removing checks for > > > thread count. VM may create and destroy internal threads when it needs it. > > > > > > --alex > > > > > > On 05/18/2020 10:31, Daniil Titov wrote: > > > > Please review the change [1] that fixes an intermittent failure of the test. 
> > > > > > > > This test creates and destroys a given number of daemon/user threads and validates the count of those started/stopped threads against values returned from ThreadMXBean thread counts. The problem here is that if some internal threads is started ( e.g. " HotSpotGraalManagement Bean Registration"), or destroyed (e.g. "JVMCI CompilerThread ") the test hangs waiting for expected number of live threads. > > > > > > > > The fix limits the time the test is waiting for desired number of live threads and in case if this limit is exceeded the test repeats itself. > > > > > > > > Testing. Test with Graal on and Mach5 tier1-tier7 test passed. > > > > > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.01 > > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745 > > > > > > > > Thank you, > > > > Daniil > > > > > > > > > > > > > > > > > > > > > > > > From serguei.spitsyn at oracle.com Thu Jun 4 03:57:30 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 3 Jun 2020 20:57:30 -0700 Subject: RFR (XS): 8196450: Deprecate JDWP/JDI canUnrestrictedlyRedefineClasses to match JVM TI capabilities Message-ID: <4e4fc237-f3ad-f236-ac59-01875ce7ca8f@oracle.com> An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Thu Jun 4 07:20:48 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Thu, 4 Jun 2020 00:20:48 -0700 Subject: RFR(S): 8245321: refactor the redefine check that an attribute consisting of a list of classes has not changed Message-ID: <43c35bf6-6d57-aa10-cf9e-0ca5d7114247@oracle.com> An HTML attachment was scrubbed... 
URL:
From harold.seigel at oracle.com Thu Jun 4 14:09:04 2020
From: harold.seigel at oracle.com (Harold Seigel)
Date: Thu, 4 Jun 2020 10:09:04 -0400
Subject: RFR(S): 8245321: refactor the redefine check that an attribute consisting of a list of classes has not changed
In-Reply-To: <43c35bf6-6d57-aa10-cf9e-0ca5d7114247@oracle.com>
References: <43c35bf6-6d57-aa10-cf9e-0ca5d7114247@oracle.com>
Message-ID:

Hi Serguei,

The change looks good.  Could you add a comment to
check_attribute_arrays() saying that its caller should have a
ResourceMark?

Also, I think that the log_trace arguments at line 724 are in the wrong
order.  attr_name should be after the_class->external_name().

I don't need to see a new webrev.

Thanks, Harold

On 6/4/2020 3:20 AM, serguei.spitsyn at oracle.com wrote:
> Please, review a fix for:
> https://bugs.openjdk.java.net/browse/JDK-8245321
>
> Webrev:
> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef-refact.1/
>
>
> Summary:
>   The jvmtiRedefineClasses.cpp functions check_nest_attributes and
>   check_permitted_subclasses_attribute have significant common part.
>   This fix is a refactoring which implements this common part into
>   the function check_attribute_arrays. And this function is used in
>   both check_nest_attributes and check_permitted_subclasses_attribute.
>
>   The check_record_attributes was initially considered to be included
>   into this refactoring. However, it has many differences in layout.
>   I've decided, it is not worth to introduce more complexity into this
>   refactoring in order to support this function as well. But, please.
>   let me know if this function refactoring is still desirable.
>
> Testing:
>   Local test runs with the RedefineNestmateAttr and
> RedefinePermittedSubclassesAttr
>   tests on a Linux server are passed.
>   In progress: submit mach5 jobs with the same Nestmates and
> PermittedSubclasses tests.
>
> Thanks,
> Serguei
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From serguei.spitsyn at oracle.com Thu Jun 4 14:25:37 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Thu, 4 Jun 2020 07:25:37 -0700
Subject: RFR(S): 8245321: refactor the redefine check that an attribute consisting of a list of classes has not changed
In-Reply-To:
References: <43c35bf6-6d57-aa10-cf9e-0ca5d7114247@oracle.com>
Message-ID:

An HTML attachment was scrubbed...
URL:
From coleen.phillimore at oracle.com Thu Jun 4 18:41:34 2020
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Thu, 4 Jun 2020 14:41:34 -0400
Subject: RFR(S): 8245321: refactor the redefine check that an attribute consisting of a list of classes has not changed
In-Reply-To: <43c35bf6-6d57-aa10-cf9e-0ca5d7114247@oracle.com>
References: <43c35bf6-6d57-aa10-cf9e-0ca5d7114247@oracle.com>
Message-ID: <53967a36-9051-74d4-11a5-40bda2f626db@oracle.com>

This looks good to me also.
Coleen

On 6/4/20 3:20 AM, serguei.spitsyn at oracle.com wrote:
> Please, review a fix for:
> https://bugs.openjdk.java.net/browse/JDK-8245321
>
> Webrev:
> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef-refact.1/
>
>
> Summary:
>   The jvmtiRedefineClasses.cpp functions check_nest_attributes and
>   check_permitted_subclasses_attribute have significant common part.
>   This fix is a refactoring which implements this common part into
>   the function check_attribute_arrays. And this function is used in
>   both check_nest_attributes and check_permitted_subclasses_attribute.
>
>   The check_record_attributes was initially considered to be included
>   into this refactoring. However, it has many differences in layout.
>   I've decided, it is not worth to introduce more complexity into this
>   refactoring in order to support this function as well. But, please.
>   let me know if this function refactoring is still desirable.
>
> Testing:
>  
Local test runs with the RedefineNestmateAttr and RedefinePermittedSubclassesAttr tests on a Linux server passed.
> In progress: submit mach5 jobs with the same Nestmates and PermittedSubclasses tests.
>
> Thanks,
> Serguei
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From alexey.menkov at oracle.com Thu Jun 4 19:11:40 2020
From: alexey.menkov at oracle.com (Alex Menkov)
Date: Thu, 4 Jun 2020 12:11:40 -0700
Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
In-Reply-To: <09505C40-A14F-44D0-99CB-F72D0DC914FD@oracle.com>
References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com> <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com> <26868807-1c9f-8378-c201-5992bff3de20@oracle.com> <896d46c7-38fc-3cc3-3f06-4154108ac5fb@oracle.com> <09505C40-A14F-44D0-99CB-F72D0DC914FD@oracle.com>
Message-ID:

Hi Daniil,

LGTM.

--alex

On 06/03/2020 20:42, Daniil Titov wrote:
> Hi Alex,
>
> Please review a new version of the webrev [1] that no longer uses the waitTillEquals() method.
>
> [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.04/
> [2] https://bugs.openjdk.java.net/browse/JDK-8131745
>
> Thank you,
> Daniil
>
> On 6/3/20, 4:42 PM, "Alex Menkov" wrote:
>
> Hi again Daniil,
>
> On 06/03/2020 16:31, Daniil Titov wrote:
> > Hi Alex,
> >
> > Thanks for this suggestion. You are right, we actually don't need this waitForAllThreads() method.
> >
> > I will include this change in the new version of the webrev.
> >
> >> 207     int diff = numNewThreads - numTerminatedThreads;
> >> 208     long threadCount = mbean.getThreadCount();
> >> 209     long expectedThreadCount = prevLiveTestThreadCount + diff;
> >> 210     if (threadCount < expectedThreadCount) {
> >> if some internal thread terminates, we'll get failure here
> >
> > The failure will not happen. Please note that prevLiveTestThreadCount counts only *test* threads. Thus even if some internal threads terminated, the value mbean.getThreadCount() returns should still be no less than the expected number of live test threads.
> >
> > 310     prevLiveTestThreadCount = getTestThreadCount();
>
> Oh, yes, I missed it.
>
> LGTM.
>
> --alex
>
> >
> > Best regards,
> > Daniil
> >
> >
> > On 6/3/20, 3:08 PM, "Alex Menkov" wrote:
> >
> > Hi Daniil,
> >
> > A couple of notes:
> >
> > 198     waitForThreads(numNewThreads, numTerminatedThreads);
> >
> > You don't actually need any wait here.
> > Test cases wait until all threads are in the desired state
> > (checkAllThreadsAlive uses startupCheck, checkDaemonThreadsDead and
> > checkAllThreadsDead use join())
> >
> >
> > 205 private static void checkLiveThreads(int numNewThreads,
> > 206                                      int numTerminatedThreads) {
> > 207     int diff = numNewThreads - numTerminatedThreads;
> > 208     long threadCount = mbean.getThreadCount();
> > 209     long expectedThreadCount = prevLiveTestThreadCount + diff;
> > 210     if (threadCount < expectedThreadCount) {
> >
> > if some internal thread terminates, we'll get failure here
> >
> >
> > --alex
> >
> > On 06/02/2020 21:00, Daniil Titov wrote:
> > > Hi Alex, Serguei, and Martin,
> > >
> > > Thank you for your comments. Please review a new version of the fix that addresses them, specifically:
> > > 1) Replaces a double loop in checkAllThreadsAlive() with code that uses collections and the containsAll() method.
> > > 2) Restores the checks for other ThreadMXBean methods (getThreadCount(), getTotalStartedThreadCount(), getPeakThreadCount()) but with more relaxed conditions.
> > > 3) Relaxes the check inside the checkThreadIds() method.
> > >
> > >
> > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.03/
> > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> > >
> > > Thank you,
> > > Daniil
> > >
> > > On 6/1/20, 5:06 PM, "Alex Menkov" wrote:
> > >
> > > Hi Daniil,
> > >
> > > 1. before the fix checkLiveThreads() tested
> > > ThreadMXBean.getThreadCount(), but now as far as I see it tests
> > > Thread.getAllStackTraces();
> > >
> > > 2.
> > > 237 private static void checkThreadIds() throws InterruptedException {
> > > 238     long[] list = mbean.getAllThreadIds();
> > > 239
> > > 240     waitTillEquals(
> > > 241         list.length,
> > > 242         ()->(long)mbean.getThreadCount(),
> > > 243         "Array length returned by " +
> > > 244             "getAllThreadIds() = %1$d not matched count = ${provided}",
> > > 245         ()->list.length
> > > 246     );
> > > 247 }
> > >
> > > I suppose the purpose of waitTillEquals() is to handle creation/termination
> > > of VM internal threads.
> > > But if some internal thread terminates after mbean.getAllThreadIds() and
> > > before the first mbean.getThreadCount() call, and the VM then does not need to
> > > restart it, waitTillEquals will wait forever.
> > >
> > > --alex
> > >
> > >
> > > On 05/29/2020 16:28, Daniil Titov wrote:
> > > > Hi Alex and Serguei,
> > > >
> > > > Please review a new version of the change [1] that makes sure that the test counts
> > > > only the threads it creates and ignores internal threads the VM might create or destroy.
> > > >
> > > > Testing: Ran this test in Mach5 with Graal on several hundred times;
> > > > tier1-tier3 tests are in progress.
> > > >
> > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.02/
> > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> > > >
> > > > Thank you,
> > > > Daniil
> > > >
> > > > On 5/22/20, 10:26 AM, "Alex Menkov" wrote:
> > > >
> > > > Hi Daniil,
> > > >
> > > > I'm not sure all this retry logic is a good approach.
> > > > As mentioned in JIRA, the most important part of the testing is ensuring
> > > > that you find all the created threads when they are alive, and you don't
> > > > find them when they are dead. The actual thread count checking is not
> > > > that important.
> > > > I agree with this and I'd just simplify the test by removing checks for
> > > > thread count. The VM may create and destroy internal threads whenever it needs to.
> > > >
> > > > --alex
> > > >
> > > > On 05/18/2020 10:31, Daniil Titov wrote:
> > > > > Please review the change [1] that fixes an intermittent failure of the test.
> > > > >
> > > > > This test creates and destroys a given number of daemon/user threads and validates the count of those started/stopped threads against values returned from ThreadMXBean thread counts. The problem here is that if some internal thread is started (e.g. "HotSpotGraalManagement Bean Registration") or destroyed (e.g. "JVMCI CompilerThread"), the test hangs waiting for the expected number of live threads.
> > > > >
> > > > > The fix limits the time the test waits for the desired number of live threads; if this limit is exceeded, the test repeats itself.
> > > > >
> > > > > Testing: the test with Graal on and Mach5 tier1-tier7 tests passed.
> > > > >
> > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.01
> > > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> > > > >
> > > > > Thank you,
> > > > > Daniil

From serguei.spitsyn at oracle.com Thu Jun 4 19:39:26 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Thu, 4 Jun 2020 12:39:26 -0700
Subject: RFR(S): 8245321: refactor the redefine check that an attribute consisting of a list of classes has not changed
In-Reply-To: <53967a36-9051-74d4-11a5-40bda2f626db@oracle.com>
References: <43c35bf6-6d57-aa10-cf9e-0ca5d7114247@oracle.com> <53967a36-9051-74d4-11a5-40bda2f626db@oracle.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From serguei.spitsyn at oracle.com Thu Jun 4 22:03:15 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Thu, 4 Jun 2020 15:03:15 -0700
Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
In-Reply-To: <09505C40-A14F-44D0-99CB-F72D0DC914FD@oracle.com>
References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com> <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com> <26868807-1c9f-8378-c201-5992bff3de20@oracle.com> <896d46c7-38fc-3cc3-3f06-4154108ac5fb@oracle.com> <09505C40-A14F-44D0-99CB-F72D0DC914FD@oracle.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From daniil.x.titov at oracle.com Thu Jun 4 23:01:04 2020
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Thu, 04 Jun 2020 16:01:04 -0700
Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
In-Reply-To:
References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com> <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com> <26868807-1c9f-8378-c201-5992bff3de20@oracle.com> <896d46c7-38fc-3cc3-3f06-4154108ac5fb@oracle.com> <09505C40-A14F-44D0-99CB-F72D0DC914FD@oracle.com>
Message-ID: <1BEDB6A2-54C5-4202-931C-E1684BC539CE@oracle.com>

Hi Serguei,

> 201 private static void checkLiveThreads(int numNewThreads,
> 202                                      int numTerminatedThreads) {
> 203     int diff = numNewThreads - numTerminatedThreads;
> 204     long threadCount = mbean.getThreadCount();
> 205     long expectedThreadCount = prevLiveTestThreadCount + diff;
> 206     if (threadCount < expectedThreadCount) {
> 207         testFailed = true;

> When all threads are counted with mbean.getThreadCount() it is not clear
> there is no race with new non-tested threads creation. Is it possible?
> If so, then the check at line 206 is going to fail.

Even if some internal (non-tested) threads are created, the value mbean.getThreadCount() returns should be no less than the expected number of live test threads (please note that prevLiveTestThreadCount counts only *test* threads). That means the condition on line 206 will evaluate to *false*, line 207 will not be executed, and the test will pass.

--Best regards,
Daniil

From: "serguei.spitsyn at oracle.com"
Date: Thursday, June 4, 2020 at 3:03 PM
To: Daniil Titov , Alex Menkov , serviceability-dev
Subject: Re: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently

Hi Daniil,

It is hard to be on top of all the details in these review rounds.
When all threads are counted with mbean.getThreadCount() it is not clear
there is no race with new non-tested threads creation. Is it possible?
If so, then the check at line 206 is going to fail.

201 private static void checkLiveThreads(int numNewThreads,
202                                      int numTerminatedThreads) {
203     int diff = numNewThreads - numTerminatedThreads;
204     long threadCount = mbean.getThreadCount();
205     long expectedThreadCount = prevLiveTestThreadCount + diff;
206     if (threadCount < expectedThreadCount) {
207         testFailed = true;

Thanks,
Serguei

From serguei.spitsyn at oracle.com Thu Jun 4 23:08:58 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Thu, 4 Jun 2020 16:08:58 -0700
Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
In-Reply-To: <1BEDB6A2-54C5-4202-931C-E1684BC539CE@oracle.com>
References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com> <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com> <26868807-1c9f-8378-c201-5992bff3de20@oracle.com> <896d46c7-38fc-3cc3-3f06-4154108ac5fb@oracle.com> <09505C40-A14F-44D0-99CB-F72D0DC914FD@oracle.com> <1BEDB6A2-54C5-4202-931C-E1684BC539CE@oracle.com>
Message-ID:

Hi Daniil,

On 6/4/20 16:01, Daniil Titov wrote:
> Hi Serguei,
>
>> 201 private static void checkLiveThreads(int numNewThreads,
>> 202                                      int numTerminatedThreads) {
>> 203     int diff = numNewThreads - numTerminatedThreads;
>> 204     long threadCount = mbean.getThreadCount();
>> 205     long expectedThreadCount = prevLiveTestThreadCount + diff;
>> 206     if (threadCount < expectedThreadCount) {
>> 207         testFailed = true;
>> When all threads are counted with mbean.getThreadCount() it is not clear
>> there is no race with new non-tested threads creation. Is it possible?
>> If so, then the check at line 206 is going to fail.
> Even if some internal (non-tested) threads are created, the value mbean.getThreadCount() returns should be no less than the expected number of live test threads (please note that prevLiveTestThreadCount counts only *test* threads). That means the condition on line 206 will evaluate to *false*, line 207 will not be executed, and the test will pass.

Okay, I see that it is the failure condition.
But then is there a race with (non-tested) threads termination?
Note, the threads can be terminated even after the diff value is calculated at line 203.
I'm sorry if the same questions are repeated again.

Thanks,
Serguei

From daniil.x.titov at oracle.com Thu Jun 4 23:45:53 2020
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Thu, 04 Jun 2020 16:45:53 -0700
Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
In-Reply-To:
References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com> <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com> <26868807-1c9f-8378-c201-5992bff3de20@oracle.com> <896d46c7-38fc-3cc3-3f06-4154108ac5fb@oracle.com> <09505C40-A14F-44D0-99CB-F72D0DC914FD@oracle.com> <1BEDB6A2-54C5-4202-931C-E1684BC539CE@oracle.com>
Message-ID: <9E69E9FC-C041-4581-9810-2AE1BFA33B9B@oracle.com>

Hi Serguei,

> Note, the threads can be terminated even after the diff value is
> calculated at line 203.

Please note that the diff value calculated on line 203 shows how many *test* threads were created or terminated: numNewThreads is the number of new *test* threads and numTerminatedThreads is the number of terminated *test* threads.

No *test* thread can terminate or start after the diff value is calculated.

The number of threads mbean.getThreadCount() returns can be seen as the number of live *test* threads plus the number of live internal (non-test) threads, or A = B + C, where A is the result of mbean.getThreadCount(), B is the number of live test threads, and C is the number of live non-test threads.

Regardless of what happens with internal "non-tested" threads, the invariant that this method tests is that the number of threads mbean.getThreadCount() returns cannot be less than the number of live test threads, that is, A >= B.
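
The A >= B invariant above can be sketched as a small standalone program. This is only an illustrative sketch under stated assumptions, not the code from the webrev: the class name LiveThreadInvariant, the helper atLeastExpected() and the CountDownLatch handshake are hypothetical and were chosen for the example; only ThreadMXBean.getThreadCount() comes from the discussion.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch; this is not the actual AllThreadIds.java test.
public class LiveThreadInvariant {

    // A = reported live threads, B = live test threads.
    // Since A = B + C and C (internal threads) is never negative,
    // the only race-free assertion is A >= B.
    public static boolean atLeastExpected(long reportedCount, long liveTestThreads) {
        return reportedCount >= liveTestThreads;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        int numTestThreads = 3;
        CountDownLatch started = new CountDownLatch(numTestThreads);
        CountDownLatch release = new CountDownLatch(1);

        Thread[] threads = new Thread[numTestThreads];
        for (int i = 0; i < numTestThreads; i++) {
            threads[i] = new Thread(() -> {
                started.countDown();      // signal that this test thread is running
                try {
                    release.await();      // stay alive until the check is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].setDaemon(true);
            threads[i].start();
        }
        started.await();                  // all test threads are alive from here on

        // Internal VM threads may start or stop at any time, so an equality
        // check would be racy; the lower bound is not.
        long reported = mbean.getThreadCount();
        if (!atLeastExpected(reported, numTestThreads)) {
            throw new RuntimeException("Reported " + reported
                    + " live threads, expected at least " + numTestThreads);
        }
        System.out.println("OK: reported count is at least " + numTestThreads);

        release.countDown();              // let the test threads finish
        for (Thread t : threads) {
            t.join();
        }
    }
}
```

The lower-bound comparison is the design point: an internal thread that starts only raises A, and one that terminates only lowers C, so neither can push A below B while the test threads are known to be alive.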

Best regards,
Daniil

From serguei.spitsyn at oracle.com Thu Jun 4 23:56:29 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Thu, 4 Jun 2020 16:56:29 -0700
Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
In-Reply-To: <9E69E9FC-C041-4581-9810-2AE1BFA33B9B@oracle.com>
References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com> <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com> <26868807-1c9f-8378-c201-5992bff3de20@oracle.com> <896d46c7-38fc-3cc3-3f06-4154108ac5fb@oracle.com> <09505C40-A14F-44D0-99CB-F72D0DC914FD@oracle.com> <1BEDB6A2-54C5-4202-931C-E1684BC539CE@oracle.com> <9E69E9FC-C041-4581-9810-2AE1BFA33B9B@oracle.com>
Message-ID: <54b93ec6-e60e-d32b-5d73-51de2619cdea@oracle.com>

Hi Daniil,

Got it, thanks.
I think a short comment above this comparison would be helpful.

LGTM.

Thanks,
Serguei

On 6/4/20 16:45, Daniil Titov wrote:
> Hi Serguei,
>
>> Note, the threads can be terminated even after the diff value is
>> calculated at line 203.
> Please note that the diff value calculated on line 203 shows how many *test* threads were created or terminated,
> numNewThreads is number of new *test* threads and numTerminatedThreads is number of terminated *test* threads.
>
> No *test* thread can terminate or start after the diff value is calculated.
>
> Number of threads mbean.getThreadCount() could be seen as number of live *test* threads plus number of live internal (non-test) threads,
> or A = B + C , where A - result of mbean.getThreadCount(), B - number of live test threads, C - number of live non-test threads.
>
> Regardless what happens with internal "non-tested" threads the invariant that this method tests is that number of threads
> mbean.getThreadCount() returns could not be less than number of live test threads, or that A >= B.
>
>
> Best regards,
> Daniil
>
> On 6/4/20, 4:08 PM, "serguei.spitsyn at oracle.com" wrote:
>
> Hi Daniil,
>
>
> On 6/4/20 16:01, Daniil Titov wrote:
> > Hi Serguei,
> >
> >> 201     private static void checkLiveThreads(int numNewThreads,
> >> 202                                          int numTerminatedThreads) {
> >> 203         int diff = numNewThreads - numTerminatedThreads;
> >> 204         long threadCount = mbean.getThreadCount();
> >> 205         long expectedThreadCount = prevLiveTestThreadCount + diff;
> >> 206         if (threadCount < expectedThreadCount) {
> >> 207             testFailed = true;
> >> When all threads are counted with mbean.getThreadCount() it is not clear
> >> there is no race with new non-tested threads creation. Is it possible?
> >> If so, then the check at line 206 is going to fail.
> > Even if some Internal (non-tested) threads are created the value mbean.getThreadCount() returns should be no less than the expected number of live test threads (please note that prevLiveTestThreadCount counts only *test* threads) that means that condition on line 206 will be evaluated to *false* and line 207 will not be executed and the test will pass. > > Okay, I see that it is failure condition. > But then is there a race with (non-tested) threads termination? > Note, the threads can be terminated even after the diff value is > calculated at line 203. > I'm sorry, if the same questions are repeated again. > > Thanks, > Serguei > > > --Best regards, > > Daniil > > > > From: "serguei.spitsyn at oracle.com" > > Date: Thursday, June 4, 2020 at 3:03 PM > > To: Daniil Titov , Alex Menkov , serviceability-dev > > Subject: Re: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently > > > > Hi Daniil, > > > > It is hard to be on top of all the details in these review rounds. > > When all threads are counted with mbean.getThreadCount() it is not clear > > there is no race with new non-tested threads creation. Is it possible? > > If so, then the check at line 206 is going to fail. > > 201 private static void checkLiveThreads(int numNewThreads, > > 202 int numTerminatedThreads) { > > 203 int diff = numNewThreads - numTerminatedThreads; > > 204 long threadCount = mbean.getThreadCount(); > > 205 long expectedThreadCount = prevLiveTestThreadCount + diff; > > 206 if (threadCount < expectedThreadCount) { > > 207 testFailed = true; > > > > Thanks, > > Serguei > > > > On 6/3/20 20:42, Daniil Titov wrote: > > Hi Alex, > > > > Please review a new version of the webrev [1] that no longer uses waitTillEquals() method. 
> > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.04/ > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745 > > > > Thank you, > > Daniil > > > > ?On 6/3/20, 4:42 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote: > > > > Hi again Daniil, > > > > On 06/03/2020 16:31, Daniil Titov wrote: > > > Hi Alex, > > > > > > Thanks for this suggestion. You are right, we actually don't need this waitForAllThreads() method. > > > > > > I will include this change in the new version of the webrev. > > > > > >> 207 int diff = numNewThreads - numTerminatedThreads; > > >> 208 long threadCount = mbean.getThreadCount(); > > >> 209 long expectedThreadCount = prevLiveTestThreadCount + diff; > > >> 210 if (threadCount < expectedThreadCount) { > > >> if some internal thread terminates, we'll get failure here > > > > > > The failure will not happen. Please note that prevLiveTestThreadCount counts only *test* threads. Thus even if some Internal threads terminated the value mbean.getThreadCount() returns should still be no less than the expected number of live test threads. > > > > > > 310 prevLiveTestThreadCount = getTestThreadCount(); > > > > Oh, yes, I missed it. > > > > LGTM. > > > > --alex > > > > > > > > Best regards, > > > Daniil > > > > > > > > > ?On 6/3/20, 3:08 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote: > > > > > > Hi Daniil, > > > > > > couple notes: > > > > > > 198 waitForThreads(numNewThreads, numTerminatedThreads); > > > > > > You don't actually need any wait here. 
> > > Test cases wait until all threads are in desired state > > > (checkAllThreadsAlive uses startupCheck, checkDaemonThreadsDead and > > > checkAllThreadsDead use join()) > > > > > > > > > 205 private static void checkLiveThreads(int numNewThreads, > > > 206 int numTerminatedThreads) { > > > 207 int diff = numNewThreads - numTerminatedThreads; > > > 208 long threadCount = mbean.getThreadCount(); > > > 209 long expectedThreadCount = prevLiveTestThreadCount + diff; > > > 210 if (threadCount < expectedThreadCount) { > > > > > > if some internal thread terminates, we'll get failure here > > > > > > > > > --alex > > > > > > On 06/02/2020 21:00, Daniil Titov wrote: > > > > Hi Alex, Serguei, and Martin, > > > > > > > > Thank you for your comments. Please review a new version of the fix that addresses them, specifically: > > > > 1) Replaces a double loop in checkAllThreadsAlive() with a code that uses collections and containsAll() method. > > > > 2) Restores the checks for other ThreadMXBean methods (getThreadCount(), getTotalStartedThreadCount(), getPeakThreadCount()) but with more relaxed conditions. > > > > 3) Relaxes the check inside checkThreadIds() method > > > > > > > > > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.03/ > > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745 > > > > > > > > Thank you, > > > > Daniil > > > > > > > > ?On 6/1/20, 5:06 PM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote: > > > > > > > > Hi Daniil, > > > > > > > > 1. before the fix checkLiveThreads() tested > > > > ThreadMXBean.getThreadCount(), but now as far as I see it tests > > > > Thread.getAllStackTraces(); > > > > > > > > 2. 
> > > > 237 private static void checkThreadIds() throws InterruptedException { > > > > 238 long[] list = mbean.getAllThreadIds(); > > > > 239 > > > > 240 waitTillEquals( > > > > 241 list.length, > > > > 242 ()->(long)mbean.getThreadCount(), > > > > 243 "Array length returned by " + > > > > 244 "getAllThreadIds() = %1$d not matched count = > > > > ${provided}", > > > > 245 ()->list.length > > > > 246 ); > > > > 247 } > > > > > > > > I suppose purpose of waitTillEquals() is to handle creation/termination > > > > of VM internal threads. > > > > But if some internal thread terminates after mbean.getAllThreadIds() and > > > > before 1st mbean.getThreadCount() call and then VM does not need to > > > > restart it, waitTillEquals will wait forever. > > > > > > > > --alex > > > > > > > > > > > > On 05/29/2020 16:28, Daniil Titov wrote: > > > > > Hi Alex and Serguei, > > > > > > > > > > Please review a new version of the change [1] that makes sure that the test counts > > > > > only the threads it creates and ignores Internal threads VM might create or destroy. > > > > > > > > > > Testing: Running this test in Mach5 with Graal on several hundred times , > > > > > tier1-tier3 tests are in progress. > > > > > > > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.02/ > > > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745 > > > > > > > > > > Thank you, > > > > > Daniil > > > > > > > > > > ?On 5/22/20, 10:26 AM, "Alex Menkov" mailto:alexey.menkov at oracle.com wrote: > > > > > > > > > > Hi Daniil, > > > > > > > > > > I'm not sure all this retry logic is a good way. > > > > > As mentioned in jira the most important part of the testing is ensuring > > > > > that you find all the created threads when they are alive, and you don't > > > > > find them when they are dead. The actual thread count checking is not > > > > > that important. > > > > > I agree with this and I'd just simplify the test by removing checks for > > > > > thread count. 
VM may create and destroy internal threads when it needs it.
> > > > >
> > > > > --alex
> > > > >
> > > > > On 05/18/2020 10:31, Daniil Titov wrote:
> > > > > > Please review the change [1] that fixes an intermittent failure of the test.
> > > > > >
> > > > > > This test creates and destroys a given number of daemon/user threads and validates the count of those started/stopped threads against values returned from ThreadMXBean thread counts. The problem here is that if some internal threads is started ( e.g. " HotSpotGraalManagement Bean Registration"), or destroyed (e.g. "JVMCI CompilerThread ") the test hangs waiting for expected number of live threads.
> > > > > >
> > > > > > The fix limits the time the test is waiting for desired number of live threads and in case if this limit is exceeded the test repeats itself.
> > > > > >
> > > > > > Testing. Test with Graal on and Mach5 tier1-tier7 test passed.
> > > > > >
> > > > > > [1] http://cr.openjdk.java.net/~dtitov/8131745/webrev.01
> > > > > > [2] https://bugs.openjdk.java.net/browse/JDK-8131745
> > > > > >
> > > > > > Thank you,
> > > > > > Daniil

From daniil.x.titov at oracle.com Fri Jun 5 00:39:29 2020
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Thu, 04 Jun 2020 17:39:29 -0700
Subject: RFR: 8131745: java/lang/management/ThreadMXBean/AllThreadIds.java still fails intermittently
In-Reply-To: <54b93ec6-e60e-d32b-5d73-51de2619cdea@oracle.com>
References: <35299A9D-FBE7-443F-AFA4-765CA247931E@oracle.com> <10954117-967a-e6f6-13c8-86d8322a4330@oracle.com> <26868807-1c9f-8378-c201-5992bff3de20@oracle.com> <896d46c7-38fc-3cc3-3f06-4154108ac5fb@oracle.com> <09505C40-A14F-44D0-99CB-F72D0DC914FD@oracle.com> <1BEDB6A2-54C5-4202-931C-E1684BC539CE@oracle.com> <9E69E9FC-C041-4581-9810-2AE1BFA33B9B@oracle.com> <54b93ec6-e60e-d32b-5d73-51de2619cdea@oracle.com>
Message-ID: <2B6BC713-5F4B-4AF1-8EB0-C77EEC9ED85D@oracle.com>

Thank you, Serguei! I will add a comment before pushing the fix.

Best regards,
Daniil

On 6/4/20, 4:56 PM, "serguei.spitsyn at oracle.com" wrote:

Hi Daniil,

Got it, thanks.
I think, some short comment above this comparisons would be helpful.
LGTM.

Thanks,
Serguei

From david.holmes at oracle.com Fri Jun 5 02:46:24 2020
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 5 Jun 2020 12:46:24 +1000
Subject: RFR(XS): 8245913: JDI and JDWP ThreadReference::stop should only allow ThreadDeath
In-Reply-To: <22cc3912-f0ee-770b-c4a3-8fc24c180443@oracle.com>
References: <22cc3912-f0ee-770b-c4a3-8fc24c180443@oracle.com>
Message-ID: <3ad1aa18-c2e1-413f-1af4-e988317b004d@oracle.com>

Just for the record, this change has been withdrawn and no changes will be made to JDI or JDWP.

David

On 30/05/2020 8:09 am, serguei.spitsyn at oracle.com wrote:
> Hi David and reviewers,
>
> I've updated the webrev and CSR according to agreement with David to add
> new error code to the JDWP ThreadReference::Stop command and new exception
> to the JDI ThreadReference::stop method.
> Also, I've updated one of the nsk.jdi tests to provide a necessary test
> coverage.
>
> Thanks,
> Serguei
>
> On 5/26/20 22:58, serguei.spitsyn at oracle.com wrote:
>> Please, review a fix for:
>> https://bugs.openjdk.java.net/browse/JDK-8245913
>>
>> CSR draft (one CSR reviewer is needed before finalizing it):
>> https://bugs.openjdk.java.net/browse/JDK-8245923
>>
>> Webrev:
>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-stop-thread.1/src/
>>
>>
>> Updated JDI ThreadReference:stop method spec:
>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-stop-thread.1/docs/api/jdk.jdi/com/sun/jdi/ThreadReference.html#stop(com.sun.jdi.ObjectReference)
>>
>>
>> Updated JDWP ThreadReference:Stop command spec:
>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-stop-thread.1/docs/specs/jdwp/jdwp-protocol.html#JDWP_ThreadReference_Stop
>>
>>
>> Summary:
>>   The RFR+CSR for JVMTI StopThread has been posted to the
>> serviceability-dev mailing list.
>>   This is a JPDA (JDI+JDWP) related spec update.
>>   One question is if it is okay to refer to the ThreadDeath as an
>> Exception while, in fact, it is an Error.
>>   This update follows the initial JVM TI StopThread spec terminology.
>>
>> Testing:
>>   Built docs and checked the doc has been generated as expected.
>>   Will run the JDI/JDWP tests locally.
>>   Will submit hs-tiers1-5 to make sure there are no regressions in the
>> JDI/JDWP tests.
>>
>> Thanks,
>> Serguei
>

From serguei.spitsyn at oracle.com Fri Jun 5 02:53:07 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Thu, 4 Jun 2020 19:53:07 -0700
Subject: RFR(XS): 8245913: JDI and JDWP ThreadReference::stop should only allow ThreadDeath
In-Reply-To: <3ad1aa18-c2e1-413f-1af4-e988317b004d@oracle.com>
References: <22cc3912-f0ee-770b-c4a3-8fc24c180443@oracle.com> <3ad1aa18-c2e1-413f-1af4-e988317b004d@oracle.com>
Message-ID: <383ad196-4146-329e-125d-762a5cf7de87@oracle.com>

Yes, that is right.
Thank you for the comment, David.

Thanks,
Serguei

On 6/4/20 19:46, David Holmes wrote:
> Just for the record, this change has been withdrawn and no changes
> will be made to JDI or JDWP.
>
> David

From serguei.spitsyn at oracle.com Fri Jun 5 05:19:18 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Thu, 4 Jun 2020 22:19:18 -0700
Subject: RFR (XS): 8196450: Deprecate JDWP/JDI canUnrestrictedlyRedefineClasses to match JVM TI capabilities
In-Reply-To: <4e4fc237-f3ad-f236-ac59-01875ce7ca8f@oracle.com>
References: <4e4fc237-f3ad-f236-ac59-01875ce7ca8f@oracle.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From richard.reingruber at sap.com Fri Jun 5 07:18:53 2020
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Fri, 5 Jun 2020 07:18:53 +0000
Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant
In-Reply-To:
References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com>
Message-ID:

Hi,

> The mach5 test run is good.

Thanks Serguei and thanks to everybody providing feedback! I just pushed the change.

Just curious: is mach5 an alias for tier5? And is this mach5 the same as in "Job:
mach5-one-rrich-JDK-8238585-2-20200604-1334-11519059" which is the (successful) submit repo job?

Thanks,
Richard.

-----Original Message-----
From: serguei.spitsyn at oracle.com
Sent: Thursday, June 4, 2020 04:07
To: Reingruber, Richard; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net
Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant

Hi Richard,

The mach5 test run is good.

Thanks,
Serguei

On 6/2/20 10:57, Reingruber, Richard wrote:
> Hi Serguei,
>
>> This looks good to me.
> Thanks!
>
> From an earlier mail:
>
>> I'm thinking it would be more safe to run full tier5.
> I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would
> like to push.
>
> Thanks, Richard.
>
> -----Original Message-----
> From: serguei.spitsyn at oracle.com
> Sent: Tuesday, June 2, 2020 18:55
> To: Vladimir Kozlov; Reingruber, Richard; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net
> Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant
>
> Hi Richard,
>
> This looks good to me.
>
> Thanks,
> Serguei
>
>
> On 5/28/20 09:02, Vladimir Kozlov wrote:
>> Vladimir Ivanov is on break currently.
>> It looks good to me.
>>
>> Thanks,
>> Vladimir K
>>
>> On 5/26/20 7:31 AM, Reingruber, Richard wrote:
>>> Hi Vladimir,
>>>
>>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/
>>>> Not an expert in JVMTI code base, so can't comment on the actual
>>>> changes.
>>>> From JIT-compilers perspective it looks good.
>>> I put out webrev.1 a while ago [1]:
>>>
>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/
>>> Webrev(delta):
>>> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/
>>>
>>> You originally suggested to use a handshake to switch a thread into
>>> interpreter mode [2]. I'm using
>>> a direct handshake now, because I think it is the best fit.
>>>
>>> May I ask if webrev.1 still looks good to you from JIT-compilers
>>> perspective?
>>>
>>> Can I list you as (partial) Reviewer?
>>>
>>> Thanks, Richard.
>>>
>>> [1]
>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html
>>> [2]
>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html
>>>
>>> -----Original Message-----
>>> From: Vladimir Ivanov
>>> Sent: Friday, February 7, 2020 09:19
>>> To: Reingruber, Richard;
>>> serviceability-dev at openjdk.java.net;
>>> hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR(S) 8238585: Use handshake for
>>> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make
>>> compiled methods on stack not_entrant
>>>
>>>
>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/
>>> Not an expert in JVMTI code base, so can't comment on the actual
>>> changes.
>>>
>>> From JIT-compilers perspective it looks good.
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585
>>>>
>>>> The change avoids making all compiled methods on stack not_entrant
>>>> when switching a java thread to
>>>> interpreter only execution for jvmti purposes. It is sufficient to
>>>> deoptimize the compiled frames on stack.
>>>>
>>>> Additionally a handshake is used instead of a vm operation to walk
>>>> the stack and do the deoptimizations.
>>>>
>>>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and
>>>> release builds on all platforms.
>>>>
>>>> Thanks, Richard.
>>>>
>>>> See also my question if anyone knows a reason for making the
>>>> compiled methods not_entrant:
>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html

From serguei.spitsyn at oracle.com Fri Jun 5 07:31:01 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Fri, 5 Jun 2020 00:31:01 -0700
Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant
In-Reply-To:
References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com>
Message-ID: <42417262-7a4c-31cf-73af-55e22cd36627@oracle.com>

Hi Richard,

On 6/5/20 00:18, Reingruber, Richard wrote:
> Hi,
>
>> The mach5 test run is good.
> Thanks Serguei and thanks to everybody providing feedback! I just pushed the change.

Great, thanks!

> Just curious: is mach5 an alias for tier5?

Mach5 is a build and test system that also provides CI.
Tier5 is one of the testing levels.

> And is this mach5 the same as in "Job:
> mach5-one-rrich-JDK-8238585-2-20200604-1334-11519059" which is the (successful) submit repo job?

Yes. I guess all mach5 jobs have this prefix.

Thanks,
Serguei

>
> Thanks,
> Richard.

From richard.reingruber at sap.com Fri Jun 5 08:05:46 2020
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Fri, 5 Jun 2020 08:05:46 +0000
Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant
In-Reply-To: <42417262-7a4c-31cf-73af-55e22cd36627@oracle.com>
References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> <42417262-7a4c-31cf-73af-55e22cd36627@oracle.com>
Message-ID:

I see. Thanks for the explanation :)

Richard.
From per.liden at oracle.com Fri Jun 5 08:20:22 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 5 Jun 2020 10:20:22 +0200 Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property In-Reply-To: References: Message-ID: <5b1cac8c-7e9b-195a-edfa-6ab972e32bf0@oracle.com> Hi Igor, When looking at the follow-up sub-tasks for this, I see for example this: http://cr.openjdk.java.net/~iignatyev/8246499/webrev.00/test/hotspot/jtreg/gc/z/TestSmallHeap.java.udiff.html Maybe I'm misunderstanding how this is supposed to work, but it looks like this test would now _not_ be executed if I do: make TEST=test/hotspot/jtreg/gc/z/TestSmallHeap.java JTREG="VM_OPTIONS=-XX:+UseZGC" Is that so? In that case, that seems incorrect. cheers, Per On 6/3/20 11:30 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >> 70 lines changed: 66 ins; 0 del; 4 mod > > Hi all, > > could you please review the patch which introduces a new @requires property to filter out the tests which ignore externally provided JVM flags? > > the idea behind this patch is to have a way to clearly mark tests which ignore flags, so > a) it's obvious that they don't execute a flag-guarded code/feature, and extra care should be taken to use them to verify any flag-guarded changed; > b) they can be easily excluded from runs w/ flags. > > @requires and VMProps allows us to achieve both, so it's been decided to add a new property `vm.flagless`.
`vm.flagless` is set to false if there are any XX flags other than `-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or any X flags other than `-Xmixed`; in other words any tests w/ `@requires vm.flagless` will be excluded from runs w/ any other X / XX flags passed via `-vmoption` / `-javaoption`. in rare cases, when one still wants to run the tests marked by `vm.flagless` w/ external flags, `vm.flagless` can be forcefully set to true by setting the `TEST_VM_FLAGLESS` env. variable to any value. > > this patch adds necessary common changes and marks common tests, namely Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific tests will be marked separately by the corresponding subtasks of 8151707[1]. > > please note, the patch depends on CODETOOLS-7902336[2], which will be included in the next jtreg version, so this patch is to be integrated only after jtreg5.1 is promoted and we switch to use it by 8246387[3]. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8246494 > webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 > testing: marked tests w/ different XX and X flags w/ and w/o TEST_VM_FLAGLESS env. var, and w/o any flags > > [1] https://bugs.openjdk.java.net/browse/JDK-8151707 > [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336 > [3] https://bugs.openjdk.java.net/browse/JDK-8246387 > > Thanks, > -- Igor > From markus.gaisbauer at gmail.com Fri Jun 5 10:45:12 2020 From: markus.gaisbauer at gmail.com (Markus Gaisbauer) Date: Fri, 5 Jun 2020 12:45:12 +0200 Subject: JVMTI callback SampledObjectAlloc always fires for first allocation in a new thread Message-ID: Hi, The JVMTI callback SampledObjectAlloc is currently always called for the first allocation of a thread. This generates a lot of bias in an application that regularly starts new threads. I tested this with the latest Java 11 and Java 15. For example, here is a sample that creates 100 threads and allocates one object in each thread.

    public class AllocationProfilingBiasReproducer {
        public static void main(String[] args) throws Exception {
            for (int i = 0; i < 100; i++) {
                new Thread(new Task(), "Task " + i).start();
                Thread.sleep(1);
            }
            Thread.sleep(1000);
        }

        private static class Task implements Runnable {
            @Override
            public void run() {
                new A();
            }
        }

        private static class A {
        }
    }

I built a simple JVMTI agent that registers SampledObjectAlloc callback and sets interval to 1 MB with SetHeapSamplingInterval. The callback simply logs thread name and class name of allocated object. I see the following output:

    SampledObjectAlloc Ljava/lang/String; via Task 0
    SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 1
    SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 2
    SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 3
    SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 4
    SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 5
    SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 6
    SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 7
    SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 8
    SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 9
    SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 10
    ...

This is not expected. I set a breakpoint in my SampledObjectAlloc callback and observed the following: In MemAllocator::Allocation::notify_allocation_jvmti_sampler() the local var bytes_since_last is always 0xf1f1f1f1f1f1f1f1 for first allocation of a thread. So first allocation is always reported to my agent. ThreadLocalAllocBuffer::_bytes_since_last_sample_point does not seem to be explicitly initialized before accessing it for the first time. I assume 0xf1f1f1f1f1f1f1f1 is a default value provided by some Hotspot allocator.
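A toy model may help illustrate why an uninitialized counter like the one observed above would make every new thread's first allocation look like a sample. The sketch below is not HotSpot code: `SamplingBiasModel`, `ThreadState`, and `firstAllocationSamples` are hypothetical names, the fixed 1 MB threshold is a stand-in for whatever interval logic the VM really uses, and the unsigned comparison is an assumption about how a `size_t` counter would behave.

```java
/**
 * Toy model of per-thread allocation sampling. This is an illustration only,
 * not HotSpot code; all names here are hypothetical.
 */
class SamplingBiasModel {
    // Fill pattern reported above for the uninitialized counter.
    static final long GARBAGE = 0xf1f1f1f1f1f1f1f1L;
    // 1 MB sampling interval, matching the agent setup described above.
    static final long INTERVAL = 1024 * 1024;

    /** Per-thread sampling state; the counter starts at whatever value it is given. */
    static final class ThreadState {
        long bytesSinceLastSample;

        ThreadState(long initialValue) {
            bytesSinceLastSample = initialValue;
        }

        /** Returns true if this allocation would be reported as a sample. */
        boolean allocate(long size) {
            long total = bytesSinceLastSample + size;
            // Compare as unsigned to mimic a C++ size_t comparison (assumption).
            if (Long.compareUnsigned(total, INTERVAL) >= 0) {
                bytesSinceLastSample = 0; // counter only gets a sane value here
                return true;
            }
            bytesSinceLastSample = total;
            return false;
        }
    }

    /** Counts how many of the given fresh threads have their first allocation sampled. */
    static int firstAllocationSamples(int threads, long initialValue, long objectSize) {
        int sampled = 0;
        for (int i = 0; i < threads; i++) {
            if (new ThreadState(initialValue).allocate(objectSize)) {
                sampled++;
            }
        }
        return sampled;
    }

    public static void main(String[] args) {
        // Garbage-initialized counters: every first allocation crosses the threshold.
        System.out.println("garbage init: " + firstAllocationSamples(100, GARBAGE, 16)); // 100
        // Zero-initialized counters: a 16-byte object is far below 1 MB.
        System.out.println("zero init:    " + firstAllocationSamples(100, 0L, 16));      // 0
    }
}
```

In this model, all 100 threads report their first 16-byte allocation when the counter starts at the garbage value, and none do when it starts at zero, which is what one would expect at a 1 MB interval.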
Only after the first event has fired, notify_allocation_jvmti_sampler calls ThreadLocalAllocBuffer::set_sample_end, which initializes _bytes_since_last_sample_point to a proper value. I am looking for someone who could create a JIRA ticket for this. Regards, Markus From david.holmes at oracle.com Fri Jun 5 13:05:56 2020 From: david.holmes at oracle.com (David Holmes) Date: Fri, 5 Jun 2020 23:05:56 +1000 Subject: RFR (XS): 8196450: Deprecate JDWP/JDI canUnrestrictedlyRedefineClasses to match JVM TI capabilities In-Reply-To: References: <4e4fc237-f3ad-f236-ac59-01875ce7ca8f@oracle.com> Message-ID: <847a64cb-77a8-150e-2e0e-b7796af96737@oracle.com> Sorry Serguei, I got distracted and forgot about the RFR part of this. Reviewed :) Thanks, David On 5/06/2020 3:19 pm, serguei.spitsyn at oracle.com wrote: > Hi David, > > You have already approved the CSR below. > May I count it as a review as there is no difference between CSR and > webrev - both have the same spec update? > > Thanks, > Serguei > > > On 6/3/20 20:57, serguei.spitsyn at oracle.com wrote: >> Please review a fix for: >> https://bugs.openjdk.java.net/browse/JDK-8196450 >> >> >> CSR draft (one CSR reviewer is needed before finalizing it): >> https://bugs.openjdk.java.net/browse/JDK-8246540 >> >> >> Webrev: >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/src/ >> >> >> Updated JDWP VirtualMachine::capabilitiesNew spec: >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/docs/specs/jdwp/jdwp-protocol.html#JDWP_VirtualMachine_CapabilitiesNew >> >> Updated JDI com.sun.jdi.VirtualMachine spec: >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/docs/api/jdk.jdi/com/sun/jdi/VirtualMachine.html#canAddMethod() >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/docs/api/jdk.jdi/com/sun/jdi/VirtualMachine.html#canUnrestrictedlyRedefineClasses() >> >> >> Summary: >>
The fix adds annotations and deprecation comments to the capabilities >> canUnrestrictedlyRedefineClasses and canAddMethod. >> It impacts the JDWP capabilitiesNew command and the JDI >> VirtualMachine interface. >> >> >> Testing: >> Built docs and checked the doc has been generated as expected. >> Will run the JDI/JDWP tests locally >> >> Thanks, >> Serguei > From ralf.schmelter at sap.com Fri Jun 5 15:01:30 2020 From: ralf.schmelter at sap.com (Schmelter, Ralf) Date: Fri, 5 Jun 2020 15:01:30 +0000 Subject: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump In-Reply-To: References: <0343dfac-61f7-1b1c-ee96-bdee130578ad@oracle.com> <2363c58d-38c1-ae19-ed34-c82af6304780@oracle.com> , Message-ID: Hi Goetz, thanks for the detailed review. I've incorporated your suggestions into a new webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.4/ In addition, this new version adds a feature to the WorkGang to execute the task in the foreground thread too (the thread which calls the run_task() method). This was needed, since now the SymbolTable::do_symbols() method can only be run from the VM Thread. As an additional side effect this allows parallel heap dumps in the Shenandoah GC case, which had a similar restriction (iterating the objects can only be done from a non-worker thread). Some specific remarks: > l.113 > What's the point of increasing needed_out_size > after the call? You increment the pointer? Good catch. I forgot the '*'. > l.400 > I had one overall question, which I think is answered here > at least partially: > As I understand, writing the dump now needs more buffer memory, > as there are several WriteWorks held at the same time. > Are they smaller than the buffer used before, so no additional > memory is needed, or is there a fallback if only a few can be > allocated? Is the fallback implemented here implicitly?
Just > because if there is no memory for more works, the algorithm uses the > ones it could allocate, which might result in some idle > threads as there are fewer works than threads? Yes, the buffer is now smaller (1M) versus the original (8M). You need to be able to at least allocate one buffer or you get an error (this is handled in the CompressionBackend ctor). You then allocate additional buffers as needed (we want a new buffer, but there is no free one), until we have a buffer for every worker thread or until the allocation of a buffer fails. In this case some threads will be idle, since we cannot have a buffer for each thread. > Another question. > The basic dumping is done sequentially, right? The compression > is parallel. Is there a tradeoff in number of threads where > the compression is faster than writing? Yes. The compression and writing are done in parallel. Depending on the compression level and the speed of your hard drive, not all threads will be active all the time. But since we reuse the GC threads this should not matter. And the relatively poor performance of deflate() ensures that at least 5 to 10 threads will probably always be active ;) > The other Tests: > Please merge them all into HeapDumpCompressedTest by using repeated > @test comments. You might not be aware this is > supported by jtreg. See test/hotspot/jtreg/runtime/exceptionMsgs/NullPointerException/NullPointerExceptionTest.java for an example. > It will run each @test block separately and evaluate the @requires as expected. Cool, I really wasn't aware of it. Best regards, Ralf ________________________________ From: Lindenmaier, Goetz Sent: Wednesday, June 3, 2020 3:14 PM To: Schmelter, Ralf ; Langer, Christoph Cc: serviceability-dev at openjdk.java.net ; hotspot-runtime-dev at openjdk.java.net runtime Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump Hi Ralf, I had a look at your change, webrev.3. Thanks for contributing this! Overall, a nicely engineered piece of work.
Thus, my comments are mostly minor details: diagnosticCommand.hpp ok. diagnosticCommand.cpp: l.510 I would be a bit more precise in the comment: ..."9 the slowest level with the best compression." or maybe "strongest compression"? l. 528 I would appreciate if you fixed the existing comment wrt. to the language: // Request a full GC before dumping the heap if _all is false. // This helps reduce the amount of unreachable objects in the dump heapDumper.hpp ok. heapDumper.cpp Error messags is now recorded in _backend. ok. Not overwriting file is moved to FileWriter, ok. I like how you split the existing code with few changes to distribute the work to the thread gang, nice! l.1808 // Now we clear the global variables, so that a future dumper might run. Is "might" correct? Isn't is "can"? l.1819 // Write the file header - we always use 1.0. You lost the ".2" from 1.0.2. heapDumperCompression.hpp Usually, in the include guards, only '/' are replaced by '_'. l.31 Extra whitespace before "implementation". l.36 Initialized --> Initializes Return --> Returns it initialized --> initializes l.119 works --> WriteWorks ... I had to think about this a while to figure it's not a typo of 'work' but names WriteWork instances in short. But the term is used throughout the code, so maybe leave it as-is. l.163 Remove "to". l.165 returns the old --> commits the old ... or the like. l.210 type-o maxiumum heapDumperCompression.cpp It's a bit confusing that the static variable is called gzip_func (referring to a dedicated function), while there is a method load_gzip_func that loads any function from the gzip library. What about gzip_zip_func for the variable? l.113 What's the point of increasing needed_out_size after the call? You increment the pointer? l.125 add "of the": good choice of the buffer sizes CompressionBackend(): The check not to overwrite the inital, first error is in set_error(). ok. l.224 I think the comment should say "write the last remaining partially...." 
l.400 I had one overall question, which I think is ansered here at least partially: As I understand, writing the dump now needs more buffer memory, as there are several WriteWorks held at the same time. Are they smaller than the buffer used before, so no additional memory is needed, or is there a fallback if only a few can be allocated? Is the fallback implemented here implicitly? Just because if there is no memory for more works, the algorithm uses the ones it could allocate, which might result in some idle threads as there are less works than threads? This makes it more flexible wrt. to available memory than the implementation before, right? l.441 indentation l.458 I can't understand why this variable is named "left". Is this past tense of to leave? Or do you mean the left, filled, side of the buffer? Another question. The basic dumping is done sequential, right? The comression is parallel. Is there a tradeoff in #of threads where the compression is faster than writing? zip_util.c Looks good. I appreciate the precise error message handling you are doing. Could you please add comments that these functions are used for heap dump compression? HprofReader.java ok. Reader.java Should you close in and in2 in case of error? GzipRandomAccess.java l.146 closes -> close l.158 "the the" This file nicely demonstrates how to read the zipped hprof. Maybe you can add a hint in the JBS issue to this file? HeapDumpCompressedTest.java ok. The other Tests: Please merge them all into HeapDumpCompressedTest by using repeated @test comments. You might not be aware this is supported by jtreg. See test/hotspot/jtreg/runtime/exceptionMsgs/NullPointerException/NullPointerExceptionTest.java for an example. It will run each @test block sperately and evaluate the @requires as expected. Best regards, Goetz. -----Original Message----- From: serviceability-dev On Behalf Of Schmelter, Ralf Sent: Montag, 18. 
Mai 2020 09:23 To: Langer, Christoph Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net runtime Subject: [CAUTION] RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump Hi Christoph, I've updated the webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.3/ The significant changes are moving most of the new compression code to its own file, changing to use a single option (see CSR) called -gz with a mandatory compression level and to load the zlib only once (analog to the new class loader code). Additionally I've removed some long lines. Best regards, Ralf -----Original Message----- From: Langer, Christoph Sent: Friday, 1 May 2020 18:46 To: Schmelter, Ralf Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net runtime ; coleen.phillimore at oracle.com Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump Hi Ralf, while I'm reviewing your change I think extracting the compression coding to an own file would be a good idea. Maybe you could name it heapDumpCompression.cpp? When looking at the webrev I also figured that there are some very long lines (beyond 90 chars or so). Maybe you could have a look if you could shorten some of them and break a few of these long lines? More detailed review to follow. Best regards Christoph > -----Original Message----- > From: coleen.phillimore at oracle.com > Sent: Montag, 20. April 2020 14:13 > To: Reingruber, Richard ; Schmelter, Ralf > ; Ioi Lam ; Langer, Christoph > ; Yasumasa Suenaga > ; serguei.spitsyn at oracle.com; hotspot- > runtime-dev at openjdk.java.net runtime dev at openjdk.java.net> > Cc: serviceability-dev at openjdk.java.net > Subject: Re: RFR(L) 8237354: Add option to jcmd to write a gzipped heap > dump > > Hi, I don't want to review this but could you put this new code in its > own file? heapDumper only needs CompressionBackend to be exported, > from > what I can tell. 
> > Thanks, > Coleen > > On 4/20/20 6:12 AM, Reingruber, Richard wrote: > > Hi Ralf, > > > >>> 767: I think _current->in_used doesn't take the final 9 bytes into account > that are written in > >>> DumperSupport::end_of_dump() after the last dump segment has been > finished. > >>> You could call get_new_buffer() instead of the if clause. > >> Wow, how did you found this? I've fixed it by making sure we flush the > DumpWriter before calling the deactivate method. > > Spending long hours on the review ;) > > Ok with the fix. > > > >>> ### src/java.base/share/native/libzip/zip_util.c > >>> 1610: Will be hard to beat zlib_block_alloc() and zlib_block_free() > performance wise. But have you > >>> measured the performance gain? In other words: is it worth it? :) > >> This is not done for performance, but to make sure the allocation will not > fail midway during writing the dump. Maybe it is not worth it, though. > > Understood. The heap dump will succeed if you can allocate at least one > WriteWork instance. Without > > that you could get out of memory errors in the zlib which would make the > dump fail. Ok! > > > >> http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.2/ > > Thanks for the clarifications and the changes in the new webrev. > > Webrev.2 looks good to me. > > > > Cheers, Richard. > > > > -----Original Message----- > > From: Schmelter, Ralf > > Sent: Montag, 20. April 2020 10:14 > > To: Reingruber, Richard ; Ioi Lam > ; Langer, Christoph ; > Yasumasa Suenaga ; > serguei.spitsyn at oracle.com; hotspot-runtime-dev at openjdk.java.net > runtime > > Cc: serviceability-dev at openjdk.java.net > > Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap > dump > > > > Hi Richard, > > > > thanks for the review. 
I have incorporated your remarks into a new > webrev: > > http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.2/ > > > > Some remarks to specific points: > > > >> ### src/hotspot/share/services/heapDumper.cpp > >> 762: assert(_active, "Must be active"); > >> > >> It appears to me that the assertion would fail, if an error occurred creating > the CompressionBackend. > > You are supposed to check for errors after creating the DumpWriter (which > creates the CompressionBackend). And in case of an error, you directly > destruct the object. I've added a comment to make that clear. > > > >> 767: I think _current->in_used doesn't take the final 9 bytes into account > that are written in > >> DumperSupport::end_of_dump() after the last dump segment has been > finished. > >> You could call get_new_buffer() instead of the if clause. > > Wow, how did you found this? I've fixed it by making sure we flush the > DumpWriter before calling the deactivate method. > > > >> 1064: DumpWriter::DumpWriter() > >> > >> There doesn't seem to be enough error handling if _buffer cannot be > allocated. > >> E.g. DumpWriter::write_raw() at line 1091 will enter an endless loop. > > As described above, this will not happen if we check for error after > constructing the DumpWriter. > > > >> ### src/java.base/share/native/libzip/zip_util.c > >> 1610: Will be hard to beat zlib_block_alloc() and zlib_block_free() > performance wise. But have you > >> measured the performance gain? In other words: is it worth it? :) > > This is not done for performance, but to make sure the allocation will not > fail midway during writing the dump. Maybe it is not worth it, though. > > > >> 1655: The result of deflateBound() seems to depend on the header > comment, which is not given > >> here. Could this be an issue, because ZIP_GZip_Fully() can take a > comment? > > I've added a 1024 byte additional bytes to avoid the problem. 
> > > >> ### test/lib/jdk/test/lib/hprof/parser/Reader.java > >> > >> 93: is the created GzipRandomAccess instance closed somewhere? > > The object is not closed since it is still used by the Snapshot returned. > > > > Best regard, > > Ralf > > > > > > -----Original Message----- > > From: Reingruber, Richard > > Sent: Tuesday, 14 April 2020 10:30 > > To: Schmelter, Ralf ; Ioi Lam > ; Langer, Christoph ; > Yasumasa Suenaga ; > serguei.spitsyn at oracle.com; hotspot-runtime-dev at openjdk.java.net > runtime > > Cc: serviceability-dev at openjdk.java.net > > Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap > dump > > > > Hi Ralf, > > > > thanks for providing this enhancement to parallel gzip-compress heap > dumps! > > > > I reckon it's safe to say that the coding is sophisticated. It would be > awesome if you could sketch > > the idea of how HeapDumper, DumpWriter and CompressionBackend work > together to produce the gzipped > > dump in a source code comment. Just enough to get started if somebody > should ever have to track down > > a bug -- an unlikely event, I know ;) > > > > Please find the details of my review below. > > > > Thanks, Richard. > > // Not Reviewer > > > > -- > > > > ### src/hotspot/share/services/diagnosticCommand.cpp > > > > 510 _gzip_level("-gz-level", "The compression level from 0 (store) to 9 > (best) when writing in gzipped format.", > > 511 "INT", "FALSE", "1") { > > > > "FALSE" should be probably false. > > > > ### src/hotspot/share/services/diagnosticCommand.hpp > > Ok. > > > > ### src/hotspot/share/services/heapDumper.cpp > > > > 390: Typo: initized > > > > 415: Typo: GZipComressor > > > > 477: Could you please add a comment, how the "HPROF BLOCKSIZE" > comment is helpful? > > > > 539: Member variables of WriteWork are missing the '_' prefix. > > > > 546: Just a comment: WriteWork::in_max is actually a compile time > constant. Would be nice if it could be > > declared so. 
One could use templates for this, but then my favourite ide > (eclipse cdt) doesn't > > show me references and call hierarchies anymore. So I don't think it is > worth it. > > > > 591: Typo: Removes the first element. Returns NULL is empty. > > > > 663: _writer, _compressor, _lock could be const. > > > > 762: assert(_active, "Must be active"); > > > > It appears to me that the assertion would fail, if an error occurred > creating the CompressionBackend. > > > > 767: I think _current->in_used doesn't take the final 9 bytes into account > that are written in > > DumperSupport::end_of_dump() after the last dump segment has > been finished. > > You could call get_new_buffer() instead of the if clause. > > > > 903: Typo: Check if we don not waste more than _max_waste > > > > 1064: DumpWriter::DumpWriter() > > > > There doesn't seem to be enough error handling if _buffer cannot be > allocated. > > E.g. DumpWriter::write_raw() at line 1091 will enter an endless loop. > > > > 2409: A comment, why Shenandoah is not supported, would be good. > > In general I'd say it is good and natural to use the GC work threads. > > > > ### src/hotspot/share/services/heapDumper.hpp > > Ok. > > > > ### src/java.base/share/native/libzip/zip_util.c > > > > I'm not familiar with zlib, but here are my .02? :) > > > > 1610: Will be hard to beat zlib_block_alloc() and zlib_block_free() > performance wise. But have you > > measured the performance gain? In other words: is it worth it? :) > > > > 1655: The result of deflateBound() seems to depend on the header > comment, which is not given > > here. Could this be an issue, because ZIP_GZip_Fully() can take a > comment? > > > > 1658: deflateEnd() should not be called if deflateInit2Wrapper() failed. I > think this can lead > > otherwise to a double free() call. > > > > ### > test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java > > > > 66: Maybe additionally check the exit value? > > > > 73: It's unclear to me, why this fails. 
> > Because the dump already exists? Because the level is
> > invalid? Reading the comment I'd expect success, not failure.
> >
> > ### test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTestEpsilon.java
> > Ok.
> >
> > ### test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTestShenandoah.java
> > Ok.
> >
> > ### test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTestZ.java
> > Ok.
> >
> > ### test/lib/jdk/test/lib/hprof/parser/GzipRandomAccess.java
> > Ok.
> >
> > ### test/lib/jdk/test/lib/hprof/parser/HprofReader.java
> > Ok.
> >
> > ### test/lib/jdk/test/lib/hprof/parser/Reader.java
> >
> > 93: is the created GzipRandomAccess instance closed somewhere?

From goetz.lindenmaier at sap.com Fri Jun 5 16:01:35 2020
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 5 Jun 2020 16:01:35 +0000
Subject: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump
In-Reply-To:
References: <0343dfac-61f7-1b1c-ee96-bdee130578ad@oracle.com> <2363c58d-38c1-ae19-ed34-c82af6304780@oracle.com>
Message-ID:

Hi Ralf,

Thanks for the quick reply and all the fixes.
The changes to the workgroup are ok. Reviewed.
(An incremental webrev would have helped.)

What kind of tests did you run?

> Yes, the buffer is now smaller (1M) versus the original (8M). You need
> to be able to at least allocate one buffer or you get an error (this
> is handled in the CompressionBackend ctor). You then allocate
> additional buffers as needed (we want a new buffer, but there is no
> free one), until we have a buffer for every worker thread or until
> the allocation of the buffer failed. In this case some threads will
> be idle, since we cannot have a buffer for each thread.
Ok, that's what I thought. Thanks for the explanation.

> > Another question.
> > The basic dumping is done sequential, right? The compression
> > is parallel.
> > Is there a tradeoff in #of threads where
> > the compression is faster than writing?
> Yes. The compression and writing are done in parallel. Depending on
> the compression level and the speed of your hard drive, not all
> threads will be active all the time. But since we reuse the GC threads
> this should not matter. And the relatively poor performance of
> deflate() ensures that at least 5 to 10 threads will probably always
> be active ;)
Ok, thanks.

Best regards,
  Goetz.

________________________________________
From: Lindenmaier, Goetz
Sent: Wednesday, June 3, 2020 3:14 PM
To: Schmelter, Ralf; Langer, Christoph
Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net runtime
Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Hi Ralf,

I had a look at your change, webrev.3. Thanks for contributing this!
Overall, a nicely engineered piece of work. Thus, my comments are mostly minor details:

diagnosticCommand.hpp
  ok.

diagnosticCommand.cpp:
l.510 I would be a bit more precise in the comment:
  ..."9 the slowest level with the best compression."
  or maybe "strongest compression"?
l.528 I would appreciate if you fixed the existing comment wrt. the language:
  // Request a full GC before dumping the heap if _all is false.
  // This helps reduce the amount of unreachable objects in the dump

heapDumper.hpp
  ok.

heapDumper.cpp
Error message is now recorded in _backend. ok.
Not overwriting file is moved to FileWriter, ok.
I like how you split the existing code with few changes to distribute the work to the thread gang, nice!
l.1808 // Now we clear the global variables, so that a future dumper might run.
  Is "might" correct? Isn't it "can"?
l.1819 // Write the file header - we always use 1.0.
  You lost the ".2" from 1.0.2.

heapDumperCompression.hpp
Usually, in the include guards, only '/' are replaced by '_'.
l.31 Extra whitespace before "implementation".
l.36 Initialized --> Initializes
  Return --> Returns
  it initialized --> initializes
l.119 works --> WriteWorks ... I had to think about this a while to figure it's not a typo of 'work' but names WriteWork instances in short. But the term is used throughout the code, so maybe leave it as-is.
l.163 Remove "to".
l.165 returns the old --> commits the old? ... or the like.
l.210 type-o maxiumum

heapDumperCompression.cpp
It's a bit confusing that the static variable is called gzip_func (referring to a dedicated function), while there is a method load_gzip_func that loads any function from the gzip library. What about gzip_zip_func for the variable?
l.113 What's the point of increasing needed_out_size after the call? You increment the pointer?
l.125 add "of the": good choice of the buffer sizes
CompressionBackend(): The check not to overwrite the initial, first error is in set_error(). ok.
l.224 I think the comment should say "write the last remaining partially...."
l.400 I had one overall question, which I think is answered here at least partially:
  As I understand, writing the dump now needs more buffer memory, as there are several WriteWorks held at the same time. Are they smaller than the buffer used before, so no additional memory is needed, or is there a fallback if only a few can be allocated?
  Is the fallback implemented here implicitly? Just because if there is no memory for more works, the algorithm uses the ones it could allocate, which might result in some idle threads as there are fewer works than threads?
  This makes it more flexible wrt. available memory than the implementation before, right?
l.441 indentation
l.458 I can't understand why this variable is named "left". Is this past tense of to leave? Or do you mean the left, filled, side of the buffer?

Another question.
The basic dumping is done sequential, right? The compression is parallel. Is there a tradeoff in #of threads where the compression is faster than writing?

zip_util.c
Looks good.
I appreciate the precise error message handling you are doing.
Could you please add comments that these functions are used for heap dump compression?

HprofReader.java
  ok.

Reader.java
Should you close in and in2 in case of error?

GzipRandomAccess.java
l.146 closes -> close
l.158 "the the"
This file nicely demonstrates how to read the zipped hprof. Maybe you can add a hint in the JBS issue to this file?

HeapDumpCompressedTest.java
  ok.

The other Tests:
Please merge them all into HeapDumpCompressedTest by using repeated @test comments. You might not be aware this is supported by jtreg. See test/hotspot/jtreg/runtime/exceptionMsgs/NullPointerException/NullPointerExceptionTest.java for an example. It will run each @test block separately and evaluate the @requires as expected.

Best regards,
  Goetz.

-----Original Message-----
From: serviceability-dev On Behalf Of Schmelter, Ralf
Sent: Monday, 18 May 2020 09:23
To: Langer, Christoph
Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net runtime
Subject: [CAUTION] RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Hi Christoph,

I've updated the webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.3/

The significant changes are moving most of the new compression code to its own file, changing to use a single option (see CSR) called -gz with a mandatory compression level, and loading the zlib only once (analogous to the new class loader code). Additionally I've removed some long lines.

Best regards,
Ralf

-----Original Message-----
From: Langer, Christoph
Sent: Friday, 1 May 2020 18:46
To: Schmelter, Ralf
Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net runtime; coleen.phillimore at oracle.com
Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Hi Ralf,

while I'm reviewing your change I think extracting the compression code into its own file would be a good idea.
Maybe you could name it heapDumpCompression.cpp? When looking at the webrev I also figured that there are some very long lines (beyond 90 chars or so). Maybe you could have a look if you could shorten some of them and break a few of these long lines? More detailed review to follow. Best regards Christoph > -----Original Message----- > From: mailto:coleen.phillimore at oracle.com > Sent: Montag, 20. April 2020 14:13 > To: Reingruber, Richard ; Schmelter, Ralf > ; Ioi Lam ; Langer, Christoph > ; Yasumasa Suenaga > ; mailto:serguei.spitsyn at oracle.com; hotspot- > mailto:runtime-dev at openjdk.java.net runtime mailto:dev at openjdk.java.net> > Cc: mailto:serviceability-dev at openjdk.java.net > Subject: Re: RFR(L) 8237354: Add option to jcmd to write a gzipped heap > dump > > Hi, I don't want to review this but could you put this new code in its > own file?? heapDumper only needs CompressionBackend to be exported, > from > what I can tell. > > Thanks, > Coleen > > On 4/20/20 6:12 AM, Reingruber, Richard wrote: > > Hi Ralf, > > > >>> 767: I think _current->in_used doesn't take the final 9 bytes into account > that are written in > >>> DumperSupport::end_of_dump() after the last dump segment has been > finished. > >>> You could call get_new_buffer() instead of the if clause. > >> Wow, how did you found this? I've fixed it by making sure we flush the > DumpWriter before calling the deactivate method. > > Spending long hours on the review ;) > > Ok with the fix. > > > >>> ### src/java.base/share/native/libzip/zip_util.c > >>> 1610: Will be hard to beat zlib_block_alloc() and zlib_block_free() > performance wise. But have you > >>>?? measured the performance gain? In other words: is it worth it? :) > >> This is not done for performance, but to make sure the allocation will not > fail midway during writing the dump. Maybe it is not worth it, though. > > Understood. The heap dump will succeed if you can allocate at least one > WriteWork instance. 
Without > > that you could get out of memory errors in the zlib which would make the > dump fail. Ok! > > > >> http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.2/ > > Thanks for the clarifications and the changes in the new webrev. > > Webrev.2 looks good to me. > > > > Cheers, Richard. > > > > -----Original Message----- > > From: Schmelter, Ralf > > Sent: Montag, 20. April 2020 10:14 > > To: Reingruber, Richard ; Ioi Lam > ; Langer, Christoph ; > Yasumasa Suenaga ; > mailto:serguei.spitsyn at oracle.com; mailto:hotspot-runtime-dev at openjdk.java.net > runtime > > Cc: mailto:serviceability-dev at openjdk.java.net > > Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap > dump > > > > Hi Richard, > > > > thanks for the review. I have incorporated your remarks into a new > webrev: > > http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.2/ > > > > Some remarks to specific points: > > > >> ### src/hotspot/share/services/heapDumper.cpp > >> 762: assert(_active, "Must be active"); > >> > >> It appears to me that the assertion would fail, if an error occurred creating > the CompressionBackend. > > You are supposed to check for errors after creating the DumpWriter (which > creates the CompressionBackend). And in case of an error, you directly > destruct the object. I've added a comment to make that clear. > > > >> 767: I think _current->in_used doesn't take the final 9 bytes into account > that are written in > >> DumperSupport::end_of_dump() after the last dump segment has been > finished. > >> You could call get_new_buffer() instead of the if clause. > > Wow, how did you found this? I've fixed it by making sure we flush the > DumpWriter before calling the deactivate method. > > > >> 1064: DumpWriter::DumpWriter() > >> > >> There doesn't seem to be enough error handling if _buffer cannot be > allocated. > >> E.g. DumpWriter::write_raw() at line 1091 will enter an endless loop. 
> > As described above, this will not happen if we check for error after > constructing the DumpWriter. > > > >> ### src/java.base/share/native/libzip/zip_util.c > >> 1610: Will be hard to beat zlib_block_alloc() and zlib_block_free() > performance wise. But have you > >>?? measured the performance gain? In other words: is it worth it? :) > > This is not done for performance, but to make sure the allocation will not > fail midway during writing the dump. Maybe it is not worth it, though. > > > >> 1655: The result of deflateBound() seems to depend on the header > comment, which is not given > >> here. Could this be an issue, because ZIP_GZip_Fully() can take a > comment? > > I've added a 1024 byte additional bytes to avoid the problem. > > > >> ### test/lib/jdk/test/lib/hprof/parser/Reader.java > >> > >> 93: is the created GzipRandomAccess instance closed somewhere? > > The object is not closed since it is still used by the Snapshot returned. > > > > Best regard, > > Ralf > > > > > > -----Original Message----- > > From: Reingruber, Richard > > Sent: Tuesday, 14 April 2020 10:30 > > To: Schmelter, Ralf ; Ioi Lam > ; Langer, Christoph ; > Yasumasa Suenaga ; > mailto:serguei.spitsyn at oracle.com; mailto:hotspot-runtime-dev at openjdk.java.net > runtime > > Cc: mailto:serviceability-dev at openjdk.java.net > > Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap > dump > > > > Hi Ralf, > > > > thanks for providing this enhancement to parallel gzip-compress heap > dumps! > > > > I reckon it's safe to say that the coding is sophisticated. It would be > awesome if you could sketch > > the idea of how HeapDumper, DumpWriter and CompressionBackend work > together to produce the gzipped > > dump in a source code comment. Just enough to get started if somebody > should ever have to track down > > a bug -- an unlikely event, I know ;) > > > > Please find the details of my review below. > > > > Thanks, Richard. 
> > // Not Reviewer > > > > -- > > > > ### src/hotspot/share/services/diagnosticCommand.cpp > > > > 510?? _gzip_level("-gz-level", "The compression level from 0 (store) to 9 > (best) when writing in gzipped format.", > > 511?????????????? "INT", "FALSE", "1") { > > > >????? "FALSE" should be probably false. > > > > ### src/hotspot/share/services/diagnosticCommand.hpp > > Ok. > > > > ### src/hotspot/share/services/heapDumper.cpp > > > > 390: Typo: initized > > > > 415: Typo: GZipComressor > > > > 477: Could you please add a comment, how the "HPROF BLOCKSIZE" > comment is helpful? > > > > 539: Member variables of WriteWork are missing the '_' prefix. > > > > 546: Just a comment: WriteWork::in_max is actually a compile time > constant. Would be nice if it could be > >?????? declared so. One could use templates for this, but then my favourite ide > (eclipse cdt) doesn't > >?????? show me references and call hierarchies anymore. So I don't think it is > worth it. > > > > 591: Typo: Removes the first element. Returns NULL is empty. > > > > 663: _writer, _compressor, _lock could be const. > > > > 762: assert(_active, "Must be active"); > > > >?????? It appears to me that the assertion would fail, if an error occurred > creating the CompressionBackend. > > > > 767: I think _current->in_used doesn't take the final 9 bytes into account > that are written in > >?????? DumperSupport::end_of_dump() after the last dump segment has > been finished. > >?????? You could call get_new_buffer() instead of the if clause. > > > > 903: Typo: Check if we don not waste more than _max_waste > > > > 1064: DumpWriter::DumpWriter() > > > >?????? There doesn't seem to be enough error handling if _buffer cannot be > allocated. > >?????? E.g. DumpWriter::write_raw() at line 1091 will enter an endless loop. > > > > 2409: A comment, why Shenandoah is not supported, would be good. > >??????? In general I'd say it is good and natural to use the GC work threads. 
> > > > ### src/hotspot/share/services/heapDumper.hpp > > Ok. > > > > ### src/java.base/share/native/libzip/zip_util.c > > > > I'm not familiar with zlib, but here are my .02? :) > > > > 1610: Will be hard to beat zlib_block_alloc() and zlib_block_free() > performance wise. But have you > >??????? measured the performance gain? In other words: is it worth it? :) > > > > 1655: The result of deflateBound() seems to depend on the header > comment, which is not given > >??????? here. Could this be an issue, because ZIP_GZip_Fully() can take a > comment? > > > > 1658: deflateEnd() should not be called if deflateInit2Wrapper() failed. I > think this can lead > >??????? otherwise to a double free() call. > > > > ### > test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java > > > > 66: Maybe additionally check the exit value? > > > > 73: It's unclear to me, why this fails. Because the dump already exists? > Because the level is > >????? invalid? Reading the comment I'd expect success, not failure. > > > > ### > test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTestEpsilo > n.java > > Ok. > > > > ### > test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTestShen > andoah.java > > Ok. > > > > ### > test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTestZ.jav > a > > Ok. > > > > ### test/lib/jdk/test/lib/hprof/parser/GzipRandomAccess.java > > Ok. > > > > ### test/lib/jdk/test/lib/hprof/parser/HprofReader.java > > Ok. > > > > ### test/lib/jdk/test/lib/hprof/parser/Reader.java > > > > 93: is the created GzipRandomAccess instance closed somewhere? 
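For readers following the buffer discussion in this thread: the fallback Ralf describes (at least one buffer must be allocated up front, further buffers are allocated on demand up to one per worker thread, and a failed allocation simply leaves some workers idle) could be sketched in Java roughly as below. This is a minimal illustration under stated assumptions, not the HotSpot CompressionBackend code. The names BufferPool, tryAcquire, release and the injected allocator are hypothetical and exist only so that the allocation-failure path can be exercised.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.IntFunction;

// Hypothetical sketch of the "allocate lazily, degrade gracefully" idea.
final class BufferPool {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private final int maxBuffers;               // e.g. one per worker thread
    private final IntFunction<byte[]> allocator; // returns null when allocation fails
    private int allocated;
    private boolean allocationFailed;

    BufferPool(int bufferSize, int maxBuffers, IntFunction<byte[]> allocator) {
        this.bufferSize = bufferSize;
        this.maxBuffers = maxBuffers;
        this.allocator = allocator;
        // At least one buffer is mandatory; without it the dump cannot proceed.
        byte[] first = allocator.apply(bufferSize);
        if (first == null) {
            throw new IllegalStateException("Could not allocate the first buffer");
        }
        free.push(first);
        allocated = 1;
    }

    // Returns a buffer, or null if none is free and no further one can be allocated.
    synchronized byte[] tryAcquire() {
        if (!free.isEmpty()) {
            return free.pop();
        }
        if (allocated < maxBuffers && !allocationFailed) {
            byte[] extra = allocator.apply(bufferSize);
            if (extra != null) {
                allocated++;
                return extra;
            }
            // Remember the failure and keep working with the buffers we have.
            allocationFailed = true;
        }
        return null;
    }

    // Hands a buffer back so that another worker can reuse it.
    synchronized void release(byte[] buffer) {
        free.push(buffer);
    }

    synchronized int allocated() {
        return allocated;
    }
}
```

A worker that gets null from tryAcquire() would have to wait until another worker releases a buffer, which corresponds to the idle threads mentioned in the thread.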
From igor.ignatyev at oracle.com Fri Jun 5 16:10:37 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 5 Jun 2020 09:10:37 -0700 Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property In-Reply-To: <5b1cac8c-7e9b-195a-edfa-6ab972e32bf0@oracle.com> References: <5b1cac8c-7e9b-195a-edfa-6ab972e32bf0@oracle.com> Message-ID: Hi Per, you are reading this correctly, make TEST=test/hotspot/jtreg/gc/z/TestSmallHeap.java JTREG="VM_OPTIONS=-XX:+UseZGC" won't execute gc/z/TestSmallHeap.java; and I don't see it to be incorrect. Let me try to explain why using gc/z/TestSmallHeap.java as a running example. A hotspot test is expected not to be just runnable in an out-of-box configuration, but also to serve its purpose as much as possible (which is not always 100% given some tests require special build flavor, environment setup, etc); in other words, a test is to at least have all necessary VM flags within it and not to hope that someone will provide them. gc/z/TestSmallHeap.java does that, it explicitly selects zGC, so there is no need for -XX:+UseZGC to achieve that. Given this test can be run only when zGC can be selected, it @requires vm.gc.Z, which is set to true if zGC is already explicitly selected or if zGC is available and no other GC is specified, and the latter holds for an out-of-box configuration (assuming that zGC is available in the JVM under test); thus, again, you don't have to specify -XX:+UseZGC to run this test. So there are no "technical" reasons to run gc/z/TestSmallHeap.java (or any other gc/z/ tests) with -XX:+UseZGC. The proposed patches don't change that fact in any way. The patches exclude the tests that ignore external VM flags from execution if any significant VM flags are specified. gc/z/TestSmallHeap.java ignores all externally provided VM flags, including -XX:+UseZGC. And although in the case of -XX:+UseZGC, it's harmless, in almost all other cases it's not. 
Just to give you a few examples:

Let's say you are fixing a bug in zGC which could be reproduced by gc/z/TestSmallHeap.java. You came up with two alternative solutions, one of which is guarded by `if (UseNewCode)`. To test these solutions, you ran gc/z tests twice: with -XX:+UseZGC -XX:+UseNewCode, and all tests passed; with -XX:+UseZGC, and many tests (but not gc/z/TestSmallHeap.java) failed. So based on these results, you decided that the guarded solution is perfect, cleaned up the code, sent it out for review, got it pushed, and minutes later found out that gc/z/TestSmallHeap.java and some other tests which ignore VM flags failed. It would take you some time to realize that you hadn't tested your UseNewCode solution by these tests. Had these tests been excluded from your testing, it would have been much easier for you to spot that and react accordingly.

Here is another scenario: you decided to change the default value of ZUncommit, so you ran different tests with `-XX:+UseZGC -XX:-ZUncommit`, all green, you pushed a trivial change s/true/false in z_globals.hpp, and next thing you knew a bunch of zGC specific tests failed in CI. And again, these were the tests that silently ignored `-XX:+UseZGC -XX:-ZUncommit`.

Or a slight variation: zGC support was added to a future JIT, gc/z tests were run with the flag combination which enabled the future JIT, all passed, the victory was declared; N releases later, the default JIT got changed to the future JIT; the next CI build is a disaster, with lots of tests failing from the bugs which had not been found N/2 years ago.

Although I understand that it might take some getting used to from you and others who used to run gc/x tests with -XX:+Use${X}GC, I am certain that this will improve the overall quality of hotspot, save not only machine time (from running these tests with other flags) but also engineers' time from analyzing surprising failures, and increase confidence and trust in the hotspot test suite.
In a word, I can see how this can be a bit surprising, yet still less surprising than the current behavior, but I don't see it as incorrect, it just surfaces limitations of certain tests. From my (slightly biased) point of view, it's the right thing to do.

Thanks.
-- Igor

> On Jun 5, 2020, at 1:20 AM, Per Liden wrote:
>
> Hi Igor,
>
> When looking at the follow-up sub-tasks for this, I see for example this:
>
> http://cr.openjdk.java.net/~iignatyev/8246499/webrev.00/test/hotspot/jtreg/gc/z/TestSmallHeap.java.udiff.html
>
> Maybe I'm misunderstanding how this is supposed to work, but it looks like this test would now _not_ be executed if I do:
>
> make TEST=test/hotspot/jtreg/gc/z/TestSmallHeap.java JTREG="VM_OPTIONS=-XX:+UseZGC"
>
> Is that so? In that case, that seems incorrect.
>
> cheers,
> Per
>
> On 6/3/20 11:30 PM, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00
>>> 70 lines changed: 66 ins; 0 del; 4 mod
>> Hi all,
>> could you please review the patch which introduces a new @requires property to filter out the tests which ignore externally provided JVM flags?
>> the idea behind this patch is to have a way to clearly mark tests which ignore flags, so
>> a) it's obvious that they don't execute a flag-guarded code/feature, and extra care should be taken to use them to verify any flag-guarded changes;
>> b) they can be easily excluded from runs w/ flags.
>> @requires and VMProps allow us to achieve both, so it's been decided to add a new property `vm.flagless`. `vm.flagless` is set to false if there are any XX flags other than `-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or any X flags other than `-Xmixed`; in other words any tests w/ `@requires vm.flagless` will be excluded from runs w/ any other X / XX flags passed via `-vmoption` / `-javaoption`.
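The rule for `vm.flagless` quoted above can be sketched as a small predicate. This is only an illustration of the rule as worded in this mail, not the actual VMProps code; the class and method names (FlaglessSketch, isFlagless) are hypothetical, and the flag parsing is deliberately simplistic.

```java
import java.util.List;

// Hypothetical sketch: decides whether a set of external VM flags counts as "flagless".
final class FlaglessSketch {

    static boolean isFlagless(List<String> vmFlags) {
        for (String flag : vmFlags) {
            if (flag.startsWith("-XX:")) {
                // Reduce "-XX:+Name", "-XX:-Name" and "-XX:Name=value" to "Name".
                String name = flag.substring("-XX:".length());
                if (name.startsWith("+") || name.startsWith("-")) {
                    name = name.substring(1);
                }
                int eq = name.indexOf('=');
                if (eq >= 0) {
                    name = name.substring(0, eq);
                }
                // These two are tolerated because they are set almost always.
                boolean tolerated = name.equals("MaxRAMPercentage")
                        || name.equals("CreateCoredumpOnCrash");
                if (!tolerated) {
                    return false;
                }
            } else if (flag.startsWith("-X")) {
                // Any -X flag other than -Xmixed makes the run "flagful".
                if (!flag.equals("-Xmixed")) {
                    return false;
                }
            }
            // Other options (for example -D properties) are not covered by the rule.
        }
        return true;
    }
}
```

Under this reading, a run with only `-XX:MaxRAMPercentage=25 -Xmixed` still counts as flagless, while adding `-XX:+UseZGC` or `-Xint` does not.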
>> in rare cases, when one still wants to run the tests marked by `vm.flagless` w/ external flags, `vm.flagless` can be forcefully set to true by setting any value to `TEST_VM_FLAGLESS` env. variable.
>> this patch adds necessary common changes and marks common tests, namely Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific tests will be marked separately by the corresponding subtasks of 8151707[1].
>> please note, the patch depends on CODETOOLS-7902336[2], which will be included in the next jtreg version, so this patch is to be integrated only after jtreg5.1 is promoted and we switch to use it by 8246387[3].
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8246494
>> webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00
>> testing: marked tests w/ different XX and X flags w/ and w/o TEST_VM_FLAGLESS env. var, and w/o any flags
>> [1] https://bugs.openjdk.java.net/browse/JDK-8151707
>> [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336
>> [3] https://bugs.openjdk.java.net/browse/JDK-8246387
>> Thanks,
>> -- Igor

From serguei.spitsyn at oracle.com Fri Jun 5 21:38:53 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Fri, 5 Jun 2020 14:38:53 -0700
Subject: RFR (XS): 8196450: Deprecate JDWP/JDI canUnrestrictedlyRedefineClasses to match JVM TI capabilities
In-Reply-To: <847a64cb-77a8-150e-2e0e-b7796af96737@oracle.com>
References: <4e4fc237-f3ad-f236-ac59-01875ce7ca8f@oracle.com> <847a64cb-77a8-150e-2e0e-b7796af96737@oracle.com>
Message-ID:

Thank you, David!
Sorry for the late reply. I did not see your message until I restarted my Thunderbird email client.

Thanks,
Serguei

On 6/5/20 06:05, David Holmes wrote:
> Sorry Serguei I got distracted and forgot about the RFR part of this.
>
> Reviewed :)
>
> Thanks,
> David
>
> On 5/06/2020 3:19 pm, serguei.spitsyn at oracle.com wrote:
>> Hi David,
>>
>> You have already approved the CSR below.
>> May I count it as a review as there is no difference between CSR and >> webrev - both have the same spec update? >> >> Thanks, >> Serguei >> >> >> On 6/3/20 20:57, serguei.spitsyn at oracle.com wrote: >>> Please, review a fix for: >>> https://bugs.openjdk.java.net/browse/JDK-8196450 >>> >>> >>> CSR draft (one CSR reviewer is needed before finalizing it): >>> https://bugs.openjdk.java.net/browse/JDK-8246540 >>> >>> >>> Webrev: >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/src/ >>> >>> >>> Updated JDWP VirtualMachine::capabilitiesNew spec: >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/docs/specs/jdwp/jdwp-protocol.html#JDWP_VirtualMachine_CapabilitiesNew >>> >>> >>> Updated JDI com.sun.jdi.VirtualMachine spec: >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/docs/api/jdk.jdi/com/sun/jdi/VirtualMachine.html#canAddMethod() >>> >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/docs/api/jdk.jdi/com/sun/jdi/VirtualMachine.html#canUnrestrictedlyRedefineClasses() >>> >>> >>> >>> Summary: >>> ? The fix adds annotations and deprecation comments to the capabilities >>> ?? canUnrestrictedlyRedefineClasses and canAddMethod. >>> ?? It impacts the JDWP capabilitiesNew command and the JDI >>> VirtualMachine interface. >>> >>> >>> Testing: >>> ? Built docs and checked the doc has been generated as expected. >>> ? Will run the JDI/JDWP tests locally >>> >>> Thanks, >>> Serguei >> From chris.plummer at oracle.com Fri Jun 5 23:11:48 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Fri, 5 Jun 2020 16:11:48 -0700 Subject: RFR (XS): 8196450: Deprecate JDWP/JDI canUnrestrictedlyRedefineClasses to match JVM TI capabilities In-Reply-To: <847a64cb-77a8-150e-2e0e-b7796af96737@oracle.com> References: <4e4fc237-f3ad-f236-ac59-01875ce7ca8f@oracle.com> <847a64cb-77a8-150e-2e0e-b7796af96737@oracle.com> Message-ID: Hi Serguei, Looks good. 
thanks, Chris On 6/5/20 6:05 AM, David Holmes wrote: > Sorry Serguei I got distracted and forgot about the RFR part of this. > > Reviewed :) > > Thanks, > David > > On 5/06/2020 3:19 pm, serguei.spitsyn at oracle.com wrote: >> Hi David, >> >> You have already approved the CSR below. >> May I count it as a review as there is no difference between CSR and >> webrev - both have the same spec update? >> >> Thanks, >> Serguei >> >> >> On 6/3/20 20:57, serguei.spitsyn at oracle.com wrote: >>> Please, review a fix for: >>> https://bugs.openjdk.java.net/browse/JDK-8196450 >>> >>> >>> CSR draft (one CSR reviewer is needed before finalizing it): >>> https://bugs.openjdk.java.net/browse/JDK-8246540 >>> >>> >>> Webrev: >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/src/ >>> >>> >>> Updated JDWP VirtualMachine::capabilitiesNew spec: >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/docs/specs/jdwp/jdwp-protocol.html#JDWP_VirtualMachine_CapabilitiesNew >>> >>> >>> Updated JDI com.sun.jdi.VirtualMachine spec: >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/docs/api/jdk.jdi/com/sun/jdi/VirtualMachine.html#canAddMethod() >>> >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/docs/api/jdk.jdi/com/sun/jdi/VirtualMachine.html#canUnrestrictedlyRedefineClasses() >>> >>> >>> >>> Summary: >>> ? The fix adds annotations and deprecation comments to the capabilities >>> ?? canUnrestrictedlyRedefineClasses and canAddMethod. >>> ?? It impacts the JDWP capabilitiesNew command and the JDI >>> VirtualMachine interface. >>> >>> >>> Testing: >>> ? Built docs and checked the doc has been generated as expected. >>> ? 
Will run the JDI/JDWP tests locally >>> >>> Thanks, >>> Serguei >> From serguei.spitsyn at oracle.com Fri Jun 5 23:14:52 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 5 Jun 2020 16:14:52 -0700 Subject: RFR (XS): 8196450: Deprecate JDWP/JDI canUnrestrictedlyRedefineClasses to match JVM TI capabilities In-Reply-To: References: <4e4fc237-f3ad-f236-ac59-01875ce7ca8f@oracle.com> <847a64cb-77a8-150e-2e0e-b7796af96737@oracle.com> Message-ID: <54946b9a-924f-ddf9-78f8-b4af3a9a1ce9@oracle.com> Thank you for review, Chris! Serguei On 6/5/20 16:11, Chris Plummer wrote: > Hi Serguei, > > Looks good. > > thanks, > > Chris > > On 6/5/20 6:05 AM, David Holmes wrote: >> Sorry Serguei I got distracted and forgot about the RFR part of this. >> >> Reviewed :) >> >> Thanks, >> David >> >> On 5/06/2020 3:19 pm, serguei.spitsyn at oracle.com wrote: >>> Hi David, >>> >>> You have already approved the CSR below. >>> May I count it as a review as there is no difference between CSR and >>> webrev - both have the same spec update? 
>>> >>> Thanks, >>> Serguei >>> >>> >>> On 6/3/20 20:57, serguei.spitsyn at oracle.com wrote: >>>> Please, review a fix for: >>>> https://bugs.openjdk.java.net/browse/JDK-8196450 >>>> >>>> >>>> CSR draft (one CSR reviewer is needed before finalizing it): >>>> https://bugs.openjdk.java.net/browse/JDK-8246540 >>>> >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/src/ >>>> >>>> >>>> Updated JDWP VirtualMachine::capabilitiesNew spec: >>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/docs/specs/jdwp/jdwp-protocol.html#JDWP_VirtualMachine_CapabilitiesNew >>>> >>>> >>>> Updated JDI com.sun.jdi.VirtualMachine spec: >>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/docs/api/jdk.jdi/com/sun/jdi/VirtualMachine.html#canAddMethod() >>>> >>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jdwp-depr.1/docs/api/jdk.jdi/com/sun/jdi/VirtualMachine.html#canUnrestrictedlyRedefineClasses() >>>> >>>> >>>> >>>> Summary: >>>> ? The fix adds annotations and deprecation comments to the >>>> capabilities >>>> ?? canUnrestrictedlyRedefineClasses and canAddMethod. >>>> ?? It impacts the JDWP capabilitiesNew command and the JDI >>>> VirtualMachine interface. >>>> >>>> >>>> Testing: >>>> ? Built docs and checked the doc has been generated as expected. >>>> ? 
Will run the JDI/JDWP tests locally >>>> >>>> Thanks, >>>> Serguei >>> > > From fairoz.matte at oracle.com Mon Jun 8 07:26:10 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Mon, 8 Jun 2020 00:26:10 -0700 (PDT) Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com> References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com> <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com> Message-ID: <5b83a8b3-f877-4f72-9188-97e3de6f986b@default> Hi Serguei, Erik, ? Thanks for the reviews, Below webrev contains the suggested changes, http://cr.openjdk.java.net/~fmatte/8243451/webrev.08/ ? The only thing I couldn?t do is to keep the local copy of isJFRActive() in HeapwalkingDebugger, The method is called in debugee code. In debugger, we have access to debugee before test started or after test completes. isJFRActive() method need to be executed during the test execution. Hence I didn?t find place to initialize and cannot make local copy. ? Thanks, Fairoz ? From: Serguei Spitsyn Sent: Tuesday, June 2, 2020 7:57 AM To: Fairoz Matte ; Erik Gahlin Cc: serviceability-dev at openjdk.java.net Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken ? On 6/1/20 12:30, HYPERLINK "mailto:serguei.spitsyn at oracle.com"serguei.spitsyn at oracle.com wrote: Hi Fairoz, It looks okay in general. But I'm not sure this check is going to work. The problem is the HeapwalkingDebuggee.useStrictCheck method is invoked in the context of the HeapwalkingDebugger process, not the HeapwalkingDebuggee process. Probably, you wanted to get this bit of information from the Debuggee process. The debuggee has to evaluate it itself and store in some field. 
The debugger should use the JDI to get this value from the debuggee.

Thanks,
Serguei

I'm not sure what exactly you wanted to do here.
It can occasionally work for you as long as both processes are run with the same options.

Thanks,
Serguei

On 6/1/20 08:52, Fairoz Matte wrote:

Hi Erik,

Thanks for the review, below is the updated webrev.
http://cr.openjdk.java.net/~fmatte/8243451/webrev.02/

Thanks,
Fairoz

-----Original Message-----
From: Erik Gahlin
Sent: Monday, June 1, 2020 4:26 PM
To: Fairoz Matte <fairoz.matte at oracle.com>
Cc: serviceability-dev at openjdk.java.net
Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken

Hi Fairoz,

What I think you need to do is something like this:

        if (className.equals("java.lang.Thread")) {
            return !isJfrInitialized();
        }

...

    // Needs: import java.lang.reflect.Method;
    // Uses reflection so that the test still works when the JFR module is absent.
    private static boolean isJfrInitialized() {
        try {
            Class<?> clazz = Class.forName("jdk.jfr.FlightRecorder");
            Method method = clazz.getDeclaredMethod("isInitialized", new Class[0]);
            return (boolean) method.invoke(null, new Object[0]);
        } catch (Exception e) {
            // JFR module not present (or not accessible): treat as not initialized.
            return false;
        }
    }

Erik

On 2020-06-01 12:30, Fairoz Matte wrote:

Hi Erik,

Thanks for your quick response. Below is the updated webrev to handle the case where the jfr module is not present:
http://cr.openjdk.java.net/~fmatte/8243451/webrev.01/

Thanks,
Fairoz

-----Original Message-----
From: Erik Gahlin
Sent: Monday, June 1, 2020 2:31 PM
To: Fairoz Matte <fairoz.matte at oracle.com>
Cc: serviceability-dev at openjdk.java.net
Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken

Hi Fairoz,
If the test needs to run with builds where the JFR module is not present(?), you need to do the check using reflection. ? If not, looks good. ? Erik ? On 1 Jun 2020, at 10:27, Fairoz Matte HYPERLINK "mailto:fairoz.matte at oracle.com" wrote: ? Hi, ? Please review this small test infra change to identify at runtime the JFR is active or not. JBS - https://bugs.openjdk.java.net/browse/JDK-8243451 Webrev - http://cr.openjdk.java.net/~fmatte/8243451/webrev.00/ ? Thanks, Fairoz ? ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Mon Jun 8 08:59:25 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 8 Jun 2020 01:59:25 -0700 Subject: RFR(XS): 8245126 Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods In-Reply-To: <28ba83d3-8228-5c5c-9ed4-925336bf11f3@oracle.com> References: <62e34586-eaac-3200-8f5a-ee12ad654afa@oracle.com> <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> <42d73a82-b70e-1a0d-312d-303409840392@oracle.com> <64694163-fdd5-5ccb-3ffb-2027b05a3719@oracle.com> <6e9233a4-b743-5e66-328f-7f91c6a7b292@oracle.com> <28ba83d3-8228-5c5c-9ed4-925336bf11f3@oracle.com> Message-ID: <3d648db5-293f-595c-3f1b-b361080207e7@oracle.com> Thank you a lot for review, Christian! Serguei On 6/8/20 00:34, Christian Hagedorn wrote: > Hi Serguei > > Thanks for fixing this. I don't have official reviewer status but the > changes look good to me. > > As we've already discussed, this does not fix JDK-8245128, unfortunately. > > Best regards, > Christian > > On 04.06.20 01:05, serguei.spitsyn at oracle.com wrote: >> Hi Dean, >> >> Thank you a lot for the review! >> I hope, Christian will have a chance to look at it. >> >> Thanks, >> Serguei >> >> >> On 6/3/20 14:56, Dean Long wrote: >>> Hi Serguei, I like the latest changes so that JVMCI matches C2. 
>>> Please get another review because this is not a trivial change. >>> >>> dl >>> >>> On 6/3/20 10:06 AM, serguei.spitsyn at oracle.com wrote: >>>> Hi Dean, >>>> >>>> The updated webrev is: >>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.3/ >>>> >>>> Probably, the JVMCI part can be simplified. >>>> Only the compile_state line has to be moved up: >>>> + JVMCICompileState compile_state(task); >>>> ????? // Skip redefined methods >>>> - if (target_handle->is_old()) { >>>> + if (compile_state.target_method_is_old()) { >>>> ??????? failure_reason = "redefined method"; >>>> ??????? retry_message = "not retryable"; >>>> ??????? compilable = ciEnv::MethodCompilable_never; >>>> ????? } else { >>>> - JVMCICompileState compile_state(task); >>>> Fixes in the jvmciEnv.?pp are not really needed >>>> >>>> Please, let me know what do you think. >>>> >>>> This version does not fail at all (in 300 runs for both C2 and JVMCI). >>>> It seems, other two issues disappeared as well: >>>> >>>> This was seen with the C2: >>>> https://bugs.openjdk.java.net/browse/JDK-8245128 >>>> >>>> This was seen with the JVMCI: >>>> https://bugs.openjdk.java.net/browse/JDK-8245446 >>>> >>>> Thanks, >>>> Serguei >>>> >>>> >>>> On 6/1/20 23:40, serguei.spitsyn at oracle.com wrote: >>>>> Hi Dean, >>>>> >>>>> Thank you for the reply. >>>>> >>>>> The problem is I do not fully understand your suggestion, >>>>> especially the part >>>>> about caching the method,is_old() value in the cache_jvmti_state(). >>>>> >>>>> This is a preliminary webrev where I tried to implement your >>>>> suggestion: >>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.2/ >>>>> >>>>> This variant is failing in half of test runs for both C1/C2 and >>>>> JVMCI. >>>>> I think, the root cause is a safepoint in a ThreadInVMfromNative >>>>> desctructor. 
>>>>> Here: >>>>> ?232 void ciEnv::cache_jvmti_state() { >>>>> ?233 VM_ENTRY_MARK; >>>>> >>>>> Then we check for the target_method_is_old() value which is not >>>>> up-to-date any more. >>>>> I feel, it was correct and more simple before introducing this >>>>> approach. >>>>> Probably, I'm missing something here. >>>>> >>>>> >>>>> I also have a question about the update fragment: >>>>> 1696?? { >>>>> 1697???? // Must switch to native to allocate ci_env >>>>> 1698???? ThreadToNativeFromVM ttn(thread); >>>>> 1699???? ciEnv ci_env((CompileTask*)NULL); >>>>> 1700 >>>>> 1701???? // Switch back to VM state to do compiler initialization >>>>> 1702???? ThreadInVMfromNative tv(thread); >>>>> 1703???? ResetNoHandleMark rnhm; >>>>> 1704 >>>>> 1705???? // Perform per-thread and global initializations >>>>> 1706???? comp->initialize(); >>>>> 1707?? } >>>>> Can we remove the ciEnv object initialization above with the state >>>>> transitions? >>>>> Or it has some side effects? >>>>> >>>>> Please, let me know what you think. >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>> >>>>> On 6/1/20 15:10, Dean Long wrote: >>>>>> On 5/31/20 11:16 PM, serguei.spitsyn at oracle.com wrote: >>>>>>> Hi Dean, >>>>>>> >>>>>>> To check the is_old as you suggest the target method has to be >>>>>>> passed >>>>>>> to the cache_jvmti_state() as argument. Is it what you are >>>>>>> suggesting? >>>>>> >>>>>> I believe you can use use _task->method()->is_old(), as the ciEnv >>>>>> already has the task. >>>>>> >>>>>>> Just want to make sure I understand you correctly. >>>>>>> >>>>>>> The cache_jvmti_state() and cache_dtrace_flags() are called in the >>>>>>> CompileBroker::init_compiler_runtime() for a ciEnv with the NULL >>>>>>> CompileTask >>>>>>> which looks unnecessary (or I don't understand it): >>>>>>> >>>>>>> bool CompileBroker::init_compiler_runtime() { >>>>>>> ? CompilerThread* thread = CompilerThread::current(); >>>>>>> ? . . . >>>>>>> ??? ciEnv ci_env((CompileTask*)NULL); >>>>>>> ??? 
// Cache Jvmti state >>>>>>> ??? ci_env.cache_jvmti_state(); >>>>>>> ??? // Cache DTrace flags >>>>>>> ??? ci_env.cache_dtrace_flags(); >>>>>>> >>>>>> >>>>>> These calls look unnecessary to me, as the ci_env will cache >>>>>> these again before compiling a method. >>>>>> I suggest removing these calls.? We should make sure the cache >>>>>> fields are initialized to sane values >>>>>> in the ciEnv ctor. >>>>>> >>>>>>> The JVMCI has a separate implementation for ciEnv which is >>>>>>> jvmciEnv and >>>>>>> its own set of cache_jvmti_state() and jvmti_state_changed() >>>>>>> functions. >>>>>>> Both are not called in the JVMCI case. >>>>>>> So, these checks look as broken in JVMCI now. >>>>>>> >>>>>> JVMCI is in better shape, because it doesn't transition out of >>>>>> _thread_in_vm state, >>>>>> but yes it needs similar changes. >>>>>> >>>>>>> Not sure, I have enough compiler knowledge to fix this at this >>>>>>> stage of release. >>>>>>> Would it better to file a separate hotspot/compiler RFE targeted >>>>>>> to 16? >>>>>>> It can be assigned to me if it helps. >>>>>>> >>>>>> >>>>>> This is a P3 so I believe we have time to fix it for 15. Please >>>>>> go ahead and let's see if >>>>>> we can get it in.? I can help with the JVMCI changes if they are >>>>>> not straightforward. >>>>>> >>>>>> dl >>>>>> >>>>>>> Thanks, >>>>>>> Serguei >>>>>>> >>>>>>> >>>>>>> On 5/28/20 10:54, Dean Long wrote: >>>>>>>> Sure, you could just have cache_jvmti_state() return a boolean >>>>>>>> to bail out immediately for is_old. >>>>>>>> >>>>>>>> dl >>>>>>>> >>>>>>>> On 5/28/20 7:23 AM, serguei.spitsyn at oracle.com wrote: >>>>>>>>> Hi Dean, >>>>>>>>> >>>>>>>>> Thank you for looking at this! >>>>>>>>> Okay. Let me check what cab be done in this direction. >>>>>>>>> There is no point to cache is_old. The compilation has to bail >>>>>>>>> out if it is discovered to be true. 
>>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Serguei >>>>>>>>> >>>>>>>>> >>>>>>>>> On 5/28/20 00:59, Dean Long wrote: >>>>>>>>>> This seems OK as long as the memory barriers in the thread >>>>>>>>>> state transitions prevent the C++ compiler from doing >>>>>>>>>> something like reading is_old before reading >>>>>>>>>> redefinition_count.? I would feel better if both JVMCI and >>>>>>>>>> C1/C2 cached is_old and redefinition_count at the same time >>>>>>>>>> (making sure to be in the _thread_in_vm state), then bail out >>>>>>>>>> based on the cached value of is_old. >>>>>>>>>> >>>>>>>>>> dl >>>>>>>>>> >>>>>>>>>> On 5/26/20 12:04 AM, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>> On 5/25/20 23:39, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>> Please, review a fix for: >>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245126 >>>>>>>>>>>> >>>>>>>>>>>> Webrev: >>>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.1/ >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Summary: >>>>>>>>>>>> ? The Kitchensink stress test with the Instrumentation >>>>>>>>>>>> module enabled does >>>>>>>>>>>> ? a lot of class retransformations in parallel with all >>>>>>>>>>>> other stressing. >>>>>>>>>>>> ? It provokes the assert at the compiled code installation >>>>>>>>>>>> time: >>>>>>>>>>>> ??? assert(!method->is_old()) failed: Should not be >>>>>>>>>>>> installing old methods >>>>>>>>>>>> >>>>>>>>>>>> ? The problem is that the >>>>>>>>>>>> CompileBroker::invoke_compiler_on_method in C2 version >>>>>>>>>>>> ? (non-JVMCI tiered compilation) is missing the check that >>>>>>>>>>>> exists in the JVMCI >>>>>>>>>>>> ? part of implementation: >>>>>>>>>>>> 2148???? // Skip redefined methods >>>>>>>>>>>> 2149???? if (target_handle->is_old()) { >>>>>>>>>>>> 2150?????? failure_reason = "redefined method"; >>>>>>>>>>>> 2151?????? retry_message = "not retryable"; >>>>>>>>>>>> 2152?????? compilable = ciEnv::MethodCompilable_never; >>>>>>>>>>>> 2153???? 
} else { >>>>>>>>>>>> . . . >>>>>>>>>>>> 2168???? } >>>>>>>>>>>> >>>>>>>>>>>> ?? The fix is to add this check. >>>>>>>>>>> >>>>>>>>>>> Sorry, forgot to explain one thing. >>>>>>>>>>> Compiler code has a special mechanism to ensure the JVMTI >>>>>>>>>>> class redefinition did >>>>>>>>>>> not happen while the method was compiled, so all the >>>>>>>>>>> assumptions remain correct. >>>>>>>>>>> ?? 2190???? // Cache Jvmti state >>>>>>>>>>> ?? 2191???? ci_env.cache_jvmti_state(); >>>>>>>>>>> Part of this is a check that the value of >>>>>>>>>>> JvmtiExport::redefinition_count() is >>>>>>>>>>> cached in ciEnv variable: _jvmti_redefinition_count. >>>>>>>>>>> The JvmtiExport::redefinition_count() value change means a >>>>>>>>>>> class redefinition >>>>>>>>>>> happened which also implies some of methods may become old. >>>>>>>>>>> However, the method being compiled can be already old at the >>>>>>>>>>> point where the >>>>>>>>>>> redefinition counter is cached, so the redefinition counter >>>>>>>>>>> check does not help much. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Serguei >>>>>>>>>>> >>>>>>>>>>>> Testing: >>>>>>>>>>>> ?? Ran Kitchensink test with the Instrumentation module >>>>>>>>>>>> enabled in mach5 >>>>>>>>>>>> ? ?multiple times for 100 times. Without the fix the test >>>>>>>>>>>> normally fails >>>>>>>>>>>> ?? a couple of times in 200 runs. It does not fail with the >>>>>>>>>>>> fix anymore. >>>>>>>>>>>> ?? Will also submit hs tiers1-5. 
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Serguei
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

From serguei.spitsyn at oracle.com Mon Jun 8 09:08:29 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 8 Jun 2020 02:08:29 -0700
Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken
In-Reply-To: <5b83a8b3-f877-4f72-9188-97e3de6f986b@default>
References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com> <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com> <5b83a8b3-f877-4f72-9188-97e3de6f986b@default>
Message-ID: <18e88a44-8d62-8752-834c-dce436e8ebc4@oracle.com>

An HTML attachment was scrubbed...
URL:

From ralf.schmelter at sap.com Mon Jun 8 09:37:57 2020
From: ralf.schmelter at sap.com (Schmelter, Ralf)
Date: Mon, 8 Jun 2020 09:37:57 +0000
Subject: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump
In-Reply-To:
References: <0343dfac-61f7-1b1c-ee96-bdee130578ad@oracle.com> <2363c58d-38c1-ae19-ed34-c82af6304780@oracle.com> ,
Message-ID:

Hi Goetz,

> What kind of tests did you run?

The jdk submit repo, the JCK tests (apart from API) and the jtreg tests on Windows x86/64, MacOS X, linux on x86/64, ppcle, ppcbe, zarch and aarch64 and on AIX.

If there aren't any other concerns, I would like to commit this change on Wednesday.

Best regards,
Ralf

-----Original Message-----
From: Lindenmaier, Goetz
Sent: Friday, 5 June 2020 18:02
To: Schmelter, Ralf; Langer, Christoph
Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net runtime
Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Hi Ralf,

Thanks for the quick reply and all the fixes.
The changes to the workgroup are ok. Reviewed.
(An incremental webrev would have helped.)

What kind of tests did you run?

> Yes, the buffer is now smaller (1M) versus the original (8M). You need
> to be able to at least allocate one buffer or you get an error (this
> is handled in the CompressionBackend ctor). You then allocate
> additional buffers as needed (we want a new buffer, but there is no
> free one), until we have a buffer for every worker thread or until
> the allocation of the buffer failed. In this case some threads will
> be idle, since we cannot have a buffer for each thread.

Ok, that's what I thought. Thanks for the explanation.

> > Another question.
> > The basic dumping is done sequentially, right? The compression
> > is parallel. Is there a tradeoff in #of threads where
> > the compression is faster than writing?
> Yes. The compression and writing are done in parallel. Depending on
> the compression level and the speed of your hard drive, not all
> threads will be active all the time. But since we reuse the GC threads
> this should not matter. And the relatively poor performance of
> deflate() ensures that at least 5 to 10 threads will probably always
> be active ;)

Ok, thanks.

Best regards,
Goetz.

From goetz.lindenmaier at sap.com Mon Jun 8 09:51:30 2020
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 8 Jun 2020 09:51:30 +0000
Subject: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump
In-Reply-To:
References: <0343dfac-61f7-1b1c-ee96-bdee130578ad@oracle.com> <2363c58d-38c1-ae19-ed34-c82af6304780@oracle.com> ,
Message-ID:

Hi Ralf,

Thanks for the info.
Looks good, in my eyes ready to be pushed.

Best regards,
Goetz.

-----Original Message-----
From: Schmelter, Ralf
Sent: Monday, 8 June 2020 11:38
To: Lindenmaier, Goetz; Langer, Christoph
Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net runtime; David Holmes; serguei.spitsyn at oracle.com; Ioi Lam
Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Hi Goetz,

> What kind of tests did you run?

The jdk submit repo, the JCK tests (apart from API) and the jtreg tests on Windows x86/64, MacOS X, linux on x86/64, ppcle, ppcbe, zarch and aarch64 and on AIX.

If there aren't any other concerns, I would like to commit this change on Wednesday.

Best regards,
Ralf

-----Original Message-----
From: Lindenmaier, Goetz
Sent: Friday, 5 June 2020 18:02
To: Schmelter, Ralf; Langer, Christoph
Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net runtime
Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Hi Ralf,

Thanks for the quick reply and all the fixes.
The changes to the workgroup are ok. Reviewed.
(An incremental webrev would have helped.)

What kind of tests did you run?

> Yes, the buffer is now smaller (1M) versus the original (8M). You need
> to be able to at least allocate one buffer or you get an error (this
> is handled in the CompressionBackend ctor). You then allocate
> additional buffers as needed (we want a new buffer, but there is no
> free one), until we have a buffer for every worker thread or until
> the allocation of the buffer failed. In this case some threads will
> be idle, since we cannot have a buffer for each thread.

Ok, that's what I thought. Thanks for the explanation.

> > Another question.
> > The basic dumping is done sequentially, right? The compression
> > is parallel. Is there a tradeoff in #of threads where
> > the compression is faster than writing?
> Yes. The compression and writing are done in parallel. Depending on
> the compression level and the speed of your hard drive, not all
> threads will be active all the time. But since we reuse the GC threads
> this should not matter. And the relatively poor performance of
> deflate() ensures that at least 5 to 10 threads will probably always
> be active ;)

Ok, thanks.

Best regards,
Goetz.
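
The buffer policy described in the quoted exchange above (one buffer must be allocatable up front, further buffers are allocated only when no free one exists, growth stops at one buffer per worker thread or at the first failed allocation) can be sketched roughly as follows. This is only an illustration in Java and not the HotSpot CompressionBackend code: the names `BufferPool`, `BufferAllocator`, `acquire`, `release` and `allocatedBuffers` are hypothetical, and a failed allocation is modelled here as the allocator returning null.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the buffer policy discussed in the thread; not HotSpot code.
final class BufferPool {
    /** Hypothetical allocator; returns null to signal a failed allocation. */
    interface BufferAllocator {
        byte[] allocate(int size);
    }

    private final Deque<byte[]> free = new ArrayDeque<>();
    private final BufferAllocator allocator;
    private final int bufferSize;
    private final int maxBuffers;        // at most one buffer per worker thread
    private int allocated;
    private boolean allocationFailed;    // once set, the pool stops growing

    BufferPool(int bufferSize, int workerThreads, BufferAllocator allocator) {
        if (workerThreads < 1) {
            throw new IllegalArgumentException("need at least one worker thread");
        }
        this.bufferSize = bufferSize;
        this.maxBuffers = workerThreads;
        this.allocator = allocator;
        // At least one buffer must be available, otherwise this is an error.
        byte[] first = allocator.allocate(bufferSize);
        if (first == null) {
            throw new IllegalStateException("cannot allocate the initial buffer");
        }
        free.push(first);
        allocated = 1;
    }

    /** Returns a buffer, or null if the caller has to wait for a release. */
    synchronized byte[] acquire() {
        byte[] buf = free.poll();
        if (buf != null) {
            return buf;
        }
        // No free buffer: grow lazily, but never beyond one buffer per worker.
        if (allocated < maxBuffers && !allocationFailed) {
            buf = allocator.allocate(bufferSize);
            if (buf == null) {
                allocationFailed = true; // some workers will stay idle
            } else {
                allocated++;
            }
        }
        return buf;
    }

    synchronized void release(byte[] buf) {
        free.push(buf);
    }

    synchronized int allocatedBuffers() {
        return allocated;
    }
}
```

With this shape, a worker that gets null from `acquire()` simply has nothing to do until another worker calls `release()`, which matches the remark that some threads will be idle when there is no buffer for each of them.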
From christian.hagedorn at oracle.com Mon Jun 8 07:34:12 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 8 Jun 2020 09:34:12 +0200 Subject: RFR(XS): 8245126 Kitchensink fails with: assert(!method->is_old()) failed: Should not be installing old methods In-Reply-To: References: <62e34586-eaac-3200-8f5a-ee12ad654afa@oracle.com> <5d957cae-8911-8572-2b45-048b8d09ae79@oracle.com> <5b35d572-5db7-87b3-4983-f872e2c731b2@oracle.com> <42d73a82-b70e-1a0d-312d-303409840392@oracle.com> <64694163-fdd5-5ccb-3ffb-2027b05a3719@oracle.com> <6e9233a4-b743-5e66-328f-7f91c6a7b292@oracle.com> Message-ID: <28ba83d3-8228-5c5c-9ed4-925336bf11f3@oracle.com> Hi Serguei Thanks for fixing this. I don't have official reviewer status but the changes look good to me. As we've already discussed, this does not fix JDK-8245128, unfortunately. Best regards, Christian On 04.06.20 01:05, serguei.spitsyn at oracle.com wrote: > Hi Dean, > > Thank you a lot for the review! > I hope, Christian will have a chance to look at it. > > Thanks, > Serguei > > > On 6/3/20 14:56, Dean Long wrote: >> Hi Serguei, I like the latest changes so that JVMCI matches C2. Please >> get another review because this is not a trivial change. >> >> dl >> >> On 6/3/20 10:06 AM, serguei.spitsyn at oracle.com wrote: >>> Hi Dean, >>> >>> The updated webrev is: >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.3/ >>> >>> Probably, the JVMCI part can be simplified. >>> Only the compile_state line has to be moved up: >>> + JVMCICompileState compile_state(task); >>> // Skip redefined methods >>> - if (target_handle->is_old()) { >>> + if (compile_state.target_method_is_old()) { >>> failure_reason = "redefined method"; >>> retry_message = "not retryable"; >>> compilable = ciEnv::MethodCompilable_never; >>> } else { >>> - JVMCICompileState compile_state(task); >>> Fixes in the jvmciEnv.?pp are not really needed >>> >>> Please, let me know what do you think. 
>>> >>> This version does not fail at all (in 300 runs for both C2 and JVMCI). >>> It seems, other two issues disappeared as well: >>> >>> This was seen with the C2: >>> https://bugs.openjdk.java.net/browse/JDK-8245128 >>> >>> This was seen with the JVMCI: >>> https://bugs.openjdk.java.net/browse/JDK-8245446 >>> >>> Thanks, >>> Serguei >>> >>> >>> On 6/1/20 23:40, serguei.spitsyn at oracle.com wrote: >>>> Hi Dean, >>>> >>>> Thank you for the reply. >>>> >>>> The problem is I do not fully understand your suggestion, especially >>>> the part >>>> about caching the method,is_old() value in the cache_jvmti_state(). >>>> >>>> This is a preliminary webrev where I tried to implement your suggestion: >>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.2/ >>>> >>>> This variant is failing in half of test runs for both C1/C2 and JVMCI. >>>> I think, the root cause is a safepoint in a ThreadInVMfromNative >>>> desctructor. >>>> Here: >>>> ?232 void ciEnv::cache_jvmti_state() { >>>> ?233 VM_ENTRY_MARK; >>>> >>>> Then we check for the target_method_is_old() value which is not >>>> up-to-date any more. >>>> I feel, it was correct and more simple before introducing this approach. >>>> Probably, I'm missing something here. >>>> >>>> >>>> I also have a question about the update fragment: >>>> 1696 { >>>> 1697 // Must switch to native to allocate ci_env >>>> 1698 ThreadToNativeFromVM ttn(thread); >>>> 1699 ciEnv ci_env((CompileTask*)NULL); >>>> 1700 >>>> 1701 // Switch back to VM state to do compiler initialization >>>> 1702 ThreadInVMfromNative tv(thread); >>>> 1703 ResetNoHandleMark rnhm; >>>> 1704 >>>> 1705 // Perform per-thread and global initializations >>>> 1706 comp->initialize(); >>>> 1707 } >>>> Can we remove the ciEnv object initialization above with the state >>>> transitions? >>>> Or it has some side effects? >>>> >>>> Please, let me know what you think. 
>>>> >>>> Thanks, >>>> Serguei >>>> >>>> >>>> On 6/1/20 15:10, Dean Long wrote: >>>>> On 5/31/20 11:16 PM, serguei.spitsyn at oracle.com wrote: >>>>>> Hi Dean, >>>>>> >>>>>> To check the is_old as you suggest the target method has to be passed >>>>>> to the cache_jvmti_state() as argument. Is it what you are suggesting? >>>>> >>>>> I believe you can use use _task->method()->is_old(), as the ciEnv >>>>> already has the task. >>>>> >>>>>> Just want to make sure I understand you correctly. >>>>>> >>>>>> The cache_jvmti_state() and cache_dtrace_flags() are called in the >>>>>> CompileBroker::init_compiler_runtime() for a ciEnv with the NULL >>>>>> CompileTask >>>>>> which looks unnecessary (or I don't understand it): >>>>>> >>>>>> bool CompileBroker::init_compiler_runtime() { >>>>>> ? CompilerThread* thread = CompilerThread::current(); >>>>>> ? . . . >>>>>> ??? ciEnv ci_env((CompileTask*)NULL); >>>>>> ??? // Cache Jvmti state >>>>>> ??? ci_env.cache_jvmti_state(); >>>>>> ??? // Cache DTrace flags >>>>>> ??? ci_env.cache_dtrace_flags(); >>>>>> >>>>> >>>>> These calls look unnecessary to me, as the ci_env will cache these >>>>> again before compiling a method. >>>>> I suggest removing these calls.? We should make sure the cache >>>>> fields are initialized to sane values >>>>> in the ciEnv ctor. >>>>> >>>>>> The JVMCI has a separate implementation for ciEnv which is >>>>>> jvmciEnv and >>>>>> its own set of cache_jvmti_state() and jvmti_state_changed() >>>>>> functions. >>>>>> Both are not called in the JVMCI case. >>>>>> So, these checks look as broken in JVMCI now. >>>>>> >>>>> JVMCI is in better shape, because it doesn't transition out of >>>>> _thread_in_vm state, >>>>> but yes it needs similar changes. >>>>> >>>>>> Not sure, I have enough compiler knowledge to fix this at this >>>>>> stage of release. >>>>>> Would it better to file a separate hotspot/compiler RFE targeted >>>>>> to 16? >>>>>> It can be assigned to me if it helps. 
>>>>>> >>>>> >>>>> This is a P3 so I believe we have time to fix it for 15. Please go >>>>> ahead and let's see if >>>>> we can get it in.? I can help with the JVMCI changes if they are >>>>> not straightforward. >>>>> >>>>> dl >>>>> >>>>>> Thanks, >>>>>> Serguei >>>>>> >>>>>> >>>>>> On 5/28/20 10:54, Dean Long wrote: >>>>>>> Sure, you could just have cache_jvmti_state() return a boolean to >>>>>>> bail out immediately for is_old. >>>>>>> >>>>>>> dl >>>>>>> >>>>>>> On 5/28/20 7:23 AM, serguei.spitsyn at oracle.com wrote: >>>>>>>> Hi Dean, >>>>>>>> >>>>>>>> Thank you for looking at this! >>>>>>>> Okay. Let me check what cab be done in this direction. >>>>>>>> There is no point to cache is_old. The compilation has to bail >>>>>>>> out if it is discovered to be true. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Serguei >>>>>>>> >>>>>>>> >>>>>>>> On 5/28/20 00:59, Dean Long wrote: >>>>>>>>> This seems OK as long as the memory barriers in the thread >>>>>>>>> state transitions prevent the C++ compiler from doing something >>>>>>>>> like reading is_old before reading redefinition_count.? I would >>>>>>>>> feel better if both JVMCI and C1/C2 cached is_old and >>>>>>>>> redefinition_count at the same time (making sure to be in the >>>>>>>>> _thread_in_vm state), then bail out based on the cached value >>>>>>>>> of is_old. >>>>>>>>> >>>>>>>>> dl >>>>>>>>> >>>>>>>>> On 5/26/20 12:04 AM, serguei.spitsyn at oracle.com wrote: >>>>>>>>>> On 5/25/20 23:39, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>> Please, review a fix for: >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245126 >>>>>>>>>>> >>>>>>>>>>> Webrev: >>>>>>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/kitchensink-comp.1/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Summary: >>>>>>>>>>> ? The Kitchensink stress test with the Instrumentation module >>>>>>>>>>> enabled does >>>>>>>>>>> ? a lot of class retransformations in parallel with all other >>>>>>>>>>> stressing. >>>>>>>>>>> ? 
It provokes the assert at the compiled code installation time: >>>>>>>>>>> ??? assert(!method->is_old()) failed: Should not be >>>>>>>>>>> installing old methods >>>>>>>>>>> >>>>>>>>>>> ? The problem is that the >>>>>>>>>>> CompileBroker::invoke_compiler_on_method in C2 version >>>>>>>>>>> ? (non-JVMCI tiered compilation) is missing the check that >>>>>>>>>>> exists in the JVMCI >>>>>>>>>>> ? part of implementation: >>>>>>>>>>> 2148 // Skip redefined methods >>>>>>>>>>> 2149 if (target_handle->is_old()) { >>>>>>>>>>> 2150 failure_reason = "redefined method"; >>>>>>>>>>> 2151 retry_message = "not retryable"; >>>>>>>>>>> 2152 compilable = ciEnv::MethodCompilable_never; >>>>>>>>>>> 2153 } else { >>>>>>>>>>> . . . >>>>>>>>>>> 2168 } >>>>>>>>>>> >>>>>>>>>>> ? The fix is to add this check. >>>>>>>>>> >>>>>>>>>> Sorry, forgot to explain one thing. >>>>>>>>>> Compiler code has a special mechanism to ensure the JVMTI >>>>>>>>>> class redefinition did >>>>>>>>>> not happen while the method was compiled, so all the >>>>>>>>>> assumptions remain correct. >>>>>>>>>> 2190 // Cache Jvmti state >>>>>>>>>> 2191 ci_env.cache_jvmti_state(); >>>>>>>>>> Part of this is a check that the value of >>>>>>>>>> JvmtiExport::redefinition_count() is >>>>>>>>>> cached in ciEnv variable: _jvmti_redefinition_count. >>>>>>>>>> The JvmtiExport::redefinition_count() value change means a >>>>>>>>>> class redefinition >>>>>>>>>> happened which also implies some of methods may become old. >>>>>>>>>> However, the method being compiled can be already old at the >>>>>>>>>> point where the >>>>>>>>>> redefinition counter is cached, so the redefinition counter >>>>>>>>>> check does not help much. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Serguei >>>>>>>>>> >>>>>>>>>>> Testing: >>>>>>>>>>> Ran Kitchensink test with the Instrumentation module enabled in mach5 >>>>>>>>>>> ?multiple times for 100 times. Without the fix the test normally fails >>>>>>>>>>> a couple of times in 200 runs. 
It does not fail with the fix anymore. >>>>>>>>>>> Will also submit hs tiers1-5. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Serguei >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > From serguei.spitsyn at oracle.com Mon Jun 8 17:03:32 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 8 Jun 2020 10:03:32 -0700 Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: <18e88a44-8d62-8752-834c-dce436e8ebc4@oracle.com> References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com> <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com> <5b83a8b3-f877-4f72-9188-97e3de6f986b@default> <18e88a44-8d62-8752-834c-dce436e8ebc4@oracle.com> Message-ID: <5adafff1-0da9-8bf2-4b27-5b00b5c48526@oracle.com> An HTML attachment was scrubbed... URL: From ioi.lam at oracle.com Tue Jun 9 01:36:02 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Mon, 8 Jun 2020 18:36:02 -0700 Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime In-Reply-To: References: <4de9bb9c-e83d-f33b-fc50-3431f69e46aa@oracle.com> <8fe912f1-8407-df1d-0c1f-cf37f08363db@oracle.com> <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <1d8900f1-3399-bbc0-98bb-00375f90ac56@oracle.com> <8337cabe-c18e-3a54-a28a-8d94fed4fcab@oracle.com> Message-ID: <1d7a3514-f5b8-015b-294d-520328bb1c95@oracle.com> Hi Jiangli, I've asked this before. Do you have any performance data showing the benefit of the JVMTI part of the optimization, when JVMTI is used? I think ultimately the JVMTI team should evaluate the patch. I CC'ed Serguei. I think it will be helpful if there's data that shows how JVMTI can benefit from this patch. 

BTW, there's actually a race condition with the latest patch. I think this shows just how difficult it is to get things right in this very complicated part of the JVM.

1706     update_dictionary(d_hash, p_index, p_hash,
1707                       k, class_loader_h, THREAD);
1708   }
1709   k->eager_initialize(THREAD);
1710 >>>> HERE
1711   // notify jvmti
1712   if (JvmtiExport::should_post_class_load()) {
1713       assert(THREAD->is_Java_thread(), "thread->is_Java_thread()");
1714       JvmtiExport::post_class_load((JavaThread *) THREAD, k);
1715
1716   }
1717   post_class_define_event(k, loader_data);
1718   if (k->is_shared() && k->is_linked()) {
1719     if (JvmtiExport::should_post_class_prepare()) {
1720       // To keep the same behavior as for dynamically loaded classes,
1721       // lock the init_lock before posting the ClassPrepare event.
1722       Handle h_init_lock(THREAD, k->init_lock());
1723       ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL);
1724       JvmtiExport::post_class_prepare((JavaThread *)THREAD, k);
1725     }
1726   }
1727 }

For a non-boot class, between the time when the klass is added to the dictionary and the point where you check k->is_linked(), another thread could have called into klass->link_class_impl() and hence invoked the ClassPrepare callback. So your code will invoke the callback again.

Thanks
- Ioi

On 6/3/20 12:06 PM, Jiangli Zhou wrote:
> Hi Ioi,
>
> Really appreciate that you think through the details! I agree with
> your analysis about the serializing effect of '_init_lock' for posting
> class_prepare events. However I don't agree that an agent can rely on
> that for class hierarchy analysis as that's VM implementation specific
> behavior, but that is a different topic and does not belong to this
> thread.
>
> Let's analyze the runtime archived boot class loading behavior as
> well. When loading a shared class, the VM first loads all its super
There are multiple locks involved during loading a boot class. > Those include the _system_loader_lock_obj (which is used for the boot > loader as well) and the SystemDictionary_lock. These locks are held > when loading a boot class by the NULL loader. That ensures the same > serializing effect for posting class_prepare events after runtime > restoration. Please let me know if you see any hole here. > > Thanks! > Jiangli > > On Tue, Jun 2, 2020 at 10:46 PM Ioi Lam wrote: >> >> >> On 6/2/20 10:16 PM, David Holmes wrote: >>> Hi Ioi, >>> >>> On 3/06/2020 2:55 pm, Ioi Lam wrote: >>>> >>>> On 5/27/20 11:13 PM, David Holmes wrote: >>>>> Hi Jiangli, >>>>> >>>>> On 28/05/2020 11:35 am, Ioi Lam wrote: >>>>>> >>>>>>> I was going to take the suggestion, but realized that it would add >>>>>>> unnecessary complications for archived boot classes with class >>>>>>> pre-initialization support. Some agents may set >>>>>>> JvmtiExport::should_post_class_prepare(). It's worthwhile to support >>>>>>> class pre-init uniformly for archived boot classes with >>>>>>> JvmtiExport::should_post_class_prepare() enabled or disabled. >>>>>> This would introduce behavioral changes when JVMTI is enabled: >>>>>> >>>>>> + The order of JvmtiExport::post_class_prepare is different than >>>>>> before >>>>>> + JvmtiExport::post_class_prepare may be called for a class that >>>>>> was not called before (if the class is never linked during run time) >>>>>> + JvmtiExport::post_class_prepare was called inside the init_lock, >>>>>> now it's called outside of the init_lock >>>>> I have to say I share Ioi's concerns here. This change will impact >>>>> JVM TI agents in a way we can't be sure of. From a specification >>>>> perspective I think we are fine as linking can be lazy or eager, so >>>>> there's no implied order either. But this would be a behavioural >>>>> change that will be observable by agents. 
(I'm less concerned about >>>>> the init_lock situation as it seems potentially buggy to me to call >>>>> out to an agent with the init_lock held in the first place! I find >>>>> it hard to imagine an agent only working correctly if the init_lock >>>>> is held.) >>>> David, >>>> >>>> The init_lock has a serializing effect. The callback for a subclass >>>> will not be executed until the callback for its super class has been >>>> finished. >>> Sorry I don't see that is the case. The init_lock for the subclass is >>> distinct from the init_lock of the superclass, and linking of >>> subclasses and superclasses is independent. >> >> In InstanceKlass::link_class_impl, you first link all of your super classes. >> >> If another thread is already linking your super class, you will block on >> that superclass's init_lock. >> >> Of course, I may be wrong and my analysis may be bogus. But I hope you >> can appreciate that this is not going to be a trivial change to analyze. >> >> Thanks >> - Ioi >>> David >>> ----- >>> >>>> With the proposed patch, the callback for both the super class and >>>> subclass can proceed in parallel. So if an agent performs class >>>> hierarchy analysis, for example, it may need to perform extra >>>> synchronization. >>>> >>>> This is just one example that I can think of. I am sure there are >>>> other issues that we have not thought about. >>>> >>>> The fact is we are dealing with arbitrary code in the callbacks, and >>>> we are changing the conditions of how they are called. The calls >>>> happen inside very delicate code (class loading, system dictionary). >>>> I am reluctant to do the due diligence, which is substantial, of >>>> verifying that this is a safe change, unless we have a really >>>> compelling reason to do so. 
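
The serializing effect described above (link the superclass first, then check the linked state and post the callback while holding that class's own lock) can be illustrated outside HotSpot. What follows is a minimal Java sketch under stated assumptions, not VM code: `Klass`, `link`, `initLock` and the `events` list are hypothetical stand-ins invented for this illustration, and the list append merely plays the role of the ClassPrepare callback.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: models "link supers first, post the callback under the
// per-class lock". None of these names are HotSpot or JVM TI APIs.
final class LinkOrderSketch {

    /** Hypothetical stand-in for a loaded class with its own init lock. */
    static final class Klass {
        private final String name;
        private final Klass superKlass;            // null for the root class
        private final Object initLock = new Object();
        private boolean linked;                    // guarded by initLock

        Klass(String name, Klass superKlass) {
            this.name = name;
            this.superKlass = superKlass;
        }

        void link(List<String> events) {
            // Supers are linked before this class's own lock is taken.
            if (superKlass != null) {
                superKlass.link(events);
            }
            synchronized (initLock) {
                if (linked) {
                    return;                        // already linked and posted by another thread
                }
                linked = true;
                events.add(name);                  // stands in for the "prepare" callback
            }
        }
    }

    /** Links C (extends B extends A) from several threads; returns the callback order. */
    static List<String> run() {
        Klass a = new Klass("A", null);
        Klass b = new Klass("B", a);
        Klass c = new Klass("C", b);
        List<String> events = Collections.synchronizedList(new ArrayList<>());

        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> c.link(events));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch every class name is recorded exactly once and a superclass is always recorded before its subclass, because the state check, the state change and the callback happen under one lock. Moving the callback outside the lock, or checking `linked` without it, is the kind of change that would allow duplicate or reordered callbacks.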
>>>> >>>> Thanks >>>> - Ioi >>>> >>>> From ioi.lam at oracle.com Tue Jun 9 01:47:35 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Mon, 8 Jun 2020 18:47:35 -0700 Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime In-Reply-To: <1d7a3514-f5b8-015b-294d-520328bb1c95@oracle.com> References: <8fe912f1-8407-df1d-0c1f-cf37f08363db@oracle.com> <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <1d8900f1-3399-bbc0-98bb-00375f90ac56@oracle.com> <8337cabe-c18e-3a54-a28a-8d94fed4fcab@oracle.com> <1d7a3514-f5b8-015b-294d-520328bb1c95@oracle.com> Message-ID: There's also another race condition that the callback for a subclass will be called *before* the callback of its superclass.
From fairoz.matte at oracle.com Tue Jun 9 04:20:34 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Mon, 8 Jun 2020 21:20:34 -0700 (PDT) Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: <5adafff1-0da9-8bf2-4b27-5b00b5c48526@oracle.com> References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com> <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com> <5b83a8b3-f877-4f72-9188-97e3de6f986b@default> <18e88a44-8d62-8752-834c-dce436e8ebc4@oracle.com> <5adafff1-0da9-8bf2-4b27-5b00b5c48526@oracle.com> Message-ID: <42fdf66e-b097-47e4-9062-391e9b43968c@default> Hi Serguei, Thanks for the clarifications, I have incorporated the 2nd suggestion, below is the webrev, http://cr.openjdk.java.net/~fmatte/8243451/webrev.09/ Thanks, Fairoz From: Serguei Spitsyn Sent: Monday, June 8, 2020 10:34 PM To: Fairoz Matte ; Erik Gahlin Cc: serviceability-dev at openjdk.java.net Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken Hi Fairoz, On 6/8/20 02:08, mailto:serguei.spitsyn at oracle.com wrote: Hi Fairoz, There are two different isJFRActive() methods, one is on debuggee side and another on the debugger side. The one on debuggee side is better to keep in Debuggee.java (where it was before) instead of moving it to HeapwalkingDebuggee.java. It is okay to keep the call to it in the HeapwalkingDebuggee.java. Please, skip this suggestion as Debugger.java is not one of supers of HeapwalkingDebuggee.java as I've assumed.
Thanks,
Serguei

+    protected boolean isJFRActive() {
+        boolean isJFRActive = false;
+        ReferenceType referenceType = debuggee.classByName("nsk.share.jdi.HeapwalkingDebuggee");
+        if (referenceType == null)
+            throw new RuntimeException("Debugeee is not initialized yet");
+
+        Field isJFRActiveFld = referenceType.fieldByName("isJFRActive");
+        isJFRActive = ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value();
+        return isJFRActive;
     }

It is better to remove the line:
+        boolean isJFRActive = false;
and just change this one:
+        boolean isJFRActive = ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value();

Otherwise, it looks good to me.
I hope it really works now.

Thanks,
Serguei

On 6/8/20 00:26, Fairoz Matte wrote:
Hi Serguei, Erik,

Thanks for the reviews,
Below webrev contains the suggested changes,
http://cr.openjdk.java.net/~fmatte/8243451/webrev.08/

The only thing I couldn't do is keep a local copy of isJFRActive() in HeapwalkingDebugger.
The method is called in debuggee code.
In the debugger, we have access to the debuggee before the test starts or after the test completes.
The isJFRActive() method needs to be executed during the test execution. Hence I didn't find a place to initialize it and cannot make a local copy.

Thanks,
Fairoz

From: Serguei Spitsyn
Sent: Tuesday, June 2, 2020 7:57 AM
To: Fairoz Matte mailto:fairoz.matte at oracle.com; Erik Gahlin mailto:erik.gahlin at oracle.com
Cc: mailto:serviceability-dev at openjdk.java.net
Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken

On 6/1/20 12:30, mailto:serguei.spitsyn at oracle.com wrote:
Hi Fairoz,

It looks okay in general.
But I'm not sure this check is going to work.
The problem is the HeapwalkingDebuggee.useStrictCheck method is invoked in the context of the HeapwalkingDebugger process, not the HeapwalkingDebuggee process.

Probably, you wanted to get this bit of information from the Debuggee process.
The debuggee has to evaluate it itself and store in some field.
The debugger should use the JDI to get this value from the debuggee.

Thanks,
Serguei

I'm not sure, what exactly you wanted to do here.
It can occasionally work for you as long as both processes are run with the same options.

Thanks,
Serguei

On 6/1/20 08:52, Fairoz Matte wrote:
Hi Erik,

Thanks for the review, below is the updated webrev.
http://cr.openjdk.java.net/~fmatte/8243451/webrev.02/

Thanks,
Fairoz

-----Original Message-----
From: Erik Gahlin
Sent: Monday, June 1, 2020 4:26 PM
To: Fairoz Matte mailto:fairoz.matte at oracle.com
Cc: mailto:serviceability-dev at openjdk.java.net
Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken

Hi Fairoz,

What I think you need to do is something like this:

        if (className.equals("java.lang.Thread")) {
            return !isJfrInitialized();
        }

...

    private static boolean isJfrInitialized() {
        try {
            Class clazz = Class.forName("jdk.jfr.FlightRecorder");
            Method method = clazz.getDeclaredMethod("isInitialized", new Class[0]);
            return (boolean) method.invoke(null, new Object[0]);
        } catch (Exception e) {
            return false;
        }
    }

Erik

On 2020-06-01 12:30, Fairoz Matte wrote:
Hi Erik,

Thanks for your quick response,
Below is the updated webrev to handle if jfr module is not present
http://cr.openjdk.java.net/~fmatte/8243451/webrev.01/

Thanks,
Fairoz

-----Original Message-----
From: Erik Gahlin
Sent: Monday, June 1, 2020 2:31 PM
To: Fairoz Matte mailto:fairoz.matte at oracle.com
Cc: mailto:serviceability-dev at openjdk.java.net
Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken

Hi Fairoz,

If the test needs to run with builds where the JFR module is not present(?), you need to do the check using reflection.
If not, looks good. Erik On 1 Jun 2020, at 10:27, Fairoz Matte mailto:fairoz.matte at oracle.com wrote: Hi, Please review this small test infra change to identify at runtime the JFR is active or not. JBS - https://bugs.openjdk.java.net/browse/JDK-8243451 Webrev - http://cr.openjdk.java.net/~fmatte/8243451/webrev.00/ Thanks, Fairoz
From serguei.spitsyn at oracle.com Tue Jun 9 04:26:42 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 8 Jun 2020 21:26:42 -0700 Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: <42fdf66e-b097-47e4-9062-391e9b43968c@default> References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com> <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com> <5b83a8b3-f877-4f72-9188-97e3de6f986b@default> <18e88a44-8d62-8752-834c-dce436e8ebc4@oracle.com> <5adafff1-0da9-8bf2-4b27-5b00b5c48526@oracle.com> <42fdf66e-b097-47e4-9062-391e9b43968c@default> Message-ID: <85de2bb0-94a0-5dbd-f29f-0a9c96f12579@oracle.com> Hi Fairoz, LGTM. Thanks, Serguei
From david.holmes at oracle.com Tue Jun 9 06:26:09 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 9 Jun 2020 16:26:09 +1000 Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime In-Reply-To: References: <8fe912f1-8407-df1d-0c1f-cf37f08363db@oracle.com> <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <1d8900f1-3399-bbc0-98bb-00375f90ac56@oracle.com> <8337cabe-c18e-3a54-a28a-8d94fed4fcab@oracle.com> <623437f8-2d29-b0f4-3313-3cb570651452@oracle.com> <064147c8-957f-dc47-139d-6b1a362c9e98@oracle.com> Message-ID: <1b1ecfca-22e7-a64c-a5f2-6f5ea7b37604@oracle.com> Hi Jiangli, > http://cr.openjdk.java.net/~jiangli/8232222/webrev.03/ I'm having trouble keeping track of all the issues, so let me walk through the changes as I see them: - InstanceKlass::restore_unshareable_info For boot loader classes, when no verification is enabled, we mark the class as linked immediately. By doing this in restore_unshareable_info there are no races (as the class is not exposed to anyone yet) and it allows later checks for is_linked to be by-passed (under the assumption that the class and its supertypes truly are in a state that appears linked). However, this doesn't generate the JVM TI class prepare event, and we can't do it here as that would introduce a number of potential issues with JVM TI. I see in the bug report some metrics from HelloWorld, but really this needs to be backed up by a lot more performance measurements to establish this is actually a worthwhile optimisation.
- SystemDictionary::define_instance_class

This is where we catch up with the JVM TI requirements and immediately after posting the class load event we post the class prepare event. As we have discussed, this earlier posting of the event is observable to a JVMTI agent and although permitted by the specification it is a change in behaviour that might impact existing agents.

Ioi has raised an issue about there being a race here with the potential for the event being delivered multiple times. I agree this code is not adequate:

1718   if (k->is_shared() && k->is_linked()) {

You only want to fire the event for exactly those classes that you pre-linked, so at a minimum this has to be restricted to boot classes only. Even then, as Ioi points out, once the class is exported to the SystemDictionary and visibly seen to be loaded, other threads may race to link it and so may have already posted the class prepare event. In normal linking this race is avoided by the use of the init_lock to check the linked state, do the linking and issue the class prepare event, atomically. But your approach cannot do this as it stands; you would need to add an additional flag to track whether the prepare event had already been issued.

---

So the change as it stands is incomplete, and introduces a behavioural change to JVM TI, and the benefits of it have not been clearly established.

The JBS issue states this is a first step towards pre-initialization and other optimisations, and it is certainly a pre-requisite to pre-link before you can pre-initialize, but I don't think pulling out pre-linking as a separate optimisation is really a worthwhile first step. I have grave reservations about the ability to pre-initialize in general and those issues have to be fleshed out in a project like Leyden. Further, as Coleen points out this pre-linking optimisation is incompatible with proposed vtable changes.
Additionally, this seems like it will be incompatible with changes proposed in Valhalla, as additional link-time actions will be needed that can't be done at the time of restore_unshareable_info.

Bottom line for me is that I just don't think this change is worth pursuing as a stand-alone optimisation at this time. Sorry.

Cheers,
David
-----

On 5/06/2020 8:14 am, Jiangli Zhou wrote:
> Hi David,
>
> On Wed, Jun 3, 2020 at 9:59 PM David Holmes wrote:
>>
>> Ioi pointed out that my proposal was incomplete and that it would need
>> to be more like:
>>
>> if (is_shared() &&
>>     JvmtiExport::should_post_class_prepare() &&
>>     !BytecodeVerificationLocal &&
>>     loader_data->is_the_null_class_loader_data()) {
>>   Handle h_init_lock(THREAD, init_lock());
>>   ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL);
>>   set_init_state(linked);
>>   >>> call JVMTI
>>   return true;
>> }
>>
>> This alleviates any concerns about behavioural changes to JVM TI, and
>> also allows JVM TI enabled code to partially benefit from the
>> pre-linking optimisation.
>>
>> Otherwise I agree with Ioi that any behaviour change to JVM TI needs to
>> be justified by significant performance gains.
>>
>
> Thanks a lot for the input and suggestion! Locking the init_lock for
> the JVMTI ClassPrepare event here sounds ok to me. The ClassDefine is
> normally posted before the ClassPrepare. That's why the change was
> made in systemDictionary.cpp instead of within
> InstanceKlass::restore_unshareable_info() function, to keep the same
> events ordering for any given class. I added the 'init_lock' locking
> code for post_class_prepare(), and kept the code in
> systemDictionary.cpp in webrev.03 below. Not changing the JVMTI
> events ordering feels safer to me. Would the following be ok to
> everyone?
>
> http://cr.openjdk.java.net/~jiangli/8232222/webrev.03/
>
> I also changed the InstanceKlass::restore_unshareable_info() to set
> _init_state via set_init_state API as you suggested.
> We can get away without locking the init_lock for setting the flag itself.
>
> Best regards,
>
> Jiangli
>
>
>> David
>> -----
>>
>> On 4/06/2020 8:42 am, David Holmes wrote:
>>> Correction ...
>>>
>>> On 3/06/2020 5:19 pm, David Holmes wrote:
>>>> On 3/06/2020 3:44 pm, Ioi Lam wrote:
>>>>> On 6/2/20 10:16 PM, David Holmes wrote:
>>>>>> Hi Ioi,
>>>>>>
>>>>>> On 3/06/2020 2:55 pm, Ioi Lam wrote:
>>>>>>>
>>>>>>> On 5/27/20 11:13 PM, David Holmes wrote:
>>>>>>>> Hi Jiangli,
>>>>>>>>
>>>>>>>> On 28/05/2020 11:35 am, Ioi Lam wrote:
>>>>>>>>>
>>>>>>>>>> I was going to take the suggestion, but realized that it would add
>>>>>>>>>> unnecessary complications for archived boot classes with class
>>>>>>>>>> pre-initialization support. Some agents may set
>>>>>>>>>> JvmtiExport::should_post_class_prepare(). It's worthwhile to support
>>>>>>>>>> class pre-init uniformly for archived boot classes with
>>>>>>>>>> JvmtiExport::should_post_class_prepare() enabled or disabled.
>>>>>>>>>
>>>>>>>>> This would introduce behavioral changes when JVMTI is enabled:
>>>>>>>>>
>>>>>>>>> + The order of JvmtiExport::post_class_prepare is different than before
>>>>>>>>> + JvmtiExport::post_class_prepare may be called for a class that
>>>>>>>>>   was not called before (if the class is never linked during run time)
>>>>>>>>> + JvmtiExport::post_class_prepare was called inside the
>>>>>>>>>   init_lock, now it's called outside of the init_lock
>>>>>>>>
>>>>>>>> I have to say I share Ioi's concerns here. This change will impact
>>>>>>>> JVM TI agents in a way we can't be sure of. From a specification
>>>>>>>> perspective I think we are fine as linking can be lazy or eager,
>>>>>>>> so there's no implied order either. But this would be a
>>>>>>>> behavioural change that will be observable by agents. (I'm less
>>>>>>>> concerned about the init_lock situation as it seems potentially
>>>>>>>> buggy to me to call out to an agent with the init_lock held in the
>>>>>>>> first place! I find it hard to imagine an agent only working
>>>>>>>> correctly if the init_lock is held.)
>>>>>>>
>>>>>>> David,
>>>>>>>
>>>>>>> The init_lock has a serializing effect. The callback for a subclass
>>>>>>> will not be executed until the callback for its super class has
>>>>>>> been finished.
>>>>>>
>>>>>> Sorry I don't see that is the case. The init_lock for the subclass
>>>>>> is distinct from the init_lock of the superclass, and linking of
>>>>>> subclasses and superclasses is independent.
>>>>>
>>>>> In InstanceKlass::link_class_impl, you first link all of your super
>>>>> classes.
>>>>>
>>>>> If another thread is already linking your super class, you will block
>>>>> on that superclass's init_lock.
>>>>
>>>> The point is that there is already a race in terms of the execution of
>>>> the two callbacks. So while this change can certainly produce a
>>>> different result to what would previously be seen, such a result is
>>>> already possible in the general case.
>>>>
>>>>> Of course, I may be wrong and my analysis may be bogus. But I hope
>>>>> you can appreciate that this is not going to be a trivial change to
>>>>> analyze.
>>>>
>>>> Yes I agree. While in general ordering of the class_prepare callbacks
>>>> is not guaranteed for independent classes, if a given application
>>>> explicitly loads and links classes in a known order then it can
>>>> (reasonably) expect its callbacks to execute in that order. If this
>>>> change means classes will now be linked in an order independent of
>>>> what the normal runtime order would be then that could be a problem
>>>> for existing agents.
>>>>
>>>> So where does this leave us? The change is within spec, but could
>>>> trigger changes in agent behaviour that we can't really evaluate
>>>> a-priori. So as you say we should have a fairly good reason for doing
>>>> this. I can easily envisage that pre-linking when no callbacks are
>>>> enabled would be a performance boost. But with callbacks enabled and
>>>> consuming CPU cycles any benefit from pre-linking could be lost in the
>>>> noise.
>>>>
>>>> What if we did as Ioi suggested and only set the class as linked in
>>>> restore_unshareable_info if !JvmtiExport::should_post_class_prepare();
>>>> and in addition in InstanceKlass::link_class_imp we added an
>>>> additional check at the start:
>>>>
>>>> // Pre-linking at load time may have been disabled for shared classes,
>>>> // but we may be able to do it now.
>>>> if (JvmtiExport::should_post_class_prepare() &&
>>>>     !BytecodeVerificationLocal &&
>>>>     loader_data->is_the_null_class_loader_data()) {
>>>>   _init_state = linked;
>>>> }
>>>
>>> There should obviously be a check for is_shared() in there as well.
>>>
>>> David
>>> -----
>>>
>>>> ?
>>>>
>>>> That avoids the problem of changing the JVM TI callback behaviour, but
>>>> also shortens the link time path when the callbacks are enabled.
>>>>
>>>> Hope I got that right. :)
>>>>
>>>> David
>>>> -----
>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> With the proposed patch, the callback for both the super class and
>>>>>>> subclass can proceed in parallel. So if an agent performs class
>>>>>>> hierarchy analysis, for example, it may need to perform extra
>>>>>>> synchronization.
>>>>>>>
>>>>>>> This is just one example that I can think of. I am sure there are
>>>>>>> other issues that we have not thought about.
>>>>>>>
>>>>>>> The fact is we are dealing with arbitrary code in the callbacks,
>>>>>>> and we are changing the conditions of how they are called. The
>>>>>>> calls happen inside very delicate code (class loading, system
>>>>>>> dictionary).
>>>>>>> I am reluctant to do the due diligence, which is
>>>>>>> substantial, of verifying that this is a safe change, unless we
>>>>>>> have a really compelling reason to do so.
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi

From coleen.phillimore at oracle.com Tue Jun 9 12:20:27 2020
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Tue, 9 Jun 2020 08:20:27 -0400
Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime
In-Reply-To: <0842f780-e3a4-83df-8425-6e4bed765216@oracle.com>
References: <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com>
 <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com>
 <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com>
 <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com>
 <1d8900f1-3399-bbc0-98bb-00375f90ac56@oracle.com>
 <8337cabe-c18e-3a54-a28a-8d94fed4fcab@oracle.com>
 <623437f8-2d29-b0f4-3313-3cb570651452@oracle.com>
 <064147c8-957f-dc47-139d-6b1a362c9e98@oracle.com>
 <0842f780-e3a4-83df-8425-6e4bed765216@oracle.com>
Message-ID: <61a5fe4a-6a87-b8d3-8580-764796a5f775@oracle.com>

Hi Jiangli,

I'm sorry I didn't see the whole thread because I filtered it to
serviceability-dev. I see you have answered some of these questions
there. Still reading.

Coleen

On 6/8/20 10:46 PM, coleen.phillimore at oracle.com wrote:
>
> Hi Jiangli,
>
> I apologize for jumping in at this late stage of this change. I've
> seen the emails but there's been a lot of discussion which is hard to
> follow.
>
> I have some concerns with setting the state to "linked" since the
> changes that Erik Osterlund is working on would require reinitializing
> the itable and vtables when you load the shared class. See the JEP
> for New Invoke Bindings
> https://bugs.openjdk.java.net/browse/JDK-8221828.
>
> We would have to remove this optimization. Erik is planning on
> getting this work into JDK 16 since we have a functionally complete
> version. See:
> https://github.com/coleenp/jdk/blob/erik-calls/src/hotspot/share/oops/instanceKlass.cpp#L880
>
> Also, I haven't figured out why you are enabling this optimization if
> JVMTI is requested, since the optimization seems to have minor
> benefits. And I'm concerned with threads observing the class as
> linked but I don't see any bugs there. By setting the state to
> "linked" we are skipping these steps:
>
>   linking super classes and interfaces - can we assume that they are
>     already linked when ik->restore_unshareable_info is called?
>   check_verification_constraints - presumably OK for NULL CLD, this has
>     a quick exit
>   link_methods - this is already called in restore_unshareable_info so
>     it has a quick exit
>   check_linking_constraints - presumably OK for NULL CLD, this has a
>     quick exit
>   initializing the vtable - will need to revert the change for new
>     invoke bindings and maybe valhalla.
>
> link_class_impl doesn't really do that much for boot class loader.
>
> I imagine that this change is so that potentially CDS classes can be
> pre-initialized so that more in the mirror can be shared, which sounds
> difficult to do except maybe for some classes. Is this being
> discussed on the Project Leyden thread?
>
> Thanks,
> Coleen
>
>
> On 6/8/20 9:02 PM, Jiangli Zhou wrote:
>> Hi Ioi,
>>
>> After incorporating David's suggestion of locking init_lock for
>> posting ClassPrepare events, do you have other concerns about the
>> change? I hope we are finally able to move on with an inclusive and
>> right solution that works for broad usages, particularly on the cloud
>> spectrum.
>>
>> Best,
>> Jiangli

From coleen.phillimore at oracle.com Tue Jun 9 12:43:21 2020
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Tue, 9 Jun 2020 08:43:21 -0400
Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime
In-Reply-To: <1b1ecfca-22e7-a64c-a5f2-6f5ea7b37604@oracle.com>
References: <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com>
 <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com>
 <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com>
 <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com>
 <1d8900f1-3399-bbc0-98bb-00375f90ac56@oracle.com>
 <8337cabe-c18e-3a54-a28a-8d94fed4fcab@oracle.com>
 <623437f8-2d29-b0f4-3313-3cb570651452@oracle.com>
 <064147c8-957f-dc47-139d-6b1a362c9e98@oracle.com>
 <1b1ecfca-22e7-a64c-a5f2-6f5ea7b37604@oracle.com>
Message-ID:

(Posting on the right thread and list now...)

On 6/9/20 2:26 AM, David Holmes wrote:
> Hi Jiangli,
>
> >    http://cr.openjdk.java.net/~jiangli/8232222/webrev.03/
>
> I'm having trouble keeping track of all the issues, so let me walk
> through the changes as I see them:
>
> - InstanceKlass::restore_unshareable_info
>
> For boot loader classes, when no verification is enabled, we mark the
> class as linked immediately. By doing this in restore_unshareable_info
> there are no races (as the class is not exposed to anyone yet) and it
> allows later checks for is_linked to be by-passed (under the
> assumption that the class and its supertypes truly are in a state that
> appears linked). However, this doesn't generate the JVM TI class
> prepare event, and we can't do it here as that would introduce a
> number of potential issues with JVM TI.
>
> I see in the bug report some metrics from HelloWorld, but really this
> needs to be backed up by a lot more performance measurements to
> establish this is actually a worthwhile optimisation.
>
> - SystemDictionary::define_instance_class
>
> This is where we catch up with the JVM TI requirements and immediately
> after posting the class load event we post the class prepare event.
>
> As we have discussed, this earlier posting of the event is observable
> to a JVMTI agent and although permitted by the specification it is a
> change in behaviour that might impact existing agents.
>
> Ioi has raised an issue about there being a race here with the
> potential for the event being delivered multiple times. I agree this
> code is not adequate:
>
> 1718   if (k->is_shared() && k->is_linked()) {
>
> You only want to fire the event for exactly those classes that you
> pre-linked, so at a minimum this has to be restricted to boot classes
> only. Even then as Ioi points out once the class is exported to the
> SystemDictionary and visibly seen to be loaded, then other threads may
> race to link it and so have already posted the class prepare event. In
> normal linking this race is avoided by the use of the init_lock to
> check the linked state, do the linking and issue the class prepare
> event, atomically. But your approach cannot do this as it stands, you
> would need to add an additional flag to track whether the prepare
> event had already been issued.
>

Thanks to Ioi and David for seeing this race. As I looked at the
change, it looked fairly simple and almost straightforward, but it is
very scary how these changes interact in such surprising ways. Without
this careful review, these changes would cause endless work later on.
The area of class loading and our code for doing so has all sorts of
subtle details that are hard to reason about. I wish this weren't so
and that we could have code that we're not afraid of.

The CSR is a nice writeup but I didn't see the race from the CSR
either. We need to take the opportunity to look at this from the top
down in a project like Leyden. There are still some opportunities to
speed up class loading in the context of CDS and to find places that we
can simplify, but this was alarmingly not simple. I'm grateful to Ioi
and David for doing this work, and to you for thoroughly discussing
this change.

Thanks,
Coleen

> ---
>
> So the change as it stands is incomplete, and introduces a behavioural
> change to JVM TI, and the benefits of it have not been clearly
> established.
>
> The JBS issue states this is a first step towards pre-initialization
> and other optimisations, and it is certainly a pre-requisite to
> pre-link before you can pre-initialize, but I don't think pulling out
> pre-linking as a separate optimisation is really a worthwhile first
> step. I have grave reservations about the ability to pre-initialize in
> general and those issues have to be fleshed out in a project like
> Leyden. Further, as Coleen points out this pre-linking optimisation is
> incompatible with proposed vtable changes. Additionally, it seems this
> will be incompatible with changes proposed in Valhalla, as additional
> link-time actions will be needed that can't be done at the time of
> restore_unshareable_info.
>
> Bottom line for me is that I just don't think this change is worth
> pursuing as a stand-alone optimisation at this time. Sorry.
>
> Cheers,
> David
> -----
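
As an aside on the race discussed in this message, here is a minimal, self-contained C++ sketch of the pattern David describes: check the linked state, link, and post the prepare event under one lock, with an extra flag recording whether the event has already been delivered. This is not HotSpot code. `ToyKlass`, `link_and_post_prepare`, `prepare_posted` and `prepare_events_posted` are hypothetical names, and a `std::mutex` merely stands in for the init_lock.

```cpp
#include <cassert>
#include <mutex>

// Toy stand-in for a class being linked; NOT HotSpot's InstanceKlass.
struct ToyKlass {
  std::mutex init_lock;           // stands in for the per-class init_lock
  bool linked = false;            // may already be true if pre-linked at load time
  bool prepare_posted = false;    // hypothetical extra flag: event already delivered?
  int prepare_events_posted = 0;  // counts deliveries, for illustration only
};

// Checks the state, links if needed, and posts the prepare event at most
// once, all while holding the lock, so racing callers cannot both post.
inline void link_and_post_prepare(ToyKlass& k) {
  std::lock_guard<std::mutex> guard(k.init_lock);
  if (!k.linked) {
    k.linked = true;              // the real linking work would happen here
  }
  if (!k.prepare_posted) {
    k.prepare_posted = true;
    ++k.prepare_events_posted;    // stands in for posting the ClassPrepare event
  }
  assert(k.prepare_events_posted == 1);  // invariant: never delivered twice
}
```

In this sketch a class that was marked linked before it became visible still gets exactly one event, because the decision to post is keyed on the separate flag rather than on the linked state alone.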

From poonam.bajaj at oracle.com Tue Jun 9 14:46:35 2020
From: poonam.bajaj at oracle.com (Poonam Parhar)
Date: Tue, 9 Jun 2020 07:46:35 -0700
Subject: RFR: 8243290: Improve diagnostic messages for class verification and redefinition failures
Message-ID:

Hello,

Please review this simple change for improving diagnostics around
class verification and linking failures:

Bug: https://bugs.openjdk.java.net/browse/JDK-8243290
Webrev: http://cr.openjdk.java.net/~poonam/8243290/webrev.00/

Problem: During the class redefinition process, if a class
verification fails because it could not find a class referenced in the
class being redefined, the printed NoClassDefFoundError error message
is not very helpful. It does not print the class name for which
NoClassDefFoundError was encountered, and that makes it very hard to
find the real cause of redefinition failure.

The proposed solution prints the class name during class linking and
verification failures. Example output produced with these changes:

With 'redefine' tag:
     [java] [3.243s][debug][redefine,class,load        ] loaded name=org.apache.commons.logging.impl.Jdk14Logger (avail_mem=819540K)
     [java] [3.243s][debug][redefine,class,load        ] loading name=org.apache.commons.logging.impl.Log4JLogger kind=101 (avail_mem=819540K)
     [java] [3.244s][info ][redefine,class,load,exceptions] link_class exception: 'java/lang/NoClassDefFoundError org/apache/log4j/Priority'
     [java] Java Result: 1

With 'verification' tag:
     [java] [49.702s][info ][verification] Verification for org.apache.commons.logging.impl.Log4JLogger has exception pending 'java.lang.NoClassDefFoundError org/apache/log4j/Priority'
     [java] [49.702s][info ][verification] End class verification for: org.apache.commons.logging.impl.Log4JLogger

Improved error message:
     [java] Exception in thread "main" java.lang.InternalError: class redefinition failed: invalid class
     [java]     at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method)
     [java]     at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:167)
     [java]     at Main.main(Unknown Source)

Thanks,
Poonam

From coleen.phillimore at oracle.com Tue Jun 9 14:53:41 2020
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Tue, 9 Jun 2020 10:53:41 -0400
Subject: RFR: 8243290: Improve diagnostic messages for class verification and redefinition failures
In-Reply-To:
References:
Message-ID: <02b3de92-0b53-c063-bbe5-e35d6b69d785@oracle.com>

This change looks good to me.

Coleen

From robbin.ehn at oracle.com Tue Jun 9 16:15:09 2020
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Tue, 9 Jun 2020 18:15:09 +0200
Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes.
Message-ID: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com>

Hi all,

If the direct handshake is executed by the target thread, the JNI
local(s) are created in that thread but returned in the handshaking
thread. They are thus not safe to use. (The thread might even have
exited by this point.)

Code:
http://cr.openjdk.java.net/~rehn/8247248/v1/webrev/

Unfortunately there is no way to distinguish a local jobject from a
global one, which makes it hard to track when a jobject is global and
when it is not.

Issue:
https://bugs.openjdk.java.net/browse/JDK-8247248

Local testing of JDI/JVMTI and t1-5.
(no real crash so there is nothing to reproduce)

Thanks, Robbin

From coleen.phillimore at oracle.com Tue Jun 9 16:59:23 2020
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Tue, 9 Jun 2020 12:59:23 -0400
Subject: RFR: 8243290: Improve diagnostic messages for class verification and redefinition failures
In-Reply-To:
References:
Message-ID: <9b55ff08-7f73-30a6-0fbd-913dc5017ec3@oracle.com>

Hi,

For some reason, my message filters are dropping messages. But I
think Harold pointed out that if ex_msg is NULL then the logging will
crash with %s in the change for both files.

Coleen
[java] [49.702s][info ][verification] Verification for org.apache.commons.logging.impl.Log4JLogger has exception pending 'java.lang.NoClassDefFoundError org/apache/log4j/Priority' > ???? [java] [49.702s][info ][verification] End class verification for: org.apache.commons.logging.impl.Log4JLogger > > Improved error message: > ???? [java] Exception in thread "main" java.lang.InternalError: class redefinition failed: invalid class > ???? [java] ??? at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method) > ???? [java] ??? at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:167) > ???? [java] ??? at Main.main(Unknown Source) > > Thanks, > Poonam > -------------- next part -------------- An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Tue Jun 9 17:56:08 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 9 Jun 2020 10:56:08 -0700 Subject: RFR: 8243290: Improve diagnostic messages for class verification and redefinition failures In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From jianglizhou at google.com Tue Jun 9 18:06:23 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Tue, 9 Jun 2020 11:06:23 -0700 Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime In-Reply-To: <1d7a3514-f5b8-015b-294d-520328bb1c95@oracle.com> References: <4de9bb9c-e83d-f33b-fc50-3431f69e46aa@oracle.com> <8fe912f1-8407-df1d-0c1f-cf37f08363db@oracle.com> <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <1d8900f1-3399-bbc0-98bb-00375f90ac56@oracle.com> <8337cabe-c18e-3a54-a28a-8d94fed4fcab@oracle.com> <1d7a3514-f5b8-015b-294d-520328bb1c95@oracle.com> Message-ID: On Mon, Jun 8, 2020 at 6:36 PM Ioi Lam wrote: > > Hi Jiangli, > > I've asked this before. Do you have any performance data showing the > benefit of the JVMTI part of the optimization, when JVMTI is used? > Hi Ioi, The performance gain for this change is a constant (for any fixed number of loaded classes), with or without JVMTI agent, hope you are able to understand it. The startup data that I provided was without a JVMTI agent. However the same saving applies when an agent exists, as I've pointed out earlier already. I'll run a comparison with agent enabled and provide it to the community. > I think ultimately the JVMTI team should evaluate the patch. I CC'ed > Serguei. I think it will be helpful if there's data that shows how JVMTI > can benefit from this patch. > The serviceability-dev at openjdk.java.net mailing list has been CC'ed and I've asked for Serguei or Chris to comment in one of the earlier emails. > BTW, there's actually a race condition with the latest patch. I think > this shows just how difficult it's to get things right in this very > complicated part of the JVM. 
> > 1706 update_dictionary(d_hash, p_index, p_hash, > 1707 k, class_loader_h, THREAD); > 1708 } > 1709 k->eager_initialize(THREAD); > 1710 > > > >>>> HERE > > 1711 // notify jvmti > 1712 if (JvmtiExport::should_post_class_load()) { > 1713 assert(THREAD->is_Java_thread(), "thread->is_Java_thread()"); > 1714 JvmtiExport::post_class_load((JavaThread *) THREAD, k); > 1715 > 1716 } > 1717 post_class_define_event(k, loader_data); > 1718 if (k->is_shared() && k->is_linked()) { > 1719 if (JvmtiExport::should_post_class_prepare()) { > 1720 // To keep the same behavior as for dynamically loaded classes, > 1721 // lock the init_lock before posting the ClassPrepare event. > 1722 Handle h_init_lock(THREAD, k->init_lock()); > 1723 ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL); > 1724 JvmtiExport::post_class_prepare((JavaThread *)THREAD, k); > 1725 } > 1726 } > 1727 } > > For a non-boot class, between the time where klass is added to the > dictionary, to where you checked for ik->is_linked(), another thread > could have called into klass->link_class_impl() and hence invoked the > ClassPrepare callback. So your code will invoke the callback again. > Please don't mix this change with non-boot class. This change only handles the archived boot class. Thanks, Jiangli > Thanks > - Ioi > > > On 6/3/20 12:06 PM, Jiangli Zhou wrote: > > Hi Ioi, > > > > Really appreciate that you think through the details! I agree with > > your analysis about the serializing effect of '_init_lock' for posting > > class_prepare events. However I don't agree that an agent can rely on > > that for class hierarchy analysis as that's VM implementation specific > > behavior, but that is a different topic and does not belong to this > > thread. > > > > Let's analyze the runtime archived boot class loading behavior as > > well. When loading a shared class, the VM first loads all it's super > > types. There are multiple locks involved during loading a boot class. 
> > Those include the _system_loader_lock_obj (which is used for the boot > > loader as well) and the SystemDictionary_lock. These locks are held > > when loading a boot class by the NULL loader. That ensures the same > > serializing effect for posting class_prepare events after runtime > > restoration. Please let me know if you see any hole here. > > > > Thanks! > > Jiangli > > > > On Tue, Jun 2, 2020 at 10:46 PM Ioi Lam wrote: > >> > >> > >> On 6/2/20 10:16 PM, David Holmes wrote: > >>> Hi Ioi, > >>> > >>> On 3/06/2020 2:55 pm, Ioi Lam wrote: > >>>> > >>>> On 5/27/20 11:13 PM, David Holmes wrote: > >>>>> Hi Jiangli, > >>>>> > >>>>> On 28/05/2020 11:35 am, Ioi Lam wrote: > >>>>>> > >>>>>>> I was going to take the suggestion, but realized that it would add > >>>>>>> unnecessary complications for archived boot classes with class > >>>>>>> pre-initialization support. Some agents may set > >>>>>>> JvmtiExport::should_post_class_prepare(). It's worthwhile to support > >>>>>>> class pre-init uniformly for archived boot classes with > >>>>>>> JvmtiExport::should_post_class_prepare() enabled or disabled. > >>>>>> This would introduce behavioral changes when JVMTI is enabled: > >>>>>> > >>>>>> + The order of JvmtiExport::post_class_prepare is different than > >>>>>> before > >>>>>> + JvmtiExport::post_class_prepare may be called for a class that > >>>>>> was not called before (if the class is never linked during run time) > >>>>>> + JvmtiExport::post_class_prepare was called inside the init_lock, > >>>>>> now it's called outside of the init_lock > >>>>> I have to say I share Ioi's concerns here. This change will impact > >>>>> JVM TI agents in a way we can't be sure of. From a specification > >>>>> perspective I think we are fine as linking can be lazy or eager, so > >>>>> there's no implied order either. But this would be a behavioural > >>>>> change that will be observable by agents. 
(I'm less concerned about > >>>>> the init_lock situation as it seems potentially buggy to me to call > >>>>> out to an agent with the init_lock held in the first place! I find > >>>>> it hard to imagine an agent only working correctly if the init_lock > >>>>> is held.) > >>>> David, > >>>> > >>>> The init_lock has a serializing effect. The callback for a subclass > >>>> will not be executed until the callback for its super class has been > >>>> finished. > >>> Sorry I don't see that is the case. The init_lock for the subclass is > >>> distinct from the init_lock of the superclass, and linking of > >>> subclasses and superclasses is independent. > >> > >> In InstanceKlass::link_class_impl, you first link all of your super classes. > >> > >> If another thread is already linking your super class, you will block on > >> that superclass's init_lock. > >> > >> Of course, I may be wrong and my analysis may be bogus. But I hope you > >> can appreciate that this is not going to be a trivial change to analyze. > >> > >> Thanks > >> - Ioi > >>> David > >>> ----- > >>> > >>>> With the proposed patch, the callback for both the super class and > >>>> subclass can proceed in parallel. So if an agent performs class > >>>> hierarchy analysis, for example, it may need to perform extra > >>>> synchronization. > >>>> > >>>> This is just one example that I can think of. I am sure there are > >>>> other issues that we have not thought about. > >>>> > >>>> The fact is we are dealing with arbitrary code in the callbacks, and > >>>> we are changing the conditions of how they are called. The calls > >>>> happen inside very delicate code (class loading, system dictionary). > >>>> I am reluctant to do the due diligence, which is substantial, of > >>>> verifying that this is a safe change, unless we have a really > >>>> compelling reason to do so. 
> >>>>
> >>>> Thanks
> >>>> - Ioi
> >>>>
> >>>>
>

From serguei.spitsyn at oracle.com Tue Jun 9 18:30:18 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 9 Jun 2020 11:30:18 -0700
Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes.
In-Reply-To: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com>
References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com>
Message-ID: <014b74c7-2bfd-ea31-0890-acccc52df86a@oracle.com>

Hi Robbin,

Nice catch!
The fix looks good in general.
It'd be nice to add comments to explain why these global refs are created.

Thanks,
Serguei

On 6/9/20 09:15, Robbin Ehn wrote:
> Hi all,
>
> If the direct handshake is executed by the target thread, the JNI
> local(s) are created in that thread but returned in the handshaking
> thread.
> They thus are not safe to use. (thread might even have exited by this
> point)
>
> Code:
> http://cr.openjdk.java.net/~rehn/8247248/v1/webrev/
>
> Unfortunately there is no way to distinguish a local jobject from a
> global one, which makes it hard to track whether a jobject is global or not.
>
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8247248
>
> Local testing of JDI/JVMTI and t1-5.
> (no real crash so there is nothing to reproduce)
>
> Thanks, Robbin

From poonam.bajaj at oracle.com Tue Jun 9 18:49:37 2020
From: poonam.bajaj at oracle.com (Poonam Parhar)
Date: Tue, 9 Jun 2020 11:49:37 -0700
Subject: RFR: 8243290: Improve diagnostic messages for class verification and redefinition failures
In-Reply-To: <9b55ff08-7f73-30a6-0fbd-913dc5017ec3@oracle.com>
References: <9b55ff08-7f73-30a6-0fbd-913dc5017ec3@oracle.com>
Message-ID:

Thanks Coleen! I will fix the null string issue.

regards,
Poonam

On 6/9/20 9:59 AM, coleen.phillimore at oracle.com wrote:
>
> Hi, For some reason, my message filters are dropping messages.
>
> But I think Harold pointed out that if ex_msg is NULL then the logging
> will crash with %s in the change for both files.
> > Coleen > > On 6/9/20 10:46 AM, Poonam Parhar wrote: >> Hello, >> >> Please review this simple change for improving diagnostics around >> class verification and linking failures: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8243290 >> Webrev: http://cr.openjdk.java.net/~poonam/8243290/webrev.00/ >> >> Problem: During the class redefinition process, if a class >> verification fails because it could not find a class referenced in >> the class being redefined, the printed NoClassDefFoundError error >> message is not very helpful. It does not print the class name for >> which NoClassDefFoundError was encountered, and that makes it very >> hard to find the real cause of redefinition failure. >> >> The proposed solution prints the class name during class linking and >> verification failures. Example output produced with these changes: >> >> With 'redefine' tag: >> ???? [java] [3.243s][debug][redefine,class,load??????? ] loaded name=org.apache.commons.logging.impl.Jdk14Logger (avail_mem=819540K) >> ???? [java] [3.243s][debug][redefine,class,load??????? ] loading name=org.apache.commons.logging.impl.Log4JLogger kind=101 (avail_mem=819540K) >> ???? [java] [3.244s][info ][redefine,class,load,exceptions] link_class exception: 'java/lang/NoClassDefFoundError org/apache/log4j/Priority' >> ???? [java] Java Result: 1 >> With 'verification' tag: >> ???? [java] [49.702s][info ][verification] Verification for org.apache.commons.logging.impl.Log4JLogger has exception pending 'java.lang.NoClassDefFoundError org/apache/log4j/Priority' >> ???? [java] [49.702s][info ][verification] End class verification for: org.apache.commons.logging.impl.Log4JLogger >> >> Improved error message: >> ???? [java] Exception in thread "main" java.lang.InternalError: class redefinition failed: invalid class >> ???? [java] ??? at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method) >> ???? [java] ??? 
at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:167) >> ???? [java] ??? at Main.main(Unknown Source) >> >> Thanks, >> Poonam >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From poonam.bajaj at oracle.com Tue Jun 9 18:49:59 2020 From: poonam.bajaj at oracle.com (Poonam Parhar) Date: Tue, 9 Jun 2020 11:49:59 -0700 Subject: RFR: 8243290: Improve diagnostic messages for class verification and redefinition failures In-Reply-To: References: Message-ID: Thanks Serguei! regards, Poonam On 6/9/20 10:56 AM, serguei.spitsyn at oracle.com wrote: > Hi Poonam, > > Thank you for taking care about this! > It looks good besides the comment from Harold and Coleen about ex_msg > can be equal to NULL. > > Thanks, > Serguei > > On 6/9/20 07:46, Poonam Parhar wrote: >> Hello, >> >> Please review this simple change for improving diagnostics around >> class verification and linking failures: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8243290 >> Webrev: http://cr.openjdk.java.net/~poonam/8243290/webrev.00/ >> >> Problem: During the class redefinition process, if a class >> verification fails because it could not find a class referenced in >> the class being redefined, the printed NoClassDefFoundError error >> message is not very helpful. It does not print the class name for >> which NoClassDefFoundError was encountered, and that makes it very >> hard to find the real cause of redefinition failure. >> >> The proposed solution prints the class name during class linking and >> verification failures. Example output produced with these changes: >> >> With 'redefine' tag: >> ???? [java] [3.243s][debug][redefine,class,load??????? ] loaded name=org.apache.commons.logging.impl.Jdk14Logger (avail_mem=819540K) >> ???? [java] [3.243s][debug][redefine,class,load??????? ] loading name=org.apache.commons.logging.impl.Log4JLogger kind=101 (avail_mem=819540K) >> ???? 
[java] [3.244s][info ][redefine,class,load,exceptions] link_class exception: 'java/lang/NoClassDefFoundError org/apache/log4j/Priority' >> ???? [java] Java Result: 1 >> With 'verification' tag: >> ???? [java] [49.702s][info ][verification] Verification for org.apache.commons.logging.impl.Log4JLogger has exception pending 'java.lang.NoClassDefFoundError org/apache/log4j/Priority' >> ???? [java] [49.702s][info ][verification] End class verification for: org.apache.commons.logging.impl.Log4JLogger >> >> Improved error message: >> ???? [java] Exception in thread "main" java.lang.InternalError: class redefinition failed: invalid class >> ???? [java] ??? at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method) >> ???? [java] ??? at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:167) >> ???? [java] ??? at Main.main(Unknown Source) >> >> Thanks, >> Poonam >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Tue Jun 9 19:34:37 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 9 Jun 2020 12:34:37 -0700 Subject: RFR: 8242891: vmTestbase/nsk/jvmti/ test should be fixed to fail early if JVMTI function return error In-Reply-To: References: Message-ID: <11314027-4965-b38b-6bc7-5011515b94ab@oracle.com> An HTML attachment was scrubbed... 
URL:

From leonid.mesnik at oracle.com Tue Jun 9 19:45:52 2020
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Tue, 9 Jun 2020 12:45:52 -0700
Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken
In-Reply-To: <85de2bb0-94a0-5dbd-f29f-0a9c96f12579@oracle.com>
References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com> <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com> <5b83a8b3-f877-4f72-9188-97e3de6f986b@default> <18e88a44-8d62-8752-834c-dce436e8ebc4@oracle.com> <5adafff1-0da9-8bf2-4b27-5b00b5c48526@oracle.com> <42fdf66e-b097-47e4-9062-391e9b43968c@default> <85de2bb0-94a0-5dbd-f29f-0a9c96f12579@oracle.com>
Message-ID: <5c50b83d-964a-d74d-d7d7-77d7b348d533@oracle.com>

http://cr.openjdk.java.net/~fmatte/8243451/webrev.09/test/hotspot/jtreg/vmTestbase/nsk/share/jdi/TestDebuggerType2.java.udiff.html

I see that isJFRActive() depends on "nsk.share.jdi.HeapwalkingDebuggee". It is not going to work if the debuggee is not "nsk.share.jdi.HeapwalkingDebuggee". Shouldn't it be placed in HeapWalkingDebugger?

Leonid

On 6/8/20 9:26 PM, serguei.spitsyn at oracle.com wrote:
> Hi Fairoz,
>
> LGTM.
> > Thanks, > Serguei > > > On 6/8/20 21:20, Fairoz Matte wrote: >> Hi Serguei, >> >> Thanks for the clarifications, >> I have incorporated the 2nd suggestion, below is the webrev, >> http://cr.openjdk.java.net/~fmatte/8243451/webrev.09/ >> >> Thanks, >> Fairoz >> >> From: Serguei Spitsyn >> Sent: Monday, June 8, 2020 10:34 PM >> To: Fairoz Matte ; Erik Gahlin >> >> Cc: serviceability-dev at openjdk.java.net >> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is >> incorrect and corresponsing logic seems to be broken >> >> Hi Fairoz, >> >> >> On 6/8/20 02:08, mailto:serguei.spitsyn at oracle.com wrote: >> Hi Fairoz, >> >> There are two different isJFRActive() methods, one is on debuggee >> side and another on the debugger side. >> The one on debuggee side is better to keep in Debuggee.java (where it >> was before) instead of moving it to HeapwalkingDebuggee.java. >> It is okay to keep the call to it in the HeapwalkingDebuggee.java. >> >> Please, skip this suggestion as Debugger.java is not one of supers of >> HeapwalkingDebuggee.java as I've assumed. >> >> Thanks, >> Serguei >> >> >> +??? protected boolean isJFRActive() { >> +??????? boolean isJFRActive = false; >> +??????? ReferenceType referenceType = >> debuggee.classByName("nsk.share.jdi.HeapwalkingDebuggee"); >> +??????? if (referenceType == null) >> +?????????? throw new RuntimeException("Debugeee is not initialized >> yet"); >> + >> +??????? Field isJFRActiveFld = >> referenceType.fieldByName("isJFRActive"); >> +??????? isJFRActive = >> ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value(); >> +??????? return isJFRActive; >> ????? } >> It is better to remove the line: >> +??????? boolean isJFRActive = false; >> and just change this one: >> +??????? boolean isJFRActive = >> ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value(); >> >> Otherwise, it looks good to me. >> I hope, it really works now. 
>> >> Thanks, >> Serguei >> >> On 6/8/20 00:26, Fairoz Matte wrote: >> Hi Serguei, Erik, >> ? Thanks for the reviews, >> Below webrev contains the suggested changes, >> http://cr.openjdk.java.net/~fmatte/8243451/webrev.08/ >> ? The only thing I couldn?t do is to keep the local copy of >> isJFRActive() in HeapwalkingDebugger, >> The method is called in debugee code. >> In debugger, we have access to debugee before test started or after >> test completes. >> isJFRActive() method need to be executed during the test execution. >> Hence I didn?t find place to initialize and cannot make local copy. >> ? Thanks, >> Fairoz >> ? From: Serguei Spitsyn >> Sent: Tuesday, June 2, 2020 7:57 AM >> To: Fairoz Matte mailto:fairoz.matte at oracle.com; Erik Gahlin >> mailto:erik.gahlin at oracle.com >> Cc: mailto:serviceability-dev at openjdk.java.net >> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is >> incorrect and corresponsing logic seems to be broken >> ? On 6/1/20 12:30, mailto:serguei.spitsyn at oracle.com wrote: >> Hi Fairoz, >> >> It looks okay in general. >> But I'm not sure this check is going to work. >> The problem is the HeapwalkingDebuggee.useStrictCheck method is >> invoked in the >> context of the HeapwalkingDebugger process, not the >> HeapwalkingDebuggee process. >> >> Probably, you wanted to get this bit of information from the Debuggee >> process. >> The debuggee has to evaluate it itself and store in some field. >> The debugger should use the JDI to get this value from the debuggee. >> >> Thanks, >> Serguei >> >> I'm not sure, what exactly you wanted to do here. >> It can occasionally work for you as long as both processes are run >> with the same options. >> >> Thanks, >> Serguei >> >> >> On 6/1/20 08:52, Fairoz Matte wrote: >> Hi Erik, >> ? Thanks for the review, below is the updated webrev. >> http://cr.openjdk.java.net/~fmatte/8243451/webrev.02/ >> ? Thanks, >> Fairoz >> ? 
-----Original Message-----
>> From: Erik Gahlin
>> Sent: Monday, June 1, 2020 4:26 PM
>> To: Fairoz Matte mailto:fairoz.matte at oracle.com
>> Cc: mailto:serviceability-dev at openjdk.java.net
>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is
>> incorrect
>> and corresponsing logic seems to be broken
>>
>> Hi Fairoz,
>>
>> What I think you need to do is something like this:
>>
>>         if (className.equals("java.lang.Thread")) {
>>             return !isJfrInitialized();
>>         }
>>
>> ...
>>
>>     private static boolean isJfrInitialized() {
>>         try {
>>             Class clazz = Class.forName("jdk.jfr.FlightRecorder");
>>             Method method = clazz.getDeclaredMethod("isInitialized", new Class[0]);
>>             return (boolean) method.invoke(null, new Object[0]);
>>         } catch (Exception e) {
>>             return false;
>>         }
>>     }
>>
>> Erik
>>
>> On 2020-06-01 12:30, Fairoz Matte wrote:
>> Hi Erik,
>>
>> Thanks for your quick response,
>> Below is the updated webrev to handle if jfr module is not present
>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.01/
>>
>> Thanks,
>> Fairoz
>>
>> -----Original Message-----
>> From: Erik Gahlin
>> Sent: Monday, June 1, 2020 2:31 PM
>> To: Fairoz Matte mailto:fairoz.matte at oracle.com
>> Cc: mailto:serviceability-dev at openjdk.java.net
>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is
>> incorrect and corresponsing logic seems to be broken
>>
>> Hi Fairoz,
>>
>> If the test needs to run with builds where the JFR module is not
>> present(?), you need to do the check using reflection.
>>
>> If not, looks good.
>>
>> Erik
>>
>> On 1 Jun 2020, at 10:27, Fairoz Matte
>> mailto:fairoz.matte at oracle.com wrote:
>>
>> Hi,
>>
>> Please review this small test infra change to identify at runtime
>> the JFR is
>> active or not.
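Editorial sketch: Erik's reflection-based check quoted above can be tried as a standalone class. This is only a minimal version under stated assumptions. The class name JfrProbe and the helper invokeStaticBoolean are hypothetical and do not come from the webrev, and the sketch uses getMethod instead of getDeclaredMethod since FlightRecorder.isInitialized() is a public static method. How the actual test code wires this in may well differ.

```java
import java.lang.reflect.Method;

/**
 * Sketch only: query jdk.jfr.FlightRecorder.isInitialized() reflectively,
 * so the caller still compiles and runs on builds without the jdk.jfr module.
 * JfrProbe and invokeStaticBoolean are hypothetical names for illustration.
 */
final class JfrProbe {
    private JfrProbe() {}

    /** True only if the JFR class is present and reports itself initialized. */
    static boolean isJfrInitialized() {
        return invokeStaticBoolean("jdk.jfr.FlightRecorder", "isInitialized");
    }

    /**
     * Calls a public static no-argument boolean method by name.
     * Any reflective failure is reported as false.
     */
    static boolean invokeStaticBoolean(String className, String methodName) {
        try {
            Class<?> clazz = Class.forName(className);
            Method method = clazz.getMethod(methodName);
            return (boolean) method.invoke(null);
        } catch (ReflectiveOperationException | LinkageError e) {
            // Class or method is missing (for example, no jdk.jfr module).
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("JFR initialized: " + isJfrInitialized());
    }
}
```

Catching the failure and returning false matches the intent stated in the thread: a build without JFR is treated the same as a build where JFR is simply not active.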
>> JBS - https://bugs.openjdk.java.net/browse/JDK-8243451 >> Webrev - http://cr.openjdk.java.net/~fmatte/8243451/webrev.00/ >> ? Thanks, >> Fairoz >> > From leonid.mesnik at oracle.com Tue Jun 9 19:58:55 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Tue, 9 Jun 2020 12:58:55 -0700 Subject: RFR: 8242891: vmTestbase/nsk/jvmti/ test should be fixed to fail early if JVMTI function return error In-Reply-To: <11314027-4965-b38b-6bc7-5011515b94ab@oracle.com> References: <11314027-4965-b38b-6bc7-5011515b94ab@oracle.com> Message-ID: Hi On 6/9/20 12:34 PM, serguei.spitsyn at oracle.com wrote: > Hi Leonid, > > Thank you for taking care about this! > It looks good in general. > However, I think, a similar return is needed in more cases. > > One example: > > http://cr.openjdk.java.net/~lmesnik/8242891/webrev.00/test/hotspot/jtreg/vmTestbase/nsk/jvmti/Exception/exception001/exception001.cpp.frames.html > > 99 err = jvmti_env->GetMethodDeclaringClass(method, &cls); > 100 if (err != JVMTI_ERROR_NONE) { > 101 printf("(GetMethodDeclaringClass#t) unexpected error: %s (%d)\n", > 102 TranslateError(err), err); > 103 result = STATUS_FAILED; > 104 return; > 105 } > 106 err = jvmti_env->GetClassSignature(cls, &ex.t_cls, &generic); > 107 if (err != JVMTI_ERROR_NONE) { > 108 printf("(GetClassSignature#t) unexpected error: %s (%d)\n", > 109 TranslateError(err), err); > 110 result = STATUS_FAILED; > 111 } > 112 err = jvmti_env->GetMethodName(method, > 113 &ex.t_name, &ex.t_sig, &generic); > 114 if (err != JVMTI_ERROR_NONE) { > 115 printf("(GetMethodName#t) unexpected error: %s (%d)\n", > 116 TranslateError(err), err); > 117 result = STATUS_FAILED; > 118 } > 119 ex.t_loc = location; > 120 err = jvmti_env->GetMethodDeclaringClass(catch_method, &cls); > 121 if (err != JVMTI_ERROR_NONE) { > 122 printf("(GetMethodDeclaringClass#c) unexpected error: %s (%d)\n", > 123 TranslateError(err), err); > 124 result = STATUS_FAILED; > 125 return; > 126 } > 127 err = 
jvmti_env->GetClassSignature(cls, &ex.c_cls, &generic); > 128 if (err != JVMTI_ERROR_NONE) { > 129 printf("(GetClassSignature#c) unexpected error: %s (%d)\n", > 130 TranslateError(err), err); > 131 result = STATUS_FAILED; > 132 } > 133 err = jvmti_env->GetMethodName(catch_method, > 134 &ex.c_name, &ex.c_sig, &generic); > 135 if (err != JVMTI_ERROR_NONE) { > 136 printf("(GetMethodName#c) unexpected error: %s (%d)\n", > 137 TranslateError(err), err); > 138 result = STATUS_FAILED; > 139 } > > In the fragment above you added return for JVMTI > GetMethodDeclaringClass error. > But GetMethodName and GetClassSignature can be also problematic as the > returned names are printed below. > It seems to be more safe and even simpler to add returns for such > cases as well. > Otherwise, the code reader is puzzled why there is a return in one > failure case and there is no such return in another. It is a good question if we want to fix such places or even fails with first JVMTI failure. (I even started to fix it in the such way but find that existing tests usually don't fail always). The difference is that test tries to reuse "cls" in other JVMTI function and going to generate very misleading crash. How it just tries to compare ex and exs values. So test might crash but clearly outside of JVMTI function and with some useful info. So I am not sure if fixing these lines improve test failure handling. Assuming that most of existing tests fails early only if going to re-use possible corrupted data I propose to fix this separately. We need to figure out when to fail or to try to finish. Leonid > > Thanks, > Serguei > > > On 6/1/20 21:33, Leonid Mesnik wrote: >> Hi >> >> Could you please review following fix which stop test execution if >> JVMTI function returns error. The test fails anyway however using >> potentially bad data in JVMTI function might cause misleading crash >> failures. 
The hs_err will contain the stack trace not of the problematic
>> function but of the function called with corrupted data. Most of the tests
>> already have such behavior but not all. Also I fixed a couple of tests
>> to finish if they haven't managed to suspend thread.
>>
>> I've updated only tests which try to use corrupted data in JVMTI
>> functions after errors. I haven't updated tests which just
>> compare/print values from erroring JVMTI functions. The crash in
>> strcmp/println is not so misleading and might point to a real issue.
>>
>> webrev: http://cr.openjdk.java.net/~lmesnik/8242891/webrev.00/
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8242891
>>
>> Leonid
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From christoph.langer at sap.com Tue Jun 9 20:22:54 2020
From: christoph.langer at sap.com (Langer, Christoph)
Date: Tue, 9 Jun 2020 20:22:54 +0000
Subject: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump
In-Reply-To:
References: <0343dfac-61f7-1b1c-ee96-bdee130578ad@oracle.com> <2363c58d-38c1-ae19-ed34-c82af6304780@oracle.com> ,
Message-ID:

Hi Ralf,

I finally managed to fully read through your change. Very nice piece of work. I only found a few minor nits which would be nice if you could address them before pushing. But no need for further webrev.

Here we go:

workgroup.cpp
- update copyright year
L111: little spelling issue: forergound -> foreground

diagnosticCommand.cpp
L509: spelling recommneded -> recommended
L510: Initialization of default value ("1") is not necessary as current implementation wouldn't allow the parameter -gz without value.

heapDumperCompression.hpp and heapDumperCompression.cpp:
License header says: Copyright (c) 2005, 2020, Oracle and/or its affiliates. All rights reserved.
However, it's a net new file, so it should just be 2020. Also, since this is new code, coming from SAP, you should credit SAP in the copyright header (same way as you have done it in the test files).
test/lib/jdk/test/lib/hprof/parser/GzipRandomAccess.java:
L88: new ArrayList<> (diamond operator without type)

Thanks & Best regards
Christoph

> -----Original Message-----
> From: Schmelter, Ralf
> Sent: Monday, 8 June 2020 11:38
> To: Lindenmaier, Goetz ; Langer, Christoph
>
> Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net runtime ;
> David Holmes ; serguei.spitsyn at oracle.com; Ioi
> Lam
> Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap
> dump
>
> Hi Goetz,
>
> > What kind of tests did you run?
>
> The jdk submit repo, the JCK tests (apart from API) and the jtreg tests on
> Windows x86/64, MacOS X, linux on x86/64, ppcle, ppcbe, zarch and aarch64
> and on AIX.
>
> If there aren't any other concerns, I would like to commit this change on
> Wednesday.
>
> Best regards,
> Ralf
>
> -----Original Message-----
> From: Lindenmaier, Goetz
> Sent: Friday, 5 June 2020 18:02
> To: Schmelter, Ralf ; Langer, Christoph
>
> Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net runtime
> Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap
> dump
>
> Hi Ralf,
>
> Thanks for the quick reply and all the fixes.
> The changes to the workgroup are ok.
> Reviewed. (An incremental webrev would have helped.)
>
> What kind of tests did you run?
>
> > Yes, the buffer is now smaller (1M) versus the original (8M). You need
> > to be able to at least allocate one buffer or you get an error (this
> > is handled in the CompressionBackend ctor). You then allocate
> > additional buffers as needed (we want a new buffer, but there is no
> > free one), until we have a buffer for every worker thread or until
> > the allocation of the buffer failed. In this case some threads will
> > be idle, since we cannot have a buffer for each thread.
> Ok, that's what I thought. Thanks for the explanation.
>
> > > Another question.
> > > The basic dumping is done sequentially, right?
The compression > > > is parallel. Is there a tradeoff in # of threads where > > > the compression is faster than writing? > > Yes. The compression and writing are done in parallel. Depending on > > the compression level and the speed of your hard drive, not all > > threads will be active all the time. But since we reuse the GC threads > > this should not matter. And the relatively poor performance of > > deflate() ensures that at least 5 to 10 threads will probably always > > be active ;) > Ok, thanks. > > Best regards, > Goetz. From igor.ignatyev at oracle.com Tue Jun 9 23:47:51 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 9 Jun 2020 16:47:51 -0700 Subject: RFR(S) : 8183040 : update jdk/test/lib/Platform.java to use NIO file API Message-ID: <8CA6F4A9-7B57-4B70-9087-D1ACBF561714@oracle.com> http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 > > 38 lines changed: 8 ins; 16 del; 14 mod; Hi all, could you please review this small clean up of testlibrary classes which updates j.t.lib.Platform and j.t.l.SA.SATestUtils (as it now contains the methods which 8183040 was about) to use NIO file API? testing: test/hotspot/jtreg/serviceability webrev: http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 JBS: https://bugs.openjdk.java.net/browse/JDK-8183040 Thanks, -- Igor From brian.burkhalter at oracle.com Wed Jun 10 00:18:38 2020 From: brian.burkhalter at oracle.com (Brian Burkhalter) Date: Tue, 9 Jun 2020 17:18:38 -0700 Subject: RFR(S) : 8183040 : update jdk/test/lib/Platform.java to use NIO file API In-Reply-To: <8CA6F4A9-7B57-4B70-9087-D1ACBF561714@oracle.com> References: <8CA6F4A9-7B57-4B70-9087-D1ACBF561714@oracle.com> Message-ID: <2AF18884-0407-455C-9BE2-4355D3D9AD6C@oracle.com> Hi Igor, > On Jun 9, 2020, at 4:47 PM, Igor Ignatyev wrote: > > could you please review this small clean up of testlibrary classes which updates j.t.lib.Platform and j.t.l.SA.SATestUtils (as it now contains the methods which 8183040 was about) to use NIO file API?
> > testing: test/hotspot/jtreg/serviceability > webrev: http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 > JBS: https://bugs.openjdk.java.net/browse/JDK-8183040 The NIO changes look all right to me. Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexey.menkov at oracle.com Wed Jun 10 01:36:11 2020 From: alexey.menkov at oracle.com (Alex Menkov) Date: Tue, 9 Jun 2020 18:36:11 -0700 Subject: RFR(S) : 8183040 : update jdk/test/lib/Platform.java to use NIO file API In-Reply-To: <8CA6F4A9-7B57-4B70-9087-D1ACBF561714@oracle.com> References: <8CA6F4A9-7B57-4B70-9087-D1ACBF561714@oracle.com> Message-ID: Hi Igor, In SATestUtils.java you do var bb = ... Files.readAllBytes(...) ... and then use bb[0] if the file has 0 length, old code throws EOFException and new one will throw IndexOutOfBoundsException. And looks like the caller doesn't expect it (it catches IOException). --alex On 06/09/2020 16:47, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 >> >> 38 lines changed: 8 ins; 16 del; 14 mod; > > Hi all, > > could you please review this small clean up of testlibrary classes which updates j.t.lib.Platform and j.t.l.SA.SATestUtils (as it now contains the methods which 8183040 was about) to use NIO file API? > > testing: test/hotspot/jtreg/serviceability > webrev: http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 > JBS: https://bugs.openjdk.java.net/browse/JDK-8183040 > > Thanks, > -- Igor > From serguei.spitsyn at oracle.com Wed Jun 10 01:59:54 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 9 Jun 2020 18:59:54 -0700 Subject: RFR: 8242891: vmTestbase/nsk/jvmti/ test should be fixed to fail early if JVMTI function return error In-Reply-To: References: <11314027-4965-b38b-6bc7-5011515b94ab@oracle.com> Message-ID: An HTML attachment was scrubbed... 
URL: From david.holmes at oracle.com Wed Jun 10 02:30:15 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 10 Jun 2020 12:30:15 +1000 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. In-Reply-To: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> Message-ID: <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> Hi Robbin, On 10/06/2020 2:15 am, Robbin Ehn wrote: > Hi all, > > If the direct handshake is executed by the target thread, the JNI > local(s) are created in that thread but returned in the handshaking > thread. > They thus are not safe to use. (thread might even have exited by this > point) > > Code: > http://cr.openjdk.java.net/~rehn/8247248/v1/webrev/ > > Unfortunately there is no way to distinguish a local jobject from a > global, which makes it hard to track whether the jobject is global or not. I have some comments/concerns that I've added to the bug report. Switching from local refs to global refs adds a bit of overhead and will likely impact the performance here. Not a showstopper but would be nice to avoid if possible as it seems a bit of a kludge to communicate values across threads via global refs. In the old VMOperation solution the VMThread used the JvmtiEnv* of the calling thread so that the local refs were in the right place. Can we do the same with the handshake code? It will mean restoring the "calling thread" argument to the handshake operation but I think this is a workable approach. (The removal of the calling thread argument should have rung alarm bells :( .) Or, could this be a case where we don't want the target thread to execute the handshake and we need a way to mark the handshake operation as such? That's a bigger change of course. Thanks, David ----- > Issue: > https://bugs.openjdk.java.net/browse/JDK-8247248 > > Local testing of JDI/JVMTI and t1-5.
> (no real crash so there is nothing to reproduce) > > Thanks, Robbin From igor.ignatyev at oracle.com Wed Jun 10 03:11:48 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 9 Jun 2020 20:11:48 -0700 Subject: RFR(S) : 8183040 : update jdk/test/lib/Platform.java to use NIO file API In-Reply-To: References: <8CA6F4A9-7B57-4B70-9087-D1ACBF561714@oracle.com> Message-ID: Hi Alex, as far as I can see, the caller just rethrows IOException as RuntimeException, so I don't think throwing IndexOutOfBoundsException would be much different, albeit it will be a bit more cryptic. Yet given that the content of /proc/sys/kernel/yama/ptrace_scope and /sys/fs/selinux/booleans/deny_ptrace is part of the Linux kernel contract, I doubt we will encounter IOOBE in any reasonable setup. However, if you want, I can check the length of the bb arrays at L#171 and L#190 and throw an Error w/ a message suggesting that something went completely wrong. -- Igor > On Jun 9, 2020, at 6:36 PM, Alex Menkov wrote: > > Hi Igor, > > In SATestUtils.java you do > > var bb = ... Files.readAllBytes(...) ... > and then use bb[0] > > if the file has 0 length, old code throws EOFException and new one will throw IndexOutOfBoundsException. > And looks like the caller doesn't expect it (it catches IOException). > > --alex > > On 06/09/2020 16:47, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 >>> >>> 38 lines changed: 8 ins; 16 del; 14 mod; >> Hi all, >> could you please review this small clean up of testlibrary classes which updates j.t.lib.Platform and j.t.l.SA.SATestUtils (as it now contains the methods which 8183040 was about) to use NIO file API?
>> testing: test/hotspot/jtreg/serviceability >> webrev: http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 >> JBS: https://bugs.openjdk.java.net/browse/JDK-8183040 >> Thanks, >> -- Igor From igor.ignatyev at oracle.com Wed Jun 10 03:17:20 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 9 Jun 2020 20:17:20 -0700 Subject: RFR(S) : 8183040 : update jdk/test/lib/Platform.java to use NIO file API In-Reply-To: <2AF18884-0407-455C-9BE2-4355D3D9AD6C@oracle.com> References: <8CA6F4A9-7B57-4B70-9087-D1ACBF561714@oracle.com> <2AF18884-0407-455C-9BE2-4355D3D9AD6C@oracle.com> Message-ID: Hi Brian, thank you for your review. Cheers, -- Igor > On Jun 9, 2020, at 5:18 PM, Brian Burkhalter wrote: > > Hi Igor, > >> On Jun 9, 2020, at 4:47 PM, Igor Ignatyev > wrote: >> >> could you please review this small clean up of testlibrary classes which updates j.t.lib.Platform and j.t.l.SA.SATestUtils (as it now contains the methods which 8183040 was about) to use NIO file API? >> >> testing: test/hotspot/jtreg/serviceability >> webrev: http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 >> JBS: https://bugs.openjdk.java.net/browse/JDK-8183040 > The NIO changes look all right to me. > > Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Wed Jun 10 05:08:13 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 9 Jun 2020 22:08:13 -0700 Subject: Retroactive CSR review request(XS): 8246811: Update JDWP, JDI and Instrumentation specs for Record attribute Message-ID: <31daadd5-d803-fe81-0629-0e78b3013172@oracle.com> An HTML attachment was scrubbed... 
URL: From david.holmes at oracle.com Wed Jun 10 05:29:46 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 10 Jun 2020 15:29:46 +1000 Subject: Retroactive CSR review request(XS): 8246811: Update JDWP, JDI and Instrumentation specs for Record attribute In-Reply-To: <31daadd5-d803-fe81-0629-0e78b3013172@oracle.com> References: <31daadd5-d803-fe81-0629-0e78b3013172@oracle.com> Message-ID: <88fb9235-2e79-0fc4-cec5-34d5fda78476@oracle.com> Hi Serguei, I've added my review as well. The request can be Finalized. Thanks, David On 10/06/2020 3:08 pm, serguei.spitsyn at oracle.com wrote: > Please review a retroactive CSR for the fix integrated to 14: > https://bugs.openjdk.java.net/browse/JDK-8235360 > > CSR: > https://bugs.openjdk.java.net/browse/JDK-8246811 > > > Summary: > It is a formal public request. > The CSR was already reviewed by Harold but other reviews are welcome. > The update is to add a clarification to the JDI, JDWP and Instrumentation > specs that the class file attribute |Record|, introduced in JDK 14, cannot be > changed in a class redefinition or retransformation. > > Thanks, > Serguei From serguei.spitsyn at oracle.com Wed Jun 10 05:30:42 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 9 Jun 2020 22:30:42 -0700 Subject: Retroactive CSR review request(XS): 8246811: Update JDWP, JDI and Instrumentation specs for Record attribute In-Reply-To: <88fb9235-2e79-0fc4-cec5-34d5fda78476@oracle.com> References: <31daadd5-d803-fe81-0629-0e78b3013172@oracle.com> <88fb9235-2e79-0fc4-cec5-34d5fda78476@oracle.com> Message-ID: <14426245-6bd3-921b-0877-1bfb5358e1af@oracle.com> Thanks, David! Serguei On 6/9/20 22:29, David Holmes wrote: > Hi Serguei, > > I've added my review as well. The request can be Finalized.
> > Thanks, > David > > On 10/06/2020 3:08 pm, serguei.spitsyn at oracle.com wrote: >> Please review a retroactive CSR for the fix integrated to 14: >> https://bugs.openjdk.java.net/browse/JDK-8235360 >> >> CSR: >> https://bugs.openjdk.java.net/browse/JDK-8246811 >> >> >> Summary: >> It is a formal public request. >> The CSR was already reviewed by Harold but other reviews are welcome. >> The update is to add a clarification to the JDI, JDWP and >> Instrumentation >> specs that the class file attribute |Record|, introduced in JDK 14, >> cannot be >> changed in a class redefinition or retransformation. >> >> Thanks, >> Serguei From fairoz.matte at oracle.com Wed Jun 10 06:00:23 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Tue, 9 Jun 2020 23:00:23 -0700 (PDT) Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: <5c50b83d-964a-d74d-d7d7-77d7b348d533@oracle.com> References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com> <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com> <5b83a8b3-f877-4f72-9188-97e3de6f986b@default> <18e88a44-8d62-8752-834c-dce436e8ebc4@oracle.com> <5adafff1-0da9-8bf2-4b27-5b00b5c48526@oracle.com> <42fdf66e-b097-47e4-9062-391e9b43968c@default> <85de2bb0-94a0-5dbd-f29f-0a9c96f12579@oracle.com> <5c50b83d-964a-d74d-d7d7-77d7b348d533@oracle.com> Message-ID: <3fe586cf-d485-4738-a432-92d3d9aa52da@default> Hi Leonid, The call isJFRActive() needs to be executed on the HeapwalkingDebuggee side. This is my understanding.
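For readers following the thread, here is a minimal sketch of the debuggee-side half of the approach discussed in this thread: the debuggee evaluates the JFR state itself, by reflection so that it still works when the jdk.jfr module is absent, and stores the result in a static field that a debugger could then read through JDI. The class name JfrProbe and the helper staticBooleanOrFalse are hypothetical and are not part of the webrev; only jdk.jfr.FlightRecorder.isInitialized() and the field name isJFRActive are taken from the thread.

```java
import java.lang.reflect.Method;

// Hypothetical debuggee-side helper; not the code from the webrev.
public class JfrProbe {
    // Field name borrowed from the thread; a debugger would read it via JDI.
    public static volatile boolean isJFRActive;

    // Invokes a public static no-arg boolean method by name.
    // Returns false if the class or method is missing or the call fails.
    public static boolean staticBooleanOrFalse(String className, String methodName) {
        try {
            Class<?> clazz = Class.forName(className);
            Method method = clazz.getMethod(methodName);
            return (boolean) method.invoke(null);
        } catch (ReflectiveOperationException | LinkageError | RuntimeException e) {
            return false;
        }
    }

    // Reflection keeps this usable on builds without the jdk.jfr module.
    public static boolean isJfrInitialized() {
        return staticBooleanOrFalse("jdk.jfr.FlightRecorder", "isInitialized");
    }

    public static void main(String[] args) {
        // Evaluate inside the debuggee process and publish the result in a field.
        isJFRActive = isJfrInitialized();
        System.out.println("isJFRActive = " + isJFRActive);
    }
}
```

Evaluating the flag in the debuggee and only reading it from the debugger avoids relying on both processes having been started with the same options.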
Thanks, Fairoz > -----Original Message----- > From: Leonid Mesnik > Sent: Wednesday, June 10, 2020 1:16 AM > To: Serguei Spitsyn ; Fairoz Matte > ; Erik Gahlin > Cc: serviceability-dev at openjdk.java.net > Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect > and corresponsing logic seems to be broken > > http://cr.openjdk.java.net/~fmatte/8243451/webrev.09/test/hotspot/jtreg/vmTestbase/nsk/share/jdi/TestDebuggerType2.java.udiff.html > > I see that isJFRActive() depends on "nsk.share.jdi.HeapwalkingDebuggee". > It is not going to work if the debuggee is not "nsk.share.jdi.HeapwalkingDebuggee". > > Shouldn't it be placed in HeapWalkingDebugger? > > Leonid > > On 6/8/20 9:26 PM, serguei.spitsyn at oracle.com wrote: > > Hi Fairoz, > > > > LGTM. > > > > Thanks, > > Serguei > > > > > > On 6/8/20 21:20, Fairoz Matte wrote: > >> Hi Serguei, > >> > >> Thanks for the clarifications, > >> I have incorporated the 2nd suggestion, below is the webrev, > >> http://cr.openjdk.java.net/~fmatte/8243451/webrev.09/ > >> > >> Thanks, > >> Fairoz > >> > >> From: Serguei Spitsyn > >> Sent: Monday, June 8, 2020 10:34 PM > >> To: Fairoz Matte ; Erik Gahlin > >> > >> Cc: serviceability-dev at openjdk.java.net > >> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is > >> incorrect and corresponsing logic seems to be broken > >> > >> Hi Fairoz, > >> > >> > >> On 6/8/20 02:08, serguei.spitsyn at oracle.com wrote: > >> Hi Fairoz, > >> > >> There are two different isJFRActive() methods, one is on the debuggee > >> side and another on the debugger side. > >> The one on the debuggee side is better to keep in Debuggee.java (where it > >> was before) instead of moving it to HeapwalkingDebuggee.java. > >> It is okay to keep the call to it in the HeapwalkingDebuggee.java. > >> > >> Please, skip this suggestion as Debugger.java is not one of the supers of > >> HeapwalkingDebuggee.java as I've assumed. > >> > >> Thanks, > >> Serguei > >> > >> > >> +
protected boolean isJFRActive() { > >> +        boolean isJFRActive = false; > >> +        ReferenceType referenceType = > >> debuggee.classByName("nsk.share.jdi.HeapwalkingDebuggee"); > >> +        if (referenceType == null) > >> +            throw new RuntimeException("Debugeee is not initialized > >> yet"); > >> + > >> +        Field isJFRActiveFld = > >> referenceType.fieldByName("isJFRActive"); > >> +        isJFRActive = > >> ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value(); > >> +        return isJFRActive; > >>      } > >> It is better to remove the line: > >> +        boolean isJFRActive = false; > >> and just change this one: > >> +        boolean isJFRActive = > >> ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value(); > >> > >> Otherwise, it looks good to me. > >> I hope it really works now. > >> > >> Thanks, > >> Serguei > >> > >> On 6/8/20 00:26, Fairoz Matte wrote: > >> Hi Serguei, Erik, > >> Thanks for the reviews, > >> Below webrev contains the suggested changes, > >> http://cr.openjdk.java.net/~fmatte/8243451/webrev.08/ > >> The only thing I couldn't do is to keep the local copy of > >> isJFRActive() in HeapwalkingDebugger. The method is called in debuggee > >> code. > >> In the debugger, we have access to the debuggee before the test starts or after > >> the test completes. > >> The isJFRActive() method needs to be executed during the test execution. > >> Hence I didn't find a place to initialize it and cannot make a local copy. > >> Thanks, > >> Fairoz > >> From: Serguei Spitsyn > >> Sent: Tuesday, June 2, 2020 7:57 AM > >> To: Fairoz Matte fairoz.matte at oracle.com; Erik Gahlin > >> erik.gahlin at oracle.com > >> Cc: serviceability-dev at openjdk.java.net > >> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is > >> incorrect and corresponsing logic seems to be broken > >> On 6/1/20 12:30, serguei.spitsyn at oracle.com wrote: > >> Hi Fairoz, > >> > >> It looks okay in general.
> >> But I'm not sure this check is going to work. > >> The problem is that the HeapwalkingDebuggee.useStrictCheck method is > >> invoked in the context of the HeapwalkingDebugger process, not the > >> HeapwalkingDebuggee process. > >> > >> Probably, you wanted to get this bit of information from the Debuggee > >> process. > >> The debuggee has to evaluate it itself and store it in some field. > >> The debugger should use the JDI to get this value from the debuggee. > >> > >> Thanks, > >> Serguei > >> > >> I'm not sure what exactly you wanted to do here. > >> It can occasionally work for you as long as both processes are run > >> with the same options. > >> > >> Thanks, > >> Serguei > >> > >> > >> On 6/1/20 08:52, Fairoz Matte wrote: > >> Hi Erik, > >> Thanks for the review, below is the updated webrev. > >> http://cr.openjdk.java.net/~fmatte/8243451/webrev.02/ > >> Thanks, > >> Fairoz > >> -----Original Message----- > >> From: Erik Gahlin > >> Sent: Monday, June 1, 2020 4:26 PM > >> To: Fairoz Matte fairoz.matte at oracle.com > >> Cc: serviceability-dev at openjdk.java.net > >> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is > >> incorrect and corresponsing logic seems to be broken > >> Hi Fairoz, > >> What I think you need to do is something like this: > >>         if (className.equals("java.lang.Thread")) { > >>             return !isJfrInitialized(); > >>         } > >> ... > >>     private static boolean isJfrInitialized() { > >>         try { > >>             Class clazz = > >> Class.forName("jdk.jfr.FlightRecorder"); > >>             Method method = > >> clazz.getDeclaredMethod("isInitialized", > >> new Class[0]); > >>             return (boolean) method.invoke(null, new Object[0]); > >>         } catch (Exception e) { > >>             return false; > >>         } > >>     } > >> Erik > >> On 2020-06-01 12:30, Fairoz Matte wrote: > >> Hi Erik, > >>
Thanks for your quick response, > >> Below is the updated webrev to handle the case where the jfr module is not present > >> http://cr.openjdk.java.net/~fmatte/8243451/webrev.01/ > >> Thanks, > >> Fairoz > >> -----Original Message----- > >> From: Erik Gahlin > >> Sent: Monday, June 1, 2020 2:31 PM > >> To: Fairoz Matte fairoz.matte at oracle.com > >> Cc: serviceability-dev at openjdk.java.net > >> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is > >> incorrect and corresponsing logic seems to be broken > >> Hi Fairoz, > >> If the test needs to run with builds where the JFR module is not > >> present(?), you need to do the check using reflection. > >> If not, looks good. > >> Erik > >> On 1 Jun 2020, at 10:27, Fairoz Matte > >> fairoz.matte at oracle.com wrote: > >> Hi, > >> Please review this small test infra change to identify at runtime > >> whether JFR is active or not. > >> JBS - https://bugs.openjdk.java.net/browse/JDK-8243451 > >> Webrev - http://cr.openjdk.java.net/~fmatte/8243451/webrev.00/ > >>
Thanks, > >> Fairoz > >> > > From serguei.spitsyn at oracle.com Wed Jun 10 06:11:36 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 9 Jun 2020 23:11:36 -0700 Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: <3fe586cf-d485-4738-a432-92d3d9aa52da@default> References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com> <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com> <5b83a8b3-f877-4f72-9188-97e3de6f986b@default> <18e88a44-8d62-8752-834c-dce436e8ebc4@oracle.com> <5adafff1-0da9-8bf2-4b27-5b00b5c48526@oracle.com> <42fdf66e-b097-47e4-9062-391e9b43968c@default> <85de2bb0-94a0-5dbd-f29f-0a9c96f12579@oracle.com> <5c50b83d-964a-d74d-d7d7-77d7b348d533@oracle.com> <3fe586cf-d485-4738-a432-92d3d9aa52da@default> Message-ID: <7aceddb3-1984-32b8-d0ae-8054a59dbcee@oracle.com> Hi Fairoz, It is confusing that there are methods with the same name, isJFRActive, on both the debuggee and the debugger side. Leonid is talking about the isJFRActive that belongs to the debugger. He suggests moving this method from TestDebuggerType2 to HeapWalkingDebugger. The reason is that HeapWalkingDebugger should have knowledge of HeapWalkingDebuggee, not its super class TestDebuggerType2. It looks like a good suggestion to me. Thanks, Serguei On 6/9/20 23:00, Fairoz Matte wrote: > Hi Leonid, > > The call isJFRActive() needs to be executed on the HeapwalkingDebuggee side. > This is my understanding.
Thanks, >>>> Fairoz >>>> From fairoz.matte at oracle.com Wed Jun 10 06:35:53 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Tue, 9 Jun 2020 23:35:53 -0700 (PDT) Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: <7aceddb3-1984-32b8-d0ae-8054a59dbcee@oracle.com> References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com> <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com> <5b83a8b3-f877-4f72-9188-97e3de6f986b@default> <18e88a44-8d62-8752-834c-dce436e8ebc4@oracle.com> <5adafff1-0da9-8bf2-4b27-5b00b5c48526@oracle.com> <42fdf66e-b097-47e4-9062-391e9b43968c@default> <85de2bb0-94a0-5dbd-f29f-0a9c96f12579@oracle.com> <5c50b83d-964a-d74d-d7d7-77d7b348d533@oracle.com> <3fe586cf-d485-4738-a432-92d3d9aa52da@default> <7aceddb3-1984-32b8-d0ae-8054a59dbcee@oracle.com> Message-ID: Hi Serguei, Thanks for the clarification. I will work on moving the isJFRActive() method from TestDebuggerType2 to HeapWalkingDebugger. Thanks, Fairoz > -----Original Message----- > From: Serguei Spitsyn > Sent: Wednesday, June 10, 2020 11:42 AM > To: Fairoz Matte ; Leonid Mesnik > ; Erik Gahlin > Cc: serviceability-dev at openjdk.java.net > Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect > and corresponsing logic seems to be broken > > Hi Fairoz, > > It is confusing that there are methods with the same name, isJFRActive, on both > the debuggee and the debugger side. > Leonid is talking about the isJFRActive that belongs to the debugger. > He suggests moving this method from TestDebuggerType2 to > HeapWalkingDebugger. > The reason is that HeapWalkingDebugger should have knowledge of > HeapWalkingDebuggee, not its super class TestDebuggerType2. > It looks like a good suggestion to me.
> > Thanks, > Serguei > > > On 6/9/20 23:00, Fairoz Matte wrote: > > Hi Leonid, > > > > The call isJFRActive() needs to be executed on the HeapwalkingDebuggee side. > > This is my understanding. > > > > Thanks, > > Fairoz > > > >> -----Original Message----- > >> From: Leonid Mesnik > >> Sent: Wednesday, June 10, 2020 1:16 AM > >> To: Serguei Spitsyn ; Fairoz Matte > >> ; Erik Gahlin > >> Cc: serviceability-dev at openjdk.java.net > >> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is > >> incorrect and corresponsing logic seems to be broken > >> > >> http://cr.openjdk.java.net/~fmatte/8243451/webrev.09/test/hotspot/jtreg/vmTestbase/nsk/share/jdi/TestDebuggerType2.java.udiff.html > >> > >> I see that isJFRActive() depends on "nsk.share.jdi.HeapwalkingDebuggee". > >> It is not going to work if the debuggee is not > "nsk.share.jdi.HeapwalkingDebuggee". > >> > >> Shouldn't it be placed in HeapWalkingDebugger? > >> > >> Leonid > >> > >> On 6/8/20 9:26 PM, serguei.spitsyn at oracle.com wrote: > >>> Hi Fairoz, > >>> > >>> LGTM. > >>> > >>> Thanks, > >>> Serguei > >>> > >>> > >>> On 6/8/20 21:20, Fairoz Matte wrote: > >>>> Hi Serguei, > >>>> > >>>> Thanks for the clarifications, > >>>> I have incorporated the 2nd suggestion, below is the webrev, > >>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.09/ > >>>> > >>>> Thanks, > >>>> Fairoz > >>>> > >>>> From: Serguei Spitsyn > >>>> Sent: Monday, June 8, 2020 10:34 PM > >>>> To: Fairoz Matte ; Erik Gahlin > >>>> > >>>> Cc: serviceability-dev at openjdk.java.net > >>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() > >>>> is incorrect and corresponsing logic seems to be broken > >>>> > >>>> Hi Fairoz, > >>>> > >>>> > >>>> On 6/8/20 02:08, serguei.spitsyn at oracle.com wrote: > >>>> Hi Fairoz, > >>>> > >>>> There are two different isJFRActive() methods, one is on the debuggee > >>>> side and another on the debugger side.
> >>>> The one on debuggee side is better to keep in Debuggee.java (where
> >>>> it was before) instead of moving it to HeapwalkingDebuggee.java.
> >>>> It is okay to keep the call to it in the HeapwalkingDebuggee.java.
> >>>>
> >>>> Please, skip this suggestion as Debugger.java is not one of supers
> >>>> of HeapwalkingDebuggee.java as I've assumed.
> >>>>
> >>>> Thanks,
> >>>> Serguei
> >>>>
> >>>>
> >>>> +    protected boolean isJFRActive() {
> >>>> +        boolean isJFRActive = false;
> >>>> +        ReferenceType referenceType = debuggee.classByName("nsk.share.jdi.HeapwalkingDebuggee");
> >>>> +        if (referenceType == null)
> >>>> +           throw new RuntimeException("Debugeee is not initialized yet");
> >>>> +
> >>>> +        Field isJFRActiveFld = referenceType.fieldByName("isJFRActive");
> >>>> +        isJFRActive = ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value();
> >>>> +        return isJFRActive;
> >>>>       }
> >>>> It is better to remove the line:
> >>>> +        boolean isJFRActive = false;
> >>>> and just change this one:
> >>>> +        boolean isJFRActive = ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value();
> >>>>
> >>>> Otherwise, it looks good to me.
> >>>> I hope, it really works now.
> >>>>
> >>>> Thanks,
> >>>> Serguei
> >>>>
> >>>> On 6/8/20 00:26, Fairoz Matte wrote:
> >>>> Hi Serguei, Erik,
> >>>>
> >>>> Thanks for the reviews,
> >>>> Below webrev contains the suggested changes,
> >>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.08/
> >>>>
> >>>> The only thing I couldn't do is to keep the local copy of
> >>>> isJFRActive() in HeapwalkingDebugger, The method is called in
> >>>> debugee code.
> >>>> In debugger, we have access to debugee before test started or after
> >>>> test completes.
> >>>> isJFRActive() method need to be executed during the test execution.
> >>>> Hence I didn't find place to initialize and cannot make local copy.
> >>>>
> >>>> Thanks,
> >>>> Fairoz
> >>>>
> >>>> From: Serguei Spitsyn
> >>>> Sent: Tuesday, June 2, 2020 7:57 AM
> >>>> To: Fairoz Matte mailto:fairoz.matte at oracle.com; Erik Gahlin
> >>>> mailto:erik.gahlin at oracle.com
> >>>> Cc: mailto:serviceability-dev at openjdk.java.net
> >>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active()
> >>>> is incorrect and corresponsing logic seems to be broken
> >>>>
> >>>> On 6/1/20 12:30, mailto:serguei.spitsyn at oracle.com wrote:
> >>>> Hi Fairoz,
> >>>>
> >>>> It looks okay in general.
> >>>> But I'm not sure this check is going to work.
> >>>> The problem is the HeapwalkingDebuggee.useStrictCheck method is
> >>>> invoked in the context of the HeapwalkingDebugger process, not the
> >>>> HeapwalkingDebuggee process.
> >>>>
> >>>> Probably, you wanted to get this bit of information from the
> >>>> Debuggee process.
> >>>> The debuggee has to evaluate it itself and store in some field.
> >>>> The debugger should use the JDI to get this value from the debuggee.
> >>>>
> >>>> Thanks,
> >>>> Serguei
> >>>>
> >>>> I'm not sure, what exactly you wanted to do here.
> >>>> It can occasionally work for you as long as both processes are run
> >>>> with the same options.
> >>>>
> >>>> Thanks,
> >>>> Serguei
> >>>>
> >>>>
> >>>> On 6/1/20 08:52, Fairoz Matte wrote:
> >>>> Hi Erik,
> >>>>
> >>>> Thanks for the review, below is the updated webrev.
> >>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.02/
> >>>>
> >>>> Thanks,
> >>>> Fairoz
> >>>>
> >>>> -----Original Message-----
> >>>> From: Erik Gahlin
> >>>> Sent: Monday, June 1, 2020 4:26 PM
> >>>> To: Fairoz Matte mailto:fairoz.matte at oracle.com
> >>>> Cc: mailto:serviceability-dev at openjdk.java.net
> >>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active()
> >>>> is incorrect and corresponsing logic seems to be broken
> >>>>
> >>>> Hi Fairoz,
> >>>>
> >>>> What I think you need to do is something like this:
> >>>>
> >>>>         if (className.equals("java.lang.Thread")) {
> >>>>             return !isJfrInitialized();
> >>>>         }
> >>>> ...
> >>>>
> >>>>     private static boolean isJfrInitialized() {
> >>>>         try {
> >>>>             Class clazz = Class.forName("jdk.jfr.FlightRecorder");
> >>>>             Method method = clazz.getDeclaredMethod("isInitialized", new Class[0]);
> >>>>             return (boolean) method.invoke(null, new Object[0]);
> >>>>         } catch (Exception e) {
> >>>>             return false;
> >>>>         }
> >>>>     }
> >>>>
> >>>> Erik
> >>>>
> >>>> On 2020-06-01 12:30, Fairoz Matte wrote:
> >>>> Hi Erik,
> >>>>
> >>>> Thanks for your quick response,
> >>>> Below is the updated webrev to handle if jfr module is not present
> >>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.01/
> >>>>
> >>>> Thanks,
> >>>> Fairoz
> >>>>
> >>>> -----Original Message-----
> >>>> From: Erik Gahlin
> >>>> Sent: Monday, June 1, 2020 2:31 PM
> >>>> To: Fairoz Matte mailto:fairoz.matte at oracle.com
> >>>> Cc: mailto:serviceability-dev at openjdk.java.net
> >>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active()
> >>>> is incorrect and corresponsing logic seems to be broken
> >>>>
> >>>> Hi Fairoz,
> >>>>
> >>>> If the test needs to run with builds where the JFR module is not
> >>>> present(?), you need to do the check using reflection.
> >>>>
> >>>> If not, looks good.
> >>>>
> >>>> Erik
> >>>>
> >>>> On 1 Jun 2020, at 10:27, Fairoz Matte
> >>>> mailto:fairoz.matte at oracle.com wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> Please review this small test infra change to identify at
> >>>> runtime the JFR is active or not.
> >>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8243451
> >>>> Webrev - http://cr.openjdk.java.net/~fmatte/8243451/webrev.00/
> >>>>
> >>>> Thanks,
> >>>> Fairoz
> >>>> >

From ralf.schmelter at sap.com Wed Jun 10 06:55:21 2020
From: ralf.schmelter at sap.com (Schmelter, Ralf)
Date: Wed, 10 Jun 2020 06:55:21 +0000
Subject: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump
In-Reply-To:
References: <0343dfac-61f7-1b1c-ee96-bdee130578ad@oracle.com> <2363c58d-38c1-ae19-ed34-c82af6304780@oracle.com> ,
Message-ID:

Hi Christoph,

thanks for your review. I've incorporated your changes. I will run the relevant tests again and if no problems show up, I will submit the change later this day.

Best regards,
Ralf

-----Original Message-----
From: Langer, Christoph
Sent: Tuesday, 9 June 2020 22:23
To: Schmelter, Ralf
Cc: serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net runtime ; David Holmes ; serguei.spitsyn at oracle.com; Ioi Lam ; Lindenmaier, Goetz
Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Hi Ralf,

I finally managed to fully read through your change. Very nice piece of work.

I only found a few minor nits which would be nice if you could address them before pushing. But no need for further webrev.

Here we go:

workgroup.cpp
- update copyright year
L111: little spelling issue: forergound -> foreground

diagnosticCommand.cpp
L509: spelling recommneded -> recommended
L510: Initialization of default value ("1") is not necessary as current implementation wouldn't allow the parameter -gz without value.

heapDumperCompression.hpp and heapDumperCompression.cpp:
License header says: Copyright (c) 2005, 2020, Oracle and/or its affiliates. All rights reserved. However, it's a net new file, so it should just be 2020,
Also, since this is new code, coming from SAP, you should credit SAP in the copyright header (same way as you have done it in the test files).
test/lib/jdk/test/lib/hprof/parser/GzipRandomAccess.java: L88: new ArrayList<> (diamond operator without type) Thanks & Best regards Christoph From serguei.spitsyn at oracle.com Wed Jun 10 07:28:49 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 10 Jun 2020 00:28:49 -0700 Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com> <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com> <5b83a8b3-f877-4f72-9188-97e3de6f986b@default> <18e88a44-8d62-8752-834c-dce436e8ebc4@oracle.com> <5adafff1-0da9-8bf2-4b27-5b00b5c48526@oracle.com> <42fdf66e-b097-47e4-9062-391e9b43968c@default> <85de2bb0-94a0-5dbd-f29f-0a9c96f12579@oracle.com> <5c50b83d-964a-d74d-d7d7-77d7b348d533@oracle.com> <3fe586cf-d485-4738-a432-92d3d9aa52da@default> <7aceddb3-1984-32b8-d0ae-8054a59dbcee@oracle.com> Message-ID: <073df252-52d4-3c1b-ccfd-82fae69e363c@oracle.com> On 6/9/20 23:35, Fairoz Matte wrote: > Hi Serguei, > > Thanks for the clarification. > I will work on to move isJFRActive () method from the TestDebuggerType2 to HeapWalkingDebugger Probably, there is no need in another webrev if you move it. But you did not get a final thumbs up from Leonid yet. Thanks, Serguei > Thanks, > Fairoz > >> -----Original Message----- >> From: Serguei Spitsyn >> Sent: Wednesday, June 10, 2020 11:42 AM >> To: Fairoz Matte ; Leonid Mesnik >> ; Erik Gahlin >> Cc: serviceability-dev at openjdk.java.net >> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect >> and corresponsing logic seems to be broken >> >> Hi Fairoz, >> >> It is confusing there is methods with the same name isJFRActive on both >> debuggee and debugger side. >> Leonid is talking about the isJFRActive that belongs to the debugger. 
>> He suggests to move this method from the TestDebuggerType2 to >> HeapWalkingDebugger. >> The reason is the HeapWalkingDebugger should have a knowledge about the >> HeapWalkingDebuggee, not its super class TestDebuggerType2. >> It looks like a good suggestion to me. >> >> Thanks, >> Serguei >> >> >> On 6/9/20 23:00, Fairoz Matte wrote: >>> Hi Leonid, >>> >>> The call isJFRActive() need to be executed on HeapwalkingDebuggee side. >>> This is what my understanding is. >>> >>> Thanks, >>> Fairoz >>> >>>> -----Original Message----- >>>> From: Leonid Mesnik >>>> Sent: Wednesday, June 10, 2020 1:16 AM >>>> To: Serguei Spitsyn ; Fairoz Matte >>>> ; Erik Gahlin >>>> Cc: serviceability-dev at openjdk.java.net >>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is >>>> incorrect and corresponsing logic seems to be broken >>>> >>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.09/test/hotspot/jtr >>>> eg/vmT estbase/nsk/share/jdi/TestDebuggerType2.java.udiff.html >>>> >>>> I see that isJFRActive() depends on "nsk.share.jdi.HeapwalkingDebuggee". >>>> It is not going to work of debugee is not >> "nsk.share.jdi.HeapwalkingDebuggee". >>>> Shouldn't it be placed in HeapWalkingDebugger? >>>> >>>> Leonid >>>> >>>> On 6/8/20 9:26 PM, serguei.spitsyn at oracle.com wrote: >>>>> Hi Fairoz, >>>>> >>>>> LGTM. 
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>> On 6/8/20 21:20, Fairoz Matte wrote:
>>>>>> Hi Serguei,
>>>>>>
>>>>>> Thanks for the clarifications,
>>>>>> I have incorporated the 2nd suggestion, below is the webrev,
>>>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.09/
>>>>>>
>>>>>> Thanks,
>>>>>> Fairoz
>>>>>>
>>>>>> From: Serguei Spitsyn
>>>>>> Sent: Monday, June 8, 2020 10:34 PM
>>>>>> To: Fairoz Matte ; Erik Gahlin
>>>>>>
>>>>>> Cc: serviceability-dev at openjdk.java.net
>>>>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active()
>>>>>> is incorrect and corresponsing logic seems to be broken
>>>>>>
>>>>>> Hi Fairoz,
>>>>>>
>>>>>>
>>>>>> On 6/8/20 02:08, mailto:serguei.spitsyn at oracle.com wrote:
>>>>>> Hi Fairoz,
>>>>>>
>>>>>> There are two different isJFRActive() methods, one is on debuggee
>>>>>> side and another on the debugger side.
>>>>>> The one on debuggee side is better to keep in Debuggee.java (where
>>>>>> it was before) instead of moving it to HeapwalkingDebuggee.java.
>>>>>> It is okay to keep the call to it in the HeapwalkingDebuggee.java.
>>>>>>
>>>>>> Please, skip this suggestion as Debugger.java is not one of supers
>>>>>> of HeapwalkingDebuggee.java as I've assumed.
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>>
>>>>>> +    protected boolean isJFRActive() {
>>>>>> +        boolean isJFRActive = false;
>>>>>> +        ReferenceType referenceType = debuggee.classByName("nsk.share.jdi.HeapwalkingDebuggee");
>>>>>> +        if (referenceType == null)
>>>>>> +           throw new RuntimeException("Debugeee is not initialized yet");
>>>>>> +
>>>>>> +        Field isJFRActiveFld = referenceType.fieldByName("isJFRActive");
>>>>>> +        isJFRActive = ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value();
>>>>>> +        return isJFRActive;
>>>>>>       }
>>>>>> It is better to remove the line:
>>>>>> +        boolean isJFRActive = false;
>>>>>> and just change this one:
>>>>>> +        boolean isJFRActive = ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value();
>>>>>>
>>>>>> Otherwise, it looks good to me.
>>>>>> I hope, it really works now.
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>> On 6/8/20 00:26, Fairoz Matte wrote:
>>>>>> Hi Serguei, Erik,
>>>>>>
>>>>>> Thanks for the reviews,
>>>>>> Below webrev contains the suggested changes,
>>>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.08/
>>>>>>
>>>>>> The only thing I couldn't do is to keep the local copy of
>>>>>> isJFRActive() in HeapwalkingDebugger, The method is called in
>>>>>> debugee code.
>>>>>> In debugger, we have access to debugee before test started or after
>>>>>> test completes.
>>>>>> isJFRActive() method need to be executed during the test execution.
>>>>>> Hence I didn't find place to initialize and cannot make local copy.
>>>>>>
>>>>>> Thanks,
>>>>>> Fairoz
>>>>>>
>>>>>> From: Serguei Spitsyn
>>>>>> Sent: Tuesday, June 2, 2020 7:57 AM
>>>>>> To: Fairoz Matte mailto:fairoz.matte at oracle.com; Erik Gahlin
>>>>>> mailto:erik.gahlin at oracle.com
>>>>>> Cc: mailto:serviceability-dev at openjdk.java.net
>>>>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active()
>>>>>> is incorrect and corresponsing logic seems to be broken
>>>>>>
>>>>>> On 6/1/20 12:30, mailto:serguei.spitsyn at oracle.com wrote:
>>>>>> Hi Fairoz,
>>>>>>
>>>>>> It looks okay in general.
>>>>>> But I'm not sure this check is going to work.
>>>>>> The problem is the HeapwalkingDebuggee.useStrictCheck method is
>>>>>> invoked in the context of the HeapwalkingDebugger process, not the
>>>>>> HeapwalkingDebuggee process.
>>>>>>
>>>>>> Probably, you wanted to get this bit of information from the
>>>>>> Debuggee process.
>>>>>> The debuggee has to evaluate it itself and store in some field.
>>>>>> The debugger should use the JDI to get this value from the debuggee.
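[Editorial sketch, not part of the original thread] The approach Serguei describes, where the debuggee stores the flag in a field and the debugger reads it over JDI, might look roughly like the following. The class name `DebuggeeFlagReader` and the method `readStaticBoolean` are hypothetical and chosen only for illustration; the JDI calls used (`ReferenceType.fieldByName`, `ReferenceType.getValue`, `BooleanValue.value`) are the same ones that appear in the quoted webrev fragment. How the `ReferenceType` is obtained from a running debuggee is assumed to be handled elsewhere by the test library.

```java
import com.sun.jdi.BooleanValue;
import com.sun.jdi.Field;
import com.sun.jdi.ReferenceType;
import com.sun.jdi.Value;

// Hypothetical helper: reads a static boolean field of a debuggee class via JDI.
public final class DebuggeeFlagReader {
    private DebuggeeFlagReader() {}

    static boolean readStaticBoolean(ReferenceType type, String fieldName) {
        if (type == null) {
            // The debuggee class has not been loaded (or looked up) yet.
            throw new IllegalStateException("Debuggee class is not available yet");
        }
        // fieldByName returns null when no visible field has that name.
        Field field = type.fieldByName(fieldName);
        if (field == null) {
            throw new IllegalStateException("No such field: " + fieldName);
        }
        // For a static field, getValue mirrors the value currently stored in the debuggee.
        Value value = type.getValue(field);
        if (!(value instanceof BooleanValue)) {
            throw new IllegalStateException("Field is not a boolean: " + fieldName);
        }
        return ((BooleanValue) value).value();
    }
}
```

With this shape the debuggee decides for itself whether JFR is active and publishes the answer in a static field, so the result no longer depends on both processes being started with the same options.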
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>> I'm not sure, what exactly you wanted to do here.
>>>>>> It can occasionally work for you as long as both processes are run
>>>>>> with the same options.
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>>
>>>>>> On 6/1/20 08:52, Fairoz Matte wrote:
>>>>>> Hi Erik,
>>>>>>
>>>>>> Thanks for the review, below is the updated webrev.
>>>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.02/
>>>>>>
>>>>>> Thanks,
>>>>>> Fairoz
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Erik Gahlin
>>>>>> Sent: Monday, June 1, 2020 4:26 PM
>>>>>> To: Fairoz Matte mailto:fairoz.matte at oracle.com
>>>>>> Cc: mailto:serviceability-dev at openjdk.java.net
>>>>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active()
>>>>>> is incorrect and corresponsing logic seems to be broken
>>>>>>
>>>>>> Hi Fairoz,
>>>>>>
>>>>>> What I think you need to do is something like this:
>>>>>>
>>>>>>         if (className.equals("java.lang.Thread")) {
>>>>>>             return !isJfrInitialized();
>>>>>>         }
>>>>>> ...
>>>>>>
>>>>>>     private static boolean isJfrInitialized() {
>>>>>>         try {
>>>>>>             Class clazz = Class.forName("jdk.jfr.FlightRecorder");
>>>>>>             Method method = clazz.getDeclaredMethod("isInitialized", new Class[0]);
>>>>>>             return (boolean) method.invoke(null, new Object[0]);
>>>>>>         } catch (Exception e) {
>>>>>>             return false;
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>> Erik
>>>>>>
>>>>>> On 2020-06-01 12:30, Fairoz Matte wrote:
>>>>>> Hi Erik,
>>>>>>
>>>>>> Thanks for your quick response,
>>>>>> Below is the updated webrev to handle if jfr module is not present
>>>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.01/
>>>>>>
>>>>>> Thanks,
>>>>>> Fairoz
>>>>>>
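[Editorial sketch, not part of the original thread] Erik's reflection-based check quoted above can be tried as a standalone class along these lines. The class name `JfrProbe` and the helper `staticBooleanOrFalse` are hypothetical; `jdk.jfr.FlightRecorder.isInitialized()` is the method named in the thread. As in the quoted snippet, any failure (for example a build without the JFR module) is treated as "not initialized".

```java
import java.lang.reflect.Method;

// Hypothetical helper: probes JFR through reflection so it also compiles and runs
// on builds where the jdk.jfr module is absent.
public final class JfrProbe {
    private JfrProbe() {}

    // Invokes a public static no-arg boolean method by name; returns false on any failure.
    static boolean staticBooleanOrFalse(String className, String methodName) {
        try {
            Class<?> clazz = Class.forName(className);
            Method method = clazz.getDeclaredMethod(methodName);
            return (boolean) method.invoke(null);
        } catch (Exception e) {
            // Class or method missing, not accessible, not static, or not a boolean result.
            return false;
        }
    }

    static boolean isJfrInitialized() {
        return staticBooleanOrFalse("jdk.jfr.FlightRecorder", "isInitialized");
    }

    public static void main(String[] args) {
        System.out.println("JFR initialized: " + isJfrInitialized());
    }
}
```

Note that this only answers the question for the JVM it runs in, which is why the later messages in the thread move the evaluation to the debuggee side.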
-----Original Message----- >>>>>> From: Erik Gahlin >>>>>> Sent: Monday, June 1, 2020 2:31 PM >>>>>> To: Fairoz Matte mailto:fairoz.matte at oracle.com >>>>>> Cc: mailto:serviceability-dev at openjdk.java.net >>>>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() >>>>>> is incorrect and corresponsing logic seems to be broken >>>>>> ? Hi Fairoz, >>>>>> ? If the test needs to run with builds where the JFR module is not >>>>>> present(?), you need to do the check using reflection. >>>>>> ? If not, looks good. >>>>>> ? Erik >>>>>> ? On 1 Jun 2020, at 10:27, Fairoz Matte >>>>>> mailto:fairoz.matte at oracle.com wrote: >>>>>> ? Hi, >>>>>> ? Please review this small test infra change to identify at >>>>>> runtime the JFR is active or not. >>>>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8243451 >>>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8243451/webrev.00/ >>>>>> ? Thanks, >>>>>> Fairoz >>>>>> From robbin.ehn at oracle.com Wed Jun 10 07:55:15 2020 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 10 Jun 2020 09:55:15 +0200 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. In-Reply-To: <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> Message-ID: Hi David, On 2020-06-10 04:30, David Holmes wrote: > Hi Robbin, > > On 10/06/2020 2:15 am, Robbin Ehn wrote: >> Hi all, >> >> If the direct handshake is executed by the target thread, the JNI >> local(s) are created in that thread but returned in the handshaking >> thread. >> They thus are not safe to use. (thread might even have exited by this >> point) >> >> Code: >> http://cr.openjdk.java.net/~rehn/8247248/v1/webrev/ >> >> Unfortunately there is no way the distinguish a local jobject vs a >> global. Which makes it hard to track when the jobject is global and not. > > I have some comments/concerns that I've added to the bug report. 
> > Switching from local refs to global refs adds a bit of overhead and will > likely impact the performance here. Not a showstopper but would be nice > to avoid if possible as it seems a bit of a kludge to communicate values > across threads via global refs. Using our test to measure nanos, it's seem to be 5%, 21 us to 22 us when target is current thread, no handshake at all. (handshake case is around 65 us, so noise is larger than overhead). In earlier version of the patch I skipped global when doing thread self. But since it's not easy to track if it's a global or local I removed that for the simplification. > > In the old VMOperation solution the VMThread used the JvmtiEnv* of the > calling thread so that the local refs were in the right place. Can we do > the same with the handshake code? It will mean restoring the "calling > thread" argument to the handshake operation but I think this is a > workable approach. (The removal of the calling thread argument should > have rung alarm bells :( .) > > Or, could this be case where we don't want the target thread to execute > the handshake and we need a way to mark the handshake operation as such? > That's a bigger change of course. This is a good idea, and we can easily implement a generic facility in the, hopefully, upcoming asynchronous handshakes. But no matter in what form I don't see us getting this enhancement to handshakes before JDK 15. How do you want to proceed for JDK 15? Thanks, Robbin > > Thanks, > David > ----- > >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8247248 >> >> Local testing of JDI/JVMTI and t1-5. >> (no real crash so there is nothing to reproduce) >> >> Thanks, Robbin From robbin.ehn at oracle.com Wed Jun 10 07:56:12 2020 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 10 Jun 2020 09:56:12 +0200 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. 
In-Reply-To: <014b74c7-2bfd-ea31-0890-acccc52df86a@oracle.com> References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <014b74c7-2bfd-ea31-0890-acccc52df86a@oracle.com> Message-ID: Hi Serguei, thanks! Yes I'll add a comments. Thanks, Robbin On 2020-06-09 20:30, serguei.spitsyn at oracle.com wrote: > Hi Robbin, > > Nice catch! > The fix looks good in general. > I'd be nice to add comments to explain why these global refs are created. > > > Thanks, > Serguei > > > On 6/9/20 09:15, Robbin Ehn wrote: >> Hi all, >> >> If the direct handshake is executed by the target thread, the JNI >> local(s) are created in that thread but returned in the handshaking >> thread. >> They thus are not safe to use. (thread might even have exited by this >> point) >> >> Code: >> http://cr.openjdk.java.net/~rehn/8247248/v1/webrev/ >> >> Unfortunately there is no way the distinguish a local jobject vs a >> global. Which makes it hard to track when the jobject is global and not. >> >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8247248 >> >> Local testing of JDI/JVMTI and t1-5. >> (no real crash so there is nothing to reproduce) >> >> Thanks, Robbin > From david.holmes at oracle.com Wed Jun 10 08:33:40 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 10 Jun 2020 18:33:40 +1000 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. In-Reply-To: References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> Message-ID: <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> Hi Robbin, On 10/06/2020 5:55 pm, Robbin Ehn wrote: > Hi David, > > On 2020-06-10 04:30, David Holmes wrote: >> Hi Robbin, >> >> On 10/06/2020 2:15 am, Robbin Ehn wrote: >>> Hi all, >>> >>> If the direct handshake is executed by the target thread, the JNI >>> local(s) are created in that thread but returned in the handshaking >>> thread. >>> They thus are not safe to use. 
(thread might even have exited by this >>> point) >>> >>> Code: >>> http://cr.openjdk.java.net/~rehn/8247248/v1/webrev/ >>> >>> Unfortunately there is no way the distinguish a local jobject vs a >>> global. Which makes it hard to track when the jobject is global and not. >> >> I have some comments/concerns that I've added to the bug report. >> >> Switching from local refs to global refs adds a bit of overhead and >> will likely impact the performance here. Not a showstopper but would >> be nice to avoid if possible as it seems a bit of a kludge to >> communicate values across threads via global refs. > > Using our test to measure nanos, it's seem to be 5%, 21 us to 22 us when > target is current thread, no handshake at all. > (handshake case is around 65 us, so noise is larger than overhead). > > In earlier version of the patch I skipped global when doing thread self. > But since it's not easy to track if it's a global or local I removed > that for the simplification. > >> >> In the old VMOperation solution the VMThread used the JvmtiEnv* of the >> calling thread so that the local refs were in the right place. Can we >> do the same with the handshake code? It will mean restoring the >> "calling thread" argument to the handshake operation but I think this >> is a workable approach. (The removal of the calling thread argument >> should have rung alarm bells :( .) >> >> Or, could this be case where we don't want the target thread to >> execute the handshake and we need a way to mark the handshake >> operation as such? That's a bigger change of course. > > This is a good idea, and we can easily implement a generic facility in > the, hopefully, upcoming asynchronous handshakes. > > But no matter in what form I don't see us getting this enhancement to > handshakes before JDK 15. > > How do you want to proceed for JDK 15? Honestly I think I'd like to see things reverted to the use of calling_thread as done for the VMOperation previously. 
We know it is functionally correct and it should also have the same performance profile. Thanks, David > Thanks, Robbin > >> >> Thanks, >> David >> ----- >> >>> Issue: >>> https://bugs.openjdk.java.net/browse/JDK-8247248 >>> >>> Local testing of JDI/JVMTI and t1-5. >>> (no real crash so there is nothing to reproduce) >>> >>> Thanks, Robbin From robbin.ehn at oracle.com Wed Jun 10 10:17:15 2020 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 10 Jun 2020 12:17:15 +0200 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. In-Reply-To: <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> Message-ID: <56ff2d7d-ceaa-7f7f-ce27-87379472785e@oracle.com> Hi David, > Honestly I think I'd like to see things reverted to the use of > calling_thread as done for the VMOperation previously. We know it is > functionally correct and it should also have the same performance profile. Ok, sure. But I don't have time to do that before fork. I'll fix that in 16 and backport to 15. Thanks, Robbin > > Thanks, > David > >> Thanks, Robbin >> >>> >>> Thanks, >>> David >>> ----- >>> >>>> Issue: >>>> https://bugs.openjdk.java.net/browse/JDK-8247248 >>>> >>>> Local testing of JDI/JVMTI and t1-5. >>>> (no real crash so there is nothing to reproduce) >>>> >>>> Thanks, Robbin From david.holmes at oracle.com Wed Jun 10 11:44:37 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 10 Jun 2020 21:44:37 +1000 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. 
In-Reply-To: <56ff2d7d-ceaa-7f7f-ce27-87379472785e@oracle.com> References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> <56ff2d7d-ceaa-7f7f-ce27-87379472785e@oracle.com> Message-ID: On 10/06/2020 8:17 pm, Robbin Ehn wrote: > Hi David, >> Honestly I think I'd like to see things reverted to the use of >> calling_thread as done for the VMOperation previously. We know it is >> functionally correct and it should also have the same performance >> profile. > > Ok, sure. But I don't have time to do that before fork. > I'll fix that in 16 and backport to 15. If you fix in 15 it will be automatically forward-ported to 16. Thanks, David > Thanks, Robbin > >> >> Thanks, >> David >> >>> Thanks, Robbin >>> >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> Issue: >>>>> https://bugs.openjdk.java.net/browse/JDK-8247248 >>>>> >>>>> Local testing of JDI/JVMTI and t1-5. >>>>> (no real crash so there is nothing to reproduce) >>>>> >>>>> Thanks, Robbin From poonam.bajaj at oracle.com Wed Jun 10 13:06:40 2020 From: poonam.bajaj at oracle.com (Poonam Parhar) Date: Wed, 10 Jun 2020 06:06:40 -0700 Subject: RFR: 8243290: Improve diagnostic messages for class verification and redefinition failures In-Reply-To: <4395108a-3ad6-5d3e-fff6-e9338a4211e9@oracle.com> References: <4395108a-3ad6-5d3e-fff6-e9338a4211e9@oracle.com> Message-ID: <861755cc-f820-ed3b-4507-be76c476b800@oracle.com> Hello Harold, Thanks for your review! I fixed the null string issue, and here's the updated webrev: http://cr.openjdk.java.net/~poonam/8243290/webrev.01/ Thanks, Poonam On 6/9/20 8:38 AM, Harold Seigel wrote: > Hi Poonam, > > Thanks for making this change. > > In verifier.cpp, if ex_msg is NULL, will the call to st->print_cr() at > line 142 - 143, fail? 
>
> Thanks, Harold
>
> On 6/9/2020 10:46 AM, Poonam Parhar wrote:
>> Hello,
>>
>> Please review this simple change for improving diagnostics around
>> class verification and linking failures:
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8243290
>> Webrev: http://cr.openjdk.java.net/~poonam/8243290/webrev.00/
>>
>> Problem: During the class redefinition process, if a class
>> verification fails because it could not find a class referenced in
>> the class being redefined, the printed NoClassDefFoundError error
>> message is not very helpful. It does not print the class name for
>> which NoClassDefFoundError was encountered, and that makes it very
>> hard to find the real cause of redefinition failure.
>>
>> The proposed solution prints the class name during class linking and
>> verification failures. Example output produced with these changes:
>>
>> With 'redefine' tag:
>>
>>      [java] [3.243s][debug][redefine,class,load        ] loaded name=org.apache.commons.logging.impl.Jdk14Logger (avail_mem=819540K)
>>      [java] [3.243s][debug][redefine,class,load        ] loading name=org.apache.commons.logging.impl.Log4JLogger kind=101 (avail_mem=819540K)
>>      [java] [3.244s][info ][redefine,class,load,exceptions] link_class exception: 'java/lang/NoClassDefFoundError org/apache/log4j/Priority'
>>      [java] Java Result: 1
>>
>> With 'verification' tag:
>>
>>      [java] [49.702s][info ][verification] Verification for org.apache.commons.logging.impl.Log4JLogger has exception pending 'java.lang.NoClassDefFoundError org/apache/log4j/Priority'
>>      [java] [49.702s][info ][verification] End class verification for: org.apache.commons.logging.impl.Log4JLogger
>>
>>
>> Improved error message:
>>
>>      [java] Exception in thread "main" java.lang.InternalError: class redefinition failed: invalid class
>>      [java]     at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method)
>>      [java]     at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:167)
>>      [java]     at Main.main(Unknown Source)
>>
>>
>> Thanks,
>> Poonam
>>

From daniel.daugherty at oracle.com Wed Jun 10 13:09:56 2020
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Wed, 10 Jun 2020 09:09:56 -0400
Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes.
In-Reply-To:
References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> <56ff2d7d-ceaa-7f7f-ce27-87379472785e@oracle.com>
Message-ID: <2062bdbc-4f42-9e6d-e1fd-57063b640ef1@oracle.com>

On 6/10/20 7:44 AM, David Holmes wrote:
> On 10/06/2020 8:17 pm, Robbin Ehn wrote:
>> Hi David,
>>> Honestly I think I'd like to see things reverted to the use of
>>> calling_thread as done for the VMOperation previously. We know it is
>>> functionally correct and it should also have the same performance
>>> profile.
>>
>> Ok, sure. But I don't have time to do that before fork.
>> I'll fix that in 16 and backport to 15.
>
> If you fix in 15 it will be automatically forward-ported to 16.

This is a P2 bug so it can be pushed after the FC cutoff right?

Dan

>
> Thanks,
> David
>
>> Thanks, Robbin
>>
>>>
>>> Thanks,
>>> David
>>>
>>>> Thanks, Robbin
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> Issue:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247248
>>>>>>
>>>>>> Local testing of JDI/JVMTI and t1-5.
>>>>>> (no real crash so there is nothing to reproduce)
>>>>>>
>>>>>> Thanks, Robbin

From robbin.ehn at oracle.com Wed Jun 10 13:38:35 2020
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 10 Jun 2020 15:38:35 +0200
Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes.
In-Reply-To: <2062bdbc-4f42-9e6d-e1fd-57063b640ef1@oracle.com> References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> <56ff2d7d-ceaa-7f7f-ce27-87379472785e@oracle.com> <2062bdbc-4f42-9e6d-e1fd-57063b640ef1@oracle.com> Message-ID: <68b87493-21b0-ee41-d9c0-54d7f8b2099b@oracle.com> Hi Dan, On 2020-06-10 15:09, Daniel D. Daugherty wrote: > This is a P2 bug so it can be pushed after the FC cutoff right? Not sure, I put in P2 so I could do that if necessary. /Robbin > > Dan > > >> >> Thanks, >> David >> >>> Thanks, Robbin >>> >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, Robbin >>>>> >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> ----- >>>>>> >>>>>>> Issue: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247248 >>>>>>> >>>>>>> Local testing of JDI/JVMTI and t1-5. >>>>>>> (no real crash so there is nothing to reproduce) >>>>>>> >>>>>>> Thanks, Robbin > From robbin.ehn at oracle.com Wed Jun 10 13:57:36 2020 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 10 Jun 2020 15:57:36 +0200 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. In-Reply-To: <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> Message-ID: Hi David and Serguei, (Dan feel free to chime in) > Honestly I think I'd like to see things reverted to the use of calling_thread as done for the VMOperation previously. We > know it is functionally correct and it should also have the same performance profile. Done: http://cr.openjdk.java.net/~rehn/8247248/v2/webrev/ Passes: hotspot jdi/jvmti testing, running mach5. I'll push tomorrow morning if test is ok and you all are happy (+- nits). 
(and no objection to break the 24h rule) I started this patch with reverting "8242425: JVMTI monitor operations should use Thread-Local Handshakes". And work my way forward. Thanks, Robbin > > Thanks, > David > >> Thanks, Robbin >> >>> >>> Thanks, >>> David >>> ----- >>> >>>> Issue: >>>> https://bugs.openjdk.java.net/browse/JDK-8247248 >>>> >>>> Local testing of JDI/JVMTI and t1-5. >>>> (no real crash so there is nothing to reproduce) >>>> >>>> Thanks, Robbin From coleen.phillimore at oracle.com Wed Jun 10 14:36:08 2020 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 10 Jun 2020 10:36:08 -0400 Subject: RFR: 8243290: Improve diagnostic messages for class verification and redefinition failures In-Reply-To: <861755cc-f820-ed3b-4507-be76c476b800@oracle.com> References: <4395108a-3ad6-5d3e-fff6-e9338a4211e9@oracle.com> <861755cc-f820-ed3b-4507-be76c476b800@oracle.com> Message-ID: Looks good to me now. thanks, Coleen On 6/10/20 9:06 AM, Poonam Parhar wrote: > Hello Harold, > > Thanks for your review! I fixed the null string issue, and here's the > updated webrev: > http://cr.openjdk.java.net/~poonam/8243290/webrev.01/ > > Thanks, > Poonam > > On 6/9/20 8:38 AM, Harold Seigel wrote: >> Hi Poonam, >> >> Thanks for making this change. >> >> In verifier.cpp, if ex_msg is NULL, will the call to st->print_cr() >> at line 142 - 143, fail? >> >> Thanks, Harold >> >> On 6/9/2020 10:46 AM, Poonam Parhar wrote: >>> Hello, >>> >>> Please review this simple change for improving diagnostics around >>> class verification and linking failures: >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8243290 >>> Webrev: http://cr.openjdk.java.net/~poonam/8243290/webrev.00/ >>> >>> Problem: During the class redefinition process, if a class >>> verification fails because it could not find a class referenced in >>> the class being redefined, the printed NoClassDefFoundError error >>> message is not very helpful. 
It does not print the class name for >>> which NoClassDefFoundError was encountered, and that makes it very >>> hard to find the real cause of redefinition failure. >>> >>> The proposed solution prints the class name during class linking and >>> verification failures. Example output produced with these changes: >>> >>> With 'redefine' tag: >>> >>>      [java] [3.243s][debug][redefine,class,load        ] loaded >>> name=org.apache.commons.logging.impl.Jdk14Logger (avail_mem=819540K) >>>      [java] [3.243s][debug][redefine,class,load        ] loading >>> name=org.apache.commons.logging.impl.Log4JLogger kind=101 >>> (avail_mem=819540K) >>>      [java] [3.244s][info ][redefine,class,load,exceptions] >>> link_class exception: 'java/lang/NoClassDefFoundError >>> org/apache/log4j/Priority' >>>      [java] Java Result: 1 >>> >>> With 'verification' tag: >>> >>>      [java] [49.702s][info ][verification] Verification for >>> org.apache.commons.logging.impl.Log4JLogger has exception pending >>> 'java.lang.NoClassDefFoundError org/apache/log4j/Priority' >>>      [java] [49.702s][info ][verification] End class verification >>> for: org.apache.commons.logging.impl.Log4JLogger >>> >>> >>> Improved error message: >>> >>>      [java] Exception in thread "main" java.lang.InternalError: >>> class redefinition failed: invalid class >>>      [java]     at >>> java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native >>> Method) >>>      [java]     at >>> java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:167) >>>      [java]
at Main.main(Unknown Source) >>> >>> >>> Thanks, >>> Poonam >>> > From harold.seigel at oracle.com Wed Jun 10 15:17:38 2020 From: harold.seigel at oracle.com (Harold Seigel) Date: Wed, 10 Jun 2020 11:17:38 -0400 Subject: RFR: 8243290: Improve diagnostic messages for class verification and redefinition failures In-Reply-To: References: <4395108a-3ad6-5d3e-fff6-e9338a4211e9@oracle.com> <861755cc-f820-ed3b-4507-be76c476b800@oracle.com> Message-ID: <9cd5b895-11fa-aec2-bf80-474ee4159a0c@oracle.com> +1 Thanks, Harold On 6/10/2020 10:36 AM, coleen.phillimore at oracle.com wrote: > Looks good to me now. > thanks, > Coleen > > On 6/10/20 9:06 AM, Poonam Parhar wrote: >> Hello Harold, >> >> Thanks for your review! I fixed the null string issue, and here's the >> updated webrev: >> http://cr.openjdk.java.net/~poonam/8243290/webrev.01/ >> >> Thanks, >> Poonam >> >> On 6/9/20 8:38 AM, Harold Seigel wrote: >>> Hi Poonam, >>> >>> Thanks for making this change. >>> >>> In verifier.cpp, if ex_msg is NULL, will the call to st->print_cr() >>> at line 142 - 143, fail? >>> >>> Thanks, Harold >>> >>> On 6/9/2020 10:46 AM, Poonam Parhar wrote: >>>> Hello, >>>> >>>> Please review this simple change for improving diagnostics around >>>> class verification and linking failures: >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8243290 >>>> Webrev: http://cr.openjdk.java.net/~poonam/8243290/webrev.00/ >>>> >>>> Problem: During the class redefinition process, if a class >>>> verification fails because it could not find a class referenced in >>>> the class being redefined, the printed NoClassDefFoundError error >>>> message is not very helpful. It does not print the class name for >>>> which NoClassDefFoundError was encountered, and that makes it very >>>> hard to find the real cause of redefinition failure. >>>> >>>> The proposed solution prints the class name during class linking >>>> and verification failures. 
Example output produced with these changes: >>>> >>>> With 'redefine' tag: >>>> >>>>      [java] [3.243s][debug][redefine,class,load        ] loaded >>>> name=org.apache.commons.logging.impl.Jdk14Logger (avail_mem=819540K) >>>>      [java] [3.243s][debug][redefine,class,load        ] loading >>>> name=org.apache.commons.logging.impl.Log4JLogger kind=101 >>>> (avail_mem=819540K) >>>>      [java] [3.244s][info ][redefine,class,load,exceptions] >>>> link_class exception: 'java/lang/NoClassDefFoundError >>>> org/apache/log4j/Priority' >>>>      [java] Java Result: 1 >>>> >>>> With 'verification' tag: >>>> >>>>      [java] [49.702s][info ][verification] Verification for >>>> org.apache.commons.logging.impl.Log4JLogger has exception pending >>>> 'java.lang.NoClassDefFoundError org/apache/log4j/Priority' >>>>      [java] [49.702s][info ][verification] End class verification >>>> for: org.apache.commons.logging.impl.Log4JLogger >>>> >>>> >>>> Improved error message: >>>> >>>>      [java] Exception in thread "main" java.lang.InternalError: >>>> class redefinition failed: invalid class >>>>      [java]     at >>>> java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native >>>> Method) >>>>      [java]     at >>>> java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:167) >>>>      [java]
at Main.main(Unknown Source) >>>> >>>> >>>> Thanks, >>>> Poonam >>>> >> > From serguei.spitsyn at oracle.com Wed Jun 10 15:52:07 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 10 Jun 2020 08:52:07 -0700 Subject: RFR: 8243290: Improve diagnostic messages for class verification and redefinition failures In-Reply-To: <9cd5b895-11fa-aec2-bf80-474ee4159a0c@oracle.com> References: <4395108a-3ad6-5d3e-fff6-e9338a4211e9@oracle.com> <861755cc-f820-ed3b-4507-be76c476b800@oracle.com> <9cd5b895-11fa-aec2-bf80-474ee4159a0c@oracle.com> Message-ID: +1 Thanks, Serguei On 6/10/20 08:17, Harold Seigel wrote: > +1 > > Thanks, Harold > > On 6/10/2020 10:36 AM, coleen.phillimore at oracle.com wrote: >> Looks good to me now. >> thanks, >> Coleen >> >> On 6/10/20 9:06 AM, Poonam Parhar wrote: >>> Hello Harold, >>> >>> Thanks for your review! I fixed the null string issue, and here's >>> the updated webrev: >>> http://cr.openjdk.java.net/~poonam/8243290/webrev.01/ >>> >>> Thanks, >>> Poonam >>> >>> On 6/9/20 8:38 AM, Harold Seigel wrote: >>>> Hi Poonam, >>>> >>>> Thanks for making this change. >>>> >>>> In verifier.cpp, if ex_msg is NULL, will the call to st->print_cr() >>>> at line 142 - 143, fail? >>>> >>>> Thanks, Harold >>>> >>>> On 6/9/2020 10:46 AM, Poonam Parhar wrote: >>>>> Hello, >>>>> >>>>> Please review this simple change for improving diagnostics around >>>>> class verification and linking failures: >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8243290 >>>>> Webrev: http://cr.openjdk.java.net/~poonam/8243290/webrev.00/ >>>>> >>>>> Problem: During the class redefinition process, if a class >>>>> verification fails because it could not find a class referenced in >>>>> the class being redefined, the printed NoClassDefFoundError error >>>>> message is not very helpful. 
It does not print the class name for >>>>> which NoClassDefFoundError was encountered, and that makes it very >>>>> hard to find the real cause of redefinition failure. >>>>> >>>>> The proposed solution prints the class name during class linking >>>>> and verification failures. Example output produced with these >>>>> changes: >>>>> >>>>> With 'redefine' tag: >>>>> >>>>>      [java] [3.243s][debug][redefine,class,load        ] loaded >>>>> name=org.apache.commons.logging.impl.Jdk14Logger (avail_mem=819540K) >>>>>      [java] [3.243s][debug][redefine,class,load        ] loading >>>>> name=org.apache.commons.logging.impl.Log4JLogger kind=101 >>>>> (avail_mem=819540K) >>>>>      [java] [3.244s][info ][redefine,class,load,exceptions] >>>>> link_class exception: 'java/lang/NoClassDefFoundError >>>>> org/apache/log4j/Priority' >>>>>      [java] Java Result: 1 >>>>> >>>>> With 'verification' tag: >>>>> >>>>>      [java] [49.702s][info ][verification] Verification for >>>>> org.apache.commons.logging.impl.Log4JLogger has exception pending >>>>> 'java.lang.NoClassDefFoundError org/apache/log4j/Priority' >>>>>      [java] [49.702s][info ][verification] End class verification >>>>> for: org.apache.commons.logging.impl.Log4JLogger >>>>> >>>>> >>>>> Improved error message: >>>>> >>>>>      [java] Exception in thread "main" java.lang.InternalError: >>>>> class redefinition failed: invalid class >>>>>      [java]     at >>>>> java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native >>>>> Method) >>>>>      [java]     at >>>>> java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:167) >>>>>      [java]
at Main.main(Unknown Source) >>>>> >>>>> >>>>> Thanks, >>>>> Poonam >>>>> >>> >> From leonid.mesnik at oracle.com Wed Jun 10 16:48:41 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Wed, 10 Jun 2020 09:48:41 -0700 Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: <073df252-52d4-3c1b-ccfd-82fae69e363c@oracle.com> References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com> <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com> <5b83a8b3-f877-4f72-9188-97e3de6f986b@default> <18e88a44-8d62-8752-834c-dce436e8ebc4@oracle.com> <5adafff1-0da9-8bf2-4b27-5b00b5c48526@oracle.com> <42fdf66e-b097-47e4-9062-391e9b43968c@default> <85de2bb0-94a0-5dbd-f29f-0a9c96f12579@oracle.com> <5c50b83d-964a-d74d-d7d7-77d7b348d533@oracle.com> <3fe586cf-d485-4738-a432-92d3d9aa52da@default> <7aceddb3-1984-32b8-d0ae-8054a59dbcee@oracle.com> <073df252-52d4-3c1b-ccfd-82fae69e363c@oracle.com> Message-ID: Looks good, no other webrev is needed. Leonid On 6/10/20 12:28 AM, serguei.spitsyn at oracle.com wrote: > > On 6/9/20 23:35, Fairoz Matte wrote: >> Hi Serguei, >> >> Thanks for the clarification. >> I will work on to move isJFRActive () method from the >> TestDebuggerType2 to HeapWalkingDebugger > > Probably, there is no need in another webrev if you move it. > But you did not get a final thumbs up from Leonid yet.
> > Thanks, > Serguei > >> Thanks, >> Fairoz >> >>> -----Original Message----- >>> From: Serguei Spitsyn >>> Sent: Wednesday, June 10, 2020 11:42 AM >>> To: Fairoz Matte ; Leonid Mesnik >>> ; Erik Gahlin >>> Cc: serviceability-dev at openjdk.java.net >>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() >>> is incorrect >>> and corresponsing logic seems to be broken >>> >>> Hi Fairoz, >>> >>> It is confusing there is methods with the same name isJFRActive on both >>> debuggee and debugger side. >>> Leonid is talking about the isJFRActive that belongs to the debugger. >>> He suggests to move this method from the TestDebuggerType2 to >>> HeapWalkingDebugger. >>> The reason is the HeapWalkingDebugger should have a knowledge about the >>> HeapWalkingDebuggee, not its super class TestDebuggerType2. >>> It looks like a good suggestion to me. >>> >>> Thanks, >>> Serguei >>> >>> >>> On 6/9/20 23:00, Fairoz Matte wrote: >>>> Hi Leonid, >>>> >>>> The call isJFRActive() need to be executed on HeapwalkingDebuggee >>>> side. >>>> This is what my understanding is. >>>> >>>> Thanks, >>>> Fairoz >>>> >>>>> -----Original Message----- >>>>> From: Leonid Mesnik >>>>> Sent: Wednesday, June 10, 2020 1:16 AM >>>>> To: Serguei Spitsyn ; Fairoz Matte >>>>> ; Erik Gahlin >>>>> Cc: serviceability-dev at openjdk.java.net >>>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is >>>>> incorrect and corresponsing logic seems to be broken >>>>> >>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.09/test/hotspot/jtreg/vmTestbase/nsk/share/jdi/TestDebuggerType2.java.udiff.html >>>>> >>>>> I see that isJFRActive() depends on >>>>> "nsk.share.jdi.HeapwalkingDebuggee". >>>>> It is not going to work of debugee is not >>> "nsk.share.jdi.HeapwalkingDebuggee". >>>>> Shouldn't it be placed in HeapWalkingDebugger? >>>>> >>>>> Leonid >>>>> >>>>> On 6/8/20 9:26 PM, serguei.spitsyn at oracle.com wrote: >>>>>> Hi Fairoz, >>>>>> >>>>>> LGTM.
>>>>>> >>>>>> Thanks, >>>>>> Serguei >>>>>> >>>>>> >>>>>> On 6/8/20 21:20, Fairoz Matte wrote: >>>>>>> Hi Serguei, >>>>>>> >>>>>>> Thanks for the clarifications, >>>>>>> I have incorporated the 2nd suggestion, below is the webrev, >>>>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.09/ >>>>>>> >>>>>>> Thanks, >>>>>>> Fairoz >>>>>>> >>>>>>> From: Serguei Spitsyn >>>>>>> Sent: Monday, June 8, 2020 10:34 PM >>>>>>> To: Fairoz Matte ; Erik Gahlin >>>>>>> >>>>>>> Cc: serviceability-dev at openjdk.java.net >>>>>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() >>>>>>> is incorrect and corresponsing logic seems to be broken >>>>>>> >>>>>>> Hi Fairoz, >>>>>>> >>>>>>> >>>>>>> On 6/8/20 02:08, mailto:serguei.spitsyn at oracle.com wrote: >>>>>>> Hi Fairoz, >>>>>>> >>>>>>> There are two different isJFRActive() methods, one is on debuggee >>>>>>> side and another on the debugger side. >>>>>>> The one on debuggee side is better to keep in Debuggee.java (where >>>>>>> it was before) instead of moving it to HeapwalkingDebuggee.java. >>>>>>> It is okay to keep the call to it in the HeapwalkingDebuggee.java. >>>>>>> >>>>>>> Please, skip this suggestion as Debugger.java is not one of supers >>>>>>> of HeapwalkingDebuggee.java as I've assumed. >>>>>>> >>>>>>> Thanks, >>>>>>> Serguei >>>>>>> >>>>>>> >>>>>>> +    protected boolean isJFRActive() { >>>>>>> +        boolean isJFRActive = false; >>>>>>> +        ReferenceType referenceType = >>>>>>> debuggee.classByName("nsk.share.jdi.HeapwalkingDebuggee"); >>>>>>> +        if (referenceType == null) >>>>>>> +           throw new RuntimeException("Debugeee is not initialized >>>>>>> yet"); >>>>>>> + >>>>>>> +        Field isJFRActiveFld = >>>>>>> referenceType.fieldByName("isJFRActive"); >>>>>>> +        isJFRActive = >>>>>>> ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value(); >>>>>>> +        return isJFRActive; >>>>>>>       } >>>>>>> It is better to remove the line: >>>>>>> +
boolean isJFRActive = false; >>>>>>> and just change this one: >>>>>>> +        boolean isJFRActive = >>>>>>> ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value(); >>>>>>> >>>>>>> Otherwise, it looks good to me. >>>>>>> I hope, it really works now. >>>>>>> >>>>>>> Thanks, >>>>>>> Serguei >>>>>>> >>>>>>> On 6/8/20 00:26, Fairoz Matte wrote: >>>>>>> Hi Serguei, Erik, >>>>>>> Thanks for the reviews, >>>>>>> Below webrev contains the suggested changes, >>>>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.08/ >>>>>>> The only thing I couldn't do is to keep the local copy of >>>>>>> isJFRActive() in HeapwalkingDebugger, The method is called in >>>>>>> debugee code. >>>>>>> In debugger, we have access to debugee before test started or after >>>>>>> test completes. >>>>>>> isJFRActive() method need to be executed during the test execution. >>>>>>> Hence I didn't find place to initialize and cannot make local copy. >>>>>>> Thanks, >>>>>>> Fairoz >>>>>>> From: Serguei Spitsyn >>>>>>> Sent: Tuesday, June 2, 2020 7:57 AM >>>>>>> To: Fairoz Matte mailto:fairoz.matte at oracle.com; Erik Gahlin >>>>>>> mailto:erik.gahlin at oracle.com >>>>>>> Cc: mailto:serviceability-dev at openjdk.java.net >>>>>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() >>>>>>> is incorrect and corresponsing logic seems to be broken >>>>>>> On 6/1/20 12:30, mailto:serguei.spitsyn at oracle.com wrote: >>>>>>> Hi Fairoz, >>>>>>> >>>>>>> It looks okay in general. >>>>>>> But I'm not sure this check is going to work. >>>>>>> The problem is the HeapwalkingDebuggee.useStrictCheck method is >>>>>>> invoked in the context of the HeapwalkingDebugger process, not the >>>>>>> HeapwalkingDebuggee process. >>>>>>> >>>>>>> Probably, you wanted to get this bit of information from the >>>>>>> Debuggee process. >>>>>>> The debuggee has to evaluate it itself and store in some field. >>>>>>> The debugger should use the JDI to get this value from the >>>>>>> debuggee.
>>>>>>> >>>>>>> Thanks, >>>>>>> Serguei >>>>>>> >>>>>>> I'm not sure, what exactly you wanted to do here. >>>>>>> It can occasionally work for you as long as both processes are run >>>>>>> with the same options. >>>>>>> >>>>>>> Thanks, >>>>>>> Serguei >>>>>>> >>>>>>> >>>>>>> On 6/1/20 08:52, Fairoz Matte wrote: >>>>>>> Hi Erik, >>>>>>> Thanks for the review, below is the updated webrev. >>>>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.02/ >>>>>>> Thanks, >>>>>>> Fairoz >>>>>>> -----Original Message----- >>>>>>> From: Erik Gahlin >>>>>>> Sent: Monday, June 1, 2020 4:26 PM >>>>>>> To: Fairoz Matte mailto:fairoz.matte at oracle.com >>>>>>> Cc: mailto:serviceability-dev at openjdk.java.net >>>>>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() >>>>>>> is incorrect and corresponsing logic seems to be broken >>>>>>> Hi Fairoz, >>>>>>> What I think you need to do is something like this: >>>>>>>         if (className.equals("java.lang.Thread")) { >>>>>>>             return !isJfrInitialized(); >>>>>>>         } >>>>>>> ... >>>>>>>     private static boolean isJfrInitialized() { >>>>>>>         try { >>>>>>>             Class clazz = Class.forName("jdk.jfr.FlightRecorder"); >>>>>>>             Method method = clazz.getDeclaredMethod("isInitialized", new Class[0]); >>>>>>>             return (boolean) method.invoke(null, new Object[0]); >>>>>>>         } catch (Exception e) { >>>>>>>             return false; >>>>>>>         } >>>>>>>     } >>>>>>> Erik >>>>>>> On 2020-06-01 12:30, Fairoz Matte wrote: >>>>>>> Hi Erik, >>>>>>> Thanks for your quick response, >>>>>>> Below is the updated webrev to handle if jfr module is not present >>>>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.01/ >>>>>>> Thanks, >>>>>>> Fairoz >>>>>>>
-----Original Message----- >>>>>>> From: Erik Gahlin >>>>>>> Sent: Monday, June 1, 2020 2:31 PM >>>>>>> To: Fairoz Matte mailto:fairoz.matte at oracle.com >>>>>>> Cc: mailto:serviceability-dev at openjdk.java.net >>>>>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() >>>>>>> is incorrect and corresponsing logic seems to be broken >>>>>>> Hi Fairoz, >>>>>>> If the test needs to run with builds where the JFR module is >>>>>>> not >>>>>>> present(?), you need to do the check using reflection. >>>>>>> If not, looks good. >>>>>>> Erik >>>>>>> On 1 Jun 2020, at 10:27, Fairoz Matte >>>>>>> mailto:fairoz.matte at oracle.com wrote: >>>>>>> Hi, >>>>>>> Please review this small test infra change to identify at >>>>>>> runtime the JFR is active or not. >>>>>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8243451 >>>>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8243451/webrev.00/ >>>>>>> Thanks, >>>>>>> Fairoz >>>>>>> > From serguei.spitsyn at oracle.com Wed Jun 10 17:54:04 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 10 Jun 2020 10:54:04 -0700 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. In-Reply-To: References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> Message-ID: <321f0d20-3d71-7287-74d5-3d700c4f06bf@oracle.com> Hi Robbin, I like this variant and it looks good to me. Thanks, Serguei On 6/10/20 06:57, Robbin Ehn wrote: > Hi David and Serguei, (Dan feel free to chime in) > >> Honestly I think I'd like to see things reverted to the use of >> calling_thread as done for the VMOperation previously. We know it is >> functionally correct and it should also have the same performance >> profile. > > Done: > http://cr.openjdk.java.net/~rehn/8247248/v2/webrev/ > > Passes: hotspot jdi/jvmti testing, running mach5.
> > I'll push tomorrow morning if test is ok and you all are happy (+- > nits). (and no objection to break the 24h rule) > I started this patch with reverting "8242425: JVMTI monitor operations > should use Thread-Local Handshakes". > And work my way forward. > > Thanks, Robbin > >> >> Thanks, >> David >> >>> Thanks, Robbin >>> >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> Issue: >>>>> https://bugs.openjdk.java.net/browse/JDK-8247248 >>>>> >>>>> Local testing of JDI/JVMTI and t1-5. >>>>> (no real crash so there is nothing to reproduce) >>>>> >>>>> Thanks, Robbin From robbin.ehn at oracle.com Wed Jun 10 18:47:28 2020 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 10 Jun 2020 20:47:28 +0200 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. In-Reply-To: <321f0d20-3d71-7287-74d5-3d700c4f06bf@oracle.com> References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> <321f0d20-3d71-7287-74d5-3d700c4f06bf@oracle.com> Message-ID: Hi Serguei, Great, thanks! FYI: Now passed t1-5 (no new failures). /Robbin On 2020-06-10 19:54, serguei.spitsyn at oracle.com wrote: > Hi Robbin, > > I like this variant and it looks good to me. > > Thanks, > Serguei > > > On 6/10/20 06:57, Robbin Ehn wrote: >> Hi David and Serguei, (Dan feel free to chime in) >> >>> Honestly I think I'd like to see things reverted to the use of calling_thread as done for the VMOperation previously. >>> We know it is functionally correct and it should also have the same performance profile. >> >> Done: >> http://cr.openjdk.java.net/~rehn/8247248/v2/webrev/ >> >> Passes: hotspot jdi/jvmti testing, running mach5. >> >> I'll push tomorrow morning if test is ok and you all are happy (+- nits). (and no objection to break the 24h rule) >> I started this patch with reverting "8242425: JVMTI monitor operations should use Thread-Local Handshakes". 
>> And work my way forward. >> >> Thanks, Robbin >> >>> >>> Thanks, >>> David >>> >>>> Thanks, Robbin >>>> >>>>> >>>>> Thanks, >>>>> David >>>>> ----- >>>>> >>>>>> Issue: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8247248 >>>>>> >>>>>> Local testing of JDI/JVMTI and t1-5. >>>>>> (no real crash so there is nothing to reproduce) >>>>>> >>>>>> Thanks, Robbin > From alexey.menkov at oracle.com Wed Jun 10 19:20:52 2020 From: alexey.menkov at oracle.com (Alex Menkov) Date: Wed, 10 Jun 2020 12:20:52 -0700 Subject: RFR(S) : 8183040 : update jdk/test/lib/Platform.java to use NIO file API In-Reply-To: References: <8CA6F4A9-7B57-4B70-9087-D1ACBF561714@oracle.com> Message-ID: <9e159e51-0901-c0a3-2c36-42dbd6696dc9@oracle.com> Hi Igor, On 06/09/2020 20:11, Igor Ignatyev wrote: > Hi Alex, > > as far as I can see, the caller just rethrows IOException as RuntimeException, so I don't think throwing IndexOutOfBoundsException would be much different, albeit it will be a bit more cryptic. yet given the content of /proc/sys/kernel/yama/ptrace_scope and /sys/fs/selinux/booleans/deny_ptrace is part of linux kernel contract, I doubt we will encounter IIOOBE in any reasonable setups. however, if you want I can check the length of bb arrays at L#171 and L#190 and throw an Error w/ message suggesting that something went completely wrong. Yes, the test still fails in the case, but if I see IndexOutOfBoundsException (or something similar) as a test failure reason, my first thought that this is the test issue. Could you please add the checks. --alex > > -- Igor > >> On Jun 9, 2020, at 6:36 PM, Alex Menkov wrote: >> >> Hi Igor, >> >> In SATestUtils.java you do >> >> var bb = ... Files.readAllBytes(...) ... >> and then use bb[0] >> >> if the file has 0 length, old code throws EOFException and new one will throw IndexOutOfBoundsException. >> And looks like the caller doesn't expect it (it catches IOException). 
>> >> --alex >> >> On 06/09/2020 16:47, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 >>>> >>>> 38 lines changed: 8 ins; 16 del; 14 mod; >>> Hi all, >>> could you please review this small clean up of testlibrary classes which updates j.t.lib.Platform and j.t.l.SA.SATestUtils (as it now contains the methods which 8183040 was about) to use NIO file API? >>> testing: test/hotspot/jtreg/serviceability >>> webrev: http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8183040 >>> Thanks, >>> -- Igor > From igor.ignatyev at oracle.com Wed Jun 10 19:29:30 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 10 Jun 2020 12:29:30 -0700 Subject: RFR(S) : 8183040 : update jdk/test/lib/Platform.java to use NIO file API In-Reply-To: <9e159e51-0901-c0a3-2c36-42dbd6696dc9@oracle.com> References: <8CA6F4A9-7B57-4B70-9087-D1ACBF561714@oracle.com> <9e159e51-0901-c0a3-2c36-42dbd6696dc9@oracle.com> Message-ID: Hi Alex, sure, here is the incremental diff, so now Error is thrown if bb arrays are empty -- http://cr.openjdk.java.net/~iignatyev//8183040/webrev.0-1 Thanks, -- Igor > On Jun 10, 2020, at 12:20 PM, Alex Menkov wrote: > > Hi Igor, > > On 06/09/2020 20:11, Igor Ignatyev wrote: >> Hi Alex, >> as far as I can see, the caller just rethrows IOException as RuntimeException, so I don't think throwing IndexOutOfBoundsException would be much different, albeit it will be a bit more cryptic. yet given the content of /proc/sys/kernel/yama/ptrace_scope and /sys/fs/selinux/booleans/deny_ptrace is part of linux kernel contract, I doubt we will encounter IIOOBE in any reasonable setups. however, if you want I can check the length of bb arrays at L#171 and L#190 and throw an Error w/ message suggesting that something went completely wrong. 
> > Yes, the test still fails in the case, but if I see IndexOutOfBoundsException (or something similar) as a test failure reason, my first thought that this is the test issue. > Could you please add the checks. > > --alex > >> -- Igor >>> On Jun 9, 2020, at 6:36 PM, Alex Menkov wrote: >>> >>> Hi Igor, >>> >>> In SATestUtils.java you do >>> >>> var bb = ... Files.readAllBytes(...) ... >>> and then use bb[0] >>> >>> if the file has 0 length, old code throws EOFException and new one will throw IndexOutOfBoundsException. >>> And looks like the caller doesn't expect it (it catches IOException). >>> >>> --alex >>> >>> On 06/09/2020 16:47, Igor Ignatyev wrote: >>>> http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 >>>>> >>>>> 38 lines changed: 8 ins; 16 del; 14 mod; >>>> Hi all, >>>> could you please review this small clean up of testlibrary classes which updates j.t.lib.Platform and j.t.l.SA.SATestUtils (as it now contains the methods which 8183040 was about) to use NIO file API? >>>> testing: test/hotspot/jtreg/serviceability >>>> webrev: http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8183040 >>>> Thanks, >>>> -- Igor From daniel.daugherty at oracle.com Wed Jun 10 19:59:56 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Wed, 10 Jun 2020 15:59:56 -0400 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. In-Reply-To: References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> Message-ID: On 6/10/20 9:57 AM, Robbin Ehn wrote: > Hi David and Serguei, (Dan feel free to chime in) > >> Honestly I think I'd like to see things reverted to the use of >> calling_thread as done for the VMOperation previously. We know it is >> functionally correct and it should also have the same performance >> profile. 
> > Done: > http://cr.openjdk.java.net/~rehn/8247248/v2/webrev/ src/hotspot/share/prims/jvmtiEnvBase.hpp     No comments. src/hotspot/share/prims/jvmtiEnvBase.cpp     No comments. src/hotspot/share/prims/jvmtiEnv.cpp     L1248:   JavaThread* calling_thread  = JavaThread::current();     L1296:   JavaThread* calling_thread  = JavaThread::current();         nit - please delete extra space before '='. Thumbs up. I like the switch back to use of calling_thread. Thanks! Dan > > Passes: hotspot jdi/jvmti testing, running mach5. > > I'll push tomorrow morning if test is ok and you all are happy (+- > nits). (and no objection to break the 24h rule) > I started this patch with reverting "8242425: JVMTI monitor operations > should use Thread-Local Handshakes". > And work my way forward. > > Thanks, Robbin > >> >> Thanks, >> David >> >>> Thanks, Robbin >>> >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> Issue: >>>>> https://bugs.openjdk.java.net/browse/JDK-8247248 >>>>> >>>>> Local testing of JDI/JVMTI and t1-5. >>>>> (no real crash so there is nothing to reproduce) >>>>> >>>>> Thanks, Robbin From alexey.menkov at oracle.com Wed Jun 10 20:05:01 2020 From: alexey.menkov at oracle.com (Alex Menkov) Date: Wed, 10 Jun 2020 13:05:01 -0700 Subject: RFR(S) : 8183040 : update jdk/test/lib/Platform.java to use NIO file API In-Reply-To: References: <8CA6F4A9-7B57-4B70-9087-D1ACBF561714@oracle.com> <9e159e51-0901-c0a3-2c36-42dbd6696dc9@oracle.com> Message-ID: <64f236d0-4ec1-8024-b3b9-e9f67141b4ad@oracle.com> Hi Igor, LGTM.
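For readers following the 8183040 thread, the empty-array check that Alex asked for and Igor agreed to add might look roughly like the sketch below. This is only an illustration under stated assumptions, not the code from webrev.0-1: the class name FirstByteReader, the method name firstByte and the error message are hypothetical. Only the use of Files.readAllBytes, the bb variable and the decision to throw an Error when the array is empty are taken from the thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not the code from the webrev.
final class FirstByteReader {
    private FirstByteReader() {}

    /**
     * Reads the whole file and returns its first byte.
     * An empty file is reported with an Error that names the file,
     * instead of letting bb[0] fail with an IndexOutOfBoundsException.
     */
    static byte firstByte(Path file) throws IOException {
        byte[] bb = Files.readAllBytes(file);
        if (bb.length == 0) {
            // Assumed message text; the thread only says the message should
            // make clear that something went completely wrong.
            throw new Error("Unexpected empty file: " + file);
        }
        return bb[0];
    }
}
```

With a check like this, an unexpectedly empty /proc/sys/kernel/yama/ptrace_scope would show up as an Error naming the file, which is easier to tell apart from a bug in the test itself.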
--alex On 06/10/2020 12:29, Igor Ignatyev wrote: > Hi Alex, > > sure, here is the incremental diff, so now Error is thrown if bb arrays are empty -- http://cr.openjdk.java.net/~iignatyev//8183040/webrev.0-1 > > Thanks, > -- Igor > >> On Jun 10, 2020, at 12:20 PM, Alex Menkov wrote: >> >> Hi Igor, >> >> On 06/09/2020 20:11, Igor Ignatyev wrote: >>> Hi Alex, >>> as far as I can see, the caller just rethrows IOException as RuntimeException, so I don't think throwing IndexOutOfBoundsException would be much different, albeit it will be a bit more cryptic. yet given the content of /proc/sys/kernel/yama/ptrace_scope and /sys/fs/selinux/booleans/deny_ptrace is part of linux kernel contract, I doubt we will encounter IIOOBE in any reasonable setups. however, if you want I can check the length of bb arrays at L#171 and L#190 and throw an Error w/ message suggesting that something went completely wrong. >> >> Yes, the test still fails in the case, but if I see IndexOutOfBoundsException (or something similar) as a test failure reason, my first thought that this is the test issue. >> Could you please add the checks. >> >> --alex >> >>> -- Igor >>>> On Jun 9, 2020, at 6:36 PM, Alex Menkov wrote: >>>> >>>> Hi Igor, >>>> >>>> In SATestUtils.java you do >>>> >>>> var bb = ... Files.readAllBytes(...) ... >>>> and then use bb[0] >>>> >>>> if the file has 0 length, old code throws EOFException and new one will throw IndexOutOfBoundsException. >>>> And looks like the caller doesn't expect it (it catches IOException). >>>> >>>> --alex >>>> >>>> On 06/09/2020 16:47, Igor Ignatyev wrote: >>>>> http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 >>>>>> >>>>>> 38 lines changed: 8 ins; 16 del; 14 mod; >>>>> Hi all, >>>>> could you please review this small clean up of testlibrary classes which updates j.t.lib.Platform and j.t.l.SA.SATestUtils (as it now contains the methods which 8183040 was about) to use NIO file API? 
>>>>> testing: test/hotspot/jtreg/serviceability >>>>> webrev: http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 >>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8183040 >>>>> Thanks, >>>>> -- Igor > From igor.ignatyev at oracle.com Wed Jun 10 20:19:51 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 10 Jun 2020 13:19:51 -0700 Subject: RFR(S) : 8183040 : update jdk/test/lib/Platform.java to use NIO file API In-Reply-To: <64f236d0-4ec1-8024-b3b9-e9f67141b4ad@oracle.com> References: <8CA6F4A9-7B57-4B70-9087-D1ACBF561714@oracle.com> <9e159e51-0901-c0a3-2c36-42dbd6696dc9@oracle.com> <64f236d0-4ec1-8024-b3b9-e9f67141b4ad@oracle.com> Message-ID: thanks Alex, pushed. -- Igor > On Jun 10, 2020, at 1:05 PM, Alex Menkov wrote: > > Hi Igor, > > LGTM. > > --alex > > On 06/10/2020 12:29, Igor Ignatyev wrote: >> Hi Alex, >> sure, here is the incremental diff, so now Error is thrown if bb arrays are empty -- http://cr.openjdk.java.net/~iignatyev//8183040/webrev.0-1 >> Thanks, >> -- Igor >>> On Jun 10, 2020, at 12:20 PM, Alex Menkov wrote: >>> >>> Hi Igor, >>> >>> On 06/09/2020 20:11, Igor Ignatyev wrote: >>>> Hi Alex, >>>> as far as I can see, the caller just rethrows IOException as RuntimeException, so I don't think throwing IndexOutOfBoundsException would be much different, albeit it will be a bit more cryptic. yet given the content of /proc/sys/kernel/yama/ptrace_scope and /sys/fs/selinux/booleans/deny_ptrace is part of linux kernel contract, I doubt we will encounter IIOOBE in any reasonable setups. however, if you want I can check the length of bb arrays at L#171 and L#190 and throw an Error w/ message suggesting that something went completely wrong. >>> >>> Yes, the test still fails in the case, but if I see IndexOutOfBoundsException (or something similar) as a test failure reason, my first thought that this is the test issue. >>> Could you please add the checks. 
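The check requested above could look roughly like the following. This is only a sketch under assumptions: the class name FirstByteCheck and the helper readFirstByte are hypothetical and are not taken from SATestUtils.java or from the webrev; the only parts that come from the thread are the use of Files.readAllBytes, the access to bb[0], and the idea of throwing an Error with a descriptive message when the array is empty.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; not the actual SATestUtils code.
public class FirstByteCheck {

    // Reads the whole file and returns its first byte.
    // An empty file is reported with an explicit Error instead of
    // letting bb[0] fail with an index-out-of-bounds exception.
    public static byte readFirstByte(Path file) throws IOException {
        byte[] bb = Files.readAllBytes(file);
        if (bb.length == 0) {
            throw new Error("Unexpected empty file: " + file);
        }
        return bb[0];
    }

    public static void main(String[] args) throws IOException {
        Path nonEmpty = Files.createTempFile("first-byte", ".txt");
        Path empty = Files.createTempFile("first-byte-empty", ".txt");
        try {
            Files.write(nonEmpty, new byte[] {'1', '\n'});
            System.out.println("first byte: " + (char) readFirstByte(nonEmpty));
            try {
                readFirstByte(empty);
            } catch (Error e) {
                System.out.println("caught: " + e.getMessage());
            }
        } finally {
            Files.deleteIfExists(nonEmpty);
            Files.deleteIfExists(empty);
        }
    }
}
```

With a check of this kind, a failure report names the file that was unexpectedly empty, which points at the environment rather than at the test code.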
>>> >>> --alex >>> >>>> -- Igor >>>>> On Jun 9, 2020, at 6:36 PM, Alex Menkov wrote: >>>>> >>>>> Hi Igor, >>>>> >>>>> In SATestUtils.java you do >>>>> >>>>> var bb = ... Files.readAllBytes(...) ... >>>>> and then use bb[0] >>>>> >>>>> if the file has 0 length, old code throws EOFException and new one will throw IndexOutOfBoundsException. >>>>> And looks like the caller doesn't expect it (it catches IOException). >>>>> >>>>> --alex >>>>> >>>>> On 06/09/2020 16:47, Igor Ignatyev wrote: >>>>>> http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 >>>>>>> >>>>>>> 38 lines changed: 8 ins; 16 del; 14 mod; >>>>>> Hi all, >>>>>> could you please review this small clean up of testlibrary classes which updates j.t.lib.Platform and j.t.l.SA.SATestUtils (as it now contains the methods which 8183040 was about) to use NIO file API? >>>>>> testing: test/hotspot/jtreg/serviceability >>>>>> webrev: http://cr.openjdk.java.net/~iignatyev//8183040/webrev.00 >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8183040 >>>>>> Thanks, >>>>>> -- Igor From daniel.daugherty at oracle.com Wed Jun 10 20:57:11 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Wed, 10 Jun 2020 16:57:11 -0400 Subject: RFR(XS): 8222005: ClassRedefinition crashes with: guarantee(false) failed: OLD and/or OBSOLETE method(s) found In-Reply-To: <3a497901-7a05-e87a-33e6-6f1011c32b8b@oracle.com> References: <5942b42c-b9b3-f1d4-6c13-774649fca32b@oracle.com> <2f9aa92c-18f5-1203-1523-3c1fd9ba9ad1@oracle.com> <52ba0f0f-a705-2043-1c1d-15ba4a441aba@oracle.com> <31ca58d7-99ac-c53d-461f-680461fb5698@oracle.com> <9b75fa4e-f579-e4a7-7996-bc307d001972@oracle.com> <3a497901-7a05-e87a-33e6-6f1011c32b8b@oracle.com> Message-ID: <6bb2733a-36f8-567b-e39f-30e2d5c3c962@oracle.com> Hi Serguei, Sorry for the late review... 
On 5/28/20 7:16 PM, serguei.spitsyn at oracle.com wrote: > Hi Coleen, > > The updated webrev version is: > http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.3/

src/hotspot/share/oops/cpCache.cpp
    nit - please update copyright year before you push

    L570:   log_trace(redefine, class, update, constantpool)
    L571:         ("cpc %s entry update: %s", entry_type, new_method->external_name());
        nit - The continued line indent on the other line you touched in this
              file is two spaces. This one is six... Not your bug, but can you
              fix it while you are here?

    L816:         ("cpcache check found old method entry: class: %s, old: %d, obsolete: %d, method: %s\n",
        nit - I don't think you want the "\n" here.

src/hotspot/share/oops/klassVtable.cpp
    L1004:         ("vtable check found old method entry: class: %s old: %d obsolete: %d, %s\n",
    L1319:         ("itable check found old method entry: class: %s old: %d obsolete: %d, %s\n",
        nit - I don't think you want the "\n" here.

        In the new log_trace() call in cpCache.cpp, you include a
        label for the method output:
        L816:         ("cpcache check found old method entry: class: %s, old: %d, obsolete: %d, method: %s\n",
        but you don't here. I think you should.

src/hotspot/share/prims/jvmtiRedefineClasses.cpp
    L74: // this flag is global as the constructor does not reset it
        nit - Please s/this/This/ and add a ':' to the end.

    old L3586:     if (!_has_null_class_loader && ik->class_loader() == NULL) {
    old L3587:       return;
        This optimization has been here for a long time! Thanks for the
        explanation in "3) Optimization based on the flag _has_null_class_loader"
        below... I'm probably the one that got that wrong so long ago...

    L3601:     // and needs cpchache method entries adjusted. For simplicity, the cpcache
        typo - s/cpchache/cpcache/

    old L3616:
    if (!ik->is_being_redefined()) {
        Nice explanation on L3599-3604 for why this optimization is
        not a good idea.

Thumbs up! I only have nits and typos above. I don't need to see another webrev.

Dan

> > It has your suggestions addressed: > - remove log_is_enabled conditions > - move ResourceMark's out of loops > > Thanks, > Serguei > > > On 5/28/20 14:44, serguei.spitsyn at oracle.com wrote: >> Hi Coleen, >> >> Thank you a lot for reviewing this! >> >> >> On 5/28/20 12:48, coleen.phillimore at oracle.com wrote: >>> Hi Serguei, >>> Sorry for the delay reviewing this again. >>> >>> On 5/18/20 3:30 AM, serguei.spitsyn at oracle.com wrote: >>>> Hi Coleen and potential reviewers, >>>> >>>> Now, the webrev: >>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.2/ >>>> >>>> has a complete fix for all three failure modes related to the >>>> guarantee about OLD and OBSOLETE methods. >>>> >>>> The root causes are the following optimizations: >>>> >>>> 1) Optimization based on the flag ik->is_being_redefined(): >>>> The problem is that the cpcache method entries of such classes >>>> are not being adjusted. >>>> It is explained below in the initial RFR summary. >>>> The fix is to get rid of this optimization. >>> >>> This seems like a good thing to do even though (actually especially >>> because) I can't re-imagine the logic that went into this optimization. >> >> Probably, I've not explained it well enough. >> The logic was that the class marked as is_being_redefined was >> considered as being redefined in the current redefinition operation. >> For classes redefined in the current redefinition the cpcache is empty, >> so there is nothing to adjust. >> The problem is that classes can be marked as is_being_redefined by >> doit_prologue of one of the following redefinition operations. >> In such a case, the VM_RedefineClasses::CheckClass::do_klass fails >> with this guarantee.
>> It is because the VM_RedefineClasses::CheckClass::do_klass does not >> have this optimization >> and does not skip such classes as the >> VM_RedefineClasses::AdjustAndCleanMetadata::do_klass does. >> Without this catch, this issue could have unknown consequences in the >> future execution far away from the root cause. >> >>>> >>>> 2) Optimization for array classes based on the flag >>>> _has_redefined_Object. >>>> The problem is that the vtable method entries are not adjusted >>>> for array classes. >>>> The array classes have to be adjusted even if >>>> java.lang.Object was redefined >>>> by one of the previous VM_RedefineClasses operations, not only if it >>>> was redefined in >>>> the current VM_RedefineClasses operation. The fix is to follow >>>> this requirement. >>> >>> This I can't understand. The redefinitions are serialized in >>> safepoints, so why would you need to replace vtable entries for >>> arrays if java.lang.Object isn't redefined in this safepoint? >> The VM_RedefineClasses::CheckClass::do_klass fails with the same >> guarantee because of this. >> It never fails this way with this optimization relaxed. >> I've already broken my head trying to understand it. >> It can be because of another bug we don't know about yet. >> >>>> >>>> 3) Optimization based on the flag _has_null_class_loader which >>>> assumes that the Hotspot >>>> does not support delegation from the bootstrap class loader to >>>> a user-defined class >>>> loader. The assumption is that if the current class being >>>> redefined has a user-defined >>>> classloader as its defining class loader, then all classes >>>> loaded by the bootstrap >>>> class loader can be skipped for vtable/itable method entries >>>> adjustment. >>>> The problem is that this assumption is not really correct. >>>> There are classes that >>>> still need the adjustment. For instance, the class >>>> java.util.IdentityHashMap$KeyIterator >>>>
loaded by the bootstrap class loader has the vtable/itable >>>> references to the method: >>>> java.util.Iterator.forEachRemaining(java.util.function.Consumer) >>>> The class java.util.Iterator is defined by a user-defined class >>>> loader. >>>> The fix is to get rid of this optimization. >>> >>> Also with this optimization, I'm not sure what the logic was that >>> determined that this was safe, so it's best to remove it. Above >>> makes sense. >> >> I don't know the full theory behind this optimization. We only have a >> comment. >> >> >>>> All three failure modes are observed with the -Xcomp flag. >>>> With all three fixes above in place, the Kitchensink does not fail >>>> with this guarantee anymore. >>> >>> >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.2/src/hotspot/share/oops/cpCache.cpp.udiff.html >>> >>> For logging, the log_trace function will also repeat the 'if' >>> statement and not allocate the external_name() if logging isn't >>> specified, so you don't need the 'if' statement above. >>> >>> + if (log_is_enabled(Trace, redefine, class, update)) { >>> + log_trace(redefine, class, update, constantpool) >>> + ("cpc %s entry update: %s", entry_type, new_method->external_name()); >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.2/src/hotspot/share/oops/klassVtable.cpp.udiff.html >>> >>> Same in two cases here, and you could move the ResourceMark outside >>> the loop at the top. >> >> Good suggestions, taken. >> >> Thanks! >> Serguei >> >>> >>> Thanks, >>> Coleen >>>> >>>> There is still a JIT compiler related failure: >>>> https://bugs.openjdk.java.net/browse/JDK-8245128 >>>> Kitchensink fails with: assert(destination == (address)-1 || >>>> destination == entry) failed: b) MT-unsafe modification of inline cache >>>> >>>> I also saw this failure but just once: >>>> https://bugs.openjdk.java.net/browse/JDK-8245126 >>>>
Kitchensink fails with: assert(!method->is_old()) failed: >>>> Should not be installing old methods >>>> >>>> Thanks, >>>> Serguei >>>> >>>> >>>> On 5/15/20 15:14, serguei.spitsyn at oracle.com wrote: >>>>> Hi Coleen, >>>>> >>>>> Thanks a lot for review! >>>>> Good suggestion, will use it. >>>>> >>>>> In fact, I've found two more related problems with the same guarantee. >>>>> One is with vtable method entries adjustment and another with itable. >>>>> This webrev version includes a fix for the vtable related issue: >>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.2/ >>>>> >>>>> I'm still investigating the itable related issue. >>>>> >>>>> It is interesting that the Kitchensink with Instrumentation >>>>> modules enabled is like a Pandora box full of surprises. >>>>> New problems are getting discovered after some road blocks are >>>>> removed. >>>>> I've just filed a couple of compiler bugs discovered in this mode >>>>> of testing: >>>>> https://bugs.openjdk.java.net/browse/JDK-8245126 >>>>> Kitchensink fails with: assert(!method->is_old()) failed: >>>>> Should not be installing old methods >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8245128 >>>>> Kitchensink fails with: assert(destination == (address)-1 || >>>>> destination == entry) failed: b) MT-unsafe modification of inline >>>>> cache >>>>> >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>> >>>>> On 5/15/20 05:12, coleen.phillimore at oracle.com wrote: >>>>>> >>>>>> Serguei, >>>>>> >>>>>> Good find!! The fix looks good. I'm sure the optimization >>>>>> wasn't noticeable and thank you for the additional comments. >>>>>> >>>>>> There is a Method::external_name() function that I believe prints >>>>>> all the things you want in the logging here: >>>>>> >>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.1/src/hotspot/share/oops/cpCache.cpp.udiff.html >>>>>> >>>>>> I don't need to see another webrev if you make this change.
>>>>>> >>>>>> Thanks, >>>>>> Coleen >>>>>> >>>>>> On 5/14/20 12:26 PM, serguei.spitsyn at oracle.com wrote: >>>>>>> Please, review a fix for the Kitchensink bug: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8222005 >>>>>>> >>>>>>> Webrev: >>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/jvmti-redef.1/ >>>>>>> >>>>>>> Summary: >>>>>>> The VM_RedefineClasses::doit() uses two helper classes to walk >>>>>>> all VM classes. >>>>>>> First is AdjustAndCleanMetadata to adjust method entries in >>>>>>> the vtables/itables/cpcaches. >>>>>>> Second is CheckClass to check that adjustments for all method >>>>>>> entries are correct. >>>>>>> The Kitchensink test is failing in two modes: >>>>>>> - guarantee(false) failed: OLD and/or OBSOLETE method(s) >>>>>>> found in the >>>>>>> VM_RedefineClasses::CheckClass::do_klass() >>>>>>> - SIGSEGV in the >>>>>>> ConstantPoolCacheEntry::get_interesting_method_entry() in context >>>>>>> of VM_RedefineClasses::CheckClass::do_klass() execution >>>>>>> >>>>>>> The second failure mode is rare. It is before the first one in >>>>>>> the code path. >>>>>>> The root cause of both is that the >>>>>>> VM_RedefineClasses::AdjustAndCleanMetadata::do_klass() >>>>>>> is skipping the cpcache update for classes that are being >>>>>>> redefined assuming they are >>>>>>> being redefined by the current VM_RedefineClasses operation. >>>>>>> In such cases, the adjustment >>>>>>> is not needed as the cpcache is empty. The problem is that the >>>>>>> assumption above is wrong. >>>>>>> The class can also be redefined by another VM_RedefineClasses >>>>>>> operation which has already >>>>>>> executed its doit_prologue. The cpcache adjustment for such >>>>>>> a class is necessary. >>>>>>> The fix is to always call the >>>>>>> cp_cache->adjust_method_entries() even if the class is >>>>>>> being redefined by the current VM_RedefineClasses operation. >>>>>>> It is possible to skip it >>>>>>>
but it will add extra complexity to the code. >>>>>>> The fix also includes a minor tweak in cpCache.cpp to >>>>>>> include the method's class name in >>>>>>> the redefinition cpcache log. >>>>>>> >>>>>>> Testing: >>>>>>> Ran Kitchensink test locally on a Linux server with the >>>>>>> Instrumentation module enabled. >>>>>>> The test does not fail anymore. >>>>>>> In progress: mach5 tiers 1-5 runs and a separate mach5 >>>>>>> Kitchensink run. >>>>>>> >>>>>>> Thanks, >>>>>>> Serguei >>>>>> >>>>> >>>> >>> >> > From ralf.schmelter at sap.com Wed Jun 10 21:00:15 2020 From: ralf.schmelter at sap.com (Schmelter, Ralf) Date: Wed, 10 Jun 2020 21:00:15 +0000 Subject: RFR(S): JDK-8247362 HeapDumpCompressedTest.java#id0 fails due to "Multiple garbage collectors selected" Message-ID: Hi, https://bugs.openjdk.java.net/browse/JDK-8237354 added a test, which did not properly protect against explicitly set GCs (for serial, parallel and G1 GC). This fixes it by adding the corresponding @requires tag for each of the three GCs. bugreport: https://bugs.openjdk.java.net/browse/JDK-8247362 webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8247362/webrev.0/ Best regards, Ralf From daniel.daugherty at oracle.com Wed Jun 10 21:06:43 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Wed, 10 Jun 2020 17:06:43 -0400 Subject: RFR(S): JDK-8247362 HeapDumpCompressedTest.java#id0 fails due to "Multiple garbage collectors selected" In-Reply-To: References: Message-ID: <708c0e2c-7838-0cfb-ea1b-1de5ae43a830@oracle.com> Hi Ralf, This looks correct to me, but please wait for one of the GC folks to chime in on this thread...
Dan On 6/10/20 5:00 PM, Schmelter, Ralf wrote: > Hi, > > https://bugs.openjdk.java.net/browse/JDK-8237354 added a test, which did not properly protect against explicitly set GCs (for serial, parallel and G1 GC). This fixes it by adding the corresponding @requires tag for each of the three GCs. > > bugreport: https://bugs.openjdk.java.net/browse/JDK-8247362 > webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8247362/webrev.0/ > > Best regards, > Ralf > From serguei.spitsyn at oracle.com Wed Jun 10 21:20:01 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 10 Jun 2020 14:20:01 -0700 Subject: RFR(XS): 8222005: ClassRedefinition crashes with: guarantee(false) failed: OLD and/or OBSOLETE method(s) found In-Reply-To: <6bb2733a-36f8-567b-e39f-30e2d5c3c962@oracle.com> References: <5942b42c-b9b3-f1d4-6c13-774649fca32b@oracle.com> <2f9aa92c-18f5-1203-1523-3c1fd9ba9ad1@oracle.com> <52ba0f0f-a705-2043-1c1d-15ba4a441aba@oracle.com> <31ca58d7-99ac-c53d-461f-680461fb5698@oracle.com> <9b75fa4e-f579-e4a7-7996-bc307d001972@oracle.com> <3a497901-7a05-e87a-33e6-6f1011c32b8b@oracle.com> <6bb2733a-36f8-567b-e39f-30e2d5c3c962@oracle.com> Message-ID: <0e64c496-75e1-db75-3722-16af99366565@oracle.com> An HTML attachment was scrubbed... URL: From stefan.karlsson at oracle.com Wed Jun 10 21:32:35 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 10 Jun 2020 23:32:35 +0200 Subject: RFR(S): JDK-8247362 HeapDumpCompressedTest.java#id0 fails due to "Multiple garbage collectors selected" In-Reply-To: References: Message-ID: Looks good. StefanK On 2020-06-10 23:00, Schmelter, Ralf wrote: > > Hi, > > https://bugs.openjdk.java.net/browse/JDK-8237354 added a test, which > did not properly protect against explicitly set GCs (for serial, > parallel and G1 GC). This fixes it by adding the corresponding > @requires tag for each of the three GCs. 
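For readers not familiar with jtreg, the kind of guard described here might look like the sketch below. It is an assumption-laden illustration, not the content of the webrev: the class name RequiresGcSketch is hypothetical, and the three test descriptions are simplified. The vm.gc.Serial, vm.gc.Parallel and vm.gc.G1 properties are, as far as I know, the ones the JDK test framework provides for this purpose; each is expected to be false when a different collector has been selected explicitly, so the conflicting variant is skipped instead of failing with "Multiple garbage collectors selected".

```java
/*
 * @test
 * @requires vm.gc.Serial
 * @run main/othervm -XX:+UseSerialGC RequiresGcSketch
 */

/*
 * @test
 * @requires vm.gc.Parallel
 * @run main/othervm -XX:+UseParallelGC RequiresGcSketch
 */

/*
 * @test
 * @requires vm.gc.G1
 * @run main/othervm -XX:+UseG1GC RequiresGcSketch
 */

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical test body: it only reports which collectors are active,
// so each variant shows the GC it was actually started with.
public class RequiresGcSketch {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("active collector: " + gc.getName());
        }
    }
}
```

Without the @requires lines, running the whole suite with an explicit collector flag would add a second, conflicting GC flag to two of the three variants.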
> > bugreport: https://bugs.openjdk.java.net/browse/JDK-8247362 > > > webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8247362/webrev.0/ > > Best regards, > > Ralf > From ralf.schmelter at sap.com Wed Jun 10 21:38:33 2020 From: ralf.schmelter at sap.com (Schmelter, Ralf) Date: Wed, 10 Jun 2020 21:38:33 +0000 Subject: RFR(S): JDK-8247362 HeapDumpCompressedTest.java#id0 fails due to "Multiple garbage collectors selected" In-Reply-To: References: Message-ID: Hi Stefan and Daniel, Thanks for reviewing. I will push this change if there are no further concerns. Best regards, Ralf -----Original Message----- From: Stefan Karlsson Sent: Wednesday, 10 June 2020 23:33 To: Schmelter, Ralf ; serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net runtime ; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(S): JDK-8247362 HeapDumpCompressedTest.java#id0 fails due to "Multiple garbage collectors selected" Looks good. StefanK On 2020-06-10 23:00, Schmelter, Ralf wrote: > > Hi, > > https://bugs.openjdk.java.net/browse/JDK-8237354 added a test, which > did not properly protect against explicitly set GCs (for serial, > parallel and G1 GC). This fixes it by adding the corresponding > @requires tag for each of the three GCs. > > bugreport: https://bugs.openjdk.java.net/browse/JDK-8247362 > > > webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8247362/webrev.0/ > > Best regards, > > Ralf > From suenaga at oss.nttdata.com Thu Jun 11 01:36:38 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Thu, 11 Jun 2020 10:36:38 +0900 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. In-Reply-To: References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> Message-ID: <5eb74cf3-1e04-587e-e564-63f43998b2d3@oss.nttdata.com> Hi Robbin, Thanks for catch up this issue. It looks good to me. 
Yasumasa On 2020/06/10 22:57, Robbin Ehn wrote: > Hi David and Serguei, (Dan feel free to chime in) > >> Honestly I think I'd like to see things reverted to the use of calling_thread as done for the VMOperation previously. We know it is functionally correct and it should also have the same performance profile. > > Done: > http://cr.openjdk.java.net/~rehn/8247248/v2/webrev/ > > Passes: hotspot jdi/jvmti testing, running mach5. > > I'll push tomorrow morning if test is ok and you all are happy (+- nits). (and no objection to break the 24h rule) > I started this patch with reverting "8242425: JVMTI monitor operations should use Thread-Local Handshakes". > And work my way forward. > > Thanks, Robbin > >> >> Thanks, >> David >> >>> Thanks, Robbin >>> >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> Issue: >>>>> https://bugs.openjdk.java.net/browse/JDK-8247248 >>>>> >>>>> Local testing of JDI/JVMTI and t1-5. >>>>> (no real crash so there is nothing to reproduce) >>>>> >>>>> Thanks, Robbin From david.holmes at oracle.com Thu Jun 11 02:01:23 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 11 Jun 2020 12:01:23 +1000 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. In-Reply-To: References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> Message-ID: Looks good! Thanks for making the change. On a positive note I think this would now show that the conversion from VMop to direct handshake was actually much simpler than might have been thought. :) I'm not sure when the repo fork is happening, so am unclear whether this will head to the right repo in time. :) Thanks, David ----- On 10/06/2020 11:57 pm, Robbin Ehn wrote: > Hi David and Serguei, (Dan feel free to chime in) > >> Honestly I think I'd like to see things reverted to the use of >> calling_thread as done for the VMOperation previously. 
We know it is >> functionally correct and it should also have the same performance >> profile. > > Done: > http://cr.openjdk.java.net/~rehn/8247248/v2/webrev/ > > Passes: hotspot jdi/jvmti testing, running mach5. > > I'll push tomorrow morning if test is ok and you all are happy (+- > nits). (and no objection to break the 24h rule) > I started this patch with reverting "8242425: JVMTI monitor operations > should use Thread-Local Handshakes". > And work my way forward. > > Thanks, Robbin > >> >> Thanks, >> David >> >>> Thanks, Robbin >>> >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> Issue: >>>>> https://bugs.openjdk.java.net/browse/JDK-8247248 >>>>> >>>>> Local testing of JDI/JVMTI and t1-5. >>>>> (no real crash so there is nothing to reproduce) >>>>> >>>>> Thanks, Robbin From fairoz.matte at oracle.com Thu Jun 11 04:59:00 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Wed, 10 Jun 2020 21:59:00 -0700 (PDT) Subject: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect and corresponsing logic seems to be broken In-Reply-To: References: <6fdf54ba-1edf-4dbf-b21d-2d799a13ffda@default> <9D5BA6C8-8D49-47CE-9B2D-C7E0C6A9723F@oracle.com> <6e55941c-142b-df7b-f6df-c87aacb0e4b3@oracle.com> <0e88ac93-83f1-abdc-02e3-191e1c5026aa@oracle.com> <5b83a8b3-f877-4f72-9188-97e3de6f986b@default> <18e88a44-8d62-8752-834c-dce436e8ebc4@oracle.com> <5adafff1-0da9-8bf2-4b27-5b00b5c48526@oracle.com> <42fdf66e-b097-47e4-9062-391e9b43968c@default> <85de2bb0-94a0-5dbd-f29f-0a9c96f12579@oracle.com> <5c50b83d-964a-d74d-d7d7-77d7b348d533@oracle.com> <3fe586cf-d485-4738-a432-92d3d9aa52da@default> <7aceddb3-1984-32b8-d0ae-8054a59dbcee@oracle.com> <073df252-52d4-3c1b-ccfd-82fae69e363c@oracle.com> Message-ID: Thanks Serguei, Leonid for the reviews. 
Thanks, Fairoz > -----Original Message----- > From: Leonid Mesnik > Sent: Wednesday, June 10, 2020 10:19 PM > To: Fairoz Matte > Cc: Serguei Spitsyn ; Erik Gahlin > ; serviceability-dev at openjdk.java.net > Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() is incorrect > and corresponsing logic seems to be broken > > Looks good, no other webrev is needed. > > Leonid > > On 6/10/20 12:28 AM, serguei.spitsyn at oracle.com wrote: > > > > On 6/9/20 23:35, Fairoz Matte wrote: > >> Hi Serguei, > >> > >> Thanks for the clarification. > >> I will work on moving the isJFRActive() method from the > >> TestDebuggerType2 to HeapWalkingDebugger > > > > Probably, there is no need for another webrev if you move it. > > But you did not get a final thumbs up from Leonid yet. > > > > Thanks, > > Serguei > > > >> Thanks, > >> Fairoz > >> > >>> -----Original Message----- > >>> From: Serguei Spitsyn > >>> Sent: Wednesday, June 10, 2020 11:42 AM > >>> To: Fairoz Matte ; Leonid Mesnik > >>> ; Erik Gahlin > >>> Cc: serviceability-dev at openjdk.java.net > >>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() > >>> is incorrect > >>> and corresponsing logic seems to be broken > >>> > >>> Hi Fairoz, > >>> > >>> It is confusing that there are methods with the same name isJFRActive on > >>> both the debuggee and the debugger side. > >>> Leonid is talking about the isJFRActive that belongs to the debugger. > >>> He suggests moving this method from the TestDebuggerType2 to > >>> HeapWalkingDebugger. > >>> The reason is that the HeapWalkingDebugger should have knowledge about > >>> the HeapWalkingDebuggee, not its super class TestDebuggerType2. > >>> It looks like a good suggestion to me. > >>> > >>> Thanks, > >>> Serguei > >>> > >>> > >>> On 6/9/20 23:00, Fairoz Matte wrote: > >>>> Hi Leonid, > >>>> > >>>> The call isJFRActive() needs to be executed on the HeapwalkingDebuggee > >>>> side. > >>>> This is what my understanding is.
> >>>> > >>>> Thanks, > >>>> Fairoz > >>>> > >>>>> -----Original Message----- > >>>>> From: Leonid Mesnik > >>>>> Sent: Wednesday, June 10, 2020 1:16 AM > >>>>> To: Serguei Spitsyn ; Fairoz Matte > >>>>> ; Erik Gahlin > >>>>> Cc: serviceability-dev at openjdk.java.net > >>>>> Subject: Re: RFR(s): 8243451: nsk.share.jdi.Debugee.isJFR_active() > >>>>> is incorrect and corresponsing logic seems to be broken > >>>>> > >>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.09/test/hotspot/ > >>>>> jtr eg/vmT estbase/nsk/share/jdi/TestDebuggerType2.java.udiff.html > >>>>> > >>>>> I see that isJFRActive() depends on > >>>>> "nsk.share.jdi.HeapwalkingDebuggee". > >>>>> It is not going to work of debugee is not > >>> "nsk.share.jdi.HeapwalkingDebuggee". > >>>>> Shouldn't it be placed in HeapWalkingDebugger? > >>>>> > >>>>> Leonid > >>>>> > >>>>> On 6/8/20 9:26 PM, serguei.spitsyn at oracle.com wrote: > >>>>>> Hi Fairoz, > >>>>>> > >>>>>> LGTM. > >>>>>> > >>>>>> Thanks, > >>>>>> Serguei > >>>>>> > >>>>>> > >>>>>> On 6/8/20 21:20, Fairoz Matte wrote: > >>>>>>> Hi Serguei, > >>>>>>> > >>>>>>> Thanks for the clarifications, > >>>>>>> I have incorporated the 2nd suggestion, below is the webrev, > >>>>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.09/ > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Fairoz > >>>>>>> > >>>>>>> From: Serguei Spitsyn > >>>>>>> Sent: Monday, June 8, 2020 10:34 PM > >>>>>>> To: Fairoz Matte ; Erik Gahlin > >>>>>>> > >>>>>>> Cc: serviceability-dev at openjdk.java.net > >>>>>>> Subject: Re: RFR(s): 8243451: > >>>>>>> nsk.share.jdi.Debugee.isJFR_active() > >>>>>>> is incorrect and corresponsing logic seems to be broken > >>>>>>> > >>>>>>> Hi Fairoz, > >>>>>>> > >>>>>>> > >>>>>>> On 6/8/20 02:08, mailto:serguei.spitsyn at oracle.com wrote: > >>>>>>> Hi Fairoz, > >>>>>>> > >>>>>>> There are two different isJFRActive() methods, one is on > >>>>>>> debuggee side and another on the debugger side. 
> >>>>>>> The one on debuggee side is better to keep in Debuggee.java > >>>>>>> (where it was before) instead of moving it to > HeapwalkingDebuggee.java. > >>>>>>> It is okay to keep the call to it in the HeapwalkingDebuggee.java. > >>>>>>> > >>>>>>> Please, skip this suggestion as Debugger.java is not one of > >>>>>>> supers of HeapwalkingDebuggee.java as I've assumed. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Serguei > >>>>>>> > >>>>>>>
> >>>>>>> +    protected boolean isJFRActive() {
> >>>>>>> +        boolean isJFRActive = false;
> >>>>>>> +        ReferenceType referenceType = debuggee.classByName("nsk.share.jdi.HeapwalkingDebuggee");
> >>>>>>> +        if (referenceType == null)
> >>>>>>> +           throw new RuntimeException("Debugeee is not initialized yet");
> >>>>>>> +
> >>>>>>> +        Field isJFRActiveFld = referenceType.fieldByName("isJFRActive");
> >>>>>>> +        isJFRActive = ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value();
> >>>>>>> +        return isJFRActive;
> >>>>>>>       }
> >>>>>>> It is better to remove the line:
> >>>>>>> +        boolean isJFRActive = false;
> >>>>>>> and just change this one:
> >>>>>>> +        boolean isJFRActive = ((BooleanValue)referenceType.getValue(isJFRActiveFld)).value();
> >>>>>>> > >>>>>>> Otherwise, it looks good to me. > >>>>>>> I hope, it really works now. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Serguei > >>>>>>> > >>>>>>> On 6/8/20 00:26, Fairoz Matte wrote: > >>>>>>> Hi Serguei, Erik, > >>>>>>> Thanks for the reviews, > >>>>>>> Below webrev contains the suggested changes, > >>>>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.08/ > >>>>>>> The only thing I couldn't do is to keep the local copy of > >>>>>>> isJFRActive() in HeapwalkingDebugger, The method is called in > >>>>>>> debugee code. > >>>>>>> In debugger, we have access to debugee before test started or > >>>>>>> after test completes.
> >>>>>>> isJFRActive() method need to be executed during the test execution. > >>>>>>> Hence I didn't find place to initialize and cannot make local copy. > >>>>>>> Thanks, > >>>>>>> Fairoz > >>>>>>> From: Serguei Spitsyn > >>>>>>> Sent: Tuesday, June 2, 2020 7:57 AM > >>>>>>> To: Fairoz Matte mailto:fairoz.matte at oracle.com; Erik Gahlin > >>>>>>> mailto:erik.gahlin at oracle.com > >>>>>>> Cc: mailto:serviceability-dev at openjdk.java.net > >>>>>>> Subject: Re: RFR(s): 8243451: > >>>>>>> nsk.share.jdi.Debugee.isJFR_active() > >>>>>>> is incorrect and corresponsing logic seems to be broken > >>>>>>> On 6/1/20 12:30, mailto:serguei.spitsyn at oracle.com wrote: > >>>>>>> Hi Fairoz, > >>>>>>> > >>>>>>> It looks okay in general. > >>>>>>> But I'm not sure this check is going to work. > >>>>>>> The problem is the HeapwalkingDebuggee.useStrictCheck method is > >>>>>>> invoked in the context of the HeapwalkingDebugger process, not > >>>>>>> the HeapwalkingDebuggee process. > >>>>>>> > >>>>>>> Probably, you wanted to get this bit of information from the > >>>>>>> Debuggee process. > >>>>>>> The debuggee has to evaluate it itself and store in some field. > >>>>>>> The debugger should use the JDI to get this value from the > >>>>>>> debuggee. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Serguei > >>>>>>> > >>>>>>> I'm not sure, what exactly you wanted to do here. > >>>>>>> It can occasionally work for you as long as both processes are > >>>>>>> run with the same options. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Serguei > >>>>>>> > >>>>>>> > >>>>>>> On 6/1/20 08:52, Fairoz Matte wrote: > >>>>>>> Hi Erik, > >>>>>>> Thanks for the review, below is the updated webrev. > >>>>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.02/ > >>>>>>> Thanks, > >>>>>>> Fairoz > >>>>>>>
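Serguei's point in the quoted mail, that the debuggee should evaluate the JFR state itself and store it in a field for the debugger to read over JDI, can be sketched on the debuggee side as follows. This is only an illustration under assumptions: the class name JfrFlagDebuggeeSketch is hypothetical, and computing the flag once in a static initializer is a simplification (the thread notes that the value has to be evaluated while the test runs). The field name isJFRActive matches the name looked up with fieldByName in the quoted patch, and jdk.jfr.FlightRecorder.isInitialized() is called reflectively so the class still works on builds without the JFR module.

```java
import java.lang.reflect.Method;

// Hypothetical debuggee-side sketch; not the nsk.share.jdi.HeapwalkingDebuggee code.
public class JfrFlagDebuggeeSketch {

    // A debugger could read this static field through JDI
    // (ReferenceType.fieldByName + ReferenceType.getValue).
    public static boolean isJFRActive = isJfrInitialized();

    // Asks jdk.jfr.FlightRecorder.isInitialized() via reflection.
    // Returns false when the JFR classes are not available.
    public static boolean isJfrInitialized() {
        try {
            Class<?> clazz = Class.forName("jdk.jfr.FlightRecorder");
            Method method = clazz.getMethod("isInitialized");
            return (Boolean) method.invoke(null);
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("isJFRActive = " + isJFRActive);
    }
}
```

Keeping the evaluation in the debuggee process avoids the problem described above, where the check ran in the debugger process and only matched by accident when both processes used the same options.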
> >>>>>>> -----Original Message-----
> >>>>>>> From: Erik Gahlin
> >>>>>>> Sent: Monday, June 1, 2020 4:26 PM
> >>>>>>> To: Fairoz Matte mailto:fairoz.matte at oracle.com
> >>>>>>> Cc: mailto:serviceability-dev at openjdk.java.net
> >>>>>>> Subject: Re: RFR(s): 8243451:
> >>>>>>> nsk.share.jdi.Debugee.isJFR_active()
> >>>>>>> is incorrect and corresponsing logic seems to be broken
> >>>>>>>
> >>>>>>> Hi Fairoz,
> >>>>>>>
> >>>>>>> What I think you need to do is something like this:
> >>>>>>>
> >>>>>>>         if (className.equals("java.lang.Thread")) {
> >>>>>>>             return !isJfrInitialized();
> >>>>>>>         }
> >>>>>>>     ...
> >>>>>>>
> >>>>>>>     private static boolean isJfrInitialized() {
> >>>>>>>         try {
> >>>>>>>             Class clazz = Class.forName("jdk.jfr.FlightRecorder");
> >>>>>>>             Method method = clazz.getDeclaredMethod("isInitialized", new Class[0]);
> >>>>>>>             return (boolean) method.invoke(null, new Object[0]);
> >>>>>>>         } catch (Exception e) {
> >>>>>>>             return false;
> >>>>>>>         }
> >>>>>>>     }
> >>>>>>>
> >>>>>>> Erik
> >>>>>>>
> >>>>>>> On 2020-06-01 12:30, Fairoz Matte wrote:
> >>>>>>> Hi Erik,
> >>>>>>>
> >>>>>>> Thanks for your quick response, Below is the updated webrev
> >>>>>>> to handle if jfr module is not present
> >>>>>>> http://cr.openjdk.java.net/~fmatte/8243451/webrev.01/
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Fairoz
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Erik Gahlin
> >>>>>>> Sent: Monday, June 1, 2020 2:31 PM
> >>>>>>> To: Fairoz Matte mailto:fairoz.matte at oracle.com
> >>>>>>> Cc: mailto:serviceability-dev at openjdk.java.net
> >>>>>>> Subject: Re: RFR(s): 8243451:
> >>>>>>> nsk.share.jdi.Debugee.isJFR_active()
> >>>>>>> is incorrect and corresponsing logic seems to be broken
> >>>>>>>
> >>>>>>> Hi Fairoz,
> >>>>>>>
> >>>>>>> If the test needs to run with builds where the JFR module is
> >>>>>>> not present(?), you need to do the check using reflection.
> >>>>>>>
> >>>>>>> If not, looks good.
> >>>>>>>
> >>>>>>> Erik
> >>>>>>>
> >>>>>>> On 1 Jun 2020, at 10:27, Fairoz Matte
> >>>>>>> mailto:fairoz.matte at oracle.com wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Please review this small test infra change to identify at
> >>>>>>> runtime the JFR is active or not.
> >>>>>>> JBS - https://bugs.openjdk.java.net/browse/JDK-8243451
> >>>>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8243451/webrev.00/
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Fairoz
> >>>>>>>
> >

From leonid.mesnik at oracle.com Thu Jun 11 05:28:59 2020
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Wed, 10 Jun 2020 22:28:59 -0700
Subject: RFR: 8244965: Incorrect error message in vmTestbase/nsk/jdi/VirtualMachine/suspend/suspend001/TestDescription.java
Message-ID:

Hi

Could you please review the following trivial fix, which just corrects the error message (thread name) in a couple of tests.
I grepped other jdi tests but didn't find similar issues.

webrev: http://cr.openjdk.java.net/~lmesnik/8244965/webrev.00/
bug: https://bugs.openjdk.java.net/browse/JDK-8244965

Leonid

From david.holmes at oracle.com Thu Jun 11 05:44:26 2020
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 11 Jun 2020 15:44:26 +1000
Subject: RFR: 8244965: Incorrect error message in vmTestbase/nsk/jdi/VirtualMachine/suspend/suspend001/TestDescription.java
In-Reply-To:
References:
Message-ID: <4f37204f-f26f-3cb3-d622-5d0635ec5666@oracle.com>

Hi Leonid,

Looks good and trivial. Don't forget to update copyright year.

Thanks,
David

On 11/06/2020 3:28 pm, Leonid Mesnik wrote:
> Hi
>
> Could you please review the following trivial fix, which just corrects the error
> message (thread name) in a couple of tests.
> I grepped other jdi tests but didn't find similar issues.
>
> webrev: http://cr.openjdk.java.net/~lmesnik/8244965/webrev.00/
> bug: https://bugs.openjdk.java.net/browse/JDK-8244965
>
> Leonid

From robbin.ehn at oracle.com Thu Jun 11 07:35:45 2020
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 11 Jun 2020 09:35:45 +0200
Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes.
In-Reply-To:
References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com>
Message-ID: <16fff639-175b-2a6c-3578-7c4b116d320b@oracle.com>

Hi Dan,

fixed, thanks!

/Robbin

On 2020-06-10 21:59, Daniel D. Daugherty wrote:
> On 6/10/20 9:57 AM, Robbin Ehn wrote:
>> Hi David and Serguei, (Dan feel free to chime in)
>>
>>> Honestly I think I'd like to see things reverted to the use of calling_thread as done for the VMOperation previously.
>>> We know it is functionally correct and it should also have the same performance profile.
>>
>> Done:
>> http://cr.openjdk.java.net/~rehn/8247248/v2/webrev/
>
> src/hotspot/share/prims/jvmtiEnvBase.hpp
>     No comments.
>
> src/hotspot/share/prims/jvmtiEnvBase.cpp
>     No comments.
>
> src/hotspot/share/prims/jvmtiEnv.cpp
>     L1248:   JavaThread* calling_thread  = JavaThread::current();
>     L1296:   JavaThread* calling_thread  = JavaThread::current();
>         nit - please delete extra space before '='.
>
> Thumbs up. I like the switch back to use of calling_thread. Thanks!
>
> Dan
>
>
>>
>> Passes: hotspot jdi/jvmti testing, running mach5.
>>
>> I'll push tomorrow morning if test is ok and you all are happy (+- nits). (and no objection to break the 24h rule)
>> I started this patch with reverting "8242425: JVMTI monitor operations should use Thread-Local Handshakes".
>> And work my way forward.
>> >> Thanks, Robbin >> >>> >>> Thanks, >>> David >>> >>>> Thanks, Robbin >>>> >>>>> >>>>> Thanks, >>>>> David >>>>> ----- >>>>> >>>>>> Issue: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8247248 >>>>>> >>>>>> Local testing of JDI/JVMTI and t1-5. >>>>>> (no real crash so there is nothing to reproduce) >>>>>> >>>>>> Thanks, Robbin > From robbin.ehn at oracle.com Thu Jun 11 07:36:31 2020 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Thu, 11 Jun 2020 09:36:31 +0200 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. In-Reply-To: <5eb74cf3-1e04-587e-e564-63f43998b2d3@oss.nttdata.com> References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> <5eb74cf3-1e04-587e-e564-63f43998b2d3@oss.nttdata.com> Message-ID: Thanks Yasumasa! /Robbin On 2020-06-11 03:36, Yasumasa Suenaga wrote: > Hi Robbin, > > Thanks for catch up this issue. > It looks good to me. > > Yasumasa > > > On 2020/06/10 22:57, Robbin Ehn wrote: >> Hi David and Serguei, (Dan feel free to chime in) >> >>> Honestly I think I'd like to see things reverted to the use of calling_thread as done for the VMOperation previously. >>> We know it is functionally correct and it should also have the same performance profile. >> >> Done: >> http://cr.openjdk.java.net/~rehn/8247248/v2/webrev/ >> >> Passes: hotspot jdi/jvmti testing, running mach5. >> >> I'll push tomorrow morning if test is ok and you all are happy (+- nits). (and no objection to break the 24h rule) >> I started this patch with reverting "8242425: JVMTI monitor operations should use Thread-Local Handshakes". >> And work my way forward. >> >> Thanks, Robbin >> >>> >>> Thanks, >>> David >>> >>>> Thanks, Robbin >>>> >>>>> >>>>> Thanks, >>>>> David >>>>> ----- >>>>> >>>>>> Issue: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8247248 >>>>>> >>>>>> Local testing of JDI/JVMTI and t1-5. 
>>>>>> (no real crash so there is nothing to reproduce) >>>>>> >>>>>> Thanks, Robbin From robbin.ehn at oracle.com Thu Jun 11 07:36:58 2020 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Thu, 11 Jun 2020 09:36:58 +0200 Subject: RFR(s): 8247248: JVM TI might create JNI locals in another thread when using handshakes. In-Reply-To: References: <6bb7304c-af0c-5386-8ae8-f300bb6e8707@oracle.com> <6954bce9-e924-2160-17f2-2ca9c11dc9f7@oracle.com> <82a1ed06-53eb-f958-0baf-9ed39404d43b@oracle.com> Message-ID: <3cdafb4a-bc05-1fa5-19ef-88b35bfb633a@oracle.com> Thanks David! /Robbin On 2020-06-11 04:01, David Holmes wrote: > Looks good! > > Thanks for making the change. > > On a positive note I think this would now show that the conversion from VMop to direct handshake was actually much > simpler than might have been thought. :) > > I'm not sure when the repo fork is happening, so am unclear whether this will head to the right repo in time. :) > > Thanks, > David > ----- > > On 10/06/2020 11:57 pm, Robbin Ehn wrote: >> Hi David and Serguei, (Dan feel free to chime in) >> >>> Honestly I think I'd like to see things reverted to the use of calling_thread as done for the VMOperation previously. >>> We know it is functionally correct and it should also have the same performance profile. >> >> Done: >> http://cr.openjdk.java.net/~rehn/8247248/v2/webrev/ >> >> Passes: hotspot jdi/jvmti testing, running mach5. >> >> I'll push tomorrow morning if test is ok and you all are happy (+- nits). (and no objection to break the 24h rule) >> I started this patch with reverting "8242425: JVMTI monitor operations should use Thread-Local Handshakes". >> And work my way forward. >> >> Thanks, Robbin >> >>> >>> Thanks, >>> David >>> >>>> Thanks, Robbin >>>> >>>>> >>>>> Thanks, >>>>> David >>>>> ----- >>>>> >>>>>> Issue: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8247248 >>>>>> >>>>>> Local testing of JDI/JVMTI and t1-5. 
>>>>>> (no real crash so there is nothing to reproduce) >>>>>> >>>>>> Thanks, Robbin From serguei.spitsyn at oracle.com Thu Jun 11 09:10:17 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Thu, 11 Jun 2020 02:10:17 -0700 Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime In-Reply-To: References: <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <1d8900f1-3399-bbc0-98bb-00375f90ac56@oracle.com> <8337cabe-c18e-3a54-a28a-8d94fed4fcab@oracle.com> <623437f8-2d29-b0f4-3313-3cb570651452@oracle.com> <064147c8-957f-dc47-139d-6b1a362c9e98@oracle.com> <1b1ecfca-22e7-a64c-a5f2-6f5ea7b37604@oracle.com> Message-ID: <892b5621-c4c8-9b68-7756-9aee3376daef@oracle.com> Hi Jiangli, I'm sorry for being that late to the party. I had a problem to follow all the details in this email thread discussion. It is hard to notice race issues from simple webrev reading. So, thanks a lot to Ioi and David for catching it. As I get from the review comments this fix is not mature enough and more work and discussions are necessary. I'll try to better track this discussion in the future. Thanks, Serguei On 6/9/20 05:43, coleen.phillimore at oracle.com wrote: > (Posting on the right thread and list now...) > > On 6/9/20 2:26 AM, David Holmes wrote: >> Hi Jiangli, >> >> >??? http://cr.openjdk.java.net/~jiangli/8232222/webrev.03/ >> >> I'm having trouble keeping track of all the issues, so let me walk >> through the changes as I see them: >> >> - InstanceKlass::restore_unshareable_info >> >> For boot loader classes, when no verification is enabled, we mark the >> class as linked immediately. 
>> By doing this in
>> restore_unshareable_info there are no races (as the class is not
>> exposed to anyone yet) and it allows later checks for is_linked to be
>> by-passed (under the assumption that the class and its supertypes
>> truly are in a state that appears linked). However, this doesn't
>> generate the JVM TI class prepare event, and we can't do it here as
>> that would introduce a number of potential issues with JVM TI.
>>
>> I see in the bug report some metrics from HelloWorld, but really this
>> needs to be backed up by a lot more performance measurements to
>> establish this is actually a worthwhile optimisation.
>>
>> - SystemDictionary::define_instance_class
>>
>> This is where we catch up with the JVM TI requirements and
>> immediately after posting the class load event we post the class
>> prepare event.
>>
>> As we have discussed, this earlier posting of the event is observable
>> to a JVMTI agent and although permitted by the specification it is a
>> change in behaviour that might impact existing agents.
>>
>> Ioi has raised an issue about there being a race here with the
>> potential for the event being delivered multiple times. I agree this
>> code is not adequate:
>>
>> 1718   if (k->is_shared() && k->is_linked()) {
>>
>> You only want to fire the event for exactly those classes that you
>> pre-linked, so at a minimum this has to be restricted to boot classes
>> only. Even then as Ioi points out once the class is exported to the
>> SystemDictionary and visibly seen to be loaded, then other threads
>> may race to link it and so have already posted the class prepare
>> event. In normal linking this race is avoided by the use of the
>> init_lock to check the linked state, do the linking and issue the
>> class prepare event, atomically. But your approach cannot do this as
>> it stands, you would need to add an additional flag to track whether
>> the prepare event had already been issued.
>>
>
> Thanks to Ioi and David for seeing this race.
> As I looked at the
> change, it looked fairly simple and almost straightforward, but very
> scary how these changes interact in such surprising ways. Without this
> careful review, these changes cause endless work later on.  The area
> of class loading and our code for doing so has all sorts of subtle
> details that are hard to reason about.  I wish this weren't so and we
> can have code that we're not afraid of.
>
> The CSR is a nice writeup but I didn't see the race from the CSR either.
>
> We need to take the opportunity to look at this from the top down in a
> project like Leyden.
>
> There are still some opportunities to speed up class loading in the
> context of CDS and finding places that we can simplify, but this was
> alarmingly not simple.  I'm grateful to Ioi and David for doing this
> work, and yours, for thoroughly discussing this change.
>
> Thanks,
> Coleen
>> ---
>>
>> So the change as it stands is incomplete, and introduces a
>> behavioural change to JVM TI, and the benefits of it have not been
>> clearly established.
>>
>> The JBS issue states this is a first step towards pre-initialization
>> and other optimisations, and it is certainly a pre-requisite to
>> pre-link before you can pre-initialize, but I don't think pulling out
>> pre-linking as a separate optimisation is really a worthwhile first
>> step. I have grave reservations about the ability to pre-initialize
>> in general and those issues have to be fleshed out in a project like
>> Leyden. Further, as Coleen points out this pre-linking optimisation
>> is incompatible with proposed vtable changes. Additionally, this
>> seems it will be incompatible with changes proposed in Valhalla, as
>> additional link-time actions will be needed that can't be done at the
>> time of restore_unshareable_info.
>>
>> Bottom line for me is that I just don't think this change is worth
>> pursuing as a stand-alone optimisation at this time. Sorry.
>>
>> Cheers,
>> David
>> -----
>>
>> On 5/06/2020 8:14 am, Jiangli Zhou wrote:
>>> Hi David,
>>>
>>> On Wed, Jun 3, 2020 at 9:59 PM David Holmes
>>> wrote:
>>>>
>>>> Ioi pointed out that my proposal was incomplete and that it would need
>>>> to be more like:
>>>>
>>>>     if (is_shared() &&
>>>>       JvmtiExport::should_post_class_prepare() &&
>>>>       !BytecodeVerificationLocal &&
>>>>       loader_data->is_the_null_class_loader_data()) {
>>>>       Handle h_init_lock(THREAD, init_lock());
>>>>       ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL);
>>>>       set_init_state(linked);
>>>>       >>> call JVMTI
>>>>       return true;
>>>>     }
>>>>
>>>> This alleviates any concerns about behavioural changes to JVM TI, and
>>>> also allows JVM TI enabled code to partially benefit from the
>>>> pre-linking optimisation.
>>>>
>>>> Otherwise I agree with Ioi that any behaviour change to JVM TI needs to
>>>> be justified by significant performance gains.
>>>>
>>>
>>> Thanks a lot for the input and suggestion! Locking the init_lock for
>>> the JVMTI ClassPrepare event here sounds ok to me. The ClassDefine is
>>> normally posted before the ClassPrepare. That's why the change was
>>> made in systemDictionary.cpp instead of within
>>> InstanceKlass::restore_unshareable_info() function, to keep the same
>>> events ordering for any given class. I added the 'init_lock' locking
>>> code for post_class_prepare(), and kept the code in
>>> systemDictionary.cpp in webrev.03 below.  Not changing the JVMTI
>>> events ordering feels safer to me. Would the following be ok to
>>> everyone?
>>>
>>>    http://cr.openjdk.java.net/~jiangli/8232222/webrev.03/
>>>
>>> I also changed the InstanceKlass::restore_unshareable_info() to set
>>> _init_state via set_init_state API as you suggested. We can get away
>>> without locking the init_lock for setting the flag itself.
>>> >>> Best regards, >>> >>> Jiangli >>> >>> >>>> David >>>> ----- >>>> >>>> On 4/06/2020 8:42 am, David Holmes wrote: >>>>> Correction ... >>>>> >>>>> On 3/06/2020 5:19 pm, David Holmes wrote: >>>>>> On 3/06/2020 3:44 pm, Ioi Lam wrote: >>>>>>> On 6/2/20 10:16 PM, David Holmes wrote: >>>>>>>> Hi Ioi, >>>>>>>> >>>>>>>> On 3/06/2020 2:55 pm, Ioi Lam wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 5/27/20 11:13 PM, David Holmes wrote: >>>>>>>>>> Hi Jiangli, >>>>>>>>>> >>>>>>>>>> On 28/05/2020 11:35 am, Ioi Lam wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> I was going to take the suggestion, but realized that it >>>>>>>>>>>> would add >>>>>>>>>>>> unnecessary complications for archived boot classes with class >>>>>>>>>>>> pre-initialization support. Some agents may set >>>>>>>>>>>> JvmtiExport::should_post_class_prepare(). It's worthwhile to >>>>>>>>>>>> support >>>>>>>>>>>> class pre-init uniformly for archived boot classes with >>>>>>>>>>>> JvmtiExport::should_post_class_prepare() enabled or disabled. >>>>>>>>>>> >>>>>>>>>>> This would introduce behavioral changes when JVMTI is enabled: >>>>>>>>>>> >>>>>>>>>>> + The order of JvmtiExport::post_class_prepare is different >>>>>>>>>>> than >>>>>>>>>>> before >>>>>>>>>>> + JvmtiExport::post_class_prepare may be called for a class >>>>>>>>>>> that >>>>>>>>>>> was not called before (if the class is never linked during >>>>>>>>>>> run time) >>>>>>>>>>> + JvmtiExport::post_class_prepare was called inside the >>>>>>>>>>> init_lock, now it's called outside of the init_lock >>>>>>>>>> >>>>>>>>>> I have to say I share Ioi's concerns here. This change will >>>>>>>>>> impact >>>>>>>>>> JVM TI agents in a way we can't be sure of. From a specification >>>>>>>>>> perspective I think we are fine as linking can be lazy or eager, >>>>>>>>>> so there's no implied order either. But this would be a >>>>>>>>>> behavioural change that will be observable by agents. 
(I'm less >>>>>>>>>> concerned about the init_lock situation as it seems potentially >>>>>>>>>> buggy to me to call out to an agent with the init_lock held >>>>>>>>>> in the >>>>>>>>>> first place! I find it hard to imagine an agent only working >>>>>>>>>> correctly if the init_lock is held.) >>>>>>>>> >>>>>>>>> David, >>>>>>>>> >>>>>>>>> The init_lock has a serializing effect. The callback for a >>>>>>>>> subclass >>>>>>>>> will not be executed until the callback for its super class has >>>>>>>>> been finished. >>>>>>>> >>>>>>>> Sorry I don't see that is the case. The init_lock for the subclass >>>>>>>> is distinct from the init_lock of the superclass, and linking of >>>>>>>> subclasses and superclasses is independent. >>>>>>> >>>>>>> >>>>>>> In InstanceKlass::link_class_impl, you first link all of your super >>>>>>> classes. >>>>>>> >>>>>>> If another thread is already linking your super class, you will >>>>>>> block >>>>>>> on that superclass's init_lock. >>>>>> >>>>>> The point is that there is already a race in terms of the >>>>>> execution of >>>>>> the two callbacks. So while this change can certainly produce a >>>>>> different result to what would previously be seen, such a result is >>>>>> already possible in the general case. >>>>>> >>>>>>> Of course, I may be wrong and my analysis may be bogus. But I hope >>>>>>> you can appreciate that this is not going to be a trivial change to >>>>>>> analyze. >>>>>> >>>>>> Yes I agree. While in general ordering of the class_prepare >>>>>> callbacks >>>>>> is not guaranteed for independent classes, if a given application >>>>>> explicitly loads and links classes in a known order then it can >>>>>> (reasonably) expect its callbacks to execute in that order. If this >>>>>> change means classes will now be linked in an order independent of >>>>>> what the normal runtime order would be then that could be a problem >>>>>> for existing agents. >>>>>> >>>>>> So where does this leave us? 
>>>>>> The change is within spec, but could
>>>>>> trigger changes in agent behaviour that we can't really evaluate
>>>>>> a-priori. So as you say we should have a fairly good reason for doing
>>>>>> this. I can easily envisage that pre-linking when no callbacks are
>>>>>> enabled would be a performance boost. But with callbacks enabled and
>>>>>> consuming CPU cycles any benefit from pre-linking could be lost in the
>>>>>> noise.
>>>>>>
>>>>>> What if we did as Ioi suggested and only set the class as linked in
>>>>>> restore_unshareable_info if
>>>>>> !JvmtiExport::should_post_class_prepare();
>>>>>> and in addition in InstanceKlass::link_class_impl we added an
>>>>>> additional check at the start:
>>>>>>
>>>>>> // Pre-linking at load time may have been disabled for shared classes,
>>>>>> // but we may be able to do it now.
>>>>>> if (JvmtiExport::should_post_class_prepare() &&
>>>>>>       !BytecodeVerificationLocal &&
>>>>>>       loader_data->is_the_null_class_loader_data()) {
>>>>>>     _init_state = linked;
>>>>>> }
>>>>>
>>>>> There should obviously be a check for is_shared() in there as well.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> ?
>>>>>>
>>>>>> That avoids the problem of changing the JVM TI callback behaviour, but
>>>>>> also shortens the link time path when the callbacks are enabled.
>>>>>>
>>>>>> Hope I got that right. :)
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> With the proposed patch, the callback for both the super class and
>>>>>>>>> subclass can proceed in parallel. So if an agent performs class
>>>>>>>>> hierarchy analysis, for example, it may need to perform extra
>>>>>>>>> synchronization.
>>>>>>>>>
>>>>>>>>> This is just one example that I can think of. I am sure there are
>>>>>>>>> other issues that we have not thought about.
>>>>>>>>> >>>>>>>>> The fact is we are dealing with arbitrary code in the callbacks, >>>>>>>>> and we are changing the conditions of how they are called. The >>>>>>>>> calls happen inside very delicate code (class loading, system >>>>>>>>> dictionary). I am reluctant to do the due diligence, which is >>>>>>>>> substantial, of verifying that this is a safe change, unless we >>>>>>>>> have a really compelling reason to do so. >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> - Ioi >>>>>>>>> >>>>>>>>> >>>>>>> > From daniil.x.titov at oracle.com Thu Jun 11 19:56:29 2020 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Thu, 11 Jun 2020 12:56:29 -0700 Subject: RFR: 8246196: javax/management/MBeanServer/OldMBeanServerTest fails with AssertionError Message-ID: Please review change [1] that fixes an intermittent failure of the test when it is runs with -Xcomp. The problem here is that the timespan the test uses to count notifications is not adjusted for "test.timeout.factor" system property. The original issue is reproducible in JDK 11 and on Solaris platform only. However, I think it makes sense to apply this change in JDK 15 to prevent this from possible happening in the future and then backport it to 11. [1] http://cr.openjdk.java.net/~dtitov/8246196/webrev.01/ [2] https://bugs.openjdk.java.net/browse/JDK-8246196 Thank you, Daniil From leonid.mesnik at oracle.com Thu Jun 11 21:09:20 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Thu, 11 Jun 2020 14:09:20 -0700 Subject: RFR: 8242328: Update mentions of ThreadMBean to ThreadMXBean Message-ID: <7dd150a5-ebea-b91a-407b-17a69855f387@oracle.com> Hi Could you review following fix which change leftovers of ThreadMBean to ThreadMXBean. In the most cases the comments were updated only. 
webrev: http://cr.openjdk.java.net/~lmesnik/8242328/webrev.00/ bug: https://bugs.openjdk.java.net/browse/JDK-8242328 Leonid From leonid.mesnik at oracle.com Thu Jun 11 22:30:42 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Thu, 11 Jun 2020 15:30:42 -0700 Subject: RFR: 8242891: vmTestbase/nsk/jvmti/ test should be fixed to fail early if JVMTI function return error In-Reply-To: References: <11314027-4965-b38b-6bc7-5011515b94ab@oracle.com> Message-ID: <2cf4e45a-4d44-3c0a-a272-480f56a5e6e8@oracle.com> Agree, it would be better to don't try to use data from functions with error code. The new webrev: http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/ I tried to prevent any usage of possibly corrupted data. Mostly strings or allocated data, sometimes method/class id which are used my other JVMTI functions. Leonid On 6/9/20 6:59 PM, serguei.spitsyn at oracle.com wrote: > On 6/9/20 12:58, Leonid Mesnik wrote: >> >> Hi >> >> >> On 6/9/20 12:34 PM, serguei.spitsyn at oracle.com wrote: >>> Hi Leonid, >>> >>> Thank you for taking care about this! >>> It looks good in general. >>> However, I think, a similar return is needed in more cases. 
>>>
>>> One example:
>>>
>>> http://cr.openjdk.java.net/~lmesnik/8242891/webrev.00/test/hotspot/jtreg/vmTestbase/nsk/jvmti/Exception/exception001/exception001.cpp.frames.html
>>>
>>> err = jvmti_env->GetMethodDeclaringClass(method, &cls);
>>> if (err != JVMTI_ERROR_NONE) {
>>>     printf("(GetMethodDeclaringClass#t) unexpected error: %s (%d)\n",
>>>            TranslateError(err), err);
>>>     result = STATUS_FAILED;
>>>     return;
>>> }
>>> err = jvmti_env->GetClassSignature(cls, &ex.t_cls, &generic);
>>> if (err != JVMTI_ERROR_NONE) {
>>>     printf("(GetClassSignature#t) unexpected error: %s (%d)\n",
>>>            TranslateError(err), err);
>>>     result = STATUS_FAILED;
>>> }
>>> err = jvmti_env->GetMethodName(method,
>>>                                &ex.t_name, &ex.t_sig, &generic);
>>> if (err != JVMTI_ERROR_NONE) {
>>>     printf("(GetMethodName#t) unexpected error: %s (%d)\n",
>>>            TranslateError(err), err);
>>>     result = STATUS_FAILED;
>>> }
>>> ex.t_loc = location;
>>> err = jvmti_env->GetMethodDeclaringClass(catch_method, &cls);
>>> if (err != JVMTI_ERROR_NONE) {
>>>     printf("(GetMethodDeclaringClass#c) unexpected error: %s (%d)\n",
>>>            TranslateError(err), err);
>>>     result = STATUS_FAILED;
>>>     return;
>>> }
>>> err = jvmti_env->GetClassSignature(cls, &ex.c_cls, &generic);
>>> if (err != JVMTI_ERROR_NONE) {
>>>     printf("(GetClassSignature#c) unexpected error: %s (%d)\n",
>>>            TranslateError(err), err);
>>>     result = STATUS_FAILED;
>>> }
>>> err = jvmti_env->GetMethodName(catch_method,
>>>                                &ex.c_name, &ex.c_sig, &generic);
>>> if (err != JVMTI_ERROR_NONE) {
>>>     printf("(GetMethodName#c) unexpected error: %s (%d)\n",
>>>            TranslateError(err), err);
>>>     result = STATUS_FAILED;
>>> }
>>>
>>> In the fragment above you added return for JVMTI
>>> GetMethodDeclaringClass error.
>>> But GetMethodName and GetClassSignature can be also problematic as >>> the returned names are printed below. >>> It seems to be more safe and even simpler to add returns for such >>> cases as well. >>> Otherwise, the code reader is puzzled why there is a return in one >>> failure case and there is no such return in another. >> >> It is a good question if we want to fix such places or even fails >> with first JVMTI failure. (I even started to fix it in the such way >> but find that existing tests usually don't fail always). >> > > I do not suggest to fix all the tests but those which you are already > fixing. > > >> The difference is that test tries to reuse "cls" in other JVMTI >> function and going to generate very misleading crash. How it just >> tries to compare ex and exs values. So test might crash but clearly >> outside of JVMTI function and with some useful info. So I am not sure >> if fixing these lines improve test failure handling. >> > > If JVMTI functions fail with an error code the results with symbolic > strings must be considered invalid. > However, they are used later (the values are printed). > It is better to bail out in such cases. > It should not be a problem to add similar returns in such cases. > Or do you think it is important to continue execution for some reason? > >> Assuming that most of existing tests fails early only if going to >> re-use possible corrupted data I propose to fix this separately. We >> need to figure out when to fail or to try to finish. >> > > Do you suggest it for the updated tests only or for all the tests with > such problems? > > Thanks, > Serguei > >> Leonid >> >>> >>> Thanks, >>> Serguei >>> >>> >>> On 6/1/20 21:33, Leonid Mesnik wrote: >>>> Hi >>>> >>>> Could you please review following fix which stop test execution if >>>> JVMTI function returns error. The test fails anyway however using >>>> potentially bad data in JVMTI function might cause misleading crash >>>> failures. 
>>>> The hs_err will contain the stacktrace not with problem
>>>> function but with function called with corrupted data. Most of
>>>> tests already has such behavior but not all. Also I fixed a couple
>>>> of tests to finish if they haven't managed to suspend thread.
>>>>
>>>> I've updated only tests which try to use corrupted data in JVMTI
>>>> functions after errors. I haven't updated tests which just
>>>> compare/print values from erroring JVMTI functions. The crash in
>>>> strcmp/println is not so misleading and might point to real issue.
>>>>
>>>> webrev: http://cr.openjdk.java.net/~lmesnik/8242891/webrev.00/
>>>>
>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8242891
>>>>
>>>> Leonid
>>>>
>>>
>

From david.holmes at oracle.com Thu Jun 11 23:48:23 2020
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 12 Jun 2020 09:48:23 +1000
Subject: RFR: 8242328: Update mentions of ThreadMBean to ThreadMXBean
In-Reply-To: <7dd150a5-ebea-b91a-407b-17a69855f387@oracle.com>
References: <7dd150a5-ebea-b91a-407b-17a69855f387@oracle.com>
Message-ID:

Hi Leonid,

On 12/06/2020 7:09 am, Leonid Mesnik wrote:
> Hi
>
> Could you review following fix which change leftovers of ThreadMBean to
> ThreadMXBean. In the most cases the comments were updated only.
>
> webrev: http://cr.openjdk.java.net/~lmesnik/8242328/webrev.00/

Looks good!

I find this whole MBean vs MXBean terminology very confusing.
:)

Thanks,
David

> bug: https://bugs.openjdk.java.net/browse/JDK-8242328
>
> Leonid
>

From david.holmes at oracle.com Thu Jun 11 23:51:51 2020
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 12 Jun 2020 09:51:51 +1000
Subject: RFR: 8246196: javax/management/MBeanServer/OldMBeanServerTest fails with AssertionError
In-Reply-To:
References:
Message-ID:

Hi Daniil,

On 12/06/2020 5:56 am, Daniil Titov wrote:
> Please review change [1] that fixes an intermittent failure of the test when it is runs with -Xcomp.
>
> The problem here is that the timespan the test uses to count notifications is not adjusted for "test.timeout.factor" system property.

The adjustment looks fine.

> The original issue is reproducible in JDK 11 and on Solaris platform only. However, I think it makes sense to apply this change in JDK 15 to prevent this from possible happening in the future and then backport it to 11.

Do you still intend this for 15 or just 16? If 15 then push to jdk15 repo and it will get forward ported to jdk/jdk automatically.

Thanks,
David

> [1] http://cr.openjdk.java.net/~dtitov/8246196/webrev.01/
> [2] https://bugs.openjdk.java.net/browse/JDK-8246196
>
> Thank you,
> Daniil
>
>
>

From alexey.menkov at oracle.com Fri Jun 12 00:28:34 2020
From: alexey.menkov at oracle.com (Alex Menkov)
Date: Thu, 11 Jun 2020 17:28:34 -0700
Subject: RFR: 8246196: javax/management/MBeanServer/OldMBeanServerTest fails with AssertionError
In-Reply-To:
References:
Message-ID:

+1

--alex

On 06/11/2020 16:51, David Holmes wrote:
> Hi Daniil,
>
> On 12/06/2020 5:56 am, Daniil Titov wrote:
>> Please review change [1] that fixes an intermittent  failure of the
>> test when it is runs with -Xcomp.
>>
>> The problem here is that the timespan the test uses to count
>> notifications  is not adjusted for "test.timeout.factor" system property.
>
> The adjustment looks fine.
>
>> The original issue is reproducible in JDK 11 and on Solaris platform
>> only. However,
I think it makes sense to apply this change in JDK 15 >> to prevent this from possible happening in the future and then >> backport it to 11. > > Do you still intend this for 15 or just 16? If 15 then push to jdk15 > repo and it will get forward ported to jdk/jdk automatically. > > Thanks, > David > >> [1] http://cr.openjdk.java.net/~dtitov/8246196/webrev.01/ >> [2] https://bugs.openjdk.java.net/browse/JDK-8246196 >> >> Thank you, >> Daniil >> >> >> From igor.ignatyev at ORACLE.COM Thu Jun 11 23:57:25 2020 From: igor.ignatyev at ORACLE.COM (Igor Ignatyev) Date: Thu, 11 Jun 2020 16:57:25 -0700 Subject: RFR: 8242328: Update mentions of ThreadMBean to ThreadMXBean In-Reply-To: <7dd150a5-ebea-b91a-407b-17a69855f387@oracle.com> References: <7dd150a5-ebea-b91a-407b-17a69855f387@oracle.com> Message-ID: <40FE830B-3760-4A48-89D9-F2190987450F@oracle.com> LGTM -- Igor > On Jun 11, 2020, at 2:09 PM, Leonid Mesnik wrote: > > Hi > > Could you review following fix which change leftovers of ThreadMBean to ThreadMXBean. In the most cases the comments were updated only. 
> > webrev: http://cr.openjdk.java.net/~lmesnik/8242328/webrev.00/ > > bug: https://bugs.openjdk.java.net/browse/JDK-8242328 > > Leonid > From jianglizhou at google.com Fri Jun 12 01:35:44 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Thu, 11 Jun 2020 18:35:44 -0700 Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime In-Reply-To: <892b5621-c4c8-9b68-7756-9aee3376daef@oracle.com> References: <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com> <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com> <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com> <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com> <1d8900f1-3399-bbc0-98bb-00375f90ac56@oracle.com> <8337cabe-c18e-3a54-a28a-8d94fed4fcab@oracle.com> <623437f8-2d29-b0f4-3313-3cb570651452@oracle.com> <064147c8-957f-dc47-139d-6b1a362c9e98@oracle.com> <1b1ecfca-22e7-a64c-a5f2-6f5ea7b37604@oracle.com> <892b5621-c4c8-9b68-7756-9aee3376daef@oracle.com> Message-ID: Hi Serguei, Coleen, David and Ioi, Thank you for all the responses! Sorry for the delay. I found time today to collect data with a JVMTI agent enabled. To have a more controlled measurement, I created a micro benchmark with an agent that registers a callback to handle the ClassPrepare event. The agent code is the same as the libSimpleClassPrepare.c (included in the webrev). The main app is a simple HelloWorld app. Following are the Before and After (with the lastest code). The saving for each class is a constant, and is proportional to the number of classes with early linked state. 

Before
---------
             82.37 msec task-clock:u              #    1.568 CPUs utilized            ( +-  0.35% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             4,020      page-faults:u             #    0.049 M/sec                    ( +-  0.08% )
        93,940,085      cycles:u                  #    1.140 GHz                      ( +-  0.17% )
        89,093,125      instructions:u            #    0.95  insn per cycle           ( +-  0.08% )
        17,585,478      branches:u                #  213.484 M/sec                    ( +-  0.09% )
           643,748      branch-misses:u           #    3.66% of all branches          ( +-  0.13% )

          0.052522 +- 0.000244 seconds time elapsed  ( +-  0.47% )

After
------
             82.73 msec task-clock:u              #    1.559 CPUs utilized            ( +-  0.38% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             4,007      page-faults:u             #    0.048 M/sec                    ( +-  0.09% )
        96,329,114      cycles:u                  #    1.164 GHz                      ( +-  0.21% )
        89,985,873      instructions:u            #    0.93  insn per cycle           ( +-  0.08% )
        17,770,854      branches:u                #  214.814 M/sec                    ( +-  0.08% )
           644,142      branch-misses:u           #    3.62% of all branches          ( +-  0.12% )

          0.053056 +- 0.000263 seconds time elapsed  ( +-  0.49% )

David, thanks for the details on the potential event race. It makes
sense to me. Coleen thanks for pointing to Erik's new binding changes.
I will find more details from Erik on if it is workable to have early
link states with his changes, and put the update for the current
change temporarily on hold again before that can be worked out.

Thanks again to everyone's feedback!

Best,
Jiangli

On Thu, Jun 11, 2020 at 2:10 AM serguei.spitsyn at oracle.com wrote:
>
> Hi Jiangli,
>
> I'm sorry for being that late to the party.
> I had a problem to follow all the details in this email thread discussion.
>
> It is hard to notice race issues from simple webrev reading.
> So, thanks a lot to Ioi and David for catching it.
> As I get from the review comments this fix is not mature enough and more
> work and discussions are necessary.
> I'll try to better track this discussion in the future.
>
> Thanks,
> Serguei
>
>
> On 6/9/20 05:43, coleen.phillimore at oracle.com wrote:
> > (Posting on the right thread and list now...)
> > > > On 6/9/20 2:26 AM, David Holmes wrote: > >> Hi Jiangli, > >> > >> > http://cr.openjdk.java.net/~jiangli/8232222/webrev.03/ > >> > >> I'm having trouble keeping track of all the issues, so let me walk > >> through the changes as I see them: > >> > >> - InstanceKlass::restore_unshareable_info > >> > >> For boot loader classes, when no verification is enabled, we mark the > >> class as linked immediately. By doing this in > >> restore_unshareable_info there are no races (as the class is not > >> exposed to anyone yet) and it allows later checks for is_linked to be > >> by-passed (under the assumption that the class and its supertypes > >> truly are in a state that appears linked). However, this doesn't > >> generate the JVM TI class prepare event, and we can't do it here as > >> that would introduce a number of potential issues with JVM TI. > >> > >> I see in the bug report some metrics from HelloWorld, but really this > >> needs to be backed up by a lot more performance measurements to > >> establish this is actually a worthwhile optimisation. > >> > >> - SystemDictionary::define_instance_class > >> > >> This is where we catch up with the JVM TI requirements and > >> immediately after posting the class load event we post the class > >> prepare event. > >> > >> As we have discussed, this earlier posting of the event is observable > >> to a JVMTI agent and although permitted by the specification it is a > >> change in behaviour that might impact existing agents. > >> > >> Ioi has raised an issue about there being a race here with the > >> potential for the event being delivered multiple times. I agree this > >> code is not adequate: > >> > >> 1718 if (k->is_shared() && k->is_linked()) { > >> > >> You only want to fire the event for exactly those classes that you > >> pre-linked, so at a minimum this has to be restricted to boot classes > >> only. 
Even then, as Ioi points out, once the class is exported to the
> >> SystemDictionary and visibly seen to be loaded, then other threads
> >> may race to link it and so have already posted the class prepare
> >> event. In normal linking this race is avoided by the use of the
> >> init_lock to check the linked state, do the linking and issue the
> >> class prepare event, atomically. But your approach cannot do this as
> >> it stands; you would need to add an additional flag to track whether
> >> the prepare event had already been issued.
> >>
> >
> > Thanks to Ioi and David for seeing this race. As I looked at the
> > change, it looked fairly simple and almost straightforward, but it is very
> > scary how these changes interact in such surprising ways. Without this
> > careful review, these changes cause endless work later on. The area
> > of class loading and our code for doing so has all sorts of subtle
> > details that are hard to reason about. I wish this weren't so and we
> > could have code that we're not afraid of.
> >
> > The CSR is a nice writeup but I didn't see the race from the CSR either.
> >
> > We need to take the opportunity to look at this from the top down in a
> > project like Leyden.
> >
> > There are still some opportunities to speed up class loading in the
> > context of CDS and finding places that we can simplify, but this was
> > alarmingly not simple. I'm grateful to Ioi and David for doing this
> > work, and yours, for thoroughly discussing this change.
> >
> > Thanks,
> > Coleen
> >> ---
> >>
> >> So the change as it stands is incomplete, and introduces a
> >> behavioural change to JVM TI, and the benefits of it have not been
> >> clearly established.
> >>
> >> The JBS issue states this is a first step towards pre-initialization
> >> and other optimisations, and it is certainly a pre-requisite to
> >> pre-link before you can pre-initialize, but I don't think pulling out
> >> pre-linking as a separate optimisation is really a worthwhile first
> >> step.
I have grave reservations about the ability to pre-initialize
> >> in general and those issues have to be fleshed out in a project like
> >> Leyden. Further, as Coleen points out this pre-linking optimisation
> >> is incompatible with proposed vtable changes. Additionally, this
> >> seems it will be incompatible with changes proposed in Valhalla, as
> >> additional link-time actions will be needed that can't be done at the
> >> time of restore_unshareable_info.
> >>
> >> Bottom line for me is that I just don't think this change is worth
> >> pursuing as a stand-alone optimisation at this time. Sorry.
> >>
> >> Cheers,
> >> David
> >> -----
> >>
> >> On 5/06/2020 8:14 am, Jiangli Zhou wrote:
> >>> Hi David,
> >>>
> >>> On Wed, Jun 3, 2020 at 9:59 PM David Holmes
> >>> wrote:
> >>>>
> >>>> Ioi pointed out that my proposal was incomplete and that it would need
> >>>> to be more like:
> >>>>
> >>>> if (is_shared() &&
> >>>>     JvmtiExport::should_post_class_prepare() &&
> >>>>     !BytecodeVerificationLocal &&
> >>>>     loader_data->is_the_null_class_loader_data()) {
> >>>>   Handle h_init_lock(THREAD, init_lock());
> >>>>   ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL);
> >>>>   set_init_state(linked);
> >>>>   >>> call JVMTI
> >>>>   return true;
> >>>> }
> >>>>
> >>>> This alleviates any concerns about behavioural changes to JVM TI, and
> >>>> also allows JVM TI enabled code to partially benefit from the
> >>>> pre-linking optimisation.
> >>>>
> >>>> Otherwise I agree with Ioi that any behaviour change to JVM TI
> >>>> needs to
> >>>> be justified by significant performance gains.
> >>>>
> >>>
> >>> Thanks a lot for the input and suggestion! Locking the init_lock for
> >>> the JVMTI ClassPrepare event here sounds ok to me. The ClassDefine is
> >>> normally posted before the ClassPrepare.
That's why the change was > >>> made in systemDictionary.cpp instead of within > >>> InstanceKlass::restore_unshareable_info() function, to keep the same > >>> events ordering for any given class. I added the 'init_lock' locking > >>> code for post_class_prepare(), and kept the code in > >>> systemDictionary.cpp in webreve.03 below. Not changing the JVMTI > >>> events ordering feels safer to me. Would the following be ok to > >>> everyone? > >>> > >>> http://cr.openjdk.java.net/~jiangli/8232222/webrev.03/ > >>> > >>> I also changed the InstanceKlass::restore_unshareable_info() to set > >>> _init_state via set_init_state API as you suggested. We can get away > >>> without locking the init_lock for setting the flag itself. > >>> > >>> Best regards, > >>> > >>> Jiangli > >>> > >>> > >>>> David > >>>> ----- > >>>> > >>>> On 4/06/2020 8:42 am, David Holmes wrote: > >>>>> Correction ... > >>>>> > >>>>> On 3/06/2020 5:19 pm, David Holmes wrote: > >>>>>> On 3/06/2020 3:44 pm, Ioi Lam wrote: > >>>>>>> On 6/2/20 10:16 PM, David Holmes wrote: > >>>>>>>> Hi Ioi, > >>>>>>>> > >>>>>>>> On 3/06/2020 2:55 pm, Ioi Lam wrote: > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 5/27/20 11:13 PM, David Holmes wrote: > >>>>>>>>>> Hi Jiangli, > >>>>>>>>>> > >>>>>>>>>> On 28/05/2020 11:35 am, Ioi Lam wrote: > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> I was going to take the suggestion, but realized that it > >>>>>>>>>>>> would add > >>>>>>>>>>>> unnecessary complications for archived boot classes with class > >>>>>>>>>>>> pre-initialization support. Some agents may set > >>>>>>>>>>>> JvmtiExport::should_post_class_prepare(). It's worthwhile to > >>>>>>>>>>>> support > >>>>>>>>>>>> class pre-init uniformly for archived boot classes with > >>>>>>>>>>>> JvmtiExport::should_post_class_prepare() enabled or disabled. 
> >>>>>>>>>>> > >>>>>>>>>>> This would introduce behavioral changes when JVMTI is enabled: > >>>>>>>>>>> > >>>>>>>>>>> + The order of JvmtiExport::post_class_prepare is different > >>>>>>>>>>> than > >>>>>>>>>>> before > >>>>>>>>>>> + JvmtiExport::post_class_prepare may be called for a class > >>>>>>>>>>> that > >>>>>>>>>>> was not called before (if the class is never linked during > >>>>>>>>>>> run time) > >>>>>>>>>>> + JvmtiExport::post_class_prepare was called inside the > >>>>>>>>>>> init_lock, now it's called outside of the init_lock > >>>>>>>>>> > >>>>>>>>>> I have to say I share Ioi's concerns here. This change will > >>>>>>>>>> impact > >>>>>>>>>> JVM TI agents in a way we can't be sure of. From a specification > >>>>>>>>>> perspective I think we are fine as linking can be lazy or eager, > >>>>>>>>>> so there's no implied order either. But this would be a > >>>>>>>>>> behavioural change that will be observable by agents. (I'm less > >>>>>>>>>> concerned about the init_lock situation as it seems potentially > >>>>>>>>>> buggy to me to call out to an agent with the init_lock held > >>>>>>>>>> in the > >>>>>>>>>> first place! I find it hard to imagine an agent only working > >>>>>>>>>> correctly if the init_lock is held.) > >>>>>>>>> > >>>>>>>>> David, > >>>>>>>>> > >>>>>>>>> The init_lock has a serializing effect. The callback for a > >>>>>>>>> subclass > >>>>>>>>> will not be executed until the callback for its super class has > >>>>>>>>> been finished. > >>>>>>>> > >>>>>>>> Sorry I don't see that is the case. The init_lock for the subclass > >>>>>>>> is distinct from the init_lock of the superclass, and linking of > >>>>>>>> subclasses and superclasses is independent. > >>>>>>> > >>>>>>> > >>>>>>> In InstanceKlass::link_class_impl, you first link all of your super > >>>>>>> classes. > >>>>>>> > >>>>>>> If another thread is already linking your super class, you will > >>>>>>> block > >>>>>>> on that superclass's init_lock. 
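The serializing effect described here can be pictured with a toy model. This is only a sketch under stated assumptions, not HotSpot code: `ToyKlass`, `PrepareLog` and `link_class` are hypothetical names, and a `std::mutex` stands in for the per-class init_lock. It shows one way that "link the supers first, then post the callback under the class's own lock" yields a superclass callback that completes before the subclass callback starts.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a class being linked; not an InstanceKlass.
struct ToyKlass {
    std::string name;
    ToyKlass* super;          // nullptr for a root class
    bool linked = false;      // only ever changes false -> true, under init_lock
    std::mutex init_lock;     // plays the role of the per-class init_lock

    explicit ToyKlass(std::string n, ToyKlass* s = nullptr)
        : name(std::move(n)), super(s) {}
};

// Records the order in which the toy "class prepare" callbacks ran.
struct PrepareLog {
    std::mutex m;
    std::vector<std::string> order;

    void post(const std::string& name) {
        std::lock_guard<std::mutex> guard(m);
        order.push_back(name);
    }
};

// Link the superclass chain first, then link k and post its callback
// while still holding k's own lock.
inline void link_class(ToyKlass& k, PrepareLog& log) {
    if (k.super != nullptr) {
        // Blocks here if another thread currently holds the super's lock.
        link_class(*k.super, log);
    }
    std::lock_guard<std::mutex> guard(k.init_lock);
    if (k.linked) {
        return;  // another caller already linked k and posted its callback
    }
    // The super was linked (and its callback finished) before we got here.
    assert(k.super == nullptr || k.super->linked);
    k.linked = true;
    log.post(k.name);  // the callback runs inside the lock
}
```

Calling `link_class` on a subclass in this model records the superclass name first, and a repeated call records nothing new. Whether the real linking path gives agents the same guarantee is exactly what is being debated in this thread, so the sketch illustrates the argument rather than settling it.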
> >>>>>> > >>>>>> The point is that there is already a race in terms of the > >>>>>> execution of > >>>>>> the two callbacks. So while this change can certainly produce a > >>>>>> different result to what would previously be seen, such a result is > >>>>>> already possible in the general case. > >>>>>> > >>>>>>> Of course, I may be wrong and my analysis may be bogus. But I hope > >>>>>>> you can appreciate that this is not going to be a trivial change to > >>>>>>> analyze. > >>>>>> > >>>>>> Yes I agree. While in general ordering of the class_prepare > >>>>>> callbacks > >>>>>> is not guaranteed for independent classes, if a given application > >>>>>> explicitly loads and links classes in a known order then it can > >>>>>> (reasonably) expect its callbacks to execute in that order. If this > >>>>>> change means classes will now be linked in an order independent of > >>>>>> what the normal runtime order would be then that could be a problem > >>>>>> for existing agents. > >>>>>> > >>>>>> So where does this leave us? The change is within spec, but could > >>>>>> trigger changes in agent behaviour that we can't really evaluate > >>>>>> a-priori. So as you say we should have a fairly good reason for > >>>>>> doing > >>>>>> this. I can easily envisage that pre-linking when no callbacks are > >>>>>> enabled would be a performance boost. But with callbacks enabled and > >>>>>> consuming CPU cycles any benefit from pre-linking could be lost > >>>>>> in the > >>>>>> noise. > >>>>>> > >>>>>> What if we did as Ioi suggested and only set the class as linked in > >>>>>> restore_unshareable_info if > >>>>>> !JvmtiExport::should_post_class_prepare(); > >>>>>> and in addition in InstanceKlass::link_class_imp we added an > >>>>>> additional check at the start: > >>>>>> > >>>>>> // Pre-linking at load time may have been disabled for shared > >>>>>> classes, > >>>>>> // but we may be able to do it now. 
> >>>>>> if (JvmtiExport::should_post_class_prepare() && > >>>>>> !BytecodeVerificationLocal && > >>>>>> loader_data->is_the_null_class_loader_data()) { > >>>>>> _init_state = linked; > >>>>>> } > >>>>> > >>>>> There should obviously be a check for is_shared() in there as well. > >>>>> > >>>>> David > >>>>> ----- > >>>>> > >>>>>> ? > >>>>>> > >>>>>> That avoids the problem of changing the JVM TI callback > >>>>>> behaviour, but > >>>>>> also shortens the link time path when the callbacks are enabled. > >>>>>> > >>>>>> Hope I got that right. :) > >>>>>> > >>>>>> David > >>>>>> ----- > >>>>>> > >>>>>>> Thanks > >>>>>>> - Ioi > >>>>>>>> > >>>>>>>> David > >>>>>>>> ----- > >>>>>>>> > >>>>>>>>> With the proposed patch, the callback for both the super class > >>>>>>>>> and > >>>>>>>>> subclass can proceed in parallel. So if an agent performs class > >>>>>>>>> hierarchy analysis, for example, it may need to perform extra > >>>>>>>>> synchronization. > >>>>>>>>> > >>>>>>>>> This is just one example that I can think of. I am sure there are > >>>>>>>>> other issues that we have not thought about. > >>>>>>>>> > >>>>>>>>> The fact is we are dealing with arbitrary code in the callbacks, > >>>>>>>>> and we are changing the conditions of how they are called. The > >>>>>>>>> calls happen inside very delicate code (class loading, system > >>>>>>>>> dictionary). I am reluctant to do the due diligence, which is > >>>>>>>>> substantial, of verifying that this is a safe change, unless we > >>>>>>>>> have a really compelling reason to do so. 
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> - Ioi
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >
>

From jianglizhou at google.com Fri Jun 12 02:10:39 2020
From: jianglizhou at google.com (Jiangli Zhou)
Date: Thu, 11 Jun 2020 19:10:39 -0700
Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is restored at runtime
In-Reply-To:
References: <8a2077b7-9e1e-ddd3-320e-7094c11512f9@oracle.com>
 <01bf79e3-8fca-7239-559f-cdf45f36d4a9@oracle.com>
 <57f8b328-3617-8290-c2a2-2fb3a6e2f082@oracle.com>
 <1c271fc5-7c28-61a8-2627-9a3931039d79@oracle.com>
 <1d8900f1-3399-bbc0-98bb-00375f90ac56@oracle.com>
 <8337cabe-c18e-3a54-a28a-8d94fed4fcab@oracle.com>
 <623437f8-2d29-b0f4-3313-3cb570651452@oracle.com>
 <064147c8-957f-dc47-139d-6b1a362c9e98@oracle.com>
 <1b1ecfca-22e7-a64c-a5f2-6f5ea7b37604@oracle.com>
 <892b5621-c4c8-9b68-7756-9aee3376daef@oracle.com>
Message-ID:

Correction. The Before and After in the previous email were labeled wrong.

After
------
             82.37 msec task-clock:u              #    1.568 CPUs utilized            ( +-  0.35% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             4,020      page-faults:u             #    0.049 M/sec                    ( +-  0.08% )
        93,940,085      cycles:u                  #    1.140 GHz                      ( +-  0.17% )
        89,093,125      instructions:u            #    0.95  insn per cycle           ( +-  0.08% )
        17,585,478      branches:u                #  213.484 M/sec                    ( +-  0.09% )
           643,748      branch-misses:u           #    3.66% of all branches          ( +-  0.13% )

          0.052522 +- 0.000244 seconds time elapsed  ( +-  0.47% )

Before
------
             82.73 msec task-clock:u              #    1.559 CPUs utilized            ( +-  0.38% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             4,007      page-faults:u             #    0.048 M/sec                    ( +-  0.09% )
        96,329,114      cycles:u                  #    1.164 GHz                      ( +-  0.21% )
        89,985,873      instructions:u            #    0.93  insn per cycle           ( +-  0.08% )
        17,770,854      branches:u                #  214.814 M/sec                    ( +-  0.08% )
           644,142      branch-misses:u           #    3.62% of all branches          ( +-  0.12% )

          0.053056 +- 0.000263 seconds time elapsed  ( +-  0.49% )

On Thu, Jun 11, 2020 at 6:35 PM Jiangli Zhou wrote:
>
> Hi Serguei, Coleen, David and Ioi,
>
> Thank you for all the responses!
>
> Sorry for the delay. I found time today to collect data with a JVMTI
> agent enabled.
>
> To have a more controlled measurement, I created a micro benchmark
> with an agent that registers a callback to handle the ClassPrepare
> event. The agent code is the same as the libSimpleClassPrepare.c
> (included in the webrev). The main app is a simple HelloWorld app.
> Following are the Before and After (with the latest code). The saving
> for each class is a constant, and is proportional to the number of
> classes with early linked state.
>
> Before
> ---------
>              82.37 msec task-clock:u              #    1.568 CPUs utilized            ( +-  0.35% )
>                  0      context-switches:u        #    0.000 K/sec
>                  0      cpu-migrations:u          #    0.000 K/sec
>              4,020      page-faults:u             #    0.049 M/sec                    ( +-  0.08% )
>         93,940,085      cycles:u                  #    1.140 GHz                      ( +-  0.17% )
>         89,093,125      instructions:u            #    0.95  insn per cycle           ( +-  0.08% )
>         17,585,478      branches:u                #  213.484 M/sec                    ( +-  0.09% )
>            643,748      branch-misses:u           #    3.66% of all branches          ( +-  0.13% )
>
>           0.052522 +- 0.000244 seconds time elapsed  ( +-  0.47% )
>
> After
> ------
>              82.73 msec task-clock:u              #    1.559 CPUs utilized            ( +-  0.38% )
>                  0      context-switches:u        #    0.000 K/sec
>                  0      cpu-migrations:u          #    0.000 K/sec
>              4,007      page-faults:u             #    0.048 M/sec                    ( +-  0.09% )
>         96,329,114      cycles:u                  #    1.164 GHz                      ( +-  0.21% )
>         89,985,873      instructions:u            #    0.93  insn per cycle           ( +-  0.08% )
>         17,770,854      branches:u                #  214.814 M/sec                    ( +-  0.08% )
>            644,142      branch-misses:u           #    3.62% of all branches          ( +-  0.12% )
>
>           0.053056 +- 0.000263 seconds time elapsed  ( +-  0.49% )
>
> David, thanks for the details on the potential event race. It makes
> sense to me. Coleen thanks for pointing to Erik's new binding changes.
> I will find more details from Erik on if it is workable to have early
> link states with his changes, and put the update for the current
> change temporarily on hold again before that can be worked out.
>
> Thanks again to everyone's feedback!
> > Best, > Jiangli > > On Thu, Jun 11, 2020 at 2:10 AM serguei.spitsyn at oracle.com > wrote: > > > > Hi Jiangli, > > > > I'm sorry for being that late to the party. > > I had a problem to follow all the details in this email thread discussion. > > > > It is hard to notice race issues from simple webrev reading. > > So, thanks a lot to Ioi and David for catching it. > > As I get from the review comments this fix is not mature enough and more > > work and discussions are necessary. > > I'll try to better track this discussion in the future. > > > > Thanks, > > Serguei > > > > > > On 6/9/20 05:43, coleen.phillimore at oracle.com wrote: > > > (Posting on the right thread and list now...) > > > > > > On 6/9/20 2:26 AM, David Holmes wrote: > > >> Hi Jiangli, > > >> > > >> > http://cr.openjdk.java.net/~jiangli/8232222/webrev.03/ > > >> > > >> I'm having trouble keeping track of all the issues, so let me walk > > >> through the changes as I see them: > > >> > > >> - InstanceKlass::restore_unshareable_info > > >> > > >> For boot loader classes, when no verification is enabled, we mark the > > >> class as linked immediately. By doing this in > > >> restore_unshareable_info there are no races (as the class is not > > >> exposed to anyone yet) and it allows later checks for is_linked to be > > >> by-passed (under the assumption that the class and its supertypes > > >> truly are in a state that appears linked). However, this doesn't > > >> generate the JVM TI class prepare event, and we can't do it here as > > >> that would introduce a number of potential issues with JVM TI. > > >> > > >> I see in the bug report some metrics from HelloWorld, but really this > > >> needs to be backed up by a lot more performance measurements to > > >> establish this is actually a worthwhile optimisation. 
> > >> > > >> - SystemDictionary::define_instance_class > > >> > > >> This is where we catch up with the JVM TI requirements and > > >> immediately after posting the class load event we post the class > > >> prepare event. > > >> > > >> As we have discussed, this earlier posting of the event is observable > > >> to a JVMTI agent and although permitted by the specification it is a > > >> change in behaviour that might impact existing agents. > > >> > > >> Ioi has raised an issue about there being a race here with the > > >> potential for the event being delivered multiple times. I agree this > > >> code is not adequate: > > >> > > >> 1718 if (k->is_shared() && k->is_linked()) { > > >> > > >> You only want to fire the event for exactly those classes that you > > >> pre-linked, so at a minimum this has to be restricted to boot classes > > >> only. Even then as Ioi points out once the class is exported to the > > >> SystemDictionary and visibly seen to be loaded, then other threads > > >> may race to link it and so have already posted the class prepare > > >> event. In normal linking this race is avoided by the use of the > > >> init_lock to check the linked state, do the linking and issue the > > >> class prepare event, atomically. But your approach cannot do this as > > >> it stands, you would need to add an additional flag to track whether > > >> the prepare event had already be issued. > > >> > > > > > > Thanks to Ioi and David for seeing this race. As I looked at the > > > change, it looked fairly simple and almost straightforward, but very > > > scary how these changes interact in such surprising ways. Without this > > > careful review, these changes cause endless work later on. The area > > > of class loading and our code for doing so has all sorts of subtle > > > details that are hard to reason about. I wish this weren't so and we > > > can have code that we're not afraid of. > > > > > > The CSR is a nice writeup but I didn't see the race from the CSR either. 
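The "additional flag" idea mentioned above (track whether the prepare event has already been issued) can be sketched as follows. This is a toy model under stated assumptions, not the patch under review: `ToyPrelinkedKlass`, `mark_prelinked`, `try_post_prepare` and the `prepare_posted_` flag are hypothetical names, and a `std::mutex` stands in for the init_lock. The point is only that checking and setting the flag under the same lock lets at most one of the racing paths post the event.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical toy model of a pre-linked shared class; the field names
// do not correspond to real HotSpot fields.
class ToyPrelinkedKlass {
public:
    // Models marking the class linked while it is being restored, before
    // it is visible to any other thread, so no lock is taken and no
    // event is posted at this point.
    void mark_prelinked() { linked_ = true; }

    // Both the define-time catch-up path and any racing link path would
    // call this. Returns true only for the single caller that gets to
    // post the prepare event.
    bool try_post_prepare() {
        std::lock_guard<std::mutex> guard(init_lock_);
        if (!linked_ || prepare_posted_) {
            return false;  // not linked yet, or the event was already issued
        }
        prepare_posted_ = true;
        ++events_posted_;
        assert(events_posted_ == 1);  // invariant: the event is issued once
        return true;
    }

    int events_posted() const {
        std::lock_guard<std::mutex> guard(init_lock_);
        return events_posted_;
    }

private:
    mutable std::mutex init_lock_;  // stands in for the init_lock
    bool linked_ = false;
    bool prepare_posted_ = false;   // the extra flag suggested in the review
    int events_posted_ = 0;
};
```

In this model a second call to `try_post_prepare` returns false, whichever path made the first call. It says nothing about event ordering relative to the class load event, which is the separate behavioural concern raised in the thread.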
> > > > > > We need to take the opportunity to look at this from the top down in a > > > project like Leyden. > > > > > > There are still some opportunities to speed up class loading in the > > > context of CDS and finding places that we can simplify, but this was > > > alarmingly not simple. I'm grateful to Ioi and David for doing this > > > work, and yours, for thorougly discussing this change. > > > > > > Thanks, > > > Coleen > > >> --- > > >> > > >> So the change as it stands is incomplete, and introduces a > > >> behavioural change to JVM TI, and the benefits of it have not been > > >> clearly established. > > >> > > >> The JBS issue states this is a first step towards pre-initialization > > >> and other optimisations, and it is certainly a pre-requisite to > > >> pre-link before you can pre-initialize, but I don't think pulling out > > >> pre-linking as a separate optimisation is really a worthwhile first > > >> step. I have grave reservations about the ability to pre-initialize > > >> in general and those issues have to be fleshed out in a project like > > >> Leyden. Further, as Coleen points out this pre-linking optimisation > > >> is incompatible with proposed vtable changes. Additionally, this > > >> seems it will be incompatible with changes proposed in Valhalla, as > > >> additional link-time actions will be needed that can't be done at the > > >> time of restore_unshareable_info. > > >> > > >> Bottom line for me is that I just don't think this change is worth > > >> pursuing as a stand-alone optimisation at this time. Sorry. 
> > >> > > >> Cheers, > > >> David > > >> ----- > > >> > > >> On 5/06/2020 8:14 am, Jiangli Zhou wrote: > > >>> Hi David, > > >>> > > >>> On Wed, Jun 3, 2020 at 9:59 PM David Holmes > > >>> wrote: > > >>>> > > >>>> Ioi pointed out that my proposal was incomplete and that it would need > > >>>> to be more like: > > >>>> > > >>>> if (is_shared() && > > >>>> JvmtiExport::should_post_class_prepare() && > > >>>> !BytecodeVerificationLocal && > > >>>> loader_data->is_the_null_class_loader_data()) { > > >>>> Handle h_init_lock(THREAD, init_lock()); > > >>>> ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL); > > >>>> set_init_state(linked); > > >>>> >>> call JVMTI > > >>>> return true; > > >>>> } > > >>>> > > >>>> This alleviates any concerns about behavioural changes to JVM TI, and > > >>>> also allows JVM TI enabled code to partially benefit from the > > >>>> pre-linking optimisation. > > >>>> > > >>>> Otherwise I agree with Ioi that any behaviour change to JVM TI > > >>>> needs to > > >>>> be justified by significant performance gains. > > >>>> > > >>> > > >>> Thanks a lot for the input and suggestion! Locking the init_lock for > > >>> the JVMTI ClassPrepre event here sounds ok to me. The ClassDefine is > > >>> normally posted before the ClassPrepare. That's why the change was > > >>> made in systemDictionary.cpp instead of within > > >>> InstanceKlass::restore_unshareable_info() function, to keep the same > > >>> events ordering for any given class. I added the 'init_lock' locking > > >>> code for post_class_prepare(), and kept the code in > > >>> systemDictionary.cpp in webreve.03 below. Not changing the JVMTI > > >>> events ordering feels safer to me. Would the following be ok to > > >>> everyone? > > >>> > > >>> http://cr.openjdk.java.net/~jiangli/8232222/webrev.03/ > > >>> > > >>> I also changed the InstanceKlass::restore_unshareable_info() to set > > >>> _init_state via set_init_state API as you suggested. 
We can get away > > >>> without locking the init_lock for setting the flag itself. > > >>> > > >>> Best regards, > > >>> > > >>> Jiangli > > >>> > > >>> > > >>>> David > > >>>> ----- > > >>>> > > >>>> On 4/06/2020 8:42 am, David Holmes wrote: > > >>>>> Correction ... > > >>>>> > > >>>>> On 3/06/2020 5:19 pm, David Holmes wrote: > > >>>>>> On 3/06/2020 3:44 pm, Ioi Lam wrote: > > >>>>>>> On 6/2/20 10:16 PM, David Holmes wrote: > > >>>>>>>> Hi Ioi, > > >>>>>>>> > > >>>>>>>> On 3/06/2020 2:55 pm, Ioi Lam wrote: > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> On 5/27/20 11:13 PM, David Holmes wrote: > > >>>>>>>>>> Hi Jiangli, > > >>>>>>>>>> > > >>>>>>>>>> On 28/05/2020 11:35 am, Ioi Lam wrote: > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> I was going to take the suggestion, but realized that it > > >>>>>>>>>>>> would add > > >>>>>>>>>>>> unnecessary complications for archived boot classes with class > > >>>>>>>>>>>> pre-initialization support. Some agents may set > > >>>>>>>>>>>> JvmtiExport::should_post_class_prepare(). It's worthwhile to > > >>>>>>>>>>>> support > > >>>>>>>>>>>> class pre-init uniformly for archived boot classes with > > >>>>>>>>>>>> JvmtiExport::should_post_class_prepare() enabled or disabled. > > >>>>>>>>>>> > > >>>>>>>>>>> This would introduce behavioral changes when JVMTI is enabled: > > >>>>>>>>>>> > > >>>>>>>>>>> + The order of JvmtiExport::post_class_prepare is different > > >>>>>>>>>>> than > > >>>>>>>>>>> before > > >>>>>>>>>>> + JvmtiExport::post_class_prepare may be called for a class > > >>>>>>>>>>> that > > >>>>>>>>>>> was not called before (if the class is never linked during > > >>>>>>>>>>> run time) > > >>>>>>>>>>> + JvmtiExport::post_class_prepare was called inside the > > >>>>>>>>>>> init_lock, now it's called outside of the init_lock > > >>>>>>>>>> > > >>>>>>>>>> I have to say I share Ioi's concerns here. This change will > > >>>>>>>>>> impact > > >>>>>>>>>> JVM TI agents in a way we can't be sure of. 
From a specification > > >>>>>>>>>> perspective I think we are fine as linking can be lazy or eager, > > >>>>>>>>>> so there's no implied order either. But this would be a > > >>>>>>>>>> behavioural change that will be observable by agents. (I'm less > > >>>>>>>>>> concerned about the init_lock situation as it seems potentially > > >>>>>>>>>> buggy to me to call out to an agent with the init_lock held > > >>>>>>>>>> in the > > >>>>>>>>>> first place! I find it hard to imagine an agent only working > > >>>>>>>>>> correctly if the init_lock is held.) > > >>>>>>>>> > > >>>>>>>>> David, > > >>>>>>>>> > > >>>>>>>>> The init_lock has a serializing effect. The callback for a > > >>>>>>>>> subclass > > >>>>>>>>> will not be executed until the callback for its super class has > > >>>>>>>>> been finished. > > >>>>>>>> > > >>>>>>>> Sorry I don't see that is the case. The init_lock for the subclass > > >>>>>>>> is distinct from the init_lock of the superclass, and linking of > > >>>>>>>> subclasses and superclasses is independent. > > >>>>>>> > > >>>>>>> > > >>>>>>> In InstanceKlass::link_class_impl, you first link all of your super > > >>>>>>> classes. > > >>>>>>> > > >>>>>>> If another thread is already linking your super class, you will > > >>>>>>> block > > >>>>>>> on that superclass's init_lock. > > >>>>>> > > >>>>>> The point is that there is already a race in terms of the > > >>>>>> execution of > > >>>>>> the two callbacks. So while this change can certainly produce a > > >>>>>> different result to what would previously be seen, such a result is > > >>>>>> already possible in the general case. > > >>>>>> > > >>>>>>> Of course, I may be wrong and my analysis may be bogus. But I hope > > >>>>>>> you can appreciate that this is not going to be a trivial change to > > >>>>>>> analyze. > > >>>>>> > > >>>>>> Yes I agree. 
While in general ordering of the class_prepare > > >>>>>> callbacks > > >>>>>> is not guaranteed for independent classes, if a given application > > >>>>>> explicitly loads and links classes in a known order then it can > > >>>>>> (reasonably) expect its callbacks to execute in that order. If this > > >>>>>> change means classes will now be linked in an order independent of > > >>>>>> what the normal runtime order would be then that could be a problem > > >>>>>> for existing agents. > > >>>>>> > > >>>>>> So where does this leave us? The change is within spec, but could > > >>>>>> trigger changes in agent behaviour that we can't really evaluate > > >>>>>> a-priori. So as you say we should have a fairly good reason for > > >>>>>> doing > > >>>>>> this. I can easily envisage that pre-linking when no callbacks are > > >>>>>> enabled would be a performance boost. But with callbacks enabled and > > >>>>>> consuming CPU cycles any benefit from pre-linking could be lost > > >>>>>> in the > > >>>>>> noise. > > >>>>>> > > >>>>>> What if we did as Ioi suggested and only set the class as linked in > > >>>>>> restore_unshareable_info if > > >>>>>> !JvmtiExport::should_post_class_prepare(); > > >>>>>> and in addition in InstanceKlass::link_class_imp we added an > > >>>>>> additional check at the start: > > >>>>>> > > >>>>>> // Pre-linking at load time may have been disabled for shared > > >>>>>> classes, > > >>>>>> // but we may be able to do it now. > > >>>>>> if (JvmtiExport::should_post_class_prepare() && > > >>>>>> !BytecodeVerificationLocal && > > >>>>>> loader_data->is_the_null_class_loader_data()) { > > >>>>>> _init_state = linked; > > >>>>>> } > > >>>>> > > >>>>> There should obviously be a check for is_shared() in there as well. > > >>>>> > > >>>>> David > > >>>>> ----- > > >>>>> > > >>>>>> ? > > >>>>>> > > >>>>>> That avoids the problem of changing the JVM TI callback > > >>>>>> behaviour, but > > >>>>>> also shortens the link time path when the callbacks are enabled. 
> > >>>>>> > > >>>>>> Hope I got that right. :) > > >>>>>> > > >>>>>> David > > >>>>>> ----- > > >>>>>> > > >>>>>>> Thanks > > >>>>>>> - Ioi > > >>>>>>>> > > >>>>>>>> David > > >>>>>>>> ----- > > >>>>>>>> > > >>>>>>>>> With the proposed patch, the callback for both the super class > > >>>>>>>>> and > > >>>>>>>>> subclass can proceed in parallel. So if an agent performs class > > >>>>>>>>> hierarchy analysis, for example, it may need to perform extra > > >>>>>>>>> synchronization. > > >>>>>>>>> > > >>>>>>>>> This is just one example that I can think of. I am sure there are > > >>>>>>>>> other issues that we have not thought about. > > >>>>>>>>> > > >>>>>>>>> The fact is we are dealing with arbitrary code in the callbacks, > > >>>>>>>>> and we are changing the conditions of how they are called. The > > >>>>>>>>> calls happen inside very delicate code (class loading, system > > >>>>>>>>> dictionary). I am reluctant to do the due diligence, which is > > >>>>>>>>> substantial, of verifying that this is a safe change, unless we > > >>>>>>>>> have a really compelling reason to do so. > > >>>>>>>>> > > >>>>>>>>> Thanks > > >>>>>>>>> - Ioi > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>> > > > > > From serguei.spitsyn at oracle.com Fri Jun 12 03:41:19 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Thu, 11 Jun 2020 20:41:19 -0700 Subject: RFR: 8242891: vmTestbase/nsk/jvmti/ test should be fixed to fail early if JVMTI function return error In-Reply-To: <2cf4e45a-4d44-3c0a-a272-480f56a5e6e8@oracle.com> References: <11314027-4965-b38b-6bc7-5011515b94ab@oracle.com> <2cf4e45a-4d44-3c0a-a272-480f56a5e6e8@oracle.com> Message-ID: <3927ae7c-efa9-eb9f-ab98-18d778d5a966@oracle.com> An HTML attachment was scrubbed... 
URL: From serguei.spitsyn at oracle.com Fri Jun 12 05:48:49 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Thu, 11 Jun 2020 22:48:49 -0700 Subject: RFR: 8246196: javax/management/MBeanServer/OldMBeanServerTest fails with AssertionError In-Reply-To: References: Message-ID: <870214c5-df4b-a3b3-6816-a67a44ecf308@oracle.com> +1 Thanks, Serguei On 6/11/20 17:28, Alex Menkov wrote: > +1 > > --alex > > On 06/11/2020 16:51, David Holmes wrote: >> Hi Daniil, >> >> On 12/06/2020 5:56 am, Daniil Titov wrote: >>> Please review change [1] that fixes an intermittent? failure of the >>> test when it is runs with -Xcomp. >>> >>> The problem here is that the timespan the test uses to count >>> notifications? is not adjusted for "test.timeout.factor" system >>> property. >> >> The adjustment looks fine. >> >>> The original issue is reproducible in JDK 11 and on Solaris platform >>> only. However,? I think it makes sense to apply this change in JDK >>> 15 to prevent this from possible happening in the future and then >>> backport it to 11. >> >> Do you still intend this for 15 or just 16? If 15 then push to jdk15 >> repo and it will get forward ported to jdk/jdk automatically. >> >> Thanks, >> David >> >>> [1] http://cr.openjdk.java.net/~dtitov/8246196/webrev.01/ >>> [2] https://bugs.openjdk.java.net/browse/JDK-8246196 >>> >>> Thank you, >>> Daniil >>> >>> >>> From serguei.spitsyn at oracle.com Fri Jun 12 06:02:16 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Thu, 11 Jun 2020 23:02:16 -0700 (PDT) Subject: RFR: 8242328: Update mentions of ThreadMBean to ThreadMXBean In-Reply-To: References: <7dd150a5-ebea-b91a-407b-17a69855f387@oracle.com> Message-ID: <10b069a1-4888-531f-a61e-aa7594cdf67f@oracle.com> An HTML attachment was scrubbed... 
URL: From david.holmes at oracle.com Fri Jun 12 07:27:25 2020 From: david.holmes at oracle.com (David Holmes) Date: Fri, 12 Jun 2020 17:27:25 +1000 Subject: RFR: 8242328: Update mentions of ThreadMBean to ThreadMXBean In-Reply-To: <10b069a1-4888-531f-a61e-aa7594cdf67f@oracle.com> References: <7dd150a5-ebea-b91a-407b-17a69855f387@oracle.com> <10b069a1-4888-531f-a61e-aa7594cdf67f@oracle.com> Message-ID: On 12/06/2020 4:02 pm, serguei.spitsyn at oracle.com wrote: > Hi Leonid, > > It looks okay to me. > > > I find this whole MBean vs MXBean terminology very confusing. :) > > Me too. :) > For instance, I see some references to MBeanServer (should they also be > replaced with MXBeanServer?): No. There is no MXBeanServer. This attempts to shed some light on everything: https://docs.oracle.com/en/java/javase/14/docs/api/java.management/javax/management/MXBean.html "An MXBean is a kind of MBean. An MXBean object can be registered directly in the MBean Server, or it can be used as an argument to StandardMBean and the resultant MBean registered in the MBean Server." Cheers, David ----- > http://cr.openjdk.java.net/~lmesnik/8242328/webrev.00/test/hotspot/jtreg/vmTestbase/nsk/monitoring/CompilationMXBean/comptimemon002/TestDescription.java.udiff.html > > * The test checks that > - * CompilationMBean.isCompilationTimeMonitoringSupported() > + * CompilationMXBean.isCompilationTimeMonitoringSupported() > * method returns true. The test performs access to management metrics > * through default*MBeanServer*. > > > http://cr.openjdk.java.net/~lmesnik/8242328/webrev.00/test/hotspot/jtreg/vmTestbase/nsk/monitoring/CompilationMXBean/comptimemon003/TestDescription.java.udiff.html > > * The test checks that > - * CompilationMBean.isCompilationTimeMonitoringSupported() > + * CompilationMXBean.isCompilationTimeMonitoringSupported() > * method returns true. 
The test performs access to management metrics > * through custom*MBeanServer* (developed and saved in > > > http://cr.openjdk.java.net/~lmesnik/8242328/webrev.00/test/hotspot/jtreg/vmTestbase/nsk/monitoring/CompilationMXBean/comptimemon004/TestDescription.java.udiff.html > > * DESCRIPTION > * The test checks that > - * CompilationMBean.isCompilationTimeMonitoringSupported() > + * CompilationMXBean.isCompilationTimeMonitoringSupported() > * method returns true. The test performs access to management metrics > * through default*MBeanServer* proxy. > > > http://cr.openjdk.java.net/~lmesnik/8242328/webrev.00/test/hotspot/jtreg/vmTestbase/nsk/monitoring/CompilationMXBean/comptimemon005/TestDescription.java.udiff.html > > * The test checks that > - * CompilationMBean.isCompilationTimeMonitoringSupported() > + * CompilationMXBean.isCompilationTimeMonitoringSupported() > * method returns true. The test performs access to management metrics > * through custom*MBeanServer* proxy (developed and saved in > > > Thanks, > Serguei > > > On 6/11/20 16:48, David Holmes wrote: >> Hi Leonid, >> >> On 12/06/2020 7:09 am, Leonid Mesnik wrote: >>> Hi >>> >>> Could you review following fix which change leftovers of ThreadMBean >>> to ThreadMXBean. In the most cases the comments were updated only. >>> >>> webrev: http://cr.openjdk.java.net/~lmesnik/8242328/webrev.00/ >> >> Looks good! >> >> I find this whole MBean vs MXBean terminology very confusing. :) >> >> Thanks, >> David >> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8242328 >>> >>> Leonid >>> > From daniel.daugherty at oracle.com Fri Jun 12 16:46:10 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 12 Jun 2020 12:46:10 -0400 Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java Message-ID: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> Greetings, It's time to reduce the noise in the CI so I'm ProblemListing tests. 
Here's the bug for failure:

    JDK-8205957 setfldw001/TestDescription.java fails with bad field value
    https://bugs.openjdk.java.net/browse/JDK-8205957

and here's the bug for the ProblemListing:

    JDK-8247495 ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java
    https://bugs.openjdk.java.net/browse/JDK-8247495

I'm considering this a trivial change so I need a single (R)eviewer.

Here's the context diff for the change:

$ hg diff
diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt
--- a/test/hotspot/jtreg/ProblemList.txt    Fri Jun 12 09:31:08 2020 -0700
+++ b/test/hotspot/jtreg/ProblemList.txt    Fri Jun 12 12:40:17 2020 -0400
@@ -141,6 +141,7 @@
 vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java 8219652 aix-ppc64
 vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java 8219652 aix-ppc64
 vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java 8219652 aix-ppc64
+vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java 8205957 generic-all

 vmTestbase/gc/lock/jni/jnilock002/TestDescription.java 8208243,8192647 generic-all

This issue is actually much older than JDK-8205957 would indicate
(first sighting in JDK11 for that bug ID). The older version of
the test is covered by https://bugs.openjdk.java.net/browse/JDK-6528079
and that failure's first sighting is in JDK7.

Thanks, in advance, for any comments, questions, or suggestions.

Dan

From daniil.x.titov at oracle.com Fri Jun 12 17:55:48 2020
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Fri, 12 Jun 2020 10:55:48 -0700
Subject: RFR: 8246196: javax/management/MBeanServer/OldMBeanServerTest fails with AssertionError
In-Reply-To:
References:
Message-ID: <1E63BB2E-9327-47E1-92BA-6D19EE415528@oracle.com>

Hi David and Alex,

Thanks for reviewing this change. I will push it to the jdk15 repo as David suggested.
Best regards, Daniil ?On 6/11/20, 5:28 PM, "Alex Menkov" wrote: +1 --alex On 06/11/2020 16:51, David Holmes wrote: > Hi Daniil, > > On 12/06/2020 5:56 am, Daniil Titov wrote: >> Please review change [1] that fixes an intermittent failure of the >> test when it is runs with -Xcomp. >> >> The problem here is that the timespan the test uses to count >> notifications is not adjusted for "test.timeout.factor" system property. > > The adjustment looks fine. > >> The original issue is reproducible in JDK 11 and on Solaris platform >> only. However, I think it makes sense to apply this change in JDK 15 >> to prevent this from possible happening in the future and then >> backport it to 11. > > Do you still intend this for 15 or just 16? If 15 then push to jdk15 > repo and it will get forward ported to jdk/jdk automatically. > > Thanks, > David > >> [1] http://cr.openjdk.java.net/~dtitov/8246196/webrev.01/ >> [2] https://bugs.openjdk.java.net/browse/JDK-8246196 >> >> Thank you, >> Daniil >> >> >> From daniel.daugherty at oracle.com Fri Jun 12 18:40:11 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 12 Jun 2020 14:40:11 -0400 Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java In-Reply-To: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> Message-ID: <7cbde455-72ca-4d8a-b4fa-15096a9b1a87@oracle.com> Tap, tap, tap... is this thing working? Anyone out there? This is a trivial 1-liner review... Dan On 6/12/20 12:46 PM, Daniel D. Daugherty wrote: > Greetings, > > It's time to reduce the noise in the CI so I'm ProblemListing tests. > > Here's the bug for failure: > > ??? JDK-8205957 setfldw001/TestDescription.java fails with bad field > value > ??? https://bugs.openjdk.java.net/browse/JDK-8205957 > > and here's the bug for the ProblemListing: > > ??? 
JDK-8247495 ProblemList > vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java > ??? https://bugs.openjdk.java.net/browse/JDK-8247495 > > I'm considering this a trivial change so I need a single (R)eviewer. > > Here's the context diff for the change: > > $ hg diff > diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt > --- a/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 09:31:08 2020 > -0700 > +++ b/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 12:40:17 2020 > -0400 > @@ -141,6 +141,7 @@ > ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java > 8219652 aix-ppc64 > ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java > 8219652 aix-ppc64 > ?vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java > 8219652 aix-ppc64 > +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java > 8205957 generic-all > > ?vmTestbase/gc/lock/jni/jnilock002/TestDescription.java > 8208243,8192647 generic-all > > > This issue is actually much older than JDK-8205957 would indicate > (first sighting in JDK11 for that bug ID). The older version of > the test is covered by https://bugs.openjdk.java.net/browse/JDK-6528079 > and that failures first sighting is in JDK7. > > > Thanks, in advance, for any comments, questions, or suggestions. > > Dan > From yumin.qi at oracle.com Fri Jun 12 18:51:17 2020 From: yumin.qi at oracle.com (Yumin Qi) Date: Fri, 12 Jun 2020 11:51:17 -0700 Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java In-Reply-To: <7cbde455-72ca-4d8a-b4fa-15096a9b1a87@oracle.com> References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> <7cbde455-72ca-4d8a-b4fa-15096a9b1a87@oracle.com> Message-ID: <53f6cede-0aaf-8e66-a93a-033ffeb778c0@oracle.com> Hi, Dan ? Looks good to me and it is trivial. Thanks Yumin On 6/12/20 11:40 AM, Daniel D. Daugherty wrote: > Tap, tap, tap... is this thing working? 
> > Anyone out there? This is a trivial 1-liner review... > > Dan > > > On 6/12/20 12:46 PM, Daniel D. Daugherty wrote: >> Greetings, >> >> It's time to reduce the noise in the CI so I'm ProblemListing tests. >> >> Here's the bug for failure: >> >> ??? JDK-8205957 setfldw001/TestDescription.java fails with bad field >> value >> ??? https://bugs.openjdk.java.net/browse/JDK-8205957 >> >> and here's the bug for the ProblemListing: >> >> ??? JDK-8247495 ProblemList >> vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >> ??? https://bugs.openjdk.java.net/browse/JDK-8247495 >> >> I'm considering this a trivial change so I need a single (R)eviewer. >> >> Here's the context diff for the change: >> >> $ hg diff >> diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt >> --- a/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 09:31:08 2020 >> -0700 >> +++ b/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 12:40:17 2020 >> -0400 >> @@ -141,6 +141,7 @@ >> ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java >> 8219652 aix-ppc64 >> ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java >> 8219652 aix-ppc64 >> ?vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java >> 8219652 aix-ppc64 >> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >> 8205957 generic-all >> >> ?vmTestbase/gc/lock/jni/jnilock002/TestDescription.java >> 8208243,8192647 generic-all >> >> >> This issue is actually much older than JDK-8205957 would indicate >> (first sighting in JDK11 for that bug ID). The older version of >> the test is covered by https://bugs.openjdk.java.net/browse/JDK-6528079 >> and that failures first sighting is in JDK7. >> >> >> Thanks, in advance, for any comments, questions, or suggestions. 
>> >> Dan >> > From chris.plummer at oracle.com Fri Jun 12 18:49:03 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Fri, 12 Jun 2020 11:49:03 -0700 Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java In-Reply-To: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> Message-ID: Hi Dan, What's the criteria for "noise". I don't consider the failures for this test as noisy. I only see 3 in mach5 CI testing for all of JDK 15. JDK 14 does? appear to have been somewhat noisy, possibly enough so that it looks like maybe something changed to reduce the number of failures in 15. In any case, do you plan on backporting to 14? thanks, Chris On 6/12/20 9:46 AM, Daniel D. Daugherty wrote: > Greetings, > > It's time to reduce the noise in the CI so I'm ProblemListing tests. > > Here's the bug for failure: > > ??? JDK-8205957 setfldw001/TestDescription.java fails with bad field > value > ??? https://bugs.openjdk.java.net/browse/JDK-8205957 > > and here's the bug for the ProblemListing: > > ??? JDK-8247495 ProblemList > vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java > ??? https://bugs.openjdk.java.net/browse/JDK-8247495 > > I'm considering this a trivial change so I need a single (R)eviewer. > > Here's the context diff for the change: > > $ hg diff > diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt > --- a/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 09:31:08 2020 > -0700 > +++ b/test/hotspot/jtreg/ProblemList.txt??? 
Fri Jun 12 12:40:17 2020 > -0400 > @@ -141,6 +141,7 @@ > ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java > 8219652 aix-ppc64 > ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java > 8219652 aix-ppc64 > ?vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java > 8219652 aix-ppc64 > +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java > 8205957 generic-all > > ?vmTestbase/gc/lock/jni/jnilock002/TestDescription.java > 8208243,8192647 generic-all > > > This issue is actually much older than JDK-8205957 would indicate > (first sighting in JDK11 for that bug ID). The older version of > the test is covered by https://bugs.openjdk.java.net/browse/JDK-6528079 > and that failures first sighting is in JDK7. > > > Thanks, in advance, for any comments, questions, or suggestions. > > Dan > From daniel.daugherty at oracle.com Fri Jun 12 18:52:56 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 12 Jun 2020 14:52:56 -0400 Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java In-Reply-To: References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> Message-ID: <3ced73df-552a-e4a8-a1b8-f06372d19a43@oracle.com> On 6/12/20 2:49 PM, Chris Plummer wrote: > Hi Dan, > > What's the criteria for "noise". There is no specific criteria that I'm aware of. It popped up in today's JDK15 testing so it got on my radar (again). > I don't consider the failures for this test as noisy. I only see 3 in > mach5 CI testing for all of JDK 15. JDK 14 does? appear to have been > somewhat noisy, possibly enough so that it looks like maybe something > changed to reduce the number of failures in 15. In any case, do you > plan on backporting to 14? This failure has been around in one form or another since JDK7. If someone decides to fix it, then they can un-ProblemList it. I'm planning to push it to JDK15 and JDK16. 
Those two releases are the focus of my CI noise reduction efforts. I don't monitor the JDK14u CI... May I proceed with the ProblemListing? Dan > > thanks, > > Chris > > On 6/12/20 9:46 AM, Daniel D. Daugherty wrote: >> Greetings, >> >> It's time to reduce the noise in the CI so I'm ProblemListing tests. >> >> Here's the bug for failure: >> >> ??? JDK-8205957 setfldw001/TestDescription.java fails with bad field >> value >> ??? https://bugs.openjdk.java.net/browse/JDK-8205957 >> >> and here's the bug for the ProblemListing: >> >> ??? JDK-8247495 ProblemList >> vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >> ??? https://bugs.openjdk.java.net/browse/JDK-8247495 >> >> I'm considering this a trivial change so I need a single (R)eviewer. >> >> Here's the context diff for the change: >> >> $ hg diff >> diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt >> --- a/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 09:31:08 2020 >> -0700 >> +++ b/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 12:40:17 2020 >> -0400 >> @@ -141,6 +141,7 @@ >> ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java >> 8219652 aix-ppc64 >> ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java >> 8219652 aix-ppc64 >> ?vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java >> 8219652 aix-ppc64 >> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >> 8205957 generic-all >> >> ?vmTestbase/gc/lock/jni/jnilock002/TestDescription.java >> 8208243,8192647 generic-all >> >> >> This issue is actually much older than JDK-8205957 would indicate >> (first sighting in JDK11 for that bug ID). The older version of >> the test is covered by https://bugs.openjdk.java.net/browse/JDK-6528079 >> and that failures first sighting is in JDK7. >> >> >> Thanks, in advance, for any comments, questions, or suggestions. 
>> >> Dan >> > > From daniel.daugherty at oracle.com Fri Jun 12 18:53:30 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 12 Jun 2020 14:53:30 -0400 Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java In-Reply-To: <53f6cede-0aaf-8e66-a93a-033ffeb778c0@oracle.com> References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> <7cbde455-72ca-4d8a-b4fa-15096a9b1a87@oracle.com> <53f6cede-0aaf-8e66-a93a-033ffeb778c0@oracle.com> Message-ID: <7841c9b3-0c83-41af-54dc-c02407e60d56@oracle.com> Yumin, Thanks for the review! Dan On 6/12/20 2:51 PM, Yumin Qi wrote: > Hi, Dan > > ? Looks good to me and it is trivial. > > > Thanks > > Yumin > > On 6/12/20 11:40 AM, Daniel D. Daugherty wrote: >> Tap, tap, tap... is this thing working? >> >> Anyone out there? This is a trivial 1-liner review... >> >> Dan >> >> >> On 6/12/20 12:46 PM, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> It's time to reduce the noise in the CI so I'm ProblemListing tests. >>> >>> Here's the bug for failure: >>> >>> ??? JDK-8205957 setfldw001/TestDescription.java fails with bad field >>> value >>> ??? https://bugs.openjdk.java.net/browse/JDK-8205957 >>> >>> and here's the bug for the ProblemListing: >>> >>> ??? JDK-8247495 ProblemList >>> vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >>> >>> ??? https://bugs.openjdk.java.net/browse/JDK-8247495 >>> >>> I'm considering this a trivial change so I need a single (R)eviewer. >>> >>> Here's the context diff for the change: >>> >>> $ hg diff >>> diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt >>> --- a/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 09:31:08 2020 >>> -0700 >>> +++ b/test/hotspot/jtreg/ProblemList.txt??? 
Fri Jun 12 12:40:17 2020 >>> -0400 >>> @@ -141,6 +141,7 @@ >>> ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java >>> 8219652 aix-ppc64 >>> ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java >>> 8219652 aix-ppc64 >>> ?vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java >>> 8219652 aix-ppc64 >>> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >>> 8205957 generic-all >>> >>> ?vmTestbase/gc/lock/jni/jnilock002/TestDescription.java >>> 8208243,8192647 generic-all >>> >>> >>> This issue is actually much older than JDK-8205957 would indicate >>> (first sighting in JDK11 for that bug ID). The older version of >>> the test is covered by https://bugs.openjdk.java.net/browse/JDK-6528079 >>> and that failures first sighting is in JDK7. >>> >>> >>> Thanks, in advance, for any comments, questions, or suggestions. >>> >>> Dan >>> >> From chris.plummer at oracle.com Fri Jun 12 18:58:07 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Fri, 12 Jun 2020 11:58:07 -0700 Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java In-Reply-To: <3ced73df-552a-e4a8-a1b8-f06372d19a43@oracle.com> References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> <3ced73df-552a-e4a8-a1b8-f06372d19a43@oracle.com> Message-ID: On 6/12/20 11:52 AM, Daniel D. Daugherty wrote: > On 6/12/20 2:49 PM, Chris Plummer wrote: >> Hi Dan, >> >> What's the criteria for "noise". > > There is no specific criteria that I'm aware of. > > It popped up in today's JDK15 testing so it got on my radar (again). > > >> I don't consider the failures for this test as noisy. I only see 3 in >> mach5 CI testing for all of JDK 15. JDK 14 does? appear to have been >> somewhat noisy, possibly enough so that it looks like maybe something >> changed to reduce the number of failures in 15. In any case, do you >> plan on backporting to 14? 
> > This failure has been around in one form or another since JDK7. If > someone > decides to fix it, then they can un-ProblemList it. > > I'm planning to push it to JDK15 and JDK16. Those two releases are the > focus > of my CI noise reduction efforts. I don't monitor the JDK14u CI... > > May I proceed with the ProblemListing? I just don't feel if we problem list tests with this failure rate that in the long run it is a productive or good thing to do. 3 failures during an entire 6 month CI test cycle seems rather low to me. I'd like to get opinions from others. Chris > > Dan > >> >> thanks, >> >> Chris >> >> On 6/12/20 9:46 AM, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> It's time to reduce the noise in the CI so I'm ProblemListing tests. >>> >>> Here's the bug for failure: >>> >>> ??? JDK-8205957 setfldw001/TestDescription.java fails with bad field >>> value >>> ??? https://bugs.openjdk.java.net/browse/JDK-8205957 >>> >>> and here's the bug for the ProblemListing: >>> >>> ??? JDK-8247495 ProblemList >>> vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >>> >>> ??? https://bugs.openjdk.java.net/browse/JDK-8247495 >>> >>> I'm considering this a trivial change so I need a single (R)eviewer. >>> >>> Here's the context diff for the change: >>> >>> $ hg diff >>> diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt >>> --- a/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 09:31:08 2020 >>> -0700 >>> +++ b/test/hotspot/jtreg/ProblemList.txt??? 
Fri Jun 12 12:40:17 2020 -0400
>>> @@ -141,6 +141,7 @@
>>>  vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java 8219652 aix-ppc64
>>>  vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java 8219652 aix-ppc64
>>>  vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java 8219652 aix-ppc64
>>> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java 8205957 generic-all
>>>
>>>  vmTestbase/gc/lock/jni/jnilock002/TestDescription.java 8208243,8192647 generic-all
>>>
>>>
>>> This issue is actually much older than JDK-8205957 would indicate
>>> (first sighting in JDK11 for that bug ID). The older version of
>>> the test is covered by https://bugs.openjdk.java.net/browse/JDK-6528079
>>> and that failure's first sighting is in JDK7.
>>>
>>>
>>> Thanks, in advance, for any comments, questions, or suggestions.
>>>
>>> Dan
>>>
>>
>>
>

From daniil.x.titov at oracle.com Fri Jun 12 19:09:56 2020
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Fri, 12 Jun 2020 12:09:56 -0700
Subject: RFR: 8247469: getSystemCpuLoad() returns -1 on linux when some offline cpus are present and cpusets.effective_cpus is not available
In-Reply-To:
References:
Message-ID: <97230AF3-579A-461F-AB4C-690F37DDFBCE@oracle.com>

Hi Matthias,

The change looks good to me. It probably also makes sense to remove the method getHostConfiguredCpuCount0() since it is no longer used.

Thanks,
Daniil

On 6/12/20, 8:25 AM, "Baesken, Matthias" wrote:

Hello, please review the following change.

We have a Linux machine where

OperatingSystemMXBean mbean = (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
double load = mbean.getSystemCpuLoad();

returns -1. The reason is that there are offline CPUs (48 configured, 32 online). Additionally, cpusets.effective_cpus is not available on that Linux system.

Bug/webrev :

https://bugs.openjdk.java.net/browse/JDK-8247469
http://cr.openjdk.java.net/~mbaesken/webrevs/8247469.0/

Thanks, Matthias

-----Original Message-----
From: Bob Vandette
Sent: Friday, 12 June 2020 15:02
To: Baesken, Matthias
Cc: daniil.x.titov at oracle.com
Subject: Re: getCpuLoad() / getSystemCpuLoad() returns -1 on linux when some offline cpus are present and cpusets.effective_cpus is not available

It looks like there are two problems here:

1. containerMetrics.getCpuSetCpus().length returns the online CPUs but getHostConfiguredCpuCount0() returns the total number of CPUs including offline ones. One solution might be to add a getHostOnlineCpuCount0() function.

2. If getEffectiveCpuSetCpus is not available then we should use getCpuSetCpus.

Bob.

> On Jun 12, 2020, at 6:43 AM, Baesken, Matthias wrote:
>
> Hello, I noticed the following on one of our Linux machines :
>
> OperatingSystemMXBean mbean = (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
> double load = mbean.getSystemCpuLoad();
>
> returns -1 ; this seems to be related to "8226575: OperatingSystemMXBean should be made container aware" .
>
> This machine has the following "special features"
>
> - a few CPUs are offline (means the configured cpus are 48 but the online cpus are 32) so
>
>
> private boolean isCpuSetSameAsHostCpuSet() {
>     if (containerMetrics != null && containerMetrics.getCpuSetCpus() != null) {
>         return containerMetrics.getCpuSetCpus().length == getHostConfiguredCpuCount0();
>     }
>     return false;
> }
>
> Returns false
>
> - the machine does not have cpusets.effective_cpus (not all Linux machines have it )
>
> In this case getSystemCpuLoad() / getCpuLoad() returns -1 (because it checks that 48 != 32, and next it checks for cpusets.effective_cpus which is not present ).
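
[Editorial sketch] The -1 path described above, together with Bob's second point (use getCpuSetCpus when getEffectiveCpuSetCpus is not available), can be modelled roughly as follows. This is a minimal, self-contained illustration and not the JDK implementation or the webrev under review: the class name `CpuLoadSketch`, the method `averageLoad`, and the `perCpuLoad` parameter (standing in for the native per-CPU load query) are all hypothetical names introduced here.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical model of the cpuset fallback discussed in this thread; not JDK code. */
public class CpuLoadSketch {

    /**
     * Averages the per-CPU load over the effective cpuset. When the effective
     * cpuset is unavailable (null or empty) it falls back to the plain cpuset,
     * which is the assumed fix, instead of giving up immediately.
     * Returns -1 if no CPU list is available or any single CPU load is negative.
     */
    public static double averageLoad(int[] effectiveCpus, int[] cpuSetCpus,
                                     IntToDoubleFunction perCpuLoad) {
        int[] cpus = (effectiveCpus != null && effectiveCpus.length > 0)
                ? effectiveCpus
                : cpuSetCpus; // assumed fallback, following Bob's point 2
        if (cpus == null || cpus.length == 0) {
            return -1;
        }
        double total = 0.0;
        for (int cpu : cpus) {
            double load = perCpuLoad.applyAsDouble(cpu);
            if (load < 0) {
                return -1; // propagate a failed per-CPU reading
            }
            total += load;
        }
        return total / cpus.length;
    }

    public static void main(String[] args) {
        int[] online = {0, 1, 2, 3};
        // Effective cpuset missing: the plain cpuset is used instead of returning -1.
        System.out.println(averageLoad(null, online, cpu -> 0.5));
    }
}
```

Whether the real fix should take this route or simply return the host-wide native load in this situation is exactly the a) versus b) question raised in the original report; the sketch only illustrates option b).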
>
> See the coding at:
> https://hg.openjdk.java.net/jdk/jdk/file/bdc14b8d31ff/src/jdk.management/unix/classes/com/sun/management/internal/OperatingSystemImpl.java#l136
>
>     // If the cpuset is the same as the host's one there is no need to iterate over each CPU
>     if (isCpuSetSameAsHostCpuSet()) {
>         return getCpuLoad0();
>     } else {
>         int[] cpuSet = containerMetrics.getEffectiveCpuSetCpus();
>         if (cpuSet != null && cpuSet.length > 0) {
>             double systemLoad = 0.0;
>             for (int cpu : cpuSet) {
>                 double cpuLoad = getSingleCpuLoad0(cpu);
>                 if (cpuLoad < 0) {
>                     return -1;
>                 }
>                 systemLoad += cpuLoad;
>             }
>             return systemLoad / cpuSet.length;
>         }
>         return -1;
>     }
>
> Would it be better to a) return the native getCpuLoad0() in this case, or b) use the available containerMetrics.getCpuSetCpus() when getEffectiveCpuSetCpus() gives an empty array (btw. getCpuSetCpus() returns on this machine the online cpus = 0,31 = 32)?
>
> I opened
>
> https://bugs.openjdk.java.net/browse/JDK-8247469
>
> to track this (I see the issue in jdk/jdk, but it seems it also came to Oracle jdk8u261, which is the July update).
>
> Best regards, Matthias

From daniel.daugherty at oracle.com Fri Jun 12 19:13:24 2020
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 12 Jun 2020 15:13:24 -0400
Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java
In-Reply-To:
References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> <3ced73df-552a-e4a8-a1b8-f06372d19a43@oracle.com>
Message-ID: <33a37561-d063-fb7d-4430-731bc17738a2@oracle.com>

On 6/12/20 2:58 PM, Chris Plummer wrote:
> On 6/12/20 11:52 AM, Daniel D. Daugherty wrote:
>> On 6/12/20 2:49 PM, Chris Plummer wrote:
>>> Hi Dan,
>>>
>>> What's the criteria for "noise".
>>
>> There is no specific criteria that I'm aware of.
>>
>> It popped up in today's JDK15 testing so it got on my radar (again).
>>
>>
>>> I don't consider the failures for this test as noisy. I only see 3
>>> in mach5 CI testing for all of JDK 15. JDK 14 does appear to have
>>> been somewhat noisy, possibly enough so that it looks like maybe
>>> something changed to reduce the number of failures in 15. In any
>>> case, do you plan on backporting to 14?
>>
>> This failure has been around in one form or another since JDK7. If someone
>> decides to fix it, then they can un-ProblemList it.
>>
>> I'm planning to push it to JDK15 and JDK16. Those two releases are the focus
>> of my CI noise reduction efforts. I don't monitor the JDK14u CI...
>>
>> May I proceed with the ProblemListing?
> I just don't feel if we problem list tests with this failure rate that
> in the long run it is a productive or good thing to do. 3 failures
> during an entire 6 month CI test cycle seems rather low to me. I'd
> like to get opinions from others.

It's not just the failure rate. It's the fact that this bug has sat for
years without being fixed. I have tracked this bug for a very long time
since I'm the guy that filed both bugs.

Mach5 is showing 54 sightings of 8205957 and here's the linking distribution:

$ sort /tmp/fred | uniq -c | sort -rn
  20 daniel.daugherty at oracle.com
  10 rahul.v.raghavan at oracle.com
   7 martin.thompson at oracle.com
   4 leonid.mesnik at oracle.com
   3 jesper.wilhelmsson at oracle.com
   3 chris.plummer at oracle.com
   2 mikael.vidstedt at oracle.com
   1 tobias.hartmann at oracle.com
   1 sangheon.kim at oracle.com
   1 kim.barrett at oracle.com
   1 daniil.x.titov at oracle.com
   1 calvin.cheung at oracle.com

As you can see, I've observed and linked this bug a lot.
I'm tired of it.

Dan

>
> Chris
>>
>> Dan
>>
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>> On 6/12/20 9:46 AM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> It's time to reduce the noise in the CI so I'm ProblemListing tests.
>>>>
>>>> Here's the bug for failure:
>>>>
>>>>     JDK-8205957 setfldw001/TestDescription.java fails with bad field value
>>>>     https://bugs.openjdk.java.net/browse/JDK-8205957
>>>>
>>>> and here's the bug for the ProblemListing:
>>>>
>>>>     JDK-8247495 ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java
>>>>     https://bugs.openjdk.java.net/browse/JDK-8247495
>>>>
>>>> I'm considering this a trivial change so I need a single (R)eviewer.
>>>>
>>>> Here's the context diff for the change:
>>>>
>>>> $ hg diff
>>>> diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt
>>>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Jun 12 09:31:08 2020 -0700
>>>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Jun 12 12:40:17 2020 -0400
>>>> @@ -141,6 +141,7 @@
>>>>  vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java 8219652 aix-ppc64
>>>>  vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java 8219652 aix-ppc64
>>>>  vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java 8219652 aix-ppc64
>>>> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java 8205957 generic-all
>>>>
>>>>  vmTestbase/gc/lock/jni/jnilock002/TestDescription.java 8208243,8192647 generic-all
>>>>
>>>>
>>>> This issue is actually much older than JDK-8205957 would indicate
>>>> (first sighting in JDK11 for that bug ID). The older version of
>>>> the test is covered by
>>>> https://bugs.openjdk.java.net/browse/JDK-6528079
>>>> and that failure's first sighting is in JDK7.
>>>>
>>>>
>>>> Thanks, in advance, for any comments, questions, or suggestions.
>>>>
>>>> Dan
>>>>
>>>
>>>
>>
>
>

From mandy.chung at oracle.com Fri Jun 12 20:43:43 2020
From: mandy.chung at oracle.com (Mandy Chung)
Date: Fri, 12 Jun 2020 13:43:43 -0700
Subject: RFR: 8242328: Update mentions of ThreadMBean to ThreadMXBean
In-Reply-To:
References: <7dd150a5-ebea-b91a-407b-17a69855f387@oracle.com> <10b069a1-4888-531f-a61e-aa7594cdf67f@oracle.com>
Message-ID: <9f7edd05-ecf9-d599-53af-a2cfad7f6173@oracle.com>

On 6/12/20 12:27 AM, David Holmes wrote:
>
> No. There is no MXBeanServer.
>
> This attempts to shed some light on everything:
>
> https://docs.oracle.com/en/java/javase/14/docs/api/java.management/javax/management/MXBean.html
>
>
> "An MXBean is a kind of MBean. An MXBean object can be registered
> directly in the MBean Server, or it can be used as an argument to
> StandardMBean and the resultant MBean registered in the MBean Server."

MXBean was added in Java SE 6 to support the management bean API
changes and interoperability.

For a remote client to access an MBean, all classes referenced by the
MBean interface must be observable in the remote client. If an MBean is
updated to reference type X, a remote client attempting to access this
MBean may fail if X is not present.

An MXBean is a type of open MBean, which means that a remote client
accesses an MXBean using the predefined set of open types. For example,
if ThreadMXBean were modified to access a new type, say StackFrame, in
JDK N, a remote client can continue to run on an older JDK while it's
able to monitor a JVM running on a newer release N using the open types
(javax.management.openmbean.* types).

Mandy
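A small sketch of what access through open types looks like in practice, assuming the client talks to the platform MBeanServer of its own JVM rather than a remote connection. The class and method names are illustrative only; the point is that the result arrives as a CompositeData, so the caller does not need ThreadInfo (or any newer type) on its class path.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

public class OpenTypeThreadInfo {

    // Invokes ThreadMXBean.getThreadInfo(long) by name through the MBeanServer.
    // Because ThreadMXBean is an MXBean, the ThreadInfo result is mapped to
    // the open type CompositeData.
    public static CompositeData threadInfoAsOpenType(long threadId) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName threading = new ObjectName(ManagementFactory.THREAD_MXBEAN_NAME);
        Object result = server.invoke(
                threading,
                "getThreadInfo",
                new Object[] { threadId },
                new String[] { long.class.getName() });
        return (CompositeData) result;
    }

    public static void main(String[] args) throws Exception {
        CompositeData info = threadInfoAsOpenType(Thread.currentThread().getId());
        // Items are looked up by name; a client simply ignores items it does not know.
        System.out.println(info.get("threadName") + " is " + info.get("threadState"));
    }
}
```

A remote client would make the same `invoke` call on an MBeanServerConnection obtained from a JMX connector.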

From chris.plummer at oracle.com Fri Jun 12 20:48:58 2020
From: chris.plummer at oracle.com (Chris Plummer)
Date: Fri, 12 Jun 2020 13:48:58 -0700
Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java
In-Reply-To: <33a37561-d063-fb7d-4430-731bc17738a2@oracle.com>
References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> <3ced73df-552a-e4a8-a1b8-f06372d19a43@oracle.com> <33a37561-d063-fb7d-4430-731bc17738a2@oracle.com>
Message-ID:

On 6/12/20 12:13 PM, Daniel D. Daugherty wrote:
> On 6/12/20 2:58 PM, Chris Plummer wrote:
>> On 6/12/20 11:52 AM, Daniel D. Daugherty wrote:
>>> On 6/12/20 2:49 PM, Chris Plummer wrote:
>>>> Hi Dan,
>>>>
>>>> What's the criteria for "noise".
>>>
>>> There is no specific criteria that I'm aware of.
>>>
>>> It popped up in today's JDK15 testing so it got on my radar (again).
>>>
>>>
>>>> I don't consider the failures for this test as noisy. I only see 3
>>>> in mach5 CI testing for all of JDK 15. JDK 14 does appear to have
>>>> been somewhat noisy, possibly enough so that it looks like maybe
>>>> something changed to reduce the number of failures in 15. In any
>>>> case, do you plan on backporting to 14?
>>>
>>> This failure has been around in one form or another since JDK7. If someone
>>> decides to fix it, then they can un-ProblemList it.
>>>
>>> I'm planning to push it to JDK15 and JDK16. Those two releases are the focus
>>> of my CI noise reduction efforts. I don't monitor the JDK14u CI...
>>>
>>> May I proceed with the ProblemListing?
>> I just don't feel if we problem list tests with this failure rate that
>> in the long run it is a productive or good thing to do. 3 failures
>> during an entire 6 month CI test cycle seems rather low to me. I'd
>> like to get opinions from others.
>
> It's not just the failure rate. It's the fact that this bug has sat for
> years without being fixed. I have tracked this bug for a very long time
> since I'm the guy that filed both bugs.
>
> Mach5 is showing 54 sightings of 8205957 and here's the linking distribution:
>
> $ sort /tmp/fred | uniq -c | sort -rn
>   20 daniel.daugherty at oracle.com
>   10 rahul.v.raghavan at oracle.com
>    7 martin.thompson at oracle.com
>    4 leonid.mesnik at oracle.com
>    3 jesper.wilhelmsson at oracle.com
>    3 chris.plummer at oracle.com
>    2 mikael.vidstedt at oracle.com
>    1 tobias.hartmann at oracle.com
>    1 sangheon.kim at oracle.com
>    1 kim.barrett at oracle.com
>    1 daniil.x.titov at oracle.com
>    1 calvin.cheung at oracle.com
>
> As you can see, I've observed and linked this bug a lot.
> I'm tired of it.
I still think that what is most relevant is how often it reproduces
with CI with the current release, and for that the # is 3 times in 6
months. In our current test history it's failed 6 out of 21390 runs,
so you are disabling a test that passes 99.97% of the time. My concern
is that if a bug is introduced that makes it start failing every run,
or at least very frequently, it will be missed. We need to carefully
weigh the annoyance of failure noise with the importance of test
coverage. I don't think the balance is right for this test to justify
problem listing it.

What you might want to consider is disabling it in the mode where it
seems to be failing. The failures all seem to be with -Xcomp. Maybe
you should just problem list it in ProblemList-Xcomp.txt.

Chris
>
> Dan
>
>
>>
>> Chris
>>>
>>> Dan
>>>
>>>>
>>>> thanks,
>>>>
>>>> Chris
>>>>
>>>> On 6/12/20 9:46 AM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> It's time to reduce the noise in the CI so I'm ProblemListing tests.
>>>>>
>>>>> Here's the bug for failure:
>>>>>
>>>>>     JDK-8205957 setfldw001/TestDescription.java fails with bad field value
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8205957
>>>>>
>>>>> and here's the bug for the ProblemListing:
>>>>>
>>>>>     JDK-8247495 ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8247495
>>>>>
>>>>> I'm considering this a trivial change so I need a single (R)eviewer.
>>>>>
>>>>> Here's the context diff for the change:
>>>>>
>>>>> $ hg diff
>>>>> diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt
>>>>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Jun 12 09:31:08 2020 -0700
>>>>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Jun 12 12:40:17 2020 -0400
>>>>> @@ -141,6 +141,7 @@
>>>>>  vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java 8219652 aix-ppc64
>>>>>  vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java 8219652 aix-ppc64
>>>>>  vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java 8219652 aix-ppc64
>>>>> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java 8205957 generic-all
>>>>>
>>>>>  vmTestbase/gc/lock/jni/jnilock002/TestDescription.java 8208243,8192647 generic-all
>>>>>
>>>>>
>>>>> This issue is actually much older than JDK-8205957 would indicate
>>>>> (first sighting in JDK11 for that bug ID). The older version of
>>>>> the test is covered by
>>>>> https://bugs.openjdk.java.net/browse/JDK-6528079
>>>>> and that failure's first sighting is in JDK7.
>>>>>
>>>>>
>>>>> Thanks, in advance, for any comments, questions, or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>
>>>>
>>>
>>
>>
>

From serguei.spitsyn at oracle.com Fri Jun 12 20:59:07 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Fri, 12 Jun 2020 13:59:07 -0700
Subject: RFR: 8242328: Update mentions of ThreadMBean to ThreadMXBean
In-Reply-To:
References: <7dd150a5-ebea-b91a-407b-17a69855f387@oracle.com> <10b069a1-4888-531f-a61e-aa7594cdf67f@oracle.com>
Message-ID: <4d2caadb-111e-9366-1571-2cc034e46dbe@oracle.com>

Thanks, David!
Serguei

On 6/12/20 00:27, David Holmes wrote:
> On 12/06/2020 4:02 pm, serguei.spitsyn at oracle.com wrote:
>> Hi Leonid,
>>
>> It looks okay to me.
>>
>>  > I find this whole MBean vs MXBean terminology very confusing. :)
>>
>> Me too. :)
>> For instance, I see some references to MBeanServer (should they also
>> be replaced with MXBeanServer?):
>
> No. There is no MXBeanServer.
>
> This attempts to shed some light on everything:
>
> https://docs.oracle.com/en/java/javase/14/docs/api/java.management/javax/management/MXBean.html
>
>
> "An MXBean is a kind of MBean. An MXBean object can be registered
> directly in the MBean Server, or it can be used as an argument to
> StandardMBean and the resultant MBean registered in the MBean Server."
>
> Cheers,
> David
> -----
>
>> http://cr.openjdk.java.net/~lmesnik/8242328/webrev.00/test/hotspot/jtreg/vmTestbase/nsk/monitoring/CompilationMXBean/comptimemon002/TestDescription.java.udiff.html
>>
>>
>>   *     The test checks that
>> - *     CompilationMBean.isCompilationTimeMonitoringSupported()
>> + *     CompilationMXBean.isCompilationTimeMonitoringSupported()
>>   *     method returns true. The test performs access to management metrics
>>   *     through default *MBeanServer*.
>>
>>
>> http://cr.openjdk.java.net/~lmesnik/8242328/webrev.00/test/hotspot/jtreg/vmTestbase/nsk/monitoring/CompilationMXBean/comptimemon003/TestDescription.java.udiff.html
>>
>>
>>   *     The test checks that
>> - *     CompilationMBean.isCompilationTimeMonitoringSupported()
>> + *     CompilationMXBean.isCompilationTimeMonitoringSupported()
>>   *     method returns true. The test performs access to management metrics
>>   *     through custom *MBeanServer* (developed and saved in
>>
>>
>> http://cr.openjdk.java.net/~lmesnik/8242328/webrev.00/test/hotspot/jtreg/vmTestbase/nsk/monitoring/CompilationMXBean/comptimemon004/TestDescription.java.udiff.html
>>
>>
>>   * DESCRIPTION
>>   *     The test checks that
>> - *     CompilationMBean.isCompilationTimeMonitoringSupported()
>> + *     CompilationMXBean.isCompilationTimeMonitoringSupported()
>>   *     method returns true. The test performs access to management metrics
>>   *     through default *MBeanServer* proxy.
>>
>>
>> http://cr.openjdk.java.net/~lmesnik/8242328/webrev.00/test/hotspot/jtreg/vmTestbase/nsk/monitoring/CompilationMXBean/comptimemon005/TestDescription.java.udiff.html
>>
>>
>>   *     The test checks that
>> - *     CompilationMBean.isCompilationTimeMonitoringSupported()
>> + *     CompilationMXBean.isCompilationTimeMonitoringSupported()
>>   *     method returns true. The test performs access to management metrics
>>   *     through custom *MBeanServer* proxy (developed and saved in
>>
>>
>> Thanks,
>> Serguei
>>
>>
>> On 6/11/20 16:48, David Holmes wrote:
>>> Hi Leonid,
>>>
>>> On 12/06/2020 7:09 am, Leonid Mesnik wrote:
>>>> Hi
>>>>
>>>> Could you review following fix which change leftovers of
>>>> ThreadMBean to ThreadMXBean. In the most cases the comments were
>>>> updated only.
>>>>
>>>> webrev: http://cr.openjdk.java.net/~lmesnik/8242328/webrev.00/
>>>
>>> Looks good!
>>>
>>> I find this whole MBean vs MXBean terminology very confusing. :)
>>>
>>> Thanks,
>>> David
>>>
>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8242328
>>>>
>>>> Leonid
>>>>
>>

From daniel.daugherty at oracle.com Fri Jun 12 20:59:48 2020
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 12 Jun 2020 16:59:48 -0400
Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java
In-Reply-To:
References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> <3ced73df-552a-e4a8-a1b8-f06372d19a43@oracle.com> <33a37561-d063-fb7d-4430-731bc17738a2@oracle.com>
Message-ID:

On 6/12/20 4:48 PM, Chris Plummer wrote:
> On 6/12/20 12:13 PM, Daniel D. Daugherty wrote:
>> On 6/12/20 2:58 PM, Chris Plummer wrote:
>>> On 6/12/20 11:52 AM, Daniel D. Daugherty wrote:
>>>> On 6/12/20 2:49 PM, Chris Plummer wrote:
>>>>> Hi Dan,
>>>>>
>>>>> What's the criteria for "noise".
>>>>
>>>> There is no specific criteria that I'm aware of.
>>>>
>>>> It popped up in today's JDK15 testing so it got on my radar (again).
>>>>
>>>>
>>>>> I don't consider the failures for this test as noisy. I only see 3
>>>>> in mach5 CI testing for all of JDK 15. JDK 14 does appear to have
>>>>> been somewhat noisy, possibly enough so that it looks like maybe
>>>>> something changed to reduce the number of failures in 15. In any
>>>>> case, do you plan on backporting to 14?
>>>>
>>>> This failure has been around in one form or another since JDK7. If someone
>>>> decides to fix it, then they can un-ProblemList it.
>>>>
>>>> I'm planning to push it to JDK15 and JDK16. Those two releases are the focus
>>>> of my CI noise reduction efforts. I don't monitor the JDK14u CI...
>>>>
>>>> May I proceed with the ProblemListing?
>>> I just don't feel if we problem list tests with this failure rate that
>>> in the long run it is a productive or good thing to do. 3 failures
>>> during an entire 6 month CI test cycle seems rather low to me. I'd
>>> like to get opinions from others.
>>
>> It's not just the failure rate. It's the fact that this bug has sat for
>> years without being fixed. I have tracked this bug for a very long time
>> since I'm the guy that filed both bugs.
>>
>> Mach5 is showing 54 sightings of 8205957 and here's the linking distribution:
>>
>> $ sort /tmp/fred | uniq -c | sort -rn
>>   20 daniel.daugherty at oracle.com
>>   10 rahul.v.raghavan at oracle.com
>>    7 martin.thompson at oracle.com
>>    4 leonid.mesnik at oracle.com
>>    3 jesper.wilhelmsson at oracle.com
>>    3 chris.plummer at oracle.com
>>    2 mikael.vidstedt at oracle.com
>>    1 tobias.hartmann at oracle.com
>>    1 sangheon.kim at oracle.com
>>    1 kim.barrett at oracle.com
>>    1 daniil.x.titov at oracle.com
>>    1 calvin.cheung at oracle.com
>>
>> As you can see, I've observed and linked this bug a lot.
>> I'm tired of it.
> I still think that what is most relevant is how often it reproduces
> with CI with the current release, and for that the # is 3 times in 6
> months. In our current test history it's failed 6 out of 21390 runs,
> so you are disabling a test that passes 99.97% of the time. My concern
> is that if a bug is introduced that makes it start failing every run,
> or at least very frequently, it will be missed. We need to carefully
> weigh the annoyance of failure noise with the importance of test
> coverage. I don't think the balance is right for this test to justify
> problem listing it.
>
> What you might want to consider is disabling it in the mode where it
> seems to be failing. The failures all seem to be with -Xcomp. Maybe
> you should just problem list it in ProblemList-Xcomp.txt.

I didn't notice that this is an -Xcomp only failure. I was able to verify
that fact for 46 of the 54 sightings. For the 8 oldest sightings, the task
name has been lost to the dustbin of time so I can't confirm those.

I can move the entry from test/hotspot/jtreg/ProblemList.txt to
test/hotspot/jtreg/ProblemList-Xcomp.txt.

Here's the context diff:

$ hg diff
diff -r 015533451f4c test/hotspot/jtreg/ProblemList-Xcomp.txt
--- a/test/hotspot/jtreg/ProblemList-Xcomp.txt    Fri Jun 12 09:31:08 2020 -0700
+++ b/test/hotspot/jtreg/ProblemList-Xcomp.txt    Fri Jun 12 16:58:18 2020 -0400
@@ -27,3 +27,4 @@
 #
 #############################################################################

+vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java 8205957 generic-all

Is this acceptable to you?

Dan

>
> Chris
>>
>> Dan
>>
>>
>>>
>>> Chris
>>>>
>>>> Dan
>>>>
>>>>>
>>>>> thanks,
>>>>>
>>>>> Chris
>>>>>
>>>>> On 6/12/20 9:46 AM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> It's time to reduce the noise in the CI so I'm ProblemListing tests.
>>>>>>
>>>>>> Here's the bug for failure:
>>>>>>
>>>>>>     JDK-8205957 setfldw001/TestDescription.java fails with bad field value
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8205957
>>>>>>
>>>>>> and here's the bug for the ProblemListing:
>>>>>>
>>>>>>     JDK-8247495 ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8247495
>>>>>>
>>>>>> I'm considering this a trivial change so I need a single (R)eviewer.
>>>>>>
>>>>>> Here's the context diff for the change:
>>>>>>
>>>>>> $ hg diff
>>>>>> diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt
>>>>>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Jun 12 09:31:08 2020 -0700
>>>>>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Jun 12 12:40:17 2020 -0400
>>>>>> @@ -141,6 +141,7 @@
>>>>>>  vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java 8219652 aix-ppc64
>>>>>>  vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java 8219652 aix-ppc64
>>>>>>  vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java 8219652 aix-ppc64
>>>>>> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java 8205957 generic-all
>>>>>>
>>>>>>  vmTestbase/gc/lock/jni/jnilock002/TestDescription.java 8208243,8192647 generic-all
>>>>>>
>>>>>>
>>>>>> This issue is actually much older than JDK-8205957 would indicate
>>>>>> (first sighting in JDK11 for that bug ID). The older version of
>>>>>> the test is covered by
>>>>>> https://bugs.openjdk.java.net/browse/JDK-6528079
>>>>>> and that failure's first sighting is in JDK7.
>>>>>>
>>>>>>
>>>>>> Thanks, in advance, for any comments, questions, or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>

From chris.plummer at oracle.com Fri Jun 12 21:20:17 2020
From: chris.plummer at oracle.com (Chris Plummer)
Date: Fri, 12 Jun 2020 14:20:17 -0700
Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java
In-Reply-To:
References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> <3ced73df-552a-e4a8-a1b8-f06372d19a43@oracle.com> <33a37561-d063-fb7d-4430-731bc17738a2@oracle.com>
Message-ID: <5fecf0a5-5bc0-e6de-4708-9535fbcad015@oracle.com>

On 6/12/20 1:59 PM, Daniel D. Daugherty wrote:
> On 6/12/20 4:48 PM, Chris Plummer wrote:
>> On 6/12/20 12:13 PM, Daniel D. Daugherty wrote:
>>> On 6/12/20 2:58 PM, Chris Plummer wrote:
>>>> On 6/12/20 11:52 AM, Daniel D. Daugherty wrote:
>>>>> On 6/12/20 2:49 PM, Chris Plummer wrote:
>>>>>> Hi Dan,
>>>>>>
>>>>>> What's the criteria for "noise".
>>>>>
>>>>> There is no specific criteria that I'm aware of.
>>>>>
>>>>> It popped up in today's JDK15 testing so it got on my radar (again).
>>>>>
>>>>>
>>>>>> I don't consider the failures for this test as noisy. I only see 3
>>>>>> in mach5 CI testing for all of JDK 15. JDK 14 does appear to have
>>>>>> been somewhat noisy, possibly enough so that it looks like maybe
>>>>>> something changed to reduce the number of failures in 15. In any
>>>>>> case, do you plan on backporting to 14?
>>>>>
>>>>> This failure has been around in one form or another since JDK7. If someone
>>>>> decides to fix it, then they can un-ProblemList it.
>>>>>
>>>>> I'm planning to push it to JDK15 and JDK16. Those two releases are the focus
>>>>> of my CI noise reduction efforts. I don't monitor the JDK14u CI...
>>>>>
>>>>> May I proceed with the ProblemListing?
>>>> I just don't feel if we problem list tests with this failure rate that
>>>> in the long run it is a productive or good thing to do. 3 failures
>>>> during an entire 6 month CI test cycle seems rather low to me. I'd
>>>> like to get opinions from others.
>>>
>>> It's not just the failure rate. It's the fact that this bug has sat for
>>> years without being fixed. I have tracked this bug for a very long time
>>> since I'm the guy that filed both bugs.
>>>
>>> Mach5 is showing 54 sightings of 8205957 and here's the linking distribution:
>>>
>>> $ sort /tmp/fred | uniq -c | sort -rn
>>>   20 daniel.daugherty at oracle.com
>>>   10 rahul.v.raghavan at oracle.com
>>>    7 martin.thompson at oracle.com
>>>    4 leonid.mesnik at oracle.com
>>>    3 jesper.wilhelmsson at oracle.com
>>>    3 chris.plummer at oracle.com
>>>    2 mikael.vidstedt at oracle.com
>>>    1 tobias.hartmann at oracle.com
>>>    1 sangheon.kim at oracle.com
>>>    1 kim.barrett at oracle.com
>>>    1 daniil.x.titov at oracle.com
>>>    1 calvin.cheung at oracle.com
>>>
>>> As you can see, I've observed and linked this bug a lot.
>>> I'm tired of it.
>> I still think that what is most relevant is how often it reproduces
>> with CI with the current release, and for that the # is 3 times in 6
>> months. In our current test history it's failed 6 out of 21390 runs,
>> so you are disabling a test that passes 99.97% of the time. My concern
>> is that if a bug is introduced that makes it start failing every run,
>> or at least very frequently, it will be missed. We need to carefully
>> weigh the annoyance of failure noise with the importance of test
>> coverage. I don't think the balance is right for this test to justify
>> problem listing it.
>>
>> What you might want to consider is disabling it in the mode where it
>> seems to be failing. The failures all seem to be with -Xcomp. Maybe
>> you should just problem list it in ProblemList-Xcomp.txt.
>
> I didn't notice that this is an -Xcomp only failure. I was able to verify
> that fact for 46 of the 54 sightings. For the 8 oldest sightings, the task
> name has been lost to the dustbin of time so I can't confirm those.
>
> I can move the entry from test/hotspot/jtreg/ProblemList.txt to
> test/hotspot/jtreg/ProblemList-Xcomp.txt.
>
> Here's the context diff:
>
> $ hg diff
> diff -r 015533451f4c test/hotspot/jtreg/ProblemList-Xcomp.txt
> --- a/test/hotspot/jtreg/ProblemList-Xcomp.txt    Fri Jun 12 09:31:08 2020 -0700
> +++ b/test/hotspot/jtreg/ProblemList-Xcomp.txt    Fri Jun 12 16:58:18 2020 -0400
> @@ -27,3 +27,4 @@
>  #
>  #############################################################################
>
> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java 8205957 generic-all
>
> Is this acceptable to you?
Yes, that works for me.

Chris
>
> Dan
>
>
>
>>
>> Chris
>>>
>>> Dan
>>>
>>>
>>>>
>>>> Chris
>>>>>
>>>>> Dan
>>>>>
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> On 6/12/20 9:46 AM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> It's time to reduce the noise in the CI so I'm ProblemListing tests.
>>>>>>>
>>>>>>> Here's the bug for failure:
>>>>>>>
>>>>>>>     JDK-8205957 setfldw001/TestDescription.java fails with bad field value
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8205957
>>>>>>>
>>>>>>> and here's the bug for the ProblemListing:
>>>>>>>
>>>>>>>     JDK-8247495 ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8247495
>>>>>>>
>>>>>>> I'm considering this a trivial change so I need a single (R)eviewer.
>>>>>>>
>>>>>>> Here's the context diff for the change:
>>>>>>>
>>>>>>> $ hg diff
>>>>>>> diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt
>>>>>>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Jun 12 09:31:08 2020 -0700
>>>>>>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Jun 12 12:40:17 2020 -0400
>>>>>>> @@ -141,6 +141,7 @@
>>>>>>>  vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java 8219652 aix-ppc64
>>>>>>>  vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java 8219652 aix-ppc64
>>>>>>>  vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java 8219652 aix-ppc64
>>>>>>> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java 8205957 generic-all
>>>>>>>
>>>>>>>  vmTestbase/gc/lock/jni/jnilock002/TestDescription.java 8208243,8192647 generic-all
>>>>>>>
>>>>>>>
>>>>>>> This issue is actually much older than JDK-8205957 would indicate
>>>>>>> (first sighting in JDK11 for that bug ID). The older version of
>>>>>>> the test is covered by
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6528079
>>>>>>> and that failure's first sighting is in JDK7.
>>>>>>>
>>>>>>>
>>>>>>> Thanks, in advance, for any comments, questions, or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>

From daniel.daugherty at oracle.com Fri Jun 12 21:25:02 2020
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 12 Jun 2020 17:25:02 -0400
Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java
In-Reply-To: <5fecf0a5-5bc0-e6de-4708-9535fbcad015@oracle.com>
References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> <3ced73df-552a-e4a8-a1b8-f06372d19a43@oracle.com> <33a37561-d063-fb7d-4430-731bc17738a2@oracle.com> <5fecf0a5-5bc0-e6de-4708-9535fbcad015@oracle.com>
Message-ID: <33f4b0bb-28f1-adf4-f730-0f1a5841373f@oracle.com>

On 6/12/20 5:20 PM, Chris Plummer wrote:
> On 6/12/20 1:59 PM, Daniel D. Daugherty wrote:
>> On 6/12/20 4:48 PM, Chris Plummer wrote:
>>> On 6/12/20 12:13 PM, Daniel D. Daugherty wrote:
>>>> On 6/12/20 2:58 PM, Chris Plummer wrote:
>>>>> On 6/12/20 11:52 AM, Daniel D. Daugherty wrote:
>>>>>> On 6/12/20 2:49 PM, Chris Plummer wrote:
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> What's the criteria for "noise".
>>>>>>
>>>>>> There is no specific criteria that I'm aware of.
>>>>>>
>>>>>> It popped up in today's JDK15 testing so it got on my radar (again).
>>>>>>
>>>>>>
>>>>>>> I don't consider the failures for this test as noisy. I only see 3
>>>>>>> in mach5 CI testing for all of JDK 15. JDK 14 does appear to have
>>>>>>> been somewhat noisy, possibly enough so that it looks like maybe
>>>>>>> something changed to reduce the number of failures in 15. In any
>>>>>>> case, do you plan on backporting to 14?
>>>>>>
>>>>>> This failure has been around in one form or another since JDK7. If someone
>>>>>> decides to fix it, then they can un-ProblemList it.
>>>>>>
>>>>>> I'm planning to push it to JDK15 and JDK16. Those two releases are the focus
>>>>>> of my CI noise reduction efforts. I don't monitor the JDK14u CI...
>>>>>>
>>>>>> May I proceed with the ProblemListing?
>>>>> I just don't feel if we problem list tests with this failure rate that
>>>>> in the long run it is a productive or good thing to do. 3 failures
>>>>> during an entire 6 month CI test cycle seems rather low to me. I'd
>>>>> like to get opinions from others.
>>>>
>>>> It's not just the failure rate. It's the fact that this bug has sat for
>>>> years without being fixed. I have tracked this bug for a very long time
>>>> since I'm the guy that filed both bugs.
>>>>
>>>> Mach5 is showing 54 sightings of 8205957 and here's the linking distribution:
>>>>
>>>> $ sort /tmp/fred | uniq -c | sort -rn
>>>>   20 daniel.daugherty at oracle.com
>>>>   10 rahul.v.raghavan at oracle.com
>>>>    7 martin.thompson at oracle.com
>>>>    4 leonid.mesnik at oracle.com
>>>>    3 jesper.wilhelmsson at oracle.com
>>>>    3 chris.plummer at oracle.com
>>>>    2 mikael.vidstedt at oracle.com
>>>>    1 tobias.hartmann at oracle.com
>>>>    1 sangheon.kim at oracle.com
>>>>    1 kim.barrett at oracle.com
>>>>    1 daniil.x.titov at oracle.com
>>>>    1 calvin.cheung at oracle.com
>>>>
>>>> As you can see, I've observed and linked this bug a lot.
>>>> I'm tired of it.
>>> I still think that what is most relevant is how often it reproduces
>>> with CI with the current release, and for that the # is 3 times in 6
>>> months. In our current test history it's failed 6 out of 21390 runs,
>>> so you are disabling a test that passes 99.97% of the time. My concern
>>> is that if a bug is introduced that makes it start failing every run,
>>> or at least very frequently, it will be missed. We need to carefully
>>> weigh the annoyance of failure noise with the importance of test
>>> coverage. I don't think the balance is right for this test to justify
>>> problem listing it.
>>>
>>> What you might want to consider is disabling it in the mode where it
>>> seems to be failing. The failures all seem to be with -Xcomp. Maybe
>>> you should just problem list it in ProblemList-Xcomp.txt.
>>
>> I didn't notice that this is an -Xcomp only failure. I was able to verify
>> that fact for 46 of the 54 sightings. For the 8 oldest sightings, the task
>> name has been lost to the dustbin of time so I can't confirm those.
>>
>> I can move the entry from test/hotspot/jtreg/ProblemList.txt to
>> test/hotspot/jtreg/ProblemList-Xcomp.txt.
>>
>> Here's the context diff:
>>
>> $ hg diff
>> diff -r 015533451f4c test/hotspot/jtreg/ProblemList-Xcomp.txt
>> --- a/test/hotspot/jtreg/ProblemList-Xcomp.txt    Fri Jun 12 09:31:08 2020 -0700
>> +++ b/test/hotspot/jtreg/ProblemList-Xcomp.txt    Fri Jun 12 16:58:18 2020 -0400
>> @@ -27,3 +27,4 @@
>>  #
>>  #############################################################################
>>
>> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java 8205957 generic-all
>>
>> Is this acceptable to you?
> Yes, that works for me.

Thanks!
Dan

>
> Chris
>>
>> Dan
>>
>>>
>>> Chris
>>>>
>>>> Dan
>>>>
>>>>>
>>>>> Chris
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On 6/12/20 9:46 AM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> It's time to reduce the noise in the CI so I'm ProblemListing
>>>>>>>> tests.
>>>>>>>>
>>>>>>>> Here's the bug for failure:
>>>>>>>>
>>>>>>>>     JDK-8205957 setfldw001/TestDescription.java fails with bad field value
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8205957
>>>>>>>>
>>>>>>>> and here's the bug for the ProblemListing:
>>>>>>>>
>>>>>>>>     JDK-8247495 ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8247495
>>>>>>>>
>>>>>>>> I'm considering this a trivial change so I need a single
>>>>>>>> (R)eviewer.
>>>>>>>>
>>>>>>>> Here's the context diff for the change:
>>>>>>>>
>>>>>>>> $ hg diff
>>>>>>>> diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt
>>>>>>>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Jun 12 09:31:08 2020 -0700
>>>>>>>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Jun 12 12:40:17 2020 -0400
>>>>>>>> @@ -141,6 +141,7 @@
>>>>>>>>  vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java 8219652 aix-ppc64
>>>>>>>>  vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java 8219652 aix-ppc64
>>>>>>>>  vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java 8219652 aix-ppc64
>>>>>>>> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java 8205957 generic-all
>>>>>>>>
>>>>>>>>  vmTestbase/gc/lock/jni/jnilock002/TestDescription.java 8208243,8192647 generic-all
>>>>>>>>
>>>>>>>>
>>>>>>>> This issue is actually much older than JDK-8205957 would indicate
>>>>>>>> (first sighting in JDK11 for that bug ID).
The older version of >>>>>>>> the test is covered by >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6528079 >>>>>>>> and that failures first sighting is in JDK7. >>>>>>>> >>>>>>>> >>>>>>>> Thanks, in advance, for any comments, questions, or suggestions. >>>>>>>> >>>>>>>> Dan >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >> > > From serguei.spitsyn at oracle.com Fri Jun 12 22:40:57 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 12 Jun 2020 15:40:57 -0700 Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java In-Reply-To: <5fecf0a5-5bc0-e6de-4708-9535fbcad015@oracle.com> References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> <3ced73df-552a-e4a8-a1b8-f06372d19a43@oracle.com> <33a37561-d063-fb7d-4430-731bc17738a2@oracle.com> <5fecf0a5-5bc0-e6de-4708-9535fbcad015@oracle.com> Message-ID: Hi Dan and Chris, Problem-listing it for Xcomp only looks right to me. Thank you for taking care about it! Thanks, Serguei On 6/12/20 14:20, Chris Plummer wrote: > On 6/12/20 1:59 PM, Daniel D. Daugherty wrote: >> On 6/12/20 4:48 PM, Chris Plummer wrote: >>> On 6/12/20 12:13 PM, Daniel D. Daugherty wrote: >>>> On 6/12/20 2:58 PM, Chris Plummer wrote: >>>>> On 6/12/20 11:52 AM, Daniel D. Daugherty wrote: >>>>>> On 6/12/20 2:49 PM, Chris Plummer wrote: >>>>>>> Hi Dan, >>>>>>> >>>>>>> What's the criteria for "noise". >>>>>> >>>>>> There is no specific criteria that I'm aware of. >>>>>> >>>>>> It popped up in today's JDK15 testing so it got on my radar (again). >>>>>> >>>>>> >>>>>>> I don't consider the failures for this test as noisy. I only see >>>>>>> 3 in mach5 CI testing for all of JDK 15. JDK 14 does? appear to >>>>>>> have been somewhat noisy, possibly enough so that it looks like >>>>>>> maybe something changed to reduce the number of failures in 15. >>>>>>> In any case, do you plan on backporting to 14? 
>>>>>> >>>>>> This failure has been around in one form or another since JDK7. >>>>>> If someone >>>>>> decides to fix it, then they can un-ProblemList it. >>>>>> >>>>>> I'm planning to push it to JDK15 and JDK16. Those two releases >>>>>> are the focus >>>>>> of my CI noise reduction efforts. I don't monitor the JDK14u CI... >>>>>> >>>>>> May I proceed with the ProblemListing? >>>>> I just don't feel if we problem list tests with this failure rate >>>>> that in the long run it is a productive or good thing to do. 3 >>>>> failures during an entire 6 month CI test cycle seems rather low >>>>> to me. I'd like to get opinions from others. >>>> >>>> It's not just the failure rate. It's the fact that this bug has sat >>>> for >>>> years without being fixed. I have tracked this bug for a very long >>>> time >>>> since I'm the guy that filed both bugs. >>>> >>>> Mach5 is showing 54 sightings of 8205957 and here's the linking >>>> distribution: >>>> >>>> $ sort /tmp/fred | uniq -c | sort -rn >>>> ? 20 daniel.daugherty at oracle.com >>>> ? 10 rahul.v.raghavan at oracle.com >>>> ?? 7 martin.thompson at oracle.com >>>> ?? 4 leonid.mesnik at oracle.com >>>> ?? 3 jesper.wilhelmsson at oracle.com >>>> ?? 3 chris.plummer at oracle.com >>>> ?? 2 mikael.vidstedt at oracle.com >>>> ?? 1 tobias.hartmann at oracle.com >>>> ?? 1 sangheon.kim at oracle.com >>>> ?? 1 kim.barrett at oracle.com >>>> ?? 1 daniil.x.titov at oracle.com >>>> ?? 1 calvin.cheung at oracle.com >>>> >>>> As you can see, I've observed and linked this bug a lot. >>>> I'm tired of it. >>> I still think that what is most relevant is how often it reproduces >>> with CI with the current release, and for that the # is 3 times in 6 >>> months. In our current test history it's failed 6 out of 21390 runs, >>> so you are disabling a test that passes 99.97% of the time. My >>> concern is that if a bug is introduced that makes it start failing >>> every run, or at least very frequently, it will be missed. 
We need >>> to carefully weigh the annoyance of failure noise with the >>> importance of test coverage. I don't think the balance is right for >>> this test to justify problem listing it. >>> >>> What you might want to consider is disabling it in the mode where it >>> seems to be failing. The failures all seem to be with -Xcomp. Maybe >>> you should just problem list it in ProblemList-Xcomp.txt. >> >> I didn't notice that this is an -Xcomp only failure. I was able to >> verify >> that fact for 46 of the 54 sightings. For the 8 oldest sightings, the >> task >> name has been lost to the dustbin of time so I can't confirm those. >> >> I can move the entry from test/hotspot/jtreg/ProblemList.txt to >> test/hotspot/jtreg/ProblemList-Xcomp.txt. >> >> Here's the context diff: >> >> $ hg diff >> diff -r 015533451f4c test/hotspot/jtreg/ProblemList-Xcomp.txt >> --- a/test/hotspot/jtreg/ProblemList-Xcomp.txt??? Fri Jun 12 09:31:08 >> 2020 -0700 >> +++ b/test/hotspot/jtreg/ProblemList-Xcomp.txt??? Fri Jun 12 16:58:18 >> 2020 -0400 >> @@ -27,3 +27,4 @@ >> ?# >> ?############################################################################# >> >> >> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >> 8205957 generic-all >> >> >> Is this acceptable to you? > Yes, that works for me. > > Chris >> >> Dan >> >> >> >>> >>> Chris >>>> >>>> Dan >>>> >>>> >>>>> >>>>> Chris >>>>>> >>>>>> Dan >>>>>> >>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> Chris >>>>>>> >>>>>>> On 6/12/20 9:46 AM, Daniel D. Daugherty wrote: >>>>>>>> Greetings, >>>>>>>> >>>>>>>> It's time to reduce the noise in the CI so I'm ProblemListing >>>>>>>> tests. >>>>>>>> >>>>>>>> Here's the bug for failure: >>>>>>>> >>>>>>>> ??? JDK-8205957 setfldw001/TestDescription.java fails with bad >>>>>>>> field value >>>>>>>> ??? https://bugs.openjdk.java.net/browse/JDK-8205957 >>>>>>>> >>>>>>>> and here's the bug for the ProblemListing: >>>>>>>> >>>>>>>> ??? 
JDK-8247495 ProblemList >>>>>>>> vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >>>>>>>> >>>>>>>> ??? https://bugs.openjdk.java.net/browse/JDK-8247495 >>>>>>>> >>>>>>>> I'm considering this a trivial change so I need a single >>>>>>>> (R)eviewer. >>>>>>>> >>>>>>>> Here's the context diff for the change: >>>>>>>> >>>>>>>> $ hg diff >>>>>>>> diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt >>>>>>>> --- a/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 09:31:08 >>>>>>>> 2020 -0700 >>>>>>>> +++ b/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 12:40:17 >>>>>>>> 2020 -0400 >>>>>>>> @@ -141,6 +141,7 @@ >>>>>>>> ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java >>>>>>>> 8219652 aix-ppc64 >>>>>>>> ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java >>>>>>>> 8219652 aix-ppc64 >>>>>>>> ?vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java >>>>>>>> 8219652 aix-ppc64 >>>>>>>> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >>>>>>>> 8205957 generic-all >>>>>>>> >>>>>>>> ?vmTestbase/gc/lock/jni/jnilock002/TestDescription.java >>>>>>>> 8208243,8192647 generic-all >>>>>>>> >>>>>>>> >>>>>>>> This issue is actually much older than JDK-8205957 would indicate >>>>>>>> (first sighting in JDK11 for that bug ID). The older version of >>>>>>>> the test is covered by >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6528079 >>>>>>>> and that failures first sighting is in JDK7. >>>>>>>> >>>>>>>> >>>>>>>> Thanks, in advance, for any comments, questions, or suggestions. 
>>>>>>>> >>>>>>>> Dan >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >> > > From leonid.mesnik at oracle.com Fri Jun 12 23:18:18 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Fri, 12 Jun 2020 16:18:18 -0700 Subject: RFR: 8242891: vmTestbase/nsk/jvmti/ test should be fixed to fail early if JVMTI function return error In-Reply-To: <3927ae7c-efa9-eb9f-ab98-18d778d5a966@oracle.com> References: <11314027-4965-b38b-6bc7-5011515b94ab@oracle.com> <2cf4e45a-4d44-3c0a-a272-480f56a5e6e8@oracle.com> <3927ae7c-efa9-eb9f-ab98-18d778d5a966@oracle.com> Message-ID: <547f02e9-604c-09e5-5fe1-2afb1be54f2d@oracle.com> Fixed all places, updated copyright. Still need second review http://cr.openjdk.java.net/~lmesnik/8242891/webrev.02/ Leonid On 6/11/20 8:41 PM, serguei.spitsyn at oracle.com wrote: > Hi Leonid, > > It is much better now. > > Several places still need the same fix. > > http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/jvmti/GetAllThreads/allthr001/allthr001.cpp.frames.html > > 211 for (i = 0; i < thrInfo[ind].cnt; i++) { > 212 for (j = 0, found = 0; j < threadsCount && !found; j++) { > 213 err = jvmti->GetThreadInfo(threads[j], &inf); > 214 if (err != JVMTI_ERROR_NONE) { > 215 printf("Failed to get thread info: %s (%d)\n", > 216 TranslateError(err), err); > 217 result = STATUS_FAILED; > 218 } > 219 if (printdump == JNI_TRUE) { > 220 printf(" >>> %s", inf.name); > 221 } > 222 found = (inf.name != NULL && > 223 strstr(inf.name, thrInfo[ind].thrNames[i]) == inf.name && > 224 (ind == 4 || strlen(inf.name) == > 225 strlen(thrInfo[ind].thrNames[i]))); > 226 } > A return is needed after line 217, otherwise the the inf value is used > at lines 222-224. 
> > http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/jvmti/GetBytecodes/bytecodes003/bytecodes003.cpp.frames.html > > A return is needed for the errors: > 363 result = STATUS_FAILED; > 372 result = STATUS_FAILED; > 384 result = STATUS_FAILED; > > http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/jvmti/MethodEntry/mentry001/mentry001.cpp.frames.html > > A return is needed for the errors: > 82 result = STATUS_FAILED; > 94 result = STATUS_FAILED; > 100 result = STATUS_FAILED; > > http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/jvmti/MethodExit/mexit001/mexit001.cpp.frames.html > > A return is needed for the error: > 98 result = STATUS_FAILED; > > http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/jvmti/MethodExit/mexit002/mexit002.cpp.frames.html > > A return is needed for the error: > 98 result = STATUS_FAILED; > > http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/jvmti/RedefineClasses/redefclass019/redefclass019.cpp.frames.html > > A return is needed for the error: > 186 result = STATUS_FAILED; > > Also, I do not like many uninitialized locals in these tests. > But it is for another pass. > > Otherwise, it looks good. > No need for another webrev if you fix the above. > I hope, you will update copyright comments before push. > > Thanks, > Serguei > > > On 6/11/20 15:30, Leonid Mesnik wrote: >> >> Agree, it would be better to don't try to use data from functions >> with error code. The new webrev: >> >> http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/ >> >> I tried to prevent any usage of possibly corrupted data. Mostly >> strings or allocated data, sometimes method/class id which are used >> my other JVMTI functions. 
>> >> Leonid >> >> On 6/9/20 6:59 PM, serguei.spitsyn at oracle.com wrote: >>> On 6/9/20 12:58, Leonid Mesnik wrote: >>>> >>>> Hi >>>> >>>> >>>> On 6/9/20 12:34 PM, serguei.spitsyn at oracle.com wrote: >>>>> Hi Leonid, >>>>> >>>>> Thank you for taking care about this! >>>>> It looks good in general. >>>>> However, I think, a similar return is needed in more cases. >>>>> >>>>> One example: >>>>> >>>>> http://cr.openjdk.java.net/~lmesnik/8242891/webrev.00/test/hotspot/jtreg/vmTestbase/nsk/jvmti/Exception/exception001/exception001.cpp.frames.html >>>>> >>>>> 99 err = jvmti_env->GetMethodDeclaringClass(method, &cls); >>>>> 100 if (err != JVMTI_ERROR_NONE) { >>>>> 101 printf("(GetMethodDeclaringClass#t) unexpected error: %s (%d)\n", >>>>> 102 TranslateError(err), err); >>>>> 103 result = STATUS_FAILED; >>>>> 104 return; >>>>> 105 } >>>>> 106 err = jvmti_env->GetClassSignature(cls, &ex.t_cls, &generic); >>>>> 107 if (err != JVMTI_ERROR_NONE) { >>>>> 108 printf("(GetClassSignature#t) unexpected error: %s (%d)\n", >>>>> 109 TranslateError(err), err); >>>>> 110 result = STATUS_FAILED; >>>>> 111 } >>>>> 112 err = jvmti_env->GetMethodName(method, >>>>> 113 &ex.t_name, &ex.t_sig, &generic); >>>>> 114 if (err != JVMTI_ERROR_NONE) { >>>>> 115 printf("(GetMethodName#t) unexpected error: %s (%d)\n", >>>>> 116 TranslateError(err), err); >>>>> 117 result = STATUS_FAILED; >>>>> 118 } >>>>> 119 ex.t_loc = location; >>>>> 120 err = jvmti_env->GetMethodDeclaringClass(catch_method, &cls); >>>>> 121 if (err != JVMTI_ERROR_NONE) { >>>>> 122 printf("(GetMethodDeclaringClass#c) unexpected error: %s (%d)\n", >>>>> 123 TranslateError(err), err); >>>>> 124 result = STATUS_FAILED; >>>>> 125 return; >>>>> 126 } >>>>> 127 err = jvmti_env->GetClassSignature(cls, &ex.c_cls, &generic); >>>>> 128 if (err != JVMTI_ERROR_NONE) { >>>>> 129 printf("(GetClassSignature#c) unexpected error: %s (%d)\n", >>>>> 130 TranslateError(err), err); >>>>> 131 result = STATUS_FAILED; >>>>> 132 } >>>>> 133 err = 
jvmti_env->GetMethodName(catch_method, >>>>> 134 &ex.c_name, &ex.c_sig, &generic); >>>>> 135 if (err != JVMTI_ERROR_NONE) { >>>>> 136 printf("(GetMethodName#c) unexpected error: %s (%d)\n", >>>>> 137 TranslateError(err), err); >>>>> 138 result = STATUS_FAILED; >>>>> 139 } >>>>> >>>>> In the fragment above you added return for JVMTI >>>>> GetMethodDeclaringClass error. >>>>> But GetMethodName and GetClassSignature can be also problematic as >>>>> the returned names are printed below. >>>>> It seems to be more safe and even simpler to add returns for such >>>>> cases as well. >>>>> Otherwise, the code reader is puzzled why there is a return in one >>>>> failure case and there is no such return in another. >>>> >>>> It is a good question if we want to fix such places or even fails >>>> with first JVMTI failure. (I even started to fix it in the such way >>>> but find that existing tests usually don't fail always). >>>> >>> >>> I do not suggest to fix all the tests but those which you are >>> already fixing. >>> >>> >>>> The difference is that test tries to reuse "cls" in other JVMTI >>>> function and going to generate very misleading crash. How it just >>>> tries to compare ex and exs values. So test might crash but clearly >>>> outside of JVMTI function and with some useful info. So I am not >>>> sure if fixing these lines improve test failure handling. >>>> >>> >>> If JVMTI functions fail with an error code the results with symbolic >>> strings must be considered invalid. >>> However, they are used later (the values are printed). >>> It is better to bail out in such cases. >>> It should not be a problem to add similar returns in such cases. >>> Or do you think it is important to continue execution for some reason? >>> >>>> Assuming that most of existing tests fails early only if going to >>>> re-use possible corrupted data I propose to fix this separately. We >>>> need to figure out when to fail or to try to finish. 
>>>> >>> >>> Do you suggest it for the updated tests only or for all the tests >>> with such problems? >>> >>> Thanks, >>> Serguei >>> >>>> Leonid >>>> >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>> >>>>> On 6/1/20 21:33, Leonid Mesnik wrote: >>>>>> Hi >>>>>> >>>>>> Could you please review following fix which stop test execution >>>>>> if JVMTI function returns error. The test fails anyway however >>>>>> using potentially bad data in JVMTI function might cause >>>>>> misleading crash failures. The hs_err will contains the >>>>>> stacktrace not with problem function but with function called >>>>>> with corrupted data. Most of tests already has such behavior but >>>>>> not all. Also I fixed a couple of tests to finish if they haven't >>>>>> managed to suspend thread. >>>>>> >>>>>> I've updated only tests which try to use corrupted data in JVMTI >>>>>> functions after errors. I haven't updated tests which just >>>>>> compare/print values from erroring JVMTI functions. The crash in >>>>>> strcmp/println is not so misleading and might be point to real >>>>>> issue. >>>>>> >>>>>> webrev: http://cr.openjdk.java.net/~lmesnik/8242891/webrev.00/ >>>>>> >>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8242891 >>>>>> >>>>>> Leonid >>>>>> >>>>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.daugherty at oracle.com Sat Jun 13 00:22:14 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 12 Jun 2020 20:22:14 -0400 Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java In-Reply-To: References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> <3ced73df-552a-e4a8-a1b8-f06372d19a43@oracle.com> <33a37561-d063-fb7d-4430-731bc17738a2@oracle.com> <5fecf0a5-5bc0-e6de-4708-9535fbcad015@oracle.com> Message-ID: <124d2968-8185-70e4-8e29-4796c0ea986f@oracle.com> Hi Serguei, Thanks for reviewing! 
I pushed the changeset just before I took a dinner break so I won't be able to list you as a reviewer. Dan On 6/12/20 6:40 PM, serguei.spitsyn at oracle.com wrote: > Hi Dan and Chris, > > Problem-listing it for Xcomp only looks right to me. > Thank you for taking care about it! > > Thanks, > Serguei > > > On 6/12/20 14:20, Chris Plummer wrote: >> On 6/12/20 1:59 PM, Daniel D. Daugherty wrote: >>> On 6/12/20 4:48 PM, Chris Plummer wrote: >>>> On 6/12/20 12:13 PM, Daniel D. Daugherty wrote: >>>>> On 6/12/20 2:58 PM, Chris Plummer wrote: >>>>>> On 6/12/20 11:52 AM, Daniel D. Daugherty wrote: >>>>>>> On 6/12/20 2:49 PM, Chris Plummer wrote: >>>>>>>> Hi Dan, >>>>>>>> >>>>>>>> What's the criteria for "noise". >>>>>>> >>>>>>> There is no specific criteria that I'm aware of. >>>>>>> >>>>>>> It popped up in today's JDK15 testing so it got on my radar >>>>>>> (again). >>>>>>> >>>>>>> >>>>>>>> I don't consider the failures for this test as noisy. I only >>>>>>>> see 3 in mach5 CI testing for all of JDK 15. JDK 14 does? >>>>>>>> appear to have been somewhat noisy, possibly enough so that it >>>>>>>> looks like maybe something changed to reduce the number of >>>>>>>> failures in 15. In any case, do you plan on backporting to 14? >>>>>>> >>>>>>> This failure has been around in one form or another since JDK7. >>>>>>> If someone >>>>>>> decides to fix it, then they can un-ProblemList it. >>>>>>> >>>>>>> I'm planning to push it to JDK15 and JDK16. Those two releases >>>>>>> are the focus >>>>>>> of my CI noise reduction efforts. I don't monitor the JDK14u CI... >>>>>>> >>>>>>> May I proceed with the ProblemListing? >>>>>> I just don't feel if we problem list tests with this failure rate >>>>>> that in the long run it is a productive or good thing to do. 3 >>>>>> failures during an entire 6 month CI test cycle seems rather low >>>>>> to me. I'd like to get opinions from others. >>>>> >>>>> It's not just the failure rate. 
It's the fact that this bug has >>>>> sat for >>>>> years without being fixed. I have tracked this bug for a very long >>>>> time >>>>> since I'm the guy that filed both bugs. >>>>> >>>>> Mach5 is showing 54 sightings of 8205957 and here's the linking >>>>> distribution: >>>>> >>>>> $ sort /tmp/fred | uniq -c | sort -rn >>>>> ? 20 daniel.daugherty at oracle.com >>>>> ? 10 rahul.v.raghavan at oracle.com >>>>> ?? 7 martin.thompson at oracle.com >>>>> ?? 4 leonid.mesnik at oracle.com >>>>> ?? 3 jesper.wilhelmsson at oracle.com >>>>> ?? 3 chris.plummer at oracle.com >>>>> ?? 2 mikael.vidstedt at oracle.com >>>>> ?? 1 tobias.hartmann at oracle.com >>>>> ?? 1 sangheon.kim at oracle.com >>>>> ?? 1 kim.barrett at oracle.com >>>>> ?? 1 daniil.x.titov at oracle.com >>>>> ?? 1 calvin.cheung at oracle.com >>>>> >>>>> As you can see, I've observed and linked this bug a lot. >>>>> I'm tired of it. >>>> I still think that what is most relevant is how often it reproduces >>>> with CI with the current release, and for that the # is 3 times in >>>> 6 months. In our current test history it's failed 6 out of 21390 >>>> runs, so you are disabling a test that passes 99.97% of the time. >>>> My concern is that if a bug is introduced that makes it start >>>> failing every run, or at least very frequently, it will be missed. >>>> We need to carefully weigh the annoyance of failure noise with the >>>> importance of test coverage. I don't think the balance is right for >>>> this test to justify problem listing it. >>>> >>>> What you might want to consider is disabling it in the mode where >>>> it seems to be failing. The failures all seem to be with -Xcomp. >>>> Maybe you should just problem list it in ProblemList-Xcomp.txt. >>> >>> I didn't notice that this is an -Xcomp only failure. I was able to >>> verify >>> that fact for 46 of the 54 sightings. For the 8 oldest sightings, >>> the task >>> name has been lost to the dustbin of time so I can't confirm those. 
>>> >>> I can move the entry from test/hotspot/jtreg/ProblemList.txt to >>> test/hotspot/jtreg/ProblemList-Xcomp.txt. >>> >>> Here's the context diff: >>> >>> $ hg diff >>> diff -r 015533451f4c test/hotspot/jtreg/ProblemList-Xcomp.txt >>> --- a/test/hotspot/jtreg/ProblemList-Xcomp.txt??? Fri Jun 12 >>> 09:31:08 2020 -0700 >>> +++ b/test/hotspot/jtreg/ProblemList-Xcomp.txt??? Fri Jun 12 >>> 16:58:18 2020 -0400 >>> @@ -27,3 +27,4 @@ >>> ?# >>> ?############################################################################# >>> >>> >>> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >>> 8205957 generic-all >>> >>> >>> Is this acceptable to you? >> Yes, that works for me. >> >> Chris >>> >>> Dan >>> >>> >>> >>>> >>>> Chris >>>>> >>>>> Dan >>>>> >>>>> >>>>>> >>>>>> Chris >>>>>>> >>>>>>> Dan >>>>>>> >>>>>>>> >>>>>>>> thanks, >>>>>>>> >>>>>>>> Chris >>>>>>>> >>>>>>>> On 6/12/20 9:46 AM, Daniel D. Daugherty wrote: >>>>>>>>> Greetings, >>>>>>>>> >>>>>>>>> It's time to reduce the noise in the CI so I'm ProblemListing >>>>>>>>> tests. >>>>>>>>> >>>>>>>>> Here's the bug for failure: >>>>>>>>> >>>>>>>>> ??? JDK-8205957 setfldw001/TestDescription.java fails with bad >>>>>>>>> field value >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8205957 >>>>>>>>> >>>>>>>>> and here's the bug for the ProblemListing: >>>>>>>>> >>>>>>>>> ??? JDK-8247495 ProblemList >>>>>>>>> vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >>>>>>>>> >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247495 >>>>>>>>> >>>>>>>>> I'm considering this a trivial change so I need a single >>>>>>>>> (R)eviewer. >>>>>>>>> >>>>>>>>> Here's the context diff for the change: >>>>>>>>> >>>>>>>>> $ hg diff >>>>>>>>> diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt >>>>>>>>> --- a/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 >>>>>>>>> 09:31:08 2020 -0700 >>>>>>>>> +++ b/test/hotspot/jtreg/ProblemList.txt??? 
Fri Jun 12 >>>>>>>>> 12:40:17 2020 -0400 >>>>>>>>> @@ -141,6 +141,7 @@ >>>>>>>>> ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java >>>>>>>>> 8219652 aix-ppc64 >>>>>>>>> ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java >>>>>>>>> 8219652 aix-ppc64 >>>>>>>>> ?vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java >>>>>>>>> 8219652 aix-ppc64 >>>>>>>>> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >>>>>>>>> 8205957 generic-all >>>>>>>>> >>>>>>>>> ?vmTestbase/gc/lock/jni/jnilock002/TestDescription.java >>>>>>>>> 8208243,8192647 generic-all >>>>>>>>> >>>>>>>>> >>>>>>>>> This issue is actually much older than JDK-8205957 would indicate >>>>>>>>> (first sighting in JDK11 for that bug ID). The older version of >>>>>>>>> the test is covered by >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6528079 >>>>>>>>> and that failures first sighting is in JDK7. >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, in advance, for any comments, questions, or suggestions. >>>>>>>>> >>>>>>>>> Dan >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> > From serguei.spitsyn at oracle.com Sat Jun 13 00:28:24 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 12 Jun 2020 17:28:24 -0700 Subject: RFR(T): 8247495: ProblemList vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java In-Reply-To: <124d2968-8185-70e4-8e29-4796c0ea986f@oracle.com> References: <12f036a0-3540-212e-dc78-4a96c821b63a@oracle.com> <3ced73df-552a-e4a8-a1b8-f06372d19a43@oracle.com> <33a37561-d063-fb7d-4430-731bc17738a2@oracle.com> <5fecf0a5-5bc0-e6de-4708-9535fbcad015@oracle.com> <124d2968-8185-70e4-8e29-4796c0ea986f@oracle.com> Message-ID: <77c25722-2f2b-2ba8-1452-a3f4e56818dd@oracle.com> On 6/12/20 17:22, Daniel D. Daugherty wrote: > Hi Serguei, > > Thanks for reviewing! I pushed the changeset just before I took a dinner > break Great! 
> so I won't be able to list you as a reviewer. Not a big deal. :) Thanks, Serguei > > Dan > > > On 6/12/20 6:40 PM, serguei.spitsyn at oracle.com wrote: >> Hi Dan and Chris, >> >> Problem-listing it for Xcomp only looks right to me. >> Thank you for taking care about it! >> >> Thanks, >> Serguei >> >> >> On 6/12/20 14:20, Chris Plummer wrote: >>> On 6/12/20 1:59 PM, Daniel D. Daugherty wrote: >>>> On 6/12/20 4:48 PM, Chris Plummer wrote: >>>>> On 6/12/20 12:13 PM, Daniel D. Daugherty wrote: >>>>>> On 6/12/20 2:58 PM, Chris Plummer wrote: >>>>>>> On 6/12/20 11:52 AM, Daniel D. Daugherty wrote: >>>>>>>> On 6/12/20 2:49 PM, Chris Plummer wrote: >>>>>>>>> Hi Dan, >>>>>>>>> >>>>>>>>> What's the criteria for "noise". >>>>>>>> >>>>>>>> There is no specific criteria that I'm aware of. >>>>>>>> >>>>>>>> It popped up in today's JDK15 testing so it got on my radar >>>>>>>> (again). >>>>>>>> >>>>>>>> >>>>>>>>> I don't consider the failures for this test as noisy. I only >>>>>>>>> see 3 in mach5 CI testing for all of JDK 15. JDK 14 does >>>>>>>>> appear to have been somewhat noisy, possibly enough so that it >>>>>>>>> looks like maybe something changed to reduce the number of >>>>>>>>> failures in 15. In any case, do you plan on backporting to 14? >>>>>>>> >>>>>>>> This failure has been around in one form or another since JDK7. >>>>>>>> If someone >>>>>>>> decides to fix it, then they can un-ProblemList it. >>>>>>>> >>>>>>>> I'm planning to push it to JDK15 and JDK16. Those two releases >>>>>>>> are the focus >>>>>>>> of my CI noise reduction efforts. I don't monitor the JDK14u CI... >>>>>>>> >>>>>>>> May I proceed with the ProblemListing? >>>>>>> I just don't feel if we problem list tests with this failure >>>>>>> rate that in the long run it is a productive or good thing to >>>>>>> do. 3 failures during an entire 6 month CI test cycle seems >>>>>>> rather low to me. I'd like to get opinions from others. >>>>>> >>>>>> It's not just the failure rate. 
It's the fact that this bug has >>>>>> sat for >>>>>> years without being fixed. I have tracked this bug for a very >>>>>> long time >>>>>> since I'm the guy that filed both bugs. >>>>>> >>>>>> Mach5 is showing 54 sightings of 8205957 and here's the linking >>>>>> distribution: >>>>>> >>>>>> $ sort /tmp/fred | uniq -c | sort -rn >>>>>> ? 20 daniel.daugherty at oracle.com >>>>>> ? 10 rahul.v.raghavan at oracle.com >>>>>> ?? 7 martin.thompson at oracle.com >>>>>> ?? 4 leonid.mesnik at oracle.com >>>>>> ?? 3 jesper.wilhelmsson at oracle.com >>>>>> ?? 3 chris.plummer at oracle.com >>>>>> ?? 2 mikael.vidstedt at oracle.com >>>>>> ?? 1 tobias.hartmann at oracle.com >>>>>> ?? 1 sangheon.kim at oracle.com >>>>>> ?? 1 kim.barrett at oracle.com >>>>>> ?? 1 daniil.x.titov at oracle.com >>>>>> ?? 1 calvin.cheung at oracle.com >>>>>> >>>>>> As you can see, I've observed and linked this bug a lot. >>>>>> I'm tired of it. >>>>> I still think that what is most relevant is how often it >>>>> reproduces with CI with the current release, and for that the # is >>>>> 3 times in 6 months. In our current test history it's failed 6 out >>>>> of 21390 runs, so you are disabling a test that passes 99.97% of >>>>> the time. My concern is that if a bug is introduced that makes it >>>>> start failing every run, or at least very frequently, it will be >>>>> missed. We need to carefully weigh the annoyance of failure noise >>>>> with the importance of test coverage. I don't think the balance is >>>>> right for this test to justify problem listing it. >>>>> >>>>> What you might want to consider is disabling it in the mode where >>>>> it seems to be failing. The failures all seem to be with -Xcomp. >>>>> Maybe you should just problem list it in ProblemList-Xcomp.txt. >>>> >>>> I didn't notice that this is an -Xcomp only failure. I was able to >>>> verify >>>> that fact for 46 of the 54 sightings. 
For the 8 oldest sightings, >>>> the task >>>> name has been lost to the dustbin of time so I can't confirm those. >>>> >>>> I can move the entry from test/hotspot/jtreg/ProblemList.txt to >>>> test/hotspot/jtreg/ProblemList-Xcomp.txt. >>>> >>>> Here's the context diff: >>>> >>>> $ hg diff >>>> diff -r 015533451f4c test/hotspot/jtreg/ProblemList-Xcomp.txt >>>> --- a/test/hotspot/jtreg/ProblemList-Xcomp.txt??? Fri Jun 12 >>>> 09:31:08 2020 -0700 >>>> +++ b/test/hotspot/jtreg/ProblemList-Xcomp.txt??? Fri Jun 12 >>>> 16:58:18 2020 -0400 >>>> @@ -27,3 +27,4 @@ >>>> ?# >>>> ?############################################################################# >>>> >>>> >>>> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >>>> 8205957 generic-all >>>> >>>> >>>> Is this acceptable to you? >>> Yes, that works for me. >>> >>> Chris >>>> >>>> Dan >>>> >>>> >>>> >>>>> >>>>> Chris >>>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>>> >>>>>>> Chris >>>>>>>> >>>>>>>> Dan >>>>>>>> >>>>>>>>> >>>>>>>>> thanks, >>>>>>>>> >>>>>>>>> Chris >>>>>>>>> >>>>>>>>> On 6/12/20 9:46 AM, Daniel D. Daugherty wrote: >>>>>>>>>> Greetings, >>>>>>>>>> >>>>>>>>>> It's time to reduce the noise in the CI so I'm ProblemListing >>>>>>>>>> tests. >>>>>>>>>> >>>>>>>>>> Here's the bug for failure: >>>>>>>>>> >>>>>>>>>> ??? JDK-8205957 setfldw001/TestDescription.java fails with >>>>>>>>>> bad field value >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8205957 >>>>>>>>>> >>>>>>>>>> and here's the bug for the ProblemListing: >>>>>>>>>> >>>>>>>>>> ??? JDK-8247495 ProblemList >>>>>>>>>> vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >>>>>>>>>> >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247495 >>>>>>>>>> >>>>>>>>>> I'm considering this a trivial change so I need a single >>>>>>>>>> (R)eviewer. 
>>>>>>>>>> >>>>>>>>>> Here's the context diff for the change: >>>>>>>>>> >>>>>>>>>> $ hg diff >>>>>>>>>> diff -r 015533451f4c test/hotspot/jtreg/ProblemList.txt >>>>>>>>>> --- a/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 >>>>>>>>>> 09:31:08 2020 -0700 >>>>>>>>>> +++ b/test/hotspot/jtreg/ProblemList.txt??? Fri Jun 12 >>>>>>>>>> 12:40:17 2020 -0400 >>>>>>>>>> @@ -141,6 +141,7 @@ >>>>>>>>>> ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java >>>>>>>>>> 8219652 aix-ppc64 >>>>>>>>>> ?vmTestbase/nsk/jvmti/scenarios/jni_interception/JI06/ji06t001/TestDescription.java >>>>>>>>>> 8219652 aix-ppc64 >>>>>>>>>> ?vmTestbase/nsk/jvmti/SetJNIFunctionTable/setjniftab001/TestDescription.java >>>>>>>>>> 8219652 aix-ppc64 >>>>>>>>>> +vmTestbase/nsk/jvmti/SetFieldAccessWatch/setfldw001/TestDescription.java >>>>>>>>>> 8205957 generic-all >>>>>>>>>> >>>>>>>>>> ?vmTestbase/gc/lock/jni/jnilock002/TestDescription.java >>>>>>>>>> 8208243,8192647 generic-all >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> This issue is actually much older than JDK-8205957 would >>>>>>>>>> indicate >>>>>>>>>> (first sighting in JDK11 for that bug ID). The older version of >>>>>>>>>> the test is covered by >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6528079 >>>>>>>>>> and that failures first sighting is in JDK7. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, in advance, for any comments, questions, or suggestions. >>>>>>>>>> >>>>>>>>>> Dan >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >> > From suenaga at oss.nttdata.com Mon Jun 15 04:49:49 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Mon, 15 Jun 2020 13:49:49 +0900 Subject: Question about GetObjectMonitorUsage() JVMTI function Message-ID: Hi all, I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint. Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently. 
If owner thread is not suspended, the owner might be changed to others in subsequent code.

For example, the owner might release the monitor before [2].


Thanks,

Yasumasa


[1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973
[2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996

From david.holmes at oracle.com Mon Jun 15 05:15:57 2020
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 15 Jun 2020 15:15:57 +1000
Subject: Question about GetObjectMonitorUsage() JVMTI function
In-Reply-To:
References:
Message-ID: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com>

Hi Yasumasa,

On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
> Hi all,
>
> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of
> GetObjectMonitorUsage()) does not perform at safepoint.

GetObjectMonitorUsage will use a safepoint if the target is not suspended:

jvmtiError
JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) {
  JavaThread* calling_thread = JavaThread::current();
  jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr);
  if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
    // Some of the critical threads were not suspended. go to a safepoint and try again
    VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr);
    VMThread::execute(&op);
    err = op.result();
  }
  return err;
} /* end GetObject */

> Monitor owner would be acquired from monitor object at first [1], but it
> would perform concurrently.
> If owner thread is not suspended, the owner might be changed to others
> in subsequent code.
>
> For example, the owner might release the monitor before [2].

The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint.
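
As a rough illustration of the pattern described above (try the query without a safepoint, and fall back to a safepoint if the apparent owner is not suspended), here is a minimal standalone C++ sketch. Every name in it (ToyThread, ToyMonitor, ToyUsage, stop_the_world, try_query_without_safepoint, query_monitor_usage, demo) is hypothetical and chosen for illustration only. None of them are HotSpot types, and the mutex is merely a stand-in for a safepoint.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// All names below are hypothetical stand-ins, not HotSpot types.
struct ToyThread {
  std::atomic<bool> suspended{false};
};

struct ToyMonitor {
  std::atomic<ToyThread*> owner{nullptr};
};

struct ToyUsage {
  ToyThread* owner = nullptr;
  bool used_slow_path = false;
};

// Stand-in for a safepoint: in this toy model, holding the lock is
// taken to mean that no toy thread changes state.
std::mutex stop_the_world;

// Fast path: succeeds only if there is no owner, or the apparent
// owner is observed to be suspended.
bool try_query_without_safepoint(const ToyMonitor& m, ToyUsage& out) {
  ToyThread* apparent_owner = m.owner.load();
  if (apparent_owner != nullptr && !apparent_owner->suspended.load()) {
    return false;  // caller must retry on the slow path
  }
  // Window of concern: nothing here stops another thread from resuming
  // apparent_owner after the check above, so the answer can go stale.
  out.owner = apparent_owner;
  return true;
}

// Mirrors the shape of the quoted function: fast path first, then redo
// the whole query under the "safepoint" stand-in.
ToyUsage query_monitor_usage(const ToyMonitor& m) {
  ToyUsage usage;
  if (!try_query_without_safepoint(m, usage)) {
    std::lock_guard<std::mutex> guard(stop_the_world);
    usage.owner = m.owner.load();
    usage.used_slow_path = true;
  }
  return usage;
}

// Small usage example for the two paths.
void demo() {
  ToyThread t;
  ToyMonitor m;
  m.owner.store(&t);

  ToyUsage running = query_monitor_usage(m);  // owner not suspended
  assert(running.used_slow_path && running.owner == &t);

  t.suspended.store(true);
  ToyUsage parked = query_monitor_usage(m);   // owner suspended
  assert(!parked.used_slow_path && parked.owner == &t);
}
```

The comment inside try_query_without_safepoint() marks the point where the suspension check and the later use of the owner are separate steps, which is the gap the original question is asking about.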
This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried. However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem. David ----- > > > Thanks, > > Yasumasa > > > [1] > http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 > > [2] > http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 > From suenaga at oss.nttdata.com Mon Jun 15 06:02:49 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Mon, 15 Jun 2020 15:02:49 +0900 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> Message-ID: Hi David, On 2020/06/15 14:15, David Holmes wrote: > Hi Yasumasa, > > On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >> Hi all, >> >> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint. > > GetObjectMonitorUsage will use a safepoint if the target is not suspended: > > jvmtiError > JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) { > ? JavaThread* calling_thread = JavaThread::current(); > ? jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr); > ? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { > ??? 
// Some of the critical threads were not suspended. go to a safepoint and try again
>     VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr);
>     VMThread::execute(&op);
>     err = op.result();
>   }
>   return err;
> } /* end GetObject */

I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage().


>> Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently.
>> If owner thread is not suspended, the owner might be changed to others in subsequent code.
>>
>> For example, the owner might release the monitor before [2].
>
> The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint.

I think the owner thread might unfortunately be resumed after the suspension check.
JavaThread::is_ext_suspend_completed() is used to check the thread state; it returns `true` when the thread is sleeping [3], or when it is running in native [4].


> This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried.

I agree with this, but I could not find it in the JVMTI spec - it just says "Get information about the object's monitor."
GetObjectMonitorUsage() might return incorrect information in some cases.

It starts by finding the owner thread, but the owner might be just about to wake up.
So I think it is safer if GetObjectMonitorUsage() is called at a safepoint in any case.
Thanks, Yasumasa [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 > However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: > > ?owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); > > and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem. > > David > ----- > >> >> >> Thanks, >> >> Yasumasa >> >> >> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 From david.holmes at oracle.com Mon Jun 15 07:26:55 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 15 Jun 2020 17:26:55 +1000 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> Message-ID: <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: > Hi David, > > On 2020/06/15 14:15, David Holmes wrote: >> Hi Yasumasa, >> >> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>> Hi all, >>> >>> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation >>> of GetObjectMonitorUsage()) does not perform at safepoint. >> >> GetObjectMonitorUsage will use a safepoint if the target is not >> suspended: >> >> jvmtiError >> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* >> info_ptr) { >> ?? 
JavaThread* calling_thread = JavaThread::current(); >> ?? jvmtiError err = get_object_monitor_usage(calling_thread, object, >> info_ptr); >> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >> ???? // Some of the critical threads were not suspended. go to a >> safepoint and try again >> ???? VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr); >> ???? VMThread::execute(&op); >> ???? err = op.result(); >> ?? } >> ?? return err; >> } /* end GetObject */ > > I saw this code, so I guess there are some cases when > JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from > get_object_monitor_usage(). > > >>> Monitor owner would be acquired from monitor object at first [1], but >>> it would perform concurrently. >>> If owner thread is not suspended, the owner might be changed to >>> others in subsequent code. >>> >>> For example, the owner might release the monitor before [2]. >> >> The expectation is that when we find an owner thread it is either >> suspended or not. If it is suspended then it cannot release the >> monitor. If it is not suspended we detect that and redo the whole >> query at a safepoint. > > I think the owner thread might resume unfortunately after suspending check. Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong. > JavaThread::is_ext_suspend_completed() is used to check thread state, it > returns `true` when the thread is sleeping [3], or when it performs in > native [4]. Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code. > >> This appears to be an optimisation for the assumed common case where >> threads are first suspended and then the monitors are queried. > > I agree with this, but I could find out it from JVMTI spec - it just > says "Get information about the object's monitor." Yes it was just an implementation optimisation, nothing to do with the spec. 
> GetObjectMonitorUsage() might return incorrect information in some case. > > It starts with finding owner thread, but the owner might be just before > wakeup. > So I think it is more safe if GetObjectMonitorUsage() is called at > safepoint in any case. Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation. Cheers, David ----- > > Thanks, > > Yasumasa > > > [3] > http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 > > [4] > http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 > > > >> However there is still a potential bug as the thread reported as the >> owner may not be suspended at the time we first see it, and may >> release the monitor, but then it may get suspended before we call: >> >> ??owning_thread = >> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >> >> and so we think it is still the monitor owner and proceed to query the >> monitor information in a racy way. This can't happen when suspension >> itself requires a safepoint as the current thread won't go to that >> safepoint during this code. However, if suspension is implemented via >> a direct handshake with the target thread then we have a problem. 
>> >> David >> ----- >> >>> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>> [1] >>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>> >>> [2] >>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>> From suenaga at oss.nttdata.com Mon Jun 15 08:05:56 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Mon, 15 Jun 2020 17:05:56 +0900 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> Message-ID: Thank you for clarification! Yasumasa On 2020/06/15 16:26, David Holmes wrote: > On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >> Hi David, >> >> On 2020/06/15 14:15, David Holmes wrote: >>> Hi Yasumasa, >>> >>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>> Hi all, >>>> >>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint. >>> >>> GetObjectMonitorUsage will use a safepoint if the target is not suspended: >>> >>> jvmtiError >>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) { >>> ?? JavaThread* calling_thread = JavaThread::current(); >>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr); >>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>> ???? // Some of the critical threads were not suspended. go to a safepoint and try again >>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr); >>> ???? VMThread::execute(&op); >>> ???? err = op.result(); >>> ?? } >>> ?? return err; >>> } /* end GetObject */ >> >> I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage(). 
>> >> >>>> Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently. >>>> If owner thread is not suspended, the owner might be changed to others in subsequent code. >>>> >>>> For example, the owner might release the monitor before [2]. >>> >>> The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint. >> >> I think the owner thread might resume unfortunately after suspending check. > > Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong. > >> JavaThread::is_ext_suspend_completed() is used to check thread state, it returns `true` when the thread is sleeping [3], or when it performs in native [4]. > > Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code. > >> >>> This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried. >> >> I agree with this, but I could find out it from JVMTI spec - it just says "Get information about the object's monitor." > > Yes it was just an implementation optimisation, nothing to do with the spec. > >> GetObjectMonitorUsage() might return incorrect information in some case. >> >> It starts with finding owner thread, but the owner might be just before wakeup. >> So I think it is more safe if GetObjectMonitorUsage() is called at safepoint in any case. > > Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation. 
> > Cheers, > David > ----- > >> >> Thanks, >> >> Yasumasa >> >> >> [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >> [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >> >> >>> However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: >>> >>> ??owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>> >>> and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem. >>> >>> David >>> ----- >>> >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 From MRasmussen at perforce.com Mon Jun 15 10:48:24 2020 From: MRasmussen at perforce.com (Michael Rasmussen) Date: Mon, 15 Jun 2020 10:48:24 +0000 Subject: Java agents in paths containing unicode characters on Windows Message-ID: Hi, Trying to attach a javaagent that is located in a folder that contains characters that cannot be represented in the current windows system code page will fail to load, even if specified with relative path or using a full path using short name. 
Example:
agent.jar file is a javaagent located in a folder with unicode characters, in my example: C:\tmp\Te?t (on my system, the short name (8.3) for that is: C:\Tmp\tet~1)
no-agent.jar is a jar file that is not a javaagent

C:\>dir /s /b C:\tmp\Te?t
C:\tmp\Te?t\agent.jar
C:\tmp\Te?t\no-agent.jar

C:\>dir /s /b C:\tmp\tet~1\
C:\tmp\tet~1\agent.jar
C:\tmp\tet~1\no-agent.jar

C:\tmp\Te?t>java -javaagent:agent.jar -version
Unexpected error (103) returned by AddToSystemClassLoaderSearch
Unable to add agent.jar to system class path - the system class loader does not define the appendToClassPathForInstrumentation method or the method failed
FATAL ERROR in native method: processing of -javaagent failed, appending to system class path failed

If using full path using 8.3 names that is all in ASCII, it still fails:

C:\>java -javaagent:C:\tmp\tet~1\agent.jar -version
Unexpected error (103) returned by AddToSystemClassLoaderSearch
Unable to add C:\tmp\tet~1\agent.jar to system class path - the system class loader does not define the appendToClassPathForInstrumentation method or the method failed
FATAL ERROR in native method: processing of -javaagent failed, appending to system class path failed

If I try a jar file that doesn't have the necessary manifest entries to be a javaagent:

C:\>java -javaagent:C:\tmp\tet~1\no-agent.jar -version
Failed to find Premain-Class manifest attribute in C:\tmp\tet~1\no-agent.jar
Error occurred during initialization of VM
agent library failed to init: instrument

So it can find the jar file, is able to load and read the manifest, but fails afterwards when trying to add to classpath.

The above was tried with current JDK14 and JDK11 versions.

/Michael

This e-mail may contain information that is privileged or confidential. If you are not the intended recipient, please delete the e-mail and any attachments and notify us immediately.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From daniel.daugherty at oracle.com Mon Jun 15 13:38:42 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 15 Jun 2020 09:38:42 -0400 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> Message-ID: <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> On 6/15/20 3:26 AM, David Holmes wrote: > On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >> Hi David, >> >> On 2020/06/15 14:15, David Holmes wrote: >>> Hi Yasumasa, >>> >>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>> Hi all, >>>> >>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() >>>> (implementation of GetObjectMonitorUsage()) does not perform at >>>> safepoint. >>> >>> GetObjectMonitorUsage will use a safepoint if the target is not >>> suspended: >>> >>> jvmtiError >>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* >>> info_ptr) { >>> ?? JavaThread* calling_thread = JavaThread::current(); >>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, object, >>> info_ptr); >>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>> ???? // Some of the critical threads were not suspended. go to a >>> safepoint and try again >>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, object, >>> info_ptr); >>> ???? VMThread::execute(&op); >>> ???? err = op.result(); >>> ?? } >>> ?? return err; >>> } /* end GetObject */ >> >> I saw this code, so I guess there are some cases when >> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from >> get_object_monitor_usage(). >> >> >>>> Monitor owner would be acquired from monitor object at first [1], >>>> but it would perform concurrently. >>>> If owner thread is not suspended, the owner might be changed to >>>> others in subsequent code. >>>> >>>> For example, the owner might release the monitor before [2]. 
>>> >>> The expectation is that when we find an owner thread it is either >>> suspended or not. If it is suspended then it cannot release the >>> monitor. If it is not suspended we detect that and redo the whole >>> query at a safepoint. >> >> I think the owner thread might resume unfortunately after suspending >> check. > > Yes you are right. I was thinking resuming also required a safepoint > but it only requires the Threads_lock. So yes the code is wrong. Which code is wrong? Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller has started the process of gathering the information while not at a safepoint. Thus the information returned by GetObjectMonitorUsage() might be stale, but that's a bug in the agent code. Dan > > >> JavaThread::is_ext_suspend_completed() is used to check thread state, >> it returns `true` when the thread is sleeping [3], or when it >> performs in native [4]. > > Sure but if the thread is actually suspended it can't continue > execution in the VM or in Java code. > >> >>> This appears to be an optimisation for the assumed common case where >>> threads are first suspended and then the monitors are queried. >> >> I agree with this, but I could find out it from JVMTI spec - it just >> says "Get information about the object's monitor." > > Yes it was just an implementation optimisation, nothing to do with the > spec. > >> GetObjectMonitorUsage() might return incorrect information in some case. >> >> It starts with finding owner thread, but the owner might be just >> before wakeup. >> So I think it is more safe if GetObjectMonitorUsage() is called at >> safepoint in any case. > > Except we're moving away from safepoints to using Handshakes, so this > particular operation will require that the apparent owner is > Handshake-safe (by entering a handshake with it) before querying the > monitor. This would still be preferable I think to always using a > safepoint for the entire operation. 
> > Cheers, > David > ----- > >> >> Thanks, >> >> Yasumasa >> >> >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >> >> [4] >> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >> >> >> >>> However there is still a potential bug as the thread reported as the >>> owner may not be suspended at the time we first see it, and may >>> release the monitor, but then it may get suspended before we call: >>> >>> ??owning_thread = >>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>> >>> and so we think it is still the monitor owner and proceed to query >>> the monitor information in a racy way. This can't happen when >>> suspension itself requires a safepoint as the current thread won't >>> go to that safepoint during this code. However, if suspension is >>> implemented via a direct handshake with the target thread then we >>> have a problem. >>> >>> David >>> ----- >>> >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>> [1] >>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>> >>>> [2] >>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>> From suenaga at oss.nttdata.com Mon Jun 15 14:45:23 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Mon, 15 Jun 2020 23:45:23 +0900 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> Message-ID: <57b97ec4-4fd4-ffca-c82e-00eb5c9f3468@oss.nttdata.com> On 2020/06/15 22:38, Daniel D. 
Daugherty wrote: > On 6/15/20 3:26 AM, David Holmes wrote: >> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>> Hi David, >>> >>> On 2020/06/15 14:15, David Holmes wrote: >>>> Hi Yasumasa, >>>> >>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>> Hi all, >>>>> >>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint. >>>> >>>> GetObjectMonitorUsage will use a safepoint if the target is not suspended: >>>> >>>> jvmtiError >>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) { >>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr); >>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>> ???? // Some of the critical threads were not suspended. go to a safepoint and try again >>>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr); >>>> ???? VMThread::execute(&op); >>>> ???? err = op.result(); >>>> ?? } >>>> ?? return err; >>>> } /* end GetObject */ >>> >>> I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage(). >>> >>> >>>>> Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently. >>>>> If owner thread is not suspended, the owner might be changed to others in subsequent code. >>>>> >>>>> For example, the owner might release the monitor before [2]. >>>> >>>> The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint. >>> >>> I think the owner thread might resume unfortunately after suspending check. >> >> Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong. > > Which code is wrong? 
> > Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller > has started the process of gathering the information while not at a > safepoint. Thus the information returned by GetObjectMonitorUsage() > might be stale, but that's a bug in the agent code. I don't think so. For example, JVMTI agent might attempt to get monitor owner from sleeping thread or in native (e.g. during socket operation) thread, and it might resume during GetObjectMonitorUsage() call. Agent code can (should) not control application threads as an observer. IMHO GetObjectMonitorUsage() should perform at safepoint or should start direct handshake immediately after getting owner thread from monitor. Yasumasa > Dan > > >> >> >>> JavaThread::is_ext_suspend_completed() is used to check thread state, it returns `true` when the thread is sleeping [3], or when it performs in native [4]. >> >> Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code. >> >>> >>>> This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried. >>> >>> I agree with this, but I could find out it from JVMTI spec - it just says "Get information about the object's monitor." >> >> Yes it was just an implementation optimisation, nothing to do with the spec. >> >>> GetObjectMonitorUsage() might return incorrect information in some case. >>> >>> It starts with finding owner thread, but the owner might be just before wakeup. >>> So I think it is more safe if GetObjectMonitorUsage() is called at safepoint in any case. >> >> Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation. 
>> >> Cheers, >> David >> ----- >> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>> [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>> [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>> >>> >>>> However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: >>>> >>>> ??owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>> >>>> and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem. >>>> >>>> David >>>> ----- >>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 > From daniel.daugherty at oracle.com Mon Jun 15 15:13:02 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 15 Jun 2020 11:13:02 -0400 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <57b97ec4-4fd4-ffca-c82e-00eb5c9f3468@oss.nttdata.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <57b97ec4-4fd4-ffca-c82e-00eb5c9f3468@oss.nttdata.com> Message-ID: <8dba4f96-acf2-bc5c-02fd-d34fcccf376d@oracle.com> On 6/15/20 10:45 AM, Yasumasa Suenaga wrote: > On 2020/06/15 22:38, Daniel D. 
Daugherty wrote: >> On 6/15/20 3:26 AM, David Holmes wrote: >>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>> Hi David, >>>> >>>> On 2020/06/15 14:15, David Holmes wrote: >>>>> Hi Yasumasa, >>>>> >>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>> Hi all, >>>>>> >>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() >>>>>> (implementation of GetObjectMonitorUsage()) does not perform at >>>>>> safepoint. >>>>> >>>>> GetObjectMonitorUsage will use a safepoint if the target is not >>>>> suspended: >>>>> >>>>> jvmtiError >>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* >>>>> info_ptr) { >>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, >>>>> object, info_ptr); >>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>> ???? // Some of the critical threads were not suspended. go to a >>>>> safepoint and try again >>>>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, object, >>>>> info_ptr); >>>>> ???? VMThread::execute(&op); >>>>> ???? err = op.result(); >>>>> ?? } >>>>> ?? return err; >>>>> } /* end GetObject */ >>>> >>>> I saw this code, so I guess there are some cases when >>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from >>>> get_object_monitor_usage(). >>>> >>>> >>>>>> Monitor owner would be acquired from monitor object at first [1], >>>>>> but it would perform concurrently. >>>>>> If owner thread is not suspended, the owner might be changed to >>>>>> others in subsequent code. >>>>>> >>>>>> For example, the owner might release the monitor before [2]. >>>>> >>>>> The expectation is that when we find an owner thread it is either >>>>> suspended or not. If it is suspended then it cannot release the >>>>> monitor. If it is not suspended we detect that and redo the whole >>>>> query at a safepoint. >>>> >>>> I think the owner thread might resume unfortunately after >>>> suspending check. >>> >>> Yes you are right. 
I was thinking resuming also required a safepoint
>>> but it only requires the Threads_lock. So yes the code is wrong.
>>
>> Which code is wrong?
>>
>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller
>> has started the process of gathering the information while not at a
>> safepoint. Thus the information returned by GetObjectMonitorUsage()
>> might be stale, but that's a bug in the agent code.
>
> I don't think so.
>
> For example, JVMTI agent might attempt to get monitor owner from
> sleeping thread or in native (e.g. during socket operation) thread,
> and it might resume during GetObjectMonitorUsage() call.

A "sleeping thread" or a thread "in native" does not count as being
suspended, so GetObjectMonitorUsage() will take the safepoint code path.

> Agent code can (should) not control application threads as an observer.

The agent writer has the choice of using JVM/TI SuspendThread() and
JVM/TI ResumeThread() along with GetObjectMonitorUsage(), or it can use
GetObjectMonitorUsage() without direct calls to SuspendThread() and
ResumeThread(). It's the agent writer's choice. If the agent is only
interested in a single thread, then doing:

    SuspendThread()            // might safepoint
    GetObjectMonitorUsage()    // won't safepoint
    ResumeThread()             // won't safepoint

is more performant than doing just GetObjectMonitorUsage(), which is
guaranteed to safepoint (implementation detail).

> IMHO GetObjectMonitorUsage() should perform at safepoint or should
> start direct handshake immediately after getting owner thread from
> monitor.

Handshakes are a separate matter altogether. I'm talking about the way
the code works now, and I don't think we've switched JVM/TI
GetObjectMonitorUsage() to use handshakes yet. Or have we done so?

Dan

>
>
> Yasumasa
>
>
>> Dan
>>
>>
>>>
>>>
>>>> JavaThread::is_ext_suspend_completed() is used to check thread
>>>> state, it returns `true` when the thread is sleeping [3], or when
>>>> it performs in native [4].
>>> >>> Sure but if the thread is actually suspended it can't continue >>> execution in the VM or in Java code. >>> >>>> >>>>> This appears to be an optimisation for the assumed common case >>>>> where threads are first suspended and then the monitors are queried. >>>> >>>> I agree with this, but I could find out it from JVMTI spec - it >>>> just says "Get information about the object's monitor." >>> >>> Yes it was just an implementation optimisation, nothing to do with >>> the spec. >>> >>>> GetObjectMonitorUsage() might return incorrect information in some >>>> case. >>>> >>>> It starts with finding owner thread, but the owner might be just >>>> before wakeup. >>>> So I think it is more safe if GetObjectMonitorUsage() is called at >>>> safepoint in any case. >>> >>> Except we're moving away from safepoints to using Handshakes, so >>> this particular operation will require that the apparent owner is >>> Handshake-safe (by entering a handshake with it) before querying the >>> monitor. This would still be preferable I think to always using a >>> safepoint for the entire operation. >>> >>> Cheers, >>> David >>> ----- >>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>> [3] >>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>> [4] >>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>> >>>> >>>>> However there is still a potential bug as the thread reported as >>>>> the owner may not be suspended at the time we first see it, and >>>>> may release the monitor, but then it may get suspended before we >>>>> call: >>>>> >>>>> ??owning_thread = >>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>> >>>>> and so we think it is still the monitor owner and proceed to query >>>>> the monitor information in a racy way. This can't happen when >>>>> suspension itself requires a safepoint as the current thread won't >>>>> go to that safepoint during this code. 
However, if suspension is
>>>>> implemented via a direct handshake with the target thread then we
>>>>> have a problem.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973
>>>>>> [2]
>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996
>>

From jcbeyler at google.com Mon Jun 15 16:53:02 2020
From: jcbeyler at google.com (Jean Christophe Beyler)
Date: Mon, 15 Jun 2020 09:53:02 -0700
Subject: JVMTI callback SampledObjectAlloc always fires for first allocation in a new thread
In-Reply-To:
References:
Message-ID:

Hi Markus,

I created:
https://bugs.openjdk.java.net/browse/JDK-8247615

And I'll see what needs to be done for it :)
Jc

On Fri, Jun 5, 2020 at 3:45 AM Markus Gaisbauer wrote:

> Hi,
>
> JVMTI callback SampledObjectAlloc is currently always called for the first
> allocation of a thread. This generates a lot of bias in an application that
> regularly starts new threads.
>
> I tested this with latest Java 11 and Java 15.
>
> E.g. here is a sample that creates 100 threads and allocates one object in
> each thread.
>
> public class AllocationProfilingBiasReproducer {
>     public static void main(String[] args) throws Exception {
>         for (int i = 0; i < 100; i++) {
>             new Thread(new Task(), "Task " + i).start();
>             Thread.sleep(1);
>         }
>         Thread.sleep(1000);
>     }
>     private static class Task implements Runnable {
>         @Override
>         public void run() {
>             new A();
>         }
>     }
>     private static class A {
>     }
> }
>
> I built a simple JVMTI agent that registers SampledObjectAlloc callback
> and sets interval to 1 MB with SetHeapSamplingInterval. The callback simply
> logs thread name and class name of allocated object.
> > I see the following output: > > SampledObjectAlloc Ljava/lang/String; via Task 0 > SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 1 > SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 2 > SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 3 > SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 4 > SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 5 > SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 6 > SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 7 > SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 8 > SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 9 > SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 10 > ... > > This is not expected. > > I set a breakpoint in my SampledObjectAlloc callback and observed the > following: > > In MemAllocator::Allocation::notify_allocation_jvmti_sampler() the local > var bytes_since_last is always 0xf1f1f1f1f1f1f1f1 for first allocation of a > thread. So first allocation is always reported to my agent. > > ThreadLocalAllocBuffer::_bytes_since_last_sample_point does not seem to be > explicitly initialized before accessing it for the first time. I assume > 0xf1f1f1f1f1f1f1f1 is a default value provided by some Hotspot allocator. > Only after the first event fired, notify_allocation_jvmti_sampler > calls ThreadLocalAllocBuffer::set_sample_end which initializes > _bytes_since_last_sample_point to a proper value. > > I am looking for someone who could create a JIRA ticket for this. > > Regards, > Markus > -- Thanks, Jc -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From stephen.fitch at oracle.com Mon Jun 15 19:42:26 2020
From: stephen.fitch at oracle.com (Stephen Fitch)
Date: Mon, 15 Jun 2020 12:42:26 -0700
Subject: Survey : On the jinfo, jmap, jstack serviceability tools
Message-ID: <260b8d05-43f0-f8e8-107f-5ac4784d62ab@oracle.com>

Hello:

We are considering deprecation and (eventual) removal of the jinfo,
jmap, jstack tools (aka "j* tools") and building out a future foundation
for some aspect of serviceability on jcmd. However, we don't have a lot
of data about how these tools are used in practice, especially outside
of Oracle.

Therefore, we have created a survey [1] to gather more information and
help us evaluate and understand how others are using these tools in the
JDK. If you have used, or have (support) processes that utilize these
j* commands, then we would definitely appreciate a completed survey.

We are specifically interested in your use-cases and how these tools are
effective for you in resolving JVM issues.

The survey will remain open through July 15 2020. The results of the
survey will be made public after the survey closes.

Thank you very much for your time and support.

[1] https://www.questionpro.com/t/AQk5jZhiww

From chris.plummer at oracle.com Mon Jun 15 20:03:28 2020
From: chris.plummer at oracle.com (Chris Plummer)
Date: Mon, 15 Jun 2020 13:03:28 -0700
Subject: RFR(XS): 8246369: CodeCache.findBlobUnsafe(addr) sometimes asserts with valid address
Message-ID: <5c593a7d-8082-fbfe-a7ab-795b0c9aa707@oracle.com>

Hello,

Please help review the following simple fix:

http://cr.openjdk.java.net/~cjplummer/8246369/webrev.00/index.html
https://bugs.openjdk.java.net/browse/JDK-8246369

Details are in the CR description.
thanks, Chris From serguei.spitsyn at oracle.com Mon Jun 15 22:12:34 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 15 Jun 2020 15:12:34 -0700 Subject: RFR(XS): 8246369: CodeCache.findBlobUnsafe(addr) sometimes asserts with valid address In-Reply-To: <5c593a7d-8082-fbfe-a7ab-795b0c9aa707@oracle.com> References: <5c593a7d-8082-fbfe-a7ab-795b0c9aa707@oracle.com> Message-ID: An HTML attachment was scrubbed... URL: From david.holmes at oracle.com Mon Jun 15 22:14:15 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 16 Jun 2020 08:14:15 +1000 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> Message-ID: <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> Hi Dan, On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: > On 6/15/20 3:26 AM, David Holmes wrote: >> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>> Hi David, >>> >>> On 2020/06/15 14:15, David Holmes wrote: >>>> Hi Yasumasa, >>>> >>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>> Hi all, >>>>> >>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() >>>>> (implementation of GetObjectMonitorUsage()) does not perform at >>>>> safepoint. >>>> >>>> GetObjectMonitorUsage will use a safepoint if the target is not >>>> suspended: >>>> >>>> jvmtiError >>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* >>>> info_ptr) { >>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, object, >>>> info_ptr); >>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>> ???? // Some of the critical threads were not suspended. go to a >>>> safepoint and try again >>>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, object, >>>> info_ptr); >>>> ???? 
VMThread::execute(&op);
>>>>     err = op.result();
>>>>   }
>>>>   return err;
>>>> } /* end GetObject */
>>>
>>> I saw this code, so I guess there are some cases when
>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from
>>> get_object_monitor_usage().
>>>
>>>
>>>>> Monitor owner would be acquired from monitor object at first [1],
>>>>> but it would perform concurrently.
>>>>> If owner thread is not suspended, the owner might be changed to
>>>>> others in subsequent code.
>>>>>
>>>>> For example, the owner might release the monitor before [2].
>>>>
>>>> The expectation is that when we find an owner thread it is either
>>>> suspended or not. If it is suspended then it cannot release the
>>>> monitor. If it is not suspended we detect that and redo the whole
>>>> query at a safepoint.
>>>
>>> I think the owner thread might resume unfortunately after suspending
>>> check.
>>
>> Yes you are right. I was thinking resuming also required a safepoint
>> but it only requires the Threads_lock. So yes the code is wrong.
>
> Which code is wrong?
>
> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller
> has started the process of gathering the information while not at a
> safepoint. Thus the information returned by GetObjectMonitorUsage()
> might be stale, but that's a bug in the agent code.

The code tries to make sure that it either collects data about a monitor
owned by a thread that is suspended, or else it collects that data at a
safepoint. But the owning thread can be resumed just after the code
determined it was suspended. The monitor can then be released, and the
information gathered is not only stale but potentially completely wrong,
as the monitor could now be owned by a different thread and the query
will report that thread's entry count.

GetObjectMonitorUsage says nothing about threads being suspended, so I
can't see how this could be construed as an agent bug.
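
To make the interleaving concrete, here is a toy Java model of the
check-then-read window described above. It is a sketch only: ToyMonitor,
Usage, racyQuery, pinnedQuery and handOver are hypothetical names made up
for illustration, not HotSpot or JVMTI code, and the "window" callback
merely stands in for whatever happens once the owner thread is resumed.

```java
// Toy model only: none of these names exist in HotSpot or JVMTI.
final class MonitorRaceSketch {

    // Hypothetical stand-in for an object monitor and its owner's state.
    static final class ToyMonitor {
        String owner;
        int entryCount;
        boolean ownerSuspended;

        ToyMonitor(String owner, int entryCount, boolean ownerSuspended) {
            this.owner = owner;
            this.entryCount = entryCount;
            this.ownerSuspended = ownerSuspended;
        }
    }

    // Hypothetical result of a query: who owns the monitor and how deeply.
    static final class Usage {
        final String owner;
        final int entryCount;

        Usage(String owner, int entryCount) {
            this.owner = owner;
            this.entryCount = entryCount;
        }

        @Override
        public String toString() {
            return "owner=" + owner + ", entryCount=" + entryCount;
        }
    }

    // Check-then-read: the owner is read and its suspension is checked,
    // but the entry count is read only after the window has passed.
    static Usage racyQuery(ToyMonitor m, Runnable window) {
        String owner = m.owner;
        if (!m.ownerSuspended) {
            return null; // the real code would retry at a safepoint instead
        }
        window.run(); // owner is resumed here, after the suspension check
        return new Usage(owner, m.entryCount);
    }

    // Both fields are read before the owner gets a chance to run again,
    // which is what a safepoint (or a handshake with the owner) would give.
    static Usage pinnedQuery(ToyMonitor m, Runnable window) {
        Usage snapshot = new Usage(m.owner, m.entryCount);
        window.run(); // the owner only moves on once the snapshot is complete
        return snapshot;
    }

    // The interleaving: A is resumed and releases; B acquires with count 1.
    static Runnable handOver(ToyMonitor m) {
        return () -> {
            m.ownerSuspended = false;
            m.owner = "B";
            m.entryCount = 1;
        };
    }

    public static void main(String[] args) {
        ToyMonitor m1 = new ToyMonitor("A", 3, true);
        System.out.println("racy:   " + racyQuery(m1, handOver(m1)));

        ToyMonitor m2 = new ToyMonitor("A", 3, true);
        System.out.println("pinned: " + pinnedQuery(m2, handOver(m2)));
    }
}
```

With the hand-over injected into the window, the racy query reports
owner A together with B's entry count (1), a state the monitor was never
in. The pinned query reports A with its own count (3): possibly stale by
the time the caller sees it, but a consistent snapshot.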
Using a handshake on the owner thread will allow this to be fixed in the future without forcing/using any safepoints. Cheers, David > Dan > > >> >> >>> JavaThread::is_ext_suspend_completed() is used to check thread state, >>> it returns `true` when the thread is sleeping [3], or when it >>> performs in native [4]. >> >> Sure but if the thread is actually suspended it can't continue >> execution in the VM or in Java code. >> >>> >>>> This appears to be an optimisation for the assumed common case where >>>> threads are first suspended and then the monitors are queried. >>> >>> I agree with this, but I could find out it from JVMTI spec - it just >>> says "Get information about the object's monitor." >> >> Yes it was just an implementation optimisation, nothing to do with the >> spec. >> >>> GetObjectMonitorUsage() might return incorrect information in some case. >>> >>> It starts with finding owner thread, but the owner might be just >>> before wakeup. >>> So I think it is more safe if GetObjectMonitorUsage() is called at >>> safepoint in any case. >> >> Except we're moving away from safepoints to using Handshakes, so this >> particular operation will require that the apparent owner is >> Handshake-safe (by entering a handshake with it) before querying the >> monitor. This would still be preferable I think to always using a >> safepoint for the entire operation. 
>> >> Cheers, >> David >> ----- >> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>> [3] >>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>> >>> [4] >>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>> >>> >>> >>>> However there is still a potential bug as the thread reported as the >>>> owner may not be suspended at the time we first see it, and may >>>> release the monitor, but then it may get suspended before we call: >>>> >>>> ??owning_thread = >>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>> >>>> and so we think it is still the monitor owner and proceed to query >>>> the monitor information in a racy way. This can't happen when >>>> suspension itself requires a safepoint as the current thread won't >>>> go to that safepoint during this code. However, if suspension is >>>> implemented via a direct handshake with the target thread then we >>>> have a problem. >>>> >>>> David >>>> ----- >>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>> [1] >>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>> >>>>> [2] >>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>> > From daniel.daugherty at oracle.com Mon Jun 15 22:40:10 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 15 Jun 2020 18:40:10 -0400 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> Message-ID: <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> On 6/15/20 6:14 PM, David Holmes wrote: > Hi Dan, > > On 15/06/2020 11:38 pm, Daniel D. 
Daugherty wrote: >> On 6/15/20 3:26 AM, David Holmes wrote: >>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>> Hi David, >>>> >>>> On 2020/06/15 14:15, David Holmes wrote: >>>>> Hi Yasumasa, >>>>> >>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>> Hi all, >>>>>> >>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() >>>>>> (implementation of GetObjectMonitorUsage()) does not perform at >>>>>> safepoint. >>>>> >>>>> GetObjectMonitorUsage will use a safepoint if the target is not >>>>> suspended: >>>>> >>>>> jvmtiError >>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* >>>>> info_ptr) { >>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, >>>>> object, info_ptr); >>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>> ???? // Some of the critical threads were not suspended. go to a >>>>> safepoint and try again >>>>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, object, >>>>> info_ptr); >>>>> ???? VMThread::execute(&op); >>>>> ???? err = op.result(); >>>>> ?? } >>>>> ?? return err; >>>>> } /* end GetObject */ >>>> >>>> I saw this code, so I guess there are some cases when >>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from >>>> get_object_monitor_usage(). >>>> >>>> >>>>>> Monitor owner would be acquired from monitor object at first [1], >>>>>> but it would perform concurrently. >>>>>> If owner thread is not suspended, the owner might be changed to >>>>>> others in subsequent code. >>>>>> >>>>>> For example, the owner might release the monitor before [2]. >>>>> >>>>> The expectation is that when we find an owner thread it is either >>>>> suspended or not. If it is suspended then it cannot release the >>>>> monitor. If it is not suspended we detect that and redo the whole >>>>> query at a safepoint. >>>> >>>> I think the owner thread might resume unfortunately after >>>> suspending check. >>> >>> Yes you are right. 
I was thinking resuming also required a safepoint
>>> but it only requires the Threads_lock. So yes the code is wrong.
>>
>> Which code is wrong?
>>
>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller
>> has started the process of gathering the information while not at a
>> safepoint. Thus the information returned by GetObjectMonitorUsage()
>> might be stale, but that's a bug in the agent code.
>
> The code tries to make sure that it either collects data about a
> monitor owned by a thread that is suspended, or else it collects that
> data at a safepoint. But the owning thread can be resumed just after
> the code determined it was suspended. The monitor can then be released
> and the information gathered not only stale but potentially completely
> wrong as it could now be owned by a different thread and will report
> that thread's entry count.

If the agent is not using SuspendThread(), then as soon as
GetObjectMonitorUsage() returns to the caller the information
can be stale. In fact as soon as the implementation returns
from the safepoint that gathered the info, the target thread
could have moved on.

The only way to make sure you don't have stale information is
to use SuspendThread(), but it's not required. Perhaps the doc
should have been more clear about the possibility of returning
stale info. That's a question for Robert F.

> GetObjectMonitorUsage says nothing about thread's being suspended so I
> can't see how this could be construed as an agent bug.

In your scenario above, you mention that the target thread was
suspended, GetObjectMonitorUsage() was called while the target
was suspended, and then the target thread was resumed after
GetObjectMonitorUsage() checked for suspension, but before
GetObjectMonitorUsage() was able to gather the info.

All three of those calls: SuspendThread(), GetObjectMonitorUsage()
and ResumeThread() are made by the agent and the agent should not
resume the target thread while also calling GetObjectMonitorUsage().
The calls were allowed to be made out of order so agent bug. > Using a handshake on the owner thread will allow this to be fixed in > the future without forcing/using any safepoints. I have to think about that which is why I'm avoiding talking about handshakes in this thread. Dan > > Cheers, > David > >> Dan >> >> >>> >>> >>>> JavaThread::is_ext_suspend_completed() is used to check thread >>>> state, it returns `true` when the thread is sleeping [3], or when >>>> it performs in native [4]. >>> >>> Sure but if the thread is actually suspended it can't continue >>> execution in the VM or in Java code. >>> >>>> >>>>> This appears to be an optimisation for the assumed common case >>>>> where threads are first suspended and then the monitors are queried. >>>> >>>> I agree with this, but I could find out it from JVMTI spec - it >>>> just says "Get information about the object's monitor." >>> >>> Yes it was just an implementation optimisation, nothing to do with >>> the spec. >>> >>>> GetObjectMonitorUsage() might return incorrect information in some >>>> case. >>>> >>>> It starts with finding owner thread, but the owner might be just >>>> before wakeup. >>>> So I think it is more safe if GetObjectMonitorUsage() is called at >>>> safepoint in any case. >>> >>> Except we're moving away from safepoints to using Handshakes, so >>> this particular operation will require that the apparent owner is >>> Handshake-safe (by entering a handshake with it) before querying the >>> monitor. This would still be preferable I think to always using a >>> safepoint for the entire operation. 
>>> >>> Cheers, >>> David >>> ----- >>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>> [3] >>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>> >>>> [4] >>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>> >>>> >>>> >>>>> However there is still a potential bug as the thread reported as >>>>> the owner may not be suspended at the time we first see it, and >>>>> may release the monitor, but then it may get suspended before we >>>>> call: >>>>> >>>>> ??owning_thread = >>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>> >>>>> and so we think it is still the monitor owner and proceed to query >>>>> the monitor information in a racy way. This can't happen when >>>>> suspension itself requires a safepoint as the current thread won't >>>>> go to that safepoint during this code. However, if suspension is >>>>> implemented via a direct handshake with the target thread then we >>>>> have a problem. 
>>>>> >>>>> David >>>>> ----- >>>>> >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Yasumasa >>>>>> >>>>>> >>>>>> [1] >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>> >>>>>> [2] >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>> >> From david.holmes at oracle.com Mon Jun 15 23:19:01 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 16 Jun 2020 09:19:01 +1000 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> Message-ID: <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: > On 6/15/20 6:14 PM, David Holmes wrote: >> Hi Dan, >> >> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>> On 6/15/20 3:26 AM, David Holmes wrote: >>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>> Hi David, >>>>> >>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>> Hi Yasumasa, >>>>>> >>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>> Hi all, >>>>>>> >>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() >>>>>>> (implementation of GetObjectMonitorUsage()) does not perform at >>>>>>> safepoint. >>>>>> >>>>>> GetObjectMonitorUsage will use a safepoint if the target is not >>>>>> suspended: >>>>>> >>>>>> jvmtiError >>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* >>>>>> info_ptr) { >>>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, >>>>>> object, info_ptr); >>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>> ???? 
// Some of the critical threads were not suspended. go to a >>>>>> safepoint and try again >>>>>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, object, >>>>>> info_ptr); >>>>>> ???? VMThread::execute(&op); >>>>>> ???? err = op.result(); >>>>>> ?? } >>>>>> ?? return err; >>>>>> } /* end GetObject */ >>>>> >>>>> I saw this code, so I guess there are some cases when >>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from >>>>> get_object_monitor_usage(). >>>>> >>>>> >>>>>>> Monitor owner would be acquired from monitor object at first [1], >>>>>>> but it would perform concurrently. >>>>>>> If owner thread is not suspended, the owner might be changed to >>>>>>> others in subsequent code. >>>>>>> >>>>>>> For example, the owner might release the monitor before [2]. >>>>>> >>>>>> The expectation is that when we find an owner thread it is either >>>>>> suspended or not. If it is suspended then it cannot release the >>>>>> monitor. If it is not suspended we detect that and redo the whole >>>>>> query at a safepoint. >>>>> >>>>> I think the owner thread might resume unfortunately after >>>>> suspending check. >>>> >>>> Yes you are right. I was thinking resuming also required a safepoint >>>> but it only requires the Threads_lock. So yes the code is wrong. >>> >>> Which code is wrong? >>> >>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller >>> has started the process of gathering the information while not at a >>> safepoint. Thus the information returned by GetObjectMonitorUsage() >>> might be stale, but that's a bug in the agent code. >> >> The code tries to make sure that it either collects data about a >> monitor owned by a thread that is suspended, or else it collects that >> data at a safepoint. But the owning thread can be resumed just after >> the code determined it was suspended. 
The monitor can then be released >> and the information gathered not only stale but potentially completely >> wrong as it could now be owned by a different thread and will report >> that thread's entry count. > > If the agent is not using SuspendThread(), then as soon as > GetObjectMonitorUsage() returns to the caller the information > can be stale. In fact as soon as the implementation returns > from the safepoint that gathered the info, the target thread > could have moved on. That isn't the issue. That the info is stale is fine. But the expectation is that the information was actually an accurate snapshot of the state of the monitor at some point in time. The current code does not ensure that. > The only way to make sure you don't have stale information is > to use SuspendThread(), but it's not required. Perhaps the doc > should have more clear about the possibility of returning stale > info. That's a question for Robert F. > > >> GetObjectMonitorUsage says nothing about thread's being suspended so I >> can't see how this could be construed as an agent bug. > > In your scenario above, you mention that the target thread was > suspended, GetObjectMonitorUsage() was called while the target > was suspended, and then the target thread was resumed after > GetObjectMonitorUsage() checked for suspension, but before > GetObjectMonitorUsage() was able to gather the info. > > All three of those calls: SuspendThread(), GetObjectMonitorUsage() > and ResumeThread() are made by the agent and the agent should not > resume the target thread while also calling GetObjectMonitorUsage(). > The calls were allowed to be made out of order so agent bug. Perhaps. I was thinking more generally about an independent resume, but you're right that doesn't really make a lot of sense. But when the spec says nothing about suspension ... >> Using a handshake on the owner thread will allow this to be fixed in >> the future without forcing/using any safepoints. 
> > I have to think about that which is why I'm avoiding talking about > handshakes in this thread. Effectively the handshake can "suspend" the thread whilst the monitor is queried. In effect the operation would create a per-thread safepoint. Semantically it is no different to the code actually suspending the owner thread, but it can't actually do that because suspends/resume don't nest. Cheers, David > Dan > > > >> >> Cheers, >> David >> >>> Dan >>> >>> >>>> >>>> >>>>> JavaThread::is_ext_suspend_completed() is used to check thread >>>>> state, it returns `true` when the thread is sleeping [3], or when >>>>> it performs in native [4]. >>>> >>>> Sure but if the thread is actually suspended it can't continue >>>> execution in the VM or in Java code. >>>> >>>>> >>>>>> This appears to be an optimisation for the assumed common case >>>>>> where threads are first suspended and then the monitors are queried. >>>>> >>>>> I agree with this, but I could find out it from JVMTI spec - it >>>>> just says "Get information about the object's monitor." >>>> >>>> Yes it was just an implementation optimisation, nothing to do with >>>> the spec. >>>> >>>>> GetObjectMonitorUsage() might return incorrect information in some >>>>> case. >>>>> >>>>> It starts with finding owner thread, but the owner might be just >>>>> before wakeup. >>>>> So I think it is more safe if GetObjectMonitorUsage() is called at >>>>> safepoint in any case. >>>> >>>> Except we're moving away from safepoints to using Handshakes, so >>>> this particular operation will require that the apparent owner is >>>> Handshake-safe (by entering a handshake with it) before querying the >>>> monitor. This would still be preferable I think to always using a >>>> safepoint for the entire operation. 
>>>> >>>> Cheers, >>>> David >>>> ----- >>>> >>>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>> [3] >>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>> >>>>> [4] >>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>> >>>>> >>>>> >>>>>> However there is still a potential bug as the thread reported as >>>>>> the owner may not be suspended at the time we first see it, and >>>>>> may release the monitor, but then it may get suspended before we >>>>>> call: >>>>>> >>>>>> ??owning_thread = >>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>> >>>>>> and so we think it is still the monitor owner and proceed to query >>>>>> the monitor information in a racy way. This can't happen when >>>>>> suspension itself requires a safepoint as the current thread won't >>>>>> go to that safepoint during this code. However, if suspension is >>>>>> implemented via a direct handshake with the target thread then we >>>>>> have a problem. >>>>>> >>>>>> David >>>>>> ----- >>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Yasumasa >>>>>>> >>>>>>> >>>>>>> [1] >>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>> >>>>>>> [2] >>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>> >>> > From chris.plummer at oracle.com Mon Jun 15 23:40:13 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Mon, 15 Jun 2020 16:40:13 -0700 Subject: RFR(XS): 8246369: CodeCache.findBlobUnsafe(addr) sometimes asserts with valid address In-Reply-To: References: <5c593a7d-8082-fbfe-a7ab-795b0c9aa707@oracle.com> Message-ID: An HTML attachment was scrubbed... 
URL:

From jcbeyler at google.com Tue Jun 16 00:25:33 2020
From: jcbeyler at google.com (Jean Christophe Beyler)
Date: Mon, 15 Jun 2020 17:25:33 -0700
Subject: JVMTI callback SampledObjectAlloc always fires for first allocation in a new thread
In-Reply-To:
References:
Message-ID:

Hi Markus,

I played around adding your Java code in the testing framework and I
don't get exactly the same failure as you do. Basically, I get about 5%
samples compared to the number of threads, whereas you seem to get a
sample for each element.

Could you add the code you used for the agent so I can see if you are
doing something different than I am in that regard?

This doesn't change the issue, I'm just curious why you seem to be
exposing it more. I'm still digging into what would be the right
solution for this.

Thanks,
Jc

On Mon, Jun 15, 2020 at 9:53 AM Jean Christophe Beyler wrote:

> Hi Markus,
>
> I created:
> https://bugs.openjdk.java.net/browse/JDK-8247615
>
> And I'll see what needs to be done for it :)
> Jc
>
> On Fri, Jun 5, 2020 at 3:45 AM Markus Gaisbauer <
> markus.gaisbauer at gmail.com> wrote:
>
>> Hi,
>>
>> JVMTI callback SampledObjectAlloc is currently always called for the
>> first allocation of a thread. This generates a lot of bias in an
>> application that regularly starts new threads.
>>
>> I tested this with latest Java 11 and Java 15.
>>
>> E.g. here is a sample that creates 100 threads and allocates one object
>> in each thread.
>>
>> public class AllocationProfilingBiasReproducer {
>>     public static void main(String[] args) throws Exception {
>>         for (int i = 0; i < 100; i++) {
>>             new Thread(new Task(), "Task " + i).start();
>>             Thread.sleep(1);
>>         }
>>         Thread.sleep(1000);
>>     }
>>     private static class Task implements Runnable {
>>         @Override
>>         public void run() {
>>             new A();
>>         }
>>     }
>>     private static class A {
>>     }
>> }
>>
>> I built a simple JVMTI agent that registers SampledObjectAlloc callback
>> and sets interval to 1 MB with SetHeapSamplingInterval. The callback simply
>> logs thread name and class name of allocated object.
>>
>> I see the following output:
>>
>> SampledObjectAlloc Ljava/lang/String; via Task 0
>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 1
>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 2
>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 3
>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 4
>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 5
>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 6
>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 7
>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 8
>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 9
>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 10
>> ...
>>
>> This is not expected.
>>
>> I set a breakpoint in my SampledObjectAlloc callback and observed the
>> following:
>>
>> In MemAllocator::Allocation::notify_allocation_jvmti_sampler() the local
>> var bytes_since_last is always 0xf1f1f1f1f1f1f1f1 for first allocation of a
>> thread. So first allocation is always reported to my agent.
>>
>> ThreadLocalAllocBuffer::_bytes_since_last_sample_point does not seem to
>> be explicitly initialized before accessing it for the first time. I assume
>> 0xf1f1f1f1f1f1f1f1 is a default value provided by some Hotspot allocator.
>> Only after the first event fired, notify_allocation_jvmti_sampler
>> calls ThreadLocalAllocBuffer::set_sample_end which initializes
>> _bytes_since_last_sample_point to a proper value.
>>
>> I am looking for someone who could create a JIRA ticket for this.
>>
>> Regards,
>> Markus
>>
>
>
> --
>
> Thanks,
> Jc
>

--

Thanks,
Jc
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From daniel.daugherty at oracle.com Tue Jun 16 00:57:56 2020
From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Mon, 15 Jun 2020 20:57:56 -0400 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> Message-ID: On 6/15/20 7:19 PM, David Holmes wrote: > On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >> On 6/15/20 6:14 PM, David Holmes wrote: >>> Hi Dan, >>> >>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>> Hi David, >>>>>> >>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>> Hi Yasumasa, >>>>>>> >>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() >>>>>>>> (implementation of GetObjectMonitorUsage()) does not perform at >>>>>>>> safepoint. >>>>>>> >>>>>>> GetObjectMonitorUsage will use a safepoint if the target is not >>>>>>> suspended: >>>>>>> >>>>>>> jvmtiError >>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, >>>>>>> jvmtiMonitorUsage* info_ptr) { >>>>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, >>>>>>> object, info_ptr); >>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>> ???? // Some of the critical threads were not suspended. go to a >>>>>>> safepoint and try again >>>>>>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, object, >>>>>>> info_ptr); >>>>>>> ???? VMThread::execute(&op); >>>>>>> ???? err = op.result(); >>>>>>> ?? } >>>>>>> ?? 
return err; >>>>>>> } /* end GetObject */ >>>>>> >>>>>> I saw this code, so I guess there are some cases when >>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from >>>>>> get_object_monitor_usage(). >>>>>> >>>>>> >>>>>>>> Monitor owner would be acquired from monitor object at first >>>>>>>> [1], but it would perform concurrently. >>>>>>>> If owner thread is not suspended, the owner might be changed to >>>>>>>> others in subsequent code. >>>>>>>> >>>>>>>> For example, the owner might release the monitor before [2]. >>>>>>> >>>>>>> The expectation is that when we find an owner thread it is >>>>>>> either suspended or not. If it is suspended then it cannot >>>>>>> release the monitor. If it is not suspended we detect that and >>>>>>> redo the whole query at a safepoint. >>>>>> >>>>>> I think the owner thread might resume unfortunately after >>>>>> suspending check. >>>>> >>>>> Yes you are right. I was thinking resuming also required a >>>>> safepoint but it only requires the Threads_lock. So yes the code >>>>> is wrong. >>>> >>>> Which code is wrong? >>>> >>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller >>>> has started the process of gathering the information while not at a >>>> safepoint. Thus the information returned by GetObjectMonitorUsage() >>>> might be stale, but that's a bug in the agent code. >>> >>> The code tries to make sure that it either collects data about a >>> monitor owned by a thread that is suspended, or else it collects >>> that data at a safepoint. But the owning thread can be resumed just >>> after the code determined it was suspended. The monitor can then be >>> released and the information gathered not only stale but potentially >>> completely wrong as it could now be owned by a different thread and >>> will report that thread's entry count. >> >> If the agent is not using SuspendThread(), then as soon as >> GetObjectMonitorUsage() returns to the caller the information >> can be stale. 
In fact as soon as the implementation returns
>> from the safepoint that gathered the info, the target thread
>> could have moved on.
>
> That isn't the issue. That the info is stale is fine. But the
> expectation is that the information was actually an accurate snapshot
> of the state of the monitor at some point in time. The current code
> does not ensure that.

Please explain. I clearly don't understand why you think the info
returned isn't "an accurate snapshot of the state of the monitor
at some point in time".

>
>> The only way to make sure you don't have stale information is
>> to use SuspendThread(), but it's not required. Perhaps the doc
>> should have more clear about the possibility of returning stale
>> info. That's a question for Robert F.
>>
>>
>>> GetObjectMonitorUsage says nothing about thread's being suspended so
>>> I can't see how this could be construed as an agent bug.
>>
>> In your scenario above, you mention that the target thread was
>> suspended, GetObjectMonitorUsage() was called while the target
>> was suspended, and then the target thread was resumed after
>> GetObjectMonitorUsage() checked for suspension, but before
>> GetObjectMonitorUsage() was able to gather the info.
>>
>> All three of those calls: SuspendThread(), GetObjectMonitorUsage()
>> and ResumeThread() are made by the agent and the agent should not
>> resume the target thread while also calling GetObjectMonitorUsage().
>> The calls were allowed to be made out of order so agent bug.
>
> Perhaps. I was thinking more generally about an independent resume,
> but you're right that doesn't really make a lot of sense. But when the
> spec says nothing about suspension ...

And it is intentional that suspension is not required. JVM/DI and JVM/PI
used to require suspension for these kinds of get-the-info APIs. JVM/TI
intentionally was designed to not require suspension.

As I've said before, we could add a note about the data being potentially
stale unless SuspendThread is used. I think of it like stat(2). You can
fetch the file's info, but there's no guarantee that the info is current
by the time you process what you got back. Is it too much motherhood to
state that the data might be stale? I could go either way...

>
>>> Using a handshake on the owner thread will allow this to be fixed in
>>> the future without forcing/using any safepoints.
>>
>> I have to think about that which is why I'm avoiding talking about
>> handshakes in this thread.
>
> Effectively the handshake can "suspend" the thread whilst the monitor
> is queried. In effect the operation would create a per-thread safepoint.

I "know" that, but I still need time to think about it and probably
see the code to see if there are holes...

> Semantically it is no different to the code actually suspending the
> owner thread, but it can't actually do that because suspends/resume
> don't nest.

Yeah... we used to have a suspend count back when we tracked internal and
external suspends separately. That was a nightmare...

Dan

>
> Cheers,
> David
>
>> Dan
>>
>>
>>
>>>
>>> Cheers,
>>> David
>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>>
>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread
>>>>>> state, it returns `true` when the thread is sleeping [3], or when
>>>>>> it performs in native [4].
>>>>>
>>>>> Sure but if the thread is actually suspended it can't continue
>>>>> execution in the VM or in Java code.
>>>>>
>>>>>>
>>>>>>> This appears to be an optimisation for the assumed common case
>>>>>>> where threads are first suspended and then the monitors are
>>>>>>> queried.
>>>>>>
>>>>>> I agree with this, but I could find out it from JVMTI spec - it
>>>>>> just says "Get information about the object's monitor."
>>>>>
>>>>> Yes it was just an implementation optimisation, nothing to do with
>>>>> the spec.
>>>>>
>>>>>> GetObjectMonitorUsage() might return incorrect information in
>>>>>> some case.
>>>>>> >>>>>> It starts with finding owner thread, but the owner might be just >>>>>> before wakeup. >>>>>> So I think it is more safe if GetObjectMonitorUsage() is called >>>>>> at safepoint in any case. >>>>> >>>>> Except we're moving away from safepoints to using Handshakes, so >>>>> this particular operation will require that the apparent owner is >>>>> Handshake-safe (by entering a handshake with it) before querying >>>>> the monitor. This would still be preferable I think to always >>>>> using a safepoint for the entire operation. >>>>> >>>>> Cheers, >>>>> David >>>>> ----- >>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Yasumasa >>>>>> >>>>>> >>>>>> [3] >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>> >>>>>> [4] >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>> >>>>>> >>>>>> >>>>>>> However there is still a potential bug as the thread reported as >>>>>>> the owner may not be suspended at the time we first see it, and >>>>>>> may release the monitor, but then it may get suspended before we >>>>>>> call: >>>>>>> >>>>>>> ??owning_thread = >>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>> >>>>>>> and so we think it is still the monitor owner and proceed to >>>>>>> query the monitor information in a racy way. This can't happen >>>>>>> when suspension itself requires a safepoint as the current >>>>>>> thread won't go to that safepoint during this code. However, if >>>>>>> suspension is implemented via a direct handshake with the target >>>>>>> thread then we have a problem. 
>>>>>>> >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Yasumasa >>>>>>>> >>>>>>>> >>>>>>>> [1] >>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>> >>>>>>>> [2] >>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>> >>>> >> From david.holmes at oracle.com Tue Jun 16 01:28:53 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 16 Jun 2020 11:28:53 +1000 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> Message-ID: On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: > On 6/15/20 7:19 PM, David Holmes wrote: >> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>> On 6/15/20 6:14 PM, David Holmes wrote: >>>> Hi Dan, >>>> >>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>> Hi David, >>>>>>> >>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>> Hi Yasumasa, >>>>>>>> >>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>> (implementation of GetObjectMonitorUsage()) does not perform at >>>>>>>>> safepoint. >>>>>>>> >>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is not >>>>>>>> suspended: >>>>>>>> >>>>>>>> jvmtiError >>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, >>>>>>>> jvmtiMonitorUsage* info_ptr) { >>>>>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>>>>> ?? 
jvmtiError err = get_object_monitor_usage(calling_thread, >>>>>>>> object, info_ptr); >>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>> ???? // Some of the critical threads were not suspended. go to a >>>>>>>> safepoint and try again >>>>>>>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, object, >>>>>>>> info_ptr); >>>>>>>> ???? VMThread::execute(&op); >>>>>>>> ???? err = op.result(); >>>>>>>> ?? } >>>>>>>> ?? return err; >>>>>>>> } /* end GetObject */ >>>>>>> >>>>>>> I saw this code, so I guess there are some cases when >>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from >>>>>>> get_object_monitor_usage(). >>>>>>> >>>>>>> >>>>>>>>> Monitor owner would be acquired from monitor object at first >>>>>>>>> [1], but it would perform concurrently. >>>>>>>>> If owner thread is not suspended, the owner might be changed to >>>>>>>>> others in subsequent code. >>>>>>>>> >>>>>>>>> For example, the owner might release the monitor before [2]. >>>>>>>> >>>>>>>> The expectation is that when we find an owner thread it is >>>>>>>> either suspended or not. If it is suspended then it cannot >>>>>>>> release the monitor. If it is not suspended we detect that and >>>>>>>> redo the whole query at a safepoint. >>>>>>> >>>>>>> I think the owner thread might resume unfortunately after >>>>>>> suspending check. >>>>>> >>>>>> Yes you are right. I was thinking resuming also required a >>>>>> safepoint but it only requires the Threads_lock. So yes the code >>>>>> is wrong. >>>>> >>>>> Which code is wrong? >>>>> >>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller >>>>> has started the process of gathering the information while not at a >>>>> safepoint. Thus the information returned by GetObjectMonitorUsage() >>>>> might be stale, but that's a bug in the agent code. 
>>>>
>>>> The code tries to make sure that it either collects data about a
>>>> monitor owned by a thread that is suspended, or else it collects
>>>> that data at a safepoint. But the owning thread can be resumed just
>>>> after the code determined it was suspended. The monitor can then be
>>>> released and the information gathered not only stale but potentially
>>>> completely wrong as it could now be owned by a different thread and
>>>> will report that thread's entry count.
>>>
>>> If the agent is not using SuspendThread(), then as soon as
>>> GetObjectMonitorUsage() returns to the caller the information
>>> can be stale. In fact as soon as the implementation returns
>>> from the safepoint that gathered the info, the target thread
>>> could have moved on.
>>
>> That isn't the issue. That the info is stale is fine. But the
>> expectation is that the information was actually an accurate snapshot
>> of the state of the monitor at some point in time. The current code
>> does not ensure that.
>
> Please explain. I clearly don't understand why you think the info
> returned isn't "an accurate snapshot of the state of the monitor
> at some point in time".

Because it may not be a "snapshot" at all. There is no atomicity**. The
reported owner thread may not own it any longer when the entry count is
read, so straight away you may have the wrong entry count information.
The set of threads trying to acquire the monitor, or wait on the monitor
can change in unexpected ways. It would be possible for instance to
report the same thread as being the owner, being blocked trying to enter
the monitor, and being in the wait-set of the monitor - apparently all
at the same time!

** even if the owner is suspended we don't have complete atomicity
because threads can join the set of threads trying to enter the monitor
(unless they are all suspended).

David
-----

>
>
>>
>>> The only way to make sure you don't have stale information is
>>> to use SuspendThread(), but it's not required.
Perhaps the doc >>> should have more clear about the possibility of returning stale >>> info. That's a question for Robert F. >>> >>> >>>> GetObjectMonitorUsage says nothing about thread's being suspended so >>>> I can't see how this could be construed as an agent bug. >>> >>> In your scenario above, you mention that the target thread was >>> suspended, GetObjectMonitorUsage() was called while the target >>> was suspended, and then the target thread was resumed after >>> GetObjectMonitorUsage() checked for suspension, but before >>> GetObjectMonitorUsage() was able to gather the info. >>> >>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>> and ResumeThread() are made by the agent and the agent should not >>> resume the target thread while also calling GetObjectMonitorUsage(). >>> The calls were allowed to be made out of order so agent bug. >> >> Perhaps. I was thinking more generally about an independent resume, >> but you're right that doesn't really make a lot of sense. But when the >> spec says nothing about suspension ... > > And it is intentional that suspension is not required. JVM/DI and JVM/PI > used to require suspension for these kinds of get-the-info APIs. JVM/TI > intentionally was designed to not require suspension. > > As I've said before, we could add a note about the data being potentially > stale unless SuspendThread is used. I think of it like stat(2). You can > fetch the file's info, but there's no guarantee that the info is current > by the time you process what you got back. Is it too much motherhood to > state that the data might be stale? I could go either way... > > >> >>>> Using a handshake on the owner thread will allow this to be fixed in >>>> the future without forcing/using any safepoints. >>> >>> I have to think about that which is why I'm avoiding talking about >>> handshakes in this thread. >> >> Effectively the handshake can "suspend" the thread whilst the monitor >> is queried. 
In effect the operation would create a per-thread safepoint. > > I "know" that, but I still need time to think about it and probably > see the code to see if there are holes... > > >> Semantically it is no different to the code actually suspending the >> owner thread, but it can't actually do that because suspends/resume >> don't nest. > > Yeah... we used have a suspend count back when we tracked internal and > external suspends separately. That was a nightmare... > > Dan > > >> >> Cheers, >> David >> >>> Dan >>> >>> >>> >>>> >>>> Cheers, >>>> David >>>> >>>>> Dan >>>>> >>>>> >>>>>> >>>>>> >>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread >>>>>>> state, it returns `true` when the thread is sleeping [3], or when >>>>>>> it performs in native [4]. >>>>>> >>>>>> Sure but if the thread is actually suspended it can't continue >>>>>> execution in the VM or in Java code. >>>>>> >>>>>>> >>>>>>>> This appears to be an optimisation for the assumed common case >>>>>>>> where threads are first suspended and then the monitors are >>>>>>>> queried. >>>>>>> >>>>>>> I agree with this, but I could find out it from JVMTI spec - it >>>>>>> just says "Get information about the object's monitor." >>>>>> >>>>>> Yes it was just an implementation optimisation, nothing to do with >>>>>> the spec. >>>>>> >>>>>>> GetObjectMonitorUsage() might return incorrect information in >>>>>>> some case. >>>>>>> >>>>>>> It starts with finding owner thread, but the owner might be just >>>>>>> before wakeup. >>>>>>> So I think it is more safe if GetObjectMonitorUsage() is called >>>>>>> at safepoint in any case. >>>>>> >>>>>> Except we're moving away from safepoints to using Handshakes, so >>>>>> this particular operation will require that the apparent owner is >>>>>> Handshake-safe (by entering a handshake with it) before querying >>>>>> the monitor. This would still be preferable I think to always >>>>>> using a safepoint for the entire operation. 
>>>>>> >>>>>> Cheers, >>>>>> David >>>>>> ----- >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Yasumasa >>>>>>> >>>>>>> >>>>>>> [3] >>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>> >>>>>>> [4] >>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>> >>>>>>> >>>>>>> >>>>>>>> However there is still a potential bug as the thread reported as >>>>>>>> the owner may not be suspended at the time we first see it, and >>>>>>>> may release the monitor, but then it may get suspended before we >>>>>>>> call: >>>>>>>> >>>>>>>> ??owning_thread = >>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>> >>>>>>>> and so we think it is still the monitor owner and proceed to >>>>>>>> query the monitor information in a racy way. This can't happen >>>>>>>> when suspension itself requires a safepoint as the current >>>>>>>> thread won't go to that safepoint during this code. However, if >>>>>>>> suspension is implemented via a direct handshake with the target >>>>>>>> thread then we have a problem. >>>>>>>> >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Yasumasa >>>>>>>>> >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>> >>>>>>>>> [2] >>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>> >>>>> >>> > From daniel.daugherty at oracle.com Tue Jun 16 14:39:21 2020 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Tue, 16 Jun 2020 10:39:21 -0400 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> Message-ID: <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> On 6/15/20 9:28 PM, David Holmes wrote: > On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >> On 6/15/20 7:19 PM, David Holmes wrote: >>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>> Hi Dan, >>>>> >>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>> Hi David, >>>>>>>> >>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>> Hi Yasumasa, >>>>>>>>> >>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>>> (implementation of GetObjectMonitorUsage()) does not perform >>>>>>>>>> at safepoint. >>>>>>>>> >>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is >>>>>>>>> not suspended: >>>>>>>>> >>>>>>>>> jvmtiError >>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, >>>>>>>>> jvmtiMonitorUsage* info_ptr) { >>>>>>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>>>>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, >>>>>>>>> object, info_ptr); >>>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>> ???? // Some of the critical threads were not suspended. go to >>>>>>>>> a safepoint and try again >>>>>>>>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, object, >>>>>>>>> info_ptr); >>>>>>>>> ???? VMThread::execute(&op); >>>>>>>>> ???? 
err = op.result(); >>>>>>>>> ?? } >>>>>>>>> ?? return err; >>>>>>>>> } /* end GetObject */ >>>>>>>> >>>>>>>> I saw this code, so I guess there are some cases when >>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from >>>>>>>> get_object_monitor_usage(). >>>>>>>> >>>>>>>> >>>>>>>>>> Monitor owner would be acquired from monitor object at first >>>>>>>>>> [1], but it would perform concurrently. >>>>>>>>>> If owner thread is not suspended, the owner might be changed >>>>>>>>>> to others in subsequent code. >>>>>>>>>> >>>>>>>>>> For example, the owner might release the monitor before [2]. >>>>>>>>> >>>>>>>>> The expectation is that when we find an owner thread it is >>>>>>>>> either suspended or not. If it is suspended then it cannot >>>>>>>>> release the monitor. If it is not suspended we detect that and >>>>>>>>> redo the whole query at a safepoint. >>>>>>>> >>>>>>>> I think the owner thread might resume unfortunately after >>>>>>>> suspending check. >>>>>>> >>>>>>> Yes you are right. I was thinking resuming also required a >>>>>>> safepoint but it only requires the Threads_lock. So yes the code >>>>>>> is wrong. >>>>>> >>>>>> Which code is wrong? >>>>>> >>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() >>>>>> caller >>>>>> has started the process of gathering the information while not at a >>>>>> safepoint. Thus the information returned by GetObjectMonitorUsage() >>>>>> might be stale, but that's a bug in the agent code. >>>>> >>>>> The code tries to make sure that it either collects data about a >>>>> monitor owned by a thread that is suspended, or else it collects >>>>> that data at a safepoint. But the owning thread can be resumed >>>>> just after the code determined it was suspended. The monitor can >>>>> then be released and the information gathered not only stale but >>>>> potentially completely wrong as it could now be owned by a >>>>> different thread and will report that thread's entry count. 
>>>>
>>>> If the agent is not using SuspendThread(), then as soon as
>>>> GetObjectMonitorUsage() returns to the caller the information
>>>> can be stale. In fact as soon as the implementation returns
>>>> from the safepoint that gathered the info, the target thread
>>>> could have moved on.
>>>
>>> That isn't the issue. That the info is stale is fine. But the
>>> expectation is that the information was actually an accurate
>>> snapshot of the state of the monitor at some point in time. The
>>> current code does not ensure that.
>>
>> Please explain. I clearly don't understand why you think the info
>> returned isn't "an accurate snapshot of the state of the monitor
>> at some point in time".
>
> Because it may not be a "snapshot" at all. There is no atomicity**.
> The reported owner thread may not own it any longer when the entry
> count is read, so straight away you may have the wrong entry count
> information. The set of threads trying to acquire the monitor, or wait
> on the monitor can change in unexpected ways. It would be possible for
> instance to report the same thread as being the owner, being blocked
> trying to enter the monitor, and being in the wait-set of the monitor
> - apparently all at the same time!
>
> ** even if the owner is suspended we don't have complete atomicity
> because threads can join the set of threads trying to enter the
> monitor (unless they are all suspended).

Consider the case when the monitor's owner is _not_ suspended:

  - GetObjectMonitorUsage() uses a safepoint to gather the info about
    the object's monitor. Since we're at a safepoint, the info that
    we are gathering cannot change until we return from the safepoint.
    It is a snapshot and a valid one at that.

Consider the case when the monitor's owner is suspended:

  - GetObjectMonitorUsage() will gather info about the object's
    monitor while _not_ at a safepoint. Assuming that no other
    thread is suspended, then entry_count can change because
    another thread can block on entry while we are gathering
    info. waiter_count and waiters can change if a thread was
    in a timed wait that has timed out and now that thread is
    blocked on re-entry. I don't think that notify_waiter_count
    and notify_waiters can change.

    So in this case, the owner info and notify info is stable,
    but the entry_count and waiter info is not stable.

Consider the case when the monitor is not owned:

  - GetObjectMonitorUsage() will start to gather info about the
    object's monitor while _not_ at a safepoint. If it finds a
    thread on the entry queue that is not suspended, then it will
    bail out and redo the info gather at a safepoint. I just
    noticed that it doesn't check for suspension for the threads
    on the waiters list so a timed Object.wait() call can cause
    some confusion here.

    So in this case, the owner info is not stable if a thread
    comes out of a timed wait and reenters the monitor. This
    case is no different than if a "barger" thread comes in
    after the NULL owner field is observed and enters the
    monitor. We'll return that there is no owner, a list of
    suspended pending entry thread and a list of waiting
    threads. The reality is that the object's monitor is
    owned by the "barger" that completely bypassed the entry
    queue by virtue of seeing the NULL owner field at exactly
    the right time.

So the owner field is only stable when we have an owner. If
that owner is not suspended, then the other fields are also
stable because we gathered the info at a safepoint. If the
owner is suspended, then the owner and notify info is stable,
but the entry_count and waiter info is not stable.

If we have a NULL owner field, then the info is only stable
if you have a non-suspended thread on the entry list. Ouch!
That's deterministic, but not without some work.

Okay so only when we gather the info at a safepoint is all
of it a valid and stable snapshot.
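To make the three cases easier to compare side by side, here is a minimal sketch in C++ that encodes them as a small lookup. It is only a model of the reasoning in this message, not HotSpot code: `OwnerState`, `Stability`, `is_full_snapshot` and `classify` are hypothetical names, and the field groups simply follow the wording used above. For the unowned case without a safepoint, only the owner field is called out explicitly as unstable; treating the remaining groups as not guaranteed is an assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical model of the cases described above; this is not HotSpot code.
enum class OwnerState { NotOwned, OwnedNotSuspended, OwnedSuspended };

// Which groups of the returned info form one consistent snapshot.
struct Stability {
    bool at_safepoint;  // info was gathered inside a safepoint operation
    bool owner_info;    // owner
    bool entry_count;   // entry_count
    bool waiter_info;   // waiter_count, waiters
    bool notify_info;   // notify_waiter_count, notify_waiters
};

// True when every group of info is stable.
inline bool is_full_snapshot(const Stability& s) {
    return s.owner_info && s.entry_count && s.waiter_info && s.notify_info;
}

// 'unsuspended_entry_thread_seen' only matters when there is no owner: it
// models the bail-out to a safepoint described for the unowned case.
inline Stability classify(OwnerState owner, bool unsuspended_entry_thread_seen) {
    const Stability all_stable{true, true, true, true, true};
    const Stability none_stable{false, false, false, false, false};
    Stability result = none_stable;

    switch (owner) {
    case OwnerState::OwnedNotSuspended:
        // Owner is running: the query is redone at a safepoint.
        result = all_stable;
        break;
    case OwnerState::OwnedSuspended:
        // No safepoint: owner and notify info hold still, the rest may move.
        result = Stability{false, true, false, false, true};
        break;
    case OwnerState::NotOwned:
        // Safepoint only if a non-suspended thread is found on the entry queue.
        // Otherwise nothing is assumed stable (assumption of this sketch).
        result = unsuspended_entry_thread_seen ? all_stable : none_stable;
        break;
    }

    // Invariant of the model: anything gathered at a safepoint is fully stable.
    assert(!result.at_safepoint || is_full_snapshot(result));
    return result;
}
```

In this model, `classify(OwnerState::OwnedSuspended, false)` marks only the owner and notify groups as stable, which is the second case above, and the two paths that reach a safepoint are the only ones for which `is_full_snapshot` holds.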
Unfortunately, we only
do that at a safepoint when the owner thread is not suspended
or if owner == NULL and one of the entry threads is not
suspended. If either of those conditions is not true, then
the different pieces of info are unstable to varying degrees.

As for this claim:

> It would be possible for instance to report the same thread
> as being the owner, being blocked trying to enter the monitor,
> and being in the wait-set of the monitor - apparently all at
> the same time!

I can't figure out a way to make that scenario work. If the
thread is seen as the owner and is not suspended, then we
gather info at a safepoint. If it is suspended, then it can't
then be seen as on the entry queue or on the wait queue since
it is suspended. If it is seen on the entry queue and is not
suspended, then we gather info at a safepoint. If it is
suspended on the entry queue, then it can't be seen on the
wait queue.

So the info instability of this API is bad, but it's not
quite that bad. :-) (That is a small mercy.)


Handshaking is not going to make this situation any better
for GetObjectMonitorUsage(). If the monitor is owned and we
handshake with the owner, the stability or instability of
the other fields remains the same as when SuspendThread is
used. Handshaking with all threads won't make the data as
stable as when at a safepoint because individual threads
can resume execution after doing their handshake so there
will still be field instability.


Short version: GetObjectMonitorUsage() should only gather
data at a safepoint. Yes, I've changed my mind.

Dan

>
> David
> -----
>
>>
>>
>>>
>>>> The only way to make sure you don't have stale information is
>>>> to use SuspendThread(), but it's not required. Perhaps the doc
>>>> should have more clear about the possibility of returning stale
>>>> info. That's a question for Robert F.
>>>>
>>>>
>>>>> GetObjectMonitorUsage says nothing about thread's being suspended
>>>>> so I can't see how this could be construed as an agent bug.
>>>> >>>> In your scenario above, you mention that the target thread was >>>> suspended, GetObjectMonitorUsage() was called while the target >>>> was suspended, and then the target thread was resumed after >>>> GetObjectMonitorUsage() checked for suspension, but before >>>> GetObjectMonitorUsage() was able to gather the info. >>>> >>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>> and ResumeThread() are made by the agent and the agent should not >>>> resume the target thread while also calling GetObjectMonitorUsage(). >>>> The calls were allowed to be made out of order so agent bug. >>> >>> Perhaps. I was thinking more generally about an independent resume, >>> but you're right that doesn't really make a lot of sense. But when >>> the spec says nothing about suspension ... >> >> And it is intentional that suspension is not required. JVM/DI and JVM/PI >> used to require suspension for these kinds of get-the-info APIs. JVM/TI >> intentionally was designed to not require suspension. >> >> As I've said before, we could add a note about the data being >> potentially >> stale unless SuspendThread is used. I think of it like stat(2). You can >> fetch the file's info, but there's no guarantee that the info is current >> by the time you process what you got back. Is it too much motherhood to >> state that the data might be stale? I could go either way... >> >> >>> >>>>> Using a handshake on the owner thread will allow this to be fixed >>>>> in the future without forcing/using any safepoints. >>>> >>>> I have to think about that which is why I'm avoiding talking about >>>> handshakes in this thread. >>> >>> Effectively the handshake can "suspend" the thread whilst the >>> monitor is queried. In effect the operation would create a >>> per-thread safepoint. >> >> I "know" that, but I still need time to think about it and probably >> see the code to see if there are holes... 
>> >> >>> Semantically it is no different to the code actually suspending the >>> owner thread, but it can't actually do that because suspends/resume >>> don't nest. >> >> Yeah... we used have a suspend count back when we tracked internal and >> external suspends separately. That was a nightmare... >> >> Dan >> >> >>> >>> Cheers, >>> David >>> >>>> Dan >>>> >>>> >>>> >>>>> >>>>> Cheers, >>>>> David >>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread >>>>>>>> state, it returns `true` when the thread is sleeping [3], or >>>>>>>> when it performs in native [4]. >>>>>>> >>>>>>> Sure but if the thread is actually suspended it can't continue >>>>>>> execution in the VM or in Java code. >>>>>>> >>>>>>>> >>>>>>>>> This appears to be an optimisation for the assumed common case >>>>>>>>> where threads are first suspended and then the monitors are >>>>>>>>> queried. >>>>>>>> >>>>>>>> I agree with this, but I could find out it from JVMTI spec - it >>>>>>>> just says "Get information about the object's monitor." >>>>>>> >>>>>>> Yes it was just an implementation optimisation, nothing to do >>>>>>> with the spec. >>>>>>> >>>>>>>> GetObjectMonitorUsage() might return incorrect information in >>>>>>>> some case. >>>>>>>> >>>>>>>> It starts with finding owner thread, but the owner might be >>>>>>>> just before wakeup. >>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is called >>>>>>>> at safepoint in any case. >>>>>>> >>>>>>> Except we're moving away from safepoints to using Handshakes, so >>>>>>> this particular operation will require that the apparent owner >>>>>>> is Handshake-safe (by entering a handshake with it) before >>>>>>> querying the monitor. This would still be preferable I think to >>>>>>> always using a safepoint for the entire operation. 
>>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Yasumasa >>>>>>>> >>>>>>>> >>>>>>>> [3] >>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>> >>>>>>>> [4] >>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> However there is still a potential bug as the thread reported >>>>>>>>> as the owner may not be suspended at the time we first see it, >>>>>>>>> and may release the monitor, but then it may get suspended >>>>>>>>> before we call: >>>>>>>>> >>>>>>>>> ??owning_thread = >>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>> >>>>>>>>> and so we think it is still the monitor owner and proceed to >>>>>>>>> query the monitor information in a racy way. This can't happen >>>>>>>>> when suspension itself requires a safepoint as the current >>>>>>>>> thread won't go to that safepoint during this code. However, >>>>>>>>> if suspension is implemented via a direct handshake with the >>>>>>>>> target thread then we have a problem. 
>>>>>>>>> >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Yasumasa >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>> >>>>>>>>>> [2] >>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>> >>>>>> >>>> >> From chris.plummer at oracle.com Tue Jun 16 21:10:17 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 16 Jun 2020 14:10:17 -0700 Subject: RFR(XS): 8246369: CodeCache.findBlobUnsafe(addr) sometimes asserts with valid address In-Reply-To: References: <5c593a7d-8082-fbfe-a7ab-795b0c9aa707@oracle.com> Message-ID: <3c91793e-5871-33c6-a176-64692965d6b9@oracle.com> An HTML attachment was scrubbed... URL: From david.holmes at oracle.com Tue Jun 16 22:35:37 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 17 Jun 2020 08:35:37 +1000 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> Message-ID: <2f549182-b1fc-a920-fe8f-5181bc8b8f6b@oracle.com> Hi Dan, On 17/06/2020 12:39 am, Daniel D. Daugherty wrote: > On 6/15/20 9:28 PM, David Holmes wrote: >> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>> On 6/15/20 7:19 PM, David Holmes wrote: >>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>> Hi Dan, >>>>>> >>>>>> On 15/06/2020 11:38 pm, Daniel D. 
>>>>>> Daugherty wrote:
>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>
>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage()
>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does not perform
>>>>>>>>>>> at safepoint.
>>>>>>>>>>
>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is
>>>>>>>>>> not suspended:
>>>>>>>>>>
>>>>>>>>>> jvmtiError
>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>   JavaThread* calling_thread = JavaThread::current();
>>>>>>>>>>   jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr);
>>>>>>>>>>   if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>     // Some of the critical threads were not suspended. go to a safepoint and try again
>>>>>>>>>>     VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr);
>>>>>>>>>>     VMThread::execute(&op);
>>>>>>>>>>     err = op.result();
>>>>>>>>>>   }
>>>>>>>>>>   return err;
>>>>>>>>>> } /* end GetObject */
>>>>>>>>>
>>>>>>>>> I saw this code, so I guess there are some cases when
>>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from
>>>>>>>>> get_object_monitor_usage().
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Monitor owner would be acquired from monitor object at first
>>>>>>>>>>> [1], but it would perform concurrently.
>>>>>>>>>>> If owner thread is not suspended, the owner might be changed
>>>>>>>>>>> to others in subsequent code.
>>>>>>>>>>>
>>>>>>>>>>> For example, the owner might release the monitor before [2].
>>>>>>>>>>
>>>>>>>>>> The expectation is that when we find an owner thread it is
>>>>>>>>>> either suspended or not.
>>>>>>>>>> If it is suspended then it cannot
>>>>>>>>>> release the monitor. If it is not suspended we detect that and
>>>>>>>>>> redo the whole query at a safepoint.
>>>>>>>>>
>>>>>>>>> I think the owner thread might resume unfortunately after
>>>>>>>>> suspending check.
>>>>>>>>
>>>>>>>> Yes you are right. I was thinking resuming also required a
>>>>>>>> safepoint but it only requires the Threads_lock. So yes the code
>>>>>>>> is wrong.
>>>>>>>
>>>>>>> Which code is wrong?
>>>>>>>
>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller
>>>>>>> has started the process of gathering the information while not at a
>>>>>>> safepoint. Thus the information returned by GetObjectMonitorUsage()
>>>>>>> might be stale, but that's a bug in the agent code.
>>>>>>
>>>>>> The code tries to make sure that it either collects data about a
>>>>>> monitor owned by a thread that is suspended, or else it collects
>>>>>> that data at a safepoint. But the owning thread can be resumed
>>>>>> just after the code determined it was suspended. The monitor can
>>>>>> then be released and the information gathered not only stale but
>>>>>> potentially completely wrong as it could now be owned by a
>>>>>> different thread and will report that thread's entry count.
>>>>>
>>>>> If the agent is not using SuspendThread(), then as soon as
>>>>> GetObjectMonitorUsage() returns to the caller the information
>>>>> can be stale. In fact as soon as the implementation returns
>>>>> from the safepoint that gathered the info, the target thread
>>>>> could have moved on.
>>>>
>>>> That isn't the issue. That the info is stale is fine. But the
>>>> expectation is that the information was actually an accurate
>>>> snapshot of the state of the monitor at some point in time. The
>>>> current code does not ensure that.
>>>
>>> Please explain. I clearly don't understand why you think the info
>>> returned isn't "an accurate snapshot of the state of the monitor
>>> at some point in time".
>>
>> Because it may not be a "snapshot" at all. There is no atomicity**.
>> The reported owner thread may not own it any longer when the entry
>> count is read, so straight away you may have the wrong entry count
>> information. The set of threads trying to acquire the monitor, or wait
>> on the monitor can change in unexpected ways. It would be possible for
>> instance to report the same thread as being the owner, being blocked
>> trying to enter the monitor, and being in the wait-set of the monitor
>> - apparently all at the same time!
>>
>> ** even if the owner is suspended we don't have complete atomicity
>> because threads can join the set of threads trying to enter the
>> monitor (unless they are all suspended).
>
> Consider the case when the monitor's owner is _not_ suspended:
>
>   - GetObjectMonitorUsage() uses a safepoint to gather the info about
>     the object's monitor. Since we're at a safepoint, the info that
>     we are gathering cannot change until we return from the safepoint.
>     It is a snapshot and a valid one at that.

Correct.

> Consider the case when the monitor's owner is suspended:
>
>   - GetObjectMonitorUsage() will gather info about the object's
>     monitor while _not_ at a safepoint. Assuming that no other
>     thread is suspended, then entry_count can change because
>     another thread can block on entry while we are gathering

Terminology correction: "entry count" is the number of times the owner
thread has acquired the monitor. The "waiter_count" is the number of
threads trying to acquire the monitor. The "notify_waiter" count is the
number of threads in the wait-set of the monitor. The waiter_count can
change while we are gathering information.

>     info. waiter_count and waiters can change if a thread was
>     in a timed wait that has timed out and now that thread is
>     blocked on re-entry. I don't think that notify_waiter_count
>     and notify_waiters can change.

Yes I missed the case where a notify_waiter was in a timed-wait, or is
interrupted and so can move from the notify_waiters list to the waiters
list.

>     So in this case, the owner info and notify info is stable,
>     but the entry_count and waiter info is not stable.
>
> Consider the case when the monitor is not owned:
>
>   - GetObjectMonitorUsage() will start to gather info about the
>     object's monitor while _not_ at a safepoint. If it finds a
>     thread on the entry queue that is not suspended, then it will
>     bail out and redo the info gather at a safepoint. I just
>     noticed that it doesn't check for suspension for the threads
>     on the waiters list so a timed Object.wait() call can cause
>     some confusion here.

I hadn't spotted that additional fallback to using a safepoint.

>     So in this case, the owner info is not stable if a thread
>     comes out of a timed wait and reenters the monitor. This
>     case is no different than if a "barger" thread comes in
>     after the NULL owner field is observed and enters the
>     monitor. We'll return that there is no owner, a list of
>     suspended pending entry thread and a list of waiting
>     threads. The reality is that the object's monitor is
>     owned by the "barger" that completely bypassed the entry
>     queue by virtue of seeing the NULL owner field at exactly
>     the right time.
>
> So the owner field is only stable when we have an owner. If
> that owner is not suspended, then the other fields are also
> stable because we gathered the info at a safepoint. If the
> owner is suspended, then the owner and notify info is stable,
> but the entry_count and waiter info is not stable.
>
> If we have a NULL owner field, then the info is only stable
> if you have a non-suspended thread on the entry list. Ouch!
> That's deterministic, but not without some work.
>
>
> Okay so only when we gather the info at a safepoint is all
> of it a valid and stable snapshot.
> Unfortunately, we only
> do that at a safepoint when the owner thread is not suspended
> or if owner == NULL and one of the entry threads is not
> suspended. If either of those conditions is not true, then
> the different pieces of info is unstable to varying degrees.

Yes you are right. The "snapshot" semantics are even worse than I had
thought. This just highlights how meaningless it is to request this
information when not all the threads are suspended. That said there are
two bits of information that are connected and should always be
correct: that the entry_count reflects the number of times the owner
thread has acquired the monitor. That relationship is correct if we use
a safepoint or if the owner thread is suspended and remains suspended.

> As for this claim:
>
>> It would be possible for instance to report the same thread
>> as being the owner, being blocked trying to enter the monitor,
>> and being in the wait-set of the monitor - apparently all at
>> the same time!
>
> I can't figure out a way to make that scenario work. If the
> thread is seen as the owner and is not suspended, then we
> gather info at a safepoint. If it is suspended, then it can't
> then be seen as on the entry queue or on the wait queue since
> it is suspended.

This is the case where the suspended thread gets resumed after it has
been seen to be suspended, so we are not at a safepoint. The "owner"
thread can release the monitor, a new thread acquire it and the "owner"
then block trying to reacquire it - and so appear in the waiters list.
Meanwhile the new owner releases the monitor, "owner" acquires it again
and performs a wait() - now it is seen on the notify_waiters list.

> If it is seen on the entry queue and is not
> suspended, then we gather info at a safepoint. If it is
> suspended on the entry queue, then it can't be seen on the
> wait queue.
>
> So the info instability of this API is bad, but it's not
> quite that bad. :-) (That is a small mercy.)
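
The case analysis above can be summarised in a small stand-alone Java
model. This is only a sketch of the discussion, not HotSpot or JVMTI
code: the class, enum and method names are invented for illustration.
It uses the corrected terminology (entry_count is the number of times
the owner has acquired the monitor, the waiters are threads blocked on
entry, the notify_waiters are threads in the wait-set), and it assumes
that an owner seen as suspended stays suspended while the data is
gathered. The resume race described above is exactly the situation that
breaks that assumption.

```java
import java.util.EnumSet;
import java.util.Set;

// Toy model of the cases discussed in this thread. All names are
// hypothetical; none of this is HotSpot or JVMTI code.
final class MonitorUsageStability {

    enum OwnerState { NONE, SUSPENDED, RUNNING }

    // ENTRY_COUNT    = times the owner has acquired the monitor
    // WAITERS        = threads blocked trying to enter the monitor
    // NOTIFY_WAITERS = threads in the monitor's wait-set
    enum Field { OWNER, ENTRY_COUNT, WAITERS, NOTIFY_WAITERS }

    // True when the query is redone at a safepoint, as described in the
    // thread: the owner is not suspended, or there is no owner and some
    // thread on the entry queue is not suspended.
    static boolean usesSafepoint(OwnerState owner, boolean anyEntryThreadRunning) {
        switch (owner) {
            case RUNNING:   return true;
            case SUSPENDED: return false;
            default:        return anyEntryThreadRunning; // NONE
        }
    }

    // Fields that form a consistent snapshot in each case.
    static Set<Field> stableFields(OwnerState owner, boolean anyEntryThreadRunning) {
        if (usesSafepoint(owner, anyEntryThreadRunning)) {
            return EnumSet.allOf(Field.class);
        }
        if (owner == OwnerState.SUSPENDED) {
            // Assumes the owner is not resumed while the data is gathered.
            // Waiters can still arrive, and a timed or interrupted wait can
            // move a thread from the wait-set to the entry queue.
            return EnumSet.of(Field.OWNER, Field.ENTRY_COUNT);
        }
        // No owner and no safepoint: a "barger" can take the monitor.
        return EnumSet.noneOf(Field.class);
    }

    public static void main(String[] args) {
        for (OwnerState owner : OwnerState.values()) {
            for (boolean running : new boolean[] {false, true}) {
                System.out.println(owner + ", unsuspended entry thread=" + running
                        + " -> safepoint=" + usesSafepoint(owner, running)
                        + ", stable=" + stableFields(owner, running));
            }
        }
    }
}
```

Only the safepoint rows report every field as stable, which is the
point both sides of the discussion converge on.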
>
>
> Handshaking is not going to make this situation any better
> for GetObjectMonitorUsage(). If the monitor is owned and we
> handshake with the owner, the stability or instability of
> the other fields remains the same as when SuspendThread is
> used. Handshaking with all threads won't make the data as
> stable as when at a safepoint because individual threads
> can resume execution after doing their handshake so there
> will still be field instability.

A handshake with the owner thread that "suspends" the owner, at least
addresses the problem of a suspended thread being resumed. But as you
note it is otherwise no more "stable" than a well-behaved suspended
owner case today.

>
> Short version: GetObjectMonitorUsage() should only gather
> data at a safepoint. Yes, I've changed my mind.

:) I should have skipped to the bottom first.

Cheers,
David

> Dan
>
>>
>> David
>> -----
>>
>>>
>>>
>>>>
>>>>> The only way to make sure you don't have stale information is
>>>>> to use SuspendThread(), but it's not required. Perhaps the doc
>>>>> should have more clear about the possibility of returning stale
>>>>> info. That's a question for Robert F.
>>>>>
>>>>>
>>>>>> GetObjectMonitorUsage says nothing about thread's being suspended
>>>>>> so I can't see how this could be construed as an agent bug.
>>>>>
>>>>> In your scenario above, you mention that the target thread was
>>>>> suspended, GetObjectMonitorUsage() was called while the target
>>>>> was suspended, and then the target thread was resumed after
>>>>> GetObjectMonitorUsage() checked for suspension, but before
>>>>> GetObjectMonitorUsage() was able to gather the info.
>>>>>
>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage()
>>>>> and ResumeThread() are made by the agent and the agent should not
>>>>> resume the target thread while also calling GetObjectMonitorUsage().
>>>>> The calls were allowed to be made out of order so agent bug.
>>>>
>>>> Perhaps.
I was thinking more generally about an independent resume, >>>> but you're right that doesn't really make a lot of sense. But when >>>> the spec says nothing about suspension ... >>> >>> And it is intentional that suspension is not required. JVM/DI and JVM/PI >>> used to require suspension for these kinds of get-the-info APIs. JVM/TI >>> intentionally was designed to not require suspension. >>> >>> As I've said before, we could add a note about the data being >>> potentially >>> stale unless SuspendThread is used. I think of it like stat(2). You can >>> fetch the file's info, but there's no guarantee that the info is current >>> by the time you process what you got back. Is it too much motherhood to >>> state that the data might be stale? I could go either way... >>> >>> >>>> >>>>>> Using a handshake on the owner thread will allow this to be fixed >>>>>> in the future without forcing/using any safepoints. >>>>> >>>>> I have to think about that which is why I'm avoiding talking about >>>>> handshakes in this thread. >>>> >>>> Effectively the handshake can "suspend" the thread whilst the >>>> monitor is queried. In effect the operation would create a >>>> per-thread safepoint. >>> >>> I "know" that, but I still need time to think about it and probably >>> see the code to see if there are holes... >>> >>> >>>> Semantically it is no different to the code actually suspending the >>>> owner thread, but it can't actually do that because suspends/resume >>>> don't nest. >>> >>> Yeah... we used have a suspend count back when we tracked internal and >>> external suspends separately. That was a nightmare... >>> >>> Dan >>> >>> >>>> >>>> Cheers, >>>> David >>>> >>>>> Dan >>>>> >>>>> >>>>> >>>>>> >>>>>> Cheers, >>>>>> David >>>>>> >>>>>>> Dan >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread >>>>>>>>> state, it returns `true` when the thread is sleeping [3], or >>>>>>>>> when it performs in native [4]. 
>>>>>>>> >>>>>>>> Sure but if the thread is actually suspended it can't continue >>>>>>>> execution in the VM or in Java code. >>>>>>>> >>>>>>>>> >>>>>>>>>> This appears to be an optimisation for the assumed common case >>>>>>>>>> where threads are first suspended and then the monitors are >>>>>>>>>> queried. >>>>>>>>> >>>>>>>>> I agree with this, but I could find out it from JVMTI spec - it >>>>>>>>> just says "Get information about the object's monitor." >>>>>>>> >>>>>>>> Yes it was just an implementation optimisation, nothing to do >>>>>>>> with the spec. >>>>>>>> >>>>>>>>> GetObjectMonitorUsage() might return incorrect information in >>>>>>>>> some case. >>>>>>>>> >>>>>>>>> It starts with finding owner thread, but the owner might be >>>>>>>>> just before wakeup. >>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is called >>>>>>>>> at safepoint in any case. >>>>>>>> >>>>>>>> Except we're moving away from safepoints to using Handshakes, so >>>>>>>> this particular operation will require that the apparent owner >>>>>>>> is Handshake-safe (by entering a handshake with it) before >>>>>>>> querying the monitor. This would still be preferable I think to >>>>>>>> always using a safepoint for the entire operation. 
>>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Yasumasa >>>>>>>>> >>>>>>>>> >>>>>>>>> [3] >>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>> >>>>>>>>> [4] >>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> However there is still a potential bug as the thread reported >>>>>>>>>> as the owner may not be suspended at the time we first see it, >>>>>>>>>> and may release the monitor, but then it may get suspended >>>>>>>>>> before we call: >>>>>>>>>> >>>>>>>>>> ??owning_thread = >>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>> >>>>>>>>>> and so we think it is still the monitor owner and proceed to >>>>>>>>>> query the monitor information in a racy way. This can't happen >>>>>>>>>> when suspension itself requires a safepoint as the current >>>>>>>>>> thread won't go to that safepoint during this code. However, >>>>>>>>>> if suspension is implemented via a direct handshake with the >>>>>>>>>> target thread then we have a problem. 
>>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Yasumasa >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>> >>>>>>>>>>> [2] >>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>> >>>>>>> >>>>> >>> > From alexey.menkov at oracle.com Tue Jun 16 23:40:49 2020 From: alexey.menkov at oracle.com (Alex Menkov) Date: Tue, 16 Jun 2020 16:40:49 -0700 Subject: RFR(XS): 8246369: CodeCache.findBlobUnsafe(addr) sometimes asserts with valid address In-Reply-To: <3c91793e-5871-33c6-a176-64692965d6b9@oracle.com> References: <5c593a7d-8082-fbfe-a7ab-795b0c9aa707@oracle.com> <3c91793e-5871-33c6-a176-64692965d6b9@oracle.com> Message-ID: <4b2a8b6f-d857-4703-f103-9c16e469520e@oracle.com> Hi Chris, LGTM. --alex On 06/16/2020 14:10, Chris Plummer wrote: > Ping! Can I get one more review please? This is a very simple change and > doesn't really require knowing much about SA. I bit of understanding of > hotspot native heap memory allocations helps a bit (HeapBlock in heap.hpp). > > thanks, > > Chris > > On 6/15/20 4:40 PM, Chris Plummer wrote: >> I'll fix the indenting. >> >> Thanks for the review! >> >> Chris >> >> On 6/15/20 3:12 PM, serguei.spitsyn at oracle.com wrote: >>> Hi Chris, >>> >>> It looks good. >>> 134 if (Assert.ASSERTS_ENABLED) { >>> 135 // The pointer to the HeapBlock that contains this blob is >>> outside of the blob, >>> 136 // but it shouldn't be an error to find a blob based on the >>> pointer to the HeapBlock. >>> 137 // The heap block header is padded out to an 8-byte boundary. See >>> heap.hpp. The >>> 138 // simplest way to compute the header size is just 2 * addressSize. 
>>> 139 Assert.that(result.blobContains(start) || >>> 140 result.blobContains(start.addOffsetTo(2 * >>> VM.getVM().getAddressSize())), >>> 141 "found wrong CodeBlob"); >>> 142 } >>> The lines 139-141 have wrong indent. >>> No need for another webrev. >>> >>> Thanks >>> Serguei >>> >>> >>> On 6/15/20 13:03, Chris Plummer wrote: >>>> Hello, >>>> >>>> Please help review the following simple fix: >>>> >>>> http://cr.openjdk.java.net/~cjplummer/8246369/webrev.00/index.html >>>> https://bugs.openjdk.java.net/browse/JDK-8246369 >>>> >>>> Details are in the CR description. >>>> >>>> thanks, >>>> >>>> Chris >>> >> > From chris.plummer at oracle.com Tue Jun 16 23:41:47 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 16 Jun 2020 16:41:47 -0700 Subject: RFR(XS): 8246369: CodeCache.findBlobUnsafe(addr) sometimes asserts with valid address In-Reply-To: <4b2a8b6f-d857-4703-f103-9c16e469520e@oracle.com> References: <5c593a7d-8082-fbfe-a7ab-795b0c9aa707@oracle.com> <3c91793e-5871-33c6-a176-64692965d6b9@oracle.com> <4b2a8b6f-d857-4703-f103-9c16e469520e@oracle.com> Message-ID: <6bded345-0b4e-8cae-2fcc-83d255a45578@oracle.com> Thanks! On 6/16/20 4:40 PM, Alex Menkov wrote: > Hi Chris, > > LGTM. > > --alex > > On 06/16/2020 14:10, Chris Plummer wrote: >> Ping! Can I get one more review please? This is a very simple change >> and doesn't really require knowing much about SA. I bit of >> understanding of hotspot native heap memory allocations helps a bit >> (HeapBlock in heap.hpp). >> >> thanks, >> >> Chris >> >> On 6/15/20 4:40 PM, Chris Plummer wrote: >>> I'll fix the indenting. >>> >>> Thanks for the review! >>> >>> Chris >>> >>> On 6/15/20 3:12 PM, serguei.spitsyn at oracle.com wrote: >>>> Hi Chris, >>>> >>>> It looks good. >>>> ? 134???? if (Assert.ASSERTS_ENABLED) { >>>> 135 // The pointer to the HeapBlock that contains this blob is >>>> outside of the blob, >>>> 136 // but it shouldn't be an error to find a blob based on the >>>> pointer to the HeapBlock. 
>>>> 137 // The heap block header is padded out to an 8-byte boundary. >>>> See heap.hpp. The >>>> 138 // simplest way to compute the header size is just 2 * >>>> addressSize. >>>> 139 Assert.that(result.blobContains(start) || >>>> 140 result.blobContains(start.addOffsetTo(2 * >>>> VM.getVM().getAddressSize())), >>>> ? 141???????????????????? "found wrong CodeBlob"); >>>> ? 142???? } >>>> The lines 139-141 have wrong indent. >>>> No need for another webrev. >>>> >>>> Thanks >>>> Serguei >>>> >>>> >>>> On 6/15/20 13:03, Chris Plummer wrote: >>>>> Hello, >>>>> >>>>> Please help review the following simple fix: >>>>> >>>>> http://cr.openjdk.java.net/~cjplummer/8246369/webrev.00/index.html >>>>> https://bugs.openjdk.java.net/browse/JDK-8246369 >>>>> >>>>> Details are in the CR description. >>>>> >>>>> thanks, >>>>> >>>>> Chris >>>> >>> >> From serguei.spitsyn at oracle.com Tue Jun 16 23:47:17 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 16 Jun 2020 16:47:17 -0700 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> Message-ID: <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> Hi Dan, David and Yasumasa, On 6/16/20 07:39, Daniel D. Daugherty wrote: > On 6/15/20 9:28 PM, David Holmes wrote: >> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>> On 6/15/20 7:19 PM, David Holmes wrote: >>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>> Hi Dan, >>>>>> >>>>>> On 15/06/2020 11:38 pm, Daniel D. 
Daugherty wrote: >>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>> Hi David, >>>>>>>>> >>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>> Hi Yasumasa, >>>>>>>>>> >>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does not perform >>>>>>>>>>> at safepoint. >>>>>>>>>> >>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is >>>>>>>>>> not suspended: >>>>>>>>>> >>>>>>>>>> jvmtiError >>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, >>>>>>>>>> jvmtiMonitorUsage* info_ptr) { >>>>>>>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>>>>>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, >>>>>>>>>> object, info_ptr); >>>>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>> ???? // Some of the critical threads were not suspended. go >>>>>>>>>> to a safepoint and try again >>>>>>>>>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, >>>>>>>>>> object, info_ptr); >>>>>>>>>> ???? VMThread::execute(&op); >>>>>>>>>> ???? err = op.result(); >>>>>>>>>> ?? } >>>>>>>>>> ?? return err; >>>>>>>>>> } /* end GetObject */ >>>>>>>>> >>>>>>>>> I saw this code, so I guess there are some cases when >>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from >>>>>>>>> get_object_monitor_usage(). >>>>>>>>> >>>>>>>>> >>>>>>>>>>> Monitor owner would be acquired from monitor object at first >>>>>>>>>>> [1], but it would perform concurrently. >>>>>>>>>>> If owner thread is not suspended, the owner might be changed >>>>>>>>>>> to others in subsequent code. >>>>>>>>>>> >>>>>>>>>>> For example, the owner might release the monitor before [2]. >>>>>>>>>> >>>>>>>>>> The expectation is that when we find an owner thread it is >>>>>>>>>> either suspended or not. 
If it is suspended then it cannot >>>>>>>>>> release the monitor. If it is not suspended we detect that >>>>>>>>>> and redo the whole query at a safepoint. >>>>>>>>> >>>>>>>>> I think the owner thread might resume unfortunately after >>>>>>>>> suspending check. >>>>>>>> >>>>>>>> Yes you are right. I was thinking resuming also required a >>>>>>>> safepoint but it only requires the Threads_lock. So yes the >>>>>>>> code is wrong. >>>>>>> >>>>>>> Which code is wrong? >>>>>>> >>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() >>>>>>> caller >>>>>>> has started the process of gathering the information while not at a >>>>>>> safepoint. Thus the information returned by GetObjectMonitorUsage() >>>>>>> might be stale, but that's a bug in the agent code. >>>>>> >>>>>> The code tries to make sure that it either collects data about a >>>>>> monitor owned by a thread that is suspended, or else it collects >>>>>> that data at a safepoint. But the owning thread can be resumed >>>>>> just after the code determined it was suspended. The monitor can >>>>>> then be released and the information gathered not only stale but >>>>>> potentially completely wrong as it could now be owned by a >>>>>> different thread and will report that thread's entry count. >>>>> >>>>> If the agent is not using SuspendThread(), then as soon as >>>>> GetObjectMonitorUsage() returns to the caller the information >>>>> can be stale. In fact as soon as the implementation returns >>>>> from the safepoint that gathered the info, the target thread >>>>> could have moved on. >>>> >>>> That isn't the issue. That the info is stale is fine. But the >>>> expectation is that the information was actually an accurate >>>> snapshot of the state of the monitor at some point in time. The >>>> current code does not ensure that. >>> >>> Please explain. I clearly don't understand why you think the info >>> returned isn't "an accurate snapshot of the state of the monitor >>> at some point in time". 
>> >> Because it may not be a "snapshot" at all. There is no atomicity**. >> The reported owner thread may not own it any longer when the entry >> count is read, so straight away you may have the wrong entry count >> information. The set of threads trying to acquire the monitor, or >> wait on the monitor can change in unexpected ways. It would be >> possible for instance to report the same thread as being the owner, >> being blocked trying to enter the monitor, and being in the wait-set >> of the monitor - apparently all at the same time! >> >> ** even if the owner is suspended we don't have complete atomicity >> because threads can join the set of threads trying to enter the >> monitor (unless they are all suspended). > > Consider the case when the monitor's owner is _not_ suspended: > > ? - GetObjectMonitorUsage() uses a safepoint to gather the info about > ??? the object's monitor. Since we're at a safepoint, the info that > ??? we are gathering cannot change until we return from the safepoint. > ??? It is a snapshot and a valid one at that. > > Consider the case when the monitor's owner is suspended: > > ? - GetObjectMonitorUsage() will gather info about the object's > ??? monitor while _not_ at a safepoint. Assuming that no other > ??? thread is suspended, then entry_count can change because > ??? another thread can block on entry while we are gathering > ??? info. waiter_count and waiters can change if a thread was > ??? in a timed wait that has timed out and now that thread is > ??? blocked on re-entry. I don't think that notify_waiter_count > ??? and notify_waiters can change. > > ??? So in this case, the owner info and notify info is stable, > ??? but the entry_count and waiter info is not stable. > > Consider the case when the monitor is not owned: > > ? - GetObjectMonitorUsage() will start to gather info about the > ??? object's monitor while _not_ at a safepoint. If it finds a > ??? thread on the entry queue that is not suspended, then it will > ??? 
bail out and redo the info gather at a safepoint. I just > ??? noticed that it doesn't check for suspension for the threads > ??? on the waiters list so a timed Object.wait() call can cause > ??? some confusion here. > > ??? So in this case, the owner info is not stable if a thread > ??? comes out of a timed wait and reenters the monitor. This > ??? case is no different than if a "barger" thread comes in > ??? after the NULL owner field is observed and enters the > ??? monitor. We'll return that there is no owner, a list of > ??? suspended pending entry thread and a list of waiting > ??? threads. The reality is that the object's monitor is > ??? owned by the "barger" that completely bypassed the entry > ??? queue by virtue of seeing the NULL owner field at exactly > ??? the right time. > > So the owner field is only stable when we have an owner. If > that owner is not suspended, then the other fields are also > stable because we gathered the info at a safepoint. If the > owner is suspended, then the owner and notify info is stable, > but the entry_count and waiter info is not stable. > > If we have a NULL owner field, then the info is only stable > if you have a non-suspended thread on the entry list. Ouch! > That's deterministic, but not without some work. > > > Okay so only when we gather the info at a safepoint is all > of it a valid and stable snapshot. Unfortunately, we only > do that at a safepoint when the owner thread is not suspended > or if owner == NULL and one of the entry threads is not > suspended. If either of those conditions is not true, then > the different pieces of info is unstable to varying degrees. > > As for this claim: > >> It would be possible for instance to report the same thread >> as being the owner, being blocked trying to enter the monitor, >> and being in the wait-set of the monitor - apparently all at >> the same time! > > I can't figure out a way to make that scenario work. 
If the > thread is seen as the owner and is not suspended, then we > gather info at a safepoint. If it is suspended, then it can't > then be seen as on the entry queue or on the wait queue since > it is suspended. If it is seen on the entry queue and is not > suspended, then we gather info at a safepoint. If it is > suspended on the entry queue, then it can't be seen on the > wait queue. > > So the info instability of this API is bad, but it's not > quite that bad. :-) (That is a small mercy.) > > > Handshaking is not going to make this situation any better > for GetObjectMonitorUsage(). If the monitor is owned and we > handshake with the owner, the stability or instability of > the other fields remains the same as when SuspendThread is > used. Handshaking with all threads won't make the data as > stable as when at a safepoint because individual threads > can resume execution after doing their handshake so there > will still be field instability. > > > Short version: GetObjectMonitorUsage() should only gather > data at a safepoint. Yes, I've changed my mind. I agree with this. The advantages are: ?- the result is stable ?- the implementation can be simplified Performance impact is not very clear but should not be that big as suspending all the threads has some overhead too. I'm not sure if using handshakes can make performance better. Thanks, Serguei > Dan > >> >> David >> ----- >> >>> >>> >>>> >>>>> The only way to make sure you don't have stale information is >>>>> to use SuspendThread(), but it's not required. Perhaps the doc >>>>> should have more clear about the possibility of returning stale >>>>> info. That's a question for Robert F. >>>>> >>>>> >>>>>> GetObjectMonitorUsage says nothing about thread's being suspended >>>>>> so I can't see how this could be construed as an agent bug. 
>>>>> >>>>> In your scenario above, you mention that the target thread was >>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>> was suspended, and then the target thread was resumed after >>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>> GetObjectMonitorUsage() was able to gather the info. >>>>> >>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>>> and ResumeThread() are made by the agent and the agent should not >>>>> resume the target thread while also calling GetObjectMonitorUsage(). >>>>> The calls were allowed to be made out of order so agent bug. >>>> >>>> Perhaps. I was thinking more generally about an independent resume, >>>> but you're right that doesn't really make a lot of sense. But when >>>> the spec says nothing about suspension ... >>> >>> And it is intentional that suspension is not required. JVM/DI and >>> JVM/PI >>> used to require suspension for these kinds of get-the-info APIs. JVM/TI >>> intentionally was designed to not require suspension. >>> >>> As I've said before, we could add a note about the data being >>> potentially >>> stale unless SuspendThread is used. I think of it like stat(2). You can >>> fetch the file's info, but there's no guarantee that the info is >>> current >>> by the time you process what you got back. Is it too much motherhood to >>> state that the data might be stale? I could go either way... >>> >>> >>>> >>>>>> Using a handshake on the owner thread will allow this to be fixed >>>>>> in the future without forcing/using any safepoints. >>>>> >>>>> I have to think about that which is why I'm avoiding talking about >>>>> handshakes in this thread. >>>> >>>> Effectively the handshake can "suspend" the thread whilst the >>>> monitor is queried. In effect the operation would create a >>>> per-thread safepoint. >>> >>> I "know" that, but I still need time to think about it and probably >>> see the code to see if there are holes... 
>>> >>> >>>> Semantically it is no different to the code actually suspending the >>>> owner thread, but it can't actually do that because suspends/resume >>>> don't nest. >>> >>> Yeah... we used have a suspend count back when we tracked internal and >>> external suspends separately. That was a nightmare... >>> >>> Dan >>> >>> >>>> >>>> Cheers, >>>> David >>>> >>>>> Dan >>>>> >>>>> >>>>> >>>>>> >>>>>> Cheers, >>>>>> David >>>>>> >>>>>>> Dan >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread >>>>>>>>> state, it returns `true` when the thread is sleeping [3], or >>>>>>>>> when it performs in native [4]. >>>>>>>> >>>>>>>> Sure but if the thread is actually suspended it can't continue >>>>>>>> execution in the VM or in Java code. >>>>>>>> >>>>>>>>> >>>>>>>>>> This appears to be an optimisation for the assumed common >>>>>>>>>> case where threads are first suspended and then the monitors >>>>>>>>>> are queried. >>>>>>>>> >>>>>>>>> I agree with this, but I could find out it from JVMTI spec - >>>>>>>>> it just says "Get information about the object's monitor." >>>>>>>> >>>>>>>> Yes it was just an implementation optimisation, nothing to do >>>>>>>> with the spec. >>>>>>>> >>>>>>>>> GetObjectMonitorUsage() might return incorrect information in >>>>>>>>> some case. >>>>>>>>> >>>>>>>>> It starts with finding owner thread, but the owner might be >>>>>>>>> just before wakeup. >>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is >>>>>>>>> called at safepoint in any case. >>>>>>>> >>>>>>>> Except we're moving away from safepoints to using Handshakes, >>>>>>>> so this particular operation will require that the apparent >>>>>>>> owner is Handshake-safe (by entering a handshake with it) >>>>>>>> before querying the monitor. This would still be preferable I >>>>>>>> think to always using a safepoint for the entire operation. 
>>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Yasumasa >>>>>>>>> >>>>>>>>> >>>>>>>>> [3] >>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>> >>>>>>>>> [4] >>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> However there is still a potential bug as the thread reported >>>>>>>>>> as the owner may not be suspended at the time we first see >>>>>>>>>> it, and may release the monitor, but then it may get >>>>>>>>>> suspended before we call: >>>>>>>>>> >>>>>>>>>> ??owning_thread = >>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>> >>>>>>>>>> and so we think it is still the monitor owner and proceed to >>>>>>>>>> query the monitor information in a racy way. This can't >>>>>>>>>> happen when suspension itself requires a safepoint as the >>>>>>>>>> current thread won't go to that safepoint during this code. >>>>>>>>>> However, if suspension is implemented via a direct handshake >>>>>>>>>> with the target thread then we have a problem. 
>>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Yasumasa >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>> >>>>>>>>>>> [2] >>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>> >>>>>>> >>>>> >>> > From suenaga at oss.nttdata.com Wed Jun 17 00:23:44 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Wed, 17 Jun 2020 09:23:44 +0900 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> Message-ID: <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: > Hi Dan, David and Yasumasa, > > > On 6/16/20 07:39, Daniel D. Daugherty wrote: >> On 6/15/20 9:28 PM, David Holmes wrote: >>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>> Hi Dan, >>>>>>> >>>>>>> On 15/06/2020 11:38 pm, Daniel D. 
Daugherty wrote: >>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>> Hi David, >>>>>>>>>> >>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>> >>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>> Hi all, >>>>>>>>>>>> >>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint. >>>>>>>>>>> >>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is not suspended: >>>>>>>>>>> >>>>>>>>>>> jvmtiError >>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>>>>>>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr); >>>>>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>> ???? // Some of the critical threads were not suspended. go to a safepoint and try again >>>>>>>>>>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr); >>>>>>>>>>> ???? VMThread::execute(&op); >>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>> ?? } >>>>>>>>>>> ?? return err; >>>>>>>>>>> } /* end GetObject */ >>>>>>>>>> >>>>>>>>>> I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage(). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently. >>>>>>>>>>>> If owner thread is not suspended, the owner might be changed to others in subsequent code. >>>>>>>>>>>> >>>>>>>>>>>> For example, the owner might release the monitor before [2]. >>>>>>>>>>> >>>>>>>>>>> The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint. 
>>>>>>>>>> >>>>>>>>>> I think the owner thread might resume unfortunately after suspending check. >>>>>>>>> >>>>>>>>> Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong. >>>>>>>> >>>>>>>> Which code is wrong? >>>>>>>> >>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller >>>>>>>> has started the process of gathering the information while not at a >>>>>>>> safepoint. Thus the information returned by GetObjectMonitorUsage() >>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>> >>>>>>> The code tries to make sure that it either collects data about a monitor owned by a thread that is suspended, or else it collects that data at a safepoint. But the owning thread can be resumed just after the code determined it was suspended. The monitor can then be released and the information gathered not only stale but potentially completely wrong as it could now be owned by a different thread and will report that thread's entry count. >>>>>> >>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>> GetObjectMonitorUsage() returns to the caller the information >>>>>> can be stale. In fact as soon as the implementation returns >>>>>> from the safepoint that gathered the info, the target thread >>>>>> could have moved on. >>>>> >>>>> That isn't the issue. That the info is stale is fine. But the expectation is that the information was actually an accurate snapshot of the state of the monitor at some point in time. The current code does not ensure that. >>>> >>>> Please explain. I clearly don't understand why you think the info >>>> returned isn't "an accurate snapshot of the state of the monitor >>>> at some point in time". >>> >>> Because it may not be a "snapshot" at all. There is no atomicity**. The reported owner thread may not own it any longer when the entry count is read, so straight away you may have the wrong entry count information. 
The set of threads trying to acquire the monitor, or wait on the monitor can change in unexpected ways. It would be possible for instance to report the same thread as being the owner, being blocked trying to enter the monitor, and being in the wait-set of the monitor - apparently all at the same time! >>> >>> ** even if the owner is suspended we don't have complete atomicity because threads can join the set of threads trying to enter the monitor (unless they are all suspended). >> >> Consider the case when the monitor's owner is _not_ suspended: >> >> ? - GetObjectMonitorUsage() uses a safepoint to gather the info about >> ??? the object's monitor. Since we're at a safepoint, the info that >> ??? we are gathering cannot change until we return from the safepoint. >> ??? It is a snapshot and a valid one at that. >> >> Consider the case when the monitor's owner is suspended: >> >> ? - GetObjectMonitorUsage() will gather info about the object's >> ??? monitor while _not_ at a safepoint. Assuming that no other >> ??? thread is suspended, then entry_count can change because >> ??? another thread can block on entry while we are gathering >> ??? info. waiter_count and waiters can change if a thread was >> ??? in a timed wait that has timed out and now that thread is >> ??? blocked on re-entry. I don't think that notify_waiter_count >> ??? and notify_waiters can change. >> >> ??? So in this case, the owner info and notify info is stable, >> ??? but the entry_count and waiter info is not stable. >> >> Consider the case when the monitor is not owned: >> >> ? - GetObjectMonitorUsage() will start to gather info about the >> ??? object's monitor while _not_ at a safepoint. If it finds a >> ??? thread on the entry queue that is not suspended, then it will >> ??? bail out and redo the info gather at a safepoint. I just >> ??? noticed that it doesn't check for suspension for the threads >> ??? on the waiters list so a timed Object.wait() call can cause >> ??? some confusion here. 
>> >> ??? So in this case, the owner info is not stable if a thread >> ??? comes out of a timed wait and reenters the monitor. This >> ??? case is no different than if a "barger" thread comes in >> ??? after the NULL owner field is observed and enters the >> ??? monitor. We'll return that there is no owner, a list of >> ??? suspended pending entry thread and a list of waiting >> ??? threads. The reality is that the object's monitor is >> ??? owned by the "barger" that completely bypassed the entry >> ??? queue by virtue of seeing the NULL owner field at exactly >> ??? the right time. >> >> So the owner field is only stable when we have an owner. If >> that owner is not suspended, then the other fields are also >> stable because we gathered the info at a safepoint. If the >> owner is suspended, then the owner and notify info is stable, >> but the entry_count and waiter info is not stable. >> >> If we have a NULL owner field, then the info is only stable >> if you have a non-suspended thread on the entry list. Ouch! >> That's deterministic, but not without some work. >> >> >> Okay so only when we gather the info at a safepoint is all >> of it a valid and stable snapshot. Unfortunately, we only >> do that at a safepoint when the owner thread is not suspended >> or if owner == NULL and one of the entry threads is not >> suspended. If either of those conditions is not true, then >> the different pieces of info is unstable to varying degrees. >> >> As for this claim: >> >>> It would be possible for instance to report the same thread >>> as being the owner, being blocked trying to enter the monitor, >>> and being in the wait-set of the monitor - apparently all at >>> the same time! >> >> I can't figure out a way to make that scenario work. If the >> thread is seen as the owner and is not suspended, then we >> gather info at a safepoint. If it is suspended, then it can't >> then be seen as on the entry queue or on the wait queue since >> it is suspended. 
If it is seen on the entry queue and is not >> suspended, then we gather info at a safepoint. If it is >> suspended on the entry queue, then it can't be seen on the >> wait queue. >> >> So the info instability of this API is bad, but it's not >> quite that bad. :-) (That is a small mercy.) >> >> >> Handshaking is not going to make this situation any better >> for GetObjectMonitorUsage(). If the monitor is owned and we >> handshake with the owner, the stability or instability of >> the other fields remains the same as when SuspendThread is >> used. Handshaking with all threads won't make the data as >> stable as when at a safepoint because individual threads >> can resume execution after doing their handshake so there >> will still be field instability. >> >> >> Short version: GetObjectMonitorUsage() should only gather >> data at a safepoint. Yes, I've changed my mind. > > I agree with this. > The advantages are: > ?- the result is stable > ?- the implementation can be simplified > > Performance impact is not very clear but should not be that > big as suspending all the threads has some overhead too. > I'm not sure if using handshakes can make performance better. Ok, may I file it to JBS and fix it? Yasumasa > Thanks, > Serguei > >> Dan >> >>> >>> David >>> ----- >>> >>>> >>>> >>>>> >>>>>> The only way to make sure you don't have stale information is >>>>>> to use SuspendThread(), but it's not required. Perhaps the doc >>>>>> should have more clear about the possibility of returning stale >>>>>> info. That's a question for Robert F. >>>>>> >>>>>> >>>>>>> GetObjectMonitorUsage says nothing about thread's being suspended so I can't see how this could be construed as an agent bug. 
>>>>>> >>>>>> In your scenario above, you mention that the target thread was >>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>> was suspended, and then the target thread was resumed after >>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>> >>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>>>> and ResumeThread() are made by the agent and the agent should not >>>>>> resume the target thread while also calling GetObjectMonitorUsage(). >>>>>> The calls were allowed to be made out of order so agent bug. >>>>> >>>>> Perhaps. I was thinking more generally about an independent resume, but you're right that doesn't really make a lot of sense. But when the spec says nothing about suspension ... >>>> >>>> And it is intentional that suspension is not required. JVM/DI and JVM/PI >>>> used to require suspension for these kinds of get-the-info APIs. JVM/TI >>>> intentionally was designed to not require suspension. >>>> >>>> As I've said before, we could add a note about the data being potentially >>>> stale unless SuspendThread is used. I think of it like stat(2). You can >>>> fetch the file's info, but there's no guarantee that the info is current >>>> by the time you process what you got back. Is it too much motherhood to >>>> state that the data might be stale? I could go either way... >>>> >>>> >>>>> >>>>>>> Using a handshake on the owner thread will allow this to be fixed in the future without forcing/using any safepoints. >>>>>> >>>>>> I have to think about that which is why I'm avoiding talking about >>>>>> handshakes in this thread. >>>>> >>>>> Effectively the handshake can "suspend" the thread whilst the monitor is queried. In effect the operation would create a per-thread safepoint. >>>> >>>> I "know" that, but I still need time to think about it and probably >>>> see the code to see if there are holes... 
>>>> >>>> >>>>> Semantically it is no different to the code actually suspending the owner thread, but it can't actually do that because suspends/resume don't nest. >>>> >>>> Yeah... we used have a suspend count back when we tracked internal and >>>> external suspends separately. That was a nightmare... >>>> >>>> Dan >>>> >>>> >>>>> >>>>> Cheers, >>>>> David >>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> >>>>>>>> Dan >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread state, it returns `true` when the thread is sleeping [3], or when it performs in native [4]. >>>>>>>>> >>>>>>>>> Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code. >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried. >>>>>>>>>> >>>>>>>>>> I agree with this, but I could find out it from JVMTI spec - it just says "Get information about the object's monitor." >>>>>>>>> >>>>>>>>> Yes it was just an implementation optimisation, nothing to do with the spec. >>>>>>>>> >>>>>>>>>> GetObjectMonitorUsage() might return incorrect information in some case. >>>>>>>>>> >>>>>>>>>> It starts with finding owner thread, but the owner might be just before wakeup. >>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is called at safepoint in any case. >>>>>>>>> >>>>>>>>> Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation. 
>>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Yasumasa >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>> [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: >>>>>>>>>>> >>>>>>>>>>> ??owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>> >>>>>>>>>>> and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem. 
>>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Yasumasa >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>> >>>>>> >>>> >> > From david.holmes at oracle.com Wed Jun 17 01:20:58 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 17 Jun 2020 11:20:58 +1000 Subject: RFR(S) 8246019 PerfClassTraceTime slows down VM start-up In-Reply-To: <3ed0b32d-469e-6f89-f4b3-b78d5bcce700@oracle.com> References: <3ed0b32d-469e-6f89-f4b3-b78d5bcce700@oracle.com> Message-ID: Hi Ioi, On 17/06/2020 6:14 am, Ioi Lam wrote: > https://bugs.openjdk.java.net/browse/JDK-8246019 > http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/ > > > PerfClassTraceTime is (a rarely used feature) for measuring the time > spent during class linking and initialization. "A special command jcmd PerfCounter.print prints all performance counters in the process." How do you know this is a "rarely used feature"? I find it hard to evaluate whether this short-circuiting of the time tracing is reasonable or not. Obviously any monitoring mechanism should impose minimal overhead compared to what is being measured, and these timers fall short in that regard. But if these stats become meaningless then they may as well be removed. I think the serviceability folk (cc'd) need to evaluate this in the context of the M&M tools. > However, it's quite > expensive and it needs to start and stop a bunch of timers. With CDS, > it's quite often for the overhead of the timer itself to be much more > than the time it's trying to measure, giving unreliable measurement. 
>
> In this patch, when it's clear that the init and linking will be very
> quick, I disable the timer and count only the number of invocations.
> This shows a small improvement in start-up

I'm curious whether you tried forcing EagerInitialization to be true to
see how that improves the baseline. I've always noticed eager_init in the
code, but hadn't realized it is disabled by default.

Cheers,
David
-----

> Results of " perf stat -r 100 bin/java -Xshare:on
> -XX:SharedArchiveFile=jdk2.jsa -Xint -version "
>
> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) -
> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) --
> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) -
> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) -----
> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ----
> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) ---
> 59608782 59340173 (-268609)   ----   41.198  41.508 (  0.310) +
> 59614612 59325177 (-289435)   -----  41.397  41.738 (  0.341) ++
> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ----
> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) ---
> ================================================
> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) --
> instruction delta =      -284608    -0.4774%
> time        delta =       -0.434 ms -1.0435%
>
> The number of PerfClassTraceTime's used is reduced from 564 to 116 (so
> we have an overhead of about 715 instructions per use, yikes!).
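
A minimal sketch of the short-circuit described in the quoted patch
summary above: always count the invocation, but read the clock only when
the caller expects the work to be long enough to be worth measuring.
This is not the HotSpot PerfClassTraceTime code. TraceCounters,
ScopedTraceTimer and the measure_time flag are hypothetical names chosen
for illustration, and std::chrono stands in for the VM's own timers.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical counters; stand-ins for the VM's perf counters, not HotSpot code.
struct TraceCounters {
    std::uint64_t invocations = 0;        // every traced event, timed or not
    std::uint64_t timed = 0;              // events for which the clock was read
    std::chrono::nanoseconds elapsed{0};  // accumulated time of the timed events
};

// RAII helper: always counts the invocation, reads the clock only on request.
class ScopedTraceTimer {
public:
    ScopedTraceTimer(TraceCounters& counters, bool measure_time)
        : counters_(counters), measure_time_(measure_time) {
        ++counters_.invocations;
        if (measure_time_) {
            start_ = std::chrono::steady_clock::now();
        }
    }

    ~ScopedTraceTimer() {
        if (measure_time_) {
            auto delta = std::chrono::steady_clock::now() - start_;
            // steady_clock is monotonic, so the interval is never negative.
            assert(delta >= std::chrono::steady_clock::duration::zero());
            counters_.elapsed +=
                std::chrono::duration_cast<std::chrono::nanoseconds>(delta);
            ++counters_.timed;
        }
    }

    ScopedTraceTimer(const ScopedTraceTimer&) = delete;
    ScopedTraceTimer& operator=(const ScopedTraceTimer&) = delete;

private:
    TraceCounters& counters_;
    bool measure_time_;
    std::chrono::steady_clock::time_point start_{};
};
```

A caller would pass measure_time = false when init and linking are
expected to be very quick, so that path costs one increment instead of
two clock reads. The concern raised in the reply still applies: the
accumulated time then covers only the events that were actually timed.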
From david.holmes at oracle.com Wed Jun 17 01:34:01 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 17 Jun 2020 11:34:01 +1000 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> Message-ID: <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> > Ok, may I file it to JBS and fix it? Go for it! :) Cheers, David On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: > On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >> Hi Dan, David and Yasumasa, >> >> >> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>> On 6/15/20 9:28 PM, David Holmes wrote: >>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>> Hi Dan, >>>>>>>> >>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>> Hi David, >>>>>>>>>>> >>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>> >>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>> Hi all, >>>>>>>>>>>>> >>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does not >>>>>>>>>>>>> perform at safepoint. 
>>>>>>>>>>>> >>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is >>>>>>>>>>>> not suspended: >>>>>>>>>>>> >>>>>>>>>>>> jvmtiError >>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, >>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>    JavaThread* calling_thread = JavaThread::current(); >>>>>>>>>>>>    jvmtiError err = get_object_monitor_usage(calling_thread, >>>>>>>>>>>> object, info_ptr); >>>>>>>>>>>>    if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>      // Some of the critical threads were not suspended. go >>>>>>>>>>>> to a safepoint and try again >>>>>>>>>>>>      VM_GetObjectMonitorUsage op(this, calling_thread, >>>>>>>>>>>> object, info_ptr); >>>>>>>>>>>>      VMThread::execute(&op); >>>>>>>>>>>>      err = op.result(); >>>>>>>>>>>>    } >>>>>>>>>>>>    return err; >>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>> >>>>>>>>>>> I saw this code, so I guess there are some cases when >>>>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from >>>>>>>>>>> get_object_monitor_usage(). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> Monitor owner would be acquired from monitor object at >>>>>>>>>>>>> first [1], but it would perform concurrently. >>>>>>>>>>>>> If owner thread is not suspended, the owner might be >>>>>>>>>>>>> changed to others in subsequent code. >>>>>>>>>>>>> >>>>>>>>>>>>> For example, the owner might release the monitor before [2]. >>>>>>>>>>>> >>>>>>>>>>>> The expectation is that when we find an owner thread it is >>>>>>>>>>>> either suspended or not. If it is suspended then it cannot >>>>>>>>>>>> release the monitor. If it is not suspended we detect that >>>>>>>>>>>> and redo the whole query at a safepoint. >>>>>>>>>>> >>>>>>>>>>> I think the owner thread might resume unfortunately after >>>>>>>>>>> suspending check. >>>>>>>>>> >>>>>>>>>> Yes you are right. I was thinking resuming also required a >>>>>>>>>> safepoint but it only requires the Threads_lock. So yes the >>>>>>>>>> code is wrong. 
>>>>>>>>> >>>>>>>>> Which code is wrong? >>>>>>>>> >>>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() >>>>>>>>> caller >>>>>>>>> has started the process of gathering the information while not >>>>>>>>> at a >>>>>>>>> safepoint. Thus the information returned by >>>>>>>>> GetObjectMonitorUsage() >>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>> >>>>>>>> The code tries to make sure that it either collects data about a >>>>>>>> monitor owned by a thread that is suspended, or else it collects >>>>>>>> that data at a safepoint. But the owning thread can be resumed >>>>>>>> just after the code determined it was suspended. The monitor can >>>>>>>> then be released and the information gathered not only stale but >>>>>>>> potentially completely wrong as it could now be owned by a >>>>>>>> different thread and will report that thread's entry count. >>>>>>> >>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>> GetObjectMonitorUsage() returns to the caller the information >>>>>>> can be stale. In fact as soon as the implementation returns >>>>>>> from the safepoint that gathered the info, the target thread >>>>>>> could have moved on. >>>>>> >>>>>> That isn't the issue. That the info is stale is fine. But the >>>>>> expectation is that the information was actually an accurate >>>>>> snapshot of the state of the monitor at some point in time. The >>>>>> current code does not ensure that. >>>>> >>>>> Please explain. I clearly don't understand why you think the info >>>>> returned isn't "an accurate snapshot of the state of the monitor >>>>> at some point in time". >>>> >>>> Because it may not be a "snapshot" at all. There is no atomicity**. >>>> The reported owner thread may not own it any longer when the entry >>>> count is read, so straight away you may have the wrong entry count >>>> information. The set of threads trying to acquire the monitor, or >>>> wait on the monitor can change in unexpected ways. 
It would be >>>> possible for instance to report the same thread as being the owner, >>>> being blocked trying to enter the monitor, and being in the wait-set >>>> of the monitor - apparently all at the same time! >>>> >>>> ** even if the owner is suspended we don't have complete atomicity >>>> because threads can join the set of threads trying to enter the >>>> monitor (unless they are all suspended). >>> >>> Consider the case when the monitor's owner is _not_ suspended: >>> >>>   - GetObjectMonitorUsage() uses a safepoint to gather the info about >>>     the object's monitor. Since we're at a safepoint, the info that >>>     we are gathering cannot change until we return from the safepoint. >>>     It is a snapshot and a valid one at that. >>> >>> Consider the case when the monitor's owner is suspended: >>> >>>   - GetObjectMonitorUsage() will gather info about the object's >>>     monitor while _not_ at a safepoint. Assuming that no other >>>     thread is suspended, then entry_count can change because >>>     another thread can block on entry while we are gathering >>>     info. waiter_count and waiters can change if a thread was >>>     in a timed wait that has timed out and now that thread is >>>     blocked on re-entry. I don't think that notify_waiter_count >>>     and notify_waiters can change. >>> >>>     So in this case, the owner info and notify info is stable, >>>     but the entry_count and waiter info is not stable. >>> >>> Consider the case when the monitor is not owned: >>> >>>   - GetObjectMonitorUsage() will start to gather info about the >>>     object's monitor while _not_ at a safepoint. If it finds a >>>     thread on the entry queue that is not suspended, then it will >>>     bail out and redo the info gather at a safepoint. I just >>>     noticed that it doesn't check for suspension for the threads >>>     on the waiters list so a timed Object.wait() call can cause >>>     some confusion here. >>> >>>     
So in this case, the owner info is not stable if a thread >>>     comes out of a timed wait and reenters the monitor. This >>>     case is no different than if a "barger" thread comes in >>>     after the NULL owner field is observed and enters the >>>     monitor. We'll return that there is no owner, a list of >>>     suspended pending entry thread and a list of waiting >>>     threads. The reality is that the object's monitor is >>>     owned by the "barger" that completely bypassed the entry >>>     queue by virtue of seeing the NULL owner field at exactly >>>     the right time. >>> >>> So the owner field is only stable when we have an owner. If >>> that owner is not suspended, then the other fields are also >>> stable because we gathered the info at a safepoint. If the >>> owner is suspended, then the owner and notify info is stable, >>> but the entry_count and waiter info is not stable. >>> >>> If we have a NULL owner field, then the info is only stable >>> if you have a non-suspended thread on the entry list. Ouch! >>> That's deterministic, but not without some work. >>> >>> >>> Okay so only when we gather the info at a safepoint is all >>> of it a valid and stable snapshot. Unfortunately, we only >>> do that at a safepoint when the owner thread is not suspended >>> or if owner == NULL and one of the entry threads is not >>> suspended. If either of those conditions is not true, then >>> the different pieces of info is unstable to varying degrees. >>> >>> As for this claim: >>> >>>> It would be possible for instance to report the same thread >>>> as being the owner, being blocked trying to enter the monitor, >>>> and being in the wait-set of the monitor - apparently all at >>>> the same time! >>> >>> I can't figure out a way to make that scenario work. If the >>> thread is seen as the owner and is not suspended, then we >>> gather info at a safepoint. 
If it is suspended, then it can't >>> then be seen as on the entry queue or on the wait queue since >>> it is suspended. If it is seen on the entry queue and is not >>> suspended, then we gather info at a safepoint. If it is >>> suspended on the entry queue, then it can't be seen on the >>> wait queue. >>> >>> So the info instability of this API is bad, but it's not >>> quite that bad. :-) (That is a small mercy.) >>> >>> >>> Handshaking is not going to make this situation any better >>> for GetObjectMonitorUsage(). If the monitor is owned and we >>> handshake with the owner, the stability or instability of >>> the other fields remains the same as when SuspendThread is >>> used. Handshaking with all threads won't make the data as >>> stable as when at a safepoint because individual threads >>> can resume execution after doing their handshake so there >>> will still be field instability. >>> >>> >>> Short version: GetObjectMonitorUsage() should only gather >>> data at a safepoint. Yes, I've changed my mind. >> >> I agree with this. >> The advantages are: >> ??- the result is stable >> ??- the implementation can be simplified >> >> Performance impact is not very clear but should not be that >> big as suspending all the threads has some overhead too. >> I'm not sure if using handshakes can make performance better. > > Ok, may I file it to JBS and fix it? > > Yasumasa > > >> Thanks, >> Serguei >> >>> Dan >>> >>>> >>>> David >>>> ----- >>>> >>>>> >>>>> >>>>>> >>>>>>> The only way to make sure you don't have stale information is >>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc >>>>>>> should have more clear about the possibility of returning stale >>>>>>> info. That's a question for Robert F. >>>>>>> >>>>>>> >>>>>>>> GetObjectMonitorUsage says nothing about thread's being >>>>>>>> suspended so I can't see how this could be construed as an agent >>>>>>>> bug. 
>>>>>>> >>>>>>> In your scenario above, you mention that the target thread was >>>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>>> was suspended, and then the target thread was resumed after >>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>> >>>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>>>>> and ResumeThread() are made by the agent and the agent should not >>>>>>> resume the target thread while also calling GetObjectMonitorUsage(). >>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>> >>>>>> Perhaps. I was thinking more generally about an independent >>>>>> resume, but you're right that doesn't really make a lot of sense. >>>>>> But when the spec says nothing about suspension ... >>>>> >>>>> And it is intentional that suspension is not required. JVM/DI and >>>>> JVM/PI >>>>> used to require suspension for these kinds of get-the-info APIs. >>>>> JVM/TI >>>>> intentionally was designed to not require suspension. >>>>> >>>>> As I've said before, we could add a note about the data being >>>>> potentially >>>>> stale unless SuspendThread is used. I think of it like stat(2). You >>>>> can >>>>> fetch the file's info, but there's no guarantee that the info is >>>>> current >>>>> by the time you process what you got back. Is it too much >>>>> motherhood to >>>>> state that the data might be stale? I could go either way... >>>>> >>>>> >>>>>> >>>>>>>> Using a handshake on the owner thread will allow this to be >>>>>>>> fixed in the future without forcing/using any safepoints. >>>>>>> >>>>>>> I have to think about that which is why I'm avoiding talking about >>>>>>> handshakes in this thread. >>>>>> >>>>>> Effectively the handshake can "suspend" the thread whilst the >>>>>> monitor is queried. In effect the operation would create a >>>>>> per-thread safepoint. 
>>>>> >>>>> I "know" that, but I still need time to think about it and probably >>>>> see the code to see if there are holes... >>>>> >>>>> >>>>>> Semantically it is no different to the code actually suspending >>>>>> the owner thread, but it can't actually do that because >>>>>> suspends/resume don't nest. >>>>> >>>>> Yeah... we used have a suspend count back when we tracked internal and >>>>> external suspends separately. That was a nightmare... >>>>> >>>>> Dan >>>>> >>>>> >>>>>> >>>>>> Cheers, >>>>>> David >>>>>> >>>>>>> Dan >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> >>>>>>>>> Dan >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check >>>>>>>>>>> thread state, it returns `true` when the thread is sleeping >>>>>>>>>>> [3], or when it performs in native [4]. >>>>>>>>>> >>>>>>>>>> Sure but if the thread is actually suspended it can't continue >>>>>>>>>> execution in the VM or in Java code. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> This appears to be an optimisation for the assumed common >>>>>>>>>>>> case where threads are first suspended and then the monitors >>>>>>>>>>>> are queried. >>>>>>>>>>> >>>>>>>>>>> I agree with this, but I could find out it from JVMTI spec - >>>>>>>>>>> it just says "Get information about the object's monitor." >>>>>>>>>> >>>>>>>>>> Yes it was just an implementation optimisation, nothing to do >>>>>>>>>> with the spec. >>>>>>>>>> >>>>>>>>>>> GetObjectMonitorUsage() might return incorrect information in >>>>>>>>>>> some case. >>>>>>>>>>> >>>>>>>>>>> It starts with finding owner thread, but the owner might be >>>>>>>>>>> just before wakeup. >>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is >>>>>>>>>>> called at safepoint in any case. 
>>>>>>>>>> >>>>>>>>>> Except we're moving away from safepoints to using Handshakes, >>>>>>>>>> so this particular operation will require that the apparent >>>>>>>>>> owner is Handshake-safe (by entering a handshake with it) >>>>>>>>>> before querying the monitor. This would still be preferable I >>>>>>>>>> think to always using a safepoint for the entire operation. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Yasumasa >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [3] >>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>> >>>>>>>>>>> [4] >>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> However there is still a potential bug as the thread >>>>>>>>>>>> reported as the owner may not be suspended at the time we >>>>>>>>>>>> first see it, and may release the monitor, but then it may >>>>>>>>>>>> get suspended before we call: >>>>>>>>>>>> >>>>>>>>>>>>   owning_thread = >>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>>> >>>>>>>>>>>> and so we think it is still the monitor owner and proceed to >>>>>>>>>>>> query the monitor information in a racy way. This can't >>>>>>>>>>>> happen when suspension itself requires a safepoint as the >>>>>>>>>>>> current thread won't go to that safepoint during this code. >>>>>>>>>>>> However, if suspension is implemented via a direct handshake >>>>>>>>>>>> with the target thread then we have a problem. 
>>>>>>>>>>>> >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> [1] >>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>> >>>>>>>>>>>>> [2] >>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> >> From ioi.lam at oracle.com Wed Jun 17 03:19:35 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 16 Jun 2020 20:19:35 -0700 Subject: RFR(S) 8246019 PerfClassTraceTime slows down VM start-up In-Reply-To: References: <3ed0b32d-469e-6f89-f4b3-b78d5bcce700@oracle.com> Message-ID: <20411ebb-5b06-83da-07c8-b751252efecd@oracle.com> On 6/16/20 6:20 PM, David Holmes wrote: > Hi Ioi, > > On 17/06/2020 6:14 am, Ioi Lam wrote: >> https://bugs.openjdk.java.net/browse/JDK-8246019 >> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/ >> >> >> PerfClassTraceTime is (a rarely used feature) for measuring the time >> spent during class linking and initialization. > > "A special command jcmd PerfCounter.print > prints all performance counters in the process." > > How do you know this is a "rarely used feature"? Hi David, Sure, the counter will be dumped, but by "rarely used" -- I mean no one will find this particular counter useful, and no one will be actively looking at it. I changed two parts of the code -- class init and class linking. For class initialization, the counter may be useful for people who want to know how much time is spent in their <clinit> functions, and my patch doesn't change that. It only avoids using the counter when a class has no <clinit>, i.e., we know that the counter counts nothing (except for a logging statement). ===== For class linking, no user code is executed, so it only measures VM code. 
If it's useful for anyone, that would be VM engineers like me who are trying to optimize the speed of class loading. However, due to the overhead of the counter vs what it's trying to measure, the results are pretty meaningless. Note that I've not disabled the counter altogether. Instead, I disable it only when linking a CDS shared class, and we know that very little is happening for this class (e.g., no verification). I think the class linking timer might have been useful 15 years ago when it was introduced, or it might be useful today when CDS is disabled. But with CDS enabled, we are paying a constant price that seems to benefit no one. I think we should short-circuit it when it seems appropriate. If this indeed causes problems for our users, it's easy to re-enable it. That's better than just keeping this forever just because we're afraid to touch anything. > > I find it hard to evaluate whether this short-circuiting of the time > tracing is reasonable or not. Obviously any monitoring mechanism > should impose minimal overhead compared to what is being measured, and > these timers fall short in that regard. But if these stats become > meaningless then they may as well be removed. > > I think the serviceability folk (cc'd) need to evaluate this in the > context of the M&M tools. > >> However, it's quite expensive and it needs to start and stop a bunch >> of timers. With CDS, it's quite often for the overhead of the timer >> itself to be much more than the time it's trying to measure, giving >> unreliable measurement. >> >> In this patch, when it's clear that the init and linking will be very >> quick, I disable the timer and count only the number of invocations. >> This shows a small improvement in start-up > > I'm curious if you tried to forcing EagerInitialization to be true to > see how that improves the baseline. I've always noticed eager_init in > the code, but hadn't realized it is disabled by default. 
> I think it cannot be done by default, as it will violate the JLS. A class can be initialized only when it's touched by bytecodes. It can also backfire as we may load many classes without initializing them. E.g., during bytecode verification, we load many classes and just check that one is a supertype of another. Thanks - Ioi > Cheers, > David > ----- > >> Results of " perf stat -r 100 bin/java -Xshare:on >> -XX:SharedArchiveFile=jdk2.jsa -Xint -version " >> >> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) - >> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) -- >> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) - >> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) ----- >> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ---- >> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) --- >> 59608782 59340173 (-268609)   ----   41.198  41.508 (  0.310) + >> 59614612 59325177 (-289435)   -----  41.397  41.738 (  0.341) ++ >> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ---- >> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) --- >> ================================================ >> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) -- >> instruction delta =      -284608    -0.4774% >> time        delta =       -0.434 ms -1.0435% >> >> The number of PerfClassTraceTime's used is reduced from 564 to 116 >> (so we have an overhead of about 715 instructions per use, yikes!). From yumin.qi at oracle.com Wed Jun 17 03:54:51 2020 From: yumin.qi at oracle.com (Yumin Qi) Date: Tue, 16 Jun 2020 20:54:51 -0700 Subject: RFR(S) 8246019 PerfClassTraceTime slows down VM start-up In-Reply-To: <20411ebb-5b06-83da-07c8-b751252efecd@oracle.com> References: <3ed0b32d-469e-6f89-f4b3-b78d5bcce700@oracle.com> <20411ebb-5b06-83da-07c8-b751252efecd@oracle.com> Message-ID:   product(bool, UsePerfData, true, \ 
"Flag to disable jvmstat instrumentation for performance testing "\ "and problem isolation purposes") \ The flag's default value is set to true. Should we change that? If the flag were false by default, performance would benefit. Users who want to collect performance data would then have to turn it on explicitly. Thanks Yumin On 6/16/20 8:19 PM, Ioi Lam wrote: > > > On 6/16/20 6:20 PM, David Holmes wrote: >> Hi Ioi, >> >> On 17/06/2020 6:14 am, Ioi Lam wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8246019 >>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/ >>> >>> >>> PerfClassTraceTime is (a rarely used feature) for measuring the time >>> spent during class linking and initialization. >> >> "A special command jcmd PerfCounter.print >> prints all performance counters in the process." >> >> How do you know this is a "rarely used feature"? > Hi David, > > Sure, the counter will be dumped, but by "rarely used" -- I mean no > one will find this particular counter useful, and no one will be > actively looking at it. > > I changed two parts of the code -- class init and class linking. > > For class initialization, the counter may be useful for people who > want to know how much time is spent in their <clinit> functions, and > my patch doesn't change that. It only avoids using the counter when a > class has no <clinit>, i.e., we know that the counter counts nothing > (except for a logging statement). > > ===== > > For class linking, no user code is executed, so it only measures VM > code. If it's useful for anyone, that would be VM engineers like me > who are trying to optimize the speed of class loading. However, due to > the overhead of the counter vs what it's trying to measure, the > results are pretty meaningless. > > Note that I've not disabled the counter altogether. 
Instead, I disable > it only when linking a CDS shared class, and we know that very little > is happening for this class (e.g., no verification). > > I think the class linking timer might have been useful 15 years ago > when it was introduced, or it might be useful today when CDS is > disabled. But with CDS enabled, we are paying a constant price that > seems to benefit no one. > > I think we should short-circuit it when it seems appropriate. If this > indeed causes problems for our users, it's easy to re-enable it. > That's better than just keeping this forever just because we're afraid > to touch anything. > >> >> I find it hard to evaluate whether this short-circuiting of the time >> tracing is reasonable or not. Obviously any monitoring mechanism >> should impose minimal overhead compared to what is being measured, >> and these timers fall short in that regard. But if these stats become >> meaningless then they may as well be removed. >> >> I think the serviceability folk (cc'd) need to evaluate this in the >> context of the M&M tools. >> >>> However, it's quite expensive and it needs to start and stop a bunch >>> of timers. With CDS, it's quite often for the overhead of the timer >>> itself to be much more than the time it's trying to measure, giving >>> unreliable measurement. >>> >>> In this patch, when it's clear that the init and linking will be >>> very quick, I disable the timer and count only the number of >>> invocations. This shows a small improvement in start-up >> >> I'm curious if you tried to forcing EagerInitialization to be true to >> see how that improves the baseline. I've always noticed eager_init in >> the code, but hadn't realized it is disabled by default. >> > > I think it cannot be done by default, as it will violate the JLS. A > class can be initialized only when it's touched by bytecodes. > > It can also backfire as we may load many classes without initializing > them. 
E.g., during bytecode verification, we load many classes and > just check that one is a supertype of another. > > Thanks > - Ioi > >> Cheers, >> David >> ----- >> >>> Results of " perf stat -r 100 bin/java -Xshare:on >>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version " >>> >>> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) - >>> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) -- >>> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) - >>> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) ----- >>> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ---- >>> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) --- >>> 59608782 59340173 (-268609)   ----   41.198  41.508 (  0.310) + >>> 59614612 59325177 (-289435)   -----  41.397  41.738 (  0.341) ++ >>> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ---- >>> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) --- >>> ================================================ >>> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) -- >>> instruction delta =      -284608    -0.4774% >>> time        delta =       -0.434 ms -1.0435% >>> >>> The number of PerfClassTraceTime's used is reduced from 564 to 116 >>> (so we have an overhead of about 715 instructions per use, yikes!). 
> From serguei.spitsyn at oracle.com Wed Jun 17 05:37:36 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 16 Jun 2020 22:37:36 -0700 Subject: Question about GetObjectMonitorUsage() JVMTI function In-Reply-To: <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> Message-ID: <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> Yes. It seems we have a consensus. Thank you for taking care about it. Thanks, Serguei On 6/16/20 18:34, David Holmes wrote: >> Ok, may I file it to JBS and fix it? > > Go for it! :) > > Cheers, > David > > On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>> Hi Dan, David and Yasumasa, >>> >>> >>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>> Hi Dan, >>>>>>>>> >>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. 
Daugherty wrote: >>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>> Hi David, >>>>>>>>>>>> >>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>> >>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does not >>>>>>>>>>>>>> perform at safepoint. >>>>>>>>>>>>> >>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target >>>>>>>>>>>>> is not suspended: >>>>>>>>>>>>> >>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, >>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>    JavaThread* calling_thread = JavaThread::current(); >>>>>>>>>>>>>    jvmtiError err = >>>>>>>>>>>>> get_object_monitor_usage(calling_thread, object, info_ptr); >>>>>>>>>>>>>    if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>      // Some of the critical threads were not suspended. >>>>>>>>>>>>> go to a safepoint and try again >>>>>>>>>>>>>      VM_GetObjectMonitorUsage op(this, calling_thread, >>>>>>>>>>>>> object, info_ptr); >>>>>>>>>>>>>      VMThread::execute(&op); >>>>>>>>>>>>>      err = op.result(); >>>>>>>>>>>>>    } >>>>>>>>>>>>>    return err; >>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>> >>>>>>>>>>>> I saw this code, so I guess there are some cases when >>>>>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from >>>>>>>>>>>> get_object_monitor_usage(). >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at >>>>>>>>>>>>>> first [1], but it would perform concurrently. >>>>>>>>>>>>>> If owner thread is not suspended, the owner might be >>>>>>>>>>>>>> changed to others in subsequent code. >>>>>>>>>>>>>> >>>>>>>>>>>>>> For example, the owner might release the monitor before [2]. 
>>>>>>>>>>>>> >>>>>>>>>>>>> The expectation is that when we find an owner thread it is >>>>>>>>>>>>> either suspended or not. If it is suspended then it cannot >>>>>>>>>>>>> release the monitor. If it is not suspended we detect that >>>>>>>>>>>>> and redo the whole query at a safepoint. >>>>>>>>>>>> >>>>>>>>>>>> I think the owner thread might resume unfortunately after >>>>>>>>>>>> suspending check. >>>>>>>>>>> >>>>>>>>>>> Yes you are right. I was thinking resuming also required a >>>>>>>>>>> safepoint but it only requires the Threads_lock. So yes the >>>>>>>>>>> code is wrong. >>>>>>>>>> >>>>>>>>>> Which code is wrong? >>>>>>>>>> >>>>>>>>>> Yes, a rogue resume can happen when the >>>>>>>>>> GetObjectMonitorUsage() caller >>>>>>>>>> has started the process of gathering the information while >>>>>>>>>> not at a >>>>>>>>>> safepoint. Thus the information returned by >>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>> >>>>>>>>> The code tries to make sure that it either collects data about >>>>>>>>> a monitor owned by a thread that is suspended, or else it >>>>>>>>> collects that data at a safepoint. But the owning thread can >>>>>>>>> be resumed just after the code determined it was suspended. >>>>>>>>> The monitor can then be released and the information gathered >>>>>>>>> not only stale but potentially completely wrong as it could >>>>>>>>> now be owned by a different thread and will report that >>>>>>>>> thread's entry count. >>>>>>>> >>>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>>> GetObjectMonitorUsage() returns to the caller the information >>>>>>>> can be stale. In fact as soon as the implementation returns >>>>>>>> from the safepoint that gathered the info, the target thread >>>>>>>> could have moved on. >>>>>>> >>>>>>> That isn't the issue. That the info is stale is fine. 
But the
>>>>>>> expectation is that the information was actually an accurate
>>>>>>> snapshot of the state of the monitor at some point in time. The
>>>>>>> current code does not ensure that.
>>>>>>
>>>>>> Please explain. I clearly don't understand why you think the info
>>>>>> returned isn't "an accurate snapshot of the state of the monitor
>>>>>> at some point in time".
>>>>>
>>>>> Because it may not be a "snapshot" at all. There is no
>>>>> atomicity**. The reported owner thread may not own it any longer
>>>>> when the entry count is read, so straight away you may have the
>>>>> wrong entry count information. The set of threads trying to
>>>>> acquire the monitor, or wait on the monitor can change in
>>>>> unexpected ways. It would be possible for instance to report the
>>>>> same thread as being the owner, being blocked trying to enter the
>>>>> monitor, and being in the wait-set of the monitor - apparently all
>>>>> at the same time!
>>>>>
>>>>> ** even if the owner is suspended we don't have complete atomicity
>>>>> because threads can join the set of threads trying to enter the
>>>>> monitor (unless they are all suspended).
>>>>
>>>> Consider the case when the monitor's owner is _not_ suspended:
>>>>
>>>>   - GetObjectMonitorUsage() uses a safepoint to gather the info about
>>>>     the object's monitor. Since we're at a safepoint, the info that
>>>>     we are gathering cannot change until we return from the safepoint.
>>>>     It is a snapshot and a valid one at that.
>>>>
>>>> Consider the case when the monitor's owner is suspended:
>>>>
>>>>   - GetObjectMonitorUsage() will gather info about the object's
>>>>     monitor while _not_ at a safepoint. Assuming that no other
>>>>     thread is suspended, then entry_count can change because
>>>>     another thread can block on entry while we are gathering
>>>>     info. waiter_count and waiters can change if a thread was
>>>>     in a timed wait that has timed out and now that thread is
>>>>    
blocked on re-entry. I don't think that notify_waiter_count
>>>>     and notify_waiters can change.
>>>>
>>>>     So in this case, the owner info and notify info is stable,
>>>>     but the entry_count and waiter info is not stable.
>>>>
>>>> Consider the case when the monitor is not owned:
>>>>
>>>>   - GetObjectMonitorUsage() will start to gather info about the
>>>>     object's monitor while _not_ at a safepoint. If it finds a
>>>>     thread on the entry queue that is not suspended, then it will
>>>>     bail out and redo the info gather at a safepoint. I just
>>>>     noticed that it doesn't check for suspension for the threads
>>>>     on the waiters list so a timed Object.wait() call can cause
>>>>     some confusion here.
>>>>
>>>>     So in this case, the owner info is not stable if a thread
>>>>     comes out of a timed wait and reenters the monitor. This
>>>>     case is no different than if a "barger" thread comes in
>>>>     after the NULL owner field is observed and enters the
>>>>     monitor. We'll return that there is no owner, a list of
>>>>     suspended pending entry thread and a list of waiting
>>>>     threads. The reality is that the object's monitor is
>>>>     owned by the "barger" that completely bypassed the entry
>>>>     queue by virtue of seeing the NULL owner field at exactly
>>>>     the right time.
>>>>
>>>> So the owner field is only stable when we have an owner. If
>>>> that owner is not suspended, then the other fields are also
>>>> stable because we gathered the info at a safepoint. If the
>>>> owner is suspended, then the owner and notify info is stable,
>>>> but the entry_count and waiter info is not stable.
>>>>
>>>> If we have a NULL owner field, then the info is only stable
>>>> if you have a non-suspended thread on the entry list. Ouch!
>>>> That's deterministic, but not without some work.
>>>>
>>>>
>>>> Okay so only when we gather the info at a safepoint is all
>>>> of it a valid and stable snapshot.
Unfortunately, we only >>>> do that at a safepoint when the owner thread is not suspended >>>> or if owner == NULL and one of the entry threads is not >>>> suspended. If either of those conditions is not true, then >>>> the different pieces of info is unstable to varying degrees. >>>> >>>> As for this claim: >>>> >>>>> It would be possible for instance to report the same thread >>>>> as being the owner, being blocked trying to enter the monitor, >>>>> and being in the wait-set of the monitor - apparently all at >>>>> the same time! >>>> >>>> I can't figure out a way to make that scenario work. If the >>>> thread is seen as the owner and is not suspended, then we >>>> gather info at a safepoint. If it is suspended, then it can't >>>> then be seen as on the entry queue or on the wait queue since >>>> it is suspended. If it is seen on the entry queue and is not >>>> suspended, then we gather info at a safepoint. If it is >>>> suspended on the entry queue, then it can't be seen on the >>>> wait queue. >>>> >>>> So the info instability of this API is bad, but it's not >>>> quite that bad. :-) (That is a small mercy.) >>>> >>>> >>>> Handshaking is not going to make this situation any better >>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>> handshake with the owner, the stability or instability of >>>> the other fields remains the same as when SuspendThread is >>>> used. Handshaking with all threads won't make the data as >>>> stable as when at a safepoint because individual threads >>>> can resume execution after doing their handshake so there >>>> will still be field instability. >>>> >>>> >>>> Short version: GetObjectMonitorUsage() should only gather >>>> data at a safepoint. Yes, I've changed my mind. >>> >>> I agree with this. 
>>> The advantages are: >>> ??- the result is stable >>> ??- the implementation can be simplified >>> >>> Performance impact is not very clear but should not be that >>> big as suspending all the threads has some overhead too. >>> I'm not sure if using handshakes can make performance better. >> >> Ok, may I file it to JBS and fix it? >> >> Yasumasa >> >> >>> Thanks, >>> Serguei >>> >>>> Dan >>>> >>>>> >>>>> David >>>>> ----- >>>>> >>>>>> >>>>>> >>>>>>> >>>>>>>> The only way to make sure you don't have stale information is >>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc >>>>>>>> should have more clear about the possibility of returning stale >>>>>>>> info. That's a question for Robert F. >>>>>>>> >>>>>>>> >>>>>>>>> GetObjectMonitorUsage says nothing about thread's being >>>>>>>>> suspended so I can't see how this could be construed as an >>>>>>>>> agent bug. >>>>>>>> >>>>>>>> In your scenario above, you mention that the target thread was >>>>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>> >>>>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>>>>>> and ResumeThread() are made by the agent and the agent should not >>>>>>>> resume the target thread while also calling >>>>>>>> GetObjectMonitorUsage(). >>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>> >>>>>>> Perhaps. I was thinking more generally about an independent >>>>>>> resume, but you're right that doesn't really make a lot of >>>>>>> sense. But when the spec says nothing about suspension ... >>>>>> >>>>>> And it is intentional that suspension is not required. JVM/DI and >>>>>> JVM/PI >>>>>> used to require suspension for these kinds of get-the-info APIs. 
>>>>>> JVM/TI >>>>>> intentionally was designed to not require suspension. >>>>>> >>>>>> As I've said before, we could add a note about the data being >>>>>> potentially >>>>>> stale unless SuspendThread is used. I think of it like stat(2). >>>>>> You can >>>>>> fetch the file's info, but there's no guarantee that the info is >>>>>> current >>>>>> by the time you process what you got back. Is it too much >>>>>> motherhood to >>>>>> state that the data might be stale? I could go either way... >>>>>> >>>>>> >>>>>>> >>>>>>>>> Using a handshake on the owner thread will allow this to be >>>>>>>>> fixed in the future without forcing/using any safepoints. >>>>>>>> >>>>>>>> I have to think about that which is why I'm avoiding talking about >>>>>>>> handshakes in this thread. >>>>>>> >>>>>>> Effectively the handshake can "suspend" the thread whilst the >>>>>>> monitor is queried. In effect the operation would create a >>>>>>> per-thread safepoint. >>>>>> >>>>>> I "know" that, but I still need time to think about it and probably >>>>>> see the code to see if there are holes... >>>>>> >>>>>> >>>>>>> Semantically it is no different to the code actually suspending >>>>>>> the owner thread, but it can't actually do that because >>>>>>> suspends/resume don't nest. >>>>>> >>>>>> Yeah... we used have a suspend count back when we tracked >>>>>> internal and >>>>>> external suspends separately. That was a nightmare... >>>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> >>>>>>>> Dan >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> David >>>>>>>>> >>>>>>>>>> Dan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check >>>>>>>>>>>> thread state, it returns `true` when the thread is sleeping >>>>>>>>>>>> [3], or when it performs in native [4]. >>>>>>>>>>> >>>>>>>>>>> Sure but if the thread is actually suspended it can't >>>>>>>>>>> continue execution in the VM or in Java code. 
>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> This appears to be an optimisation for the assumed common >>>>>>>>>>>>> case where threads are first suspended and then the >>>>>>>>>>>>> monitors are queried. >>>>>>>>>>>> >>>>>>>>>>>> I agree with this, but I could find out it from JVMTI spec >>>>>>>>>>>> - it just says "Get information about the object's monitor." >>>>>>>>>>> >>>>>>>>>>> Yes it was just an implementation optimisation, nothing to >>>>>>>>>>> do with the spec. >>>>>>>>>>> >>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect information >>>>>>>>>>>> in some case. >>>>>>>>>>>> >>>>>>>>>>>> It starts with finding owner thread, but the owner might be >>>>>>>>>>>> just before wakeup. >>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is >>>>>>>>>>>> called at safepoint in any case. >>>>>>>>>>> >>>>>>>>>>> Except we're moving away from safepoints to using >>>>>>>>>>> Handshakes, so this particular operation will require that >>>>>>>>>>> the apparent owner is Handshake-safe (by entering a >>>>>>>>>>> handshake with it) before querying the monitor. This would >>>>>>>>>>> still be preferable I think to always using a safepoint for >>>>>>>>>>> the entire operation. 
>>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Yasumasa >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> [3] >>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>> >>>>>>>>>>>> [4] >>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> However there is still a potential bug as the thread >>>>>>>>>>>>> reported as the owner may not be suspended at the time we >>>>>>>>>>>>> first see it, and may release the monitor, but then it may >>>>>>>>>>>>> get suspended before we call: >>>>>>>>>>>>> >>>>>>>>>>>>> ??owning_thread = >>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>>>> >>>>>>>>>>>>> and so we think it is still the monitor owner and proceed >>>>>>>>>>>>> to query the monitor information in a racy way. This can't >>>>>>>>>>>>> happen when suspension itself requires a safepoint as the >>>>>>>>>>>>> current thread won't go to that safepoint during this >>>>>>>>>>>>> code. However, if suspension is implemented via a direct >>>>>>>>>>>>> handshake with the target thread then we have a problem. 
>>>>>>>>>>>>> >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> [1] >>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>> >>>>>>>>>>>>>> [2] >>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>> >>> From markus.gaisbauer at gmail.com Wed Jun 17 07:57:45 2020 From: markus.gaisbauer at gmail.com (Markus Gaisbauer) Date: Wed, 17 Jun 2020 09:57:45 +0200 Subject: JVMTI callback SampledObjectAlloc always fires for first allocation in a new thread In-Reply-To: References: Message-ID: Hi Jean, Thank you for having a look at this. I attached the code of my basic JVMTI agent. I ran my tests on Windows. Maybe this 0xf1f1f1f1f1f1f1f1 is a Windows thing and Linux initializes the memory to all zeros. Regards, Markus On Tue, Jun 16, 2020 at 2:25 AM Jean Christophe Beyler wrote: > Hi Markus, > > I played around adding your Java code in the testing framework and I don't > get exactly the same failure as you do. Basically, I get about 5% samples > compared to the number of threads, whereas you seem to get a sample for > each element. Could you add the code you used for the agent so I can see if > you are doing something different than I am in that regard? > > This doesn't change the issue, I'm just curious why you seem to be > exposing it more. I'm still digging into what would be the right solution > for this. 
>
> Thanks,
> Jc
>
> On Mon, Jun 15, 2020 at 9:53 AM Jean Christophe Beyler <
> jcbeyler at google.com> wrote:
>
>> Hi Markus,
>>
>> I created:
>> https://bugs.openjdk.java.net/browse/JDK-8247615
>>
>> And I'll see what needs to be done for it :)
>> Jc
>>
>> On Fri, Jun 5, 2020 at 3:45 AM Markus Gaisbauer <
>> markus.gaisbauer at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> JVMTI callback SampledObjectAlloc is currently always called for the
>>> first allocation of a thread. This generates a lot of bias in an
>>> application that regularly starts new threads.
>>>
>>> I tested this with latest Java 11 and Java 15.
>>>
>>> E.g. here is a sample that creates 100 threads and allocates one object
>>> in each thread.
>>>
>>> public class AllocationProfilingBiasReproducer {
>>>     public static void main(String[] args) throws Exception {
>>>         for (int i = 0; i < 100; i++) {
>>>             new Thread(new Task(), "Task " + i).start();
>>>             Thread.sleep(1);
>>>         }
>>>         Thread.sleep(1000);
>>>     }
>>>     private static class Task implements Runnable {
>>>         @Override
>>>         public void run() {
>>>             new A();
>>>         }
>>>     }
>>>     private static class A {
>>>     }
>>> }
>>>
>>> I built a simple JVMTI agent that registers SampledObjectAlloc callback
>>> and sets interval to 1 MB with SetHeapSamplingInterval. The callback simply
>>> logs thread name and class name of allocated object.
>>>
>>> I see the following output:
>>>
>>> SampledObjectAlloc Ljava/lang/String; via Task 0
>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 1
>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 2
>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 3
>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 4
>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 5
>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 6
>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 7
>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 8
>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 9
>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 10
>>> ...
>>>
>>> This is not expected.
>>>
>>> I set a breakpoint in my SampledObjectAlloc callback and observed the
>>> following:
>>>
>>> In MemAllocator::Allocation::notify_allocation_jvmti_sampler() the local
>>> var bytes_since_last is always 0xf1f1f1f1f1f1f1f1 for first allocation of a
>>> thread. So first allocation is always reported to my agent.
>>>
>>> ThreadLocalAllocBuffer::_bytes_since_last_sample_point does not seem to
>>> be explicitly initialized before accessing it for the first time. I assume
>>> 0xf1f1f1f1f1f1f1f1 is a default value provided by some Hotspot allocator.
>>> Only after the first event fired, notify_allocation_jvmti_sampler
>>> calls ThreadLocalAllocBuffer::set_sample_end which initializes
>>> _bytes_since_last_sample_point to a proper value.
>>>
>>> I am looking for someone who could create a JIRA ticket for this.
>>>
>>> Regards,
>>> Markus
>>>
>>
>>
>> --
>>
>> Thanks,
>> Jc
>>
>
>
> --
>
> Thanks,
> Jc
>
-------------- next part --------------
An HTML attachment was scrubbed...
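
The effect described in the report can be illustrated with a small toy model. This is only a sketch under a simplifying assumption, namely that a sample fires whenever a per-thread running byte count reaches the sampling interval; the real HotSpot sampler is more involved. `ToySampler` and `first_allocation_samples` are hypothetical names made up for this illustration and are not HotSpot code. The value 0xf1f1f1f1f1f1f1f1 is the one observed in the report.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model, not HotSpot code: a per-thread byte counter that
// reports a sample once it reaches the sampling interval.
struct ToySampler {
    std::uint64_t bytes_since_last_sample;  // the starting value is the point of the demo
    std::uint64_t interval;

    // Returns true when this allocation should be reported as a sample.
    bool on_allocation(std::uint64_t size) {
        assert(interval > 0);
        bytes_since_last_sample += size;  // unsigned arithmetic, wraparound is well defined
        if (bytes_since_last_sample >= interval) {
            bytes_since_last_sample = 0;
            return true;
        }
        return false;
    }
};

// Counts how many of `threads` fresh samplers report their very first
// allocation when each counter starts at `initial_value`.
inline int first_allocation_samples(int threads, std::uint64_t initial_value,
                                    std::uint64_t interval, std::uint64_t size) {
    int samples = 0;
    for (int i = 0; i < threads; ++i) {
        ToySampler s{initial_value, interval};
        if (s.on_allocation(size)) {
            ++samples;
        }
    }
    return samples;
}
```

In this model, calling `first_allocation_samples(100, 0xf1f1f1f1f1f1f1f1ULL, 1024 * 1024, 16)` reports every first allocation, while starting the counter at 0 reports none for a 16-byte object and a 1 MB interval.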
URL: 
-------------- next part --------------
#include <jvmti.h>

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Print the JVMTI error name and abort on any failure.
void checkError(jvmtiEnv* jvmti, jvmtiError error) {
  if (error != JVMTI_ERROR_NONE) {
    char* errorName = nullptr;
    jvmti->GetErrorName(error, &errorName);
    fprintf(stderr, "%s\n", errorName ? errorName : "unknown JVMTI error");
    exit(1);
  }
}

// Log the class signature and thread name for every sampled allocation.
static void JNICALL SampledObjectAlloc(jvmtiEnv *jvmti, JNIEnv *jni, jthread thread,
                                       jobject object, jclass object_klass, jlong size) {
  jvmtiThreadInfo threadInfo;
  jvmtiError error = jvmti->GetThreadInfo(thread, &threadInfo);
  checkError(jvmti, error);
  std::string threadName = threadInfo.name ? threadInfo.name : "";

  char* classSignaturePtr = nullptr;
  error = jvmti->GetClassSignature(object_klass, &classSignaturePtr, nullptr);
  checkError(jvmti, error);
  std::string classSignature = classSignaturePtr ? classSignaturePtr : "";

  fprintf(stderr, "SampledObjectAlloc %s via %s\n", classSignature.c_str(), threadName.c_str());

  // Release everything JVMTI allocated or referenced on our behalf.
  jvmti->Deallocate((unsigned char*) classSignaturePtr);
  jvmti->Deallocate((unsigned char*) threadInfo.name);
  jni->DeleteLocalRef(threadInfo.context_class_loader);
  jni->DeleteLocalRef(threadInfo.thread_group);
}

extern "C" JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *jvm, char *options, void *reserved) {
  jvmtiEnv* jvmti = nullptr;
  jint const r = jvm->GetEnv(reinterpret_cast<void**>(&jvmti), JVMTI_VERSION_1_2);
  if (r != JNI_OK) {
    fprintf(stderr, "GetEnv(JVMTI_VERSION_1_2) failed with %d.\n", r);
    return JNI_ERR;  // jvmti is not usable, so do not continue
  }

  jvmtiCapabilities capabilities;
  memset(&capabilities, 0, sizeof(capabilities));
  capabilities.can_generate_sampled_object_alloc_events = 1;
  jvmtiError error = jvmti->AddCapabilities(&capabilities);
  checkError(jvmti, error);

  jvmtiEventCallbacks callbacks {};
  callbacks.SampledObjectAlloc = SampledObjectAlloc;
  error = jvmti->SetEventCallbacks(&callbacks, sizeof(jvmtiEventCallbacks));
  checkError(jvmti, error);

  error = jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_SAMPLED_OBJECT_ALLOC, nullptr);
  checkError(jvmti, error);

  // Sample roughly once per 1 MB of allocation.
  error = jvmti->SetHeapSamplingInterval(1024 * 1024);
  checkError(jvmti, error);
  return JNI_OK;
}

extern "C" JNIEXPORT void JNICALL Agent_OnUnload(JavaVM *jvm) {
}

From
markus.gaisbauer at gmail.com Wed Jun 17 08:00:16 2020 From: markus.gaisbauer at gmail.com (Markus Gaisbauer) Date: Wed, 17 Jun 2020 10:00:16 +0200 Subject: JVMTI callback SampledObjectAlloc always fires for first allocation in a new thread In-Reply-To: References: Message-ID: Please forget about the Windows thing. I forgot that my colleague saw the same 0xf1f1f1f1f1f1f1f1 also on Linux. Markus On Wed, Jun 17, 2020 at 9:57 AM Markus Gaisbauer wrote: > Hi Jean, > > Thank you for having a look at this. > > I attached the code of my basic JVMTI agent. I ran my tests on Windows. > Maybe this 0xf1f1f1f1f1f1f1f1 is a Windows thing and Linux initializes the > memory to all zeros. > > Regards, > Markus > > > On Tue, Jun 16, 2020 at 2:25 AM Jean Christophe Beyler < > jcbeyler at google.com> wrote: > >> Hi Markus, >> >> I played around adding your Java code in the testing framework and I >> don't get exactly the same failure as you do. Basically, I get about 5% >> samples compared to the number of threads, whereas you seem to get a sample >> for each element. Could you add the code you used for the agent so I can >> see if you are doing something different than I am in that regard? >> >> This doesn't change the issue, I'm just curious why you seem to be >> exposing it more. I'm still digging into what would be the right solution >> for this. >> >> Thanks, >> Jc >> >> On Mon, Jun 15, 2020 at 9:53 AM Jean Christophe Beyler < >> jcbeyler at google.com> wrote: >> >>> Hi Markus, >>> >>> I created: >>> https://bugs.openjdk.java.net/browse/JDK-8247615 >>> >>> And I'll see what needs to be done for it :) >>> Jc >>> >>> On Fri, Jun 5, 2020 at 3:45 AM Markus Gaisbauer < >>> markus.gaisbauer at gmail.com> wrote: >>> >>>> Hi, >>>> >>>> JVMTI callback SampledObjectAlloc is currently always called for the >>>> first allocation of a thread. This generates a lot of bias in an >>>> application that regularly starts new threads. >>>> >>>> I tested this with latest Java 11 and Java 15. 
>>>> >>>> E.g. here is a sample that creates 100 threads and allocates one object >>>> in each thread. >>>> >>>> public class AllocationProfilingBiasReproducer { >>>> public static void main(String[] args) throws Exception { >>>> for (int i = 0; i < 100; i++) { >>>> new Thread(new Task(), "Task " + i).start(); >>>> Thread.sleep(1); >>>> } >>>> Thread.sleep(1000); >>>> } >>>> private static class Task implements Runnable { >>>> @Override >>>> public void run() { >>>> new A(); >>>> } >>>> } >>>> private static class A { >>>> } >>>> } >>>> >>>> I built a simple JVMTI agent that registers SampledObjectAlloc callback >>>> and sets interval to 1 MB with SetHeapSamplingInterval. The callback simply >>>> logs thread name and class name of allocated object. >>>> >>>> I see the following output: >>>> >>>> SampledObjectAlloc Ljava/lang/String; via Task 0 >>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 1 >>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 2 >>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 3 >>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 4 >>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 5 >>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 6 >>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 7 >>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 8 >>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 9 >>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 10 >>>> ... >>>> >>>> This is not expected. >>>> >>>> I set a breakpoint in my SampledObjectAlloc callback and observed the >>>> following: >>>> >>>> In MemAllocator::Allocation::notify_allocation_jvmti_sampler() the >>>> local var bytes_since_last is always 0xf1f1f1f1f1f1f1f1 for first >>>> allocation of a thread. So first allocation is always reported to my agent. 
>>>> >>>> ThreadLocalAllocBuffer::_bytes_since_last_sample_point does not seem to >>>> be explicitly initialized before accessing it for the first time. I assume >>>> 0xf1f1f1f1f1f1f1f1 is a default value provided by some Hotspot allocator. >>>> Only after the first event fired, notify_allocation_jvmti_sampler >>>> calls ThreadLocalAllocBuffer::set_sample_end which initializes >>>> _bytes_since_last_sample_point to a proper value. >>>> >>>> I am looking for someone who could create a JIRA ticket for this. >>>> >>>> Regards, >>>> Markus >>>> >>> >>> >>> -- >>> >>> Thanks, >>> Jc >>> >> >> >> -- >> >> Thanks, >> Jc >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From suenaga at oss.nttdata.com Wed Jun 17 09:18:45 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Wed, 17 Jun 2020 18:18:45 +0900 Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information In-Reply-To: <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> Message-ID: (Change subject for RFR) Hi, I filed it to JBS and upload a webrev for it. Could you review it? JBS: https://bugs.openjdk.java.net/browse/JDK-8247729 webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/ This change has passed tests on submit repo. Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64. 
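
For readers following the thread, the inconsistency under discussion (the owner is read first, the monitor changes hands, and the entry count is read afterwards) can be illustrated with a deterministic toy model. This is only a sketch: `ToyMonitor`, `ToyUsage`, `read_piecewise`, `read_snapshot` and `handoff_to_thread_2` are hypothetical names invented here, not HotSpot code, and the callback stands in for whatever other threads do while the reader is not at a safepoint.

```cpp
#include <cassert>
#include <functional>

// Hypothetical toy types, not HotSpot code.
struct ToyMonitor {
    int owner;        // id of the owning thread, 0 if unowned
    int entry_count;  // recursion count held by `owner`
};

struct ToyUsage {
    int owner;
    int entry_count;
};

// Reads the fields one at a time. `between_reads` models other threads
// running while the reader is not at a safepoint.
inline ToyUsage read_piecewise(ToyMonitor& m,
                               const std::function<void(ToyMonitor&)>& between_reads) {
    assert(static_cast<bool>(between_reads));
    ToyUsage u{};
    u.owner = m.owner;
    between_reads(m);
    u.entry_count = m.entry_count;
    return u;
}

// Reads all fields with nothing allowed to run in between, which is what
// gathering the data at a safepoint provides in this model.
inline ToyUsage read_snapshot(const ToyMonitor& m) {
    return ToyUsage{m.owner, m.entry_count};
}

// Simulated interleaving: thread 1 releases the monitor and thread 2
// then holds it with an entry count of three.
inline void handoff_to_thread_2(ToyMonitor& m) {
    m.owner = 2;
    m.entry_count = 3;
}
```

Starting from a monitor owned once by thread 1, `read_piecewise(m, handoff_to_thread_2)` reports owner 1 together with thread 2's entry count, a state the monitor was never in, whereas `read_snapshot` always returns values that belonged together at one point in time.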
Thanks, Yasumasa On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote: > Yes. It seems we have a consensus. > Thank you for taking care about it. > > Thanks, > Serguei > > > On 6/16/20 18:34, David Holmes wrote: >>> Ok, may I file it to JBS and fix it? >> >> Go for it! :) >> >> Cheers, >> David >> >> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>> Hi Dan, David and Yasumasa, >>>> >>>> >>>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>> Hi Dan, >>>>>>>>>> >>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>> Hi David, >>>>>>>>>>>>> >>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint. >>>>>>>>>>>>>> >>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is not suspended: >>>>>>>>>>>>>> >>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>>>>>>>>>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr); >>>>>>>>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>> ???? // Some of the critical threads were not suspended. go to a safepoint and try again >>>>>>>>>>>>>> ???? 
VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr); >>>>>>>>>>>>>> ???? VMThread::execute(&op); >>>>>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>>>>> ?? } >>>>>>>>>>>>>> ?? return err; >>>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>>> >>>>>>>>>>>>> I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage(). >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently. >>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be changed to others in subsequent code. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> For example, the owner might release the monitor before [2]. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint. >>>>>>>>>>>>> >>>>>>>>>>>>> I think the owner thread might resume unfortunately after suspending check. >>>>>>>>>>>> >>>>>>>>>>>> Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong. >>>>>>>>>>> >>>>>>>>>>> Which code is wrong? >>>>>>>>>>> >>>>>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller >>>>>>>>>>> has started the process of gathering the information while not at a >>>>>>>>>>> safepoint. Thus the information returned by GetObjectMonitorUsage() >>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>> >>>>>>>>>> The code tries to make sure that it either collects data about a monitor owned by a thread that is suspended, or else it collects that data at a safepoint. But the owning thread can be resumed just after the code determined it was suspended. 
The monitor can then be released and the information gathered is not only stale but potentially completely wrong as it could now be owned by a different thread and will report that thread's entry count.
>>>>>>>>>
>>>>>>>>> If the agent is not using SuspendThread(), then as soon as
>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information
>>>>>>>>> can be stale. In fact as soon as the implementation returns
>>>>>>>>> from the safepoint that gathered the info, the target thread
>>>>>>>>> could have moved on.
>>>>>>>>
>>>>>>>> That isn't the issue. That the info is stale is fine. But the expectation is that the information was actually an accurate snapshot of the state of the monitor at some point in time. The current code does not ensure that.
>>>>>>>
>>>>>>> Please explain. I clearly don't understand why you think the info
>>>>>>> returned isn't "an accurate snapshot of the state of the monitor
>>>>>>> at some point in time".
>>>>>>
>>>>>> Because it may not be a "snapshot" at all. There is no atomicity**. The reported owner thread may not own it any longer when the entry count is read, so straight away you may have the wrong entry count information. The set of threads trying to acquire the monitor, or wait on the monitor can change in unexpected ways. It would be possible for instance to report the same thread as being the owner, being blocked trying to enter the monitor, and being in the wait-set of the monitor - apparently all at the same time!
>>>>>>
>>>>>> ** even if the owner is suspended we don't have complete atomicity because threads can join the set of threads trying to enter the monitor (unless they are all suspended).
>>>>>
>>>>> Consider the case when the monitor's owner is _not_ suspended:
>>>>>
>>>>>   - GetObjectMonitorUsage() uses a safepoint to gather the info about
>>>>>     the object's monitor. Since we're at a safepoint, the info that
>>>>>     we are gathering cannot change until we return from the safepoint.
>>>>>     It is a snapshot and a valid one at that.
>>>>>
>>>>> Consider the case when the monitor's owner is suspended:
>>>>>
>>>>>   - GetObjectMonitorUsage() will gather info about the object's
>>>>>     monitor while _not_ at a safepoint. Assuming that no other
>>>>>     thread is suspended, then entry_count can change because
>>>>>     another thread can block on entry while we are gathering
>>>>>     info. waiter_count and waiters can change if a thread was
>>>>>     in a timed wait that has timed out and now that thread is
>>>>>     blocked on re-entry. I don't think that notify_waiter_count
>>>>>     and notify_waiters can change.
>>>>>
>>>>>     So in this case, the owner info and notify info is stable,
>>>>>     but the entry_count and waiter info is not stable.
>>>>>
>>>>> Consider the case when the monitor is not owned:
>>>>>
>>>>>   - GetObjectMonitorUsage() will start to gather info about the
>>>>>     object's monitor while _not_ at a safepoint. If it finds a
>>>>>     thread on the entry queue that is not suspended, then it will
>>>>>     bail out and redo the info gather at a safepoint. I just
>>>>>     noticed that it doesn't check for suspension for the threads
>>>>>     on the waiters list so a timed Object.wait() call can cause
>>>>>     some confusion here.
>>>>>
>>>>>     So in this case, the owner info is not stable if a thread
>>>>>     comes out of a timed wait and reenters the monitor. This
>>>>>     case is no different than if a "barger" thread comes in
>>>>>     after the NULL owner field is observed and enters the
>>>>>     monitor. We'll return that there is no owner, a list of
>>>>>     suspended pending entry threads and a list of waiting
>>>>>     threads. The reality is that the object's monitor is
>>>>>     owned by the "barger" that completely bypassed the entry
>>>>>     queue by virtue of seeing the NULL owner field at exactly
>>>>>     the right time.
>>>>>
>>>>> So the owner field is only stable when we have an owner. If
>>>>> that owner is not suspended, then the other fields are also
>>>>> stable because we gathered the info at a safepoint. If the
>>>>> owner is suspended, then the owner and notify info is stable,
>>>>> but the entry_count and waiter info is not stable.
>>>>>
>>>>> If we have a NULL owner field, then the info is only stable
>>>>> if you have a non-suspended thread on the entry list. Ouch!
>>>>> That's deterministic, but not without some work.
>>>>>
>>>>>
>>>>> Okay so only when we gather the info at a safepoint is all
>>>>> of it a valid and stable snapshot. Unfortunately, we only
>>>>> do that at a safepoint when the owner thread is not suspended
>>>>> or if owner == NULL and one of the entry threads is not
>>>>> suspended. If either of those conditions is not true, then
>>>>> the different pieces of info are unstable to varying degrees.
>>>>>
>>>>> As for this claim:
>>>>>
>>>>>> It would be possible for instance to report the same thread
>>>>>> as being the owner, being blocked trying to enter the monitor,
>>>>>> and being in the wait-set of the monitor - apparently all at
>>>>>> the same time!
>>>>>
>>>>> I can't figure out a way to make that scenario work. If the
>>>>> thread is seen as the owner and is not suspended, then we
>>>>> gather info at a safepoint. If it is suspended, then it can't
>>>>> then be seen as on the entry queue or on the wait queue since
>>>>> it is suspended. If it is seen on the entry queue and is not
>>>>> suspended, then we gather info at a safepoint. If it is
>>>>> suspended on the entry queue, then it can't be seen on the
>>>>> wait queue.
>>>>>
>>>>> So the info instability of this API is bad, but it's not
>>>>> quite that bad. :-) (That is a small mercy.)
>>>>>
>>>>>
>>>>> Handshaking is not going to make this situation any better
>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we
>>>>> handshake with the owner, the stability or instability of
>>>>> the other fields remains the same as when SuspendThread is
>>>>> used. Handshaking with all threads won't make the data as
>>>>> stable as when at a safepoint because individual threads
>>>>> can resume execution after doing their handshake so there
>>>>> will still be field instability.
>>>>>
>>>>>
>>>>> Short version: GetObjectMonitorUsage() should only gather
>>>>> data at a safepoint. Yes, I've changed my mind.
>>>>
>>>> I agree with this.
>>>> The advantages are:
>>>>   - the result is stable
>>>>   - the implementation can be simplified
>>>>
>>>> Performance impact is not very clear but should not be that
>>>> big as suspending all the threads has some overhead too.
>>>> I'm not sure if using handshakes can make performance better.
>>>
>>> Ok, may I file it to JBS and fix it?
>>>
>>> Yasumasa
>>>
>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>> Dan
>>>>>
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> The only way to make sure you don't have stale information is
>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc
>>>>>>>>> should be more clear about the possibility of returning stale
>>>>>>>>> info. That's a question for Robert F.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> GetObjectMonitorUsage says nothing about threads being suspended so I can't see how this could be construed as an agent bug.
>>>>>>>>>
>>>>>>>>> In your scenario above, you mention that the target thread was
>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the target
>>>>>>>>> was suspended, and then the target thread was resumed after
>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before
>>>>>>>>> GetObjectMonitorUsage() was able to gather the info.
>>>>>>>>> >>>>>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>>>>>>> and ResumeThread() are made by the agent and the agent should not >>>>>>>>> resume the target thread while also calling GetObjectMonitorUsage(). >>>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>>> >>>>>>>> Perhaps. I was thinking more generally about an independent resume, but you're right that doesn't really make a lot of sense. But when the spec says nothing about suspension ... >>>>>>> >>>>>>> And it is intentional that suspension is not required. JVM/DI and JVM/PI >>>>>>> used to require suspension for these kinds of get-the-info APIs. JVM/TI >>>>>>> intentionally was designed to not require suspension. >>>>>>> >>>>>>> As I've said before, we could add a note about the data being potentially >>>>>>> stale unless SuspendThread is used. I think of it like stat(2). You can >>>>>>> fetch the file's info, but there's no guarantee that the info is current >>>>>>> by the time you process what you got back. Is it too much motherhood to >>>>>>> state that the data might be stale? I could go either way... >>>>>>> >>>>>>> >>>>>>>> >>>>>>>>>> Using a handshake on the owner thread will allow this to be fixed in the future without forcing/using any safepoints. >>>>>>>>> >>>>>>>>> I have to think about that which is why I'm avoiding talking about >>>>>>>>> handshakes in this thread. >>>>>>>> >>>>>>>> Effectively the handshake can "suspend" the thread whilst the monitor is queried. In effect the operation would create a per-thread safepoint. >>>>>>> >>>>>>> I "know" that, but I still need time to think about it and probably >>>>>>> see the code to see if there are holes... >>>>>>> >>>>>>> >>>>>>>> Semantically it is no different to the code actually suspending the owner thread, but it can't actually do that because suspends/resume don't nest. >>>>>>> >>>>>>> Yeah... 
we used have a suspend count back when we tracked internal and >>>>>>> external suspends separately. That was a nightmare... >>>>>>> >>>>>>> Dan >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> >>>>>>>>> Dan >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> David >>>>>>>>>> >>>>>>>>>>> Dan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread state, it returns `true` when the thread is sleeping [3], or when it performs in native [4]. >>>>>>>>>>>> >>>>>>>>>>>> Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code. >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried. >>>>>>>>>>>>> >>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI spec - it just says "Get information about the object's monitor." >>>>>>>>>>>> >>>>>>>>>>>> Yes it was just an implementation optimisation, nothing to do with the spec. >>>>>>>>>>>> >>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect information in some case. >>>>>>>>>>>>> >>>>>>>>>>>>> It starts with finding owner thread, but the owner might be just before wakeup. >>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is called at safepoint in any case. >>>>>>>>>>>> >>>>>>>>>>>> Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation. 
>>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>> [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: >>>>>>>>>>>>>> >>>>>>>>>>>>>> ??owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>>>>> >>>>>>>>>>>>>> and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem. 
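Aside, not part of the thread: the "racy" query discussed above can be pictured with a small Java sketch. It is a toy model under stated assumptions, not HotSpot's ObjectMonitor and not the JVMTI API; ToyMonitor, racySnapshot and atomicSnapshot are hypothetical names invented for illustration. Reading the owner and the entry count in two separate steps can pair one thread's identity with another thread's count, while reading both inside a single exclusion window (roughly the role the safepoint plays in the discussion) cannot.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical toy model; not HotSpot's ObjectMonitor and not the JVMTI API.
class MonitorSnapshotSketch {

    /** Toy monitor state: an owner name (null when unowned) and its recursion count. */
    static final class ToyMonitor {
        private final ReentrantLock stateLock = new ReentrantLock();
        private String owner;
        private int entryCount;

        /** Models another thread changing ownership. */
        void setState(String newOwner, int newEntryCount) {
            stateLock.lock();
            try {
                owner = newOwner;
                entryCount = newEntryCount;
            } finally {
                stateLock.unlock();
            }
        }

        /**
         * Reads owner and entry count in two separate steps. The 'between'
         * callback stands in for whatever other threads do in the gap.
         */
        String racySnapshot(Runnable between) {
            String seenOwner;
            stateLock.lock();
            try {
                seenOwner = owner;
            } finally {
                stateLock.unlock();
            }
            between.run();
            stateLock.lock();
            try {
                return seenOwner + ":" + entryCount;
            } finally {
                stateLock.unlock();
            }
        }

        /** Reads both fields under one lock hold, so no change can interleave. */
        String atomicSnapshot() {
            stateLock.lock();
            try {
                return owner + ":" + entryCount;
            } finally {
                stateLock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        ToyMonitor monitor = new ToyMonitor();
        monitor.setState("T1", 1);
        // Between the two reads, T1 releases and T2 enters five times.
        String racy = monitor.racySnapshot(() -> monitor.setState("T2", 5));
        System.out.println("racy:   " + racy);                      // T1:5, a state that never existed
        System.out.println("atomic: " + monitor.atomicSnapshot());  // T2:5
    }
}
```

The racy result reports T1 as owner with T2's entry count, which is the "wrong entry count" situation described in the thread; it is stale and also never matched any real state of the toy monitor.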
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973
>>>>>>>>>>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>

From david.holmes at oracle.com Wed Jun 17 09:42:10 2020
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 17 Jun 2020 19:42:10 +1000
Subject: RFR(S) 8246019 PerfClassTraceTime slows down VM start-up
In-Reply-To:
References: <3ed0b32d-469e-6f89-f4b3-b78d5bcce700@oracle.com> <20411ebb-5b06-83da-07c8-b751252efecd@oracle.com>
Message-ID:

On 17/06/2020 1:54 pm, Yumin Qi wrote:
>   product(bool, UsePerfData, true,                                          \
>           "Flag to disable jvmstat instrumentation for performance testing "\
>           "and problem isolation purposes")                                 \
>
>
> The flag's default value is set to true --- should we change that? If the
> flag were set to false by default, performance could benefit from that. If
> users want to collect performance data, they should explicitly turn it on.

See comments in JDK-8246020.

David

>
> Thanks
>
> Yumin
>
>
> On 6/16/20 8:19 PM, Ioi Lam wrote:
>>
>>
>> On 6/16/20 6:20 PM, David Holmes wrote:
>>> Hi Ioi,
>>>
>>> On 17/06/2020 6:14 am, Ioi Lam wrote:
>>>> https://bugs.openjdk.java.net/browse/JDK-8246019
>>>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/
>>>>
>>>>
>>>> PerfClassTraceTime is (a rarely used feature) for measuring the time
>>>> spent during class linking and initialization.
>>>
>>> "A special command jcmd PerfCounter.print
>>> prints all performance counters in the process."
>>>
>>> How do you know this is a "rarely used feature"?
>> Hi David,
>>
>> Sure, the counter will be dumped, but by "rarely used" -- I mean no
>> one will find this particular counter useful, and no one will be
>> actively looking at it.
>>
>> I changed two parts of the code -- class init and class linking.
>>
>> For class initialization, the counter may be useful for people who
>> want to know how much time is spent in their <clinit> functions, and
>> my patch doesn't change that. It only avoids using the counter when a
>> class has no <clinit>, i.e., we know that the counter counts nothing
>> (except for a logging statement).
>>
>> =====
>>
>> For class linking, no user code is executed, so it only measures VM
>> code. If it's useful for anyone, that would be VM engineers like me
>> who are trying to optimize the speed of class loading. However, due to
>> the overhead of the counter vs what it's trying to measure, the
>> results are pretty meaningless.
>>
>> Note that I've not disabled the counter altogether. Instead, I disable
>> it only when linking a CDS shared class, and we know that very little
>> is happening for this class (e.g., no verification).
>>
>> I think the class linking timer might have been useful 15 years ago
>> when it was introduced, or it might be useful today when CDS is
>> disabled. But with CDS enabled, we are paying a constant price that
>> seems to benefit no one.
>>
>> I think we should short-circuit it when it seems appropriate. If this
>> indeed causes problems for our users, it's easy to re-enable it.
>> That's better than just keeping this forever just because we're afraid
>> to touch anything.
>>
>>>
>>> I find it hard to evaluate whether this short-circuiting of the time
>>> tracing is reasonable or not. Obviously any monitoring mechanism
>>> should impose minimal overhead compared to what is being measured,
>>> and these timers fall short in that regard. But if these stats become
>>> meaningless then they may as well be removed.
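Aside, not part of the thread: the short-circuiting idea described above (always count invocations, but only start and stop a timer when the work is expected to be non-trivial) can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not HotSpot's PerfClassTraceTime; PhaseStats, record and the worthTiming flag are hypothetical names.

```java
// Hypothetical sketch; names are illustrative and this is not HotSpot's PerfClassTraceTime.
class ShortCircuitTimerSketch {

    /** Always counts invocations, but pays for clock reads only when asked to. */
    static final class PhaseStats {
        long invocations;
        long timedInvocations;
        long elapsedNanos;

        void record(boolean worthTiming, Runnable work) {
            invocations++;
            if (!worthTiming) {
                work.run();   // fast path: no timestamps taken
                return;
            }
            long start = System.nanoTime();
            try {
                work.run();
            } finally {
                elapsedNanos += System.nanoTime() - start;
                timedInvocations++;
            }
        }
    }

    public static void main(String[] args) {
        PhaseStats linking = new PhaseStats();
        linking.record(false, () -> { });   // trivial work: counted, not timed
        linking.record(false, () -> { });
        linking.record(true, () -> {        // non-trivial work: counted and timed
            long sum = 0;
            for (int i = 0; i < 1_000; i++) {
                sum += i;
            }
        });
        System.out.println(linking.invocations + " invocations, "
                + linking.timedInvocations + " timed");
    }
}
```

The trade-off is the one raised in the review: the elapsed-time figure then covers only the timed subset, so anyone reading it needs the invocation counts alongside it to interpret the number.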
>>>
>>> I think the serviceability folk (cc'd) need to evaluate this in the
>>> context of the M&M tools.
>>>
>>>> However, it's quite expensive and it needs to start and stop a bunch
>>>> of timers. With CDS, it's quite often for the overhead of the timer
>>>> itself to be much more than the time it's trying to measure, giving
>>>> unreliable measurement.
>>>>
>>>> In this patch, when it's clear that the init and linking will be
>>>> very quick, I disable the timer and count only the number of
>>>> invocations. This shows a small improvement in start-up
>>>
>>> I'm curious if you tried forcing EagerInitialization to be true to
>>> see how that improves the baseline. I've always noticed eager_init in
>>> the code, but hadn't realized it is disabled by default.
>>>
>>
>> I think it cannot be done by default, as it will violate the JLS. A
>> class can be initialized only when it's touched by bytecodes.
>>
>> It can also backfire as we may load many classes without initializing
>> them. E.g., during bytecode verification, we load many classes and
>> just check that one is a supertype of another.
>>
>> Thanks
>> - Ioi
>>
>>> Cheers,
>>> David
>>> -----
>>>
>>>> Results of " perf stat -r 100 bin/java -Xshare:on
>>>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version "
>>>>
>>>> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) -
>>>> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) --
>>>> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) -
>>>> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) -----
>>>> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ----
>>>> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) ---
>>>> 59608782 59340173 (-268609)   ----   41.198  41.508 (  0.310) +
>>>> 59614612 59325177 (-289435)   -----  41.397  41.738 (  0.341) ++
>>>> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ----
>>>> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) ---
>>>> ================================================
>>>> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) --
>>>> instruction delta =      -284608    -0.4774%
>>>> time        delta =       -0.434 ms -1.0435%
>>>>
>>>> The number of PerfClassTraceTime's used is reduced from 564 to 116
>>>> (so we have an overhead of about 715 instructions per use, yikes!).
>>

From serguei.spitsyn at oracle.com Wed Jun 17 15:38:11 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Wed, 17 Jun 2020 08:38:11 -0700
Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information
In-Reply-To:
References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com>
Message-ID: <7335f91e-9eb9-5b1a-0edc-23f9d78733b2@oracle.com>

Hi Yasumasa,

This fix is not enough.
The function JvmtiEnvBase::get_object_monitor_usage works in two modes: in VMop and non-VMop.
The non-VMop mode has to be removed.

Thanks,
Serguei


On 6/17/20 02:18, Yasumasa Suenaga wrote:
> (Change subject for RFR)
>
> Hi,
>
> I filed it to JBS and uploaded a webrev for it.
> Could you review it?
>
>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>
> This change has passed tests on submit repo.
> Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti on
> Linux x64.
>
>
> Thanks,
>
> Yasumasa
>
>
> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>> Yes. It seems we have a consensus.
>> Thank you for taking care of it.
>>
>> Thanks,
>> Serguei
>>
>>
>> On 6/16/20 18:34, David Holmes wrote:
>>>> Ok, may I file it to JBS and fix it?
>>>
>>> Go for it! :)
>>>
>>> Cheers,
>>> David
>>>
>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>> Hi Dan, David and Yasumasa,
>>>>>
>>>>>
>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>
>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage()
>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does not
>>>>>>>>>>>>>>>> perform at safepoint.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target
>>>>>>>>>>>>>>> is not suspended:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>   JavaThread* calling_thread = JavaThread::current();
>>>>>>>>>>>>>>>   jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>   if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>     // Some of the critical threads were not suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>     VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>     VMThread::execute(&op);
>>>>>>>>>>>>>>>     err = op.result();
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>   return err;
>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I saw this code, so I guess there are some cases when
>>>>>>>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from
>>>>>>>>>>>>>> get_object_monitor_usage().
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at
>>>>>>>>>>>>>>>> first [1], but it would perform concurrently.
>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be
>>>>>>>>>>>>>>>> changed to others in subsequent code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For example, the owner might release the monitor before
>>>>>>>>>>>>>>>> [2].
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The expectation is that when we find an owner thread it
>>>>>>>>>>>>>>> is either suspended or not. If it is suspended then it
>>>>>>>>>>>>>>> cannot release the monitor. If it is not suspended we
>>>>>>>>>>>>>>> detect that and redo the whole query at a safepoint.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think the owner thread might resume unfortunately after
>>>>>>>>>>>>>> suspending check.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes you are right. I was thinking resuming also required a
>>>>>>>>>>>>> safepoint but it only requires the Threads_lock. So yes
>>>>>>>>>>>>> the code is wrong.
>>>>>>>>>>>>
>>>>>>>>>>>> Which code is wrong?
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller
>>>>>>>>>>>> has started the process of gathering the information while not at a
>>>>>>>>>>>> safepoint. Thus the information returned by GetObjectMonitorUsage()
>>>>>>>>>>>> might be stale, but that's a bug in the agent code.
>>>>>>>>>>> >>>>>>>>>>> The code tries to make sure that it either collects data >>>>>>>>>>> about a monitor owned by a thread that is suspended, or else >>>>>>>>>>> it collects that data at a safepoint. But the owning thread >>>>>>>>>>> can be resumed just after the code determined it was >>>>>>>>>>> suspended. The monitor can then be released and the >>>>>>>>>>> information gathered not only stale but potentially >>>>>>>>>>> completely wrong as it could now be owned by a different >>>>>>>>>>> thread and will report that thread's entry count. >>>>>>>>>> >>>>>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information >>>>>>>>>> can be stale. In fact as soon as the implementation returns >>>>>>>>>> from the safepoint that gathered the info, the target thread >>>>>>>>>> could have moved on. >>>>>>>>> >>>>>>>>> That isn't the issue. That the info is stale is fine. But the >>>>>>>>> expectation is that the information was actually an accurate >>>>>>>>> snapshot of the state of the monitor at some point in time. >>>>>>>>> The current code does not ensure that. >>>>>>>> >>>>>>>> Please explain. I clearly don't understand why you think the info >>>>>>>> returned isn't "an accurate snapshot of the state of the monitor >>>>>>>> at some point in time". >>>>>>> >>>>>>> Because it may not be a "snapshot" at all. There is no >>>>>>> atomicity**. The reported owner thread may not own it any longer >>>>>>> when the entry count is read, so straight away you may have the >>>>>>> wrong entry count information. The set of threads trying to >>>>>>> acquire the monitor, or wait on the monitor can change in >>>>>>> unexpected ways. It would be possible for instance to report the >>>>>>> same thread as being the owner, being blocked trying to enter >>>>>>> the monitor, and being in the wait-set of the monitor - >>>>>>> apparently all at the same time! 
>>>>>>>
>>>>>>> ** even if the owner is suspended we don't have complete
>>>>>>> atomicity because threads can join the set of threads trying to
>>>>>>> enter the monitor (unless they are all suspended).
>>>>>>
>>>>>> Consider the case when the monitor's owner is _not_ suspended:
>>>>>>
>>>>>>   - GetObjectMonitorUsage() uses a safepoint to gather the info about
>>>>>>     the object's monitor. Since we're at a safepoint, the info that
>>>>>>     we are gathering cannot change until we return from the safepoint.
>>>>>>     It is a snapshot and a valid one at that.
>>>>>>
>>>>>> Consider the case when the monitor's owner is suspended:
>>>>>>
>>>>>>   - GetObjectMonitorUsage() will gather info about the object's
>>>>>>     monitor while _not_ at a safepoint. Assuming that no other
>>>>>>     thread is suspended, then entry_count can change because
>>>>>>     another thread can block on entry while we are gathering
>>>>>>     info. waiter_count and waiters can change if a thread was
>>>>>>     in a timed wait that has timed out and now that thread is
>>>>>>     blocked on re-entry. I don't think that notify_waiter_count
>>>>>>     and notify_waiters can change.
>>>>>>
>>>>>>     So in this case, the owner info and notify info is stable,
>>>>>>     but the entry_count and waiter info is not stable.
>>>>>>
>>>>>> Consider the case when the monitor is not owned:
>>>>>>
>>>>>>   - GetObjectMonitorUsage() will start to gather info about the
>>>>>>     object's monitor while _not_ at a safepoint. If it finds a
>>>>>>     thread on the entry queue that is not suspended, then it will
>>>>>>     bail out and redo the info gather at a safepoint. I just
>>>>>>     noticed that it doesn't check for suspension for the threads
>>>>>>     on the waiters list so a timed Object.wait() call can cause
>>>>>>     some confusion here.
>>>>>>
>>>>>>     So in this case, the owner info is not stable if a thread
>>>>>>     comes out of a timed wait and reenters the monitor. This
>>>>>>     case is no different than if a "barger" thread comes in
>>>>>>     after the NULL owner field is observed and enters the
>>>>>>     monitor. We'll return that there is no owner, a list of
>>>>>>     suspended pending entry threads and a list of waiting
>>>>>>     threads. The reality is that the object's monitor is
>>>>>>     owned by the "barger" that completely bypassed the entry
>>>>>>     queue by virtue of seeing the NULL owner field at exactly
>>>>>>     the right time.
>>>>>>
>>>>>> So the owner field is only stable when we have an owner. If
>>>>>> that owner is not suspended, then the other fields are also
>>>>>> stable because we gathered the info at a safepoint. If the
>>>>>> owner is suspended, then the owner and notify info is stable,
>>>>>> but the entry_count and waiter info is not stable.
>>>>>>
>>>>>> If we have a NULL owner field, then the info is only stable
>>>>>> if you have a non-suspended thread on the entry list. Ouch!
>>>>>> That's deterministic, but not without some work.
>>>>>>
>>>>>>
>>>>>> Okay so only when we gather the info at a safepoint is all
>>>>>> of it a valid and stable snapshot. Unfortunately, we only
>>>>>> do that at a safepoint when the owner thread is not suspended
>>>>>> or if owner == NULL and one of the entry threads is not
>>>>>> suspended. If either of those conditions is not true, then
>>>>>> the different pieces of info are unstable to varying degrees.
>>>>>>
>>>>>> As for this claim:
>>>>>>
>>>>>>> It would be possible for instance to report the same thread
>>>>>>> as being the owner, being blocked trying to enter the monitor,
>>>>>>> and being in the wait-set of the monitor - apparently all at
>>>>>>> the same time!
>>>>>>
>>>>>> I can't figure out a way to make that scenario work. If the
>>>>>> thread is seen as the owner and is not suspended, then we
>>>>>> gather info at a safepoint. If it is suspended, then it can't
>>>>>> then be seen as on the entry queue or on the wait queue since
>>>>>> it is suspended. If it is seen on the entry queue and is not
>>>>>> suspended, then we gather info at a safepoint. If it is
>>>>>> suspended on the entry queue, then it can't be seen on the
>>>>>> wait queue.
>>>>>>
>>>>>> So the info instability of this API is bad, but it's not
>>>>>> quite that bad. :-) (That is a small mercy.)
>>>>>>
>>>>>>
>>>>>> Handshaking is not going to make this situation any better
>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we
>>>>>> handshake with the owner, the stability or instability of
>>>>>> the other fields remains the same as when SuspendThread is
>>>>>> used. Handshaking with all threads won't make the data as
>>>>>> stable as when at a safepoint because individual threads
>>>>>> can resume execution after doing their handshake so there
>>>>>> will still be field instability.
>>>>>>
>>>>>>
>>>>>> Short version: GetObjectMonitorUsage() should only gather
>>>>>> data at a safepoint. Yes, I've changed my mind.
>>>>>
>>>>> I agree with this.
>>>>> The advantages are:
>>>>>   - the result is stable
>>>>>   - the implementation can be simplified
>>>>>
>>>>> Performance impact is not very clear but should not be that
>>>>> big as suspending all the threads has some overhead too.
>>>>> I'm not sure if using handshakes can make performance better.
>>>>
>>>> Ok, may I file it to JBS and fix it?
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>> Dan
>>>>>>
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> The only way to make sure you don't have stale information is
>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc
>>>>>>>>>> should be more clear about the possibility of returning stale
>>>>>>>>>> info. That's a question for Robert F.
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's being >>>>>>>>>>> suspended so I can't see how this could be construed as an >>>>>>>>>>> agent bug. >>>>>>>>>> >>>>>>>>>> In your scenario above, you mention that the target thread was >>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>> >>>>>>>>>> All three of those calls: SuspendThread(), >>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>> and ResumeThread() are made by the agent and the agent should >>>>>>>>>> not >>>>>>>>>> resume the target thread while also calling >>>>>>>>>> GetObjectMonitorUsage(). >>>>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>>>> >>>>>>>>> Perhaps. I was thinking more generally about an independent >>>>>>>>> resume, but you're right that doesn't really make a lot of >>>>>>>>> sense. But when the spec says nothing about suspension ... >>>>>>>> >>>>>>>> And it is intentional that suspension is not required. JVM/DI >>>>>>>> and JVM/PI >>>>>>>> used to require suspension for these kinds of get-the-info >>>>>>>> APIs. JVM/TI >>>>>>>> intentionally was designed to not require suspension. >>>>>>>> >>>>>>>> As I've said before, we could add a note about the data being >>>>>>>> potentially >>>>>>>> stale unless SuspendThread is used. I think of it like stat(2). >>>>>>>> You can >>>>>>>> fetch the file's info, but there's no guarantee that the info >>>>>>>> is current >>>>>>>> by the time you process what you got back. Is it too much >>>>>>>> motherhood to >>>>>>>> state that the data might be stale? I could go either way... >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>>>> Using a handshake on the owner thread will allow this to be >>>>>>>>>>> fixed in the future without forcing/using any safepoints. 
>>>>>>>>>> >>>>>>>>>> I have to think about that which is why I'm avoiding talking >>>>>>>>>> about >>>>>>>>>> handshakes in this thread. >>>>>>>>> >>>>>>>>> Effectively the handshake can "suspend" the thread whilst the >>>>>>>>> monitor is queried. In effect the operation would create a >>>>>>>>> per-thread safepoint. >>>>>>>> >>>>>>>> I "know" that, but I still need time to think about it and >>>>>>>> probably >>>>>>>> see the code to see if there are holes... >>>>>>>> >>>>>>>> >>>>>>>>> Semantically it is no different to the code actually >>>>>>>>> suspending the owner thread, but it can't actually do that >>>>>>>>> because suspends/resume don't nest. >>>>>>>> >>>>>>>> Yeah... we used have a suspend count back when we tracked >>>>>>>> internal and >>>>>>>> external suspends separately. That was a nightmare... >>>>>>>> >>>>>>>> Dan >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> David >>>>>>>>> >>>>>>>>>> Dan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>>>> Dan >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check >>>>>>>>>>>>>> thread state, it returns `true` when the thread is >>>>>>>>>>>>>> sleeping [3], or when it performs in native [4]. >>>>>>>>>>>>> >>>>>>>>>>>>> Sure but if the thread is actually suspended it can't >>>>>>>>>>>>> continue execution in the VM or in Java code. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> This appears to be an optimisation for the assumed >>>>>>>>>>>>>>> common case where threads are first suspended and then >>>>>>>>>>>>>>> the monitors are queried. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI >>>>>>>>>>>>>> spec - it just says "Get information about the object's >>>>>>>>>>>>>> monitor." >>>>>>>>>>>>> >>>>>>>>>>>>> Yes it was just an implementation optimisation, nothing to >>>>>>>>>>>>> do with the spec. 
>>>>>>>>>>>>> >>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect >>>>>>>>>>>>>> information in some case. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It starts with finding owner thread, but the owner might >>>>>>>>>>>>>> be just before wakeup. >>>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is >>>>>>>>>>>>>> called at safepoint in any case. >>>>>>>>>>>>> >>>>>>>>>>>>> Except we're moving away from safepoints to using >>>>>>>>>>>>> Handshakes, so this particular operation will require that >>>>>>>>>>>>> the apparent owner is Handshake-safe (by entering a >>>>>>>>>>>>> handshake with it) before querying the monitor. This would >>>>>>>>>>>>> still be preferable I think to always using a safepoint >>>>>>>>>>>>> for the entire operation. >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> [3] >>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>> [4] >>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> However there is still a potential bug as the thread >>>>>>>>>>>>>>> reported as the owner may not be suspended at the time >>>>>>>>>>>>>>> we first see it, and may release the monitor, but then >>>>>>>>>>>>>>> it may get suspended before we call: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ??owning_thread = >>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), >>>>>>>>>>>>>>> owner); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> and so we think it is still the monitor owner and >>>>>>>>>>>>>>> proceed to query the monitor information in a racy way. >>>>>>>>>>>>>>> This can't happen when suspension itself requires a >>>>>>>>>>>>>>> safepoint as the current thread won't go to that >>>>>>>>>>>>>>> safepoint during this code. 
However, if suspension is >>>>>>>>>>>>>>> implemented via a direct handshake with the target >>>>>>>>>>>>>>> thread then we have a problem. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>>> >> From serguei.spitsyn at oracle.com Wed Jun 17 15:42:04 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 17 Jun 2020 08:42:04 -0700 Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information In-Reply-To: References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> Message-ID: <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> Hi Yasumasa, This fix is not enough. The function JvmtiEnvBase::get_object_monitor_usage works in two modes: in VMop and non-VMop. The non-VMop mode has to be removed. Thanks, Serguei On 6/17/20 02:18, Yasumasa Suenaga wrote: > (Change subject for RFR) > > Hi, > > I filed it to JBS and upload a webrev for it. > Could you review it? > > ? JBS: https://bugs.openjdk.java.net/browse/JDK-8247729 > ? 
webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/ > > This change has passed tests on submit repo. > Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti on > Linux x64. > > > Thanks, > > Yasumasa > > > On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote: >> Yes. It seems we have a consensus. >> Thank you for taking care about it. >> >> Thanks, >> Serguei >> >> >> On 6/16/20 18:34, David Holmes wrote: >>>> Ok, may I file it to JBS and fix it? >>> >>> Go for it! :) >>> >>> Cheers, >>> David >>> >>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>>> Hi Dan, David and Yasumasa, >>>>> >>>>> >>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>>> Hi Dan, >>>>>>>>>>> >>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does not >>>>>>>>>>>>>>>> perform at safepoint. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target >>>>>>>>>>>>>>> is not suspended: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, >>>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>>> ?? 
JavaThread* calling_thread = JavaThread::current(); >>>>>>>>>>>>>>> ?? jvmtiError err = >>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, object, info_ptr); >>>>>>>>>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>>> ???? // Some of the critical threads were not suspended. >>>>>>>>>>>>>>> go to a safepoint and try again >>>>>>>>>>>>>>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, >>>>>>>>>>>>>>> object, info_ptr); >>>>>>>>>>>>>>> ???? VMThread::execute(&op); >>>>>>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>>>>>> ?? } >>>>>>>>>>>>>>> ?? return err; >>>>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>>>> >>>>>>>>>>>>>> I saw this code, so I guess there are some cases when >>>>>>>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from >>>>>>>>>>>>>> get_object_monitor_usage(). >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at >>>>>>>>>>>>>>>> first [1], but it would perform concurrently. >>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be >>>>>>>>>>>>>>>> changed to others in subsequent code. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For example, the owner might release the monitor before >>>>>>>>>>>>>>>> [2]. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The expectation is that when we find an owner thread it >>>>>>>>>>>>>>> is either suspended or not. If it is suspended then it >>>>>>>>>>>>>>> cannot release the monitor. If it is not suspended we >>>>>>>>>>>>>>> detect that and redo the whole query at a safepoint. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think the owner thread might resume unfortunately after >>>>>>>>>>>>>> suspending check. >>>>>>>>>>>>> >>>>>>>>>>>>> Yes you are right. I was thinking resuming also required a >>>>>>>>>>>>> safepoint but it only requires the Threads_lock. So yes >>>>>>>>>>>>> the code is wrong. >>>>>>>>>>>> >>>>>>>>>>>> Which code is wrong? 
>>>>>>>>>>>> >>>>>>>>>>>> Yes, a rogue resume can happen when the >>>>>>>>>>>> GetObjectMonitorUsage() caller >>>>>>>>>>>> has started the process of gathering the information while >>>>>>>>>>>> not at a >>>>>>>>>>>> safepoint. Thus the information returned by >>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>> >>>>>>>>>>> The code tries to make sure that it either collects data >>>>>>>>>>> about a monitor owned by a thread that is suspended, or else >>>>>>>>>>> it collects that data at a safepoint. But the owning thread >>>>>>>>>>> can be resumed just after the code determined it was >>>>>>>>>>> suspended. The monitor can then be released and the >>>>>>>>>>> information gathered not only stale but potentially >>>>>>>>>>> completely wrong as it could now be owned by a different >>>>>>>>>>> thread and will report that thread's entry count. >>>>>>>>>> >>>>>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information >>>>>>>>>> can be stale. In fact as soon as the implementation returns >>>>>>>>>> from the safepoint that gathered the info, the target thread >>>>>>>>>> could have moved on. >>>>>>>>> >>>>>>>>> That isn't the issue. That the info is stale is fine. But the >>>>>>>>> expectation is that the information was actually an accurate >>>>>>>>> snapshot of the state of the monitor at some point in time. >>>>>>>>> The current code does not ensure that. >>>>>>>> >>>>>>>> Please explain. I clearly don't understand why you think the info >>>>>>>> returned isn't "an accurate snapshot of the state of the monitor >>>>>>>> at some point in time". >>>>>>> >>>>>>> Because it may not be a "snapshot" at all. There is no >>>>>>> atomicity**. The reported owner thread may not own it any longer >>>>>>> when the entry count is read, so straight away you may have the >>>>>>> wrong entry count information. 
The set of threads trying to >>>>>>> acquire the monitor, or wait on the monitor can change in >>>>>>> unexpected ways. It would be possible for instance to report the >>>>>>> same thread as being the owner, being blocked trying to enter >>>>>>> the monitor, and being in the wait-set of the monitor - >>>>>>> apparently all at the same time! >>>>>>> >>>>>>> ** even if the owner is suspended we don't have complete >>>>>>> atomicity because threads can join the set of threads trying to >>>>>>> enter the monitor (unless they are all suspended). >>>>>> >>>>>> Consider the case when the monitor's owner is _not_ suspended: >>>>>> >>>>>> ? - GetObjectMonitorUsage() uses a safepoint to gather the info >>>>>> about >>>>>> ??? the object's monitor. Since we're at a safepoint, the info that >>>>>> ??? we are gathering cannot change until we return from the >>>>>> safepoint. >>>>>> ??? It is a snapshot and a valid one at that. >>>>>> >>>>>> Consider the case when the monitor's owner is suspended: >>>>>> >>>>>> ? - GetObjectMonitorUsage() will gather info about the object's >>>>>> ??? monitor while _not_ at a safepoint. Assuming that no other >>>>>> ??? thread is suspended, then entry_count can change because >>>>>> ??? another thread can block on entry while we are gathering >>>>>> ??? info. waiter_count and waiters can change if a thread was >>>>>> ??? in a timed wait that has timed out and now that thread is >>>>>> ??? blocked on re-entry. I don't think that notify_waiter_count >>>>>> ??? and notify_waiters can change. >>>>>> >>>>>> ??? So in this case, the owner info and notify info is stable, >>>>>> ??? but the entry_count and waiter info is not stable. >>>>>> >>>>>> Consider the case when the monitor is not owned: >>>>>> >>>>>> ? - GetObjectMonitorUsage() will start to gather info about the >>>>>> ??? object's monitor while _not_ at a safepoint. If it finds a >>>>>> ??? thread on the entry queue that is not suspended, then it will >>>>>> ??? 
bail out and redo the info gather at a safepoint. I just >>>>>> ??? noticed that it doesn't check for suspension for the threads >>>>>> ??? on the waiters list so a timed Object.wait() call can cause >>>>>> ??? some confusion here. >>>>>> >>>>>> ??? So in this case, the owner info is not stable if a thread >>>>>> ??? comes out of a timed wait and reenters the monitor. This >>>>>> ??? case is no different than if a "barger" thread comes in >>>>>> ??? after the NULL owner field is observed and enters the >>>>>> ??? monitor. We'll return that there is no owner, a list of >>>>>> ??? suspended pending entry thread and a list of waiting >>>>>> ??? threads. The reality is that the object's monitor is >>>>>> ??? owned by the "barger" that completely bypassed the entry >>>>>> ??? queue by virtue of seeing the NULL owner field at exactly >>>>>> ??? the right time. >>>>>> >>>>>> So the owner field is only stable when we have an owner. If >>>>>> that owner is not suspended, then the other fields are also >>>>>> stable because we gathered the info at a safepoint. If the >>>>>> owner is suspended, then the owner and notify info is stable, >>>>>> but the entry_count and waiter info is not stable. >>>>>> >>>>>> If we have a NULL owner field, then the info is only stable >>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>> That's deterministic, but not without some work. >>>>>> >>>>>> >>>>>> Okay so only when we gather the info at a safepoint is all >>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>> do that at a safepoint when the owner thread is not suspended >>>>>> or if owner == NULL and one of the entry threads is not >>>>>> suspended. If either of those conditions is not true, then >>>>>> the different pieces of info is unstable to varying degrees. 
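[Editor's sketch] The instability discussed above can be illustrated with a small Java model. This is not HotSpot code; `MonitorSnapshotSketch`, `MonitorModel`, `racyQuery` and `safepointQuery` are hypothetical names chosen for illustration. The model reduces a monitor to an owner and an entry count, and uses a `between` hook to stand in for whatever other threads might do between the two reads when the query is not made at a safepoint.

```java
// Hypothetical model for illustration only; not HotSpot code.
final class MonitorSnapshotSketch {

    // A monitor reduced to the two fields discussed in the thread.
    static final class MonitorModel {
        String owner;    // name of the owning thread, or null if unowned
        int entryCount;  // recursion count of the current owner

        MonitorModel(String owner, int entryCount) {
            this.owner = owner;
            this.entryCount = entryCount;
        }
    }

    // The (owner, entryCount) pair that a caller would be handed back.
    static final class Usage {
        final String owner;
        final int entryCount;

        Usage(String owner, int entryCount) {
            this.owner = owner;
            this.entryCount = entryCount;
        }

        @Override
        public String toString() {
            return "owner=" + owner + ", entryCount=" + entryCount;
        }
    }

    // Reads the two fields one after the other. The 'between' hook models
    // activity by other threads in the window between the two reads.
    static Usage racyQuery(MonitorModel m, Runnable between) {
        String owner = m.owner;
        between.run();
        int count = m.entryCount;
        return new Usage(owner, count);
    }

    // Models a query made at a safepoint: nothing runs between the reads.
    static Usage safepointQuery(MonitorModel m) {
        return new Usage(m.owner, m.entryCount);
    }

    public static void main(String[] args) {
        MonitorModel m = new MonitorModel("A", 3);
        // Between the reads, A releases the monitor and B enters it once.
        Usage torn = racyQuery(m, () -> { m.owner = "B"; m.entryCount = 1; });
        System.out.println("racy:      " + torn);
        System.out.println("safepoint: " + safepointQuery(m));
    }
}
```

In this model the racy query pairs owner A with B's entry count, a combination that never existed at any single point in time, while the safepoint-style query returns a pair that did.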
>>>>>> >>>>>> As for this claim: >>>>>> >>>>>>> It would be possible for instance to report the same thread >>>>>>> as being the owner, being blocked trying to enter the monitor, >>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>> the same time! >>>>>> >>>>>> I can't figure out a way to make that scenario work. If the >>>>>> thread is seen as the owner and is not suspended, then we >>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>> then be seen as on the entry queue or on the wait queue since >>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>> suspended, then we gather info at a safepoint. If it is >>>>>> suspended on the entry queue, then it can't be seen on the >>>>>> wait queue. >>>>>> >>>>>> So the info instability of this API is bad, but it's not >>>>>> quite that bad. :-) (That is a small mercy.) >>>>>> >>>>>> >>>>>> Handshaking is not going to make this situation any better >>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>> handshake with the owner, the stability or instability of >>>>>> the other fields remains the same as when SuspendThread is >>>>>> used. Handshaking with all threads won't make the data as >>>>>> stable as when at a safepoint because individual threads >>>>>> can resume execution after doing their handshake so there >>>>>> will still be field instability. >>>>>> >>>>>> >>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>> data at a safepoint. Yes, I've changed my mind. >>>>> >>>>> I agree with this. >>>>> The advantages are: >>>>> ??- the result is stable >>>>> ??- the implementation can be simplified >>>>> >>>>> Performance impact is not very clear but should not be that >>>>> big as suspending all the threads has some overhead too. >>>>> I'm not sure if using handshakes can make performance better. >>>> >>>> Ok, may I file it to JBS and fix it? 
>>>> >>>> Yasumasa >>>> >>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>>> Dan >>>>>> >>>>>>> >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>>> The only way to make sure you don't have stale information is >>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc >>>>>>>>>> should have more clear about the possibility of returning stale >>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's being >>>>>>>>>>> suspended so I can't see how this could be construed as an >>>>>>>>>>> agent bug. >>>>>>>>>> >>>>>>>>>> In your scenario above, you mention that the target thread was >>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>> >>>>>>>>>> All three of those calls: SuspendThread(), >>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>> and ResumeThread() are made by the agent and the agent should >>>>>>>>>> not >>>>>>>>>> resume the target thread while also calling >>>>>>>>>> GetObjectMonitorUsage(). >>>>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>>>> >>>>>>>>> Perhaps. I was thinking more generally about an independent >>>>>>>>> resume, but you're right that doesn't really make a lot of >>>>>>>>> sense. But when the spec says nothing about suspension ... >>>>>>>> >>>>>>>> And it is intentional that suspension is not required. JVM/DI >>>>>>>> and JVM/PI >>>>>>>> used to require suspension for these kinds of get-the-info >>>>>>>> APIs. JVM/TI >>>>>>>> intentionally was designed to not require suspension. >>>>>>>> >>>>>>>> As I've said before, we could add a note about the data being >>>>>>>> potentially >>>>>>>> stale unless SuspendThread is used. I think of it like stat(2). 
>>>>>>>> You can >>>>>>>> fetch the file's info, but there's no guarantee that the info >>>>>>>> is current >>>>>>>> by the time you process what you got back. Is it too much >>>>>>>> motherhood to >>>>>>>> state that the data might be stale? I could go either way... >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>>>> Using a handshake on the owner thread will allow this to be >>>>>>>>>>> fixed in the future without forcing/using any safepoints. >>>>>>>>>> >>>>>>>>>> I have to think about that which is why I'm avoiding talking >>>>>>>>>> about >>>>>>>>>> handshakes in this thread. >>>>>>>>> >>>>>>>>> Effectively the handshake can "suspend" the thread whilst the >>>>>>>>> monitor is queried. In effect the operation would create a >>>>>>>>> per-thread safepoint. >>>>>>>> >>>>>>>> I "know" that, but I still need time to think about it and >>>>>>>> probably >>>>>>>> see the code to see if there are holes... >>>>>>>> >>>>>>>> >>>>>>>>> Semantically it is no different to the code actually >>>>>>>>> suspending the owner thread, but it can't actually do that >>>>>>>>> because suspends/resume don't nest. >>>>>>>> >>>>>>>> Yeah... we used have a suspend count back when we tracked >>>>>>>> internal and >>>>>>>> external suspends separately. That was a nightmare... >>>>>>>> >>>>>>>> Dan >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> David >>>>>>>>> >>>>>>>>>> Dan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>>>> Dan >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check >>>>>>>>>>>>>> thread state, it returns `true` when the thread is >>>>>>>>>>>>>> sleeping [3], or when it performs in native [4]. >>>>>>>>>>>>> >>>>>>>>>>>>> Sure but if the thread is actually suspended it can't >>>>>>>>>>>>> continue execution in the VM or in Java code. 
>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> This appears to be an optimisation for the assumed >>>>>>>>>>>>>>> common case where threads are first suspended and then >>>>>>>>>>>>>>> the monitors are queried. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI >>>>>>>>>>>>>> spec - it just says "Get information about the object's >>>>>>>>>>>>>> monitor." >>>>>>>>>>>>> >>>>>>>>>>>>> Yes it was just an implementation optimisation, nothing to >>>>>>>>>>>>> do with the spec. >>>>>>>>>>>>> >>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect >>>>>>>>>>>>>> information in some case. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It starts with finding owner thread, but the owner might >>>>>>>>>>>>>> be just before wakeup. >>>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is >>>>>>>>>>>>>> called at safepoint in any case. >>>>>>>>>>>>> >>>>>>>>>>>>> Except we're moving away from safepoints to using >>>>>>>>>>>>> Handshakes, so this particular operation will require that >>>>>>>>>>>>> the apparent owner is Handshake-safe (by entering a >>>>>>>>>>>>> handshake with it) before querying the monitor. This would >>>>>>>>>>>>> still be preferable I think to always using a safepoint >>>>>>>>>>>>> for the entire operation. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> [3] >>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>> [4] >>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> However there is still a potential bug as the thread >>>>>>>>>>>>>>> reported as the owner may not be suspended at the time >>>>>>>>>>>>>>> we first see it, and may release the monitor, but then >>>>>>>>>>>>>>> it may get suspended before we call: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ??owning_thread = >>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), >>>>>>>>>>>>>>> owner); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> and so we think it is still the monitor owner and >>>>>>>>>>>>>>> proceed to query the monitor information in a racy way. >>>>>>>>>>>>>>> This can't happen when suspension itself requires a >>>>>>>>>>>>>>> safepoint as the current thread won't go to that >>>>>>>>>>>>>>> safepoint during this code. However, if suspension is >>>>>>>>>>>>>>> implemented via a direct handshake with the target >>>>>>>>>>>>>>> thread then we have a problem. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>>> >> From chris.plummer at oracle.com Wed Jun 17 20:34:23 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 17 Jun 2020 13:34:23 -0700 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp Message-ID: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> Hello, Please help review the following: https://bugs.openjdk.java.net/browse/JDK-8247533 http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html The CR contains all the needed details. Here's a summary of changes in each file: src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp -Instead of throwing an exception when the OS ThreadID is invalid, print a warning. src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c -Improve a print_debug message src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java -Deal with the array of registers read in being null due to the OS ThreadID not being valid. 
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" appearing twice when printing the exception. thanks, Chris From coleen.phillimore at oracle.com Wed Jun 17 21:25:25 2020 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 17 Jun 2020 17:25:25 -0400 Subject: RFR 8247808: Move JVMTI strong oops to OopStorage Message-ID: Summary: Remove JVMTI oops_do calls from JVMTI and GCs Tested with tier1-3, also built shenandoah to verify shenandoah changes. open webrev at http://cr.openjdk.java.net/~coleenp/2020/8247808.01/webrev bug link https://bugs.openjdk.java.net/browse/JDK-8247808 Thanks, Coleen From jcbeyler at google.com Wed Jun 17 22:47:54 2020 From: jcbeyler at google.com (Jean Christophe Beyler) Date: Wed, 17 Jun 2020 15:47:54 -0700 Subject: JVMTI callback SampledObjectAlloc always fires for first allocation in a new thread In-Reply-To: References: Message-ID: Hi Markus, No it's fine. I figured out what I was doing wrong by just opening my eyes (number of sampling vs number of samples in my cache...); I saw the same issue (not that I doubted actually) and reviewed the code paths. It is a 1-liner to fix this but I'm just working on the test and ensuring it passes, Jc On Wed, Jun 17, 2020 at 1:00 AM Markus Gaisbauer wrote: > Please forget about the Windows thing. I forgot that my colleague saw the > same 0xf1f1f1f1f1f1f1f1 also on Linux. > > Markus > > On Wed, Jun 17, 2020 at 9:57 AM Markus Gaisbauer < > markus.gaisbauer at gmail.com> wrote: > >> Hi Jean, >> >> Thank you for having a look at this. >> >> I attached the code of my basic JVMTI agent. I ran my tests on Windows. >> Maybe this 0xf1f1f1f1f1f1f1f1 is a Windows thing and Linux initializes the >> memory to all zeros. 
>> >> Regards, >> Markus >> >> >> On Tue, Jun 16, 2020 at 2:25 AM Jean Christophe Beyler < >> jcbeyler at google.com> wrote: >> >>> Hi Markus, >>> >>> I played around adding your Java code in the testing framework and I >>> don't get exactly the same failure as you do. Basically, I get about 5% >>> samples compared to the number of threads, whereas you seem to get a sample >>> for each element. Could you add the code you used for the agent so I can >>> see if you are doing something different than I am in that regard? >>> >>> This doesn't change the issue, I'm just curious why you seem to be >>> exposing it more. I'm still digging into what would be the right solution >>> for this. >>> >>> Thanks, >>> Jc >>> >>> On Mon, Jun 15, 2020 at 9:53 AM Jean Christophe Beyler < >>> jcbeyler at google.com> wrote: >>> >>>> Hi Markus, >>>> >>>> I created: >>>> https://bugs.openjdk.java.net/browse/JDK-8247615 >>>> >>>> And I'll see what needs to be done for it :) >>>> Jc >>>> >>>> On Fri, Jun 5, 2020 at 3:45 AM Markus Gaisbauer < >>>> markus.gaisbauer at gmail.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> JVMTI callback SampledObjectAlloc is currently always called for the >>>>> first allocation of a thread. This generates a lot of bias in an >>>>> application that regularly starts new threads. >>>>> >>>>> I tested this with latest Java 11 and Java 15. >>>>> >>>>> E.g. here is a sample that creates 100 threads and allocates one >>>>> object in each thread. 
>>>>> >>>>> public class AllocationProfilingBiasReproducer { >>>>> public static void main(String[] args) throws Exception { >>>>> for (int i = 0; i < 100; i++) { >>>>> new Thread(new Task(), "Task " + i).start(); >>>>> Thread.sleep(1); >>>>> } >>>>> Thread.sleep(1000); >>>>> } >>>>> private static class Task implements Runnable { >>>>> @Override >>>>> public void run() { >>>>> new A(); >>>>> } >>>>> } >>>>> private static class A { >>>>> } >>>>> } >>>>> >>>>> I built a simple JVMTI agent that registers SampledObjectAlloc >>>>> callback and sets interval to 1 MB with SetHeapSamplingInterval. The >>>>> callback simply logs thread name and class name of allocated object. >>>>> >>>>> I see the following output: >>>>> >>>>> SampledObjectAlloc Ljava/lang/String; via Task 0 >>>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 1 >>>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 2 >>>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 3 >>>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 4 >>>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 5 >>>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 6 >>>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 7 >>>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 8 >>>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 9 >>>>> SampledObjectAlloc LAllocationProfilingBiasReproducer$A; via Task 10 >>>>> ... >>>>> >>>>> This is not expected. >>>>> >>>>> I set a breakpoint in my SampledObjectAlloc callback and observed the >>>>> following: >>>>> >>>>> In MemAllocator::Allocation::notify_allocation_jvmti_sampler() the >>>>> local var bytes_since_last is always 0xf1f1f1f1f1f1f1f1 for first >>>>> allocation of a thread. So first allocation is always reported to my agent. 
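[Editor's sketch] The observation above can be modelled in a few lines of Java. This is a deliberately simplified model, not the HotSpot sampling logic: `SamplingBiasSketch`, `ThreadSampler` and `countFirstAllocationSamples` are hypothetical names, and the fixed-interval check is an assumption made for illustration. It only shows why a per-thread byte counter that starts at the reported value 0xf1f1f1f1f1f1f1f1 instead of zero would make every thread's first allocation look like it crossed the sampling interval.

```java
// Simplified, hypothetical model; not the HotSpot implementation.
final class SamplingBiasSketch {

    // The value the report observed in bytes_since_last on a thread's first allocation.
    static final long REPORTED_GARBAGE = 0xf1f1f1f1f1f1f1f1L;

    // One sampler per thread, as with a thread-local allocation buffer.
    static final class ThreadSampler {
        private final long intervalBytes;
        private long bytesSinceLastSample;

        ThreadSampler(long intervalBytes, long initialBytesSinceLastSample) {
            this.intervalBytes = intervalBytes;
            this.bytesSinceLastSample = initialBytesSinceLastSample;
        }

        // Returns true when this allocation should be reported as a sample.
        // The counter is treated as unsigned, like a C++ size_t.
        boolean onAllocation(long sizeBytes) {
            bytesSinceLastSample += sizeBytes;
            if (Long.compareUnsigned(bytesSinceLastSample, intervalBytes) >= 0) {
                bytesSinceLastSample = 0;
                return true;
            }
            return false;
        }
    }

    // Starts 'threads' fresh samplers and counts how many report their
    // very first (16-byte) allocation with a 1 MB interval.
    static int countFirstAllocationSamples(int threads, long initialCounter) {
        int sampled = 0;
        for (int i = 0; i < threads; i++) {
            ThreadSampler sampler = new ThreadSampler(1L << 20, initialCounter);
            if (sampler.onAllocation(16)) {
                sampled++;
            }
        }
        return sampled;
    }

    public static void main(String[] args) {
        System.out.println("garbage initial value: "
                + countFirstAllocationSamples(100, REPORTED_GARBAGE));
        System.out.println("zero initial value:    "
                + countFirstAllocationSamples(100, 0L));
    }
}
```

With the garbage starting value all 100 modelled threads report their first allocation; with an explicit zero none of them do, which is the behavior one would expect for a 16-byte allocation against a 1 MB interval.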
>>>>> >>>>> ThreadLocalAllocBuffer::_bytes_since_last_sample_point does not seem >>>>> to be explicitly initialized before accessing it for the first time. I >>>>> assume 0xf1f1f1f1f1f1f1f1 is a default value provided by some Hotspot >>>>> allocator. Only after the first event fired, >>>>> notify_allocation_jvmti_sampler >>>>> calls ThreadLocalAllocBuffer::set_sample_end which initializes >>>>> _bytes_since_last_sample_point to a proper value. >>>>> >>>>> I am looking for someone who could create a JIRA ticket for this. >>>>> >>>>> Regards, >>>>> Markus >>>>> >>>> >>>> >>>> -- >>>> >>>> Thanks, >>>> Jc >>>> >>> >>> >>> -- >>> >>> Thanks, >>> Jc >>> >> -- Thanks, Jc From david.holmes at oracle.com Wed Jun 17 23:49:49 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 18 Jun 2020 09:49:49 +1000 Subject: RFR 8247808: Move JVMTI strong oops to OopStorage In-Reply-To: References: Message-ID: <890246ae-2d56-95ff-c360-dd66532579a8@oracle.com> Hi Coleen, On 18/06/2020 7:25 am, coleen.phillimore at oracle.com wrote: > Summary: Remove JVMTI oops_do calls from JVMTI and GCs > > Tested with tier1-3, also built shenandoah to verify shenandoah changes. > > open webrev at http://cr.openjdk.java.net/~coleenp/2020/8247808.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8247808 This is a nice cleanup and simplification of the code for working with OopStorage! So LGTM. One query ... I'm assuming that the processing previously done in JvmtiExport::oops_do is now done by OopStorageSet::vm_global()->oops_do. In most cases I can see the call to OopStorageSet::vm_global()->oops_do in the same vicinity as the call to JvmtiExport::oops_do, but not all i.e. ZRootsIterator::oops_do and ShenandoahSerialRoots::oops_do. Tracking through it seems that for those GCs the VM global roots are processed concurrently, whereas currently JVMTI roots are not. Does that make any potential difference?
Thanks, David ----- > Thanks, > Coleen From suenaga at oss.nttdata.com Thu Jun 18 00:11:17 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Thu, 18 Jun 2020 09:11:17 +0900 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> Message-ID: <3d61636e-02c5-3c49-0561-a69fb75fa7ce@oss.nttdata.com> Hi Chris, Can you handle only ESRCH in this case? ptrace(2) might fail for other reasons. Thanks, Yasumasa On 2020/06/18 5:34, Chris Plummer wrote: > Hello, > > Please help review the following: > > https://bugs.openjdk.java.net/browse/JDK-8247533 > http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html > > The CR contains all the needed details. Here's a summary of changes in each file: > > src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp > src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m > src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp > -Instead of throwing an exception when the OS ThreadID is invalid, print a warning. > > src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c > -Improve a print_debug message > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java > -Deal with the array of registers read in being null due to the OS ThreadID not being valid. > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java > -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" appearing twice when printing the exception.
> > thanks, > > Chris From coleen.phillimore at oracle.com Thu Jun 18 01:09:33 2020 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 17 Jun 2020 21:09:33 -0400 Subject: RFR 8247808: Move JVMTI strong oops to OopStorage In-Reply-To: <890246ae-2d56-95ff-c360-dd66532579a8@oracle.com> References: <890246ae-2d56-95ff-c360-dd66532579a8@oracle.com> Message-ID: On 6/17/20 7:49 PM, David Holmes wrote: > Hi Coleen, > > On 18/06/2020 7:25 am, coleen.phillimore at oracle.com wrote: >> Summary: Remove JVMTI oops_do calls from JVMTI and GCs >> >> Tested with tier1-3, also built shenandoah to verify shenandoah changes. >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2020/8247808.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8247808 > > This is a nice cleanup and simplification of the code for working with > OopStorage! So LGTM. Thanks, David. > > One query ... I'm assuming that the processing previously done in > JvmtiExport::oops_do is now done by > OopStorageSet::vm_global()->oops_do. In most cases I can see the call > to OopStorageSet::vm_global()->oops_do in the same vicinity as the > call to JvmtiExport::oops_do, but not all i.e. ZRootsIterator::oops_do > and ShenandoahSerialRoots::oops_do. Tracking through it seems that for > those GCs the VM global roots are processed concurrently, whereas > currently JVMTI roots are not. Does that make any potential difference? ZGC and Shenandoah are better because when the vm_global() roots grow, they'll be processed faster. Because accessing oops in OopStorage uses resolve() which uses the Access API, any oops will be marked or fixed when accessed if the GC hasn't yet gotten to this oop in its concurrent processing. Kim noticed that G1 and ParallelGC should be processing these roots in parallel (with many threads, since OopStorage has that support) and he's going to or has filed a bug to fix it.
As we add more things to OopStorage (see upcoming RFRs), this will become important. Thanks, Coleen > > Thanks, > David > ----- > >> Thanks, >> Coleen From serguei.spitsyn at oracle.com Thu Jun 18 01:46:16 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 17 Jun 2020 18:46:16 -0700 Subject: RFR 8247808: Move JVMTI strong oops to OopStorage In-Reply-To: References: Message-ID: <0a3648ec-27be-7911-b7b3-cff48f55f793@oracle.com> Hi Coleen, Nice simplification! It looks good to me. I assume you will run the nsk.jvmti tests. Thanks, Serguei On 6/17/20 14:25, coleen.phillimore at oracle.com wrote: > Summary: Remove JVMTI oops_do calls from JVMTI and GCs > > Tested with tier1-3, also built shenandoah to verify shenandoah changes. > > open webrev at http://cr.openjdk.java.net/~coleenp/2020/8247808.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8247808 > > Thanks, > Coleen From david.holmes at oracle.com Thu Jun 18 02:38:25 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 18 Jun 2020 12:38:25 +1000 Subject: RFR(S) 8246019 PerfClassTraceTime slows down VM start-up In-Reply-To: <20411ebb-5b06-83da-07c8-b751252efecd@oracle.com> References: <3ed0b32d-469e-6f89-f4b3-b78d5bcce700@oracle.com> <20411ebb-5b06-83da-07c8-b751252efecd@oracle.com> Message-ID: <91d7ab26-cccb-8d9c-8929-5b37664f6313@oracle.com> Hi Ioi, On 17/06/2020 1:19 pm, Ioi Lam wrote: > On 6/16/20 6:20 PM, David Holmes wrote: >> Hi Ioi, >> >> On 17/06/2020 6:14 am, Ioi Lam wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8246019 >>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/ >>> >>> >>> PerfClassTraceTime is (a rarely used feature) for measuring the time >>> spent during class linking and initialization. >> >> "A special command jcmd PerfCounter.print >> prints all performance counters in the process." >> >> How do you know this is a "rarely used feature"? 
> Hi David, > > Sure, the counter will be dumped, but by "rarely used" -- I mean no one > will find this particular counter useful, and no one will be actively > looking at it. > > I changed two parts of the code -- class init and class linking. > > For class initialization, the counter may be useful for people who want > to know how much time is spent in their <clinit> functions, and my patch > doesn't change that. It only avoids using the counter when a class has > no <clinit>, i.e., we know that the counter counts nothing (except for a > logging statement). I can see where you are coming from here. Today we keep track of the time taken to mark a class as initialized when there is no clinit to execute, and we actually record pure timer overhead as it dwarfs the simple update to the class state. With your change we now won't track the time taken to mark the class as initialized. In both cases the time recorded is inaccurate - in opposite senses. In that regard your slight underestimation of the class initialization cost seems better than the present over-estimate. > ===== > > For class linking, no user code is executed, so it only measures VM > code. If it's useful for anyone, that would be VM engineers like me who > are trying to optimize the speed of class loading. However, due to the > overhead of the counter vs what it's trying to measure, the results are > pretty meaningless. > > Note that I've not disabled the counter altogether. Instead, I disable > it only when linking a CDS shared class, and we know that very little is > happening for this class (e.g., no verification). Yes I have little concern for this part. Linking is a multi-phase process so "time for linking" is already ill-defined. And the fact you only do it for CDS makes it even less interesting. > I think the class linking timer might have been useful 15 years ago when > it was introduced, or it might be useful today when CDS is disabled.
But > with CDS enabled, we are paying a constant price that seems to benefit > no one. > > I think we should short-circuit it when it seems appropriate. If this > indeed causes problems for our users, it's easy to re-enable it. That's > better than just keeping this forever just because we're afraid to touch > anything. I'm uncomfortable with both the "keep forever as we're too afraid to change" and "change it now and restore it if anyone complains" ends of this spectrum. Obviously we need to make progress, but the "change it now and change back later if needed" is a bit naive, as once any change is made we can't change back without affecting another set of users, and we don't know how long it will be before the change reaches users and the problems return to us. From a CSR perspective I want to see that due diligence has been applied with regard to these behavioural changes, and JDK engineers are often not in a position to understand how end users use this kind of information. I don't have a solution for that general problem. In this particular case I think under-estimating the class initialization overhead is better than the present over-estimate. Though anyone tracking the trends here will be surprised when the cost suddenly drops. >> >> I find it hard to evaluate whether this short-circuiting of the time >> tracing is reasonable or not. Obviously any monitoring mechanism >> should impose minimal overhead compared to what is being measured, and >> these timers fall short in that regard. But if these stats become >> meaningless then they may as well be removed. >> >> I think the serviceability folk (cc'd) need to evaluate this in the >> context of the M&M tools. >> >>> However, it's quite expensive and it needs to start and stop a bunch >>> of timers. With CDS, it's quite often for the overhead of the timer >>> itself to be much more than the time it's trying to measure, giving >>> unreliable measurement. 
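The trade-off under discussion in this thread can be sketched in a few lines. This is a hypothetical illustration, not HotSpot code: `PerfStats`, `ScopedClassTimer`, and the `expect_work` flag are invented names standing in for PerfClassTraceTime and for checks such as "does the class have an initializer to run" or "is this a CDS shared class". The idea is to count every invocation but pay for clock reads only when the timed region is expected to do real work.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical counters; HotSpot's real perf counters are not modeled here.
struct PerfStats {
  std::uint64_t invocations = 0;   // bumped on every use
  std::uint64_t timed_events = 0;  // uses that actually read the clock
  std::chrono::nanoseconds elapsed{0};
};

// RAII helper: always counts the invocation, but starts and stops the
// clock only when the caller expects non-trivial work in the scope.
class ScopedClassTimer {
 public:
  ScopedClassTimer(PerfStats& stats, bool expect_work)
      : stats_(stats), timing_(expect_work) {
    ++stats_.invocations;
    if (timing_) {
      ++stats_.timed_events;
      start_ = std::chrono::steady_clock::now();
    }
  }

  ~ScopedClassTimer() {
    if (timing_) {
      stats_.elapsed += std::chrono::duration_cast<std::chrono::nanoseconds>(
          std::chrono::steady_clock::now() - start_);
    }
  }

  ScopedClassTimer(const ScopedClassTimer&) = delete;
  ScopedClassTimer& operator=(const ScopedClassTimer&) = delete;

 private:
  PerfStats& stats_;
  bool timing_;
  std::chrono::steady_clock::time_point start_{};
};

// Example driver: one class with an initializer to run, one without.
inline PerfStats run_example() {
  PerfStats stats;
  { ScopedClassTimer t(stats, /*expect_work=*/true); }
  { ScopedClassTimer t(stats, /*expect_work=*/false); }
  return stats;
}
```

With this shape the untimed path costs one increment and one branch, at the price of slightly under-reporting elapsed time, which is the inaccuracy weighed in the replies above.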
>>> >>> In this patch, when it's clear that the init and linking will be very >>> quick, I disable the timer and count only the number of invocations. >>> This shows a small improvement in start-up >> >> I'm curious if you tried forcing EagerInitialization to be true to >> see how that improves the baseline. I've always noticed eager_init in >> the code, but hadn't realized it is disabled by default. >> > > I think it cannot be done by default, as it will violate the JLS. A > class can be initialized only when it's touched by bytecodes. I'm also not sure it violates JLS as you can't directly query if a class is initialized, but anyway I wasn't suggesting turning this on by default, I meant only in regard to get a performance baseline to compare against the changes you have implemented here. Aside: I have to wonder whether anyone uses EagerInitialization or whether we should get rid of it? > It can also backfire as we may load many classes without initializing > them. E.g., during bytecode verification, we load many classes and just > check that one is a supertype of another. Not sure what is backfiring here ?? Thanks, David ----- > Thanks > - Ioi > >> Cheers, >> David >> ----- >> >>> Results of " perf stat -r 100 bin/java -Xshare:on >>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version "
>>>
>>> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) -
>>> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) --
>>> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) -
>>> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) -----
>>> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ----
>>> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) ---
>>> 59608782 59340173 (-268609)   ----   41.198  41.508 (  0.310) +
>>> 59614612 59325177 (-289435)   -----  41.397  41.738 (  0.341) ++
>>> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ----
>>> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) ---
>>> ================================================
>>> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) --
>>> instruction delta = -284608    -0.4774%
>>> time        delta = -0.434 ms  -1.0435%
>>>
>>> The number of PerfClassTraceTime's used is reduced from 564 to 116 >>> (so we have an overhead of about 715 instructions per use, yikes!). > From david.holmes at oracle.com Thu Jun 18 02:43:37 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 18 Jun 2020 12:43:37 +1000 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> Message-ID: Hi Chris, On 18/06/2020 6:34 am, Chris Plummer wrote: > Hello, > > Please help review the following: > > https://bugs.openjdk.java.net/browse/JDK-8247533 > http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html > > The CR contains all the needed details. Here's a summary of changes in > each file: The problem sounds to me like a variation of the more general problem of not ensuring a thread is kept alive whilst acting upon it. I don't know how the SA finds these references to the threads it is going to stackwalk, but is it possible to fix this via appropriate uses of ThreadsListHandle/Iterator? Cheers, David > src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp > src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m > src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp > -Instead of throwing an exception when the OS ThreadID is invalid, print > a warning.
> > src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c > -Improve a print_debug message > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java > > -Deal with the array of registers read in being null due to the OS > ThreadID not being valid. > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java > > -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" appearing > twice when printing the exception. > > thanks, > > Chris From suenaga at oss.nttdata.com Thu Jun 18 02:59:45 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Thu, 18 Jun 2020 11:59:45 +0900 Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information In-Reply-To: <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> Message-ID: Hi Serguei, Thanks for your comment! I uploaded new webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/ I'm not sure the following change is correct. Can we assume owning_thread is not NULL at safepoint? 
All tests on submit repo and serviceability/jvmti and vmTestbase/nsk/jvmti have passed with this change.
```
     // This monitor is owned so we have to find the owning JavaThread.
     owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
-    // Cannot assume (owning_thread != NULL) here because this function
-    // may not have been called at a safepoint and the owning_thread
-    // might not be suspended.
-    if (owning_thread != NULL) {
-      // The monitor's owner either has to be the current thread, at safepoint
-      // or it has to be suspended. Any of these conditions will prevent both
-      // contending and waiting threads from modifying the state of
-      // the monitor.
-      if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
-        // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
-        // will not make it back to the JVM/TI agent. The error code will
-        // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
-        // will retry the call via a VM_GetObjectMonitorUsage VM op.
-        return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
-      }
-      HandleMark hm;
+    assert(owning_thread != NULL, "owning JavaThread must not be NULL");
     Handle th(current_thread, owning_thread->threadObj());
     ret.owner = (jthread)jni_reference(calling_thread, th);
```
Thanks, Yasumasa On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote: > Hi Yasumasa, > > This fix is not enough. > The function JvmtiEnvBase::get_object_monitor_usage works in two modes: in VMop and non-VMop. > The non-VMop mode has to be removed. > > Thanks, > Serguei > > > On 6/17/20 02:18, Yasumasa Suenaga wrote: >> (Change subject for RFR) >> >> Hi, >> >> I filed it to JBS and uploaded a webrev for it. >> Could you review it? >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8247729 >> webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/ >> >> This change has passed tests on submit repo. >> Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64.
>> >> >> Thanks, >> >> Yasumasa >> >> >> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote: >>> Yes. It seems we have a consensus. >>> Thank you for taking care about it. >>> >>> Thanks, >>> Serguei >>> >>> >>> On 6/16/20 18:34, David Holmes wrote: >>>>> Ok, may I file it to JBS and fix it? >>>> >>>> Go for it! :) >>>> >>>> Cheers, >>>> David >>>> >>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>>>> Hi Dan, David and Yasumasa, >>>>>> >>>>>> >>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>>>> Hi Dan, >>>>>>>>>>>> >>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is not suspended: >>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>>   JavaThread* calling_thread = JavaThread::current();
>>>>>>>>>>>>>>>>   jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>   if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>     // Some of the critical threads were not suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>     VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>     VMThread::execute(&op);
>>>>>>>>>>>>>>>>     err = op.result();
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>   return err;
>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage(). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently. >>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be changed to others in subsequent code. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> For example, the owner might release the monitor before [2]. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think the owner thread might resume unfortunately after suspending check. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong. >>>>>>>>>>>>> >>>>>>>>>>>>> Which code is wrong? >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller >>>>>>>>>>>>> has started the process of gathering the information while not at a >>>>>>>>>>>>> safepoint. Thus the information returned by GetObjectMonitorUsage() >>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>> >>>>>>>>>>>> The code tries to make sure that it either collects data about a monitor owned by a thread that is suspended, or else it collects that data at a safepoint.
But the owning thread can be resumed just after the code determined it was suspended. The monitor can then be released and the information gathered is not only stale but potentially completely wrong as it could now be owned by a different thread and will report that thread's entry count. >>>>>>>>>>> >>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information >>>>>>>>>>> can be stale. In fact as soon as the implementation returns >>>>>>>>>>> from the safepoint that gathered the info, the target thread >>>>>>>>>>> could have moved on. >>>>>>>>>> >>>>>>>>>> That isn't the issue. That the info is stale is fine. But the expectation is that the information was actually an accurate snapshot of the state of the monitor at some point in time. The current code does not ensure that. >>>>>>>>> >>>>>>>>> Please explain. I clearly don't understand why you think the info >>>>>>>>> returned isn't "an accurate snapshot of the state of the monitor >>>>>>>>> at some point in time". >>>>>>>> >>>>>>>> Because it may not be a "snapshot" at all. There is no atomicity**. The reported owner thread may not own it any longer when the entry count is read, so straight away you may have the wrong entry count information. The set of threads trying to acquire the monitor, or wait on the monitor can change in unexpected ways. It would be possible for instance to report the same thread as being the owner, being blocked trying to enter the monitor, and being in the wait-set of the monitor - apparently all at the same time! >>>>>>>> >>>>>>>> ** even if the owner is suspended we don't have complete atomicity because threads can join the set of threads trying to enter the monitor (unless they are all suspended). >>>>>>> >>>>>>> Consider the case when the monitor's owner is _not_ suspended: >>>>>>> >>>>>>> - GetObjectMonitorUsage() uses a safepoint to gather the info about >>>>>>> the object's monitor.
Since we're at a safepoint, the info that >>>>>>> we are gathering cannot change until we return from the safepoint. >>>>>>> It is a snapshot and a valid one at that. >>>>>>> >>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>> >>>>>>> - GetObjectMonitorUsage() will gather info about the object's >>>>>>> monitor while _not_ at a safepoint. Assuming that no other >>>>>>> thread is suspended, then entry_count can change because >>>>>>> another thread can block on entry while we are gathering >>>>>>> info. waiter_count and waiters can change if a thread was >>>>>>> in a timed wait that has timed out and now that thread is >>>>>>> blocked on re-entry. I don't think that notify_waiter_count >>>>>>> and notify_waiters can change. >>>>>>> >>>>>>> So in this case, the owner info and notify info is stable, >>>>>>> but the entry_count and waiter info is not stable. >>>>>>> >>>>>>> Consider the case when the monitor is not owned: >>>>>>> >>>>>>> - GetObjectMonitorUsage() will start to gather info about the >>>>>>> object's monitor while _not_ at a safepoint. If it finds a >>>>>>> thread on the entry queue that is not suspended, then it will >>>>>>> bail out and redo the info gather at a safepoint. I just >>>>>>> noticed that it doesn't check for suspension for the threads >>>>>>> on the waiters list so a timed Object.wait() call can cause >>>>>>> some confusion here. >>>>>>> >>>>>>> So in this case, the owner info is not stable if a thread >>>>>>> comes out of a timed wait and reenters the monitor. This >>>>>>> case is no different than if a "barger" thread comes in >>>>>>> after the NULL owner field is observed and enters the >>>>>>> monitor. We'll return that there is no owner, a list of >>>>>>> suspended pending entry threads and a list of waiting >>>>>>> threads. The reality is that the object's monitor is >>>>>>>
owned by the "barger" that completely bypassed the entry >>>>>>> queue by virtue of seeing the NULL owner field at exactly >>>>>>> the right time. >>>>>>> >>>>>>> So the owner field is only stable when we have an owner. If >>>>>>> that owner is not suspended, then the other fields are also >>>>>>> stable because we gathered the info at a safepoint. If the >>>>>>> owner is suspended, then the owner and notify info is stable, >>>>>>> but the entry_count and waiter info is not stable. >>>>>>> >>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>> That's deterministic, but not without some work. >>>>>>> >>>>>>> >>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>>> do that at a safepoint when the owner thread is not suspended >>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>> suspended. If either of those conditions is not true, then >>>>>>> the different pieces of info are unstable to varying degrees. >>>>>>> >>>>>>> As for this claim: >>>>>>> >>>>>>>> It would be possible for instance to report the same thread >>>>>>>> as being the owner, being blocked trying to enter the monitor, >>>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>>> the same time! >>>>>>> >>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>>> then be seen as on the entry queue or on the wait queue since >>>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>> wait queue. >>>>>>> >>>>>>> So the info instability of this API is bad, but it's not >>>>>>> quite that bad.
:-) (That is a small mercy.) >>>>>>> >>>>>>> >>>>>>> Handshaking is not going to make this situation any better >>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>>> handshake with the owner, the stability or instability of >>>>>>> the other fields remains the same as when SuspendThread is >>>>>>> used. Handshaking with all threads won't make the data as >>>>>>> stable as when at a safepoint because individual threads >>>>>>> can resume execution after doing their handshake so there >>>>>>> will still be field instability. >>>>>>> >>>>>>> >>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>> >>>>>> I agree with this. >>>>>> The advantages are: >>>>>> - the result is stable >>>>>> - the implementation can be simplified >>>>>> >>>>>> Performance impact is not very clear but should not be that >>>>>> big as suspending all the threads has some overhead too. >>>>>> I'm not sure if using handshakes can make performance better. >>>>> >>>>> Ok, may I file it to JBS and fix it? >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>>> Thanks, >>>>>> Serguei >>>>>> >>>>>>> Dan >>>>>>> >>>>>>>> >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> The only way to make sure you don't have stale information is >>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc >>>>>>>>>>> should be more clear about the possibility of returning stale >>>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> GetObjectMonitorUsage says nothing about threads being suspended so I can't see how this could be construed as an agent bug.
>>>>>>>>>>> >>>>>>>>>>> In your scenario above, you mention that the target thread was >>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>> >>>>>>>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>>>>>>>>> and ResumeThread() are made by the agent and the agent should not >>>>>>>>>>> resume the target thread while also calling GetObjectMonitorUsage(). >>>>>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>>>>> >>>>>>>>>> Perhaps. I was thinking more generally about an independent resume, but you're right that doesn't really make a lot of sense. But when the spec says nothing about suspension ... >>>>>>>>> >>>>>>>>> And it is intentional that suspension is not required. JVM/DI and JVM/PI >>>>>>>>> used to require suspension for these kinds of get-the-info APIs. JVM/TI >>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>> >>>>>>>>> As I've said before, we could add a note about the data being potentially >>>>>>>>> stale unless SuspendThread is used. I think of it like stat(2). You can >>>>>>>>> fetch the file's info, but there's no guarantee that the info is current >>>>>>>>> by the time you process what you got back. Is it too much motherhood to >>>>>>>>> state that the data might be stale? I could go either way... >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> Using a handshake on the owner thread will allow this to be fixed in the future without forcing/using any safepoints. >>>>>>>>>>> >>>>>>>>>>> I have to think about that which is why I'm avoiding talking about >>>>>>>>>>> handshakes in this thread. >>>>>>>>>> >>>>>>>>>> Effectively the handshake can "suspend" the thread whilst the monitor is queried. 
In effect the operation would create a per-thread safepoint. >>>>>>>>> >>>>>>>>> I "know" that, but I still need time to think about it and probably >>>>>>>>> see the code to see if there are holes... >>>>>>>>> >>>>>>>>> >>>>>>>>>> Semantically it is no different to the code actually suspending the owner thread, but it can't actually do that because suspends/resume don't nest. >>>>>>>>> >>>>>>>>> Yeah... we used have a suspend count back when we tracked internal and >>>>>>>>> external suspends separately. That was a nightmare... >>>>>>>>> >>>>>>>>> Dan >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> David >>>>>>>>>> >>>>>>>>>>> Dan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> David >>>>>>>>>>>> >>>>>>>>>>>>> Dan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread state, it returns `true` when the thread is sleeping [3], or when it performs in native [4]. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI spec - it just says "Get information about the object's monitor." >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes it was just an implementation optimisation, nothing to do with the spec. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect information in some case. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It starts with finding owner thread, but the owner might be just before wakeup. >>>>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is called at safepoint in any case. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> David >>>>>>>>>>>>>> ----- >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>> [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem.
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>> >>> > From chris.plummer at oracle.com Thu Jun 18 04:33:36 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 17 Jun 2020 21:33:36 -0700 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> Message-ID: <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> On 6/17/20 7:43 PM, David Holmes wrote: > Hi Chris, > > On 18/06/2020 6:34 am, Chris Plummer wrote: >> Hello, >> >> Please help review the following: >> >> https://bugs.openjdk.java.net/browse/JDK-8247533 >> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >> >> The CR contains all the needed details. Here's a summary of changes >> in each file: > > The problem sounds to me like a variation of the more general problem > of not ensuring a thread is kept alive whilst acting upon it. I don't > know how the SA finds these references to the threads it is going to > stackwalk, but is it possible to fix this via appropriate uses of > ThreadsListHandle/Iterator? It fetches ThreadsSMRSupport::_java_thread_list. Keep in mind that once SA attaches, nothing in the VM changes. For example, SA can't create a wrapper to a JavaThread, only to have the JavaThread be freed later on. It's just not possible. 
Chris > > Cheers, > David > >> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >> -Instead of throwing an exception when the OS ThreadID is invalid, >> print a warning. >> >> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >> -Improve a print_debug message >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >> >> -Deal with the array of registers read in being null due to the OS >> ThreadID not being valid. >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >> >> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" >> appearing twice when printing the exception. 
>>
>> thanks,
>>
>> Chris

From ioi.lam at oracle.com Thu Jun 18 04:34:41 2020
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 17 Jun 2020 21:34:41 -0700
Subject: RFR(S) 8246019 PerfClassTraceTime slows down VM start-up
In-Reply-To: <91d7ab26-cccb-8d9c-8929-5b37664f6313@oracle.com>
References: <3ed0b32d-469e-6f89-f4b3-b78d5bcce700@oracle.com> <20411ebb-5b06-83da-07c8-b751252efecd@oracle.com> <91d7ab26-cccb-8d9c-8929-5b37664f6313@oracle.com>
Message-ID:

On 6/17/20 7:38 PM, David Holmes wrote:
> Hi Ioi,
>
> On 17/06/2020 1:19 pm, Ioi Lam wrote:
>> On 6/16/20 6:20 PM, David Holmes wrote:
>>> Hi Ioi,
>>>
>>> On 17/06/2020 6:14 am, Ioi Lam wrote:
>>>> https://bugs.openjdk.java.net/browse/JDK-8246019
>>>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/
>>>>
>>>>
>>>> PerfClassTraceTime is (a rarely used feature) for measuring the
>>>> time spent during class linking and initialization.
>>>
>>> "A special command jcmd PerfCounter.print
>>> prints all performance counters in the process."
>>>
>>> How do you know this is a "rarely used feature"?
>> Hi David,
>>
>> Sure, the counter will be dumped, but by "rarely used" -- I mean no
>> one will find this particular counter useful, and no one will be
>> actively looking at it.
>>
>> I changed two parts of the code -- class init and class linking.
>>
>> For class initialization, the counter may be useful for people who
>> want to know how much time is spent in their <clinit> functions, and
>> my patch doesn't change that. It only avoids using the counter when a
>> class has no <clinit>, i.e., we know that the counter counts nothing
>> (except for a logging statement).
>
> I can see where you are coming from here. Today we keep track of the
> time taken to mark a class as initialized when there is no clinit to
> execute, and we actually record pure timer overhead as it dwarfs the
> simple update to the class state. With your change we now won't track
> the time taken to mark the class as initialized. In both cases the
> time recorded is inaccurate - in opposite senses. In that regard your
> slight underestimation of the class initialization cost seems better
> than the present over-estimate.
>
>> =====
>>
>> For class linking, no user code is executed, so it only measures VM
>> code. If it's useful for anyone, that would be VM engineers like me
>> who are trying to optimize the speed of class loading. However, due
>> to the overhead of the counter vs what it's trying to measure, the
>> results are pretty meaningless.
>>
>> Note that I've not disabled the counter altogether. Instead, I
>> disable it only when linking a CDS shared class, and we know that
>> very little is happening for this class (e.g., no verification).
>
> Yes I have little concern for this part. Linking is a multi-phase
> process so "time for linking" is already ill-defined. And the fact you
> only do it for CDS makes it even less interesting.
>
>> I think the class linking timer might have been useful 15 years ago
>> when it was introduced, or it might be useful today when CDS is
>> disabled. But with CDS enabled, we are paying a constant price that
>> seems to benefit no one.
>>
>> I think we should short-circuit it when it seems appropriate. If this
>> indeed causes problems for our users, it's easy to re-enable it.
>> That's better than just keeping this forever just because we're
>> afraid to touch anything.
>
> I'm uncomfortable with both the "keep forever as we're too afraid to
> change" and "change it now and restore it if anyone complains" ends of
> this spectrum. Obviously we need to make progress, but the "change it
> now and change back later if needed" is a bit naive, as once any
> change is made we can't change back without affecting another set of
> users, and we don't know how long it will be before the change reaches
> users and the problems return to us. From a CSR perspective I want to
> see that due diligence has been applied with regard to these
> behavioural changes, and JDK engineers are often not in a position to
> understand how end users use this kind of information. I don't have a
> solution for that general problem.
>
> In this particular case I think under-estimating the class
> initialization overhead is better than the present over-estimate.
> Though anyone tracking the trends here will be surprised when the cost
> suddenly drops.

Hi David,

I don't have a solution, either.

I am in no hurry, and the improvement is minor. I could post a CSR and let it stand for a few months to see if anyone objects.

My impression is "linking time" is such an esoteric feature that no one will care, but I may be wrong. Actually it would be good if someone tells me I am wrong -- they probably are experiencing some overhead in class loading that we don't currently know about.

As I mentioned earlier, if anyone is using this timer, it would be VM engineers who work on class loading. In fact, Yumin fixed 8178349 "Cache builtin class loader constraints to avoid re-initializing itable/vtable for shared classes" in JDK 15, which significantly reduced the amount of time spent during class linking. However, we didn't use this timer for measuring the effectiveness of that fix, as the overhead and variability are too high.

We have reached a point in the class loading code where we can only make small incremental improvements, and we can only measure the effect of our changes with external profilers such as "perf stat -r 200 bin/java -version", which launches the VM 200 times (repeated 10 times) and averages the elapsed time.

Maybe it's time for the "class linking" timer to go away completely, or at least be disabled when CDS is enabled. It's pretty much useless. I wish we had an established deprecation process for such legacy features.
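
The short-circuiting idea discussed in this message (always count the invocation, but skip the clock reads when the work is known to be trivial) can be sketched in plain Java. This is only an illustration under stated assumptions, not the HotSpot implementation: `ConditionalTimerSketch`, `TraceCounter` and its fields are hypothetical names, and `System.nanoTime()` merely stands in for whatever clock the real counters use.

```java
public class ConditionalTimerSketch {
    /** Hypothetical stand-in for a perf counter: invocation count plus accumulated time. */
    static final class TraceCounter {
        long invocations;       // every call is counted
        long timedInvocations;  // calls for which the timer was actually started
        long elapsedNanos;      // time accumulated over the timed calls only

        /**
         * Always counts the invocation, but only pays for the two clock reads
         * when the caller expects the work to be non-trivial.
         */
        void run(boolean expectNonTrivialWork, Runnable work) {
            invocations++;
            if (!expectNonTrivialWork) {
                work.run(); // no timer: the clock reads would dwarf the work itself
                return;
            }
            long start = System.nanoTime();
            try {
                work.run();
            } finally {
                elapsedNanos += System.nanoTime() - start;
                timedInvocations++;
            }
        }
    }

    public static void main(String[] args) {
        TraceCounter init = new TraceCounter();
        int[] state = new int[1];
        // Models a class with no static initializer: only a state update happens.
        init.run(false, () -> state[0]++);
        // Models a class whose static initializer does real work.
        init.run(true, () -> { for (int i = 0; i < 1000; i++) state[0]++; });
        System.out.println(init.invocations + " invocations, " + init.timedInvocations + " timed");
    }
}
```

The trade-off is the one debated above: the untimed path under-reports elapsed time slightly, while timing everything mostly records the cost of the timer.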
> >>> >>> I find it hard to evaluate whether this short-circuiting of the time >>> tracing is reasonable or not. Obviously any monitoring mechanism >>> should impose minimal overhead compared to what is being measured, >>> and these timers fall short in that regard. But if these stats >>> become meaningless then they may as well be removed. >>> >>> I think the serviceability folk (cc'd) need to evaluate this in the >>> context of the M&M tools. >>> >>>> However, it's quite expensive and it needs to start and stop a >>>> bunch of timers. With CDS, it's quite often for the overhead of the >>>> timer itself to be much more than the time it's trying to measure, >>>> giving unreliable measurement. >>>> >>>> In this patch, when it's clear that the init and linking will be >>>> very quick, I disable the timer and count only the number of >>>> invocations. This shows a small improvement in start-up >>> >>> I'm curious if you tried to forcing EagerInitialization to be true >>> to see how that improves the baseline. I've always noticed >>> eager_init in the code, but hadn't realized it is disabled by default. >>> >> >> I think it cannot be done by default, as it will violate the JLS. A >> class can be initialized only when it's touched by bytecodes. > > I'm also not sure it violates JLS as you can't directly query if a > class is initialized, but anyway I wasn't suggesting turning this on > by default, I meant only in regard to get a performance baseline to > compare against the changes you have implemented here. I don't think EagerInitialization will make a difference that's related to the timer. Both the two instances of the timer will still be used for exactly the same number of times, just under different call stacks. > > Aside: I have to wonder whether anyone uses EagerInitialization or > whether we should get rid of it? > I wonder about the same thing. I'll ask around and file an RFE if appropriate. 
>> It can also backfire as we may load many classes without initializing
>> them. E.g., during bytecode verification, we load many classes and
>> just check that one is a supertype of another.
>
> Not sure what is backfiring here?

Here's an example that both violates the JLS and backfires as it slows down VM start up, because <clinit> can have observable side effects:

class Main {
    static int X;
    public static void main(String args[]) {
        System.out.println(X);
    }
    void deadcode() {
        Super s = new Child();
        s.method();
    }
}

class Super {
    void method() {}
}

class Child extends Super {
    static {
        for (int i=0; i<1000000; i++) {
            Main.X ++;
        }
    }
}

When Main is linked, its bytecodes are verified, including deadcode(). Since deadcode() has an implicit cast of Child to Super, the verifier needs to load both Child and Super, and check that Child is indeed a subclass of Super.

If EagerInitialization is enabled, Child will be initialized as soon as it's entered into the system dictionary. This violates

https://docs.oracle.com/javase/specs/jls/se14/html/jls-12.html#jls-12.4.1

    A class or interface type T will be initialized immediately before
    the first occurrence of any one of the following:

      T is a class and an instance of T is created.
      A static method declared by T is invoked.
      A static field declared by T is assigned.
      A static field declared by T is used and the field is not a
      constant variable (§4.12.4).
    ....
    A class or interface will not be initialized under any other
    circumstance.

Also, app developers will generally expect 0 to be printed. So EagerInitialization will probably break apps in subtle ways.

And initializing Child may recursively load in more classes during the verification of Child ......
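
The distinction the example above relies on, that a class can be loaded without being initialized, can be observed from plain Java. The following is a minimal sketch, assuming a default (lazy) VM configuration; `LazyInitSketch`, `sideEffects`, `touch()` and the two helper methods are hypothetical names introduced only for this illustration.

```java
public class LazyInitSketch {
    // Incremented by Child's static initializer so that initialization is observable.
    static int sideEffects = 0;

    static class Super {
        void method() {}
    }

    static class Child extends Super {
        static {
            sideEffects++; // observable side effect of <clinit>
        }
        static void touch() {}
    }

    /** Loads Child with initialize=false and reports how often its initializer has run. */
    static int countAfterLoadOnly() {
        try {
            Class.forName(LazyInitSketch.class.getName() + "$Child", false,
                          LazyInitSketch.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return sideEffects;
    }

    /** Invokes a static method of Child, which is one of the JLS 12.4.1 triggers. */
    static int countAfterFirstUse() {
        Child.touch();
        return sideEffects;
    }

    public static void main(String[] args) {
        System.out.println("after load only: " + countAfterLoadOnly());
        System.out.println("after first use: " + countAfterFirstUse());
    }
}
```

In a fresh VM the first line should report 0 and the second 1; eager initialization would make the side effect visible already at load time, which is the behaviour the JLS passage rules out.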
Thanks
- Ioi

>
> Thanks,
> David
> -----
>
>> Thanks
>> - Ioi
>>
>>> Cheers,
>>> David
>>> -----
>>>
>>>> Results of " perf stat -r 100 bin/java -Xshare:on
>>>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version "
>>>>
>>>> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) -
>>>> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) --
>>>> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) -
>>>> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) -----
>>>> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ----
>>>> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) ---
>>>> 59608782 59340173 (-268609)   ----   41.198  41.508 (  0.310) +
>>>> 59614612 59325177 (-289435)   -----  41.397  41.738 (  0.341) ++
>>>> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ----
>>>> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) ---
>>>> ================================================
>>>> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) --
>>>> instruction delta =      -284608    -0.4774%
>>>> time        delta =       -0.434 ms -1.0435%
>>>>
>>>> The number of PerfClassTraceTime's used is reduced from 564 to 116
>>>> (so we have an overhead of about 715 instructions per use, yikes!).

From chris.plummer at oracle.com Thu Jun 18 04:36:09 2020
From: chris.plummer at oracle.com (Chris Plummer)
Date: Wed, 17 Jun 2020 21:36:09 -0700
Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
In-Reply-To: <3d61636e-02c5-3c49-0561-a69fb75fa7ce@oss.nttdata.com>
References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <3d61636e-02c5-3c49-0561-a69fb75fa7ce@oss.nttdata.com>
Message-ID:

Hi Yasumasa,

Actually I intentionally decided to take this approach no matter what the cause of the failure.
If we can't get the registers, print a WARNING, and then SA stack walking code will use the fall back of last_java_frame if available, otherwise not produce a stack trace. thanks, Chris On 6/17/20 5:11 PM, Yasumasa Suenaga wrote: > Hi Chris, > > Can you handle ESRCH only in this case? > ptrace(2) might fail by other reason. > > > Thanks, > > Yasumasa > > > On 2020/06/18 5:34, Chris Plummer wrote: >> Hello, >> >> Please help review the following: >> >> https://bugs.openjdk.java.net/browse/JDK-8247533 >> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >> >> The CR contains all the needed details. Here's a summary of changes >> in each file: >> >> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >> -Instead of throwing an exception when the OS ThreadID is invalid, >> print a warning. >> >> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >> -Improve a print_debug message >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >> >> -Deal with the array of registers read in being null due to the OS >> ThreadID not being valid. >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >> >> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" >> appearing twice when printing the exception. 
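
The fallback policy described in this reply (warn when the registers cannot be read, start from the last Java frame if there is one, otherwise produce no stack trace) can be sketched as follows. This is a hedged illustration only and not the SA source: `RegisterFallbackSketch`, `RegisterSource`, `StartPoint` and `chooseStartPoint` are hypothetical names, and a `null` array models the "OS ThreadID not valid" case mentioned in the file summary.

```java
public class RegisterFallbackSketch {
    /** Hypothetical stand-in for the native call that reads a thread's registers. */
    interface RegisterSource {
        long[] readRegisters(int lwpId); // null models an OS thread ID that is no longer valid
    }

    /** Hypothetical description of where a stack walk would start. */
    enum StartPoint { REGISTERS, LAST_JAVA_FRAME, NONE }

    /** Warns instead of throwing, then picks the best remaining starting point. */
    static StartPoint chooseStartPoint(RegisterSource source, int lwpId, boolean hasLastJavaFrame) {
        long[] regs = source.readRegisters(lwpId);
        if (regs != null) {
            return StartPoint.REGISTERS;
        }
        System.err.println("WARNING: could not read registers for lwp " + lwpId);
        return hasLastJavaFrame ? StartPoint.LAST_JAVA_FRAME : StartPoint.NONE;
    }

    public static void main(String[] args) {
        RegisterSource live = id -> new long[] {0L, 0L};
        RegisterSource gone = id -> null;
        System.out.println(chooseStartPoint(live, 1, false));
        System.out.println(chooseStartPoint(gone, 2, true));
        System.out.println(chooseStartPoint(gone, 3, false));
    }
}
```

Treating every failure the same way, as the reply argues for, keeps the caller independent of the specific reason the native read failed.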
>>
>> thanks,
>>
>> Chris

From david.holmes at oracle.com Thu Jun 18 04:58:19 2020
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 18 Jun 2020 14:58:19 +1000
Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information
In-Reply-To:
References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com> <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com>
Message-ID: <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com>

Hi Yasumasa,

On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
> Hi Serguei,
>
> Thanks for your comment!
> I uploaded new webrev:
>
>   http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>
> I'm not sure the following change is correct.
> Can we assume owning_thread is not NULL at safepoint?

We can if "owner != NULL". So that change seems fine to me.

But given this is now only executed at a safepoint there are additional simplifications that can be made:

- current thread determination can be simplified:

  945   Thread* current_thread = Thread::current();

  becomes:

        Thread* current_thread = VMThread::vm_thread();
        assert(current_thread == Thread::current(), "must be");

- these comments can be removed

  994       // Use current thread since function can be called from a
  995       // JavaThread or the VMThread.

 1053       // Use current thread since function can be called from a
 1054       // JavaThread or the VMThread.

- these TLH constructions should be passing current_thread (existing bug)

  996       ThreadsListHandle tlh;
 1055       ThreadsListHandle tlh;

- All ResourceMarks should be passing current_thread (existing bug)

Aside: there is a major inconsistency between the spec and implementation for this method. I've traced the history to see how this came about from JVMDI (ref JDK-4546581) but it never resulted in the JVM TI specification clearly stating what the waiters/waiter_count means. I will file a bug to have the spec clarified to match the implementation (even though I think the implementation is what is wrong). :(

Thanks,
David
-----

> All tests on submit repo and serviceability/jvmti and
> vmTestbase/nsk/jvmti have been passed with this change.
>
>
> ```
>        // This monitor is owned so we have to find the owning JavaThread.
>        owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
> -      // Cannot assume (owning_thread != NULL) here because this function
> -      // may not have been called at a safepoint and the owning_thread
> -      // might not be suspended.
> -      if (owning_thread != NULL) {
> -        // The monitor's owner either has to be the current thread, at safepoint
> -        // or it has to be suspended. Any of these conditions will prevent both
> -        // contending and waiting threads from modifying the state of
> -        // the monitor.
> -        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
> -          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
> -          // will not make it back to the JVM/TI agent. The error code will
> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
> -          // will retry the call via a VM_GetObjectMonitorUsage VM op.
> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
> -        }
> -        HandleMark hm;
> +      assert(owning_thread != NULL, "owning JavaThread must not be NULL");
>          Handle     th(current_thread, owning_thread->threadObj());
>          ret.owner = (jthread)jni_reference(calling_thread, th);
>
> ```
>
>
> Thanks,
>
> Yasumasa
>
>
> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>> Hi Yasumasa,
>>
>> This fix is not enough.
>> The function JvmtiEnvBase::get_object_monitor_usage works in two
>> modes: in VMop and non-VMop.
>> The non-VMop mode has to be removed.
>>
>> Thanks,
>> Serguei
>>
>>
>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>> (Change subject for RFR)
>>>
>>> Hi,
>>>
>>> I filed it to JBS and upload a webrev for it.
>>> Could you review it?
>>>
>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>
>>> This change has passed tests on submit repo.
>>> Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti
>>> on Linux x64.
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>> Yes. It seems we have a consensus.
>>>> Thank you for taking care about it.
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>
>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>> Ok, may I file it to JBS and fix it?
>>>>>
>>>>> Go for it! :)
>>>>>
>>>>> Cheers,
>>>>> David
>>>>>
>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>>>> Hi Dan, David and Yasumasa,
>>>>>>>
>>>>>>>
>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D.
Daugherty wrote:
>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage()
>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does not
>>>>>>>>>>>>>>>>>> perform at safepoint.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the
>>>>>>>>>>>>>>>>> target is not suspended:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>>>   JavaThread* calling_thread = JavaThread::current();
>>>>>>>>>>>>>>>>>   jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>   if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>>     // Some of the critical threads were not suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>>     VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>     VMThread::execute(&op);
>>>>>>>>>>>>>>>>>     err = op.result();
>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>   return err;
>>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases when
>>>>>>>>>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from
>>>>>>>>>>>>>>>> get_object_monitor_usage().
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at
>>>>>>>>>>>>>>>>>> first [1], but it would perform concurrently.
>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be >>>>>>>>>>>>>>>>>> changed to others in subsequent code. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> For example, the owner might release the monitor >>>>>>>>>>>>>>>>>> before [2]. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The expectation is that when we find an owner thread it >>>>>>>>>>>>>>>>> is either suspended or not. If it is suspended then it >>>>>>>>>>>>>>>>> cannot release the monitor. If it is not suspended we >>>>>>>>>>>>>>>>> detect that and redo the whole query at a safepoint. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think the owner thread might resume unfortunately >>>>>>>>>>>>>>>> after suspending check. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also required >>>>>>>>>>>>>>> a safepoint but it only requires the Threads_lock. So yes >>>>>>>>>>>>>>> the code is wrong. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Which code is wrong? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes, a rogue resume can happen when the >>>>>>>>>>>>>> GetObjectMonitorUsage() caller >>>>>>>>>>>>>> has started the process of gathering the information while >>>>>>>>>>>>>> not at a >>>>>>>>>>>>>> safepoint. Thus the information returned by >>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>>> >>>>>>>>>>>>> The code tries to make sure that it either collects data >>>>>>>>>>>>> about a monitor owned by a thread that is suspended, or >>>>>>>>>>>>> else it collects that data at a safepoint. But the owning >>>>>>>>>>>>> thread can be resumed just after the code determined it was >>>>>>>>>>>>> suspended. The monitor can then be released and the >>>>>>>>>>>>> information gathered not only stale but potentially >>>>>>>>>>>>> completely wrong as it could now be owned by a different >>>>>>>>>>>>> thread and will report that thread's entry count. 
>>>>>>>>>>>>
>>>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as
>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information
>>>>>>>>>>>> can be stale. In fact as soon as the implementation returns
>>>>>>>>>>>> from the safepoint that gathered the info, the target thread
>>>>>>>>>>>> could have moved on.
>>>>>>>>>>>
>>>>>>>>>>> That isn't the issue. That the info is stale is fine. But the
>>>>>>>>>>> expectation is that the information was actually an accurate
>>>>>>>>>>> snapshot of the state of the monitor at some point in time.
>>>>>>>>>>> The current code does not ensure that.
>>>>>>>>>>
>>>>>>>>>> Please explain. I clearly don't understand why you think the info
>>>>>>>>>> returned isn't "an accurate snapshot of the state of the monitor
>>>>>>>>>> at some point in time".
>>>>>>>>>
>>>>>>>>> Because it may not be a "snapshot" at all. There is no
>>>>>>>>> atomicity**. The reported owner thread may not own it any
>>>>>>>>> longer when the entry count is read, so straight away you may
>>>>>>>>> have the wrong entry count information. The set of threads
>>>>>>>>> trying to acquire the monitor, or wait on the monitor can
>>>>>>>>> change in unexpected ways. It would be possible for instance to
>>>>>>>>> report the same thread as being the owner, being blocked trying
>>>>>>>>> to enter the monitor, and being in the wait-set of the monitor
>>>>>>>>> - apparently all at the same time!
>>>>>>>>>
>>>>>>>>> ** even if the owner is suspended we don't have complete
>>>>>>>>> atomicity because threads can join the set of threads trying to
>>>>>>>>> enter the monitor (unless they are all suspended).
>>>>>>>>
>>>>>>>> Consider the case when the monitor's owner is _not_ suspended:
>>>>>>>>
>>>>>>>>   - GetObjectMonitorUsage() uses a safepoint to gather the info about
>>>>>>>>     the object's monitor. Since we're at a safepoint, the info that
>>>>>>>>     we are gathering cannot change until we return from the safepoint.
>>>>>>>>     It is a snapshot and a valid one at that.
>>>>>>>>
>>>>>>>> Consider the case when the monitor's owner is suspended:
>>>>>>>>
>>>>>>>>   - GetObjectMonitorUsage() will gather info about the object's
>>>>>>>>     monitor while _not_ at a safepoint. Assuming that no other
>>>>>>>>     thread is suspended, then entry_count can change because
>>>>>>>>     another thread can block on entry while we are gathering
>>>>>>>>     info. waiter_count and waiters can change if a thread was
>>>>>>>>     in a timed wait that has timed out and now that thread is
>>>>>>>>     blocked on re-entry. I don't think that notify_waiter_count
>>>>>>>>     and notify_waiters can change.
>>>>>>>>
>>>>>>>>     So in this case, the owner info and notify info is stable,
>>>>>>>>     but the entry_count and waiter info is not stable.
>>>>>>>>
>>>>>>>> Consider the case when the monitor is not owned:
>>>>>>>>
>>>>>>>>   - GetObjectMonitorUsage() will start to gather info about the
>>>>>>>>     object's monitor while _not_ at a safepoint. If it finds a
>>>>>>>>     thread on the entry queue that is not suspended, then it will
>>>>>>>>     bail out and redo the info gather at a safepoint. I just
>>>>>>>>     noticed that it doesn't check for suspension for the threads
>>>>>>>>     on the waiters list so a timed Object.wait() call can cause
>>>>>>>>     some confusion here.
>>>>>>>>
>>>>>>>>     So in this case, the owner info is not stable if a thread
>>>>>>>>     comes out of a timed wait and reenters the monitor. This
>>>>>>>>     case is no different than if a "barger" thread comes in
>>>>>>>>     after the NULL owner field is observed and enters the
>>>>>>>>     monitor. We'll return that there is no owner, a list of
>>>>>>>>     suspended pending entry threads and a list of waiting
>>>>>>>>     threads. The reality is that the object's monitor is
>>>>>>>>     owned by the "barger" that completely bypassed the entry
>>>>>>>>     queue by virtue of seeing the NULL owner field at exactly
>>>>>>>>     the right time.
>>>>>>>>
>>>>>>>> So the owner field is only stable when we have an owner. If
>>>>>>>> that owner is not suspended, then the other fields are also
>>>>>>>> stable because we gathered the info at a safepoint. If the
>>>>>>>> owner is suspended, then the owner and notify info is stable,
>>>>>>>> but the entry_count and waiter info is not stable.
>>>>>>>>
>>>>>>>> If we have a NULL owner field, then the info is only stable
>>>>>>>> if you have a non-suspended thread on the entry list. Ouch!
>>>>>>>> That's deterministic, but not without some work.
>>>>>>>>
>>>>>>>>
>>>>>>>> Okay so only when we gather the info at a safepoint is all
>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only
>>>>>>>> do that at a safepoint when the owner thread is not suspended
>>>>>>>> or if owner == NULL and one of the entry threads is not
>>>>>>>> suspended. If either of those conditions is not true, then
>>>>>>>> the different pieces of info are unstable to varying degrees.
>>>>>>>>
>>>>>>>> As for this claim:
>>>>>>>>
>>>>>>>>> It would be possible for instance to report the same thread
>>>>>>>>> as being the owner, being blocked trying to enter the monitor,
>>>>>>>>> and being in the wait-set of the monitor - apparently all at
>>>>>>>>> the same time!
>>>>>>>>
>>>>>>>> I can't figure out a way to make that scenario work. If the
>>>>>>>> thread is seen as the owner and is not suspended, then we
>>>>>>>> gather info at a safepoint. If it is suspended, then it can't
>>>>>>>> then be seen as on the entry queue or on the wait queue since
>>>>>>>> it is suspended. If it is seen on the entry queue and is not
>>>>>>>> suspended, then we gather info at a safepoint. If it is
>>>>>>>> suspended on the entry queue, then it can't be seen on the
>>>>>>>> wait queue.
>>>>>>>>
>>>>>>>> So the info instability of this API is bad, but it's not
>>>>>>>> quite that bad. :-) (That is a small mercy.)
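
The owner/entry-count concern raised in this exchange (the reported owner may no longer own the monitor by the time the entry count is read) can be made concrete with a deterministic toy model. This is a sketch under stated assumptions and has nothing to do with HotSpot's actual ObjectMonitor: `MonitorSnapshotSketch`, `ToyMonitor`, `racySnapshot` and the `betweenReads` hook are hypothetical, and the hook simply stands in for whatever another thread does between the two reads.

```java
public class MonitorSnapshotSketch {
    /** Toy model of a monitor's bookkeeping; not HotSpot's ObjectMonitor. */
    static final class ToyMonitor {
        private String owner;   // name of the owning thread, or null
        private int entryCount; // recursion count held by the owner

        synchronized void acquire(String thread, int count) {
            owner = thread;
            entryCount = count;
        }
        synchronized String owner() { return owner; }
        synchronized int entryCount() { return entryCount; }

        /** Reads both fields under one lock, so they describe the same instant. */
        synchronized String consistentSnapshot() {
            return owner + ":" + entryCount;
        }
    }

    /** Reads owner and entry count in two steps; the hook runs in the gap between them. */
    static String racySnapshot(ToyMonitor m, Runnable betweenReads) {
        String owner = m.owner();
        betweenReads.run(); // e.g. the old owner releases and another thread enters
        int count = m.entryCount();
        return owner + ":" + count;
    }

    public static void main(String[] args) {
        ToyMonitor m = new ToyMonitor();
        m.acquire("T1", 3);
        // T1 releases and T2 acquires between the two reads.
        String racy = racySnapshot(m, () -> m.acquire("T2", 1));
        System.out.println("racy:       " + racy);
        System.out.println("consistent: " + m.consistentSnapshot());
    }
}
```

The racy read pairs T1 with T2's count, a state the monitor was never in, while the read taken under a single lock plays the role that the safepoint plays in the discussion.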
>>>>>>>>
>>>>>>>>
>>>>>>>> Handshaking is not going to make this situation any better
>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we
>>>>>>>> handshake with the owner, the stability or instability of
>>>>>>>> the other fields remains the same as when SuspendThread is
>>>>>>>> used. Handshaking with all threads won't make the data as
>>>>>>>> stable as when at a safepoint because individual threads
>>>>>>>> can resume execution after doing their handshake so there
>>>>>>>> will still be field instability.
>>>>>>>>
>>>>>>>>
>>>>>>>> Short version: GetObjectMonitorUsage() should only gather
>>>>>>>> data at a safepoint. Yes, I've changed my mind.
>>>>>>>
>>>>>>> I agree with this.
>>>>>>> The advantages are:
>>>>>>>   - the result is stable
>>>>>>>   - the implementation can be simplified
>>>>>>>
>>>>>>> Performance impact is not very clear but should not be that
>>>>>>> big as suspending all the threads has some overhead too.
>>>>>>> I'm not sure if using handshakes can make performance better.
>>>>>>
>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> The only way to make sure you don't have stale information is
>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc
>>>>>>>>>>>> should be more clear about the possibility of returning stale
>>>>>>>>>>>> info. That's a question for Robert F.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about threads being
>>>>>>>>>>>>> suspended so I can't see how this could be construed as an
>>>>>>>>>>>>> agent bug.
>>>>>>>>>>>> >>>>>>>>>>>> In your scenario above, you mention that the target thread was >>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>> >>>>>>>>>>>> All three of those calls: SuspendThread(), >>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>> and ResumeThread() are made by the agent and the agent >>>>>>>>>>>> should not >>>>>>>>>>>> resume the target thread while also calling >>>>>>>>>>>> GetObjectMonitorUsage(). >>>>>>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>>>>>> >>>>>>>>>>> Perhaps. I was thinking more generally about an independent >>>>>>>>>>> resume, but you're right that doesn't really make a lot of >>>>>>>>>>> sense. But when the spec says nothing about suspension ... >>>>>>>>>> >>>>>>>>>> And it is intentional that suspension is not required. JVM/DI >>>>>>>>>> and JVM/PI >>>>>>>>>> used to require suspension for these kinds of get-the-info >>>>>>>>>> APIs. JVM/TI >>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>> >>>>>>>>>> As I've said before, we could add a note about the data being >>>>>>>>>> potentially >>>>>>>>>> stale unless SuspendThread is used. I think of it like >>>>>>>>>> stat(2). You can >>>>>>>>>> fetch the file's info, but there's no guarantee that the info >>>>>>>>>> is current >>>>>>>>>> by the time you process what you got back. Is it too much >>>>>>>>>> motherhood to >>>>>>>>>> state that the data might be stale? I could go either way... >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> Using a handshake on the owner thread will allow this to be >>>>>>>>>>>>> fixed in the future without forcing/using any safepoints. 
>>>>>>>>>>>> >>>>>>>>>>>> I have to think about that which is why I'm avoiding talking >>>>>>>>>>>> about >>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>> >>>>>>>>>>> Effectively the handshake can "suspend" the thread whilst the >>>>>>>>>>> monitor is queried. In effect the operation would create a >>>>>>>>>>> per-thread safepoint. >>>>>>>>>> >>>>>>>>>> I "know" that, but I still need time to think about it and >>>>>>>>>> probably >>>>>>>>>> see the code to see if there are holes... >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Semantically it is no different to the code actually >>>>>>>>>>> suspending the owner thread, but it can't actually do that >>>>>>>>>>> because suspends/resume don't nest. >>>>>>>>>> >>>>>>>>>> Yeah... we used have a suspend count back when we tracked >>>>>>>>>> internal and >>>>>>>>>> external suspends separately. That was a nightmare... >>>>>>>>>> >>>>>>>>>> Dan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>>>> Dan >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> David >>>>>>>>>>>>> >>>>>>>>>>>>>> Dan >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check >>>>>>>>>>>>>>>> thread state, it returns `true` when the thread is >>>>>>>>>>>>>>>> sleeping [3], or when it performs in native [4]. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sure but if the thread is actually suspended it can't >>>>>>>>>>>>>>> continue execution in the VM or in Java code. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed >>>>>>>>>>>>>>>>> common case where threads are first suspended and then >>>>>>>>>>>>>>>>> the monitors are queried. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI >>>>>>>>>>>>>>>> spec - it just says "Get information about the object's >>>>>>>>>>>>>>>> monitor." 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yes it was just an implementation optimisation, nothing >>>>>>>>>>>>>>> to do with the spec. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect >>>>>>>>>>>>>>>> information in some case. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner might >>>>>>>>>>>>>>>> be just before wakeup. >>>>>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is >>>>>>>>>>>>>>>> called at safepoint in any case. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Except we're moving away from safepoints to using >>>>>>>>>>>>>>> Handshakes, so this particular operation will require >>>>>>>>>>>>>>> that the apparent owner is Handshake-safe (by entering a >>>>>>>>>>>>>>> handshake with it) before querying the monitor. This >>>>>>>>>>>>>>> would still be preferable I think to always using a >>>>>>>>>>>>>>> safepoint for the entire operation. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [4] >>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> However there is still a potential bug as the thread >>>>>>>>>>>>>>>>> reported as the owner may not be suspended at the time >>>>>>>>>>>>>>>>> we first see it, and may release the monitor, but then >>>>>>>>>>>>>>>>> it may get suspended before we call: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ??owning_thread = >>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), >>>>>>>>>>>>>>>>> owner); >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and >>>>>>>>>>>>>>>>> 
proceed to query the monitor information in a racy way. >>>>>>>>>>>>>>>>> This can't happen when suspension itself requires a >>>>>>>>>>>>>>>>> safepoint as the current thread won't go to that >>>>>>>>>>>>>>>>> safepoint during this code. However, if suspension is >>>>>>>>>>>>>>>>> implemented via a direct handshake with the target >>>>>>>>>>>>>>>>> thread then we have a problem. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>> >>>> >> From david.holmes at oracle.com Thu Jun 18 05:09:15 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 18 Jun 2020 15:09:15 +1000 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> Message-ID: <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> On 18/06/2020 2:33 pm, Chris Plummer wrote: > On 6/17/20 7:43 PM, David Holmes wrote: >> Hi Chris, >> >> On 18/06/2020 6:34 am, Chris Plummer wrote: >>> Hello, >>> >>> Please help review the following: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>> >>> The CR contains all the needed details. 
Here's a summary of changes >>> in each file: >> >> The problem sounds to me like a variation of the more general problem >> of not ensuring a thread is kept alive whilst acting upon it. I don't >> know how the SA finds these references to the threads it is going to >> stackwalk, but is it possible to fix this via appropriate uses of >> ThreadsListHandle/Iterator? > It fetches ThreadsSMRSupport::_java_thread_list. > > Keep in mind that once SA attaches, nothing in the VM changes. For > example, SA can't create a wrapper to a JavaThread, only to have the > JavaThread be freed later on. It's just not possible. Then how does it obtain a reference to a JavaThread for which the native OS thread id is invalid? Any thread found in _java_thread_list is either live or still to be started. In the latter case the JavaThread->osThread does not have its thread_id set yet. David ----- > Chris >> >> Cheers, >> David >> >>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>> -Instead of throwing an exception when the OS ThreadID is invalid, >>> print a warning. >>> >>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>> -Improve a print_debug message >>> >>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>> >>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>> >>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>> >>> -Deal with the array of registers read in being null due to the OS >>> ThreadID not being valid. 
>>> >>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>> >>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>> >>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" >>> appearing twice when printing the exception. >>> >>> thanks, >>> >>> Chris > From chris.plummer at oracle.com Thu Jun 18 05:13:23 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 17 Jun 2020 22:13:23 -0700 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> Message-ID: On 6/17/20 10:09 PM, David Holmes wrote: > On 18/06/2020 2:33 pm, Chris Plummer wrote: >> On 6/17/20 7:43 PM, David Holmes wrote: >>> Hi Chris, >>> >>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>> Hello, >>>> >>>> Please help review the following: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>> >>>> The CR contains all the needed details. Here's a summary of changes >>>> in each file: >>> >>> The problem sounds to me like a variation of the more general >>> problem of not ensuring a thread is kept alive whilst acting upon >>> it. I don't know how the SA finds these references to the threads it >>> is going to stackwalk, but is it possible to fix this via >>> appropriate uses of ThreadsListHandle/Iterator? >> It fetches ThreadsSMRSupport::_java_thread_list. >> >> Keep in mind that once SA attaches, nothing in the VM changes. For >> example, SA can't create a wrapper to a JavaThread, only to have the >> JavaThread be freed later on. It's just not possible. 
> > Then how does it obtain a reference to a JavaThread for which the > native OS thread id is invalid? Any thread found in _java_thread_list > is either live or still to be started. In the latter case the > JavaThread->osThread does not have its thread_id set yet. > My assumption was that the JavaThread is in the process of being destroyed, and it has freed its OS thread but is itself still in the thread list. I did notice that the OS thread id being used looked to be in the range of thread id #'s you would expect for the running app, so that to me indicated it was once valid, but is no more. Keep in mind that although hotspot may have synchronization code that prevents you from pulling a JavaThread off the thread list when it is in the process of being destroyed (I'm guessing it does), SA has no such protections. Chris > David > ----- > >> Chris >>> >>> Cheers, >>> David >>> >>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>> -Instead of throwing an exception when the OS ThreadID is invalid, >>>> print a warning. >>>> >>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>> -Improve a print_debug message >>>> >>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>> >>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>> >>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>> >>>> -Deal with the array of registers read in being null due to the OS >>>> ThreadID not being valid. 
>>>> >>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>> >>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>> >>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" >>>> appearing twice when printing the exception. >>>> >>>> thanks, >>>> >>>> Chris >> From david.holmes at oracle.com Thu Jun 18 05:16:30 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 18 Jun 2020 15:16:30 +1000 Subject: RFR(S) 8246019 PerfClassTraceTime slows down VM start-up In-Reply-To: References: <3ed0b32d-469e-6f89-f4b3-b78d5bcce700@oracle.com> <20411ebb-5b06-83da-07c8-b751252efecd@oracle.com> <91d7ab26-cccb-8d9c-8929-5b37664f6313@oracle.com> Message-ID: Hi Ioi, Re EagerInitialize ... skip to bottom part ... On 18/06/2020 2:34 pm, Ioi Lam wrote: > > > On 6/17/20 7:38 PM, David Holmes wrote: >> Hi Ioi, >> >> On 17/06/2020 1:19 pm, Ioi Lam wrote: >>> On 6/16/20 6:20 PM, David Holmes wrote: >>>> Hi Ioi, >>>> >>>> On 17/06/2020 6:14 am, Ioi Lam wrote: >>>>> https://bugs.openjdk.java.net/browse/JDK-8246019 >>>>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/ >>>>> >>>>> >>>>> PerfClassTraceTime is (a rarely used feature) for measuring the >>>>> time spent during class linking and initialization. >>>> >>>> "A special command jcmd PerfCounter.print >>>> prints all performance counters in the process." >>>> >>>> How do you know this is a "rarely used feature"? >>> Hi David, >>> >>> Sure, the counter will be dumped, but by "rarely used" -- I mean no >>> one will find this particular counter useful, and no one will be >>> actively looking at it. >>> >>> I changed two parts of the code -- class init and class linking. >>> >>> For class initialization, the counter may be useful for people who >>> want to know how much time is spent in their functions, and >>> my patch doesn't change that. 
It only avoids using the counter when a
>>> class has no <clinit>, i.e., we know that the counter counts nothing
>>> (except for a logging statement).
>>
>> I can see where you are coming from here. Today we keep track of the
>> time taken to mark a class as initialized when there is no clinit to
>> execute, and we actually record pure timer overhead as it dwarfs the
>> simple update to the class state. With your change we now won't track
>> the time taken to mark the class as initialized. In both cases the
>> time recorded is inaccurate - in opposite senses. In that regard your
>> slight underestimation of the class initialization cost seems better
>> than the present over-estimate.
>>
>>> =====
>>>
>>> For class linking, no user code is executed, so it only measures VM
>>> code. If it's useful for anyone, that would be VM engineers like me
>>> who are trying to optimize the speed of class loading. However, due
>>> to the overhead of the counter vs what it's trying to measure, the
>>> results are pretty meaningless.
>>>
>>> Note that I've not disabled the counter altogether. Instead, I
>>> disable it only when linking a CDS shared class, and we know that
>>> very little is happening for this class (e.g., no verification).
>>
>> Yes I have little concern for this part. Linking is a multi-phase
>> process so "time for linking" is already ill-defined. And the fact you
>> only do it for CDS makes it even less interesting.
>>
>>> I think the class linking timer might have been useful 15 years ago
>>> when it was introduced, or it might be useful today when CDS is
>>> disabled. But with CDS enabled, we are paying a constant price that
>>> seems to benefit no one.
>>>
>>> I think we should short-circuit it when it seems appropriate. If this
>>> indeed causes problems for our users, it's easy to re-enable it.
>>> That's better than just keeping this forever just because we're
>>> afraid to touch anything.
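The short-circuit idea discussed above can be sketched in Java. This is a minimal illustration under stated assumptions, not the HotSpot change: ShortCircuitTimer, run() and the expectRealWork flag are hypothetical names, and the flag stands in for checks such as "the class has no static initializer" or "this is a CDS shared class".

```java
/** Hypothetical illustration only; none of these names exist in HotSpot. */
class ShortCircuitTimer {
    private long invocations;   // always counted, cheap
    private long timedNanos;    // accumulated only when timing is worthwhile

    /** Runs the task; the clock is started only if the caller expects real work. */
    void run(boolean expectRealWork, Runnable task) {
        invocations++;
        if (!expectRealWork) {
            task.run();         // skip timer start/stop entirely
            return;
        }
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            timedNanos += System.nanoTime() - start;
        }
    }

    long invocations() { return invocations; }

    long timedNanos() { return timedNanos; }
}
```

The trade-off is the one debated in the thread: the invocation count stays exact, while the accumulated time slightly under-reports instead of being dominated by timer overhead.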
>> >> I'm uncomfortable with both the "keep forever as we're too afraid to >> change" and "change it now and restore it if anyone complains" ends of >> this spectrum. Obviously we need to make progress, but the "change it >> now and change back later if needed" is a bit naive, as once any >> change is made we can't change back without affecting another set of >> users, and we don't know how long it will be before the change reaches >> users and the problems return to us. From a CSR perspective I want to >> see that due diligence has been applied with regard to these >> behavioural changes, and JDK engineers are often not in a position to >> understand how end users use this kind of information. I don't have a >> solution for that general problem. >> >> In this particular case I think under-estimating the class >> initialization overhead is better than the present over-estimate. >> Though anyone tracking the trends here will be surprised when the cost >> suddenly drops. > > Hi David, > > I don't have a solution, either. I am in no hurry, and the improvement > is minor. > > I could post a CSR and let it stand for a few months to see if anyone > objects. My impression is "linking time" is such an esoteric feature > that no one will care, but I may be wrong. Actually it would be good if > someone tells me I am wrong -- they probably are experiencing some > overhead in class loading that we don't currently know about. > > As I mentioned earlier, if anyone is using this timer, it would be VM > engineers who work on class loading. In fact, Yumin fixed 8178349 "Cache > builtin class loader constraints to avoid re-initializing itable/vtable > for shared classes" in JDK 15, which significantly reduced the amount of > time spent during class linking. However, we didn't use this timer for > measuring the effectiveness of that fix, as the overhead and variability > are too high. 
We have reached a point in the class loading code that we > can only make small incremental improvements, and we can only measure > the effect of our changes with external profilers such as "perf stat -r > 200 bin/java -version" that launches the VM 200 times (and repeat that > 10 times) and averages the elapsed time. > > Maybe it's time for the "class linking" timer to go away completely, or > at least be disabled when CDS is enabled. It's pretty much useless. I > wish we had an established deprecation process for such legacy features. > > >> >>>> >>>> I find it hard to evaluate whether this short-circuiting of the time >>>> tracing is reasonable or not. Obviously any monitoring mechanism >>>> should impose minimal overhead compared to what is being measured, >>>> and these timers fall short in that regard. But if these stats >>>> become meaningless then they may as well be removed. >>>> >>>> I think the serviceability folk (cc'd) need to evaluate this in the >>>> context of the M&M tools. >>>> >>>>> However, it's quite expensive and it needs to start and stop a >>>>> bunch of timers. With CDS, it's quite often for the overhead of the >>>>> timer itself to be much more than the time it's trying to measure, >>>>> giving unreliable measurement. >>>>> >>>>> In this patch, when it's clear that the init and linking will be >>>>> very quick, I disable the timer and count only the number of >>>>> invocations. This shows a small improvement in start-up >>>> >>>> I'm curious if you tried to forcing EagerInitialization to be true >>>> to see how that improves the baseline. I've always noticed >>>> eager_init in the code, but hadn't realized it is disabled by default. >>>> >>> >>> I think it cannot be done by default, as it will violate the JLS. A >>> class can be initialized only when it's touched by bytecodes. 
>>
>> I'm also not sure it violates JLS as you can't directly query if a
>> class is initialized, but anyway I wasn't suggesting turning this on
>> by default, I meant only in regard to get a performance baseline to
>> compare against the changes you have implemented here.
>
> I don't think EagerInitialization will make a difference that's related
> to the timer. Both the two instances of the timer will still be used for
> exactly the same number of times, just under different call stacks.

Eager initialization doesn't use any timers.

>>
>> Aside: I have to wonder whether anyone uses EagerInitialization or
>> whether we should get rid of it?
>>
> I wonder about the same thing. I'll ask around and file an RFE if
> appropriate.
>
>>> It can also backfire as we may load many classes without initializing
>>> them. E.g., during bytecode verification, we load many classes and
>>> just check that one is a supertype of another.
>>
>> Not sure what is backfiring here ??
>
> Here's an example that both violates the JLS and backfires as it slows
> down VM start up, because <clinit> can have observable side effects:
>
> class Main {
>     static int X;
>     public static void main(String args[]) {
>         System.out.println(X);
>     }
>     void deadcode() {
>         Super s = new Child();
>         s.method();
>     }
> }
>
> class Super {
>     void method() {}
> }
>
> class Child extends Super {
>     static {
>         for (int i=0; i<1000000; i++) {
>             Main.X ++;
>         }
>     }
> }
>
> When Main is linked, its bytecodes are verified, including deadcode().
> Since deadcode() has an implicit cast of Child to Super, the verifier
> needs to load both Child and Super, and check that Child is indeed a
> subclass of Super.
>
> If EagerInitialization is enabled, Child will be initialized as soon as
> it's entered into the system dictionary. This violates

No it won't.
Eager initialization is only applicable when there is no <clinit>
and where the super classes are already initialised:

void InstanceKlass::eager_initialize(Thread *thread) {
  if (!EagerInitialization) return;

  if (this->is_not_initialized()) {
    // abort if the class has a class initializer
    if (this->class_initializer() != NULL) return;

    // abort if it is java.lang.Object (initialization is handled in genesis)
    Klass* super_klass = super();
    if (super_klass == NULL) return;

    // abort if the super class should be initialized
    if (!InstanceKlass::cast(super_klass)->is_initialized()) return;

    // call body to expose the this pointer
    eager_initialize_impl();
  }
}

David
-----

> https://docs.oracle.com/javase/specs/jls/se14/html/jls-12.html#jls-12.4.1
>
>     A class or interface type T will be initialized immediately before
>     the first occurrence of any one of the following:
>
>     T is a class and an instance of T is created.
>     A static method declared by T is invoked.
>     A static field declared by T is assigned.
>     A static field declared by T is used and the field is not a constant
>     variable (§4.12.4).
>     ....
>     A class or interface will not be initialized under any other
>     circumstance.
>
>
> Also, app developers will generally expect 0 to be printed. So
> EagerInitialization will probably break apps in subtle ways.
>
> And initializing Child may recursively load in more classes during the
> verification of Child ......
>
> Thanks
> - Ioi
>
>>
>> Thanks,
>> David
>> -----
>>
>>> Thanks
>>> - Ioi
>>>
>>>> Cheers,
>>>> David
>>>> -----
>>>>
>>>>> Results of " perf stat -r 100 bin/java -Xshare:on
>>>>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version "
>>>>>
>>>>> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) -
>>>>> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) --
>>>>> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) -
>>>>> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) -----
>>>>> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ----
>>>>> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) ---
>>>>> 59608782 59340173 (-268609)   ----   41.198  41.508 (  0.310) +
>>>>> 59614612 59325177 (-289435)   -----  41.397  41.738 (  0.341) ++
>>>>> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ----
>>>>> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) ---
>>>>> ================================================
>>>>> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) --
>>>>> instruction delta =      -284608    -0.4774%
>>>>> time        delta =       -0.434 ms -1.0435%
>>>>>
>>>>> The number of PerfClassTraceTime's used is reduced from 564 to 116
>>>>> (so we have an overhead of about 715 instructions per use, yikes!).
>>>
>

From david.holmes at oracle.com Thu Jun 18 05:29:44 2020
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 18 Jun 2020 15:29:44 +1000
Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
In-Reply-To: 
References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com>
 <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com>
 <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com>
Message-ID: 

On 18/06/2020 3:13 pm, Chris Plummer wrote:
> On 6/17/20 10:09 PM, David Holmes wrote:
>> On 18/06/2020 2:33 pm, Chris Plummer wrote:
>>> On 6/17/20 7:43 PM, David Holmes wrote:
>>>> Hi Chris,
>>>>
>>>> On 18/06/2020 6:34 am, Chris Plummer wrote:
>>>>> Hello,
>>>>>
>>>>> Please help review the following:
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533
>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html
>>>>>
>>>>> The CR contains all the needed details. Here's a summary of changes
>>>>> in each file:
>>>>
>>>> The problem sounds to me like a variation of the more general
>>>> problem of not ensuring a thread is kept alive whilst acting upon
>>>> it.
I don't know how the SA finds these references to the threads it >>>> is going to stackwalk, but is it possible to fix this via >>>> appropriate uses of ThreadsListHandle/Iterator? >>> It fetches ThreadsSMRSupport::_java_thread_list. >>> >>> Keep in mind that once SA attaches, nothing in the VM changes. For >>> example, SA can't create a wrapper to a JavaThread, only to have the >>> JavaThread be freed later on. It's just not possible. >> >> Then how does it obtain a reference to a JavaThread for which the >> native OS thread id is invalid? Any thread found in _java_thread_list >> is either live or still to be started. In the latter case the >> JavaThread->osThread does not have its thread_id set yet. >> > My assumption was that the JavaThread is in the process of being > destroyed, and it has freed its OS thread but is itself still in the > thread list. I did notice that the OS thread id being used looked to be > in the range of thread id #'s you would expect for the running app, so > that to me indicated it was once valid, but is no more. > > Keep in mind that although hotspot may have synchronization code that > prevents you from pulling a JavaThread off the thread list when it is in > the process of being destroyed (I'm guessing it does), SA has no such > protections. But you stated that once the SA has attached, the target VM can't change. If the SA gets its set of thread from one attach then tries to make queries about those threads in a separate attach, then obviously it could be providing garbage thread information. So you would need to re-validate the JavaThread in the target VM before trying to do anything with it. 
David ----- > Chris >> David >> ----- >> >>> Chris >>>> >>>> Cheers, >>>> David >>>> >>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>> -Instead of throwing an exception when the OS ThreadID is invalid, >>>>> print a warning. >>>>> >>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>> -Improve a print_debug message >>>>> >>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>> >>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>> >>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>> >>>>> -Deal with the array of registers read in being null due to the OS >>>>> ThreadID not being valid. >>>>> >>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>> >>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>> >>>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" >>>>> appearing twice when printing the exception. 
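The "registers array may be null" item in the summary above can be pictured with a small Java sketch. It does not reproduce the webrev: RegisterReader, fetchRegisters() and contextFor() are hypothetical names, and returning an empty array with a warning for an invalid thread id is only an assumption about one reasonable way to degrade.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the SA classes touched by the webrev. */
class RegisterReader {
    /** Stand-in for the native lookup; null means the OS thread id was not valid. */
    static long[] fetchRegisters(int lwpId) {
        return lwpId > 0 ? new long[] {0x10L, 0x20L} : null;
    }

    /** Returns a copy of the registers, or an empty array instead of failing. */
    static long[] contextFor(int lwpId) {
        long[] regs = fetchRegisters(lwpId);
        if (regs == null) {
            // Assumption: warn and continue rather than abort the whole stack walk.
            System.err.println("warning: no registers for lwp " + lwpId);
            return new long[0];
        }
        return Arrays.copyOf(regs, regs.length);
    }
}
```

With this shape a caller can skip the thread whose registers are missing and keep walking the remaining threads.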
>>>>>
>>>>> thanks,
>>>>>
>>>>> Chris
>>>
>

From suenaga at oss.nttdata.com Thu Jun 18 05:47:21 2020
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Thu, 18 Jun 2020 14:47:21 +0900
Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information
In-Reply-To: <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com>
References: <9c2060bf-6661-8835-06e6-16d1803c3753@oracle.com>
 <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com>
 <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com>
 <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com>
 <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com>
 <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com>
 <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com>
 <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com>
 <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com>
 <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com>
 <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com>
 <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com>
 <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com>
Message-ID: <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com>

Hi David,

Both ThreadsListHandle and ResourceMark use `Thread::current()` for their resource.
It is set as the default parameter in the c'tor. Do you mean we should pass it explicitly in the c'tor?


Thanks,

Yasumasa


On 2020/06/18 13:58, David Holmes wrote:
> Hi Yasumasa,
>
> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>> Hi Serguei,
>>
>> Thanks for your comment!
>> I uploaded new webrev:
>>
>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>
>> I'm not sure the following change is correct.
>> Can we assume owning_thread is not NULL at safepoint?
>
> We can if "owner != NULL". So that change seems fine to me.
>
> But given this is now only executed at a safepoint there are additional simplifications that can be made:
>
> - current thread determination can be simplified:
>
>  945   Thread* current_thread = Thread::current();
>
> becomes:
>
>    Thread* current_thread = VMThread::vm_thread();
>    assert(current_thread == Thread::current(), "must be");
>
> - these comments can be removed
>
>  994       // Use current thread since function can be called from a
>  995       // JavaThread or the VMThread.
> 1053       // Use current thread since function can be called from a
> 1054       // JavaThread or the VMThread.
>
> - these TLH constructions should be passing current_thread (existing bug)
>
>  996       ThreadsListHandle tlh;
> 1055       ThreadsListHandle tlh;
>
> - All ResourceMarks should be passing current_thread (existing bug)
>
>
> Aside: there is a major inconsistency between the spec and implementation for this method. I've traced the history to see how this came about from JVMDI (ref JDK-4546581) but it never resulted in the JVM TI specification clearly stating what the waiters/waiter_count means. I will file a bug to have the spec clarified to match the implementation (even though I think the implementation is what is wrong). :(
>
> Thanks,
> David
> -----
>
>> All tests on submit repo and serviceability/jvmti and vmTestbase/nsk/jvmti have passed with this change.
>>
>>
>> ```
>>         // This monitor is owned so we have to find the owning JavaThread.
>>         owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>> -      // Cannot assume (owning_thread != NULL) here because this function
>> -      // may not have been called at a safepoint and the owning_thread
>> -      // might not be suspended.
>> -      if (owning_thread != NULL) {
>> -        // The monitor's owner either has to be the current thread, at safepoint
>> -        // or it has to be suspended. Any of these conditions will prevent both
>> -        // contending and waiting threads from modifying the state of
>> -        // the monitor.
>> -        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>> -          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
>> -          // will not make it back to the JVM/TI agent. The error code will
>> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
>> -          // will retry the call via a VM_GetObjectMonitorUsage VM op.
>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>> -        }
>> -        HandleMark hm;
>> +      assert(owning_thread != NULL, "owning JavaThread must not be NULL");
>>           Handle     th(current_thread, owning_thread->threadObj());
>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>
>> ```
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>> Hi Yasumasa,
>>>
>>> This fix is not enough.
>>> The function JvmtiEnvBase::get_object_monitor_usage works in two modes: in VMop and non-VMop.
>>> The non-VMop mode has to be removed.
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>> (Change subject for RFR)
>>>>
>>>> Hi,
>>>>
>>>> I filed it to JBS and uploaded a webrev for it.
>>>> Could you review it?
>>>>
>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>
>>>> This change has passed tests on submit repo.
>>>> Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>> Yes. It seems we have a consensus.
>>>>> Thank you for taking care of it.
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>
>>>>>> Go for it! :)
>>>>>>
>>>>>> Cheers,
>>>>>> David
>>>>>>
>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>>>>> Hi Dan, David and Yasumasa,
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/16/20 07:39, Daniel D.
Daugherty wrote:
>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is not suspended:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>>>>    JavaThread* calling_thread = JavaThread::current();
>>>>>>>>>>>>>>>>>>    jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>    if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>>>      // Some of the critical threads were not suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>>>      VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>      VMThread::execute(&op);
>>>>>>>>>>>>>>>>>>      err = op.result();
>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>    return err;
>>>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage().
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently.
>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be changed to others in subsequent code.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor before [2].
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think the owner thread might resume unfortunately after suspending check.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Which code is wrong?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller
>>>>>>>>>>>>>>> has started the process of gathering the information while not at a
>>>>>>>>>>>>>>> safepoint. Thus the information returned by GetObjectMonitorUsage()
>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The code tries to make sure that it either collects data about a monitor owned by a thread that is suspended, or else it collects that data at a safepoint. But the owning thread can be resumed just after the code determined it was suspended. The monitor can then be released and the information gathered not only stale but potentially completely wrong as it could now be owned by a different thread and will report that thread's entry count.
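
The race described in the paragraph above can be sketched with a toy model. This is not HotSpot code: `ToyMonitor`, `racyQuery` and `atomicQuery` are hypothetical names introduced only for illustration, and the "resume, release, re-acquire" step is simulated by a callback that runs between the two reads. Under those assumptions it shows how a query that reads the owner and the entry count separately can report a pair that never existed together.

```java
/** Toy model only (not HotSpot code): a non-atomic query can pair one owner with another owner's entry count. */
public class TornSnapshotDemo {
    /** Hypothetical stand-in for an object monitor: an owner name plus its entry count. */
    static final class ToyMonitor {
        private String owner;
        private int entryCount;

        synchronized void acquire(String thread, int count) {
            owner = thread;
            entryCount = count;
        }

        synchronized String owner() { return owner; }

        synchronized int entryCount() { return entryCount; }

        /** Reads both fields under one lock, so the pair is a real snapshot. */
        synchronized String atomicQuery() { return owner + ":" + entryCount; }
    }

    /** Reads the owner, lets the interleaved action run, then reads the count. */
    static String racyQuery(ToyMonitor m, Runnable interleaved) {
        String owner = m.owner();      // step 1: observe the owner
        interleaved.run();             // simulated: owner released, another thread entered
        int count = m.entryCount();    // step 2: this count belongs to the new owner
        return owner + ":" + count;
    }

    public static void main(String[] args) {
        ToyMonitor m = new ToyMonitor();
        m.acquire("A", 3);
        // Between the two reads, "B" takes over the monitor with entry count 1.
        String racy = racyQuery(m, () -> m.acquire("B", 1));
        System.out.println("racy   = " + racy);             // A:1, a state that never existed
        System.out.println("atomic = " + m.atomicQuery());  // B:1
    }
}
```

The lock in `atomicQuery` plays the role that a safepoint plays in the discussion: nothing else can change the monitor while both fields are read.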
>>>>>>>>>>>>> >>>>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information >>>>>>>>>>>>> can be stale. In fact as soon as the implementation returns >>>>>>>>>>>>> from the safepoint that gathered the info, the target thread >>>>>>>>>>>>> could have moved on. >>>>>>>>>>>> >>>>>>>>>>>> That isn't the issue. That the info is stale is fine. But the expectation is that the information was actually an accurate snapshot of the state of the monitor at some point in time. The current code does not ensure that. >>>>>>>>>>> >>>>>>>>>>> Please explain. I clearly don't understand why you think the info >>>>>>>>>>> returned isn't "an accurate snapshot of the state of the monitor >>>>>>>>>>> at some point in time". >>>>>>>>>> >>>>>>>>>> Because it may not be a "snapshot" at all. There is no atomicity**. The reported owner thread may not own it any longer when the entry count is read, so straight away you may have the wrong entry count information. The set of threads trying to acquire the monitor, or wait on the monitor can change in unexpected ways. It would be possible for instance to report the same thread as being the owner, being blocked trying to enter the monitor, and being in the wait-set of the monitor - apparently all at the same time! >>>>>>>>>> >>>>>>>>>> ** even if the owner is suspended we don't have complete atomicity because threads can join the set of threads trying to enter the monitor (unless they are all suspended). >>>>>>>>> >>>>>>>>> Consider the case when the monitor's owner is _not_ suspended: >>>>>>>>> >>>>>>>>> ? - GetObjectMonitorUsage() uses a safepoint to gather the info about >>>>>>>>> ??? the object's monitor. Since we're at a safepoint, the info that >>>>>>>>> ??? we are gathering cannot change until we return from the safepoint. >>>>>>>>> ??? It is a snapshot and a valid one at that. 
>>>>>>>>> >>>>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>>>> >>>>>>>>> ? - GetObjectMonitorUsage() will gather info about the object's >>>>>>>>> ??? monitor while _not_ at a safepoint. Assuming that no other >>>>>>>>> ??? thread is suspended, then entry_count can change because >>>>>>>>> ??? another thread can block on entry while we are gathering >>>>>>>>> ??? info. waiter_count and waiters can change if a thread was >>>>>>>>> ??? in a timed wait that has timed out and now that thread is >>>>>>>>> ??? blocked on re-entry. I don't think that notify_waiter_count >>>>>>>>> ??? and notify_waiters can change. >>>>>>>>> >>>>>>>>> ??? So in this case, the owner info and notify info is stable, >>>>>>>>> ??? but the entry_count and waiter info is not stable. >>>>>>>>> >>>>>>>>> Consider the case when the monitor is not owned: >>>>>>>>> >>>>>>>>> ? - GetObjectMonitorUsage() will start to gather info about the >>>>>>>>> ??? object's monitor while _not_ at a safepoint. If it finds a >>>>>>>>> ??? thread on the entry queue that is not suspended, then it will >>>>>>>>> ??? bail out and redo the info gather at a safepoint. I just >>>>>>>>> ??? noticed that it doesn't check for suspension for the threads >>>>>>>>> ??? on the waiters list so a timed Object.wait() call can cause >>>>>>>>> ??? some confusion here. >>>>>>>>> >>>>>>>>> ??? So in this case, the owner info is not stable if a thread >>>>>>>>> ??? comes out of a timed wait and reenters the monitor. This >>>>>>>>> ??? case is no different than if a "barger" thread comes in >>>>>>>>> ??? after the NULL owner field is observed and enters the >>>>>>>>> ??? monitor. We'll return that there is no owner, a list of >>>>>>>>> ??? suspended pending entry thread and a list of waiting >>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>> ??? owned by the "barger" that completely bypassed the entry >>>>>>>>> ??? queue by virtue of seeing the NULL owner field at exactly >>>>>>>>> ??? 
the right time. >>>>>>>>> >>>>>>>>> So the owner field is only stable when we have an owner. If >>>>>>>>> that owner is not suspended, then the other fields are also >>>>>>>>> stable because we gathered the info at a safepoint. If the >>>>>>>>> owner is suspended, then the owner and notify info is stable, >>>>>>>>> but the entry_count and waiter info is not stable. >>>>>>>>> >>>>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>>>> That's deterministic, but not without some work. >>>>>>>>> >>>>>>>>> >>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>>>>> do that at a safepoint when the owner thread is not suspended >>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>> the different pieces of info is unstable to varying degrees. >>>>>>>>> >>>>>>>>> As for this claim: >>>>>>>>> >>>>>>>>>> It would be possible for instance to report the same thread >>>>>>>>>> as being the owner, being blocked trying to enter the monitor, >>>>>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>>>>> the same time! >>>>>>>>> >>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>>>>> then be seen as on the entry queue or on the wait queue since >>>>>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>> wait queue. >>>>>>>>> >>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>> quite that bad. :-) (That is a small mercy.) 
>>>>>>>>> >>>>>>>>> >>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>> used. Handshaking with all threads won't make the data as >>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>> will still be field instability. >>>>>>>>> >>>>>>>>> >>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>> >>>>>>>> I agree with this. >>>>>>>> The advantages are: >>>>>>>> ??- the result is stable >>>>>>>> ??- the implementation can be simplified >>>>>>>> >>>>>>>> Performance impact is not very clear but should not be that >>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>> I'm not sure if using handshakes can make performance better. >>>>>>> >>>>>>> Ok, may I file it to JBS and fix it? >>>>>>> >>>>>>> Yasumasa >>>>>>> >>>>>>> >>>>>>>> Thanks, >>>>>>>> Serguei >>>>>>>> >>>>>>>>> Dan >>>>>>>>> >>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> The only way to make sure you don't have stale information is >>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc >>>>>>>>>>>>> should have more clear about the possibility of returning stale >>>>>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's being suspended so I can't see how this could be construed as an agent bug. 
>>>>>>>>>>>>> >>>>>>>>>>>>> In your scenario above, you mention that the target thread was >>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>>>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>>> >>>>>>>>>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent should not >>>>>>>>>>>>> resume the target thread while also calling GetObjectMonitorUsage(). >>>>>>>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>>>>>>> >>>>>>>>>>>> Perhaps. I was thinking more generally about an independent resume, but you're right that doesn't really make a lot of sense. But when the spec says nothing about suspension ... >>>>>>>>>>> >>>>>>>>>>> And it is intentional that suspension is not required. JVM/DI and JVM/PI >>>>>>>>>>> used to require suspension for these kinds of get-the-info APIs. JVM/TI >>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>> >>>>>>>>>>> As I've said before, we could add a note about the data being potentially >>>>>>>>>>> stale unless SuspendThread is used. I think of it like stat(2). You can >>>>>>>>>>> fetch the file's info, but there's no guarantee that the info is current >>>>>>>>>>> by the time you process what you got back. Is it too much motherhood to >>>>>>>>>>> state that the data might be stale? I could go either way... >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>> Using a handshake on the owner thread will allow this to be fixed in the future without forcing/using any safepoints. >>>>>>>>>>>>> >>>>>>>>>>>>> I have to think about that which is why I'm avoiding talking about >>>>>>>>>>>>> handshakes in this thread. 
>>>>>>>>>>>> >>>>>>>>>>>> Effectively the handshake can "suspend" the thread whilst the monitor is queried. In effect the operation would create a per-thread safepoint. >>>>>>>>>>> >>>>>>>>>>> I "know" that, but I still need time to think about it and probably >>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Semantically it is no different to the code actually suspending the owner thread, but it can't actually do that because suspends/resume don't nest. >>>>>>>>>>> >>>>>>>>>>> Yeah... we used have a suspend count back when we tracked internal and >>>>>>>>>>> external suspends separately. That was a nightmare... >>>>>>>>>>> >>>>>>>>>>> Dan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> David >>>>>>>>>>>> >>>>>>>>>>>>> Dan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> David >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread state, it returns `true` when the thread is sleeping [3], or when it performs in native [4]. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI spec - it just says "Get information about the object's monitor." >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, nothing to do with the spec. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect information in some case. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner might be just before wakeup. 
>>>>>>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is called at safepoint in any case. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>> [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ??owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>> >>> From ioi.lam at oracle.com Thu Jun 18 05:57:21 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Wed, 17 Jun 2020 22:57:21 -0700 Subject: RFR(S) 8246019 PerfClassTraceTime slows down VM start-up In-Reply-To: References: <3ed0b32d-469e-6f89-f4b3-b78d5bcce700@oracle.com> <20411ebb-5b06-83da-07c8-b751252efecd@oracle.com> <91d7ab26-cccb-8d9c-8929-5b37664f6313@oracle.com> Message-ID: <0c5147de-f8d5-b069-b76b-9c5b0ce15457@oracle.com> On 6/17/20 10:16 PM, David Holmes wrote: > Hi Ioi, > > Re EagerInitialize ... skip to bottom part ... > > On 18/06/2020 2:34 pm, Ioi Lam wrote: >> >> >> On 6/17/20 7:38 PM, David Holmes wrote: >>> Hi Ioi, >>> >>> On 17/06/2020 1:19 pm, Ioi Lam wrote: >>>> On 6/16/20 6:20 PM, David Holmes wrote: >>>>> Hi Ioi, >>>>> >>>>> On 17/06/2020 6:14 am, Ioi Lam wrote: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8246019 >>>>>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/ >>>>>> >>>>>> >>>>>> PerfClassTraceTime is (a rarely used feature) for measuring the >>>>>> time spent during class linking and initialization. >>>>> >>>>> "A special command jcmd PerfCounter.print >>>>> prints all performance counters in the process." >>>>> >>>>> How do you know this is a "rarely used feature"? >>>> Hi David, >>>> >>>> Sure, the counter will be dumped, but by "rarely used" -- I mean no >>>> one will find this particular counter useful, and no one will be >>>> actively looking at it. 
>>>>
>>>> I changed two parts of the code -- class init and class linking.
>>>>
>>>> For class initialization, the counter may be useful for people who
>>>> want to know how much time is spent in their <clinit> functions,
>>>> and my patch doesn't change that. It only avoids using the counter
>>>> when a class has no <clinit>, i.e., we know that the counter counts
>>>> nothing (except for a logging statement).
>>>
>>> I can see where you are coming from here. Today we keep track of the
>>> time taken to mark a class as initialized when there is no clinit to
>>> execute, and we actually record pure timer overhead as it dwarfs the
>>> simple update to the class state. With your change we now won't
>>> track the time taken to mark the class as initialized. In both cases
>>> the time recorded is inaccurate - in opposite senses. In that regard
>>> your slight underestimation of the class initialization cost seems
>>> better than the present over-estimate.
>>>
>>>> =====
>>>>
>>>> For class linking, no user code is executed, so it only measures VM
>>>> code. If it's useful for anyone, that would be VM engineers like me
>>>> who are trying to optimize the speed of class loading. However, due
>>>> to the overhead of the counter vs what it's trying to measure, the
>>>> results are pretty meaningless.
>>>>
>>>> Note that I've not disabled the counter altogether. Instead, I
>>>> disable it only when linking a CDS shared class, and we know that
>>>> very little is happening for this class (e.g., no verification).
>>>
>>> Yes I have little concern for this part. Linking is a multi-phase
>>> process so "time for linking" is already ill-defined. And the fact
>>> you only do it for CDS makes it even less interesting.
>>>
>>>> I think the class linking timer might have been useful 15 years ago
>>>> when it was introduced, or it might be useful today when CDS is
>>>> disabled. But with CDS enabled, we are paying a constant price that
>>>> seems to benefit no one.
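
The idea of skipping the timer when the work is known to be trivial, while still counting invocations, can be sketched as follows. This is a hedged illustration in Java, not the HotSpot patch: `Counters`, `trace` and the `worthTiming` flag are hypothetical names, and the split between timed and untimed events in `main` is made up.

```java
/** Illustration only: count every event, but pay for clock reads only when the work may be significant. */
public class SelectiveTimerDemo {
    /** Hypothetical counters, loosely analogous to an invocation counter plus a time counter. */
    static final class Counters {
        long invocations;   // incremented for every event
        long timedEvents;   // events that actually used the timer
        long elapsedNanos;  // accumulated only for timed events
    }

    /** Runs the work; starts the timer only if the caller says the work is worth timing. */
    static void trace(Counters c, boolean worthTiming, Runnable work) {
        c.invocations++;
        if (!worthTiming) {
            work.run();     // fast path: no clock reads at all
            return;
        }
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            c.elapsedNanos += System.nanoTime() - start;
            c.timedEvents++;
        }
    }

    public static void main(String[] args) {
        Counters c = new Counters();
        for (int i = 0; i < 10; i++) {
            boolean hasInitializer = i < 3;   // made-up split, for illustration only
            trace(c, hasInitializer, () -> { });
        }
        System.out.println("invocations = " + c.invocations);
        System.out.println("timedEvents = " + c.timedEvents);
    }
}
```

The trade-off is the one debated in the thread: the untimed events contribute nothing to the elapsed total, so the reported time becomes a slight underestimate instead of being dominated by timer overhead.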
>>>> >>>> I think we should short-circuit it when it seems appropriate. If >>>> this indeed causes problems for our users, it's easy to re-enable >>>> it. That's better than just keeping this forever just because we're >>>> afraid to touch anything. >>> >>> I'm uncomfortable with both the "keep forever as we're too afraid to >>> change" and "change it now and restore it if anyone complains" ends >>> of this spectrum. Obviously we need to make progress, but the >>> "change it now and change back later if needed" is a bit naive, as >>> once any change is made we can't change back without affecting >>> another set of users, and we don't know how long it will be before >>> the change reaches users and the problems return to us. From a CSR >>> perspective I want to see that due diligence has been applied with >>> regard to these behavioural changes, and JDK engineers are often not >>> in a position to understand how end users use this kind of >>> information. I don't have a solution for that general problem. >>> >>> In this particular case I think under-estimating the class >>> initialization overhead is better than the present over-estimate. >>> Though anyone tracking the trends here will be surprised when the >>> cost suddenly drops. >> >> Hi David, >> >> I don't have a solution, either. I am in no hurry, and the >> improvement is minor. >> >> I could post a CSR and let it stand for a few months to see if anyone >> objects. My impression is "linking time" is such an esoteric feature >> that no one will care, but I may be wrong. Actually it would be good >> if someone tells me I am wrong -- they probably are experiencing some >> overhead in class loading that we don't currently know about. >> >> As I mentioned earlier, if anyone is using this timer, it would be VM >> engineers who work on class loading. 
In fact, Yumin fixed 8178349 >> "Cache builtin class loader constraints to avoid re-initializing >> itable/vtable for shared classes" in JDK 15, which significantly >> reduced the amount of time spent during class linking. However, we >> didn't use this timer for measuring the effectiveness of that fix, as >> the overhead and variability are too high. We have reached a point in >> the class loading code that we can only make small incremental >> improvements, and we can only measure the effect of our changes with >> external profilers such as "perf stat -r 200 bin/java -version" that >> launches the VM 200 times (and repeat that 10 times) and averages the >> elapsed time. >> >> Maybe it's time for the "class linking" timer to go away completely, >> or at least be disabled when CDS is enabled. It's pretty much >> useless. I wish we had an established deprecation process for such >> legacy features. >> >> >>> >>>>> >>>>> I find it hard to evaluate whether this short-circuiting of the >>>>> time tracing is reasonable or not. Obviously any monitoring >>>>> mechanism should impose minimal overhead compared to what is being >>>>> measured, and these timers fall short in that regard. But if these >>>>> stats become meaningless then they may as well be removed. >>>>> >>>>> I think the serviceability folk (cc'd) need to evaluate this in >>>>> the context of the M&M tools. >>>>> >>>>>> However, it's quite expensive and it needs to start and stop a >>>>>> bunch of timers. With CDS, it's quite often for the overhead of >>>>>> the timer itself to be much more than the time it's trying to >>>>>> measure, giving unreliable measurement. >>>>>> >>>>>> In this patch, when it's clear that the init and linking will be >>>>>> very quick, I disable the timer and count only the number of >>>>>> invocations. This shows a small improvement in start-up >>>>> >>>>> I'm curious if you tried to forcing EagerInitialization to be true >>>>> to see how that improves the baseline. 
I've always noticed
>>>>> eager_init in the code, but hadn't realized it is disabled by
>>>>> default.
>>>>>
>>>>
>>>> I think it cannot be done by default, as it will violate the JLS. A
>>>> class can be initialized only when it's touched by bytecodes.
>>>
>>> I'm also not sure it violates JLS as you can't directly query if a
>>> class is initialized, but anyway I wasn't suggesting turning this on
>>> by default, I meant only in regard to get a performance baseline to
>>> compare against the changes you have implemented here.
>>
>> I don't think EagerInitialization will make a difference that's
>> related to the timer. Both the two instances of the timer will still
>> be used for exactly the same number of times, just under different
>> call stacks.
>
> Eager initialization doesn't use any timers.
>
>>>
>>> Aside: I have to wonder whether anyone uses EagerInitialization or
>>> whether we should get rid of it?
>>>
>> I wonder about the same thing. I'll ask around and file an RFE if
>> appropriate.
>>
>>>> It can also backfire as we may load many classes without
>>>> initializing them. E.g., during bytecode verification, we load many
>>>> classes and just check that one is a supertype of another.
>>>
>>> Not sure what is backfiring here ??
>>
>> Here's an example that both violates the JLS and backfires as it
>> slows down VM start up, because <clinit> can have observable side
>> effects:
>>
>> class Main {
>>      static int X;
>>      public static void main(String args[]) {
>>          System.out.println(X);
>>      }
>>      void deadcode() {
>>          Super s = new Child();
>>          s.method();
>>      }
>> }
>>
>> class Super {
>>      void method() {}
>> }
>>
>> class Child extends Super {
>>      static {
>>          for (int i=0; i<1000000; i++) {
>>              Main.X ++;
>>          }
>>      }
>> }
>>
>> When Main is linked, its bytecodes are verified, including
>> deadcode().
Since deadcode() has an implicit cast of Child to Super,
>> the verifier needs to load both Child and Super, and check that Child
>> is indeed a subclass of Super.
>>
>> If EagerInitialization is enabled, Child will be initialized as soon
>> as it's entered into the system dictionary. This violates
>
> No it won't. Eager initialization is only applicable when there is no
> <clinit> and where the super classes are already initialised:
>
> void InstanceKlass::eager_initialize(Thread *thread) {
>   if (!EagerInitialization) return;
>
>   if (this->is_not_initialized()) {
>     // abort if the class has a class initializer
>     if (this->class_initializer() != NULL) return;
>
>     // abort if it is java.lang.Object (initialization is handled in genesis)
>     Klass* super_klass = super();
>     if (super_klass == NULL) return;
>
>     // abort if the super class should be initialized
>     if (!InstanceKlass::cast(super_klass)->is_initialized()) return;
>
>     // call body to expose the this pointer
>     eager_initialize_impl();
>   }
> }
>
> David
> -----
>

Ah, I didn't see that. I should have read the code first :-)

Anyway, Eager initialization will trigger verification which may cause more classes to be loaded (and initialized if they don't have <clinit>, and so forth). So it's not equivalent to, and is hard to compare with, my proposed changes for skipping the timer.

Thanks
- Ioi

>> https://docs.oracle.com/javase/specs/jls/se14/html/jls-12.html#jls-12.4.1
>>
>>
>>     A class or interface type T will be initialized immediately before
>>     the first occurrence of any one of the following:
>>
>>     T is a class and an instance of T is created.
>>     A static method declared by T is invoked.
>>     A static field declared by T is assigned.
>>     A static field declared by T is used and the field is not a constant
>>     variable (§4.12.4).
>>     ....
>>     A class or interface will not be initialized under any other
>>     circumstance.
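
The initialization triggers quoted above can be observed directly. The following is a minimal sketch with hypothetical names (`LazyInitDemo`, `childInitialized`, `FIRST_OBSERVATION`), assuming default JVM settings (in particular, EagerInitialization left off). It relies on the fact that mentioning a class through a class literal is not one of the listed triggers, while creating an instance is.

```java
/** Illustration only: a static initializer runs on first instance creation, not when the class is merely mentioned. */
public class LazyInitDemo {
    /** Set by Child's static initializer, i.e. by its <clinit>. Declared first so it exists before the observation runs. */
    static boolean childInitialized;

    /** Computed once, so the result does not depend on how often it is read. */
    static final boolean[] FIRST_OBSERVATION = observeOnce();

    static class Super {
        void method() { }
    }

    static class Child extends Super {
        static {
            childInitialized = true;   // observable side effect of <clinit>
        }
    }

    /** Returns {flag after a class literal, flag after instance creation}. */
    private static boolean[] observeOnce() {
        Class<?> k = Child.class;                 // class literal: loads Child, does not initialize it
        boolean afterLiteral = childInitialized;
        Super s = new Child();                    // instance creation triggers initialization
        s.method();
        boolean afterNew = childInitialized;
        return new boolean[] { afterLiteral, afterNew };
    }

    public static void main(String[] args) {
        System.out.println("after class literal: " + FIRST_OBSERVATION[0]);
        System.out.println("after new Child():   " + FIRST_OBSERVATION[1]);
    }
}
```

With the flag left at its default, the first value should be false and the second true, which is the behaviour application developers tend to rely on.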
>>
>>
>> Also, app developers will generally expect 0 to be printed. So
>> EagerInitialization will probably break apps in subtle ways.
>>
>> And initializing Child may recursively load in more classes during
>> the verification of Child ......
>>
>> Thanks
>> - Ioi
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>> Cheers,
>>>>> David
>>>>> -----
>>>>>
>>>>>> Results of " perf stat -r 100 bin/java -Xshare:on
>>>>>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version "
>>>>>>
>>>>>> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) -
>>>>>> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) --
>>>>>> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) -
>>>>>> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) -----
>>>>>> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ----
>>>>>> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) ---
>>>>>> 59608782 59340173 (-268609)   ----   41.198  41.508 ( 0.310) +
>>>>>> 59614612 59325177 (-289435)   -----  41.397  41.738 ( 0.341) ++
>>>>>> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ----
>>>>>> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) ---
>>>>>> ================================================
>>>>>> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) --
>>>>>> instruction delta =      -284608    -0.4774%
>>>>>> time        delta =       -0.434 ms -1.0435%
>>>>>>
>>>>>> The number of PerfClassTraceTime's used is reduced from 564 to
>>>>>> 116 (so we have an overhead of about 715 instructions per use,
>>>>>> yikes!).
>>>> >> From chris.plummer at oracle.com Thu Jun 18 06:49:05 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 17 Jun 2020 23:49:05 -0700 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> Message-ID: <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> On 6/17/20 10:29 PM, David Holmes wrote: > On 18/06/2020 3:13 pm, Chris Plummer wrote: >> On 6/17/20 10:09 PM, David Holmes wrote: >>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>> Hi Chris, >>>>> >>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>> Hello, >>>>>> >>>>>> Please help review the following: >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>> >>>>>> The CR contains all the needed details. Here's a summary of >>>>>> changes in each file: >>>>> >>>>> The problem sounds to me like a variation of the more general >>>>> problem of not ensuring a thread is kept alive whilst acting upon >>>>> it. I don't know how the SA finds these references to the threads >>>>> it is going to stackwalk, but is it possible to fix this via >>>>> appropriate uses of ThreadsListHandle/Iterator? >>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>> >>>> Keep in mind that once SA attaches, nothing in the VM changes. For >>>> example, SA can't create a wrapper to a JavaThread, only to have >>>> the JavaThread be freed later on. It's just not possible. >>> >>> Then how does it obtain a reference to a JavaThread for which the >>> native OS thread id is invalid? Any thread found in >>> _java_thread_list is either live or still to be started. 
In the >>> latter case the JavaThread->osThread does not have its thread_id set >>> yet. >>> >> My assumption was that the JavaThread is in the process of being >> destroyed, and it has freed its OS thread but is itself still in the >> thread list. I did notice that the OS thread id being used looked to >> be in the range of thread id #'s you would expect for the running >> app, so that to me indicated it was once valid, but is no more. >> >> Keep in mind that although hotspot may have synchronization code that >> prevents you from pulling a JavaThread off the thread list when it is >> in the process of being destroyed (I'm guessing it does), SA has no >> such protections. > > But you stated that once the SA has attached, the target VM can't > change. If the SA gets its set of threads from one attach then tries to > make queries about those threads in a separate attach, then obviously > it could be providing garbage thread information. So you would need to > re-validate the JavaThread in the target VM before trying to do > anything with it. That's not what is going on here. It's attaching and doing a stack trace, which involves getting the thread list and iterating through all threads without detaching. Also, even if you are using something like clhsdb to issue commands on addresses, if the address is no longer valid for the command you are executing, then you would get the appropriate error when there is an attempt to create a wrapper for it. I don't know of any command that operates directly on a JavaThread, but I think there are some for InstanceKlass. So if you remembered the address of an InstanceKlass, and then reattached and tried a command that takes an InstanceKlass address, you would get an exception when SA tries to create the wrapper for the InstanceKlass if it were no longer a valid address for one.
Chris > > David > ----- > >> Chris >>> David >>> ----- >>> >>>> Chris >>>>> >>>>> Cheers, >>>>> David >>>>> >>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>> -Instead of throwing an exception when the OS ThreadID is >>>>>> invalid, print a warning. >>>>>> >>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>> -Improve a print_debug message >>>>>> >>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>> >>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>> >>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>> >>>>>> -Deal with the array of registers read in being null due to the >>>>>> OS ThreadID not being valid. >>>>>> >>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>> >>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>> >>>>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" >>>>>> appearing twice when printing the exception. >>>>>> >>>>>> thanks, >>>>>> >>>>>> Chris >>>> >> From stefan.karlsson at oracle.com Thu Jun 18 07:25:37 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 18 Jun 2020 09:25:37 +0200 Subject: RFR 8247808: Move JVMTI strong oops to OopStorage In-Reply-To: References: Message-ID: <94da70e9-2c61-a97d-e31a-6e0ec4478faa@oracle.com> Hi Coleen, On 2020-06-17 23:25, coleen.phillimore at oracle.com wrote: > Summary: Remove JVMTI oops_do calls from JVMTI and GCs > > Tested with tier1-3, also built shenandoah to verify shenandoah changes. 
>
> open webrev at http://cr.openjdk.java.net/~coleenp/2020/8247808.01/webrev

https://cr.openjdk.java.net/~coleenp/2020/8247808.01/webrev/src/hotspot/share/prims/jvmtiImpl.cpp.udiff.html

 JvmtiBreakpoint::~JvmtiBreakpoint() {
-  if (_class_holder != NULL) {
-    NativeAccess<>::oop_store(_class_holder, (oop)NULL);
-    OopStorageSet::vm_global()->release(_class_holder);
+  if (_class_holder.resolve() != NULL) {
+    _class_holder.release();
   }
 }

Could this be changed to peek() / release() instead? The resolve() call
is going to keep the object alive until the next ZGC marking cycle.

The rest looks OK. Below are some comments about things that I find odd
and non-obvious from reading the code, and that may be potential
cleanups to make it easier for the next reader to understand the code:

The above code assumes that as soon as OopHandle::create has been
called, we won't store NULL into the _obj pointer. If someone does, then
we would leak the memory. OopHandle has a function ptr_raw that allows
someone to clear the _obj pointer. I have to assume that this function
isn't used in this code.

---

 214 void JvmtiBreakpoint::copy(JvmtiBreakpoint& bp) {
 215   _method = bp._method;
 216   _bci = bp._bci;
 217   _class_holder = OopHandle::create(bp._class_holder.resolve());
 218 }

This one looks odd, because the _class_holder is overwritten without
releasing the old OopHandle. This is currently OK, because copy is only
called from clone, which just created a new JvmtiBreakpoint:

  GrowableElement *clone()        {
    JvmtiBreakpoint *bp = new JvmtiBreakpoint();
    bp->copy(*this);
    return bp;
  }

I think this would have been much more obvious if copy/clone were a
copy constructor.

With that said, it looks like we now have two JvmtiBreakpoints with the
same OopHandle contents. So, OopHandle::release will be called twice.
Now that works because release clears the oop value:

inline void OopHandle::release() {
  // Clear the OopHandle first
  NativeAccess<>::oop_store(_obj, (oop)NULL);
  OopStorageSet::vm_global()->release(_obj);
}

and the resolve() != NULL check will prevent the OopHandle from being
released twice:

+  if (_class_holder.resolve() != NULL) {
+    _class_holder.release();
   }

StefanK

> bug link https://bugs.openjdk.java.net/browse/JDK-8247808
>
> Thanks,
> Coleen

From david.holmes at oracle.com Thu Jun 18 08:36:23 2020
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 18 Jun 2020 18:36:23 +1000
Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information
In-Reply-To: <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com>
References: <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com> <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com>
Message-ID:

On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
> Hi David,
>
> Both ThreadsListHandle and ResourceMarks would use `Thread::current()`
> for their resource. It is set as default parameter in c'tor.
> Do you mean we should pass it explicitly in c'tor?

Yes pass current_thread so we don't do the additional unnecessary calls
to Thread::current().

David

>
> Thanks,
>
> Yasumasa
>
>
> On 2020/06/18 13:58, David Holmes wrote:
>> Hi Yasumasa,
>>
>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>> Hi Serguei,
>>>
>>> Thanks for your comment!
>>> I uploaded new webrev:
>>>
>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>
>>> I'm not sure the following change is correct.
>>> Can we assume owning_thread is not NULL at safepoint?
>>
>> We can if "owner != NULL". So that change seems fine to me.
>>
>> But given this is now only executed at a safepoint there are
>> additional simplifications that can be made:
>>
>> - current thread determination can be simplified:
>>
>> 945   Thread* current_thread = Thread::current();
>>
>> becomes:
>>
>>     Thread* current_thread = VMThread::vm_thread();
>>     assert(current_thread == Thread::current(), "must be");
>>
>> - these comments can be removed
>>
>>   994       // Use current thread since function can be called from a
>>   995       // JavaThread or the VMThread.
>> 1053       // Use current thread since function can be called from a
>> 1054       // JavaThread or the VMThread.
>>
>> - these TLH constructions should be passing current_thread (existing bug)
>>
>> 996       ThreadsListHandle tlh;
>> 1055       ThreadsListHandle tlh;
>>
>> - All ResourceMarks should be passing current_thread (existing bug)
>>
>>
>> Aside: there is a major inconsistency between the spec and
>> implementation for this method. I've traced the history to see how
>> this came about from JVMDI (ref JDK-4546581) but it never resulted in
>> the JVM TI specification clearly stating what the waiters/waiter_count
>> means. I will file a bug to have the spec clarified to match the
>> implementation (even though I think the implementation is what is
>> wrong). :(
>>
>> Thanks,
>> David
>> -----
>>
>>> All tests on submit repo and serviceability/jvmti and
>>> vmTestbase/nsk/jvmti have passed with this change.
>>>
>>>
>>> ```
>>>         // This monitor is owned so we have to find the owning JavaThread.
>>>         owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>> -      // Cannot assume (owning_thread != NULL) here because this function
>>> -      // may not have been called at a safepoint and the owning_thread
>>> -      // might not be suspended.
>>> -      if (owning_thread != NULL) {
>>> -        // The monitor's owner either has to be the current thread, at safepoint
>>> -        // or it has to be suspended. Any of these conditions will prevent both
>>> -        // contending and waiting threads from modifying the state of
>>> -        // the monitor.
>>> -        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>> -          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>> -          // will not make it back to the JVM/TI agent. The error code will
>>> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
>>> -          // will retry the call via a VM_GetObjectMonitorUsage VM op.
>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>> -        }
>>> -        HandleMark hm;
>>> +      assert(owning_thread != NULL, "owning JavaThread must not be NULL");
>>>           Handle     th(current_thread, owning_thread->threadObj());
>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>
>>> ```
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>> Hi Yasumasa,
>>>>
>>>> This fix is not enough.
>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two
>>>> modes: in VMop and non-VMop.
>>>> The non-VMop mode has to be removed.
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>
>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>> (Change subject for RFR)
>>>>>
>>>>> Hi,
>>>>>
>>>>> I filed it to JBS and uploaded a webrev for it.
>>>>> Could you review it?
>>>>>
>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>
>>>>> This change has passed tests on submit repo.
>>>>> Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti
>>>>> on Linux x64.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>> Yes. It seems we have a consensus.
>>>>>> Thank you for taking care about it.
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>>
>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>
>>>>>>> Go for it! :)
>>>>>>>
>>>>>>> Cheers,
>>>>>>> David
>>>>>>>
>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>>>>>> Hi Dan, David and Yasumasa,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I wonder why
>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage()
>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does not
>>>>>>>>>>>>>>>>>>>> perform at safepoint.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the
>>>>>>>>>>>>>>>>>>> target is not suspended:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>>>>>   JavaThread* calling_thread = JavaThread::current();
>>>>>>>>>>>>>>>>>>>   jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>>   if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>>>>     // Some of the critical threads were not suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>>>>     VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>>     VMThread::execute(&op);
>>>>>>>>>>>>>>>>>>>     err = op.result();
>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>   return err;
>>>>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases when
>>>>>>>>>>>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from
>>>>>>>>>>>>>>>>>> get_object_monitor_usage().
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object
>>>>>>>>>>>>>>>>>>>> at first [1], but it would perform concurrently.
>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be
>>>>>>>>>>>>>>>>>>>> changed to others in subsequent code.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor
>>>>>>>>>>>>>>>>>>>> before [2].
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner thread
>>>>>>>>>>>>>>>>>>> it is either suspended or not. If it is suspended
>>>>>>>>>>>>>>>>>>> then it cannot release the monitor. If it is not
>>>>>>>>>>>>>>>>>>> suspended we detect that and redo the whole query at
>>>>>>>>>>>>>>>>>>> a safepoint.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think the owner thread might resume unfortunately
>>>>>>>>>>>>>>>>>> after suspending check.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes you are right.
I was thinking resuming also >>>>>>>>>>>>>>>>> required a safepoint but it only requires the >>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Which code is wrong? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the >>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller >>>>>>>>>>>>>>>> has started the process of gathering the information >>>>>>>>>>>>>>>> while not at a >>>>>>>>>>>>>>>> safepoint. Thus the information returned by >>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The code tries to make sure that it either collects data >>>>>>>>>>>>>>> about a monitor owned by a thread that is suspended, or >>>>>>>>>>>>>>> else it collects that data at a safepoint. But the owning >>>>>>>>>>>>>>> thread can be resumed just after the code determined it >>>>>>>>>>>>>>> was suspended. The monitor can then be released and the >>>>>>>>>>>>>>> information gathered not only stale but potentially >>>>>>>>>>>>>>> completely wrong as it could now be owned by a different >>>>>>>>>>>>>>> thread and will report that thread's entry count. >>>>>>>>>>>>>> >>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information >>>>>>>>>>>>>> can be stale. In fact as soon as the implementation returns >>>>>>>>>>>>>> from the safepoint that gathered the info, the target thread >>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>> >>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. But >>>>>>>>>>>>> the expectation is that the information was actually an >>>>>>>>>>>>> accurate snapshot of the state of the monitor at some point >>>>>>>>>>>>> in time. The current code does not ensure that. >>>>>>>>>>>> >>>>>>>>>>>> Please explain. 
>>>>>>>>>>>> I clearly don't understand why you think the info
>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the monitor
>>>>>>>>>>>> at some point in time".
>>>>>>>>>>>
>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no
>>>>>>>>>>> atomicity**. The reported owner thread may not own it any
>>>>>>>>>>> longer when the entry count is read, so straight away you may
>>>>>>>>>>> have the wrong entry count information. The set of threads
>>>>>>>>>>> trying to acquire the monitor, or wait on the monitor can
>>>>>>>>>>> change in unexpected ways. It would be possible for instance
>>>>>>>>>>> to report the same thread as being the owner, being blocked
>>>>>>>>>>> trying to enter the monitor, and being in the wait-set of the
>>>>>>>>>>> monitor - apparently all at the same time!
>>>>>>>>>>>
>>>>>>>>>>> ** even if the owner is suspended we don't have complete
>>>>>>>>>>> atomicity because threads can join the set of threads trying
>>>>>>>>>>> to enter the monitor (unless they are all suspended).
>>>>>>>>>>
>>>>>>>>>> Consider the case when the monitor's owner is _not_ suspended:
>>>>>>>>>>
>>>>>>>>>>   - GetObjectMonitorUsage() uses a safepoint to gather the info about
>>>>>>>>>>     the object's monitor. Since we're at a safepoint, the info that
>>>>>>>>>>     we are gathering cannot change until we return from the safepoint.
>>>>>>>>>>     It is a snapshot and a valid one at that.
>>>>>>>>>>
>>>>>>>>>> Consider the case when the monitor's owner is suspended:
>>>>>>>>>>
>>>>>>>>>>   - GetObjectMonitorUsage() will gather info about the object's
>>>>>>>>>>     monitor while _not_ at a safepoint. Assuming that no other
>>>>>>>>>>     thread is suspended, then entry_count can change because
>>>>>>>>>>     another thread can block on entry while we are gathering
>>>>>>>>>>     info. waiter_count and waiters can change if a thread was
>>>>>>>>>>     in a timed wait that has timed out and now that thread is
>>>>>>>>>>     blocked on re-entry. I don't think that notify_waiter_count
>>>>>>>>>>     and notify_waiters can change.
>>>>>>>>>>
>>>>>>>>>>     So in this case, the owner info and notify info is stable,
>>>>>>>>>>     but the entry_count and waiter info is not stable.
>>>>>>>>>>
>>>>>>>>>> Consider the case when the monitor is not owned:
>>>>>>>>>>
>>>>>>>>>>   - GetObjectMonitorUsage() will start to gather info about the
>>>>>>>>>>     object's monitor while _not_ at a safepoint. If it finds a
>>>>>>>>>>     thread on the entry queue that is not suspended, then it will
>>>>>>>>>>     bail out and redo the info gather at a safepoint. I just
>>>>>>>>>>     noticed that it doesn't check for suspension for the threads
>>>>>>>>>>     on the waiters list so a timed Object.wait() call can cause
>>>>>>>>>>     some confusion here.
>>>>>>>>>>
>>>>>>>>>>     So in this case, the owner info is not stable if a thread
>>>>>>>>>>     comes out of a timed wait and reenters the monitor. This
>>>>>>>>>>     case is no different than if a "barger" thread comes in
>>>>>>>>>>     after the NULL owner field is observed and enters the
>>>>>>>>>>     monitor. We'll return that there is no owner, a list of
>>>>>>>>>>     suspended pending entry thread and a list of waiting
>>>>>>>>>>     threads. The reality is that the object's monitor is
>>>>>>>>>>     owned by the "barger" that completely bypassed the entry
>>>>>>>>>>     queue by virtue of seeing the NULL owner field at exactly
>>>>>>>>>>     the right time.
>>>>>>>>>>
>>>>>>>>>> So the owner field is only stable when we have an owner. If
>>>>>>>>>> that owner is not suspended, then the other fields are also
>>>>>>>>>> stable because we gathered the info at a safepoint. If the
>>>>>>>>>> owner is suspended, then the owner and notify info is stable,
>>>>>>>>>> but the entry_count and waiter info is not stable.
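The three cases above come down to whether the fields are read while the monitor can still change. A minimal sketch of that difference, not HotSpot code: `ToyMonitor`, `hand_over`, and `gather` are hypothetical names, the `world_stopped` flag stands in for a safepoint, and a callback stands in for whatever another thread does between two reads.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for an object monitor; not HotSpot's ObjectMonitor.
struct ToyMonitor {
  int owner = 0;               // 0 means unowned, otherwise a thread id
  int entry_count = 0;         // recursion depth of the current owner
  bool world_stopped = false;  // stand-in for "we are at a safepoint"
};

struct Usage {
  int owner;
  int entry_count;
};

// A mutator step: the monitor passes to thread `to` with depth `depth`.
// While the world is stopped the step does nothing and returns false.
bool hand_over(ToyMonitor& m, int to, int depth) {
  if (m.world_stopped) {
    return false;
  }
  m.owner = to;
  m.entry_count = depth;
  return true;
}

// Reads owner and entry_count as two separate steps.
// `other_thread_step` simulates another thread running in the gap.
Usage gather(ToyMonitor& m, bool at_safepoint,
             const std::function<void()>& other_thread_step) {
  m.world_stopped = at_safepoint;
  Usage u{};
  u.owner = m.owner;
  other_thread_step();
  u.entry_count = m.entry_count;
  m.world_stopped = false;
  return u;
}
```

With `at_safepoint` false the result pairs the old owner with the new owner's entry count; with it true both fields come from the same moment.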
>>>>>>>>>> >>>>>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>>>>>> do that at a safepoint when the owner thread is not suspended >>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>>> the different pieces of info is unstable to varying degrees. >>>>>>>>>> >>>>>>>>>> As for this claim: >>>>>>>>>> >>>>>>>>>>> It would be possible for instance to report the same thread >>>>>>>>>>> as being the owner, being blocked trying to enter the monitor, >>>>>>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>>>>>> the same time! >>>>>>>>>> >>>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>>>>>> then be seen as on the entry queue or on the wait queue since >>>>>>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>> wait queue. >>>>>>>>>> >>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>> quite that bad. :-) (That is a small mercy.) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>> used. 
Handshaking with all threads won't make the data as >>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>> will still be field instability. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>> >>>>>>>>> I agree with this. >>>>>>>>> The advantages are: >>>>>>>>> - the result is stable >>>>>>>>> - the implementation can be simplified >>>>>>>>> >>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>> I'm not sure if using handshakes can make performance better. >>>>>>>> >>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>> >>>>>>>> Yasumasa >>>>>>>> >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Serguei >>>>>>>>> >>>>>>>>>> Dan >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> The only way to make sure you don't have stale information is >>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the >>>>>>>>>>>>>> doc >>>>>>>>>>>>>> should be more clear about the possibility of returning >>>>>>>>>>>>>> stale >>>>>>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about threads being >>>>>>>>>>>>>>> suspended so I can't see how this could be construed as >>>>>>>>>>>>>>> an agent bug. >>>>>>>>>>>>>> >>>>>>>>>>>>>> In your scenario above, you mention that the target thread >>>>>>>>>>>>>> was >>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the >>>>>>>>>>>>>> target >>>>>>>>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info.
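The window described here sits between the suspension check and the read. A minimal sketch, loosely modeled on the retry wrapper quoted earlier in this thread and not the HotSpot implementation: `ToyThread`, `ToyMonitor`, `query_fast`, `query_stopped`, and `query` are hypothetical names, and the `gap` callback stands in for a resume that lands at the wrong moment.

```cpp
#include <cassert>
#include <functional>

enum class Status { Ok, NotSuspended };

// Hypothetical toy types; not HotSpot's JavaThread or ObjectMonitor.
struct ToyThread {
  bool suspended = false;
};

struct ToyMonitor {
  const ToyThread* owner = nullptr;
  int entry_count = 0;
};

struct Usage {
  const ToyThread* owner = nullptr;
  int entry_count = 0;
};

// Fast path: trust the data only if the owner looks suspended.
// `gap` is anything that runs between the check and the read.
Status query_fast(const ToyMonitor& m, Usage& out,
                  const std::function<void()>& gap) {
  const ToyThread* owner = m.owner;
  if (owner != nullptr && !owner->suspended) {
    return Status::NotSuspended;
  }
  gap();
  out.owner = owner;                // value seen before the gap
  out.entry_count = m.entry_count;  // value seen after the gap
  return Status::Ok;
}

// Slow path: stands in for redoing the query with every thread stopped.
Status query_stopped(const ToyMonitor& m, Usage& out) {
  out.owner = m.owner;
  out.entry_count = m.entry_count;
  return Status::Ok;
}

// Wrapper: retry on the slow path when the fast path refuses.
Status query(const ToyMonitor& m, Usage& out,
             const std::function<void()>& gap) {
  Status s = query_fast(m, out, gap);
  if (s == Status::NotSuspended) {
    s = query_stopped(m, out);
  }
  return s;
}
```

If the owner is resumed inside the gap and releases the monitor, the fast path still reports `Ok`, but it pairs the old owner with an entry count that no longer belongs to it. Doing the whole query on the slow path removes the gap.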
>>>>>>>>>>>>>> >>>>>>>>>>>>>> All three of those calls: SuspendThread(), >>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent >>>>>>>>>>>>>> should not >>>>>>>>>>>>>> resume the target thread while also calling >>>>>>>>>>>>>> GetObjectMonitorUsage(). >>>>>>>>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>>>>>>>> >>>>>>>>>>>>> Perhaps. I was thinking more generally about an independent >>>>>>>>>>>>> resume, but you're right that doesn't really make a lot of >>>>>>>>>>>>> sense. But when the spec says nothing about suspension ... >>>>>>>>>>>> >>>>>>>>>>>> And it is intentional that suspension is not required. >>>>>>>>>>>> JVM/DI and JVM/PI >>>>>>>>>>>> used to require suspension for these kinds of get-the-info >>>>>>>>>>>> APIs. JVM/TI >>>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>>> >>>>>>>>>>>> As I've said before, we could add a note about the data >>>>>>>>>>>> being potentially >>>>>>>>>>>> stale unless SuspendThread is used. I think of it like >>>>>>>>>>>> stat(2). You can >>>>>>>>>>>> fetch the file's info, but there's no guarantee that the >>>>>>>>>>>> info is current >>>>>>>>>>>> by the time you process what you got back. Is it too much >>>>>>>>>>>> motherhood to >>>>>>>>>>>> state that the data might be stale? I could go either way... >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this to >>>>>>>>>>>>>>> be fixed in the future without forcing/using any safepoints. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have to think about that which is why I'm avoiding >>>>>>>>>>>>>> talking about >>>>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>>>> >>>>>>>>>>>>> Effectively the handshake can "suspend" the thread whilst >>>>>>>>>>>>> the monitor is queried. In effect the operation would >>>>>>>>>>>>> create a per-thread safepoint. 
>>>>>>>>>>>> >>>>>>>>>>>> I "know" that, but I still need time to think about it and >>>>>>>>>>>> probably >>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Semantically it is no different to the code actually >>>>>>>>>>>>> suspending the owner thread, but it can't actually do that >>>>>>>>>>>>> because suspend/resume don't nest. >>>>>>>>>>>> >>>>>>>>>>>> Yeah... we used to have a suspend count back when we tracked >>>>>>>>>>>> internal and >>>>>>>>>>>> external suspends separately. That was a nightmare... >>>>>>>>>>>> >>>>>>>>>>>> Dan >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> David >>>>>>>>>>>>> >>>>>>>>>>>>>> Dan >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to >>>>>>>>>>>>>>>>>> check thread state, it returns `true` when the thread >>>>>>>>>>>>>>>>>> is sleeping [3], or when it performs in native [4]. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it can't >>>>>>>>>>>>>>>>> continue execution in the VM or in Java code. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed >>>>>>>>>>>>>>>>>>> common case where threads are first suspended and >>>>>>>>>>>>>>>>>>> then the monitors are queried. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I agree with this, but I could not find it in the JVMTI >>>>>>>>>>>>>>>>>> spec - it just says "Get information about the >>>>>>>>>>>>>>>>>> object's monitor." >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, nothing >>>>>>>>>>>>>>>>> to do with the spec. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect >>>>>>>>>>>>>>>>>> information in some cases.
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner >>>>>>>>>>>>>>>>>> might be just before wakeup. >>>>>>>>>>>>>>>>>> So I think it is safer if GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>> is called at safepoint in any case. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using >>>>>>>>>>>>>>>>> Handshakes, so this particular operation will require >>>>>>>>>>>>>>>>> that the apparent owner is Handshake-safe (by entering >>>>>>>>>>>>>>>>> a handshake with it) before querying the monitor. This >>>>>>>>>>>>>>>>> would still be preferable I think to always using a >>>>>>>>>>>>>>>>> safepoint for the entire operation. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [4] >>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> However there is still a potential bug as the thread >>>>>>>>>>>>>>>>>>> reported as the owner may not be suspended at the >>>>>>>>>>>>>>>>>>> time we first see it, and may release the monitor, >>>>>>>>>>>>>>>>>>> but then it may get suspended before we call: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and >>>>>>>>>>>>>>>>>>> proceed to query the monitor information in a racy >>>>>>>>>>>>>>>>>>> way.
This can't happen when suspension itself >>>>>>>>>>>>>>>>>>> requires a safepoint as the current thread won't go >>>>>>>>>>>>>>>>>>> to that safepoint during this code. However, if >>>>>>>>>>>>>>>>>>> suspension is implemented via a direct handshake with >>>>>>>>>>>>>>>>>>> the target thread then we have a problem. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>> From david.holmes at oracle.com Thu Jun 18 08:43:37 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 18 Jun 2020 18:43:37 +1000 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> Message-ID: <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> On 18/06/2020 4:49 pm, Chris Plummer wrote: > On 6/17/20 10:29 PM, David Holmes wrote: >> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>> On 6/17/20 10:09 PM, David Holmes wrote: >>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>> Hi Chris, >>>>>> >>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>> Hello, >>>>>>> >>>>>>> Please help 
review the following: >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>> >>>>>>> The CR contains all the needed details. Here's a summary of >>>>>>> changes in each file: >>>>>> >>>>>> The problem sounds to me like a variation of the more general >>>>>> problem of not ensuring a thread is kept alive whilst acting upon >>>>>> it. I don't know how the SA finds these references to the threads >>>>>> it is going to stackwalk, but is it possible to fix this via >>>>>> appropriate uses of ThreadsListHandle/Iterator? >>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>> >>>>> Keep in mind that once SA attaches, nothing in the VM changes. For >>>>> example, SA can't create a wrapper to a JavaThread, only to have >>>>> the JavaThread be freed later on. It's just not possible. >>>> >>>> Then how does it obtain a reference to a JavaThread for which the >>>> native OS thread id is invalid? Any thread found in >>>> _java_thread_list is either live or still to be started. In the >>>> latter case the JavaThread->osThread does not have its thread_id set >>>> yet. >>>> >>> My assumption was that the JavaThread is in the process of being >>> destroyed, and it has freed its OS thread but is itself still in the >>> thread list. I did notice that the OS thread id being used looked to >>> be in the range of thread id #'s you would expect for the running >>> app, so that to me indicated it was once valid, but is no more. >>> >>> Keep in mind that although hotspot may have synchronization code that >>> prevents you from pulling a JavaThread off the thread list when it is >>> in the process of being destroyed (I'm guessing it does), SA has no >>> such protections. >> >> But you stated that once the SA has attached, the target VM can't >> change. 
If the SA gets its set of threads from one attach then tries to >> make queries about those threads in a separate attach, then obviously >> it could be providing garbage thread information. So you would need to >> re-validate the JavaThread in the target VM before trying to do >> anything with it. > That's not what is going on here. It's attaching and doing a stack > trace, which involves getting the thread list and iterating through all > threads without detaching. Okay so I restate my original comment - all the JavaThreads must be alive or not yet started, so how are you encountering an invalid thread id? Any thread you find via the ThreadsList can't have destroyed its osThread. In any case the logic should be checking thread->osThread() for NULL, and then osThread()->get_state() to ensure it is >= INITIALIZED before using the thread_id(). Cheers, David ----- > Also, even if you are using something like > clhsdb to issue commands on addresses, if the address is no longer valid > for the command you are executing, then you would get the appropriate > error when there is an attempt to create a wrapper for it. I don't know > of any command that operates directly on a JavaThread, but I think there > are for InstanceKlass. So if you remembered the address of an > InstanceKlass, and then reattached and tried a command that takes an > InstanceKlass address, you would get an exception when SA tries to > create the wrapper for the InstanceKlass if it were no longer a valid > address for one. > > Chris >> >> David >> ----- >> >>> Chris >>>> David >>>> ----- >>>> >>>>> Chris >>>>>> >>>>>> Cheers, >>>>>> David >>>>>> >>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>> -Instead of throwing an exception when the OS ThreadID is >>>>>>> invalid, print a warning. 
>>>>>>> >>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>> -Improve a print_debug message >>>>>>> >>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>> >>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>> >>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>> >>>>>>> -Deal with the array of registers read in being null due to the >>>>>>> OS ThreadID not being valid. >>>>>>> >>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>> >>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>> >>>>>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" >>>>>>> appearing twice when printing the exception. >>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> Chris >>>>> >>> > From suenaga at oss.nttdata.com Thu Jun 18 09:07:51 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Thu, 18 Jun 2020 18:07:51 +0900 Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information In-Reply-To: References: <53f77d9a-4ba0-52f5-6698-39f2153b7bce@oracle.com> <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com> <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com> Message-ID: <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com> On 2020/06/18 17:36, David Holmes wrote: > On 18/06/2020 3:47 pm, Yasumasa 
Suenaga wrote: >> Hi David, >> >> Both ThreadsListHandle and ResourceMarks would use `Thread::current()` for their resource. It is set as the default parameter in c'tor. >> Do you mean we should pass it explicitly in c'tor? > > Yes pass current_thread so we don't do the additional unnecessary calls to Thread::current(). Ok, I've fixed them. Could you review again? http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/ Thanks, Yasumasa > David > >> >> Thanks, >> >> Yasumasa >> >> >> On 2020/06/18 13:58, David Holmes wrote: >>> Hi Yasumasa, >>> >>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote: >>>> Hi Serguei, >>>> >>>> Thanks for your comment! >>>> I uploaded new webrev: >>>> >>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/ >>>> >>>> I'm not sure the following change is correct. >>>> Can we assume owning_thread is not NULL at safepoint? >>> >>> We can if "owner != NULL". So that change seems fine to me. >>> >>> But given this is now only executed at a safepoint there are additional simplifications that can be made: >>> >>> - current thread determination can be simplified: >>> >>> 945   Thread* current_thread = Thread::current(); >>> >>> becomes: >>> >>>     Thread* current_thread = VMThread::vm_thread(); >>>     assert(current_thread == Thread::current(), "must be"); >>> >>> - these comments can be removed >>> >>>   994       // Use current thread since function can be called from a >>>   995       // JavaThread or the VMThread. >>> 1053       // Use current thread since function can be called from a >>> 1054       // JavaThread or the VMThread. >>> >>> - these TLH constructions should be passing current_thread (existing bug) >>> >>> 996       ThreadsListHandle tlh; >>> 1055       ThreadsListHandle tlh; >>> >>> - All ResourceMarks should be passing current_thread (existing bug) >>> >>> >>> Aside: there is a major inconsistency between the spec and implementation for this method. 
I've traced the history to see how this came about from JVMDI (ref JDK-4546581) but it never resulted in the JVM TI specification clearly stating what the waiters/waiter_count means. I will file a bug to have the spec clarified to match the implementation (even though I think the implementation is what is wrong). :( >>> >>> Thanks, >>> David >>> ----- >>> >>>> All tests on submit repo and serviceability/jvmti and vmTestbase/nsk/jvmti have passed with this change. >>>> >>>> >>>> ``` >>>>         // This monitor is owned so we have to find the owning JavaThread. >>>>         owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>> -      // Cannot assume (owning_thread != NULL) here because this function >>>> -      // may not have been called at a safepoint and the owning_thread >>>> -      // might not be suspended. >>>> -      if (owning_thread != NULL) { >>>> -        // The monitor's owner either has to be the current thread, at safepoint >>>> -        // or it has to be suspended. Any of these conditions will prevent both >>>> -        // contending and waiting threads from modifying the state of >>>> -        // the monitor. >>>> -        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) { >>>> -          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED >>>> -          // will not make it back to the JVM/TI agent. The error code will >>>> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which >>>> -          // will retry the call via a VM_GetObjectMonitorUsage VM op. >>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED; >>>> -        } >>>> -        HandleMark hm; >>>> +      assert(owning_thread != NULL, "owning JavaThread must not be NULL"); >>>>           Handle     th(current_thread, owning_thread->threadObj()); >>>>           
ret.owner = (jthread)jni_reference(calling_thread, th); >>>> >>>> ``` >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote: >>>>> Hi Yasumasa, >>>>> >>>>> This fix is not enough. >>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two modes: in VMop and non-VMop. >>>>> The non-VMop mode has to be removed. >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>> >>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote: >>>>>> (Change subject for RFR) >>>>>> >>>>>> Hi, >>>>>> >>>>>> I filed it to JBS and upload a webrev for it. >>>>>> Could you review it? >>>>>> >>>>>> ? JBS: https://bugs.openjdk.java.net/browse/JDK-8247729 >>>>>> ? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/ >>>>>> >>>>>> This change has passed tests on submit repo. >>>>>> Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64. >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Yasumasa >>>>>> >>>>>> >>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote: >>>>>>> Yes. It seems we have a consensus. >>>>>>> Thank you for taking care about it. >>>>>>> >>>>>>> Thanks, >>>>>>> Serguei >>>>>>> >>>>>>> >>>>>>> On 6/16/20 18:34, David Holmes wrote: >>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>> >>>>>>>> Go for it! :) >>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> >>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>>>>>>>> Hi Dan, David and Yasumasa, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>>>>>>>> Hi Dan, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. 
Daugherty wrote: >>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is not suspended: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>>>>>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>>>>>>>>>>>>>>>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr); >>>>>>>>>>>>>>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>>>>>>>> ???? // Some of the critical threads were not suspended. go to a safepoint and try again >>>>>>>>>>>>>>>>>>>> ???? VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr); >>>>>>>>>>>>>>>>>>>> ???? VMThread::execute(&op); >>>>>>>>>>>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>>>>>>>>>>> ?? } >>>>>>>>>>>>>>>>>>>> ?? return err; >>>>>>>>>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage(). >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently. >>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be changed to others in subsequent code. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor before [2]. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I think the owner thread might resume unfortunately after suspending check. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Which code is wrong? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller >>>>>>>>>>>>>>>>> has started the process of gathering the information while not at a >>>>>>>>>>>>>>>>> safepoint. Thus the information returned by GetObjectMonitorUsage() >>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The code tries to make sure that it either collects data about a monitor owned by a thread that is suspended, or else it collects that data at a safepoint. But the owning thread can be resumed just after the code determined it was suspended. The monitor can then be released and the information gathered not only stale but potentially completely wrong as it could now be owned by a different thread and will report that thread's entry count. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information >>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation returns >>>>>>>>>>>>>>> from the safepoint that gathered the info, the target thread >>>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>>> >>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. 
But the expectation is that the information was actually an accurate snapshot of the state of the monitor at some point in time. The current code does not ensure that. >>>>>>>>>>>>> >>>>>>>>>>>>> Please explain. I clearly don't understand why you think the info >>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the monitor >>>>>>>>>>>>> at some point in time". >>>>>>>>>>>> >>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no atomicity**. The reported owner thread may not own it any longer when the entry count is read, so straight away you may have the wrong entry count information. The set of threads trying to acquire the monitor, or wait on the monitor can change in unexpected ways. It would be possible for instance to report the same thread as being the owner, being blocked trying to enter the monitor, and being in the wait-set of the monitor - apparently all at the same time! >>>>>>>>>>>> >>>>>>>>>>>> ** even if the owner is suspended we don't have complete atomicity because threads can join the set of threads trying to enter the monitor (unless they are all suspended). >>>>>>>>>>> >>>>>>>>>>> Consider the case when the monitor's owner is _not_ suspended: >>>>>>>>>>> >>>>>>>>>>> ? - GetObjectMonitorUsage() uses a safepoint to gather the info about >>>>>>>>>>> ??? the object's monitor. Since we're at a safepoint, the info that >>>>>>>>>>> ??? we are gathering cannot change until we return from the safepoint. >>>>>>>>>>> ??? It is a snapshot and a valid one at that. >>>>>>>>>>> >>>>>>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>>>>>> >>>>>>>>>>> ? - GetObjectMonitorUsage() will gather info about the object's >>>>>>>>>>> ??? monitor while _not_ at a safepoint. Assuming that no other >>>>>>>>>>> ??? thread is suspended, then entry_count can change because >>>>>>>>>>> ??? another thread can block on entry while we are gathering >>>>>>>>>>> ??? info. 
waiter_count and waiters can change if a thread was >>>>>>>>>>> ??? in a timed wait that has timed out and now that thread is >>>>>>>>>>> ??? blocked on re-entry. I don't think that notify_waiter_count >>>>>>>>>>> ??? and notify_waiters can change. >>>>>>>>>>> >>>>>>>>>>> ??? So in this case, the owner info and notify info is stable, >>>>>>>>>>> ??? but the entry_count and waiter info is not stable. >>>>>>>>>>> >>>>>>>>>>> Consider the case when the monitor is not owned: >>>>>>>>>>> >>>>>>>>>>> ? - GetObjectMonitorUsage() will start to gather info about the >>>>>>>>>>> ??? object's monitor while _not_ at a safepoint. If it finds a >>>>>>>>>>> ??? thread on the entry queue that is not suspended, then it will >>>>>>>>>>> ??? bail out and redo the info gather at a safepoint. I just >>>>>>>>>>> ??? noticed that it doesn't check for suspension for the threads >>>>>>>>>>> ??? on the waiters list so a timed Object.wait() call can cause >>>>>>>>>>> ??? some confusion here. >>>>>>>>>>> >>>>>>>>>>> ??? So in this case, the owner info is not stable if a thread >>>>>>>>>>> ??? comes out of a timed wait and reenters the monitor. This >>>>>>>>>>> ??? case is no different than if a "barger" thread comes in >>>>>>>>>>> ??? after the NULL owner field is observed and enters the >>>>>>>>>>> ??? monitor. We'll return that there is no owner, a list of >>>>>>>>>>> ??? suspended pending entry thread and a list of waiting >>>>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>>>> ??? owned by the "barger" that completely bypassed the entry >>>>>>>>>>> ??? queue by virtue of seeing the NULL owner field at exactly >>>>>>>>>>> ??? the right time. >>>>>>>>>>> >>>>>>>>>>> So the owner field is only stable when we have an owner. If >>>>>>>>>>> that owner is not suspended, then the other fields are also >>>>>>>>>>> stable because we gathered the info at a safepoint. 
If the >>>>>>>>>>> owner is suspended, then the owner and notify info is stable, >>>>>>>>>>> but the entry_count and waiter info is not stable. >>>>>>>>>>> >>>>>>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>>>>>>> do that at a safepoint when the owner thread is not suspended >>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>>>> the different pieces of info is unstable to varying degrees. >>>>>>>>>>> >>>>>>>>>>> As for this claim: >>>>>>>>>>> >>>>>>>>>>>> It would be possible for instance to report the same thread >>>>>>>>>>>> as being the owner, being blocked trying to enter the monitor, >>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>>>>>>> the same time! >>>>>>>>>>> >>>>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>>>>>>> then be seen as on the entry queue or on the wait queue since >>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>>> wait queue. >>>>>>>>>>> >>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>> quite that bad. :-) (That is a small mercy.) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>>> for GetObjectMonitorUsage(). 
If the monitor is owned and we >>>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>>> used. Handshaking with all threads won't make the data as >>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>>> will still be field instability. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>> >>>>>>>>>> I agree with this. >>>>>>>>>> The advantages are: >>>>>>>>>> ??- the result is stable >>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>> >>>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>>> I'm not sure if using handshakes can make performance better. >>>>>>>>> >>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>> >>>>>>>>> Yasumasa >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Serguei >>>>>>>>>> >>>>>>>>>>> Dan >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The only way to make sure you don't have stale information is >>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc >>>>>>>>>>>>>>> should have more clear about the possibility of returning stale >>>>>>>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's being suspended so I can't see how this could be construed as an agent bug. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In your scenario above, you mention that the target thread was >>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>>>>>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent should not >>>>>>>>>>>>>>> resume the target thread while also calling GetObjectMonitorUsage(). >>>>>>>>>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Perhaps. I was thinking more generally about an independent resume, but you're right that doesn't really make a lot of sense. But when the spec says nothing about suspension ... >>>>>>>>>>>>> >>>>>>>>>>>>> And it is intentional that suspension is not required. JVM/DI and JVM/PI >>>>>>>>>>>>> used to require suspension for these kinds of get-the-info APIs. JVM/TI >>>>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>>>> >>>>>>>>>>>>> As I've said before, we could add a note about the data being potentially >>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like stat(2). You can >>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the info is current >>>>>>>>>>>>> by the time you process what you got back. Is it too much motherhood to >>>>>>>>>>>>> state that the data might be stale? I could go either way... >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this to be fixed in the future without forcing/using any safepoints. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding talking about >>>>>>>>>>>>>>> handshakes in this thread. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread whilst the monitor is queried. In effect the operation would create a per-thread safepoint. >>>>>>>>>>>>> >>>>>>>>>>>>> I "know" that, but I still need time to think about it and probably >>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Semantically it is no different to the code actually suspending the owner thread, but it can't actually do that because suspends/resume don't nest. >>>>>>>>>>>>> >>>>>>>>>>>>> Yeah... we used have a suspend count back when we tracked internal and >>>>>>>>>>>>> external suspends separately. That was a nightmare... >>>>>>>>>>>>> >>>>>>>>>>>>> Dan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> David >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread state, it returns `true` when the thread is sleeping [3], or when it performs in native [4]. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI spec - it just says "Get information about the object's monitor." >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, nothing to do with the spec. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect information in some case. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner might be just before wakeup. >>>>>>>>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is called at safepoint in any case. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>>> [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ??owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>> From claes.redestad at oracle.com Thu Jun 18 11:38:50 2020 From: claes.redestad at oracle.com (Claes Redestad) Date: Thu, 18 Jun 2020 13:38:50 +0200 Subject: RFR(S) 8246019 PerfClassTraceTime slows down VM start-up In-Reply-To: <20411ebb-5b06-83da-07c8-b751252efecd@oracle.com> References: <3ed0b32d-469e-6f89-f4b3-b78d5bcce700@oracle.com> <20411ebb-5b06-83da-07c8-b751252efecd@oracle.com> Message-ID: <31af0e67-7bee-0680-3f23-09864030bba4@oracle.com> On 2020-06-17 05:19, Ioi Lam wrote: > > > On 6/16/20 6:20 PM, David Holmes wrote: >> Hi Ioi, >> >> On 17/06/2020 6:14 am, Ioi Lam wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8246019 >>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/ >>> >>> >>> PerfClassTraceTime is (a rarely used feature) for measuring the time >>> spent during class linking and initialization. >> >> "A special command jcmd PerfCounter.print >> prints all performance counters in the process." >> >> How do you know this is a "rarely used feature"? > Hi David, > > Sure, the counter will be dumped, but by "rarely used" -- I mean no one > will find this particular counter useful, and no one will be actively > looking at it. > > I changed two parts of the code -- class init and class linking. 
> > For class initialization, the counter may be useful for people who want > to know how much time is spent in their functions, and my patch > doesn't change that. It only avoids using the counter when a class has > no <clinit>, i.e., we know that the counter counts nothing (except for a > logging statement). > > ===== > > For class linking, no user code is executed, so it only measures VM > code. If it's useful for anyone, that would be VM engineers like me who > are trying to optimize the speed of class loading. However, due to the > overhead of the counter vs what it's trying to measure, the results are > pretty meaningless. > > Note that I've not disabled the counter altogether. Instead, I disable > it only when linking a CDS shared class, and we know that very little is > happening for this class (e.g., no verification). > > I think the class linking timer might have been useful 15 years ago when > it was introduced, or it might be useful today when CDS is disabled. But > with CDS enabled, we are paying a constant price that seems to benefit > no one. > > I think we should short-circuit it when it seems appropriate. If this > indeed causes problems for our users, it's easy to re-enable it. That's > better than just keeping this forever just because we're afraid to touch > anything. I think this seems like a well-rounded approach overall, but this assumes that we're mostly measuring the overhead of measurement here. I don't doubt that's the case for the scenarios you're excluding here and now, but it's hard to guarantee this property holds in the future. Perhaps a diagnostic flag to enable timing unconditionally would be appropriate? With such a flag we could verify that the time deltas of running some applications with and without the flag roughly match the time delta in reported linking time. If they diverge, we might need to adjust the conditions. > >> >> I find it hard to evaluate whether this short-circuiting of the time >> tracing is reasonable or not. 
Obviously any monitoring mechanism >> should impose minimal overhead compared to what is being measured, and >> these timers fall short in that regard. But if these stats become >> meaningless then they may as well be removed. >> >> I think the serviceability folk (cc'd) need to evaluate this in the >> context of the M&M tools. As a complement (or even alternative) there might be ways we can reduce time-to-measure overheads. E.g., JFR added FastUnorderedElapsedCounterSource (share/utilities/ticks.hpp) which uses rdtsc if available (x86 - fallback to os::elapsed_counter otherwise). This might be a reasonable alternative for the Perf* timers, which should be short-running events on a single thread. /Claes >> >>> However, it's quite expensive and it needs to start and stop a bunch >>> of timers. With CDS, it's quite common for the overhead of the timer >>> itself to be much more than the time it's trying to measure, giving >>> unreliable measurement. >>> >>> In this patch, when it's clear that the init and linking will be very >>> quick, I disable the timer and count only the number of invocations. >>> This shows a small improvement in start-up >> >> I'm curious if you tried forcing EagerInitialization to be true to >> see how that improves the baseline. I've always noticed eager_init in >> the code, but hadn't realized it is disabled by default. >> > > I think it cannot be done by default, as it will violate the JLS. A > class can be initialized only when it's touched by bytecodes. > > It can also backfire as we may load many classes without initializing > them. E.g., during bytecode verification, we load many classes and just > check that one is a supertype of another. > > Thanks > - Ioi > >> Cheers, >> David >> ----- >> >>> Results of " perf stat -r 100 bin/java -Xshare:on >>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version " >>>
>>> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) -
>>> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) --
>>> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) -
>>> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) -----
>>> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ----
>>> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) ---
>>> 59608782 59340173 (-268609)   ----   41.198  41.508 (  0.310) +
>>> 59614612 59325177 (-289435)   -----  41.397  41.738 (  0.341) ++
>>> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ----
>>> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) ---
>>> ================================================
>>> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) --
>>> instruction delta =      -284608    -0.4774%
>>> time        delta =       -0.434 ms -1.0435%
>>> >>> The number of PerfClassTraceTime's used is reduced from 564 to 116 >>> (so we have an overhead of about 715 instructions per use, yikes!). > From coleen.phillimore at oracle.com Thu Jun 18 11:39:34 2020 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 18 Jun 2020 07:39:34 -0400 Subject: RFR 8247808: Move JVMTI strong oops to OopStorage In-Reply-To: <0a3648ec-27be-7911-b7b3-cff48f55f793@oracle.com> References: <0a3648ec-27be-7911-b7b3-cff48f55f793@oracle.com> Message-ID: On 6/17/20 9:46 PM, serguei.spitsyn at oracle.com wrote: > Hi Coleen, > > Nice simplification! > It looks good to me. > I assume you will run the nsk.jvmti tests. Thanks Serguei. The nsk jvmti tests are run with tier3, I believe, but I'll run them on the command line to verify. thanks! Coleen > > Thanks, > Serguei > > > On 6/17/20 14:25, coleen.phillimore at oracle.com wrote: >> Summary: Remove JVMTI oops_do calls from JVMTI and GCs >> >> Tested with tier1-3, also built shenandoah to verify shenandoah changes.
>> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2020/8247808.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8247808 >> >> Thanks, >> Coleen > From david.holmes at oracle.com Thu Jun 18 13:18:00 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 18 Jun 2020 23:18:00 +1000 Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information In-Reply-To: <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com> References: <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com> <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com> <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com> Message-ID: <98d1761f-df82-8990-a784-25c953472793@oracle.com> On 18/06/2020 7:07 pm, Yasumasa Suenaga wrote: > On 2020/06/18 17:36, David Holmes wrote: >> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote: >>> Hi David, >>> >>> Both ThreadsListHandle and ResourceMarks would use >>> `Thread::current()` for their resource. It is set as default >>> parameter in c'tor. >>> Do you mean we should it explicitly in c'tor? >> >> Yes pass current_thread so we don't do the additional unnecessary >> calls to Thread::current(). > > Ok, I've fixed them. Could you review again? > > http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/ Updates look good.
One nit I missed before: src/hotspot/share/prims/jvmtiEnv.cpp // It need to perform at safepoint for gathering stable data please change to: // This need to be performed at a safepoint to gather stable data Thanks, David > > Thanks, > > Yasumasa > > >> David >> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>> On 2020/06/18 13:58, David Holmes wrote: >>>> Hi Yasumasa, >>>> >>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote: >>>>> Hi Serguei, >>>>> >>>>> Thanks for your comment! >>>>> I uploaded new webrev: >>>>> >>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/ >>>>> >>>>> I'm not sure the following change is correct. >>>>> Can we assume owning_thread is not NULL at safepoint? >>>> >>>> We can if "owner != NULL". So that change seem fine to me. >>>> >>>> But given this is now only executed at a safepoint there are >>>> additional simplifications that can be made: >>>> >>>> - current thread determination can be simplified: >>>> >>>> 945   Thread* current_thread = Thread::current(); >>>> >>>> becomes: >>>> >>>>     Thread* current_thread = VMThread::vm_thread(); >>>>     assert(current_thread == Thread::current(), "must be"); >>>> >>>> - these comments can be removed >>>> >>>>   994       // Use current thread since function can be called from a >>>>   995       // JavaThread or the VMThread. >>>> 1053       // Use current thread since function can be called from a >>>> 1054       // JavaThread or the VMThread. >>>> >>>> - these TLH constructions should be passing current_thread (existing >>>> bug) >>>> >>>> 996       ThreadsListHandle tlh; >>>> 1055       ThreadsListHandle tlh; >>>> >>>> - All ResourceMarks should be passing current_thread (existing bug) >>>> >>>> >>>> Aside: there is a major inconsistency between the spec and >>>> implementation for this method.
I've traced the history to see how >>>> this came about from JVMDI (ref JDK-4546581) but it never resulted >>>> in the JVM TI specification clearly stating what the >>>> waiters/waiter_count means. I will file a bug to have the spec >>>> clarified to match the implementation (even though I think the >>>> implementation is what is wrong). :( >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> All tests on submit repo and serviceability/jvmti and >>>>> vmTestbase/nsk/jvmti have been passed with this change. >>>>> >>>>> >>>>> ``` >>>>> ??????? // This monitor is owned so we have to find the owning >>>>> JavaThread. >>>>> ??????? owning_thread = >>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>> -????? // Cannot assume (owning_thread != NULL) here because this >>>>> function >>>>> -????? // may not have been called at a safepoint and the >>>>> owning_thread >>>>> -????? // might not be suspended. >>>>> -????? if (owning_thread != NULL) { >>>>> -??????? // The monitor's owner either has to be the current >>>>> thread, at safepoint >>>>> -??????? // or it has to be suspended. Any of these conditions will >>>>> prevent both >>>>> -??????? // contending and waiting threads from modifying the state of >>>>> -??????? // the monitor. >>>>> -??????? if (!at_safepoint && >>>>> !owning_thread->is_thread_fully_suspended(true, &debug_bits)) { >>>>> -????????? // Don't worry! This return of >>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED >>>>> -????????? // will not make it back to the JVM/TI agent. The error >>>>> code will >>>>> -????????? // get intercepted in JvmtiEnv::GetObjectMonitorUsage() >>>>> which >>>>> -????????? // will retry the call via a VM_GetObjectMonitorUsage VM >>>>> op. >>>>> -????????? return JVMTI_ERROR_THREAD_NOT_SUSPENDED; >>>>> -??????? } >>>>> -??????? HandleMark hm; >>>>> +????? assert(owning_thread != NULL, "owning JavaThread must not be >>>>> NULL"); >>>>> ????????? Handle???? th(current_thread, owning_thread->threadObj()); >>>>> ????????? 
ret.owner = (jthread)jni_reference(calling_thread, th); >>>>> >>>>> ``` >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote: >>>>>> Hi Yasumasa, >>>>>> >>>>>> This fix is not enough. >>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two >>>>>> modes: in VMop and non-VMop. >>>>>> The non-VMop mode has to be removed. >>>>>> >>>>>> Thanks, >>>>>> Serguei >>>>>> >>>>>> >>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote: >>>>>>> (Change subject for RFR) >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I filed it to JBS and upload a webrev for it. >>>>>>> Could you review it? >>>>>>> >>>>>>> ? JBS: https://bugs.openjdk.java.net/browse/JDK-8247729 >>>>>>> ? webrev: >>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/ >>>>>>> >>>>>>> This change has passed tests on submit repo. >>>>>>> Also I tested it with serviceability/jvmti and >>>>>>> vmTestbase/nsk/jvmti on Linux x64. >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Yasumasa >>>>>>> >>>>>>> >>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote: >>>>>>>> Yes. It seems we have a consensus. >>>>>>>> Thank you for taking care about it. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Serguei >>>>>>>> >>>>>>>> >>>>>>>> On 6/16/20 18:34, David Holmes wrote: >>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>> >>>>>>>>> Go for it! :) >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> David >>>>>>>>> >>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>> Hi Dan, David and Yasumasa, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. 
Daugherty wrote: >>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>>>>>>>>> Hi Dan, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I wonder why >>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does >>>>>>>>>>>>>>>>>>>>>> not perform at safepoint. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the >>>>>>>>>>>>>>>>>>>>> target is not suspended: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, >>>>>>>>>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>>>>>>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>>>>>>>>>>>>>>>>>> ?? jvmtiError err = >>>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, object, >>>>>>>>>>>>>>>>>>>>> info_ptr); >>>>>>>>>>>>>>>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>>>>>>>>> ???? // Some of the critical threads were not >>>>>>>>>>>>>>>>>>>>> suspended. go to a safepoint and try again >>>>>>>>>>>>>>>>>>>>> ???? VM_GetObjectMonitorUsage op(this, >>>>>>>>>>>>>>>>>>>>> calling_thread, object, info_ptr); >>>>>>>>>>>>>>>>>>>>> ???? VMThread::execute(&op); >>>>>>>>>>>>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>>>>>>>>>>>> ?? } >>>>>>>>>>>>>>>>>>>>> ?? 
return err; >>>>>>>>>>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases >>>>>>>>>>>>>>>>>>>> when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not >>>>>>>>>>>>>>>>>>>> returned from get_object_monitor_usage(). >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor >>>>>>>>>>>>>>>>>>>>>> object at first [1], but it would perform >>>>>>>>>>>>>>>>>>>>>> concurrently. >>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might >>>>>>>>>>>>>>>>>>>>>> be changed to others in subsequent code. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor >>>>>>>>>>>>>>>>>>>>>> before [2]. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner >>>>>>>>>>>>>>>>>>>>> thread it is either suspended or not. If it is >>>>>>>>>>>>>>>>>>>>> suspended then it cannot release the monitor. If it >>>>>>>>>>>>>>>>>>>>> is not suspended we detect that and redo the whole >>>>>>>>>>>>>>>>>>>>> query at a safepoint. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I think the owner thread might resume unfortunately >>>>>>>>>>>>>>>>>>>> after suspending check. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also >>>>>>>>>>>>>>>>>>> required a safepoint but it only requires the >>>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Which code is wrong? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the >>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller >>>>>>>>>>>>>>>>>> has started the process of gathering the information >>>>>>>>>>>>>>>>>> while not at a >>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by >>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The code tries to make sure that it either collects >>>>>>>>>>>>>>>>> data about a monitor owned by a thread that is >>>>>>>>>>>>>>>>> suspended, or else it collects that data at a >>>>>>>>>>>>>>>>> safepoint. But the owning thread can be resumed just >>>>>>>>>>>>>>>>> after the code determined it was suspended. The monitor >>>>>>>>>>>>>>>>> can then be released and the information gathered not >>>>>>>>>>>>>>>>> only stale but potentially completely wrong as it could >>>>>>>>>>>>>>>>> now be owned by a different thread and will report that >>>>>>>>>>>>>>>>> thread's entry count. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the >>>>>>>>>>>>>>>> information >>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation returns >>>>>>>>>>>>>>>> from the safepoint that gathered the info, the target >>>>>>>>>>>>>>>> thread >>>>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. But >>>>>>>>>>>>>>> the expectation is that the information was actually an >>>>>>>>>>>>>>> accurate snapshot of the state of the monitor at some >>>>>>>>>>>>>>> point in time. The current code does not ensure that. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please explain. I clearly don't understand why you think >>>>>>>>>>>>>> the info >>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the >>>>>>>>>>>>>> monitor >>>>>>>>>>>>>> at some point in time". >>>>>>>>>>>>> >>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no >>>>>>>>>>>>> atomicity**. The reported owner thread may not own it any >>>>>>>>>>>>> longer when the entry count is read, so straight away you >>>>>>>>>>>>> may have the wrong entry count information. The set of >>>>>>>>>>>>> threads trying to acquire the monitor, or wait on the >>>>>>>>>>>>> monitor can change in unexpected ways. 
It would be possible >>>>>>>>>>>>> for instance to report the same thread as being the owner, >>>>>>>>>>>>> being blocked trying to enter the monitor, and being in the >>>>>>>>>>>>> wait-set of the monitor - apparently all at the same time! >>>>>>>>>>>>> >>>>>>>>>>>>> ** even if the owner is suspended we don't have complete >>>>>>>>>>>>> atomicity because threads can join the set of threads >>>>>>>>>>>>> trying to enter the monitor (unless they are all suspended). >>>>>>>>>>>> >>>>>>>>>>>> Consider the case when the monitor's owner is _not_ suspended: >>>>>>>>>>>> >>>>>>>>>>>> ? - GetObjectMonitorUsage() uses a safepoint to gather the >>>>>>>>>>>> info about >>>>>>>>>>>> ??? the object's monitor. Since we're at a safepoint, the >>>>>>>>>>>> info that >>>>>>>>>>>> ??? we are gathering cannot change until we return from the >>>>>>>>>>>> safepoint. >>>>>>>>>>>> ??? It is a snapshot and a valid one at that. >>>>>>>>>>>> >>>>>>>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>>>>>>> >>>>>>>>>>>> ? - GetObjectMonitorUsage() will gather info about the object's >>>>>>>>>>>> ??? monitor while _not_ at a safepoint. Assuming that no other >>>>>>>>>>>> ??? thread is suspended, then entry_count can change because >>>>>>>>>>>> ??? another thread can block on entry while we are gathering >>>>>>>>>>>> ??? info. waiter_count and waiters can change if a thread was >>>>>>>>>>>> ??? in a timed wait that has timed out and now that thread is >>>>>>>>>>>> ??? blocked on re-entry. I don't think that notify_waiter_count >>>>>>>>>>>> ??? and notify_waiters can change. >>>>>>>>>>>> >>>>>>>>>>>> ??? So in this case, the owner info and notify info is stable, >>>>>>>>>>>> ??? but the entry_count and waiter info is not stable. >>>>>>>>>>>> >>>>>>>>>>>> Consider the case when the monitor is not owned: >>>>>>>>>>>> >>>>>>>>>>>> ? - GetObjectMonitorUsage() will start to gather info about the >>>>>>>>>>>> ??? object's monitor while _not_ at a safepoint. 
If it finds a >>>>>>>>>>>> ??? thread on the entry queue that is not suspended, then it >>>>>>>>>>>> will >>>>>>>>>>>> ??? bail out and redo the info gather at a safepoint. I just >>>>>>>>>>>> ??? noticed that it doesn't check for suspension for the >>>>>>>>>>>> threads >>>>>>>>>>>> ??? on the waiters list so a timed Object.wait() call can cause >>>>>>>>>>>> ??? some confusion here. >>>>>>>>>>>> >>>>>>>>>>>> ??? So in this case, the owner info is not stable if a thread >>>>>>>>>>>> ??? comes out of a timed wait and reenters the monitor. This >>>>>>>>>>>> ??? case is no different than if a "barger" thread comes in >>>>>>>>>>>> ??? after the NULL owner field is observed and enters the >>>>>>>>>>>> ??? monitor. We'll return that there is no owner, a list of >>>>>>>>>>>> ??? suspended pending entry thread and a list of waiting >>>>>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>>>>> ??? owned by the "barger" that completely bypassed the entry >>>>>>>>>>>> ??? queue by virtue of seeing the NULL owner field at exactly >>>>>>>>>>>> ??? the right time. >>>>>>>>>>>> >>>>>>>>>>>> So the owner field is only stable when we have an owner. If >>>>>>>>>>>> that owner is not suspended, then the other fields are also >>>>>>>>>>>> stable because we gathered the info at a safepoint. If the >>>>>>>>>>>> owner is suspended, then the owner and notify info is stable, >>>>>>>>>>>> but the entry_count and waiter info is not stable. >>>>>>>>>>>> >>>>>>>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>>>>> of it a valid and stable snapshot. 
Unfortunately, we only >>>>>>>>>>>> do that at a safepoint when the owner thread is not suspended >>>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>>>>> the different pieces of info is unstable to varying degrees. >>>>>>>>>>>> >>>>>>>>>>>> As for this claim: >>>>>>>>>>>> >>>>>>>>>>>>> It would be possible for instance to report the same thread >>>>>>>>>>>>> as being the owner, being blocked trying to enter the monitor, >>>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>>>>>>>> the same time! >>>>>>>>>>>> >>>>>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>>>>>>>> then be seen as on the entry queue or on the wait queue since >>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>>>> wait queue. >>>>>>>>>>>> >>>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>>> quite that bad. :-) (That is a small mercy.) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>>>> used. Handshaking with all threads won't make the data as >>>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>>>> will still be field instability. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>>> >>>>>>>>>>> I agree with this. >>>>>>>>>>> The advantages are: >>>>>>>>>>> ??- the result is stable >>>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>>> >>>>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>>>> I'm not sure if using handshakes can make performance better. >>>>>>>>>> >>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>> >>>>>>>>>> Yasumasa >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Serguei >>>>>>>>>>> >>>>>>>>>>>> Dan >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The only way to make sure you don't have stale >>>>>>>>>>>>>>>> information is >>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps >>>>>>>>>>>>>>>> the doc >>>>>>>>>>>>>>>> should have more clear about the possibility of >>>>>>>>>>>>>>>> returning stale >>>>>>>>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's being >>>>>>>>>>>>>>>>> suspended so I can't see how this could be construed as >>>>>>>>>>>>>>>>> an agent bug. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In your scenario above, you mention that the target >>>>>>>>>>>>>>>> thread was >>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the >>>>>>>>>>>>>>>> target >>>>>>>>>>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), >>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent >>>>>>>>>>>>>>>> should not >>>>>>>>>>>>>>>> resume the target thread while also calling >>>>>>>>>>>>>>>> GetObjectMonitorUsage(). >>>>>>>>>>>>>>>> The calls were allowed to be made out of order so agent >>>>>>>>>>>>>>>> bug. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an >>>>>>>>>>>>>>> independent resume, but you're right that doesn't really >>>>>>>>>>>>>>> make a lot of sense. But when the spec says nothing about >>>>>>>>>>>>>>> suspension ... >>>>>>>>>>>>>> >>>>>>>>>>>>>> And it is intentional that suspension is not required. >>>>>>>>>>>>>> JVM/DI and JVM/PI >>>>>>>>>>>>>> used to require suspension for these kinds of get-the-info >>>>>>>>>>>>>> APIs. JVM/TI >>>>>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>>>>> >>>>>>>>>>>>>> As I've said before, we could add a note about the data >>>>>>>>>>>>>> being potentially >>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like >>>>>>>>>>>>>> stat(2). You can >>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the >>>>>>>>>>>>>> info is current >>>>>>>>>>>>>> by the time you process what you got back. Is it too much >>>>>>>>>>>>>> motherhood to >>>>>>>>>>>>>> state that the data might be stale? I could go either way... >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this >>>>>>>>>>>>>>>>> to be fixed in the future without forcing/using any >>>>>>>>>>>>>>>>> safepoints. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding >>>>>>>>>>>>>>>> talking about >>>>>>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread whilst >>>>>>>>>>>>>>> the monitor is queried. 
In effect the operation would >>>>>>>>>>>>>>> create a per-thread safepoint. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I "know" that, but I still need time to think about it and >>>>>>>>>>>>>> probably >>>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Semantically it is no different to the code actually >>>>>>>>>>>>>>> suspending the owner thread, but it can't actually do >>>>>>>>>>>>>>> that because suspends/resume don't nest. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yeah... we used have a suspend count back when we tracked >>>>>>>>>>>>>> internal and >>>>>>>>>>>>>> external suspends separately. That was a nightmare... >>>>>>>>>>>>>> >>>>>>>>>>>>>> Dan >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to >>>>>>>>>>>>>>>>>>>> check thread state, it returns `true` when the >>>>>>>>>>>>>>>>>>>> thread is sleeping [3], or when it performs in >>>>>>>>>>>>>>>>>>>> native [4]. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it can't >>>>>>>>>>>>>>>>>>> continue execution in the VM or in Java code. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed >>>>>>>>>>>>>>>>>>>>> common case where threads are first suspended and >>>>>>>>>>>>>>>>>>>>> then the monitors are queried. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from >>>>>>>>>>>>>>>>>>>> JVMTI spec - it just says "Get information about the >>>>>>>>>>>>>>>>>>>> object's monitor." 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, >>>>>>>>>>>>>>>>>>> nothing to do with the spec. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect >>>>>>>>>>>>>>>>>>>> information in some case. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner >>>>>>>>>>>>>>>>>>>> might be just before wakeup. >>>>>>>>>>>>>>>>>>>> So I think it is more safe if >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() is called at safepoint in >>>>>>>>>>>>>>>>>>>> any case. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using >>>>>>>>>>>>>>>>>>> Handshakes, so this particular operation will require >>>>>>>>>>>>>>>>>>> that the apparent owner is Handshake-safe (by >>>>>>>>>>>>>>>>>>> entering a handshake with it) before querying the >>>>>>>>>>>>>>>>>>> monitor. This would still be preferable I think to >>>>>>>>>>>>>>>>>>> always using a safepoint for the entire operation. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> [4] >>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the >>>>>>>>>>>>>>>>>>>>> thread reported as the owner may not be suspended >>>>>>>>>>>>>>>>>>>>> at the time we first see it, and may release the >>>>>>>>>>>>>>>>>>>>> monitor, but then it may get suspended before we call: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ??owning_thread = >>>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), >>>>>>>>>>>>>>>>>>>>> owner); >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and >>>>>>>>>>>>>>>>>>>>> proceed to query the monitor information in a racy >>>>>>>>>>>>>>>>>>>>> way. This can't happen when suspension itself >>>>>>>>>>>>>>>>>>>>> requires a safepoint as the current thread won't go >>>>>>>>>>>>>>>>>>>>> to that safepoint during this code. However, if >>>>>>>>>>>>>>>>>>>>> suspension is implemented via a direct handshake >>>>>>>>>>>>>>>>>>>>> with the target thread then we have a problem. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>> >>>>>> From daniel.daugherty at oracle.com Thu Jun 18 13:55:22 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Thu, 18 Jun 2020 09:55:22 -0400 Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information In-Reply-To: <98d1761f-df82-8990-a784-25c953472793@oracle.com> References: <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com> <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com> <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com> <98d1761f-df82-8990-a784-25c953472793@oracle.com> Message-ID: <9aecb945-109b-334e-8e11-f2c8224048e2@oracle.com> On 6/18/20 9:18 AM, David Holmes wrote: > On 18/06/2020 7:07 pm, Yasumasa Suenaga wrote: >> On 2020/06/18 17:36, David Holmes wrote: >>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote: >>>> Hi David, >>>> >>>> Both ThreadsListHandle and 
ResourceMarks would use
>>>> `Thread::current()` for their resource. It is set as default
>>>> parameter in c'tor.
>>>> Do you mean we should it explicitly in c'tor?
>>>
>>> Yes pass current_thread so we don't do the additional unnecessary
>>> calls to Thread::current().
>>
>> Ok, I've fixed them. Could you review again?
>>
>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/
>
> Updates look good. One nit I missed before:
>
> src/hotspot/share/prims/jvmtiEnv.cpp
>
> // It need to perform at safepoint for gathering stable data
>
> please change to:
>
> // This need to be performed at a safepoint to gather stable data

Just a comment on this comment... I still haven't gotten to the webrev
yet...

Perhaps:

    // This needs to be performed at a safepoint to gather stable data.

Dan

>
> Thanks,
> David
>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>>> David
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2020/06/18 13:58, David Holmes wrote:
>>>>> Hi Yasumasa,
>>>>>
>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>>> Hi Serguei,
>>>>>>
>>>>>> Thanks for your comment!
>>>>>> I uploaded new webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>>
>>>>>> I'm not sure the following change is correct.
>>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>>
>>>>> We can if "owner != NULL". So that change seem fine to me.
>>>>>
>>>>> But given this is now only executed at a safepoint there are
>>>>> additional simplifications that can be made:
>>>>>
>>>>> - current thread determination can be simplified:
>>>>>
>>>>> 945   Thread* current_thread = Thread::current();
>>>>>
>>>>> becomes:
>>>>>
>>>>>     Thread* current_thread = VMThread::vm_thread();
>>>>>     assert(current_thread == Thread::current(), "must be");
>>>>>
>>>>> - these comments can be removed
>>>>>
>>>>>   994       // Use current thread since function can be called from a
>>>>>   995       // JavaThread or the VMThread.
>>>>> 1053       // Use current thread since function can be called from a
>>>>> 1054       // JavaThread or the VMThread.
>>>>>
>>>>> - these TLH constructions should be passing current_thread
>>>>> (existing bug)
>>>>>
>>>>> 996       ThreadsListHandle tlh;
>>>>> 1055       ThreadsListHandle tlh;
>>>>>
>>>>> - All ResourceMarks should be passing current_thread (existing bug)
>>>>>
>>>>>
>>>>> Aside: there is a major inconsistency between the spec and
>>>>> implementation for this method. I've traced the history to see how
>>>>> this came about from JVMDI (ref JDK-4546581) but it never resulted
>>>>> in the JVM TI specification clearly stating what the
>>>>> waiters/waiter_count means. I will file a bug to have the spec
>>>>> clarified to match the implementation (even though I think the
>>>>> implementation is what is wrong). :(
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> All tests on submit repo and serviceability/jvmti and
>>>>>> vmTestbase/nsk/jvmti have been passed with this change.
>>>>>>
>>>>>>
>>>>>> ```
>>>>>>         // This monitor is owned so we have to find the owning JavaThread.
>>>>>>         owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>> -      // Cannot assume (owning_thread != NULL) here because this function
>>>>>> -      // may not have been called at a safepoint and the owning_thread
>>>>>> -      // might not be suspended.
>>>>>> -      if (owning_thread != NULL) {
>>>>>> -        // The monitor's owner either has to be the current thread, at safepoint
>>>>>> -        // or it has to be suspended. Any of these conditions will prevent both
>>>>>> -        // contending and waiting threads from modifying the state of
>>>>>> -        // the monitor.
>>>>>> -        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>>> -          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>>> -          // will not make it back to the JVM/TI agent. The error code will
>>>>>> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
>>>>>> -          // will retry the call via a VM_GetObjectMonitorUsage VM op.
>>>>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>>> -        }
>>>>>> -        HandleMark hm;
>>>>>> +      assert(owning_thread != NULL, "owning JavaThread must not be NULL");
>>>>>>           Handle     th(current_thread, owning_thread->threadObj());
>>>>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>>
>>>>>> ```
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>>> Hi Yasumasa,
>>>>>>>
>>>>>>> This fix is not enough.
>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two
>>>>>>> modes: in VMop and non-VMop.
>>>>>>> The non-VMop mode has to be removed.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>
>>>>>>>
>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>>> (Change subject for RFR)
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I filed it to JBS and upload a webrev for it.
>>>>>>>> Could you review it?
>>>>>>>>
>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>>   webrev:
>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>>
>>>>>>>> This change has passed tests on submit repo.
>>>>>>>> Also I tested it with serviceability/jvmti and
>>>>>>>> vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>>> Thank you for taking care about it.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>>>
>>>>>>>>>> Go for it!
:) >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> David >>>>>>>>>> >>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>> Hi Dan, David and Yasumasa, >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>> Hi Dan, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I wonder why >>>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does >>>>>>>>>>>>>>>>>>>>>>> not perform at safepoint. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the >>>>>>>>>>>>>>>>>>>>>> target is not suspended: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, >>>>>>>>>>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>>>>>>>>>> ?? JavaThread* calling_thread = >>>>>>>>>>>>>>>>>>>>>> JavaThread::current(); >>>>>>>>>>>>>>>>>>>>>> ?? jvmtiError err = >>>>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, object, >>>>>>>>>>>>>>>>>>>>>> info_ptr); >>>>>>>>>>>>>>>>>>>>>> ?? 
if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>>>>>>>>>> ???? // Some of the critical threads were not >>>>>>>>>>>>>>>>>>>>>> suspended. go to a safepoint and try again >>>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, calling_thread, >>>>>>>>>>>>>>>>>>>>>> object, info_ptr); >>>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op); >>>>>>>>>>>>>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>>>>>>>>>>>>> ?? } >>>>>>>>>>>>>>>>>>>>>> ?? return err; >>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases >>>>>>>>>>>>>>>>>>>>> when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not >>>>>>>>>>>>>>>>>>>>> returned from get_object_monitor_usage(). >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor >>>>>>>>>>>>>>>>>>>>>>> object at first [1], but it would perform >>>>>>>>>>>>>>>>>>>>>>> concurrently. >>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner >>>>>>>>>>>>>>>>>>>>>>> might be changed to others in subsequent code. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor >>>>>>>>>>>>>>>>>>>>>>> before [2]. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner >>>>>>>>>>>>>>>>>>>>>> thread it is either suspended or not. If it is >>>>>>>>>>>>>>>>>>>>>> suspended then it cannot release the monitor. If >>>>>>>>>>>>>>>>>>>>>> it is not suspended we detect that and redo the >>>>>>>>>>>>>>>>>>>>>> whole query at a safepoint. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume >>>>>>>>>>>>>>>>>>>>> unfortunately after suspending check. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also >>>>>>>>>>>>>>>>>>>> required a safepoint but it only requires the >>>>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Which code is wrong? 
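A minimal sketch of the resume-after-check interleaving discussed here, replayed deterministically in a single thread. Every type and function name below is hypothetical and is not HotSpot code; the sketch only assumes that the owner is recorded in one step and the entry count is read in a later step, with nothing preventing a resume in between.

```cpp
#include <cassert>

// Hypothetical toy model; none of these types are HotSpot classes.
struct ToyThread {
  int id;
  bool suspended;
};

struct ToyMonitor {
  const ToyThread* owner;  // nullptr when the monitor is unowned
  int entry_count;         // recursion count of the current owner
};

struct ToyUsage {
  int owner_id;
  int entry_count;
};

// Step 1 of the racy query: record the owner if it currently looks suspended.
inline bool observe_suspended_owner(const ToyMonitor& m, ToyUsage& out) {
  if (m.owner == nullptr || !m.owner->suspended) return false;
  out.owner_id = m.owner->id;
  return true;
}

// Step 2 of the racy query: read the entry count some time later.
inline void read_entry_count(const ToyMonitor& m, ToyUsage& out) {
  out.entry_count = m.entry_count;
}

// Replays the interleaving: thread 1 is resumed and releases the monitor
// between the two steps, and thread 2 acquires it.
inline ToyUsage racy_query_with_rogue_resume() {
  ToyThread t1{1, true};
  ToyThread t2{2, false};
  ToyMonitor m{&t1, 3};
  ToyUsage usage{-1, 0};

  bool saw_suspended_owner = observe_suspended_owner(m, usage);
  assert(saw_suspended_owner);
  (void)saw_suspended_owner;

  t1.suspended = false;  // resumed after the suspension check
  m.owner = &t2;         // thread 1 released the monitor, thread 2 entered it
  m.entry_count = 1;

  read_entry_count(m, usage);
  return usage;  // owner is thread 1, but the count belongs to thread 2
}
```

The returned pair mixes thread 1 as owner with thread 2's entry count, a combination that was never true at any single instant in the toy model.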
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the >>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller >>>>>>>>>>>>>>>>>>> has started the process of gathering the information >>>>>>>>>>>>>>>>>>> while not at a >>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by >>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects >>>>>>>>>>>>>>>>>> data about a monitor owned by a thread that is >>>>>>>>>>>>>>>>>> suspended, or else it collects that data at a >>>>>>>>>>>>>>>>>> safepoint. But the owning thread can be resumed just >>>>>>>>>>>>>>>>>> after the code determined it was suspended. The >>>>>>>>>>>>>>>>>> monitor can then be released and the information >>>>>>>>>>>>>>>>>> gathered not only stale but potentially completely >>>>>>>>>>>>>>>>>> wrong as it could now be owned by a different thread >>>>>>>>>>>>>>>>>> and will report that thread's entry count. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as >>>>>>>>>>>>>>>>> soon as >>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the >>>>>>>>>>>>>>>>> information >>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation >>>>>>>>>>>>>>>>> returns >>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the target >>>>>>>>>>>>>>>>> thread >>>>>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. >>>>>>>>>>>>>>>> But the expectation is that the information was >>>>>>>>>>>>>>>> actually an accurate snapshot of the state of the >>>>>>>>>>>>>>>> monitor at some point in time. The current code does >>>>>>>>>>>>>>>> not ensure that. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please explain. 
I clearly don't understand why you think >>>>>>>>>>>>>>> the info >>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the >>>>>>>>>>>>>>> monitor >>>>>>>>>>>>>>> at some point in time". >>>>>>>>>>>>>> >>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no >>>>>>>>>>>>>> atomicity**. The reported owner thread may not own it any >>>>>>>>>>>>>> longer when the entry count is read, so straight away you >>>>>>>>>>>>>> may have the wrong entry count information. The set of >>>>>>>>>>>>>> threads trying to acquire the monitor, or wait on the >>>>>>>>>>>>>> monitor can change in unexpected ways. It would be >>>>>>>>>>>>>> possible for instance to report the same thread as being >>>>>>>>>>>>>> the owner, being blocked trying to enter the monitor, and >>>>>>>>>>>>>> being in the wait-set of the monitor - apparently all at >>>>>>>>>>>>>> the same time! >>>>>>>>>>>>>> >>>>>>>>>>>>>> ** even if the owner is suspended we don't have complete >>>>>>>>>>>>>> atomicity because threads can join the set of threads >>>>>>>>>>>>>> trying to enter the monitor (unless they are all suspended). >>>>>>>>>>>>> >>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ >>>>>>>>>>>>> suspended: >>>>>>>>>>>>> >>>>>>>>>>>>> ? - GetObjectMonitorUsage() uses a safepoint to gather the >>>>>>>>>>>>> info about >>>>>>>>>>>>> ??? the object's monitor. Since we're at a safepoint, the >>>>>>>>>>>>> info that >>>>>>>>>>>>> ??? we are gathering cannot change until we return from >>>>>>>>>>>>> the safepoint. >>>>>>>>>>>>> ??? It is a snapshot and a valid one at that. >>>>>>>>>>>>> >>>>>>>>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>>>>>>>> >>>>>>>>>>>>> ? - GetObjectMonitorUsage() will gather info about the >>>>>>>>>>>>> object's >>>>>>>>>>>>> ??? monitor while _not_ at a safepoint. Assuming that no >>>>>>>>>>>>> other >>>>>>>>>>>>> ??? thread is suspended, then entry_count can change because >>>>>>>>>>>>> ??? 
another thread can block on entry while we are gathering >>>>>>>>>>>>> ??? info. waiter_count and waiters can change if a thread was >>>>>>>>>>>>> ??? in a timed wait that has timed out and now that thread is >>>>>>>>>>>>> ??? blocked on re-entry. I don't think that >>>>>>>>>>>>> notify_waiter_count >>>>>>>>>>>>> ??? and notify_waiters can change. >>>>>>>>>>>>> >>>>>>>>>>>>> ??? So in this case, the owner info and notify info is >>>>>>>>>>>>> stable, >>>>>>>>>>>>> ??? but the entry_count and waiter info is not stable. >>>>>>>>>>>>> >>>>>>>>>>>>> Consider the case when the monitor is not owned: >>>>>>>>>>>>> >>>>>>>>>>>>> ? - GetObjectMonitorUsage() will start to gather info >>>>>>>>>>>>> about the >>>>>>>>>>>>> ??? object's monitor while _not_ at a safepoint. If it >>>>>>>>>>>>> finds a >>>>>>>>>>>>> ??? thread on the entry queue that is not suspended, then >>>>>>>>>>>>> it will >>>>>>>>>>>>> ??? bail out and redo the info gather at a safepoint. I just >>>>>>>>>>>>> ??? noticed that it doesn't check for suspension for the >>>>>>>>>>>>> threads >>>>>>>>>>>>> ??? on the waiters list so a timed Object.wait() call can >>>>>>>>>>>>> cause >>>>>>>>>>>>> ??? some confusion here. >>>>>>>>>>>>> >>>>>>>>>>>>> ??? So in this case, the owner info is not stable if a thread >>>>>>>>>>>>> ??? comes out of a timed wait and reenters the monitor. This >>>>>>>>>>>>> ??? case is no different than if a "barger" thread comes in >>>>>>>>>>>>> ??? after the NULL owner field is observed and enters the >>>>>>>>>>>>> ??? monitor. We'll return that there is no owner, a list of >>>>>>>>>>>>> ??? suspended pending entry thread and a list of waiting >>>>>>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>>>>>> ??? owned by the "barger" that completely bypassed the entry >>>>>>>>>>>>> ??? queue by virtue of seeing the NULL owner field at exactly >>>>>>>>>>>>> ??? the right time. >>>>>>>>>>>>> >>>>>>>>>>>>> So the owner field is only stable when we have an owner. 
If >>>>>>>>>>>>> that owner is not suspended, then the other fields are also >>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the >>>>>>>>>>>>> owner is suspended, then the owner and notify info is stable, >>>>>>>>>>>>> but the entry_count and waiter info is not stable. >>>>>>>>>>>>> >>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>>>>>>>>> do that at a safepoint when the owner thread is not suspended >>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>>>>>> the different pieces of info is unstable to varying degrees. >>>>>>>>>>>>> >>>>>>>>>>>>> As for this claim: >>>>>>>>>>>>> >>>>>>>>>>>>>> It would be possible for instance to report the same thread >>>>>>>>>>>>>> as being the owner, being blocked trying to enter the >>>>>>>>>>>>>> monitor, >>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>>>>>>>>> the same time! >>>>>>>>>>>>> >>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue since >>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>>>>> wait queue. >>>>>>>>>>>>> >>>>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>>>> quite that bad. 
:-) (That is a small mercy.) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>>>>> used. Handshaking with all threads won't make the data as >>>>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>>>>> will still be field instability. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>>>> >>>>>>>>>>>> I agree with this. >>>>>>>>>>>> The advantages are: >>>>>>>>>>>> ??- the result is stable >>>>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>>>> >>>>>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>>>>> I'm not sure if using handshakes can make performance better. >>>>>>>>>>> >>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>> >>>>>>>>>>> Yasumasa >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Serguei >>>>>>>>>>>> >>>>>>>>>>>>> Dan >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> David >>>>>>>>>>>>>> ----- >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The only way to make sure you don't have stale >>>>>>>>>>>>>>>>> information is >>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps >>>>>>>>>>>>>>>>> the doc >>>>>>>>>>>>>>>>> should have more clear about the possibility of >>>>>>>>>>>>>>>>> returning stale >>>>>>>>>>>>>>>>> info. That's a question for Robert F. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's >>>>>>>>>>>>>>>>>> being suspended so I can't see how this could be >>>>>>>>>>>>>>>>>> construed as an agent bug. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In your scenario above, you mention that the target >>>>>>>>>>>>>>>>> thread was >>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while >>>>>>>>>>>>>>>>> the target >>>>>>>>>>>>>>>>> was suspended, and then the target thread was resumed >>>>>>>>>>>>>>>>> after >>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but >>>>>>>>>>>>>>>>> before >>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), >>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent >>>>>>>>>>>>>>>>> should not >>>>>>>>>>>>>>>>> resume the target thread while also calling >>>>>>>>>>>>>>>>> GetObjectMonitorUsage(). >>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so >>>>>>>>>>>>>>>>> agent bug. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an >>>>>>>>>>>>>>>> independent resume, but you're right that doesn't >>>>>>>>>>>>>>>> really make a lot of sense. But when the spec says >>>>>>>>>>>>>>>> nothing about suspension ... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> And it is intentional that suspension is not required. >>>>>>>>>>>>>>> JVM/DI and JVM/PI >>>>>>>>>>>>>>> used to require suspension for these kinds of >>>>>>>>>>>>>>> get-the-info APIs. JVM/TI >>>>>>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As I've said before, we could add a note about the data >>>>>>>>>>>>>>> being potentially >>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like >>>>>>>>>>>>>>> stat(2). 
You can >>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the >>>>>>>>>>>>>>> info is current >>>>>>>>>>>>>>> by the time you process what you got back. Is it too >>>>>>>>>>>>>>> much motherhood to >>>>>>>>>>>>>>> state that the data might be stale? I could go either >>>>>>>>>>>>>>> way... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this >>>>>>>>>>>>>>>>>> to be fixed in the future without forcing/using any >>>>>>>>>>>>>>>>>> safepoints. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding >>>>>>>>>>>>>>>>> talking about >>>>>>>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread >>>>>>>>>>>>>>>> whilst the monitor is queried. In effect the operation >>>>>>>>>>>>>>>> would create a per-thread safepoint. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I "know" that, but I still need time to think about it >>>>>>>>>>>>>>> and probably >>>>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Semantically it is no different to the code actually >>>>>>>>>>>>>>>> suspending the owner thread, but it can't actually do >>>>>>>>>>>>>>>> that because suspends/resume don't nest. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we >>>>>>>>>>>>>>> tracked internal and >>>>>>>>>>>>>>> external suspends separately. That was a nightmare... 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to >>>>>>>>>>>>>>>>>>>>> check thread state, it returns `true` when the >>>>>>>>>>>>>>>>>>>>> thread is sleeping [3], or when it performs in >>>>>>>>>>>>>>>>>>>>> native [4]. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it >>>>>>>>>>>>>>>>>>>> can't continue execution in the VM or in Java code. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the >>>>>>>>>>>>>>>>>>>>>> assumed common case where threads are first >>>>>>>>>>>>>>>>>>>>>> suspended and then the monitors are queried. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from >>>>>>>>>>>>>>>>>>>>> JVMTI spec - it just says "Get information about >>>>>>>>>>>>>>>>>>>>> the object's monitor." >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, >>>>>>>>>>>>>>>>>>>> nothing to do with the spec. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect >>>>>>>>>>>>>>>>>>>>> information in some case. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner >>>>>>>>>>>>>>>>>>>>> might be just before wakeup. >>>>>>>>>>>>>>>>>>>>> So I think it is more safe if >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() is called at safepoint in >>>>>>>>>>>>>>>>>>>>> any case. 
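A minimal sketch of the "always gather at a safepoint" idea, under a loud assumption: a single mutex stands in for the safepoint, so mutators and the query can never overlap. A real safepoint stops the Java threads rather than serializing them through a lock, and the class and method names here are hypothetical, not HotSpot or JVM TI APIs.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical toy model of "read every field while mutators are stopped".
struct ToyUsage {
  int owner_id;     // -1 when the monitor is unowned
  int entry_count;  // recursion count of the owner
};

class ToyMonitor {
 public:
  // Mutator side: acquire or re-enter the monitor.
  bool enter(int thread_id) {
    std::lock_guard<std::mutex> stop_point(world_);
    if (owner_id_ != -1 && owner_id_ != thread_id) return false;
    owner_id_ = thread_id;
    ++entry_count_;
    return true;
  }

  // Mutator side: release one level of ownership.
  void exit(int thread_id) {
    std::lock_guard<std::mutex> stop_point(world_);
    assert(owner_id_ == thread_id && entry_count_ > 0);
    if (--entry_count_ == 0) owner_id_ = -1;
  }

  // Query side: both fields are read under one lock hold, so they
  // always describe the same instant.
  ToyUsage usage_at_stop_point() {
    std::lock_guard<std::mutex> stop_point(world_);
    return ToyUsage{owner_id_, entry_count_};
  }

 private:
  std::mutex world_;  // stands in for "all mutators are stopped"
  int owner_id_ = -1;
  int entry_count_ = 0;
};
```

Because the owner and the entry count are read in one critical section, the result is a true snapshot; it can still be stale by the time the caller inspects it, which is the distinction drawn in this thread.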
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using >>>>>>>>>>>>>>>>>>>> Handshakes, so this particular operation will >>>>>>>>>>>>>>>>>>>> require that the apparent owner is Handshake-safe >>>>>>>>>>>>>>>>>>>> (by entering a handshake with it) before querying >>>>>>>>>>>>>>>>>>>> the monitor. This would still be preferable I think >>>>>>>>>>>>>>>>>>>> to always using a safepoint for the entire operation. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [4] >>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the >>>>>>>>>>>>>>>>>>>>>> thread reported as the owner may not be suspended >>>>>>>>>>>>>>>>>>>>>> at the time we first see it, and may release the >>>>>>>>>>>>>>>>>>>>>> monitor, but then it may get suspended before we >>>>>>>>>>>>>>>>>>>>>> call: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ??owning_thread = >>>>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), >>>>>>>>>>>>>>>>>>>>>> owner); >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and >>>>>>>>>>>>>>>>>>>>>> proceed to query the monitor information in a >>>>>>>>>>>>>>>>>>>>>> racy way. This can't happen when suspension >>>>>>>>>>>>>>>>>>>>>> itself requires a safepoint as the current thread >>>>>>>>>>>>>>>>>>>>>> won't go to that safepoint during this code. 
>>>>>>>>>>>>>>>>>>>>>> However, if suspension is implemented via a >>>>>>>>>>>>>>>>>>>>>> direct handshake with the target thread then we >>>>>>>>>>>>>>>>>>>>>> have a problem. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>>> From david.holmes at oracle.com Thu Jun 18 14:04:19 2020 From: david.holmes at oracle.com (David Holmes) Date: Fri, 19 Jun 2020 00:04:19 +1000 Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information In-Reply-To: <9aecb945-109b-334e-8e11-f2c8224048e2@oracle.com> References: <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com> <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com> <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com> <98d1761f-df82-8990-a784-25c953472793@oracle.com> <9aecb945-109b-334e-8e11-f2c8224048e2@oracle.com> Message-ID: <8192d5c0-c30b-c410-b14c-47698f43748b@oracle.com> On 18/06/2020 11:55 pm, Daniel D. 
Daugherty wrote:
> On 6/18/20 9:18 AM, David Holmes wrote:
>> On 18/06/2020 7:07 pm, Yasumasa Suenaga wrote:
>>> On 2020/06/18 17:36, David Holmes wrote:
>>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
>>>>> Hi David,
>>>>>
>>>>> Both ThreadsListHandle and ResourceMarks would use
>>>>> `Thread::current()` for their resource. It is set as default
>>>>> parameter in c'tor.
>>>>> Do you mean we should it explicitly in c'tor?
>>>>
>>>> Yes pass current_thread so we don't do the additional unnecessary
>>>> calls to Thread::current().
>>>
>>> Ok, I've fixed them. Could you review again?
>>>
>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/
>>
>> Updates look good. One nit I missed before:
>>
>> src/hotspot/share/prims/jvmtiEnv.cpp
>>
>> // It need to perform at safepoint for gathering stable data
>>
>> please change to:
>>
>> // This need to be performed at a safepoint to gather stable data
>
> Just a comment on this comment... I still haven't gotten to the webrev
> yet...
>
> Perhaps:
>
>     // This needs to be performed at a safepoint to gather stable data.

There is a second line that continues the sentence

// because monitor owner / waiters might not be suspended.

David
-----

> Dan
>>
>> Thanks,
>> David
>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>>> David
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2020/06/18 13:58, David Holmes wrote:
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>>>> Hi Serguei,
>>>>>>>
>>>>>>> Thanks for your comment!
>>>>>>> I uploaded new webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>>>
>>>>>>> I'm not sure the following change is correct.
>>>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>>>
>>>>>> We can if "owner != NULL". So that change seem fine to me.
>>>>>>
>>>>>> But given this is now only executed at a safepoint there are
>>>>>> additional simplifications that can be made:
>>>>>>
>>>>>> - current thread determination can be simplified:
>>>>>>
>>>>>> 945   Thread* current_thread = Thread::current();
>>>>>>
>>>>>> becomes:
>>>>>>
>>>>>>     Thread* current_thread = VMThread::vm_thread();
>>>>>>     assert(current_thread == Thread::current(), "must be");
>>>>>>
>>>>>> - these comments can be removed
>>>>>>
>>>>>>   994       // Use current thread since function can be called from a
>>>>>>   995       // JavaThread or the VMThread.
>>>>>> 1053       // Use current thread since function can be called from a
>>>>>> 1054       // JavaThread or the VMThread.
>>>>>>
>>>>>> - these TLH constructions should be passing current_thread
>>>>>> (existing bug)
>>>>>>
>>>>>> 996       ThreadsListHandle tlh;
>>>>>> 1055       ThreadsListHandle tlh;
>>>>>>
>>>>>> - All ResourceMarks should be passing current_thread (existing bug)
>>>>>>
>>>>>>
>>>>>> Aside: there is a major inconsistency between the spec and
>>>>>> implementation for this method. I've traced the history to see how
>>>>>> this came about from JVMDI (ref JDK-4546581) but it never resulted
>>>>>> in the JVM TI specification clearly stating what the
>>>>>> waiters/waiter_count means. I will file a bug to have the spec
>>>>>> clarified to match the implementation (even though I think the
>>>>>> implementation is what is wrong). :(
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> All tests on submit repo and serviceability/jvmti and
>>>>>>> vmTestbase/nsk/jvmti have been passed with this change.
>>>>>>>
>>>>>>>
>>>>>>> ```
>>>>>>>         // This monitor is owned so we have to find the owning JavaThread.
>>>>>>>         owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>>> -      // Cannot assume (owning_thread != NULL) here because this function
>>>>>>> -      // may not have been called at a safepoint and the owning_thread
>>>>>>> -      // might not be suspended.
>>>>>>> -      if (owning_thread != NULL) {
>>>>>>> -        // The monitor's owner either has to be the current thread, at safepoint
>>>>>>> -        // or it has to be suspended. Any of these conditions will prevent both
>>>>>>> -        // contending and waiting threads from modifying the state of
>>>>>>> -        // the monitor.
>>>>>>> -        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>>>> -          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>>>> -          // will not make it back to the JVM/TI agent. The error code will
>>>>>>> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
>>>>>>> -          // will retry the call via a VM_GetObjectMonitorUsage VM op.
>>>>>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>>>> -        }
>>>>>>> -        HandleMark hm;
>>>>>>> +      assert(owning_thread != NULL, "owning JavaThread must not be NULL");
>>>>>>>           Handle     th(current_thread, owning_thread->threadObj());
>>>>>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>>>
>>>>>>> ```
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>>>> Hi Yasumasa,
>>>>>>>>
>>>>>>>> This fix is not enough.
>>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two
>>>>>>>> modes: in VMop and non-VMop.
>>>>>>>> The non-VMop mode has to be removed.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Serguei
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>>>> (Change subject for RFR)
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I filed it to JBS and upload a webrev for it.
>>>>>>>>> Could you review it?
>>>>>>>>>
>>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>>>   webrev:
>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>>>
>>>>>>>>> This change has passed tests on submit repo.
>>>>>>>>> Also I tested it with serviceability/jvmti and
>>>>>>>>> vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>>>> Thank you for taking care about it.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Serguei
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>>>>
>>>>>>>>>>> Go for it! :)
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>>> Hi Dan, David and Yasumasa,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D.
Daugherty wrote: >>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I wonder why >>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does >>>>>>>>>>>>>>>>>>>>>>>> not perform at safepoint. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the >>>>>>>>>>>>>>>>>>>>>>> target is not suspended: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, >>>>>>>>>>>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>>>>>>>>>>> ?? JavaThread* calling_thread = >>>>>>>>>>>>>>>>>>>>>>> JavaThread::current(); >>>>>>>>>>>>>>>>>>>>>>> ?? jvmtiError err = >>>>>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, object, >>>>>>>>>>>>>>>>>>>>>>> info_ptr); >>>>>>>>>>>>>>>>>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>>>>>>>>>>> ???? // Some of the critical threads were not >>>>>>>>>>>>>>>>>>>>>>> suspended. go to a safepoint and try again >>>>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, calling_thread, >>>>>>>>>>>>>>>>>>>>>>> object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op); >>>>>>>>>>>>>>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>>>>>>>>>>>>>> ?? } >>>>>>>>>>>>>>>>>>>>>>> ?? 
return err; >>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases >>>>>>>>>>>>>>>>>>>>>> when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not >>>>>>>>>>>>>>>>>>>>>> returned from get_object_monitor_usage(). >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor >>>>>>>>>>>>>>>>>>>>>>>> object at first [1], but it would perform >>>>>>>>>>>>>>>>>>>>>>>> concurrently. >>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner >>>>>>>>>>>>>>>>>>>>>>>> might be changed to others in subsequent code. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor >>>>>>>>>>>>>>>>>>>>>>>> before [2]. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner >>>>>>>>>>>>>>>>>>>>>>> thread it is either suspended or not. If it is >>>>>>>>>>>>>>>>>>>>>>> suspended then it cannot release the monitor. If >>>>>>>>>>>>>>>>>>>>>>> it is not suspended we detect that and redo the >>>>>>>>>>>>>>>>>>>>>>> whole query at a safepoint. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume >>>>>>>>>>>>>>>>>>>>>> unfortunately after suspending check. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also >>>>>>>>>>>>>>>>>>>>> required a safepoint but it only requires the >>>>>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Which code is wrong? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller >>>>>>>>>>>>>>>>>>>> has started the process of gathering the information >>>>>>>>>>>>>>>>>>>> while not at a >>>>>>>>>>>>>>>>>>>> safepoint. 
Thus the information returned by >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects >>>>>>>>>>>>>>>>>>> data about a monitor owned by a thread that is >>>>>>>>>>>>>>>>>>> suspended, or else it collects that data at a >>>>>>>>>>>>>>>>>>> safepoint. But the owning thread can be resumed just >>>>>>>>>>>>>>>>>>> after the code determined it was suspended. The >>>>>>>>>>>>>>>>>>> monitor can then be released and the information >>>>>>>>>>>>>>>>>>> gathered not only stale but potentially completely >>>>>>>>>>>>>>>>>>> wrong as it could now be owned by a different thread >>>>>>>>>>>>>>>>>>> and will report that thread's entry count. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as >>>>>>>>>>>>>>>>>> soon as >>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the >>>>>>>>>>>>>>>>>> information >>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation >>>>>>>>>>>>>>>>>> returns >>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the target >>>>>>>>>>>>>>>>>> thread >>>>>>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. >>>>>>>>>>>>>>>>> But the expectation is that the information was >>>>>>>>>>>>>>>>> actually an accurate snapshot of the state of the >>>>>>>>>>>>>>>>> monitor at some point in time. The current code does >>>>>>>>>>>>>>>>> not ensure that. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Please explain. I clearly don't understand why you think >>>>>>>>>>>>>>>> the info >>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the >>>>>>>>>>>>>>>> monitor >>>>>>>>>>>>>>>> at some point in time". >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no >>>>>>>>>>>>>>> atomicity**. 
The reported owner thread may not own it any >>>>>>>>>>>>>>> longer when the entry count is read, so straight away you >>>>>>>>>>>>>>> may have the wrong entry count information. The set of >>>>>>>>>>>>>>> threads trying to acquire the monitor, or wait on the >>>>>>>>>>>>>>> monitor can change in unexpected ways. It would be >>>>>>>>>>>>>>> possible for instance to report the same thread as being >>>>>>>>>>>>>>> the owner, being blocked trying to enter the monitor, and >>>>>>>>>>>>>>> being in the wait-set of the monitor - apparently all at >>>>>>>>>>>>>>> the same time! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ** even if the owner is suspended we don't have complete >>>>>>>>>>>>>>> atomicity because threads can join the set of threads >>>>>>>>>>>>>>> trying to enter the monitor (unless they are all suspended). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ >>>>>>>>>>>>>> suspended: >>>>>>>>>>>>>> >>>>>>>>>>>>>> ? - GetObjectMonitorUsage() uses a safepoint to gather the >>>>>>>>>>>>>> info about >>>>>>>>>>>>>> ??? the object's monitor. Since we're at a safepoint, the >>>>>>>>>>>>>> info that >>>>>>>>>>>>>> ??? we are gathering cannot change until we return from >>>>>>>>>>>>>> the safepoint. >>>>>>>>>>>>>> ??? It is a snapshot and a valid one at that. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>>>>>>>>> >>>>>>>>>>>>>> ? - GetObjectMonitorUsage() will gather info about the >>>>>>>>>>>>>> object's >>>>>>>>>>>>>> ??? monitor while _not_ at a safepoint. Assuming that no >>>>>>>>>>>>>> other >>>>>>>>>>>>>> ??? thread is suspended, then entry_count can change because >>>>>>>>>>>>>> ??? another thread can block on entry while we are gathering >>>>>>>>>>>>>> ??? info. waiter_count and waiters can change if a thread was >>>>>>>>>>>>>> ??? in a timed wait that has timed out and now that thread is >>>>>>>>>>>>>> ??? blocked on re-entry. 
I don't think that >>>>>>>>>>>>>> notify_waiter_count >>>>>>>>>>>>>> ??? and notify_waiters can change. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ??? So in this case, the owner info and notify info is >>>>>>>>>>>>>> stable, >>>>>>>>>>>>>> ??? but the entry_count and waiter info is not stable. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Consider the case when the monitor is not owned: >>>>>>>>>>>>>> >>>>>>>>>>>>>> ? - GetObjectMonitorUsage() will start to gather info >>>>>>>>>>>>>> about the >>>>>>>>>>>>>> ??? object's monitor while _not_ at a safepoint. If it >>>>>>>>>>>>>> finds a >>>>>>>>>>>>>> ??? thread on the entry queue that is not suspended, then >>>>>>>>>>>>>> it will >>>>>>>>>>>>>> ??? bail out and redo the info gather at a safepoint. I just >>>>>>>>>>>>>> ??? noticed that it doesn't check for suspension for the >>>>>>>>>>>>>> threads >>>>>>>>>>>>>> ??? on the waiters list so a timed Object.wait() call can >>>>>>>>>>>>>> cause >>>>>>>>>>>>>> ??? some confusion here. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ??? So in this case, the owner info is not stable if a thread >>>>>>>>>>>>>> ??? comes out of a timed wait and reenters the monitor. This >>>>>>>>>>>>>> ??? case is no different than if a "barger" thread comes in >>>>>>>>>>>>>> ??? after the NULL owner field is observed and enters the >>>>>>>>>>>>>> ??? monitor. We'll return that there is no owner, a list of >>>>>>>>>>>>>> ??? suspended pending entry thread and a list of waiting >>>>>>>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>>>>>>> ??? owned by the "barger" that completely bypassed the entry >>>>>>>>>>>>>> ??? queue by virtue of seeing the NULL owner field at exactly >>>>>>>>>>>>>> ??? the right time. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So the owner field is only stable when we have an owner. If >>>>>>>>>>>>>> that owner is not suspended, then the other fields are also >>>>>>>>>>>>>> stable because we gathered the info at a safepoint. 
If the >>>>>>>>>>>>>> owner is suspended, then the owner and notify info is stable, >>>>>>>>>>>>>> but the entry_count and waiter info is not stable. >>>>>>>>>>>>>> >>>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>>>>>>>>>> do that at a safepoint when the owner thread is not suspended >>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>>>>>>> the different pieces of info is unstable to varying degrees. >>>>>>>>>>>>>> >>>>>>>>>>>>>> As for this claim: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> It would be possible for instance to report the same thread >>>>>>>>>>>>>>> as being the owner, being blocked trying to enter the >>>>>>>>>>>>>>> monitor, >>>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>>>>>>>>>> the same time! >>>>>>>>>>>>>> >>>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue since >>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>>>>>> wait queue. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.) 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>>>>>> used. Handshaking with all threads won't make the data as >>>>>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>>>>>> will still be field instability. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>>>>> >>>>>>>>>>>>> I agree with this. >>>>>>>>>>>>> The advantages are: >>>>>>>>>>>>> ??- the result is stable >>>>>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>>>>> >>>>>>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>>>>>> I'm not sure if using handshakes can make performance better. >>>>>>>>>>>> >>>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>>> >>>>>>>>>>>> Yasumasa >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Serguei >>>>>>>>>>>>> >>>>>>>>>>>>>> Dan >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale >>>>>>>>>>>>>>>>>> information is >>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps >>>>>>>>>>>>>>>>>> the doc >>>>>>>>>>>>>>>>>> should have more clear about the possibility of >>>>>>>>>>>>>>>>>> returning stale >>>>>>>>>>>>>>>>>> info. That's a question for Robert F. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's >>>>>>>>>>>>>>>>>>> being suspended so I can't see how this could be >>>>>>>>>>>>>>>>>>> construed as an agent bug. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> In your scenario above, you mention that the target >>>>>>>>>>>>>>>>>> thread was >>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while >>>>>>>>>>>>>>>>>> the target >>>>>>>>>>>>>>>>>> was suspended, and then the target thread was resumed >>>>>>>>>>>>>>>>>> after >>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but >>>>>>>>>>>>>>>>>> before >>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), >>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent >>>>>>>>>>>>>>>>>> should not >>>>>>>>>>>>>>>>>> resume the target thread while also calling >>>>>>>>>>>>>>>>>> GetObjectMonitorUsage(). >>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so >>>>>>>>>>>>>>>>>> agent bug. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an >>>>>>>>>>>>>>>>> independent resume, but you're right that doesn't >>>>>>>>>>>>>>>>> really make a lot of sense. But when the spec says >>>>>>>>>>>>>>>>> nothing about suspension ... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> And it is intentional that suspension is not required. >>>>>>>>>>>>>>>> JVM/DI and JVM/PI >>>>>>>>>>>>>>>> used to require suspension for these kinds of >>>>>>>>>>>>>>>> get-the-info APIs. JVM/TI >>>>>>>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> As I've said before, we could add a note about the data >>>>>>>>>>>>>>>> being potentially >>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like >>>>>>>>>>>>>>>> stat(2). 
You can >>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the >>>>>>>>>>>>>>>> info is current >>>>>>>>>>>>>>>> by the time you process what you got back. Is it too >>>>>>>>>>>>>>>> much motherhood to >>>>>>>>>>>>>>>> state that the data might be stale? I could go either >>>>>>>>>>>>>>>> way... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this >>>>>>>>>>>>>>>>>>> to be fixed in the future without forcing/using any >>>>>>>>>>>>>>>>>>> safepoints. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding >>>>>>>>>>>>>>>>>> talking about >>>>>>>>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread >>>>>>>>>>>>>>>>> whilst the monitor is queried. In effect the operation >>>>>>>>>>>>>>>>> would create a per-thread safepoint. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I "know" that, but I still need time to think about it >>>>>>>>>>>>>>>> and probably >>>>>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Semantically it is no different to the code actually >>>>>>>>>>>>>>>>> suspending the owner thread, but it can't actually do >>>>>>>>>>>>>>>>> that because suspends/resume don't nest. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we >>>>>>>>>>>>>>>> tracked internal and >>>>>>>>>>>>>>>> external suspends separately. That was a nightmare... 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to >>>>>>>>>>>>>>>>>>>>>> check thread state, it returns `true` when the >>>>>>>>>>>>>>>>>>>>>> thread is sleeping [3], or when it performs in >>>>>>>>>>>>>>>>>>>>>> native [4]. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it >>>>>>>>>>>>>>>>>>>>> can't continue execution in the VM or in Java code. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the >>>>>>>>>>>>>>>>>>>>>>> assumed common case where threads are first >>>>>>>>>>>>>>>>>>>>>>> suspended and then the monitors are queried. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from >>>>>>>>>>>>>>>>>>>>>> JVMTI spec - it just says "Get information about >>>>>>>>>>>>>>>>>>>>>> the object's monitor." >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, >>>>>>>>>>>>>>>>>>>>> nothing to do with the spec. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect >>>>>>>>>>>>>>>>>>>>>> information in some case. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner >>>>>>>>>>>>>>>>>>>>>> might be just before wakeup. >>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() is called at safepoint in >>>>>>>>>>>>>>>>>>>>>> any case. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using >>>>>>>>>>>>>>>>>>>>> Handshakes, so this particular operation will >>>>>>>>>>>>>>>>>>>>> require that the apparent owner is Handshake-safe >>>>>>>>>>>>>>>>>>>>> (by entering a handshake with it) before querying >>>>>>>>>>>>>>>>>>>>> the monitor. This would still be preferable I think >>>>>>>>>>>>>>>>>>>>> to always using a safepoint for the entire operation. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> [4] >>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the >>>>>>>>>>>>>>>>>>>>>>> thread reported as the owner may not be suspended >>>>>>>>>>>>>>>>>>>>>>> at the time we first see it, and may release the >>>>>>>>>>>>>>>>>>>>>>> monitor, but then it may get suspended before we >>>>>>>>>>>>>>>>>>>>>>> call: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> ??owning_thread = >>>>>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), >>>>>>>>>>>>>>>>>>>>>>> owner); >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and >>>>>>>>>>>>>>>>>>>>>>> proceed to query the monitor information in a >>>>>>>>>>>>>>>>>>>>>>> racy way. This can't happen when suspension >>>>>>>>>>>>>>>>>>>>>>> itself requires a safepoint as the current thread >>>>>>>>>>>>>>>>>>>>>>> won't go to that safepoint during this code. 
>>>>>>>>>>>>>>>>>>>>>>> However, if suspension is implemented via a >>>>>>>>>>>>>>>>>>>>>>> direct handshake with the target thread then we >>>>>>>>>>>>>>>>>>>>>>> have a problem. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>> > From daniel.daugherty at oracle.com Thu Jun 18 14:06:38 2020 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty)
Date: Thu, 18 Jun 2020 10:06:38 -0400
Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information
In-Reply-To: <8192d5c0-c30b-c410-b14c-47698f43748b@oracle.com>
References: <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com>
 <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com>
 <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com>
 <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com>
 <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com>
 <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com>
 <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com>
 <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com>
 <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com>
 <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com>
 <98d1761f-df82-8990-a784-25c953472793@oracle.com>
 <9aecb945-109b-334e-8e11-f2c8224048e2@oracle.com>
 <8192d5c0-c30b-c410-b14c-47698f43748b@oracle.com>
Message-ID: <4802dc5c-394f-0557-7522-f73df7ace327@oracle.com>

On 6/18/20 10:04 AM, David Holmes wrote:
> On 18/06/2020 11:55 pm, Daniel D. Daugherty wrote:
>> On 6/18/20 9:18 AM, David Holmes wrote:
>>> On 18/06/2020 7:07 pm, Yasumasa Suenaga wrote:
>>>> On 2020/06/18 17:36, David Holmes wrote:
>>>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> Both ThreadsListHandle and ResourceMarks would use
>>>>>> `Thread::current()` for their resource. It is set as default
>>>>>> parameter in c'tor.
>>>>>> Do you mean we should it explicitly in c'tor?
>>>>>
>>>>> Yes pass current_thread so we don't do the additional unnecessary
>>>>> calls to Thread::current().
>>>>
>>>> Ok, I've fixed them. Could you review again?
>>>>
>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/
>>>
>>> Updates look good. One nit I missed before:
>>>
>>> src/hotspot/share/prims/jvmtiEnv.cpp
>>>
>>> // It need to perform at safepoint for gathering stable data
>>>
>>> please change to:
>>>
>>> // This need to be performed at a safepoint to gather stable data
>>
>> Just a comment on this comment... I still haven't gotten to the
>> webrev yet...
>>
>> Perhaps:
>>
>>      // This needs to be performed at a safepoint to gather stable data.
>
> There is a second line that continues the sentence
>
> // because monitor owner / waiters might not be suspended.

So no period at the end... but the s/need/needs/ still works. :-)

Dan

>
> David
> -----
>
>> Dan
>>>
>>> Thanks,
>>> David
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>>> David
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2020/06/18 13:58, David Holmes wrote:
>>>>>>> Hi Yasumasa,
>>>>>>>
>>>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>>>>> Hi Serguei,
>>>>>>>>
>>>>>>>> Thanks for your comment!
>>>>>>>> I uploaded new webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>>>>
>>>>>>>> I'm not sure the following change is correct.
>>>>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>>>>
>>>>>>> We can if "owner != NULL". So that change seem fine to me.
>>>>>>>
>>>>>>> But given this is now only executed at a safepoint there are
>>>>>>> additional simplifications that can be made:
>>>>>>>
>>>>>>> - current thread determination can be simplified:
>>>>>>>
>>>>>>> 945   Thread* current_thread = Thread::current();
>>>>>>>
>>>>>>> becomes:
>>>>>>>
>>>>>>>     Thread* current_thread = VMThread::vm_thread();
>>>>>>>     assert(current_thread == Thread::current(), "must be");
>>>>>>>
>>>>>>> - these comments can be removed
>>>>>>>
>>>>>>>   994       // Use current thread since function can be called from a
>>>>>>>   995       // JavaThread or the VMThread.
>>>>>>> 1053       // Use current thread since function can be called from a
>>>>>>> 1054       // JavaThread or the VMThread.
>>>>>>>
>>>>>>> - these TLH constructions should be passing current_thread
>>>>>>> (existing bug)
>>>>>>>
>>>>>>> 996       ThreadsListHandle tlh;
>>>>>>> 1055       ThreadsListHandle tlh;
>>>>>>>
>>>>>>> - All ResourceMarks should be passing current_thread (existing bug)
>>>>>>>
>>>>>>>
>>>>>>> Aside: there is a major inconsistency between the spec and
>>>>>>> implementation for this method. I've traced the history to see
>>>>>>> how this came about from JVMDI (ref JDK-4546581) but it never
>>>>>>> resulted in the JVM TI specification clearly stating what the
>>>>>>> waiters/waiter_count means. I will file a bug to have the spec
>>>>>>> clarified to match the implementation (even though I think the
>>>>>>> implementation is what is wrong). :(
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> All tests on submit repo and serviceability/jvmti and
>>>>>>>> vmTestbase/nsk/jvmti have been passed with this change.
>>>>>>>>
>>>>>>>>
>>>>>>>> ```
>>>>>>>>         // This monitor is owned so we have to find the owning JavaThread.
>>>>>>>>         owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>>>> -      // Cannot assume (owning_thread != NULL) here because this function
>>>>>>>> -      // may not have been called at a safepoint and the owning_thread
>>>>>>>> -      // might not be suspended.
>>>>>>>> -      if (owning_thread != NULL) {
>>>>>>>> -        // The monitor's owner either has to be the current thread, at safepoint
>>>>>>>> -        // or it has to be suspended. Any of these conditions will prevent both
>>>>>>>> -        // contending and waiting threads from modifying the state of
>>>>>>>> -        // the monitor.
>>>>>>>> -        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>>>>> -          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>>>>> -          // will not make it back to the JVM/TI agent. The error code will
>>>>>>>> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
>>>>>>>> -          // will retry the call via a VM_GetObjectMonitorUsage VM op.
>>>>>>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>>>>> -        }
>>>>>>>> -        HandleMark hm;
>>>>>>>> +      assert(owning_thread != NULL, "owning JavaThread must not be NULL");
>>>>>>>>           Handle     th(current_thread, owning_thread->threadObj());
>>>>>>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>>>>
>>>>>>>> ```
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>>>>> Hi Yasumasa,
>>>>>>>>>
>>>>>>>>> This fix is not enough.
>>>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in
>>>>>>>>> two modes: in VMop and non-VMop.
>>>>>>>>> The non-VMop mode has to be removed.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>>>>> (Change subject for RFR)
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I filed it to JBS and upload a webrev for it.
>>>>>>>>>> Could you review it?
>>>>>>>>>>
>>>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>>>>
>>>>>>>>>> This change has passed tests on submit repo.
>>>>>>>>>> Also I tested it with serviceability/jvmti and
>>>>>>>>>> vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>>>>> Thank you for taking care about it.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Serguei
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>>>>> >>>>>>>>>>>> Go for it! :) >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> David >>>>>>>>>>>> >>>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>> Hi Dan, David and Yasumasa, >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>> Hi Dan, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I wonder why >>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) >>>>>>>>>>>>>>>>>>>>>>>>> does not perform at safepoint. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if >>>>>>>>>>>>>>>>>>>>>>>> the target is not suspended: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, >>>>>>>>>>>>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>>>>>>>>>>>> ?? JavaThread* calling_thread = >>>>>>>>>>>>>>>>>>>>>>>> JavaThread::current(); >>>>>>>>>>>>>>>>>>>>>>>> ?? 
jvmtiError err = >>>>>>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, >>>>>>>>>>>>>>>>>>>>>>>> object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>>>>>>>>>>>> ???? // Some of the critical threads were not >>>>>>>>>>>>>>>>>>>>>>>> suspended. go to a safepoint and try again >>>>>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, >>>>>>>>>>>>>>>>>>>>>>>> calling_thread, object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op); >>>>>>>>>>>>>>>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>>>>>>>>>>>>>>> ?? } >>>>>>>>>>>>>>>>>>>>>>>> ?? return err; >>>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases >>>>>>>>>>>>>>>>>>>>>>> when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not >>>>>>>>>>>>>>>>>>>>>>> returned from get_object_monitor_usage(). >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor >>>>>>>>>>>>>>>>>>>>>>>>> object at first [1], but it would perform >>>>>>>>>>>>>>>>>>>>>>>>> concurrently. >>>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner >>>>>>>>>>>>>>>>>>>>>>>>> might be changed to others in subsequent code. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the >>>>>>>>>>>>>>>>>>>>>>>>> monitor before [2]. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner >>>>>>>>>>>>>>>>>>>>>>>> thread it is either suspended or not. If it is >>>>>>>>>>>>>>>>>>>>>>>> suspended then it cannot release the monitor. >>>>>>>>>>>>>>>>>>>>>>>> If it is not suspended we detect that and redo >>>>>>>>>>>>>>>>>>>>>>>> the whole query at a safepoint. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume >>>>>>>>>>>>>>>>>>>>>>> unfortunately after suspending check. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Yes you are right. 
I was thinking resuming also >>>>>>>>>>>>>>>>>>>>>> required a safepoint but it only requires the >>>>>>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Which code is wrong? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller >>>>>>>>>>>>>>>>>>>>> has started the process of gathering the >>>>>>>>>>>>>>>>>>>>> information while not at a >>>>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects >>>>>>>>>>>>>>>>>>>> data about a monitor owned by a thread that is >>>>>>>>>>>>>>>>>>>> suspended, or else it collects that data at a >>>>>>>>>>>>>>>>>>>> safepoint. But the owning thread can be resumed >>>>>>>>>>>>>>>>>>>> just after the code determined it was suspended. >>>>>>>>>>>>>>>>>>>> The monitor can then be released and the >>>>>>>>>>>>>>>>>>>> information gathered not only stale but potentially >>>>>>>>>>>>>>>>>>>> completely wrong as it could now be owned by a >>>>>>>>>>>>>>>>>>>> different thread and will report that thread's >>>>>>>>>>>>>>>>>>>> entry count. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as >>>>>>>>>>>>>>>>>>> soon as >>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the >>>>>>>>>>>>>>>>>>> information >>>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation >>>>>>>>>>>>>>>>>>> returns >>>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the >>>>>>>>>>>>>>>>>>> target thread >>>>>>>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. 
>>>>>>>>>>>>>>>>>> But the expectation is that the information was >>>>>>>>>>>>>>>>>> actually an accurate snapshot of the state of the >>>>>>>>>>>>>>>>>> monitor at some point in time. The current code does >>>>>>>>>>>>>>>>>> not ensure that. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Please explain. I clearly don't understand why you >>>>>>>>>>>>>>>>> think the info >>>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of >>>>>>>>>>>>>>>>> the monitor >>>>>>>>>>>>>>>>> at some point in time". >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no >>>>>>>>>>>>>>>> atomicity**. The reported owner thread may not own it >>>>>>>>>>>>>>>> any longer when the entry count is read, so straight >>>>>>>>>>>>>>>> away you may have the wrong entry count information. >>>>>>>>>>>>>>>> The set of threads trying to acquire the monitor, or >>>>>>>>>>>>>>>> wait on the monitor can change in unexpected ways. It >>>>>>>>>>>>>>>> would be possible for instance to report the same >>>>>>>>>>>>>>>> thread as being the owner, being blocked trying to >>>>>>>>>>>>>>>> enter the monitor, and being in the wait-set of the >>>>>>>>>>>>>>>> monitor - apparently all at the same time! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ** even if the owner is suspended we don't have >>>>>>>>>>>>>>>> complete atomicity because threads can join the set of >>>>>>>>>>>>>>>> threads trying to enter the monitor (unless they are >>>>>>>>>>>>>>>> all suspended). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ >>>>>>>>>>>>>>> suspended: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() uses a safepoint to gather >>>>>>>>>>>>>>> the info about >>>>>>>>>>>>>>> ??? the object's monitor. Since we're at a safepoint, >>>>>>>>>>>>>>> the info that >>>>>>>>>>>>>>> ??? we are gathering cannot change until we return from >>>>>>>>>>>>>>> the safepoint. >>>>>>>>>>>>>>> ??? It is a snapshot and a valid one at that. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() will gather info about the >>>>>>>>>>>>>>> object's >>>>>>>>>>>>>>> ??? monitor while _not_ at a safepoint. Assuming that no >>>>>>>>>>>>>>> other >>>>>>>>>>>>>>> ??? thread is suspended, then entry_count can change >>>>>>>>>>>>>>> because >>>>>>>>>>>>>>> ??? another thread can block on entry while we are >>>>>>>>>>>>>>> gathering >>>>>>>>>>>>>>> ??? info. waiter_count and waiters can change if a >>>>>>>>>>>>>>> thread was >>>>>>>>>>>>>>> ??? in a timed wait that has timed out and now that >>>>>>>>>>>>>>> thread is >>>>>>>>>>>>>>> ??? blocked on re-entry. I don't think that >>>>>>>>>>>>>>> notify_waiter_count >>>>>>>>>>>>>>> ??? and notify_waiters can change. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ??? So in this case, the owner info and notify info is >>>>>>>>>>>>>>> stable, >>>>>>>>>>>>>>> ??? but the entry_count and waiter info is not stable. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Consider the case when the monitor is not owned: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() will start to gather info >>>>>>>>>>>>>>> about the >>>>>>>>>>>>>>> ??? object's monitor while _not_ at a safepoint. If it >>>>>>>>>>>>>>> finds a >>>>>>>>>>>>>>> ??? thread on the entry queue that is not suspended, >>>>>>>>>>>>>>> then it will >>>>>>>>>>>>>>> ??? bail out and redo the info gather at a safepoint. I >>>>>>>>>>>>>>> just >>>>>>>>>>>>>>> ??? noticed that it doesn't check for suspension for the >>>>>>>>>>>>>>> threads >>>>>>>>>>>>>>> ??? on the waiters list so a timed Object.wait() call >>>>>>>>>>>>>>> can cause >>>>>>>>>>>>>>> ??? some confusion here. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ??? So in this case, the owner info is not stable if a >>>>>>>>>>>>>>> thread >>>>>>>>>>>>>>> ??? comes out of a timed wait and reenters the monitor. >>>>>>>>>>>>>>> This >>>>>>>>>>>>>>> ??? 
case is no different than if a "barger" thread comes in >>>>>>>>>>>>>>> ??? after the NULL owner field is observed and enters the >>>>>>>>>>>>>>> ??? monitor. We'll return that there is no owner, a list of >>>>>>>>>>>>>>> ??? suspended pending entry thread and a list of waiting >>>>>>>>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>>>>>>>> ??? owned by the "barger" that completely bypassed the >>>>>>>>>>>>>>> entry >>>>>>>>>>>>>>> ??? queue by virtue of seeing the NULL owner field at >>>>>>>>>>>>>>> exactly >>>>>>>>>>>>>>> ??? the right time. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So the owner field is only stable when we have an owner. If >>>>>>>>>>>>>>> that owner is not suspended, then the other fields are also >>>>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the >>>>>>>>>>>>>>> owner is suspended, then the owner and notify info is >>>>>>>>>>>>>>> stable, >>>>>>>>>>>>>>> but the entry_count and waiter info is not stable. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>>>>>>>>>>> do that at a safepoint when the owner thread is not >>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>>>>>>>> the different pieces of info is unstable to varying >>>>>>>>>>>>>>> degrees. 
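A minimal sketch of the point being argued above, under stated assumptions: the fields are read one after another, and the result is a true snapshot only if nothing can run between the reads (which is what a safepoint provides). The names ToyMonitor, ToyUsage, handoff() and read_fields() are hypothetical and exist only for this illustration; none of them are HotSpot or JVM TI code, and the interleaving is simulated deterministically rather than with real threads.

```cpp
// Toy model only: ToyMonitor, ToyUsage, handoff() and read_fields() are
// hypothetical names for illustration, not HotSpot or JVM TI types.
#include <cassert>

// In this toy, a consistent state always has entry_count equal to the owner id.
struct ToyMonitor { int owner; int entry_count; };
struct ToyUsage   { int owner; int entry_count; };

// The monitor changes hands (thread 1 <-> thread 2) and the entry count
// follows the new owner.
inline void handoff(ToyMonitor& m) {
  m.owner = (m.owner == 1) ? 2 : 1;
  m.entry_count = m.owner;
}

// Reads the two fields one after the other. `between` models whatever other
// threads are able to do in the gap between the two reads.
template <typename Between>
ToyUsage read_fields(ToyMonitor& m, Between between) {
  ToyUsage u{};
  u.owner = m.owner;              // first read: who owns the monitor
  between();                      // a resumed owner could release it here
  u.entry_count = m.entry_count;  // second read: may belong to a new owner
  return u;
}
```

Calling read_fields(m, [&] { handoff(m); }) on a monitor in state {1, 1} models the owner being resumed after the suspension check: the result pairs owner 1 with thread 2's entry count, a state the monitor was never in. Passing an empty callback models gathering at a safepoint, where the two reads cannot be separated by a mutation.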
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As for this claim: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It would be possible for instance to report the same >>>>>>>>>>>>>>>> thread >>>>>>>>>>>>>>>> as being the owner, being blocked trying to enter the >>>>>>>>>>>>>>>> monitor, >>>>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently >>>>>>>>>>>>>>>> all at >>>>>>>>>>>>>>>> the same time! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it >>>>>>>>>>>>>>> can't >>>>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue >>>>>>>>>>>>>>> since >>>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is >>>>>>>>>>>>>>> not >>>>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>>>>>>> wait queue. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>>>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>>>>>>> used. Handshaking with all threads won't make the data as >>>>>>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>>>>>>> will still be field instability. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I agree with this. 
>>>>>>>>>>>>>> The advantages are: >>>>>>>>>>>>>> ??- the result is stable >>>>>>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>>>>>> >>>>>>>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>>>>>>> I'm not sure if using handshakes can make performance >>>>>>>>>>>>>> better. >>>>>>>>>>>>> >>>>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>>>> >>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale >>>>>>>>>>>>>>>>>>> information is >>>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. >>>>>>>>>>>>>>>>>>> Perhaps the doc >>>>>>>>>>>>>>>>>>> should have more clear about the possibility of >>>>>>>>>>>>>>>>>>> returning stale >>>>>>>>>>>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's >>>>>>>>>>>>>>>>>>>> being suspended so I can't see how this could be >>>>>>>>>>>>>>>>>>>> construed as an agent bug. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> In your scenario above, you mention that the target >>>>>>>>>>>>>>>>>>> thread was >>>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while >>>>>>>>>>>>>>>>>>> the target >>>>>>>>>>>>>>>>>>> was suspended, and then the target thread was >>>>>>>>>>>>>>>>>>> resumed after >>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but >>>>>>>>>>>>>>>>>>> before >>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), >>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the >>>>>>>>>>>>>>>>>>> agent should not >>>>>>>>>>>>>>>>>>> resume the target thread while also calling >>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage(). >>>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so >>>>>>>>>>>>>>>>>>> agent bug. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an >>>>>>>>>>>>>>>>>> independent resume, but you're right that doesn't >>>>>>>>>>>>>>>>>> really make a lot of sense. But when the spec says >>>>>>>>>>>>>>>>>> nothing about suspension ... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> And it is intentional that suspension is not required. >>>>>>>>>>>>>>>>> JVM/DI and JVM/PI >>>>>>>>>>>>>>>>> used to require suspension for these kinds of >>>>>>>>>>>>>>>>> get-the-info APIs. JVM/TI >>>>>>>>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> As I've said before, we could add a note about the >>>>>>>>>>>>>>>>> data being potentially >>>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like >>>>>>>>>>>>>>>>> stat(2). You can >>>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that >>>>>>>>>>>>>>>>> the info is current >>>>>>>>>>>>>>>>> by the time you process what you got back. Is it too >>>>>>>>>>>>>>>>> much motherhood to >>>>>>>>>>>>>>>>> state that the data might be stale? I could go either >>>>>>>>>>>>>>>>> way... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow >>>>>>>>>>>>>>>>>>>> this to be fixed in the future without >>>>>>>>>>>>>>>>>>>> forcing/using any safepoints. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding >>>>>>>>>>>>>>>>>>> talking about >>>>>>>>>>>>>>>>>>> handshakes in this thread. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread >>>>>>>>>>>>>>>>>> whilst the monitor is queried. In effect the >>>>>>>>>>>>>>>>>> operation would create a per-thread safepoint. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I "know" that, but I still need time to think about it >>>>>>>>>>>>>>>>> and probably >>>>>>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Semantically it is no different to the code actually >>>>>>>>>>>>>>>>>> suspending the owner thread, but it can't actually do >>>>>>>>>>>>>>>>>> that because suspends/resume don't nest. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we >>>>>>>>>>>>>>>>> tracked internal and >>>>>>>>>>>>>>>>> external suspends separately. That was a nightmare... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used >>>>>>>>>>>>>>>>>>>>>>> to check thread state, it returns `true` when >>>>>>>>>>>>>>>>>>>>>>> the thread is sleeping [3], or when it performs >>>>>>>>>>>>>>>>>>>>>>> in native [4]. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it >>>>>>>>>>>>>>>>>>>>>> can't continue execution in the VM or in Java code. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the >>>>>>>>>>>>>>>>>>>>>>>> assumed common case where threads are first >>>>>>>>>>>>>>>>>>>>>>>> suspended and then the monitors are queried. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from >>>>>>>>>>>>>>>>>>>>>>> JVMTI spec - it just says "Get information about >>>>>>>>>>>>>>>>>>>>>>> the object's monitor." >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, >>>>>>>>>>>>>>>>>>>>>> nothing to do with the spec. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect >>>>>>>>>>>>>>>>>>>>>>> information in some case. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the >>>>>>>>>>>>>>>>>>>>>>> owner might be just before wakeup. >>>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if >>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() is called at safepoint >>>>>>>>>>>>>>>>>>>>>>> in any case. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using >>>>>>>>>>>>>>>>>>>>>> Handshakes, so this particular operation will >>>>>>>>>>>>>>>>>>>>>> require that the apparent owner is Handshake-safe >>>>>>>>>>>>>>>>>>>>>> (by entering a handshake with it) before querying >>>>>>>>>>>>>>>>>>>>>> the monitor. This would still be preferable I >>>>>>>>>>>>>>>>>>>>>> think to always using a safepoint for the entire >>>>>>>>>>>>>>>>>>>>>> operation. 
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> [4] >>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the >>>>>>>>>>>>>>>>>>>>>>>> thread reported as the owner may not be >>>>>>>>>>>>>>>>>>>>>>>> suspended at the time we first see it, and may >>>>>>>>>>>>>>>>>>>>>>>> release the monitor, but then it may get >>>>>>>>>>>>>>>>>>>>>>>> suspended before we call: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> ??owning_thread = >>>>>>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), >>>>>>>>>>>>>>>>>>>>>>>> owner); >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner >>>>>>>>>>>>>>>>>>>>>>>> and proceed to query the monitor information in >>>>>>>>>>>>>>>>>>>>>>>> a racy way. This can't happen when suspension >>>>>>>>>>>>>>>>>>>>>>>> itself requires a safepoint as the current >>>>>>>>>>>>>>>>>>>>>>>> thread won't go to that safepoint during this >>>>>>>>>>>>>>>>>>>>>>>> code. However, if suspension is implemented via >>>>>>>>>>>>>>>>>>>>>>>> a direct handshake with the target thread then >>>>>>>>>>>>>>>>>>>>>>>> we have a problem. 
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >> From david.holmes at oracle.com Thu Jun 18 14:09:36 2020 From: david.holmes at oracle.com (David Holmes) Date: Fri, 19 Jun 2020 00:09:36 +1000 Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information In-Reply-To: <4802dc5c-394f-0557-7522-f73df7ace327@oracle.com> References: <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com> <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com> <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com> <98d1761f-df82-8990-a784-25c953472793@oracle.com> <9aecb945-109b-334e-8e11-f2c8224048e2@oracle.com> <8192d5c0-c30b-c410-b14c-47698f43748b@oracle.com> <4802dc5c-394f-0557-7522-f73df7ace327@oracle.com> Message-ID: <10ce9876-4848-6dc8-0500-abeebeebc769@oracle.com> On 19/06/2020 12:06 am, Daniel D. Daugherty wrote: > On 6/18/20 10:04 AM, David Holmes wrote: >> On 18/06/2020 11:55 pm, Daniel D. 
Daugherty wrote: >>> On 6/18/20 9:18 AM, David Holmes wrote: >>>> On 18/06/2020 7:07 pm, Yasumasa Suenaga wrote: >>>>> On 2020/06/18 17:36, David Holmes wrote: >>>>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote: >>>>>>> Hi David, >>>>>>> >>>>>>> Both ThreadsListHandle and ResourceMarks would use >>>>>>> `Thread::current()` for their resource. It is set as default >>>>>>> parameter in c'tor. >>>>>>> Do you mean we should it explicitly in c'tor? >>>>>> >>>>>> Yes pass current_thread so we don't do the additional unnecessary >>>>>> calls to Thread::current(). >>>>> >>>>> Ok, I've fixed them. Could you review again? >>>>> >>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/ >>>> >>>> Updates look good. One nit I missed before: >>>> >>>> src/hotspot/share/prims/jvmtiEnv.cpp >>>> >>>> // It need to perform at safepoint for gathering stable data >>>> >>>> please change to: >>>> >>>> // This need to be performed at a safepoint to gather stable data >>> >>> Just a comment on this comment... I still haven't gotten to the >>> webrev yet... >>> >>> Perhaps: >>> >>> // This needs to be performed at a safepoint to gather stable data. >> >> There is a second line that continues the sentence >> >> // because monitor owner / waiters might not be suspended. > > So no period at the end... but the s/need/needs/ still works. :-) Yes missed that. Sorry blurry eyes, after midnight. Good night ;-) David > Dan > >> >> David >> ----- >> >>> Dan >>>> >>>> Thanks, >>>> David >>>> >>>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>>> David >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Yasumasa >>>>>>> >>>>>>> >>>>>>> On 2020/06/18 13:58, David Holmes wrote: >>>>>>>> Hi Yasumasa, >>>>>>>> >>>>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote: >>>>>>>>> Hi Serguei, >>>>>>>>> >>>>>>>>> Thanks for your comment!
>>>>>>>>> I uploaded new webrev: >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/ >>>>>>>>> >>>>>>>>> I'm not sure the following change is correct. >>>>>>>>> Can we assume owning_thread is not NULL at safepoint? >>>>>>>> >>>>>>>> We can if "owner != NULL". So that change seem fine to me. >>>>>>>> >>>>>>>> But given this is now only executed at a safepoint there are >>>>>>>> additional simplifications that can be made: >>>>>>>> >>>>>>>> - current thread determination can be simplified: >>>>>>>> >>>>>>>> 945?? Thread* current_thread = Thread::current(); >>>>>>>> >>>>>>>> becomes: >>>>>>>> >>>>>>>> ??? Thread* current_thread = VMThread::vm_thread(); >>>>>>>> ??? assert(current_thread == Thread::current(), "must be"); >>>>>>>> >>>>>>>> - these comments can be removed >>>>>>>> >>>>>>>> ??994?????? // Use current thread since function can be called >>>>>>>> from a >>>>>>>> ??995?????? // JavaThread or the VMThread. >>>>>>>> 1053?????? // Use current thread since function can be called >>>>>>>> from a >>>>>>>> 1054?????? // JavaThread or the VMThread. >>>>>>>> >>>>>>>> - these TLH constructions should be passing current_thread >>>>>>>> (existing bug) >>>>>>>> >>>>>>>> 996?????? ThreadsListHandle tlh; >>>>>>>> 1055?????? ThreadsListHandle tlh; >>>>>>>> >>>>>>>> - All ResourceMarks should be passing current_thread (existing bug) >>>>>>>> >>>>>>>> >>>>>>>> Aside: there is a major inconsistency between the spec and >>>>>>>> implementation for this method. I've traced the history to see >>>>>>>> how this came about from JVMDI (ref JDK-4546581) but it never >>>>>>>> resulted in the JVM TI specification clearly stating what the >>>>>>>> waiters/waiter_count means. I will file a bug to have the spec >>>>>>>> clarified to match the implementation (even though I think the >>>>>>>> implementation is what is wrong). 
:( >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>>> All tests on submit repo and serviceability/jvmti and >>>>>>>>> vmTestbase/nsk/jvmti have been passed with this change. >>>>>>>>> >>>>>>>>> >>>>>>>>> ``` >>>>>>>>> ??????? // This monitor is owned so we have to find the owning >>>>>>>>> JavaThread. >>>>>>>>> ??????? owning_thread = >>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>> -????? // Cannot assume (owning_thread != NULL) here because >>>>>>>>> this function >>>>>>>>> -????? // may not have been called at a safepoint and the >>>>>>>>> owning_thread >>>>>>>>> -????? // might not be suspended. >>>>>>>>> -????? if (owning_thread != NULL) { >>>>>>>>> -??????? // The monitor's owner either has to be the current >>>>>>>>> thread, at safepoint >>>>>>>>> -??????? // or it has to be suspended. Any of these conditions >>>>>>>>> will prevent both >>>>>>>>> -??????? // contending and waiting threads from modifying the >>>>>>>>> state of >>>>>>>>> -??????? // the monitor. >>>>>>>>> -??????? if (!at_safepoint && >>>>>>>>> !owning_thread->is_thread_fully_suspended(true, &debug_bits)) { >>>>>>>>> -????????? // Don't worry! This return of >>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED >>>>>>>>> -????????? // will not make it back to the JVM/TI agent. The >>>>>>>>> error code will >>>>>>>>> -????????? // get intercepted in >>>>>>>>> JvmtiEnv::GetObjectMonitorUsage() which >>>>>>>>> -????????? // will retry the call via a >>>>>>>>> VM_GetObjectMonitorUsage VM op. >>>>>>>>> -????????? return JVMTI_ERROR_THREAD_NOT_SUSPENDED; >>>>>>>>> -??????? } >>>>>>>>> -??????? HandleMark hm; >>>>>>>>> +????? assert(owning_thread != NULL, "owning JavaThread must >>>>>>>>> not be NULL"); >>>>>>>>> ????????? Handle???? th(current_thread, >>>>>>>>> owning_thread->threadObj()); >>>>>>>>> ????????? 
ret.owner = (jthread)jni_reference(calling_thread, th); >>>>>>>>> >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Yasumasa >>>>>>>>> >>>>>>>>> >>>>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote: >>>>>>>>>> Hi Yasumasa, >>>>>>>>>> >>>>>>>>>> This fix is not enough. >>>>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in >>>>>>>>>> two modes: in VMop and non-VMop. >>>>>>>>>> The non-VMop mode has to be removed. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Serguei >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote: >>>>>>>>>>> (Change subject for RFR) >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I filed it to JBS and upload a webrev for it. >>>>>>>>>>> Could you review it? >>>>>>>>>>> >>>>>>>>>>> ? JBS: https://bugs.openjdk.java.net/browse/JDK-8247729 >>>>>>>>>>> ? webrev: >>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/ >>>>>>>>>>> >>>>>>>>>>> This change has passed tests on submit repo. >>>>>>>>>>> Also I tested it with serviceability/jvmti and >>>>>>>>>>> vmTestbase/nsk/jvmti on Linux x64. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Yasumasa >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>> Yes. It seems we have a consensus. >>>>>>>>>>>> Thank you for taking care about it. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Serguei >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 6/16/20 18:34, David Holmes wrote: >>>>>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>>>> >>>>>>>>>>>>> Go for it! :) >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> David >>>>>>>>>>>>> >>>>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>>>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>>> Hi Dan, David and Yasumasa, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. 
Daugherty wrote: >>>>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>> Hi Dan, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I wonder why >>>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) >>>>>>>>>>>>>>>>>>>>>>>>>> does not perform at safepoint. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if >>>>>>>>>>>>>>>>>>>>>>>>> the target is not suspended: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, >>>>>>>>>>>>>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>>>>>>>>>>>>> ?? JavaThread* calling_thread = >>>>>>>>>>>>>>>>>>>>>>>>> JavaThread::current(); >>>>>>>>>>>>>>>>>>>>>>>>> ?? jvmtiError err = >>>>>>>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, >>>>>>>>>>>>>>>>>>>>>>>>> object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>>>>>>>>>>>>> ???? // Some of the critical threads were not >>>>>>>>>>>>>>>>>>>>>>>>> suspended. 
go to a safepoint and try again >>>>>>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, >>>>>>>>>>>>>>>>>>>>>>>>> calling_thread, object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op); >>>>>>>>>>>>>>>>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>>>>>>>>>>>>>>>> ?? } >>>>>>>>>>>>>>>>>>>>>>>>> ?? return err; >>>>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases >>>>>>>>>>>>>>>>>>>>>>>> when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not >>>>>>>>>>>>>>>>>>>>>>>> returned from get_object_monitor_usage(). >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor >>>>>>>>>>>>>>>>>>>>>>>>>> object at first [1], but it would perform >>>>>>>>>>>>>>>>>>>>>>>>>> concurrently. >>>>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner >>>>>>>>>>>>>>>>>>>>>>>>>> might be changed to others in subsequent code. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the >>>>>>>>>>>>>>>>>>>>>>>>>> monitor before [2]. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner >>>>>>>>>>>>>>>>>>>>>>>>> thread it is either suspended or not. If it is >>>>>>>>>>>>>>>>>>>>>>>>> suspended then it cannot release the monitor. >>>>>>>>>>>>>>>>>>>>>>>>> If it is not suspended we detect that and redo >>>>>>>>>>>>>>>>>>>>>>>>> the whole query at a safepoint. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume >>>>>>>>>>>>>>>>>>>>>>>> unfortunately after suspending check. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also >>>>>>>>>>>>>>>>>>>>>>> required a safepoint but it only requires the >>>>>>>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Which code is wrong? 
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller >>>>>>>>>>>>>>>>>>>>>> has started the process of gathering the >>>>>>>>>>>>>>>>>>>>>> information while not at a >>>>>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects >>>>>>>>>>>>>>>>>>>>> data about a monitor owned by a thread that is >>>>>>>>>>>>>>>>>>>>> suspended, or else it collects that data at a >>>>>>>>>>>>>>>>>>>>> safepoint. But the owning thread can be resumed >>>>>>>>>>>>>>>>>>>>> just after the code determined it was suspended. >>>>>>>>>>>>>>>>>>>>> The monitor can then be released and the >>>>>>>>>>>>>>>>>>>>> information gathered not only stale but potentially >>>>>>>>>>>>>>>>>>>>> completely wrong as it could now be owned by a >>>>>>>>>>>>>>>>>>>>> different thread and will report that thread's >>>>>>>>>>>>>>>>>>>>> entry count. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as >>>>>>>>>>>>>>>>>>>> soon as >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the >>>>>>>>>>>>>>>>>>>> information >>>>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation >>>>>>>>>>>>>>>>>>>> returns >>>>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the >>>>>>>>>>>>>>>>>>>> target thread >>>>>>>>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. >>>>>>>>>>>>>>>>>>> But the expectation is that the information was >>>>>>>>>>>>>>>>>>> actually an accurate snapshot of the state of the >>>>>>>>>>>>>>>>>>> monitor at some point in time. The current code does >>>>>>>>>>>>>>>>>>> not ensure that. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Please explain. 
I clearly don't understand why you >>>>>>>>>>>>>>>>>> think the info >>>>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of >>>>>>>>>>>>>>>>>> the monitor >>>>>>>>>>>>>>>>>> at some point in time". >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no >>>>>>>>>>>>>>>>> atomicity**. The reported owner thread may not own it >>>>>>>>>>>>>>>>> any longer when the entry count is read, so straight >>>>>>>>>>>>>>>>> away you may have the wrong entry count information. >>>>>>>>>>>>>>>>> The set of threads trying to acquire the monitor, or >>>>>>>>>>>>>>>>> wait on the monitor can change in unexpected ways. It >>>>>>>>>>>>>>>>> would be possible for instance to report the same >>>>>>>>>>>>>>>>> thread as being the owner, being blocked trying to >>>>>>>>>>>>>>>>> enter the monitor, and being in the wait-set of the >>>>>>>>>>>>>>>>> monitor - apparently all at the same time! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ** even if the owner is suspended we don't have >>>>>>>>>>>>>>>>> complete atomicity because threads can join the set of >>>>>>>>>>>>>>>>> threads trying to enter the monitor (unless they are >>>>>>>>>>>>>>>>> all suspended). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ >>>>>>>>>>>>>>>> suspended: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() uses a safepoint to gather >>>>>>>>>>>>>>>> the info about >>>>>>>>>>>>>>>> ??? the object's monitor. Since we're at a safepoint, >>>>>>>>>>>>>>>> the info that >>>>>>>>>>>>>>>> ??? we are gathering cannot change until we return from >>>>>>>>>>>>>>>> the safepoint. >>>>>>>>>>>>>>>> ??? It is a snapshot and a valid one at that. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() will gather info about the >>>>>>>>>>>>>>>> object's >>>>>>>>>>>>>>>> ??? monitor while _not_ at a safepoint. 
Assuming that no >>>>>>>>>>>>>>>> other >>>>>>>>>>>>>>>> ??? thread is suspended, then entry_count can change >>>>>>>>>>>>>>>> because >>>>>>>>>>>>>>>> ??? another thread can block on entry while we are >>>>>>>>>>>>>>>> gathering >>>>>>>>>>>>>>>> ??? info. waiter_count and waiters can change if a >>>>>>>>>>>>>>>> thread was >>>>>>>>>>>>>>>> ??? in a timed wait that has timed out and now that >>>>>>>>>>>>>>>> thread is >>>>>>>>>>>>>>>> ??? blocked on re-entry. I don't think that >>>>>>>>>>>>>>>> notify_waiter_count >>>>>>>>>>>>>>>> ??? and notify_waiters can change. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ??? So in this case, the owner info and notify info is >>>>>>>>>>>>>>>> stable, >>>>>>>>>>>>>>>> ??? but the entry_count and waiter info is not stable. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Consider the case when the monitor is not owned: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() will start to gather info >>>>>>>>>>>>>>>> about the >>>>>>>>>>>>>>>> ??? object's monitor while _not_ at a safepoint. If it >>>>>>>>>>>>>>>> finds a >>>>>>>>>>>>>>>> ??? thread on the entry queue that is not suspended, >>>>>>>>>>>>>>>> then it will >>>>>>>>>>>>>>>> ??? bail out and redo the info gather at a safepoint. I >>>>>>>>>>>>>>>> just >>>>>>>>>>>>>>>> ??? noticed that it doesn't check for suspension for the >>>>>>>>>>>>>>>> threads >>>>>>>>>>>>>>>> ??? on the waiters list so a timed Object.wait() call >>>>>>>>>>>>>>>> can cause >>>>>>>>>>>>>>>> ??? some confusion here. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ??? So in this case, the owner info is not stable if a >>>>>>>>>>>>>>>> thread >>>>>>>>>>>>>>>> ??? comes out of a timed wait and reenters the monitor. >>>>>>>>>>>>>>>> This >>>>>>>>>>>>>>>> ??? case is no different than if a "barger" thread comes in >>>>>>>>>>>>>>>> ??? after the NULL owner field is observed and enters the >>>>>>>>>>>>>>>> ??? monitor. We'll return that there is no owner, a list of >>>>>>>>>>>>>>>> ??? 
suspended pending entry thread and a list of waiting >>>>>>>>>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>>>>>>>>> ??? owned by the "barger" that completely bypassed the >>>>>>>>>>>>>>>> entry >>>>>>>>>>>>>>>> ??? queue by virtue of seeing the NULL owner field at >>>>>>>>>>>>>>>> exactly >>>>>>>>>>>>>>>> ??? the right time. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So the owner field is only stable when we have an owner. If >>>>>>>>>>>>>>>> that owner is not suspended, then the other fields are also >>>>>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the >>>>>>>>>>>>>>>> owner is suspended, then the owner and notify info is >>>>>>>>>>>>>>>> stable, >>>>>>>>>>>>>>>> but the entry_count and waiter info is not stable. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>>>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>>>>>>>>>>>> do that at a safepoint when the owner thread is not >>>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>>>>>>>>> the different pieces of info is unstable to varying >>>>>>>>>>>>>>>> degrees. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> As for this claim: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It would be possible for instance to report the same >>>>>>>>>>>>>>>>> thread >>>>>>>>>>>>>>>>> as being the owner, being blocked trying to enter the >>>>>>>>>>>>>>>>> monitor, >>>>>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently >>>>>>>>>>>>>>>>> all at >>>>>>>>>>>>>>>>> the same time! 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it >>>>>>>>>>>>>>>> can't >>>>>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue >>>>>>>>>>>>>>>> since >>>>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is >>>>>>>>>>>>>>>> not >>>>>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>>>>>>>> wait queue. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>>>>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>>>>>>>> used. Handshaking with all threads won't make the data as >>>>>>>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>>>>>>>> will still be field instability. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I agree with this. >>>>>>>>>>>>>>> The advantages are: >>>>>>>>>>>>>>> ??- the result is stable >>>>>>>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>>>>>>>> big as suspending all the threads has some overhead too. 
>>>>>>>>>>>>>>> I'm not sure if using handshakes can make performance >>>>>>>>>>>>>>> better. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale >>>>>>>>>>>>>>>>>>>> information is >>>>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. >>>>>>>>>>>>>>>>>>>> Perhaps the doc >>>>>>>>>>>>>>>>>>>> should have more clear about the possibility of >>>>>>>>>>>>>>>>>>>> returning stale >>>>>>>>>>>>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's >>>>>>>>>>>>>>>>>>>>> being suspended so I can't see how this could be >>>>>>>>>>>>>>>>>>>>> construed as an agent bug. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> In your scenario above, you mention that the target >>>>>>>>>>>>>>>>>>>> thread was >>>>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while >>>>>>>>>>>>>>>>>>>> the target >>>>>>>>>>>>>>>>>>>> was suspended, and then the target thread was >>>>>>>>>>>>>>>>>>>> resumed after >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but >>>>>>>>>>>>>>>>>>>> before >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the >>>>>>>>>>>>>>>>>>>> agent should not >>>>>>>>>>>>>>>>>>>> resume the target thread while also calling >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage(). 
>>>>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so >>>>>>>>>>>>>>>>>>>> agent bug. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an >>>>>>>>>>>>>>>>>>> independent resume, but you're right that doesn't >>>>>>>>>>>>>>>>>>> really make a lot of sense. But when the spec says >>>>>>>>>>>>>>>>>>> nothing about suspension ... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> And it is intentional that suspension is not required. >>>>>>>>>>>>>>>>>> JVM/DI and JVM/PI >>>>>>>>>>>>>>>>>> used to require suspension for these kinds of >>>>>>>>>>>>>>>>>> get-the-info APIs. JVM/TI >>>>>>>>>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> As I've said before, we could add a note about the >>>>>>>>>>>>>>>>>> data being potentially >>>>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like >>>>>>>>>>>>>>>>>> stat(2). You can >>>>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that >>>>>>>>>>>>>>>>>> the info is current >>>>>>>>>>>>>>>>>> by the time you process what you got back. Is it too >>>>>>>>>>>>>>>>>> much motherhood to >>>>>>>>>>>>>>>>>> state that the data might be stale? I could go either >>>>>>>>>>>>>>>>>> way... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow >>>>>>>>>>>>>>>>>>>>> this to be fixed in the future without >>>>>>>>>>>>>>>>>>>>> forcing/using any safepoints. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding >>>>>>>>>>>>>>>>>>>> talking about >>>>>>>>>>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread >>>>>>>>>>>>>>>>>>> whilst the monitor is queried. In effect the >>>>>>>>>>>>>>>>>>> operation would create a per-thread safepoint. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I "know" that, but I still need time to think about it >>>>>>>>>>>>>>>>>> and probably >>>>>>>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Semantically it is no different to the code actually >>>>>>>>>>>>>>>>>>> suspending the owner thread, but it can't actually do >>>>>>>>>>>>>>>>>>> that because suspends/resume don't nest. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we >>>>>>>>>>>>>>>>>> tracked internal and >>>>>>>>>>>>>>>>>> external suspends separately. That was a nightmare... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used >>>>>>>>>>>>>>>>>>>>>>>> to check thread state, it returns `true` when >>>>>>>>>>>>>>>>>>>>>>>> the thread is sleeping [3], or when it performs >>>>>>>>>>>>>>>>>>>>>>>> in native [4]. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it >>>>>>>>>>>>>>>>>>>>>>> can't continue execution in the VM or in Java code. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the >>>>>>>>>>>>>>>>>>>>>>>>> assumed common case where threads are first >>>>>>>>>>>>>>>>>>>>>>>>> suspended and then the monitors are queried. 
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from >>>>>>>>>>>>>>>>>>>>>>>> JVMTI spec - it just says "Get information about >>>>>>>>>>>>>>>>>>>>>>>> the object's monitor." >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, >>>>>>>>>>>>>>>>>>>>>>> nothing to do with the spec. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect >>>>>>>>>>>>>>>>>>>>>>>> information in some case. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the >>>>>>>>>>>>>>>>>>>>>>>> owner might be just before wakeup. >>>>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if >>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() is called at safepoint >>>>>>>>>>>>>>>>>>>>>>>> in any case. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using >>>>>>>>>>>>>>>>>>>>>>> Handshakes, so this particular operation will >>>>>>>>>>>>>>>>>>>>>>> require that the apparent owner is Handshake-safe >>>>>>>>>>>>>>>>>>>>>>> (by entering a handshake with it) before querying >>>>>>>>>>>>>>>>>>>>>>> the monitor. This would still be preferable I >>>>>>>>>>>>>>>>>>>>>>> think to always using a safepoint for the entire >>>>>>>>>>>>>>>>>>>>>>> operation. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> [4] >>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the >>>>>>>>>>>>>>>>>>>>>>>>> thread reported as the owner may not be >>>>>>>>>>>>>>>>>>>>>>>>> suspended at the time we first see it, and may >>>>>>>>>>>>>>>>>>>>>>>>> release the monitor, but then it may get >>>>>>>>>>>>>>>>>>>>>>>>> suspended before we call: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> ??owning_thread = >>>>>>>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), >>>>>>>>>>>>>>>>>>>>>>>>> owner); >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner >>>>>>>>>>>>>>>>>>>>>>>>> and proceed to query the monitor information in >>>>>>>>>>>>>>>>>>>>>>>>> a racy way. This can't happen when suspension >>>>>>>>>>>>>>>>>>>>>>>>> itself requires a safepoint as the current >>>>>>>>>>>>>>>>>>>>>>>>> thread won't go to that safepoint during this >>>>>>>>>>>>>>>>>>>>>>>>> code. However, if suspension is implemented via >>>>>>>>>>>>>>>>>>>>>>>>> a direct handshake with the target thread then >>>>>>>>>>>>>>>>>>>>>>>>> we have a problem. 
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>
>

From suenaga at oss.nttdata.com Thu Jun 18 14:37:32 2020
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Thu, 18 Jun 2020 23:37:32 +0900
Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information
In-Reply-To: <10ce9876-4848-6dc8-0500-abeebeebc769@oracle.com>
References: <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com>
 <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com>
 <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com>
 <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com>
 <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com>
 <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com>
 <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com>
 <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com>
 <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com>
 <98d1761f-df82-8990-a784-25c953472793@oracle.com>
 <9aecb945-109b-334e-8e11-f2c8224048e2@oracle.com>
 <8192d5c0-c30b-c410-b14c-47698f43748b@oracle.com>
 <4802dc5c-394f-0557-7522-f73df7ace327@oracle.com>
 <10ce9876-4848-6dc8-0500-abeebeebc769@oracle.com>
Message-ID:

Hi David, Dan

Thanks for the comment!
I will update the comment as below before pushing:

// This needs to be performed at a safepoint to gather stable data
// because monitor owner / waiters might not be suspended.


Yasumasa


On 2020/06/18 23:09, David Holmes wrote:
> On 19/06/2020 12:06 am, Daniel D. Daugherty wrote:
>> On 6/18/20 10:04 AM, David Holmes wrote:
>>> On 18/06/2020 11:55 pm, Daniel D. Daugherty wrote:
>>>> On 6/18/20 9:18 AM, David Holmes wrote:
>>>>> On 18/06/2020 7:07 pm, Yasumasa Suenaga wrote:
>>>>>> On 2020/06/18 17:36, David Holmes wrote:
>>>>>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> Both ThreadsListHandle and ResourceMarks would use `Thread::current()` for their resource. It is set as default parameter in c'tor.
>>>>>>>> Do you mean we should it explicitly in c'tor?
>>>>>>>
>>>>>>> Yes pass current_thread so we don't do the additional unnecessary calls to Thread::current().
>>>>>>
>>>>>> Ok, I've fixed them. Could you review again?
>>>>>>
>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/
>>>>>
>>>>> Updates look good. One nit I missed before:
>>>>>
>>>>> src/hotspot/share/prims/jvmtiEnv.cpp
>>>>>
>>>>> // It need to perform at safepoint for gathering stable data
>>>>>
>>>>> please change to:
>>>>>
>>>>> // This need to be performed at a safepoint to gather stable data
>>>>
>>>> Just a comment on this comment... I still haven't gotten to the webrev yet...
>>>>
>>>> Perhaps:
>>>>
>>>>      // This needs to be performed at a safepoint to gather stable data.
>>>
>>> There is a second line that continues the sentence
>>>
>>> // because monitor owner / waiters might not be suspended.
>>
>> So no period at the end... but the s/need/needs/ still works. :-)
>
> Yes missed that. Sorry blurry eyes, after midnight.
>
> Good night ;-)
>
> David
>
>> Dan
>>
>>>
>>> David
>>> -----
>>>
>>>> Dan
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020/06/18 13:58, David Holmes wrote:
>>>>>>>>> Hi Yasumasa,
>>>>>>>>>
>>>>>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>>>>>>> Hi Serguei,
>>>>>>>>>>
>>>>>>>>>> Thanks for your comment!
>>>>>>>>>> I uploaded new webrev:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>>>>>>
>>>>>>>>>> I'm not sure the following change is correct.
>>>>>>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>>>>>>
>>>>>>>>> We can if "owner != NULL". So that change seem fine to me.
>>>>>>>>>
>>>>>>>>> But given this is now only executed at a safepoint there are additional simplifications that can be made:
>>>>>>>>>
>>>>>>>>> - current thread determination can be simplified:
>>>>>>>>>
>>>>>>>>>  945   Thread* current_thread = Thread::current();
>>>>>>>>>
>>>>>>>>> becomes:
>>>>>>>>>
>>>>>>>>>     Thread* current_thread = VMThread::vm_thread();
>>>>>>>>>     assert(current_thread == Thread::current(), "must be");
>>>>>>>>>
>>>>>>>>> - these comments can be removed
>>>>>>>>>
>>>>>>>>>  994       // Use current thread since function can be called from a
>>>>>>>>>  995       // JavaThread or the VMThread.
>>>>>>>>> 1053       // Use current thread since function can be called from a
>>>>>>>>> 1054       // JavaThread or the VMThread.
>>>>>>>>>
>>>>>>>>> - these TLH constructions should be passing current_thread (existing bug)
>>>>>>>>>
>>>>>>>>>  996       ThreadsListHandle tlh;
>>>>>>>>> 1055       ThreadsListHandle tlh;
>>>>>>>>>
>>>>>>>>> - All ResourceMarks should be passing current_thread (existing bug)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Aside: there is a major inconsistency between the spec and implementation for this method. I've traced the history to see how this came about from JVMDI (ref JDK-4546581) but it never resulted in the JVM TI specification clearly stating what the waiters/waiter_count means. I will file a bug to have the spec clarified to match the implementation (even though I think the implementation is what is wrong). :(
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> All tests on submit repo and serviceability/jvmti and vmTestbase/nsk/jvmti have been passed with this change.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>>         // This monitor is owned so we have to find the owning JavaThread.
>>>>>>>>>>         owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>>>>>> -      // Cannot assume (owning_thread != NULL) here because this function
>>>>>>>>>> -      // may not have been called at a safepoint and the owning_thread
>>>>>>>>>> -      // might not be suspended.
>>>>>>>>>> -      if (owning_thread != NULL) {
>>>>>>>>>> -        // The monitor's owner either has to be the current thread, at safepoint
>>>>>>>>>> -        // or it has to be suspended. Any of these conditions will prevent both
>>>>>>>>>> -        // contending and waiting threads from modifying the state of
>>>>>>>>>> -        // the monitor.
>>>>>>>>>> -        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>>>>>>> -          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>>>>>>> -          // will not make it back to the JVM/TI agent. The error code will
>>>>>>>>>> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
>>>>>>>>>> -          // will retry the call via a VM_GetObjectMonitorUsage VM op.
>>>>>>>>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>>>>>>> -        }
>>>>>>>>>> -        HandleMark hm;
>>>>>>>>>> +      assert(owning_thread != NULL, "owning JavaThread must not be NULL");
>>>>>>>>>>           Handle     th(current_thread, owning_thread->threadObj());
>>>>>>>>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>
>>>>>>>>>>> This fix is not enough.
>>>>>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two modes: in VMop and non-VMop.
>>>>>>>>>>> The non-VMop mode has to be removed.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Serguei
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>>>>>>> (Change subject for RFR)
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I filed it to JBS and upload a webrev for it.
>>>>>>>>>>>> Could you review it?
>>>>>>>>>>>>
>>>>>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>>>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>>>>>>
>>>>>>>>>>>> This change has passed tests on submit repo.
>>>>>>>>>>>> Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>>>>>>> Thank you for taking care about it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Serguei
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Go for it! :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>>>>>> Hi Dan, David and Yasumasa,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is not suspended:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>>>>>>>>>>>>   JavaThread* calling_thread = JavaThread::current();
>>>>>>>>>>>>>>>>>>>>>>>>>>   jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>>>>>>   if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>>>>>>>>>>>     // Some of the critical threads were not suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>>>>>>>>>>>     VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>>>>>>     VMThread::execute(&op);
>>>>>>>>>>>>>>>>>>>>>>>>>>     err = op.result();
>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>   return err;
>>>>>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage().
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently.
>>>>>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be changed to others in subsequent code.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor before [2].
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume unfortunately after suspending check.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Which code is wrong?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller
>>>>>>>>>>>>>>>>>>>>>>> has started the process of gathering the information while not at a
>>>>>>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by GetObjectMonitorUsage()
>>>>>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects data about a monitor owned by a thread that is suspended, or else it collects that data at a safepoint. But the owning thread can be resumed just after the code determined it was suspended. The monitor can then be released and the information gathered not only stale but potentially completely wrong as it could now be owned by a different thread and will report that thread's entry count.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as
>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information
>>>>>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation returns
>>>>>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the target thread
>>>>>>>>>>>>>>>>>>>>> could have moved on.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. But the expectation is that the information was actually an accurate snapshot of the state of the monitor at some point in time. The current code does not ensure that.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please explain. I clearly don't understand why you think the info
>>>>>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the monitor
>>>>>>>>>>>>>>>>>>> at some point in time".
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no atomicity**. The reported owner thread may not own it any longer when the entry count is read, so straight away you may have the wrong entry count information. The set of threads trying to acquire the monitor, or wait on the monitor can change in unexpected ways. It would be possible for instance to report the same thread as being the owner, being blocked trying to enter the monitor, and being in the wait-set of the monitor - apparently all at the same time!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ** even if the owner is suspended we don't have complete atomicity because threads can join the set of threads trying to enter the monitor (unless they are all suspended).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ suspended:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   - GetObjectMonitorUsage() uses a safepoint to gather the info about
>>>>>>>>>>>>>>>>>     the object's monitor. Since we're at a safepoint, the info that
>>>>>>>>>>>>>>>>>     we are gathering cannot change until we return from the safepoint.
>>>>>>>>>>>>>>>>>     It is a snapshot and a valid one at that.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   - GetObjectMonitorUsage() will gather info about the object's
>>>>>>>>>>>>>>>>>     monitor while _not_ at a safepoint. Assuming that no other
>>>>>>>>>>>>>>>>>     thread is suspended, then entry_count can change because
>>>>>>>>>>>>>>>>>     another thread can block on entry while we are gathering
>>>>>>>>>>>>>>>>>     info. waiter_count and waiters can change if a thread was
>>>>>>>>>>>>>>>>>     in a timed wait that has timed out and now that thread is
>>>>>>>>>>>>>>>>>     blocked on re-entry. I don't think that notify_waiter_count
>>>>>>>>>>>>>>>>>     and notify_waiters can change.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     So in this case, the owner info and notify info is stable,
>>>>>>>>>>>>>>>>>     but the entry_count and waiter info is not stable.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Consider the case when the monitor is not owned:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   - GetObjectMonitorUsage() will start to gather info about the
>>>>>>>>>>>>>>>>>     
object's monitor while _not_ at a safepoint. If it finds a >>>>>>>>>>>>>>>>> ??? thread on the entry queue that is not suspended, then it will >>>>>>>>>>>>>>>>> ??? bail out and redo the info gather at a safepoint. I just >>>>>>>>>>>>>>>>> ??? noticed that it doesn't check for suspension for the threads >>>>>>>>>>>>>>>>> ??? on the waiters list so a timed Object.wait() call can cause >>>>>>>>>>>>>>>>> ??? some confusion here. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ??? So in this case, the owner info is not stable if a thread >>>>>>>>>>>>>>>>> ??? comes out of a timed wait and reenters the monitor. This >>>>>>>>>>>>>>>>> ??? case is no different than if a "barger" thread comes in >>>>>>>>>>>>>>>>> ??? after the NULL owner field is observed and enters the >>>>>>>>>>>>>>>>> ??? monitor. We'll return that there is no owner, a list of >>>>>>>>>>>>>>>>> ??? suspended pending entry thread and a list of waiting >>>>>>>>>>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>>>>>>>>>> ??? owned by the "barger" that completely bypassed the entry >>>>>>>>>>>>>>>>> ??? queue by virtue of seeing the NULL owner field at exactly >>>>>>>>>>>>>>>>> ??? the right time. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> So the owner field is only stable when we have an owner. If >>>>>>>>>>>>>>>>> that owner is not suspended, then the other fields are also >>>>>>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the >>>>>>>>>>>>>>>>> owner is suspended, then the owner and notify info is stable, >>>>>>>>>>>>>>>>> but the entry_count and waiter info is not stable. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>>>>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>>>>>>>>>> of it a valid and stable snapshot. 
Unfortunately, we only >>>>>>>>>>>>>>>>> do that at a safepoint when the owner thread is not suspended >>>>>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>>>>>>>>>> the different pieces of info is unstable to varying degrees. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> As for this claim: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> It would be possible for instance to report the same thread >>>>>>>>>>>>>>>>>> as being the owner, being blocked trying to enter the monitor, >>>>>>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>>>>>>>>>>>>> the same time! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue since >>>>>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>>>>>>>>> wait queue. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>>>>>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>>>>>>>>> used. 
Handshaking with all threads won't make the data as >>>>>>>>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>>>>>>>>> will still be field instability. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I agree with this. >>>>>>>>>>>>>>>> The advantages are: >>>>>>>>>>>>>>>> ??- the result is stable >>>>>>>>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>>>>>>>>> I'm not sure if using handshakes can make performance better. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale information is >>>>>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc >>>>>>>>>>>>>>>>>>>>> should have more clear about the possibility of returning stale >>>>>>>>>>>>>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's being suspended so I can't see how this could be construed as an agent bug. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> In your scenario above, you mention that the target thread was >>>>>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>>>>>>>>>>>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent should not >>>>>>>>>>>>>>>>>>>>> resume the target thread while also calling GetObjectMonitorUsage(). >>>>>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an independent resume, but you're right that doesn't really make a lot of sense. But when the spec says nothing about suspension ... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> And it is intentional that suspension is not required. JVM/DI and JVM/PI >>>>>>>>>>>>>>>>>>> used to require suspension for these kinds of get-the-info APIs. JVM/TI >>>>>>>>>>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> As I've said before, we could add a note about the data being potentially >>>>>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like stat(2). You can >>>>>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the info is current >>>>>>>>>>>>>>>>>>> by the time you process what you got back. Is it too much motherhood to >>>>>>>>>>>>>>>>>>> state that the data might be stale? I could go either way... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this to be fixed in the future without forcing/using any safepoints. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding talking about >>>>>>>>>>>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread whilst the monitor is queried. In effect the operation would create a per-thread safepoint. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I "know" that, but I still need time to think about it and probably >>>>>>>>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Semantically it is no different to the code actually suspending the owner thread, but it can't actually do that because suspends/resume don't nest. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we tracked internal and >>>>>>>>>>>>>>>>>>> external suspends separately. That was a nightmare... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread state, it returns `true` when the thread is sleeping [3], or when it performs in native [4]. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried. 
>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI spec - it just says "Get information about the object's monitor." >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, nothing to do with the spec. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect information in some case. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner might be just before wakeup. >>>>>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is called at safepoint in any case. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation. 
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>>>>>>>>> [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> ??owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem. 
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>>>>>>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>> >> From daniel.daugherty at oracle.com Thu Jun 18 14:38:34 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Thu, 18 Jun 2020 10:38:34 -0400 Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information In-Reply-To: <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com> References: <2c71d549-ad41-df90-ca44-7e6bc3cac89f@oracle.com> <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com> <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com> <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com> Message-ID: <1a2d56a6-624a-b889-6f71-d51bd972aa0d@oracle.com> On 6/18/20 5:07 AM, Yasumasa Suenaga wrote: > On 2020/06/18 17:36, David Holmes wrote: >> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote: >>> Hi David, >>> >>> Both ThreadsListHandle and ResourceMarks would use >>> 
`Thread::current()` for their resource. It is set as the default
>>> parameter in the c'tor.
>>> Do you mean we should pass it explicitly in the c'tor?
>>
>> Yes, pass current_thread so we don't make the additional unnecessary
>> calls to Thread::current().
>
> Ok, I've fixed them. Could you review again?
>
>   http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/

src/hotspot/share/prims/jvmtiEnv.cpp
    L2842:   // It need to perform at safepoint for gathering stable data
        Perhaps:
             // This needs to be performed at a safepoint to gather stable data

src/hotspot/share/prims/jvmtiEnvBase.cpp
    No comments.

Thumbs up.

What testing has been done on this fix?

Also, please wait to hear from Serguei on this fix...

Dan

>
>
> Thanks,
>
> Yasumasa
>
>
>> David
>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> On 2020/06/18 13:58, David Holmes wrote:
>>>> Hi Yasumasa,
>>>>
>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>> Hi Serguei,
>>>>>
>>>>> Thanks for your comment!
>>>>> I uploaded a new webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>
>>>>> I'm not sure the following change is correct.
>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>
>>>> We can if "owner != NULL". So that change seems fine to me.
>>>>
>>>> But given this is now only executed at a safepoint there are
>>>> additional simplifications that can be made:
>>>>
>>>> - current thread determination can be simplified:
>>>>
>>>> 945   Thread* current_thread = Thread::current();
>>>>
>>>> becomes:
>>>>
>>>>     Thread* current_thread = VMThread::vm_thread();
>>>>     assert(current_thread == Thread::current(), "must be");
>>>>
>>>> - these comments can be removed
>>>>
>>>>   994       // Use current thread since function can be called from a
>>>>   995       // JavaThread or the VMThread.
>>>> 1053       // Use current thread since function can be called from a
>>>> 1054       // JavaThread or the VMThread.
>>>>
>>>> - these TLH constructions should be passing current_thread
>>>> (existing bug)
>>>>
>>>> 996       ThreadsListHandle tlh;
>>>> 1055       ThreadsListHandle tlh;
>>>>
>>>> - All ResourceMarks should be passing current_thread (existing bug)
>>>>
>>>>
>>>> Aside: there is a major inconsistency between the spec and
>>>> implementation for this method. I've traced the history to see how
>>>> this came about from JVMDI (ref JDK-4546581) but it never resulted
>>>> in the JVM TI specification clearly stating what the
>>>> waiters/waiter_count means. I will file a bug to have the spec
>>>> clarified to match the implementation (even though I think the
>>>> implementation is what is wrong). :(
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> All tests on submit repo and serviceability/jvmti and
>>>>> vmTestbase/nsk/jvmti have passed with this change.
>>>>>
>>>>>
>>>>> ```
>>>>>         // This monitor is owned so we have to find the owning JavaThread.
>>>>>         owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>> -      // Cannot assume (owning_thread != NULL) here because this function
>>>>> -      // may not have been called at a safepoint and the owning_thread
>>>>> -      // might not be suspended.
>>>>> -      if (owning_thread != NULL) {
>>>>> -        // The monitor's owner either has to be the current thread, at safepoint
>>>>> -        // or it has to be suspended. Any of these conditions will prevent both
>>>>> -        // contending and waiting threads from modifying the state of
>>>>> -        // the monitor.
>>>>> -        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>> -          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>> -          // will not make it back to the JVM/TI agent. The error code will
>>>>> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
>>>>> -          // will retry the call via a VM_GetObjectMonitorUsage VM op.
>>>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>> -        }
>>>>> -        HandleMark hm;
>>>>> +      assert(owning_thread != NULL, "owning JavaThread must not be NULL");
>>>>>           Handle     th(current_thread, owning_thread->threadObj());
>>>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>
>>>>> ```
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> This fix is not enough.
>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two
>>>>>> modes: in VMop and non-VMop.
>>>>>> The non-VMop mode has to be removed.
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>>
>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>> (Change subject for RFR)
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I filed it in JBS and uploaded a webrev for it.
>>>>>>> Could you review it?
>>>>>>>
>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>
>>>>>>> This change has passed tests on submit repo.
>>>>>>> Also I tested it with serviceability/jvmti and
>>>>>>> vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>> Thank you for taking care of it.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Serguei
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>>
>>>>>>>>> Go for it!
:) >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> David >>>>>>>>> >>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>> Hi Dan, David and Yasumasa, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>>>>>>>>> Hi Dan, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I wonder why >>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does >>>>>>>>>>>>>>>>>>>>>> not perform at safepoint. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the >>>>>>>>>>>>>>>>>>>>> target is not suspended: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, >>>>>>>>>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>>>>>>>>> ?? JavaThread* calling_thread = >>>>>>>>>>>>>>>>>>>>> JavaThread::current(); >>>>>>>>>>>>>>>>>>>>> ?? jvmtiError err = >>>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, object, >>>>>>>>>>>>>>>>>>>>> info_ptr); >>>>>>>>>>>>>>>>>>>>> ?? 
if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>>>>>>>>> ???? // Some of the critical threads were not >>>>>>>>>>>>>>>>>>>>> suspended. go to a safepoint and try again >>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, calling_thread, >>>>>>>>>>>>>>>>>>>>> object, info_ptr); >>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op); >>>>>>>>>>>>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>>>>>>>>>>>> ?? } >>>>>>>>>>>>>>>>>>>>> ?? return err; >>>>>>>>>>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases >>>>>>>>>>>>>>>>>>>> when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not >>>>>>>>>>>>>>>>>>>> returned from get_object_monitor_usage(). >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor >>>>>>>>>>>>>>>>>>>>>> object at first [1], but it would perform >>>>>>>>>>>>>>>>>>>>>> concurrently. >>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might >>>>>>>>>>>>>>>>>>>>>> be changed to others in subsequent code. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor >>>>>>>>>>>>>>>>>>>>>> before [2]. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner >>>>>>>>>>>>>>>>>>>>> thread it is either suspended or not. If it is >>>>>>>>>>>>>>>>>>>>> suspended then it cannot release the monitor. If >>>>>>>>>>>>>>>>>>>>> it is not suspended we detect that and redo the >>>>>>>>>>>>>>>>>>>>> whole query at a safepoint. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I think the owner thread might resume unfortunately >>>>>>>>>>>>>>>>>>>> after suspending check. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also >>>>>>>>>>>>>>>>>>> required a safepoint but it only requires the >>>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Which code is wrong? 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the >>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller >>>>>>>>>>>>>>>>>> has started the process of gathering the information >>>>>>>>>>>>>>>>>> while not at a >>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by >>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The code tries to make sure that it either collects >>>>>>>>>>>>>>>>> data about a monitor owned by a thread that is >>>>>>>>>>>>>>>>> suspended, or else it collects that data at a >>>>>>>>>>>>>>>>> safepoint. But the owning thread can be resumed just >>>>>>>>>>>>>>>>> after the code determined it was suspended. The >>>>>>>>>>>>>>>>> monitor can then be released and the information >>>>>>>>>>>>>>>>> gathered not only stale but potentially completely >>>>>>>>>>>>>>>>> wrong as it could now be owned by a different thread >>>>>>>>>>>>>>>>> and will report that thread's entry count. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the >>>>>>>>>>>>>>>> information >>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation >>>>>>>>>>>>>>>> returns >>>>>>>>>>>>>>>> from the safepoint that gathered the info, the target >>>>>>>>>>>>>>>> thread >>>>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. >>>>>>>>>>>>>>> But the expectation is that the information was actually >>>>>>>>>>>>>>> an accurate snapshot of the state of the monitor at some >>>>>>>>>>>>>>> point in time. The current code does not ensure that. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please explain. I clearly don't understand why you think >>>>>>>>>>>>>> the info >>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the >>>>>>>>>>>>>> monitor >>>>>>>>>>>>>> at some point in time". 
>>>>>>>>>>>>> >>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no >>>>>>>>>>>>> atomicity**. The reported owner thread may not own it any >>>>>>>>>>>>> longer when the entry count is read, so straight away you >>>>>>>>>>>>> may have the wrong entry count information. The set of >>>>>>>>>>>>> threads trying to acquire the monitor, or wait on the >>>>>>>>>>>>> monitor can change in unexpected ways. It would be >>>>>>>>>>>>> possible for instance to report the same thread as being >>>>>>>>>>>>> the owner, being blocked trying to enter the monitor, and >>>>>>>>>>>>> being in the wait-set of the monitor - apparently all at >>>>>>>>>>>>> the same time! >>>>>>>>>>>>> >>>>>>>>>>>>> ** even if the owner is suspended we don't have complete >>>>>>>>>>>>> atomicity because threads can join the set of threads >>>>>>>>>>>>> trying to enter the monitor (unless they are all suspended). >>>>>>>>>>>> >>>>>>>>>>>> Consider the case when the monitor's owner is _not_ suspended: >>>>>>>>>>>> >>>>>>>>>>>> ? - GetObjectMonitorUsage() uses a safepoint to gather the >>>>>>>>>>>> info about >>>>>>>>>>>> ??? the object's monitor. Since we're at a safepoint, the >>>>>>>>>>>> info that >>>>>>>>>>>> ??? we are gathering cannot change until we return from the >>>>>>>>>>>> safepoint. >>>>>>>>>>>> ??? It is a snapshot and a valid one at that. >>>>>>>>>>>> >>>>>>>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>>>>>>> >>>>>>>>>>>> ? - GetObjectMonitorUsage() will gather info about the >>>>>>>>>>>> object's >>>>>>>>>>>> ??? monitor while _not_ at a safepoint. Assuming that no other >>>>>>>>>>>> ??? thread is suspended, then entry_count can change because >>>>>>>>>>>> ??? another thread can block on entry while we are gathering >>>>>>>>>>>> ??? info. waiter_count and waiters can change if a thread was >>>>>>>>>>>> ??? in a timed wait that has timed out and now that thread is >>>>>>>>>>>> ??? blocked on re-entry. 
I don't think that >>>>>>>>>>>> notify_waiter_count >>>>>>>>>>>> ??? and notify_waiters can change. >>>>>>>>>>>> >>>>>>>>>>>> ??? So in this case, the owner info and notify info is stable, >>>>>>>>>>>> ??? but the entry_count and waiter info is not stable. >>>>>>>>>>>> >>>>>>>>>>>> Consider the case when the monitor is not owned: >>>>>>>>>>>> >>>>>>>>>>>> ? - GetObjectMonitorUsage() will start to gather info about >>>>>>>>>>>> the >>>>>>>>>>>> ??? object's monitor while _not_ at a safepoint. If it finds a >>>>>>>>>>>> ??? thread on the entry queue that is not suspended, then >>>>>>>>>>>> it will >>>>>>>>>>>> ??? bail out and redo the info gather at a safepoint. I just >>>>>>>>>>>> ??? noticed that it doesn't check for suspension for the >>>>>>>>>>>> threads >>>>>>>>>>>> ??? on the waiters list so a timed Object.wait() call can >>>>>>>>>>>> cause >>>>>>>>>>>> ??? some confusion here. >>>>>>>>>>>> >>>>>>>>>>>> ??? So in this case, the owner info is not stable if a thread >>>>>>>>>>>> ??? comes out of a timed wait and reenters the monitor. This >>>>>>>>>>>> ??? case is no different than if a "barger" thread comes in >>>>>>>>>>>> ??? after the NULL owner field is observed and enters the >>>>>>>>>>>> ??? monitor. We'll return that there is no owner, a list of >>>>>>>>>>>> ??? suspended pending entry thread and a list of waiting >>>>>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>>>>> ??? owned by the "barger" that completely bypassed the entry >>>>>>>>>>>> ??? queue by virtue of seeing the NULL owner field at exactly >>>>>>>>>>>> ??? the right time. >>>>>>>>>>>> >>>>>>>>>>>> So the owner field is only stable when we have an owner. If >>>>>>>>>>>> that owner is not suspended, then the other fields are also >>>>>>>>>>>> stable because we gathered the info at a safepoint. If the >>>>>>>>>>>> owner is suspended, then the owner and notify info is stable, >>>>>>>>>>>> but the entry_count and waiter info is not stable. 
>>>>>>>>>>>> >>>>>>>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>>>>>>>> do that at a safepoint when the owner thread is not suspended >>>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>>>>> the different pieces of info is unstable to varying degrees. >>>>>>>>>>>> >>>>>>>>>>>> As for this claim: >>>>>>>>>>>> >>>>>>>>>>>>> It would be possible for instance to report the same thread >>>>>>>>>>>>> as being the owner, being blocked trying to enter the >>>>>>>>>>>>> monitor, >>>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>>>>>>>> the same time! >>>>>>>>>>>> >>>>>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>>>>>>>> then be seen as on the entry queue or on the wait queue since >>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>>>> wait queue. >>>>>>>>>>>> >>>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>>> quite that bad. :-) (That is a small mercy.) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>>>> for GetObjectMonitorUsage(). 
If the monitor is owned and we >>>>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>>>> used. Handshaking with all threads won't make the data as >>>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>>>> will still be field instability. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>>> >>>>>>>>>>> I agree with this. >>>>>>>>>>> The advantages are: >>>>>>>>>>> ??- the result is stable >>>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>>> >>>>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>>>> I'm not sure if using handshakes can make performance better. >>>>>>>>>> >>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>> >>>>>>>>>> Yasumasa >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Serguei >>>>>>>>>>> >>>>>>>>>>>> Dan >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The only way to make sure you don't have stale >>>>>>>>>>>>>>>> information is >>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps >>>>>>>>>>>>>>>> the doc >>>>>>>>>>>>>>>> should have more clear about the possibility of >>>>>>>>>>>>>>>> returning stale >>>>>>>>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's >>>>>>>>>>>>>>>>> being suspended so I can't see how this could be >>>>>>>>>>>>>>>>> construed as an agent bug. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In your scenario above, you mention that the target >>>>>>>>>>>>>>>> thread was >>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the >>>>>>>>>>>>>>>> target >>>>>>>>>>>>>>>> was suspended, and then the target thread was resumed >>>>>>>>>>>>>>>> after >>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), >>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent >>>>>>>>>>>>>>>> should not >>>>>>>>>>>>>>>> resume the target thread while also calling >>>>>>>>>>>>>>>> GetObjectMonitorUsage(). >>>>>>>>>>>>>>>> The calls were allowed to be made out of order so agent >>>>>>>>>>>>>>>> bug. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an >>>>>>>>>>>>>>> independent resume, but you're right that doesn't really >>>>>>>>>>>>>>> make a lot of sense. But when the spec says nothing >>>>>>>>>>>>>>> about suspension ... >>>>>>>>>>>>>> >>>>>>>>>>>>>> And it is intentional that suspension is not required. >>>>>>>>>>>>>> JVM/DI and JVM/PI >>>>>>>>>>>>>> used to require suspension for these kinds of >>>>>>>>>>>>>> get-the-info APIs. JVM/TI >>>>>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>>>>> >>>>>>>>>>>>>> As I've said before, we could add a note about the data >>>>>>>>>>>>>> being potentially >>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like >>>>>>>>>>>>>> stat(2). You can >>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the >>>>>>>>>>>>>> info is current >>>>>>>>>>>>>> by the time you process what you got back. Is it too much >>>>>>>>>>>>>> motherhood to >>>>>>>>>>>>>> state that the data might be stale? I could go either way... 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this >>>>>>>>>>>>>>>>> to be fixed in the future without forcing/using any >>>>>>>>>>>>>>>>> safepoints. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding >>>>>>>>>>>>>>>> talking about >>>>>>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread >>>>>>>>>>>>>>> whilst the monitor is queried. In effect the operation >>>>>>>>>>>>>>> would create a per-thread safepoint. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I "know" that, but I still need time to think about it >>>>>>>>>>>>>> and probably >>>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Semantically it is no different to the code actually >>>>>>>>>>>>>>> suspending the owner thread, but it can't actually do >>>>>>>>>>>>>>> that because suspends/resume don't nest. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yeah... we used have a suspend count back when we tracked >>>>>>>>>>>>>> internal and >>>>>>>>>>>>>> external suspends separately. That was a nightmare... >>>>>>>>>>>>>> >>>>>>>>>>>>>> Dan >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to >>>>>>>>>>>>>>>>>>>> check thread state, it returns `true` when the >>>>>>>>>>>>>>>>>>>> thread is sleeping [3], or when it performs in >>>>>>>>>>>>>>>>>>>> native [4]. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it >>>>>>>>>>>>>>>>>>> can't continue execution in the VM or in Java code. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed >>>>>>>>>>>>>>>>>>>>> common case where threads are first suspended and >>>>>>>>>>>>>>>>>>>>> then the monitors are queried. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from >>>>>>>>>>>>>>>>>>>> JVMTI spec - it just says "Get information about >>>>>>>>>>>>>>>>>>>> the object's monitor." >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, >>>>>>>>>>>>>>>>>>> nothing to do with the spec. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect >>>>>>>>>>>>>>>>>>>> information in some case. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner >>>>>>>>>>>>>>>>>>>> might be just before wakeup. >>>>>>>>>>>>>>>>>>>> So I think it is more safe if >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() is called at safepoint in >>>>>>>>>>>>>>>>>>>> any case. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using >>>>>>>>>>>>>>>>>>> Handshakes, so this particular operation will >>>>>>>>>>>>>>>>>>> require that the apparent owner is Handshake-safe >>>>>>>>>>>>>>>>>>> (by entering a handshake with it) before querying >>>>>>>>>>>>>>>>>>> the monitor. This would still be preferable I think >>>>>>>>>>>>>>>>>>> to always using a safepoint for the entire operation. 
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671
>>>>>>>>>>>>>>>>>>>> [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the
>>>>>>>>>>>>>>>>>>>>> thread reported as the owner may not be suspended
>>>>>>>>>>>>>>>>>>>>> at the time we first see it, and may release the
>>>>>>>>>>>>>>>>>>>>> monitor, but then it may get suspended before we
>>>>>>>>>>>>>>>>>>>>> call:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and
>>>>>>>>>>>>>>>>>>>>> proceed to query the monitor information in a racy
>>>>>>>>>>>>>>>>>>>>> way. This can't happen when suspension itself
>>>>>>>>>>>>>>>>>>>>> requires a safepoint as the current thread won't
>>>>>>>>>>>>>>>>>>>>> go to that safepoint during this code. However, if
>>>>>>>>>>>>>>>>>>>>> suspension is implemented via a direct handshake
>>>>>>>>>>>>>>>>>>>>> with the target thread then we have a problem.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973
>>>>>>>>>>>>>>>>>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>

From suenaga at oss.nttdata.com Thu Jun 18 14:56:54 2020
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Thu, 18 Jun 2020 23:56:54 +0900
Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information
In-Reply-To: <1a2d56a6-624a-b889-6f71-d51bd972aa0d@oracle.com>
References: <4d81ca74-36d0-42a0-9fcc-cb771cf7a5cf@oracle.com> <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com> <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com> <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com> <1a2d56a6-624a-b889-6f71-d51bd972aa0d@oracle.com>
Message-ID: <83cd0686-d744-d3fe-eb79-49cdd89b99c6@oss.nttdata.com>

Hi Daniel,

On 2020/06/18 23:38, Daniel D. Daugherty wrote:
> On 6/18/20 5:07 AM, Yasumasa Suenaga wrote:
>> On 2020/06/18 17:36, David Holmes wrote:
>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
>>>> Hi David,
>>>>
>>>> Both ThreadsListHandle and ResourceMarks would use `Thread::current()` for their resource. It is set as default parameter in c'tor.
>>>> Do you mean we should it explicitly in c'tor?
>>>
>>> Yes pass current_thread so we don't do the additional unnecessary calls to Thread::current().
>>
>> Ok, I've fixed them. Could you review again?
>>
>>   http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/
>
> src/hotspot/share/prims/jvmtiEnv.cpp
>     L2842:   // It need to perform at safepoint for gathering stable data
>         Perhaps:
>              // This needs to be performed at a safepoint to gather stable data

I will change it before pushing.

> src/hotspot/share/prims/jvmtiEnvBase.cpp
>     No comments.
>
> Thumbs up.
>
> What testing has been done on this fix?

I tested this change on serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64.

> Also, please wait to hear from Serguei on this fix...

Ok.

Thanks,

Yasumasa

> Dan
>
>
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>>> David
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2020/06/18 13:58, David Holmes wrote:
>>>>> Hi Yasumasa,
>>>>>
>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>>> Hi Serguei,
>>>>>>
>>>>>> Thanks for your comment!
>>>>>> I uploaded new webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>>
>>>>>> I'm not sure the following change is correct.
>>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>>
>>>>> We can if "owner != NULL". So that change seem fine to me.
>>>>>
>>>>> But given this is now only executed at a safepoint there are additional simplifications that can be made:
>>>>>
>>>>> - current thread determination can be simplified:
>>>>>
>>>>> 945   Thread* current_thread = Thread::current();
>>>>>
>>>>> becomes:
>>>>>
>>>>>     Thread* current_thread = VMThread::vm_thread();
>>>>>     assert(current_thread == Thread::current(), "must be");
>>>>>
>>>>> - these comments can be removed
>>>>>
>>>>>   994       // Use current thread since function can be called from a
>>>>>   995       // JavaThread or the VMThread.
>>>>> 1053       // Use current thread since function can be called from a
>>>>> 1054       // JavaThread or the VMThread.
>>>>>
>>>>> - these TLH constructions should be passing current_thread (existing bug)
>>>>>
>>>>> 996       ThreadsListHandle tlh;
>>>>> 1055       ThreadsListHandle tlh;
>>>>>
>>>>> - All ResourceMarks should be passing current_thread (existing bug)
>>>>>
>>>>>
>>>>> Aside: there is a major inconsistency between the spec and implementation for this method. I've traced the history to see how this came about from JVMDI (ref JDK-4546581) but it never resulted in the JVM TI specification clearly stating what the waiters/waiter_count means. I will file a bug to have the spec clarified to match the implementation (even though I think the implementation is what is wrong). :(
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> All tests on submit repo and serviceability/jvmti and vmTestbase/nsk/jvmti have been passed with this change.
>>>>>>
>>>>>>
>>>>>> ```
>>>>>>         // This monitor is owned so we have to find the owning JavaThread.
>>>>>>         owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>> -      // Cannot assume (owning_thread != NULL) here because this function
>>>>>> -      // may not have been called at a safepoint and the owning_thread
>>>>>> -      // might not be suspended.
>>>>>> -      if (owning_thread != NULL) {
>>>>>> -        // The monitor's owner either has to be the current thread, at safepoint
>>>>>> -        // or it has to be suspended. Any of these conditions will prevent both
>>>>>> -        // contending and waiting threads from modifying the state of
>>>>>> -        // the monitor.
>>>>>> -        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>>> -          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>>> -          // will not make it back to the JVM/TI agent. The error code will
>>>>>> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
>>>>>> -          // will retry the call via a VM_GetObjectMonitorUsage VM op.
>>>>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>>> -        }
>>>>>> -        HandleMark hm;
>>>>>> +      assert(owning_thread != NULL, "owning JavaThread must not be NULL");
>>>>>>           Handle     th(current_thread, owning_thread->threadObj());
>>>>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>>
>>>>>> ```
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>>> Hi Yasumasa,
>>>>>>>
>>>>>>> This fix is not enough.
>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two modes: in VMop and non-VMop.
>>>>>>> The non-VMop mode has to be removed.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>
>>>>>>>
>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>>> (Change subject for RFR)
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I filed it to JBS and upload a webrev for it.
>>>>>>>> Could you review it?
>>>>>>>>
>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>>
>>>>>>>> This change has passed tests on submit repo.
>>>>>>>> Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>>> Thank you for taking care about it.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>>>
>>>>>>>>>> Go for it! :)
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>> Hi Dan, David and Yasumasa,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is not suspended:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>>>>>>>>   JavaThread* calling_thread = JavaThread::current();
>>>>>>>>>>>>>>>>>>>>>>   jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>>   if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>>>>>>>     // Some of the critical threads were not suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>>>>>>>     VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>>     VMThread::execute(&op);
>>>>>>>>>>>>>>>>>>>>>>     err = op.result();
>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>   return err;
>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage().
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently.
>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be changed to others in subsequent code.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor before [2].
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume unfortunately after suspending check.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Which code is wrong?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller
>>>>>>>>>>>>>>>>>>> has started the process of gathering the information while not at a
>>>>>>>>>>>>>>>>>>> safepoint.
Thus the information returned by GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects data about a monitor owned by a thread that is suspended, or else it collects that data at a safepoint. But the owning thread can be resumed just after the code determined it was suspended. The monitor can then be released and the information gathered not only stale but potentially completely wrong as it could now be owned by a different thread and will report that thread's entry count. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information >>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation returns >>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the target thread >>>>>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. But the expectation is that the information was actually an accurate snapshot of the state of the monitor at some point in time. The current code does not ensure that. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please explain. I clearly don't understand why you think the info >>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the monitor >>>>>>>>>>>>>>> at some point in time". >>>>>>>>>>>>>> >>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no atomicity**. The reported owner thread may not own it any longer when the entry count is read, so straight away you may have the wrong entry count information. The set of threads trying to acquire the monitor, or wait on the monitor can change in unexpected ways. 
It would be possible for instance to report the same thread as being the owner, being blocked trying to enter the monitor, and being in the wait-set of the monitor - apparently all at the same time! >>>>>>>>>>>>>> >>>>>>>>>>>>>> ** even if the owner is suspended we don't have complete atomicity because threads can join the set of threads trying to enter the monitor (unless they are all suspended). >>>>>>>>>>>>> >>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ suspended: >>>>>>>>>>>>> >>>>>>>>>>>>> ? - GetObjectMonitorUsage() uses a safepoint to gather the info about >>>>>>>>>>>>> ??? the object's monitor. Since we're at a safepoint, the info that >>>>>>>>>>>>> ??? we are gathering cannot change until we return from the safepoint. >>>>>>>>>>>>> ??? It is a snapshot and a valid one at that. >>>>>>>>>>>>> >>>>>>>>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>>>>>>>> >>>>>>>>>>>>> ? - GetObjectMonitorUsage() will gather info about the object's >>>>>>>>>>>>> ??? monitor while _not_ at a safepoint. Assuming that no other >>>>>>>>>>>>> ??? thread is suspended, then entry_count can change because >>>>>>>>>>>>> ??? another thread can block on entry while we are gathering >>>>>>>>>>>>> ??? info. waiter_count and waiters can change if a thread was >>>>>>>>>>>>> ??? in a timed wait that has timed out and now that thread is >>>>>>>>>>>>> ??? blocked on re-entry. I don't think that notify_waiter_count >>>>>>>>>>>>> ??? and notify_waiters can change. >>>>>>>>>>>>> >>>>>>>>>>>>> ??? So in this case, the owner info and notify info is stable, >>>>>>>>>>>>> ??? but the entry_count and waiter info is not stable. >>>>>>>>>>>>> >>>>>>>>>>>>> Consider the case when the monitor is not owned: >>>>>>>>>>>>> >>>>>>>>>>>>> ? - GetObjectMonitorUsage() will start to gather info about the >>>>>>>>>>>>> ??? object's monitor while _not_ at a safepoint. If it finds a >>>>>>>>>>>>> ??? 
thread on the entry queue that is not suspended, then it will >>>>>>>>>>>>> ??? bail out and redo the info gather at a safepoint. I just >>>>>>>>>>>>> ??? noticed that it doesn't check for suspension for the threads >>>>>>>>>>>>> ??? on the waiters list so a timed Object.wait() call can cause >>>>>>>>>>>>> ??? some confusion here. >>>>>>>>>>>>> >>>>>>>>>>>>> ??? So in this case, the owner info is not stable if a thread >>>>>>>>>>>>> ??? comes out of a timed wait and reenters the monitor. This >>>>>>>>>>>>> ??? case is no different than if a "barger" thread comes in >>>>>>>>>>>>> ??? after the NULL owner field is observed and enters the >>>>>>>>>>>>> ??? monitor. We'll return that there is no owner, a list of >>>>>>>>>>>>> ??? suspended pending entry thread and a list of waiting >>>>>>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>>>>>> ??? owned by the "barger" that completely bypassed the entry >>>>>>>>>>>>> ??? queue by virtue of seeing the NULL owner field at exactly >>>>>>>>>>>>> ??? the right time. >>>>>>>>>>>>> >>>>>>>>>>>>> So the owner field is only stable when we have an owner. If >>>>>>>>>>>>> that owner is not suspended, then the other fields are also >>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the >>>>>>>>>>>>> owner is suspended, then the owner and notify info is stable, >>>>>>>>>>>>> but the entry_count and waiter info is not stable. >>>>>>>>>>>>> >>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>>>>>>>>> do that at a safepoint when the owner thread is not suspended >>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>>>> suspended. 
If either of those conditions is not true, then >>>>>>>>>>>>> the different pieces of info is unstable to varying degrees. >>>>>>>>>>>>> >>>>>>>>>>>>> As for this claim: >>>>>>>>>>>>> >>>>>>>>>>>>>> It would be possible for instance to report the same thread >>>>>>>>>>>>>> as being the owner, being blocked trying to enter the monitor, >>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>>>>>>>>> the same time! >>>>>>>>>>>>> >>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue since >>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>>>>> wait queue. >>>>>>>>>>>>> >>>>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>>>>> used. Handshaking with all threads won't make the data as >>>>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>>>>> will still be field instability. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>>>> >>>>>>>>>>>> I agree with this. 
>>>>>>>>>>>> The advantages are: >>>>>>>>>>>> ??- the result is stable >>>>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>>>> >>>>>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>>>>> I'm not sure if using handshakes can make performance better. >>>>>>>>>>> >>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>> >>>>>>>>>>> Yasumasa >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Serguei >>>>>>>>>>>> >>>>>>>>>>>>> Dan >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> David >>>>>>>>>>>>>> ----- >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The only way to make sure you don't have stale information is >>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc >>>>>>>>>>>>>>>>> should have more clear about the possibility of returning stale >>>>>>>>>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's being suspended so I can't see how this could be construed as an agent bug. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In your scenario above, you mention that the target thread was >>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>>>>>>>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent should not >>>>>>>>>>>>>>>>> resume the target thread while also calling GetObjectMonitorUsage(). >>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Perhaps. 
I was thinking more generally about an independent resume, but you're right that doesn't really make a lot of sense. But when the spec says nothing about suspension ... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> And it is intentional that suspension is not required. JVM/DI and JVM/PI >>>>>>>>>>>>>>> used to require suspension for these kinds of get-the-info APIs. JVM/TI >>>>>>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As I've said before, we could add a note about the data being potentially >>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like stat(2). You can >>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the info is current >>>>>>>>>>>>>>> by the time you process what you got back. Is it too much motherhood to >>>>>>>>>>>>>>> state that the data might be stale? I could go either way... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this to be fixed in the future without forcing/using any safepoints. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding talking about >>>>>>>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread whilst the monitor is queried. In effect the operation would create a per-thread safepoint. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I "know" that, but I still need time to think about it and probably >>>>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Semantically it is no different to the code actually suspending the owner thread, but it can't actually do that because suspends/resume don't nest. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we tracked internal and >>>>>>>>>>>>>>> external suspends separately. That was a nightmare... 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread state, it returns `true` when the thread is sleeping [3], or when it performs in native [4]. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI spec - it just says "Get information about the object's monitor." >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, nothing to do with the spec. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect information in some case. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner might be just before wakeup. >>>>>>>>>>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is called at safepoint in any case. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>>>>> [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ??owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem. 
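The race described above can be sketched with toy types. This is a minimal model, not HotSpot code: `ToyThread`, `ToyMonitor`, `ToyUsage`, `racy_query`, and `safepoint_query` are hypothetical names, and the `interleave` callback is an assumption used to make the window between "read the owner" and "check suspension" deterministic.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for a thread, a monitor, and the query result.
struct ToyThread  { int id; bool suspended; };
struct ToyMonitor { ToyThread* owner; };
struct ToyUsage   { int owner_id; bool needs_retry; };  // owner_id == -1: no owner

// Racy variant: the owner is read first and its suspension is checked later.
// `interleave` models whatever other threads do in between.
ToyUsage racy_query(const ToyMonitor& m, const std::function<void()>& interleave) {
  assert(static_cast<bool>(interleave));  // caller must supply a callable
  ToyThread* seen = m.owner;              // step 1: read the apparent owner
  interleave();                           // step 2: owner may release, then get suspended
  if (seen != nullptr && seen->suspended) {
    return ToyUsage{seen->id, false};     // step 3: trusts an owner that may be stale
  }
  return ToyUsage{-1, true};              // not suspended: caller should retry safely
}

// Safe variant: models a safepoint, where nothing else runs during the read.
ToyUsage safepoint_query(const ToyMonitor& m) {
  return ToyUsage{m.owner != nullptr ? m.owner->id : -1, false};
}
```

If the callback releases the monitor and then marks the former owner as suspended, `racy_query` reports that thread as the owner without asking for a retry, while `safepoint_query` reports no owner.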
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>>> > From hohensee at amazon.com Thu Jun 18 19:11:36 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Thu, 18 Jun 2020 19:11:36 +0000 Subject: [11u] RFR/A: 8231209: [REDO] JDK-8207266 ThreadMXBean::getThreadAllocatedBytes() can be quicker for self thread Message-ID: <09229907-C812-4974-BF5D-8619FEA083BC@amazon.com> This request is for a pair of backports. 8231209 is the first (and primary), 8231968 is a minor cleanup. There are CSR?s for both. The effect is to add a convenience method getCurrentThreadAllocatedBytes() to com.sun.management.ThreadMXBean that can be implemented more efficiently than the equivalent getThreadAllocatedBytes(long id), and to make the implementation of getThreadAllocatedBytes(long id) and getThreadAllocatedBytes(long[] id) more efficient. These methods are heavily used by heap profiling tools, including Amazon?s, and their efficiency is important to us. There is no effect on the TCK because com.sun.management is a platform-specific package. See the CSRs for more detail. The patches apply cleanly (in sequence) to 11u, but I?m posting a review/approval request because the backport CSRs need approval. Once the backport CSRs are reviewed, finalized, and approved, l can tag 8231209 and 8231968. 
Tested with hotspot/jtreg/vmTestbase/nsk/monitoring jdk/com/sun/management jdk/jdk/jfr/event/runtime The same tests pass/fail as with unpatched jdk11u-dev. JDK-8231209: [REDO] JDK-8207266 ThreadMXBean::getThreadAllocatedBytes() can be quicker for self thread Original RFE: https://bugs.openjdk.java.net/browse/JDK-8231209 Original Patch: https://hg.openjdk.java.net/jdk/jdk/rev/c29e49148be7 Original CSR: https://bugs.openjdk.java.net/browse/JDK-8231374 Original review thread: https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-September/029208.html Backport RFE: https://bugs.openjdk.java.net/browse/JDK-8247806 Backport CSR: https://bugs.openjdk.java.net/browse/JDK-8247807 JDK-8231968: getCurrentThreadAllocatedBytes default implementation s/b getThreadAllocatedBytes Original RFE: https://bugs.openjdk.java.net/browse/JDK-8231968 Original Patch: https://hg.openjdk.java.net/jdk/jdk/rev/5bb426e9acc4 Original CSR: https://bugs.openjdk.java.net/browse/JDK-8232072 Original review thread: https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-October/029659.html Backport RFE: https://bugs.openjdk.java.net/browse/JDK-8247809 Backport CSR: https://bugs.openjdk.java.net/browse/JDK-8247810 Thanks, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From coleen.phillimore at oracle.com Thu Jun 18 19:54:59 2020 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 18 Jun 2020 15:54:59 -0400 Subject: RFR 8247808: Move JVMTI strong oops to OopStorage In-Reply-To: <94da70e9-2c61-a97d-e31a-6e0ec4478faa@oracle.com> References: <94da70e9-2c61-a97d-e31a-6e0ec4478faa@oracle.com> Message-ID: <07a7aef1-61f2-aaee-aede-e76fa456e014@oracle.com> On 6/18/20 3:25 AM, Stefan Karlsson wrote: > Hi Coleen, > > On 2020-06-17 23:25, coleen.phillimore at oracle.com wrote: >> Summary: Remove JVMTI oops_do calls from JVMTI and GCs >> >> Tested with tier1-3, also built shenandoah to verify shenandoah changes. 
>> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2020/8247808.01/webrev > > https://cr.openjdk.java.net/~coleenp/2020/8247808.01/webrev/src/hotspot/share/prims/jvmtiImpl.cpp.udiff.html > > > ?JvmtiBreakpoint::~JvmtiBreakpoint() { > - if (_class_holder != NULL) { > - NativeAccess<>::oop_store(_class_holder, (oop)NULL); > - OopStorageSet::vm_global()->release(_class_holder); > + if (_class_holder.resolve() != NULL) { > + _class_holder.release(); > ?? } > ?} > > Could this be changed to peek() / release() instead? The resolve() > call is going to keep the object alive until next for ZGC marking cycle. Yes, makes sense. Fixed. > > The rest looks OK. > > Below are some comments about things that I find odd and non-obvious > from reading the code, and may be potentials for cleanups to make it > easier for the next to understand the code: > > The above code assumes that as soon as OopHandle::create has been > called, we won't store NULL into the _obj pointer. If someone does, > then we would leak the memory. OopHandle has a function ptr_raw, that > allows someone to clear the _obj pointer. I have to assume that this > function isn't used in this code. > > --- > > ?214 void JvmtiBreakpoint::copy(JvmtiBreakpoint& bp) { > ?215?? _method?? = bp._method; > ?216?? _bci????? = bp._bci; > 217 _class_holder = OopHandle::create(bp._class_holder.resolve()); > ?218 } > > This one looks odd, because the _class_holder is overwritten without > releasing the old OopHandle. This is currently OK, because copy is > only called from clone, which just created a new JvmtiBreakpoint: > > ? GrowableElement *clone()??????? { > ??? JvmtiBreakpoint *bp = new JvmtiBreakpoint(); > ??? bp->copy(*this); > ??? return bp; > ? } > > ?I think this would have been much more obvious if copy/clone were a > copy constructor. Yes, this would make more sense.? I don't know why this was implemented as clone. 
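The copy-constructor idea can be sketched with toy types. `ToyStorage` and `ToyBreakpoint` below are hypothetical stand-ins, not the real `OopStorage`, `OopHandle`, or `JvmtiBreakpoint`; the sketch only assumes that every copy allocates its own handle slot and releases that slot exactly once.

```cpp
#include <cassert>

// Hypothetical storage that counts live handle slots.
struct ToyStorage {
  int live = 0;
  int* allocate(int value) { ++live; return new int(value); }
  void release(int* slot) {
    assert(live > 0);  // releasing more slots than were allocated is a bug
    --live;
    delete slot;
  }
};

class ToyBreakpoint {
 public:
  ToyBreakpoint(ToyStorage& storage, int holder)
      : _storage(storage), _holder(storage.allocate(holder)) {}

  // Copy constructor: the copy gets a fresh slot holding the same value,
  // so the original and the copy never share a handle.
  ToyBreakpoint(const ToyBreakpoint& other)
      : _storage(other._storage),
        _holder(other._storage.allocate(*other._holder)) {}

  // Assignment is disabled so an existing slot can never be overwritten
  // without being released first.
  ToyBreakpoint& operator=(const ToyBreakpoint&) = delete;

  ~ToyBreakpoint() { _storage.release(_holder); }

  int holder() const { return *_holder; }

 private:
  ToyStorage& _storage;
  int* _holder;
};
```

Because construction is the only place a slot is acquired, there is no window in which a handle is overwritten without being released, which is the property that is only implicit in the copy()/clone() pair.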
> > With that said, it looks like we now have two JvmtiBreakpoints with > the same OopHandle contents. So, OopHandle::release will be called > twice. Now that works because release clears the oop value: > > inline void OopHandle::release() { > ? // Clear the OopHandle first > ? NativeAccess<>::oop_store(_obj, (oop)NULL); > ? OopStorageSet::vm_global()->release(_obj); > } > > and the resolve() != NULL check will prevent the OopHandle from being > released twice: > > + if (_class_holder.resolve() != NULL) { > + _class_holder.release(); > ?? } The release is called on the original JvmtiBreakpoint which has one OopHandle, and it's also called on the copy which has another, so release isn't called twice on the same OopHandle. That said, I had to walk through the code this morning and make sure that release is called on the copy of the JvmtiBreakpoint (it's called in remove() after the breakpoint is cleared.? The entire _bps array is not deleted). Thanks, Coleen > > StefanK > >> bug link https://bugs.openjdk.java.net/browse/JDK-8247808 >> >> Thanks, >> Coleen > From serguei.spitsyn at oracle.com Thu Jun 18 21:42:05 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Thu, 18 Jun 2020 14:42:05 -0700 Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information In-Reply-To: <83cd0686-d744-d3fe-eb79-49cdd89b99c6@oss.nttdata.com> References: <8dc2c3bc-226b-17d1-df20-49a256c29cd3@oracle.com> <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com> <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com> <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com> 
<1a2d56a6-624a-b889-6f71-d51bd972aa0d@oracle.com> <83cd0686-d744-d3fe-eb79-49cdd89b99c6@oss.nttdata.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From hohensee at amazon.com Thu Jun 18 21:52:16 2020
From: hohensee at amazon.com (Hohensee, Paul)
Date: Thu, 18 Jun 2020 21:52:16 +0000
Subject: RFR (S): 8245129: Enhance jstat gc option output and tests
In-Reply-To: <79438EAF-531A-488F-BED7-A619C3E227D5@amazon.com>
References: <79438EAF-531A-488F-BED7-A619C3E227D5@amazon.com>
Message-ID: <32795C52-3BFC-4277-8969-646113D1156B@amazon.com>

Ping. Any takers for this simple patch?

Thanks,
Paul

From: serviceability-dev on behalf of "Hohensee, Paul"
Date: Monday, May 18, 2020 at 8:25 AM
To: serviceability-dev
Subject: RFR (S): 8245129: Enhance jstat gc option output and tests

Please review an enhancement to the jstat gc option output to make the columns wider (for up to a 2TB heap) so one can read the output without going cross-eyed.

Bug: https://bugs.openjdk.java.net/browse/JDK-8245129
Webrev: http://cr.openjdk.java.net/~phh/8245129/webrev.00/

I added tests using ParallelGC since the output can differ for non-G1 collectors. Successfully ran the test/hotspot/jtreg/serviceability/tmtools/jstat and test/jdk/sun/tools/jstat tests. A submit repo run had one failure

runtime/MemberName/MemberNameLeak.java tier1 macosx-x64-debug

but rerunning it on my laptop succeeded, and there's no connection between this test and my patch.

Thanks,
Paul

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From chris.plummer at oracle.com Thu Jun 18 22:55:44 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Thu, 18 Jun 2020 15:55:44 -0700 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> Message-ID: <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> On 6/18/20 1:43 AM, David Holmes wrote: > On 18/06/2020 4:49 pm, Chris Plummer wrote: >> On 6/17/20 10:29 PM, David Holmes wrote: >>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>> Hi Chris, >>>>>>> >>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>> Hello, >>>>>>>> >>>>>>>> Please help review the following: >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>> >>>>>>>> The CR contains all the needed details. Here's a summary of >>>>>>>> changes in each file: >>>>>>> >>>>>>> The problem sounds to me like a variation of the more general >>>>>>> problem of not ensuring a thread is kept alive whilst acting >>>>>>> upon it. I don't know how the SA finds these references to the >>>>>>> threads it is going to stackwalk, but is it possible to fix this >>>>>>> via appropriate uses of ThreadsListHandle/Iterator? >>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>> >>>>>> Keep in mind that once SA attaches, nothing in the VM changes. >>>>>> For example, SA can't create a wrapper to a JavaThread, only to >>>>>> have the JavaThread be freed later on. 
It's just not possible. >>>>> >>>>> Then how does it obtain a reference to a JavaThread for which the >>>>> native OS thread id is invalid? Any thread found in >>>>> _java_thread_list is either live or still to be started. In the >>>>> latter case the JavaThread->osThread does not have its thread_id >>>>> set yet. >>>>> >>>> My assumption was that the JavaThread is in the process of being >>>> destroyed, and it has freed its OS thread but is itself still in >>>> the thread list. I did notice that the OS thread id being used >>>> looked to be in the range of thread id #'s you would expect for the >>>> running app, so that to me indicated it was once valid, but is no >>>> more. >>>> >>>> Keep in mind that although hotspot may have synchronization code >>>> that prevents you from pulling a JavaThread off the thread list >>>> when it is in the process of being destroyed (I'm guessing it >>>> does), SA has no such protections. >>> >>> But you stated that once the SA has attached, the target VM can't >>> change. If the SA gets its set of thread from one attach then tries >>> to make queries about those threads in a separate attach, then >>> obviously it could be providing garbage thread information. So you >>> would need to re-validate the JavaThread in the target VM before >>> trying to do anything with it. >> That's not what is going on here. It's attaching and doing a stack >> trace, which involves getting the thread list and iterating through >> all threads without detaching. > > Okay so I restate my original comment - all the JavaThreads must be > alive or not yet started, so how are you encountering an invalid > thread id? Any thread you find via the ThreadsList can't have > destroyed its osThread. In any case the logic should be checking > thread->osThread() for NULL, and then osThread()->get_state() to > ensure it is >= INITIALIZED before using the thread_id(). 
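A minimal sketch of the guard suggested above, written against toy stand-in types rather than the real HotSpot or SA classes. `ToyJavaThread`, `ToyOSThread`, `ToyThreadState`, and `usable_thread_id` are hypothetical names, and the enum ordering is an assumption made for illustration; the only things taken from the discussion are the NULL check on the OS thread and the `>= INITIALIZED` state comparison.

```cpp
#include <cassert>

// Hypothetical thread states; assumed to be ordered by lifecycle stage.
enum ToyThreadState { ALLOCATED = 0, INITIALIZED = 1, RUNNABLE = 2, ZOMBIE = 3 };

struct ToyOSThread {
  ToyThreadState state;
  int thread_id;  // only meaningful once state >= INITIALIZED
};

struct ToyJavaThread {
  ToyOSThread* osthread;  // may be null in this toy model
};

// Writes the OS thread id to *id_out and returns true only when the id
// is safe to use; otherwise leaves *id_out untouched and returns false.
bool usable_thread_id(const ToyJavaThread& thread, int* id_out) {
  assert(id_out != nullptr);
  if (thread.osthread == nullptr) {
    return false;  // no OS thread attached
  }
  if (thread.osthread->state < INITIALIZED) {
    return false;  // thread not started yet, so the id has not been set
  }
  *id_out = thread.osthread->thread_id;
  return true;
}
```

A caller would skip register reads (or print a warning) whenever `usable_thread_id` returns false, instead of handing an unset id to the debugger backend.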
Hi David,

I chatted with Dan about this, and he said since the JavaThread is responsible for removing itself from the ThreadList, it is impossible to have a JavaThread still on the ThreadList, but without an underlying OS Thread. So I'm a bit perplexed as to how I can find a JavaThread on the ThreadList, but that results in ESRCH when trying to access the thread with ptrace. My only conclusion is that this failure is somehow spurious, and maybe the issue is just that the thread is in some temporary state that prevents its access. If so, I still think the approach I'm taking is the correct one, but the comments should be updated.

I had one other finding. When this issue first turned up, it prevented the thread from getting a stack trace due to the exception being thrown. What I hadn't realized is that after fixing it to not throw an exception, which resulted in the stack walking code getting all nulls for register values, I actually started to see a stack trace printed:

"JLine terminal non blocking reader thread" #26 daemon prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_in_native
WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp (8089)
CurrentFrameGuess: choosing last Java frame: sp = 0x00007f125f0f4770, fp = 0x00007f125f0f47c0
 - java.io.FileInputStream.read0() @bci=0 (Interpreted frame)
 - java.io.FileInputStream.read() @bci=1, line=223 (Interpreted frame)
 - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() @bci=108, line=216 (Interpreted frame)
 - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() @bci=4 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=832 (Interpreted frame)

The "CurrentFrameGuess" output is some debug tracing I had enabled, and it indicates that the stack walking code is using the "last java frame" setting, which it will do if current register values don't indicate a valid frame (as would be the case if sp was null). I had previously assumed that without an underlying valid LWP, there would be no stack trace. Given that there is one, there must be a valid LWP. Otherwise I don't see how the stack could have been walked. That's another indication that the ptrace failure is spurious in nature.

thanks,

Chris

>
> Cheers,
> David
> -----
>
>> Also, even if you are using something like clhsdb to issue commands
>> on addresses, if the address is no longer valid for the command you
>> are executing, then you would get the appropriate error when there is
>> an attempt to create a wrapper for it. I don't know of any command
>> that operates directly on a JavaThread, but I think there are for
>> InstanceKlass. So if you remembered the address of an InstanceKlass,
>> and then reattached and tried a command that takes an InstanceKlass
>> address, you would get an exception when SA tries to create the
>> wrapper for the InstanceKlass if it were no longer a valid address for
>> one.
>> >> Chris >>> >>> David >>> ----- >>> >>>> Chris >>>>> David >>>>> ----- >>>>> >>>>>> Chris >>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> >>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>> >>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>> >>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>> -Instead of throwing an exception when the OS ThreadID is >>>>>>>> invalid, print a warning. >>>>>>>> >>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>> -Improve a print_debug message >>>>>>>> >>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>> >>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>> >>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>> >>>>>>>> -Deal with the array of registers read in being null due to the >>>>>>>> OS ThreadID not being valid. >>>>>>>> >>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>>> >>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>>> >>>>>>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" >>>>>>>> appearing twice when printing the exception. >>>>>>>> >>>>>>>> thanks, >>>>>>>> >>>>>>>> Chris >>>>>> >>>> >> From jonathan.gibbons at oracle.com Thu Jun 18 23:00:11 2020 From: jonathan.gibbons at oracle.com (Jonathan Gibbons) Date: Thu, 18 Jun 2020 16:00:11 -0700 Subject: RFR: [15,docs] JDK-8247894,Invalid @see in java.management Message-ID: <9e96f08b-17c2-fce3-5f95-29eb31d6287b@oracle.com> Please review a trivial fix for an invalid @see tag in java/lang/management/package.html. The presumed intent of the original is not a supported variant. The fix is to remove the `{@linkplain ...}` wrapper. 
-- Jon

JBS: https://bugs.openjdk.java.net/browse/JDK-8247894

Patch inline:

diff -r cf0df75c75c1 src/java.management/share/classes/java/lang/management/package.html
--- a/src/java.management/share/classes/java/lang/management/package.html Thu Jun 18 14:07:49 2020 -0700
+++ b/src/java.management/share/classes/java/lang/management/package.html Thu Jun 18 15:51:09 2020 -0700
@@ -234,7 +234,7 @@

 The java.lang.management API is thread-safe.

-@see {@linkplain javax.management JMX Specification}
+@see javax.management JMX Specification

 @author Mandy Chung
 @since 1.5

From joe.darcy at oracle.com Thu Jun 18 23:12:43 2020
From: joe.darcy at oracle.com (Joe Darcy)
Date: Thu, 18 Jun 2020 16:12:43 -0700
Subject: RFR: [15,docs] JDK-8247894,Invalid @see in java.management
In-Reply-To: <9e96f08b-17c2-fce3-5f95-29eb31d6287b@oracle.com>
References: <9e96f08b-17c2-fce3-5f95-29eb31d6287b@oracle.com>
Message-ID:

+1

-Joe

On 6/18/2020 4:00 PM, Jonathan Gibbons wrote:
> Please review a trivial fix for an invalid @see tag in
> java/lang/management/package.html.
> The presumed intent of the original is not a supported variant. The
> fix is to remove the
> `{@linkplain ...}` wrapper.
>
> -- Jon
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8247894
>
> Patch inline:
>
> diff -r cf0df75c75c1 src/java.management/share/classes/java/lang/management/package.html
> --- a/src/java.management/share/classes/java/lang/management/package.html Thu Jun 18 14:07:49 2020 -0700
> +++ b/src/java.management/share/classes/java/lang/management/package.html Thu Jun 18 15:51:09 2020 -0700
> @@ -234,7 +234,7 @@
>
>  The java.lang.management API is thread-safe.
>
> -@see {@linkplain javax.management JMX Specification}
> +@see javax.management JMX Specification
>
>  @author Mandy Chung
>  @since 1.5
>

From mandy.chung at oracle.com Thu Jun 18 23:16:04 2020
From: mandy.chung at oracle.com (Mandy Chung)
Date: Thu, 18 Jun 2020 16:16:04 -0700
Subject: RFR: [15,docs] JDK-8247894,Invalid @see in java.management
In-Reply-To: <9e96f08b-17c2-fce3-5f95-29eb31d6287b@oracle.com>
References: <9e96f08b-17c2-fce3-5f95-29eb31d6287b@oracle.com>
Message-ID: <6d2f9e79-fa67-33f0-ad31-f84001174057@oracle.com>

+1

Mandy

On 6/18/20 4:00 PM, Jonathan Gibbons wrote:
>
> Please review a trivial fix for an invalid @see tag in
> java/lang/management/package.html.
> The presumed intent of the original is not a supported variant. The
> fix is to remove the
> `{@linkplain ...}` wrapper.
>
> -- Jon
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8247894
>
> Patch inline:
>
> diff -r cf0df75c75c1 src/java.management/share/classes/java/lang/management/package.html
> --- a/src/java.management/share/classes/java/lang/management/package.html Thu Jun 18 14:07:49 2020 -0700
> +++ b/src/java.management/share/classes/java/lang/management/package.html Thu Jun 18 15:51:09 2020 -0700
> @@ -234,7 +234,7 @@
>
>  The java.lang.management API is thread-safe.
>
> -@see {@linkplain javax.management JMX Specification}
> +@see javax.management JMX Specification
>
>  @author Mandy Chung
>  @since 1.5
>

From serguei.spitsyn at oracle.com Thu Jun 18 23:23:58 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Thu, 18 Jun 2020 16:23:58 -0700
Subject: RFR: [15,docs] JDK-8247894,Invalid @see in java.management
In-Reply-To: <6d2f9e79-fa67-33f0-ad31-f84001174057@oracle.com>
References: <9e96f08b-17c2-fce3-5f95-29eb31d6287b@oracle.com> <6d2f9e79-fa67-33f0-ad31-f84001174057@oracle.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From suenaga at oss.nttdata.com Fri Jun 19 00:01:27 2020
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Fri, 19 Jun 2020 09:01:27 +0900
Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information
In-Reply-To:
References: <0924b746-9d8e-415a-c484-dd890eda0e3f@oracle.com> <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com> <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com> <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com> <1a2d56a6-624a-b889-6f71-d51bd972aa0d@oracle.com> <83cd0686-d744-d3fe-eb79-49cdd89b99c6@oss.nttdata.com>
Message-ID:

Thanks Serguei!

I fixed them, and the change works fine on my laptop with serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64. I will push it later.

Yasumasa

On 2020/06/19 6:42, serguei.spitsyn at oracle.com wrote:
> Hi Yasumasa,
>
> This looks good, nice simplification.
>
> A couple of minor comments.
> > http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/src/hotspot/share/prims/jvmtiEnvBase.cpp.frames.html > > 995 ThreadsListHandle tlh(current_thread); 1052 ThreadsListHandle tlh(current_thread); > > We can share one tlh for both fragments. > > 942 HandleMark hm; 1051 HandleMark hm; > > The second HandleMark is not needed. > Also, we can use current_thread in the first one: > > HandleMark hm(current_thread); > > > I do not need to see another webrev if you fix the above. > > Thanks, > Serguei > > > On 6/18/20 07:56, Yasumasa Suenaga wrote: >> Hi Daniel, >> >> On 2020/06/18 23:38, Daniel D. Daugherty wrote: >>> On 6/18/20 5:07 AM, Yasumasa Suenaga wrote: >>>> On 2020/06/18 17:36, David Holmes wrote: >>>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote: >>>>>> Hi David, >>>>>> >>>>>> Both ThreadsListHandle and ResourceMarks would use `Thread::current()` for their resource. It is set as default parameter in c'tor. >>>>>> Do you mean we should it explicitly in c'tor? >>>>> >>>>> Yes pass current_thread so we don't do the additional unnecessary calls to Thread::current(). >>>> >>>> Ok, I've fixed them. Could you review again? >>>> >>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/ >>> >>> src/hotspot/share/prims/jvmtiEnv.cpp >>> ???? L2842: ? // It need to perform at safepoint for gathering stable data >>> ???????? Perhaps: >>> ????????????? // This needs to be performed at a safepoint to gather stable data >> >> I will change it before pushing. >> >> >>> src/hotspot/share/prims/jvmtiEnvBase.cpp >>> ???? No comments. >>> >>> Thumbs up. >>> >>> What testing has been done on this fix? >> >> I tested this change on serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64. >> >> >>> Also, please wait to hear from Serguei on this fix... >> >> Ok. 
>> >> >> Thanks, >> >> Yasumasa >> >> >>> Dan >>> >>> >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>>> David >>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Yasumasa >>>>>> >>>>>> >>>>>> On 2020/06/18 13:58, David Holmes wrote: >>>>>>> Hi Yasumasa, >>>>>>> >>>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote: >>>>>>>> Hi Serguei, >>>>>>>> >>>>>>>> Thanks for your comment! >>>>>>>> I uploaded new webrev: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/ >>>>>>>> >>>>>>>> I'm not sure the following change is correct. >>>>>>>> Can we assume owning_thread is not NULL at safepoint? >>>>>>> >>>>>>> We can if "owner != NULL". So that change seem fine to me. >>>>>>> >>>>>>> But given this is now only executed at a safepoint there are additional simplifications that can be made: >>>>>>> >>>>>>> - current thread determination can be simplified: >>>>>>> >>>>>>> 945?? Thread* current_thread = Thread::current(); >>>>>>> >>>>>>> becomes: >>>>>>> >>>>>>> ??? Thread* current_thread = VMThread::vm_thread(); >>>>>>> ??? assert(current_thread == Thread::current(), "must be"); >>>>>>> >>>>>>> - these comments can be removed >>>>>>> >>>>>>> ??994?????? // Use current thread since function can be called from a >>>>>>> ??995?????? // JavaThread or the VMThread. >>>>>>> 1053?????? // Use current thread since function can be called from a >>>>>>> 1054?????? // JavaThread or the VMThread. >>>>>>> >>>>>>> - these TLH constructions should be passing current_thread (existing bug) >>>>>>> >>>>>>> 996?????? ThreadsListHandle tlh; >>>>>>> 1055?????? ThreadsListHandle tlh; >>>>>>> >>>>>>> - All ResourceMarks should be passing current_thread (existing bug) >>>>>>> >>>>>>> >>>>>>> Aside: there is a major inconsistency between the spec and implementation for this method. I've traced the history to see how this came about from JVMDI (ref JDK-4546581) but it never resulted in the JVM TI specification clearly stating what the waiters/waiter_count means. 
I will file a bug to have the spec clarified to match the implementation (even though I think the implementation is what is wrong). :( >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> All tests on submit repo and serviceability/jvmti and vmTestbase/nsk/jvmti have been passed with this change. >>>>>>>> >>>>>>>> >>>>>>>> ``` >>>>>>>> ??????? // This monitor is owned so we have to find the owning JavaThread. >>>>>>>> ??????? owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>> -????? // Cannot assume (owning_thread != NULL) here because this function >>>>>>>> -????? // may not have been called at a safepoint and the owning_thread >>>>>>>> -????? // might not be suspended. >>>>>>>> -????? if (owning_thread != NULL) { >>>>>>>> -??????? // The monitor's owner either has to be the current thread, at safepoint >>>>>>>> -??????? // or it has to be suspended. Any of these conditions will prevent both >>>>>>>> -??????? // contending and waiting threads from modifying the state of >>>>>>>> -??????? // the monitor. >>>>>>>> -??????? if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) { >>>>>>>> -????????? // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED >>>>>>>> -????????? // will not make it back to the JVM/TI agent. The error code will >>>>>>>> -????????? // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which >>>>>>>> -????????? // will retry the call via a VM_GetObjectMonitorUsage VM op. >>>>>>>> -????????? return JVMTI_ERROR_THREAD_NOT_SUSPENDED; >>>>>>>> -??????? } >>>>>>>> -??????? HandleMark hm; >>>>>>>> +????? assert(owning_thread != NULL, "owning JavaThread must not be NULL"); >>>>>>>> ????????? Handle???? th(current_thread, owning_thread->threadObj()); >>>>>>>> ????????? 
ret.owner = (jthread)jni_reference(calling_thread, th); >>>>>>>> >>>>>>>> ``` >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Yasumasa >>>>>>>> >>>>>>>> >>>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote: >>>>>>>>> Hi Yasumasa, >>>>>>>>> >>>>>>>>> This fix is not enough. >>>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two modes: in VMop and non-VMop. >>>>>>>>> The non-VMop mode has to be removed. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Serguei >>>>>>>>> >>>>>>>>> >>>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote: >>>>>>>>>> (Change subject for RFR) >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I filed it to JBS and upload a webrev for it. >>>>>>>>>> Could you review it? >>>>>>>>>> >>>>>>>>>> ? JBS: https://bugs.openjdk.java.net/browse/JDK-8247729 >>>>>>>>>> ? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/ >>>>>>>>>> >>>>>>>>>> This change has passed tests on submit repo. >>>>>>>>>> Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Yasumasa >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>> Yes. It seems we have a consensus. >>>>>>>>>>> Thank you for taking care about it. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Serguei >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 6/16/20 18:34, David Holmes wrote: >>>>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>>> >>>>>>>>>>>> Go for it! :) >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> David >>>>>>>>>>>> >>>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>> Hi Dan, David and Yasumasa, >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. 
Daugherty wrote: >>>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>> Hi Dan, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is not suspended: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>>>>>>>>>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>>>>>>>>>>>>>>>>>>>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>>>>>>>>>>>> ???? // Some of the critical threads were not suspended. go to a safepoint and try again >>>>>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op); >>>>>>>>>>>>>>>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>>>>>>>>>>>>>>> ?? } >>>>>>>>>>>>>>>>>>>>>>>> ?? 
return err; >>>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage(). >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently. >>>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be changed to others in subsequent code. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor before [2]. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume unfortunately after suspending check. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Which code is wrong? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller >>>>>>>>>>>>>>>>>>>>> has started the process of gathering the information while not at a >>>>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects data about a monitor owned by a thread that is suspended, or else it collects that data at a safepoint. But the owning thread can be resumed just after the code determined it was suspended. 
The monitor can then be released and the information gathered not only stale but potentially completely wrong as it could now be owned by a different thread and will report that thread's entry count. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information >>>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation returns >>>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the target thread >>>>>>>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. But the expectation is that the information was actually an accurate snapshot of the state of the monitor at some point in time. The current code does not ensure that. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Please explain. I clearly don't understand why you think the info >>>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the monitor >>>>>>>>>>>>>>>>> at some point in time". >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no atomicity**. The reported owner thread may not own it any longer when the entry count is read, so straight away you may have the wrong entry count information. The set of threads trying to acquire the monitor, or wait on the monitor can change in unexpected ways. It would be possible for instance to report the same thread as being the owner, being blocked trying to enter the monitor, and being in the wait-set of the monitor - apparently all at the same time! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ** even if the owner is suspended we don't have complete atomicity because threads can join the set of threads trying to enter the monitor (unless they are all suspended). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ suspended: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ? 
- GetObjectMonitorUsage() uses a safepoint to gather the info about >>>>>>>>>>>>>>> ??? the object's monitor. Since we're at a safepoint, the info that >>>>>>>>>>>>>>> ??? we are gathering cannot change until we return from the safepoint. >>>>>>>>>>>>>>> ??? It is a snapshot and a valid one at that. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() will gather info about the object's >>>>>>>>>>>>>>> ??? monitor while _not_ at a safepoint. Assuming that no other >>>>>>>>>>>>>>> ??? thread is suspended, then entry_count can change because >>>>>>>>>>>>>>> ??? another thread can block on entry while we are gathering >>>>>>>>>>>>>>> ??? info. waiter_count and waiters can change if a thread was >>>>>>>>>>>>>>> ??? in a timed wait that has timed out and now that thread is >>>>>>>>>>>>>>> ??? blocked on re-entry. I don't think that notify_waiter_count >>>>>>>>>>>>>>> ??? and notify_waiters can change. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ??? So in this case, the owner info and notify info is stable, >>>>>>>>>>>>>>> ??? but the entry_count and waiter info is not stable. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Consider the case when the monitor is not owned: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() will start to gather info about the >>>>>>>>>>>>>>> ??? object's monitor while _not_ at a safepoint. If it finds a >>>>>>>>>>>>>>> ??? thread on the entry queue that is not suspended, then it will >>>>>>>>>>>>>>> ??? bail out and redo the info gather at a safepoint. I just >>>>>>>>>>>>>>> ??? noticed that it doesn't check for suspension for the threads >>>>>>>>>>>>>>> ??? on the waiters list so a timed Object.wait() call can cause >>>>>>>>>>>>>>> ??? some confusion here. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ??? So in this case, the owner info is not stable if a thread >>>>>>>>>>>>>>> ??? comes out of a timed wait and reenters the monitor. This >>>>>>>>>>>>>>> ??? 
case is no different than if a "barger" thread comes in >>>>>>>>>>>>>>> ??? after the NULL owner field is observed and enters the >>>>>>>>>>>>>>> ??? monitor. We'll return that there is no owner, a list of >>>>>>>>>>>>>>> ??? suspended pending entry thread and a list of waiting >>>>>>>>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>>>>>>>> ??? owned by the "barger" that completely bypassed the entry >>>>>>>>>>>>>>> ??? queue by virtue of seeing the NULL owner field at exactly >>>>>>>>>>>>>>> ??? the right time. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So the owner field is only stable when we have an owner. If >>>>>>>>>>>>>>> that owner is not suspended, then the other fields are also >>>>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the >>>>>>>>>>>>>>> owner is suspended, then the owner and notify info is stable, >>>>>>>>>>>>>>> but the entry_count and waiter info is not stable. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>>>>>>>>>>> do that at a safepoint when the owner thread is not suspended >>>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>>>>>>>> the different pieces of info is unstable to varying degrees. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As for this claim: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It would be possible for instance to report the same thread >>>>>>>>>>>>>>>> as being the owner, being blocked trying to enter the monitor, >>>>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>>>>>>>>>>> the same time! 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue since >>>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>>>>>>> wait queue. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>>>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>>>>>>> used. Handshaking with all threads won't make the data as >>>>>>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>>>>>>> will still be field instability. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I agree with this. >>>>>>>>>>>>>> The advantages are: >>>>>>>>>>>>>> ??- the result is stable >>>>>>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>>>>>> >>>>>>>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>>>>>>> I'm not sure if using handshakes can make performance better. >>>>>>>>>>>>> >>>>>>>>>>>>> Ok, may I file it to JBS and fix it? 
>>>>>>>>>>>>> >>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale information is >>>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc >>>>>>>>>>>>>>>>>>> should have more clear about the possibility of returning stale >>>>>>>>>>>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's being suspended so I can't see how this could be construed as an agent bug. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> In your scenario above, you mention that the target thread was >>>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>>>>>>>>>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent should not >>>>>>>>>>>>>>>>>>> resume the target thread while also calling GetObjectMonitorUsage(). >>>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an independent resume, but you're right that doesn't really make a lot of sense. But when the spec says nothing about suspension ... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> And it is intentional that suspension is not required. JVM/DI and JVM/PI >>>>>>>>>>>>>>>>> used to require suspension for these kinds of get-the-info APIs. 
JVM/TI >>>>>>>>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> As I've said before, we could add a note about the data being potentially >>>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like stat(2). You can >>>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the info is current >>>>>>>>>>>>>>>>> by the time you process what you got back. Is it too much motherhood to >>>>>>>>>>>>>>>>> state that the data might be stale? I could go either way... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this to be fixed in the future without forcing/using any safepoints. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding talking about >>>>>>>>>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread whilst the monitor is queried. In effect the operation would create a per-thread safepoint. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I "know" that, but I still need time to think about it and probably >>>>>>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Semantically it is no different to the code actually suspending the owner thread, but it can't actually do that because suspends/resume don't nest. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we tracked internal and >>>>>>>>>>>>>>>>> external suspends separately. That was a nightmare... 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread state, it returns `true` when the thread is sleeping [3], or when it performs in native [4]. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI spec - it just says "Get information about the object's monitor." >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, nothing to do with the spec. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect information in some case. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner might be just before wakeup. >>>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is called at safepoint in any case. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation. 
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>>>>>>> [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> ??owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem. 
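The release-then-suspend window described in this paragraph can be modelled with a toy, single-threaded sketch. Everything here is hypothetical and invented for illustration (`ToyThread`, `ToyMonitor`, `ToyResult`, `query_without_safepoint`, and the `interleaving` hook); it is not HotSpot code, and it only assumes that the owner is read first and its suspension state is checked afterwards.

```cpp
#include <cassert>

// Toy stand-ins for illustration only. These are NOT HotSpot types.
struct ToyThread {
  int id;
  bool suspended;
};

struct ToyMonitor {
  int owner_id;  // id of the owning thread, -1 when unowned
};

struct ToyResult {
  bool retry_at_safepoint;  // true when the query must be redone at a safepoint
  int reported_owner;       // meaningful only when retry_at_safepoint is false
};

// Step 1 loads the owner, step 2 lets other threads run (the hook), and
// step 3 checks whether the thread seen in step 1 is suspended. If it is,
// the value loaded in step 1 is trusted without being re-read.
ToyResult query_without_safepoint(ToyMonitor& m, ToyThread& candidate,
                                  void (*interleaving)(ToyMonitor&, ToyThread&)) {
  int seen_owner = m.owner_id;
  interleaving(m, candidate);
  if (seen_owner == candidate.id && !candidate.suspended) {
    return ToyResult{true, -1};
  }
  return ToyResult{false, seen_owner};
}

// The owner releases the monitor and only then gets suspended.
void release_then_suspend(ToyMonitor& m, ToyThread& t) {
  m.owner_id = -1;
  t.suspended = true;
}

// Nothing else runs between the load and the suspension check.
void nothing_runs(ToyMonitor&, ToyThread&) {}
```

With `release_then_suspend` as the interleaving, the sketch reports thread 1 as the owner although the monitor is already free; with `nothing_runs`, an unsuspended owner is detected and the query asks for a retry at a safepoint.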
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973
>>>>>>>>>>>>>>>>>>>>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>
>

From serguei.spitsyn at oracle.com Fri Jun 19 00:22:48 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Thu, 18 Jun 2020 17:22:48 -0700
Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information
In-Reply-To:
References: <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com>
 <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com>
 <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com>
 <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com>
 <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com>
 <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com>
 <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com>
 <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com>
 <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com>
 <1a2d56a6-624a-b889-6f71-d51bd972aa0d@oracle.com>
 <83cd0686-d744-d3fe-eb79-49cdd89b99c6@oss.nttdata.com>
Message-ID:

Hi Yasumasa,

It would be even safer to run the JDI tests as well.
The ObjectReference methods owningThread(), waitingThreads() and
entryCount() are based on this JVMTI function.
See:
https://docs.oracle.com/en/java/javase/14/docs/api/jdk.jdi/com/sun/jdi/ObjectReference.html

Thanks,
Serguei


On 6/18/20 17:01, Yasumasa Suenaga wrote:
> Thanks Serguei!
>
> I fixed them, and the change works fine on my laptop with
> serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64.
> I will push it later.
>
>
> Yasumasa
>
>
> On 2020/06/19 6:42, serguei.spitsyn at oracle.com wrote:
>> Hi Yasumasa,
>>
>> This looks good, nice simplification.
>>
>> A couple of minor comments.
>>
>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/src/hotspot/share/prims/jvmtiEnvBase.cpp.frames.html
>>
>>
>> 995 ThreadsListHandle tlh(current_thread);
>> 1052 ThreadsListHandle tlh(current_thread);
>>
>> We can share one tlh for both fragments.
>>
>> 942 HandleMark hm;
>> 1051 HandleMark hm;
>>
>> The second HandleMark is not needed.
>> Also, we can use current_thread in the first one:
>>
>> HandleMark hm(current_thread);
>>
>>
>> I do not need to see another webrev if you fix the above.
>>
>> Thanks,
>> Serguei
>>
>>
>> On 6/18/20 07:56, Yasumasa Suenaga wrote:
>>> Hi Daniel,
>>>
>>> On 2020/06/18 23:38, Daniel D. Daugherty wrote:
>>>> On 6/18/20 5:07 AM, Yasumasa Suenaga wrote:
>>>>> On 2020/06/18 17:36, David Holmes wrote:
>>>>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Both ThreadsListHandle and ResourceMarks would use
>>>>>>> `Thread::current()` for their resource. It is set as the default
>>>>>>> parameter in the c'tor.
>>>>>>> Do you mean we should pass it explicitly in the c'tor?
>>>>>>
>>>>>> Yes pass current_thread so we don't do the additional unnecessary
>>>>>> calls to Thread::current().
>>>>>
>>>>> Ok, I've fixed them. Could you review again?
>>>>>
>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/
>>>>
>>>> src/hotspot/share/prims/jvmtiEnv.cpp
>>>>      L2842:   // It need to perform at safepoint for gathering stable data
>>>>          Perhaps:
>>>>               // This needs to be performed at a safepoint to gather stable data
>>>
>>> I will change it before pushing.
>>>
>>>
>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp
>>>>      No comments.
>>>>
>>>> Thumbs up.
>>>>
>>>> What testing has been done on this fix?
>>>
>>> I tested this change on serviceability/jvmti and
>>> vmTestbase/nsk/jvmti on Linux x64.
>>>
>>>
>>>> Also, please wait to hear from Serguei on this fix...
>>>
>>> Ok.
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>> David
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2020/06/18 13:58, David Holmes wrote:
>>>>>>>> Hi Yasumasa,
>>>>>>>>
>>>>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>>>>>> Hi Serguei,
>>>>>>>>>
>>>>>>>>> Thanks for your comment!
>>>>>>>>> I uploaded a new webrev:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>>>>>
>>>>>>>>> I'm not sure the following change is correct.
>>>>>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>>>>>
>>>>>>>> We can if "owner != NULL". So that change seems fine to me.
>>>>>>>>
>>>>>>>> But given this is now only executed at a safepoint there are
>>>>>>>> additional simplifications that can be made:
>>>>>>>>
>>>>>>>> - current thread determination can be simplified:
>>>>>>>>
>>>>>>>> 945   Thread* current_thread = Thread::current();
>>>>>>>>
>>>>>>>> becomes:
>>>>>>>>
>>>>>>>>     Thread* current_thread = VMThread::vm_thread();
>>>>>>>>     assert(current_thread == Thread::current(), "must be");
>>>>>>>>
>>>>>>>> - these comments can be removed
>>>>>>>>
>>>>>>>>   994       // Use current thread since function can be called from a
>>>>>>>>   995       // JavaThread or the VMThread.
>>>>>>>> 1053       // Use current thread since function can be called from a
>>>>>>>> 1054       // JavaThread or the VMThread.
>>>>>>>>
>>>>>>>> - these TLH constructions should be passing current_thread
>>>>>>>> (existing bug)
>>>>>>>>
>>>>>>>> 996       ThreadsListHandle tlh;
>>>>>>>> 1055       ThreadsListHandle tlh;
>>>>>>>>
>>>>>>>> - All ResourceMarks should be passing current_thread (existing
>>>>>>>> bug)
>>>>>>>>
>>>>>>>>
>>>>>>>> Aside: there is a major inconsistency between the spec and
>>>>>>>> implementation for this method. I've traced the history to see
>>>>>>>> how this came about from JVMDI (ref JDK-4546581) but it never
>>>>>>>> resulted in the JVM TI specification clearly stating what the
>>>>>>>> waiters/waiter_count means. I will file a bug to have the spec
>>>>>>>> clarified to match the implementation (even though I think the
>>>>>>>> implementation is what is wrong). :(
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> All tests on submit repo and serviceability/jvmti and
>>>>>>>>> vmTestbase/nsk/jvmti have passed with this change.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>>         // This monitor is owned so we have to find the owning JavaThread.
>>>>>>>>>         owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>>>>> -      // Cannot assume (owning_thread != NULL) here because this function
>>>>>>>>> -      // may not have been called at a safepoint and the owning_thread
>>>>>>>>> -      // might not be suspended.
>>>>>>>>> -      if (owning_thread != NULL) {
>>>>>>>>> -        // The monitor's owner either has to be the current thread, at safepoint
>>>>>>>>> -        // or it has to be suspended. Any of these conditions will prevent both
>>>>>>>>> -        // contending and waiting threads from modifying the state of
>>>>>>>>> -        // the monitor.
>>>>>>>>> -        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>>>>>> -          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>>>>>> -          // will not make it back to the JVM/TI agent. The error code will
>>>>>>>>> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
>>>>>>>>> -          // will retry the call via a VM_GetObjectMonitorUsage VM op.
>>>>>>>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>>>>>> -        }
>>>>>>>>> -        HandleMark hm;
>>>>>>>>> +      assert(owning_thread != NULL, "owning JavaThread must not be NULL");
>>>>>>>>>           Handle     th(current_thread, owning_thread->threadObj());
>>>>>>>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>
>>>>>>>>>> This fix is not enough.
>>>>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in
>>>>>>>>>> two modes: in VMop and non-VMop.
>>>>>>>>>> The non-VMop mode has to be removed.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Serguei
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>>>>>> (Change subject for RFR)
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I filed it to JBS and uploaded a webrev for it.
>>>>>>>>>>> Could you review it?
>>>>>>>>>>>
>>>>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>>>>>
>>>>>>>>>>> This change has passed tests on submit repo.
>>>>>>>>>>> Also I tested it with serviceability/jvmti and
>>>>>>>>>>> vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>>>>>> Thank you for taking care of it.
>>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Serguei >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 6/16/20 18:34, David Holmes wrote: >>>>>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>>>> >>>>>>>>>>>>> Go for it! :) >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> David >>>>>>>>>>>>> >>>>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>>>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>>> Hi Dan, David and Yasumasa, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>> Hi Dan, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I wonder why >>>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) >>>>>>>>>>>>>>>>>>>>>>>>>> does not perform at safepoint. 
>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if >>>>>>>>>>>>>>>>>>>>>>>>> the target is not suspended: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject >>>>>>>>>>>>>>>>>>>>>>>>> object, jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>>>>>>>>>>>>> ?? JavaThread* calling_thread = >>>>>>>>>>>>>>>>>>>>>>>>> JavaThread::current(); >>>>>>>>>>>>>>>>>>>>>>>>> ?? jvmtiError err = >>>>>>>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, >>>>>>>>>>>>>>>>>>>>>>>>> object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>>>>>>>>>>>>> ???? // Some of the critical threads were not >>>>>>>>>>>>>>>>>>>>>>>>> suspended. go to a safepoint and try again >>>>>>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, >>>>>>>>>>>>>>>>>>>>>>>>> calling_thread, object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op); >>>>>>>>>>>>>>>>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>>>>>>>>>>>>>>>> ?? } >>>>>>>>>>>>>>>>>>>>>>>>> ?? return err; >>>>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some >>>>>>>>>>>>>>>>>>>>>>>> cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is >>>>>>>>>>>>>>>>>>>>>>>> not returned from get_object_monitor_usage(). >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor >>>>>>>>>>>>>>>>>>>>>>>>>> object at first [1], but it would perform >>>>>>>>>>>>>>>>>>>>>>>>>> concurrently. >>>>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner >>>>>>>>>>>>>>>>>>>>>>>>>> might be changed to others in subsequent code. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the >>>>>>>>>>>>>>>>>>>>>>>>>> monitor before [2]. 
>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner >>>>>>>>>>>>>>>>>>>>>>>>> thread it is either suspended or not. If it is >>>>>>>>>>>>>>>>>>>>>>>>> suspended then it cannot release the monitor. >>>>>>>>>>>>>>>>>>>>>>>>> If it is not suspended we detect that and redo >>>>>>>>>>>>>>>>>>>>>>>>> the whole query at a safepoint. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume >>>>>>>>>>>>>>>>>>>>>>>> unfortunately after suspending check. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also >>>>>>>>>>>>>>>>>>>>>>> required a safepoint but it only requires the >>>>>>>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Which code is wrong? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller >>>>>>>>>>>>>>>>>>>>>> has started the process of gathering the >>>>>>>>>>>>>>>>>>>>>> information while not at a >>>>>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The code tries to make sure that it either >>>>>>>>>>>>>>>>>>>>> collects data about a monitor owned by a thread >>>>>>>>>>>>>>>>>>>>> that is suspended, or else it collects that data >>>>>>>>>>>>>>>>>>>>> at a safepoint. But the owning thread can be >>>>>>>>>>>>>>>>>>>>> resumed just after the code determined it was >>>>>>>>>>>>>>>>>>>>> suspended. The monitor can then be released and >>>>>>>>>>>>>>>>>>>>> the information gathered not only stale but >>>>>>>>>>>>>>>>>>>>> potentially completely wrong as it could now be >>>>>>>>>>>>>>>>>>>>> owned by a different thread and will report that >>>>>>>>>>>>>>>>>>>>> thread's entry count. 
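The "resumed just after the suspension check" problem in this paragraph can be illustrated with a small, deterministic toy model. This is only a sketch under stated assumptions: `ToyMonitor`, `ToyUsage`, `racy_query` and the `between_reads` hook are hypothetical names invented here, not HotSpot code. The hook stands in for whatever other threads manage to do between the two field reads once the owner is running again and no safepoint is in effect.

```cpp
#include <cassert>

// Toy stand-in for an object monitor. This is NOT HotSpot's ObjectMonitor.
struct ToyMonitor {
  int owner_id;     // id of the owning thread, -1 when unowned
  int entry_count;  // how many times the current owner has entered
};

// What the query hands back to its caller.
struct ToyUsage {
  int owner_id;
  int entry_count;
};

// Reads owner and entry count as two separate steps. The hook models the
// window in which a resumed owner (or any other thread) can run.
ToyUsage racy_query(ToyMonitor& m, void (*between_reads)(ToyMonitor&)) {
  ToyUsage u;
  u.owner_id = m.owner_id;        // step 1: observe the owner
  between_reads(m);               // other threads run here
  u.entry_count = m.entry_count;  // step 2: observe the entry count
  return u;
}

// Interleaving where thread 1 releases and thread 2 enters three times.
void handoff_to_thread_2(ToyMonitor& m) {
  m.owner_id = 2;
  m.entry_count = 3;
}

// Interleaving where nothing else runs, as would be the case at a safepoint.
void nothing_runs(ToyMonitor&) {}
```

Starting from a monitor entered once by thread 1, `racy_query(m, handoff_to_thread_2)` pairs owner 1 with entry count 3, a combination that never existed at any point in time; with `nothing_runs` the returned pair is consistent.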
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as >>>>>>>>>>>>>>>>>>>> soon as >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the >>>>>>>>>>>>>>>>>>>> information >>>>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation >>>>>>>>>>>>>>>>>>>> returns >>>>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the >>>>>>>>>>>>>>>>>>>> target thread >>>>>>>>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is >>>>>>>>>>>>>>>>>>> fine. But the expectation is that the information >>>>>>>>>>>>>>>>>>> was actually an accurate snapshot of the state of >>>>>>>>>>>>>>>>>>> the monitor at some point in time. The current code >>>>>>>>>>>>>>>>>>> does not ensure that. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Please explain. I clearly don't understand why you >>>>>>>>>>>>>>>>>> think the info >>>>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of >>>>>>>>>>>>>>>>>> the monitor >>>>>>>>>>>>>>>>>> at some point in time". >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no >>>>>>>>>>>>>>>>> atomicity**. The reported owner thread may not own it >>>>>>>>>>>>>>>>> any longer when the entry count is read, so straight >>>>>>>>>>>>>>>>> away you may have the wrong entry count information. >>>>>>>>>>>>>>>>> The set of threads trying to acquire the monitor, or >>>>>>>>>>>>>>>>> wait on the monitor can change in unexpected ways. It >>>>>>>>>>>>>>>>> would be possible for instance to report the same >>>>>>>>>>>>>>>>> thread as being the owner, being blocked trying to >>>>>>>>>>>>>>>>> enter the monitor, and being in the wait-set of the >>>>>>>>>>>>>>>>> monitor - apparently all at the same time! 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ** even if the owner is suspended we don't have >>>>>>>>>>>>>>>>> complete atomicity because threads can join the set of >>>>>>>>>>>>>>>>> threads trying to enter the monitor (unless they are >>>>>>>>>>>>>>>>> all suspended). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ >>>>>>>>>>>>>>>> suspended: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() uses a safepoint to gather >>>>>>>>>>>>>>>> the info about >>>>>>>>>>>>>>>> ??? the object's monitor. Since we're at a safepoint, >>>>>>>>>>>>>>>> the info that >>>>>>>>>>>>>>>> ??? we are gathering cannot change until we return from >>>>>>>>>>>>>>>> the safepoint. >>>>>>>>>>>>>>>> ??? It is a snapshot and a valid one at that. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() will gather info about the >>>>>>>>>>>>>>>> object's >>>>>>>>>>>>>>>> ??? monitor while _not_ at a safepoint. Assuming that >>>>>>>>>>>>>>>> no other >>>>>>>>>>>>>>>> ??? thread is suspended, then entry_count can change >>>>>>>>>>>>>>>> because >>>>>>>>>>>>>>>> ??? another thread can block on entry while we are >>>>>>>>>>>>>>>> gathering >>>>>>>>>>>>>>>> ??? info. waiter_count and waiters can change if a >>>>>>>>>>>>>>>> thread was >>>>>>>>>>>>>>>> ??? in a timed wait that has timed out and now that >>>>>>>>>>>>>>>> thread is >>>>>>>>>>>>>>>> ??? blocked on re-entry. I don't think that >>>>>>>>>>>>>>>> notify_waiter_count >>>>>>>>>>>>>>>> ??? and notify_waiters can change. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ??? So in this case, the owner info and notify info is >>>>>>>>>>>>>>>> stable, >>>>>>>>>>>>>>>> ??? but the entry_count and waiter info is not stable. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Consider the case when the monitor is not owned: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ? 
- GetObjectMonitorUsage() will start to gather info >>>>>>>>>>>>>>>> about the >>>>>>>>>>>>>>>> ??? object's monitor while _not_ at a safepoint. If it >>>>>>>>>>>>>>>> finds a >>>>>>>>>>>>>>>> ??? thread on the entry queue that is not suspended, >>>>>>>>>>>>>>>> then it will >>>>>>>>>>>>>>>> ??? bail out and redo the info gather at a safepoint. I >>>>>>>>>>>>>>>> just >>>>>>>>>>>>>>>> ??? noticed that it doesn't check for suspension for >>>>>>>>>>>>>>>> the threads >>>>>>>>>>>>>>>> ??? on the waiters list so a timed Object.wait() call >>>>>>>>>>>>>>>> can cause >>>>>>>>>>>>>>>> ??? some confusion here. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ??? So in this case, the owner info is not stable if a >>>>>>>>>>>>>>>> thread >>>>>>>>>>>>>>>> ??? comes out of a timed wait and reenters the monitor. >>>>>>>>>>>>>>>> This >>>>>>>>>>>>>>>> ??? case is no different than if a "barger" thread >>>>>>>>>>>>>>>> comes in >>>>>>>>>>>>>>>> ??? after the NULL owner field is observed and enters the >>>>>>>>>>>>>>>> ??? monitor. We'll return that there is no owner, a >>>>>>>>>>>>>>>> list of >>>>>>>>>>>>>>>> ??? suspended pending entry thread and a list of waiting >>>>>>>>>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>>>>>>>>> ??? owned by the "barger" that completely bypassed the >>>>>>>>>>>>>>>> entry >>>>>>>>>>>>>>>> ??? queue by virtue of seeing the NULL owner field at >>>>>>>>>>>>>>>> exactly >>>>>>>>>>>>>>>> ??? the right time. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So the owner field is only stable when we have an >>>>>>>>>>>>>>>> owner. If >>>>>>>>>>>>>>>> that owner is not suspended, then the other fields are >>>>>>>>>>>>>>>> also >>>>>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the >>>>>>>>>>>>>>>> owner is suspended, then the owner and notify info is >>>>>>>>>>>>>>>> stable, >>>>>>>>>>>>>>>> but the entry_count and waiter info is not stable. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If we have a NULL owner field, then the info is only >>>>>>>>>>>>>>>> stable >>>>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. >>>>>>>>>>>>>>>> Ouch! >>>>>>>>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>>>>>>>>>>>> do that at a safepoint when the owner thread is not >>>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>>>>>>>>> the different pieces of info is unstable to varying >>>>>>>>>>>>>>>> degrees. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> As for this claim: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It would be possible for instance to report the same >>>>>>>>>>>>>>>>> thread >>>>>>>>>>>>>>>>> as being the owner, being blocked trying to enter the >>>>>>>>>>>>>>>>> monitor, >>>>>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently >>>>>>>>>>>>>>>>> all at >>>>>>>>>>>>>>>>> the same time! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I can't figure out a way to make that scenario work. If >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it >>>>>>>>>>>>>>>> can't >>>>>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue >>>>>>>>>>>>>>>> since >>>>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and >>>>>>>>>>>>>>>> is not >>>>>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>>>>>>>> wait queue. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.) 
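The case analysis above can be condensed into a small lookup. This is only a sketch of the reasoning in this message, not HotSpot code: the class, enum and method names are hypothetical, and the "no owner, nothing triggers a safepoint" case is treated conservatively as guaranteeing nothing, since the message only says the pieces are "unstable to varying degrees".

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names throughout; this only restates the analysis in the thread.
final class MonitorUsageStability {
    enum Field { OWNER, ENTRY_COUNT, WAITERS, NOTIFY_WAITERS }
    enum OwnerState { NOT_SUSPENDED, SUSPENDED, NONE }

    // The pre-fix code only redid the query at a safepoint when it saw a
    // non-suspended owner, or no owner plus a non-suspended entry thread.
    static boolean gatheredAtSafepoint(OwnerState owner, boolean anyEntryThreadNotSuspended) {
        switch (owner) {
            case NOT_SUSPENDED: return true;
            case SUSPENDED:     return false;
            default:            return anyEntryThreadNotSuspended;
        }
    }

    // Which pieces of the returned info can be trusted as a consistent snapshot.
    static Set<Field> stableFields(OwnerState owner, boolean anyEntryThreadNotSuspended) {
        if (gatheredAtSafepoint(owner, anyEntryThreadNotSuspended)) {
            return EnumSet.allOf(Field.class);           // safepoint: everything is stable
        }
        if (owner == OwnerState.SUSPENDED) {
            // Owner and notify info hold still; entry count and waiters can change.
            return EnumSet.of(Field.OWNER, Field.NOTIFY_WAITERS);
        }
        // No owner and no safepoint: a "barger" can take the monitor unseen.
        // Conservative assumption: nothing is guaranteed.
        return EnumSet.noneOf(Field.class);
    }

    public static void main(String[] args) {
        for (OwnerState s : OwnerState.values()) {
            System.out.println(s + " -> " + stableFields(s, false));
        }
    }
}
```

Gathering at a safepoint in every case corresponds to the first branch always being taken.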
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned >>>>>>>>>>>>>>>> and we >>>>>>>>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>>>>>>>> used. Handshaking with all threads won't make the data as >>>>>>>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>>>>>>>> will still be field instability. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I agree with this. >>>>>>>>>>>>>>> The advantages are: >>>>>>>>>>>>>>> ??- the result is stable >>>>>>>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>>>>>>>> I'm not sure if using handshakes can make performance >>>>>>>>>>>>>>> better. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale >>>>>>>>>>>>>>>>>>>> information is >>>>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. >>>>>>>>>>>>>>>>>>>> Perhaps the doc >>>>>>>>>>>>>>>>>>>> should have more clear about the possibility of >>>>>>>>>>>>>>>>>>>> returning stale >>>>>>>>>>>>>>>>>>>> info. 
That's a question for Robert F. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's >>>>>>>>>>>>>>>>>>>>> being suspended so I can't see how this could be >>>>>>>>>>>>>>>>>>>>> construed as an agent bug. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> In your scenario above, you mention that the target >>>>>>>>>>>>>>>>>>>> thread was >>>>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while >>>>>>>>>>>>>>>>>>>> the target >>>>>>>>>>>>>>>>>>>> was suspended, and then the target thread was >>>>>>>>>>>>>>>>>>>> resumed after >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but >>>>>>>>>>>>>>>>>>>> before >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the >>>>>>>>>>>>>>>>>>>> agent should not >>>>>>>>>>>>>>>>>>>> resume the target thread while also calling >>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage(). >>>>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so >>>>>>>>>>>>>>>>>>>> agent bug. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an >>>>>>>>>>>>>>>>>>> independent resume, but you're right that doesn't >>>>>>>>>>>>>>>>>>> really make a lot of sense. But when the spec says >>>>>>>>>>>>>>>>>>> nothing about suspension ... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> And it is intentional that suspension is not >>>>>>>>>>>>>>>>>> required. JVM/DI and JVM/PI >>>>>>>>>>>>>>>>>> used to require suspension for these kinds of >>>>>>>>>>>>>>>>>> get-the-info APIs. JVM/TI >>>>>>>>>>>>>>>>>> intentionally was designed to not require suspension. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> As I've said before, we could add a note about the >>>>>>>>>>>>>>>>>> data being potentially >>>>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it >>>>>>>>>>>>>>>>>> like stat(2). You can >>>>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that >>>>>>>>>>>>>>>>>> the info is current >>>>>>>>>>>>>>>>>> by the time you process what you got back. Is it too >>>>>>>>>>>>>>>>>> much motherhood to >>>>>>>>>>>>>>>>>> state that the data might be stale? I could go either >>>>>>>>>>>>>>>>>> way... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow >>>>>>>>>>>>>>>>>>>>> this to be fixed in the future without >>>>>>>>>>>>>>>>>>>>> forcing/using any safepoints. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I have to think about that which is why I'm >>>>>>>>>>>>>>>>>>>> avoiding talking about >>>>>>>>>>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread >>>>>>>>>>>>>>>>>>> whilst the monitor is queried. In effect the >>>>>>>>>>>>>>>>>>> operation would create a per-thread safepoint. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I "know" that, but I still need time to think about >>>>>>>>>>>>>>>>>> it and probably >>>>>>>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Semantically it is no different to the code actually >>>>>>>>>>>>>>>>>>> suspending the owner thread, but it can't actually >>>>>>>>>>>>>>>>>>> do that because suspends/resume don't nest. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we >>>>>>>>>>>>>>>>>> tracked internal and >>>>>>>>>>>>>>>>>> external suspends separately. That was a nightmare... 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used >>>>>>>>>>>>>>>>>>>>>>>> to check thread state, it returns `true` when >>>>>>>>>>>>>>>>>>>>>>>> the thread is sleeping [3], or when it performs >>>>>>>>>>>>>>>>>>>>>>>> in native [4]. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it >>>>>>>>>>>>>>>>>>>>>>> can't continue execution in the VM or in Java code. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the >>>>>>>>>>>>>>>>>>>>>>>>> assumed common case where threads are first >>>>>>>>>>>>>>>>>>>>>>>>> suspended and then the monitors are queried. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from >>>>>>>>>>>>>>>>>>>>>>>> JVMTI spec - it just says "Get information >>>>>>>>>>>>>>>>>>>>>>>> about the object's monitor." >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, >>>>>>>>>>>>>>>>>>>>>>> nothing to do with the spec. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect >>>>>>>>>>>>>>>>>>>>>>>> information in some case. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the >>>>>>>>>>>>>>>>>>>>>>>> owner might be just before wakeup. 
>>>>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if >>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() is called at safepoint >>>>>>>>>>>>>>>>>>>>>>>> in any case. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to >>>>>>>>>>>>>>>>>>>>>>> using Handshakes, so this particular operation >>>>>>>>>>>>>>>>>>>>>>> will require that the apparent owner is >>>>>>>>>>>>>>>>>>>>>>> Handshake-safe (by entering a handshake with it) >>>>>>>>>>>>>>>>>>>>>>> before querying the monitor. This would still be >>>>>>>>>>>>>>>>>>>>>>> preferable I think to always using a safepoint >>>>>>>>>>>>>>>>>>>>>>> for the entire operation. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>>>>>>>> [4] >>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the >>>>>>>>>>>>>>>>>>>>>>>>> thread reported as the owner may not be >>>>>>>>>>>>>>>>>>>>>>>>> suspended at the time we first see it, and may >>>>>>>>>>>>>>>>>>>>>>>>> release the monitor, but then it may get >>>>>>>>>>>>>>>>>>>>>>>>> suspended before we call: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> ??owning_thread = >>>>>>>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), >>>>>>>>>>>>>>>>>>>>>>>>> owner); >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner >>>>>>>>>>>>>>>>>>>>>>>>> and proceed to query the monitor information 
>>>>>>>>>>>>>>>>>>>>>>>>> in a racy way. This can't happen when >>>>>>>>>>>>>>>>>>>>>>>>> suspension itself requires a safepoint as the >>>>>>>>>>>>>>>>>>>>>>>>> current thread won't go to that safepoint >>>>>>>>>>>>>>>>>>>>>>>>> during this code. However, if suspension is >>>>>>>>>>>>>>>>>>>>>>>>> implemented via a direct handshake with the >>>>>>>>>>>>>>>>>>>>>>>>> target thread then we have a problem. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>> >> From chris.plummer at oracle.com Fri Jun 19 00:54:06 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Thu, 18 Jun 2020 17:54:06 -0700 Subject: RFR(M): 8244383: jhsdb/HeapDumpTestWithActiveProcess.java fails with "AssertionFailure: illegal bci" Message-ID: <28e1b453-e1ea-0a1c-0ae0-0494b52f4b71@oracle.com> [I've added runtime-dev to this SA review since understanding interpreter invokes (code generated by TemplateInterpreterGenerator::generate_normal_entry()) and stack walking is probably more important than understanding SA.] 
Hello,

Please help review the following:

https://bugs.openjdk.java.net/browse/JDK-8244383
http://cr.openjdk.java.net/~cjplummer/8244383/webrev.00/index.html

The crux of the bug is that when doing stack walking the topmost frame is in an inconsistent state because we are in the middle of pushing a new interpreter frame. Basically we are executing code generated by TemplateInterpreterGenerator::generate_normal_entry(). Since the PC register is in this code, SA assumes the topmost frame is an interpreter frame.

The first issue with this interpreter frame assumption is that if we haven't actually pushed the frame yet, then the current frame is the caller's frame, and could be compiled. But since SA thinks it's interpreted, later on it tries to convert the frame->bcp to a BCI, but frame->bcp is only valid for interpreter frames. Thus the "illegal BCI" failures. If the previous frame happened to be interpreted, then the existing SA code works fine.

The other state of frame pushing that was problematic was when the new frame had been pushed, but frame->method and frame->bcp were not set up yet. This also would lead to "illegal BCI" later on because garbage would be stored in these locations.

Fixing the above problems requires trying to determine the state of the frame push through a series of checks, and then adapting what is considered to be the current frame based on the outcome of the checks. The first thing checked is that frame->method is valid (we can successfully instantiate a wrapper for the Method* without failure) and that frame->bcp is within the method. If both of these pass then we can use the frame as-is.

If the above checks fail, then we try to determine whether the issue is that the frame is not yet pushed and the current frame is actually compiled, or the frame has been pushed but not yet initialized.
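A minimal sketch of the decision order described in this message (the validity check above and the return-address check explained in the next paragraph). It is not the code in AMD64CurrentFrameGuess.java: the class name, the enum, and every parameter are hypothetical stand-ins for values the real code reads from the debuggee.

```java
// Hypothetical sketch; names and inputs are assumptions, not the SA API.
final class FrameGuessSketch {
    enum Guess {
        USE_FRAME_AS_IS,     // frame->method and frame->bcp look valid
        USE_SENDER_FRAME,    // frame pushed but not yet initialized
        TREAT_AS_COMPILED    // frame not yet pushed; caller may be compiled
    }

    static Guess classify(boolean methodValid,
                          long bcp, long codeStart, long codeEnd,
                          long returnAddrFromStackOrRax,
                          long frameReturnAddr) {
        // Check 1: Method* wrapper could be created and bcp lies inside the method.
        boolean bcpInMethod = bcp >= codeStart && bcp < codeEnd;
        if (methodValid && bcpInMethod) {
            return Guess.USE_FRAME_AS_IS;
        }
        // Check 2: matching return addresses mean the new frame is already
        // pushed, so fall back to the sender (previous) frame.
        if (returnAddrFromStackOrRax == frameReturnAddr) {
            return Guess.USE_SENDER_FRAME;
        }
        // Otherwise assume the frame is not pushed yet and treat the current
        // frame as compiled (the message says PC is replaced with RAX here).
        return Guess.TREAT_AS_COMPILED;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, 0x1010L, 0x1000L, 0x1100L, 0x2000L, 0x3000L));
        System.out.println(classify(false, 0L, 0x1000L, 0x1100L, 0x2000L, 0x2000L));
        System.out.println(classify(false, 0L, 0x1000L, 0x1100L, 0x2000L, 0x3000L));
    }
}
```

The ordering matters: the return-address comparison is only consulted once the frame has failed the validity check.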
This is done by first getting the return address from the stack or RAX (its location depends on how far along we are in the entry code) and comparing this to what is stored in frame->return_addr. If they are the same, then we have pushed the frame but not yet initialized it. In this case we use the previous frame (senderSP() and senderFP()) as the current frame since the current frame is not yet initialized. If the return address check fails, then we assume the new frame is not yet pushed, and treat the current frame as compiled, even though PC points into the interpreter (we replace PC with RAX in this case).

Comments in the code pretty well explain all the above, so it is probably easier to follow the logic in the code along with the comments rather than apply my above description to the code.

I should add that it's very rare that we ever get into this special error handling code. This bug was very hard to reproduce initially. I was only able to make progress with reproducing and debugging by inserting delay loops in various spots in the code generated by TemplateInterpreterGenerator::generate_normal_entry(). By doing this I was able to reproduce the issue quite easily and hit all the logic in the new code I've added.

The fix is basically entirely contained within AMD64CurrentFrameGuess.java. The rest of the changes are minor:

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java
-Main fix for CR

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java
-Added getInterpreterFrameBCP(), which is now needed by AMD64CurrentFrameGuess.java
-I also simplified some code by using the existing getInterpreterFrameMethod() rather than replicating inline what it does.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java
-I noticed the windows version of this code had some extra checks that were missing from the bsd version.
I then looked at the linux version, but it had been heavily modified a short while back to leverage DWARF info to determine frames. So I looked at the previous rev and it too had these extra checks. I decided to add them to the BSD port. I'm not sure if it helps at all, but it certainly doesn't seem to do any harm.

thanks,

Chris

From suenaga at oss.nttdata.com Fri Jun 19 01:56:35 2020
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Fri, 19 Jun 2020 10:56:35 +0900
Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information
In-Reply-To:
References: <60c901f5-5f63-9f62-4a3d-eb76fb47a6f2@oracle.com> <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com> <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com> <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com> <1a2d56a6-624a-b889-6f71-d51bd972aa0d@oracle.com> <83cd0686-d744-d3fe-eb79-49cdd89b99c6@oss.nttdata.com>
Message-ID:

Hi Serguei,

I tested vmTestbase/nsk/jdi with webrev.03, all tests work fine on my laptop.

http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.03/

Thanks,

Yasumasa

On 2020/06/19 9:22, serguei.spitsyn at oracle.com wrote:
> Hi Yasumasa,
>
> It would be even more safe to run the JDI tests as well.
> The ObjectReference owningThread(), waitingThreads() and entryCount() are based on this JVMTI function.
> See: https://docs.oracle.com/en/java/javase/14/docs/api/jdk.jdi/com/sun/jdi/ObjectReference.html
>
> Thanks,
> Serguei
>
>
> On 6/18/20 17:01, Yasumasa Suenaga wrote:
>> Thanks Serguei!
>>
>> I fixed them, and the change works fine on my laptop with serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64.
>> I will push it later.
>> >> >> Yasumasa >> >> >> On 2020/06/19 6:42, serguei.spitsyn at oracle.com wrote: >>> Hi Yasumasa, >>> >>> This looks good, nice simplification. >>> >>> A couple of minor comments. >>> >>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/src/hotspot/share/prims/jvmtiEnvBase.cpp.frames.html >>> >>> 995 ThreadsListHandle tlh(current_thread); 1052 ThreadsListHandle tlh(current_thread); >>> >>> We can share one tlh for both fragments. >>> >>> 942 HandleMark hm; 1051 HandleMark hm; >>> >>> The second HandleMark is not needed. >>> Also, we can use current_thread in the first one: >>> >>> HandleMark hm(current_thread); >>> >>> >>> I do not need to see another webrev if you fix the above. >>> >>> Thanks, >>> Serguei >>> >>> >>> On 6/18/20 07:56, Yasumasa Suenaga wrote: >>>> Hi Daniel, >>>> >>>> On 2020/06/18 23:38, Daniel D. Daugherty wrote: >>>>> On 6/18/20 5:07 AM, Yasumasa Suenaga wrote: >>>>>> On 2020/06/18 17:36, David Holmes wrote: >>>>>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote: >>>>>>>> Hi David, >>>>>>>> >>>>>>>> Both ThreadsListHandle and ResourceMarks would use `Thread::current()` for their resource. It is set as default parameter in c'tor. >>>>>>>> Do you mean we should it explicitly in c'tor? >>>>>>> >>>>>>> Yes pass current_thread so we don't do the additional unnecessary calls to Thread::current(). >>>>>> >>>>>> Ok, I've fixed them. Could you review again? >>>>>> >>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/ >>>>> >>>>> src/hotspot/share/prims/jvmtiEnv.cpp >>>>> ???? L2842: ? // It need to perform at safepoint for gathering stable data >>>>> ???????? Perhaps: >>>>> ????????????? // This needs to be performed at a safepoint to gather stable data >>>> >>>> I will change it before pushing. >>>> >>>> >>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp >>>>> ???? No comments. >>>>> >>>>> Thumbs up. >>>>> >>>>> What testing has been done on this fix? 
>>>> >>>> I tested this change on serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64. >>>> >>>> >>>>> Also, please wait to hear from Serguei on this fix... >>>> >>>> Ok. >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>>> Dan >>>>> >>>>> >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Yasumasa >>>>>> >>>>>> >>>>>>> David >>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Yasumasa >>>>>>>> >>>>>>>> >>>>>>>> On 2020/06/18 13:58, David Holmes wrote: >>>>>>>>> Hi Yasumasa, >>>>>>>>> >>>>>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote: >>>>>>>>>> Hi Serguei, >>>>>>>>>> >>>>>>>>>> Thanks for your comment! >>>>>>>>>> I uploaded new webrev: >>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/ >>>>>>>>>> >>>>>>>>>> I'm not sure the following change is correct. >>>>>>>>>> Can we assume owning_thread is not NULL at safepoint? >>>>>>>>> >>>>>>>>> We can if "owner != NULL". So that change seem fine to me. >>>>>>>>> >>>>>>>>> But given this is now only executed at a safepoint there are additional simplifications that can be made: >>>>>>>>> >>>>>>>>> - current thread determination can be simplified: >>>>>>>>> >>>>>>>>> 945?? Thread* current_thread = Thread::current(); >>>>>>>>> >>>>>>>>> becomes: >>>>>>>>> >>>>>>>>> ??? Thread* current_thread = VMThread::vm_thread(); >>>>>>>>> ??? assert(current_thread == Thread::current(), "must be"); >>>>>>>>> >>>>>>>>> - these comments can be removed >>>>>>>>> >>>>>>>>> ??994?????? // Use current thread since function can be called from a >>>>>>>>> ??995?????? // JavaThread or the VMThread. >>>>>>>>> 1053?????? // Use current thread since function can be called from a >>>>>>>>> 1054?????? // JavaThread or the VMThread. >>>>>>>>> >>>>>>>>> - these TLH constructions should be passing current_thread (existing bug) >>>>>>>>> >>>>>>>>> 996?????? ThreadsListHandle tlh; >>>>>>>>> 1055?????? 
ThreadsListHandle tlh; >>>>>>>>> >>>>>>>>> - All ResourceMarks should be passing current_thread (existing bug) >>>>>>>>> >>>>>>>>> >>>>>>>>> Aside: there is a major inconsistency between the spec and implementation for this method. I've traced the history to see how this came about from JVMDI (ref JDK-4546581) but it never resulted in the JVM TI specification clearly stating what the waiters/waiter_count means. I will file a bug to have the spec clarified to match the implementation (even though I think the implementation is what is wrong). :( >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>>> All tests on submit repo and serviceability/jvmti and vmTestbase/nsk/jvmti have been passed with this change. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ``` >>>>>>>>>> ??????? // This monitor is owned so we have to find the owning JavaThread. >>>>>>>>>> ??????? owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>> -????? // Cannot assume (owning_thread != NULL) here because this function >>>>>>>>>> -????? // may not have been called at a safepoint and the owning_thread >>>>>>>>>> -????? // might not be suspended. >>>>>>>>>> -????? if (owning_thread != NULL) { >>>>>>>>>> -??????? // The monitor's owner either has to be the current thread, at safepoint >>>>>>>>>> -??????? // or it has to be suspended. Any of these conditions will prevent both >>>>>>>>>> -??????? // contending and waiting threads from modifying the state of >>>>>>>>>> -??????? // the monitor. >>>>>>>>>> -??????? if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) { >>>>>>>>>> -????????? // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED >>>>>>>>>> -????????? // will not make it back to the JVM/TI agent. The error code will >>>>>>>>>> -????????? // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which >>>>>>>>>> -????????? // will retry the call via a VM_GetObjectMonitorUsage VM op. >>>>>>>>>> -????????? 
return JVMTI_ERROR_THREAD_NOT_SUSPENDED; >>>>>>>>>> -??????? } >>>>>>>>>> -??????? HandleMark hm; >>>>>>>>>> +????? assert(owning_thread != NULL, "owning JavaThread must not be NULL"); >>>>>>>>>> ????????? Handle???? th(current_thread, owning_thread->threadObj()); >>>>>>>>>> ????????? ret.owner = (jthread)jni_reference(calling_thread, th); >>>>>>>>>> >>>>>>>>>> ``` >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Yasumasa >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>> >>>>>>>>>>> This fix is not enough. >>>>>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two modes: in VMop and non-VMop. >>>>>>>>>>> The non-VMop mode has to be removed. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Serguei >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote: >>>>>>>>>>>> (Change subject for RFR) >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I filed it to JBS and upload a webrev for it. >>>>>>>>>>>> Could you review it? >>>>>>>>>>>> >>>>>>>>>>>> ? JBS: https://bugs.openjdk.java.net/browse/JDK-8247729 >>>>>>>>>>>> ? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/ >>>>>>>>>>>> >>>>>>>>>>>> This change has passed tests on submit repo. >>>>>>>>>>>> Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Yasumasa >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>> Yes. It seems we have a consensus. >>>>>>>>>>>>> Thank you for taking care about it. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Serguei >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 6/16/20 18:34, David Holmes wrote: >>>>>>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Go for it! 
:) >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> David >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>>>> Hi Dan, David and Yasumasa, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>> Hi Dan, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is not suspended: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>>>>>>>>>>>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>>>>>>>>>>>>>>>>>>>>>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>>>>> ?? 
if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>>>>>>>>>>>     // Some of the critical threads were not suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>>>>>>>>>>>     VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>>>>>>     VMThread::execute(&op);
>>>>>>>>>>>>>>>>>>>>>>>>>>     err = op.result();
>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>   return err;
>>>>>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage().
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently.
>>>>>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be changed to others in subsequent code.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor before [2].
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume unfortunately after suspending check.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Which code is wrong?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller
>>>>>>>>>>>>>>>>>>>>>>> has started the process of gathering the information while not at a
>>>>>>>>>>>>>>>>>>>>>>> safepoint.
Thus the information returned by GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects data about a monitor owned by a thread that is suspended, or else it collects that data at a safepoint. But the owning thread can be resumed just after the code determined it was suspended. The monitor can then be released and the information gathered not only stale but potentially completely wrong as it could now be owned by a different thread and will report that thread's entry count. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information >>>>>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation returns >>>>>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the target thread >>>>>>>>>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. But the expectation is that the information was actually an accurate snapshot of the state of the monitor at some point in time. The current code does not ensure that. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Please explain. I clearly don't understand why you think the info >>>>>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the monitor >>>>>>>>>>>>>>>>>>> at some point in time". >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no atomicity**. The reported owner thread may not own it any longer when the entry count is read, so straight away you may have the wrong entry count information. The set of threads trying to acquire the monitor, or wait on the monitor can change in unexpected ways. 
It would be possible for instance to report the same thread as being the owner, being blocked trying to enter the monitor, and being in the wait-set of the monitor - apparently all at the same time! >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ** even if the owner is suspended we don't have complete atomicity because threads can join the set of threads trying to enter the monitor (unless they are all suspended). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ suspended: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() uses a safepoint to gather the info about >>>>>>>>>>>>>>>>> ??? the object's monitor. Since we're at a safepoint, the info that >>>>>>>>>>>>>>>>> ??? we are gathering cannot change until we return from the safepoint. >>>>>>>>>>>>>>>>> ??? It is a snapshot and a valid one at that. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() will gather info about the object's >>>>>>>>>>>>>>>>> ??? monitor while _not_ at a safepoint. Assuming that no other >>>>>>>>>>>>>>>>> ??? thread is suspended, then entry_count can change because >>>>>>>>>>>>>>>>> ??? another thread can block on entry while we are gathering >>>>>>>>>>>>>>>>> ??? info. waiter_count and waiters can change if a thread was >>>>>>>>>>>>>>>>> ??? in a timed wait that has timed out and now that thread is >>>>>>>>>>>>>>>>> ??? blocked on re-entry. I don't think that notify_waiter_count >>>>>>>>>>>>>>>>> ??? and notify_waiters can change. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ??? So in this case, the owner info and notify info is stable, >>>>>>>>>>>>>>>>> ??? but the entry_count and waiter info is not stable. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Consider the case when the monitor is not owned: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() will start to gather info about the >>>>>>>>>>>>>>>>> ??? 
object's monitor while _not_ at a safepoint. If it finds a
>>>>>>>>>>>>>>>>>     thread on the entry queue that is not suspended, then it will
>>>>>>>>>>>>>>>>>     bail out and redo the info gather at a safepoint. I just
>>>>>>>>>>>>>>>>>     noticed that it doesn't check for suspension for the threads
>>>>>>>>>>>>>>>>>     on the waiters list so a timed Object.wait() call can cause
>>>>>>>>>>>>>>>>>     some confusion here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     So in this case, the owner info is not stable if a thread
>>>>>>>>>>>>>>>>>     comes out of a timed wait and reenters the monitor. This
>>>>>>>>>>>>>>>>>     case is no different than if a "barger" thread comes in
>>>>>>>>>>>>>>>>>     after the NULL owner field is observed and enters the
>>>>>>>>>>>>>>>>>     monitor. We'll return that there is no owner, a list of
>>>>>>>>>>>>>>>>>     suspended pending entry thread and a list of waiting
>>>>>>>>>>>>>>>>>     threads. The reality is that the object's monitor is
>>>>>>>>>>>>>>>>>     owned by the "barger" that completely bypassed the entry
>>>>>>>>>>>>>>>>>     queue by virtue of seeing the NULL owner field at exactly
>>>>>>>>>>>>>>>>>     the right time.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So the owner field is only stable when we have an owner. If
>>>>>>>>>>>>>>>>> that owner is not suspended, then the other fields are also
>>>>>>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the
>>>>>>>>>>>>>>>>> owner is suspended, then the owner and notify info is stable,
>>>>>>>>>>>>>>>>> but the entry_count and waiter info is not stable.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable
>>>>>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch!
>>>>>>>>>>>>>>>>> That's deterministic, but not without some work.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all
>>>>>>>>>>>>>>>>> of it a valid and stable snapshot.
Unfortunately, we only >>>>>>>>>>>>>>>>> do that at a safepoint when the owner thread is not suspended >>>>>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>>>>>>>>>> the different pieces of info is unstable to varying degrees. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> As for this claim: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> It would be possible for instance to report the same thread >>>>>>>>>>>>>>>>>> as being the owner, being blocked trying to enter the monitor, >>>>>>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>>>>>>>>>>>>> the same time! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue since >>>>>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>>>>>>>>> wait queue. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>>>>>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>>>>>>>>> used. 
Handshaking with all threads won't make the data as >>>>>>>>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>>>>>>>>> will still be field instability. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I agree with this. >>>>>>>>>>>>>>>> The advantages are: >>>>>>>>>>>>>>>> ??- the result is stable >>>>>>>>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>>>>>>>>> I'm not sure if using handshakes can make performance better. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale information is >>>>>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc >>>>>>>>>>>>>>>>>>>>> should have more clear about the possibility of returning stale >>>>>>>>>>>>>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's being suspended so I can't see how this could be construed as an agent bug. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> In your scenario above, you mention that the target thread was >>>>>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>>>>>>>>>>>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent should not >>>>>>>>>>>>>>>>>>>>> resume the target thread while also calling GetObjectMonitorUsage(). >>>>>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an independent resume, but you're right that doesn't really make a lot of sense. But when the spec says nothing about suspension ... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> And it is intentional that suspension is not required. JVM/DI and JVM/PI >>>>>>>>>>>>>>>>>>> used to require suspension for these kinds of get-the-info APIs. JVM/TI >>>>>>>>>>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> As I've said before, we could add a note about the data being potentially >>>>>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like stat(2). You can >>>>>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the info is current >>>>>>>>>>>>>>>>>>> by the time you process what you got back. Is it too much motherhood to >>>>>>>>>>>>>>>>>>> state that the data might be stale? I could go either way... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this to be fixed in the future without forcing/using any safepoints. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding talking about >>>>>>>>>>>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread whilst the monitor is queried. In effect the operation would create a per-thread safepoint. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I "know" that, but I still need time to think about it and probably >>>>>>>>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Semantically it is no different to the code actually suspending the owner thread, but it can't actually do that because suspends/resume don't nest. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we tracked internal and >>>>>>>>>>>>>>>>>>> external suspends separately. That was a nightmare... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread state, it returns `true` when the thread is sleeping [3], or when it performs in native [4]. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried. 
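The two-mode pattern under discussion (try the query directly, and fall back to a safepoint operation when the direct attempt reports JVMTI_ERROR_THREAD_NOT_SUSPENDED) can be sketched in plain Java as a toy model. Everything below is hypothetical, including the class `TwoModeQueryDemo`, the `Status` enum and all method names; it only mirrors the control flow of the quoted JvmtiEnv::GetObjectMonitorUsage() and calls no HotSpot or JVM TI API.

```java
// Toy model of the two-mode query. All names are hypothetical;
// nothing here is a HotSpot or JVM TI API.
final class TwoModeQueryDemo {
    enum Status { OK, THREAD_NOT_SUSPENDED }

    /** Outcome of one attempt: a status, plus the gathered info when the status is OK. */
    static final class Attempt {
        final Status status;
        final String info;

        Attempt(Status status, String info) {
            this.status = status;
            this.info = info;
        }
    }

    /** Direct path: succeeds only if the modelled owner thread is already suspended. */
    static Attempt queryDirectly(boolean ownerSuspended) {
        if (ownerSuspended) {
            return new Attempt(Status.OK, "gathered without safepoint");
        }
        return new Attempt(Status.THREAD_NOT_SUSPENDED, null);
    }

    /** Fallback path: stands in for re-running the query as a VM operation at a safepoint. */
    static Attempt queryAtSafepoint() {
        return new Attempt(Status.OK, "gathered at safepoint");
    }

    /** Try the direct path first and retry at a (modelled) safepoint if it bails out. */
    static String query(boolean ownerSuspended) {
        Attempt attempt = queryDirectly(ownerSuspended);
        if (attempt.status == Status.THREAD_NOT_SUSPENDED) {
            attempt = queryAtSafepoint();
        }
        return attempt.info;
    }

    public static void main(String[] args) {
        System.out.println(query(true));
        System.out.println(query(false));
    }
}
```

The sketch deliberately leaves out the flaw identified elsewhere in this thread: in the real code the owner can be resumed after the suspension check, so the direct path may gather data from a monitor that is changing underneath it.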
>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI spec - it just says "Get information about the object's monitor." >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, nothing to do with the spec. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect information in some case. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner might be just before wakeup. >>>>>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is called at safepoint in any case. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation. 
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>>>>>>>>> [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> ??owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem. 
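The race described here comes down to reading related fields at different times. What follows is a minimal sketch, assuming a toy monitor that has only an owner name and an entry count. The classes `SnapshotRaceDemo` and `ToyMonitor` and the `between` hook are hypothetical and are not HotSpot code; the hook stands in for whatever another thread does between the two reads, which keeps the demonstration deterministic.

```java
// Toy illustration of a torn read versus an atomic snapshot.
// All names are hypothetical; this is not HotSpot code.
final class SnapshotRaceDemo {

    /** Toy stand-in for a monitor: an owner name and its recursive entry count. */
    static final class ToyMonitor {
        private String owner;
        private int entryCount;

        synchronized void set(String newOwner, int newCount) {
            owner = newOwner;
            entryCount = newCount;
        }

        synchronized String owner() { return owner; }

        synchronized int entryCount() { return entryCount; }

        /** Both fields are read under one lock, so the pair is consistent. */
        synchronized String atomicSnapshot() {
            return owner + ":" + entryCount;
        }
    }

    /** Reads the two fields separately; 'between' models activity in the gap. */
    static String racySnapshot(ToyMonitor m, Runnable between) {
        String owner = m.owner();
        between.run();
        int count = m.entryCount();
        return owner + ":" + count;
    }

    public static void main(String[] args) {
        ToyMonitor m = new ToyMonitor();
        m.set("T1", 1);
        // In the gap, T1 releases the monitor and T2 enters it three times.
        String racy = racySnapshot(m, () -> m.set("T2", 3));
        System.out.println("racy   = " + racy);               // prints "racy   = T1:3"
        System.out.println("atomic = " + m.atomicSnapshot()); // prints "atomic = T2:3"
    }
}
```

The racy read reports `T1:3`, an owner and entry count that never existed together, while the atomic read reports `T2:3`. Gathering everything at a safepoint plays the role of `atomicSnapshot()` in this model.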
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>>>>>>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>> >>> > From jonathan.gibbons at oracle.com Fri Jun 19 02:45:45 2020 From: jonathan.gibbons at oracle.com (Jonathan Gibbons) Date: Thu, 18 Jun 2020 19:45:45 -0700 Subject: Qu: JDK-8247901, Multiple conflicting @return for FlightRecorderMXBean Message-ID: <450c343a-ecdd-8bf7-e465-7e8614b6cfec@oracle.com> I have filed JDK-8247901, to cover an issue detected by doclint, regarding two conflicting @return descriptions for a single method. I can make the fix, if you want, but I need confirmation of which one should be deleted and which should be retained.? I can make an informed guess (one of them refers to `null` in the context of a `long` return code!) but I would prefer that someone with domain-specific knowledge make the call. -- Jon JBS: https://bugs.openjdk.java.net/browse/JDK-8247901 Here, if it helps, is the description from the bug: doclint reports: open/src/jdk.management.jfr/share/classes/jdk/management/jfr/FlightRecorderMXBean.java:213: warning: @return has already been specified ???? * @return a unique ID that can be used for reading recording data. ?????? ^ 1 warning The source shows: ???? * @return a snapshot of all available recording data, not {@code null} ???? * ???? 
* @throws java.lang.SecurityException if a security manager exists and the
     *         caller does not have {@code ManagementPermission("control")}
     *
     * @return a unique ID that can be used for reading recording data.

From jonathan.gibbons at oracle.com Fri Jun 19 03:12:00 2020
From: jonathan.gibbons at oracle.com (Jonathan Gibbons)
Date: Thu, 18 Jun 2020 20:12:00 -0700
Subject: RFR: https://bugs.openjdk.java.net/browse/JDK-8247784
Message-ID: <1165ed72-80c4-48d2-9306-d9b34e5eeecf@oracle.com>

Please review some changes to fix typos in some recent doc updates.

In two places, ${docRoot} is used instead of {@docRoot}

-- Jon

JBS: https://bugs.openjdk.java.net/browse/JDK-8247784

Patch:

diff -r c5904de55565 src/jdk.jdi/share/classes/com/sun/jdi/Type.java
--- a/src/jdk.jdi/share/classes/com/sun/jdi/Type.java Thu Jun 18 17:32:57 2020 -0700
+++ b/src/jdk.jdi/share/classes/com/sun/jdi/Type.java Thu Jun 18 20:05:42 2020 -0700
@@ -152,7 +152,7 @@
      * Returns the name of this type. The result is of the same form as
      * the name returned by {@link Class#getName()}.
      * The returned name may not be a
-     * <a href="${docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
+     * <a href="{@docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
      *
      * @return the name of this type
      */
diff -r c5904de55565 src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java
--- a/src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java Thu Jun 18 17:32:57 2020 -0700
+++ b/src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java Thu Jun 18 20:05:42 2020 -0700
@@ -44,7 +44,7 @@
     /**
      * Returns the {@linkplain com.sun.jdi.Type#name() name of the class}
      * that has been unloaded. The returned string may not be a
-     * <a href="${docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
+     * <a href="{@docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
      *
      * @see Class#getName()
      */
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jonathan.gibbons at oracle.com Fri Jun 19 03:16:01 2020
From: jonathan.gibbons at oracle.com (Jonathan Gibbons)
Date: Thu, 18 Jun 2020 20:16:01 -0700
Subject: RFR: JDK-8247784,Bad link causes invalid documentation
In-Reply-To: <1165ed72-80c4-48d2-9306-d9b34e5eeecf@oracle.com>
References: <1165ed72-80c4-48d2-9306-d9b34e5eeecf@oracle.com>
Message-ID:

resend, with correct subject line

On 6/18/20 8:12 PM, Jonathan Gibbons wrote:
>
> Please review some changes to fix typos in some recent doc updates.
>
> In two places, ${docRoot} is used instead of {@docRoot}
>
> -- Jon
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8247784
>
> Patch:
>
> diff -r c5904de55565 src/jdk.jdi/share/classes/com/sun/jdi/Type.java
> --- a/src/jdk.jdi/share/classes/com/sun/jdi/Type.java Thu Jun 18 17:32:57 2020 -0700
> +++ b/src/jdk.jdi/share/classes/com/sun/jdi/Type.java Thu Jun 18 20:05:42 2020 -0700
> @@ -152,7 +152,7 @@
>       * Returns the name of this type. The result is of the same form as
>       * the name returned by {@link Class#getName()}.
>       * The returned name may not be a
> -     * <a href="${docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
> +     * <a href="{@docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
>       *
>       * @return the name of this type
>       */
> diff -r c5904de55565 src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java
> --- a/src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java Thu Jun 18 17:32:57 2020 -0700
> +++ b/src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java Thu Jun 18 20:05:42 2020 -0700
> @@ -44,7 +44,7 @@
>      /**
>       * Returns the {@linkplain com.sun.jdi.Type#name() name of the class}
>       * that has been unloaded. The returned string may not be a
> -     * <a href="${docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
> +     * <a href="{@docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
>       *
>       * @see Class#getName()
>       */
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Alan.Bateman at oracle.com Fri Jun 19 06:09:01 2020
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Fri, 19 Jun 2020 07:09:01 +0100
Subject: RFR: JDK-8247784,Bad link causes invalid documentation
In-Reply-To:
References: <1165ed72-80c4-48d2-9306-d9b34e5eeecf@oracle.com>
Message-ID:

Looks good.

On 19/06/2020 04:16, Jonathan Gibbons wrote:
>
> resend, with correct subject line
>
> On 6/18/20 8:12 PM, Jonathan Gibbons wrote:
>>
>> Please review some changes to fix typos in some recent doc updates.
>>
>> In two places, ${docRoot} is used instead of {@docRoot}
>>
>> -- Jon
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8247784
>>
>> Patch:
>>
>> diff -r c5904de55565 src/jdk.jdi/share/classes/com/sun/jdi/Type.java
>> --- a/src/jdk.jdi/share/classes/com/sun/jdi/Type.java Thu Jun 18 17:32:57 2020 -0700
>> +++ b/src/jdk.jdi/share/classes/com/sun/jdi/Type.java Thu Jun 18 20:05:42 2020 -0700
>> @@ -152,7 +152,7 @@
>>       * Returns the name of this type. The result is of the same form as
>>       * the name returned by {@link Class#getName()}.
>>       * The returned name may not be a
>> -     * <a href="${docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
>> +     * <a href="{@docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
>>       *
>>       * @return the name of this type
>>       */
>> diff -r c5904de55565 src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java
>> --- a/src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java Thu Jun 18 17:32:57 2020 -0700
>> +++ b/src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java Thu Jun 18 20:05:42 2020 -0700
>> @@ -44,7 +44,7 @@
>>      /**
>>       * Returns the {@linkplain com.sun.jdi.Type#name() name of the class}
>>       * that has been unloaded.
The returned string may not be a
>> -     * <a href="${docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
>> +     * <a href="{@docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
>>       *
>>       * @see Class#getName()
>>       */
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From serguei.spitsyn at oracle.com Fri Jun 19 06:25:06 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Thu, 18 Jun 2020 23:25:06 -0700
Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information
In-Reply-To:
References: <97a35d90-0bfa-e947-7be4-972798b65b7a@oracle.com> <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com> <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com> <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com> <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com> <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com> <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com> <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com> <1a2d56a6-624a-b889-6f71-d51bd972aa0d@oracle.com> <83cd0686-d744-d3fe-eb79-49cdd89b99c6@oss.nttdata.com>
Message-ID:

Hi Yasumasa,

Looks good.

Thanks,
Serguei

On 6/18/20 18:56, Yasumasa Suenaga wrote:
> Hi Serguei,
>
> I tested vmTestbase/nsk/jdi with webrev.03, all tests work fine on my
> laptop.
>
>   http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.03/
>
>
> Thanks,
>
> Yasumasa
>
>
> On 2020/06/19 9:22, serguei.spitsyn at oracle.com wrote:
>> Hi Yasumasa,
>>
>> It would be even more safe to run the JDI tests as well.
>> The ObjectReference owningThread(), waitingThreads() and entryCount()
>> are based on this JVMTI function.
>> See:
>> https://docs.oracle.com/en/java/javase/14/docs/api/jdk.jdi/com/sun/jdi/ObjectReference.html
>>
>> Thanks,
>> Serguei
>>
>>
>> On 6/18/20 17:01, Yasumasa Suenaga wrote:
>>> Thanks Serguei!
>>> >>> I fixed them, and the change works fine on my laptop with >>> serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64. >>> I will push it later. >>> >>> >>> Yasumasa >>> >>> >>> On 2020/06/19 6:42, serguei.spitsyn at oracle.com wrote: >>>> Hi Yasumasa, >>>> >>>> This looks good, nice simplification. >>>> >>>> A couple of minor comments. >>>> >>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/src/hotspot/share/prims/jvmtiEnvBase.cpp.frames.html >>>> >>>> >>>> 995 ThreadsListHandle tlh(current_thread); 1052 ThreadsListHandle >>>> tlh(current_thread); >>>> >>>> We can share one tlh for both fragments. >>>> >>>> 942 HandleMark hm; 1051 HandleMark hm; >>>> >>>> The second HandleMark is not needed. >>>> Also, we can use current_thread in the first one: >>>> >>>> HandleMark hm(current_thread); >>>> >>>> >>>> I do not need to see another webrev if you fix the above. >>>> >>>> Thanks, >>>> Serguei >>>> >>>> >>>> On 6/18/20 07:56, Yasumasa Suenaga wrote: >>>>> Hi Daniel, >>>>> >>>>> On 2020/06/18 23:38, Daniel D. Daugherty wrote: >>>>>> On 6/18/20 5:07 AM, Yasumasa Suenaga wrote: >>>>>>> On 2020/06/18 17:36, David Holmes wrote: >>>>>>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote: >>>>>>>>> Hi David, >>>>>>>>> >>>>>>>>> Both ThreadsListHandle and ResourceMarks would use >>>>>>>>> `Thread::current()` for their resource. It is set as default >>>>>>>>> parameter in c'tor. >>>>>>>>> Do you mean we should it explicitly in c'tor? >>>>>>>> >>>>>>>> Yes pass current_thread so we don't do the additional >>>>>>>> unnecessary calls to Thread::current(). >>>>>>> >>>>>>> Ok, I've fixed them. Could you review again? >>>>>>> >>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/ >>>>>> >>>>>> src/hotspot/share/prims/jvmtiEnv.cpp >>>>>> ???? L2842: ? // It need to perform at safepoint for gathering >>>>>> stable data >>>>>> ???????? Perhaps: >>>>>> ????????????? 
// This needs to be performed at a safepoint to >>>>>> gather stable data >>>>> >>>>> I will change it before pushing. >>>>> >>>>> >>>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp >>>>>> ???? No comments. >>>>>> >>>>>> Thumbs up. >>>>>> >>>>>> What testing has been done on this fix? >>>>> >>>>> I tested this change on serviceability/jvmti and >>>>> vmTestbase/nsk/jvmti on Linux x64. >>>>> >>>>> >>>>>> Also, please wait to hear from Serguei on this fix... >>>>> >>>>> Ok. >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Yasumasa >>>>>>> >>>>>>> >>>>>>>> David >>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Yasumasa >>>>>>>>> >>>>>>>>> >>>>>>>>> On 2020/06/18 13:58, David Holmes wrote: >>>>>>>>>> Hi Yasumasa, >>>>>>>>>> >>>>>>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote: >>>>>>>>>>> Hi Serguei, >>>>>>>>>>> >>>>>>>>>>> Thanks for your comment! >>>>>>>>>>> I uploaded new webrev: >>>>>>>>>>> >>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/ >>>>>>>>>>> >>>>>>>>>>> I'm not sure the following change is correct. >>>>>>>>>>> Can we assume owning_thread is not NULL at safepoint? >>>>>>>>>> >>>>>>>>>> We can if "owner != NULL". So that change seem fine to me. >>>>>>>>>> >>>>>>>>>> But given this is now only executed at a safepoint there are >>>>>>>>>> additional simplifications that can be made: >>>>>>>>>> >>>>>>>>>> - current thread determination can be simplified: >>>>>>>>>> >>>>>>>>>> 945?? Thread* current_thread = Thread::current(); >>>>>>>>>> >>>>>>>>>> becomes: >>>>>>>>>> >>>>>>>>>> ??? Thread* current_thread = VMThread::vm_thread(); >>>>>>>>>> ??? assert(current_thread == Thread::current(), "must be"); >>>>>>>>>> >>>>>>>>>> - these comments can be removed >>>>>>>>>> >>>>>>>>>> ??994?????? // Use current thread since function can be >>>>>>>>>> called from a >>>>>>>>>> ??995?????? // JavaThread or the VMThread. >>>>>>>>>> 1053?????? 
// Use current thread since function can be called >>>>>>>>>> from a >>>>>>>>>> 1054?????? // JavaThread or the VMThread. >>>>>>>>>> >>>>>>>>>> - these TLH constructions should be passing current_thread >>>>>>>>>> (existing bug) >>>>>>>>>> >>>>>>>>>> 996?????? ThreadsListHandle tlh; >>>>>>>>>> 1055?????? ThreadsListHandle tlh; >>>>>>>>>> >>>>>>>>>> - All ResourceMarks should be passing current_thread >>>>>>>>>> (existing bug) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Aside: there is a major inconsistency between the spec and >>>>>>>>>> implementation for this method. I've traced the history to >>>>>>>>>> see how this came about from JVMDI (ref JDK-4546581) but it >>>>>>>>>> never resulted in the JVM TI specification clearly stating >>>>>>>>>> what the waiters/waiter_count means. I will file a bug to >>>>>>>>>> have the spec clarified to match the implementation (even >>>>>>>>>> though I think the implementation is what is wrong). :( >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> All tests on submit repo and serviceability/jvmti and >>>>>>>>>>> vmTestbase/nsk/jvmti have been passed with this change. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> ??????? // This monitor is owned so we have to find the >>>>>>>>>>> owning JavaThread. >>>>>>>>>>> ??????? owning_thread = >>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>> -????? // Cannot assume (owning_thread != NULL) here because >>>>>>>>>>> this function >>>>>>>>>>> -????? // may not have been called at a safepoint and the >>>>>>>>>>> owning_thread >>>>>>>>>>> -????? // might not be suspended. >>>>>>>>>>> -????? if (owning_thread != NULL) { >>>>>>>>>>> -??????? // The monitor's owner either has to be the current >>>>>>>>>>> thread, at safepoint >>>>>>>>>>> -??????? // or it has to be suspended. Any of these >>>>>>>>>>> conditions will prevent both >>>>>>>>>>> -??????? 
// contending and waiting threads from modifying >>>>>>>>>>> the state of >>>>>>>>>>> -??????? // the monitor. >>>>>>>>>>> -??????? if (!at_safepoint && >>>>>>>>>>> !owning_thread->is_thread_fully_suspended(true, &debug_bits)) { >>>>>>>>>>> -????????? // Don't worry! This return of >>>>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED >>>>>>>>>>> -????????? // will not make it back to the JVM/TI agent. The >>>>>>>>>>> error code will >>>>>>>>>>> -????????? // get intercepted in >>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage() which >>>>>>>>>>> -????????? // will retry the call via a >>>>>>>>>>> VM_GetObjectMonitorUsage VM op. >>>>>>>>>>> -????????? return JVMTI_ERROR_THREAD_NOT_SUSPENDED; >>>>>>>>>>> -??????? } >>>>>>>>>>> -??????? HandleMark hm; >>>>>>>>>>> +????? assert(owning_thread != NULL, "owning JavaThread must >>>>>>>>>>> not be NULL"); >>>>>>>>>>> ????????? Handle???? th(current_thread, >>>>>>>>>>> owning_thread->threadObj()); >>>>>>>>>>> ????????? ret.owner = (jthread)jni_reference(calling_thread, >>>>>>>>>>> th); >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Yasumasa >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>> >>>>>>>>>>>> This fix is not enough. >>>>>>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works >>>>>>>>>>>> in two modes: in VMop and non-VMop. >>>>>>>>>>>> The non-VMop mode has to be removed. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Serguei >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote: >>>>>>>>>>>>> (Change subject for RFR) >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I filed it to JBS and upload a webrev for it. >>>>>>>>>>>>> Could you review it? >>>>>>>>>>>>> >>>>>>>>>>>>> ? JBS: https://bugs.openjdk.java.net/browse/JDK-8247729 >>>>>>>>>>>>> ? 
webrev: >>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/ >>>>>>>>>>>>> >>>>>>>>>>>>> This change has passed tests on submit repo. >>>>>>>>>>>>> Also I tested it with serviceability/jvmti and >>>>>>>>>>>>> vmTestbase/nsk/jvmti on Linux x64. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>> Yes. It seems we have a consensus. >>>>>>>>>>>>>> Thank you for taking care about it. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 6/16/20 18:34, David Holmes wrote: >>>>>>>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Go for it! :) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>>>>> Hi Dan, David and Yasumasa, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>> Hi Dan, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. 
Daugherty wrote: >>>>>>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I wonder why >>>>>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() >>>>>>>>>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) >>>>>>>>>>>>>>>>>>>>>>>>>>>> does not perform at safepoint. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint >>>>>>>>>>>>>>>>>>>>>>>>>>> if the target is not suspended: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject >>>>>>>>>>>>>>>>>>>>>>>>>>> object, jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>>>>>>>>>>>>>>> ?? JavaThread* calling_thread = >>>>>>>>>>>>>>>>>>>>>>>>>>> JavaThread::current(); >>>>>>>>>>>>>>>>>>>>>>>>>>> ?? jvmtiError err = >>>>>>>>>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, >>>>>>>>>>>>>>>>>>>>>>>>>>> object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>>>>>> ?? if (err == >>>>>>>>>>>>>>>>>>>>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>>>>>>>>>>>>>>> ???? // Some of the critical threads were >>>>>>>>>>>>>>>>>>>>>>>>>>> not suspended. go to a safepoint and try again >>>>>>>>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, >>>>>>>>>>>>>>>>>>>>>>>>>>> calling_thread, object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op); >>>>>>>>>>>>>>>>>>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>>>>>>>>>>>>>>>>>> ?? } >>>>>>>>>>>>>>>>>>>>>>>>>>> ?? 
return err; >>>>>>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some >>>>>>>>>>>>>>>>>>>>>>>>>> cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED >>>>>>>>>>>>>>>>>>>>>>>>>> is not returned from get_object_monitor_usage(). >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from >>>>>>>>>>>>>>>>>>>>>>>>>>>> monitor object at first [1], but it would >>>>>>>>>>>>>>>>>>>>>>>>>>>> perform concurrently. >>>>>>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner >>>>>>>>>>>>>>>>>>>>>>>>>>>> might be changed to others in subsequent code. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the >>>>>>>>>>>>>>>>>>>>>>>>>>>> monitor before [2]. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an >>>>>>>>>>>>>>>>>>>>>>>>>>> owner thread it is either suspended or not. >>>>>>>>>>>>>>>>>>>>>>>>>>> If it is suspended then it cannot release >>>>>>>>>>>>>>>>>>>>>>>>>>> the monitor. If it is not suspended we >>>>>>>>>>>>>>>>>>>>>>>>>>> detect that and redo the whole query at a >>>>>>>>>>>>>>>>>>>>>>>>>>> safepoint. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume >>>>>>>>>>>>>>>>>>>>>>>>>> unfortunately after suspending check. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming >>>>>>>>>>>>>>>>>>>>>>>>> also required a safepoint but it only requires >>>>>>>>>>>>>>>>>>>>>>>>> the Threads_lock. So yes the code is wrong. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Which code is wrong? 
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the >>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller >>>>>>>>>>>>>>>>>>>>>>>> has started the process of gathering the >>>>>>>>>>>>>>>>>>>>>>>> information while not at a >>>>>>>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by >>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent >>>>>>>>>>>>>>>>>>>>>>>> code. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> The code tries to make sure that it either >>>>>>>>>>>>>>>>>>>>>>> collects data about a monitor owned by a thread >>>>>>>>>>>>>>>>>>>>>>> that is suspended, or else it collects that data >>>>>>>>>>>>>>>>>>>>>>> at a safepoint. But the owning thread can be >>>>>>>>>>>>>>>>>>>>>>> resumed just after the code determined it was >>>>>>>>>>>>>>>>>>>>>>> suspended. The monitor can then be released and >>>>>>>>>>>>>>>>>>>>>>> the information gathered not only stale but >>>>>>>>>>>>>>>>>>>>>>> potentially completely wrong as it could now be >>>>>>>>>>>>>>>>>>>>>>> owned by a different thread and will report that >>>>>>>>>>>>>>>>>>>>>>> thread's entry count. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then >>>>>>>>>>>>>>>>>>>>>> as soon as >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the >>>>>>>>>>>>>>>>>>>>>> information >>>>>>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the >>>>>>>>>>>>>>>>>>>>>> implementation returns >>>>>>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the >>>>>>>>>>>>>>>>>>>>>> target thread >>>>>>>>>>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is >>>>>>>>>>>>>>>>>>>>> fine. But the expectation is that the information >>>>>>>>>>>>>>>>>>>>> was actually an accurate snapshot of the state of >>>>>>>>>>>>>>>>>>>>> the monitor at some point in time. 
The current >>>>>>>>>>>>>>>>>>>>> code does not ensure that. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Please explain. I clearly don't understand why you >>>>>>>>>>>>>>>>>>>> think the info >>>>>>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state >>>>>>>>>>>>>>>>>>>> of the monitor >>>>>>>>>>>>>>>>>>>> at some point in time". >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is >>>>>>>>>>>>>>>>>>> no atomicity**. The reported owner thread may not >>>>>>>>>>>>>>>>>>> own it any longer when the entry count is read, so >>>>>>>>>>>>>>>>>>> straight away you may have the wrong entry count >>>>>>>>>>>>>>>>>>> information. The set of threads trying to acquire >>>>>>>>>>>>>>>>>>> the monitor, or wait on the monitor can change in >>>>>>>>>>>>>>>>>>> unexpected ways. It would be possible for instance >>>>>>>>>>>>>>>>>>> to report the same thread as being the owner, being >>>>>>>>>>>>>>>>>>> blocked trying to enter the monitor, and being in >>>>>>>>>>>>>>>>>>> the wait-set of the monitor - apparently all at the >>>>>>>>>>>>>>>>>>> same time! >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ** even if the owner is suspended we don't have >>>>>>>>>>>>>>>>>>> complete atomicity because threads can join the set >>>>>>>>>>>>>>>>>>> of threads trying to enter the monitor (unless they >>>>>>>>>>>>>>>>>>> are all suspended). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ >>>>>>>>>>>>>>>>>> suspended: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() uses a safepoint to >>>>>>>>>>>>>>>>>> gather the info about >>>>>>>>>>>>>>>>>> ??? the object's monitor. Since we're at a safepoint, >>>>>>>>>>>>>>>>>> the info that >>>>>>>>>>>>>>>>>> ??? we are gathering cannot change until we return >>>>>>>>>>>>>>>>>> from the safepoint. >>>>>>>>>>>>>>>>>> ??? It is a snapshot and a valid one at that. 
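For readers following the argument, the atomicity point can be shown with a small stand-alone C++ sketch. It is not HotSpot code: ToyMonitor, Usage, read_unsynchronized, read_at_safepoint and handover_a_to_b are hypothetical names, and the "safepoint" is modelled simply as a read during which nothing else runs. The interleave callback stands in for whatever other threads might do between two field reads.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Toy stand-in for an object monitor (assumption: NOT HotSpot's ObjectMonitor).
struct ToyMonitor {
  std::string owner;  // name of the owning thread, empty if unowned
  int entry_count;    // recursion count held by `owner`
};

// The pair of values a query reports back to its caller.
struct Usage {
  std::string owner;
  int entry_count;
};

// Field-by-field read while other threads keep running. `interleave`
// models what those threads may do between the two reads.
Usage read_unsynchronized(ToyMonitor& m,
                          const std::function<void(ToyMonitor&)>& interleave) {
  Usage u;
  u.owner = m.owner;              // first read
  interleave(m);                  // the monitor may change hands here
  u.entry_count = m.entry_count;  // second read may belong to a new owner
  return u;
}

// Read with every mutator stopped: nothing can interleave, so both
// fields describe the same instant.
Usage read_at_safepoint(const ToyMonitor& m) {
  return Usage{m.owner, m.entry_count};
}

// Hypothetical hand-over: A releases the monitor, B enters it three times.
void handover_a_to_b(ToyMonitor& m) {
  m.owner = "B";
  m.entry_count = 3;
}
```

With a monitor initially owned by A with an entry count of 1, read_unsynchronized(m, handover_a_to_b) reports owner A together with entry count 3, a pair that never existed at any single moment, whereas read_at_safepoint reports A and 1.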
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   - GetObjectMonitorUsage() will gather info about the object's
>>>>>>>>>>>>>>>>>>     monitor while _not_ at a safepoint. Assuming that no other
>>>>>>>>>>>>>>>>>>     thread is suspended, then entry_count can change because
>>>>>>>>>>>>>>>>>>     another thread can block on entry while we are gathering
>>>>>>>>>>>>>>>>>>     info. waiter_count and waiters can change if a thread was
>>>>>>>>>>>>>>>>>>     in a timed wait that has timed out and now that thread is
>>>>>>>>>>>>>>>>>>     blocked on re-entry. I don't think that notify_waiter_count
>>>>>>>>>>>>>>>>>>     and notify_waiters can change.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     So in this case, the owner info and notify info is stable,
>>>>>>>>>>>>>>>>>>     but the entry_count and waiter info is not stable.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Consider the case when the monitor is not owned:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   - GetObjectMonitorUsage() will start to gather info about the
>>>>>>>>>>>>>>>>>>     object's monitor while _not_ at a safepoint. If it finds a
>>>>>>>>>>>>>>>>>>     thread on the entry queue that is not suspended, then it will
>>>>>>>>>>>>>>>>>>     bail out and redo the info gather at a safepoint. I just
>>>>>>>>>>>>>>>>>>     noticed that it doesn't check for suspension for the threads
>>>>>>>>>>>>>>>>>>     on the waiters list so a timed Object.wait() call can cause
>>>>>>>>>>>>>>>>>>     some confusion here.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     So in this case, the owner info is not stable if a thread
>>>>>>>>>>>>>>>>>>    
comes out of a timed wait and reenters the >>>>>>>>>>>>>>>>>> monitor. This >>>>>>>>>>>>>>>>>> ??? case is no different than if a "barger" thread >>>>>>>>>>>>>>>>>> comes in >>>>>>>>>>>>>>>>>> ??? after the NULL owner field is observed and enters >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> ??? monitor. We'll return that there is no owner, a >>>>>>>>>>>>>>>>>> list of >>>>>>>>>>>>>>>>>> ??? suspended pending entry thread and a list of waiting >>>>>>>>>>>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>>>>>>>>>>> ??? owned by the "barger" that completely bypassed >>>>>>>>>>>>>>>>>> the entry >>>>>>>>>>>>>>>>>> ??? queue by virtue of seeing the NULL owner field at >>>>>>>>>>>>>>>>>> exactly >>>>>>>>>>>>>>>>>> ??? the right time. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> So the owner field is only stable when we have an >>>>>>>>>>>>>>>>>> owner. If >>>>>>>>>>>>>>>>>> that owner is not suspended, then the other fields >>>>>>>>>>>>>>>>>> are also >>>>>>>>>>>>>>>>>> stable because we gathered the info at a safepoint. >>>>>>>>>>>>>>>>>> If the >>>>>>>>>>>>>>>>>> owner is suspended, then the owner and notify info is >>>>>>>>>>>>>>>>>> stable, >>>>>>>>>>>>>>>>>> but the entry_count and waiter info is not stable. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> If we have a NULL owner field, then the info is only >>>>>>>>>>>>>>>>>> stable >>>>>>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. >>>>>>>>>>>>>>>>>> Ouch! >>>>>>>>>>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint >>>>>>>>>>>>>>>>>> is all >>>>>>>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we >>>>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>>>> do that at a safepoint when the owner thread is not >>>>>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>>>>>>>>> suspended. 
If either of those conditions is not true, >>>>>>>>>>>>>>>>>> then >>>>>>>>>>>>>>>>>> the different pieces of info is unstable to varying >>>>>>>>>>>>>>>>>> degrees. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> As for this claim: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> It would be possible for instance to report the same >>>>>>>>>>>>>>>>>>> thread >>>>>>>>>>>>>>>>>>> as being the owner, being blocked trying to enter >>>>>>>>>>>>>>>>>>> the monitor, >>>>>>>>>>>>>>>>>>> and being in the wait-set of the monitor - >>>>>>>>>>>>>>>>>>> apparently all at >>>>>>>>>>>>>>>>>>> the same time! >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I can't figure out a way to make that scenario work. >>>>>>>>>>>>>>>>>> If the >>>>>>>>>>>>>>>>>> thread is seen as the owner and is not suspended, >>>>>>>>>>>>>>>>>> then we >>>>>>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then >>>>>>>>>>>>>>>>>> it can't >>>>>>>>>>>>>>>>>> then be seen as on the entry queue or on the wait >>>>>>>>>>>>>>>>>> queue since >>>>>>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and >>>>>>>>>>>>>>>>>> is not >>>>>>>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is >>>>>>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen >>>>>>>>>>>>>>>>>> on the >>>>>>>>>>>>>>>>>> wait queue. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Handshaking is not going to make this situation any >>>>>>>>>>>>>>>>>> better >>>>>>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned >>>>>>>>>>>>>>>>>> and we >>>>>>>>>>>>>>>>>> handshake with the owner, the stability or >>>>>>>>>>>>>>>>>> instability of >>>>>>>>>>>>>>>>>> the other fields remains the same as when >>>>>>>>>>>>>>>>>> SuspendThread is >>>>>>>>>>>>>>>>>> used. 
Handshaking with all threads won't make the >>>>>>>>>>>>>>>>>> data as >>>>>>>>>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>>>>>>>>> can resume execution after doing their handshake so >>>>>>>>>>>>>>>>>> there >>>>>>>>>>>>>>>>>> will still be field instability. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only >>>>>>>>>>>>>>>>>> gather >>>>>>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I agree with this. >>>>>>>>>>>>>>>>> The advantages are: >>>>>>>>>>>>>>>>> ??- the result is stable >>>>>>>>>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Performance impact is not very clear but should not be >>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>>>>>>>>>> I'm not sure if using handshakes can make performance >>>>>>>>>>>>>>>>> better. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale >>>>>>>>>>>>>>>>>>>>>> information is >>>>>>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. >>>>>>>>>>>>>>>>>>>>>> Perhaps the doc >>>>>>>>>>>>>>>>>>>>>> should have more clear about the possibility of >>>>>>>>>>>>>>>>>>>>>> returning stale >>>>>>>>>>>>>>>>>>>>>> info. That's a question for Robert F. 
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about >>>>>>>>>>>>>>>>>>>>>>> thread's being suspended so I can't see how this >>>>>>>>>>>>>>>>>>>>>>> could be construed as an agent bug. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> In your scenario above, you mention that the >>>>>>>>>>>>>>>>>>>>>> target thread was >>>>>>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called >>>>>>>>>>>>>>>>>>>>>> while the target >>>>>>>>>>>>>>>>>>>>>> was suspended, and then the target thread was >>>>>>>>>>>>>>>>>>>>>> resumed after >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, >>>>>>>>>>>>>>>>>>>>>> but before >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the >>>>>>>>>>>>>>>>>>>>>> agent should not >>>>>>>>>>>>>>>>>>>>>> resume the target thread while also calling >>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage(). >>>>>>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so >>>>>>>>>>>>>>>>>>>>>> agent bug. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an >>>>>>>>>>>>>>>>>>>>> independent resume, but you're right that doesn't >>>>>>>>>>>>>>>>>>>>> really make a lot of sense. But when the spec says >>>>>>>>>>>>>>>>>>>>> nothing about suspension ... >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> And it is intentional that suspension is not >>>>>>>>>>>>>>>>>>>> required. JVM/DI and JVM/PI >>>>>>>>>>>>>>>>>>>> used to require suspension for these kinds of >>>>>>>>>>>>>>>>>>>> get-the-info APIs. JVM/TI >>>>>>>>>>>>>>>>>>>> intentionally was designed to not require suspension. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> As I've said before, we could add a note about the >>>>>>>>>>>>>>>>>>>> data being potentially >>>>>>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it >>>>>>>>>>>>>>>>>>>> like stat(2). You can >>>>>>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee >>>>>>>>>>>>>>>>>>>> that the info is current >>>>>>>>>>>>>>>>>>>> by the time you process what you got back. Is it >>>>>>>>>>>>>>>>>>>> too much motherhood to >>>>>>>>>>>>>>>>>>>> state that the data might be stale? I could go >>>>>>>>>>>>>>>>>>>> either way... >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow >>>>>>>>>>>>>>>>>>>>>>> this to be fixed in the future without >>>>>>>>>>>>>>>>>>>>>>> forcing/using any safepoints. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I have to think about that which is why I'm >>>>>>>>>>>>>>>>>>>>>> avoiding talking about >>>>>>>>>>>>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread >>>>>>>>>>>>>>>>>>>>> whilst the monitor is queried. In effect the >>>>>>>>>>>>>>>>>>>>> operation would create a per-thread safepoint. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I "know" that, but I still need time to think about >>>>>>>>>>>>>>>>>>>> it and probably >>>>>>>>>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Semantically it is no different to the code >>>>>>>>>>>>>>>>>>>>> actually suspending the owner thread, but it can't >>>>>>>>>>>>>>>>>>>>> actually do that because suspends/resume don't nest. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we >>>>>>>>>>>>>>>>>>>> tracked internal and >>>>>>>>>>>>>>>>>>>> external suspends separately. That was a nightmare... 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is >>>>>>>>>>>>>>>>>>>>>>>>>> used to check thread state, it returns `true` >>>>>>>>>>>>>>>>>>>>>>>>>> when the thread is sleeping [3], or when it >>>>>>>>>>>>>>>>>>>>>>>>>> performs in native [4]. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended >>>>>>>>>>>>>>>>>>>>>>>>> it can't continue execution in the VM or in >>>>>>>>>>>>>>>>>>>>>>>>> Java code. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the >>>>>>>>>>>>>>>>>>>>>>>>>>> assumed common case where threads are first >>>>>>>>>>>>>>>>>>>>>>>>>>> suspended and then the monitors are queried. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it >>>>>>>>>>>>>>>>>>>>>>>>>> from JVMTI spec - it just says "Get >>>>>>>>>>>>>>>>>>>>>>>>>> information about the object's monitor." >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation >>>>>>>>>>>>>>>>>>>>>>>>> optimisation, nothing to do with the spec. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return >>>>>>>>>>>>>>>>>>>>>>>>>> incorrect information in some case. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the >>>>>>>>>>>>>>>>>>>>>>>>>> owner might be just before wakeup. 
>>>>>>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if >>>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() is called at >>>>>>>>>>>>>>>>>>>>>>>>>> safepoint in any case. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to >>>>>>>>>>>>>>>>>>>>>>>>> using Handshakes, so this particular operation >>>>>>>>>>>>>>>>>>>>>>>>> will require that the apparent owner is >>>>>>>>>>>>>>>>>>>>>>>>> Handshake-safe (by entering a handshake with >>>>>>>>>>>>>>>>>>>>>>>>> it) before querying the monitor. This would >>>>>>>>>>>>>>>>>>>>>>>>> still be preferable I think to always using a >>>>>>>>>>>>>>>>>>>>>>>>> safepoint for the entire operation. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>>>>>>>>>> [4] >>>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as >>>>>>>>>>>>>>>>>>>>>>>>>>> the thread reported as the owner may not be >>>>>>>>>>>>>>>>>>>>>>>>>>> suspended at the time we first see it, and >>>>>>>>>>>>>>>>>>>>>>>>>>> may release the monitor, but then it may get >>>>>>>>>>>>>>>>>>>>>>>>>>> suspended before we call: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> ??owning_thread = >>>>>>>>>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), >>>>>>>>>>>>>>>>>>>>>>>>>>> owner); >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor 
>>>>>>>>>>>>>>>>>>>>>>>>>>> owner and proceed to query the monitor >>>>>>>>>>>>>>>>>>>>>>>>>>> information in a racy way. This can't happen >>>>>>>>>>>>>>>>>>>>>>>>>>> when suspension itself requires a safepoint >>>>>>>>>>>>>>>>>>>>>>>>>>> as the current thread won't go to that >>>>>>>>>>>>>>>>>>>>>>>>>>> safepoint during this code. However, if >>>>>>>>>>>>>>>>>>>>>>>>>>> suspension is implemented via a direct >>>>>>>>>>>>>>>>>>>>>>>>>>> handshake with the target thread then we >>>>>>>>>>>>>>>>>>>>>>>>>>> have a problem. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>> >>>> >> From serguei.spitsyn at oracle.com Fri Jun 19 06:28:43 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Thu, 18 Jun 2020 23:28:43 -0700 Subject: RFR: JDK-8247784,Bad link causes invalid documentation In-Reply-To: References: <1165ed72-80c4-48d2-9306-d9b34e5eeecf@oracle.com> Message-ID: An HTML attachment was scrubbed... 
URL:

From suenaga at oss.nttdata.com Fri Jun 19 06:47:16 2020
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Fri, 19 Jun 2020 15:47:16 +0900
Subject: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information
In-Reply-To:
References: <06b8e360-0eb4-6c91-2bdd-d9c78fcf4fe3@oss.nttdata.com>
 <10d564f7-3690-4d74-197b-159c4b8840db@oracle.com>
 <4879757b-766e-935b-7692-ab67a4f2eddd@oracle.com>
 <3870c1c7-3d74-9b58-2e6c-2b56a0ae268a@oracle.com>
 <073f02b7-7ff0-8f4f-0bb9-b7317d255de3@oracle.com>
 <59ded1f7-7d5f-c238-d391-82b327cce40e@oss.nttdata.com>
 <35066c27-193b-ea1d-a366-deef5ec6d7c6@oss.nttdata.com>
 <1a2d56a6-624a-b889-6f71-d51bd972aa0d@oracle.com>
 <83cd0686-d744-d3fe-eb79-49cdd89b99c6@oss.nttdata.com>
Message-ID: <54b61a19-8f4d-a935-f169-e773d7e682bb@oss.nttdata.com>

Thanks Serguei!

Yasumasa

On 2020/06/19 15:25, serguei.spitsyn at oracle.com wrote:
> Hi Yasumasa,
>
> Looks good.
>
> Thanks,
> Serguei
>
>
> On 6/18/20 18:56, Yasumasa Suenaga wrote:
>> Hi Serguei,
>>
>> I tested vmTestbase/nsk/jdi with webrev.03, all tests work fine on my laptop.
>>
>>   http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.03/
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>> On 2020/06/19 9:22, serguei.spitsyn at oracle.com wrote:
>>> Hi Yasumasa,
>>>
>>> It would be even more safe to run the JDI tests as well.
>>> The ObjectReference owningThread(), waitingThreads() and entryCount() are based on this JVMTI function.
>>> See: https://docs.oracle.com/en/java/javase/14/docs/api/jdk.jdi/com/sun/jdi/ObjectReference.html
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 6/18/20 17:01, Yasumasa Suenaga wrote:
>>>> Thanks Serguei!
>>>>
>>>> I fixed them, and the change works fine on my laptop with serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64.
>>>> I will push it later.
>>>>
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2020/06/19 6:42, serguei.spitsyn at oracle.com wrote:
>>>>> Hi Yasumasa,
>>>>>
>>>>> This looks good, nice simplification.
>>>>>
>>>>> A couple of minor comments.
>>>>>
>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/src/hotspot/share/prims/jvmtiEnvBase.cpp.frames.html
>>>>>
>>>>> 995 ThreadsListHandle tlh(current_thread);
>>>>> 1052 ThreadsListHandle tlh(current_thread);
>>>>>
>>>>> We can share one tlh for both fragments.
>>>>>
>>>>> 942 HandleMark hm;
>>>>> 1051 HandleMark hm;
>>>>>
>>>>> The second HandleMark is not needed.
>>>>> Also, we can use current_thread in the first one:
>>>>>
>>>>> HandleMark hm(current_thread);
>>>>>
>>>>>
>>>>> I do not need to see another webrev if you fix the above.
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>> On 6/18/20 07:56, Yasumasa Suenaga wrote:
>>>>>> Hi Daniel,
>>>>>>
>>>>>> On 2020/06/18 23:38, Daniel D. Daugherty wrote:
>>>>>>> On 6/18/20 5:07 AM, Yasumasa Suenaga wrote:
>>>>>>>> On 2020/06/18 17:36, David Holmes wrote:
>>>>>>>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>> Both ThreadsListHandle and ResourceMarks would use `Thread::current()` for their resource. It is set as the default parameter in the c'tor.
>>>>>>>>>> Do you mean we should pass it explicitly in the c'tor?
>>>>>>>>>
>>>>>>>>> Yes, pass current_thread so we don't do the additional unnecessary calls to Thread::current().
>>>>>>>>
>>>>>>>> Ok, I've fixed them. Could you review again?
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/
>>>>>>>
>>>>>>> src/hotspot/share/prims/jvmtiEnv.cpp
>>>>>>>      L2842:   // It need to perform at safepoint for gathering stable data
>>>>>>>          Perhaps:
>>>>>>>               // This needs to be performed at a safepoint to gather stable data
>>>>>>
>>>>>> I will change it before pushing.
>>>>>>
>>>>>>
>>>>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp
>>>>>>>      No comments.
>>>>>>>
>>>>>>> Thumbs up.
>>>>>>>
>>>>>>> What testing has been done on this fix?
>>>>>>
>>>>>> I tested this change on serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64.
>>>>>>
>>>>>>
>>>>>>> Also, please wait to hear from Serguei on this fix...
>>>>>>
>>>>>> Ok.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2020/06/18 13:58, David Holmes wrote:
>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>
>>>>>>>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>> Hi Serguei,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your comment!
>>>>>>>>>>>> I uploaded new webrev:
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not sure the following change is correct.
>>>>>>>>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>>>>>>>>
>>>>>>>>>>> We can if "owner != NULL". So that change seems fine to me.
>>>>>>>>>>>
>>>>>>>>>>> But given this is now only executed at a safepoint there are additional simplifications that can be made:
>>>>>>>>>>>
>>>>>>>>>>> - current thread determination can be simplified:
>>>>>>>>>>>
>>>>>>>>>>> 945   Thread* current_thread = Thread::current();
>>>>>>>>>>>
>>>>>>>>>>> becomes:
>>>>>>>>>>>
>>>>>>>>>>>     Thread* current_thread = VMThread::vm_thread();
>>>>>>>>>>>     assert(current_thread == Thread::current(), "must be");
>>>>>>>>>>>
>>>>>>>>>>> - these comments can be removed
>>>>>>>>>>>
>>>>>>>>>>>   994       // Use current thread since function can be called from a
>>>>>>>>>>>   995       // JavaThread or the VMThread.
>>>>>>>>>>> 1053       // Use current thread since function can be called from a
>>>>>>>>>>> 1054       // JavaThread or the VMThread.
>>>>>>>>>>>
>>>>>>>>>>> - these TLH constructions should be passing current_thread (existing bug)
>>>>>>>>>>>
>>>>>>>>>>> 996       ThreadsListHandle tlh;
>>>>>>>>>>> 1055       ThreadsListHandle tlh;
>>>>>>>>>>>
>>>>>>>>>>> - All ResourceMarks should be passing current_thread (existing bug)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Aside: there is a major inconsistency between the spec and implementation for this method. I've traced the history to see how this came about from JVMDI (ref JDK-4546581) but it never resulted in the JVM TI specification clearly stating what the waiters/waiter_count means. I will file a bug to have the spec clarified to match the implementation (even though I think the implementation is what is wrong). :(
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>> All tests on submit repo and serviceability/jvmti and vmTestbase/nsk/jvmti have passed with this change.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>>         // This monitor is owned so we have to find the owning JavaThread.
>>>>>>>>>>>>         owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>>>>>>>> -      // Cannot assume (owning_thread != NULL) here because this function
>>>>>>>>>>>> -      // may not have been called at a safepoint and the owning_thread
>>>>>>>>>>>> -      // might not be suspended.
>>>>>>>>>>>> -      if (owning_thread != NULL) {
>>>>>>>>>>>> -        // The monitor's owner either has to be the current thread, at safepoint
>>>>>>>>>>>> -        // or it has to be suspended. Any of these conditions will prevent both
>>>>>>>>>>>> -        // contending and waiting threads from modifying the state of
>>>>>>>>>>>> -        // the monitor.
>>>>>>>>>>>> -        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>>>>>>>>> -          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>>>>>>>>> -          // will not make it back to the JVM/TI agent. The error code will
>>>>>>>>>>>> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
>>>>>>>>>>>> -          // will retry the call via a VM_GetObjectMonitorUsage VM op.
>>>>>>>>>>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>>>>>>>>> -        }
>>>>>>>>>>>> -        HandleMark hm;
>>>>>>>>>>>> +      assert(owning_thread != NULL, "owning JavaThread must not be NULL");
>>>>>>>>>>>>           Handle     th(current_thread, owning_thread->threadObj());
>>>>>>>>>>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>
>>>>>>>>>>>>> This fix is not enough.
>>>>>>>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two modes: in VMop and non-VMop.
>>>>>>>>>>>>> The non-VMop mode has to be removed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Serguei
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>> (Change subject for RFR)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I filed it to JBS and uploaded a webrev for it.
>>>>>>>>>>>>>> Could you review it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>>>>>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This change has passed tests on submit repo.
>>>>>>>>>>>>>> Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>>>>>>>>> Thank you for taking care of it.
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 6/16/20 18:34, David Holmes wrote: >>>>>>>>>>>>>>>>> Ok, may I file it to JBS and fix it? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Go for it! :) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote: >>>>>>>>>>>>>>>>>> Hi Dan, David and Yasumasa, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hi Dan, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote: >>>>>>>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() (implementation of GetObjectMonitorUsage()) does not perform at safepoint. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is not suspended: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> jvmtiError >>>>>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) { >>>>>>>>>>>>>>>>>>>>>>>>>>>> ?? JavaThread* calling_thread = JavaThread::current(); >>>>>>>>>>>>>>>>>>>>>>>>>>>> ?? jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>>>>>>> ?? if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) { >>>>>>>>>>>>>>>>>>>>>>>>>>>> ???? // Some of the critical threads were not suspended. go to a safepoint and try again >>>>>>>>>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr); >>>>>>>>>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op); >>>>>>>>>>>>>>>>>>>>>>>>>>>> ???? err = op.result(); >>>>>>>>>>>>>>>>>>>>>>>>>>>> ?? } >>>>>>>>>>>>>>>>>>>>>>>>>>>> ?? return err; >>>>>>>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */ >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage(). >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at first [1], but it would perform concurrently. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be changed to others in subsequent code. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor before [2]. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint. 
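The check-then-read scheme described above can be modelled with a minimal sketch. This is a standalone Java model, not HotSpot code: the class `MonitorUsageRaceSketch`, its `Monitor` and `Usage` types, and the `between` hook are hypothetical names introduced only for illustration. The hook stands in for whatever other threads might do between the suspension check and the later field reads, and the second method assumes a safepoint means nothing else runs while the fields are read.

```java
// Hypothetical model for illustration only; none of these names exist in HotSpot.
public class MonitorUsageRaceSketch {

    /** Simplified stand-in for an object monitor. */
    static final class Monitor {
        String owner;            // name of the owning thread, or null if unowned
        int entryCount;          // how many times the owner has entered
        boolean ownerSuspended;  // whether the owner is currently suspended
    }

    /** Result of a query: the owner and the entry count that were observed. */
    static final class Usage {
        final String owner;
        final int entryCount;

        Usage(String owner, int entryCount) {
            this.owner = owner;
            this.entryCount = entryCount;
        }
    }

    /**
     * Check-then-read query. Returns null when the owner is not suspended,
     * which stands for "retry at a safepoint". The 'between' hook models
     * activity by other threads after the suspension check has passed.
     */
    static Usage queryWithoutSafepoint(Monitor m, Runnable between) {
        String observedOwner = m.owner;
        if (observedOwner != null && !m.ownerSuspended) {
            return null;
        }
        between.run();
        return new Usage(observedOwner, m.entryCount);
    }

    /** Safepoint-style query: assumes no other thread runs during the reads. */
    static Usage queryAtSafepoint(Monitor m) {
        return new Usage(m.owner, m.entryCount);
    }

    public static void main(String[] args) {
        Monitor m = new Monitor();
        m.owner = "A";
        m.entryCount = 3;
        m.ownerSuspended = true;

        // After the check, A is resumed and releases; B then enters once.
        Usage racy = queryWithoutSafepoint(m, () -> {
            m.ownerSuspended = false;
            m.owner = "B";
            m.entryCount = 1;
        });
        System.out.println("racy:   owner=" + racy.owner + " entryCount=" + racy.entryCount);

        Usage stable = queryAtSafepoint(m);
        System.out.println("stable: owner=" + stable.owner + " entryCount=" + stable.entryCount);
    }
}
```

In this model the first query pairs the owner seen before the check with an entry count read afterwards, so the two values need not describe the same moment; the second query reads both fields with nothing running in between.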
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume unfortunately after suspending check. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Which code is wrong? >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller >>>>>>>>>>>>>>>>>>>>>>>>> has started the process of gathering the information while not at a >>>>>>>>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects data about a monitor owned by a thread that is suspended, or else it collects that data at a safepoint. But the owning thread can be resumed just after the code determined it was suspended. The monitor can then be released and the information gathered not only stale but potentially completely wrong as it could now be owned by a different thread and will report that thread's entry count. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as >>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information >>>>>>>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation returns >>>>>>>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the target thread >>>>>>>>>>>>>>>>>>>>>>> could have moved on. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. But the expectation is that the information was actually an accurate snapshot of the state of the monitor at some point in time. The current code does not ensure that. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Please explain. I clearly don't understand why you think the info >>>>>>>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the monitor >>>>>>>>>>>>>>>>>>>>> at some point in time". >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no atomicity**. The reported owner thread may not own it any longer when the entry count is read, so straight away you may have the wrong entry count information. The set of threads trying to acquire the monitor, or wait on the monitor can change in unexpected ways. It would be possible for instance to report the same thread as being the owner, being blocked trying to enter the monitor, and being in the wait-set of the monitor - apparently all at the same time! >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ** even if the owner is suspended we don't have complete atomicity because threads can join the set of threads trying to enter the monitor (unless they are all suspended). >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ suspended: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() uses a safepoint to gather the info about >>>>>>>>>>>>>>>>>>> ??? the object's monitor. Since we're at a safepoint, the info that >>>>>>>>>>>>>>>>>>> ??? we are gathering cannot change until we return from the safepoint. >>>>>>>>>>>>>>>>>>> ??? It is a snapshot and a valid one at that. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() will gather info about the object's >>>>>>>>>>>>>>>>>>> ??? monitor while _not_ at a safepoint. Assuming that no other >>>>>>>>>>>>>>>>>>> ??? thread is suspended, then entry_count can change because >>>>>>>>>>>>>>>>>>> ??? another thread can block on entry while we are gathering >>>>>>>>>>>>>>>>>>> ??? info. 
waiter_count and waiters can change if a thread was >>>>>>>>>>>>>>>>>>> ??? in a timed wait that has timed out and now that thread is >>>>>>>>>>>>>>>>>>> ??? blocked on re-entry. I don't think that notify_waiter_count >>>>>>>>>>>>>>>>>>> ??? and notify_waiters can change. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ??? So in this case, the owner info and notify info is stable, >>>>>>>>>>>>>>>>>>> ??? but the entry_count and waiter info is not stable. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Consider the case when the monitor is not owned: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ? - GetObjectMonitorUsage() will start to gather info about the >>>>>>>>>>>>>>>>>>> ??? object's monitor while _not_ at a safepoint. If it finds a >>>>>>>>>>>>>>>>>>> ??? thread on the entry queue that is not suspended, then it will >>>>>>>>>>>>>>>>>>> ??? bail out and redo the info gather at a safepoint. I just >>>>>>>>>>>>>>>>>>> ??? noticed that it doesn't check for suspension for the threads >>>>>>>>>>>>>>>>>>> ??? on the waiters list so a timed Object.wait() call can cause >>>>>>>>>>>>>>>>>>> ??? some confusion here. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ??? So in this case, the owner info is not stable if a thread >>>>>>>>>>>>>>>>>>> ??? comes out of a timed wait and reenters the monitor. This >>>>>>>>>>>>>>>>>>> ??? case is no different than if a "barger" thread comes in >>>>>>>>>>>>>>>>>>> ??? after the NULL owner field is observed and enters the >>>>>>>>>>>>>>>>>>> ??? monitor. We'll return that there is no owner, a list of >>>>>>>>>>>>>>>>>>> ??? suspended pending entry thread and a list of waiting >>>>>>>>>>>>>>>>>>> ??? threads. The reality is that the object's monitor is >>>>>>>>>>>>>>>>>>> ??? owned by the "barger" that completely bypassed the entry >>>>>>>>>>>>>>>>>>> ??? queue by virtue of seeing the NULL owner field at exactly >>>>>>>>>>>>>>>>>>> ??? the right time. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> So the owner field is only stable when we have an owner. 
If >>>>>>>>>>>>>>>>>>> that owner is not suspended, then the other fields are also >>>>>>>>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the >>>>>>>>>>>>>>>>>>> owner is suspended, then the owner and notify info is stable, >>>>>>>>>>>>>>>>>>> but the entry_count and waiter info is not stable. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable >>>>>>>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch! >>>>>>>>>>>>>>>>>>> That's deterministic, but not without some work. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all >>>>>>>>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only >>>>>>>>>>>>>>>>>>> do that at a safepoint when the owner thread is not suspended >>>>>>>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not >>>>>>>>>>>>>>>>>>> suspended. If either of those conditions is not true, then >>>>>>>>>>>>>>>>>>> the different pieces of info is unstable to varying degrees. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> As for this claim: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> It would be possible for instance to report the same thread >>>>>>>>>>>>>>>>>>>> as being the owner, being blocked trying to enter the monitor, >>>>>>>>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at >>>>>>>>>>>>>>>>>>>> the same time! >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the >>>>>>>>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we >>>>>>>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't >>>>>>>>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue since >>>>>>>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not >>>>>>>>>>>>>>>>>>> suspended, then we gather info at a safepoint. 
If it is >>>>>>>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the >>>>>>>>>>>>>>>>>>> wait queue. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> So the info instability of this API is bad, but it's not >>>>>>>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Handshaking is not going to make this situation any better >>>>>>>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we >>>>>>>>>>>>>>>>>>> handshake with the owner, the stability or instability of >>>>>>>>>>>>>>>>>>> the other fields remains the same as when SuspendThread is >>>>>>>>>>>>>>>>>>> used. Handshaking with all threads won't make the data as >>>>>>>>>>>>>>>>>>> stable as when at a safepoint because individual threads >>>>>>>>>>>>>>>>>>> can resume execution after doing their handshake so there >>>>>>>>>>>>>>>>>>> will still be field instability. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather >>>>>>>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I agree with this. >>>>>>>>>>>>>>>>>> The advantages are: >>>>>>>>>>>>>>>>>> ??- the result is stable >>>>>>>>>>>>>>>>>> ??- the implementation can be simplified >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Performance impact is not very clear but should not be that >>>>>>>>>>>>>>>>>> big as suspending all the threads has some overhead too. >>>>>>>>>>>>>>>>>> I'm not sure if using handshakes can make performance better. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Ok, may I file it to JBS and fix it? 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Serguei >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale information is >>>>>>>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc >>>>>>>>>>>>>>>>>>>>>>> should have more clear about the possibility of returning stale >>>>>>>>>>>>>>>>>>>>>>> info. That's a question for Robert F. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's being suspended so I can't see how this could be construed as an agent bug. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> In your scenario above, you mention that the target thread was >>>>>>>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the target >>>>>>>>>>>>>>>>>>>>>>> was suspended, and then the target thread was resumed after >>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before >>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage() >>>>>>>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent should not >>>>>>>>>>>>>>>>>>>>>>> resume the target thread while also calling GetObjectMonitorUsage(). >>>>>>>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so agent bug. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an independent resume, but you're right that doesn't really make a lot of sense. But when the spec says nothing about suspension ... 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> And it is intentional that suspension is not required. JVM/DI and JVM/PI >>>>>>>>>>>>>>>>>>>>> used to require suspension for these kinds of get-the-info APIs. JVM/TI >>>>>>>>>>>>>>>>>>>>> intentionally was designed to not require suspension. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> As I've said before, we could add a note about the data being potentially >>>>>>>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like stat(2). You can >>>>>>>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the info is current >>>>>>>>>>>>>>>>>>>>> by the time you process what you got back. Is it too much motherhood to >>>>>>>>>>>>>>>>>>>>> state that the data might be stale? I could go either way... >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this to be fixed in the future without forcing/using any safepoints. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding talking about >>>>>>>>>>>>>>>>>>>>>>> handshakes in this thread. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread whilst the monitor is queried. In effect the operation would create a per-thread safepoint. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I "know" that, but I still need time to think about it and probably >>>>>>>>>>>>>>>>>>>>> see the code to see if there are holes... >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Semantically it is no different to the code actually suspending the owner thread, but it can't actually do that because suspends/resume don't nest. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we tracked internal and >>>>>>>>>>>>>>>>>>>>> external suspends separately. That was a nightmare... 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Dan >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check thread state, it returns `true` when the thread is sleeping [3], or when it performs in native [4]. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI spec - it just says "Get information about the object's monitor." >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, nothing to do with the spec. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect information in some case. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner might be just before wakeup. >>>>>>>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is called at safepoint in any case. 
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 >>>>>>>>>>>>>>>>>>>>>>>>>>> [4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> ??owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 >>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>> >>>>> >>> > From david.holmes at oracle.com Fri Jun 19 06:55:08 2020 From: david.holmes at oracle.com (David Holmes) Date: Fri, 19 Jun 2020 16:55:08 +1000 Subject: Qu: JDK-8247901, Multiple conflicting @return for FlightRecorderMXBean In-Reply-To: <450c343a-ecdd-8bf7-e465-7e8614b6cfec@oracle.com> References: <450c343a-ecdd-8bf7-e465-7e8614b6cfec@oracle.com> Message-ID: <5f9462d7-52ec-6245-a1ff-35d7d8a4d727@oracle.com> Hi Jon, Redirecting to hotspot-jfr-dev Cheers, David On 19/06/2020 12:45 pm, Jonathan Gibbons wrote: > I have filed JDK-8247901, to cover an issue detected by doclint, regarding > two conflicting @return descriptions for a single method. > > I can make the fix, if you want, but I need confirmation of which one > should > be deleted and which should be retained.? I can make an informed guess > (one of them refers to `null` in the context of a `long` return code!) > but I would prefer that someone with domain-specific knowledge make the > call. 
>
> -- Jon
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8247901
>
> Here, if it helps, is the description from the bug:
>
> doclint reports:
> open/src/jdk.management.jfr/share/classes/jdk/management/jfr/FlightRecorderMXBean.java:213: warning: @return has already been specified
>      * @return a unique ID that can be used for reading recording data.
>        ^
> 1 warning
>
> The source shows:
>      * @return a snapshot of all available recording data, not {@code null}
>      *
>      * @throws java.lang.SecurityException if a security manager exists and the
>      *         caller does not have {@code ManagementPermission("control")}
>      *
>      * @return a unique ID that can be used for reading recording data.
>

From david.holmes at oracle.com Fri Jun 19 07:01:34 2020
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 19 Jun 2020 17:01:34 +1000
Subject: RFR: https://bugs.openjdk.java.net/browse/JDK-8247784
In-Reply-To: <1165ed72-80c4-48d2-9306-d9b34e5eeecf@oracle.com>
References: <1165ed72-80c4-48d2-9306-d9b34e5eeecf@oracle.com>
Message-ID: <975e9e71-3a96-55fc-51cf-ab8f8019c9cc@oracle.com>

Looks good and trivial.

Thanks,
David

On 19/06/2020 1:12 pm, Jonathan Gibbons wrote:
> Please review some changes to fix typos in some recent doc updates.
>
> In two places, ${docRoot} is used instead of {@docRoot}
>
> -- Jon
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8247784
>
> Patch:
>
> diff -r c5904de55565 src/jdk.jdi/share/classes/com/sun/jdi/Type.java
> --- a/src/jdk.jdi/share/classes/com/sun/jdi/Type.java Thu Jun 18 17:32:57 2020 -0700
> +++ b/src/jdk.jdi/share/classes/com/sun/jdi/Type.java Thu Jun 18 20:05:42 2020 -0700
> @@ -152,7 +152,7 @@
>       * Returns the name of this type. The result is of the same form as
>       * the name returned by {@link Class#getName()}.
>       * The returned name may not be a
> -     * <a href="${docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
> +     * <a href="{@docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
>       *
>       * @return the name of this type
>       */
> diff -r c5904de55565 src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java
> --- a/src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java Thu Jun 18 17:32:57 2020 -0700
> +++ b/src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java Thu Jun 18 20:05:42 2020 -0700
> @@ -44,7 +44,7 @@
>      /**
>       * Returns the {@linkplain com.sun.jdi.Type#name() name of the class}
>       * that has been unloaded. The returned string may not be a
> -     * <a href="${docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
> +     * <a href="{@docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary name</a>.
>       *
>       * @see Class#getName()
>       */
>

From david.holmes at oracle.com Fri Jun 19 07:24:10 2020
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 19 Jun 2020 17:24:10 +1000
Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
In-Reply-To: <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com>
References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com>
 <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com>
 <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com>
 <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com>
 <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com>
 <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com>
Message-ID:

Hi Chris,

On 19/06/2020 8:55 am, Chris Plummer wrote:
> On 6/18/20 1:43 AM, David Holmes wrote:
>> On 18/06/2020 4:49 pm, Chris Plummer wrote:
>>> On 6/17/20 10:29 PM, David Holmes wrote:
>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote:
>>>>> On 6/17/20 10:09 PM, David Holmes wrote:
>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote:
>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote:
>>>>>>>> Hi Chris,
>>>>>>>>
>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote:
>>>>>>>>>
Hello, >>>>>>>>> >>>>>>>>> Please help review the following: >>>>>>>>> >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>> >>>>>>>>> The CR contains all the needed details. Here's a summary of >>>>>>>>> changes in each file: >>>>>>>> >>>>>>>> The problem sounds to me like a variation of the more general >>>>>>>> problem of not ensuring a thread is kept alive whilst acting >>>>>>>> upon it. I don't know how the SA finds these references to the >>>>>>>> threads it is going to stackwalk, but is it possible to fix this >>>>>>>> via appropriate uses of ThreadsListHandle/Iterator? >>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>> >>>>>>> Keep in mind that once SA attaches, nothing in the VM changes. >>>>>>> For example, SA can't create a wrapper to a JavaThread, only to >>>>>>> have the JavaThread be freed later on. It's just not possible. >>>>>> >>>>>> Then how does it obtain a reference to a JavaThread for which the >>>>>> native OS thread id is invalid? Any thread found in >>>>>> _java_thread_list is either live or still to be started. In the >>>>>> latter case the JavaThread->osThread does not have its thread_id >>>>>> set yet. >>>>>> >>>>> My assumption was that the JavaThread is in the process of being >>>>> destroyed, and it has freed its OS thread but is itself still in >>>>> the thread list. I did notice that the OS thread id being used >>>>> looked to be in the range of thread id #'s you would expect for the >>>>> running app, so that to me indicated it was once valid, but is no >>>>> more. >>>>> >>>>> Keep in mind that although hotspot may have synchronization code >>>>> that prevents you from pulling a JavaThread off the thread list >>>>> when it is in the process of being destroyed (I'm guessing it >>>>> does), SA has no such protections. >>>> >>>> But you stated that once the SA has attached, the target VM can't >>>> change. 
If the SA gets its set of thread from one attach then tries >>>> to make queries about those threads in a separate attach, then >>>> obviously it could be providing garbage thread information. So you >>>> would need to re-validate the JavaThread in the target VM before >>>> trying to do anything with it. >>> That's not what is going on here. It's attaching and doing a stack >>> trace, which involves getting the thread list and iterating through >>> all threads without detaching. >> >> Okay so I restate my original comment - all the JavaThreads must be >> alive or not yet started, so how are you encountering an invalid >> thread id? Any thread you find via the ThreadsList can't have >> destroyed its osThread. In any case the logic should be checking >> thread->osThread() for NULL, and then osThread()->get_state() to >> ensure it is >= INITIALIZED before using the thread_id(). > Hi David, > > I chatted with Dan about this, and he said since the JavaThread is > responsible for removing itself from the ThreadList, it is impossible to > have a JavaThread still on the ThreadList, but without and underlying OS > Thread. So I'm a bit perplexed as to how I can find a JavaThread on the > ThreadList, but that results in ESRCH when trying to access the thread > with ptrace. My only conclusion is that this failure is somehow > spurious, and maybe the issue it just that the thread is in some > temporary state that prevents its access. If so, I still think the > approach I'm taking is the correct one, but the comments should be updated. ESRCH can have other meanings but I don't know enough about the broader context to know whether they are applicable in this case. ESRCH The specified process does not exist, or is not currently being traced by the caller, or is not stopped (for requests that require a stopped tracee). I won't comment further on the fix/workaround as I don't know the code. I'll leave that to other folk. Cheers, David ----- > I had one other finding. 
When this issue first turned up, it prevented > the thread from getting a stack trace due to the exception being thrown. > What I hadn't realize is that after fixing it to not throw an exception, > which resulted in the stack walking code getting all nulls for register > values, I actually started to see a stack trace printed: > > "JLine terminal non blocking reader thread" #26 daemon prio=5 > tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000] > ?? java.lang.Thread.State: RUNNABLE > ?? JavaThread state: _thread_in_native > WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp (8089) > CurrentFrameGuess: choosing last Java frame: sp = 0x00007f125f0f4770, fp > = 0x00007f125f0f47c0 > ?- java.io.FileInputStream.read0() @bci=0 (Interpreted frame) > ?- java.io.FileInputStream.read() @bci=1, line=223 (Interpreted frame) > ?- jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() > @bci=108, line=216 (Interpreted frame) > ?- > jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() > @bci=4 (Interpreted frame) > ?- java.lang.Thread.run() @bci=11, line=832 (Interpreted frame) > > The "CurrentFrameGuess" output is some debug tracing I had enabled, and > it indicates that the stack walking code is using the "last java frame" > setting, which it will do if current registers values don't indicate a > valid frame (as would be the case if sp was null). I had previously > assumed that without an underling valid LWP, there would be no stack > trace. Given that there is one, there must be a valid LWP. Otherwise I > don't see how the stack could have been walked. That's another > indication that the ptrace failure is spurious in nature. 
> > thanks, > > Chris >> >> Cheers, >> David >> ----- >> >>> Also, even if you are using something like clhsdb to issue commands >>> on addresses, if the address is no longer valid for the command you >>> are executing, then you would get the appropriate error when there is >>> an attempt to create a wrapper for it. I don't know of any command >>> that operates directly on a JavaThread, but I think there are for >>> InstanceKlass. So if you remembered the address of an InstanceKlass, >>> and then reattached and tried a command that takes an InstanceKlass >>> address, you would get an exception when SA tries to create the >>> wrapper for the InsanceKlass if it were no longer a valid address for >>> one. >>> >>> Chris >>>> >>>> David >>>> ----- >>>> >>>>> Chris >>>>>> David >>>>>> ----- >>>>>> >>>>>>> Chris >>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> >>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>> >>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>> >>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>> -Instead of throwing an exception when the OS ThreadID is >>>>>>>>> invalid, print a warning. >>>>>>>>> >>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>> -Improve a print_debug message >>>>>>>>> >>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>> >>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>> >>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>> >>>>>>>>> -Deal with the array of registers read in being null due to the >>>>>>>>> OS ThreadID not being valid. 
>>>>>>>>>
>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java
>>>>>>>>>
>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java
>>>>>>>>>
>>>>>>>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException"
>>>>>>>>> appearing twice when printing the exception.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>
>>>>>
>>>
>
>

From mandy.chung at oracle.com Fri Jun 19 16:11:31 2020
From: mandy.chung at oracle.com (Mandy Chung)
Date: Fri, 19 Jun 2020 09:11:31 -0700
Subject: RFR: JDK-8247784,Bad link causes invalid documentation
In-Reply-To:
References: <1165ed72-80c4-48d2-9306-d9b34e5eeecf@oracle.com>
Message-ID:

+1. Thanks for fixing this.

Mandy

On 6/18/20 8:16 PM, Jonathan Gibbons wrote:
>
> resend, with correct subject line
>
> On 6/18/20 8:12 PM, Jonathan Gibbons wrote:
>>
>> Please review some changes to fix typos in some recent doc updates.
>>
>> In two places, ${docRoot} is used instead of {@docRoot}
>>
>> -- Jon
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8247784
>>
>> Patch:
>>
>> diff -r c5904de55565 src/jdk.jdi/share/classes/com/sun/jdi/Type.java
>> --- a/src/jdk.jdi/share/classes/com/sun/jdi/Type.java Thu Jun 18
>> 17:32:57 2020 -0700
>> +++ b/src/jdk.jdi/share/classes/com/sun/jdi/Type.java Thu Jun 18
>> 20:05:42 2020 -0700
>> @@ -152,7 +152,7 @@
>>       * Returns the name of this type. The result is of the same form as
>>       * the name returned by {@link Class#getName()}.
>>       * The returned name may not be a
>> -     * > href="${docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary
>> name.
>> +     * > href="{@docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary
>> name.
>>       *
>>       * @return the name of this type
>>       */
>> diff -r c5904de55565
>> src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java
>> ---
>> a/src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java
>> Thu Jun 18 17:32:57 2020 -0700
>> +++
>> b/src/jdk.jdi/share/classes/com/sun/jdi/event/ClassUnloadEvent.java
>> Thu Jun 18 20:05:42 2020 -0700
>> @@ -44,7 +44,7 @@
>>      /**
>>       * Returns the {@linkplain com.sun.jdi.Type#name() name of the
>> class}
>>       * that has been unloaded. The returned string may not be a
>> -     * > href="${docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary
>> name.
>> +     * > href="{@docRoot}/java.base/java/lang/ClassLoader.html#binary-name">binary
>> name.
>>       *
>>       * @see Class#getName()
>>       */
>>

From chris.plummer at oracle.com Fri Jun 19 20:51:25 2020
From: chris.plummer at oracle.com (Chris Plummer)
Date: Fri, 19 Jun 2020 13:51:25 -0700
Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
In-Reply-To:
References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com>
Message-ID: <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com>

Hello,

I've updated the webrev based on the new finding that a JavaThread cannot be on the ThreadList after its OS thread has been destroyed, since the JavaThread removes itself from the ThreadList and therefore must be running on its OS thread. The logic of the fix is unchanged from the first webrev, but I updated the comments to better reflect what is going on.
I also updated the CR: https://bugs.openjdk.java.net/browse/JDK-8247533 http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html thanks, Chris On 6/19/20 12:24 AM, David Holmes wrote: > Hi Chris, > > On 19/06/2020 8:55 am, Chris Plummer wrote: >> On 6/18/20 1:43 AM, David Holmes wrote: >>> On 18/06/2020 4:49 pm, Chris Plummer wrote: >>>> On 6/17/20 10:29 PM, David Holmes wrote: >>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>>>> Hi Chris, >>>>>>>>> >>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> Please help review the following: >>>>>>>>>> >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> The CR contains all the needed details. Here's a summary of >>>>>>>>>> changes in each file: >>>>>>>>> >>>>>>>>> The problem sounds to me like a variation of the more general >>>>>>>>> problem of not ensuring a thread is kept alive whilst acting >>>>>>>>> upon it. I don't know how the SA finds these references to the >>>>>>>>> threads it is going to stackwalk, but is it possible to fix >>>>>>>>> this via appropriate uses of ThreadsListHandle/Iterator? >>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>>> >>>>>>>> Keep in mind that once SA attaches, nothing in the VM changes. >>>>>>>> For example, SA can't create a wrapper to a JavaThread, only to >>>>>>>> have the JavaThread be freed later on. It's just not possible. >>>>>>> >>>>>>> Then how does it obtain a reference to a JavaThread for which >>>>>>> the native OS thread id is invalid? Any thread found in >>>>>>> _java_thread_list is either live or still to be started. In the >>>>>>> latter case the JavaThread->osThread does not have its thread_id >>>>>>> set yet. 
>>>>>>> >>>>>> My assumption was that the JavaThread is in the process of being >>>>>> destroyed, and it has freed its OS thread but is itself still in >>>>>> the thread list. I did notice that the OS thread id being used >>>>>> looked to be in the range of thread id #'s you would expect for >>>>>> the running app, so that to me indicated it was once valid, but >>>>>> is no more. >>>>>> >>>>>> Keep in mind that although hotspot may have synchronization code >>>>>> that prevents you from pulling a JavaThread off the thread list >>>>>> when it is in the process of being destroyed (I'm guessing it >>>>>> does), SA has no such protections. >>>>> >>>>> But you stated that once the SA has attached, the target VM can't >>>>> change. If the SA gets its set of thread from one attach then >>>>> tries to make queries about those threads in a separate attach, >>>>> then obviously it could be providing garbage thread information. >>>>> So you would need to re-validate the JavaThread in the target VM >>>>> before trying to do anything with it. >>>> That's not what is going on here. It's attaching and doing a stack >>>> trace, which involves getting the thread list and iterating through >>>> all threads without detaching. >>> >>> Okay so I restate my original comment - all the JavaThreads must be >>> alive or not yet started, so how are you encountering an invalid >>> thread id? Any thread you find via the ThreadsList can't have >>> destroyed its osThread. In any case the logic should be checking >>> thread->osThread() for NULL, and then osThread()->get_state() to >>> ensure it is >= INITIALIZED before using the thread_id(). >> Hi David, >> >> I chatted with Dan about this, and he said since the JavaThread is >> responsible for removing itself from the ThreadList, it is impossible >> to have a JavaThread still on the ThreadList, but without and >> underlying OS Thread. 
So I'm a bit perplexed as to how I can find a >> JavaThread on the ThreadList, but that results in ESRCH when trying >> to access the thread with ptrace. My only conclusion is that this >> failure is somehow spurious, and maybe the issue it just that the >> thread is in some temporary state that prevents its access. If so, I >> still think the approach I'm taking is the correct one, but the >> comments should be updated. > > ESRCH can have other meanings but I don't know enough about the > broader context to know whether they are applicable in this case. > > ??? ESRCH? The? specified? process? does not exist, or is not > currently being traced by the caller, or is not stopped > ????????????? (for requests that require a stopped tracee). > > I won't comment further on the fix/workaround as I don't know the > code. I'll leave that to other folk. > > Cheers, > David > ----- > >> I had one other finding. When this issue first turned up, it >> prevented the thread from getting a stack trace due to the exception >> being thrown. What I hadn't realize is that after fixing it to not >> throw an exception, which resulted in the stack walking code getting >> all nulls for register values, I actually started to see a stack >> trace printed: >> >> "JLine terminal non blocking reader thread" #26 daemon prio=5 >> tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000] >> ??? java.lang.Thread.State: RUNNABLE >> ??? 
JavaThread state: _thread_in_native >> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp >> (8089) >> CurrentFrameGuess: choosing last Java frame: sp = 0x00007f125f0f4770, >> fp = 0x00007f125f0f47c0 >> ??- java.io.FileInputStream.read0() @bci=0 (Interpreted frame) >> ??- java.io.FileInputStream.read() @bci=1, line=223 (Interpreted frame) >> ??- jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() >> @bci=108, line=216 (Interpreted frame) >> ??- >> jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() >> @bci=4 (Interpreted frame) >> ??- java.lang.Thread.run() @bci=11, line=832 (Interpreted frame) >> >> The "CurrentFrameGuess" output is some debug tracing I had enabled, >> and it indicates that the stack walking code is using the "last java >> frame" setting, which it will do if current registers values don't >> indicate a valid frame (as would be the case if sp was null). I had >> previously assumed that without an underling valid LWP, there would >> be no stack trace. Given that there is one, there must be a valid >> LWP. Otherwise I don't see how the stack could have been walked. >> That's another indication that the ptrace failure is spurious in nature. >> >> thanks, >> >> Chris >>> >>> Cheers, >>> David >>> ----- >>> >>>> Also, even if you are using something like clhsdb to issue commands >>>> on addresses, if the address is no longer valid for the command you >>>> are executing, then you would get the appropriate error when there >>>> is an attempt to create a wrapper for it. I don't know of any >>>> command that operates directly on a JavaThread, but I think there >>>> are for InstanceKlass. So if you remembered the address of an >>>> InstanceKlass, and then reattached and tried a command that takes >>>> an InstanceKlass address, you would get an exception when SA tries >>>> to create the wrapper for the InsanceKlass if it were no longer a >>>> valid address for one. 
>>>> >>>> Chris >>>>> >>>>> David >>>>> ----- >>>>> >>>>>> Chris >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> Chris >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> David >>>>>>>>> >>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>>> >>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>>> >>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>>> -Instead of throwing an exception when the OS ThreadID is >>>>>>>>>> invalid, print a warning. >>>>>>>>>> >>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>>> -Improve a print_debug message >>>>>>>>>> >>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>>> >>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>>> >>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>>> >>>>>>>>>> -Deal with the array of registers read in being null due to >>>>>>>>>> the OS ThreadID not being valid. >>>>>>>>>> >>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>>>>> >>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>>>>> >>>>>>>>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" >>>>>>>>>> appearing twice when printing the exception. 
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>>
>>>>>>>>>> Chris
>>>>>>>>
>>>>>>
>>>>
>>
>>

From yasuenag at gmail.com Sat Jun 20 01:33:17 2020
From: yasuenag at gmail.com (Yasumasa Suenaga)
Date: Sat, 20 Jun 2020 10:33:17 +0900
Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
In-Reply-To: <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com>
References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com>
Message-ID:

Hi Chris,

I took a quick look at the Linux kernel code, and ESRCH seems to be the value errno is set to by default. So I guess it acts like a "generic" error code.

https://github.com/torvalds/linux/blob/master/kernel/ptrace.c

According to the ptrace(2) man page, it might set errno to values other than ESRCH. For example, if we analyze a broken core (e.g. the core was dumped with the disk full), we might get EFAULT. Thus I would prefer that your patch handle ESRCH only, and I think SA should throw DebuggerException if any other error occurs.

https://www.man7.org/linux/man-pages/man2/ptrace.2.html

Thanks,

Yasumasa

On 2020/06/20 5:51, Chris Plummer wrote:
> Hello,
>
> I've updated the webrev based on the new finding that a JavaThread cannot be on the ThreadList after its OS thread has been destroyed, since the JavaThread removes itself from the ThreadList and therefore must be running on its OS thread. The logic of the fix is unchanged from the first webrev, but I updated the comments to better reflect what is going on.
I also updated the CR: > > https://bugs.openjdk.java.net/browse/JDK-8247533 > http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html > > thanks, > > Chris > > On 6/19/20 12:24 AM, David Holmes wrote: >> Hi Chris, >> >> On 19/06/2020 8:55 am, Chris Plummer wrote: >>> On 6/18/20 1:43 AM, David Holmes wrote: >>>> On 18/06/2020 4:49 pm, Chris Plummer wrote: >>>>> On 6/17/20 10:29 PM, David Holmes wrote: >>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>>>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>>>>> Hi Chris, >>>>>>>>>> >>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>>>>> Hello, >>>>>>>>>>> >>>>>>>>>>> Please help review the following: >>>>>>>>>>> >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>>>> >>>>>>>>>>> The CR contains all the needed details. Here's a summary of changes in each file: >>>>>>>>>> >>>>>>>>>> The problem sounds to me like a variation of the more general problem of not ensuring a thread is kept alive whilst acting upon it. I don't know how the SA finds these references to the threads it is going to stackwalk, but is it possible to fix this via appropriate uses of ThreadsListHandle/Iterator? >>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>>>> >>>>>>>>> Keep in mind that once SA attaches, nothing in the VM changes. For example, SA can't create a wrapper to a JavaThread, only to have the JavaThread be freed later on. It's just not possible. >>>>>>>> >>>>>>>> Then how does it obtain a reference to a JavaThread for which the native OS thread id is invalid? Any thread found in _java_thread_list is either live or still to be started. In the latter case the JavaThread->osThread does not have its thread_id set yet. 
>>>>>>>> >>>>>>> My assumption was that the JavaThread is in the process of being destroyed, and it has freed its OS thread but is itself still in the thread list. I did notice that the OS thread id being used looked to be in the range of thread id #'s you would expect for the running app, so that to me indicated it was once valid, but is no more. >>>>>>> >>>>>>> Keep in mind that although hotspot may have synchronization code that prevents you from pulling a JavaThread off the thread list when it is in the process of being destroyed (I'm guessing it does), SA has no such protections. >>>>>> >>>>>> But you stated that once the SA has attached, the target VM can't change. If the SA gets its set of thread from one attach then tries to make queries about those threads in a separate attach, then obviously it could be providing garbage thread information. So you would need to re-validate the JavaThread in the target VM before trying to do anything with it. >>>>> That's not what is going on here. It's attaching and doing a stack trace, which involves getting the thread list and iterating through all threads without detaching. >>>> >>>> Okay so I restate my original comment - all the JavaThreads must be alive or not yet started, so how are you encountering an invalid thread id? Any thread you find via the ThreadsList can't have destroyed its osThread. In any case the logic should be checking thread->osThread() for NULL, and then osThread()->get_state() to ensure it is >= INITIALIZED before using the thread_id(). >>> Hi David, >>> >>> I chatted with Dan about this, and he said since the JavaThread is responsible for removing itself from the ThreadList, it is impossible to have a JavaThread still on the ThreadList, but without and underlying OS Thread. So I'm a bit perplexed as to how I can find a JavaThread on the ThreadList, but that results in ESRCH when trying to access the thread with ptrace. 
My only conclusion is that this failure is somehow spurious, and maybe the issue it just that the thread is in some temporary state that prevents its access. If so, I still think the approach I'm taking is the correct one, but the comments should be updated. >> >> ESRCH can have other meanings but I don't know enough about the broader context to know whether they are applicable in this case. >> >> ??? ESRCH? The? specified? process? does not exist, or is not currently being traced by the caller, or is not stopped >> ????????????? (for requests that require a stopped tracee). >> >> I won't comment further on the fix/workaround as I don't know the code. I'll leave that to other folk. >> >> Cheers, >> David >> ----- >> >>> I had one other finding. When this issue first turned up, it prevented the thread from getting a stack trace due to the exception being thrown. What I hadn't realize is that after fixing it to not throw an exception, which resulted in the stack walking code getting all nulls for register values, I actually started to see a stack trace printed: >>> >>> "JLine terminal non blocking reader thread" #26 daemon prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000] >>> ??? java.lang.Thread.State: RUNNABLE >>> ??? 
JavaThread state: _thread_in_native >>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp (8089) >>> CurrentFrameGuess: choosing last Java frame: sp = 0x00007f125f0f4770, fp = 0x00007f125f0f47c0 >>> ??- java.io.FileInputStream.read0() @bci=0 (Interpreted frame) >>> ??- java.io.FileInputStream.read() @bci=1, line=223 (Interpreted frame) >>> ??- jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() @bci=108, line=216 (Interpreted frame) >>> ??- jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() @bci=4 (Interpreted frame) >>> ??- java.lang.Thread.run() @bci=11, line=832 (Interpreted frame) >>> >>> The "CurrentFrameGuess" output is some debug tracing I had enabled, and it indicates that the stack walking code is using the "last java frame" setting, which it will do if current registers values don't indicate a valid frame (as would be the case if sp was null). I had previously assumed that without an underling valid LWP, there would be no stack trace. Given that there is one, there must be a valid LWP. Otherwise I don't see how the stack could have been walked. That's another indication that the ptrace failure is spurious in nature. >>> >>> thanks, >>> >>> Chris >>>> >>>> Cheers, >>>> David >>>> ----- >>>> >>>>> Also, even if you are using something like clhsdb to issue commands on addresses, if the address is no longer valid for the command you are executing, then you would get the appropriate error when there is an attempt to create a wrapper for it. I don't know of any command that operates directly on a JavaThread, but I think there are for InstanceKlass. So if you remembered the address of an InstanceKlass, and then reattached and tried a command that takes an InstanceKlass address, you would get an exception when SA tries to create the wrapper for the InsanceKlass if it were no longer a valid address for one. 
>>>>> >>>>> Chris >>>>>> >>>>>> David >>>>>> ----- >>>>>> >>>>>>> Chris >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>>> Chris >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> David >>>>>>>>>> >>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>>>> -Instead of throwing an exception when the OS ThreadID is invalid, print a warning. >>>>>>>>>>> >>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>>>> -Improve a print_debug message >>>>>>>>>>> >>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>>>> -Deal with the array of registers read in being null due to the OS ThreadID not being valid. >>>>>>>>>>> >>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>>>>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" appearing twice when printing the exception. >>>>>>>>>>> >>>>>>>>>>> thanks, >>>>>>>>>>> >>>>>>>>>>> Chris >>>>>>>>> >>>>>>> >>>>> >>> >>> > > From jonathan.gibbons at oracle.com Sat Jun 20 03:08:26 2020 From: jonathan.gibbons at oracle.com (Jonathan Gibbons) Date: Fri, 19 Jun 2020 20:08:26 -0700 Subject: RFR: [15,docs] JDK-8247958,minor HTML errors in com.sun.jdi Message-ID: <8e12fb61-a2ef-d805-1d90-182a2376fa16@oracle.com> Please review a couple of minor HTML errors (missing end tags) in a couple of classes.? These should be the last of the fixes for com.sun.jdi in this round of cleanup. 
-- Jon

JBS: https://bugs.openjdk.java.net/browse/JDK-8247958

Patch inline:

diff -r 086c7f077fc6 src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachine.java
--- a/src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachine.java Fri Jun 19 15:22:19 2020 -0400
+++ b/src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachine.java Fri Jun 19 19:59:48 2020 -0700
@@ -217,6 +217,7 @@
      * is false attempting any of the unsupported class file changes described
      * in
      * JVM TI RedefineClasses will throw this exception.
+     *
      *
      * @throws java.lang.NoClassDefFoundError if the bytes
      * don't correspond to the reference type (the names
diff -r 086c7f077fc6 src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachineManager.java
--- a/src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachineManager.java Fri Jun 19 15:22:19 2020 -0400
+++ b/src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachineManager.java Fri Jun 19 19:59:48 2020 -0700
@@ -185,6 +185,7 @@
  *
  *
  *
+ *
  *
  *
  *

Connectors are created at start-up time. That is, they

From david.holmes at oracle.com Sat Jun 20 04:50:40 2020
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 20 Jun 2020 14:50:40 +1000
Subject: RFR: [15,docs] JDK-8247958,minor HTML errors in com.sun.jdi
In-Reply-To: <8e12fb61-a2ef-d805-1d90-182a2376fa16@oracle.com>
References: <8e12fb61-a2ef-d805-1d90-182a2376fa16@oracle.com>
Message-ID: <5d73a7d4-7480-7a25-f343-eb1803b76854@oracle.com>

Looks good.

Thanks,
David

On 20/06/2020 1:08 pm, Jonathan Gibbons wrote:
> Please review a couple of minor HTML errors (missing end tags) in a couple
> of classes. These should be the last of the fixes for com.sun.jdi in
> this round
> of cleanup.
>
> -- Jon
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8247958
>
> Patch inline:
>
> diff -r 086c7f077fc6
> src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachine.java
> --- a/src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachine.java Fri Jun
> 19 15:22:19 2020 -0400
> +++ b/src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachine.java Fri Jun
> 19 19:59:48 2020 -0700
> @@ -217,6 +217,7 @@
>       * is false attempting any of the unsupported class file changes
> described
>       * in
>       * JVM TI RedefineClasses will throw this exception.
> +     *
>       *
>       * @throws java.lang.NoClassDefFoundError if the bytes
>       * don't correspond to the reference type (the names
> diff -r 086c7f077fc6
> src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachineManager.java
> --- a/src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachineManager.java
> Fri Jun 19 15:22:19 2020 -0400
> +++ b/src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachineManager.java
> Fri Jun 19 19:59:48 2020 -0700
> @@ -185,6 +185,7 @@
>   *
>   *
>   *
> + *
>   *
>   *
>   *

Connectors are created at start-up time. That is, they
>
>

From chris.plummer at oracle.com Sat Jun 20 06:20:04 2020
From: chris.plummer at oracle.com (Chris Plummer)
Date: Fri, 19 Jun 2020 23:20:04 -0700
Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
In-Reply-To:
References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com>
Message-ID: <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com>

Hi Yasumasa,

ptrace is not used for core files, so EFAULT for a bad core file is not a possibility. However, get_lwp_regs() does redirect to core_get_lwp_regs() for core files. It can fail, but the only reason it ever does is if the LWP can't be found in the core (which is never supposed to happen). I would think if this happened due to the core being truncated, SA would be blowing up all over the place with exceptions, probably before we ever get to this code, but in any case what we do here wouldn't really make a difference.

I'm not sure why you prefer an exception for errors other than ESRCH. Why should they be treated differently? getThreadIntegerRegisterSet0() is used for finding the current frame for stack tracing. With my changes, any failure will result in deferring to "last java frame" if set, and otherwise just not producing a stack trace (and the WARNING will be present in the output). This seems preferable to completely abandoning any further thread stack tracing.

thanks,

Chris

On 6/19/20 6:33 PM, Yasumasa Suenaga wrote:
> Hi Chris,
>
> I checked Linux kernel code at a glance, ESRCH seems to be set to
> errno by default.
> So I guess it is similar to "generic" error code.
>
> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c
>
> According to the manpage of ptrace(2), it might return an errno other than
> ESRCH.
> For example, if we analyze a broken core (e.g. the core was dumped with
> the disk full), we might get EFAULT.
> Thus I prefer to handle only ESRCH in your patch, and I also think SA
> should throw DebuggerException if any other error occurs.
>
> https://www.man7.org/linux/man-pages/man2/ptrace.2.html
>
>
> Thanks,
>
> Yasumasa
>
>
> On 2020/06/20 5:51, Chris Plummer wrote:
>> Hello,
>>
>> I've updated the webrev based on the new finding that a JavaThread
>> cannot be on the ThreadList after its OS thread has been destroyed
>> since the JavaThread removes itself from the ThreadList, and
>> therefore must be running on its OS thread. The logic of the fix is
>> unchanged from the first webrev, but I updated the comments to better
>> reflect what is going on. I also updated the CR:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8247533
>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html
>>
>> thanks,
>>
>> Chris
>>
>> On 6/19/20 12:24 AM, David Holmes wrote:
>>> Hi Chris,
>>>
>>> On 19/06/2020 8:55 am, Chris Plummer wrote:
>>>> On 6/18/20 1:43 AM, David Holmes wrote:
>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote:
>>>>>> On 6/17/20 10:29 PM, David Holmes wrote:
>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote:
>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote:
>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote:
>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote:
>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>
>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> Please help review the following:
>>>>>>>>>>>>
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533
>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The CR contains all the needed details.
>>>>>>>>>>>> Here's a summary of
>>>>>>>>>>>> changes in each file:
>>>>>>>>>>>
>>>>>>>>>>> The problem sounds to me like a variation of the more
>>>>>>>>>>> general problem of not ensuring a thread is kept alive
>>>>>>>>>>> whilst acting upon it. I don't know how the SA finds these
>>>>>>>>>>> references to the threads it is going to stackwalk, but is
>>>>>>>>>>> it possible to fix this via appropriate uses of
>>>>>>>>>>> ThreadsListHandle/Iterator?
>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list.
>>>>>>>>>>
>>>>>>>>>> Keep in mind that once SA attaches, nothing in the VM
>>>>>>>>>> changes. For example, SA can't create a wrapper to a
>>>>>>>>>> JavaThread, only to have the JavaThread be freed later on.
>>>>>>>>>> It's just not possible.
>>>>>>>>>
>>>>>>>>> Then how does it obtain a reference to a JavaThread for which
>>>>>>>>> the native OS thread id is invalid? Any thread found in
>>>>>>>>> _java_thread_list is either live or still to be started. In
>>>>>>>>> the latter case the JavaThread->osThread does not have its
>>>>>>>>> thread_id set yet.
>>>>>>>>>
>>>>>>>> My assumption was that the JavaThread is in the process of
>>>>>>>> being destroyed, and it has freed its OS thread but is itself
>>>>>>>> still in the thread list. I did notice that the OS thread id
>>>>>>>> being used looked to be in the range of thread id #'s you would
>>>>>>>> expect for the running app, so that to me indicated it was once
>>>>>>>> valid, but is no more.
>>>>>>>>
>>>>>>>> Keep in mind that although hotspot may have synchronization
>>>>>>>> code that prevents you from pulling a JavaThread off the thread
>>>>>>>> list when it is in the process of being destroyed (I'm guessing
>>>>>>>> it does), SA has no such protections.
>>>>>>>
>>>>>>> But you stated that once the SA has attached, the target VM
>>>>>>> can't change.
>>>>>>> If the SA gets its set of threads from one attach
>>>>>>> then tries to make queries about those threads in a separate
>>>>>>> attach, then obviously it could be providing garbage thread
>>>>>>> information. So you would need to re-validate the JavaThread in
>>>>>>> the target VM before trying to do anything with it.
>>>>>> That's not what is going on here. It's attaching and doing a
>>>>>> stack trace, which involves getting the thread list and iterating
>>>>>> through all threads without detaching.
>>>>>
>>>>> Okay so I restate my original comment - all the JavaThreads must
>>>>> be alive or not yet started, so how are you encountering an
>>>>> invalid thread id? Any thread you find via the ThreadsList can't
>>>>> have destroyed its osThread. In any case the logic should be
>>>>> checking thread->osThread() for NULL, and then
>>>>> osThread()->get_state() to ensure it is >= INITIALIZED before
>>>>> using the thread_id().
>>>> Hi David,
>>>>
>>>> I chatted with Dan about this, and he said since the JavaThread is
>>>> responsible for removing itself from the ThreadList, it is
>>>> impossible to have a JavaThread still on the ThreadList, but
>>>> without an underlying OS Thread. So I'm a bit perplexed as to how
>>>> I can find a JavaThread on the ThreadList, but that results in
>>>> ESRCH when trying to access the thread with ptrace. My only
>>>> conclusion is that this failure is somehow spurious, and maybe the
>>>> issue is just that the thread is in some temporary state that
>>>> prevents its access. If so, I still think the approach I'm taking
>>>> is the correct one, but the comments should be updated.
>>>
>>> ESRCH can have other meanings but I don't know enough about the
>>> broader context to know whether they are applicable in this case.
>>>
>>>     ESRCH  The specified process does not exist, or is not
>>>            currently being traced by the caller, or is not stopped
>>>            (for requests that require a stopped tracee).
>>>
>>> I won't comment further on the fix/workaround as I don't know the
>>> code. I'll leave that to other folk.
>>>
>>> Cheers,
>>> David
>>> -----
>>>
>>>> I had one other finding. When this issue first turned up, it
>>>> prevented the thread from getting a stack trace due to the
>>>> exception being thrown. What I hadn't realized is that after fixing
>>>> it to not throw an exception, which resulted in the stack walking
>>>> code getting all nulls for register values, I actually started to
>>>> see a stack trace printed:
>>>>
>>>> "JLine terminal non blocking reader thread" #26 daemon prio=5
>>>> tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000]
>>>>     java.lang.Thread.State: RUNNABLE
>>>>     JavaThread state: _thread_in_native
>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp
>>>> (8089)
>>>> CurrentFrameGuess: choosing last Java frame: sp =
>>>> 0x00007f125f0f4770, fp = 0x00007f125f0f47c0
>>>>   - java.io.FileInputStream.read0() @bci=0 (Interpreted frame)
>>>>   - java.io.FileInputStream.read() @bci=1, line=223 (Interpreted
>>>> frame)
>>>>   - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run()
>>>> @bci=108, line=216 (Interpreted frame)
>>>>   -
>>>> jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run()
>>>> @bci=4 (Interpreted frame)
>>>>   - java.lang.Thread.run() @bci=11, line=832 (Interpreted frame)
>>>>
>>>> The "CurrentFrameGuess" output is some debug tracing I had enabled,
>>>> and it indicates that the stack walking code is using the "last
>>>> java frame" setting, which it will do if current register values
>>>> don't indicate a valid frame (as would be the case if sp was null).
>>>> I had previously assumed that without an underlying valid LWP, there
>>>> would be no stack trace. Given that there is one, there must be a
>>>> valid LWP. Otherwise I don't see how the stack could have been
>>>> walked.
>>>> That's another indication that the ptrace failure is
>>>> spurious in nature.
>>>>
>>>> thanks,
>>>>
>>>> Chris
>>>>>
>>>>> Cheers,
>>>>> David
>>>>> -----
>>>>>
>>>>>> Also, even if you are using something like clhsdb to issue
>>>>>> commands on addresses, if the address is no longer valid for the
>>>>>> command you are executing, then you would get the appropriate
>>>>>> error when there is an attempt to create a wrapper for it. I
>>>>>> don't know of any command that operates directly on a JavaThread,
>>>>>> but I think there are for InstanceKlass. So if you remembered the
>>>>>> address of an InstanceKlass, and then reattached and tried a
>>>>>> command that takes an InstanceKlass address, you would get an
>>>>>> exception when SA tries to create the wrapper for the
>>>>>> InstanceKlass if it were no longer a valid address for one.
>>>>>>
>>>>>> Chris
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> Chris
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp
>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m
>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp
>>>>>>>>>>>> -Instead of throwing an exception when the OS ThreadID is
>>>>>>>>>>>> invalid, print a warning.
>>>>>>>>>>>>
>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c
>>>>>>>>>>>> -Improve a print_debug message
>>>>>>>>>>>>
>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java
>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java
>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java
>>>>>>>>>>>> -Deal with the array of registers read in being null due to
>>>>>>>>>>>> the OS ThreadID not being valid.
>>>>>>>>>>>>
>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java
>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java
>>>>>>>>>>>> -Fix issue with
>>>>>>>>>>>> "sun.jvm.hotspot.debugger.DebuggerException" appearing
>>>>>>>>>>>> twice when printing the exception.
>>>>>>>>>>>>
>>>>>>>>>>>> thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>>
>>
>>

From yasuenag at gmail.com Sat Jun 20 07:53:42 2020
From: yasuenag at gmail.com (Yasumasa Suenaga)
Date: Sat, 20 Jun 2020 16:53:42 +0900
Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
In-Reply-To: <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com>
References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com> <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com>
Message-ID:

Hi Chris,

On 2020/06/20 15:20, Chris Plummer wrote:
> Hi Yasumasa,
>
> ptrace is not used for core files, so the EFAULT for a bad core file is not a possibility. However, get_lwp_regs() does redirect to core_get_lwp_regs() for core files. It can fail, but the only reason it ever does is if the LWP can't be found in the core (which is never supposed to happen). I would think if this happened due to the core being truncated, SA would be blowing up all over the place with exceptions, probably before we ever get to this code, but in any case what we do here wouldn't really make a difference.

You are right, sorry.

> I'm not sure why you prefer an exception for errors other than ESRCH. Why should they be treated differently? getThreadIntegerRegisterSet0() is used for finding the current frame for stack tracing. With my changes any failure will result in deferring to "last java frame" if set, and otherwise just not produce a stack trace (and the WARNING will be present in the output). This seems preferable to completely abandoning any further thread stack tracking.

I'm not sure we can trust the call stack when ptrace() returns an error other than ESRCH, even if "last java frame" is available.
For example, doesn't ptrace() return EFAULT or EIO when something is wrong (e.g. stack corruption)?

If so, it may lead the troubleshooter to a wrong analysis.
I think it should at least abort dumping the call stack for that thread.

Thanks,

Yasumasa

> thanks,
>
> Chris
>
> On 6/19/20 6:33 PM, Yasumasa Suenaga wrote:
>> Hi Chris,
>>
>> I checked Linux kernel code at a glance, ESRCH seems to be set to errno by default.
>> So I guess it is similar to "generic" error code.
>>
>> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c
>>
>> According to the manpage of ptrace(2), it might return an errno other than ESRCH.
>> For example, if we analyze a broken core (e.g. the core was dumped with the disk full), we might get EFAULT.
>> Thus I prefer to handle only ESRCH in your patch, and I also think SA should throw DebuggerException if any other error occurs.
>>
>> https://www.man7.org/linux/man-pages/man2/ptrace.2.html
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>> On 2020/06/20 5:51, Chris Plummer wrote:
>>> Hello,
>>>
>>> I've updated the webrev based on the new finding that a JavaThread cannot be on the ThreadList after its OS thread has been destroyed since the JavaThread removes itself from the ThreadList, and therefore must be running on its OS thread. The logic of the fix is unchanged from the first webrev, but I updated the comments to better reflect what is going on. I also updated the CR:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8247533
>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>> On 6/19/20 12:24 AM, David Holmes wrote:
>>>> Hi Chris,
>>>>
>>>> On 19/06/2020 8:55 am, Chris Plummer wrote:
>>>>> On 6/18/20 1:43 AM, David Holmes wrote:
>>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote:
>>>>>>> On 6/17/20 10:29 PM, David Holmes wrote:
>>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote:
>>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote:
>>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote:
>>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote:
>>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>>
>>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote:
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please help review the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533
>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> The CR contains all the needed details. Here's a summary of changes in each file:
>>>>>>>>>>>>
>>>>>>>>>>>> The problem sounds to me like a variation of the more general problem of not ensuring a thread is kept alive whilst acting upon it. I don't know how the SA finds these references to the threads it is going to stackwalk, but is it possible to fix this via appropriate uses of ThreadsListHandle/Iterator?
>>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list.
>>>>>>>>>>>
>>>>>>>>>>> Keep in mind that once SA attaches, nothing in the VM changes. For example, SA can't create a wrapper to a JavaThread, only to have the JavaThread be freed later on. It's just not possible.
>>>>>>>>>>
>>>>>>>>>> Then how does it obtain a reference to a JavaThread for which the native OS thread id is invalid? Any thread found in _java_thread_list is either live or still to be started. In the latter case the JavaThread->osThread does not have its thread_id set yet.
>>>>>>>>>>
>>>>>>>>> My assumption was that the JavaThread is in the process of being destroyed, and it has freed its OS thread but is itself still in the thread list. I did notice that the OS thread id being used looked to be in the range of thread id #'s you would expect for the running app, so that to me indicated it was once valid, but is no more.
>>>>>>>>>
>>>>>>>>> Keep in mind that although hotspot may have synchronization code that prevents you from pulling a JavaThread off the thread list when it is in the process of being destroyed (I'm guessing it does), SA has no such protections.
>>>>>>>>
>>>>>>>> But you stated that once the SA has attached, the target VM can't change. If the SA gets its set of threads from one attach then tries to make queries about those threads in a separate attach, then obviously it could be providing garbage thread information. So you would need to re-validate the JavaThread in the target VM before trying to do anything with it.
>>>>>>> That's not what is going on here. It's attaching and doing a stack trace, which involves getting the thread list and iterating through all threads without detaching.
>>>>>>
>>>>>> Okay so I restate my original comment - all the JavaThreads must be alive or not yet started, so how are you encountering an invalid thread id? Any thread you find via the ThreadsList can't have destroyed its osThread. In any case the logic should be checking thread->osThread() for NULL, and then osThread()->get_state() to ensure it is >= INITIALIZED before using the thread_id().
>>>>> Hi David,
>>>>>
>>>>> I chatted with Dan about this, and he said since the JavaThread is responsible for removing itself from the ThreadList, it is impossible to have a JavaThread still on the ThreadList, but without an underlying OS Thread. So I'm a bit perplexed as to how I can find a JavaThread on the ThreadList, but that results in ESRCH when trying to access the thread with ptrace. My only conclusion is that this failure is somehow spurious, and maybe the issue is just that the thread is in some temporary state that prevents its access. If so, I still think the approach I'm taking is the correct one, but the comments should be updated.
>>>>
>>>> ESRCH can have other meanings but I don't know enough about the broader context to know whether they are applicable in this case.
>>>>
>>>>     ESRCH  The specified process does not exist, or is not currently being traced by the caller, or is not stopped
>>>>            (for requests that require a stopped tracee).
>>>>
>>>> I won't comment further on the fix/workaround as I don't know the code. I'll leave that to other folk.
>>>>
>>>> Cheers,
>>>> David
>>>> -----
>>>>
>>>>> I had one other finding. When this issue first turned up, it prevented the thread from getting a stack trace due to the exception being thrown. What I hadn't realized is that after fixing it to not throw an exception, which resulted in the stack walking code getting all nulls for register values, I actually started to see a stack trace printed:
>>>>>
>>>>> "JLine terminal non blocking reader thread" #26 daemon prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000]
>>>>>     java.lang.Thread.State: RUNNABLE
>>>>>     JavaThread state: _thread_in_native
>>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp (8089)
>>>>> CurrentFrameGuess: choosing last Java frame: sp = 0x00007f125f0f4770, fp = 0x00007f125f0f47c0
>>>>>   - java.io.FileInputStream.read0() @bci=0 (Interpreted frame)
>>>>>   - java.io.FileInputStream.read() @bci=1, line=223 (Interpreted frame)
>>>>>   - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() @bci=108, line=216 (Interpreted frame)
>>>>>   - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() @bci=4 (Interpreted frame)
>>>>>   - java.lang.Thread.run() @bci=11, line=832 (Interpreted frame)
>>>>>
>>>>> The "CurrentFrameGuess" output is some debug tracing I had enabled, and it indicates that the stack walking code is using the "last java frame" setting, which it will do if current register values don't indicate a valid frame (as would be the case if sp was null). I had previously assumed that without an underlying valid LWP, there would be no stack trace. Given that there is one, there must be a valid LWP. Otherwise I don't see how the stack could have been walked. That's another indication that the ptrace failure is spurious in nature.
>>>>>
>>>>> thanks,
>>>>>
>>>>> Chris
>>>>>>
>>>>>> Cheers,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Also, even if you are using something like clhsdb to issue commands on addresses, if the address is no longer valid for the command you are executing, then you would get the appropriate error when there is an attempt to create a wrapper for it. I don't know of any command that operates directly on a JavaThread, but I think there are for InstanceKlass. So if you remembered the address of an InstanceKlass, and then reattached and tried a command that takes an InstanceKlass address, you would get an exception when SA tries to create the wrapper for the InstanceKlass if it were no longer a valid address for one.
>>>>>>>
>>>>>>> Chris
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>> Chris
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp
>>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m
>>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp
>>>>>>>>>>>>> -Instead of throwing an exception when the OS ThreadID is invalid, print a warning.
>>>>>>>>>>>>>
>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c
>>>>>>>>>>>>> -Improve a print_debug message
>>>>>>>>>>>>>
>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java
>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java
>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java
>>>>>>>>>>>>> -Deal with the array of registers read in being null due to the OS ThreadID not being valid.
>>>>>>>>>>>>>
>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java
>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java
>>>>>>>>>>>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" appearing twice when printing the exception.
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>
>
>

From serguei.spitsyn at oracle.com Sun Jun 21 05:47:01 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Sat, 20 Jun 2020 22:47:01 -0700
Subject: RFR: [15,docs] JDK-8247958,minor HTML errors in com.sun.jdi
In-Reply-To: <5d73a7d4-7480-7a25-f343-eb1803b76854@oracle.com>
References: <8e12fb61-a2ef-d805-1d90-182a2376fa16@oracle.com> <5d73a7d4-7480-7a25-f343-eb1803b76854@oracle.com>
Message-ID:

Hi Jon,

It looks good.

Thanks,
Serguei

On 6/19/20 21:50, David Holmes wrote:
> Looks good.
>
> Thanks,
> David
>
> On 20/06/2020 1:08 pm, Jonathan Gibbons wrote:
>> Please review a couple of minor HTML errors (missing end tags) in a
>> couple
>> of classes. These should be the last of the fixes for com.sun.jdi in
>> this round
>> of cleanup.
>>
>> -- Jon
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8247958
>>
>> Patch inline:
>>
>> diff -r 086c7f077fc6
>> src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachine.java
>> --- a/src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachine.java Fri
>> Jun 19 15:22:19 2020 -0400
>> +++ b/src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachine.java Fri
>> Jun 19 19:59:48 2020 -0700
>> @@ -217,6 +217,7 @@
>>        * is false attempting any of the unsupported class file
>> changes described
>>        * in
>>        * JVM TI RedefineClasses will throw this exception.
>> +      *
>>        *
>>        * @throws java.lang.NoClassDefFoundError if the bytes
>>        * don't correspond to the reference type (the names
>> diff -r 086c7f077fc6
>> src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachineManager.java
>> ---
>> a/src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachineManager.java
>> Fri Jun 19 15:22:19 2020 -0400
>> +++
>> b/src/jdk.jdi/share/classes/com/sun/jdi/VirtualMachineManager.java
>> Fri Jun 19 19:59:48 2020 -0700
>> @@ -185,6 +185,7 @@
>>    *
>>    *
>>    *
>> +  *
>>    *
>>    *
>>    *

Connectors are created at start-up time. That is, they
>>
>>

From volker.simonis at gmail.com Mon Jun 22 15:03:41 2020
From: volker.simonis at gmail.com (Volker Simonis)
Date: Mon, 22 Jun 2020 17:03:41 +0200
Subject: RFR (S): 8245129: Enhance jstat gc option output and tests
In-Reply-To: <32795C52-3BFC-4277-8969-646113D1156B@amazon.com>
References: <79438EAF-531A-488F-BED7-A619C3E227D5@amazon.com> <32795C52-3BFC-4277-8969-646113D1156B@amazon.com>
Message-ID:

Hi Paul,

thanks for fixing jstat for larger heaps. I like that you've added explicit tests for ParallelGC which hasn't been tested since G1 was made the default collector.

I also agree that sizes should all be right justified. I only wonder if the header of a right justified column shouldn't be right justified as well? However, taking into account that this already hasn't been handled consistently before your change, I'm fine to postpone that to a follow-up cleanup change.

I think the change looks good so thumbs up from me.

Thank you and best regards,
Volker

On Thu, Jun 18, 2020 at 11:53 PM Hohensee, Paul wrote:
>
> Ping. Any takers for this simple patch?
>
> Thanks,
>
> Paul
>
> From: serviceability-dev on behalf of "Hohensee, Paul"
> Date: Monday, May 18, 2020 at 8:25 AM
> To: serviceability-dev
> Subject: RFR (S): 8245129: Enhance jstat gc option output and tests
>
> Please review an enhancement to the jstat gc option output to make the columns wider (for up to a 2TB heap) so one can read the output without going cross-eyed.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8245129
>
> Webrev: http://cr.openjdk.java.net/~phh/8245129/webrev.00/
>
> I added tests using ParallelGC since the output can differ for non-G1 collectors. Successfully ran the test/hotspot/jtreg/serviceability/tmtools/jstat and test/jdk/sun/tools/jstat tests. A submit repo run had one failure
>
> runtime/MemberName/MemberNameLeak.java
>
> tier1
>
> macosx-x64-debug
>
> but rerunning it on my laptop succeeded, and there's no connection between this test and my patch.
>
> Thanks,
>
> Paul
>

From hohensee at amazon.com Mon Jun 22 15:48:21 2020
From: hohensee at amazon.com (Hohensee, Paul)
Date: Mon, 22 Jun 2020 15:48:21 +0000
Subject: RFR (S): 8245129: Enhance jstat gc option output and tests
Message-ID:

Thanks very much for the review, Volker. I'll file a follow-up issue.

One more reviewer, please? :)

Paul

On 6/22/20, 8:10 AM, "serviceability-dev on behalf of Volker Simonis" wrote:

    Hi Paul,

    thanks for fixing jstat for larger heaps. I like that you've added explicit tests for ParallelGC which hasn't been tested since G1 was made the default collector.

    I also agree that sizes should all be right justified. I only wonder if the header of a right justified column shouldn't be right justified as well? However, taking into account that this already hasn't been handled consistently before your change, I'm fine to postpone that to a follow-up cleanup change.

    I think the change looks good so thumbs up from me.

    Thank you and best regards,
    Volker

    On Thu, Jun 18, 2020 at 11:53 PM Hohensee, Paul wrote:
    >
    > Ping. Any takers for this simple patch?
    >
    > Thanks,
    >
    > Paul
    >
    > From: serviceability-dev on behalf of "Hohensee, Paul"
    > Date: Monday, May 18, 2020 at 8:25 AM
    > To: serviceability-dev
    > Subject: RFR (S): 8245129: Enhance jstat gc option output and tests
    >
    > Please review an enhancement to the jstat gc option output to make the columns wider (for up to a 2TB heap) so one can read the output without going cross-eyed.
    >
    > Bug: https://bugs.openjdk.java.net/browse/JDK-8245129
    >
    > Webrev: http://cr.openjdk.java.net/~phh/8245129/webrev.00/
    >
    > I added tests using ParallelGC since the output can differ for non-G1 collectors. Successfully ran the test/hotspot/jtreg/serviceability/tmtools/jstat and test/jdk/sun/tools/jstat tests. A submit repo run had one failure
    >
    > runtime/MemberName/MemberNameLeak.java
    >
    > tier1
    >
    > macosx-x64-debug
    >
    > but rerunning it on my laptop succeeded, and there's no connection between this test and my patch.
    >
    > Thanks,
    >
    > Paul
    >

From jonathan.gibbons at oracle.com Mon Jun 22 23:55:09 2020
From: jonathan.gibbons at oracle.com (Jonathan Gibbons)
Date: Mon, 22 Jun 2020 16:55:09 -0700
Subject: RFR: [15,docs] JDK-8248061,bad reference in @throws in HotSpotDiagnosticMXBean
Message-ID:

Please review a small change to fix an unresolved reference in `@throws IOException`.

The problem is that the method signature uses a fully-qualified name for `java.io.IOException` instead of importing it, meaning that the `@throws` cannot resolve the name. Although this could be fixed by using a fully-qualified name in `@throws` as well, a better, more conventional solution is to import that name and use the simple name in both places.

-- Jon

JBS: https://bugs.openjdk.java.net/browse/JDK-8248061

Patch:

$ hg diff -R open open/src/jdk.management
diff -r 9cfa0137612f src/jdk.management/share/classes/com/sun/management/HotSpotDiagnosticMXBean.java
--- a/src/jdk.management/share/classes/com/sun/management/HotSpotDiagnosticMXBean.java Mon Jun 22 13:37:41 2020 -0700
+++ b/src/jdk.management/share/classes/com/sun/management/HotSpotDiagnosticMXBean.java Mon Jun 22 16:07:38 2020 -0700
@@ -25,6 +25,7 @@

 package com.sun.management;

+import java.io.IOException;
 import java.lang.management.PlatformManagedObject;

 /**
@@ -72,7 +73,7 @@
      * method denies write access to the named file
      * or the caller does not have ManagmentPermission("control").
      */
-    public void dumpHeap(String outputFile, boolean live) throws java.io.IOException;
+    public void dumpHeap(String outputFile, boolean live) throws IOException;

     /**
      * Returns a list of {@code VMOption} objects for all diagnostic options.

From daniel.daugherty at oracle.com Tue Jun 23 02:32:32 2020
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Mon, 22 Jun 2020 22:32:32 -0400
Subject: RFR: [15,docs] JDK-8248061,bad reference in @throws in HotSpotDiagnosticMXBean
In-Reply-To:
References:
Message-ID:

On 6/22/20 7:55 PM, Jonathan Gibbons wrote:
>
> Please review a small change to fix an unresolved reference in
> `@throws IOException`.
>

Thumbs up.

Dan

> The problem is that the method signature uses a fully-qualified name
> for `java.io.IOException` instead of importing it, meaning that the
> `@throws` cannot resolve the name. Although this could be fixed by
> using a fully-qualified name in `@throws` as well, a better, more
> conventional solution is to import that name and use the simple name
> in both places.
>
> -- Jon
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8248061
>
> Patch:
>
>
> $ hg diff -R open open/src/jdk.management
> diff -r 9cfa0137612f
> src/jdk.management/share/classes/com/sun/management/HotSpotDiagnosticMXBean.java
> ---
> a/src/jdk.management/share/classes/com/sun/management/HotSpotDiagnosticMXBean.java
> Mon Jun 22 13:37:41 2020 -0700
> +++
> b/src/jdk.management/share/classes/com/sun/management/HotSpotDiagnosticMXBean.java
> Mon Jun 22 16:07:38 2020 -0700
> @@ -25,6 +25,7 @@
>
>  package com.sun.management;
>
> +import java.io.IOException;
>  import java.lang.management.PlatformManagedObject;
>
>  /**
> @@ -72,7 +73,7 @@
>       * method denies write access to the named file
>       * or the caller does not have ManagmentPermission("control").
>       */
> -    public void dumpHeap(String outputFile, boolean live) throws
> java.io.IOException;
> +    public void dumpHeap(String outputFile, boolean live) throws
> IOException;
>
>      /**
>       * Returns a list of {@code VMOption} objects for all diagnostic
> options.
>
>

From serguei.spitsyn at oracle.com Tue Jun 23 02:43:01 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 22 Jun 2020 19:43:01 -0700
Subject: RFR: [15,docs] JDK-8248061,bad reference in @throws in HotSpotDiagnosticMXBean
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...

From mandy.chung at oracle.com Tue Jun 23 03:43:37 2020
From: mandy.chung at oracle.com (Mandy Chung)
Date: Mon, 22 Jun 2020 20:43:37 -0700
Subject: RFR: [15,docs] JDK-8248061,bad reference in @throws in HotSpotDiagnosticMXBean
In-Reply-To:
References:
Message-ID: <17aeecd8-eb11-88f6-25a8-ded0d6341c53@oracle.com>

+1

Mandy

On 6/22/20 4:55 PM, Jonathan Gibbons wrote:
>
> Please review a small change to fix an unresolved reference in
> `@throws IOException`.
>
> The problem is that the method signature uses a fully-qualified name
> for `java.io.IOException` instead of importing it, meaning that the
> `@throws` cannot resolve the name. Although this could be fixed by
> using a fully-qualified name in `@throws` as well, a better, more
> conventional solution is to import that name and use the simple name
> in both places.
> > -- Jon > > JBS: https://bugs.openjdk.java.net/browse/JDK-8248061 > > Patch: > > > $ hg diff -R open open/src/jdk.management > diff -r 9cfa0137612f > src/jdk.management/share/classes/com/sun/management/HotSpotDiagnosticMXBean.java > --- > a/src/jdk.management/share/classes/com/sun/management/HotSpotDiagnosticMXBean.java > Mon Jun 22 13:37:41 2020 -0700 > +++ > b/src/jdk.management/share/classes/com/sun/management/HotSpotDiagnosticMXBean.java > Mon Jun 22 16:07:38 2020 -0700 > @@ -25,6 +25,7 @@ > > ?package com.sun.management; > > +import java.io.IOException; > ?import java.lang.management.PlatformManagedObject; > > ?/** > @@ -72,7 +73,7 @@ > ??????* method denies write access to the named file > ??????* or the caller does not have ManagmentPermission("control"). > ??????*/ > - public void dumpHeap(String outputFile, boolean live) throws > java.io.IOException; > + public void dumpHeap(String outputFile, boolean live) throws > IOException; > > ?????/** > ??????* Returns a list of {@code VMOption} objects for all diagnostic > options. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.plummer at oracle.com Tue Jun 23 18:16:06 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 23 Jun 2020 11:16:06 -0700 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com> <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com> Message-ID: <17fbbb06-3cec-503b-73ea-c94156864ed7@oracle.com> On 6/20/20 12:53 AM, Yasumasa Suenaga wrote: > Hi Chris, > > On 2020/06/20 15:20, Chris Plummer wrote: >> Hi Yasumasa, >> >> ptrace is not used for core files, so the EFAULT for a bad core file >> is not a possibility. However, get_lwp_regs() does redirect to >> core_get_lwp_regs() for core files. It can fail, but the only reason >> it ever does is if the LWP can't be found in the core (which is never >> supposed to happen). I would think if this happened due to the core >> being truncated, SA would be blowing up all over the place with >> exceptions, probably before we ever get to this code, but in any case >> what we do here wouldn't really make a difference. > > You are right, sorry. > > >> I'm not sure why you prefer an exception for errors other than ESRCH. >> Why should they be treated differently? >> getThreadIntegerRegisterSet0() is used for finding the current frame >> for stack tracing. With my changes any failure will result in >> deferring to "last java frame" if set, and otherwise just not produce >> a stack trace (and the WARNING will be present in the output). This >> seems preferable to completely abandoning any further thread stack >> tracking.
> > I'm not sure we can trust the call stack when ptrace() returns any errors > other than ESRCH even if "last java frame" is available. For example, > doesn't ptrace() return EFAULT or EIO when something is wrong? (e.g. stack > corruption) If so, it may lead to a wrong analysis for the troubleshooter. > I think it should at least abort dumping the call stack for that thread. Hi Yasumasa, In general stack walking makes a best effort and can be wrong, even when not getting errors like this. For any actively executing thread SA needs to determine where the stack starts, with register contents being the starting point (SP, FP, and PC). These registers could contain anything, and SA makes a best effort to determine a current frame from them. However, the verification steps it takes are not 100% guaranteed, and can lead to an incorrect assumption of the current frame, which in turn can result in an exception later on when walking the stack. See JDK-8247641. Keep in mind that the WARNING message will always be there. This should be enough to put the troubleshooter on alert that the stack trace may not be accurate. I think it's better to make an attempt at a stack trace than to just abandon it and not attempt to do something that may be useful. thanks, Chris > > > Thanks, > > Yasumasa > > >> thanks, >> >> Chris >> >> On 6/19/20 6:33 PM, Yasumasa Suenaga wrote: >>> Hi Chris, >>> >>> I checked the Linux kernel code at a glance; ESRCH seems to be set to >>> errno by default. >>> So I guess it is similar to a "generic" error code. >>> >>> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c >>> >>> According to the manpage of ptrace(2), it might return an errno other than >>> ESRCH. >>> For example, if we analyze a broken core (e.g. the core was dumped >>> with disk full), we might get EFAULT. >>> Thus I prefer to handle ESRCH only in your patch, and also I think >>> SA should throw DebuggerException if another error occurs.
>>> >>> https://www.man7.org/linux/man-pages/man2/ptrace.2.html >>> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>> On 2020/06/20 5:51, Chris Plummer wrote: >>>> Hello, >>>> >>>> I've? updated with webrev based on the new finding that a >>>> JavaThread cannot be on the ThreadList after its OS thread has been >>>> destroyed since the JavaThread removes itself from the ThreadList, >>>> and therefore must be running on its OS thread. The logic of the >>>> fix is unchanged from the first webrev, but I updated the comments >>>> to better reflect what is going on. I also updated the CR: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html >>>> >>>> thanks, >>>> >>>> Chris >>>> >>>> On 6/19/20 12:24 AM, David Holmes wrote: >>>>> Hi Chris, >>>>> >>>>> On 19/06/2020 8:55 am, Chris Plummer wrote: >>>>>> On 6/18/20 1:43 AM, David Holmes wrote: >>>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote: >>>>>>>> On 6/17/20 10:29 PM, David Holmes wrote: >>>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>> >>>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please help review the following: >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> The CR contains all the needed details. Here's a summary >>>>>>>>>>>>>> of changes in each file: >>>>>>>>>>>>> >>>>>>>>>>>>> The problem sounds to me like a variation of the more >>>>>>>>>>>>> general problem of not ensuring a thread is kept alive >>>>>>>>>>>>> whilst acting upon it. 
I don't know how the SA finds these >>>>>>>>>>>>> references to the threads it is going to stackwalk, but is >>>>>>>>>>>>> it possible to fix this via appropriate uses of >>>>>>>>>>>>> ThreadsListHandle/Iterator? >>>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>>>>>>> >>>>>>>>>>>> Keep in mind that once SA attaches, nothing in the VM >>>>>>>>>>>> changes. For example, SA can't create a wrapper to a >>>>>>>>>>>> JavaThread, only to have the JavaThread be freed later on. >>>>>>>>>>>> It's just not possible. >>>>>>>>>>> >>>>>>>>>>> Then how does it obtain a reference to a JavaThread for >>>>>>>>>>> which the native OS thread id is invalid? Any thread found >>>>>>>>>>> in _java_thread_list is either live or still to be started. >>>>>>>>>>> In the latter case the JavaThread->osThread does not have >>>>>>>>>>> its thread_id set yet. >>>>>>>>>>> >>>>>>>>>> My assumption was that the JavaThread is in the process of >>>>>>>>>> being destroyed, and it has freed its OS thread but is itself >>>>>>>>>> still in the thread list. I did notice that the OS thread id >>>>>>>>>> being used looked to be in the range of thread id #'s you >>>>>>>>>> would expect for the running app, so that to me indicated it >>>>>>>>>> was once valid, but is no more. >>>>>>>>>> >>>>>>>>>> Keep in mind that although hotspot may have synchronization >>>>>>>>>> code that prevents you from pulling a JavaThread off the >>>>>>>>>> thread list when it is in the process of being destroyed (I'm >>>>>>>>>> guessing it does), SA has no such protections. >>>>>>>>> >>>>>>>>> But you stated that once the SA has attached, the target VM >>>>>>>>> can't change. If the SA gets its set of thread from one attach >>>>>>>>> then tries to make queries about those threads in a separate >>>>>>>>> attach, then obviously it could be providing garbage thread >>>>>>>>> information. So you would need to re-validate the JavaThread >>>>>>>>> in the target VM before trying to do anything with it. 
>>>>>>>> That's not what is going on here. It's attaching and doing a >>>>>>>> stack trace, which involves getting the thread list and >>>>>>>> iterating through all threads without detaching. >>>>>>> >>>>>>> Okay so I restate my original comment - all the JavaThreads must >>>>>>> be alive or not yet started, so how are you encountering an >>>>>>> invalid thread id? Any thread you find via the ThreadsList can't >>>>>>> have destroyed its osThread. In any case the logic should be >>>>>>> checking thread->osThread() for NULL, and then >>>>>>> osThread()->get_state() to ensure it is >= INITIALIZED before >>>>>>> using the thread_id(). >>>>>> Hi David, >>>>>> >>>>>> I chatted with Dan about this, and he said since the JavaThread >>>>>> is responsible for removing itself from the ThreadList, it is >>>>>> impossible to have a JavaThread still on the ThreadList, but >>>>>> without an underlying OS Thread. So I'm a bit perplexed as to >>>>>> how I can find a JavaThread on the ThreadList, but that results >>>>>> in ESRCH when trying to access the thread with ptrace. My only >>>>>> conclusion is that this failure is somehow spurious, and maybe >>>>>> the issue is just that the thread is in some temporary state that >>>>>> prevents its access. If so, I still think the approach I'm taking >>>>>> is the correct one, but the comments should be updated. >>>>> >>>>> ESRCH can have other meanings but I don't know enough about the >>>>> broader context to know whether they are applicable in this case. >>>>> >>>>> ESRCH The specified process does not exist, or is not >>>>> currently being traced by the caller, or is not stopped >>>>> (for requests that require a stopped tracee). >>>>> >>>>> I won't comment further on the fix/workaround as I don't know the >>>>> code. I'll leave that to other folk. >>>>> >>>>> Cheers, >>>>> David >>>>> ----- >>>>> >>>>>> I had one other finding.
When this issue first turned up, it >>>>>> prevented the thread from getting a stack trace due to the >>>>>> exception being thrown. What I hadn't realized is that after >>>>>> fixing it to not throw an exception, which resulted in the stack >>>>>> walking code getting all nulls for register values, I actually >>>>>> started to see a stack trace printed: >>>>>> >>>>>> "JLine terminal non blocking reader thread" #26 daemon prio=5 >>>>>> tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000] >>>>>> java.lang.Thread.State: RUNNABLE >>>>>> JavaThread state: _thread_in_native >>>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for >>>>>> lwp (8089) >>>>>> CurrentFrameGuess: choosing last Java frame: sp = >>>>>> 0x00007f125f0f4770, fp = 0x00007f125f0f47c0 >>>>>> - java.io.FileInputStream.read0() @bci=0 (Interpreted frame) >>>>>> - java.io.FileInputStream.read() @bci=1, line=223 (Interpreted >>>>>> frame) >>>>>> - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() >>>>>> @bci=108, line=216 (Interpreted frame) >>>>>> - >>>>>> jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() >>>>>> @bci=4 (Interpreted frame) >>>>>> - java.lang.Thread.run() @bci=11, line=832 (Interpreted frame) >>>>>> >>>>>> The "CurrentFrameGuess" output is some debug tracing I had >>>>>> enabled, and it indicates that the stack walking code is using >>>>>> the "last java frame" setting, which it will do if current >>>>>> register values don't indicate a valid frame (as would be the >>>>>> case if sp was null). I had previously assumed that without an >>>>>> underlying valid LWP, there would be no stack trace. Given that >>>>>> there is one, there must be a valid LWP. Otherwise I don't see >>>>>> how the stack could have been walked. That's another indication >>>>>> that the ptrace failure is spurious in nature.
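The fallback described in the quoted text above (use the register contents when they were read and look plausible, otherwise defer to the "last java frame" if one is set, otherwise give up on a trace for that thread) can be sketched as follows. This is a simplified model under stated assumptions, not the SA code: `StartingFrameSketch`, `Frame`, the register indices, and the "sp is non-zero" plausibility test are all hypothetical stand-ins, and the real code performs more validation than this.

```java
final class StartingFrameSketch {
    /** Minimal stand-in for a frame: a stack pointer and a frame pointer. */
    static final class Frame {
        final long sp;
        final long fp;

        Frame(long sp, long fp) {
            this.sp = sp;
            this.fp = fp;
        }
    }

    // Hypothetical positions of SP and FP in the register array.
    static final int SP = 0;
    static final int FP = 1;

    /**
     * Chooses where stack walking should start.
     *
     * @param regs register values, or null if reading them failed
     *             (the case in which the WARNING is printed)
     * @param lastJavaFrame the thread's recorded last Java frame, or null
     * @return the starting frame, or null if no trace can be produced
     */
    static Frame choose(long[] regs, Frame lastJavaFrame) {
        if (regs != null && regs[SP] != 0) {
            // Registers were read and sp is non-null: start from them.
            return new Frame(regs[SP], regs[FP]);
        }
        // Registers unavailable or unusable: fall back (possibly to null).
        return lastJavaFrame;
    }

    public static void main(String[] args) {
        Frame last = new Frame(0x7f125f0f4770L, 0x7f125f0f47c0L);
        Frame chosen = choose(null, last);
        System.out.println(Long.toHexString(chosen.sp));
    }
}
```

In this model a failed register read is not fatal; it only removes one of the two possible starting points.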
>>>>>> >>>>>> thanks, >>>>>> >>>>>> Chris >>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> Also, even if you are using something like clhsdb to issue >>>>>>>> commands on addresses, if the address is no longer valid for >>>>>>>> the command you are executing, then you would get the >>>>>>>> appropriate error when there is an attempt to create a wrapper >>>>>>>> for it. I don't know of any command that operates directly on a >>>>>>>> JavaThread, but I think there are for InstanceKlass. So if you >>>>>>>> remembered the address of an InstanceKlass, and then reattached >>>>>>>> and tried a command that takes an InstanceKlass address, you >>>>>>>> would get an exception when SA tries to create the wrapper for >>>>>>>> the InsanceKlass if it were no longer a valid address for one. >>>>>>>> >>>>>>>> Chris >>>>>>>>> >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>>> Chris >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> Chris >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> David >>>>>>>>>>>>> >>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>>>>>>> >>>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>>>>>>> >>>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>>>>>>> -Instead of throwing an exception when the OS ThreadID is >>>>>>>>>>>>>> invalid, print a warning. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>>>>>>> -Improve a print_debug message >>>>>>>>>>>>>> >>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>>>>>>> >>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>>>>>>> >>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>>>>>>> >>>>>>>>>>>>>> -Deal with the array of registers read in being null due >>>>>>>>>>>>>> to the OS ThreadID not being valid. >>>>>>>>>>>>>> >>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>>>>>>>>> >>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>>>>>>>>> >>>>>>>>>>>>>> -Fix issue with >>>>>>>>>>>>>> "sun.jvm.hotspot.debugger.DebuggerException" appearing >>>>>>>>>>>>>> twice when printing the exception. >>>>>>>>>>>>>> >>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Chris >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>> >>>> >> >> From chris.plummer at oracle.com Tue Jun 23 18:29:07 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 23 Jun 2020 11:29:07 -0700 Subject: RFR(M): 8244383: jhsdb/HeapDumpTestWithActiveProcess.java fails with "AssertionFailure: illegal bci" In-Reply-To: <28e1b453-e1ea-0a1c-0ae0-0494b52f4b71@oracle.com> References: <28e1b453-e1ea-0a1c-0ae0-0494b52f4b71@oracle.com> Message-ID: <6efbc900-732f-ee8b-5561-f9a813ebfeca@oracle.com> Ping! If this fix is too complicated, there is a simplification I can make, but at the cost of abandoning some attempts to determine the current frame when this error condition pops up. At the start of validateInterpreterFrame() it attempts to verify that the frame is valid by verifying that frame->method and frame->bcp are valid. This part is pretty simple. 
The complicated part is everything that follows if the verification fails. It attempts to error correct the situation by looking at various register contents and stack contents. I could just abandon this complicated code and return false if frame->method and frame->bcp don't check out. Upon return, the caller's code would be simplified to: ??????????? if (validateInterpreterFrame(sp, fp, pc)) { ????????????? return true; // We're done. setValues() has been called for valid interpreter frame. ??????????? } else { ????????????? return checkLastJavaSP(); ??????????? } So there's still a chance we can determine a valid current frame if "last java frame" has been setup. However, if not setup we would not be able to. This is where the complicated code in validateInterpreterFrame() is useful because it can usually determine the current frame, even if "last java frame" is not setup, but it's rare enough that we run into this situation that I think failing to get the current frame is ok. So if I can get a couple promises for reviews if I make this change, I'll go ahead and do it and send out a new RFR. thanks, Chris On 6/18/20 5:54 PM, Chris Plummer wrote: > [I've added runtime-dev to this SA review since understanding > interpreter invokes (code generated by > TemplateInterpreterGenerator::generate_normal_entry()) and stack > walking is probably more important than understanding SA.] > > Hello, > > Please help review the following: > > https://bugs.openjdk.java.net/browse/JDK-8244383 > http://cr.openjdk.java.net/~cjplummer/8244383/webrev.00/index.html > > The crux of the bug is when doing stack walking the topmost frame is > in an inconsistent state because we are in the middle of pushing a new > interpreter frame. Basically we are executing code generated by > TemplateInterpreterGenerator::generate_normal_entry(). Since the PC > register is in this code, SA assumes the topmost frame is an > interpreter frame. 
> > The first issue with this interpreter frame assumption is if we > haven't actually pushed the frame yet, then the current frame is the > caller's frame, and could be compiled. But since SA thinks it's > interpreted, later on it tries to convert the frame->bcp to a BCI, but > frame->bcp is only valid for interpreter frames. Thus the "illegal > BCI" failures. If the previous frame happened to be interpreted, then > the existing SA code works fine. > > The other state of frame pushing that was problematic was when the new > frame had been pushed, but frame->method and frame->bcp were not setup > yet. This also would lead to "illegal BCI" later on because garbage > would be stored in these locations. > > Fixing the above problems requires trying to determine the state of > the frame push through a series of checks, and then adapting what is > considered to be the current frame based on the outcome of the checks. > The first things checked is that frame->method is valid (we can > successfully instantiate a wrapper for the Method* without failure) > and that frame->bcp is within the method. If both these pass then we > can use the frame as-is. > > If the above checks fail, then we try to determine whether the issue > is that the frame is not yet pushed and the current frame is actually > compiled, or the frame has been pushed but not yet initialized. This > is done by first getting the return address from the stack or RAX > (it's location depends on how far along we are in the entry code) and > comparing this to what is stored in frame->return_addr. If they are > the same, then we have pushed the frame but not yet initialized it. In > this case we use the previous frame (senderSP() and senderFP()) as the > current frame since the current frame is not yet initialized. 
If the > return address check fails, then we assume the new frame is not yet > pushed, and and treat the current frame as compiled, even though PC > points into the interpreter (we replace PC with RAX in this case). > > Comments in the code pretty well explain all the above, so it is > probably easier to follow the logic in the code along with the > comments rather than apply my above description to the code. > > I should add that it's very rare that we ever get into this special > error handling code. This bug was very hard to reproduce initially. I > was only able to make progress with reproducing and debugging by > inserting delay loops in various spots in the code generated by > TemplateInterpreterGenerator::generate_normal_entry(). By doing this I > was able to reproduce the issue quite easily and hit all the logic in > the new code I've added. > > The fix is basically entirely contained within > AMD64CurrentFrameGuess.java. The rest of the changes are minor: > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java > > -Main fix for CR > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java > > -Added getInterpreterFrameBCP(), which is now needed by > AMD64CurrentFrameGuess.java > -I also simplified some code by using the existing > getInterpreterFrameMethod() > ?rather than replicating inline what it does. > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java > > -I noticed the windows version of this code had some extra checks that > were missing > ?from the bsd version. I then looked at the linux version, but it had > been heavily modified > ?a short while back to leverage DWARF info to determine frames. So I > looked at the previous > ?rev and it too had these extra checks. I decided to add them to the > BSD port. I'm not sure > ?if it helps at all, but it certainly doesn't seem to do any harm. 
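The sequence of checks described in the quoted message above can be summarized in a small decision function. This is a sketch of the described logic only, assuming the inputs have already been gathered; it is not the code in AMD64CurrentFrameGuess.java, and `FramePushStateSketch`, `Guess`, and the parameter names are hypothetical.

```java
final class FramePushStateSketch {
    enum Guess {
        USE_FRAME_AS_IS,    // frame->method and frame->bcp check out
        USE_SENDER_FRAME,   // frame pushed but not yet initialized
        TREAT_AS_COMPILED   // frame not pushed yet; caller's frame is current
    }

    /**
     * @param methodValid        a wrapper for frame->method could be created
     * @param bcpWithinMethod    frame->bcp lies within that method
     * @param observedReturnAddr return address read from the stack or RAX
     * @param frameReturnAddr    value stored in frame->return_addr
     */
    static Guess classify(boolean methodValid, boolean bcpWithinMethod,
                          long observedReturnAddr, long frameReturnAddr) {
        if (methodValid && bcpWithinMethod) {
            return Guess.USE_FRAME_AS_IS;
        }
        if (observedReturnAddr == frameReturnAddr) {
            // The new frame exists but its fields are still garbage,
            // so the previous (sender) frame is used instead.
            return Guess.USE_SENDER_FRAME;
        }
        // PC points into the interpreter entry code, but the frame on
        // top of the stack still belongs to the (possibly compiled) caller.
        return Guess.TREAT_AS_COMPILED;
    }

    public static void main(String[] args) {
        System.out.println(classify(false, false, 0x1000L, 0x1000L));
    }
}
```

Ordering matters here: the return-address comparison is only consulted after the cheap method and bcp checks have failed, which matches the "rarely reached" error-handling path the message describes.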
> > thanks, > > Chris > From chris.plummer at oracle.com Tue Jun 23 18:38:48 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 23 Jun 2020 11:38:48 -0700 Subject: [15] RFR(XS): 8247730: 2 JNI exception pending defect groups in DwarfParser.cpp Message-ID: <6be65951-87d0-c85a-4ba6-9154c7188be3@oracle.com> Hello, Please review the following: https://bugs.openjdk.java.net/browse/JDK-8247730 http://cr.openjdk.java.net/~cjplummer/8247730/webrev.00/ There are two locations where we make a JNI call with the possibility of a pending exception. This is new code in JDK 15, so it is being addressed there. thanks, Chris From daniel.daugherty at oracle.com Tue Jun 23 21:44:06 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 23 Jun 2020 17:44:06 -0400 Subject: RFR(M): 8244383: jhsdb/HeapDumpTestWithActiveProcess.java fails with "AssertionFailure: illegal bci" In-Reply-To: <28e1b453-e1ea-0a1c-0ae0-0494b52f4b71@oracle.com> References: <28e1b453-e1ea-0a1c-0ae0-0494b52f4b71@oracle.com> Message-ID: <1c642a24-d994-e34a-6af8-61c4dab7709d@oracle.com> On 6/18/20 8:54 PM, Chris Plummer wrote: > [I've added runtime-dev to this SA review since understanding > interpreter invokes (code generated by > TemplateInterpreterGenerator::generate_normal_entry()) and stack > walking is probably more important than understanding SA.] > > Hello, > > Please help review the following: > > https://bugs.openjdk.java.net/browse/JDK-8244383 > http://cr.openjdk.java.net/~cjplummer/8244383/webrev.00/index.html Sorry for the delay in reviewing this one. I've come back to it a couple of times because code like this is very hard to review. General comment: This fix reminds me of the crazy things that AsyncGetCallTrace has to do in order to gather call trace data. I'm guessing that SA is attaching to the VM in an asynchronous manner and that's why it can observe things like partially constructed frames. If that's a
correct guess, then how is SA stopping/suspending the threads? ??? I'm just curious here. ??? Or this might be a case where SA is examining a core file in ??? which case the various threads stacks are not necessarily at ??? good/safepoint-safe pause points. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java ??? No comments. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java ??? No comments. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java ??? L104: ??? // two locations, then we canot determine the frame. ??????? typo: s/canot/cannot/ ??? L127: ??? // it's validity will help us determine the state of the new frame push. ??????? typo: s/it's/its/ ??? L148: ??????? System.out.println("CurrentFrameGuess: frame pushed but not initaliazed."); ??????? typo: s/initaliazed/initialized/ ??? L220: ????????????? System.out.println("CurrentFrameGuess: choosing interpreter frame: sp = " + ??? L221: ???????????????????????????????? spFound + ", fpFound = " + fp + ", pcFound = " + pc); ??????? This debug output doesn't make sense to me: ??????????? "sp = " label and 'spFound' value ??????????? "fpFound = " label and 'fp' value ??????????? "pcFound = " label and 'pc' value ??????? but I may not have enough context... With code like this, it's really hard to figure out if you've covered all the cases unless you've been in the observer seat yourself and even then your test runs may not hit all the possible cases. All you can really do is start with a set of adaptive changes, run with those for a while and tweak them as you gather more observations. Chris, nice job with this bit of insanity! Thumbs up! Dan > > The crux of the bug is when doing stack walking the topmost frame is > in an inconsistent state because we are in the middle of pushing a new > interpreter frame. Basically we are executing code generated by > TemplateInterpreterGenerator::generate_normal_entry(). 
Since the PC > register is in this code, SA assumes the topmost frame is an > interpreter frame. > > The first issue with this interpreter frame assumption is if we > haven't actually pushed the frame yet, then the current frame is the > caller's frame, and could be compiled. But since SA thinks it's > interpreted, later on it tries to convert the frame->bcp to a BCI, but > frame->bcp is only valid for interpreter frames. Thus the "illegal > BCI" failures. If the previous frame happened to be interpreted, then > the existing SA code works fine. > > The other state of frame pushing that was problematic was when the new > frame had been pushed, but frame->method and frame->bcp were not setup > yet. This also would lead to "illegal BCI" later on because garbage > would be stored in these locations. > > Fixing the above problems requires trying to determine the state of > the frame push through a series of checks, and then adapting what is > considered to be the current frame based on the outcome of the checks. > The first things checked is that frame->method is valid (we can > successfully instantiate a wrapper for the Method* without failure) > and that frame->bcp is within the method. If both these pass then we > can use the frame as-is. > > If the above checks fail, then we try to determine whether the issue > is that the frame is not yet pushed and the current frame is actually > compiled, or the frame has been pushed but not yet initialized. This > is done by first getting the return address from the stack or RAX > (it's location depends on how far along we are in the entry code) and > comparing this to what is stored in frame->return_addr. If they are > the same, then we have pushed the frame but not yet initialized it. In > this case we use the previous frame (senderSP() and senderFP()) as the > current frame since the current frame is not yet initialized. 
If the > return address check fails, then we assume the new frame is not yet > pushed, and and treat the current frame as compiled, even though PC > points into the interpreter (we replace PC with RAX in this case). > > Comments in the code pretty well explain all the above, so it is > probably easier to follow the logic in the code along with the > comments rather than apply my above description to the code. > > I should add that it's very rare that we ever get into this special > error handling code. This bug was very hard to reproduce initially. I > was only able to make progress with reproducing and debugging by > inserting delay loops in various spots in the code generated by > TemplateInterpreterGenerator::generate_normal_entry(). By doing this I > was able to reproduce the issue quite easily and hit all the logic in > the new code I've added. > > The fix is basically entirely contained within > AMD64CurrentFrameGuess.java. The rest of the changes are minor: > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java > > -Main fix for CR > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java > > -Added getInterpreterFrameBCP(), which is now needed by > AMD64CurrentFrameGuess.java > -I also simplified some code by using the existing > getInterpreterFrameMethod() > ?rather than replicating inline what it does. > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java > > -I noticed the windows version of this code had some extra checks that > were missing > ?from the bsd version. I then looked at the linux version, but it had > been heavily modified > ?a short while back to leverage DWARF info to determine frames. So I > looked at the previous > ?rev and it too had these extra checks. I decided to add them to the > BSD port. I'm not sure > ?if it helps at all, but it certainly doesn't seem to do any harm. 
> > thanks, > > Chris > From alexey.menkov at oracle.com Tue Jun 23 21:46:06 2020 From: alexey.menkov at oracle.com (Alex Menkov) Date: Tue, 23 Jun 2020 14:46:06 -0700 Subject: [15] RFR(XS): 8247730: 2 JNI exception pending defect groups in DwarfParser.cpp In-Reply-To: <6be65951-87d0-c85a-4ba6-9154c7188be3@oracle.com> References: <6be65951-87d0-c85a-4ba6-9154c7188be3@oracle.com> Message-ID: <977df34e-9ab4-411c-e15f-743ecbc740bc@oracle.com> Hi Chris, LGTM. --alex On 06/23/2020 11:38, Chris Plummer wrote: > Hello, > > Please review the following: > > https://bugs.openjdk.java.net/browse/JDK-8247730 > http://cr.openjdk.java.net/~cjplummer/8247730/webrev.00/ > > There are two locations were we make a JNI call with the possibility of > a pending exception. This is new code in JDK 15, so it is being > addressed there. > > thanks, > > Chris From serguei.spitsyn at oracle.com Tue Jun 23 21:57:12 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 23 Jun 2020 14:57:12 -0700 Subject: [15] RFR(XS): 8247730: 2 JNI exception pending defect groups in DwarfParser.cpp In-Reply-To: <6be65951-87d0-c85a-4ba6-9154c7188be3@oracle.com> References: <6be65951-87d0-c85a-4ba6-9154c7188be3@oracle.com> Message-ID: <35066aff-525e-09cd-fb91-37d37f3ac909@oracle.com> An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Tue Jun 23 22:01:06 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 23 Jun 2020 15:01:06 -0700 Subject: [15] RFR(XS): 8247730: 2 JNI exception pending defect groups in DwarfParser.cpp In-Reply-To: <35066aff-525e-09cd-fb91-37d37f3ac909@oracle.com> References: <6be65951-87d0-c85a-4ba6-9154c7188be3@oracle.com> <35066aff-525e-09cd-fb91-37d37f3ac909@oracle.com> Message-ID: An HTML attachment was scrubbed... 
URL: From ioi.lam at oracle.com Tue Jun 23 22:57:23 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 23 Jun 2020 15:57:23 -0700 Subject: RFR(S) 8246019 PerfClassTraceTime slows down VM start-up In-Reply-To: <31af0e67-7bee-0680-3f23-09864030bba4@oracle.com> References: <3ed0b32d-469e-6f89-f4b3-b78d5bcce700@oracle.com> <20411ebb-5b06-83da-07c8-b751252efecd@oracle.com> <31af0e67-7bee-0680-3f23-09864030bba4@oracle.com> Message-ID: <431a5744-9001-c32a-6874-68d8acc3764e@oracle.com> I've updated the patch to include just the fix for class initialization: http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v02/ Hopefully this part is non-controversial. We are unlikely to make call_class_initializer(THREAD) any slower when there's no , so I didn't add the diagnostic flag as suggested by Claes. I'll leave the class linking alone for now, as that may change in the future. Meanwhile, I will look at other ways of reducing the effect of the performance counters on start-up, under JDK-8246020 (-XX:+UsePerfData is enabled by default and slows down VM bootstrap by 6%). thanks - Ioi On 6/18/20 4:38 AM, Claes Redestad wrote: > > > On 2020-06-17 05:19, Ioi Lam wrote: >> >> >> On 6/16/20 6:20 PM, David Holmes wrote: >>> Hi Ioi, >>> >>> On 17/06/2020 6:14 am, Ioi Lam wrote: >>>> https://bugs.openjdk.java.net/browse/JDK-8246019 >>>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/ >>>> >>>> >>>> PerfClassTraceTime is (a rarely used feature) for measuring the >>>> time spent during class linking and initialization. >>> >>> "A special command jcmd PerfCounter.print >>> prints all performance counters in the process." >>> >>> How do you know this is a "rarely used feature"? >> Hi David, >> >> Sure, the counter will be dumped, but by "rarely used" -- I mean no >> one will find this particular counter useful, and no one will be >> actively looking at it. >> >> I changed two parts of the code -- class init and class linking. 
>> >> For class initialization, the counter may be useful for people who >> want to know how much time is spent in their <clinit> functions, and >> my patch doesn't change that. It only avoids using the counter when a >> class has no <clinit>, i.e., we know that the counter counts nothing >> (except for a logging statement). >> >> ===== >> >> For class linking, no user code is executed, so it only measures VM >> code. If it's useful for anyone, that would be VM engineers like me >> who are trying to optimize the speed of class loading. However, due >> to the overhead of the counter vs what it's trying to measure, the >> results are pretty meaningless. >> >> Note that I've not disabled the counter altogether. Instead, I >> disable it only when linking a CDS shared class, and we know that >> very little is happening for this class (e.g., no verification). >> >> I think the class linking timer might have been useful 15 years ago >> when it was introduced, or it might be useful today when CDS is >> disabled. But with CDS enabled, we are paying a constant price that >> seems to benefit no one. >> >> I think we should short-circuit it when it seems appropriate. If this >> indeed causes problems for our users, it's easy to re-enable it. >> That's better than just keeping this forever just because we're >> afraid to touch anything. > > I think this seems like a well-rounded approach overall, but this assumes > that we're mostly measuring the overhead of measurement here. I don't > doubt that's the case for the scenarios you're excluding here and now, > but it's hard to guarantee this property holds in the future. > > Perhaps a diagnostic flag to enable timing unconditionally would be > appropriate? With such a flag we could verify that the time deltas of > running some applications with and without the flag roughly match the > time delta in reported linking time. If they diverge, we might need to > adjust the conditions. 
> >> >>> >>> I find it hard to evaluate whether this short-circuiting of the time >>> tracing is reasonable or not. Obviously any monitoring mechanism >>> should impose minimal overhead compared to what is being measured, >>> and these timers fall short in that regard. But if these stats >>> become meaningless then they may as well be removed. >>> >>> I think the serviceability folk (cc'd) need to evaluate this in the >>> context of the M&M tools. > > As a complement (or even alternative) there might be ways we can reduce > time-to-measure overheads. E.g, JFR added > FastUnorderedElapsedCounterSource (share/utilities/ticks.hpp) which uses > rdtsc if available (x86 - fallback to os::elapsed_counter otherwise). > > This might be a reasonable alternative for the Perf* timers, which > should be short-running events on a single thread. > > /Claes > >>> >>>> However, it's quite expensive and it needs to start and stop a >>>> bunch of timers. With CDS, it's quite often for the overhead of the >>>> timer itself to be much more than the time it's trying to measure, >>>> giving unreliable measurement. >>>> >>>> In this patch, when it's clear that the init and linking will be >>>> very quick, I disable the timer and count only the number of >>>> invocations. This shows a small improvement in start-up >>> >>> I'm curious if you tried to forcing EagerInitialization to be true >>> to see how that improves the baseline. I've always noticed >>> eager_init in the code, but hadn't realized it is disabled by default. >>> >> >> I think it cannot be done by default, as it will violate the JLS. A >> class can be initialized only when it's touched by bytecodes. >> >> It can also backfire as we may load many classes without initializing >> them. E.g., during bytecode verification, we load many classes and >> just check that one is a supertype of another. 
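A minimal sketch of the JLS point made just above, using only the standard library: loading a class does not run its static initializer, and the first active use does. LazyInitDemo and Helper are hypothetical names invented for this illustration; they are not part of the JDK or of the patch under review.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the JDK or of the patch under review.
final class LazyInitDemo {
    // Records the order in which loading, initialization and use happen.
    static final List<String> events = new ArrayList<>();

    static final class Helper {
        // Runs only when Helper is initialized, not when it is merely loaded.
        static { events.add("Helper.<clinit>"); }
        static int value() { return 42; }
    }

    // Loads Helper without initializing it, then triggers initialization
    // through a static method call. The steps run once; later calls
    // return a copy of the same log.
    static synchronized List<String> run() {
        if (events.isEmpty()) {
            String name = LazyInitDemo.class.getName() + "$Helper";
            try {
                // initialize == false: load (and possibly link) only.
                Class.forName(name, false, LazyInitDemo.class.getClassLoader());
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
            events.add("loaded");
            int v = Helper.value();   // first active use: <clinit> runs here
            events.add("used:" + v);
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(run());    // prints [loaded, Helper.<clinit>, used:42]
    }
}
```

The "loaded" entry precedes "Helper.<clinit>" in the log, which is the behaviour that forcing eager initialization by default would change.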
>> >> Thanks >> - Ioi >> >>> Cheers, >>> David >>> ----- >>> >>>> Results of " perf stat -r 100 bin/java -Xshare:on >>>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version " >>>>
>>>> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) -
>>>> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) --
>>>> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) -
>>>> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) -----
>>>> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ----
>>>> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) ---
>>>> 59608782 59340173 (-268609)   ----   41.198  41.508 ( 0.310) +
>>>> 59614612 59325177 (-289435)   -----  41.397  41.738 ( 0.341) ++
>>>> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ----
>>>> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) ---
>>>> ================================================
>>>> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) --
>>>> instruction delta =      -284608    -0.4774%
>>>> time        delta =       -0.434 ms -1.0435%
>>>>
>>>> The number of PerfClassTraceTime's used is reduced from 564 to 116 >>>> (so we have an overhead of about 715 instructions per use, yikes!). >> From chris.plummer at oracle.com Tue Jun 23 23:00:11 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 23 Jun 2020 16:00:11 -0700 Subject: [15] RFR(XS): 8247730: 2 JNI exception pending defect groups in DwarfParser.cpp In-Reply-To: References: <6be65951-87d0-c85a-4ba6-9154c7188be3@oracle.com> <35066aff-525e-09cd-fb91-37d37f3ac909@oracle.com> Message-ID: An HTML attachment was scrubbed... 
URL: From serguei.spitsyn at oracle.com Tue Jun 23 23:16:43 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 23 Jun 2020 16:16:43 -0700 Subject: [15] RFR(XS): 8247730: 2 JNI exception pending defect groups in DwarfParser.cpp In-Reply-To: References: <6be65951-87d0-c85a-4ba6-9154c7188be3@oracle.com> <35066aff-525e-09cd-fb91-37d37f3ac909@oracle.com> Message-ID: An HTML attachment was scrubbed... URL: From chris.plummer at oracle.com Tue Jun 23 23:28:09 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 23 Jun 2020 16:28:09 -0700 Subject: RFR(M): 8244383: jhsdb/HeapDumpTestWithActiveProcess.java fails with "AssertionFailure: illegal bci" In-Reply-To: <1c642a24-d994-e34a-6af8-61c4dab7709d@oracle.com> References: <28e1b453-e1ea-0a1c-0ae0-0494b52f4b71@oracle.com> <1c642a24-d994-e34a-6af8-61c4dab7709d@oracle.com> Message-ID: <4315bde0-6422-34dd-2795-9596e417d534@oracle.com> On 6/23/20 2:44 PM, Daniel D. Daugherty wrote: > On 6/18/20 8:54 PM, Chris Plummer wrote: >> [I've added runtime-dev to this SA review since understanding >> interpreter invokes (code generated by >> TemplateInterpreterGenerator::generate_normal_entry()) and stack >> walking is probably more important than understanding SA.] >> >> Hello, >> >> Please help review the following: >> >> https://bugs.openjdk.java.net/browse/JDK-8244383 >> http://cr.openjdk.java.net/~cjplummer/8244383/webrev.00/index.html > Thanks for helping! > Sorry for the delay in reviewing this one. I've come back to it a couple > of times because code like this is very hard to review. > > > General comment: > This fix reminds me of the crazy things that AsyncGetCallTrace has to > do in order to gather call trace data. I'm guessing that SA is > attaching to the VM in an asynchronous manner and that's why it > can observe things like partially constructed frames. If that's a > correct guess, then how is SA stopping/suspending the threads? > I'm just curious here. 
On linux SA uses ptrace. I'm not familiar with the details of how it works. I'm not sure where ptrace allows suspends to happen, but certainly it has no knowledge of JVM safepoints or other synchronization that the JVM does. So from the JVM and SA point of view the suspend can happen at any arbitrary JVM instruction. From what I can gather, PTRACE_ATTACH suspends the entire process, so that means all threads are suspended once you attach. However, PTRACE_GETREGS can be called on individual threads (LWPs), but I don't see any indication in the SA code that you need to attach to each LWP first. > > ??? Or this might be a case where SA is examining a core file in > ??? which case the various threads stacks are not necessarily at > ??? good/safepoint-safe pause points. For this bug and test it's a live process, but I think the bug being addressed here can happen just as well with a core file. Unfortunately we have very little core file testing support. I'm actually in the middle of addressing that right now. > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java > > ??? No comments. > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java > > ??? No comments. > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java > > ??? L104: ??? // two locations, then we canot determine the frame. > ??????? typo: s/canot/cannot/ ok > > ??? L127: ??? // it's validity will help us determine the state of the > new frame push. > ??????? typo: s/it's/its/ ok > > ??? L148: ??????? System.out.println("CurrentFrameGuess: frame pushed > but not initaliazed."); > ??????? typo: s/initaliazed/initialized/ ok > > ??? L220: ????????????? System.out.println("CurrentFrameGuess: > choosing interpreter frame: sp = " + > ??? L221: ???????????????????????????????? spFound + ", fpFound = " + > fp + ", pcFound = " + pc); > ??????? This debug output doesn't make sense to me: > > ??????????? 
"sp = " label and 'spFound' value > ??????????? "fpFound = " label and 'fp' value > ??????????? "pcFound = " label and 'pc' value > > ??????? but I may not have enough context... From the point of view of the person reading the output, they want to know the values for sp, fp, and pc. But within the code these values are stored in the "found" variables. > > With code like this, it's really hard to figure out if you've covered > all the cases unless you've been in the observer seat yourself and > even then your test runs may not hit all the possible cases. All you > can really do is start with a set of adaptive changes, run with those > for a while and tweak them as you gather more observations. Yes, and I know there is still a very tiny gap or two in coverage that are maybe one or two instructions long, but they aren't worth dealing with. This bug was already very rare, and with the fixes I've done I don't see any issues now. SA is a debugger, so perfection in this regard is not expected. > > Chris, nice job with this bit of insanity! Thanks! I mostly stuck with this one to help with my SA expertise. Otherwise it wouldn't have been worth the time. Chris > > Thumbs up! > > Dan > > > >> >> The crux of the bug is when doing stack walking the topmost frame is >> in an inconsistent state because we are in the middle of pushing a >> new interpreter frame. Basically we are executing code generated by >> TemplateInterpreterGenerator::generate_normal_entry(). Since the PC >> register is in this code, SA assumes the topmost frame is an >> interpreter frame. >> >> The first issue with this interpreter frame assumption is if we >> haven't actually pushed the frame yet, then the current frame is the >> caller's frame, and could be compiled. But since SA thinks it's >> interpreted, later on it tries to convert the frame->bcp to a BCI, >> but frame->bcp is only valid for interpreter frames. Thus the >> "illegal BCI" failures. 
If the previous frame happened to be >> interpreted, then the existing SA code works fine. >> >> The other state of frame pushing that was problematic was when the >> new frame had been pushed, but frame->method and frame->bcp were not >> set up yet. This also would lead to "illegal BCI" later on because >> garbage would be stored in these locations. >> >> Fixing the above problems requires trying to determine the state of >> the frame push through a series of checks, and then adapting what is >> considered to be the current frame based on the outcome of the >> checks. The first thing checked is that frame->method is valid (we >> can successfully instantiate a wrapper for the Method* without >> failure) and that frame->bcp is within the method. If both these pass >> then we can use the frame as-is. >> >> If the above checks fail, then we try to determine whether the issue >> is that the frame is not yet pushed and the current frame is actually >> compiled, or the frame has been pushed but not yet initialized. This >> is done by first getting the return address from the stack or RAX >> (its location depends on how far along we are in the entry code) and >> comparing this to what is stored in frame->return_addr. If they are >> the same, then we have pushed the frame but not yet initialized it. >> In this case we use the previous frame (senderSP() and senderFP()) as >> the current frame since the current frame is not yet initialized. If >> the return address check fails, then we assume the new frame is not >> yet pushed, and treat the current frame as compiled, even though >> PC points into the interpreter (we replace PC with RAX in this case). >> >> Comments in the code pretty well explain all the above, so it is >> probably easier to follow the logic in the code along with the >> comments rather than apply my above description to the code. >> >> I should add that it's very rare that we ever get into this special >> error handling code. 
This bug was very hard to reproduce initially. I >> was only able to make progress with reproducing and debugging by >> inserting delay loops in various spots in the code generated by >> TemplateInterpreterGenerator::generate_normal_entry(). By doing this >> I was able to reproduce the issue quite easily and hit all the logic >> in the new code I've added. >> >> The fix is basically entirely contained within >> AMD64CurrentFrameGuess.java. The rest of the changes are minor: >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java >> >> -Main fix for CR >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java >> >> -Added getInterpreterFrameBCP(), which is now needed by >> AMD64CurrentFrameGuess.java >> -I also simplified some code by using the existing >> getInterpreterFrameMethod() >> rather than replicating inline what it does. >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java >> >> -I noticed the windows version of this code had some extra checks >> that were missing >> from the bsd version. I then looked at the linux version, but it had >> been heavily modified >> a short while back to leverage DWARF info to determine frames. So I >> looked at the previous >> rev and it too had these extra checks. I decided to add them to the >> BSD port. I'm not sure >> if it helps at all, but it certainly doesn't seem to do any harm. 
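The decision sequence described in the message above can be sketched as a small pure function. This is only an illustration under stated assumptions: FrameGuessSketch, Decision and classify are hypothetical names, the inputs are reduced to booleans and raw addresses, and none of it is taken from the actual AMD64CurrentFrameGuess.java.

```java
// Hypothetical sketch of the checks described in the message; the names and
// the simplified inputs are assumptions, not code from AMD64CurrentFrameGuess.java.
final class FrameGuessSketch {
    enum Decision {
        USE_FRAME_AS_IS,          // frame->method and frame->bcp look valid
        USE_SENDER_FRAME,         // frame pushed but not yet initialized
        TREAT_AS_COMPILED_CALLER  // frame not yet pushed; PC is replaced with RAX
    }

    static Decision classify(boolean methodValid,
                             boolean bcpWithinMethod,
                             long returnAddrFromStackOrRax,
                             long frameReturnAddr) {
        // First check: a valid Method* wrapper and a bcp inside that method.
        if (methodValid && bcpWithinMethod) {
            return Decision.USE_FRAME_AS_IS;
        }
        // Second check: matching return addresses mean the new frame has been
        // pushed but its fields are still garbage, so fall back to the sender.
        if (returnAddrFromStackOrRax == frameReturnAddr) {
            return Decision.USE_SENDER_FRAME;
        }
        // Otherwise assume the push has not happened and the current frame is
        // the (possibly compiled) caller.
        return Decision.TREAT_AS_COMPILED_CALLER;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, true, 0x1000L, 0x2000L));
        System.out.println(classify(false, false, 0x1000L, 0x1000L));
        System.out.println(classify(true, false, 0x1000L, 0x2000L));
    }
}
```

Keeping the classification separate from the register and stack reads makes each of the three outcomes easy to exercise on its own, which matters for a path that is rarely hit in practice.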
>> >> thanks, >> >> Chris >> > From yasuenag at gmail.com Wed Jun 24 01:05:44 2020 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Wed, 24 Jun 2020 10:05:44 +0900 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: <17fbbb06-3cec-503b-73ea-c94156864ed7@oracle.com> References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com> <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com> <17fbbb06-3cec-503b-73ea-c94156864ed7@oracle.com> Message-ID: <6a0152a9-a151-b49b-5003-f6cf73422973@gmail.com> Hi Chris, Skillful troubleshooters who use jhsdb will be aware of these warnings, and they will take other appropriate measures. However, I'm not sure it is worth continuing if SA cannot get register values. For example, Linux AMD64 depends on RIP and RSP values to find the top frame. According to your change, the caller of getThreadIntegerRegisterSet() is responsible for dealing with a null register set. However X86ThreadContext::data (it includes raw register values) would still be zero when it happens. So I think the register holder (e.g. X86ThreadContext) should be tri-state (have registers, failed to get registers, not yet attempted to get registers). OTOH it might be over-engineering. What do you think? 
Thanks, Yasumasa On 2020/06/24 3:16, Chris Plummer wrote: > On 6/20/20 12:53 AM, Yasumasa Suenaga wrote: >> Hi Chris, >> >> On 2020/06/20 15:20, Chris Plummer wrote: >>> Hi Yasumasa, >>> >>> ptrace is not used for core files, so the EFAULT for a bad core file is not a possibility. However, get_lwp_regs() does redirect to core_get_lwp_regs() for core files. 
It can fail, but the only reason it ever does is if the LWP can't be found in the core (which is never supposed to happen). I would think if this happened due to the core being truncated, SA would be blowing up all over the place with exceptions, probably before we ever get to this code, but in any case what we do here wouldn't really make a difference. >> >> You are right, sorry. >> >> >>> I'm not sure why you prefer an exception for errors other than ESRCH. Why should they be treated differently? getThreadIntegerRegisterSet0() is used for finding the current frame for stack tracing. With my changes any failure will result in deferring to "last java frame" if set, and otherwise just not produce a stack trace (and the WARNING will be present in the output). This seems preferable to completely abandoning any further thread stack tracking. >> >> I'm not sure we can trust the call stack when ptrace() returns any error other than ESRCH, even if "last java frame" is available. For example, doesn't ptrace() return EFAULT or EIO when something is wrong (e.g. stack corruption)? If so, it may lead to a wrong analysis by the troubleshooter. >> I think it should at least abort dumping the call stack for that thread. > Hi Yasumasa, > > In general stack walking makes a best effort and can be wrong, even when not getting errors like this. For any actively executing thread SA needs to determine where the stack starts, with register contents being the starting point (SP, FP, and PC). These registers could contain anything, and SA makes a best effort to determine a current frame from them. However, the verification steps it takes are not 100% guaranteed, and can lead to an incorrect assumption of the current frame, which in turn can result in an exception later on when walking the stack. See JDK-8247641. > > Keep in mind that the WARNING message will always be there. This should be enough to put the troubleshooter on alert that the stack trace may not be accurate. 
I think it's better to make an attempt at a stack trace than to just abandon it and not attempt to do something that may be useful. > > thanks, > > Chris >> >> >> Thanks, >> >> Yasumasa >> >> >>> thanks, >>> >>> Chris >>> >>> On 6/19/20 6:33 PM, Yasumasa Suenaga wrote: >>>> Hi Chris, >>>> >>>> I checked the Linux kernel code at a glance; ESRCH seems to be set to errno by default. >>>> So I guess it is similar to a "generic" error code. >>>> >>>> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c >>>> >>>> According to the manpage of ptrace(2), it might return an errno other than ESRCH. >>>> For example, if we analyze a broken core (e.g. the core was dumped with the disk full), we might get EFAULT. >>>> Thus I prefer to handle ESRCH only in your patch, and also I think SA should throw DebuggerException if any other error occurs. >>>> >>>> https://www.man7.org/linux/man-pages/man2/ptrace.2.html >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>> On 2020/06/20 5:51, Chris Plummer wrote: >>>>> Hello, >>>>> >>>>> I've updated the webrev based on the new finding that a JavaThread cannot be on the ThreadList after its OS thread has been destroyed since the JavaThread removes itself from the ThreadList, and therefore must be running on its OS thread. The logic of the fix is unchanged from the first webrev, but I updated the comments to better reflect what is going on. 
I also updated the CR: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html >>>>> >>>>> thanks, >>>>> >>>>> Chris >>>>> >>>>> On 6/19/20 12:24 AM, David Holmes wrote: >>>>>> Hi Chris, >>>>>> >>>>>> On 19/06/2020 8:55 am, Chris Plummer wrote: >>>>>>> On 6/18/20 1:43 AM, David Holmes wrote: >>>>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote: >>>>>>>>> On 6/17/20 10:29 PM, David Holmes wrote: >>>>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please help review the following: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The CR contains all the needed details. Here's a summary of changes in each file: >>>>>>>>>>>>>> >>>>>>>>>>>>>> The problem sounds to me like a variation of the more general problem of not ensuring a thread is kept alive whilst acting upon it. I don't know how the SA finds these references to the threads it is going to stackwalk, but is it possible to fix this via appropriate uses of ThreadsListHandle/Iterator? >>>>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>>>>>>>> >>>>>>>>>>>>> Keep in mind that once SA attaches, nothing in the VM changes. For example, SA can't create a wrapper to a JavaThread, only to have the JavaThread be freed later on. It's just not possible. >>>>>>>>>>>> >>>>>>>>>>>> Then how does it obtain a reference to a JavaThread for which the native OS thread id is invalid? Any thread found in _java_thread_list is either live or still to be started. 
In the latter case the JavaThread->osThread does not have its thread_id set yet. >>>>>>>>>>>> >>>>>>>>>>> My assumption was that the JavaThread is in the process of being destroyed, and it has freed its OS thread but is itself still in the thread list. I did notice that the OS thread id being used looked to be in the range of thread id #'s you would expect for the running app, so that to me indicated it was once valid, but is no more. >>>>>>>>>>> >>>>>>>>>>> Keep in mind that although hotspot may have synchronization code that prevents you from pulling a JavaThread off the thread list when it is in the process of being destroyed (I'm guessing it does), SA has no such protections. >>>>>>>>>> >>>>>>>>>> But you stated that once the SA has attached, the target VM can't change. If the SA gets its set of thread from one attach then tries to make queries about those threads in a separate attach, then obviously it could be providing garbage thread information. So you would need to re-validate the JavaThread in the target VM before trying to do anything with it. >>>>>>>>> That's not what is going on here. It's attaching and doing a stack trace, which involves getting the thread list and iterating through all threads without detaching. >>>>>>>> >>>>>>>> Okay so I restate my original comment - all the JavaThreads must be alive or not yet started, so how are you encountering an invalid thread id? Any thread you find via the ThreadsList can't have destroyed its osThread. In any case the logic should be checking thread->osThread() for NULL, and then osThread()->get_state() to ensure it is >= INITIALIZED before using the thread_id(). >>>>>>> Hi David, >>>>>>> >>>>>>> I chatted with Dan about this, and he said since the JavaThread is responsible for removing itself from the ThreadList, it is impossible to have a JavaThread still on the ThreadList, but without and underlying OS Thread. 
So I'm a bit perplexed as to how I can find a JavaThread on the ThreadList, but that results in ESRCH when trying to access the thread with ptrace. My only conclusion is that this failure is somehow spurious, and maybe the issue it just that the thread is in some temporary state that prevents its access. If so, I still think the approach I'm taking is the correct one, but the comments should be updated. >>>>>> >>>>>> ESRCH can have other meanings but I don't know enough about the broader context to know whether they are applicable in this case. >>>>>> >>>>>> ??? ESRCH? The? specified? process? does not exist, or is not currently being traced by the caller, or is not stopped >>>>>> ????????????? (for requests that require a stopped tracee). >>>>>> >>>>>> I won't comment further on the fix/workaround as I don't know the code. I'll leave that to other folk. >>>>>> >>>>>> Cheers, >>>>>> David >>>>>> ----- >>>>>> >>>>>>> I had one other finding. When this issue first turned up, it prevented the thread from getting a stack trace due to the exception being thrown. What I hadn't realize is that after fixing it to not throw an exception, which resulted in the stack walking code getting all nulls for register values, I actually started to see a stack trace printed: >>>>>>> >>>>>>> "JLine terminal non blocking reader thread" #26 daemon prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000] >>>>>>> ??? java.lang.Thread.State: RUNNABLE >>>>>>> ??? 
JavaThread state: _thread_in_native >>>>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp (8089) >>>>>>> CurrentFrameGuess: choosing last Java frame: sp = 0x00007f125f0f4770, fp = 0x00007f125f0f47c0 >>>>>>> ??- java.io.FileInputStream.read0() @bci=0 (Interpreted frame) >>>>>>> ??- java.io.FileInputStream.read() @bci=1, line=223 (Interpreted frame) >>>>>>> ??- jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() @bci=108, line=216 (Interpreted frame) >>>>>>> ??- jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() @bci=4 (Interpreted frame) >>>>>>> ??- java.lang.Thread.run() @bci=11, line=832 (Interpreted frame) >>>>>>> >>>>>>> The "CurrentFrameGuess" output is some debug tracing I had enabled, and it indicates that the stack walking code is using the "last java frame" setting, which it will do if current registers values don't indicate a valid frame (as would be the case if sp was null). I had previously assumed that without an underling valid LWP, there would be no stack trace. Given that there is one, there must be a valid LWP. Otherwise I don't see how the stack could have been walked. That's another indication that the ptrace failure is spurious in nature. >>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> Chris >>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>>> Also, even if you are using something like clhsdb to issue commands on addresses, if the address is no longer valid for the command you are executing, then you would get the appropriate error when there is an attempt to create a wrapper for it. I don't know of any command that operates directly on a JavaThread, but I think there are for InstanceKlass. So if you remembered the address of an InstanceKlass, and then reattached and tried a command that takes an InstanceKlass address, you would get an exception when SA tries to create the wrapper for the InsanceKlass if it were no longer a valid address for one. 
>>>>>>>>> >>>>>>>>> Chris >>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> Chris >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>>> Chris >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> David >>>>>>>>>>>>>> >>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>>>>>>>> -Instead of throwing an exception when the OS ThreadID is invalid, print a warning. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>>>>>>>> -Improve a print_debug message >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>>>>>>>> -Deal with the array of registers read in being null due to the OS ThreadID not being valid. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>>>>>>>>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" appearing twice when printing the exception. 
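The Java-side item in the list above (tolerating a null register array) and the fallback to "last java frame" described elsewhere in this thread can be pictured roughly as follows. This is a hedged sketch only: RegisterFallbackSketch, startingSp and SP_INDEX are hypothetical and do not reflect SA's real classes or register layout.

```java
import java.util.OptionalLong;

// Hypothetical sketch; class, method and index are assumptions for illustration.
final class RegisterFallbackSketch {
    // Assumed position of the stack pointer in the register array.
    static final int SP_INDEX = 0;

    // Picks the stack pointer to start walking from:
    //  - the live register value when the registers could be read,
    //  - otherwise the thread's recorded last Java frame, if any,
    //  - otherwise empty, meaning no stack trace is produced.
    static OptionalLong startingSp(long[] regs, OptionalLong lastJavaSp) {
        if (regs != null && regs.length > SP_INDEX && regs[SP_INDEX] != 0L) {
            return OptionalLong.of(regs[SP_INDEX]);
        }
        return lastJavaSp;
    }

    public static void main(String[] args) {
        // Register read failed (null array): fall back to the last Java frame.
        OptionalLong sp = startingSp(null, OptionalLong.of(0x00007f125f0f4770L));
        System.out.println(sp.isPresent()
                ? "sp = 0x" + Long.toHexString(sp.getAsLong())
                : "no stack trace");
    }
}
```

A caller that gets an empty result would print the warning and skip the thread rather than throw, which matches the best-effort behaviour argued for in the thread.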
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Chris >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>> >>> > > From serguei.spitsyn at oracle.com Wed Jun 24 02:05:22 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 23 Jun 2020 19:05:22 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs Message-ID: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> Please, review a fix for: https://bugs.openjdk.java.net/browse/JDK-8165276 CSR draft (one CSR reviewer is needed before finalizing it): https://bugs.openjdk.java.net/browse/JDK-8248189 Webrev: http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.1/ The java.lang.instrument spec: https://docs.oracle.com/en/java/javase/14/docs/api/java.instrument/java/lang/instrument/package-summary.html Summary: The java.lang.instrument spec clearly states: "The agent class must implement a public static premain method similar in principle to the main application entry point." Current implementation of sun/instrument/InstrumentationImpl.java allows the premain method to be non-public which violates the spec. This fix aligns the implementation with the spec. Testing: A mach5 run of jdk_instrument tests is in progress. Thanks, Serguei From daniel.daugherty at oracle.com Wed Jun 24 02:07:03 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 23 Jun 2020 22:07:03 -0400 Subject: RFR(M): 8244383: jhsdb/HeapDumpTestWithActiveProcess.java fails with "AssertionFailure: illegal bci" In-Reply-To: <4315bde0-6422-34dd-2795-9596e417d534@oracle.com> References: <28e1b453-e1ea-0a1c-0ae0-0494b52f4b71@oracle.com> <1c642a24-d994-e34a-6af8-61c4dab7709d@oracle.com> <4315bde0-6422-34dd-2795-9596e417d534@oracle.com> Message-ID: Just one more comment on this part: > > 
L220: System.out.println("CurrentFrameGuess: choosing interpreter > frame: sp = " + > >??? L221: ???????????????????????????????? spFound + ", fpFound = " + > fp + ", pcFound = " + pc); > >??????? This debug output doesn't make sense to me: > > > >??????????? "sp = " label and 'spFound' value > >??????????? "fpFound = " label and 'fp' value > >??????????? "pcFound = " label and 'pc' value > > ??????? but I may not have enough context... > From the point of view of the person reading the output, they want to > know the values for sp, fp, and pc. But within the code these values > are stored in the "found" variables. In that case, the code is wrong for the 'fp' and 'pc' outputs since you changed the labels and not the variables. Dan On 6/23/20 7:28 PM, Chris Plummer wrote: > On 6/23/20 2:44 PM, Daniel D. Daugherty wrote: >> On 6/18/20 8:54 PM, Chris Plummer wrote: >>> [I've added runtime-dev to this SA review since understanding >>> interpreter invokes (code generated by >>> TemplateInterpreterGenerator::generate_normal_entry()) and stack >>> walking is probably more important than understanding SA.] >>> >>> Hello, >>> >>> Please help review the following: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8244383 >>> http://cr.openjdk.java.net/~cjplummer/8244383/webrev.00/index.html >> > Thanks for helping! >> Sorry for the delay in reviewing this one. I've come back to it a couple >> of times because code like this is very hard to review. >> >> >> General comment: >> ??? This fix reminds of the crazy things that AsyncGetCallTrace has to >> ??? do in order to gather call trace data. I'm guessing that SA is >> ??? attaching to the VM in an asynchronous manner and that's why it >> ??? can observe things like partially constructed frames. If that's a >> ??? correct guess, then how is SA stopping/suspending the threads? >> ??? I'm just curious here. > On linux SA uses ptrace. I'm not familiar with the details of how it > works. 
I'm not sure where ptrace allows suspends to happen, but > certainly it has no knowledge of JVM safepoints or other > synchronization that the JVM does. So from the JVM and SA point of > view the suspend can happen at any arbitrary JVM instruction. > > From what I can gather, PTRACE_ATTACH suspends the entire process, so > that means all threads are suspended once you attach. However, > PTRACE_GETREGS can be called on individual threads (LWPs), but I don't > see any indication in the SA code that you need to attach to each LWP > first. >> >> ??? Or this might be a case where SA is examining a core file in >> ??? which case the various threads stacks are not necessarily at >> ??? good/safepoint-safe pause points. > For this bug and test it's a live process, but I think the bug being > addressed here can happen just as well with a core file. Unfortunately > we have very little core file testing support. I'm actually in the > middle of addressing that right now. >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java >> >> ??? No comments. >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java >> >> ??? No comments. >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java >> >> ??? L104: ??? // two locations, then we canot determine the frame. >> ??????? typo: s/canot/cannot/ > ok >> >> ??? L127: ??? // it's validity will help us determine the state of >> the new frame push. >> ??????? typo: s/it's/its/ > ok >> >> ??? L148: ??????? System.out.println("CurrentFrameGuess: frame pushed >> but not initaliazed."); >> ??????? typo: s/initaliazed/initialized/ > ok >> >> ??? L220: ????????????? System.out.println("CurrentFrameGuess: >> choosing interpreter frame: sp = " + >> ??? L221: ???????????????????????????????? spFound + ", fpFound = " + >> fp + ", pcFound = " + pc); >> ??????? This debug output doesn't make sense to me: >> >> ??????????? 
"sp = " label and 'spFound' value >> ??????????? "fpFound = " label and 'fp' value >> ??????????? "pcFound = " label and 'pc' value >> >> ??????? but I may not have enough context... > From the point of view of the person reading the output, they want to > know the values for sp, fp, and pc. But within the code these values > are stored in the "found" variables. >> >> With code like this, it's really hard to figure out if you've covered >> all the cases unless you've been in the observer seat yourself and >> even then your test runs may not hit all the possible cases. All you >> can really do is start with a set of adaptive changes, run with those >> for a while and tweak them as you gather more observations. > Yes, and I know there is still a very tiny gap or two in coverage that > are maybe one or two instructions long, but they aren't worth dealing > with. This bug was already very rare, and with the fixes I've done I > don't see any issues now. SA is a debugger, so perfection in this > regard is not expected. >> >> Chris, nice job with this bit of insanity! > Thanks! I mostly stuck with this one to help with my SA expertise. > Otherwise it wouldn't have been worth the time. > > Chris >> >> Thumbs up! >> >> Dan >> >> >> >>> >>> The crux of the bug is when doing stack walking the topmost frame is >>> in an inconsistent state because we are in the middle of pushing a >>> new interpreter frame. Basically we are executing code generated by >>> TemplateInterpreterGenerator::generate_normal_entry(). Since the PC >>> register is in this code, SA assumes the topmost frame is an >>> interpreter frame. >>> >>> The first issue with this interpreter frame assumption is if we >>> haven't actually pushed the frame yet, then the current frame is the >>> caller's frame, and could be compiled. But since SA thinks it's >>> interpreted, later on it tries to convert the frame->bcp to a BCI, >>> but frame->bcp is only valid for interpreter frames. 
Thus the >>> "illegal BCI" failures. If the previous frame happened to be >>> interpreted, then the existing SA code works fine. >>> >>> The other state of frame pushing that was problematic was when the >>> new frame had been pushed, but frame->method and frame->bcp were not >>> setup yet. This also would lead to "illegal BCI" later on because >>> garbage would be stored in these locations. >>> >>> Fixing the above problems requires trying to determine the state of >>> the frame push through a series of checks, and then adapting what is >>> considered to be the current frame based on the outcome of the >>> checks. The first things checked is that frame->method is valid (we >>> can successfully instantiate a wrapper for the Method* without >>> failure) and that frame->bcp is within the method. If both these >>> pass then we can use the frame as-is. >>> >>> If the above checks fail, then we try to determine whether the issue >>> is that the frame is not yet pushed and the current frame is >>> actually compiled, or the frame has been pushed but not yet >>> initialized. This is done by first getting the return address from >>> the stack or RAX (it's location depends on how far along we are in >>> the entry code) and comparing this to what is stored in >>> frame->return_addr. If they are the same, then we have pushed the >>> frame but not yet initialized it. In this case we use the previous >>> frame (senderSP() and senderFP()) as the current frame since the >>> current frame is not yet initialized. If the return address check >>> fails, then we assume the new frame is not yet pushed, and and treat >>> the current frame as compiled, even though PC points into the >>> interpreter (we replace PC with RAX in this case). >>> >>> Comments in the code pretty well explain all the above, so it is >>> probably easier to follow the logic in the code along with the >>> comments rather than apply my above description to the code. 
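The decision sequence described in the quoted paragraphs above (validate frame->method and frame->bcp, then compare return addresses) can be sketched roughly as follows. This is only an illustration of the described logic under simplifying assumptions, not the code in AMD64CurrentFrameGuess.java: the class name, the enum, and the boolean/long parameters are all hypothetical stand-ins for the checks SA performs on the target process.

```java
/** Hypothetical sketch of the frame-push classification; not the actual SA code. */
public class FramePushGuess {

    /** Possible outcomes of the checks (names are illustrative only). */
    enum Guess {
        USE_FRAME_AS_IS,    // frame->method and frame->bcp both look valid
        USE_SENDER_FRAME,   // frame pushed but not yet initialized
        TREAT_AS_COMPILED   // frame not yet pushed; caller may be compiled
    }

    /**
     * @param methodValid       assumed result of wrapping frame->method without failure
     * @param bcpInMethod       assumed result of checking frame->bcp lies within the method
     * @param returnAddrFromStackOrRax return address read from the stack or RAX
     * @param frameReturnAddr   value stored in frame->return_addr
     */
    static Guess classify(boolean methodValid, boolean bcpInMethod,
                          long returnAddrFromStackOrRax, long frameReturnAddr) {
        if (methodValid && bcpInMethod) {
            return Guess.USE_FRAME_AS_IS;
        }
        if (returnAddrFromStackOrRax == frameReturnAddr) {
            // Pushed but uninitialized: fall back to senderSP()/senderFP().
            return Guess.USE_SENDER_FRAME;
        }
        // Not yet pushed: treat the current frame as compiled (PC replaced with RAX).
        return Guess.TREAT_AS_COMPILED;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, true, 0x10L, 0x20L));
        System.out.println(classify(false, true, 0x10L, 0x10L));
        System.out.println(classify(true, false, 0x10L, 0x20L));
    }
}
```

The real checks read memory from the debuggee and can themselves fail, which is why the thread describes the result as a guess rather than a guarantee.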
>>> >>> I should add that it's very rare that we ever get into this special >>> error handling code. This bug was very hard to reproduce initially. >>> I was only able to make progress with reproducing and debugging by >>> inserting delay loops in various spots in the code generated by >>> TemplateInterpreterGenerator::generate_normal_entry(). By doing this >>> I was able to reproduce the issue quite easily and hit all the logic >>> in the new code I've added. >>> >>> The fix is basically entirely contained within >>> AMD64CurrentFrameGuess.java. The rest of the changes are minor: >>> >>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java >>> >>> -Main fix for CR >>> >>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java >>> >>> -Added getInterpreterFrameBCP(), which is now needed by >>> AMD64CurrentFrameGuess.java >>> -I also simplified some code by using the existing >>> getInterpreterFrameMethod() >>> ?rather than replicating inline what it does. >>> >>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java >>> >>> -I noticed the windows version of this code had some extra checks >>> that were missing >>> ?from the bsd version. I then looked at the linux version, but it >>> had been heavily modified >>> ?a short while back to leverage DWARF info to determine frames. So I >>> looked at the previous >>> ?rev and it too had these extra checks. I decided to add them to the >>> BSD port. I'm not sure >>> ?if it helps at all, but it certainly doesn't seem to do any harm. 
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>
>
>

From sundararajan.athijegannathan at oracle.com Wed Jun 24 02:53:47 2020
From: sundararajan.athijegannathan at oracle.com (sundararajan.athijegannathan at oracle.com)
Date: Wed, 24 Jun 2020 08:23:47 +0530
Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs
In-Reply-To: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com>
References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com>
Message-ID: <35b4039e-4252-60c5-689b-d5d99ce43f19@oracle.com>

Looks good

-Sundar

On 24/06/20 7:35 am, serguei.spitsyn at oracle.com wrote:
> Please, review a fix for:
>   https://bugs.openjdk.java.net/browse/JDK-8165276
>
>
> CSR draft (one CSR reviewer is needed before finalizing it):
>   https://bugs.openjdk.java.net/browse/JDK-8248189
>
>
> Webrev:
> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.1/
>
>
> The java.lang.instrument spec:
> https://docs.oracle.com/en/java/javase/14/docs/api/java.instrument/java/lang/instrument/package-summary.html
>
>
>
> Summary:
>   The java.lang.instrument spec clearly states:
>     "The agent class must implement a public static premain method
>      similar in principle to the main application entry point."
>   Current implementation of sun/instrument/InstrumentationImpl.java
>   allows the premain method to be non-public, which violates the spec.
>   This fix aligns the implementation with the spec.
>
>
> Testing:
>   A mach5 run of jdk_instrument tests is in progress.
>
>
> Thanks,
> Serguei

From chris.plummer at oracle.com Wed Jun 24 03:04:50 2020
From: chris.plummer at oracle.com (Chris Plummer)
Date: Tue, 23 Jun 2020 20:04:50 -0700
Subject: RFR(M): 8244383: jhsdb/HeapDumpTestWithActiveProcess.java fails with "AssertionFailure: illegal bci"
In-Reply-To:
References: <28e1b453-e1ea-0a1c-0ae0-0494b52f4b71@oracle.com> <1c642a24-d994-e34a-6af8-61c4dab7709d@oracle.com> <4315bde0-6422-34dd-2795-9596e417d534@oracle.com>
Message-ID: <4e550a6a-2a04-bcec-f25f-2b9bd0347166@oracle.com>

On 6/23/20 7:07 PM, Daniel D. Daugherty wrote:
> Just one more comment on this part:
>
>> >    L220: System.out.println("CurrentFrameGuess: choosing
>> interpreter frame: sp = " +
>> >    L221:                     spFound + ", fpFound = "
>> + fp + ", pcFound = " + pc);
>> >        This debug output doesn't make sense to me:
>> >
>> >            "sp = " label and 'spFound' value
>> >            "fpFound = " label and 'fp' value
>> >            "pcFound = " label and 'pc' value
>> >
>> >        but I may not have enough context...
>> From the point of view of the person reading the output, they want to
>> know the values for sp, fp, and pc. But within the code these values
>> are stored in the "found" variables.
>
> In that case, the code is wrong for the 'fp' and 'pc' outputs
> since you changed the labels and not the variables.

Yes, you are correct. I'll fix the output for fp and pc.

thanks,

Chris

>
> Dan
>
>
>
> On 6/23/20 7:28 PM, Chris Plummer wrote:
>> On 6/23/20 2:44 PM, Daniel D. Daugherty wrote:
>>> On 6/18/20 8:54 PM, Chris Plummer wrote:
>>>> [I've added runtime-dev to this SA review since understanding
>>>> interpreter invokes (code generated by
>>>> TemplateInterpreterGenerator::generate_normal_entry()) and stack
>>>> walking is probably more important than understanding SA.]
>>>> >>>> Hello, >>>> >>>> Please help review the following: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8244383 >>>> http://cr.openjdk.java.net/~cjplummer/8244383/webrev.00/index.html >>> >> Thanks for helping! >>> Sorry for the delay in reviewing this one. I've come back to it a >>> couple >>> of times because code like this is very hard to review. >>> >>> >>> General comment: >>> ??? This fix reminds of the crazy things that AsyncGetCallTrace has to >>> ??? do in order to gather call trace data. I'm guessing that SA is >>> ??? attaching to the VM in an asynchronous manner and that's why it >>> ??? can observe things like partially constructed frames. If that's a >>> ??? correct guess, then how is SA stopping/suspending the threads? >>> ??? I'm just curious here. >> On linux SA uses ptrace. I'm not familiar with the details of how it >> works. I'm not sure where ptrace allows suspends to happen, but >> certainly it has no knowledge of JVM safepoints or other >> synchronization that the JVM does. So from the JVM and SA point of >> view the suspend can happen at any arbitrary JVM instruction. >> >> From what I can gather, PTRACE_ATTACH suspends the entire process, so >> that means all threads are suspended once you attach. However, >> PTRACE_GETREGS can be called on individual threads (LWPs), but I >> don't see any indication in the SA code that you need to attach to >> each LWP first. >>> >>> ??? Or this might be a case where SA is examining a core file in >>> ??? which case the various threads stacks are not necessarily at >>> ??? good/safepoint-safe pause points. >> For this bug and test it's a live process, but I think the bug being >> addressed here can happen just as well with a core file. >> Unfortunately we have very little core file testing support. I'm >> actually in the middle of addressing that right now. >>> >>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java >>> >>> ??? No comments. 
>>> >>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java >>> >>> ??? No comments. >>> >>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java >>> >>> ??? L104: ??? // two locations, then we canot determine the frame. >>> ??????? typo: s/canot/cannot/ >> ok >>> >>> ??? L127: ??? // it's validity will help us determine the state of >>> the new frame push. >>> ??????? typo: s/it's/its/ >> ok >>> >>> ??? L148: ??????? System.out.println("CurrentFrameGuess: frame >>> pushed but not initaliazed."); >>> ??????? typo: s/initaliazed/initialized/ >> ok >>> >>> ??? L220: ????????????? System.out.println("CurrentFrameGuess: >>> choosing interpreter frame: sp = " + >>> ??? L221: ???????????????????????????????? spFound + ", fpFound = " >>> + fp + ", pcFound = " + pc); >>> ??????? This debug output doesn't make sense to me: >>> >>> ??????????? "sp = " label and 'spFound' value >>> ??????????? "fpFound = " label and 'fp' value >>> ??????????? "pcFound = " label and 'pc' value >>> >>> ??????? but I may not have enough context... >> From the point of view of the person reading the output, they want to >> know the values for sp, fp, and pc. But within the code these values >> are stored in the "found" variables. >>> >>> With code like this, it's really hard to figure out if you've covered >>> all the cases unless you've been in the observer seat yourself and >>> even then your test runs may not hit all the possible cases. All you >>> can really do is start with a set of adaptive changes, run with those >>> for a while and tweak them as you gather more observations. >> Yes, and I know there is still a very tiny gap or two in coverage >> that are maybe one or two instructions long, but they aren't worth >> dealing with. This bug was already very rare, and with the fixes I've >> done I don't see any issues now. SA is a debugger, so perfection in >> this regard is not expected. 
>>> >>> Chris, nice job with this bit of insanity! >> Thanks! I mostly stuck with this one to help with my SA expertise. >> Otherwise it wouldn't have been worth the time. >> >> Chris >>> >>> Thumbs up! >>> >>> Dan >>> >>> >>> >>>> >>>> The crux of the bug is when doing stack walking the topmost frame >>>> is in an inconsistent state because we are in the middle of pushing >>>> a new interpreter frame. Basically we are executing code generated >>>> by TemplateInterpreterGenerator::generate_normal_entry(). Since the >>>> PC register is in this code, SA assumes the topmost frame is an >>>> interpreter frame. >>>> >>>> The first issue with this interpreter frame assumption is if we >>>> haven't actually pushed the frame yet, then the current frame is >>>> the caller's frame, and could be compiled. But since SA thinks it's >>>> interpreted, later on it tries to convert the frame->bcp to a BCI, >>>> but frame->bcp is only valid for interpreter frames. Thus the >>>> "illegal BCI" failures. If the previous frame happened to be >>>> interpreted, then the existing SA code works fine. >>>> >>>> The other state of frame pushing that was problematic was when the >>>> new frame had been pushed, but frame->method and frame->bcp were >>>> not setup yet. This also would lead to "illegal BCI" later on >>>> because garbage would be stored in these locations. >>>> >>>> Fixing the above problems requires trying to determine the state of >>>> the frame push through a series of checks, and then adapting what >>>> is considered to be the current frame based on the outcome of the >>>> checks. The first things checked is that frame->method is valid (we >>>> can successfully instantiate a wrapper for the Method* without >>>> failure) and that frame->bcp is within the method. If both these >>>> pass then we can use the frame as-is. 
>>>> >>>> If the above checks fail, then we try to determine whether the >>>> issue is that the frame is not yet pushed and the current frame is >>>> actually compiled, or the frame has been pushed but not yet >>>> initialized. This is done by first getting the return address from >>>> the stack or RAX (it's location depends on how far along we are in >>>> the entry code) and comparing this to what is stored in >>>> frame->return_addr. If they are the same, then we have pushed the >>>> frame but not yet initialized it. In this case we use the previous >>>> frame (senderSP() and senderFP()) as the current frame since the >>>> current frame is not yet initialized. If the return address check >>>> fails, then we assume the new frame is not yet pushed, and and >>>> treat the current frame as compiled, even though PC points into the >>>> interpreter (we replace PC with RAX in this case). >>>> >>>> Comments in the code pretty well explain all the above, so it is >>>> probably easier to follow the logic in the code along with the >>>> comments rather than apply my above description to the code. >>>> >>>> I should add that it's very rare that we ever get into this special >>>> error handling code. This bug was very hard to reproduce initially. >>>> I was only able to make progress with reproducing and debugging by >>>> inserting delay loops in various spots in the code generated by >>>> TemplateInterpreterGenerator::generate_normal_entry(). By doing >>>> this I was able to reproduce the issue quite easily and hit all the >>>> logic in the new code I've added. >>>> >>>> The fix is basically entirely contained within >>>> AMD64CurrentFrameGuess.java. 
The rest of the changes are minor: >>>> >>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java >>>> >>>> -Main fix for CR >>>> >>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java >>>> >>>> -Added getInterpreterFrameBCP(), which is now needed by >>>> AMD64CurrentFrameGuess.java >>>> -I also simplified some code by using the existing >>>> getInterpreterFrameMethod() >>>> ?rather than replicating inline what it does. >>>> >>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java >>>> >>>> -I noticed the windows version of this code had some extra checks >>>> that were missing >>>> ?from the bsd version. I then looked at the linux version, but it >>>> had been heavily modified >>>> ?a short while back to leverage DWARF info to determine frames. So >>>> I looked at the previous >>>> ?rev and it too had these extra checks. I decided to add them to >>>> the BSD port. I'm not sure >>>> ?if it helps at all, but it certainly doesn't seem to do any harm. 
>>>>
>>>> thanks,
>>>>
>>>> Chris
>>>>
>>>
>>
>>
>

From chris.plummer at oracle.com Wed Jun 24 03:09:47 2020
From: chris.plummer at oracle.com (Chris Plummer)
Date: Tue, 23 Jun 2020 20:09:47 -0700
Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
In-Reply-To: <6a0152a9-a151-b49b-5003-f6cf73422973@gmail.com>
References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com> <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com> <17fbbb06-3cec-503b-73ea-c94156864ed7@oracle.com> <6a0152a9-a151-b49b-5003-f6cf73422973@gmail.com>
Message-ID:

On 6/23/20 6:05 PM, Yasumasa Suenaga wrote:
> Hi Chris,
>
> Skillful troubleshooters who use jhsdb will be aware of these warnings,
> and they will take other appropriate methods.
>
> However, I'm not sure it is worthwhile to continue even if SA
> cannot get register values.
>
> For example, Linux AMD64 depends on RIP and RSP values to find top frame.
> According to your change, the caller of getThreadIntegerRegisterSet()
> is responsible for dealing with the set of null registers. However
> X86ThreadContext::data (it includes raw register values) would still
> be zero when it happens.

This is what I intended to have happen. Just end up with a register set
of all nulls. Then when the stack walking code gets a null, it will
revert to "last java frame" if available; otherwise no stack dump is done.

>
> So I think the register holder (e.g. X86ThreadContext) should be
> tri-state (has registers, failed to get registers, has not yet
> attempted to get registers).
> OTOH it might be over-engineering. What do you think?
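
For reference, the tri-state register holder suggested in the quoted text could look roughly like the sketch below. It is a minimal illustration only: RegisterSnapshot, its State enum, and its methods are hypothetical names, not part of X86ThreadContext or any existing SA class.

```java
/** Hypothetical tri-state register holder; not an existing SA class. */
public class RegisterSnapshot {

    /** The three states named in the suggestion above. */
    enum State { NOT_ATTEMPTED, AVAILABLE, FAILED }

    private State state = State.NOT_ATTEMPTED;
    private long[] values;

    /** Records the outcome of a register fetch; null stands for a failed fetch. */
    void update(long[] regs) {
        if (regs == null) {
            state = State.FAILED;
            values = null;
        } else {
            state = State.AVAILABLE;
            values = regs.clone();
        }
    }

    State state() {
        return state;
    }

    /** Returns a register value, refusing to hand out zeros for a failed fetch. */
    long get(int index) {
        if (state != State.AVAILABLE) {
            throw new IllegalStateException("registers not available: " + state);
        }
        return values[index];
    }

    public static void main(String[] args) {
        RegisterSnapshot snap = new RegisterSnapshot();
        System.out.println(snap.state());
        snap.update(null);
        System.out.println(snap.state());
        snap.update(new long[] {0x7f125f0f4770L, 0x7f125f0f47c0L});
        System.out.println(snap.state() + " " + Long.toHexString(snap.get(0)));
    }
}
```

With a holder like this, a caller could distinguish "fetch failed" from "registers are genuinely zero", at the cost of touching every consumer of the register data.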
Before implementing this I looked at the what would be the easier approach to get the desired effect of stack walking code simply failing over to using "last java frame", and decided the null set of registers was easiest. Other approaches involved more changes and impacted more files. thanks, Chris > > > Thanks, > > Yasumasa > > > On 2020/06/24 3:16, Chris Plummer wrote: >> On 6/20/20 12:53 AM, Yasumasa Suenaga wrote: >>> Hi Chris, >>> >>> On 2020/06/20 15:20, Chris Plummer wrote: >>>> Hi Yasumasa, >>>> >>>> ptrace is not used for core files, so the EFAULT for a bad core >>>> file is not a possibility. However, get_lwp_regs() does redirect to >>>> core_get_lwp_regs() for core files. It can fail, but the only >>>> reason it ever does is if the LWP can't be found in the core (which >>>> is never suppose to happen). I would think if this happened due to >>>> the core being truncated, SA would be blowing up all over the place >>>> with exceptions, probably before we ever get to this code, but in >>>> any cast what we do here wouldn't really make a difference. >>> >>> You are right, sorry. >>> >>> >>>> I'm not sure why you prefer an exception for errors other than >>>> ESRCH. Why should they be treated differently? >>>> getThreadIntegerRegisterSet0() is used for finding the current >>>> frame for stack tracing. With my changes any failure will result in >>>> deferring to "last java frame" if set, and otherwise just not >>>> produce a stack trace (and the WARNING will be present in the >>>> output). This seems preferable to completely abandoning any further >>>> thread stack tracking. >>> >>> I'm not sure we can trust call stack when ptrace() returns any >>> errors other than ESRCH even if "last java frame" is available. For >>> example, don't ptrace() return EFAULT or EIO when something wrong? >>> (e.g. stack corruption) If so, it may lead to a wrong analysis for >>> troubleshooter. >>> I think it should be abort dumping call stack for its thread at least. 
>> Hi Yasumasa, >> >> In general stack walking makes a best effort and can be wrong, even >> when not getting errors like this. For any actively executing thread >> SA needs to determine where the stack starts, with register contents >> being the starting point (SP, FP, and PC). These registers could >> contain anything, and SA makes a best effort to determine a current >> frame from them. However, the verification steps it takes are not >> 100% guaranteed, and can lead to an incorrect assumption of the >> current frame, which in turn can result in an exception later on when >> walking the stack. See JDK-8247641. >> >> Keep in mind that the WARNING message will always be there. This >> should be enough to put the troubleshooter on alert that the stack >> trace may not be accurate. I think it's better to make an attempt at >> a stack trace then to just abandon it and not attempt to do something >> that may be useful. >> >> thanks, >> >> Chris >>> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>>> thanks, >>>> >>>> Chris >>>> >>>> On 6/19/20 6:33 PM, Yasumasa Suenaga wrote: >>>>> Hi Chris, >>>>> >>>>> I checked Linux kernel code at a glance, ESRCH seems to be set to >>>>> errno by default. >>>>> So I guess it is similar to "generic" error code. >>>>> >>>>> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c >>>>> >>>>> According to manpage of ptrace(2), it might return errno other >>>>> than ESRCH. >>>>> For example, if we analyze broken core (e.g. the core was dumped >>>>> with disk full), we might get EFAULT. >>>>> Thus I prefer to handle ESRCH only in your patch, and also I think >>>>> SA should throw DebuggerException if other error is occurred. >>>>> >>>>> https://www.man7.org/linux/man-pages/man2/ptrace.2.html >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>> On 2020/06/20 5:51, Chris Plummer wrote: >>>>>> Hello, >>>>>> >>>>>> I've? 
updated with webrev based on the new finding that a >>>>>> JavaThread cannot be on the ThreadList after its OS thread has >>>>>> been destroyed since the JavaThread removes itself from the >>>>>> ThreadList, and therefore must be running on its OS thread. The >>>>>> logic of the fix is unchanged from the first webrev, but I >>>>>> updated the comments to better reflect what is going on. I also >>>>>> updated the CR: >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html >>>>>> >>>>>> thanks, >>>>>> >>>>>> Chris >>>>>> >>>>>> On 6/19/20 12:24 AM, David Holmes wrote: >>>>>>> Hi Chris, >>>>>>> >>>>>>> On 19/06/2020 8:55 am, Chris Plummer wrote: >>>>>>>> On 6/18/20 1:43 AM, David Holmes wrote: >>>>>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote: >>>>>>>>>> On 6/17/20 10:29 PM, David Holmes wrote: >>>>>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>>>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Please help review the following: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The CR contains all the needed details. Here's a >>>>>>>>>>>>>>>> summary of changes in each file: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The problem sounds to me like a variation of the more >>>>>>>>>>>>>>> general problem of not ensuring a thread is kept alive >>>>>>>>>>>>>>> whilst acting upon it. 
I don't know how the SA finds >>>>>>>>>>>>>>> these references to the threads it is going to >>>>>>>>>>>>>>> stackwalk, but is it possible to fix this via >>>>>>>>>>>>>>> appropriate uses of ThreadsListHandle/Iterator? >>>>>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Keep in mind that once SA attaches, nothing in the VM >>>>>>>>>>>>>> changes. For example, SA can't create a wrapper to a >>>>>>>>>>>>>> JavaThread, only to have the JavaThread be freed later >>>>>>>>>>>>>> on. It's just not possible. >>>>>>>>>>>>> >>>>>>>>>>>>> Then how does it obtain a reference to a JavaThread for >>>>>>>>>>>>> which the native OS thread id is invalid? Any thread found >>>>>>>>>>>>> in _java_thread_list is either live or still to be >>>>>>>>>>>>> started. In the latter case the JavaThread->osThread does >>>>>>>>>>>>> not have its thread_id set yet. >>>>>>>>>>>>> >>>>>>>>>>>> My assumption was that the JavaThread is in the process of >>>>>>>>>>>> being destroyed, and it has freed its OS thread but is >>>>>>>>>>>> itself still in the thread list. I did notice that the OS >>>>>>>>>>>> thread id being used looked to be in the range of thread id >>>>>>>>>>>> #'s you would expect for the running app, so that to me >>>>>>>>>>>> indicated it was once valid, but is no more. >>>>>>>>>>>> >>>>>>>>>>>> Keep in mind that although hotspot may have synchronization >>>>>>>>>>>> code that prevents you from pulling a JavaThread off the >>>>>>>>>>>> thread list when it is in the process of being destroyed >>>>>>>>>>>> (I'm guessing it does), SA has no such protections. >>>>>>>>>>> >>>>>>>>>>> But you stated that once the SA has attached, the target VM >>>>>>>>>>> can't change. If the SA gets its set of thread from one >>>>>>>>>>> attach then tries to make queries about those threads in a >>>>>>>>>>> separate attach, then obviously it could be providing >>>>>>>>>>> garbage thread information. 
So you would need to re-validate >>>>>>>>>>> the JavaThread in the target VM before trying to do anything >>>>>>>>>>> with it. >>>>>>>>>> That's not what is going on here. It's attaching and doing a >>>>>>>>>> stack trace, which involves getting the thread list and >>>>>>>>>> iterating through all threads without detaching. >>>>>>>>> >>>>>>>>> Okay so I restate my original comment - all the JavaThreads >>>>>>>>> must be alive or not yet started, so how are you encountering >>>>>>>>> an invalid thread id? Any thread you find via the ThreadsList >>>>>>>>> can't have destroyed its osThread. In any case the logic >>>>>>>>> should be checking thread->osThread() for NULL, and then >>>>>>>>> osThread()->get_state() to ensure it is >= INITIALIZED before >>>>>>>>> using the thread_id(). >>>>>>>> Hi David, >>>>>>>> >>>>>>>> I chatted with Dan about this, and he said since the JavaThread >>>>>>>> is responsible for removing itself from the ThreadList, it is >>>>>>>> impossible to have a JavaThread still on the ThreadList, but >>>>>>>> without and underlying OS Thread. So I'm a bit perplexed as to >>>>>>>> how I can find a JavaThread on the ThreadList, but that results >>>>>>>> in ESRCH when trying to access the thread with ptrace. My only >>>>>>>> conclusion is that this failure is somehow spurious, and maybe >>>>>>>> the issue it just that the thread is in some temporary state >>>>>>>> that prevents its access. If so, I still think the approach I'm >>>>>>>> taking is the correct one, but the comments should be updated. >>>>>>> >>>>>>> ESRCH can have other meanings but I don't know enough about the >>>>>>> broader context to know whether they are applicable in this case. >>>>>>> >>>>>>> ??? ESRCH? The? specified? process? does not exist, or is not >>>>>>> currently being traced by the caller, or is not stopped >>>>>>> ????????????? (for requests that require a stopped tracee). >>>>>>> >>>>>>> I won't comment further on the fix/workaround as I don't know >>>>>>> the code. 
I'll leave that to other folk. >>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> I had one other finding. When this issue first turned up, it >>>>>>>> prevented the thread from getting a stack trace due to the >>>>>>>> exception being thrown. What I hadn't realize is that after >>>>>>>> fixing it to not throw an exception, which resulted in the >>>>>>>> stack walking code getting all nulls for register values, I >>>>>>>> actually started to see a stack trace printed: >>>>>>>> >>>>>>>> "JLine terminal non blocking reader thread" #26 daemon prio=5 >>>>>>>> tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000] >>>>>>>> ??? java.lang.Thread.State: RUNNABLE >>>>>>>> ??? JavaThread state: _thread_in_native >>>>>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for >>>>>>>> lwp (8089) >>>>>>>> CurrentFrameGuess: choosing last Java frame: sp = >>>>>>>> 0x00007f125f0f4770, fp = 0x00007f125f0f47c0 >>>>>>>> ??- java.io.FileInputStream.read0() @bci=0 (Interpreted frame) >>>>>>>> ??- java.io.FileInputStream.read() @bci=1, line=223 >>>>>>>> (Interpreted frame) >>>>>>>> ??- >>>>>>>> jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() >>>>>>>> @bci=108, line=216 (Interpreted frame) >>>>>>>> ??- >>>>>>>> jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() >>>>>>>> @bci=4 (Interpreted frame) >>>>>>>> ??- java.lang.Thread.run() @bci=11, line=832 (Interpreted frame) >>>>>>>> >>>>>>>> The "CurrentFrameGuess" output is some debug tracing I had >>>>>>>> enabled, and it indicates that the stack walking code is using >>>>>>>> the "last java frame" setting, which it will do if current >>>>>>>> registers values don't indicate a valid frame (as would be the >>>>>>>> case if sp was null). I had previously assumed that without an >>>>>>>> underling valid LWP, there would be no stack trace. Given that >>>>>>>> there is one, there must be a valid LWP. 
Otherwise I don't see >>>>>>>> how the stack could have been walked. That's another indication >>>>>>>> that the ptrace failure is spurious in nature. >>>>>>>> >>>>>>>> thanks, >>>>>>>> >>>>>>>> Chris >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>>> Also, even if you are using something like clhsdb to issue >>>>>>>>>> commands on addresses, if the address is no longer valid for >>>>>>>>>> the command you are executing, then you would get the >>>>>>>>>> appropriate error when there is an attempt to create a >>>>>>>>>> wrapper for it. I don't know of any command that operates >>>>>>>>>> directly on a JavaThread, but I think there are for >>>>>>>>>> InstanceKlass. So if you remembered the address of an >>>>>>>>>> InstanceKlass, and then reattached and tried a command that >>>>>>>>>> takes an InstanceKlass address, you would get an exception >>>>>>>>>> when SA tries to create the wrapper for the InsanceKlass if >>>>>>>>>> it were no longer a valid address for one. >>>>>>>>>> >>>>>>>>>> Chris >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> Chris >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Instead of throwing an exception when the OS ThreadID >>>>>>>>>>>>>>>> is invalid, print a warning. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>>>>>>>>> -Improve a print_debug message >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Deal with the array of registers read in being null >>>>>>>>>>>>>>>> due to the OS ThreadID not being valid. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Fix issue with >>>>>>>>>>>>>>>> "sun.jvm.hotspot.debugger.DebuggerException" appearing >>>>>>>>>>>>>>>> twice when printing the exception. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>> >>>> >> >> From mandy.chung at oracle.com Wed Jun 24 03:21:32 2020 From: mandy.chung at oracle.com (Mandy Chung) Date: Tue, 23 Jun 2020 20:21:32 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> Message-ID: <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> Hi Serguei, I'm glad that you have a patch for this. On 6/23/20 7:05 PM, serguei.spitsyn at oracle.com wrote: > Please, review a fix for: > ? https://bugs.openjdk.java.net/browse/JDK-8165276 > > > CSR draft (one CSR reviewer is needed before finalizing it): > ? 
https://bugs.openjdk.java.net/browse/JDK-8248189 > The compatibility risk should be low (rather than minimal). It says "All known Java agents define the premain method as public". It'd be useful to add a comment in the JBS issue to list the Java agents you have checked. > > Webrev: > http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.1/ > > Looks okay. Can you add a test to verify this fix? Mandy From serguei.spitsyn at oracle.com Wed Jun 24 03:42:21 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 23 Jun 2020 20:42:21 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> Message-ID: <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> Hi Mandy, Thank you for looking at this! On 6/23/20 20:21, Mandy Chung wrote: > Hi Serguei, > > I'm glad that you have a patch for this. > > On 6/23/20 7:05 PM, serguei.spitsyn at oracle.com wrote: >> Please, review a fix for: >> https://bugs.openjdk.java.net/browse/JDK-8165276 >> >> >> CSR draft (one CSR reviewer is needed before finalizing it): >> https://bugs.openjdk.java.net/browse/JDK-8248189 >> > > The compatibility risk should be low (rather than minimal). I was not sure whether it should be minimal or low. I have made it low now. > It says "All known Java agents define the premain method as public". > It'd be useful to add a comment in the JBS issue to list the Java > agents you have checked. I'm relying on Alan's comments posted in the bug report: "I checked a number of popular java agents and their premain methods are public, I haven't found any where the premain was not public."
"I think we should just bite the bullet on this so that the premain must be public as originally intended." Probably, my statement in the CSR is too strong. I've changed it to: "No popular Java agent that defines the premain method as a non-public was found." Does it look better, or do you think we have to investigate existing popular Java agents? >> Webrev: >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.1/ >> >> > > Looks okay. Can you add a test to verify this fix? Yes, I can add a test but it will be trivial. Thanks, Serguei > > Mandy From serguei.spitsyn at oracle.com Wed Jun 24 04:31:35 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 23 Jun 2020 21:31:35 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <35b4039e-4252-60c5-689b-d5d99ce43f19@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <35b4039e-4252-60c5-689b-d5d99ce43f19@oracle.com> Message-ID: Thank you a lot, Sundar! Serguei On 6/23/20 19:53, sundararajan.athijegannathan at oracle.com wrote: > Looks good > > -Sundar > > On 24/06/20 7:35 am, serguei.spitsyn at oracle.com wrote: >> Please, review a fix for: >> https://bugs.openjdk.java.net/browse/JDK-8165276 >> >> >> CSR draft (one CSR reviewer is needed before finalizing it): >> https://bugs.openjdk.java.net/browse/JDK-8248189 >> >> >> Webrev: >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.1/ >> >> >> The java.lang.instrument spec: >> https://docs.oracle.com/en/java/javase/14/docs/api/java.instrument/java/lang/instrument/package-summary.html >> >> >> >> Summary: >> The java.lang.instrument spec clearly states: >> "The agent class must implement a public static premain method >> similar in principle to the main application entry point." >> Current implementation of sun/instrument/InstrumentationImpl.java >>
allows the premain method to be non-public, which violates the spec. >> This fix aligns the implementation with the spec. >> >> >> Testing: >> A mach5 run of jdk_instrument tests is in progress. >> >> >> Thanks, >> Serguei From larry.cable at oracle.com Wed Jun 24 04:32:05 2020 From: larry.cable at oracle.com (Laurence Cable) Date: Tue, 23 Jun 2020 21:32:05 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> Message-ID: <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> Should we not consider some form of deprecation here, and continue to support non-public premain invocation for some time while issuing a warning? While we have a sample of agents that will not be affected, there may be some agent that will fail terminally with this change. Just a thought. - Larry On 6/23/20 8:42 PM, serguei.spitsyn at oracle.com wrote: > Hi Mandy, > > Thank you for looking at this! > > > On 6/23/20 20:21, Mandy Chung wrote: >> Hi Serguei, >> >> I'm glad that you have a patch for this. >> >> On 6/23/20 7:05 PM, serguei.spitsyn at oracle.com wrote: >>> Please, review a fix for: >>> https://bugs.openjdk.java.net/browse/JDK-8165276 >>> >>> >>> CSR draft (one CSR reviewer is needed before finalizing it): >>> https://bugs.openjdk.java.net/browse/JDK-8248189 >>> >> >> The compatibility risk should be low (rather than minimal). > > I was not sure if it has to be minimal or low. > Made it low now. > > >> It says "All known Java agents define the premain method as public". >> It'd be useful to add a comment in the JBS issue to list the Java >> agents you have checked.
> > I'm relying on the Alan's comments posted in the bug report: > ?"I checked a number of popular java agents and their premain methods > are public, I haven't found any where the premain was not public." > ?"I think we should just bite the bullet on this so that the premain > must be public as originally intended." > > Probably, my statement in the CSR is too strong. > I've changed it to: > ?"No popular Java agent that defines the premain method as a > non-public was found." > > Does it looks better or you think we have to investigate existing > popular Java agents? > > >>> Webrev: >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.1/ >>> >>> >>> >> >> Looks okay.? Can you add a test to verify this fix? > > Yes, I can add a test but it will be trivial. > > Thanks, > Serguei > > >> >> Mandy > From serguei.spitsyn at oracle.com Wed Jun 24 05:20:33 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 23 Jun 2020 22:20:33 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> Message-ID: Please, hold on. The fix does not work for a number of the jdk_instrument tests. Thanks, Serguei On 6/23/20 19:05, serguei.spitsyn at oracle.com wrote: > Please, review a fix for: > ? https://bugs.openjdk.java.net/browse/JDK-8165276 > > > CSR draft (one CSR reviewer is needed before finalizing it): > ? https://bugs.openjdk.java.net/browse/JDK-8248189 > > > Webrev: > http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.1/ > > > The java.lang.instrument spec: > https://docs.oracle.com/en/java/javase/14/docs/api/java.instrument/java/lang/instrument/package-summary.html > > > > Summary: > ? The java.lang.instrument spec clearly states: > ??? "The agent class must implement a public static premain method > ?? ? 
similar in principle to the main application entry point." > Current implementation of sun/instrument/InstrumentationImpl.java > allows the premain method to be non-public, which violates the spec. > This fix aligns the implementation with the spec. > > > Testing: > A mach5 run of jdk_instrument tests is in progress. > > > Thanks, > Serguei From serguei.spitsyn at oracle.com Wed Jun 24 05:37:17 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 23 Jun 2020 22:37:17 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> Message-ID: <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> Hi Larry, Thank you for looking at this! On 6/23/20 21:32, Laurence Cable wrote: > should we not consider some form of deprecation here, and continue to > support non-public premain invocation for some time while issuing a > warning? I'm not sure what form of deprecation we can use as it has to be a deprecation of a spec non-compliant implementation. :) > > while we have a sample of agents that will not be affected there may > be some agent that will fail terminally with this change There is a more important problem now. A large number of the j.l.instrument tests started to fail with my fix, with messages like this: Exception in thread "main" java.lang.IllegalAccessException: class sun.instrument.InstrumentationImpl (in module java.instrument) cannot access a member of class SimpleAgent with modifiers "public static" I'm not sure if there can be a version of the Method.setAccessible(boolean flag) API that works for public methods only.
One alternate approach is to relax the current spec to allow premain methods to be non-public. Thanks, Serguei > > just a thought > > - Larry > > On 6/23/20 8:42 PM, serguei.spitsyn at oracle.com wrote: >> Hi Mandy, >> >> Thank you for looking at this! >> >> >> On 6/23/20 20:21, Mandy Chung wrote: >>> Hi Serguei, >>> >>> I'm glad that you have a patch for this. >>> >>> On 6/23/20 7:05 PM, serguei.spitsyn at oracle.com wrote: >>>> Please, review a fix for: >>>> https://bugs.openjdk.java.net/browse/JDK-8165276 >>>> >>>> >>>> CSR draft (one CSR reviewer is needed before finalizing it): >>>> https://bugs.openjdk.java.net/browse/JDK-8248189 >>>> >>> >>> The compatibility risk should be low (rather than minimal). >> >> I was not sure if it has to be minimal or low. >> Made it low now. >> >> >>> It says "All known Java agents define the premain method as public". >>> It'd be useful to add a comment in the JBS issue to list the Java >>> agents you have checked. >> >> I'm relying on the Alan's comments posted in the bug report: >> ?"I checked a number of popular java agents and their premain methods >> are public, I haven't found any where the premain was not public." >> ?"I think we should just bite the bullet on this so that the premain >> must be public as originally intended." >> >> Probably, my statement in the CSR is too strong. >> I've changed it to: >> ?"No popular Java agent that defines the premain method as a >> non-public was found." >> >> Does it looks better or you think we have to investigate existing >> popular Java agents? >> >> >>>> Webrev: >>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.1/ >>>> >>>> >>>> >>> >>> Looks okay.? Can you add a test to verify this fix? >> >> Yes, I can add a test but it will be trivial. 
>> >> Thanks, >> Serguei >> >> >>> >>> Mandy >> > From david.holmes at oracle.com Wed Jun 24 05:50:17 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 24 Jun 2020 15:50:17 +1000 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> Message-ID: Hi Serguei, On 24/06/2020 3:37 pm, serguei.spitsyn at oracle.com wrote: > Hi Larry, > > Thank you for looking at this! > > > On 6/23/20 21:32, Laurence Cable wrote: >> should we not consider some form of depreciation here, and continue to >> support non-public pre-main invocation for some time while issuing a >> warning??? > > I'm not sure what form of deprecation we can use as it has to be a > deprecation of a spec non-compliant implementation. :) There's obviously no TCK test for this. :) You could just issue a warning if the premain is not public and say this will be disallowed in a future release; then disallow it in 17. But I'm not sure it's worth it. >> >> while we have a sample of agents that will not be affected there may >> be some agent that will fail terminally with this change > > There is more important problem now. > A big number or j.l.instrument started to fail with my fix with messages > like this: > ?Exception in thread "main" java.lang.IllegalAccessException: class > sun.instrument.InstrumentationImpl > ?(in module java.instrument) cannot access a member of class > SimpleAgent with modifiers "public static" It sounds like the use of setAccessible was hiding the need to disable some module related access checks. 
This will have a much bigger compatibility problem if agents with a public premain suddenly stop working. David ----- > I'm not sure if there can be a version of the > Method.setAccessible(boolean flag) api that works for public methods only. > One alternate approach is to relax the current spec to allow premain > methods to be non-public. > > Thanks, > Serguei > > >> >> just a thought >> >> - Larry >> >> On 6/23/20 8:42 PM, serguei.spitsyn at oracle.com wrote: >>> Hi Mandy, >>> >>> Thank you for looking at this! >>> >>> >>> On 6/23/20 20:21, Mandy Chung wrote: >>>> Hi Serguei, >>>> >>>> I'm glad that you have a patch for this. >>>> >>>> On 6/23/20 7:05 PM, serguei.spitsyn at oracle.com wrote: >>>>> Please, review a fix for: >>>>> https://bugs.openjdk.java.net/browse/JDK-8165276 >>>>> >>>>> >>>>> CSR draft (one CSR reviewer is needed before finalizing it): >>>>> https://bugs.openjdk.java.net/browse/JDK-8248189 >>>>> >>>> >>>> The compatibility risk should be low (rather than minimal). >>> >>> I was not sure if it has to be minimal or low. >>> Made it low now. >>> >>> >>>> It says "All known Java agents define the premain method as public". >>>> It'd be useful to add a comment in the JBS issue to list the Java >>>> agents you have checked. >>> >>> I'm relying on the Alan's comments posted in the bug report: >>> ?"I checked a number of popular java agents and their premain methods >>> are public, I haven't found any where the premain was not public." >>> ?"I think we should just bite the bullet on this so that the premain >>> must be public as originally intended." >>> >>> Probably, my statement in the CSR is too strong. >>> I've changed it to: >>> ?"No popular Java agent that defines the premain method as a >>> non-public was found." >>> >>> Does it looks better or you think we have to investigate existing >>> popular Java agents? 
>>> >>> >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.1/ >>>>> >>>>> >>>>> >>>> >>>> Looks okay. Can you add a test to verify this fix? >>> >>> Yes, I can add a test but it will be trivial. >>> >>> Thanks, >>> Serguei >>> >>> >>>> >>>> Mandy >>> >> > From yasuenag at gmail.com Wed Jun 24 06:04:38 2020 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Wed, 24 Jun 2020 15:04:38 +0900 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com> <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com> <17fbbb06-3cec-503b-73ea-c94156864ed7@oracle.com> <6a0152a9-a151-b49b-5003-f6cf73422973@gmail.com> Message-ID: <22872d51-e0c7-8a82-01d5-fed11748047c@gmail.com> Hi Chris, Thank you for the explanation. Your change looks good (but "last java frame" would not be found on Linux AMD64 because RSP is NULL - cf. LinuxAMD64CFrame.java) Thanks, Yasumasa On 2020/06/24 12:09, Chris Plummer wrote: > On 6/23/20 6:05 PM, Yasumasa Suenaga wrote: >> Hi Chris, >> >> Skillful troubleshooters who use jhsdb will be aware of this warning, and they will try other appropriate methods. >> >> However, I'm not sure it is worth continuing even if SA cannot get register values. >> >> For example, Linux AMD64 depends on RIP and RSP values to find the top frame. >> According to your change, the caller of getThreadIntegerRegisterSet() is responsible for dealing with the set of null registers. However X86ThreadContext::data (it includes raw register values) would still be zero when it happens. > This is
what I intended to have happen. Just end up with a register set of all nulls. Then when stack walking code gets a null, it will revert to "last java frame" if available, otherwise no stack dump is done. >> >> So I think register holder (e.g. X86ThreadContext) should have tri-state (have registers, fail to get registers, not yet attempt to get registers). >> OTOH it might be over-engineering. What do you think? > Before implementing this I looked at the what would be the easier approach to get the desired effect of stack walking code simply failing over to using "last java frame", and decided the null set of registers was easiest. Other approaches involved more changes and impacted more files. > > thanks, > > Chris >> >> >> Thanks, >> >> Yasumasa >> >> >> On 2020/06/24 3:16, Chris Plummer wrote: >>> On 6/20/20 12:53 AM, Yasumasa Suenaga wrote: >>>> Hi Chris, >>>> >>>> On 2020/06/20 15:20, Chris Plummer wrote: >>>>> Hi Yasumasa, >>>>> >>>>> ptrace is not used for core files, so the EFAULT for a bad core file is not a possibility. However, get_lwp_regs() does redirect to core_get_lwp_regs() for core files. It can fail, but the only reason it ever does is if the LWP can't be found in the core (which is never suppose to happen). I would think if this happened due to the core being truncated, SA would be blowing up all over the place with exceptions, probably before we ever get to this code, but in any cast what we do here wouldn't really make a difference. >>>> >>>> You are right, sorry. >>>> >>>> >>>>> I'm not sure why you prefer an exception for errors other than ESRCH. Why should they be treated differently? getThreadIntegerRegisterSet0() is used for finding the current frame for stack tracing. With my changes any failure will result in deferring to "last java frame" if set, and otherwise just not produce a stack trace (and the WARNING will be present in the output). This seems preferable to completely abandoning any further thread stack tracking. 
>>>> >>>> I'm not sure we can trust call stack when ptrace() returns any errors other than ESRCH even if "last java frame" is available. For example, don't ptrace() return EFAULT or EIO when something wrong? (e.g. stack corruption) If so, it may lead to a wrong analysis for troubleshooter. >>>> I think it should be abort dumping call stack for its thread at least. >>> Hi Yasumasa, >>> >>> In general stack walking makes a best effort and can be wrong, even when not getting errors like this. For any actively executing thread SA needs to determine where the stack starts, with register contents being the starting point (SP, FP, and PC). These registers could contain anything, and SA makes a best effort to determine a current frame from them. However, the verification steps it takes are not 100% guaranteed, and can lead to an incorrect assumption of the current frame, which in turn can result in an exception later on when walking the stack. See JDK-8247641. >>> >>> Keep in mind that the WARNING message will always be there. This should be enough to put the troubleshooter on alert that the stack trace may not be accurate. I think it's better to make an attempt at a stack trace then to just abandon it and not attempt to do something that may be useful. >>> >>> thanks, >>> >>> Chris >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>>> thanks, >>>>> >>>>> Chris >>>>> >>>>> On 6/19/20 6:33 PM, Yasumasa Suenaga wrote: >>>>>> Hi Chris, >>>>>> >>>>>> I checked Linux kernel code at a glance, ESRCH seems to be set to errno by default. >>>>>> So I guess it is similar to "generic" error code. >>>>>> >>>>>> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c >>>>>> >>>>>> According to manpage of ptrace(2), it might return errno other than ESRCH. >>>>>> For example, if we analyze broken core (e.g. the core was dumped with disk full), we might get EFAULT. 
>>>>>> Thus I prefer to handle ESRCH only in your patch, and also I think SA should throw DebuggerException if other error is occurred. >>>>>> >>>>>> https://www.man7.org/linux/man-pages/man2/ptrace.2.html >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Yasumasa >>>>>> >>>>>> >>>>>> On 2020/06/20 5:51, Chris Plummer wrote: >>>>>>> Hello, >>>>>>> >>>>>>> I've? updated with webrev based on the new finding that a JavaThread cannot be on the ThreadList after its OS thread has been destroyed since the JavaThread removes itself from the ThreadList, and therefore must be running on its OS thread. The logic of the fix is unchanged from the first webrev, but I updated the comments to better reflect what is going on. I also updated the CR: >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html >>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> Chris >>>>>>> >>>>>>> On 6/19/20 12:24 AM, David Holmes wrote: >>>>>>>> Hi Chris, >>>>>>>> >>>>>>>> On 19/06/2020 8:55 am, Chris Plummer wrote: >>>>>>>>> On 6/18/20 1:43 AM, David Holmes wrote: >>>>>>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote: >>>>>>>>>>> On 6/17/20 10:29 PM, David Holmes wrote: >>>>>>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>>>>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>>>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Please help review the following: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The CR contains all the needed details. 
Here's a summary of changes in each file: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The problem sounds to me like a variation of the more general problem of not ensuring a thread is kept alive whilst acting upon it. I don't know how the SA finds these references to the threads it is going to stackwalk, but is it possible to fix this via appropriate uses of ThreadsListHandle/Iterator? >>>>>>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Keep in mind that once SA attaches, nothing in the VM changes. For example, SA can't create a wrapper to a JavaThread, only to have the JavaThread be freed later on. It's just not possible. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Then how does it obtain a reference to a JavaThread for which the native OS thread id is invalid? Any thread found in _java_thread_list is either live or still to be started. In the latter case the JavaThread->osThread does not have its thread_id set yet. >>>>>>>>>>>>>> >>>>>>>>>>>>> My assumption was that the JavaThread is in the process of being destroyed, and it has freed its OS thread but is itself still in the thread list. I did notice that the OS thread id being used looked to be in the range of thread id #'s you would expect for the running app, so that to me indicated it was once valid, but is no more. >>>>>>>>>>>>> >>>>>>>>>>>>> Keep in mind that although hotspot may have synchronization code that prevents you from pulling a JavaThread off the thread list when it is in the process of being destroyed (I'm guessing it does), SA has no such protections. >>>>>>>>>>>> >>>>>>>>>>>> But you stated that once the SA has attached, the target VM can't change. If the SA gets its set of thread from one attach then tries to make queries about those threads in a separate attach, then obviously it could be providing garbage thread information. So you would need to re-validate the JavaThread in the target VM before trying to do anything with it. >>>>>>>>>>> That's not what is going on here. 
It's attaching and doing a stack trace, which involves getting the thread list and iterating through all threads without detaching. >>>>>>>>>> >>>>>>>>>> Okay so I restate my original comment - all the JavaThreads must be alive or not yet started, so how are you encountering an invalid thread id? Any thread you find via the ThreadsList can't have destroyed its osThread. In any case the logic should be checking thread->osThread() for NULL, and then osThread()->get_state() to ensure it is >= INITIALIZED before using the thread_id(). >>>>>>>>> Hi David, >>>>>>>>> >>>>>>>>> I chatted with Dan about this, and he said since the JavaThread is responsible for removing itself from the ThreadList, it is impossible to have a JavaThread still on the ThreadList, but without and underlying OS Thread. So I'm a bit perplexed as to how I can find a JavaThread on the ThreadList, but that results in ESRCH when trying to access the thread with ptrace. My only conclusion is that this failure is somehow spurious, and maybe the issue it just that the thread is in some temporary state that prevents its access. If so, I still think the approach I'm taking is the correct one, but the comments should be updated. >>>>>>>> >>>>>>>> ESRCH can have other meanings but I don't know enough about the broader context to know whether they are applicable in this case. >>>>>>>> >>>>>>>> ??? ESRCH? The? specified? process? does not exist, or is not currently being traced by the caller, or is not stopped >>>>>>>> ????????????? (for requests that require a stopped tracee). >>>>>>>> >>>>>>>> I won't comment further on the fix/workaround as I don't know the code. I'll leave that to other folk. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>>> I had one other finding. When this issue first turned up, it prevented the thread from getting a stack trace due to the exception being thrown. 
What I hadn't realize is that after fixing it to not throw an exception, which resulted in the stack walking code getting all nulls for register values, I actually started to see a stack trace printed: >>>>>>>>> >>>>>>>>> "JLine terminal non blocking reader thread" #26 daemon prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000] >>>>>>>>> ??? java.lang.Thread.State: RUNNABLE >>>>>>>>> ??? JavaThread state: _thread_in_native >>>>>>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp (8089) >>>>>>>>> CurrentFrameGuess: choosing last Java frame: sp = 0x00007f125f0f4770, fp = 0x00007f125f0f47c0 >>>>>>>>> ??- java.io.FileInputStream.read0() @bci=0 (Interpreted frame) >>>>>>>>> ??- java.io.FileInputStream.read() @bci=1, line=223 (Interpreted frame) >>>>>>>>> ??- jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() @bci=108, line=216 (Interpreted frame) >>>>>>>>> ??- jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() @bci=4 (Interpreted frame) >>>>>>>>> ??- java.lang.Thread.run() @bci=11, line=832 (Interpreted frame) >>>>>>>>> >>>>>>>>> The "CurrentFrameGuess" output is some debug tracing I had enabled, and it indicates that the stack walking code is using the "last java frame" setting, which it will do if current registers values don't indicate a valid frame (as would be the case if sp was null). I had previously assumed that without an underling valid LWP, there would be no stack trace. Given that there is one, there must be a valid LWP. Otherwise I don't see how the stack could have been walked. That's another indication that the ptrace failure is spurious in nature. 
>>>>>>>>> >>>>>>>>> thanks, >>>>>>>>> >>>>>>>>> Chris >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> Also, even if you are using something like clhsdb to issue commands on addresses, if the address is no longer valid for the command you are executing, then you would get the appropriate error when there is an attempt to create a wrapper for it. I don't know of any command that operates directly on a JavaThread, but I think there are for InstanceKlass. So if you remembered the address of an InstanceKlass, and then reattached and tried a command that takes an InstanceKlass address, you would get an exception when SA tries to create the wrapper for the InsanceKlass if it were no longer a valid address for one. >>>>>>>>>>> >>>>>>>>>>> Chris >>>>>>>>>>>> >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>>> Chris >>>>>>>>>>>>>> David >>>>>>>>>>>>>> ----- >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>>>>>>>>>> -Instead of throwing an exception when the OS ThreadID is invalid, print a warning. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>>>>>>>>>> -Improve a print_debug message >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>>>>>>>>>> -Deal with the array of registers read in being null due to the OS ThreadID not being valid. 
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java
>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java
>>>>>>>>>>>>>>>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" appearing twice when printing the exception.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>
>
>

From Alan.Bateman at oracle.com Wed Jun 24 06:22:21 2020
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Wed, 24 Jun 2020 07:22:21 +0100
Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs
In-Reply-To:
References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com>
Message-ID: <167b7e7a-7006-b54f-4fd5-b9296bd3c0f1@oracle.com>

On 24/06/2020 06:50, David Holmes wrote:
>
> It sounds like the use of setAccessible was hiding the need to disable
> some module related access checks.
>
> This will have a much bigger compatibility problem if agents with a
> public premain suddenly stop working.

I'm trying to understand what you mean in the final sentence, as there is currently no support for compiling or deploying agents as named modules. It was prototyped during JDK 9 but it hasn't been a priority to come back to. If support were to be added then it might require the agent (in its module declaration) to export the package with the entry point to java.instrument, but this has no impact on the modifiers of the agent class or premain method; both would be required to include "public".

-Alan.
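For illustration only, an agent that satisfies the requirement described above (a public agent class with a public static premain) might look like the sketch below. The class name ExampleAgent and the lastArgs field are hypothetical; the field exists only so that the call can be observed.

```java
// Minimal sketch of an agent entry point (illustrative, not from the webrev).
public class ExampleAgent {
    // Hypothetical field, present only so the call can be observed.
    public static volatile String lastArgs;

    // Both the class and the premain method are declared public,
    // which is what the java.lang.instrument specification asks for.
    public static void premain(String agentArgs) {
        lastArgs = agentArgs;
        System.out.println("premain called with: " + agentArgs);
    }
}
```

Such a class would normally be packaged in a JAR whose manifest names it in the Premain-Class attribute, and the JVM would be started with -javaagent pointing at that JAR.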
From serguei.spitsyn at oracle.com Wed Jun 24 06:24:18 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 23 Jun 2020 23:24:18 -0700
Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs
In-Reply-To:
References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com>
Message-ID: <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com>

Hi David,

On 6/23/20 22:50, David Holmes wrote:
> Hi Serguei,
>
> On 24/06/2020 3:37 pm, serguei.spitsyn at oracle.com wrote:
>> Hi Larry,
>>
>> Thank you for looking at this!
>>
>>
>> On 6/23/20 21:32, Laurence Cable wrote:
>>> should we not consider some form of deprecation here, and continue
>>> to support non-public pre-main invocation for some time while
>>> issuing a warning???
>>
>> I'm not sure what form of deprecation we can use as it has to be a
>> deprecation of a spec non-compliant implementation. :)
>
> There's obviously no TCK test for this. :)
>
> You could just issue a warning if the premain is not public and say
> this will be disallowed in a future release; then disallow it in 17.
>
> But I'm not sure it's worth it.

Yes, it is not clear it is worth it.

>>>
>>> while we have a sample of agents that will not be affected there may
>>> be some agent that will fail terminally with this change
>>
>> There is a more important problem now.
>> A large number of j.l.instrument tests started to fail with my fix with
>> messages like this:
>>   Exception in thread "main" java.lang.IllegalAccessException: class sun.instrument.InstrumentationImpl
>>   (in module java.instrument) cannot access a member of class SimpleAgent with modifiers "public static"
>
> It sounds like the use of setAccessible was hiding the need to disable
> some module related access checks.

It sounds so.
In this particular case, the setAccessible was used just to disable some module related access checks.
A side effect is that non-public premain methods became allowed as well.

>
> This will have a much bigger compatibility problem if agents with a
> public premain suddenly stop working.

One approach would be to continue using setAccessible and add an extra check for a non-public premain method.
Something like this should probably work:
        if (!Modifier.isPublic(m.getModifiers())) {
            throw new IllegalAccessException("premain method is not public");
        }

Thanks,
Serguei

>
> David
> -----
>
>> I'm not sure if there can be a version of the
>> Method.setAccessible(boolean flag) api that works for public methods
>> only.
>> One alternate approach is to relax the current spec to allow premain
>> methods to be non-public.
>>
>> Thanks,
>> Serguei
>>
>>
>>>
>>> just a thought
>>>
>>> - Larry
>>>
>>> On 6/23/20 8:42 PM, serguei.spitsyn at oracle.com wrote:
>>>> Hi Mandy,
>>>>
>>>> Thank you for looking at this!
>>>>
>>>>
>>>> On 6/23/20 20:21, Mandy Chung wrote:
>>>>> Hi Serguei,
>>>>>
>>>>> I'm glad that you have a patch for this.
>>>>>
>>>>> On 6/23/20 7:05 PM, serguei.spitsyn at oracle.com wrote:
>>>>>> Please, review a fix for:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8165276
>>>>>>
>>>>>>
>>>>>> CSR draft (one CSR reviewer is needed before finalizing it):
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8248189
>>>>>>
>>>>>
>>>>> The compatibility risk should be low (rather than minimal).
>>>>
>>>> I was not sure if it has to be minimal or low.
>>>> Made it low now.
>>>>
>>>>
>>>>> It says "All known Java agents define the premain method as
>>>>> public". It'd be useful to add a comment in the JBS issue to list
>>>>> the Java agents you have checked.
>>>>
>>>> I'm relying on Alan's comments posted in the bug report:
>>>>  "I checked a number of popular java agents and their premain
>>>> methods are public, I haven't found any where the premain was not
>>>> public."
>>>>  "I think we should just bite the bullet on this so that the
>>>> premain must be public as originally intended."
>>>>
>>>> Probably, my statement in the CSR is too strong.
>>>> I've changed it to:
>>>>  "No popular Java agent that defines the premain method as
>>>> non-public was found."
>>>>
>>>> Does it look better, or do you think we have to investigate existing
>>>> popular Java agents?
>>>>
>>>>
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.1/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> Looks okay. Can you add a test to verify this fix?
>>>>
>>>> Yes, I can add a test but it will be trivial.
>>>> >>>> Thanks, >>>> Serguei >>>> >>>> >>>>> >>>>> Mandy >>>> >>> >> From chris.plummer at oracle.com Wed Jun 24 06:32:35 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 23 Jun 2020 23:32:35 -0700 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: <22872d51-e0c7-8a82-01d5-fed11748047c@gmail.com> References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com> <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com> <17fbbb06-3cec-503b-73ea-c94156864ed7@oracle.com> <6a0152a9-a151-b49b-5003-f6cf73422973@gmail.com> <22872d51-e0c7-8a82-01d5-fed11748047c@gmail.com> Message-ID: <93dda82c-9a65-18b6-d34d-6dbe93fe310b@oracle.com> Hi Yasumasa , I think LinuxAMD64CFrame is used for pstack and what I've been looking at has been jstack, and in particular AMD64CurrentFrameGuess, which does use "last java frame". I'm not sure why LinuxAMD64CFrame does not look at "last java frame". Maybe it should. thanks, Chris On 6/23/20 11:04 PM, Yasumasa Suenaga wrote: > Hi Chris, > > Thanks you for explanation. > Your change looks good (but "last java frame" would not be found in > Linux AMD64 because RSP is NULL - cf. LinuxAMD64CFrame.java) > > > Thanks, > > Yasumasa > > > On 2020/06/24 12:09, Chris Plummer wrote: >> On 6/23/20 6:05 PM, Yasumasa Suenaga wrote: >>> Hi Chris, >>> >>> Skillful troubleshooters who use jhsdb will aware this warnings, and >>> they will take other appropriate methods. >>> >>> However, I'm not sure it is worth to continue to perform even if SA >>> cannot get register values. >>> >>> For example, Linux AMD64 depends on RIP and RSP values to find top >>> frame. 
>>> According to your change, The caller of >>> getThreadIntegerRegisterSet() has responsible for dealing with the >>> set of null registers. However X86ThreadContext::data (it includes >>> raw register values) would still be zero when it happens. >> This is? what I intended to have happen. Just end up with a register >> set of all nulls. Then when stack walking code gets a null, it will >> revert to "last java frame" if available, otherwise no stack dump is >> done. >>> >>> So I think register holder (e.g. X86ThreadContext) should have >>> tri-state (have registers, fail to get registers, not yet attempt to >>> get registers). >>> OTOH it might be over-engineering. What do you think? >> Before implementing this I looked at the what would be the easier >> approach to get the desired effect of stack walking code simply >> failing over to using "last java frame", and decided the null set of >> registers was easiest. Other approaches involved more changes and >> impacted more files. >> >> thanks, >> >> Chris >>> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>> On 2020/06/24 3:16, Chris Plummer wrote: >>>> On 6/20/20 12:53 AM, Yasumasa Suenaga wrote: >>>>> Hi Chris, >>>>> >>>>> On 2020/06/20 15:20, Chris Plummer wrote: >>>>>> Hi Yasumasa, >>>>>> >>>>>> ptrace is not used for core files, so the EFAULT for a bad core >>>>>> file is not a possibility. However, get_lwp_regs() does redirect >>>>>> to core_get_lwp_regs() for core files. It can fail, but the only >>>>>> reason it ever does is if the LWP can't be found in the core >>>>>> (which is never suppose to happen). I would think if this >>>>>> happened due to the core being truncated, SA would be blowing up >>>>>> all over the place with exceptions, probably before we ever get >>>>>> to this code, but in any cast what we do here wouldn't really >>>>>> make a difference. >>>>> >>>>> You are right, sorry. >>>>> >>>>> >>>>>> I'm not sure why you prefer an exception for errors other than >>>>>> ESRCH. 
Why should they be treated differently? >>>>>> getThreadIntegerRegisterSet0() is used for finding the current >>>>>> frame for stack tracing. With my changes any failure will result >>>>>> in deferring to "last java frame" if set, and otherwise just not >>>>>> produce a stack trace (and the WARNING will be present in the >>>>>> output). This seems preferable to completely abandoning any >>>>>> further thread stack tracking. >>>>> >>>>> I'm not sure we can trust call stack when ptrace() returns any >>>>> errors other than ESRCH even if "last java frame" is available. >>>>> For example, don't ptrace() return EFAULT or EIO when something >>>>> wrong? (e.g. stack corruption) If so, it may lead to a wrong >>>>> analysis for troubleshooter. >>>>> I think it should be abort dumping call stack for its thread at >>>>> least. >>>> Hi Yasumasa, >>>> >>>> In general stack walking makes a best effort and can be wrong, even >>>> when not getting errors like this. For any actively executing >>>> thread SA needs to determine where the stack starts, with register >>>> contents being the starting point (SP, FP, and PC). These registers >>>> could contain anything, and SA makes a best effort to determine a >>>> current frame from them. However, the verification steps it takes >>>> are not 100% guaranteed, and can lead to an incorrect assumption of >>>> the current frame, which in turn can result in an exception later >>>> on when walking the stack. See JDK-8247641. >>>> >>>> Keep in mind that the WARNING message will always be there. This >>>> should be enough to put the troubleshooter on alert that the stack >>>> trace may not be accurate. I think it's better to make an attempt >>>> at a stack trace then to just abandon it and not attempt to do >>>> something that may be useful. 
>>>> >>>> thanks, >>>> >>>> Chris >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>>> thanks, >>>>>> >>>>>> Chris >>>>>> >>>>>> On 6/19/20 6:33 PM, Yasumasa Suenaga wrote: >>>>>>> Hi Chris, >>>>>>> >>>>>>> I checked Linux kernel code at a glance, ESRCH seems to be set >>>>>>> to errno by default. >>>>>>> So I guess it is similar to "generic" error code. >>>>>>> >>>>>>> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c >>>>>>> >>>>>>> According to manpage of ptrace(2), it might return errno other >>>>>>> than ESRCH. >>>>>>> For example, if we analyze broken core (e.g. the core was dumped >>>>>>> with disk full), we might get EFAULT. >>>>>>> Thus I prefer to handle ESRCH only in your patch, and also I >>>>>>> think SA should throw DebuggerException if other error is occurred. >>>>>>> >>>>>>> https://www.man7.org/linux/man-pages/man2/ptrace.2.html >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Yasumasa >>>>>>> >>>>>>> >>>>>>> On 2020/06/20 5:51, Chris Plummer wrote: >>>>>>>> Hello, >>>>>>>> >>>>>>>> I've? updated with webrev based on the new finding that a >>>>>>>> JavaThread cannot be on the ThreadList after its OS thread has >>>>>>>> been destroyed since the JavaThread removes itself from the >>>>>>>> ThreadList, and therefore must be running on its OS thread. The >>>>>>>> logic of the fix is unchanged from the first webrev, but I >>>>>>>> updated the comments to better reflect what is going on. 
I also >>>>>>>> updated the CR: >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html >>>>>>>> >>>>>>>> thanks, >>>>>>>> >>>>>>>> Chris >>>>>>>> >>>>>>>> On 6/19/20 12:24 AM, David Holmes wrote: >>>>>>>>> Hi Chris, >>>>>>>>> >>>>>>>>> On 19/06/2020 8:55 am, Chris Plummer wrote: >>>>>>>>>> On 6/18/20 1:43 AM, David Holmes wrote: >>>>>>>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote: >>>>>>>>>>>> On 6/17/20 10:29 PM, David Holmes wrote: >>>>>>>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>>>>>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>>>>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Please help review the following: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The CR contains all the needed details. Here's a >>>>>>>>>>>>>>>>>> summary of changes in each file: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The problem sounds to me like a variation of the more >>>>>>>>>>>>>>>>> general problem of not ensuring a thread is kept alive >>>>>>>>>>>>>>>>> whilst acting upon it. I don't know how the SA finds >>>>>>>>>>>>>>>>> these references to the threads it is going to >>>>>>>>>>>>>>>>> stackwalk, but is it possible to fix this via >>>>>>>>>>>>>>>>> appropriate uses of ThreadsListHandle/Iterator? >>>>>>>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Keep in mind that once SA attaches, nothing in the VM >>>>>>>>>>>>>>>> changes. 
For example, SA can't create a wrapper to a >>>>>>>>>>>>>>>> JavaThread, only to have the JavaThread be freed later >>>>>>>>>>>>>>>> on. It's just not possible. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Then how does it obtain a reference to a JavaThread for >>>>>>>>>>>>>>> which the native OS thread id is invalid? Any thread >>>>>>>>>>>>>>> found in _java_thread_list is either live or still to be >>>>>>>>>>>>>>> started. In the latter case the JavaThread->osThread >>>>>>>>>>>>>>> does not have its thread_id set yet. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> My assumption was that the JavaThread is in the process >>>>>>>>>>>>>> of being destroyed, and it has freed its OS thread but is >>>>>>>>>>>>>> itself still in the thread list. I did notice that the OS >>>>>>>>>>>>>> thread id being used looked to be in the range of thread >>>>>>>>>>>>>> id #'s you would expect for the running app, so that to >>>>>>>>>>>>>> me indicated it was once valid, but is no more. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Keep in mind that although hotspot may have >>>>>>>>>>>>>> synchronization code that prevents you from pulling a >>>>>>>>>>>>>> JavaThread off the thread list when it is in the process >>>>>>>>>>>>>> of being destroyed (I'm guessing it does), SA has no such >>>>>>>>>>>>>> protections. >>>>>>>>>>>>> >>>>>>>>>>>>> But you stated that once the SA has attached, the target >>>>>>>>>>>>> VM can't change. If the SA gets its set of thread from one >>>>>>>>>>>>> attach then tries to make queries about those threads in a >>>>>>>>>>>>> separate attach, then obviously it could be providing >>>>>>>>>>>>> garbage thread information. So you would need to >>>>>>>>>>>>> re-validate the JavaThread in the target VM before trying >>>>>>>>>>>>> to do anything with it. >>>>>>>>>>>> That's not what is going on here. It's attaching and doing >>>>>>>>>>>> a stack trace, which involves getting the thread list and >>>>>>>>>>>> iterating through all threads without detaching. 
>>>>>>>>>>>
>>>>>>>>>>> Okay so I restate my original comment - all the JavaThreads
>>>>>>>>>>> must be alive or not yet started, so how are you
>>>>>>>>>>> encountering an invalid thread id? Any thread you find via
>>>>>>>>>>> the ThreadsList can't have destroyed its osThread. In any
>>>>>>>>>>> case the logic should be checking thread->osThread() for
>>>>>>>>>>> NULL, and then osThread()->get_state() to ensure it is >=
>>>>>>>>>>> INITIALIZED before using the thread_id().
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>> I chatted with Dan about this, and he said since the
>>>>>>>>>> JavaThread is responsible for removing itself from the
>>>>>>>>>> ThreadList, it is impossible to have a JavaThread still on
>>>>>>>>>> the ThreadList, but without an underlying OS thread. So I'm
>>>>>>>>>> a bit perplexed as to how I can find a JavaThread on the
>>>>>>>>>> ThreadList, but that results in ESRCH when trying to access
>>>>>>>>>> the thread with ptrace. My only conclusion is that this
>>>>>>>>>> failure is somehow spurious, and maybe the issue is just that
>>>>>>>>>> the thread is in some temporary state that prevents its
>>>>>>>>>> access. If so, I still think the approach I'm taking is the
>>>>>>>>>> correct one, but the comments should be updated.
>>>>>>>>>
>>>>>>>>> ESRCH can have other meanings but I don't know enough about
>>>>>>>>> the broader context to know whether they are applicable in
>>>>>>>>> this case.
>>>>>>>>>
>>>>>>>>>     ESRCH  The specified process does not exist, or is not
>>>>>>>>>            currently being traced by the caller, or is not stopped
>>>>>>>>>            (for requests that require a stopped tracee).
>>>>>>>>>
>>>>>>>>> I won't comment further on the fix/workaround as I don't know
>>>>>>>>> the code. I'll leave that to other folk.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> I had one other finding. When this issue first turned up, it
>>>>>>>>>> prevented the thread from getting a stack trace due to the
>>>>>>>>>> exception being thrown. What I hadn't realized is that after
>>>>>>>>>> fixing it to not throw an exception, which resulted in the
>>>>>>>>>> stack walking code getting all nulls for register values, I
>>>>>>>>>> actually started to see a stack trace printed:
>>>>>>>>>>
>>>>>>>>>> "JLine terminal non blocking reader thread" #26 daemon prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000]
>>>>>>>>>>     java.lang.Thread.State: RUNNABLE
>>>>>>>>>>     JavaThread state: _thread_in_native
>>>>>>>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp (8089)
>>>>>>>>>> CurrentFrameGuess: choosing last Java frame: sp = 0x00007f125f0f4770, fp = 0x00007f125f0f47c0
>>>>>>>>>>   - java.io.FileInputStream.read0() @bci=0 (Interpreted frame)
>>>>>>>>>>   - java.io.FileInputStream.read() @bci=1, line=223 (Interpreted frame)
>>>>>>>>>>   - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() @bci=108, line=216 (Interpreted frame)
>>>>>>>>>>   - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() @bci=4 (Interpreted frame)
>>>>>>>>>>   - java.lang.Thread.run() @bci=11, line=832 (Interpreted frame)
>>>>>>>>>>
>>>>>>>>>> The "CurrentFrameGuess" output is some debug tracing I had
>>>>>>>>>> enabled, and it indicates that the stack walking code is
>>>>>>>>>> using the "last java frame" setting, which it will do if
>>>>>>>>>> current register values don't indicate a valid frame (as
>>>>>>>>>> would be the case if sp was null). I had previously assumed
>>>>>>>>>> that without an underlying valid LWP, there would be no stack
>>>>>>>>>> trace. Given that there is one, there must be a valid LWP.
>>>>>>>>>> Otherwise I don't see how the stack could have been walked.
>>>>>>>>>> That's another indication that the ptrace failure is spurious >>>>>>>>>> in nature. >>>>>>>>>> >>>>>>>>>> thanks, >>>>>>>>>> >>>>>>>>>> Chris >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> Also, even if you are using something like clhsdb to issue >>>>>>>>>>>> commands on addresses, if the address is no longer valid >>>>>>>>>>>> for the command you are executing, then you would get the >>>>>>>>>>>> appropriate error when there is an attempt to create a >>>>>>>>>>>> wrapper for it. I don't know of any command that operates >>>>>>>>>>>> directly on a JavaThread, but I think there are for >>>>>>>>>>>> InstanceKlass. So if you remembered the address of an >>>>>>>>>>>> InstanceKlass, and then reattached and tried a command that >>>>>>>>>>>> takes an InstanceKlass address, you would get an exception >>>>>>>>>>>> when SA tries to create the wrapper for the InsanceKlass if >>>>>>>>>>>> it were no longer a valid address for one. >>>>>>>>>>>> >>>>>>>>>>>> Chris >>>>>>>>>>>>> >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -Instead of throwing an exception when the OS >>>>>>>>>>>>>>>>>> ThreadID is invalid, print a warning. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>>>>>>>>>>> -Improve a print_debug message >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -Deal with the array of registers read in being null >>>>>>>>>>>>>>>>>> due to the OS ThreadID not being valid. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -Fix issue with >>>>>>>>>>>>>>>>>> "sun.jvm.hotspot.debugger.DebuggerException" >>>>>>>>>>>>>>>>>> appearing twice when printing the exception. 
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>

From Alan.Bateman at oracle.com Wed Jun 24 06:33:12 2020
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Wed, 24 Jun 2020 07:33:12 +0100
Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs
In-Reply-To: <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com>
References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com>
Message-ID: <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com>

On 24/06/2020 07:24, serguei.spitsyn at oracle.com wrote:
> :
>
> One approach would be to continue using the setAccessible and add
> extra check for non-public premain method.
> Something like should probably work:
>         if (!Modifier.isPublic(m.getModifiers())) {
>             throw new IllegalAccessException("premain method is not public");
>         }

The equivalent with the java launcher is:

$ java Foo.java
error: 'main' method is not declared 'public static'

$ javac Foo.java
$ java Foo
Error: Main method not found in class Foo, please define the main method as:
   public static void main(String[] args)

So having the exception provide a helpful message will be useful in the event that someone tries to deploy an agent that doesn't have a public premain method.

BTW: Have you checked the agentmain case too?

-Alan.
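The check being discussed, combined with a more descriptive message, could be sketched roughly as follows. This is an illustration under stated assumptions and not the actual InstrumentationImpl code: the class PremainCheck, the helper premainProblem, and the two sample agent classes are hypothetical names, and only the single-argument premain(String) form is looked up to keep the sketch short.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper class; not part of java.instrument.
public class PremainCheck {

    /** Sample agent whose premain is public. */
    public static class PublicAgent {
        public static void premain(String agentArgs) { }
    }

    /** Sample agent whose premain is package-private. */
    public static class HiddenAgent {
        static void premain(String agentArgs) { }
    }

    /**
     * Returns null when agentClass declares a public premain(String),
     * otherwise a message describing what is wrong.
     */
    public static String premainProblem(Class<?> agentClass) {
        Method m;
        try {
            // getDeclaredMethod also finds non-public methods,
            // so the modifier check below is what enforces "public".
            m = agentClass.getDeclaredMethod("premain", String.class);
        } catch (NoSuchMethodException e) {
            return "premain(String) not found in class " + agentClass.getName();
        }
        if (!Modifier.isPublic(m.getModifiers())) {
            return "premain method in class " + agentClass.getName()
                    + " is not declared public";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(premainProblem(PublicAgent.class));
        System.out.println(premainProblem(HiddenAgent.class));
    }
}
```

Returning a message rather than throwing keeps the sketch easy to exercise; a real implementation would more likely throw IllegalAccessException with that message, as in the snippet quoted above.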
From suenaga at oss.nttdata.com Wed Jun 24 06:50:19 2020
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Wed, 24 Jun 2020 15:50:19 +0900
Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake
Message-ID: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com>

Hi all,

Please review this change:

  JBS: https://bugs.openjdk.java.net/browse/JDK-8242428
  webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/

This change replaces the following VM operations with direct handshakes:

  - VM_GetFrameCount (GetFrameCount())
  - VM_GetFrameLocation (GetFrameLocation())
  - VM_GetThreadListStackTraces (GetThreadListStackTraces())
  - VM_GetCurrentLocation

GetThreadListStackTraces() uses a direct handshake if the thread count is 1. Otherwise (thread count > 1), it is still performed as a VM operation (VM_GetThreadListStackTraces).

The caller of VM_GetCurrentLocation (JvmtiEnvThreadState::reset_current_location()) might be called at a safepoint, so I added a safepoint check in the caller.

This change has been tested with serviceability/jvmti, serviceability/jdwp, vmTestbase/nsk/jvmti, vmTestbase/nsk/jdi, and vmTestbase/nsk/jdwp. I also tested it on the submit repo, where there was one execution error (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to a dependency error, so I do not think it is caused by this change.

Thanks,

Yasumasa

From serguei.spitsyn at oracle.com Wed Jun 24 06:55:36 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 23 Jun 2020 23:55:36 -0700
Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs
In-Reply-To: <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com>
References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com>
Message-ID: <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com>

On 6/23/20 23:33, Alan Bateman wrote:
>
>
> On 24/06/2020 07:24, serguei.spitsyn at oracle.com wrote:
>> :
>>
>> One approach would be to continue using the setAccessible and add
>> extra check for non-public premain method.
>> Something like should probably work:
>>         if (!Modifier.isPublic(m.getModifiers())) {
>>             throw new IllegalAccessException("premain method is not public");
>>         }
> The equivalent with the java launcher is:
>
> $ java Foo.java
> error: 'main' method is not declared 'public static'
>
> $ javac Foo.java
> $ java Foo
> Error: Main method not found in class Foo, please define the main method as:
>    public static void main(String[] args)
>
> So having the exception provide a helpful message will be useful
> in the event that someone tries to deploy an agent that doesn't have
> a public premain method.

Thank you for the example.
Yes, I'm working on a helpful message and was thinking of using the Reflection method:

  IllegalAccessException newIllegalAccessException(Class currentClass,
                                                   Class memberClass,
                                                   Class targetClass,
                                                   int modifiers);

>
> BTW: Have you checked the agentmain case too?

The InstrumentationImpl::loadClassAndStartAgent() is common for both premain and agentmain.
I'll update the CSR and my new test to cover the agentmain as well.

Thanks,
Serguei

>
> -Alan.
>
>
>
>
>

From yasuenag at gmail.com Wed Jun 24 07:01:51 2020
From: yasuenag at gmail.com (Yasumasa Suenaga)
Date: Wed, 24 Jun 2020 16:01:51 +0900
Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
In-Reply-To: <93dda82c-9a65-18b6-d34d-6dbe93fe310b@oracle.com>
References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com> <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com> <17fbbb06-3cec-503b-73ea-c94156864ed7@oracle.com> <6a0152a9-a151-b49b-5003-f6cf73422973@gmail.com> <22872d51-e0c7-8a82-01d5-fed11748047c@gmail.com> <93dda82c-9a65-18b6-d34d-6dbe93fe310b@oracle.com>
Message-ID: <060099cb-35bb-cbb5-ed0f-3c027ae7a1a7@gmail.com>

On 2020/06/24 15:32, Chris Plummer wrote:
> Hi Yasumasa,
>
> I think LinuxAMD64CFrame is used for pstack and what I've been looking at has been jstack, and in particular AMD64CurrentFrameGuess, which does use "last java frame". I'm not sure why LinuxAMD64CFrame does not look at "last java frame". Maybe it should.

I considered both patterns (jstack and mixed jstack) for this change.
As you know, mixed jstack (jstack --mixed) attempts to find the top of the native stack via LinuxAMD64CFrame, and register values are needed for that (so it depends on the ptrace() call).
So I guess mixed mode jstack (jhsdb jstack --mixed) would not show any stacks (cannot find "last java frame"). Thanks, Yasumasa > thanks, > > Chris > > On 6/23/20 11:04 PM, Yasumasa Suenaga wrote: >> Hi Chris, >> >> Thanks you for explanation. >> Your change looks good (but "last java frame" would not be found in Linux AMD64 because RSP is NULL - cf. LinuxAMD64CFrame.java) >> >> >> Thanks, >> >> Yasumasa >> >> >> On 2020/06/24 12:09, Chris Plummer wrote: >>> On 6/23/20 6:05 PM, Yasumasa Suenaga wrote: >>>> Hi Chris, >>>> >>>> Skillful troubleshooters who use jhsdb will aware this warnings, and they will take other appropriate methods. >>>> >>>> However, I'm not sure it is worth to continue to perform even if SA cannot get register values. >>>> >>>> For example, Linux AMD64 depends on RIP and RSP values to find top frame. >>>> According to your change, The caller of getThreadIntegerRegisterSet() has responsible for dealing with the set of null registers. However X86ThreadContext::data (it includes raw register values) would still be zero when it happens. >>> This is? what I intended to have happen. Just end up with a register set of all nulls. Then when stack walking code gets a null, it will revert to "last java frame" if available, otherwise no stack dump is done. >>>> >>>> So I think register holder (e.g. X86ThreadContext) should have tri-state (have registers, fail to get registers, not yet attempt to get registers). >>>> OTOH it might be over-engineering. What do you think? >>> Before implementing this I looked at the what would be the easier approach to get the desired effect of stack walking code simply failing over to using "last java frame", and decided the null set of registers was easiest. Other approaches involved more changes and impacted more files. 
>>> >>> thanks, >>> >>> Chris >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>> On 2020/06/24 3:16, Chris Plummer wrote: >>>>> On 6/20/20 12:53 AM, Yasumasa Suenaga wrote: >>>>>> Hi Chris, >>>>>> >>>>>> On 2020/06/20 15:20, Chris Plummer wrote: >>>>>>> Hi Yasumasa, >>>>>>> >>>>>>> ptrace is not used for core files, so the EFAULT for a bad core file is not a possibility. However, get_lwp_regs() does redirect to core_get_lwp_regs() for core files. It can fail, but the only reason it ever does is if the LWP can't be found in the core (which is never suppose to happen). I would think if this happened due to the core being truncated, SA would be blowing up all over the place with exceptions, probably before we ever get to this code, but in any cast what we do here wouldn't really make a difference. >>>>>> >>>>>> You are right, sorry. >>>>>> >>>>>> >>>>>>> I'm not sure why you prefer an exception for errors other than ESRCH. Why should they be treated differently? getThreadIntegerRegisterSet0() is used for finding the current frame for stack tracing. With my changes any failure will result in deferring to "last java frame" if set, and otherwise just not produce a stack trace (and the WARNING will be present in the output). This seems preferable to completely abandoning any further thread stack tracking. >>>>>> >>>>>> I'm not sure we can trust call stack when ptrace() returns any errors other than ESRCH even if "last java frame" is available. For example, don't ptrace() return EFAULT or EIO when something wrong? (e.g. stack corruption) If so, it may lead to a wrong analysis for troubleshooter. >>>>>> I think it should be abort dumping call stack for its thread at least. >>>>> Hi Yasumasa, >>>>> >>>>> In general stack walking makes a best effort and can be wrong, even when not getting errors like this. For any actively executing thread SA needs to determine where the stack starts, with register contents being the starting point (SP, FP, and PC). 
These registers could contain anything, and SA makes a best effort to determine a current frame from them. However, the verification steps it takes are not 100% guaranteed, and can lead to an incorrect assumption of the current frame, which in turn can result in an exception later on when walking the stack. See JDK-8247641. >>>>> >>>>> Keep in mind that the WARNING message will always be there. This should be enough to put the troubleshooter on alert that the stack trace may not be accurate. I think it's better to make an attempt at a stack trace then to just abandon it and not attempt to do something that may be useful. >>>>> >>>>> thanks, >>>>> >>>>> Chris >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Yasumasa >>>>>> >>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> Chris >>>>>>> >>>>>>> On 6/19/20 6:33 PM, Yasumasa Suenaga wrote: >>>>>>>> Hi Chris, >>>>>>>> >>>>>>>> I checked Linux kernel code at a glance, ESRCH seems to be set to errno by default. >>>>>>>> So I guess it is similar to "generic" error code. >>>>>>>> >>>>>>>> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c >>>>>>>> >>>>>>>> According to manpage of ptrace(2), it might return errno other than ESRCH. >>>>>>>> For example, if we analyze broken core (e.g. the core was dumped with disk full), we might get EFAULT. >>>>>>>> Thus I prefer to handle ESRCH only in your patch, and also I think SA should throw DebuggerException if other error is occurred. >>>>>>>> >>>>>>>> https://www.man7.org/linux/man-pages/man2/ptrace.2.html >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Yasumasa >>>>>>>> >>>>>>>> >>>>>>>> On 2020/06/20 5:51, Chris Plummer wrote: >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> I've? updated with webrev based on the new finding that a JavaThread cannot be on the ThreadList after its OS thread has been destroyed since the JavaThread removes itself from the ThreadList, and therefore must be running on its OS thread. 
The logic of the fix is unchanged from the first webrev, but I updated the comments to better reflect what is going on. I also updated the CR: >>>>>>>>> >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html >>>>>>>>> >>>>>>>>> thanks, >>>>>>>>> >>>>>>>>> Chris >>>>>>>>> >>>>>>>>> On 6/19/20 12:24 AM, David Holmes wrote: >>>>>>>>>> Hi Chris, >>>>>>>>>> >>>>>>>>>> On 19/06/2020 8:55 am, Chris Plummer wrote: >>>>>>>>>>> On 6/18/20 1:43 AM, David Holmes wrote: >>>>>>>>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote: >>>>>>>>>>>>> On 6/17/20 10:29 PM, David Holmes wrote: >>>>>>>>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>>>>>>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>>>>>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Please help review the following: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The CR contains all the needed details. Here's a summary of changes in each file: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The problem sounds to me like a variation of the more general problem of not ensuring a thread is kept alive whilst acting upon it. I don't know how the SA finds these references to the threads it is going to stackwalk, but is it possible to fix this via appropriate uses of ThreadsListHandle/Iterator? >>>>>>>>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Keep in mind that once SA attaches, nothing in the VM changes. 
For example, SA can't create a wrapper to a JavaThread, only to have the JavaThread be freed later on. It's just not possible. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Then how does it obtain a reference to a JavaThread for which the native OS thread id is invalid? Any thread found in _java_thread_list is either live or still to be started. In the latter case the JavaThread->osThread does not have its thread_id set yet. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> My assumption was that the JavaThread is in the process of being destroyed, and it has freed its OS thread but is itself still in the thread list. I did notice that the OS thread id being used looked to be in the range of thread id #'s you would expect for the running app, so that to me indicated it was once valid, but is no more. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Keep in mind that although hotspot may have synchronization code that prevents you from pulling a JavaThread off the thread list when it is in the process of being destroyed (I'm guessing it does), SA has no such protections. >>>>>>>>>>>>>> >>>>>>>>>>>>>> But you stated that once the SA has attached, the target VM can't change. If the SA gets its set of thread from one attach then tries to make queries about those threads in a separate attach, then obviously it could be providing garbage thread information. So you would need to re-validate the JavaThread in the target VM before trying to do anything with it. >>>>>>>>>>>>> That's not what is going on here. It's attaching and doing a stack trace, which involves getting the thread list and iterating through all threads without detaching. >>>>>>>>>>>> >>>>>>>>>>>> Okay so I restate my original comment - all the JavaThreads must be alive or not yet started, so how are you encountering an invalid thread id? Any thread you find via the ThreadsList can't have destroyed its osThread. 
In any case the logic should be checking thread->osThread() for NULL, and then osThread()->get_state() to ensure it is >= INITIALIZED before using the thread_id(). >>>>>>>>>>> Hi David, >>>>>>>>>>> >>>>>>>>>>> I chatted with Dan about this, and he said since the JavaThread is responsible for removing itself from the ThreadList, it is impossible to have a JavaThread still on the ThreadList, but without and underlying OS Thread. So I'm a bit perplexed as to how I can find a JavaThread on the ThreadList, but that results in ESRCH when trying to access the thread with ptrace. My only conclusion is that this failure is somehow spurious, and maybe the issue it just that the thread is in some temporary state that prevents its access. If so, I still think the approach I'm taking is the correct one, but the comments should be updated. >>>>>>>>>> >>>>>>>>>> ESRCH can have other meanings but I don't know enough about the broader context to know whether they are applicable in this case. >>>>>>>>>> >>>>>>>>>> ??? ESRCH? The? specified? process? does not exist, or is not currently being traced by the caller, or is not stopped >>>>>>>>>> ????????????? (for requests that require a stopped tracee). >>>>>>>>>> >>>>>>>>>> I won't comment further on the fix/workaround as I don't know the code. I'll leave that to other folk. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> I had one other finding. When this issue first turned up, it prevented the thread from getting a stack trace due to the exception being thrown. What I hadn't realize is that after fixing it to not throw an exception, which resulted in the stack walking code getting all nulls for register values, I actually started to see a stack trace printed: >>>>>>>>>>> >>>>>>>>>>> "JLine terminal non blocking reader thread" #26 daemon prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000] >>>>>>>>>>> ??? java.lang.Thread.State: RUNNABLE >>>>>>>>>>> ??? 
JavaThread state: _thread_in_native >>>>>>>>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp (8089) >>>>>>>>>>> CurrentFrameGuess: choosing last Java frame: sp = 0x00007f125f0f4770, fp = 0x00007f125f0f47c0 >>>>>>>>>>> ??- java.io.FileInputStream.read0() @bci=0 (Interpreted frame) >>>>>>>>>>> ??- java.io.FileInputStream.read() @bci=1, line=223 (Interpreted frame) >>>>>>>>>>> ??- jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() @bci=108, line=216 (Interpreted frame) >>>>>>>>>>> ??- jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() @bci=4 (Interpreted frame) >>>>>>>>>>> ??- java.lang.Thread.run() @bci=11, line=832 (Interpreted frame) >>>>>>>>>>> >>>>>>>>>>> The "CurrentFrameGuess" output is some debug tracing I had enabled, and it indicates that the stack walking code is using the "last java frame" setting, which it will do if current registers values don't indicate a valid frame (as would be the case if sp was null). I had previously assumed that without an underling valid LWP, there would be no stack trace. Given that there is one, there must be a valid LWP. Otherwise I don't see how the stack could have been walked. That's another indication that the ptrace failure is spurious in nature. >>>>>>>>>>> >>>>>>>>>>> thanks, >>>>>>>>>>> >>>>>>>>>>> Chris >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>>> Also, even if you are using something like clhsdb to issue commands on addresses, if the address is no longer valid for the command you are executing, then you would get the appropriate error when there is an attempt to create a wrapper for it. I don't know of any command that operates directly on a JavaThread, but I think there are for InstanceKlass. 
So if you remembered the address of an InstanceKlass, and then reattached and tried a command that takes an InstanceKlass address, you would get an exception when SA tries to create the wrapper for the InsanceKlass if it were no longer a valid address for one. >>>>>>>>>>>>> >>>>>>>>>>>>> Chris >>>>>>>>>>>>>> >>>>>>>>>>>>>> David >>>>>>>>>>>>>> ----- >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>>>>>>>>>>>> -Instead of throwing an exception when the OS ThreadID is invalid, print a warning. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>>>>>>>>>>>> -Improve a print_debug message >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>>>>>>>>>>>> -Deal with the array of registers read in being null due to the OS ThreadID not being valid. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>>>>>>>>>>>>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" appearing twice when printing the exception. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>> >>> > > From Alan.Bateman at oracle.com Wed Jun 24 07:33:27 2020 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Wed, 24 Jun 2020 08:33:27 +0100 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> Message-ID: On 24/06/2020 07:55, serguei.spitsyn at oracle.com wrote: > > Thank you for the example. > Yes, I'm working on a helpful message and was thinking to use the > Reflection method: > IllegalAccessException newIllegalAccessException(Class currentClass, > Class memberClass, > Class targetClass, > int modifiers); m.canAccess(null) will tell you if the premain method is accessible. If it returns false then you can throw IAE with a useful message. -Alan.
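[Editorial sketch, not part of the original thread.] The check being discussed above (reject a non-public premain and say why) can be sketched roughly as follows. This is only an illustration under stated assumptions: PublicAgent, HiddenAgent, PremainCheck and both helper methods are hypothetical names, the one-argument premain(String) form is used to keep the example free of java.instrument types, and nothing here is taken from the actual InstrumentationImpl code or from the webrev.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical agent classes, used only for illustration.
class PublicAgent {
    public static void premain(String agentArgs) { }
}

class HiddenAgent {
    // Package-private premain: the case the spec does not allow.
    static void premain(String agentArgs) { }
}

class PremainCheck {
    // Explicit modifier check, in the spirit of the snippet quoted in the thread.
    // Throws IllegalAccessException with a descriptive message if premain is not public.
    static void requirePublicPremain(Class<?> agentClass)
            throws IllegalAccessException, NoSuchMethodException {
        Method m = agentClass.getDeclaredMethod("premain", String.class);
        if (!Modifier.isPublic(m.getModifiers())) {
            throw new IllegalAccessException(
                "premain method in class " + agentClass.getName() + " is not public");
        }
    }

    // Convenience wrapper for the demo: returns null when the check passes,
    // otherwise the message of the exception that was raised.
    static String describeProblem(Class<?> agentClass) {
        try {
            requirePublicPremain(agentClass);
            return null;
        } catch (IllegalAccessException | NoSuchMethodException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("PublicAgent: " + describeProblem(PublicAgent.class));
        System.out.println("HiddenAgent: " + describeProblem(HiddenAgent.class));
    }
}
```

An explicit modifier check like this gives the same answer whether or not setAccessible(true) has been called on the Method, which is the property the thread is looking for.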
From david.holmes at oracle.com Wed Jun 24 09:22:27 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 24 Jun 2020 19:22:27 +1000 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> Message-ID: <81ccc4d6-6328-959a-4e0a-36b07eea68a2@oracle.com> On 24/06/2020 4:24 pm, serguei.spitsyn at oracle.com wrote: > Hi David, > > > On 6/23/20 22:50, David Holmes wrote: >> Hi Serguei, >> >> On 24/06/2020 3:37 pm, serguei.spitsyn at oracle.com wrote: >>> Hi Larry, >>> >>> Thank you for looking at this! >>> >>> >>> On 6/23/20 21:32, Laurence Cable wrote: >>>> should we not consider some form of depreciation here, and continue >>>> to support non-public pre-main invocation for some time while >>>> issuing a warning??? >>> >>> I'm not sure what form of deprecation we can use as it has to be a >>> deprecation of a spec non-compliant implementation. :) >> >> There's obviously no TCK test for this. :) >> >> You could just issue a warning if the premain is not public and say >> this will be disallowed in a future release; then disallow it in 17. >> >> But I'm not sure it's worth it. > > Yes, it is not clear it is worth it. > > >>>> >>>> while we have a sample of agents that will not be affected there may >>>> be some agent that will fail terminally with this change >>> >>> There is more important problem now. 
>>> A big number or j.l.instrument started to fail with my fix with >>> messages like this: >>> ??Exception in thread "main" java.lang.IllegalAccessException: class >>> sun.instrument.InstrumentationImpl >>> ??(in module java.instrument) cannot access a member of class >>> SimpleAgent with modifiers "public static" >> >> It sounds like the use of setAccessible was hiding the need to disable >> some module related access checks. > > It sounds so. > In this particular case, the setAccessible was used just to disable some > module related access checks. > A side affect is that non-public premain methods became allowed as well. The other way around. The setAccessible has been there long before the module system existed, to allow a non-public premain. As a side-effect when the module system came along it also disabled some module access check (I'm not sure exactly what). >> >> This will have a much bigger compatibility problem if agents with a >> public premain suddenly stop working. > > One approach would be to continue using the setAccessible and add extra > check for non-public premain method. > Something like should probably work: > ??????? if (!(Modifier.isPublic(m.getModifiers())) { > ??????????? throw new IllegalAccessException("premain method is not > public"); > ??????? } Yes an explicit modifier check would seem best. Thanks, David ----- > Thanks, > Serguei > >> >> David >> ----- >> >>> I'm not sure if there can be a version of the >>> Method.setAccessible(boolean flag) api that works for public methods >>> only. >>> One alternate approach is to relax the current spec to allow premain >>> methods to be non-public. >>> >>> Thanks, >>> Serguei >>> >>> >>>> >>>> just a thought >>>> >>>> - Larry >>>> >>>> On 6/23/20 8:42 PM, serguei.spitsyn at oracle.com wrote: >>>>> Hi Mandy, >>>>> >>>>> Thank you for looking at this! >>>>> >>>>> >>>>> On 6/23/20 20:21, Mandy Chung wrote: >>>>>> Hi Serguei, >>>>>> >>>>>> I'm glad that you have a patch for this. 
>>>>>> >>>>>> On 6/23/20 7:05 PM, serguei.spitsyn at oracle.com wrote: >>>>>>> Please, review a fix for: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8165276 >>>>>>> >>>>>>> >>>>>>> CSR draft (one CSR reviewer is needed before finalizing it): >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8248189 >>>>>>> >>>>>> >>>>>> The compatibility risk should be low (rather than minimal). >>>>> >>>>> I was not sure if it has to be minimal or low. >>>>> Made it low now. >>>>> >>>>> >>>>>> It says "All known Java agents define the premain method as >>>>>> public". It'd be useful to add a comment in the JBS issue to list >>>>>> the Java agents you have checked. >>>>> >>>>> I'm relying on the Alan's comments posted in the bug report: >>>>> ?"I checked a number of popular java agents and their premain >>>>> methods are public, I haven't found any where the premain was not >>>>> public." >>>>> ?"I think we should just bite the bullet on this so that the >>>>> premain must be public as originally intended." >>>>> >>>>> Probably, my statement in the CSR is too strong. >>>>> I've changed it to: >>>>> ?"No popular Java agent that defines the premain method as a >>>>> non-public was found." >>>>> >>>>> Does it looks better or you think we have to investigate existing >>>>> popular Java agents? >>>>> >>>>> >>>>>>> Webrev: >>>>>>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.1/ >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> Looks okay.? Can you add a test to verify this fix? >>>>> >>>>> Yes, I can add a test but it will be trivial. 
>>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>> >>>>>> >>>>>> Mandy >>>>> >>>> >>> > From david.holmes at oracle.com Wed Jun 24 09:24:36 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 24 Jun 2020 19:24:36 +1000 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <167b7e7a-7006-b54f-4fd5-b9296bd3c0f1@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <167b7e7a-7006-b54f-4fd5-b9296bd3c0f1@oracle.com> Message-ID: On 24/06/2020 4:22 pm, Alan Bateman wrote: > > > On 24/06/2020 06:50, David Holmes wrote: >> >> It sounds like the use of setAccessible was hiding the need to disable >> some module related access checks. >> >> This will have a much bigger compatibility problem if agents with a >> public premain suddenly stop working. > I'm trying to understand what you mean in the final sentence as there is > currently no support for compiling or deploying agents as named modules. IIUC the tests do not use named modules. Serguei removed the setAccessible code and got an error because the agent code, even though public, was not accessible to the class in the java.instrument module. So if that test agent represented a real non-modular agent in the field, this change would break such agents. David ----- > It was prototyped during JDK 9 but hasn't been a priority to come back. > If support were to be added then it might require the agent (in its > module declare) to export the package with the entry point to > java.instrument but this has no impact on the modifiers of the agent > class or premain method, they would both required to include "public". > > -Alan. 
From david.holmes at oracle.com Wed Jun 24 09:26:45 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 24 Jun 2020 19:26:45 +1000 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> Message-ID: <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> On 24/06/2020 5:33 pm, Alan Bateman wrote: > On 24/06/2020 07:55, serguei.spitsyn at oracle.com wrote: >> >> Thank you for the example. >> Yes, I'm working on a helpful message and was thinking to use the >> Reflection method: >> IllegalAccessException newIllegalAccessException(Class currentClass, >> Class memberClass, >> Class targetClass, >> int modifiers); > m.canAccess(null) will tell you if the premain method is accessible. If > it returns false then you can throw IAE with a useful message. If we call setAccessible(true) then canAccess will return true. If we don't call setAccessible(true) then canAccess will return false due to the problem Serguei reported. David > > -Alan.
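[Editorial sketch, not part of the original thread.] The interaction between setAccessible(true) and canAccess described above can be observed with a small stand-alone program. The names SampleAgent and CanAccessDemo are hypothetical, and the premain here is deliberately private so that the access check fails before setAccessible is called; in the thread itself the failing case turned out to involve a non-public agent class rather than a private method, so treat this only as an illustration of the general mechanism. It assumes JDK 9 or later (canAccess was added in 9) and classes in the unnamed module.

```java
import java.lang.reflect.Method;

// Hypothetical agent with an inaccessible entry point, for illustration only.
class SampleAgent {
    private static void premain(String agentArgs) { }
}

class CanAccessDemo {
    // Returns { canAccess before setAccessible(true), canAccess after it }.
    static boolean[] probe() {
        try {
            Method m = SampleAgent.class.getDeclaredMethod("premain", String.class);
            // premain is static, so canAccess must be passed null.
            boolean before = m.canAccess(null);
            // Suppress the language access checks for this Method object.
            m.setAccessible(true);
            boolean after = m.canAccess(null);
            return new boolean[] { before, after };
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = probe();
        System.out.println("canAccess before setAccessible(true): " + result[0]);
        System.out.println("canAccess after  setAccessible(true): " + result[1]);
    }
}
```

Because canAccess reports true once the accessible flag is set, it can only be used to detect a non-public premain if setAccessible(true) is not called first.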
From Alan.Bateman at oracle.com Wed Jun 24 09:36:10 2020 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Wed, 24 Jun 2020 10:36:10 +0100 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <81ccc4d6-6328-959a-4e0a-36b07eea68a2@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <81ccc4d6-6328-959a-4e0a-36b07eea68a2@oracle.com> Message-ID: <10857bef-a1ba-38d3-9b29-0a046462be8b@oracle.com> On 24/06/2020 10:22, David Holmes wrote: > > The other way around. The setAccessible has been there long before the > module system existed, to allow a non-public premain. As a side-effect > when the module system came along it also disabled some module access > check (I'm not sure exactly what). This issue is nothing to do with modules. Instead I think the bug in the JPLIS agent was discovered when core reflection didn't initially assume readability so required changes to allow code in java.instrument to invoke the agent's premain method. 
-Alan From Alan.Bateman at oracle.com Wed Jun 24 09:43:30 2020 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Wed, 24 Jun 2020 10:43:30 +0100 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> Message-ID: <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> On 24/06/2020 10:26, David Holmes wrote: > > > If we call setAccessible(true) then canAccess will return true. Sure but the bug fix will remove the setAccessible(true) so canAccess will do what he wants without needing to catch the exception. This is of course all a side show to the important issue of aligning the spec and implementation. 
-Alan From david.holmes at oracle.com Wed Jun 24 09:57:56 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 24 Jun 2020 19:57:56 +1000 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> Message-ID: On 24/06/2020 7:43 pm, Alan Bateman wrote: > On 24/06/2020 10:26, David Holmes wrote: >> >> >> If we call setAccessible(true) then canAccess will return true. > Sure but the bug fix will remove the setAccessible(true) so canAccess > will do what he wants without needing to catch the exception. This is of > course all a side show to the important issue of aligning the spec and > implementation. But you are ignoring my next statement. If we remove the setAccessible(true) then the premain method will not be accessible as Serguei reported. Exception in thread "main" java.lang.IllegalAccessException: class sun.instrument.InstrumentationImpl (in module java.instrument) cannot access a member of class SimpleAgent with modifiers "public static" I feel we are talking past each other on this issue with regard to the IllegalAccessException that comes from the module system.
David ----- > -Alan From Alan.Bateman at oracle.com Wed Jun 24 10:14:50 2020 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Wed, 24 Jun 2020 11:14:50 +0100 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> Message-ID: <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> On 24/06/2020 10:57, David Holmes wrote: > > But you are ignoring my next statement. If we remove the > setAccessible(true) then the premain method will not be accessible as > Serguei reported. > > Exception in thread "main" java.lang.IllegalAccessException: class > sun.instrument.InstrumentationImpl > (in module java.instrument) cannot access a member of class > SimpleAgent with modifiers "public static" > > I feel we are talking past each other on this issue with regards to > the IllegalAcessError that comes from the module system. This is nothing to do with the module system. If you drop the setAccessible(true) from JDK 6 or JDK 8 then you'll also get IllegalAccessException when the member is not accessible. I think the main thing that needs to be agreed here is whether to fix the bug or change the spec. My view is that fixing the bug should be low risk because (a) I've never seen an agent with a non-public premain method, and (b) Agents typically have to update or release frequently because of updates to the class file version.
So yes, it would be a behavioral compatibility issue that requires CSR approval and requires follow-up release notes to document the change. -Alan From david.holmes at oracle.com Wed Jun 24 12:25:51 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 24 Jun 2020 22:25:51 +1000 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> Message-ID: <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> On 24/06/2020 8:14 pm, Alan Bateman wrote: > On 24/06/2020 10:57, David Holmes wrote: >> >> But you are ignoring my next statement. If we remove the >> setAccessible(true) then the premain method will not be accessible as >> Serguei reported. >> >> Exception in thread "main" java.lang.IllegalAccessException: class >> sun.instrument.InstrumentationImpl >> (in module java.instrument) cannot access a member of class >> SimpleAgent with modifiers "public static" >> >> I feel we are talking past each other on this issue with regards to >> the IllegalAcessError that comes from the module system. > This is nothing to do with the module system. If you drop the > setAccessible(true) from JDK 6 or JDK 8 then you'll also get > IllegalAccessException when the member is not accessible. Ah! The test class SimpleAgent is what is not public. That seems to be a bug in the test.
Sorry for the confusion. David ----- > I think the main thing that needs to be agreed here is whether to fix > the bug or change the spec. My view is that fixing the bug should be low > risk because (a) I've never seen an agent with a non-public premain > method, and (b) Agents typically have to update or release frequently > because of updates to the class file version. So yes, it would be a > behavioral compatibility issue taht requires CSR approval and requires > follow-up release notes to document the change. > > -Alan From daniel.daugherty at oracle.com Wed Jun 24 16:07:53 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Wed, 24 Jun 2020 12:07:53 -0400 Subject: RFR(M): 8244383: jhsdb/HeapDumpTestWithActiveProcess.java fails with "AssertionFailure: illegal bci" In-Reply-To: <4e550a6a-2a04-bcec-f25f-2b9bd0347166@oracle.com> References: <28e1b453-e1ea-0a1c-0ae0-0494b52f4b71@oracle.com> <1c642a24-d994-e34a-6af8-61c4dab7709d@oracle.com> <4315bde0-6422-34dd-2795-9596e417d534@oracle.com> <4e550a6a-2a04-bcec-f25f-2b9bd0347166@oracle.com> Message-ID: <2a1abfe1-61bd-e0d5-5ef7-886278d10aec@oracle.com> On 6/23/20 11:04 PM, Chris Plummer wrote: > On 6/23/20 7:07 PM, Daniel D. Daugherty wrote: >> Just one more comment on this part: >> >>> >??? L220: System.out.println("CurrentFrameGuess: choosing >>> interpreter frame: sp = " + >>> >??? L221: ???????????????????????????????? spFound + ", fpFound = " >>> + fp + ", pcFound = " + pc); >>> >??????? This debug output doesn't make sense to me: >>> > >>> >??????????? "sp = " label and 'spFound' value >>> >??????????? "fpFound = " label and 'fp' value >>> >??????????? "pcFound = " label and 'pc' value >>> >>> ??????? but I may not have enough context... >>> From the point of view of the person reading the output, they want >>> to know the values for sp, fp, and pc. But within the code these >>> values are stored in the "found" variables. 
>> >> In that case, the code is wrong for the 'fp' and 'pc' outputs >> since you changed the labels and not the variables. > Yes, you are correct. I'll fix the output for fp and pc. I don't need another webrev for any of my comments. Thanks for your work on making SA more stable. The CI appreciates it! Dan > > thanks, > > Chris >> >> Dan >> >> >> >> On 6/23/20 7:28 PM, Chris Plummer wrote: >>> On 6/23/20 2:44 PM, Daniel D. Daugherty wrote: >>>> On 6/18/20 8:54 PM, Chris Plummer wrote: >>>>> [I've added runtime-dev to this SA review since understanding >>>>> interpreter invokes (code generated by >>>>> TemplateInterpreterGenerator::generate_normal_entry()) and stack >>>>> walking is probably more important than understanding SA.] >>>>> >>>>> Hello, >>>>> >>>>> Please help review the following: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8244383 >>>>> http://cr.openjdk.java.net/~cjplummer/8244383/webrev.00/index.html >>>> >>> Thanks for helping! >>>> Sorry for the delay in reviewing this one. I've come back to it a >>>> couple >>>> of times because code like this is very hard to review. >>>> >>>> >>>> General comment: >>>> ??? This fix reminds of the crazy things that AsyncGetCallTrace has to >>>> ??? do in order to gather call trace data. I'm guessing that SA is >>>> ??? attaching to the VM in an asynchronous manner and that's why it >>>> ??? can observe things like partially constructed frames. If that's a >>>> ??? correct guess, then how is SA stopping/suspending the threads? >>>> ??? I'm just curious here. >>> On linux SA uses ptrace. I'm not familiar with the details of how it >>> works. I'm not sure where ptrace allows suspends to happen, but >>> certainly it has no knowledge of JVM safepoints or other >>> synchronization that the JVM does. So from the JVM and SA point of >>> view the suspend can happen at any arbitrary JVM instruction. 
>>> >>> From what I can gather, PTRACE_ATTACH suspends the entire process, >>> so that means all threads are suspended once you attach. However, >>> PTRACE_GETREGS can be called on individual threads (LWPs), but I >>> don't see any indication in the SA code that you need to attach to >>> each LWP first. >>>> >>>> ??? Or this might be a case where SA is examining a core file in >>>> ??? which case the various threads stacks are not necessarily at >>>> ??? good/safepoint-safe pause points. >>> For this bug and test it's a live process, but I think the bug being >>> addressed here can happen just as well with a core file. >>> Unfortunately we have very little core file testing support. I'm >>> actually in the middle of addressing that right now. >>>> >>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java >>>> >>>> ??? No comments. >>>> >>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java >>>> >>>> ??? No comments. >>>> >>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java >>>> >>>> ??? L104: ??? // two locations, then we canot determine the frame. >>>> ??????? typo: s/canot/cannot/ >>> ok >>>> >>>> ??? L127: ??? // it's validity will help us determine the state of >>>> the new frame push. >>>> ??????? typo: s/it's/its/ >>> ok >>>> >>>> ??? L148: ??????? System.out.println("CurrentFrameGuess: frame >>>> pushed but not initaliazed."); >>>> ??????? typo: s/initaliazed/initialized/ >>> ok >>>> >>>> ??? L220: System.out.println("CurrentFrameGuess: choosing >>>> interpreter frame: sp = " + >>>> ??? L221: ???????????????????????????????? spFound + ", fpFound = " >>>> + fp + ", pcFound = " + pc); >>>> ??????? This debug output doesn't make sense to me: >>>> >>>> ??????????? "sp = " label and 'spFound' value >>>> ??????????? "fpFound = " label and 'fp' value >>>> ??????????? "pcFound = " label and 'pc' value >>>> >>>> ??????? but I may not have enough context... 
>>> From the point of view of the person reading the output, they want >>> to know the values for sp, fp, and pc. But within the code these >>> values are stored in the "found" variables. >>>> >>>> With code like this, it's really hard to figure out if you've covered >>>> all the cases unless you've been in the observer seat yourself and >>>> even then your test runs may not hit all the possible cases. All you >>>> can really do is start with a set of adaptive changes, run with those >>>> for a while and tweak them as you gather more observations. >>> Yes, and I know there is still a very tiny gap or two in coverage >>> that are maybe one or two instructions long, but they aren't worth >>> dealing with. This bug was already very rare, and with the fixes >>> I've done I don't see any issues now. SA is a debugger, so >>> perfection in this regard is not expected. >>>> >>>> Chris, nice job with this bit of insanity! >>> Thanks! I mostly stuck with this one to help with my SA expertise. >>> Otherwise it wouldn't have been worth the time. >>> >>> Chris >>>> >>>> Thumbs up! >>>> >>>> Dan >>>> >>>> >>>> >>>>> >>>>> The crux of the bug is when doing stack walking the topmost frame >>>>> is in an inconsistent state because we are in the middle of >>>>> pushing a new interpreter frame. Basically we are executing code >>>>> generated by >>>>> TemplateInterpreterGenerator::generate_normal_entry(). Since the >>>>> PC register is in this code, SA assumes the topmost frame is an >>>>> interpreter frame. >>>>> >>>>> The first issue with this interpreter frame assumption is if we >>>>> haven't actually pushed the frame yet, then the current frame is >>>>> the caller's frame, and could be compiled. But since SA thinks >>>>> it's interpreted, later on it tries to convert the frame->bcp to a >>>>> BCI, but frame->bcp is only valid for interpreter frames. Thus the >>>>> "illegal BCI" failures. If the previous frame happened to be >>>>> interpreted, then the existing SA code works fine. 
>>>>> >>>>> The other state of frame pushing that was problematic was when the >>>>> new frame had been pushed, but frame->method and frame->bcp were >>>>> not set up yet. This also would lead to "illegal BCI" later on >>>>> because garbage would be stored in these locations. >>>>> >>>>> Fixing the above problems requires trying to determine the state >>>>> of the frame push through a series of checks, and then adapting >>>>> what is considered to be the current frame based on the outcome of >>>>> the checks. The first thing checked is that frame->method is >>>>> valid (we can successfully instantiate a wrapper for the Method* >>>>> without failure) and that frame->bcp is within the method. If both >>>>> of these pass then we can use the frame as-is. >>>>> >>>>> If the above checks fail, then we try to determine whether the >>>>> issue is that the frame is not yet pushed and the current frame is >>>>> actually compiled, or the frame has been pushed but not yet >>>>> initialized. This is done by first getting the return address from >>>>> the stack or RAX (its location depends on how far along we are in >>>>> the entry code) and comparing this to what is stored in >>>>> frame->return_addr. If they are the same, then we have pushed the >>>>> frame but not yet initialized it. In this case we use the previous >>>>> frame (senderSP() and senderFP()) as the current frame since the >>>>> current frame is not yet initialized. If the return address check >>>>> fails, then we assume the new frame is not yet pushed, and >>>>> treat the current frame as compiled, even though PC points into >>>>> the interpreter (we replace PC with RAX in this case). >>>>> >>>>> Comments in the code pretty well explain all the above, so it is >>>>> probably easier to follow the logic in the code along with the >>>>> comments rather than apply my above description to the code. >>>>> >>>>> I should add that it's very rare that we ever get into this >>>>> special error handling code.
This bug was very hard to reproduce >>>>> initially. I was only able to make progress with reproducing and >>>>> debugging by inserting delay loops in various spots in the code >>>>> generated by >>>>> TemplateInterpreterGenerator::generate_normal_entry(). By doing >>>>> this I was able to reproduce the issue quite easily and hit all >>>>> the logic in the new code I've added. >>>>> >>>>> The fix is basically entirely contained within >>>>> AMD64CurrentFrameGuess.java. The rest of the changes are minor: >>>>> >>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java >>>>> >>>>> -Main fix for CR >>>>> >>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java >>>>> >>>>> -Added getInterpreterFrameBCP(), which is now needed by >>>>> AMD64CurrentFrameGuess.java >>>>> -I also simplified some code by using the existing >>>>> getInterpreterFrameMethod() >>>>> ?rather than replicating inline what it does. >>>>> >>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java >>>>> >>>>> -I noticed the windows version of this code had some extra checks >>>>> that were missing >>>>> ?from the bsd version. I then looked at the linux version, but it >>>>> had been heavily modified >>>>> ?a short while back to leverage DWARF info to determine frames. So >>>>> I looked at the previous >>>>> ?rev and it too had these extra checks. I decided to add them to >>>>> the BSD port. I'm not sure >>>>> ?if it helps at all, but it certainly doesn't seem to do any harm. 
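The order of checks described above for AMD64CurrentFrameGuess can be summarized in a small sketch. This is only an illustration under stated assumptions, not the code from the webrev: the class, enum, and parameter names are hypothetical, and the real checks work on SA frame and address objects rather than on booleans and longs.

```java
// Hypothetical illustration of the decision order described in this review.
// None of these names come from AMD64CurrentFrameGuess.java.
class FrameGuessSketch {
    enum Choice {
        USE_FRAME_AS_IS,   // frame->method and frame->bcp both look valid
        USE_SENDER_FRAME,  // new frame pushed but not yet initialized
        TREAT_AS_COMPILED  // new frame not yet pushed; the caller's frame is current
    }

    static Choice choose(boolean methodValid,
                         boolean bcpWithinMethod,
                         long returnAddrFromStackOrRax,
                         long frameReturnAddr) {
        // First check: a valid Method* and a bcp inside that method.
        if (methodValid && bcpWithinMethod) {
            return Choice.USE_FRAME_AS_IS;
        }
        // Second check: the return address (taken from the stack or RAX)
        // matches the one stored in the frame, so the frame was pushed
        // but is not yet initialized.
        if (returnAddrFromStackOrRax == frameReturnAddr) {
            return Choice.USE_SENDER_FRAME;
        }
        // Otherwise assume the frame is not pushed yet.
        return Choice.TREAT_AS_COMPILED;
    }

    public static void main(String[] args) {
        System.out.println(choose(false, false, 0x1000L, 0x1000L));
    }
}
```

Keeping the three outcomes in one function makes it easier to see that every failed check falls through to a more conservative choice.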
>>>>> >>>>> thanks, >>>>> >>>>> Chris >>>>> >>>> >>> >>> >> > > From chris.plummer at oracle.com Wed Jun 24 18:22:51 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 24 Jun 2020 11:22:51 -0700 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: <060099cb-35bb-cbb5-ed0f-3c027ae7a1a7@gmail.com> References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <39c6cc13-468c-3d72-8591-3bd9e71d5289@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com> <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com> <17fbbb06-3cec-503b-73ea-c94156864ed7@oracle.com> <6a0152a9-a151-b49b-5003-f6cf73422973@gmail.com> <22872d51-e0c7-8a82-01d5-fed11748047c@gmail.com> <93dda82c-9a65-18b6-d34d-6dbe93fe310b@oracle.com> <060099cb-35bb-cbb5-ed0f-3c027ae7a1a7@gmail.com> Message-ID: On 6/24/20 12:01 AM, Yasumasa Suenaga wrote: > On 2020/06/24 15:32, Chris Plummer wrote: >> Hi Yasumasa , >> >> I think LinuxAMD64CFrame is used for pstack and what I've been >> looking at has been jstack, and in particular AMD64CurrentFrameGuess, >> which does use "last java frame". I'm not sure why LinuxAMD64CFrame >> does not look at "last java frame". Maybe it should. > > I thought both pattern (jstack, mixed stack) for this change. > As you know, mixed jstack (jstack --mixed) attempt to find top of > native stack via LinuxAMD64CFrame, register values are needed for it > (so it depends on ptrace() call). So I guess mixed mode jstack (jhsdb > jstack --mixed) would not show any stacks (cannot find "last java > frame"). Hi Yasumasa, I should have been more clear on what I meant by jstack and pstack. 
For jstack I meant using StackTrace.java, which is what you get by default with "jhsdb jstack" and also the clhsdb jstack command. For pstack I meant PStack.java, which is what you get with "jhsdb jstack --mixed" or the clhsdb pstack command. So this CR impacts both types of stack traces in that they will get null registers when the lower level API fails to get the register set. For StackTrace.java it will then defer to "last java frame" if available. For PStack.java it will not, and will always result in no stack trace. The code of interest is here: AMD64ThreadContext context = (AMD64ThreadContext) thread.getContext(); Address pc = context.getRegisterAsAddress(AMD64ThreadContext.RIP); if (pc == null) return null; return LinuxAMD64CFrame.getTopFrame(dbg, pc, context); So the question is whether "last java frame" should be used if pc == null. If so, then getTopFrame() would also need to be modified to use "last java frame" when fetching RBP. thanks, Chris > > > Thanks, > > Yasumasa > > >> thanks, >> >> Chris >> >> On 6/23/20 11:04 PM, Yasumasa Suenaga wrote: >>> Hi Chris, >>> >>> Thank you for the explanation. >>> Your change looks good (but "last java frame" would not be found in >>> Linux AMD64 because RSP is NULL - cf. LinuxAMD64CFrame.java) >>> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>> On 2020/06/24 12:09, Chris Plummer wrote: >>>> On 6/23/20 6:05 PM, Yasumasa Suenaga wrote: >>>>> Hi Chris, >>>>> >>>>> Skillful troubleshooters who use jhsdb will be aware of this warning, >>>>> and they will take other appropriate methods. >>>>> >>>>> However, I'm not sure it is worth continuing even if >>>>> SA cannot get register values. >>>>> >>>>> For example, Linux AMD64 depends on RIP and RSP values to find top >>>>> frame. >>>>> According to your change, the caller of >>>>> getThreadIntegerRegisterSet() is responsible for dealing with the >>>>> set of null registers.
However X86ThreadContext::data (it includes >>>>> raw register values) would still be zero when it happens. >>>> This is what I intended to have happen. Just end up with a >>>> register set of all nulls. Then when stack walking code gets a >>>> null, it will revert to "last java frame" if available, otherwise >>>> no stack dump is done. >>>>> >>>>> So I think the register holder (e.g. X86ThreadContext) should have >>>>> a tri-state (have registers, failed to get registers, not yet attempted >>>>> to get registers). >>>>> OTOH it might be over-engineering. What do you think? >>>> Before implementing this I looked at what would be the easiest >>>> approach to get the desired effect of stack walking code simply >>>> failing over to using "last java frame", and decided the null set >>>> of registers was easiest. Other approaches involved more changes >>>> and impacted more files. >>>> >>>> thanks, >>>> >>>> Chris >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>> On 2020/06/24 3:16, Chris Plummer wrote: >>>>>> On 6/20/20 12:53 AM, Yasumasa Suenaga wrote: >>>>>>> Hi Chris, >>>>>>> >>>>>>> On 2020/06/20 15:20, Chris Plummer wrote: >>>>>>>> Hi Yasumasa, >>>>>>>> >>>>>>>> ptrace is not used for core files, so the EFAULT for a bad core >>>>>>>> file is not a possibility. However, get_lwp_regs() does >>>>>>>> redirect to core_get_lwp_regs() for core files. It can fail, >>>>>>>> but the only reason it ever does is if the LWP can't be found >>>>>>>> in the core (which is never supposed to happen). I would think >>>>>>>> if this happened due to the core being truncated, SA would be >>>>>>>> blowing up all over the place with exceptions, probably before >>>>>>>> we ever get to this code, but in any case what we do here >>>>>>>> wouldn't really make a difference. >>>>>>> >>>>>>> You are right, sorry. >>>>>>> >>>>>>> >>>>>>>> I'm not sure why you prefer an exception for errors other than >>>>>>>> ESRCH. Why should they be treated differently?
>>>>>>>> getThreadIntegerRegisterSet0() is used for finding the current >>>>>>>> frame for stack tracing. With my changes any failure will >>>>>>>> result in deferring to "last java frame" if set, and otherwise >>>>>>>> just not producing a stack trace (and the WARNING will be present >>>>>>>> in the output). This seems preferable to completely abandoning >>>>>>>> any further thread stack tracking. >>>>>>> >>>>>>> I'm not sure we can trust the call stack when ptrace() returns any >>>>>>> errors other than ESRCH even if "last java frame" is available. >>>>>>> For example, doesn't ptrace() return EFAULT or EIO when something >>>>>>> is wrong (e.g. stack corruption)? If so, it may lead to a wrong >>>>>>> analysis by the troubleshooter. >>>>>>> I think it should at least abort dumping the call stack for that >>>>>>> thread. >>>>>> Hi Yasumasa, >>>>>> >>>>>> In general stack walking makes a best effort and can be wrong, >>>>>> even when not getting errors like this. For any actively >>>>>> executing thread SA needs to determine where the stack starts, >>>>>> with register contents being the starting point (SP, FP, and PC). >>>>>> These registers could contain anything, and SA makes a best >>>>>> effort to determine a current frame from them. However, the >>>>>> verification steps it takes are not 100% guaranteed, and can lead >>>>>> to an incorrect assumption of the current frame, which in turn >>>>>> can result in an exception later on when walking the stack. See >>>>>> JDK-8247641. >>>>>> >>>>>> Keep in mind that the WARNING message will always be there. This >>>>>> should be enough to put the troubleshooter on alert that the >>>>>> stack trace may not be accurate. I think it's better to make an >>>>>> attempt at a stack trace than to just abandon it and not attempt >>>>>> to do something that may be useful.
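The fallback being debated here (warn, then defer to "last java frame" if it is set, otherwise produce no stack trace) might be sketched roughly as follows. This is a simplified illustration, not SA code: the class, method, and parameter names are hypothetical, and the real stack walker also validates register values before trusting them.

```java
import java.util.Optional;

// Hypothetical sketch of the "null register set" fallback discussed above.
class StartFrameSketch {
    // Returns a label for where stack walking would start,
    // or empty when no stack trace can be produced.
    static Optional<String> startingPoint(long[] registers, Long lastJavaSp) {
        if (registers != null) {
            // Registers were read; the walker starts from SP, FP, and PC.
            return Optional.of("registers");
        }
        // Reading registers failed: warn rather than throw.
        System.err.println("WARNING: could not read registers for this thread");
        if (lastJavaSp != null) {
            // Defer to the recorded "last java frame".
            return Optional.of("last java frame");
        }
        // Nothing to start from, so no stack trace for this thread.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(startingPoint(null, 0x00007f125f0f4770L));
    }
}
```

Returning an empty result instead of throwing keeps one unreadable thread from ending the dump of all remaining threads, which is the trade-off argued for in this thread.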
>>>>>> >>>>>> thanks, >>>>>> >>>>>> Chris >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Yasumasa >>>>>>> >>>>>>> >>>>>>>> thanks, >>>>>>>> >>>>>>>> Chris >>>>>>>> >>>>>>>> On 6/19/20 6:33 PM, Yasumasa Suenaga wrote: >>>>>>>>> Hi Chris, >>>>>>>>> >>>>>>>>> I checked Linux kernel code at a glance, ESRCH seems to be set >>>>>>>>> to errno by default. >>>>>>>>> So I guess it is similar to "generic" error code. >>>>>>>>> >>>>>>>>> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c >>>>>>>>> >>>>>>>>> According to manpage of ptrace(2), it might return errno other >>>>>>>>> than ESRCH. >>>>>>>>> For example, if we analyze broken core (e.g. the core was >>>>>>>>> dumped with disk full), we might get EFAULT. >>>>>>>>> Thus I prefer to handle ESRCH only in your patch, and also I >>>>>>>>> think SA should throw DebuggerException if other error is >>>>>>>>> occurred. >>>>>>>>> >>>>>>>>> https://www.man7.org/linux/man-pages/man2/ptrace.2.html >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Yasumasa >>>>>>>>> >>>>>>>>> >>>>>>>>> On 2020/06/20 5:51, Chris Plummer wrote: >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> I've? updated with webrev based on the new finding that a >>>>>>>>>> JavaThread cannot be on the ThreadList after its OS thread >>>>>>>>>> has been destroyed since the JavaThread removes itself from >>>>>>>>>> the ThreadList, and therefore must be running on its OS >>>>>>>>>> thread. The logic of the fix is unchanged from the first >>>>>>>>>> webrev, but I updated the comments to better reflect what is >>>>>>>>>> going on. 
I also updated the CR: >>>>>>>>>> >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> thanks, >>>>>>>>>> >>>>>>>>>> Chris >>>>>>>>>> >>>>>>>>>> On 6/19/20 12:24 AM, David Holmes wrote: >>>>>>>>>>> Hi Chris, >>>>>>>>>>> >>>>>>>>>>> On 19/06/2020 8:55 am, Chris Plummer wrote: >>>>>>>>>>>> On 6/18/20 1:43 AM, David Holmes wrote: >>>>>>>>>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote: >>>>>>>>>>>>>> On 6/17/20 10:29 PM, David Holmes wrote: >>>>>>>>>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>>>>>>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Please help review the following: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The CR contains all the needed details. Here's a >>>>>>>>>>>>>>>>>>>> summary of changes in each file: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The problem sounds to me like a variation of the >>>>>>>>>>>>>>>>>>> more general problem of not ensuring a thread is >>>>>>>>>>>>>>>>>>> kept alive whilst acting upon it. I don't know how >>>>>>>>>>>>>>>>>>> the SA finds these references to the threads it is >>>>>>>>>>>>>>>>>>> going to stackwalk, but is it possible to fix this >>>>>>>>>>>>>>>>>>> via appropriate uses of ThreadsListHandle/Iterator? >>>>>>>>>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Keep in mind that once SA attaches, nothing in the VM >>>>>>>>>>>>>>>>>> changes. 
For example, SA can't create a wrapper to a >>>>>>>>>>>>>>>>>> JavaThread, only to have the JavaThread be freed >>>>>>>>>>>>>>>>>> later on. It's just not possible. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Then how does it obtain a reference to a JavaThread >>>>>>>>>>>>>>>>> for which the native OS thread id is invalid? Any >>>>>>>>>>>>>>>>> thread found in _java_thread_list is either live or >>>>>>>>>>>>>>>>> still to be started. In the latter case the >>>>>>>>>>>>>>>>> JavaThread->osThread does not have its thread_id set yet. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> My assumption was that the JavaThread is in the process >>>>>>>>>>>>>>>> of being destroyed, and it has freed its OS thread but >>>>>>>>>>>>>>>> is itself still in the thread list. I did notice that >>>>>>>>>>>>>>>> the OS thread id being used looked to be in the range >>>>>>>>>>>>>>>> of thread id #'s you would expect for the running app, >>>>>>>>>>>>>>>> so that to me indicated it was once valid, but is no more. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Keep in mind that although hotspot may have >>>>>>>>>>>>>>>> synchronization code that prevents you from pulling a >>>>>>>>>>>>>>>> JavaThread off the thread list when it is in the >>>>>>>>>>>>>>>> process of being destroyed (I'm guessing it does), SA >>>>>>>>>>>>>>>> has no such protections. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But you stated that once the SA has attached, the target >>>>>>>>>>>>>>> VM can't change. If the SA gets its set of thread from >>>>>>>>>>>>>>> one attach then tries to make queries about those >>>>>>>>>>>>>>> threads in a separate attach, then obviously it could be >>>>>>>>>>>>>>> providing garbage thread information. So you would need >>>>>>>>>>>>>>> to re-validate the JavaThread in the target VM before >>>>>>>>>>>>>>> trying to do anything with it. >>>>>>>>>>>>>> That's not what is going on here. 
It's attaching and >>>>>>>>>>>>>> doing a stack trace, which involves getting the thread >>>>>>>>>>>>>> list and iterating through all threads without detaching. >>>>>>>>>>>>> >>>>>>>>>>>>> Okay so I restate my original comment - all the >>>>>>>>>>>>> JavaThreads must be alive or not yet started, so how are >>>>>>>>>>>>> you encountering an invalid thread id? Any thread you find >>>>>>>>>>>>> via the ThreadsList can't have destroyed its osThread. In >>>>>>>>>>>>> any case the logic should be checking thread->osThread() >>>>>>>>>>>>> for NULL, and then osThread()->get_state() to ensure it is >>>>>>>>>>>>> >= INITIALIZED before using the thread_id(). >>>>>>>>>>>> Hi David, >>>>>>>>>>>> >>>>>>>>>>>> I chatted with Dan about this, and he said since the >>>>>>>>>>>> JavaThread is responsible for removing itself from the >>>>>>>>>>>> ThreadList, it is impossible to have a JavaThread still on >>>>>>>>>>>> the ThreadList, but without an underlying OS Thread. So >>>>>>>>>>>> I'm a bit perplexed as to how I can find a JavaThread on >>>>>>>>>>>> the ThreadList, but that results in ESRCH when trying to >>>>>>>>>>>> access the thread with ptrace. My only conclusion is that >>>>>>>>>>>> this failure is somehow spurious, and maybe the issue is >>>>>>>>>>>> just that the thread is in some temporary state that >>>>>>>>>>>> prevents its access. If so, I still think the approach I'm >>>>>>>>>>>> taking is the correct one, but the comments should be updated. >>>>>>>>>>> >>>>>>>>>>> ESRCH can have other meanings but I don't know enough about >>>>>>>>>>> the broader context to know whether they are applicable in >>>>>>>>>>> this case. >>>>>>>>>>> >>>>>>>>>>> ESRCH The specified process does not exist, or is >>>>>>>>>>> not currently being traced by the caller, or is not stopped >>>>>>>>>>> (for requests that require a stopped tracee). >>>>>>>>>>> >>>>>>>>>>> I won't comment further on the fix/workaround as I don't >>>>>>>>>>> know the code.
I'll leave that to other folk. >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> I had one other finding. When this issue first turned up, >>>>>>>>>>>> it prevented the thread from getting a stack trace due to >>>>>>>>>>>> the exception being thrown. What I hadn't realized is that >>>>>>>>>>>> after fixing it to not throw an exception, which resulted >>>>>>>>>>>> in the stack walking code getting all nulls for register >>>>>>>>>>>> values, I actually started to see a stack trace printed: >>>>>>>>>>>> >>>>>>>>>>>> "JLine terminal non blocking reader thread" #26 daemon >>>>>>>>>>>> prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable >>>>>>>>>>>> [0x00007f125f0f4000] >>>>>>>>>>>> java.lang.Thread.State: RUNNABLE >>>>>>>>>>>> JavaThread state: _thread_in_native >>>>>>>>>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed >>>>>>>>>>>> for lwp (8089) >>>>>>>>>>>> CurrentFrameGuess: choosing last Java frame: sp = >>>>>>>>>>>> 0x00007f125f0f4770, fp = 0x00007f125f0f47c0 >>>>>>>>>>>> - java.io.FileInputStream.read0() @bci=0 (Interpreted frame) >>>>>>>>>>>> - java.io.FileInputStream.read() @bci=1, line=223 >>>>>>>>>>>> (Interpreted frame) >>>>>>>>>>>> - >>>>>>>>>>>> jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() >>>>>>>>>>>> @bci=108, line=216 (Interpreted frame) >>>>>>>>>>>> - >>>>>>>>>>>> jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() >>>>>>>>>>>> @bci=4 (Interpreted frame) >>>>>>>>>>>> - java.lang.Thread.run() @bci=11, line=832 (Interpreted >>>>>>>>>>>> frame) >>>>>>>>>>>> >>>>>>>>>>>> The "CurrentFrameGuess" output is some debug tracing I had >>>>>>>>>>>> enabled, and it indicates that the stack walking code is >>>>>>>>>>>> using the "last java frame" setting, which it will do if >>>>>>>>>>>> current register values don't indicate a valid frame (as >>>>>>>>>>>> would be the case if sp was null).
I had previously assumed >>>>>>>>>>>> that without an underlying valid LWP, there would be no >>>>>>>>>>>> stack trace. Given that there is one, there must be a valid >>>>>>>>>>>> LWP. Otherwise I don't see how the stack could have been >>>>>>>>>>>> walked. That's another indication that the ptrace failure >>>>>>>>>>>> is spurious in nature. >>>>>>>>>>>> >>>>>>>>>>>> thanks, >>>>>>>>>>>> >>>>>>>>>>>> Chris >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> Also, even if you are using something like clhsdb to >>>>>>>>>>>>>> issue commands on addresses, if the address is no longer >>>>>>>>>>>>>> valid for the command you are executing, then you would >>>>>>>>>>>>>> get the appropriate error when there is an attempt to >>>>>>>>>>>>>> create a wrapper for it. I don't know of any command that >>>>>>>>>>>>>> operates directly on a JavaThread, but I think there are >>>>>>>>>>>>>> some for InstanceKlass. So if you remembered the address of an >>>>>>>>>>>>>> InstanceKlass, and then reattached and tried a command >>>>>>>>>>>>>> that takes an InstanceKlass address, you would get an >>>>>>>>>>>>>> exception when SA tries to create the wrapper for the >>>>>>>>>>>>>> InstanceKlass if it were no longer a valid address for one.
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -Instead of throwing an exception when the OS >>>>>>>>>>>>>>>>>>>> ThreadID is invalid, print a warning. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>>>>>>>>>>>>> -Improve a print_debug message >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -Deal with the array of registers read in being >>>>>>>>>>>>>>>>>>>> null due to the OS ThreadID not being valid. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -Fix issue with >>>>>>>>>>>>>>>>>>>> "sun.jvm.hotspot.debugger.DebuggerException" >>>>>>>>>>>>>>>>>>>> appearing twice when printing the exception. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>> >>>> >> >> From serguei.spitsyn at oracle.com Wed Jun 24 19:26:22 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 24 Jun 2020 12:26:22 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> Message-ID: <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com> On 6/24/20 05:25, David Holmes wrote: > On 24/06/2020 8:14 pm, Alan Bateman wrote: >> On 24/06/2020 10:57, David Holmes wrote: >>> >>> But you are ignoring my next statement. If we remove the >>> setAccessible(true) then the premain method will not be accessible >>> as Serguei reported. >>> >>> Exception in thread "main" java.lang.IllegalAccessException: class >>> sun.instrument.InstrumentationImpl >>> ? (in module java.instrument) cannot access a member of class >>> SimpleAgent with modifiers "public static" >>> >>> I feel we are talking past each other on this issue with regards to >>> the IllegalAcessError that comes from the module system. >> This is nothing to do with the module system. 
If you drop the >> setAccessible(true) from JDK 6 or JDK 8 then you'll also get >> IllegalAccessException when the member is not accessible. > > Ah! The test class SimpleAgent is what is not public. That seems a bug > in the test. There are many such tests. We can break some of the existing agents by rejecting non-public agent classes. I'm inclined to continue using the setAccessible and just add an extra check for non-public premain/agentmain methods. Thanks, Serguei > Sorry for the confusion. > > David > ----- > >> I think the main thing that needs to be agreed here is whether to fix >> the bug or change the spec. My view is that fixing the bug should be >> low risk because (a) I've never seen an agent with a non-public >> premain method, and (b) Agents typically have to update or release >> frequently because of updates to the class file version. So yes, it >> would be a behavioral compatibility issue that requires CSR approval >> and requires follow-up release notes to document the change.
>> >> -Alan From mandy.chung at oracle.com Wed Jun 24 19:44:06 2020 From: mandy.chung at oracle.com (Mandy Chung) Date: Wed, 24 Jun 2020 12:44:06 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com> Message-ID: On 6/24/20 12:26 PM, serguei.spitsyn at oracle.com wrote: > On 6/24/20 05:25, David Holmes wrote: >> >> Ah! The test class SimpleAgent is what is not public. That seems a >> bug in the test. > > There are many such tests. > We can break some of the existing agents by rejecting non-public agent > classes. > I'm inclined to continue using the setAccessible and just add an extra > check for non-public premain/agentmain methods. There is only one non-public SimpleAgent which is shared by j.l.instrument tests.   test/jdk/java/lang/instrument/SimpleAgent.java test/hotspot/jtreg/runtime/cds/appcds/jvmti/dumpingWithAgent implements the agent properly (a public class and a public static void premain method). As the popular Java agents are conforming to the spec (publicly accessible premain method), the compatibility risk is low. Unless such a 
java agent exists and finds a strong compelling reason to argue that its premain method must be allowed non-public, I do not see the argument to change the spec to allow non-public agent classes. A bad test case is not a representative existing java agent. Mandy From serguei.spitsyn at oracle.com Wed Jun 24 20:07:32 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 24 Jun 2020 13:07:32 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com> Message-ID: <76c58309-cd04-f205-ca09-46a117590fad@oracle.com> On 6/24/20 12:44, Mandy Chung wrote: > > > On 6/24/20 12:26 PM, serguei.spitsyn at oracle.com wrote: >> On 6/24/20 05:25, David Holmes wrote: >>> >>> Ah! The test class SimpleAgent is what is not public. That seems a >>> bug in the test. >> >> There are many such tests. >> We can break some of the existing agents by rejecting non-public >> agent classes. >> I'm inclined to continue using the setAccessible and just add an >> extra check for non-public premain/agentmain methods. > > There is only one non-public SimpleAgent which is shared by > j.l.instrument tests. > 
test/jdk/java/lang/instrument/SimpleAgent.java

There are many such tests:

test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/TestLambdaFormRetransformation.java:class Agent implements ClassFileTransformer {
test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java:class NativeMethodPrefixAgent {
test/jdk/java/lang/instrument/PremainClass/NoPremainAgent.java:class NoPremainAgent {
test/jdk/java/lang/instrument/SimpleAgent.java:class SimpleAgent {
test/jdk/java/lang/instrument/RetransformAgent.java:class RetransformAgent {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0001.java:class InheritAgent0001 extends InheritAgent0001Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0001.java:class InheritAgent0001Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0010.java:class InheritAgent0010 extends InheritAgent0010Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0010.java:class InheritAgent0010Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0011.java:class InheritAgent0011 extends InheritAgent0011Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0011.java:class InheritAgent0011Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0100.java:class InheritAgent0100 extends InheritAgent0100Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0100.java:class InheritAgent0100Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0101.java:class InheritAgent0101 extends InheritAgent0101Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0101.java:class InheritAgent0101Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0110.java:class InheritAgent0110 extends InheritAgent0110Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0110.java:class InheritAgent0110Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0111.java:class InheritAgent0111 extends InheritAgent0111Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent0111.java:class InheritAgent0111Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1000.java:class InheritAgent1000 extends InheritAgent1000Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1000.java:class InheritAgent1000Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1001.java:class InheritAgent1001 extends InheritAgent1001Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1001.java:class InheritAgent1001Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1010.java:class InheritAgent1010 extends InheritAgent1010Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1010.java:class InheritAgent1010Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1011.java:class InheritAgent1011 extends InheritAgent1011Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1011.java:class InheritAgent1011Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1100.java:class InheritAgent1100 extends InheritAgent1100Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1100.java:class InheritAgent1100Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1101.java:class InheritAgent1101 extends InheritAgent1101Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1101.java:class InheritAgent1101Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1110.java:class InheritAgent1110 extends InheritAgent1110Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1110.java:class InheritAgent1110Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1111.java:class InheritAgent1111 extends InheritAgent1111Super {
test/jdk/java/lang/instrument/PremainClass/InheritAgent1111.java:class InheritAgent1111Super {

But it is not a big problem - all can be fixed.
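
(Illustrative sketch only, not part of the webrev.) The check being discussed amounts to asking, via reflection and without calling setAccessible(true), whether the agent's entry point is declared public. The helper below is a minimal sketch under stated assumptions: the names AgentEntryCheck, PublicPremainAgent, HiddenPremainAgent and declaresPublicEntryPoint are hypothetical, only the single-argument premain(String) form is looked up, and inherited methods (as exercised by the InheritAgent tests) are not searched.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper for illustration only; not code from the JDK or the webrev.
final class AgentEntryCheck {

    // Stand-in agent whose premain method is public, as the spec describes.
    static class PublicPremainAgent {
        public static void premain(String agentArgs) { }
    }

    // Stand-in agent whose premain method is package-private.
    static class HiddenPremainAgent {
        static void premain(String agentArgs) { }
    }

    // Returns true only if agentClass itself declares a public static method
    // named methodName that takes a single String argument. Inherited methods
    // are deliberately not searched in this sketch, and setAccessible is never
    // called: only the declared modifiers are inspected.
    static boolean declaresPublicEntryPoint(Class<?> agentClass, String methodName) {
        final Method m;
        try {
            m = agentClass.getDeclaredMethod(methodName, String.class);
        } catch (NoSuchMethodException e) {
            return false;
        }
        int mods = m.getModifiers();
        return Modifier.isPublic(mods) && Modifier.isStatic(mods);
    }

    public static void main(String[] args) {
        System.out.println(declaresPublicEntryPoint(PublicPremainAgent.class, "premain"));
        System.out.println(declaresPublicEntryPoint(HiddenPremainAgent.class, "premain"));
    }
}
```

Whether the agent class itself must also be public is a separate question in this thread; a similar Modifier.isPublic test on agentClass.getModifiers() would cover that case.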
> test/hotspot/jtreg/runtime/cds/appcds/jvmti/dumpingWithAgent > implements the agent properly (a public class and a public static void > premain method). > > As the popular Java agents are conforming to the spec (publicly > accessible premain method), the compatibility risk is low. > > Unless such a java agent exists and finds a strong compelling reason > to argue that its premain method must be allowed non-public, I do not > see the argument to change the spec to allow non-public agent classes. > > A bad test case is not a representative existing java agent. Okay, thanks. I'll prepare a fix with the setAccessible call removed. Thanks, Serguei > > Mandy From daniil.x.titov at oracle.com Wed Jun 24 23:06:42 2020 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Wed, 24 Jun 2020 16:06:42 -0700 Subject: RFR (S): 8245129: Enhance jstat gc option output and tests In-Reply-To: <5E73A203-4A05-480B-97F1-C4F3A090B293@amazon.com> References: <5E73A203-4A05-480B-97F1-C4F3A090B293@amazon.com> Message-ID: <65944D3F-353D-42BB-A552-713BCB6331E9@oracle.com> Hi Paul, The change looks good to me. Thanks! --Daniil On 6/22/20, 8:48 AM, "serviceability-dev on behalf of Hohensee, Paul" wrote: Thanks very much for review, Volker. I'll file a follow-up issue. One more reviewer, please? :) Paul On 6/22/20, 8:10 AM, "serviceability-dev on behalf of Volker Simonis" wrote: Hi Paul, thanks for fixing jstat for larger heaps. I like that you've added explicit tests for ParallelGC which hasn't been tested since G1 was made the default collector. I also agree that sizes should all be right justified. I only wonder if the header of a right justified column shouldn't be right justified as well? However, taking into account that this already hasn't been handled consistently before your change, I'm fine to postpone that to a follow-up cleanup change. I think the change looks good so thumbs up from me. Thank you and best regards, Volker On Thu, Jun 18, 2020 at 11:53 PM Hohensee, Paul wrote: > > Ping.
Any takers for this simple patch? > > > > Thanks, > > Paul > > > > From: serviceability-dev on behalf of "Hohensee, Paul" > Date: Monday, May 18, 2020 at 8:25 AM > To: serviceability-dev > Subject: RFR (S): 8245129: Enhance jstat gc option output and tests > > > > Please review an enhancement to the jstat gc option output to make the columns wider (for up to a 2TB heap) so one can read the output without going cross-eyed. > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8245129 > > Webrev: http://cr.openjdk.java.net/~phh/8245129/webrev.00/ > > > > I added tests using ParallelGC since the output can differ for non-G1 collectors. Successfully ran the test/hotspot/jtreg/serviceability/tmtools/jstat and test/jdk/sun/tools/jstat tests. A submit repo run had one failure > > > > runtime/MemberName/MemberNameLeak.java > > tier1 > > macosx-x64-debug > > > > but rerunning it on my laptop succeeded, and there's no connection between this test and my patch. > > > > Thanks, > > Paul > > > > From yasuenag at gmail.com Thu Jun 25 00:17:59 2020 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Thu, 25 Jun 2020 09:17:59 +0900 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <3f7bed81-1399-0e31-94bd-856cd233c2a2@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com> <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com> <17fbbb06-3cec-503b-73ea-c94156864ed7@oracle.com> <6a0152a9-a151-b49b-5003-f6cf73422973@gmail.com> <22872d51-e0c7-8a82-01d5-fed11748047c@gmail.com> <93dda82c-9a65-18b6-d34d-6dbe93fe310b@oracle.com> <060099cb-35bb-cbb5-ed0f-3c027ae7a1a7@gmail.com> Message-ID: On 2020/06/25 3:22, Chris Plummer wrote: > On 6/24/20 12:01 AM, Yasumasa Suenaga wrote:
>> On 2020/06/24 15:32, Chris Plummer wrote: >>> Hi Yasumasa, >>> >>> I think LinuxAMD64CFrame is used for pstack and what I've been looking at has been jstack, and in particular AMD64CurrentFrameGuess, which does use "last java frame". I'm not sure why LinuxAMD64CFrame does not look at "last java frame". Maybe it should. >> >> I considered both patterns (jstack, mixed stack) for this change. >> As you know, mixed jstack (jstack --mixed) attempts to find the top of the native stack via LinuxAMD64CFrame, and register values are needed for it (so it depends on the ptrace() call). So I guess mixed mode jstack (jhsdb jstack --mixed) would not show any stacks (cannot find "last java frame"). > Hi Yasumasa, > > I should have been more clear on what I meant by jstack and pstack. For jstack I meant using StackTrace.java, which is what you get by default with "jhsdb jstack" and also the clhsdb jstack command. For pstack I meant PStack.java, which is what you get with "jhsdb jstack --mixed" or the clhsdb pstack command. > > So this CR impacts both types of stack traces in that they will get null registers when the lower level API fails to get the register set. For StackTrace.java it will then defer to "last java frame" if available. For PStack.java it will not, and will always result in no stack trace. The code of interest is here:
>
>        AMD64ThreadContext context = (AMD64ThreadContext) thread.getContext();
>        Address pc  = context.getRegisterAsAddress(AMD64ThreadContext.RIP);
>        if (pc == null) return null;
>        return LinuxAMD64CFrame.getTopFrame(dbg, pc, context);
>
> So the question is should "last java frame" be used if pc == null. If so, then getTopFrame() would also need to be modified to use "last java frame" when fetching RBP. I don't think so because CFrame is defined as "Models a "C" programming language frame on the stack" in the javadoc, so it should have *valid* register values IMHO.
In addition, RIP is needed for Linux AMD64 at least because it would use DWARF since JDK-8234624. Thanks, Yasumasa > thanks, > > Chris >> >> >> Thanks, >> >> Yasumasa >> >> >>> thanks, >>> >>> Chris >>> >>> On 6/23/20 11:04 PM, Yasumasa Suenaga wrote: >>>> Hi Chris, >>>> >>>> Thanks you for explanation. >>>> Your change looks good (but "last java frame" would not be found in Linux AMD64 because RSP is NULL - cf. LinuxAMD64CFrame.java) >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>> On 2020/06/24 12:09, Chris Plummer wrote: >>>>> On 6/23/20 6:05 PM, Yasumasa Suenaga wrote: >>>>>> Hi Chris, >>>>>> >>>>>> Skillful troubleshooters who use jhsdb will aware this warnings, and they will take other appropriate methods. >>>>>> >>>>>> However, I'm not sure it is worth to continue to perform even if SA cannot get register values. >>>>>> >>>>>> For example, Linux AMD64 depends on RIP and RSP values to find top frame. >>>>>> According to your change, The caller of getThreadIntegerRegisterSet() has responsible for dealing with the set of null registers. However X86ThreadContext::data (it includes raw register values) would still be zero when it happens. >>>>> This is? what I intended to have happen. Just end up with a register set of all nulls. Then when stack walking code gets a null, it will revert to "last java frame" if available, otherwise no stack dump is done. >>>>>> >>>>>> So I think register holder (e.g. X86ThreadContext) should have tri-state (have registers, fail to get registers, not yet attempt to get registers). >>>>>> OTOH it might be over-engineering. What do you think? >>>>> Before implementing this I looked at the what would be the easier approach to get the desired effect of stack walking code simply failing over to using "last java frame", and decided the null set of registers was easiest. Other approaches involved more changes and impacted more files. 
>>>>> >>>>> thanks, >>>>> >>>>> Chris >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Yasumasa >>>>>> >>>>>> >>>>>> On 2020/06/24 3:16, Chris Plummer wrote: >>>>>>> On 6/20/20 12:53 AM, Yasumasa Suenaga wrote: >>>>>>>> Hi Chris, >>>>>>>> >>>>>>>> On 2020/06/20 15:20, Chris Plummer wrote: >>>>>>>>> Hi Yasumasa, >>>>>>>>> >>>>>>>>> ptrace is not used for core files, so the EFAULT for a bad core file is not a possibility. However, get_lwp_regs() does redirect to core_get_lwp_regs() for core files. It can fail, but the only reason it ever does is if the LWP can't be found in the core (which is never suppose to happen). I would think if this happened due to the core being truncated, SA would be blowing up all over the place with exceptions, probably before we ever get to this code, but in any cast what we do here wouldn't really make a difference. >>>>>>>> >>>>>>>> You are right, sorry. >>>>>>>> >>>>>>>> >>>>>>>>> I'm not sure why you prefer an exception for errors other than ESRCH. Why should they be treated differently? getThreadIntegerRegisterSet0() is used for finding the current frame for stack tracing. With my changes any failure will result in deferring to "last java frame" if set, and otherwise just not produce a stack trace (and the WARNING will be present in the output). This seems preferable to completely abandoning any further thread stack tracking. >>>>>>>> >>>>>>>> I'm not sure we can trust call stack when ptrace() returns any errors other than ESRCH even if "last java frame" is available. For example, don't ptrace() return EFAULT or EIO when something wrong? (e.g. stack corruption) If so, it may lead to a wrong analysis for troubleshooter. >>>>>>>> I think it should be abort dumping call stack for its thread at least. >>>>>>> Hi Yasumasa, >>>>>>> >>>>>>> In general stack walking makes a best effort and can be wrong, even when not getting errors like this. 
For any actively executing thread SA needs to determine where the stack starts, with register contents being the starting point (SP, FP, and PC). These registers could contain anything, and SA makes a best effort to determine a current frame from them. However, the verification steps it takes are not 100% guaranteed, and can lead to an incorrect assumption of the current frame, which in turn can result in an exception later on when walking the stack. See JDK-8247641. >>>>>>> >>>>>>> Keep in mind that the WARNING message will always be there. This should be enough to put the troubleshooter on alert that the stack trace may not be accurate. I think it's better to make an attempt at a stack trace then to just abandon it and not attempt to do something that may be useful. >>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> Chris >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Yasumasa >>>>>>>> >>>>>>>> >>>>>>>>> thanks, >>>>>>>>> >>>>>>>>> Chris >>>>>>>>> >>>>>>>>> On 6/19/20 6:33 PM, Yasumasa Suenaga wrote: >>>>>>>>>> Hi Chris, >>>>>>>>>> >>>>>>>>>> I checked Linux kernel code at a glance, ESRCH seems to be set to errno by default. >>>>>>>>>> So I guess it is similar to "generic" error code. >>>>>>>>>> >>>>>>>>>> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c >>>>>>>>>> >>>>>>>>>> According to manpage of ptrace(2), it might return errno other than ESRCH. >>>>>>>>>> For example, if we analyze broken core (e.g. the core was dumped with disk full), we might get EFAULT. >>>>>>>>>> Thus I prefer to handle ESRCH only in your patch, and also I think SA should throw DebuggerException if other error is occurred. >>>>>>>>>> >>>>>>>>>> https://www.man7.org/linux/man-pages/man2/ptrace.2.html >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Yasumasa >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 2020/06/20 5:51, Chris Plummer wrote: >>>>>>>>>>> Hello, >>>>>>>>>>> >>>>>>>>>>> I've? 
updated with webrev based on the new finding that a JavaThread cannot be on the ThreadList after its OS thread has been destroyed since the JavaThread removes itself from the ThreadList, and therefore must be running on its OS thread. The logic of the fix is unchanged from the first webrev, but I updated the comments to better reflect what is going on. I also updated the CR: >>>>>>>>>>> >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html >>>>>>>>>>> >>>>>>>>>>> thanks, >>>>>>>>>>> >>>>>>>>>>> Chris >>>>>>>>>>> >>>>>>>>>>> On 6/19/20 12:24 AM, David Holmes wrote: >>>>>>>>>>>> Hi Chris, >>>>>>>>>>>> >>>>>>>>>>>> On 19/06/2020 8:55 am, Chris Plummer wrote: >>>>>>>>>>>>> On 6/18/20 1:43 AM, David Holmes wrote: >>>>>>>>>>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote: >>>>>>>>>>>>>>> On 6/17/20 10:29 PM, David Holmes wrote: >>>>>>>>>>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Please help review the following: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The CR contains all the needed details. Here's a summary of changes in each file: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The problem sounds to me like a variation of the more general problem of not ensuring a thread is kept alive whilst acting upon it. 
I don't know how the SA finds these references to the threads it is going to stackwalk, but is it possible to fix this via appropriate uses of ThreadsListHandle/Iterator? >>>>>>>>>>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Keep in mind that once SA attaches, nothing in the VM changes. For example, SA can't create a wrapper to a JavaThread, only to have the JavaThread be freed later on. It's just not possible. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Then how does it obtain a reference to a JavaThread for which the native OS thread id is invalid? Any thread found in _java_thread_list is either live or still to be started. In the latter case the JavaThread->osThread does not have its thread_id set yet. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> My assumption was that the JavaThread is in the process of being destroyed, and it has freed its OS thread but is itself still in the thread list. I did notice that the OS thread id being used looked to be in the range of thread id #'s you would expect for the running app, so that to me indicated it was once valid, but is no more. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Keep in mind that although hotspot may have synchronization code that prevents you from pulling a JavaThread off the thread list when it is in the process of being destroyed (I'm guessing it does), SA has no such protections. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> But you stated that once the SA has attached, the target VM can't change. If the SA gets its set of thread from one attach then tries to make queries about those threads in a separate attach, then obviously it could be providing garbage thread information. So you would need to re-validate the JavaThread in the target VM before trying to do anything with it. >>>>>>>>>>>>>>> That's not what is going on here. It's attaching and doing a stack trace, which involves getting the thread list and iterating through all threads without detaching. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Okay so I restate my original comment - all the JavaThreads must be alive or not yet started, so how are you encountering an invalid thread id? Any thread you find via the ThreadsList can't have destroyed its osThread. In any case the logic should be checking thread->osThread() for NULL, and then osThread()->get_state() to ensure it is >= INITIALIZED before using the thread_id(). >>>>>>>>>>>>> Hi David, >>>>>>>>>>>>> >>>>>>>>>>>>> I chatted with Dan about this, and he said since the JavaThread is responsible for removing itself from the ThreadList, it is impossible to have a JavaThread still on the ThreadList, but without an underlying OS Thread. So I'm a bit perplexed as to how I can find a JavaThread on the ThreadList, but that results in ESRCH when trying to access the thread with ptrace. My only conclusion is that this failure is somehow spurious, and maybe the issue is just that the thread is in some temporary state that prevents its access. If so, I still think the approach I'm taking is the correct one, but the comments should be updated. >>>>>>>>>>>> >>>>>>>>>>>> ESRCH can have other meanings but I don't know enough about the broader context to know whether they are applicable in this case.
>>>>>>>>>>>>
>>>>>>>>>>>>     ESRCH  The specified process does not exist, or is not currently being traced by the caller, or is not stopped
>>>>>>>>>>>>            (for requests that require a stopped tracee).
>>>>>>>>>>>>
>>>>>>>>>>>> I won't comment further on the fix/workaround as I don't know the code. I'll leave that to other folk. >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>>> I had one other finding. When this issue first turned up, it prevented the thread from getting a stack trace due to the exception being thrown.
What I hadn't realized is that after fixing it to not throw an exception, which resulted in the stack walking code getting all nulls for register values, I actually started to see a stack trace printed:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "JLine terminal non blocking reader thread" #26 daemon prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000]
>>>>>>>>>>>>>     java.lang.Thread.State: RUNNABLE
>>>>>>>>>>>>>     JavaThread state: _thread_in_native
>>>>>>>>>>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp (8089)
>>>>>>>>>>>>> CurrentFrameGuess: choosing last Java frame: sp = 0x00007f125f0f4770, fp = 0x00007f125f0f47c0
>>>>>>>>>>>>>   - java.io.FileInputStream.read0() @bci=0 (Interpreted frame)
>>>>>>>>>>>>>   - java.io.FileInputStream.read() @bci=1, line=223 (Interpreted frame)
>>>>>>>>>>>>>   - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() @bci=108, line=216 (Interpreted frame)
>>>>>>>>>>>>>   - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() @bci=4 (Interpreted frame)
>>>>>>>>>>>>>   - java.lang.Thread.run() @bci=11, line=832 (Interpreted frame)
>>>>>>>>>>>>>
>>>>>>>>>>>>> The "CurrentFrameGuess" output is some debug tracing I had enabled, and it indicates that the stack walking code is using the "last java frame" setting, which it will do if current register values don't indicate a valid frame (as would be the case if sp was null). I had previously assumed that without an underlying valid LWP, there would be no stack trace. Given that there is one, there must be a valid LWP. Otherwise I don't see how the stack could have been walked. That's another indication that the ptrace failure is spurious in nature.
>>>>>>>>>>>>> >>>>>>>>>>>>> thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Chris >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> David >>>>>>>>>>>>>> ----- >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Also, even if you are using something like clhsdb to issue commands on addresses, if the address is no longer valid for the command you are executing, then you would get the appropriate error when there is an attempt to create a wrapper for it. I don't know of any command that operates directly on a JavaThread, but I think there are for InstanceKlass. So if you remembered the address of an InstanceKlass, and then reattached and tried a command that takes an InstanceKlass address, you would get an exception when SA tries to create the wrapper for the InsanceKlass if it were no longer a valid address for one. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>>>>>>>>>>>>>> -Instead of throwing an exception when the OS ThreadID is invalid, print a warning. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>>>>>>>>>>>>>> -Improve a print_debug message >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>>>>>>>>>>>>>> -Deal with the array of registers read in being null due to the OS ThreadID not being valid. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>>>>>>>>>>>>>>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" appearing twice when printing the exception. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>> >>> > > From chris.plummer at oracle.com Thu Jun 25 01:00:38 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 24 Jun 2020 18:00:38 -0700 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com> <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com> <17fbbb06-3cec-503b-73ea-c94156864ed7@oracle.com> <6a0152a9-a151-b49b-5003-f6cf73422973@gmail.com> 
<22872d51-e0c7-8a82-01d5-fed11748047c@gmail.com> <93dda82c-9a65-18b6-d34d-6dbe93fe310b@oracle.com> <060099cb-35bb-cbb5-ed0f-3c027ae7a1a7@gmail.com> Message-ID: <4d8cff28-8405-fc6a-8361-cfeaae8fc166@oracle.com> On 6/24/20 5:17 PM, Yasumasa Suenaga wrote: > On 2020/06/25 3:22, Chris Plummer wrote: >> On 6/24/20 12:01 AM, Yasumasa Suenaga wrote: >>> On 2020/06/24 15:32, Chris Plummer wrote: >>>> Hi Yasumasa , >>>> >>>> I think LinuxAMD64CFrame is used for pstack and what I've been >>>> looking at has been jstack, and in particular >>>> AMD64CurrentFrameGuess, which does use "last java frame". I'm not >>>> sure why LinuxAMD64CFrame does not look at "last java frame". Maybe >>>> it should. >>> >>> I thought both pattern (jstack, mixed stack) for this change. >>> As you know, mixed jstack (jstack --mixed) attempt to find top of >>> native stack via LinuxAMD64CFrame, register values are needed for it >>> (so it depends on ptrace() call). So I guess mixed mode jstack >>> (jhsdb jstack --mixed) would not show any stacks (cannot find "last >>> java frame"). >> Hi Yasumasa, >> >> I should have been more clear on what I meant by jstack and pstack. >> For jstack I meant using StackTrace.java, which is what you get by >> default with "jhsdb jstack" and also the clhsdb jstack command. For >> pstack I meant PStack.java, which is what you get with "jhsdb jstack >> --mixed" or the clhsdb pstack command. >> >> So this CR impacts both types of stack traces in that they will get >> null registers when the the lower level API fails to get the register >> set. For StackTrace.java it will then defer to "last java frame" if >> available. For PStack.java it will not, and will always result in no >> stack trace. The code of interest is here: >> >> ??????? AMD64ThreadContext context = (AMD64ThreadContext) >> thread.getContext(); >> ??????? Address pc? = >> context.getRegisterAsAddress(AMD64ThreadContext.RIP); >> ??????? if (pc == null) return null; >> ??????? 
return LinuxAMD64CFrame.getTopFrame(dbg, pc, context); >> >> So the question is should "last java frame" be used if pc == null. If >> so, then getTopFrame() would also need to be modified to use "last >> java frame" when fetching RBP. > > I don't think so because CFrame is defined as "Models a "C" > programming language frame on the stack" in the javadoc, so it should > have *valid* register values IMHO. > In addition, RIP is needed for Linux AMD64 at least because it would > use DWARF since JDK-8234624. >

Hi Yasumasa,

I don't quite understand the "C" frame nomenclature, since CFrame is also used for non-C frames. The PStack code roughly does the following:

    CFrame f = cdbg.topFrameForThread();
    ClosestSymbol sym = f.closestSymbolToPC();
    Address pc = f.pc();
    if (sym != null) {
       ... native symbol
    } else if (interp.contains(pc)) {
       ... print interpreter frame

So if the CFrame were filled in with "last java frame" values, it should allow PStack to print the stack starting with the "last java frame". Any native frame below that point would be missed.

Chris

> > Thanks, > > Yasumasa > > >> thanks, >> >> Chris >>> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>>> thanks, >>>> >>>> Chris >>>> >>>> On 6/23/20 11:04 PM, Yasumasa Suenaga wrote: >>>>> Hi Chris, >>>>> >>>>> Thank you for the explanation. >>>>> Your change looks good (but "last java frame" would not be found >>>>> in Linux AMD64 because RSP is NULL - cf. LinuxAMD64CFrame.java) >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>> On 2020/06/24 12:09, Chris Plummer wrote: >>>>>> On 6/23/20 6:05 PM, Yasumasa Suenaga wrote: >>>>>>> Hi Chris, >>>>>>> >>>>>>> Skillful troubleshooters who use jhsdb will be aware of these warnings, >>>>>>> and they will take other appropriate methods. >>>>>>> >>>>>>> However, I'm not sure it is worth continuing even if >>>>>>> SA cannot get register values. >>>>>>> >>>>>>> For example, Linux AMD64 depends on RIP and RSP values to find >>>>>>> top frame. 
>>>>>>> According to your change, The caller of >>>>>>> getThreadIntegerRegisterSet() has responsible for dealing with >>>>>>> the set of null registers. However X86ThreadContext::data (it >>>>>>> includes raw register values) would still be zero when it happens. >>>>>> This is? what I intended to have happen. Just end up with a >>>>>> register set of all nulls. Then when stack walking code gets a >>>>>> null, it will revert to "last java frame" if available, otherwise >>>>>> no stack dump is done. >>>>>>> >>>>>>> So I think register holder (e.g. X86ThreadContext) should have >>>>>>> tri-state (have registers, fail to get registers, not yet >>>>>>> attempt to get registers). >>>>>>> OTOH it might be over-engineering. What do you think? >>>>>> Before implementing this I looked at the what would be the easier >>>>>> approach to get the desired effect of stack walking code simply >>>>>> failing over to using "last java frame", and decided the null set >>>>>> of registers was easiest. Other approaches involved more changes >>>>>> and impacted more files. >>>>>> >>>>>> thanks, >>>>>> >>>>>> Chris >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Yasumasa >>>>>>> >>>>>>> >>>>>>> On 2020/06/24 3:16, Chris Plummer wrote: >>>>>>>> On 6/20/20 12:53 AM, Yasumasa Suenaga wrote: >>>>>>>>> Hi Chris, >>>>>>>>> >>>>>>>>> On 2020/06/20 15:20, Chris Plummer wrote: >>>>>>>>>> Hi Yasumasa, >>>>>>>>>> >>>>>>>>>> ptrace is not used for core files, so the EFAULT for a bad >>>>>>>>>> core file is not a possibility. However, get_lwp_regs() does >>>>>>>>>> redirect to core_get_lwp_regs() for core files. It can fail, >>>>>>>>>> but the only reason it ever does is if the LWP can't be found >>>>>>>>>> in the core (which is never suppose to happen). 
I would think >>>>>>>>>> if this happened due to the core being truncated, SA would be >>>>>>>>>> blowing up all over the place with exceptions, probably >>>>>>>>>> before we ever get to this code, but in any cast what we do >>>>>>>>>> here wouldn't really make a difference. >>>>>>>>> >>>>>>>>> You are right, sorry. >>>>>>>>> >>>>>>>>> >>>>>>>>>> I'm not sure why you prefer an exception for errors other >>>>>>>>>> than ESRCH. Why should they be treated differently? >>>>>>>>>> getThreadIntegerRegisterSet0() is used for finding the >>>>>>>>>> current frame for stack tracing. With my changes any failure >>>>>>>>>> will result in deferring to "last java frame" if set, and >>>>>>>>>> otherwise just not produce a stack trace (and the WARNING >>>>>>>>>> will be present in the output). This seems preferable to >>>>>>>>>> completely abandoning any further thread stack tracking. >>>>>>>>> >>>>>>>>> I'm not sure we can trust call stack when ptrace() returns any >>>>>>>>> errors other than ESRCH even if "last java frame" is >>>>>>>>> available. For example, don't ptrace() return EFAULT or EIO >>>>>>>>> when something wrong? (e.g. stack corruption) If so, it may >>>>>>>>> lead to a wrong analysis for troubleshooter. >>>>>>>>> I think it should be abort dumping call stack for its thread >>>>>>>>> at least. >>>>>>>> Hi Yasumasa, >>>>>>>> >>>>>>>> In general stack walking makes a best effort and can be wrong, >>>>>>>> even when not getting errors like this. For any actively >>>>>>>> executing thread SA needs to determine where the stack starts, >>>>>>>> with register contents being the starting point (SP, FP, and >>>>>>>> PC). These registers could contain anything, and SA makes a >>>>>>>> best effort to determine a current frame from them. However, >>>>>>>> the verification steps it takes are not 100% guaranteed, and >>>>>>>> can lead to an incorrect assumption of the current frame, which >>>>>>>> in turn can result in an exception later on when walking the >>>>>>>> stack. 
See JDK-8247641. >>>>>>>> >>>>>>>> Keep in mind that the WARNING message will always be there. >>>>>>>> This should be enough to put the troubleshooter on alert that >>>>>>>> the stack trace may not be accurate. I think it's better to >>>>>>>> make an attempt at a stack trace then to just abandon it and >>>>>>>> not attempt to do something that may be useful. >>>>>>>> >>>>>>>> thanks, >>>>>>>> >>>>>>>> Chris >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Yasumasa >>>>>>>>> >>>>>>>>> >>>>>>>>>> thanks, >>>>>>>>>> >>>>>>>>>> Chris >>>>>>>>>> >>>>>>>>>> On 6/19/20 6:33 PM, Yasumasa Suenaga wrote: >>>>>>>>>>> Hi Chris, >>>>>>>>>>> >>>>>>>>>>> I checked Linux kernel code at a glance, ESRCH seems to be >>>>>>>>>>> set to errno by default. >>>>>>>>>>> So I guess it is similar to "generic" error code. >>>>>>>>>>> >>>>>>>>>>> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c >>>>>>>>>>> >>>>>>>>>>> According to manpage of ptrace(2), it might return errno >>>>>>>>>>> other than ESRCH. >>>>>>>>>>> For example, if we analyze broken core (e.g. the core was >>>>>>>>>>> dumped with disk full), we might get EFAULT. >>>>>>>>>>> Thus I prefer to handle ESRCH only in your patch, and also I >>>>>>>>>>> think SA should throw DebuggerException if other error is >>>>>>>>>>> occurred. >>>>>>>>>>> >>>>>>>>>>> https://www.man7.org/linux/man-pages/man2/ptrace.2.html >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Yasumasa >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 2020/06/20 5:51, Chris Plummer wrote: >>>>>>>>>>>> Hello, >>>>>>>>>>>> >>>>>>>>>>>> I've? updated with webrev based on the new finding that a >>>>>>>>>>>> JavaThread cannot be on the ThreadList after its OS thread >>>>>>>>>>>> has been destroyed since the JavaThread removes itself from >>>>>>>>>>>> the ThreadList, and therefore must be running on its OS >>>>>>>>>>>> thread. 
The logic of the fix is unchanged from the first >>>>>>>>>>>> webrev, but I updated the comments to better reflect what >>>>>>>>>>>> is going on. I also updated the CR: >>>>>>>>>>>> >>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> thanks, >>>>>>>>>>>> >>>>>>>>>>>> Chris >>>>>>>>>>>> >>>>>>>>>>>> On 6/19/20 12:24 AM, David Holmes wrote: >>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>> >>>>>>>>>>>>> On 19/06/2020 8:55 am, Chris Plummer wrote: >>>>>>>>>>>>>> On 6/18/20 1:43 AM, David Holmes wrote: >>>>>>>>>>>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>> On 6/17/20 10:29 PM, David Holmes wrote: >>>>>>>>>>>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Please help review the following: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> The CR contains all the needed details. Here's a >>>>>>>>>>>>>>>>>>>>>> summary of changes in each file: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The problem sounds to me like a variation of the >>>>>>>>>>>>>>>>>>>>> more general problem of not ensuring a thread is >>>>>>>>>>>>>>>>>>>>> kept alive whilst acting upon it. 
I don't know how >>>>>>>>>>>>>>>>>>>>> the SA finds these references to the threads it is >>>>>>>>>>>>>>>>>>>>> going to stackwalk, but is it possible to fix this >>>>>>>>>>>>>>>>>>>>> via appropriate uses of ThreadsListHandle/Iterator? >>>>>>>>>>>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Keep in mind that once SA attaches, nothing in the >>>>>>>>>>>>>>>>>>>> VM changes. For example, SA can't create a wrapper >>>>>>>>>>>>>>>>>>>> to a JavaThread, only to have the JavaThread be >>>>>>>>>>>>>>>>>>>> freed later on. It's just not possible. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Then how does it obtain a reference to a JavaThread >>>>>>>>>>>>>>>>>>> for which the native OS thread id is invalid? Any >>>>>>>>>>>>>>>>>>> thread found in _java_thread_list is either live or >>>>>>>>>>>>>>>>>>> still to be started. In the latter case the >>>>>>>>>>>>>>>>>>> JavaThread->osThread does not have its thread_id set >>>>>>>>>>>>>>>>>>> yet. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> My assumption was that the JavaThread is in the >>>>>>>>>>>>>>>>>> process of being destroyed, and it has freed its OS >>>>>>>>>>>>>>>>>> thread but is itself still in the thread list. I did >>>>>>>>>>>>>>>>>> notice that the OS thread id being used looked to be >>>>>>>>>>>>>>>>>> in the range of thread id #'s you would expect for >>>>>>>>>>>>>>>>>> the running app, so that to me indicated it was once >>>>>>>>>>>>>>>>>> valid, but is no more. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Keep in mind that although hotspot may have >>>>>>>>>>>>>>>>>> synchronization code that prevents you from pulling a >>>>>>>>>>>>>>>>>> JavaThread off the thread list when it is in the >>>>>>>>>>>>>>>>>> process of being destroyed (I'm guessing it does), SA >>>>>>>>>>>>>>>>>> has no such protections. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> But you stated that once the SA has attached, the >>>>>>>>>>>>>>>>> target VM can't change. 
If the SA gets its set of >>>>>>>>>>>>>>>>> thread from one attach then tries to make queries >>>>>>>>>>>>>>>>> about those threads in a separate attach, then >>>>>>>>>>>>>>>>> obviously it could be providing garbage thread >>>>>>>>>>>>>>>>> information. So you would need to re-validate the >>>>>>>>>>>>>>>>> JavaThread in the target VM before trying to do >>>>>>>>>>>>>>>>> anything with it. >>>>>>>>>>>>>>>> That's not what is going on here. It's attaching and >>>>>>>>>>>>>>>> doing a stack trace, which involves getting the thread >>>>>>>>>>>>>>>> list and iterating through all threads without detaching. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Okay so I restate my original comment - all the >>>>>>>>>>>>>>> JavaThreads must be alive or not yet started, so how are >>>>>>>>>>>>>>> you encountering an invalid thread id? Any thread you >>>>>>>>>>>>>>> find via the ThreadsList can't have destroyed its >>>>>>>>>>>>>>> osThread. In any case the logic should be checking >>>>>>>>>>>>>>> thread->osThread() for NULL, and then >>>>>>>>>>>>>>> osThread()->get_state() to ensure it is >= INITIALIZED >>>>>>>>>>>>>>> before using the thread_id(). >>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I chatted with Dan about this, and he said since the >>>>>>>>>>>>>> JavaThread is responsible for removing itself from the >>>>>>>>>>>>>> ThreadList, it is impossible to have a JavaThread still >>>>>>>>>>>>>> on the ThreadList, but without and underlying OS Thread. >>>>>>>>>>>>>> So I'm a bit perplexed as to how I can find a JavaThread >>>>>>>>>>>>>> on the ThreadList, but that results in ESRCH when trying >>>>>>>>>>>>>> to access the thread with ptrace. My only conclusion is >>>>>>>>>>>>>> that this failure is somehow spurious, and maybe the >>>>>>>>>>>>>> issue it just that the thread is in some temporary state >>>>>>>>>>>>>> that prevents its access. If so, I still think the >>>>>>>>>>>>>> approach I'm taking is the correct one, but the comments >>>>>>>>>>>>>> should be updated. 
>>>>>>>>>>>>> >>>>>>>>>>>>> ESRCH can have other meanings but I don't know enough >>>>>>>>>>>>> about the broader context to know whether they are >>>>>>>>>>>>> applicable in this case. >>>>>>>>>>>>> >>>>>>>>>>>>> ESRCH The specified process does not exist, or is >>>>>>>>>>>>> not currently being traced by the caller, or is not stopped >>>>>>>>>>>>> (for requests that require a stopped tracee). >>>>>>>>>>>>> >>>>>>>>>>>>> I won't comment further on the fix/workaround as I don't >>>>>>>>>>>>> know the code. I'll leave that to other folk. >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> I had one other finding. When this issue first turned up, >>>>>>>>>>>>>> it prevented the thread from getting a stack trace due to >>>>>>>>>>>>>> the exception being thrown. What I hadn't realized is that >>>>>>>>>>>>>> after fixing it to not throw an exception, which resulted >>>>>>>>>>>>>> in the stack walking code getting all nulls for register >>>>>>>>>>>>>> values, I actually started to see a stack trace printed: >>>>>>>>>>>>>> >>>>>>>>>>>>>> "JLine terminal non blocking reader thread" #26 daemon >>>>>>>>>>>>>> prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable >>>>>>>>>>>>>> [0x00007f125f0f4000] >>>>>>>>>>>>>> java.lang.Thread.State: RUNNABLE >>>>>>>>>>>>>> 
JavaThread state: _thread_in_native >>>>>>>>>>>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs >>>>>>>>>>>>>> failed for lwp (8089) >>>>>>>>>>>>>> CurrentFrameGuess: choosing last Java frame: sp = >>>>>>>>>>>>>> 0x00007f125f0f4770, fp = 0x00007f125f0f47c0 >>>>>>>>>>>>>> - java.io.FileInputStream.read0() @bci=0 (Interpreted >>>>>>>>>>>>>> frame) >>>>>>>>>>>>>> - java.io.FileInputStream.read() @bci=1, line=223 >>>>>>>>>>>>>> (Interpreted frame) >>>>>>>>>>>>>> - >>>>>>>>>>>>>> jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() >>>>>>>>>>>>>> @bci=108, line=216 (Interpreted frame) >>>>>>>>>>>>>> - >>>>>>>>>>>>>> jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() >>>>>>>>>>>>>> @bci=4 (Interpreted frame) >>>>>>>>>>>>>> - java.lang.Thread.run() @bci=11, line=832 (Interpreted >>>>>>>>>>>>>> frame) >>>>>>>>>>>>>> >>>>>>>>>>>>>> The "CurrentFrameGuess" output is some debug tracing I >>>>>>>>>>>>>> had enabled, and it indicates that the stack walking code >>>>>>>>>>>>>> is using the "last java frame" setting, which it will do >>>>>>>>>>>>>> if the current register values don't indicate a valid frame >>>>>>>>>>>>>> (as would be the case if sp was null). I had previously >>>>>>>>>>>>>> assumed that without an underlying valid LWP, there would >>>>>>>>>>>>>> be no stack trace. Given that there is one, there must be >>>>>>>>>>>>>> a valid LWP. Otherwise I don't see how the stack could >>>>>>>>>>>>>> have been walked. That's another indication that the >>>>>>>>>>>>>> ptrace failure is spurious in nature. 
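The fallback described in the quoted message above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: FrameGuessSketch, Frame, and chooseStartFrame are hypothetical names rather than real SA classes, the {sp, fp} register layout is invented for the example, and the real frame-guessing code (for instance AMD64CurrentFrameGuess) performs validation that is omitted here.

```java
import java.util.Optional;

/** Illustrative only: hypothetical stand-ins, not the real SA types. */
final class FrameGuessSketch {

    /** A candidate frame, identified by raw stack and frame pointer values. */
    static final class Frame {
        final long sp;
        final long fp;

        Frame(long sp, long fp) {
            this.sp = sp;
            this.fp = fp;
        }
    }

    /**
     * Chooses the frame to start stack walking from.
     *
     * @param regs          assumed layout {sp, fp}; null when the register
     *                      read failed (e.g. get_lwp_regs reported an error)
     * @param lastJavaFrame the thread's recorded "last java frame", or null
     * @return the starting frame, or empty if no stack trace can be produced
     */
    static Optional<Frame> chooseStartFrame(long[] regs, Frame lastJavaFrame) {
        // Prefer live register values when they were read and sp is non-null.
        if (regs != null && regs.length >= 2 && regs[0] != 0L) {
            return Optional.of(new Frame(regs[0], regs[1]));
        }
        // Otherwise fall back to the last Java frame, if one was recorded.
        return Optional.ofNullable(lastJavaFrame);
    }
}
```

With a null register array and a recorded last Java frame, the sketch returns that frame, which mirrors the "choosing last Java frame" line in the trace; with neither available it returns empty, matching the "no stack trace" outcome.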
>>>>>>>>>>>>>> >>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Also, even if you are using something like clhsdb to >>>>>>>>>>>>>>>> issue commands on addresses, if the address is no >>>>>>>>>>>>>>>> longer valid for the command you are executing, then >>>>>>>>>>>>>>>> you would get the appropriate error when there is an >>>>>>>>>>>>>>>> attempt to create a wrapper for it. I don't know of any >>>>>>>>>>>>>>>> command that operates directly on a JavaThread, but I >>>>>>>>>>>>>>>> think there are for InstanceKlass. So if you remembered >>>>>>>>>>>>>>>> the address of an InstanceKlass, and then reattached >>>>>>>>>>>>>>>> and tried a command that takes an InstanceKlass >>>>>>>>>>>>>>>> address, you would get an exception when SA tries to >>>>>>>>>>>>>>>> create the wrapper for the InsanceKlass if it were no >>>>>>>>>>>>>>>> longer a valid address for one. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -Instead of throwing an exception when the OS >>>>>>>>>>>>>>>>>>>>>> ThreadID is invalid, print a warning. 
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -Improve a print_debug message >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -Deal with the array of registers read in being >>>>>>>>>>>>>>>>>>>>>> null due to the OS ThreadID not being valid. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -Fix issue with >>>>>>>>>>>>>>>>>>>>>> "sun.jvm.hotspot.debugger.DebuggerException" >>>>>>>>>>>>>>>>>>>>>> appearing twice when printing the exception. 
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>> >>>> >> >> From yasuenag at gmail.com Thu Jun 25 01:53:41 2020 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Thu, 25 Jun 2020 10:53:41 +0900 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: <4d8cff28-8405-fc6a-8361-cfeaae8fc166@oracle.com> References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com> <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com> <17fbbb06-3cec-503b-73ea-c94156864ed7@oracle.com> <6a0152a9-a151-b49b-5003-f6cf73422973@gmail.com> <22872d51-e0c7-8a82-01d5-fed11748047c@gmail.com> <93dda82c-9a65-18b6-d34d-6dbe93fe310b@oracle.com> <060099cb-35bb-cbb5-ed0f-3c027ae7a1a7@gmail.com> <4d8cff28-8405-fc6a-8361-cfeaae8fc166@oracle.com> Message-ID: <1ffd9712-f27d-06ca-f6ce-260f10262830@gmail.com> On 2020/06/25 10:00, Chris Plummer wrote: > On 6/24/20 5:17 PM, Yasumasa Suenaga wrote: >> On 2020/06/25 3:22, Chris Plummer wrote: >>> On 6/24/20 12:01 AM, Yasumasa Suenaga wrote: >>>> On 2020/06/24 15:32, Chris Plummer wrote: >>>>> Hi Yasumasa , >>>>> >>>>> I think LinuxAMD64CFrame is used for pstack and what I've been looking at has been jstack, and in particular AMD64CurrentFrameGuess, which does use "last java frame". I'm not sure why LinuxAMD64CFrame does not look at "last java frame". Maybe it should. >>>> >>>> I thought both pattern (jstack, mixed stack) for this change. 
>>>> As you know, mixed jstack (jstack --mixed) attempts to find the top of the native stack via LinuxAMD64CFrame, and register values are needed for it (so it depends on the ptrace() call). So I guess mixed mode jstack (jhsdb jstack --mixed) would not show any stacks (cannot find "last java frame"). >>> Hi Yasumasa, >>> >>> I should have been more clear on what I meant by jstack and pstack. For jstack I meant using StackTrace.java, which is what you get by default with "jhsdb jstack" and also the clhsdb jstack command. For pstack I meant PStack.java, which is what you get with "jhsdb jstack --mixed" or the clhsdb pstack command. >>> >>> So this CR impacts both types of stack traces in that they will get null registers when the lower level API fails to get the register set. For StackTrace.java it will then defer to "last java frame" if available. For PStack.java it will not, and will always result in no stack trace. The code of interest is here: >>> >>> AMD64ThreadContext context = (AMD64ThreadContext) thread.getContext(); >>> Address pc = context.getRegisterAsAddress(AMD64ThreadContext.RIP); >>> if (pc == null) return null; >>> return LinuxAMD64CFrame.getTopFrame(dbg, pc, context); >>> >>> So the question is should "last java frame" be used if pc == null. If so, then getTopFrame() would also need to be modified to use "last java frame" when fetching RBP. >> >> I don't think so because CFrame is defined as "Models a "C" programming language frame on the stack" in the javadoc, so it should have *valid* register values IMHO. >> In addition, RIP is needed for Linux AMD64 at least because it would use DWARF since JDK-8234624. >> > Hi Yasumasa, > > I don't quite understand the "C" frame nomenclature, since CFrame is also used for non-C frames. The PStack code roughly does the following: > > CFrame f = cdbg.topFrameForThread(); > ClosestSymbol sym = f.closestSymbolToPC(); > Address pc = f.pc(); > if (sym != null) { > ... 
native symbol > } else if (interp.contains(pc)) { > ... print interpreter frame > > So if the CFrame were filled in with "last java frame" values, it should allow PStack to print the stack starting with the "last java frame". Any native frame below that point would be missed.

Using "last java frame" in this case looks good because stack unwinding is a best-effort operation. However, PStack::run is PC-driven. I would like to respect that: in other words, it should not proceed if we cannot get register values, even if "last java frame" is available.

Thanks,

Yasumasa

> Chris >> >> Thanks, >> >> Yasumasa >> >> >>> thanks, >>> >>> Chris >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>>> thanks, >>>>> >>>>> Chris >>>>> >>>>> On 6/23/20 11:04 PM, Yasumasa Suenaga wrote: >>>>>> Hi Chris, >>>>>> >>>>>> Thank you for the explanation. >>>>>> Your change looks good (but "last java frame" would not be found in Linux AMD64 because RSP is NULL - cf. LinuxAMD64CFrame.java) >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Yasumasa >>>>>> >>>>>> >>>>>> On 2020/06/24 12:09, Chris Plummer wrote: >>>>>>> On 6/23/20 6:05 PM, Yasumasa Suenaga wrote: >>>>>>>> Hi Chris, >>>>>>>> >>>>>>>> Skillful troubleshooters who use jhsdb will be aware of these warnings, and they will take other appropriate methods. >>>>>>>> >>>>>>>> However, I'm not sure it is worth continuing even if SA cannot get register values. >>>>>>>> >>>>>>>> For example, Linux AMD64 depends on RIP and RSP values to find top frame. >>>>>>>> According to your change, the caller of getThreadIntegerRegisterSet() is responsible for dealing with the set of null registers. However X86ThreadContext::data (it includes raw register values) would still be zero when it happens. >>>>>>> This is what I intended to have happen. Just end up with a register set of all nulls. Then when stack walking code gets a null, it will revert to "last java frame" if available, otherwise no stack dump is done. 
>>>>>>>> >>>>>>>> So I think register holder (e.g. X86ThreadContext) should have tri-state (have registers, fail to get registers, not yet attempt to get registers). >>>>>>>> OTOH it might be over-engineering. What do you think? >>>>>>> Before implementing this I looked at the what would be the easier approach to get the desired effect of stack walking code simply failing over to using "last java frame", and decided the null set of registers was easiest. Other approaches involved more changes and impacted more files. >>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> Chris >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Yasumasa >>>>>>>> >>>>>>>> >>>>>>>> On 2020/06/24 3:16, Chris Plummer wrote: >>>>>>>>> On 6/20/20 12:53 AM, Yasumasa Suenaga wrote: >>>>>>>>>> Hi Chris, >>>>>>>>>> >>>>>>>>>> On 2020/06/20 15:20, Chris Plummer wrote: >>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>> >>>>>>>>>>> ptrace is not used for core files, so the EFAULT for a bad core file is not a possibility. However, get_lwp_regs() does redirect to core_get_lwp_regs() for core files. It can fail, but the only reason it ever does is if the LWP can't be found in the core (which is never suppose to happen). I would think if this happened due to the core being truncated, SA would be blowing up all over the place with exceptions, probably before we ever get to this code, but in any cast what we do here wouldn't really make a difference. >>>>>>>>>> >>>>>>>>>> You are right, sorry. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> I'm not sure why you prefer an exception for errors other than ESRCH. Why should they be treated differently? getThreadIntegerRegisterSet0() is used for finding the current frame for stack tracing. With my changes any failure will result in deferring to "last java frame" if set, and otherwise just not produce a stack trace (and the WARNING will be present in the output). This seems preferable to completely abandoning any further thread stack tracking. 
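The two error-handling positions taken in this exchange can be contrasted with a small Java sketch. Everything here is hypothetical and for illustration only: RegisterReadPolicySketch, Action, and both method names are invented, and the real decision is made in the native libsaproc code rather than in Java. The only external fact assumed is that ESRCH has the value 3 on Linux.

```java
/** Illustrative only: hypothetical names, not part of the SA code base. */
final class RegisterReadPolicySketch {

    /** ESRCH ("No such process") as defined on Linux. */
    static final int ESRCH = 3;

    /** What the caller does after a failed register read. */
    enum Action { WARN_AND_RETURN_NULL, THROW_DEBUGGER_EXCEPTION }

    /** Position argued for in the quoted message: any failure only warns. */
    static Action warnOnAnyFailure(int errno) {
        return Action.WARN_AND_RETURN_NULL;
    }

    /** Alternative raised in the thread: tolerate ESRCH, throw otherwise. */
    static Action warnOnlyOnEsrch(int errno) {
        return errno == ESRCH
                ? Action.WARN_AND_RETURN_NULL
                : Action.THROW_DEBUGGER_EXCEPTION;
    }
}
```

Under the first policy the stack walker always gets a chance to fall back to "last java frame"; under the second, errors such as EFAULT or EIO abort the stack dump for that thread.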
>>>>>>>>>> >>>>>>>>>> I'm not sure we can trust call stack when ptrace() returns any errors other than ESRCH even if "last java frame" is available. For example, don't ptrace() return EFAULT or EIO when something wrong? (e.g. stack corruption) If so, it may lead to a wrong analysis for troubleshooter. >>>>>>>>>> I think it should be abort dumping call stack for its thread at least. >>>>>>>>> Hi Yasumasa, >>>>>>>>> >>>>>>>>> In general stack walking makes a best effort and can be wrong, even when not getting errors like this. For any actively executing thread SA needs to determine where the stack starts, with register contents being the starting point (SP, FP, and PC). These registers could contain anything, and SA makes a best effort to determine a current frame from them. However, the verification steps it takes are not 100% guaranteed, and can lead to an incorrect assumption of the current frame, which in turn can result in an exception later on when walking the stack. See JDK-8247641. >>>>>>>>> >>>>>>>>> Keep in mind that the WARNING message will always be there. This should be enough to put the troubleshooter on alert that the stack trace may not be accurate. I think it's better to make an attempt at a stack trace then to just abandon it and not attempt to do something that may be useful. >>>>>>>>> >>>>>>>>> thanks, >>>>>>>>> >>>>>>>>> Chris >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Yasumasa >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> thanks, >>>>>>>>>>> >>>>>>>>>>> Chris >>>>>>>>>>> >>>>>>>>>>> On 6/19/20 6:33 PM, Yasumasa Suenaga wrote: >>>>>>>>>>>> Hi Chris, >>>>>>>>>>>> >>>>>>>>>>>> I checked Linux kernel code at a glance, ESRCH seems to be set to errno by default. >>>>>>>>>>>> So I guess it is similar to "generic" error code. >>>>>>>>>>>> >>>>>>>>>>>> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c >>>>>>>>>>>> >>>>>>>>>>>> According to manpage of ptrace(2), it might return errno other than ESRCH. 
>>>>>>>>>>>> For example, if we analyze broken core (e.g. the core was dumped with disk full), we might get EFAULT. >>>>>>>>>>>> Thus I prefer to handle ESRCH only in your patch, and also I think SA should throw DebuggerException if other error is occurred. >>>>>>>>>>>> >>>>>>>>>>>> https://www.man7.org/linux/man-pages/man2/ptrace.2.html >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Yasumasa >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 2020/06/20 5:51, Chris Plummer wrote: >>>>>>>>>>>>> Hello, >>>>>>>>>>>>> >>>>>>>>>>>>> I've? updated with webrev based on the new finding that a JavaThread cannot be on the ThreadList after its OS thread has been destroyed since the JavaThread removes itself from the ThreadList, and therefore must be running on its OS thread. The logic of the fix is unchanged from the first webrev, but I updated the comments to better reflect what is going on. I also updated the CR: >>>>>>>>>>>>> >>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html >>>>>>>>>>>>> >>>>>>>>>>>>> thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Chris >>>>>>>>>>>>> >>>>>>>>>>>>> On 6/19/20 12:24 AM, David Holmes wrote: >>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 19/06/2020 8:55 am, Chris Plummer wrote: >>>>>>>>>>>>>>> On 6/18/20 1:43 AM, David Holmes wrote: >>>>>>>>>>>>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>>> On 6/17/20 10:29 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Please help review the following: 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> The CR contains all the needed details. Here's a summary of changes in each file: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> The problem sounds to me like a variation of the more general problem of not ensuring a thread is kept alive whilst acting upon it. I don't know how the SA finds these references to the threads it is going to stackwalk, but is it possible to fix this via appropriate uses of ThreadsListHandle/Iterator? >>>>>>>>>>>>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Keep in mind that once SA attaches, nothing in the VM changes. For example, SA can't create a wrapper to a JavaThread, only to have the JavaThread be freed later on. It's just not possible. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Then how does it obtain a reference to a JavaThread for which the native OS thread id is invalid? Any thread found in _java_thread_list is either live or still to be started. In the latter case the JavaThread->osThread does not have its thread_id set yet. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> My assumption was that the JavaThread is in the process of being destroyed, and it has freed its OS thread but is itself still in the thread list. I did notice that the OS thread id being used looked to be in the range of thread id #'s you would expect for the running app, so that to me indicated it was once valid, but is no more. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Keep in mind that although hotspot may have synchronization code that prevents you from pulling a JavaThread off the thread list when it is in the process of being destroyed (I'm guessing it does), SA has no such protections. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> But you stated that once the SA has attached, the target VM can't change. If the SA gets its set of thread from one attach then tries to make queries about those threads in a separate attach, then obviously it could be providing garbage thread information. So you would need to re-validate the JavaThread in the target VM before trying to do anything with it. >>>>>>>>>>>>>>>>> That's not what is going on here. It's attaching and doing a stack trace, which involves getting the thread list and iterating through all threads without detaching. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Okay so I restate my original comment - all the JavaThreads must be alive or not yet started, so how are you encountering an invalid thread id? Any thread you find via the ThreadsList can't have destroyed its osThread. In any case the logic should be checking thread->osThread() for NULL, and then osThread()->get_state() to ensure it is >= INITIALIZED before using the thread_id(). >>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I chatted with Dan about this, and he said since the JavaThread is responsible for removing itself from the ThreadList, it is impossible to have a JavaThread still on the ThreadList, but without and underlying OS Thread. So I'm a bit perplexed as to how I can find a JavaThread on the ThreadList, but that results in ESRCH when trying to access the thread with ptrace. My only conclusion is that this failure is somehow spurious, and maybe the issue it just that the thread is in some temporary state that prevents its access. If so, I still think the approach I'm taking is the correct one, but the comments should be updated. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ESRCH can have other meanings but I don't know enough about the broader context to know whether they are applicable in this case. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ??? ESRCH? The? specified? process? 
does not exist, or is not currently being traced by the caller, or is not stopped >>>>>>>>>>>>>> (for requests that require a stopped tracee). >>>>>>>>>>>>>> >>>>>>>>>>>>>> I won't comment further on the fix/workaround as I don't know the code. I'll leave that to other folk. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> David >>>>>>>>>>>>>> ----- >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I had one other finding. When this issue first turned up, it prevented the thread from getting a stack trace due to the exception being thrown. What I hadn't realized is that after fixing it to not throw an exception, which resulted in the stack walking code getting all nulls for register values, I actually started to see a stack trace printed: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> "JLine terminal non blocking reader thread" #26 daemon prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable [0x00007f125f0f4000] >>>>>>>>>>>>>>> java.lang.Thread.State: RUNNABLE >>>>>>>>>>>>>>> JavaThread state: _thread_in_native >>>>>>>>>>>>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs failed for lwp (8089) >>>>>>>>>>>>>>> CurrentFrameGuess: choosing last Java frame: sp = 0x00007f125f0f4770, fp = 0x00007f125f0f47c0 >>>>>>>>>>>>>>> - java.io.FileInputStream.read0() @bci=0 (Interpreted frame) >>>>>>>>>>>>>>> - java.io.FileInputStream.read() @bci=1, line=223 (Interpreted frame) >>>>>>>>>>>>>>> - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() @bci=108, line=216 (Interpreted frame) >>>>>>>>>>>>>>> - jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() @bci=4 (Interpreted frame) >>>>>>>>>>>>>>> - java.lang.Thread.run() @bci=11, line=832 (Interpreted frame) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The "CurrentFrameGuess" output is some debug tracing I had enabled, and it indicates that the stack walking code is using the "last java frame" setting, which it will do if current register values don't indicate a valid frame (as would be the 
case if sp was null). I had previously assumed that without an underling valid LWP, there would be no stack trace. Given that there is one, there must be a valid LWP. Otherwise I don't see how the stack could have been walked. That's another indication that the ptrace failure is spurious in nature. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Also, even if you are using something like clhsdb to issue commands on addresses, if the address is no longer valid for the command you are executing, then you would get the appropriate error when there is an attempt to create a wrapper for it. I don't know of any command that operates directly on a JavaThread, but I think there are for InstanceKlass. So if you remembered the address of an InstanceKlass, and then reattached and tried a command that takes an InstanceKlass address, you would get an exception when SA tries to create the wrapper for the InsanceKlass if it were no longer a valid address for one. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>>>>>>>>>>>>>>>> -Instead of throwing an exception when the OS ThreadID is invalid, print a warning. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>>>>>>>>>>>>>>>> -Improve a print_debug message >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>>>>>>>>>>>>>>>> -Deal with the array of registers read in being null due to the OS ThreadID not being valid. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>>>>>>>>>>>>>>>>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" appearing twice when printing the exception. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>> >>> > > From david.holmes at oracle.com Thu Jun 25 04:02:38 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 25 Jun 2020 14:02:38 +1000 Subject: RFR(S) 8246019 PerfClassTraceTime slows down VM start-up In-Reply-To: <431a5744-9001-c32a-6874-68d8acc3764e@oracle.com> References: <3ed0b32d-469e-6f89-f4b3-b78d5bcce700@oracle.com> <20411ebb-5b06-83da-07c8-b751252efecd@oracle.com> <31af0e67-7bee-0680-3f23-09864030bba4@oracle.com> <431a5744-9001-c32a-6874-68d8acc3764e@oracle.com> Message-ID: No objections. 
Thanks, David On 24/06/2020 8:57 am, Ioi Lam wrote: > I've updated the patch to include just the fix for class initialization: > > http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v02/ > > > Hopefully this part is non-controversial. We are unlikely to make > call_class_initializer(THREAD) any slower when there's no <clinit>, so I > didn't add the diagnostic flag as suggested by Claes. > > I'll leave the class linking alone for now, as that may change in the > future. > > Meanwhile, I will look at other ways of reducing the effect of the > performance counters on start-up, under JDK-8246020 (-XX:+UsePerfData is > enabled by default and slows down VM bootstrap by 6%). > > thanks > - Ioi > > On 6/18/20 4:38 AM, Claes Redestad wrote: >> >> >> On 2020-06-17 05:19, Ioi Lam wrote: >>> >>> >>> On 6/16/20 6:20 PM, David Holmes wrote: >>>> Hi Ioi, >>>> >>>> On 17/06/2020 6:14 am, Ioi Lam wrote: >>>>> https://bugs.openjdk.java.net/browse/JDK-8246019 >>>>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/ >>>>> >>>>> >>>>> PerfClassTraceTime is (a rarely used feature) for measuring the >>>>> time spent during class linking and initialization. >>>> >>>> "A special command jcmd PerfCounter.print >>>> prints all performance counters in the process." >>>> >>>> How do you know this is a "rarely used feature"? >>> Hi David, >>> >>> Sure, the counter will be dumped, but by "rarely used" -- I mean no >>> one will find this particular counter useful, and no one will be >>> actively looking at it. >>> >>> I changed two parts of the code -- class init and class linking. >>> >>> For class initialization, the counter may be useful for people who >>> want to know how much time is spent in their <clinit> functions, and >>> my patch doesn't change that. It only avoids using the counter when a >>> class has no <clinit>, i.e., we know that the counter counts nothing >>> (except for a logging statement). 
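The idea in the paragraph above (still count every initialization, but only start a timer when there is an initializer to run) can be sketched roughly as follows. This is a minimal illustration in Java, not the HotSpot change itself: the class InitTimingSketch, its counters, and the use of a Runnable to stand in for a class initializer are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; none of these names exist in HotSpot.
public class InitTimingSketch {
    static final AtomicLong timedNanos = new AtomicLong();    // time accumulated inside initializers
    static final AtomicLong timersStarted = new AtomicLong(); // how many timers were actually used
    static final AtomicLong initCount = new AtomicLong();     // every initialization is still counted

    // 'initializer' stands in for a class initializer; null means the class has none.
    static void initialize(Runnable initializer) {
        initCount.incrementAndGet();
        if (initializer == null) {
            return; // nothing to measure, so the timer is skipped entirely
        }
        timersStarted.incrementAndGet();
        long start = System.nanoTime();
        try {
            initializer.run();
        } finally {
            timedNanos.addAndGet(System.nanoTime() - start);
        }
    }

    public static void main(String[] args) {
        initialize(null);       // class without an initializer: counted, not timed
        initialize(() -> { });  // class with an initializer: counted and timed
        System.out.println(initCount.get() + " initializations, "
                + timersStarted.get() + " timed");
    }
}
```

The invocation count stays complete in this sketch, so only the timing data for the "nothing to run" case is dropped.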
>>> >>> ===== >>> >>> For class linking, no user code is executed, so it only measures VM >>> code. If it's useful for anyone, that would be VM engineers like me >>> who are trying to optimize the speed of class loading. However, due >>> to the overhead of the counter vs what it's trying to measure, the >>> results are pretty meaningless. >>> >>> Note that I've not disabled the counter altogether. Instead, I >>> disable it only when linking a CDS shared class, and we know that >>> very little is happening for this class (e.g., no verification). >>> >>> I think the class linking timer might have been useful 15 years ago >>> when it was introduced, or it might be useful today when CDS is >>> disabled. But with CDS enabled, we are paying a constant price that >>> seems to benefit no one. >>> >>> I think we should short-circuit it when it seems appropriate. If this >>> indeed causes problems for our users, it's easy to re-enable it. >>> That's better than just keeping this forever just because we're >>> afraid to touch anything. >> >> I think this seems like well-rounded approach overall, but this assumes >> that we're mostly measuring the overhead of measurement here. I don't >> doubt that's the case for the scenarios you're excluding here and now, >> but it's hard to guarantee this property hold in the future. >> >> Perhaps a diagnostic flag to enable timing unconditionally would be >> appropriate? With such a flag we could verify that the time deltas of >> running some applications with and without the flag roughly matches the >> time delta in reported linking time. If they diverge, we might need to >> adjust the conditions. >> >>> >>>> >>>> I find it hard to evaluate whether this short-circuiting of the time >>>> tracing is reasonable or not. Obviously any monitoring mechanism >>>> should impose minimal overhead compared to what is being measured, >>>> and these timers fall short in that regard. 
But if these stats >>>> become meaningless then they may as well be removed. >>>> >>>> I think the serviceability folk (cc'd) need to evaluate this in the >>>> context of the M&M tools. >> >> As a complement (or even alternative) there might be ways we can reduce >> time-to-measure overheads. E.g., JFR added >> FastUnorderedElapsedCounterSource (share/utilities/ticks.hpp) which uses >> rdtsc if available (x86 - fallback to os::elapsed_counter otherwise). >> >> This might be a reasonable alternative for the Perf* timers, which >> should be short-running events on a single thread. >> >> /Claes >> >>>> >>>>> However, it's quite expensive and it needs to start and stop a >>>>> bunch of timers. With CDS, the overhead of the timer itself is >>>>> quite often much more than the time it's trying to measure, >>>>> giving unreliable measurement. >>>>> >>>>> In this patch, when it's clear that the init and linking will be >>>>> very quick, I disable the timer and count only the number of >>>>> invocations. This shows a small improvement in start-up >>>> >>>> I'm curious if you tried forcing EagerInitialization to be true >>>> to see how that improves the baseline. I've always noticed >>>> eager_init in the code, but hadn't realized it is disabled by default. >>>> >>> >>> I think it cannot be done by default, as it will violate the JLS. A >>> class can be initialized only when it's touched by bytecodes. >>> >>> It can also backfire as we may load many classes without initializing >>> them. E.g., during bytecode verification, we load many classes and >>> just check that one is a supertype of another. >>> >>> Thanks >>> - Ioi >>> >>>> Cheers, >>>> David >>>> ----- >>>> >>>>> Results of " perf stat -r 100 bin/java -Xshare:on >>>>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version " >>>>> >>>>> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) - >>>>> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) -- >>>>> 59627148 59329526 (-297622)   -----  
41.249  41.094 ( -0.155) - >>>>> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) ----- >>>>> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ---- >>>>> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) --- >>>>> 59608782 59340173 (-268609)   ----   41.198  41.508 ( 0.310) + >>>>> 59614612 59325177 (-289435)   -----  41.397  41.738 ( 0.341) ++ >>>>> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ---- >>>>> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) --- >>>>> ================================================ >>>>> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) -- >>>>> instruction delta = -284608 -0.4774% >>>>> time delta = -0.434 ms -1.0435% >>>>> >>>>> The number of PerfClassTraceTime's used is reduced from 564 to 116 >>>>> (so we have an overhead of about 715 instructions per use, yikes!). >>> > From chris.plummer at oracle.com Thu Jun 25 04:08:29 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 24 Jun 2020 21:08:29 -0700 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: <1ffd9712-f27d-06ca-f6ce-260f10262830@gmail.com> References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <3d72ddf2-8133-709b-150d-46af5c046dff@oracle.com> <1b1369f9-179c-d782-88c4-9eb030913583@oracle.com> <561269a9-fe2c-c11a-0f39-cc37d779d963@oracle.com> <55faf099-d241-845c-6f91-a8f32b6d9667@oracle.com> <9d598fb8-e5f5-27dd-b6e6-4c41f03e24e2@oracle.com> <17fbbb06-3cec-503b-73ea-c94156864ed7@oracle.com> <6a0152a9-a151-b49b-5003-f6cf73422973@gmail.com> <22872d51-e0c7-8a82-01d5-fed11748047c@gmail.com> <93dda82c-9a65-18b6-d34d-6dbe93fe310b@oracle.com> <060099cb-35bb-cbb5-ed0f-3c027ae7a1a7@gmail.com> <4d8cff28-8405-fc6a-8361-cfeaae8fc166@oracle.com> <1ffd9712-f27d-06ca-f6ce-260f10262830@gmail.com> Message-ID: On 6/24/20 6:53 PM, Yasumasa Suenaga wrote: > On 2020/06/25 
10:00, Chris Plummer wrote: >> On 6/24/20 5:17 PM, Yasumasa Suenaga wrote: >>> On 2020/06/25 3:22, Chris Plummer wrote: >>>> On 6/24/20 12:01 AM, Yasumasa Suenaga wrote: >>>>> On 2020/06/24 15:32, Chris Plummer wrote: >>>>>> Hi Yasumasa, >>>>>> >>>>>> I think LinuxAMD64CFrame is used for pstack and what I've been >>>>>> looking at has been jstack, and in particular >>>>>> AMD64CurrentFrameGuess, which does use "last java frame". I'm not >>>>>> sure why LinuxAMD64CFrame does not look at "last java frame". >>>>>> Maybe it should. >>>>> >>>>> I was thinking of both patterns (jstack, mixed stack) for this change. >>>>> As you know, mixed jstack (jstack --mixed) attempts to find the top of >>>>> the native stack via LinuxAMD64CFrame, and register values are needed for >>>>> it (so it depends on the ptrace() call). So I guess mixed mode jstack >>>>> (jhsdb jstack --mixed) would not show any stacks (cannot find >>>>> "last java frame"). >>>> Hi Yasumasa, >>>> >>>> I should have been more clear on what I meant by jstack and pstack. >>>> For jstack I meant using StackTrace.java, which is what you get by >>>> default with "jhsdb jstack" and also the clhsdb jstack command. For >>>> pstack I meant PStack.java, which is what you get with "jhsdb >>>> jstack --mixed" or the clhsdb pstack command. >>>> >>>> So this CR impacts both types of stack traces in that they will get >>>> null registers when the lower level API fails to get the >>>> register set. For StackTrace.java it will then defer to "last java >>>> frame" if available. For PStack.java it will not, and will always >>>> result in no stack trace. The code of interest is here: >>>> >>>> AMD64ThreadContext context = (AMD64ThreadContext) thread.getContext(); >>>> Address pc = context.getRegisterAsAddress(AMD64ThreadContext.RIP); >>>> if (pc == null) return null; >>>> return LinuxAMD64CFrame.getTopFrame(dbg, pc, context); >>>> >>>> So the question is should "last java frame" be used if pc == null. 
>>>> If so, then getTopFrame() would also need to be modified to use >>>> "last java frame" when fetching RBP. >>> >>> I don't think so because CFrame is defined as "Models a "C" >>> programming language frame on the stack" in the javadoc, so it >>> should have *valid* register values IMHO. >>> In addition, RIP is needed for Linux AMD64 at least because it would >>> use DWARF since JDK-8234624. >>> >> Hi Yasumasa, >> >> I don't quite understand the "C" frame nomenclature since CFrame is >> used for non C frames also. The PStack code roughly does the following: >> >> CFrame f = cdbg.topFrameForThread(); >> ClosestSymbol sym = f.closestSymbolToPC(); >> Address pc = f.pc(); >> if (sym != null) { >> ??? ... native symbol >> } else if (interp.contains(pc)) { >> ??? ... print interpreter frame >> >> So if the CFrame was filled in with "last java frame" values, it >> should allow PStack to print the stack starting with the "last java >> frame". Any native frame below that point would be missed. > > To use "last java frame" in this case looks good because stack > unwinding is a best effort behavior. > However PStack::run is PC-driven. I want to regard it - in other > words, it should not perform if we cannot get register values even if > "last java frame" is available. Ok, that sounds reasonable. thanks, Chris > > > Thanks, > > Yasumasa > > >> Chris >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>>> thanks, >>>> >>>> Chris >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>>> thanks, >>>>>> >>>>>> Chris >>>>>> >>>>>> On 6/23/20 11:04 PM, Yasumasa Suenaga wrote: >>>>>>> Hi Chris, >>>>>>> >>>>>>> Thanks you for explanation. >>>>>>> Your change looks good (but "last java frame" would not be found >>>>>>> in Linux AMD64 because RSP is NULL - cf. 
LinuxAMD64CFrame.java) >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Yasumasa >>>>>>> >>>>>>> >>>>>>> On 2020/06/24 12:09, Chris Plummer wrote: >>>>>>>> On 6/23/20 6:05 PM, Yasumasa Suenaga wrote: >>>>>>>>> Hi Chris, >>>>>>>>> >>>>>>>>> Skillful troubleshooters who use jhsdb will be aware of this >>>>>>>>> warning, and they will use other appropriate methods. >>>>>>>>> >>>>>>>>> However, I'm not sure it is worth continuing even >>>>>>>>> if SA cannot get register values. >>>>>>>>> >>>>>>>>> For example, Linux AMD64 depends on RIP and RSP values to find >>>>>>>>> the top frame. >>>>>>>>> According to your change, the caller of >>>>>>>>> getThreadIntegerRegisterSet() is responsible for dealing with >>>>>>>>> the set of null registers. However X86ThreadContext::data (it >>>>>>>>> includes raw register values) would still be zero when it >>>>>>>>> happens. >>>>>>>> This is what I intended to have happen. Just end up with a >>>>>>>> register set of all nulls. Then when stack walking code gets a >>>>>>>> null, it will revert to "last java frame" if available, >>>>>>>> otherwise no stack dump is done. >>>>>>>>> >>>>>>>>> So I think the register holder (e.g. X86ThreadContext) should have >>>>>>>>> a tri-state (have registers, failed to get registers, not yet >>>>>>>>> attempted to get registers). >>>>>>>>> OTOH it might be over-engineering. What do you think? >>>>>>>> Before implementing this I looked at what would be the >>>>>>>> easiest approach to get the desired effect of stack walking code >>>>>>>> simply failing over to using "last java frame", and decided the >>>>>>>> null set of registers was easiest. Other approaches involved >>>>>>>> more changes and impacted more files. 
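The fallback described above (a null register set makes the stack walker defer to "last java frame" when one is recorded, and otherwise produce no trace) might look roughly like the sketch below. It is only an illustration under stated assumptions: FrameGuessSketch, Frame, and guessCurrentFrame are hypothetical names, and the long[] arrays are simplified stand-ins for the SA's register set and last-Java-frame data, not the actual SA types.

```java
import java.util.Optional;

// Hypothetical illustration only; simplified stand-ins, not the SA classes.
public class FrameGuessSketch {
    static final class Frame {
        final long sp;
        final long fp;
        final String source; // where the starting point came from
        Frame(long sp, long fp, String source) {
            this.sp = sp;
            this.fp = fp;
            this.source = source;
        }
        @Override
        public String toString() {
            return source + ": sp=0x" + Long.toHexString(sp) + ", fp=0x" + Long.toHexString(fp);
        }
    }

    // registers: {sp, fp} read from the OS thread, or null when the read failed.
    // lastJavaFrame: {sp, fp} recorded by the VM for the thread, or null if not set.
    static Optional<Frame> guessCurrentFrame(long[] registers, long[] lastJavaFrame) {
        if (registers != null && registers[0] != 0) {
            return Optional.of(new Frame(registers[0], registers[1], "registers"));
        }
        if (lastJavaFrame != null) {
            return Optional.of(new Frame(lastJavaFrame[0], lastJavaFrame[1], "last java frame"));
        }
        return Optional.empty(); // no starting point, so no stack trace for this thread
    }

    public static void main(String[] args) {
        long[] last = {0x00007f125f0f4770L, 0x00007f125f0f47c0L};
        System.out.println(guessCurrentFrame(null, last)); // falls back to last java frame
        System.out.println(guessCurrentFrame(null, null)); // nothing to walk
    }
}
```

A tri-state holder, as suggested earlier in the thread, would replace the null check with an explicit "failed to read" state; the null-array form here mirrors the simpler approach that was chosen.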
>>>>>>>> >>>>>>>> thanks, >>>>>>>> >>>>>>>> Chris >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Yasumasa >>>>>>>>> >>>>>>>>> >>>>>>>>> On 2020/06/24 3:16, Chris Plummer wrote: >>>>>>>>>> On 6/20/20 12:53 AM, Yasumasa Suenaga wrote: >>>>>>>>>>> Hi Chris, >>>>>>>>>>> >>>>>>>>>>> On 2020/06/20 15:20, Chris Plummer wrote: >>>>>>>>>>>> Hi Yasumasa, >>>>>>>>>>>> >>>>>>>>>>>> ptrace is not used for core files, so the EFAULT for a bad >>>>>>>>>>>> core file is not a possibility. However, get_lwp_regs() >>>>>>>>>>>> does redirect to core_get_lwp_regs() for core files. It can >>>>>>>>>>>> fail, but the only reason it ever does is if the LWP can't >>>>>>>>>>>> be found in the core (which is never suppose to happen). I >>>>>>>>>>>> would think if this happened due to the core being >>>>>>>>>>>> truncated, SA would be blowing up all over the place with >>>>>>>>>>>> exceptions, probably before we ever get to this code, but >>>>>>>>>>>> in any cast what we do here wouldn't really make a difference. >>>>>>>>>>> >>>>>>>>>>> You are right, sorry. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> I'm not sure why you prefer an exception for errors other >>>>>>>>>>>> than ESRCH. Why should they be treated differently? >>>>>>>>>>>> getThreadIntegerRegisterSet0() is used for finding the >>>>>>>>>>>> current frame for stack tracing. With my changes any >>>>>>>>>>>> failure will result in deferring to "last java frame" if >>>>>>>>>>>> set, and otherwise just not produce a stack trace (and the >>>>>>>>>>>> WARNING will be present in the output). This seems >>>>>>>>>>>> preferable to completely abandoning any further thread >>>>>>>>>>>> stack tracking. >>>>>>>>>>> >>>>>>>>>>> I'm not sure we can trust call stack when ptrace() returns >>>>>>>>>>> any errors other than ESRCH even if "last java frame" is >>>>>>>>>>> available. For example, don't ptrace() return EFAULT or EIO >>>>>>>>>>> when something wrong? (e.g. 
stack corruption) If so, it may >>>>>>>>>>> lead to a wrong analysis for troubleshooter. >>>>>>>>>>> I think it should be abort dumping call stack for its thread >>>>>>>>>>> at least. >>>>>>>>>> Hi Yasumasa, >>>>>>>>>> >>>>>>>>>> In general stack walking makes a best effort and can be >>>>>>>>>> wrong, even when not getting errors like this. For any >>>>>>>>>> actively executing thread SA needs to determine where the >>>>>>>>>> stack starts, with register contents being the starting point >>>>>>>>>> (SP, FP, and PC). These registers could contain anything, and >>>>>>>>>> SA makes a best effort to determine a current frame from >>>>>>>>>> them. However, the verification steps it takes are not 100% >>>>>>>>>> guaranteed, and can lead to an incorrect assumption of the >>>>>>>>>> current frame, which in turn can result in an exception later >>>>>>>>>> on when walking the stack. See JDK-8247641. >>>>>>>>>> >>>>>>>>>> Keep in mind that the WARNING message will always be there. >>>>>>>>>> This should be enough to put the troubleshooter on alert that >>>>>>>>>> the stack trace may not be accurate. I think it's better to >>>>>>>>>> make an attempt at a stack trace then to just abandon it and >>>>>>>>>> not attempt to do something that may be useful. >>>>>>>>>> >>>>>>>>>> thanks, >>>>>>>>>> >>>>>>>>>> Chris >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Yasumasa >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> thanks, >>>>>>>>>>>> >>>>>>>>>>>> Chris >>>>>>>>>>>> >>>>>>>>>>>> On 6/19/20 6:33 PM, Yasumasa Suenaga wrote: >>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>> >>>>>>>>>>>>> I checked Linux kernel code at a glance, ESRCH seems to be >>>>>>>>>>>>> set to errno by default. >>>>>>>>>>>>> So I guess it is similar to "generic" error code. >>>>>>>>>>>>> >>>>>>>>>>>>> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c >>>>>>>>>>>>> >>>>>>>>>>>>> According to manpage of ptrace(2), it might return errno >>>>>>>>>>>>> other than ESRCH. 
>>>>>>>>>>>>> For example, if we analyze broken core (e.g. the core was >>>>>>>>>>>>> dumped with disk full), we might get EFAULT. >>>>>>>>>>>>> Thus I prefer to handle ESRCH only in your patch, and also >>>>>>>>>>>>> I think SA should throw DebuggerException if other error >>>>>>>>>>>>> is occurred. >>>>>>>>>>>>> >>>>>>>>>>>>> https://www.man7.org/linux/man-pages/man2/ptrace.2.html >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Yasumasa >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 2020/06/20 5:51, Chris Plummer wrote: >>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I've? updated with webrev based on the new finding that a >>>>>>>>>>>>>> JavaThread cannot be on the ThreadList after its OS >>>>>>>>>>>>>> thread has been destroyed since the JavaThread removes >>>>>>>>>>>>>> itself from the ThreadList, and therefore must be running >>>>>>>>>>>>>> on its OS thread. The logic of the fix is unchanged from >>>>>>>>>>>>>> the first webrev, but I updated the comments to better >>>>>>>>>>>>>> reflect what is going on. 
I also updated the CR: >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Chris >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 6/19/20 12:24 AM, David Holmes wrote: >>>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 19/06/2020 8:55 am, Chris Plummer wrote: >>>>>>>>>>>>>>>> On 6/18/20 1:43 AM, David Holmes wrote: >>>>>>>>>>>>>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>>>> On 6/17/20 10:29 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote: >>>>>>>>>>>>>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote: >>>>>>>>>>>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote: >>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Please help review the following: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533 >>>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> The CR contains all the needed details. Here's >>>>>>>>>>>>>>>>>>>>>>>> a summary of changes in each file: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> The problem sounds to me like a variation of the >>>>>>>>>>>>>>>>>>>>>>> more general problem of not ensuring a thread is >>>>>>>>>>>>>>>>>>>>>>> kept alive whilst acting upon it. I don't know >>>>>>>>>>>>>>>>>>>>>>> how the SA finds these references to the threads >>>>>>>>>>>>>>>>>>>>>>> it is going to stackwalk, but is it possible to >>>>>>>>>>>>>>>>>>>>>>> fix this via appropriate uses of >>>>>>>>>>>>>>>>>>>>>>> ThreadsListHandle/Iterator? 
>>>>>>>>>>>>>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Keep in mind that once SA attaches, nothing in >>>>>>>>>>>>>>>>>>>>>> the VM changes. For example, SA can't create a >>>>>>>>>>>>>>>>>>>>>> wrapper to a JavaThread, only to have the >>>>>>>>>>>>>>>>>>>>>> JavaThread be freed later on. It's just not >>>>>>>>>>>>>>>>>>>>>> possible. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Then how does it obtain a reference to a >>>>>>>>>>>>>>>>>>>>> JavaThread for which the native OS thread id is >>>>>>>>>>>>>>>>>>>>> invalid? Any thread found in _java_thread_list is >>>>>>>>>>>>>>>>>>>>> either live or still to be started. In the latter >>>>>>>>>>>>>>>>>>>>> case the JavaThread->osThread does not have its >>>>>>>>>>>>>>>>>>>>> thread_id set yet. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> My assumption was that the JavaThread is in the >>>>>>>>>>>>>>>>>>>> process of being destroyed, and it has freed its OS >>>>>>>>>>>>>>>>>>>> thread but is itself still in the thread list. I >>>>>>>>>>>>>>>>>>>> did notice that the OS thread id being used looked >>>>>>>>>>>>>>>>>>>> to be in the range of thread id #'s you would >>>>>>>>>>>>>>>>>>>> expect for the running app, so that to me indicated >>>>>>>>>>>>>>>>>>>> it was once valid, but is no more. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Keep in mind that although hotspot may have >>>>>>>>>>>>>>>>>>>> synchronization code that prevents you from pulling >>>>>>>>>>>>>>>>>>>> a JavaThread off the thread list when it is in the >>>>>>>>>>>>>>>>>>>> process of being destroyed (I'm guessing it does), >>>>>>>>>>>>>>>>>>>> SA has no such protections. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> But you stated that once the SA has attached, the >>>>>>>>>>>>>>>>>>> target VM can't change. 
If the SA gets its set of >>>>>>>>>>>>>>>>>>> thread from one attach then tries to make queries >>>>>>>>>>>>>>>>>>> about those threads in a separate attach, then >>>>>>>>>>>>>>>>>>> obviously it could be providing garbage thread >>>>>>>>>>>>>>>>>>> information. So you would need to re-validate the >>>>>>>>>>>>>>>>>>> JavaThread in the target VM before trying to do >>>>>>>>>>>>>>>>>>> anything with it. >>>>>>>>>>>>>>>>>> That's not what is going on here. It's attaching and >>>>>>>>>>>>>>>>>> doing a stack trace, which involves getting the >>>>>>>>>>>>>>>>>> thread list and iterating through all threads without >>>>>>>>>>>>>>>>>> detaching. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Okay so I restate my original comment - all the >>>>>>>>>>>>>>>>> JavaThreads must be alive or not yet started, so how >>>>>>>>>>>>>>>>> are you encountering an invalid thread id? Any thread >>>>>>>>>>>>>>>>> you find via the ThreadsList can't have destroyed its >>>>>>>>>>>>>>>>> osThread. In any case the logic should be checking >>>>>>>>>>>>>>>>> thread->osThread() for NULL, and then >>>>>>>>>>>>>>>>> osThread()->get_state() to ensure it is >= INITIALIZED >>>>>>>>>>>>>>>>> before using the thread_id(). >>>>>>>>>>>>>>>> Hi David, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I chatted with Dan about this, and he said since the >>>>>>>>>>>>>>>> JavaThread is responsible for removing itself from the >>>>>>>>>>>>>>>> ThreadList, it is impossible to have a JavaThread still >>>>>>>>>>>>>>>> on the ThreadList, but without and underlying OS >>>>>>>>>>>>>>>> Thread. So I'm a bit perplexed as to how I can find a >>>>>>>>>>>>>>>> JavaThread on the ThreadList, but that results in ESRCH >>>>>>>>>>>>>>>> when trying to access the thread with ptrace. My only >>>>>>>>>>>>>>>> conclusion is that this failure is somehow spurious, >>>>>>>>>>>>>>>> and maybe the issue it just that the thread is in some >>>>>>>>>>>>>>>> temporary state that prevents its access. 
If so, I >>>>>>>>>>>>>>>> still think the approach I'm taking is the correct one, >>>>>>>>>>>>>>>> but the comments should be updated. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ESRCH can have other meanings but I don't know enough >>>>>>>>>>>>>>> about the broader context to know whether they are >>>>>>>>>>>>>>> applicable in this case. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ESRCH The specified process does not exist, or is >>>>>>>>>>>>>>> not currently being traced by the caller, or is not stopped >>>>>>>>>>>>>>> (for requests that require a stopped tracee). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I won't comment further on the fix/workaround as I don't >>>>>>>>>>>>>>> know the code. I'll leave that to other folk. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I had one other finding. When this issue first turned >>>>>>>>>>>>>>>> up, it prevented the thread from getting a stack trace >>>>>>>>>>>>>>>> due to the exception being thrown. What I hadn't >>>>>>>>>>>>>>>> realized is that after fixing it to not throw an >>>>>>>>>>>>>>>> exception, which resulted in the stack walking code >>>>>>>>>>>>>>>> getting all nulls for register values, I actually >>>>>>>>>>>>>>>> started to see a stack trace printed: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> "JLine terminal non blocking reader thread" #26 daemon >>>>>>>>>>>>>>>> prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable >>>>>>>>>>>>>>>> [0x00007f125f0f4000] >>>>>>>>>>>>>>>> java.lang.Thread.State: RUNNABLE >>>>>>>>>>>>>>>> 
JavaThread state: _thread_in_native >>>>>>>>>>>>>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs >>>>>>>>>>>>>>>> failed for lwp (8089) >>>>>>>>>>>>>>>> CurrentFrameGuess: choosing last Java frame: sp = >>>>>>>>>>>>>>>> 0x00007f125f0f4770, fp = 0x00007f125f0f47c0 >>>>>>>>>>>>>>>> ??- java.io.FileInputStream.read0() @bci=0 (Interpreted >>>>>>>>>>>>>>>> frame) >>>>>>>>>>>>>>>> ??- java.io.FileInputStream.read() @bci=1, line=223 >>>>>>>>>>>>>>>> (Interpreted frame) >>>>>>>>>>>>>>>> ??- >>>>>>>>>>>>>>>> jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() >>>>>>>>>>>>>>>> @bci=108, line=216 (Interpreted frame) >>>>>>>>>>>>>>>> ??- >>>>>>>>>>>>>>>> jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() >>>>>>>>>>>>>>>> @bci=4 (Interpreted frame) >>>>>>>>>>>>>>>> ??- java.lang.Thread.run() @bci=11, line=832 >>>>>>>>>>>>>>>> (Interpreted frame) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The "CurrentFrameGuess" output is some debug tracing I >>>>>>>>>>>>>>>> had enabled, and it indicates that the stack walking >>>>>>>>>>>>>>>> code is using the "last java frame" setting, which it >>>>>>>>>>>>>>>> will do if current registers values don't indicate a >>>>>>>>>>>>>>>> valid frame (as would be the case if sp was null). I >>>>>>>>>>>>>>>> had previously assumed that without an underling valid >>>>>>>>>>>>>>>> LWP, there would be no stack trace. Given that there is >>>>>>>>>>>>>>>> one, there must be a valid LWP. Otherwise I don't see >>>>>>>>>>>>>>>> how the stack could have been walked. That's another >>>>>>>>>>>>>>>> indication that the ptrace failure is spurious in nature. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Also, even if you are using something like clhsdb to >>>>>>>>>>>>>>>>>> issue commands on addresses, if the address is no >>>>>>>>>>>>>>>>>> longer valid for the command you are executing, then >>>>>>>>>>>>>>>>>> you would get the appropriate error when there is an >>>>>>>>>>>>>>>>>> attempt to create a wrapper for it. I don't know of >>>>>>>>>>>>>>>>>> any command that operates directly on a JavaThread, >>>>>>>>>>>>>>>>>> but I think there are for InstanceKlass. So if you >>>>>>>>>>>>>>>>>> remembered the address of an InstanceKlass, and then >>>>>>>>>>>>>>>>>> reattached and tried a command that takes an >>>>>>>>>>>>>>>>>> InstanceKlass address, you would get an exception >>>>>>>>>>>>>>>>>> when SA tries to create the wrapper for the >>>>>>>>>>>>>>>>>> InsanceKlass if it were no longer a valid address for >>>>>>>>>>>>>>>>>> one. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> -Instead of throwing an exception when the OS >>>>>>>>>>>>>>>>>>>>>>>> ThreadID is invalid, print a warning. 
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> -Improve a print_debug message >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> -Deal with the array of registers read in being >>>>>>>>>>>>>>>>>>>>>>>> null due to the OS ThreadID not being valid. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> -Fix issue with >>>>>>>>>>>>>>>>>>>>>>>> "sun.jvm.hotspot.debugger.DebuggerException" >>>>>>>>>>>>>>>>>>>>>>>> appearing twice when printing the exception. 
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> thanks, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>> >>>> >> >> From david.holmes at oracle.com Thu Jun 25 05:17:53 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 25 Jun 2020 15:17:53 +1000 Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake In-Reply-To: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> Message-ID: <56591b98-214e-2066-9823-0276345efd29@oracle.com> Hi Yasumasa, Thanks for tackling this. I've had an initial look at it and have a few concerns. On 24/06/2020 4:50 pm, Yasumasa Suenaga wrote: > Hi all, > > Please review this change: > > ? JBS: https://bugs.openjdk.java.net/browse/JDK-8242428 > ? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/ Some typos: invaliant -> invariant directry -> directly > This change replace following VM operations to direct handshake. > > ?- VM_GetFrameCount (GetFrameCount()) > ?- VM_GetFrameLocation (GetFrameLocation()) > ?- VM_GetThreadListStackTraces (GetThreadListStackTrace()) > ?- VM_GetCurrentLocation It would have been better to split these out into separate changes. I am finding it very hard to track through the webrev and try to compare the old safepoint based operation with the new direct handshake approach, to check they are functionally equivalent. You are not checking the return value of Handshake::execute_direct and so are missing the possibility that the target thread has terminated before you got to do the operation on it. It isn't clear to me under what other circumstances execute_direct can also return false. 
You don't seem to have these checks anymore in some places:

  && !_java_thread->is_exiting() && _java_thread->threadObj() != NULL)

why not?

It is not clear that all the code that previously could execute at a
safepoint, due to being called from a VM_Operation, is still executable
at a safepoint e.g. JvmtiThreadState::count_frames()

> GetThreadListStackTrace() uses direct handshake if thread count == 1. In
> other case (thread count > 1), it would be performed as VM operation
> (VM_GetThreadListStackTraces).

This introduces a large chunk of duplicated code for the frame fill in
and final allocation. Can you not reuse the existing logic that does
this - and in the process do away with the use of _needs_thread_state?
I really wanted to see simpler code after this conversion.

I'm also wondering whether we can hide all this logic in the closure, as
was done with the VM_Operation i.e.

  *stack_info_ptr = op.stack_info();

> Caller of VM_GetCurrentLocation
> (JvmtiEnvThreadState::reset_current_location()) might be called at
> safepoint. So I added safepoint check in its caller.

I could not figure out what you were referring to here.

> This change has been tested in serviceability/jvmti serviceability/jdwp
> vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/nsk/jdwp.

Just a general comment on testing for these conversions to direct
handshakes. We have no control over whether the handshake gets executed
in the original thread or the target thread, so for all we know all our
testing could be executing only one of the cases. This concerns me but I
am not yet sure what to do about it.

Thanks,
David
-----

> Also I tested it on submit repo, then it has execution error
> (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to
> dependency error. So I think it does not occur by this change.
> > > Thanks, > > Yasumasa From suenaga at oss.nttdata.com Thu Jun 25 08:24:09 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Thu, 25 Jun 2020 17:24:09 +0900 Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake In-Reply-To: <56591b98-214e-2066-9823-0276345efd29@oracle.com> References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <56591b98-214e-2066-9823-0276345efd29@oracle.com> Message-ID: Hi David, Thanks for your comment! On 2020/06/25 14:17, David Holmes wrote: > Hi Yasumasa, > > Thanks for tackling this. I've had an initial look at it and have a few concerns. > > On 24/06/2020 4:50 pm, Yasumasa Suenaga wrote: >> Hi all, >> >> Please review this change: >> >> ?? JBS: https://bugs.openjdk.java.net/browse/JDK-8242428 >> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/ > > Some typos: > > invaliant -> invariant > directry -> directly I will fix them. >> This change replace following VM operations to direct handshake. >> >> ??- VM_GetFrameCount (GetFrameCount()) >> ??- VM_GetFrameLocation (GetFrameLocation()) >> ??- VM_GetThreadListStackTraces (GetThreadListStackTrace()) >> ??- VM_GetCurrentLocation > > It would have been better to split these out into separate changes. I am finding it very hard to track through the webrev and try to compare the old safepoint based operation with the new direct handshake approach, to check they are functionally equivalent. I will separate them as following. What do you think? If you are ok, I will update JBS. - Thread operations - VM_GetThreadListStackTraces (GetThreadListStackTrace()) - VM_GetStackTrace(GetStackTrace()) <- I missed it to describe in previous mail, sorry. - Frame operations - VM_GetFrameCount (GetFrameCount()) - VM_GetFrameLocation (GetFrameLocation()) - VM_GetCurrentLocation I will start to work when they are separated. 
> You are not checking the return value of Handshake::execute_direct and so are missing the possibility that the target thread has terminated before you got to do the operation on it. It isn't clear to me under what other circumstances execute_direct can also return false. I will add it. According to Handshake::execute_direct() and HandshakeOperation::do_handshake(), it seems to return false if the target thread has terminated as you said. > You don't seem to have these checks anymore in some places: > > ? && !_java_thread->is_exiting() && _java_thread->threadObj() != NULL) > > why not? I thought the thread which enters handshake is always alive and it has threadObj. I will recover their conditions. (I also should recover them for GetOwnedMonitorInfoClosure and GetCurrentContendedMonitorClosure - I removed them in JDK-8242425) > It is not clear that all the code that previously could execute at a safepoint, due to being called from a VM_Operation, is still executable at a safepoint e.g. JvmtiThreadState::count_frames() > >> GetThreadListStackTrace() uses direct handshake if thread count == 1. In other case (thread count > 1), it would be performed as VM operation (VM_GetThreadListStackTraces). > > This introduces a large chunk of duplicated code for the frame fill in and final allocation. Can you not reuse the existing logic that does this - and in the process do away with the the use of _needs_thread_state? I really wanted to see simpler code after this conversion. > > I'm also wondering whether we can hide all this logic in the closure, as was done with the VM_Operation i.e. > > *stack_info_ptr = op.stack_info(); I will try to refactor this change. >> Caller of VM_GetCurrentLocation (JvmtiEnvThreadState::reset_current_location()) might be called at safepoint. So I added safepoint check in its caller. > > I could not figure out what you were referring to here. 
I guess following callpath is available: VM_GetCurrentLocation JvmtiEnvThreadState::reset_current_location() JvmtiEventControllerPrivate::recompute_env_thread_enabled() JvmtiEventControllerPrivate::recompute_thread_enabled() JvmtiEventControllerPrivate::set_frame_pop() JvmtiEventController::set_frame_pop() JvmtiEnvThreadState::set_frame_pop() VM_SetFramePop::doit() However, VM_SetFramePop seems not to allow nested VM operations. >> This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/ns >> k/jdwp. > > Just a general comment on testing for these conversions to direct handshakes. We have no control over whether the handshake gets executed in the original thread or the target thread, so for all we know all our testing could be executing only one of the cases. This concerns me but I am not yet sure what to do about it. > > Thanks, > David > ----- > >> Also I tested it on submit repo, then it has execution error (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to dependency error. So I think it does not occur by this change. >> >> >> Thanks, >> >> Yasumasa From MRasmussen at perforce.com Thu Jun 25 10:34:34 2020 From: MRasmussen at perforce.com (Michael Rasmussen) Date: Thu, 25 Jun 2020 10:34:34 +0000 Subject: Java agents in paths containing unicode characters on Windows In-Reply-To: References: Message-ID: Anyone know if the below is a known issue? Tried searching the bug database, but didn't find anything that really matched this issue. /Michael From: serviceability-dev on behalf of Michael Rasmussen Sent: 15 June 2020 13:48 To: serviceability-dev at openjdk.java.net Subject: Java agents in paths containing unicode characters on Windows Hi, Trying to attach a javaagent that is located in a folder that contains characters that cannot be represented in the current windows system code page will fail to load, even if specified with relative path or using a full path using short name. 
Example:
agent.jar file is a javaagent located in a folder with unicode characters, in my example: C:\tmp\Te?t
(on my system, the short name (8.3) for that is: C:\Tmp\tet~1)
no-agent.jar is a jar file that is not a javaagent

C:\>dir /s /b C:\tmp\Te?t
C:\tmp\Te?t\agent.jar
C:\tmp\Te?t\no-agent.jar

C:\>dir /s /b C:\tmp\tet~1\
C:\tmp\tet~1\agent.jar
C:\tmp\tet~1\no-agent.jar

C:\tmp\Te?t>java -javaagent:agent.jar -version
Unexpected error (103) returned by AddToSystemClassLoaderSearch
Unable to add agent.jar to system class path - the system class loader does not define the appendToClassPathForInstrumentation method or the method failed
FATAL ERROR in native method: processing of -javaagent failed, appending to system class path failed

If using full path using 8.3 names that is all in ASCII, it still fails:

C:\>java -javaagent:C:\tmp\tet~1\agent.jar -version
Unexpected error (103) returned by AddToSystemClassLoaderSearch
Unable to add C:\tmp\tet~1\agent.jar to system class path - the system class loader does not define the appendToClassPathForInstrumentation method or the method failed
FATAL ERROR in native method: processing of -javaagent failed, appending to system class path failed

If I try a jar file that doesn't have the necessary manifest entries to be a javaagent:

C:\>java -javaagent:C:\tmp\tet~1\no-agent.jar -version
Failed to find Premain-Class manifest attribute in C:\tmp\tet~1\no-agent.jar
Error occurred during initialization of VM
agent library failed to init: instrument

So it can find the jar file, is able to load and read the manifest, but fails afterwards when trying to add to classpath.

The above was tried with current JDK14 and JDK11 versions.

/Michael
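
[Editorial sketch, not part of the original message.] The report above concerns paths containing characters that cannot be represented in the current Windows system code page. A minimal way to check whether a given path falls into that category might look like the following. The class name AgentPathCheck, the method name representable, and the sample folder name are all hypothetical (the real folder name is garbled in the archive), and the check only tells you whether the string is encodable; it does not confirm where the agent loading actually fails.

```java
import java.nio.charset.Charset;

public class AgentPathCheck {

    // Returns true if every character of the path can be encoded in the given charset.
    // Hypothetical helper for illustration only.
    static boolean representable(String path, Charset charset) {
        return charset.newEncoder().canEncode(path);
    }

    public static void main(String[] args) {
        // Hypothetical path; pass the real agent path as the first argument instead.
        String path = args.length > 0 ? args[0] : "C:\\tmp\\Te\u0161t\\agent.jar";

        // Assumption: on the JDK 11 and JDK 14 builds mentioned above, the default
        // charset on Windows follows the system code page. On JDK 18 and later the
        // default is UTF-8, so pass a code page name as the second argument there.
        Charset charset = args.length > 1 ? Charset.forName(args[1]) : Charset.defaultCharset();

        System.out.println(path + " representable in " + charset + ": "
                + representable(path, charset));
    }
}
```

Running it with the agent path, and optionally a charset name such as the one for the local code page, shows whether the long path name is encodable, which may help when narrowing down a reproducer.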
From david.holmes at oracle.com Thu Jun 25 12:48:05 2020
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 25 Jun 2020 22:48:05 +1000
Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake
In-Reply-To:
References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <56591b98-214e-2066-9823-0276345efd29@oracle.com>
Message-ID: <0b70e475-ddc8-e63b-54b5-849b9f2553dd@oracle.com>

Hi Yasumasa,

On 25/06/2020 6:24 pm, Yasumasa Suenaga wrote:
> Hi David,
>
> Thanks for your comment!
>
> On 2020/06/25 14:17, David Holmes wrote:
>> Hi Yasumasa,
>>
>> Thanks for tackling this. I've had an initial look at it and have a
>> few concerns.
>>
>> On 24/06/2020 4:50 pm, Yasumasa Suenaga wrote:
>>> Hi all,
>>>
>>> Please review this change:
>>>
>>>    JBS: https://bugs.openjdk.java.net/browse/JDK-8242428
>>>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/
>>
>> Some typos:
>>
>> invaliant -> invariant
>> directry -> directly
>
> I will fix them.
>
>
>>> This change replace following VM operations to direct handshake.
>>>
>>>   - VM_GetFrameCount (GetFrameCount())
>>>   - VM_GetFrameLocation (GetFrameLocation())
>>>   - VM_GetThreadListStackTraces (GetThreadListStackTrace())
>>>   - VM_GetCurrentLocation
>>
>> It would have been better to split these out into separate changes. I
>> am finding it very hard to track through the webrev and try to compare
>> the old safepoint based operation with the new direct handshake
>> approach, to check they are functionally equivalent.
>
> I will separate them as following. What do you think?
> If you are ok, I will update JBS.
>
>  - Thread operations
>      - VM_GetThreadListStackTraces (GetThreadListStackTrace())
>      - VM_GetStackTrace(GetStackTrace())  <- I missed it to describe in
> previous mail, sorry.
>
>  - Frame operations
>      - VM_GetFrameCount (GetFrameCount())
>      - VM_GetFrameLocation (GetFrameLocation())
>      - VM_GetCurrentLocation
>
> I will start to work when they are separated.

If the frame operations are each small enough that will help.

>
>> You are not checking the return value of Handshake::execute_direct and
>> so are missing the possibility that the target thread has terminated
>> before you got to do the operation on it. It isn't clear to me under
>> what other circumstances execute_direct can also return false.
>
> I will add it. According to Handshake::execute_direct() and
> HandshakeOperation::do_handshake(), it seems to return false if the
> target thread has terminated as you said.

Yes, but also if the handshake is not executed - but I don't know under
what conditions that can occur.

>
>> You don't seem to have these checks anymore in some places:
>>
>>    && !_java_thread->is_exiting() && _java_thread->threadObj() != NULL)
>>
>> why not?
>
> I thought the thread which enters handshake is always alive and it has
> threadObj.

As far as I can see we can still engage in a handshake with a thread
after it has marked itself as exiting.

The threadObj() can only be null while a thread is attaching, which
means it would have to be checked in the general case, but for these
JVM TI operations if we already have a jthread reference to the target
thread then it must be beyond that point. Mind you that same logic
applies to the existing code so ...

> I will recover their conditions.
> (I also should recover them for GetOwnedMonitorInfoClosure and
> GetCurrentContendedMonitorClosure - I removed them in JDK-8242425)

I think so - and we need to check the return value of execute_direct to
determine when to report JVMTI_ERROR_THREAD_NOT_ALIVE.

>
>> It is not clear that all the code that previously could execute at a
>> safepoint, due to being called from a VM_Operation, is still
>> executable at a safepoint e.g. JvmtiThreadState::count_frames()
>>
>>> GetThreadListStackTrace() uses direct handshake if thread count == 1.
>>> In other case (thread count > 1), it would be performed as VM >>> operation (VM_GetThreadListStackTraces). >> >> This introduces a large chunk of duplicated code for the frame fill in >> and final allocation. Can you not reuse the existing logic that does >> this - and in the process do away with the the use of >> _needs_thread_state? I really wanted to see simpler code after this >> conversion. >> >> I'm also wondering whether we can hide all this logic in the closure, >> as was done with the VM_Operation i.e. >> >> *stack_info_ptr = op.stack_info(); > > I will try to refactor this change. > > >>> Caller of VM_GetCurrentLocation >>> (JvmtiEnvThreadState::reset_current_location()) might be called at >>> safepoint. So I added safepoint check in its caller. >> >> I could not figure out what you were referring to here. > > I guess following callpath is available: > > VM_GetCurrentLocation > ? JvmtiEnvThreadState::reset_current_location() > ??? JvmtiEventControllerPrivate::recompute_env_thread_enabled() > ????? JvmtiEventControllerPrivate::recompute_thread_enabled() > ??????? JvmtiEventControllerPrivate::set_frame_pop() > ????????? JvmtiEventController::set_frame_pop() > ??????????? JvmtiEnvThreadState::set_frame_pop() > ????????????? VM_SetFramePop::doit() > > However, VM_SetFramePop seems not to allow nested VM operations. It is the outer operation that has to allow nesting but VM_GetCurrentLocation doesn't allow it either. So if this path is possible then something is broken. Cheers, David > >>> This change has been tested in serviceability/jvmti >>> serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi >>> vmTestbase/ns >>> k/jdwp. >> >> Just a general comment on testing for these conversions to direct >> handshakes. We have no control over whether the handshake gets >> executed in the original thread or the target thread, so for all we >> know all our testing could be executing only one of the cases. 
This >> concerns me but I am not yet sure what to do about it. >> >> Thanks, >> David >> ----- >> >>> Also I tested it on submit repo, then it has execution error >>> (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to >>> dependency error. So I think it does not occur by this change. >>> >>> >>> Thanks, >>> >>> Yasumasa From serguei.spitsyn at oracle.com Thu Jun 25 16:17:08 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Thu, 25 Jun 2020 09:17:08 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <76c58309-cd04-f205-ca09-46a117590fad@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com> <76c58309-cd04-f205-ca09-46a117590fad@oracle.com> Message-ID: An HTML attachment was scrubbed... 
From yumin.qi at oracle.com Thu Jun 25 17:33:55 2020
From: yumin.qi at oracle.com (Yumin Qi)
Date: Thu, 25 Jun 2020 10:33:55 -0700
Subject: RFR(T) 8203005: The top-of-stack type specified for nofast_* bytecodes are wrong
Message-ID: <94c9db1c-f1cb-fac0-7aab-7f2f3109a961@oracle.com>

Hi, please review the tiny changes for

bug: https://bugs.openjdk.java.net/browse/JDK-8203005
webrev: http://cr.openjdk.java.net/~minqi/2020/8203005/webrev-00/

Summary: The change was left by 8074345 (https://bugs.openjdk.java.net/browse/JDK-8074345), the types were wrongly put as T_ILLEGAL for T_OBJECT, and T_ILLEGAL for T_INT. This has not triggered any failures yet since the types stored in the type array for nofast version are never used, the used types are always the original types fortunately (unfortunately either).

tests: tier1,tier2,tier3

Thanks
Yumin

From daniel.daugherty at oracle.com Thu Jun 25 17:34:03 2020
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 25 Jun 2020 13:34:03 -0400
Subject: RFR(T): 8248341: ProblemList java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java
Message-ID:

Greetings,

I'm doing another round of reduce-the-noise in the CI in preparation
for the upcoming weekend... So I have another trivial review...

Here's the bug for the failures:

    JDK-8247426 ThreadMXBean/ThreadMXBeanStateTest.java still times out
    https://bugs.openjdk.java.net/browse/JDK-8247426

and here's the bug for the ProblemListing:

    JDK-8248341 ProblemList java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java
    https://bugs.openjdk.java.net/browse/JDK-8248341

Here's the context diff:

$ hg diff
diff -r 48dff13bb70a test/jdk/ProblemList.txt
--- a/test/jdk/ProblemList.txt    Thu Jun 25 13:10:47 2020 -0400
+++ b/test/jdk/ProblemList.txt    Thu Jun 25 13:27:50 2020 -0400
@@ -593,6 +593,8 @@
 com/sun/management/OperatingSystemMXBean/GetProcessCpuLoad.java 8030957 aix-all
 com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java 8030957 aix-all
 
+java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java 8247426 generic-all
+
 sun/management/jdp/JdpDefaultsTest.java 8241865 macosx-all
 sun/management/jdp/JdpJmxRemoteDynamicPortTest.java 8241865 macosx-all
 sun/management/jdp/JdpSpecificAddressTest.java 8241865 macosx-all

Thanks, in advance, for any comments, questions or suggestions.

Dan

From igor.ignatyev at oracle.com Thu Jun 25 17:45:02 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Thu, 25 Jun 2020 10:45:02 -0700
Subject: RFR(T): 8248341: ProblemList java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java
In-Reply-To:
References:
Message-ID: <2F0B6729-2ED2-45AA-9BB4-92765CFDA3BE@oracle.com>

Hi Dan,

LGTM

-- Igor

> On Jun 25, 2020, at 10:34 AM, Daniel D. Daugherty wrote:
>
> Greetings,
>
> I'm doing another round of reduce-the-noise in the CI in preparation
> for the upcoming weekend... So I have another trivial review...
> > Here's the bug for the failures: > > JDK-8247426 ThreadMXBean/ThreadMXBeanStateTest.java still times out > https://bugs.openjdk.java.net/browse/JDK-8247426 > > and here's the bug for the ProblemListing: > > JDK-8248341 ProblemList java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java > https://bugs.openjdk.java.net/browse/JDK-8248341 > > Here's the context diff: > > $ hg diff > diff -r 48dff13bb70a test/jdk/ProblemList.txt > --- a/test/jdk/ProblemList.txt Thu Jun 25 13:10:47 2020 -0400 > +++ b/test/jdk/ProblemList.txt Thu Jun 25 13:27:50 2020 -0400 > @@ -593,6 +593,8 @@ > com/sun/management/OperatingSystemMXBean/GetProcessCpuLoad.java 8030957 aix-all > com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java 8030957 aix-all > > +java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java 8247426 generic-all > + > sun/management/jdp/JdpDefaultsTest.java 8241865 macosx-all > sun/management/jdp/JdpJmxRemoteDynamicPortTest.java 8241865 macosx-all > sun/management/jdp/JdpSpecificAddressTest.java 8241865 macosx-all > > Thanks, in advance, for any comments, questions or suggestions. > > Dan > > > > From leonid.mesnik at oracle.com Thu Jun 25 17:58:50 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Thu, 25 Jun 2020 10:58:50 -0700 Subject: RFR: 8242891: vmTestbase/nsk/jvmti/ test should be fixed to fail early if JVMTI function return error In-Reply-To: <547f02e9-604c-09e5-5fe1-2afb1be54f2d@oracle.com> References: <11314027-4965-b38b-6bc7-5011515b94ab@oracle.com> <2cf4e45a-4d44-3c0a-a272-480f56a5e6e8@oracle.com> <3927ae7c-efa9-eb9f-ab98-18d778d5a966@oracle.com> <547f02e9-604c-09e5-5fe1-2afb1be54f2d@oracle.com> Message-ID: <611D0AF5-1295-4DD4-B6C9-F4D379AA069C@oracle.com> Ping > On Jun 12, 2020, at 4:18 PM, Leonid Mesnik wrote: > > Fixed all places, updated copyright. 
Still need second review > > http://cr.openjdk.java.net/~lmesnik/8242891/webrev.02/ > Leonid > > On 6/11/20 8:41 PM, serguei.spitsyn at oracle.com wrote: >> Hi Leonid, >> >> It is much better now. >> >> Several places still need the same fix. >> >> http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/jvmti/GetAllThreads/allthr001/allthr001.cpp.frames.html >> >> 211 for (i = 0; i < thrInfo[ind].cnt; i++) { >> 212 for (j = 0, found = 0; j < threadsCount && !found; j++) { >> 213 err = jvmti->GetThreadInfo(threads[j], &inf); >> 214 if (err != JVMTI_ERROR_NONE) { >> 215 printf("Failed to get thread info: %s (%d)\n", >> 216 TranslateError(err), err); >> 217 result = STATUS_FAILED; >> 218 } >> 219 if (printdump == JNI_TRUE) { >> 220 printf(" >>> %s", inf.name); >> 221 } >> 222 found = (inf.name != NULL && >> 223 strstr(inf.name, thrInfo[ind].thrNames[i]) == inf.name && >> 224 (ind == 4 || strlen(inf.name) == >> 225 strlen(thrInfo[ind].thrNames[i]))); >> 226 } >> A return is needed after line 217, otherwise the the inf value is used at lines 222-224. 
>> >> http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/jvmti/GetBytecodes/bytecodes003/bytecodes003.cpp.frames.html >> >> A return is needed for the errors: >> 363 result = STATUS_FAILED; >> 372 result = STATUS_FAILED; >> 384 result = STATUS_FAILED; >> >> http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/jvmti/MethodEntry/mentry001/mentry001.cpp.frames.html >> >> A return is needed for the errors: >> 82 result = STATUS_FAILED; >> 94 result = STATUS_FAILED; >> 100 result = STATUS_FAILED; >> >> http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/jvmti/MethodExit/mexit001/mexit001.cpp.frames.html >> >> A return is needed for the error: >> 98 result = STATUS_FAILED; >> >> http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/jvmti/MethodExit/mexit002/mexit002.cpp.frames.html >> >> A return is needed for the error: >> 98 result = STATUS_FAILED; >> >> http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/jvmti/RedefineClasses/redefclass019/redefclass019.cpp.frames.html >> >> A return is needed for the error: >> 186 result = STATUS_FAILED; >> >> Also, I do not like many uninitialized locals in these tests. >> But it is for another pass. >> >> Otherwise, it looks good. >> No need for another webrev if you fix the above. >> I hope, you will update copyright comments before push. >> >> Thanks, >> Serguei >> >> >> On 6/11/20 15:30, Leonid Mesnik wrote: >>> Agree, it would be better to don't try to use data from functions with error code. The new webrev: >>> >>> http://cr.openjdk.java.net/~lmesnik/8242891/webrev.01/ >>> I tried to prevent any usage of possibly corrupted data. Mostly strings or allocated data, sometimes method/class id which are used my other JVMTI functions. 
>>> >>> Leonid >>> >>> On 6/9/20 6:59 PM, serguei.spitsyn at oracle.com wrote: >>>> On 6/9/20 12:58, Leonid Mesnik wrote: >>>>> Hi >>>>> >>>>> >>>>> On 6/9/20 12:34 PM, serguei.spitsyn at oracle.com wrote: >>>>>> Hi Leonid, >>>>>> >>>>>> Thank you for taking care about this! >>>>>> It looks good in general. >>>>>> However, I think, a similar return is needed in more cases. >>>>>> >>>>>> One example: >>>>>> >>>>>> http://cr.openjdk.java.net/~lmesnik/8242891/webrev.00/test/hotspot/jtreg/vmTestbase/nsk/jvmti/Exception/exception001/exception001.cpp.frames.html >>>>>> >>>>>> 99 err = jvmti_env->GetMethodDeclaringClass(method, &cls); >>>>>> 100 if (err != JVMTI_ERROR_NONE) { >>>>>> 101 printf("(GetMethodDeclaringClass#t) unexpected error: %s (%d)\n", >>>>>> 102 TranslateError(err), err); >>>>>> 103 result = STATUS_FAILED; >>>>>> <> 104 return; >>>>>> 105 } >>>>>> 106 err = jvmti_env->GetClassSignature(cls, &ex.t_cls, &generic); >>>>>> 107 if (err != JVMTI_ERROR_NONE) { >>>>>> 108 printf("(GetClassSignature#t) unexpected error: %s (%d)\n", >>>>>> 109 TranslateError(err), err); >>>>>> 110 result = STATUS_FAILED; >>>>>> 111 } >>>>>> 112 err = jvmti_env->GetMethodName(method, >>>>>> 113 &ex.t_name, &ex.t_sig, &generic); >>>>>> 114 if (err != JVMTI_ERROR_NONE) { >>>>>> 115 printf("(GetMethodName#t) unexpected error: %s (%d)\n", >>>>>> 116 TranslateError(err), err); >>>>>> 117 result = STATUS_FAILED; >>>>>> 118 } >>>>>> 119 ex.t_loc = location; >>>>>> 120 err = jvmti_env->GetMethodDeclaringClass(catch_method, &cls); >>>>>> 121 if (err != JVMTI_ERROR_NONE) { >>>>>> 122 printf("(GetMethodDeclaringClass#c) unexpected error: %s (%d)\n", >>>>>> 123 TranslateError(err), err); >>>>>> 124 result = STATUS_FAILED; >>>>>> <> 125 return; >>>>>> 126 } >>>>>> 127 err = jvmti_env->GetClassSignature(cls, &ex.c_cls, &generic); >>>>>> 128 if (err != JVMTI_ERROR_NONE) { >>>>>> 129 printf("(GetClassSignature#c) unexpected error: %s (%d)\n", >>>>>> 130 TranslateError(err), err); >>>>>> 131 result 
= STATUS_FAILED; >>>>>> 132 } >>>>>> 133 err = jvmti_env->GetMethodName(catch_method, >>>>>> 134 &ex.c_name, &ex.c_sig, &generic); >>>>>> 135 if (err != JVMTI_ERROR_NONE) { >>>>>> 136 printf("(GetMethodName#c) unexpected error: %s (%d)\n", >>>>>> 137 TranslateError(err), err); >>>>>> 138 result = STATUS_FAILED; >>>>>> 139 } >>>>>> >>>>>> In the fragment above you added return for JVMTI GetMethodDeclaringClass error. >>>>>> But GetMethodName and GetClassSignature can be also problematic as the returned names are printed below. >>>>>> It seems to be more safe and even simpler to add returns for such cases as well. >>>>>> Otherwise, the code reader is puzzled why there is a return in one failure case and there is no such return in another. >>>>> It is a good question if we want to fix such places or even fails with first JVMTI failure. (I even started to fix it in the such way but find that existing tests usually don't fail always). >>>>> >>>> >>>> I do not suggest to fix all the tests but those which you are already fixing. >>>> >>>> >>>>> >>>>> The difference is that test tries to reuse "cls" in other JVMTI function and going to generate very misleading crash. How it just tries to compare ex and exs values. So test might crash but clearly outside of JVMTI function and with some useful info. So I am not sure if fixing these lines improve test failure handling. >>>>> >>>> >>>> If JVMTI functions fail with an error code the results with symbolic strings must be considered invalid. >>>> However, they are used later (the values are printed). >>>> It is better to bail out in such cases. >>>> It should not be a problem to add similar returns in such cases. >>>> Or do you think it is important to continue execution for some reason? >>>> >>>>> Assuming that most of existing tests fails early only if going to re-use possible corrupted data I propose to fix this separately. We need to figure out when to fail or to try to finish. 
>>>>> >>>> >>>> Do you suggest it for the updated tests only or for all the tests with such problems? >>>> >>>> Thanks, >>>> Serguei >>>> >>>>> >>>>> Leonid >>>>> >>>>>> >>>>>> Thanks, >>>>>> Serguei >>>>>> >>>>>> >>>>>> On 6/1/20 21:33, Leonid Mesnik wrote: >>>>>>> Hi >>>>>>> >>>>>>> Could you please review following fix which stop test execution if JVMTI function returns error. The test fails anyway however using potentially bad data in JVMTI function might cause misleading crash failures. The hs_err will contains the stacktrace not with problem function but with function called with corrupted data. Most of tests already has such behavior but not all. Also I fixed a couple of tests to finish if they haven't managed to suspend thread. >>>>>>> >>>>>>> I've updated only tests which try to use corrupted data in JVMTI functions after errors. I haven't updated tests which just compare/print values from erroring JVMTI functions. The crash in strcmp/println is not so misleading and might be point to real issue. >>>>>>> >>>>>>> webrev: http://cr.openjdk.java.net/~lmesnik/8242891/webrev.00/ >>>>>>> >>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8242891 >>>>>>> >>>>>>> Leonid >>>>>>> >>>>>> >>>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcbeyler at google.com Thu Jun 25 18:03:04 2020 From: jcbeyler at google.com (Jean Christophe Beyler) Date: Thu, 25 Jun 2020 11:03:04 -0700 Subject: RFR (S) 8247615: Initialize the bytes left for the heap sampler Message-ID: Hi all! I hope you are all doing well! I have a small review request to fix initialization of the heap sampler's byte left variable: http://cr.openjdk.java.net/~jcbeyler/8247615/webrev.00/ The bug is here: https://bugs.openjdk.java.net/browse/JDK-8247615 Note, this passed the submit repo testing. Thanks and have a great day! Jc -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Alan.Bateman at oracle.com Thu Jun 25 18:07:49 2020 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Thu, 25 Jun 2020 19:07:49 +0100 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com> <76c58309-cd04-f205-ca09-46a117590fad@oracle.com> Message-ID: On 25/06/2020 17:17, serguei.spitsyn at oracle.com wrote: > > New wevrev version is: > http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.2/ > One inconsistency is that it uses getDeclaredMethod to find the 2-arg premain and getMethod to find the 1-arg premain. The latter will fail if the method is not public so you won't get the nice exception message. I wonder if we could fix this at the same time. -Alan. From chris.plummer at oracle.com Thu Jun 25 18:28:08 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Thu, 25 Jun 2020 11:28:08 -0700 Subject: RFR(T) 8203005: The top-of-stack type specified for nofast_* bytecodes are wrong In-Reply-To: <94c9db1c-f1cb-fac0-7aab-7f2f3109a961@oracle.com> References: <94c9db1c-f1cb-fac0-7aab-7f2f3109a961@oracle.com> Message-ID: <1c07074e-52d1-a145-4946-030da72ddd97@oracle.com> Hi Yumin, It looks like the fix for https://bugs.openjdk.java.net/browse/JDK-8174995 has this same bug. What do you think? 
thanks, Chris On 6/25/20 10:33 AM, Yumin Qi wrote: > Hi, please review the tiny changes for > > bug: https://bugs.openjdk.java.net/browse/JDK-8203005 > > webrev:http://cr.openjdk.java.net/~minqi/2020/8203005/webrev-00/ > > > Summary: The change was left by > 8074345(https://bugs.openjdk.java.net/browse/JDK-8074345), the types > were wrongly put as T_ILLEGAL for T_OBJECT, and T_ILLEGAL for T_INT. > This has not triggered any failures yet since the types stored in the > type array for nofast version are never used, the used types are > always the original types fortunately(unfortunately either). > > > tests: tier1,tier2,tier3 > > > Thanks > > Yumin > From chris.plummer at oracle.com Thu Jun 25 18:37:23 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Thu, 25 Jun 2020 11:37:23 -0700 Subject: RFR(T) 8203005: The top-of-stack type specified for nofast_* bytecodes are wrong In-Reply-To: <1c07074e-52d1-a145-4946-030da72ddd97@oracle.com> References: <94c9db1c-f1cb-fac0-7aab-7f2f3109a961@oracle.com> <1c07074e-52d1-a145-4946-030da72ddd97@oracle.com> Message-ID: <16c13b8d-5135-9f7c-53d0-bc9f88b5fda8@oracle.com> Nevermind. I should have looked at your full webrev. I see you already covered this. thanks, Chris On 6/25/20 11:28 AM, Chris Plummer wrote: > Hi Yumin, > > It looks like the fix for > https://bugs.openjdk.java.net/browse/JDK-8174995 has this same bug. > What do you think? > > thanks, > > Chris > > On 6/25/20 10:33 AM, Yumin Qi wrote: >> Hi, please review the tiny changes for >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8203005 >> >> webrev:http://cr.openjdk.java.net/~minqi/2020/8203005/webrev-00/ >> >> >> Summary: The change was left by >> 8074345(https://bugs.openjdk.java.net/browse/JDK-8074345), the types >> were wrongly put as T_ILLEGAL for T_OBJECT, and T_ILLEGAL for T_INT. 
>> This has not triggered any failures yet since the types stored in the >> type array for nofast version are never used, the used types are >> always the original types fortunately(unfortunately either). >> >> >> tests: tier1,tier2,tier3 >> >> >> Thanks >> >> Yumin >> > From serguei.spitsyn at oracle.com Thu Jun 25 18:55:19 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Thu, 25 Jun 2020 11:55:19 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com> <76c58309-cd04-f205-ca09-46a117590fad@oracle.com> Message-ID: <38c824e9-f94f-4bf5-4767-4dad59d41d86@oracle.com> Thanks you for reviewing, Alan! I'll check if it can be fixed. Thanks, Serguei On 6/25/20 11:07, Alan Bateman wrote: > On 25/06/2020 17:17, serguei.spitsyn at oracle.com wrote: >> >> New wevrev version is: >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.2/ >> > One inconsistency is that it uses getDeclaredMethod to find the 2-arg > premain and getMethod to find the 1-arg premain. The latter will fail > if the method is not public so you won't get the nice exception > message. I wonder if we could fix this at the same time. > > -Alan. From daniel.daugherty at oracle.com Thu Jun 25 18:58:56 2020 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Thu, 25 Jun 2020 14:58:56 -0400 Subject: RFR(T): 8248341: ProblemList java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java In-Reply-To: <2F0B6729-2ED2-45AA-9BB4-92765CFDA3BE@oracle.com> References: <2F0B6729-2ED2-45AA-9BB4-92765CFDA3BE@oracle.com> Message-ID: <0083808f-5f19-f771-5ff9-bf902faf9c1f@oracle.com> Igor, Thanks for the fast review! Dan On 6/25/20 1:45 PM, Igor Ignatyev wrote: > Hi Dan, > > LGTM > > -- Igor > >> On Jun 25, 2020, at 10:34 AM, Daniel D. Daugherty wrote: >> >> Greetings, >> >> I'm doing another round of reduce-the-noise in the CI in preparation >> for the upcoming weekend... So I have another trivial review... >> >> Here's the bug for the failures: >> >> JDK-8247426 ThreadMXBean/ThreadMXBeanStateTest.java still times out >> https://bugs.openjdk.java.net/browse/JDK-8247426 >> >> and here's the bug for the ProblemListing: >> >> JDK-8248341 ProblemList java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java >> https://bugs.openjdk.java.net/browse/JDK-8248341 >> >> Here's the context diff: >> >> $ hg diff >> diff -r 48dff13bb70a test/jdk/ProblemList.txt >> --- a/test/jdk/ProblemList.txt Thu Jun 25 13:10:47 2020 -0400 >> +++ b/test/jdk/ProblemList.txt Thu Jun 25 13:27:50 2020 -0400 >> @@ -593,6 +593,8 @@ >> com/sun/management/OperatingSystemMXBean/GetProcessCpuLoad.java 8030957 aix-all >> com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java 8030957 aix-all >> >> +java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java 8247426 generic-all >> + >> sun/management/jdp/JdpDefaultsTest.java 8241865 macosx-all >> sun/management/jdp/JdpJmxRemoteDynamicPortTest.java 8241865 macosx-all >> sun/management/jdp/JdpSpecificAddressTest.java 8241865 macosx-all >> >> Thanks, in advance, for any comments, questions or suggestions. 
>> >> Dan >> >> >> >> From yumin.qi at oracle.com Thu Jun 25 19:09:04 2020 From: yumin.qi at oracle.com (Yumin Qi) Date: Thu, 25 Jun 2020 12:09:04 -0700 Subject: RFR(T) 8203005: The top-of-stack type specified for nofast_* bytecodes are wrong In-Reply-To: <16c13b8d-5135-9f7c-53d0-bc9f88b5fda8@oracle.com> References: <94c9db1c-f1cb-fac0-7aab-7f2f3109a961@oracle.com> <1c07074e-52d1-a145-4946-030da72ddd97@oracle.com> <16c13b8d-5135-9f7c-53d0-bc9f88b5fda8@oracle.com> Message-ID: <45f2a9be-ae49-c4a6-c7bf-4631b2ca5051@oracle.com> Hi, Chris ? Thanks for the review! Thanks Yumin On 6/25/20 11:37 AM, Chris Plummer wrote: > Nevermind. I should have looked at your full webrev. I see you already > covered this. > > thanks, > > Chris > > On 6/25/20 11:28 AM, Chris Plummer wrote: >> Hi Yumin, >> >> It looks like the fix for >> https://bugs.openjdk.java.net/browse/JDK-8174995 has this same bug. >> What do you think? >> >> thanks, >> >> Chris >> >> On 6/25/20 10:33 AM, Yumin Qi wrote: >>> Hi, please review the tiny changes for >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8203005 >>> >>> webrev:http://cr.openjdk.java.net/~minqi/2020/8203005/webrev-00/ >>> >>> >>> Summary: The change was left by >>> 8074345(https://bugs.openjdk.java.net/browse/JDK-8074345), the types >>> were wrongly put as T_ILLEGAL for T_OBJECT, and T_ILLEGAL for T_INT. >>> This has not triggered any failures yet since the types stored in >>> the type array for nofast version are never used, the used types are >>> always the original types fortunately(unfortunately either). >>> >>> >>> tests: tier1,tier2,tier3 >>> >>> >>> Thanks >>> >>> Yumin >>> >> > From daniel.daugherty at oracle.com Thu Jun 25 19:17:52 2020 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty)
Date: Thu, 25 Jun 2020 15:17:52 -0400
Subject: RFR(T) 8203005: The top-of-stack type specified for nofast_* bytecodes are wrong
In-Reply-To: <94c9db1c-f1cb-fac0-7aab-7f2f3109a961@oracle.com>
References: <94c9db1c-f1cb-fac0-7aab-7f2f3109a961@oracle.com>
Message-ID: <29b554fd-6a1c-483c-d830-cf409e984322@oracle.com>

On 6/25/20 1:33 PM, Yumin Qi wrote:
> Hi, please review the tiny changes for
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8203005
>
> webrev:http://cr.openjdk.java.net/~minqi/2020/8203005/webrev-00/

src/hotspot/share/interpreter/bytecodes.cpp
    No comments.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/interpreter/Bytecodes.java
    No comments.

Thumbs up. I agree that this is a trivial fix and you don't need to
wait 24 hours to push.

Dan

>
>
> Summary: The change was left by
> 8074345(https://bugs.openjdk.java.net/browse/JDK-8074345), the types
> were wrongly put as T_ILLEGAL for T_OBJECT, and T_ILLEGAL for T_INT.
> This has not triggered any failures yet since the types stored in the
> type array for nofast version are never used, the used types are
> always the original types fortunately(unfortunately either).
>
>
> tests: tier1,tier2,tier3
>
>
> Thanks
>
> Yumin
>

From yumin.qi at oracle.com Thu Jun 25 19:21:39 2020
From: yumin.qi at oracle.com (Yumin Qi)
Date: Thu, 25 Jun 2020 12:21:39 -0700
Subject: RFR(T) 8203005: The top-of-stack type specified for nofast_* bytecodes are wrong
In-Reply-To: <29b554fd-6a1c-483c-d830-cf409e984322@oracle.com>
References: <94c9db1c-f1cb-fac0-7aab-7f2f3109a961@oracle.com> <29b554fd-6a1c-483c-d830-cf409e984322@oracle.com>
Message-ID:

Thanks Dan for the review!

On 6/25/20 12:17 PM, Daniel D. Daugherty wrote:
> On 6/25/20 1:33 PM, Yumin Qi wrote:
>> Hi, please review the tiny changes for
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8203005
>>
>> webrev:http://cr.openjdk.java.net/~minqi/2020/8203005/webrev-00/
>
> src/hotspot/share/interpreter/bytecodes.cpp
>     No comments.
> > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/interpreter/Bytecodes.java > > ??? No comments. > > Thumbs up. I agree that this is a trivial fix and you don't need to > wait 24 hours to push. > > Dan > > > > >> >> >> Summary: The change was left by >> 8074345(https://bugs.openjdk.java.net/browse/JDK-8074345), the types >> were wrongly put as T_ILLEGAL for T_OBJECT, and T_ILLEGAL for T_INT. >> This has not triggered any failures yet since the types stored in the >> type array for nofast version are never used, the used types are >> always the original types fortunately(unfortunately either). >> >> >> tests: tier1,tier2,tier3 >> >> >> Thanks >> >> Yumin >> > From serguei.spitsyn at oracle.com Thu Jun 25 20:04:00 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Thu, 25 Jun 2020 13:04:00 -0700 Subject: RFR (S) 8247615: Initialize the bytes left for the heap sampler In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From chris.plummer at oracle.com Thu Jun 25 20:29:46 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Thu, 25 Jun 2020 13:29:46 -0700 Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp In-Reply-To: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> Message-ID: Ping. I still need one more review for this. There was one updated webev. I list it below so you don't need to dig it up in the long email thread: > I've? updated with webrev based on the new finding that a JavaThread > cannot be on the ThreadList after its OS thread has been destroyed > since the JavaThread removes itself from the ThreadList, and therefore > must be running on its OS thread. The logic of the fix is unchanged > from the first webrev, but I updated the comments to better reflect > what is going on. 
I also updated the CR: > > https://bugs.openjdk.java.net/browse/JDK-8247533 > http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html thanks, Chris On 6/17/20 1:34 PM, Chris Plummer wrote: > Hello, > > Please help review the following: > > https://bugs.openjdk.java.net/browse/JDK-8247533 > http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html > > The CR contains all the needed details. Here's a summary of changes in > each file: > > src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp > src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m > src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp > -Instead of throwing an exception when the OS ThreadID is invalid, > print a warning. > > src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c > -Improve a print_debug message > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java > > -Deal with the array of registers read in being null due to the OS > ThreadID not being valid. > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java > > -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException" appearing > twice when printing the exception. > > thanks, > > Chris From martinrb at google.com Thu Jun 25 20:38:44 2020 From: martinrb at google.com (Martin Buchholz) Date: Thu, 25 Jun 2020 13:38:44 -0700 Subject: RFR (S) 8247615: Initialize the bytes left for the heap sampler In-Reply-To: References: Message-ID: A typo: inital --- I would think you might get a few samples, so maybe check sampledEvents > numThreads/2 or am I misunderstanding? 
+ int sampledEvents = HeapMonitor.sampledEvents(); + if (sampledEvents > 0) { + throw new RuntimeException( + "Sampling the inital allocation too many times: " + sampledEvents); --- Maybe add the word "always" ? @summary Verifies the JVMTI Heap Monitor does not sample the first object. On Thu, Jun 25, 2020 at 11:06 AM Jean Christophe Beyler wrote: > > Hi all! > > I hope you are all doing well! > > I have a small review request to fix initialization of the heap sampler's byte left variable: > http://cr.openjdk.java.net/~jcbeyler/8247615/webrev.00/ > > The bug is here: > https://bugs.openjdk.java.net/browse/JDK-8247615 > > Note, this passed the submit repo testing. > > Thanks and have a great day! > Jc From daniel.daugherty at oracle.com Thu Jun 25 20:44:17 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Thu, 25 Jun 2020 16:44:17 -0400 Subject: RFR(T): 8248351: ProblemList serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java on Windows Message-ID: <65f41c30-7642-3d2f-7742-4d4bd63e87cd@oracle.com> Greetings, I'm doing another round of reduce-the-noise in the CI in preparation for the upcoming weekend... So I have another trivial review... Here's the bug for the failures: ??? JDK-8225354 serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java ??????????????? failed with Didn't get ThreadStart events in VM early start phase! ??? https://bugs.openjdk.java.net/browse/JDK-8225354 and here's the bug for the ProblemListing: ??? JDK-8248351 ProblemList serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java on Windows ??? https://bugs.openjdk.java.net/browse/JDK-8248351 Here's the context diff: $ hg diff diff -r cf65909b98c5 test/hotspot/jtreg/ProblemList.txt --- a/test/hotspot/jtreg/ProblemList.txt??? Thu Jun 25 15:00:59 2020 -0400 +++ b/test/hotspot/jtreg/ProblemList.txt??? 
Thu Jun 25 16:40:18 2020 -0400 @@ -102,6 +102,7 @@ ?serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatIntervalTest.java 8214032 generic-all ?serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatArrayCorrectnessTest.java 8224150 generic-all +serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java 8225354 windows-all ?############################################################################# Thanks, in advance, for any comments, questions or suggestions. Dan From chris.plummer at oracle.com Thu Jun 25 20:52:10 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Thu, 25 Jun 2020 13:52:10 -0700 Subject: RFR(M): 8244383: jhsdb/HeapDumpTestWithActiveProcess.java fails with "AssertionFailure: illegal bci" In-Reply-To: <6efbc900-732f-ee8b-5561-f9a813ebfeca@oracle.com> References: <28e1b453-e1ea-0a1c-0ae0-0494b52f4b71@oracle.com> <6efbc900-732f-ee8b-5561-f9a813ebfeca@oracle.com> Message-ID: <54134c71-415a-68b0-8d60-0a2229381beb@oracle.com> Ping #2. I still need one more reviewer (Thanks for the review, Dan). I updated the webrev based on Dan's comments: http://cr.openjdk.java.net/~cjplummer/8244383/webrev.01/ I can still make the simplification mentioned below if necessary. thanks, Chris On 6/23/20 11:29 AM, Chris Plummer wrote: > Ping! > > If this fix is too complicated, there is a simplification I can make, > but at the cost of abandoning some attempts to determine the current > frame when this error condition pops up. At the start of > validateInterpreterFrame() it attempts to verify that the frame is > valid by verifying that frame->method and frame->bcp are valid. This > part is pretty simple. The complicated part is everything that follows > if the verification fails. It attempts to error correct the situation > by looking at various register contents and stack contents. I could > just abandon this complicated code and return false if frame->method > and frame->bcp don't check out. 
Upon return, the caller's code would
> be simplified to:
>
>             if (validateInterpreterFrame(sp, fp, pc)) {
>               return true; // We're done. setValues() has been called for valid interpreter frame.
>             } else {
>               return checkLastJavaSP();
>             }
>
> So there's still a chance we can determine a valid current frame if
> "last java frame" has been setup. However, if not setup we would not
> be able to. This is where the complicated code in
> validateInterpreterFrame() is useful because it can usually determine
> the current frame, even if "last java frame" is not setup, but it's
> rare enough that we run into this situation that I think failing to
> get the current frame is ok.
>
> So if I can get a couple promises for reviews if I make this change,
> I'll go ahead and do it and send out a new RFR.
>
> thanks,
>
> Chris
>
> On 6/18/20 5:54 PM, Chris Plummer wrote:
>> [I've added runtime-dev to this SA review since understanding
>> interpreter invokes (code generated by
>> TemplateInterpreterGenerator::generate_normal_entry()) and stack
>> walking is probably more important than understanding SA.]
>>
>> Hello,
>>
>> Please help review the following:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8244383
>> http://cr.openjdk.java.net/~cjplummer/8244383/webrev.00/index.html
>>
>> The crux of the bug is when doing stack walking the topmost frame is
>> in an inconsistent state because we are in the middle of pushing a
>> new interpreter frame. Basically we are executing code generated by
>> TemplateInterpreterGenerator::generate_normal_entry(). Since the PC
>> register is in this code, SA assumes the topmost frame is an
>> interpreter frame.
>>
>> The first issue with this interpreter frame assumption is if we
>> haven't actually pushed the frame yet, then the current frame is the
>> caller's frame, and could be compiled.
But since SA thinks it's
>> interpreted, later on it tries to convert the frame->bcp to a BCI,
>> but frame->bcp is only valid for interpreter frames. Thus the
>> "illegal BCI" failures. If the previous frame happened to be
>> interpreted, then the existing SA code works fine.
>>
>> The other state of frame pushing that was problematic was when the
>> new frame had been pushed, but frame->method and frame->bcp were not
>> set up yet. This also would lead to "illegal BCI" later on because
>> garbage would be stored in these locations.
>>
>> Fixing the above problems requires trying to determine the state of
>> the frame push through a series of checks, and then adapting what is
>> considered to be the current frame based on the outcome of the
>> checks. The first thing checked is that frame->method is valid (we
>> can successfully instantiate a wrapper for the Method* without
>> failure) and that frame->bcp is within the method. If both these pass
>> then we can use the frame as-is.
>>
>> If the above checks fail, then we try to determine whether the issue
>> is that the frame is not yet pushed and the current frame is actually
>> compiled, or the frame has been pushed but not yet initialized. This
>> is done by first getting the return address from the stack or RAX
>> (its location depends on how far along we are in the entry code) and
>> comparing this to what is stored in frame->return_addr. If they are
>> the same, then we have pushed the frame but not yet initialized it.
>> In this case we use the previous frame (senderSP() and senderFP()) as
>> the current frame since the current frame is not yet initialized. If
>> the return address check fails, then we assume the new frame is not
>> yet pushed, and treat the current frame as compiled, even though
>> PC points into the interpreter (we replace PC with RAX in this case).
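
[Editorial sketch, not part of the original message.] The three outcomes described above can be summarized as a small decision function. This is only a minimal illustration under stated assumptions: the class name FrameGuessSketch, the enum Guess, and the method classify() are hypothetical and do not appear in the webrev or in AMD64CurrentFrameGuess.java; the real code reads these values from the target process rather than taking them as parameters.

```java
/**
 * Hypothetical sketch of the decision described in the message above.
 * None of these names come from the SA sources; they are illustrative only.
 */
final class FrameGuessSketch {

    /** Possible conclusions about the topmost frame. */
    enum Guess {
        USE_FRAME_AS_IS,     // frame->method and frame->bcp both check out
        USE_SENDER_FRAME,    // frame pushed but not yet initialized
        TREAT_AS_COMPILED    // frame not yet pushed; caller may be compiled
    }

    /**
     * @param methodValid        assumed result of validating frame->method
     * @param bcpInMethod        assumed result of checking frame->bcp is within the method
     * @param returnAddrFromStackOrRax return address read from the stack or RAX
     * @param frameReturnAddr    value stored in frame->return_addr
     */
    static Guess classify(boolean methodValid, boolean bcpInMethod,
                          long returnAddrFromStackOrRax, long frameReturnAddr) {
        // First check: a fully initialized interpreter frame can be used directly.
        if (methodValid && bcpInMethod) {
            return Guess.USE_FRAME_AS_IS;
        }
        // Matching return addresses: the frame was pushed but not initialized,
        // so the sender frame is used as the current frame.
        if (returnAddrFromStackOrRax == frameReturnAddr) {
            return Guess.USE_SENDER_FRAME;
        }
        // Otherwise assume the new frame has not been pushed yet.
        return Guess.TREAT_AS_COMPILED;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, true, 0x1000L, 0x2000L));
        System.out.println(classify(false, true, 0x1000L, 0x1000L));
        System.out.println(classify(false, false, 0x1000L, 0x2000L));
    }
}
```

The order of the checks matters in this sketch: the return-address comparison is only consulted once the method/bcp validation has failed, which mirrors the order given in the description.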
>> >> Comments in the code pretty well explain all the above, so it is >> probably easier to follow the logic in the code along with the >> comments rather than apply my above description to the code. >> >> I should add that it's very rare that we ever get into this special >> error handling code. This bug was very hard to reproduce initially. I >> was only able to make progress with reproducing and debugging by >> inserting delay loops in various spots in the code generated by >> TemplateInterpreterGenerator::generate_normal_entry(). By doing this >> I was able to reproduce the issue quite easily and hit all the logic >> in the new code I've added. >> >> The fix is basically entirely contained within >> AMD64CurrentFrameGuess.java. The rest of the changes are minor: >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java >> >> -Main fix for CR >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java >> >> -Added getInterpreterFrameBCP(), which is now needed by >> AMD64CurrentFrameGuess.java >> -I also simplified some code by using the existing >> getInterpreterFrameMethod() >> ?rather than replicating inline what it does. >> >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java >> >> -I noticed the windows version of this code had some extra checks >> that were missing >> ?from the bsd version. I then looked at the linux version, but it had >> been heavily modified >> ?a short while back to leverage DWARF info to determine frames. So I >> looked at the previous >> ?rev and it too had these extra checks. I decided to add them to the >> BSD port. I'm not sure >> ?if it helps at all, but it certainly doesn't seem to do any harm. >> >> thanks, >> >> Chris >> > > From daniel.daugherty at oracle.com Thu Jun 25 21:22:23 2020 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty)
Date: Thu, 25 Jun 2020 17:22:23 -0400
Subject: RFR(T): 8248354: ProblemList vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java
Message-ID: <5f3115db-6f04-7624-298b-037065644d03@oracle.com>

Greetings,

I'm doing another round of reduce-the-noise in the CI in preparation
for the upcoming weekend... So I have another trivial review...

Here's the bug for the failures:

    JDK-8246493 JDI stress/serial/mixed002 "nsk.share.TestBug: There are more than one(2)
                instance of 'nsk.share.jpda.StateTestThread in debuggee"
    https://bugs.openjdk.java.net/browse/JDK-8246493

and here's the bug for the ProblemListing:

    JDK-8248354 ProblemList vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java
    https://bugs.openjdk.java.net/browse/JDK-8248354

Here's the context diff:

$ hg diff
diff -r cf65909b98c5 test/hotspot/jtreg/ProblemList.txt
--- a/test/hotspot/jtreg/ProblemList.txt    Thu Jun 25 15:00:59 2020 -0400
+++ b/test/hotspot/jtreg/ProblemList.txt    Thu Jun 25 17:15:04 2020 -0400
@@ -127,6 +127,7 @@
 vmTestbase/nsk/jdi/ThreadReference/stop/stop001/TestDescription.java 7034630 generic-all
 vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses021/TestDescription.java 8065773 generic-all
 vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses023/TestDescription.java 8065773 generic-all
+vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java 8246493 generic-all
 
 vmTestbase/nsk/jdb/eval/eval001/eval001.java 8221503 generic-all

Thanks, in advance, for any comments, questions or suggestions.
Dan From chris.plummer at oracle.com Thu Jun 25 21:41:39 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Thu, 25 Jun 2020 14:41:39 -0700 Subject: RFR (Preliminary): 8248194: Need better support for running SA tests on core files Message-ID: <04c841a1-1e38-155a-5d94-5d0e5a32c708@oracle.com> Hello, Please help with a preliminary review of changes to add better support for writing SA tests that work on core files: https://bugs.openjdk.java.net/browse/JDK-8248194 http://cr.openjdk.java.net/~cjplummer/8248194/webrev.00/index.html As pointed out, this is a preliminary review. I suspect there will be some feedback for changes/improvements. Also, I still need to work out a final solution for how to get LingeredApp to produce a crash. What I currently have works but is somewhat of a hack w.r.t. the makefile change, so you can ignore the makefiile change for now. I'm working on a more proper solution with the build team. As outlined in the CR, these are the 3 main goals of this CR: 1. SATestUtils should include support for finding the core file. This includes parsing the output of the crashed process to locate where the core file was saved, and returning this location to the user. 2. SATestUtils should include support for adding the "ulimit -c unlimited" prefix to the command that will produce the core file, allowing the overriding of any lower limit so we can be sure the core file will be produced. 3. LingeredApp should include support for producing a core file. As proof of concept for these 3 changes in test library support, I'm updating the following 3 tests: ClhsdbCDSCore.java: Use the SATestUtils support listed above. This test does not use LingeredApp, so those improvements don't apply. TestJmapCore.java: Use the SATestUtils support listed above. This test does not use LingeredApp, so those improvements don't apply. ClhsdbFindPC.java: Use all the above features, including having LingeredApp produce a core file. 
This is the only test modified to start testing on core files that didn't previously do so. It still also tests on a live process. In the future more Clhsdb tests will be converted to work on core files in a manner similar to ClhsdbFindPC. The new SATestUtils code is borrowed from (more like ripped out of) ClhsdbCDSCore.java and TestJmapCore.java. They both had a lot of code dedicated to finding the core file and also applying "ulimit -c unlimitted" if necessary, but didn't do so in quite the same way. Now both these tests share code in SATestUtils.java. One thing I did drop is TestJmapCore.java use of ":KILLED_PID" in the output to help find the core file. It's no longer necessary based on the smarter core locating code I pulled from ClhsdbCDSCore.java. thanks, Chris From chris.plummer at oracle.com Thu Jun 25 21:50:54 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Thu, 25 Jun 2020 14:50:54 -0700 Subject: RFR(T): 8248354: ProblemList vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java In-Reply-To: <5f3115db-6f04-7624-298b-037065644d03@oracle.com> References: <5f3115db-6f04-7624-298b-037065644d03@oracle.com> Message-ID: <67af3b9a-cb43-b6c5-53c5-d82342adb7d9@oracle.com> Looks good. This a new failure as of June 2nd and has failed quite a bit the past 3 weeks or so. Chris On 6/25/20 2:22 PM, Daniel D. Daugherty wrote: > Greetings, > > I'm doing another round of reduce-the-noise in the CI in preparation > for the upcoming weekend... So I have another trivial review... > > Here's the bug for the failures: > > ??? JDK-8246493 JDI stress/serial/mixed002 "nsk.share.TestBug: There > are more than one(2) > ??????????????? instance of 'nsk.share.jpda.StateTestThread in debuggee" > ??? https://bugs.openjdk.java.net/browse/JDK-8246493 > > and here's the bug for the ProblemListing: > > ??? JDK-8248354 ProblemList > vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java > ??? 
https://bugs.openjdk.java.net/browse/JDK-8248354 > > Here's the context diff: > > $ hg diff > diff -r cf65909b98c5 test/hotspot/jtreg/ProblemList.txt > --- a/test/hotspot/jtreg/ProblemList.txt??? Thu Jun 25 15:00:59 2020 > -0400 > +++ b/test/hotspot/jtreg/ProblemList.txt??? Thu Jun 25 17:15:04 2020 > -0400 > @@ -127,6 +127,7 @@ > ?vmTestbase/nsk/jdi/ThreadReference/stop/stop001/TestDescription.java > 7034630 generic-all > ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses021/TestDescription.java > 8065773 generic-all > ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses023/TestDescription.java > 8065773 generic-all > +vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java > 8246493 generic-all > > ?vmTestbase/nsk/jdb/eval/eval001/eval001.java 8221503 generic-all > > Thanks, in advance, for any comments, questions or suggestions. > > Dan > > > > > From daniel.daugherty at oracle.com Thu Jun 25 21:52:04 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Thu, 25 Jun 2020 17:52:04 -0400 Subject: RFR(T): 8248354: ProblemList vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java In-Reply-To: <67af3b9a-cb43-b6c5-53c5-d82342adb7d9@oracle.com> References: <5f3115db-6f04-7624-298b-037065644d03@oracle.com> <67af3b9a-cb43-b6c5-53c5-d82342adb7d9@oracle.com> Message-ID: <16e8d2dd-52b8-6a0c-aaa8-6859ac1de288@oracle.com> Thanks for the fast review! Dan On 6/25/20 5:50 PM, Chris Plummer wrote: > Looks good. This a new failure as of June 2nd and has failed quite a > bit the past 3 weeks or so. > > Chris > > On 6/25/20 2:22 PM, Daniel D. Daugherty wrote: >> Greetings, >> >> I'm doing another round of reduce-the-noise in the CI in preparation >> for the upcoming weekend... So I have another trivial review... >> >> Here's the bug for the failures: >> >> ??? JDK-8246493 JDI stress/serial/mixed002 "nsk.share.TestBug: There >> are more than one(2) >> ??????????????? 
instance of 'nsk.share.jpda.StateTestThread in debuggee" >> ??? https://bugs.openjdk.java.net/browse/JDK-8246493 >> >> and here's the bug for the ProblemListing: >> >> ??? JDK-8248354 ProblemList >> vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >> ??? https://bugs.openjdk.java.net/browse/JDK-8248354 >> >> Here's the context diff: >> >> $ hg diff >> diff -r cf65909b98c5 test/hotspot/jtreg/ProblemList.txt >> --- a/test/hotspot/jtreg/ProblemList.txt??? Thu Jun 25 15:00:59 2020 >> -0400 >> +++ b/test/hotspot/jtreg/ProblemList.txt??? Thu Jun 25 17:15:04 2020 >> -0400 >> @@ -127,6 +127,7 @@ >> ?vmTestbase/nsk/jdi/ThreadReference/stop/stop001/TestDescription.java >> 7034630 generic-all >> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses021/TestDescription.java >> 8065773 generic-all >> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses023/TestDescription.java >> 8065773 generic-all >> +vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >> 8246493 generic-all >> >> ?vmTestbase/nsk/jdb/eval/eval001/eval001.java 8221503 generic-all >> >> Thanks, in advance, for any comments, questions or suggestions. >> >> Dan >> >> >> >> >> > > From daniel.daugherty at oracle.com Thu Jun 25 22:22:14 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Thu, 25 Jun 2020 18:22:14 -0400 Subject: RFR(T): 8248351: ProblemList serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java on Windows In-Reply-To: <65f41c30-7642-3d2f-7742-4d4bd63e87cd@oracle.com> References: <65f41c30-7642-3d2f-7742-4d4bd63e87cd@oracle.com> Message-ID: <774f5f42-5f1d-75b5-4042-d9008c60f05b@oracle.com> Any takers? This is a trivial one liner for a single platform here... Dan On 6/25/20 4:44 PM, Daniel D. Daugherty wrote: > Greetings, > > I'm doing another round of reduce-the-noise in the CI in preparation > for the upcoming weekend... So I have another trivial review... > > Here's the bug for the failures: > > ??? 
JDK-8225354 > serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java > ??????????????? failed with Didn't get ThreadStart events in VM early > start phase! > ??? https://bugs.openjdk.java.net/browse/JDK-8225354 > > and here's the bug for the ProblemListing: > > ??? JDK-8248351 ProblemList > serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java > on Windows > ??? https://bugs.openjdk.java.net/browse/JDK-8248351 > > Here's the context diff: > > $ hg diff > diff -r cf65909b98c5 test/hotspot/jtreg/ProblemList.txt > --- a/test/hotspot/jtreg/ProblemList.txt??? Thu Jun 25 15:00:59 2020 > -0400 > +++ b/test/hotspot/jtreg/ProblemList.txt??? Thu Jun 25 16:40:18 2020 > -0400 > @@ -102,6 +102,7 @@ > > ?serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatIntervalTest.java > 8214032 generic-all > ?serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatArrayCorrectnessTest.java > 8224150 generic-all > +serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java > 8225354 windows-all > > ?############################################################################# > > Thanks, in advance, for any comments, questions or suggestions. > > Dan > > > > > From igor.ignatyev at oracle.com Thu Jun 25 22:27:41 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 25 Jun 2020 15:27:41 -0700 Subject: RFR(T): 8248351: ProblemList serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java on Windows In-Reply-To: <65f41c30-7642-3d2f-7742-4d4bd63e87cd@oracle.com> References: <65f41c30-7642-3d2f-7742-4d4bd63e87cd@oracle.com> Message-ID: LGTM, -- Igor > On Jun 25, 2020, at 1:44 PM, Daniel D. Daugherty wrote: > > Greetings, > > I'm doing another round of reduce-the-noise in the CI in preparation > for the upcoming weekend... So I have another trivial review... 
> > Here's the bug for the failures: > > JDK-8225354 serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java > failed with Didn't get ThreadStart events in VM early start phase! > https://bugs.openjdk.java.net/browse/JDK-8225354 > > and here's the bug for the ProblemListing: > > JDK-8248351 ProblemList serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java on Windows > https://bugs.openjdk.java.net/browse/JDK-8248351 > > Here's the context diff: > > $ hg diff > diff -r cf65909b98c5 test/hotspot/jtreg/ProblemList.txt > --- a/test/hotspot/jtreg/ProblemList.txt Thu Jun 25 15:00:59 2020 -0400 > +++ b/test/hotspot/jtreg/ProblemList.txt Thu Jun 25 16:40:18 2020 -0400 > @@ -102,6 +102,7 @@ > > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatIntervalTest.java 8214032 generic-all > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatArrayCorrectnessTest.java 8224150 generic-all > +serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java 8225354 windows-all > > ############################################################################# > Thanks, in advance, for any comments, questions or suggestions. > > Dan > > > > > From daniel.daugherty at oracle.com Thu Jun 25 22:28:17 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Thu, 25 Jun 2020 18:28:17 -0400 Subject: RFR(T): 8248351: ProblemList serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java on Windows In-Reply-To: References: <65f41c30-7642-3d2f-7742-4d4bd63e87cd@oracle.com> Message-ID: <28bc41ee-5fd1-51a4-9ef5-d2ad5248c726@oracle.com> Thanks Igor!! Dan On 6/25/20 6:27 PM, Igor Ignatyev wrote: > LGTM, > > -- Igor > >> On Jun 25, 2020, at 1:44 PM, Daniel D. Daugherty wrote: >> >> Greetings, >> >> I'm doing another round of reduce-the-noise in the CI in preparation >> for the upcoming weekend... So I have another trivial review... 
>> >> Here's the bug for the failures: >> >> JDK-8225354 serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java >> failed with Didn't get ThreadStart events in VM early start phase! >> https://bugs.openjdk.java.net/browse/JDK-8225354 >> >> and here's the bug for the ProblemListing: >> >> JDK-8248351 ProblemList serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java on Windows >> https://bugs.openjdk.java.net/browse/JDK-8248351 >> >> Here's the context diff: >> >> $ hg diff >> diff -r cf65909b98c5 test/hotspot/jtreg/ProblemList.txt >> --- a/test/hotspot/jtreg/ProblemList.txt Thu Jun 25 15:00:59 2020 -0400 >> +++ b/test/hotspot/jtreg/ProblemList.txt Thu Jun 25 16:40:18 2020 -0400 >> @@ -102,6 +102,7 @@ >> >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatIntervalTest.java 8214032 generic-all >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatArrayCorrectnessTest.java 8224150 generic-all >> +serviceability/jvmti/ModuleAwareAgents/ThreadStart/MAAThreadStart.java 8225354 windows-all >> >> ############################################################################# >> Thanks, in advance, for any comments, questions or suggestions. >> >> Dan >> >> >> >> >> From manc at google.com Thu Jun 25 23:53:52 2020 From: manc at google.com (Man Cao) Date: Thu, 25 Jun 2020 16:53:52 -0700 Subject: RFR (S) 8247615: Initialize the bytes left for the heap sampler In-Reply-To: References: Message-ID: Thanks for fixing this! > 53 ThreadHeapSampler() : _bytes_until_sample(get_sampling_interval()) { Does this work better? (It has to be done after the initialization of _rnd.) _bytes_until_sample = pick_next_sample(); It could avoid completely missing to sample the first 512K allocation. It could also avoid the problem where every thread deterministically allocates the same object at 512K, although this is unlikely. -Man -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From suenaga at oss.nttdata.com Fri Jun 26 01:31:53 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Fri, 26 Jun 2020 10:31:53 +0900 Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake In-Reply-To: <0b70e475-ddc8-e63b-54b5-849b9f2553dd@oracle.com> References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <56591b98-214e-2066-9823-0276345efd29@oracle.com> <0b70e475-ddc8-e63b-54b5-849b9f2553dd@oracle.com> Message-ID: <02d6c032-9f73-34f7-47a8-34a2725661bf@oss.nttdata.com> Hi David, On 2020/06/25 21:48, David Holmes wrote: > Hi Yasumasa, > > On 25/06/2020 6:24 pm, Yasumasa Suenaga wrote: >> Hi David, >> >> Thanks for your comment! >> >> On 2020/06/25 14:17, David Holmes wrote: >>> Hi Yasumasa, >>> >>> Thanks for tackling this. I've had an initial look at it and have a few concerns. >>> >>> On 24/06/2020 4:50 pm, Yasumasa Suenaga wrote: >>>> Hi all, >>>> >>>> Please review this change: >>>> >>>> ?? JBS: https://bugs.openjdk.java.net/browse/JDK-8242428 >>>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/ >>> >>> Some typos: >>> >>> invaliant -> invariant >>> directry -> directly >> >> I will fix them. >> >> >>>> This change replace following VM operations to direct handshake. >>>> >>>> ??- VM_GetFrameCount (GetFrameCount()) >>>> ??- VM_GetFrameLocation (GetFrameLocation()) >>>> ??- VM_GetThreadListStackTraces (GetThreadListStackTrace()) >>>> ??- VM_GetCurrentLocation >>> >>> It would have been better to split these out into separate changes. I am finding it very hard to track through the webrev and try to compare the old safepoint based operation with the new direct handshake approach, to check they are functionally equivalent. >> >> I will separate them as following. What do you think? >> If you are ok, I will update JBS. >> >> ??- Thread operations >> ????? - VM_GetThreadListStackTraces (GetThreadListStackTrace()) >> ????? - VM_GetStackTrace(GetStackTrace())? 
<- I missed it to describe in previous mail, sorry. >> >> ??- Frame operations >> ????? - VM_GetFrameCount (GetFrameCount()) >> ????? - VM_GetFrameLocation (GetFrameLocation()) >> ????? - VM_GetCurrentLocation >> >> I will start to work when they are separated. > > If the frame operations are each small enough that will help. I updated JBS as above. >>> You are not checking the return value of Handshake::execute_direct and so are missing the possibility that the target thread has terminated before you got to do the operation on it. It isn't clear to me under what other circumstances execute_direct can also return false. >> >> I will add it. According to Handshake::execute_direct() and HandshakeOperation::do_handshake(), it seems to return false if the target thread has terminated as you said. > > Yes, but also if the handshake is not executed - but I don't know under what conditions that can occur. > >> >>> You don't seem to have these checks anymore in some places: >>> >>> ?? && !_java_thread->is_exiting() && _java_thread->threadObj() != NULL) >>> >>> why not? >> >> I thought the thread which enters handshake is always alive and it has threadObj. > > As far as I can see we can still engage in a handshake with a thread after it has marked itself as exiting. I think the handshake should not be run if its state is exiting because we can deem it as "dead". What do you think? > The threadObj() can only be null while a thread is attaching, which means it would have to checked in the general case, but for these JVM TI operations if we already have a jthread reference to the target thread then it must be beyond that point. Mind you that same logic applies to the existing code so ... > >> I will recover their conditions. 
>> (I also should recover them for GetOwnedMonitorInfoClosure and GetCurrentContendedMonitorClosure - I removed them in JDK-8242425) > > I think so - and we need to check the return value of execute_direct to determine when to report JVMTI_ERROR_THREAD_NOT_ALIVE. I will file it to JBS. We can get the result from result() in their Closures. JVMTI_ERROR_THREAD_NOT_ALIVE is set by default in GetCurrentContendedMonitorClosure, so we can get this error if the handshake is not completed. Should I check result of execute_direct() even if that? (Of course, we should fix GetOwnedMonitorInfoClosure and threadObj() check) >>> It is not clear that all the code that previously could execute at a safepoint, due to being called from a VM_Operation, is still executable at a safepoint e.g. JvmtiThreadState::count_frames() >>> >>>> GetThreadListStackTrace() uses direct handshake if thread count == 1. In other case (thread count > 1), it would be performed as VM operation (VM_GetThreadListStackTraces). >>> >>> This introduces a large chunk of duplicated code for the frame fill in and final allocation. Can you not reuse the existing logic that does this - and in the process do away with the the use of _needs_thread_state? I really wanted to see simpler code after this conversion. >>> >>> I'm also wondering whether we can hide all this logic in the closure, as was done with the VM_Operation i.e. >>> >>> *stack_info_ptr = op.stack_info(); >> >> I will try to refactor this change. >> >> >>>> Caller of VM_GetCurrentLocation (JvmtiEnvThreadState::reset_current_location()) might be called at safepoint. So I added safepoint check in its caller. >>> >>> I could not figure out what you were referring to here. >> >> I guess following callpath is available: >> >> VM_GetCurrentLocation >> ?? JvmtiEnvThreadState::reset_current_location() >> ???? JvmtiEventControllerPrivate::recompute_env_thread_enabled() >> ?????? JvmtiEventControllerPrivate::recompute_thread_enabled() >> ???????? 
JvmtiEventControllerPrivate::set_frame_pop() >> ?????????? JvmtiEventController::set_frame_pop() >> ???????????? JvmtiEnvThreadState::set_frame_pop() >> ?????????????? VM_SetFramePop::doit() >> >> However, VM_SetFramePop seems not to allow nested VM operations. > > It is the outer operation that has to allow nesting but VM_GetCurrentLocation doesn't allow it either. So if this path is possible then something is broken. I'm not sure this path would be happen. However following comments are left in the code: ``` // The java thread stack may not be walkable for a running thread // so get current location at safepoint. VM_GetCurrentLocation op(_thread); ``` Thanks, Yasumasa > Cheers, > David > >> >>>> This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/ns >>>> k/jdwp. >>> >>> Just a general comment on testing for these conversions to direct handshakes. We have no control over whether the handshake gets executed in the original thread or the target thread, so for all we know all our testing could be executing only one of the cases. This concerns me but I am not yet sure what to do about it. >>> >>> Thanks, >>> David >>> ----- >>> >>>> Also I tested it on submit repo, then it has execution error (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to dependency error. So I think it does not occur by this change. >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa From dthomson at google.com Fri Jun 26 01:53:54 2020 From: dthomson at google.com (Derek Thomson) Date: Thu, 25 Jun 2020 18:53:54 -0700 Subject: RFR (S) 8247615: Initialize the bytes left for the heap sampler In-Reply-To: References: Message-ID: > It could also avoid the problem where every thread deterministically allocates the same object at 512K, although this is unlikely. I've recently discovered that with certain server frameworks that this actually becomes quite likely! So I'd strongly recommend using pick_next_sample. 
On Thu, Jun 25, 2020 at 4:56 PM Man Cao wrote: > Thanks for fixing this! > > > 53 ThreadHeapSampler() : _bytes_until_sample(get_sampling_interval()) { > > Does this work better? (It has to be done after the initialization of > _rnd.) > _bytes_until_sample = pick_next_sample(); > > It could avoid completely missing to sample the first 512K allocation. > It could also avoid the problem where every thread deterministically > allocates the same object at 512K, although this is unlikely. > > -Man > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.holmes at oracle.com Fri Jun 26 02:20:50 2020 From: david.holmes at oracle.com (David Holmes) Date: Fri, 26 Jun 2020 12:20:50 +1000 Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake In-Reply-To: <02d6c032-9f73-34f7-47a8-34a2725661bf@oss.nttdata.com> References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <56591b98-214e-2066-9823-0276345efd29@oracle.com> <0b70e475-ddc8-e63b-54b5-849b9f2553dd@oracle.com> <02d6c032-9f73-34f7-47a8-34a2725661bf@oss.nttdata.com> Message-ID: On 26/06/2020 11:31 am, Yasumasa Suenaga wrote: > On 2020/06/25 21:48, David Holmes wrote: >>>> You are not checking the return value of Handshake::execute_direct >>>> and so are missing the possibility that the target thread has >>>> terminated before you got to do the operation on it. It isn't clear >>>> to me under what other circumstances execute_direct can also return >>>> false. >>> >>> I will add it. According to Handshake::execute_direct() and >>> HandshakeOperation::do_handshake(), it seems to return false if the >>> target thread has terminated as you said. >> >> Yes, but also if the handshake is not executed - but I don't know >> under what conditions that can occur. >> >>> >>>> You don't seem to have these checks anymore in some places: >>>> >>>> ?? && !_java_thread->is_exiting() && _java_thread->threadObj() != NULL) >>>> >>>> why not? 
>>> >>> I thought the thread which enters handshake is always alive and it >>> has threadObj. >> >> As far as I can see we can still engage in a handshake with a thread >> after it has marked itself as exiting. > > I think the handshake should not be run if its state is exiting because > we can deem it as "dead". > What do you think? The thread is marked as exiting fairly early in its termination path and can still interact with oops after that point so we must continue to obey all safety protocols in relation to handshakes and safepoints. I think it is up to the handshake operation to check that the target thread is in a suitable state for processing. > >> The threadObj() can only be null while a thread is attaching, which >> means it would have to checked in the general case, but for these JVM >> TI operations if we already have a jthread reference to the target >> thread then it must be beyond that point. Mind you that same logic >> applies to the existing code so ... >> >>> I will recover their conditions. >>> (I also should recover them for GetOwnedMonitorInfoClosure and >>> GetCurrentContendedMonitorClosure - I removed them in JDK-8242425) >> >> I think so - and we need to check the return value of execute_direct >> to determine when to report JVMTI_ERROR_THREAD_NOT_ALIVE. > > I will file it to JBS. > We can get the result from result() in their Closures. > JVMTI_ERROR_THREAD_NOT_ALIVE is set by default in > GetCurrentContendedMonitorClosure, so we can get this error if the > handshake is not completed. > Should I check result of execute_direct() even if that? > (Of course, we should fix GetOwnedMonitorInfoClosure and threadObj() check) I think it is a little bit too subtle to rely on a default setting for the result (and begs the question why GetOwnedMonitorInfoClosure doesn't also set result to JVMTI_ERROR_THREAD_NOT_ALIVE?). 
I think we should be establishing a common pattern for writing these Handshake closures and the related operation, in a clear, correct way. > >>>> It is not clear that all the code that previously could execute at a >>>> safepoint, due to being called from a VM_Operation, is still >>>> executable at a safepoint e.g. JvmtiThreadState::count_frames() >>>> >>>>> GetThreadListStackTrace() uses direct handshake if thread count == >>>>> 1. In other case (thread count > 1), it would be performed as VM >>>>> operation (VM_GetThreadListStackTraces). >>>> >>>> This introduces a large chunk of duplicated code for the frame fill >>>> in and final allocation. Can you not reuse the existing logic that >>>> does this - and in the process do away with the the use of >>>> _needs_thread_state? I really wanted to see simpler code after this >>>> conversion. >>>> >>>> I'm also wondering whether we can hide all this logic in the >>>> closure, as was done with the VM_Operation i.e. >>>> >>>> *stack_info_ptr = op.stack_info(); >>> >>> I will try to refactor this change. >>> >>> >>>>> Caller of VM_GetCurrentLocation >>>>> (JvmtiEnvThreadState::reset_current_location()) might be called at >>>>> safepoint. So I added safepoint check in its caller. >>>> >>>> I could not figure out what you were referring to here. >>> >>> I guess following callpath is available: >>> >>> VM_GetCurrentLocation >>> ?? JvmtiEnvThreadState::reset_current_location() >>> ???? JvmtiEventControllerPrivate::recompute_env_thread_enabled() >>> ?????? JvmtiEventControllerPrivate::recompute_thread_enabled() >>> ???????? JvmtiEventControllerPrivate::set_frame_pop() >>> ?????????? JvmtiEventController::set_frame_pop() >>> ???????????? JvmtiEnvThreadState::set_frame_pop() >>> ?????????????? VM_SetFramePop::doit() >>> >>> However, VM_SetFramePop seems not to allow nested VM operations. >> >> It is the outer operation that has to allow nesting but >> VM_GetCurrentLocation doesn't allow it either. 
So if this path is
>> possible then something is broken.
>
> I'm not sure this path would be happen. However following comments are
> left in the code:
>
> ```
>       // The java thread stack may not be walkable for a running thread
>       // so get current location at safepoint.
>       VM_GetCurrentLocation op(_thread);
> ```

I'm not sure what point is being made with all this. It's not safe to ask a running thread for the current location, so it must be done via a safepoint VM operation, or now a direct handshake operation with the thread.

Thanks,
David
-----

>
> Thanks,
>
> Yasumasa

From suenaga at oss.nttdata.com Fri Jun 26 02:55:59 2020
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Fri, 26 Jun 2020 11:55:59 +0900
Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake
In-Reply-To:
References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <56591b98-214e-2066-9823-0276345efd29@oracle.com> <0b70e475-ddc8-e63b-54b5-849b9f2553dd@oracle.com> <02d6c032-9f73-34f7-47a8-34a2725661bf@oss.nttdata.com>
Message-ID:

Hi David,

>> We can get the result from result() in their Closures. JVMTI_ERROR_THREAD_NOT_ALIVE is set by default in GetCurrentContendedMonitorClosure, so we can get this error if the handshake is not completed.
>> Should I check result of execute_direct() even if that?
>> (Of course, we should fix GetOwnedMonitorInfoClosure and threadObj() check)
>
> I think it is a little bit too subtle to rely on a default setting for the result (and begs the question why GetOwnedMonitorInfoClosure doesn't also set result to JVMTI_ERROR_THREAD_NOT_ALIVE?). I think we should be establishing a common pattern for writing these Handshake closures and the related operation, in a clear, correct way.

OK, I will check the result from execute_direct(). I filed this issue as JDK-8248379. I will fix this first, and then start on the other enhancements in the same way.
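
A minimal, self-contained sketch of the pattern discussed above, under loud assumptions: every type and function here (`SketchThread`, `GetFrameCountSketch`, `execute_direct_sketch`, `get_frame_count_sketch`) is a hypothetical model and none of it is HotSpot code. It only illustrates the two checks: the closure validates the target's state itself, and the caller inspects the return value of the handshake call explicitly instead of relying on the closure's default result.

```cpp
// Hypothetical error codes modelling JVMTI_ERROR_NONE / JVMTI_ERROR_THREAD_NOT_ALIVE.
enum SketchError { SKETCH_ERROR_NONE, SKETCH_ERROR_THREAD_NOT_ALIVE };

// Hypothetical model of a target thread; not JavaThread.
struct SketchThread {
  bool terminated = false;      // thread is already gone, handshake cannot run
  bool exiting = false;         // models is_exiting()
  bool has_thread_obj = true;   // models threadObj() != NULL
  int frame_count = 0;          // the data the operation wants to read
};

// Hypothetical closure in the style of a handshake closure.
class GetFrameCountSketch {
 public:
  // Runs against the target when the handshake is executed. The closure
  // checks that the target is in a suitable state before touching it.
  void do_thread(SketchThread* target) {
    if (target->exiting || !target->has_thread_obj) {
      _result = SKETCH_ERROR_THREAD_NOT_ALIVE;
      return;
    }
    _count = target->frame_count;
    _result = SKETCH_ERROR_NONE;
  }

  SketchError result() const { return _result; }
  int count() const { return _count; }

 private:
  // A safe default, but the caller below does not depend on it.
  SketchError _result = SKETCH_ERROR_THREAD_NOT_ALIVE;
  int _count = 0;
};

// Models a direct handshake: returns false when the closure was not executed
// (here, only when the target has already terminated).
bool execute_direct_sketch(GetFrameCountSketch* op, SketchThread* target) {
  if (target->terminated) {
    return false;
  }
  op->do_thread(target);
  return true;
}

// Caller side: check the handshake's return value first, then the closure result.
SketchError get_frame_count_sketch(SketchThread* target, int* count_ptr) {
  GetFrameCountSketch op;
  if (!execute_direct_sketch(&op, target)) {
    return SKETCH_ERROR_THREAD_NOT_ALIVE;
  }
  if (op.result() == SKETCH_ERROR_NONE) {
    *count_ptr = op.count();
  }
  return op.result();
}
```

Writing every closure and its caller in this shape would make the "thread not alive" case visible at the call site, whatever default a particular closure happens to use.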
Thanks, Yasumasa On 2020/06/26 11:20, David Holmes wrote: > > > On 26/06/2020 11:31 am, Yasumasa Suenaga wrote: >> On 2020/06/25 21:48, David Holmes wrote: >>>>> You are not checking the return value of Handshake::execute_direct and so are missing the possibility that the target thread has terminated before you got to do the operation on it. It isn't clear to me under what other circumstances execute_direct can also return false. >>>> >>>> I will add it. According to Handshake::execute_direct() and HandshakeOperation::do_handshake(), it seems to return false if the target thread has terminated as you said. >>> >>> Yes, but also if the handshake is not executed - but I don't know under what conditions that can occur. >>> >>>> >>>>> You don't seem to have these checks anymore in some places: >>>>> >>>>> ?? && !_java_thread->is_exiting() && _java_thread->threadObj() != NULL) >>>>> >>>>> why not? >>>> >>>> I thought the thread which enters handshake is always alive and it has threadObj. >>> >>> As far as I can see we can still engage in a handshake with a thread after it has marked itself as exiting. >> >> I think the handshake should not be run if its state is exiting because we can deem it as "dead". >> What do you think? > > The thread is marked as exiting fairly early in its termination path and can still interact with oops after that point so we must continue to obey all safety protocols in relation to handshakes and safepoints. I think it is up to the handshake operation to check that the target thread is in a suitable state for processing. > >> >>> The threadObj() can only be null while a thread is attaching, which means it would have to checked in the general case, but for these JVM TI operations if we already have a jthread reference to the target thread then it must be beyond that point. Mind you that same logic applies to the existing code so ... >>> >>>> I will recover their conditions. 
>>>> (I also should recover them for GetOwnedMonitorInfoClosure and GetCurrentContendedMonitorClosure - I removed them in JDK-8242425) >>> >>> I think so - and we need to check the return value of execute_direct to determine when to report JVMTI_ERROR_THREAD_NOT_ALIVE. >> >> I will file it to JBS. >> We can get the result from result() in their Closures. JVMTI_ERROR_THREAD_NOT_ALIVE is set by default in GetCurrentContendedMonitorClosure, so we can get this error if the handshake is not completed. >> Should I check result of execute_direct() even if that? >> (Of course, we should fix GetOwnedMonitorInfoClosure and threadObj() check) > > I think it is a little bit too subtle to rely on a default setting for the result (and begs the question why GetOwnedMonitorInfoClosure doesn't also set result to JVMTI_ERROR_THREAD_NOT_ALIVE?). I think we should be establishing a common pattern for writing these Handshake closures and the related operation, in a clear, correct way. > >> >>>>> It is not clear that all the code that previously could execute at a safepoint, due to being called from a VM_Operation, is still executable at a safepoint e.g. JvmtiThreadState::count_frames() >>>>> >>>>>> GetThreadListStackTrace() uses direct handshake if thread count == 1. In other case (thread count > 1), it would be performed as VM operation (VM_GetThreadListStackTraces). >>>>> >>>>> This introduces a large chunk of duplicated code for the frame fill in and final allocation. Can you not reuse the existing logic that does this - and in the process do away with the the use of _needs_thread_state? I really wanted to see simpler code after this conversion. >>>>> >>>>> I'm also wondering whether we can hide all this logic in the closure, as was done with the VM_Operation i.e. >>>>> >>>>> *stack_info_ptr = op.stack_info(); >>>> >>>> I will try to refactor this change. >>>> >>>> >>>>>> Caller of VM_GetCurrentLocation (JvmtiEnvThreadState::reset_current_location()) might be called at safepoint. 
So I added safepoint check in its caller. >>>>> >>>>> I could not figure out what you were referring to here. >>>> >>>> I guess following callpath is available: >>>> >>>> VM_GetCurrentLocation >>>> ?? JvmtiEnvThreadState::reset_current_location() >>>> ???? JvmtiEventControllerPrivate::recompute_env_thread_enabled() >>>> ?????? JvmtiEventControllerPrivate::recompute_thread_enabled() >>>> ???????? JvmtiEventControllerPrivate::set_frame_pop() >>>> ?????????? JvmtiEventController::set_frame_pop() >>>> ???????????? JvmtiEnvThreadState::set_frame_pop() >>>> ?????????????? VM_SetFramePop::doit() >>>> >>>> However, VM_SetFramePop seems not to allow nested VM operations. >>> >>> It is the outer operation that has to allow nesting but VM_GetCurrentLocation doesn't allow it either. So if this path is possible then something is broken. >> >> I'm not sure this path would be happen. However following comments are left in the code: >> >> ``` >> ?????? // The java thread stack may not be walkable for a running thread >> ?????? // so get current location at safepoint. >> ?????? VM_GetCurrentLocation op(_thread); >> ``` > > I'm not sure what point is being made with all this. It's not safe to ask a running thread for the current location, so it must be done via a safepoint VM operation, or now a direct handshake operation with the thread. > > Thanks, > David > ----- > >> >> Thanks, >> >> Yasumasa From serguei.spitsyn at oracle.com Fri Jun 26 04:48:35 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Thu, 25 Jun 2020 21:48:35 -0700 Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake In-Reply-To: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> Message-ID: <466fae56-e60b-c4ed-71c1-ec19d9886e97@oracle.com> An HTML attachment was scrubbed... 
URL: From suenaga at oss.nttdata.com Fri Jun 26 05:02:14 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Fri, 26 Jun 2020 14:02:14 +0900 Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake In-Reply-To: <466fae56-e60b-c4ed-71c1-ec19d9886e97@oracle.com> References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <466fae56-e60b-c4ed-71c1-ec19d9886e97@oracle.com> Message-ID: Hi Serguei, Thanks for your comment! I will fix them after JDK-8248379. Yasumasa On 2020/06/26 13:48, serguei.spitsyn at oracle.com wrote: > Hi Yasumasa, > > I agree with the approach to separate this into different bugs. > At least, it would be nice to separate the stack trace functions. > It will help to better focus on each fix and improve review quality. > > I'd wait for new webrevs from you before diving deep with reviewing. > A couple of quick comments. > > http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/src/hotspot/share/prims/jvmtiEnv.cpp.udiff.html > > I also do not like the complexity of the stack trace update and extra boolean argument in GetStackTraceClosure. > It feels like it can be simpler. > I'd suggest to do some renaming as the identifiers you use are not typical in the jvmtiEnv.cpp: > target_javathread  => java_thread > actual_frame_count => frame_count > > The GetStackTraceClosure is better to have the same stack_info() function as VM_op: >    *stack_info_ptr = op.stack_info(); > At least, this part will be unified between VM_op and HandshakeClosure. > > http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/src/hotspot/share/prims/jvmtiThreadState.cpp.udiff.html > > - (JavaThread *)Thread::current() == get_thread(), > - "must be current thread or at safepoint"); > + current_thread == get_thread() || > + (current_thread->is_Java_thread() && (current_thread == get_thread()->active_handshaker())), > + "must be at safepoint or target thread is suspended"); > > > There is no check that the target thread is suspended. 
> You, probably, wanted to say about handshake instead. > > > Thanks, > Serguei > > > On 6/23/20 23:50, Yasumasa Suenaga wrote: >> Hi all, >> >> Please review this change: >> >>   JBS: https://bugs.openjdk.java.net/browse/JDK-8242428 >>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/ >> >> This change replace following VM operations to direct handshake. >> >>  - VM_GetFrameCount (GetFrameCount()) >>  - VM_GetFrameLocation (GetFrameLocation()) >>  - VM_GetThreadListStackTraces (GetThreadListStackTrace()) >>  - VM_GetCurrentLocation >> >> GetThreadListStackTrace() uses direct handshake if thread count == 1. In other case (thread count > 1), it would be performed as VM operation (VM_GetThreadListStackTraces). >> Caller of VM_GetCurrentLocation (JvmtiEnvThreadState::reset_current_location()) might be called at safepoint. So I added safepoint check in its caller. >> >> This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/nsk/jdwp. >> >> Also I tested it on submit repo, then it has execution error (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to dependency error. So I think it does not occur by this change. >> >> >> Thanks, >> >> Yasumasa > From suenaga at oss.nttdata.com Fri Jun 26 07:03:01 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Fri, 26 Jun 2020 16:03:01 +0900 Subject: RFR: 8248379: Handshake closures for JVMTI monitor functions lack of some validations Message-ID: <46a1e780-6ca6-becb-c3c1-b4e6f2c9a8a8@oss.nttdata.com> Hi all, Please review this change. JBS: https://bugs.openjdk.java.net/browse/JDK-8248379 webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8248379/webrev.00/ JDK-8242425 introduces to migrate to thread local handshake from VM operation for GetOwnedMonitorInfo, GetOwnedMonitorStackDepthInfo, and GetCurrentContendedMonitor JVMTI functions. However it lacks of validations for thread state and thread oop of the target. 
This change has been tested on submit repo and serviceability/jvmti, serviceability/jdwp vmTestbase/nsk/jvmti, vmTestbase/nsk/jdi vmTestbase/nsk/jdwp. On submit repo, tools/javac/7118412/ShadowingTest.java and java/foreign/TestMismatch.java were failed (mach5-one-ysuenaga-JDK-8248379-20200626-0503-12110818). However they do not seems to be related to this change. (Both tests have been passed on my Linux AMD64) Thanks, Yasumasa From serguei.spitsyn at oracle.com Fri Jun 26 16:40:23 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 26 Jun 2020 09:40:23 -0700 Subject: RFR: 8248379: Handshake closures for JVMTI monitor functions lack of some validations In-Reply-To: <46a1e780-6ca6-becb-c3c1-b4e6f2c9a8a8@oss.nttdata.com> References: <46a1e780-6ca6-becb-c3c1-b4e6f2c9a8a8@oss.nttdata.com> Message-ID: An HTML attachment was scrubbed... URL: From david.holmes at oracle.com Fri Jun 26 22:50:02 2020 From: david.holmes at oracle.com (David Holmes) Date: Sat, 27 Jun 2020 08:50:02 +1000 Subject: RFR: 8248379: Handshake closures for JVMTI monitor functions lack of some validations In-Reply-To: References: <46a1e780-6ca6-becb-c3c1-b4e6f2c9a8a8@oss.nttdata.com> Message-ID: Hi Serguei, On 27/06/2020 2:40 am, serguei.spitsyn at oracle.com wrote: > Hi Yasumasa, > > I see, some VM_op's also have this check: > > 1546 ThreadsListHandle tlh; > 1547 if (jt != NULL && tlh.includes(jt) > > > I wonder if it make sense to add as well. If you are executing the handshake operation then you are in a handshake with the target thread which means it must exist in some ThreadsList. Cheers, David ----- > Otherwise, it looks good to me. > > Thanks, > Serguei > > On 6/26/20 00:03, Yasumasa Suenaga wrote: >> Hi all, >> >> Please review this change. >> >> ? JBS: https://bugs.openjdk.java.net/browse/JDK-8248379 >> ? 
webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8248379/webrev.00/ >> >> JDK-8242425 introduces to migrate to thread local handshake from VM >> operation for GetOwnedMonitorInfo, GetOwnedMonitorStackDepthInfo, and >> GetCurrentContendedMonitor JVMTI functions. However it lacks of >> validations for thread state and thread oop of the target. >> >> This change has been tested on submit repo and serviceability/jvmti, >> serviceability/jdwp vmTestbase/nsk/jvmti, vmTestbase/nsk/jdi >> vmTestbase/nsk/jdwp. >> On submit repo, tools/javac/7118412/ShadowingTest.java and >> java/foreign/TestMismatch.java were failed >> (mach5-one-ysuenaga-JDK-8248379-20200626-0503-12110818). However they >> do not seems to be related to this change. >> (Both tests have been passed on my Linux AMD64) >> >> >> Thanks, >> >> Yasumasa > From chris.plummer at oracle.com Fri Jun 26 23:03:40 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Fri, 26 Jun 2020 16:03:40 -0700 Subject: [15] RFR(XXS): 7107012: sun.jvm.hostspot.code.CompressedReadStream readDouble() conversion to long mishandled Message-ID: <151408c7-a9b7-7c02-d3e3-fc4155c1152f@oracle.com> Hello, Please help review the following: http://cr.openjdk.java.net/~cjplummer/7107012/webrev.00/index.html https://bugs.openjdk.java.net/browse/JDK-7107012 This bug is filed as confidential, although the issue is trivial. In the following line of code:     return Double.longBitsToDouble((h << 32) | ((long)l & 0x00000000FFFFFFFFL)); Since h is an int, it's subject to the following: https://docs.oracle.com/javase/specs/jls/se14/html/jls-15.html#jls-15.19 "If the promoted type of the left-hand operand is int, then only the five lowest-order bits of the right-hand operand are used as the shift distance. It is as if the right-hand operand were subjected to a bitwise logical AND operator & (§15.22.1) with the mask value 0x1f (0b11111). The shift distance actually used is therefore always in the range 0 to 31, inclusive." 
So (h << 32) is the same as (h << 0), which is not what was intended. The spec also calls out another issue: "The type of the shift expression is the promoted type of the left-hand operand." So even if it did left shift 32 bits, the result would have been truncated to an int, meaning the result would always be 0. The fix is to first cast h to a long. Doing this addresses both these problems, allowing a full 32 bit left shift to be done, and leaving the result as an untruncated long. I was unable to trigger use of this code in SA. It seems to be used to pull locals out of a CompiledVFrame. I don't see any clhsdb paths to this code. It appears the GUI hsdb uses it via a complex call path I could not fully decipher, but I could not trigger its use from hsdb. In any case, the fix is straightforward and trivial, so I'd rather not have to spend more time digging deeper into its use and providing a test case. thanks, Chris From serguei.spitsyn at oracle.com Fri Jun 26 23:51:51 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 26 Jun 2020 16:51:51 -0700 Subject: RFR: 8248379: Handshake closures for JVMTI monitor functions lack of some validations In-Reply-To: References: <46a1e780-6ca6-becb-c3c1-b4e6f2c9a8a8@oss.nttdata.com> Message-ID: Hi David, Thank you for clarification. Thanks, Serguei On 6/26/20 15:50, David Holmes wrote: > Hi Serguei, > > On 27/06/2020 2:40 am, serguei.spitsyn at oracle.com wrote: >> Hi Yasumasa, >> >> I see, some VM_op's also have this check: >> >> 1546   ThreadsListHandle tlh; >> 1547   if (jt != NULL && tlh.includes(jt) >> >> >> I wonder if it make sense to add as well. > > If you are executing the handshake operation then you are in a > handshake with the target thread which means it must exist in some > ThreadsList. > > Cheers, > David > ----- > >> Otherwise, it looks good to me. >> >> Thanks, >> Serguei >> >> On 6/26/20 00:03, Yasumasa Suenaga wrote: >>> Hi all, >>> >>> Please review this change. >>> >>>   
JBS: https://bugs.openjdk.java.net/browse/JDK-8248379 >>> ? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8248379/webrev.00/ >>> >>> JDK-8242425 introduces to migrate to thread local handshake from VM >>> operation for GetOwnedMonitorInfo, GetOwnedMonitorStackDepthInfo, >>> and GetCurrentContendedMonitor JVMTI functions. However it lacks of >>> validations for thread state and thread oop of the target. >>> >>> This change has been tested on submit repo and serviceability/jvmti, >>> serviceability/jdwp vmTestbase/nsk/jvmti, vmTestbase/nsk/jdi >>> vmTestbase/nsk/jdwp. >>> On submit repo, tools/javac/7118412/ShadowingTest.java and >>> java/foreign/TestMismatch.java were failed >>> (mach5-one-ysuenaga-JDK-8248379-20200626-0503-12110818). However >>> they do not seems to be related to this change. >>> (Both tests have been passed on my Linux AMD64) >>> >>> >>> Thanks, >>> >>> Yasumasa >> From serguei.spitsyn at oracle.com Sat Jun 27 00:40:52 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 26 Jun 2020 17:40:52 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com> <76c58309-cd04-f205-ca09-46a117590fad@oracle.com> Message-ID: <9e5dbaf6-9a08-951a-61f2-c50300b79457@oracle.com> On 6/25/20 11:07, Alan Bateman wrote: > On 25/06/2020 17:17, 
serguei.spitsyn at oracle.com wrote: >> >> New webrev version is: >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.2/ >> > One inconsistency is that it uses getDeclaredMethod to find the 2-arg > premain and getMethod to find the 1-arg premain. The latter will fail > if the method is not public so you won't get the nice exception > message. I wonder if we could fix this at the same time. The implementation has this order of lookup:         // The agent class must have a premain or agentmain method that         // has 1 or 2 arguments. We check in the following order:         //         // 1) declared with a signature of (String, Instrumentation)         // 2) declared with a signature of (String)         // 3) inherited with a signature of (String, Instrumentation)         // 4) inherited with a signature of (String) The declared methods are gotten with the getDeclaredMethod and inherited with the getMethod. This works for both 1-arg and 2-arg premain methods, so I'm not sure what is inconsistent. Or you have a concern there can be a non-nice NoSuchMethodException? In fact, I don't understand why there is a need to use the getDeclaredMethod. As I see, the getMethod should return a declared method first, and only if it is absent then it checks for an inherited one. Thanks, Serguei > > -Alan. From serguei.spitsyn at oracle.com Sat Jun 27 00:51:20 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 26 Jun 2020 17:51:20 -0700 Subject: [15] RFR(XXS): 7107012: sun.jvm.hostspot.code.CompressedReadStream readDouble() conversion to long mishandled In-Reply-To: <151408c7-a9b7-7c02-d3e3-fc4155c1152f@oracle.com> References: <151408c7-a9b7-7c02-d3e3-fc4155c1152f@oracle.com> Message-ID: <438b80e5-5094-69e6-c12a-b712be33a6a6@oracle.com> Hi Chris, The fix looks good. I would most likely overlook such a bug with my eyes. 
:) Thanks, Serguei On 6/26/20 16:03, Chris Plummer wrote: > Hello, > > Please help review the following: > > http://cr.openjdk.java.net/~cjplummer/7107012/webrev.00/index.html > https://bugs.openjdk.java.net/browse/JDK-7107012 > > This bug is filed as confidential, although the issue is trivial. In > the following line of code: > >     return Double.longBitsToDouble((h << 32) | ((long)l & > 0x00000000FFFFFFFFL)); > > Since h is an int, it's subject to the following: > > https://docs.oracle.com/javase/specs/jls/se14/html/jls-15.html#jls-15.19 > > "If the promoted type of the left-hand operand is int, then only the > five lowest-order bits of the right-hand operand are used as the shift > distance. It is as if the right-hand operand were subjected to a > bitwise logical AND operator & (§15.22.1) with the mask value 0x1f > (0b11111). The shift distance actually used is therefore always in the > range 0 to 31, inclusive." > > So (h << 32) is the same as (h << 0), which is not what was intended. > The spec also calls out another issue: > > "The type of the shift expression is the promoted type of the > left-hand operand." > > So even if it did left shift 32 bits, the result would have been > truncated to an int, meaning the result would always be 0. The fix is > to first cast h to a long. Doing this addresses both these problems, > allowing a full 32 bit left shift to be done, and leaving the result > as an untruncated long. > > I was unable to trigger use of this code in SA. It seems to be used to > pull locals out of a CompiledVFrame. I don't see any clhsdb paths to > this code. It appears the GUI hsdb uses it via a complex call path I > could not fully decipher, but I could not trigger its use from hsdb. > In any case, the fix is straightforward and trivial, so I'd rather > not have to spend more time digging deeper into its use and providing > a test case. 
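The int-shift behaviour described in the quoted message can be reproduced in isolation. The following is only a minimal sketch: the class ShiftDemo and the methods combineBuggy/combineFixed are hypothetical names made up for illustration and are not the SA CompressedReadStream code; only the two expressions mirror what the thread discusses (the original expression, and the same expression with h cast to long first).

```java
public class ShiftDemo {
    // Same shape as the expression quoted in the thread: h is an int, so the
    // shift distance 32 is masked with 0x1f and becomes 0 (JLS 15.19).
    public static long combineBuggy(int h, int l) {
        return (h << 32) | ((long) l & 0x00000000FFFFFFFFL);
    }

    // The fix described in the thread: widen h to long before shifting, so the
    // shift distance is masked with 0x3f and all 32 bits are honoured.
    public static long combineFixed(int h, int l) {
        return ((long) h << 32) | ((long) l & 0x00000000FFFFFFFFL);
    }

    public static void main(String[] args) {
        int h = 0x3FF00000; // high 32 bits of the IEEE 754 encoding of 1.0
        int l = 0;          // low 32 bits of the same encoding
        // prints "3ff00000": h was not shifted at all
        System.out.println(Long.toHexString(combineBuggy(h, l)));
        // prints "1.0": the two halves are reassembled correctly
        System.out.println(Double.longBitsToDouble(combineFixed(h, l)));
    }
}
```

With the buggy form the high word simply lands in the low 32 bits of the result (OR-ed with l), which is why the decoded double is wrong rather than an obvious failure.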
> > thanks, > > Chris > From suenaga at oss.nttdata.com Sat Jun 27 01:54:31 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Sat, 27 Jun 2020 10:54:31 +0900 Subject: RFR: 8248379: Handshake closures for JVMTI monitor functions lack of some validations In-Reply-To: References: <46a1e780-6ca6-becb-c3c1-b4e6f2c9a8a8@oss.nttdata.com> Message-ID: <592326cc-8be4-b804-eb06-54c068330297@oss.nttdata.com> Hi Serguei, David, >> If you are executing the handshake operation then you are in a handshake with the target thread which means it must exist in some ThreadsList. Yes, it is checked at Handshake::execute_direct(). ``` 349 ThreadsListHandle tlh; 350 if (tlh.includes(target)) { 351 target->set_handshake_operation(&op); 352 } else { 353 log_handshake_info(start_time_ns, op.name(), 0, 0, "(thread dead)"); 354 return false; 355 } ``` Serguei, can I list you as a Reviewer? If so, I will push this change when I got second reviewer. Thanks, Yasumasa On 2020/06/27 8:51, serguei.spitsyn at oracle.com wrote: > Hi David, > > Thank you for clarification. > > Thanks, > Serguei > > > On 6/26/20 15:50, David Holmes wrote: >> Hi Serguei, >> >> On 27/06/2020 2:40 am, serguei.spitsyn at oracle.com wrote: >>> Hi Yasumasa, >>> >>> I see, some VM_op's also have this check: >>> >>> 1546?? ThreadsListHandle tlh; >>> 1547?? if (jt != NULL && tlh.includes(jt) >>> >>> >>> I wonder if it make sense to add as well. >> >> If you are executing the handshake operation then you are in a handshake with the target thread which means it must exist in some ThreadsList. >> >> Cheers, >> David >> ----- >> >>> Otherwise, it looks good to me. >>> >>> Thanks, >>> Serguei >>> >>> On 6/26/20 00:03, Yasumasa Suenaga wrote: >>>> Hi all, >>>> >>>> Please review this change. >>>> >>>> ? JBS: https://bugs.openjdk.java.net/browse/JDK-8248379 >>>> ? 
webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8248379/webrev.00/ >>>> >>>> JDK-8242425 introduces to migrate to thread local handshake from VM operation for GetOwnedMonitorInfo, GetOwnedMonitorStackDepthInfo, and GetCurrentContendedMonitor JVMTI functions. However it lacks of validations for thread state and thread oop of the target. >>>> >>>> This change has been tested on submit repo and serviceability/jvmti, serviceability/jdwp vmTestbase/nsk/jvmti, vmTestbase/nsk/jdi vmTestbase/nsk/jdwp. >>>> On submit repo, tools/javac/7118412/ShadowingTest.java and java/foreign/TestMismatch.java were failed (mach5-one-ysuenaga-JDK-8248379-20200626-0503-12110818). However they do not seems to be related to this change. >>>> (Both tests have been passed on my Linux AMD64) >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>> > From leonid.mesnik at oracle.com Sat Jun 27 02:51:53 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Fri, 26 Jun 2020 19:51:53 -0700 Subject: RFR (Preliminary): 8248194: Need better support for running SA tests on core files In-Reply-To: <04c841a1-1e38-155a-5d94-5d0e5a32c708@oracle.com> References: <04c841a1-1e38-155a-5d94-5d0e5a32c708@oracle.com> Message-ID: <8BD2E0F8-EE7B-4308-A058-2EC235D5212F@oracle.com> Hi The idea basically looks good. I think it just make a sense to polish it a little bit to hide "sh" usage from test and get core from OutputAnalyzer. Also, there is a 'CrashApp' in ClhsdbCDSCore.java. Makes it sense to unify it with LingeredApp crasher? Currently, it uses Unsafe to crash application. Also, crashes are used in other tests, I see some implementations in open/test/hotspot/jtreg/vmTestbase/vm/share/vmcrasher So it would be nice to have some common way to crash hotspot. 
Leonid > On Jun 25, 2020, at 2:41 PM, Chris Plummer wrote: > > Hello, > > Please help with a preliminary review of changes to add better support for writing SA tests that work on core files: > > https://bugs.openjdk.java.net/browse/JDK-8248194 > http://cr.openjdk.java.net/~cjplummer/8248194/webrev.00/index.html > > As pointed out, this is a preliminary review. I suspect there will be some feedback for changes/improvements. Also, I still need to work out a final solution for how to get LingeredApp to produce a crash. What I currently have works but is somewhat of a hack w.r.t. the makefile change, so you can ignore the makefile change for now. I'm working on a more proper solution with the build team. > > As outlined in the CR, these are the 3 main goals of this CR: > > 1. SATestUtils should include support for finding the core file. This includes parsing the output of the crashed process to locate where the core file was saved, and returning this location to the user. > > 2. SATestUtils should include support for adding the "ulimit -c unlimited" prefix to the command that will produce the core file, allowing the overriding of any lower limit so we can be sure the core file will be produced. > > 3. LingeredApp should include support for producing a core file. > > As proof of concept for these 3 changes in test library support, I'm updating the following 3 tests: > > ClhsdbCDSCore.java: Use the SATestUtils support listed above. This test does not use LingeredApp, so those improvements don't apply. > > TestJmapCore.java: Use the SATestUtils support listed above. This test does not use LingeredApp, so those improvements don't apply. > > ClhsdbFindPC.java: Use all the above features, including having LingeredApp produce a core file. This is the only test modified to start testing on core files that didn't previously do so. It still also tests on a live process. 
> > In the future more Clhsdb tests will be converted to work on core files in a manner similar to ClhsdbFindPC. > > The new SATestUtils code is borrowed from (more like ripped out of) ClhsdbCDSCore.java and TestJmapCore.java. They both had a lot of code dedicated to finding the core file and also applying "ulimit -c unlimited" if necessary, but didn't do so in quite the same way. Now both these tests share code in SATestUtils.java. One thing I did drop is TestJmapCore.java use of ":KILLED_PID" in the output to help find the core file. It's no longer necessary based on the smarter core locating code I pulled from ClhsdbCDSCore.java. > > thanks, > > Chris From chris.plummer at oracle.com Sat Jun 27 03:42:07 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Fri, 26 Jun 2020 20:42:07 -0700 Subject: RFR (Preliminary): 8248194: Need better support for running SA tests on core files In-Reply-To: <8BD2E0F8-EE7B-4308-A058-2EC235D5212F@oracle.com> References: <04c841a1-1e38-155a-5d94-5d0e5a32c708@oracle.com> <8BD2E0F8-EE7B-4308-A058-2EC235D5212F@oracle.com> Message-ID: <2ac19e8a-0f0d-a215-6f92-748fd8c0c5ab@oracle.com> Hi Leonid, On 6/26/20 7:51 PM, Leonid Mesnik wrote: > Hi > > The idea basically looks good. I think it just make a sense to polish it a little bit to hide "sh" usage from test and get core from OutputAnalyzer. Ok, I'll look into both of those. > > Also, there is a 'CrashApp' in ClhsdbCDSCore.java. Makes it sense to unify it with LingeredApp crasher? Currently, it uses Unsafe to crash application. Yes, I purposely didn't make that change. My main goal with the LingeredApp changes is to make it easier to make existing LingeredApp SA tests run on both a live process and on a core, and my main goal with ClhsdbCDSCore and TestJmapCore was to move the core finding code and ulimit code to a common location that could be reused by other tests. Keep in mind that ClhsdbLauncher and LingeredApp are independent of each other. 
You can have LingeredApp tests that use or don't use ClhsdbLauncher, and you can have non-LingeredApp tests that use or don't use ClhsdbLauncher. So I didn't want to go down the path of changing ClhsdbCDSCore (a non-LingeredApp test) to use LingeredApp. Likewise I did not change TestJmapCore to use LingeredApp or ClhsdbLauncher. Possibly there is good reason to convert some of the tests to start using LingeredApp and/or ClhsdbLauncher, but that should be done under a separate RFE. > > Also, crashes are used in other tests, I see some implementations in > open/test/hotspot/jtreg/vmTestbase/vm/share/vmcrasher I don't see vmcrasher being used by any tests. In any case, my first attempt went down the Unsafe path to produce a crash. The issue is that it forces every user of LingeredApp to include the @module for Unsafe. I also tried using a WhiteBox API. That was worse, also requiring every user of LingeredApp to include an @module, plus the tests that actually want to cause a crash need to @build WhiteBox.java and then do the classfile install. It also required additional module related hacks in LingeredApp. The issue with my current solution is that how to get libLingeredApp.c to compile has not been settled on. I'm still waiting for an answer from the build team. > > So it would be nice to have some common way to crash hotspot. I can see possibly moving the crashing code out of LingeredApp and into a native lib that non-LingeredApp tests can use, although that really is just a very small part of the changes to LingeredApp. For the most part the changes would look the same except you would call a different API to cause the crash. > > Leonid > Thanks for having a look. 
Chris >> On Jun 25, 2020, at 2:41 PM, Chris Plummer wrote: >> >> Hello, >> >> Please help with a preliminary review of changes to add better support for writing SA tests that work on core files: >> >> https://bugs.openjdk.java.net/browse/JDK-8248194 >> http://cr.openjdk.java.net/~cjplummer/8248194/webrev.00/index.html >> >> As pointed out, this is a preliminary review. I suspect there will be some feedback for changes/improvements. Also, I still need to work out a final solution for how to get LingeredApp to produce a crash. What I currently have works but is somewhat of a hack w.r.t. the makefile change, so you can ignore the makefile change for now. I'm working on a more proper solution with the build team. >> >> As outlined in the CR, these are the 3 main goals of this CR: >> >> 1. SATestUtils should include support for finding the core file. This includes parsing the output of the crashed process to locate where the core file was saved, and returning this location to the user. >> >> 2. SATestUtils should include support for adding the "ulimit -c unlimited" prefix to the command that will produce the core file, allowing the overriding of any lower limit so we can be sure the core file will be produced. >> >> 3. LingeredApp should include support for producing a core file. >> >> As proof of concept for these 3 changes in test library support, I'm updating the following 3 tests: >> >> ClhsdbCDSCore.java: Use the SATestUtils support listed above. This test does not use LingeredApp, so those improvements don't apply. >> >> TestJmapCore.java: Use the SATestUtils support listed above. This test does not use LingeredApp, so those improvements don't apply. >> >> ClhsdbFindPC.java: Use all the above features, including having LingeredApp produce a core file. This is the only test modified to start testing on core files that didn't previously do so. It still also tests on a live process. 
>> >> In the future more Clhsdb tests will be converted to work on core files in a manner similar to ClhsdbFindPC. >> >> The new SATestUtils code is borrowed from (more like ripped out of) ClhsdbCDSCore.java and TestJmapCore.java. They both had a lot of code dedicated to finding the core file and also applying "ulimit -c unlimited" if necessary, but didn't do so in quite the same way. Now both these tests share code in SATestUtils.java. One thing I did drop is TestJmapCore.java use of ":KILLED_PID" in the output to help find the core file. It's no longer necessary based on the smarter core locating code I pulled from ClhsdbCDSCore.java. >> >> thanks, >> >> Chris From serguei.spitsyn at oracle.com Sat Jun 27 04:58:30 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 26 Jun 2020 21:58:30 -0700 Subject: RFR: 8248379: Handshake closures for JVMTI monitor functions lack of some validations In-Reply-To: <592326cc-8be4-b804-eb06-54c068330297@oss.nttdata.com> References: <46a1e780-6ca6-becb-c3c1-b4e6f2c9a8a8@oss.nttdata.com> <592326cc-8be4-b804-eb06-54c068330297@oss.nttdata.com> Message-ID: Hi Yasumasa, Yes, you can list me as a reviewer. Thanks, Serguei On 6/26/20 18:54, Yasumasa Suenaga wrote: > Hi Serguei, David, > >>> If you are executing the handshake operation then you are in a >>> handshake with the target thread which means it must exist in some >>> ThreadsList. > > Yes, it is checked at Handshake::execute_direct(). > > ``` > 349   ThreadsListHandle tlh; > 350   if (tlh.includes(target)) { > 351     target->set_handshake_operation(&op); > 352   } else { > 353     log_handshake_info(start_time_ns, op.name(), 0, 0, "(thread > dead)"); > 354     return false; > 355   } > ``` > > Serguei, can I list you as a Reviewer? If so, I will push this change > when I got second reviewer. > > > Thanks, > > Yasumasa > > > On 2020/06/27 8:51, serguei.spitsyn at oracle.com wrote: >> Hi David, >> >> Thank you for clarification. 
>> >> Thanks, >> Serguei >> >> >> On 6/26/20 15:50, David Holmes wrote: >>> Hi Serguei, >>> >>> On 27/06/2020 2:40 am, serguei.spitsyn at oracle.com wrote: >>>> Hi Yasumasa, >>>> >>>> I see, some VM_op's also have this check: >>>> >>>> 1546   ThreadsListHandle tlh; >>>> 1547   if (jt != NULL && tlh.includes(jt) >>>> >>>> >>>> I wonder if it make sense to add as well. >>> >>> If you are executing the handshake operation then you are in a >>> handshake with the target thread which means it must exist in some >>> ThreadsList. >>> >>> Cheers, >>> David >>> ----- >>> >>>> Otherwise, it looks good to me. >>>> >>>> Thanks, >>>> Serguei >>>> >>>> On 6/26/20 00:03, Yasumasa Suenaga wrote: >>>>> Hi all, >>>>> >>>>> Please review this change. >>>>> >>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8248379 >>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8248379/webrev.00/ >>>>> >>>>> JDK-8242425 introduces to migrate to thread local handshake from >>>>> VM operation for GetOwnedMonitorInfo, >>>>> GetOwnedMonitorStackDepthInfo, and GetCurrentContendedMonitor >>>>> JVMTI functions. However it lacks of validations for thread state >>>>> and thread oop of the target. >>>>> >>>>> This change has been tested on submit repo and >>>>> serviceability/jvmti, serviceability/jdwp vmTestbase/nsk/jvmti, >>>>> vmTestbase/nsk/jdi vmTestbase/nsk/jdwp. >>>>> On submit repo, tools/javac/7118412/ShadowingTest.java and >>>>> java/foreign/TestMismatch.java were failed >>>>> (mach5-one-ysuenaga-JDK-8248379-20200626-0503-12110818). However >>>>> they do not seems to be related to this change. 
>>>>> (Both tests have been passed on my Linux AMD64) >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>> >> From Alan.Bateman at oracle.com Sat Jun 27 07:23:29 2020 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Sat, 27 Jun 2020 08:23:29 +0100 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <9e5dbaf6-9a08-951a-61f2-c50300b79457@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com> <76c58309-cd04-f205-ca09-46a117590fad@oracle.com> <9e5dbaf6-9a08-951a-61f2-c50300b79457@oracle.com> Message-ID: <1883ddf8-20f1-0e17-b2d7-fd33ab906793@oracle.com> On 27/06/2020 01:40, serguei.spitsyn at oracle.com wrote: > > The implementation has this order of lookup: > >         // The agent class must have a premain or agentmain method that >         // has 1 or 2 arguments. We check in the following order: >         // >         // 1) declared with a signature of (String, Instrumentation) >         // 2) declared with a signature of (String) >         // 3) inherited with a signature of (String, Instrumentation) >         // 4) inherited with a signature of (String) > > The declared methods are gotten with the getDeclaredMethod and > inherited with the getMethod. > This works for both 1-arg and 2-arg premain methods, so I'm not sure > what is inconsistent. > Or you have a concern there can be a non-nice NoSuchMethodException? 
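The difference between the "declared" and "inherited" lookups discussed here comes down to what Class.getDeclaredMethod and Class.getMethod can see. The sketch below is purely illustrative: PremainLookupDemo and its nested classes are hypothetical stand-ins, not the JPLIS agent loading code, and only the 1-arg (String) signature is exercised to keep it self-contained.

```java
import java.lang.reflect.Method;

public class PremainLookupDemo {
    // Hypothetical agent whose premain is declared but not public.
    public static class NonPublicPremainAgent {
        static void premain(String agentArgs) { }
    }

    // Hypothetical base class with a public premain ...
    public static class BaseAgent {
        public static void premain(String agentArgs) { }
    }

    // ... and an agent that only inherits it.
    public static class InheritingAgent extends BaseAgent { }

    // True if a premain(String) is visible to the chosen reflection call.
    public static boolean found(Class<?> agentClass, boolean declaredOnly) {
        try {
            Method m = declaredOnly
                    ? agentClass.getDeclaredMethod("premain", String.class) // any access, this class only
                    : agentClass.getMethod("premain", String.class);        // public only, includes inherited
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(found(NonPublicPremainAgent.class, true));  // prints "true"
        System.out.println(found(NonPublicPremainAgent.class, false)); // prints "false"
        System.out.println(found(InheritingAgent.class, true));        // prints "false"
        System.out.println(found(InheritingAgent.class, false));       // prints "true"
    }
}
```

So getMethod alone would miss a non-public declared premain, and getDeclaredMethod alone would miss an inherited public one, which is why a lookup order that covers both cases needs both calls.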
> > In fact, I don't understand why there is a need to use the > getDeclaredMethod. > As I see it, the getMethod should return a declared method first, and > only if it is absent then it checks for an inherited one. The JPLIS agent used getMethod when it was originally created in JDK 5 so it would only find public methods. I haven't studied the intervening history too closely but I assume JDK-6289149 (in JDK 7) created the inconsistency between the spec and implementation when it explored the scenario of premain declared in a super class with different arity and/or modifiers to the premain in the sub-class. I assume the tests that you've been forced to change are related to this same issue. So given where we are, and given the statement "The JVM first attempts to invoke the following method on the agent class" in the spec then I guess it's okay to keep the getDeclaredMethod to deal with the "whacky" case where a super class of the agent class has a public premain method. -Alan. From chris.plummer at oracle.com Mon Jun 29 00:29:48 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Sun, 28 Jun 2020 17:29:48 -0700 Subject: RFR (Preliminary): 8248194: Need better support for running SA tests on core files In-Reply-To: <2ac19e8a-0f0d-a215-6f92-748fd8c0c5ab@oracle.com> References: <04c841a1-1e38-155a-5d94-5d0e5a32c708@oracle.com> <8BD2E0F8-EE7B-4308-A058-2EC235D5212F@oracle.com> <2ac19e8a-0f0d-a215-6f92-748fd8c0c5ab@oracle.com> Message-ID: <6980c0f3-5ae7-a813-3adb-16cac40fe7f6@oracle.com> Hi Leonid, I think getCoreFileLocation() can simply move to OutputAnalyzer. No need for it to be in SAUtils and be passed the String argument that comes from OutputAnalyzer.getOutput(). For the ulimit support, how about if in ProcessTools I add: public static ProcessBuilder addCoreUlimitCommand(ProcessBuilder pb); All the ulimit logic would move there from SATestUtils. It's straightforward to use this API in LingeredApp and TestJmapCore.
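As a rough illustration of the helper proposed above, here is a minimal sketch, assuming a POSIX "sh" is available on the PATH. The class name CoreUlimitSketch and the helper withCoreUlimit() are hypothetical, introduced only for this example; this is not the actual ProcessTools or SATestUtils code, which may handle platforms and quoting differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; not the real jdk.test.lib.process.ProcessTools code.
final class CoreUlimitSketch {

    // Wraps a command so that it runs under "sh -c" with the core file
    // size limit raised first. Each argument is single-quoted so that
    // the shell passes it through unchanged.
    static List<String> withCoreUlimit(List<String> command) {
        StringBuilder cmd = new StringBuilder("ulimit -c unlimited && exec");
        for (String arg : command) {
            // Close the quote, emit an escaped quote, reopen the quote.
            cmd.append(" '").append(arg.replace("'", "'\\''")).append('\'');
        }
        List<String> wrapped = new ArrayList<>();
        wrapped.add("sh");
        wrapped.add("-c");
        wrapped.add(cmd.toString());
        return wrapped;
    }

    // Same shape as the signature proposed in the message above.
    static ProcessBuilder addCoreUlimitCommand(ProcessBuilder pb) {
        // Assumption: Windows has no ulimit, so the command is left as-is there.
        if (System.getProperty("os.name").startsWith("Windows")) {
            return pb;
        }
        return pb.command(withCoreUlimit(pb.command()));
    }

    public static void main(String[] args) {
        ProcessBuilder pb = addCoreUlimitCommand(new ProcessBuilder("java", "-version"));
        System.out.println(pb.command());
    }
}
```

Returning the same ProcessBuilder lets a caller keep any environment or redirect settings it has already configured, which is one reason a helper of this shape would be easy to adopt in tests that already hold on to their ProcessBuilder.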
For ClhsdbCDSCore I'll need to rework the getTestJvmCommandlineWithPrefix() code a bit, since it creates a pb, but doesn't save it. It only uses it to get the cmd String. Also, there's one new finding since I sent out the review. I found the following in CiReplayBase.java: // lets search few possible locations using process output and return existing location private String getCoreFileLocation(String crashOutputString) { This is identical to the code I pulled from ClhsdbCDSCore and is now in SATestUtils.parseCoreFileLocationFromOutput(). Although this is in the compiler directory, it is in fact an SA test that uses clhsdb, albeit directly via the CLHSDB class rather than through "jhsdb clhsdb". This also explains why ClhsdbCDSCore had some logic to move and rename the core file to "cds_core_file". I removed this logic because it seemed unnecessary, but for CiReplayBase.java it needs to be in a known location so SABase.java can find it. It's still fine for ClhsdbCDSCore to not do the rename, and renaming is independent of any code that locates the core file. I'm not going to update CiReplayBase.java as part of these changes because the two tests that use it both have issues. TestSAServer is problem-listed, and when I removed it from the problem list it failed with every run on every platform. There's also TestSAClient, but it relies on the client VM, which we don't support anymore. So with neither of these tests running, I'd rather not introduce changes I can't really test. However, there was something good that came out of the CiReplayBase.java discovery. I had previously noted that ClhsdbCDSCore is excluded from running on Windows. When I removed the @requires for this, it failed for a reason I didn't quite understand. The complaint was about the path to java.exe when running the process that was supposed to crash, although the path looked fine.
However, I found that TestSAServer ran fine on Windows, even though it was basically the same process-launching code for causing the crash. I looked closer and found one difference. In getTestJvmCommandlineWithPrefix(), which both tests have, the CiReplayBase version had some extra code for Windows: return new String[]{"sh", "-c", prefix + (Platform.isWindows() ? cmd.replace('\\', '/').replace(";", "\\;").replace("|", "\\|") : cmd)}; So on Windows it's doing a path conversion. Once I started doing the same with ClhsdbCDSCore, it started to run fine on Windows also. thanks, Chris On 6/26/20 8:42 PM, Chris Plummer wrote: > Hi Leonid, > > On 6/26/20 7:51 PM, Leonid Mesnik wrote: >> Hi >> >> The idea basically looks good. I think it just makes sense to polish >> it a little bit to hide "sh" usage from test and get core from >> OutputAnalyzer. > Ok, I'll look into both of those. >> Also, there is a 'CrashApp' in ClhsdbCDSCore.java. Does it make sense >> to unify it with LingeredApp crasher? Currently, it uses Unsafe to >> crash application. > Yes, I purposely did not make that change. My main goal with the > LingeredApp changes is to make it easier to make existing LingeredApp > SA tests run on both a live process and on a core, and my main goal > with ClhsdbCDSCore and TestJmapCore was to move the core finding code > and ulimit code to a common location that could be reused by other tests. > > Keep in mind that ClhsdbLauncher and LingeredApp are independent of > each other. You can have LingeredApp tests that use or don't use > ClhsdbLauncher, and you can have non-LingeredApp tests that use or > don't use ClhsdbLauncher. So I didn't want to go down the path of > changing ClhsdbCDSCore (a non-LingeredApp test) to use LingeredApp. > Likewise I did not change TestJmapCore to use LingeredApp or > ClhsdbLauncher.
Possibly there is good reason to convert some of the > tests to start using LingeredApp and/or ClhsdbLauncher, but that > should be done under a separate RFE. > >> >> Also, crashes are used in other tests, I see some implementations in >> open/test/hotspot/jtreg/vmTestbase/vm/share/vmcrasher > I don't see vmcrasher being used by any tests. In any case, my first > attempt went down the Unsafe path to produce a crash. The issue is > that it forces every user of LingeredApp to include the @module for > Unsafe. I also tried using a WhiteBox API. That was worse, also > requiring every user of LingeredApp to include an @module, plus the > tests that actually want to cause a crash need to @build WhiteBox.java > and then do the classfile install. It also required additional module > related hacks in LingeredApp. The issue with my current solution is > how to get libLingeredApp.c to compile has not been settled on. I'm > still waiting for an answer from the build team. >> >> So it would be nice to have some common way to crash hotspot. > I can see possibly moving the crashing code out of LingeredApp and > into a native lib that non-LingeredApp tests can use, although that > really is just a very small part of the changes to LingeredApp. For > the most part the changes would look the same except you would call a > different API to cause the crash. >> >> Leonid >> > Thanks for having a look. > > Chris >>> On Jun 25, 2020, at 2:41 PM, Chris Plummer >>> wrote: >>> >>> Hello, >>> >>> Please help with a preliminary review of changes to add better >>> support for writing SA tests that work on core files: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8248194 >>> http://cr.openjdk.java.net/~cjplummer/8248194/webrev.00/index.html >>> >>> As pointed out, this is a preliminary review. I suspect there will >>> be some feedback for changes/improvements. Also, I still need to >>> work out a final solution for how to get LingeredApp to produce a >>> crash. 
What I currently have works but is somewhat of a hack w.r.t. >>> the makefile change, so you can ignore the makefile change for now. >>> I'm working on a more proper solution with the build team. >>> >>> As outlined in the CR, these are the 3 main goals of this CR: >>> >>> 1. SATestUtils should include support for finding the core file. >>> This includes parsing the output of the crashed process to locate >>> where the core file was saved, and returning this location to the user. >>> >>> 2. SATestUtils should include support for adding the "ulimit -c >>> unlimited" prefix to the command that will produce the core file, >>> allowing the overriding of any lower limit so we can be sure the >>> core file will be produced. >>> >>> 3. LingeredApp should include support for producing a core file. >>> >>> As proof of concept for these 3 changes in test library support, I'm >>> updating the following 3 tests: >>> >>> ClhsdbCDSCore.java: Use the SATestUtils support listed above. This >>> test does not use LingeredApp, so those improvements don't apply. >>> >>> TestJmapCore.java: Use the SATestUtils support listed above. This >>> test does not use LingeredApp, so those improvements don't apply. >>> >>> ClhsdbFindPC.java: Use all the above features, including having >>> LingeredApp produce a core file. This is the only test modified to >>> start testing on core files that didn't previously do so. It still >>> also tests on a live process. >>> >>> In the future more Clhsdb tests will be converted to work on core >>> files in a manner similar to ClhsdbFindPC. >>> >>> The new SATestUtils code is borrowed from (more like ripped out of) >>> ClhsdbCDSCore.java and TestJmapCore.java. They both had a lot of >>> code dedicated to finding the core file and also applying "ulimit -c >>> unlimited" if necessary, but didn't do so in quite the same way. >>> Now both these tests share code in SATestUtils.java.
One thing I did >>> drop is TestJmapCore.java use of ":KILLED_PID" in the output to help >>> find the core file. It's no longer necessary based on the smarter >>> core locating code I pulled from ClhsdbCDSCore.java. >>> >>> thanks, >>> >>> Chris > From david.holmes at oracle.com Mon Jun 29 02:20:52 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 29 Jun 2020 12:20:52 +1000 Subject: RFR: 8248379: Handshake closures for JVMTI monitor functions lack of some validations In-Reply-To: <46a1e780-6ca6-becb-c3c1-b4e6f2c9a8a8@oss.nttdata.com> References: <46a1e780-6ca6-becb-c3c1-b4e6f2c9a8a8@oss.nttdata.com> Message-ID: <3c9eed3a-14ac-7664-5f52-6ffe085a5897@oracle.com> Hi Yasumasa, On 26/06/2020 5:03 pm, Yasumasa Suenaga wrote: > Hi all, > > Please review this change. > > ? JBS: https://bugs.openjdk.java.net/browse/JDK-8248379 > ? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8248379/webrev.00/ > > JDK-8242425 introduces to migrate to thread local handshake from VM > operation for GetOwnedMonitorInfo, GetOwnedMonitorStackDepthInfo, and > GetCurrentContendedMonitor JVMTI functions. However it lacks of > validations for thread state and thread oop of the target. The restoration of these checks, and the explicit checking of the return value for execute_direct, looks good to me. Thanks, David ----- > This change has been tested on submit repo and serviceability/jvmti, > serviceability/jdwp vmTestbase/nsk/jvmti, vmTestbase/nsk/jdi > vmTestbase/nsk/jdwp. > On submit repo, tools/javac/7118412/ShadowingTest.java and > java/foreign/TestMismatch.java were failed > (mach5-one-ysuenaga-JDK-8248379-20200626-0503-12110818). However they do > not seems to be related to this change. 
> (Both tests have been passed on my Linux AMD64) > > > Thanks, > > Yasumasa From david.holmes at oracle.com Mon Jun 29 02:37:58 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 29 Jun 2020 12:37:58 +1000 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7445ef56-4fe9-47ba-0935-90ba800b0694@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com> <76c58309-cd04-f205-ca09-46a117590fad@oracle.com> Message-ID: <1c917ae8-b7e6-d5f8-f423-bff70c31dda4@oracle.com> Hi Serguei, These changes look good to me. Note that I tweaked the bug synopsis to make it slightly more grammatically correct: that invoke -> to invoke Thanks, David On 26/06/2020 2:17 am, serguei.spitsyn at oracle.com wrote: > > New wevrev version is: > http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.2/ > > Now the InstrumentationImpl.java has this new check to throw IAE with > the meaningful error message: > > + // reject non-public premain or agentmain method > + if (!m.canAccess(null)) { > + String msg = "method " + classname + "." 
+ methodname + " must be > declared public"; > + throw new IllegalAccessException(msg); > + } > > > It also includes a new negative test for non-public premain method: > test/jdk/java/lang/instrument/NonPublicPremainAgent.java > > I've tested the non-public agentmain as well with one of the hacked > JVMTI aod tests. > But I gave up to make it a stand alone test as this testing framework is > tricky to use for negative testing. > The implementation is common for premain and agentmain cases, so > probably, one test > > > Also, I had to fix all impacted java/lang/instrument tests to make the > Agent classes public. > The following tests required a refactoring: > > || test/jdk/java/lang/instrument/PremainClass/InheritAgent0100.java > test/jdk/java/lang/instrument/PremainClass/InheritAgent1000.java > test/jdk/java/lang/instrument/PremainClass/InheritAgent1100.java > > > They define an agent as extending an agent super class which has the > premain methods defined: > > 37 class InheritAgent0101 extends InheritAgent0101Super { > 38 > 39 // > 40 // This agent has a single argument premain() method which > 41 // is the one that should be called. > 42 // > 43 public static void premain (String agentArgs) { > 44 System.out.println("Hello from Single-Arg InheritAgent0101!"); > 45 } > 46 > 47 // This agent does NOT have a double argument premain() method. > 48 } > 49 > 50 class InheritAgent0101Super { > 51 > 52 // > 53 // This agent has a single argument premain() method which > 54 // is NOT the one that should be called. > 55 // > 56 public static void premain (String agentArgs) { > 57 System.out.println("Hello from Single-Arg InheritAgent0101Super!"); > 58 throw new Error("ERROR: THIS AGENT SHOULD NOT HAVE BEEN CALLED."); > 59 } > 60 > 61 // This agent does NOT have a double argument premain() method. > 62 } > > > Above, just one class can be made public. > But the InheritAgent0101Super has to be public as well as has the > premain method defined. 
> These agent super classes are separated into their own files. > To make this refactoring work, a new customized script is introduced: > test/jdk/java/lang/instrument/PremainClass/MakeJAR.sh > > The java/lang/instrument tests pass locally. > I'll submit a mach5 test job to make sure nothing is broken. > > Thanks, > Serguei > > > > On 6/24/20 13:07, serguei.spitsyn at oracle.com wrote: >> On 6/24/20 12:44, Mandy Chung wrote: >>> >>> >>> On 6/24/20 12:26 PM, serguei.spitsyn at oracle.com wrote: >>>> On 6/24/20 05:25, David Holmes wrote: >>>>> >>>>> Ah! The test class SimpleAgent is what is not public. That seems a >>>>> bug in the test. >>>> >>>> There are many such tests. >>>> We can break some of the existing agents by rejecting non-public >>>> agent classes. >>>> I'm inclined to continue using the setAccessible and just add an >>>> extra check for non-public premain/agentmain methods. >>> >>> There is only one non-public SimpleAgent which is shared by >>> j.l.instrument tests. >>>
test/jdk/java/lang/instrument/SimpleAgent.java >> >> There are many such tests: >> >> test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/TestLambdaFormRetransformation.java:class >> Agent implements ClassFileTransformer { >> >> test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java:class >> NativeMethodPrefixAgent { >> test/jdk/java/lang/instrument/PremainClass/NoPremainAgent.java:class >> NoPremainAgent { >> test/jdk/java/lang/instrument/SimpleAgent.java:class SimpleAgent { >> test/jdk/java/lang/instrument/RetransformAgent.java:class >> RetransformAgent { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent0001.java:class >> InheritAgent0001 extends InheritAgent0001Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent0001.java:class >> InheritAgent0001Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent0010.java:class >> InheritAgent0010 extends InheritAgent0010Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent0010.java:class >> InheritAgent0010Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent0011.java:class >> InheritAgent0011 extends InheritAgent0011Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent0011.java:class >> InheritAgent0011Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent0100.java:class >> InheritAgent0100 extends InheritAgent0100Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent0100.java:class >> InheritAgent0100Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent0101.java:class >> InheritAgent0101 extends InheritAgent0101Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent0101.java:class >> InheritAgent0101Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent0110.java:class >> InheritAgent0110 extends InheritAgent0110Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent0110.java:class >> InheritAgent0110Super { >> 
test/jdk/java/lang/instrument/PremainClass/InheritAgent0111.java:class >> InheritAgent0111 extends InheritAgent0111Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent0111.java:class >> InheritAgent0111Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1000.java:class >> InheritAgent1000 extends InheritAgent1000Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1000.java:class >> InheritAgent1000Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1001.java:class >> InheritAgent1001 extends InheritAgent1001Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1001.java:class >> InheritAgent1001Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1010.java:class >> InheritAgent1010 extends InheritAgent1010Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1010.java:class >> InheritAgent1010Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1011.java:class >> InheritAgent1011 extends InheritAgent1011Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1011.java:class >> InheritAgent1011Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1100.java:class >> InheritAgent1100 extends InheritAgent1100Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1100.java:class >> InheritAgent1100Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1101.java:class >> InheritAgent1101 extends InheritAgent1101Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1101.java:class >> InheritAgent1101Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1110.java:class >> InheritAgent1110 extends InheritAgent1110Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1110.java:class >> InheritAgent1110Super { >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1111.java:class >> InheritAgent1111 extends InheritAgent1111Super { >> 
test/jdk/java/lang/instrument/PremainClass/InheritAgent1111.java:class >> InheritAgent1111Super { >> >> >> But is is not a big problem - all can be fixed. >> >>> test/hotspot/jtreg/runtime/cds/appcds/jvmti/dumpingWithAgent >>> implements the agent properly (a public class and a public static >>> void premain method). >>> >>> As the popular Java agents are conforming the spec (publicly >>> accessible premain method), the compatibility risk is low. >>> >>> Unless such a? java agent exists and finds a strong compelling reason >>> to argue that its premain method must be allowed non-public, I do not >>> see the argument to change the spec to allow non-public agent classes. >>> >>> A bad test case is not a representative existing java agent. >> >> Okay, thanks. >> I'll prepare a fix with a removed setAccessible. >> >> Thanks, >> Serguei >> >>> >>> Mandy >> > From sgehwolf at redhat.com Mon Jun 29 15:53:07 2020 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Mon, 29 Jun 2020 17:53:07 +0200 Subject: RFR(s): 8247863: Unreachable code in OperatingSystemImpl.getTotalSwapSpaceSize() Message-ID: Hi, Could I please get a review of this dead-code removal? During review of JDK-8244500 it was discovered that with the new cgroups implementation supporting v1 and v2 Metrics.getMemoryAndSwapLimit() will never return 0 when relevant cgroup files are missing. E.g. on a system where the kernel doesn't support swap limit capabilities. Therefore this code introduced with JDK-8236617 can no longer be reached and should get removed. Bug: https://bugs.openjdk.java.net/browse/JDK-8247863 webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8247863/01/webrev/ Testing: Matthias tested this on the affected system and it did pass for him. Docker tests on cgroup v1 and cgroup v2. 
Thanks, Severin From mandy.chung at oracle.com Mon Jun 29 16:46:21 2020 From: mandy.chung at oracle.com (Mandy Chung) Date: Mon, 29 Jun 2020 09:46:21 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <1883ddf8-20f1-0e17-b2d7-fd33ab906793@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com> <76c58309-cd04-f205-ca09-46a117590fad@oracle.com> <9e5dbaf6-9a08-951a-61f2-c50300b79457@oracle.com> <1883ddf8-20f1-0e17-b2d7-fd33ab906793@oracle.com> Message-ID: <4a10c318-fca0-0eb3-9d7f-c244a6190832@oracle.com> On 6/27/20 12:23 AM, Alan Bateman wrote: > On 27/06/2020 01:40, serguei.spitsyn at oracle.com wrote: >> >> The implementation has this order of lookup: >> >> ??????? // The agent class must have a premain or agentmain method that >> ??????? // has 1 or 2 arguments. We check in the following order: >> ??????? // >> ??????? // 1) declared with a signature of (String, Instrumentation) >> ??????? // 2) declared with a signature of (String) >> ??????? // 3) inherited with a signature of (String, Instrumentation) >> ??????? // 4) inherited with a signature of (String) >> >> The declared methods are gotten with the getDeclaredMethod and >> inherited with the getMethod. >> This works for both 1-arg and 2-arg premain methods, so I'm not sure >> what is inconsistent. >> Or you have a concern there can be a non-nice NoSuchMethodException? 
>> >> In fact, I don't understand why there is a need to use the >> getDeclaredMethod. >> As I see it, the getMethod should return a declared method first, and >> only if it is absent then it checks for an inherited one. > The JPLIS agent used getMethod when it was originally created in JDK 5 > so it would only find public methods. I haven't studied the > intervening history too closely but I assume JDK-6289149 (in JDK 7) > created the inconsistency between the spec and implementation when it > explored the scenario of premain declared in a super class with > different arity and/or modifiers to the premain in the sub-class. I > assume the tests that you've been forced to change are related to this > same issue. > Thanks for digging up the history. > So given where we are, and given the statement "The JVM first attempts > to invoke the following method on the agent class" in the spec then I > guess it's okay to keep the getDeclaredMethod to deal with the "whacky" > case where a super class of the agent class has a public premain method. > I also think it's okay to get a different exception message in this case. Serguei - I reviewed this version. It looks okay. > New webrev version is: > http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.2/ Mandy -------------- next part -------------- An HTML attachment was scrubbed...
URL: From serguei.spitsyn at oracle.com Mon Jun 29 17:42:35 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 29 Jun 2020 10:42:35 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <1883ddf8-20f1-0e17-b2d7-fd33ab906793@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com> <76c58309-cd04-f205-ca09-46a117590fad@oracle.com> <9e5dbaf6-9a08-951a-61f2-c50300b79457@oracle.com> <1883ddf8-20f1-0e17-b2d7-fd33ab906793@oracle.com> Message-ID: On 6/27/20 00:23, Alan Bateman wrote: > On 27/06/2020 01:40, serguei.spitsyn at oracle.com wrote: >> >> The implementation has this order of lookup: >> >> ??????? // The agent class must have a premain or agentmain method that >> ??????? // has 1 or 2 arguments. We check in the following order: >> ??????? // >> ??????? // 1) declared with a signature of (String, Instrumentation) >> ??????? // 2) declared with a signature of (String) >> ??????? // 3) inherited with a signature of (String, Instrumentation) >> ??????? // 4) inherited with a signature of (String) >> >> The declared methods are gotten with the getDeclaredMethod and >> inherited with the getMethod. >> This works for both 1-arg and 2-arg premain methods, so I'm not sure >> what is inconsistent. >> Or you have a concern there can be a non-nice NoSuchMethodException? >> >> In fact, I don't understand why there is a need to use the >> getDeclaredMethod. 
>> As I see, the getMethod should return a declared method first, and >> only if it is absent then it checks for a inherited one. > The JPLIS agent used getMethod when it was originally created in JDK 5 > so it would only find public methods. I haven't studied the > intervening history too closely but I assume JDK-6289149 (in JDK 7) > created the inconsistency between the spec and implementation when it > explored the scenario of premain declared in a super class with > different arity and/or modifiers to the premain in the sub-class. I > assume the tests that you've been forced to change are related to this > same issue. > > So given where we are, and given the statement "The JVM first attempts > to invoke the following method on the agent class" in the spec then I > guess it's okay to keep the getDeclaredMethod to deal with "whacky" > case where a super class of the agent class has a public premain method. Thank you for clarification, Alan. Thanks, Serguei > > -Alan. > > > > > From serguei.spitsyn at oracle.com Mon Jun 29 17:44:38 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 29 Jun 2020 10:44:38 -0700 Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs In-Reply-To: <1c917ae8-b7e6-d5f8-f423-bff70c31dda4@oracle.com> References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com> <7ad860a8-3778-d0eb-6b77-47a71d10881c@oracle.com> <0e1f6c93-e9cb-17af-9be6-870381f1744e@oracle.com> <99b0d691-e4e2-489f-2e1b-624c4ac711ea@oracle.com> <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com> <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com> <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com> <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com> <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com> <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com> <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com> <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com> 
<76c58309-cd04-f205-ca09-46a117590fad@oracle.com> <1c917ae8-b7e6-d5f8-f423-bff70c31dda4@oracle.com> Message-ID: <53da56dc-7aa9-fea1-272d-ffe44a005177@oracle.com> Hi David, Thank you a lot for review and tweaking the bug title. I've re-targeted this to 16 as was suggested by Joe. Thanks, Serguei On 6/28/20 19:37, David Holmes wrote: > Hi Serguei, > > These changes look good to me. > > Note that I tweaked the bug synopsis to make it slightly more > grammatically correct: that invoke -> to invoke > > Thanks, > David > > On 26/06/2020 2:17 am, serguei.spitsyn at oracle.com wrote: >> >> New wevrev version is: >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.2/ >> >> Now the InstrumentationImpl.java has this new check to throw IAE with >> the meaningful error message: >> >> + // reject non-public premain or agentmain method >> + if (!m.canAccess(null)) { >> + String msg = "method " + classname + "." + methodname + " must be >> declared public"; >> + throw new IllegalAccessException(msg); >> + } >> >> >> It also includes a new negative test for non-public premain method: >> test/jdk/java/lang/instrument/NonPublicPremainAgent.java >> >> I've tested the non-public agentmain as well with one of the hacked >> JVMTI aod tests. >> But I gave up to make it a stand alone test as this testing framework >> is tricky to use for negative testing. >> The implementation is common for premain and agentmain cases, so >> probably, one test >> >> >> Also, I had to fix all impacted java/lang/instrument tests to make >> the Agent classes public. >> The following tests required a refactoring: >> >> || test/jdk/java/lang/instrument/PremainClass/InheritAgent0100.java >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1000.java >> test/jdk/java/lang/instrument/PremainClass/InheritAgent1100.java >> >> >> They define an agent as extending an agent super class which has the >> premain methods defined: >> >> ?? 
37 class InheritAgent0101 extends InheritAgent0101Super { >> 38 >> 39 // >> 40 // This agent has a single argument premain() method which >> 41 // is the one that should be called. >> 42 // >> 43 public static void premain (String agentArgs) { >> 44 System.out.println("Hello from Single-Arg InheritAgent0101!"); >> 45 } >> 46 >> 47 // This agent does NOT have a double argument premain() method. >> 48 } >> 49 >> 50 class InheritAgent0101Super { >> 51 >> 52 // >> 53 // This agent has a single argument premain() method which >> 54 // is NOT the one that should be called. >> 55 // >> 56 public static void premain (String agentArgs) { >> 57 System.out.println("Hello from Single-Arg InheritAgent0101Super!"); >> 58 throw new Error("ERROR: THIS AGENT SHOULD NOT HAVE BEEN CALLED."); >> 59 } >> 60 >> 61 // This agent does NOT have a double argument premain() method. >> 62 } >> >> >> Above, just one class can be made public. >> But the InheritAgent0101Super has to be public as well, as it has the >> premain method defined. >> These agent super classes are separated into their own files. >> To make this refactoring work, a new customized script is introduced: >> test/jdk/java/lang/instrument/PremainClass/MakeJAR.sh >> >> The java/lang/instrument tests pass locally. >> I'll submit a mach5 test job to make sure nothing is broken. >> >> Thanks, >> Serguei >> >> >> >> On 6/24/20 13:07, serguei.spitsyn at oracle.com wrote: >>> On 6/24/20 12:44, Mandy Chung wrote: >>>> >>>> >>>> On 6/24/20 12:26 PM, serguei.spitsyn at oracle.com wrote: >>>>> On 6/24/20 05:25, David Holmes wrote: >>>>>> >>>>>> Ah! The test class SimpleAgent is what is not public. That seems >>>>>> a bug in the test. >>>>> >>>>> There are many such tests. >>>>> We can break some of the existing agents by rejecting non-public >>>>> agent classes.
>>>>> I'm inclined to continue using the setAccessible and just add an
>>>>> extra check for non-public premain/agentmain methods.
>>>>
>>>> There is only one non-public SimpleAgent which is shared by
>>>> j.l.instrument tests.
>>>>   test/jdk/java/lang/instrument/SimpleAgent.java
>>>
>>> There are many such tests:
>>>
>>> test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/TestLambdaFormRetransformation.java:class Agent implements ClassFileTransformer {
>>>
>>> test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java:class NativeMethodPrefixAgent {
>>> test/jdk/java/lang/instrument/PremainClass/NoPremainAgent.java:class NoPremainAgent {
>>> test/jdk/java/lang/instrument/SimpleAgent.java:class SimpleAgent {
>>> test/jdk/java/lang/instrument/RetransformAgent.java:class RetransformAgent {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0001.java:class InheritAgent0001 extends InheritAgent0001Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0001.java:class InheritAgent0001Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0010.java:class InheritAgent0010 extends InheritAgent0010Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0010.java:class InheritAgent0010Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0011.java:class InheritAgent0011 extends InheritAgent0011Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0011.java:class InheritAgent0011Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0100.java:class InheritAgent0100 extends InheritAgent0100Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0100.java:class InheritAgent0100Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0101.java:class InheritAgent0101 extends InheritAgent0101Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0101.java:class InheritAgent0101Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0110.java:class InheritAgent0110 extends InheritAgent0110Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0110.java:class InheritAgent0110Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0111.java:class InheritAgent0111 extends InheritAgent0111Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent0111.java:class InheritAgent0111Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1000.java:class InheritAgent1000 extends InheritAgent1000Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1000.java:class InheritAgent1000Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1001.java:class InheritAgent1001 extends InheritAgent1001Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1001.java:class InheritAgent1001Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1010.java:class InheritAgent1010 extends InheritAgent1010Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1010.java:class InheritAgent1010Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1011.java:class InheritAgent1011 extends InheritAgent1011Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1011.java:class InheritAgent1011Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1100.java:class InheritAgent1100 extends InheritAgent1100Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1100.java:class InheritAgent1100Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1101.java:class InheritAgent1101 extends InheritAgent1101Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1101.java:class InheritAgent1101Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1110.java:class InheritAgent1110 extends InheritAgent1110Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1110.java:class InheritAgent1110Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1111.java:class InheritAgent1111 extends InheritAgent1111Super {
>>> test/jdk/java/lang/instrument/PremainClass/InheritAgent1111.java:class InheritAgent1111Super {
>>>
>>>
>>> But it is not a big problem - all can be fixed.
>>>
>>>> test/hotspot/jtreg/runtime/cds/appcds/jvmti/dumpingWithAgent
>>>> implements the agent properly (a public class and a public static
>>>> void premain method).
>>>>
>>>> As the popular Java agents conform to the spec (publicly
>>>> accessible premain method), the compatibility risk is low.
>>>>
>>>> Unless such a Java agent exists and finds a strong compelling
>>>> reason to argue that its premain method must be allowed non-public,
>>>> I do not see the argument to change the spec to allow non-public
>>>> agent classes.
>>>>
>>>> A bad test case is not representative of existing Java agents.
>>>
>>> Okay, thanks.
>>> I'll prepare a fix with setAccessible removed.
>>>
>>> Thanks,
>>> Serguei
>>>
>>>>
>>>> Mandy
>>>
>>

From serguei.spitsyn at oracle.com Mon Jun 29 17:46:45 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 29 Jun 2020 10:46:45 -0700
Subject: 15 RFR(XS): 8165276: Spec states that invoke the premain method in an agent class if it's public but implementation differs
In-Reply-To: <4a10c318-fca0-0eb3-9d7f-c244a6190832@oracle.com>
References: <981485e8-ae67-10a9-dbc9-855ffa9d7d4a@oracle.com>
 <26e05467-d93a-ea1b-f1d8-3b1325b72479@oracle.com>
 <7e1e2ee8-a49a-520c-9ced-cec4fc24eb55@oracle.com>
 <5ddd059d-5d16-8fa8-4a99-6496b4a30b9c@oracle.com>
 <486fa9e5-cf8d-f6a3-7334-d0339cf2a3a2@oracle.com>
 <17c9469b-e90e-9f6e-744e-e9a43d3ac348@oracle.com>
 <7626dc97-2ae6-0678-5670-83899180dc54@oracle.com>
 <7e0b83fd-067a-3ed3-bd09-93abb8712568@oracle.com>
 <544ae381-195d-cfc7-6421-ecbaad7da35c@oracle.com>
 <76c58309-cd04-f205-ca09-46a117590fad@oracle.com>
 <9e5dbaf6-9a08-951a-61f2-c50300b79457@oracle.com>
 <1883ddf8-20f1-0e17-b2d7-fd33ab906793@oracle.com>
 <4a10c318-fca0-0eb3-9d7f-c244a6190832@oracle.com>
Message-ID:

Thanks a lot for the review, Mandy!
I also asked Leonid to look at whether the test changes can be simplified.

Thanks,
Serguei

On 6/29/20 09:46, Mandy Chung wrote:
>
>
> On 6/27/20 12:23 AM, Alan Bateman wrote:
>> On 27/06/2020 01:40, serguei.spitsyn at oracle.com wrote:
>>>
>>> The implementation has this order of lookup:
>>>
>>>         // The agent class must have a premain or agentmain method that
>>>         // has 1 or 2 arguments. We check in the following order:
>>>         //
>>>         // 1) declared with a signature of (String, Instrumentation)
>>>         // 2) declared with a signature of (String)
>>>         // 3) inherited with a signature of (String, Instrumentation)
>>>         // 4) inherited with a signature of (String)
>>>
>>> The declared methods are obtained with getDeclaredMethod and the
>>> inherited ones with getMethod.
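To make the four-step order concrete, here is a minimal Java sketch of the lookup as the quoted comment describes it. The class names (DemoAgent, DemoAgentSuper, InheritingAgent, PremainLookupSketch) and the helper findAgentMethod are hypothetical, invented for this illustration. This is not the JDK's agent-loading code, only an approximation of the order using getDeclaredMethod and getMethod.

```java
import java.lang.instrument.Instrumentation;
import java.lang.reflect.Method;

// Hypothetical agent: declares only the 1-arg premain.
class DemoAgent extends DemoAgentSuper {
    public static void premain(String agentArgs) { }
}

// Hypothetical super class: declares only the 2-arg premain.
class DemoAgentSuper {
    public static void premain(String agentArgs, Instrumentation inst) { }
}

// Hypothetical agent that declares nothing and relies on inheritance.
class InheritingAgent extends DemoAgentSuper { }

class PremainLookupSketch {
    /** Returns the first match in the four-step order, or null if none. */
    static Method findAgentMethod(Class<?> agentClass, String name) {
        Class<?>[][] signatures = {
            { String.class, Instrumentation.class },
            { String.class },
        };
        // Steps 1 and 2: methods declared by the agent class itself.
        for (Class<?>[] sig : signatures) {
            try {
                return agentClass.getDeclaredMethod(name, sig);
            } catch (NoSuchMethodException ignored) {
                // fall through to the next candidate
            }
        }
        // Steps 3 and 4: public methods, including inherited ones.
        for (Class<?>[] sig : signatures) {
            try {
                return agentClass.getMethod(name, sig);
            } catch (NoSuchMethodException ignored) {
                // fall through to the next candidate
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findAgentMethod(DemoAgent.class, "premain"));
        System.out.println(findAgentMethod(InheritingAgent.class, "premain"));
    }
}
```

For DemoAgent, step 2 wins: its own 1-arg premain is found by getDeclaredMethod before the public 2-arg premain inherited from DemoAgentSuper is considered. For InheritingAgent, nothing is declared, so step 3 finds the inherited 2-arg method through getMethod.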
>>> This works for both 1-arg and 2-arg premain methods, so I'm not sure
>>> what is inconsistent.
>>> Or is your concern that there could be a not-so-nice NoSuchMethodException?
>>>
>>> In fact, I don't understand why there is a need to use
>>> getDeclaredMethod.
>>> As I see it, getMethod should return a declared method first, and
>>> only if that is absent does it check for an inherited one.
>> The JPLIS agent used getMethod when it was originally created in JDK
>> 5 so it would only find public methods. I haven't studied the
>> intervening history too closely but I assume JDK-6289149 (in JDK 7)
>> created the inconsistency between the spec and implementation when it
>> explored the scenario of premain declared in a super class with
>> different arity and/or modifiers to the premain in the sub-class. I
>> assume the tests that you've been forced to change are related to
>> this same issue.
>>
>
> Thanks for digging up the history.
>
>> So given where we are, and given the statement "The JVM first
>> attempts to invoke the following method on the agent class" in the
>> spec then I guess it's okay to keep the getDeclaredMethod to deal
>> with the "whacky" case where a super class of the agent class has a
>> public premain method.
>>
>
> I also think it's okay to get a different exception message in this case.
>
> Serguei - I reviewed this version.  It looks okay.
>> New webrev version is:
>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2020/instr-setAccessible.2/
>
> Mandy
>
>
>

From chris.plummer at oracle.com Mon Jun 29 19:19:15 2020
From: chris.plummer at oracle.com (Chris Plummer)
Date: Mon, 29 Jun 2020 12:19:15 -0700
Subject: [15] RFR(XXS): 7107012: sun.jvm.hostspot.code.CompressedReadStream readDouble() conversion to long mishandled
In-Reply-To: <438b80e5-5094-69e6-c12a-b712be33a6a6@oracle.com>
References: <151408c7-a9b7-7c02-d3e3-fc4155c1152f@oracle.com>
 <438b80e5-5094-69e6-c12a-b712be33a6a6@oracle.com>
Message-ID: <55ac10a4-4012-00d1-c299-55fc3002d59f@oracle.com>

Thanks Serguei!

Can I get one more reviewer please? The change is very simple.

thanks,

Chris

On 6/26/20 5:51 PM, serguei.spitsyn at oracle.com wrote:
> Hi Chris,
>
> The fix looks good.
> I would most likely overlook such a bug with my eyes. :)
>
>
> Thanks,
> Serguei
>
>
> On 6/26/20 16:03, Chris Plummer wrote:
>> Hello,
>>
>> Please help review the following:
>>
>> http://cr.openjdk.java.net/~cjplummer/7107012/webrev.00/index.html
>> https://bugs.openjdk.java.net/browse/JDK-7107012
>>
>> This bug is filed as confidential, although the issue is trivial. In
>> the following line of code:
>>
>>     return Double.longBitsToDouble((h << 32) | ((long)l & 0x00000000FFFFFFFFL));
>>
>> Since h is an int, it's subject to the following:
>>
>> https://docs.oracle.com/javase/specs/jls/se14/html/jls-15.html#jls-15.19
>>
>> "If the promoted type of the left-hand operand is int, then only the
>> five lowest-order bits of the right-hand operand are used as the
>> shift distance. It is as if the right-hand operand were subjected to
>> a bitwise logical AND operator & (§15.22.1) with the mask value 0x1f
>> (0b11111). The shift distance actually used is therefore always in
>> the range 0 to 31, inclusive."
>>
>> So (h << 32) is the same as (h << 0), which is not what was intended.
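The shift-distance rule quoted above can be checked with a small, self-contained Java sketch. The class ShiftSketch and the methods combineBuggy and combineWidened are hypothetical names chosen for this illustration; the first mirrors the shape of the expression under review, the second widens h to long before shifting, which is the remedy this review proposes. It is not the SA source itself.

```java
class ShiftSketch {
    // Same shape as the line under review: h is an int, so the shift
    // distance 32 is masked with 0x1f and becomes 0.
    static long combineBuggy(int h, int l) {
        return (h << 32) | ((long) l & 0x00000000FFFFFFFFL);
    }

    // Widening h to long first makes this a 64-bit shift, so the high
    // word really lands in bits 32..63.
    static long combineWidened(int h, int l) {
        return ((long) h << 32) | ((long) l & 0x00000000FFFFFFFFL);
    }

    public static void main(String[] args) {
        int h = 0x3FF00000; // high 32 bits of the IEEE 754 encoding of 1.0
        int l = 0;          // low 32 bits of the same encoding

        System.out.println((h << 32) == h); // true: shift distance is 32 & 0x1f == 0
        System.out.println(Double.longBitsToDouble(combineWidened(h, l))); // 1.0
        System.out.println(Double.longBitsToDouble(combineBuggy(h, l)));   // not 1.0
    }
}
```

With the int shift, the high word stays in the low 32 bits of the result, so the reassembled bit pattern no longer encodes the original double.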
>> The spec also calls out another issue:
>>
>> "The type of the shift expression is the promoted type of the
>> left-hand operand."
>>
>> So even if it did left shift 32 bits, the result would have been
>> truncated to an int, meaning the result would always be 0. The fix is
>> to first cast h to a long. Doing this addresses both these problems,
>> allowing a full 32-bit left shift to be done, and leaving the result
>> as an untruncated long.
>>
>> I was unable to trigger use of this code in SA. It seems to be used
>> to pull locals out of a CompiledVFrame. I don't see any clhsdb paths
>> to this code. It appears the GUI hsdb uses it via a complex call path
>> I could not fully decipher, but I could not trigger its use from
>> hsdb. In any case, the fix is straightforward and trivial, so I'd
>> rather not have to spend more time digging deeper into its use and
>> providing a test case.
>>
>> thanks,
>>
>> Chris
>>
>

From daniel.daugherty at oracle.com Mon Jun 29 19:21:38 2020
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Mon, 29 Jun 2020 15:21:38 -0400
Subject: RFR(S): 8246493: JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support
Message-ID:

Greetings,

I have a fix for the following bug:

    JDK-8246493 JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support
    https://bugs.openjdk.java.net/browse/JDK-8246493

Here's the webrev URL:

http://cr.openjdk.java.net/~dcubed/8246493-webrev/0_for_jdk16/

The test bug that's being fixed:

    vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java fails
    intermittently with the following message:

     nsk.share.TestBug: There are more than one(2) instance of 'nsk.share.jpda.StateTestThread in debuggee

Summary of the fix:

    Use WhiteBox.deflateIdleMonitors() to make sure that all inflated
    ObjectMonitors are deflated after each debuggee has been run.
This fix has been tested with a Mach5 Tier5 test run that executes all
of the JDI tests (along with JDWP, JVM/TI and other Serviceability tests).
I also did five 100 iteration runs of the failing mixed002 test. Each Mach5
job set ran the test 100 times on Linux-X64, macOSX, and Win-X64 for a
total of (5 * 100 * 3) iterations of nsk/jdi/stress/serial/mixed002. There
were no failures.

Thanks, in advance, for any comments, questions or suggestions.

Dan


Gory details:

The primary focus of the fix is in the first three files in the webrev:

test/hotspot/jtreg/vmTestbase/nsk/share/jdi/SerialExecutionDebuggee.java
test/hotspot/jtreg/vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java
test/hotspot/jtreg/ProblemList.txt

nsk.share.jdi.SerialExecutionDebuggee is the class that is used to serially
execute the debuggee portion of a specific list of tests. After this class
is done executing a debuggee class, it needs to deflate idle monitors in
order to prevent a StateTestThread object created by one debuggee class
from confusing the next debuggee class. Each of the debuggee classes that
use StateTestThread expects there to be only one of these objects. However,
since we are running multiple debuggee classes serially *in the same VM*,
the StateTestThread object created in one debuggee can still be around
when the next debuggee runs.

The COMMAND_CLEAR_DEBUGGEE implementation clears the currentDebuggee variable
which permits the debuggee to be GC'ed and is modified by this fix to call
WhiteBox.deflateIdleMonitors() to make sure that all inflated ObjectMonitors
are deflated after each debuggee has been run. This takes care of any pinned
StateTestThread objects (and any other inflated ObjectMonitors).


vmTestbase/nsk/jdi/stress/serial/mixed002 is a wrapper style stress test that
executes the debugger and debuggee parts of a specific list of tests serially
*in the same VM*. Several of the tests executed by mixed002 make use of the
StateTestThread class.
The failure is intermittent because the order of test
execution is shuffled automatically and sometimes the ServiceThread manages
to execute deflation at the right time to prevent more than one StateTestThread
object from existing at the same time.

The additions to vmTestbase/nsk/jdi/stress/serial/mixed002 are the standard
boilerplate needed to call WhiteBox functions from test code. The actual call
to WhiteBox.deflateIdleMonitors() is made in SerialExecutionDebuggee. I did
attempt a fix where I modified the StateTestThread class to make the call to
WhiteBox.deflateIdleMonitors() after the internal waitOnObject is no longer
contended or waited on. That fix reduced the frequency of the failures by
about half, but it didn't solve the test bug entirely. So I had to make the
fix in SerialExecutionDebuggee instead.


test/hotspot/jtreg/ProblemList.txt is modified to re-enable the mixed002 test.


The remaining nine files are also wrapper style stress tests that execute
the debugger and debuggee parts of a specific list of tests serially *in
the same VM*. Because these tests also use SerialExecutionDebuggee, they
also need the boilerplate changes so that WhiteBox.deflateIdleMonitors()
can be called.

From chris.plummer at oracle.com Mon Jun 29 19:37:01 2020
From: chris.plummer at oracle.com (Chris Plummer)
Date: Mon, 29 Jun 2020 12:37:01 -0700
Subject: RFR(S): 8246493: JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support
In-Reply-To:
References:
Message-ID: <0f906972-f1a4-0cb4-ce83-87dc71fbede8@oracle.com>

Hi Dan,

Something is wrong with ProblemList.txt. It doesn't show any changes,
but I also don't see mixed002 in the file anymore.

Otherwise the changes look good.

thanks,

Chris

On 6/29/20 12:21 PM, Daniel D. Daugherty wrote:
> Greetings,
>
> I have a fix for the following bug:
>
>     JDK-8246493 JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support
>     https://bugs.openjdk.java.net/browse/JDK-8246493
>
> Here's the webrev URL:
>
> http://cr.openjdk.java.net/~dcubed/8246493-webrev/0_for_jdk16/
>
> The test bug that's being fixed:
>
>     vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java fails
>     intermittently with the following message:
>
>      nsk.share.TestBug: There are more than one(2) instance of 'nsk.share.jpda.StateTestThread in debuggee
>
> Summary of the fix:
>
>     Use WhiteBox.deflateIdleMonitors() to make sure that all inflated
>     ObjectMonitors are deflated after each debuggee has been run.
>
> This fix has been tested with a Mach5 Tier5 test run that executes all
> of the JDI tests (along with JDWP, JVM/TI and other Serviceability tests).
> I also did five 100 iteration runs of the failing mixed002 test. Each Mach5
> job set ran the test 100 times on Linux-X64, macOSX, and Win-X64 for a
> total of (5 * 100 * 3) iterations of nsk/jdi/stress/serial/mixed002. There
> were no failures.
>
> Thanks, in advance, for any comments, questions or suggestions.
>
> Dan
>
>
> Gory details:
>
> The primary focus of the fix is in the first three files in the webrev:
>
> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/SerialExecutionDebuggee.java
> test/hotspot/jtreg/vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java
> test/hotspot/jtreg/ProblemList.txt
>
> nsk.share.jdi.SerialExecutionDebuggee is the class that is used to serially
> execute the debuggee portion of a specific list of tests. After this class
> is done executing a debuggee class, it needs to deflate idle monitors in
> order to prevent a StateTestThread object created by one debuggee class
> from confusing the next debuggee class. Each of the debuggee classes that
> use StateTestThread expects there to be only one of these objects. However,
> since we are running multiple debuggee classes serially *in the same VM*,
> the StateTestThread object created in one debuggee can still be around
> when the next debuggee runs.
>
> The COMMAND_CLEAR_DEBUGGEE implementation clears the currentDebuggee variable
> which permits the debuggee to be GC'ed and is modified by this fix to call
> WhiteBox.deflateIdleMonitors() to make sure that all inflated ObjectMonitors
> are deflated after each debuggee has been run. This takes care of any pinned
> StateTestThread objects (and any other inflated ObjectMonitors).
>
>
> vmTestbase/nsk/jdi/stress/serial/mixed002 is a wrapper style stress test that
> executes the debugger and debuggee parts of a specific list of tests serially
> *in the same VM*. Several of the tests executed by mixed002 make use of the
> StateTestThread class. The failure is intermittent because the order of test
> execution is shuffled automatically and sometimes the ServiceThread manages
> to execute deflation at the right time to prevent more than one StateTestThread
> object from existing at the same time.
>
> The additions to vmTestbase/nsk/jdi/stress/serial/mixed002 are the standard
> boilerplate needed to call WhiteBox functions from test code. The actual call
> to WhiteBox.deflateIdleMonitors() is made in SerialExecutionDebuggee. I did
> attempt a fix where I modified the StateTestThread class to make the call to
> WhiteBox.deflateIdleMonitors() after the internal waitOnObject is no longer
> contended or waited on. That fix reduced the frequency of the failures by
> about half, but it didn't solve the test bug entirely. So I had to make the
> fix in SerialExecutionDebuggee instead.
>
>
> test/hotspot/jtreg/ProblemList.txt is modified to re-enable the mixed002
> test.
>
>
> The remaining nine files are also wrapper style stress tests that execute
> the debugger and debuggee parts of a specific list of tests serially *in
> the same VM*. Because these tests also use SerialExecutionDebuggee, they
> also need the boilerplate changes so that WhiteBox.deflateIdleMonitors()
> can be called.
From daniil.x.titov at oracle.com Mon Jun 29 19:39:29 2020
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Mon, 29 Jun 2020 12:39:29 -0700
Subject: RFR: 8227337: javax/management/remote/mandatory/connection/ReconnectTest.java NoSuchObjectException no such object in table
Message-ID: <0B4961E2-3CE1-4738-9C95-F6C2B0B081A2@oracle.com>

Please review the change that fixes an intermittent test failure.

The tests javax/management/remote/mandatory/connection/ReconnectTest.java and javax/management/remote/mandatory/connection/MultiThreadDeadLockTest.java use specific server timeout settings that in some cases (e.g. when the test is run with -Xcomp) result in the JMX server connection timeout thread unexporting the remote object while the client connection is still in progress. Below is an example of such a stack trace:

java.rmi.NoSuchObjectException: no such object in table
	at java.rmi/sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:303)
	at java.rmi/sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:279)
	at java.rmi/sun.rmi.server.UnicastRef.invoke(UnicastRef.java:164)
	at jdk.remoteref/jdk.jmx.remote.internal.rmi.PRef.invoke(Unknown Source)
	at java.management.rmi/javax.management.remote.rmi.RMIConnectionImpl_Stub.getConnectionId(RMIConnectionImpl_Stub.java:318)
	at java.management.rmi/javax.management.remote.rmi.RMIConnector.getConnectionId(RMIConnector.java:385)
	at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:347)
	at java.management/javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
	at MultiThreadDeadLockTest.main(MultiThreadDeadLockTest.java:87)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at
java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
	at java.base/java.lang.Thread.run(Thread.java:832)

The fix adjusts the server timeout the tests use according to the "test.timeout.factor" system property.

Testing: Mach5 tests are in progress.

[1] https://cr.openjdk.java.net/~dtitov/8227337/webrev.01/
[2] https://bugs.openjdk.java.net/browse/JDK-8227337

Thanks,
Daniil

From serguei.spitsyn at oracle.com Mon Jun 29 19:41:46 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 29 Jun 2020 12:41:46 -0700
Subject: RFR(S): 8246493: JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support
In-Reply-To: <0f906972-f1a4-0cb4-ce83-87dc71fbede8@oracle.com>
References: <0f906972-f1a4-0cb4-ce83-87dc71fbede8@oracle.com>
Message-ID: <558261cb-fa3a-3727-efce-e8bc2284c0ea@oracle.com>

Hi Dan,

The same as from Chris.
The ProblemList.txt has no changes.
Otherwise, it looks good.

Thanks,
Serguei

On 6/29/20 12:37, Chris Plummer wrote:
> Hi Dan,
>
> Something is wrong with ProblemList.txt. It doesn't show any changes,
> but I also don't see mixed002 in the file anymore.
>
> Otherwise the changes look good.
>
> thanks,
>
> Chris
>
> On 6/29/20 12:21 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have a fix for the following bug:
>>
>>     JDK-8246493 JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support
>>     https://bugs.openjdk.java.net/browse/JDK-8246493
>>
>> Here's the webrev URL:
>>
>> http://cr.openjdk.java.net/~dcubed/8246493-webrev/0_for_jdk16/
>>
>> The test bug that's being fixed:
>>
>>     vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java fails
>>     intermittently with the following message:
>>
>>      nsk.share.TestBug: There are more than one(2) instance of 'nsk.share.jpda.StateTestThread in debuggee
>>
>> Summary of the fix:
>>
>>     Use WhiteBox.deflateIdleMonitors() to make sure that all inflated
>>     ObjectMonitors are deflated after each debuggee has been run.
>>
>> This fix has been tested with a Mach5 Tier5 test run that executes all
>> of the JDI tests (along with JDWP, JVM/TI and other Serviceability tests).
>> I also did five 100 iteration runs of the failing mixed002 test. Each Mach5
>> job set ran the test 100 times on Linux-X64, macOSX, and Win-X64 for a
>> total of (5 * 100 * 3) iterations of nsk/jdi/stress/serial/mixed002. There
>> were no failures.
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>>
>> Dan
>>
>>
>> Gory details:
>>
>> The primary focus of the fix is in the first three files in the webrev:
>>
>> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/SerialExecutionDebuggee.java
>> test/hotspot/jtreg/vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java
>> test/hotspot/jtreg/ProblemList.txt
>>
>> nsk.share.jdi.SerialExecutionDebuggee is the class that is used to serially
>> execute the debuggee portion of a specific list of tests. After this class
>> is done executing a debuggee class, it needs to deflate idle monitors in
>> order to prevent a StateTestThread object created by one debuggee class
>> from confusing the next debuggee class. Each of the debuggee classes that
>> use StateTestThread expects there to be only one of these objects. However,
>> since we are running multiple debuggee classes serially *in the same VM*,
>> the StateTestThread object created in one debuggee can still be around
>> when the next debuggee runs.
>>
>> The COMMAND_CLEAR_DEBUGGEE implementation clears the currentDebuggee variable
>> which permits the debuggee to be GC'ed and is modified by this fix to call
>> WhiteBox.deflateIdleMonitors() to make sure that all inflated ObjectMonitors
>> are deflated after each debuggee has been run. This takes care of any pinned
>> StateTestThread objects (and any other inflated ObjectMonitors).
>>
>>
>> vmTestbase/nsk/jdi/stress/serial/mixed002 is a wrapper style stress test that
>> executes the debugger and debuggee parts of a specific list of tests serially
>> *in the same VM*. Several of the tests executed by mixed002 make use of the
>> StateTestThread class. The failure is intermittent because the order of test
>> execution is shuffled automatically and sometimes the ServiceThread manages
>> to execute deflation at the right time to prevent more than one StateTestThread
>> object from existing at the same time.
>>
>> The additions to vmTestbase/nsk/jdi/stress/serial/mixed002 are the standard
>> boilerplate needed to call WhiteBox functions from test code. The actual call
>> to WhiteBox.deflateIdleMonitors() is made in SerialExecutionDebuggee. I did
>> attempt a fix where I modified the StateTestThread class to make the call to
>> WhiteBox.deflateIdleMonitors() after the internal waitOnObject is no longer
>> contended or waited on. That fix reduced the frequency of the failures by
>> about half, but it didn't solve the test bug entirely. So I had to make the
>> fix in SerialExecutionDebuggee instead.
>>
>>
>> test/hotspot/jtreg/ProblemList.txt is modified to re-enable the
>> mixed002 test.
>>
>>
>> The remaining nine files are also wrapper style stress tests that execute
>> the debugger and debuggee parts of a specific list of tests serially *in
>> the same VM*. Because these tests also use SerialExecutionDebuggee, they
>> also need the boilerplate changes so that WhiteBox.deflateIdleMonitors()
>> can be called.
>
>

From daniel.daugherty at oracle.com Mon Jun 29 19:45:40 2020
From: daniel.daugherty at oracle.com (Daniel D.
Daugherty)
Date: Mon, 29 Jun 2020 15:45:40 -0400
Subject: RFR(S): 8246493: JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support
In-Reply-To: <558261cb-fa3a-3727-efce-e8bc2284c0ea@oracle.com>
References: <0f906972-f1a4-0cb4-ce83-87dc71fbede8@oracle.com>
 <558261cb-fa3a-3727-efce-e8bc2284c0ea@oracle.com>
Message-ID: <7e8e4839-fb35-b7a2-c989-729d84354c0e@oracle.com>

Chris and Serguei,

Thanks for the fast reviews!!

I generated the webrev in my "mach5" directory and that was baselined
on the jdk-16+3 snapshot and that doesn't include the ProblemList change.
Sigh...  I have updated the repo to "current" and regenerated the webrev.

test/hotspot/jtreg/ProblemList.txt  now shows:

@@ -126,11 +126,10 @@
 vmTestbase/nsk/monitoring/ThreadMXBean/ThreadInfo/Deadlock/JavaDeadlock001/TestDescription.java 8060733 generic-all

 vmTestbase/nsk/jdi/ThreadReference/stop/stop001/TestDescription.java 7034630 generic-all
 vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses021/TestDescription.java 8065773 generic-all
 vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses023/TestDescription.java 8065773 generic-all
-vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java 8246493 generic-all

 vmTestbase/nsk/jdb/eval/eval001/eval001.java 8221503 generic-all

 vmTestbase/metaspace/gc/firstGC_10m/TestDescription.java 8208250 generic-all
 vmTestbase/metaspace/gc/firstGC_50m/TestDescription.java 8208250 generic-all

Thanks again for the fast reviews!!

Dan


On 6/29/20 3:41 PM, serguei.spitsyn at oracle.com wrote:
> Hi Dan,
>
> The same as from Chris.
> The ProblemList.txt has no changes.
> Otherwise, it looks good.
>
> Thanks,
> Serguei
>
>
>
> On 6/29/20 12:37, Chris Plummer wrote:
>> Hi Dan,
>>
>> Something is wrong with ProblemList.txt. It doesn't show any changes,
>> but I also don't see mixed002 in the file anymore.
>>
>> Otherwise the changes look good.
>>
>> thanks,
>>
>> Chris
>>
>> On 6/29/20 12:21 PM, Daniel D.
>> Daugherty wrote:
>>> Greetings,
>>>
>>> I have a fix for the following bug:
>>>
>>>     JDK-8246493 JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support
>>>     https://bugs.openjdk.java.net/browse/JDK-8246493
>>>
>>> Here's the webrev URL:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8246493-webrev/0_for_jdk16/
>>>
>>> The test bug that's being fixed:
>>>
>>>     vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java fails
>>>     intermittently with the following message:
>>>
>>>      nsk.share.TestBug: There are more than one(2) instance of 'nsk.share.jpda.StateTestThread in debuggee
>>>
>>> Summary of the fix:
>>>
>>>     Use WhiteBox.deflateIdleMonitors() to make sure that all inflated
>>>     ObjectMonitors are deflated after each debuggee has been run.
>>>
>>> This fix has been tested with a Mach5 Tier5 test run that executes all
>>> of the JDI tests (along with JDWP, JVM/TI and other Serviceability tests).
>>> I also did five 100 iteration runs of the failing mixed002 test. Each Mach5
>>> job set ran the test 100 times on Linux-X64, macOSX, and Win-X64 for a
>>> total of (5 * 100 * 3) iterations of nsk/jdi/stress/serial/mixed002. There
>>> were no failures.
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Dan
>>>
>>>
>>> Gory details:
>>>
>>> The primary focus of the fix is in the first three files in the webrev:
>>>
>>> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/SerialExecutionDebuggee.java
>>> test/hotspot/jtreg/vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java
>>> test/hotspot/jtreg/ProblemList.txt
>>>
>>> nsk.share.jdi.SerialExecutionDebuggee is the class that is used to serially
>>> execute the debuggee portion of a specific list of tests.
>>> After this class
>>> is done executing a debuggee class, it needs to deflate idle monitors in
>>> order to prevent a StateTestThread object created by one debuggee class
>>> from confusing the next debuggee class. Each of the debuggee classes that
>>> use StateTestThread expects there to be only one of these objects. However,
>>> since we are running multiple debuggee classes serially *in the same VM*,
>>> the StateTestThread object created in one debuggee can still be around
>>> when the next debuggee runs.
>>>
>>> The COMMAND_CLEAR_DEBUGGEE implementation clears the currentDebuggee variable
>>> which permits the debuggee to be GC'ed and is modified by this fix to call
>>> WhiteBox.deflateIdleMonitors() to make sure that all inflated ObjectMonitors
>>> are deflated after each debuggee has been run. This takes care of any pinned
>>> StateTestThread objects (and any other inflated ObjectMonitors).
>>>
>>>
>>> vmTestbase/nsk/jdi/stress/serial/mixed002 is a wrapper style stress test that
>>> executes the debugger and debuggee parts of a specific list of tests serially
>>> *in the same VM*. Several of the tests executed by mixed002 make use of the
>>> StateTestThread class. The failure is intermittent because the order of test
>>> execution is shuffled automatically and sometimes the ServiceThread manages
>>> to execute deflation at the right time to prevent more than one StateTestThread
>>> object from existing at the same time.
>>>
>>> The additions to vmTestbase/nsk/jdi/stress/serial/mixed002 are the standard
>>> boilerplate needed to call WhiteBox functions from test code. The actual call
>>> to WhiteBox.deflateIdleMonitors() is made in SerialExecutionDebuggee. I did
>>> attempt a fix where I modified the StateTestThread class to make the call to
>>> WhiteBox.deflateIdleMonitors() after the internal waitOnObject is no longer
>>> contended or waited on.
That fix reduced the frequency of the >>> failures by >>> about half, but it didn't solve the test bug entirely. So I had to >>> make the >>> fix in SerialExecutionDebuggee instead. >>> >>> >>> test/hotspot/jtreg/ProblemList.txt is modified to re-enable the >>> mix002 test. >>> >>> >>> The remaining nine files are also wrapper style stress tests that >>> execute >>> the debugger and debuggee parts of a specific list of tests serially >>> *in >>> the same VM*. Because these tests also use SerialExecutionDebuggee, >>> they >>> also need the boilerplate changes so that >>> WhiteBox.deflateIdleMonitors() >>> can be called. >> >> > From chris.plummer at oracle.com Mon Jun 29 19:49:46 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Mon, 29 Jun 2020 12:49:46 -0700 Subject: RFR(S): 8246493: JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support In-Reply-To: <7e8e4839-fb35-b7a2-c989-729d84354c0e@oracle.com> References: <0f906972-f1a4-0cb4-ce83-87dc71fbede8@oracle.com> <558261cb-fa3a-3727-efce-e8bc2284c0ea@oracle.com> <7e8e4839-fb35-b7a2-c989-729d84354c0e@oracle.com> Message-ID: Looks good. Chris On 6/29/20 12:45 PM, Daniel D. Daugherty wrote: > Chris and Serguei, > > Thanks for the fast reviews!! > > I generated the webrev in my "mach5" directory and that was baselined > on the jdk-16+3 snapshot and that doesn't include the ProblemList change. > Sigh...? I have updated the repo to "current" and regenerated the webrev. > > test/hotspot/jtreg/ProblemList.txt? 
now shows: > > @@ -126,11 +126,10 @@ > ?vmTestbase/nsk/monitoring/ThreadMXBean/ThreadInfo/Deadlock/JavaDeadlock001/TestDescription.java > 8060733 generic-all > > ?vmTestbase/nsk/jdi/ThreadReference/stop/stop001/TestDescription.java > 7034630 generic-all > ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses021/TestDescription.java > 8065773 generic-all > ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses023/TestDescription.java > 8065773 generic-all > -vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java > 8246493 generic-all > > ?vmTestbase/nsk/jdb/eval/eval001/eval001.java 8221503 generic-all > > ?vmTestbase/metaspace/gc/firstGC_10m/TestDescription.java 8208250 > generic-all > ?vmTestbase/metaspace/gc/firstGC_50m/TestDescription.java 8208250 > generic-all > > Thanks again for the fast reviews!! > > Dan > > > On 6/29/20 3:41 PM, serguei.spitsyn at oracle.com wrote: >> Hi Dan, >> >> The same as from Chris. >> The ProblemList.txt has no changes. >> Otherwise, it looks good. >> >> Thanks, >> Serguei >> >> >> >> On 6/29/20 12:37, Chris Plummer wrote: >>> Hi Dan, >>> >>> Something is wrong with ProblemList.txt. It doesn't show any >>> changes, but I also don't see mixed002 in the file anymore. >>> >>> Otherwise the changes look good. >>> >>> thanks, >>> >>> Chris >>> >>> On 6/29/20 12:21 PM, Daniel D. Daugherty wrote: >>>> Greetings, >>>> >>>> I have a fix for the following bug: >>>> >>>> ??? JDK-8246493 JDI stress/serial/mixed002 needs to use >>>> WhiteBox.deflateIdleMonitors support >>>> ??? https://bugs.openjdk.java.net/browse/JDK-8246493 >>>> >>>> Here's the webrev URL: >>>> >>>> http://cr.openjdk.java.net/~dcubed/8246493-webrev/0_for_jdk16/ >>>> >>>> The test bug that's being fixed: >>>> >>>> vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java fails >>>> ??? intermittently with the following message: >>>> >>>> ???? 
nsk.share.TestBug: There are more than one(2) instance of >>>> 'nsk.share.jpda.StateTestThread in debuggee >>>> >>>> Summary of the fix: >>>> >>>> ??? Use WhiteBox.deflateIdleMonitors() to make sure that all inflated >>>> ??? ObjectMonitors are deflated after each debuggee has been run. >>>> >>>> This fix has been tested with a Mach5 Tier5 test run that executes all >>>> of the JDI tests (along with JDWP, JVM/TI and other Serviceability >>>> tests). >>>> I also did five 100 iteration runs of the failing mix002 test. Each >>>> Mach5 >>>> job set ran the test 100 times on Linux-X64, macOSX, and Win-X64 for a >>>> total of (5 * 100 * 3) iterations of >>>> nsk/jdi/stress/serial/mixed002. There >>>> were no failures. >>>> >>>> Thanks, in advance, for any comments, questions or suggestions. >>>> >>>> Dan >>>> >>>> >>>> Gory details: >>>> >>>> The primary focus of the fix is in the first three files in the >>>> webrev: >>>> >>>> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/SerialExecutionDebuggee.java >>>> >>>> test/hotspot/jtreg/vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>>> >>>> test/hotspot/jtreg/ProblemList.txt >>>> >>>> nsk.share.jdi.SerialExecutionDebuggee is the class that used to >>>> serially >>>> execute the debuggee portion of a specific list of tests. After >>>> this class >>>> is done executing a debuggee class, it needs to deflate idle >>>> monitors in >>>> order to prevent a StateTestThread object created by one debuggee >>>> class >>>> from confusing the next debuggee class. Each of the debuggee >>>> classes that >>>> use StateTestThread expect there to be only one of these objects. >>>> However, >>>> since we are running multiple debuggee classes serially *in the >>>> same VM*, >>>> the StateTestThread object created in one debuggee can still be around >>>> when the next debuggee runs. 
>>>> >>>> The COMMAND_CLEAR_DEBUGGEE implementation clears the >>>> currentDebuggee variable >>>> which permits the debuggee to be GC'ed and is modified by this fix >>>> to call >>>> WhiteBox.deflateIdleMonitors() to make sure that all inflated >>>> ObjectMonitors >>>> are deflated after each debuggee has been run. This takes care of >>>> any pinned >>>> StateTestThread objects (and any other inflated ObjectMonitors). >>>> >>>> >>>> vmTestbase/nsk/jdi/stress/serial/mixed002 is a wrapper style stress >>>> test that >>>> executes the debugger and debuggee parts of a specific list of >>>> tests serially >>>> *in the same VM*. Several of the tests executed by mixed002 make >>>> use of the >>>> StateTestThread class. The failure is intermittent because the >>>> order of test >>>> execution is shuffled automatically and sometimes the ServiceThread >>>> manages >>>> to execute deflation at the right time to prevent more than one >>>> StateTestThread >>>> object from existing at the same time. >>>> >>>> The additions to vmTestbase/nsk/jdi/stress/serial/mixed002 are the >>>> standard >>>> boilerplate needed to call WhiteBox functions from test code. The >>>> actual call >>>> to WhiteBox.deflateIdleMonitors() is made in >>>> SerialExecutionDebuggee. I did >>>> attempt a fix where I modified the StateTestThread class to make >>>> the call to >>>> WhiteBox.deflateIdleMonitors() after the internal waitOnObject is >>>> no longer >>>> contended or waited on. That fix reduced the frequency of the >>>> failures by >>>> about half, but it didn't solve the test bug entirely. So I had to >>>> make the >>>> fix in SerialExecutionDebuggee instead. >>>> >>>> >>>> test/hotspot/jtreg/ProblemList.txt is modified to re-enable the >>>> mix002 test. >>>> >>>> >>>> The remaining nine files are also wrapper style stress tests that >>>> execute >>>> the debugger and debuggee parts of a specific list of tests >>>> serially *in >>>> the same VM*. 
Because these tests also use SerialExecutionDebuggee, >>>> they >>>> also need the boilerplate changes so that >>>> WhiteBox.deflateIdleMonitors() >>>> can be called. >>> >>> >> > From daniel.daugherty at oracle.com Mon Jun 29 19:50:38 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 29 Jun 2020 15:50:38 -0400 Subject: [15] RFR(XXS): 7107012: sun.jvm.hostspot.code.CompressedReadStream readDouble() conversion to long mishandled In-Reply-To: <151408c7-a9b7-7c02-d3e3-fc4155c1152f@oracle.com> References: <151408c7-a9b7-7c02-d3e3-fc4155c1152f@oracle.com> Message-ID: <0ac1f9fa-7ac3-27c7-3794-b27f307323cb@oracle.com> Hi Chris, This one caught my eye just because it was such an old bug number... :-) On 6/26/20 7:03 PM, Chris Plummer wrote: > Hello, > > Please help review the following: > > http://cr.openjdk.java.net/~cjplummer/7107012/webrev.00/index.html src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/code/CompressedReadStream.java Nice catch! Thumbs up. Dan > https://bugs.openjdk.java.net/browse/JDK-7107012 > > This bug is filed as confidential, although the issue is trivial. In > the following line of code: > > return Double.longBitsToDouble((h << 32) | ((long)l & > 0x00000000FFFFFFFFL)); > > Since h is an int, it's subject to the following: > > https://docs.oracle.com/javase/specs/jls/se14/html/jls-15.html#jls-15.19 > > "If the promoted type of the left-hand operand is int, then only the > five lowest-order bits of the right-hand operand are used as the shift > distance. It is as if the right-hand operand were subjected to a > bitwise logical AND operator & (§15.22.1) with the mask value 0x1f > (0b11111). The shift distance actually used is therefore always in the > range 0 to 31, inclusive." > > So (h << 32) is the same as (h << 0), which is not what was intended. > The spec also calls out another issue: > > "The type of the shift expression is the promoted type of the > left-hand operand." 
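The shift rules quoted above are easy to check in isolation. What follows is a minimal sketch, not the SA source: the class name ShiftDemo and both method names are hypothetical, and the test value (the bits of Math.PI) is chosen only for illustration. It compares the quoted expression with a variant that widens h to long before shifting.

```java
// Hypothetical stand-alone demo; this is NOT the actual CompressedReadStream code.
class ShiftDemo {
    // Mirrors the expression quoted above. h is an int, so the shift
    // distance 32 is masked with 0x1f to 0 and (h << 32) is simply h.
    static double buggyReadDouble(int h, int l) {
        return Double.longBitsToDouble((h << 32) | ((long) l & 0x00000000FFFFFFFFL));
    }

    // Widening h to long first makes this a 64-bit shift, so the shift
    // distance is 32 and the result keeps all 64 bits.
    static double fixedReadDouble(int h, int l) {
        return Double.longBitsToDouble(((long) h << 32) | ((long) l & 0x00000000FFFFFFFFL));
    }

    public static void main(String[] args) {
        // Split a known double into its high and low 32-bit halves.
        long bits = Double.doubleToLongBits(Math.PI);
        int h = (int) (bits >>> 32);
        int l = (int) bits;

        System.out.println((h << 32) == h);                   // true
        System.out.println(buggyReadDouble(h, l) == Math.PI); // false
        System.out.println(fixedReadDouble(h, l) == Math.PI); // true
    }
}
```

The low half is masked in both variants because l is sign-extended when it is widened to long.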
> > So even if it did left shift 32 bits, the result would have been > truncated to an int, meaning the result would always be 0. The fix is > to first cast h to a long. Doing this addresses both of these problems, > allowing a full 32-bit left shift to be done, and leaving the result > as an untruncated long. > > I was unable to trigger use of this code in SA. It seems to be used to > pull locals out of a CompiledVFrame. I don't see any clhsdb paths to > this code. It appears the GUI hsdb uses it via a complex call path I > could not fully decipher, but I could not trigger its use from hsdb. > In any case, the fix is straightforward and trivial, so I'd rather > not have to spend more time digging deeper into its use and providing > a test case. > > thanks, > > Chris > From daniel.daugherty at oracle.com Mon Jun 29 19:53:45 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 29 Jun 2020 15:53:45 -0400 Subject: RFR(S): 8246493: JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support In-Reply-To: References: <0f906972-f1a4-0cb4-ce83-87dc71fbede8@oracle.com> <558261cb-fa3a-3727-efce-e8bc2284c0ea@oracle.com> <7e8e4839-fb35-b7a2-c989-729d84354c0e@oracle.com> Message-ID: <3e4eeb1f-70b6-1972-2487-a5d4affc0287@oracle.com> Chris, Thanks. One last thing... since this is a test bug, I wasn't planning to backport the fix to JDK15. The test is ProblemListed there so we won't see the intermittent failures. Are you and Serguei good with not fixing this test bug in JDK15? Dan P.S. Thanks again for your sleuthing that linked the bug to my fix for JDK-8153224. On 6/29/20 3:49 PM, Chris Plummer wrote: > Looks good. > > Chris > > On 6/29/20 12:45 PM, Daniel D. Daugherty wrote: >> Chris and Serguei, >> >> Thanks for the fast reviews!! >> >> I generated the webrev in my "mach5" directory and that was baselined >> on the jdk-16+3 snapshot and that doesn't include the ProblemList >> change. >> Sigh... 
I have updated the repo to "current" and regenerated the >> webrev. >> >> test/hotspot/jtreg/ProblemList.txt? now shows: >> >> @@ -126,11 +126,10 @@ >> ?vmTestbase/nsk/monitoring/ThreadMXBean/ThreadInfo/Deadlock/JavaDeadlock001/TestDescription.java >> 8060733 generic-all >> >> ?vmTestbase/nsk/jdi/ThreadReference/stop/stop001/TestDescription.java >> 7034630 generic-all >> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses021/TestDescription.java >> 8065773 generic-all >> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses023/TestDescription.java >> 8065773 generic-all >> -vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >> 8246493 generic-all >> >> ?vmTestbase/nsk/jdb/eval/eval001/eval001.java 8221503 generic-all >> >> ?vmTestbase/metaspace/gc/firstGC_10m/TestDescription.java 8208250 >> generic-all >> ?vmTestbase/metaspace/gc/firstGC_50m/TestDescription.java 8208250 >> generic-all >> >> Thanks again for the fast reviews!! >> >> Dan >> >> >> On 6/29/20 3:41 PM, serguei.spitsyn at oracle.com wrote: >>> Hi Dan, >>> >>> The same as from Chris. >>> The ProblemList.txt has no changes. >>> Otherwise, it looks good. >>> >>> Thanks, >>> Serguei >>> >>> >>> >>> On 6/29/20 12:37, Chris Plummer wrote: >>>> Hi Dan, >>>> >>>> Something is wrong with ProblemList.txt. It doesn't show any >>>> changes, but I also don't see mixed002 in the file anymore. >>>> >>>> Otherwise the changes look good. >>>> >>>> thanks, >>>> >>>> Chris >>>> >>>> On 6/29/20 12:21 PM, Daniel D. Daugherty wrote: >>>>> Greetings, >>>>> >>>>> I have a fix for the following bug: >>>>> >>>>> ??? JDK-8246493 JDI stress/serial/mixed002 needs to use >>>>> WhiteBox.deflateIdleMonitors support >>>>> ??? 
https://bugs.openjdk.java.net/browse/JDK-8246493 >>>>> >>>>> Here's the webrev URL: >>>>> >>>>> http://cr.openjdk.java.net/~dcubed/8246493-webrev/0_for_jdk16/ >>>>> >>>>> The test bug that's being fixed: >>>>> >>>>> vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java fails >>>>> ??? intermittently with the following message: >>>>> >>>>> ???? nsk.share.TestBug: There are more than one(2) instance of >>>>> 'nsk.share.jpda.StateTestThread in debuggee >>>>> >>>>> Summary of the fix: >>>>> >>>>> ??? Use WhiteBox.deflateIdleMonitors() to make sure that all inflated >>>>> ??? ObjectMonitors are deflated after each debuggee has been run. >>>>> >>>>> This fix has been tested with a Mach5 Tier5 test run that executes >>>>> all >>>>> of the JDI tests (along with JDWP, JVM/TI and other Serviceability >>>>> tests). >>>>> I also did five 100 iteration runs of the failing mix002 test. >>>>> Each Mach5 >>>>> job set ran the test 100 times on Linux-X64, macOSX, and Win-X64 >>>>> for a >>>>> total of (5 * 100 * 3) iterations of >>>>> nsk/jdi/stress/serial/mixed002. There >>>>> were no failures. >>>>> >>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>> >>>>> Dan >>>>> >>>>> >>>>> Gory details: >>>>> >>>>> The primary focus of the fix is in the first three files in the >>>>> webrev: >>>>> >>>>> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/SerialExecutionDebuggee.java >>>>> >>>>> test/hotspot/jtreg/vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>>>> >>>>> test/hotspot/jtreg/ProblemList.txt >>>>> >>>>> nsk.share.jdi.SerialExecutionDebuggee is the class that used to >>>>> serially >>>>> execute the debuggee portion of a specific list of tests. After >>>>> this class >>>>> is done executing a debuggee class, it needs to deflate idle >>>>> monitors in >>>>> order to prevent a StateTestThread object created by one debuggee >>>>> class >>>>> from confusing the next debuggee class. 
Each of the debuggee >>>>> classes that >>>>> use StateTestThread expect there to be only one of these objects. >>>>> However, >>>>> since we are running multiple debuggee classes serially *in the >>>>> same VM*, >>>>> the StateTestThread object created in one debuggee can still be >>>>> around >>>>> when the next debuggee runs. >>>>> >>>>> The COMMAND_CLEAR_DEBUGGEE implementation clears the >>>>> currentDebuggee variable >>>>> which permits the debuggee to be GC'ed and is modified by this fix >>>>> to call >>>>> WhiteBox.deflateIdleMonitors() to make sure that all inflated >>>>> ObjectMonitors >>>>> are deflated after each debuggee has been run. This takes care of >>>>> any pinned >>>>> StateTestThread objects (and any other inflated ObjectMonitors). >>>>> >>>>> >>>>> vmTestbase/nsk/jdi/stress/serial/mixed002 is a wrapper style >>>>> stress test that >>>>> executes the debugger and debuggee parts of a specific list of >>>>> tests serially >>>>> *in the same VM*. Several of the tests executed by mixed002 make >>>>> use of the >>>>> StateTestThread class. The failure is intermittent because the >>>>> order of test >>>>> execution is shuffled automatically and sometimes the >>>>> ServiceThread manages >>>>> to execute deflation at the right time to prevent more than one >>>>> StateTestThread >>>>> object from existing at the same time. >>>>> >>>>> The additions to vmTestbase/nsk/jdi/stress/serial/mixed002 are the >>>>> standard >>>>> boilerplate needed to call WhiteBox functions from test code. The >>>>> actual call >>>>> to WhiteBox.deflateIdleMonitors() is made in >>>>> SerialExecutionDebuggee. I did >>>>> attempt a fix where I modified the StateTestThread class to make >>>>> the call to >>>>> WhiteBox.deflateIdleMonitors() after the internal waitOnObject is >>>>> no longer >>>>> contended or waited on. That fix reduced the frequency of the >>>>> failures by >>>>> about half, but it didn't solve the test bug entirely. 
So I had to >>>>> make the >>>>> fix in SerialExecutionDebuggee instead. >>>>> >>>>> >>>>> test/hotspot/jtreg/ProblemList.txt is modified to re-enable the >>>>> mix002 test. >>>>> >>>>> >>>>> The remaining nine files are also wrapper style stress tests that >>>>> execute >>>>> the debugger and debuggee parts of a specific list of tests >>>>> serially *in >>>>> the same VM*. Because these tests also use >>>>> SerialExecutionDebuggee, they >>>>> also need the boilerplate changes so that >>>>> WhiteBox.deflateIdleMonitors() >>>>> can be called. >>>> >>>> >>> >> > > From daniil.x.titov at oracle.com Mon Jun 29 19:54:46 2020 From: daniil.x.titov at oracle.com (Daniil Titov) Date: Mon, 29 Jun 2020 12:54:46 -0700 Subject: RFR: 8205467: javax/management/remote/mandatory/connection/MultiThreadDeadLockTest.java possible deadlock Message-ID: <619D0073-F88B-4196-A286-6B2A2FF12305@oracle.com> Please review a tiny change that adjusts the wait timeout the test uses to account for the "test.timeout.factor" system property. Please note that a trivial merge with the fix for [4], which is currently under review [3], will be required. Since issues [2] and [4] describe different problems, I decided not to combine the two changes into a single fix. Testing: Mach5 tier1-tier3 tests passed successfully. 
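For readers unfamiliar with the property: jtreg exposes its timeout factor to tests as the "test.timeout.factor" system property. The following is only a sketch of how a test might scale a wait by that factor. The class TimeoutScaling and its methods are hypothetical and are not taken from the webrev, which may do this differently.

```java
// Hypothetical helper; names are illustrative and not taken from the webrev.
class TimeoutScaling {
    // Scales a base timeout by a factor given as a string.
    // A missing factor (null) is treated as 1.0, i.e. no scaling.
    static long scale(long baseMillis, String factor) {
        double f = (factor == null) ? 1.0 : Double.parseDouble(factor);
        return Math.round(baseMillis * f);
    }

    // Reads the factor from the "test.timeout.factor" system property.
    static long adjustedTimeout(long baseMillis) {
        return scale(baseMillis, System.getProperty("test.timeout.factor"));
    }

    public static void main(String[] args) {
        // Prints the base value unchanged unless the property is set,
        // e.g. with -Dtest.timeout.factor=4.
        System.out.println(adjustedTimeout(1000));
    }
}
```

Keeping the parsing in scale() separate from the property lookup makes the arithmetic testable without touching global system properties.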
[1] Web rev: https://cr.openjdk.java.net/~dtitov/8205467/webrev.01/ [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8205467 [3] https://mail.openjdk.java.net/pipermail/serviceability-dev/2020-June/032098.html [4] https://bugs.openjdk.java.net/browse/JDK-8227337 Thank you, Daniil From jcbeyler at google.com Mon Jun 29 20:10:07 2020 From: jcbeyler at google.com (Jean Christophe Beyler) Date: Mon, 29 Jun 2020 13:10:07 -0700 Subject: RFR (S) 8247615: Initialize the bytes left for the heap sampler In-Reply-To: References: Message-ID: Hi all, Sorry it took time to get back to this; could I get a new review from: http://cr.openjdk.java.net/~jcbeyler/8247615/webrev.01/ The bug is here: https://bugs.openjdk.java.net/browse/JDK-8247615 Note, this passed the submit repo testing. Thanks and have a great day! Jc P.S. Explicit inlined Acks/Done are below: @Martin: - done the typo - about the sampling test: No, you won't get samples because of how the system works: since we know we will only be allocating one object for the thread, it dies before a sample is required... though adding the change that Man wants might make it more flaky, so I added your numThreads / 2 just in case - done for the "always" in the description On Thu, Jun 25, 2020 at 6:54 PM Derek Thomson wrote: > > It could also avoid the problem where every thread deterministically > allocates the same object at 512K, although this is unlikely. > > I've recently discovered that with certain server frameworks this > actually becomes quite likely! So I'd strongly recommend using > pick_next_sample. > Ack, done :) > > On Thu, Jun 25, 2020 at 4:56 PM Man Cao wrote: > >> Thanks for fixing this! >> >> > ThreadHeapSampler() : _bytes_until_sample(get_sampling_interval()) >> { >> >> Does this work better? (It has to be done after the initialization of >> _rnd.) 
>> _bytes_until_sample = pick_next_sample(); >> >> It could avoid completely failing to sample the first 512K allocation. >> It could also avoid the problem where every thread >> deterministically allocates the same object at 512K, although this is >> unlikely. >> >> -Man >> > Done. > -- Thanks, Jc From chris.plummer at oracle.com Mon Jun 29 20:41:21 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Mon, 29 Jun 2020 13:41:21 -0700 Subject: RFR(S): 8246493: JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support In-Reply-To: <3e4eeb1f-70b6-1972-2487-a5d4affc0287@oracle.com> References: <0f906972-f1a4-0cb4-ce83-87dc71fbede8@oracle.com> <558261cb-fa3a-3727-efce-e8bc2284c0ea@oracle.com> <7e8e4839-fb35-b7a2-c989-729d84354c0e@oracle.com> <3e4eeb1f-70b6-1972-2487-a5d4affc0287@oracle.com> Message-ID: Hi Dan, I think you should push it directly to 15 since it's a new issue. thanks, Chris On 6/29/20 12:53 PM, Daniel D. Daugherty wrote: > Chris, > > Thanks. One last thing... since this is a test bug, I wasn't planning to > backport the fix to JDK15. The test is ProblemListed there so we won't > see > the intermittent failures. > > Are you and Serguei good with not fixing this test bug in JDK15? > > Dan > > P.S. > Thanks again for your sleuthing that linked the bug to my > fix for JDK-8153224. > > > On 6/29/20 3:49 PM, Chris Plummer wrote: >> Looks good. >> >> Chris >> >> On 6/29/20 12:45 PM, Daniel D. Daugherty wrote: >>> Chris and Serguei, >>> >>> Thanks for the fast reviews!! >>> >>> I generated the webrev in my "mach5" directory and that was baselined >>> on the jdk-16+3 snapshot and that doesn't include the ProblemList >>> change. >>> Sigh... I have updated the repo to "current" and regenerated the >>> webrev. >>> >>> test/hotspot/jtreg/ProblemList.txt 
now shows: >>> >>> @@ -126,11 +126,10 @@ >>> ?vmTestbase/nsk/monitoring/ThreadMXBean/ThreadInfo/Deadlock/JavaDeadlock001/TestDescription.java >>> 8060733 generic-all >>> >>> ?vmTestbase/nsk/jdi/ThreadReference/stop/stop001/TestDescription.java >>> 7034630 generic-all >>> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses021/TestDescription.java >>> 8065773 generic-all >>> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses023/TestDescription.java >>> 8065773 generic-all >>> -vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>> 8246493 generic-all >>> >>> ?vmTestbase/nsk/jdb/eval/eval001/eval001.java 8221503 generic-all >>> >>> ?vmTestbase/metaspace/gc/firstGC_10m/TestDescription.java 8208250 >>> generic-all >>> ?vmTestbase/metaspace/gc/firstGC_50m/TestDescription.java 8208250 >>> generic-all >>> >>> Thanks again for the fast reviews!! >>> >>> Dan >>> >>> >>> On 6/29/20 3:41 PM, serguei.spitsyn at oracle.com wrote: >>>> Hi Dan, >>>> >>>> The same as from Chris. >>>> The ProblemList.txt has no changes. >>>> Otherwise, it looks good. >>>> >>>> Thanks, >>>> Serguei >>>> >>>> >>>> >>>> On 6/29/20 12:37, Chris Plummer wrote: >>>>> Hi Dan, >>>>> >>>>> Something is wrong with ProblemList.txt. It doesn't show any >>>>> changes, but I also don't see mixed002 in the file anymore. >>>>> >>>>> Otherwise the changes look good. >>>>> >>>>> thanks, >>>>> >>>>> Chris >>>>> >>>>> On 6/29/20 12:21 PM, Daniel D. Daugherty wrote: >>>>>> Greetings, >>>>>> >>>>>> I have a fix for the following bug: >>>>>> >>>>>> ??? JDK-8246493 JDI stress/serial/mixed002 needs to use >>>>>> WhiteBox.deflateIdleMonitors support >>>>>> ??? https://bugs.openjdk.java.net/browse/JDK-8246493 >>>>>> >>>>>> Here's the webrev URL: >>>>>> >>>>>> http://cr.openjdk.java.net/~dcubed/8246493-webrev/0_for_jdk16/ >>>>>> >>>>>> The test bug that's being fixed: >>>>>> >>>>>> vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java fails >>>>>> ??? 
intermittently with the following message: >>>>>> >>>>>> ???? nsk.share.TestBug: There are more than one(2) instance of >>>>>> 'nsk.share.jpda.StateTestThread in debuggee >>>>>> >>>>>> Summary of the fix: >>>>>> >>>>>> ??? Use WhiteBox.deflateIdleMonitors() to make sure that all >>>>>> inflated >>>>>> ??? ObjectMonitors are deflated after each debuggee has been run. >>>>>> >>>>>> This fix has been tested with a Mach5 Tier5 test run that >>>>>> executes all >>>>>> of the JDI tests (along with JDWP, JVM/TI and other >>>>>> Serviceability tests). >>>>>> I also did five 100 iteration runs of the failing mix002 test. >>>>>> Each Mach5 >>>>>> job set ran the test 100 times on Linux-X64, macOSX, and Win-X64 >>>>>> for a >>>>>> total of (5 * 100 * 3) iterations of >>>>>> nsk/jdi/stress/serial/mixed002. There >>>>>> were no failures. >>>>>> >>>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>> Gory details: >>>>>> >>>>>> The primary focus of the fix is in the first three files in the >>>>>> webrev: >>>>>> >>>>>> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/SerialExecutionDebuggee.java >>>>>> >>>>>> test/hotspot/jtreg/vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>>>>> >>>>>> test/hotspot/jtreg/ProblemList.txt >>>>>> >>>>>> nsk.share.jdi.SerialExecutionDebuggee is the class that used to >>>>>> serially >>>>>> execute the debuggee portion of a specific list of tests. After >>>>>> this class >>>>>> is done executing a debuggee class, it needs to deflate idle >>>>>> monitors in >>>>>> order to prevent a StateTestThread object created by one debuggee >>>>>> class >>>>>> from confusing the next debuggee class. Each of the debuggee >>>>>> classes that >>>>>> use StateTestThread expect there to be only one of these objects. 
>>>>>> However, >>>>>> since we are running multiple debuggee classes serially *in the >>>>>> same VM*, >>>>>> the StateTestThread object created in one debuggee can still be >>>>>> around >>>>>> when the next debuggee runs. >>>>>> >>>>>> The COMMAND_CLEAR_DEBUGGEE implementation clears the >>>>>> currentDebuggee variable >>>>>> which permits the debuggee to be GC'ed and is modified by this >>>>>> fix to call >>>>>> WhiteBox.deflateIdleMonitors() to make sure that all inflated >>>>>> ObjectMonitors >>>>>> are deflated after each debuggee has been run. This takes care of >>>>>> any pinned >>>>>> StateTestThread objects (and any other inflated ObjectMonitors). >>>>>> >>>>>> >>>>>> vmTestbase/nsk/jdi/stress/serial/mixed002 is a wrapper style >>>>>> stress test that >>>>>> executes the debugger and debuggee parts of a specific list of >>>>>> tests serially >>>>>> *in the same VM*. Several of the tests executed by mixed002 make >>>>>> use of the >>>>>> StateTestThread class. The failure is intermittent because the >>>>>> order of test >>>>>> execution is shuffled automatically and sometimes the >>>>>> ServiceThread manages >>>>>> to execute deflation at the right time to prevent more than one >>>>>> StateTestThread >>>>>> object from existing at the same time. >>>>>> >>>>>> The additions to vmTestbase/nsk/jdi/stress/serial/mixed002 are >>>>>> the standard >>>>>> boilerplate needed to call WhiteBox functions from test code. The >>>>>> actual call >>>>>> to WhiteBox.deflateIdleMonitors() is made in >>>>>> SerialExecutionDebuggee. I did >>>>>> attempt a fix where I modified the StateTestThread class to make >>>>>> the call to >>>>>> WhiteBox.deflateIdleMonitors() after the internal waitOnObject is >>>>>> no longer >>>>>> contended or waited on. That fix reduced the frequency of the >>>>>> failures by >>>>>> about half, but it didn't solve the test bug entirely. So I had >>>>>> to make the >>>>>> fix in SerialExecutionDebuggee instead. 
>>>>>> >>>>>> >>>>>> test/hotspot/jtreg/ProblemList.txt is modified to re-enable the >>>>>> mix002 test. >>>>>> >>>>>> >>>>>> The remaining nine files are also wrapper style stress tests that >>>>>> execute >>>>>> the debugger and debuggee parts of a specific list of tests >>>>>> serially *in >>>>>> the same VM*. Because these tests also use >>>>>> SerialExecutionDebuggee, they >>>>>> also need the boilerplate changes so that >>>>>> WhiteBox.deflateIdleMonitors() >>>>>> can be called. >>>>> >>>>> >>>> >>> >> >> > From hohensee at amazon.com Mon Jun 29 20:40:14 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Mon, 29 Jun 2020 20:40:14 +0000 Subject: RFR (S): 8245129: Enhance jstat gc option output and tests In-Reply-To: <65944D3F-353D-42BB-A552-713BCB6331E9@oracle.com> References: <5E73A203-4A05-480B-97F1-C4F3A090B293@amazon.com> <65944D3F-353D-42BB-A552-713BCB6331E9@oracle.com> Message-ID: <6C0B6C5F-D67C-48A6-B35D-69F9E252E151@amazon.com> Thanks, Daniil! Pushed. Paul On 6/24/20, 4:07 PM, "Daniil Titov" wrote: Hi Paul, The change looks good to me. Thanks! --Daniil On 6/22/20, 8:48 AM, "serviceability-dev on behalf of Hohensee, Paul" wrote: Thanks very much for the review, Volker. I'll file a follow-up issue. One more reviewer, please? :) Paul On 6/22/20, 8:10 AM, "serviceability-dev on behalf of Volker Simonis" wrote: Hi Paul, thanks for fixing jstat for larger heaps. I like that you've added explicit tests for ParallelGC, which hasn't been tested since G1 was made the default collector. I also agree that sizes should all be right-justified. I only wonder if the header of a right-justified column shouldn't be right-justified as well? However, taking into account that this wasn't handled consistently before your change either, I'm fine with postponing that to a follow-up cleanup change. I think the change looks good, so thumbs up from me. Thank you and best regards, Volker On Thu, Jun 18, 2020 at 11:53 PM Hohensee, Paul wrote: > > Ping. 
Any takers for this simple patch? > > > > Thanks, > > Paul > > > > From: serviceability-dev on behalf of "Hohensee, Paul" > Date: Monday, May 18, 2020 at 8:25 AM > To: serviceability-dev > Subject: RFR (S): 8245129: Enhance jstat gc option output and tests > > > > Please review an enhancement to the jstat gc option output to make the columns wider (for up to a 2TB heap) so one can read the output without going cross-eyed. > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8245129 > > Webrev: http://cr.openjdk.java.net/~phh/8245129/webrev.00/ > > > > I added tests using ParallelGC since the output can differ for non-G1 collectors. Successfully ran the test/hotspot/jtreg/serviceability/tmtools/jstat and test/jdk/sun/tools/jstat tests. A submit repo run had one failure > > > > runtime/MemberName/MemberNameLeak.java > > tier1 > > macosx-x64-debug > > > > but rerunning it on my laptop succeeded, and there's no connection between this test and my patch. > > > > Thanks, > > Paul > > > > From daniel.daugherty at oracle.com Mon Jun 29 20:53:40 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 29 Jun 2020 16:53:40 -0400 Subject: RFR(S): 8246493: JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support In-Reply-To: References: <0f906972-f1a4-0cb4-ce83-87dc71fbede8@oracle.com> <558261cb-fa3a-3727-efce-e8bc2284c0ea@oracle.com> <7e8e4839-fb35-b7a2-c989-729d84354c0e@oracle.com> <3e4eeb1f-70b6-1972-2487-a5d4affc0287@oracle.com> Message-ID: <78104402-e49c-73a4-f761-8305a68f276e@oracle.com> The WhiteBox.deflateIdleMonitors() support is not in JDK15. That's something that was added in JDK16, so I'd also have to backport that support. That support was included with another change (getting rid of the special deflation request mechanism) that is not appropriate for JDK15. Short version: I don't think we want to backport part of a patch from JDK16 -> JDK15 in order to fix this test bug. 
Dan On 6/29/20 4:41 PM, Chris Plummer wrote: > Hi Dan, > > I think you should push it directly to 15 since it's a new issue. > > thanks, > > Chris > > On 6/29/20 12:53 PM, Daniel D. Daugherty wrote: >> Chris, >> >> Thanks. One last thing... since this is a test bug, I wasn't planning to >> backport the fix to JDK15. The test is ProblemListed there so we >> won't see >> the intermittent failures. >> >> Are you and Serguei good with not fixing this test bug in JDK15? >> >> Dan >> >> P.S. >> Thanks again for your sleuthing that linked the bug to my >> fix for JDK-8153224. >> >> >> On 6/29/20 3:49 PM, Chris Plummer wrote: >>> Looks good. >>> >>> Chris >>> >>> On 6/29/20 12:45 PM, Daniel D. Daugherty wrote: >>>> Chris and Serguei, >>>> >>>> Thanks for the fast reviews!! >>>> >>>> I generated the webrev in my "mach5" directory and that was baselined >>>> on the jdk-16+3 snapshot and that doesn't include the ProblemList >>>> change. >>>> Sigh...? I have updated the repo to "current" and regenerated the >>>> webrev. >>>> >>>> test/hotspot/jtreg/ProblemList.txt? now shows: >>>> >>>> @@ -126,11 +126,10 @@ >>>> ?vmTestbase/nsk/monitoring/ThreadMXBean/ThreadInfo/Deadlock/JavaDeadlock001/TestDescription.java >>>> 8060733 generic-all >>>> >>>> ?vmTestbase/nsk/jdi/ThreadReference/stop/stop001/TestDescription.java >>>> 7034630 generic-all >>>> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses021/TestDescription.java >>>> 8065773 generic-all >>>> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses023/TestDescription.java >>>> 8065773 generic-all >>>> -vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>>> 8246493 generic-all >>>> >>>> ?vmTestbase/nsk/jdb/eval/eval001/eval001.java 8221503 generic-all >>>> >>>> ?vmTestbase/metaspace/gc/firstGC_10m/TestDescription.java 8208250 >>>> generic-all >>>> ?vmTestbase/metaspace/gc/firstGC_50m/TestDescription.java 8208250 >>>> generic-all >>>> >>>> Thanks again for the fast reviews!! 
>>>> >>>> Dan >>>> >>>> >>>> On 6/29/20 3:41 PM, serguei.spitsyn at oracle.com wrote: >>>>> Hi Dan, >>>>> >>>>> The same as from Chris. >>>>> The ProblemList.txt has no changes. >>>>> Otherwise, it looks good. >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>> >>>>> >>>>> On 6/29/20 12:37, Chris Plummer wrote: >>>>>> Hi Dan, >>>>>> >>>>>> Something is wrong with ProblemList.txt. It doesn't show any >>>>>> changes, but I also don't see mixed002 in the file anymore. >>>>>> >>>>>> Otherwise the changes look good. >>>>>> >>>>>> thanks, >>>>>> >>>>>> Chris >>>>>> >>>>>> On 6/29/20 12:21 PM, Daniel D. Daugherty wrote: >>>>>>> Greetings, >>>>>>> >>>>>>> I have a fix for the following bug: >>>>>>> >>>>>>> JDK-8246493 JDI stress/serial/mixed002 needs to use >>>>>>> WhiteBox.deflateIdleMonitors support >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246493 >>>>>>> >>>>>>> Here's the webrev URL: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~dcubed/8246493-webrev/0_for_jdk16/ >>>>>>> >>>>>>> The test bug that's being fixed: >>>>>>> >>>>>>> vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>>>>>> fails >>>>>>> intermittently with the following message: >>>>>>> >>>>>>> nsk.share.TestBug: There are more than one(2) instance of >>>>>>> 'nsk.share.jpda.StateTestThread in debuggee >>>>>>> >>>>>>> Summary of the fix: >>>>>>> >>>>>>> Use WhiteBox.deflateIdleMonitors() to make sure that all >>>>>>> inflated >>>>>>> ObjectMonitors are deflated after each debuggee has been run. >>>>>>> >>>>>>> This fix has been tested with a Mach5 Tier5 test run that >>>>>>> executes all >>>>>>> of the JDI tests (along with JDWP, JVM/TI and other >>>>>>> Serviceability tests). >>>>>>> I also did five 100 iteration runs of the failing mixed002 test. >>>>>>> Each Mach5 >>>>>>> job set ran the test 100 times on Linux-X64, macOSX, and Win-X64 >>>>>>> for a >>>>>>> total of (5 * 100 * 3) iterations of >>>>>>> nsk/jdi/stress/serial/mixed002. There >>>>>>> were no failures.
>>>>>>> >>>>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>>>> >>>>>>> Dan >>>>>>> >>>>>>> >>>>>>> Gory details: >>>>>>> >>>>>>> The primary focus of the fix is in the first three files in the >>>>>>> webrev: >>>>>>> >>>>>>> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/SerialExecutionDebuggee.java >>>>>>> >>>>>>> test/hotspot/jtreg/vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>>>>>> >>>>>>> test/hotspot/jtreg/ProblemList.txt >>>>>>> >>>>>>> nsk.share.jdi.SerialExecutionDebuggee is the class that is used to >>>>>>> serially >>>>>>> execute the debuggee portion of a specific list of tests. After >>>>>>> this class >>>>>>> is done executing a debuggee class, it needs to deflate idle >>>>>>> monitors in >>>>>>> order to prevent a StateTestThread object created by one >>>>>>> debuggee class >>>>>>> from confusing the next debuggee class. Each of the debuggee >>>>>>> classes that >>>>>>> use StateTestThread expects there to be only one of these >>>>>>> objects. However, >>>>>>> since we are running multiple debuggee classes serially *in the >>>>>>> same VM*, >>>>>>> the StateTestThread object created in one debuggee can still be >>>>>>> around >>>>>>> when the next debuggee runs. >>>>>>> >>>>>>> The COMMAND_CLEAR_DEBUGGEE implementation clears the >>>>>>> currentDebuggee variable, >>>>>>> which permits the debuggee to be GC'ed, and is modified by this >>>>>>> fix to call >>>>>>> WhiteBox.deflateIdleMonitors() to make sure that all inflated >>>>>>> ObjectMonitors >>>>>>> are deflated after each debuggee has been run. This takes care >>>>>>> of any pinned >>>>>>> StateTestThread objects (and any other inflated ObjectMonitors). >>>>>>> >>>>>>> >>>>>>> vmTestbase/nsk/jdi/stress/serial/mixed002 is a wrapper-style >>>>>>> stress test that >>>>>>> executes the debugger and debuggee parts of a specific list of >>>>>>> tests serially >>>>>>> *in the same VM*.
Several of the tests executed by mixed002 make >>>>>>> use of the >>>>>>> StateTestThread class. The failure is intermittent because the >>>>>>> order of test >>>>>>> execution is shuffled automatically and sometimes the >>>>>>> ServiceThread manages >>>>>>> to execute deflation at the right time to prevent more than one >>>>>>> StateTestThread >>>>>>> object from existing at the same time. >>>>>>> >>>>>>> The additions to vmTestbase/nsk/jdi/stress/serial/mixed002 are >>>>>>> the standard >>>>>>> boilerplate needed to call WhiteBox functions from test code. >>>>>>> The actual call >>>>>>> to WhiteBox.deflateIdleMonitors() is made in >>>>>>> SerialExecutionDebuggee. I did >>>>>>> attempt a fix where I modified the StateTestThread class to make >>>>>>> the call to >>>>>>> WhiteBox.deflateIdleMonitors() after the internal waitOnObject >>>>>>> is no longer >>>>>>> contended or waited on. That fix reduced the frequency of the >>>>>>> failures by >>>>>>> about half, but it didn't solve the test bug entirely. So I had >>>>>>> to make the >>>>>>> fix in SerialExecutionDebuggee instead. >>>>>>> >>>>>>> >>>>>>> test/hotspot/jtreg/ProblemList.txt is modified to re-enable the >>>>>>> mix002 test. >>>>>>> >>>>>>> >>>>>>> The remaining nine files are also wrapper style stress tests >>>>>>> that execute >>>>>>> the debugger and debuggee parts of a specific list of tests >>>>>>> serially *in >>>>>>> the same VM*. Because these tests also use >>>>>>> SerialExecutionDebuggee, they >>>>>>> also need the boilerplate changes so that >>>>>>> WhiteBox.deflateIdleMonitors() >>>>>>> can be called. 
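[Editor's illustration] The pattern described in the "Gory details" above (clear the reference to the finished debuggee, then force monitor deflation before the next debuggee starts) can be sketched roughly as follows. This is a minimal sketch, not the code from the webrev: `SerialRunnerSketch`, `MonitorDeflater`, `runAll`, `clearDebuggee` and `log` are hypothetical names, and the `MonitorDeflater` interface is only a stand-in for the real WhiteBox.deflateIdleMonitors() call so that the example runs on a plain JDK without the WhiteBox test library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a serial debuggee runner; not the actual SerialExecutionDebuggee. */
public class SerialRunnerSketch {
    /** Stand-in (assumption) for the WhiteBox.deflateIdleMonitors() hook. */
    public interface MonitorDeflater {
        void deflateIdleMonitors();
    }

    private final MonitorDeflater deflater;
    private Runnable currentDebuggee;
    /** Records the order of events so the behavior can be checked. */
    public final List<String> log = new ArrayList<>();

    public SerialRunnerSketch(MonitorDeflater deflater) {
        this.deflater = deflater;
    }

    /** Runs each debuggee in turn in the same VM, cleaning up after every one. */
    public void runAll(List<Runnable> debuggees) {
        for (Runnable debuggee : debuggees) {
            currentDebuggee = debuggee;
            log.add("run");
            currentDebuggee.run();
            clearDebuggee();
        }
    }

    /** Mirrors the idea behind COMMAND_CLEAR_DEBUGGEE: drop the reference, then deflate. */
    public void clearDebuggee() {
        currentDebuggee = null;          // lets the finished debuggee become unreachable
        deflater.deflateIdleMonitors();  // in the real test this would be the WhiteBox call
        log.add("deflate");
    }
}
```

The point of doing the cleanup in the runner rather than in each debuggee is that every wrapper-style test sharing the runner gets the same behavior, which matches the thread's observation that a fix inside StateTestThread alone only halved the failure rate.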
>>>>>> >>>>>> >>>>> >>>> >>> >>> >> > > From chris.plummer at oracle.com Mon Jun 29 21:02:37 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Mon, 29 Jun 2020 14:02:37 -0700 Subject: RFR(S): 8246493: JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support In-Reply-To: <78104402-e49c-73a4-f761-8305a68f276e@oracle.com> References: <0f906972-f1a4-0cb4-ce83-87dc71fbede8@oracle.com> <558261cb-fa3a-3727-efce-e8bc2284c0ea@oracle.com> <7e8e4839-fb35-b7a2-c989-729d84354c0e@oracle.com> <3e4eeb1f-70b6-1972-2487-a5d4affc0287@oracle.com> <78104402-e49c-73a4-f761-8305a68f276e@oracle.com> Message-ID: Ok, in that case it sounds best not to backport. It would be best to make this clear in the bug so there is no future attempt to backport this change except to versions that have already done the WhiteBox.deflateIdleMonitors() backport. thanks, Chris On 6/29/20 1:53 PM, Daniel D. Daugherty wrote: > The WhiteBox.deflateIdleMonitors() support is not in JDK15. That's > something that added in JDK16 so I'd have to also backport that support. > That support was included with another change (getting rid of the special > deflation request mechanism) that is not appropriate for JDK15. > > Short version: I don't think we want to back port part of a patch from > JDK16 -> JDK15 in order to fix this test bug. > > Dan > > > On 6/29/20 4:41 PM, Chris Plummer wrote: >> Hi Dan, >> >> I think you should push it directly to 15 since it's a new issue. >> >> thanks, >> >> Chris >> >> On 6/29/20 12:53 PM, Daniel D. Daugherty wrote: >>> Chris, >>> >>> Thanks. One last thing... since this is a test bug, I wasn't >>> planning to >>> backport the fix to JDK15. The test is ProblemListed there so we >>> won't see >>> the intermittent failures. >>> >>> Are you and Serguei good with not fixing this test bug in JDK15? >>> >>> Dan >>> >>> P.S. >>> Thanks again for your sleuthing that linked the bug to my >>> fix for JDK-8153224. 
>>> >>> >>> On 6/29/20 3:49 PM, Chris Plummer wrote: >>>> Looks good. >>>> >>>> Chris >>>> >>>> On 6/29/20 12:45 PM, Daniel D. Daugherty wrote: >>>>> Chris and Serguei, >>>>> >>>>> Thanks for the fast reviews!! >>>>> >>>>> I generated the webrev in my "mach5" directory and that was baselined >>>>> on the jdk-16+3 snapshot and that doesn't include the ProblemList >>>>> change. >>>>> Sigh...? I have updated the repo to "current" and regenerated the >>>>> webrev. >>>>> >>>>> test/hotspot/jtreg/ProblemList.txt? now shows: >>>>> >>>>> @@ -126,11 +126,10 @@ >>>>> ?vmTestbase/nsk/monitoring/ThreadMXBean/ThreadInfo/Deadlock/JavaDeadlock001/TestDescription.java >>>>> 8060733 generic-all >>>>> >>>>> ?vmTestbase/nsk/jdi/ThreadReference/stop/stop001/TestDescription.java >>>>> 7034630 generic-all >>>>> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses021/TestDescription.java >>>>> 8065773 generic-all >>>>> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses023/TestDescription.java >>>>> 8065773 generic-all >>>>> -vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>>>> 8246493 generic-all >>>>> >>>>> ?vmTestbase/nsk/jdb/eval/eval001/eval001.java 8221503 generic-all >>>>> >>>>> ?vmTestbase/metaspace/gc/firstGC_10m/TestDescription.java 8208250 >>>>> generic-all >>>>> ?vmTestbase/metaspace/gc/firstGC_50m/TestDescription.java 8208250 >>>>> generic-all >>>>> >>>>> Thanks again for the fast reviews!! >>>>> >>>>> Dan >>>>> >>>>> >>>>> On 6/29/20 3:41 PM, serguei.spitsyn at oracle.com wrote: >>>>>> Hi Dan, >>>>>> >>>>>> The same as from Chris. >>>>>> The ProblemList.txt has no changes. >>>>>> Otherwise, it looks good. >>>>>> >>>>>> Thanks, >>>>>> Serguei >>>>>> >>>>>> >>>>>> >>>>>> On 6/29/20 12:37, Chris Plummer wrote: >>>>>>> Hi Dan, >>>>>>> >>>>>>> Something is wrong with ProblemList.txt. It doesn't show any >>>>>>> changes, but I also don't see mixed002 in the file anymore. >>>>>>> >>>>>>> Otherwise the changes look good. 
>>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> Chris >>>>>>> >>>>>>> On 6/29/20 12:21 PM, Daniel D. Daugherty wrote: >>>>>>>> Greetings, >>>>>>>> >>>>>>>> I have a fix for the following bug: >>>>>>>> >>>>>>>> ??? JDK-8246493 JDI stress/serial/mixed002 needs to use >>>>>>>> WhiteBox.deflateIdleMonitors support >>>>>>>> ??? https://bugs.openjdk.java.net/browse/JDK-8246493 >>>>>>>> >>>>>>>> Here's the webrev URL: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~dcubed/8246493-webrev/0_for_jdk16/ >>>>>>>> >>>>>>>> The test bug that's being fixed: >>>>>>>> >>>>>>>> vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>>>>>>> fails >>>>>>>> ??? intermittently with the following message: >>>>>>>> >>>>>>>> ???? nsk.share.TestBug: There are more than one(2) instance of >>>>>>>> 'nsk.share.jpda.StateTestThread in debuggee >>>>>>>> >>>>>>>> Summary of the fix: >>>>>>>> >>>>>>>> ??? Use WhiteBox.deflateIdleMonitors() to make sure that all >>>>>>>> inflated >>>>>>>> ??? ObjectMonitors are deflated after each debuggee has been run. >>>>>>>> >>>>>>>> This fix has been tested with a Mach5 Tier5 test run that >>>>>>>> executes all >>>>>>>> of the JDI tests (along with JDWP, JVM/TI and other >>>>>>>> Serviceability tests). >>>>>>>> I also did five 100 iteration runs of the failing mix002 test. >>>>>>>> Each Mach5 >>>>>>>> job set ran the test 100 times on Linux-X64, macOSX, and >>>>>>>> Win-X64 for a >>>>>>>> total of (5 * 100 * 3) iterations of >>>>>>>> nsk/jdi/stress/serial/mixed002. There >>>>>>>> were no failures. >>>>>>>> >>>>>>>> Thanks, in advance, for any comments, questions or suggestions. 
>>>>>>>> >>>>>>>> Dan >>>>>>>> >>>>>>>> >>>>>>>> Gory details: >>>>>>>> >>>>>>>> The primary focus of the fix is in the first three files in the >>>>>>>> webrev: >>>>>>>> >>>>>>>> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/SerialExecutionDebuggee.java >>>>>>>> >>>>>>>> test/hotspot/jtreg/vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>>>>>>> >>>>>>>> test/hotspot/jtreg/ProblemList.txt >>>>>>>> >>>>>>>> nsk.share.jdi.SerialExecutionDebuggee is the class that used to >>>>>>>> serially >>>>>>>> execute the debuggee portion of a specific list of tests. After >>>>>>>> this class >>>>>>>> is done executing a debuggee class, it needs to deflate idle >>>>>>>> monitors in >>>>>>>> order to prevent a StateTestThread object created by one >>>>>>>> debuggee class >>>>>>>> from confusing the next debuggee class. Each of the debuggee >>>>>>>> classes that >>>>>>>> use StateTestThread expect there to be only one of these >>>>>>>> objects. However, >>>>>>>> since we are running multiple debuggee classes serially *in the >>>>>>>> same VM*, >>>>>>>> the StateTestThread object created in one debuggee can still be >>>>>>>> around >>>>>>>> when the next debuggee runs. >>>>>>>> >>>>>>>> The COMMAND_CLEAR_DEBUGGEE implementation clears the >>>>>>>> currentDebuggee variable >>>>>>>> which permits the debuggee to be GC'ed and is modified by this >>>>>>>> fix to call >>>>>>>> WhiteBox.deflateIdleMonitors() to make sure that all inflated >>>>>>>> ObjectMonitors >>>>>>>> are deflated after each debuggee has been run. This takes care >>>>>>>> of any pinned >>>>>>>> StateTestThread objects (and any other inflated ObjectMonitors). >>>>>>>> >>>>>>>> >>>>>>>> vmTestbase/nsk/jdi/stress/serial/mixed002 is a wrapper style >>>>>>>> stress test that >>>>>>>> executes the debugger and debuggee parts of a specific list of >>>>>>>> tests serially >>>>>>>> *in the same VM*. Several of the tests executed by mixed002 >>>>>>>> make use of the >>>>>>>> StateTestThread class. 
The failure is intermittent because the >>>>>>>> order of test >>>>>>>> execution is shuffled automatically and sometimes the >>>>>>>> ServiceThread manages >>>>>>>> to execute deflation at the right time to prevent more than one >>>>>>>> StateTestThread >>>>>>>> object from existing at the same time. >>>>>>>> >>>>>>>> The additions to vmTestbase/nsk/jdi/stress/serial/mixed002 are >>>>>>>> the standard >>>>>>>> boilerplate needed to call WhiteBox functions from test code. >>>>>>>> The actual call >>>>>>>> to WhiteBox.deflateIdleMonitors() is made in >>>>>>>> SerialExecutionDebuggee. I did >>>>>>>> attempt a fix where I modified the StateTestThread class to >>>>>>>> make the call to >>>>>>>> WhiteBox.deflateIdleMonitors() after the internal waitOnObject >>>>>>>> is no longer >>>>>>>> contended or waited on. That fix reduced the frequency of the >>>>>>>> failures by >>>>>>>> about half, but it didn't solve the test bug entirely. So I had >>>>>>>> to make the >>>>>>>> fix in SerialExecutionDebuggee instead. >>>>>>>> >>>>>>>> >>>>>>>> test/hotspot/jtreg/ProblemList.txt is modified to re-enable the >>>>>>>> mix002 test. >>>>>>>> >>>>>>>> >>>>>>>> The remaining nine files are also wrapper style stress tests >>>>>>>> that execute >>>>>>>> the debugger and debuggee parts of a specific list of tests >>>>>>>> serially *in >>>>>>>> the same VM*. Because these tests also use >>>>>>>> SerialExecutionDebuggee, they >>>>>>>> also need the boilerplate changes so that >>>>>>>> WhiteBox.deflateIdleMonitors() >>>>>>>> can be called. 
>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> > From serguei.spitsyn at oracle.com Mon Jun 29 21:16:52 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 29 Jun 2020 14:16:52 -0700 Subject: RFR(S): 8246493: JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support In-Reply-To: References: <0f906972-f1a4-0cb4-ce83-87dc71fbede8@oracle.com> <558261cb-fa3a-3727-efce-e8bc2284c0ea@oracle.com> <7e8e4839-fb35-b7a2-c989-729d84354c0e@oracle.com> Message-ID: <83aa0144-b78e-2236-d481-89420928cb6b@oracle.com> +1 Thanks, Serguei On 6/29/20 12:49, Chris Plummer wrote: > Looks good. > > Chris > > On 6/29/20 12:45 PM, Daniel D. Daugherty wrote: >> Chris and Serguei, >> >> Thanks for the fast reviews!! >> >> I generated the webrev in my "mach5" directory and that was baselined >> on the jdk-16+3 snapshot and that doesn't include the ProblemList >> change. >> Sigh...? I have updated the repo to "current" and regenerated the >> webrev. >> >> test/hotspot/jtreg/ProblemList.txt? now shows: >> >> @@ -126,11 +126,10 @@ >> ?vmTestbase/nsk/monitoring/ThreadMXBean/ThreadInfo/Deadlock/JavaDeadlock001/TestDescription.java >> 8060733 generic-all >> >> ?vmTestbase/nsk/jdi/ThreadReference/stop/stop001/TestDescription.java >> 7034630 generic-all >> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses021/TestDescription.java >> 8065773 generic-all >> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses023/TestDescription.java >> 8065773 generic-all >> -vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >> 8246493 generic-all >> >> ?vmTestbase/nsk/jdb/eval/eval001/eval001.java 8221503 generic-all >> >> ?vmTestbase/metaspace/gc/firstGC_10m/TestDescription.java 8208250 >> generic-all >> ?vmTestbase/metaspace/gc/firstGC_50m/TestDescription.java 8208250 >> generic-all >> >> Thanks again for the fast reviews!! 
>> >> Dan >> >> >> On 6/29/20 3:41 PM, serguei.spitsyn at oracle.com wrote: >>> Hi Dan, >>> >>> The same as from Chris. >>> The ProblemList.txt has no changes. >>> Otherwise, it looks good. >>> >>> Thanks, >>> Serguei >>> >>> >>> >>> On 6/29/20 12:37, Chris Plummer wrote: >>>> Hi Dan, >>>> >>>> Something is wrong with ProblemList.txt. It doesn't show any >>>> changes, but I also don't see mixed002 in the file anymore. >>>> >>>> Otherwise the changes look good. >>>> >>>> thanks, >>>> >>>> Chris >>>> >>>> On 6/29/20 12:21 PM, Daniel D. Daugherty wrote: >>>>> Greetings, >>>>> >>>>> I have a fix for the following bug: >>>>> >>>>> ??? JDK-8246493 JDI stress/serial/mixed002 needs to use >>>>> WhiteBox.deflateIdleMonitors support >>>>> ??? https://bugs.openjdk.java.net/browse/JDK-8246493 >>>>> >>>>> Here's the webrev URL: >>>>> >>>>> http://cr.openjdk.java.net/~dcubed/8246493-webrev/0_for_jdk16/ >>>>> >>>>> The test bug that's being fixed: >>>>> >>>>> vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java fails >>>>> ??? intermittently with the following message: >>>>> >>>>> ???? nsk.share.TestBug: There are more than one(2) instance of >>>>> 'nsk.share.jpda.StateTestThread in debuggee >>>>> >>>>> Summary of the fix: >>>>> >>>>> ??? Use WhiteBox.deflateIdleMonitors() to make sure that all inflated >>>>> ??? ObjectMonitors are deflated after each debuggee has been run. >>>>> >>>>> This fix has been tested with a Mach5 Tier5 test run that executes >>>>> all >>>>> of the JDI tests (along with JDWP, JVM/TI and other Serviceability >>>>> tests). >>>>> I also did five 100 iteration runs of the failing mix002 test. >>>>> Each Mach5 >>>>> job set ran the test 100 times on Linux-X64, macOSX, and Win-X64 >>>>> for a >>>>> total of (5 * 100 * 3) iterations of >>>>> nsk/jdi/stress/serial/mixed002. There >>>>> were no failures. >>>>> >>>>> Thanks, in advance, for any comments, questions or suggestions. 
>>>>> >>>>> Dan >>>>> >>>>> >>>>> Gory details: >>>>> >>>>> The primary focus of the fix is in the first three files in the >>>>> webrev: >>>>> >>>>> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/SerialExecutionDebuggee.java >>>>> >>>>> test/hotspot/jtreg/vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>>>> >>>>> test/hotspot/jtreg/ProblemList.txt >>>>> >>>>> nsk.share.jdi.SerialExecutionDebuggee is the class that used to >>>>> serially >>>>> execute the debuggee portion of a specific list of tests. After >>>>> this class >>>>> is done executing a debuggee class, it needs to deflate idle >>>>> monitors in >>>>> order to prevent a StateTestThread object created by one debuggee >>>>> class >>>>> from confusing the next debuggee class. Each of the debuggee >>>>> classes that >>>>> use StateTestThread expect there to be only one of these objects. >>>>> However, >>>>> since we are running multiple debuggee classes serially *in the >>>>> same VM*, >>>>> the StateTestThread object created in one debuggee can still be >>>>> around >>>>> when the next debuggee runs. >>>>> >>>>> The COMMAND_CLEAR_DEBUGGEE implementation clears the >>>>> currentDebuggee variable >>>>> which permits the debuggee to be GC'ed and is modified by this fix >>>>> to call >>>>> WhiteBox.deflateIdleMonitors() to make sure that all inflated >>>>> ObjectMonitors >>>>> are deflated after each debuggee has been run. This takes care of >>>>> any pinned >>>>> StateTestThread objects (and any other inflated ObjectMonitors). >>>>> >>>>> >>>>> vmTestbase/nsk/jdi/stress/serial/mixed002 is a wrapper style >>>>> stress test that >>>>> executes the debugger and debuggee parts of a specific list of >>>>> tests serially >>>>> *in the same VM*. Several of the tests executed by mixed002 make >>>>> use of the >>>>> StateTestThread class. 
The failure is intermittent because the >>>>> order of test >>>>> execution is shuffled automatically and sometimes the >>>>> ServiceThread manages >>>>> to execute deflation at the right time to prevent more than one >>>>> StateTestThread >>>>> object from existing at the same time. >>>>> >>>>> The additions to vmTestbase/nsk/jdi/stress/serial/mixed002 are the >>>>> standard >>>>> boilerplate needed to call WhiteBox functions from test code. The >>>>> actual call >>>>> to WhiteBox.deflateIdleMonitors() is made in >>>>> SerialExecutionDebuggee. I did >>>>> attempt a fix where I modified the StateTestThread class to make >>>>> the call to >>>>> WhiteBox.deflateIdleMonitors() after the internal waitOnObject is >>>>> no longer >>>>> contended or waited on. That fix reduced the frequency of the >>>>> failures by >>>>> about half, but it didn't solve the test bug entirely. So I had to >>>>> make the >>>>> fix in SerialExecutionDebuggee instead. >>>>> >>>>> >>>>> test/hotspot/jtreg/ProblemList.txt is modified to re-enable the >>>>> mix002 test. >>>>> >>>>> >>>>> The remaining nine files are also wrapper style stress tests that >>>>> execute >>>>> the debugger and debuggee parts of a specific list of tests >>>>> serially *in >>>>> the same VM*. Because these tests also use >>>>> SerialExecutionDebuggee, they >>>>> also need the boilerplate changes so that >>>>> WhiteBox.deflateIdleMonitors() >>>>> can be called. >>>> >>>> >>> >> > > From manc at google.com Mon Jun 29 21:27:54 2020 From: manc at google.com (Man Cao) Date: Mon, 29 Jun 2020 14:27:54 -0700 Subject: RFR (S) 8247615: Initialize the bytes left for the heap sampler In-Reply-To: References: Message-ID: Looks good. > though adding the change that Man wants might make it more flaky so I added your numThreads / 2 in case I don't see the "numThreads / 2" in webrev.01 though. No need for a webrev for this fix. 
-Man On Mon, Jun 29, 2020 at 1:10 PM Jean Christophe Beyler wrote: > Hi all, > > Sorry it took time to get back to this; could I get a new review from: > http://cr.openjdk.java.net/~jcbeyler/8247615/webrev.01/ > > The bug is here: > https://bugs.openjdk.java.net/browse/JDK-8247615 > > Note, this passed the submit repo testing. > > Thanks and have a great day! > Jc > > Ps: explicit inlined Acks/Done are below: > > Sorry it took time to get back to this: > @Martin: > - done the typo > - about the sampling test: No you won't get samples due to how the > system is done, since we know we only will be allocating one object for the > thread, it dies out before a sample is required... though adding the change > that Man wants might make it more flaky so I added your numThreads / 2 in > case > - done for the always in the description > > > On Thu, Jun 25, 2020 at 6:54 PM Derek Thomson wrote: > >> > It could also avoid the problem where every thread deterministically >> allocates the same object at 512K, although this is unlikely. >> >> I've recently discovered that with certain server frameworks that this >> actually becomes quite likely! So I'd strongly recommend using >> pick_next_sample. >> > > Ack, done :) > > >> >> On Thu, Jun 25, 2020 at 4:56 PM Man Cao wrote: >> >>> Thanks for fixing this! >>> >>> > 53 ThreadHeapSampler() : >>> _bytes_until_sample(get_sampling_interval()) { >>> >>> Does this work better? (It has to be done after the initialization of >>> _rnd.) >>> _bytes_until_sample = pick_next_sample(); >>> >>> It could avoid completely missing to sample the first 512K allocation. >>> It could also avoid the problem where every thread >>> >> > Done. > > > >> deterministically allocates the same object at 512K, although this is >>> unlikely. >>> >>> -Man >>> >> > > -- > > Thanks, > Jc >
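[Editor's illustration] The suggestion discussed in this message, initializing the per-thread countdown from a randomized draw instead of the fixed sampling interval, can be sketched in Java as below. This is only a sketch under stated assumptions: the real ThreadHeapSampler is HotSpot C++ code, the class and method names here (`HeapSamplerSketch`, `pickNextSample`, `recordAllocation`) are hypothetical, and the exponential draw is an assumed distribution chosen for illustration rather than a copy of what pick_next_sample() actually computes.

```java
import java.util.Random;

/** Hypothetical per-thread heap sampler sketch; not HotSpot code. */
public class HeapSamplerSketch {
    /** Mean sampling interval in bytes (512K, the figure mentioned in the thread). */
    public static final long SAMPLING_INTERVAL = 512 * 1024;

    private final Random rnd;
    private long bytesUntilSample;

    public HeapSamplerSketch(long seed) {
        // The random source must exist before the first draw, echoing the remark
        // that the assignment has to happen after _rnd is initialized.
        this.rnd = new Random(seed);
        // Randomized first countdown, so threads do not all sample at the same
        // allocation offset.
        this.bytesUntilSample = pickNextSample();
    }

    /** Draws a countdown of at least 1 byte with mean close to SAMPLING_INTERVAL (assumed exponential). */
    public long pickNextSample() {
        double u = 1.0 - rnd.nextDouble(); // in (0, 1], so log(u) is finite
        return (long) (-Math.log(u) * SAMPLING_INTERVAL) + 1;
    }

    /** Accounts for an allocation; returns true if this allocation should be sampled. */
    public boolean recordAllocation(long bytes) {
        bytesUntilSample -= bytes;
        if (bytesUntilSample > 0) {
            return false;
        }
        bytesUntilSample = pickNextSample();
        return true;
    }

    public long bytesUntilSample() {
        return bytesUntilSample;
    }
}
```

With a fixed initial value, two threads that each allocate the same objects in the same order would reach their first sample at the same object; with the randomized draw, different seeds give different first countdowns.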
From daniel.daugherty at oracle.com Mon Jun 29 21:27:56 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 29 Jun 2020 17:27:56 -0400 Subject: RFR(S): 8246493: JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support In-Reply-To: <83aa0144-b78e-2236-d481-89420928cb6b@oracle.com> References: <0f906972-f1a4-0cb4-ce83-87dc71fbede8@oracle.com> <558261cb-fa3a-3727-efce-e8bc2284c0ea@oracle.com> <7e8e4839-fb35-b7a2-c989-729d84354c0e@oracle.com> <83aa0144-b78e-2236-d481-89420928cb6b@oracle.com> Message-ID: <971461a0-b001-d5fd-8d2d-b807f04e5cbb@oracle.com> Thanks for the review! Dan On 6/29/20 5:16 PM, serguei.spitsyn at oracle.com wrote: > +1 > > Thanks, > Serguei > > > On 6/29/20 12:49, Chris Plummer wrote: >> Looks good. >> >> Chris >> >> On 6/29/20 12:45 PM, Daniel D. Daugherty wrote: >>> Chris and Serguei, >>> >>> Thanks for the fast reviews!! >>> >>> I generated the webrev in my "mach5" directory and that was baselined >>> on the jdk-16+3 snapshot and that doesn't include the ProblemList >>> change. >>> Sigh... I have updated the repo to "current" and regenerated the >>> webrev. >>> >>> test/hotspot/jtreg/ProblemList.txt
now shows: >>> >>> @@ -126,11 +126,10 @@ >>> ?vmTestbase/nsk/monitoring/ThreadMXBean/ThreadInfo/Deadlock/JavaDeadlock001/TestDescription.java >>> 8060733 generic-all >>> >>> ?vmTestbase/nsk/jdi/ThreadReference/stop/stop001/TestDescription.java >>> 7034630 generic-all >>> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses021/TestDescription.java >>> 8065773 generic-all >>> ?vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses023/TestDescription.java >>> 8065773 generic-all >>> -vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>> 8246493 generic-all >>> >>> ?vmTestbase/nsk/jdb/eval/eval001/eval001.java 8221503 generic-all >>> >>> ?vmTestbase/metaspace/gc/firstGC_10m/TestDescription.java 8208250 >>> generic-all >>> ?vmTestbase/metaspace/gc/firstGC_50m/TestDescription.java 8208250 >>> generic-all >>> >>> Thanks again for the fast reviews!! >>> >>> Dan >>> >>> >>> On 6/29/20 3:41 PM, serguei.spitsyn at oracle.com wrote: >>>> Hi Dan, >>>> >>>> The same as from Chris. >>>> The ProblemList.txt has no changes. >>>> Otherwise, it looks good. >>>> >>>> Thanks, >>>> Serguei >>>> >>>> >>>> >>>> On 6/29/20 12:37, Chris Plummer wrote: >>>>> Hi Dan, >>>>> >>>>> Something is wrong with ProblemList.txt. It doesn't show any >>>>> changes, but I also don't see mixed002 in the file anymore. >>>>> >>>>> Otherwise the changes look good. >>>>> >>>>> thanks, >>>>> >>>>> Chris >>>>> >>>>> On 6/29/20 12:21 PM, Daniel D. Daugherty wrote: >>>>>> Greetings, >>>>>> >>>>>> I have a fix for the following bug: >>>>>> >>>>>> ??? JDK-8246493 JDI stress/serial/mixed002 needs to use >>>>>> WhiteBox.deflateIdleMonitors support >>>>>> ??? https://bugs.openjdk.java.net/browse/JDK-8246493 >>>>>> >>>>>> Here's the webrev URL: >>>>>> >>>>>> http://cr.openjdk.java.net/~dcubed/8246493-webrev/0_for_jdk16/ >>>>>> >>>>>> The test bug that's being fixed: >>>>>> >>>>>> vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java fails >>>>>> ??? 
intermittently with the following message: >>>>>> >>>>>> ???? nsk.share.TestBug: There are more than one(2) instance of >>>>>> 'nsk.share.jpda.StateTestThread in debuggee >>>>>> >>>>>> Summary of the fix: >>>>>> >>>>>> ??? Use WhiteBox.deflateIdleMonitors() to make sure that all >>>>>> inflated >>>>>> ??? ObjectMonitors are deflated after each debuggee has been run. >>>>>> >>>>>> This fix has been tested with a Mach5 Tier5 test run that >>>>>> executes all >>>>>> of the JDI tests (along with JDWP, JVM/TI and other >>>>>> Serviceability tests). >>>>>> I also did five 100 iteration runs of the failing mix002 test. >>>>>> Each Mach5 >>>>>> job set ran the test 100 times on Linux-X64, macOSX, and Win-X64 >>>>>> for a >>>>>> total of (5 * 100 * 3) iterations of >>>>>> nsk/jdi/stress/serial/mixed002. There >>>>>> were no failures. >>>>>> >>>>>> Thanks, in advance, for any comments, questions or suggestions. >>>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>> Gory details: >>>>>> >>>>>> The primary focus of the fix is in the first three files in the >>>>>> webrev: >>>>>> >>>>>> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/SerialExecutionDebuggee.java >>>>>> >>>>>> test/hotspot/jtreg/vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>>>>> >>>>>> test/hotspot/jtreg/ProblemList.txt >>>>>> >>>>>> nsk.share.jdi.SerialExecutionDebuggee is the class that used to >>>>>> serially >>>>>> execute the debuggee portion of a specific list of tests. After >>>>>> this class >>>>>> is done executing a debuggee class, it needs to deflate idle >>>>>> monitors in >>>>>> order to prevent a StateTestThread object created by one debuggee >>>>>> class >>>>>> from confusing the next debuggee class. Each of the debuggee >>>>>> classes that >>>>>> use StateTestThread expect there to be only one of these objects. 
>>>>>> However, >>>>>> since we are running multiple debuggee classes serially *in the >>>>>> same VM*, >>>>>> the StateTestThread object created in one debuggee can still be >>>>>> around >>>>>> when the next debuggee runs. >>>>>> >>>>>> The COMMAND_CLEAR_DEBUGGEE implementation clears the >>>>>> currentDebuggee variable >>>>>> which permits the debuggee to be GC'ed and is modified by this >>>>>> fix to call >>>>>> WhiteBox.deflateIdleMonitors() to make sure that all inflated >>>>>> ObjectMonitors >>>>>> are deflated after each debuggee has been run. This takes care of >>>>>> any pinned >>>>>> StateTestThread objects (and any other inflated ObjectMonitors). >>>>>> >>>>>> >>>>>> vmTestbase/nsk/jdi/stress/serial/mixed002 is a wrapper style >>>>>> stress test that >>>>>> executes the debugger and debuggee parts of a specific list of >>>>>> tests serially >>>>>> *in the same VM*. Several of the tests executed by mixed002 make >>>>>> use of the >>>>>> StateTestThread class. The failure is intermittent because the >>>>>> order of test >>>>>> execution is shuffled automatically and sometimes the >>>>>> ServiceThread manages >>>>>> to execute deflation at the right time to prevent more than one >>>>>> StateTestThread >>>>>> object from existing at the same time. >>>>>> >>>>>> The additions to vmTestbase/nsk/jdi/stress/serial/mixed002 are >>>>>> the standard >>>>>> boilerplate needed to call WhiteBox functions from test code. The >>>>>> actual call >>>>>> to WhiteBox.deflateIdleMonitors() is made in >>>>>> SerialExecutionDebuggee. I did >>>>>> attempt a fix where I modified the StateTestThread class to make >>>>>> the call to >>>>>> WhiteBox.deflateIdleMonitors() after the internal waitOnObject is >>>>>> no longer >>>>>> contended or waited on. That fix reduced the frequency of the >>>>>> failures by >>>>>> about half, but it didn't solve the test bug entirely. So I had >>>>>> to make the >>>>>> fix in SerialExecutionDebuggee instead. 
>>>>>> >>>>>> >>>>>> test/hotspot/jtreg/ProblemList.txt is modified to re-enable the >>>>>> mix002 test. >>>>>> >>>>>> >>>>>> The remaining nine files are also wrapper style stress tests that >>>>>> execute >>>>>> the debugger and debuggee parts of a specific list of tests >>>>>> serially *in >>>>>> the same VM*. Because these tests also use >>>>>> SerialExecutionDebuggee, they >>>>>> also need the boilerplate changes so that >>>>>> WhiteBox.deflateIdleMonitors() >>>>>> can be called. >>>>> >>>>> >>>> >>> >> >> > From daniel.daugherty at oracle.com Mon Jun 29 21:28:20 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 29 Jun 2020 17:28:20 -0400 Subject: RFR(S): 8246493: JDI stress/serial/mixed002 needs to use WhiteBox.deflateIdleMonitors support In-Reply-To: References: <0f906972-f1a4-0cb4-ce83-87dc71fbede8@oracle.com> <558261cb-fa3a-3727-efce-e8bc2284c0ea@oracle.com> <7e8e4839-fb35-b7a2-c989-729d84354c0e@oracle.com> <3e4eeb1f-70b6-1972-2487-a5d4affc0287@oracle.com> <78104402-e49c-73a4-f761-8305a68f276e@oracle.com> Message-ID: <1266fde2-4563-7687-8986-4eaae97639b1@oracle.com> I'll add a note to 8246493. Dan On 6/29/20 5:02 PM, Chris Plummer wrote: > Ok, in that case it sounds best not to backport. It would be best to > make this clear in the bug so there is no future attempt to backport > this change except to versions that have already done the > WhiteBox.deflateIdleMonitors() backport. > > thanks, > > Chris > > On 6/29/20 1:53 PM, Daniel D. Daugherty wrote: >> The WhiteBox.deflateIdleMonitors() support is not in JDK15. That's >> something that added in JDK16 so I'd have to also backport that support. >> That support was included with another change (getting rid of the >> special >> deflation request mechanism) that is not appropriate for JDK15. >> >> Short version: I don't think we want to back port part of a patch from >> JDK16 -> JDK15 in order to fix this test bug. 
>>
>> Dan
>>
>>
>> On 6/29/20 4:41 PM, Chris Plummer wrote:
>>> Hi Dan,
>>>
>>> I think you should push it directly to 15 since it's a new issue.
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>> On 6/29/20 12:53 PM, Daniel D. Daugherty wrote:
>>>> Chris,
>>>>
>>>> Thanks. One last thing... since this is a test bug, I wasn't planning to
>>>> backport the fix to JDK15. The test is ProblemListed there so we won't see
>>>> the intermittent failures.
>>>>
>>>> Are you and Serguei good with not fixing this test bug in JDK15?
>>>>
>>>> Dan
>>>>
>>>> P.S.
>>>> Thanks again for your sleuthing that linked the bug to my
>>>> fix for JDK-8153224.
>>>>
>>>>
>>>> On 6/29/20 3:49 PM, Chris Plummer wrote:
>>>>> Looks good.
>>>>>
>>>>> Chris
>>>>>
>>>>> On 6/29/20 12:45 PM, Daniel D. Daugherty wrote:
>>>>>> Chris and Serguei,
>>>>>>
>>>>>> Thanks for the fast reviews!!
>>>>>>
>>>>>> I generated the webrev in my "mach5" directory and that was baselined
>>>>>> on the jdk-16+3 snapshot and that doesn't include the ProblemList change.
>>>>>> Sigh... I have updated the repo to "current" and regenerated the webrev.
>>>>>>
>>>>>> test/hotspot/jtreg/ProblemList.txt now shows:
>>>>>>
>>>>>> @@ -126,11 +126,10 @@
>>>>>>  vmTestbase/nsk/monitoring/ThreadMXBean/ThreadInfo/Deadlock/JavaDeadlock001/TestDescription.java 8060733 generic-all
>>>>>>
>>>>>>  vmTestbase/nsk/jdi/ThreadReference/stop/stop001/TestDescription.java 7034630 generic-all
>>>>>>  vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses021/TestDescription.java 8065773 generic-all
>>>>>>  vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses023/TestDescription.java 8065773 generic-all
>>>>>> -vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java 8246493 generic-all
>>>>>>
>>>>>>  vmTestbase/nsk/jdb/eval/eval001/eval001.java 8221503 generic-all
>>>>>>
>>>>>>  vmTestbase/metaspace/gc/firstGC_10m/TestDescription.java 8208250 generic-all
>>>>>>  vmTestbase/metaspace/gc/firstGC_50m/TestDescription.java 8208250 generic-all
>>>>>>
>>>>>> Thanks again for the fast reviews!!
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 6/29/20 3:41 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> The same as from Chris.
>>>>>>> The ProblemList.txt has no changes.
>>>>>>> Otherwise, it looks good.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 6/29/20 12:37, Chris Plummer wrote:
>>>>>>>> Hi Dan,
>>>>>>>>
>>>>>>>> Something is wrong with ProblemList.txt. It doesn't show any
>>>>>>>> changes, but I also don't see mixed002 in the file anymore.
>>>>>>>>
>>>>>>>> Otherwise the changes look good.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On 6/29/20 12:21 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I have a fix for the following bug:
>>>>>>>>>
>>>>>>>>>     JDK-8246493 JDI stress/serial/mixed002 needs to use
>>>>>>>>>     WhiteBox.deflateIdleMonitors support
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8246493
>>>>>>>>>
>>>>>>>>> Here's the webrev URL:
>>>>>>>>>
>>>>>>>>>     http://cr.openjdk.java.net/~dcubed/8246493-webrev/0_for_jdk16/
>>>>>>>>>
>>>>>>>>> The test bug that's being fixed:
>>>>>>>>>
>>>>>>>>>     vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java fails
>>>>>>>>>     intermittently with the following message:
>>>>>>>>>
>>>>>>>>>     nsk.share.TestBug: There are more than one(2) instance of
>>>>>>>>>     'nsk.share.jpda.StateTestThread in debuggee
>>>>>>>>>
>>>>>>>>> Summary of the fix:
>>>>>>>>>
>>>>>>>>>     Use WhiteBox.deflateIdleMonitors() to make sure that all inflated
>>>>>>>>>     ObjectMonitors are deflated after each debuggee has been run.
>>>>>>>>>
>>>>>>>>> This fix has been tested with a Mach5 Tier5 test run that executes all
>>>>>>>>> of the JDI tests (along with JDWP, JVM/TI and other Serviceability tests).
>>>>>>>>> I also did five 100 iteration runs of the failing mix002 test. Each Mach5
>>>>>>>>> job set ran the test 100 times on Linux-X64, macOSX, and Win-X64 for a
>>>>>>>>> total of (5 * 100 * 3) iterations of nsk/jdi/stress/serial/mixed002. There
>>>>>>>>> were no failures.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>>> >>>>>>>>> Dan >>>>>>>>> >>>>>>>>> >>>>>>>>> Gory details: >>>>>>>>> >>>>>>>>> The primary focus of the fix is in the first three files in >>>>>>>>> the webrev: >>>>>>>>> >>>>>>>>> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/SerialExecutionDebuggee.java >>>>>>>>> >>>>>>>>> test/hotspot/jtreg/vmTestbase/nsk/jdi/stress/serial/mixed002/TestDescription.java >>>>>>>>> >>>>>>>>> test/hotspot/jtreg/ProblemList.txt >>>>>>>>> >>>>>>>>> nsk.share.jdi.SerialExecutionDebuggee is the class that used >>>>>>>>> to serially >>>>>>>>> execute the debuggee portion of a specific list of tests. >>>>>>>>> After this class >>>>>>>>> is done executing a debuggee class, it needs to deflate idle >>>>>>>>> monitors in >>>>>>>>> order to prevent a StateTestThread object created by one >>>>>>>>> debuggee class >>>>>>>>> from confusing the next debuggee class. Each of the debuggee >>>>>>>>> classes that >>>>>>>>> use StateTestThread expect there to be only one of these >>>>>>>>> objects. However, >>>>>>>>> since we are running multiple debuggee classes serially *in >>>>>>>>> the same VM*, >>>>>>>>> the StateTestThread object created in one debuggee can still >>>>>>>>> be around >>>>>>>>> when the next debuggee runs. >>>>>>>>> >>>>>>>>> The COMMAND_CLEAR_DEBUGGEE implementation clears the >>>>>>>>> currentDebuggee variable >>>>>>>>> which permits the debuggee to be GC'ed and is modified by this >>>>>>>>> fix to call >>>>>>>>> WhiteBox.deflateIdleMonitors() to make sure that all inflated >>>>>>>>> ObjectMonitors >>>>>>>>> are deflated after each debuggee has been run. This takes care >>>>>>>>> of any pinned >>>>>>>>> StateTestThread objects (and any other inflated ObjectMonitors). >>>>>>>>> >>>>>>>>> >>>>>>>>> vmTestbase/nsk/jdi/stress/serial/mixed002 is a wrapper style >>>>>>>>> stress test that >>>>>>>>> executes the debugger and debuggee parts of a specific list of >>>>>>>>> tests serially >>>>>>>>> *in the same VM*. 
Several of the tests executed by mixed002 >>>>>>>>> make use of the >>>>>>>>> StateTestThread class. The failure is intermittent because the >>>>>>>>> order of test >>>>>>>>> execution is shuffled automatically and sometimes the >>>>>>>>> ServiceThread manages >>>>>>>>> to execute deflation at the right time to prevent more than >>>>>>>>> one StateTestThread >>>>>>>>> object from existing at the same time. >>>>>>>>> >>>>>>>>> The additions to vmTestbase/nsk/jdi/stress/serial/mixed002 are >>>>>>>>> the standard >>>>>>>>> boilerplate needed to call WhiteBox functions from test code. >>>>>>>>> The actual call >>>>>>>>> to WhiteBox.deflateIdleMonitors() is made in >>>>>>>>> SerialExecutionDebuggee. I did >>>>>>>>> attempt a fix where I modified the StateTestThread class to >>>>>>>>> make the call to >>>>>>>>> WhiteBox.deflateIdleMonitors() after the internal waitOnObject >>>>>>>>> is no longer >>>>>>>>> contended or waited on. That fix reduced the frequency of the >>>>>>>>> failures by >>>>>>>>> about half, but it didn't solve the test bug entirely. So I >>>>>>>>> had to make the >>>>>>>>> fix in SerialExecutionDebuggee instead. >>>>>>>>> >>>>>>>>> >>>>>>>>> test/hotspot/jtreg/ProblemList.txt is modified to re-enable >>>>>>>>> the mix002 test. >>>>>>>>> >>>>>>>>> >>>>>>>>> The remaining nine files are also wrapper style stress tests >>>>>>>>> that execute >>>>>>>>> the debugger and debuggee parts of a specific list of tests >>>>>>>>> serially *in >>>>>>>>> the same VM*. Because these tests also use >>>>>>>>> SerialExecutionDebuggee, they >>>>>>>>> also need the boilerplate changes so that >>>>>>>>> WhiteBox.deflateIdleMonitors() >>>>>>>>> can be called. 
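The COMMAND_CLEAR_DEBUGGEE change described above can be pictured with a small sketch. This is not the code from the webrev: SerialRunnerSketch and MonitorDeflater are hypothetical names, and MonitorDeflater is only a stand-in for the WhiteBox.deflateIdleMonitors() call that the real test reaches through the usual WhiteBox boilerplate. It just shows the ordering (drop the reference, then force deflation) under those assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the single WhiteBox call the fix relies on.
interface MonitorDeflater {
    void deflateIdleMonitors();
}

// Simplified model of a serial debuggee runner; not the actual nsk class.
class SerialRunnerSketch {
    private final MonitorDeflater deflater;
    private Object currentDebuggee;
    final List<String> log = new ArrayList<>();

    SerialRunnerSketch(MonitorDeflater deflater) {
        this.deflater = deflater;
    }

    void runDebuggee(String name) {
        currentDebuggee = new Object(); // stands in for the debuggee instance
        log.add("run " + name);
    }

    // Models the clear command: drop the reference so the debuggee can be
    // collected, then request deflation so that no inflated monitor is left
    // over from this debuggee when the next one starts.
    void clearDebuggee() {
        currentDebuggee = null;
        deflater.deflateIdleMonitors();
        log.add("cleared");
    }

    boolean hasDebuggee() {
        return currentDebuggee != null;
    }
}
```

In this sketch the deflation request is issued once per debuggee, right after the reference is cleared, which matches the reason given above for making the fix in SerialExecutionDebuggee rather than in StateTestThread.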
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>

From jcbeyler at google.com Mon Jun 29 22:26:27 2020
From: jcbeyler at google.com (Jean Christophe Beyler)
Date: Mon, 29 Jun 2020 15:26:27 -0700
Subject: RFR (S) 8247615: Initialize the bytes left for the heap sampler
In-Reply-To:
References:
Message-ID:

Agreed; missed a hg qrefresh; link is now updated:
http://cr.openjdk.java.net/~jcbeyler/8247615/webrev.01/

:)
Jc

On Mon, Jun 29, 2020 at 2:28 PM Man Cao wrote:

> Looks good.
>
> > though adding the change that Man wants might make it more flaky so I
> > added your numThreads / 2 in case
> I don't see the "numThreads / 2" in webrev.01 though. No need for a webrev
> for this fix.
>
> -Man
>
>
> On Mon, Jun 29, 2020 at 1:10 PM Jean Christophe Beyler <
> jcbeyler at google.com> wrote:
>
>> Hi all,
>>
>> Sorry it took time to get back to this; could I get a new review from:
>> http://cr.openjdk.java.net/~jcbeyler/8247615/webrev.01/
>>
>> The bug is here:
>> https://bugs.openjdk.java.net/browse/JDK-8247615
>>
>> Note, this passed the submit repo testing.
>>
>> Thanks and have a great day!
>> Jc
>>
>> Ps: explicit inlined Acks/Done are below:
>>
>> Sorry it took time to get back to this:
>> @Martin:
>> - done the typo
>> - about the sampling test: No you won't get samples due to how the
>> system is done, since we know we only will be allocating one object for the
>> thread, it dies out before a sample is required... though adding the change
>> that Man wants might make it more flaky so I added your numThreads / 2 in
>> case
>> - done for the always in the description
>>
>>
>> On Thu, Jun 25, 2020 at 6:54 PM Derek Thomson wrote:
>>
>>> > It could also avoid the problem where every thread deterministically
>>> allocates the same object at 512K, although this is unlikely.
>>>
>>> I've recently discovered that with certain server frameworks that this
>>> actually becomes quite likely! So I'd strongly recommend using
>>> pick_next_sample.
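To make the point about pick_next_sample concrete, here is a minimal sketch of why a randomized first threshold avoids every thread sampling at exactly the same byte count. It is an illustration only, not HotSpot's ThreadHeapSampler (which is C++ with its own random number generator): HeapSamplerSketch, its fields, and the use of java.util.Random with an exponential draw are all assumptions made for the example.

```java
import java.util.Random;

// Illustrative model only; names and the exact distribution are assumptions.
class HeapSamplerSketch {
    private final Random rnd;
    private final long samplingInterval; // mean number of bytes between samples
    private long bytesUntilSample;

    HeapSamplerSketch(long samplingInterval, long seed) {
        this.samplingInterval = samplingInterval;
        // The random source has to exist before the first threshold is picked.
        this.rnd = new Random(seed);
        // Randomized from the start, instead of a fixed first interval.
        this.bytesUntilSample = pickNextSample();
    }

    // Draws an exponentially distributed threshold whose mean is roughly
    // samplingInterval; the result is always at least 1.
    long pickNextSample() {
        double u = 1.0 - rnd.nextDouble(); // in (0, 1], so log(u) is finite
        return (long) (-Math.log(u) * samplingInterval) + 1;
    }

    // Returns true when this allocation crosses the current threshold.
    boolean allocate(long bytes) {
        bytesUntilSample -= bytes;
        if (bytesUntilSample > 0) {
            return false;
        }
        bytesUntilSample = pickNextSample();
        return true;
    }

    long bytesUntilSample() {
        return bytesUntilSample;
    }
}
```

With a fixed initial value every sampler starts at the same threshold; with the randomized pick, samplers seeded differently start at different thresholds while the long-run mean stays near the configured interval.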
>>>
>>
>> Ack, done :)
>>
>>
>>>
>>> On Thu, Jun 25, 2020 at 4:56 PM Man Cao wrote:
>>>
>>>> Thanks for fixing this!
>>>>
>>>> > 53 ThreadHeapSampler() :
>>>> _bytes_until_sample(get_sampling_interval()) {
>>>>
>>>> Does this work better? (It has to be done after the initialization of
>>>> _rnd.)
>>>> _bytes_until_sample = pick_next_sample();
>>>>
>>>> It could avoid completely missing to sample the first 512K allocation.
>>>> It could also avoid the problem where every thread
>>>>
>>>
>> Done.
>>
>>
>>
>>> deterministically allocates the same object at 512K, although this is
>>>> unlikely.
>>>>
>>>> -Man
>>>>
>>>
>>
>> --
>>
>> Thanks,
>> Jc
>>
>

--

Thanks,
Jc

From suenaga at oss.nttdata.com Tue Jun 30 00:05:45 2020
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Tue, 30 Jun 2020 09:05:45 +0900
Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake
In-Reply-To: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com>
References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com>
Message-ID: <2ea2f5d8-5c43-a7d3-d525-cd44c4009d25@oss.nttdata.com>

Hi David, Serguei,

I updated the webrev for 8242428. Could you review it again?

This change migrates GetStackTrace() and GetThreadListStackTraces() (when thread_count == 1) to direct handshakes.

http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/

VM_GetThreadListStackTraces (for GetThreadListStackTraces) and VM_GetAllStackTraces (for GetAllStackTraces) inherit from the VM_GetMultipleStackTraces VM operation, which provides the code that generates jvmtiStackInfo. I changed VM_GetMultipleStackTraces into a plain C++ class so that it can be shared with the HandshakeClosure for GetThreadListStackTraces (GetSingleStackTraceClosure).

I also added new test cases that exercise GetThreadListStackTraces() with thread_count == 1 and with all threads.
This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/nsk/jdwp.

Thanks,

Yasumasa

On 2020/06/24 15:50, Yasumasa Suenaga wrote:
> Hi all,
>
> Please review this change:
>
>   JBS: https://bugs.openjdk.java.net/browse/JDK-8242428
>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/
>
> This change replaces the following VM operations with direct handshakes.
>
>  - VM_GetFrameCount (GetFrameCount())
>  - VM_GetFrameLocation (GetFrameLocation())
>  - VM_GetThreadListStackTraces (GetThreadListStackTraces())
>  - VM_GetCurrentLocation
>
> GetThreadListStackTraces() uses a direct handshake if thread count == 1. Otherwise (thread count > 1), it is performed as a VM operation (VM_GetThreadListStackTraces).
> The caller of VM_GetCurrentLocation (JvmtiEnvThreadState::reset_current_location()) might be called at a safepoint, so I added a safepoint check in its caller.
>
> This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/nsk/jdwp.
>
> I also tested it on the submit repo, where it hit an execution error (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to a dependency error. So I think the error is not caused by this change.
>
>
> Thanks,
>
> Yasumasa

From chris.plummer at oracle.com Tue Jun 30 00:10:27 2020
From: chris.plummer at oracle.com (Chris Plummer)
Date: Mon, 29 Jun 2020 17:10:27 -0700
Subject: RFR (Preliminary): 8248194: Need better support for running SA tests on core files
In-Reply-To: <6980c0f3-5ae7-a813-3adb-16cac40fe7f6@oracle.com>
References: <04c841a1-1e38-155a-5d94-5d0e5a32c708@oracle.com> <8BD2E0F8-EE7B-4308-A058-2EC235D5212F@oracle.com> <2ac19e8a-0f0d-a215-6f92-748fd8c0c5ab@oracle.com> <6980c0f3-5ae7-a813-3adb-16cac40fe7f6@oracle.com>
Message-ID: <84c27b68-d90b-c0c6-4e65-1a97aa9f2126@oracle.com>

Hi Leonid,

I'm starting to think that this should all go in a new CoreUtils.java file.
I experimented with moving getCoreFileLocation() to OutputAnalyzer. It worked, but one adjustment I had to make was to also move SATestUtils.unzipCores() there and make it private (no one else calls it). But that just got me thinking that maybe CoreUtils.java would be a better solution. I think I would put the addCoreUlimitCommand() API discussed below there also. What do you think?

thanks,

Chris

On 6/28/20 5:29 PM, Chris Plummer wrote:
> Hi Leonid,
>
> I think getCoreFileLocation() can simply move to OutputAnalyzer. No
> need for it to be in SAUtils and be passed the String argument that
> comes from OutputAnalyzer.getOutput().
>
> For the ulimit support, how about if in ProcessTools I add:
>
>     public static ProcessBuilder addCoreUlimitCommand(ProcessBuilder pb);
>
> All the ulimit logic would move there from SATestUtils. It's
> straightforward to use this API in LingeredApp and TestJmapCore. For
> ClhsdbCDSCore I'll need to rework the
> getTestJvmCommandlineWithPrefix() code a bit, since it creates a pb,
> but doesn't save it. It only uses it to get the cmd String.
>
> Also, there's one new finding since I sent out the review. I found the
> following in CiReplayBase.java:
>
>     // lets search few possible locations using process output and return existing location
>     private String getCoreFileLocation(String crashOutputString) {
>
> This is identical to the code I pulled from ClhsdbCDSCore and is now
> in SATestUtils.parseCoreFileLocationFromOutput(). Although this is in
> the compiler directory, it is in fact an SA test that uses clhsdb,
> although directly via the CLHSDB class rather than through "jhsdb
> clhsdb".
>
> This also explains why ClhsdbCDSCore had some logic to move and rename
> the core file to "cds_core_file". I removed this logic because it
> seemed unnecessary, but for CiReplayBase.java it needs to be in a
> known location so SABase.java can find it. It's still fine for
> ClhsdbCDSCore to not do the rename, and renaming is independent of any
> code that locates the core file.
>
> I'm not going to update CiReplayBase.java as part of these changes
> because the two tests that use it both have issues. TestSAServer is
> problem listed, and when I removed it from the problem list it failed
> with every run on every platform. There's also TestSAClient, but it
> relies on client VM, which we don't support anymore. So with neither
> of these tests running, I'd rather not introduce changes I can't
> really test.
>
> However, there was something good that came out of the
> CiReplayBase.java discovery. I had previously noted that ClhsdbCDSCore
> is excluded from running on Windows. When I removed the @requires for
> this, it failed for a reason I didn't quite understand. The complaint
> was about the path to java.exe when running the process that was
> supposed to crash, although the path looked fine. However, I found that
> TestSAServer ran fine on Windows, even though it was basically the
> same process launching code for causing the crash. I looked closer and
> found one difference. In getTestJvmCommandlineWithPrefix(), which both
> tests have, the CiReplayBase version had some extra code for Windows:
>
>             return new String[]{"sh", "-c", prefix
>                 + (Platform.isWindows() ? cmd.replace('\\', '/').replace(";", "\\;").replace("|", "\\|") : cmd)};
>
> So on Windows it's doing a path conversion. Once I started doing the
> same with ClhsdbCDSCore, it started to run fine on Windows also.
>
> thanks,
>
> Chris
>
> On 6/26/20 8:42 PM, Chris Plummer wrote:
>> Hi Leonid,
>>
>> On 6/26/20 7:51 PM, Leonid Mesnik wrote:
>>> Hi
>>>
>>> The idea basically looks good. I think it just makes sense to
>>> polish it a little bit to hide "sh" usage from test and get core
>>> from OutputAnalyzer.
>> Ok, I'll look into both of those.
>>> Also, there is a 'CrashApp' in ClhsdbCDSCore.java.
>>> Does it make sense to unify it with the LingeredApp crasher?
>>> Currently, it uses Unsafe to crash the application.
>> Yes, I purposely did not make that change. My main goal with the
>> LingeredApp changes is to make it easier to make existing LingeredApp
>> SA tests run on both a live process and on a core, and my main goal
>> with ClhsdbCDSCore and TestJmapCore was to move the core finding code
>> and ulimit code to a common location that could be reused by other
>> tests.
>>
>> Keep in mind that ClhsdbLauncher and LingeredApp are independent of
>> each other. You can have LingeredApp tests that use or don't use
>> ClhsdbLauncher, and you can have non-LingeredApp tests that use or
>> don't use ClhsdbLauncher. So I didn't want to go down the path of
>> changing ClhsdbCDSCore (a non LingeredApp test) to use LingeredApp.
>> Likewise I did not change TestJmapCore to use LingeredApp or
>> ClhsdbLauncher. Possibly there is good reason to convert some of the
>> tests to start using LingeredApp and/or ClhsdbLauncher, but that
>> should be done under a separate RFE.
>>
>>>
>>> Also, crashes are used in other tests, I see some implementations in
>>> open/test/hotspot/jtreg/vmTestbase/vm/share/vmcrasher
>> I don't see vmcrasher being used by any tests. In any case, my first
>> attempt went down the Unsafe path to produce a crash. The issue is
>> that it forces every user of LingeredApp to include the @module for
>> Unsafe. I also tried using a WhiteBox API. That was worse, also
>> requiring every user of LingeredApp to include an @module, plus the
>> tests that actually want to cause a crash need to @build
>> WhiteBox.java and then do the classfile install. It also required
>> additional module related hacks in LingeredApp. The issue with my
>> current solution is that how to get libLingeredApp.c to compile has
>> not been settled on. I'm still waiting for an answer from the build team.
>>>
>>> So it would be nice to have some common way to crash hotspot.
>> I can see possibly moving the crashing code out of LingeredApp and
>> into a native lib that non-LingeredApp tests can use, although that
>> really is just a very small part of the changes to LingeredApp. For
>> the most part the changes would look the same except you would call a
>> different API to cause the crash.
>>>
>>> Leonid
>>>
>> Thanks for having a look.
>>
>> Chris
>>>> On Jun 25, 2020, at 2:41 PM, Chris Plummer wrote:
>>>>
>>>> Hello,
>>>>
>>>> Please help with a preliminary review of changes to add better
>>>> support for writing SA tests that work on core files:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8248194
>>>> http://cr.openjdk.java.net/~cjplummer/8248194/webrev.00/index.html
>>>>
>>>> As pointed out, this is a preliminary review. I suspect there will
>>>> be some feedback for changes/improvements. Also, I still need to
>>>> work out a final solution for how to get LingeredApp to produce a
>>>> crash. What I currently have works but is somewhat of a hack w.r.t.
>>>> the makefile change, so you can ignore the makefile change for
>>>> now. I'm working on a more proper solution with the build team.
>>>>
>>>> As outlined in the CR, these are the 3 main goals of this CR:
>>>>
>>>> 1. SATestUtils should include support for finding the core file.
>>>> This includes parsing the output of the crashed process to locate
>>>> where the core file was saved, and returning this location to the
>>>> user.
>>>>
>>>> 2. SATestUtils should include support for adding the "ulimit -c
>>>> unlimited" prefix to the command that will produce the core file,
>>>> allowing the overriding of any lower limit so we can be sure the
>>>> core file will be produced.
>>>>
>>>> 3. LingeredApp should include support for producing a core file.
>>>>
>>>> As proof of concept for these 3 changes in test library support,
>>>> I'm updating the following 3 tests:
>>>>
>>>> ClhsdbCDSCore.java: Use the SATestUtils support listed above.
>>>> This test does not use LingeredApp, so those improvements don't apply.
>>>>
>>>> TestJmapCore.java: Use the SATestUtils support listed above. This
>>>> test does not use LingeredApp, so those improvements don't apply.
>>>>
>>>> ClhsdbFindPC.java: Use all the above features, including having
>>>> LingeredApp produce a core file. This is the only test modified to
>>>> start testing on core files that didn't previously do so. It still
>>>> also tests on a live process.
>>>>
>>>> In the future more Clhsdb tests will be converted to work on core
>>>> files in a manner similar to ClhsdbFindPC.
>>>>
>>>> The new SATestUtils code is borrowed from (more like ripped out of)
>>>> ClhsdbCDSCore.java and TestJmapCore.java. They both had a lot of
>>>> code dedicated to finding the core file and also applying "ulimit
>>>> -c unlimited" if necessary, but didn't do so in quite the same
>>>> way. Now both these tests share code in SATestUtils.java. One thing
>>>> I did drop is TestJmapCore.java's use of ":KILLED_PID" in the output
>>>> to help find the core file. It's no longer necessary based on the
>>>> smarter core locating code I pulled from ClhsdbCDSCore.java.
>>>>
>>>> thanks,
>>>>
>>>> Chris
>>
>
>

From linzang at tencent.com Tue Jun 30 02:19:24 2020
From: linzang at tencent.com (linzang(臧琳))
Date: Tue, 30 Jun 2020 02:19:24 +0000
Subject: RFR(L): 8215624: add parallel heap inspection support for jmap histo(G1)(Internet mail)
In-Reply-To:
References: <94C0D11E-F395-4FE4-9ECE-5ECC84B3AE1B@tencent.com> <09702D94-F53C-413D-A156-B7390D689BC6@tencent.com> <4751f476-1e7a-490f-80c5-96b58eb25191@oracle.com>
Message-ID:

Dear All,

Sorry to bother you again. I just want to check whether this change is worth continuing to work on. If the decision is that it is not, I think I can drop this work and stop asking for review help...

Thanks for all your help with reviewing this previously.
BRs, Lin ?On 2020/5/9, 3:47 PM, "linzang(??)" wrote: Dear All, May I ask your help again for review the latest change? Thanks! BRs, Lin On 2020/4/28, 1:54 PM, "linzang(??)" wrote: Hi Stefan, >> - Adding Atomic::load/store. >> - Removing the time measurement in the run_task. I renamed G1's function >> to run_task_timed. If we need this outside of G1, we can rethink the API >> at that point. >> - ZGC style cleanups Thanks for revising the patch, they are all good to me, and I have made a tiny change based on it: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_04/ http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_04-delta/ it reduce the scope of mutex in ParHeapInspectTask, and delete unnecessary comments. BRs, Lin On 2020/4/27, 4:34 PM, "Stefan Karlsson" wrote: Hi Lin, On 2020-04-26 05:10, linzang(??) wrote: > Hi Stefan and Paul? > I have made a new patch based on your comments and Stefan's Poc code: > Webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_03/ > Delta(based on Stefan's change:) : http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_03-delta/webrev_03-delta/ Thanks for providing a delta patch. It makes it much easier to look at, and more likely for reviewers to continue reviewing. I'm going to continue focusing on the GC parts, and leave the rest to others to review. > > And Here are main changed I made and want to discuss with you: > 1. changed"parallelThreadNum=" to "parallel=" for jmap -histo options. > 2. Add logic to test where parallelHeapInspection is fail, in heapInspection.cpp > This is because the parHeapInspectTask create thread local KlassInfoTable in it's work() method, and this may fail because of native OOM, in this case, the parallel should fail and serial heap inspection can be tried. > One more thing I want discuss with you is about the member "_success" of parHeapInspectTask, when native OOM happenes, it is set to false. 
And since this "set" operation can be conducted in multiple threads, should it be atomic ops? IMO, this is not necessary because "_success" can only be set to false, and there is no way to change it from back to true after the ParHeapInspectTask instance is created, so it is save to be non-atomic, do you agree with that? In these situations you should be using the Atomic::load/store primitives. We're moving toward a later C++ standard were data races are considered undefined behavior. > 3. make CollectedHeap::run_task() be an abstract virtual func, so that every subclass of collectedHeap should support it, so later implementation of new collectedHeap will not miss the "parallel" features. > The problem I want to discuss with you is about epsilonHeap and SerialHeap, as they may not need parallel heap iteration, so I only make task->work(0), in case the run_task() is invoked someway in future. Another way is to left run_task() unimplemented, which one do you think is better? I don't have a strong opinion about this. And also please help take a look at the zHeap, as there is a class zTask that wrap the abstractGangTask, and the collectedHeap::run_task() only accept AbstraceGangTask* as argument, so I made a delegate class to adapt it , please see src/hotspot/share/gc/z/zHeap.cpp. > > There maybe other better ways to sovle the above problems, welcome for any comments, Thanks! I've created a few cleanups and changes on top of your latest patch: https://cr.openjdk.java.net/~stefank/8215624/webrev.02.delta https://cr.openjdk.java.net/~stefank/8215624/webrev.02 - Adding Atomic::load/store. - Removing the time measurement in the run_task. I renamed G1's function to run_task_timed. If we need this outside of G1, we can rethink the API at that point. - ZGC style cleanups Thanks, StefanK > > BRs, > Lin > > On 2020/4/23, 11:08 AM, "linzang(??)" wrote: > > Thanks Paul! I agree with using "parallel", will make the update in next patch, Thanks for help update the CSR. 
> > BRs, > Lin > > On 2020/4/23, 4:42 AM, "Hohensee, Paul" wrote: > > For the interface, I'd use "parallel" instead of "parallelThreadNum". All the other options are lower case, and it's a lot easier to type "parallel". I took the liberty of updating the CSR. If you're ok with it, you might want to change variable names and such, plus of course JMap.usage. > > Thanks, > Paul > > On 4/22/20, 2:29 AM, "serviceability-dev on behalf of linzang(??)" wrote: > > Dear Stefan, > > Thanks a lot! I agree with you to decouple the heap inspection code with GC's. > I will start from your POC code, may discuss with you later. > > > BRs, > Lin > > On 2020/4/22, 5:14 PM, "Stefan Karlsson" wrote: > > Hi Lin, > > I took a look at this earlier and saw that the heap inspection code is > strongly coupled with the CollectedHeap and G1CollectedHeap. I'd prefer > if we'd abstract this away, so that the GCs only provide a "parallel > object iteration" interface, and the heap inspection code is kept elsewhere. > > I started experimenting with doing that, but other higher-priority (to > me) tasks have had to take precedence. > > I've uploaded my work-in-progress / proof-of-concept: > https://cr.openjdk.java.net/~stefank/8215624/webrev.01.delta/ > https://cr.openjdk.java.net/~stefank/8215624/webrev.01/ > > The current code doesn't handle the lifecycle (deletion) of the > ParallelObjectIterators. There's also code left unimplemented in around > CollectedHeap::run_task. However, I think this could work as a basis to > pull out the heap inspection code out of the GCs. > > Thanks, > StefanK > > On 2020-04-22 02:21, linzang(??) wrote: > > Dear all, > > May I ask you help to review? This RFR has been there for quite a while. > > Thanks! > > > > BRs, > > Lin > > > > > On 2020/3/16, 5:18 PM, "linzang(??)" wrote:> > > > >> Just update a new path, my preliminary measure show about 3.5x speedup of jmap histo on a nearly full 4GB G1 heap (8-core platform with parallel thread number set to 4). 
> >> webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_02/ > >> bug: https://bugs.openjdk.java.net/browse/JDK-8215624 > >> CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 > >> BRs, > >> Lin > >> > On 2020/3/2, 9:56 PM, "linzang(??)" wrote: > >> > > >> > Dear all, > >> > Let me try to ease the reviewing work by some explanation :P > >> > The patch's target is to speed up jmap -histo for heap iteration, from my experience it is necessary for large heap investigation. E.g in bigData scenario I have tried to conduct jmap -histo against 180GB heap, it does take quite a while. > >> > And if my understanding is corrent, even the jmap -histo without "live" option does heap inspection with heap lock acquired. so it is very likely to block mutator thread in allocation-sensitive scenario. I would say the faster the heap inspection does, the shorter the mutator be blocked. This is parallel iteration for jmap is necessary. > >> > I think the parallel heap inspection should be applied to all kind of heap. However, consider the heap layout are different for GCs, much time is required to understand all kinds of the heap layout to make the whole change. IMO, It is not wise to have a huge patch for the whole solution at once, and it is even harder to review it. So I plan to implement it incrementally, the first patch (this one) is going to confirm the implemention detail of how jmap accept the new option, passes it to attachListener of the jvm process and then how to make the parallel inspection closure be generic enough to make it easy to extend to different heap layout. And also how to implement the heap inspection in specific gc's heap. This patch use G1's heap as the begining. > >> > This patch actually do several things: > >> > 1. Add an option "parallelThreadNum=" to jmap -histo, the default behavior is to set N to 0, means let's JVM decide how many threads to use for heap inspection. Set this option to 1 will disable parallel heap inspection. 
(more details in CSR: https://bugs.openjdk.java.net/browse/JDK-8239290) > >> > 2. Make a change in how Jmap passing arguments, changes in http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/src/jdk.jcmd/share/classes/sun/tools/jmap/JMap.java.udiff.html, originally it pass options as separate arguments to attachListener, this patch change to that all options be compose to a single string. So the arg_count_max in attachListener.hpp do not need to be changed, and hence avoid the compatibility issue, as disscussed at https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-March/027334.html > >> > 3. Add an abstract class ParHeapInspectTask in heapInspection.hpp / heapInspection.cpp, It's work(uint worker_id) method prepares the data structure (KlassInfoTable) need for every parallel worker thread, and then call do_object_iterate_parallel() which is heap specific implementation. I also added some machenism in KlassInfoTable to support parallel iteration, such as merge(). > >> > 4. In specific heap (G1 in this patch), create a subclass of ParHeapInspectTask, implement the do_object_iterate_parallel() for parallel heap inspection. For G1, it simply invoke g1CollectedHeap's object_iterate_parallel(). > >> > 5. Add related test. > >> > 6. it may be easy to extend this patch for other kinds of heap by creating subclass of ParHeapInspectTask and implement the do_object_iterate_parallel(). > >> > > >> > Hope these info could help on code review and initate the discussion :-) > >> > Thanks! > >> > > >> > BRs, > >> > Lin > >> > >On 2020/2/19, 9:40 AM, "linzang(??)" wrote:. > >> > > > >> > > Re-post this RFR with correct enhancement number to make it trackable. > >> > > please ignore the previous wrong post. sorry for troubles. 
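The per-worker table plus merge() scheme described in points 3 and 4 above, together with the "fall back to serial if a worker fails" idea discussed elsewhere in this thread, can be sketched as follows. This is only a Java analogy with hypothetical names (ParallelHistogramSketch, inspect, merge); the actual patch is C++ in HotSpot, uses KlassInfoTable and GC worker threads, and is not reproduced here. The success flag is an AtomicBoolean, in the spirit of the Atomic::load/store advice given in the review.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical analogy: each worker counts class names in its own chunk,
// then merges its private table into the shared one.
class ParallelHistogramSketch {

    // Only the merge step takes the lock; counting is done on private tables.
    static void merge(Map<String, Long> shared, Map<String, Long> local) {
        synchronized (shared) {
            local.forEach((klass, count) -> shared.merge(klass, count, Long::sum));
        }
    }

    // Returns null if any worker failed, so the caller can retry serially.
    static Map<String, Long> inspect(List<List<String>> chunks) throws InterruptedException {
        Map<String, Long> shared = new HashMap<>();
        AtomicBoolean success = new AtomicBoolean(true);
        Thread[] workers = new Thread[chunks.size()];
        for (int i = 0; i < workers.length; i++) {
            List<String> chunk = chunks.get(i);
            workers[i] = new Thread(() -> {
                try {
                    Map<String, Long> local = new HashMap<>();
                    for (String klass : chunk) {
                        local.merge(klass, 1L, Long::sum);
                    }
                    merge(shared, local);
                } catch (RuntimeException | OutOfMemoryError e) {
                    success.set(false); // the flag only ever goes from true to false
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join(); // join makes the workers' writes visible to this thread
        }
        return success.get() ? shared : null;
    }
}
```

Keeping the lock around the merge only, rather than around the whole counting loop, mirrors the later remark in the thread about reducing the scope of the mutex in ParHeapInspectTask.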
> >> > > > >> > > webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/ > >> > > Hi bug: https://bugs.openjdk.java.net/browse/JDK-8215624 > >> > > CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 > >> > > -------------- > >> > > Lin > >> > > >Hi Lin, > > > > > > > >> > > >Could you, please, re-post your RFR with the right enhancement number in > >> > > >the message subject? > >> > > >It will be more trackable this way. > >> > > > > >> > > >Thanks, > >> > > >Serguei > >> > > > > >> > > > > >> > > >On 2/17/20 10:29 PM, linzang(??) wrote: > >> > > >> Dear David, > >> > > >> Thanks a lot! > >> > > >> I have updated the refined code to http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/. > >> > > >> IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration. > >> > > >> Maybe we can firstly use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap, then we can extend the solution to other kinds of heap. > >> > > >> > >> > > >> Thanks, > >> > > >> -------------- > >> > > >> Lin > >> > > >>> Hi Lin, > >> > > >>> > >> > > >>> Adding in hotspot-gc-dev as they need to see how this interacts with GC > >> > > >>> worker threads, and whether it needs to be extended beyond G1. > >> > > >>> > >> > > >>> I happened to spot one nit when browsing: > >> > > >>> > >> > > >>> src/hotspot/share/gc/shared/collectedHeap.hpp > >> > > >>> > >> > > >>> + virtual bool run_par_heap_inspect_task(KlassInfoTable* cit, > >> > > >>> + BoolObjectClosure* filter, > >> > > >>> + size_t* missed_count, > >> > > >>> + size_t thread_num) { > >> > > >>> + return NULL; > >> > > >>> > >> > > >>> s/NULL/false/ > >> > > >>> > >> > > >>> Cheers, > >> > > >>> David > > > > > >>> > >> > > >>> On 18/02/2020 2:15 pm, linzang(??) 
wrote: > >> > > >>>> Dear All, > >> > > >>>> May I ask your help to review the follow changes: > >> > > >>>> webrev: > >> > > >>>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/ > >> > > >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8215624 > >> > > >>>> related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 > >> > > >>>> This patch enable parallel heap inspection of G1 for jmap histo. > >> > > >>>> my simple test shown it can speed up 2x of jmap -histo with > >> > > >>>> parallelThreadNum set to 2 for heap at ~500M on 4-core platform. > >> > > >>>> > >> > > >>>> ------------------------------------------------------------------------ > >> > > >>>> BRs, > >> > > >>>> Lin > >> > > >> > > >> > > > > > > > > > > > > > > > From martinrb at google.com Tue Jun 30 07:47:43 2020 From: martinrb at google.com (Martin Buchholz) Date: Tue, 30 Jun 2020 00:47:43 -0700 Subject: RFR (S) 8247615: Initialize the bytes left for the heap sampler In-Reply-To: References: Message-ID: Looks good to me, but of course I'm not qualified to review. On Mon, Jun 29, 2020 at 3:26 PM Jean Christophe Beyler wrote: > > Agreed; missed a hg qrefresh; link is now updated: > http://cr.openjdk.java.net/~jcbeyler/8247615/webrev.01/ > > :) > Jc > > On Mon, Jun 29, 2020 at 2:28 PM Man Cao wrote: >> >> Looks good. >> >> > though adding the change that Man wants might make it more flaky so I added your numThreads / 2 in case >> I don't see the "numThreads / 2" in webrev.01 though. No need for a webrev for this fix. >> >> -Man >> >> >> On Mon, Jun 29, 2020 at 1:10 PM Jean Christophe Beyler wrote: >>> >>> Hi all, >>> >>> Sorry it took time to get back to this; could I get a new review from: >>> http://cr.openjdk.java.net/~jcbeyler/8247615/webrev.01/ >>> >>> The bug is here: >>> https://bugs.openjdk.java.net/browse/JDK-8247615 >>> >>> Note, this passed the submit repo testing. >>> >>> Thanks and have a great day! 
>>> Jc >>> >>> Ps: explicit inlined Acks/Done are below: >>> >>> Sorry it took time to get back to this: >>> @Martin: >>> - done the typo >>> - about the sampling test: No you won't get samples due to how the system is done, since we know we only will be allocating one object for the thread, it dies out before a sample is required... though adding the change that Man wants might make it more flaky so I added your numThreads / 2 in case >>> - done for the always in the description >>> >>> >>> On Thu, Jun 25, 2020 at 6:54 PM Derek Thomson wrote: >>>> >>>> > It could also avoid the problem where every thread deterministically allocates the same object at 512K, although this is unlikely. >>>> >>>> I've recently discovered that with certain server frameworks that this actually becomes quite likely! So I'd strongly recommend using pick_next_sample. >>> >>> >>> Ack, done :) >>> >>>> >>>> >>>> On Thu, Jun 25, 2020 at 4:56 PM Man Cao wrote: >>>>> >>>>> Thanks for fixing this! >>>>> >>>>> > 53 ThreadHeapSampler() : _bytes_until_sample(get_sampling_interval()) { >>>>> >>>>> Does this work better? (It has to be done after the initialization of _rnd.) >>>>> _bytes_until_sample = pick_next_sample(); >>>>> >>>>> It could avoid completely missing to sample the first 512K allocation. >>>>> It could also avoid the problem where every thread >>> >>> >>> Done. >>> >>> >>>>> >>>>> deterministically allocates the same object at 512K, although this is unlikely. 
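The suggestion above, seeding the countdown from a randomized draw rather than from the fixed interval, can be sketched as below. This is an illustration only: HeapSamplerSketch and its members are hypothetical names, and the exponential draw from the C++ standard library is an assumption, not necessarily how pick_next_sample() is implemented in HotSpot. Note that _rnd is declared before _bytes_until_sample, so it is initialized first, which is the ordering constraint mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

class HeapSamplerSketch {
 public:
  HeapSamplerSketch(std::size_t mean_interval, std::uint64_t seed)
      : _mean_interval(mean_interval),
        _rnd(seed),
        _bytes_until_sample(pick_next_sample()) {}

  std::size_t bytes_until_sample() const { return _bytes_until_sample; }

  // Returns true when this allocation crosses the sampling point, and
  // then draws a fresh random distance to the next sample.
  bool record_allocation(std::size_t bytes) {
    if (bytes < _bytes_until_sample) {
      _bytes_until_sample -= bytes;
      return false;
    }
    _bytes_until_sample = pick_next_sample();
    return true;
  }

 private:
  // Exponentially distributed distance with the configured mean, at least 1.
  std::size_t pick_next_sample() {
    assert(_mean_interval > 0);
    std::exponential_distribution<double> dist(
        1.0 / static_cast<double>(_mean_interval));
    return static_cast<std::size_t>(dist(_rnd)) + 1;
  }

  // Members are initialized in declaration order, so _rnd is ready before
  // _bytes_until_sample calls pick_next_sample().
  std::size_t _mean_interval;
  std::mt19937_64 _rnd;
  std::size_t _bytes_until_sample;
};
```

With a per-thread seed, threads that allocate identical objects no longer all hit their first sample at the same byte offset, and an allocation smaller than the mean interval still has a chance of being sampled.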
>>>>> >>>>> -Man >>> >>> >>> >>> -- >>> >>> Thanks, >>> Jc > > > > -- > > Thanks, > Jc From robbin.ehn at oracle.com Tue Jun 30 10:23:56 2020 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 30 Jun 2020 12:23:56 +0200 Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake In-Reply-To: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> Message-ID: <9e3f081f-be59-f818-8793-67c439b92598@oracle.com> Hi Yasumasa, Thanks for your effort doing this. #1 GetFrameLocation GetStackTrace GetCurrentLocation (need to add BCI) Should use exactly the same code since a stack trace with max_count = 1 and start_depth = depth/0 is the frame location and jvmtiFrameInfo contains the correct information (+ add BCI)? Thus GetFrameLocation also would use handshakes and no special handshake path for GetCurrentLocation. So we would have _one_ function to get method and BCI/lineno for depth and max count. Which can easily handle all three cases? (maybe more cases also) Is there any reason for having a separate path for each of these? #2 In this method: JvmtiEnvThreadState::reset_current_location(jvmtiEvent event_type, bool enabled) if (event_type == JVMTI_EVENT_SINGLE_STEP && _thread->has_last_Java_frame()) { We are checking if a running thread has a last Java frame, which means it could have one now, e.g. it could be in another handshake or not woken up from a safepoint yet. So there is no use in checking that. (old code) 313 if (SafepointSynchronize::is_at_safepoint() || 314 ((Thread::current() == _thread) && (_thread == _thread->active_handshaker()))) { #3 You are using a debug-only method here, "active_handshaker()". #4 This AND is never true: ((Thread::current() == _thread) && (_thread == _thread->active_handshaker()))) You can't be the active handshaker for yourself. Thanks, Robbin On 2020-06-24 08:50, Yasumasa Suenaga wrote: > Hi all, > > Please review this change: > >
JBS: https://bugs.openjdk.java.net/browse/JDK-8242428 > ? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/ > > This change replace following VM operations to direct handshake. > > ?- VM_GetFrameCount (GetFrameCount()) > ?- VM_GetFrameLocation (GetFrameLocation()) > ?- VM_GetThreadListStackTraces (GetThreadListStackTrace()) > ?- VM_GetCurrentLocation > > GetThreadListStackTrace() uses direct handshake if thread count == 1. In > other case (thread count > 1), it would be performed as VM operation > (VM_GetThreadListStackTraces). > Caller of VM_GetCurrentLocation > (JvmtiEnvThreadState::reset_current_location()) might be called at > safepoint. So I added safepoint check in its caller. > > This change has been tested in serviceability/jvmti serviceability/jdwp > vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/ns > k/jdwp. > > Also I tested it on submit repo, then it has execution error > (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to > dependency error. So I think it does not occur by this change. > > > Thanks, > > Yasumasa From suenaga at oss.nttdata.com Tue Jun 30 12:35:51 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Tue, 30 Jun 2020 21:35:51 +0900 Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake In-Reply-To: <9e3f081f-be59-f818-8793-67c439b92598@oracle.com> References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <9e3f081f-be59-f818-8793-67c439b92598@oracle.com> Message-ID: <46ec19dc-6f82-9d39-061c-0c703ed13527@oss.nttdata.com> Hi Robbin, We decided to separate thread operation and frame operation. I've posted review request for thread operation. Could you review it? http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/ We can share HandshakeClosure for GetStackTrace() to GetFrameLocation() as you said. However I wonder why it is not so now. I guess GetStackTrace() would give some overhead (e.g. 
memory allocation for jvmtiFrameInfo) if we use it for frame location. I thought we should replace VM operations with HandshakeClosure one by one. I will merge these operations where possible in JDK-8248362 if we should do so. Thanks, Yasumasa On 2020/06/30 19:23, Robbin Ehn wrote: > Hi Yasumasa, > > Thanks for your effort doing this. > > #1 > GetFrameLocation > GetStackTrace > GetCurrentLocation (need to add BCI) > > Should use exactly the same code since a stack trace with max_count = 1 > and start_depth = depth/0 is the frame location and jvmtiFrameInfo > contains the correct information (+ add BCI)? Thus GetFrameLocation also > would use handshakes and no special handshake path for > GetCurrentLocation. > > So we would have _one_ function to get method and BCI/lineno for depth and max count. Which can easily handle all three cases? (maybe more > cases also) > > Is there any reason for having a separate path for each of these? > > #2 > In this method: > JvmtiEnvThreadState::reset_current_location(jvmtiEvent event_type, bool enabled) > > if (event_type == JVMTI_EVENT_SINGLE_STEP && _thread->has_last_Java_frame()) { > > We are checking if a running thread has a last Java frame, which means it could have one now, e.g. it could be in another handshake or not woken up from a safepoint yet. So there is no use in checking that. > (old code) > > 313 if (SafepointSynchronize::is_at_safepoint() || > 314 ((Thread::current() == _thread) && (_thread == _thread->active_handshaker()))) { > > #3 > You are using a debug-only method here, "active_handshaker()". > > #4 > This AND is never true: > ((Thread::current() == _thread) && (_thread == _thread->active_handshaker()))) > > You can't be the active handshaker for yourself. > > Thanks, Robbin > > On 2020-06-24 08:50, Yasumasa Suenaga wrote: >> Hi all, >> >> Please review this change: >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8242428 >>
webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/ >> >> This change replace following VM operations to direct handshake. >> >> ??- VM_GetFrameCount (GetFrameCount()) >> ??- VM_GetFrameLocation (GetFrameLocation()) >> ??- VM_GetThreadListStackTraces (GetThreadListStackTrace()) >> ??- VM_GetCurrentLocation >> >> GetThreadListStackTrace() uses direct handshake if thread count == 1. In other case (thread count > 1), it would be performed as VM operation (VM_GetThreadListStackTraces). >> Caller of VM_GetCurrentLocation (JvmtiEnvThreadState::reset_current_location()) might be called at safepoint. So I added safepoint check in its caller. >> >> This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/ns >> k/jdwp. >> >> Also I tested it on submit repo, then it has execution error (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to dependency error. So I think it does not occur by this change. >> >> >> Thanks, >> >> Yasumasa From robbin.ehn at oracle.com Tue Jun 30 13:03:11 2020 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 30 Jun 2020 15:03:11 +0200 Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake In-Reply-To: <46ec19dc-6f82-9d39-061c-0c703ed13527@oss.nttdata.com> References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <9e3f081f-be59-f818-8793-67c439b92598@oracle.com> <46ec19dc-6f82-9d39-061c-0c703ed13527@oss.nttdata.com> Message-ID: Hi Yasumasa, On 2020-06-30 14:35, Yasumasa Suenaga wrote: > Hi Robbin, > > We decided to separate thread operation and frame operation. > I've posted review request for thread operation. Could you review it? Yes I know but I'm soon off for vacation and you have not sent them all out and due to the nature of my comments replied here. > > ? http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/ > > We can share HandshakeClosure for GetStackTrace() to GetFrameLocation() > as you said. 
> However I wonder why it is not so now. > I guess GetStackTrace() would give some overhead (e.g. memory allocation > for jvmtiFrameInfo) if we use it for frame location. In the case of GetFrameLocation we only need one, and it's only a dozen bytes big, so I wouldn't count that as overhead here. If you are really paranoid you can stack-allocate it and pass it in. > > I thought we should replace VM operations with HandshakeClosure one by one. > I will merge these operations where possible in JDK-8248362 if we should do so. If you make the first one generic enough, or evolve it, you can convert each VM op to your new function instead, thus still doing it one by one. So in http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/ I really think: 1575 JvmtiEnv::GetThreadListStackTraces(jint thread_count, const jthread* thread_list, jint max_frame_count, jvmtiStackInfo** stack_info_ptr) { ..... 1578 if (thread_count == 1) { ===> // This case should be the same as JvmtiEnv::GetStackTrace 1592 } else { I do not see any issues, so it seems reasonable, thanks. Thanks, Robbin > > > Thanks, > > Yasumasa > > > On 2020/06/30 19:23, Robbin Ehn wrote: >> Hi Yasumasa, >> >> Thanks for your effort doing this. >> >> #1 >> GetFrameLocation >> GetStackTrace >> GetCurrentLocation (need to add BCI) >> >> Should use exactly the same code since a stack trace with max_count = 1 >> and start_depth = depth/0 is the frame location and jvmtiFrameInfo >> contains the correct information (+ add BCI)? Thus GetFrameLocation also >> would use handshakes and no special handshake path for >> GetCurrentLocation. >> >> So we would have _one_ function to get method and BCI/lineno for depth >> and max count. Which can easily handle all three cases? (maybe more >> cases also) >> >> Is there any reason for having a separate path for each of these?
>> >> #2 >> In this method: >> JvmtiEnvThreadState::reset_current_location(jvmtiEvent event_type, >> bool enabled) >> >> if (event_type == JVMTI_EVENT_SINGLE_STEP && >> _thread->has_last_Java_frame()) { >> >> We are checking if a running thread have a last Java frame, which >> means it could have one now, e.g. it could be in another handshake or >> not woken up from a safepoint yet. So there is no use in checking that. >> (old code) >> >> ??313?????? if (SafepointSynchronize::is_at_safepoint() || >> ??314?????????? ((Thread::current() == _thread) && (_thread == >> _thread->active_handshaker()))) { >> >> #3 >> You are using a debug only method here "active_handshaker()". >> >> #4 >> This AND is never true: >> ((Thread::current() == _thread) && (_thread == >> _thread->active_handshaker()))) >> >> You can't be active handshaker for yourself. >> >> Thanks, Robbin >> >> On 2020-06-24 08:50, Yasumasa Suenaga wrote: >>> Hi all, >>> >>> Please review this change: >>> >>> ?? JBS: https://bugs.openjdk.java.net/browse/JDK-8242428 >>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/ >>> >>> This change replace following VM operations to direct handshake. >>> >>> ??- VM_GetFrameCount (GetFrameCount()) >>> ??- VM_GetFrameLocation (GetFrameLocation()) >>> ??- VM_GetThreadListStackTraces (GetThreadListStackTrace()) >>> ??- VM_GetCurrentLocation >>> >>> GetThreadListStackTrace() uses direct handshake if thread count == 1. >>> In other case (thread count > 1), it would be performed as VM >>> operation (VM_GetThreadListStackTraces). >>> Caller of VM_GetCurrentLocation >>> (JvmtiEnvThreadState::reset_current_location()) might be called at >>> safepoint. So I added safepoint check in its caller. >>> >>> This change has been tested in serviceability/jvmti >>> serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi >>> vmTestbase/ns >>> k/jdwp. 
>>> >>> Also I tested it on submit repo, then it has execution error >>> (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to >>> dependency error. So I think it does not occur by this change. >>> >>> >>> Thanks, >>> >>> Yasumasa From david.holmes at oracle.com Tue Jun 30 13:22:05 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 30 Jun 2020 23:22:05 +1000 Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake In-Reply-To: <2ea2f5d8-5c43-a7d3-d525-cd44c4009d25@oss.nttdata.com> References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <2ea2f5d8-5c43-a7d3-d525-cd44c4009d25@oss.nttdata.com> Message-ID: <65889ab4-6b7e-86fb-59cc-012520b83138@oracle.com> Hi Yasumasa, On 30/06/2020 10:05 am, Yasumasa Suenaga wrote: > Hi David, Serguei, > > I updated webrev for 8242428. Could you review again? > This change migrate to use direct handshake for GetStackTrace() and > GetThreadListStackTraces() (when thread_count == 1). > > ? http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/ This looks really good now! I only have a few nits below. There is one thing I don't like about it but it requires a change to the main Handshake logic to address - in JvmtiEnv::GetThreadListStackTraces you have to create a ThreadsListHandle to convert the jthread to a JavaThread, but then the Handshake::execute_direct creates another ThreadsListHandle internally. That's a waste. I will discuss with Robbin and file a RFE to have an overload of execute_direct that takes an existing TLH. Actually it's worse than that because we have another TLH in use at the entry point for the JVMTI functions, so I think there may be some scope for simplifying the use of TLH instances - future RFE. 
--- src/hotspot/share/prims/jvmtiEnvBase.hpp 451 GetStackTraceClosure(JvmtiEnv *env, jint start_depth, jint max_count, 452 jvmtiFrameInfo* frame_buffer, jint* count_ptr) 453 : HandshakeClosure("GetStackTrace"), 454 _env(env), _start_depth(start_depth), _max_count(max_count), 455 _frame_buffer(frame_buffer), _count_ptr(count_ptr), 456 _result(JVMTI_ERROR_THREAD_NOT_ALIVE) { Nit: can you do one initializer per line please. This looks wrong: 466 class MultipleStackTracesCollector : public StackObj { 498 class VM_GetAllStackTraces : public VM_Operation { 499 private: 500 JavaThread *_calling_thread; 501 jint _final_thread_count; 502 MultipleStackTracesCollector _collector; You can't have a StackObj as a member of another class like that as it may not be on the stack. I think MultipleStackTracesCollector should not extend any allocation class, and should always be embedded directly in another class. 481 MultipleStackTracesCollector(JvmtiEnv *env, jint max_frame_count) { 482 _env = env; 483 _max_frame_count = max_frame_count; 484 _frame_count_total = 0; 485 _head = NULL; 486 _stack_info = NULL; 487 _result = JVMTI_ERROR_NONE; 488 } As you are touching this can you change it to use an initializer list as you did for the HandshakeClosure, and please keep one item per line. --- src/hotspot/share/prims/jvmtiEnvBase.cpp 820 assert(SafepointSynchronize::is_at_safepoint() || 821 java_thread->is_thread_fully_suspended(false, &debug_bits) || 822 current_thread == java_thread->active_handshaker(), 823 "at safepoint / handshake or target thread is suspended"); I don't think the suspension check is necessary, as even if the target is suspended we must still be at a safepoint or in a handshake with it. Makes me wonder if we used to allow a racy stacktrace operation on a suspended thread, assuming it would remain suspended? 1268 oop thread_oop = jt->threadObj(); 1269 1270 if (!jt->is_exiting() && (jt->threadObj() != NULL)) { You can use thread_oop in line 1270. 
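The constructor nits above (an initializer list with one item per line) and the advice to embed the collector directly in another class, without an allocation base class, can be sketched as follows. The class names, fields and the int result code are simplified, hypothetical stand-ins for MultipleStackTracesCollector and VM_GetAllStackTraces, not the HotSpot declarations.

```cpp
#include <cassert>

// Plain helper class: it extends no allocation base class, so it simply
// lives wherever its enclosing object lives (stack or heap).
class StackTracesCollectorSketch {
 public:
  explicit StackTracesCollectorSketch(int max_frame_count)
      : _max_frame_count(max_frame_count),
        _frame_count_total(0),
        _result(0) {
    assert(max_frame_count >= 0);
  }

  // Records up to _max_frame_count frames for one thread.
  void fill_frames(int frames_on_stack) {
    int recorded = frames_on_stack < _max_frame_count ? frames_on_stack
                                                      : _max_frame_count;
    _frame_count_total += recorded;
  }

  int frame_count_total() const { return _frame_count_total; }
  int result() const { return _result; }

 private:
  int _max_frame_count;
  int _frame_count_total;
  int _result;
};

// The operation embeds the collector by value as an ordinary member.
class GetAllStackTracesSketch {
 public:
  explicit GetAllStackTracesSketch(int max_frame_count)
      : _final_thread_count(0),
        _collector(max_frame_count) {}

  void visit_thread(int frames_on_stack) {
    _collector.fill_frames(frames_on_stack);
    ++_final_thread_count;
  }

  int final_thread_count() const { return _final_thread_count; }
  const StackTracesCollectorSketch& collector() const { return _collector; }

 private:
  int _final_thread_count;
  StackTracesCollectorSketch _collector;
};
```

Since the collector makes no claim about where it is allocated, the same class can be a member of a VM operation and of a handshake closure alike.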
1272 _collector.fill_frames((jthread)JNIHandles::make_local(_calling_thread, thread_oop), 1273 jt, thread_oop); It is frustrating that this entire call chain started with a jthread reference, which we converted to a JavaThread, only to eventually need to convert it back to a jthread! I think there is some scope for simplification here but not as part of this change. 1271 ResourceMark rm; IIUC at this point the _calling_thread is the current thread, so we can use: ResourceMark rm(_calling_thread); --- Please add @bug lines to the tests. I'm still pondering the test logic but wanted to send this now. Thanks, David ----- > VM_GetThreadListStackTrace (for GetThreadListStackTraces) and > VM_GetAllStackTraces (for GetAllStackTraces) have inherited > VM_GetMultipleStackTraces VM operation which provides the feature to > generate jvmtiStackInfo. I modified VM_GetMultipleStackTraces to a > normal C++ class to share with HandshakeClosure for > GetThreadListStackTraces (GetSingleStackTraceClosure). > > Also I added new testcases which test GetThreadListStackTraces() with > thread_count == 1 and with all threads. > > This change has been tested in serviceability/jvmti serviceability/jdwp > vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/nsk/jdwp. > > > Thanks, > > Yasumasa > > > On 2020/06/24 15:50, Yasumasa Suenaga wrote: >> Hi all, >> >> Please review this change: >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8242428 >> webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/ >> >> This change replaces the following VM operations with direct handshakes. >> >> - VM_GetFrameCount (GetFrameCount()) >> - VM_GetFrameLocation (GetFrameLocation()) >> - VM_GetThreadListStackTraces (GetThreadListStackTraces()) >> - VM_GetCurrentLocation >> >> GetThreadListStackTraces() uses direct handshake if thread count == 1. >> In the other case (thread count > 1), it would be performed as VM >> operation (VM_GetThreadListStackTraces).
>> Caller of VM_GetCurrentLocation >> (JvmtiEnvThreadState::reset_current_location()) might be called at >> safepoint. So I added safepoint check in its caller. >> >> This change has been tested in serviceability/jvmti >> serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/ns >> k/jdwp. >> >> Also I tested it on submit repo, then it has execution error >> (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to >> dependency error. So I think it does not occur by this change. >> >> >> Thanks, >> >> Yasumasa From david.holmes at oracle.com Tue Jun 30 13:28:18 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 30 Jun 2020 23:28:18 +1000 Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake In-Reply-To: References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <9e3f081f-be59-f818-8793-67c439b92598@oracle.com> <46ec19dc-6f82-9d39-061c-0c703ed13527@oss.nttdata.com> Message-ID: <56cbe665-e9c5-8cad-12e9-2f04dbb594fd@oracle.com> Hi Robbin, On 30/06/2020 11:03 pm, Robbin Ehn wrote: > Hi Yasumasa, > > On 2020-06-30 14:35, Yasumasa Suenaga wrote: >> Hi Robbin, >> >> We decided to separate thread operation and frame operation. >> I've posted review request for thread operation. Could you review it? > > Yes I know but I'm soon off for vacation and you have not sent them all > out and due to the nature of my comments replied here. > >> >> ?? http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/ >> >> We can share HandshakeClosure for GetStackTrace() to >> GetFrameLocation() as you said. >> However I wonder why it is not so now. >> I guess GetStackTrace() would give some overhead (e.g. memory >> allocation for jvmtiFrameInfo) if we use it for frame location. > > In case of GetFrameLocation we only need one and it's only a dozen bytes > big I wouldn't count that as an overhead in this case. > If you are really paranoid you can stack allocated it and pass it in. 
> >> >> I thought we should replace VM operations with HandshakeClosure one by one. >> I will merge these operations where possible in JDK-8248362 if we should do so. > > If you make the first one generic enough, or evolve it, you can convert each VM > op to your new function instead, thus still doing it one by one. > > So in http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/ I > really think: > > 1575 JvmtiEnv::GetThreadListStackTraces(jint thread_count, const > jthread* thread_list, jint max_frame_count, jvmtiStackInfo** > stack_info_ptr) { > ..... > 1578 if (thread_count == 1) { > ===> // This case should be the same as JvmtiEnv::GetStackTrace > 1592 } else { It effectively is, but it can't be exactly the same because the inputs and outputs are different. There is an opportunity for more refactoring and streamlining if we look at the flow of arguments through the whole call chain (as per my other email), but that's a bit beyond the scope of this initial conversion, I think. Cheers, David ----- > I do not see any issues, so it seems reasonable, thanks. > > Thanks, Robbin > >> >> >> Thanks, >> >> Yasumasa >> >> >> On 2020/06/30 19:23, Robbin Ehn wrote: >>> Hi Yasumasa, >>> >>> Thanks for your effort doing this. >>> >>> #1 >>> GetFrameLocation >>> GetStackTrace >>> GetCurrentLocation (need to add BCI) >>> >>> Should use exactly the same code since a stack trace with max_count = 1 >>> and start_depth = depth/0 is the frame location and jvmtiFrameInfo >>> contains the correct information (+ add BCI)? Thus GetFrameLocation also >>> would use handshakes and no special handshake path for >>> GetCurrentLocation. >>> >>> So we would have _one_ function to get method and BCI/lineno for >>> depth and max count. Which can easily handle all three cases? (maybe >>> more >>> cases also) >>> >>> Is there any reason for having a separate path for each of these?
>>> >>> #2 >>> In this method: >>> JvmtiEnvThreadState::reset_current_location(jvmtiEvent event_type, >>> bool enabled) >>> >>> if (event_type == JVMTI_EVENT_SINGLE_STEP && >>> _thread->has_last_Java_frame()) { >>> >>> We are checking if a running thread have a last Java frame, which >>> means it could have one now, e.g. it could be in another handshake or >>> not woken up from a safepoint yet. So there is no use in checking that. >>> (old code) >>> >>> ??313?????? if (SafepointSynchronize::is_at_safepoint() || >>> ??314?????????? ((Thread::current() == _thread) && (_thread == >>> _thread->active_handshaker()))) { >>> >>> #3 >>> You are using a debug only method here "active_handshaker()". >>> >>> #4 >>> This AND is never true: >>> ((Thread::current() == _thread) && (_thread == >>> _thread->active_handshaker()))) >>> >>> You can't be active handshaker for yourself. >>> >>> Thanks, Robbin >>> >>> On 2020-06-24 08:50, Yasumasa Suenaga wrote: >>>> Hi all, >>>> >>>> Please review this change: >>>> >>>> ?? JBS: https://bugs.openjdk.java.net/browse/JDK-8242428 >>>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/ >>>> >>>> This change replace following VM operations to direct handshake. >>>> >>>> ??- VM_GetFrameCount (GetFrameCount()) >>>> ??- VM_GetFrameLocation (GetFrameLocation()) >>>> ??- VM_GetThreadListStackTraces (GetThreadListStackTrace()) >>>> ??- VM_GetCurrentLocation >>>> >>>> GetThreadListStackTrace() uses direct handshake if thread count == >>>> 1. In other case (thread count > 1), it would be performed as VM >>>> operation (VM_GetThreadListStackTraces). >>>> Caller of VM_GetCurrentLocation >>>> (JvmtiEnvThreadState::reset_current_location()) might be called at >>>> safepoint. So I added safepoint check in its caller. >>>> >>>> This change has been tested in serviceability/jvmti >>>> serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi >>>> vmTestbase/ns >>>> k/jdwp. 
>>>>
>>>> Also I tested it on submit repo, then it has execution error
>>>> (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to
>>>> dependency error. So I think it does not occur by this change.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa

From suenaga at oss.nttdata.com Tue Jun 30 14:17:20 2020
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Tue, 30 Jun 2020 23:17:20 +0900
Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake
In-Reply-To: <65889ab4-6b7e-86fb-59cc-012520b83138@oracle.com>
References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <2ea2f5d8-5c43-a7d3-d525-cd44c4009d25@oss.nttdata.com> <65889ab4-6b7e-86fb-59cc-012520b83138@oracle.com>
Message-ID: <4b7df6f8-c8f7-fedf-6366-b885d2235b58@oss.nttdata.com>

Hi David,

Thank you for reviewing! I will upload a new webrev tomorrow.

>  466 class MultipleStackTracesCollector : public StackObj {
>
>  498 class VM_GetAllStackTraces : public VM_Operation {
>  499 private:
>  500   JavaThread *_calling_thread;
>  501   jint _final_thread_count;
>  502   MultipleStackTracesCollector _collector;
>
> You can't have a StackObj as a member of another class like that as it may not be on the stack. I think MultipleStackTracesCollector should not extend any allocation class, and should always be embedded directly in another class.

I'm not sure what "embedded" means.
Is it ok as below?

```
class MultipleStackTracesCollector {
    :
}

class GetAllStackTraces : public VM_Operation {
  private:
    MultipleStackTracesCollector _collector;
}
```

Thanks,

Yasumasa


On 2020/06/30 22:22, David Holmes wrote:
> Hi Yasumasa,
>
> On 30/06/2020 10:05 am, Yasumasa Suenaga wrote:
>> Hi David, Serguei,
>>
>> I updated webrev for 8242428. Could you review again?
>> This change migrates GetStackTrace() and GetThreadListStackTraces() (when thread_count == 1) to direct handshakes.
>>
>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/
>
> This looks really good now! I only have a few nits below. There is one thing I don't like about it but it requires a change to the main Handshake logic to address - in JvmtiEnv::GetThreadListStackTraces you have to create a ThreadsListHandle to convert the jthread to a JavaThread, but then the Handshake::execute_direct creates another ThreadsListHandle internally. That's a waste. I will discuss with Robbin and file an RFE to have an overload of execute_direct that takes an existing TLH. Actually it's worse than that because we have another TLH in use at the entry point for the JVMTI functions, so I think there may be some scope for simplifying the use of TLH instances - future RFE.
>
> ---
>
> src/hotspot/share/prims/jvmtiEnvBase.hpp
>
>  451   GetStackTraceClosure(JvmtiEnv *env, jint start_depth, jint max_count,
>  452                        jvmtiFrameInfo* frame_buffer, jint* count_ptr)
>  453     : HandshakeClosure("GetStackTrace"),
>  454       _env(env), _start_depth(start_depth), _max_count(max_count),
>  455       _frame_buffer(frame_buffer), _count_ptr(count_ptr),
>  456       _result(JVMTI_ERROR_THREAD_NOT_ALIVE) {
>
> Nit: can you do one initializer per line please.
>
> This looks wrong:
>
>  466 class MultipleStackTracesCollector : public StackObj {
>
>  498 class VM_GetAllStackTraces : public VM_Operation {
>  499 private:
>  500   JavaThread *_calling_thread;
>  501   jint _final_thread_count;
>  502   MultipleStackTracesCollector _collector;
>
> You can't have a StackObj as a member of another class like that as it may not be on the stack. I think MultipleStackTracesCollector should not extend any allocation class, and should always be embedded directly in another class.
>
>  481   MultipleStackTracesCollector(JvmtiEnv *env, jint max_frame_count) {
>  482     _env = env;
>  483     _max_frame_count = max_frame_count;
>  484     _frame_count_total = 0;
>  485     _head = NULL;
>  486     _stack_info = NULL;
>  487     _result = JVMTI_ERROR_NONE;
>  488   }
>
> As you are touching this can you change it to use an initializer list as you did for the HandshakeClosure, and please keep one item per line.
>
> ---
>
> src/hotspot/share/prims/jvmtiEnvBase.cpp
>
>  820   assert(SafepointSynchronize::is_at_safepoint() ||
>  821          java_thread->is_thread_fully_suspended(false, &debug_bits) ||
>  822          current_thread == java_thread->active_handshaker(),
>  823          "at safepoint / handshake or target thread is suspended");
>
> I don't think the suspension check is necessary, as even if the target is suspended we must still be at a safepoint or in a handshake with it. Makes me wonder if we used to allow a racy stacktrace operation on a suspended thread, assuming it would remain suspended?
>
> 1268   oop thread_oop = jt->threadObj();
> 1269
> 1270   if (!jt->is_exiting() && (jt->threadObj() != NULL)) {
>
> You can use thread_oop in line 1270.
>
> 1272     _collector.fill_frames((jthread)JNIHandles::make_local(_calling_thread, thread_oop),
> 1273                            jt, thread_oop);
>
> It is frustrating that this entire call chain started with a jthread reference, which we converted to a JavaThread, only to eventually need to convert it back to a jthread! I think there is some scope for simplification here but not as part of this change.
>
> 1271     ResourceMark rm;
>
> IIUC at this point the _calling_thread is the current thread, so we can use:
>
>     ResourceMark rm(_calling_thread);
>
> ---
>
> Please add @bug lines to the tests.
>
> I'm still pondering the test logic but wanted to send this now.
>
> Thanks,
> David
> -----
>> VM_GetThreadListStackTrace (for GetThreadListStackTraces) and VM_GetAllStackTraces (for GetAllStackTraces) have inherited VM_GetMultipleStackTraces VM operation which provides the feature to generate jvmtiStackInfo. I modified VM_GetMultipleStackTraces to a normal C++ class to share with HandshakeClosure for GetThreadListStackTraces (GetSingleStackTraceClosure).
>>
>> Also I added new testcases which test GetThreadListStackTraces() with thread_count == 1 and with all threads.
>>
>> This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/nsk/jdwp.
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>> On 2020/06/24 15:50, Yasumasa Suenaga wrote:
>>> Hi all,
>>>
>>> Please review this change:
>>>
>>>    JBS: https://bugs.openjdk.java.net/browse/JDK-8242428
>>>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/
>>>
>>> This change replaces the following VM operations with direct handshakes.
>>>
>>>   - VM_GetFrameCount (GetFrameCount())
>>>   - VM_GetFrameLocation (GetFrameLocation())
>>>   - VM_GetThreadListStackTraces (GetThreadListStackTrace())
>>>   - VM_GetCurrentLocation
>>>
>>> GetThreadListStackTrace() uses direct handshake if thread count == 1. In the other case (thread count > 1), it is performed as a VM operation (VM_GetThreadListStackTraces).
>>> The caller of VM_GetCurrentLocation (JvmtiEnvThreadState::reset_current_location()) might be called at a safepoint, so I added a safepoint check in its caller.
>>>
>>> This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/nsk/jdwp.
>>>
>>> I also tested it on the submit repo, which reported an execution error (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to a dependency error, so I think it is not caused by this change.
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa

From hohensee at amazon.com Tue Jun 30 15:51:13 2020
From: hohensee at amazon.com (Hohensee, Paul)
Date: Tue, 30 Jun 2020 15:51:13 +0000
Subject: RFR: 8205467: javax/management/remote/mandatory/connection/MultiThreadDeadLockTest.java possible deadlock
Message-ID:

Lgtm.

Thanks,
Paul

On 6/29/20, 12:58 PM, "serviceability-dev on behalf of Daniil Titov" wrote:

    Please review a tiny change that adjusts the wait timeout the test uses for "test.timeout.factor" system property.
    Please note that a trivial merge with fix [4] that is currently on review [3] will be required. Since issues [2] and [4] describe different problems I decided not to combine both changes in a single fix.

    Testing: Mach5 tests tier1-tier3 successfully passed.

    [1] Web rev: https://cr.openjdk.java.net/~dtitov/8205467/webrev.01/
    [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8205467
    [3] https://mail.openjdk.java.net/pipermail/serviceability-dev/2020-June/032098.html
    [4] https://bugs.openjdk.java.net/browse/JDK-8227337

    Thank you,
    Daniil

From hohensee at amazon.com Tue Jun 30 15:49:57 2020
From: hohensee at amazon.com (Hohensee, Paul)
Date: Tue, 30 Jun 2020 15:49:57 +0000
Subject: RFR: 8227337: javax/management/remote/mandatory/connection/ReconnectTest.java NoSuchObjectException no such object in table
Message-ID: <295E28BE-DB14-4332-88F1-765BBB76BC5A@amazon.com>

The JBS issue is non-public, but this looks fine. I assume you set test.timeout.factor to something larger than 1.0 when you run MultiThreadDeadLockTest.

Thanks,
Paul

On 6/29/20, 12:43 PM, "serviceability-dev on behalf of Daniil Titov" wrote:

    Please review the change that fixes an intermittent test failure. The tests javax/management/remote/mandatory/connection/ReconnectTest.java and javax/management/remote/mandatory/connection/MultiThreadDeadLockTest.java use specific settings for server timeout that in some cases (e.g. when the test is run with -Xcomp) result in the JMX server connection timeout thread unexporting the remote object while the client connection is still in progress.
    Below is an example of such a stack trace:

    java.rmi.NoSuchObjectException: no such object in table
        at java.rmi/sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:303)
        at java.rmi/sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:279)
        at java.rmi/sun.rmi.server.UnicastRef.invoke(UnicastRef.java:164)
        at jdk.remoteref/jdk.jmx.remote.internal.rmi.PRef.invoke(Unknown Source)
        at java.management.rmi/javax.management.remote.rmi.RMIConnectionImpl_Stub.getConnectionId(RMIConnectionImpl_Stub.java:318)
        at java.management.rmi/javax.management.remote.rmi.RMIConnector.getConnectionId(RMIConnector.java:385)
        at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:347)
        at java.management/javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
        at MultiThreadDeadLockTest.main(MultiThreadDeadLockTest.java:87)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
        at java.base/java.lang.Thread.run(Thread.java:832)

    The fix adjusts the server timeout the tests use to account for the "test.timeout.factor" system property.

    Testing: Mach5 tests are in progress.
    [1] https://cr.openjdk.java.net/~dtitov/8227337/webrev.01/
    [2] https://bugs.openjdk.java.net/browse/JDK-8227337

    Thanks,
    Daniil

From serguei.spitsyn at oracle.com Tue Jun 30 18:57:08 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 30 Jun 2020 11:57:08 -0700
Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
In-Reply-To:
References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com>
Message-ID: <48ad7cf5-af7d-1593-a5c9-0c0891bd2ce2@oracle.com>

Hi Chris,

I do not see any problems with this change.

Thanks,
Serguei


On 6/25/20 13:29, Chris Plummer wrote:
> Ping. I still need one more review for this. There was one updated
> webrev. I list it below so you don't need to dig it up in the long
> email thread:
>> I've updated the webrev based on the new finding that a JavaThread
>> cannot be on the ThreadList after its OS thread has been destroyed
>> since the JavaThread removes itself from the ThreadList, and
>> therefore must be running on its OS thread. The logic of the fix is
>> unchanged from the first webrev, but I updated the comments to better
>> reflect what is going on. I also updated the CR:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8247533
>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html
>
> thanks,
>
> Chris
>
> On 6/17/20 1:34 PM, Chris Plummer wrote:
>> Hello,
>>
>> Please help review the following:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8247533
>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html
>>
>> The CR contains all the needed details. Here's a summary of changes
>> in each file:
>>
>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp
>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m
>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp
>> -Instead of throwing an exception when the OS ThreadID is invalid,
>> print a warning.
>>
>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c
>> -Improve a print_debug message
>>
>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java
>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java
>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java
>> -Deal with the array of registers read in being null due to the OS
>> ThreadID not being valid.
>>
>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java
>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java
>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException"
>> appearing twice when printing the exception.
>>
>> thanks,
>>
>> Chris
>
>

From chris.plummer at oracle.com Tue Jun 30 19:25:17 2020
From: chris.plummer at oracle.com (Chris Plummer)
Date: Tue, 30 Jun 2020 12:25:17 -0700
Subject: RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp
In-Reply-To: <48ad7cf5-af7d-1593-a5c9-0c0891bd2ce2@oracle.com>
References: <5e3727cd-dee9-abdf-fe7a-657fe6a4c845@oracle.com> <48ad7cf5-af7d-1593-a5c9-0c0891bd2ce2@oracle.com>
Message-ID: <28c2c116-7aca-6858-f964-841a30f9cdc5@oracle.com>

Thanks!

On 6/30/20 11:57 AM, serguei.spitsyn at oracle.com wrote:
> Hi Chris,
>
> I do not see any problems with this change.
>
> Thanks,
> Serguei
>
>
> On 6/25/20 13:29, Chris Plummer wrote:
>> Ping. I still need one more review for this. There was one updated
>> webrev. I list it below so you don't need to dig it up in the long
>> email thread:
>>> I've updated the webrev based on the new finding that a JavaThread
>>> cannot be on the ThreadList after its OS thread has been destroyed
>>> since the JavaThread removes itself from the ThreadList, and
>>> therefore must be running on its OS thread. The logic of the fix is
>>> unchanged from the first webrev, but I updated the comments to
>>> better reflect what is going on. I also updated the CR:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8247533
>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html
>>
>> thanks,
>>
>> Chris
>>
>> On 6/17/20 1:34 PM, Chris Plummer wrote:
>>> Hello,
>>>
>>> Please help review the following:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8247533
>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html
>>>
>>> The CR contains all the needed details. Here's a summary of changes
>>> in each file:
>>>
>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp
>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m
>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp
>>> -Instead of throwing an exception when the OS ThreadID is invalid,
>>> print a warning.
>>>
>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c
>>> -Improve a print_debug message
>>>
>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java
>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java
>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java
>>> -Deal with the array of registers read in being null due to the OS
>>> ThreadID not being valid.
>>>
>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java
>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java
>>> -Fix issue with "sun.jvm.hotspot.debugger.DebuggerException"
>>> appearing twice when printing the exception.
>>>
>>> thanks,
>>>
>>> Chris
>>
>>
>

From david.holmes at oracle.com Tue Jun 30 22:05:23 2020
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 1 Jul 2020 08:05:23 +1000
Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake
In-Reply-To: <4b7df6f8-c8f7-fedf-6366-b885d2235b58@oss.nttdata.com>
References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <2ea2f5d8-5c43-a7d3-d525-cd44c4009d25@oss.nttdata.com> <65889ab4-6b7e-86fb-59cc-012520b83138@oracle.com> <4b7df6f8-c8f7-fedf-6366-b885d2235b58@oss.nttdata.com>
Message-ID:

On 1/07/2020 12:17 am, Yasumasa Suenaga wrote:
> Hi David,
>
> Thank you for reviewing! I will upload a new webrev tomorrow.
>
>>  466 class MultipleStackTracesCollector : public StackObj {
>>
>>  498 class VM_GetAllStackTraces : public VM_Operation {
>>  499 private:
>>  500   JavaThread *_calling_thread;
>>  501   jint _final_thread_count;
>>  502   MultipleStackTracesCollector _collector;
>>
>> You can't have a StackObj as a member of another class like that as it
>> may not be on the stack. I think MultipleStackTracesCollector should
>> not extend any allocation class, and should always be embedded
>> directly in another class.
>
> I'm not sure what "embedded" means.
> Is it ok as below?
>
> ```
> class MultipleStackTracesCollector {
>     :
> }
>
> class GetAllStackTraces : public VM_Operation {
>   private:
>     MultipleStackTracesCollector _collector;
> }
> ```

Yes, that is what I meant.

Thanks,
David
-----

>
> Thanks,
>
> Yasumasa
>
>
> On 2020/06/30 22:22, David Holmes wrote:
>> Hi Yasumasa,
>>
>> On 30/06/2020 10:05 am, Yasumasa Suenaga wrote:
>>> Hi David, Serguei,
>>>
>>> I updated webrev for 8242428. Could you review again?
>>> This change migrates GetStackTrace() and GetThreadListStackTraces()
>>> (when thread_count == 1) to direct handshakes.
>>>
>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/
>>
>> This looks really good now! I only have a few nits below.
There is one >> thing I don't like about it but it requires a change to the main >> Handshake logic to address - in JvmtiEnv::GetThreadListStackTraces you >> have to create a ThreadsListHandle to convert the jthread to a >> JavaThread, but then the Handshake::execute_direct creates another >> ThreadsListHandle internally. That's a waste. I will discuss with >> Robbin and file a RFE to have an overload of execute_direct that takes >> an existing TLH. Actually it's worse than that because we have another >> TLH in use at the entry point for the JVMTI functions, so I think >> there may be some scope for simplifying the use of TLH instances - >> future RFE. >> >> --- >> >> src/hotspot/share/prims/jvmtiEnvBase.hpp >> >> ??451?? GetStackTraceClosure(JvmtiEnv *env, jint start_depth, jint >> max_count, >> ??452??????????????????????? jvmtiFrameInfo* frame_buffer, jint* >> count_ptr) >> ??453???? : HandshakeClosure("GetStackTrace"), >> ??454?????? _env(env), _start_depth(start_depth), _max_count(max_count), >> ??455?????? _frame_buffer(frame_buffer), _count_ptr(count_ptr), >> ??456?????? _result(JVMTI_ERROR_THREAD_NOT_ALIVE) { >> >> Nit: can you do one initializer per line please. >> >> This looks wrong: >> >> 466 class MultipleStackTracesCollector : public StackObj { >> >> ??498 class VM_GetAllStackTraces : public VM_Operation { >> ??499 private: >> ??500?? JavaThread *_calling_thread; >> ??501?? jint _final_thread_count; >> ??502?? MultipleStackTracesCollector _collector; >> >> You can't have a StackObj as a member of another class like that as it >> may not be on the stack. I think MultipleStackTracesCollector should >> not extend any allocation class, and should always be embedded >> directly in another class. >> >> 481?? MultipleStackTracesCollector(JvmtiEnv *env, jint max_frame_count) { >> ??482???? _env = env; >> ??483???? _max_frame_count = max_frame_count; >> ??484???? _frame_count_total = 0; >> ??485???? _head = NULL; >> ??486???? _stack_info = NULL; >> ??487???? 
_result = JVMTI_ERROR_NONE; >> ??488?? } >> >> As you are touching this can you change it to use an initializer list >> as you did for the HandshakeClosure, and please keep one item per line. >> >> --- >> >> src/hotspot/share/prims/jvmtiEnvBase.cpp >> >> ??820?? assert(SafepointSynchronize::is_at_safepoint() || >> ??821????????? java_thread->is_thread_fully_suspended(false, >> &debug_bits) || >> ??822????????? current_thread == java_thread->active_handshaker(), >> ??823????????? "at safepoint / handshake or target thread is suspended"); >> >> I don't think the suspension check is necessary, as even if the target >> is suspended we must still be at a safepoint or in a handshake with >> it. Makes me wonder if we used to allow a racy stacktrace operation on >> a suspended thread, assuming it would remain suspended? >> >> 1268?? oop thread_oop = jt->threadObj(); >> 1269 >> 1270?? if (!jt->is_exiting() && (jt->threadObj() != NULL)) { >> >> You can use thread_oop in line 1270. >> >> 1272 >> _collector.fill_frames((jthread)JNIHandles::make_local(_calling_thread, thread_oop), >> >> 1273??????????????????????????? jt, thread_oop); >> >> It is frustrating that this entire call chain started with a jthread >> reference, which we converted to a JavaThread, only to eventually need >> to convert it back to a jthread! I think there is some scope for >> simplification here but not as part of this change. >> >> 1271???? ResourceMark rm; >> >> IIUC at this point the _calling_thread is the current thread, so we >> can use: >> >> ???? ResourceMark rm(_calling_thread); >> >> --- >> >> Please add @bug lines to the tests. >> >> I'm still pondering the test logic but wanted to send this now. >> >> Thanks, >> David >> ----- >>> VM_GetThreadListStackTrace (for GetThreadListStackTraces) and >>> VM_GetAllStackTraces (for GetAllStackTraces) have inherited >>> VM_GetMultipleStackTraces VM operation which provides the feature to >>> generate jvmtiStackInfo. I modified? 
VM_GetMultipleStackTraces to a >>> normal C++ class to share with HandshakeClosure for >>> GetThreadListStackTraces (GetSingleStackTraceClosure). >>> >>> Also I added new testcases which test GetThreadListStackTraces() with >>> thread_count == 1 and with all threads. >>> >>> This change has been tested in serviceability/jvmti >>> serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi >>> vmTestbase/nsk/jdwp. >>> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>> On 2020/06/24 15:50, Yasumasa Suenaga wrote: >>>> Hi all, >>>> >>>> Please review this change: >>>> >>>> ?? JBS: https://bugs.openjdk.java.net/browse/JDK-8242428 >>>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/ >>>> >>>> This change replace following VM operations to direct handshake. >>>> >>>> ??- VM_GetFrameCount (GetFrameCount()) >>>> ??- VM_GetFrameLocation (GetFrameLocation()) >>>> ??- VM_GetThreadListStackTraces (GetThreadListStackTrace()) >>>> ??- VM_GetCurrentLocation >>>> >>>> GetThreadListStackTrace() uses direct handshake if thread count == >>>> 1. In other case (thread count > 1), it would be performed as VM >>>> operation (VM_GetThreadListStackTraces). >>>> Caller of VM_GetCurrentLocation >>>> (JvmtiEnvThreadState::reset_current_location()) might be called at >>>> safepoint. So I added safepoint check in its caller. >>>> >>>> This change has been tested in serviceability/jvmti >>>> serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi >>>> vmTestbase/ns >>>> k/jdwp. >>>> >>>> Also I tested it on submit repo, then it has execution error >>>> (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to >>>> dependency error. So I think it does not occur by this change. 
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa

From suenaga at oss.nttdata.com Tue Jun 30 23:05:31 2020
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Wed, 1 Jul 2020 08:05:31 +0900
Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake
In-Reply-To:
References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <2ea2f5d8-5c43-a7d3-d525-cd44c4009d25@oss.nttdata.com> <65889ab4-6b7e-86fb-59cc-012520b83138@oracle.com> <4b7df6f8-c8f7-fedf-6366-b885d2235b58@oss.nttdata.com>
Message-ID:

Hi David,

>>> 1271     ResourceMark rm;
>>>
>>> IIUC at this point the _calling_thread is the current thread, so we can use:
>>>
>>>     ResourceMark rm(_calling_thread);

If so, we can call make_local() in L1272 without JavaThread (or we can pass current thread to make_local()). Is that right?

```
1271     ResourceMark rm;
1272     _collector.fill_frames((jthread)JNIHandles::make_local(_calling_thread, thread_oop),
1273                            jt, thread_oop);
```

Thanks,

Yasumasa


On 2020/07/01 7:05, David Holmes wrote:
> On 1/07/2020 12:17 am, Yasumasa Suenaga wrote:
>> Hi David,
>>
>> Thank you for reviewing! I will upload a new webrev tomorrow.
>>
>>>  466 class MultipleStackTracesCollector : public StackObj {
>>>
>>>  498 class VM_GetAllStackTraces : public VM_Operation {
>>>  499 private:
>>>  500   JavaThread *_calling_thread;
>>>  501   jint _final_thread_count;
>>>  502   MultipleStackTracesCollector _collector;
>>>
>>> You can't have a StackObj as a member of another class like that as it may not be on the stack. I think MultipleStackTracesCollector should not extend any allocation class, and should always be embedded directly in another class.
>>
>> I'm not sure what "embedded" means.
>> Is it ok as below?
>>
>> ```
>> class MultipleStackTracesCollector {
>>     :
>> }
>>
>> class GetAllStackTraces : public VM_Operation {
>>   private:
>>     MultipleStackTracesCollector _collector;
>> }
>> ```
>
> Yes, that is what I meant.
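
For readers following the "embedded" point quoted above, here is a minimal standalone sketch of the arrangement. It is not HotSpot code: StackOnly, Holder, TraceCollector and GetAllTracesOp are hypothetical stand-ins, and deleting operator new is only an assumed way of mimicking a stack-only allocation base class. The sketch illustrates two things from the review: a stack-only member does not stop its enclosing object from being heap-allocated, and a plain collector can be embedded by value and set up with one initializer per line.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a stack-only allocation base class. Deleting
// operator new is an assumption used here to mimic "must live on the stack".
struct StackOnly {
  static void* operator new(std::size_t) = delete;
  static void* operator new[](std::size_t) = delete;
};

// The restriction applies to StackOnly itself, not to a class that merely
// contains one: Holder is allocated with the global operator new.
struct Holder {
  StackOnly member;
};

// Hypothetical collector that extends no allocation class at all.
class TraceCollector {
 public:
  explicit TraceCollector(int max_frame_count)
    : _max_frame_count(max_frame_count),
      _frame_count_total(0) {}

  // Record at most _max_frame_count frames for one thread.
  void add_frames(int frames) {
    assert(frames >= 0);
    _frame_count_total += (frames < _max_frame_count) ? frames : _max_frame_count;
  }

  int frame_count_total() const { return _frame_count_total; }

 private:
  int _max_frame_count;
  int _frame_count_total;
};

// Hypothetical operation that embeds the collector directly as a member.
class GetAllTracesOp {
 public:
  explicit GetAllTracesOp(int max_frame_count)
    : _final_thread_count(0),
      _collector(max_frame_count) {}

  void visit_thread(int frames) {
    _collector.add_frames(frames);
    ++_final_thread_count;
  }

  int final_thread_count() const { return _final_thread_count; }
  int frame_count_total() const { return _collector.frame_count_total(); }

 private:
  int _final_thread_count;
  TraceCollector _collector;
};

// The enclosing object ends up on the heap even though its member's type
// forbids heap allocation, so the compiler does not catch the misuse.
bool holder_can_live_on_heap() {
  std::unique_ptr<Holder> h(new Holder());
  return h != nullptr;
}

// Heap-allocate the operation; the embedded collector lives inside it.
int demo_total_frames() {
  std::unique_ptr<GetAllTracesOp> op(new GetAllTracesOp(4));
  op->visit_thread(3);   // 3 frames recorded
  op->visit_thread(10);  // capped at 4 frames
  return op->frame_count_total();
}
```

Because TraceCollector makes no claim about where it lives, it is equally usable as a member of a heap-allocated operation or of a closure on the stack, which is the sharing the change is after.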
> > Thanks, > David > ----- > >> >> Thanks, >> >> Yasumasa >> >> >> On 2020/06/30 22:22, David Holmes wrote: >>> Hi Yasumasa, >>> >>> On 30/06/2020 10:05 am, Yasumasa Suenaga wrote: >>>> Hi David, Serguei, >>>> >>>> I updated webrev for 8242428. Could you review again? >>>> This change migrate to use direct handshake for GetStackTrace() and GetThreadListStackTraces() (when thread_count == 1). >>>> >>>> ?? http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/ >>> >>> This looks really good now! I only have a few nits below. There is one thing I don't like about it but it requires a change to the main Handshake logic to address - in JvmtiEnv::GetThreadListStackTraces you have to create a ThreadsListHandle to convert the jthread to a JavaThread, but then the Handshake::execute_direct creates another ThreadsListHandle internally. That's a waste. I will discuss with Robbin and file a RFE to have an overload of execute_direct that takes an existing TLH. Actually it's worse than that because we have another TLH in use at the entry point for the JVMTI functions, so I think there may be some scope for simplifying the use of TLH instances - future RFE. >>> >>> --- >>> >>> src/hotspot/share/prims/jvmtiEnvBase.hpp >>> >>> ??451?? GetStackTraceClosure(JvmtiEnv *env, jint start_depth, jint max_count, >>> ??452??????????????????????? jvmtiFrameInfo* frame_buffer, jint* count_ptr) >>> ??453???? : HandshakeClosure("GetStackTrace"), >>> ??454?????? _env(env), _start_depth(start_depth), _max_count(max_count), >>> ??455?????? _frame_buffer(frame_buffer), _count_ptr(count_ptr), >>> ??456?????? _result(JVMTI_ERROR_THREAD_NOT_ALIVE) { >>> >>> Nit: can you do one initializer per line please. >>> >>> This looks wrong: >>> >>> 466 class MultipleStackTracesCollector : public StackObj { >>> >>> ??498 class VM_GetAllStackTraces : public VM_Operation { >>> ??499 private: >>> ??500?? JavaThread *_calling_thread; >>> ??501?? jint _final_thread_count; >>> ??502?? 
MultipleStackTracesCollector _collector; >>> >>> You can't have a StackObj as a member of another class like that as it may not be on the stack. I think MultipleStackTracesCollector should not extend any allocation class, and should always be embedded directly in another class. >>> >>> 481?? MultipleStackTracesCollector(JvmtiEnv *env, jint max_frame_count) { >>> ??482???? _env = env; >>> ??483???? _max_frame_count = max_frame_count; >>> ??484???? _frame_count_total = 0; >>> ??485???? _head = NULL; >>> ??486???? _stack_info = NULL; >>> ??487???? _result = JVMTI_ERROR_NONE; >>> ??488?? } >>> >>> As you are touching this can you change it to use an initializer list as you did for the HandshakeClosure, and please keep one item per line. >>> >>> --- >>> >>> src/hotspot/share/prims/jvmtiEnvBase.cpp >>> >>> ??820?? assert(SafepointSynchronize::is_at_safepoint() || >>> ??821????????? java_thread->is_thread_fully_suspended(false, &debug_bits) || >>> ??822????????? current_thread == java_thread->active_handshaker(), >>> ??823????????? "at safepoint / handshake or target thread is suspended"); >>> >>> I don't think the suspension check is necessary, as even if the target is suspended we must still be at a safepoint or in a handshake with it. Makes me wonder if we used to allow a racy stacktrace operation on a suspended thread, assuming it would remain suspended? >>> >>> 1268?? oop thread_oop = jt->threadObj(); >>> 1269 >>> 1270?? if (!jt->is_exiting() && (jt->threadObj() != NULL)) { >>> >>> You can use thread_oop in line 1270. >>> >>> 1272 _collector.fill_frames((jthread)JNIHandles::make_local(_calling_thread, thread_oop), >>> 1273??????????????????????????? jt, thread_oop); >>> >>> It is frustrating that this entire call chain started with a jthread reference, which we converted to a JavaThread, only to eventually need to convert it back to a jthread! I think there is some scope for simplification here but not as part of this change. >>> >>> 1271???? 
ResourceMark rm; >>> >>> IIUC at this point the _calling_thread is the current thread, so we can use: >>> >>> ???? ResourceMark rm(_calling_thread); >>> >>> --- >>> >>> Please add @bug lines to the tests. >>> >>> I'm still pondering the test logic but wanted to send this now. >>> >>> Thanks, >>> David >>> ----- >>>> VM_GetThreadListStackTrace (for GetThreadListStackTraces) and VM_GetAllStackTraces (for GetAllStackTraces) have inherited VM_GetMultipleStackTraces VM operation which provides the feature to generate jvmtiStackInfo. I modified? VM_GetMultipleStackTraces to a normal C++ class to share with HandshakeClosure for GetThreadListStackTraces (GetSingleStackTraceClosure). >>>> >>>> Also I added new testcases which test GetThreadListStackTraces() with thread_count == 1 and with all threads. >>>> >>>> This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/nsk/jdwp. >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>> On 2020/06/24 15:50, Yasumasa Suenaga wrote: >>>>> Hi all, >>>>> >>>>> Please review this change: >>>>> >>>>> ?? JBS: https://bugs.openjdk.java.net/browse/JDK-8242428 >>>>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/ >>>>> >>>>> This change replace following VM operations to direct handshake. >>>>> >>>>> ??- VM_GetFrameCount (GetFrameCount()) >>>>> ??- VM_GetFrameLocation (GetFrameLocation()) >>>>> ??- VM_GetThreadListStackTraces (GetThreadListStackTrace()) >>>>> ??- VM_GetCurrentLocation >>>>> >>>>> GetThreadListStackTrace() uses direct handshake if thread count == 1. In other case (thread count > 1), it would be performed as VM operation (VM_GetThreadListStackTraces). >>>>> Caller of VM_GetCurrentLocation (JvmtiEnvThreadState::reset_current_location()) might be called at safepoint. So I added safepoint check in its caller. 
>>>>>
>>>>> This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/nsk/jdwp.
>>>>>
>>>>> Also I tested it on submit repo, then it has execution error (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to dependency error. So I think it does not occur by this change.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa

From serguei.spitsyn at oracle.com Tue Jun 30 23:24:09 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 30 Jun 2020 16:24:09 -0700
Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake
In-Reply-To: <2ea2f5d8-5c43-a7d3-d525-cd44c4009d25@oss.nttdata.com>
References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <2ea2f5d8-5c43-a7d3-d525-cd44c4009d25@oss.nttdata.com>
Message-ID: <6f46bfec-7135-1580-63cd-668b4d53ff48@oracle.com>

An HTML attachment was scrubbed...
URL:

From david.holmes at oracle.com Tue Jun 30 23:48:48 2020
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 1 Jul 2020 09:48:48 +1000
Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake
In-Reply-To:
References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <2ea2f5d8-5c43-a7d3-d525-cd44c4009d25@oss.nttdata.com> <65889ab4-6b7e-86fb-59cc-012520b83138@oracle.com> <4b7df6f8-c8f7-fedf-6366-b885d2235b58@oss.nttdata.com>
Message-ID:

Hi Yasumasa,

On 1/07/2020 9:05 am, Yasumasa Suenaga wrote:
> Hi David,
>
>>>> 1271     ResourceMark rm;
>>>>
>>>> IIUC at this point the _calling_thread is the current thread, so we
>>>> can use:
>>>>
>>>>     ResourceMark rm(_calling_thread);
>
> If so, we can call make_local() in L1272 without JavaThread (or we can
> pass current thread to make_local()). Is that right?
>
> ```
> 1271     ResourceMark rm;
> 1272     _collector.fill_frames((jthread)JNIHandles::make_local(_calling_thread, thread_oop),
> 1273                            jt, thread_oop);
> ```

Sorry, I got confused: _calling_thread may not be the current thread, as we could be executing the handshake in the target thread itself. So the ResourceMark is correct as-is (implicitly for the current thread).

The argument to fill_frames will be used in the jvmtiStackInfo and passed back to the _calling_thread, so it must be created via make_local(_calling_thread, ...) as you presently have.

Thanks,
David

> Thanks,
>
> Yasumasa
>
>
> On 2020/07/01 7:05, David Holmes wrote:
>> On 1/07/2020 12:17 am, Yasumasa Suenaga wrote:
>>> Hi David,
>>>
>>> Thank you for reviewing! I will upload a new webrev tomorrow.
>>>
>>>>  466 class MultipleStackTracesCollector : public StackObj {
>>>>
>>>>  498 class VM_GetAllStackTraces : public VM_Operation {
>>>>  499 private:
>>>>  500   JavaThread *_calling_thread;
>>>>  501   jint _final_thread_count;
>>>>  502   MultipleStackTracesCollector _collector;
>>>>
>>>> You can't have a StackObj as a member of another class like that as
>>>> it may not be on the stack. I think MultipleStackTracesCollector
>>>> should not extend any allocation class, and should always be
>>>> embedded directly in another class.
>>>
>>> I'm not sure what "embedded" means.
>>> Is it ok as below?
>>>
>>> ```
>>> class MultipleStackTracesCollector {
>>>     :
>>> }
>>>
>>> class GetAllStackTraces : public VM_Operation {
>>>   private:
>>>     MultipleStackTracesCollector _collector;
>>> }
>>> ```
>>
>> Yes, that is what I meant.
>>
>> Thanks,
>> David
>> -----
>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> On 2020/06/30 22:22, David Holmes wrote:
>>>> Hi Yasumasa,
>>>>
>>>> On 30/06/2020 10:05 am, Yasumasa Suenaga wrote:
>>>>> Hi David, Serguei,
>>>>>
>>>>> I updated webrev for 8242428. Could you review again?
>>>>> This change migrates GetStackTrace() and GetThreadListStackTraces()
>>>>> (when thread_count == 1) to direct handshakes.
>>>>>
>>>>>
>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/
>>>>
>>>> This looks really good now! I only have a few nits below. There is
>>>> one thing I don't like about it but it requires a change to the main
>>>> Handshake logic to address - in JvmtiEnv::GetThreadListStackTraces
>>>> you have to create a ThreadsListHandle to convert the jthread to a
>>>> JavaThread, but then the Handshake::execute_direct creates another
>>>> ThreadsListHandle internally. That's a waste. I will discuss with
>>>> Robbin and file a RFE to have an overload of execute_direct that
>>>> takes an existing TLH. Actually it's worse than that because we have
>>>> another TLH in use at the entry point for the JVMTI functions, so I
>>>> think there may be some scope for simplifying the use of TLH
>>>> instances - future RFE.
>>>>
>>>> ---
>>>>
>>>> src/hotspot/share/prims/jvmtiEnvBase.hpp
>>>>
>>>>   451   GetStackTraceClosure(JvmtiEnv *env, jint start_depth, jint max_count,
>>>>   452                        jvmtiFrameInfo* frame_buffer, jint* count_ptr)
>>>>   453     : HandshakeClosure("GetStackTrace"),
>>>>   454       _env(env), _start_depth(start_depth), _max_count(max_count),
>>>>   455       _frame_buffer(frame_buffer), _count_ptr(count_ptr),
>>>>   456       _result(JVMTI_ERROR_THREAD_NOT_ALIVE) {
>>>>
>>>> Nit: can you do one initializer per line please.
>>>>
>>>> This looks wrong:
>>>>
>>>> 466 class MultipleStackTracesCollector : public StackObj {
>>>>
>>>>   498 class VM_GetAllStackTraces : public VM_Operation {
>>>>   499 private:
>>>>   500   JavaThread *_calling_thread;
>>>>   501   jint _final_thread_count;
>>>>   502   MultipleStackTracesCollector _collector;
>>>>
>>>> You can't have a StackObj as a member of another class like that as
>>>> it may not be on the stack. I think MultipleStackTracesCollector
>>>> should not extend any allocation class, and should always be
>>>> embedded directly in another class.
>>>>
>>>>   481
>>>>         MultipleStackTracesCollector(JvmtiEnv *env, jint max_frame_count) {
>>>>   482     _env = env;
>>>>   483     _max_frame_count = max_frame_count;
>>>>   484     _frame_count_total = 0;
>>>>   485     _head = NULL;
>>>>   486     _stack_info = NULL;
>>>>   487     _result = JVMTI_ERROR_NONE;
>>>>   488   }
>>>>
>>>> As you are touching this can you change it to use an initializer
>>>> list as you did for the HandshakeClosure, and please keep one item
>>>> per line.
>>>>
>>>> ---
>>>>
>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp
>>>>
>>>>   820   assert(SafepointSynchronize::is_at_safepoint() ||
>>>>   821          java_thread->is_thread_fully_suspended(false, &debug_bits) ||
>>>>   822          current_thread == java_thread->active_handshaker(),
>>>>   823          "at safepoint / handshake or target thread is suspended");
>>>>
>>>> I don't think the suspension check is necessary, as even if the
>>>> target is suspended we must still be at a safepoint or in a
>>>> handshake with it. Makes me wonder if we used to allow a racy
>>>> stacktrace operation on a suspended thread, assuming it would remain
>>>> suspended?
>>>>
>>>> 1268   oop thread_oop = jt->threadObj();
>>>> 1269
>>>> 1270   if (!jt->is_exiting() && (jt->threadObj() != NULL)) {
>>>>
>>>> You can use thread_oop in line 1270.
>>>>
>>>> 1272     _collector.fill_frames((jthread)JNIHandles::make_local(_calling_thread, thread_oop),
>>>> 1273                            jt, thread_oop);
>>>>
>>>> It is frustrating that this entire call chain started with a jthread
>>>> reference, which we converted to a JavaThread, only to eventually
>>>> need to convert it back to a jthread! I think there is some scope
>>>> for simplification here but not as part of this change.
>>>>
>>>> 1271     ResourceMark rm;
>>>>
>>>> IIUC at this point the _calling_thread is the current thread, so we
>>>> can use:
>>>>
>>>>      ResourceMark rm(_calling_thread);
>>>>
>>>> ---
>>>>
>>>> Please add @bug lines to the tests.
>>>>
>>>> I'm still pondering the test logic but wanted to send this now.
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>> VM_GetThreadListStackTrace (for GetThreadListStackTraces) and
>>>>> VM_GetAllStackTraces (for GetAllStackTraces) have inherited
>>>>> VM_GetMultipleStackTraces VM operation which provides the feature
>>>>> to generate jvmtiStackInfo. I modified VM_GetMultipleStackTraces
>>>>> to a normal C++ class to share with HandshakeClosure for
>>>>> GetThreadListStackTraces (GetSingleStackTraceClosure).
>>>>>
>>>>> Also I added new testcases which test GetThreadListStackTraces()
>>>>> with thread_count == 1 and with all threads.
>>>>>
>>>>> This change has been tested in serviceability/jvmti
>>>>> serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi
>>>>> vmTestbase/nsk/jdwp.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2020/06/24 15:50, Yasumasa Suenaga wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Please review this change:
>>>>>>
>>>>>>    JBS: https://bugs.openjdk.java.net/browse/JDK-8242428
>>>>>>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/
>>>>>>
>>>>>> This change replace following VM operations to direct handshake.
>>>>>>
>>>>>>   - VM_GetFrameCount (GetFrameCount())
>>>>>>   - VM_GetFrameLocation (GetFrameLocation())
>>>>>>   - VM_GetThreadListStackTraces (GetThreadListStackTrace())
>>>>>>   - VM_GetCurrentLocation
>>>>>>
>>>>>> GetThreadListStackTrace() uses direct handshake if thread count ==
>>>>>> 1. In other case (thread count > 1), it would be performed as VM
>>>>>> operation (VM_GetThreadListStackTraces).
>>>>>> Caller of VM_GetCurrentLocation
>>>>>> (JvmtiEnvThreadState::reset_current_location()) might be called at
>>>>>> safepoint. So I added safepoint check in its caller.
>>>>>>
>>>>>> This change has been tested in serviceability/jvmti
>>>>>> serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi
>>>>>> vmTestbase/nsk/jdwp.
>>>>>>
>>>>>> Also I tested it on submit repo, then it has execution error
>>>>>> (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to
>>>>>> dependency error. So I think it does not occur by this change.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa

From suenaga at oss.nttdata.com Tue Jun 30 23:51:15 2020
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Wed, 1 Jul 2020 08:51:15 +0900
Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake
In-Reply-To: <6f46bfec-7135-1580-63cd-668b4d53ff48@oracle.com>
References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <2ea2f5d8-5c43-a7d3-d525-cd44c4009d25@oss.nttdata.com> <6f46bfec-7135-1580-63cd-668b4d53ff48@oracle.com>
Message-ID: <52009673-1334-5810-d0e4-d47aec316254@oss.nttdata.com>

Hi Serguei,

On 2020/07/01 8:24, serguei.spitsyn at oracle.com wrote:
> Hi Yasumasa,
>
> Thank you for separating your initial webrev.
> I'll do a full review after you address comments from David and Robbin as I'm stepping on the same ground.
>
> Just a quick comment now.
>
> http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/src/hotspot/share/prims/jvmtiEnv.cpp.udiff.html
>
> I've already asked in prev. round to make this renaming: target_javathread => java_thread
> The identifier java_thread is normally used in the jvmtiEnv.cpp functions.
> The target_javathread sounds very unusual.

Sorry, I've fixed it. I will update webrev.

> I do not like much the introduction of the GetSingleStackTraceClosure.
> It feels like it can be done in a more elegant way.
> From the other hand, it is not that bad. :-)

I thought to use handshake on VMThread (!= direct handshake) for GetThreadListStackTraces() to simplify the implementation.
However it would be queued as VM op (of course STW would not happen), so I introduced GetSingleStackTraceClosure.

Thanks,

Yasumasa

> Thanks,
> Serguei
>
>
> On 6/29/20 17:05, Yasumasa Suenaga wrote:
>> Hi David, Serguei,
>>
>> I updated webrev for 8242428. Could you review again?
>> This change migrate to use direct handshake for GetStackTrace() and GetThreadListStackTraces() (when thread_count == 1).
>>
>> http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/
>>
>> VM_GetThreadListStackTrace (for GetThreadListStackTraces) and VM_GetAllStackTraces (for GetAllStackTraces) have inherited VM_GetMultipleStackTraces VM operation which provides the feature to generate jvmtiStackInfo. I modified VM_GetMultipleStackTraces to a normal C++ class to share with HandshakeClosure for GetThreadListStackTraces (GetSingleStackTraceClosure).
>>
>> Also I added new testcases which test GetThreadListStackTraces() with thread_count == 1 and with all threads.
>>
>> This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/nsk/jdwp.
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>> On 2020/06/24 15:50, Yasumasa Suenaga wrote:
>>> Hi all,
>>>
>>> Please review this change:
>>>
>>>    JBS: https://bugs.openjdk.java.net/browse/JDK-8242428
>>>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/
>>>
>>> This change replace following VM operations to direct handshake.
>>>
>>>   - VM_GetFrameCount (GetFrameCount())
>>>   - VM_GetFrameLocation (GetFrameLocation())
>>>   - VM_GetThreadListStackTraces (GetThreadListStackTrace())
>>>   - VM_GetCurrentLocation
>>>
>>> GetThreadListStackTrace() uses direct handshake if thread count == 1. In other case (thread count > 1), it would be performed as VM operation (VM_GetThreadListStackTraces).
>>> Caller of VM_GetCurrentLocation (JvmtiEnvThreadState::reset_current_location()) might be called at safepoint. So I added safepoint check in its caller.
>>>
>>> This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/nsk/jdwp.
>>>
>>> Also I tested it on submit repo, then it has execution error (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to dependency error. So I think it does not occur by this change.
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>

From david.holmes at oracle.com Tue Jun 30 23:54:18 2020
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 1 Jul 2020 09:54:18 +1000
Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake
In-Reply-To: <52009673-1334-5810-d0e4-d47aec316254@oss.nttdata.com>
References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <2ea2f5d8-5c43-a7d3-d525-cd44c4009d25@oss.nttdata.com> <6f46bfec-7135-1580-63cd-668b4d53ff48@oracle.com> <52009673-1334-5810-d0e4-d47aec316254@oss.nttdata.com>
Message-ID:

On 1/07/2020 9:51 am, Yasumasa Suenaga wrote:
> Hi Serguei,
>
> On 2020/07/01 8:24, serguei.spitsyn at oracle.com wrote:
>> Hi Yasumasa,
>>
>> Thank you for separating your initial webrev.
>> I'll do a full review after you address comments from David and Robbin
>> as I'm stepping on the same ground.
>>
>> Just a quick comment now.
>>
>> http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/src/hotspot/share/prims/jvmtiEnv.cpp.udiff.html
>>
>>
>> I've already asked in prev. round to make this renaming:
>> target_javathread => java_thread
>> The identifier java_thread is normally used in the jvmtiEnv.cpp
>> functions.
>> The target_javathread sounds very unusual.
>
> Sorry, I've fixed it. I will update webrev.
>
>
>> I do not like much the introduction of the GetSingleStackTraceClosure.
>> It feels like it can be done in a more elegant way.
>> From the other hand, it is not that bad. :-)
>
> I thought to use handshake on VMThread (!= direct handshake) for
> GetThreadListStackTraces() to simplify the implementation.
> However it would be queued as VM op (of course STW would not happen), so
> I introduced GetSingleStackTraceClosure.

Getting multiple stacktraces must be a stop-the-world operation, so that the traces are consistent.

David

>
> Thanks,
>
> Yasumasa
>
>
>> Thanks,
>> Serguei
>>
>>
>> On 6/29/20 17:05, Yasumasa Suenaga wrote:
>>> Hi David, Serguei,
>>>
>>> I updated webrev for 8242428. Could you review again?
>>> This change migrate to use direct handshake for GetStackTrace() and
>>> GetThreadListStackTraces() (when thread_count == 1).
>>>
>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/
>>>
>>> VM_GetThreadListStackTrace (for GetThreadListStackTraces) and
>>> VM_GetAllStackTraces (for GetAllStackTraces) have inherited
>>> VM_GetMultipleStackTraces VM operation which provides the feature to
>>> generate jvmtiStackInfo. I modified VM_GetMultipleStackTraces to a
>>> normal C++ class to share with HandshakeClosure for
>>> GetThreadListStackTraces (GetSingleStackTraceClosure).
>>>
>>> Also I added new testcases which test GetThreadListStackTraces() with
>>> thread_count == 1 and with all threads.
>>>
>>> This change has been tested in serviceability/jvmti
>>> serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi
>>> vmTestbase/nsk/jdwp.
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> On 2020/06/24 15:50, Yasumasa Suenaga wrote:
>>>> Hi all,
>>>>
>>>> Please review this change:
>>>>
>>>>    JBS: https://bugs.openjdk.java.net/browse/JDK-8242428
>>>>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/
>>>>
>>>> This change replace following VM operations to direct handshake.
>>>>
>>>>   - VM_GetFrameCount (GetFrameCount())
>>>>   - VM_GetFrameLocation (GetFrameLocation())
>>>>   - VM_GetThreadListStackTraces (GetThreadListStackTrace())
>>>>   - VM_GetCurrentLocation
>>>>
>>>> GetThreadListStackTrace() uses direct handshake if thread count ==
>>>> 1.
>>>> In other case (thread count > 1), it would be performed as VM
>>>> operation (VM_GetThreadListStackTraces).
>>>> Caller of VM_GetCurrentLocation
>>>> (JvmtiEnvThreadState::reset_current_location()) might be called at
>>>> safepoint. So I added safepoint check in its caller.
>>>>
>>>> This change has been tested in serviceability/jvmti
>>>> serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi
>>>> vmTestbase/nsk/jdwp.
>>>>
>>>> Also I tested it on submit repo, then it has execution error
>>>> (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to
>>>> dependency error. So I think it does not occur by this change.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>

From suenaga at oss.nttdata.com Tue Jun 30 23:58:34 2020
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Wed, 1 Jul 2020 08:58:34 +0900
Subject: RFR: 8242428: JVMTI thread operations should use Thread-Local Handshake
In-Reply-To:
References: <6c263e4c-0a83-be7a-0288-ecb2de0b7cea@oss.nttdata.com> <2ea2f5d8-5c43-a7d3-d525-cd44c4009d25@oss.nttdata.com> <6f46bfec-7135-1580-63cd-668b4d53ff48@oracle.com> <52009673-1334-5810-d0e4-d47aec316254@oss.nttdata.com>
Message-ID:

On 2020/07/01 8:54, David Holmes wrote:
> On 1/07/2020 9:51 am, Yasumasa Suenaga wrote:
>> Hi Serguei,
>>
>> On 2020/07/01 8:24, serguei.spitsyn at oracle.com wrote:
>>> Hi Yasumasa,
>>>
>>> Thank you for separating your initial webrev.
>>> I'll do a full review after you address comments from David and Robbin as I'm stepping on the same ground.
>>>
>>> Just a quick comment now.
>>>
>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/src/hotspot/share/prims/jvmtiEnv.cpp.udiff.html
>>>
>>> I've already asked in prev. round to make this renaming: target_javathread => java_thread
>>> The identifier java_thread is normally used in the jvmtiEnv.cpp functions.
>>> The target_javathread sounds very unusual.
>>
>> Sorry, I've fixed it. I will update webrev.
>>
>>
>>> I do not like much the introduction of the GetSingleStackTraceClosure.
>>> It feels like it can be done in a more elegant way.
>>> From the other hand, it is not that bad. :-)
>>
>> I thought to use handshake on VMThread (!= direct handshake) for GetThreadListStackTraces() to simplify the implementation.
>> However it would be queued as VM op (of course STW would not happen), so I introduced GetSingleStackTraceClosure.
>
> Getting multiple stacktraces must be a stop-the-world operation, so that the traces are consistent.

Yes, this is the case of thread_count == 1.

Yasumasa

> David
>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 6/29/20 17:05, Yasumasa Suenaga wrote:
>>>> Hi David, Serguei,
>>>>
>>>> I updated webrev for 8242428. Could you review again?
>>>> This change migrate to use direct handshake for GetStackTrace() and GetThreadListStackTraces() (when thread_count == 1).
>>>>
>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.01/
>>>>
>>>> VM_GetThreadListStackTrace (for GetThreadListStackTraces) and VM_GetAllStackTraces (for GetAllStackTraces) have inherited VM_GetMultipleStackTraces VM operation which provides the feature to generate jvmtiStackInfo. I modified VM_GetMultipleStackTraces to a normal C++ class to share with HandshakeClosure for GetThreadListStackTraces (GetSingleStackTraceClosure).
>>>>
>>>> Also I added new testcases which test GetThreadListStackTraces() with thread_count == 1 and with all threads.
>>>>
>>>> This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/nsk/jdwp.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2020/06/24 15:50, Yasumasa Suenaga wrote:
>>>>> Hi all,
>>>>>
>>>>> Please review this change:
>>>>>
>>>>>    JBS: https://bugs.openjdk.java.net/browse/JDK-8242428
>>>>>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8242428/webrev.00/
>>>>>
>>>>> This change replace following VM operations to direct handshake.
>>>>>
>>>>>   - VM_GetFrameCount (GetFrameCount())
>>>>>   - VM_GetFrameLocation (GetFrameLocation())
>>>>>   - VM_GetThreadListStackTraces (GetThreadListStackTrace())
>>>>>   - VM_GetCurrentLocation
>>>>>
>>>>> GetThreadListStackTrace() uses direct handshake if thread count == 1. In other case (thread count > 1), it would be performed as VM operation (VM_GetThreadListStackTraces).
>>>>> Caller of VM_GetCurrentLocation (JvmtiEnvThreadState::reset_current_location()) might be called at safepoint. So I added safepoint check in its caller.
>>>>>
>>>>> This change has been tested in serviceability/jvmti serviceability/jdwp vmTestbase/nsk/jvmti vmTestbase/nsk/jdi vmTestbase/nsk/jdwp.
>>>>>
>>>>> Also I tested it on submit repo, then it has execution error (mach5-one-ysuenaga-JDK-8242428-20200624-0054-12034717) due to dependency error. So I think it does not occur by this change.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>