From jcbeyler at google.com Mon Apr 2 17:17:03 2018
From: jcbeyler at google.com (JC Beyler)
Date: Mon, 02 Apr 2018 17:17:03 +0000
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To:
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>
Message-ID:

Hi Derek,

I know there were a few things that went in that provoked a merge conflict. I worked on it and got it up to date. Sadly, my lack of knowledge makes it a full rebase instead of keeping all the history. However, with a newly cloned jdk/hs you should now be able to use:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/

The change you are referring to was done with the others, so perhaps you were unlucky and I forgot it in one webrev and fixed it in another? I don't know, but it's been there and I checked; it is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html

I double checked that tlab_end_offset no longer appears in any architecture (as far as I can tell :)).

Thanks for testing and let me know if you run into any other issues!
Jc

On Fri, Mar 30, 2018 at 4:24 PM White, Derek wrote:

> Hi Jc,
>
> I've been having trouble getting your patch to apply correctly. I may have
> based it on the wrong version.
>
> In any case, I think there's a missing update to
> macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), where
> "JavaThread::tlab_end_offset()" should become
> "JavaThread::tlab_current_end_offset()".
>
> This should correspond to the other port's changes in
> templateTable_.cpp files.
>
> Thanks!
> - Derek
>
> *From:* hotspot-compiler-dev [mailto:
> hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of *JC Beyler
> *Sent:* Wednesday, March 28, 2018 11:43 AM
> *To:* Erik Österlund
> *Cc:* serviceability-dev at openjdk.java.net; hotspot-compiler-dev <
> hotspot-compiler-dev at openjdk.java.net>
> *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
>
> Hi all,
>
> I've been working mostly on deflaking the tests and on the wording in the
> JVMTI spec.
>
> Here are the two incremental webrevs:
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/
>
> Here is the total webrev:
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/
>
> Here are the notes of this change:
> - Currently the tests pass 100 times in a row; I am working on checking
> if they pass 1000 times in a row.
> - The default sampling rate is set to 512k. This is what we use
> internally, and having a default means that to enable the sampling with the
> default, the user only has to do an enable event/disable event via JVMTI
> (instead of enable + set sample rate).
> - I deprecated the code that was handling the fast path tlab refill if
> it happened since this is now deprecated
>   - Though I saw that Graal is still using it so I have to see what
> needs to be done there exactly
>
> Finally, using the Dacapo benchmark suite, I noted a 1% overhead when
> the event system is turned on and the callback to the native agent is just
> empty. I got a 3% overhead with a 512k sampling rate with the code I put in
> the native side of my tests.
> > > > Thanks and comments are appreciated, > > Jc > > > > > > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler wrote: > > Hi all, > > > > The incremental webrev update is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ > > > > The full webrev is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ > > > > Major change here is: > > - I've removed the heapMonitoring.cpp code in favor of just having the > sampling events as per Serguei's request; I still have to do some overhead > measurements but the tests prove the concept can work > > - Most of the tlab code is unchanged, the only major part is that > now things get sent off to event collectors when used and enabled. > > - Added the interpreter collectors to handle interpreter execution > > - Updated the name from SetTlabHeapSampling to SetHeapSampling to be > more generic > > - Added a mutex for the thread sampling so that we can initialize an > internal static array safely > > - Ported the tests from the old system to this new one > > > > I've also updated the JEP and CSR to reflect these changes: > > https://bugs.openjdk.java.net/browse/JDK-8194905 > > https://bugs.openjdk.java.net/browse/JDK-8171119 > > > > In order to make this have some forward progress, I've removed the heap > sampling code entirely and now rely entirely on the event sampling system. > The tests reflect this by using a simplified implementation of what an > agent could do: > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c > > (Search for anything mentioning event_storage). > > > > I have not taken the time to port the whole code we had originally in > heapMonitoring to this. I hesitate only because that code was in C++, I'd > have to port it to C and this is for tests so perhaps what I have now is > good enough? 
> > > > As far as testing goes, I've ported all the relevant tests and then added > a few: > > - Turning the system on/off > > - Testing using various GCs > > - Testing using the interpreter > > - Testing the sampling rate > > - Testing with objects and arrays > > - Testing with various threads > > > > Finally, as overhead goes, I have the numbers of the system off vs a clean > build and I have 0% overhead, which is what we'd want. This was using the > Dacapo benchmarks. I am now preparing to run a version with the events on > using dacapo and will report back here. > > > > Any comments are welcome :) > > Jc > > > > > > > > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler wrote: > > Hi all, > > > > I apologize for the delay but I wanted to add an event system and that > took a bit longer than expected and I also reworked the code to take into > account the deprecation of FastTLABRefill. > > > > This update has four parts: > > > > A) I moved the implementation from Thread to ThreadHeapSampler inside of > Thread. Would you prefer it as a pointer inside of Thread or like this > works for you? Second question would be would you rather have an > association outside of Thread altogether that tries to remember when > threads are live and then we would have something like: > > ThreadHeapSampler::get_sampling_size(this_thread); > > > > I worry about the overhead of this but perhaps it is not too too bad? > > > > B) I also have been working on the Allocation event system that sends out > a notification at each sampled event. This will be practical when wanting > to do something at the allocation point. I'm also looking at if the whole > heapMonitoring code could not reside in the agent code and not in the JDK. 
> I'm not convinced but I'm talking to Serguei about it to see/assess :) > > - Also added two tests for the new event subsystem > > > > C) Removed the slow_path fields inside the TLAB code since now > FastTLABRefill is deprecated > > > > D) Updated the JVMTI documentation and specification for the methods. > > > > So the incremental webrev is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ > > > > and the full webrev is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 > > > > I believe I have updated the various JIRA issues that track this :) > > > > Thanks for your input, > > Jc > > > > > > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler wrote: > > Hi Erik, > > > > I inlined my answers, which the last one seems to answer Robbin's concerns > about the same thing (adding things to Thread). > > > > On Wed, Feb 14, 2018 at 2:51 AM, Erik ?sterlund > wrote: > > Hi JC, > > Comments are inlined below. > > > > On 2018-02-13 06:18, JC Beyler wrote: > > Hi Erik, > > > > Thanks for your answers, I've now inlined my own answers/comments. > > > > I've done a new webrev here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ > > > > The incremental is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ > > > > Note to all: > > - I've been integrating changes from Erin/Serguei/David comments so this > webrev incremental is a bit an answer to all comments in one. I apologize > for that :) > > > > > > On Mon, Feb 12, 2018 at 6:05 AM, Erik ?sterlund > wrote: > > Hi JC, > > Sorry for the delayed reply. > > Inlined answers: > > > > On 2018-02-06 00:04, JC Beyler wrote: > > Hi Erik, > > (Renaming this to be folded into the newly renamed thread :)) > > First off, thanks a lot for reviewing the webrev! I appreciate it! 
> > I updated the webrev to: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ > > And the incremental one is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ > > It contains: > - The change for since from 9 to 11 for the jvmti.xml > - The use of the OrderAccess for initialized > - Clearing the oop > > I also have inlined my answers to your comments. The biggest question > will come from the multiple *_end variables. A bit of the logic there > is due to handling the slow path refill vs fast path refill and > checking that the rug was not pulled underneath the slowpath. I > believe that a previous comment was that TlabFastRefill was going to > be deprecated. > > If this is true, we could revert this code a bit and just do a : if > TlabFastRefill is enabled, disable this. And then deprecate that when > TlabFastRefill is deprecated. > > This might simplify this webrev and I can work on a follow-up that > either: removes TlabFastRefill if Robbin does not have the time to do > it or add the support to the assembly side to handle this correctly. > What do you think? > > > > I support removing TlabFastRefill, but I think it is good to not depend on > that happening first. > > > > > I'm slowly pushing on the FastTLABRefill (https://bugs.openjdk.java.net/browse/JDK-8194084), > I agree on keeping both separate for now though so that we can think of > both differently > > > > > > Now, below, inlined are my answers: > > On Fri, Feb 2, 2018 at 8:44 AM, Erik ?sterlund > wrote: > > Hi JC, > > Hope I am reviewing the right version of your work. Here goes... > > src/hotspot/share/gc/shared/collectedHeap.inline.hpp: > > 159 AllocTracer::send_allocation_outside_tlab(klass, result, size * > HeapWordSize, THREAD); > 160 > 161 THREAD->tlab().handle_sample(THREAD, result, size); > 162 return result; > 163 } > > Should not call tlab()->X without checking if (UseTLAB) IMO. > > Done! > > > More about this later. 
> > > > > > src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: > > So first of all, there seems to quite a few ends. There is an "end", a > "hard > end", a "slow path end", and an "actual end". Moreover, it seems like the > "hard end" is actually further away than the "actual end". So the "hard > end" > seems like more of a "really definitely actual end" or something. I don't > know about you, but I think it looks kind of messy. In particular, I don't > feel like the name "actual end" reflects what it represents, especially > when > there is another end that is behind the "actual end". > > 413 HeapWord* ThreadLocalAllocBuffer::hard_end() { > 414 // Did a fast TLAB refill occur? > 415 if (_slow_path_end != _end) { > 416 // Fix up the actual end to be now the end of this TLAB. > 417 _slow_path_end = _end; > 418 _actual_end = _end; > 419 } > 420 > 421 return _actual_end + alignment_reserve(); > 422 } > > I really do not like making getters unexpectedly have these kind of side > effects. It is not expected that when you ask for the "hard end", you > implicitly update the "slow path end" and "actual end" to new values. > > As I said, a lot of this is due to the FastTlabRefill. If I make this > not supporting FastTlabRefill, this goes away. The reason the system > needs to update itself at the get is that you only know at that get if > things have shifted underneath the tlab slow path. I am not sure of > really better names (naming is hard!), perhaps we could do these > names: > > - current_tlab_end // Either the allocated tlab end or a sampling > point > - last_allocation_address // The end of the tlab allocation > - last_slowpath_allocated_end // In case a fast refill occurred the > end might have changed, this is to remember slow vs fast past refills > > the hard_end method can be renamed to something like: > tlab_end_pointer() // The end of the lab including a bit of > alignment reserved bytes > > > > Those names sound better to me. 
Could you please provide a mapping from > the old names to the new names so I understand which one is which please? > > This is my current guess of what you are proposing: > > end -> current_tlab_end > actual_end -> last_allocation_address > slow_path_end -> last_slowpath_allocated_end > hard_end -> tlab_end_pointer > > > > Yes that is correct, that was what I was proposing. > > > > I would prefer this naming: > > end -> slow_path_end // the end for taking a slow path; either due to > sampling or refilling > actual_end -> allocation_end // the end for allocations > slow_path_end -> last_slow_path_end // last address for slow_path_end (as > opposed to allocation_end) > hard_end -> reserved_end // the end of the reserved space of the TLAB > > About setting things in the getter... that still seems like a very > unpleasant thing to me. It would be better to inspect the call hierarchy > and explicitly update the ends where they need updating, and assert in the > getter that they are in sync, rather than implicitly setting various ends > as a surprising side effect in a getter. It looks like the call hierarchy > is very small. With my new naming convention, reserved_end() would > presumably return _allocation_end + alignment_reserve(), and have an assert > checking that _allocation_end == _last_slow_path_allocation_end, > complaining that this invariant must hold, and that a caller to this > function, such as make_parsable(), must first explicitly synchronize the > ends as required, to honor that invariant. > > > > > > > I've renamed the variables to how you preferred it except for the _end > one. I did: > > current_end > > last_allocation_address > > tlab_end_ptr > > > > The reason is that the architecture dependent code use the thread.hpp API > and it already has tlab included into the name so it becomes > tlab_current_end (which is better that tlab_current_tlab_end in my opinion). 
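
The suggestion above (keep the getter free of side effects, update the ends explicitly, and assert the invariant in the getter) can be sketched in plain C++. This is only an illustration under stated assumptions: ToyTlab, fast_refill, set_sample_point, sync_ends and ends_in_sync are hypothetical names, offsets are plain integers rather than HeapWord pointers, and none of this is the HotSpot implementation.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for the TLAB end bookkeeping discussed in the thread.
// Hypothetical names throughout; this is not HotSpot code.
class ToyTlab {
 public:
  ToyTlab(std::size_t end, std::size_t reserve)
      : _slow_path_end(end),
        _allocation_end(end),
        _last_slow_path_end(end),
        _reserve(reserve) {}

  // Simulates a fast-path refill: the end moves without the slow path knowing.
  void fast_refill(std::size_t new_end) { _slow_path_end = new_end; }

  // Simulates the slow path placing a sampling point before the real end.
  void set_sample_point(std::size_t point) {
    assert(point <= _allocation_end);
    _slow_path_end = point;
    _last_slow_path_end = point;
  }

  // True when no fast refill happened since the last slow-path update.
  bool ends_in_sync() const { return _last_slow_path_end == _slow_path_end; }

  // Explicit update step; callers invoke this before asking for reserved_end().
  void sync_ends() {
    if (!ends_in_sync()) {
      _last_slow_path_end = _slow_path_end;
      _allocation_end = _slow_path_end;
    }
  }

  // Pure getter: it checks the invariant instead of silently repairing it.
  std::size_t reserved_end() const {
    assert(ends_in_sync() && "call sync_ends() before reserved_end()");
    return _allocation_end + _reserve;
  }

 private:
  std::size_t _slow_path_end;       // where the allocation fast path stops
  std::size_t _allocation_end;      // real end usable for allocations
  std::size_t _last_slow_path_end;  // value of _slow_path_end last set by the slow path
  std::size_t _reserve;             // alignment reserve past _allocation_end
};
```

A caller such as a hypothetical make_parsable() would call sync_ends() first and only then reserved_end(), so a forgotten synchronization shows up as an assertion failure in debug builds rather than as a hidden state change.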
> > > > I also moved the update into a separate method with a TODO that says to > remove it when FastTLABRefill is deprecated > > > > This looks a lot better now. Thanks. > > Note that the following comment now needs updating accordingly in > threadLocalAllocBuffer.hpp: > > 41 // Heap sampling is performed via the end/actual_end fields. > > 42 // actual_end contains the real end of the tlab allocation, > > 43 // whereas end can be set to an arbitrary spot in the tlab to > > 44 // trip the return and sample the allocation. > > 45 // slow_path_end is used to track if a fast tlab refill occured > > 46 // between slowpath calls. > > There might be other comments too, I have not looked in detail. > > > > This was the only spot that still had an actual_end, I fixed it now. I'll > do a sweep to double check other comments. > > > > > > > > > > > > > Not sure it's better but before updating the webrev, I wanted to try > to get input/consensus :) > > (Note hard_end was always further off than end). > > src/hotspot/share/prims/jvmti.xml: > > 10357 > 10358 > 10359 Can sample the heap. > 10360 If this capability is enabled then the heap sampling > methods > can be called. > 10361 > 10362 > > Looks like this capability should not be "since 9" if it gets integrated > now. > > Updated now to 11, crossing my fingers :) > > src/hotspot/share/runtime/heapMonitoring.cpp: > > 448 if (is_alive->do_object_b(value)) { > 449 // Update the oop to point to the new object if it is still > alive. > 450 f->do_oop(&(trace.obj)); > 451 > 452 // Copy the old trace, if it is still live. > 453 _allocated_traces->at_put(curr_pos++, trace); > 454 > 455 // Store the live trace in a cache, to be served up on > /heapz. > 456 _traces_on_last_full_gc->append(trace); > 457 > 458 count++; > 459 } else { > 460 // If the old trace is no longer live, add it to the list of > 461 // recently collected garbage. 
> 462 store_garbage_trace(trace); > 463 } > > In the case where the oop was not live, I would like it to be explicitly > cleared. > > Done I think how you wanted it. Let me know because I'm not familiar > with the RootAccess API. I'm unclear if I'm doing this right or not so > reviews of these parts are highly appreciated. Robbin had talked of > perhaps later pushing this all into a OopStorage, should I do this now > do you think? Or can that wait a second webrev later down the road? > > > > I think using handles can and should be done later. You can use the Access > API now. > I noticed that you are missing an #include "oops/access.inline.hpp" in > your heapMonitoring.cpp file. > > > > The missing header is there for me so I don't know, I made sure it is > present in the latest webrev. Sorry about that. > > > > + Did I clear it the way you wanted me to or were you thinking of > something else? > > > That is precisely how I wanted it to be cleared. Thanks. > > + Final question here, seems like if I were to want to not do the > f->do_oop directly on the trace.obj, I'd need to do something like: > > f->do_oop(&value); > ... > trace->store_oop(value); > > to update the oop internally. Is that right/is that one of the > advantages of going to the Oopstorage sooner than later? > > > I think you really want to do the do_oop on the root directly. Is there a > particular reason why you would not want to do that? > Otherwise, yes - the benefit with using the handle approach is that you do > not need to call do_oop explicitly in your code. > > > > There is no reason except that now we have a load_oop and a get_oop_addr, > I was not sure what you would think of that. > > > > > > That's fine. > > > > > > > Also I see a lot of concurrent-looking use of the following field: > 267 volatile bool _initialized; > > Please note that the "volatile" qualifier does not help with reordering > here. 
Reordering between volatile and non-volatile fields is completely > free > for both compiler and hardware, except for windows with MSVC, where > volatile > semantics is defined to use acquire/release semantics, and the hardware is > TSO. But for the general case, I would expect this field to be stored with > OrderAccess::release_store and loaded with OrderAccess::load_acquire. > Otherwise it is not thread safe. > > Because everything is behind a mutex, I wasn't really worried about > this. I have a test that has multiple threads trying to hit this > corner case and it passes. > > However, to be paranoid, I updated it to using the OrderAccess API > now, thanks! Let me know what you think there too! > > > If it is indeed always supposed to be read and written under a mutex, then > I would strongly prefer to have it accessed as a normal non-volatile > member, and have an assertion that given lock is held or we are in a > safepoint, as we do in many other places. Something like this: > > assert(HeapMonitorStorage_lock->owned_by_self() || > (SafepointSynchronize::is_at_safepoint() && > Thread::current()->is_VM_thread()), "this should not be accessed > concurrently"); > > It would be confusing to people reading the code if there are uses of > OrderAccess that are actually always protected under a mutex. > > > > Thank you for the exact example to be put in the code! I put it around > each access/assignment of the _initialized method and found one case where > yes you can touch it and not have the lock. It actually is "ok" because you > don't act on the storage until later and only when you really want to > modify the storage (see the object_alloc_do_sample method which calls the > add_trace method). > > > > But, because of this, I'm going to put the OrderAccess here, I'll do some > performance numbers later and if there are issues, I might add a "unsafe" > read and a "safe" one to make it explicit to the reader. But I don't think > it will come to that. > > > Okay. 
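
A minimal sketch of the acquire/release publication pattern mentioned above, written with standard C++ std::atomic as a stand-in for HotSpot's OrderAccess. That substitution is an assumption made purely for illustration, and Storage, publish and try_read are hypothetical names, not code from the webrev. The release store prevents the payload write from being reordered after the flag write; the acquire load prevents the payload read from being reordered before the flag read.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical storage with a publication flag; not HotSpot code.
struct Storage {
  int payload = 0;                       // plain data, written before publishing
  std::atomic<bool> initialized{false};  // flag that publishes the payload

  // Writer side: fill in the data, then publish it with a release store.
  void publish(int value) {
    payload = value;
    initialized.store(true, std::memory_order_release);
  }

  // Reader side: the acquire load pairs with the release store, so a reader
  // that observes initialized == true also observes the payload write.
  bool try_read(int* out) const {
    if (!initialized.load(std::memory_order_acquire)) {
      return false;
    }
    *out = payload;
    return true;
  }
};
```

If the flag is only ever read and written while a lock is held, the alternative raised in the review applies instead: a plain non-atomic bool plus an assertion that the lock is owned documents the intent more directly than ordering primitives that never come into play.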
This double return in heapMonitoring.cpp looks wrong: > > 283 bool initialized() { > 284 return OrderAccess::load_acquire(&_initialized) != 0; > 285 return _initialized; > 286 } > > Since you said object_alloc_do_sample() is the only place where you do not > hold the mutex while reading initialized(), I had a closer look at that. It > looks like in its current shape, the lack of a mutex may lead to a memory > leak. In particular, it first checks if (initialized()). Let's assume this > is now true. It then allocates a bunch of stuff, and checks if the number > of frames were over 0. If they were, it calls > StackTraceStorage::storage()->add_trace() seemingly hoping that after > grabbing the lock in there, initialized() will still return true. But it > could now return false and skip doing anything, in which case the allocated > stuff will never be freed. > > > > I fixed this now by making add_trace return a boolean and checking for > that. It will be in the next webrev. Thanks, the truth is that in our > implementation the system is always on or off, so this never really occurs > :). In this version though, that is not true and it's important to handle > so thanks again! > > > > > > > So the analysis seems to be that _initialized is only used outside of the > mutex in once instance, where it is used to perform double-checked locking, > that actually causes a memory leak. > > I am not proposing how to fix that, just raising the issue. If you still > want to perform this double-checked locking somehow, then the use of > acquire/release still seems odd. Because the memory ordering restrictions > of it never comes into play in this particular case. If it ever did, then > the use of destroy_stuff(); release_store(_initialized, 0) would be broken > anyway as that would imply that whatever concurrent reader there ever was > would after reading _initialized with load_acquire() could *never* read the > data that is concurrently destroyed anyway. 
I would be biased to think that > RawAccess::load/store looks like a more appropriate solution, > given that the memory leak issue is resolved. I do not know how painful it > would be to not perform this double-checked locking. > > > > So I agree with this entirely. I looked also a bit more and the difference > and code really stems from our internal version. In this version however, > there are actually a lot of things going on that I did not go entirely > through in my head but this comment made me ponder a bit more on it. > > > > Since every object_alloc_do_sample is protected by a check to > HeapMonitoring::enabled(), there is only a small chance that the call is > happening when things have been disabled. So there is no real need to do a > first check on the initialized, it is a rare occurence that a call happens > to object_alloc_do_sample and the initialized of the storage returns false. > > > > (By the way, even if you did call object_alloc_do_sample without looking > at HeapMonitoring::enabled(), that would be ok too. You would gather the > stacktrace and get nowhere at the add_trace call, which would return false; > so though not optimal performance wise, nothing would break). > > > > Furthermore, the add_trace is really the moment of no return and we have > the mutex lock and then the initialized check. So, in the end, I did two > things: I removed that first check and then I removed the OrderAccess for > the storage initialized. I think now I have a better grasp and > understanding why it was done in our code and why it is not needed here. > Thanks for pointing it out :). This now still passes my JTREG tests, > especially the threaded one. > > > > > > > > > > > > > > > > > As a kind of meta comment, I wonder if it would make sense to add sampling > for non-TLAB allocations. 
Seems like if someone is rapidly allocating a > whole bunch of 1 MB objects that never fit in a TLAB, I might still be > interested in seeing that in my traces, and not get surprised that the > allocation rate is very high yet not showing up in any profiles. > > That is handled by the handle_sample where you wanted me to put a > UseTlab because you hit that case if the allocation is too big. > > > I see. It was not obvious to me that non-TLAB sampling is done in the TLAB > class. That seems like an abstraction crime. > What I wanted in my previous comment was that we do not call into the TLAB > when we are not using TLABs. If there is sampling logic in the TLAB that is > used for something else than TLABs, then it seems like that logic simply > does not belong inside of the TLAB. It should be moved out of the TLAB, and > instead have the TLAB call this common abstraction that makes sense. > > > > So in the incremental version: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/, this is still > a "crime". The reason is that the system has to have the bytes_until_sample > on a per-thread level and it made "sense" to have it with the TLAB > implementation. Also, I was not sure how people felt about adding something > to the thread instance instead. > > > > Do you think it fits better at the Thread level? I can see how difficult > it is to make it happen there and add some logic there. Let me know what > you think. > > > We have an unfortunate situation where everyone that has some fields that > are thread local tend to dump them right into Thread, making the size and > complexity of Thread grow as it becomes tightly coupled with various > unrelated subsystems. It would be desirable to have a separate class for > this instead that encapsulates the sampling logic. That class could > possibly reside in Thread though as a value object of Thread. > > > > I imagined that would be the case but was not sure. 
I will look at the > example that Robbin is talking about (ThreadSMR) and will see how to > refactor my code to use that. > > > > Thanks again for your help, > > Jc > > > > > > > > > > > Hope I have answered your questions and that my feedback makes sense to > you. > > > > You have and thank you for them, I think we are getting to a cleaner > implementation and things are getting better and more readable :) > > > Yes it is getting better. > > Thanks, > /Erik > > > > > Thanks for your help! > > Jc > > > > > > Thanks, > /Erik > > > > I double checked by changing the test > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java > > to use a smaller Tlab (2048) and made the object bigger and it goes > through that and passes. > > Thanks again for your review and I look forward to your pointers for > the questions I now have raised! > Jc > > > > > > > > > Thanks, > /Erik > > > On 2018-01-26 06:45, JC Beyler wrote: > > Thanks Robbin for the reviews :) > > The new full webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/ > The incremental webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/ > > I inlined my answers: > > On Thu, Jan 25, 2018 at 1:15 AM, Robbin Ehn wrote: > > Hi JC, great to see another revision! > > #### > heapMonitoring.cpp > > StackTraceData should not contain the oop for 'safety' reasons. > When StackTraceData is moved from _allocated_traces: > L452 store_garbage_trace(trace); > it contains a dead oop. > _allocated_traces could instead be a tupel of oop and StackTraceData thus > dead oops are not kept. > > Done I used inheritance to make the copier work regardless but the > idea is the same. > > You should use the new Access API for loading the oop, something like > this: > RootAccess::load(...) > I don't think you need to use Access API for clearing the oop, but it > would > look nicer. 
And you shouldn't probably be using: > Universe::heap()->is_in_reserved(value) > > I am unfamiliar with this but I think I did do it like you wanted me > to (all tests pass so that's a start). I'm not sure how to clear the > oop exactly, is there somewhere that does that, which I can use to do > the same? > > I removed the is_in_reserved, this came from our internal version, I > don't know why it was there but my tests work without so I removed it > :) > > The lock: > L424 MutexLocker mu(HeapMonitorStorage_lock); > Is not needed as far as I can see. > weak_oops_do is called in a safepoint, no TLAB allocation can happen and > JVMTI thread can't access these data-structures. Is there something more > to > this lock that I'm missing? > > Since a thread can call the JVMTI getLiveTraces (or any of the other > ones), it can get to the point of trying to copying the > _allocated_traces. I imagine it is possible that this is happening > during a GC or that it can be started and a GC happens afterwards. > Therefore, it seems to me that you want this protected, no? > > #### > You have 6 files without any changes in them (any more): > g1CollectedHeap.cpp > psMarkSweep.cpp > psParallelCompact.cpp > genCollectedHeap.cpp > referenceProcessor.cpp > thread.hpp > > Done. > > #### > I have not looked closely, but is it possible to hide heap sampling in > AllocTracer ? (with some minor changes to the AllocTracer API) > > I am imagining that you are saying to move the code that does the > sampling code (change the tlab end, do the call to HeapMonitoring, > etc.) into the AllocTracer code itself? I think that is right and I'll > look if that is possible and prepare a webrev to show what would be > needed to make that happen. > > #### > Minor nit, when declaring pointer there is a little mix of having the > pointer adjacent by type name and data name. (Most hotspot code is by > type > name) > E.g. > heapMonitoring.cpp:711 jvmtiStackTrace *trace = .... 
> heapMonitoring.cpp:733 Method* m = vfst.method();
> (not just this file)
>
> Done!
>
> ####
> HeapMonitorThreadOnOffTest.java:77
> I would make g_tmp volatile, otherwise the assignment in the loop may
> theoretically be skipped.
>
> Also done!
>
> Thanks again!
> Jc
>

From serguei.spitsyn at oracle.com Mon Apr 2 18:32:50 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 2 Apr 2018 11:32:50 -0700
Subject: [8u] RFR for backport of "JDK-8165736: Error message should be shown when JVMTI agent cannot be attached" to jdk8u-dev
In-Reply-To: <70a18b4a-a310-babe-1f41-c86100638457@oracle.com>
References: <8c218a37-4a50-4b4f-847b-4c67e02b7866@default> <70a18b4a-a310-babe-1f41-c86100638457@oracle.com>
Message-ID:

Hi Shafi,

I agree with David.
Consider it reviewed if you add the initialization of ebuf.

Thanks,
Serguei

On 3/31/18 00:24, David Holmes wrote:
> Hi Shafi,
>
> On 29/03/2018 7:11 PM, Shafi Ahmad wrote:
>> Hi,
>>
>> Please review the backport of 'JDK-8165736: Error message should be
>> shown when JVMTI agent cannot be attached' to jdk8u-dev.
>> Please note that this is not a clean backport because we cannot
>> backport the native jtreg tests, as the infrastructure for native jtreg
>> tests has been available only since JDK 9.
>
> Ok.
>
>> webrev: http://cr.openjdk.java.net/~shshahma/8165736/
>> jdk10 bug: https://bugs.openjdk.java.net/browse/JDK-8165736
>> original patch pushed to jdk10:
>> http://hg.openjdk.java.net/jdk/jdk/rev/bc1cffa26561
>
> src/share/vm/prims/jvmtiExport.cpp
>
> You missed the initialization of ebuf:
>
> +  char ebuf[1024] = {0};
>
> Otherwise the functional backport seems okay.
>
> Thanks,
> David
>
>> Test:
>>   Run jprt -testset hotspot, -testset core
>>
>> Regards,
>> Shafi
>>

From serguei.spitsyn at oracle.com Mon Apr 2 21:44:15 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 2 Apr 2018 14:44:15 -0700
Subject: RFR 4613913: Four EventRequest methods are invokable on deleted request
In-Reply-To:
References: <579aad5f-fdaa-e0c9-dc16-7bc2394cb82f@oracle.com>
Message-ID: <9d8eb853-3e20-7369-a28d-0323cbc1b6e1@oracle.com>

An HTML attachment was scrubbed...
URL:

From david.holmes at oracle.com Mon Apr 2 22:02:55 2018
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 3 Apr 2018 08:02:55 +1000
Subject: RFR 4613913: Four EventRequest methods are invokable on deleted request
In-Reply-To: <9d8eb853-3e20-7369-a28d-0323cbc1b6e1@oracle.com>
References: <579aad5f-fdaa-e0c9-dc16-7bc2394cb82f@oracle.com> <9d8eb853-3e20-7369-a28d-0323cbc1b6e1@oracle.com>
Message-ID:

Hi Serguei,

On 3/04/2018 7:44 AM, serguei.spitsyn at oracle.com wrote:
> Hi David and Daniil,
>
> David,
>
> Thank you for raising this concern.
> You are right.
>
> I made a mistake when I looked at the EventRequest.isEnabled() spec and
> thought
> that the following spec lines of setEnabled() belong to isEnabled()
> and the other 3 methods as well:
>
>   Throws:
>     InvalidRequestStateException - if this request has been deleted.
>
> In fact, the JDI spec for methods isEnabled(), getProperty(),
> putProperty() and suspendPolicy()
> does not say they can throw the InvalidRequestStateException.
>
> So, now I'd suggest just relaxing the test checks by not expecting an
> InvalidRequestStateException from isEnabled(), getProperty(), putProperty()
> and suspendPolicy().
>
> Would this approach resolve your concern?

Yes. The semantics for these methods were established way back in 2000 under:

https://bugs.openjdk.java.net/browse/JDK-4320478

I think this bug, 4613913, was misguided in expecting all of the methods to throw the exception.
You could make a case for doing so, but as I said that's a spec change that should have been made back then. Changing the spec now seems pointless - it gains nothing but introduces an incompatible behaviour change. Changing the test is the way to go.

Thanks,
David
-----

> Thanks,
> Serguei
>
>
> On 3/29/18 17:12, David Holmes wrote:
>> Daniil,
>>
>> Even as far back as 2007 there was concern that changing the current
>> behaviour might break existing code. That has to be an even bigger
>> concern now!
>>
>> Further the spec is sloppy here:
>>
>> "Once the eventRequest is deleted, no operations (for example,
>> EventRequest.setEnabled(boolean)) are permitted."
>>
>> This is too loose. What is an "operation"? Is a query like isEnabled()
>> really an "operation"? I would not consider it so. And if we can
>> delete requests why is there no "isDeleted" query? The spec seems
>> incomplete and too vague.
>>
>> To me this is something that should have been clarified in the spec first
>> and then the implementation brought into alignment. But that should
>> have happened many years ago. Changing this now seems risky to me.
>>
>> This change in long-standing behaviour also requires a CSR request if
>> it is to proceed.
>>
>> David
>> -----
>>
>>
>> On 30/03/2018 8:36 AM, Daniil Titov wrote:
>>> Hi Serguei,
>>>
>>> Please review a new version of the fix that has these places corrected.
>>>
>>> Webrev: http://cr.openjdk.java.net/~dtitov/4613913/webrev.03
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-4613913
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Daniil
>>>
>>> On 3/29/18, 11:46 AM, "serguei.spitsyn at oracle.com"
>>> wrote:
>>>
>>>     Hi Daniil,
>>>
>>>     It looks good in general.
>>>     One minor comment is that it would be nice to make a cleanup
>>>     (as we already discussed) for all places like this:
>>>
>>>       202             if (isEnabled() || deleted) {
>>>       203                 throw invalidState();
>>>       204             }
>>>
>>>     As isEnabled() now checks for deleted and throws
>>>     invalidState(), we can simplify these fragments to be:
>>>
>>>       202             if (isEnabled()) {
>>>       203                 throw invalidState();
>>>       204             }
>>>
>>>     Thanks,
>>>     Serguei
>>>
>>>     On 3/29/18 10:27, Daniil Titov wrote:
>>>     > Please review the changes that ensure that no operation on
>>>     > deleted com.sun.jdi.request.EventRequest objects is permitted as per
>>>     > the JDI specification for the
>>>     > com.sun.jdi.request.EventRequestManager.deleteEventRequest(com.sun.jdi.request.EventRequest)
>>>     > method. The fix makes the following 4 methods in class
>>>     > com.sun.tools.jdi.EventRequestManagerImpl$EventRequestImpl throw
>>>     > com.sun.jdi.request.InvalidRequestStateException if the request is
>>>     > deleted:
>>>     >    - getProperty()
>>>     >    - putProperty(Object, Object)
>>>     >    - suspendPolicy()
>>>     >    - isEnabled()
>>>     >
>>>     > Bug: https://bugs.openjdk.java.net/browse/JDK-4613913
>>>     > Webrev: http://cr.openjdk.java.net/~dtitov/4613913/webrev.02/
>>>     >
>>>     > Best regards,
>>>     > Daniil
>>>     >
>>>
>

From serguei.spitsyn at oracle.com Mon Apr 2 22:25:37 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn)
Date: Mon, 02 Apr 2018 15:25:37 -0700
Subject: Re: RFR 4613913: Four EventRequest methods are invokable on deleted request
Message-ID:

Hi David,

Somehow I can see your message from my smart phone only...
Thank you for confirming that you agree with this approach!
Thanks,
Serguei

From serguei.spitsyn at oracle.com Tue Apr 3 01:52:44 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 2 Apr 2018 18:52:44 -0700
Subject: RFR(xxs): 8200384: jcmd help output should be sorted
In-Reply-To:
References:
Message-ID: <1f24a1d0-9cac-4a77-6023-918cf864a298@oracle.com>

Hi Thomas,

Added the serviceability-dev mailing list as it is a Serviceability area.
The fix looks good to me.
One question:
Could you, please, post the sorted help output?
It is interesting how does it look like when sorted.

Thanks,
Serguei

On 3/28/18 13:08, Thomas Stüfe wrote:
> Hi all,
>
> may I get reviews for this tiny trivial change which causes jcmd help
> output (the command list) to be sorted?
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8200384
> webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8200384-jcmd-help-sorted/webrev.00/webrev/
>
> Thanks!
>
> Best Regards, Thomas

From shafi.s.ahmad at oracle.com Tue Apr 3 05:59:44 2018
From: shafi.s.ahmad at oracle.com (Shafi Ahmad)
Date: Mon, 2 Apr 2018 22:59:44 -0700 (PDT)
Subject: [8u] RFR for backport of "JDK-8165736: Error message should be shown when JVMTI agent cannot be attached" to jdk8u-dev
In-Reply-To:
References: <8c218a37-4a50-4b4f-847b-4c67e02b7866@default> <70a18b4a-a310-babe-1f41-c86100638457@oracle.com>
Message-ID: <18d65438-9a9d-4664-939c-74a2af1da73b@default>

Thank you David and Serguei.

I have uploaded the webrev for my reference.
http://cr.openjdk.java.net/~shshahma/8165736/hotspot.01/ Regards, Shafi > -----Original Message----- > From: Serguei Spitsyn > Sent: Tuesday, April 03, 2018 12:03 AM > To: David Holmes ; Shafi Ahmad > ; serviceability-dev at openjdk.java.net > Cc: Yasumasa Suenaga > Subject: Re: [8u] RFR for backport of "JDK-8165736: Error message should be > shown when JVMTI agent cannot be attached" to jdk8u-dev > > Hi Shafi, > > I agree with David. > Consider it reviewed if you add the initialization of ebuf. > > Thanks, > Serguei > > > On 3/31/18 00:24, David Holmes wrote: > > Hi Shafi, > > > > On 29/03/2018 7:11 PM, Shafi Ahmad wrote: > >> Hi, > >> > >> Please review the backport of ' JDK-8165736: Error message should be > >> shown when JVMTI agent cannot be attached' to jdk8u-dev. > >> Please note that this is not a clean backport because we can't not > >> backport native jtreg tests as? infrastructure of naive jtreg test > >> has been available since JDK 9. > > > > Ok. > > > >> webrev: http://cr.openjdk.java.net/~shshahma/8165736/ > >> jdk10 bug: https://bugs.openjdk.java.net/browse/JDK-8165736 > >> original patch pushed to jdk10: > >> http://hg.openjdk.java.net/jdk/jdk/rev/bc1cffa26561 > > > > src/share/vm/prims/jvmtiExport.cpp > > > > You missed the initalization of ebuf: > > > > +? char ebuf[1024] = {0}; > > > > Otherwise the functional backport seems okay. > > > > Thanks, > > David > > > >> Test:? 
Run jprt -testset hotspot, -testset core > >> > >> Regards, > >> Shafi > >> > From amit.sapre at oracle.com Tue Apr 3 10:08:14 2018 From: amit.sapre at oracle.com (Amit Sapre) Date: Tue, 3 Apr 2018 03:08:14 -0700 (PDT) Subject: RFR : JDK-8042215 - javax/management/remote/mandatory/connection/ReconnectTest.java NoSuchObjectException no such object in table Message-ID: <9851f5fa-86e5-4ee3-a303-44a90dd934d6@default> Hello, Please review changes for refactored test case As part of refactoring, 1) Removed iiop & jmxmp protocol related code 2) Added exception handling during connector connection. Webrev : http://cr.openjdk.java.net/~asapre/webrev/2018/JDK-8042215/webrev.00/ Bug ID : https://bugs.openjdk.java.net/browse/JDK-8042215 Thanks, Amit -------------- next part -------------- An HTML attachment was scrubbed... URL: From yasuenag at gmail.com Tue Apr 3 12:37:21 2018 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Tue, 3 Apr 2018 21:37:21 +0900 Subject: PING: RFR: 8199519: Several GC tests fails with: java.lang.NumberFormatException: Unparseable number: "-" In-Reply-To: <5c1975cd-1080-652e-c23a-abd693cc0095@oracle.com> References: <6755303f-a1a0-da4f-e1e0-a1bcb0c72efd@gmail.com> <7809552d-dfa0-5f26-bd82-c13df7f45f5f@oracle.com> <85853429-a520-1782-40e4-e05776aa639d@oracle.com> <40b04f2e-1d6c-524e-ea4a-08c42fd41ee6@gmail.com> <93a1ffeb-4959-3bdb-cbe3-510c258129b6@oracle.com> <5c1975cd-1080-652e-c23a-abd693cc0095@oracle.com> Message-ID: <33358f2d-4e01-7ccb-0f06-02b6828fe65b@gmail.com> PING: Could you review it? This change has been passed Mach5 test. >> > http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.03/ Thanks, Yasumasa On 2018/03/28 22:38, Stefan Johansson wrote: > Mach5 testing looks good. > > Can someone in the serviceability team do the second review? > > Cheers, > Stefan > > On 2018-03-28 13:32, Yasumasa Suenaga wrote: >> Thanks Stefan, >> I'm waiting for second reviewer. >> >> >> Yasumasa >> >> >> 2018?3?28?(?) 
18:36 Stefan Johansson >:
>>
>> Hi Yasumasa,
>>
>> Local testing looks good and I've kicked off some additional Mach5
>> testing that will include these tests on all platforms.
>>
>> Cheers,
>> Stefan
>>
>> On 2018-03-28 06:04, Yasumasa Suenaga wrote:
>> > Hi Stefan,
>> >
>> > Thank you for sharing your report!
>> > I could reproduce them on my VM.
>> >
>> > I've fixed them in new webrev, and it works fine on my environment.
>> > Could you check again?
>> >
>> > http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.03/
>> >
>> >
>> > Thanks,
>> >
>> > Yasumasa
>> >
>> >
>> > 2018-03-28 0:29 GMT+09:00 Stefan Johansson >:
>> >>
>> >> On 2018-03-27 16:44, Yasumasa Suenaga wrote:
>> >>> Hi Stefan,
>> >>>
>> >>> On 2018/03/27 22:45, Stefan Johansson wrote:
>> >>>> Hi Yasumasa,
>> >>>>
>> >>>> On 2018-03-27 10:56, Yasumasa Suenaga wrote:
>> >>>>> Hi Stefan,
>> >>>>>
>> >>>>> Thank you for your comment.
>> >>>>> I updated webrev:
>> >>>>>
>> >>>>>     webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.01/
>> >>>> I think the usage of Optional in Expression.setRequired(bool) is a bit
>> >>>> unnecessary. It will create temporary objects and there is no benefit over
>> >>>> just doing two simple if-statements.
>> >>>
>> >>> I fixed it in new webrev:
>> >>> http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.02/
>> >>>
>> >>>
>> >>>> I also ran this patch (and the one using forcibly) on my single core VM
>> >>>> and realized that this fix will have to include some awk-file updates to
>> >>>> make the tests in test/jdk/sun/tools/jstat pass when Serial is chosen as the
>> >>>> default collector. The tests in test/jdk/sun/tools/jstatd/ are fine.
>> >>>
>> >>> Can you share the failure report?
>> >> It relates to all tests that display the CGC and the CGCT columns, for
>> >> example in jstatGCOutput1.sh:
>> >>    S0C    S1C    S0U    S1U      EC      EU      OC     OU      MC      MU   CCSC   CCSU  YGC   YGCT  FGC   FGCT  CGC  CGCT    GCT
>> >>  256.0  256.0  254.0    0.0  2176.0  1025.0  5504.0  920.5  7168.0  6839.7  768.0  602.8    2  0.007    0  0.000    -     -  0.007
>> >>
>> >> The awk regex needs to be updated to handle '-' for these tests:
>> >> test: sun/tools/jstat/jstatGcCapacityOutput1.sh
>> >> Failed. Execution failed: exit code 1
>> >>
>> >> test: sun/tools/jstat/jstatGcMetaCapacityOutput1.sh
>> >> Failed. Execution failed: exit code 1
>> >>
>> >> test: sun/tools/jstat/jstatGcNewCapacityOutput1.sh
>> >> Failed. Execution failed: exit code 1
>> >>
>> >> test: sun/tools/jstat/jstatGcOldCapacityOutput1.sh
>> >> Failed. Execution failed: exit code 1
>> >>
>> >> test: sun/tools/jstat/jstatGcOldOutput1.sh
>> >> Failed. Execution failed: exit code 1
>> >>
>> >> test: sun/tools/jstat/jstatGcOutput1.sh
>> >> Failed. Execution failed: exit code 1
>> >>
>> >>
>> >>> If it occurs in jstatClassloadOutput1.sh, it relates to JDK-8173942.
>> >>>
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Yasumasa
>> >>>
>> >>>
>> >>>> Thanks,
>> >>>> Stefan
>> >>>>>     submit-hs: mach5-one-ysuenaga-JDK-8199519-20180327-0652-16322
>> >>>>>
>> >>>>>
>> >>>>> Thanks,
>> >>>>>
>> >>>>> Yasumasa
>> >>>>>
>> >>>>>
>> >>>>> 2018-03-27 0:03 GMT+09:00 Stefan Johansson
>> >>>>> >:
>> >>>>>> Hi Yasumasa,
>> >>>>>>
>> >>>>>> On 2018-03-22 11:35, Yasumasa Suenaga wrote:
>> >>>>>>> Hi all,
>> >>>>>>>
>> >>>>>>> Please review this change:
>> >>>>>>>
>> >>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8199519
>> >>>>>>> webrev: cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.00/
>> >>>>>> The fix seems to make things work as expected. Manually tested it
>> >>>>>> and
>> >>>>>> Mach5 also looks good.
>> >>>>>>
>> >>>>>> I have some comments regarding the patch. I think 'forcibly' should be
>> >>>>>> renamed to something more descriptive. Naming is never easy but I think
>> >>>>>> 'required' would be better, as in, this column is required and not
>> >>>>>> allowed
>> >>>>>> to print '-'.
That would also render the code in >> >>>>>> ExpressionResolver.java to >> >>>>>> be: >> >>>>>>? ? ?return new Literal(isRequired ? 0.0d : Double.NaN); >> >>>>>> I think that also better explains why we return 0 instead of NaN. >> >>>>>> >> >>>>>> I would also like to see the forcibly/required state moved into the >> >>>>>> Expression it self, that way we don't have to pass it around but can >> >>>>>> instead >> >>>>>> do: >> >>>>>>? ? ?return new Literal(e.isRequired() ? 0.0d : Double.NaN); >> >>>>>> >> >>>>>> Thanks, >> >>>>>> Stefan >> >>>>>> >> >>>>>> >> >>>>>>> After JDK-8153333, some jstat tests are failed because GCT in jstat >> >>>>>>> output >> >>>>>>> is dash (-) if garbage collector is not concurrent collector e.g. >> >>>>>>> Serial GC. >> >>>>>>> I fixed that GCT can be calculated correctly. >> >>>>>>> >> >>>>>>> This change has been tested on Mach5 by Stefan. >> >>>>>>> >> >>>>>>> >> >>>>>>> Thanks, >> >>>>>>> >> >>>>>>> Yasumasa >> >>>>>> >> > From bob.vandette at oracle.com Tue Apr 3 14:09:56 2018 From: bob.vandette at oracle.com (Bob Vandette) Date: Tue, 3 Apr 2018 10:09:56 -0400 Subject: RFR: 81820709 - Container Awareness JEP Message-ID: Here is a first pass at an implementation of the Container Awareness JEP. This JEP adds an implementation of an internal API for the extraction of system metrics for processes running in Isolation Groups (Containers). The plan is to get the internal API integrated in JDK 11 with support for Linux x64 and then follow this work up with support for alternate platforms, the addition of a JMX MBean and Java Flight Recorder. JEP: https://bugs.openjdk.java.net/browse/JDK-8182070 JAVADOC: http://cr.openjdk.java.net/~bobv/8182070/v01/javadoc/jdk/internal/platform/Metrics.html WEBREV: http://cr.openjdk.java.net/~bobv/8182070/v01/webrev WEBREV including a Prototype MBEAN for exposing these Metrics: This prototype will not be integrated as part of this JEP. It?s for information only. 
http://cr.openjdk.java.net/~bobv/8182070/v01/mbean-proto/

This feature adds a new -XshowSettings option "system" which displays the available system Metrics.

% java -XshowSettings:system
Operating System Metrics:
    Provider: cgroupv1
    Effective CPU Count: 24
    CPUTime per Processor:
        [0]: 52805305 (ns)
        [1]: 70799492 (ns)
        [2]: 27449618 (ns)
        [3]: 12957734 (ns)
        [4]: 38382720 (ns)
        [5]: 20325731 (ns)
        [6]: 36374924 (ns)
        [7]: 40279640 (ns)
        [8]: 17557347 (ns)
        [9]: 19056675 (ns)
        [10]: 66185888 (ns)
        [11]: 56539480 (ns)
        [12]: 10009386 (ns)
        [13]: 19139797 (ns)
        [14]: 2257349 (ns)
        [15]: 8712468 (ns)
        [16]: 10306911 (ns)
        [17]: 9814800 (ns)
        [18]: 3516611 (ns)
        [19]: 747174 (ns)
        [20]: 4380756 (ns)
        [21]: 11803118 (ns)
        [22]: 1076297 (ns)
        [23]: 8069315 (ns)
    CPU Usage is: 550599580 (ns)
    CPU User Usage is: 36 (ticks)
    CPU System Usage is: 10 (ticks)
    CPU Period: 100000
    CPU Quota: -1
    CPU Shares: -1
    CPU Number of Periods: 0
    CPU Number of Throttled Periods: 0
    CPU Throttled Time: 0
    CPUSet Exclusive: false
    CPUSet Memory Exclusive: false
    List of Processors, 24 total:
        0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
    List of Effective Processors, 24 total:
        0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
    List of Memory Nodes, 2 total:
        0 1
    List of Available Memory Nodes, 2 total:
        0 1
    CPUSet Memory Pressure Enabled: false
    CPUSet Memory Pressure: 0.0
    Memory Failed Count: 0
    Memory Limit: Unlimited
    Memory Used: 43.31M
    Max Memory Used: 48.82M
    Memory Soft Limit: Unlimited
    Memory & Swap Failed Count: 0.00K
    Memory & Swap Limit: Unlimited
    Memory & Swap Used: 43.93M
    Max Memory & Swap Used: 48.82M
    Kernel Memory Failed Count: 0.00K
    Kernel Memory Limit: Unlimited
    Kernel Memory Used: 0.00K
    Kernel Max Memory Used: 0.00K
    TCP Memory Failed Count: 0.00K
    TCP Memory Limit: Unlimited
    TCP Memory Used: 0.00K
    TCP Max Memory Used: 0.00K
    Out Of Memory Killer Enabled: true
    BLKIO: Number of I/O Operations Completed: 42
    BLKIO: Bytes Transferred from disk: 4923392

Bob Vandette

From Derek.White at cavium.com 
Tue Apr 3 22:54:07 2018
From: Derek.White at cavium.com (White, Derek)
Date: Tue, 3 Apr 2018 22:54:07 +0000
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To:
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>
Message-ID:

Thanks JC,

New patch applies cleanly. Compiles and runs (simple test programs) on aarch64.

- Derek

From: JC Beyler [mailto:jcbeyler at google.com]
Sent: Monday, April 02, 2018 1:17 PM
To: White, Derek
Cc: Erik Österlund ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev
Subject: Re: JDK-8171119: Low-Overhead Heap Profiling

Hi Derek,

I know there were a few things that went in that provoked a merge conflict. I worked on it and got it up to date. Sadly my lack of knowledge makes it a full rebase instead of keeping all the history. However, with a newly cloned jdk/hs you should now be able to use:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/

The change you are referring to was done with the others so perhaps you were unlucky and I forgot it in a webrev and fixed it in another? I don't know but it's been there and I checked, it is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html

I double checked that tlab_end_offset no longer appears in any architecture (as far as I can tell :)).

Thanks for testing and let me know if you run into any other issues!
Jc

On Fri, Mar 30, 2018 at 4:24 PM White, Derek wrote:

Hi Jc,

I've been having trouble getting your patch to apply correctly. I may have based it on the wrong version.

In any case, I think there's a missing update to macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), where "JavaThread::tlab_end_offset()" should become "JavaThread::tlab_current_end_offset()".

This should correspond to the other port's changes in templateTable_.cpp files.

Thanks!
- Derek

From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of JC Beyler
Sent: Wednesday, March 28, 2018 11:43 AM
To: Erik Österlund
Cc: serviceability-dev at openjdk.java.net; hotspot-compiler-dev
Subject: Re: JDK-8171119: Low-Overhead Heap Profiling

Hi all,

I've been working on deflaking the tests mostly and the wording in the JVMTI spec.

Here are the two incremental webrevs:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/

Here is the total webrev:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/

Here are the notes of this change:
- Currently the tests pass 100 times in a row, I am working on checking if they pass 1000 times in a row.
- The default sampling rate is set to 512k, this is what we use internally and having a default means that to enable the sampling with the default, the user only has to do an enable event/disable event via JVMTI (instead of enable + set sample rate).
- I deprecated the code that was handling the fast path tlab refill if it happened since this is now deprecated
  - Though I saw that Graal is still using it so I have to see what needs to be done there exactly

Finally, using the Dacapo benchmark suite, I noted a 1% overhead for when the event system is turned on and the callback to the native agent is just empty. I got a 3% overhead with a 512k sampling rate with the code I put in the native side of my tests.
Thanks and comments are appreciated, Jc On Mon, Mar 19, 2018 at 2:06 PM JC Beyler > wrote: Hi all, The incremental webrev update is here: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ The full webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ Major change here is: - I've removed the heapMonitoring.cpp code in favor of just having the sampling events as per Serguei's request; I still have to do some overhead measurements but the tests prove the concept can work - Most of the tlab code is unchanged, the only major part is that now things get sent off to event collectors when used and enabled. - Added the interpreter collectors to handle interpreter execution - Updated the name from SetTlabHeapSampling to SetHeapSampling to be more generic - Added a mutex for the thread sampling so that we can initialize an internal static array safely - Ported the tests from the old system to this new one I've also updated the JEP and CSR to reflect these changes: https://bugs.openjdk.java.net/browse/JDK-8194905 https://bugs.openjdk.java.net/browse/JDK-8171119 In order to make this have some forward progress, I've removed the heap sampling code entirely and now rely entirely on the event sampling system. The tests reflect this by using a simplified implementation of what an agent could do: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c (Search for anything mentioning event_storage). I have not taken the time to port the whole code we had originally in heapMonitoring to this. I hesitate only because that code was in C++, I'd have to port it to C and this is for tests so perhaps what I have now is good enough? 
As far as testing goes, I've ported all the relevant tests and then added a few: - Turning the system on/off - Testing using various GCs - Testing using the interpreter - Testing the sampling rate - Testing with objects and arrays - Testing with various threads Finally, as overhead goes, I have the numbers of the system off vs a clean build and I have 0% overhead, which is what we'd want. This was using the Dacapo benchmarks. I am now preparing to run a version with the events on using dacapo and will report back here. Any comments are welcome :) Jc On Thu, Mar 8, 2018 at 4:00 PM JC Beyler > wrote: Hi all, I apologize for the delay but I wanted to add an event system and that took a bit longer than expected and I also reworked the code to take into account the deprecation of FastTLABRefill. This update has four parts: A) I moved the implementation from Thread to ThreadHeapSampler inside of Thread. Would you prefer it as a pointer inside of Thread or like this works for you? Second question would be would you rather have an association outside of Thread altogether that tries to remember when threads are live and then we would have something like: ThreadHeapSampler::get_sampling_size(this_thread); I worry about the overhead of this but perhaps it is not too too bad? B) I also have been working on the Allocation event system that sends out a notification at each sampled event. This will be practical when wanting to do something at the allocation point. I'm also looking at if the whole heapMonitoring code could not reside in the agent code and not in the JDK. I'm not convinced but I'm talking to Serguei about it to see/assess :) - Also added two tests for the new event subsystem C) Removed the slow_path fields inside the TLAB code since now FastTLABRefill is deprecated D) Updated the JVMTI documentation and specification for the methods. 
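[Editorial sketch, not part of the original message] The per-thread sampler described in (A) can be pictured as a byte countdown kept next to each thread: every allocation is charged against the remaining budget, and the allocation that exhausts it is the one reported to the event system from (B). The sketch below is a guess at that shape only. The class name ThreadHeapSamplerSketch and both member functions are hypothetical, the interval is fixed rather than randomized, and the 512k default mentioned earlier in the thread is read here as bytes, which is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread sampler state; NOT the class from the webrev.
// Assumption: a fixed sampling interval expressed in bytes.
class ThreadHeapSamplerSketch {
 public:
  explicit ThreadHeapSamplerSketch(std::size_t interval_bytes)
      : interval_bytes_(interval_bytes), bytes_until_sample_(interval_bytes) {}

  // Charges one allocation against the remaining budget. Returns true when
  // the allocation uses up the budget, i.e. when it should be sampled.
  bool record_allocation(std::size_t size_bytes) {
    if (size_bytes < bytes_until_sample_) {
      bytes_until_sample_ -= size_bytes;
      return false;
    }
    bytes_until_sample_ = interval_bytes_;  // start a new interval
    return true;
  }

  std::size_t bytes_until_sample() const { return bytes_until_sample_; }

 private:
  std::size_t interval_bytes_;      // configured sampling interval
  std::size_t bytes_until_sample_;  // budget left before the next sample
};
```

Keeping this state inside the thread object, as (A) does, makes the check a plain field access on the allocating thread; the lookup-by-thread alternative raised in (A) would add a table lookup to the same path.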
So the incremental webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ and the full webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 I believe I have updated the various JIRA issues that track this :) Thanks for your input, Jc On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler > wrote: Hi Erik, I inlined my answers, which the last one seems to answer Robbin's concerns about the same thing (adding things to Thread). On Wed, Feb 14, 2018 at 2:51 AM, Erik ?sterlund > wrote: Hi JC, Comments are inlined below. On 2018-02-13 06:18, JC Beyler wrote: Hi Erik, Thanks for your answers, I've now inlined my own answers/comments. I've done a new webrev here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ The incremental is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ Note to all: - I've been integrating changes from Erin/Serguei/David comments so this webrev incremental is a bit an answer to all comments in one. I apologize for that :) On Mon, Feb 12, 2018 at 6:05 AM, Erik ?sterlund > wrote: Hi JC, Sorry for the delayed reply. Inlined answers: On 2018-02-06 00:04, JC Beyler wrote: Hi Erik, (Renaming this to be folded into the newly renamed thread :)) First off, thanks a lot for reviewing the webrev! I appreciate it! I updated the webrev to: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ And the incremental one is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ It contains: - The change for since from 9 to 11 for the jvmti.xml - The use of the OrderAccess for initialized - Clearing the oop I also have inlined my answers to your comments. The biggest question will come from the multiple *_end variables. A bit of the logic there is due to handling the slow path refill vs fast path refill and checking that the rug was not pulled underneath the slowpath. I believe that a previous comment was that TlabFastRefill was going to be deprecated. 
If this is true, we could revert this code a bit and just do a : if TlabFastRefill is enabled, disable this. And then deprecate that when TlabFastRefill is deprecated. This might simplify this webrev and I can work on a follow-up that either: removes TlabFastRefill if Robbin does not have the time to do it or add the support to the assembly side to handle this correctly. What do you think? I support removing TlabFastRefill, but I think it is good to not depend on that happening first. I'm slowly pushing on the FastTLABRefill (https://bugs.openjdk.java.net/browse/JDK-8194084), I agree on keeping both separate for now though so that we can think of both differently Now, below, inlined are my answers: On Fri, Feb 2, 2018 at 8:44 AM, Erik ?sterlund > wrote: Hi JC, Hope I am reviewing the right version of your work. Here goes... src/hotspot/share/gc/shared/collectedHeap.inline.hpp: 159 AllocTracer::send_allocation_outside_tlab(klass, result, size * HeapWordSize, THREAD); 160 161 THREAD->tlab().handle_sample(THREAD, result, size); 162 return result; 163 } Should not call tlab()->X without checking if (UseTLAB) IMO. Done! More about this later. src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: So first of all, there seems to quite a few ends. There is an "end", a "hard end", a "slow path end", and an "actual end". Moreover, it seems like the "hard end" is actually further away than the "actual end". So the "hard end" seems like more of a "really definitely actual end" or something. I don't know about you, but I think it looks kind of messy. In particular, I don't feel like the name "actual end" reflects what it represents, especially when there is another end that is behind the "actual end". 413 HeapWord* ThreadLocalAllocBuffer::hard_end() { 414 // Did a fast TLAB refill occur? 415 if (_slow_path_end != _end) { 416 // Fix up the actual end to be now the end of this TLAB. 
417 _slow_path_end = _end; 418 _actual_end = _end; 419 } 420 421 return _actual_end + alignment_reserve(); 422 } I really do not like making getters unexpectedly have these kind of side effects. It is not expected that when you ask for the "hard end", you implicitly update the "slow path end" and "actual end" to new values. As I said, a lot of this is due to the FastTlabRefill. If I make this not supporting FastTlabRefill, this goes away. The reason the system needs to update itself at the get is that you only know at that get if things have shifted underneath the tlab slow path. I am not sure of really better names (naming is hard!), perhaps we could do these names: - current_tlab_end // Either the allocated tlab end or a sampling point - last_allocation_address // The end of the tlab allocation - last_slowpath_allocated_end // In case a fast refill occurred the end might have changed, this is to remember slow vs fast past refills the hard_end method can be renamed to something like: tlab_end_pointer() // The end of the lab including a bit of alignment reserved bytes Those names sound better to me. Could you please provide a mapping from the old names to the new names so I understand which one is which please? This is my current guess of what you are proposing: end -> current_tlab_end actual_end -> last_allocation_address slow_path_end -> last_slowpath_allocated_end hard_end -> tlab_end_pointer Yes that is correct, that was what I was proposing. I would prefer this naming: end -> slow_path_end // the end for taking a slow path; either due to sampling or refilling actual_end -> allocation_end // the end for allocations slow_path_end -> last_slow_path_end // last address for slow_path_end (as opposed to allocation_end) hard_end -> reserved_end // the end of the reserved space of the TLAB About setting things in the getter... that still seems like a very unpleasant thing to me. 
It would be better to inspect the call hierarchy and explicitly update the ends where they need updating, and assert in the getter that they are in sync, rather than implicitly setting various ends as a surprising side effect in a getter. It looks like the call hierarchy is very small. With my new naming convention, reserved_end() would presumably return _allocation_end + alignment_reserve(), and have an assert checking that _allocation_end == _last_slow_path_allocation_end, complaining that this invariant must hold, and that a caller to this function, such as make_parsable(), must first explicitly synchronize the ends as required, to honor that invariant.

I've renamed the variables to how you preferred it except for the _end one. I did:

 current_end
 last_allocation_address
 tlab_end_ptr

The reason is that the architecture dependent code uses the thread.hpp API and it already has tlab included in the name, so it becomes tlab_current_end (which is better than tlab_current_tlab_end in my opinion).

I also moved the update into a separate method with a TODO that says to remove it when FastTLABRefill is deprecated.

This looks a lot better now. Thanks.

Note that the following comment now needs updating accordingly in threadLocalAllocBuffer.hpp:

  41 // Heap sampling is performed via the end/actual_end fields.
  42 // actual_end contains the real end of the tlab allocation,
  43 // whereas end can be set to an arbitrary spot in the tlab to
  44 // trip the return and sample the allocation.
  45 // slow_path_end is used to track if a fast tlab refill occured
  46 // between slowpath calls.

There might be other comments too, I have not looked in detail.

This was the only spot that still had an actual_end, I fixed it now. I'll do a sweep to double check other comments.

Not sure it's better but before updating the webrev, I wanted to try to get input/consensus :)

(Note hard_end was always further off than end).

src/hotspot/share/prims/jvmti.xml:

 10357
 10358
 10359       Can sample the heap.
 10360       If this capability is enabled then the heap sampling methods can be called.
 10361
 10362

Looks like this capability should not be "since 9" if it gets integrated now.

Updated now to 11, crossing my fingers :)

src/hotspot/share/runtime/heapMonitoring.cpp:

 448       if (is_alive->do_object_b(value)) {
 449         // Update the oop to point to the new object if it is still alive.
 450         f->do_oop(&(trace.obj));
 451
 452         // Copy the old trace, if it is still live.
 453         _allocated_traces->at_put(curr_pos++, trace);
 454
 455         // Store the live trace in a cache, to be served up on /heapz.
 456         _traces_on_last_full_gc->append(trace);
 457
 458         count++;
 459       } else {
 460         // If the old trace is no longer live, add it to the list of
 461         // recently collected garbage.
 462         store_garbage_trace(trace);
 463       }

In the case where the oop was not live, I would like it to be explicitly cleared.

Done I think how you wanted it. Let me know because I'm not familiar with the RootAccess API. I'm unclear if I'm doing this right or not so reviews of these parts are highly appreciated. Robbin had talked of perhaps later pushing this all into an OopStorage, should I do this now do you think? Or can that wait for a second webrev later down the road?

I think using handles can and should be done later. You can use the Access API now. I noticed that you are missing an #include "oops/access.inline.hpp" in your heapMonitoring.cpp file.

The missing header is there for me so I don't know, I made sure it is present in the latest webrev. Sorry about that.

+ Did I clear it the way you wanted me to or were you thinking of something else?

That is precisely how I wanted it to be cleared. Thanks.

+ Final question here, seems like if I were to want to not do the f->do_oop directly on the trace.obj, I'd need to do something like:

    f->do_oop(&value);
    ...
    trace->store_oop(value);

to update the oop internally. Is that right/is that one of the advantages of going to the Oopstorage sooner than later?
I think you really want to do the do_oop on the root directly. Is there a particular reason why you would not want to do that? Otherwise, yes - the benefit with using the handle approach is that you do not need to call do_oop explicitly in your code.

There is no reason except that now we have a load_oop and a get_oop_addr, I was not sure what you would think of that.

That's fine.

Also I see a lot of concurrent-looking use of the following field:

 267   volatile bool _initialized;

Please note that the "volatile" qualifier does not help with reordering here. Reordering between volatile and non-volatile fields is completely free for both compiler and hardware, except for Windows with MSVC, where volatile semantics is defined to use acquire/release semantics, and the hardware is TSO. But for the general case, I would expect this field to be stored with OrderAccess::release_store and loaded with OrderAccess::load_acquire. Otherwise it is not thread safe.

Because everything is behind a mutex, I wasn't really worried about this. I have a test that has multiple threads trying to hit this corner case and it passes. However, to be paranoid, I updated it to using the OrderAccess API now, thanks! Let me know what you think there too!

If it is indeed always supposed to be read and written under a mutex, then I would strongly prefer to have it accessed as a normal non-volatile member, and have an assertion that the given lock is held or we are in a safepoint, as we do in many other places. Something like this:

 assert(HeapMonitorStorage_lock->owned_by_self() || (SafepointSynchronize::is_at_safepoint() && Thread::current()->is_VM_thread()), "this should not be accessed concurrently");

It would be confusing to people reading the code if there are uses of OrderAccess that are actually always protected under a mutex.

Thank you for the exact example to be put in the code!
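For readers less familiar with the release/acquire pairing mentioned for _initialized, here is a minimal sketch using standard C++ std::atomic as an analogue. It is an assumption that std::memory_order_release and std::memory_order_acquire are a fair stand-in for OrderAccess::release_store and OrderAccess::load_acquire; the struct PublishedStorage and its members are hypothetical and do not come from the webrev.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical storage whose flag may be read without holding a lock.
struct PublishedStorage {
  int payload = 0;                       // plain data set up before publishing
  std::atomic<bool> initialized{false};  // publication flag

  void initialize(int value) {
    payload = value;
    // Release: the payload write above cannot be reordered after this store.
    initialized.store(true, std::memory_order_release);
  }

  // Returns true and fills *out only if the storage has been published.
  bool try_read(int* out) const {
    // Acquire: a reader that sees true also sees the payload written
    // before the matching release store.
    if (!initialized.load(std::memory_order_acquire)) return false;
    *out = payload;
    return true;
  }
};
```

A plain `volatile bool` gives no such ordering guarantee between the flag and the payload, which is the point being made about the field above. If the flag is only ever touched under a mutex, none of this is needed and an ownership assertion documents the rule better.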
I put it around each access/assignment of the _initialized field and found one case where yes you can touch it and not have the lock. It actually is "ok" because you don't act on the storage until later and only when you really want to modify the storage (see the object_alloc_do_sample method which calls the add_trace method).

But, because of this, I'm going to put the OrderAccess here, I'll do some performance numbers later and if there are issues, I might add an "unsafe" read and a "safe" one to make it explicit to the reader. But I don't think it will come to that.

Okay. This double return in heapMonitoring.cpp looks wrong:

 283   bool initialized() {
 284     return OrderAccess::load_acquire(&_initialized) != 0;
 285     return _initialized;
 286   }

Since you said object_alloc_do_sample() is the only place where you do not hold the mutex while reading initialized(), I had a closer look at that. It looks like in its current shape, the lack of a mutex may lead to a memory leak. In particular, it first checks if (initialized()). Let's assume this is now true. It then allocates a bunch of stuff, and checks if the number of frames was over 0. If it was, it calls StackTraceStorage::storage()->add_trace() seemingly hoping that after grabbing the lock in there, initialized() will still return true. But it could now return false and skip doing anything, in which case the allocated stuff will never be freed.

I fixed this now by making add_trace return a boolean and checking for that. It will be in the next webrev. Thanks, the truth is that in our implementation the system is always on or off, so this never really occurs :). In this version though, that is not true and it's important to handle so thanks again!

So the analysis seems to be that _initialized is only used outside of the mutex in one instance, where it is used to perform double-checked locking, that actually causes a memory leak.

I am not proposing how to fix that, just raising the issue.
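The leak scenario and the "add_trace returns a boolean" fix can be sketched as follows. This is an illustration under stated assumptions, not the HotSpot code: StorageSketch, TraceSketch and sample_allocation are hypothetical names, std::mutex stands in for HeapMonitorStorage_lock, and std::unique_ptr is used so that a declined trace is released automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sampled stack trace.
struct TraceSketch {
  std::vector<int> frames;
};

// Hypothetical stand-in for StackTraceStorage: the flag is a plain member
// and is only ever touched with the lock held.
class StorageSketch {
 public:
  void set_initialized(bool on) {
    std::lock_guard<std::mutex> guard(_lock);
    _initialized = on;
    if (!on) _traces.clear();
  }

  // Takes ownership of `trace` only when it returns true.
  bool add_trace(std::unique_ptr<TraceSketch>& trace) {
    std::lock_guard<std::mutex> guard(_lock);
    if (!_initialized) return false;  // decided under the lock
    _traces.push_back(std::move(trace));
    return true;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(_lock);
    return _traces.size();
  }

 private:
  std::mutex _lock;
  bool _initialized = false;
  std::vector<std::unique_ptr<TraceSketch>> _traces;
};

// Caller side: whatever was observed before taking the lock, the outcome is
// decided by add_trace(). If it declines, `trace` still owns the allocation
// and frees it when it goes out of scope, so nothing leaks.
bool sample_allocation(StorageSketch& storage, std::vector<int> frames) {
  if (frames.empty()) return false;
  std::unique_ptr<TraceSketch> trace(new TraceSketch{std::move(frames)});
  return storage.add_trace(trace);
}
```

The design choice is that ownership only moves on success; with raw allocations the caller would instead have to check the returned boolean and free the trace itself.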
If you still want to perform this double-checked locking somehow, then the use of acquire/release still seems odd, because the memory ordering restrictions of it never come into play in this particular case. If they ever did, then the use of destroy_stuff(); release_store(_initialized, 0) would be broken anyway, as that would imply that whatever concurrent reader there ever was, after reading _initialized with load_acquire(), could *never* read the data that is concurrently destroyed anyway. I would be biased to think that RawAccess::load/store looks like a more appropriate solution, given that the memory leak issue is resolved. I do not know how painful it would be to not perform this double-checked locking.

So I agree with this entirely. I looked also a bit more and the difference and code really stems from our internal version. In this version however, there are actually a lot of things going on that I did not go entirely through in my head but this comment made me ponder a bit more on it.

Since every object_alloc_do_sample is protected by a check to HeapMonitoring::enabled(), there is only a small chance that the call is happening when things have been disabled. So there is no real need to do a first check on the initialized, it is a rare occurrence that a call happens to object_alloc_do_sample and the initialized of the storage returns false.

(By the way, even if you did call object_alloc_do_sample without looking at HeapMonitoring::enabled(), that would be ok too. You would gather the stacktrace and get nowhere at the add_trace call, which would return false; so though not optimal performance wise, nothing would break).

Furthermore, the add_trace is really the moment of no return and we have the mutex lock and then the initialized check. So, in the end, I did two things: I removed that first check and then I removed the OrderAccess for the storage initialized.
I think now I have a better grasp and understanding why it was done in our code and why it is not needed here. Thanks for pointing it out :). This now still passes my JTREG tests, especially the threaded one.

As a kind of meta comment, I wonder if it would make sense to add sampling for non-TLAB allocations. Seems like if someone is rapidly allocating a whole bunch of 1 MB objects that never fit in a TLAB, I might still be interested in seeing that in my traces, and not get surprised that the allocation rate is very high yet not showing up in any profiles.

That is handled by the handle_sample where you wanted me to put a UseTlab because you hit that case if the allocation is too big.

I see. It was not obvious to me that non-TLAB sampling is done in the TLAB class. That seems like an abstraction crime. What I wanted in my previous comment was that we do not call into the TLAB when we are not using TLABs. If there is sampling logic in the TLAB that is used for something else than TLABs, then it seems like that logic simply does not belong inside of the TLAB. It should be moved out of the TLAB, and instead have the TLAB call this common abstraction that makes sense.

So in the incremental version: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/, this is still a "crime". The reason is that the system has to have the bytes_until_sample on a per-thread level and it made "sense" to have it with the TLAB implementation. Also, I was not sure how people felt about adding something to the thread instance instead.

Do you think it fits better at the Thread level? I can see how difficult it is to make it happen there and add some logic there. Let me know what you think.

We have an unfortunate situation where everyone that has some fields that are thread local tend to dump them right into Thread, making the size and complexity of Thread grow as it becomes tightly coupled with various unrelated subsystems.
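To make the per-thread bytes_until_sample bookkeeping concrete, here is a minimal sketch of a small sampler class held by value in a thread object, so that the same counter serves TLAB and non-TLAB allocations alike. Everything in it is an assumption for illustration: HeapSamplerSketch, ThreadSketch and record_allocation are hypothetical names, and a fixed interval is used only to keep the example simple.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread sampler: owns the bytes_until_sample counter so
// that neither the TLAB nor the thread class has to carry the logic.
class HeapSamplerSketch {
 public:
  explicit HeapSamplerSketch(std::size_t interval)
      : _interval(interval), _bytes_until_sample(interval) {}

  // Accounts for an allocation of `size` bytes, from a TLAB or not.
  // Returns true when this allocation crosses the sampling point.
  bool record_allocation(std::size_t size) {
    if (size < _bytes_until_sample) {
      _bytes_until_sample -= size;
      return false;
    }
    _bytes_until_sample = _interval;  // fixed interval, for simplicity
    return true;
  }

  std::size_t bytes_until_sample() const { return _bytes_until_sample; }

 private:
  std::size_t _interval;
  std::size_t _bytes_until_sample;
};

// Hypothetical thread object holding the sampler as a value member.
struct ThreadSketch {
  HeapSamplerSketch heap_sampler{512u * 1024u};  // 512k, the default mentioned in this thread
};
```

In this shape a 1 MB allocation that never fits in a TLAB still goes through record_allocation() and is sampled, which addresses the concern about large objects not showing up in profiles.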
It would be desirable to have a separate class for this instead that encapsulates the sampling logic. That class could possibly reside in Thread though as a value object of Thread.

I imagined that would be the case but was not sure. I will look at the example that Robbin is talking about (ThreadSMR) and will see how to refactor my code to use that.

Thanks again for your help,
Jc

Hope I have answered your questions and that my feedback makes sense to you.

You have and thank you for them, I think we are getting to a cleaner implementation and things are getting better and more readable :)

Yes it is getting better.

Thanks,
/Erik

Thanks for your help!
Jc

Thanks,
/Erik

I double checked by changing the test
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java
to use a smaller Tlab (2048) and made the object bigger and it goes through that and passes.

Thanks again for your review and I look forward to your pointers for the questions I now have raised!
Jc

Thanks,
/Erik

On 2018-01-26 06:45, JC Beyler wrote:

Thanks Robbin for the reviews :)

The new full webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/
The incremental webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/

I inlined my answers:

On Thu, Jan 25, 2018 at 1:15 AM, Robbin Ehn wrote:

Hi JC, great to see another revision!

####
heapMonitoring.cpp

StackTraceData should not contain the oop for 'safety' reasons. When StackTraceData is moved from _allocated_traces:

 L452 store_garbage_trace(trace);

it contains a dead oop. _allocated_traces could instead be a tuple of oop and StackTraceData thus dead oops are not kept.

Done. I used inheritance to make the copier work regardless but the idea is the same.

You should use the new Access API for loading the oop, something like this:

 RootAccess::load(...)
I don't think you need to use Access API for clearing the oop, but it would look nicer. And you probably shouldn't be using:

 Universe::heap()->is_in_reserved(value)

I am unfamiliar with this but I think I did do it like you wanted me to (all tests pass so that's a start). I'm not sure how to clear the oop exactly, is there somewhere that does that, which I can use to do the same?

I removed the is_in_reserved, this came from our internal version, I don't know why it was there but my tests work without so I removed it :)

The lock:

 L424 MutexLocker mu(HeapMonitorStorage_lock);

Is not needed as far as I can see. weak_oops_do is called in a safepoint, no TLAB allocation can happen and JVMTI thread can't access these data-structures. Is there something more to this lock that I'm missing?

Since a thread can call the JVMTI getLiveTraces (or any of the other ones), it can get to the point of trying to copy the _allocated_traces. I imagine it is possible that this is happening during a GC or that it can be started and a GC happens afterwards. Therefore, it seems to me that you want this protected, no?

####
You have 6 files without any changes in them (any more):

 g1CollectedHeap.cpp
 psMarkSweep.cpp
 psParallelCompact.cpp
 genCollectedHeap.cpp
 referenceProcessor.cpp
 thread.hpp

Done.

####
I have not looked closely, but is it possible to hide heap sampling in AllocTracer? (with some minor changes to the AllocTracer API)

I am imagining that you are saying to move the code that does the sampling (change the tlab end, do the call to HeapMonitoring, etc.) into the AllocTracer code itself? I think that is right and I'll look if that is possible and prepare a webrev to show what would be needed to make that happen.

####
Minor nit, when declaring pointers there is a little mix of having the pointer adjacent to the type name and to the data name. (Most hotspot code is by type name) E.g.

 heapMonitoring.cpp:711 jvmtiStackTrace *trace = ....
 heapMonitoring.cpp:733 Method* m = vfst.method();

(not just this file)

Done!

####
HeapMonitorThreadOnOffTest.java:77

I would make g_tmp volatile, otherwise the assignment in loop may theoretically be skipped.

Also done!

Thanks again!
Jc

From christoph.langer at sap.com Wed Apr 4 12:34:36 2018
From: christoph.langer at sap.com (Langer, Christoph)
Date: Wed, 4 Apr 2018 12:34:36 +0000
Subject: RFR(xxs): 8200384: jcmd help output should be sorted
In-Reply-To: <1f24a1d0-9cac-4a77-6023-918cf864a298@oracle.com>
References: <1f24a1d0-9cac-4a77-6023-918cf864a298@oracle.com>
Message-ID:

Hi Thomas,

I like the fix, too. Maybe you can add example output before and after sorting to the bug.

Thanks
Christoph

> -----Original Message-----
> From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of serguei.spitsyn at oracle.com
> Sent: Tuesday, April 3, 2018 03:53
> To: Thomas Stüfe; Hotspot dev runtime; serviceability-dev at openjdk.java.net
> Subject: Re: RFR(xxs): 8200384: jcmd help output should be sorted
>
> Hi Thomas,
>
> Added the serviceability-dev mailing list as it is a Serviceability area.
>
> The fix looks good to me.
> One question:
>   Could you, please, post the sorted help output?
>   It is interesting to see what it looks like when sorted.
>
> Thanks,
> Serguei
>
>
> On 3/28/18 13:08, Thomas Stüfe wrote:
> > Hi all,
> >
> > may I get reviews for this tiny trivial change which causes jcmd help output (the command list) to be sorted?
> >
> > bug: https://bugs.openjdk.java.net/browse/JDK-8200384
> > webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8200384-jcmd-help-sorted/webrev.00/webrev/
> >
> > Thanks!
> >
> > Best Regards, Thomas

From daniil.x.titov at oracle.com Wed Apr 4 17:45:14 2018
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Wed, 04 Apr 2018 10:45:14 -0700
Subject: CANCELED: RFR 4613913: Four EventRequest methods are invokable on deleted request
Message-ID: <3C232EEE-396C-4BB7-9153-28C1C641C10A@oracle.com>

Hello,

Based on the discussion below I am canceling this review. The issue will be addressed by changing the test that resides out of the open repository.

Thanks!

Best regards,
Daniil

On 4/2/18, 3:02 PM, "David Holmes" wrote:

Hi Serguei,

On 3/04/2018 7:44 AM, serguei.spitsyn at oracle.com wrote:
> Hi David and Daniil,
>
>
> David,
>
> Thank you for raising this concern.
> You are right.
>
> I made a mistake when I looked at the EventRequest.isEnabled() spec and thought that the following spec lines of setEnabled() belong to isEnabled() and the other 3 methods as well:
>
> Throws:
> InvalidRequestStateException - if this request has been deleted.
>
> In fact, the JDI spec for methods isEnabled(), getProperty(), putProperty() and suspendPolicy() does not say they can throw the InvalidRequestStateException.
>
> So, now I'd suggest to just relax the test checks by not expecting an InvalidRequestStateException from isEnabled(), getProperty(), putProperty() and suspendPolicy().
>
> Would this approach resolve your concern?

Yes. The semantics for these methods was established way back in 2000 under:

https://bugs.openjdk.java.net/browse/JDK-4320478

I think this bug, 4613913, was misguided in expecting all of the methods to throw the exception. You could make a case for doing so, but as I said that's a spec change that should have been made back then. Changing the spec now seems pointless - it gains nothing but introduces an incompatible behaviour change.

Changing the test is the way to go.
Thanks,
David
-----

> Thanks,
> Serguei
>
>
>
>
> On 3/29/18 17:12, David Holmes wrote:
>> Daniil,
>>
>> Even as far back as 2007 there was concern that changing the current behaviour might break existing code. That has to be an even bigger concern now!
>>
>> Further the spec is sloppy here:
>>
>> " Once the eventRequest is deleted, no operations (for example, EventRequest.setEnabled(boolean)) are permitted."
>>
>> This is too loose. What is an "operation"? Is a query like isEnabled() really an "operation"? I would not consider it so. And if we can delete requests why is there no "isDeleted" query? The spec seems incomplete and too vague.
>>
>> To me this is something that should have been clarified in the spec first and then the implementation brought into alignment. But that should have happened many years ago. Changing this now seems risky to me.
>>
>> This change in long standing behaviour also requires a CSR request if it is to proceed.
>>
>> David
>> -----
>>
>>
>> On 30/03/2018 8:36 AM, Daniil Titov wrote:
>>> Hi Serguei,
>>>
>>> Please review a new version of the fix that has these places corrected.
>>>
>>> Webrev: http://cr.openjdk.java.net/~dtitov/4613913/webrev.03
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-4613913
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Daniil
>>>
>>> On 3/29/18, 11:46 AM, "serguei.spitsyn at oracle.com" wrote:
>>>
>>> Hi Daniil,
>>> It looks good in general.
>>> One minor comment is that it would be nice to make a cleanup (as we already discussed) for all places like this:
>>>   202             if (isEnabled() || deleted) {
>>>   203                 throw invalidState();
>>>   204             }
>>> As the isEnabled() now checks for deleted and throws the invalidState() then we can simplify these fragments to be:
>>>   202             if (isEnabled()) {
>>>   203                 throw invalidState();
>>>   204             }
>>> Thanks,
>>> Serguei
>>> On 3/29/18 10:27, Daniil Titov wrote:
>>> > Please review the changes that ensure that no operation on deleted com.sun.jdi.request.EventRequest objects are permitted as per JDI specification for com.sun.jdi.request.EventRequestManager.deleteEventRequest(com.sun.jdi.request.EventRequest) method. The fix makes the following 4 methods in class com.sun.tools.jdi.EventRequestManagerImpl$EventRequestImpl throw com.sun.jdi.request.InvalidRequestStateException if the request is deleted:
>>> > - getProperty()
>>> > - putProperty(Object, Object)
>>> > - suspendPolicy()
>>> > - isEnabled()
>>> >
>>> > Bug: https://bugs.openjdk.java.net/browse/JDK-4613913
>>> > Webrev: http://cr.openjdk.java.net/~dtitov/4613913/webrev.02/
>>> >
>>> > Best regards,
>>> > Daniil
>>> >
>>> >

From gary.adams at oracle.com Wed Apr 4 18:18:35 2018
From: gary.adams at oracle.com (Gary Adams)
Date: Wed, 04 Apr 2018 14:18:35 -0400
Subject: RFR: JDK-8199782: Fix compilation warnings detected by Solaris Developer Studio 12.6
Message-ID: <5AC516FB.9010101@oracle.com>

Getting the sources ready for the next Solaris developer studio toolchain.

  Issue: https://bugs.openjdk.java.net/browse/JDK-8199782
  Webrev: http://cr.openjdk.java.net/~gadams/8199782/webrev.00/

This update conditionally disables some new error checks, if the new toolchain is used.
From serguei.spitsyn at oracle.com Wed Apr 4 18:41:07 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Wed, 4 Apr 2018 11:41:07 -0700
Subject: RFR: JDK-8199782: Fix compilation warnings detected by Solaris Developer Studio 12.6
In-Reply-To: <5AC516FB.9010101@oracle.com>
References: <5AC516FB.9010101@oracle.com>
Message-ID: <3d192086-7fdd-4bc3-337a-6e2e34b1e99f@oracle.com>

Hi Gary,

It looks reasonable. I'm not very familiar with the concrete SolStudio versions.

Thanks,
Serguei

On 4/4/18 11:18, Gary Adams wrote:
> Getting the sources ready for the next Solaris developer studio toolchain.
>
>   Issue: https://bugs.openjdk.java.net/browse/JDK-8199782
>   Webrev: http://cr.openjdk.java.net/~gadams/8199782/webrev.00/
>
> This update conditionally disables some new error checks, if the new toolchain is used.

From david.holmes at oracle.com Thu Apr 5 00:00:02 2018
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 5 Apr 2018 10:00:02 +1000
Subject: RFR: JDK-8199782: Fix compilation warnings detected by Solaris Developer Studio 12.6
In-Reply-To: <5AC516FB.9010101@oracle.com>
References: <5AC516FB.9010101@oracle.com>
Message-ID:

Hi Gary,

On 5/04/2018 4:18 AM, Gary Adams wrote:
> Getting the sources ready for the next Solaris developer studio toolchain.
>
>   Issue: https://bugs.openjdk.java.net/browse/JDK-8199782
>   Webrev: http://cr.openjdk.java.net/~gadams/8199782/webrev.00/
>
> This update conditionally disables some new error checks, if the new toolchain is used.

This looks odd:

 231     DISABLED_WARNINGS_solstudio := $(DISABLED_WARNINGS_solstudio), \

as it is self-referential. Should you use a different variable name? Is there an issue if this variable has not been set?

Otherwise seems okay.
Thanks,
David

From erik.joelsson at oracle.com Thu Apr 5 00:05:43 2018
From: erik.joelsson at oracle.com (Erik Joelsson)
Date: Wed, 4 Apr 2018 17:05:43 -0700
Subject: RFR: JDK-8199782: Fix compilation warnings detected by Solaris Developer Studio 12.6
In-Reply-To:
References: <5AC516FB.9010101@oracle.com>
Message-ID:

On 2018-04-04 17:00, David Holmes wrote:
> Hi Gary,
>
> On 5/04/2018 4:18 AM, Gary Adams wrote:
>> Getting the sources ready for the next Solaris developer studio toolchain.
>>
>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8199782
>>   Webrev: http://cr.openjdk.java.net/~gadams/8199782/webrev.00/
>>
>> This update conditionally disables some new error checks, if the new toolchain is used.
>
> This looks odd:
>
>  231     DISABLED_WARNINGS_solstudio := $(DISABLED_WARNINGS_solstudio), \
>
> as it is self-referential. Should you use a different variable name? Is there an issue if this variable has not been set?
>
This construct may look a bit weird but is fine. The named parameter will get translated behind the scenes to BUILD_LIBJVM_DISABLED_WARNINGS_solstudio so it's not actually self referential (and even if it was, it would still work as expected, even if it looks a bit weird).

/Erik

> Otherwise seems okay.
>
> Thanks,
> David

From magnus.ihse.bursie at oracle.com Thu Apr 5 12:35:45 2018
From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie)
Date: Thu, 5 Apr 2018 14:35:45 +0200
Subject: RFR: JDK-8199782: Fix compilation warnings detected by Solaris Developer Studio 12.6
In-Reply-To: <5AC516FB.9010101@oracle.com>
References: <5AC516FB.9010101@oracle.com>
Message-ID: <904f24c7-0fb7-a99a-669e-fcd3277291f8@oracle.com>

On 2018-04-04 20:18, Gary Adams wrote:
> Getting the sources ready for the next Solaris developer studio toolchain.
>
>   Issue: https://bugs.openjdk.java.net/browse/JDK-8199782
>   Webrev: http://cr.openjdk.java.net/~gadams/8199782/webrev.00/
>
> This update conditionally disables some new error checks, if the new toolchain is used.

Looks good to me.

/Magnus

From boris.ulasevich at bell-sw.com Thu Apr 5 11:54:01 2018
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Thu, 5 Apr 2018 14:54:01 +0300
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To:
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>
Message-ID:

Hi JC,

I have just checked on arm32: your patch compiles and runs ok.

As far as I can see, the jtreg agentlib name "-agentlib:HeapMonitor" does not correspond to the actual library name: libHeapMonitorTest.c -> libHeapMonitorTest.so

Boris

On 04.04.2018 01:54, White, Derek wrote:
> Thanks JC,
>
> New patch applies cleanly. Compiles and runs (simple test programs) on aarch64.
>
> * Derek
>
> *From:* JC Beyler [mailto:jcbeyler at google.com]
> *Sent:* Monday, April 02, 2018 1:17 PM
> *To:* White, Derek
> *Cc:* Erik Österlund; serviceability-dev at openjdk.java.net; hotspot-compiler-dev
> *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
>
> Hi Derek,
>
> I know there were a few things that went in that provoked a merge conflict. I worked on it and got it up to date. Sadly my lack of knowledge makes it a full rebase instead of keeping all the history. However, with a newly cloned jdk/hs you should now be able to use:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/
>
> The change you are referring to was done with the others so perhaps you were unlucky and I forgot it in a webrev and fixed it in another? I don't know but it's been there and I checked, it is here:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html
>
> I double checked that tlab_end_offset no longer appears in any architecture (as far as I can tell :)).
>
> Thanks for testing and let me know if you run into any other issues!
>
> Jc
>
> On Fri, Mar 30, 2018 at 4:24 PM White, Derek wrote:
>
> Hi Jc,
>
> I've been having trouble getting your patch to apply correctly. I may have based it on the wrong version.
>
> In any case, I think there's a missing update to macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), where "JavaThread::tlab_end_offset()" should become "JavaThread::tlab_current_end_offset()".
>
> This should correspond to the other port's changes in templateTable_.cpp files.
>
> Thanks!
> - Derek
>
> *From:* hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of *JC Beyler
> *Sent:* Wednesday, March 28, 2018 11:43 AM
> *To:* Erik Österlund
> *Cc:* serviceability-dev at openjdk.java.net; hotspot-compiler-dev
> *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
>
> Hi all,
>
> I've been working on deflaking the tests mostly and the wording in the JVMTI spec.
>
> Here are the two incremental webrevs:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/
>
> Here is the total webrev:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/
>
> Here are the notes of this change:
>
>   - Currently the tests pass 100 times in a row, I am working on checking if they pass 1000 times in a row.
>
>   - The default sampling rate is set to 512k, this is what we use internally and having a default means that to enable the sampling with the default, the user only has to do an enable event/disable event via JVMTI (instead of enable + set sample rate).
>
>   - I deprecated the code that was handling the fast path tlab refill if it happened since this is now deprecated
>
>       - Though I saw that Graal is still using it so I have to see what needs to be done there exactly
>
> Finally, using the Dacapo benchmark suite, I noted a 1% overhead for when the event system is turned on and the callback to the native agent is just empty. I got a 3% overhead with a 512k sampling rate with the code I put in the native side of my tests.
>
> Thanks and comments are appreciated,
>
> Jc
>
> On Mon, Mar 19, 2018 at 2:06 PM JC Beyler wrote:
>
> Hi all,
>
> The incremental webrev update is here:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/
>
> The full webrev is here:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/
>
> Major change here is:
>
>   - I've removed the heapMonitoring.cpp code in favor of just having the sampling events as per Serguei's request; I still have to do some overhead measurements but the tests prove the concept can work
>
>       - Most of the tlab code is unchanged, the only major part is that now things get sent off to event collectors when used and enabled.
>
>   - Added the interpreter collectors to handle interpreter execution
>
>   - Updated the name from SetTlabHeapSampling to SetHeapSampling to be more generic
>
>   - Added a mutex for the thread sampling so that we can initialize an internal static array safely
>
>   - Ported the tests from the old system to this new one
>
> I've also updated the JEP and CSR to reflect these changes:
>
> https://bugs.openjdk.java.net/browse/JDK-8194905
>
> https://bugs.openjdk.java.net/browse/JDK-8171119
>
> In order to make this have some forward progress, I've removed the heap sampling code entirely and now rely entirely on the event sampling system.
> The tests reflect this by using a simplified implementation of what an agent could do:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>
> (Search for anything mentioning event_storage).
>
> I have not taken the time to port the whole code we had originally in heapMonitoring to this. I hesitate only because that code was in C++, I'd have to port it to C and this is for tests so perhaps what I have now is good enough?
>
> As far as testing goes, I've ported all the relevant tests and then added a few:
>
>    - Turning the system on/off
>
>    - Testing using various GCs
>
>    - Testing using the interpreter
>
>    - Testing the sampling rate
>
>    - Testing with objects and arrays
>
>    - Testing with various threads
>
> Finally, as overhead goes, I have the numbers of the system off vs a clean build and I have 0% overhead, which is what we'd want. This was using the Dacapo benchmarks. I am now preparing to run a version with the events on using dacapo and will report back here.
>
> Any comments are welcome :)
>
> Jc
>
> On Thu, Mar 8, 2018 at 4:00 PM JC Beyler wrote:
>
> Hi all,
>
> I apologize for the delay but I wanted to add an event system and that took a bit longer than expected and I also reworked the code to take into account the deprecation of FastTLABRefill.
>
> This update has four parts:
>
> A) I moved the implementation from Thread to ThreadHeapSampler inside of Thread. Would you prefer it as a pointer inside of Thread or like this works for you? Second question would be would you rather have an association outside of Thread altogether that tries to remember when threads are live and then we would have something like:
>
> ThreadHeapSampler::get_sampling_size(this_thread);
>
> I worry about the overhead of this but perhaps it is not too too bad?
>
> B) I also have been working on the Allocation event system that sends out a notification at each sampled event. This will be practical when wanting to do something at the allocation point. I'm also looking at if the whole heapMonitoring code could not reside in the agent code and not in the JDK. I'm not convinced but I'm talking to Serguei about it to see/assess :)
>
>    - Also added two tests for the new event subsystem
>
> C) Removed the slow_path fields inside the TLAB code since now FastTLABRefill is deprecated
>
> D) Updated the JVMTI documentation and specification for the methods.
>
> So the incremental webrev is here:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/
>
> and the full webrev is here:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10
>
> I believe I have updated the various JIRA issues that track this :)
>
> Thanks for your input,
>
> Jc
>
> On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler wrote:
>
> Hi Erik,
>
> I inlined my answers, which the last one seems to answer Robbin's concerns about the same thing (adding things to Thread).
>
> On Wed, Feb 14, 2018 at 2:51 AM, Erik Österlund wrote:
>
> Hi JC,
>
> Comments are inlined below.
>
> On 2018-02-13 06:18, JC Beyler wrote:
>
> Hi Erik,
>
> Thanks for your answers, I've now inlined my own answers/comments.
>
> I've done a new webrev here:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/
>
> The incremental is here:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/
>
> Note to all:
>
>   - I've been integrating changes from Erin/Serguei/David comments so this webrev incremental is a bit an answer to all comments in one. I apologize for that :)
>
> On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund wrote:
>
> Hi JC,
>
> Sorry for the delayed reply.
>
> Inlined answers:
>
> On 2018-02-06 00:04, JC Beyler wrote:
>
> Hi Erik,
>
> (Renaming this to be folded into the newly renamed thread :))
>
> First off, thanks a lot for reviewing the webrev! I appreciate it!
>
> I updated the webrev to:
> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/
>
> And the incremental one is here:
> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/
>
> It contains:
> - The change for since from 9 to 11 for the jvmti.xml
> - The use of the OrderAccess for initialized
> - Clearing the oop
>
> I also have inlined my answers to your comments. The biggest question will come from the multiple *_end variables. A bit of the logic there is due to handling the slow path refill vs fast path refill and checking that the rug was not pulled underneath the slowpath. I believe that a previous comment was that TlabFastRefill was going to be deprecated.
>
> If this is true, we could revert this code a bit and just do a : if TlabFastRefill is enabled, disable this. And then deprecate that when TlabFastRefill is deprecated.
>
> This might simplify this webrev and I can work on a follow-up that either: removes TlabFastRefill if Robbin does not have the time to do it or add the support to the assembly side to handle this correctly. What do you think?
>
> I support removing TlabFastRefill, but I think it is good to not depend on that happening first.
>
> I'm slowly pushing on the FastTLABRefill (https://bugs.openjdk.java.net/browse/JDK-8194084), I agree on keeping both separate for now though so that we can think of both differently
>
> Now, below, inlined are my answers:
>
> On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund wrote:
>
> Hi JC,
>
> Hope I am reviewing the right version of your work. Here goes...
>
> src/hotspot/share/gc/shared/collectedHeap.inline.hpp:
>
> 159     AllocTracer::send_allocation_outside_tlab(klass, result, size * HeapWordSize, THREAD);
> 160
> 161     THREAD->tlab().handle_sample(THREAD, result, size);
> 162     return result;
> 163   }
>
> Should not call tlab()->X without checking if (UseTLAB) IMO.
>
> Done!
>
> More about this later.
>
> src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp:
>
> So first of all, there seem to be quite a few ends. There is an "end",
> a "hard end", a "slow path end", and an "actual end". Moreover, it seems
> like the "hard end" is actually further away than the "actual end". So
> the "hard end" seems like more of a "really definitely actual end" or
> something. I don't know about you, but I think it looks kind of messy.
> In particular, I don't feel like the name "actual end" reflects what it
> represents, especially when there is another end that is behind the
> "actual end".
>
> 413 HeapWord* ThreadLocalAllocBuffer::hard_end() {
> 414   // Did a fast TLAB refill occur?
> 415   if (_slow_path_end != _end) {
> 416     // Fix up the actual end to be now the end of this TLAB.
> 417     _slow_path_end = _end;
> 418     _actual_end = _end;
> 419   }
> 420
> 421   return _actual_end + alignment_reserve();
> 422 }
>
> I really do not like making getters unexpectedly have this kind of side
> effect. It is not expected that when you ask for the "hard end", you
> implicitly update the "slow path end" and "actual end" to new values.
>
> As I said, a lot of this is due to the FastTlabRefill. If I make this
> not support FastTlabRefill, this goes away. The reason the system
> needs to update itself at the get is that you only know at that get if
> things have shifted underneath the tlab slow path. I am not sure of
> really better names (naming is hard!), perhaps we could do these
> names:
>
> - current_tlab_end        // Either the allocated tlab end or a sampling point
> - last_allocation_address
>                           // The end of the tlab allocation
> - last_slowpath_allocated_end  // In case a fast refill occurred the
>   end might have changed, this is to remember slow vs fast path refills
>
> the hard_end method can be renamed to something like:
> tlab_end_pointer()        // The end of the lab including a bit of
>   alignment reserved bytes
>
> Those names sound better to me. Could you please provide a mapping from
> the old names to the new names so I understand which one is which?
>
> This is my current guess of what you are proposing:
>
> end -> current_tlab_end
> actual_end -> last_allocation_address
> slow_path_end -> last_slowpath_allocated_end
> hard_end -> tlab_end_pointer
>
> Yes that is correct, that was what I was proposing.
>
> I would prefer this naming:
>
> end -> slow_path_end // the end for taking a slow path; either due to
>   sampling or refilling
> actual_end -> allocation_end // the end for allocations
> slow_path_end -> last_slow_path_end // last address for slow_path_end
>   (as opposed to allocation_end)
> hard_end -> reserved_end // the end of the reserved space of the TLAB
>
> About setting things in the getter... that still seems like a very
> unpleasant thing to me. It would be better to inspect the call
> hierarchy and explicitly update the ends where they need updating, and
> assert in the getter that they are in sync, rather than implicitly
> setting various ends as a surprising side effect in a getter. It looks
> like the call hierarchy is very small. With my new naming convention,
> reserved_end() would presumably return _allocation_end +
> alignment_reserve(), and have an assert checking that _allocation_end ==
> _last_slow_path_allocation_end, complaining that this invariant must
> hold, and that a caller to this function, such as make_parsable(), must
> first explicitly synchronize the ends as required, to honor that
> invariant.
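The getter suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: `TlabSketch`, `fast_refill()`, `sync_ends()` and `ends_in_sync()` are hypothetical names, not HotSpot code, and the assert condition simply mirrors the fix-up test in the quoted hard_end(); the real invariant may well differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for ThreadLocalAllocBuffer (not HotSpot code).
// Field names follow the naming proposed in the review.
class TlabSketch {
 public:
  TlabSketch(char* end, std::size_t alignment_reserve)
      : _slow_path_end(end),
        _allocation_end(end),
        _last_slow_path_end(end),
        _alignment_reserve(alignment_reserve) {}

  // Models a fast-path refill: the slow-path trigger moves without the
  // slow-path bookkeeping being updated.
  void fast_refill(char* new_end) { _slow_path_end = new_end; }

  // Explicit fix-up, to be called by code that knows a fast refill may
  // have happened (the role make_parsable() plays in the discussion).
  void sync_ends() {
    if (_last_slow_path_end != _slow_path_end) {
      _last_slow_path_end = _slow_path_end;
      _allocation_end = _slow_path_end;
    }
  }

  bool ends_in_sync() const { return _last_slow_path_end == _slow_path_end; }

  // Pure getter: no hidden writes, only a check of the invariant.
  char* reserved_end() const {
    assert(ends_in_sync() && "caller must call sync_ends() first");
    return _allocation_end + _alignment_reserve;
  }

 private:
  char* _slow_path_end;
  char* _allocation_end;
  char* _last_slow_path_end;
  std::size_t _alignment_reserve;
};
```

With this shape, a caller that forgets to synchronize trips the assert in a debug build instead of silently changing the ends while asking for one of them.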
>
> I've renamed the variables to how you preferred it except for the _end
> one. I did:
>
> current_end
> last_allocation_address
> tlab_end_ptr
>
> The reason is that the architecture dependent code uses the thread.hpp
> API and it already has tlab included into the name so it becomes
> tlab_current_end (which is better than tlab_current_tlab_end in my
> opinion).
>
> I also moved the update into a separate method with a TODO that says to
> remove it when FastTLABRefill is deprecated
>
> This looks a lot better now. Thanks.
>
> Note that the following comment now needs updating accordingly in
> threadLocalAllocBuffer.hpp:
>
> 41 //            Heap sampling is performed via the end/actual_end fields.
> 42 //            actual_end contains the real end of the tlab allocation,
> 43 //            whereas end can be set to an arbitrary spot in the tlab to
> 44 //            trip the return and sample the allocation.
> 45 //            slow_path_end is used to track if a fast tlab refill occured
> 46 //            between slowpath calls.
>
> There might be other comments too, I have not looked in detail.
>
> This was the only spot that still had an actual_end, I fixed it now.
> I'll do a sweep to double check other comments.
>
> Not sure it's better but before updating the webrev, I wanted to try
> to get input/consensus :)
>
> (Note hard_end was always further off than end).
>
> src/hotspot/share/prims/jvmti.xml:
>
> 10357        id="can_sample_heap" since="9">
> 10358
> 10359            Can sample the heap.
> 10360            If this capability is enabled then the heap sampling methods can be called.
> 10361
> 10362
>
> Looks like this capability should not be "since 9" if it gets
> integrated now.
>
> Updated now to 11, crossing my fingers :)
>
> src/hotspot/share/runtime/heapMonitoring.cpp:
>
> 448       if (is_alive->do_object_b(value)) {
> 449
>           // Update the oop to point to the new object if it is still alive.
> 450         f->do_oop(&(trace.obj));
> 451
> 452         // Copy the old trace, if it is still live.
> 453         _allocated_traces->at_put(curr_pos++, trace);
> 454
> 455         // Store the live trace in a cache, to be served up on /heapz.
> 456         _traces_on_last_full_gc->append(trace);
> 457
> 458         count++;
> 459       } else {
> 460         // If the old trace is no longer live, add it to the list of
> 461         // recently collected garbage.
> 462         store_garbage_trace(trace);
> 463       }
>
> In the case where the oop was not live, I would like it to be
> explicitly cleared.
>
> Done I think how you wanted it. Let me know because I'm not familiar
> with the RootAccess API. I'm unclear if I'm doing this right or not so
> reviews of these parts are highly appreciated. Robbin had talked of
> perhaps later pushing this all into a OopStorage, should I do this now
> do you think? Or can that wait a second webrev later down the road?
>
> I think using handles can and should be done later. You can use the
> Access API now. I noticed that you are missing an #include
> "oops/access.inline.hpp" in your heapMonitoring.cpp file.
>
> The missing header is there for me so I don't know, I made sure it is
> present in the latest webrev. Sorry about that.
>
> + Did I clear it the way you wanted me to or were you thinking of
> something else?
>
> That is precisely how I wanted it to be cleared. Thanks.
>
> + Final question here, seems like if I were to want to not do the
> f->do_oop directly on the trace.obj, I'd need to do something like:
>
>     f->do_oop(&value);
>     ...
>     trace->store_oop(value);
>
> to update the oop internally. Is that right/is that one of the
> advantages of going to the Oopstorage sooner than later?
>
> I think you really want to do the do_oop on the root directly.
> Is there a particular reason why you would not want to do that?
> Otherwise, yes - the benefit with using the handle approach is that you
> do not need to call do_oop explicitly in your code.
>
> There is no reason except that now we have a load_oop and a
> get_oop_addr, I was not sure what you would think of that.
>
> That's fine.
>
> Also I see a lot of concurrent-looking use of the following field:
>
> 267   volatile bool _initialized;
>
> Please note that the "volatile" qualifier does not help with reordering
> here. Reordering between volatile and non-volatile fields is completely
> free for both compiler and hardware, except for Windows with MSVC, where
> volatile semantics is defined to use acquire/release semantics, and the
> hardware is TSO. But for the general case, I would expect this field to
> be stored with OrderAccess::release_store and loaded with
> OrderAccess::load_acquire. Otherwise it is not thread safe.
>
> Because everything is behind a mutex, I wasn't really worried about
> this. I have a test that has multiple threads trying to hit this
> corner case and it passes.
>
> However, to be paranoid, I updated it to use the OrderAccess API now,
> thanks! Let me know what you think there too!
>
> If it is indeed always supposed to be read and written under a mutex,
> then I would strongly prefer to have it accessed as a normal
> non-volatile member, and have an assertion that the given lock is held
> or we are in a safepoint, as we do in many other places. Something like
> this:
>
> assert(HeapMonitorStorage_lock->owned_by_self()
>        || (SafepointSynchronize::is_at_safepoint()
>            && Thread::current()->is_VM_thread()),
>        "this should not be accessed concurrently");
>
> It would be confusing to people reading the code if there are uses of
> OrderAccess that are actually always protected under a mutex.
>
> Thank you for the exact example to be put in the code!
> I put it around each access/assignment of the _initialized field and
> found one case where yes you can touch it and not have the lock. It
> actually is "ok" because you don't act on the storage until later and
> only when you really want to modify the storage (see the
> object_alloc_do_sample method which calls the add_trace method).
>
> But, because of this, I'm going to put the OrderAccess here, I'll do
> some performance numbers later and if there are issues, I might add an
> "unsafe" read and a "safe" one to make it explicit to the reader. But I
> don't think it will come to that.
>
> Okay. This double return in heapMonitoring.cpp looks wrong:
>
> 283   bool initialized() {
> 284     return OrderAccess::load_acquire(&_initialized) != 0;
> 285     return _initialized;
> 286   }
>
> Since you said object_alloc_do_sample() is the only place where you do
> not hold the mutex while reading initialized(), I had a closer look at
> that. It looks like in its current shape, the lack of a mutex may lead
> to a memory leak. In particular, it first checks if (initialized()).
> Let's assume this is now true. It then allocates a bunch of stuff, and
> checks if the number of frames was over 0. If it was, it calls
> StackTraceStorage::storage()->add_trace() seemingly hoping that after
> grabbing the lock in there, initialized() will still return true. But
> it could now return false and skip doing anything, in which case the
> allocated stuff will never be freed.
>
> I fixed this now by making add_trace return a boolean and checking for
> that. It will be in the next webrev. Thanks, the truth is that in our
> implementation the system is always on or off, so this never really
> occurs :). In this version though, that is not true and it's important
> to handle so thanks again!
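A rough sketch of the fix described above, where add_trace reports whether it took the trace so the caller can free it otherwise. Everything here is an assumption made for illustration: `TraceSketch`, `TraceStorageSketch` and `sample_allocation()` are hypothetical stand-ins built on the C++ standard library, not the HotSpot classes or its Mutex API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a collected stack trace (not HotSpot code).
struct TraceSketch {
  std::vector<int> frames;  // placeholder for stack frame data
};

class TraceStorageSketch {
 public:
  void set_initialized(bool on) {
    std::lock_guard<std::mutex> guard(_lock);
    _initialized = on;
    if (!on) _traces.clear();
  }

  // Takes ownership only on success. Returns false if the storage was
  // disabled by the time the lock was acquired; ownership then stays
  // with the caller.
  bool add_trace(std::unique_ptr<TraceSketch>& trace) {
    assert(trace != nullptr);
    std::lock_guard<std::mutex> guard(_lock);
    if (!_initialized) return false;
    _traces.push_back(std::move(trace));
    return true;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(_lock);
    return _traces.size();
  }

 private:
  std::mutex _lock;
  bool _initialized = false;  // plain bool: only read or written under _lock
  std::vector<std::unique_ptr<TraceSketch>> _traces;
};

// Caller side: build the trace without holding the lock, then release it
// if the storage refused it. Skipping this step was the leak.
inline bool sample_allocation(TraceStorageSketch& storage) {
  std::unique_ptr<TraceSketch> trace(new TraceSketch());
  trace->frames = {1, 2, 3};
  if (!storage.add_trace(trace)) {
    trace.reset();
    return false;
  }
  return true;
}
```

The only check of the initialized flag happens under the lock, so no unlocked first check is needed for correctness; a disabled storage just costs one wasted trace.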
>
> So the analysis seems to be that _initialized is only used outside of
> the mutex in one instance, where it is used to perform double-checked
> locking, which actually causes a memory leak.
>
> I am not proposing how to fix that, just raising the issue. If you
> still want to perform this double-checked locking somehow, then the use
> of acquire/release still seems odd, because its memory ordering
> restrictions never come into play in this particular case. If they ever
> did, then the use of destroy_stuff(); release_store(_initialized, 0)
> would be broken anyway, as that would imply that whatever concurrent
> reader there ever was could, after reading _initialized with
> load_acquire(), *never* read the data that is concurrently destroyed
> anyway. I would be biased to think that RawAccess::load/store looks
> like a more appropriate solution, given that the memory leak issue is
> resolved. I do not know how painful it would be to not perform this
> double-checked locking.
>
> So I agree with this entirely. I also looked a bit more, and the
> difference in the code really stems from our internal version. In this
> version however, there are actually a lot of things going on that I did
> not go entirely through in my head but this comment made me ponder a
> bit more on it.
>
> Since every object_alloc_do_sample is protected by a check to
> HeapMonitoring::enabled(), there is only a small chance that the call
> is happening when things have been disabled. So there is no real need
> to do a first check on initialized; it is a rare occurrence that a call
> happens to object_alloc_do_sample and the initialized of the storage
> returns false.
>
> (By the way, even if you did call object_alloc_do_sample without
> looking at HeapMonitoring::enabled(), that would be ok too. You would
> gather the stacktrace and get nowhere at the add_trace call, which
> would return false; so though not optimal performance wise, nothing
> would break).
>
> Furthermore, the add_trace is really the point of no return and we have
> the mutex lock and then the initialized check. So, in the end, I did
> two things: I removed that first check and then I removed the
> OrderAccess for the storage initialized. I think now I have a better
> grasp and understanding of why it was done in our code and why it is
> not needed here. Thanks for pointing it out :). This now still passes
> my JTREG tests, especially the threaded one.
>
> As a kind of meta comment, I wonder if it would make sense to add
> sampling for non-TLAB allocations. Seems like if someone is rapidly
> allocating a whole bunch of 1 MB objects that never fit in a TLAB, I
> might still be interested in seeing that in my traces, and not get
> surprised that the allocation rate is very high yet not showing up in
> any profiles.
>
> That is handled by the handle_sample where you wanted me to put a
> UseTlab because you hit that case if the allocation is too big.
>
> I see. It was not obvious to me that non-TLAB sampling is done in the
> TLAB class. That seems like an abstraction crime. What I wanted in my
> previous comment was that we do not call into the TLAB when we are not
> using TLABs. If there is sampling logic in the TLAB that is used for
> something else than TLABs, then it seems like that logic simply does
> not belong inside of the TLAB. It should be moved out of the TLAB, and
> instead have the TLAB call this common abstraction that makes sense.
>
> So in the incremental version:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/
>
> this is still a "crime". The reason is that the system has to have the
> bytes_until_sample on a per-thread level and it made "sense" to have it
> with the TLAB implementation. Also, I was not sure how people felt
> about adding something to the thread instance instead.
>
> Do you think it fits better at the Thread level?
> I can see how difficult it is to make it happen > there and add some logic there. Let me know what > you think. > > > We have an unfortunate situation where everyone that > has some fields that are thread local tend to dump > them right into Thread, making the size and > complexity of Thread grow as it becomes tightly > coupled with various unrelated subsystems. It would > be desirable to have a separate class for this > instead that encapsulates the sampling logic. That > class could possibly reside in Thread though as a > value object of Thread. > > I imagined that would be the case but was not sure. I > will look at the example that Robbin is talking about > (ThreadSMR) and will see how to refactor my code to use > that. > > Thanks again for your help, > > Jc > > > > Hope I have answered your questions and that > my feedback makes sense to you. > > You have and thank you for them, I think we are > getting to a cleaner implementation and things > are getting better and more readable :) > > > Yes it is getting better. > > Thanks, > /Erik > > > > Thanks for your help! > > Jc > > Thanks, > /Erik > > I double checked by changing the test > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java > > > to use a smaller Tlab (2048) and made > the object bigger and it goes > through that and passes. > > Thanks again for your review and I look > forward to your pointers for > the questions I now have raised! > Jc > > > > > > > > Thanks, > /Erik > > > On 2018-01-26 06:45, JC Beyler wrote: > > Thanks Robbin for the reviews :) > > The new full webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/ > > The incremental webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/ > > > I inlined my answers: > > On Thu, Jan 25, 2018 at 1:15 AM, > Robbin Ehn > > > wrote: > > Hi JC, great to see another > revision! 
> > #### > heapMonitoring.cpp > > StackTraceData should not > contain the oop for 'safety' > reasons. > When StackTraceData is moved > from _allocated_traces: > L452 store_garbage_trace(trace); > it contains a dead oop. > _allocated_traces could > instead be a tupel of oop > and StackTraceData thus > dead oops are not kept. > > Done I used inheritance to make > the copier work regardless but the > idea is the same. > > You should use the new > Access API for loading the > oop, something like > this: > RootAccess AS_NO_KEEPALIVE>::load(...) > I don't think you need to > use Access API for clearing > the oop, but it > would > look nicer. And you > shouldn't probably be using: > Universe::heap()->is_in_reserved(value) > > I am unfamiliar with this but I > think I did do it like you wanted me > to (all tests pass so that's a > start). I'm not sure how to > clear the > oop exactly, is there somewhere > that does that, which I can use > to do > the same? > > I removed the is_in_reserved, > this came from our internal > version, I > don't know why it was there but > my tests work without so I > removed it > :) > > The lock: > L424? ?MutexLocker > mu(HeapMonitorStorage_lock); > Is not needed as far as I > can see. > weak_oops_do is called in a > safepoint, no TLAB > allocation can happen and > JVMTI thread can't access > these data-structures. Is > there something more > to > this lock that I'm missing? > > Since a thread can call the > JVMTI getLiveTraces (or any of > the other > ones), it can get to the point > of trying to copying the > _allocated_traces. I imagine it > is possible that this is happening > during a GC or that it can be > started and a GC happens afterwards. > Therefore, it seems to me that > you want this protected, no? > > #### > You have 6 files without any > changes in them (any more): > g1CollectedHeap.cpp > psMarkSweep.cpp > psParallelCompact.cpp > genCollectedHeap.cpp > referenceProcessor.cpp > thread.hpp > > Done. 
> > #### > I have not looked closely, > but is it possible to hide > heap sampling in > AllocTracer ? (with some > minor changes to the > AllocTracer API) > > I am imagining that you are > saying to move the code that > does the > sampling code (change the tlab > end, do the call to HeapMonitoring, > etc.) into the AllocTracer code > itself? I think that is right > and I'll > look if that is possible and > prepare a webrev to show what > would be > needed to make that happen. > > #### > Minor nit, when declaring > pointer there is a little > mix of having the > pointer adjacent by type > name and data name. (Most > hotspot code is by > type > name) > E.g. > heapMonitoring.cpp:711 > ?jvmtiStackTrace *trace = .... > heapMonitoring.cpp:733 > ? ?Method* m = vfst.method(); > (not just this file) > > Done! > > #### > HeapMonitorThreadOnOffTest.java:77 > I would make g_tmp volatile, > otherwise the assignment in > loop may > theoretical be skipped. > > Also done! > > Thanks again! > Jc > From jcbeyler at google.com Thu Apr 5 18:15:56 2018 From: jcbeyler at google.com (JC Beyler) Date: Thu, 05 Apr 2018 18:15:56 +0000 Subject: JDK-8171119: Low-Overhead Heap Profiling In-Reply-To: References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com> Message-ID: Thanks Boris and Derek for testing it. Yes I was trying to get a new version out that had the tests ported as well but got sidetracked while trying to add tests and two new features. 
Here is the incremental webrev:

Here is the full webrev:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/

Basically, the new tests assert this:

- Only one agent can currently request the sampling. I'm looking at
  whether multi-agent support can be pushed to a later webrev, so that
  this one can start heading towards a code freeze.

- The event is not thread-enabled: like the VMObjectAllocationEvent,
  it is an all-or-nothing event. As with multi-agent support, I'm going
  to see whether adding this in a future webrev is the better idea, to
  freeze this webrev a bit.

There was another item that I added here, and I'm unsure whether this
webrev is stable in debug mode: I added an assertion system to check
that all paths leading to a TLAB slow path (and hence a sampling point)
have a sampling collector ready to post the event if a user wants it.
This might break a few things in debug mode as I'm still working through
the kinks of that as well. However, in release mode, this new webrev
passes all the tests in hotspot/jtreg/serviceability/jvmti/HeapMonitor.

Let me know what you think,
Jc

On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich wrote:

> Hi JC,
>
> I have just checked on arm32: your patch compiles and runs ok.
>
> As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not
> correspond to actual library name: libHeapMonitorTest.c ->
> libHeapMonitorTest.so
>
> Boris
>
> On 04.04.2018 01:54, White, Derek wrote:
> > Thanks JC,
> >
> > New patch applies cleanly. Compiles and runs (simple test programs) on
> > aarch64.
> >
> > * Derek
> >
> > *From:* JC Beyler [mailto:jcbeyler at google.com]
> > *Sent:* Monday, April 02, 2018 1:17 PM
> > *To:* White, Derek
> > *Cc:* Erik Österlund ;
> > serviceability-dev at openjdk.java.net; hotspot-compiler-dev
> >
> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
> >
> > Hi Derek,
> >
> > I know there were a few things that went in that provoked a merge
> > conflict. I worked on it and got it up to date.
Sadly my lack of > > knowledge makes it a full rebase instead of keeping all the history. > > However, with a newly cloned jdk/hs you should now be able to use: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/ > > > > The change you are referring to was done with the others so perhaps you > > were unlucky and I forgot it in a webrev and fixed it in another? I > > don't know but it's been there and I checked, it is here: > > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html > > > > I double checked that tlab_end_offset no longer appears in any > > architecture (as far as I can tell :)). > > > > Thanks for testing and let me know if you run into any other issues! > > > > Jc > > > > On Fri, Mar 30, 2018 at 4:24 PM White, Derek > > wrote: > > > > Hi Jc, > > > > I?ve been having trouble getting your patch to apply correctly. I > > may have based it on the wrong version. > > > > In any case, I think there?s a missing update to > > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), > > where ?JavaThread::tlab_end_offset()? should become > > ?JavaThread::tlab_current_end_offset()?. > > > > This should correspond to the other port?s changes in > > templateTable_.cpp files. > > > > Thanks! > > - Derek > > > > *From:* hotspot-compiler-dev > > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net > > ] *On Behalf > > Of *JC Beyler > > *Sent:* Wednesday, March 28, 2018 11:43 AM > > *To:* Erik ?sterlund > > > > *Cc:* serviceability-dev at openjdk.java.net > > ; hotspot-compiler-dev > > > > > > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling > > > > Hi all, > > > > I've been working on deflaking the tests mostly and the wording in > > the JVMTI spec. 
> > > > Here is the two incremental webrevs: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ > > > > Here is the total webrev: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ > > > > Here are the notes of this change: > > > > - Currently the tests pass 100 times in a row, I am working on > > checking if they pass 1000 times in a row. > > > > - The default sampling rate is set to 512k, this is what we use > > internally and having a default means that to enable the sampling > > with the default, the user only has to do a enable event/disable > > event via JVMTI (instead of enable + set sample rate). > > > > - I deprecated the code that was handling the fast path tlab > > refill if it happened since this is now deprecated > > > > - Though I saw that Graal is still using it so I have to see > > what needs to be done there exactly > > > > Finally, using the Dacapo benchmark suite, I noted a 1% overhead for > > when the event system is turned on and the callback to the native > > agent is just empty. I got a 3% overhead with a 512k sampling rate > > with the code I put in the native side of my tests. > > > > Thanks and comments are appreciated, > > > > Jc > > > > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler > > wrote: > > > > Hi all, > > > > The incremental webrev update is here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ > > > > The full webrev is here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ > > > > Major change here is: > > > > - I've removed the heapMonitoring.cpp code in favor of just > > having the sampling events as per Serguei's request; I still > > have to do some overhead measurements but the tests prove the > > concept can work > > > > - Most of the tlab code is unchanged, the only major > > part is that now things get sent off to event collectors when > > used and enabled. 
> > > > - Added the interpreter collectors to handle interpreter > > execution > > > > - Updated the name from SetTlabHeapSampling to > > SetHeapSampling to be more generic > > > > - Added a mutex for the thread sampling so that we can > > initialize an internal static array safely > > > > - Ported the tests from the old system to this new one > > > > I've also updated the JEP and CSR to reflect these changes: > > > > https://bugs.openjdk.java.net/browse/JDK-8194905 > > > > https://bugs.openjdk.java.net/browse/JDK-8171119 > > > > In order to make this have some forward progress, I've removed > > the heap sampling code entirely and now rely entirely on the > > event sampling system. The tests reflect this by using a > > simplified implementation of what an agent could do: > > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c > > > > (Search for anything mentioning event_storage). > > > > I have not taken the time to port the whole code we had > > originally in heapMonitoring to this. I hesitate only because > > that code was in C++, I'd have to port it to C and this is for > > tests so perhaps what I have now is good enough? > > > > As far as testing goes, I've ported all the relevant tests and > > then added a few: > > > > - Turning the system on/off > > > > - Testing using various GCs > > > > - Testing using the interpreter > > > > - Testing the sampling rate > > > > - Testing with objects and arrays > > > > - Testing with various threads > > > > Finally, as overhead goes, I have the numbers of the system off > > vs a clean build and I have 0% overhead, which is what we'd > > want. This was using the Dacapo benchmarks. I am now preparing > > to run a version with the events on using dacapo and will report > > back here. 
> > > > Any comments are welcome :) > > > > Jc > > > > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler > > wrote: > > > > Hi all, > > > > I apologize for the delay but I wanted to add an event > > system and that took a bit longer than expected and I also > > reworked the code to take into account the deprecation of > > FastTLABRefill. > > > > This update has four parts: > > > > A) I moved the implementation from Thread to > > ThreadHeapSampler inside of Thread. Would you prefer it as a > > pointer inside of Thread or like this works for you? Second > > question would be would you rather have an association > > outside of Thread altogether that tries to remember when > > threads are live and then we would have something like: > > > > ThreadHeapSampler::get_sampling_size(this_thread); > > > > I worry about the overhead of this but perhaps it is not too > > too bad? > > > > B) I also have been working on the Allocation event system > > that sends out a notification at each sampled event. This > > will be practical when wanting to do something at the > > allocation point. I'm also looking at if the whole > > heapMonitoring code could not reside in the agent code and > > not in the JDK. I'm not convinced but I'm talking to Serguei > > about it to see/assess :) > > > > - Also added two tests for the new event subsystem > > > > C) Removed the slow_path fields inside the TLAB code since > > now FastTLABRefill is deprecated > > > > D) Updated the JVMTI documentation and specification for the > > methods. 
> >
> > So the incremental webrev is here:
> >
> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/
> >
> > and the full webrev is here:
> >
> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10
> >
> > I believe I have updated the various JIRA issues that track this :)
> >
> > Thanks for your input,
> >
> > Jc
> >
> > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler wrote:
> >
> > Hi Erik,
> >
> > I inlined my answers, which the last one seems to answer Robbin's
> > concerns about the same thing (adding things to Thread).
> >
> > On Wed, Feb 14, 2018 at 2:51 AM, Erik Österlund wrote:
> >
> > Hi JC,
> >
> > Comments are inlined below.
> >
> > On 2018-02-13 06:18, JC Beyler wrote:
> >
> > Hi Erik,
> >
> > Thanks for your answers, I've now inlined my own answers/comments.
> >
> > I've done a new webrev here:
> >
> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/
> >
> > The incremental is here:
> >
> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/
> >
> > Note to all:
> >
> > - I've been integrating changes from Erin/Serguei/David comments so
> >   this webrev incremental is a bit an answer to all comments in one.
> >   I apologize for that :)
> >
> > On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund wrote:
> >
> > Hi JC,
> >
> > Sorry for the delayed reply.
> >
> > Inlined answers:
> >
> > On 2018-02-06 00:04, JC Beyler wrote:
> >
> > Hi Erik,
> >
> > (Renaming this to be folded into the newly renamed thread :))
> >
> > First off, thanks a lot for reviewing the webrev! I appreciate it!
> >
> > I updated the webrev to:
> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/
> >
> > And the incremental one is here:
> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/
> >
> > It contains:
> > - The change for since from 9 to 11 for the jvmti.xml
> > - The use of the OrderAccess for initialized
> > - Clearing the oop
> >
> > I also have inlined my answers to your comments. The biggest question
> > will come from the multiple *_end variables. A bit of the logic there
> > is due to handling the slow path refill vs fast path refill and
> > checking that the rug was not pulled underneath the slowpath. I
> > believe that a previous comment was that TlabFastRefill was going to
> > be deprecated.
> >
> > If this is true, we could revert this code a bit and just do a : if
> > TlabFastRefill is enabled, disable this. And then deprecate that when
> > TlabFastRefill is deprecated.
> >
> > This might simplify this webrev and I can work on a follow-up that
> > either: removes TlabFastRefill if Robbin does not have the time to do
> > it or add the support to the assembly side to handle this correctly.
> > What do you think?
> >
> > I support removing TlabFastRefill, but I think it is good to not
> > depend on that happening first.
> >
> > I'm slowly pushing on the FastTLABRefill
> > (https://bugs.openjdk.java.net/browse/JDK-8194084), I agree on keeping
> > both separate for now though so that we can think of both differently
> >
> > Now, below, inlined are my answers:
> >
> > On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund wrote:
> >
> > Hi JC,
> >
> > Hope I am reviewing the right version of your work. Here goes...
> >
> > src/hotspot/share/gc/shared/collectedHeap.inline.hpp:
> >
> >  159     AllocTracer::send_allocation_outside_tlab(klass, result, size * HeapWordSize, THREAD);
> >  160
> >  161     THREAD->tlab().handle_sample(THREAD, result, size);
> >  162     return result;
> >  163   }
> >
> > Should not call tlab()->X without checking if (UseTLAB) IMO.
> >
> > Done!
> >
> > More about this later.
> >
> > src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp:
> >
> > So first of all, there seems to be quite a few ends. There is an
> > "end", a "hard end", a "slow path end", and an "actual end". Moreover,
> > it seems like the "hard end" is actually further away than the "actual
> > end". So the "hard end" seems like more of a "really definitely actual
> > end" or something. I don't know about you, but I think it looks kind
> > of messy. In particular, I don't feel like the name "actual end"
> > reflects what it represents, especially when there is another end that
> > is behind the "actual end".
> >
> >  413 HeapWord* ThreadLocalAllocBuffer::hard_end() {
> >  414   // Did a fast TLAB refill occur?
> >  415   if (_slow_path_end != _end) {
> >  416     // Fix up the actual end to be now the end of this TLAB.
> >  417     _slow_path_end = _end;
> >  418     _actual_end = _end;
> >  419   }
> >  420
> >  421   return _actual_end + alignment_reserve();
> >  422 }
> >
> > I really do not like making getters unexpectedly have these kind of
> > side effects. It is not expected that when you ask for the "hard end",
> > you implicitly update the "slow path end" and "actual end" to new
> > values.
> >
> > As I said, a lot of this is due to the FastTlabRefill. If I make this
> > not supporting FastTlabRefill, this goes away. The reason the system
> > needs to update itself at the get is that you only know at that get if
> > things have shifted underneath the tlab slow path.
> > I am not sure of really better names (naming is hard!), perhaps we
> > could do these names:
> >
> > - current_tlab_end // Either the allocated tlab end or a sampling point
> > - last_allocation_address // The end of the tlab allocation
> > - last_slowpath_allocated_end // In case a fast refill occurred the
> >   end might have changed, this is to remember slow vs fast path refills
> >
> > the hard_end method can be renamed to something like:
> > tlab_end_pointer() // The end of the lab including a bit of
> > alignment reserved bytes
> >
> > Those names sound better to me. Could you please provide a mapping
> > from the old names to the new names so I understand which one is which
> > please?
> >
> > This is my current guess of what you are proposing:
> >
> > end -> current_tlab_end
> > actual_end -> last_allocation_address
> > slow_path_end -> last_slowpath_allocated_end
> > hard_end -> tlab_end_pointer
> >
> > Yes that is correct, that was what I was proposing.
> >
> > I would prefer this naming:
> >
> > end -> slow_path_end // the end for taking a slow path; either due to
> >   sampling or refilling
> > actual_end -> allocation_end // the end for allocations
> > slow_path_end -> last_slow_path_end // last address for slow_path_end
> >   (as opposed to allocation_end)
> > hard_end -> reserved_end // the end of the reserved space of the TLAB
> >
> > About setting things in the getter... that still seems like a very
> > unpleasant thing to me. It would be better to inspect the call
> > hierarchy and explicitly update the ends where they need updating, and
> > assert in the getter that they are in sync, rather than implicitly
> > setting various ends as a surprising side effect in a getter. It looks
> > like the call hierarchy is very small.
> > With my new naming convention, reserved_end() would presumably return
> > _allocation_end + alignment_reserve(), and have an assert checking
> > that _allocation_end == _last_slow_path_allocation_end, complaining
> > that this invariant must hold, and that a caller to this function,
> > such as make_parsable(), must first explicitly synchronize the ends as
> > required, to honor that invariant.
> >
> > I've renamed the variables to how you preferred it except for the _end
> > one. I did:
> >
> > current_end
> >
> > last_allocation_address
> >
> > tlab_end_ptr
> >
> > The reason is that the architecture dependent code uses the thread.hpp
> > API and it already has tlab included into the name so it becomes
> > tlab_current_end (which is better than tlab_current_tlab_end in my
> > opinion).
> >
> > I also moved the update into a separate method with a TODO that says
> > to remove it when FastTLABRefill is deprecated
> >
> > This looks a lot better now. Thanks.
> >
> > Note that the following comment now needs updating accordingly in
> > threadLocalAllocBuffer.hpp:
> >
> >  41 // Heap sampling is performed via the end/actual_end fields.
> >  42 // actual_end contains the real end of the tlab allocation,
> >  43 // whereas end can be set to an arbitrary spot in the tlab to
> >  44 // trip the return and sample the allocation.
> >  45 // slow_path_end is used to track if a fast tlab refill occured
> >  46 // between slowpath calls.
> >
> > There might be other comments too, I have not looked in detail.
> >
> > This was the only spot that still had an actual_end, I fixed it now.
> > I'll do a sweep to double check other comments.
> >
> > Not sure it's better but before updating the webrev, I wanted to try
> > to get input/consensus :)
> >
> > (Note hard_end was always further off than end).
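
The getter discussed above (explicit synchronization by the caller, and an assert instead of a hidden update) can be sketched as follows. This is a minimal standalone C++ model, not HotSpot code: `TlabModel`, `sync_ends()`, `ends_in_sync()` and `simulate_fast_refill()` are hypothetical names, and the exact invariant checked (the slow-path end matching its last recorded value) is an assumption read off the original `hard_end()` body.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of the TLAB end pointers (not HotSpot code).
class TlabModel {
 public:
  TlabModel(char* start, std::size_t size, std::size_t reserve)
      : _allocation_end(start + size),
        _slow_path_end(start + size),
        _last_slow_path_end(start + size),
        _reserve(reserve) {}

  // Stand-in for a fast-path refill that moves the end without the slow
  // path noticing.
  void simulate_fast_refill(char* new_end) { _slow_path_end = new_end; }

  bool ends_in_sync() const { return _slow_path_end == _last_slow_path_end; }

  // Explicit synchronization, called by whoever needs reserved_end().
  void sync_ends() {
    if (!ends_in_sync()) {
      _last_slow_path_end = _slow_path_end;
      _allocation_end = _slow_path_end;
    }
  }

  // Side-effect-free getter: it only checks the invariant.
  char* reserved_end() const {
    assert(ends_in_sync() && "caller must call sync_ends() first");
    return _allocation_end + _reserve;
  }

 private:
  char* _allocation_end;
  char* _slow_path_end;
  char* _last_slow_path_end;
  std::size_t _reserve;
};
```

In this model a caller in the position of `make_parsable()` would call `sync_ends()` and only then `reserved_end()`; forgetting the first call trips the assert in a debug build instead of silently rewriting the fields.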
> >
> > src/hotspot/share/prims/jvmti.xml:
> >
> >  10357   id="can_sample_heap" since="9">
> >  10358
> >  10359   Can sample the heap.
> >  10360   If this capability is enabled then the heap sampling methods can be called.
> >  10361
> >  10362
> >
> > Looks like this capability should not be "since 9" if it gets
> > integrated now.
> >
> > Updated now to 11, crossing my fingers :)
> >
> > src/hotspot/share/runtime/heapMonitoring.cpp:
> >
> >  448       if (is_alive->do_object_b(value)) {
> >  449         // Update the oop to point to the new object if it is still alive.
> >  450         f->do_oop(&(trace.obj));
> >  451
> >  452         // Copy the old trace, if it is still live.
> >  453         _allocated_traces->at_put(curr_pos++, trace);
> >  454
> >  455         // Store the live trace in a cache, to be served up on /heapz.
> >  456         _traces_on_last_full_gc->append(trace);
> >  457
> >  458         count++;
> >  459       } else {
> >  460         // If the old trace is no longer live, add it to the list of
> >  461         // recently collected garbage.
> >  462         store_garbage_trace(trace);
> >  463       }
> >
> > In the case where the oop was not live, I would like it to be
> > explicitly cleared.
> >
> > Done I think how you wanted it. Let me know because I'm not familiar
> > with the RootAccess API. I'm unclear if I'm doing this right or not so
> > reviews of these parts are highly appreciated. Robbin had talked of
> > perhaps later pushing this all into a OopStorage, should I do this now
> > do you think? Or can that wait a second webrev later down the road?
> >
> > I think using handles can and should be done later. You can use the
> > Access API now. I noticed that you are missing an #include
> > "oops/access.inline.hpp" in your heapMonitoring.cpp file.
> >
> > The missing header is there for me so I don't know, I made sure it is
> > present in the latest webrev. Sorry about that.
> > > > + Did I clear it the way you wanted me > > to or were you thinking of > > something else? > > > > > > That is precisely how I wanted it to be > > cleared. Thanks. > > > > + Final question here, seems like if I > > were to want to not do the > > f->do_oop directly on the trace.obj, I'd > > need to do something like: > > > > f->do_oop(&value); > > ... > > trace->store_oop(value); > > > > to update the oop internally. Is that > > right/is that one of the > > advantages of going to the Oopstorage > > sooner than later? > > > > > > I think you really want to do the do_oop on > > the root directly. Is there a particular > > reason why you would not want to do that? > > Otherwise, yes - the benefit with using the > > handle approach is that you do not need to > > call do_oop explicitly in your code. > > > > There is no reason except that now we have a > > load_oop and a get_oop_addr, I was not sure what > > you would think of that. > > > > That's fine. > > > > Also I see a lot of > > concurrent-looking use of the > > following field: > > 267 volatile bool _initialized; > > > > Please note that the "volatile" > > qualifier does not help with > reordering > > here. Reordering between volatile > > and non-volatile fields is > > completely free > > for both compiler and hardware, > > except for windows with MSVC, where > > volatile > > semantics is defined to use > > acquire/release semantics, and the > > hardware is > > TSO. But for the general case, I > > would expect this field to be stored > > with > > OrderAccess::release_store and > > loaded with > OrderAccess::load_acquire. > > Otherwise it is not thread safe. > > > > Because everything is behind a mutex, I > > wasn't really worried about > > this. I have a test that has multiple > > threads trying to hit this > > corner case and it passes. > > > > However, to be paranoid, I updated it to > > using the OrderAccess API > > now, thanks! Let me know what you think > > there too! 
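
The difference between a plain `volatile` flag and a release store paired with an acquire load can be sketched in portable C++. This is only an illustration under assumptions: `std::atomic` stands in for the `OrderAccess::release_store` / `OrderAccess::load_acquire` calls mentioned above, and `FlagStorage` with its members is a hypothetical type, not the storage class from the webrev.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical storage whose plain data is published through a flag.
struct FlagStorage {
  int capacity = 0;                      // ordinary, non-atomic data
  std::atomic<bool> initialized{false};  // publication flag

  void initialize(int cap) {
    capacity = cap;
    // Release: the write to capacity cannot be reordered after this store.
    initialized.store(true, std::memory_order_release);
  }

  // Returns the capacity if the storage was published, -1 otherwise.
  int capacity_if_initialized() const {
    // Acquire: if we observe true, we also observe the earlier write to
    // capacity made by the initializing thread.
    if (initialized.load(std::memory_order_acquire)) {
      return capacity;
    }
    return -1;
  }
};
```

With a `volatile bool` instead of the atomic flag, nothing would order the write to `capacity` against the write to the flag, which is the reordering concern raised in the review.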
> >
> > If it is indeed always supposed to be read and written under a mutex,
> > then I would strongly prefer to have it accessed as a normal
> > non-volatile member, and have an assertion that given lock is held or
> > we are in a safepoint, as we do in many other places. Something like
> > this:
> >
> > assert(HeapMonitorStorage_lock->owned_by_self()
> >        || (SafepointSynchronize::is_at_safepoint()
> >            && Thread::current()->is_VM_thread()),
> >        "this should not be accessed concurrently");
> >
> > It would be confusing to people reading the code if there are uses of
> > OrderAccess that are actually always protected under a mutex.
> >
> > Thank you for the exact example to be put in the code! I put it around
> > each access/assignment of the _initialized method and found one case
> > where yes you can touch it and not have the lock. It actually is "ok"
> > because you don't act on the storage until later and only when you
> > really want to modify the storage (see the object_alloc_do_sample
> > method which calls the add_trace method).
> >
> > But, because of this, I'm going to put the OrderAccess here, I'll do
> > some performance numbers later and if there are issues, I might add a
> > "unsafe" read and a "safe" one to make it explicit to the reader. But
> > I don't think it will come to that.
> >
> > Okay. This double return in heapMonitoring.cpp looks wrong:
> >
> >  283   bool initialized() {
> >  284     return OrderAccess::load_acquire(&_initialized) != 0;
> >  285     return _initialized;
> >  286   }
> >
> > Since you said object_alloc_do_sample() is the only place where you do
> > not hold the mutex while reading initialized(), I had a closer look at
> > that. It looks like in its current shape, the lack of a mutex may lead
> > to a memory leak. In particular, it first checks if (initialized()).
> > Let's assume this is now true.
It then allocates a bunch of stuff, and checks > > if the number of frames were over 0. If they were, > > it calls StackTraceStorage::storage()->add_trace() > > seemingly hoping that after grabbing the lock in > > there, initialized() will still return true. But it > > could now return false and skip doing anything, in > > which case the allocated stuff will never be freed. > > > > I fixed this now by making add_trace return a boolean > > and checking for that. It will be in the next webrev. > > Thanks, the truth is that in our implementation the > > system is always on or off, so this never really occurs > > :). In this version though, that is not true and it's > > important to handle so thanks again! > > > > > > So the analysis seems to be that _initialized is > > only used outside of the mutex in once instance, > > where it is used to perform double-checked locking, > > that actually causes a memory leak. > > > > I am not proposing how to fix that, just raising the > > issue. If you still want to perform this > > double-checked locking somehow, then the use of > > acquire/release still seems odd. Because the memory > > ordering restrictions of it never comes into play in > > this particular case. If it ever did, then the use > > of destroy_stuff(); release_store(_initialized, 0) > > would be broken anyway as that would imply that > > whatever concurrent reader there ever was would > > after reading _initialized with load_acquire() could > > *never* read the data that is concurrently destroyed > > anyway. I would be biased to think that > > RawAccess::load/store looks like a more > > appropriate solution, given that the memory leak > > issue is resolved. I do not know how painful it > > would be to not perform this double-checked locking. > > > > So I agree with this entirely. I looked also a bit more > > and the difference and code really stems from our > > internal version. 
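
The fix described above (having `add_trace` report whether it stored the trace, so the caller can free what it allocated) can be sketched like this. It is a standalone model under assumptions: `Trace` and `TraceStorage` are hypothetical stand-ins for the webrev's types, `std::mutex` replaces the HotSpot lock, and ownership is expressed with `std::unique_ptr`, which the original code does not use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the stack trace and its storage.
struct Trace {
  std::vector<int> frames;
};

class TraceStorage {
 public:
  void set_initialized(bool value) {
    std::lock_guard<std::mutex> guard(_lock);
    _initialized = value;
    if (!value) _traces.clear();
  }

  // The initialized check happens only under the lock. On failure the trace
  // stays with the caller, which remains responsible for freeing it.
  bool add_trace(std::unique_ptr<Trace>& trace) {
    std::lock_guard<std::mutex> guard(_lock);
    if (!_initialized) return false;
    _traces.push_back(std::move(trace));
    return true;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(_lock);
    return _traces.size();
  }

 private:
  std::mutex _lock;
  bool _initialized = false;
  std::vector<std::unique_ptr<Trace>> _traces;
};

// Returns true if the sample was stored.
bool object_alloc_do_sample(TraceStorage& storage, int frame_count) {
  if (frame_count <= 0) return false;
  std::unique_ptr<Trace> trace(new Trace());
  trace->frames.assign(static_cast<std::size_t>(frame_count), 0);
  // If add_trace() refuses the trace, unique_ptr frees it when we return.
  return storage.add_trace(trace);
}
```

Because the only check of the flag sits behind the lock, there is no window between "it was initialized when I looked" and "it is initialized now", and a refused trace is released rather than leaked.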
In this version however, there are > > actually a lot of things going on that I did not go > > entirely through in my head but this comment made me > > ponder a bit more on it. > > > > Since every object_alloc_do_sample is protected by a > > check to HeapMonitoring::enabled(), there is only a > > small chance that the call is happening when things have > > been disabled. So there is no real need to do a first > > check on the initialized, it is a rare occurence that a > > call happens to object_alloc_do_sample and the > > initialized of the storage returns false. > > > > (By the way, even if you did call object_alloc_do_sample > > without looking at HeapMonitoring::enabled(), that would > > be ok too. You would gather the stacktrace and get > > nowhere at the add_trace call, which would return false; > > so though not optimal performance wise, nothing would > > break). > > > > Furthermore, the add_trace is really the moment of no > > return and we have the mutex lock and then the > > initialized check. So, in the end, I did two things: I > > removed that first check and then I removed the > > OrderAccess for the storage initialized. I think now I > > have a better grasp and understanding why it was done in > > our code and why it is not needed here. Thanks for > > pointing it out :). This now still passes my JTREG > > tests, especially the threaded one. > > > > > > > > As a kind of meta comment, I wonder > > if it would make sense to add > sampling > > for non-TLAB allocations. Seems like > > if someone is rapidly allocating a > > whole bunch of 1 MB objects that > > never fit in a TLAB, I might still be > > interested in seeing that in my > > traces, and not get surprised that > the > > allocation rate is very high yet not > > showing up in any profiles. > > > > That is handled by the handle_sample > > where you wanted me to put a > > UseTlab because you hit that case if the > > allocation is too big. > > > > > > I see. 
It was not obvious to me that > > non-TLAB sampling is done in the TLAB class. > > That seems like an abstraction crime. > > What I wanted in my previous comment was > > that we do not call into the TLAB when we > > are not using TLABs. If there is sampling > > logic in the TLAB that is used for something > > else than TLABs, then it seems like that > > logic simply does not belong inside of the > > TLAB. It should be moved out of the TLAB, > > and instead have the TLAB call this common > > abstraction that makes sense. > > > > So in the incremental version: > > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ > > < > http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.07_08/>, > > this is still a "crime". The reason is that the > > system has to have the bytes_until_sample on a > > per-thread level and it made "sense" to have it > > with the TLAB implementation. Also, I was not > > sure how people felt about adding something to > > the thread instance instead. > > > > Do you think it fits better at the Thread level? > > I can see how difficult it is to make it happen > > there and add some logic there. Let me know what > > you think. > > > > > > We have an unfortunate situation where everyone that > > has some fields that are thread local tend to dump > > them right into Thread, making the size and > > complexity of Thread grow as it becomes tightly > > coupled with various unrelated subsystems. It would > > be desirable to have a separate class for this > > instead that encapsulates the sampling logic. That > > class could possibly reside in Thread though as a > > value object of Thread. > > > > I imagined that would be the case but was not sure. I > > will look at the example that Robbin is talking about > > (ThreadSMR) and will see how to refactor my code to use > > that. > > > > Thanks again for your help, > > > > Jc > > > > > > > > Hope I have answered your questions and that > > my feedback makes sense to you. 
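
The suggestion above, a separate class that encapsulates the sampling logic and lives in Thread as a value object, might look roughly like the sketch below. Everything here is an assumption for illustration: `HeapSamplerModel` and `ThreadModel` are hypothetical names, the fixed 512k interval only mirrors the default mentioned earlier in the thread, and the actual patch may pick the next sample point differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sampler that owns the per-thread "bytes until sample" state.
class HeapSamplerModel {
 public:
  explicit HeapSamplerModel(std::size_t interval)
      : _interval(interval), _bytes_until_sample(interval) {}

  // Called for every allocation, whether or not it went through a TLAB.
  // Returns true when the allocation crosses the sampling threshold.
  bool record_allocation(std::size_t bytes) {
    if (bytes < _bytes_until_sample) {
      _bytes_until_sample -= bytes;
      return false;
    }
    _bytes_until_sample = _interval;  // fixed interval: a simplification
    return true;
  }

 private:
  std::size_t _interval;
  std::size_t _bytes_until_sample;
};

// Hypothetical thread type holding the sampler by value, so the sampling
// fields are not spread over the thread class itself.
struct ThreadModel {
  HeapSamplerModel heap_sampler{512 * 1024};
};
```

With the counter in its own class, both the TLAB slow path and the outside-TLAB allocation path can call `record_allocation()` without the second one having to reach into the TLAB.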
> >
> > You have and thank you for them, I think we are getting to a cleaner
> > implementation and things are getting better and more readable :)
> >
> > Yes it is getting better.
> >
> > Thanks,
> > /Erik
> >
> > Thanks for your help!
> >
> > Jc
> >
> > Thanks,
> > /Erik
> >
> > I double checked by changing the test
> >

From christoph.langer at sap.com Fri Apr 6 15:01:41 2018
From: christoph.langer at sap.com (Langer, Christoph)
Date: Fri, 6 Apr 2018 15:01:41 +0000
Subject: RFR (M): 8201247: Various cleanups in the attach framework
Message-ID: <14dff9b0cf5a4b888aef1d6452801b57@sap.com>

Hi,

can I please get reviews for a set of clean up changes that I came across when doing some integration work.

Bug: https://bugs.openjdk.java.net/browse/JDK-8201247
Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/

Detailed comments about the changes can be found in the bug.

Thanks & best regards
Christoph

From Pietro.Paolini at alfasystems.com Fri Apr 6 16:13:49 2018
From: Pietro.Paolini at alfasystems.com (Pietro Paolini)
Date: Fri, 6 Apr 2018 16:13:49 +0000
Subject: inspect a thread's stack
Message-ID: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>

Hi all,

I apologise if this is not the right ML for it but I couldn't find exactly what I was looking for when Googling the problem. I am a bit new to the JDI world.

I would like to inspect the stack-frame of a specific thread, I came across the StackFrame/ThreadReference classes but I couldn't find any examples where their usage is shown without connecting to the VM somehow, like a debugger would do.

Is it possible to inspect a thread's stack "locally"?
In my mind I could be able to have a function such as:

static void hook(Thread thread) {
    thread.wait() // stop that thread
    // inspect the frames of that thread doing any needed business with them
}

I'd need this for diagnostic purposes of my application.

Thanks,
Pietro

Pietro Paolini
Consultant
Alfa

From jini.george at oracle.com Fri Apr 6 16:21:50 2018
From: jini.george at oracle.com (Jini George)
Date: Fri, 6 Apr 2018 21:51:50 +0530
Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS
Message-ID: <93a055d6-133c-f92f-3408-368eea959326@oracle.com>

Hello!

Requesting reviews for:

https://bugs.openjdk.java.net/browse/JDK-8174994
Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/

While trying to identify the type given an address, a WrongTypeException was getting thrown with various clhsdb commands (like printmdo, jstack, etc). This was since SA tries to map an address to a hotspot C++ type by comparing the vtable address to the vtable address values of known types. With CDS, since the vtables are copied over for the Metadata classes, the vtable addresses themselves don't match (though, of course, the contents will), and SA errors out.

The fix has been implemented by making changes to read in the md region (consisting of the c++ vtables) of the CDS archive in SA, and mapping the vtable addresses to the corresponding metadata type (ConstantPool, InstanceKlass, InstanceClassLoaderKlass, InstanceMirrorKlass, InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass).

For corefiles, an additional modification has been done to have the replicated FileMapHeader structure (from src/hotspot/share/memory/filemap.hpp, which is replicated in SA in ps_core.c), to be in sync with the corresponding definition in src/hotspot/share/memory/filemap.hpp.

Test cases to test both live and corefile debugging are being added with this. These and other SA tests pass on Mach5.

Thanks,
Jini.

From chris.plummer at oracle.com Fri Apr 6 16:37:09 2018
From: chris.plummer at oracle.com (Chris Plummer)
Date: Fri, 6 Apr 2018 09:37:09 -0700
Subject: RFR (M): 8201247: Various cleanups in the attach framework
In-Reply-To: <14dff9b0cf5a4b888aef1d6452801b57@sap.com>
References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com>
Message-ID: <91d75e2d-47a4-e9ee-5d19-8f3e6dc13428@oracle.com>

Hi Christoph,

Can you explain a bit more about "fix handling of null values in ArgumentIterator::next". When does this turn up? Is there a test case?

Everything else looks good.

thanks,

Chris

On 4/6/18 8:01 AM, Langer, Christoph wrote:
>
> Hi,
>
> can I please get reviews for a set of clean up changes that I came
> across when doing some integration work.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8201247
>
> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/
>
> Detailed comments about the changes can be found in the bug.
>
> Thanks & best regards
>
> Christoph
>

From karen.kinnear at oracle.com Fri Apr 6 21:40:47 2018
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Fri, 6 Apr 2018 17:40:47 -0400
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To:
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>
Message-ID:

JC,

Thank you for the updates - really glad you are including the compiler folks.
I reviewed the version before this one, so ignore any comments you've already covered (although I did peek at the latest)

1. JDK-8194905 CSR - could you please delete attachments that are not current. It is a bit confusing right now. I have been looking at jvmti_event6.html. I am assuming the rest are obsolete and could be removed please.

2. In jvmti_event6.html under Sampled Object Allocation, there is a link to "Heap Sampling Monitoring System". It takes me to the top of the page - seems like something is missing in defining it?

3. Scope of memory allocation tracking

I am struggling to understand the extent of memory allocation tracking that you are looking for (probably want to clarify in the JEP and CSR once we work this through).

e.g. Heap Sampler vs. JVMTI VMObjectAllocEvent

So the current jvmtiVMObjectAllocEvent says:
  Sent when a method causes the VM to allocate an Object visible to Java
  and allocation not detectable by other instrumentation mechanisms.
  Generally detect by instrumenting bytecodes of allocating methods
  JNI - use JNI function interception
  e.g. Reflection: java.lang.Class.newInstance()
  e.g. VM intrinsics
  comment:
    Not generated due to bytecodes - e.g.
    new and newarray VM instructions
    Not allocation due to JNI: e.g. AllocObject
    NOT VM internal objects
    NOT allocations during VM init

So from the JEP I can't tell the intended scope of the new event - is this intended to cover all heap allocation?
  bytecodes
  JVM_*
  JNI_*
  internal VM objects
  other? (I'm not sure what other there are)
  - I presume not allocations during VM init - since sent only during live phase

OR - is the primary goal to cover allocation for bytecodes so folks can skip instrumentation?

OR - do you want to get performance numbers and see what is low enough overhead before deciding?

4. The design question is where to put the collectors in the source base - and that of course strongly depends on the scope of the information you want to collect, and on the performance overhead we are willing to incur.

I was trying to figure out a way to put the collectors farther down the call stack so as to both catch more cases and to reduce the maintenance burden - i.e. if you were to add a new code generator, e.g. Graal - if it were to go through an existing interface, that might be a place to already have a collector.

I do not know the Graal sources - I did look at jvmci/jvmciRuntime.cpp - and it appears that there are calls to instanceKlass::new_instance, oopFactory::new_typeArray/new_ObjArray and ArrayKlass::multi_allocate, so one possibility would be to put hooks in those calls which would catch many? (I did not do a thorough search) of the slowpath calls for the bytecodes, and then check the fast paths in detail.

I had wondered if it made sense to move the hooks even farther down, into CollectedHeap::obj_allocate and array_allocate. I do not think so. First reason is that for multidimensional arrays, ArrayKlass::multi_allocate the outer dimension array would have an event before storing the inner sub-arrays and I don't think we want that exposed, so that won't work for arrays.

The second reason is that I strongly suspect the scope you want is bytecodes only.
I think once you have added hooks to all the fast paths and slow paths that this will be pushing the performance overhead
constraints you proposed and you won't want to see e.g. internal allocations.

But I think you need to experiment with the set of allocations (or possible alternative sets of allocations) you want recorded.

The hooks I see today include:
Interpreter: (looking at x86 as a sample)
- slowpath in InterpreterRuntime
- fastpath tlab allocation - your new threshold check handles that
- allow_shared_alloc (GC specific): for _new isn't handled

C1
I don't see changes in c1_Runtime.cpp
note: you also want to look for the fast path

C2: changes in opto/runtime.cpp for slow path
did you also catch the fast path?

5. Performance -
After you get all the collectors added - you need to rerun the performance numbers.

thanks,
Karen

> On Apr 5, 2018, at 2:15 PM, JC Beyler wrote:
>
> Thanks Boris and Derek for testing it.
>
> Yes I was trying to get a new version out that had the tests ported as well but got sidetracked while trying to add tests and two new features.
>
> Here is the incremental webrev:
>
> Here is the full webrev:
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/
>
> Basically, the new tests assert this:
> - Only one agent can currently ask for the sampling, I'm currently seeing if I can push to a next webrev the multi-agent support to start doing a code freeze on this one
> - The event is not thread-enabled, meaning like the VMObjectAllocationEvent, it's an all or nothing event; same as the multi-agent, I'm going to see if a future webrev to add the support is a better idea to freeze this webrev a bit
>
> There was another item that I added here and I'm unsure this webrev is stable in debug mode: I added an assertion system to ascertain that all paths leading to a TLAB slow path (and hence a sampling point) have a sampling collector ready to post the event if a user wants it.
> This might break a few things in debug mode as I'm working through the kinks of that as well. However, in release mode, this new webrev passes all the tests in hotspot/jtreg/serviceability/jvmti/HeapMonitor.
>
> Let me know what you think,
> Jc
>
> On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich wrote:
> Hi JC,
>
> I have just checked on arm32: your patch compiles and runs ok.
>
> As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not
> correspond to actual library name: libHeapMonitorTest.c ->
> libHeapMonitorTest.so
>
> Boris
>
> On 04.04.2018 01:54, White, Derek wrote:
> > Thanks JC,
> >
> > New patch applies cleanly. Compiles and runs (simple test programs) on
> > aarch64.
> >
> > * Derek
> >
> > *From:* JC Beyler [mailto:jcbeyler at google.com]
> > *Sent:* Monday, April 02, 2018 1:17 PM
> > *To:* White, Derek
> > *Cc:* Erik Österlund;
> > serviceability-dev at openjdk.java.net; hotspot-compiler-dev
> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
> >
> > Hi Derek,
> >
> > I know there were a few things that went in that provoked a merge
> > conflict. I worked on it and got it up to date. Sadly my lack of
> > knowledge makes it a full rebase instead of keeping all the history.
> > However, with a newly cloned jdk/hs you should now be able to use:
> >
> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/
> >
> > The change you are referring to was done with the others so perhaps you
> > were unlucky and I forgot it in a webrev and fixed it in another? I
> > don't know but it's been there and I checked, it is here:
> >
> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html
> >
> > I double checked that tlab_end_offset no longer appears in any
> > architecture (as far as I can tell :)).
> >
> > Thanks for testing and let me know if you run into any other issues!
> >
> > Jc
> >
> > On Fri, Mar 30, 2018 at 4:24 PM White, Derek wrote:
> >
> > Hi Jc,
> >
> > I've been having trouble getting your patch to apply correctly. I
> > may have based it on the wrong version.
> >
> > In any case, I think there's a missing update to
> > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(),
> > where "JavaThread::tlab_end_offset()" should become
> > "JavaThread::tlab_current_end_offset()".
> >
> > This should correspond to the other port's changes in
> > templateTable_.cpp files.
> >
> > Thanks!
> > - Derek
> >
> > *From:* hotspot-compiler-dev
> > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf
> > Of *JC Beyler
> > *Sent:* Wednesday, March 28, 2018 11:43 AM
> > *To:* Erik Österlund
> > *Cc:* serviceability-dev at openjdk.java.net; hotspot-compiler-dev
> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
> >
> > Hi all,
> >
> > I've been working on deflaking the tests mostly and the wording in
> > the JVMTI spec.
> >
> > Here is the two incremental webrevs:
> >
> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/
> >
> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/
> >
> > Here is the total webrev:
> >
> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/
> >
> > Here are the notes of this change:
> >
> > - Currently the tests pass 100 times in a row, I am working on
> > checking if they pass 1000 times in a row.
> >
> > - The default sampling rate is set to 512k, this is what we use
> > internally and having a default means that to enable the sampling
> > with the default, the user only has to do a enable event/disable
> > event via JVMTI (instead of enable + set sample rate).
> > > > - I deprecated the code that was handling the fast path tlab > > refill if it happened since this is now deprecated > > > > - Though I saw that Graal is still using it so I have to see > > what needs to be done there exactly > > > > Finally, using the Dacapo benchmark suite, I noted a 1% overhead for > > when the event system is turned on and the callback to the native > > agent is just empty. I got a 3% overhead with a 512k sampling rate > > with the code I put in the native side of my tests. > > > > Thanks and comments are appreciated, > > > > Jc > > > > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler > > >> wrote: > > > > Hi all, > > > > The incremental webrev update is here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ > > > > The full webrev is here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ > > > > Major change here is: > > > > - I've removed the heapMonitoring.cpp code in favor of just > > having the sampling events as per Serguei's request; I still > > have to do some overhead measurements but the tests prove the > > concept can work > > > > - Most of the tlab code is unchanged, the only major > > part is that now things get sent off to event collectors when > > used and enabled. > > > > - Added the interpreter collectors to handle interpreter > > execution > > > > - Updated the name from SetTlabHeapSampling to > > SetHeapSampling to be more generic > > > > - Added a mutex for the thread sampling so that we can > > initialize an internal static array safely > > > > - Ported the tests from the old system to this new one > > > > I've also updated the JEP and CSR to reflect these changes: > > > > https://bugs.openjdk.java.net/browse/JDK-8194905 > > > > https://bugs.openjdk.java.net/browse/JDK-8171119 > > > > In order to make this have some forward progress, I've removed > > the heap sampling code entirely and now rely entirely on the > > event sampling system. 
The tests reflect this by using a > > simplified implementation of what an agent could do: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c > > > > (Search for anything mentioning event_storage). > > > > I have not taken the time to port the whole code we had > > originally in heapMonitoring to this. I hesitate only because > > that code was in C++, I'd have to port it to C and this is for > > tests so perhaps what I have now is good enough? > > > > As far as testing goes, I've ported all the relevant tests and > > then added a few: > > > > - Turning the system on/off > > > > - Testing using various GCs > > > > - Testing using the interpreter > > > > - Testing the sampling rate > > > > - Testing with objects and arrays > > > > - Testing with various threads > > > > Finally, as overhead goes, I have the numbers of the system off > > vs a clean build and I have 0% overhead, which is what we'd > > want. This was using the Dacapo benchmarks. I am now preparing > > to run a version with the events on using dacapo and will report > > back here. > > > > Any comments are welcome :) > > > > Jc > > > > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler > > >> wrote: > > > > Hi all, > > > > I apologize for the delay but I wanted to add an event > > system and that took a bit longer than expected and I also > > reworked the code to take into account the deprecation of > > FastTLABRefill. > > > > This update has four parts: > > > > A) I moved the implementation from Thread to > > ThreadHeapSampler inside of Thread. Would you prefer it as a > > pointer inside of Thread or like this works for you? 
Second > > question would be would you rather have an association > > outside of Thread altogether that tries to remember when > > threads are live and then we would have something like: > > > > ThreadHeapSampler::get_sampling_size(this_thread); > > > > I worry about the overhead of this but perhaps it is not too > > too bad? > > > > B) I also have been working on the Allocation event system > > that sends out a notification at each sampled event. This > > will be practical when wanting to do something at the > > allocation point. I'm also looking at if the whole > > heapMonitoring code could not reside in the agent code and > > not in the JDK. I'm not convinced but I'm talking to Serguei > > about it to see/assess :) > > > > - Also added two tests for the new event subsystem > > > > C) Removed the slow_path fields inside the TLAB code since > > now FastTLABRefill is deprecated > > > > D) Updated the JVMTI documentation and specification for the > > methods. > > > > So the incremental webrev is here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ > > > > and the full webrev is here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 > > > > I believe I have updated the various JIRA issues that track > > this :) > > > > Thanks for your input, > > > > Jc > > > > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler > > >> wrote: > > > > Hi Erik, > > > > I inlined my answers, which the last one seems to answer > > Robbin's concerns about the same thing (adding things to > > Thread). > > > > On Wed, Feb 14, 2018 at 2:51 AM, Erik ?sterlund > > > > >> wrote: > > > > Hi JC, > > > > Comments are inlined below. > > > > On 2018-02-13 06:18, JC Beyler wrote: > > > > Hi Erik, > > > > Thanks for your answers, I've now inlined my own > > answers/comments. 
> > > > I've done a new webrev here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ > > > > > > > The incremental is here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ > > > > > > > Note to all: > > > > - I've been integrating changes from > > Erin/Serguei/David comments so this webrev > > incremental is a bit an answer to all comments > > in one. I apologize for that :) > > > > On Mon, Feb 12, 2018 at 6:05 AM, Erik ?sterlund > > > > >> wrote: > > > > Hi JC, > > > > Sorry for the delayed reply. > > > > Inlined answers: > > > > > > > > On 2018-02-06 00:04, JC Beyler wrote: > > > > Hi Erik, > > > > (Renaming this to be folded into the > > newly renamed thread :)) > > > > First off, thanks a lot for reviewing > > the webrev! I appreciate it! > > > > I updated the webrev to: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ > > > > > > > And the incremental one is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ > > > > > > > It contains: > > - The change for since from 9 to 11 for > > the jvmti.xml > > - The use of the OrderAccess for initialized > > - Clearing the oop > > > > I also have inlined my answers to your > > comments. The biggest question > > will come from the multiple *_end > > variables. A bit of the logic there > > is due to handling the slow path refill > > vs fast path refill and > > checking that the rug was not pulled > > underneath the slowpath. I > > believe that a previous comment was that > > TlabFastRefill was going to > > be deprecated. > > > > If this is true, we could revert this > > code a bit and just do a : if > > TlabFastRefill is enabled, disable this. > > And then deprecate that when > > TlabFastRefill is deprecated. > > > > This might simplify this webrev and I > > can work on a follow-up that > > either: removes TlabFastRefill if Robbin > > does not have the time to do > > it or add the support to the assembly > > side to handle this correctly. 
> > What do you think? > > > > I support removing TlabFastRefill, but I > > think it is good to not depend on that > > happening first. > > > > > > I'm slowly pushing on the FastTLABRefill > > (https://bugs.openjdk.java.net/browse/JDK-8194084 ), > > I agree on keeping both separate for now though > > so that we can think of both differently > > > > Now, below, inlined are my answers: > > > > On Fri, Feb 2, 2018 at 8:44 AM, Erik > > ?sterlund > > > > >> wrote: > > > > Hi JC, > > > > Hope I am reviewing the right > > version of your work. Here goes... > > > > src/hotspot/share/gc/shared/collectedHeap.inline.hpp: > > > > 159 > > AllocTracer::send_allocation_outside_tlab(klass, result, size * > > HeapWordSize, THREAD); > > 160 > > 161 > > THREAD->tlab().handle_sample(THREAD, result, size); > > 162 return result; > > 163 } > > > > Should not call tlab()->X without > > checking if (UseTLAB) IMO. > > > > Done! > > > > > > More about this later. > > > > src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: > > > > So first of all, there seems to > > quite a few ends. There is an "end", > > a "hard > > end", a "slow path end", and an > > "actual end". Moreover, it seems > > like the > > "hard end" is actually further away > > than the "actual end". So the "hard end" > > seems like more of a "really > > definitely actual end" or something. > > I don't > > know about you, but I think it looks > > kind of messy. In particular, I don't > > feel like the name "actual end" > > reflects what it represents, > > especially when > > there is another end that is behind > > the "actual end". > > > > 413 HeapWord* > > ThreadLocalAllocBuffer::hard_end() { > > 414 // Did a fast TLAB refill > > occur? > > 415 if (_slow_path_end != _end) { > > 416 // Fix up the actual end > > to be now the end of this TLAB. 
> > 417 _slow_path_end = _end; > > 418 _actual_end = _end; > > 419 } > > 420 > > 421 return _actual_end + > > alignment_reserve(); > > 422 } > > > > I really do not like making getters > > unexpectedly have these kind of side > > effects. It is not expected that > > when you ask for the "hard end", you > > implicitly update the "slow path > > end" and "actual end" to new values. > > > > As I said, a lot of this is due to the > > FastTlabRefill. If I make this > > not supporting FastTlabRefill, this goes > > away. The reason the system > > needs to update itself at the get is > > that you only know at that get if > > things have shifted underneath the tlab > > slow path. I am not sure of > > really better names (naming is hard!), > > perhaps we could do these > > names: > > > > - current_tlab_end // Either the > > allocated tlab end or a sampling point > > - last_allocation_address // The end of > > the tlab allocation > > - last_slowpath_allocated_end // In > > case a fast refill occurred the > > end might have changed, this is to > > remember slow vs fast past refills > > > > the hard_end method can be renamed to > > something like: > > tlab_end_pointer() // The end of > > the lab including a bit of > > alignment reserved bytes > > > > Those names sound better to me. Could you > > please provide a mapping from the old names > > to the new names so I understand which one > > is which please? > > > > This is my current guess of what you are > > proposing: > > > > end -> current_tlab_end > > actual_end -> last_allocation_address > > slow_path_end -> last_slowpath_allocated_end > > hard_end -> tlab_end_pointer > > > > Yes that is correct, that was what I was proposing. 
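To make the naming discussion above a little more concrete, here is a minimal toy sketch in plain C++ (not HotSpot code) of a buffer with two ends: one is the real limit of the buffer, the other can be pulled in to an earlier spot so that the fast path fails and the slow path gets a chance to take a sample. `ToyTlab`, its field names and the fixed byte interval are all assumptions made for illustration; the webrev's actual fields and logic may well differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy model only; this is NOT the HotSpot ThreadLocalAllocBuffer.
// All names below are hypothetical and chosen for illustration.
struct ToyTlab {
  std::size_t top = 0;             // next free offset in the buffer
  std::size_t current_end = 0;     // fast path fails once an allocation would pass this
  std::size_t allocation_end = 0;  // real end of the usable space
  std::size_t samples = 0;         // number of allocations that were sampled

  ToyTlab(std::size_t capacity, std::size_t sample_interval)
      : allocation_end(capacity), interval(sample_interval) {
    set_sample_point();
  }

  // Returns true if the allocation fit; false means the buffer is really full.
  bool allocate(std::size_t size) {
    if (top + size <= current_end) {  // fast path: a single compare against one end
      top += size;
      return true;
    }
    return allocate_slow(size);
  }

 private:
  std::size_t interval;  // assumed fixed sampling interval in bytes

  // Place the artificial end at the next sampling point, never past the real end.
  void set_sample_point() {
    current_end = std::min(top + interval, allocation_end);
  }

  bool allocate_slow(std::size_t size) {
    if (top + size > allocation_end) {
      return false;  // genuinely out of space: a refill would be needed
    }
    // Only the artificial end was hit: record a sample, then allocate.
    top += size;
    ++samples;
    set_sample_point();
    return true;
  }
};
```

With a 1024-byte buffer and a 256-byte interval, sixteen 64-byte allocations all succeed, three of them go through the slow path and are counted as samples, and the seventeenth fails because the real end has been reached.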
> > > > I would prefer this naming: > > > > end -> slow_path_end // the end for taking a > > slow path; either due to sampling or refilling > > actual_end -> allocation_end // the end for > > allocations > > slow_path_end -> last_slow_path_end // last > > address for slow_path_end (as opposed to > > allocation_end) > > hard_end -> reserved_end // the end of the > > reserved space of the TLAB > > > > About setting things in the getter... that > > still seems like a very unpleasant thing to > > me. It would be better to inspect the call > > hierarchy and explicitly update the ends > > where they need updating, and assert in the > > getter that they are in sync, rather than > > implicitly setting various ends as a > > surprising side effect in a getter. It looks > > like the call hierarchy is very small. With > > my new naming convention, reserved_end() > > would presumably return _allocation_end + > > alignment_reserve(), and have an assert > > checking that _allocation_end == > > _last_slow_path_allocation_end, complaining > > that this invariant must hold, and that a > > caller to this function, such as > > make_parsable(), must first explicitly > > synchronize the ends as required, to honor > > that invariant. > > > > > > I've renamed the variables to how you preferred > > it except for the _end one. I did: > > > > current_end > > > > last_allocation_address > > > > tlab_end_ptr > > > > The reason is that the architecture dependent > > code use the thread.hpp API and it already has > > tlab included into the name so it becomes > > tlab_current_end (which is better that > > tlab_current_tlab_end in my opinion). > > > > I also moved the update into a separate method > > with a TODO that says to remove it when > > FastTLABRefill is deprecated > > > > This looks a lot better now. Thanks. > > > > Note that the following comment now needs updating > > accordingly in threadLocalAllocBuffer.hpp: > > > > 41 // Heap sampling is performed via > > the end/actual_end fields. 
> > > > 42 // actual_end contains the real end > > of the tlab allocation, > > > > 43 // whereas end can be set to an > > arbitrary spot in the tlab to > > > > 44 // trip the return and sample the > > allocation. > > > > 45 // slow_path_end is used to track > > if a fast tlab refill occured > > > > 46 // between slowpath calls. > > > > There might be other comments too, I have not looked > > in detail. > > > > This was the only spot that still had an actual_end, I > > fixed it now. I'll do a sweep to double check other > > comments. > > > > > > > > Not sure it's better but before updating > > the webrev, I wanted to try > > to get input/consensus :) > > > > (Note hard_end was always further off > > than end). > > > > src/hotspot/share/prims/jvmti.xml: > > > > 10357 > id="can_sample_heap" since="9"> > > 10358 > > 10359 Can sample the heap. > > 10360 If this capability > > is enabled then the heap sampling > > methods > > can be called. > > 10361 > > 10362 > > > > Looks like this capability should > > not be "since 9" if it gets integrated > > now. > > > > Updated now to 11, crossing my fingers :) > > > > src/hotspot/share/runtime/heapMonitoring.cpp: > > > > 448 if > > (is_alive->do_object_b(value)) { > > 449 // Update the oop to > > point to the new object if it is still > > alive. > > 450 f->do_oop(&(trace.obj)); > > 451 > > 452 // Copy the old > > trace, if it is still live. > > 453 > > _allocated_traces->at_put(curr_pos++, trace); > > 454 > > 455 // Store the live > > trace in a cache, to be served up on > > /heapz. > > 456 > > _traces_on_last_full_gc->append(trace); > > 457 > > 458 count++; > > 459 } else { > > 460 // If the old trace > > is no longer live, add it to the list of > > 461 // recently collected > > garbage. > > 462 > > store_garbage_trace(trace); > > 463 } > > > > In the case where the oop was not > > live, I would like it to be explicitly > > cleared. > > > > Done I think how you wanted it. 
Let me > > know because I'm not familiar > > with the RootAccess API. I'm unclear if > > I'm doing this right or not so > > reviews of these parts are highly > > appreciated. Robbin had talked of > > perhaps later pushing this all into a > > OopStorage, should I do this now > > do you think? Or can that wait a second > > webrev later down the road? > > > > I think using handles can and should be done > > later. You can use the Access API now. > > I noticed that you are missing an #include > > "oops/access.inline.hpp" in your > > heapMonitoring.cpp file. > > > > The missing header is there for me so I don't > > know, I made sure it is present in the latest > > webrev. Sorry about that. > > > > + Did I clear it the way you wanted me > > to or were you thinking of > > something else? > > > > > > That is precisely how I wanted it to be > > cleared. Thanks. > > > > + Final question here, seems like if I > > were to want to not do the > > f->do_oop directly on the trace.obj, I'd > > need to do something like: > > > > f->do_oop(&value); > > ... > > trace->store_oop(value); > > > > to update the oop internally. Is that > > right/is that one of the > > advantages of going to the Oopstorage > > sooner than later? > > > > > > I think you really want to do the do_oop on > > the root directly. Is there a particular > > reason why you would not want to do that? > > Otherwise, yes - the benefit with using the > > handle approach is that you do not need to > > call do_oop explicitly in your code. > > > > There is no reason except that now we have a > > load_oop and a get_oop_addr, I was not sure what > > you would think of that. > > > > That's fine. > > > > Also I see a lot of > > concurrent-looking use of the > > following field: > > 267 volatile bool _initialized; > > > > Please note that the "volatile" > > qualifier does not help with reordering > > here. 
Reordering between volatile > > and non-volatile fields is > > completely free > > for both compiler and hardware, > > except for windows with MSVC, where > > volatile > > semantics is defined to use > > acquire/release semantics, and the > > hardware is > > TSO. But for the general case, I > > would expect this field to be stored > > with > > OrderAccess::release_store and > > loaded with OrderAccess::load_acquire. > > Otherwise it is not thread safe. > > > > Because everything is behind a mutex, I > > wasn't really worried about > > this. I have a test that has multiple > > threads trying to hit this > > corner case and it passes. > > > > However, to be paranoid, I updated it to > > using the OrderAccess API > > now, thanks! Let me know what you think > > there too! > > > > > > If it is indeed always supposed to be read > > and written under a mutex, then I would > > strongly prefer to have it accessed as a > > normal non-volatile member, and have an > > assertion that given lock is held or we are > > in a safepoint, as we do in many other > > places. Something like this: > > > > assert(HeapMonitorStorage_lock->owned_by_self() > > || (SafepointSynchronize::is_at_safepoint() > > && Thread::current()->is_VM_thread()), "this > > should not be accessed concurrently"); > > > > It would be confusing to people reading the > > code if there are uses of OrderAccess that > > are actually always protected under a mutex. > > > > Thank you for the exact example to be put in the > > code! I put it around each access/assignment of > > the _initialized method and found one case where > > yes you can touch it and not have the lock. It > > actually is "ok" because you don't act on the > > storage until later and only when you really > > want to modify the storage (see the > > object_alloc_do_sample method which calls the > > add_trace method). 
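The suggestion above (access the flag as a plain, non-volatile member and assert that the lock is held) can be sketched in standalone C++ roughly as follows. This is only an illustration under stated assumptions: `OwnedMutex` and `ToyStorage` are hypothetical stand-ins, not HotSpot's `Mutex` or `StackTraceStorage`, and the safepoint half of the suggested assertion is left out.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a lock that can answer owned_by_self().
class OwnedMutex {
 public:
  void lock() {
    _m.lock();
    _owner.store(std::this_thread::get_id(), std::memory_order_relaxed);
  }
  void unlock() {
    _owner.store(std::thread::id(), std::memory_order_relaxed);
    _m.unlock();
  }
  bool owned_by_self() const {
    return _owner.load(std::memory_order_relaxed) == std::this_thread::get_id();
  }

 private:
  std::mutex _m;
  std::atomic<std::thread::id> _owner{std::thread::id()};
};

// Hypothetical storage whose _initialized flag is only touched under the lock.
class ToyStorage {
 public:
  void initialize() {
    std::lock_guard<OwnedMutex> guard(_lock);
    _initialized = true;
  }

  // Stores a value only if initialized; returns how many values are stored.
  int add(int value) {
    std::lock_guard<OwnedMutex> guard(_lock);
    if (initialized()) {
      _sum += value;
      ++_count;
    }
    return _count;
  }

 private:
  // Plain read: correctness comes from the lock, which the assert documents.
  bool initialized() const {
    assert(_lock.owned_by_self() && "this should not be accessed concurrently");
    return _initialized;
  }

  OwnedMutex _lock;
  bool _initialized = false;  // deliberately neither volatile nor atomic
  int _sum = 0;
  int _count = 0;
};
```

The point of the design is that a reader of the code sees one synchronization mechanism (the lock) instead of a mix of a lock and ordered accesses, and a debug build fails loudly if some path reads the flag without holding it.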
> > > > But, because of this, I'm going to put the > > OrderAccess here, I'll do some performance > > numbers later and if there are issues, I might > > add a "unsafe" read and a "safe" one to make it > > explicit to the reader. But I don't think it > > will come to that. > > > > > > Okay. This double return in heapMonitoring.cpp looks > > wrong: > > > > 283 bool initialized() { > > 284 return > > OrderAccess::load_acquire(&_initialized) != 0; > > 285 return _initialized; > > 286 } > > > > Since you said object_alloc_do_sample() is the only > > place where you do not hold the mutex while reading > > initialized(), I had a closer look at that. It looks > > like in its current shape, the lack of a mutex may > > lead to a memory leak. In particular, it first > > checks if (initialized()). Let's assume this is now > > true. It then allocates a bunch of stuff, and checks > > if the number of frames were over 0. If they were, > > it calls StackTraceStorage::storage()->add_trace() > > seemingly hoping that after grabbing the lock in > > there, initialized() will still return true. But it > > could now return false and skip doing anything, in > > which case the allocated stuff will never be freed. > > > > I fixed this now by making add_trace return a boolean > > and checking for that. It will be in the next webrev. > > Thanks, the truth is that in our implementation the > > system is always on or off, so this never really occurs > > :). In this version though, that is not true and it's > > important to handle so thanks again! > > > > > > So the analysis seems to be that _initialized is > > only used outside of the mutex in once instance, > > where it is used to perform double-checked locking, > > that actually causes a memory leak. > > > > I am not proposing how to fix that, just raising the > > issue. If you still want to perform this > > double-checked locking somehow, then the use of > > acquire/release still seems odd. 
Because the memory > > ordering restrictions of it never comes into play in > > this particular case. If it ever did, then the use > > of destroy_stuff(); release_store(_initialized, 0) > > would be broken anyway as that would imply that > > whatever concurrent reader there ever was would > > after reading _initialized with load_acquire() could > > *never* read the data that is concurrently destroyed > > anyway. I would be biased to think that > > RawAccess::load/store looks like a more > > appropriate solution, given that the memory leak > > issue is resolved. I do not know how painful it > > would be to not perform this double-checked locking. > > > > So I agree with this entirely. I looked also a bit more > > and the difference and code really stems from our > > internal version. In this version however, there are > > actually a lot of things going on that I did not go > > entirely through in my head but this comment made me > > ponder a bit more on it. > > > > Since every object_alloc_do_sample is protected by a > > check to HeapMonitoring::enabled(), there is only a > > small chance that the call is happening when things have > > been disabled. So there is no real need to do a first > > check on the initialized, it is a rare occurence that a > > call happens to object_alloc_do_sample and the > > initialized of the storage returns false. > > > > (By the way, even if you did call object_alloc_do_sample > > without looking at HeapMonitoring::enabled(), that would > > be ok too. You would gather the stacktrace and get > > nowhere at the add_trace call, which would return false; > > so though not optimal performance wise, nothing would > > break). > > > > Furthermore, the add_trace is really the moment of no > > return and we have the mutex lock and then the > > initialized check. So, in the end, I did two things: I > > removed that first check and then I removed the > > OrderAccess for the storage initialized. 
I think now I > > have a better grasp and understanding why it was done in > > our code and why it is not needed here. Thanks for > > pointing it out :). This now still passes my JTREG > > tests, especially the threaded one. > > > > > > > > As a kind of meta comment, I wonder > > if it would make sense to add sampling > > for non-TLAB allocations. Seems like > > if someone is rapidly allocating a > > whole bunch of 1 MB objects that > > never fit in a TLAB, I might still be > > interested in seeing that in my > > traces, and not get surprised that the > > allocation rate is very high yet not > > showing up in any profiles. > > > > That is handled by the handle_sample > > where you wanted me to put a > > UseTlab because you hit that case if the > > allocation is too big. > > > > > > I see. It was not obvious to me that > > non-TLAB sampling is done in the TLAB class. > > That seems like an abstraction crime. > > What I wanted in my previous comment was > > that we do not call into the TLAB when we > > are not using TLABs. If there is sampling > > logic in the TLAB that is used for something > > else than TLABs, then it seems like that > > logic simply does not belong inside of the > > TLAB. It should be moved out of the TLAB, > > and instead have the TLAB call this common > > abstraction that makes sense. > > > > So in the incremental version: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ > > >, > > this is still a "crime". The reason is that the > > system has to have the bytes_until_sample on a > > per-thread level and it made "sense" to have it > > with the TLAB implementation. Also, I was not > > sure how people felt about adding something to > > the thread instance instead. > > > > Do you think it fits better at the Thread level? > > I can see how difficult it is to make it happen > > there and add some logic there. Let me know what > > you think. 
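As a rough sketch of what keeping `bytes_until_sample` per thread could look like when it lives in its own small class rather than inside the TLAB, consider the following. `ToyThreadHeapSampler` and `ToyThread` are hypothetical names, and the fixed 512k interval is used only because it is the default mentioned earlier in this thread; this is not the webrev's implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread sampler; names and logic are illustrative only.
class ToyThreadHeapSampler {
 public:
  explicit ToyThreadHeapSampler(std::size_t interval)
      : _interval(interval), _bytes_until_sample(interval) {}

  // Called for every allocation, whether or not it went through a TLAB.
  // Returns true when this allocation crosses the next sampling point.
  bool record_allocation(std::size_t bytes) {
    if (bytes < _bytes_until_sample) {
      _bytes_until_sample -= bytes;
      return false;
    }
    // Fixed interval for simplicity (an assumption of this sketch).
    _bytes_until_sample = _interval;
    return true;
  }

 private:
  std::size_t _interval;
  std::size_t _bytes_until_sample;
};

// Hypothetical thread type holding the sampler by value, so the sampling
// state stays per thread without living inside the TLAB class.
struct ToyThread {
  ToyThreadHeapSampler heap_sampler{512 * 1024};
};
```

Because the counter is independent of the TLAB, an allocation that is too large to ever fit in a TLAB (for example a 1 MB array) still passes through `record_allocation` and is sampled like any other.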
> > > > > > We have an unfortunate situation where everyone that > > has some fields that are thread local tend to dump > > them right into Thread, making the size and > > complexity of Thread grow as it becomes tightly > > coupled with various unrelated subsystems. It would > > be desirable to have a separate class for this > > instead that encapsulates the sampling logic. That > > class could possibly reside in Thread though as a > > value object of Thread. > > > > I imagined that would be the case but was not sure. I > > will look at the example that Robbin is talking about > > (ThreadSMR) and will see how to refactor my code to use > > that. > > > > Thanks again for your help, > > > > Jc > > > > > > > > Hope I have answered your questions and that > > my feedback makes sense to you. > > > > You have and thank you for them, I think we are > > getting to a cleaner implementation and things > > are getting better and more readable :) > > > > > > Yes it is getting better. > > > > Thanks, > > /Erik > > > > > > > > Thanks for your help! > > > > Jc > > > > Thanks, > > /Erik > > > > I double checked by changing the test > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcbeyler at google.com Fri Apr 6 23:12:52 2018 From: jcbeyler at google.com (JC Beyler) Date: Fri, 06 Apr 2018 23:12:52 +0000 Subject: JDK-8171119: Low-Overhead Heap Profiling In-Reply-To: References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com> Message-ID: Hi Karen, Let me inline my answers, it will probably be easier :) On Fri, Apr 6, 2018 at 2:40 PM Karen Kinnear wrote: > JC, > > Thank you for the updates - really glad you are including the compiler > folks. I reviewed the version before this one, so ignore > any comments you?ve already covered (although I did peek at the latest) > > 1. JDK-8194905 CSR - could you please delete attachments that are not > current. It is a bit confusing right now. > I have been looking at jvmti_event6.html. 
> I am assuming the rest are obsolete and could be removed please.

I tried but it does not allow me to do so. It seems that someone with
administrative rights has to do it :-(. That is why I had to resort to
this...

> 2. In jvmti_event6.html under Sampled Object Allocation, there is a link
> to "Heap Sampling Monitoring System".
> It takes me to the top of the page - seems like something is missing in
> defining it?

So the Heap Sampling Monitoring System used to have more methods. It made
sense to have them in a separate category. I have now moved it to the
memory category to be consistent and grouped there. I also removed that
link btw.

> 3. Scope of memory allocation tracking
>
> I am struggling to understand the extent of memory allocation tracking
> that you are looking for (probably want to
> clarify in the JEP and CSR once we work this through).
>
> e.g. Heap Sampler vs. JVMTI VMObjectAllocEvent
> So the current jvmtiVMObjectAllocEvent says:
>
> Sent when a method causes the VM to allocate an Object visible to Java
> and allocation not detectable by other instrumentation mechanisms.
> Generally detect by instrumenting bytecodes of allocating methods
> JNI - use JNI function interception
> e.g. Reflection: java.lang.Class.newInstance()
> e.g. VM intrinsics
>
> comment:
> Not generated due to bytecodes - e.g. new and newarray VM instructions
> Not allocation due to JNI: e.g. AllocObject
> NOT VM internal objects
> NOT allocations during VM init
>
> So from the JEP I can't tell the intended scope of the new event - is this
> intended to cover all heap allocation?
>   bytecodes
>   JVM_*
>   JNI_*
>   internal VM objects
>   other? (I'm not sure what other there are)
>   - I presume not allocations during VM init - since sent only during
>     live phase

Yes exactly, as much as possible, I am aiming to cover all heap
allocations. Mostly though and in practice, I think we care about
bytecodes and to a lesser extent JNI.
Being independent of why the memory is being allocated is probably even
better: this thread allocated Y, no matter where or why that was.

> OR - is the primary goal to cover allocation for bytecodes so folks can
> skip instrumentation?

Yes that is the primary goal.

> OR - do you want to get performance numbers and see what is low enough
> overhead before deciding?

I think it is the same: the system is relatively in place and my overhead
numbers indicate 0% when it is off, 1% when it is on but the callback to
the user is empty, and 3% for a naive implementation tracking live/GC'd
objects.

> 4. The design question is where to put the collectors in the source base -
> and that of course strongly depends on
> the scope of the information you want to collect, and on the performance
> overhead we are willing to incur.

Very true.

> I was trying to figure out a way to put the collectors farther down the
> call stack so as to both catch more
> cases and to reduce the maintenance burden - i.e. if you were to add a new
> code generator, e.g. Graal -
> if it were to go through an existing interface, that might be a place to
> already have a collector.
>
> I do not know the Graal sources - I did look at jvmci/jvmciRuntime.cpp -
> and it appears that there
> are calls to instanceKlass::new_instance,
> oopFactory::new_typeArray/new_ObjArray and ArrayKlass::multi_allocate,
> so one possibility would be to put hooks in those calls which would catch
> many? (I did not do a thorough search)
> of the slowpath calls for the bytecodes, and then check the fast paths in
> detail.

I'll come to a major issue with the collector and its placement in the
next paragraph.

> I had wondered if it made sense to move the hooks even farther down, into
> CollectedHeap::obj_allocate and array_allocate.
> I do not think so.
> First reason is that for multidimensional arrays
> (ArrayKlass::multi_allocate), the outer dimension array would
> have an event before storing the inner sub-arrays and I don't think we
> want that exposed, so that won't work for arrays.

So the major difficulty is that the steps of collection do this:
  - An object gets allocated and is selected for sampling
  - The original pointer (where the object initially resides in memory)
    is passed to the collector
  - Now one important thing of note:
      (a) In the VM code, until the point where the oop is going to be
          returned, the GC is not yet aware of it
      (b) so the collector can't yet send it out to the user via JVMTI;
          otherwise the agent could, for example, create a weak
          reference to it

I'm a bit fuzzy on this and maybe it's just that there would be more
heavy lifting to make this possible, but my initial tests seem to show
problems when attempting this in the obj_allocate area.

> The second reason is that I strongly suspect the scope you want is
> bytecodes only. I think once you have added hooks
> to all the fast paths and slow paths that this will be pushing the
> performance overhead constraints you proposed and
> you won't want to see e.g. internal allocations.

Yes agreed, allocations from bytecodes are mostly our concern generally :)

> But I think you need to experiment with the set of allocations (or possible
> alternative sets of allocations) you want recorded.
>
> The hooks I see today include:
> Interpreter: (looking at x86 as a sample)
>   - slowpath in InterpreterRuntime
>   - fastpath tlab allocation - your new threshold check handles that

Agreed

>   - allow_shared_alloc (GC specific): for _new isn't handled

Where is that exactly? I can check why we are not catching it.

> C1
>   I don't see changes in c1_Runtime.cpp
>   note: you also want to look for the fast path

I added the calls to c1_Runtime in the latest webrev, but was still going
through testing before pushing it out. I had waited on this one a bit.
Fast path would be handled by the threshold check, no?

> C2: changes in opto/runtime.cpp for slow path
>   did you also catch the fast path?

Fast path gets handled by the same threshold check, no? Perhaps I've
missed something (very likely)?

> 3. Performance -
> After you get all the collectors added - you need to rerun the performance
> numbers.

Agreed :)

> thanks,
> Karen
>
> On Apr 5, 2018, at 2:15 PM, JC Beyler wrote:
>
> Thanks Boris and Derek for testing it.
>
> Yes I was trying to get a new version out that had the tests ported as
> well but got sidetracked while trying to add tests and two new features.
>
> Here is the incremental webrev:
>
> Here is the full webrev:
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/
>
> Basically, the new tests assert this:
>   - Only one agent can currently ask for the sampling, I'm currently
>     seeing if I can push to a next webrev the multi-agent support to start
>     doing a code freeze on this one
>   - The event is not thread-enabled, meaning like the
>     VMObjectAllocationEvent, it's an all or nothing event; same as the
>     multi-agent, I'm going to see if a future webrev to add the support is a
>     better idea to freeze this webrev a bit
>
> There was another item that I added here and I'm unsure this webrev is
> stable in debug mode: I added an assertion system to ascertain that all
> paths leading to a TLAB slow path (and hence a sampling point) have a
> sampling collector ready to post the event if a user wants it. This might
> break a few things in debug mode as I'm working through the kinks of that as
> well. However, in release mode, this new webrev passes all the tests in
> hotspot/jtreg/serviceability/jvmti/HeapMonitor.
>
> Let me know what you think,
> Jc
>
> On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich <
> boris.ulasevich at bell-sw.com> wrote:
>
>> Hi JC,
>>
>> I have just checked on arm32: your patch compiles and runs ok.
>>
>> As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not
>> correspond to actual library name: libHeapMonitorTest.c ->
>> libHeapMonitorTest.so
>>
>> Boris
>>
>> On 04.04.2018 01:54, White, Derek wrote:
>> > Thanks JC,
>> >
>> > New patch applies cleanly. Compiles and runs (simple test programs) on
>> > aarch64.
>> >
>> >  - Derek
>> >
>> > *From:* JC Beyler [mailto:jcbeyler at google.com]
>> > *Sent:* Monday, April 02, 2018 1:17 PM
>> > *To:* White, Derek
>> > *Cc:* Erik Österlund;
>> > serviceability-dev at openjdk.java.net; hotspot-compiler-dev
>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
>> >
>> > Hi Derek,
>> >
>> > I know there were a few things that went in that provoked a merge
>> > conflict. I worked on it and got it up to date. Sadly my lack of
>> > knowledge makes it a full rebase instead of keeping all the history.
>> > However, with a newly cloned jdk/hs you should now be able to use:
>> >
>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/
>> >
>> > The change you are referring to was done with the others so perhaps you
>> > were unlucky and I forgot it in a webrev and fixed it in another? I
>> > don't know but it's been there and I checked, it is here:
>> >
>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html
>> >
>> > I double checked that tlab_end_offset no longer appears in any
>> > architecture (as far as I can tell :)).
>> >
>> > Thanks for testing and let me know if you run into any other issues!
>> >
>> > Jc
>> >
>> > On Fri, Mar 30, 2018 at 4:24 PM White, Derek wrote:
>> >
>> > Hi Jc,
>> >
>> > I've been having trouble getting your patch to apply correctly. I
>> > may have based it on the wrong version.
>> >
>> > In any case, I think there's a missing update to
>> > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(),
>> > where "JavaThread::tlab_end_offset()"
>> > should become "JavaThread::tlab_current_end_offset()".
>> >
>> > This should correspond to the other port's changes in
>> > templateTable_.cpp files.
>> >
>> > Thanks!
>> >  - Derek
>> >
>> > *From:* hotspot-compiler-dev
>> > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net]
>> > *On Behalf Of* JC Beyler
>> > *Sent:* Wednesday, March 28, 2018 11:43 AM
>> > *To:* Erik Österlund
>> > *Cc:* serviceability-dev at openjdk.java.net; hotspot-compiler-dev
>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
>> >
>> > Hi all,
>> >
>> > I've been working on deflaking the tests mostly and the wording in
>> > the JVMTI spec.
>> >
>> > Here is the two incremental webrevs:
>> >
>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/
>> >
>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/
>> >
>> > Here is the total webrev:
>> >
>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/
>> >
>> > Here are the notes of this change:
>> >
>> >   - Currently the tests pass 100 times in a row, I am working on
>> >     checking if they pass 1000 times in a row.
>> >
>> >   - The default sampling rate is set to 512k, this is what we use
>> >     internally and having a default means that to enable the sampling
>> >     with the default, the user only has to do a enable event/disable
>> >     event via JVMTI (instead of enable + set sample rate).
>> >
>> >   - I deprecated the code that was handling the fast path tlab
>> >     refill if it happened since this is now deprecated
>> >
>> >       - Though I saw that Graal is still using it so I have to see
>> >         what needs to be done there exactly
>> >
>> > Finally, using the Dacapo benchmark suite, I noted a 1% overhead for
>> > when the event system is turned on and the callback to the native
>> > agent is just empty. I got a 3% overhead with a 512k sampling rate
>> > with the code I put in the native side of my tests.
>> > >> > Thanks and comments are appreciated, >> > >> > Jc >> > >> > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler > > > wrote: >> > >> > Hi all, >> > >> > The incremental webrev update is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ >> > >> > The full webrev is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ >> > >> > Major change here is: >> > >> > - I've removed the heapMonitoring.cpp code in favor of just >> > having the sampling events as per Serguei's request; I still >> > have to do some overhead measurements but the tests prove the >> > concept can work >> > >> > - Most of the tlab code is unchanged, the only major >> > part is that now things get sent off to event collectors when >> > used and enabled. >> > >> > - Added the interpreter collectors to handle interpreter >> > execution >> > >> > - Updated the name from SetTlabHeapSampling to >> > SetHeapSampling to be more generic >> > >> > - Added a mutex for the thread sampling so that we can >> > initialize an internal static array safely >> > >> > - Ported the tests from the old system to this new one >> > >> > I've also updated the JEP and CSR to reflect these changes: >> > >> > https://bugs.openjdk.java.net/browse/JDK-8194905 >> > >> > https://bugs.openjdk.java.net/browse/JDK-8171119 >> > >> > In order to make this have some forward progress, I've removed >> > the heap sampling code entirely and now rely entirely on the >> > event sampling system. The tests reflect this by using a >> > simplified implementation of what an agent could do: >> > >> > >> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c >> > >> > (Search for anything mentioning event_storage). >> > >> > I have not taken the time to port the whole code we had >> > originally in heapMonitoring to this. 
I hesitate only because >> > that code was in C++, I'd have to port it to C and this is for >> > tests so perhaps what I have now is good enough? >> > >> > As far as testing goes, I've ported all the relevant tests and >> > then added a few: >> > >> > - Turning the system on/off >> > >> > - Testing using various GCs >> > >> > - Testing using the interpreter >> > >> > - Testing the sampling rate >> > >> > - Testing with objects and arrays >> > >> > - Testing with various threads >> > >> > Finally, as overhead goes, I have the numbers of the system off >> > vs a clean build and I have 0% overhead, which is what we'd >> > want. This was using the Dacapo benchmarks. I am now preparing >> > to run a version with the events on using dacapo and will report >> > back here. >> > >> > Any comments are welcome :) >> > >> > Jc >> > >> > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler > > > wrote: >> > >> > Hi all, >> > >> > I apologize for the delay but I wanted to add an event >> > system and that took a bit longer than expected and I also >> > reworked the code to take into account the deprecation of >> > FastTLABRefill. >> > >> > This update has four parts: >> > >> > A) I moved the implementation from Thread to >> > ThreadHeapSampler inside of Thread. Would you prefer it as a >> > pointer inside of Thread or like this works for you? Second >> > question would be would you rather have an association >> > outside of Thread altogether that tries to remember when >> > threads are live and then we would have something like: >> > >> > ThreadHeapSampler::get_sampling_size(this_thread); >> > >> > I worry about the overhead of this but perhaps it is not too >> > too bad? >> > >> > B) I also have been working on the Allocation event system >> > that sends out a notification at each sampled event. This >> > will be practical when wanting to do something at the >> > allocation point. 
I'm also looking at if the whole >> > heapMonitoring code could not reside in the agent code and >> > not in the JDK. I'm not convinced but I'm talking to Serguei >> > about it to see/assess :) >> > >> > - Also added two tests for the new event subsystem >> > >> > C) Removed the slow_path fields inside the TLAB code since >> > now FastTLABRefill is deprecated >> > >> > D) Updated the JVMTI documentation and specification for the >> > methods. >> > >> > So the incremental webrev is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ >> > >> > and the full webrev is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 >> > >> > I believe I have updated the various JIRA issues that track >> > this :) >> > >> > Thanks for your input, >> > >> > Jc >> > >> > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler >> > > wrote: >> > >> > Hi Erik, >> > >> > I inlined my answers, which the last one seems to answer >> > Robbin's concerns about the same thing (adding things to >> > Thread). >> > >> > On Wed, Feb 14, 2018 at 2:51 AM, Erik ?sterlund >> > > > > wrote: >> > >> > Hi JC, >> > >> > Comments are inlined below. >> > >> > On 2018-02-13 06:18, JC Beyler wrote: >> > >> > Hi Erik, >> > >> > Thanks for your answers, I've now inlined my own >> > answers/comments. >> > >> > I've done a new webrev here: >> > >> > >> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ >> > < >> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.08/> >> > >> > The incremental is here: >> > >> > >> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ >> > < >> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.07_08/> >> > >> > Note to all: >> > >> > - I've been integrating changes from >> > Erin/Serguei/David comments so this webrev >> > incremental is a bit an answer to all comments >> > in one. I apologize for that :) >> > >> > On Mon, Feb 12, 2018 at 6:05 AM, Erik ?sterlund >> > > > > wrote: >> > >> > Hi JC, >> > >> > Sorry for the delayed reply. 
>> > >> > Inlined answers: >> > >> > >> > >> > On 2018-02-06 00:04, JC Beyler wrote: >> > >> > Hi Erik, >> > >> > (Renaming this to be folded into the >> > newly renamed thread :)) >> > >> > First off, thanks a lot for reviewing >> > the webrev! I appreciate it! >> > >> > I updated the webrev to: >> > >> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ >> > < >> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.05a/> >> > >> > And the incremental one is here: >> > >> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ >> > < >> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.04_05a/> >> > >> > It contains: >> > - The change for since from 9 to 11 for >> > the jvmti.xml >> > - The use of the OrderAccess for >> initialized >> > - Clearing the oop >> > >> > I also have inlined my answers to your >> > comments. The biggest question >> > will come from the multiple *_end >> > variables. A bit of the logic there >> > is due to handling the slow path refill >> > vs fast path refill and >> > checking that the rug was not pulled >> > underneath the slowpath. I >> > believe that a previous comment was that >> > TlabFastRefill was going to >> > be deprecated. >> > >> > If this is true, we could revert this >> > code a bit and just do a : if >> > TlabFastRefill is enabled, disable this. >> > And then deprecate that when >> > TlabFastRefill is deprecated. >> > >> > This might simplify this webrev and I >> > can work on a follow-up that >> > either: removes TlabFastRefill if Robbin >> > does not have the time to do >> > it or add the support to the assembly >> > side to handle this correctly. >> > What do you think? >> > >> > I support removing TlabFastRefill, but I >> > think it is good to not depend on that >> > happening first. 
>> > >> > >> > I'm slowly pushing on the FastTLABRefill >> > ( >> https://bugs.openjdk.java.net/browse/JDK-8194084), >> > I agree on keeping both separate for now though >> > so that we can think of both differently >> > >> > Now, below, inlined are my answers: >> > >> > On Fri, Feb 2, 2018 at 8:44 AM, Erik >> > ?sterlund >> > > > > >> wrote: >> > >> > Hi JC, >> > >> > Hope I am reviewing the right >> > version of your work. Here goes... >> > >> > >> src/hotspot/share/gc/shared/collectedHeap.inline.hpp: >> > >> > 159 >> > >> AllocTracer::send_allocation_outside_tlab(klass, result, size * >> > HeapWordSize, THREAD); >> > 160 >> > 161 >> > >> THREAD->tlab().handle_sample(THREAD, result, size); >> > 162 return result; >> > 163 } >> > >> > Should not call tlab()->X without >> > checking if (UseTLAB) IMO. >> > >> > Done! >> > >> > >> > More about this later. >> > >> > >> src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: >> > >> > So first of all, there seems to >> > quite a few ends. There is an "end", >> > a "hard >> > end", a "slow path end", and an >> > "actual end". Moreover, it seems >> > like the >> > "hard end" is actually further away >> > than the "actual end". So the "hard >> end" >> > seems like more of a "really >> > definitely actual end" or something. >> > I don't >> > know about you, but I think it looks >> > kind of messy. In particular, I >> don't >> > feel like the name "actual end" >> > reflects what it represents, >> > especially when >> > there is another end that is behind >> > the "actual end". >> > >> > 413 HeapWord* >> > ThreadLocalAllocBuffer::hard_end() { >> > 414 // Did a fast TLAB refill >> > occur? >> > 415 if (_slow_path_end != >> _end) { >> > 416 // Fix up the actual end >> > to be now the end of this TLAB. 
>> > 417 _slow_path_end = _end; >> > 418 _actual_end = _end; >> > 419 } >> > 420 >> > 421 return _actual_end + >> > alignment_reserve(); >> > 422 } >> > >> > I really do not like making getters >> > unexpectedly have these kind of side >> > effects. It is not expected that >> > when you ask for the "hard end", you >> > implicitly update the "slow path >> > end" and "actual end" to new values. >> > >> > As I said, a lot of this is due to the >> > FastTlabRefill. If I make this >> > not supporting FastTlabRefill, this goes >> > away. The reason the system >> > needs to update itself at the get is >> > that you only know at that get if >> > things have shifted underneath the tlab >> > slow path. I am not sure of >> > really better names (naming is hard!), >> > perhaps we could do these >> > names: >> > >> > - current_tlab_end // Either the >> > allocated tlab end or a sampling point >> > - last_allocation_address // The end of >> > the tlab allocation >> > - last_slowpath_allocated_end // In >> > case a fast refill occurred the >> > end might have changed, this is to >> > remember slow vs fast past refills >> > >> > the hard_end method can be renamed to >> > something like: >> > tlab_end_pointer() // The end of >> > the lab including a bit of >> > alignment reserved bytes >> > >> > Those names sound better to me. Could you >> > please provide a mapping from the old names >> > to the new names so I understand which one >> > is which please? >> > >> > This is my current guess of what you are >> > proposing: >> > >> > end -> current_tlab_end >> > actual_end -> last_allocation_address >> > slow_path_end -> last_slowpath_allocated_end >> > hard_end -> tlab_end_pointer >> > >> > Yes that is correct, that was what I was >> proposing. 
>> > >> > I would prefer this naming: >> > >> > end -> slow_path_end // the end for taking a >> > slow path; either due to sampling or >> refilling >> > actual_end -> allocation_end // the end for >> > allocations >> > slow_path_end -> last_slow_path_end // last >> > address for slow_path_end (as opposed to >> > allocation_end) >> > hard_end -> reserved_end // the end of the >> > reserved space of the TLAB >> > >> > About setting things in the getter... that >> > still seems like a very unpleasant thing to >> > me. It would be better to inspect the call >> > hierarchy and explicitly update the ends >> > where they need updating, and assert in the >> > getter that they are in sync, rather than >> > implicitly setting various ends as a >> > surprising side effect in a getter. It looks >> > like the call hierarchy is very small. With >> > my new naming convention, reserved_end() >> > would presumably return _allocation_end + >> > alignment_reserve(), and have an assert >> > checking that _allocation_end == >> > _last_slow_path_allocation_end, complaining >> > that this invariant must hold, and that a >> > caller to this function, such as >> > make_parsable(), must first explicitly >> > synchronize the ends as required, to honor >> > that invariant. >> > >> > >> > I've renamed the variables to how you preferred >> > it except for the _end one. I did: >> > >> > current_end >> > >> > last_allocation_address >> > >> > tlab_end_ptr >> > >> > The reason is that the architecture dependent >> > code use the thread.hpp API and it already has >> > tlab included into the name so it becomes >> > tlab_current_end (which is better that >> > tlab_current_tlab_end in my opinion). >> > >> > I also moved the update into a separate method >> > with a TODO that says to remove it when >> > FastTLABRefill is deprecated >> > >> > This looks a lot better now. Thanks. 
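The suggestion above (keep the getter free of side effects, update the ends explicitly where they change, and assert in the getter that they are in sync) can be sketched as follows. This is a toy model, not the HotSpot code: `ToyTlab`, its method names, the use of plain offsets instead of `HeapWord*`, and the reserve of 2 words are all assumptions made for illustration. Only the field names follow the naming proposed in the message, and the invariant checked here is the one implied by the original `hard_end()` fix-up (`_slow_path_end != _end`).

```cpp
#include <cassert>
#include <cstddef>

// Assumed stand-in for alignment_reserve(); the value is arbitrary.
constexpr std::size_t kAlignmentReserve = 2;

// Toy TLAB whose "addresses" are plain offsets.
class ToyTlab {
 public:
  explicit ToyTlab(std::size_t end)
      : _slow_path_end(end), _last_slow_path_end(end), _allocation_end(end) {}

  // The slow path arms a sampling point somewhere before the allocation end
  // and remembers what it set.
  void set_sample_point(std::size_t sample_end) {
    assert(sample_end <= _allocation_end);
    _slow_path_end = sample_end;
    _last_slow_path_end = sample_end;
  }

  // Simulates a fast-path refill moving the end without the slow path knowing.
  void fast_refill(std::size_t new_end) { _slow_path_end = new_end; }

  bool ends_in_sync() const { return _last_slow_path_end == _slow_path_end; }

  // Explicit fix-up that callers such as make_parsable() would invoke before
  // asking for the reserved end.
  void sync_ends_after_fast_refill() {
    if (!ends_in_sync()) {
      _last_slow_path_end = _slow_path_end;
      _allocation_end = _slow_path_end;
    }
  }

  // Pure getter: checks the invariant instead of silently repairing it.
  std::size_t reserved_end() const {
    assert(ends_in_sync() && "caller must synchronize the ends first");
    return _allocation_end + kAlignmentReserve;
  }

 private:
  std::size_t _slow_path_end;       // where the next slow path is taken
  std::size_t _last_slow_path_end;  // value last set by the slow path
  std::size_t _allocation_end;      // real end for allocations
};
```

The design choice is that a stale end now fails loudly at the getter in debug builds, which points at the caller that forgot to synchronize, rather than being patched up as a hidden side effect.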
>> > >> > Note that the following comment now needs updating >> > accordingly in threadLocalAllocBuffer.hpp: >> > >> > 41 // Heap sampling is performed via >> > the end/actual_end fields. >> > >> > 42 // actual_end contains the real end >> > of the tlab allocation, >> > >> > 43 // whereas end can be set to an >> > arbitrary spot in the tlab to >> > >> > 44 // trip the return and sample the >> > allocation. >> > >> > 45 // slow_path_end is used to track >> > if a fast tlab refill occured >> > >> > 46 // between slowpath calls. >> > >> > There might be other comments too, I have not looked >> > in detail. >> > >> > This was the only spot that still had an actual_end, I >> > fixed it now. I'll do a sweep to double check other >> > comments. >> > >> > >> > >> > Not sure it's better but before updating >> > the webrev, I wanted to try >> > to get input/consensus :) >> > >> > (Note hard_end was always further off >> > than end). >> > >> > src/hotspot/share/prims/jvmti.xml: >> > >> > 10357 > > id="can_sample_heap" since="9"> >> > 10358 >> > 10359 Can sample the heap. >> > 10360 If this capability >> > is enabled then the heap sampling >> > methods >> > can be called. >> > 10361 >> > 10362 >> > >> > Looks like this capability should >> > not be "since 9" if it gets >> integrated >> > now. >> > >> > Updated now to 11, crossing my fingers >> :) >> > >> > >> src/hotspot/share/runtime/heapMonitoring.cpp: >> > >> > 448 if >> > (is_alive->do_object_b(value)) { >> > 449 // Update the oop to >> > point to the new object if it is >> still >> > alive. >> > 450 >> f->do_oop(&(trace.obj)); >> > 451 >> > 452 // Copy the old >> > trace, if it is still live. >> > 453 >> > >> _allocated_traces->at_put(curr_pos++, trace); >> > 454 >> > 455 // Store the live >> > trace in a cache, to be served up on >> > /heapz. 
>> > 456 >> > >> _traces_on_last_full_gc->append(trace); >> > 457 >> > 458 count++; >> > 459 } else { >> > 460 // If the old trace >> > is no longer live, add it to the >> list of >> > 461 // recently collected >> > garbage. >> > 462 >> > store_garbage_trace(trace); >> > 463 } >> > >> > In the case where the oop was not >> > live, I would like it to be >> explicitly >> > cleared. >> > >> > Done I think how you wanted it. Let me >> > know because I'm not familiar >> > with the RootAccess API. I'm unclear if >> > I'm doing this right or not so >> > reviews of these parts are highly >> > appreciated. Robbin had talked of >> > perhaps later pushing this all into a >> > OopStorage, should I do this now >> > do you think? Or can that wait a second >> > webrev later down the road? >> > >> > I think using handles can and should be done >> > later. You can use the Access API now. >> > I noticed that you are missing an #include >> > "oops/access.inline.hpp" in your >> > heapMonitoring.cpp file. >> > >> > The missing header is there for me so I don't >> > know, I made sure it is present in the latest >> > webrev. Sorry about that. >> > >> > + Did I clear it the way you wanted me >> > to or were you thinking of >> > something else? >> > >> > >> > That is precisely how I wanted it to be >> > cleared. Thanks. >> > >> > + Final question here, seems like if I >> > were to want to not do the >> > f->do_oop directly on the trace.obj, I'd >> > need to do something like: >> > >> > f->do_oop(&value); >> > ... >> > trace->store_oop(value); >> > >> > to update the oop internally. Is that >> > right/is that one of the >> > advantages of going to the Oopstorage >> > sooner than later? >> > >> > >> > I think you really want to do the do_oop on >> > the root directly. Is there a particular >> > reason why you would not want to do that? >> > Otherwise, yes - the benefit with using the >> > handle approach is that you do not need to >> > call do_oop explicitly in your code. 
>> > >> > There is no reason except that now we have a >> > load_oop and a get_oop_addr, I was not sure what >> > you would think of that. >> > >> > That's fine. >> > >> > Also I see a lot of >> > concurrent-looking use of the >> > following field: >> > 267 volatile bool _initialized; >> > >> > Please note that the "volatile" >> > qualifier does not help with >> reordering >> > here. Reordering between volatile >> > and non-volatile fields is >> > completely free >> > for both compiler and hardware, >> > except for windows with MSVC, where >> > volatile >> > semantics is defined to use >> > acquire/release semantics, and the >> > hardware is >> > TSO. But for the general case, I >> > would expect this field to be stored >> > with >> > OrderAccess::release_store and >> > loaded with >> OrderAccess::load_acquire. >> > Otherwise it is not thread safe. >> > >> > Because everything is behind a mutex, I >> > wasn't really worried about >> > this. I have a test that has multiple >> > threads trying to hit this >> > corner case and it passes. >> > >> > However, to be paranoid, I updated it to >> > using the OrderAccess API >> > now, thanks! Let me know what you think >> > there too! >> > >> > >> > If it is indeed always supposed to be read >> > and written under a mutex, then I would >> > strongly prefer to have it accessed as a >> > normal non-volatile member, and have an >> > assertion that given lock is held or we are >> > in a safepoint, as we do in many other >> > places. Something like this: >> > >> > >> assert(HeapMonitorStorage_lock->owned_by_self() >> > || (SafepointSynchronize::is_at_safepoint() >> > && Thread::current()->is_VM_thread()), "this >> > should not be accessed concurrently"); >> > >> > It would be confusing to people reading the >> > code if there are uses of OrderAccess that >> > are actually always protected under a mutex. >> > >> > Thank you for the exact example to be put in the >> > code! 
>> > I put it around each access/assignment of
>> > the _initialized method and found one case where
>> > yes you can touch it and not have the lock. It
>> > actually is "ok" because you don't act on the
>> > storage until later and only when you really
>> > want to modify the storage (see the
>> > object_alloc_do_sample method which calls the
>> > add_trace method).
>> >
>> > But, because of this, I'm going to put the
>> > OrderAccess here, I'll do some performance
>> > numbers later and if there are issues, I might
>> > add an "unsafe" read and a "safe" one to make it
>> > explicit to the reader. But I don't think it
>> > will come to that.
>> >
>> > Okay. This double return in heapMonitoring.cpp looks
>> > wrong:
>> >
>> >   bool initialized() {
>> >     return OrderAccess::load_acquire(&_initialized) != 0;
>> >     return _initialized;
>> >   }
>> >
>> > Since you said object_alloc_do_sample() is the only
>> > place where you do not hold the mutex while reading
>> > initialized(), I had a closer look at that. It looks
>> > like in its current shape, the lack of a mutex may
>> > lead to a memory leak. In particular, it first
>> > checks if (initialized()). Let's assume this is now
>> > true. It then allocates a bunch of stuff, and checks
>> > if the number of frames were over 0. If they were,
>> > it calls StackTraceStorage::storage()->add_trace()
>> > seemingly hoping that after grabbing the lock in
>> > there, initialized() will still return true. But it
>> > could now return false and skip doing anything, in

From martin at skarsaune.net Sat Apr 7 05:55:48 2018
From: martin at skarsaune.net (Martin Skarsaune)
Date: Sat, 07 Apr 2018 05:55:48 +0000
Subject: Re: inspect a thread's stack
In-Reply-To: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
Message-ID:

Hi Pietro

Not sure JDI is what you really want, but if you would like to play with
it I have some code here that uses the PID of the JVM to open a
connection to itself and among other things print stack frames with
variables: https://github.com/skarsaune/kantega.debug and some demo here:
https://www.youtube.com/watch?v=5sXxIfjaALg

So an example of what you can do, but not suitable for anything serious.

For inspecting the stack, there is a cool reflection hack to the Java 9
API demonstrated by Andrei Pangin here that is able to capture stack
values: https://vimeo.com/233820012

For serious work I suppose a JVMTI agent is the best option. Others are
in a better position to offer guidance on that.

Martin

On Fri, Apr 6, 2018 at 18:14, Pietro Paolini <
Pietro.Paolini at alfasystems.com> wrote:

> Hi all,
>
> I apologise if this is not the right ML for it but I couldn't find
> exactly what I was looking for when Googling the problem. I am a bit new to
> the JDI world.
>
> I would like to inspect the stack-frame of a specific thread, I came
> across the StackFrame/ThreadReference classes but I couldn't find any
> examples where their usage is shown
> without connecting to the VM somehow, like a debugger would do.
>
> Is it possible to inspect a thread's stack "locally"? In my mind I could
> be able to have a function such as:
>
> static void hook(Thread thread) {
>
>   thread.wait() // stop that thread
>
>   // inspect the frames of that thread doing any needed business with them
> }
>
> I'd need this for diagnostic purposes of my application.
>
> Thanks,
> Pietro

From aph at redhat.com Sat Apr 7 08:43:42 2018
From: aph at redhat.com (Andrew Haley)
Date: Sat, 7 Apr 2018 09:43:42 +0100
Subject: Re: inspect a thread’s stack
In-Reply-To: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
Message-ID:

On 04/06/2018 05:13 PM, Pietro Paolini wrote:
> Is it possible to inspect a thread's stack "locally"?

Have you looked at ThreadMXBean.getThreadInfo(id).getStackTrace() ?

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From martinrb at google.com Sun Apr 8 18:49:26 2018
From: martinrb at google.com (Martin Buchholz)
Date: Sun, 8 Apr 2018 11:49:26 -0700
Subject: Re: inspect a thread’s stack
In-Reply-To:
References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
Message-ID:

Access to stacktraces with locals is demoed in this test
http://hg.openjdk.java.net/jdk/jdk/file/tip/test/jdk/java/lang/StackWalker/LocalsAndOperands.java
but the functionality does not seem to be available (yet!) via a public
API.

From jcbeyler at google.com Mon Apr 9 05:48:46 2018
From: jcbeyler at google.com (JC Beyler)
Date: Mon, 09 Apr 2018 05:48:46 +0000
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To:
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>
Message-ID:

Hi all,

Here is the new webrev:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.12/
with the incremental here:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11_12/

After banging my head against the interactions of the runtime and the
GC, I finally got the next features up and ready:
  - The multi-agent support (with a new test)
  - The thread support (with a new test)
  - I have an assert at the moment of allocation to check there is a
    sampled collector available (though could be disabled)
  - This webrev puts the collector at the collectedHeap.inline.hpp
    level (so removing all of the outer collectors)
  - Note there is one current caveat: if the agent requests the
    VMObjectAlloc event, the sampler defers to that event due to a
    limitation in its implementation (ie: I am not convinced I can
    safely send out that event with the VM collector enabled, I'll
    happily white board that).
  - I updated the jvmti.xml accordingly btw.

Let me know what you think!
Jc

On Fri, Apr 6, 2018 at 4:12 PM JC Beyler wrote:

> Hi Karen,
>
> Let me inline my answers, it will probably be easier :)
>
> On Fri, Apr 6, 2018 at 2:40 PM Karen Kinnear wrote:
>
>> JC,
>>
>> Thank you for the updates - really glad you are including the
>> compiler folks. I reviewed the version before this one, so ignore
>> any comments you've already covered (although I did peek at the
>> latest)
>>
>> 1. JDK-8194905 CSR - could you please delete attachments that are
>> not current. It is a bit confusing right now.
>> I have been looking at jvmti_event6.html. I am assuming the rest are
>> obsolete and could be removed please.
>
> I tried but it does not allow me to. It seems that someone with
> administrative rights has to do it :-(. That is why I had to resort
> to this...
>
>> 2. In jvmti_event6.html under Sampled Object Allocation, there is a
>> link to "Heap Sampling Monitoring System".
>> It takes me to the top of the page - seems like something is missing
>> in defining it?
>
> So the Heap Sampling Monitoring System used to have more methods. It
> made sense to have them in a separate category. I now have moved it
> to the memory category to be consistent and grouped there. I also
> removed that link btw.
>
>> 3. Scope of memory allocation tracking
>>
>> I am struggling to understand the extent of memory allocation
>> tracking that you are looking for (probably want to clarify in the
>> JEP and CSR once we work this through).
>>
>> e.g. Heap Sampler vs. JVMTI VMObjectAllocEvent
>> So the current jvmtiVMObjectAllocEvent says:
>>
>> Sent when a method causes the VM to allocate an Object visible to
>> Java and allocation not detectable by other instrumentation
>> mechanisms.
>> Generally detect by instrumenting bytecodes of allocating methods
>> JNI - use JNI function interception
>> e.g. Reflection: java.lang.Class.newInstance()
>> e.g. VM intrinsics
>>
>> comment:
>> Not generated due to bytecodes - e.g.
>> new and newarray VM instructions
>> Not allocation due to JNI: e.g. AllocObject
>> NOT VM internal objects
>> NOT allocations during VM init
>>
>> So from the JEP I can't tell the intended scope of the new event -
>> is this intended to cover all heap allocation?
>>   bytecodes
>>   JVM_*
>>   JNI_*
>>   internal VM objects
>>   other? (I'm not sure what other there are)
>>   - I presume not allocations during VM init - since sent only
>>     during live phase
>
> Yes exactly, as much as possible, I am aiming to cover all heap
> allocations. Mostly though, and in practice, I think we care about
> bytecodes and to a lesser extent JNI. Being independent of why the
> memory is being allocated is probably even better: this thread
> allocated Y, no matter where/why that was.
>
>> OR - is the primary goal to cover allocation for bytecodes so folks
>> can skip instrumentation?
>
> Yes that is the primary goal.
>
>> OR - do you want to get performance numbers and see what is low
>> enough overhead before deciding?
>
> I think it is the same, the system is relatively in place and my
> overhead numbers seem to indicate 0% when the event is off, 1% when
> it is on but the callback to the user is empty, and 3% for a naive
> implementation tracking live/GC'd objects.
>
>> 4. The design question is where to put the collectors in the source
>> base - and that of course strongly depends on the scope of the
>> information you want to collect, and on the performance overhead we
>> are willing to incur.
>
> Very true.
>
>> I was trying to figure out a way to put the collectors farther down
>> the call stack so as to both catch more cases and to reduce the
>> maintenance burden - i.e. if you were to add a new code generator,
>> e.g. Graal - if it were to go through an existing interface, that
>> might be a place to already have a collector.
>>
>> I do not know the Graal sources - I did look at
>> jvmci/jvmciRuntime.cpp - and it appears that there are calls to
>> instanceKlass::new_instance, oopFactory::new_typeArray/new_ObjArray
>> and ArrayKlass::multi_allocate, so one possibility would be to put
>> hooks in those calls which would catch many? (I did not do a
>> thorough search) of the slowpath calls for the bytecodes, and then
>> check the fast paths in detail.
>
> I'll come to a major issue with the collector and its placement in
> the next paragraph.
>
>> I had wondered if it made sense to move the hooks even farther down,
>> into CollectedHeap::obj_allocate and array_allocate.
>> I do not think so. First reason is that for multidimensional arrays,
>> ArrayKlass::multi_allocate the outer dimension array would have an
>> event before storing the inner sub-arrays and I don't think we want
>> that exposed, so that won't work for arrays.
>
> So the major difficulty is that the steps of collection do this:
>
>   - An object gets allocated and is decided to be sampled
>   - The original pointer placement (where it resides originally in
>     memory) is passed to the collector
>   - Now one important thing of note:
>     (a) In the VM code, until the point where the oop is going to be
>         returned, GC is not yet aware of it
>     (b) so the collector can't yet send it out to the user via JVMTI;
>         otherwise, the agent could put a weak reference for example
>
> I'm a bit fuzzy on this and maybe it's just that there would be more
> heavy lifting to make this possible but my initial tests seem to show
> problems when attempting this in the obj_allocate area.
>
>> The second reason is that I strongly suspect the scope you want is
>> bytecodes only. I think once you have added hooks to all the fast
>> paths and slow paths that this will be pushing the performance
>> overhead constraints you proposed and you won't want to see e.g.
>> internal allocations.
>
> Yes agreed, allocations from bytecodes are mostly our concern
> generally :)
>
>> But I think you need to experiment with the set of allocations (or
>> possible alternative sets of allocations) you want recorded.
>>
>> The hooks I see today include:
>> Interpreter: (looking at x86 as a sample)
>>   - slowpath in InterpreterRuntime
>>   - fastpath tlab allocation - your new threshold check handles that
>
> Agreed
>
>>   - allow_shared_alloc (GC specific): for _new isn't handled
>
> Where is that exactly? I can check why we are not catching it?
>
>> C1
>>   I don't see changes in c1_Runtime.cpp
>>   note: you also want to look for the fast path
>
> I added the calls to c1_Runtime in the latest webrev, but was still
> going through testing before pushing it out. I had waited on this one
> a bit. Fast path would be handled by the threshold check no?
>
>> C2: changes in opto/runtime.cpp for slow path
>>   did you also catch the fast path?
>
> Fast path gets handled by the same threshold check, no? Perhaps I've
> missed something (very likely)?
>
>> 3. Performance -
>> After you get all the collectors added - you need to rerun the
>> performance numbers.
>
> Agreed :)
>
>> thanks,
>> Karen
>>
>> On Apr 5, 2018, at 2:15 PM, JC Beyler wrote:
>>
>> Thanks Boris and Derek for testing it.
>>
>> Yes I was trying to get a new version out that had the tests ported
>> as well but got sidetracked while trying to add tests and two new
>> features.
>>
>> Here is the incremental webrev:
>>
>> Here is the full webrev:
>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/
>>
>> Basically, the new tests assert this:
>>   - Only one agent can currently ask for the sampling, I'm currently
>>     seeing if I can push to a next webrev the multi-agent support to
>>     start doing a code freeze on this one
>>   - The event is not thread-enabled, meaning like the
>>     VMObjectAllocationEvent, it's an all or nothing event; same as
>>     the multi-agent, I'm going to see if a future webrev to add the
>>     support is a better idea to freeze this webrev a bit
>>
>> There was another item that I added here and I'm unsure this webrev
>> is stable in debug mode: I added an assertion system to ascertain
>> that all paths leading to a TLAB slow path (and hence a sampling
>> point) have a sampling collector ready to post the event if a user
>> wants it. This might break a few things in debug mode as I'm working
>> through the kinks of that as well. However, in release mode, this
>> new webrev passes all the tests in
>> hotspot/jtreg/serviceability/jvmti/HeapMonitor.
>>
>> Let me know what you think,
>> Jc
>>
>> On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich
>> <boris.ulasevich at bell-sw.com> wrote:
>>
>>> Hi JC,
>>>
>>> I have just checked on arm32: your patch compiles and runs ok.
>>>
>>> As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not
>>> correspond to actual library name: libHeapMonitorTest.c ->
>>> libHeapMonitorTest.so
>>>
>>> Boris
>>>
>>> On 04.04.2018 01:54, White, Derek wrote:
>>> > Thanks JC,
>>> >
>>> > New patch applies cleanly. Compiles and runs (simple test
>>> > programs) on aarch64.
>>> > >>> > * Derek >>> > >>> > *From:* JC Beyler [mailto:jcbeyler at google.com] >>> > *Sent:* Monday, April 02, 2018 1:17 PM >>> > *To:* White, Derek >>> > *Cc:* Erik ?sterlund ; >>> > serviceability-dev at openjdk.java.net; hotspot-compiler-dev >>> > >>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >>> > >>> > Hi Derek, >>> > >>> > I know there were a few things that went in that provoked a merge >>> > conflict. I worked on it and got it up to date. Sadly my lack of >>> > knowledge makes it a full rebase instead of keeping all the history. >>> > However, with a newly cloned jdk/hs you should now be able to use: >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/ >>> > >>> > The change you are referring to was done with the others so perhaps you >>> > were unlucky and I forgot it in a webrev and fixed it in another? I >>> > don't know but it's been there and I checked, it is here: >>> > >>> > >>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html >>> > >>> > I double checked that tlab_end_offset no longer appears in any >>> > architecture (as far as I can tell :)). >>> > >>> > Thanks for testing and let me know if you run into any other issues! >>> > >>> > Jc >>> > >>> > On Fri, Mar 30, 2018 at 4:24 PM White, Derek >> > > wrote: >>> > >>> > Hi Jc, >>> > >>> > I?ve been having trouble getting your patch to apply correctly. I >>> > may have based it on the wrong version. >>> > >>> > In any case, I think there?s a missing update to >>> > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), >>> > where ?JavaThread::tlab_end_offset()? should become >>> > ?JavaThread::tlab_current_end_offset()?. >>> > >>> > This should correspond to the other port?s changes in >>> > templateTable_.cpp files. >>> > >>> > Thanks! 
>>> > - Derek >>> > >>> > *From:* hotspot-compiler-dev >>> > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net >>> > ] *On Behalf >>> > Of *JC Beyler >>> > *Sent:* Wednesday, March 28, 2018 11:43 AM >>> > *To:* Erik ?sterlund >> > > >>> > *Cc:* serviceability-dev at openjdk.java.net >>> > ; hotspot-compiler-dev >>> > >> > > >>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >>> > >>> > Hi all, >>> > >>> > I've been working on deflaking the tests mostly and the wording in >>> > the JVMTI spec. >>> > >>> > Here is the two incremental webrevs: >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ >>> > >>> > Here is the total webrev: >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ >>> > >>> > Here are the notes of this change: >>> > >>> > - Currently the tests pass 100 times in a row, I am working on >>> > checking if they pass 1000 times in a row. >>> > >>> > - The default sampling rate is set to 512k, this is what we use >>> > internally and having a default means that to enable the sampling >>> > with the default, the user only has to do a enable event/disable >>> > event via JVMTI (instead of enable + set sample rate). >>> > >>> > - I deprecated the code that was handling the fast path tlab >>> > refill if it happened since this is now deprecated >>> > >>> > - Though I saw that Graal is still using it so I have to see >>> > what needs to be done there exactly >>> > >>> > Finally, using the Dacapo benchmark suite, I noted a 1% overhead >>> for >>> > when the event system is turned on and the callback to the native >>> > agent is just empty. I got a 3% overhead with a 512k sampling rate >>> > with the code I put in the native side of my tests. 
>>> > >>> > Thanks and comments are appreciated, >>> > >>> > Jc >>> > >>> > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler >> > > wrote: >>> > >>> > Hi all, >>> > >>> > The incremental webrev update is here: >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ >>> > >>> > The full webrev is here: >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ >>> > >>> > Major change here is: >>> > >>> > - I've removed the heapMonitoring.cpp code in favor of just >>> > having the sampling events as per Serguei's request; I still >>> > have to do some overhead measurements but the tests prove the >>> > concept can work >>> > >>> > - Most of the tlab code is unchanged, the only major >>> > part is that now things get sent off to event collectors when >>> > used and enabled. >>> > >>> > - Added the interpreter collectors to handle interpreter >>> > execution >>> > >>> > - Updated the name from SetTlabHeapSampling to >>> > SetHeapSampling to be more generic >>> > >>> > - Added a mutex for the thread sampling so that we can >>> > initialize an internal static array safely >>> > >>> > - Ported the tests from the old system to this new one >>> > >>> > I've also updated the JEP and CSR to reflect these changes: >>> > >>> > https://bugs.openjdk.java.net/browse/JDK-8194905 >>> > >>> > https://bugs.openjdk.java.net/browse/JDK-8171119 >>> > >>> > In order to make this have some forward progress, I've removed >>> > the heap sampling code entirely and now rely entirely on the >>> > event sampling system. The tests reflect this by using a >>> > simplified implementation of what an agent could do: >>> > >>> > >>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c >>> > >>> > (Search for anything mentioning event_storage). >>> > >>> > I have not taken the time to port the whole code we had >>> > originally in heapMonitoring to this. 
I hesitate only because >>> > that code was in C++, I'd have to port it to C and this is for >>> > tests so perhaps what I have now is good enough? >>> > >>> > As far as testing goes, I've ported all the relevant tests and >>> > then added a few: >>> > >>> > - Turning the system on/off >>> > >>> > - Testing using various GCs >>> > >>> > - Testing using the interpreter >>> > >>> > - Testing the sampling rate >>> > >>> > - Testing with objects and arrays >>> > >>> > - Testing with various threads >>> > >>> > Finally, as overhead goes, I have the numbers of the system off >>> > vs a clean build and I have 0% overhead, which is what we'd >>> > want. This was using the Dacapo benchmarks. I am now preparing >>> > to run a version with the events on using dacapo and will >>> report >>> > back here. >>> > >>> > Any comments are welcome :) >>> > >>> > Jc >>> > >>> > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler >> > > wrote: >>> > >>> > Hi all, >>> > >>> > I apologize for the delay but I wanted to add an event >>> > system and that took a bit longer than expected and I also >>> > reworked the code to take into account the deprecation of >>> > FastTLABRefill. >>> > >>> > This update has four parts: >>> > >>> > A) I moved the implementation from Thread to >>> > ThreadHeapSampler inside of Thread. Would you prefer it as >>> a >>> > pointer inside of Thread or like this works for you? Second >>> > question would be would you rather have an association >>> > outside of Thread altogether that tries to remember when >>> > threads are live and then we would have something like: >>> > >>> > ThreadHeapSampler::get_sampling_size(this_thread); >>> > >>> > I worry about the overhead of this but perhaps it is not >>> too >>> > too bad? >>> > >>> > B) I also have been working on the Allocation event system >>> > that sends out a notification at each sampled event. This >>> > will be practical when wanting to do something at the >>> > allocation point. 
I'm also looking at if the whole >>> > heapMonitoring code could not reside in the agent code and >>> > not in the JDK. I'm not convinced but I'm talking to >>> Serguei >>> > about it to see/assess :) >>> > >>> > - Also added two tests for the new event subsystem >>> > >>> > C) Removed the slow_path fields inside the TLAB code since >>> > now FastTLABRefill is deprecated >>> > >>> > D) Updated the JVMTI documentation and specification for >>> the >>> > methods. >>> > >>> > So the incremental webrev is here: >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ >>> > >>> > and the full webrev is here: >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 >>> > >>> > I believe I have updated the various JIRA issues that track >>> > this :) >>> > >>> > Thanks for your input, >>> > >>> > Jc >>> > >>> > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler >>> > > wrote: >>> > >>> > Hi Erik, >>> > >>> > I inlined my answers, which the last one seems to >>> answer >>> > Robbin's concerns about the same thing (adding things >>> to >>> > Thread). >>> > >>> > On Wed, Feb 14, 2018 at 2:51 AM, Erik ?sterlund >>> > >> > > wrote: >>> > >>> > Hi JC, >>> > >>> > Comments are inlined below. >>> > >>> > On 2018-02-13 06:18, JC Beyler wrote: >>> > >>> > Hi Erik, >>> > >>> > Thanks for your answers, I've now inlined my >>> own >>> > answers/comments. >>> > >>> > I've done a new webrev here: >>> > >>> > >>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ >>> > < >>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.08/> >>> > >>> > The incremental is here: >>> > >>> > >>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ >>> > < >>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.07_08/> >>> > >>> > Note to all: >>> > >>> > - I've been integrating changes from >>> > Erin/Serguei/David comments so this webrev >>> > incremental is a bit an answer to all comments >>> > in one. 
I apologize for that :) >>> > >>> > On Mon, Feb 12, 2018 at 6:05 AM, Erik ?sterlund >>> > >> > > wrote: >>> > >>> > Hi JC, >>> > >>> > Sorry for the delayed reply. >>> > >>> > Inlined answers: >>> > >>> > >>> > >>> > On 2018-02-06 00:04, JC Beyler wrote: >>> > >>> > Hi Erik, >>> > >>> > (Renaming this to be folded into the >>> > newly renamed thread :)) >>> > >>> > First off, thanks a lot for reviewing >>> > the webrev! I appreciate it! >>> > >>> > I updated the webrev to: >>> > >>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ >>> > < >>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.05a/> >>> > >>> > And the incremental one is here: >>> > >>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ >>> > < >>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.04_05a/> >>> > >>> > It contains: >>> > - The change for since from 9 to 11 for >>> > the jvmti.xml >>> > - The use of the OrderAccess for >>> initialized >>> > - Clearing the oop >>> > >>> > I also have inlined my answers to your >>> > comments. The biggest question >>> > will come from the multiple *_end >>> > variables. A bit of the logic there >>> > is due to handling the slow path refill >>> > vs fast path refill and >>> > checking that the rug was not pulled >>> > underneath the slowpath. I >>> > believe that a previous comment was >>> that >>> > TlabFastRefill was going to >>> > be deprecated. >>> > >>> > If this is true, we could revert this >>> > code a bit and just do a : if >>> > TlabFastRefill is enabled, disable >>> this. >>> > And then deprecate that when >>> > TlabFastRefill is deprecated. >>> > >>> > This might simplify this webrev and I >>> > can work on a follow-up that >>> > either: removes TlabFastRefill if >>> Robbin >>> > does not have the time to do >>> > it or add the support to the assembly >>> > side to handle this correctly. >>> > What do you think? 
>>> > >>> > I support removing TlabFastRefill, but I >>> > think it is good to not depend on that >>> > happening first. >>> > >>> > >>> > I'm slowly pushing on the FastTLABRefill >>> > ( >>> https://bugs.openjdk.java.net/browse/JDK-8194084), >>> > I agree on keeping both separate for now though >>> > so that we can think of both differently >>> > >>> > Now, below, inlined are my answers: >>> > >>> > On Fri, Feb 2, 2018 at 8:44 AM, Erik >>> > ?sterlund >>> > >> > > >>> wrote: >>> > >>> > Hi JC, >>> > >>> > Hope I am reviewing the right >>> > version of your work. Here goes... >>> > >>> > >>> src/hotspot/share/gc/shared/collectedHeap.inline.hpp: >>> > >>> > 159 >>> > >>> AllocTracer::send_allocation_outside_tlab(klass, result, size * >>> > HeapWordSize, THREAD); >>> > 160 >>> > 161 >>> > >>> THREAD->tlab().handle_sample(THREAD, result, size); >>> > 162 return result; >>> > 163 } >>> > >>> > Should not call tlab()->X without >>> > checking if (UseTLAB) IMO. >>> > >>> > Done! >>> > >>> > >>> > More about this later. >>> > >>> > >>> src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: >>> > >>> > So first of all, there seems to >>> > quite a few ends. There is an >>> "end", >>> > a "hard >>> > end", a "slow path end", and an >>> > "actual end". Moreover, it seems >>> > like the >>> > "hard end" is actually further away >>> > than the "actual end". So the >>> "hard end" >>> > seems like more of a "really >>> > definitely actual end" or >>> something. >>> > I don't >>> > know about you, but I think it >>> looks >>> > kind of messy. In particular, I >>> don't >>> > feel like the name "actual end" >>> > reflects what it represents, >>> > especially when >>> > there is another end that is behind >>> > the "actual end". >>> > >>> > 413 HeapWord* >>> > ThreadLocalAllocBuffer::hard_end() >>> { >>> > 414 // Did a fast TLAB refill >>> > occur? >>> > 415 if (_slow_path_end != >>> _end) { >>> > 416 // Fix up the actual end >>> > to be now the end of this TLAB. 
>>> > 417 _slow_path_end = _end; >>> > 418 _actual_end = _end; >>> > 419 } >>> > 420 >>> > 421 return _actual_end + >>> > alignment_reserve(); >>> > 422 } >>> > >>> > I really do not like making getters >>> > unexpectedly have these kind of >>> side >>> > effects. It is not expected that >>> > when you ask for the "hard end", >>> you >>> > implicitly update the "slow path >>> > end" and "actual end" to new >>> values. >>> > >>> > As I said, a lot of this is due to the >>> > FastTlabRefill. If I make this >>> > not supporting FastTlabRefill, this >>> goes >>> > away. The reason the system >>> > needs to update itself at the get is >>> > that you only know at that get if >>> > things have shifted underneath the tlab >>> > slow path. I am not sure of >>> > really better names (naming is hard!), >>> > perhaps we could do these >>> > names: >>> > >>> > - current_tlab_end // Either the >>> > allocated tlab end or a sampling point >>> > - last_allocation_address // The end >>> of >>> > the tlab allocation >>> > - last_slowpath_allocated_end // In >>> > case a fast refill occurred the >>> > end might have changed, this is to >>> > remember slow vs fast past refills >>> > >>> > the hard_end method can be renamed to >>> > something like: >>> > tlab_end_pointer() // The end of >>> > the lab including a bit of >>> > alignment reserved bytes >>> > >>> > Those names sound better to me. Could you >>> > please provide a mapping from the old names >>> > to the new names so I understand which one >>> > is which please? >>> > >>> > This is my current guess of what you are >>> > proposing: >>> > >>> > end -> current_tlab_end >>> > actual_end -> last_allocation_address >>> > slow_path_end -> >>> last_slowpath_allocated_end >>> > hard_end -> tlab_end_pointer >>> > >>> > Yes that is correct, that was what I was >>> proposing. 
>>> > >>> > I would prefer this naming: >>> > >>> > end -> slow_path_end // the end for taking >>> a >>> > slow path; either due to sampling or >>> refilling >>> > actual_end -> allocation_end // the end for >>> > allocations >>> > slow_path_end -> last_slow_path_end // last >>> > address for slow_path_end (as opposed to >>> > allocation_end) >>> > hard_end -> reserved_end // the end of the >>> > reserved space of the TLAB >>> > >>> > About setting things in the getter... that >>> > still seems like a very unpleasant thing to >>> > me. It would be better to inspect the call >>> > hierarchy and explicitly update the ends >>> > where they need updating, and assert in the >>> > getter that they are in sync, rather than >>> > implicitly setting various ends as a >>> > surprising side effect in a getter. It >>> looks >>> > like the call hierarchy is very small. With >>> > my new naming convention, reserved_end() >>> > would presumably return _allocation_end + >>> > alignment_reserve(), and have an assert >>> > checking that _allocation_end == >>> > _last_slow_path_allocation_end, complaining >>> > that this invariant must hold, and that a >>> > caller to this function, such as >>> > make_parsable(), must first explicitly >>> > synchronize the ends as required, to honor >>> > that invariant. >>> > >>> > >>> > I've renamed the variables to how you preferred >>> > it except for the _end one. I did: >>> > >>> > current_end >>> > >>> > last_allocation_address >>> > >>> > tlab_end_ptr >>> > >>> > The reason is that the architecture dependent >>> > code use the thread.hpp API and it already has >>> > tlab included into the name so it becomes >>> > tlab_current_end (which is better that >>> > tlab_current_tlab_end in my opinion). >>> > >>> > I also moved the update into a separate method >>> > with a TODO that says to remove it when >>> > FastTLABRefill is deprecated >>> > >>> > This looks a lot better now. Thanks. 
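The getter discussion above can be illustrated with a small toy model. This is a sketch under stated assumptions, not HotSpot code: it uses plain offsets instead of HeapWord*, borrows the field names from the renaming proposed in the thread (slow_path_end, allocation_end, last_slow_path_end, reserved_end), and the helpers set_sample_point, simulate_fast_refill, sync_ends and ends_in_sync are invented here for illustration. The point it shows is a getter that only asserts the invariant, with the fix-up moved into an explicit call.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the TLAB "ends"; hypothetical, simplified, offsets only.
class ToyTlab {
 public:
  static constexpr std::size_t kAlignmentReserve = 2;

  explicit ToyTlab(std::size_t size)
      : slow_path_end_(size), allocation_end_(size), last_slow_path_end_(size) {}

  // The slow path picks a sampling point: the fast path will stop there,
  // while the real end of the allocation space stays where it was.
  void set_sample_point(std::size_t offset) {
    slow_path_end_ = offset;
    last_slow_path_end_ = offset;
  }

  // Models a refill done behind the slow path's back (the fast TLAB refill
  // case discussed in the thread): only slow_path_end_ moves.
  void simulate_fast_refill(std::size_t new_end) { slow_path_end_ = new_end; }

  // Explicit fix-up, to be called by whoever needs consistent ends.
  void sync_ends() {
    if (last_slow_path_end_ != slow_path_end_) {
      last_slow_path_end_ = slow_path_end_;
      allocation_end_ = slow_path_end_;
    }
  }

  bool ends_in_sync() const { return last_slow_path_end_ == slow_path_end_; }

  // Pure getter: checks the invariant instead of silently repairing it.
  std::size_t reserved_end() const {
    assert(ends_in_sync() && "call sync_ends() before reserved_end()");
    return allocation_end_ + kAlignmentReserve;
  }

 private:
  std::size_t slow_path_end_;       // where the fast path gives up
  std::size_t allocation_end_;      // real end of the allocation space
  std::size_t last_slow_path_end_;  // slow_path_end_ as last set by the slow path
};
```

With this shape, a caller that forgets to synchronize trips the assertion in a debug build rather than getting a quietly adjusted value, which is the behaviour the review asks for.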
>>> > Note that the following comment now needs updating accordingly in
>>> > threadLocalAllocBuffer.hpp:
>>> >
>>> >   41 // Heap sampling is performed via the end/actual_end fields.
>>> >   42 // actual_end contains the real end of the tlab allocation,
>>> >   43 // whereas end can be set to an arbitrary spot in the tlab to
>>> >   44 // trip the return and sample the allocation.
>>> >   45 // slow_path_end is used to track if a fast tlab refill occured
>>> >   46 // between slowpath calls.
>>> >
>>> > There might be other comments too, I have not looked in detail.
>>> >
>>> > This was the only spot that still had an actual_end, I fixed it now.
>>> > I'll do a sweep to double check other comments.
>>> >
>>> > Not sure it's better but before updating the webrev, I wanted to try
>>> > to get input/consensus :)
>>> >
>>> > (Note hard_end was always further off than end).
>>> >
>>> > src/hotspot/share/prims/jvmti.xml:
>>> >
>>> >   10357 id="can_sample_heap" since="9">
>>> >   10358
>>> >   10359 Can sample the heap.
>>> >   10360 If this capability is enabled then the heap sampling methods
>>> >         can be called.
>>> >   10361
>>> >   10362
>>> >
>>> > Looks like this capability should not be "since 9" if it gets
>>> > integrated now.
>>> >
>>> > Updated now to 11, crossing my fingers :)
>>> >
>>> > src/hotspot/share/runtime/heapMonitoring.cpp:
>>> >
>>> >   448       if (is_alive->do_object_b(value)) {
>>> >   449         // Update the oop to point to the new object if it is still alive.
>>> >   450         f->do_oop(&(trace.obj));
>>> >   451
>>> >   452         // Copy the old trace, if it is still live.
>>> >   453         _allocated_traces->at_put(curr_pos++, trace);
>>> >   454
>>> >   455         // Store the live trace in a cache, to be served up on /heapz.
>>> >   456         _traces_on_last_full_gc->append(trace);
>>> >   457
>>> >   458         count++;
>>> >   459       } else {
>>> >   460         // If the old trace is no longer live, add it to the list of
>>> >   461         // recently collected garbage.
>>> >   462         store_garbage_trace(trace);
>>> >   463       }
>>> >
>>> > In the case where the oop was not live, I would like it to be
>>> > explicitly cleared.
>>> >
>>> > Done I think how you wanted it. Let me know because I'm not familiar
>>> > with the RootAccess API. I'm unclear if I'm doing this right or not so
>>> > reviews of these parts are highly appreciated. Robbin had talked of
>>> > perhaps later pushing this all into a OopStorage, should I do this now
>>> > do you think? Or can that wait a second webrev later down the road?
>>> >
>>> > I think using handles can and should be done later. You can use the
>>> > Access API now. I noticed that you are missing an
>>> > #include "oops/access.inline.hpp" in your heapMonitoring.cpp file.
>>> >
>>> > The missing header is there for me so I don't know, I made sure it is
>>> > present in the latest webrev. Sorry about that.
>>> >
>>> > + Did I clear it the way you wanted me to or were you thinking of
>>> > something else?
>>> >
>>> > That is precisely how I wanted it to be cleared. Thanks.
>>> >
>>> > + Final question here, seems like if I were to want to not do the
>>> > f->do_oop directly on the trace.obj, I'd need to do something like:
>>> >
>>> >   f->do_oop(&value);
>>> >   ...
>>> >   trace->store_oop(value);
>>> >
>>> > to update the oop internally. Is that right/is that one of the
>>> > advantages of going to the Oopstorage sooner than later?
>>> >
>>> > I think you really want to do the do_oop on the root directly. Is
>>> > there a particular reason why you would not want to do that?
>>> > Otherwise, yes - the benefit with using the handle approach is that
>>> > you do not need to call do_oop explicitly in your code.
>>> >
>>> > There is no reason except that now we have a load_oop and a
>>> > get_oop_addr, I was not sure what you would think of that.
>>> >
>>> > That's fine.
>>> >
>>> > Also I see a lot of concurrent-looking use of the following field:
>>> >
>>> >   267 volatile bool _initialized;
>>> >
>>> > Please note that the "volatile" qualifier does not help with
>>> > reordering here. Reordering between volatile and non-volatile fields
>>> > is completely free for both compiler and hardware, except for windows
>>> > with MSVC, where volatile semantics is defined to use acquire/release
>>> > semantics, and the hardware is TSO. But for the general case, I would
>>> > expect this field to be stored with OrderAccess::release_store and
>>> > loaded with OrderAccess::load_acquire. Otherwise it is not thread
>>> > safe.
>>> >
>>> > Because everything is behind a mutex, I wasn't really worried about
>>> > this. I have a test that has multiple threads trying to hit this
>>> > corner case and it passes.
>>> >
>>> > However, to be paranoid, I updated it to using the OrderAccess API
>>> > now, thanks! Let me know what you think there too!
>>> >
>>> > If it is indeed always supposed to be read and written under a mutex,
>>> > then I would strongly prefer to have it accessed as a normal
>>> > non-volatile member, and have an assertion that given lock is held or
>>> > we are in a safepoint, as we do in many other places.
>>> > Something like this:
>>> >
>>> >   assert(HeapMonitorStorage_lock->owned_by_self() ||
>>> >          (SafepointSynchronize::is_at_safepoint() &&
>>> >           Thread::current()->is_VM_thread()),
>>> >          "this should not be accessed concurrently");
>>> >
>>> > It would be confusing to people reading the code if there are uses of
>>> > OrderAccess that are actually always protected under a mutex.
>>> >
>>> > Thank you for the exact example to be put in the code! I put it around
>>> > each access/assignment of the _initialized method and found one case
>>> > where yes you can touch it and not have the lock. It actually is "ok"
>>> > because you don't act on the storage until later and only when you
>>> > really want to modify the storage (see the object_alloc_do_sample
>>> > method which calls the add_trace method).
>>> >
>>> > But, because of this, I'm going to put the OrderAccess here, I'll do
>>> > some performance numbers later and if there are issues, I might add a
>>> > "unsafe" read and a "safe" one to make it explicit to the reader. But
>>> > I don't think it will come to that.
>>> >
>>> > Okay. This double return in heapMonitoring.cpp looks wrong:
>>> >
>>> >   283   bool initialized() {
>>> >   284     return OrderAccess::load_acquire(&_initialized) != 0;
>>> >   285     return _initialized;
>>> >   286   }
>>> >
>>> > Since you said object_alloc_do_sample() is the only place where you do
>>> > not hold the mutex while reading initialized(), I had a closer look at
>>> > that. It looks like in its current shape, the lack of a mutex may lead
>>> > to a memory leak. In particular, it first checks if (initialized()).
>>> > Let's assume this is now true. It then allocates a bunch of stuff, and
>>> > checks if the number of frames were over 0.
>>> > If they were, it calls StackTraceStorage::storage()->add_trace()
>>> > seemingly hoping that after grabbing the lock in there, initialized()
>>> > will still return true. But it could now return false and skip doing
>>> > anything, in

From christoph.langer at sap.com  Mon Apr  9 07:08:46 2018
From: christoph.langer at sap.com (Langer, Christoph)
Date: Mon, 9 Apr 2018 07:08:46 +0000
Subject: RFR (M): 8201247: Various cleanups in the attach framework
In-Reply-To: <91d75e2d-47a4-e9ee-5d19-8f3e6dc13428@oracle.com>
References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com>
 <91d75e2d-47a4-e9ee-5d19-8f3e6dc13428@oracle.com>
Message-ID:

Hi Chris,

thanks for looking into this.

As for ArgumentIterator::next, I must admit, I found this patch in our code
base when taking over the code. I believe that an issue would be seen if an
attach operation has 2 or 3 arguments and the first one is NULL/empty. I
guess such a situation can't happen with the attach operations currently
existing in OpenJDK as none of these ops would allow such type of arguments.
However, in our implementation, we have for instance enhanced the
"dump_heap" operation to work with null as first argument where one usually
would specify the desired output file name. We implemented a mechanism to
compute a default filename when the param is left blank. So we need the fix
for that case, I guess.

I'll run the patch through the submission forest now and do some jtreg
testing.

Best regards
Christoph

> -----Original Message-----
> From: Chris Plummer [mailto:chris.plummer at oracle.com]
> Sent: Friday, 6 April 2018 18:37
> To: Langer, Christoph ; serviceability-dev at openjdk.java.net
> Cc: hotspot-dev at openjdk.java.net
> Subject: Re: RFR (M): 8201247: Various cleanups in the attach framework
>
> Hi Christoph,
>
> Can you explain a bit more about "fix handling of null values in
> ArgumentIterator::next". When does this turn up? Is there a test case?
>
> Everything else looks good.
>
> thanks,
>
> Chris
>
> On 4/6/18 8:01 AM, Langer, Christoph wrote:
> >
> > Hi,
> >
> > can I please get reviews for a set of clean up changes that I came
> > across when doing some integration work.
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8201247
> >
> > Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/
> >
> > Detailed comments about the changes can be found in the bug.
> >
> > Thanks & best regards
> >
> > Christoph
> >

From Pietro.Paolini at alfasystems.com  Mon Apr  9 08:33:03 2018
From: Pietro.Paolini at alfasystems.com (Pietro Paolini)
Date: Mon, 9 Apr 2018 08:33:03 +0000
Subject: RE: inspect a thread’s stack
In-Reply-To:
References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
Message-ID: <5D285FC05679A441ACF34A90905BFA92241A7A35@GBEDBP01.chp.co.uk>

Hi Martin,

>Hi Pietro
>Not sure JDI is what you really want, but if you would like to play with it
>I have some code here that uses the PID of the JVM to open a connection to
>itself and among other things print stack frames with variables:
>https://github.com/skarsaune/kantega.debug and some demo here:
>https://www.youtube.com/watch?v=5sXxIfjaALg
>So an example of what you can do, but not suitable for anything serious.

I don't want to set up a connection to myself and I was wondering if that
could be avoided altogether. It is more complex than I would like it to be;
for instance I would need to factor in the connection, what happens if it
goes wrong, etc.

>For inspecting the stack, there is a cool reflection hack to the Java 9 API
>demonstrated by Andrei Pangin here that is able to capture stack values:
>https://vimeo.com/233820012

Do you think that is suitable for serious work? I mean, production code.

>For serious work I suppose a JVMTI agent is the best option. Others are in
>a better position to offer guidance on that.
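
For the narrower goal of reading another thread's frames (class, method and line number, but not local variables) from inside the same JVM, no debugger connection appears to be needed. The following is a minimal sketch, not anyone's actual implementation from this thread: the class `LocalStackInspect` and the helpers `framesOf`, `framesViaMXBean`, `parkHere`, `mentions` and `demo` are hypothetical names introduced for illustration. It relies only on `Thread.getStackTrace()` and `ThreadMXBean.getThreadInfo(long, int)`. Note that the one-argument `getThreadInfo(id)` does not collect a stack trace, so a depth has to be passed explicitly.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LocalStackInspect {

    // Snapshot of the target thread's frames, taken from the calling thread.
    static StackTraceElement[] framesOf(Thread target) {
        return target.getStackTrace();
    }

    // Same information through the management API. The two-argument
    // overload is used because getThreadInfo(id) returns an empty trace.
    static StackTraceElement[] framesViaMXBean(Thread target) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo info = mx.getThreadInfo(target.getId(), Integer.MAX_VALUE);
        return info == null ? new StackTraceElement[0] : info.getStackTrace();
    }

    // Hypothetical workload: blocks until released so it stays on the stack.
    static void parkHere(CountDownLatch started, CountDownLatch release) {
        started.countDown();
        try {
            release.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // True if any frame in the snapshot belongs to the given method name.
    static boolean mentions(StackTraceElement[] frames, String method) {
        for (StackTraceElement f : frames) {
            if (f.getMethodName().equals(method)) {
                return true;
            }
        }
        return false;
    }

    // Starts a worker, inspects it both ways, then lets it finish.
    static boolean demo() {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> parkHere(started, release), "worker");
        worker.start();
        try {
            started.await();
            boolean direct = mentions(framesOf(worker), "parkHere");
            boolean viaBean = mentions(framesViaMXBean(worker), "parkHere");
            return direct && viaBean;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker frames visible: " + demo());
    }
}
```

Neither call exposes local variables or operand-stack values; for that, the options discussed in this thread (JDI, a JVM TI agent, or the non-public StackWalker functionality) still apply.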

Reading the docs it seems that the agent has to be written in C/C++ and
unfortunately that is not an option on my current project. I quote from
there (https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#whatIs):

"Tools can be written directly to JVM TI or indirectly through higher level
interfaces. The Java Platform Debugger Architecture includes JVM TI, but
also contains higher-level, out-of-process debugger interfaces. The
higher-level interfaces are more appropriate than JVM TI for many tools. For
more information on the Java Platform Debugger Architecture, see the Java
Platform Debugger Architecture website."

It is easy to get lost among acronyms, me being a newbie in the JVM-related
tooling, but when I open
https://docs.oracle.com/javase/7/docs/technotes/guides/jpda/architecture.html
(the Java Platform Debugger Architecture website) it lists three "things":

1) JVM TI: if it is native, it is not an option
2) JDWP: not sure, I need to look into that
3) JDI: which is why I ended up here

Wrapping up, my hope is that the Java 9 reflection hack can work well or
that JDI allows me to inspect frames without the need of having a
connection. Reading your answer, that does not seem to be possible and I
should exclude the possibility altogether. Is that right?

Thanks a lot for the answers.
P.

On Fri, 6 Apr 2018 at 18:14, Pietro Paolini wrote:

Hi all,

I apologise if this is not the right ML for it but I couldn't find exactly
what I was looking for when Googling the problem. I am a bit new to the JDI
world.

I would like to inspect the stack-frame of a specific thread. I came across
the StackFrame/ThreadReference classes but I couldn't find examples where
their usage is shown without connecting to the VM somehow, like a debugger
would do.

Is it possible to inspect a thread's stack "locally"? In my mind I would
have a function such as:

static void hook(Thread thread) {
    thread.wait() // stop that thread
    // inspect the frames of that thread doing any needed business with them
}

I'd need this for diagnostic purposes of my application.

Thanks,
Pietro

Pietro Paolini
Consultant
Alfa
________________________________________
e: pietro.paolini at alfasystems.com | w: alfasystems.com
t: +44 (0) 20 7920-2643 | Moor Place, 1 Fore Street Avenue, London, EC2Y 9DT, GB
________________________________________
The contents of this communication are not intended to be binding or
constitute any form of offer or acceptance or give rise to any legal
obligations on behalf of the sender or Alfa. The views or opinions expressed
represent those of the author and not necessarily those of Alfa. This email
and any attachments are strictly confidential and are intended solely for
use by the individual or entity to whom it is addressed. If you are not the
addressee (or responsible for delivery of the message to the addressee) you
may not copy, forward, disclose or use any part of the message or its
attachments. At present the integrity of email across the internet cannot be
guaranteed and messages sent via this medium are potentially at risk. All
liability is excluded to the extent permitted by law for any claims arising
as a result of the use of this medium to transmit information by or to Alfa
or its affiliates. Alfa Financial Software Ltd Reg.
in England No: 0248 2325

From Pietro.Paolini at alfasystems.com  Mon Apr  9 08:40:39 2018
From: Pietro.Paolini at alfasystems.com (Pietro Paolini)
Date: Mon, 9 Apr 2018 08:40:39 +0000
Subject: RE: inspect a thread’s stack
In-Reply-To:
References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
Message-ID: <5D285FC05679A441ACF34A90905BFA92241A7A43@GBEDBP01.chp.co.uk>

>Access to stacktraces with locals is demoed in this test
>http://hg.openjdk.java.net/jdk/jdk/file/tip/test/jdk/java/lang/StackWalker/LocalsAndOperands.java

Maybe I haven't read it well enough but isn't that accessible through
https://docs.oracle.com/javase/9/docs/api/java/lang/StackWalker.html ? As
long as you are on Java 9 that should not be a problem.

>but the functionality does not seem to be available (yet!) via a public API.

What do you mean? Isn't that a public API?

Thanks,
P.

From thomas.stuefe at gmail.com  Mon Apr  9 09:07:17 2018
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 9 Apr 2018 11:07:17 +0200
Subject: RFR(xxs): 8200384: jcmd help output should be sorted
In-Reply-To: <1f24a1d0-9cac-4a77-6023-918cf864a298@oracle.com>
References: <1f24a1d0-9cac-4a77-6023-918cf864a298@oracle.com>
Message-ID:

Hi Sergey, Christoph,

thanks for the review!

Sure, here you go:

Old output, unsorted:

thomas at mainframe /shared/projects/openjdk/jdk-submit-hs/output-fastdebug
$ ./images/jdk/bin/jcmd test3.Example2 help
24278:
The following commands are available:
VM.log
VM.native_memory
ManagementAgent.status
ManagementAgent.stop
ManagementAgent.start_local
ManagementAgent.start
Compiler.directives_clear
Compiler.directives_remove
Compiler.directives_add
Compiler.directives_print
Compiler.CodeHeap_Analytics
VM.print_touched_methods
Compiler.codecache
Compiler.codelist
Compiler.queue
VM.classloader_stats
Thread.print
JVMTI.data_dump
JVMTI.agent_load
VM.metaspace
VM.stringtable
VM.symboltable
VM.class_hierarchy
VM.systemdictionary
GC.class_stats
GC.class_histogram
GC.heap_dump
GC.finalizer_info
GC.heap_info
GC.run_finalization
GC.run
VM.info
VM.uptime
VM.dynlibs
VM.set_flag
VM.flags
VM.system_properties
VM.command_line
VM.version
help

New output, sorted:

thomas at mainframe /shared/projects/openjdk/jdk-submit-hs/output-fastdebug
$ ./images/jdk/bin/jcmd test3.Example2 help
30230:
The following commands are available:
Compiler.CodeHeap_Analytics
Compiler.codecache
Compiler.codelist
Compiler.directives_add
Compiler.directives_clear
Compiler.directives_print
Compiler.directives_remove
Compiler.queue
GC.class_histogram
GC.class_stats
GC.finalizer_info
GC.heap_dump
GC.heap_info
GC.run
GC.run_finalization
JVMTI.agent_load
JVMTI.data_dump
ManagementAgent.start
ManagementAgent.start_local
ManagementAgent.status
ManagementAgent.stop
Thread.print
VM.class_hierarchy
VM.classloader_stats
VM.command_line
VM.dynlibs
VM.flags
VM.info
VM.log
VM.metaspace
VM.native_memory
VM.print_touched_methods
VM.set_flag
VM.stringtable
VM.symboltable
VM.system_properties
VM.systemdictionary
VM.uptime
VM.version
help

I'm running submit tests now, if they pass I'll push.
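
The sorted listing above looks consistent with a plain character-code ordering: uppercase sorts before lowercase (Compiler.CodeHeap_Analytics before Compiler.codecache), '_' sorts before lowercase letters (VM.class_hierarchy before VM.classloader_stats), and the lowercase "help" lands last. The actual change lives in the HotSpot sources and is not reproduced here; the following Java sketch only illustrates that ordering on a handful of names taken from the output, with `SortedCommandList` and `sorted` being hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedCommandList {

    // Returns a sorted copy using String's natural order, which compares
    // UTF-16 code units and therefore matches byte order for ASCII names.
    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> unsorted = Arrays.asList(
            "VM.log", "Compiler.codecache", "Compiler.CodeHeap_Analytics",
            "help", "GC.run_finalization", "GC.run",
            "VM.classloader_stats", "VM.class_hierarchy");
        for (String name : sorted(unsorted)) {
            System.out.println(name);
        }
    }
}
```

A case-insensitive comparator would instead place "help" between the GC and JVMTI commands, which is not what the posted output shows.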

Best Regards, Thomas

On Tue, Apr 3, 2018 at 3:52 AM, serguei.spitsyn at oracle.com <
serguei.spitsyn at oracle.com> wrote:

> Hi Thomas,
>
> Added the serviceability-dev mailing list as it is a Serviceability area.
>
> The fix looks good to me.
> One question:
>   Could you, please, post the sorted help output?
>   It is interesting how does it look like when sorted.
>
> Thanks,
> Serguei
>
> On 3/28/18 13:08, Thomas Stüfe wrote:
>
>> Hi all,
>>
>> may I get reviews for this tiny trivial change which causes jcmd help
>> output (the command list) to be sorted?
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8200384
>> webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8200384-jcmd-help-sorted/webrev.00/webrev/
>>
>> Thanks!
>>
>> Best Regards, Thomas
>>

From thomas.stuefe at gmail.com  Mon Apr  9 15:24:00 2018
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 9 Apr 2018 17:24:00 +0200
Subject: jcmd, windows x64: cannot see other processes?
Message-ID:

Hi all,

I try to attach a jcmd to a running process on windows x64, but jcmd only
lists its own process.

Both jcmd and process are built from jdk-hs.

Then I tried attaching jdk-hs tip jcmd to an older VM (jdk 9), did not work
either.

Then - and here it gets weird - I tried attaching a jdk9 jcmd to a jdk9 Vm,
and still nothing... am I making a thinking error here? Do I need special
options on Windows? On Unix this never gave me any trouble.

Both processes run under the same user, from two console windows. I tried
both within and without cygwin too, does not make any difference.

Thanks, Thomas

From markus.gronlund at oracle.com  Mon Apr  9 15:30:57 2018
From: markus.gronlund at oracle.com (Markus Gronlund)
Date: Mon, 9 Apr 2018 08:30:57 -0700 (PDT)
Subject: jcmd, windows x64: cannot see other processes?
In-Reply-To:
References:
Message-ID:

Hi Thomas,

Are you running in two separate Terminal Server Sessions?

You need to be in the same WindowsStation
https://msdn.microsoft.com/en-us/library/windows/desktop/ms687096(v=vs.85).aspx

HTH
Markus

From: Thomas Stüfe [mailto:thomas.stuefe at gmail.com]
Sent: 9 April 2018 17:24
To: serviceability-dev at openjdk.java.net
Subject: jcmd, windows x64: cannot see other processes?

Hi all,

I try to attach a jcmd to a running process on windows x64, but jcmd only
lists its own process.

Both jcmd and process are built from jdk-hs.

Then I tried attaching jdk-hs tip jcmd to an older VM (jdk 9), did not work
either.

Then - and here it gets weird - I tried attaching a jdk9 jcmd to a jdk9 Vm,
and still nothing... am I making a thinking error here? Do I need special
options on Windows? On Unix this never gave me any trouble.

Both processes run under the same user, from two console windows. I tried
both within and without cygwin too, does not make any difference.

Thanks, Thomas

From thomas.stuefe at gmail.com  Mon Apr  9 15:33:36 2018
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 9 Apr 2018 17:33:36 +0200
Subject: jcmd, windows x64: cannot see other processes?
In-Reply-To:
References:
Message-ID:

Hi Markus,

On Mon, Apr 9, 2018 at 5:30 PM, Markus Gronlund wrote:

> Hi Thomas,
>
> Are you running in two separate Terminal Server Sessions?
>

no, this is all on my local Laptop.

> You need to be in the same WindowsStation
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms687096(v=vs.85).aspx
>
> HTH
>
> Markus
>

Best Regards, Thomas

> *From:* Thomas Stüfe [mailto:thomas.stuefe at gmail.com]
> *Sent:* 9 April 2018 17:24
> *To:* serviceability-dev at openjdk.java.net
> *Subject:* jcmd, windows x64: cannot see other processes?
>
> Hi all,
>
> I try to attach a jcmd to a running process on windows x64, but jcmd only
> lists its own process.
>
> Both jcmd and process are built from jdk-hs.
>
> Then I tried attaching jdk-hs tip jcmd to an older VM (jdk 9), did not
> work either.
>
> Then - and here it gets weird - I tried attaching a jdk9 jcmd to a jdk9
> Vm, and still nothing... am I making a thinking error here? Do I need
> special options on Windows? On Unix this never gave me any trouble.
>
> Both processes run under the same user, from two console windows. I tried
> both within and without cygwin too, does not make any difference.
>
> Thanks, Thomas
>

From thomas.stuefe at gmail.com  Mon Apr  9 15:50:33 2018
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 9 Apr 2018 17:50:33 +0200
Subject: jcmd, windows x64: cannot see other processes?
In-Reply-To:
References:
Message-ID:

So, I found that I can attach with jcmd just fine, just the process listing
does not work.

I can only attach via pid, not via command name, which I think stems from
the same error.

Does anyone have any idea? Should I open a bug report?

..Thomas

On Mon, Apr 9, 2018 at 5:24 PM, Thomas Stüfe wrote:

> Hi all,
>
> I try to attach a jcmd to a running process on windows x64, but jcmd only
> lists its own process.
>
> Both jcmd and process are built from jdk-hs.
>
> Then I tried attaching jdk-hs tip jcmd to an older VM (jdk 9), did not
> work either.
>
> Then - and here it gets weird - I tried attaching a jdk9 jcmd to a jdk9
> Vm, and still nothing... am I making a thinking error here? Do I need
> special options on Windows? On Unix this never gave me any trouble.
>
> Both processes run under the same user, from two console windows. I tried
> both within and without cygwin too, does not make any difference.
>
> Thanks, Thomas
>

From Alan.Bateman at oracle.com  Mon Apr  9 15:57:27 2018
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Mon, 9 Apr 2018 16:57:27 +0100
Subject: jcmd, windows x64: cannot see other processes?
In-Reply-To:
References:
Message-ID: <2e83859a-8623-c40a-00b7-b0b460c4d798@oracle.com>

On 09/04/2018 16:50, Thomas Stüfe wrote:
> So, I found that I can attach with jcmd just fine, just the process
> listing does not work.
>
> I can only attach via pid, not via command name, which I think stems
> from the same error.
>
> Does anyone have any idea? Should I open a bug report?
>
Is this something to do with the value of java.io.tmpdir? Are the running
VMs using their own temp dir?

-Alan

From andrew_m_leonard at uk.ibm.com  Mon Apr  9 16:07:27 2018
From: andrew_m_leonard at uk.ibm.com (Andrew Leonard)
Date: Mon, 9 Apr 2018 17:07:27 +0100
Subject: RFR: Fix race condition in jdwp
Message-ID:

Hi,
We discovered in our testing with OpenJ9 that a race condition can occur in
the jdwp under certain circumstances, and we were able to force the same
issue with Hotspot. Normally, the event helper thread suspends all threads,
then the debug loop in the listener thread receives a command to resume. The
debugger may deadlock if the debug loop in the listener thread starts
processing commands (e.g. resume threads) before the event helper completes
the initialization (and suspends threads).

This patch adds synchronization to ensure the event helper completes the
initialization sequence before debugger commands are processed.

Please can I find a sponsor for this contribution? Patch below..
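
The synchronization described here amounts to a one-shot gate: one thread sets a flag under a monitor and notifies, while the other waits on the same monitor in a loop until the flag is set. The patch itself is C against libjdwp; the following Java sketch only mirrors that pattern under stated assumptions. The class `InitGate`, its methods and `runDemo` are hypothetical names chosen to echo the description, not code from the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class InitGate {
    private final Object initMonitor = new Object();
    private boolean vmInitComplete = false;

    // Called by the thread that finishes initialization (the "event helper").
    void signalVMInitComplete() {
        synchronized (initMonitor) {
            vmInitComplete = true;
            initMonitor.notifyAll();
        }
    }

    // Called by the thread that must not proceed early (the "listener").
    // The loop guards against spurious wakeups and against the signal
    // having been sent before the wait started.
    void waitVMInitComplete() throws InterruptedException {
        synchronized (initMonitor) {
            while (!vmInitComplete) {
                initMonitor.wait();
            }
        }
    }

    // Records the order of the two steps when the gate is in place.
    static List<String> runDemo() {
        InitGate gate = new InitGate();
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        Thread listener = new Thread(() -> {
            try {
                gate.waitVMInitComplete();
                order.add("process commands");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "listener");
        listener.start();
        order.add("suspend threads");      // initialization work happens first
        gate.signalVMInitComplete();       // only now may the listener proceed
        try {
            listener.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Because the flag is checked under the monitor before waiting, it does not matter which thread reaches the gate first, which is the property the description asks for.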

Many thanks
Andrew

diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
--- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
+++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -58,6 +58,7 @@
 static jboolean vmInitialized;
 static jrawMonitorID initMonitor;
 static jboolean initComplete;
+static jboolean VMInitComplete;
 static jbyte currentSessionID;
 
 /*
@@ -617,6 +618,35 @@
     debugMonitorExit(initMonitor);
 }
 
+/*
+ * Signal VM initialization is complete.
+ */
+void
+signalVMInitComplete(void)
+{
+    /*
+     * VM Initialization is complete
+     */
+    LOG_MISC(("signal VM initialization complete"));
+    debugMonitorEnter(initMonitor);
+    VMInitComplete = JNI_TRUE;
+    debugMonitorNotifyAll(initMonitor);
+    debugMonitorExit(initMonitor);
+}
+
+/*
+ * Wait for VM initialization to complete.
+ */
+void
+debugInit_waitVMInitComplete(void)
+{
+    debugMonitorEnter(initMonitor);
+    while (!VMInitComplete) {
+        debugMonitorWait(initMonitor);
+    }
+    debugMonitorExit(initMonitor);
+}
+
 /* All process exit() calls come from here */
 void
 forceExit(int exit_code)
@@ -672,6 +702,7 @@
     LOG_MISC(("Begin initialize()"));
     currentSessionID = 0;
     initComplete = JNI_FALSE;
+    VMInitComplete = JNI_FALSE;
 
     if ( gdata->vmDead ) {
         EXIT_ERROR(AGENT_ERROR_INTERNAL,"VM dead at initialize() time");
diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
--- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
+++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 1998, 2015, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -39,4 +39,7 @@
 void debugInit_exit(jvmtiError, const char *);
 void forceExit(int);
 
+void debugInit_waitVMInitComplete(void);
+void signalVMInitComplete(void);
+
 #endif
diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
--- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
+++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -98,6 +98,7 @@
     standardHandlers_onConnect();
     threadControl_onConnect();
 
+    debugInit_waitVMInitComplete();
     /* Okay, start reading cmds! */
     while (shouldListen) {
         if (!dequeue(&p)) {
diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
--- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
+++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -580,6 +580,7 @@
         (void)threadControl_suspendThread(command->thread, JNI_FALSE);
     }
 
+    signalVMInitComplete();
     outStream_initCommand(&out, uniqueID(), 0x0,
                           JDWP_COMMAND_SET(Event),
                           JDWP_COMMAND(Event, Composite));

Andrew Leonard
Java Runtimes Development
IBM Hursley
IBM United Kingdom Ltd
Phone internal: 245913, external: 01962 815913
internet email: andrew_m_leonard at uk.ibm.com

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

From jcbeyler at google.com  Mon Apr  9 16:27:11 2018
From: jcbeyler at google.com (JC Beyler)
Date: Mon, 09 Apr 2018 16:27:11 +0000
Subject: Re: inspect a thread’s stack
In-Reply-To: <5D285FC05679A441ACF34A90905BFA92241A7A43@GBEDBP01.chp.co.uk>
References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
 <5D285FC05679A441ACF34A90905BFA92241A7A43@GBEDBP01.chp.co.uk>
Message-ID:

I think the conversation will shift a bit if you explain what you mean with:

"// inspect the frames of that thread doing any needed business with them"

What exactly do you have in mind? Do you want to change the stack in some
way?

Because, depending on what you want, Andrew's comment on:

ThreadMXBean.getThreadInfo(id).getStackTrace()

seems reasonable to me :)

Jc

On Mon, Apr 9, 2018 at 1:51 AM Pietro Paolini <
Pietro.Paolini at alfasystems.com> wrote:

>
> >Access to stacktraces with locals is demoed in this test
> >
> http://hg.openjdk.java.net/jdk/jdk/file/tip/test/jdk/java/lang/StackWalker/LocalsAndOperands.java
>
> Maybe I haven't read it well enough but isn't that accessible through
> https://docs.oracle.com/javase/9/docs/api/java/lang/StackWalker.html ? As
> long as you are on Java 9 that should not
> be a problem.
>
> >but the functionality does not seem to be available (yet!) via a public
> API.
>
> What do you mean ? Isn't that a public API ?
>
> Thanks,
> P.
>

From martinrb at google.com  Mon Apr  9 17:54:29 2018
From: martinrb at google.com (Martin Buchholz)
Date: Mon, 9 Apr 2018 10:54:29 -0700
Subject: RFR: 8201327: Make Sensor deeply immutably thread safe
Message-ID:

Another little cleanup to make Google's race detector happy.

8201327: Make Sensor deeply immutably thread safe
http://cr.openjdk.java.net/~martin/webrevs/jdk/Sensor-init/
https://bugs.openjdk.java.net/browse/JDK-8201327

From chris.plummer at oracle.com  Mon Apr  9 18:05:28 2018
From: chris.plummer at oracle.com (Chris Plummer)
Date: Mon, 9 Apr 2018 11:05:28 -0700
Subject: RFR (M): 8201247: Various cleanups in the attach framework
In-Reply-To:
References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com>
 <91d75e2d-47a4-e9ee-5d19-8f3e6dc13428@oracle.com>
Message-ID: <1497c4c6-d95e-7e25-b503-cc7da9d6d077@oracle.com>

Hi Christoph,

We have some closed "attach on demand" tests that should be run also. I can
do this for you when you are ready. Please also let me know which other
jtreg tests you have run.

thanks,

Chris

On 4/9/18 12:08 AM, Langer, Christoph wrote:
> Hi Chris,
>
> thanks for looking into this.
>
> As for ArgumentIterator::next, I must admit, I found this patch in our
> code base when taking over the code. I believe that an issue would be seen
> if an attach operation has 2 or 3 arguments and the first one is
> NULL/empty. I guess such a situation can't happen with the attach
> operations currently existing in OpenJDK as none of these ops would allow
> such type of arguments. However, in our implementation, we have for
> instance enhanced the "dump_heap" operation to work with null as first
> argument where one usually would specify the desired output file name.
> We implemented a mechanism to compute a default filename when the param is
> left blank. So we need the fix for that case, I guess.
>
> I'll run the patch through the submission forest now and do some jtreg
> testing.
>
> Best regards
> Christoph
>
>> -----Original Message-----
>> From: Chris Plummer [mailto:chris.plummer at oracle.com]
>> Sent: Friday, 6 April 2018 18:37
>> To: Langer, Christoph ; serviceability-dev at openjdk.java.net
>> Cc: hotspot-dev at openjdk.java.net
>> Subject: Re: RFR (M): 8201247: Various cleanups in the attach framework
>>
>> Hi Christoph,
>>
>> Can you explain a bit more about "fix handling of null values in
>> ArgumentIterator::next". When does this turn up? Is there a test case?
>>
>> Everything else looks good.
>>
>> thanks,
>>
>> Chris
>>
>> On 4/6/18 8:01 AM, Langer, Christoph wrote:
>>> Hi,
>>>
>>> can I please get reviews for a set of clean up changes that I came
>>> across when doing some integration work.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8201247
>>>
>>> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/
>>>
>>> Detailed comments about the changes can be found in the bug.
>>>
>>> Thanks & best regards
>>>
>>> Christoph
>>>

From Alan.Bateman at oracle.com  Mon Apr  9 19:31:57 2018
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Mon, 9 Apr 2018 20:31:57 +0100
Subject: RFR: 8201327: Make Sensor deeply immutably thread safe
In-Reply-To:
References:
Message-ID: <561c83e7-cbbb-8b7f-87f9-fd5c3f4a7422@oracle.com>

On 09/04/2018 18:54, Martin Buchholz wrote:
> Another little cleanup to make Google's race detector happy.
>
> 8201327: Make Sensor deeply immutably thread safe
> http://cr.openjdk.java.net/~martin/webrevs/jdk/Sensor-init/
>
> https://bugs.openjdk.java.net/browse/JDK-8201327
>
This looks okay to me, except maybe the "initially false" and "initially 0"
comments as they are just documenting the default values and don't add
anything.

-Alan.
From chris.hegarty at oracle.com Mon Apr 9 20:25:49 2018 From: chris.hegarty at oracle.com (Chris Hegarty) Date: Mon, 9 Apr 2018 21:25:49 +0100 Subject: RFR: 8201327: Make Sensor deeply immutably thread safe In-Reply-To: <561c83e7-cbbb-8b7f-87f9-fd5c3f4a7422@oracle.com> References: <561c83e7-cbbb-8b7f-87f9-fd5c3f4a7422@oracle.com> Message-ID: <358B88C1-9C17-4FFE-A17A-681E8F23F908@oracle.com> > On 9 Apr 2018, at 20:31, Alan Bateman wrote: > >> On 09/04/2018 18:54, Martin Buchholz wrote: >> Another little cleanup to make Google's race detector happy. >> >> 8201327: Make Sensor deeply immutably thread safe >> http://cr.openjdk.java.net/~martin/webrevs/jdk/Sensor-init/ >> https://bugs.openjdk.java.net/browse/JDK-8201327 >> > This looks okay to me, +1 > except maybe the "initially false" and "initially 0" comments as they are just documenting the default values and don't add anything. Yeah, I?ve done this before too, especially when removing default values ( as if to prevent them from inadvertently creeping back in ). -Chris. From alexey.menkov at oracle.com Mon Apr 9 21:13:49 2018 From: alexey.menkov at oracle.com (Alex Menkov) Date: Mon, 9 Apr 2018 14:13:49 -0700 Subject: RFR: 8200195 : serviceability/jvmti/FieldAccessWatch/FieldAccessWatch.java crashes with "assert(thread->thread_state() == _thread_in_native) failed: coming from wrong thread state" Message-ID: <4bc183a3-43d8-5dbc-363c-af0a83713298@oracle.com> Hi all, Please review a fix for https://bugs.openjdk.java.net/browse/JDK-8200195 webrev: http://cr.openjdk.java.net/~amenkov/field_access_graal/webrev/ The problem with the test is it uses cached JNIEnv value instead using a value passed to the callbacks. JNIEnv is valid only for the current thread. 
--alex From chris.plummer at oracle.com Mon Apr 9 21:28:49 2018 From: chris.plummer at oracle.com (Chris Plummer) Date: Mon, 9 Apr 2018 14:28:49 -0700 Subject: RFR: 8200195 : serviceability/jvmti/FieldAccessWatch/FieldAccessWatch.java crashes with "assert(thread->thread_state() == _thread_in_native) failed: coming from wrong thread state" In-Reply-To: <4bc183a3-43d8-5dbc-363c-af0a83713298@oracle.com> References: <4bc183a3-43d8-5dbc-363c-af0a83713298@oracle.com> Message-ID: <0cf45ce3-d6f2-f26f-81a0-331d5820e572@oracle.com> Hi Alex, I'd suggest renaming javaEnv to jni_env to be consistent. Not sure why javaEnv was chosen in the original implementation. Otherwise the changes look good. thanks, Chris On 4/9/18 2:13 PM, Alex Menkov wrote: > Hi all, > > Please review a fix for https://bugs.openjdk.java.net/browse/JDK-8200195 > webrev: http://cr.openjdk.java.net/~amenkov/field_access_graal/webrev/ > > The problem with the test is it uses cached JNIEnv value instead using > a value passed to the callbacks. JNIEnv is valid only for the current > thread. > > --alex From david.holmes at oracle.com Mon Apr 9 21:35:01 2018 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 Apr 2018 07:35:01 +1000 Subject: RFR: 8200195 : serviceability/jvmti/FieldAccessWatch/FieldAccessWatch.java crashes with "assert(thread->thread_state() == _thread_in_native) failed: coming from wrong thread state" In-Reply-To: <0cf45ce3-d6f2-f26f-81a0-331d5820e572@oracle.com> References: <4bc183a3-43d8-5dbc-363c-af0a83713298@oracle.com> <0cf45ce3-d6f2-f26f-81a0-331d5820e572@oracle.com> Message-ID: <09f9255e-95eb-aa57-2f4a-1c72f7a96ec6@oracle.com> +1 Though I'd also like to understand why the test only fails sometimes and which other thread is involved? Thanks, David On 10/04/2018 7:28 AM, Chris Plummer wrote: > Hi Alex, > > I'd suggest renaming javaEnv to jni_env to be consistent. Not sure why > javaEnv was chosen in the original implementation. Otherwise the changes > look good. 
> > thanks, > > Chris > > On 4/9/18 2:13 PM, Alex Menkov wrote: >> Hi all, >> >> Please review a fix for https://bugs.openjdk.java.net/browse/JDK-8200195 >> webrev: http://cr.openjdk.java.net/~amenkov/field_access_graal/webrev/ >> >> The problem with the test is it uses cached JNIEnv value instead using >> a value passed to the callbacks. JNIEnv is valid only for the current >> thread. >> >> --alex > > From serguei.spitsyn at oracle.com Mon Apr 9 22:03:43 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 9 Apr 2018 15:03:43 -0700 Subject: RFR: Fix race condition in jdwp In-Reply-To: References: Message-ID: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> Hi Andrew, The patch itself looks reasonable. However, in order to proceed with it, a bug report with a standalone test case demonstrating the issue is needed. Thanks, Serguei On 4/9/18 09:07, Andrew Leonard wrote: > Hi, > We discovered in our testing with OpenJ9 that a race condition can > occur in the jdwp under certain circumstances, and we were able to > force the same issue with Hotspot. Normally, the event helper thread > suspends all threads, then the debug loop in the listener thread > receives a command to resume. The debugger may deadlock if the debug > loop in the listener thread starts processing commands (e.g. resume > threads) before the event helper completes the initialization (and > suspends threads). > > This patch adds synchronization to ensure the event helper completes > the initialization sequence before debugger commands are processed. > > Please can I find a sponsor for this contribution? Patch below.. > > Many thanks > > Andrew > > > > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates.
All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -58,6 +58,7 @@ > static jboolean vmInitialized; > static jrawMonitorID initMonitor; > static jboolean initComplete; > +static jboolean VMInitComplete; > static jbyte currentSessionID; > > /* > @@ -617,6 +618,35 @@ > debugMonitorExit(initMonitor); > } > > +/* > + * Signal VM initialization is complete. > + */ > +void > +signalVMInitComplete(void) > +{ > + /* > + * VM Initialization is complete > + */ > + LOG_MISC(("signal VM initialization complete")); > + debugMonitorEnter(initMonitor); > + VMInitComplete = JNI_TRUE; > + debugMonitorNotifyAll(initMonitor); > + debugMonitorExit(initMonitor); > +} > + > +/* > + * Wait for VM initialization to complete. > + */ > +void > +debugInit_waitVMInitComplete(void) > +{ > + debugMonitorEnter(initMonitor); > + while (!VMInitComplete) { > + debugMonitorWait(initMonitor); > + } > + debugMonitorExit(initMonitor); > +} > + > /* All process exit() calls come from here */ > void > forceExit(int exit_code) > @@ -672,6 +702,7 @@ > LOG_MISC(("Begin initialize()")); > currentSessionID = 0; > initComplete = JNI_FALSE; > + VMInitComplete = JNI_FALSE; > > if ( gdata->vmDead ) { > EXIT_ERROR(AGENT_ERROR_INTERNAL,"VM dead at initialize() time"); > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2015, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. >
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -39,4 +39,7 @@ > void debugInit_exit(jvmtiError, const char *); > void forceExit(int); > > +void debugInit_waitVMInitComplete(void); > +void signalVMInitComplete(void); > + > #endif > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -98,6 +98,7 @@ > standardHandlers_onConnect(); > threadControl_onConnect(); > > + debugInit_waitVMInitComplete(); > /* Okay, start reading cmds! */ > while (shouldListen) { > if (!dequeue(&p)) { > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -580,6 +580,7 @@ > (void)threadControl_suspendThread(command->thread, JNI_FALSE); > } > > +
signalVMInitComplete(); > outStream_initCommand(&out, uniqueID(), 0x0, > JDWP_COMMAND_SET(Event), > JDWP_COMMAND(Event, Composite)); > > > > Andrew Leonard > Java Runtimes Development > IBM Hursley > IBM United Kingdom Ltd > Phone internal: 245913, external: 01962 815913 > internet email: andrew_m_leonard at uk.ibm.com > > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with > number 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU From alexey.menkov at oracle.com Mon Apr 9 22:04:22 2018 From: alexey.menkov at oracle.com (Alex Menkov) Date: Mon, 9 Apr 2018 15:04:22 -0700 Subject: RFR: 8200195 : serviceability/jvmti/FieldAccessWatch/FieldAccessWatch.java crashes with "assert(thread->thread_state() == _thread_in_native) failed: coming from wrong thread state" In-Reply-To: <09f9255e-95eb-aa57-2f4a-1c72f7a96ec6@oracle.com> References: <4bc183a3-43d8-5dbc-363c-af0a83713298@oracle.com> <0cf45ce3-d6f2-f26f-81a0-331d5820e572@oracle.com> <09f9255e-95eb-aa57-2f4a-1c72f7a96ec6@oracle.com> Message-ID: <9cb6d451-2496-9ca9-ea87-43b729f73f8c@oracle.com> updated webrev: http://cr.openjdk.java.net/~amenkov/field_access_graal/webrev.02/ (javaEnv is replaced with jni_env) On 04/09/2018 14:35, David Holmes wrote: > +1 > > Though I'd also like to understand why the test only fails sometimes and > which other thread is involved? It always fails with the Graal compiler on Windows, Linux and macOS. It looks like, with the Graal compiler, access/modification events are sent on several threads (sorry, I don't know enough about Graal). --alex > > Thanks, > David > > On 10/04/2018 7:28 AM, Chris Plummer wrote: >> Hi Alex, >> >> I'd suggest renaming javaEnv to jni_env to be consistent. Not sure why >> javaEnv was chosen in the original implementation. Otherwise the >> changes look good.
>> >> thanks, >> >> Chris >> >> On 4/9/18 2:13 PM, Alex Menkov wrote: >>> Hi all, >>> >>> Please review a fix for https://bugs.openjdk.java.net/browse/JDK-8200195 >>> webrev: http://cr.openjdk.java.net/~amenkov/field_access_graal/webrev/ >>> >>> The problem with the test is it uses cached JNIEnv value instead >>> using a value passed to the callbacks. JNIEnv is valid only for the >>> current thread. >>> >>> --alex >> >> From serguei.spitsyn at oracle.com Mon Apr 9 22:19:38 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 9 Apr 2018 15:19:38 -0700 Subject: RFR: 8200195 : serviceability/jvmti/FieldAccessWatch/FieldAccessWatch.java crashes with "assert(thread->thread_state() == _thread_in_native) failed: coming from wrong thread state" In-Reply-To: <9cb6d451-2496-9ca9-ea87-43b729f73f8c@oracle.com> References: <4bc183a3-43d8-5dbc-363c-af0a83713298@oracle.com> <0cf45ce3-d6f2-f26f-81a0-331d5820e572@oracle.com> <09f9255e-95eb-aa57-2f4a-1c72f7a96ec6@oracle.com> <9cb6d451-2496-9ca9-ea87-43b729f73f8c@oracle.com> Message-ID: Hi Alex, The fix looks good. Thank you for replacing the javaEnv with the jni_env. Thanks, Serguei On 4/9/18 15:04, Alex Menkov wrote: > updated webrev: > http://cr.openjdk.java.net/~amenkov/field_access_graal/webrev.02/ > (javaEnv is replaced with jni_env) > > On 04/09/2018 14:35, David Holmes wrote: >> +1 >> >> Though I'd also like to understand why the test only fails sometimes >> and which other thread is involved? > > It always fails with Graal compiler on win, linux and macosx. > Looks like with Graal compiler access/modification events are sent on > several threads (sorry, I don't have enough knowledge about Graal). > > --alex > >> >> Thanks, >> David >> >> On 10/04/2018 7:28 AM, Chris Plummer wrote: >>> Hi Alex, >>> >>> I'd suggest renaming javaEnv to jni_env to be consistent. Not sure >>> why javaEnv was chosen in the original implementation. Otherwise the >>> changes look good. 
>>> >>> thanks, >>> >>> Chris >>> >>> On 4/9/18 2:13 PM, Alex Menkov wrote: >>>> Hi all, >>>> >>>> Please review a fix for >>>> https://bugs.openjdk.java.net/browse/JDK-8200195 >>>> webrev: http://cr.openjdk.java.net/~amenkov/field_access_graal/webrev/ >>>> >>>> The problem with the test is it uses cached JNIEnv value instead >>>> using a value passed to the callbacks. JNIEnv is valid only for the >>>> current thread. >>>> >>>> --alex >>> >>> From chris.plummer at oracle.com Mon Apr 9 22:57:00 2018 From: chris.plummer at oracle.com (Chris Plummer) Date: Mon, 9 Apr 2018 15:57:00 -0700 Subject: RFR: 8200195 : serviceability/jvmti/FieldAccessWatch/FieldAccessWatch.java crashes with "assert(thread->thread_state() == _thread_in_native) failed: coming from wrong thread state" In-Reply-To: <9cb6d451-2496-9ca9-ea87-43b729f73f8c@oracle.com> References: <4bc183a3-43d8-5dbc-363c-af0a83713298@oracle.com> <0cf45ce3-d6f2-f26f-81a0-331d5820e572@oracle.com> <09f9255e-95eb-aa57-2f4a-1c72f7a96ec6@oracle.com> <9cb6d451-2496-9ca9-ea87-43b729f73f8c@oracle.com> Message-ID: <3e5e8c8b-a2e4-3686-8cd1-fef4ceb87bec@oracle.com> On 4/9/18 3:04 PM, Alex Menkov wrote: > updated webrev: > http://cr.openjdk.java.net/~amenkov/field_access_graal/webrev.02/ > (javaEnv is replaced with jni_env) Looks good. > > On 04/09/2018 14:35, David Holmes wrote: >> +1 >> >> Though I'd also like to understand why the test only fails sometimes >> and which other thread is involved? > > It always fails with Graal compiler on win, linux and macosx. > Looks like with Graal compiler access/modification events are sent on > several threads (sorry, I don't have enough knowledge about Graal). > One thing to be careful of is that testResultClass and testResultObject may not be valid in the thread that is doing the GetFieldID() call. These were setup in Java_FieldAccessWatch_startTest() which might have been a different thread, with a different ClassLoader hierarchy, than the callback thread. 
I ran into this problem in another test where there was an unexpected callback from a finalizer thread. Probably in this case the worse that can happen is an exception, and the test deals with that. Chris > --alex > >> >> Thanks, >> David >> >> On 10/04/2018 7:28 AM, Chris Plummer wrote: >>> Hi Alex, >>> >>> I'd suggest renaming javaEnv to jni_env to be consistent. Not sure >>> why javaEnv was chosen in the original implementation. Otherwise the >>> changes look good. >>> >>> thanks, >>> >>> Chris >>> >>> On 4/9/18 2:13 PM, Alex Menkov wrote: >>>> Hi all, >>>> >>>> Please review a fix for >>>> https://bugs.openjdk.java.net/browse/JDK-8200195 >>>> webrev: http://cr.openjdk.java.net/~amenkov/field_access_graal/webrev/ >>>> >>>> The problem with the test is it uses cached JNIEnv value instead >>>> using a value passed to the callbacks. JNIEnv is valid only for the >>>> current thread. >>>> >>>> --alex >>> >>> From david.holmes at oracle.com Tue Apr 10 01:06:38 2018 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 Apr 2018 11:06:38 +1000 Subject: RFR: Fix race condition in jdwp In-Reply-To: References: Message-ID: Hi Andrew, On 10/04/2018 2:07 AM, Andrew Leonard wrote: > Hi, > We discovered in our testing with OpenJ9 that a race condition can occur > in the jdwp under certain circumstances, and we were able to force the > same issue with Hotspot. Normally, the event helper thread suspends all > threads, then the debug loop in the listener thread receives a command > to resume. The debugger may deadlock if the debug loop in the listener > thread starts processing commands (e.g. resume threads) before the event > helper completes the initialization (and suspends threads). > > This patch adds synchronization to ensure the event helper completes the > initialization sequence before debugger commands are processed. How does this relate to the existing debugInit_waitInitComplete? I don't see why we would want two initialization synchronization mechanisms. ?? 
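For readers following the thread: the handshake in the patch under review is a guarded wait, in which one thread sets a flag and notifies under a monitor while another thread loops on that flag. A minimal Java sketch of the pattern follows. It is not the agent's C code; the class and method names (InitGate, signal, await, isComplete) are hypothetical, and Java's built-in monitor stands in for the debugMonitorEnter / debugMonitorWait / debugMonitorNotifyAll calls used in the patch.

```java
// Hypothetical illustration of the flag-plus-monitor handshake; not JDWP agent code.
final class InitGate {
    // Guarded by "this"; plays the role of the VMInitComplete flag in the patch.
    private boolean vmInitComplete;

    /** Called by the thread that finishes initialization (cf. signalVMInitComplete). */
    synchronized void signal() {
        vmInitComplete = true;
        notifyAll();               // wake every thread blocked in await()
    }

    /** Blocks until signal() has been called (cf. debugInit_waitVMInitComplete). */
    synchronized void await() throws InterruptedException {
        while (!vmInitComplete) {  // loop so that spurious wakeups are harmless
            wait();
        }
    }

    /** Reads the flag under the same monitor. */
    synchronized boolean isComplete() {
        return vmInitComplete;
    }
}
```

In this sketch a command-processing thread would call await() before entering its loop, and the initializing thread would call signal() once its setup is done. Because the flag is sticky, a late await() returns immediately.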
Thanks, David > Please can I find a sponsor for this contribution? Patch below.. > > Many thanks > > Andrew > > > > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -58,6 +58,7 @@ > static jboolean vmInitialized; > static jrawMonitorID initMonitor; > static jboolean initComplete; > +static jboolean VMInitComplete; > static jbyte currentSessionID; > > /* > @@ -617,6 +618,35 @@ > debugMonitorExit(initMonitor); > } > > +/* > + * Signal VM initialization is complete. > + */ > +void > +signalVMInitComplete(void) > +{ > + /* > + * VM Initialization is complete > + */ > + LOG_MISC(("signal VM initialization complete")); > + debugMonitorEnter(initMonitor); > + VMInitComplete = JNI_TRUE; > + debugMonitorNotifyAll(initMonitor); > + debugMonitorExit(initMonitor); > +} > + > +/* > + * Wait for VM initialization to complete. > + */ > +void > +debugInit_waitVMInitComplete(void) > +{ > + debugMonitorEnter(initMonitor); > + while (!VMInitComplete) { > + debugMonitorWait(initMonitor); > + } > + debugMonitorExit(initMonitor); > +} > + > /* All process exit() calls come from here */ > void > forceExit(int exit_code) > @@ -672,6 +702,7 @@ > LOG_MISC(("Begin initialize()")); > currentSessionID = 0; > initComplete = JNI_FALSE; > + VMInitComplete = JNI_FALSE; > > if ( gdata->vmDead ) { >
EXIT_ERROR(AGENT_ERROR_INTERNAL,"VM dead at initialize() time"); > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2015, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -39,4 +39,7 @@ > void debugInit_exit(jvmtiError, const char *); > void forceExit(int); > > +void debugInit_waitVMInitComplete(void); > +void signalVMInitComplete(void); > + > #endif > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -98,6 +98,7 @@ > standardHandlers_onConnect(); > threadControl_onConnect(); > > + debugInit_waitVMInitComplete(); > /* Okay, start reading cmds! */ > while (shouldListen) { > if (!dequeue(&p)) { > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -580,6 +580,7 @@ > (void)threadControl_suspendThread(command->thread, JNI_FALSE); > } > > + signalVMInitComplete(); > outStream_initCommand(&out, uniqueID(), 0x0, > JDWP_COMMAND_SET(Event), > JDWP_COMMAND(Event, Composite)); > > > > Andrew Leonard > Java Runtimes Development > IBM Hursley > IBM United Kingdom Ltd > Phone internal: 245913, external: 01962 815913 > internet email: andrew_m_leonard at uk.ibm.com > > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU From martinrb at google.com Tue Apr 10 02:13:33 2018 From: martinrb at google.com (Martin Buchholz) Date: Mon, 9 Apr 2018 19:13:33 -0700 Subject: RFR: 8201327: Make Sensor deeply immutably thread safe In-Reply-To: <358B88C1-9C17-4FFE-A17A-681E8F23F908@oracle.com> References: <561c83e7-cbbb-8b7f-87f9-fd5c3f4a7422@oracle.com> <358B88C1-9C17-4FFE-A17A-681E8F23F908@oracle.com> Message-ID: On Mon, Apr 9, 2018 at 1:25 PM, Chris Hegarty wrote: > > > On 9 Apr 2018, at 20:31, Alan Bateman wrote: > > > >> On 09/04/2018 18:54, Martin Buchholz wrote: > >> Another little cleanup to make Google's race detector happy. > >> > >> 8201327: Make Sensor deeply immutably thread safe > >> http://cr.openjdk.java.net/~martin/webrevs/jdk/Sensor-init/ > >> https://bugs.openjdk.java.net/browse/JDK-8201327 > >> > > This looks okay to me, > > +1 > > > except maybe the "initially false" and "initially 0" comments as they > are just documenting the default values and don't add anything.
> > Yeah, I've done this before too, especially when removing default values ( > as if to prevent them from inadvertently creeping back in ). > If only there was standard concise wording for "intentionally using the field's default value, which is slightly more efficient and slightly more theoretically thread-safe, even though slightly less readable" ... // VM-initialized to false From Pietro.Paolini at alfasystems.com Tue Apr 10 07:26:31 2018 From: Pietro.Paolini at alfasystems.com (Pietro Paolini) Date: Tue, 10 Apr 2018 07:26:31 +0000 Subject: RE: inspect a thread's stack In-Reply-To: References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk> <5D285FC05679A441ACF34A90905BFA92241A7A43@GBEDBP01.chp.co.uk> Message-ID: <5D285FC05679A441ACF34A90905BFA92241A80A7@GBEDBP01.chp.co.uk> > > I think the conversation will shift a bit if you explain what you mean with: > > "// inspect the frames of that thread doing any needed business with them" > > What exactly do you have in mind? Do you want to change the stack in some > way? I would like to inspect the names and values of the variables on the stack at a specific point in time, for diagnostic purposes; I don't want to change their values. I don't think that should be allowed anyway :-) > > Because, depending on what you want, Andrew's comment on: > ThreadMXBean.getThreadInfo(id).getStackTrace() ? > > > seems reasonable to me :) I had a look at the API's Javadoc; my understanding is that I can get the StackTraceElement array with it. That is fine for seeing the call stack, but it does not provide any API to inspect what is actually on the stack in terms of variable name/value pairs. Did I miss anything? Thanks, P.
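As an illustration of the ThreadMXBean suggestion discussed above, here is a minimal Java sketch. The class and method names (StackPeek, framesOf, describe) are made up for this example. Note that the one-argument getThreadInfo(id) returns a ThreadInfo with an empty stack trace, so the sketch asks for the full depth explicitly. As observed above, the result is only an array of StackTraceElement (class, method, file, line); local variable names and values are not available through this API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper; only the java.lang.management calls are real API.
final class StackPeek {
    /** Returns the frames of the given thread, or an empty array if it is not alive. */
    static StackTraceElement[] framesOf(long threadId) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // getThreadInfo(id) alone is equivalent to a depth of 0 (no frames),
        // so request the whole stack.
        ThreadInfo info = mx.getThreadInfo(threadId, Integer.MAX_VALUE);
        return info == null ? new StackTraceElement[0] : info.getStackTrace();
    }

    /** Formats one frame as "class.method:line"; no local variables are exposed. */
    static String describe(StackTraceElement frame) {
        return frame.getClassName() + "." + frame.getMethodName() + ":" + frame.getLineNumber();
    }
}
```

A caller could pass the id of any live thread to framesOf and print each frame with describe. Inspecting locals would need a different mechanism, such as a JVMTI agent or a tool like Byteman.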
From neugens at redhat.com Tue Apr 10 08:35:57 2018 From: neugens at redhat.com (Mario Torre) Date: Tue, 10 Apr 2018 10:35:57 +0200 Subject: Re: inspect a thread's stack In-Reply-To: <5D285FC05679A441ACF34A90905BFA92241A80A7@GBEDBP01.chp.co.uk> References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk> <5D285FC05679A441ACF34A90905BFA92241A7A43@GBEDBP01.chp.co.uk> <5D285FC05679A441ACF34A90905BFA92241A80A7@GBEDBP01.chp.co.uk> Message-ID: On Tue, Apr 10, 2018 at 9:26 AM, Pietro Paolini wrote: >> >> I think the conversation will shift a bit if you explain what you mean with: >> >> "// inspect the frames of that thread doing any needed business with them" >> >> What exactly do you have in mind? Do you want to change the stack in some >> way? > > I would like to inspect the variable's name/value on the stack at a specific point in time for diagnostic purposes, I don't want to change their value. I don't think that should be allowed anyway :-) > >> >> Because, depending on what you want, Andrew's comment on: >> ThreadMXBean.getThreadInfo(id).getStackTrace() ? >> >> >> seems reasonable to me :) > > I had a look to the API's Javadoc, my understanding is that I could to get to the StackTraceElement array with it . That is OK to see the > calls' stack but it does not provide any API to inspect what is actually on the stack in terms of variable's pair. > > Did I miss anything ? Just an idea, but did you try out Byteman too? That may be a simpler alternative than writing an agent in C. What Martin suggested may also work, but the API is hidden behind reflection and that may not work with external programs in 9+. I'm not sure any of it is exported, but you can try it; unfortunately I haven't used that API much lately. Nevertheless, Byteman seems a more stable option, as this code may change internally at any time.
Cheers, Mario -- Mario Torre Associate Manager, Software Engineering Red Hat GmbH 9704 A60C B4BE A8B8 0F30 9205 5D7E 4952 3F65 7898 From thomas.stuefe at gmail.com Tue Apr 10 09:26:01 2018 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Tue, 10 Apr 2018 11:26:01 +0200 Subject: jcmd, windows x64: cannot see other processes? In-Reply-To: <2e83859a-8623-c40a-00b7-b0b460c4d798@oracle.com> References: <2e83859a-8623-c40a-00b7-b0b460c4d798@oracle.com> Message-ID: Hi Alan, On Mon, Apr 9, 2018 at 5:57 PM, Alan Bateman wrote: > On 09/04/2018 16:50, Thomas Stüfe wrote: > >> So, I found that I can attach with jcmd just fine, just the process >> listing does not work. >> >> I can only attach via pid, not via command name, which I think stems from >> the same error. >> >> Does anyone have any idea? Should I open a bug report? >> > > Is this something to do with the value of java.io.tmpdir? Are the > running VMs using their own temp dir? > > No, this is a very simple setup. On my local machine, I build jdk-hs from the current tip. Then I run a simple java HelloWorld, without any options given. The program just waits on a keystroke. $ ../../openjdk/jdk-hs/output-fastdbg/images/jdk/bin/java HelloWorld I start jcmd from the same image. Again, no options. $ ./images/jdk/bin/jcmd 248472 jdk.jcmd/sun.tools.jcmd.JCmd And I only see the jcmd process itself. Note that in this example I run from cygwin shells, but the same error happens when running from cmd.exe. Also note that I have other java processes running on the same box, e.g. an Eclipse instance using openjdk9. It does not show up in the jcmd process listing either. I can, however, attach to my HelloWorld process via pid: $ ./images/jdk/bin/jcmd 248204 help 248204: The following commands are available: VM.log VM.native_memory .....
But unsurprisingly not via name: $ ./images/jdk/bin/jcmd HelloWorld help Could not find any processes matching : 'HelloWorld' I am pretty sure this used to work on my machine some time ago. Thanks, Thomas -Alan > From yasuenag at gmail.com Tue Apr 10 11:10:16 2018 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Tue, 10 Apr 2018 20:10:16 +0900 Subject: PING: RFR: 8199519: Several GC tests fails with: java.lang.NumberFormatException: Unparseable number: "-" In-Reply-To: <33358f2d-4e01-7ccb-0f06-02b6828fe65b@gmail.com> References: <6755303f-a1a0-da4f-e1e0-a1bcb0c72efd@gmail.com> <7809552d-dfa0-5f26-bd82-c13df7f45f5f@oracle.com> <85853429-a520-1782-40e4-e05776aa639d@oracle.com> <40b04f2e-1d6c-524e-ea4a-08c42fd41ee6@gmail.com> <93a1ffeb-4959-3bdb-cbe3-510c258129b6@oracle.com> <5c1975cd-1080-652e-c23a-abd693cc0095@oracle.com> <33358f2d-4e01-7ccb-0f06-02b6828fe65b@gmail.com> Message-ID: <56804cfb-7b80-a8f0-c866-cda4b36799fd@gmail.com> PING: Could you review it? We need one more reviewer. >>> > http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.03/ Yasumasa On 2018/04/03 21:37, Yasumasa Suenaga wrote: > PING: Could you review it? > This change has been passed Mach5 test. > >>> > http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.03/ > > > Thanks, > > Yasumasa > > > On 2018/03/28 22:38, Stefan Johansson wrote: >> Mach5 testing looks good. >> >> Can someone in the serviceability team do the second review? >> >> Cheers, >> Stefan >> >> On 2018-03-28 13:32, Yasumasa Suenaga wrote: >>> Thanks Stefan, >>> I'm waiting for second reviewer. >>> >>> >>> Yasumasa >>> >>> >>> 2018-03-28 (Wed) 18:36 Stefan Johansson: >>> >>> Hi Yasumasa, >>> >>> Local testing looks good and I've kicked of some additional Mach5 >>> testing that will include these tests on all platforms. >>> >>> Cheers, >>> Stefan >>> >>> On 2018-03-28 06:04, Yasumasa Suenaga wrote: >>>
> Hi Stefan, >>> > >>> > Thank you for sharing your report! >>> > I could reproduce them on my VM. >>> > >>> > I've fixed them in new webrev, and it works fine on my environment. >>> > Could you check again? >>> > >>> > http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.03/ >>> > >>> > >>> > Thanks, >>> > >>> > Yasumasa >>> > >>> > >>> > >>> > 2018-03-28 0:29 GMT+09:00 Stefan Johansson: >>> >> >>> >> On 2018-03-27 16:44, Yasumasa Suenaga wrote: >>> >>> Hi Stefan, >>> >>> >>> >>> On 2018/03/27 22:45, Stefan Johansson wrote: >>> >>>> Hi Yasumasa, >>> >>>> >>> >>>> On 2018-03-27 10:56, Yasumasa Suenaga wrote: >>> >>>>> Hi Stefan, >>> >>>>> >>> >>>>> Thank you for your comment. >>> >>>>> I updated webrev: >>> >>>>> >>> >>>>> webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.01/ >>> >>>> I think the usage of Optional in Expression.setRequired(bool) is a bit >>> >>>> unnecessary. It will create temporary objects and there is no benefit from >>> >>>> just doing two simple if-statements. >>> >>> >>> >>> I fixed it in new webrev: >>> >>> http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.02/ >>> >>> >>> >>> >>> >>>> I also ran this patch (and the one using forcibly) on my single core VM >>> >>>> and realized that this fix will have to include some awk-file updates to >>> >>>> make the test in test/jdk/sun/tools/jstat pass when Serial in chosen as the >>> >>>> default collector. The tests in test/jdk/sun/tools/jstatd/ are fine. >>> >>> >>> >>> Can you share the failure report? >>> >> It relates to all tests that display the the CGC and the CGCT columns, for >>> >> example in jstatGCOutput1.sh: >>> >> S0C S1C S0U S1U EC EU OC OU MC MU >>> >> CCSC CCSU YGC YGCT FGC FGCT CGC CGCT GCT >>> >>
256.0? 254.0? ?0.0? ? 2176.0? ?1025.0 5504.0 920.5? ? 7168.0 >>> ??? >> 6839.7 768.0? 602.8? ? ? ?2? ? 0.007? ?0 0.000? ?- ? ? ? -? ? 0.007 >>> ??? >> >>> ??? >> The awk regex needs to be updated to handle '-' for these tests: >>> ??? >> test: sun/tools/jstat/jstatGcCapacityOutput1.sh >>> ??? >> Failed. Execution failed: exit code 1 >>> ??? >> >>> ??? >> test: sun/tools/jstat/jstatGcMetaCapacityOutput1.sh >>> ??? >> Failed. Execution failed: exit code 1 >>> ??? >> >>> ??? >> test: sun/tools/jstat/jstatGcNewCapacityOutput1.sh >>> ??? >> Failed. Execution failed: exit code 1 >>> ??? >> >>> ??? >> test: sun/tools/jstat/jstatGcOldCapacityOutput1.sh >>> ??? >> Failed. Execution failed: exit code 1 >>> ??? >> >>> ??? >> test: sun/tools/jstat/jstatGcOldOutput1.sh >>> ??? >> Failed. Execution failed: exit code 1 >>> ??? >> >>> ??? >> test: sun/tools/jstat/jstatGcOutput1.sh >>> ??? >> Failed. Execution failed: exit code 1 >>> ??? >> >>> ??? >> >>> ??? >>> If it occurs in jstatClassloadOutput1.sh, it relates to JDK-8173942. >>> ??? >>> >>> ??? >>> >>> ??? >>> Thanks, >>> ??? >>> >>> ??? >>> Yasumasa >>> ??? >>> >>> ??? >>> >>> ??? >>>> Thanks, >>> ??? >>>> Stefan >>> ??? >>>>>? ? ?submit-hs: mach5-one-ysuenaga-JDK-8199519-20180327-0652-16322 >>> ??? >>>>> >>> ??? >>>>> >>> ??? >>>>> Thanks, >>> ??? >>>>> >>> ??? >>>>> Yasumasa >>> ??? >>>>> >>> ??? >>>>> >>> ??? >>>>> >>> ??? >>>>> 2018-03-27 0:03 GMT+09:00 Stefan Johansson >>> ??? >>>>> >: >>> ??? >>>>>> Hi Yasumasa, >>> ??? >>>>>> >>> ??? >>>>>> On 2018-03-22 11:35, Yasumasa Suenaga wrote: >>> ??? >>>>>>> Hi all, >>> ??? >>>>>>> >>> ??? >>>>>>> Please review this change: >>> ??? >>>>>>> >>> ??? >>>>>>>? ? ? JBS: https://bugs.openjdk.java.net/browse/JDK-8199519 >>> ??? >>>>>>> webrev: cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.00/ >>> ??? >>>>>> The fix seems to make things to work as expected. Manually tested it >>> ??? >>>>>> and >>> ??? >>>>>> Mach5 also looks good. >>> ??? >>>>>> >>> ??? 
>>>>>> I have some comments regarding the patch. I think 'forcibly' should be >>> ??? >>>>>> rename to something more descriptive. Naming is never easy but I think >>> ??? >>>>>> 'required' would be better, as in, this column is required and not >>> ??? >>>>>> allowed >>> ??? >>>>>> to print '-'. That would also render the code in >>> ??? >>>>>> ExpressionResolver.java to >>> ??? >>>>>> be: >>> ??? >>>>>>? ? ?return new Literal(isRequired ? 0.0d : Double.NaN); >>> ??? >>>>>> I think that also better explains why we return 0 instead of NaN. >>> ??? >>>>>> >>> ??? >>>>>> I would also like to see the forcibly/required state moved into the >>> ??? >>>>>> Expression it self, that way we don't have to pass it around but can >>> ??? >>>>>> instead >>> ??? >>>>>> do: >>> ??? >>>>>>? ? ?return new Literal(e.isRequired() ? 0.0d : Double.NaN); >>> ??? >>>>>> >>> ??? >>>>>> Thanks, >>> ??? >>>>>> Stefan >>> ??? >>>>>> >>> ??? >>>>>> >>> ??? >>>>>>> After JDK-8153333, some jstat tests are failed because GCT in jstat >>> ??? >>>>>>> output >>> ??? >>>>>>> is dash (-) if garbage collector is not concurrent collector e.g. >>> ??? >>>>>>> Serial GC. >>> ??? >>>>>>> I fixed that GCT can be calculated correctly. >>> ??? >>>>>>> >>> ??? >>>>>>> This change has been tested on Mach5 by Stefan. >>> ??? >>>>>>> >>> ??? >>>>>>> >>> ??? >>>>>>> Thanks, >>> ??? >>>>>>> >>> ??? >>>>>>> Yasumasa >>> ??? >>>>>> >>> >> From amit.sapre at oracle.com Tue Apr 10 11:26:04 2018 From: amit.sapre at oracle.com (Amit Sapre) Date: Tue, 10 Apr 2018 04:26:04 -0700 (PDT) Subject: RFR : JDK-8042215 - javax/management/remote/mandatory/connection/ReconnectTest.java NoSuchObjectException no such object in table In-Reply-To: <9851f5fa-86e5-4ee3-a303-44a90dd934d6@default> References: <9851f5fa-86e5-4ee3-a303-44a90dd934d6@default> Message-ID: <27566165-47f4-4a94-ab88-5144e495cb17@default> Ping. Do review the changes. 
From: Amit Sapre
Sent: Tuesday, April 03, 2018 3:38 PM
To: serviceability-dev at openjdk.java.net
Subject: RFR : JDK-8042215 - javax/management/remote/mandatory/connection/ReconnectTest.java NoSuchObjectException no such object in table

Hello,

Please review changes for refactored test case.

As part of refactoring,
1) Removed iiop & jmxmp protocol related code
2) Added exception handling during connector connection.

Webrev : http://cr.openjdk.java.net/~asapre/webrev/2018/JDK-8042215/webrev.00/
Bug ID : https://bugs.openjdk.java.net/browse/JDK-8042215

Thanks,
Amit
-------------- next part --------------
An HTML attachment was scrubbed...

From andrew_m_leonard at uk.ibm.com Tue Apr 10 12:43:35 2018
From: andrew_m_leonard at uk.ibm.com (Andrew Leonard)
Date: Tue, 10 Apr 2018 13:43:35 +0100
Subject: RFR: Fix race condition in jdwp
In-Reply-To:
References:
Message-ID:

Hi David,
The existing "initComplete" logic in debugInit.c only tracks completion of
the debug main thread's initialization. It neither handles nor needs to wait
for the asynchronous VM initialization, which is reported separately but must
be completed before the debug loop can start processing cmds.
Thanks
Andrew

Andrew Leonard
Java Runtimes Development
IBM Hursley
IBM United Kingdom Ltd
Phone internal: 245913, external: 01962 815913
internet email: andrew_m_leonard at uk.ibm.com

From: David Holmes
To: Andrew Leonard, serviceability-dev at openjdk.java.net
Date: 10/04/2018 02:06
Subject: Re: RFR: Fix race condition in jdwp

Hi Andrew,

On 10/04/2018 2:07 AM, Andrew Leonard wrote:
> Hi,
> We discovered in our testing with OpenJ9 that a race condition can occur
> in the jdwp under certain circumstances, and we were able to force the
> same issue with Hotspot. Normally, the event helper thread suspends all
> threads, then the debug loop in the listener thread receives a command
> to resume. The debugger may deadlock if the debug loop in the listener
> thread starts processing commands (e.g. resume threads) before the event
> helper completes the initialization (and suspends threads).
>
> This patch adds synchronization to ensure the event helper completes the
> initialization sequence before debugger commands are processed.

How does this relate to the existing debugInit_waitInitComplete? I don't
see why we would want two initialization synchronization mechanisms. ??

Thanks,
David

> Please can I find a sponsor for this contribution? Patch below..
>
> Many thanks
>
> Andrew
>
> diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
> --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
> +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -58,6 +58,7 @@
>  static jboolean vmInitialized;
>  static jrawMonitorID initMonitor;
>  static jboolean initComplete;
> +static jboolean VMInitComplete;
>  static jbyte currentSessionID;
>
>  /*
> @@ -617,6 +618,35 @@
>      debugMonitorExit(initMonitor);
>  }
>
> +/*
> + * Signal VM initialization is complete.
> + */
> +void
> +signalVMInitComplete(void)
> +{
> +    /*
> +     * VM Initialization is complete
> +     */
> +    LOG_MISC(("signal VM initialization complete"));
> +    debugMonitorEnter(initMonitor);
> +    VMInitComplete = JNI_TRUE;
> +    debugMonitorNotifyAll(initMonitor);
> +    debugMonitorExit(initMonitor);
> +}
> +
> +/*
> + * Wait for VM initialization to complete.
> + */
> +void
> +debugInit_waitVMInitComplete(void)
> +{
> +    debugMonitorEnter(initMonitor);
> +    while (!VMInitComplete) {
> +        debugMonitorWait(initMonitor);
> +    }
> +    debugMonitorExit(initMonitor);
> +}
> +
>  /* All process exit() calls come from here */
>  void
>  forceExit(int exit_code)
> @@ -672,6 +702,7 @@
>      LOG_MISC(("Begin initialize()"));
>      currentSessionID = 0;
>      initComplete = JNI_FALSE;
> +    VMInitComplete = JNI_FALSE;
>
>      if ( gdata->vmDead ) {
>          EXIT_ERROR(AGENT_ERROR_INTERNAL,"VM dead at initialize() time");
> diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
> --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
> +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 1998, 2015, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -39,4 +39,7 @@
>  void debugInit_exit(jvmtiError, const char *);
>  void forceExit(int);
>
> +void debugInit_waitVMInitComplete(void);
> +void signalVMInitComplete(void);
> +
>  #endif
> diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
> --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
> +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -98,6 +98,7 @@
>      standardHandlers_onConnect();
>      threadControl_onConnect();
>
> +    debugInit_waitVMInitComplete();
>      /* Okay, start reading cmds! */
>      while (shouldListen) {
>          if (!dequeue(&p)) {
> diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
> --- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
> +++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -580,6 +580,7 @@
>          (void)threadControl_suspendThread(command->thread, JNI_FALSE);
>      }
>
> +    signalVMInitComplete();
>      outStream_initCommand(&out, uniqueID(), 0x0,
>                            JDWP_COMMAND_SET(Event),
>                            JDWP_COMMAND(Event, Composite));
>
> Andrew Leonard
> Java Runtimes Development
> IBM Hursley
> IBM United Kingdom Ltd
> Phone internal: 245913, external: 01962 815913
> internet email: andrew_m_leonard at uk.ibm.com
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
-------------- next part --------------
An HTML attachment was scrubbed...

From andrew_m_leonard at uk.ibm.com Tue Apr 10 13:23:54 2018
From: andrew_m_leonard at uk.ibm.com (Andrew Leonard)
Date: Tue, 10 Apr 2018 14:23:54 +0100
Subject: RFR: Fix race condition in jdwp
In-Reply-To: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com>
References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com>
Message-ID:

Hi Serguei,
I don't have access to the bug database to raise one; would you be able to
raise it, please?

Summary: JDWP debugger initialization hangs intermittently
Description: If, during JDWP setup, the VM initialization takes slightly
longer than the main debug initialization thread, a "hang" situation can
occur. This has been seen in testcase test.jck8b.runtime.vm.jdwp and can
also be recreated easily by adding a 10 second sleep to the beginning of the
method eventHelper_reportVMInit() in
src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c.
First seen: JDK8
Recreated: JDK11

Thanks
Andrew

Andrew Leonard
Java Runtimes Development
IBM Hursley
IBM United Kingdom Ltd
Phone internal: 245913, external: 01962 815913
internet email: andrew_m_leonard at uk.ibm.com

From: "serguei.spitsyn at oracle.com"
To: Andrew Leonard, serviceability-dev at openjdk.java.net
Date: 09/04/2018 23:03
Subject: Re: RFR: Fix race condition in jdwp

Hi Andrew,

The patch itself looks reasonable.
However, in order to proceed with it, a bug report with a standalone test
case demonstrating the issue is needed.

Thanks,
Serguei

On 4/9/18 09:07, Andrew Leonard wrote:
> Hi,
> We discovered in our testing with OpenJ9 that a race condition can
> occur in the jdwp under certain circumstances, and we were able to
> force the same issue with Hotspot. Normally, the event helper thread
> suspends all threads, then the debug loop in the listener thread
> receives a command to resume. The debugger may deadlock if the debug
> loop in the listener thread starts processing commands (e.g. resume
> threads) before the event helper completes the initialization (and
> suspends threads).
>
> This patch adds synchronization to ensure the event helper completes
> the initialization sequence before debugger commands are processed.
>
> Please can I find a sponsor for this contribution? Patch below..
>
> Many thanks
>
> Andrew
>
> diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
> --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
> +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -58,6 +58,7 @@
>  static jboolean vmInitialized;
>  static jrawMonitorID initMonitor;
>  static jboolean initComplete;
> +static jboolean VMInitComplete;
>  static jbyte currentSessionID;
>
>  /*
> @@ -617,6 +618,35 @@
>      debugMonitorExit(initMonitor);
>  }
>
> +/*
> + * Signal VM initialization is complete.
> + */
> +void
> +signalVMInitComplete(void)
> +{
> +    /*
> +     * VM Initialization is complete
> +     */
> +    LOG_MISC(("signal VM initialization complete"));
> +    debugMonitorEnter(initMonitor);
> +    VMInitComplete = JNI_TRUE;
> +    debugMonitorNotifyAll(initMonitor);
> +    debugMonitorExit(initMonitor);
> +}
> +
> +/*
> + * Wait for VM initialization to complete.
> + */
> +void
> +debugInit_waitVMInitComplete(void)
> +{
> +    debugMonitorEnter(initMonitor);
> +    while (!VMInitComplete) {
> +        debugMonitorWait(initMonitor);
> +    }
> +    debugMonitorExit(initMonitor);
> +}
> +
>  /* All process exit() calls come from here */
>  void
>  forceExit(int exit_code)
> @@ -672,6 +702,7 @@
>      LOG_MISC(("Begin initialize()"));
>      currentSessionID = 0;
>      initComplete = JNI_FALSE;
> +    VMInitComplete = JNI_FALSE;
>
>      if ( gdata->vmDead ) {
>          EXIT_ERROR(AGENT_ERROR_INTERNAL,"VM dead at initialize() time");
> diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
> --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
> +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 1998, 2015, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -39,4 +39,7 @@
>  void debugInit_exit(jvmtiError, const char *);
>  void forceExit(int);
>
> +void debugInit_waitVMInitComplete(void);
> +void signalVMInitComplete(void);
> +
>  #endif
> diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
> --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
> +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -98,6 +98,7 @@
>      standardHandlers_onConnect();
>      threadControl_onConnect();
>
> +    debugInit_waitVMInitComplete();
>      /* Okay, start reading cmds! */
>      while (shouldListen) {
>          if (!dequeue(&p)) {
> diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
> --- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
> +++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -580,6 +580,7 @@
>          (void)threadControl_suspendThread(command->thread, JNI_FALSE);
>      }
>
> +    signalVMInitComplete();
>      outStream_initCommand(&out, uniqueID(), 0x0,
>                            JDWP_COMMAND_SET(Event),
>                            JDWP_COMMAND(Event, Composite));
>
> Andrew Leonard
> Java Runtimes Development
> IBM Hursley
> IBM United Kingdom Ltd
> Phone internal: 245913, external: 01962 815913
> internet email: andrew_m_leonard at uk.ibm.com
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with
> number 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
-------------- next part --------------
An HTML attachment was scrubbed...
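
[Editorial sketch] The wait/notify handshake that the patch in this thread adds can be illustrated outside the agent roughly as follows. This is a minimal sketch under stated assumptions, not the libjdwp code: POSIX threads stand in for the agent's debugMonitorEnter/Wait/NotifyAll/Exit wrappers, and every name in it (init_lock, vm_init_complete, event_helper, listener, run_demo) is hypothetical.

```c
/* Illustrative only: pthreads replace the agent's debugMonitor* wrappers,
 * and all identifiers here are hypothetical, not taken from libjdwp. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  init_cond = PTHREAD_COND_INITIALIZER;
static bool vm_init_complete  = false;
static int  threads_suspended = 0;  /* stands in for the suspend-all step */

/* Event-helper side: finish the VM-init work, then publish the flag
 * (the role signalVMInitComplete() plays in the patch). */
static void *event_helper(void *arg)
{
    (void)arg;
    threads_suspended = 1;                  /* "suspend threads" */
    pthread_mutex_lock(&init_lock);
    vm_init_complete = true;
    pthread_cond_broadcast(&init_cond);     /* wake every waiter */
    pthread_mutex_unlock(&init_lock);
    return NULL;
}

/* Listener side: block until the flag is set (the role of
 * debugInit_waitVMInitComplete()). The loop guards against spurious
 * wakeups and against the helper having finished before we got here. */
static void wait_vm_init_complete(void)
{
    pthread_mutex_lock(&init_lock);
    while (!vm_init_complete) {
        pthread_cond_wait(&init_cond, &init_lock);
    }
    pthread_mutex_unlock(&init_lock);
}

static void *listener(void *arg)
{
    int *observed = arg;
    wait_vm_init_complete();
    /* Only now would commands such as "resume" be processed. */
    *observed = threads_suspended;
    return NULL;
}

/* Starts the listener first to mimic the problematic ordering; returns
 * what the listener saw once it was allowed to proceed. */
int run_demo(void)
{
    pthread_t l, h;
    int observed = -1;
    int rc;

    vm_init_complete  = false;
    threads_suspended = 0;

    rc = pthread_create(&l, NULL, listener, &observed);
    assert(rc == 0);
    rc = pthread_create(&h, NULL, event_helper, NULL);
    assert(rc == 0);

    pthread_join(l, NULL);
    pthread_join(h, NULL);
    return observed;
}
```

Called from a main() and built with -pthread, run_demo() should return 1 regardless of which thread is scheduled first, because the listener cannot get past the wait until the helper has set the flag under the lock. Whether a second flag is preferable to reusing the existing initComplete mechanism is the design question raised in the review above; the sketch takes no position on it.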

From goetz.lindenmaier at sap.com Tue Apr 10 15:34:28 2018
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 10 Apr 2018 15:34:28 +0000
Subject: RFR (M): 8201247: Various cleanups in the attach framework
In-Reply-To: <14dff9b0cf5a4b888aef1d6452801b57@sap.com>
References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com>
Message-ID: <29475f2084e24093892b4289b1e34f62@sap.com>

Hi Christoph,

thanks for doing this laborious change ... comparing all these files :)

Change looks good, just some minor comments:

You say you are sorting the includes, but in the VirtualMachineImpl.c
files the order is changed, but according to which order? It's
not alphabetical as in other files.

In windows VirtualMachineImpl.c, what was wrong with printing the
last error code?

Best regards,
Goetz.

> -----Original Message-----
> From: serviceability-dev [mailto:serviceability-dev-bounces at openjdk.java.net]
> On Behalf Of Langer, Christoph
> Sent: Friday, 6 April 2018 17:02
> To: serviceability-dev at openjdk.java.net
> Cc: hotspot-dev at openjdk.java.net
> Subject: [CAUTION] RFR (M): 8201247: Various cleanups in the attach
> framework
>
> Hi,
>
> can I please get reviews for a set of clean up changes that I came across
> when doing some integration work.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8201247
>
> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/
>
> Detailed comments about the changes can be found in the bug.
> > > > Thanks & best regards > > Christoph > > > > From martinrb at google.com Tue Apr 10 17:18:06 2018 From: martinrb at google.com (Martin Buchholz) Date: Tue, 10 Apr 2018 10:18:06 -0700 Subject: RFR: 8201327: Make Sensor deeply immutably thread safe In-Reply-To: References: <561c83e7-cbbb-8b7f-87f9-fd5c3f4a7422@oracle.com> <358B88C1-9C17-4FFE-A17A-681E8F23F908@oracle.com> Message-ID: On Mon, Apr 9, 2018 at 7:13 PM, Martin Buchholz wrote: > > > On Mon, Apr 9, 2018 at 1:25 PM, Chris Hegarty > wrote: > >> >> > except maybe the "initially false" and "initially 0" comments as they >> are just documenting the default values and don't add anything. >> >> Yeah, I?ve done this before too, especially when removing default values >> ( as if to prevent them from inadvertently creeping back in ). >> > > If only there was standard concise wording for "intentionally using the > field's default value, which is slightly more efficient and slightly more > theoretically thread-safe, even though slightly less readable" ... > > // VM-initialized to false > I'm committing with the hinting wording // VM-initialized to false -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.plummer at oracle.com Tue Apr 10 17:31:19 2018 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 10 Apr 2018 10:31:19 -0700 Subject: RFR (M): 8201247: Various cleanups in the attach framework In-Reply-To: <29475f2084e24093892b4289b1e34f62@sap.com> References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com> <29475f2084e24093892b4289b1e34f62@sap.com> Message-ID: <50b1511e-9f45-60eb-bbec-9734da10f373@oracle.com> On 4/10/18 8:34 AM, Lindenmaier, Goetz wrote: > Hi Christoph, > > thanks for doing this laborious change ... comparing all these files :) > > Change looks good, just some minor comments: > > You say you are sorting the includes, but in the VirtualMachineImpl.c > files the order is changed, but according to which order? It's > not alphabetical as in other files. 
> > In windows VirtualMachineImpl.c, what was wrong with printing the > last error code? JNU_ThrowIOExceptionWithLastError already includes it. Chris > > Best regards, > Goetz. > > > >> -----Original Message----- >> From: serviceability-dev [mailto:serviceability-dev- >> bounces at openjdk.java.net] On Behalf Of Langer, Christoph >> Sent: Freitag, 6. April 2018 17:02 >> To: serviceability-dev at openjdk.java.net >> Cc: hotspot-dev at openjdk.java.net >> Subject: [CAUTION] RFR (M): 8201247: Various cleanups in the attach >> framework >> >> Hi, >> >> >> >> can I please get reviews for a set of clean up changes that I came across >> when doing some integration work. >> >> >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8201247 >> >> >> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/ >> >> >> >> >> Detailed comments about the changes can be found in the bug. >> >> >> >> Thanks & best regards >> >> Christoph >> >> >> >> From christoph.langer at sap.com Tue Apr 10 20:01:04 2018 From: christoph.langer at sap.com (Langer, Christoph) Date: Tue, 10 Apr 2018 20:01:04 +0000 Subject: RFR (M): 8201247: Various cleanups in the attach framework In-Reply-To: <1497c4c6-d95e-7e25-b503-cc7da9d6d077@oracle.com> References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com> <91d75e2d-47a4-e9ee-5d19-8f3e6dc13428@oracle.com> <1497c4c6-d95e-7e25-b503-cc7da9d6d077@oracle.com> Message-ID: <1e441d12b6ca421f8b6774562c8e30b9@sap.com> Hi Chris, I ran the jtreg tests under hotspot/jtreg/serviceability/attach and jdk/com/sun/tools/attach for the main platforms (Windows, Linux X86_64, mac, solaris and AIX). I also pushed to submit-hs in branch "JDK-8201247" but it seems I have no luck and got no notification mails. May I ask you to check whether the build/test cycle was run and how the results looked like? Please also do your closed testing and let me know the outcome. 
Thanks a lot in advance Christoph > -----Original Message----- > From: Chris Plummer [mailto:chris.plummer at oracle.com] > Sent: Montag, 9. April 2018 20:05 > To: Langer, Christoph ; serviceability- > dev at openjdk.java.net > Cc: hotspot-dev at openjdk.java.net > Subject: Re: RFR (M): 8201247: Various cleanups in the attach framework > > Hi Christoph, > > We have some closed "attach on demand" tests that should be run also. I > can do this for you when you are ready. Please also let me know which > other jtreg tests you have run. > > thanks, > > Chris > > On 4/9/18 12:08 AM, Langer, Christoph wrote: > > Hi Chris, > > > > thanks for looking into this. > > > > As for ArgumentIterator::next, I must admit, I found this patch in our code > base when taking over the code. I believe that an issue would be seen if an > attach operation has 2 or 3 arguments and the first one is NULL/empty. I > guess such a situation can't happen with the attach operations currently > existing in OpenJDK as none of these ops would allow such type of > arguments. However, in our implementation, we have for instance enhanced > the "dump_heap" operation to work with null as first argument where one > usually would specify the desired output file name. We implemented a > mechanism to compute a default filename when the param is left blank. So > we need the fix for that case, I guess. > > > > I'll run the patch through the submission forest now and do some jtreg > testing. > > > > Best regards > > Christoph > > > >> -----Original Message----- > >> From: Chris Plummer [mailto:chris.plummer at oracle.com] > >> Sent: Freitag, 6. April 2018 18:37 > >> To: Langer, Christoph ; serviceability- > >> dev at openjdk.java.net > >> Cc: hotspot-dev at openjdk.java.net > >> Subject: Re: RFR (M): 8201247: Various cleanups in the attach framework > >> > >> Hi Christoph, > >> > >> Can you explain a bit more about "fix handling of null values in > >> ArgumentIterator::next". When does this turn up? 
Is there a test case? > >> > >> Everything else looks good. > >> > >> thanks, > >> > >> Chris > >> > >> On 4/6/18 8:01 AM, Langer, Christoph wrote: > >>> Hi, > >>> > >>> can I please get reviews for a set of clean up changes that I came > >>> across when doing some integration work. > >>> > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8201247 > >>> > >>> > >>> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/ > >>> > >>> > >>> Detailed comments about the changes can be found in the bug. > >>> > >>> Thanks & best regards > >>> > >>> Christoph > >>> From chris.plummer at oracle.com Tue Apr 10 20:08:04 2018 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 10 Apr 2018 13:08:04 -0700 Subject: RFR (M): 8201247: Various cleanups in the attach framework In-Reply-To: <1e441d12b6ca421f8b6774562c8e30b9@sap.com> References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com> <91d75e2d-47a4-e9ee-5d19-8f3e6dc13428@oracle.com> <1497c4c6-d95e-7e25-b503-cc7da9d6d077@oracle.com> <1e441d12b6ca421f8b6774562c8e30b9@sap.com> Message-ID: <630dcc82-81c3-f09f-76db-da4ff7d03363@oracle.com> Hi Christoph, I'm somewhat new to looking at submit-hs test jobs. However I see no indication of there being a submit for JDK-8201247, so I don't think it was run. I'll start my own testing with the last patch you sent out. thanks, Chris On 4/10/18 1:01 PM, Langer, Christoph wrote: > Hi Chris, > > I ran the jtreg tests under hotspot/jtreg/serviceability/attach and jdk/com/sun/tools/attach for the main platforms (Windows, Linux X86_64, mac, solaris and AIX). > > I also pushed to submit-hs in branch "JDK-8201247" but it seems I have no luck and got no notification mails. May I ask you to check whether the build/test cycle was run and how the results looked like? > > Please also do your closed testing and let me know the outcome. > > Thanks a lot in advance > Christoph > >> -----Original Message----- >> From: Chris Plummer [mailto:chris.plummer at oracle.com] >> Sent: Montag, 9. 
April 2018 20:05 >> To: Langer, Christoph ; serviceability- >> dev at openjdk.java.net >> Cc: hotspot-dev at openjdk.java.net >> Subject: Re: RFR (M): 8201247: Various cleanups in the attach framework >> >> Hi Christoph, >> >> We have some closed "attach on demand" tests that should be run also. I >> can do this for you when you are ready. Please also let me know which >> other jtreg tests you have run. >> >> thanks, >> >> Chris >> >> On 4/9/18 12:08 AM, Langer, Christoph wrote: >>> Hi Chris, >>> >>> thanks for looking into this. >>> >>> As for ArgumentIterator::next, I must admit, I found this patch in our code >> base when taking over the code. I believe that an issue would be seen if an >> attach operation has 2 or 3 arguments and the first one is NULL/empty. I >> guess such a situation can't happen with the attach operations currently >> existing in OpenJDK as none of these ops would allow such type of >> arguments. However, in our implementation, we have for instance enhanced >> the "dump_heap" operation to work with null as first argument where one >> usually would specify the desired output file name. We implemented a >> mechanism to compute a default filename when the param is left blank. So >> we need the fix for that case, I guess. >>> I'll run the patch through the submission forest now and do some jtreg >> testing. >>> Best regards >>> Christoph >>> >>>> -----Original Message----- >>>> From: Chris Plummer [mailto:chris.plummer at oracle.com] >>>> Sent: Freitag, 6. April 2018 18:37 >>>> To: Langer, Christoph ; serviceability- >>>> dev at openjdk.java.net >>>> Cc: hotspot-dev at openjdk.java.net >>>> Subject: Re: RFR (M): 8201247: Various cleanups in the attach framework >>>> >>>> Hi Christoph, >>>> >>>> Can you explain a bit more about "fix handling of null values in >>>> ArgumentIterator::next". When does this turn up? Is there a test case? >>>> >>>> Everything else looks good. 
>>>> >>>> thanks, >>>> >>>> Chris >>>> >>>> On 4/6/18 8:01 AM, Langer, Christoph wrote: >>>>> Hi, >>>>> >>>>> can I please get reviews for a set of clean up changes that I came >>>>> across when doing some integration work. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8201247 >>>>> >>>>> >>>>> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/ >>>>> >>>>> >>>>> Detailed comments about the changes can be found in the bug. >>>>> >>>>> Thanks & best regards >>>>> >>>>> Christoph >>>>> From serguei.spitsyn at oracle.com Wed Apr 11 00:02:17 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 10 Apr 2018 17:02:17 -0700 Subject: RFR: Fix race condition in jdwp In-Reply-To: References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> Message-ID: <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com> An HTML attachment was scrubbed... URL: From chris.plummer at oracle.com Wed Apr 11 05:36:54 2018 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 10 Apr 2018 22:36:54 -0700 Subject: RFR (M): 8201247: Various cleanups in the attach framework In-Reply-To: <630dcc82-81c3-f09f-76db-da4ff7d03363@oracle.com> References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com> <91d75e2d-47a4-e9ee-5d19-8f3e6dc13428@oracle.com> <1497c4c6-d95e-7e25-b503-cc7da9d6d077@oracle.com> <1e441d12b6ca421f8b6774562c8e30b9@sap.com> <630dcc82-81c3-f09f-76db-da4ff7d03363@oracle.com> Message-ID: Hi Christoph, I finished testing. No issues. thanks, Chris On 4/10/18 1:08 PM, Chris Plummer wrote: > Hi Christoph, > > I'm somewhat new to looking at submit-hs test jobs. However I see no > indication of there being a submit for JDK-8201247, so I don't think > it was run. I'll start my own testing with the last patch you sent out. 
> > thanks, > > Chris > > On 4/10/18 1:01 PM, Langer, Christoph wrote: >> Hi Chris, >> >> I ran the jtreg tests under hotspot/jtreg/serviceability/attach and >> jdk/com/sun/tools/attach for the main platforms (Windows, Linux >> X86_64, mac, solaris and AIX). >> >> I also pushed to submit-hs in branch "JDK-8201247" but it seems I >> have no luck and got no notification mails. May I ask you to check >> whether the build/test cycle was run and how the results looked like? >> >> Please also do your closed testing and let me know the outcome. >> >> Thanks a lot in advance >> Christoph >> >>> -----Original Message----- >>> From: Chris Plummer [mailto:chris.plummer at oracle.com] >>> Sent: Montag, 9. April 2018 20:05 >>> To: Langer, Christoph ; serviceability- >>> dev at openjdk.java.net >>> Cc: hotspot-dev at openjdk.java.net >>> Subject: Re: RFR (M): 8201247: Various cleanups in the attach framework >>> >>> Hi Christoph, >>> >>> We have some closed "attach on demand" tests that should be run also. I >>> can do this for you when you are ready. Please also let me know which >>> other jtreg tests you have run. >>> >>> thanks, >>> >>> Chris >>> >>> On 4/9/18 12:08 AM, Langer, Christoph wrote: >>>> Hi Chris, >>>> >>>> thanks for looking into this. >>>> >>>> As for ArgumentIterator::next, I must admit, I found this patch in >>>> our code >>> base when taking over the code. I believe that an issue would be >>> seen if an >>> attach operation has 2 or 3 arguments and the first one is >>> NULL/empty. I >>> guess such a situation can't happen with the attach operations >>> currently >>> existing in OpenJDK as none of these ops would allow such type of >>> arguments. However, in our implementation, we have for instance >>> enhanced >>> the "dump_heap" operation to work with null as first argument where one >>> usually would specify the desired output file name. We implemented a >>> mechanism to compute a default filename when the param is left >>> blank. 
So >>> we need the fix for that case, I guess. >>>> I'll run the patch through the submission forest now and do some jtreg >>> testing. >>>> Best regards >>>> Christoph >>>> >>>>> -----Original Message----- >>>>> From: Chris Plummer [mailto:chris.plummer at oracle.com] >>>>> Sent: Freitag, 6. April 2018 18:37 >>>>> To: Langer, Christoph ; serviceability- >>>>> dev at openjdk.java.net >>>>> Cc: hotspot-dev at openjdk.java.net >>>>> Subject: Re: RFR (M): 8201247: Various cleanups in the attach >>>>> framework >>>>> >>>>> Hi Christoph, >>>>> >>>>> Can you explain a bit more about "fix handling of null values in >>>>> ArgumentIterator::next". When does this turn up? Is there a test >>>>> case? >>>>> >>>>> Everything else looks good. >>>>> >>>>> thanks, >>>>> >>>>> Chris >>>>> >>>>> On 4/6/18 8:01 AM, Langer, Christoph wrote: >>>>>> Hi, >>>>>> >>>>>> can I please get reviews for a set of clean up changes that I came >>>>>> across when doing some integration work. >>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8201247 >>>>>> >>>>>> >>>>>> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/ >>>>>> >>>>>> >>>>>> Detailed comments about the changes can be found in the bug. >>>>>> >>>>>> Thanks & best regards >>>>>> >>>>>> Christoph >>>>>> > From goetz.lindenmaier at sap.com Wed Apr 11 05:58:22 2018 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 11 Apr 2018 05:58:22 +0000 Subject: RFR (M): 8201247: Various cleanups in the attach framework In-Reply-To: <50b1511e-9f45-60eb-bbec-9734da10f373@oracle.com> References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com> <29475f2084e24093892b4289b1e34f62@sap.com> <50b1511e-9f45-60eb-bbec-9734da10f373@oracle.com> Message-ID: <0b78469d4d5a441da20572059ae5f216@sap.com> Ah, ok, thanks for the info! Best regards, Goetz. > -----Original Message----- > From: Chris Plummer [mailto:chris.plummer at oracle.com] > Sent: Dienstag, 10. 
April 2018 19:31 > To: Lindenmaier, Goetz ; Langer, Christoph > ; serviceability-dev at openjdk.java.net > Cc: hotspot-dev at openjdk.java.net > Subject: Re: RFR (M): 8201247: Various cleanups in the attach framework > > On 4/10/18 8:34 AM, Lindenmaier, Goetz wrote: > > Hi Christoph, > > > > thanks for doing this laborious change ... comparing all these files :) > > > > Change looks good, just some minor comments: > > > > You say you are sorting the includes, but in the VirtualMachineImpl.c > > files the order is changed, but according to which order? It's > > not alphabetical as in other files. > > > > In windows VirtualMachineImpl.c, what was wrong with printing the > > last error code? > JNU_ThrowIOExceptionWithLastError already includes it. > > Chris > > > > Best regards, > > Goetz. > > > > > > > >> -----Original Message----- > >> From: serviceability-dev [mailto:serviceability-dev- > >> bounces at openjdk.java.net] On Behalf Of Langer, Christoph > >> Sent: Freitag, 6. April 2018 17:02 > >> To: serviceability-dev at openjdk.java.net > >> Cc: hotspot-dev at openjdk.java.net > >> Subject: [CAUTION] RFR (M): 8201247: Various cleanups in the attach > >> framework > >> > >> Hi, > >> > >> > >> > >> can I please get reviews for a set of clean up changes that I came across > >> when doing some integration work. > >> > >> > >> > >> Bug: https://bugs.openjdk.java.net/browse/JDK-8201247 > >> > >> > >> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/ > >> > >> > >> > >> > >> Detailed comments about the changes can be found in the bug. 
> >>
> >> Thanks & best regards
> >>
> >> Christoph

From christoph.langer at sap.com Wed Apr 11 07:44:56 2018
From: christoph.langer at sap.com (Langer, Christoph)
Date: Wed, 11 Apr 2018 07:44:56 +0000
Subject: RFR (M): 8201247: Various cleanups in the attach framework
In-Reply-To: <29475f2084e24093892b4289b1e34f62@sap.com>
References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com> <29475f2084e24093892b4289b1e34f62@sap.com>
Message-ID: <5073b066ce2c43f78a17a5443a578379@sap.com>

Hi Goetz,

thanks for the review.

> You say you are sorting the includes, but in the VirtualMachineImpl.c
> files the order is changed, but according to which order? It's
> not alphabetical as in other files.

It is. However, I have put "subdirs" first. That is, the includes from sys/*
come first (in alphabetical order).

Best regards
Christoph

From christoph.langer at sap.com Wed Apr 11 07:45:34 2018
From: christoph.langer at sap.com (Langer, Christoph)
Date: Wed, 11 Apr 2018 07:45:34 +0000
Subject: RFR (M): 8201247: Various cleanups in the attach framework
In-Reply-To:
References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com> <91d75e2d-47a4-e9ee-5d19-8f3e6dc13428@oracle.com> <1497c4c6-d95e-7e25-b503-cc7da9d6d077@oracle.com> <1e441d12b6ca421f8b6774562c8e30b9@sap.com> <630dcc82-81c3-f09f-76db-da4ff7d03363@oracle.com>
Message-ID: <1fb9b7b44ce043ed91b9837b8a11ffe9@sap.com>

Thanks Chris, I'll push it then today.

> -----Original Message-----
> From: Chris Plummer [mailto:chris.plummer at oracle.com]
> Sent: Wednesday, 11 April 2018 07:37
> To: Langer, Christoph; serviceability-dev at openjdk.java.net
> Cc: hotspot-dev at openjdk.java.net
> Subject: Re: RFR (M): 8201247: Various cleanups in the attach framework
>
> Hi Christoph,
>
> I finished testing. No issues.
>
> thanks,
>
> Chris
>
> On 4/10/18 1:08 PM, Chris Plummer wrote:
> > Hi Christoph,
> >
> > I'm somewhat new to looking at submit-hs test jobs. However I see no
> > indication of there being a submit for JDK-8201247, so I don't think
> > it was run. I'll start my own testing with the last patch you sent out.
> >
> > thanks,
> >
> > Chris

From goetz.lindenmaier at sap.com Wed Apr 11 07:56:22 2018
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Wed, 11 Apr 2018 07:56:22 +0000
Subject: RFR (M): 8201247: Various cleanups in the attach framework
In-Reply-To: <5073b066ce2c43f78a17a5443a578379@sap.com>
References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com> <29475f2084e24093892b4289b1e34f62@sap.com> <5073b066ce2c43f78a17a5443a578379@sap.com>
Message-ID: <8b7165cb8e884c5bb6a8cf872817e527@sap.com>

Hi Christoph,

I'm familiar with the non-system includes in hotspot, where mostly a total
alphabetical ordering is followed, like in
http://hg.openjdk.java.net/jdk/hs/file/0d8ed8b2ac4f/src/hotspot/cpu/x86/c1_CodeStubs_x86.cpp

> It is. However, I have put "subdirs" first. That is, the includes from sys/*

Also, that's not true, see the first file in your webrev, which is sorted as I
would expect:
http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/src/hotspot/os/aix/attachListener_aix.cpp.html
while this sorts as you state:
http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/src/jdk.attach/linux/native/libattach/VirtualMachineImpl.c.html

Is there a different pattern followed in hotspot and jdk coding?

In case you re-sort them (have fun :) ), no new webrev is needed.

Best regards,
Goetz.

> -----Original Message-----
> From: Langer, Christoph
> Sent: Wednesday, 11 April 2018 09:45
> To: Lindenmaier, Goetz; serviceability-dev at openjdk.java.net
> Cc: hotspot-dev at openjdk.java.net
> Subject: RE: RFR (M): 8201247: Various cleanups in the attach framework
>
> Hi Goetz,
>
> thanks for the review.
>
> > You say you are sorting the includes, but in the VirtualMachineImpl.c
> > files the order is changed, but according to which order? It's
> > not alphabetical as in other files.
>
> It is. However, I have put "subdirs" first. That is, the includes from sys/*
> come first (in alphabetical order).
>
> Best regards
> Christoph

From christoph.langer at sap.com Wed Apr 11 08:02:52 2018
From: christoph.langer at sap.com (Langer, Christoph)
Date: Wed, 11 Apr 2018 08:02:52 +0000
Subject: RFR (M): 8201247: Various cleanups in the attach framework
In-Reply-To: <8b7165cb8e884c5bb6a8cf872817e527@sap.com>
References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com> <29475f2084e24093892b4289b1e34f62@sap.com> <5073b066ce2c43f78a17a5443a578379@sap.com> <8b7165cb8e884c5bb6a8cf872817e527@sap.com>
Message-ID: <55c2c1c33c2748aca863332d521b9eb6@sap.com>

Hi Goetz,

I did not aim to sort the includes in the attachListener_.cpp files. I also
just saw that I made a change in attachListener_aix.cpp for some other
merge-related reasons, but it was not correctly sorted before and it is not
correctly sorted afterwards either. This has to be done in a future change...

Best regards
Christoph

From andrew_m_leonard at uk.ibm.com Wed Apr 11 08:29:16 2018
From: andrew_m_leonard at uk.ibm.com (Andrew Leonard)
Date: Wed, 11 Apr 2018 09:29:16 +0100
Subject: RFR: Fix race condition in jdwp
In-Reply-To: <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com>
References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com>
Message-ID:

Thanks Serguei,

In terms of a standalone testcase it is quite tricky: due to the nature of the
issue, which took a lot of investigation to solve, it's very timing dependent
and will only occur randomly. It can be forced, as I indicated below, by adding
a "sleep" in the VMInit report code, but that's not a testcase. The issue was
originally found in our JCK testing for IBMJava8, testcase
test.jck8b.runtime.vm.jdwp, but again it only happened intermittently. As with
"performance" type issues, we're not always going to be able to create a
testcase that will always "fail" if the fix is not present.

Your thoughts?
Cheers Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: Andrew Leonard Cc: serviceability-dev at openjdk.java.net Date: 11/04/2018 01:02 Subject: Re: RFR: Fix race condition in jdwp Hi Andrew, Okay, I'll file a bug on this topic. But do you have a standalone test demonstrating this issue? Thanks, Serguei On 4/10/18 06:23, Andrew Leonard wrote: Hi Serguei, I don't have access to the bug database to raise one, are you able to please? Summary: JDWP debugger initialization hangs intermittently Description: If during the JDWP setup initialization the VM initialization takes slightly longer than the main debug initialization thread a "hang" situation can occur. This has been seen in testcase test.jck8b.runtime.vm.jdwp and can also be recreated easily by adding a 10 second sleep to the beginning of the src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c method eventHelper_reportVMInit() . First seen: JDK8 Recreated: JDK11 Thanks Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: Andrew Leonard , serviceability-dev at openjdk.java.net Date: 09/04/2018 23:03 Subject: Re: RFR: Fix race condition in jdwp Hi Andrew, The patch itself looks reasonable. However, in order to proceed with it, a bug report with a standalone test case demonstrating the issue is needed. Thanks, Serguei On 4/9/18 09:07, Andrew Leonard wrote: > Hi, > We discovered in our testing with OpenJ9 that a race condition can > occur in the jdwp under certain circumstances, and we were able to > force the same issue with Hotspot. Normally, the event helper thread > suspends all threads, then the debug loop in the listener thread > receives a command to resume. 
The debugger may deadlock if the debug > loop in the listener thread starts processing commands (e.g. resume > threads) before the event helper completes the initialization (and > suspends threads). > > This patch adds synchronization to ensure the event helper completes > the initialization sequence before debugger commands are processed. > > Please can I find a sponsor for this contribution? Patch below.. > > Many thanks > > Andrew > > > > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -58,6 +58,7 @@ > static jboolean vmInitialized; > static jrawMonitorID initMonitor; > static jboolean initComplete; > +static jboolean VMInitComplete; > static jbyte currentSessionID; > > /* > @@ -617,6 +618,35 @@ > debugMonitorExit(initMonitor); > } > > +/* > + * Signal VM initialization is complete. > + */ > +void > +signalVMInitComplete(void) > +{ > + /* > + * VM Initialization is complete > + */ > + LOG_MISC(("signal VM initialization complete")); > + debugMonitorEnter(initMonitor); > + VMInitComplete = JNI_TRUE; > + debugMonitorNotifyAll(initMonitor); > + debugMonitorExit(initMonitor); > +} > + > +/* > + * Wait for VM initialization to complete. 
> + */ > +void > +debugInit_waitVMInitComplete(void) > +{ > + debugMonitorEnter(initMonitor); > + while (!VMInitComplete) { > + debugMonitorWait(initMonitor); > + } > + debugMonitorExit(initMonitor); > +} > + > /* All process exit() calls come from here */ > void > forceExit(int exit_code) > @@ -672,6 +702,7 @@ > LOG_MISC(("Begin initialize()")); > currentSessionID = 0; > initComplete = JNI_FALSE; > + VMInitComplete = JNI_FALSE; > > if ( gdata->vmDead ) { > EXIT_ERROR(AGENT_ERROR_INTERNAL,"VM dead at initialize() time"); > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2015, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -39,4 +39,7 @@ > void debugInit_exit(jvmtiError, const char *); > void forceExit(int); > > +void debugInit_waitVMInitComplete(void); > +void signalVMInitComplete(void); > + > #endif > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
> * > * This code is free software; you can redistribute it and/or modify it > @@ -98,6 +98,7 @@ > standardHandlers_onConnect(); > threadControl_onConnect(); > > + debugInit_waitVMInitComplete(); > /* Okay, start reading cmds! */ > while (shouldListen) { > if (!dequeue(&p)) { > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -580,6 +580,7 @@ > (void)threadControl_suspendThread(command->thread, JNI_FALSE); > } > > + signalVMInitComplete(); > outStream_initCommand(&out, uniqueID(), 0x0, > JDWP_COMMAND_SET(Event), > JDWP_COMMAND(Event, Composite)); > > > > Andrew Leonard > Java Runtimes Development > IBM Hursley > IBM United Kingdom Ltd > Phone internal: 245913, external: 01962 815913 > internet email: andrew_m_leonard at uk.ibm.com > > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with > number 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... 
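
For readers following the patch above: the handshake it adds is the usual monitor pattern, where one thread sets a flag under a lock and notifies, and the other waits in a loop until the flag is set. The following is a minimal standalone sketch of that pattern, not the libjdwp code. It assumes POSIX threads in place of the JVMTI raw monitor (initMonitor), and every name in it (init_lock, vm_init_complete, event_helper, listener, run_handshake_demo) is hypothetical. The short usleep only stands in for a slow VMInit report.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-ins for initMonitor and VMInitComplete in the patch. */
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  init_cond = PTHREAD_COND_INITIALIZER;
static bool vm_init_complete = false;

/* Simulated debuggee state: true once the helper has "suspended" threads. */
static bool threads_suspended = false;

/* Counterpart of signalVMInitComplete(): set the flag and wake all waiters. */
static void signal_vm_init_complete(void) {
    pthread_mutex_lock(&init_lock);
    vm_init_complete = true;
    pthread_cond_broadcast(&init_cond);
    pthread_mutex_unlock(&init_lock);
}

/* Counterpart of debugInit_waitVMInitComplete(): loop guards against
 * spurious wakeups and against the signal having been sent already. */
static void wait_vm_init_complete(void) {
    pthread_mutex_lock(&init_lock);
    while (!vm_init_complete) {
        pthread_cond_wait(&init_cond, &init_lock);
    }
    pthread_mutex_unlock(&init_lock);
}

/* Simulated event helper: slow VM init, then suspend, then signal. */
static void *event_helper(void *arg) {
    (void)arg;
    usleep(50 * 1000);            /* pretend the VMInit report is slow */
    threads_suspended = true;     /* "suspend all threads" */
    signal_vm_init_complete();
    return NULL;
}

/* Simulated debug loop: must not process "resume" before init is done. */
static void *listener(void *arg) {
    int *resume_saw_suspended = arg;
    wait_vm_init_complete();
    *resume_saw_suspended = threads_suspended ? 1 : 0;
    threads_suspended = false;    /* "resume" */
    return NULL;
}

/* Returns 1 if the resume ran only after the suspend had happened. */
int run_handshake_demo(void) {
    pthread_t helper, debug_loop;
    int ok = 0;
    int rc;

    vm_init_complete = false;
    threads_suspended = false;

    rc = pthread_create(&debug_loop, NULL, listener, &ok);
    assert(rc == 0);
    rc = pthread_create(&helper, NULL, event_helper, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(helper, NULL);
    pthread_join(debug_loop, NULL);
    return ok;
}
```

Calling run_handshake_demo() from a main() should return 1, because the listener's simulated resume only runs once the helper has finished suspending. Without the wait, the resume could run first, which is the ordering the description in this thread associates with the hang.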

From serguei.spitsyn at oracle.com Wed Apr 11 08:57:41 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Wed, 11 Apr 2018 01:57:41 -0700
Subject: RFR: Fix race condition in jdwp
In-Reply-To:
References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com>
Message-ID: <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com>

An HTML attachment was scrubbed...

From andrew_m_leonard at uk.ibm.com Wed Apr 11 13:33:00 2018
From: andrew_m_leonard at uk.ibm.com (Andrew Leonard)
Date: Wed, 11 Apr 2018 14:33:00 +0100
Subject: RFR: Fix race condition in jdwp
In-Reply-To: <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com>
References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com> <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com>
Message-ID:

Hi Serguei,

Thank you for raising the bug. I had a chat with one of my colleagues who could
recreate it, and it's probably related to the handshaking that is done in the
particular scenario. So with the JCK harness:

  com.sun.jck.lib.ExecJCKTestOtherJVMCmd
  LD_LIBRARY_PATH=/javatest/lib/jck/jck8b/natives/linux_x86-64
  /projects/jck/jdwp/j2sdk-image/bin/java
  -Xdump:system:none
  -Xdump:system:events=gpf+abort+traceassert+corruptcache
  -Xdump:snap:none
  -Xdump:snap:events=gpf+abort+traceassert+corruptcache
  -Xdump:java:none
  -Xdump:java:events=gpf+abort+traceassert+corruptcache
  -Xdump:heap:none
  -Xdump:heap:events=gpf+abort+traceassert+corruptcache
  -Xfuture
  -agentlib:jdwp=server=y,transport=dt_socket,address=localhost:35000,suspend=y
  -classpath /javatest/lib/jck/JCK8b-b03/JCK-runtime-8b/classes
  -Djava.security.policy=/javatest/lib/jck/JCK8b-b03/JCK-runtime-8b/lib/jck.policy
  javasoft.sqe.jck.lib.jpda.jdwp.DebuggeeLoader
  -waittime=600
  -msgSwitch=ub1604x64vm10:38636
  -componentName=ArrayReference.GetValues.getvalues002

Note that the JCK test harness starts the target process, attaches to it, and
sends the resume command in a very short time with no handshaking.

That may not help, but hopefully it helps explain things a bit? It's the timing
of the resume command during the test that is crucial: resuming before the VM
initialization is complete will trigger it.

Thanks
Andrew

Andrew Leonard
Java Runtimes Development
IBM Hursley
IBM United Kingdom Ltd
Phone internal: 245913, external: 01962 815913
internet email: andrew_m_leonard at uk.ibm.com

From: "serguei.spitsyn at oracle.com"
To: Andrew Leonard
Cc: serviceability-dev at openjdk.java.net
Date: 11/04/2018 09:57
Subject: Re: RFR: Fix race condition in jdwp

Hi Andrew,

I've filed the bug:
https://bugs.openjdk.java.net/browse/JDK-8201409

Also, this is a webrev with your patch:
http://cr.openjdk.java.net/~sspitsyn/webrevs/2018/8201409-jdwp-initsync.ibm.1/

I agree that creating a standalone test is tricky here.
I've added usleep(10000) into the eventHelper_reportVMInit() and ran the JTreg
com/sun/jdi tests with my JDK build.
However, none of the tests failed with the failure mode you described.
So I'm a little puzzled.
I suspect that some specific debugLoop commands were used in your scenario.
It is still possible that I've missed something here.
I will try to double check everything.

Thanks,
Serguei

From karen.kinnear at oracle.com Wed Apr 11 16:49:34 2018
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Wed, 11 Apr 2018 12:49:34 -0400
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To:
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>
Message-ID: <0A5AD1F8-EBDD-47EA-9741-1AE6B1B3971A@oracle.com>

Vlad Kozlov - JC asked if he could add the Graal support as an incremental change. Serguei and I suggested he file a follow-on
RFE and put his name on it. That is actually your call - so please l

JC,

Thank you for adding the thread support and multi-agent support and tests.
Serguei and the serviceability team can review those. I have a couple of design questions before getting to the detailed code review: 1. jvmtiExport.cpp You have simplified the JvmtiObjectAllocEventCollector to assume there is only a single object. Do you have a test that allocates a multi-dimensional array? I would expect that to have multiple subarrays - and need the logic that you removed. *** Please add a test in which you allocate a multi-dimensional array - and try it with your earlier version which will show the full set of allocated objects 2. Tests - didn't read them all - just ran into a hardcoded "10% error ensures a sanity test without becoming flaky" - do check with Serguei on this - that looks like a potential future test failure thanks, Karen > On Apr 9, 2018, at 1:48 AM, JC Beyler wrote: > > Hi all, > > Here is the new webrev: > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.12/ > > with the incremental here: > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11_12/ > > After banging my head against the interactions of the runtime and the GC, I finally got the next features up and ready: > - The multi-agent support (with a new test) > - The thread support (with a new test) > - I have an assert at the moment of allocation to check there is a sampled collector available (though could be disabled) > > - This webrev puts the collector at the collectedHeap.inline.hpp level (so removing all of the outer collectors) > - Note there is one current caveat: if the agent requests the VMObjectAlloc event, the sampler defers to that event due to a limitation in its implementation (ie: I am not convinced I can safely send out that event with the VM collector enabled, I'll happily white board that). Please work that one out with Serguei and the serviceability folks. > - I updated the jvmti.xml accordingly btw. > > Let me know what you think!
> Jc > > > On Fri, Apr 6, 2018 at 4:12 PM JC Beyler wrote: > Hi Karen, > > Let me inline my answers, it will probably be easier :) > > On Fri, Apr 6, 2018 at 2:40 PM Karen Kinnear wrote: > JC, > > Thank you for the updates - really glad you are including the compiler folks. I reviewed the version before this one, so ignore > any comments you've already covered (although I did peek at the latest) > > 1. JDK-8194905 CSR - could you please delete attachments that are not current. It is a bit confusing right now. > I have been looking at jvmti_event6.html. I am assuming the rest are obsolete and could be removed please. > > I tried but it does not allow me to do so. It seems that someone with administrative rights has to do it :-(. That is why I had to resort to this? I couldn't do it either ... > > > 2. In jvmti_event6.html under Sampled Object Allocation, there is a link to "Heap Sampling Monitoring System". > It takes me to the top of the page - seems like something is missing in defining it? > > So the Heap Sampling Monitoring System used to have more methods. It made sense to have them in a separate category. I now have moved it to the memory category to be consistent and grouped there. I also removed that link btw. Thanks. > > > > 3. Scope of memory allocation tracking > > I am struggling to understand the extent of memory allocation tracking that you are looking for (probably want to > clarify in the JEP and CSR once we work this through). > > e.g. Heap Sampler vs. JVMTI VMObjectAllocEvent > So the current jvmtiVMObjectAllocEvent says: > > Sent when a method causes the VM to allocate an Object visible to Java > and allocation not detectable by other instrumentation mechanisms. > Generally detect by instrumenting bytecodes of allocating methods > JNI - use JNI function interception > e.g. Reflection: java.lang.Class.newInstance() > e.g. VM intrinsics > > comment: > Not generated due to bytecodes - e.g.
new and newarray VM instructions > Not allocation due to JNI: e.g. AllocObject > NOT VM internal objects > NOT allocations during VM init > > So from the JEP I can't tell the intended scope of the new event - is this intended to cover all heap allocation? > bytecodes > JVM_* > JNI_* > internal VM objects > other? (I'm not sure what other there are) > - I presume not allocations during VM init - since sent only during live phase > > Yes exactly, as much as possible, I am aiming to cover all heap allocations. Mostly though and in practice, I think we care about bytecodes and to a lesser extent JNI. Being independent of why the memory is being allocated is probably even better: this thread allocated Y, no matter where or why. > > > OR - is the primary goal to cover allocation for bytecodes so folks can skip instrumentation? > > Yes that is the primary goal. > > OR - do you want to get performance numbers and see what is low enough overhead before deciding? > > I think it is the same, the system is relatively in place and my measurements indicate 0% overhead with the system off, 1% with it on and an empty user callback, and 3% for a naive implementation tracking live/GC'd objects. > > > 4. The design question is where to put the collectors in the source base - and that of course strongly depends on > the scope of the information you want to collect, and on the performance overhead we are willing to incur. > > Very true. > > > I was trying to figure out a way to put the collectors farther down the call stack so as to both catch more > cases and to reduce the maintenance burden - i.e. if you were to add a new code generator, e.g. Graal - > if it were to go through an existing interface, that might be a place to already have a collector.
> > I do not know the Graal sources - I did look at jvmci/jvmciRuntime.cpp - and it appears that there > are calls to instanceKlass::new_instance, oopFactory::new_typeArray/new_objArray and ArrayKlass::multi_allocate, > so one possibility would be to put hooks in those calls which would catch many? (I did not do a thorough search) > of the slowpath calls for the bytecodes, and then check the fast paths in detail. > > I'll come to a major issue with the collector and its placement in the next paragraph. Still not clear on why you did not move the collectors into instanceKlass::new_instance and oopFactory::new_typeArray/new_objArray and ArrayKlass::multi_allocate. > > > I had wondered if it made sense to move the hooks even farther down, into CollectedHeap::obj_allocate and array_allocate. > I do not think so. First reason is that for multidimensional arrays, ArrayKlass::multi_allocate the outer dimension array would > have an event before storing the inner sub-arrays and I don't think we want that exposed, so that won't work for arrays. > > So the major difficulty is that the steps of collection do this: > > - An object gets allocated and is decided to be sampled > - The original pointer placement (where it resides originally in memory) is passed to the collector > - Now one important thing of note: > (a) In the VM code, until the point where the oop is going to be returned, GC is not yet aware of it > (b) so the collector can't yet send it out to the user via JVMTI; otherwise the agent could, for example, put a weak reference on it > > I'm a bit fuzzy on this and maybe it's just that there would be more heavy lifting to make this possible but my initial tests seem to show problems when attempting this in the obj_allocate area. Not sure what you are seeing here - Let me state it the way I understand it.
1) You can collect the object into internal metadata at allocation point - which you already had Note: see comment in jvmtiExport.cpp: // In the case of the sampled object collector, we don't want to perform the // oops_do because the object in the collector is still stored in registers // on the VM stack - so GC will find these objects as roots, once we allow GC to run, which should be after the header is initialized Totally agree with you that you cannot post the event in the source code in which you allocate the memory - keep reading 2) event posting: - you want to ensure that the object has been fully initialized - the object needs to have the header set up - not just the memory allocated - so that applies to all objects (and that is done in a caller of the allocation code - so it can't be done at the location which does the memory allocation) - the example I pointed out was the multianewarray case - all the subarrays need to be allocated - please add a test case for multi-array, so as you experiment with where to post the event, you ensure that you can access the subarrays (e.g. a 3D array of length 5 has 5 2D arrays as subarrays) - prior to setting up the object header information, GC would not know about the object - was this by chance the issue you ran into? 3) event posting - when you post the event to JVMTI - in JvmtiObjectAllocEventMark: sets _jobj (object)to_jobject(obj), which creates JNIHandles::make_local(_thread, obj) > > > The second reason is that I strongly suspect the scope you want is bytecodes only. I think once you have added hooks > to all the fast paths and slow paths that this will be pushing the performance overhead constraints you proposed and > you won't want to see e.g. internal allocations. > > Yes agreed, allocations from bytecodes are mostly our concern generally :) > > > But I think you need to experiment with the set of allocations (or possible alternative sets of allocations) you want recorded.
> > The hooks I see today include: > Interpreter: (looking at x86 as a sample) > - slowpath in InterpreterRuntime > - fastpath tlab allocation - your new threshold check handles that > > Agreed > > - allow_shared_alloc (GC specific): for _new isn't handled > > Where is that exactly? I can check why we are not catching it? > > > C1 > I don't see changes in c1_Runtime.cpp > note: you also want to look for the fast path > > I added the calls to c1_Runtime in the latest webrev, but was still going through testing before pushing it out. I had waited on this one a bit. Fast path would be handled by the threshold check no? > > > C2: changes in opto/runtime.cpp for slow path > did you also catch the fast path? > > Fast path gets handled by the same threshold check, no? Perhaps I've missed something (very likely)? > > > 3. Performance - > After you get all the collectors added - you need to rerun the performance numbers. > > Agreed :) > > > thanks, > Karen > >> On Apr 5, 2018, at 2:15 PM, JC Beyler wrote: >> >> Thanks Boris and Derek for testing it. >> >> Yes I was trying to get a new version out that had the tests ported as well but got sidetracked while trying to add tests and two new features.
>> >> Here is the incremental webrev: >> >> Here is the full webrev: >> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/ >> >> Basically, the new tests assert this: >> - Only one agent can currently ask for the sampling, I'm currently seeing if I can push to a next webrev the multi-agent support to start doing a code freeze on this one >> - The event is not thread-enabled, meaning like the VMObjectAllocationEvent, it's an all or nothing event; same as the multi-agent, I'm going to see if a future webrev to add the support is a better idea to freeze this webrev a bit >> >> There was another item that I added here and I'm unsure this webrev is stable in debug mode: I added an assertion system to ascertain that all paths leading to a TLAB slow path (and hence a sampling point) have a sampling collector ready to post the event if a user wants it. This might break a few thing in debug mode as I'm working through the kinks of that as well. However, in release mode, this new webrev passes all the tests in hotspot/jtreg/serviceability/jvmti/HeapMonitor. >> >> Let me know what you think, >> Jc >> >> On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich > wrote: >> Hi JC, >> >> I have just checked on arm32: your patch compiles and runs ok. >> >> As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not >> correspond to actual library name: libHeapMonitorTest.c -> >> libHeapMonitorTest.so >> >> Boris >> >> On 04.04.2018 01:54, White, Derek wrote: >> > Thanks JC, >> > >> > New patch applies cleanly. Compiles and runs (simple test programs) on >> > aarch64. 
>> > >> > * Derek >> > >> > *From:* JC Beyler [mailto:jcbeyler at google.com] >> > *Sent:* Monday, April 02, 2018 1:17 PM >> > *To:* White, Derek >> > *Cc:* Erik Österlund; >> > serviceability-dev at openjdk.java.net; hotspot-compiler-dev >> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >> > >> > Hi Derek, >> > >> > I know there were a few things that went in that provoked a merge >> > conflict. I worked on it and got it up to date. Sadly my lack of >> > knowledge makes it a full rebase instead of keeping all the history. >> > However, with a newly cloned jdk/hs you should now be able to use: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/ >> > >> > The change you are referring to was done with the others so perhaps you >> > were unlucky and I forgot it in a webrev and fixed it in another? I >> > don't know but it's been there and I checked, it is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html >> > >> > I double checked that tlab_end_offset no longer appears in any >> > architecture (as far as I can tell :)). >> > >> > Thanks for testing and let me know if you run into any other issues! >> > >> > Jc >> > >> > On Fri, Mar 30, 2018 at 4:24 PM White, Derek wrote: >> > >> > Hi Jc, >> > >> > I've been having trouble getting your patch to apply correctly. I >> > may have based it on the wrong version. >> > >> > In any case, I think there's a missing update to >> > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), >> > where "JavaThread::tlab_end_offset()" should become >> > "JavaThread::tlab_current_end_offset()". >> > >> > This should correspond to the other port's changes in >> > templateTable_.cpp files. >> > >> > Thanks!
>> > - Derek >> > >> > *From:* hotspot-compiler-dev >> > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net >> > >] *On Behalf >> > Of *JC Beyler >> > *Sent:* Wednesday, March 28, 2018 11:43 AM >> > *To:* Erik ?sterlund >> > >> >> > *Cc:* serviceability-dev at openjdk.java.net >> > >; hotspot-compiler-dev >> > >> > >> >> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >> > >> > Hi all, >> > >> > I've been working on deflaking the tests mostly and the wording in >> > the JVMTI spec. >> > >> > Here is the two incremental webrevs: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ >> > >> > Here is the total webrev: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ >> > >> > Here are the notes of this change: >> > >> > - Currently the tests pass 100 times in a row, I am working on >> > checking if they pass 1000 times in a row. >> > >> > - The default sampling rate is set to 512k, this is what we use >> > internally and having a default means that to enable the sampling >> > with the default, the user only has to do a enable event/disable >> > event via JVMTI (instead of enable + set sample rate). >> > >> > - I deprecated the code that was handling the fast path tlab >> > refill if it happened since this is now deprecated >> > >> > - Though I saw that Graal is still using it so I have to see >> > what needs to be done there exactly >> > >> > Finally, using the Dacapo benchmark suite, I noted a 1% overhead for >> > when the event system is turned on and the callback to the native >> > agent is just empty. I got a 3% overhead with a 512k sampling rate >> > with the code I put in the native side of my tests. 
>> > >> > Thanks and comments are appreciated, >> > >> > Jc >> > >> > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler >> > >> wrote: >> > >> > Hi all, >> > >> > The incremental webrev update is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ >> > >> > The full webrev is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ >> > >> > Major change here is: >> > >> > - I've removed the heapMonitoring.cpp code in favor of just >> > having the sampling events as per Serguei's request; I still >> > have to do some overhead measurements but the tests prove the >> > concept can work >> > >> > - Most of the tlab code is unchanged, the only major >> > part is that now things get sent off to event collectors when >> > used and enabled. >> > >> > - Added the interpreter collectors to handle interpreter >> > execution >> > >> > - Updated the name from SetTlabHeapSampling to >> > SetHeapSampling to be more generic >> > >> > - Added a mutex for the thread sampling so that we can >> > initialize an internal static array safely >> > >> > - Ported the tests from the old system to this new one >> > >> > I've also updated the JEP and CSR to reflect these changes: >> > >> > https://bugs.openjdk.java.net/browse/JDK-8194905 >> > >> > https://bugs.openjdk.java.net/browse/JDK-8171119 >> > >> > In order to make this have some forward progress, I've removed >> > the heap sampling code entirely and now rely entirely on the >> > event sampling system. The tests reflect this by using a >> > simplified implementation of what an agent could do: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c >> > >> > (Search for anything mentioning event_storage). >> > >> > I have not taken the time to port the whole code we had >> > originally in heapMonitoring to this. 
I hesitate only because >> > that code was in C++, I'd have to port it to C and this is for >> > tests so perhaps what I have now is good enough? >> > >> > As far as testing goes, I've ported all the relevant tests and >> > then added a few: >> > >> > - Turning the system on/off >> > >> > - Testing using various GCs >> > >> > - Testing using the interpreter >> > >> > - Testing the sampling rate >> > >> > - Testing with objects and arrays >> > >> > - Testing with various threads >> > >> > Finally, as overhead goes, I have the numbers of the system off >> > vs a clean build and I have 0% overhead, which is what we'd >> > want. This was using the Dacapo benchmarks. I am now preparing >> > to run a version with the events on using dacapo and will report >> > back here. >> > >> > Any comments are welcome :) >> > >> > Jc >> > >> > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler >> > >> wrote: >> > >> > Hi all, >> > >> > I apologize for the delay but I wanted to add an event >> > system and that took a bit longer than expected and I also >> > reworked the code to take into account the deprecation of >> > FastTLABRefill. >> > >> > This update has four parts: >> > >> > A) I moved the implementation from Thread to >> > ThreadHeapSampler inside of Thread. Would you prefer it as a >> > pointer inside of Thread or like this works for you? Second >> > question would be would you rather have an association >> > outside of Thread altogether that tries to remember when >> > threads are live and then we would have something like: >> > >> > ThreadHeapSampler::get_sampling_size(this_thread); >> > >> > I worry about the overhead of this but perhaps it is not too >> > too bad? >> > >> > B) I also have been working on the Allocation event system >> > that sends out a notification at each sampled event. This >> > will be practical when wanting to do something at the >> > allocation point. 
I'm also looking at if the whole >> > heapMonitoring code could not reside in the agent code and >> > not in the JDK. I'm not convinced but I'm talking to Serguei >> > about it to see/assess :) >> > >> > - Also added two tests for the new event subsystem >> > >> > C) Removed the slow_path fields inside the TLAB code since >> > now FastTLABRefill is deprecated >> > >> > D) Updated the JVMTI documentation and specification for the >> > methods. >> > >> > So the incremental webrev is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ >> > >> > and the full webrev is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 >> > >> > I believe I have updated the various JIRA issues that track >> > this :) >> > >> > Thanks for your input, >> > >> > Jc >> > >> > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler >> > >> wrote: >> > >> > Hi Erik, >> > >> > I inlined my answers, which the last one seems to answer >> > Robbin's concerns about the same thing (adding things to >> > Thread). >> > >> > On Wed, Feb 14, 2018 at 2:51 AM, Erik ?sterlund >> > >> > >> wrote: >> > >> > Hi JC, >> > >> > Comments are inlined below. >> > >> > On 2018-02-13 06:18, JC Beyler wrote: >> > >> > Hi Erik, >> > >> > Thanks for your answers, I've now inlined my own >> > answers/comments. >> > >> > I've done a new webrev here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ >> > > >> > >> > The incremental is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ >> > > >> > >> > Note to all: >> > >> > - I've been integrating changes from >> > Erin/Serguei/David comments so this webrev >> > incremental is a bit an answer to all comments >> > in one. I apologize for that :) >> > >> > On Mon, Feb 12, 2018 at 6:05 AM, Erik ?sterlund >> > >> > >> wrote: >> > >> > Hi JC, >> > >> > Sorry for the delayed reply. 
>> > >> > Inlined answers: >> > >> > >> > >> > On 2018-02-06 00:04, JC Beyler wrote: >> > >> > Hi Erik, >> > >> > (Renaming this to be folded into the >> > newly renamed thread :)) >> > >> > First off, thanks a lot for reviewing >> > the webrev! I appreciate it! >> > >> > I updated the webrev to: >> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ >> > > >> > >> > And the incremental one is here: >> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ >> > > >> > >> > It contains: >> > - The change for since from 9 to 11 for >> > the jvmti.xml >> > - The use of the OrderAccess for initialized >> > - Clearing the oop >> > >> > I also have inlined my answers to your >> > comments. The biggest question >> > will come from the multiple *_end >> > variables. A bit of the logic there >> > is due to handling the slow path refill >> > vs fast path refill and >> > checking that the rug was not pulled >> > underneath the slowpath. I >> > believe that a previous comment was that >> > TlabFastRefill was going to >> > be deprecated. >> > >> > If this is true, we could revert this >> > code a bit and just do a : if >> > TlabFastRefill is enabled, disable this. >> > And then deprecate that when >> > TlabFastRefill is deprecated. >> > >> > This might simplify this webrev and I >> > can work on a follow-up that >> > either: removes TlabFastRefill if Robbin >> > does not have the time to do >> > it or add the support to the assembly >> > side to handle this correctly. >> > What do you think? >> > >> > I support removing TlabFastRefill, but I >> > think it is good to not depend on that >> > happening first. 
>> > >> > >> > I'm slowly pushing on the FastTLABRefill >> > (https://bugs.openjdk.java.net/browse/JDK-8194084 ), >> > I agree on keeping both separate for now though >> > so that we can think of both differently >> > >> > Now, below, inlined are my answers: >> > >> > On Fri, Feb 2, 2018 at 8:44 AM, Erik >> > ?sterlund >> > >> > >> wrote: >> > >> > Hi JC, >> > >> > Hope I am reviewing the right >> > version of your work. Here goes... >> > >> > src/hotspot/share/gc/shared/collectedHeap.inline.hpp: >> > >> > 159 >> > AllocTracer::send_allocation_outside_tlab(klass, result, size * >> > HeapWordSize, THREAD); >> > 160 >> > 161 >> > THREAD->tlab().handle_sample(THREAD, result, size); >> > 162 return result; >> > 163 } >> > >> > Should not call tlab()->X without >> > checking if (UseTLAB) IMO. >> > >> > Done! >> > >> > >> > More about this later. >> > >> > src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: >> > >> > So first of all, there seems to >> > quite a few ends. There is an "end", >> > a "hard >> > end", a "slow path end", and an >> > "actual end". Moreover, it seems >> > like the >> > "hard end" is actually further away >> > than the "actual end". So the "hard end" >> > seems like more of a "really >> > definitely actual end" or something. >> > I don't >> > know about you, but I think it looks >> > kind of messy. In particular, I don't >> > feel like the name "actual end" >> > reflects what it represents, >> > especially when >> > there is another end that is behind >> > the "actual end". >> > >> > 413 HeapWord* >> > ThreadLocalAllocBuffer::hard_end() { >> > 414 // Did a fast TLAB refill >> > occur? >> > 415 if (_slow_path_end != _end) { >> > 416 // Fix up the actual end >> > to be now the end of this TLAB. >> > 417 _slow_path_end = _end; >> > 418 _actual_end = _end; >> > 419 } >> > 420 >> > 421 return _actual_end + >> > alignment_reserve(); >> > 422 } >> > >> > I really do not like making getters >> > unexpectedly have these kind of side >> > effects. 
It is not expected that >> > when you ask for the "hard end", you >> > implicitly update the "slow path >> > end" and "actual end" to new values. >> > >> > As I said, a lot of this is due to the >> > FastTlabRefill. If I make this >> > not supporting FastTlabRefill, this goes >> > away. The reason the system >> > needs to update itself at the get is >> > that you only know at that get if >> > things have shifted underneath the tlab >> > slow path. I am not sure of >> > really better names (naming is hard!), >> > perhaps we could do these >> > names: >> > >> > - current_tlab_end // Either the >> > allocated tlab end or a sampling point >> > - last_allocation_address // The end of >> > the tlab allocation >> > - last_slowpath_allocated_end // In >> > case a fast refill occurred the >> > end might have changed, this is to >> > remember slow vs fast past refills >> > >> > the hard_end method can be renamed to >> > something like: >> > tlab_end_pointer() // The end of >> > the lab including a bit of >> > alignment reserved bytes >> > >> > Those names sound better to me. Could you >> > please provide a mapping from the old names >> > to the new names so I understand which one >> > is which please? >> > >> > This is my current guess of what you are >> > proposing: >> > >> > end -> current_tlab_end >> > actual_end -> last_allocation_address >> > slow_path_end -> last_slowpath_allocated_end >> > hard_end -> tlab_end_pointer >> > >> > Yes that is correct, that was what I was proposing. >> > >> > I would prefer this naming: >> > >> > end -> slow_path_end // the end for taking a >> > slow path; either due to sampling or refilling >> > actual_end -> allocation_end // the end for >> > allocations >> > slow_path_end -> last_slow_path_end // last >> > address for slow_path_end (as opposed to >> > allocation_end) >> > hard_end -> reserved_end // the end of the >> > reserved space of the TLAB >> > >> > About setting things in the getter... 
that >> > still seems like a very unpleasant thing to >> > me. It would be better to inspect the call >> > hierarchy and explicitly update the ends >> > where they need updating, and assert in the >> > getter that they are in sync, rather than >> > implicitly setting various ends as a >> > surprising side effect in a getter. It looks >> > like the call hierarchy is very small. With >> > my new naming convention, reserved_end() >> > would presumably return _allocation_end + >> > alignment_reserve(), and have an assert >> > checking that _allocation_end == >> > _last_slow_path_allocation_end, complaining >> > that this invariant must hold, and that a >> > caller to this function, such as >> > make_parsable(), must first explicitly >> > synchronize the ends as required, to honor >> > that invariant. >> > >> > >> > I've renamed the variables to how you preferred >> > it except for the _end one. I did: >> > >> > current_end >> > >> > last_allocation_address >> > >> > tlab_end_ptr >> > >> > The reason is that the architecture dependent >> > code use the thread.hpp API and it already has >> > tlab included into the name so it becomes >> > tlab_current_end (which is better that >> > tlab_current_tlab_end in my opinion). >> > >> > I also moved the update into a separate method >> > with a TODO that says to remove it when >> > FastTLABRefill is deprecated >> > >> > This looks a lot better now. Thanks. >> > >> > Note that the following comment now needs updating >> > accordingly in threadLocalAllocBuffer.hpp: >> > >> > 41 // Heap sampling is performed via >> > the end/actual_end fields. >> > >> > 42 // actual_end contains the real end >> > of the tlab allocation, >> > >> > 43 // whereas end can be set to an >> > arbitrary spot in the tlab to >> > >> > 44 // trip the return and sample the >> > allocation. >> > >> > 45 // slow_path_end is used to track >> > if a fast tlab refill occured >> > >> > 46 // between slowpath calls. 
>> > >> > There might be other comments too, I have not looked >> > in detail. >> > >> > This was the only spot that still had an actual_end, I >> > fixed it now. I'll do a sweep to double check other >> > comments. >> > >> > >> > >> > Not sure it's better but before updating >> > the webrev, I wanted to try >> > to get input/consensus :) >> > >> > (Note hard_end was always further off >> > than end). >> > >> > src/hotspot/share/prims/jvmti.xml: >> > >> > 10357 > > id="can_sample_heap" since="9"> >> > 10358 >> > 10359 Can sample the heap. >> > 10360 If this capability >> > is enabled then the heap sampling >> > methods >> > can be called. >> > 10361 >> > 10362 >> > >> > Looks like this capability should >> > not be "since 9" if it gets integrated >> > now. >> > >> > Updated now to 11, crossing my fingers :) >> > >> > src/hotspot/share/runtime/heapMonitoring.cpp: >> > >> > 448 if >> > (is_alive->do_object_b(value)) { >> > 449 // Update the oop to >> > point to the new object if it is still >> > alive. >> > 450 f->do_oop(&(trace.obj)); >> > 451 >> > 452 // Copy the old >> > trace, if it is still live. >> > 453 >> > _allocated_traces->at_put(curr_pos++, trace); >> > 454 >> > 455 // Store the live >> > trace in a cache, to be served up on >> > /heapz. >> > 456 >> > _traces_on_last_full_gc->append(trace); >> > 457 >> > 458 count++; >> > 459 } else { >> > 460 // If the old trace >> > is no longer live, add it to the list of >> > 461 // recently collected >> > garbage. >> > 462 >> > store_garbage_trace(trace); >> > 463 } >> > >> > In the case where the oop was not >> > live, I would like it to be explicitly >> > cleared. >> > >> > Done I think how you wanted it. Let me >> > know because I'm not familiar >> > with the RootAccess API. I'm unclear if >> > I'm doing this right or not so >> > reviews of these parts are highly >> > appreciated. Robbin had talked of >> > perhaps later pushing this all into a >> > OopStorage, should I do this now >> > do you think? 
Or can that wait a second >> > webrev later down the road? >> > >> > I think using handles can and should be done >> > later. You can use the Access API now. >> > I noticed that you are missing an #include >> > "oops/access.inline.hpp" in your >> > heapMonitoring.cpp file. >> > >> > The missing header is there for me so I don't >> > know, I made sure it is present in the latest >> > webrev. Sorry about that. >> > >> > + Did I clear it the way you wanted me >> > to or were you thinking of >> > something else? >> > >> > >> > That is precisely how I wanted it to be >> > cleared. Thanks. >> > >> > + Final question here, seems like if I >> > were to want to not do the >> > f->do_oop directly on the trace.obj, I'd >> > need to do something like: >> > >> > f->do_oop(&value); >> > ... >> > trace->store_oop(value); >> > >> > to update the oop internally. Is that >> > right/is that one of the >> > advantages of going to the Oopstorage >> > sooner than later? >> > >> > >> > I think you really want to do the do_oop on >> > the root directly. Is there a particular >> > reason why you would not want to do that? >> > Otherwise, yes - the benefit with using the >> > handle approach is that you do not need to >> > call do_oop explicitly in your code. >> > >> > There is no reason except that now we have a >> > load_oop and a get_oop_addr, I was not sure what >> > you would think of that. >> > >> > That's fine. >> > >> > Also I see a lot of >> > concurrent-looking use of the >> > following field: >> > 267 volatile bool _initialized; >> > >> > Please note that the "volatile" >> > qualifier does not help with reordering >> > here. Reordering between volatile >> > and non-volatile fields is >> > completely free >> > for both compiler and hardware, >> > except for windows with MSVC, where >> > volatile >> > semantics is defined to use >> > acquire/release semantics, and the >> > hardware is >> > TSO. 
But for the general case, I >> > would expect this field to be stored >> > with >> > OrderAccess::release_store and >> > loaded with OrderAccess::load_acquire. >> > Otherwise it is not thread safe. >> > >> > Because everything is behind a mutex, I >> > wasn't really worried about >> > this. I have a test that has multiple >> > threads trying to hit this >> > corner case and it passes. >> > >> > However, to be paranoid, I updated it to >> > using the OrderAccess API >> > now, thanks! Let me know what you think >> > there too! >> > >> > >> > If it is indeed always supposed to be read >> > and written under a mutex, then I would >> > strongly prefer to have it accessed as a >> > normal non-volatile member, and have an >> > assertion that given lock is held or we are >> > in a safepoint, as we do in many other >> > places. Something like this: >> > >> > assert(HeapMonitorStorage_lock->owned_by_self() >> > || (SafepointSynchronize::is_at_safepoint() >> > && Thread::current()->is_VM_thread()), "this >> > should not be accessed concurrently"); >> > >> > It would be confusing to people reading the >> > code if there are uses of OrderAccess that >> > are actually always protected under a mutex. >> > >> > Thank you for the exact example to be put in the >> > code! I put it around each access/assignment of >> > the _initialized method and found one case where >> > yes you can touch it and not have the lock. It >> > actually is "ok" because you don't act on the >> > storage until later and only when you really >> > want to modify the storage (see the >> > object_alloc_do_sample method which calls the >> > add_trace method). >> > >> > But, because of this, I'm going to put the >> > OrderAccess here, I'll do some performance >> > numbers later and if there are issues, I might >> > add a "unsafe" read and a "safe" one to make it >> > explicit to the reader. But I don't think it >> > will come to that. >> > >> > >> > Okay. 
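A minimal sketch of the release/acquire publication pattern discussed above. It uses standard C++ `std::atomic` instead of HotSpot's `OrderAccess` API, purely so that the example is self-contained; the `Storage` class and its members are hypothetical and are not the code under review.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the storage discussed above; not HotSpot code.
class Storage {
 public:
  // Writer: fill in the payload first, then publish it with a release store.
  void initialize(int payload) {
    _payload = payload;
    _initialized.store(true, std::memory_order_release);
  }

  // Reader: the acquire load pairs with the release store above, so a
  // reader that observes true is also guaranteed to observe the payload.
  bool initialized() const {
    return _initialized.load(std::memory_order_acquire);
  }

  // Only meaningful once initialized() has returned true.
  int payload() const {
    assert(initialized());
    return _payload;
  }

 private:
  int _payload = 0;
  std::atomic<bool> _initialized{false};
};
```

A plain `volatile bool` gives none of these ordering guarantees in portable C++. If every access really does happen under a mutex, the assertion-based approach suggested above is the simpler choice.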
>> > This double return in heapMonitoring.cpp looks wrong:
>> >
>> > 283   bool initialized() {
>> > 284     return OrderAccess::load_acquire(&_initialized) != 0;
>> > 285     return _initialized;
>> > 286   }
>> >
>> > Since you said object_alloc_do_sample() is the only place where you
>> > do not hold the mutex while reading initialized(), I had a closer
>> > look at that. It looks like in its current shape, the lack of a
>> > mutex may lead to a memory leak. In particular, it first checks
>> > if (initialized()). Let's assume this is now true. It then allocates
>> > a bunch of stuff, and checks if the number of frames were over 0.
>> > If they were, it calls StackTraceStorage::storage()->add_trace()
>> > seemingly hoping that after grabbing the lock in there,
>> > initialized() will still return true. But it could now return false
>> > and skip doing anything, in
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From serguei.spitsyn at oracle.com Wed Apr 11 17:51:52 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Wed, 11 Apr 2018 10:51:52 -0700
Subject: RFR(xxs): 8200384: jcmd help output should be sorted
In-Reply-To:
References: <1f24a1d0-9cac-4a77-6023-918cf864a298@oracle.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From serguei.spitsyn at oracle.com Wed Apr 11 17:57:24 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Wed, 11 Apr 2018 10:57:24 -0700
Subject: RFR(xxs): 8200384: jcmd help output should be sorted
In-Reply-To:
References: <1f24a1d0-9cac-4a77-6023-918cf864a298@oracle.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From thomas.stuefe at gmail.com Wed Apr 11 18:01:04 2018
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Wed, 11 Apr 2018 20:01:04 +0200
Subject: RFR(xxs): 8200384: jcmd help output should be sorted
In-Reply-To:
References: <1f24a1d0-9cac-4a77-6023-918cf864a298@oracle.com>
Message-ID:

Hi Serguei,

On Wed, Apr 11, 2018 at 7:51 PM, serguei.spitsyn at oracle.com wrote:
> Hi Thomas,
>
> Sorry, I did not reply to this.

no problem.

> Thank you a lot for providing the output!
> I agree, the sorted one looks much better.
>
> Probably, it is late to say but I'm Ok with a push. :)
>

Thanks :) Have not yet pushed but will do it today or tomorrow.

..Thomas

> Thanks,
> Serguei
>
>
>
> On 4/9/18 02:07, Thomas Stüfe wrote:
>
> Hi Sergey, Christoph,
>
> thanks for the review!
>
> Sure, here you go:
>
> Old output, unsorted:
>
> thomas at mainframe /shared/projects/openjdk/jdk-submit-hs/output-fastdebug $
> ./images/jdk/bin/jcmd test3.Example2 help
> 24278:
> The following commands are available:
> VM.log
> VM.native_memory
> ManagementAgent.status
> ManagementAgent.stop
> ManagementAgent.start_local
> ManagementAgent.start
> Compiler.directives_clear
> Compiler.directives_remove
> Compiler.directives_add
> Compiler.directives_print
> Compiler.CodeHeap_Analytics
> VM.print_touched_methods
> Compiler.codecache
> Compiler.codelist
> Compiler.queue
> VM.classloader_stats
> Thread.print
> JVMTI.data_dump
> JVMTI.agent_load
> VM.metaspace
> VM.stringtable
> VM.symboltable
> VM.class_hierarchy
> VM.systemdictionary
> GC.class_stats
> GC.class_histogram
> GC.heap_dump
> GC.finalizer_info
> GC.heap_info
> GC.run_finalization
> GC.run
> VM.info
> VM.uptime
> VM.dynlibs
> VM.set_flag
> VM.flags
> VM.system_properties
> VM.command_line
> VM.version
> help
>
> New output, sorted:
>
> thomas at mainframe /shared/projects/openjdk/jdk-submit-hs/output-fastdebug $
> ./images/jdk/bin/jcmd test3.Example2 help
> 30230:
> The following commands are available:
> Compiler.CodeHeap_Analytics
> Compiler.codecache
> Compiler.codelist
> Compiler.directives_add
> Compiler.directives_clear
> Compiler.directives_print
> Compiler.directives_remove
> Compiler.queue
> GC.class_histogram
> GC.class_stats
> GC.finalizer_info
> GC.heap_dump
> GC.heap_info
> GC.run
> GC.run_finalization
> JVMTI.agent_load
> JVMTI.data_dump
> ManagementAgent.start
> ManagementAgent.start_local
> ManagementAgent.status
> ManagementAgent.stop
> Thread.print
> VM.class_hierarchy
> VM.classloader_stats
> VM.command_line
> VM.dynlibs
> VM.flags
> VM.info
> VM.log
> VM.metaspace
> VM.native_memory
> VM.print_touched_methods
> VM.set_flag
> VM.stringtable
> VM.symboltable
> VM.system_properties
> VM.systemdictionary
> VM.uptime
> VM.version
> help
>
>
> I'm running submit tests now, if they pass I'll push.
>
> Best Regards, Thomas
>
>
> On Tue, Apr 3, 2018 at 3:52 AM, serguei.spitsyn at oracle.com
> wrote:
>>
>> Hi Thomas,
>>
>> Added the serviceability-dev mailing list as it is a Serviceability area.
>>
>> The fix looks good to me.
>> One question:
>> Could you, please, post the sorted help output?
>> It is interesting how does it look like when sorted.
>>
>> Thanks,
>> Serguei
>>
>>
>>
>> On 3/28/18 13:08, Thomas Stüfe wrote:
>>>
>>> Hi all,
>>>
>>> may I get reviews for this tiny trivial change which causes jcmd help
>>> output (the command list) to be sorted?
>>>
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8200384
>>> webrev:
>>>
>>> http://cr.openjdk.java.net/~stuefe/webrevs/8200384-jcmd-help-sorted/webrev.00/webrev/
>>>
>>> Thanks!
>>>
>>> Best Regards, Thomas
>>
>>
>
>

From jcbeyler at google.com Thu Apr 12 00:17:04 2018
From: jcbeyler at google.com (JC Beyler)
Date: Thu, 12 Apr 2018 00:17:04 +0000
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To: <0A5AD1F8-EBDD-47EA-9741-1AE6B1B3971A@oracle.com>
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com> <0A5AD1F8-EBDD-47EA-9741-1AE6B1B3971A@oracle.com>
Message-ID:

Hi Karen,

I put up a new webrev that is, in my mind, feature complete in terms of implementation. There could be a few tidbits of optimization here and there, but I believe everything is now there in terms of features. What remains is the question of the placement of the collectors (I've now put them lower than what you talk about in this email thread).

The incremental webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.12_13/
and the full webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.13/

The incremental webrev contains the name change of the TLAB fields, as per the conversation going on in the GC thread, and the Graal changes needed so that this webrev does not break anything. It also contains a test for when VM and sampled events are enabled at the same time.

I'll answer your questions inlined below:

>
> I have a couple of design questions before getting to the detailed code
> review:
>
> 1. jvmtiExport.cpp
> You have simplified the JvmtiObjectAllocEventCollector to assume there is
> only a single object.
> Do you have a test that allocates a multi-dimensional array?
> I would expect that to have multiple subarrays - and need the logic that
> you removed.
>

I "think" you misread this. I did not change the implementation of JvmtiObjectAllocEventCollector to assume only one object.
Actually the implementation is still doing what it was doing initially, but now JvmtiObjectAllocEventCollector is a parent class with two inheriting classes:
  - JvmtiVMObjectAllocEventCollector follows the same logic as before
  - JvmtiSampledObjectAllocEventCollector has the logic of a single allocation during collection, since it is placed that low.

I'm thinking of perhaps separating the two classes now that the sampled case is so low and does not require the extra logic of handling a growable array.

I don't have a test that covers multi-dimensional arrays. I have not looked at what exactly VMObjectAllocEvents track, but I did try an example to see:
  - On a clean JDK, if I allocate:
      int[][][] myAwesomeTable = new int[10][10][10];
    I get one single VMObject callback.
  - With my change, I get the same behavior.

So instead, I did an object containing an array and cloned it. I get two VM callbacks both on the clean JDK and with my change. I'll add a test in the next webrev to ensure correctness.

> *** Please add a test in which you allocate a multi-dimensional array -
> and try it with your earlier version which
> will show the full set of allocated objects
>

As I said, this works because it is so low in the system. I don't know the internals of the allocation of a three-dimensional array but:
  - If I try to collect all allocations required for the new int[10][10][10], I get more than a hundred allocation callbacks if the sample rate is 0, which is what I'd expect (a callback at each allocation)

So though I only collect one object, I get a callback for each if that is what we want in regard to the sampling rate.

> 2. Tests - didn't read them all - just ran into a hardcoded "10% error
> ensures a sanity test without becoming flaky"
> do check with Serguei on this - that looks like a potential future test
> failure
>

Because the sampling interval is a geometric variable whose mean is the sampling rate, any meaningful test is going to have to be statistical. Therefore, I accept a bit of error so that we can test the real thing rather than hack the code to have less "real" tests. This is what we do internally; let me know if you want me to do it otherwise.

Flaky is probably the wrong term, or perhaps I need to explain this better. I'll change the comment in the tests to explain that the potential flakiness comes from the nature of the geometric distribution. Because we don't want tests that run too long, it makes sense to me to have this error percentage.

Let me now answer the comments from the other email here as well, so we have all answers and conversations in a single thread:

> - Note there is one current caveat: if the agent requests the
> VMObjectAlloc event, the sampler defers to that event due to a limitation
> in its implementation (ie: I am not convinced I can safely send out that
> event with the VM collector enabled, I'll happily white board that).
>
> Please work that one out with Serguei and the serviceability folks.
>
>

Agreed. I'll follow up with Serguei on whether this is a potential problem; I have to double check and ensure I am right that there is an issue. I see it as just a fact of life and not a problem for now. If you do want both events, having a sample dropped due to this limitation does not invalidate the system in my mind. I could be wrong about it though and would happily go over what I saw.

>> So the Heap Sampling Monitoring System used to have more methods. It made
>> sense to have them in a separate category. I now have moved it to the
>> memory category to be consistent and grouped there. I also removed that
>> link btw.
>>
> Thanks.
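The statistical tolerance discussed above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the names `count_samples` and `within_tolerance` are hypothetical, and the interval is drawn from an exponential distribution as a continuous stand-in for the geometric variable mentioned in the thread. It is not the logic of the actual HeapMonitor tests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Counts how many samples a byte-based sampler would take over total_bytes
// when the distance to the next sample is drawn from an exponential
// distribution with the given mean (assumed stand-in for geometric).
std::size_t count_samples(double mean_interval_bytes, double total_bytes,
                          std::uint64_t seed) {
  assert(mean_interval_bytes > 0.0);
  std::mt19937_64 rng(seed);
  std::exponential_distribution<double> interval(1.0 / mean_interval_bytes);
  std::size_t samples = 0;
  double next_sample_at = interval(rng);
  while (next_sample_at <= total_bytes) {
    ++samples;
    next_sample_at += interval(rng);
  }
  return samples;
}

// True when observed is within tolerance (0.10 means 10%) of expected.
bool within_tolerance(double observed, double expected, double tolerance) {
  double diff = observed > expected ? observed - expected
                                    : expected - observed;
  return diff <= tolerance * expected;
}
```

With a 512 KiB mean and enough allocated bytes to expect about 10000 samples, the observed count varies by roughly 1% from run to run, so a 10% margin leaves a lot of headroom. A test that expects only a few dozen samples would need a much wider margin.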
>

Actually it did seem weird to put it there since there was only Allocate/Deallocate, so for now the method is back in its own category once again. If someone has a better spot, let me know.

>>
>>>
>>> I was trying to figure out a way to put the collectors farther down the
>>> call stack so as to both catch more
>>> cases and to reduce the maintenance burden - i.e. if you were to add a
>>> new code generator, e.g. Graal -
>>> if it were to go through an existing interface, that might be a place to
>>> already have a collector.
>>>
>>> I do not know the Graal sources - I did look at jvmci/jvmciRuntime.cpp -
>>> and it appears that there
>>> are calls to instanceKlass::new_instance,
>>> oopFactory::new_typeArray/new_ObjArray and ArrayKlass::multi-allocate,
>>> so one possibility would be to put hooks in those calls which would
>>> catch many? (I did not do a thorough search)
>>> of the slowpath calls for the bytecodes, and then check the fast paths
>>> in detail.
>>>
>>
>> I'll come to a major issue with the collector and its placement in the
>> next paragraph.
>>
> Still not clear on why you did not move the collectors into
> instanceKlass::new_instance and oopFactory::newtypeArray/newObjArray
> and ArrayKlass::multi-allocate.
>

I think what was happening is that the collectors would wrap the objects in handles, but references to the originally allocated object would still be on the stack and would not get updated if required. Due to that issue, I believe I was getting weird bugs.

Because of this, it seems that any enabled VM collector has to guarantee that either:
  - its path to destruction (and thus posting of events) has no means of triggering a GC that would move things around (as long as you are in VM code you should be fine, I believe)
  - if a GC is occurring, the objects in its internal array are not somewhere on the stack without a handle around them, so that they can be moved if need be by a GC operation.
I'm not convinced this holds in the multithreaded case with sampling and VM collection.

>>
>>>
>>> I had wondered if it made sense to move the hooks even farther down,
>>> into CollectedHeap:obj_allocate and array_allocate.
>>> I do not think so. First reason is that for multidimensional arrays,
>>> ArrayKlass::multi_allocate the outer dimension array would
>>> have an event before storing the inner sub-arrays and I don't think we
>>> want that exposed, so that won't work for arrays.
>>>
>>
>> So the major difficulty is that the steps of collection do this:
>>
>> - An object gets allocated and is decided to be sampled
>> - The original pointer placement (where it resides originally in memory)
>> is passed to the collector
>> - Now one important thing of note:
>> (a) In the VM code, until the point where the oop is going to be
>> returned, GC is not yet aware of it
>> (b) so the collector can't yet send it out to the user via JVMTI
>> otherwise, the agent could put a weak reference for example
>>
>> I'm a bit fuzzy on this and maybe it's just that there would be more
>> heavy lifting to make this possible but my initial tests seem to show
>> problems when attempting this in the obj_allocate area.
>>
> Not sure what you are seeing here -
>
> Let me state it the way I understand it.
> 1) You can collect the object into internal metadata at allocation point -
> which you already had
> Note: see comment in JvmtiExport.cpp:
> // In the case of the sampled object collector, we don't want to perform
> the
> // oops_do because the object in the collector is still stored in
> registers
> // on the VM stack
> - so GC will find these objects as roots, once we allow GC to run,
> which should be after the header is initialized
>
> Totally agree with you that you can not post the event in the source code
> in which you allocate the memory - keep reading
>
> 2) event posting:
> - you want to ensure that the object has been fully initialized
> - the object needs to have the header set up - not just the memory
> allocated - so that applies to all objects
> (and that is done in a caller of the allocation code - so it can't be
> done at the location which does the memory allocation)
> - the example I pointed out was the multianewarray case - all the
> subarrays need to be allocated
>
> - please add a test case for multi-array, so as you experiment with where
> to post the event, you ensure that you can
> access the subarrays (e.g. a 3D array of length 5 has 5 2D arrays as
> subarrays)
>

Technically it's more than just being initialized: you cannot perform a callback about an object if any object of that thread is being held by a collector and is also in a register or stack slot without protection.

> - prior to setting up the object header information, GC would not know
> about the object
> - was this by chance the issue you ran into?
>

No, I believe the issue I was running into was the one above, where an object on the stack was pointing to an oop that got moved during a GC caused by that thread doing an event callback.
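The stale-reference problem described above can be shown with a toy model. This is a sketch only: `ToyHeap` and everything in it are hypothetical and have nothing to do with HotSpot's real oop, Handle, or GC code. The "collector" here simply shifts every object by one slot and fixes up a handle table, while raw indices held elsewhere go stale.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: a "heap" of slots plus a handle table that a moving
// collector keeps up to date. None of these names exist in HotSpot.
struct ToyHeap {
  std::vector<int> slots;            // object payloads, addressed by index
  std::vector<std::size_t> handles;  // handle id -> current slot index

  // Allocates an object and returns its raw slot index (the "raw oop").
  std::size_t allocate(int payload) {
    slots.push_back(payload);
    return slots.size() - 1;
  }

  // Registers a handle for a raw slot index and returns the handle id.
  std::size_t make_handle(std::size_t raw) {
    handles.push_back(raw);
    return handles.size() - 1;
  }

  // Resolves a handle to the payload at the object's current location.
  int resolve(std::size_t handle) const {
    assert(handle < handles.size());
    return slots[handles[handle]];
  }

  // "Moving GC": shifts every object up by one slot, leaving a poisoned
  // slot 0, and fixes up the handle table. Raw indices held elsewhere
  // are not updated and now refer to the wrong slot.
  void move_everything() {
    slots.insert(slots.begin(), -1);
    for (std::size_t& h : handles) ++h;
  }
};
```

After `move_everything()`, resolving through the handle still yields the original payload, whereas indexing `slots` with the raw index taken before the move reads the poisoned slot. That is the toy equivalent of a raw reference on the stack that the GC did not know about.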
Thanks for your help,
Jc

> 3) event posting
> - when you post the event to JVMTI
> - in JvmtiObjectAllocEventMark: sets _jobj (object)to_jobject(obj),
> which creates JNIHandles::make_local(_thread, obj)
>
>
>>
>>>
>>> The second reason is that I strongly suspect the scope you want is
>>> bytecodes only. I think once you have added hooks
>>> to all the fast paths and slow paths that this will be pushing the
>>> performance overhead constraints you proposed and
>>> you won't want to see e.g. internal allocations.
>>>
>>
>> Yes agreed, allocations from bytecodes are mostly our concern generally :)
>>
>>
>>>
>>>
>>> But I think you need to experiment with the set of allocations (or
>>> possible alternative sets of allocations) you want recorded.
>>>
>>> The hooks I see today include:
>>> Interpreter: (looking at x86 as a sample)
>>> - slowpath in InterpreterRuntime
>>> - fastpath tlab allocation - your new threshold check handles that
>>>
>>
>> Agreed
>>
>>
>>> - allow_shared_alloc (GC specific): for _new isn't handled
>>>
>>
>> Where is that exactly? I can check why we are not catching it?
>>
>>
>>>
>>> C1
>>> I don't see changes in c1_Runtime.cpp
>>> note: you also want to look for the fast path
>>>
>>
>> I added the calls to c1_Runtime in the latest webrev, but was still going
>> through testing before pushing it out. I had waited on this one a bit. Fast
>> path would be handled by the threshold check no?
>>
>>
>>> C2: changes in opto/runtime.cpp for slow path
>>> did you also catch the fast path?
>>>
>>
>> Fast path gets handled by the same threshold check, no? Perhaps I've
>> missed something (very likely)?
>>
>>
>>>
>>> 3. Performance -
>>> After you get all the collectors added - you need to rerun the
>>> performance numbers.
>>>
>>
>> Agreed :)
>>
>>
>>>
>>> thanks,
>>> Karen
>>>
>>> On Apr 5, 2018, at 2:15 PM, JC Beyler wrote:
>>>
>>> Thanks Boris and Derek for testing it.
>>> >>> Yes I was trying to get a new version out that had the tests ported as >>> well but got sidetracked while trying to add tests and two new features. >>> >>> Here is the incremental webrev: >>> >>> Here is the full webrev: >>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/ >>> >>> Basically, the new tests assert this: >>> - Only one agent can currently ask for the sampling, I'm currently >>> seeing if I can push to a next webrev the multi-agent support to start >>> doing a code freeze on this one >>> - The event is not thread-enabled, meaning like the >>> VMObjectAllocationEvent, it's an all or nothing event; same as the >>> multi-agent, I'm going to see if a future webrev to add the support is a >>> better idea to freeze this webrev a bit >>> >>> There was another item that I added here and I'm unsure this webrev is >>> stable in debug mode: I added an assertion system to ascertain that all >>> paths leading to a TLAB slow path (and hence a sampling point) have a >>> sampling collector ready to post the event if a user wants it. This might >>> break a few thing in debug mode as I'm working through the kinks of that as >>> well. However, in release mode, this new webrev passes all the tests in >>> hotspot/jtreg/serviceability/jvmti/HeapMonitor. >>> >>> Let me know what you think, >>> Jc >>> >>> On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich < >>> boris.ulasevich at bell-sw.com> wrote: >>> >>>> Hi JC, >>>> >>>> I have just checked on arm32: your patch compiles and runs ok. >>>> >>>> As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not >>>> correspond to actual library name: libHeapMonitorTest.c -> >>>> libHeapMonitorTest.so >>>> >>>> Boris >>>> >>>> On 04.04.2018 01:54, White, Derek wrote: >>>> > Thanks JC, >>>> > >>>> > New patch applies cleanly. Compiles and runs (simple test programs) on >>>> > aarch64. 
>>>> > >>>> > * Derek >>>> > >>>> > *From:* JC Beyler [mailto:jcbeyler at google.com] >>>> > *Sent:* Monday, April 02, 2018 1:17 PM >>>> > *To:* White, Derek >>>> > *Cc:* Erik ?sterlund ; >>>> > serviceability-dev at openjdk.java.net; hotspot-compiler-dev >>>> > >>>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >>>> > >>>> > Hi Derek, >>>> > >>>> > I know there were a few things that went in that provoked a merge >>>> > conflict. I worked on it and got it up to date. Sadly my lack of >>>> > knowledge makes it a full rebase instead of keeping all the history. >>>> > However, with a newly cloned jdk/hs you should now be able to use: >>>> > >>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/ >>>> > >>>> > The change you are referring to was done with the others so perhaps >>>> you >>>> > were unlucky and I forgot it in a webrev and fixed it in another? I >>>> > don't know but it's been there and I checked, it is here: >>>> > >>>> > >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html >>>> > >>>> > I double checked that tlab_end_offset no longer appears in any >>>> > architecture (as far as I can tell :)). >>>> > >>>> > Thanks for testing and let me know if you run into any other issues! >>>> > >>>> > Jc >>>> > >>>> > On Fri, Mar 30, 2018 at 4:24 PM White, Derek >>> > > wrote: >>>> > >>>> > Hi Jc, >>>> > >>>> > I?ve been having trouble getting your patch to apply correctly. I >>>> > may have based it on the wrong version. >>>> > >>>> > In any case, I think there?s a missing update to >>>> > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), >>>> > where ?JavaThread::tlab_end_offset()? should become >>>> > ?JavaThread::tlab_current_end_offset()?. >>>> > >>>> > This should correspond to the other port?s changes in >>>> > templateTable_.cpp files. >>>> > >>>> > Thanks! 
>>>> > - Derek >>>> > >>>> > *From:* hotspot-compiler-dev >>>> > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net >>>> > ] *On >>>> Behalf >>>> > Of *JC Beyler >>>> > *Sent:* Wednesday, March 28, 2018 11:43 AM >>>> > *To:* Erik ?sterlund >>> > > >>>> > *Cc:* serviceability-dev at openjdk.java.net >>>> > ; >>>> hotspot-compiler-dev >>>> > >>> > > >>>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >>>> > >>>> > Hi all, >>>> > >>>> > I've been working on deflaking the tests mostly and the wording in >>>> > the JVMTI spec. >>>> > >>>> > Here is the two incremental webrevs: >>>> > >>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ >>>> > >>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ >>>> > >>>> > Here is the total webrev: >>>> > >>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ >>>> > >>>> > Here are the notes of this change: >>>> > >>>> > - Currently the tests pass 100 times in a row, I am working on >>>> > checking if they pass 1000 times in a row. >>>> > >>>> > - The default sampling rate is set to 512k, this is what we use >>>> > internally and having a default means that to enable the sampling >>>> > with the default, the user only has to do a enable event/disable >>>> > event via JVMTI (instead of enable + set sample rate). >>>> > >>>> > - I deprecated the code that was handling the fast path tlab >>>> > refill if it happened since this is now deprecated >>>> > >>>> > - Though I saw that Graal is still using it so I have to >>>> see >>>> > what needs to be done there exactly >>>> > >>>> > Finally, using the Dacapo benchmark suite, I noted a 1% overhead >>>> for >>>> > when the event system is turned on and the callback to the native >>>> > agent is just empty. I got a 3% overhead with a 512k sampling rate >>>> > with the code I put in the native side of my tests. 
>>>> > >>>> > Thanks and comments are appreciated, >>>> > >>>> > Jc >>>> > >>>> > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler >>> > > wrote: >>>> > >>>> > Hi all, >>>> > >>>> > The incremental webrev update is here: >>>> > >>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ >>>> > >>>> > The full webrev is here: >>>> > >>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ >>>> > >>>> > Major change here is: >>>> > >>>> > - I've removed the heapMonitoring.cpp code in favor of just >>>> > having the sampling events as per Serguei's request; I still >>>> > have to do some overhead measurements but the tests prove the >>>> > concept can work >>>> > >>>> > - Most of the tlab code is unchanged, the only major >>>> > part is that now things get sent off to event collectors when >>>> > used and enabled. >>>> > >>>> > - Added the interpreter collectors to handle interpreter >>>> > execution >>>> > >>>> > - Updated the name from SetTlabHeapSampling to >>>> > SetHeapSampling to be more generic >>>> > >>>> > - Added a mutex for the thread sampling so that we can >>>> > initialize an internal static array safely >>>> > >>>> > - Ported the tests from the old system to this new one >>>> > >>>> > I've also updated the JEP and CSR to reflect these changes: >>>> > >>>> > https://bugs.openjdk.java.net/browse/JDK-8194905 >>>> > >>>> > https://bugs.openjdk.java.net/browse/JDK-8171119 >>>> > >>>> > In order to make this have some forward progress, I've removed >>>> > the heap sampling code entirely and now rely entirely on the >>>> > event sampling system. The tests reflect this by using a >>>> > simplified implementation of what an agent could do: >>>> > >>>> > >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c >>>> > >>>> > (Search for anything mentioning event_storage). 
>>>> > >>>> > I have not taken the time to port the whole code we had >>>> > originally in heapMonitoring to this. I hesitate only because >>>> > that code was in C++, I'd have to port it to C and this is for >>>> > tests so perhaps what I have now is good enough? >>>> > >>>> > As far as testing goes, I've ported all the relevant tests and >>>> > then added a few: >>>> > >>>> > - Turning the system on/off >>>> > >>>> > - Testing using various GCs >>>> > >>>> > - Testing using the interpreter >>>> > >>>> > - Testing the sampling rate >>>> > >>>> > - Testing with objects and arrays >>>> > >>>> > - Testing with various threads >>>> > >>>> > Finally, as overhead goes, I have the numbers of the system >>>> off >>>> > vs a clean build and I have 0% overhead, which is what we'd >>>> > want. This was using the Dacapo benchmarks. I am now preparing >>>> > to run a version with the events on using dacapo and will >>>> report >>>> > back here. >>>> > >>>> > Any comments are welcome :) >>>> > >>>> > Jc >>>> > >>>> > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler >>> > > wrote: >>>> > >>>> > Hi all, >>>> > >>>> > I apologize for the delay but I wanted to add an event >>>> > system and that took a bit longer than expected and I also >>>> > reworked the code to take into account the deprecation of >>>> > FastTLABRefill. >>>> > >>>> > This update has four parts: >>>> > >>>> > A) I moved the implementation from Thread to >>>> > ThreadHeapSampler inside of Thread. Would you prefer it >>>> as a >>>> > pointer inside of Thread or like this works for you? >>>> Second >>>> > question would be would you rather have an association >>>> > outside of Thread altogether that tries to remember when >>>> > threads are live and then we would have something like: >>>> > >>>> > ThreadHeapSampler::get_sampling_size(this_thread); >>>> > >>>> > I worry about the overhead of this but perhaps it is not >>>> too >>>> > too bad? 
>>>> > >>>> > B) I also have been working on the Allocation event system >>>> > that sends out a notification at each sampled event. This >>>> > will be practical when wanting to do something at the >>>> > allocation point. I'm also looking at if the whole >>>> > heapMonitoring code could not reside in the agent code and >>>> > not in the JDK. I'm not convinced but I'm talking to >>>> Serguei >>>> > about it to see/assess :) >>>> > >>>> > - Also added two tests for the new event subsystem >>>> > >>>> > C) Removed the slow_path fields inside the TLAB code since >>>> > now FastTLABRefill is deprecated >>>> > >>>> > D) Updated the JVMTI documentation and specification for >>>> the >>>> > methods. >>>> > >>>> > So the incremental webrev is here: >>>> > >>>> > >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ >>>> > >>>> > and the full webrev is here: >>>> > >>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 >>>> > >>>> > I believe I have updated the various JIRA issues that >>>> track >>>> > this :) >>>> > >>>> > Thanks for your input, >>>> > >>>> > Jc >>>> > >>>> > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler >>>> > > wrote: >>>> > >>>> > Hi Erik, >>>> > >>>> > I inlined my answers, which the last one seems to >>>> answer >>>> > Robbin's concerns about the same thing (adding things >>>> to >>>> > Thread). >>>> > >>>> > On Wed, Feb 14, 2018 at 2:51 AM, Erik ?sterlund >>>> > >>> > > wrote: >>>> > >>>> > Hi JC, >>>> > >>>> > Comments are inlined below. >>>> > >>>> > On 2018-02-13 06:18, JC Beyler wrote: >>>> > >>>> > Hi Erik, >>>> > >>>> > Thanks for your answers, I've now inlined my >>>> own >>>> > answers/comments. 
>>>> > >>>> > I've done a new webrev here: >>>> > >>>> > >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ >>>> > < >>>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.08/> >>>> > >>>> > The incremental is here: >>>> > >>>> > >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ >>>> > < >>>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.07_08/> >>>> > >>>> > Note to all: >>>> > >>>> > - I've been integrating changes from >>>> > Erin/Serguei/David comments so this webrev >>>> > incremental is a bit an answer to all comments >>>> > in one. I apologize for that :) >>>> > >>>> > On Mon, Feb 12, 2018 at 6:05 AM, Erik >>>> ?sterlund >>>> > >>> > > wrote: >>>> > >>>> > Hi JC, >>>> > >>>> > Sorry for the delayed reply. >>>> > >>>> > Inlined answers: >>>> > >>>> > >>>> > >>>> > On 2018-02-06 00:04, JC Beyler wrote: >>>> > >>>> > Hi Erik, >>>> > >>>> > (Renaming this to be folded into the >>>> > newly renamed thread :)) >>>> > >>>> > First off, thanks a lot for reviewing >>>> > the webrev! I appreciate it! >>>> > >>>> > I updated the webrev to: >>>> > >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ >>>> > < >>>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.05a/> >>>> > >>>> > And the incremental one is here: >>>> > >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ >>>> > < >>>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.04_05a/> >>>> > >>>> > It contains: >>>> > - The change for since from 9 to 11 >>>> for >>>> > the jvmti.xml >>>> > - The use of the OrderAccess for >>>> initialized >>>> > - Clearing the oop >>>> > >>>> > I also have inlined my answers to your >>>> > comments. The biggest question >>>> > will come from the multiple *_end >>>> > variables. A bit of the logic there >>>> > is due to handling the slow path >>>> refill >>>> > vs fast path refill and >>>> > checking that the rug was not pulled >>>> > underneath the slowpath. 
>>>> > I believe that a previous comment was that TlabFastRefill was
>>>> > going to be deprecated.
>>>> >
>>>> > If this is true, we could revert this code a bit and just do a:
>>>> > if TlabFastRefill is enabled, disable this. And then deprecate
>>>> > that when TlabFastRefill is deprecated.
>>>> >
>>>> > This might simplify this webrev and I can work on a follow-up
>>>> > that either: removes TlabFastRefill if Robbin does not have the
>>>> > time to do it or add the support to the assembly side to handle
>>>> > this correctly. What do you think?
>>>> >
>>>> > I support removing TlabFastRefill, but I think it is good to not
>>>> > depend on that happening first.
>>>> >
>>>> > I'm slowly pushing on the FastTLABRefill
>>>> > (https://bugs.openjdk.java.net/browse/JDK-8194084),
>>>> > I agree on keeping both separate for now though so that we can
>>>> > think of both differently
>>>> >
>>>> > Now, below, inlined are my answers:
>>>> >
>>>> > On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund wrote:
>>>> >
>>>> > Hi JC,
>>>> >
>>>> > Hope I am reviewing the right version of your work. Here goes...
>>>> >
>>>> > src/hotspot/share/gc/shared/collectedHeap.inline.hpp:
>>>> >
>>>> >  159     AllocTracer::send_allocation_outside_tlab(klass, result, size * HeapWordSize, THREAD);
>>>> >  160
>>>> >  161     THREAD->tlab().handle_sample(THREAD, result, size);
>>>> >  162     return result;
>>>> >  163   }
>>>> >
>>>> > Should not call tlab()->X without checking if (UseTLAB) IMO.
>>>> >
>>>> > Done!
>>>> >
>>>> > More about this later.
>>>> >
>>>> > src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp:
>>>> >
>>>> > So first of all, there seems to be quite a few ends. There is an
>>>> > "end", a "hard end", a "slow path end", and an "actual end".
>>>> > Moreover, it seems like the "hard end" is actually further away
>>>> > than the "actual end". So the "hard end" seems like more of a
>>>> > "really definitely actual end" or something. I don't know about
>>>> > you, but I think it looks kind of messy. In particular, I don't
>>>> > feel like the name "actual end" reflects what it represents,
>>>> > especially when there is another end that is behind the
>>>> > "actual end".
>>>> >
>>>> >  413 HeapWord* ThreadLocalAllocBuffer::hard_end() {
>>>> >  414   // Did a fast TLAB refill occur?
>>>> >  415   if (_slow_path_end != _end) {
>>>> >  416     // Fix up the actual end to be now the end of this TLAB.
>>>> >  417     _slow_path_end = _end;
>>>> >  418     _actual_end = _end;
>>>> >  419   }
>>>> >  420
>>>> >  421   return _actual_end + alignment_reserve();
>>>> >  422 }
>>>> >
>>>> > I really do not like making getters unexpectedly have these kind
>>>> > of side effects. It is not expected that when you ask for the
>>>> > "hard end", you implicitly update the "slow path end" and
>>>> > "actual end" to new values.
>>>> >
>>>> > As I said, a lot of this is due to the FastTlabRefill. If I make
>>>> > this not supporting FastTlabRefill, this goes away. The reason
>>>> > the system needs to update itself at the get is that you only
>>>> > know at that get if things have shifted underneath the tlab
>>>> > slow path.
>>>> > I am not sure of really better names (naming is hard!), perhaps
>>>> > we could do these names:
>>>> >
>>>> > - current_tlab_end // Either the allocated tlab end or a
>>>> >   sampling point
>>>> > - last_allocation_address // The end of the tlab allocation
>>>> > - last_slowpath_allocated_end // In case a fast refill occurred
>>>> >   the end might have changed, this is to remember slow vs fast
>>>> >   path refills
>>>> >
>>>> > the hard_end method can be renamed to something like:
>>>> > tlab_end_pointer() // The end of the tlab including a bit of
>>>> > alignment reserved bytes
>>>> >
>>>> > Those names sound better to me. Could you please provide a
>>>> > mapping from the old names to the new names so I understand
>>>> > which one is which please?
>>>> >
>>>> > This is my current guess of what you are proposing:
>>>> >
>>>> > end -> current_tlab_end
>>>> > actual_end -> last_allocation_address
>>>> > slow_path_end -> last_slowpath_allocated_end
>>>> > hard_end -> tlab_end_pointer
>>>> >
>>>> > Yes that is correct, that was what I was proposing.
>>>> >
>>>> > I would prefer this naming:
>>>> >
>>>> > end -> slow_path_end // the end for taking a slow path; either
>>>> >   due to sampling or refilling
>>>> > actual_end -> allocation_end // the end for allocations
>>>> > slow_path_end -> last_slow_path_end // last address for
>>>> >   slow_path_end (as opposed to allocation_end)
>>>> > hard_end -> reserved_end // the end of the reserved space of
>>>> >   the TLAB
>>>> >
>>>> > About setting things in the getter... that still seems like a
>>>> > very unpleasant thing to me. It would be better to inspect the
>>>> > call hierarchy and explicitly update the ends where they need
>>>> > updating, and assert in the getter that they are in sync, rather
>>>> > than implicitly setting various ends as a surprising side effect
>>>> > in a getter.
>>>> > It looks like the call hierarchy is very small. With my new
>>>> > naming convention, reserved_end() would presumably return
>>>> > _allocation_end + alignment_reserve(), and have an assert
>>>> > checking that _allocation_end == _last_slow_path_allocation_end,
>>>> > complaining that this invariant must hold, and that a caller to
>>>> > this function, such as make_parsable(), must first explicitly
>>>> > synchronize the ends as required, to honor that invariant.
>>>> >
>>>> > I've renamed the variables to how you preferred it except for
>>>> > the _end one. I did:
>>>> >
>>>> > current_end
>>>> >
>>>> > last_allocation_address
>>>> >
>>>> > tlab_end_ptr
>>>> >
>>>> > The reason is that the architecture dependent code uses the
>>>> > thread.hpp API and it already has tlab included into the name so
>>>> > it becomes tlab_current_end (which is better than
>>>> > tlab_current_tlab_end in my opinion).
>>>> >
>>>> > I also moved the update into a separate method with a TODO that
>>>> > says to remove it when FastTLABRefill is deprecated
>>>> >
>>>> > This looks a lot better now. Thanks.
>>>> >
>>>> > Note that the following comment now needs updating accordingly
>>>> > in threadLocalAllocBuffer.hpp:
>>>> >
>>>> >  41 // Heap sampling is performed via the end/actual_end fields.
>>>> >  42 // actual_end contains the real end of the tlab allocation,
>>>> >  43 // whereas end can be set to an arbitrary spot in the tlab to
>>>> >  44 // trip the return and sample the allocation.
>>>> >  45 // slow_path_end is used to track if a fast tlab refill occured
>>>> >  46 // between slowpath calls.
>>>> >
>>>> > There might be other comments too, I have not looked in detail.
>>>> >
>>>> > This was the only spot that still had an actual_end, I fixed it
>>>> > now. I'll do a sweep to double check other comments.
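
The getter discussion above (synchronize the ends explicitly, then only assert in the getter) could be sketched roughly as follows. This is a toy model and not HotSpot code: `ToyTlab`, `simulate_fast_refill`, `sync_ends` and `ends_in_sync` are hypothetical names, plain integers stand in for `HeapWord*`, and the asserted condition simply mirrors the `_slow_path_end != _end` check of the quoted hard_end(), written with the field names Erik proposed.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for ThreadLocalAllocBuffer; addresses are plain integers.
class ToyTlab {
 public:
  ToyTlab(std::size_t end, std::size_t alignment_reserve)
      : _slow_path_end(end),
        _allocation_end(end),
        _last_slow_path_end(end),
        _alignment_reserve(alignment_reserve) {}

  // Models a fast-path refill that moves the end without going through the
  // slow path (hypothetical helper, only here to drive the example).
  void simulate_fast_refill(std::size_t new_end) { _slow_path_end = new_end; }

  // True when no fast refill has happened since the last synchronization.
  bool ends_in_sync() const { return _last_slow_path_end == _slow_path_end; }

  // The update that used to hide inside the getter, now an explicit step
  // that a caller (make_parsable() in the thread's example) runs first.
  void sync_ends() {
    if (!ends_in_sync()) {
      _last_slow_path_end = _slow_path_end;
      _allocation_end = _slow_path_end;
    }
  }

  // Side-effect-free getter: it only checks the invariant.
  std::size_t reserved_end() const {
    assert(ends_in_sync() && "caller must synchronize the ends first");
    return _allocation_end + _alignment_reserve;
  }

 private:
  std::size_t _slow_path_end;       // was "end"
  std::size_t _allocation_end;      // was "actual_end"
  std::size_t _last_slow_path_end;  // was "slow_path_end"
  std::size_t _alignment_reserve;
};
```

With this shape, a caller that forgets `sync_ends()` after a fast refill trips the assert in a debug build instead of silently getting updated fields.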
>>>> >
>>>> > Not sure it's better but before updating the webrev, I wanted to
>>>> > try to get input/consensus :)
>>>> >
>>>> > (Note hard_end was always further off than end).
>>>> >
>>>> > src/hotspot/share/prims/jvmti.xml:
>>>> >
>>>> >  10357 id="can_sample_heap" since="9">
>>>> >  10358
>>>> >  10359 Can sample the heap.
>>>> >  10360 If this capability is enabled then the heap sampling
>>>> >        methods can be called.
>>>> >  10361
>>>> >  10362
>>>> >
>>>> > Looks like this capability should not be "since 9" if it gets
>>>> > integrated now.
>>>> >
>>>> > Updated now to 11, crossing my fingers :)
>>>> >
>>>> > src/hotspot/share/runtime/heapMonitoring.cpp:
>>>> >
>>>> >  448       if (is_alive->do_object_b(value)) {
>>>> >  449         // Update the oop to point to the new object if it is still alive.
>>>> >  450         f->do_oop(&(trace.obj));
>>>> >  451
>>>> >  452         // Copy the old trace, if it is still live.
>>>> >  453         _allocated_traces->at_put(curr_pos++, trace);
>>>> >  454
>>>> >  455         // Store the live trace in a cache, to be served up on /heapz.
>>>> >  456         _traces_on_last_full_gc->append(trace);
>>>> >  457
>>>> >  458         count++;
>>>> >  459       } else {
>>>> >  460         // If the old trace is no longer live, add it to

From jini.george at oracle.com  Thu Apr 12 04:21:54 2018
From: jini.george at oracle.com (Jini George)
Date: Thu, 12 Apr 2018 09:51:54 +0530
Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when
 attached to a process with CDS
In-Reply-To: <93a055d6-133c-f92f-3408-368eea959326@oracle.com>
References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com>
Message-ID: <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com>

Ping: Gentle reminder !

Thanks,
Jini.

On 4/6/2018 9:51 PM, Jini George wrote:
> Hello!
>
> Requesting reviews for: https://bugs.openjdk.java.net/browse/JDK-8174994
> Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/
>
> While trying to identify the type given an address, a WrongTypeException
> was getting thrown with various clhsdb commands (like printmdo, jstack,
> etc). This was since SA tries to map an address to a hotspot C++ type by
> comparing the vtable address to the vtable address values of known
> types. With CDS, since the vtables are copied over for the Metadata
> classes, the vtable addresses themselves don't match (though, of course,
> the contents will), and SA errors out.
>
> The fix has been implemented by making changes to read in the md region
> (consisting of the c++ vtables) of the CDS archive in SA, and mapping
> the vtable addresses to the corresponding metadata type (ConstantPool,
> InstanceKlass, InstanceClassLoaderKlass, InstanceMirrorKlass,
> InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass).
>
> For corefiles, an additional modification has been done to have the
> replicated FileMapHeader structure (from
> src/hotspot/share/memory/filemap.hpp, which is replicated in SA in
> ps_core.c), to be in sync with the corresponding definition in
> src/hotspot/share/memory/filemap.hpp.
>
> Test cases to test both live and corefile debugging are being added with
> this. These and other SA tests pass on Mach5.
>
> Thanks,
> Jini.

From karen.kinnear at oracle.com  Thu Apr 12 15:15:25 2018
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Thu, 12 Apr 2018 11:15:25 -0400
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To:
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>
 <0A5AD1F8-EBDD-47EA-9741-1AE6B1B3971A@oracle.com>
Message-ID:

JC,

> On Apr 11, 2018, at 8:17 PM, JC Beyler wrote:
>
> Hi Karen,
>
> I put up a new webrev that is feature complete in my mind in terms of
> implementation.
> There could be a few tidbits of optimizations here and there but I
> believe everything is now there in terms of features and there is the
> question of placement of collectors (I've now put it lower than what you
> talk about in this email thread).

I believe that the primary goal of your JEP is to catch samples of
allocation due to bytecodes. Given that, it makes sense to put the
collectors in the code generators, so I am ok with your leaving them
where they are.

And it would save us all a lot of cycles if, rather than sending frequent
webrev updates with subsets of the requested changes, you could wait
until you've added all the changes people requested. That would increase
the signal to noise ratio.

>
> The incremental webrev is here:
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.12_13/
> and the full webrev is here:
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.13/
>
> The incremental webrev contains the name change of TLAB fields as per the
> conversation going on in the GC thread and the Graal changes to make this
> webrev not break anything. It also contains a test for when VM and
> sampled events are enabled at the same time.
>
> I'll answer your questions inlined below:
>
> I have a couple of design questions before getting to the detailed code
> review:
>
> 1. jvmtiExport.cpp
> You have simplified the JvmtiObjectAllocEventCollector to assume there is
> only a single object.
> Do you have a test that allocates a multi-dimensional array?
> I would expect that to have multiple subarrays - and need the logic that
> you removed.
>
> I "think" you misread this. I did not change the implementation of
> JvmtiObjectAllocEventCollector to assume only one object.
> Actually the implementation is still doing what it was doing initially
> but now JvmtiObjectAllocEventCollector is a main class with two
> inherited behaviors:
> - JvmtiVMObjectAllocEventCollector is following the same logic as before
> - JvmtiSampledObjectAllocEventCollector has the logic of a single
>   allocation during collection since it is placed that low. I'm thinking
>   of perhaps separating the two classes now that the sampled case is so
>   low and does not require the extra logic of handling a growable array.
>
> I don't have a test that tests multi-dimensional array. I have not looked
> at what exactly VMObjectAllocEvents track but I did do an example to see:
>
> - On a clean JDK, if I allocate:
>     int[][][] myAwesomeTable = new int[10][10][10];
>
>   I get one single VMObject call back.
>
> - With my change, I get the same behavior.
>
> So instead, I did an object containing an array and cloned it. With the
> clean JDK I get two VM callbacks with and without my change.
>
> I'll add a test in the next webrev to ensure correctness.
>
> *** Please add a test in which you allocate a multi-dimensional array -
> and try it with your earlier version which will show the full set of
> allocated objects
>
> As I said, this works because it is so low in the system. I don't know
> the internals of the allocation of a three-dimensional array but:
> - If I try to collect all allocations required for the new
>   int[10][10][10], I get more than a hundred allocation callbacks if the
>   sample rate is 0, which is what I'd expect (getting a callback at each
>   allocation, so what I expect)
>
> So though I only collect one object, I get a callback for each if that is
> what we want in regard to the sampling rate.

I'll let you work this one out with Serguei - if he is ok with your
change to not have a growable array and only report one object, given
that you are sampling, sounds like that is not a functional loss for you.

And yes, please do add the test.

> 2. Tests - didn't read them all - just ran into a hardcoded "10% error
> ensures a sanity test without becoming flaky"
> do check with Serguei on this - that looks like a potential future test
> failure
>
> Because the sampling rate is a geometric variable around a mean of the
> sampling rate, any meaningful test is going to have to be statistical.
> Therefore, I do a bit of error acceptance to allow us to test for the
> real thing and not hack the code to have less "real" tests. This is what
> we do internally, let me know if you want me to do it otherwise.
>
> Flaky is probably a wrong term or perhaps I need to better explain this.
> I'll change the comment in the tests to explain that potential flakiness
> comes from the nature of the geometrical mean. Because we don't want too
> long running tests, it makes sense to me to have this error percentage.
>
> Let me now answer the comments from the other email here as well so we
> have all answers and conversations in a single thread:
>> - Note there is one current caveat: if the agent requests the
>> VMObjectAlloc event, the sampler defers to that event due to a
>> limitation in its implementation (ie: I am not convinced I can safely
>> send out that event with the VM collector enabled, I'll happily white
>> board that).
> Please work that one out with Serguei and the serviceability folks.
>
> Agreed. I'll follow up with Serguei if this is a potential problem, I
> have to double check and ensure I am right that there is an issue. I see
> it as just a matter of life and not a problem for now. IF you do want
> both events, having a sample drop due to this limitation does not
> invalidate the system in my mind. I could be wrong about it though and
> would happily go over what I saw.
>
>> So the Heap Sampling Monitoring System used to have more methods. It
>> made sense to have them in a separate category. I now have moved it to
>> the memory category to be consistent and grouped there. I also removed
>> that link btw.
> Thanks.
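
The statistical check described above (sampling intervals drawn around a mean, then accepted within an error percentage) can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, the inverse-transform draw is a textbook way to obtain a geometric variable and is not claimed to be what the patch does, and the 512k mean and 10% tolerance are simply the figures mentioned in this thread. The sketch assumes a mean larger than one byte.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Draws one sampling interval (in bytes) from a geometric distribution with
// the requested mean, using inverse-transform sampling.
std::uint64_t next_sample_interval(std::mt19937& rng, double mean_bytes) {
  // u is uniform in (0, 1], so log(u) is finite and <= 0.
  const double u = (static_cast<double>(rng()) + 1.0) / 4294967296.0;
  const double p = 1.0 / mean_bytes;  // per-byte sampling probability
  // log1p(-p) is log(1 - p), computed accurately for small p.
  return static_cast<std::uint64_t>(
             std::floor(std::log(u) / std::log1p(-p))) + 1;
}

// Average of `draws` intervals for a fixed seed, so runs are repeatable.
double average_interval(unsigned seed, double mean_bytes, int draws) {
  std::mt19937 rng(seed);
  double total = 0.0;
  for (int i = 0; i < draws; ++i) {
    total += static_cast<double>(next_sample_interval(rng, mean_bytes));
  }
  return total / draws;
}

// Relative-error check of the kind a statistical test needs.
bool within_tolerance(double observed, double expected, double tolerance) {
  return std::fabs(observed - expected) <= tolerance * expected;
}
```

For a geometric variable the standard deviation is close to the mean, so averaging N draws leaves a relative error of roughly 1/sqrt(N). With 100000 draws that is about 0.3%, which is why a 10% band can stay stable without making the test long-running.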
>
> Actually it did seem weird to put it there since there was only
> Allocate/Deallocate, so for now the method is once again in its own
> category. If someone has a better spot, let me know.
>
>> I was trying to figure out a way to put the collectors farther down the
>> call stack so as to both catch more cases and to reduce the maintenance
>> burden - i.e. if you were to add a new code generator, e.g. Graal - if
>> it were to go through an existing interface, that might be a place to
>> already have a collector.
>>
>> I do not know the Graal sources - I did look at jvmci/jvmciRuntime.cpp -
>> and it appears that there are calls to instanceKlass::new_instance,
>> oopFactory::new_typeArray/new_ObjArray and ArrayKlass::multi-allocate,
>> so one possibility would be to put hooks in those calls which would
>> catch many? (I did not do a thorough search) of the slowpath calls for
>> the bytecodes, and then check the fast paths in detail.
>>
>> I'll come to a major issue with the collector and its placement in the
>> next paragraph.
> Still not clear on why you did not move the collectors into
> instanceKlass::new_instance and oopFactory::newtypeArray/newObjArray
> and ArrayKlass::multi-allocate.

As I said above, I am ok with your leaving the collectors in the code
generators since that is your focus.

>
> I think what was happening is that the collectors would wrap the objects
> in handles but references to the originally allocated object would be on
> the stack still and would not get updated if required. Due to that issue,
> I believe I was getting weird bugs.
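
The failure mode described just above (a handle gets updated when the collector moves an object, while a raw reference left on the stack does not) can be illustrated with a toy model. Nothing here is HotSpot code: `ToyHeap`, `register_handle` and `collect_and_move` are hypothetical names, an "address" is just a vector index, and the "collection" moves every object by one slot purely to make the effect visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap: an object's "address" is its index in `cells`.
struct ToyHeap {
  std::vector<int> cells;
  std::vector<std::size_t*> handles;  // slots the collector knows about

  std::size_t allocate(int value) {
    cells.push_back(value);
    return cells.size() - 1;
  }

  // Registers a slot so that a later move can fix it up.
  void register_handle(std::size_t* slot) { handles.push_back(slot); }

  // Simulated moving collection: every object slides up by one cell and the
  // vacated cell 0 is filled with a marker value.
  void collect_and_move() {
    cells.insert(cells.begin(), -1);
    for (std::size_t* slot : handles) {
      ++*slot;
    }
  }

  int load(std::size_t address) const { return cells.at(address); }
};
```

If a value 42 is allocated, one copy of its address is registered as a handle and another is kept as a plain local, then after `collect_and_move()` the handle still loads 42 while the plain local loads the marker value -1.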
>
> Because of this, it seems that any VM collector enabled has to guarantee
> that either:
> - its path to destruction (and thus posting of events) has no means of
>   triggering a GC that would move things around (as long as you are in
>   VM code you should be fine I believe)
> - if GC is occurring, the objects in its internal array are not somewhere
>   on the stack without a handle around them to be able to be moved if
>   need be from a GC operation.
>
> I'm not convinced this holds in the multithreaded with sampling and VM
> collection cases.

I will let you and Serguei work out whether you have sufficient test
coverage for the multithreaded cases.

thanks,
Karen

>
>> I had wondered if it made sense to move the hooks even farther down,
>> into CollectedHeap:obj_allocate and array_allocate.
>> I do not think so. First reason is that for multidimensional arrays,
>> ArrayKlass::multi_allocate the outer dimension array would have an event
>> before storing the inner sub-arrays and I don't think we want that
>> exposed, so that won't work for arrays.
>>
>> So the major difficulty is that the steps of collection do this:
>>
>> - An object gets allocated and is decided to be sampled
>> - The original pointer placement (where it resides originally in
>>   memory) is passed to the collector
>> - Now one important thing of note:
>>   (a) In the VM code, until the point where the oop is going to be
>>       returned, GC is not yet aware of it
>>   (b) so the collector can't yet send it out to the user via JVMTI
>>       otherwise, the agent could put a weak reference for example
>>
>> I'm a bit fuzzy on this and maybe it's just that there would be more
>> heavy lifting to make this possible but my initial tests seem to show
>> problems when attempting this in the obj_allocate area.
> Not sure what you are seeing here -
>
> Let me state it the way I understand it.
> 1) You can collect the object into internal metadata at allocation point
>    - which you already had
>    Note: see comment in JvmtiExport.cpp:
>    // In the case of the sampled object collector, we don't want to perform the
>    // oops_do because the object in the collector is still stored in registers
>    // on the VM stack
>    - so GC will find these objects as roots, once we allow GC to run,
>      which should be after the header is initialized
>
> Totally agree with you that you can not post the event in the source code
> in which you allocate the memory - keep reading
>
> 2) event posting:
>    - you want to ensure that the object has been fully initialized
>    - the object needs to have the header set up - not just the memory
>      allocated - so that applies to all objects
>      (and that is done in a caller of the allocation code - so it can't
>      be done at the location which does the memory allocation)
>    - the example I pointed out was the multianewarray case - all the
>      subarrays need to be allocated
>
>    - please add a test case for multi-array, so as you experiment with
>      where to post the event, you ensure that you can access the
>      subarrays (e.g. a 3D array of length 5 has 5 2D arrays as subarrays)
>
> Technically it's more than just initialized, it is the fact that you
> cannot perform a callback about an object if any object of that thread is
> being held by a collector and also in a register/stack space without
> protections.
>
>    - prior to setting up the object header information, GC would not know
>      about the object
>    - was this by chance the issue you ran into?
>
> No I believe the issue I was running into was above where an object on
> the stack was pointing to an oop that got moved during a GC due to that
> thread doing an event callback.
>
> Thanks for your help,
> Jc
>
> 3) event posting
>    - when you post the event to JVMTI
>    - in JvmtiObjectAllocEventMark: sets _jobj (object)to_jobject(obj),
>      which creates JNIHandles::make_local(_thread, obj)
>>
>> The second reason is that I strongly suspect the scope you want is
>> bytecodes only. I think once you have added hooks to all the fast paths
>> and slow paths that this will be pushing the performance overhead
>> constraints you proposed and you won't want to see e.g. internal
>> allocations.
>>
>> Yes agreed, allocations from bytecodes are mostly our concern generally :)
>>
>> But I think you need to experiment with the set of allocations (or
>> possible alternative sets of allocations) you want recorded.
>>
>> The hooks I see today include:
>> Interpreter: (looking at x86 as a sample)
>> - slowpath in InterpreterRuntime
>> - fastpath tlab allocation - your new threshold check handles that
>>
>> Agreed
>>
>> - allow_shared_alloc (GC specific): for _new isn't handled
>>
>> Where is that exactly? I can check why we are not catching it?
>>
>> C1
>> I don't see changes in c1_Runtime.cpp
>> note: you also want to look for the fast path
>>
>> I added the calls to c1_Runtime in the latest webrev, but was still
>> going through testing before pushing it out. I had waited on this one a
>> bit. Fast path would be handled by the threshold check no?
>>
>> C2: changes in opto/runtime.cpp for slow path
>> did you also catch the fast path?
>>
>> Fast path gets handled by the same threshold check, no? Perhaps I've
>> missed something (very likely)?
>>
>> 3. Performance -
>> After you get all the collectors added - you need to rerun the
>> performance numbers.
>>
>> Agreed :)
>>
>> thanks,
>> Karen
>>
>>> On Apr 5, 2018, at 2:15 PM, JC Beyler wrote:
>>>
>>> Thanks Boris and Derek for testing it.
>>>
>>> Yes I was trying to get a new version out that had the tests ported as
>>> well but got sidetracked while trying to add tests and two new features.
>>>
>>> Here is the incremental webrev:
>>>
>>> Here is the full webrev:
>>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/
>>>
>>> Basically, the new tests assert this:
>>> - Only one agent can currently ask for the sampling, I'm currently
>>>   seeing if I can push to a next webrev the multi-agent support to
>>>   start doing a code freeze on this one
>>> - The event is not thread-enabled, meaning like the
>>>   VMObjectAllocationEvent, it's an all or nothing event; same as the
>>>   multi-agent, I'm going to see if a future webrev to add the support
>>>   is a better idea to freeze this webrev a bit
>>>
>>> There was another item that I added here and I'm unsure this webrev is
>>> stable in debug mode: I added an assertion system to ascertain that all
>>> paths leading to a TLAB slow path (and hence a sampling point) have a
>>> sampling collector ready to post the event if a user wants it. This
>>> might break a few things in debug mode as I'm working through the kinks
>>> of that as well. However, in release mode, this new webrev passes all
>>> the tests in hotspot/jtreg/serviceability/jvmti/HeapMonitor.
>>>
>>> Let me know what you think,
>>> Jc
>>>
>>> On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich wrote:
>>> Hi JC,
>>>
>>> I have just checked on arm32: your patch compiles and runs ok.
>>>
>>> As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not
>>> correspond to actual library name: libHeapMonitorTest.c ->
>>> libHeapMonitorTest.so
>>>
>>> Boris
>>>
>>> On 04.04.2018 01:54, White, Derek wrote:
>>> > Thanks JC,
>>> >
>>> > New patch applies cleanly. Compiles and runs (simple test programs) on
>>> > aarch64.
>>> >
>>> > - Derek
>>> > >>> > >>> > I'm slowly pushing on the FastTLABRefill >>> > (https://bugs.openjdk.java.net/browse/JDK-8194084 ), >>> > I agree on keeping both separate for now though >>> > so that we can think of both differently >>> > >>> > Now, below, inlined are my answers: >>> > >>> > On Fri, Feb 2, 2018 at 8:44 AM, Erik >>> > ?sterlund >>> > >>> > >> wrote: >>> > >>> > Hi JC, >>> > >>> > Hope I am reviewing the right >>> > version of your work. Here goes... >>> > >>> > src/hotspot/share/gc/shared/collectedHeap.inline.hpp: >>> > >>> > 159 >>> > AllocTracer::send_allocation_outside_tlab(klass, result, size * >>> > HeapWordSize, THREAD); >>> > 160 >>> > 161 >>> > THREAD->tlab().handle_sample(THREAD, result, size); >>> > 162 return result; >>> > 163 } >>> > >>> > Should not call tlab()->X without >>> > checking if (UseTLAB) IMO. >>> > >>> > Done! >>> > >>> > >>> > More about this later. >>> > >>> > src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: >>> > >>> > So first of all, there seems to >>> > quite a few ends. There is an "end", >>> > a "hard >>> > end", a "slow path end", and an >>> > "actual end". Moreover, it seems >>> > like the >>> > "hard end" is actually further away >>> > than the "actual end". So the "hard end" >>> > seems like more of a "really >>> > definitely actual end" or something. >>> > I don't >>> > know about you, but I think it looks >>> > kind of messy. In particular, I don't >>> > feel like the name "actual end" >>> > reflects what it represents, >>> > especially when >>> > there is another end that is behind >>> > the "actual end". >>> > >>> > 413 HeapWord* >>> > ThreadLocalAllocBuffer::hard_end() { >>> > 414 // Did a fast TLAB refill >>> > occur? >>> > 415 if (_slow_path_end != _end) { >>> > 416 // Fix up the actual end >>> > to be now the end of this TLAB. 
>>> > 417 _slow_path_end = _end; >>> > 418 _actual_end = _end; >>> > 419 } >>> > 420 >>> > 421 return _actual_end + >>> > alignment_reserve(); >>> > 422 } >>> > >>> > I really do not like making getters >>> > unexpectedly have these kind of side >>> > effects. It is not expected that >>> > when you ask for the "hard end", you >>> > implicitly update the "slow path >>> > end" and "actual end" to new values. >>> > >>> > As I said, a lot of this is due to the >>> > FastTlabRefill. If I make this >>> > not supporting FastTlabRefill, this goes >>> > away. The reason the system >>> > needs to update itself at the get is >>> > that you only know at that get if >>> > things have shifted underneath the tlab >>> > slow path. I am not sure of >>> > really better names (naming is hard!), >>> > perhaps we could do these >>> > names: >>> > >>> > - current_tlab_end // Either the >>> > allocated tlab end or a sampling point >>> > - last_allocation_address // The end of >>> > the tlab allocation >>> > - last_slowpath_allocated_end // In >>> > case a fast refill occurred the >>> > end might have changed, this is to >>> > remember slow vs fast past refills >>> > >>> > the hard_end method can be renamed to >>> > something like: >>> > tlab_end_pointer() // The end of >>> > the lab including a bit of >>> > alignment reserved bytes >>> > >>> > Those names sound better to me. Could you >>> > please provide a mapping from the old names >>> > to the new names so I understand which one >>> > is which please? >>> > >>> > This is my current guess of what you are >>> > proposing: >>> > >>> > end -> current_tlab_end >>> > actual_end -> last_allocation_address >>> > slow_path_end -> last_slowpath_allocated_end >>> > hard_end -> tlab_end_pointer >>> > >>> > Yes that is correct, that was what I was proposing. 
>>> > >>> > I would prefer this naming: >>> > >>> > end -> slow_path_end // the end for taking a >>> > slow path; either due to sampling or refilling >>> > actual_end -> allocation_end // the end for >>> > allocations >>> > slow_path_end -> last_slow_path_end // last >>> > address for slow_path_end (as opposed to >>> > allocation_end) >>> > hard_end -> reserved_end // the end of the >>> > reserved space of the TLAB >>> > >>> > About setting things in the getter... that >>> > still seems like a very unpleasant thing to >>> > me. It would be better to inspect the call >>> > hierarchy and explicitly update the ends >>> > where they need updating, and assert in the >>> > getter that they are in sync, rather than >>> > implicitly setting various ends as a >>> > surprising side effect in a getter. It looks >>> > like the call hierarchy is very small. With >>> > my new naming convention, reserved_end() >>> > would presumably return _allocation_end + >>> > alignment_reserve(), and have an assert >>> > checking that _allocation_end == >>> > _last_slow_path_allocation_end, complaining >>> > that this invariant must hold, and that a >>> > caller to this function, such as >>> > make_parsable(), must first explicitly >>> > synchronize the ends as required, to honor >>> > that invariant. >>> > >>> > >>> > I've renamed the variables to how you preferred >>> > it except for the _end one. I did: >>> > >>> > current_end >>> > >>> > last_allocation_address >>> > >>> > tlab_end_ptr >>> > >>> > The reason is that the architecture dependent >>> > code use the thread.hpp API and it already has >>> > tlab included into the name so it becomes >>> > tlab_current_end (which is better that >>> > tlab_current_tlab_end in my opinion). >>> > >>> > I also moved the update into a separate method >>> > with a TODO that says to remove it when >>> > FastTLABRefill is deprecated >>> > >>> > This looks a lot better now. Thanks. 
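The suggestion above (synchronize the ends explicitly and let the getter only assert) can be sketched as follows. This is a minimal model, not the HotSpot code: the class name `TlabEnds`, the methods `set_sample_point`, `fast_refill`, `in_sync` and `sync_ends`, and the use of plain integers for addresses are all assumptions made for illustration. The field names loosely follow the naming proposed in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified model of the TLAB end fields; not HotSpot code.
// Addresses are plain integers so the sketch stays self-contained.
class TlabEnds {
 public:
  TlabEnds(std::uintptr_t end, std::size_t alignment_reserve)
      : _slow_path_end(end),
        _last_slow_path_end(end),
        _allocation_end(end),
        _alignment_reserve(alignment_reserve) {}

  // Slow path: pull the fast-path limit in to a sampling point.
  // Both slow-path fields move together, so the ends stay in sync.
  void set_sample_point(std::uintptr_t sample_point) {
    assert(sample_point <= _allocation_end);
    _slow_path_end = sample_point;
    _last_slow_path_end = sample_point;
  }

  // Models a fast-path refill that moves the limit without telling
  // the slow path (the FastTLABRefill case discussed in the thread).
  void fast_refill(std::uintptr_t new_end) { _slow_path_end = new_end; }

  // True when no fast refill has happened since the last slow-path update.
  bool in_sync() const { return _slow_path_end == _last_slow_path_end; }

  // Explicit synchronization, called by whoever needs a consistent view.
  void sync_ends() {
    if (!in_sync()) {
      _last_slow_path_end = _slow_path_end;
      _allocation_end = _slow_path_end;
    }
  }

  // Pure getter: asserts the invariant instead of repairing it silently.
  std::uintptr_t reserved_end() const {
    assert(in_sync() && "caller must call sync_ends() first");
    return _allocation_end + _alignment_reserve;
  }

 private:
  std::uintptr_t _slow_path_end;       // limit checked by the fast path
  std::uintptr_t _last_slow_path_end;  // limit as last set by the slow path
  std::uintptr_t _allocation_end;      // real end for allocations
  std::size_t _alignment_reserve;      // bytes reserved past the allocation end
};
```

In this model a caller such as a hypothetical make_parsable() would call `sync_ends()` before asking for `reserved_end()`, which keeps the side effect visible at the call site rather than hidden in the getter.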
>>> > >>> > Note that the following comment now needs updating >>> > accordingly in threadLocalAllocBuffer.hpp: >>> > >>> > 41 // Heap sampling is performed via >>> > the end/actual_end fields. >>> > >>> > 42 // actual_end contains the real end >>> > of the tlab allocation, >>> > >>> > 43 // whereas end can be set to an >>> > arbitrary spot in the tlab to >>> > >>> > 44 // trip the return and sample the >>> > allocation. >>> > >>> > 45 // slow_path_end is used to track >>> > if a fast tlab refill occured >>> > >>> > 46 // between slowpath calls. >>> > >>> > There might be other comments too, I have not looked >>> > in detail. >>> > >>> > This was the only spot that still had an actual_end, I >>> > fixed it now. I'll do a sweep to double check other >>> > comments. >>> > >>> > >>> > >>> > Not sure it's better but before updating >>> > the webrev, I wanted to try >>> > to get input/consensus :) >>> > >>> > (Note hard_end was always further off >>> > than end). >>> > >>> > src/hotspot/share/prims/jvmti.xml: >>> > >>> > 10357 >> > id="can_sample_heap" since="9"> >>> > 10358 >>> > 10359 Can sample the heap. >>> > 10360 If this capability >>> > is enabled then the heap sampling >>> > methods >>> > can be called. >>> > 10361 >>> > 10362 >>> > >>> > Looks like this capability should >>> > not be "since 9" if it gets integrated >>> > now. >>> > >>> > Updated now to 11, crossing my fingers :) >>> > >>> > src/hotspot/share/runtime/heapMonitoring.cpp: >>> > >>> > 448 if >>> > (is_alive->do_object_b(value)) { >>> > 449 // Update the oop to >>> > point to the new object if it is still >>> > alive. >>> > 450 f->do_oop(&(trace.obj)); >>> > 451 >>> > 452 // Copy the old >>> > trace, if it is still live. >>> > 453 >>> > _allocated_traces->at_put(curr_pos++, trace); >>> > 454 >>> > 455 // Store the live >>> > trace in a cache, to be served up on >>> > /heapz. 
>>> > 456 _traces_on_last_full_gc->append(trace);
>>> > 457
>>> > 458 count++;
>>> > 459 } else {
>>> > 460 // If the old trace is no longer live, add it to

From doug.simon at oracle.com Thu Apr 12 18:24:33 2018
From: doug.simon at oracle.com (Doug Simon)
Date: Thu, 12 Apr 2018 20:24:33 +0200
Subject: RFR: 8187490: HotSpotRuntimeMBean should be moved to Graal management module
Message-ID: <324AC8E7-9737-4DB9-875E-A7FDB551D75E@oracle.com>

Please review this change that removes the existing Graal service provider for hooking into the Platform MBean Server and makes jdk.internal.vm.compiler.management an upgradeable module. Please refer to
https://bugs.openjdk.java.net/browse/JDK-8187490?focusedCommentId=14170942&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14170942
for discussion on the latter point.

The Graal changes that dynamically register an MBean for accessing Graal will be part of a subsequent Graal update.

http://cr.openjdk.java.net/~dnsimon/8187490/

-Doug

From jcbeyler at google.com Thu Apr 12 21:12:10 2018
From: jcbeyler at google.com (JC Beyler)
Date: Thu, 12 Apr 2018 21:12:10 +0000
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To:
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com> <0A5AD1F8-EBDD-47EA-9741-1AE6B1B3971A@oracle.com>
Message-ID:

Hi Karen,

I apologize for sending too many webrevs. I try/tend to iterate fast and move in an iterative fashion. I also try to solve most, if not all, of the current items that are requested in one go. Perhaps I failed in doing that recently? I apologize for that.

So I promise to not send a new webrev in this email or until I'm pretty sure I got all the current (and any incoming comment/reviews) handled :-)

For the points you brought up:

a) What are we sampling?
In my mind, I'd rather have the sampler sample anything the thread allocates, not only bytecode allocations. I focused on bytecode allocations first to get the system up. While I was stuck figuring out how to get the VM collector and the sampling collector to co-exist, there were a few issues there.

- That has been solved by now delaying the posting of a sampled object if a VM collector is present.

So now that I better understand the interactions between collectors and when an event can be posted, I'm much better placed to talk about the feasibility and validity of the next item about bigger objects.

b) You bring up an excellent point: if we have a multi-array object or a more complex object (such as a cloned object), and the sampler is tripped on an internal allocation, should we send that smaller allocation or should we send the bigger object?

- Because we get the stack trace and, in our JVMTI agent use case, only use the oop to figure out GC information about the liveness of the object, this changes nothing really in practice. I do see value in sending the multi-array object as a whole to a user.

- If that is what you think is best, I can work on getting that supported, and the multi-array test would then prove that if part of the multi-array is sampled, the sampler returns the whole multi-array.

Hopefully that answers your concern about me sending too many webrevs, for which I sincerely apologize. It is probably a learning curve between different approaches to reviews. And I hope that my other answers show the direction you were hoping to see.

Thanks again for all your help,
Jc

On Thu, Apr 12, 2018 at 8:15 AM Karen Kinnear wrote:

> JC,
>
> On Apr 11, 2018, at 8:17 PM, JC Beyler wrote:
>
> Hi Karen,
>
> I put up a new webrev that is feature complete in my mind in terms of
> implementation.
There could be a few tid-bits of optimizations here and > there but I believe everything is now there in terms of features and there > is the question of placement of collectors (I've now put it lower than what > you talk about in this email thread). > > I believe that the primary goal of your JEP is to catch samples of > allocation due to bytecodes. Given that, it makes sense to > put the collectors in the code generators, so I am ok with your leaving > them where they are. > > And it would save us all a lot of cycles if rather than frequent webrev > updates with subsets of the requested changes - if you > could wait until you?ve added all the changes people requested - then that > would increase the signal to noise ratio. > > > The incremental webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.12_13/ > and the full webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.13/ > > The incremental webrev contains the name change of TLAB fields as per the > conversation going on in the GC thread and the Graal changes to make this > webrev not break anything. It also contains a test for when VM and sampled > events are enabled at the same time. > > I'll answer your questions inlined below: > > >> >> I have a couple of design questions before getting to the detailed code >> review: >> >> 1. jvmtiExport.cpp >> You have simplified the JvmtiObjectAllocEventCollector to assume there is >> only a single object. >> Do you have a test that allocates a multi-dimensional array? >> I would expect that to have multiple subarrays - and need the logic that >> you removed. >> > > I "think" you misread this. I did not change the implementation of JvmtiObjectAllocEventCollector > to assume only one object. 
Actually the implementation is still doing what
> it was doing initially but now JvmtiObjectAllocEventCollector is a main
> class with two inherited behaviors:
> - JvmtiVMObjectAllocEventCollector is following the same logic as before
> - JvmtiSampledObjectAllocEventCollector has the logic of a single
> allocation during collection since it is placed that low. I'm thinking of
> perhaps separating the two classes now that the sampled case is so low and
> does not require the extra logic of handling a growable array.
>
> I don't have a test that tests multi-dimensional array. I have not looked
> at what exactly VMObjectAllocEvents track but I did do an example to see:
>
> - On a clean JDK, if I allocate:
> int[][][] myAwesomeTable = new int[10][10][10];
>
> I get one single VMObject call back.
>
> - With my change, I get the same behavior.
>
> So instead, I did an object containing an array and cloned it. With the
> clean JDK I get two VM callbacks with and without my change.
>
> I'll add a test in the next webrev to ensure correctness.
>
>> *** Please add a test in which you allocate a multi-dimensional array -
>> and try it with your earlier version which
>> will show the full set of allocated objects
>>
>
> As I said, this works because it is so low in the system. I don't know the
> internals of the allocation of a three-dimensional array but:
> - If I try to collect all allocations required for the new
> int[10][10][10], I get more than a hundred allocation callbacks if the
> sample rate is 0, which is what I'd expect (getting a callback at each
> allocation, so what I expect)
>
> So though I only collect one object, I get a callback for each if that is
> what we want in regard to the sampling rate.
>
> I'll let you work this one out with Serguei - if he is ok with your change
> to not have a growable array and only report one object,
> given that you are sampling, sounds like that is not a functional loss for
> you.
>
> And yes, please do add the test.
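As a sanity check on the "more than a hundred allocation callbacks" figure above: a Java multi-dimensional array is an array of arrays, so `new int[10][10][10]` creates one outer array, 10 middle arrays and 100 innermost `int[10]` arrays. The sketch below just does that arithmetic; the function name `count_array_objects` is made up for illustration and nothing here is VM code.

```cpp
#include <cstddef>
#include <vector>

// Counts the array objects created by a Java expression such as
// new int[d0][d1]...[dn-1]: one array per node at every nesting level.
// Hypothetical helper for illustration only.
std::size_t count_array_objects(const std::vector<std::size_t>& dims) {
  std::size_t total = 0;
  std::size_t arrays_at_level = 1;  // a single outermost array
  for (std::size_t d : dims) {
    total += arrays_at_level;       // arrays allocated at this level
    arrays_at_level *= d;           // each one holds d sub-arrays
  }
  return total;
}
```

For dimensions {10, 10, 10} this gives 1 + 10 + 100 = 111 array objects, which is consistent with seeing a little over a hundred callbacks when every allocation is sampled.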
> > > >> 2. Tests - didn?t read them all - just ran into a hardcoded ?10% error >> ensures a sanity test without becoming flaky? >> do check with Serguei on this - that looks like a potential future test >> failure >> > > Because the sampling rate is a geometric variable around a mean of the > sampling rate, any meaningful test is going to have to be statistical. > Therefore, I do a bit of error acceptance to allow us to test for the real > thing and not hack the code to have less "real" tests. This is what we do > internally, let me know if you want me to do it otherwise. > > Flaky is probably a wrong term or perhaps I need to better explain this. > I'll change the comment in the tests to explain that potentially flakyness > comes from the nature of the geometrical mean. Because we don't want too > long running tests, it makes sense to me to have this error percentage. > > Let me now answer the comments from the other email here as well so we > have all answers and conversations in a single thread: > >> - Note there is one current caveat: if the agent requests the >> VMObjectAlloc event, the sampler defers to that event due to a limitation >> in its implementation (ie: I am not convinced I can safely send out that >> event with the VM collector enabled, I'll happily white board that). >> >> Please work that one out with Serguei and the serviceability folks. >> >> > Agreed. I'll follow up with Serguei if this is a potential problem, I have > to double check and ensure I am right that there is an issue. I see it as > just a matter of life and not a problem for now. IF you do want both > events, having a sample drop due to this limitation does not invalidate the > system in my mind. I could be wrong about it though and would happily go > over what I saw. > > > >>> So the Heap Sampling Monitoring System used to have more methods. It >>> made sense to have them in a separate category. I now have moved it to the >>> memory category to be consistent and grouped there. 
I also removed that >>> link btw. >>> >> Thanks. >> > > Actually it did seem weird to put it there since there was only > Allocate/Deallocate, so for now the method is still again in its own > category once again. If someone has a better spot, let me know. > > >>> >>>> >>>> I was trying to figure out a way to put the collectors farther down the >>>> call stack so as to both catch more >>>> cases and to reduce the maintenance burden - i.e. if you were to add a >>>> new code generator, e.g. Graal - >>>> if it were to go through an existing interface, that might be a place >>>> to already have a collector. >>>> >>>> I do not know the Graal sources - I did look at jvmci/jvmciRuntime.cpp >>>> - and it appears that there >>>> are calls to instanceKlass::new_instance, >>>> oopFactory::new_typeArray/new_ObjArray and ArrayKlass::multi-allocate, >>>> so one possibility would be to put hooks in those calls which would >>>> catch many? (I did not do a thorough search) >>>> of the slowpath calls for the bytecodes, and then check the fast paths >>>> in detail. >>>> >>> >>> I'll come to a major issue with the collector and its placement in the >>> next paragraph. >>> >> Still not clear on why you did not move the collectors into >> instanceKlass::new_instance and oopFactory::newtypeArray/newObjArray >> and ArrayKlass::multi-allocate. >> > As I said above, I am ok with your leaving the collectors in the code > generators since that is your focus. > > > I think what was happening is that the collectors would wrap the objects > in handles but references to the originally allocated object would be on > the stack still and would not get updated if required. Due to that issue, I > believe was getting weird bugs. 
> > Because of this, it seems that any VM collector enabled has to guarantee > that either: > - its path to destruction (and thus posting of events) has no means of > triggering a GC that would move things around (as long as you are in VM > code you should be fine I believe) > > - if GC is occuring, the objects in its internal array are not > somewhere on the stack without a handle around them to be able to be moved > if need by from a GC operation. > > > I'm not convinced this holds in the multithreaded with sampling and VM > collection cases. > > I will let you and Serguei work out whether you have sufficient test > coverage for the multithreaded cases. > > thanks, > Karen > > > >>> >>>> >>>> I had wondered if it made sense to move the hooks even farther down, >>>> into CollectedHeap:obj_allocate and array_allocate. >>>> I do not think so. First reason is that for multidimensional arrays, >>>> ArrayKlass::multi_allocate the outer dimension array would >>>> have an event before storing the inner sub-arrays and I don?t think we >>>> want that exposed, so that won?t work for arrays. >>>> >>> >>> So the major difficulty is that the steps of collection do this: >>> >>> - An object gets allocated and is decided to be sampled >>> - The original pointer placement (where it resides originally in memory) >>> is passed to the collector >>> - Now one important thing of note: >>> (a) In the VM code, until the point where the oop is going to be >>> returned, GC is not yet aware of it >>> >> (b) so the collector can't yet send it out to the user via JVMTI >>> otherwise, the agent could put a weak reference for example >>> >>> I'm a bit fuzzy on this and maybe it's just that there would be more >>> heavy lifting to make this possible but my initial tests seem to show >>> problems when attempting this in the obj_allocate area. >>> >> Not sure what you are seeing here - >> >> Let me state it the way I understand it. 
>> 1) You can collect the object into internal metadata at allocation point
>> - which you already had
>> Note: see comment in JvmtiExport.cpp:
>> // In the case of the sampled object collector, we don't want to perform the
>> // oops_do because the object in the collector is still stored in registers
>> // on the VM stack
>> - so GC will find these objects as roots, once we allow GC to run,
>> which should be after the header is initialized
>>
>> Totally agree with you that you can not post the event in the source code
>> in which you allocate the memory - keep reading
>>
>> 2) event posting:
>> - you want to ensure that the object has been fully initialized
>> - the object needs to have the header set up - not just the memory
>> allocated - so that applies to all objects
>> (and that is done in a caller of the allocation code - so it can't be
>> done at the location which does the memory allocation)
>> - the example I pointed out was the multianewarray case - all the
>> subarrays need to be allocated
>>
>> - please add a test case for multi-array, so as you experiment with where
>> to post the event, you ensure that you can
>> access the subarrays (e.g. a 3D array of length 5 has 5 2D arrays as
>> subarrays)
>>
>
> Technically it's more than just initialized, it is the fact that you
> cannot perform a callback about an object if any object of that thread is
> being held by a collector and also in a register/stack space without
> protections.
>
>> - prior to setting up the object header information, GC would not know
>> about the object
>> - was this by chance the issue you ran into?
>>
>
> No I believe the issue I was running into was above where an object on the
> stack was pointing to an oop that got moved during a GC due to that thread
> doing an event callback.
> > Thanks for your help, > Jc > > >> 3) event posting >> - when you post the event to JVMTI >> - in JvmtiObjectAllocEventMark: sets _jobj (object)to_jobject(obj), >> which creates JNIHandles::make_local(_thread, obj) >> >> >>> >>>> >>>> The second reason is that I strongly suspect the scope you want is >>>> bytecodes only. I think once you have added hooks >>>> to all the fast paths and slow paths that this will be pushing the >>>> performance overhead constraints you proposed and >>>> you won?t want to see e.g. internal allocations. >>>> >>> >>> Yes agreed, allocations from bytecodes are mostly our concern generally >>> :) >>> >>> >>>> >>>> >>> But I think you need to experiment with the set of allocations (or >>>> possible alternative sets of allocations) you want recorded. >>>> >>>> The hooks I see today include: >>>> Interpreter: (looking at x86 as a sample) >>>> - slowpath in InterpreterRuntime >>>> - fastpath tlab allocation - your new threshold check handles that >>>> >>> >>> Agreed >>> >>> >>>> - allow_shared_alloc (GC specific): for _new isn?t handled >>>> >>> >>> Where is that exactly? I can check why we are not catching it? >>> >>> >>>> >>>> C1 >>>> I don?t see changes in c1_Runtime.cpp >>>> note: you also want to look for the fast path >>>> >>> >>> I added the calls to c1_Runtime in the latest webrev, but was still >>> going through testing before pushing it out. I had waited on this one a >>> bit. Fast path would be handled by the threshold check no? >>> >>> >>>> C2: changes in opto/runtime.cpp for slow path >>>> did you also catch the fast path? >>>> >>> >>> Fast path gets handled by the same threshold check, no? Perhaps I've >>> missed something (very likely)? >>> >>> >>>> >>>> 3. Performance - >>>> After you get all the collectors added - you need to rerun the >>>> performance numbers. >>>> >>> >>> Agreed :) >>> >>> >>>> >>>> thanks, >>>> Karen >>>> >>>> On Apr 5, 2018, at 2:15 PM, JC Beyler wrote: >>>> >>>> Thanks Boris and Derek for testing it. 
>>>> >>>> Yes I was trying to get a new version out that had the tests ported as >>>> well but got sidetracked while trying to add tests and two new features. >>>> >>>> Here is the incremental webrev: >>>> >>>> Here is the full webrev: >>>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/ >>>> >>>> Basically, the new tests assert this: >>>> - Only one agent can currently ask for the sampling, I'm currently >>>> seeing if I can push to a next webrev the multi-agent support to start >>>> doing a code freeze on this one >>>> - The event is not thread-enabled, meaning like the >>>> VMObjectAllocationEvent, it's an all or nothing event; same as the >>>> multi-agent, I'm going to see if a future webrev to add the support is a >>>> better idea to freeze this webrev a bit >>>> >>>> There was another item that I added here and I'm unsure this webrev is >>>> stable in debug mode: I added an assertion system to ascertain that all >>>> paths leading to a TLAB slow path (and hence a sampling point) have a >>>> sampling collector ready to post the event if a user wants it. This might >>>> break a few thing in debug mode as I'm working through the kinks of that as >>>> well. However, in release mode, this new webrev passes all the tests in >>>> hotspot/jtreg/serviceability/jvmti/HeapMonitor. >>>> >>>> Let me know what you think, >>>> Jc >>>> >>>> On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich < >>>> boris.ulasevich at bell-sw.com> wrote: >>>> >>>>> Hi JC, >>>>> >>>>> I have just checked on arm32: your patch compiles and runs ok. >>>>> >>>>> As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not >>>>> correspond to actual library name: libHeapMonitorTest.c -> >>>>> libHeapMonitorTest.so >>>>> >>>>> Boris >>>>> >>>>> On 04.04.2018 01:54, White, Derek wrote: >>>>> > Thanks JC, >>>>> > >>>>> > New patch applies cleanly. Compiles and runs (simple test programs) >>>>> on >>>>> > aarch64. 
>>>>> > >>>>> > * Derek >>>>> > >>>>> > *From:* JC Beyler [mailto:jcbeyler at google.com] >>>>> > *Sent:* Monday, April 02, 2018 1:17 PM >>>>> > *To:* White, Derek >>>>> > *Cc:* Erik ?sterlund ; >>>>> > serviceability-dev at openjdk.java.net; hotspot-compiler-dev >>>>> > >>>>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >>>>> > >>>>> > Hi Derek, >>>>> > >>>>> > I know there were a few things that went in that provoked a merge >>>>> > conflict. I worked on it and got it up to date. Sadly my lack of >>>>> > knowledge makes it a full rebase instead of keeping all the history. >>>>> > However, with a newly cloned jdk/hs you should now be able to use: >>>>> > >>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/ >>>>> > >>>>> > The change you are referring to was done with the others so perhaps >>>>> you >>>>> > were unlucky and I forgot it in a webrev and fixed it in another? I >>>>> > don't know but it's been there and I checked, it is here: >>>>> > >>>>> > >>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html >>>>> > >>>>> > I double checked that tlab_end_offset no longer appears in any >>>>> > architecture (as far as I can tell :)). >>>>> > >>>>> > Thanks for testing and let me know if you run into any other issues! >>>>> > >>>>> > Jc >>>>> > >>>>> > On Fri, Mar 30, 2018 at 4:24 PM White, Derek >>>> > > wrote: >>>>> > >>>>> > Hi Jc, >>>>> > >>>>> > I?ve been having trouble getting your patch to apply correctly. I >>>>> > may have based it on the wrong version. >>>>> > >>>>> > In any case, I think there?s a missing update to >>>>> > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), >>>>> > where ?JavaThread::tlab_end_offset()? should become >>>>> > ?JavaThread::tlab_current_end_offset()?. >>>>> > >>>>> > This should correspond to the other port?s changes in >>>>> > templateTable_.cpp files. >>>>> > >>>>> > Thanks! 
>>>>> > - Derek >>>>> > >>>>> > *From:* hotspot-compiler-dev >>>>> > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net >>>>> > ] *On >>>>> Behalf >>>>> > Of *JC Beyler >>>>> > *Sent:* Wednesday, March 28, 2018 11:43 AM >>>>> > *To:* Erik ?sterlund >>>> > > >>>>> > *Cc:* serviceability-dev at openjdk.java.net >>>>> > ; >>>>> hotspot-compiler-dev >>>>> > >>>> > > >>>>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >>>>> > >>>>> > Hi all, >>>>> > >>>>> > I've been working on deflaking the tests mostly and the wording >>>>> in >>>>> > the JVMTI spec. >>>>> > >>>>> > Here is the two incremental webrevs: >>>>> > >>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ >>>>> > >>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ >>>>> > >>>>> > Here is the total webrev: >>>>> > >>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ >>>>> > >>>>> > Here are the notes of this change: >>>>> > >>>>> > - Currently the tests pass 100 times in a row, I am working on >>>>> > checking if they pass 1000 times in a row. >>>>> > >>>>> > - The default sampling rate is set to 512k, this is what we >>>>> use >>>>> > internally and having a default means that to enable the sampling >>>>> > with the default, the user only has to do a enable event/disable >>>>> > event via JVMTI (instead of enable + set sample rate). >>>>> > >>>>> > - I deprecated the code that was handling the fast path tlab >>>>> > refill if it happened since this is now deprecated >>>>> > >>>>> > - Though I saw that Graal is still using it so I have to >>>>> see >>>>> > what needs to be done there exactly >>>>> > >>>>> > Finally, using the Dacapo benchmark suite, I noted a 1% overhead >>>>> for >>>>> > when the event system is turned on and the callback to the native >>>>> > agent is just empty. I got a 3% overhead with a 512k sampling >>>>> rate >>>>> > with the code I put in the native side of my tests. 
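Elsewhere in this thread the sampling rate is described as a geometric variable around a mean, which is why the tests accept a 10% error margin. The sketch below illustrates that idea only: it assumes the 512k default means 512 * 1024 bytes, uses an exponential distribution as a continuous stand-in for the geometric one, and every function name in it is hypothetical. It is not the sampler from the webrev.

```cpp
#include <cmath>
#include <cstddef>
#include <random>

// Draws the number of bytes to allocate before the next sample.
// Assumption: an exponential distribution with the given mean stands in
// for the geometric variable mentioned in the thread.
double next_sample_interval(std::mt19937_64& rng, double mean_bytes) {
  std::exponential_distribution<double> dist(1.0 / mean_bytes);
  return dist(rng);
}

// Averages many draws; this is the quantity a statistical test can check.
double average_interval(std::mt19937_64& rng, double mean_bytes,
                        std::size_t draws) {
  double sum = 0.0;
  for (std::size_t i = 0; i < draws; ++i) {
    sum += next_sample_interval(rng, mean_bytes);
  }
  return sum / static_cast<double>(draws);
}

// True when observed is within a relative tolerance (e.g. 0.10) of expected.
bool within_tolerance(double observed, double expected, double tolerance) {
  return std::fabs(observed - expected) <= tolerance * expected;
}
```

Individual intervals vary widely, so a test can only compare an average over many draws against the configured mean, with a tolerance wide enough that a correct sampler essentially never fails.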
>>>>> > >>>>> > Thanks and comments are appreciated, >>>>> > >>>>> > Jc >>>>> > >>>>> > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler >>>> > > wrote: >>>>> > >>>>> > Hi all, >>>>> > >>>>> > The incremental webrev update is here: >>>>> > >>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ >>>>> > >>>>> > The full webrev is here: >>>>> > >>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ >>>>> > >>>>> > Major change here is: >>>>> > >>>>> > - I've removed the heapMonitoring.cpp code in favor of >>>>> just >>>>> > having the sampling events as per Serguei's request; I still >>>>> > have to do some overhead measurements but the tests prove the >>>>> > concept can work >>>>> > >>>>> > - Most of the tlab code is unchanged, the only major >>>>> > part is that now things get sent off to event collectors when >>>>> > used and enabled. >>>>> > >>>>> > - Added the interpreter collectors to handle interpreter >>>>> > execution >>>>> > >>>>> > - Updated the name from SetTlabHeapSampling to >>>>> > SetHeapSampling to be more generic >>>>> > >>>>> > - Added a mutex for the thread sampling so that we can >>>>> > initialize an internal static array safely >>>>> > >>>>> > - Ported the tests from the old system to this new one >>>>> > >>>>> > I've also updated the JEP and CSR to reflect these changes: >>>>> > >>>>> > https://bugs.openjdk.java.net/browse/JDK-8194905 >>>>> > >>>>> > https://bugs.openjdk.java.net/browse/JDK-8171119 >>>>> > >>>>> > In order to make this have some forward progress, I've >>>>> removed >>>>> > the heap sampling code entirely and now rely entirely on the >>>>> > event sampling system. The tests reflect this by using a >>>>> > simplified implementation of what an agent could do: >>>>> > >>>>> > >>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c >>>>> > >>>>> > (Search for anything mentioning event_storage). 
>>>>> > >>>>> > I have not taken the time to port the whole code we had >>>>> > originally in heapMonitoring to this. I hesitate only because >>>>> > that code was in C++, I'd have to port it to C and this is >>>>> for >>>>> > tests so perhaps what I have now is good enough? >>>>> > >>>>> > As far as testing goes, I've ported all the relevant tests >>>>> and >>>>> > then added a few: >>>>> > >>>>> > - Turning the system on/off >>>>> > >>>>> > - Testing using various GCs >>>>> > >>>>> > - Testing using the interpreter >>>>> > >>>>> > - Testing the sampling rate >>>>> > >>>>> > - Testing with objects and arrays >>>>> > >>>>> > - Testing with various threads >>>>> > >>>>> > Finally, as overhead goes, I have the numbers of the system >>>>> off >>>>> > vs a clean build and I have 0% overhead, which is what we'd >>>>> > want. This was using the Dacapo benchmarks. I am now >>>>> preparing >>>>> > to run a version with the events on using dacapo and will >>>>> report >>>>> > back here. >>>>> > >>>>> > Any comments are welcome :) >>>>> > >>>>> > Jc >>>>> > >>>>> > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler < >>>>> jcbeyler at google.com >>>>> > > wrote: >>>>> > >>>>> > Hi all, >>>>> > >>>>> > I apologize for the delay but I wanted to add an event >>>>> > system and that took a bit longer than expected and I >>>>> also >>>>> > reworked the code to take into account the deprecation of >>>>> > FastTLABRefill. >>>>> > >>>>> > This update has four parts: >>>>> > >>>>> > A) I moved the implementation from Thread to >>>>> > ThreadHeapSampler inside of Thread. Would you prefer it >>>>> as a >>>>> > pointer inside of Thread or like this works for you? 
>>>>> Second >>>>> > question would be would you rather have an association >>>>> > outside of Thread altogether that tries to remember when >>>>> > threads are live and then we would have something like: >>>>> > >>>>> > ThreadHeapSampler::get_sampling_size(this_thread); >>>>> > >>>>> > I worry about the overhead of this but perhaps it is not >>>>> too >>>>> > too bad? >>>>> > >>>>> > B) I also have been working on the Allocation event >>>>> system >>>>> > that sends out a notification at each sampled event. This >>>>> > will be practical when wanting to do something at the >>>>> > allocation point. I'm also looking at if the whole >>>>> > heapMonitoring code could not reside in the agent code >>>>> and >>>>> > not in the JDK. I'm not convinced but I'm talking to >>>>> Serguei >>>>> > about it to see/assess :) >>>>> > >>>>> > - Also added two tests for the new event subsystem >>>>> > >>>>> > C) Removed the slow_path fields inside the TLAB code >>>>> since >>>>> > now FastTLABRefill is deprecated >>>>> > >>>>> > D) Updated the JVMTI documentation and specification for >>>>> the >>>>> > methods. >>>>> > >>>>> > So the incremental webrev is here: >>>>> > >>>>> > >>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ >>>>> > >>>>> > and the full webrev is here: >>>>> > >>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 >>>>> > >>>>> > I believe I have updated the various JIRA issues that >>>>> track >>>>> > this :) >>>>> > >>>>> > Thanks for your input, >>>>> > >>>>> > Jc >>>>> > >>>>> > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler >>>>> > > >>>>> wrote: >>>>> > >>>>> > Hi Erik, >>>>> > >>>>> > I inlined my answers, which the last one seems to >>>>> answer >>>>> > Robbin's concerns about the same thing (adding >>>>> things to >>>>> > Thread). >>>>> > >>>>> > On Wed, Feb 14, 2018 at 2:51 AM, Erik ?sterlund >>>>> > >>>> > > wrote: >>>>> > >>>>> > Hi JC, >>>>> > >>>>> > Comments are inlined below. 
>>>>> > >>>>> > On 2018-02-13 06:18, JC Beyler wrote: >>>>> > >>>>> > Hi Erik, >>>>> > >>>>> > Thanks for your answers, I've now inlined my >>>>> own >>>>> > answers/comments. >>>>> > >>>>> > I've done a new webrev here: >>>>> > >>>>> > >>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ >>>>> > < >>>>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.08/> >>>>> > >>>>> > The incremental is here: >>>>> > >>>>> > >>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ >>>>> > < >>>>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.07_08/> >>>>> > >>>>> > Note to all: >>>>> > >>>>> > - I've been integrating changes from >>>>> > Erin/Serguei/David comments so this webrev >>>>> > incremental is a bit an answer to all >>>>> comments >>>>> > in one. I apologize for that :) >>>>> > >>>>> > On Mon, Feb 12, 2018 at 6:05 AM, Erik >>>>> ?sterlund >>>>> > >>>> > > wrote: >>>>> > >>>>> > Hi JC, >>>>> > >>>>> > Sorry for the delayed reply. >>>>> > >>>>> > Inlined answers: >>>>> > >>>>> > >>>>> > >>>>> > On 2018-02-06 00:04, JC Beyler wrote: >>>>> > >>>>> > Hi Erik, >>>>> > >>>>> > (Renaming this to be folded into the >>>>> > newly renamed thread :)) >>>>> > >>>>> > First off, thanks a lot for reviewing >>>>> > the webrev! I appreciate it! >>>>> > >>>>> > I updated the webrev to: >>>>> > >>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ >>>>> > < >>>>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.05a/> >>>>> > >>>>> > And the incremental one is here: >>>>> > >>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ >>>>> > < >>>>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.04_05a/> >>>>> > >>>>> > It contains: >>>>> > - The change for since from 9 to 11 >>>>> for >>>>> > the jvmti.xml >>>>> > - The use of the OrderAccess for >>>>> initialized >>>>> > - Clearing the oop >>>>> > >>>>> > I also have inlined my answers to >>>>> your >>>>> > comments. 
The biggest question >>>>> > will come from the multiple *_end >>>>> > variables. A bit of the logic there >>>>> > is due to handling the slow path >>>>> refill >>>>> > vs fast path refill and >>>>> > checking that the rug was not pulled >>>>> > underneath the slowpath. I >>>>> > believe that a previous comment was >>>>> that >>>>> > TlabFastRefill was going to >>>>> > be deprecated. >>>>> > >>>>> > If this is true, we could revert this >>>>> > code a bit and just do a : if >>>>> > TlabFastRefill is enabled, disable >>>>> this. >>>>> > And then deprecate that when >>>>> > TlabFastRefill is deprecated. >>>>> > >>>>> > This might simplify this webrev and I >>>>> > can work on a follow-up that >>>>> > either: removes TlabFastRefill if >>>>> Robbin >>>>> > does not have the time to do >>>>> > it or add the support to the assembly >>>>> > side to handle this correctly. >>>>> > What do you think? >>>>> > >>>>> > I support removing TlabFastRefill, but I >>>>> > think it is good to not depend on that >>>>> > happening first. >>>>> > >>>>> > >>>>> > I'm slowly pushing on the FastTLABRefill >>>>> > ( >>>>> https://bugs.openjdk.java.net/browse/JDK-8194084), >>>>> > I agree on keeping both separate for now >>>>> though >>>>> > so that we can think of both differently >>>>> > >>>>> > Now, below, inlined are my answers: >>>>> > >>>>> > On Fri, Feb 2, 2018 at 8:44 AM, Erik >>>>> > ?sterlund >>>>> > >>>> > > >>>>> wrote: >>>>> > >>>>> > Hi JC, >>>>> > >>>>> > Hope I am reviewing the right >>>>> > version of your work. Here >>>>> goes... >>>>> > >>>>> > >>>>> src/hotspot/share/gc/shared/collectedHeap.inline.hpp: >>>>> > >>>>> > 159 >>>>> > >>>>> AllocTracer::send_allocation_outside_tlab(klass, result, size * >>>>> > HeapWordSize, THREAD); >>>>> > 160 >>>>> > 161 >>>>> > >>>>> THREAD->tlab().handle_sample(THREAD, result, size); >>>>> > 162 return result; >>>>> > 163 } >>>>> > >>>>> > Should not call tlab()->X without >>>>> > checking if (UseTLAB) IMO. >>>>> > >>>>> > Done! 
>>>>> > >>>>> > >>>>> > More about this later. >>>>> > >>>>> > >>>>> src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: >>>>> > >>>>> > So first of all, there seems to >>>>> > quite a few ends. There is an >>>>> "end", >>>>> > a "hard >>>>> > end", a "slow path end", and an >>>>> > "actual end". Moreover, it seems >>>>> > like the >>>>> > "hard end" is actually further >>>>> away >>>>> > than the "actual end". So the >>>>> "hard end" >>>>> > seems like more of a "really >>>>> > definitely actual end" or >>>>> something. >>>>> > I don't >>>>> > know about you, but I think it >>>>> looks >>>>> > kind of messy. In particular, I >>>>> don't >>>>> > feel like the name "actual end" >>>>> > reflects what it represents, >>>>> > especially when >>>>> > there is another end that is >>>>> behind >>>>> > the "actual end". >>>>> > >>>>> > 413 HeapWord* >>>>> > >>>>> ThreadLocalAllocBuffer::hard_end() { >>>>> > 414 // Did a fast TLAB >>>>> refill >>>>> > occur? >>>>> > 415 if (_slow_path_end != >>>>> _end) { >>>>> > 416 // Fix up the actual >>>>> end >>>>> > to be now the end of this TLAB. >>>>> > 417 _slow_path_end = _end; >>>>> > 418 _actual_end = _end; >>>>> > 419 } >>>>> > 420 >>>>> > 421 return _actual_end + >>>>> > alignment_reserve(); >>>>> > 422 } >>>>> > >>>>> > I really do not like making >>>>> getters >>>>> > unexpectedly have these kind of >>>>> side >>>>> > effects. It is not expected that >>>>> > when you ask for the "hard end", >>>>> you >>>>> > implicitly update the "slow path >>>>> > end" and "actual end" to new >>>>> values. >>>>> > >>>>> > As I said, a lot of this is due to >>>>> the >>>>> > FastTlabRefill. If I make this >>>>> > not supporting FastTlabRefill, this >>>>> goes >>>>> > away. The reason the system >>>>> > needs to update itself at the get is >>>>> > that you only know at that get if >>>>> > things have shifted underneath the >>>>> tlab >>>>> > slow path. 
I am not sure of >>>>> > really better names (naming is >>>>> hard!), >>>>> > perhaps we could do these >>>>> > names: >>>>> > >>>>> > - current_tlab_end // Either >>>>> the >>>>> > allocated tlab end or a sampling >>>>> point >>>>> > - last_allocation_address // The >>>>> end of >>>>> > the tlab allocation >>>>> > - last_slowpath_allocated_end // In >>>>> > case a fast refill occurred the >>>>> > end might have changed, this is to >>>>> > remember slow vs fast past refills >>>>> > >>>>> > the hard_end method can be renamed to >>>>> > something like: >>>>> > tlab_end_pointer() // The end >>>>> of >>>>> > the lab including a bit of >>>>> > alignment reserved bytes >>>>> > >>>>> > Those names sound better to me. Could you >>>>> > please provide a mapping from the old >>>>> names >>>>> > to the new names so I understand which >>>>> one >>>>> > is which please? >>>>> > >>>>> > This is my current guess of what you are >>>>> > proposing: >>>>> > >>>>> > end -> current_tlab_end >>>>> > actual_end -> last_allocation_address >>>>> > slow_path_end -> >>>>> last_slowpath_allocated_end >>>>> > hard_end -> tlab_end_pointer >>>>> > >>>>> > Yes that is correct, that was what I was >>>>> proposing. >>>>> > >>>>> > I would prefer this naming: >>>>> > >>>>> > end -> slow_path_end // the end for >>>>> taking a >>>>> > slow path; either due to sampling or >>>>> refilling >>>>> > actual_end -> allocation_end // the end >>>>> for >>>>> > allocations >>>>> > slow_path_end -> last_slow_path_end // >>>>> last >>>>> > address for slow_path_end (as opposed to >>>>> > allocation_end) >>>>> > hard_end -> reserved_end // the end of >>>>> the >>>>> > reserved space of the TLAB >>>>> > >>>>> > About setting things in the getter... >>>>> that >>>>> > still seems like a very unpleasant thing >>>>> to >>>>> > me. 
It would be better to inspect the >>>>> call >>>>> > hierarchy and explicitly update the ends >>>>> > where they need updating, and assert in >>>>> the >>>>> > getter that they are in sync, rather than >>>>> > implicitly setting various ends as a >>>>> > surprising side effect in a getter. It >>>>> looks >>>>> > like the call hierarchy is very small. >>>>> With >>>>> > my new naming convention, reserved_end() >>>>> > would presumably return _allocation_end + >>>>> > alignment_reserve(), and have an assert >>>>> > checking that _allocation_end == >>>>> > _last_slow_path_allocation_end, >>>>> complaining >>>>> > that this invariant must hold, and that a >>>>> > caller to this function, such as >>>>> > make_parsable(), must first explicitly >>>>> > synchronize the ends as required, to >>>>> honor >>>>> > that invariant. >>>>> > >>>>> > >>>>> > I've renamed the variables to how you >>>>> preferred >>>>> > it except for the _end one. I did: >>>>> > >>>>> > current_end >>>>> > >>>>> > last_allocation_address >>>>> > >>>>> > tlab_end_ptr >>>>> > >>>>> > The reason is that the architecture dependent >>>>> > code use the thread.hpp API and it already >>>>> has >>>>> > tlab included into the name so it becomes >>>>> > tlab_current_end (which is better that >>>>> > tlab_current_tlab_end in my opinion). >>>>> > >>>>> > I also moved the update into a separate >>>>> method >>>>> > with a TODO that says to remove it when >>>>> > FastTLABRefill is deprecated >>>>> > >>>>> > This looks a lot better now. Thanks. >>>>> > >>>>> > Note that the following comment now needs >>>>> updating >>>>> > accordingly in threadLocalAllocBuffer.hpp: >>>>> > >>>>> > 41 // Heap sampling is performed >>>>> via >>>>> > the end/actual_end fields. >>>>> > >>>>> > 42 // actual_end contains the real >>>>> end >>>>> > of the tlab allocation, >>>>> > >>>>> > 43 // whereas end can be set to an >>>>> > arbitrary spot in the tlab to >>>>> > >>>>> > 44 // trip the return and sample >>>>> the >>>>> > allocation. 
>>>>> > >>>>> > 45 // slow_path_end is used to >>>>> track >>>>> > if a fast tlab refill occured >>>>> > >>>>> > 46 // between slowpath calls. >>>>> > >>>>> > There might be other comments too, I have not >>>>> looked >>>>> > in detail. >>>>> > >>>>> > This was the only spot that still had an actual_end, >>>>> I >>>>> > fixed it now. I'll do a sweep to double check other >>>>> > comments. >>>>> > >>>>> > >>>>> > >>>>> > Not sure it's better but before >>>>> updating >>>>> > the webrev, I wanted to try >>>>> > to get input/consensus :) >>>>> > >>>>> > (Note hard_end was always further off >>>>> > than end). >>>>> > >>>>> > >>>>> src/hotspot/share/prims/jvmti.xml: >>>>> > >>>>> > 10357 >>>> > id="can_sample_heap" since="9"> >>>>> > 10358 >>>>> > 10359 Can sample the >>>>> heap. >>>>> > 10360 If this >>>>> capability >>>>> > is enabled then the heap sampling >>>>> > methods >>>>> > can be called. >>>>> > 10361 >>>>> > 10362 >>>>> > >>>>> > Looks like this capability should >>>>> > not be "since 9" if it gets >>>>> integrated >>>>> > now. >>>>> > >>>>> > Updated now to 11, crossing my >>>>> fingers :) >>>>> > >>>>> > >>>>> src/hotspot/share/runtime/heapMonitoring.cpp: >>>>> > >>>>> > 448 if >>>>> > (is_alive->do_object_b(value)) { >>>>> > 449 // Update the oop >>>>> to >>>>> > point to the new object if it is >>>>> still >>>>> > alive. >>>>> > 450 >>>>> f->do_oop(&(trace.obj)); >>>>> > 451 >>>>> > 452 // Copy the old >>>>> > trace, if it is still live. >>>>> > 453 >>>>> > >>>>> _allocated_traces->at_put(curr_pos++, trace); >>>>> > 454 >>>>> > 455 // Store the live >>>>> > trace in a cache, to be served >>>>> up on >>>>> > /heapz. >>>>> > 456 >>>>> > >>>>> _traces_on_last_full_gc->append(trace); >>>>> > 457 >>>>> > 458 count++; >>>>> > 459 } else { >>>>> > 460 // If the old >>>>> trace >>>>> > is no longer live, add it to >>>> >>>> > -------------- next part -------------- An HTML attachment was scrubbed... 
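A minimal sketch of the getter discussion quoted above, assuming a simplified stand-in for the TLAB rather than the real ThreadLocalAllocBuffer class: reserved_end() stays a pure getter that asserts the ends are in sync, and the fix-up after a fast refill is an explicit call. The class TlabModel, its methods, the use of plain integer offsets instead of HeapWord*, and the exact invariant checked are assumptions for illustration; only the field names follow the renaming proposed in the thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of a TLAB's end markers (not HotSpot code).
// Addresses are modelled as plain offsets to keep the sketch self-contained.
class TlabModel {
 public:
  TlabModel(std::size_t end, std::size_t reserve)
      : _slow_path_end(end),
        _allocation_end(end),
        _last_slow_path_end(end),
        _reserve(reserve) {}

  // Slow path: pull the slow-path end in to a sampling point and remember it.
  void set_sample_point(std::size_t point) {
    _slow_path_end = point;
    _last_slow_path_end = point;
  }

  // Models a fast refill done behind the slow path's back:
  // only the slow-path end changes, the remembered value goes stale.
  void fast_refill(std::size_t new_end) { _slow_path_end = new_end; }

  // True when no fast refill happened since the last slow-path update.
  bool ends_in_sync() const { return _last_slow_path_end == _slow_path_end; }

  // Explicit fix-up that callers run before asking for reserved_end().
  void sync_ends_after_fast_refill() {
    if (!ends_in_sync()) {
      _last_slow_path_end = _slow_path_end;
      _allocation_end = _slow_path_end;
    }
  }

  // Pure getter: checks the invariant instead of repairing it as a side effect.
  std::size_t reserved_end() const {
    assert(ends_in_sync() && "caller must synchronize the ends first");
    return _allocation_end + _reserve;
  }

 private:
  std::size_t _slow_path_end;       // where the slow path is taken (sample or refill)
  std::size_t _allocation_end;      // real end for allocations
  std::size_t _last_slow_path_end;  // last value the slow path set for _slow_path_end
  std::size_t _reserve;             // alignment reserve past the allocation end
};
```

A caller in the position of make_parsable() would call sync_ends_after_fast_refill() first and then reserved_end(), so a missed synchronization shows up as an assertion failure in debug builds instead of a silent state change.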
URL: From igor.veresov at oracle.com Fri Apr 13 00:00:45 2018 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 12 Apr 2018 17:00:45 -0700 Subject: RFR: 8187490: HotSpotRuntimeMBean should be moved to Graal management module In-Reply-To: <324AC8E7-9737-4DB9-875E-A7FDB551D75E@oracle.com> References: <324AC8E7-9737-4DB9-875E-A7FDB551D75E@oracle.com> Message-ID: <7B7F73D7-740A-40C0-B66C-06F1AB182095@oracle.com> Looks good to me. igor > On Apr 12, 2018, at 11:24 AM, Doug Simon wrote: > > Please review this change that removes the existing Graal service provider for hooking into the Platform MBean Server and makes jdk.internal.vm.compiler.management an upgradeable module. > > Please refer to https://bugs.openjdk.java.net/browse/JDK-8187490?focusedCommentId=14170942&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14170942 for discussion on the latter point. > > The Graal changes that dynamically register an MBean for accessing Graal will be part of a subsequent Graal update. > > http://cr.openjdk.java.net/~dnsimon/8187490/ > > -Doug From mandy.chung at oracle.com Fri Apr 13 04:04:22 2018 From: mandy.chung at oracle.com (mandy chung) Date: Fri, 13 Apr 2018 12:04:22 +0800 Subject: RFR: 8187490: HotSpotRuntimeMBean should be moved to Graal management module In-Reply-To: <324AC8E7-9737-4DB9-875E-A7FDB551D75E@oracle.com> References: <324AC8E7-9737-4DB9-875E-A7FDB551D75E@oracle.com> Message-ID: <1cc549db-fc60-d0e3-8405-17d23701288b@oracle.com> On 4/13/18 2:24 AM, Doug Simon wrote: > Please review this change that removes the existing Graal service provider for hooking into the Platform MBean Server and makes jdk.internal.vm.compiler.management an upgradeable module. > > Please refer to https://bugs.openjdk.java.net/browse/JDK-8187490?focusedCommentId=14170942&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14170942 for discussion on the latter point. 
> > The Graal changes that dynamically register an MBean for accessing Graal will be part of a subsequent Graal update. > > http://cr.openjdk.java.net/~dnsimon/8187490/ > >

This looks okay. jdk.internal.vm.compiler.management will become empty until the next Graal update and I guess this makes it easier to make Graal change in the upstream project. This is fine with me.

Mandy

From david.holmes at oracle.com Fri Apr 13 05:15:08 2018 From: david.holmes at oracle.com (David Holmes) Date: Fri, 13 Apr 2018 15:15:08 +1000 Subject: RFR: 8187490: HotSpotRuntimeMBean should be moved to Graal management module In-Reply-To: <324AC8E7-9737-4DB9-875E-A7FDB551D75E@oracle.com> References: <324AC8E7-9737-4DB9-875E-A7FDB551D75E@oracle.com> Message-ID:

Hi Doug, Not a review. :) Just wondering what HotSpotRuntimeMBean has to do with this ??? Thanks, David

On 13/04/2018 4:24 AM, Doug Simon wrote: > Please review this change that removes the existing Graal service provider for hooking into the Platform MBean Server and makes jdk.internal.vm.compiler.management an upgradeable module. > > Please refer to https://bugs.openjdk.java.net/browse/JDK-8187490?focusedCommentId=14170942&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14170942 for discussion on the latter point. > > The Graal changes that dynamically register an MBean for accessing Graal will be part of a subsequent Graal update.
> > http://cr.openjdk.java.net/~dnsimon/8187490/ > > -Doug > From Alan.Bateman at oracle.com Fri Apr 13 06:27:25 2018 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Fri, 13 Apr 2018 07:27:25 +0100 Subject: RFR: 8187490: HotSpotRuntimeMBean should be moved to Graal management module In-Reply-To: <324AC8E7-9737-4DB9-875E-A7FDB551D75E@oracle.com> References: <324AC8E7-9737-4DB9-875E-A7FDB551D75E@oracle.com> Message-ID: On 12/04/2018 19:24, Doug Simon wrote: > Please review this change that removes the existing Graal service provider for hooking into the Platform MBean Server and makes jdk.internal.vm.compiler.management an upgradeable module. > > Please refer to https://bugs.openjdk.java.net/browse/JDK-8187490?focusedCommentId=14170942&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14170942 for discussion on the latter point. > > The Graal changes that dynamically register an MBean for accessing Graal will be part of a subsequent Graal update. > > http://cr.openjdk.java.net/~dnsimon/8187490/ > This looks okay to me. -Alan From doug.simon at oracle.com Fri Apr 13 07:12:08 2018 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 13 Apr 2018 09:12:08 +0200 Subject: RFR: 8187490: HotSpotRuntimeMBean should be moved to Graal management module In-Reply-To: References: <324AC8E7-9737-4DB9-875E-A7FDB551D75E@oracle.com> Message-ID: <9F2671A0-BD33-4081-8829-1E93CA622FDD@oracle.com> > On 13 Apr 2018, at 07:15, David Holmes wrote: > > Hi Doug, > > Not a review. :) Just wondering what HotSpotRuntimeMBean has to do with this ??? These are the non-Graal code base changes needed to move the bean out of the jdk.internal.vm.compiler module. The rest of the changes will come in the next Graal update. If you'd like, I can defer pushing these changes until the Graal changes land on github so that the complete change can be reviewed. 
-Doug > On 13/04/2018 4:24 AM, Doug Simon wrote: >> Please review this change that removes the existing Graal service provider for hooking into the Platform MBean Server and makes jdk.internal.vm.compiler.management an upgradeable module. >> Please refer to https://bugs.openjdk.java.net/browse/JDK-8187490?focusedCommentId=14170942&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14170942 for discussion on the latter point. >> The Graal changes that dynamically register an MBean for accessing Graal will be part of a subsequent Graal update. >> http://cr.openjdk.java.net/~dnsimon/8187490/ >> -Doug From david.holmes at oracle.com Fri Apr 13 13:59:22 2018 From: david.holmes at oracle.com (David Holmes) Date: Fri, 13 Apr 2018 23:59:22 +1000 Subject: RFR: 8187490: HotSpotRuntimeMBean should be moved to Graal management module In-Reply-To: <9F2671A0-BD33-4081-8829-1E93CA622FDD@oracle.com> References: <324AC8E7-9737-4DB9-875E-A7FDB551D75E@oracle.com> <9F2671A0-BD33-4081-8829-1E93CA622FDD@oracle.com> Message-ID: <1961d59c-e37c-f98d-5442-4d857e74faf5@oracle.com> On 13/04/2018 5:12 PM, Doug Simon wrote: > > >> On 13 Apr 2018, at 07:15, David Holmes wrote: >> >> Hi Doug, >> >> Not a review. :) Just wondering what HotSpotRuntimeMBean has to do with this ??? > > These are the non-Graal code base changes needed to move the bean out of the jdk.internal.vm.compiler module. The rest of the changes will come in the next Graal update. If you'd like, I can defer pushing these changes until the Graal changes land on github so that the complete change can be reviewed. So we seem to have both HotSpotRuntimeMBean and HotspotRuntimeMBean (note small 's') defined in the source code! That seems to be a bad thing to me! I was wondering what this had to do with the small 's' HotspotRuntimeMBean - and the answer seems to be "nothing"! 
David > -Doug > >> On 13/04/2018 4:24 AM, Doug Simon wrote: >>> Please review this change that removes the existing Graal service provider for hooking into the Platform MBean Server and makes jdk.internal.vm.compiler.management an upgradeable module. >>> Please refer to https://bugs.openjdk.java.net/browse/JDK-8187490?focusedCommentId=14170942&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14170942 for discussion on the latter point. >>> The Graal changes that dynamically register an MBean for accessing Graal will be part of a subsequent Graal update. >>> http://cr.openjdk.java.net/~dnsimon/8187490/ >>> -Doug > From doug.simon at oracle.com Fri Apr 13 14:05:52 2018 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 13 Apr 2018 16:05:52 +0200 Subject: RFR: 8187490: HotSpotRuntimeMBean should be moved to Graal management module In-Reply-To: <1961d59c-e37c-f98d-5442-4d857e74faf5@oracle.com> References: <324AC8E7-9737-4DB9-875E-A7FDB551D75E@oracle.com> <9F2671A0-BD33-4081-8829-1E93CA622FDD@oracle.com> <1961d59c-e37c-f98d-5442-4d857e74faf5@oracle.com> Message-ID: > On 13 Apr 2018, at 15:59, David Holmes wrote: > > On 13/04/2018 5:12 PM, Doug Simon wrote: >>> On 13 Apr 2018, at 07:15, David Holmes wrote: >>> >>> Hi Doug, >>> >>> Not a review. :) Just wondering what HotSpotRuntimeMBean has to do with this ??? >> These are the non-Graal code base changes needed to move the bean out of the jdk.internal.vm.compiler module. The rest of the changes will come in the next Graal update. If you'd like, I can defer pushing these changes until the Graal changes land on github so that the complete change can be reviewed. > > So we seem to have both HotSpotRuntimeMBean and HotspotRuntimeMBean (note small 's') defined in the source code! That seems to be a bad thing to me! I was wondering what this had to do with the small 's' HotspotRuntimeMBean - and the answer seems to be "nothing"! 
There is actually no HotSpotRuntimeMBean.java source - it's HotSpotGraalMBean.java. I've renamed the issue and fixed the naming in the description. -Doug >> -Doug >>> On 13/04/2018 4:24 AM, Doug Simon wrote: >>>> Please review this change that removes the existing Graal service provider for hooking into the Platform MBean Server and makes jdk.internal.vm.compiler.management an upgradeable module. >>>> Please refer to https://bugs.openjdk.java.net/browse/JDK-8187490?focusedCommentId=14170942&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14170942 for discussion on the latter point. >>>> The Graal changes that dynamically register an MBean for accessing Graal will be part of a subsequent Graal update. >>>> http://cr.openjdk.java.net/~dnsimon/8187490/ >>>> -Doug From serguei.spitsyn at oracle.com Fri Apr 13 19:07:35 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 13 Apr 2018 12:07:35 -0700 Subject: RFR: 8201409: JDWP debugger initialization hangs intermittently In-Reply-To: References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com> <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com> Message-ID: An HTML attachment was scrubbed... URL: From yasuenag at gmail.com Sat Apr 14 14:09:21 2018 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Sat, 14 Apr 2018 23:09:21 +0900 Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS In-Reply-To: <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> Message-ID: Hi Jini, ClhsdbCDSCore.java: Can this test work on modern Linux? AFAIK modern Linux contains systemd-coredump to gather core images. So I concern ClhsdbCDSCore.java fails in the future. Thanks, Yasumasa On 2018/04/12 13:21, Jini George wrote: > Ping: Gentle reminder ! > > Thanks, > Jini. 
> > On 4/6/2018 9:51 PM, Jini George wrote: >> Hello! >> >> Requesting reviews for: https://bugs.openjdk.java.net/browse/JDK-8174994 >> Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/ >> >> While trying to identify the type given an address, a WrongTypeException >> was getting thrown with various clhsdb commands (like printmdo, jstack, >> etc). This was since SA tries to map an address to a hotspot C++ type by >> comparing the vtable address to the vtable address values of known >> types. With CDS, since the vtables are copied over for the Metadata >> classes, the vtable addresses themselves don't match (though, of course, >> the contents will), and SA errors out. >> >> The fix has been implemented by making changes to read in the md region >> (consisting of the c++ vtables) of the CDS archive in SA, and mapping >> the vtable addresses to the corresponding metadata type (ConstantPool, >> InstanceKlass, InstanceClassLoaderKlass, InstanceMirrorKlass, >> InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass). >> >> For corefiles, an additional modification has been done to have the >> replicated FileMapHeader structure (from >> src/hotspot/share/memory/filemap.hpp, which is replicated in SA in >> ps_core.c), to be in sync with the corresponding definition in >> src/hotspot/share/memory/filemap.hpp. >> >> Test cases to test both live and corefile debugging are being added with >> this. These and other SA tests pass on Mach5. >> >> Thanks, >> Jini. >

From mandy.chung at oracle.com Sun Apr 15 06:23:49 2018 From: mandy.chung at oracle.com (mandy chung) Date: Sun, 15 Apr 2018 14:23:49 +0800 Subject: Review Request JDK-8200559: Java agents doing instrumentation need a means to define auxiliary classes Message-ID: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com>

Background:

Java agents support both load time and dynamic instrumentation. At load time, the agent's ClassFileTransformer is invoked to transform class bytes.
There are no Class objects at this time. Dynamic instrumentation is when redefineClasses or retransformClasses is used to redefine an existing loaded class. The ClassFileTransformer is invoked with class bytes where the Class object is present.

Java agents doing instrumentation need a means to define auxiliary classes that are visible and accessible to the instrumented class. Existing agents have been using sun.misc.Unsafe::defineClass to define aux classes directly, or accessing the protected ClassLoader::defineClass method with setAccessible to suppress the language access check (see [1] where this issue was brought up).

The Instrumentation::appendToBootstrapClassLoaderSearch and appendToSystemClassLoaderSearch APIs are existing means to supply additional classes. They are too limited; for example, they can't inject a class in the same runtime package as the class being transformed.

Proposal:

This proposes to add a new ClassFileTransformer.transform method taking an additional ClassDefiner parameter. A transformer can define additional classes during the transformation process, i.e. when ClassFileTransformer::transform is invoked. Some details:

1. ClassDefiner::defineClass defines a class in the same runtime package
   as the class being transformed.
2. The class is defined in the same thread as the transformers are being
   invoked. ClassDefiner::defineClass returns the Class object directly
   before the transformed class is defined.
3. No transformation is applied to classes defined by ClassDefiner::defineClass.

The first prototype we did was to collect the auxiliary classes and define them only after all transformers are invoked, and have these aux classes go through the transformation pipeline.
Several complicated issues would need to be resolved for example timing whether the auxiliary classes should be defined before the transformed class (otherwise a potential race where some other thread references the transformed class and cause the code to execute that in turn reference the auxiliary classes.? The current implementation has a native reentrancy check that ensure one class is being transformed to avoid potential circularity issues.? This may need JVM TI support to be reliable. This proposal would allow java agents to migrate from internal API and ClassDefiner to be enhanced in the future. Webrev: http://cr.openjdk.java.net/~mchung/jdk11/webrevs/8200559/webrev.00/ Mandy [1] http://mail.openjdk.java.net/pipermail/jdk-dev/2018-January/000405.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.daugherty at oracle.com Sun Apr 15 17:01:24 2018 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Sun, 15 Apr 2018 13:01:24 -0400 Subject: RFR: 8201409: JDWP debugger initialization hangs intermittently In-Reply-To: References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com> <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com> Message-ID: <41d0c829-2189-90a4-3299-7dfdbe336e8b@oracle.com> On 4/13/18 3:07 PM, serguei.spitsyn at oracle.com wrote: > Andrew and reviewers, > > I'm re-sending this RFR with a corrected subject that includes the bug > number. > > The issues is: > _https://bugs.openjdk.java.net/browse/JDK-8201409_ > > > Webrev: > _http://cr.openjdk.java.net/~sspitsyn/webrevs/2018/8201409-jdwp-initsync.ibm.1/_ > src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c ??? No comments. src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h ??? No comments. src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c ??? So now pauses in debugLoop_run() before the loop ??? that reads cmds. Looks good. src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c ??? 
    So the VM_INIT event handler now signals that we have
    received the VM_INIT event, which allows debugLoop_run()
    to proceed.

Serguei, this fix needs to have most of the Serviceability stack of tests run against it (jdwp, JVM/TI, JDI and jdb tests). Based on the email thread, I can't tell which tests have been run with the fix in place.

Dan

> The fix looks good to me.
> Also, I've agreed to skip a unit test as creating it for this issue is
> not easy.
>
> At least one more review is needed before the fix can be pushed.
>
> Thanks,
> Serguei
>
>
> On 4/11/18 06:33, Andrew Leonard wrote:
>> Hi Serguei,
>> Thank you for raising the bug.
>> I had a chat with one of my colleagues who could recreate it, and
>> it's probably related to the handshaking that is done in the
>> particular scenario. So with the JCK harness:
>>
>> com.sun.jck.lib.ExecJCKTestOtherJVMCmd
>> LD_LIBRARY_PATH=/javatest/lib/jck/jck8b/natives/linux_x86-64
>> /projects/jck/jdwp/j2sdk-image/bin/java -Xdump:system:none
>> -Xdump:system:events=gpf+abort+traceassert+corruptcache
>> -Xdump:snap:none
>> -Xdump:snap:events=gpf+abort+traceassert+corruptcache
>> -Xdump:java:none
>> -Xdump:java:events=gpf+abort+traceassert+corruptcache
>> -Xdump:heap:none
>> -Xdump:heap:events=gpf+abort+traceassert+corruptcache
>> -Xfuture
>> -agentlib:jdwp=server=y,transport=dt_socket,address=localhost:35000,suspend=y
>> -classpath /javatest/lib/jck/JCK8b-b03/JCK-runtime-8b/classes
>> -Djava.security.policy=/javatest/lib/jck/JCK8b-b03/JCK-runtime-8b/lib/jck.policy
>> javasoft.sqe.jck.lib.jpda.jdwp.DebuggeeLoader -waittime=600
>> -msgSwitch=ub1604x64vm10:38636
>> -componentName=ArrayReference.GetValues.getvalues002
>>
>> Note that the JCK test harness starts the target process, attaches to
>> it, and sends the resume command
>> in a very short time with no handshaking.
>>
>> That may not help..but hopefully helps explain things a bit?
It's the >> timing of the resume command during the test that is crucial, >> resuming before the VM initialization is complete will trigger it. >> >> Thanks >> Andrew >> >> Andrew Leonard >> Java Runtimes Development >> IBM Hursley >> IBM United Kingdom Ltd >> Phone internal: 245913, external: 01962 815913 >> internet email: andrew_m_leonard at uk.ibm.com >> >> >> >> >> From: "serguei.spitsyn at oracle.com" >> To: Andrew Leonard >> Cc: serviceability-dev at openjdk.java.net >> Date: 11/04/2018 09:57 >> Subject: Re: RFR: Fix race condition in jdwp >> ------------------------------------------------------------------------ >> >> >> >> Hi Andrew, >> >> I've filed the bug: >> _https://bugs.openjdk.java.net/browse/JDK-8201409_ >> >> >> Also, this is a webrev with your patch: >> _http://cr.openjdk.java.net/~sspitsyn/webrevs/2018/8201409-jdwp-initsync.ibm.1/_ >> >> >> I agree that creating a standalone test is tricky here. >> >> I've added usleep(10000) into the eventHelper_reportVMInit() >> and ran the JTreg com/sun/jdi tests with my JDK build. >> However, none of the tests failed with the failure mode you described. >> So that I'm puzzled a little bit. >> I suspect that some specific debugLoop commands were used in your >> scenario. >> >> It is still possible that I've missed something here. >> Will try to double check everything. >> >> Thanks, >> Serguei >> >> >> On 4/11/18 01:29, Andrew Leonard wrote: >> Thanks Serguei, >> I terms of a standalone testcase it is quite tricky, as due to the >> nature of the issue which took a lot of investigation to solve it's >> very timing dependent and will only occur randomly. It can be forced >> as I indicated below by adding a "sleep" in the VMInit report code >> but that's not a testcase, however the issue was originally found in >> our JCK testing for IBMJava8, testcase test.jck8b.runtime.vm.jdwp, >> but again only happened intermittently. 
Sort of like "performance" >> type issues we're not always going to be able to create a testcase >> that will always "fail" if the fix is not present. >> Your thoughts? >> Cheers >> Andrew >> >> Andrew Leonard >> Java Runtimes Development >> IBM Hursley >> IBM United Kingdom Ltd >> Phone internal: 245913, external: 01962 815913 >> internet email: _andrew_m_leonard at uk.ibm.com_ >> >> >> >> >> >> From: _"serguei.spitsyn at oracle.com"_ >> __ >> >> To: ? ? ? ?Andrew Leonard __ >> >> Cc: _serviceability-dev at openjdk.java.net_ >> >> Date: ? ? ? ?11/04/2018 01:02 >> Subject: ? ? ? ?Re: RFR: Fix race condition in jdwp >> ------------------------------------------------------------------------ >> >> >> >> Hi Andrew, >> >> Okay, I'll file a bug on this topic. >> But do you have a standalone test demonstrating this issue? >> >> Thanks, >> Serguei >> >> >> On 4/10/18 06:23, Andrew Leonard wrote: >> Hi Serguei, >> I don't have access to the bug database to raise one, are you able to >> please? >> >> Summary: JDWP debugger initialization hangs intermittently >> Description: If during the JDWP setup initialization the VM >> initialization takes slightly longer than the main debug >> initialization thread a "hang" situation can occur. This has been >> seen in testcase test.jck8b.runtime.vm.jdwp and can also be recreated >> easily by adding a 10 second sleep to the beginning of the >> src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c method >> eventHelper_reportVMInit() . >> First seen: JDK8 >> Recreated: JDK11 >> >> Thanks >> Andrew >> >> Andrew Leonard >> Java Runtimes Development >> IBM Hursley >> IBM United Kingdom Ltd >> Phone internal: 245913, external: 01962 815913 >> internet email: _andrew_m_leonard at uk.ibm.com_ >> >> >> >> >> >> From: _"serguei.spitsyn at oracle.com"_ >> __ >> >> To: ? ? ? ?Andrew Leonard __ >> , >> _serviceability-dev at openjdk.java.net_ >> >> Date: ? ? ? ?09/04/2018 23:03 >> Subject: ? ? ? 
?Re: RFR: Fix race condition in jdwp >> ------------------------------------------------------------------------ >> >> >> >> Hi Andrew, >> >> The patch itself looks reasonable. >> However, in order to proceed with it, a bug report with a standalone >> test case demonstrating the issue is needed. >> >> Thanks, >> Serguei >> >> >> On 4/9/18 09:07, Andrew Leonard wrote: >> > Hi, >> > We discovered in our testing with OpenJ9 that a race condition can >> > occur in the jdwp under certain circumstances, and we were able to >> > force the same issue with Hotspot. Normally, the event helper thread >> > suspends all threads, then the debug loop in the listener thread >> > receives a command to resume. The debugger may deadlock if the debug >> > loop in the listener thread starts processing commands (e.g. resume >> > threads) before the event helper completes the initialization (and >> > suspends threads). >> > >> > This patch adds synchronization to ensure the event helper completes >> > the initialization sequence before debugger commands are processed. >> > >> > Please can I find a sponsor for this contribution? Patch below.. >> > >> > Many thanks >> > >> > Andrew >> > >> > >> > >> > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c >> > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c >> > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c >> > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c >> > @@ -1,5 +1,5 @@ >> > ?/* >> > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights >> > reserved. >> > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights >> > reserved. >> > ? * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. >> > ? * >> > ? 
* This code is free software; you can redistribute it and/or >> modify it >> > @@ -58,6 +58,7 @@ >> > ?static jboolean vmInitialized; >> > ?static jrawMonitorID initMonitor; >> > ?static jboolean initComplete; >> > +static jboolean VMInitComplete; >> > ?static jbyte currentSessionID; >> > >> > ?/* >> > @@ -617,6 +618,35 @@ >> > ?debugMonitorExit(initMonitor); >> > ?} >> > >> > +/* >> > + * Signal VM initialization is complete. >> > + */ >> > +void >> > +signalVMInitComplete(void) >> > +{ >> > + ? ?/* >> > + * VM Initialization is complete >> > + */ >> > + ? ?LOG_MISC(("signal VM initialization complete")); >> > + ? ?debugMonitorEnter(initMonitor); >> > + ? ?VMInitComplete = JNI_TRUE; >> > + ? ?debugMonitorNotifyAll(initMonitor); >> > + ? ?debugMonitorExit(initMonitor); >> > +} >> > + >> > +/* >> > + * Wait for VM initialization to complete. >> > + */ >> > +void >> > +debugInit_waitVMInitComplete(void) >> > +{ >> > + ? ?debugMonitorEnter(initMonitor); >> > + ? ?while (!VMInitComplete) { >> > + ? ?debugMonitorWait(initMonitor); >> > + ? ?} >> > + ? ?debugMonitorExit(initMonitor); >> > +} >> > + >> > ?/* All process exit() calls come from here */ >> > ?void >> > ?forceExit(int exit_code) >> > @@ -672,6 +702,7 @@ >> > ?LOG_MISC(("Begin initialize()")); >> > ?currentSessionID = 0; >> > ?initComplete = JNI_FALSE; >> > + ? ?VMInitComplete = JNI_FALSE; >> > >> > ?if ( gdata->vmDead ) { >> > ? ? ?EXIT_ERROR(AGENT_ERROR_INTERNAL,"VM dead at initialize() time"); >> > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h >> > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h >> > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h >> > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h >> > @@ -1,5 +1,5 @@ >> > ?/* >> > - * Copyright (c) 1998, 2015, Oracle and/or its affiliates. All rights >> > reserved. >> > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights >> > reserved. >> > ? 
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. >> > ? * >> > ? * This code is free software; you can redistribute it and/or >> modify it >> > @@ -39,4 +39,7 @@ >> > ?void debugInit_exit(jvmtiError, const char *); >> > ?void forceExit(int); >> > >> > +void debugInit_waitVMInitComplete(void); >> > +void signalVMInitComplete(void); >> > + >> > ?#endif >> > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c >> > b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c >> > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c >> > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c >> > @@ -1,5 +1,5 @@ >> > ?/* >> > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights >> > reserved. >> > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights >> > reserved. >> > ? * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. >> > ? * >> > ? * This code is free software; you can redistribute it and/or >> modify it >> > @@ -98,6 +98,7 @@ >> > ?standardHandlers_onConnect(); >> > ?threadControl_onConnect(); >> > >> > + ? ?debugInit_waitVMInitComplete(); >> > ?/* Okay, start reading cmds! */ >> > ?while (shouldListen) { >> > ? ? ?if (!dequeue(&p)) { >> > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c >> > b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c >> > --- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c >> > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c >> > @@ -1,5 +1,5 @@ >> > ?/* >> > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights >> > reserved. >> > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights >> > reserved. >> > ? * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. >> > ? * >> > ? * This code is free software; you can redistribute it and/or >> modify it >> > @@ -580,6 +580,7 @@ >> > ?(void)threadControl_suspendThread(command->thread, JNI_FALSE); >> > ?} >> > >> > + ? 
?signalVMInitComplete(); >> > ?outStream_initCommand(&out, uniqueID(), 0x0, >> > ?JDWP_COMMAND_SET(Event), >> > ?JDWP_COMMAND(Event, Composite)); >> > >> > >> > >> > Andrew Leonard >> > Java Runtimes Development >> > IBM Hursley >> > IBM United Kingdom Ltd >> > Phone internal: 245913, external: 01962 815913 >> > internet email: _andrew_m_leonard at uk.ibm.com_ >> >> > >> > >> > Unless stated otherwise above: >> > IBM United Kingdom Limited - Registered in England and Wales with >> > number 741598. >> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire >> PO6 3AU >> >> >> >> >> >> Unless stated otherwise above: >> IBM United Kingdom Limited - Registered in England and Wales with >> number 741598. >> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire >> PO6 3AU >> >> >> >> >> Unless stated otherwise above: >> IBM United Kingdom Limited - Registered in England and Wales with >> number 741598. >> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire >> PO6 3AU >> >> >> >> >> Unless stated otherwise above: >> IBM United Kingdom Limited - Registered in England and Wales with >> number 741598. >> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire >> PO6 3AU > -------------- next part -------------- An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Mon Apr 16 06:10:17 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Sun, 15 Apr 2018 23:10:17 -0700 Subject: RFR: 8201409: JDWP debugger initialization hangs intermittently In-Reply-To: <41d0c829-2189-90a4-3299-7dfdbe336e8b@oracle.com> References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com> <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com> <41d0c829-2189-90a4-3299-7dfdbe336e8b@oracle.com> Message-ID: An HTML attachment was scrubbed... 
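For readers following the 8201409 patch quoted in this thread, the handshake it adds can be sketched outside the JDWP agent. What follows is only a minimal Java analogue, not the libjdwp code: a boolean flag guarded by a monitor, with a waiter that loops until the flag is set. The names VmInitGate, signal(), awaitInit() and VmInitGateDemo are made up for illustration; in the patch the corresponding pieces are VMInitComplete, initMonitor, signalVMInitComplete() and debugInit_waitVMInitComplete().

```java
/** Illustrative gate: a flag guarded by a monitor (hypothetical name, not part of libjdwp). */
final class VmInitGate {
    private final Object monitor = new Object();
    private boolean vmInitComplete = false;

    /** Analogue of signalVMInitComplete(): set the flag and wake every waiter. */
    void signal() {
        synchronized (monitor) {
            vmInitComplete = true;
            monitor.notifyAll();
        }
    }

    /** Analogue of debugInit_waitVMInitComplete(): block until the flag is set. */
    void awaitInit() throws InterruptedException {
        synchronized (monitor) {
            // Loop so that an early signal and spurious wakeups are both handled.
            while (!vmInitComplete) {
                monitor.wait();
            }
        }
    }
}

/** Hypothetical demo: a "command loop" thread that must not start before "VM init" is done. */
final class VmInitGateDemo {
    static String run() {
        VmInitGate gate = new VmInitGate();
        StringBuffer order = new StringBuffer();

        // Stands in for the listener thread running the debug loop.
        Thread commandLoop = new Thread(() -> {
            try {
                gate.awaitInit();
                order.append("commands");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        commandLoop.start();

        try {
            Thread.sleep(50);          // simulate a slow VM_INIT handler
            order.append("init,");
            gate.signal();             // initialization finished, release the loop
            commandLoop.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return order.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The while loop around wait() is the important part: if the signal has already happened, the waiter never blocks, and if the waiter arrives first, it stays parked until the flag is really set.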
From david.holmes at oracle.com Mon Apr 16 06:13:51 2018
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 16 Apr 2018 16:13:51 +1000
Subject: Review Request JDK-8200559: Java agents doing instrumentation need a means to define auxiliary classes
In-Reply-To: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com>
References: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com>
Message-ID:

Hi Mandy,

How do you handle dependencies across a set of auxiliary types, i.e. if you are defining a new class A, and it depends on B (which you also intend to define), how will the classloader resolve references from A to B when it can't load B itself?

Is the key capability here the ability to inject a class into a package that would otherwise not be open to it? What are the security implications here?

Thanks,
David

On 15/04/2018 4:23 PM, mandy chung wrote:
> Background:
>
> Java agents support both load time and dynamic instrumentation. At load
> time, the agent's ClassFileTransformer is invoked to transform class bytes.
> There are no Class objects at this time. Dynamic instrumentation is when
> redefineClasses or retransformClasses is used to redefine an existing
> loaded class. The ClassFileTransformer is invoked with class bytes where
> the Class object is present.
>
> Java agent doing instrumentation needs a means to define auxiliary classes
> that are visible and accessible to the instrumented class. Existing agents
> have been using sun.misc.Unsafe::defineClass to define aux classes directly
> or accessing protected ClassLoader::defineClass method with setAccessible to
> suppress the language access check (see [1] where this issue was brought
> up).
>
> Instrumentation::appendToBootstrapClassLoaderSearch and
> appendToSystemClassLoaderSearch
> APIs are existing means to supply additional classes. It's too limited
> for example it can't inject a class in the same runtime package as the class
> being transformed.
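The load-time versus dynamic distinction in the Background can be observed through the classBeingRedefined argument of ClassFileTransformer::transform, which is null for a class load and non-null for a redefine or retransform. A minimal sketch, assuming a hypothetical transformer named PhaseRecordingTransformer that changes nothing and only records the phase; it uses the existing java.lang.instrument API only, not the proposed ClassDefiner, and the class names passed in PhaseDemo are arbitrary examples:

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical transformer that records which kind of instrumentation invoked it. */
final class PhaseRecordingTransformer implements ClassFileTransformer {
    final List<String> seen = new ArrayList<>();

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // Load time: the class is not defined yet, so no Class object is passed.
        // Redefine/retransform: the already loaded Class object is passed.
        String phase = (classBeingRedefined == null) ? "load-time" : "dynamic";
        seen.add(phase + ":" + className);
        return null; // null tells the JVM that no transformation was performed
    }
}

/** Hypothetical driver: calls the transformer directly instead of running inside an agent. */
final class PhaseDemo {
    static List<String> run() {
        PhaseRecordingTransformer t = new PhaseRecordingTransformer();
        byte[] dummyBytes = new byte[0];
        // Simulated class load: no Class object exists yet.
        t.transform(null, "com/example/Foo", null, null, dummyBytes);
        // Simulated retransformation of an already loaded class.
        t.transform(null, "java/lang/String", String.class, null, dummyBytes);
        return t.seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In a real agent the transformer would be registered with Instrumentation::addTransformer from premain or agentmain; the direct calls in PhaseDemo only stand in for the JVM.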
> > Proposal: > > This proposes to add a new ClassFileTransformer.transform method taking > additional ClassDefiner parameter.? A transformer can define additional > classes during the transformation process, i.e. > when ClassFileTransformer::transform is invoked. Some details: > > 1. ClassDefiner::defineClass defines a class in the same runtime package > ?? as the class being transformed. > 2. The class is defined in the same thread as the transformers are being > ?? invoked.?? ClassDefiner::defineClass returns Class object directly > ?? before the transformed class is defined. > 3. No transformation is applied to classes defined by > ClassDefiner::defineClass. > > The first prototype we did is to collect the auxiliary classes and define > them? until all transformers are invoked and have these aux classes to go > through the transformation pipeline.? Several complicated issues would > need to be resolved for example timing whether the auxiliary classes should > be defined before the transformed class (otherwise a potential race where > some other thread references the transformed class and cause the code to > execute that in turn reference the auxiliary classes.? The current > implementation has a native reentrancy check that ensure one class is being > transformed to avoid potential circularity issues.? This may need JVM TI > support to be reliable. > > This proposal would allow java agents to migrate from internal API and > ClassDefiner to be enhanced in the future. 
> > Webrev: > http://cr.openjdk.java.net/~mchung/jdk11/webrevs/8200559/webrev.00/ > > Mandy > [1] http://mail.openjdk.java.net/pipermail/jdk-dev/2018-January/000405.html From mandy.chung at oracle.com Mon Apr 16 07:17:26 2018 From: mandy.chung at oracle.com (mandy chung) Date: Mon, 16 Apr 2018 15:17:26 +0800 Subject: Review Request JDK-8200559: Java agents doing instrumentation need a means to define auxiliary classes In-Reply-To: References: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com> Message-ID: <4ea48d91-ca79-e69b-05a4-384060e13a98@oracle.com> On 4/16/18 2:13 PM, David Holmes wrote: > Hi Mandy, > > How do you handle dependencies across a set of auxiliary types, i.e if > you are defining a new class A, and it depends on B (which you also > intend to define), how will the classloader resolve references from A > to B when it can't load B itself? It'll be the java agent's responsibility to arrange that.? The classloader does not find any class for the dynamic generated classes.? For example, if A is a subclass or subinterface of B (direct or indirect), B has to be defined before A or ensure that B is injected in the class path and visible to the class loader. This API is specific for instrumentation similar to the compiler that sometimes needs to define anonymous or bridge classes and they are typically quite straight forward.?? The existing java agent has been using sun.misc.Unsafe::defineClass and so this defineClass form should be adequate. I mention a possible future enhancement to allow the auxiliary classes to be defined together with the transformed classes and allowing them to go through the transformation pipeline.?? It could look into relaxing this constraint. > > Is the key capability here the ability to inject a class into a > package that would otherwise not be open to it? 
The public APIs for defining a class in a given runtime package are the protected ClassLoader::defineClass method and Lookup::defineClass, but these methods don't satisfy the Java agent use cases for instrumentation. Rafael summarized it quite well in [1]; as a brief note: ClassLoader::defineClass can be used from a custom class loader, but it won't be able to define a class in the same runtime package as the instrumented class. Lookup::defineClass requires a Class object as the lookup class, while a Class object is not available when doing transformation at load time.

> What are the security implications here?

The Java agent has to be enabled explicitly. To use this API, the Java agent must obtain the Instrumentation object and add the ClassFileTransformer. The Java agent can be started by the -javaagent option at startup, or be put on the class path such that it can be dynamically loaded via the attach mechanism.

Mandy

>
> Thanks,
> David
>
> On 15/04/2018 4:23 PM, mandy chung wrote:
>> Background:
>>
>> Java agents support both load time and dynamic instrumentation. At
>> load time, the agent's ClassFileTransformer is invoked to transform class
>> bytes. There are no Class objects at this time. Dynamic instrumentation
>> is when redefineClasses or retransformClasses is used to redefine an
>> existing loaded class. The ClassFileTransformer is invoked with class
>> bytes where the Class object is present.
>>
>> Java agent doing instrumentation needs a means to define auxiliary
>> classes that are visible and accessible to the instrumented class.
>> Existing agents have been using sun.misc.Unsafe::defineClass to define
>> aux classes directly or accessing protected ClassLoader::defineClass
>> method with setAccessible to suppress the language access check (see [1]
>> where this issue was brought up).
>> >> Instrumentation::appendToBootstrapClassLoaderSearch and >> appendToSystemClassLoaderSearch >> APIs are existing means to supply additional classes.? It's too limited >> for example it can't inject a class in the same runtime package as >> the class >> being transformed. >> >> Proposal: >> >> This proposes to add a new ClassFileTransformer.transform method >> taking additional ClassDefiner parameter.? A transformer can define >> additional >> classes during the transformation process, i.e. >> when ClassFileTransformer::transform is invoked. Some details: >> >> 1. ClassDefiner::defineClass defines a class in the same runtime package >> ??? as the class being transformed. >> 2. The class is defined in the same thread as the transformers are being >> ??? invoked.?? ClassDefiner::defineClass returns Class object directly >> ??? before the transformed class is defined. >> 3. No transformation is applied to classes defined by >> ClassDefiner::defineClass. >> >> The first prototype we did is to collect the auxiliary classes and >> define >> them? until all transformers are invoked and have these aux classes >> to go >> through the transformation pipeline.? Several complicated issues would >> need to be resolved for example timing whether the auxiliary classes >> should >> be defined before the transformed class (otherwise a potential race >> where >> some other thread references the transformed class and cause the code to >> execute that in turn reference the auxiliary classes.? The current >> implementation has a native reentrancy check that ensure one class is >> being >> transformed to avoid potential circularity issues.? This may need JVM TI >> support to be reliable. >> >> This proposal would allow java agents to migrate from internal API >> and ClassDefiner to be enhanced in the future. 
>>
>> Webrev:
>> http://cr.openjdk.java.net/~mchung/jdk11/webrevs/8200559/webrev.00/
>>
>> Mandy
>> [1]
>> http://mail.openjdk.java.net/pipermail/jdk-dev/2018-January/000405.html

From rafael.wth at gmail.com Mon Apr 16 07:11:28 2018
From: rafael.wth at gmail.com (Rafael Winterhalter)
Date: Mon, 16 Apr 2018 07:11:28 +0000
Subject: Review Request JDK-8200559: Java agents doing instrumentation need a means to define auxiliary classes
In-Reply-To:
References: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com>
Message-ID:

Hi David,

From my perspective as a heavy user of this API in Unsafe today:

From my experience, the recursive dependency is not a common problem, as auxiliary types normally only reference the instrumented type. Beyond that, this problem is an issue for any such explicit dynamic class loading, including when using a method handle lookup.

I also think it is essential to be able to inject a class into a non-opened package. Otherwise one could use a method handle. When instrumenting classes one already has full privilege for all packages, as an agent can open modules. If one opens packages to define a class, this would however open the package to the agent's module, which is typically the unnamed module. As this cannot be undone, such a requirement might therefore be counter-productive to its intent.

Best regards, Rafael

David Holmes wrote on Mon, 16 Apr 2018, 08:13:
> Hi Mandy,
>
> How do you handle dependencies across a set of auxiliary types, i.e. if
> you are defining a new class A, and it depends on B (which you also
> intend to define), how will the classloader resolve references from A to
> B when it can't load B itself?
>
> Is the key capability here the ability to inject a class into a package
> that would otherwise not be open to it? What are the security
> implications here?
> > Thanks, > David > > On 15/04/2018 4:23 PM, mandy chung wrote: > > Background: > > > > Java agents support both load time and dynamic instrumentation. At load > > time, > > the agent's ClassFileTransformer is invoked to transform class bytes. > > There is > > no Class objects at this time. Dynamic instrumentation is when > > redefineClasses > > or retransformClasses is used to redefine an existing loaded class. The > > ClassFileTransformer is invoked with class bytes where the Class object > > is present. > > > > Java agent doing instrumentation needs a means to define auxiliary > classes > > that are visible and accessible to the instrumented class. Existing > agents > > have been using sun.misc.Unsafe::defineClass to define aux classes > directly > > or accessing protected ClassLoader::defineClass method with > setAccessible to > > suppress the language access check (see [1] where this issue was brought > > up). > > > > Instrumentation::appendToBootstrapClassLoaderSearch and > > appendToSystemClassLoaderSearch > > APIs are existing means to supply additional classes. It's too limited > > for example it can't inject a class in the same runtime package as the > class > > being transformed. > > > > Proposal: > > > > This proposes to add a new ClassFileTransformer.transform method taking > > additional ClassDefiner parameter. A transformer can define additional > > classes during the transformation process, i.e. > > when ClassFileTransformer::transform is invoked. Some details: > > > > 1. ClassDefiner::defineClass defines a class in the same runtime package > > as the class being transformed. > > 2. The class is defined in the same thread as the transformers are being > > invoked. ClassDefiner::defineClass returns Class object directly > > before the transformed class is defined. > > 3. No transformation is applied to classes defined by > > ClassDefiner::defineClass. 
> > > > The first prototype we did is to collect the auxiliary classes and define > > them until all transformers are invoked and have these aux classes to go > > through the transformation pipeline. Several complicated issues would > > need to be resolved for example timing whether the auxiliary classes > should > > be defined before the transformed class (otherwise a potential race where > > some other thread references the transformed class and cause the code to > > execute that in turn reference the auxiliary classes. The current > > implementation has a native reentrancy check that ensure one class is > being > > transformed to avoid potential circularity issues. This may need JVM TI > > support to be reliable. > > > > This proposal would allow java agents to migrate from internal API and > > ClassDefiner to be enhanced in the future. > > > > Webrev: > > http://cr.openjdk.java.net/~mchung/jdk11/webrevs/8200559/webrev.00/ > > > > Mandy > > [1] > http://mail.openjdk.java.net/pipermail/jdk-dev/2018-January/000405.html > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alan.Bateman at oracle.com Mon Apr 16 14:55:58 2018 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 16 Apr 2018 15:55:58 +0100 Subject: Review Request JDK-8200559: Java agents doing instrumentation need a means to define auxiliary classes In-Reply-To: References: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com> Message-ID: <3b5b593a-e052-2174-8540-037685326c17@oracle.com> On 16/04/2018 08:11, Rafael Winterhalter wrote: > : > > I also think it is essential to being able to inject a class into a > non-opened package. Otherwise one could use a method handle. When > instrumenting classes one already has full privilege for all packages > as an agent can open modules. If one opens packages to define a class, > this would however open the package to the agent's module which is > typically the unnamed module. 
As this cannot be undone, such a
> requirement might therefore be counter-productive to its intent.
>

It's important to separate the usages. The Instrumentation object is for tool agents that are doing load time or dynamic instrumentation. Agents have full power to do anything, and the accessibility of the classes that they instrument does not matter. With the proposed API, the agent can define classes in the same runtime package as the class being loaded or transformed; it does not matter whether that package is open or not.

Libraries doing code injection are different of course. There is no Instrumentation object in the picture for this scenario. Instead, libraries must be called with a Lookup that has the appropriate access - that should be the only way that they can inject into runtime packages that aren't open.

-Alan

From andrew_m_leonard at uk.ibm.com Mon Apr 16 15:05:48 2018
From: andrew_m_leonard at uk.ibm.com (Andrew Leonard)
Date: Mon, 16 Apr 2018 16:05:48 +0100
Subject: RFR: 8201409: JDWP debugger initialization hangs intermittently
In-Reply-To: <41d0c829-2189-90a4-3299-7dfdbe336e8b@oracle.com>
References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com> <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com> <41d0c829-2189-90a4-3299-7dfdbe336e8b@oracle.com>
Message-ID:

Hi Daniel,
Thanks for reviewing this. Just to let you know, I have successfully run all the jdk_core and hotspot/jtreg/serviceability tests with this patch in place.
Cheers
Andrew

Andrew Leonard
Java Runtimes Development
IBM Hursley
IBM United Kingdom Ltd
Phone internal: 245913, external: 01962 815913
internet email: andrew_m_leonard at uk.ibm.com

From: "Daniel D.
Daugherty" To: "serguei.spitsyn at oracle.com" , Andrew Leonard Cc: serviceability-dev at openjdk.java.net Date: 15/04/2018 18:00 Subject: Re: RFR: 8201409: JDWP debugger initialization hangs intermittently On 4/13/18 3:07 PM, serguei.spitsyn at oracle.com wrote: Andrew and reviewers, I'm re-sending this RFR with a corrected subject that includes the bug number. The issues is: https://bugs.openjdk.java.net/browse/JDK-8201409 Webrev: http://cr.openjdk.java.net/~sspitsyn/webrevs/2018/8201409-jdwp-initsync.ibm.1/ src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c No comments. src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h No comments. src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c So now pauses in debugLoop_run() before the loop that reads cmds. Looks good. src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c So the VM_INIT event handler now signals that we have received the VM_INIT event so that allows debugLoop_run() to proceed. Serguei, this fix needs to have the most of the Serviceability stack of tests run against it (jdwp, JVM/TI, JDI and jdb tests). Based on the email thread, I can't tell which tests have been run with the fix in place. Dan The fix looks good to me. Also, I've agreed to skip a unit test as creating it for this issue is not easy. At least, one more review is needed before the fix can be pushed. Thanks, Serguei On 4/11/18 06:33, Andrew Leonard wrote: Hi Serguei, Thank you for raising the bug. I had a chat with one of my colleagues who could recreate it, and it's probably related to the handshaking that is done in the particular scenario. 
So with the JCK harness: com.sun.jck.lib.ExecJCKTestOtherJVMCmd LD_LIBRARY_PATH=/javatest/lib/jck /jck8b/natives/linux_x86-64 /projects/jck/jdwp/j2sdk-image/bin/java -Xdump:system:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -Xdump:snap:none -Xdump:snap:events=gpf+abort+traceassert+corruptcache -Xdump:java:none -Xdump:java:events=gpf+abort+traceassert+corruptcache -Xdump:heap:none -Xdump:heap:events=gpf+abort+traceassert+corruptcache - Xfuture -agentlib:jdwp=server=y,transport=dt_socket,address=localhost :35000,suspend=y -classpath /javatest/lib/jck /JCK8b-b03/JCK-runtime-8b/classes -Djava.security.policy=/javatest/lib/jck /JCK8b-b03/JCK-runtime-8b/lib/jck.policy javasoft.sqe.jck.lib.jpda.jdwp.DebuggeeLoader -waittime=600 -msgSwitch=ub1604x64vm10:38636 -componentName= ArrayReference.GetValues.getvalues002 Note that the JCK test harness starts the target process, attaches to it, and sends the resume command in a very short time with no handshaking. That may not help..but hopefully helps explain things a bit? It's the timing of the resume command during the test that is crucial, resuming before the VM initialization is complete will trigger it. Thanks Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: Andrew Leonard Cc: serviceability-dev at openjdk.java.net Date: 11/04/2018 09:57 Subject: Re: RFR: Fix race condition in jdwp Hi Andrew, I've filed the bug: https://bugs.openjdk.java.net/browse/JDK-8201409 Also, this is a webrev with your patch: http://cr.openjdk.java.net/~sspitsyn/webrevs/2018/8201409-jdwp-initsync.ibm.1/ I agree that creating a standalone test is tricky here. I've added usleep(10000) into the eventHelper_reportVMInit() and ran the JTreg com/sun/jdi tests with my JDK build. However, none of the tests failed with the failure mode you described. 
So that I'm puzzled a little bit. I suspect that some specific debugLoop commands were used in your scenario. It is still possible that I've missed something here. Will try to double check everything. Thanks, Serguei On 4/11/18 01:29, Andrew Leonard wrote: Thanks Serguei, I terms of a standalone testcase it is quite tricky, as due to the nature of the issue which took a lot of investigation to solve it's very timing dependent and will only occur randomly. It can be forced as I indicated below by adding a "sleep" in the VMInit report code but that's not a testcase, however the issue was originally found in our JCK testing for IBMJava8, testcase test.jck8b.runtime.vm.jdwp, but again only happened intermittently. Sort of like "performance" type issues we're not always going to be able to create a testcase that will always "fail" if the fix is not present. Your thoughts? Cheers Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: Andrew Leonard Cc: serviceability-dev at openjdk.java.net Date: 11/04/2018 01:02 Subject: Re: RFR: Fix race condition in jdwp Hi Andrew, Okay, I'll file a bug on this topic. But do you have a standalone test demonstrating this issue? Thanks, Serguei On 4/10/18 06:23, Andrew Leonard wrote: Hi Serguei, I don't have access to the bug database to raise one, are you able to please? Summary: JDWP debugger initialization hangs intermittently Description: If during the JDWP setup initialization the VM initialization takes slightly longer than the main debug initialization thread a "hang" situation can occur. This has been seen in testcase test.jck8b.runtime.vm.jdwp and can also be recreated easily by adding a 10 second sleep to the beginning of the src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c method eventHelper_reportVMInit() . 
First seen: JDK8 Recreated: JDK11 Thanks Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: Andrew Leonard , serviceability-dev at openjdk.java.net Date: 09/04/2018 23:03 Subject: Re: RFR: Fix race condition in jdwp Hi Andrew, The patch itself looks reasonable. However, in order to proceed with it, a bug report with a standalone test case demonstrating the issue is needed. Thanks, Serguei On 4/9/18 09:07, Andrew Leonard wrote: > Hi, > We discovered in our testing with OpenJ9 that a race condition can > occur in the jdwp under certain circumstances, and we were able to > force the same issue with Hotspot. Normally, the event helper thread > suspends all threads, then the debug loop in the listener thread > receives a command to resume. The debugger may deadlock if the debug > loop in the listener thread starts processing commands (e.g. resume > threads) before the event helper completes the initialization (and > suspends threads). > > This patch adds synchronization to ensure the event helper completes > the initialization sequence before debugger commands are processed. > > Please can I find a sponsor for this contribution? Patch below.. > > Many thanks > > Andrew > > > > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
> * > * This code is free software; you can redistribute it and/or modify it > @@ -58,6 +58,7 @@ > static jboolean vmInitialized; > static jrawMonitorID initMonitor; > static jboolean initComplete; > +static jboolean VMInitComplete; > static jbyte currentSessionID; > > /* > @@ -617,6 +618,35 @@ > debugMonitorExit(initMonitor); > } > > +/* > + * Signal VM initialization is complete. > + */ > +void > +signalVMInitComplete(void) > +{ > + /* > + * VM Initialization is complete > + */ > + LOG_MISC(("signal VM initialization complete")); > + debugMonitorEnter(initMonitor); > + VMInitComplete = JNI_TRUE; > + debugMonitorNotifyAll(initMonitor); > + debugMonitorExit(initMonitor); > +} > + > +/* > + * Wait for VM initialization to complete. > + */ > +void > +debugInit_waitVMInitComplete(void) > +{ > + debugMonitorEnter(initMonitor); > + while (!VMInitComplete) { > + debugMonitorWait(initMonitor); > + } > + debugMonitorExit(initMonitor); > +} > + > /* All process exit() calls come from here */ > void > forceExit(int exit_code) > @@ -672,6 +702,7 @@ > LOG_MISC(("Begin initialize()")); > currentSessionID = 0; > initComplete = JNI_FALSE; > + VMInitComplete = JNI_FALSE; > > if ( gdata->vmDead ) { > EXIT_ERROR(AGENT_ERROR_INTERNAL,"VM dead at initialize() time"); > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2015, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
> * > * This code is free software; you can redistribute it and/or modify it > @@ -39,4 +39,7 @@ > void debugInit_exit(jvmtiError, const char *); > void forceExit(int); > > +void debugInit_waitVMInitComplete(void); > +void signalVMInitComplete(void); > + > #endif > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -98,6 +98,7 @@ > standardHandlers_onConnect(); > threadControl_onConnect(); > > + debugInit_waitVMInitComplete(); > /* Okay, start reading cmds! */ > while (shouldListen) { > if (!dequeue(&p)) { > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
> * > * This code is free software; you can redistribute it and/or modify it > @@ -580,6 +580,7 @@ > (void)threadControl_suspendThread(command->thread, JNI_FALSE); > } > > + signalVMInitComplete(); > outStream_initCommand(&out, uniqueID(), 0x0, > JDWP_COMMAND_SET(Event), > JDWP_COMMAND(Event, Composite)); > > > > Andrew Leonard > Java Runtimes Development > IBM Hursley > IBM United Kingdom Ltd > Phone internal: 245913, external: 01962 815913 > internet email: andrew_m_leonard at uk.ibm.com > > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with > number 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
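The synchronization that the patch adds can be sketched in Java as a monitor-guarded flag. This is only an illustration of the wait/notify pattern, not the libjdwp C code: the class InitHandshake, its run method, and the event strings are hypothetical names chosen for the example, and the sleep merely simulates a slow VM initialization.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical Java analogue of the initMonitor / VMInitComplete handshake in the patch.
class InitHandshake {
    private final Object initMonitor = new Object();
    private boolean vmInitComplete = false; // guarded by initMonitor
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());

    // Analogue of signalVMInitComplete(): set the flag and wake every waiter.
    void signalVMInitComplete() {
        synchronized (initMonitor) {
            vmInitComplete = true;
            initMonitor.notifyAll();
        }
    }

    // Analogue of debugInit_waitVMInitComplete(): the loop guards against spurious wakeups.
    void waitVMInitComplete() {
        synchronized (initMonitor) {
            while (!vmInitComplete) {
                try {
                    initMonitor.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for VM init", e);
                }
            }
        }
    }

    // Runs a slow "event helper" and an eager "listener" and returns the order of their actions.
    List<String> run(long initDelayMillis) {
        Thread helper = new Thread(() -> {
            sleepQuietly(initDelayMillis);        // simulate slow VM initialization
            events.add("suspend threads");
            signalVMInitComplete();
        });
        Thread listener = new Thread(() -> {
            waitVMInitComplete();                 // corresponds to the call added to debugLoop_run()
            events.add("process resume command");
        });
        listener.start();
        helper.start();
        joinQuietly(listener);
        joinQuietly(helper);
        return new ArrayList<>(events);
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(new InitHandshake().run(100));
    }
}
```

Even though the listener thread starts first, it cannot record its resume step until the helper has recorded the suspend step and signalled, which is the ordering the patch is meant to enforce.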
From robert.field at oracle.com Mon Apr 16 18:36:21 2018 From: robert.field at oracle.com (Robert Field) Date: Mon, 16 Apr 2018 11:36:21 -0700 Subject: JVMTI retransformation and addition of private methods In-Reply-To: References: <386B2730-5732-4937-B334-411D8E458022@oracle.com> <6ddd571e-11a9-aa64-5ba6-dd98df9a792a@oracle.com> <2bd8bf1c-0328-9604-2661-634f8b1a59f3@oracle.com> <43f6c290-fe2f-f7e2-6e83-21cb11fbf0fe@oracle.com> <38c3949e-c9ac-8419-a5c3-5fac876a5125@oracle.com> Message-ID: <528f344c-cbd5-3d59-c254-27f7178fb5e2@oracle.com> Rather than reverse engineer the spec from the hotspot implementation... Capabilities are the mechanism by which the level of functionality is defined. Capabilities say what can be done, not what can't. The wording "The redefinition must not add, remove or rename fields or methods, change the signatures of methods, change modifiers, or change inheritance. These restrictions may be lifted in future versions." was clearly left from an earlier version. Probably should not have been there in the first place but certainly should have been updated/removed as JVMTI evolved. As written, it explicitly conflicts with the can_redefine_any_class capability: can_redefine_any_class: Can modify (retransform or redefine) any non-primitive non-array class. See IsModifiableClass. as well as conflicting with the implementation of SetNativeMethodPrefixes et al. and the RI in terms of private/final/static. I would suggest removing the quoted text and adding to the text for the can_redefine_classes capability. Maybe something like: can_redefine_classes: Can redefine classes with RedefineClasses, where the class is non-primitive and non-array and the redefinition does not add, remove or rename fields or methods, change the signatures of methods, change modifiers, or change inheritance.
-Robert From serguei.spitsyn at oracle.com Mon Apr 16 18:43:21 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 16 Apr 2018 11:43:21 -0700 Subject: JVMTI retransformation and addition of private methods In-Reply-To: <528f344c-cbd5-3d59-c254-27f7178fb5e2@oracle.com> References: <386B2730-5732-4937-B334-411D8E458022@oracle.com> <6ddd571e-11a9-aa64-5ba6-dd98df9a792a@oracle.com> <2bd8bf1c-0328-9604-2661-634f8b1a59f3@oracle.com> <43f6c290-fe2f-f7e2-6e83-21cb11fbf0fe@oracle.com> <38c3949e-c9ac-8419-a5c3-5fac876a5125@oracle.com> <528f344c-cbd5-3d59-c254-27f7178fb5e2@oracle.com> Message-ID: <5c1e6cda-a2aa-cd96-192e-7f9a3a41df80@oracle.com> Added this and the previous suggestion to JDK-8192936. Thanks, Serguei On 4/16/18 11:36, Robert Field wrote: > Rather than reverse engineer the spec from the hotspot implementation... > > Capabilities are the mechanism by which the level of functionality is > defined. Capabilities say what can be done, not what can't. > > The wording "The redefinition must not add, remove or rename fields or > methods, change the signatures of methods, change modifiers, or change > inheritance. These restrictions may be lifted in future versions." was > clearly left from an earlier version. Probably should not have been > there in the first place but certainly should have been updated/removed as > JVMTI evolved. > > As written, it explicitly conflicts with the can_redefine_any_class > capability: > > can_redefine_any_class: Can modify (retransform or redefine) any > non-primitive non-array class. See IsModifiableClass. > > as well as conflicting with the implementation of > SetNativeMethodPrefixes et al. and the RI in terms of > private/final/static. > > I would suggest removing the quoted text and adding to the text for > the can_redefine_classes capability. Maybe something like: > > can_redefine_classes:
Can redefine classes with > RedefineClasses, where the class is non-primitive and non-array and > the redefinition does not add, remove or rename fields or methods, > change the signatures of methods, change modifiers, or change > inheritance. > > -Robert > From serguei.spitsyn at oracle.com Mon Apr 16 18:44:35 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 16 Apr 2018 11:44:35 -0700 Subject: JVMTI retransformation and addition of private methods In-Reply-To: <528f344c-cbd5-3d59-c254-27f7178fb5e2@oracle.com> References: <386B2730-5732-4937-B334-411D8E458022@oracle.com> <6ddd571e-11a9-aa64-5ba6-dd98df9a792a@oracle.com> <2bd8bf1c-0328-9604-2661-634f8b1a59f3@oracle.com> <43f6c290-fe2f-f7e2-6e83-21cb11fbf0fe@oracle.com> <38c3949e-c9ac-8419-a5c3-5fac876a5125@oracle.com> <528f344c-cbd5-3d59-c254-27f7178fb5e2@oracle.com> Message-ID: <87e65155-fe00-652f-bc7b-741bc97013e1@oracle.com> Added to JDK-8192936. Thanks, Serguei On 4/16/18 11:36, Robert Field wrote: > Rather than reverse engineer the spec from the hotspot implementation... > > Capabilities are the mechanism by which the level of functionality is > defined. Capabilities say what can be done, not what can't. > > The wording "The redefinition must not add, remove or rename fields or > methods, change the signatures of methods, change modifiers, or change > inheritance. These restrictions may be lifted in future versions." was > clearly left from an earlier version. Probably should not have been > there in the first place but certainly should have been updated/removed as > JVMTI evolved. > > As written, it explicitly conflicts with the can_redefine_any_class > capability: > > can_redefine_any_class: Can modify (retransform or redefine) any > non-primitive non-array class. See IsModifiableClass. > > as well as conflicting with the implementation of > SetNativeMethodPrefixes et al. and the RI in terms of > private/final/static.
> > I would suggest removing the quoted text and adding to the text for > the can_redefine_classes capability. Maybe something like: > > can_redefine_classes: Can redefine classes with > RedefineClasses, where the class is non-primitive and non-array and > the redefinition does not add, remove or rename fields or methods, > change the signatures of methods, change modifiers, or change > inheritance. > > -Robert > From rafael.wth at gmail.com Mon Apr 16 21:06:44 2018 From: rafael.wth at gmail.com (Rafael Winterhalter) Date: Mon, 16 Apr 2018 23:06:44 +0200 Subject: Review Request JDK-8200559: Java agents doing instrumentation need a means to define auxiliary classes In-Reply-To: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com> References: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com> Message-ID: Hei Mandy, I have looked into several Java agents that I have worked on and for many of them, this API unfortunately does not supply sufficient access. I would therefore still prefer a method Instrumentation::defineClass. The problem is that some agents need to define classes in packages other than that of the instrumented class. For example, I might need to enhance a library that defines a set of callback classes in package A. All these classes share a common super class with a package-private constructor. I want to instrument some class in package B to use a callback that the library does not supply and need to add a new callback class to A. This is not possible using the current API. I could, however, achieve this by calling Instrumentation::retransformClasses on one of the classes in A after registering a class file transformer. Once the retransformation is triggered, I can define a class in A. Of course this is inefficient and I would rather open the jdk.internal.misc module and use the "old" API instead. For this reason, I argue that this rather restrained API is not convenient while it does not add anything to security.
Also, for the use case of Mockito, this would neither be sufficient as Mockito sometimes redefines classes and sometimes adds a subclass without retransforming. We would rather have direct access to class definition once we are already running with the privileges of a Java agent. I would therefore suggest to add a method: interface Instrumentation { Class defineClass(byte[] bytes, ProtectionDomain pd); } which can be implemented simply by delegating to jdk.internal.misc.Unsafe. On a side note. Does JavaLangAccess::defineClass work with the bootstrap class loader? I have not tried it but I always thought it was just an access layer for the class loader API that cannot access the null value. Thanks for considering this use case! Best regards, Rafael 2018-04-15 8:23 GMT+02:00 mandy chung : > Background: > > Java agents support both load time and dynamic instrumentation. At load > time, > the agent's ClassFileTransformer is invoked to transform class bytes. > There is > no Class objects at this time. Dynamic instrumentation is when > redefineClasses > or retransformClasses is used to redefine an existing loaded class. The > ClassFileTransformer is invoked with class bytes where the Class object is > present. > > Java agent doing instrumentation needs a means to define auxiliary classes > that are visible and accessible to the instrumented class. Existing agents > have been using sun.misc.Unsafe::defineClass to define aux classes directly > or accessing protected ClassLoader::defineClass method with setAccessible > to > suppress the language access check (see [1] where this issue was brought > up). > > Instrumentation::appendToBootstrapClassLoaderSearch and > appendToSystemClassLoaderSearch > APIs are existing means to supply additional classes. It's too limited > for example it can't inject a class in the same runtime package as the > class > being transformed. 
> > Proposal: > > This proposes to add a new ClassFileTransformer.transform method taking > additional ClassDefiner parameter. A transformer can define additional > classes during the transformation process, i.e. > when ClassFileTransformer::transform is invoked. Some details: > > 1. ClassDefiner::defineClass defines a class in the same runtime package > as the class being transformed. > 2. The class is defined in the same thread as the transformers are being > invoked. ClassDefiner::defineClass returns Class object directly > before the transformed class is defined. > 3. No transformation is applied to classes defined by > ClassDefiner::defineClass. > > The first prototype we did is to collect the auxiliary classes and define > them until all transformers are invoked and have these aux classes to go > through the transformation pipeline. Several complicated issues would > need to be resolved for example timing whether the auxiliary classes > should > be defined before the transformed class (otherwise a potential race where > some other thread references the transformed class and cause the code to > execute that in turn reference the auxiliary classes. The current > implementation has a native reentrancy check that ensure one class is being > transformed to avoid potential circularity issues. This may need JVM TI > support to be reliable. > > This proposal would allow java agents to migrate from internal API and > ClassDefiner to be enhanced in the future. > > Webrev: > http://cr.openjdk.java.net/~mchung/jdk11/webrevs/8200559/webrev.00/ > > Mandy > [1] http://mail.openjdk.java.net/pipermail/jdk-dev/2018- > January/000405.html > -------------- next part -------------- An HTML attachment was scrubbed... 
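The distinction the background section draws between load-time and dynamic instrumentation is visible to a transformer through the classBeingRedefined argument of the existing ClassFileTransformer::transform method. A minimal sketch, assuming a hypothetical transformer class PhaseLoggingTransformer that only records which phase each call belongs to (it is not part of the webrev and does not use the proposed ClassDefiner):

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical transformer that records whether each call is a class load or a
// redefinition/retransformation. It never changes the class bytes.
class PhaseLoggingTransformer implements ClassFileTransformer {
    final List<String> events = Collections.synchronizedList(new ArrayList<>());

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // At load time no Class object exists yet, so classBeingRedefined is null.
        String phase = (classBeingRedefined == null) ? "load" : "retransform";
        // className uses the internal form, e.g. "java/lang/String".
        events.add(phase + ":" + className);
        return null; // null means "no transformation performed"
    }

    public static void main(String[] args) {
        PhaseLoggingTransformer t = new PhaseLoggingTransformer();
        // Called directly here for illustration; a real agent would register the
        // transformer through Instrumentation.addTransformer in premain/agentmain.
        t.transform(null, "lib/callbacks/Base", null, null, new byte[0]);
        t.transform(null, "lib/callbacks/Base", String.class, null, new byte[0]);
        System.out.println(t.events);
    }
}
```

The package name lib/callbacks is made up for the example. In a real agent the second kind of call would be triggered by Instrumentation.retransformClasses or redefineClasses rather than invoked by hand.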
URL: From david.holmes at oracle.com Tue Apr 17 00:05:28 2018 From: david.holmes at oracle.com (David Holmes) Date: Tue, 17 Apr 2018 10:05:28 +1000 Subject: JVMTI retransformation and addition of private methods In-Reply-To: <528f344c-cbd5-3d59-c254-27f7178fb5e2@oracle.com> References: <386B2730-5732-4937-B334-411D8E458022@oracle.com> <6ddd571e-11a9-aa64-5ba6-dd98df9a792a@oracle.com> <2bd8bf1c-0328-9604-2661-634f8b1a59f3@oracle.com> <43f6c290-fe2f-f7e2-6e83-21cb11fbf0fe@oracle.com> <38c3949e-c9ac-8419-a5c3-5fac876a5125@oracle.com> <528f344c-cbd5-3d59-c254-27f7178fb5e2@oracle.com> Message-ID: <03d6503c-e577-5695-05e2-2b121a9f9b9a@oracle.com> On 17/04/2018 4:36 AM, Robert Field wrote: > Rather than reverse engineer the spec from the hotspot implementation... > > Capabilities are the mechanism by which the level of functionality is > defined.? Capabilities say what can be done, not what can't. > > The wording "The redefinition must not add, remove or rename fields or > methods, change the signatures of methods, change modifiers, or change > inheritance. These restrictions may be lifted in future versions." was > clearly left from an earlier version.? Probably should not have been > there in the first place but certainly should have updated/removed as > JVMTI evolved. > > As written, it explicitly conflicts with the can_redefine_any_class > capability: > > ? can_redefine_any_class??? Can modify (retransform or redefine) any > non-primitive non-array class. See IsModifiableClass. I don't see any conflict there. can_redefine_any_class relates to the sets of classes amenable to redefinition, not to the set of changes that redefinition can make to a given class. Arguably there could have been individual capabilities for each kind of change, but that seems somewhat extreme to me, unless there are good reasons why only subsets of changes would be reasonable to support. 
The existing "you can do anything except x, y, z" within the spec for redefineClass is coarse but not unreasonable. It is problematic because the exception list is incomplete but that's a different story. > as well as conflicting with the implementation of > SetNativeMethodPrefixes et. al. and the RI in terms of > private/final/static. So two clear cases where the spec should have been updated but wasn't. > i would suggest removing the quoted text and adding to the text for the > can_redefine_classes capability.?? Maybe something like: > > ???? can_redefine_classes??? Can redefine classes with RedefineClasses, > where the class is non-primitive and non-array and the redefinition does > not add, remove or rename fields or methods, change the signatures of > methods, change modifiers, or change inheritance. Moving the text doesn't change the basic problem that the text and implementation are not in agreement. David ----- > -Robert > From mandy.chung at oracle.com Tue Apr 17 07:28:17 2018 From: mandy.chung at oracle.com (mandy chung) Date: Tue, 17 Apr 2018 15:28:17 +0800 Subject: Review Request JDK-8200559: Java agents doing instrumentation need a means to define auxiliary classes In-Reply-To: References: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com> Message-ID: Hi Rafael, I see that mocking/proxying/testing framework should be looked at separately since its requirements and approaches can be different than tool agents. On 4/17/18 5:06 AM, Rafael Winterhalter wrote: > Hei Mandy, > > I have looked into several Java agents that I have worked on and for > many of them, this API does unfortunately not supply sufficient > access. I would therefore still prefer a method > Instrumentation::defineClass. > > The problem is that some agents need to define classes in other > packages then in that of the instrumented class. For example, I might > need to enhance a library that defines a set of callback classes in > package A. 
All these classes share a common super class with a > package-private constructor. I want to instrument some class in > package B to use a callback that the library does not supply and need > to add a new callback class to A. This is not possible using the > current API. > Are these callback classes made available statically? Or are additional classes just defined dynamically as needed? Is Lookup::defineClass an alternative if you get hold of the common super class in A? > I could, however, achieve this by calling Instrumentation::retransformClasses > on one of the classes in A after registering a class file transformer. > Once the retransformation is triggered, I can define a class in A. > Of course this is inefficient and I would rather open the > jdk.internal.misc module and use the "old" API instead. > > For this reason, I argue that this rather restrained API is not > convenient while it does not add anything to security. Also, for the > use case of Mockito, this would neither be sufficient as Mockito > sometimes redefines classes and sometimes adds a subclass without > retransforming. We would rather have direct access to class definition > once we are already running with the privileges of a Java agent. > > I would therefore suggest to add a method: > > interface Instrumentation { >   Class defineClass(byte[] bytes, ProtectionDomain pd); > } > > which can be implemented simply by delegating to jdk.internal.misc.Unsafe. > > On a side note. Does JavaLangAccess::defineClass work with the > bootstrap class loader? I have not tried it but I always thought it > was just an access layer for the class loader API that cannot access > the null value. > The JVM entry point does allow a null loader. Mandy > Thanks for considering this use case! > Best regards, Rafael > > 2018-04-15 8:23 GMT+02:00 mandy chung: > > Background: > > Java agents support both load time and dynamic instrumentation.
> At load time, > the agent's ClassFileTransformer is invoked to transform class > bytes.? There is > no Class objects at this time.? Dynamic instrumentation is when > redefineClasses > or retransformClasses is used to redefine an existing loaded > class.? The > ClassFileTransformer is invoked with class bytes where the Class > object is present. > > Java agent doing instrumentation needs a means to define auxiliary > classes > that are visible and accessible to the instrumented class.? > Existing agents > have been using sun.misc.Unsafe::defineClass to define aux classes > directly > or accessing protected ClassLoader::defineClass method with > setAccessible to > suppress the language access check (see [1] where this issue was > brought up). > > Instrumentation::appendToBootstrapClassLoaderSearch and > appendToSystemClassLoaderSearch > APIs are existing means to supply additional classes.? It's too > limited > for example it can't inject a class in the same runtime package as > the class > being transformed. > > Proposal: > > This proposes to add a new ClassFileTransformer.transform method > taking additional ClassDefiner parameter.? A transformer can > define additional > classes during the transformation process, i.e. > when ClassFileTransformer::transform is invoked. Some details: > > 1. ClassDefiner::defineClass defines a class in the same runtime > package > ?? as the class being transformed. > 2. The class is defined in the same thread as the transformers are > being > ?? invoked.?? ClassDefiner::defineClass returns Class object directly > ?? before the transformed class is defined. > 3. No transformation is applied to classes defined by > ClassDefiner::defineClass. > > The first prototype we did is to collect the auxiliary classes and > define > them? until all transformers are invoked and have these aux > classes to go > through the transformation pipeline.? 
Several complicated issues > would > need to be resolved for example timing whether the auxiliary > classes should > be defined before the transformed class (otherwise a potential > race where > some other thread references the transformed class and cause the > code to > execute that in turn reference the auxiliary classes. The current > implementation has a native reentrancy check that ensure one class > is being > transformed to avoid potential circularity issues.? This may need > JVM TI > support to be reliable. > > This proposal would allow java agents to migrate from internal API > and ClassDefiner to be enhanced in the future. > > Webrev: > http://cr.openjdk.java.net/~mchung/jdk11/webrevs/8200559/webrev.00/ > > > Mandy > [1] > http://mail.openjdk.java.net/pipermail/jdk-dev/2018-January/000405.html > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremymanson at google.com Tue Apr 17 19:38:26 2018 From: jeremymanson at google.com (Jeremy Manson) Date: Tue, 17 Apr 2018 19:38:26 +0000 Subject: JDK-8171119: Low-Overhead Heap Profiling In-Reply-To: References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com> <0A5AD1F8-EBDD-47EA-9741-1AE6B1B3971A@oracle.com> Message-ID: +1 to sampling anything the thread is allocating. With the bytecode rewriting based version of this, I had complaints about missing allocations from JNI and APIs like Class.newInstance. (I don't know how the placement of the collectors would affect this, but it did matter). Jeremy On Thu, Apr 12, 2018 at 2:23 PM JC Beyler wrote: > Hi Karen, > > I apologize for sending too many webrevs. I try/tend to iterate fast and > move in an iterative fashion. I also try to solve most, if not all, of the > current items that are requested in one go. Perhaps I failed in doing that > recently? I apologize for that. 
> > So I promise to not send a new webrev in this email or until I'm pretty > sure I got all the current (And any incoming comment/reviews) handled :-) > > For the points you brought up: > a) What are we sampling? In my mind, I'd rather have the sampler be > sampling anything the thread is allocating and not only sample bytecode > allocations. It turns out that I was focusing on that first to get it up. > As I was stuck in figuring out how to get the VM collector and the sampling > collector to co-exist, there was a bit of issues there. > - That has been solved by now delaying the posting of a sampled > object if a VM collector is present. So now that I've better understood > interactions between collectors and when you could post an event, I'm way > more able to talk about the feasibility and validity of the next item about > bigger objects. > > b) You bring up an excellent point of if we have a multi-array object > or a more complex object (such as a cloned object for example), if the > sampler is tripped on an internal allocation, should we send that smaller > allocation or should we send the bigger object > - Because we get the stacktrace and we only use the oop to figure out > GC information about the liveness of the object in our use-case in the > JVMTI agent, this changes nothing really in practice. I do see value in > sending the multi-array object as a whole to a user. > - If that is what you think is best, I can work on getting that > supported and the multi-array test would then prove that if part of the > multi-array is sampled, the sampler returns the whole multi-array. > > Hopefully that answers your concern on me sending too many webrevs, to > which I sincerely apologize. Probably a learning curve of different > approaches of reviews. And I hope that my other answers do show the > direction you were hoping to see. 
> > Thanks again for all your help, > Jc > > On Thu, Apr 12, 2018 at 8:15 AM Karen Kinnear > wrote: > >> JC, >> >> >> On Apr 11, 2018, at 8:17 PM, JC Beyler wrote: >> >> Hi Karen, >> >> I put up a new webrev that is feature complete in my mind in terms of >> implementation. There could be a few tid-bits of optimizations here and >> there but I believe everything is now there in terms of features and there >> is the question of placement of collectors (I've now put it lower than what >> you talk about in this email thread). >> >> I believe that the primary goal of your JEP is to catch samples of >> allocation due to bytecodes. Given that, it makes sense to >> put the collectors in the code generators, so I am ok with your leaving >> them where they are. >> >> And it would save us all a lot of cycles if rather than frequent webrev >> updates with subsets of the requested changes - if you >> could wait until you?ve added all the changes people requested - then >> that would increase the signal to noise ratio. >> >> >> The incremental webrev is here: >> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.12_13/ >> and the full webrev is here: >> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.13/ >> >> The incremental webrev contains the name change of TLAB fields as per the >> conversation going on in the GC thread and the Graal changes to make this >> webrev not break anything. It also contains a test for when VM and sampled >> events are enabled at the same time. >> >> I'll answer your questions inlined below: >> >> >>> >>> I have a couple of design questions before getting to the detailed code >>> review: >>> >>> 1. jvmtiExport.cpp >>> You have simplified the JvmtiObjectAllocEventCollector to assume there >>> is only a single object. >>> Do you have a test that allocates a multi-dimensional array? >>> I would expect that to have multiple subarrays - and need the logic >>> that you removed. >>> >> >> I "think" you misread this. 
>> I did not change the implementation of JvmtiObjectAllocEventCollector
>> to assume only one object. Actually the implementation is still doing what
>> it was doing initially but now JvmtiObjectAllocEventCollector is a main
>> class with two inherited behaviors:
>>    - JvmtiVMObjectAllocEventCollector is following the same logic as
>> before
>>    - JvmtiSampledObjectAllocEventCollector has the logic of a single
>> allocation during collection since it is placed that low. I'm thinking of
>> perhaps separating the two classes now that the sampled case is so low and
>> does not require the extra logic of handling a growable array.
>>
>> I don't have a test that tests multi-dimensional array. I have not looked
>> at what exactly VMObjectAllocEvents track but I did do an example to see:
>>
>>    - On a clean JDK, if I allocate:
>>      int[][][] myAwesomeTable = new int[10][10][10];
>>
>>      I get one single VMObject call back.
>>
>>
>>    - With my change, I get the same behavior.
>>
>> So instead, I did an object containing an array and cloned it. With the
>> clean JDK I get two VM callbacks with and without my change.
>>
>> I'll add a test in the next webrev to ensure correctness.
>>
>>
>>> *** Please add a test in which you allocate a multi-dimensional array -
>>> and try it with your earlier version which
>>> will show the full set of allocated objects
>>>
>>
>> As I said, this works because it is so low in the system. I don't know
>> the internals of the allocation of a three-dimensional array but:
>>    - If I try to collect all allocations required for the new
>> int[10][10][10], I get more than a hundred allocation callbacks if the
>> sample rate is 0, which is what I'd expect (getting a callback at each
>> allocation, so what I expect)
>>
>> So though I only collect one object, I get a callback for each if that is
>> what we want in regard to the sampling rate.
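
For reference, the "more than a hundred" figure above can be checked
with a small Java sketch. It only counts the array objects that the Java
language creates for new int[10][10][10]; it does not call JVMTI, and
whether each of those objects surfaces as a callback depends on the
collector and sampling rate discussed in this thread. The class and method
names are hypothetical and are not part of the webrev.

```java
// Hypothetical helper, not part of the webrev: counts the array objects
// behind a multi-dimensional allocation such as new int[10][10][10].
final class MultiArrayCountSketch {

    // Arithmetic count: one array per index prefix (1 + d0 + d0*d1 + ...).
    static long expectedArrays(int... dims) {
        long total = 0;
        long atThisLevel = 1;
        for (int dim : dims) {
            total += atThisLevel;
            atThisLevel *= dim;
        }
        return total;
    }

    // Cross-check: walk a real array and count every array object reached.
    static long countArrays(Object array) {
        long count = 1;
        if (array instanceof Object[]) {
            for (Object element : (Object[]) array) {
                if (element != null && element.getClass().isArray()) {
                    count += countArrays(element);
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][][] myAwesomeTable = new int[10][10][10];
        System.out.println(expectedArrays(10, 10, 10));
        System.out.println(countArrays(myAwesomeTable));
    }
}
```

Both methods give 1 + 10 + 100 arrays for the three-dimensional case, which
is consistent with seeing more than a hundred callbacks when every
allocation is reported.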
>>
>> I'll let you work this one out with Serguei - if he is ok with your
>> change to not have a growable array and only report one object,
>> given that you are sampling, sounds like that is not a functional loss
>> for you.
>>
>> And yes, please do add the test.
>>
>>
>>
>>> 2. Tests - didn't read them all - just ran into a hardcoded "10% error
>>> ensures a sanity test without becoming flaky"
>>> do check with Serguei on this - that looks like a potential future test
>>> failure
>>>
>>
>> Because the sampling interval is a geometric random variable whose mean is
>> the sampling rate, any meaningful test is going to have to be statistical.
>> Therefore, I accept a bit of error so that we test the real thing rather
>> than hack the code to have less "real" tests. This is what we do
>> internally, let me know if you want me to do it otherwise.
>>
>> Flaky is probably the wrong term, or perhaps I need to explain this better.
>> I'll change the comment in the tests to explain that the potential flakiness
>> comes from the geometric distribution of the sampling interval. Because we
>> don't want tests that run too long, it makes sense to me to have this error
>> percentage.
>>
>> Let me now answer the comments from the other email here as well so we
>> have all answers and conversations in a single thread:
>>
>>> - Note there is one current caveat: if the agent requests the
>>> VMObjectAlloc event, the sampler defers to that event due to a limitation
>>> in its implementation (ie: I am not convinced I can safely send out that
>>> event with the VM collector enabled, I'll happily white board that).
>>>
>>> Please work that one out with Serguei and the serviceability folks.
>>>
>>>
>> Agreed. I'll follow up with Serguei on whether this is a potential problem; I
>> have to double check and ensure I am right that there is an issue. I see it
>> as just a fact of life and not a problem for now.
>> If you do want both
>> events, having a sample drop due to this limitation does not invalidate the
>> system in my mind. I could be wrong about it though and would happily go
>> over what I saw.
>>
>>
>>
>>>> So the Heap Sampling Monitoring System used to have more methods. It
>>>> made sense to have them in a separate category. I now have moved it to the
>>>> memory category to be consistent and grouped there. I also removed that
>>>> link btw.
>>>>
>>> Thanks.
>>>
>>
>> Actually it did seem weird to put it there since there was only
>> Allocate/Deallocate, so for now the method is back in its own
>> category. If someone has a better spot, let me know.
>>
>>
>>>>
>>>>>
>>>>> I was trying to figure out a way to put the collectors farther down
>>>>> the call stack so as to both catch more
>>>>> cases and to reduce the maintenance burden - i.e. if you were to add a
>>>>> new code generator, e.g. Graal -
>>>>> if it were to go through an existing interface, that might be a place
>>>>> to already have a collector.
>>>>>
>>>>> I do not know the Graal sources - I did look at jvmci/jvmciRuntime.cpp
>>>>> - and it appears that there
>>>>> are calls to instanceKlass::new_instance,
>>>>> oopFactory::new_typeArray/new_ObjArray and ArrayKlass::multi-allocate,
>>>>> so one possibility would be to put hooks in those calls which would
>>>>> catch many? (I did not do a thorough search)
>>>>> of the slowpath calls for the bytecodes, and then check the fast paths
>>>>> in detail.
>>>>>
>>>>
>>>> I'll come to a major issue with the collector and its placement in the
>>>> next paragraph.
>>>>
>>> Still not clear on why you did not move the collectors into
>>> instanceKlass::new_instance and oopFactory::newtypeArray/newObjArray
>>> and ArrayKlass::multi-allocate.
>>>
>> As I said above, I am ok with your leaving the collectors in the code
>> generators since that is your focus.
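
The statistical tolerance discussed earlier in this thread (a sampling
interval drawn at random around a configured mean, checked against a 10%
error bound) can be sketched in Java roughly as follows. This is a
simulation under stated assumptions, not the code of the HeapMonitor tests:
the class and method names are hypothetical, an exponential draw stands in
for the geometric variable, and the 512k mean and 10% bound are simply the
figures quoted in the thread.

```java
import java.util.Random;

// Hypothetical simulation, not taken from the webrev or its tests.
final class SamplingToleranceSketch {

    // Draws the next sampling interval in bytes. An exponential draw with
    // the given mean is used as a stand-in for the geometric variable.
    static double nextInterval(Random random, double meanBytes) {
        return -Math.log(1.0 - random.nextDouble()) * meanBytes;
    }

    // Counts how many sampling points fall inside totalBytes of allocation.
    static long countSamples(Random random, double meanBytes, double totalBytes) {
        long samples = 0;
        double position = nextInterval(random, meanBytes);
        while (position <= totalBytes) {
            samples++;
            position += nextInterval(random, meanBytes);
        }
        return samples;
    }

    // True when the observed count is within tolerance (e.g. 0.10) of expected.
    static boolean withinTolerance(long observed, double expected, double tolerance) {
        return Math.abs(observed - expected) <= tolerance * expected;
    }

    public static void main(String[] args) {
        double meanBytes = 512 * 1024;
        double expectedSamples = 10_000;
        long observed = countSamples(new Random(42), meanBytes, expectedSamples * meanBytes);
        System.out.println(observed + " samples, within 10%: "
                + withinTolerance(observed, expectedSamples, 0.10));
    }
}
```

The point of the sketch is the trade-off mentioned above: the more
allocation a test performs, the tighter the observed count clusters around
the expected one, so a fixed error percentage lets a test stay short while
remaining unlikely to fail by chance.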
>> >> >> I think what was happening is that the collectors would wrap the objects >> in handles but references to the originally allocated object would be on >> the stack still and would not get updated if required. Due to that issue, I >> believe was getting weird bugs. >> >> Because of this, it seems that any VM collector enabled has to guarantee >> that either: >> - its path to destruction (and thus posting of events) has no means of >> triggering a GC that would move things around (as long as you are in VM >> code you should be fine I believe) >> >> - if GC is occuring, the objects in its internal array are not >> somewhere on the stack without a handle around them to be able to be moved >> if need by from a GC operation. >> >> >> I'm not convinced this holds in the multithreaded with sampling and VM >> collection cases. >> >> I will let you and Serguei work out whether you have sufficient test >> coverage for the multithreaded cases. >> >> thanks, >> Karen >> >> >> >>>> >>>>> >>>>> I had wondered if it made sense to move the hooks even farther down, >>>>> into CollectedHeap:obj_allocate and array_allocate. >>>>> I do not think so. First reason is that for multidimensional arrays, >>>>> ArrayKlass::multi_allocate the outer dimension array would >>>>> have an event before storing the inner sub-arrays and I don?t think we >>>>> want that exposed, so that won?t work for arrays. 
>>>>> >>>> >>>> So the major difficulty is that the steps of collection do this: >>>> >>>> - An object gets allocated and is decided to be sampled >>>> - The original pointer placement (where it resides originally in >>>> memory) is passed to the collector >>>> - Now one important thing of note: >>>> (a) In the VM code, until the point where the oop is going to be >>>> returned, GC is not yet aware of it >>>> >>> (b) so the collector can't yet send it out to the user via JVMTI >>>> otherwise, the agent could put a weak reference for example >>>> >>>> I'm a bit fuzzy on this and maybe it's just that there would be more >>>> heavy lifting to make this possible but my initial tests seem to show >>>> problems when attempting this in the obj_allocate area. >>>> >>> Not sure what you are seeing here - >>> >>> Let me state it the way I understand it. >>> 1) You can collect the object into internal metadata at allocation point >>> - which you already had >>> Note: see comment in JvmtiExport.cpp: >>> // In the case of the sampled object collector, we don?t want to perform >>> the >>> // oops_do because the object in the collector is still stored in >>> registers >>> // on the VM stack >>> - so GC will find these objects as roots, once we allow GC to run, >>> which should be after the header is initialized >>> >>> Totally agree with you that you can not post the event in the source >>> code in which you allocate the memory - keep reading >>> >>> 2) event posting: >>> - you want to ensure that the object has been fully initialized >>> - the object needs to have the header set up - not just the memory >>> allocated - so that applies to all objects >>> (and that is done in a caller of the allocation code - so it can?t be >>> done at the location which does the memory allocation) >>> - the example I pointed out was the multianewarray case - all the >>> subarrays need to be allocated >>> >>> - please add a test case for multi-array, so as you experiment with >>> where to post the 
event, you ensure that you can >>> access the subarrays (e.g. a 3D array of length 5 has 5 2D arrays as >>> subarrays) >>> >> >> Technically it's more than just initialized, it is the fact that you >> cannot perform a callback about an object if any object of that thread is >> being held by a collector and also in a register/stack space without >> protections. >> >> >> >>> - prior to setting up the object header information, GC would not know >>> about the object >>> - was this by chance the issue you ran into? >>> >> >> No I believe the issue I was running into was above where an object on >> the stack was pointing to an oop that got moved during a GC due to that >> thread doing an event callback. >> >> Thanks for your help, >> Jc >> >> >>> 3) event posting >>> - when you post the event to JVMTI >>> - in JvmtiObjectAllocEventMark: sets _jobj (object)to_jobject(obj), >>> which creates JNIHandles::make_local(_thread, obj) >>> >>> >>>> >>>>> >>>>> The second reason is that I strongly suspect the scope you want is >>>>> bytecodes only. I think once you have added hooks >>>>> to all the fast paths and slow paths that this will be pushing the >>>>> performance overhead constraints you proposed and >>>>> you won?t want to see e.g. internal allocations. >>>>> >>>> >>>> Yes agreed, allocations from bytecodes are mostly our concern generally >>>> :) >>>> >>>> >>>>> >>>>> >>>> But I think you need to experiment with the set of allocations (or >>>>> possible alternative sets of allocations) you want recorded. >>>>> >>>>> The hooks I see today include: >>>>> Interpreter: (looking at x86 as a sample) >>>>> - slowpath in InterpreterRuntime >>>>> - fastpath tlab allocation - your new threshold check handles that >>>>> >>>> >>>> Agreed >>>> >>>> >>>>> - allow_shared_alloc (GC specific): for _new isn?t handled >>>>> >>>> >>>> Where is that exactly? I can check why we are not catching it? 
>>>> >>>> >>>>> >>>>> C1 >>>>> I don?t see changes in c1_Runtime.cpp >>>>> note: you also want to look for the fast path >>>>> >>>> >>>> I added the calls to c1_Runtime in the latest webrev, but was still >>>> going through testing before pushing it out. I had waited on this one a >>>> bit. Fast path would be handled by the threshold check no? >>>> >>>> >>>>> C2: changes in opto/runtime.cpp for slow path >>>>> did you also catch the fast path? >>>>> >>>> >>>> Fast path gets handled by the same threshold check, no? Perhaps I've >>>> missed something (very likely)? >>>> >>>> >>>>> >>>>> 3. Performance - >>>>> After you get all the collectors added - you need to rerun the >>>>> performance numbers. >>>>> >>>> >>>> Agreed :) >>>> >>>> >>>>> >>>>> thanks, >>>>> Karen >>>>> >>>>> On Apr 5, 2018, at 2:15 PM, JC Beyler wrote: >>>>> >>>>> Thanks Boris and Derek for testing it. >>>>> >>>>> Yes I was trying to get a new version out that had the tests ported as >>>>> well but got sidetracked while trying to add tests and two new features. >>>>> >>>>> Here is the incremental webrev: >>>>> >>>>> Here is the full webrev: >>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/ >>>>> >>>>> Basically, the new tests assert this: >>>>> - Only one agent can currently ask for the sampling, I'm currently >>>>> seeing if I can push to a next webrev the multi-agent support to start >>>>> doing a code freeze on this one >>>>> - The event is not thread-enabled, meaning like the >>>>> VMObjectAllocationEvent, it's an all or nothing event; same as the >>>>> multi-agent, I'm going to see if a future webrev to add the support is a >>>>> better idea to freeze this webrev a bit >>>>> >>>>> There was another item that I added here and I'm unsure this webrev is >>>>> stable in debug mode: I added an assertion system to ascertain that all >>>>> paths leading to a TLAB slow path (and hence a sampling point) have a >>>>> sampling collector ready to post the event if a user wants it. 
This might >>>>> break a few thing in debug mode as I'm working through the kinks of that as >>>>> well. However, in release mode, this new webrev passes all the tests in >>>>> hotspot/jtreg/serviceability/jvmti/HeapMonitor. >>>>> >>>>> Let me know what you think, >>>>> Jc >>>>> >>>>> On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich < >>>>> boris.ulasevich at bell-sw.com> wrote: >>>>> >>>>>> Hi JC, >>>>>> >>>>>> I have just checked on arm32: your patch compiles and runs ok. >>>>>> >>>>>> As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not >>>>>> correspond to actual library name: libHeapMonitorTest.c -> >>>>>> libHeapMonitorTest.so >>>>>> >>>>>> Boris >>>>>> >>>>>> On 04.04.2018 01:54, White, Derek wrote: >>>>>> > Thanks JC, >>>>>> > >>>>>> > New patch applies cleanly. Compiles and runs (simple test programs) >>>>>> on >>>>>> > aarch64. >>>>>> > >>>>>> > * Derek >>>>>> > >>>>>> > *From:* JC Beyler [mailto:jcbeyler at google.com] >>>>>> > *Sent:* Monday, April 02, 2018 1:17 PM >>>>>> > *To:* White, Derek >>>>>> > *Cc:* Erik ?sterlund ; >>>>>> > serviceability-dev at openjdk.java.net; hotspot-compiler-dev >>>>>> > >>>>>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >>>>>> > >>>>>> > Hi Derek, >>>>>> > >>>>>> > I know there were a few things that went in that provoked a merge >>>>>> > conflict. I worked on it and got it up to date. Sadly my lack of >>>>>> > knowledge makes it a full rebase instead of keeping all the history. >>>>>> > However, with a newly cloned jdk/hs you should now be able to use: >>>>>> > >>>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/ >>>>>> > >>>>>> > The change you are referring to was done with the others so perhaps >>>>>> you >>>>>> > were unlucky and I forgot it in a webrev and fixed it in another? 
I >>>>>> > don't know but it's been there and I checked, it is here: >>>>>> > >>>>>> > >>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html >>>>>> > >>>>>> > I double checked that tlab_end_offset no longer appears in any >>>>>> > architecture (as far as I can tell :)). >>>>>> > >>>>>> > Thanks for testing and let me know if you run into any other issues! >>>>>> > >>>>>> > Jc >>>>>> > >>>>>> > On Fri, Mar 30, 2018 at 4:24 PM White, Derek < >>>>>> Derek.White at cavium.com >>>>>> > > wrote: >>>>>> > >>>>>> > Hi Jc, >>>>>> > >>>>>> > I?ve been having trouble getting your patch to apply correctly. >>>>>> I >>>>>> > may have based it on the wrong version. >>>>>> > >>>>>> > In any case, I think there?s a missing update to >>>>>> > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), >>>>>> > where ?JavaThread::tlab_end_offset()? should become >>>>>> > ?JavaThread::tlab_current_end_offset()?. >>>>>> > >>>>>> > This should correspond to the other port?s changes in >>>>>> > templateTable_.cpp files. >>>>>> > >>>>>> > Thanks! >>>>>> > - Derek >>>>>> > >>>>>> > *From:* hotspot-compiler-dev >>>>>> > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net >>>>>> > ] *On >>>>>> Behalf >>>>>> > Of *JC Beyler >>>>>> > *Sent:* Wednesday, March 28, 2018 11:43 AM >>>>>> > *To:* Erik ?sterlund >>>>> > > >>>>>> > *Cc:* serviceability-dev at openjdk.java.net >>>>>> > ; >>>>>> hotspot-compiler-dev >>>>>> > >>>>> > > >>>>>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >>>>>> > >>>>>> > Hi all, >>>>>> > >>>>>> > I've been working on deflaking the tests mostly and the wording >>>>>> in >>>>>> > the JVMTI spec. 
>>>>>> > >>>>>> > Here is the two incremental webrevs: >>>>>> > >>>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ >>>>>> > >>>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ >>>>>> > >>>>>> > Here is the total webrev: >>>>>> > >>>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ >>>>>> > >>>>>> > Here are the notes of this change: >>>>>> > >>>>>> > - Currently the tests pass 100 times in a row, I am working >>>>>> on >>>>>> > checking if they pass 1000 times in a row. >>>>>> > >>>>>> > - The default sampling rate is set to 512k, this is what we >>>>>> use >>>>>> > internally and having a default means that to enable the >>>>>> sampling >>>>>> > with the default, the user only has to do a enable event/disable >>>>>> > event via JVMTI (instead of enable + set sample rate). >>>>>> > >>>>>> > - I deprecated the code that was handling the fast path tlab >>>>>> > refill if it happened since this is now deprecated >>>>>> > >>>>>> > - Though I saw that Graal is still using it so I have to >>>>>> see >>>>>> > what needs to be done there exactly >>>>>> > >>>>>> > Finally, using the Dacapo benchmark suite, I noted a 1% >>>>>> overhead for >>>>>> > when the event system is turned on and the callback to the >>>>>> native >>>>>> > agent is just empty. I got a 3% overhead with a 512k sampling >>>>>> rate >>>>>> > with the code I put in the native side of my tests. 
>>>>>> > >>>>>> > Thanks and comments are appreciated, >>>>>> > >>>>>> > Jc >>>>>> > >>>>>> > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler >>>>> > > wrote: >>>>>> > >>>>>> > Hi all, >>>>>> > >>>>>> > The incremental webrev update is here: >>>>>> > >>>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ >>>>>> > >>>>>> > The full webrev is here: >>>>>> > >>>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ >>>>>> > >>>>>> > Major change here is: >>>>>> > >>>>>> > - I've removed the heapMonitoring.cpp code in favor of >>>>>> just >>>>>> > having the sampling events as per Serguei's request; I still >>>>>> > have to do some overhead measurements but the tests prove >>>>>> the >>>>>> > concept can work >>>>>> > >>>>>> > - Most of the tlab code is unchanged, the only major >>>>>> > part is that now things get sent off to event collectors >>>>>> when >>>>>> > used and enabled. >>>>>> > >>>>>> > - Added the interpreter collectors to handle interpreter >>>>>> > execution >>>>>> > >>>>>> > - Updated the name from SetTlabHeapSampling to >>>>>> > SetHeapSampling to be more generic >>>>>> > >>>>>> > - Added a mutex for the thread sampling so that we can >>>>>> > initialize an internal static array safely >>>>>> > >>>>>> > - Ported the tests from the old system to this new one >>>>>> > >>>>>> > I've also updated the JEP and CSR to reflect these changes: >>>>>> > >>>>>> > https://bugs.openjdk.java.net/browse/JDK-8194905 >>>>>> > >>>>>> > https://bugs.openjdk.java.net/browse/JDK-8171119 >>>>>> > >>>>>> > In order to make this have some forward progress, I've >>>>>> removed >>>>>> > the heap sampling code entirely and now rely entirely on the >>>>>> > event sampling system. 
The tests reflect this by using a >>>>>> > simplified implementation of what an agent could do: >>>>>> > >>>>>> > >>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c >>>>>> > >>>>>> > (Search for anything mentioning event_storage). >>>>>> > >>>>>> > I have not taken the time to port the whole code we had >>>>>> > originally in heapMonitoring to this. I hesitate only >>>>>> because >>>>>> > that code was in C++, I'd have to port it to C and this is >>>>>> for >>>>>> > tests so perhaps what I have now is good enough? >>>>>> > >>>>>> > As far as testing goes, I've ported all the relevant tests >>>>>> and >>>>>> > then added a few: >>>>>> > >>>>>> > - Turning the system on/off >>>>>> > >>>>>> > - Testing using various GCs >>>>>> > >>>>>> > - Testing using the interpreter >>>>>> > >>>>>> > - Testing the sampling rate >>>>>> > >>>>>> > - Testing with objects and arrays >>>>>> > >>>>>> > - Testing with various threads >>>>>> > >>>>>> > Finally, as overhead goes, I have the numbers of the system >>>>>> off >>>>>> > vs a clean build and I have 0% overhead, which is what we'd >>>>>> > want. This was using the Dacapo benchmarks. I am now >>>>>> preparing >>>>>> > to run a version with the events on using dacapo and will >>>>>> report >>>>>> > back here. >>>>>> > >>>>>> > Any comments are welcome :) >>>>>> > >>>>>> > Jc >>>>>> > >>>>>> > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler < >>>>>> jcbeyler at google.com >>>>>> > > wrote: >>>>>> > >>>>>> > Hi all, >>>>>> > >>>>>> > I apologize for the delay but I wanted to add an event >>>>>> > system and that took a bit longer than expected and I >>>>>> also >>>>>> > reworked the code to take into account the deprecation >>>>>> of >>>>>> > FastTLABRefill. >>>>>> > >>>>>> > This update has four parts: >>>>>> > >>>>>> > A) I moved the implementation from Thread to >>>>>> > ThreadHeapSampler inside of Thread. 
Would you prefer it >>>>>> as a >>>>>> > pointer inside of Thread or like this works for you? >>>>>> Second >>>>>> > question would be would you rather have an association >>>>>> > outside of Thread altogether that tries to remember when >>>>>> > threads are live and then we would have something like: >>>>>> > >>>>>> > ThreadHeapSampler::get_sampling_size(this_thread); >>>>>> > >>>>>> > I worry about the overhead of this but perhaps it is >>>>>> not too >>>>>> > too bad? >>>>>> > >>>>>> > B) I also have been working on the Allocation event >>>>>> system >>>>>> > that sends out a notification at each sampled event. >>>>>> This >>>>>> > will be practical when wanting to do something at the >>>>>> > allocation point. I'm also looking at if the whole >>>>>> > heapMonitoring code could not reside in the agent code >>>>>> and >>>>>> > not in the JDK. I'm not convinced but I'm talking to >>>>>> Serguei >>>>>> > about it to see/assess :) >>>>>> > >>>>>> > - Also added two tests for the new event subsystem >>>>>> > >>>>>> > C) Removed the slow_path fields inside the TLAB code >>>>>> since >>>>>> > now FastTLABRefill is deprecated >>>>>> > >>>>>> > D) Updated the JVMTI documentation and specification >>>>>> for the >>>>>> > methods. >>>>>> > >>>>>> > So the incremental webrev is here: >>>>>> > >>>>>> > >>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ >>>>>> > >>>>>> > and the full webrev is here: >>>>>> > >>>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 >>>>>> > >>>>>> > I believe I have updated the various JIRA issues that >>>>>> track >>>>>> > this :) >>>>>> > >>>>>> > Thanks for your input, >>>>>> > >>>>>> > Jc >>>>>> > >>>>>> > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler >>>>>> > > >>>>>> wrote: >>>>>> > >>>>>> > Hi Erik, >>>>>> > >>>>>> > I inlined my answers, which the last one seems to >>>>>> answer >>>>>> > Robbin's concerns about the same thing (adding >>>>>> things to >>>>>> > Thread). 
>>>>>> > >>>>>> > On Wed, Feb 14, 2018 at 2:51 AM, Erik ?sterlund >>>>>> > >>>>> > > wrote: >>>>>> > >>>>>> > Hi JC, >>>>>> > >>>>>> > Comments are inlined below. >>>>>> > >>>>>> > On 2018-02-13 06:18, JC Beyler wrote: >>>>>> > >>>>>> > Hi Erik, >>>>>> > >>>>>> > Thanks for your answers, I've now inlined >>>>>> my own >>>>>> > answers/comments. >>>>>> > >>>>>> > I've done a new webrev here: >>>>>> > >>>>>> > >>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ >>>>>> > < >>>>>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.08/> >>>>>> > >>>>>> > The incremental is here: >>>>>> > >>>>>> > >>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ >>>>>> > < >>>>>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.07_08/> >>>>>> > >>>>>> > Note to all: >>>>>> > >>>>>> > - I've been integrating changes from >>>>>> > Erin/Serguei/David comments so this webrev >>>>>> > incremental is a bit an answer to all >>>>>> comments >>>>>> > in one. I apologize for that :) >>>>>> > >>>>>> > On Mon, Feb 12, 2018 at 6:05 AM, Erik >>>>>> ?sterlund >>>>>> > >>>>> > > wrote: >>>>>> > >>>>>> > Hi JC, >>>>>> > >>>>>> > Sorry for the delayed reply. >>>>>> > >>>>>> > Inlined answers: >>>>>> > >>>>>> > >>>>>> > >>>>>> > On 2018-02-06 00:04, JC Beyler wrote: >>>>>> > >>>>>> > Hi Erik, >>>>>> > >>>>>> > (Renaming this to be folded into the >>>>>> > newly renamed thread :)) >>>>>> > >>>>>> > First off, thanks a lot for >>>>>> reviewing >>>>>> > the webrev! I appreciate it! 
>>>>>> > >>>>>> > I updated the webrev to: >>>>>> > >>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ >>>>>> > < >>>>>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.05a/> >>>>>> > >>>>>> > And the incremental one is here: >>>>>> > >>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ >>>>>> > < >>>>>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.04_05a/> >>>>>> > >>>>>> > It contains: >>>>>> > - The change for since from 9 to 11 >>>>>> for >>>>>> > the jvmti.xml >>>>>> > - The use of the OrderAccess for >>>>>> initialized >>>>>> > - Clearing the oop >>>>>> > >>>>>> > I also have inlined my answers to >>>>>> your >>>>>> > comments. The biggest question >>>>>> > will come from the multiple *_end >>>>>> > variables. A bit of the logic there >>>>>> > is due to handling the slow path >>>>>> refill >>>>>> > vs fast path refill and >>>>>> > checking that the rug was not pulled >>>>>> > underneath the slowpath. I >>>>>> > believe that a previous comment was >>>>>> that >>>>>> > TlabFastRefill was going to >>>>>> > be deprecated. >>>>>> > >>>>>> > If this is true, we could revert >>>>>> this >>>>>> > code a bit and just do a : if >>>>>> > TlabFastRefill is enabled, disable >>>>>> this. >>>>>> > And then deprecate that when >>>>>> > TlabFastRefill is deprecated. >>>>>> > >>>>>> > This might simplify this webrev and >>>>>> I >>>>>> > can work on a follow-up that >>>>>> > either: removes TlabFastRefill if >>>>>> Robbin >>>>>> > does not have the time to do >>>>>> > it or add the support to the >>>>>> assembly >>>>>> > side to handle this correctly. >>>>>> > What do you think? >>>>>> > >>>>>> > I support removing TlabFastRefill, but I >>>>>> > think it is good to not depend on that >>>>>> > happening first. 
>>>>>> > >>>>>> > >>>>>> > I'm slowly pushing on the FastTLABRefill >>>>>> > ( >>>>>> https://bugs.openjdk.java.net/browse/JDK-8194084), >>>>>> > I agree on keeping both separate for now >>>>>> though >>>>>> > so that we can think of both differently >>>>>> > >>>>>> > Now, below, inlined are my answers: >>>>>> > >>>>>> > On Fri, Feb 2, 2018 at 8:44 AM, Erik >>>>>> > ?sterlund >>>>>> > >>>>> > > >>>>>> wrote: >>>>>> > >>>>>> > Hi JC, >>>>>> > >>>>>> > Hope I am reviewing the right >>>>>> > version of your work. Here >>>>>> goes... >>>>>> > >>>>>> > >>>>>> src/hotspot/share/gc/shared/collectedHeap.inline.hpp: >>>>>> > >>>>>> > 159 >>>>>> > >>>>>> AllocTracer::send_allocation_outside_tlab(klass, result, size * >>>>>> > HeapWordSize, THREAD); >>>>>> > 160 >>>>>> > 161 >>>>>> > >>>>>> THREAD->tlab().handle_sample(THREAD, result, size); >>>>>> > 162 return result; >>>>>> > 163 } >>>>>> > >>>>>> > Should not call tlab()->X >>>>>> without >>>>>> > checking if (UseTLAB) IMO. >>>>>> > >>>>>> > Done! >>>>>> > >>>>>> > >>>>>> > More about this later. >>>>>> > >>>>>> > >>>>>> src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: >>>>>> > >>>>>> > So first of all, there seems to >>>>>> > quite a few ends. There is an >>>>>> "end", >>>>>> > a "hard >>>>>> > end", a "slow path end", and an >>>>>> > "actual end". Moreover, it seems >>>>>> > like the >>>>>> > "hard end" is actually further >>>>>> away >>>>>> > than the "actual end". So the >>>>>> "hard end" >>>>>> > seems like more of a "really >>>>>> > definitely actual end" or >>>>>> something. >>>>>> > I don't >>>>>> > know about you, but I think it >>>>>> looks >>>>>> > kind of messy. In particular, I >>>>>> don't >>>>>> > feel like the name "actual end" >>>>>> > reflects what it represents, >>>>>> > especially when >>>>>> > there is another end that is >>>>>> behind >>>>>> > the "actual end". 
>>>>>> > >>>>>> > 413 HeapWord* >>>>>> > >>>>>> ThreadLocalAllocBuffer::hard_end() { >>>>>> > 414 // Did a fast TLAB >>>>>> refill >>>>>> > occur? >>>>>> > 415 if (_slow_path_end != >>>>>> _end) { >>>>>> > 416 // Fix up the actual >>>>>> end >>>>>> > to be now the end of this TLAB. >>>>>> > 417 _slow_path_end = >>>>>> _end; >>>>>> > 418 _actual_end = _end; >>>>>> > 419 } >>>>>> > 420 >>>>>> > 421 return _actual_end + >>>>>> > alignment_reserve(); >>>>>> > 422 } >>>>>> > >>>>>> > I really do not like making >>>>>> getters >>>>>> > unexpectedly have these kind of >>>>>> side >>>>>> > effects. It is not expected that >>>>>> > when you ask for the "hard >>>>>> end", you >>>>>> > implicitly update the "slow path >>>>>> > end" and "actual end" to new >>>>>> values. >>>>>> > >>>>>> > As I said, a lot of this is due to >>>>>> the >>>>>> > FastTlabRefill. If I make this >>>>>> > not supporting FastTlabRefill, this >>>>>> goes >>>>>> > away. The reason the system >>>>>> > needs to update itself at the get is >>>>>> > that you only know at that get if >>>>>> > things have shifted underneath the >>>>>> tlab >>>>>> > slow path. I am not sure of >>>>>> > really better names (naming is >>>>>> hard!), >>>>>> > perhaps we could do these >>>>>> > names: >>>>>> > >>>>>> > - current_tlab_end // Either >>>>>> the >>>>>> > allocated tlab end or a sampling >>>>>> point >>>>>> > - last_allocation_address // The >>>>>> end of >>>>>> > the tlab allocation >>>>>> > - last_slowpath_allocated_end // In >>>>>> > case a fast refill occurred the >>>>>> > end might have changed, this is to >>>>>> > remember slow vs fast past refills >>>>>> > >>>>>> > the hard_end method can be renamed >>>>>> to >>>>>> > something like: >>>>>> > tlab_end_pointer() // The >>>>>> end of >>>>>> > the lab including a bit of >>>>>> > alignment reserved bytes >>>>>> > >>>>>> > Those names sound better to me. 
Could >>>>>> you >>>>>> > please provide a mapping from the old >>>>>> names >>>>>> > to the new names so I understand which >>>>>> one >>>>>> > is which please? >>>>>> > >>>>>> > This is my current guess of what you are >>>>>> > proposing: >>>>>> > >>>>>> > end -> current_tlab_end >>>>>> > actual_end -> last_allocation_address >>>>>> > slow_path_end -> >>>>>> last_slowpath_allocated_end >>>>>> > hard_end -> tlab_end_pointer >>>>>> > >>>>>> > Yes that is correct, that was what I was >>>>>> proposing. >>>>>> > >>>>>> > I would prefer this naming: >>>>>> > >>>>>> > end -> slow_path_end // the end for >>>>>> taking a >>>>>> > slow path; either due to sampling or >>>>>> refilling >>>>>> > actual_end -> allocation_end // the end >>>>>> for >>>>>> > allocations >>>>>> > slow_path_end -> last_slow_path_end // >>>>>> last >>>>>> > address for slow_path_end (as opposed to >>>>>> > allocation_end) >>>>>> > hard_end -> reserved_end // the end of >>>>>> the >>>>>> > reserved space of the TLAB >>>>>> > >>>>>> > About setting things in the getter... >>>>>> that >>>>>> > still seems like a very unpleasant >>>>>> thing to >>>>>> > me. It would be better to inspect the >>>>>> call >>>>>> > hierarchy and explicitly update the ends >>>>>> > where they need updating, and assert in >>>>>> the >>>>>> > getter that they are in sync, rather >>>>>> than >>>>>> > implicitly setting various ends as a >>>>>> > surprising side effect in a getter. It >>>>>> looks >>>>>> > like the call hierarchy is very small. 
>>>>>> With >>>>>> > my new naming convention, reserved_end() >>>>>> > would presumably return _allocation_end >>>>>> + >>>>>> > alignment_reserve(), and have an assert >>>>>> > checking that _allocation_end == >>>>>> > _last_slow_path_allocation_end, >>>>>> complaining >>>>>> > that this invariant must hold, and that >>>>>> a >>>>>> > caller to this function, such as >>>>>> > make_parsable(), must first explicitly >>>>>> > synchronize the ends as required, to >>>>>> honor >>>>>> > that invariant. >>>>>> > >>>>>> > >>>>>> > I've renamed the variables to how you >>>>>> preferred >>>>>> > it except for the _end one. I did: >>>>>> > >>>>>> > current_end >>>>>> > >>>>>> > last_allocation_address >>>>>> > >>>>>> > tlab_end_ptr >>>>>> > >>>>>> > The reason is that the architecture >>>>>> dependent >>>>>> > code use the thread.hpp API and it already >>>>>> has >>>>>> > tlab included into the name so it becomes >>>>>> > tlab_current_end (which is better that >>>>>> > tlab_current_tlab_end in my opinion). >>>>>> > >>>>>> > I also moved the update into a separate >>>>>> method >>>>>> > with a TODO that says to remove it when >>>>>> > FastTLABRefill is deprecated >>>>>> > >>>>>> > This looks a lot better now. Thanks. >>>>>> > >>>>>> > Note that the following comment now needs >>>>>> updating >>>>>> > accordingly in threadLocalAllocBuffer.hpp: >>>>>> > >>>>>> > 41 // Heap sampling is performed >>>>>> via >>>>>> > the end/actual_end fields. >>>>>> > >>>>>> > 42 // actual_end contains the >>>>>> real end >>>>>> > of the tlab allocation, >>>>>> > >>>>>> > 43 // whereas end can be set to an >>>>>> > arbitrary spot in the tlab to >>>>>> > >>>>>> > 44 // trip the return and sample >>>>>> the >>>>>> > allocation. >>>>>> > >>>>>> > 45 // slow_path_end is used to >>>>>> track >>>>>> > if a fast tlab refill occured >>>>>> > >>>>>> > 46 // between slowpath calls. >>>>>> > >>>>>> > There might be other comments too, I have not >>>>>> looked >>>>>> > in detail. 
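The reviewer's suggestion above (keep the getter free of side effects, assert the invariant, and make callers synchronize the ends explicitly) can be sketched roughly as follows. This is a simplified Java model, not HotSpot's C++ code: the class name, the `fastRefill` and `syncEnds` methods, and the use of plain `long` addresses are all assumptions made for illustration.

```java
// Hypothetical, simplified model of the TLAB end fields discussed in this
// thread. HotSpot's real implementation is C++; every name here is illustrative.
final class TlabEndsSketch {
    private long allocationEnd;          // end of the area usable for allocations
    private long lastSlowPathEnd;        // end recorded at the last slow-path refill
    private final long alignmentReserve; // bytes reserved past the allocation end

    TlabEndsSketch(long end, long alignmentReserve) {
        this.allocationEnd = end;
        this.lastSlowPathEnd = end;
        this.alignmentReserve = alignmentReserve;
    }

    // Models a fast refill: the end moves without the slow path noticing.
    void fastRefill(long newEnd) {
        allocationEnd = newEnd;
    }

    // Explicit synchronization step that a caller runs before asking for
    // the reserved end (the role make_parsable() would play in the thread).
    void syncEnds() {
        lastSlowPathEnd = allocationEnd;
    }

    // Pure getter: it checks the invariant instead of silently repairing it.
    long reservedEnd() {
        if (allocationEnd != lastSlowPathEnd) {
            throw new IllegalStateException("ends out of sync; call syncEnds() first");
        }
        return allocationEnd + alignmentReserve;
    }
}
```

The design point is that a stale end now fails loudly at the call site that forgot to synchronize, rather than being fixed up as a hidden side effect of reading a value.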
>>>>>> > >>>>>> > This was the only spot that still had an >>>>>> actual_end, I >>>>>> > fixed it now. I'll do a sweep to double check other >>>>>> > comments. >>>>>> > >>>>>> > >>>>>> > >>>>>> > Not sure it's better but before >>>>>> updating >>>>>> > the webrev, I wanted to try >>>>>> > to get input/consensus :) >>>>>> > >>>>>> > (Note hard_end was always further >>>>>> off >>>>>> > than end). >>>>>> > >>>>>> > >>>>>> src/hotspot/share/prims/jvmti.xml: >>>>>> > >>>>>> > 10357 >>>>> > id="can_sample_heap" since="9"> >>>>>> > 10358 >>>>>> > 10359 Can sample the >>>>>> heap. >>>>>> > 10360 If this >>>>>> capability >>>>>> > is enabled then the heap >>>>>> sampling >>>>>> > methods >>>>>> > can be called. >>>>>> > 10361 >>>>>> > 10362 >>>>>> > >>>>>> > Looks like this capability >>>>>> should >>>>>> > not be "since 9" if it gets >>>>>> integrated >>>>>> > now. >>>>>> > >>>>>> > Updated now to 11, crossing my >>>>>> fingers :) >>>>>> > >>>>>> > >>>>>> src/hotspot/share/runtime/heapMonitoring.cpp: >>>>>> > >>>>>> > 448 if >>>>>> > (is_alive->do_object_b(value)) { >>>>>> > 449 // Update the >>>>>> oop to >>>>>> > point to the new object if it >>>>>> is still >>>>>> > alive. >>>>>> > 450 >>>>>> f->do_oop(&(trace.obj)); >>>>>> > 451 >>>>>> > 452 // Copy the old >>>>>> > trace, if it is still live. >>>>>> > 453 >>>>>> > >>>>>> _allocated_traces->at_put(curr_pos++, trace); >>>>>> > 454 >>>>>> > 455 // Store the live >>>>>> > trace in a cache, to be served >>>>>> up on >>>>>> > /heapz. >>>>>> > 456 >>>>>> > >>>>>> _traces_on_last_full_gc->append(trace); >>>>>> > 457 >>>>>> > 458 count++; >>>>>> > 459 } else { >>>>>> > 460 // If the old >>>>>> trace >>>>>> > is no longer live, add it to >>>>> >>>>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From serguei.spitsyn at oracle.com Tue Apr 17 21:01:13 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 17 Apr 2018 14:01:13 -0700 Subject: JDK-8171119: Low-Overhead Heap Profiling In-Reply-To: References: <0A5AD1F8-EBDD-47EA-9741-1AE6B1B3971A@oracle.com> Message-ID: <38fabdf2-2627-fbff-d3c1-561799e0e795@oracle.com> An HTML attachment was scrubbed... URL: From rafael.wth at gmail.com Tue Apr 17 21:23:31 2018 From: rafael.wth at gmail.com (Rafael Winterhalter) Date: Tue, 17 Apr 2018 23:23:31 +0200 Subject: Review Request JDK-8200559: Java agents doing instrumentation need a means to define auxiliary classes In-Reply-To: References: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com> Message-ID: Hei Mandy, Lookup::defineClass would always be an alternative but it would require me to open the class first. If the instrumented type can read the module with the callback but its module was not opened, this would not help me much, unfortunately. Also, I could not resolve this lookup as the class in question is not necessarily loaded at this point. Best regards, Rafael 2018-04-17 9:28 GMT+02:00 mandy chung : > Hi Rafael, > > I see that mocking/proxying/testing framework should be looked at > separately since its requirements and approaches can be different than tool > agents. > > On 4/17/18 5:06 AM, Rafael Winterhalter wrote: > > Hei Mandy, > > I have looked into several Java agents that I have worked on and for many > of them, this API does unfortunately not supply sufficient access. I would > therefore still prefer a method Instrumentation::defineClass. > > The problem is that some agents need to define classes in other packages > then in that of the instrumented class. For example, I might need to > enhance a library that defines a set of callback classes in package A. All > these classes share a common super class with a package-private > constructor. 
I want to instrument some class in package B to use a callback > that the library does not supply and need to add a new callback class to A. > This is not possible using the current API. > > > Are these callback classes made available statically? or just dynamically > defining additional class as needed? Is Lookup::defineClass an alternative > if you get a hold of common super class in A? > > I could however achieve do so by calling Instrumentation::retransform on > one of the classes in A after registering a class file transformer. Once > the retransformation is triggered, I can now define a class in A. Of course > this is inefficient and I would rather open the jdk.internal.misc module > and use the "old" API instead. > > For this reason, I argue that this rather restrained API is not convenient > while it does not add anything to security. Also, for the use case of > Mockito, this would neither be sufficient as Mockito sometimes redefines > classes and sometimes adds a subclass without retransforming. We would > rather have direct access to class definition once we are already running > with the privileges of a Java agent. > > I would therefore suggest to add a method: > > interface Instrumentation { > Class defineClass(byte[] bytes, ProtectionDomain pd); > } > > which can be implemented simply by delegating to jdk.internal.misc.Unsafe. > > On a side note. Does JavaLangAccess::defineClass work with the bootstrap > class loader? I have not tried it but I always thought it was just an > access layer for the class loader API that cannot access the null value. > > > The JVM entry point does allow null loader. > > Mandy > > > Thanks for considering this use case! > Best regards, Rafael > > 2018-04-15 8:23 GMT+02:00 mandy chung : > >> Background: >> >> Java agents support both load time and dynamic instrumentation. At load >> time, >> the agent's ClassFileTransformer is invoked to transform class bytes. >> There is >> no Class objects at this time. 
Dynamic instrumentation is when >> redefineClasses >> or retransformClasses is used to redefine an existing loaded class. The >> ClassFileTransformer is invoked with class bytes where the Class object >> is present. >> >> Java agent doing instrumentation needs a means to define auxiliary >> classes >> that are visible and accessible to the instrumented class. Existing >> agents >> have been using sun.misc.Unsafe::defineClass to define aux classes >> directly >> or accessing protected ClassLoader::defineClass method with setAccessible >> to >> suppress the language access check (see [1] where this issue was brought >> up). >> >> Instrumentation::appendToBootstrapClassLoaderSearch and >> appendToSystemClassLoaderSearch >> APIs are existing means to supply additional classes. It's too limited >> for example it can't inject a class in the same runtime package as the >> class >> being transformed. >> >> Proposal: >> >> This proposes to add a new ClassFileTransformer.transform method taking >> additional ClassDefiner parameter. A transformer can define additional >> classes during the transformation process, i.e. >> when ClassFileTransformer::transform is invoked. Some details: >> >> 1. ClassDefiner::defineClass defines a class in the same runtime package >> as the class being transformed. >> 2. The class is defined in the same thread as the transformers are being >> invoked. ClassDefiner::defineClass returns Class object directly >> before the transformed class is defined. >> 3. No transformation is applied to classes defined by >> ClassDefiner::defineClass. >> >> The first prototype we did is to collect the auxiliary classes and define >> them until all transformers are invoked and have these aux classes to go >> through the transformation pipeline. 
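As background for the load-time versus dynamic distinction described in this proposal, here is a minimal sketch that uses only the existing `java.lang.instrument.ClassFileTransformer` API. It does not show the proposed `ClassDefiner` parameter, which exists only as a proposal in this thread. The class name and counters are illustrative assumptions.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative transformer that only records which kind of instrumentation
// triggered it. The class name and counters are hypothetical.
final class PhaseCountingTransformer implements ClassFileTransformer {
    final AtomicInteger loadTime = new AtomicInteger();
    final AtomicInteger dynamic = new AtomicInteger();

    @Override
    public byte[] transform(ClassLoader loader,
                            String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (classBeingRedefined == null) {
            // Load time: the class is not defined yet, so no Class object exists.
            loadTime.incrementAndGet();
        } else {
            // redefineClasses/retransformClasses: the Class object is present.
            dynamic.incrementAndGet();
        }
        return null; // null tells the JVM that no transformation was performed
    }
}
```

In a real agent the transformer would be registered through `Instrumentation.addTransformer` from `premain` or `agentmain`; that wiring is omitted here so the sketch stays self-contained.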
Several complicated issues would >> need to be resolved for example timing whether the auxiliary classes >> should >> be defined before the transformed class (otherwise a potential race where >> some other thread references the transformed class and cause the code to >> execute that in turn reference the auxiliary classes. The current >> implementation has a native reentrancy check that ensure one class is >> being >> transformed to avoid potential circularity issues. This may need JVM TI >> support to be reliable. >> >> This proposal would allow java agents to migrate from internal API and >> ClassDefiner to be enhanced in the future. >> >> Webrev: >> http://cr.openjdk.java.net/~mchung/jdk11/webrevs/8200559/webrev.00/ >> >> Mandy >> [1] http://mail.openjdk.java.net/pipermail/jdk-dev/2018-January/ >> 000405.html >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mandy.chung at oracle.com Wed Apr 18 02:25:01 2018 From: mandy.chung at oracle.com (mandy chung) Date: Wed, 18 Apr 2018 10:25:01 +0800 Subject: RFR: 81820709 - Container Awareness JEP In-Reply-To: References: Message-ID: <4e73dfdf-54f5-eb43-5e7d-0cfb316fb9e7@oracle.com> On 4/3/18 10:09 PM, Bob Vandette wrote: > WEBREV: > > http://cr.openjdk.java.net/~bobv/8182070/v01/webrev I reviewed the webrev and look okay in general. I will look through the javadoc next. Metrics.java 37 *
<li>1. All processes, including the current process within a container.
<ol> includes the numbering. You can drop "1." and other numbers. 42 *
<li> or
This adds a bullet. Maybe drop this line.

  81      * @return The name of the provider or null if Metrics are
  82      * not enabled.
  85     public String getProvider();

Should this method always return a non-null name?

For an optional metric (when it's not available), the method returns 0. For example:

 533      * @return The number of bytes transferred or 0 if this metric is not available.

How does the client know whether the metric is unavailable or actually zero? Or does the client not care?

jdk/internal/platform/cgroupv1/Metrics.java
 274         return SubSystem.getLongValue(cpuacct, "cpuacct.usage");

Should this be an instance method, like cpuacct.getLongValue("cpuacct.usage")?

Final field names can be made all caps.

I know you are going to include regression tests.

> WEBREV including a Prototype MBEAN for exposing these Metrics:
>
> This prototype will not be integrated as part of this JEP. It's for information only.
>
> http://cr.openjdk.java.net/~bobv/8182070/v01/mbean-proto/
>
> This feature adds a new -XshowSettings option "system" which displays the
> available system Metrics.

What does java --help-extra show? The help message should include -XshowSettings:system only on Linux.

> % java -XshowSettings:system

I expect this option to show static/configuration information rather than timing statistics, e.g. CPU time and usage. It may be a smaller set, but it may still be good information. It's more appropriate for monitoring tools than for the launcher to show timing statistics and resource consumption.

Mandy

From sangheon.kim at oracle.com Wed Apr 18 04:41:41 2018
From: sangheon.kim at oracle.com (sangheon.kim)
Date: Tue, 17 Apr 2018 21:41:41 -0700
Subject: RFR: 8196325: GarbageCollectionNotificationInfo has same information for before and after
Message-ID: <56011de1-07c0-a575-1a14-7a05e01b7cbf@oracle.com>

Hi all,

Could I have some reviews for this patch?
GarbageCollectionNotificationInfo prints the same used size for Eden space before and after GC. The underlying problem is that the memory pool names and the actual GC information (including memory usage) do not match, so in my case the memory usage printed for Eden is actually the metaspace memory usage.

The proposed patch adds a new API to get all memory pool names without changing the memory order. I also enhanced the existing JTREG test to check that the used size at Eden space changes.

CR: https://bugs.openjdk.java.net/browse/JDK-8196325
webrev: http://cr.openjdk.java.net/~sangheki/8196325/webrev.0/
Testing: jdk-tier1, jdk-tier2, jdk-tier3, hs-tier1, hs-tier2, builds-tier1, jdk_management, jdk_jmx

Thanks,
Sangheon

From jeremymanson at google.com Wed Apr 18 04:51:08 2018
From: jeremymanson at google.com (Jeremy Manson)
Date: Wed, 18 Apr 2018 04:51:08 +0000
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To: <38fabdf2-2627-fbff-d3c1-561799e0e795@oracle.com>
References: <0A5AD1F8-EBDD-47EA-9741-1AE6B1B3971A@oracle.com> <38fabdf2-2627-fbff-d3c1-561799e0e795@oracle.com>
Message-ID:

Great, thanks!

Jeremy

On Tue, Apr 17, 2018 at 2:01 PM serguei.spitsyn at oracle.com <serguei.spitsyn at oracle.com> wrote:

> Hi Jeremy,
>
> We had a private discussion with Jc on this and decided the same.
> We would like to sample all allocations if possible, and at a low level.
>
> Thanks,
> Serguei
>
> On 4/17/18 12:38, Jeremy Manson wrote:
>
> +1 to sampling anything the thread is allocating. With the
> bytecode-rewriting-based version of this, I had complaints about missing
> allocations from JNI and APIs like Class.newInstance. (I don't know how the
> placement of the collectors would affect this, but it did matter.)
>
> Jeremy
>
> On Thu, Apr 12, 2018 at 2:23 PM JC Beyler wrote:
>
>> Hi Karen,
>>
>> I apologize for sending too many webrevs. I tend to iterate fast and
>> work in an incremental fashion. I also try to solve most, if not all, of the
>> current items that are requested in one go.
Perhaps I failed in doing that >> recently? I apologize for that. >> >> So I promise to not send a new webrev in this email or until I'm pretty >> sure I got all the current (And any incoming comment/reviews) handled :-) >> >> For the points you brought up: >> a) What are we sampling? In my mind, I'd rather have the sampler be >> sampling anything the thread is allocating and not only sample bytecode >> allocations. It turns out that I was focusing on that first to get it up. >> As I was stuck in figuring out how to get the VM collector and the sampling >> collector to co-exist, there was a bit of issues there. >> - That has been solved by now delaying the posting of a sampled >> object if a VM collector is present. So now that I've better understood >> interactions between collectors and when you could post an event, I'm way >> more able to talk about the feasibility and validity of the next item about >> bigger objects. >> >> b) You bring up an excellent point of if we have a multi-array object >> or a more complex object (such as a cloned object for example), if the >> sampler is tripped on an internal allocation, should we send that smaller >> allocation or should we send the bigger object >> - Because we get the stacktrace and we only use the oop to figure out >> GC information about the liveness of the object in our use-case in the >> JVMTI agent, this changes nothing really in practice. I do see value in >> sending the multi-array object as a whole to a user. >> - If that is what you think is best, I can work on getting that >> supported and the multi-array test would then prove that if part of the >> multi-array is sampled, the sampler returns the whole multi-array. >> >> Hopefully that answers your concern on me sending too many webrevs, to >> which I sincerely apologize. Probably a learning curve of different >> approaches of reviews. And I hope that my other answers do show the >> direction you were hoping to see. 
>> >> Thanks again for all your help, >> Jc >> >> On Thu, Apr 12, 2018 at 8:15 AM Karen Kinnear >> wrote: >> >>> JC, >>> >>> >>> On Apr 11, 2018, at 8:17 PM, JC Beyler wrote: >>> >>> Hi Karen, >>> >>> I put up a new webrev that is feature complete in my mind in terms of >>> implementation. There could be a few tid-bits of optimizations here and >>> there but I believe everything is now there in terms of features and there >>> is the question of placement of collectors (I've now put it lower than what >>> you talk about in this email thread). >>> >>> I believe that the primary goal of your JEP is to catch samples of >>> allocation due to bytecodes. Given that, it makes sense to >>> put the collectors in the code generators, so I am ok with your leaving >>> them where they are. >>> >>> And it would save us all a lot of cycles if rather than frequent webrev >>> updates with subsets of the requested changes - if you >>> could wait until you?ve added all the changes people requested - then >>> that would increase the signal to noise ratio. >>> >>> >>> The incremental webrev is here: >>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.12_13/ >>> and the full webrev is here: >>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.13/ >>> >>> The incremental webrev contains the name change of TLAB fields as per >>> the conversation going on in the GC thread and the Graal changes to make >>> this webrev not break anything. It also contains a test for when VM and >>> sampled events are enabled at the same time. >>> >>> I'll answer your questions inlined below: >>> >>> >>>> >>>> I have a couple of design questions before getting to the detailed code >>>> review: >>>> >>>> 1. jvmtiExport.cpp >>>> You have simplified the JvmtiObjectAllocEventCollector to assume there >>>> is only a single object. >>>> Do you have a test that allocates a multi-dimensional array? >>>> I would expect that to have multiple subarrays - and need the logic >>>> that you removed. 
>>>> >>> >>> I "think" you misread this. I did not change the implementation of JvmtiObjectAllocEventCollector >>> to assume only one object. Actually the implementation is still doing what >>> it was doing initially but now JvmtiObjectAllocEventCollector is a main >>> class with two inherited behaviors: >>> - JvmtiVMObjectAllocEventCollector is following the same logic as >>> before >>> - JvmtiSampledObjectAllocEventCollector has the logic of a single >>> allocation during collection since it is placed that low. I'm thinking of >>> perhaps separating the two classes now that the sampled case is so low and >>> does not require the extra logic of handling a growable array. >>> >>> I don't have a test that tests multi-dimensional array. I have not >>> looked at what exactly VMObjectAllocEvents track but I did do an example to >>> see: >>> >>> - On a clean JDK, if I allocate: >>> int[][][] myAwesomeTable = new int[10][10][10]; >>> >>> I get one single VMObject call back. >>> >>> >>> - With my change, I get the same behavior. >>> >>> So instead, I did an object containing an array and cloned it. With the >>> clean JDK I get two VM callbacks with and without my change. >>> >>> I'll add a test in the next webrev to ensure correctness. >>> >>> >>>> *** Please add a test in which you allocate a multi-dimensional array - >>>> and try it with your earlier version which >>>> will show the full set of allocated objects >>>> >>> >>> As I said, this works because it is so low in the system. I don't know >>> the internals of the allocation of a three-dimensional array but: >>> - If I try to collect all allocations required for the new >>> int[10][10][10], I get more than a hundred allocation callbacks if the >>> sample rate is 0, which is what I'd expect (getting a callback at each >>> allocation, so what I expect) >>> >>> So though I only collect one object, I get a callback for each if that >>> is what we want in regard to the sampling rate. 
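To make the count mentioned above concrete: `new int[10][10][10]` is not a single heap object but a tree of arrays, with one outer array, ten middle arrays, and a hundred leaf `int[]` arrays. The sketch below walks that structure and counts the array objects, which is consistent with seeing more than a hundred callbacks when every allocation is sampled. The class and method names are illustrative and are not taken from the webrev.

```java
// Illustrative only: counts the distinct array objects behind a
// multi-dimensional array. Names are hypothetical, not from the patch.
final class MultiArrayCount {
    // Counts 'array' itself plus every nested array reachable from it.
    static int countArrays(Object array) {
        if (array == null || !array.getClass().isArray()) {
            return 0;
        }
        int count = 1; // the array object itself
        // Only arrays of references can contain further arrays.
        if (!array.getClass().getComponentType().isPrimitive()) {
            for (Object element : (Object[]) array) {
                count += countArrays(element);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][][] table = new int[10][10][10];
        // 1 outer + 10 middle + 100 leaf arrays
        System.out.println(countArrays(table)); // prints 111
    }
}
```

Whether a sampler reports the outer array, an inner one, or each of them separately is exactly the design question being discussed in this thread; the sketch only shows how many candidate objects exist.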
>>> >>> I?ll let you work this one out with Serguei - if he is ok with your >>> change to not have a growable array and only report one object, >>> given that you are sampling, sounds like that is not a functional loss >>> for you. >>> >>> And yes, please do add the test. >>> >>> >>> >>>> 2. Tests - didn?t read them all - just ran into a hardcoded ?10% error >>>> ensures a sanity test without becoming flaky? >>>> do check with Serguei on this - that looks like a potential future test >>>> failure >>>> >>> >>> Because the sampling rate is a geometric variable around a mean of the >>> sampling rate, any meaningful test is going to have to be statistical. >>> Therefore, I do a bit of error acceptance to allow us to test for the real >>> thing and not hack the code to have less "real" tests. This is what we do >>> internally, let me know if you want me to do it otherwise. >>> >>> Flaky is probably a wrong term or perhaps I need to better explain this. >>> I'll change the comment in the tests to explain that potentially flakyness >>> comes from the nature of the geometrical mean. Because we don't want too >>> long running tests, it makes sense to me to have this error percentage. >>> >>> Let me now answer the comments from the other email here as well so we >>> have all answers and conversations in a single thread: >>> >>>> - Note there is one current caveat: if the agent requests the >>>> VMObjectAlloc event, the sampler defers to that event due to a limitation >>>> in its implementation (ie: I am not convinced I can safely send out that >>>> event with the VM collector enabled, I'll happily white board that). >>>> >>>> Please work that one out with Serguei and the serviceability folks. >>>> >>>> >>> Agreed. I'll follow up with Serguei if this is a potential problem, I >>> have to double check and ensure I am right that there is an issue. I see it >>> as just a matter of life and not a problem for now. 
IF you do want both >>> events, having a sample drop due to this limitation does not invalidate the >>> system in my mind. I could be wrong about it though and would happily go >>> over what I saw. >>> >>> >>> >>>>> So the Heap Sampling Monitoring System used to have more methods. It >>>>> made sense to have them in a separate category. I now have moved it to the >>>>> memory category to be consistent and grouped there. I also removed that >>>>> link btw. >>>>> >>>> Thanks. >>>> >>> >>> Actually it did seem weird to put it there since there was only >>> Allocate/Deallocate, so for now the method is still again in its own >>> category once again. If someone has a better spot, let me know. >>> >>> >>>>> >>>>>> >>>>>> I was trying to figure out a way to put the collectors farther down >>>>>> the call stack so as to both catch more >>>>>> cases and to reduce the maintenance burden - i.e. if you were to add >>>>>> a new code generator, e.g. Graal - >>>>>> if it were to go through an existing interface, that might be a place >>>>>> to already have a collector. >>>>>> >>>>>> I do not know the Graal sources - I did look at >>>>>> jvmci/jvmciRuntime.cpp - and it appears that there >>>>>> are calls to instanceKlass::new_instance, >>>>>> oopFactory::new_typeArray/new_ObjArray and ArrayKlass::multi-allocate, >>>>>> so one possibility would be to put hooks in those calls which would >>>>>> catch many? (I did not do a thorough search) >>>>>> of the slowpath calls for the bytecodes, and then check the fast >>>>>> paths in detail. >>>>>> >>>>> >>>>> I'll come to a major issue with the collector and its placement in the >>>>> next paragraph. >>>>> >>>> Still not clear on why you did not move the collectors into >>>> instanceKlass::new_instance and oopFactory::newtypeArray/newObjArray >>>> and ArrayKlass::multi-allocate. >>>> >>> As I said above, I am ok with your leaving the collectors in the code >>> generators since that is your focus. 
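The earlier point in this exchange about statistical tests (a sampling interval that varies randomly around a mean, checked against a 10% error margin) can be illustrated with a small sketch. It assumes an exponential distribution as a stand-in for the geometric variable mentioned above, and uses the 512k default as the mean; it is not HotSpot's sampler, and all names are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of a randomized sampling interval and a tolerance
// check. This is not the HotSpot implementation; names are illustrative.
final class SamplingIntervalSketch {
    // Draws the number of bytes until the next sample from an exponential
    // distribution (assumed here as the continuous analogue of a geometric one).
    static double nextInterval(Random random, double meanBytes) {
        // 1 - nextDouble() lies in (0, 1], so log() never sees zero.
        return -Math.log(1.0 - random.nextDouble()) * meanBytes;
    }

    // Averages 'draws' intervals produced from a fixed seed.
    static double averageInterval(long seed, double meanBytes, int draws) {
        Random random = new Random(seed);
        double total = 0;
        for (int i = 0; i < draws; i++) {
            total += nextInterval(random, meanBytes);
        }
        return total / draws;
    }

    // True when 'observed' is within the given relative tolerance of 'expected'.
    static boolean withinTolerance(double observed, double expected, double tolerance) {
        return Math.abs(observed - expected) <= tolerance * expected;
    }

    public static void main(String[] args) {
        double mean = 512 * 1024; // the 512k default mentioned in this thread
        double observed = averageInterval(42L, mean, 200_000);
        System.out.println(withinTolerance(observed, mean, 0.10));
    }
}
```

Individual intervals vary widely, so a test can only assert on an aggregate over many samples, and the tolerance trades test run time against the chance of a spurious failure.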
>>> >>> >>> I think what was happening is that the collectors would wrap the objects >>> in handles but references to the originally allocated object would be on >>> the stack still and would not get updated if required. Due to that issue, I >>> believe was getting weird bugs. >>> >>> Because of this, it seems that any VM collector enabled has to guarantee >>> that either: >>> - its path to destruction (and thus posting of events) has no means >>> of triggering a GC that would move things around (as long as you are in VM >>> code you should be fine I believe) >>> >>> - if GC is occuring, the objects in its internal array are not >>> somewhere on the stack without a handle around them to be able to be moved >>> if need by from a GC operation. >>> >>> >>> I'm not convinced this holds in the multithreaded with sampling and VM >>> collection cases. >>> >>> I will let you and Serguei work out whether you have sufficient test >>> coverage for the multithreaded cases. >>> >>> thanks, >>> Karen >>> >>> >>> >>>>> >>>>>> >>>>>> I had wondered if it made sense to move the hooks even farther down, >>>>>> into CollectedHeap:obj_allocate and array_allocate. >>>>>> I do not think so. First reason is that for multidimensional arrays, >>>>>> ArrayKlass::multi_allocate the outer dimension array would >>>>>> have an event before storing the inner sub-arrays and I don?t think >>>>>> we want that exposed, so that won?t work for arrays. 
>>>>>> >>>>> >>>>> So the major difficulty is that the steps of collection do this: >>>>> >>>>> - An object gets allocated and is decided to be sampled >>>>> - The original pointer placement (where it resides originally in >>>>> memory) is passed to the collector >>>>> - Now one important thing of note: >>>>> (a) In the VM code, until the point where the oop is going to be >>>>> returned, GC is not yet aware of it >>>>> >>>> (b) so the collector can't yet send it out to the user via JVMTI >>>>> otherwise, the agent could put a weak reference for example >>>>> >>>>> I'm a bit fuzzy on this and maybe it's just that there would be more >>>>> heavy lifting to make this possible but my initial tests seem to show >>>>> problems when attempting this in the obj_allocate area. >>>>> >>>> Not sure what you are seeing here - >>>> >>>> Let me state it the way I understand it. >>>> 1) You can collect the object into internal metadata at allocation >>>> point - which you already had >>>> Note: see comment in JvmtiExport.cpp: >>>> // In the case of the sampled object collector, we don?t want to >>>> perform the >>>> // oops_do because the object in the collector is still stored in >>>> registers >>>> // on the VM stack >>>> - so GC will find these objects as roots, once we allow GC to run, >>>> which should be after the header is initialized >>>> >>>> Totally agree with you that you can not post the event in the source >>>> code in which you allocate the memory - keep reading >>>> >>>> 2) event posting: >>>> - you want to ensure that the object has been fully initialized >>>> - the object needs to have the header set up - not just the memory >>>> allocated - so that applies to all objects >>>> (and that is done in a caller of the allocation code - so it can?t >>>> be done at the location which does the memory allocation) >>>> - the example I pointed out was the multianewarray case - all the >>>> subarrays need to be allocated >>>> >>>> - please add a test case for multi-array, so 
as you experiment with >>>> where to post the event, you ensure that you can >>>> access the subarrays (e.g. a 3D array of length 5 has 5 2D arrays as >>>> subarrays) >>>> >>> >>> Technically it's more than just initialized, it is the fact that you >>> cannot perform a callback about an object if any object of that thread is >>> being held by a collector and also in a register/stack space without >>> protections. >>> >>> >>> >>>> - prior to setting up the object header information, GC would not >>>> know about the object >>>> - was this by chance the issue you ran into? >>>> >>> >>> No I believe the issue I was running into was above where an object on >>> the stack was pointing to an oop that got moved during a GC due to that >>> thread doing an event callback. >>> >>> Thanks for your help, >>> Jc >>> >>> >>>> 3) event posting >>>> - when you post the event to JVMTI >>>> - in JvmtiObjectAllocEventMark: sets _jobj (object)to_jobject(obj), >>>> which creates JNIHandles::make_local(_thread, obj) >>>> >>>> >>>>> >>>>>> >>>>>> The second reason is that I strongly suspect the scope you want is >>>>>> bytecodes only. I think once you have added hooks >>>>>> to all the fast paths and slow paths that this will be pushing the >>>>>> performance overhead constraints you proposed and >>>>>> you won?t want to see e.g. internal allocations. >>>>>> >>>>> >>>>> Yes agreed, allocations from bytecodes are mostly our concern >>>>> generally :) >>>>> >>>>> >>>>>> >>>>>> >>>>> But I think you need to experiment with the set of allocations (or >>>>>> possible alternative sets of allocations) you want recorded. >>>>>> >>>>>> The hooks I see today include: >>>>>> Interpreter: (looking at x86 as a sample) >>>>>> - slowpath in InterpreterRuntime >>>>>> - fastpath tlab allocation - your new threshold check handles that >>>>>> >>>>> >>>>> Agreed >>>>> >>>>> >>>>>> - allow_shared_alloc (GC specific): for _new isn?t handled >>>>>> >>>>> >>>>> Where is that exactly? 
I can check why we are not catching it? >>>>> >>>>> >>>>>> >>>>>> C1 >>>>>> I don?t see changes in c1_Runtime.cpp >>>>>> note: you also want to look for the fast path >>>>>> >>>>> >>>>> I added the calls to c1_Runtime in the latest webrev, but was still >>>>> going through testing before pushing it out. I had waited on this one a >>>>> bit. Fast path would be handled by the threshold check no? >>>>> >>>>> >>>>>> C2: changes in opto/runtime.cpp for slow path >>>>>> did you also catch the fast path? >>>>>> >>>>> >>>>> Fast path gets handled by the same threshold check, no? Perhaps I've >>>>> missed something (very likely)? >>>>> >>>>> >>>>>> >>>>>> 3. Performance - >>>>>> After you get all the collectors added - you need to rerun the >>>>>> performance numbers. >>>>>> >>>>> >>>>> Agreed :) >>>>> >>>>> >>>>>> >>>>>> thanks, >>>>>> Karen >>>>>> >>>>>> On Apr 5, 2018, at 2:15 PM, JC Beyler wrote: >>>>>> >>>>>> Thanks Boris and Derek for testing it. >>>>>> >>>>>> Yes I was trying to get a new version out that had the tests ported >>>>>> as well but got sidetracked while trying to add tests and two new features. 
>>>>>> >>>>>> Here is the incremental webrev: >>>>>> >>>>>> Here is the full webrev: >>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/ >>>>>> >>>>>> Basically, the new tests assert this: >>>>>> - Only one agent can currently ask for the sampling, I'm currently >>>>>> seeing if I can push to a next webrev the multi-agent support to start >>>>>> doing a code freeze on this one >>>>>> - The event is not thread-enabled, meaning like the >>>>>> VMObjectAllocationEvent, it's an all or nothing event; same as the >>>>>> multi-agent, I'm going to see if a future webrev to add the support is a >>>>>> better idea to freeze this webrev a bit >>>>>> >>>>>> There was another item that I added here and I'm unsure this webrev >>>>>> is stable in debug mode: I added an assertion system to ascertain that all >>>>>> paths leading to a TLAB slow path (and hence a sampling point) have a >>>>>> sampling collector ready to post the event if a user wants it. This might >>>>>> break a few thing in debug mode as I'm working through the kinks of that as >>>>>> well. However, in release mode, this new webrev passes all the tests in >>>>>> hotspot/jtreg/serviceability/jvmti/HeapMonitor. >>>>>> >>>>>> Let me know what you think, >>>>>> Jc >>>>>> >>>>>> On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich < >>>>>> boris.ulasevich at bell-sw.com> wrote: >>>>>> >>>>>>> Hi JC, >>>>>>> >>>>>>> I have just checked on arm32: your patch compiles and runs ok. >>>>>>> >>>>>>> As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not >>>>>>> correspond to actual library name: libHeapMonitorTest.c -> >>>>>>> libHeapMonitorTest.so >>>>>>> >>>>>>> Boris >>>>>>> >>>>>>> On 04.04.2018 01:54, White, Derek wrote: >>>>>>> > Thanks JC, >>>>>>> > >>>>>>> > New patch applies cleanly. Compiles and runs (simple test >>>>>>> programs) on >>>>>>> > aarch64. 
>>>>>>> > >>>>>>> > * Derek >>>>>>> > >>>>>>> > *From:* JC Beyler [mailto:jcbeyler at google.com] >>>>>>> > *Sent:* Monday, April 02, 2018 1:17 PM >>>>>>> > *To:* White, Derek >>>>>>> > *Cc:* Erik ?sterlund ; >>>>>>> > serviceability-dev at openjdk.java.net; hotspot-compiler-dev >>>>>>> > >>>>>>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >>>>>>> > >>>>>>> > Hi Derek, >>>>>>> > >>>>>>> > I know there were a few things that went in that provoked a merge >>>>>>> > conflict. I worked on it and got it up to date. Sadly my lack of >>>>>>> > knowledge makes it a full rebase instead of keeping all the >>>>>>> history. >>>>>>> > However, with a newly cloned jdk/hs you should now be able to use: >>>>>>> > >>>>>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/ >>>>>>> > >>>>>>> > The change you are referring to was done with the others so >>>>>>> perhaps you >>>>>>> > were unlucky and I forgot it in a webrev and fixed it in another? I >>>>>>> > don't know but it's been there and I checked, it is here: >>>>>>> > >>>>>>> > >>>>>>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html >>>>>>> > >>>>>>> > I double checked that tlab_end_offset no longer appears in any >>>>>>> > architecture (as far as I can tell :)). >>>>>>> > >>>>>>> > Thanks for testing and let me know if you run into any other >>>>>>> issues! >>>>>>> > >>>>>>> > Jc >>>>>>> > >>>>>>> > On Fri, Mar 30, 2018 at 4:24 PM White, Derek < >>>>>>> Derek.White at cavium.com >>>>>>> > > wrote: >>>>>>> > >>>>>>> > Hi Jc, >>>>>>> > >>>>>>> > I?ve been having trouble getting your patch to apply >>>>>>> correctly. I >>>>>>> > may have based it on the wrong version. >>>>>>> > >>>>>>> > In any case, I think there?s a missing update to >>>>>>> > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), >>>>>>> > where ?JavaThread::tlab_end_offset()? should become >>>>>>> > ?JavaThread::tlab_current_end_offset()?. 
>>>>>>> >
>>>>>>> > This should correspond to the other port's changes in
>>>>>>> > templateTable_.cpp files.
>>>>>>> >
>>>>>>> > Thanks!
>>>>>>> > - Derek
>>>>>>> >
>>>>>>> > *From:* hotspot-compiler-dev
>>>>>>> > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of *JC Beyler
>>>>>>> > *Sent:* Wednesday, March 28, 2018 11:43 AM
>>>>>>> > *To:* Erik Österlund
>>>>>>> > *Cc:* serviceability-dev at openjdk.java.net; hotspot-compiler-dev
>>>>>>> >
>>>>>>
>>>>>>

From jini.george at oracle.com Wed Apr 18 06:05:18 2018
From: jini.george at oracle.com (Jini George)
Date: Wed, 18 Apr 2018 11:35:18 +0530
Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS
In-Reply-To:
References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com>
Message-ID:

Thank you very much, Yasumasa, for pointing this out. You are right -- this would fail on Linux systems where systemd-coredump is enabled.

I plan to file a separate enhancement request to address this issue (wrt systemd-coredump), since it also applies to other test cases that generate core dumps, such as test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java.

From what I can gather, I think we might be able to at least partially address this by using

coredumpctl -o dump

in the test cases, provided the kernel.core_pattern variable is not set to "|/bin/false".

Let me know if you are not OK with this.

Thank you,
Jini.

On 4/14/2018 7:39 PM, Yasumasa Suenaga wrote:
> Hi Jini,
>
> ClhsdbCDSCore.java:
> Can this test work on modern Linux?
> AFAIK modern Linux contains systemd-coredump to gather core images.
> So I am concerned that ClhsdbCDSCore.java will fail in the future.
>
>
> Thanks,
>
> Yasumasa
>
>
> On 2018/04/12 13:21, Jini George wrote:
>> Ping: Gentle reminder !
>> >> Thanks, >> Jini. >> >> On 4/6/2018 9:51 PM, Jini George wrote: >>> Hello! >>> >>> Requesting reviews for: https://bugs.openjdk.java.net/browse/JDK-8174994 >>> Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/ >>> >>> While trying to identify the type given an address, a WrongTypeException >>> was getting thrown with various clhsdb commands (like printmdo, jstack, >>> etc). This was since SA tries to map an address to a hotspot C++ type by >>> comparing the vtable address to the vtable address values of known >>> types. With CDS, since the vtables are copied over for the Metadata >>> classes, the vtable addresses themselves don't match (though, of course, >>> the contents will), and SA errors out. >>> >>> The fix has been implemented by making changes to read in the md region >>> (consisting of the c++ vtables) of the CDS archive in SA, and mapping >>> the vtable addresses to the corresponding metadata type (ConstantPool, >>> InstanceKlass, InstanceClassLoaderKlass, InstanceMirrorKlass, >>> InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass). >>> >>> For corefiles, an additional modification has been done to have the >>> replicated FileMapHeader structure (from >>> src/hotspot/share/memory/filemap.hpp, which is replicated in SA in >>> ps_core.c), to be in sync with the corresponding definition in >>> src/hotspot/share/memory/filemap.hpp. >>> >>> Test cases to test both live and corefile debugging are being added with >>> this. These and other SA tests pass on Mach5. >>> >>> Thanks, >>> Jini. 
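The kernel.core_pattern condition mentioned at the top of this message could be checked from a test before it relies on a core file. What follows is only a minimal sketch, assuming the test can read /proc/sys/kernel/core_pattern; the class and method names (CorePatternCheck, pipesToHelper) are hypothetical and are not part of any existing test library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not an existing jtreg/test-library class.
class CorePatternCheck {

    // Returns true if any line of the core_pattern text begins with '|',
    // which means the kernel pipes core dumps to a helper program
    // instead of writing a core file directly.
    static boolean pipesToHelper(String corePattern) {
        for (String line : corePattern.split("\n")) {
            if (line.startsWith("|")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path pattern = Paths.get("/proc/sys/kernel/core_pattern");
        if (!Files.isReadable(pattern)) {
            // Not Linux, or procfs is unavailable: nothing can be concluded.
            System.out.println("core_pattern not readable; cannot tell");
            return;
        }
        String text = new String(Files.readAllBytes(pattern));
        System.out.println(pipesToHelper(text)
                ? "core dumps are piped to a helper program"
                : "core dumps are written to a file");
    }
}
```

A test could use the result to skip, or to fail with a clearer message, when no core file is found; which of the two is appropriate is a policy question this sketch does not settle.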
>> From yasuenag at gmail.com Wed Apr 18 06:36:57 2018 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Wed, 18 Apr 2018 15:36:57 +0900 Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS In-Reply-To: References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> Message-ID: 2018-04-18 15:05 GMT+09:00 Jini George : > Thank you very much, Yasumasa, for pointing this out. You are right -- this > would fail in the Linux systems if systemd-coredump is enabled. > > I plan to file an enhancement request to address this issue (wrt > systemd-coredump) separately since this would apply to other coredump > generating test cases also like: > test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java. I agree with you, but... > From what i can gather, i think we might be able to at least partially > address this by using > > coredumptl -o dump > > in the test cases, provided the kernel.core_pattern variable is not set to > "|/bin/false". > > Let me know if you are not OK with this. IMHO it is not good. Some Linux distros use other coredump collector. For example, RHEL 6 uses ABRT, Ubuntu uses Apport, Fedora uses systemd-coredump. Hence I think we should disable all tests which requires core images for Linux like a Windows platform. Thanks, Yasumasa > Thank you, > Jini. > > > > > On 4/14/2018 7:39 PM, Yasumasa Suenaga wrote: >> >> Hi Jini, >> >> ClhsdbCDSCore.java: >> Can this test work on modern Linux? >> AFAIK modern Linux contains systemd-coredump to gather core images. So >> I concern ClhsdbCDSCore.java fails in the future. >> >> >> Thanks, >> >> Yasumasa >> >> >> On 2018/04/12 13:21, Jini George wrote: >>> >>> Ping: Gentle reminder ! >>> >>> Thanks, >>> Jini. >>> >>> On 4/6/2018 9:51 PM, Jini George wrote: >>>> >>>> Hello! 
>>>> >>>> Requesting reviews for: https://bugs.openjdk.java.net/browse/JDK-8174994 >>>> Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/ >>>> >>>> While trying to identify the type given an address, a WrongTypeException >>>> was getting thrown with various clhsdb commands (like printmdo, jstack, >>>> etc). This was since SA tries to map an address to a hotspot C++ type by >>>> comparing the vtable address to the vtable address values of known >>>> types. With CDS, since the vtables are copied over for the Metadata >>>> classes, the vtable addresses themselves don't match (though, of course, >>>> the contents will), and SA errors out. >>>> >>>> The fix has been implemented by making changes to read in the md region >>>> (consisting of the c++ vtables) of the CDS archive in SA, and mapping >>>> the vtable addresses to the corresponding metadata type (ConstantPool, >>>> InstanceKlass, InstanceClassLoaderKlass, InstanceMirrorKlass, >>>> InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass). >>>> >>>> For corefiles, an additional modification has been done to have the >>>> replicated FileMapHeader structure (from >>>> src/hotspot/share/memory/filemap.hpp, which is replicated in SA in >>>> ps_core.c), to be in sync with the corresponding definition in >>>> src/hotspot/share/memory/filemap.hpp. >>>> >>>> Test cases to test both live and corefile debugging are being added with >>>> this. These and other SA tests pass on Mach5. >>>> >>>> Thanks, >>>> Jini. >>> >>> > From david.holmes at oracle.com Wed Apr 18 07:10:22 2018 From: david.holmes at oracle.com (David Holmes) Date: Wed, 18 Apr 2018 17:10:22 +1000 Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS In-Reply-To: References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> Message-ID: <20c65db9-026e-59b6-6879-2301e69d6598@oracle.com> My 2c ... 
We have to have tests that can test core file attaching capability - else we don't know it works. So we have to try and generate a core file. But, we have to expect that in many cases no core file will be generated even if the hs-err file claims it was. For example my primary local testing system never generates core files even though it claims to: # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c" (or dumping to / export/users/dh198349/valhalla/repos/valhalla-dev/open/test/jdk/core.29848) apport isn't even installed, even though core_pattern lists it. Cheers, David On 18/04/2018 4:36 PM, Yasumasa Suenaga wrote: > 2018-04-18 15:05 GMT+09:00 Jini George : >> Thank you very much, Yasumasa, for pointing this out. You are right -- this >> would fail in the Linux systems if systemd-coredump is enabled. >> >> I plan to file an enhancement request to address this issue (wrt >> systemd-coredump) separately since this would apply to other coredump >> generating test cases also like: >> test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java. > > I agree with you, but... > >> From what i can gather, i think we might be able to at least partially >> address this by using >> >> coredumptl -o dump >> >> in the test cases, provided the kernel.core_pattern variable is not set to >> "|/bin/false". >> >> Let me know if you are not OK with this. > > IMHO it is not good. > Some Linux distros use other coredump collector. For example, RHEL 6 > uses ABRT, Ubuntu uses Apport, Fedora uses systemd-coredump. > Hence I think we should disable all tests which requires core images > for Linux like a Windows platform. > > > Thanks, > > Yasumasa > > >> Thank you, >> Jini. >> >> >> >> >> On 4/14/2018 7:39 PM, Yasumasa Suenaga wrote: >>> >>> Hi Jini, >>> >>> ClhsdbCDSCore.java: >>> Can this test work on modern Linux? >>> AFAIK modern Linux contains systemd-coredump to gather core images. 
So >>> I concern ClhsdbCDSCore.java fails in the future. >>> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>> On 2018/04/12 13:21, Jini George wrote: >>>> >>>> Ping: Gentle reminder ! >>>> >>>> Thanks, >>>> Jini. >>>> >>>> On 4/6/2018 9:51 PM, Jini George wrote: >>>>> >>>>> Hello! >>>>> >>>>> Requesting reviews for: https://bugs.openjdk.java.net/browse/JDK-8174994 >>>>> Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/ >>>>> >>>>> While trying to identify the type given an address, a WrongTypeException >>>>> was getting thrown with various clhsdb commands (like printmdo, jstack, >>>>> etc). This was since SA tries to map an address to a hotspot C++ type by >>>>> comparing the vtable address to the vtable address values of known >>>>> types. With CDS, since the vtables are copied over for the Metadata >>>>> classes, the vtable addresses themselves don't match (though, of course, >>>>> the contents will), and SA errors out. >>>>> >>>>> The fix has been implemented by making changes to read in the md region >>>>> (consisting of the c++ vtables) of the CDS archive in SA, and mapping >>>>> the vtable addresses to the corresponding metadata type (ConstantPool, >>>>> InstanceKlass, InstanceClassLoaderKlass, InstanceMirrorKlass, >>>>> InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass). >>>>> >>>>> For corefiles, an additional modification has been done to have the >>>>> replicated FileMapHeader structure (from >>>>> src/hotspot/share/memory/filemap.hpp, which is replicated in SA in >>>>> ps_core.c), to be in sync with the corresponding definition in >>>>> src/hotspot/share/memory/filemap.hpp. >>>>> >>>>> Test cases to test both live and corefile debugging are being added with >>>>> this. These and other SA tests pass on Mach5. >>>>> >>>>> Thanks, >>>>> Jini. 
>>>>
>>>>
>>

From mandy.chung at oracle.com Wed Apr 18 07:46:37 2018
From: mandy.chung at oracle.com (mandy chung)
Date: Wed, 18 Apr 2018 15:46:37 +0800
Subject: Review Request JDK-8200559: Java agents doing instrumentation need a means to define auxiliary classes
In-Reply-To:
References: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com>
Message-ID: <460808aa-6eef-4eae-31b2-191e2fbaac79@oracle.com>

Hi Rafael,

I think it's best to separate the testing requirement from java agents doing instrumentation that may run in a production environment. I have created a JBS issue to track the testing mode idea, which would require more discussion and investigation:
https://bugs.openjdk.java.net/browse/JDK-8201562

I understand it's not as efficient to inject a class in a package different from the package of the transformed class. With the principle of least privilege, I prefer not to provide an API to inject any class in any package, and you can achieve that by calling retransformClasses.

What do you think?

Mandy

On 4/18/18 5:23 AM, Rafael Winterhalter wrote:
> Hei Mandy,
> Lookup::defineClass would always be an alternative but it would
> require me to open the class first. If the instrumented type can read
> the module with the callback but its module was not opened, this would
> not help me much, unfortunately. Also, I could not resolve this lookup
> as the class in question is not necessarily loaded at this point.
> Best regards, Rafael
>
> 2018-04-17 9:28 GMT+02:00 mandy chung:
>
> Hi Rafael,
>
> I see that mocking/proxying/testing framework should be looked at
> separately since its requirements and approaches can be different
> than tool agents.
>
> On 4/17/18 5:06 AM, Rafael Winterhalter wrote:
>> Hei Mandy,
>>
>> I have looked into several Java agents that I have worked on and
>> for many of them, this API does unfortunately not supply
>> sufficient access. I would therefore still prefer a method
>> Instrumentation::defineClass.
>>
>> The problem is that some agents need to define classes in other
>> packages than in that of the instrumented class. For example, I
>> might need to enhance a library that defines a set of callback
>> classes in package A. All these classes share a common super
>> class with a package-private constructor. I want to instrument
>> some class in package B to use a callback that the library does
>> not supply and need to add a new callback class to A. This is not
>> possible using the current API.
>>
>
> Are these callback classes made available statically? or just
> dynamically defining additional class as needed? Is
> Lookup::defineClass an alternative if you get a hold of common
> super class in A?
>
>> I could however do so by calling
>> Instrumentation::retransform on one of the classes in A after
>> registering a class file transformer. Once the retransformation
>> is triggered, I can now define a class in A. Of course this is
>> inefficient and I would rather open the jdk.internal.misc module
>> and use the "old" API instead.
>>
>> For this reason, I argue that this rather restrained API is not
>> convenient while it does not add anything to security. Also, for
>> the use case of Mockito, this would not be sufficient either, as
>> Mockito sometimes redefines classes and sometimes adds a subclass
>> without retransforming. We would rather have direct access to
>> class definition once we are already running with the privileges
>> of a Java agent.
>>
>> I would therefore suggest to add a method:
>>
>> interface Instrumentation {
>>   Class defineClass(byte[] bytes, ProtectionDomain pd);
>> }
>>
>> which can be implemented simply by delegating to
>> jdk.internal.misc.Unsafe.
>>
>> On a side note. Does JavaLangAccess::defineClass work with the
>> bootstrap class loader? I have not tried it but I always thought
>> it was just an access layer for the class loader API that cannot
>> access the null value.
>>
>
> The JVM entry point does allow null loader.
>
> Mandy
>
>
>> Thanks for considering this use case!
>> Best regards, Rafael
>>
>> 2018-04-15 8:23 GMT+02:00 mandy chung:
>>
>> Background:
>>
>> Java agents support both load time and dynamic instrumentation. At load time,
>> the agent's ClassFileTransformer is invoked to transform class bytes. There is
>> no Class objects at this time. Dynamic instrumentation is when redefineClasses
>> or retransformClasses is used to redefine an existing loaded class. The
>> ClassFileTransformer is invoked with class bytes where the Class object is present.
>>
>> Java agent doing instrumentation needs a means to define auxiliary classes
>> that are visible and accessible to the instrumented class. Existing agents
>> have been using sun.misc.Unsafe::defineClass to define aux classes directly
>> or accessing protected ClassLoader::defineClass method with setAccessible to
>> suppress the language access check (see [1] where this issue was brought up).
>>
>> Instrumentation::appendToBootstrapClassLoaderSearch and appendToSystemClassLoaderSearch
>> APIs are existing means to supply additional classes. It's too limited
>> for example it can't inject a class in the same runtime package as the class
>> being transformed.
>>
>> Proposal:
>>
>> This proposes to add a new ClassFileTransformer.transform
>> method taking additional ClassDefiner parameter. A transformer can define additional
>> classes during the transformation process, i.e.
>> when ClassFileTransformer::transform is invoked. Some details:
>>
>> 1. ClassDefiner::defineClass defines a class in the same runtime package
>>    as the class being transformed.
>> 2. The class is defined in the same thread as the transformers are being
>>    invoked. ClassDefiner::defineClass returns Class object directly
>>    before the transformed class is defined.
>> 3. No transformation is applied to classes defined by ClassDefiner::defineClass.
>>
>> The first prototype we did is to collect the auxiliary classes and define
>> them until all transformers are invoked and have these aux classes to go
>> through the transformation pipeline. Several complicated issues would
>> need to be resolved for example timing whether the auxiliary classes should
>> be defined before the transformed class (otherwise a potential race where
>> some other thread references the transformed class and cause the code to
>> execute that in turn reference the auxiliary classes). The current
>> implementation has a native reentrancy check that ensure one class is being
>> transformed to avoid potential circularity issues. This may need JVM TI
>> support to be reliable.
>>
>> This proposal would allow java agents to migrate from
>> internal API and ClassDefiner to be enhanced in the future.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~mchung/jdk11/webrevs/8200559/webrev.00/
>>
>>
>> Mandy
>> [1]
>> http://mail.openjdk.java.net/pipermail/jdk-dev/2018-January/000405.html
>>
>>
>>
>
>

From mandy.chung at oracle.com Wed Apr 18 09:20:41 2018
From: mandy.chung at oracle.com (mandy chung)
Date: Wed, 18 Apr 2018 17:20:41 +0800
Subject: RFR: 8196325: GarbageCollectionNotificationInfo has same information for before and after
In-Reply-To: <56011de1-07c0-a575-1a14-7a05e01b7cbf@oracle.com>
References: <56011de1-07c0-a575-1a14-7a05e01b7cbf@oracle.com>
Message-ID: <2b3f9754-b369-0ad3-609d-9691c7965736@oracle.com>

Hi Sangheon,

On 4/18/18 12:41 PM, sangheon.kim wrote:
>
> CR: https://bugs.openjdk.java.net/browse/JDK-8196325
> webrev: http://cr.openjdk.java.net/~sangheki/8196325/webrev.0/

This is indeed a regression. GcInfoBuilder depends on the order of the pool name array.

The change looks okay. I would suggest using a stream in the new getAllMemoryPoolNames() like this:

    public static String[] getAllMemoryPoolNames() {
        return Arrays.stream(MemoryImpl.getMemoryPools())
                     .map(MemoryPoolMXBean::getName)
                     .toArray(String[]::new);
    }

> Testing: jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2,builds-tier1,
> jdk_management, jdk_jmx
>

These test groups are good.

Mandy

> Thanks,
> Sangheon

From andrew_m_leonard at uk.ibm.com Wed Apr 18 15:23:28 2018
From: andrew_m_leonard at uk.ibm.com (Andrew Leonard)
Date: Wed, 18 Apr 2018 16:23:28 +0100
Subject: RFR: 8201409: JDWP debugger initialization hangs intermittently
In-Reply-To:
References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com> <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com> <41d0c829-2189-90a4-3299-7dfdbe336e8b@oracle.com>
Message-ID:

Hi Serguei,
Do you need me to try anything else for this review? The hotspot/jtreg/serviceability suite ran successfully.
Many Thanks
Andrew

Andrew Leonard
Java Runtimes Development
IBM Hursley
IBM United Kingdom Ltd
Phone internal: 245913, external: 01962 815913
internet email: andrew_m_leonard at uk.ibm.com

From: "serguei.spitsyn at oracle.com"
To: daniel.daugherty at oracle.com, Andrew Leonard
Cc: serviceability-dev at openjdk.java.net
Date: 16/04/2018 07:10
Subject: Re: RFR: 8201409: JDWP debugger initialization hangs intermittently

On 4/15/18 10:01, Daniel D. Daugherty wrote:
On 4/13/18 3:07 PM, serguei.spitsyn at oracle.com wrote:
Andrew and reviewers,

I'm re-sending this RFR with a corrected subject that includes the bug number.

The issue is:
https://bugs.openjdk.java.net/browse/JDK-8201409

Webrev:
http://cr.openjdk.java.net/~sspitsyn/webrevs/2018/8201409-jdwp-initsync.ibm.1/

src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
No comments.

src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
No comments.

src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
So now pauses in debugLoop_run() before the loop that reads cmds.
Looks good. src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c So the VM_INIT event handler now signals that we have received the VM_INIT event so that allows debugLoop_run() to proceed. Serguei, this fix needs to have the most of the Serviceability stack of tests run against it (jdwp, JVM/TI, JDI and jdb tests). Based on the email thread, I can't tell which tests have been run with the fix in place. Hi Dan, I'm going to sponsor this fix and will run all the debugger tests. Sorry that I did not announce it yet. Thanks, Serguei Dan The fix looks good to me. Also, I've agreed to skip a unit test as creating it for this issue is not easy. At least, one more review is needed before the fix can be pushed. Thanks, Serguei On 4/11/18 06:33, Andrew Leonard wrote: Hi Serguei, Thank you for raising the bug. I had a chat with one of my colleagues who could recreate it, and it's probably related to the handshaking that is done in the particular scenario. So with the JCK harness: com.sun.jck.lib.ExecJCKTestOtherJVMCmd LD_LIBRARY_PATH=/javatest/lib/jck /jck8b/natives/linux_x86-64 /projects/jck/jdwp/j2sdk-image/bin/java -Xdump:system:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -Xdump:snap:none -Xdump:snap:events=gpf+abort+traceassert+corruptcache -Xdump:java:none -Xdump:java:events=gpf+abort+traceassert+corruptcache -Xdump:heap:none -Xdump:heap:events=gpf+abort+traceassert+corruptcache - Xfuture -agentlib:jdwp=server=y,transport=dt_socket,address=localhost :35000,suspend=y -classpath /javatest/lib/jck /JCK8b-b03/JCK-runtime-8b/classes -Djava.security.policy=/javatest/lib/jck /JCK8b-b03/JCK-runtime-8b/lib/jck.policy javasoft.sqe.jck.lib.jpda.jdwp.DebuggeeLoader -waittime=600 -msgSwitch=ub1604x64vm10:38636 -componentName= ArrayReference.GetValues.getvalues002 Note that the JCK test harness starts the target process, attaches to it, and sends the resume command in a very short time with no handshaking. That may not help..but hopefully helps explain things a bit? 
It's the timing of the resume command during the test that is crucial, resuming before the VM initialization is complete will trigger it. Thanks Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: Andrew Leonard Cc: serviceability-dev at openjdk.java.net Date: 11/04/2018 09:57 Subject: Re: RFR: Fix race condition in jdwp Hi Andrew, I've filed the bug: https://bugs.openjdk.java.net/browse/JDK-8201409 Also, this is a webrev with your patch: http://cr.openjdk.java.net/~sspitsyn/webrevs/2018/8201409-jdwp-initsync.ibm.1/ I agree that creating a standalone test is tricky here. I've added usleep(10000) into the eventHelper_reportVMInit() and ran the JTreg com/sun/jdi tests with my JDK build. However, none of the tests failed with the failure mode you described. So that I'm puzzled a little bit. I suspect that some specific debugLoop commands were used in your scenario. It is still possible that I've missed something here. Will try to double check everything. Thanks, Serguei On 4/11/18 01:29, Andrew Leonard wrote: Thanks Serguei, I terms of a standalone testcase it is quite tricky, as due to the nature of the issue which took a lot of investigation to solve it's very timing dependent and will only occur randomly. It can be forced as I indicated below by adding a "sleep" in the VMInit report code but that's not a testcase, however the issue was originally found in our JCK testing for IBMJava8, testcase test.jck8b.runtime.vm.jdwp, but again only happened intermittently. Sort of like "performance" type issues we're not always going to be able to create a testcase that will always "fail" if the fix is not present. Your thoughts? 
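To make the timing dependence concrete, here is a rough Java analogue of the handshake the patch adds. It is only a sketch: the agent itself is C and uses its own monitor helpers, and every name below (InitHandshake, run, the event strings) is made up for illustration. The delay parameter stands in for a slow VM initialization, much like the sleep in eventHelper_reportVMInit() used to force the hang.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative analogue of the init handshake; not code from the JDWP agent.
class InitHandshake {
    private final Object initMonitor = new Object();
    private boolean vmInitComplete = false;

    // Called by the "event helper" side once VM init handling is done.
    void signalVMInitComplete() {
        synchronized (initMonitor) {
            vmInitComplete = true;
            initMonitor.notifyAll();
        }
    }

    // Called by the "listener" side before it starts reading commands.
    void waitVMInitComplete() throws InterruptedException {
        synchronized (initMonitor) {
            while (!vmInitComplete) {   // loop guards against spurious wakeups
                initMonitor.wait();
            }
        }
    }

    // Runs both sides and records the order in which they proceed.
    static List<String> run(long initDelayMillis) {
        InitHandshake h = new InitHandshake();
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        Thread listener = new Thread(() -> {
            try {
                h.waitVMInitComplete();
                events.add("commands");   // command processing may start now
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        listener.start();
        try {
            Thread.sleep(initDelayMillis); // simulated slow VM initialization
            events.add("vm-init");
            h.signalVMInitComplete();
            listener.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(50));
    }
}
```

With the guarded wait in place, the listener cannot record "commands" until the flag is set, so the order is the same however long the simulated initialization takes. Without the wait, the order would depend on the delay, which is the intermittent behaviour described above.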
Cheers Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: Andrew Leonard Cc: serviceability-dev at openjdk.java.net Date: 11/04/2018 01:02 Subject: Re: RFR: Fix race condition in jdwp Hi Andrew, Okay, I'll file a bug on this topic. But do you have a standalone test demonstrating this issue? Thanks, Serguei On 4/10/18 06:23, Andrew Leonard wrote: Hi Serguei, I don't have access to the bug database to raise one, are you able to please? Summary: JDWP debugger initialization hangs intermittently Description: If during the JDWP setup initialization the VM initialization takes slightly longer than the main debug initialization thread a "hang" situation can occur. This has been seen in testcase test.jck8b.runtime.vm.jdwp and can also be recreated easily by adding a 10 second sleep to the beginning of the src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c method eventHelper_reportVMInit() . First seen: JDK8 Recreated: JDK11 Thanks Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: Andrew Leonard , serviceability-dev at openjdk.java.net Date: 09/04/2018 23:03 Subject: Re: RFR: Fix race condition in jdwp Hi Andrew, The patch itself looks reasonable. However, in order to proceed with it, a bug report with a standalone test case demonstrating the issue is needed. Thanks, Serguei On 4/9/18 09:07, Andrew Leonard wrote: > Hi, > We discovered in our testing with OpenJ9 that a race condition can > occur in the jdwp under certain circumstances, and we were able to > force the same issue with Hotspot. Normally, the event helper thread > suspends all threads, then the debug loop in the listener thread > receives a command to resume. 
The debugger may deadlock if the debug > loop in the listener thread starts processing commands (e.g. resume > threads) before the event helper completes the initialization (and > suspends threads). > > This patch adds synchronization to ensure the event helper completes > the initialization sequence before debugger commands are processed. > > Please can I find a sponsor for this contribution? Patch below.. > > Many thanks > > Andrew > > > > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -58,6 +58,7 @@ > static jboolean vmInitialized; > static jrawMonitorID initMonitor; > static jboolean initComplete; > +static jboolean VMInitComplete; > static jbyte currentSessionID; > > /* > @@ -617,6 +618,35 @@ > debugMonitorExit(initMonitor); > } > > +/* > + * Signal VM initialization is complete. > + */ > +void > +signalVMInitComplete(void) > +{ > + /* > + * VM Initialization is complete > + */ > + LOG_MISC(("signal VM initialization complete")); > + debugMonitorEnter(initMonitor); > + VMInitComplete = JNI_TRUE; > + debugMonitorNotifyAll(initMonitor); > + debugMonitorExit(initMonitor); > +} > + > +/* > + * Wait for VM initialization to complete. 
> + */ > +void > +debugInit_waitVMInitComplete(void) > +{ > + debugMonitorEnter(initMonitor); > + while (!VMInitComplete) { > + debugMonitorWait(initMonitor); > + } > + debugMonitorExit(initMonitor); > +} > + > /* All process exit() calls come from here */ > void > forceExit(int exit_code) > @@ -672,6 +702,7 @@ > LOG_MISC(("Begin initialize()")); > currentSessionID = 0; > initComplete = JNI_FALSE; > + VMInitComplete = JNI_FALSE; > > if ( gdata->vmDead ) { > EXIT_ERROR(AGENT_ERROR_INTERNAL,"VM dead at initialize() time"); > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2015, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -39,4 +39,7 @@ > void debugInit_exit(jvmtiError, const char *); > void forceExit(int); > > +void debugInit_waitVMInitComplete(void); > +void signalVMInitComplete(void); > + > #endif > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -98,6 +98,7 @@
>      standardHandlers_onConnect();
>      threadControl_onConnect();
>
> +    debugInit_waitVMInitComplete();
>      /* Okay, start reading cmds! */
>      while (shouldListen) {
>          if (!dequeue(&p)) {
> diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
> --- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
> +++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -580,6 +580,7 @@
>          (void)threadControl_suspendThread(command->thread, JNI_FALSE);
>      }
>
> +    signalVMInitComplete();
>      outStream_initCommand(&out, uniqueID(), 0x0,
>                            JDWP_COMMAND_SET(Event),
>                            JDWP_COMMAND(Event, Composite));
>
>
>
> Andrew Leonard
> Java Runtimes Development
> IBM Hursley
> IBM United Kingdom Ltd
> Phone internal: 245913, external: 01962 815913
> internet email: andrew_m_leonard at uk.ibm.com
>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with
> number 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From serguei.spitsyn at oracle.com Wed Apr 18 16:49:31 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Wed, 18 Apr 2018 09:49:31 -0700
Subject: RFR: 8201409: JDWP debugger initialization hangs intermittently
In-Reply-To:
References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com>
 <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com>
 <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com>
 <41d0c829-2189-90a4-3299-7dfdbe336e8b@oracle.com>
Message-ID: <0bcb768c-0a1a-289d-6d45-382a78f43d97@oracle.com>

An HTML attachment was scrubbed...
URL:

From jini.george at oracle.com Wed Apr 18 17:07:07 2018
From: jini.george at oracle.com (Jini George)
Date: Wed, 18 Apr 2018 22:37:07 +0530
Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS
In-Reply-To: <20c65db9-026e-59b6-6879-2301e69d6598@oracle.com>
References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com>
 <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com>
 <20c65db9-026e-59b6-6879-2301e69d6598@oracle.com>
Message-ID: <6c22f61c-ec62-ccfe-a7c6-147333159b39@oracle.com>

I agree with the need of testing as much as we can. I could do something
on the lines of how other debuggers like LLDB test: if we can't find the
core file location, check for "|" at the beginning of a line in the
/proc/sys/kernel/core_pattern file -- and fail with a message stating
that the system is using a crash reporting tool.

Thank you,
Jini.

On 4/18/2018 12:40 PM, David Holmes wrote:
> My 2c ...
>
> We have to have tests that can test core file attaching capability -
> else we don't know it works. So we have to try and generate a core file.
>
> But, we have to expect that in many cases no core file will be generated
> even if the hs-err file claims it was. For example my primary local
> testing system never generates core files even though it claims to:
>
> # Core dump will be written. Default location: Core dumps may be
> processed with "/usr/share/apport/apport %p %s %c" (or dumping to
> /export/users/dh198349/valhalla/repos/valhalla-dev/open/test/jdk/core.29848)
>
> apport isn't even installed, even though core_pattern lists it.
>
> Cheers,
> David
>
> On 18/04/2018 4:36 PM, Yasumasa Suenaga wrote:
>> 2018-04-18 15:05 GMT+09:00 Jini George :
>>> Thank you very much, Yasumasa, for pointing this out. You are right -- this
>>> would fail in the Linux systems if systemd-coredump is enabled.
>>>
>>> I plan to file an enhancement request to address this issue (wrt
>>> systemd-coredump) separately since this would apply to other coredump
>>> generating test cases also like:
>>> test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java.
>>
>> I agree with you, but...
>>
>>> From what I can gather, I think we might be able to at least partially
>>> address this by using
>>>
>>> coredumpctl -o dump
>>>
>>> in the test cases, provided the kernel.core_pattern variable is not set to
>>> "|/bin/false".
>>>
>>> Let me know if you are not OK with this.
>>
>> IMHO it is not good.
>> Some Linux distros use other coredump collectors. For example, RHEL 6
>> uses ABRT, Ubuntu uses Apport, Fedora uses systemd-coredump.
>> Hence I think we should disable all tests which require core images
>> for Linux, as on the Windows platform.
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>>> Thank you,
>>> Jini.
>>>
>>>
>>>
>>>
>>> On 4/14/2018 7:39 PM, Yasumasa Suenaga wrote:
>>>>
>>>> Hi Jini,
>>>>
>>>> ClhsdbCDSCore.java:
>>>>     Can this test work on modern Linux?
>>>>     AFAIK modern Linux contains systemd-coredump to gather core images. So
>>>> I am concerned that ClhsdbCDSCore.java will fail in the future.
>>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>> On 2018/04/12 13:21, Jini George wrote: >>>>> >>>>> Ping: Gentle reminder ! >>>>> >>>>> Thanks, >>>>> Jini. >>>>> >>>>> On 4/6/2018 9:51 PM, Jini George wrote: >>>>>> >>>>>> Hello! >>>>>> >>>>>> Requesting reviews for: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8174994 >>>>>> Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/ >>>>>> >>>>>> While trying to identify the type given an address, a >>>>>> WrongTypeException >>>>>> was getting thrown with various clhsdb commands (like printmdo, >>>>>> jstack, >>>>>> etc). This was since SA tries to map an address to a hotspot C++ >>>>>> type by >>>>>> comparing the vtable address to the vtable address values of known >>>>>> types. With CDS, since the vtables are copied over for the Metadata >>>>>> classes, the vtable addresses themselves don't match (though, of >>>>>> course, >>>>>> the contents will), and SA errors out. >>>>>> >>>>>> The fix has been implemented by making changes to read in the md >>>>>> region >>>>>> (consisting of the c++ vtables) of the CDS archive in SA, and mapping >>>>>> the vtable addresses to the corresponding metadata type >>>>>> (ConstantPool, >>>>>> InstanceKlass, InstanceClassLoaderKlass, InstanceMirrorKlass, >>>>>> InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass). >>>>>> >>>>>> For corefiles, an additional modification has been done to have the >>>>>> replicated FileMapHeader structure (from >>>>>> src/hotspot/share/memory/filemap.hpp, which is replicated in SA in >>>>>> ps_core.c), to be in sync with the corresponding definition in >>>>>> src/hotspot/share/memory/filemap.hpp. >>>>>> >>>>>> Test cases to test both live and corefile debugging are being >>>>>> added with >>>>>> this. These and other SA tests pass on Mach5. >>>>>> >>>>>> Thanks, >>>>>> Jini. 
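
The lookup described in the RFR above, matching a vtable address against the vtable addresses of known types, can be pictured as an address-to-type map. What follows is only an illustrative Java sketch under stated assumptions, not the SA implementation: the class name `VtableTypeMap` and every address are made up for the example, and only the type names are taken from the message. It shows why a CDS copy of a vtable is unknown until its own address is registered.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration: resolve a metadata type from a vtable address. */
final class VtableTypeMap {
    private final Map<Long, String> typeByVtable = new HashMap<>();

    /** Record that objects whose vtable pointer equals vtableAddress have the given type. */
    void register(long vtableAddress, String typeName) {
        typeByVtable.put(vtableAddress, typeName);
    }

    /** Look up the type for a vtable pointer; empty means the address is unknown. */
    Optional<String> typeOf(long vtableAddress) {
        return Optional.ofNullable(typeByVtable.get(vtableAddress));
    }

    public static void main(String[] args) {
        VtableTypeMap map = new VtableTypeMap();
        // Made-up addresses standing in for the vtables in the VM library.
        map.register(0x7f0000001000L, "InstanceKlass");
        map.register(0x7f0000002000L, "Method");

        // Made-up address standing in for a vtable copied into the CDS archive.
        long cdsCopy = 0x800000003000L;
        // The copy has the same contents but a different address, so the lookup misses.
        System.out.println(map.typeOf(cdsCopy).isPresent());

        // Registering the copied vtable's address as well makes the lookup succeed.
        map.register(cdsCopy, "InstanceKlass");
        System.out.println(map.typeOf(cdsCopy).orElse("unknown"));
    }
}
```

In this picture, the fix in the RFR corresponds to the second `register` call: the addresses read from the md region of the archive are added to the set of known vtables.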
>>>>> >>>>> >>> From jini.george at oracle.com Wed Apr 18 17:22:59 2018 From: jini.george at oracle.com (Jini George) Date: Wed, 18 Apr 2018 22:52:59 +0530 Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS In-Reply-To: <0a257ea0-ec4f-3cdf-77d0-a19cbe3c992f@oracle.com> References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> <134bd4a8-9e65-73a2-df88-c66b70c37e60@oracle.com> <0a257ea0-ec4f-3cdf-77d0-a19cbe3c992f@oracle.com> Message-ID: <7a68951f-018f-7b31-3e4b-c1414118d510@oracle.com> Thank you very much for the reviews, Coleen and Ioi. I have filed the RFE. -Jini. On 4/17/2018 11:09 AM, Ioi Lam wrote: > The changes look good to me. > > I agree with Coleen. Maybe FileMapHeader should be moved to a common > file that can be included in all the ps_core files, as well as the VM's > filemap.hpp? > > Thanks > > - Ioi > > > On 4/13/18 8:35 AM, coleen.phillimore at oracle.com wrote: >> >> This change seems good. >> >> http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/ps_core.c.udiff.html >> >> >> It seems that you have three copies of this code.? Can you file an RFE >> to consolidate these? >> >> thanks, >> Coleen >> >> On 4/12/18 12:21 AM, Jini George wrote: >>> Ping: Gentle reminder ! >>> >>> Thanks, >>> Jini. >>> >>> On 4/6/2018 9:51 PM, Jini George wrote: >>>> Hello! >>>> >>>> Requesting reviews for: >>>> https://bugs.openjdk.java.net/browse/JDK-8174994 >>>> Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/ >>>> >>>> While trying to identify the type given an address, a >>>> WrongTypeException was getting thrown with various clhsdb commands >>>> (like printmdo, jstack, etc). This was since SA tries to map an >>>> address to a hotspot C++ type by comparing the vtable address to the >>>> vtable address values of known types. 
With CDS, since the vtables >>>> are copied over for the Metadata classes, the vtable addresses >>>> themselves don't match (though, of course, the contents will), and >>>> SA errors out. >>>> >>>> The fix has been implemented by making changes to read in the md >>>> region (consisting of the c++ vtables) of the CDS archive in SA, and >>>> mapping the vtable addresses to the corresponding metadata type >>>> (ConstantPool, InstanceKlass, InstanceClassLoaderKlass, >>>> InstanceMirrorKlass, InstanceRefKlass, Method, ObjArrayKlass, >>>> TypeArrayKlass). >>>> >>>> For corefiles, an additional modification has been done to have the >>>> replicated FileMapHeader structure (from >>>> src/hotspot/share/memory/filemap.hpp, which is replicated in SA in >>>> ps_core.c), to be in sync with the corresponding definition in >>>> src/hotspot/share/memory/filemap.hpp. >>>> >>>> Test cases to test both live and corefile debugging are being added >>>> with this. These and other SA tests pass on Mach5. >>>> >>>> Thanks, >>>> Jini. >> > From rafael.wth at gmail.com Wed Apr 18 20:28:27 2018 From: rafael.wth at gmail.com (Rafael Winterhalter) Date: Wed, 18 Apr 2018 22:28:27 +0200 Subject: Review Request JDK-8200559: Java agents doing instrumentation need a means to define auxiliary classes In-Reply-To: <460808aa-6eef-4eae-31b2-191e2fbaac79@oracle.com> References: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com> <460808aa-6eef-4eae-31b2-191e2fbaac79@oracle.com> Message-ID: Hei Mandy, Generally I agree with you that these two should be seperate but I think that there is a very blurry border between the two. Java agents are sometimes used to test something in a production environment and there is a long history of agent development that relies on the easy and general way of defining classes using sun.misc.Unsafe. I have looked through the agents I worked on and for the most, classes are defined in the same package. 
In a few cases, and this is true for most agents, there is however a need to define classes in other packages, too. In this context, things quickly get complex to manage. If I use a method handle lookup, I need to find a loaded class and potentially open its module. From within an agent, it is however way too easy to break class loading orders and to trigger premature loading, which makes this inconvenient. The same goes for retransforming a class to get hold of a suitable class definer instance for a given package. For all these reasons, I believe that given this approach, most people will simply fall back to jdk.internal.misc.Unsafe which can be opened too and which is much easier. Therefore, given the tradition of using this class and its more powerful and more performant approach to defining classes given the multitude of use cases, I believe that this API would not be used by most but people would continue to rely on jdk.internal.misc.Unsafe. At least, I think that I would go for this solution for most of my agents, also considering an easy and safe migration. For this reason, I still recommend adding a method Instrumentation::defineClass instead. Especially since a Java agent does have this privilege, there is no security constraint to uphold here and it offers the simplest solution that most agent developers hope for. Therefore, I do not think that the principle of least privilege should apply here. In contrast, if this API is not added I am afraid that many tools opening jdk.internal.misc might lead to exploits if this package is opened to an unnamed module, for example by APM or other production monitoring tools that typically need to define classes in packages in which no retransformation is applied. (The example I mentioned comes from a real use case in a very widely used APM tool.) Thank you again for considering my point of view on this!
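
The package restriction discussed in this thread can be made concrete with a small name check. This is a minimal Java sketch under stated assumptions: `RuntimePackageCheck` and its methods are hypothetical helpers written for illustration, not part of java.lang.instrument or of the proposed API, and the check compares package names only (a real runtime package also requires the same class loader). `Class.getPackageName()` is available from Java 9.

```java
/** Hypothetical helper: would a definer tied to one class's package accept a new class name? */
final class RuntimePackageCheck {

    /** Package part of a binary class name such as "a.Callback"; "" for the unnamed package. */
    static String packageOf(String binaryName) {
        int lastDot = binaryName.lastIndexOf('.');
        return lastDot < 0 ? "" : binaryName.substring(0, lastDot);
    }

    /** True if newClassName lies in the same package (by name) as the anchor class. */
    static boolean samePackage(Class<?> anchor, String newClassName) {
        return anchor.getPackageName().equals(packageOf(newClassName));
    }

    public static void main(String[] args) {
        // Anchor class in java.util, helper class in java.util: allowed by the name check.
        System.out.println(samePackage(java.util.ArrayList.class, "java.util.ExtraCallback"));
        // Anchor class in java.util, helper class in java.lang: rejected by the name check.
        System.out.println(samePackage(java.util.ArrayList.class, "java.lang.ExtraCallback"));
    }
}
```

The second call is the "callback class in package A while instrumenting package B" case: a definer bound to the transformed class cannot serve it, which is why the retransformation workaround or a general define method comes up.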
Best regards, Rafael 2018-04-18 9:46 GMT+02:00 mandy chung : > Hi Rafael, > > I think it's best to separate the testing requirement from java agents > doing instrumentation that may run in production environment. > > I have created a JBS issue to track the testing mode idea that would > require more discussion and investigation: > https://bugs.openjdk.java.net/browse/JDK-8201562 > > I understand it's not as efficient to inject a class in a package > different than the package of the transformed class. With the principle of > least privilege, I prefer not to provide an API to inject any class in any > package and you can achieve by calling retransformClasses. > > What do you think? > > Mandy > > On 4/18/18 5:23 AM, Rafael Winterhalter wrote: > > Hei Mandy, > Lookup::defineClass would always be an alternative but it would require me > to open the class first. If the instrumented type can read the module with > the callback but its module was not opened, this would not help me much, > unfortunately. Also, I could not resolve this lookup as the class in > question is not necessarily loaded at this point. > Best regards, Rafael > > 2018-04-17 9:28 GMT+02:00 mandy chung : > >> Hi Rafael, >> >> I see that mocking/proxying/testing framework should be looked at >> separately since its requirements and approaches can be different than tool >> agents. >> >> On 4/17/18 5:06 AM, Rafael Winterhalter wrote: >> >> Hei Mandy, >> >> I have looked into several Java agents that I have worked on and for many >> of them, this API does unfortunately not supply sufficient access. I would >> therefore still prefer a method Instrumentation::defineClass. >> >> The problem is that some agents need to define classes in other packages >> then in that of the instrumented class. For example, I might need to >> enhance a library that defines a set of callback classes in package A. All >> these classes share a common super class with a package-private >> constructor. 
I want to instrument some class in package B to use a callback >> that the library does not supply and need to add a new callback class to A. >> This is not possible using the current API. >> >> >> Are these callback classes made available statically? or just >> dynamically defining additional class as needed? Is Lookup::defineClass an >> alternative if you get a hold of common super class in A? >> >> I could however achieve do so by calling Instrumentation::retransform on >> one of the classes in A after registering a class file transformer. Once >> the retransformation is triggered, I can now define a class in A. Of course >> this is inefficient and I would rather open the jdk.internal.misc module >> and use the "old" API instead. >> >> For this reason, I argue that this rather restrained API is not >> convenient while it does not add anything to security. Also, for the use >> case of Mockito, this would neither be sufficient as Mockito sometimes >> redefines classes and sometimes adds a subclass without retransforming. We >> would rather have direct access to class definition once we are already >> running with the privileges of a Java agent. >> >> I would therefore suggest to add a method: >> >> interface Instrumentation { >> Class defineClass(byte[] bytes, ProtectionDomain pd); >> } >> >> which can be implemented simply by delegating to jdk.internal.misc.Unsafe. >> >> On a side note. Does JavaLangAccess::defineClass work with the bootstrap >> class loader? I have not tried it but I always thought it was just an >> access layer for the class loader API that cannot access the null value. >> >> >> The JVM entry point does allow null loader. >> >> Mandy >> >> >> Thanks for considering this use case! >> Best regards, Rafael >> >> 2018-04-15 8:23 GMT+02:00 mandy chung : >> >>> Background: >>> >>> Java agents support both load time and dynamic instrumentation. At >>> load time, >>> the agent's ClassFileTransformer is invoked to transform class bytes. 
>>> There is >>> no Class objects at this time. Dynamic instrumentation is when >>> redefineClasses >>> or retransformClasses is used to redefine an existing loaded class. The >>> ClassFileTransformer is invoked with class bytes where the Class object >>> is present. >>> >>> Java agent doing instrumentation needs a means to define auxiliary >>> classes >>> that are visible and accessible to the instrumented class. Existing >>> agents >>> have been using sun.misc.Unsafe::defineClass to define aux classes >>> directly >>> or accessing protected ClassLoader::defineClass method with >>> setAccessible to >>> suppress the language access check (see [1] where this issue was brought >>> up). >>> >>> Instrumentation::appendToBootstrapClassLoaderSearch and >>> appendToSystemClassLoaderSearch >>> APIs are existing means to supply additional classes. It's too limited >>> for example it can't inject a class in the same runtime package as the >>> class >>> being transformed. >>> >>> Proposal: >>> >>> This proposes to add a new ClassFileTransformer.transform method taking >>> additional ClassDefiner parameter. A transformer can define additional >>> classes during the transformation process, i.e. >>> when ClassFileTransformer::transform is invoked. Some details: >>> >>> 1. ClassDefiner::defineClass defines a class in the same runtime package >>> as the class being transformed. >>> 2. The class is defined in the same thread as the transformers are being >>> invoked. ClassDefiner::defineClass returns Class object directly >>> before the transformed class is defined. >>> 3. No transformation is applied to classes defined by >>> ClassDefiner::defineClass. >>> >>> The first prototype we did is to collect the auxiliary classes and define >>> them until all transformers are invoked and have these aux classes to go >>> through the transformation pipeline. 
Several complicated issues would
>>> need to be resolved, for example timing: whether the auxiliary classes should
>>> be defined before the transformed class (otherwise a potential race where
>>> some other thread references the transformed class and causes the code to
>>> execute that in turn references the auxiliary classes). The current
>>> implementation has a native reentrancy check that ensures one class is being
>>> transformed to avoid potential circularity issues. This may need JVM TI
>>> support to be reliable.
>>>
>>> This proposal would allow java agents to migrate from internal API and
>>> ClassDefiner to be enhanced in the future.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~mchung/jdk11/webrevs/8200559/webrev.00/
>>>
>>> Mandy
>>> [1] http://mail.openjdk.java.net/pipermail/jdk-dev/2018-January/000405.html
>>>
>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sangheon.kim at oracle.com Wed Apr 18 20:52:08 2018
From: sangheon.kim at oracle.com (sangheon.kim)
Date: Wed, 18 Apr 2018 13:52:08 -0700
Subject: RFR: 8196325: GarbageCollectionNotificationInfo has same information for before and after
In-Reply-To: <2b3f9754-b369-0ad3-609d-9691c7965736@oracle.com>
References: <56011de1-07c0-a575-1a14-7a05e01b7cbf@oracle.com>
 <2b3f9754-b369-0ad3-609d-9691c7965736@oracle.com>
Message-ID:

Hi Mandy,

On 04/18/2018 02:20 AM, mandy chung wrote:
> Hi Sangheon,
>
>
> On 4/18/18 12:41 PM, sangheon.kim wrote:
>>
>> CR: https://bugs.openjdk.java.net/browse/JDK-8196325
>> webrev: http://cr.openjdk.java.net/~sangheki/8196325/webrev.0/
>
> This is indeed a regression. GcInfoBuilder depends on the order of
> the pool name array.
>
> The change looks okay. I would suggest to use stream in the new
> getAllMemoryPoolNames() like this:
>
>     public static String[] getAllMemoryPoolNames() {
>         return Arrays.stream(MemoryImpl.getMemoryPools())
>                      .map(MemoryPoolMXBean::getName)
>                      .toArray(String[]::new);
>     }

Done.

>
>> Testing:
>> jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2,builds-tier1,
>> jdk_management, jdk_jmx
>>
>
> These test groups are good.

Okay.

Webrev:
http://cr.openjdk.java.net/~sangheki/8196325/webrev.1 (full)
http://cr.openjdk.java.net/~sangheki/8196325/webrev.1_to_0/ (inc)

Sangheon

>
> Mandy
>
>> Thanks,
>> Sangheon
>

From mandy.chung at oracle.com Wed Apr 18 23:57:21 2018
From: mandy.chung at oracle.com (mandy chung)
Date: Thu, 19 Apr 2018 07:57:21 +0800
Subject: RFR: 8196325: GarbageCollectionNotificationInfo has same information for before and after
In-Reply-To:
References: <56011de1-07c0-a575-1a14-7a05e01b7cbf@oracle.com>
 <2b3f9754-b369-0ad3-609d-9691c7965736@oracle.com>
Message-ID: <1ca3129d-eab4-27ad-a068-27b93b2f823f@oracle.com>

On 4/19/18 4:52 AM, sangheon.kim wrote:
>
>
> Webrev:
> http://cr.openjdk.java.net/~sangheki/8196325/webrev.1 (full)
> http://cr.openjdk.java.net/~sangheki/8196325/webrev.1_to_0/ (inc)

Looks good.

Mandy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mandy.chung at oracle.com Wed Apr 18 23:59:47 2018
From: mandy.chung at oracle.com (mandy chung)
Date: Thu, 19 Apr 2018 07:59:47 +0800
Subject: Review Request JDK-8200559: Java agents doing instrumentation need a means to define auxiliary classes
In-Reply-To: <460808aa-6eef-4eae-31b2-191e2fbaac79@oracle.com>
References: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com>
 <460808aa-6eef-4eae-31b2-191e2fbaac79@oracle.com>
Message-ID:

https://bugs.openjdk.java.net/browse/JDK-8201784

This is the correct JBS issue (Sorry I cut-n-paste the wrong link).

Mandy

On 4/18/18 3:46 PM, mandy chung wrote:
> Hi Rafael,
>
> I think it's best to separate the testing requirement from java agents
> doing instrumentation that may run in production environment.
> > I have created a JBS issue to track the testing mode idea that would > require more discussion and investigation: > https://bugs.openjdk.java.net/browse/JDK-8201562 > > I understand it's not as efficient to inject a class in a package > different than the package of the transformed class. With the > principle of least privilege, I prefer not to provide an API to inject > any class in any package and you can achieve by calling > retransformClasses. > > What do you think? > > Mandy > > On 4/18/18 5:23 AM, Rafael Winterhalter wrote: >> Hei Mandy, >> Lookup::defineClass would always be an alternative but it would >> require me to open the class first. If the instrumented type can read >> the module with the callback but its module was not opened, this >> would not help me much, unfortunately. Also, I could not resolve this >> lookup as the class in question is not necessarily loaded at this point. >> Best regards, Rafael >> >> 2018-04-17 9:28 GMT+02:00 mandy chung > >: >> >> Hi Rafael, >> >> I see that mocking/proxying/testing framework should be looked at >> separately since its requirements and approaches can be different >> than tool agents. >> >> On 4/17/18 5:06 AM, Rafael Winterhalter wrote: >>> Hei Mandy, >>> >>> I have looked into several Java agents that I have worked on and >>> for many of them, this API does unfortunately not supply >>> sufficient access. I would therefore still prefer a method >>> Instrumentation::defineClass. >>> >>> The problem is that some agents need to define classes in other >>> packages then in that of the instrumented class. For example, I >>> might need to enhance a library that defines a set of callback >>> classes in package A. All these classes share a common super >>> class with a package-private constructor. I want to instrument >>> some class in package B to use a callback that the library does >>> not supply and need to add a new callback class to A. This is >>> not possible using the current API. 
>>> >> >> Are these callback classes made available statically?? or just >> dynamically defining additional class as needed?? Is >> Lookup::defineClass an alternative if you get a hold of common >> super class in A? >> >>> I could however achieve do so by calling >>> Instrumentation::retransform on one of the classes in A after >>> registering a class file transformer. Once the retransformation >>> is triggered, I can now define a class in A. Of course this is >>> inefficient and I would rather open the jdk.internal.misc module >>> and use the "old" API instead. >>> >>> For this reason, I argue that this rather restrained API is not >>> convenient while it does not add anything to security. Also, for >>> the use case of Mockito, this would neither be sufficient as >>> Mockito sometimes redefines classes and sometimes adds a >>> subclass without retransforming. We would rather have direct >>> access to class definition once we are already running with the >>> privileges of a Java agent. >>> >>> I would therefore suggest to add a method: >>> >>> interface Instrumentation { >>> ? Class defineClass(byte[] bytes, ProtectionDomain pd); >>> } >>> >>> which can be implemented simply by delegating to >>> jdk.internal.misc.Unsafe. >>> >>> On a side note. Does JavaLangAccess::defineClass work with the >>> bootstrap class loader? I have not tried it but I always thought >>> it was just an access layer for the class loader API that cannot >>> access the null value. >>> >> >> The JVM entry point does allow null loader. >> >> Mandy >> >> >>> Thanks for considering this use case! >>> Best regards, Rafael >>> >>> 2018-04-15 8:23 GMT+02:00 mandy chung >> >: >>> >>> Background: >>> >>> Java agents support both load time and dynamic >>> instrumentation.?? At load time, >>> the agent's ClassFileTransformer is invoked to transform >>> class bytes.? There is >>> no Class objects at this time.? 
Dynamic instrumentation is >>> when redefineClasses >>> or retransformClasses is used to redefine an existing loaded >>> class.? The >>> ClassFileTransformer is invoked with class bytes where the >>> Class object is present. >>> >>> Java agent doing instrumentation needs a means to define >>> auxiliary classes >>> that are visible and accessible to the instrumented class.? >>> Existing agents >>> have been using sun.misc.Unsafe::defineClass to define aux >>> classes directly >>> or accessing protected ClassLoader::defineClass method with >>> setAccessible to >>> suppress the language access check (see [1] where this issue >>> was brought up). >>> >>> Instrumentation::appendToBootstrapClassLoaderSearch and >>> appendToSystemClassLoaderSearch >>> APIs are existing means to supply additional classes.? It's >>> too limited >>> for example it can't inject a class in the same runtime >>> package as the class >>> being transformed. >>> >>> Proposal: >>> >>> This proposes to add a new ClassFileTransformer.transform >>> method taking additional ClassDefiner parameter.? A >>> transformer can define additional >>> classes during the transformation process, i.e. >>> when ClassFileTransformer::transform is invoked. Some details: >>> >>> 1. ClassDefiner::defineClass defines a class in the same >>> runtime package >>> ?? as the class being transformed. >>> 2. The class is defined in the same thread as the >>> transformers are being >>> ?? invoked.?? ClassDefiner::defineClass returns Class object >>> directly >>> ?? before the transformed class is defined. >>> 3. No transformation is applied to classes defined by >>> ClassDefiner::defineClass. >>> >>> The first prototype we did is to collect the auxiliary >>> classes and define >>> them? until all transformers are invoked and have these aux >>> classes to go >>> through the transformation pipeline. 
Several complicated >>> issues would >>> need to be resolved for example timing whether the auxiliary >>> classes should >>> be defined before the transformed class (otherwise a >>> potential race where >>> some other thread references the transformed class and cause >>> the code to >>> execute that in turn reference the auxiliary classes.? The >>> current >>> implementation has a native reentrancy check that ensure one >>> class is being >>> transformed to avoid potential circularity issues.? This may >>> need JVM TI >>> support to be reliable. >>> >>> This proposal would allow java agents to migrate from >>> internal API and ClassDefiner to be enhanced in the future. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~mchung/jdk11/webrevs/8200559/webrev.00/ >>> >>> >>> Mandy >>> [1] >>> http://mail.openjdk.java.net/pipermail/jdk-dev/2018-January/000405.html >>> >>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew_m_leonard at uk.ibm.com Thu Apr 19 07:37:59 2018 From: andrew_m_leonard at uk.ibm.com (Andrew Leonard) Date: Thu, 19 Apr 2018 08:37:59 +0100 Subject: RFR: 8201409: JDWP debugger initialization hangs intermittently In-Reply-To: <0bcb768c-0a1a-289d-6d45-382a78f43d97@oracle.com> References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com> <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com> <41d0c829-2189-90a4-3299-7dfdbe336e8b@oracle.com> <0bcb768c-0a1a-289d-6d45-382a78f43d97@oracle.com> Message-ID: No Problem, thanks Serguei, let me know if I can help in any way. 
Thanks Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: Andrew Leonard Cc: daniel.daugherty at oracle.com, serviceability-dev at openjdk.java.net Date: 18/04/2018 17:56 Subject: Re: RFR: 8201409: JDWP debugger initialization hangs intermittently Hi Andrew, Sorry, I did not reply earlier. The fix need more testing. We also have some important tests in closed. I'll run them but I'm a little bit busy at the moment. You have two reviews which is enough for push after testing. Thanks, Serguei On 4/18/18 08:23, Andrew Leonard wrote: Hi Serguei, Do you need me to try anything else for this review? hotspot/jtreg/serviceability suite run successfully. Many Thanks Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: daniel.daugherty at oracle.com, Andrew Leonard Cc: serviceability-dev at openjdk.java.net Date: 16/04/2018 07:10 Subject: Re: RFR: 8201409: JDWP debugger initialization hangs intermittently On 4/15/18 10:01, Daniel D. Daugherty wrote: On 4/13/18 3:07 PM, serguei.spitsyn at oracle.com wrote: Andrew and reviewers, I'm re-sending this RFR with a corrected subject that includes the bug number. The issues is: https://bugs.openjdk.java.net/browse/JDK-8201409 Webrev: http://cr.openjdk.java.net/~sspitsyn/webrevs/2018/8201409-jdwp-initsync.ibm.1/ src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c No comments. src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h No comments. src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c So now pauses in debugLoop_run() before the loop that reads cmds. Looks good. 
src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
    So the VM_INIT event handler now signals that we have received the
    VM_INIT event, which allows debugLoop_run() to proceed.

Serguei, this fix needs to have most of the Serviceability stack of tests
run against it (jdwp, JVM/TI, JDI and jdb tests). Based on the email
thread, I can't tell which tests have been run with the fix in place.

Hi Dan,

I'm going to sponsor this fix and will run all the debugger tests.
Sorry that I did not announce it yet.

Thanks,
Serguei

Dan

The fix looks good to me.
Also, I've agreed to skip a unit test, as creating one for this issue is
not easy.
At least one more review is needed before the fix can be pushed.

Thanks,
Serguei

On 4/11/18 06:33, Andrew Leonard wrote:

Hi Serguei,
Thank you for raising the bug. I had a chat with one of my colleagues who
could recreate it, and it's probably related to the handshaking that is
done in the particular scenario. So with the JCK harness:

com.sun.jck.lib.ExecJCKTestOtherJVMCmd
  LD_LIBRARY_PATH=/javatest/lib/jck/jck8b/natives/linux_x86-64
  /projects/jck/jdwp/j2sdk-image/bin/java
  -Xdump:system:none
  -Xdump:system:events=gpf+abort+traceassert+corruptcache
  -Xdump:snap:none
  -Xdump:snap:events=gpf+abort+traceassert+corruptcache
  -Xdump:java:none
  -Xdump:java:events=gpf+abort+traceassert+corruptcache
  -Xdump:heap:none
  -Xdump:heap:events=gpf+abort+traceassert+corruptcache
  -Xfuture
  -agentlib:jdwp=server=y,transport=dt_socket,address=localhost:35000,suspend=y
  -classpath /javatest/lib/jck/JCK8b-b03/JCK-runtime-8b/classes
  -Djava.security.policy=/javatest/lib/jck/JCK8b-b03/JCK-runtime-8b/lib/jck.policy
  javasoft.sqe.jck.lib.jpda.jdwp.DebuggeeLoader
  -waittime=600
  -msgSwitch=ub1604x64vm10:38636
  -componentName=ArrayReference.GetValues.getvalues002

Note that the JCK test harness starts the target process, attaches to it,
and sends the resume command in a very short time with no handshaking.
That may not help, but hopefully helps explain things a bit?
It's the timing of the resume command during the test that is crucial:
resuming before the VM initialization is complete will trigger it.

Thanks
Andrew

From: "serguei.spitsyn at oracle.com"
To: Andrew Leonard
Cc: serviceability-dev at openjdk.java.net
Date: 11/04/2018 09:57
Subject: Re: RFR: Fix race condition in jdwp

Hi Andrew,

I've filed the bug:
https://bugs.openjdk.java.net/browse/JDK-8201409

Also, this is a webrev with your patch:
http://cr.openjdk.java.net/~sspitsyn/webrevs/2018/8201409-jdwp-initsync.ibm.1/

I agree that creating a standalone test is tricky here.
I've added usleep(10000) into eventHelper_reportVMInit() and ran the
JTreg com/sun/jdi tests with my JDK build.
However, none of the tests failed with the failure mode you described,
so I'm a little bit puzzled.
I suspect that some specific debugLoop commands were used in your scenario.
It is still possible that I've missed something here.
Will try to double check everything.

Thanks,
Serguei

On 4/11/18 01:29, Andrew Leonard wrote:

Thanks Serguei,
In terms of a standalone testcase, it is quite tricky: due to the nature
of the issue, which took a lot of investigation to solve, it's very timing
dependent and will only occur randomly. It can be forced, as I indicated
below, by adding a "sleep" in the VMInit report code, but that's not a
testcase. The issue was originally found in our JCK testing for IBMJava8,
testcase test.jck8b.runtime.vm.jdwp, but again it only happened
intermittently. As with "performance" type issues, we're not always going
to be able to create a testcase that will always "fail" if the fix is not
present.
Your thoughts?
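
For readers following the thread, the handshake that the patch introduces
can be illustrated with a minimal Java sketch. This is not the agent code:
the class InitGateSketch, its field names and the runOnce() helper are
hypothetical, and plain Java monitors are assumed here as a stand-in for
the raw monitor (initMonitor) used in libjdwp. The delay parameter plays
the role of the "sleep" mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InitGateSketch {
    // Hypothetical stand-ins for initMonitor and VMInitComplete in the patch.
    private final Object initMonitor = new Object();
    private boolean vmInitComplete = false;

    /** Counterpart of signalVMInitComplete(): set the flag and wake all waiters. */
    public void signalVMInitComplete() {
        synchronized (initMonitor) {
            vmInitComplete = true;
            initMonitor.notifyAll();
        }
    }

    /** Counterpart of debugInit_waitVMInitComplete(): block until the flag is set. */
    public void waitVMInitComplete() throws InterruptedException {
        synchronized (initMonitor) {
            while (!vmInitComplete) {   // loop guards against spurious wakeups
                initMonitor.wait();
            }
        }
    }

    /** Runs a slow "event helper" against a "listener" and returns the order of events. */
    public static List<String> runOnce(long initDelayMillis) {
        InitGateSketch gate = new InitGateSketch();
        List<String> events = Collections.synchronizedList(new ArrayList<>());

        Thread eventHelper = new Thread(() -> {
            try {
                Thread.sleep(initDelayMillis);   // stands in for the forced delay in VM init
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            events.add("init: threads suspended");
            gate.signalVMInitComplete();         // signal only after the suspend step
        });

        Thread listener = new Thread(() -> {
            try {
                gate.waitVMInitComplete();       // without this wait, the resume could run first
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            events.add("listener: resume processed");
        });

        listener.start();
        eventHelper.start();
        try {
            listener.join();
            eventHelper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(runOnce(200));
    }
}
```

With the wait in place the listener records its event after the helper's,
however long the helper is delayed. Removing the waitVMInitComplete() call
from the listener lets the two events race, which is the ordering problem
the thread describes.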
Cheers
Andrew

From: "serguei.spitsyn at oracle.com"
To: Andrew Leonard
Cc: serviceability-dev at openjdk.java.net
Date: 11/04/2018 01:02
Subject: Re: RFR: Fix race condition in jdwp

Hi Andrew,

Okay, I'll file a bug on this topic.
But do you have a standalone test demonstrating this issue?

Thanks,
Serguei

On 4/10/18 06:23, Andrew Leonard wrote:

Hi Serguei,
I don't have access to the bug database to raise one; are you able to,
please?

Summary: JDWP debugger initialization hangs intermittently
Description: If, during the JDWP setup initialization, the VM
initialization takes slightly longer than the main debug initialization
thread, a "hang" situation can occur. This has been seen in testcase
test.jck8b.runtime.vm.jdwp and can also be recreated easily by adding a
10 second sleep to the beginning of the
src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c method
eventHelper_reportVMInit().
First seen: JDK8
Recreated: JDK11

Thanks
Andrew

From: "serguei.spitsyn at oracle.com"
To: Andrew Leonard, serviceability-dev at openjdk.java.net
Date: 09/04/2018 23:03
Subject: Re: RFR: Fix race condition in jdwp

Hi Andrew,

The patch itself looks reasonable.
However, in order to proceed with it, a bug report with a standalone
test case demonstrating the issue is needed.

Thanks,
Serguei

On 4/9/18 09:07, Andrew Leonard wrote:
> Hi,
> We discovered in our testing with OpenJ9 that a race condition can
> occur in the jdwp under certain circumstances, and we were able to
> force the same issue with Hotspot. Normally, the event helper thread
> suspends all threads, then the debug loop in the listener thread
> receives a command to resume.
> The debugger may deadlock if the debug loop in the listener thread
> starts processing commands (e.g. resume threads) before the event
> helper completes the initialization (and suspends threads).
>
> This patch adds synchronization to ensure the event helper completes
> the initialization sequence before debugger commands are processed.
>
> Please can I find a sponsor for this contribution? Patch below.
>
> Many thanks
>
> Andrew
>
> diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
> --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
> +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -58,6 +58,7 @@
>  static jboolean vmInitialized;
>  static jrawMonitorID initMonitor;
>  static jboolean initComplete;
> +static jboolean VMInitComplete;
>  static jbyte currentSessionID;
>
>  /*
> @@ -617,6 +618,35 @@
>      debugMonitorExit(initMonitor);
>  }
>
> +/*
> + * Signal VM initialization is complete.
> + */
> +void
> +signalVMInitComplete(void)
> +{
> +    /*
> +     * VM Initialization is complete
> +     */
> +    LOG_MISC(("signal VM initialization complete"));
> +    debugMonitorEnter(initMonitor);
> +    VMInitComplete = JNI_TRUE;
> +    debugMonitorNotifyAll(initMonitor);
> +    debugMonitorExit(initMonitor);
> +}
> +
> +/*
> + * Wait for VM initialization to complete.
> + */
> +void
> +debugInit_waitVMInitComplete(void)
> +{
> +    debugMonitorEnter(initMonitor);
> +    while (!VMInitComplete) {
> +        debugMonitorWait(initMonitor);
> +    }
> +    debugMonitorExit(initMonitor);
> +}
> +
>  /* All process exit() calls come from here */
>  void
>  forceExit(int exit_code)
> @@ -672,6 +702,7 @@
>      LOG_MISC(("Begin initialize()"));
>      currentSessionID = 0;
>      initComplete = JNI_FALSE;
> +    VMInitComplete = JNI_FALSE;
>
>      if ( gdata->vmDead ) {
>          EXIT_ERROR(AGENT_ERROR_INTERNAL,"VM dead at initialize() time");
> diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
> --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
> +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 1998, 2015, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -39,4 +39,7 @@
>  void debugInit_exit(jvmtiError, const char *);
>  void forceExit(int);
>
> +void debugInit_waitVMInitComplete(void);
> +void signalVMInitComplete(void);
> +
>  #endif
> diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
> --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
> +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -98,6 +98,7 @@
>      standardHandlers_onConnect();
>      threadControl_onConnect();
>
> +    debugInit_waitVMInitComplete();
>      /* Okay, start reading cmds! */
>      while (shouldListen) {
>          if (!dequeue(&p)) {
> diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
> --- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
> +++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
>   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>   *
>   * This code is free software; you can redistribute it and/or modify it
> @@ -580,6 +580,7 @@
>          (void)threadControl_suspendThread(command->thread, JNI_FALSE);
>      }
>
> +    signalVMInitComplete();
>      outStream_initCommand(&out, uniqueID(), 0x0,
>                            JDWP_COMMAND_SET(Event),
>                            JDWP_COMMAND(Event, Composite));
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

From serguei.spitsyn at oracle.com Sat Apr 21 01:29:55 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Fri, 20 Apr 2018 18:29:55 -0700
Subject: FW: 4698670: JDI spec?: some methods don't throw ClassNotLoadedException for non-loaded types
In-Reply-To: <09c1a02d-c028-4e05-7ba8-35cc011dbe71@oracle.com>
References: <08E8DEA5-86A8-4AF9-AC31-98988B365CEC@oracle.com>
 <28FD26AF-59D2-427B-BA04-B8601E67DA14@oracle.com>
 <09c1a02d-c028-4e05-7ba8-35cc011dbe71@oracle.com>
Message-ID: <14b44422-eab8-6b08-4d80-c083dde20b0e@oracle.com>

Hi Daniil,

I have the same question, and also a suggestion to remove the referenced
debuggee fields dummyF and finDummyF, as they have no other uses.

Also, a minor suggestion is to reorder the parameters of the
MockReferenceType constructor to match the createObjectReference() method.
It looks more natural to me.

Hi Jerry,
We have to reply in the open.
I also tend to make this mistake.
Let's fix each other.

I've added the svc-dev list back.

Thanks,
Serguei

On 4/20/18 18:10, Gerald Thornbrugh wrote:
> Hi Daniil,
>
> Your changes look good but I have a small nit.
> Since the code no longer uses the DEBUGGEE_METHODS[i][1] elements in the
> array due to your change, should they be removed?
>
> Thanks for doing this!
>
> Jerry
>
> On 04/20/2018 04:47 PM, Daniil Titov wrote:
>> Hello,
>>
>> Could you please help with reviewing this change? I need 2 reviewers.
>>
>> Thanks!
>>
>> Best regards,
>> Daniil
>>
>> On 4/19/18, 4:52 PM, "Daniil Titov" wrote:
>>
>>     Please review the change that affects two tests:
>>
>>       - fromTonga/nsk/jdi/ClassType/invokeMethod/invokemethod009/TestDescription.java
>>       - fromTonga/nsk/jdi/ObjectReference/invokeMethod/invokemethod006/TestDescription.java
>>
>>     Both tests try to emulate the case when the method with a
>>     parameter is invoked in the target VM via JDWP while the class
>>     corresponding to the parameter type is not yet loaded. The problem
>>     with these tests is that when they invoke the method in the target VM
>>     they pass null as an argument, and in the case of null the
>>     com.sun.tools.jdi.ValueImpl.prepareForAssignment(Value, ValueContainer)
>>     method does a quick return without any additional check.
>>
>>         static ValueImpl prepareForAssignment(Value value,
>>                                               ValueContainer destination)
>>                         throws InvalidTypeException, ClassNotLoadedException {
>>             if (value == null) {
>>                 /*
>>                  * TO DO: Centralize JNI signature knowledge
>>                  */
>>                 if (destination.signature().length() == 1) {
>>                     throw new InvalidTypeException("Can't set a primitive type to null");
>>                 }
>>                 return null;    // no further checking or conversion necessary
>>             } else {
>>                 return ((ValueImpl)value).prepareForAssignmentTo(destination);
>>             }
>>         }
>>
>>     This behavior is consistent with the code existing in
>>     open/src/jdk.jdi/share/classes/com/sun/tools/jdi/ObjectReferenceImpl.java,
>>     open/src/jdk.jdi/share/classes/com/sun/tools/jdi/ClassTypeImpl.java,
>>     and
>>     open/src/jdk.jdi/share/classes/com/sun/tools/jdi/ArrayReferenceImpl.java
>>
>>             } catch (ClassNotLoadedException e) {
>>                 /*
>>                  * Since we got this exception,
>>                  * the field type must be a reference type. The value
>>                  * we're trying to set is null, but if the field's
>>                  * class has not yet been loaded through the enclosing
>>                  * class loader, then setting to null is essentially a
>>                  * no-op, and we should allow it without an exception.
>>                  */
>>                 if (value != null) {
>>                     throw e;
>>                 }
>>             }
>>         }
>>
>>     The fix corrects the tests to construct a mock value object
>>     with a required type and, instead of null, pass this mock value as an
>>     argument in the method invocation in the target VM.
>>
>>     Bug: https://bugs.openjdk.java.net/browse/JDK-4698670
>>     Webrev: http://javaweb.us.oracle.com/~datitov/4698670/webrev.01
>>
>>     Best regards,
>>     Daniil

From daniil.x.titov at oracle.com Mon Apr 23 04:56:43 2018
From: daniil.x.titov at oracle.com (daniil.x.titov at oracle.com)
Date: Sun, 22 Apr 2018 21:56:43 -0700
Subject: FW: 4698670: JDI spec?: some methods don't throw ClassNotLoadedException for non-loaded types
In-Reply-To: <14b44422-eab8-6b08-4d80-c083dde20b0e@oracle.com>
References: <08E8DEA5-86A8-4AF9-AC31-98988B365CEC@oracle.com>
 <28FD26AF-59D2-427B-BA04-B8601E67DA14@oracle.com>
 <09c1a02d-c028-4e05-7ba8-35cc011dbe71@oracle.com>
 <14b44422-eab8-6b08-4d80-c083dde20b0e@oracle.com>
Message-ID: <7d2a0f6e-4292-c542-d363-38e3984cd9f8@oracle.com>

Thank you, Serguei and Jerry,

I have a new version of the webrev, however I cannot upload it to
javaweb.us.oracle.com. Moreover, all previous webrevs uploaded there are
no longer available (not only mine; Mikael's webrev, from the email I
attached just in case, is affected as well). Not sure if this is part of
some weekend maintenance or something else.

Could you please advise what other internal resource could be used to
publish a closed webrev?

Thanks!

Best regards,
Daniil

-------------- next part --------------
A non-text attachment was scrubbed...
Name: llniibcbnebkcpja.png
Type: image/png
Size: 111721 bytes
Desc: not available
URL:
-------------- next part --------------
An embedded message was scrubbed...
From: Mikael Vidstedt
Subject: RFR: 8202053: Address warnings in former tonga tests when building with VS2017
Date: Thu, 19 Apr 2018 16:18:32 -0700
Size: 2349
URL:

From harsha.wardhana.b at oracle.com Mon Apr 23 05:20:29 2018
From: harsha.wardhana.b at oracle.com (Harsha Wardhana B)
Date: Mon, 23 Apr 2018 10:50:29 +0530
Subject: RFR: JDK-8187498: Add a -Xmanagement flag as syntactic sugar for -Dcom.sun.management.jmxremote.* properties
In-Reply-To: <31d45eaa-b5cb-7cd7-5311-6eaa70afe914@Oracle.com>
References: <0fca13c3-1b8f-947c-9041-bdea87b04ba8@oracle.com>
 <5A687F75.3050404@oracle.com>
 <05c2e5af-b9d0-0386-4565-ade1179a1d49@oracle.com>
 <7a18b9d5-3a59-8132-2c4f-ba5de35bfa1d@oracle.com>
 <0e019151-ef48-b4f4-d253-b9da62aa07c3@oracle.com>
 <248276b1-1589-561f-0718-7bcb1fd578c7@oracle.com>
 <40cf194e-4d74-3a34-5eb2-1acc5a5abafb@oracle.com>
 <87cb309b-8764-193a-0447-8ecb741d308d@oracle.com>
 <66ad342e-f41f-c01b-e515-8100d108e6ef@oracle.com>
 <0a13fe7c-b7b0-e5d9-1361-bfea19073574@oracle.com>
 <0332f3ad-1329-da9a-c6a7-192064fb04bb@oracle.com>
 <9ac62f66-c14d-951f-1a14-ccf8906ecab4@oracle.com>
 <5A8C33B6.4040206@oracle.com>
 <7bf813ed-9390-f9e1-b5df-32a4585c2be3@oracle.com>
 <31d45eaa-b5cb-7cd7-5311-6eaa70afe914@Oracle.com>
Message-ID: <4b0155dc-c5b6-9b19-0afc-3bb83d00ce2a@oracle.com>

Hi All,

After internal discussions, many of the concerns below were addressed and
the final spec is published at:

https://bugs.openjdk.java.net/browse/JDK-8199584

Below is the implementation of the above spec.

http://cr.openjdk.java.net/~hb/8187498/webrev.05/

Please review and provide feedback.

Thanks
Harsha

On Wednesday 21 February 2018 08:33 PM, Roger Riggs wrote:
> Hi,
>
> I'm a bit leery of command line arguments being special cased and the
> corresponding custom mappings to system properties. The convenience is
> fine but we need to keep the handling out of native code so it is easier
> to maintain. We don't have a Java API for processing (VM) command line
> arguments so they are being shoehorned into properties.
>
> $.02, Roger
>
> On 2/21/2018 12:55 AM, Harsha Wardhana B wrote:
>>
>> On Wednesday 21 February 2018 01:51 AM, mandy chung wrote:
>>> The code review and CSR review can be in parallel. For this case,
>>> I agree with Kumar to have CSR written that would help the code
>>> review. Please specify the behavior and its relationship with
>>> jcmd and other relevant diagnosability tools.
>> ok.
>>>
>>> On 2/20/18 6:41 AM, Kumar Srinivasan wrote:
>>>>
>>>>>>> What is the behavior when
>>>>>>> -Dcom.sun.management.jmxremote.port=1234
>>>>>>> --start-management-agent port=2345
>>>>>>> -Dcom.sun.management.jmxremote.port=3456?
>>>>>>>
>>>>>>> What is the value of the system property
>>>>>>> com.sun.management.jmxremote.port at runtime? What port number
>>>>>>> does the management server start with?
>>>>>> As said earlier, values set via new flags override values set by
>>>>>> -D flags. So 2345 will be the value of
>>>>>> com.sun.management.jmxremote.port. Added a test case to validate
>>>>>> that.
>>>
>>> VM options are the last one wins if same option specified multiple
>>> times. In this case, it could cause confusion (the last -D option
>>> sets the value to 3456 but it's set to a different value).
>>>
>>> Why not take the simplest approach - when --start-management-agent
>>> is set, it does not accept mixing the old way (i.e. does not accept
>>> the management properties to be set via -D)? This RFE is to make
>>> the command-line simpler and ease-of-use. I don't see any downside
>>> to migrate entirely to the new form.
>> We cannot get rid of specifying options via -D. We have plenty of -D
>> flags but very few have short-hand alternative via
>> --start-management-agent. If management properties are specified by
>> --start-management-agent, the options specified by -D are anyway
>> overwritten if specified.
>>>
>>> Mandy
>> Harsha

From matthias.baesken at sap.com Mon Apr 23 12:01:14 2018
From: matthias.baesken at sap.com (Baesken, Matthias)
Date: Mon, 23 Apr 2018 12:01:14 +0000
Subject: INCLUDE_SA/serviceability agent - support on s390x
Message-ID: <6e27de41f1eb40ee93b1a44a5d69d775@sap.com>

Hello, as far as I know the serviceability agent is not supported on
linux s390x.
However (unlike on aix, where it is also not supported), INCLUDE_SA=false
is not set in the central configure m4 files.
Should we set it (suggested diff below)?

Best regards, Matthias

hg diff
diff -r fcd5df7aa235 make/autoconf/jdk-options.m4
--- a/make/autoconf/jdk-options.m4 Wed Apr 18 11:19:32 2018 +0200
+++ b/make/autoconf/jdk-options.m4 Mon Apr 23 13:46:17 2018 +0200
@@ -238,6 +238,9 @@
   if test "x$OPENJDK_TARGET_OS" = xaix ; then
     INCLUDE_SA=false
   fi
+  if test "x$OPENJDK_TARGET_CPU" = xs390x ; then
+    INCLUDE_SA=false
+  fi
   AC_SUBST(INCLUDE_SA)

 # Compress jars

Best regards, Matthias

From Alan.Bateman at oracle.com Mon Apr 23 13:01:24 2018
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Mon, 23 Apr 2018 14:01:24 +0100
Subject: Review Request JDK-8200559: Java agents doing instrumentation need a means to define auxiliary classes
In-Reply-To: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com>
References: <78719929-5cc1-7770-c7bf-107bef65e57e@oracle.com>
Message-ID:

On 15/04/2018 07:23, mandy chung wrote:
>
> This proposal would allow java agents to migrate from
> internal API and ClassDefiner to be enhanced in the future.
>
> Webrev:
> http://cr.openjdk.java.net/~mchung/jdk11/webrevs/8200559/webrev.00/

I went through the proposed spec/API addition and it looks good.

The original instrumentation envisaged in JSR 163 was to support data
collection by tools (profilers, tracing, etc.)
and it's not unreasonable that such instrumentation would make use of
(ideally non-public) helper classes that the agent defines into the same
run-time package as the instrumented class.

The reentrancy issues with transform implementations loading or defining
new classes are very tricky, so I think the proposal to not send the
auxiliary classes through the transformer pipeline is pragmatic. It means
a small inconsistency with JVM TI but I don't think this matters too much.

A minor suggestion is to replace "The transformers can ..." with
"Transformers may". The rest of the wording looks good.

-Alan

From erik.joelsson at oracle.com Mon Apr 23 15:43:08 2018
From: erik.joelsson at oracle.com (Erik Joelsson)
Date: Mon, 23 Apr 2018 08:43:08 -0700
Subject: INCLUDE_SA/serviceability agent - support on s390x
In-Reply-To: <6e27de41f1eb40ee93b1a44a5d69d775@sap.com>
References: <6e27de41f1eb40ee93b1a44a5d69d775@sap.com>
Message-ID: <77a91681-c2a2-ca0d-9585-cd9ef145ac06@oracle.com>

Makes sense to me. Looks good.

/Erik

From sangheon.kim at oracle.com Mon Apr 23 16:52:51 2018
From: sangheon.kim at oracle.com (sangheon.kim)
Date: Mon, 23 Apr 2018 09:52:51 -0700
Subject: RFR: 8196325: GarbageCollectionNotificationInfo has same information for before and after
In-Reply-To: <1ca3129d-eab4-27ad-a068-27b93b2f823f@oracle.com>
References: <56011de1-07c0-a575-1a14-7a05e01b7cbf@oracle.com>
 <2b3f9754-b369-0ad3-609d-9691c7965736@oracle.com>
 <1ca3129d-eab4-27ad-a068-27b93b2f823f@oracle.com>
Message-ID:

Hi all,

Can I have a second reviewer please?

Thanks,
Sangheon

On 04/18/2018 04:57 PM, mandy chung wrote:
>
> On 4/19/18 4:52 AM, sangheon.kim wrote:
>>
>> Webrev:
>> http://cr.openjdk.java.net/~sangheki/8196325/webrev.1 (full)
>> http://cr.openjdk.java.net/~sangheki/8196325/webrev.1_to_0/ (inc)
>
> Looks good.
>
> Mandy

From chris.plummer at oracle.com Mon Apr 23 20:25:16 2018
From: chris.plummer at oracle.com (Chris Plummer)
Date: Mon, 23 Apr 2018 13:25:16 -0700
Subject: RFR(XS): JDK-8202155: quarantine test com/sun/jdi/JdbExprTest.sh on all platforms
Message-ID: <4769d4cc-45ff-6781-577b-5daacb91751d@oracle.com>

Hello,

Please review the following:

https://bugs.openjdk.java.net/browse/JDK-8202155

------------------------------------------
diff --git a/test/jdk/ProblemList.txt b/test/jdk/ProblemList.txt
--- a/test/jdk/ProblemList.txt
+++ b/test/jdk/ProblemList.txt
@@ -738,6 +738,8 @@

 com/sun/jdi/RedefineImplementor.sh 8004127 generic-all

+com/sun/jdi/JdbExprTest.sh 8185803 generic-all
+
 com/sun/jdi/JdbMethodExitTest.sh 8171483 generic-all

 com/sun/jdi/RepStep.java 8043571 generic-all
------------------------------------------

https://bugs.openjdk.java.net/browse/JDK-8185803 was filed last year for
this failure, but it has rarely turned up since. However, starting late
last week it seems to fail on every run. Not too sure of the reason why,
but it should be quarantined until it is figured out.

I plan to push this as a trivial change.
thanks, Chris From david.holmes at oracle.com Mon Apr 23 22:01:24 2018 From: david.holmes at oracle.com (David Holmes) Date: Tue, 24 Apr 2018 08:01:24 +1000 Subject: RFR(XS): JDK-8202155: quarantine test com/sun/jdi/JdbExprTest.sh on all platforms In-Reply-To: <4769d4cc-45ff-6781-577b-5daacb91751d@oracle.com> References: <4769d4cc-45ff-6781-577b-5daacb91751d@oracle.com> Message-ID: <7233feed-e87f-d27c-77c4-5056ea73e458@oracle.com> Looks good. Thanks, David On 24/04/2018 6:25 AM, Chris Plummer wrote: > Hello, > > Please review the following: > > https://bugs.openjdk.java.net/browse/JDK-8202155 > > ------------------------------------------ > diff --git a/test/jdk/ProblemList.txt b/test/jdk/ProblemList.txt > --- a/test/jdk/ProblemList.txt > +++ b/test/jdk/ProblemList.txt > @@ -738,6 +738,8 @@ > > ?com/sun/jdi/RedefineImplementor.sh 8004127 generic-all > > +com/sun/jdi/JdbExprTest.sh 8185803 generic-all > + > ?com/sun/jdi/JdbMethodExitTest.sh 8171483 generic-all > > ?com/sun/jdi/RepStep.java 8043571 generic-all > ------------------------------------------ > > https://bugs.openjdk.java.net/browse/JDK-8185803 was filed last year for > this failure, but it has rarely turned up since. However, starting late > last week it seems to fail on every run. Not too sure of the reason why, > but it should be quarantined until it is figured out. > > I plan to push this as a trivial change. > > thanks, > > Chris > From chris.plummer at oracle.com Mon Apr 23 22:14:41 2018 From: chris.plummer at oracle.com (Chris Plummer) Date: Mon, 23 Apr 2018 15:14:41 -0700 Subject: RFR(XS): JDK-8202155: quarantine test com/sun/jdi/JdbExprTest.sh on all platforms In-Reply-To: <7233feed-e87f-d27c-77c4-5056ea73e458@oracle.com> References: <4769d4cc-45ff-6781-577b-5daacb91751d@oracle.com> <7233feed-e87f-d27c-77c4-5056ea73e458@oracle.com> Message-ID: Thanks! On 4/23/18 3:01 PM, David Holmes wrote: > Looks good. 
>
> Thanks,
> David
>
> On 24/04/2018 6:25 AM, Chris Plummer wrote:
>> Hello,
>>
>> Please review the following:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8202155
>>
>> ------------------------------------------
>> diff --git a/test/jdk/ProblemList.txt b/test/jdk/ProblemList.txt
>> --- a/test/jdk/ProblemList.txt
>> +++ b/test/jdk/ProblemList.txt
>> @@ -738,6 +738,8 @@
>>
>>  com/sun/jdi/RedefineImplementor.sh 8004127 generic-all
>>
>> +com/sun/jdi/JdbExprTest.sh 8185803 generic-all
>> +
>>  com/sun/jdi/JdbMethodExitTest.sh 8171483 generic-all
>>
>>  com/sun/jdi/RepStep.java 8043571 generic-all
>> ------------------------------------------
>>
>> https://bugs.openjdk.java.net/browse/JDK-8185803 was filed last year
>> for this failure, but it has rarely turned up since. However,
>> starting late last week it seems to fail on every run. Not too sure
>> of the reason why, but it should be quarantined until it is figured out.
>>
>> I plan to push this as a trivial change.
>>
>> thanks,
>>
>> Chris
>>

From serguei.spitsyn at oracle.com Mon Apr 23 22:27:40 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 23 Apr 2018 15:27:40 -0700
Subject: RFR(XS): JDK-8202155: quarantine test com/sun/jdi/JdbExprTest.sh on all platforms
In-Reply-To: <7233feed-e87f-d27c-77c4-5056ea73e458@oracle.com>
References: <4769d4cc-45ff-6781-577b-5daacb91751d@oracle.com> <7233feed-e87f-d27c-77c4-5056ea73e458@oracle.com>
Message-ID: <4d4faf5e-3e0f-01e2-7540-3a6c3c0e6f07@oracle.com>

+1

Thanks,
Serguei

On 4/23/18 15:01, David Holmes wrote:
> Looks good.
>
> Thanks,
> David
>
> On 24/04/2018 6:25 AM, Chris Plummer wrote:
>> Hello,
>>
>> Please review the following:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8202155
>>
>> ------------------------------------------
>> diff --git a/test/jdk/ProblemList.txt b/test/jdk/ProblemList.txt
>> --- a/test/jdk/ProblemList.txt
>> +++ b/test/jdk/ProblemList.txt
>> @@ -738,6 +738,8 @@
>>
>>  com/sun/jdi/RedefineImplementor.sh 8004127 generic-all
>>
>> +com/sun/jdi/JdbExprTest.sh 8185803 generic-all
>> +
>>  com/sun/jdi/JdbMethodExitTest.sh 8171483 generic-all
>>
>>  com/sun/jdi/RepStep.java 8043571 generic-all
>> ------------------------------------------
>>
>> https://bugs.openjdk.java.net/browse/JDK-8185803 was filed last year
>> for this failure, but it has rarely turned up since. However,
>> starting late last week it seems to fail on every run. Not too sure
>> of the reason why, but it should be quarantined until it is figured out.
>>
>> I plan to push this as a trivial change.
>>
>> thanks,
>>
>> Chris
>>

From serguei.spitsyn at oracle.com Mon Apr 23 23:08:55 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 23 Apr 2018 16:08:55 -0700
Subject: RFR: 8201409: JDWP debugger initialization hangs intermittently
In-Reply-To: <0bcb768c-0a1a-289d-6d45-382a78f43d97@oracle.com>
References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com> <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com> <41d0c829-2189-90a4-3299-7dfdbe336e8b@oracle.com> <0bcb768c-0a1a-289d-6d45-382a78f43d97@oracle.com>
Message-ID:

An HTML attachment was scrubbed...
URL: From serguei.spitsyn at oracle.com Tue Apr 24 00:02:57 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 23 Apr 2018 17:02:57 -0700 Subject: RFR: 8201409: JDWP debugger initialization hangs intermittently In-Reply-To: References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com> <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com> <41d0c829-2189-90a4-3299-7dfdbe336e8b@oracle.com> <0bcb768c-0a1a-289d-6d45-382a78f43d97@oracle.com> Message-ID: <3c9fc4a5-48a2-82ab-590e-8d05b6c87016@oracle.com> An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Tue Apr 24 00:47:01 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 23 Apr 2018 17:47:01 -0700 Subject: RFR: 8201409: JDWP debugger initialization hangs intermittently In-Reply-To: <3c9fc4a5-48a2-82ab-590e-8d05b6c87016@oracle.com> References: <5ffc2112-a801-3e1d-63ca-655fbbe0c1c6@oracle.com> <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com> <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com> <41d0c829-2189-90a4-3299-7dfdbe336e8b@oracle.com> <0bcb768c-0a1a-289d-6d45-382a78f43d97@oracle.com> <3c9fc4a5-48a2-82ab-590e-8d05b6c87016@oracle.com> Message-ID: <8382dc15-3944-65f9-9599-1183c05a49eb@oracle.com> An HTML attachment was scrubbed... URL: From jini.george at oracle.com Tue Apr 24 06:57:52 2018 From: jini.george at oracle.com (Jini George) Date: Tue, 24 Apr 2018 12:27:52 +0530 Subject: INCLUDE_SA/serviceability agent - support on s390x In-Reply-To: <6e27de41f1eb40ee93b1a44a5d69d775@sap.com> References: <6e27de41f1eb40ee93b1a44a5d69d775@sap.com> Message-ID: Hi Matthias, Your change looks good to me. 
It might make sense to also remove the following lines from:

src/jdk.hotspot.agent/linux/native/libsaproc/libproc.h

78 #if defined(s390x)
79 #include
80 #endif

I am not sure if the following files are required either:

src/hotspot/cpu/s390/vmStructs_s390.hpp
src/hotspot/os_cpu/linux_s390/vmStructs_linux_s390.hpp

Thanks,
Jini (Not a Reviewer).

On 4/23/2018 5:31 PM, Baesken, Matthias wrote:
> Hello, as far as I know the serviceability agent is not supported
> on linux s390x.
>
> However (unlike on aix where it is not supported as well),
> INCLUDE_SA=false is not set in the central configure m4 files.
>
> Should we set it (suggested diff below)?
>
> Best regards, Matthias
>
> hg diff
>
> diff -r fcd5df7aa235 make/autoconf/jdk-options.m4
>
> --- a/make/autoconf/jdk-options.m4      Wed Apr 18 11:19:32 2018 +0200
>
> +++ b/make/autoconf/jdk-options.m4      Mon Apr 23 13:46:17 2018 +0200
>
> @@ -238,6 +238,9 @@
>
>    if test "x$OPENJDK_TARGET_OS" = xaix ; then
>
>      INCLUDE_SA=false
>
>    fi
>
> +  if test "x$OPENJDK_TARGET_CPU" = xs390x ; then
>
> +    INCLUDE_SA=false
>
> +  fi
>
>    AC_SUBST(INCLUDE_SA)
>
> # Compress jars
>
> Best regards, Matthias
>

From jini.george at oracle.com Tue Apr 24 09:03:23 2018
From: jini.george at oracle.com (Jini George)
Date: Tue, 24 Apr 2018 14:33:23 +0530
Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS
In-Reply-To: <6c22f61c-ec62-ccfe-a7c6-147333159b39@oracle.com>
References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> <20c65db9-026e-59b6-6879-2301e69d6598@oracle.com> <6c22f61c-ec62-ccfe-a7c6-147333159b39@oracle.com>
Message-ID: <3107d1ca-1272-243b-0c17-6814251eafbf@oracle.com>

Hello!
The webrev including the check for the "|" at the beginning of the core_pattern file is at: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.01/ This webrev also includes a fix for a latent bug on MacOSX, where corefile debugging was failing due to SA trying to read in the incorrect mangled symbol name for "Arguments::SharedArchivePath". Clang seems to have prefixed an extra '_' to change the mangled name from '_ZN9Arguments17SharedArchivePathE' to '__ZN9Arguments17SharedArchivePathE' for MachO files. This fix for this is in src/jdk.hotspot.agent/macosx/native/libsaproc/ps_core.c. The difference between the earlier patch and this one can be seen at: http://cr.openjdk.java.net/~jgeorge/8174994/differential.patch Thank you, Jini. On 4/18/2018 10:37 PM, Jini George wrote: > I agree with the need of testing as much as we can. I could do something > on the lines of how other debuggers like LLDB test: if we can't find the > core file location, check for "|" at the beginning of a line in the > /proc/sys/kernel/core_pattern file -- and fail with a message stating > that the system is using a crash reporting tool. > > Thank you, > Jini. > > On 4/18/2018 12:40 PM, David Holmes wrote: >> My 2c ... >> >> We have to have tests that can test core file attaching capability - >> else we don't know it works. So we have to try and generate a core file. >> >> But, we have to expect that in many cases no core file will be >> generated even if the hs-err file claims it was. For example my >> primary local testing system never generates core files even though it >> claims to: >> >> # Core dump will be written. Default location: Core dumps may be >> processed with "/usr/share/apport/apport %p %s %c" (or dumping to / >> export/users/dh198349/valhalla/repos/valhalla-dev/open/test/jdk/core.29848) >> >> >> apport isn't even installed, even though core_pattern lists it. 
>>
>> Cheers,
>> David
>>
>> On 18/04/2018 4:36 PM, Yasumasa Suenaga wrote:
>>> 2018-04-18 15:05 GMT+09:00 Jini George :
>>>> Thank you very much, Yasumasa, for pointing this out. You are right
>>>> -- this
>>>> would fail in the Linux systems if systemd-coredump is enabled.
>>>>
>>>> I plan to file an enhancement request to address this issue (wrt
>>>> systemd-coredump) separately since this would apply to other coredump
>>>> generating test cases also like:
>>>> test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java.
>>>
>>> I agree with you, but...
>>>
>>>> From what I can gather, I think we might be able to at least partially
>>>> address this by using
>>>>
>>>> coredumpctl -o dump
>>>>
>>>> in the test cases, provided the kernel.core_pattern variable is not
>>>> set to
>>>> "|/bin/false".
>>>>
>>>> Let me know if you are not OK with this.
>>>
>>> IMHO it is not good.
>>> Some Linux distros use other coredump collector. For example, RHEL 6
>>> uses ABRT, Ubuntu uses Apport, Fedora uses systemd-coredump.
>>> Hence I think we should disable all tests which requires core images
>>> for Linux like a Windows platform.
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>>> Thank you,
>>>> Jini.
>>>>
>>>>
>>>>
>>>>
>>>> On 4/14/2018 7:39 PM, Yasumasa Suenaga wrote:
>>>>>
>>>>> Hi Jini,
>>>>>
>>>>> ClhsdbCDSCore.java:
>>>>>     Can this test work on modern Linux?
>>>>>     AFAIK modern Linux contains systemd-coredump to gather core
>>>>> images. So
>>>>> I concern ClhsdbCDSCore.java fails in the future.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2018/04/12 13:21, Jini George wrote:
>>>>>>
>>>>>> Ping: Gentle reminder !
>>>>>>
>>>>>> Thanks,
>>>>>> Jini.
>>>>>>
>>>>>> On 4/6/2018 9:51 PM, Jini George wrote:
>>>>>>>
>>>>>>> Hello!
>>>>>>> >>>>>>> Requesting reviews for: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8174994 >>>>>>> Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/ >>>>>>> >>>>>>> While trying to identify the type given an address, a >>>>>>> WrongTypeException >>>>>>> was getting thrown with various clhsdb commands (like printmdo, >>>>>>> jstack, >>>>>>> etc). This was since SA tries to map an address to a hotspot C++ >>>>>>> type by >>>>>>> comparing the vtable address to the vtable address values of known >>>>>>> types. With CDS, since the vtables are copied over for the Metadata >>>>>>> classes, the vtable addresses themselves don't match (though, of >>>>>>> course, >>>>>>> the contents will), and SA errors out. >>>>>>> >>>>>>> The fix has been implemented by making changes to read in the md >>>>>>> region >>>>>>> (consisting of the c++ vtables) of the CDS archive in SA, and >>>>>>> mapping >>>>>>> the vtable addresses to the corresponding metadata type >>>>>>> (ConstantPool, >>>>>>> InstanceKlass, InstanceClassLoaderKlass, InstanceMirrorKlass, >>>>>>> InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass). >>>>>>> >>>>>>> For corefiles, an additional modification has been done to have the >>>>>>> replicated FileMapHeader structure (from >>>>>>> src/hotspot/share/memory/filemap.hpp, which is replicated in SA in >>>>>>> ps_core.c), to be in sync with the corresponding definition in >>>>>>> src/hotspot/share/memory/filemap.hpp. >>>>>>> >>>>>>> Test cases to test both live and corefile debugging are being >>>>>>> added with >>>>>>> this. These and other SA tests pass on Mach5. >>>>>>> >>>>>>> Thanks, >>>>>>> Jini. 
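The vtable-to-type mapping described in the request above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the SA code: the class `CopiedVtableIndex`, its `build` method, and the plain `long` addresses are hypothetical, and the layout (each copied vtable preceded by one word holding its size) is assumed from the FileMapInfo.java excerpt quoted elsewhere in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the SA's FileMapInfo class.
class CopiedVtableIndex {

    // Walks a region assumed to be laid out as
    // [size][vtable words...][size][vtable words...]...
    // and records the address at which each type's vtable begins.
    static Map<Long, String> build(long regionStart, int addressSize,
                                   String[] typeNames, long[] vtableSizes) {
        Map<Long, String> typeByVptr = new LinkedHashMap<>();
        long cursor = regionStart;
        for (int i = 0; i < typeNames.length; i++) {
            // The vtable starts one word past the entry holding its size.
            typeByVptr.put(cursor + addressSize, typeNames[i]);
            // Advance past the size entry plus the vtable itself.
            cursor += (vtableSizes[i] + 1) * addressSize;
        }
        return typeByVptr;
    }

    public static void main(String[] args) {
        // Made-up sizes and start address, purely to show the walk.
        Map<Long, String> index = build(0x1000L, 8,
                new String[] {"ConstantPool", "InstanceKlass", "Method"},
                new long[] {3, 5, 4});
        for (Map.Entry<Long, String> e : index.entrySet()) {
            System.out.println(Long.toHexString(e.getKey()) + " -> " + e.getValue());
        }
    }
}
```

A lookup then becomes a map query on the vptr value read from the object, instead of a comparison against the vtable addresses of the known types, which is the comparison that fails once the vtables have been copied.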
>>>>>> >>>>>> >>>> From andrew_m_leonard at uk.ibm.com Tue Apr 24 09:43:23 2018 From: andrew_m_leonard at uk.ibm.com (Andrew Leonard) Date: Tue, 24 Apr 2018 10:43:23 +0100 Subject: RFR: 8201409: JDWP debugger initialization hangs intermittently In-Reply-To: <3c9fc4a5-48a2-82ab-590e-8d05b6c87016@oracle.com> References: <49b9573a-4cf6-7de3-4201-df21d0c66064@oracle.com> <8a44098b-577f-cfe2-61be-1d779b90db8a@oracle.com> <41d0c829-2189-90a4-3299-7dfdbe336e8b@oracle.com> <0bcb768c-0a1a-289d-6d45-382a78f43d97@oracle.com> Message-ID: Hi Serguei, Good find, i'll try it out and do some debugging. Many thanks Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: Andrew Leonard Cc: serviceability-dev at openjdk.java.net Date: 24/04/2018 01:03 Subject: Re: RFR: 8201409: JDWP debugger initialization hangs intermittently On 4/23/18 16:08, serguei.spitsyn at oracle.com wrote: Hi Andrew, There is a regression with this fix. The following test is failed with timeout on all platforms except Windows: Sorry, forgot to copy the test name: open/test/jdk/com/sun/jdi/JITDebug.sh Thanks, Serguei I'll try to get more details about this timeout. Thanks, Serguei On 4/18/18 09:49, serguei.spitsyn at oracle.com wrote: Hi Andrew, Sorry, I did not reply earlier. The fix need more testing. We also have some important tests in closed. I'll run them but I'm a little bit busy at the moment. You have two reviews which is enough for push after testing. Thanks, Serguei On 4/18/18 08:23, Andrew Leonard wrote: Hi Serguei, Do you need me to try anything else for this review? hotspot/jtreg/serviceability suite run successfully. 
Many Thanks Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: daniel.daugherty at oracle.com, Andrew Leonard Cc: serviceability-dev at openjdk.java.net Date: 16/04/2018 07:10 Subject: Re: RFR: 8201409: JDWP debugger initialization hangs intermittently On 4/15/18 10:01, Daniel D. Daugherty wrote: On 4/13/18 3:07 PM, serguei.spitsyn at oracle.com wrote: Andrew and reviewers, I'm re-sending this RFR with a corrected subject that includes the bug number. The issues is: https://bugs.openjdk.java.net/browse/JDK-8201409 Webrev: http://cr.openjdk.java.net/~sspitsyn/webrevs/2018/8201409-jdwp-initsync.ibm.1/ src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c No comments. src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h No comments. src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c So now pauses in debugLoop_run() before the loop that reads cmds. Looks good. src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c So the VM_INIT event handler now signals that we have received the VM_INIT event so that allows debugLoop_run() to proceed. Serguei, this fix needs to have the most of the Serviceability stack of tests run against it (jdwp, JVM/TI, JDI and jdb tests). Based on the email thread, I can't tell which tests have been run with the fix in place. Hi Dan, I'm going to sponsor this fix and will run all the debugger tests. Sorry that I did not announce it yet. Thanks, Serguei Dan The fix looks good to me. Also, I've agreed to skip a unit test as creating it for this issue is not easy. At least, one more review is needed before the fix can be pushed. Thanks, Serguei On 4/11/18 06:33, Andrew Leonard wrote: Hi Serguei, Thank you for raising the bug. I had a chat with one of my colleagues who could recreate it, and it's probably related to the handshaking that is done in the particular scenario. 
So with the JCK harness: com.sun.jck.lib.ExecJCKTestOtherJVMCmd LD_LIBRARY_PATH=/javatest/lib/jck /jck8b/natives/linux_x86-64 /projects/jck/jdwp/j2sdk-image/bin/java -Xdump:system:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -Xdump:snap:none -Xdump:snap:events=gpf+abort+traceassert+corruptcache -Xdump:java:none -Xdump:java:events=gpf+abort+traceassert+corruptcache -Xdump:heap:none -Xdump:heap:events=gpf+abort+traceassert+corruptcache - Xfuture -agentlib:jdwp=server=y,transport=dt_socket,address=localhost :35000,suspend=y -classpath /javatest/lib/jck /JCK8b-b03/JCK-runtime-8b/classes -Djava.security.policy=/javatest/lib/jck /JCK8b-b03/JCK-runtime-8b/lib/jck.policy javasoft.sqe.jck.lib.jpda.jdwp.DebuggeeLoader -waittime=600 -msgSwitch=ub1604x64vm10:38636 -componentName= ArrayReference.GetValues.getvalues002 Note that the JCK test harness starts the target process, attaches to it, and sends the resume command in a very short time with no handshaking. That may not help..but hopefully helps explain things a bit? It's the timing of the resume command during the test that is crucial, resuming before the VM initialization is complete will trigger it. Thanks Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: Andrew Leonard Cc: serviceability-dev at openjdk.java.net Date: 11/04/2018 09:57 Subject: Re: RFR: Fix race condition in jdwp Hi Andrew, I've filed the bug: https://bugs.openjdk.java.net/browse/JDK-8201409 Also, this is a webrev with your patch: http://cr.openjdk.java.net/~sspitsyn/webrevs/2018/8201409-jdwp-initsync.ibm.1/ I agree that creating a standalone test is tricky here. I've added usleep(10000) into the eventHelper_reportVMInit() and ran the JTreg com/sun/jdi tests with my JDK build. However, none of the tests failed with the failure mode you described. 
So that I'm puzzled a little bit. I suspect that some specific debugLoop commands were used in your scenario. It is still possible that I've missed something here. Will try to double check everything. Thanks, Serguei On 4/11/18 01:29, Andrew Leonard wrote: Thanks Serguei, I terms of a standalone testcase it is quite tricky, as due to the nature of the issue which took a lot of investigation to solve it's very timing dependent and will only occur randomly. It can be forced as I indicated below by adding a "sleep" in the VMInit report code but that's not a testcase, however the issue was originally found in our JCK testing for IBMJava8, testcase test.jck8b.runtime.vm.jdwp, but again only happened intermittently. Sort of like "performance" type issues we're not always going to be able to create a testcase that will always "fail" if the fix is not present. Your thoughts? Cheers Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: Andrew Leonard Cc: serviceability-dev at openjdk.java.net Date: 11/04/2018 01:02 Subject: Re: RFR: Fix race condition in jdwp Hi Andrew, Okay, I'll file a bug on this topic. But do you have a standalone test demonstrating this issue? Thanks, Serguei On 4/10/18 06:23, Andrew Leonard wrote: Hi Serguei, I don't have access to the bug database to raise one, are you able to please? Summary: JDWP debugger initialization hangs intermittently Description: If during the JDWP setup initialization the VM initialization takes slightly longer than the main debug initialization thread a "hang" situation can occur. This has been seen in testcase test.jck8b.runtime.vm.jdwp and can also be recreated easily by adding a 10 second sleep to the beginning of the src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c method eventHelper_reportVMInit() . 
First seen: JDK8 Recreated: JDK11 Thanks Andrew Andrew Leonard Java Runtimes Development IBM Hursley IBM United Kingdom Ltd Phone internal: 245913, external: 01962 815913 internet email: andrew_m_leonard at uk.ibm.com From: "serguei.spitsyn at oracle.com" To: Andrew Leonard , serviceability-dev at openjdk.java.net Date: 09/04/2018 23:03 Subject: Re: RFR: Fix race condition in jdwp Hi Andrew, The patch itself looks reasonable. However, in order to proceed with it, a bug report with a standalone test case demonstrating the issue is needed. Thanks, Serguei On 4/9/18 09:07, Andrew Leonard wrote: > Hi, > We discovered in our testing with OpenJ9 that a race condition can > occur in the jdwp under certain circumstances, and we were able to > force the same issue with Hotspot. Normally, the event helper thread > suspends all threads, then the debug loop in the listener thread > receives a command to resume. The debugger may deadlock if the debug > loop in the listener thread starts processing commands (e.g. resume > threads) before the event helper completes the initialization (and > suspends threads). > > This patch adds synchronization to ensure the event helper completes > the initialization sequence before debugger commands are processed. > > Please can I find a sponsor for this contribution? Patch below.. > > Many thanks > > Andrew > > > > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
> * > * This code is free software; you can redistribute it and/or modify it > @@ -58,6 +58,7 @@ > static jboolean vmInitialized; > static jrawMonitorID initMonitor; > static jboolean initComplete; > +static jboolean VMInitComplete; > static jbyte currentSessionID; > > /* > @@ -617,6 +618,35 @@ > debugMonitorExit(initMonitor); > } > > +/* > + * Signal VM initialization is complete. > + */ > +void > +signalVMInitComplete(void) > +{ > + /* > + * VM Initialization is complete > + */ > + LOG_MISC(("signal VM initialization complete")); > + debugMonitorEnter(initMonitor); > + VMInitComplete = JNI_TRUE; > + debugMonitorNotifyAll(initMonitor); > + debugMonitorExit(initMonitor); > +} > + > +/* > + * Wait for VM initialization to complete. > + */ > +void > +debugInit_waitVMInitComplete(void) > +{ > + debugMonitorEnter(initMonitor); > + while (!VMInitComplete) { > + debugMonitorWait(initMonitor); > + } > + debugMonitorExit(initMonitor); > +} > + > /* All process exit() calls come from here */ > void > forceExit(int exit_code) > @@ -672,6 +702,7 @@ > LOG_MISC(("Begin initialize()")); > currentSessionID = 0; > initComplete = JNI_FALSE; > + VMInitComplete = JNI_FALSE; > > if ( gdata->vmDead ) { > EXIT_ERROR(AGENT_ERROR_INTERNAL,"VM dead at initialize() time"); > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2015, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
> * > * This code is free software; you can redistribute it and/or modify it > @@ -39,4 +39,7 @@ > void debugInit_exit(jvmtiError, const char *); > void forceExit(int); > > +void debugInit_waitVMInitComplete(void); > +void signalVMInitComplete(void); > + > #endif > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -98,6 +98,7 @@ > standardHandlers_onConnect(); > threadControl_onConnect(); > > + debugInit_waitVMInitComplete(); > /* Okay, start reading cmds! */ > while (shouldListen) { > if (!dequeue(&p)) { > diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > --- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > +++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights > reserved. > + * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights > reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
> * > * This code is free software; you can redistribute it and/or modify it > @@ -580,6 +580,7 @@ > (void)threadControl_suspendThread(command->thread, JNI_FALSE); > } > > + signalVMInitComplete(); > outStream_initCommand(&out, uniqueID(), 0x0, > JDWP_COMMAND_SET(Event), > JDWP_COMMAND(Event, Composite)); > > > > Andrew Leonard > Java Runtimes Development > IBM Hursley > IBM United Kingdom Ltd > Phone internal: 245913, external: 01962 815913 > internet email: andrew_m_leonard at uk.ibm.com > > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with > number 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed...
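The handshake that the patch above adds (a flag guarded by a monitor, set by the event helper and awaited by the command loop) can be sketched in Java as follows. This is a hedged analogue for illustration only: the class `VmInitGate` and its method names are hypothetical, and the real change is the C code in debugInit.c, debugLoop.c and eventHelper.c shown in the diff.

```java
// Hypothetical Java analogue of the flag-plus-monitor handshake in the patch.
class VmInitGate {
    private boolean vmInitComplete = false;

    // Plays the role of signalVMInitComplete(): set the flag, wake all waiters.
    synchronized void signalVmInitComplete() {
        vmInitComplete = true;
        notifyAll();
    }

    // Plays the role of debugInit_waitVMInitComplete(): block until the flag is set.
    // The loop guards against spurious wakeups.
    synchronized void awaitVmInitComplete() throws InterruptedException {
        while (!vmInitComplete) {
            wait();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        VmInitGate gate = new VmInitGate();
        Thread commandLoop = new Thread(() -> {
            try {
                gate.awaitVmInitComplete();
                System.out.println("command loop: start reading commands");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        commandLoop.start();
        System.out.println("event helper: VM init handled");
        gate.signalVmInitComplete();
        commandLoop.join();
    }
}
```

Because the flag is checked under the same monitor that the signaller uses, the order of arrival does not matter: a command loop that arrives after the signal sees the flag already set and does not wait.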
URL: From david.holmes at oracle.com Tue Apr 24 09:59:33 2018 From: david.holmes at oracle.com (David Holmes) Date: Tue, 24 Apr 2018 19:59:33 +1000 Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS In-Reply-To: <3107d1ca-1272-243b-0c17-6814251eafbf@oracle.com> References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> <20c65db9-026e-59b6-6879-2301e69d6598@oracle.com> <6c22f61c-ec62-ccfe-a7c6-147333159b39@oracle.com> <3107d1ca-1272-243b-0c17-6814251eafbf@oracle.com> Message-ID: <75cd047e-0f48-7d41-7ead-c4948770cf7d@oracle.com> Hi Jini, Not a full review as I'm not familiar enough with this code. My main comment, again, relates to test/hotspot/jtreg/serviceability/sa/ClhsdbCDSCore.java and that it should not fail (throw Error) if there is no core file generated etc. In that case the test should be skipped with a clear message (as elsewhere). Otherwise this test will fail locally for me every time I run the serviceability tests! I also have a few style issues. Don't compare boolean functions with true or false i.e. if (isX() == true) -> if (isX()) if (isX() == false) -> if (!isX()) this occurs in most of the Java files. It is especially noticeable when you mix styles ie: + if (VM.getVM().isSharingEnabled()) { <= implicit check of true + // Check if the value falls in the _md_region + FileMapInfo cdsFileMapInfo = VM.getVM().getFileMapInfo(); + if (cdsFileMapInfo.inCopiedVtableSpace(loc1) == true) { <= explicit check + return cdsFileMapInfo.getTypeForVptrAddress(loc1); + } + } --- src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/memory/FileMapInfo.java 139 vTableTypeMap.put 140 (copiedVtableAddress.addOffsetTo(VM.getVM().getAddressSize()), metadataTypeArray[i]); 141 // The '+ 1' below is to skip the entry containing the size of this metadata's vtable. 
142 copiedVtableAddress = 143 copiedVtableAddress.addOffsetTo((metadataVTableSize + 1) * VM.getVM().getAddressSize()); If you store VM.getVM().getAddressSize() in a local you only need call it once, and the other lines of code will be shorter. On line 139/140 keep the opening parenthesis with the method name ie: vTableTypeMap.put( but with shorter lines you should be able to reformat that more cleanly anyway. 146 } // FileMapHeader 147 } // FileMapInfo We generally don't comment the end of blocks. --- test/hotspot/jtreg/serviceability/sa/ClhsdbCDSCore.java 96 } catch (Throwable t) { 97 throw new Error("Can't execute the java cds process."); 98 } Set 't' as the cause of the new Error so we can see why it failed. Thanks, David On 24/04/2018 7:03 PM, Jini George wrote: > Hello! > > The webrev including the check for the "|" at the beginning of the > core_pattern file is at: > > http://cr.openjdk.java.net/~jgeorge/8174994/webrev.01/ > > This webrev also includes a fix for a latent bug on MacOSX, where > corefile debugging was failing due to SA trying to read in the incorrect > mangled symbol name for "Arguments::SharedArchivePath". Clang seems to > have prefixed an extra '_' to change the mangled name from > '_ZN9Arguments17SharedArchivePathE' to > '__ZN9Arguments17SharedArchivePathE' for MachO files. This fix for this > is in src/jdk.hotspot.agent/macosx/native/libsaproc/ps_core.c. > > The difference between the earlier patch and this one can be seen at: > > http://cr.openjdk.java.net/~jgeorge/8174994/differential.patch > > Thank you, > Jini. > > > On 4/18/2018 10:37 PM, Jini George wrote: >> I agree with the need of testing as much as we can. I could do >> something on the lines of how other debuggers like LLDB test: if we >> can't find the core file location, check for "|" at the beginning of a >> line in the /proc/sys/kernel/core_pattern file -- and fail with a >> message stating that the system is using a crash reporting tool. >> >> Thank you, >> Jini. 
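The core_pattern check proposed in the quoted message above could look roughly like the Java sketch below. It is an assumption-laden illustration rather than the code in the webrev: the class `CorePatternCheck` and both method names are hypothetical, and the only rule taken from the thread is that a line beginning with "|" in /proc/sys/kernel/core_pattern means core dumps are piped to a crash reporting tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for illustration; not taken from the webrev.
class CorePatternCheck {

    // True if any line of the core_pattern content starts with '|',
    // i.e. the kernel pipes core dumps to a handler program.
    static boolean pipesToHandler(List<String> corePatternLines) {
        for (String line : corePatternLines) {
            if (line.startsWith("|")) {
                return true;
            }
        }
        return false;
    }

    // Reads the Linux core_pattern file; reports false where it is not readable
    // (for example on other operating systems).
    static boolean systemUsesCrashHandler() throws IOException {
        Path corePattern = Paths.get("/proc/sys/kernel/core_pattern");
        if (!Files.isReadable(corePattern)) {
            return false;
        }
        return pipesToHandler(Files.readAllLines(corePattern));
    }

    public static void main(String[] args) throws IOException {
        System.out.println("core dumps piped to a handler: " + systemUsesCrashHandler());
    }
}
```

A test that cannot find its core file could call `systemUsesCrashHandler()` and, when it returns true, report that the system uses a crash reporting tool. Whether the test should then fail or be skipped is the point under discussion in this thread.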
>>
>> On 4/18/2018 12:40 PM, David Holmes wrote:
>>> My 2c ...
>>>
>>> We have to have tests that can test core file attaching capability -
>>> else we don't know it works. So we have to try and generate a core file.
>>>
>>> But, we have to expect that in many cases no core file will be
>>> generated even if the hs-err file claims it was. For example my
>>> primary local testing system never generates core files even though
>>> it claims to:
>>>
>>> # Core dump will be written. Default location: Core dumps may be
>>> processed with "/usr/share/apport/apport %p %s %c" (or dumping to /
>>> export/users/dh198349/valhalla/repos/valhalla-dev/open/test/jdk/core.29848)
>>>
>>>
>>> apport isn't even installed, even though core_pattern lists it.
>>>
>>> Cheers,
>>> David
>>>
>>> On 18/04/2018 4:36 PM, Yasumasa Suenaga wrote:
>>>> 2018-04-18 15:05 GMT+09:00 Jini George :
>>>>> Thank you very much, Yasumasa, for pointing this out. You are right
>>>>> -- this
>>>>> would fail in the Linux systems if systemd-coredump is enabled.
>>>>>
>>>>> I plan to file an enhancement request to address this issue (wrt
>>>>> systemd-coredump) separately since this would apply to other coredump
>>>>> generating test cases also like:
>>>>> test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java.
>>>>
>>>> I agree with you, but...
>>>>
>>>>> From what I can gather, I think we might be able to at least
>>>>> partially
>>>>> address this by using
>>>>>
>>>>> coredumpctl -o dump
>>>>>
>>>>> in the test cases, provided the kernel.core_pattern variable is not
>>>>> set to
>>>>> "|/bin/false".
>>>>>
>>>>> Let me know if you are not OK with this.
>>>>
>>>> IMHO it is not good.
>>>> Some Linux distros use other coredump collector. For example, RHEL 6
>>>> uses ABRT, Ubuntu uses Apport, Fedora uses systemd-coredump.
>>>> Hence I think we should disable all tests which requires core images
>>>> for Linux like a Windows platform.
>>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>>> Thank you, >>>>> Jini. >>>>> >>>>> >>>>> >>>>> >>>>> On 4/14/2018 7:39 PM, Yasumasa Suenaga wrote: >>>>>> >>>>>> Hi Jini, >>>>>> >>>>>> ClhsdbCDSCore.java: >>>>>> ??? Can this test work on modern Linux? >>>>>> ??? AFAIK modern Linux contains systemd-coredump to gather core >>>>>> images. So >>>>>> I concern ClhsdbCDSCore.java fails in the future. >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Yasumasa >>>>>> >>>>>> >>>>>> On 2018/04/12 13:21, Jini George wrote: >>>>>>> >>>>>>> Ping: Gentle reminder ! >>>>>>> >>>>>>> Thanks, >>>>>>> Jini. >>>>>>> >>>>>>> On 4/6/2018 9:51 PM, Jini George wrote: >>>>>>>> >>>>>>>> Hello! >>>>>>>> >>>>>>>> Requesting reviews for: >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8174994 >>>>>>>> Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/ >>>>>>>> >>>>>>>> While trying to identify the type given an address, a >>>>>>>> WrongTypeException >>>>>>>> was getting thrown with various clhsdb commands (like printmdo, >>>>>>>> jstack, >>>>>>>> etc). This was since SA tries to map an address to a hotspot C++ >>>>>>>> type by >>>>>>>> comparing the vtable address to the vtable address values of known >>>>>>>> types. With CDS, since the vtables are copied over for the Metadata >>>>>>>> classes, the vtable addresses themselves don't match (though, of >>>>>>>> course, >>>>>>>> the contents will), and SA errors out. >>>>>>>> >>>>>>>> The fix has been implemented by making changes to read in the md >>>>>>>> region >>>>>>>> (consisting of the c++ vtables) of the CDS archive in SA, and >>>>>>>> mapping >>>>>>>> the vtable addresses to the corresponding metadata type >>>>>>>> (ConstantPool, >>>>>>>> InstanceKlass, InstanceClassLoaderKlass, InstanceMirrorKlass, >>>>>>>> InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass). 
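The address-to-type mapping described in the paragraph above can be sketched in plain Java as follows. This is a simplified illustration, not the SA code: it uses long values instead of the SA Address class, the class and method names are hypothetical, and the layout (one size word followed by that many vtable slots, repeated per metadata type) is an assumption inferred from the FileMapInfo.java lines quoted in the review.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the vtable lookup table built in FileMapInfo.java.
public class CopiedVtableMapSketch {

    // base:        address of the first size word in the copied-vtable region
    // addressSize: size of one machine word in bytes
    // typeNames:   metadata type names, in archive order
    // vtableSizes: number of slots in each vtable, as read from its size word
    static Map<Long, String> buildMap(long base, long addressSize,
                                      String[] typeNames, long[] vtableSizes) {
        Map<Long, String> map = new LinkedHashMap<>();
        long cursor = base;
        for (int i = 0; i < typeNames.length; i++) {
            // The vtable itself starts one word after its size word.
            map.put(cursor + addressSize, typeNames[i]);
            // The '+ 1' skips the size word in addition to the vtable slots.
            cursor += (vtableSizes[i] + 1) * addressSize;
        }
        return map;
    }

    public static void main(String[] args) {
        String[] types = {"ConstantPool", "InstanceKlass", "Method"};
        long[] sizes = {4, 10, 3};  // made-up slot counts for illustration
        Map<Long, String> map = buildMap(0x1000L, 8L, types, sizes);
        for (Map.Entry<Long, String> e : map.entrySet()) {
            System.out.printf("0x%x -> %s%n", e.getKey(), e.getValue());
        }
    }
}
```

With a table like this, a vtable pointer that falls inside the copied region is resolved by a map lookup rather than by comparison with the vtable addresses of the known types.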
>>>>>>>> >>>>>>>> For corefiles, an additional modification has been done to have the >>>>>>>> replicated FileMapHeader structure (from >>>>>>>> src/hotspot/share/memory/filemap.hpp, which is replicated in SA in >>>>>>>> ps_core.c), to be in sync with the corresponding definition in >>>>>>>> src/hotspot/share/memory/filemap.hpp. >>>>>>>> >>>>>>>> Test cases to test both live and corefile debugging are being >>>>>>>> added with >>>>>>>> this. These and other SA tests pass on Mach5. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Jini. >>>>>>> >>>>>>> >>>>> From matthias.baesken at sap.com Tue Apr 24 16:09:29 2018 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Tue, 24 Apr 2018 16:09:29 +0000 Subject: RFR: JDK-8202200: set INCLUDE_SA to false on s390x by default -was : RE: INCLUDE_SA/serviceability agent - support on s390x Message-ID: <6a3fba0f7abd4ac38de4a13fd8ec33ef@sap.com> Hi Jini, the removal of the mentioned headers leads to build errors so we better keep it . Error is when the headers are removed : /nb/linuxs390x/nightly/jdk/src/hotspot/share/runtime/vmStructs.cpp:100:31: fatal error: vmStructs_s390.hpp: No such file or directory #include CPU_HEADER(vmStructs) I uploaded a webrev for review : http://cr.openjdk.java.net/~mbaesken/webrevs/8202200/ bug : https://bugs.openjdk.java.net/browse/JDK-8202200 Best regards, Matthias > -----Original Message----- > From: Jini George [mailto:jini.george at oracle.com] > Sent: Dienstag, 24. April 2018 08:58 > To: Baesken, Matthias ; 'build- > dev at openjdk.java.net' > Cc: serviceability-dev at openjdk.java.net; Schmidt, Lutz > > Subject: Re: INCLUDE_SA/serviceability agent - support on s390x > > Hi Matthias, > > Your change looks good to me. 
It might make sense to also remove the
> following lines from:
>
> src/jdk.hotspot.agent/linux/native/libsaproc/libproc.h
>
> 78 #if defined(s390x)
> 79 #include
> 80 #endif
>
> I am not sure if the following files are required either:
> src/hotspot/cpu/s390/vmStructs_s390.hpp
> src/hotspot/os_cpu/linux_s390/vmStructs_linux_s390.hpp
>
> Thanks,
> Jini (Not a Reviewer).
>
>
> On 4/23/2018 5:31 PM, Baesken, Matthias wrote:
> > Hello, as far as I know the serviceability agent is not supported
> > on linux s390x .
> >
> > However (unlike on aix where it is not supported as well) ,
> > INCLUDE_SA=false is not set in the central configure m4 files .
> >
> > Should we set it ( suggested diff below) ?
> >
> > Best regards, Matthias
> >
> > hg diff
> >
> > diff -r fcd5df7aa235 make/autoconf/jdk-options.m4
> > --- a/make/autoconf/jdk-options.m4      Wed Apr 18 11:19:32 2018 +0200
> > +++ b/make/autoconf/jdk-options.m4      Mon Apr 23 13:46:17 2018 +0200
> > @@ -238,6 +238,9 @@
> >    if test "x$OPENJDK_TARGET_OS" = xaix ; then
> >      INCLUDE_SA=false
> >    fi
> > +  if test "x$OPENJDK_TARGET_CPU" = xs390x ; then
> > +    INCLUDE_SA=false
> > +  fi
> >    AC_SUBST(INCLUDE_SA)
> >
> > # Compress jars
> >
> > Best regards, Matthias
> >

From jini.george at oracle.com Tue Apr 24 17:03:03 2018
From: jini.george at oracle.com (Jini George)
Date: Tue, 24 Apr 2018 22:33:03 +0530
Subject: RFR: JDK-8202200: set INCLUDE_SA to false on s390x by default -was : RE: INCLUDE_SA/serviceability agent - support on s390x
In-Reply-To: <6a3fba0f7abd4ac38de4a13fd8ec33ef@sap.com>
References: <6a3fba0f7abd4ac38de4a13fd8ec33ef@sap.com>
Message-ID: <8f19239b-fc06-1712-a297-5148b0df6a67@oracle.com>

Looks good to me, Matthias.

Thank you,
Jini.

On 4/24/2018 9:39 PM, Baesken, Matthias wrote:
> Hi Jini, the removal of the mentioned headers leads to build errors so we better keep it .
From serguei.spitsyn at oracle.com Tue Apr 24 19:24:39 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 24 Apr 2018 12:24:39 -0700 Subject: RFR: 8196325: GarbageCollectionNotificationInfo has same information for before and after In-Reply-To: References: <56011de1-07c0-a575-1a14-7a05e01b7cbf@oracle.com> <2b3f9754-b369-0ad3-609d-9691c7965736@oracle.com> <1ca3129d-eab4-27ad-a068-27b93b2f823f@oracle.com> Message-ID: Hi Sangheon, The fix looks good to me. Thanks, Serguei On 4/23/18 09:52, sangheon.kim wrote: > Hi all, > > Can I have a second reviewer please? > > Thanks, > Sangheon > > > On 04/18/2018 04:57 PM, mandy chung wrote: >> >> >> On 4/19/18 4:52 AM, sangheon.kim wrote: >>> >>> >>> Webrev: >>> http://cr.openjdk.java.net/~sangheki/8196325/webrev.1 (full) >>> http://cr.openjdk.java.net/~sangheki/8196325/webrev.1_to_0/ (inc) >> >> Looks good. >> >> Mandy >
From sangheon.kim at oracle.com Tue Apr 24 21:01:07 2018 From: sangheon.kim at oracle.com (sangheon.kim) Date: Tue, 24 Apr 2018 14:01:07 -0700 Subject: RFR: 8196325: GarbageCollectionNotificationInfo has same information for before and after In-Reply-To: References: <56011de1-07c0-a575-1a14-7a05e01b7cbf@oracle.com> <2b3f9754-b369-0ad3-609d-9691c7965736@oracle.com> <1ca3129d-eab4-27ad-a068-27b93b2f823f@oracle.com> Message-ID: <26f7d2b1-d0f6-04a3-87a7-f0562d0f21b7@oracle.com> Hi Serguei, Thank you for the review! Thanks, Sangheon On 04/24/2018 12:24 PM, serguei.spitsyn at oracle.com wrote: > Hi Sangheon, > > The fix looks good to me. > > Thanks, > Serguei > > > On 4/23/18 09:52, sangheon.kim wrote: >> Hi all, >> >> Can I have a second reviewer please?
>> >> Thanks, >> Sangheon >> >> >> On 04/18/2018 04:57 PM, mandy chung wrote: >>> >>> >>> On 4/19/18 4:52 AM, sangheon.kim wrote: >>>> >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~sangheki/8196325/webrev.1 (full) >>>> http://cr.openjdk.java.net/~sangheki/8196325/webrev.1_to_0/ (inc) >>> >>> Looks good. >>> >>> Mandy >> >
From jini.george at oracle.com Wed Apr 25 03:26:20 2018 From: jini.george at oracle.com (Jini George) Date: Wed, 25 Apr 2018 08:56:20 +0530 Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS In-Reply-To: <75cd047e-0f48-7d41-7ead-c4948770cf7d@oracle.com> References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> <20c65db9-026e-59b6-6879-2301e69d6598@oracle.com> <6c22f61c-ec62-ccfe-a7c6-147333159b39@oracle.com> <3107d1ca-1272-243b-0c17-6814251eafbf@oracle.com> <75cd047e-0f48-7d41-7ead-c4948770cf7d@oracle.com> Message-ID: <508ee4dc-d391-553c-f98a-d1d939f5d591@oracle.com> Thank you very much, David for looking into this. I have incorporated all the comments and the revised webrev is at: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.02/index.html Thanks, Jini.
From matthias.baesken at sap.com Wed Apr 25 08:14:34 2018 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Wed, 25 Apr 2018 08:14:34 +0000 Subject: RFR: JDK-8202200: set INCLUDE_SA to false on s390x by default -was : RE: INCLUDE_SA/serviceability agent - support on s390x Message-ID: <05e5a91232bc4957886e765126f971da@sap.com> Hi Erik, thanks ! Can I consider this as a review ? In the meantime I created a webrev + bug : webrev for review : http://cr.openjdk.java.net/~mbaesken/webrevs/8202200/ bug : https://bugs.openjdk.java.net/browse/JDK-8202200 Regards, Matthias From: Erik Joelsson [mailto:erik.joelsson at oracle.com] Sent: Monday, 23 April 2018 17:43 To: Baesken, Matthias ; 'build-dev at openjdk.java.net' Cc: serviceability-dev at openjdk.java.net; Schmidt, Lutz Subject: Re: INCLUDE_SA/serviceability agent - support on s390x Makes sense to me. Looks good. /Erik On 2018-04-23 05:01, Baesken, Matthias wrote: Hello, as far as I know the serviceability agent is not supported on linux s390x . However (unlike on aix where it is not supported as well) , INCLUDE_SA=false is not set in the central configure m4 files . Should we set it ( suggested diff below) ? Best regards, Matthias hg diff diff -r fcd5df7aa235 make/autoconf/jdk-options.m4 --- a/make/autoconf/jdk-options.m4 Wed Apr 18 11:19:32 2018 +0200 +++ b/make/autoconf/jdk-options.m4 Mon Apr 23 13:46:17 2018 +0200 @@ -238,6 +238,9 @@ if test "x$OPENJDK_TARGET_OS" = xaix ; then INCLUDE_SA=false fi + if test "x$OPENJDK_TARGET_CPU" = xs390x ; then + INCLUDE_SA=false + fi AC_SUBST(INCLUDE_SA) # Compress jars Best regards, Matthias
From magnus.ihse.bursie at oracle.com Wed Apr 25 09:15:22 2018 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Wed, 25 Apr 2018 11:15:22 +0200 Subject: RFR: JDK-8202200: set INCLUDE_SA to false on s390x by default -was : RE: INCLUDE_SA/serviceability agent - support on s390x In-Reply-To: <05e5a91232bc4957886e765126f971da@sap.com> References: <05e5a91232bc4957886e765126f971da@sap.com> Message-ID: On 2018-04-25 10:14, Baesken, Matthias wrote: > > Hi Erik, thanks ! > > Can I consider this as a review ? > > In the meantime I created a webrev + bug : > > webrev for review : > > http://cr.openjdk.java.net/~mbaesken/webrevs/8202200/ > > Looks good to me. /Magnus > > bug : > > https://bugs.openjdk.java.net/browse/JDK-8202200 > > Regards, Matthias >
From yasuenag at gmail.com Wed Apr 25 12:44:43 2018 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Wed, 25 Apr 2018 21:44:43 +0900 Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS In-Reply-To: <508ee4dc-d391-553c-f98a-d1d939f5d591@oracle.com> References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> <20c65db9-026e-59b6-6879-2301e69d6598@oracle.com> <6c22f61c-ec62-ccfe-a7c6-147333159b39@oracle.com> <3107d1ca-1272-243b-0c17-6814251eafbf@oracle.com> <75cd047e-0f48-7d41-7ead-c4948770cf7d@oracle.com> <508ee4dc-d391-553c-f98a-d1d939f5d591@oracle.com> Message-ID: Hi Jini, >>>>>> 2018-04-18 15:05 GMT+09:00 Jini George : : >>>>>>> I plan to file an enhancement request to address this issue (wrt >>>>>>> systemd-coredump) separately since this would apply to other coredump >>>>>>> generating test cases also like: >>>>>>> test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java. I guessed the tests for coredumps will be handled in another issue (with TestSAServer.java). Is it okay to implement coredump test in this changeset? IMHO, it looks to implement as new test basis (e.g. LingeredAppForCoredump - ulimit check, set, etc...). Thanks, Yasumasa

From yasuenag at gmail.com Wed Apr 25 12:48:59 2018
From: yasuenag at gmail.com (Yasumasa Suenaga)
Date: Wed, 25 Apr 2018 21:48:59 +0900
Subject: PING: RFR: 8199519: Several GC tests fails with: java.lang.NumberFormatException: Unparseable number: "-"
In-Reply-To: <56804cfb-7b80-a8f0-c866-cda4b36799fd@gmail.com>
References: <6755303f-a1a0-da4f-e1e0-a1bcb0c72efd@gmail.com> <7809552d-dfa0-5f26-bd82-c13df7f45f5f@oracle.com> <85853429-a520-1782-40e4-e05776aa639d@oracle.com> <40b04f2e-1d6c-524e-ea4a-08c42fd41ee6@gmail.com> <93a1ffeb-4959-3bdb-cbe3-510c258129b6@oracle.com> <5c1975cd-1080-652e-c23a-abd693cc0095@oracle.com> <33358f2d-4e01-7ccb-0f06-02b6828fe65b@gmail.com> <56804cfb-7b80-a8f0-c866-cda4b36799fd@gmail.com>
Message-ID:

PING: Could you review this change?
I've sent review request about a month ago, but I do not yet get second reviewer.

>>>>     > http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.03/

Yasumasa

On 2018/04/10 20:10, Yasumasa Suenaga wrote:
> PING: Could you review it?
> We need one more reviewer.
>
> >>>>     > http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.03/
>
> Yasumasa
>
> On 2018/04/03 21:37, Yasumasa Suenaga wrote:
>> PING: Could you review it?
>> This change has been passed Mach5 test.
>>
>> >>>>     > http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.03/
>>
>> Thanks,
>>
>> Yasumasa
>>
>> On 2018/03/28 22:38, Stefan Johansson wrote:
>>> Mach5 testing looks good.
>>>
>>> Can someone in the serviceability team do the second review?
>>>
>>> Cheers,
>>> Stefan
>>>
>>> On 2018-03-28 13:32, Yasumasa Suenaga wrote:
>>>> Thanks Stefan,
>>>> I'm waiting for second reviewer.
>>>>
>>>> Yasumasa
>>>>
>>>> 2018-03-28 (Wed) 18:36 Stefan Johansson:
>>>>
>>>>     Hi Yasumasa,
>>>>
>>>>     Local testing looks good and I've kicked off some additional Mach5
>>>>     testing that will include these tests on all platforms.
>>>>
>>>>     Cheers,
>>>>     Stefan
>>>>
>>>>     On 2018-03-28 06:04, Yasumasa Suenaga wrote:
>>>>     > Hi Stefan,
>>>>     >
>>>>     > Thank you for sharing your report!
>>>>     > I could reproduce them on my VM.
>>>>     >
>>>>     > I've fixed them in new webrev, and it works fine on my environment.
>>>>     > Could you check again?
>>>>     >
>>>>     > http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.03/
>>>>     >
>>>>     > Thanks,
>>>>     >
>>>>     > Yasumasa
>>>>     >
>>>>     > 2018-03-28 0:29 GMT+09:00 Stefan Johansson:
>>>>     >>
>>>>     >> On 2018-03-27 16:44, Yasumasa Suenaga wrote:
>>>>     >>> Hi Stefan,
>>>>     >>>
>>>>     >>> On 2018/03/27 22:45, Stefan Johansson wrote:
>>>>     >>>> Hi Yasumasa,
>>>>     >>>>
>>>>     >>>> On 2018-03-27 10:56, Yasumasa Suenaga wrote:
>>>>     >>>>> Hi Stefan,
>>>>     >>>>>
>>>>     >>>>> Thank you for your comment.
>>>>     >>>>> I updated webrev:
>>>>     >>>>>
>>>>     >>>>>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.01/
>>>>     >>>> I think the usage of Optional in Expression.setRequired(bool) is a bit
>>>>     >>>> unnecessary. It will create temporary objects and there is no benefit over
>>>>     >>>> just doing two simple if-statements.
>>>>     >>>
>>>>     >>> I fixed it in new webrev:
>>>>     >>> http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.02/
>>>>     >>>
>>>>     >>>> I also ran this patch (and the one using forcibly) on my single core VM
>>>>     >>>> and realized that this fix will have to include some awk-file updates to
>>>>     >>>> make the tests in test/jdk/sun/tools/jstat pass when Serial is chosen as the
>>>>     >>>> default collector. The tests in test/jdk/sun/tools/jstatd/ are fine.
>>>>     >>>
>>>>     >>> Can you share the failure report?
>>>>     >> It relates to all tests that display the CGC and the CGCT columns, for
>>>>     >> example in jstatGcOutput1.sh:
>>>>     >> S0C     S1C     S0U     S1U     EC      EU      OC      OU      MC      MU      CCSC    CCSU    YGC     YGCT    FGC     FGCT    CGC     CGCT    GCT
>>>>     >> 256.0   256.0   254.0   0.0     2176.0  1025.0  5504.0  920.5   7168.0  6839.7  768.0   602.8   2       0.007   0       0.000   -       -       0.007
>>>>     >>
>>>>     >> The awk regex needs to be updated to handle '-' for these tests:
>>>>     >> test: sun/tools/jstat/jstatGcCapacityOutput1.sh
>>>>     >> Failed. Execution failed: exit code 1
>>>>     >>
>>>>     >> test: sun/tools/jstat/jstatGcMetaCapacityOutput1.sh
>>>>     >> Failed. Execution failed: exit code 1
>>>>     >>
>>>>     >> test: sun/tools/jstat/jstatGcNewCapacityOutput1.sh
>>>>     >> Failed. Execution failed: exit code 1
>>>>     >>
>>>>     >> test: sun/tools/jstat/jstatGcOldCapacityOutput1.sh
>>>>     >> Failed. Execution failed: exit code 1
>>>>     >>
>>>>     >> test: sun/tools/jstat/jstatGcOldOutput1.sh
>>>>     >> Failed. Execution failed: exit code 1
>>>>     >>
>>>>     >> test: sun/tools/jstat/jstatGcOutput1.sh
>>>>     >> Failed. Execution failed: exit code 1
>>>>     >>
>>>>     >>> If it occurs in jstatClassloadOutput1.sh, it relates to JDK-8173942.
>>>>     >>>
>>>>     >>> Thanks,
>>>>     >>>
>>>>     >>> Yasumasa
>>>>     >>>
>>>>     >>>> Thanks,
>>>>     >>>> Stefan
>>>>     >>>>>    submit-hs: mach5-one-ysuenaga-JDK-8199519-20180327-0652-16322
>>>>     >>>>>
>>>>     >>>>> Thanks,
>>>>     >>>>>
>>>>     >>>>> Yasumasa
>>>>     >>>>>
>>>>     >>>>> 2018-03-27 0:03 GMT+09:00 Stefan Johansson:
>>>>     >>>>>> Hi Yasumasa,
>>>>     >>>>>>
>>>>     >>>>>> On 2018-03-22 11:35, Yasumasa Suenaga wrote:
>>>>     >>>>>>> Hi all,
>>>>     >>>>>>>
>>>>     >>>>>>> Please review this change:
>>>>     >>>>>>>
>>>>     >>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8199519
>>>>     >>>>>>> webrev: cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.00/
>>>>     >>>>>> The fix seems to make things work as expected. Manually tested it and
>>>>     >>>>>> Mach5 also looks good.
>>>>     >>>>>>
>>>>     >>>>>> I have some comments regarding the patch. I think 'forcibly' should be
>>>>     >>>>>> renamed to something more descriptive. Naming is never easy but I think
>>>>     >>>>>> 'required' would be better, as in, this column is required and not allowed
>>>>     >>>>>> to print '-'. That would also render the code in ExpressionResolver.java to
>>>>     >>>>>> be:
>>>>     >>>>>>    return new Literal(isRequired ? 0.0d : Double.NaN);
>>>>     >>>>>> I think that also better explains why we return 0 instead of NaN.
>>>>     >>>>>>
>>>>     >>>>>> I would also like to see the forcibly/required state moved into the
>>>>     >>>>>> Expression itself, that way we don't have to pass it around but can instead
>>>>     >>>>>> do:
>>>>     >>>>>>    return new Literal(e.isRequired() ? 0.0d : Double.NaN);
>>>>     >>>>>>
>>>>     >>>>>> Thanks,
>>>>     >>>>>> Stefan
>>>>     >>>>>>
>>>>     >>>>>>> After JDK-8153333, some jstat tests fail because GCT in jstat output
>>>>     >>>>>>> is a dash (-) if the garbage collector is not a concurrent collector,
>>>>     >>>>>>> e.g. Serial GC.
>>>>     >>>>>>> I fixed it so that GCT can be calculated correctly.
>>>>     >>>>>>>
>>>>     >>>>>>> This change has been tested on Mach5 by Stefan.
>>>>     >>>>>>>
>>>>     >>>>>>> Thanks,
>>>>     >>>>>>>
>>>>     >>>>>>> Yasumasa
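
The 'required' idea discussed in the thread above (return 0 instead of NaN when a required column's counter is missing) can be sketched as below. This is a minimal stand-alone illustration, not the jstat source: the Expression class here is a hypothetical stand-in, a plain double replaces the Literal wrapper quoted in the review, and the assumption that the total is a simple sum of the per-collector times is made only for the example.

```java
import java.util.Locale;

final class RequiredColumnSketch {

    /** Hypothetical stand-in for an expression that carries its own 'required' flag. */
    static final class Expression {
        private boolean required;

        boolean isRequired() { return required; }

        void setRequired(boolean required) { this.required = required; }
    }

    /** Value used when the counter behind the expression does not exist. */
    static double resolveMissing(Expression e) {
        return e.isRequired() ? 0.0d : Double.NaN;
    }

    /** NaN is rendered as '-', any other value as a number with three decimals. */
    static String format(double value) {
        return Double.isNaN(value) ? "-" : String.format(Locale.ROOT, "%.3f", value);
    }

    public static void main(String[] args) {
        double ygct = 0.007;
        double fgct = 0.000;
        Expression cgct = new Expression();
        // Not required: the missing value is NaN, and NaN propagates through the sum.
        System.out.println(format(ygct + fgct + resolveMissing(cgct)));
        // Required: the missing value counts as 0, so the total stays printable.
        cgct.setRequired(true);
        System.out.println(format(ygct + fgct + resolveMissing(cgct)));
    }
}
```

Keeping the flag on the expression, as suggested in the review, means the resolver does not need an extra parameter to decide between 0 and NaN.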

From jini.george at oracle.com Wed Apr 25 17:49:29 2018
From: jini.george at oracle.com (Jini George)
Date: Wed, 25 Apr 2018 23:19:29 +0530
Subject: PING: RFR: 8199519: Several GC tests fails with: java.lang.NumberFormatException: Unparseable number: "-"
In-Reply-To:
References: <6755303f-a1a0-da4f-e1e0-a1bcb0c72efd@gmail.com> <7809552d-dfa0-5f26-bd82-c13df7f45f5f@oracle.com> <85853429-a520-1782-40e4-e05776aa639d@oracle.com> <40b04f2e-1d6c-524e-ea4a-08c42fd41ee6@gmail.com> <93a1ffeb-4959-3bdb-cbe3-510c258129b6@oracle.com> <5c1975cd-1080-652e-c23a-abd693cc0095@oracle.com> <33358f2d-4e01-7ccb-0f06-02b6828fe65b@gmail.com> <56804cfb-7b80-a8f0-c866-cda4b36799fd@gmail.com>
Message-ID:

Hi Yasumasa,

Your changes look good to me.

Thanks,
Jini.

On 4/25/2018 6:18 PM, Yasumasa Suenaga wrote:
> PING: Could you review this change?
> I've sent review request about a month ago, but I do not yet get second
> reviewer.
>
> >>>>>     > http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.03/
>
> Yasumasa

From jini.george at oracle.com Thu Apr 26 04:21:30 2018
From: jini.george at oracle.com (Jini George)
Date: Thu, 26 Apr 2018 09:51:30 +0530
Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS
In-Reply-To:
References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> <20c65db9-026e-59b6-6879-2301e69d6598@oracle.com> <6c22f61c-ec62-ccfe-a7c6-147333159b39@oracle.com> <3107d1ca-1272-243b-0c17-6814251eafbf@oracle.com> <75cd047e-0f48-7d41-7ead-c4948770cf7d@oracle.com> <508ee4dc-d391-553c-f98a-d1d939f5d591@oracle.com>
Message-ID:

Thank you, Yasumasa. I hope to implement the consolidation with
TestSAServer.java (and have the SA core file debug testing template
done) as a part of a separate enhancement:

https://bugs.openjdk.java.net/browse/JDK-8202297

Let me know if this is not OK with you.

Thanks,
Jini.

On 4/25/2018 6:14 PM, Yasumasa Suenaga wrote:
> Hi Jini,
>
> >>>>>>> 2018-04-18 15:05 GMT+09:00 Jini George :
>
>   :
> >>>>>>>> I plan to file an enhancement request to address this issue (wrt
> >>>>>>>> systemd-coredump) separately since this would apply to other
> >>>>>>>> coredump
> >>>>>>>> generating test cases also like:
> >>>>>>>> test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java.
>
> I guessed the tests for coredumps will be handled in another issue (with
> TestSAServer.java).
> Is it okay to implement coredump test in this changeset?
>
> IMHO, it looks to implement as new test basis (e.g.
> LingeredAppForCoredump - ulimit check, set, etc...).
>
> Thanks,
>
> Yasumasa
>
> On 2018/04/25 12:26, Jini George wrote:
>> Thank you very much, David for looking into this. I have incorporated
>> all the comments and the revised webrev is at:
>>
>> http://cr.openjdk.java.net/~jgeorge/8174994/webrev.02/index.html
>>
>> Thanks,
>> Jini.
>>
>> On 4/24/2018 3:29 PM, David Holmes wrote:
>>> Hi Jini,
>>>
>>> Not a full review as I'm not familiar enough with this code.
>>>
>>> My main comment, again, relates to
>>> test/hotspot/jtreg/serviceability/sa/ClhsdbCDSCore.java and that it
>>> should not fail (throw Error) if there is no core file generated etc.
>>> In that case the test should be skipped with a clear message (as
>>> elsewhere). Otherwise this test will fail locally for me every time I
>>> run the serviceability tests!
>>>
>>> I also have a few style issues.
>>>
>>> Don't compare boolean functions with true or false i.e.
>>>
>>> if (isX() == true) -> if (isX())
>>> if (isX() == false) -> if (!isX())
>>>
>>> this occurs in most of the Java files. It is especially noticeable
>>> when you mix styles ie:
>>>
>>> +     if (VM.getVM().isSharingEnabled()) {  <= implicit check of true
>>> +       // Check if the value falls in the _md_region
>>> +       FileMapInfo cdsFileMapInfo = VM.getVM().getFileMapInfo();
>>> +       if (cdsFileMapInfo.inCopiedVtableSpace(loc1) == true) { <= explicit check
>>> +          return cdsFileMapInfo.getTypeForVptrAddress(loc1);
>>> +       }
>>> +     }
>>>
>>> ---
>>>
>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/memory/FileMapInfo.java
>>>
>>>   139         vTableTypeMap.put
>>>   140 (copiedVtableAddress.addOffsetTo(VM.getVM().getAddressSize()), metadataTypeArray[i]);
>>>   141         // The '+ 1' below is to skip the entry containing the size of this metadata's vtable.
>>>   142         copiedVtableAddress =
>>>   143           copiedVtableAddress.addOffsetTo((metadataVTableSize + 1) * VM.getVM().getAddressSize());
>>>
>>> If you store VM.getVM().getAddressSize() in a local you only need
>>> call it once, and the other lines of code will be shorter.
>>>
>>> On line 139/140 keep the opening parenthesis with the method name ie:
>>>
>>>      vTableTypeMap.put(
>>>
>>> but with shorter lines you should be able to reformat that more
>>> cleanly anyway.
>>>
>>>   146   } // FileMapHeader
>>>   147 } // FileMapInfo
>>>
>>> We generally don't comment the end of blocks.
>>>
>>> ---
>>>
>>> test/hotspot/jtreg/serviceability/sa/ClhsdbCDSCore.java
>>>
>>>   96             } catch (Throwable t) {
>>>   97                 throw new Error("Can't execute the java cds process.");
>>>   98             }
>>>
>>> Set 't' as the cause of the new Error so we can see why it failed.
>>>
>>> Thanks,
>>> David
>>>
>>> On 24/04/2018 7:03 PM, Jini George wrote:
>>>> Hello!
>>>>
>>>> The webrev including the check for the "|" at the beginning of the
>>>> core_pattern file is at:
>>>>
>>>> http://cr.openjdk.java.net/~jgeorge/8174994/webrev.01/
>>>>
>>>> This webrev also includes a fix for a latent bug on MacOSX, where
>>>> corefile debugging was failing due to SA trying to read in the
>>>> incorrect mangled symbol name for "Arguments::SharedArchivePath".
>>>> Clang seems to have prefixed an extra '_' to change the mangled name
>>>> from '_ZN9Arguments17SharedArchivePathE' to
>>>> '__ZN9Arguments17SharedArchivePathE' for MachO files. The fix for
>>>> this is in src/jdk.hotspot.agent/macosx/native/libsaproc/ps_core.c.
>>>>
>>>> The difference between the earlier patch and this one can be seen at:
>>>>
>>>> http://cr.openjdk.java.net/~jgeorge/8174994/differential.patch
>>>>
>>>> Thank you,
>>>> Jini.
>>>>
>>>> On 4/18/2018 10:37 PM, Jini George wrote:
>>>>> I agree with the need to test as much as we can. I could do
>>>>> something along the lines of how other debuggers like LLDB test: if we
>>>>> can't find the core file location, check for "|" at the beginning
>>>>> of a line in the /proc/sys/kernel/core_pattern file -- and fail
>>>>> with a message stating that the system is using a crash reporting
>>>>> tool.
>>>>>
>>>>> Thank you,
>>>>> Jini.
>>>>>
>>>>> On 4/18/2018 12:40 PM, David Holmes wrote:
>>>>>> My 2c ...
>>>>>>
>>>>>> We have to have tests that can test core file attaching capability
>>>>>> - else we don't know it works. So we have to try and generate a
>>>>>> core file.
>>>>>>
>>>>>> But, we have to expect that in many cases no core file will be
>>>>>> generated even if the hs-err file claims it was. For example my
>>>>>> primary local testing system never generates core files even
>>>>>> though it claims to:
>>>>>>
>>>>>> # Core dump will be written. Default location: Core dumps may be
>>>>>> processed with "/usr/share/apport/apport %p %s %c" (or dumping to
>>>>>> /export/users/dh198349/valhalla/repos/valhalla-dev/open/test/jdk/core.29848)
>>>>>>
>>>>>> apport isn't even installed, even though core_pattern lists it.
>>>>>>
>>>>>> Cheers,
>>>>>> David
>>>>>>
>>>>>> On 18/04/2018 4:36 PM, Yasumasa Suenaga wrote:
>>>>>>> 2018-04-18 15:05 GMT+09:00 Jini George :
>>>>>>>> Thank you very much, Yasumasa, for pointing this out. You are
>>>>>>>> right -- this would fail on Linux systems if systemd-coredump
>>>>>>>> is enabled.
>>>>>>>>
>>>>>>>> I plan to file an enhancement request to address this issue (wrt
>>>>>>>> systemd-coredump) separately since this would also apply to other
>>>>>>>> coredump generating test cases like:
>>>>>>>> test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java.
>>>>>>>
>>>>>>> I agree with you, but...
>>>>>>>
>>>>>>>> From what I can gather, I think we might be able to at least
>>>>>>>> partially address this by using
>>>>>>>>
>>>>>>>> coredumpctl -o dump
>>>>>>>>
>>>>>>>> in the test cases, provided the kernel.core_pattern variable is
>>>>>>>> not set to "|/bin/false".
>>>>>>>>
>>>>>>>> Let me know if you are not OK with this.
>>>>>>>
>>>>>>> IMHO it is not good.
>>>>>>> Some Linux distros use other coredump collectors. For example, RHEL 6
>>>>>>> uses ABRT, Ubuntu uses Apport, Fedora uses systemd-coredump.
>>>>>>> Hence I think we should disable all tests which require core images
>>>>>>> on Linux, as on the Windows platform.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Jini.
>>>>>>>>
>>>>>>>> On 4/14/2018 7:39 PM, Yasumasa Suenaga wrote:
>>>>>>>>>
>>>>>>>>> Hi Jini,
>>>>>>>>>
>>>>>>>>> ClhsdbCDSCore.java:
>>>>>>>>>     Can this test work on modern Linux?
>>>>>>>>>     AFAIK modern Linux contains systemd-coredump to gather core
>>>>>>>>> images. So I am concerned that ClhsdbCDSCore.java will fail in
>>>>>>>>> the future.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>> On 2018/04/12 13:21, Jini George wrote:
>>>>>>>>>>
>>>>>>>>>> Ping: Gentle reminder!
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Jini.
>>>>>>>>>>
>>>>>>>>>> On 4/6/2018 9:51 PM, Jini George wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello!
>>>>>>>>>>>
>>>>>>>>>>> Requesting reviews for:
>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8174994
>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/
>>>>>>>>>>>
>>>>>>>>>>> While trying to identify the type given an address, a
>>>>>>>>>>> WrongTypeException was getting thrown with various clhsdb
>>>>>>>>>>> commands (like printmdo, jstack, etc). This was because SA tries
>>>>>>>>>>> to map an address to a hotspot C++ type by comparing the vtable
>>>>>>>>>>> address to the vtable address values of known types. With CDS,
>>>>>>>>>>> since the vtables are copied over for the Metadata classes, the
>>>>>>>>>>> vtable addresses themselves don't match (though, of course,
>>>>>>>>>>> the contents will), and SA errors out.
>>>>>>>>>>>
>>>>>>>>>>> The fix has been implemented by making changes to read in the
>>>>>>>>>>> md region (consisting of the c++ vtables) of the CDS archive in
>>>>>>>>>>> SA, and mapping the vtable addresses to the corresponding
>>>>>>>>>>> metadata type (ConstantPool, InstanceKlass,
>>>>>>>>>>> InstanceClassLoaderKlass, InstanceMirrorKlass,
>>>>>>>>>>> InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass).
>>>>>>>>>>>
>>>>>>>>>>> For corefiles, an additional modification has been done to
>>>>>>>>>>> have the replicated FileMapHeader structure (from
>>>>>>>>>>> src/hotspot/share/memory/filemap.hpp, which is replicated in
>>>>>>>>>>> SA in ps_core.c), to be in sync with the corresponding
>>>>>>>>>>> definition in src/hotspot/share/memory/filemap.hpp.
>>>>>>>>>>>
>>>>>>>>>>> Test cases to test both live and corefile debugging are being
>>>>>>>>>>> added with this. These and other SA tests pass on Mach5.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jini.

From david.holmes at oracle.com Thu Apr 26 04:57:50 2018
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 26 Apr 2018 14:57:50 +1000
Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS
In-Reply-To: <508ee4dc-d391-553c-f98a-d1d939f5d591@oracle.com>
References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> <20c65db9-026e-59b6-6879-2301e69d6598@oracle.com> <6c22f61c-ec62-ccfe-a7c6-147333159b39@oracle.com> <3107d1ca-1272-243b-0c17-6814251eafbf@oracle.com> <75cd047e-0f48-7d41-7ead-c4948770cf7d@oracle.com> <508ee4dc-d391-553c-f98a-d1d939f5d591@oracle.com>
Message-ID: <35da72c7-21de-8eff-840b-e29ce041b1c9@oracle.com>

Thanks Jini, I have no further comments as the tests "passed" for me.

David

On 25/04/2018 1:26 PM, Jini George wrote:
> Thank you very much, David for looking into this.
> I have incorporated all the comments and the revised webrev is at:
>
> http://cr.openjdk.java.net/~jgeorge/8174994/webrev.02/index.html
>
> Thanks,
> Jini.

From jini.george at oracle.com Thu Apr 26 05:43:50 2018
From: jini.george at oracle.com (Jini George)
Date: Thu, 26 Apr 2018 11:13:50 +0530
Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS
In-Reply-To: <35da72c7-21de-8eff-840b-e29ce041b1c9@oracle.com>
References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> <20c65db9-026e-59b6-6879-2301e69d6598@oracle.com> <6c22f61c-ec62-ccfe-a7c6-147333159b39@oracle.com> <3107d1ca-1272-243b-0c17-6814251eafbf@oracle.com> <75cd047e-0f48-7d41-7ead-c4948770cf7d@oracle.com> <508ee4dc-d391-553c-f98a-d1d939f5d591@oracle.com> <35da72c7-21de-8eff-840b-e29ce041b1c9@oracle.com>
Message-ID: <2976b442-4b5b-94f6-398c-c3612075bbaa@oracle.com>

Many thanks, David!

- Jini.

On 4/26/2018 10:27 AM, David Holmes wrote:
> Thanks Jini, I have no further comments as the tests "passed" for me.
> > David > > On 25/04/2018 1:26 PM, Jini George wrote: >> Thank you very much, David for looking into this. I have incorporated >> all the comments and the revised webrev is at: >> >> http://cr.openjdk.java.net/~jgeorge/8174994/webrev.02/index.html >> >> Thanks, >> Jini. >> >> On 4/24/2018 3:29 PM, David Holmes wrote: >>> Hi Jini, >>> >>> Not a full review as I'm not familiar enough with this code. >>> >>> My main comment, again, relates to >>> test/hotspot/jtreg/serviceability/sa/ClhsdbCDSCore.java and that it >>> should not fail (throw Error) if there is no core file generated etc. >>> In that case the test should be skipped with a clear message (as >>> elsewhere). Otherwise this test will fail locally for me every time I >>> run the serviceability tests! >>> >>> I also have a few style issues. >>> >>> Don't compare boolean functions with true or false i.e. >>> >>> if (isX() == true) -> if (isX()) >>> if (isX() == false) -> if (!isX()) >>> >>> this occurs in most of the Java files. It is especially noticeable >>> when you mix styles ie: >>> >>> + if (VM.getVM().isSharingEnabled()) { <= implicit check of true >>> + // Check if the value falls in the _md_region >>> + FileMapInfo cdsFileMapInfo = VM.getVM().getFileMapInfo(); >>> + if (cdsFileMapInfo.inCopiedVtableSpace(loc1) == true) { <= >>> explicit check >>> + return cdsFileMapInfo.getTypeForVptrAddress(loc1); >>> + } >>> + } >>> >>> --- >>> >>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/memory/FileMapInfo.java >>> >>> >>> 139 vTableTypeMap.put >>> 140 (copiedVtableAddress.addOffsetTo(VM.getVM().getAddressSize()), >>> metadataTypeArray[i]); >>> 141 // The '+ 1' below is to skip the entry containing the >>> size of this metadata's vtable. >>> 142 copiedVtableAddress = >>> 143
copiedVtableAddress.addOffsetTo((metadataVTableSize + >>> 1) * VM.getVM().getAddressSize()); >>> >>> If you store VM.getVM().getAddressSize() in a local you only need >>> call it once, and the other lines of code will be shorter. >>> >>> On line 139/140 keep the opening parenthesis with the method name ie: >>> >>> vTableTypeMap.put( >>> >>> but with shorter lines you should be able to reformat that more >>> cleanly anyway. >>> >>> >>> 146 } // FileMapHeader >>> 147 } // FileMapInfo >>> >>> We generally don't comment the end of blocks. >>> >>> --- >>> >>> test/hotspot/jtreg/serviceability/sa/ClhsdbCDSCore.java >>> >>> 96 } catch (Throwable t) { >>> 97 throw new Error("Can't execute the java cds >>> process."); >>> 98 } >>> >>> Set 't' as the cause of the new Error so we can see why it failed. >>> >>> Thanks, >>> David >>> >>> On 24/04/2018 7:03 PM, Jini George wrote: >>>> Hello! >>>> >>>> The webrev including the check for the "|" at the beginning of the >>>> core_pattern file is at: >>>> >>>> http://cr.openjdk.java.net/~jgeorge/8174994/webrev.01/ >>>> >>>> This webrev also includes a fix for a latent bug on MacOSX, where >>>> corefile debugging was failing due to SA trying to read in the >>>> incorrect mangled symbol name for "Arguments::SharedArchivePath". >>>> Clang seems to have prefixed an extra '_' to change the mangled name >>>> from '_ZN9Arguments17SharedArchivePathE' to >>>> '__ZN9Arguments17SharedArchivePathE' for MachO files. This fix for >>>> this is in src/jdk.hotspot.agent/macosx/native/libsaproc/ps_core.c. >>>> >>>> The difference between the earlier patch and this one can be seen at: >>>> >>>> http://cr.openjdk.java.net/~jgeorge/8174994/differential.patch >>>> >>>> Thank you, >>>> Jini. >>>> >>>> >>>> On 4/18/2018 10:37 PM, Jini George wrote: >>>>> I agree with the need of testing as much as we can.
I could do >>>>> something on the lines of how other debuggers like LLDB test: if we >>>>> can't find the core file location, check for "|" at the beginning >>>>> of a line in the /proc/sys/kernel/core_pattern file -- and fail >>>>> with a message stating that the system is using a crash reporting >>>>> tool. >>>>> >>>>> Thank you, >>>>> Jini. >>>>> >>>>> On 4/18/2018 12:40 PM, David Holmes wrote: >>>>>> My 2c ... >>>>>> >>>>>> We have to have tests that can test core file attaching capability >>>>>> - else we don't know it works. So we have to try and generate a >>>>>> core file. >>>>>> >>>>>> But, we have to expect that in many cases no core file will be >>>>>> generated even if the hs-err file claims it was. For example my >>>>>> primary local testing system never generates core files even >>>>>> though it claims to: >>>>>> >>>>>> # Core dump will be written. Default location: Core dumps may be >>>>>> processed with "/usr/share/apport/apport %p %s %c" (or dumping to / >>>>>> export/users/dh198349/valhalla/repos/valhalla-dev/open/test/jdk/core.29848) >>>>>> >>>>>> >>>>>> apport isn't even installed, even though core_pattern lists it. >>>>>> >>>>>> Cheers, >>>>>> David >>>>>> >>>>>> On 18/04/2018 4:36 PM, Yasumasa Suenaga wrote: >>>>>>> 2018-04-18 15:05 GMT+09:00 Jini George : >>>>>>>> Thank you very much, Yasumasa, for pointing this out. You are >>>>>>>> right -- this >>>>>>>> would fail in the Linux systems if systemd-coredump is enabled. >>>>>>>> >>>>>>>> I plan to file an enhancement request to address this issue (wrt >>>>>>>> systemd-coredump) separately since this would apply to other >>>>>>>> coredump >>>>>>>> generating test cases also like: >>>>>>>> test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java. >>>>>>> >>>>>>> I agree with you, but... 
>>>>>>> >>>>>>>> From what I can gather, I think we might be able to at least >>>>>>>> partially >>>>>>>> address this by using >>>>>>>> >>>>>>>> coredumpctl -o dump >>>>>>>> >>>>>>>> in the test cases, provided the kernel.core_pattern variable is >>>>>>>> not set to >>>>>>>> "|/bin/false". >>>>>>>> >>>>>>>> Let me know if you are not OK with this. >>>>>>> >>>>>>> IMHO it is not good. >>>>>>> Some Linux distros use other coredump collector. For example, RHEL 6 >>>>>>> uses ABRT, Ubuntu uses Apport, Fedora uses systemd-coredump. >>>>>>> Hence I think we should disable all tests which requires core images >>>>>>> for Linux like a Windows platform. >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Yasumasa >>>>>>> >>>>>>> >>>>>>>> Thank you, >>>>>>>> Jini. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 4/14/2018 7:39 PM, Yasumasa Suenaga wrote: >>>>>>>>> >>>>>>>>> Hi Jini, >>>>>>>>> >>>>>>>>> ClhsdbCDSCore.java: >>>>>>>>> Can this test work on modern Linux? >>>>>>>>> AFAIK modern Linux contains systemd-coredump to gather core >>>>>>>>> images. So >>>>>>>>> I concern ClhsdbCDSCore.java fails in the future. >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Yasumasa >>>>>>>>> >>>>>>>>> >>>>>>>>> On 2018/04/12 13:21, Jini George wrote: >>>>>>>>>> >>>>>>>>>> Ping: Gentle reminder ! >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Jini. >>>>>>>>>> >>>>>>>>>> On 4/6/2018 9:51 PM, Jini George wrote: >>>>>>>>>>> >>>>>>>>>>> Hello! >>>>>>>>>>> >>>>>>>>>>> Requesting reviews for: >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8174994 >>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/ >>>>>>>>>>> >>>>>>>>>>> While trying to identify the type given an address, a >>>>>>>>>>> WrongTypeException >>>>>>>>>>> was getting thrown with various clhsdb commands (like >>>>>>>>>>> printmdo, jstack, >>>>>>>>>>> etc).
This was since SA tries to map an address to a hotspot >>>>>>>>>>> C++ type by >>>>>>>>>>> comparing the vtable address to the vtable address values of >>>>>>>>>>> known >>>>>>>>>>> types. With CDS, since the vtables are copied over for the >>>>>>>>>>> Metadata >>>>>>>>>>> classes, the vtable addresses themselves don't match (though, >>>>>>>>>>> of course, >>>>>>>>>>> the contents will), and SA errors out. >>>>>>>>>>> >>>>>>>>>>> The fix has been implemented by making changes to read in the >>>>>>>>>>> md region >>>>>>>>>>> (consisting of the c++ vtables) of the CDS archive in SA, and >>>>>>>>>>> mapping >>>>>>>>>>> the vtable addresses to the corresponding metadata type >>>>>>>>>>> (ConstantPool, >>>>>>>>>>> InstanceKlass, InstanceClassLoaderKlass, InstanceMirrorKlass, >>>>>>>>>>> InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass). >>>>>>>>>>> >>>>>>>>>>> For corefiles, an additional modification has been done to >>>>>>>>>>> have the >>>>>>>>>>> replicated FileMapHeader structure (from >>>>>>>>>>> src/hotspot/share/memory/filemap.hpp, which is replicated in >>>>>>>>>>> SA in >>>>>>>>>>> ps_core.c), to be in sync with the corresponding definition in >>>>>>>>>>> src/hotspot/share/memory/filemap.hpp. >>>>>>>>>>> >>>>>>>>>>> Test cases to test both live and corefile debugging are being >>>>>>>>>>> added with >>>>>>>>>>> this. These and other SA tests pass on Mach5. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Jini. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>> From yasuenag at gmail.com Thu Apr 26 06:11:32 2018 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Thu, 26 Apr 2018 15:11:32 +0900 Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS In-Reply-To: References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> <20c65db9-026e-59b6-6879-2301e69d6598@oracle.com> <6c22f61c-ec62-ccfe-a7c6-147333159b39@oracle.com> <3107d1ca-1272-243b-0c17-6814251eafbf@oracle.com> <75cd047e-0f48-7d41-7ead-c4948770cf7d@oracle.com> <508ee4dc-d391-553c-f98a-d1d939f5d591@oracle.com> Message-ID: Hi Jini, I have no further comment. Yasumasa 2018-04-26 13:21 GMT+09:00 Jini George : > Thank you, Yasumasa. I hope to implement the consolidation with > TestSAServer.java (and have the SA core file debug testing template done) as > a part of a separate enhancement: > https://bugs.openjdk.java.net/browse/JDK-8202297 > > Let me know if this is not OK with you. > > Thanks, > Jini. > > > On 4/25/2018 6:14 PM, Yasumasa Suenaga wrote: >> >> Hi Jini, >> >> >>>>>>>> 2018-04-18 15:05 GMT+09:00 Jini George : >> >> >> : >> >>>>>>>>> I plan to file an enhancement request to address this issue (wrt >>>>>>>>> systemd-coredump) separately since this would apply to other >>>>>>>>> coredump >>>>>>>>> generating test cases also like: >>>>>>>>> test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java. >> >> >> >> I guessed the tests for coredumps will be handled in another issue (with >> TestSAServer.java). >> Is it okay to implement coredump test in this changeset? >> >> IMHO, it looks to implement as new test basis (e.g. LingeredAppForCoredump >> - ulimit check, set, etc...). >> >> >> Thanks, >> >> Yasumasa >> >> >> >> On 2018/04/25 12:26, Jini George wrote: >>> >>> Thank you very much, David for looking into this. 
I have incorporated all >>> the comments and the revised webrev is at: >>> >>> http://cr.openjdk.java.net/~jgeorge/8174994/webrev.02/index.html >>> >>> Thanks, >>> Jini. >>> >>> On 4/24/2018 3:29 PM, David Holmes wrote: >>>> >>>> Hi Jini, >>>> >>>> Not a full review as I'm not familiar enough with this code. >>>> >>>> My main comment, again, relates to >>>> test/hotspot/jtreg/serviceability/sa/ClhsdbCDSCore.java and that it should >>>> not fail (throw Error) if there is no core file generated etc. In that case >>>> the test should be skipped with a clear message (as elsewhere). Otherwise >>>> this test will fail locally for me every time I run the serviceability >>>> tests! >>>> >>>> I also have a few style issues. >>>> >>>> Don't compare boolean functions with true or false i.e. >>>> >>>> if (isX() == true) -> if (isX()) >>>> if (isX() == false) -> if (!isX()) >>>> >>>> this occurs in most of the Java files. It is especially noticeable when >>>> you mix styles ie: >>>> >>>> + if (VM.getVM().isSharingEnabled()) { <= implicit check of true >>>> + // Check if the value falls in the _md_region >>>> + FileMapInfo cdsFileMapInfo = VM.getVM().getFileMapInfo(); >>>> + if (cdsFileMapInfo.inCopiedVtableSpace(loc1) == true) { <= >>>> explicit check >>>> + return cdsFileMapInfo.getTypeForVptrAddress(loc1); >>>> + } >>>> + } >>>> >>>> --- >>>> >>>> >>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/memory/FileMapInfo.java >>>> >>>> 139 vTableTypeMap.put >>>> 140 (copiedVtableAddress.addOffsetTo(VM.getVM().getAddressSize()), >>>> metadataTypeArray[i]); >>>> 141 // The '+ 1' below is to skip the entry containing the >>>> size of this metadata's vtable. >>>> 142 copiedVtableAddress = >>>> 143 copiedVtableAddress.addOffsetTo((metadataVTableSize + 1) >>>> * VM.getVM().getAddressSize()); >>>> >>>> If you store VM.getVM().getAddressSize() in a local you only need call >>>> it once, and the other lines of code will be shorter. 
>>>> >>>> On line 139/140 keep the opening parenthesis with the method name ie: >>>> >>>> vTableTypeMap.put( >>>> >>>> but with shorter lines you should be able to reformat that more cleanly >>>> anyway. >>>> >>>> >>>> 146 } // FileMapHeader >>>> 147 } // FileMapInfo >>>> >>>> We generally don't comment the end of blocks. >>>> >>>> --- >>>> >>>> test/hotspot/jtreg/serviceability/sa/ClhsdbCDSCore.java >>>> >>>> 96 } catch (Throwable t) { >>>> 97 throw new Error("Can't execute the java cds >>>> process."); >>>> 98 } >>>> >>>> Set 't' as the cause of the new Error so we can see why it failed. >>>> >>>> Thanks, >>>> David >>>> >>>> On 24/04/2018 7:03 PM, Jini George wrote: >>>>> >>>>> Hello! >>>>> >>>>> The webrev including the check for the "|" at the beginning of the >>>>> core_pattern file is at: >>>>> >>>>> http://cr.openjdk.java.net/~jgeorge/8174994/webrev.01/ >>>>> >>>>> This webrev also includes a fix for a latent bug on MacOSX, where >>>>> corefile debugging was failing due to SA trying to read in the incorrect >>>>> mangled symbol name for "Arguments::SharedArchivePath". Clang seems to have >>>>> prefixed an extra '_' to change the mangled name from >>>>> '_ZN9Arguments17SharedArchivePathE' to '__ZN9Arguments17SharedArchivePathE' >>>>> for MachO files. This fix for this is in >>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/ps_core.c. >>>>> >>>>> The difference between the earlier patch and this one can be seen at: >>>>> >>>>> http://cr.openjdk.java.net/~jgeorge/8174994/differential.patch >>>>> >>>>> Thank you, >>>>> Jini. >>>>> >>>>> >>>>> On 4/18/2018 10:37 PM, Jini George wrote: >>>>>> >>>>>> I agree with the need of testing as much as we can. I could do >>>>>> something on the lines of how other debuggers like LLDB test: if we can't >>>>>> find the core file location, check for "|" at the beginning of a line in the >>>>>> /proc/sys/kernel/core_pattern file -- and fail with a message stating that >>>>>> the system is using a crash reporting tool. 
>>>>>> >>>>>> Thank you, >>>>>> Jini. >>>>>> >>>>>> On 4/18/2018 12:40 PM, David Holmes wrote: >>>>>>> >>>>>>> My 2c ... >>>>>>> >>>>>>> We have to have tests that can test core file attaching capability - >>>>>>> else we don't know it works. So we have to try and generate a core file. >>>>>>> >>>>>>> But, we have to expect that in many cases no core file will be >>>>>>> generated even if the hs-err file claims it was. For example my primary >>>>>>> local testing system never generates core files even though it claims to: >>>>>>> >>>>>>> # Core dump will be written. Default location: Core dumps may be >>>>>>> processed with "/usr/share/apport/apport %p %s %c" (or dumping to / >>>>>>> >>>>>>> export/users/dh198349/valhalla/repos/valhalla-dev/open/test/jdk/core.29848) >>>>>>> >>>>>>> apport isn't even installed, even though core_pattern lists it. >>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> >>>>>>> On 18/04/2018 4:36 PM, Yasumasa Suenaga wrote: >>>>>>>> >>>>>>>> 2018-04-18 15:05 GMT+09:00 Jini George : >>>>>>>>> >>>>>>>>> Thank you very much, Yasumasa, for pointing this out. You are right >>>>>>>>> -- this >>>>>>>>> would fail in the Linux systems if systemd-coredump is enabled. >>>>>>>>> >>>>>>>>> I plan to file an enhancement request to address this issue (wrt >>>>>>>>> systemd-coredump) separately since this would apply to other >>>>>>>>> coredump >>>>>>>>> generating test cases also like: >>>>>>>>> test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java. >>>>>>>> >>>>>>>> >>>>>>>> I agree with you, but... >>>>>>>> >>>>>>>>> From what I can gather, I think we might be able to at least >>>>>>>>> partially >>>>>>>>> address this by using >>>>>>>>> >>>>>>>>> coredumpctl -o dump >>>>>>>>> >>>>>>>>> in the test cases, provided the kernel.core_pattern variable is not >>>>>>>>> set to >>>>>>>>> "|/bin/false". >>>>>>>>> >>>>>>>>> Let me know if you are not OK with this. >>>>>>>> >>>>>>>> >>>>>>>> IMHO it is not good.
>>>>>>>> Some Linux distros use other coredump collector. For example, RHEL 6 >>>>>>>> uses ABRT, Ubuntu uses Apport, Fedora uses systemd-coredump. >>>>>>>> Hence I think we should disable all tests which requires core images >>>>>>>> for Linux like a Windows platform. >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Yasumasa >>>>>>>> >>>>>>>> >>>>>>>>> Thank you, >>>>>>>>> Jini. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 4/14/2018 7:39 PM, Yasumasa Suenaga wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Jini, >>>>>>>>>> >>>>>>>>>> ClhsdbCDSCore.java: >>>>>>>>>> Can this test work on modern Linux? >>>>>>>>>> AFAIK modern Linux contains systemd-coredump to gather core >>>>>>>>>> images. So >>>>>>>>>> I concern ClhsdbCDSCore.java fails in the future. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Yasumasa >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 2018/04/12 13:21, Jini George wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Ping: Gentle reminder ! >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Jini. >>>>>>>>>>> >>>>>>>>>>> On 4/6/2018 9:51 PM, Jini George wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hello! >>>>>>>>>>>> >>>>>>>>>>>> Requesting reviews for: >>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8174994 >>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/ >>>>>>>>>>>> >>>>>>>>>>>> While trying to identify the type given an address, a >>>>>>>>>>>> WrongTypeException >>>>>>>>>>>> was getting thrown with various clhsdb commands (like printmdo, >>>>>>>>>>>> jstack, >>>>>>>>>>>> etc). This was since SA tries to map an address to a hotspot C++ >>>>>>>>>>>> type by >>>>>>>>>>>> comparing the vtable address to the vtable address values of >>>>>>>>>>>> known >>>>>>>>>>>> types. With CDS, since the vtables are copied over for the >>>>>>>>>>>> Metadata >>>>>>>>>>>> classes, the vtable addresses themselves don't match (though, of >>>>>>>>>>>> course, >>>>>>>>>>>> the contents will), and SA errors out. 
>>>>>>>>>>>> >>>>>>>>>>>> The fix has been implemented by making changes to read in the md >>>>>>>>>>>> region >>>>>>>>>>>> (consisting of the c++ vtables) of the CDS archive in SA, and >>>>>>>>>>>> mapping >>>>>>>>>>>> the vtable addresses to the corresponding metadata type >>>>>>>>>>>> (ConstantPool, >>>>>>>>>>>> InstanceKlass, InstanceClassLoaderKlass, InstanceMirrorKlass, >>>>>>>>>>>> InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass). >>>>>>>>>>>> >>>>>>>>>>>> For corefiles, an additional modification has been done to have >>>>>>>>>>>> the >>>>>>>>>>>> replicated FileMapHeader structure (from >>>>>>>>>>>> src/hotspot/share/memory/filemap.hpp, which is replicated in SA >>>>>>>>>>>> in >>>>>>>>>>>> ps_core.c), to be in sync with the corresponding definition in >>>>>>>>>>>> src/hotspot/share/memory/filemap.hpp. >>>>>>>>>>>> >>>>>>>>>>>> Test cases to test both live and corefile debugging are being >>>>>>>>>>>> added with >>>>>>>>>>>> this. These and other SA tests pass on Mach5. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Jini. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> > From jini.george at oracle.com Thu Apr 26 06:32:29 2018 From: jini.george at oracle.com (Jini George) Date: Thu, 26 Apr 2018 12:02:29 +0530 Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS In-Reply-To: References: <93a055d6-133c-f92f-3408-368eea959326@oracle.com> <9b3dd876-8c11-96c2-2e01-62f9658dd6cc@oracle.com> <20c65db9-026e-59b6-6879-2301e69d6598@oracle.com> <6c22f61c-ec62-ccfe-a7c6-147333159b39@oracle.com> <3107d1ca-1272-243b-0c17-6814251eafbf@oracle.com> <75cd047e-0f48-7d41-7ead-c4948770cf7d@oracle.com> <508ee4dc-d391-553c-f98a-d1d939f5d591@oracle.com> Message-ID: Thank you, Yasumasa. - Jini. On 4/26/2018 11:41 AM, Yasumasa Suenaga wrote: > Hi Jini, > > I have no further comment. > > > Yasumasa > > > > 2018-04-26 13:21 GMT+09:00 Jini George : >> Thank you, Yasumasa. 
I hope to implement the consolidation with >> TestSAServer.java (and have the SA core file debug testing template done) as >> a part of a separate enhancement: >> https://bugs.openjdk.java.net/browse/JDK-8202297 >> >> Let me know if this is not OK with you. >> >> Thanks, >> Jini. >> >> >> On 4/25/2018 6:14 PM, Yasumasa Suenaga wrote: >>> >>> Hi Jini, >>> >>> >>>>>>>>> 2018-04-18 15:05 GMT+09:00 Jini George : >>> >>> >>> : >>> >>>>>>>>>> I plan to file an enhancement request to address this issue (wrt >>>>>>>>>> systemd-coredump) separately since this would apply to other >>>>>>>>>> coredump >>>>>>>>>> generating test cases also like: >>>>>>>>>> test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java. >>> >>> >>> >>> I guessed the tests for coredumps will be handled in another issue (with >>> TestSAServer.java). >>> Is it okay to implement coredump test in this changeset? >>> >>> IMHO, it looks to implement as new test basis (e.g. LingeredAppForCoredump >>> - ulimit check, set, etc...). >>> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>> >>> On 2018/04/25 12:26, Jini George wrote: >>>> >>>> Thank you very much, David for looking into this. I have incorporated all >>>> the comments and the revised webrev is at: >>>> >>>> http://cr.openjdk.java.net/~jgeorge/8174994/webrev.02/index.html >>>> >>>> Thanks, >>>> Jini. >>>> >>>> On 4/24/2018 3:29 PM, David Holmes wrote: >>>>> >>>>> Hi Jini, >>>>> >>>>> Not a full review as I'm not familiar enough with this code. >>>>> >>>>> My main comment, again, relates to >>>>> test/hotspot/jtreg/serviceability/sa/ClhsdbCDSCore.java and that it should >>>>> not fail (throw Error) if there is no core file generated etc. In that case >>>>> the test should be skipped with a clear message (as elsewhere). Otherwise >>>>> this test will fail locally for me every time I run the serviceability >>>>> tests! >>>>> >>>>> I also have a few style issues. >>>>> >>>>> Don't compare boolean functions with true or false i.e. 
>>>>> >>>>> if (isX() == true) -> if (isX()) >>>>> if (isX() == false) -> if (!isX()) >>>>> >>>>> this occurs in most of the Java files. It is especially noticeable when >>>>> you mix styles ie: >>>>> >>>>> + if (VM.getVM().isSharingEnabled()) { <= implicit check of true >>>>> + // Check if the value falls in the _md_region >>>>> + FileMapInfo cdsFileMapInfo = VM.getVM().getFileMapInfo(); >>>>> + if (cdsFileMapInfo.inCopiedVtableSpace(loc1) == true) { <= >>>>> explicit check >>>>> + return cdsFileMapInfo.getTypeForVptrAddress(loc1); >>>>> + } >>>>> + } >>>>> >>>>> --- >>>>> >>>>> >>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/memory/FileMapInfo.java >>>>> >>>>> 139 vTableTypeMap.put >>>>> 140 (copiedVtableAddress.addOffsetTo(VM.getVM().getAddressSize()), >>>>> metadataTypeArray[i]); >>>>> 141 // The '+ 1' below is to skip the entry containing the >>>>> size of this metadata's vtable. >>>>> 142 copiedVtableAddress = >>>>> 143 copiedVtableAddress.addOffsetTo((metadataVTableSize + 1) >>>>> * VM.getVM().getAddressSize()); >>>>> >>>>> If you store VM.getVM().getAddressSize() in a local you only need call >>>>> it once, and the other lines of code will be shorter. >>>>> >>>>> On line 139/140 keep the opening parenthesis with the method name ie: >>>>> >>>>> vTableTypeMap.put( >>>>> >>>>> but with shorter lines you should be able to reformat that more cleanly >>>>> anyway. >>>>> >>>>> >>>>> 146 } // FileMapHeader >>>>> 147 } // FileMapInfo >>>>> >>>>> We generally don't comment the end of blocks. >>>>> >>>>> --- >>>>> >>>>> test/hotspot/jtreg/serviceability/sa/ClhsdbCDSCore.java >>>>> >>>>> 96 } catch (Throwable t) { >>>>> 97 throw new Error("Can't execute the java cds >>>>> process."); >>>>> 98 } >>>>> >>>>> Set 't' as the cause of the new Error so we can see why it failed. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 24/04/2018 7:03 PM, Jini George wrote: >>>>>> >>>>>> Hello! 
>>>>>> >>>>>> The webrev including the check for the "|" at the beginning of the >>>>>> core_pattern file is at: >>>>>> >>>>>> http://cr.openjdk.java.net/~jgeorge/8174994/webrev.01/ >>>>>> >>>>>> This webrev also includes a fix for a latent bug on MacOSX, where >>>>>> corefile debugging was failing due to SA trying to read in the incorrect >>>>>> mangled symbol name for "Arguments::SharedArchivePath". Clang seems to have >>>>>> prefixed an extra '_' to change the mangled name from >>>>>> '_ZN9Arguments17SharedArchivePathE' to '__ZN9Arguments17SharedArchivePathE' >>>>>> for MachO files. This fix for this is in >>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/ps_core.c. >>>>>> >>>>>> The difference between the earlier patch and this one can be seen at: >>>>>> >>>>>> http://cr.openjdk.java.net/~jgeorge/8174994/differential.patch >>>>>> >>>>>> Thank you, >>>>>> Jini. >>>>>> >>>>>> >>>>>> On 4/18/2018 10:37 PM, Jini George wrote: >>>>>>> >>>>>>> I agree with the need of testing as much as we can. I could do >>>>>>> something on the lines of how other debuggers like LLDB test: if we can't >>>>>>> find the core file location, check for "|" at the beginning of a line in the >>>>>>> /proc/sys/kernel/core_pattern file -- and fail with a message stating that >>>>>>> the system is using a crash reporting tool. >>>>>>> >>>>>>> Thank you, >>>>>>> Jini. >>>>>>> >>>>>>> On 4/18/2018 12:40 PM, David Holmes wrote: >>>>>>>> >>>>>>>> My 2c ... >>>>>>>> >>>>>>>> We have to have tests that can test core file attaching capability - >>>>>>>> else we don't know it works. So we have to try and generate a core file. >>>>>>>> >>>>>>>> But, we have to expect that in many cases no core file will be >>>>>>>> generated even if the hs-err file claims it was. For example my primary >>>>>>>> local testing system never generates core files even though it claims to: >>>>>>>> >>>>>>>> # Core dump will be written. 
Default location: Core dumps may be >>>>>>>> processed with "/usr/share/apport/apport %p %s %c" (or dumping to / >>>>>>>> >>>>>>>> export/users/dh198349/valhalla/repos/valhalla-dev/open/test/jdk/core.29848) >>>>>>>> >>>>>>>> apport isn't even installed, even though core_pattern lists it. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> >>>>>>>> On 18/04/2018 4:36 PM, Yasumasa Suenaga wrote: >>>>>>>>> >>>>>>>>> 2018-04-18 15:05 GMT+09:00 Jini George : >>>>>>>>>> >>>>>>>>>> Thank you very much, Yasumasa, for pointing this out. You are right >>>>>>>>>> -- this >>>>>>>>>> would fail in the Linux systems if systemd-coredump is enabled. >>>>>>>>>> >>>>>>>>>> I plan to file an enhancement request to address this issue (wrt >>>>>>>>>> systemd-coredump) separately since this would apply to other >>>>>>>>>> coredump >>>>>>>>>> generating test cases also like: >>>>>>>>>> test/hotspot/jtreg/compiler/ciReplay/TestSAServer.java. >>>>>>>>> >>>>>>>>> >>>>>>>>> I agree with you, but... >>>>>>>>> >>>>>>>>>> From what I can gather, I think we might be able to at least >>>>>>>>>> partially >>>>>>>>>> address this by using >>>>>>>>>> >>>>>>>>>> coredumpctl -o dump >>>>>>>>>> >>>>>>>>>> in the test cases, provided the kernel.core_pattern variable is not >>>>>>>>>> set to >>>>>>>>>> "|/bin/false". >>>>>>>>>> >>>>>>>>>> Let me know if you are not OK with this. >>>>>>>>> >>>>>>>>> >>>>>>>>> IMHO it is not good. >>>>>>>>> Some Linux distros use other coredump collector. For example, RHEL 6 >>>>>>>>> uses ABRT, Ubuntu uses Apport, Fedora uses systemd-coredump. >>>>>>>>> Hence I think we should disable all tests which requires core images >>>>>>>>> for Linux like a Windows platform. >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Yasumasa >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thank you, >>>>>>>>>> Jini.
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 4/14/2018 7:39 PM, Yasumasa Suenaga wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Jini, >>>>>>>>>>> >>>>>>>>>>> ClhsdbCDSCore.java: >>>>>>>>>>> Can this test work on modern Linux? >>>>>>>>>>> AFAIK modern Linux contains systemd-coredump to gather core >>>>>>>>>>> images. So >>>>>>>>>>> I concern ClhsdbCDSCore.java fails in the future. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Yasumasa >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 2018/04/12 13:21, Jini George wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Ping: Gentle reminder ! >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Jini. >>>>>>>>>>>> >>>>>>>>>>>> On 4/6/2018 9:51 PM, Jini George wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hello! >>>>>>>>>>>>> >>>>>>>>>>>>> Requesting reviews for: >>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8174994 >>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/ >>>>>>>>>>>>> >>>>>>>>>>>>> While trying to identify the type given an address, a >>>>>>>>>>>>> WrongTypeException >>>>>>>>>>>>> was getting thrown with various clhsdb commands (like printmdo, >>>>>>>>>>>>> jstack, >>>>>>>>>>>>> etc). This was since SA tries to map an address to a hotspot C++ >>>>>>>>>>>>> type by >>>>>>>>>>>>> comparing the vtable address to the vtable address values of >>>>>>>>>>>>> known >>>>>>>>>>>>> types. With CDS, since the vtables are copied over for the >>>>>>>>>>>>> Metadata >>>>>>>>>>>>> classes, the vtable addresses themselves don't match (though, of >>>>>>>>>>>>> course, >>>>>>>>>>>>> the contents will), and SA errors out. 
>>>>>>>>>>>>> >>>>>>>>>>>>> The fix has been implemented by making changes to read in the md >>>>>>>>>>>>> region >>>>>>>>>>>>> (consisting of the c++ vtables) of the CDS archive in SA, and >>>>>>>>>>>>> mapping >>>>>>>>>>>>> the vtable addresses to the corresponding metadata type >>>>>>>>>>>>> (ConstantPool, >>>>>>>>>>>>> InstanceKlass, InstanceClassLoaderKlass, InstanceMirrorKlass, >>>>>>>>>>>>> InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass). >>>>>>>>>>>>> >>>>>>>>>>>>> For corefiles, an additional modification has been done to have >>>>>>>>>>>>> the >>>>>>>>>>>>> replicated FileMapHeader structure (from >>>>>>>>>>>>> src/hotspot/share/memory/filemap.hpp, which is replicated in SA >>>>>>>>>>>>> in >>>>>>>>>>>>> ps_core.c), to be in sync with the corresponding definition in >>>>>>>>>>>>> src/hotspot/share/memory/filemap.hpp. >>>>>>>>>>>>> >>>>>>>>>>>>> Test cases to test both live and corefile debugging are being >>>>>>>>>>>>> added with >>>>>>>>>>>>> this. These and other SA tests pass on Mach5. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Jini. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >> From mandy.chung at oracle.com Thu Apr 26 15:39:13 2018 From: mandy.chung at oracle.com (mandy chung) Date: Thu, 26 Apr 2018 23:39:13 +0800 Subject: RFR: JDK-8187498: Add a -Xmanagement flag as syntactic sugar for -Dcom.sun.management.jmxremote.* properties In-Reply-To: <4b0155dc-c5b6-9b19-0afc-3bb83d00ce2a@oracle.com> References: <0fca13c3-1b8f-947c-9041-bdea87b04ba8@oracle.com> <5A687F75.3050404@oracle.com> <05c2e5af-b9d0-0386-4565-ade1179a1d49@oracle.com> <7a18b9d5-3a59-8132-2c4f-ba5de35bfa1d@oracle.com> <0e019151-ef48-b4f4-d253-b9da62aa07c3@oracle.com> <248276b1-1589-561f-0718-7bcb1fd578c7@oracle.com> <40cf194e-4d74-3a34-5eb2-1acc5a5abafb@oracle.com> <87cb309b-8764-193a-0447-8ecb741d308d@oracle.com> <66ad342e-f41f-c01b-e515-8100d108e6ef@oracle.com> <0a13fe7c-b7b0-e5d9-1361-bfea19073574@oracle.com> <0332f3ad-1329-da9a-c6a7-192064fb04bb@oracle.com> <9ac62f66-c14d-951f-1a14-ccf8906ecab4@oracle.com> <5A8C33B6.4040206@oracle.com> <7bf813ed-9390-f9e1-b5df-32a4585c2be3@oracle.com> <31d45eaa-b5cb-7cd7-5311-6eaa70afe914@Oracle.com> <4b0155dc-c5b6-9b19-0afc-3bb83d00ce2a@oracle.com> Message-ID: On 4/23/18 1:20 PM, Harsha Wardhana B wrote: > Hi All, > > After internal discussions, many of the concerns below were addressed > and final spec is published at, > > https://bugs.openjdk.java.net/browse/JDK-8199584 > > Below is the implementation of the above spec. > > http://cr.openjdk.java.net/~hb/8187498/webrev.05/ > src/java.base/share/classes/sun/launcher/resources/launcher.properties 112 \ --start-management-agent option=value[:option=value:....]\n\ option and value should be