From jcbeyler at google.com Mon Apr 2 17:17:03 2018
From: jcbeyler at google.com (JC Beyler)
Date: Mon, 02 Apr 2018 17:17:03 +0000
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To:
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>
Message-ID:

Hi Derek,

I know there were a few things that went in that provoked a merge conflict. I worked on it and got it up to date. Sadly my lack of knowledge makes it a full rebase instead of keeping all the history. However, with a newly cloned jdk/hs you should now be able to use:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/

The change you are referring to was done with the others so perhaps you were unlucky and I forgot it in a webrev and fixed it in another? I don't know but it's been there and I checked, it is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html

I double checked that tlab_end_offset no longer appears in any architecture (as far as I can tell :)).

Thanks for testing and let me know if you run into any other issues!
Jc

On Fri, Mar 30, 2018 at 4:24 PM White, Derek wrote:

> Hi Jc,
>
> I've been having trouble getting your patch to apply correctly. I may have based it on the wrong version.
>
> In any case, I think there's a missing update to macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), where "JavaThread::tlab_end_offset()" should become "JavaThread::tlab_current_end_offset()".
>
> This should correspond to the other port's changes in templateTable_.cpp files.
>
> Thanks!
> - Derek
>
> *From:* hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of *JC Beyler
> *Sent:* Wednesday, March 28, 2018 11:43 AM
> *To:* Erik Österlund
> *Cc:* serviceability-dev at openjdk.java.net; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>
> *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
>
> Hi all,
>
> I've been working on deflaking the tests mostly and the wording in the JVMTI spec.
>
> Here are the two incremental webrevs:
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/
>
> Here is the total webrev:
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/
>
> Here are the notes of this change:
> - Currently the tests pass 100 times in a row; I am working on checking if they pass 1000 times in a row.
> - The default sampling rate is set to 512k. This is what we use internally, and having a default means that to enable the sampling with the default, the user only has to do an enable event/disable event via JVMTI (instead of enable + set sample rate).
> - I deprecated the code that was handling the fast path tlab refill if it happened, since this is now deprecated
>   - Though I saw that Graal is still using it so I have to see what needs to be done there exactly
>
> Finally, using the Dacapo benchmark suite, I noted a 1% overhead when the event system is turned on and the callback to the native agent is just empty. I got a 3% overhead with a 512k sampling rate with the code I put in the native side of my tests.
>
> Thanks and comments are appreciated,
> Jc
>
> On Mon, Mar 19, 2018 at 2:06 PM JC Beyler wrote:
>
> Hi all,
>
> The incremental webrev update is here:
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/
>
> The full webrev is here:
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/
>
> Major change here is:
> - I've removed the heapMonitoring.cpp code in favor of just having the sampling events as per Serguei's request; I still have to do some overhead measurements but the tests prove the concept can work
>   - Most of the tlab code is unchanged, the only major part is that now things get sent off to event collectors when used and enabled.
> > - Added the interpreter collectors to handle interpreter execution > > - Updated the name from SetTlabHeapSampling to SetHeapSampling to be > more generic > > - Added a mutex for the thread sampling so that we can initialize an > internal static array safely > > - Ported the tests from the old system to this new one > > > > I've also updated the JEP and CSR to reflect these changes: > > https://bugs.openjdk.java.net/browse/JDK-8194905 > > https://bugs.openjdk.java.net/browse/JDK-8171119 > > > > In order to make this have some forward progress, I've removed the heap > sampling code entirely and now rely entirely on the event sampling system. > The tests reflect this by using a simplified implementation of what an > agent could do: > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c > > (Search for anything mentioning event_storage). > > > > I have not taken the time to port the whole code we had originally in > heapMonitoring to this. I hesitate only because that code was in C++, I'd > have to port it to C and this is for tests so perhaps what I have now is > good enough? > > > > As far as testing goes, I've ported all the relevant tests and then added > a few: > > - Turning the system on/off > > - Testing using various GCs > > - Testing using the interpreter > > - Testing the sampling rate > > - Testing with objects and arrays > > - Testing with various threads > > > > Finally, as overhead goes, I have the numbers of the system off vs a clean > build and I have 0% overhead, which is what we'd want. This was using the > Dacapo benchmarks. I am now preparing to run a version with the events on > using dacapo and will report back here. 
> > > > Any comments are welcome :) > > Jc > > > > > > > > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler wrote: > > Hi all, > > > > I apologize for the delay but I wanted to add an event system and that > took a bit longer than expected and I also reworked the code to take into > account the deprecation of FastTLABRefill. > > > > This update has four parts: > > > > A) I moved the implementation from Thread to ThreadHeapSampler inside of > Thread. Would you prefer it as a pointer inside of Thread or like this > works for you? Second question would be would you rather have an > association outside of Thread altogether that tries to remember when > threads are live and then we would have something like: > > ThreadHeapSampler::get_sampling_size(this_thread); > > > > I worry about the overhead of this but perhaps it is not too too bad? > > > > B) I also have been working on the Allocation event system that sends out > a notification at each sampled event. This will be practical when wanting > to do something at the allocation point. I'm also looking at if the whole > heapMonitoring code could not reside in the agent code and not in the JDK. > I'm not convinced but I'm talking to Serguei about it to see/assess :) > > - Also added two tests for the new event subsystem > > > > C) Removed the slow_path fields inside the TLAB code since now > FastTLABRefill is deprecated > > > > D) Updated the JVMTI documentation and specification for the methods. > > > > So the incremental webrev is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ > > > > and the full webrev is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 > > > > I believe I have updated the various JIRA issues that track this :) > > > > Thanks for your input, > > Jc > > > > > > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler wrote: > > Hi Erik, > > > > I inlined my answers, which the last one seems to answer Robbin's concerns > about the same thing (adding things to Thread). 
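Editorial sketch of option A from the message above (a sampler kept by value inside the thread object). This is a minimal illustration, not the webrev's code: only the name ThreadHeapSampler and the 512k default come from the thread. ToyThread, record_allocation, count_samples and the fixed (non-randomized) countdown are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-thread sampler discussed above.
// It keeps a countdown of bytes left until the next sampled allocation.
class ThreadHeapSampler {
 public:
  explicit ThreadHeapSampler(std::size_t interval)
      : _interval(interval), _bytes_until_sample(interval) {
    assert(interval > 0 && "sampling interval must be positive");
  }

  // Returns true when this allocation uses up the remaining countdown.
  bool record_allocation(std::size_t bytes) {
    if (bytes < _bytes_until_sample) {
      _bytes_until_sample -= bytes;
      return false;
    }
    _bytes_until_sample = _interval;  // re-arm the countdown (assumed fixed interval)
    return true;
  }

  std::size_t bytes_until_sample() const { return _bytes_until_sample; }

 private:
  std::size_t _interval;
  std::size_t _bytes_until_sample;
};

// Toy thread that embeds the sampler as a value object, so the sampling
// state lives with the thread without adding loose fields to it.
struct ToyThread {
  ThreadHeapSampler sampler{512 * 1024};
};

// Counts how many of `allocations` equally sized allocations get sampled.
int count_samples(ToyThread& thread, std::size_t alloc_size, int allocations) {
  int samples = 0;
  for (int i = 0; i < allocations; i++) {
    if (thread.sampler.record_allocation(alloc_size)) {
      samples++;
    }
  }
  return samples;
}
```

With 64 KB allocations and the 512k interval, every eighth allocation trips the countdown in this toy model. Keeping the state in one small class is what makes it possible to later move it out of the thread, or behind a pointer, without touching callers.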
> > > > On Wed, Feb 14, 2018 at 2:51 AM, Erik ?sterlund > wrote: > > Hi JC, > > Comments are inlined below. > > > > On 2018-02-13 06:18, JC Beyler wrote: > > Hi Erik, > > > > Thanks for your answers, I've now inlined my own answers/comments. > > > > I've done a new webrev here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ > > > > The incremental is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ > > > > Note to all: > > - I've been integrating changes from Erin/Serguei/David comments so this > webrev incremental is a bit an answer to all comments in one. I apologize > for that :) > > > > > > On Mon, Feb 12, 2018 at 6:05 AM, Erik ?sterlund > wrote: > > Hi JC, > > Sorry for the delayed reply. > > Inlined answers: > > > > On 2018-02-06 00:04, JC Beyler wrote: > > Hi Erik, > > (Renaming this to be folded into the newly renamed thread :)) > > First off, thanks a lot for reviewing the webrev! I appreciate it! > > I updated the webrev to: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ > > And the incremental one is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ > > It contains: > - The change for since from 9 to 11 for the jvmti.xml > - The use of the OrderAccess for initialized > - Clearing the oop > > I also have inlined my answers to your comments. The biggest question > will come from the multiple *_end variables. A bit of the logic there > is due to handling the slow path refill vs fast path refill and > checking that the rug was not pulled underneath the slowpath. I > believe that a previous comment was that TlabFastRefill was going to > be deprecated. > > If this is true, we could revert this code a bit and just do a : if > TlabFastRefill is enabled, disable this. And then deprecate that when > TlabFastRefill is deprecated. 
> > This might simplify this webrev and I can work on a follow-up that > either: removes TlabFastRefill if Robbin does not have the time to do > it or add the support to the assembly side to handle this correctly. > What do you think? > > > > I support removing TlabFastRefill, but I think it is good to not depend on > that happening first. > > > > > I'm slowly pushing on the FastTLABRefill (https://bugs.openjdk.java.net/browse/JDK-8194084), > I agree on keeping both separate for now though so that we can think of > both differently > > > > > > Now, below, inlined are my answers: > > On Fri, Feb 2, 2018 at 8:44 AM, Erik ?sterlund > wrote: > > Hi JC, > > Hope I am reviewing the right version of your work. Here goes... > > src/hotspot/share/gc/shared/collectedHeap.inline.hpp: > > 159 AllocTracer::send_allocation_outside_tlab(klass, result, size * > HeapWordSize, THREAD); > 160 > 161 THREAD->tlab().handle_sample(THREAD, result, size); > 162 return result; > 163 } > > Should not call tlab()->X without checking if (UseTLAB) IMO. > > Done! > > > More about this later. > > > > > > src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: > > So first of all, there seems to quite a few ends. There is an "end", a > "hard > end", a "slow path end", and an "actual end". Moreover, it seems like the > "hard end" is actually further away than the "actual end". So the "hard > end" > seems like more of a "really definitely actual end" or something. I don't > know about you, but I think it looks kind of messy. In particular, I don't > feel like the name "actual end" reflects what it represents, especially > when > there is another end that is behind the "actual end". > > 413 HeapWord* ThreadLocalAllocBuffer::hard_end() { > 414 // Did a fast TLAB refill occur? > 415 if (_slow_path_end != _end) { > 416 // Fix up the actual end to be now the end of this TLAB. 
> 417 _slow_path_end = _end; > 418 _actual_end = _end; > 419 } > 420 > 421 return _actual_end + alignment_reserve(); > 422 } > > I really do not like making getters unexpectedly have these kind of side > effects. It is not expected that when you ask for the "hard end", you > implicitly update the "slow path end" and "actual end" to new values. > > As I said, a lot of this is due to the FastTlabRefill. If I make this > not supporting FastTlabRefill, this goes away. The reason the system > needs to update itself at the get is that you only know at that get if > things have shifted underneath the tlab slow path. I am not sure of > really better names (naming is hard!), perhaps we could do these > names: > > - current_tlab_end // Either the allocated tlab end or a sampling > point > - last_allocation_address // The end of the tlab allocation > - last_slowpath_allocated_end // In case a fast refill occurred the > end might have changed, this is to remember slow vs fast past refills > > the hard_end method can be renamed to something like: > tlab_end_pointer() // The end of the lab including a bit of > alignment reserved bytes > > > > Those names sound better to me. Could you please provide a mapping from > the old names to the new names so I understand which one is which please? > > This is my current guess of what you are proposing: > > end -> current_tlab_end > actual_end -> last_allocation_address > slow_path_end -> last_slowpath_allocated_end > hard_end -> tlab_end_pointer > > > > Yes that is correct, that was what I was proposing. > > > > I would prefer this naming: > > end -> slow_path_end // the end for taking a slow path; either due to > sampling or refilling > actual_end -> allocation_end // the end for allocations > slow_path_end -> last_slow_path_end // last address for slow_path_end (as > opposed to allocation_end) > hard_end -> reserved_end // the end of the reserved space of the TLAB > > About setting things in the getter... 
that still seems like a very > unpleasant thing to me. It would be better to inspect the call hierarchy > and explicitly update the ends where they need updating, and assert in the > getter that they are in sync, rather than implicitly setting various ends > as a surprising side effect in a getter. It looks like the call hierarchy > is very small. With my new naming convention, reserved_end() would > presumably return _allocation_end + alignment_reserve(), and have an assert > checking that _allocation_end == _last_slow_path_allocation_end, > complaining that this invariant must hold, and that a caller to this > function, such as make_parsable(), must first explicitly synchronize the > ends as required, to honor that invariant. > > > > > > > I've renamed the variables to how you preferred it except for the _end > one. I did: > > current_end > > last_allocation_address > > tlab_end_ptr > > > > The reason is that the architecture dependent code use the thread.hpp API > and it already has tlab included into the name so it becomes > tlab_current_end (which is better that tlab_current_tlab_end in my opinion). > > > > I also moved the update into a separate method with a TODO that says to > remove it when FastTLABRefill is deprecated > > > > This looks a lot better now. Thanks. > > Note that the following comment now needs updating accordingly in > threadLocalAllocBuffer.hpp: > > 41 // Heap sampling is performed via the end/actual_end fields. > > 42 // actual_end contains the real end of the tlab allocation, > > 43 // whereas end can be set to an arbitrary spot in the tlab to > > 44 // trip the return and sample the allocation. > > 45 // slow_path_end is used to track if a fast tlab refill occured > > 46 // between slowpath calls. > > There might be other comments too, I have not looked in detail. > > > > This was the only spot that still had an actual_end, I fixed it now. I'll > do a sweep to double check other comments. 
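Editorial sketch of the "explicit update plus asserting getter" shape that Erik asks for above. It is a simplified toy model under stated assumptions: the field names follow Erik's proposed naming, addresses are modelled as plain offsets, and ToyTlab, fast_refill, sync_ends and ends_in_sync are hypothetical names. The exact invariant in the webrev may differ.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the TLAB "ends". Not HotSpot code.
class ToyTlab {
 public:
  ToyTlab(std::size_t end, std::size_t alignment_reserve)
      : _slow_path_end(end),
        _allocation_end(end),
        _last_slow_path_end(end),
        _alignment_reserve(alignment_reserve) {}

  // Simulates a fast-path refill changing the end behind the slow path's back.
  void fast_refill(std::size_t new_end) { _slow_path_end = new_end; }

  // Explicit synchronization step. Callers such as a make_parsable()-like
  // routine would invoke this before asking for reserved_end().
  void sync_ends() {
    if (_last_slow_path_end != _slow_path_end) {
      _last_slow_path_end = _slow_path_end;
      _allocation_end = _slow_path_end;
    }
  }

  bool ends_in_sync() const { return _last_slow_path_end == _slow_path_end; }

  // Pure getter: it updates nothing and only checks the invariant.
  std::size_t reserved_end() const {
    assert(ends_in_sync() && "caller must call sync_ends() first");
    return _allocation_end + _alignment_reserve;
  }

 private:
  std::size_t _slow_path_end;       // end that triggers the slow path
  std::size_t _allocation_end;      // real end for allocations
  std::size_t _last_slow_path_end;  // value of _slow_path_end at the last sync
  std::size_t _alignment_reserve;
};
```

The design point is that a forgotten synchronization now fails loudly in a debug build instead of being silently patched up inside the getter.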
> > > > > > > > > > > > > Not sure it's better but before updating the webrev, I wanted to try > to get input/consensus :) > > (Note hard_end was always further off than end). > > src/hotspot/share/prims/jvmti.xml: > > 10357 > 10358 > 10359 Can sample the heap. > 10360 If this capability is enabled then the heap sampling > methods > can be called. > 10361 > 10362 > > Looks like this capability should not be "since 9" if it gets integrated > now. > > Updated now to 11, crossing my fingers :) > > src/hotspot/share/runtime/heapMonitoring.cpp: > > 448 if (is_alive->do_object_b(value)) { > 449 // Update the oop to point to the new object if it is still > alive. > 450 f->do_oop(&(trace.obj)); > 451 > 452 // Copy the old trace, if it is still live. > 453 _allocated_traces->at_put(curr_pos++, trace); > 454 > 455 // Store the live trace in a cache, to be served up on > /heapz. > 456 _traces_on_last_full_gc->append(trace); > 457 > 458 count++; > 459 } else { > 460 // If the old trace is no longer live, add it to the list of > 461 // recently collected garbage. > 462 store_garbage_trace(trace); > 463 } > > In the case where the oop was not live, I would like it to be explicitly > cleared. > > Done I think how you wanted it. Let me know because I'm not familiar > with the RootAccess API. I'm unclear if I'm doing this right or not so > reviews of these parts are highly appreciated. Robbin had talked of > perhaps later pushing this all into a OopStorage, should I do this now > do you think? Or can that wait a second webrev later down the road? > > > > I think using handles can and should be done later. You can use the Access > API now. > I noticed that you are missing an #include "oops/access.inline.hpp" in > your heapMonitoring.cpp file. > > > > The missing header is there for me so I don't know, I made sure it is > present in the latest webrev. Sorry about that. > > > > + Did I clear it the way you wanted me to or were you thinking of > something else? 
> > > That is precisely how I wanted it to be cleared. Thanks. > > + Final question here, seems like if I were to want to not do the > f->do_oop directly on the trace.obj, I'd need to do something like: > > f->do_oop(&value); > ... > trace->store_oop(value); > > to update the oop internally. Is that right/is that one of the > advantages of going to the Oopstorage sooner than later? > > > I think you really want to do the do_oop on the root directly. Is there a > particular reason why you would not want to do that? > Otherwise, yes - the benefit with using the handle approach is that you do > not need to call do_oop explicitly in your code. > > > > There is no reason except that now we have a load_oop and a get_oop_addr, > I was not sure what you would think of that. > > > > > > That's fine. > > > > > > > Also I see a lot of concurrent-looking use of the following field: > 267 volatile bool _initialized; > > Please note that the "volatile" qualifier does not help with reordering > here. Reordering between volatile and non-volatile fields is completely > free > for both compiler and hardware, except for windows with MSVC, where > volatile > semantics is defined to use acquire/release semantics, and the hardware is > TSO. But for the general case, I would expect this field to be stored with > OrderAccess::release_store and loaded with OrderAccess::load_acquire. > Otherwise it is not thread safe. > > Because everything is behind a mutex, I wasn't really worried about > this. I have a test that has multiple threads trying to hit this > corner case and it passes. > > However, to be paranoid, I updated it to using the OrderAccess API > now, thanks! Let me know what you think there too! > > > If it is indeed always supposed to be read and written under a mutex, then > I would strongly prefer to have it accessed as a normal non-volatile > member, and have an assertion that given lock is held or we are in a > safepoint, as we do in many other places. 
Something like this: > > assert(HeapMonitorStorage_lock->owned_by_self() || > (SafepointSynchronize::is_at_safepoint() && > Thread::current()->is_VM_thread()), "this should not be accessed > concurrently"); > > It would be confusing to people reading the code if there are uses of > OrderAccess that are actually always protected under a mutex. > > > > Thank you for the exact example to be put in the code! I put it around > each access/assignment of the _initialized method and found one case where > yes you can touch it and not have the lock. It actually is "ok" because you > don't act on the storage until later and only when you really want to > modify the storage (see the object_alloc_do_sample method which calls the > add_trace method). > > > > But, because of this, I'm going to put the OrderAccess here, I'll do some > performance numbers later and if there are issues, I might add a "unsafe" > read and a "safe" one to make it explicit to the reader. But I don't think > it will come to that. > > > Okay. This double return in heapMonitoring.cpp looks wrong: > > 283 bool initialized() { > 284 return OrderAccess::load_acquire(&_initialized) != 0; > 285 return _initialized; > 286 } > > Since you said object_alloc_do_sample() is the only place where you do not > hold the mutex while reading initialized(), I had a closer look at that. It > looks like in its current shape, the lack of a mutex may lead to a memory > leak. In particular, it first checks if (initialized()). Let's assume this > is now true. It then allocates a bunch of stuff, and checks if the number > of frames were over 0. If they were, it calls > StackTraceStorage::storage()->add_trace() seemingly hoping that after > grabbing the lock in there, initialized() will still return true. But it > could now return false and skip doing anything, in which case the allocated > stuff will never be freed. > > > > I fixed this now by making add_trace return a boolean and checking for > that. 
It will be in the next webrev. Thanks; the truth is that in our implementation the system is always either on or off, so this never really occurs :). In this version though, that is not true and it's important to handle, so thanks again!
>
> So the analysis seems to be that _initialized is only used outside of the mutex in one instance, where it is used to perform double-checked locking that actually causes a memory leak.
>
> I am not proposing how to fix that, just raising the issue. If you still want to perform this double-checked locking somehow, then the use of acquire/release still seems odd, because its memory ordering restrictions never come into play in this particular case. If they ever did, then the use of destroy_stuff(); release_store(_initialized, 0) would be broken anyway, as that would imply that any concurrent reader, after reading _initialized with load_acquire(), could *never* read the data that is concurrently destroyed anyway. I would be biased to think that RawAccess::load/store looks like a more appropriate solution, given that the memory leak issue is resolved. I do not know how painful it would be to not perform this double-checked locking.
>
> So I agree with this entirely. I also looked a bit more, and the difference and the code really stem from our internal version. In this version, however, there are actually a lot of things going on that I had not gone through entirely in my head, but this comment made me ponder it a bit more.
>
> Since every object_alloc_do_sample is protected by a check to HeapMonitoring::enabled(), there is only a small chance that the call is happening when things have been disabled. So there is no real need to do a first check on initialized; it is a rare occurrence that a call to object_alloc_do_sample happens and the storage's initialized() returns false.
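Editorial sketch of the fix described above: the initialized flag is read only while holding the lock, and add_trace reports whether it took the trace so the caller can release it otherwise. This is a minimal illustration using the C++ standard library rather than HotSpot's Mutex and OrderAccess; ToyTrace, ToyStorage, set_initialized and sample_allocation are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct ToyTrace {
  int frame_count;
};

class ToyStorage {
 public:
  void set_initialized(bool value) {
    std::lock_guard<std::mutex> guard(_lock);
    _initialized = value;
    if (!value) {
      _traces.clear();  // disabling drops everything that was stored
    }
  }

  // Takes ownership of the trace only on success.
  // Returns false, leaving `trace` untouched, when the storage is disabled.
  bool add_trace(std::unique_ptr<ToyTrace>& trace) {
    std::lock_guard<std::mutex> guard(_lock);
    if (!_initialized) {
      return false;
    }
    _traces.push_back(std::move(trace));
    return true;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(_lock);
    return _traces.size();
  }

 private:
  std::mutex _lock;
  bool _initialized = false;  // plain bool: only accessed under _lock
  std::vector<std::unique_ptr<ToyTrace>> _traces;
};

// Sampling path: there is no unlocked pre-check of the flag. The trace is
// built, offered to the storage, and freed here if the storage refuses it.
bool sample_allocation(ToyStorage& storage, int frame_count) {
  if (frame_count <= 0) {
    return false;
  }
  std::unique_ptr<ToyTrace> trace(new ToyTrace{frame_count});
  bool stored = storage.add_trace(trace);
  // When not stored, `trace` still owns the object and releases it on return.
  return stored;
}
```

Because the flag is never read outside the lock in this shape, no acquire/release ordering is needed for it, which matches the conclusion reached in the exchange above.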
> > > > (By the way, even if you did call object_alloc_do_sample without looking > at HeapMonitoring::enabled(), that would be ok too. You would gather the > stacktrace and get nowhere at the add_trace call, which would return false; > so though not optimal performance wise, nothing would break). > > > > Furthermore, the add_trace is really the moment of no return and we have > the mutex lock and then the initialized check. So, in the end, I did two > things: I removed that first check and then I removed the OrderAccess for > the storage initialized. I think now I have a better grasp and > understanding why it was done in our code and why it is not needed here. > Thanks for pointing it out :). This now still passes my JTREG tests, > especially the threaded one. > > > > > > > > > > > > > > > > > As a kind of meta comment, I wonder if it would make sense to add sampling > for non-TLAB allocations. Seems like if someone is rapidly allocating a > whole bunch of 1 MB objects that never fit in a TLAB, I might still be > interested in seeing that in my traces, and not get surprised that the > allocation rate is very high yet not showing up in any profiles. > > That is handled by the handle_sample where you wanted me to put a > UseTlab because you hit that case if the allocation is too big. > > > I see. It was not obvious to me that non-TLAB sampling is done in the TLAB > class. That seems like an abstraction crime. > What I wanted in my previous comment was that we do not call into the TLAB > when we are not using TLABs. If there is sampling logic in the TLAB that is > used for something else than TLABs, then it seems like that logic simply > does not belong inside of the TLAB. It should be moved out of the TLAB, and > instead have the TLAB call this common abstraction that makes sense. > > > > So in the incremental version: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/, this is still > a "crime". 
The reason is that the system has to have the bytes_until_sample > on a per-thread level and it made "sense" to have it with the TLAB > implementation. Also, I was not sure how people felt about adding something > to the thread instance instead. > > > > Do you think it fits better at the Thread level? I can see how difficult > it is to make it happen there and add some logic there. Let me know what > you think. > > > We have an unfortunate situation where everyone that has some fields that > are thread local tend to dump them right into Thread, making the size and > complexity of Thread grow as it becomes tightly coupled with various > unrelated subsystems. It would be desirable to have a separate class for > this instead that encapsulates the sampling logic. That class could > possibly reside in Thread though as a value object of Thread. > > > > I imagined that would be the case but was not sure. I will look at the > example that Robbin is talking about (ThreadSMR) and will see how to > refactor my code to use that. > > > > Thanks again for your help, > > Jc > > > > > > > > > > > Hope I have answered your questions and that my feedback makes sense to > you. > > > > You have and thank you for them, I think we are getting to a cleaner > implementation and things are getting better and more readable :) > > > Yes it is getting better. > > Thanks, > /Erik > > > > > Thanks for your help! > > Jc > > > > > > Thanks, > /Erik > > > > I double checked by changing the test > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java > > to use a smaller Tlab (2048) and made the object bigger and it goes > through that and passes. > > Thanks again for your review and I look forward to your pointers for > the questions I now have raised! 
> Jc > > > > > > > > > Thanks, > /Erik > > > On 2018-01-26 06:45, JC Beyler wrote: > > Thanks Robbin for the reviews :) > > The new full webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/ > The incremental webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/ > > I inlined my answers: > > On Thu, Jan 25, 2018 at 1:15 AM, Robbin Ehn wrote: > > Hi JC, great to see another revision! > > #### > heapMonitoring.cpp > > StackTraceData should not contain the oop for 'safety' reasons. > When StackTraceData is moved from _allocated_traces: > L452 store_garbage_trace(trace); > it contains a dead oop. > _allocated_traces could instead be a tupel of oop and StackTraceData thus > dead oops are not kept. > > Done I used inheritance to make the copier work regardless but the > idea is the same. > > You should use the new Access API for loading the oop, something like > this: > RootAccess::load(...) > I don't think you need to use Access API for clearing the oop, but it > would > look nicer. And you shouldn't probably be using: > Universe::heap()->is_in_reserved(value) > > I am unfamiliar with this but I think I did do it like you wanted me > to (all tests pass so that's a start). I'm not sure how to clear the > oop exactly, is there somewhere that does that, which I can use to do > the same? > > I removed the is_in_reserved, this came from our internal version, I > don't know why it was there but my tests work without so I removed it > :) > > The lock: > L424 MutexLocker mu(HeapMonitorStorage_lock); > Is not needed as far as I can see. > weak_oops_do is called in a safepoint, no TLAB allocation can happen and > JVMTI thread can't access these data-structures. Is there something more > to > this lock that I'm missing? > > Since a thread can call the JVMTI getLiveTraces (or any of the other > ones), it can get to the point of trying to copying the > _allocated_traces. 
I imagine it is possible that this is happening during a GC or that it can be started and a GC happens afterwards. Therefore, it seems to me that you want this protected, no?
>
> ####
> You have 6 files without any changes in them (any more):
> g1CollectedHeap.cpp
> psMarkSweep.cpp
> psParallelCompact.cpp
> genCollectedHeap.cpp
> referenceProcessor.cpp
> thread.hpp
>
> Done.
>
> ####
> I have not looked closely, but is it possible to hide heap sampling in AllocTracer? (with some minor changes to the AllocTracer API)
>
> I am imagining that you are saying to move the code that does the sampling (change the tlab end, do the call to HeapMonitoring, etc.) into the AllocTracer code itself? I think that is right and I'll look if that is possible and prepare a webrev to show what would be needed to make that happen.
>
> ####
> Minor nit: when declaring pointers there is a little mix of having the pointer adjacent to the type name and to the data name. (Most hotspot code is by type name)
> E.g.
> heapMonitoring.cpp:711 jvmtiStackTrace *trace = ....
> heapMonitoring.cpp:733 Method* m = vfst.method();
> (not just this file)
>
> Done!
>
> ####
> HeapMonitorThreadOnOffTest.java:77
> I would make g_tmp volatile, otherwise the assignment in the loop may theoretically be skipped.
>
> Also done!
>
> Thanks again!
> Jc

From serguei.spitsyn at oracle.com Mon Apr 2 18:32:50 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 2 Apr 2018 11:32:50 -0700
Subject: [8u] RFR for backport of "JDK-8165736: Error message should be shown when JVMTI agent cannot be attached" to jdk8u-dev
In-Reply-To: <70a18b4a-a310-babe-1f41-c86100638457@oracle.com>
References: <8c218a37-4a50-4b4f-847b-4c67e02b7866@default> <70a18b4a-a310-babe-1f41-c86100638457@oracle.com>
Message-ID:

Hi Shafi,

I agree with David.
Consider it reviewed if you add the initialization of ebuf. Thanks, Serguei On 3/31/18 00:24, David Holmes wrote: > Hi Shafi, > > On 29/03/2018 7:11 PM, Shafi Ahmad wrote: >> Hi, >> >> Please review the backport of ' JDK-8165736: Error message should be >> shown when JVMTI agent cannot be attached' to jdk8u-dev. >> Please note that this is not a clean backport because we can't not >> backport native jtreg tests as? infrastructure of naive jtreg test >> has been available since JDK 9. > > Ok. > >> webrev: http://cr.openjdk.java.net/~shshahma/8165736/ >> jdk10 bug: https://bugs.openjdk.java.net/browse/JDK-8165736 >> original patch pushed to jdk10: >> http://hg.openjdk.java.net/jdk/jdk/rev/bc1cffa26561 > > src/share/vm/prims/jvmtiExport.cpp > > You missed the initalization of ebuf: > > +? char ebuf[1024] = {0}; > > Otherwise the functional backport seems okay. > > Thanks, > David > >> Test:? Run jprt -testset hotspot, -testset core >> >> Regards, >> Shafi >> From serguei.spitsyn at oracle.com Mon Apr 2 21:44:15 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 2 Apr 2018 14:44:15 -0700 Subject: RFR 4613913: Four EventRequest methods are invokable on deleted request In-Reply-To: References: <579aad5f-fdaa-e0c9-dc16-7bc2394cb82f@oracle.com>

Message-ID: <9d8eb853-3e20-7369-a28d-0323cbc1b6e1@oracle.com> An HTML attachment was scrubbed... URL: From david.holmes at oracle.com Mon Apr 2 22:02:55 2018 From: david.holmes at oracle.com (David Holmes) Date: Tue, 3 Apr 2018 08:02:55 +1000 Subject: RFR 4613913: Four EventRequest methods are invokable on deleted request In-Reply-To: <9d8eb853-3e20-7369-a28d-0323cbc1b6e1@oracle.com> References: <579aad5f-fdaa-e0c9-dc16-7bc2394cb82f@oracle.com>

<9d8eb853-3e20-7369-a28d-0323cbc1b6e1@oracle.com> Message-ID: Hi Serguei, On 3/04/2018 7:44 AM, serguei.spitsyn at oracle.com wrote: > Hi David and Daniil, > > > David, > > Thank you for raising this concern. > You are right. > > I made a mistake when I looked at the EventRequest.isEnabled() spec and > thought > that the following spec lines of setEnabled() belong to isEnabled() > and the other 3 methods as well: > > Throws: > InvalidRequestStateException - if this request has been deleted. > > In fact, the JDI spec for methods isEnabled(), getProperty(), > putProperty() and suspendPolicy() > does not say they can throw the InvalidRequestStateException. > > So, now I'd suggest just relaxing the test checks by not expecting an > InvalidRequestStateException from isEnabled(), getProperty(), putProperty() > and suspendPolicy(). > > Would this approach resolve your concern? Yes. The semantics for these methods were established way back in 2000 under: https://bugs.openjdk.java.net/browse/JDK-4320478 I think this bug, 4613913, was misguided in expecting all of the methods to throw the exception. You could make a case for doing so, but as I said that's a spec change that should have been made back then. Changing the spec now seems pointless - it gains nothing but introduces an incompatible behaviour change. Changing the test is the way to go. Thanks, David ----- > Thanks, > Serguei > > > > > On 3/29/18 17:12, David Holmes wrote: >> Daniil, >> >> Even as far back as 2007 there was concern that changing the current >> behaviour might break existing code. That has to be an even bigger >> concern now! >> >> Further the spec is sloppy here: >> >> " Once the eventRequest is deleted, no operations (for example, >> EventRequest.setEnabled(boolean)) are permitted." >> >> This is too loose. What is an "operation"? Is a query like isEnabled() >> really an "operation"? I would not consider it so. And if we can >> delete requests why is there no "isDeleted" query? 
The spec seems >> incomplete and too vague. >> >> To me this is something that should have been clarified in the spec first >> and then the implementation brought into alignment. But that should >> have happened many years ago. Changing this now seems risky to me. >> >> This change in long standing behaviour also requires a CSR request if >> it is to proceed. >> >> David >> ----- >> >> >> On 30/03/2018 8:36 AM, Daniil Titov wrote: >>> Hi Serguei, >>> >>> Please review a new version of the fix that has these places corrected. >>> >>> Webrev: http://cr.openjdk.java.net/~dtitov/4613913/webrev.03 >>> Bug: https://bugs.openjdk.java.net/browse/JDK-4613913 >>> >>> Thanks! >>> >>> Best regards, >>> Daniil >>> >>> On 3/29/18, 11:46 AM, "serguei.spitsyn at oracle.com" >>> wrote: >>> >>> Hi Daniil, >>> It looks good in general. >>> One minor comment is that it would be nice to make a cleanup >>> (as we already discussed) for all places like this: >>> 202 if (isEnabled() || deleted) { >>> 203 throw invalidState(); >>> 204 } >>> As the isEnabled() now checks for deleted and throws the >>> invalidState() >>> then we can simplify these fragments to be: >>> 202 if (isEnabled()) { >>> 203 throw invalidState(); >>> 204 } >>> Thanks, >>> Serguei >>> On 3/29/18 10:27, Daniil Titov wrote: >>> > Please review the changes that ensure that no operation on >>> deleted com.sun.jdi.request.EventRequest objects are permitted as per >>> JDI specification for >>> com.sun.jdi.request.EventRequestManager.deleteEventRequest(com.sun.jdi.request.EventRequest) >>> method. The fix makes the following 4 methods in class >>> com.sun.tools.jdi.EventRequestManagerImpl$EventRequestImpl throw >>> com.sun.jdi.request.InvalidRequestStateException if the request is >>> deleted: >>> > 
- getProperty() >>> > - putProperty(Object, Object) >>> > - suspendPolicy() >>> > - isEnabled() >>> > >>> > Bug: https://bugs.openjdk.java.net/browse/JDK-4613913 >>> > Webrev: http://cr.openjdk.java.net/~dtitov/4613913/webrev.02/ >>> > >>> > Best regards, >>> > Daniil >>> > >>> > >>> >>> > From serguei.spitsyn at oracle.com Mon Apr 2 22:25:37 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn) Date: Mon, 02 Apr 2018 15:25:37 -0700 Subject: Re: RFR 4613913: Four EventRequest methods are invokable on deleted request Message-ID: Hi David, Somehow I can see your message from my smart phone only... Thank you for confirming that you agree with this approach! Thanks, Serguei Sent from my Verizon Wireless 4G LTE smartphone -------- Original message -------- From: David Holmes Date: 04/02/2018 15:02 (GMT-08:00) To: serguei.spitsyn at oracle.com, Daniil Titov , serviceability-dev at openjdk.java.net Subject: Re: RFR 4613913: Four EventRequest methods are invokable on deleted request Hi Serguei, On 3/04/2018 7:44 AM, serguei.spitsyn at oracle.com wrote: > Hi David and Daniil, > > > David, > > Thank you for raising this concern. > You are right. > > I made a mistake when I looked at the EventRequest.isEnabled() spec and > thought > that the following spec lines of setEnabled() belong to isEnabled() > and the other 3 methods as well: > > Throws: > InvalidRequestStateException - if this request has been deleted. > > In fact, the JDI spec for methods isEnabled(), getProperty(), > putProperty() and suspendPolicy() > does not say they can throw the InvalidRequestStateException. > > So, now I'd suggest just relaxing the test checks by not expecting an > InvalidRequestStateException from isEnabled(), getProperty(), putProperty() > and suspendPolicy(). > > Would this approach resolve your concern? Yes. 
The semantics for these methods was established way back in 2000 under: https://bugs.openjdk.java.net/browse/JDK-4320478 I think this bug, 4613913, was misguided in expecting all of the methods to throw the exception. You could make a case for doing so, but as I said that's a spec change that should have been made back then. Changing the spec now seems pointless - it gains nothing but introduces an incompatible behaviour change. Changing the test is the way to go. Thanks, David ----- > Thanks, > Serguei > > > > > On 3/29/18 17:12, David Holmes wrote: >> Daniil, >> >> Even as far back as 2007 there was concern that changing the current >> behaviour might break existing code. That has to be an even bigger >> concern now! >> >> Further the spec is sloppy here: >> >> " Once the eventRequest is deleted, no operations (for example, >> EventRequest.setEnabled(boolean)) are permitted." >> >> This is too loose. What is an "operation"? Is a query like isEnabled() >> really an "operation"? I would not consider it so. And if we can >> delete requests why is there no "isDeleted" query? The spec seems >> incomplete and too vague. >> >> To me this something that should have been clarified in the spec first >> and then the implementation brought into alignment. But that should >> have happened many years ago. Changing this now seems risky to me. >> >> This change in long standing behaviour also requires a CSR request if >> it is to proceed. >> >> David >> ----- >> >> >> On 30/03/2018 8:36 AM, Daniil Titov wrote: >>> Hi Serguei, >>> >>> Please review a new version of the fix that has these places corrected. >>> >>> Webreb: http://cr.openjdk.java.net/~dtitov/4613913/webrev.03 >>> Bug: https://bugs.openjdk.java.net/browse/JDK-4613913 >>> >>> Thanks! >>> >>> Best regards, >>> Daniil >>> >>> ?On 3/29/18, 11:46 AM, "serguei.spitsyn at oracle.com" >>> wrote: >>> >>> Hi Daniil, >>> It looks good in general. 
>>> One minor comment is that it would be nice to make a cleanup >>> (as we already discussed) for all places like this: >>> 202 if (isEnabled() || deleted) { >>> 203 throw invalidState(); >>> 204 } >>> As the isEnabled() now checks for deleted and throws the >>> invalidState() >>> then we can simplify these fragments to be: >>> 202 if (isEnabled()) { >>> 203 throw invalidState(); >>> 204 } >>> Thanks, >>> Serguei >>> On 3/29/18 10:27, Daniil Titov wrote: >>> > Please review the changes that ensure that no operation on >>> deleted com.sun.jdi.request.EventRequest objects are permitted as per >>> JDI specification for >>> com.sun.jdi.request.EventRequestManager.deleteEventRequest(com.sun.jdi.request.EventRequest) >>> method. The fix makes the following 4 methods in class >>> com.sun.tools.jdi. EventRequestManagerImpl$EventRequestImpl to throw >>> com.sun.jdi.request.InvalidRequestStateException if the request is >>> deleted: >>> > - getProperty() >>> > - putProperty(Object, Object) >>> > - suspendPolicy() >>> > - isEnabled() >>> > >>> > Bug: https://bugs.openjdk.java.net/browse/JDK-4613913 >>> > Webrev: http://cr.openjdk.java.net/~dtitov/4613913/webrev.02/ >>> > >>> > Best regards, >>> > Daniil >>> > >>> > >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From serguei.spitsyn at oracle.com Tue Apr 3 01:52:44 2018 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 2 Apr 2018 18:52:44 -0700 Subject: RFR(xxs): 8200384: jcmd help output should be sorted In-Reply-To: References: Message-ID: <1f24a1d0-9cac-4a77-6023-918cf864a298@oracle.com> Hi Thomas, Added the serviceability-dev mailing list as it is a Serviceability area. The fix looks good to me. One question: ?Could you, please, post the sorted help output? ?It is interesting how does it look like when sorted. 
Thanks, Serguei On 3/28/18 13:08, Thomas Stüfe wrote: > Hi all, > > may I get reviews for this tiny trivial change which causes jcmd help > output (the command list) to be sorted? > > bug: https://bugs.openjdk.java.net/browse/JDK-8200384 > webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8200384-jcmd-help-sorted/webrev.00/webrev/ > > Thanks! > > Best Regards, Thomas From shafi.s.ahmad at oracle.com Tue Apr 3 05:59:44 2018 From: shafi.s.ahmad at oracle.com (Shafi Ahmad) Date: Mon, 2 Apr 2018 22:59:44 -0700 (PDT) Subject: [8u] RFR for backport of "JDK-8165736: Error message should be shown when JVMTI agent cannot be attached" to jdk8u-dev In-Reply-To: References: <8c218a37-4a50-4b4f-847b-4c67e02b7866@default> <70a18b4a-a310-babe-1f41-c86100638457@oracle.com> Message-ID: <18d65438-9a9d-4664-939c-74a2af1da73b@default> Thank you David and Serguei. I have uploaded the webrev for my reference. http://cr.openjdk.java.net/~shshahma/8165736/hotspot.01/ Regards, Shafi > -----Original Message----- > From: Serguei Spitsyn > Sent: Tuesday, April 03, 2018 12:03 AM > To: David Holmes ; Shafi Ahmad > ; serviceability-dev at openjdk.java.net > Cc: Yasumasa Suenaga > Subject: Re: [8u] RFR for backport of "JDK-8165736: Error message should be > shown when JVMTI agent cannot be attached" to jdk8u-dev > > Hi Shafi, > > I agree with David. > Consider it reviewed if you add the initialization of ebuf. > > Thanks, > Serguei > > > On 3/31/18 00:24, David Holmes wrote: > > Hi Shafi, > > > > On 29/03/2018 7:11 PM, Shafi Ahmad wrote: > >> Hi, > >> > >> Please review the backport of 'JDK-8165736: Error message should be > >> shown when JVMTI agent cannot be attached' to jdk8u-dev. > >> Please note that this is not a clean backport because we cannot > >> backport the native jtreg tests, as the infrastructure for native jtreg tests > >> has only been available since JDK 9. > > > > Ok. 
> > > >> webrev: http://cr.openjdk.java.net/~shshahma/8165736/ > >> jdk10 bug: https://bugs.openjdk.java.net/browse/JDK-8165736 > >> original patch pushed to jdk10: > >> http://hg.openjdk.java.net/jdk/jdk/rev/bc1cffa26561 > > > > src/share/vm/prims/jvmtiExport.cpp > > > > You missed the initialization of ebuf: > > > > +  char ebuf[1024] = {0}; > > > > Otherwise the functional backport seems okay. > > > > Thanks, > > David > > > >> Test: Run jprt -testset hotspot, -testset core > >> > >> Regards, > >> Shafi > >> > From amit.sapre at oracle.com Tue Apr 3 10:08:14 2018 From: amit.sapre at oracle.com (Amit Sapre) Date: Tue, 3 Apr 2018 03:08:14 -0700 (PDT) Subject: RFR : JDK-8042215 - javax/management/remote/mandatory/connection/ReconnectTest.java NoSuchObjectException no such object in table Message-ID: <9851f5fa-86e5-4ee3-a303-44a90dd934d6@default> Hello, Please review the changes for the refactored test case. As part of the refactoring, I have: 1) Removed iiop & jmxmp protocol related code 2) Added exception handling during connector connection. Webrev : http://cr.openjdk.java.net/~asapre/webrev/2018/JDK-8042215/webrev.00/ Bug ID : https://bugs.openjdk.java.net/browse/JDK-8042215 Thanks, Amit From yasuenag at gmail.com Tue Apr 3 12:37:21 2018 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Tue, 3 Apr 2018 21:37:21 +0900 Subject: PING: RFR: 8199519: Several GC tests fails with: java.lang.NumberFormatException: Unparseable number: "-" In-Reply-To: <5c1975cd-1080-652e-c23a-abd693cc0095@oracle.com> References: <6755303f-a1a0-da4f-e1e0-a1bcb0c72efd@gmail.com> <7809552d-dfa0-5f26-bd82-c13df7f45f5f@oracle.com> <85853429-a520-1782-40e4-e05776aa639d@oracle.com> <40b04f2e-1d6c-524e-ea4a-08c42fd41ee6@gmail.com>

<93a1ffeb-4959-3bdb-cbe3-510c258129b6@oracle.com> <5c1975cd-1080-652e-c23a-abd693cc0095@oracle.com> Message-ID: <33358f2d-4e01-7ccb-0f06-02b6828fe65b@gmail.com> PING: Could you review it? This change has passed Mach5 testing. >> > http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.03/ Thanks, Yasumasa On 2018/03/28 22:38, Stefan Johansson wrote: > Mach5 testing looks good. > > Can someone in the serviceability team do the second review? > > Cheers, > Stefan > > On 2018-03-28 13:32, Yasumasa Suenaga wrote: >> Thanks Stefan, >> I'm waiting for a second reviewer. >> >> >> Yasumasa >> >> >> On Wed, Mar 28, 2018 at 18:36, Stefan Johansson wrote: >> >> Hi Yasumasa, >> >> Local testing looks good and I've kicked off some additional Mach5 >> testing that will include these tests on all platforms. >> >> Cheers, >> Stefan >> >> On 2018-03-28 06:04, Yasumasa Suenaga wrote: >> > Hi Stefan, >> > >> > Thank you for sharing your report! >> > I could reproduce them on my VM. >> > >> > I've fixed them in a new webrev, and it works fine in my environment. >> > Could you check again? >> > >> > http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.03/ >> > >> > >> > Thanks, >> > >> > Yasumasa >> > >> > >> > >> > 2018-03-28 0:29 GMT+09:00 Stefan Johansson: >> >> >> >> On 2018-03-27 16:44, Yasumasa Suenaga wrote: >> >>> Hi Stefan, >> >>> >> >>> On 2018/03/27 22:45, Stefan Johansson wrote: >> >>>> Hi Yasumasa, >> >>>> >> >>>> On 2018-03-27 10:56, Yasumasa Suenaga wrote: >> >>>>> Hi Stefan, >> >>>>> >> >>>>> Thank you for your comment. >> >>>>> I updated webrev: >> >>>>> >> >>>>>     webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.01/ >> >>>> I think the usage of Optional in Expression.setRequired(bool) is a bit >> >>>> unnecessary. It will create temporary objects and there is no benefit over >> >>>> just doing two simple if-statements. 
>> >>> >> >>> I fixed it in new webrev: >> >>> http://cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.02/ >> >>> >> >>> >> >>>> I also ran this patch (and the one using forcibly) on my single core VM >> >>>> and realized that this fix will have to include some awk-file updates to >> >>>> make the tests in test/jdk/sun/tools/jstat pass when Serial is chosen as the >> >>>> default collector. The tests in test/jdk/sun/tools/jstatd/ are fine. >> >>> >> >>> Can you share the failure report? >> >> It relates to all tests that display the CGC and the CGCT columns, for >> >> example in jstatGCOutput1.sh:
>> >>      S0C     S1C     S0U     S1U      EC      EU      OC      OU      MC      MU    CCSC    CCSU     YGC    YGCT     FGC    FGCT     CGC    CGCT     GCT
>> >>    256.0   256.0   254.0     0.0  2176.0  1025.0  5504.0   920.5  7168.0  6839.7   768.0   602.8       2   0.007       0   0.000       -       -   0.007
>> >> >> >> The awk regex needs to be updated to handle '-' for these tests: >> >> test: sun/tools/jstat/jstatGcCapacityOutput1.sh >> >> Failed. Execution failed: exit code 1 >> >> >> >> test: sun/tools/jstat/jstatGcMetaCapacityOutput1.sh >> >> Failed. Execution failed: exit code 1 >> >> >> >> test: sun/tools/jstat/jstatGcNewCapacityOutput1.sh >> >> Failed. Execution failed: exit code 1 >> >> >> >> test: sun/tools/jstat/jstatGcOldCapacityOutput1.sh >> >> Failed. Execution failed: exit code 1 >> >> >> >> test: sun/tools/jstat/jstatGcOldOutput1.sh >> >> Failed. Execution failed: exit code 1 >> >> >> >> test: sun/tools/jstat/jstatGcOutput1.sh >> >> Failed. Execution failed: exit code 1 >> >> >> >> >> >>> If it occurs in jstatClassloadOutput1.sh, it relates to JDK-8173942. >> >>> >> >>> >> >>> Thanks, >> >>> >> >>> Yasumasa >> >>> >> >>> >> >>>> Thanks, >> >>>> Stefan >> >>>>>     submit-hs: mach5-one-ysuenaga-JDK-8199519-20180327-0652-16322 >> >>>>> >> >>>>> >> >>>>> Thanks, >> >>>>> >> >>>>> Yasumasa >> >>>>> >> >>>>> >> >>>>> >> >>>>> 2018-03-27 0:03 GMT+09:00 Stefan Johansson: >> >>>>>> Hi Yasumasa, >> >>>>>> >> >>>>>> On 2018-03-22 11:35, Yasumasa Suenaga wrote: >> >>>>>>> Hi all, >> >>>>>>> >> >>>>>>> Please review this change: >> >>>>>>> >> >>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8199519 >> >>>>>>> webrev: cr.openjdk.java.net/~ysuenaga/JDK-8199519/webrev.00/ >> >>>>>> The fix seems to make things work as expected. Manually tested it >> >>>>>> and >> >>>>>> Mach5 also looks good. >> >>>>>> >> >>>>>> I have some comments regarding the patch. I think 'forcibly' should be >> >>>>>> renamed to something more descriptive. Naming is never easy but I think >> >>>>>> 'required' would be better, as in, this column is required and not >> >>>>>> allowed >> >>>>>> to print '-'. That would also render the code in >> >>>>>> ExpressionResolver.java to >> >>>>>> be: >> >>>>>>     return new Literal(isRequired ? 0.0d : Double.NaN); >> >>>>>> I think that also better explains why we return 0 instead of NaN. >> >>>>>> >> >>>>>> I would also like to see the forcibly/required state moved into the >> >>>>>> Expression itself, that way we don't have to pass it around but can >> >>>>>> instead >> >>>>>> do: >> >>>>>>     return new Literal(e.isRequired() ? 0.0d : Double.NaN); >> >>>>>> >> >>>>>> Thanks, >> >>>>>> Stefan >> >>>>>> >> >>>>>> >> >>>>>>> After JDK-8153333, some jstat tests fail because GCT in the jstat >> >>>>>>> output >> >>>>>>> is a dash (-) if the garbage collector is not a concurrent collector, e.g. >> >>>>>>> Serial GC. >> >>>>>>> I fixed it so that GCT is calculated correctly. >> >>>>>>> >> >>>>>>> This change has been tested on Mach5 by Stefan. 
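The isRequired() suggestion quoted above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the jstat source: RequiredColumnSketch, its nested Expression class, and the resolve() and format() helpers are hypothetical stand-ins, and the rendering of NaN as '-' is assumed from the discussion rather than taken from the real formatter.

```java
// Hypothetical, simplified stand-ins for the jstat expression classes discussed
// in this thread; none of these names are the real sun.tools.jstat API.
final class RequiredColumnSketch {

    /** A column expression that may or may not be backed by a counter. */
    static final class Expression {
        private final Double counterValue; // null when the counter does not exist
        private boolean required;          // true: must never resolve to NaN ('-')

        Expression(Double counterValue) { this.counterValue = counterValue; }

        boolean isRequired() { return required; }
        void setRequired(boolean required) { this.required = required; }
        Double counterValue() { return counterValue; }
    }

    /** Resolves to the counter value, or to 0.0 (required) / NaN (optional) when it is missing. */
    static double resolve(Expression e) {
        if (e.counterValue() != null) {
            return e.counterValue();
        }
        return e.isRequired() ? 0.0d : Double.NaN;
    }

    /** Assumption: an unavailable value (NaN) is printed as '-'. */
    static String format(double v) {
        return Double.isNaN(v) ? "-" : String.format(java.util.Locale.ROOT, "%.3f", v);
    }

    public static void main(String[] args) {
        Expression ygct = new Expression(0.007);
        Expression fgct = new Expression(0.0);
        Expression cgctColumn = new Expression(null); // shown as its own column
        Expression cgctTerm = new Expression(null);   // used as a term inside the GCT sum
        cgctTerm.setRequired(true);

        System.out.println("CGCT: " + format(resolve(cgctColumn)));
        System.out.println("GCT:  " + format(resolve(ygct) + resolve(fgct) + resolve(cgctTerm)));
    }
}
```

Keeping the flag on the expression means the resolver needs no extra parameter, which is the point of the second suggestion in the review; the missing concurrent-GC term contributes zero to the total instead of turning the whole sum into NaN.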
>> >>>>>>> >> >>>>>>> >> >>>>>>> Thanks, >> >>>>>>> >> >>>>>>> Yasumasa >> >>>>>> >> > From bob.vandette at oracle.com Tue Apr 3 14:09:56 2018 From: bob.vandette at oracle.com (Bob Vandette) Date: Tue, 3 Apr 2018 10:09:56 -0400 Subject: RFR: 81820709 - Container Awareness JEP Message-ID: Here is a first pass at an implementation of the Container Awareness JEP. This JEP adds an implementation of an internal API for the extraction of system metrics for processes running in Isolation Groups (Containers). The plan is to get the internal API integrated in JDK 11 with support for Linux x64 and then follow this work up with support for alternate platforms, the addition of a JMX MBean and Java Flight Recorder. JEP: https://bugs.openjdk.java.net/browse/JDK-8182070 JAVADOC: http://cr.openjdk.java.net/~bobv/8182070/v01/javadoc/jdk/internal/platform/Metrics.html WEBREV: http://cr.openjdk.java.net/~bobv/8182070/v01/webrev WEBREV including a Prototype MBEAN for exposing these Metrics: This prototype will not be integrated as part of this JEP. It's for information only. http://cr.openjdk.java.net/~bobv/8182070/v01/mbean-proto/ This feature adds a new -XshowSettings option, "system", which displays the available system Metrics. 
% java -XshowSettings:system
Operating System Metrics:
    Provider: cgroupv1
    Effective CPU Count: 24
    CPUTime per Processor:
        [0]: 52805305 (ns)
        [1]: 70799492 (ns)
        [2]: 27449618 (ns)
        [3]: 12957734 (ns)
        [4]: 38382720 (ns)
        [5]: 20325731 (ns)
        [6]: 36374924 (ns)
        [7]: 40279640 (ns)
        [8]: 17557347 (ns)
        [9]: 19056675 (ns)
        [10]: 66185888 (ns)
        [11]: 56539480 (ns)
        [12]: 10009386 (ns)
        [13]: 19139797 (ns)
        [14]: 2257349 (ns)
        [15]: 8712468 (ns)
        [16]: 10306911 (ns)
        [17]: 9814800 (ns)
        [18]: 3516611 (ns)
        [19]: 747174 (ns)
        [20]: 4380756 (ns)
        [21]: 11803118 (ns)
        [22]: 1076297 (ns)
        [23]: 8069315 (ns)
    CPU Usage is: 550599580 (ns)
    CPU User Usage is: 36 (ticks)
    CPU System Usage is: 10 (ticks)
    CPU Period: 100000
    CPU Quota: -1
    CPU Shares: -1
    CPU Number of Periods: 0
    CPU Number of Throttled Periods: 0
    CPU Throttled Time: 0
    CPUSet Exclusive: false
    CPUSet Memory Exclusive: false
    List of Processors, 24 total:
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
    List of Effective Processors, 24 total:
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
    List of Memory Nodes, 2 total:
    0 1
    List of Available Memory Nodes, 2 total:
    0 1
    CPUSet Memory Pressure Enabled: false
    CPUSet Memory Pressure: 0.0
    Memory Failed Count: 0
    Memory Limit: Unlimited
    Memory Used: 43.31M
    Max Memory Used: 48.82M
    Memory Soft Limit: Unlimited
    Memory & Swap Failed Count: 0.00K
    Memory & Swap Limit: Unlimited
    Memory & Swap Used: 43.93M
    Max Memory & Swap Used: 48.82M
    Kernel Memory Failed Count: 0.00K
    Kernel Memory Limit: Unlimited
    Kernel Memory Used: 0.00K
    Kernel Max Memory Used: 0.00K
    TCP Memory Failed Count: 0.00K
    TCP Memory Limit: Unlimited
    TCP Memory Used: 0.00K
    TCP Max Memory Used: 0.00K
    Out Of Memory Killer Enabled: true
    BLKIO: Number of I/O Operations Completed: 42
    BLKIO: Bytes Transferred from disk: 4923392

Bob Vandette

From Derek.White at cavium.com Tue Apr 3 22:54:07 2018 From: Derek.White at cavium.com (White, Derek) Date: Tue, 3 Apr 2018 22:54:07 +0000 Subject: JDK-8171119: Low-Overhead Heap Profiling 
In-Reply-To: References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>

Message-ID: Thanks JC, New patch applies cleanly. Compiles and runs (simple test programs) on aarch64. - Derek From: JC Beyler [mailto:jcbeyler at google.com] Sent: Monday, April 02, 2018 1:17 PM To: White, Derek Cc: Erik Österlund; serviceability-dev at openjdk.java.net; hotspot-compiler-dev Subject: Re: JDK-8171119: Low-Overhead Heap Profiling Hi Derek, I know there were a few things that went in that provoked a merge conflict. I worked on it and got it up to date. Sadly my lack of knowledge makes it a full rebase instead of keeping all the history. However, with a newly cloned jdk/hs you should now be able to use: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/ The change you are referring to was done with the others so perhaps you were unlucky and I forgot it in a webrev and fixed it in another? I don't know but it's been there and I checked, it is here: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html I double checked that tlab_end_offset no longer appears in any architecture (as far as I can tell :)). Thanks for testing and let me know if you run into any other issues! Jc On Fri, Mar 30, 2018 at 4:24 PM White, Derek wrote: Hi Jc, I've been having trouble getting your patch to apply correctly. I may have based it on the wrong version. In any case, I think there's a missing update to macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), where "JavaThread::tlab_end_offset()" should become "JavaThread::tlab_current_end_offset()". This should correspond to the other ports' changes in templateTable_.cpp files. Thanks! 
- Derek From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of JC Beyler Sent: Wednesday, March 28, 2018 11:43 AM To: Erik Österlund Cc: serviceability-dev at openjdk.java.net; hotspot-compiler-dev Subject: Re: JDK-8171119: Low-Overhead Heap Profiling Hi all, I've been working on deflaking the tests mostly and the wording in the JVMTI spec. Here are the two incremental webrevs: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ Here is the total webrev: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ Here are the notes of this change: - Currently the tests pass 100 times in a row, I am working on checking if they pass 1000 times in a row. - The default sampling rate is set to 512k, this is what we use internally and having a default means that to enable the sampling with the default, the user only has to do an enable event/disable event via JVMTI (instead of enable + set sample rate). - I deprecated the code that was handling the fast path tlab refill if it happened since this is now deprecated - Though I saw that Graal is still using it so I have to see what needs to be done there exactly Finally, using the Dacapo benchmark suite, I noted a 1% overhead for when the event system is turned on and the callback to the native agent is just empty. I got a 3% overhead with a 512k sampling rate with the code I put in the native side of my tests. 
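To make the "512k sampling rate" above concrete, here is a minimal sketch of interval-based allocation sampling. It rests on several assumptions: AllocationSamplerSketch and its methods are hypothetical names, "512k" is read as 512 KiB, and the interval is fixed for simplicity. It is not the HotSpot implementation, which works by moving the TLAB end pointer as described elsewhere in this thread.

```java
// Hypothetical sketch of interval-based allocation sampling; not HotSpot code.
final class AllocationSamplerSketch {
    private final long intervalBytes; // assumed fixed interval, e.g. 512 KiB
    private long bytesUntilSample;    // countdown to the next sampled allocation
    private long samples;             // number of allocations sampled so far

    AllocationSamplerSketch(long intervalBytes) {
        if (intervalBytes <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.intervalBytes = intervalBytes;
        this.bytesUntilSample = intervalBytes;
    }

    /** Records one allocation and returns true when that allocation is sampled. */
    boolean recordAllocation(long sizeBytes) {
        bytesUntilSample -= sizeBytes;
        if (bytesUntilSample > 0) {
            return false; // cheap path: only a subtraction and a compare
        }
        samples++;
        // Restart the countdown; an allocation spanning several intervals is sampled once.
        bytesUntilSample = intervalBytes;
        return true;
    }

    long samples() { return samples; }

    public static void main(String[] args) {
        AllocationSamplerSketch sampler = new AllocationSamplerSketch(512 * 1024);
        for (int i = 0; i < 100_000; i++) {
            sampler.recordAllocation(64); // pretend every object is 64 bytes
        }
        System.out.println("sampled " + sampler.samples() + " of 100000 allocations");
    }
}
```

The design point the sketch tries to show is that most allocations only pay for a decrement and a comparison, while the expensive callback runs roughly once per interval, which is consistent with the low overhead figures reported in the message.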
Thanks and comments are appreciated, Jc On Mon, Mar 19, 2018 at 2:06 PM JC Beyler > wrote: Hi all, The incremental webrev update is here: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ The full webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ Major change here is: - I've removed the heapMonitoring.cpp code in favor of just having the sampling events as per Serguei's request; I still have to do some overhead measurements but the tests prove the concept can work - Most of the tlab code is unchanged, the only major part is that now things get sent off to event collectors when used and enabled. - Added the interpreter collectors to handle interpreter execution - Updated the name from SetTlabHeapSampling to SetHeapSampling to be more generic - Added a mutex for the thread sampling so that we can initialize an internal static array safely - Ported the tests from the old system to this new one I've also updated the JEP and CSR to reflect these changes: https://bugs.openjdk.java.net/browse/JDK-8194905 https://bugs.openjdk.java.net/browse/JDK-8171119 In order to make this have some forward progress, I've removed the heap sampling code entirely and now rely entirely on the event sampling system. The tests reflect this by using a simplified implementation of what an agent could do: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c (Search for anything mentioning event_storage). I have not taken the time to port the whole code we had originally in heapMonitoring to this. I hesitate only because that code was in C++, I'd have to port it to C and this is for tests so perhaps what I have now is good enough? 
As far as testing goes, I've ported all the relevant tests and then added a few: - Turning the system on/off - Testing using various GCs - Testing using the interpreter - Testing the sampling rate - Testing with objects and arrays - Testing with various threads Finally, as overhead goes, I have the numbers of the system off vs a clean build and I have 0% overhead, which is what we'd want. This was using the Dacapo benchmarks. I am now preparing to run a version with the events on using dacapo and will report back here. Any comments are welcome :) Jc On Thu, Mar 8, 2018 at 4:00 PM JC Beyler > wrote: Hi all, I apologize for the delay but I wanted to add an event system and that took a bit longer than expected and I also reworked the code to take into account the deprecation of FastTLABRefill. This update has four parts: A) I moved the implementation from Thread to ThreadHeapSampler inside of Thread. Would you prefer it as a pointer inside of Thread or like this works for you? Second question would be would you rather have an association outside of Thread altogether that tries to remember when threads are live and then we would have something like: ThreadHeapSampler::get_sampling_size(this_thread); I worry about the overhead of this but perhaps it is not too too bad? B) I also have been working on the Allocation event system that sends out a notification at each sampled event. This will be practical when wanting to do something at the allocation point. I'm also looking at if the whole heapMonitoring code could not reside in the agent code and not in the JDK. I'm not convinced but I'm talking to Serguei about it to see/assess :) - Also added two tests for the new event subsystem C) Removed the slow_path fields inside the TLAB code since now FastTLABRefill is deprecated D) Updated the JVMTI documentation and specification for the methods. 
So the incremental webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ and the full webrev is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 I believe I have updated the various JIRA issues that track this :) Thanks for your input, Jc On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler wrote: Hi Erik, I inlined my answers, the last of which seems to answer Robbin's concerns about the same thing (adding things to Thread). On Wed, Feb 14, 2018 at 2:51 AM, Erik Österlund wrote: Hi JC, Comments are inlined below. On 2018-02-13 06:18, JC Beyler wrote: Hi Erik, Thanks for your answers, I've now inlined my own answers/comments. I've done a new webrev here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ The incremental is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ Note to all: - I've been integrating changes from Erin/Serguei/David comments so this webrev incremental is a bit of an answer to all comments in one. I apologize for that :) On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund wrote: Hi JC, Sorry for the delayed reply. Inlined answers: On 2018-02-06 00:04, JC Beyler wrote: Hi Erik, (Renaming this to be folded into the newly renamed thread :)) First off, thanks a lot for reviewing the webrev! I appreciate it! I updated the webrev to: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ And the incremental one is here: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ It contains: - The change for since from 9 to 11 for the jvmti.xml - The use of the OrderAccess for initialized - Clearing the oop I also have inlined my answers to your comments. The biggest question will come from the multiple *_end variables. A bit of the logic there is due to handling the slow path refill vs fast path refill and checking that the rug was not pulled underneath the slowpath. I believe that a previous comment was that TlabFastRefill was going to be deprecated. 
If this is true, we could revert this code a bit and just do: if TlabFastRefill is enabled, disable this. And then deprecate that when TlabFastRefill is deprecated. This might simplify this webrev and I can work on a follow-up that either removes TlabFastRefill, if Robbin does not have the time to do it, or adds the support to the assembly side to handle this correctly. What do you think? I support removing TlabFastRefill, but I think it is good to not depend on that happening first. I'm slowly pushing on the FastTLABRefill (https://bugs.openjdk.java.net/browse/JDK-8194084), I agree on keeping both separate for now though so that we can think of both differently Now, below, inlined are my answers: On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund wrote: Hi JC, Hope I am reviewing the right version of your work. Here goes... src/hotspot/share/gc/shared/collectedHeap.inline.hpp: 159 AllocTracer::send_allocation_outside_tlab(klass, result, size * HeapWordSize, THREAD); 160 161 THREAD->tlab().handle_sample(THREAD, result, size); 162 return result; 163 } Should not call tlab()->X without checking if (UseTLAB) IMO. Done! More about this later. src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: So first of all, there seem to be quite a few ends. There is an "end", a "hard end", a "slow path end", and an "actual end". Moreover, it seems like the "hard end" is actually further away than the "actual end". So the "hard end" seems like more of a "really definitely actual end" or something. I don't know about you, but I think it looks kind of messy. In particular, I don't feel like the name "actual end" reflects what it represents, especially when there is another end that is behind the "actual end". 413 HeapWord* ThreadLocalAllocBuffer::hard_end() { 414 // Did a fast TLAB refill occur? 415 if (_slow_path_end != _end) { 416 // Fix up the actual end to be now the end of this TLAB. 
 417     _slow_path_end = _end;
 418     _actual_end = _end;
 419   }
 420
 421   return _actual_end + alignment_reserve();
 422 }

I really do not like making getters unexpectedly have this kind of side effect. It is not expected that when you ask for the "hard end", you implicitly update the "slow path end" and "actual end" to new values.

As I said, a lot of this is due to the FastTlabRefill. If I make this not support FastTlabRefill, this goes away. The reason the system needs to update itself at the get is that you only know at that get if things have shifted underneath the tlab slow path. I am not sure of really better names (naming is hard!), perhaps we could do these names:

- current_tlab_end // Either the allocated tlab end or a sampling point
- last_allocation_address // The end of the tlab allocation
- last_slowpath_allocated_end // In case a fast refill occurred the end might have changed, this is to remember slow vs fast path refills

the hard_end method can be renamed to something like:
tlab_end_pointer() // The end of the tlab including a bit of alignment reserved bytes

Those names sound better to me. Could you please provide a mapping from the old names to the new names so I understand which one is which please?

This is my current guess of what you are proposing:

end -> current_tlab_end
actual_end -> last_allocation_address
slow_path_end -> last_slowpath_allocated_end
hard_end -> tlab_end_pointer

Yes that is correct, that was what I was proposing.

I would prefer this naming:

end -> slow_path_end // the end for taking a slow path; either due to sampling or refilling
actual_end -> allocation_end // the end for allocations
slow_path_end -> last_slow_path_end // last address for slow_path_end (as opposed to allocation_end)
hard_end -> reserved_end // the end of the reserved space of the TLAB

About setting things in the getter... that still seems like a very unpleasant thing to me.
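The concern about a getter that quietly repairs state can be sketched in standard C++ as follows. This is only an illustration under stated assumptions: `TlabEnds`, `fast_refill()`, `sync_ends()` and `in_sync()` are hypothetical names, the fields follow the naming proposed above, and the sync condition simply mirrors the `_slow_path_end != _end` test in the quoted hard_end(). It is not the code from any webrev.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the TLAB end bookkeeping; not HotSpot code.
class TlabEnds {
 public:
  TlabEnds(char* end, std::size_t reserve)
      : _slow_path_end(end),
        _allocation_end(end),
        _last_slow_path_end(end),
        _reserve(reserve) {}

  // Simulates a fast refill moving the end behind the slow path's back.
  void fast_refill(char* new_end) { _slow_path_end = new_end; }

  // Explicit synchronization step: callers run this before reserved_end().
  void sync_ends() {
    if (_last_slow_path_end != _slow_path_end) {
      _last_slow_path_end = _slow_path_end;
      _allocation_end = _slow_path_end;
    }
  }

  bool in_sync() const { return _last_slow_path_end == _slow_path_end; }

  // Pure getter: it checks the invariant instead of repairing it.
  char* reserved_end() const {
    assert(in_sync() && "caller must call sync_ends() first");
    return _allocation_end + _reserve;
  }

 private:
  char* _slow_path_end;       // where the fast path currently stops
  char* _allocation_end;      // end usable for allocations
  char* _last_slow_path_end;  // value of _slow_path_end last seen by the slow path
  std::size_t _reserve;       // stand-in for alignment_reserve()
};
```

With this shape, a caller that forgets to synchronize trips the assert in a debug build instead of silently changing the ends.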
It would be better to inspect the call hierarchy and explicitly update the ends where they need updating, and assert in the getter that they are in sync, rather than implicitly setting various ends as a surprising side effect in a getter. It looks like the call hierarchy is very small. With my new naming convention, reserved_end() would presumably return _allocation_end + alignment_reserve(), and have an assert checking that _allocation_end == _last_slow_path_allocation_end, complaining that this invariant must hold, and that a caller to this function, such as make_parsable(), must first explicitly synchronize the ends as required, to honor that invariant.

I've renamed the variables to how you preferred it except for the _end one. I did:

current_end
last_allocation_address
tlab_end_ptr

The reason is that the architecture dependent code uses the thread.hpp API and it already has tlab included in the name, so it becomes tlab_current_end (which is better than tlab_current_tlab_end in my opinion).

I also moved the update into a separate method with a TODO that says to remove it when FastTLABRefill is deprecated.

This looks a lot better now. Thanks.

Note that the following comment now needs updating accordingly in threadLocalAllocBuffer.hpp:

  41 // Heap sampling is performed via the end/actual_end fields.
  42 // actual_end contains the real end of the tlab allocation,
  43 // whereas end can be set to an arbitrary spot in the tlab to
  44 // trip the return and sample the allocation.
  45 // slow_path_end is used to track if a fast tlab refill occured
  46 // between slowpath calls.

There might be other comments too, I have not looked in detail.

This was the only spot that still had an actual_end, I fixed it now. I'll do a sweep to double check other comments.

Not sure it's better but before updating the webrev, I wanted to try to get input/consensus :)

(Note hard_end was always further off than end).

src/hotspot/share/prims/jvmti.xml:

10357
10358
10359 Can sample the heap.
10360 If this capability is enabled then the heap sampling methods can be called.
10361
10362

Looks like this capability should not be "since 9" if it gets integrated now.

Updated now to 11, crossing my fingers :)

src/hotspot/share/runtime/heapMonitoring.cpp:

 448       if (is_alive->do_object_b(value)) {
 449         // Update the oop to point to the new object if it is still alive.
 450         f->do_oop(&(trace.obj));
 451
 452         // Copy the old trace, if it is still live.
 453         _allocated_traces->at_put(curr_pos++, trace);
 454
 455         // Store the live trace in a cache, to be served up on /heapz.
 456         _traces_on_last_full_gc->append(trace);
 457
 458         count++;
 459       } else {
 460         // If the old trace is no longer live, add it to the list of
 461         // recently collected garbage.
 462         store_garbage_trace(trace);
 463       }

In the case where the oop was not live, I would like it to be explicitly cleared.

Done I think how you wanted it. Let me know because I'm not familiar with the RootAccess API. I'm unclear if I'm doing this right or not so reviews of these parts are highly appreciated. Robbin had talked of perhaps later pushing this all into a OopStorage, should I do this now do you think? Or can that wait a second webrev later down the road?

I think using handles can and should be done later. You can use the Access API now. I noticed that you are missing an #include "oops/access.inline.hpp" in your heapMonitoring.cpp file.

The missing header is there for me so I don't know, I made sure it is present in the latest webrev. Sorry about that.

+ Did I clear it the way you wanted me to or were you thinking of something else?

That is precisely how I wanted it to be cleared. Thanks.

+ Final question here, seems like if I were to want to not do the f->do_oop directly on the trace.obj, I'd need to do something like:

    f->do_oop(&value);
    ...
    trace->store_oop(value);

to update the oop internally. Is that right/is that one of the advantages of going to the Oopstorage sooner than later?
I think you really want to do the do_oop on the root directly. Is there a particular reason why you would not want to do that? Otherwise, yes - the benefit with using the handle approach is that you do not need to call do_oop explicitly in your code.

There is no reason except that now we have a load_oop and a get_oop_addr, I was not sure what you would think of that.

That's fine.

Also I see a lot of concurrent-looking use of the following field:

 267   volatile bool _initialized;

Please note that the "volatile" qualifier does not help with reordering here. Reordering between volatile and non-volatile fields is completely free for both compiler and hardware, except for Windows with MSVC, where volatile semantics is defined to use acquire/release semantics, and the hardware is TSO. But for the general case, I would expect this field to be stored with OrderAccess::release_store and loaded with OrderAccess::load_acquire. Otherwise it is not thread safe.

Because everything is behind a mutex, I wasn't really worried about this. I have a test that has multiple threads trying to hit this corner case and it passes. However, to be paranoid, I updated it to using the OrderAccess API now, thanks! Let me know what you think there too!

If it is indeed always supposed to be read and written under a mutex, then I would strongly prefer to have it accessed as a normal non-volatile member, and have an assertion that the given lock is held or we are in a safepoint, as we do in many other places. Something like this:

assert(HeapMonitorStorage_lock->owned_by_self() || (SafepointSynchronize::is_at_safepoint() && Thread::current()->is_VM_thread()), "this should not be accessed concurrently");

It would be confusing to people reading the code if there are uses of OrderAccess that are actually always protected under a mutex.

Thank you for the exact example to be put in the code!
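The two options discussed above (a flag published with release/acquire ordering versus a plain field that is only touched under a lock) can be sketched in standard C++. This is an analogy only: `std::atomic` and `std::mutex` stand in for HotSpot's OrderAccess and Mutex, and `Storage` and `LockedStorage` are hypothetical names. Because std::mutex has no ownership query comparable to owned_by_self(), the locked variant takes the lock inside the accessors rather than asserting that the caller holds it.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Lock-free variant: the flag publishes the payload.
struct Storage {
  int payload = 0;                       // plain data guarded by the flag
  std::atomic<bool> initialized{false};  // plays the role of _initialized

  void publish(int value) {
    payload = value;
    // Release: the payload write cannot be reordered after the flag store.
    initialized.store(true, std::memory_order_release);
  }

  // Returns true and fills *out only if the publication has been observed.
  bool try_read(int* out) const {
    // Acquire: pairs with the release store in publish().
    if (!initialized.load(std::memory_order_acquire)) return false;
    *out = payload;
    return true;
  }
};

// Lock-protected variant: a plain bool that is only touched under the mutex,
// so no atomics or ordering annotations are needed.
class LockedStorage {
 public:
  void set_initialized(bool v) {
    std::lock_guard<std::mutex> guard(_lock);
    _initialized = v;
  }

  bool initialized() {
    std::lock_guard<std::mutex> guard(_lock);
    return _initialized;
  }

 private:
  std::mutex _lock;
  bool _initialized = false;
};
```

Mixing the two styles on one field is what makes the code hard to read: a reader who sees ordering annotations will assume there is a lock-free reader somewhere.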
I put it around each access/assignment of the _initialized field and found one case where yes you can touch it and not have the lock. It actually is "ok" because you don't act on the storage until later and only when you really want to modify the storage (see the object_alloc_do_sample method which calls the add_trace method).

But, because of this, I'm going to put the OrderAccess here, I'll do some performance numbers later and if there are issues, I might add an "unsafe" read and a "safe" one to make it explicit to the reader. But I don't think it will come to that.

Okay. This double return in heapMonitoring.cpp looks wrong:

 283   bool initialized() {
 284     return OrderAccess::load_acquire(&_initialized) != 0;
 285     return _initialized;
 286   }

Since you said object_alloc_do_sample() is the only place where you do not hold the mutex while reading initialized(), I had a closer look at that. It looks like in its current shape, the lack of a mutex may lead to a memory leak. In particular, it first checks if (initialized()). Let's assume this is now true. It then allocates a bunch of stuff, and checks if the number of frames were over 0. If they were, it calls StackTraceStorage::storage()->add_trace() seemingly hoping that after grabbing the lock in there, initialized() will still return true. But it could now return false and skip doing anything, in which case the allocated stuff will never be freed.

I fixed this now by making add_trace return a boolean and checking for that. It will be in the next webrev.

Thanks, the truth is that in our implementation the system is always on or off, so this never really occurs :). In this version though, that is not true and it's important to handle so thanks again!

So the analysis seems to be that _initialized is only used outside of the mutex in one instance, where it is used to perform double-checked locking, that actually causes a memory leak. I am not proposing how to fix that, just raising the issue.
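The fix described above (add_trace reporting whether it stored the trace, and the caller cleaning up when it did not) can be sketched in standard C++. All names here (`Trace`, `TraceStorage`, `sample`) are hypothetical and the structure is an assumption for illustration, not the webrev's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical trace record: one entry per captured frame.
struct Trace {
  std::vector<int> frames;
};

class TraceStorage {
 public:
  void set_initialized(bool v) {
    std::lock_guard<std::mutex> guard(_lock);
    _initialized = v;
    if (!v) _traces.clear();  // tearing down drops every stored trace
  }

  // Takes ownership only on success; on failure the caller still owns it.
  bool add_trace(std::unique_ptr<Trace>& trace) {
    std::lock_guard<std::mutex> guard(_lock);
    if (!_initialized) return false;  // the only check, made under the lock
    _traces.push_back(std::move(trace));
    return true;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(_lock);
    return _traces.size();
  }

 private:
  std::mutex _lock;
  bool _initialized = false;
  std::vector<std::unique_ptr<Trace>> _traces;
};

// Returns true if the sample was stored; otherwise the record is freed here.
bool sample(TraceStorage& storage, int frame_count) {
  if (frame_count <= 0) return false;
  std::unique_ptr<Trace> trace(new Trace());
  trace->frames.assign(static_cast<std::size_t>(frame_count), 0);
  if (!storage.add_trace(trace)) {
    trace.reset();  // with raw pointers, skipping this step is the leak
    return false;
  }
  return true;
}
```

Checking the flag once, inside add_trace and under the lock, removes the window between an unlocked first check and the locked second one.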
If you still want to perform this double-checked locking somehow, then the use of acquire/release still seems odd. Because the memory ordering restrictions of it never come into play in this particular case. If they ever did, then the use of destroy_stuff(); release_store(_initialized, 0) would be broken anyway, as that would imply that whatever concurrent reader there ever was, after reading _initialized with load_acquire(), could *never* read the data that is concurrently destroyed anyway. I would be biased to think that RawAccess::load/store looks like a more appropriate solution, given that the memory leak issue is resolved. I do not know how painful it would be to not perform this double-checked locking.

So I agree with this entirely. I looked also a bit more and the difference and code really stems from our internal version. In this version however, there are actually a lot of things going on that I did not go entirely through in my head but this comment made me ponder a bit more on it.

Since every object_alloc_do_sample is protected by a check to HeapMonitoring::enabled(), there is only a small chance that the call is happening when things have been disabled. So there is no real need to do a first check on the initialized, it is a rare occurrence that a call happens to object_alloc_do_sample and the initialized of the storage returns false.

(By the way, even if you did call object_alloc_do_sample without looking at HeapMonitoring::enabled(), that would be ok too. You would gather the stacktrace and get nowhere at the add_trace call, which would return false; so though not optimal performance wise, nothing would break).

Furthermore, the add_trace is really the moment of no return and we have the mutex lock and then the initialized check. So, in the end, I did two things: I removed that first check and then I removed the OrderAccess for the storage initialized.
I think now I have a better grasp and understanding why it was done in our code and why it is not needed here. Thanks for pointing it out :). This now still passes my JTREG tests, especially the threaded one.

As a kind of meta comment, I wonder if it would make sense to add sampling for non-TLAB allocations. Seems like if someone is rapidly allocating a whole bunch of 1 MB objects that never fit in a TLAB, I might still be interested in seeing that in my traces, and not get surprised that the allocation rate is very high yet not showing up in any profiles.

That is handled by the handle_sample where you wanted me to put a UseTLAB because you hit that case if the allocation is too big.

I see. It was not obvious to me that non-TLAB sampling is done in the TLAB class. That seems like an abstraction crime. What I wanted in my previous comment was that we do not call into the TLAB when we are not using TLABs. If there is sampling logic in the TLAB that is used for something else than TLABs, then it seems like that logic simply does not belong inside of the TLAB. It should be moved out of the TLAB, and instead have the TLAB call this common abstraction that makes sense.

So in the incremental version: http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/, this is still a "crime". The reason is that the system has to have the bytes_until_sample on a per-thread level and it made "sense" to have it with the TLAB implementation. Also, I was not sure how people felt about adding something to the thread instance instead.

Do you think it fits better at the Thread level? I can see how difficult it is to make it happen there and add some logic there. Let me know what you think.

We have an unfortunate situation where everyone that has some fields that are thread local tend to dump them right into Thread, making the size and complexity of Thread grow as it becomes tightly coupled with various unrelated subsystems.
It would be desirable to have a separate class for this instead that encapsulates the sampling logic. That class could possibly reside in Thread though as a value object of Thread.

I imagined that would be the case but was not sure. I will look at the example that Robbin is talking about (ThreadSMR) and will see how to refactor my code to use that.

Thanks again for your help,
Jc

Hope I have answered your questions and that my feedback makes sense to you.

You have and thank you for them, I think we are getting to a cleaner implementation and things are getting better and more readable :)

Yes it is getting better.

Thanks,
/Erik

Thanks for your help!
Jc

Thanks,
/Erik

I double checked by changing the test
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java
to use a smaller Tlab (2048) and made the object bigger and it goes through that and passes.

Thanks again for your review and I look forward to your pointers for the questions I now have raised!
Jc

Thanks,
/Erik

On 2018-01-26 06:45, JC Beyler wrote:

Thanks Robbin for the reviews :)

The new full webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/
The incremental webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/

I inlined my answers:

On Thu, Jan 25, 2018 at 1:15 AM, Robbin Ehn wrote:

Hi JC, great to see another revision!

####
heapMonitoring.cpp

StackTraceData should not contain the oop for 'safety' reasons. When StackTraceData is moved from _allocated_traces:
L452 store_garbage_trace(trace);
it contains a dead oop. _allocated_traces could instead be a tuple of oop and StackTraceData thus dead oops are not kept.

Done, I used inheritance to make the copier work regardless but the idea is the same.

You should use the new Access API for loading the oop, something like this:
RootAccess::load(...)
I don't think you need to use Access API for clearing the oop, but it would look nicer. And you shouldn't probably be using:
Universe::heap()->is_in_reserved(value)

I am unfamiliar with this but I think I did do it like you wanted me to (all tests pass so that's a start). I'm not sure how to clear the oop exactly, is there somewhere that does that, which I can use to do the same?

I removed the is_in_reserved, this came from our internal version, I don't know why it was there but my tests work without so I removed it :)

The lock:
L424 MutexLocker mu(HeapMonitorStorage_lock);
Is not needed as far as I can see. weak_oops_do is called in a safepoint, no TLAB allocation can happen and JVMTI thread can't access these data-structures. Is there something more to this lock that I'm missing?

Since a thread can call the JVMTI getLiveTraces (or any of the other ones), it can get to the point of trying to copy the _allocated_traces. I imagine it is possible that this is happening during a GC or that it can be started and a GC happens afterwards. Therefore, it seems to me that you want this protected, no?

####
You have 6 files without any changes in them (any more):
g1CollectedHeap.cpp
psMarkSweep.cpp
psParallelCompact.cpp
genCollectedHeap.cpp
referenceProcessor.cpp
thread.hpp

Done.

####
I have not looked closely, but is it possible to hide heap sampling in AllocTracer ? (with some minor changes to the AllocTracer API)

I am imagining that you are saying to move the code that does the sampling code (change the tlab end, do the call to HeapMonitoring, etc.) into the AllocTracer code itself? I think that is right and I'll look if that is possible and prepare a webrev to show what would be needed to make that happen.

####
Minor nit, when declaring pointers there is a little mix of having the pointer adjacent to the type name and to the data name. (Most hotspot code is by type name)
E.g.
heapMonitoring.cpp:711 jvmtiStackTrace *trace = ....
heapMonitoring.cpp:733 Method* m = vfst.method();
(not just this file)

Done!

####
HeapMonitorThreadOnOffTest.java:77
I would make g_tmp volatile, otherwise the assignment in loop may theoretically be skipped.

Also done!

Thanks again!
Jc

From christoph.langer at sap.com Wed Apr 4 12:34:36 2018
From: christoph.langer at sap.com (Langer, Christoph)
Date: Wed, 4 Apr 2018 12:34:36 +0000
Subject: RFR(xxs): 8200384: jcmd help output should be sorted
In-Reply-To: <1f24a1d0-9cac-4a77-6023-918cf864a298@oracle.com>
References: <1f24a1d0-9cac-4a77-6023-918cf864a298@oracle.com>
Message-ID:

Hi Thomas,

I like the fix, too. Maybe you can add example output before and after sorting to the bug.

Thanks
Christoph

> -----Original Message-----
> From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of serguei.spitsyn at oracle.com
> Sent: Tuesday, 3 April 2018 03:53
> To: Thomas Stüfe; Hotspot dev runtime; serviceability-dev at openjdk.java.net
> Subject: Re: RFR(xxs): 8200384: jcmd help output should be sorted
>
> Hi Thomas,
>
> Added the serviceability-dev mailing list as it is a Serviceability area.
>
> The fix looks good to me.
> One question:
>   Could you, please, post the sorted help output?
>   It is interesting how does it look like when sorted.
>
> Thanks,
> Serguei
>
>
> On 3/28/18 13:08, Thomas Stüfe wrote:
> > Hi all,
> >
> > may I get reviews for this tiny trivial change which causes jcmd help
> > output (the command list) to be sorted?
> >
> > bug: https://bugs.openjdk.java.net/browse/JDK-8200384
> > webrev:
> > http://cr.openjdk.java.net/~stuefe/webrevs/8200384-jcmd-help-sorted/webrev.00/webrev/
> >
> > Thanks!
> >
> > Best Regards, Thomas

From daniil.x.titov at oracle.com Wed Apr 4 17:45:14 2018
From: daniil.x.titov at oracle.com (Daniil Titov)
Date: Wed, 04 Apr 2018 10:45:14 -0700
Subject: CANCELED: RFR 4613913: Four EventRequest methods are invokable on deleted request
Message-ID: <3C232EEE-396C-4BB7-9153-28C1C641C10A@oracle.com>

Hello,

Based on the discussion below I am canceling this review. The issue will be addressed by changing the test that resides out of the open repository.

Thanks!

Best regards,
Daniil

On 4/2/18, 3:02 PM, "David Holmes" wrote:

Hi Serguei,

On 3/04/2018 7:44 AM, serguei.spitsyn at oracle.com wrote:
> Hi David and Daniil,
>
>
> David,
>
> Thank you for raising this concern.
> You are right.
>
> I've made a mistake when I looked at the EventRequest.isEnabled() spec and
> thought
> that the following spec lines of the setEnabled() belong to the isEnabled()
> and other 3 methods as well:
>
> Throws:
> InvalidRequestStateException
> - if this request has been deleted.
>
> In fact, the JDI spec for methods isEnabled(), getProperty(),
> putProperty() and suspendPolicy()
> does not say they can throw the InvalidRequestStateException.
>
> So, now I'd suggest to just relax the test checks by not expecting an
> InvalidRequestStateException from isEnabled(), getProperty(), putProperty()
> and suspendPolicy().
>
> Would this approach resolve your concern?

Yes. The semantics for these methods was established way back in 2000 under:

https://bugs.openjdk.java.net/browse/JDK-4320478

I think this bug, 4613913, was misguided in expecting all of the methods to throw the exception. You could make a case for doing so, but as I said that's a spec change that should have been made back then. Changing the spec now seems pointless - it gains nothing but introduces an incompatible behaviour change.

Changing the test is the way to go.
Thanks,
David
-----

> Thanks,
> Serguei
>
>
> On 3/29/18 17:12, David Holmes wrote:
>> Daniil,
>>
>> Even as far back as 2007 there was concern that changing the current
>> behaviour might break existing code. That has to be an even bigger
>> concern now!
>>
>> Further the spec is sloppy here:
>>
>> " Once the eventRequest is deleted, no operations (for example,
>> EventRequest.setEnabled(boolean)) are permitted."
>>
>> This is too loose. What is an "operation"? Is a query like isEnabled()
>> really an "operation"? I would not consider it so. And if we can
>> delete requests why is there no "isDeleted" query? The spec seems
>> incomplete and too vague.
>>
>> To me this is something that should have been clarified in the spec first
>> and then the implementation brought into alignment. But that should
>> have happened many years ago. Changing this now seems risky to me.
>>
>> This change in long standing behaviour also requires a CSR request if
>> it is to proceed.
>>
>> David
>> -----
>>
>>
>> On 30/03/2018 8:36 AM, Daniil Titov wrote:
>>> Hi Serguei,
>>>
>>> Please review a new version of the fix that has these places corrected.
>>>
>>> Webrev: http://cr.openjdk.java.net/~dtitov/4613913/webrev.03
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-4613913
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Daniil
>>>
>>> On 3/29/18, 11:46 AM, "serguei.spitsyn at oracle.com"
>>> wrote:
>>>
>>> Hi Daniil,
>>> It looks good in general.
>>> One minor comment is that it would be nice to make a cleanup
>>> (as we already discussed) for all places like this:
>>>   202     if (isEnabled() || deleted) {
>>>   203         throw invalidState();
>>>   204     }
>>> As the isEnabled() now checks for deleted and throws the
>>> invalidState()
>>> then we can simplify these fragments to be:
>>>   202     if (isEnabled()) {
>>>   203         throw invalidState();
>>>   204     }
>>> Thanks,
>>> Serguei
>>> On 3/29/18 10:27, Daniil Titov wrote:
>>> > Please review the changes that ensure that no operation on
>>> deleted com.sun.jdi.request.EventRequest objects are permitted as per
>>> JDI specification for
>>> com.sun.jdi.request.EventRequestManager.deleteEventRequest(com.sun.jdi.request.EventRequest)
>>> method. The fix makes the following 4 methods in class
>>> com.sun.tools.jdi.EventRequestManagerImpl$EventRequestImpl to throw
>>> com.sun.jdi.request.InvalidRequestStateException if the request is
>>> deleted:
>>> > - getProperty()
>>> > - putProperty(Object, Object)
>>> > - suspendPolicy()
>>> > - isEnabled()
>>> >
>>> > Bug: https://bugs.openjdk.java.net/browse/JDK-4613913
>>> > Webrev: http://cr.openjdk.java.net/~dtitov/4613913/webrev.02/
>>> >
>>> > Best regards,
>>> > Daniil
>>> >

From gary.adams at oracle.com Wed Apr 4 18:18:35 2018
From: gary.adams at oracle.com (Gary Adams)
Date: Wed, 04 Apr 2018 14:18:35 -0400
Subject: RFR: JDK-8199782: Fix compilation warnings detected by Solaris Developer Studio 12.6
Message-ID: <5AC516FB.9010101@oracle.com>

Getting the sources ready for the next Solaris developer studio toolchain.

  Issue: https://bugs.openjdk.java.net/browse/JDK-8199782
  Webrev: http://cr.openjdk.java.net/~gadams/8199782/webrev.00/

This update conditionally disables some new error checks, if the new toolchain is used.
From serguei.spitsyn at oracle.com Wed Apr 4 18:41:07 2018
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Wed, 4 Apr 2018 11:41:07 -0700
Subject: RFR: JDK-8199782: Fix compilation warnings detected by Solaris Developer Studio 12.6
In-Reply-To: <5AC516FB.9010101@oracle.com>
References: <5AC516FB.9010101@oracle.com>
Message-ID: <3d192086-7fdd-4bc3-337a-6e2e34b1e99f@oracle.com>

Hi Gary,

It looks reasonable.
I'm not very familiar with the concrete SolStudio versions.

Thanks,
Serguei

On 4/4/18 11:18, Gary Adams wrote:
> Getting the sources ready for the next Solaris developer studio
> toolchain.
>
>   Issue: https://bugs.openjdk.java.net/browse/JDK-8199782
>   Webrev: http://cr.openjdk.java.net/~gadams/8199782/webrev.00/
>
> This update conditionally disables some new error checks, if the
> new toolchain is used.

From david.holmes at oracle.com Thu Apr 5 00:00:02 2018
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 5 Apr 2018 10:00:02 +1000
Subject: RFR: JDK-8199782: Fix compilation warnings detected by Solaris Developer Studio 12.6
In-Reply-To: <5AC516FB.9010101@oracle.com>
References: <5AC516FB.9010101@oracle.com>
Message-ID:

Hi Gary,

On 5/04/2018 4:18 AM, Gary Adams wrote:
> Getting the sources ready for the next Solaris developer studio toolchain.
>
>   Issue: https://bugs.openjdk.java.net/browse/JDK-8199782
>   Webrev: http://cr.openjdk.java.net/~gadams/8199782/webrev.00/
>
> This update conditionally disables some new error checks, if the
> new toolchain is used.

This looks odd:

 231     DISABLED_WARNINGS_solstudio := $(DISABLED_WARNINGS_solstudio), \

as it is self-referential. Should you use a different variable name? Is there an issue if this variable has not been set?

Otherwise seems okay.
Thanks,
David

From erik.joelsson at oracle.com Thu Apr 5 00:05:43 2018
From: erik.joelsson at oracle.com (Erik Joelsson)
Date: Wed, 4 Apr 2018 17:05:43 -0700
Subject: RFR: JDK-8199782: Fix compilation warnings detected by Solaris Developer Studio 12.6
In-Reply-To:
References: <5AC516FB.9010101@oracle.com>
Message-ID:

On 2018-04-04 17:00, David Holmes wrote:
> Hi Gary,
>
> On 5/04/2018 4:18 AM, Gary Adams wrote:
>> Getting the sources ready for the next Solaris developer studio
>> toolchain.
>>
>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8199782
>>   Webrev: http://cr.openjdk.java.net/~gadams/8199782/webrev.00/
>>
>> This update conditionally disables some new error checks, if the
>> new toolchain is used.
>
> This looks odd:
>
>  231     DISABLED_WARNINGS_solstudio := $(DISABLED_WARNINGS_solstudio), \
>
> as it is self-referential. Should you use a different variable name?
> Is there an issue if this variable has not been set?
>
This construct may look a bit weird but is fine. The named parameter will get translated behind the scenes to BUILD_LIBJVM_DISABLED_WARNINGS_solstudio so it's not actually self referential (and even if it was, it would still work as expected, even if it looks a bit weird).

/Erik

> Otherwise seems okay.
>
> Thanks,
> David

From magnus.ihse.bursie at oracle.com Thu Apr 5 12:35:45 2018
From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie)
Date: Thu, 5 Apr 2018 14:35:45 +0200
Subject: RFR: JDK-8199782: Fix compilation warnings detected by Solaris Developer Studio 12.6
In-Reply-To: <5AC516FB.9010101@oracle.com>
References: <5AC516FB.9010101@oracle.com>
Message-ID: <904f24c7-0fb7-a99a-669e-fcd3277291f8@oracle.com>

On 2018-04-04 20:18, Gary Adams wrote:
> Getting the sources ready for the next Solaris developer studio
> toolchain.
>
>   Issue: https://bugs.openjdk.java.net/browse/JDK-8199782
>   Webrev: http://cr.openjdk.java.net/~gadams/8199782/webrev.00/
>
> This update conditionally disables some new error checks, if the
> new toolchain is used.

Looks good to me.

/Magnus

From boris.ulasevich at bell-sw.com Thu Apr 5 11:54:01 2018
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Thu, 5 Apr 2018 14:54:01 +0300
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To:
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>

Message-ID:

Hi JC,

I have just checked on arm32: your patch compiles and runs ok.

As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not correspond to actual library name: libHeapMonitorTest.c -> libHeapMonitorTest.so

Boris

On 04.04.2018 01:54, White, Derek wrote:
> Thanks JC,
>
> New patch applies cleanly. Compiles and runs (simple test programs) on
> aarch64.
>
> * Derek
>
> *From:* JC Beyler [mailto:jcbeyler at google.com]
> *Sent:* Monday, April 02, 2018 1:17 PM
> *To:* White, Derek
> *Cc:* Erik Österlund; serviceability-dev at openjdk.java.net; hotspot-compiler-dev
> *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
>
> Hi Derek,
>
> I know there were a few things that went in that provoked a merge
> conflict. I worked on it and got it up to date. Sadly my lack of
> knowledge makes it a full rebase instead of keeping all the history.
> However, with a newly cloned jdk/hs you should now be able to use:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/
>
> The change you are referring to was done with the others so perhaps you
> were unlucky and I forgot it in a webrev and fixed it in another? I
> don't know but it's been there and I checked, it is here:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html
>
> I double checked that tlab_end_offset no longer appears in any
> architecture (as far as I can tell :)).
>
> Thanks for testing and let me know if you run into any other issues!
>
> Jc
>
> On Fri, Mar 30, 2018 at 4:24 PM White, Derek wrote:
>
> Hi Jc,
>
> I've been having trouble getting your patch to apply correctly. I
> may have based it on the wrong version.
>
> In any case, I think there's a missing update to
> macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(),
> where "JavaThread::tlab_end_offset()" should become
> "JavaThread::tlab_current_end_offset()".
>
> This should correspond to the other port's changes in
> templateTable_.cpp files.
>
> Thanks!
> - Derek
>
> *From:* hotspot-compiler-dev
> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf
> Of *JC Beyler
> *Sent:* Wednesday, March 28, 2018 11:43 AM
> *To:* Erik Österlund
> *Cc:* serviceability-dev at openjdk.java.net; hotspot-compiler-dev

> *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
>
> Hi all,
>
> I've been working on deflaking the tests mostly and the wording in
> the JVMTI spec.
>
> Here are the two incremental webrevs:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/
>
> Here is the total webrev:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/
>
> Here are the notes of this change:
>
>   - Currently the tests pass 100 times in a row, I am working on
> checking if they pass 1000 times in a row.
>
>   - The default sampling rate is set to 512k, this is what we use
> internally and having a default means that to enable the sampling
> with the default, the user only has to do an enable event/disable
> event via JVMTI (instead of enable + set sample rate).
>
>   - I deprecated the code that was handling the fast path tlab
> refill if it happened since this is now deprecated
>
>       - Though I saw that Graal is still using it so I have to see
> what needs to be done there exactly
>
> Finally, using the Dacapo benchmark suite, I noted a 1% overhead for
> when the event system is turned on and the callback to the native
> agent is just empty. I got a 3% overhead with a 512k sampling rate
> with the code I put in the native side of my tests.
>
> Thanks and comments are appreciated,
>
> Jc
>
> On Mon, Mar 19, 2018 at 2:06 PM JC Beyler wrote:
>
> Hi all,
>
> The incremental webrev update is here:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/
>
> The full webrev is here:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/
>
> Major change here is:
>
>   - I've removed the heapMonitoring.cpp code in favor of just
> having the sampling events as per Serguei's request; I still
> have to do some overhead measurements but the tests prove the
> concept can work
>
>       - Most of the tlab code is unchanged, the only major
> part is that now things get sent off to event collectors when
> used and enabled.
>
>   - Added the interpreter collectors to handle interpreter
> execution
>
>   - Updated the name from SetTlabHeapSampling to
> SetHeapSampling to be more generic
>
>   - Added a mutex for the thread sampling so that we can
> initialize an internal static array safely
>
>   - Ported the tests from the old system to this new one
>
> I've also updated the JEP and CSR to reflect these changes:
>
> https://bugs.openjdk.java.net/browse/JDK-8194905
>
> https://bugs.openjdk.java.net/browse/JDK-8171119
>
> In order to make this have some forward progress, I've removed
> the heap sampling code entirely and now rely entirely on the
> event sampling system. The tests reflect this by using a
> simplified implementation of what an agent could do:
>
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
>
> (Search for anything mentioning event_storage).
>
> I have not taken the time to port the whole code we had
> originally in heapMonitoring to this. I hesitate only because
> that code was in C++, I'd have to port it to C and this is for
> tests so perhaps what I have now is good enough?
>
> As far as testing goes, I've ported all the relevant tests and
> then added a few:
>
>    - Turning the system on/off
>
>    - Testing using various GCs
>
>    - Testing using the interpreter
>
>    - Testing the sampling rate
>
>    - Testing with objects and arrays
>
>    - Testing with various threads
>
> Finally, as overhead goes, I have the numbers of the system off
> vs a clean build and I have 0% overhead, which is what we'd
> want. This was using the Dacapo benchmarks. I am now preparing
> to run a version with the events on using dacapo and will report
> back here.
> > Any comments are welcome :) > > Jc > > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler > wrote: > > Hi all, > > I apologize for the delay but I wanted to add an event > system and that took a bit longer than expected and I also > reworked the code to take into account the deprecation of > FastTLABRefill. > > This update has four parts: > > A) I moved the implementation from Thread to > ThreadHeapSampler inside of Thread. Would you prefer it as a > pointer inside of Thread or like this works for you? Second > question would be would you rather have an association > outside of Thread altogether that tries to remember when > threads are live and then we would have something like: > > ThreadHeapSampler::get_sampling_size(this_thread); > > I worry about the overhead of this but perhaps it is not too > too bad? > > B) I also have been working on the Allocation event system > that sends out a notification at each sampled event. This > will be practical when wanting to do something at the > allocation point. I'm also looking at if the whole > heapMonitoring code could not reside in the agent code and > not in the JDK. I'm not convinced but I'm talking to Serguei > about it to see/assess :) > > ? ?- Also added two tests for the new event subsystem > > C) Removed the slow_path fields inside the TLAB code since > now FastTLABRefill is deprecated > > D) Updated the JVMTI documentation and specification for the > methods. > > So the incremental webrev is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ > > and the full webrev is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 > > I believe I have updated the various JIRA issues that track > this :) > > Thanks for your input, > > Jc > > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler > > wrote: > > Hi Erik, > > I inlined my answers, which the last one seems to answer > Robbin's concerns about the same thing (adding things to > Thread). 
> > On Wed, Feb 14, 2018 at 2:51 AM, Erik ?sterlund > > wrote: > > Hi JC, > > Comments are inlined below. > > On 2018-02-13 06:18, JC Beyler wrote: > > Hi Erik, > > Thanks for your answers, I've now inlined my own > answers/comments. > > I've done a new webrev here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ > > > The incremental is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ > > > Note to all: > > ? - I've been integrating changes from > Erin/Serguei/David comments so this webrev > incremental is a bit an answer to all comments > in one. I apologize for that :) > > On Mon, Feb 12, 2018 at 6:05 AM, Erik ?sterlund > > wrote: > > Hi JC, > > Sorry for the delayed reply. > > Inlined answers: > > > > On 2018-02-06 00:04, JC Beyler wrote: > > Hi Erik, > > (Renaming this to be folded into the > newly renamed thread :)) > > First off, thanks a lot for reviewing > the webrev! I appreciate it! > > I updated the webrev to: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ > > > And the incremental one is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ > > > It contains: > - The change for since from 9 to 11 for > the jvmti.xml > - The use of the OrderAccess for initialized > - Clearing the oop > > I also have inlined my answers to your > comments. The biggest question > will come from the multiple *_end > variables. A bit of the logic there > is due to handling the slow path refill > vs fast path refill and > checking that the rug was not pulled > underneath the slowpath. I > believe that a previous comment was that > TlabFastRefill was going to > be deprecated. > > If this is true, we could revert this > code a bit and just do a : if > TlabFastRefill is enabled, disable this. > And then deprecate that when > TlabFastRefill is deprecated. 
> > This might simplify this webrev and I > can work on a follow-up that > either: removes TlabFastRefill if Robbin > does not have the time to do > it or add the support to the assembly > side to handle this correctly. > What do you think? > > I support removing TlabFastRefill, but I > think it is good to not depend on that > happening first. > > > I'm slowly pushing on the FastTLABRefill > (https://bugs.openjdk.java.net/browse/JDK-8194084), > I agree on keeping both separate for now though > so that we can think of both differently > > Now, below, inlined are my answers: > > On Fri, Feb 2, 2018 at 8:44 AM, Erik > ?sterlund > > wrote: > > Hi JC, > > Hope I am reviewing the right > version of your work. Here goes... > > src/hotspot/share/gc/shared/collectedHeap.inline.hpp: > > ? 159 > ?AllocTracer::send_allocation_outside_tlab(klass, result, size * > HeapWordSize, THREAD); > ? 160 > ? 161 > ?THREAD->tlab().handle_sample(THREAD, result, size); > ? 162? ? ?return result; > ? 163? ?} > > Should not call tlab()->X without > checking if (UseTLAB) IMO. > > Done! > > > More about this later. > > src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: > > So first of all, there seems to > quite a few ends. There is an "end", > a "hard > end", a "slow path end", and an > "actual end". Moreover, it seems > like the > "hard end" is actually further away > than the "actual end". So the "hard end" > seems like more of a "really > definitely actual end" or something. > I don't > know about you, but I think it looks > kind of messy. In particular, I don't > feel like the name "actual end" > reflects what it represents, > especially when > there is another end that is behind > the "actual end". > > ? 413 HeapWord* > ThreadLocalAllocBuffer::hard_end() { > ? 414? ?// Did a fast TLAB refill > occur? > ? 415? ?if (_slow_path_end != _end) { > ? 416? ? ?// Fix up the actual end > to be now the end of this TLAB. > ? 417? ? ?_slow_path_end = _end; > ? 418? ? ?_actual_end = _end; > ? 419? 
?} > ? 420 > ? 421? ?return _actual_end + > alignment_reserve(); > ? 422 } > > I really do not like making getters > unexpectedly have these kind of side > effects. It is not expected that > when you ask for the "hard end", you > implicitly update the "slow path > end" and "actual end" to new values. > > As I said, a lot of this is due to the > FastTlabRefill. If I make this > not supporting FastTlabRefill, this goes > away. The reason the system > needs to update itself at the get is > that you only know at that get if > things have shifted underneath the tlab > slow path. I am not sure of > really better names (naming is hard!), > perhaps we could do these > names: > > - current_tlab_end? ? ? ?// Either the > allocated tlab end or a sampling point > - last_allocation_address? // The end of > the tlab allocation > - last_slowpath_allocated_end? // In > case a fast refill occurred the > end might have changed, this is to > remember slow vs fast past refills > > the hard_end method can be renamed to > something like: > tlab_end_pointer()? ? ? ? // The end of > the lab including a bit of > alignment reserved bytes > > Those names sound better to me. Could you > please provide a mapping from the old names > to the new names so I understand which one > is which please? > > This is my current guess of what you are > proposing: > > end -> current_tlab_end > actual_end -> last_allocation_address > slow_path_end -> last_slowpath_allocated_end > hard_end -> tlab_end_pointer > > Yes that is correct, that was what I was proposing. > > I would prefer this naming: > > end -> slow_path_end // the end for taking a > slow path; either due to sampling or refilling > actual_end -> allocation_end // the end for > allocations > slow_path_end -> last_slow_path_end // last > address for slow_path_end (as opposed to > allocation_end) > hard_end -> reserved_end // the end of the > reserved space of the TLAB > > About setting things in the getter... 
that > still seems like a very unpleasant thing to > me. It would be better to inspect the call > hierarchy and explicitly update the ends > where they need updating, and assert in the > getter that they are in sync, rather than > implicitly setting various ends as a > surprising side effect in a getter. It looks > like the call hierarchy is very small. With > my new naming convention, reserved_end() > would presumably return _allocation_end + > alignment_reserve(), and have an assert > checking that _allocation_end == > _last_slow_path_allocation_end, complaining > that this invariant must hold, and that a > caller to this function, such as > make_parsable(), must first explicitly > synchronize the ends as required, to honor > that invariant. > > > I've renamed the variables to how you preferred > it except for the _end one. I did: > > current_end > > last_allocation_address > > tlab_end_ptr > > The reason is that the architecture dependent > code use the thread.hpp API and it already has > tlab included into the name so it becomes > tlab_current_end (which is better that > tlab_current_tlab_end in my opinion). > > I also moved the update into a separate method > with a TODO that says to remove it when > FastTLABRefill is deprecated > > This looks a lot better now. Thanks. > > Note that the following comment now needs updating > accordingly in threadLocalAllocBuffer.hpp: > > ? 41 //??????????? Heap sampling is performed via > the end/actual_end fields. > > ? 42 //??????????? actual_end contains the real end > of the tlab allocation, > > ? 43 //??????????? whereas end can be set to an > arbitrary spot in the tlab to > > ? 44 //????????? ??trip the return and sample the > allocation. > > ? 45 //??????????? slow_path_end is used to track > if a fast tlab refill occured > > ? 46 //??????????? between slowpath calls. > > There might be other comments too, I have not looked > in detail. > > This was the only spot that still had an actual_end, I > fixed it now. 
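
The getter discussion above can be illustrated with a minimal sketch. It assumes Erik's proposed names (slow_path_end, allocation_end, last_slow_path_end, reserved_end); the TlabEnds class, the use of plain offsets instead of HeapWord pointers, and the simulate_fast_refill and sync_ends helpers are hypothetical stand-ins for illustration, not the code in the webrev.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the TLAB end bookkeeping. Addresses are modeled
// as plain offsets so that the sketch runs outside of HotSpot.
class TlabEnds {
 public:
  TlabEnds(std::size_t end, std::size_t alignment_reserve)
      : _slow_path_end(end),
        _allocation_end(end),
        _last_slow_path_end(end),
        _alignment_reserve(alignment_reserve) {}

  // Slow path: pull the slow-path trigger forward to force a sample while
  // the real allocation end stays where it is.
  void set_sample_point(std::size_t sample_point) {
    assert(sample_point <= _allocation_end);
    _slow_path_end = sample_point;
    _last_slow_path_end = sample_point;
  }

  // Models a fast refill: only _slow_path_end moves, without the slow path
  // being told about it.
  void simulate_fast_refill(std::size_t new_end) { _slow_path_end = new_end; }

  bool ends_in_sync() const { return _last_slow_path_end == _slow_path_end; }

  // Explicit synchronization step. A caller such as make_parsable() would
  // invoke this before asking for reserved_end().
  void sync_ends() {
    if (!ends_in_sync()) {
      _last_slow_path_end = _slow_path_end;
      _allocation_end = _slow_path_end;
    }
  }

  // Pure getter: asserts the invariant instead of silently repairing it.
  std::size_t reserved_end() const {
    assert(ends_in_sync() && "caller must synchronize the ends first");
    return _allocation_end + _alignment_reserve;
  }

  std::size_t slow_path_end() const { return _slow_path_end; }
  std::size_t allocation_end() const { return _allocation_end; }

 private:
  std::size_t _slow_path_end;       // where the next slow path is taken
  std::size_t _allocation_end;      // real end for allocations
  std::size_t _last_slow_path_end;  // last value the slow path installed
  std::size_t _alignment_reserve;   // reserved space past the allocation end
};
```

The point of the split is that a reader of reserved_end() can rely on it having no side effects, and a missing sync_ends() call shows up as an assertion failure in debug builds rather than as a silent state change.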
I'll do a sweep to double check other > comments. > > > > Not sure it's better but before updating > the webrev, I wanted to try > to get input/consensus :) > > (Note hard_end was always further off > than end). > > src/hotspot/share/prims/jvmti.xml: > > 10357? ? ? ? id="can_sample_heap" since="9"> > 10358? ? ? ? ? > 10359? ? ? ? ? ?Can sample the heap. > 10360? ? ? ? ? ?If this capability > is enabled then the heap sampling > methods > can be called. > 10361? ? ? ? ? > 10362? ? ? ? > > Looks like this capability should > not be "since 9" if it gets integrated > now. > > Updated now to 11, crossing my fingers :) > > src/hotspot/share/runtime/heapMonitoring.cpp: > > ? 448? ? ? ?if > (is_alive->do_object_b(value)) { > ? 449? ? ? ? ?// Update the oop to > point to the new object if it is still > alive. > ? 450? ? ? ? ?f->do_oop(&(trace.obj)); > ? 451 > ? 452? ? ? ? ?// Copy the old > trace, if it is still live. > ? 453 > ?_allocated_traces->at_put(curr_pos++, trace); > ? 454 > ? 455? ? ? ? ?// Store the live > trace in a cache, to be served up on > /heapz. > ? 456 > ?_traces_on_last_full_gc->append(trace); > ? 457 > ? 458? ? ? ? ?count++; > ? 459? ? ? ?} else { > ? 460? ? ? ? ?// If the old trace > is no longer live, add it to the list of > ? 461? ? ? ? ?// recently collected > garbage. > ? 462 > ?store_garbage_trace(trace); > ? 463? ? ? ?} > > In the case where the oop was not > live, I would like it to be explicitly > cleared. > > Done I think how you wanted it. Let me > know because I'm not familiar > with the RootAccess API. I'm unclear if > I'm doing this right or not so > reviews of these parts are highly > appreciated. Robbin had talked of > perhaps later pushing this all into a > OopStorage, should I do this now > do you think? Or can that wait a second > webrev later down the road? > > I think using handles can and should be done > later. You can use the Access API now. 
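
As a rough illustration of the weak-reference sweep quoted above (keep live traces, move dead ones to a garbage list, and explicitly clear the dead reference), here is a simplified sketch in plain C++. Object, Trace, TraceStorage and the is_alive callback are hypothetical stand-ins; the real code works on oops through HotSpot closures and the Access API, and it also updates the oop for moved objects, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Object { int id; };  // hypothetical stand-in for a heap object

struct Trace {
  Object* obj;   // weak reference to the sampled object
  int trace_id;  // stand-in for the recorded stack trace
};

class TraceStorage {
 public:
  void add(Object* obj, int trace_id) {
    assert(obj != nullptr);
    _allocated.push_back(Trace{obj, trace_id});
  }

  // Compacts the live traces in place and moves dead ones to the garbage
  // list. Returns the number of traces that are still live.
  std::size_t weak_sweep(const std::function<bool(const Object*)>& is_alive) {
    std::size_t curr_pos = 0;
    for (std::size_t i = 0; i < _allocated.size(); i++) {
      Trace trace = _allocated[i];
      if (is_alive(trace.obj)) {
        _allocated[curr_pos++] = trace;  // keep the live trace
      } else {
        trace.obj = nullptr;             // never keep a dead reference around
        _garbage.push_back(trace);
      }
    }
    _allocated.resize(curr_pos);
    return curr_pos;
  }

  const std::vector<Trace>& allocated() const { return _allocated; }
  const std::vector<Trace>& garbage() const { return _garbage; }

 private:
  std::vector<Trace> _allocated;  // traces whose object may still be live
  std::vector<Trace> _garbage;    // traces whose object has been collected
};
```

Clearing the reference before the trace is stored means nothing reachable from the garbage list can point at a collected object.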
> I noticed that you are missing an #include > "oops/access.inline.hpp" in your > heapMonitoring.cpp file. > > The missing header is there for me so I don't > know, I made sure it is present in the latest > webrev. Sorry about that. > > + Did I clear it the way you wanted me > to or were you thinking of > something else? > > > That is precisely how I wanted it to be > cleared. Thanks. > > + Final question here, seems like if I > were to want to not do the > f->do_oop directly on the trace.obj, I'd > need to do something like: > > ? ? f->do_oop(&value); > ? ? ... > ? ? trace->store_oop(value); > > to update the oop internally. Is that > right/is that one of the > advantages of going to the Oopstorage > sooner than later? > > > I think you really want to do the do_oop on > the root directly. Is there a particular > reason why you would not want to do that? > Otherwise, yes - the benefit with using the > handle approach is that you do not need to > call do_oop explicitly in your code. > > There is no reason except that now we have a > load_oop and a get_oop_addr, I was not sure what > you would think of that. > > That's fine. > > Also I see a lot of > concurrent-looking use of the > following field: > ? 267? ?volatile bool _initialized; > > Please note that the "volatile" > qualifier does not help with reordering > here. Reordering between volatile > and non-volatile fields is > completely free > for both compiler and hardware, > except for windows with MSVC, where > volatile > semantics is defined to use > acquire/release semantics, and the > hardware is > TSO. But for the general case, I > would expect this field to be stored > with > OrderAccess::release_store and > loaded with OrderAccess::load_acquire. > Otherwise it is not thread safe. > > Because everything is behind a mutex, I > wasn't really worried about > this. I have a test that has multiple > threads trying to hit this > corner case and it passes. 
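
To make the volatile point above concrete, here is a small sketch of the release/acquire publication pattern. It uses standard C++ std::atomic as a stand-in for HotSpot's OrderAccess::release_store and OrderAccess::load_acquire; that mapping is an analogy assumed for illustration, and PublishedStorage and its members are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

struct Storage { int max_traces = 0; };  // hypothetical payload

class PublishedStorage {
 public:
  // Writer side: fill in the payload first, then publish the flag with
  // release semantics so that the payload write cannot be reordered after it.
  void initialize(int max_traces) {
    _storage.max_traces = max_traces;
    _initialized.store(true, std::memory_order_release);
  }

  // Reader side: the acquire load pairs with the release store above. If the
  // flag is seen as true, the payload written before it is visible as well.
  bool try_read(int* out) const {
    if (!_initialized.load(std::memory_order_acquire)) {
      return false;
    }
    *out = _storage.max_traces;
    return true;
  }

 private:
  Storage _storage;
  std::atomic<bool> _initialized{false};
};
```

A plain `volatile bool` gives no such pairing in standard C++. If every access really happens under a mutex, the alternative raised in the thread applies instead: keep the flag a normal member and assert that the lock is held.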
> > However, to be paranoid, I updated it to > using the OrderAccess API > now, thanks! Let me know what you think > there too! > > > If it is indeed always supposed to be read > and written under a mutex, then I would > strongly prefer to have it accessed as a > normal non-volatile member, and have an > assertion that given lock is held or we are > in a safepoint, as we do in many other > places. Something like this: > > assert(HeapMonitorStorage_lock->owned_by_self() > || (SafepointSynchronize::is_at_safepoint() > && Thread::current()->is_VM_thread()), "this > should not be accessed concurrently"); > > It would be confusing to people reading the > code if there are uses of OrderAccess that > are actually always protected under a mutex. > > Thank you for the exact example to be put in the > code! I put it around each access/assignment of > the _initialized method and found one case where > yes you can touch it and not have the lock. It > actually is "ok" because you don't act on the > storage until later and only when you really > want to modify the storage (see the > object_alloc_do_sample method which calls the > add_trace method). > > But, because of this, I'm going to put the > OrderAccess here, I'll do some performance > numbers later and if there are issues, I might > add a "unsafe" read and a "safe" one to make it > explicit to the reader. But I don't think it > will come to that. > > > Okay. This double return in heapMonitoring.cpp looks > wrong: > > ?283?? bool initialized() { > ?284???? return > OrderAccess::load_acquire(&_initialized) != 0; > ?285???? return _initialized; > ?286?? } > > Since you said object_alloc_do_sample() is the only > place where you do not hold the mutex while reading > initialized(), I had a closer look at that. It looks > like in its current shape, the lack of a mutex may > lead to a memory leak. In particular, it first > checks if (initialized()). Let's assume this is now > true. 
It then allocates a bunch of stuff, and checks > if the number of frames were over 0. If they were, > it calls StackTraceStorage::storage()->add_trace() > seemingly hoping that after grabbing the lock in > there, initialized() will still return true. But it > could now return false and skip doing anything, in > which case the allocated stuff will never be freed. > > I fixed this now by making add_trace return a boolean > and checking for that. It will be in the next webrev. > Thanks, the truth is that in our implementation the > system is always on or off, so this never really occurs > :). In this version though, that is not true and it's > important to handle so thanks again! > > > So the analysis seems to be that _initialized is > only used outside of the mutex in once instance, > where it is used to perform double-checked locking, > that actually causes a memory leak. > > I am not proposing how to fix that, just raising the > issue. If you still want to perform this > double-checked locking somehow, then the use of > acquire/release still seems odd. Because the memory > ordering restrictions of it never comes into play in > this particular case. If it ever did, then the use > of destroy_stuff(); release_store(_initialized, 0) > would be broken anyway as that would imply that > whatever concurrent reader there ever was would > after reading _initialized with load_acquire() could > *never* read the data that is concurrently destroyed > anyway. I would be biased to think that > RawAccess::load/store looks like a more > appropriate solution, given that the memory leak > issue is resolved. I do not know how painful it > would be to not perform this double-checked locking. > > So I agree with this entirely. I looked also a bit more > and the difference and code really stems from our > internal version. 
In this version however, there are > actually a lot of things going on that I did not go > entirely through in my head but this comment made me > ponder a bit more on it. > > Since every object_alloc_do_sample is protected by a > check to HeapMonitoring::enabled(), there is only a > small chance that the call is happening when things have > been disabled. So there is no real need to do a first > check on the initialized, it is a rare occurence that a > call happens to object_alloc_do_sample and the > initialized of the storage returns false. > > (By the way, even if you did call object_alloc_do_sample > without looking at HeapMonitoring::enabled(), that would > be ok too. You would gather the stacktrace and get > nowhere at the add_trace call, which would return false; > so though not optimal performance wise, nothing would > break). > > Furthermore, the add_trace is really the moment of no > return and we have the mutex lock and then the > initialized check. So, in the end, I did two things: I > removed that first check and then I removed the > OrderAccess for the storage initialized. I think now I > have a better grasp and understanding why it was done in > our code and why it is not needed here. Thanks for > pointing it out :). This now still passes my JTREG > tests, especially the threaded one. > > > > As a kind of meta comment, I wonder > if it would make sense to add sampling > for non-TLAB allocations. Seems like > if someone is rapidly allocating a > whole bunch of 1 MB objects that > never fit in a TLAB, I might still be > interested in seeing that in my > traces, and not get surprised that the > allocation rate is very high yet not > showing up in any profiles. > > That is handled by the handle_sample > where you wanted me to put a > UseTlab because you hit that case if the > allocation is too big. > > > I see. It was not obvious to me that > non-TLAB sampling is done in the TLAB class. > That seems like an abstraction crime. 
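
The add_trace fix described above (return a boolean, and let the caller free what it allocated when the storage refuses the trace) can be sketched as follows. This is a simplified model under stated assumptions: StackTrace, its live counter, set_initialized and the frame_count parameter are hypothetical, and std::mutex stands in for HeapMonitorStorage_lock.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical trace record. The static counter only exists so that the
// sketch can show that nothing is leaked.
struct StackTrace {
  static int live;
  std::vector<int> frames;
  StackTrace() { ++live; }
  ~StackTrace() { --live; }
  StackTrace(const StackTrace&) = delete;
  StackTrace& operator=(const StackTrace&) = delete;
};
int StackTrace::live = 0;

class StackTraceStorage {
 public:
  ~StackTraceStorage() { set_initialized(false); }

  // Turning the storage off frees everything it owns.
  void set_initialized(bool value) {
    std::lock_guard<std::mutex> guard(_lock);
    _initialized = value;
    if (!value) {
      for (StackTrace* t : _traces) delete t;
      _traces.clear();
    }
  }

  // The initialized check happens only here, under the lock. Ownership of
  // the trace moves to the storage only when this returns true.
  bool add_trace(StackTrace* trace) {
    std::lock_guard<std::mutex> guard(_lock);
    if (!_initialized) return false;
    _traces.push_back(trace);
    return true;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(_lock);
    return _traces.size();
  }

 private:
  std::mutex _lock;
  bool _initialized = false;  // plain member: only touched under _lock
  std::vector<StackTrace*> _traces;
};

// No unlocked pre-check: gather the trace, then let add_trace decide.
bool object_alloc_do_sample(StackTraceStorage& storage, int frame_count) {
  StackTrace* trace = new StackTrace();
  for (int i = 0; i < frame_count; i++) trace->frames.push_back(i);
  if (trace->frames.empty() || !storage.add_trace(trace)) {
    delete trace;  // the storage did not take ownership, so free it here
    return false;
  }
  return true;
}
```

With the check confined to add_trace, the flag needs neither volatile nor ordered accesses, and a sample that races with the system being turned off costs one wasted stack walk instead of a leak.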
> What I wanted in my previous comment was > that we do not call into the TLAB when we > are not using TLABs. If there is sampling > logic in the TLAB that is used for something > else than TLABs, then it seems like that > logic simply does not belong inside of the > TLAB. It should be moved out of the TLAB, > and instead have the TLAB call this common > abstraction that makes sense. > > So in the incremental version: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ > , > this is still a "crime". The reason is that the > system has to have the bytes_until_sample on a > per-thread level and it made "sense" to have it > with the TLAB implementation. Also, I was not > sure how people felt about adding something to > the thread instance instead. > > Do you think it fits better at the Thread level? > I can see how difficult it is to make it happen > there and add some logic there. Let me know what > you think. > > > We have an unfortunate situation where everyone that > has some fields that are thread local tend to dump > them right into Thread, making the size and > complexity of Thread grow as it becomes tightly > coupled with various unrelated subsystems. It would > be desirable to have a separate class for this > instead that encapsulates the sampling logic. That > class could possibly reside in Thread though as a > value object of Thread. > > I imagined that would be the case but was not sure. I > will look at the example that Robbin is talking about > (ThreadSMR) and will see how to refactor my code to use > that. > > Thanks again for your help, > > Jc > > > > Hope I have answered your questions and that > my feedback makes sense to you. > > You have and thank you for them, I think we are > getting to a cleaner implementation and things > are getting better and more readable :) > > > Yes it is getting better. > > Thanks, > /Erik > > > > Thanks for your help! 
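
One way to picture the "separate class that resides in Thread as a value object" idea is the sketch below. It assumes a fixed sampling interval for simplicity (an actual sampler may randomize the interval), and ThreadSketch, record_allocation and the accessor names are hypothetical; only the ThreadHeapSampler name and the bytes-until-sample counter come from the thread.

```cpp
#include <cassert>
#include <cstddef>

// Encapsulates the per-thread sampling state so that neither the TLAB nor
// Thread itself has to carry the sampling logic.
class ThreadHeapSampler {
 public:
  explicit ThreadHeapSampler(std::size_t sampling_interval)
      : _sampling_interval(sampling_interval),
        _bytes_until_sample(sampling_interval),
        _samples(0) {
    assert(sampling_interval > 0);
  }

  // Called for every allocation, TLAB or not. Returns true when this
  // allocation crosses the threshold and should be reported as a sample.
  bool record_allocation(std::size_t size_in_bytes) {
    if (size_in_bytes < _bytes_until_sample) {
      _bytes_until_sample -= size_in_bytes;
      return false;
    }
    _bytes_until_sample = _sampling_interval;  // fixed interval: simplification
    ++_samples;
    return true;
  }

  std::size_t bytes_until_sample() const { return _bytes_until_sample; }
  std::size_t samples() const { return _samples; }

 private:
  std::size_t _sampling_interval;
  std::size_t _bytes_until_sample;
  std::size_t _samples;
};

// Hypothetical thread type that holds the sampler by value.
class ThreadSketch {
 public:
  explicit ThreadSketch(std::size_t interval) : _heap_sampler(interval) {}
  ThreadHeapSampler& heap_sampler() { return _heap_sampler; }

 private:
  ThreadHeapSampler _heap_sampler;
};
```

Because the counter lives in the sampler rather than in the TLAB, a large allocation that never fits in a TLAB goes through the same record_allocation call as any other allocation.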
> > Jc > > Thanks, > /Erik > > I double checked by changing the test > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java > > > to use a smaller Tlab (2048) and made > the object bigger and it goes > through that and passes. > > Thanks again for your review and I look > forward to your pointers for > the questions I now have raised! > Jc > > > > > > > > Thanks, > /Erik > > > On 2018-01-26 06:45, JC Beyler wrote: > > Thanks Robbin for the reviews :) > > The new full webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/ > > The incremental webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/ > > > I inlined my answers: > > On Thu, Jan 25, 2018 at 1:15 AM, > Robbin Ehn > > > wrote: > > Hi JC, great to see another > revision! > > #### > heapMonitoring.cpp > > StackTraceData should not > contain the oop for 'safety' > reasons. > When StackTraceData is moved > from _allocated_traces: > L452 store_garbage_trace(trace); > it contains a dead oop. > _allocated_traces could > instead be a tupel of oop > and StackTraceData thus > dead oops are not kept. > > Done I used inheritance to make > the copier work regardless but the > idea is the same. > > You should use the new > Access API for loading the > oop, something like > this: > RootAccess AS_NO_KEEPALIVE>::load(...) > I don't think you need to > use Access API for clearing > the oop, but it > would > look nicer. And you > shouldn't probably be using: > Universe::heap()->is_in_reserved(value) > > I am unfamiliar with this but I > think I did do it like you wanted me > to (all tests pass so that's a > start). I'm not sure how to > clear the > oop exactly, is there somewhere > that does that, which I can use > to do > the same? 
> > I removed the is_in_reserved, > this came from our internal > version, I > don't know why it was there but > my tests work without so I > removed it > :) > > The lock: > L424? ?MutexLocker > mu(HeapMonitorStorage_lock); > Is not needed as far as I > can see. > weak_oops_do is called in a > safepoint, no TLAB > allocation can happen and > JVMTI thread can't access > these data-structures. Is > there something more > to > this lock that I'm missing? > > Since a thread can call the > JVMTI getLiveTraces (or any of > the other > ones), it can get to the point > of trying to copying the > _allocated_traces. I imagine it > is possible that this is happening > during a GC or that it can be > started and a GC happens afterwards. > Therefore, it seems to me that > you want this protected, no? > > #### > You have 6 files without any > changes in them (any more): > g1CollectedHeap.cpp > psMarkSweep.cpp > psParallelCompact.cpp > genCollectedHeap.cpp > referenceProcessor.cpp > thread.hpp > > Done. > > #### > I have not looked closely, > but is it possible to hide > heap sampling in > AllocTracer ? (with some > minor changes to the > AllocTracer API) > > I am imagining that you are > saying to move the code that > does the > sampling code (change the tlab > end, do the call to HeapMonitoring, > etc.) into the AllocTracer code > itself? I think that is right > and I'll > look if that is possible and > prepare a webrev to show what > would be > needed to make that happen. > > #### > Minor nit, when declaring > pointer there is a little > mix of having the > pointer adjacent by type > name and data name. (Most > hotspot code is by > type > name) > E.g. > heapMonitoring.cpp:711 > ?jvmtiStackTrace *trace = .... > heapMonitoring.cpp:733 > ? ?Method* m = vfst.method(); > (not just this file) > > Done! > > #### > HeapMonitorThreadOnOffTest.java:77 > I would make g_tmp volatile, > otherwise the assignment in > loop may > theoretical be skipped. > > Also done! > > Thanks again! 
> Jc

From jcbeyler at google.com Thu Apr 5 18:15:56 2018
From: jcbeyler at google.com (JC Beyler)
Date: Thu, 05 Apr 2018 18:15:56 +0000
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To:
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>

Message-ID:

Thanks Boris and Derek for testing it.

Yes, I was trying to get a new version out that had the tests ported as well, but I got sidetracked while trying to add tests and two new features.

Here is the incremental webrev:

Here is the full webrev: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/

Basically, the new tests assert this:

- Only one agent can currently ask for the sampling. I'm currently seeing if I can push the multi-agent support to a later webrev, so that I can start a code freeze on this one.
- The event is not thread-enabled, meaning that, like the VMObjectAllocationEvent, it's an all-or-nothing event. As with the multi-agent support, I'm going to see if adding it in a future webrev is a better idea, to freeze this webrev a bit.

There was another item that I added here, and I'm unsure this webrev is stable in debug mode: I added an assertion system to ascertain that all paths leading to a TLAB slow path (and hence a sampling point) have a sampling collector ready to post the event if a user wants it. This might break a few things in debug mode, as I'm working through the kinks of that as well. However, in release mode, this new webrev passes all the tests in hotspot/jtreg/serviceability/jvmti/HeapMonitor.

Let me know what you think,
Jc

On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich wrote:

> Hi JC,
>
> I have just checked on arm32: your patch compiles and runs ok.
>
> As far as I can see, the jtreg agentlib name "-agentlib:HeapMonitor" does not
> correspond to the actual library name: libHeapMonitorTest.c ->
> libHeapMonitorTest.so
>
> Boris
>
> On 04.04.2018 01:54, White, Derek wrote:
> > Thanks JC,
> >
> > New patch applies cleanly. Compiles and runs (simple test programs) on
> > aarch64.
> >
> > * Derek
> > I'm not convinced but I'm talking to Serguei
> > about it to see/assess :)
> >
> > - Also added two tests for the new event subsystem
> >
> > C) Removed the slow_path fields inside the TLAB code since
> > now FastTLABRefill is deprecated
> >
> > D) Updated the JVMTI documentation and specification for the
> > methods.
> >
> > So the incremental webrev is here:
> >
> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/
> >
> > and the full webrev is here:
> >
> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10
> >
> > I believe I have updated the various JIRA issues that track
> > this :)
> >
> > Thanks for your input,
> >
> > Jc
> >
> > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler wrote:
> >
> > Hi Erik,
> >
> > I inlined my answers, which the last one seems to answer
> > Robbin's concerns about the same thing (adding things to
> > Thread).
> >
> > On Wed, Feb 14, 2018 at 2:51 AM, Erik Österlund wrote:
> >
> > Hi JC,
> >
> > Comments are inlined below.
> >
> > On 2018-02-13 06:18, JC Beyler wrote:
> >
> > Hi Erik,
> >
> > Thanks for your answers, I've now inlined my own
> > answers/comments.
> >
> > I've done a new webrev here:
> >
> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/
> >
> > The incremental is here:
> >
> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/
> >
> > Note to all:
> >
> > - I've been integrating changes from
> > Erin/Serguei/David comments so this webrev
> > incremental is a bit an answer to all comments
> > in one. I apologize for that :)
> >
> > On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund wrote:
> >
> > Hi JC,
> >
> > Sorry for the delayed reply.
> > > > Inlined answers: > > > > > > > > On 2018-02-06 00:04, JC Beyler wrote: > > > > Hi Erik, > > > > (Renaming this to be folded into the > > newly renamed thread :)) > > > > First off, thanks a lot for reviewing > > the webrev! I appreciate it! > > > > I updated the webrev to: > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ > > < > http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.05a/> > > > > And the incremental one is here: > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ > > < > http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.04_05a/> > > > > It contains: > > - The change for since from 9 to 11 for > > the jvmti.xml > > - The use of the OrderAccess for > initialized > > - Clearing the oop > > > > I also have inlined my answers to your > > comments. The biggest question > > will come from the multiple *_end > > variables. A bit of the logic there > > is due to handling the slow path refill > > vs fast path refill and > > checking that the rug was not pulled > > underneath the slowpath. I > > believe that a previous comment was that > > TlabFastRefill was going to > > be deprecated. > > > > If this is true, we could revert this > > code a bit and just do a : if > > TlabFastRefill is enabled, disable this. > > And then deprecate that when > > TlabFastRefill is deprecated. > > > > This might simplify this webrev and I > > can work on a follow-up that > > either: removes TlabFastRefill if Robbin > > does not have the time to do > > it or add the support to the assembly > > side to handle this correctly. > > What do you think? > > > > I support removing TlabFastRefill, but I > > think it is good to not depend on that > > happening first. 
> >
> > I'm slowly pushing on the FastTLABRefill
> > (https://bugs.openjdk.java.net/browse/JDK-8194084),
> > I agree on keeping both separate for now though
> > so that we can think of both differently
> >
> > Now, below, inlined are my answers:
> >
> > On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund wrote:
> >
> > Hi JC,
> >
> > Hope I am reviewing the right version of your work. Here goes...
> >
> > src/hotspot/share/gc/shared/collectedHeap.inline.hpp:
> >
> > 159     AllocTracer::send_allocation_outside_tlab(klass, result, size * HeapWordSize, THREAD);
> > 160
> > 161     THREAD->tlab().handle_sample(THREAD, result, size);
> > 162     return result;
> > 163   }
> >
> > Should not call tlab()->X without checking if (UseTLAB) IMO.
> >
> > Done!
> >
> > More about this later.
> >
> > src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp:
> >
> > So first of all, there seem to be quite a few ends. There is an "end",
> > a "hard end", a "slow path end", and an "actual end". Moreover, it seems
> > like the "hard end" is actually further away than the "actual end". So
> > the "hard end" seems like more of a "really definitely actual end" or
> > something. I don't know about you, but I think it looks kind of messy.
> > In particular, I don't feel like the name "actual end" reflects what it
> > represents, especially when there is another end that is behind the
> > "actual end".
> >
> > 413 HeapWord* ThreadLocalAllocBuffer::hard_end() {
> > 414   // Did a fast TLAB refill occur?
> > 415   if (_slow_path_end != _end) {
> > 416     // Fix up the actual end to be now the end of this TLAB.
> > 417     _slow_path_end = _end;
> > 418     _actual_end = _end;
> > 419   }
> > 420
> > 421   return _actual_end + alignment_reserve();
> > 422 }
> >
> > I really do not like making getters
> > unexpectedly have these kind of side
> > effects.
It is not expected that > > when you ask for the "hard end", you > > implicitly update the "slow path > > end" and "actual end" to new values. > > > > As I said, a lot of this is due to the > > FastTlabRefill. If I make this > > not supporting FastTlabRefill, this goes > > away. The reason the system > > needs to update itself at the get is > > that you only know at that get if > > things have shifted underneath the tlab > > slow path. I am not sure of > > really better names (naming is hard!), > > perhaps we could do these > > names: > > > > - current_tlab_end // Either the > > allocated tlab end or a sampling point > > - last_allocation_address // The end of > > the tlab allocation > > - last_slowpath_allocated_end // In > > case a fast refill occurred the > > end might have changed, this is to > > remember slow vs fast past refills > > > > the hard_end method can be renamed to > > something like: > > tlab_end_pointer() // The end of > > the lab including a bit of > > alignment reserved bytes > > > > Those names sound better to me. Could you > > please provide a mapping from the old names > > to the new names so I understand which one > > is which please? > > > > This is my current guess of what you are > > proposing: > > > > end -> current_tlab_end > > actual_end -> last_allocation_address > > slow_path_end -> last_slowpath_allocated_end > > hard_end -> tlab_end_pointer > > > > Yes that is correct, that was what I was > proposing. > > > > I would prefer this naming: > > > > end -> slow_path_end // the end for taking a > > slow path; either due to sampling or > refilling > > actual_end -> allocation_end // the end for > > allocations > > slow_path_end -> last_slow_path_end // last > > address for slow_path_end (as opposed to > > allocation_end) > > hard_end -> reserved_end // the end of the > > reserved space of the TLAB > > > > About setting things in the getter... that > > still seems like a very unpleasant thing to > > me. 
It would be better to inspect the call > > hierarchy and explicitly update the ends > > where they need updating, and assert in the > > getter that they are in sync, rather than > > implicitly setting various ends as a > > surprising side effect in a getter. It looks > > like the call hierarchy is very small. With > > my new naming convention, reserved_end() > > would presumably return _allocation_end + > > alignment_reserve(), and have an assert > > checking that _allocation_end == > > _last_slow_path_allocation_end, complaining > > that this invariant must hold, and that a > > caller to this function, such as > > make_parsable(), must first explicitly > > synchronize the ends as required, to honor > > that invariant. > > > > > > I've renamed the variables to how you preferred > > it except for the _end one. I did: > > > > current_end > > > > last_allocation_address > > > > tlab_end_ptr > > > > The reason is that the architecture dependent > > code use the thread.hpp API and it already has > > tlab included into the name so it becomes > > tlab_current_end (which is better that > > tlab_current_tlab_end in my opinion). > > > > I also moved the update into a separate method > > with a TODO that says to remove it when > > FastTLABRefill is deprecated > > > > This looks a lot better now. Thanks. > > > > Note that the following comment now needs updating > > accordingly in threadLocalAllocBuffer.hpp: > > > > 41 // Heap sampling is performed via > > the end/actual_end fields. > > > > 42 // actual_end contains the real end > > of the tlab allocation, > > > > 43 // whereas end can be set to an > > arbitrary spot in the tlab to > > > > 44 // trip the return and sample the > > allocation. > > > > 45 // slow_path_end is used to track > > if a fast tlab refill occured > > > > 46 // between slowpath calls. > > > > There might be other comments too, I have not looked > > in detail. > > > > This was the only spot that still had an actual_end, I > > fixed it now. 
I'll do a sweep to double check other > > comments. > > > > > > > > Not sure it's better but before updating > > the webrev, I wanted to try > > to get input/consensus :) > > > > (Note hard_end was always further off > > than end). > > > > src/hotspot/share/prims/jvmti.xml: > > > > 10357 > id="can_sample_heap" since="9"> > > 10358 > > 10359 Can sample the heap. > > 10360 If this capability > > is enabled then the heap sampling > > methods > > can be called. > > 10361 > > 10362 > > > > Looks like this capability should > > not be "since 9" if it gets > integrated > > now. > > > > Updated now to 11, crossing my fingers :) > > > > > src/hotspot/share/runtime/heapMonitoring.cpp: > > > > 448 if > > (is_alive->do_object_b(value)) { > > 449 // Update the oop to > > point to the new object if it is > still > > alive. > > 450 > f->do_oop(&(trace.obj)); > > 451 > > 452 // Copy the old > > trace, if it is still live. > > 453 > > > _allocated_traces->at_put(curr_pos++, trace); > > 454 > > 455 // Store the live > > trace in a cache, to be served up on > > /heapz. > > 456 > > > _traces_on_last_full_gc->append(trace); > > 457 > > 458 count++; > > 459 } else { > > 460 // If the old trace > > is no longer live, add it to the > list of > > 461 // recently collected > > garbage. > > 462 > > store_garbage_trace(trace); > > 463 } > > > > In the case where the oop was not > > live, I would like it to be > explicitly > > cleared. > > > > Done I think how you wanted it. Let me > > know because I'm not familiar > > with the RootAccess API. I'm unclear if > > I'm doing this right or not so > > reviews of these parts are highly > > appreciated. Robbin had talked of > > perhaps later pushing this all into a > > OopStorage, should I do this now > > do you think? Or can that wait a second > > webrev later down the road? > > > > I think using handles can and should be done > > later. You can use the Access API now. 
> > I noticed that you are missing an #include > > "oops/access.inline.hpp" in your > > heapMonitoring.cpp file. > > > > The missing header is there for me so I don't > > know, I made sure it is present in the latest > > webrev. Sorry about that. > > > > + Did I clear it the way you wanted me > > to or were you thinking of > > something else? > > > > > > That is precisely how I wanted it to be > > cleared. Thanks. > > > > + Final question here, seems like if I > > were to want to not do the > > f->do_oop directly on the trace.obj, I'd > > need to do something like: > > > > f->do_oop(&value); > > ... > > trace->store_oop(value); > > > > to update the oop internally. Is that > > right/is that one of the > > advantages of going to the Oopstorage > > sooner than later? > > > > > > I think you really want to do the do_oop on > > the root directly. Is there a particular > > reason why you would not want to do that? > > Otherwise, yes - the benefit with using the > > handle approach is that you do not need to > > call do_oop explicitly in your code. > > > > There is no reason except that now we have a > > load_oop and a get_oop_addr, I was not sure what > > you would think of that. > > > > That's fine. > > > > Also I see a lot of > > concurrent-looking use of the > > following field: > > 267 volatile bool _initialized; > > > > Please note that the "volatile" > > qualifier does not help with > reordering > > here. Reordering between volatile > > and non-volatile fields is > > completely free > > for both compiler and hardware, > > except for windows with MSVC, where > > volatile > > semantics is defined to use > > acquire/release semantics, and the > > hardware is > > TSO. But for the general case, I > > would expect this field to be stored > > with > > OrderAccess::release_store and > > loaded with > OrderAccess::load_acquire. > > Otherwise it is not thread safe. > > > > Because everything is behind a mutex, I > > wasn't really worried about > > this. 
I have a test that has multiple > > threads trying to hit this > > corner case and it passes. > > > > However, to be paranoid, I updated it to > > using the OrderAccess API > > now, thanks! Let me know what you think > > there too! > > > > > > If it is indeed always supposed to be read > > and written under a mutex, then I would > > strongly prefer to have it accessed as a > > normal non-volatile member, and have an > > assertion that given lock is held or we are > > in a safepoint, as we do in many other > > places. Something like this: > > > > > assert(HeapMonitorStorage_lock->owned_by_self() > > || (SafepointSynchronize::is_at_safepoint() > > && Thread::current()->is_VM_thread()), "this > > should not be accessed concurrently"); > > > > It would be confusing to people reading the > > code if there are uses of OrderAccess that > > are actually always protected under a mutex. > > > > Thank you for the exact example to be put in the > > code! I put it around each access/assignment of > > the _initialized method and found one case where > > yes you can touch it and not have the lock. It > > actually is "ok" because you don't act on the > > storage until later and only when you really > > want to modify the storage (see the > > object_alloc_do_sample method which calls the > > add_trace method). > > > > But, because of this, I'm going to put the > > OrderAccess here, I'll do some performance > > numbers later and if there are issues, I might > > add a "unsafe" read and a "safe" one to make it > > explicit to the reader. But I don't think it > > will come to that. > > > > > > Okay. This double return in heapMonitoring.cpp looks > > wrong: > > > > 283 bool initialized() { > > 284 return > > OrderAccess::load_acquire(&_initialized) != 0; > > 285 return _initialized; > > 286 } > > > > Since you said object_alloc_do_sample() is the only > > place where you do not hold the mutex while reading > > initialized(), I had a closer look at that. 
> > It looks
> > like in its current shape, the lack of a mutex may
> > lead to a memory leak. In particular, it first
> > checks if (initialized()). Let's assume this is now
> > true. It then allocates a bunch of stuff, and checks
> > if the number of frames were over 0. If they were,
> > it calls StackTraceStorage::storage()->add_trace()
> > seemingly hoping that after grabbing the lock in
> > there, initialized() will still return true. But it
> > could now return false and skip doing anything, in
> > which case the allocated stuff will never be freed.
> >
> > I fixed this now by making add_trace return a boolean
> > and checking for that. It will be in the next webrev.
> > Thanks, the truth is that in our implementation the
> > system is always on or off, so this never really occurs
> > :). In this version though, that is not true and it's
> > important to handle so thanks again!
> >
> > So the analysis seems to be that _initialized is
> > only used outside of the mutex in one instance,
> > where it is used to perform double-checked locking,
> > that actually causes a memory leak.
> >
> > I am not proposing how to fix that, just raising the
> > issue. If you still want to perform this
> > double-checked locking somehow, then the use of
> > acquire/release still seems odd. Because the memory
> > ordering restrictions of it never comes into play in
> > this particular case. If it ever did, then the use
> > of destroy_stuff(); release_store(_initialized, 0)
> > would be broken anyway as that would imply that
> > whatever concurrent reader there ever was would
> > after reading _initialized with load_acquire() could
> > *never* read the data that is concurrently destroyed
> > anyway. I would be biased to think that
> > RawAccess::load/store looks like a more
> > appropriate solution, given that the memory leak
> > issue is resolved. I do not know how painful it
> > would be to not perform this double-checked locking.
> >
> > So I agree with this entirely.
I looked also a bit more > > and the difference and code really stems from our > > internal version. In this version however, there are > > actually a lot of things going on that I did not go > > entirely through in my head but this comment made me > > ponder a bit more on it. > > > > Since every object_alloc_do_sample is protected by a > > check to HeapMonitoring::enabled(), there is only a > > small chance that the call is happening when things have > > been disabled. So there is no real need to do a first > > check on the initialized, it is a rare occurence that a > > call happens to object_alloc_do_sample and the > > initialized of the storage returns false. > > > > (By the way, even if you did call object_alloc_do_sample > > without looking at HeapMonitoring::enabled(), that would > > be ok too. You would gather the stacktrace and get > > nowhere at the add_trace call, which would return false; > > so though not optimal performance wise, nothing would > > break). > > > > Furthermore, the add_trace is really the moment of no > > return and we have the mutex lock and then the > > initialized check. So, in the end, I did two things: I > > removed that first check and then I removed the > > OrderAccess for the storage initialized. I think now I > > have a better grasp and understanding why it was done in > > our code and why it is not needed here. Thanks for > > pointing it out :). This now still passes my JTREG > > tests, especially the threaded one. > > > > > > > > As a kind of meta comment, I wonder > > if it would make sense to add > sampling > > for non-TLAB allocations. Seems like > > if someone is rapidly allocating a > > whole bunch of 1 MB objects that > > never fit in a TLAB, I might still be > > interested in seeing that in my > > traces, and not get surprised that > the > > allocation rate is very high yet not > > showing up in any profiles. 
> > > > That is handled by the handle_sample > > where you wanted me to put a > > UseTlab because you hit that case if the > > allocation is too big. > > > > > > I see. It was not obvious to me that > > non-TLAB sampling is done in the TLAB class. > > That seems like an abstraction crime. > > What I wanted in my previous comment was > > that we do not call into the TLAB when we > > are not using TLABs. If there is sampling > > logic in the TLAB that is used for something > > else than TLABs, then it seems like that > > logic simply does not belong inside of the > > TLAB. It should be moved out of the TLAB, > > and instead have the TLAB call this common > > abstraction that makes sense. > > > > So in the incremental version: > > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ > > < > http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.07_08/>, > > this is still a "crime". The reason is that the > > system has to have the bytes_until_sample on a > > per-thread level and it made "sense" to have it > > with the TLAB implementation. Also, I was not > > sure how people felt about adding something to > > the thread instance instead. > > > > Do you think it fits better at the Thread level? > > I can see how difficult it is to make it happen > > there and add some logic there. Let me know what > > you think. > > > > > > We have an unfortunate situation where everyone that > > has some fields that are thread local tend to dump > > them right into Thread, making the size and > > complexity of Thread grow as it becomes tightly > > coupled with various unrelated subsystems. It would > > be desirable to have a separate class for this > > instead that encapsulates the sampling logic. That > > class could possibly reside in Thread though as a > > value object of Thread. > > > > I imagined that would be the case but was not sure. I > > will look at the example that Robbin is talking about > > (ThreadSMR) and will see how to refactor my code to use > > that. 
> >
> > Thanks again for your help,
> >
> > Jc
> >
> > Hope I have answered your questions and that
> > my feedback makes sense to you.
> >
> > You have and thank you for them, I think we are
> > getting to a cleaner implementation and things
> > are getting better and more readable :)
> >
> > Yes it is getting better.
> >
> > Thanks,
> > /Erik
> >
> > Thanks for your help!
> >
> > Jc
> >
> > Thanks,
> > /Erik
> >
> > I double checked by changing the test

From christoph.langer at sap.com Fri Apr 6 15:01:41 2018
From: christoph.langer at sap.com (Langer, Christoph)
Date: Fri, 6 Apr 2018 15:01:41 +0000
Subject: RFR (M): 8201247: Various cleanups in the attach framework
Message-ID: <14dff9b0cf5a4b888aef1d6452801b57@sap.com>

Hi,

can I please get reviews for a set of clean up changes that I came across when doing some integration work.

Bug: https://bugs.openjdk.java.net/browse/JDK-8201247
Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/

Detailed comments about the changes can be found in the bug.

Thanks & best regards
Christoph

From Pietro.Paolini at alfasystems.com Fri Apr 6 16:13:49 2018
From: Pietro.Paolini at alfasystems.com (Pietro Paolini)
Date: Fri, 6 Apr 2018 16:13:49 +0000
Subject: inspect a thread's stack
Message-ID: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>

Hi all,

I apologise if this is not the right ML for it but I couldn't find exactly what I was looking for when Googling the problem.

I am a bit new to the JDI world. I would like to inspect the stack-frame of a specific thread, I came across the StackFrame/ThreadReference classes but I couldn't find any examples where their usage is shown without connecting to the VM somehow, like a debugger would do.

Is it possible to inspect a thread's stack "locally"?
In my mind I could be able to have a function such as:

    static void hook(Thread thread) {
        thread.wait() // stop that thread
        // inspect the frames of that thread doing any needed business with them
    }

I'd need this for diagnostic purposes of my application.

Thanks,
Pietro

Pietro Paolini
Consultant
Alfa

From jini.george at oracle.com Fri Apr 6 16:21:50 2018
From: jini.george at oracle.com (Jini George)
Date: Fri, 6 Apr 2018 21:51:50 +0530
Subject: RFR: JDK-8174994: SA: clhsdb printmdo throws WrongTypeException when attached to a process with CDS
Message-ID: <93a055d6-133c-f92f-3408-368eea959326@oracle.com>

Hello!

Requesting reviews for:

https://bugs.openjdk.java.net/browse/JDK-8174994
Webrev: http://cr.openjdk.java.net/~jgeorge/8174994/webrev.00/

While trying to identify the type given an address, a WrongTypeException was getting thrown with various clhsdb commands (like printmdo, jstack, etc). This was since SA tries to map an address to a hotspot C++ type by comparing the vtable address to the vtable address values of known types. With CDS, since the vtables are copied over for the Metadata classes, the vtable addresses themselves don't match (though, of course, the contents will), and SA errors out.

The fix has been implemented by making changes to read in the md region (consisting of the c++ vtables) of the CDS archive in SA, and mapping the vtable addresses to the corresponding metadata type (ConstantPool, InstanceKlass, InstanceClassLoaderKlass, InstanceMirrorKlass, InstanceRefKlass, Method, ObjArrayKlass, TypeArrayKlass).

For corefiles, an additional modification has been done to have the replicated FileMapHeader structure (from src/hotspot/share/memory/filemap.hpp, which is replicated in SA in ps_core.c), to be in sync with the corresponding definition in src/hotspot/share/memory/filemap.hpp.

Test cases to test both live and corefile debugging are being added with this. These and other SA tests pass on Mach5.

Thanks,
Jini.

From chris.plummer at oracle.com Fri Apr 6 16:37:09 2018
From: chris.plummer at oracle.com (Chris Plummer)
Date: Fri, 6 Apr 2018 09:37:09 -0700
Subject: RFR (M): 8201247: Various cleanups in the attach framework
In-Reply-To: <14dff9b0cf5a4b888aef1d6452801b57@sap.com>
References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com>
Message-ID: <91d75e2d-47a4-e9ee-5d19-8f3e6dc13428@oracle.com>

Hi Christoph,

Can you explain a bit more about "fix handling of null values in ArgumentIterator::next". When does this turn up? Is there a test case?

Everything else looks good.

thanks,

Chris

On 4/6/18 8:01 AM, Langer, Christoph wrote:
>
> Hi,
>
> can I please get reviews for a set of clean up changes that I came
> across when doing some integration work.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8201247
>
> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/
>
> Detailed comments about the changes can be found in the bug.
>
> Thanks & best regards
>
> Christoph
>

From karen.kinnear at oracle.com Fri Apr 6 21:40:47 2018
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Fri, 6 Apr 2018 17:40:47 -0400
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To:
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>

Message-ID:

JC,

Thank you for the updates - really glad you are including the compiler folks.

I reviewed the version before this one, so ignore any comments you've already covered (although I did peek at the latest)

1. JDK-8194905 CSR - could you please delete attachments that are not current. It is a bit confusing right now. I have been looking at jvmti_event6.html. I am assuming the rest are obsolete and could be removed please.

2. In jvmti_event6.html under Sampled Object Allocation, there is a link to "Heap Sampling Monitoring System". It takes me to the top of the page - seems like something is missing in defining it?

3. Scope of memory allocation tracking

I am struggling to understand the extent of memory allocation tracking that you are looking for (probably want to clarify in the JEP and CSR once we work this through).

e.g. Heap Sampler vs. JVMTI VMObjectAllocEvent

So the current jvmtiVMObjectAllocEvent says:
Sent when a method causes the VM to allocate an Object visible to Java and allocation not detectable by other instrumentation mechanisms.
Generally detect by instrumenting bytecodes of allocating methods
JNI - use JNI function interception
e.g. Reflection: java.lang.Class.newInstance()
e.g. VM intrinsics
comment:
Not generated due to bytecodes - e.g. new and newarray VM instructions
Not allocation due to JNI: e.g. AllocObject
NOT VM internal objects
NOT allocations during VM init

So from the JEP I can't tell the intended scope of the new event - is this intended to cover all heap allocation?
bytecodes
JVM_*
JNI_*
internal VM objects
other? (I'm not sure what other there are)
- I presume not allocations during VM init - since sent only during live phase

OR - is the primary goal to cover allocation for bytecodes so folks can skip instrumentation?

OR - do you want to get performance numbers and see what is low enough overhead before deciding?

4.
The design question is where to put the collectors in the source base - and that of course strongly depends on the scope of the information you want to collect, and on the performance overhead we are willing to incur. I was trying to figure out a way to put the collectors farther down the call stack so as to both catch more cases and to reduce the maintenance burden - i.e. if you were to add a new code generator, e.g. Graal - if it were to go through an existing interface, that might be a place to already have a collector. I do not know the Graal sources - I did look at jvmci/jvmciRuntime.cpp - and it appears that there are calls to instanceKlass::new_instance, oopFactory::new_typeArray/new_ObjArray and ArrayKlass::multi-allocate, so one possibility would be to put hooks in those calls which would catch many? (I did not do a thorough search) of the slowpath calls for the bytecodes, and then check the fast paths in detail. I had wondered if it made sense to move the hooks even farther down, into CollectedHeap:obj_allocate and array_allocate. I do not think so. First reason is that for multidimensional arrays, ArrayKlass::multi_allocate the outer dimension array would have an event before storing the inner sub-arrays and I don?t think we want that exposed, so that won?t work for arrays. The second reason is that I strongly suspect the scope you want is bytecodes only. I think once you have added hooks to all the fast paths and slow paths that this will be pushing the performance overhead constraints you proposed and you won?t want to see e.g. internal allocations. But I think you need to experiment with the set of allocations (or possible alternative sets of allocations) you want recorded. 
The hooks I see today include:

Interpreter: (looking at x86 as a sample)
  - slowpath in InterpreterRuntime
  - fastpath tlab allocation - your new threshold check handles that
  - allow_shared_alloc (GC specific): for _new isn't handled

C1
  I don't see changes in c1_Runtime.cpp
  note: you also want to look for the fast path

C2: changes in opto/runtime.cpp for slow path
  did you also catch the fast path?

5. Performance - After you get all the collectors added - you need to rerun the performance numbers.

thanks,
Karen

> On Apr 5, 2018, at 2:15 PM, JC Beyler wrote:
>
> Thanks Boris and Derek for testing it.
>
> Yes I was trying to get a new version out that had the tests ported as well but got sidetracked while trying to add tests and two new features.
>
> Here is the incremental webrev:
>
> Here is the full webrev:
> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/
>
> Basically, the new tests assert this:
> - Only one agent can currently ask for the sampling, I'm currently seeing if I can push to a next webrev the multi-agent support to start doing a code freeze on this one
> - The event is not thread-enabled, meaning like the VMObjectAllocationEvent, it's an all or nothing event; same as the multi-agent, I'm going to see if a future webrev to add the support is a better idea to freeze this webrev a bit
>
> There was another item that I added here and I'm unsure this webrev is stable in debug mode: I added an assertion system to ascertain that all paths leading to a TLAB slow path (and hence a sampling point) have a sampling collector ready to post the event if a user wants it. This might break a few things in debug mode as I'm working through the kinks of that as well. However, in release mode, this new webrev passes all the tests in hotspot/jtreg/serviceability/jvmti/HeapMonitor.
>
> Let me know what you think,
> Jc
>
> On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich wrote:
> Hi JC,
>
> I have just checked on arm32: your patch compiles and runs ok.
> > As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not > correspond to actual library name: libHeapMonitorTest.c -> > libHeapMonitorTest.so > > Boris > > On 04.04.2018 01:54, White, Derek wrote: > > Thanks JC, > > > > New patch applies cleanly. Compiles and runs (simple test programs) on > > aarch64. > > > > * Derek > > > > *From:* JC Beyler [mailto:jcbeyler at google.com ] > > *Sent:* Monday, April 02, 2018 1:17 PM > > *To:* White, Derek > > > *Cc:* Erik Österlund >; > > serviceability-dev at openjdk.java.net ; hotspot-compiler-dev > > > > > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling > > > > Hi Derek, > > > > I know there were a few things that went in that provoked a merge > > conflict. I worked on it and got it up to date. Sadly my lack of > > knowledge makes it a full rebase instead of keeping all the history. > > However, with a newly cloned jdk/hs you should now be able to use: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/ > > > > The change you are referring to was done with the others so perhaps you > > were unlucky and I forgot it in a webrev and fixed it in another? I > > don't know but it's been there and I checked, it is here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html > > > > I double checked that tlab_end_offset no longer appears in any > > architecture (as far as I can tell :)). > > > > Thanks for testing and let me know if you run into any other issues! > > > > Jc > > > > On Fri, Mar 30, 2018 at 4:24 PM White, Derek > > >> wrote: > > > > Hi Jc, > > > > I've been having trouble getting your patch to apply correctly. I > > may have based it on the wrong version. > > > > In any case, I think there's a missing update to > > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), > > where "JavaThread::tlab_end_offset()" should become > > "JavaThread::tlab_current_end_offset()".
> > > > This should correspond to the other port's changes in > > templateTable_.cpp files. > > > > Thanks! > > - Derek > > > > *From:* hotspot-compiler-dev > > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net > > >] *On Behalf > > Of *JC Beyler > > *Sent:* Wednesday, March 28, 2018 11:43 AM > > *To:* Erik Österlund > > >> > > *Cc:* serviceability-dev at openjdk.java.net > > >; hotspot-compiler-dev > > > > >> > > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling > > > > Hi all, > > > > I've been working on deflaking the tests mostly and the wording in > > the JVMTI spec. > > > > Here are the two incremental webrevs: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ > > > > Here is the total webrev: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ > > > > Here are the notes of this change: > > > > - Currently the tests pass 100 times in a row, I am working on > > checking if they pass 1000 times in a row. > > > > - The default sampling rate is set to 512k, this is what we use > > internally and having a default means that to enable the sampling > > with the default, the user only has to do an enable event/disable > > event via JVMTI (instead of enable + set sample rate). > > > > - I deprecated the code that was handling the fast path tlab > > refill if it happened since this is now deprecated > > > > - Though I saw that Graal is still using it so I have to see > > what needs to be done there exactly > > > > Finally, using the Dacapo benchmark suite, I noted a 1% overhead for > > when the event system is turned on and the callback to the native > > agent is just empty. I got a 3% overhead with a 512k sampling rate > > with the code I put in the native side of my tests.
> > > > Thanks and comments are appreciated, > > > > Jc > > > > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler > > >> wrote: > > > > Hi all, > > > > The incremental webrev update is here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ > > > > The full webrev is here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ > > > > Major change here is: > > > > - I've removed the heapMonitoring.cpp code in favor of just > > having the sampling events as per Serguei's request; I still > > have to do some overhead measurements but the tests prove the > > concept can work > > > > - Most of the tlab code is unchanged, the only major > > part is that now things get sent off to event collectors when > > used and enabled. > > > > - Added the interpreter collectors to handle interpreter > > execution > > > > - Updated the name from SetTlabHeapSampling to > > SetHeapSampling to be more generic > > > > - Added a mutex for the thread sampling so that we can > > initialize an internal static array safely > > > > - Ported the tests from the old system to this new one > > > > I've also updated the JEP and CSR to reflect these changes: > > > > https://bugs.openjdk.java.net/browse/JDK-8194905 > > > > https://bugs.openjdk.java.net/browse/JDK-8171119 > > > > In order to make this have some forward progress, I've removed > > the heap sampling code entirely and now rely entirely on the > > event sampling system. The tests reflect this by using a > > simplified implementation of what an agent could do: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c > > > > (Search for anything mentioning event_storage). > > > > I have not taken the time to port the whole code we had > > originally in heapMonitoring to this. I hesitate only because > > that code was in C++, I'd have to port it to C and this is for > > tests so perhaps what I have now is good enough? 
> > > > As far as testing goes, I've ported all the relevant tests and > > then added a few: > > > > - Turning the system on/off > > > > - Testing using various GCs > > > > - Testing using the interpreter > > > > - Testing the sampling rate > > > > - Testing with objects and arrays > > > > - Testing with various threads > > > > Finally, as overhead goes, I have the numbers of the system off > > vs a clean build and I have 0% overhead, which is what we'd > > want. This was using the Dacapo benchmarks. I am now preparing > > to run a version with the events on using dacapo and will report > > back here. > > > > Any comments are welcome :) > > > > Jc > > > > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler > > >> wrote: > > > > Hi all, > > > > I apologize for the delay but I wanted to add an event > > system and that took a bit longer than expected and I also > > reworked the code to take into account the deprecation of > > FastTLABRefill. > > > > This update has four parts: > > > > A) I moved the implementation from Thread to > > ThreadHeapSampler inside of Thread. Would you prefer it as a > > pointer inside of Thread or like this works for you? Second > > question would be would you rather have an association > > outside of Thread altogether that tries to remember when > > threads are live and then we would have something like: > > > > ThreadHeapSampler::get_sampling_size(this_thread); > > > > I worry about the overhead of this but perhaps it is not too > > too bad? > > > > B) I also have been working on the Allocation event system > > that sends out a notification at each sampled event. This > > will be practical when wanting to do something at the > > allocation point. I'm also looking at if the whole > > heapMonitoring code could not reside in the agent code and > > not in the JDK. 
I'm not convinced but I'm talking to Serguei > > about it to see/assess :) > > > > - Also added two tests for the new event subsystem > > > > C) Removed the slow_path fields inside the TLAB code since > > now FastTLABRefill is deprecated > > > > D) Updated the JVMTI documentation and specification for the > > methods. > > > > So the incremental webrev is here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ > > > > and the full webrev is here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 > > > > I believe I have updated the various JIRA issues that track > > this :) > > > > Thanks for your input, > > > > Jc > > > > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler > > >> wrote: > > > > Hi Erik, > > > > I inlined my answers, which the last one seems to answer > > Robbin's concerns about the same thing (adding things to > > Thread). > > > > On Wed, Feb 14, 2018 at 2:51 AM, Erik Österlund > > > > >> wrote: > > > > Hi JC, > > > > Comments are inlined below. > > > > On 2018-02-13 06:18, JC Beyler wrote: > > > > Hi Erik, > > > > Thanks for your answers, I've now inlined my own > > answers/comments. > > > > I've done a new webrev here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ > > > > > > > The incremental is here: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ > > > > > > > Note to all: > > > > - I've been integrating changes from > > Erin/Serguei/David comments so this webrev > > incremental is a bit an answer to all comments > > in one. I apologize for that :) > > > > On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund > > > > >> wrote: > > > > Hi JC, > > > > Sorry for the delayed reply. > > > > Inlined answers: > > > > > > > > On 2018-02-06 00:04, JC Beyler wrote: > > > > Hi Erik, > > > > (Renaming this to be folded into the > > newly renamed thread :)) > > > > First off, thanks a lot for reviewing > > the webrev! I appreciate it!
> > > > I updated the webrev to: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ > > > > > > > And the incremental one is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ > > > > > > > It contains: > > - The change for since from 9 to 11 for > > the jvmti.xml > > - The use of the OrderAccess for initialized > > - Clearing the oop > > > > I also have inlined my answers to your > > comments. The biggest question > > will come from the multiple *_end > > variables. A bit of the logic there > > is due to handling the slow path refill > > vs fast path refill and > > checking that the rug was not pulled > > underneath the slowpath. I > > believe that a previous comment was that > > TlabFastRefill was going to > > be deprecated. > > > > If this is true, we could revert this > > code a bit and just do a : if > > TlabFastRefill is enabled, disable this. > > And then deprecate that when > > TlabFastRefill is deprecated. > > > > This might simplify this webrev and I > > can work on a follow-up that > > either: removes TlabFastRefill if Robbin > > does not have the time to do > > it or add the support to the assembly > > side to handle this correctly. > > What do you think? > > > > I support removing TlabFastRefill, but I > > think it is good to not depend on that > > happening first. > > > > > > I'm slowly pushing on the FastTLABRefill > > (https://bugs.openjdk.java.net/browse/JDK-8194084 ), > > I agree on keeping both separate for now though > > so that we can think of both differently > > > > Now, below, inlined are my answers: > > > > On Fri, Feb 2, 2018 at 8:44 AM, Erik > > Österlund > > > > >> wrote: > > > > Hi JC, > > > > Hope I am reviewing the right > > version of your work. Here goes...
> > > > src/hotspot/share/gc/shared/collectedHeap.inline.hpp: > > > > 159 > > AllocTracer::send_allocation_outside_tlab(klass, result, size * > > HeapWordSize, THREAD); > > 160 > > 161 > > THREAD->tlab().handle_sample(THREAD, result, size); > > 162 return result; > > 163 } > > > > Should not call tlab()->X without > > checking if (UseTLAB) IMO. > > > > Done! > > > > > > More about this later. > > > > src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: > > > > So first of all, there seems to > > quite a few ends. There is an "end", > > a "hard > > end", a "slow path end", and an > > "actual end". Moreover, it seems > > like the > > "hard end" is actually further away > > than the "actual end". So the "hard end" > > seems like more of a "really > > definitely actual end" or something. > > I don't > > know about you, but I think it looks > > kind of messy. In particular, I don't > > feel like the name "actual end" > > reflects what it represents, > > especially when > > there is another end that is behind > > the "actual end". > > > > 413 HeapWord* > > ThreadLocalAllocBuffer::hard_end() { > > 414 // Did a fast TLAB refill > > occur? > > 415 if (_slow_path_end != _end) { > > 416 // Fix up the actual end > > to be now the end of this TLAB. > > 417 _slow_path_end = _end; > > 418 _actual_end = _end; > > 419 } > > 420 > > 421 return _actual_end + > > alignment_reserve(); > > 422 } > > > > I really do not like making getters > > unexpectedly have these kind of side > > effects. It is not expected that > > when you ask for the "hard end", you > > implicitly update the "slow path > > end" and "actual end" to new values. > > > > As I said, a lot of this is due to the > > FastTlabRefill. If I make this > > not supporting FastTlabRefill, this goes > > away. The reason the system > > needs to update itself at the get is > > that you only know at that get if > > things have shifted underneath the tlab > > slow path. 
I am not sure of > > really better names (naming is hard!), > > perhaps we could do these > > names: > > > > - current_tlab_end // Either the > > allocated tlab end or a sampling point > > - last_allocation_address // The end of > > the tlab allocation > > - last_slowpath_allocated_end // In > > case a fast refill occurred the > > end might have changed, this is to > > remember slow vs fast past refills > > > > the hard_end method can be renamed to > > something like: > > tlab_end_pointer() // The end of > > the lab including a bit of > > alignment reserved bytes > > > > Those names sound better to me. Could you > > please provide a mapping from the old names > > to the new names so I understand which one > > is which please? > > > > This is my current guess of what you are > > proposing: > > > > end -> current_tlab_end > > actual_end -> last_allocation_address > > slow_path_end -> last_slowpath_allocated_end > > hard_end -> tlab_end_pointer > > > > Yes that is correct, that was what I was proposing. > > > > I would prefer this naming: > > > > end -> slow_path_end // the end for taking a > > slow path; either due to sampling or refilling > > actual_end -> allocation_end // the end for > > allocations > > slow_path_end -> last_slow_path_end // last > > address for slow_path_end (as opposed to > > allocation_end) > > hard_end -> reserved_end // the end of the > > reserved space of the TLAB > > > > About setting things in the getter... that > > still seems like a very unpleasant thing to > > me. It would be better to inspect the call > > hierarchy and explicitly update the ends > > where they need updating, and assert in the > > getter that they are in sync, rather than > > implicitly setting various ends as a > > surprising side effect in a getter. It looks > > like the call hierarchy is very small. 
With > > my new naming convention, reserved_end() > > would presumably return _allocation_end + > > alignment_reserve(), and have an assert > > checking that _allocation_end == > > _last_slow_path_allocation_end, complaining > > that this invariant must hold, and that a > > caller to this function, such as > > make_parsable(), must first explicitly > > synchronize the ends as required, to honor > > that invariant. > > > > > > I've renamed the variables to how you preferred > > it except for the _end one. I did: > > > > current_end > > > > last_allocation_address > > > > tlab_end_ptr > > > > The reason is that the architecture dependent > > code use the thread.hpp API and it already has > > tlab included into the name so it becomes > > tlab_current_end (which is better that > > tlab_current_tlab_end in my opinion). > > > > I also moved the update into a separate method > > with a TODO that says to remove it when > > FastTLABRefill is deprecated > > > > This looks a lot better now. Thanks. > > > > Note that the following comment now needs updating > > accordingly in threadLocalAllocBuffer.hpp: > > > > 41 // Heap sampling is performed via > > the end/actual_end fields. > > > > 42 // actual_end contains the real end > > of the tlab allocation, > > > > 43 // whereas end can be set to an > > arbitrary spot in the tlab to > > > > 44 // trip the return and sample the > > allocation. > > > > 45 // slow_path_end is used to track > > if a fast tlab refill occured > > > > 46 // between slowpath calls. > > > > There might be other comments too, I have not looked > > in detail. > > > > This was the only spot that still had an actual_end, I > > fixed it now. I'll do a sweep to double check other > > comments. > > > > > > > > Not sure it's better but before updating > > the webrev, I wanted to try > > to get input/consensus :) > > > > (Note hard_end was always further off > > than end). 
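The end-pointer scheme discussed above can be illustrated with a small standalone sketch. This is not HotSpot code: the class SketchTlab and its methods are hypothetical, and offsets are plain byte counts rather than HeapWord pointers. Only the roles of the fields come from the thread: a current end that can be pulled in to trip the slow path at a sampling point, an allocation end marking the real end of the buffer, and a reserved end derived from it. Following the review comment, the getter has no side effects and the ends are updated explicitly on the slow path.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a TLAB; not the HotSpot ThreadLocalAllocBuffer.
class SketchTlab {
 public:
  SketchTlab(size_t capacity, size_t alignment_reserve)
      : _top(0),
        _current_end(capacity),
        _allocation_end(capacity),
        _alignment_reserve(alignment_reserve) {}

  // Pull the current end in so that the allocation crossing `bytes`
  // from now fails the fast path and reaches the slow path.
  void set_sample_point(size_t bytes) {
    size_t point = _top + bytes;
    _current_end = point < _allocation_end ? point : _allocation_end;
  }

  // Fast path: bump the top only if the request fits below the current end.
  bool fast_allocate(size_t size) {
    if (_top + size > _current_end) return false;
    _top += size;
    return true;
  }

  // Slow path: returns false if the buffer is really full (a refill would
  // be needed). Otherwise allocates, reports whether a sampling point was
  // crossed, and explicitly restores the current end.
  bool slow_allocate(size_t size, bool* sampled) {
    *sampled = false;
    if (_top + size > _allocation_end) return false;
    *sampled = (_current_end != _allocation_end);
    _current_end = _allocation_end;
    _top += size;
    return true;
  }

  // Pure getter: end of the buffer including the alignment reserve.
  size_t reserved_end() const { return _allocation_end + _alignment_reserve; }

 private:
  size_t _top;
  size_t _current_end;     // end seen by the fast path; may be a sample point
  size_t _allocation_end;  // real end for allocations
  size_t _alignment_reserve;
};
```

A caller would try fast_allocate first and fall back to slow_allocate, posting a sampling event when `sampled` comes back true.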
> > > > src/hotspot/share/prims/jvmti.xml: > > > > 10357 > id="can_sample_heap" since="9"> > > 10358 > > 10359 Can sample the heap. > > 10360 If this capability > > is enabled then the heap sampling > > methods > > can be called. > > 10361 > > 10362 > > > > Looks like this capability should > > not be "since 9" if it gets integrated > > now. > > > > Updated now to 11, crossing my fingers :) > > > > src/hotspot/share/runtime/heapMonitoring.cpp: > > > > 448 if > > (is_alive->do_object_b(value)) { > > 449 // Update the oop to > > point to the new object if it is still > > alive. > > 450 f->do_oop(&(trace.obj)); > > 451 > > 452 // Copy the old > > trace, if it is still live. > > 453 > > _allocated_traces->at_put(curr_pos++, trace); > > 454 > > 455 // Store the live > > trace in a cache, to be served up on > > /heapz. > > 456 > > _traces_on_last_full_gc->append(trace); > > 457 > > 458 count++; > > 459 } else { > > 460 // If the old trace > > is no longer live, add it to the list of > > 461 // recently collected > > garbage. > > 462 > > store_garbage_trace(trace); > > 463 } > > > > In the case where the oop was not > > live, I would like it to be explicitly > > cleared. > > > > Done I think how you wanted it. Let me > > know because I'm not familiar > > with the RootAccess API. I'm unclear if > > I'm doing this right or not so > > reviews of these parts are highly > > appreciated. Robbin had talked of > > perhaps later pushing this all into a > > OopStorage, should I do this now > > do you think? Or can that wait a second > > webrev later down the road? > > > > I think using handles can and should be done > > later. You can use the Access API now. > > I noticed that you are missing an #include > > "oops/access.inline.hpp" in your > > heapMonitoring.cpp file. > > > > The missing header is there for me so I don't > > know, I made sure it is present in the latest > > webrev. Sorry about that. 
> > > > + Did I clear it the way you wanted me > > to or were you thinking of > > something else? > > > > > > That is precisely how I wanted it to be > > cleared. Thanks. > > > > + Final question here, seems like if I > > were to want to not do the > > f->do_oop directly on the trace.obj, I'd > > need to do something like: > > > > f->do_oop(&value); > > ... > > trace->store_oop(value); > > > > to update the oop internally. Is that > > right/is that one of the > > advantages of going to the Oopstorage > > sooner than later? > > > > > > I think you really want to do the do_oop on > > the root directly. Is there a particular > > reason why you would not want to do that? > > Otherwise, yes - the benefit with using the > > handle approach is that you do not need to > > call do_oop explicitly in your code. > > > > There is no reason except that now we have a > > load_oop and a get_oop_addr, I was not sure what > > you would think of that. > > > > That's fine. > > > > Also I see a lot of > > concurrent-looking use of the > > following field: > > 267 volatile bool _initialized; > > > > Please note that the "volatile" > > qualifier does not help with reordering > > here. Reordering between volatile > > and non-volatile fields is > > completely free > > for both compiler and hardware, > > except for windows with MSVC, where > > volatile > > semantics is defined to use > > acquire/release semantics, and the > > hardware is > > TSO. But for the general case, I > > would expect this field to be stored > > with > > OrderAccess::release_store and > > loaded with OrderAccess::load_acquire. > > Otherwise it is not thread safe. > > > > Because everything is behind a mutex, I > > wasn't really worried about > > this. I have a test that has multiple > > threads trying to hit this > > corner case and it passes. > > > > However, to be paranoid, I updated it to > > using the OrderAccess API > > now, thanks! Let me know what you think > > there too! 
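As an illustration of the volatile versus acquire/release point above, here is a minimal sketch in standard C++. It assumes std::atomic with release/acquire ordering as an analogue of OrderAccess::release_store and OrderAccess::load_acquire; the class SketchStorage and its members are hypothetical and do not appear in the webrev.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical storage object; std::atomic stands in for HotSpot's OrderAccess.
class SketchStorage {
 public:
  SketchStorage() : _max_entries(0), _initialized(false) {}

  void initialize(int max_entries) {
    _max_entries = max_entries;  // plain, non-atomic write
    // Release store: the write above becomes visible to any thread that
    // later observes `true` through an acquire load.
    _initialized.store(true, std::memory_order_release);
  }

  bool initialized() const {
    // Acquire load: pairs with the release store in initialize().
    return _initialized.load(std::memory_order_acquire);
  }

  // Returns -1 when the storage has not been initialized yet.
  int max_entries() const { return initialized() ? _max_entries : -1; }

 private:
  int _max_entries;
  std::atomic<bool> _initialized;
};
```

A plain `volatile bool` would not give this pairing in standard C++, which is the concern raised in the review.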
> > > > > > If it is indeed always supposed to be read > > and written under a mutex, then I would > > strongly prefer to have it accessed as a > > normal non-volatile member, and have an > > assertion that given lock is held or we are > > in a safepoint, as we do in many other > > places. Something like this: > > > > assert(HeapMonitorStorage_lock->owned_by_self() > > || (SafepointSynchronize::is_at_safepoint() > > && Thread::current()->is_VM_thread()), "this > > should not be accessed concurrently"); > > > > It would be confusing to people reading the > > code if there are uses of OrderAccess that > > are actually always protected under a mutex. > > > > Thank you for the exact example to be put in the > > code! I put it around each access/assignment of > > the _initialized method and found one case where > > yes you can touch it and not have the lock. It > > actually is "ok" because you don't act on the > > storage until later and only when you really > > want to modify the storage (see the > > object_alloc_do_sample method which calls the > > add_trace method). > > > > But, because of this, I'm going to put the > > OrderAccess here, I'll do some performance > > numbers later and if there are issues, I might > > add a "unsafe" read and a "safe" one to make it > > explicit to the reader. But I don't think it > > will come to that. > > > > > > Okay. This double return in heapMonitoring.cpp looks > > wrong: > > > > 283 bool initialized() { > > 284 return > > OrderAccess::load_acquire(&_initialized) != 0; > > 285 return _initialized; > > 286 } > > > > Since you said object_alloc_do_sample() is the only > > place where you do not hold the mutex while reading > > initialized(), I had a closer look at that. It looks > > like in its current shape, the lack of a mutex may > > lead to a memory leak. In particular, it first > > checks if (initialized()). Let's assume this is now > > true. 
It then allocates a bunch of stuff, and checks > > if the number of frames were over 0. If they were, > > it calls StackTraceStorage::storage()->add_trace() > > seemingly hoping that after grabbing the lock in > > there, initialized() will still return true. But it > > could now return false and skip doing anything, in > > which case the allocated stuff will never be freed. > > > > I fixed this now by making add_trace return a boolean > > and checking for that. It will be in the next webrev. > > Thanks, the truth is that in our implementation the > > system is always on or off, so this never really occurs > > :). In this version though, that is not true and it's > > important to handle so thanks again! > > > > > > So the analysis seems to be that _initialized is > > only used outside of the mutex in once instance, > > where it is used to perform double-checked locking, > > that actually causes a memory leak. > > > > I am not proposing how to fix that, just raising the > > issue. If you still want to perform this > > double-checked locking somehow, then the use of > > acquire/release still seems odd. Because the memory > > ordering restrictions of it never comes into play in > > this particular case. If it ever did, then the use > > of destroy_stuff(); release_store(_initialized, 0) > > would be broken anyway as that would imply that > > whatever concurrent reader there ever was would > > after reading _initialized with load_acquire() could > > *never* read the data that is concurrently destroyed > > anyway. I would be biased to think that > > RawAccess::load/store looks like a more > > appropriate solution, given that the memory leak > > issue is resolved. I do not know how painful it > > would be to not perform this double-checked locking. > > > > So I agree with this entirely. I looked also a bit more > > and the difference and code really stems from our > > internal version. 
In this version however, there are > > actually a lot of things going on that I did not go > > entirely through in my head but this comment made me > > ponder a bit more on it. > > > > Since every object_alloc_do_sample is protected by a > > check to HeapMonitoring::enabled(), there is only a > > small chance that the call is happening when things have > > been disabled. So there is no real need to do a first > > check on the initialized, it is a rare occurence that a > > call happens to object_alloc_do_sample and the > > initialized of the storage returns false. > > > > (By the way, even if you did call object_alloc_do_sample > > without looking at HeapMonitoring::enabled(), that would > > be ok too. You would gather the stacktrace and get > > nowhere at the add_trace call, which would return false; > > so though not optimal performance wise, nothing would > > break). > > > > Furthermore, the add_trace is really the moment of no > > return and we have the mutex lock and then the > > initialized check. So, in the end, I did two things: I > > removed that first check and then I removed the > > OrderAccess for the storage initialized. I think now I > > have a better grasp and understanding why it was done in > > our code and why it is not needed here. Thanks for > > pointing it out :). This now still passes my JTREG > > tests, especially the threaded one. > > > > > > > > As a kind of meta comment, I wonder > > if it would make sense to add sampling > > for non-TLAB allocations. Seems like > > if someone is rapidly allocating a > > whole bunch of 1 MB objects that > > never fit in a TLAB, I might still be > > interested in seeing that in my > > traces, and not get surprised that the > > allocation rate is very high yet not > > showing up in any profiles. > > > > That is handled by the handle_sample > > where you wanted me to put a > > UseTlab because you hit that case if the > > allocation is too big. > > > > > > I see. 
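The add_trace change described earlier in this exchange (check initialized only under the lock, return a boolean, and let the caller release what it allocated on failure) can be sketched as follows. This is a standalone illustration, not the heapMonitoring.cpp code: SketchTrace, SketchTraceStorage and sample_allocation are hypothetical names, and std::mutex and std::unique_ptr stand in for the HotSpot mutex and manual allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stack trace record.
struct SketchTrace {
  std::vector<int> frames;
};

class SketchTraceStorage {
 public:
  void set_initialized(bool on) {
    std::lock_guard<std::mutex> guard(_lock);
    _initialized = on;
    if (!on) _traces.clear();
  }

  // The initialized check happens only under the lock. Ownership of the
  // trace is taken only on success; on failure the caller still owns it.
  bool add_trace(std::unique_ptr<SketchTrace>& trace) {
    std::lock_guard<std::mutex> guard(_lock);
    if (!_initialized) return false;
    _traces.push_back(std::move(trace));
    return true;
  }

  size_t size() {
    std::lock_guard<std::mutex> guard(_lock);
    return _traces.size();
  }

 private:
  std::mutex _lock;
  bool _initialized = false;
  std::vector<std::unique_ptr<SketchTrace>> _traces;
};

// Gathers a trace without holding the lock, then tries to store it.
// Returns true if the trace was stored.
bool sample_allocation(SketchTraceStorage& storage, int frame_count) {
  if (frame_count <= 0) return false;
  std::unique_ptr<SketchTrace> trace(new SketchTrace());
  trace->frames.assign(static_cast<size_t>(frame_count), 0);
  // If the storage was disabled in the meantime, add_trace returns false
  // and the trace is freed when `trace` goes out of scope: no leak.
  return storage.add_trace(trace);
}
```

With this shape there is no unlocked read of the initialized flag left, so neither volatile nor acquire/release is needed for it.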
It was not obvious to me that > > non-TLAB sampling is done in the TLAB class. > > That seems like an abstraction crime. > > What I wanted in my previous comment was > > that we do not call into the TLAB when we > > are not using TLABs. If there is sampling > > logic in the TLAB that is used for something > > else than TLABs, then it seems like that > > logic simply does not belong inside of the > > TLAB. It should be moved out of the TLAB, > > and instead have the TLAB call this common > > abstraction that makes sense. > > > > So in the incremental version: > > > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ > > >, > > this is still a "crime". The reason is that the > > system has to have the bytes_until_sample on a > > per-thread level and it made "sense" to have it > > with the TLAB implementation. Also, I was not > > sure how people felt about adding something to > > the thread instance instead. > > > > Do you think it fits better at the Thread level? > > I can see how difficult it is to make it happen > > there and add some logic there. Let me know what > > you think. > > > > > > We have an unfortunate situation where everyone that > > has some fields that are thread local tend to dump > > them right into Thread, making the size and > > complexity of Thread grow as it becomes tightly > > coupled with various unrelated subsystems. It would > > be desirable to have a separate class for this > > instead that encapsulates the sampling logic. That > > class could possibly reside in Thread though as a > > value object of Thread. > > > > I imagined that would be the case but was not sure. I > > will look at the example that Robbin is talking about > > (ThreadSMR) and will see how to refactor my code to use > > that. > > > > Thanks again for your help, > > > > Jc > > > > > > > > Hope I have answered your questions and that > > my feedback makes sense to you. 
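The suggestion above, a separate class that encapsulates the sampling logic and lives in Thread as a value object, might look roughly like this. It is a sketch under stated assumptions: SketchHeapSampler and SketchThread are hypothetical names, and a fixed byte interval is assumed purely for simplicity, which may not match how the webrev picks its sampling points. Only the per-thread bytes_until_sample counter comes from the discussion.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread sampler holding the countdown to the next sample.
class SketchHeapSampler {
 public:
  explicit SketchHeapSampler(size_t interval)
      : _interval(interval), _bytes_until_sample(interval) {}

  // Records an allocation of `size` bytes. Returns true when the allocation
  // crosses the sampling point. Works for any allocation, whether or not it
  // was served from a TLAB.
  bool record_allocation(size_t size) {
    if (size < _bytes_until_sample) {
      _bytes_until_sample -= size;
      return false;
    }
    _bytes_until_sample = _interval;  // fixed interval: an assumption
    return true;
  }

  size_t bytes_until_sample() const { return _bytes_until_sample; }

 private:
  size_t _interval;
  size_t _bytes_until_sample;
};

// Hypothetical thread class: the sampler is a value member, so the thread
// only exposes an accessor and carries no sampling logic itself.
class SketchThread {
 public:
  explicit SketchThread(size_t interval) : _heap_sampler(interval) {}
  SketchHeapSampler& heap_sampler() { return _heap_sampler; }

 private:
  SketchHeapSampler _heap_sampler;
};
```

Both the TLAB slow path and the outside-TLAB allocation path could then call heap_sampler().record_allocation(size) without going through the TLAB class.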
> > > > You have and thank you for them, I think we are > > getting to a cleaner implementation and things > > are getting better and more readable :) > > > > > > Yes it is getting better. > > > > Thanks, > > /Erik > > > > > > > > Thanks for your help! > > > > Jc > > > > Thanks, > > /Erik > > > > I double checked by changing the test > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcbeyler at google.com Fri Apr 6 23:12:52 2018 From: jcbeyler at google.com (JC Beyler) Date: Fri, 06 Apr 2018 23:12:52 +0000 Subject: JDK-8171119: Low-Overhead Heap Profiling In-Reply-To: References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>

Message-ID:

Hi Karen,

Let me inline my answers, it will probably be easier :)

On Fri, Apr 6, 2018 at 2:40 PM Karen Kinnear wrote:

> JC,
>
> Thank you for the updates - really glad you are including the compiler
> folks. I reviewed the version before this one, so ignore
> any comments you've already covered (although I did peek at the latest)
>
> 1. JDK-8194905 CSR - could you please delete attachments that are not
> current. It is a bit confusing right now.
> I have been looking at jvmti_event6.html. I am assuming the rest are
> obsolete and could be removed please.
>

I tried, but it does not allow me to. It seems that someone with administrative rights has to do it :-(. That is why I had to resort to this...

>
> 2. In jvmti_event6.html under Sampled Object Allocation, there is a link
> to "Heap Sampling Monitoring System".
> It takes me to the top of the page - seems like something is missing in
> defining it?
>

So the Heap Sampling Monitoring System used to have more methods. It made sense to have them in a separate category. I have now moved it to the memory category so that it is grouped consistently there. I also removed that link, by the way.

>
> 3. Scope of memory allocation tracking
>
> I am struggling to understand the extent of memory allocation tracking
> that you are looking for (probably want to
> clarify in the JEP and CSR once we work this through).
>
> e.g. Heap Sampler vs. JVMTI VMObjectAllocEvent
> So the current jvmtiVMObjectAllocEvent says:
>
> Sent when a method causes the VM to allocate an Object visible to Java
> and allocation not detectable by other instrumentation mechanisms.
> Generally detect by instrumenting bytecodes of allocating methods
> JNI - use JNI function interception
> e.g. Reflection: java.lang.Class.newInstance()
> e.g. VM intrinsics
>
> comment:
> Not generated due to bytecodes - e.g. new and newarray VM instructions
> Not allocation due to JNI: e.g.
AllocObject
> NOT VM internal objects
> NOT allocations during VM init
>
> So from the JEP I can't tell the intended scope of the new event - is this
> intended to cover all heap allocation?
> bytecodes
> JVM_*
> JNI_*
> internal VM objects
> other? (I'm not sure what other there are)
> - I presume not allocations during VM init - since sent only during
> live phase
>

Yes exactly: as much as possible, I am aiming to cover all heap allocations. In practice, though, I think we mostly care about bytecodes and, to a lesser extent, JNI. Being independent of why the memory is being allocated is probably even better: this thread allocated Y, no matter where or why.

>
> OR - is the primary goal to cover allocation for bytecodes so folks can
> skip instrumentation?
>

Yes, that is the primary goal.

> OR - do you want to get performance numbers and see what is low enough
> overhead before deciding?
>

I think it is the same. The system is relatively in place, and my overhead measurements seem to indicate 0% with the system off, 1% with it on and an empty user callback, and 3% for a naive implementation tracking live/GC'd objects.

>
> 4. The design question is where to put the collectors in the source base -
> and that of course strongly depends on
> the scope of the information you want to collect, and on the performance
> overhead we are willing to incur.
>

Very true.

>
> I was trying to figure out a way to put the collectors farther down the
> call stack so as to both catch more
> cases and to reduce the maintenance burden - i.e. if you were to add a new
> code generator, e.g. Graal -
> if it were to go through an existing interface, that might be a place to
> already have a collector.
>
> I do not know the Graal sources - I did look at jvmci/jvmciRuntime.cpp -
> and it appears that there
> are calls to instanceKlass::new_instance,
> oopFactory::new_typeArray/new_ObjArray and ArrayKlass::multi-allocate,
> so one possibility would be to put hooks in those calls which would catch
> many? (I did not do a thorough search)
> of the slowpath calls for the bytecodes, and then check the fast paths in
> detail.
>

I'll come to a major issue with the collector and its placement in the next paragraph.

>
> I had wondered if it made sense to move the hooks even farther down, into
> CollectedHeap:obj_allocate and array_allocate.
> I do not think so. First reason is that for multidimensional arrays,
> ArrayKlass::multi_allocate the outer dimension array would
> have an event before storing the inner sub-arrays and I don't think we
> want that exposed, so that won't work for arrays.
>

So the major difficulty is that the steps of collection do this:

- An object gets allocated and is selected for sampling
- The original pointer placement (where it resides originally in memory) is passed to the collector
- Now one important thing of note:
  (a) In the VM code, until the point where the oop is going to be returned, GC is not yet aware of it
  (b) so the collector can't yet send it out to the user via JVMTI; otherwise the agent could, for example, put a weak reference on it

I'm a bit fuzzy on this, and maybe it's just that there would be more heavy lifting to make this possible, but my initial tests seem to show problems when attempting this in the obj_allocate area.

>
> The second reason is that I strongly suspect the scope you want is
> bytecodes only. I think once you have added hooks
> to all the fast paths and slow paths that this will be pushing the
> performance overhead constraints you proposed and
> you won't want to see e.g. internal allocations.
>

Yes, agreed: allocations from bytecodes are generally our main concern :)

>
> But I think you need to experiment with the set of allocations (or possible
> alternative sets of allocations) you want recorded.
>
> The hooks I see today include:
> Interpreter: (looking at x86 as a sample)
> - slowpath in InterpreterRuntime
> - fastpath tlab allocation - your new threshold check handles that
>

Agreed

> - allow_shared_alloc (GC specific): for _new isn't handled
>

Where is that exactly? I can check why we are not catching it.

>
> C1
> I don't see changes in c1_Runtime.cpp
> note: you also want to look for the fast path
>

I added the calls to c1_Runtime in the latest webrev, but was still going through testing before pushing it out. I had waited on this one a bit. The fast path would be handled by the threshold check, no?

> C2: changes in opto/runtime.cpp for slow path
> did you also catch the fast path?
>

The fast path gets handled by the same threshold check, no? Perhaps I've missed something (very likely)?

>
> 3. Performance -
> After you get all the collectors added - you need to rerun the performance
> numbers.
>

Agreed :)

>
> thanks,
> Karen
>
> On Apr 5, 2018, at 2:15 PM, JC Beyler wrote:
>
> Thanks Boris and Derek for testing it.
>
> Yes I was trying to get a new version out that had the tests ported as
> well but got sidetracked while trying to add tests and two new features.
> > Here is the incremental webrev: > > Here is the full webrev: > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/ > > Basically, the new tests assert this: > - Only one agent can currently ask for the sampling, I'm currently > seeing if I can push to a next webrev the multi-agent support to start > doing a code freeze on this one > - The event is not thread-enabled, meaning like the > VMObjectAllocationEvent, it's an all or nothing event; same as the > multi-agent, I'm going to see if a future webrev to add the support is a > better idea to freeze this webrev a bit > > There was another item that I added here and I'm unsure this webrev is > stable in debug mode: I added an assertion system to ascertain that all > paths leading to a TLAB slow path (and hence a sampling point) have a > sampling collector ready to post the event if a user wants it. This might > break a few thing in debug mode as I'm working through the kinks of that as > well. However, in release mode, this new webrev passes all the tests in > hotspot/jtreg/serviceability/jvmti/HeapMonitor. > > Let me know what you think, > Jc > > On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich < > boris.ulasevich at bell-sw.com> wrote: > >> Hi JC, >> >> I have just checked on arm32: your patch compiles and runs ok. >> >> As I can see, jtreg agentlib name "-agentlib:HeapMonitor" does not >> correspond to actual library name: libHeapMonitorTest.c -> >> libHeapMonitorTest.so >> >> Boris >> >> On 04.04.2018 01:54, White, Derek wrote: >> > Thanks JC, >> > >> > New patch applies cleanly. Compiles and runs (simple test programs) on >> > aarch64. 
>> > >> > * Derek >> > >> > *From:* JC Beyler [mailto:jcbeyler at google.com] >> > *Sent:* Monday, April 02, 2018 1:17 PM >> > *To:* White, Derek >> > *Cc:* Erik ?sterlund ; >> > serviceability-dev at openjdk.java.net; hotspot-compiler-dev >> > >> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >> > >> > Hi Derek, >> > >> > I know there were a few things that went in that provoked a merge >> > conflict. I worked on it and got it up to date. Sadly my lack of >> > knowledge makes it a full rebase instead of keeping all the history. >> > However, with a newly cloned jdk/hs you should now be able to use: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/ >> > >> > The change you are referring to was done with the others so perhaps you >> > were unlucky and I forgot it in a webrev and fixed it in another? I >> > don't know but it's been there and I checked, it is here: >> > >> > >> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html >> > >> > I double checked that tlab_end_offset no longer appears in any >> > architecture (as far as I can tell :)). >> > >> > Thanks for testing and let me know if you run into any other issues! >> > >> > Jc >> > >> > On Fri, Mar 30, 2018 at 4:24 PM White, Derek > > > wrote: >> > >> > Hi Jc, >> > >> > I?ve been having trouble getting your patch to apply correctly. I >> > may have based it on the wrong version. >> > >> > In any case, I think there?s a missing update to >> > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), >> > where ?JavaThread::tlab_end_offset()? should become >> > ?JavaThread::tlab_current_end_offset()?. >> > >> > This should correspond to the other port?s changes in >> > templateTable_.cpp files. >> > >> > Thanks! 
>> > - Derek >> > >> > *From:* hotspot-compiler-dev >> > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net >> > ] *On Behalf >> > Of *JC Beyler >> > *Sent:* Wednesday, March 28, 2018 11:43 AM >> > *To:* Erik ?sterlund > > > >> > *Cc:* serviceability-dev at openjdk.java.net >> > ; hotspot-compiler-dev >> > > > > >> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >> > >> > Hi all, >> > >> > I've been working on deflaking the tests mostly and the wording in >> > the JVMTI spec. >> > >> > Here is the two incremental webrevs: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ >> > >> > Here is the total webrev: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ >> > >> > Here are the notes of this change: >> > >> > - Currently the tests pass 100 times in a row, I am working on >> > checking if they pass 1000 times in a row. >> > >> > - The default sampling rate is set to 512k, this is what we use >> > internally and having a default means that to enable the sampling >> > with the default, the user only has to do a enable event/disable >> > event via JVMTI (instead of enable + set sample rate). >> > >> > - I deprecated the code that was handling the fast path tlab >> > refill if it happened since this is now deprecated >> > >> > - Though I saw that Graal is still using it so I have to see >> > what needs to be done there exactly >> > >> > Finally, using the Dacapo benchmark suite, I noted a 1% overhead for >> > when the event system is turned on and the callback to the native >> > agent is just empty. I got a 3% overhead with a 512k sampling rate >> > with the code I put in the native side of my tests. 
>> > >> > Thanks and comments are appreciated, >> > >> > Jc >> > >> > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler > > > wrote: >> > >> > Hi all, >> > >> > The incremental webrev update is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ >> > >> > The full webrev is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ >> > >> > Major change here is: >> > >> > - I've removed the heapMonitoring.cpp code in favor of just >> > having the sampling events as per Serguei's request; I still >> > have to do some overhead measurements but the tests prove the >> > concept can work >> > >> > - Most of the tlab code is unchanged, the only major >> > part is that now things get sent off to event collectors when >> > used and enabled. >> > >> > - Added the interpreter collectors to handle interpreter >> > execution >> > >> > - Updated the name from SetTlabHeapSampling to >> > SetHeapSampling to be more generic >> > >> > - Added a mutex for the thread sampling so that we can >> > initialize an internal static array safely >> > >> > - Ported the tests from the old system to this new one >> > >> > I've also updated the JEP and CSR to reflect these changes: >> > >> > https://bugs.openjdk.java.net/browse/JDK-8194905 >> > >> > https://bugs.openjdk.java.net/browse/JDK-8171119 >> > >> > In order to make this have some forward progress, I've removed >> > the heap sampling code entirely and now rely entirely on the >> > event sampling system. The tests reflect this by using a >> > simplified implementation of what an agent could do: >> > >> > >> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c >> > >> > (Search for anything mentioning event_storage). >> > >> > I have not taken the time to port the whole code we had >> > originally in heapMonitoring to this. 
I hesitate only because >> > that code was in C++, I'd have to port it to C and this is for >> > tests so perhaps what I have now is good enough? >> > >> > As far as testing goes, I've ported all the relevant tests and >> > then added a few: >> > >> > - Turning the system on/off >> > >> > - Testing using various GCs >> > >> > - Testing using the interpreter >> > >> > - Testing the sampling rate >> > >> > - Testing with objects and arrays >> > >> > - Testing with various threads >> > >> > Finally, as overhead goes, I have the numbers of the system off >> > vs a clean build and I have 0% overhead, which is what we'd >> > want. This was using the Dacapo benchmarks. I am now preparing >> > to run a version with the events on using dacapo and will report >> > back here. >> > >> > Any comments are welcome :) >> > >> > Jc >> > >> > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler > > > wrote: >> > >> > Hi all, >> > >> > I apologize for the delay but I wanted to add an event >> > system and that took a bit longer than expected and I also >> > reworked the code to take into account the deprecation of >> > FastTLABRefill. >> > >> > This update has four parts: >> > >> > A) I moved the implementation from Thread to >> > ThreadHeapSampler inside of Thread. Would you prefer it as a >> > pointer inside of Thread or like this works for you? Second >> > question would be would you rather have an association >> > outside of Thread altogether that tries to remember when >> > threads are live and then we would have something like: >> > >> > ThreadHeapSampler::get_sampling_size(this_thread); >> > >> > I worry about the overhead of this but perhaps it is not too >> > too bad? >> > >> > B) I also have been working on the Allocation event system >> > that sends out a notification at each sampled event. This >> > will be practical when wanting to do something at the >> > allocation point. 
I'm also looking at if the whole >> > heapMonitoring code could not reside in the agent code and >> > not in the JDK. I'm not convinced but I'm talking to Serguei >> > about it to see/assess :) >> > >> > - Also added two tests for the new event subsystem >> > >> > C) Removed the slow_path fields inside the TLAB code since >> > now FastTLABRefill is deprecated >> > >> > D) Updated the JVMTI documentation and specification for the >> > methods. >> > >> > So the incremental webrev is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ >> > >> > and the full webrev is here: >> > >> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 >> > >> > I believe I have updated the various JIRA issues that track >> > this :) >> > >> > Thanks for your input, >> > >> > Jc >> > >> > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler >> > > wrote: >> > >> > Hi Erik, >> > >> > I inlined my answers, which the last one seems to answer >> > Robbin's concerns about the same thing (adding things to >> > Thread). >> > >> > On Wed, Feb 14, 2018 at 2:51 AM, Erik ?sterlund >> > > > > wrote: >> > >> > Hi JC, >> > >> > Comments are inlined below. >> > >> > On 2018-02-13 06:18, JC Beyler wrote: >> > >> > Hi Erik, >> > >> > Thanks for your answers, I've now inlined my own >> > answers/comments. >> > >> > I've done a new webrev here: >> > >> > >> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ >> > < >> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.08/> >> > >> > The incremental is here: >> > >> > >> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ >> > < >> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.07_08/> >> > >> > Note to all: >> > >> > - I've been integrating changes from >> > Erin/Serguei/David comments so this webrev >> > incremental is a bit an answer to all comments >> > in one. I apologize for that :) >> > >> > On Mon, Feb 12, 2018 at 6:05 AM, Erik ?sterlund >> > > > > wrote: >> > >> > Hi JC, >> > >> > Sorry for the delayed reply. 
>> > >> > Inlined answers: >> > >> > >> > >> > On 2018-02-06 00:04, JC Beyler wrote: >> > >> > Hi Erik, >> > >> > (Renaming this to be folded into the >> > newly renamed thread :)) >> > >> > First off, thanks a lot for reviewing >> > the webrev! I appreciate it! >> > >> > I updated the webrev to: >> > >> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ >> > < >> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.05a/> >> > >> > And the incremental one is here: >> > >> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ >> > < >> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.04_05a/> >> > >> > It contains: >> > - The change for since from 9 to 11 for >> > the jvmti.xml >> > - The use of the OrderAccess for >> initialized >> > - Clearing the oop >> > >> > I also have inlined my answers to your >> > comments. The biggest question >> > will come from the multiple *_end >> > variables. A bit of the logic there >> > is due to handling the slow path refill >> > vs fast path refill and >> > checking that the rug was not pulled >> > underneath the slowpath. I >> > believe that a previous comment was that >> > TlabFastRefill was going to >> > be deprecated. >> > >> > If this is true, we could revert this >> > code a bit and just do a : if >> > TlabFastRefill is enabled, disable this. >> > And then deprecate that when >> > TlabFastRefill is deprecated. >> > >> > This might simplify this webrev and I >> > can work on a follow-up that >> > either: removes TlabFastRefill if Robbin >> > does not have the time to do >> > it or add the support to the assembly >> > side to handle this correctly. >> > What do you think? >> > >> > I support removing TlabFastRefill, but I >> > think it is good to not depend on that >> > happening first. 
>> > >> > >> > I'm slowly pushing on the FastTLABRefill >> > ( >> https://bugs.openjdk.java.net/browse/JDK-8194084), >> > I agree on keeping both separate for now though >> > so that we can think of both differently >> > >> > Now, below, inlined are my answers: >> > >> > On Fri, Feb 2, 2018 at 8:44 AM, Erik >> > ?sterlund >> > > > > >> wrote: >> > >> > Hi JC, >> > >> > Hope I am reviewing the right >> > version of your work. Here goes... >> > >> > >> src/hotspot/share/gc/shared/collectedHeap.inline.hpp: >> > >> > 159 >> > >> AllocTracer::send_allocation_outside_tlab(klass, result, size * >> > HeapWordSize, THREAD); >> > 160 >> > 161 >> > >> THREAD->tlab().handle_sample(THREAD, result, size); >> > 162 return result; >> > 163 } >> > >> > Should not call tlab()->X without >> > checking if (UseTLAB) IMO. >> > >> > Done! >> > >> > >> > More about this later. >> > >> > >> src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: >> > >> > So first of all, there seems to >> > quite a few ends. There is an "end", >> > a "hard >> > end", a "slow path end", and an >> > "actual end". Moreover, it seems >> > like the >> > "hard end" is actually further away >> > than the "actual end". So the "hard >> end" >> > seems like more of a "really >> > definitely actual end" or something. >> > I don't >> > know about you, but I think it looks >> > kind of messy. In particular, I >> don't >> > feel like the name "actual end" >> > reflects what it represents, >> > especially when >> > there is another end that is behind >> > the "actual end". >> > >> > 413 HeapWord* >> > ThreadLocalAllocBuffer::hard_end() { >> > 414 // Did a fast TLAB refill >> > occur? >> > 415 if (_slow_path_end != >> _end) { >> > 416 // Fix up the actual end >> > to be now the end of this TLAB. 
>> > 417 _slow_path_end = _end; >> > 418 _actual_end = _end; >> > 419 } >> > 420 >> > 421 return _actual_end + >> > alignment_reserve(); >> > 422 } >> > >> > I really do not like making getters >> > unexpectedly have these kind of side >> > effects. It is not expected that >> > when you ask for the "hard end", you >> > implicitly update the "slow path >> > end" and "actual end" to new values. >> > >> > As I said, a lot of this is due to the >> > FastTlabRefill. If I make this >> > not supporting FastTlabRefill, this goes >> > away. The reason the system >> > needs to update itself at the get is >> > that you only know at that get if >> > things have shifted underneath the tlab >> > slow path. I am not sure of >> > really better names (naming is hard!), >> > perhaps we could do these >> > names: >> > >> > - current_tlab_end // Either the >> > allocated tlab end or a sampling point >> > - last_allocation_address // The end of >> > the tlab allocation >> > - last_slowpath_allocated_end // In >> > case a fast refill occurred the >> > end might have changed, this is to >> > remember slow vs fast past refills >> > >> > the hard_end method can be renamed to >> > something like: >> > tlab_end_pointer() // The end of >> > the lab including a bit of >> > alignment reserved bytes >> > >> > Those names sound better to me. Could you >> > please provide a mapping from the old names >> > to the new names so I understand which one >> > is which please? >> > >> > This is my current guess of what you are >> > proposing: >> > >> > end -> current_tlab_end >> > actual_end -> last_allocation_address >> > slow_path_end -> last_slowpath_allocated_end >> > hard_end -> tlab_end_pointer >> > >> > Yes that is correct, that was what I was >> proposing. 
>> > >> > I would prefer this naming: >> > >> > end -> slow_path_end // the end for taking a >> > slow path; either due to sampling or >> refilling >> > actual_end -> allocation_end // the end for >> > allocations >> > slow_path_end -> last_slow_path_end // last >> > address for slow_path_end (as opposed to >> > allocation_end) >> > hard_end -> reserved_end // the end of the >> > reserved space of the TLAB >> > >> > About setting things in the getter... that >> > still seems like a very unpleasant thing to >> > me. It would be better to inspect the call >> > hierarchy and explicitly update the ends >> > where they need updating, and assert in the >> > getter that they are in sync, rather than >> > implicitly setting various ends as a >> > surprising side effect in a getter. It looks >> > like the call hierarchy is very small. With >> > my new naming convention, reserved_end() >> > would presumably return _allocation_end + >> > alignment_reserve(), and have an assert >> > checking that _allocation_end == >> > _last_slow_path_allocation_end, complaining >> > that this invariant must hold, and that a >> > caller to this function, such as >> > make_parsable(), must first explicitly >> > synchronize the ends as required, to honor >> > that invariant. >> > >> > >> > I've renamed the variables to how you preferred >> > it except for the _end one. I did: >> > >> > current_end >> > >> > last_allocation_address >> > >> > tlab_end_ptr >> > >> > The reason is that the architecture dependent >> > code use the thread.hpp API and it already has >> > tlab included into the name so it becomes >> > tlab_current_end (which is better that >> > tlab_current_tlab_end in my opinion). >> > >> > I also moved the update into a separate method >> > with a TODO that says to remove it when >> > FastTLABRefill is deprecated >> > >> > This looks a lot better now. Thanks. 
>> > >> > Note that the following comment now needs updating >> > accordingly in threadLocalAllocBuffer.hpp: >> > >> > 41 // Heap sampling is performed via >> > the end/actual_end fields. >> > >> > 42 // actual_end contains the real end >> > of the tlab allocation, >> > >> > 43 // whereas end can be set to an >> > arbitrary spot in the tlab to >> > >> > 44 // trip the return and sample the >> > allocation. >> > >> > 45 // slow_path_end is used to track >> > if a fast tlab refill occured >> > >> > 46 // between slowpath calls. >> > >> > There might be other comments too, I have not looked >> > in detail. >> > >> > This was the only spot that still had an actual_end, I >> > fixed it now. I'll do a sweep to double check other >> > comments. >> > >> > >> > >> > Not sure it's better but before updating >> > the webrev, I wanted to try >> > to get input/consensus :) >> > >> > (Note hard_end was always further off >> > than end). >> > >> > src/hotspot/share/prims/jvmti.xml: >> > >> > 10357 > > id="can_sample_heap" since="9"> >> > 10358 >> > 10359 Can sample the heap. >> > 10360 If this capability >> > is enabled then the heap sampling >> > methods >> > can be called. >> > 10361 >> > 10362 >> > >> > Looks like this capability should >> > not be "since 9" if it gets >> integrated >> > now. >> > >> > Updated now to 11, crossing my fingers >> :) >> > >> > >> src/hotspot/share/runtime/heapMonitoring.cpp: >> > >> > 448 if >> > (is_alive->do_object_b(value)) { >> > 449 // Update the oop to >> > point to the new object if it is >> still >> > alive. >> > 450 >> f->do_oop(&(trace.obj)); >> > 451 >> > 452 // Copy the old >> > trace, if it is still live. >> > 453 >> > >> _allocated_traces->at_put(curr_pos++, trace); >> > 454 >> > 455 // Store the live >> > trace in a cache, to be served up on >> > /heapz. 
>> > 456 >> > >> _traces_on_last_full_gc->append(trace); >> > 457 >> > 458 count++; >> > 459 } else { >> > 460 // If the old trace >> > is no longer live, add it to the >> list of >> > 461 // recently collected >> > garbage. >> > 462 >> > store_garbage_trace(trace); >> > 463 } >> > >> > In the case where the oop was not >> > live, I would like it to be >> explicitly >> > cleared. >> > >> > Done I think how you wanted it. Let me >> > know because I'm not familiar >> > with the RootAccess API. I'm unclear if >> > I'm doing this right or not so >> > reviews of these parts are highly >> > appreciated. Robbin had talked of >> > perhaps later pushing this all into a >> > OopStorage, should I do this now >> > do you think? Or can that wait a second >> > webrev later down the road? >> > >> > I think using handles can and should be done >> > later. You can use the Access API now. >> > I noticed that you are missing an #include >> > "oops/access.inline.hpp" in your >> > heapMonitoring.cpp file. >> > >> > The missing header is there for me so I don't >> > know, I made sure it is present in the latest >> > webrev. Sorry about that. >> > >> > + Did I clear it the way you wanted me >> > to or were you thinking of >> > something else? >> > >> > >> > That is precisely how I wanted it to be >> > cleared. Thanks. >> > >> > + Final question here, seems like if I >> > were to want to not do the >> > f->do_oop directly on the trace.obj, I'd >> > need to do something like: >> > >> > f->do_oop(&value); >> > ... >> > trace->store_oop(value); >> > >> > to update the oop internally. Is that >> > right/is that one of the >> > advantages of going to the Oopstorage >> > sooner than later? >> > >> > >> > I think you really want to do the do_oop on >> > the root directly. Is there a particular >> > reason why you would not want to do that? >> > Otherwise, yes - the benefit with using the >> > handle approach is that you do not need to >> > call do_oop explicitly in your code. 
>> > >> > There is no reason except that now we have a >> > load_oop and a get_oop_addr, I was not sure what >> > you would think of that. >> > >> > That's fine. >> > >> > Also I see a lot of >> > concurrent-looking use of the >> > following field: >> > 267 volatile bool _initialized; >> > >> > Please note that the "volatile" >> > qualifier does not help with >> reordering >> > here. Reordering between volatile >> > and non-volatile fields is >> > completely free >> > for both compiler and hardware, >> > except for windows with MSVC, where >> > volatile >> > semantics is defined to use >> > acquire/release semantics, and the >> > hardware is >> > TSO. But for the general case, I >> > would expect this field to be stored >> > with >> > OrderAccess::release_store and >> > loaded with >> OrderAccess::load_acquire. >> > Otherwise it is not thread safe. >> > >> > Because everything is behind a mutex, I >> > wasn't really worried about >> > this. I have a test that has multiple >> > threads trying to hit this >> > corner case and it passes. >> > >> > However, to be paranoid, I updated it to >> > using the OrderAccess API >> > now, thanks! Let me know what you think >> > there too! >> > >> > >> > If it is indeed always supposed to be read >> > and written under a mutex, then I would >> > strongly prefer to have it accessed as a >> > normal non-volatile member, and have an >> > assertion that given lock is held or we are >> > in a safepoint, as we do in many other >> > places. Something like this: >> > >> > >> assert(HeapMonitorStorage_lock->owned_by_self() >> > || (SafepointSynchronize::is_at_safepoint() >> > && Thread::current()->is_VM_thread()), "this >> > should not be accessed concurrently"); >> > >> > It would be confusing to people reading the >> > code if there are uses of OrderAccess that >> > are actually always protected under a mutex. >> > >> > Thank you for the exact example to be put in the >> > code! 
I put it around each access/assignment of >> > the _initialized method and found one case where >> > yes you can touch it and not have the lock. It >> > actually is "ok" because you don't act on the >> > storage until later and only when you really >> > want to modify the storage (see the >> > object_alloc_do_sample method which calls the >> > add_trace method). >> > >> > But, because of this, I'm going to put the >> > OrderAccess here, I'll do some performance >> > numbers later and if there are issues, I might >> > add a "unsafe" read and a "safe" one to make it >> > explicit to the reader. But I don't think it >> > will come to that. >> > >> > >> > Okay. This double return in heapMonitoring.cpp looks >> > wrong: >> > >> > 283 bool initialized() { >> > 284 return >> > OrderAccess::load_acquire(&_initialized) != 0; >> > 285 return _initialized; >> > 286 } >> > >> > Since you said object_alloc_do_sample() is the only >> > place where you do not hold the mutex while reading >> > initialized(), I had a closer look at that. It looks >> > like in its current shape, the lack of a mutex may >> > lead to a memory leak. In particular, it first >> > checks if (initialized()). Let's assume this is now >> > true. It then allocates a bunch of stuff, and checks >> > if the number of frames were over 0. If they were, >> > it calls StackTraceStorage::storage()->add_trace() >> > seemingly hoping that after grabbing the lock in >> > there, initialized() will still return true. But it >> > could now return false and skip doing anything, in >> > > > -------------- next part -------------- An HTML attachment was scrubbed... 

From martin at skarsaune.net Sat Apr 7 05:55:48 2018
From: martin at skarsaune.net (Martin Skarsaune)
Date: Sat, 07 Apr 2018 05:55:48 +0000
Subject: Re: inspect a thread's stack
In-Reply-To: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
Message-ID:

Hi Pietro,

I'm not sure JDI is what you really want, but if you would like to play
with it, I have some code here that uses the PID of the JVM to open a
connection to itself and, among other things, print stack frames with
variables:
https://github.com/skarsaune/kantega.debug
and a demo here:
https://www.youtube.com/watch?v=5sXxIfjaALg

So that is an example of what you can do, but it is not suitable for
anything serious.

For inspecting the stack, there is a cool reflection hack on the Java 9
API, demonstrated by Andrei Pangin here, that is able to capture stack
values:
https://vimeo.com/233820012

For serious work I suppose a JVMTI agent is the best option. Others are
in a better position to offer guidance on that.

Martin

On Fri, 6 Apr 2018 at 18:14, Pietro Paolini
<Pietro.Paolini at alfasystems.com> wrote:

> Hi all,
>
> I apologise if this is not the right ML for it, but I couldn't find
> exactly what I was looking for when Googling the problem. I am a bit
> new to the JDI world.
>
> I would like to inspect the stack frame of a specific thread. I came
> across the StackFrame/ThreadReference classes, but I couldn't find
> examples of their usage that do not connect to the VM somehow, like a
> debugger would do.
>
> Is it possible to inspect a thread's stack "locally"? In my mind I
> could have a function such as:
>
>     static void hook(Thread thread) {
>         thread.wait() // stop that thread
>         // inspect the frames of that thread doing any needed business with them
>     }
>
> I'd need this for diagnostic purposes of my application.
>
> Thanks,
> Pietro
>
> Pietro Paolini
> Consultant
>
> Alfa
> ------------------------------
> e: pietro.paolini at alfasystems.com | w: alfasystems.com
> t: +44 (0) 20 7920-2643 | Moor Place, 1 Fore Street Avenue, London,
> EC2Y 9DT, GB
> ------------------------------
>
> The contents of this communication are not intended to be binding or
> constitute any form of offer or acceptance or give rise to any legal
> obligations on behalf of the sender or Alfa. The views or opinions
> expressed represent those of the author and not necessarily those of
> Alfa. This email and any attachments are strictly confidential and are
> intended solely for use by the individual or entity to whom it is
> addressed. If you are not the addressee (or responsible for delivery
> of the message to the addressee) you may not copy, forward, disclose
> or use any part of the message or its attachments. At present the
> integrity of email across the internet cannot be guaranteed and
> messages sent via this medium are potentially at risk. All liability
> is excluded to the extent permitted by law for any claims arising as a
> result of the use of this medium to transmit information by or to Alfa
> or its affiliates.
>
> Alfa Financial Software Ltd
> Reg. in England No: 0248 2325

-------------- next part --------------
An HTML attachment was scrubbed...

From aph at redhat.com Sat Apr 7 08:43:42 2018
From: aph at redhat.com (Andrew Haley)
Date: Sat, 7 Apr 2018 09:43:42 +0100
Subject: Re: inspect a thread's stack
In-Reply-To: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
Message-ID:

On 04/06/2018 05:13 PM, Pietro Paolini wrote:
> Is it possible to inspect a thread's stack "locally"?

Have you looked at ThreadMXBean.getThreadInfo(id).getStackTrace()?

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From martinrb at google.com Sun Apr 8 18:49:26 2018
From: martinrb at google.com (Martin Buchholz)
Date: Sun, 8 Apr 2018 11:49:26 -0700
Subject: Re: inspect a thread's stack
In-Reply-To:
References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
Message-ID:

Access to stack traces with locals is demoed in this test:
http://hg.openjdk.java.net/jdk/jdk/file/tip/test/jdk/java/lang/StackWalker/LocalsAndOperands.java
but the functionality does not seem to be available (yet!) via a public
API.

-------------- next part --------------
An HTML attachment was scrubbed...

From jcbeyler at google.com Mon Apr 9 05:48:46 2018
From: jcbeyler at google.com (JC Beyler)
Date: Mon, 09 Apr 2018 05:48:46 +0000
Subject: JDK-8171119: Low-Overhead Heap Profiling
In-Reply-To:
References: <5A819F10.8040201@oracle.com> <5A8414AC.3020209@oracle.com>

Message-ID:

Hi all,

Here is the new webrev:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.12/
with the incremental here:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11_12/

After banging my head against the interactions of the runtime and the
GC, I finally got the next features up and ready:
- The multi-agent support (with a new test)
- The thread support (with a new test)
- I have an assert at the moment of allocation to check there is a
  sampled collector available (though it could be disabled)
- This webrev puts the collector at the collectedHeap.inline.hpp level
  (so removing all of the outer collectors)
- Note there is one current caveat: if the agent requests the
  VMObjectAlloc event, the sampler defers to that event due to a
  limitation in its implementation (i.e. I am not convinced I can
  safely send out that event with the VM collector enabled; I'll
  happily whiteboard that).
- I updated the jvmti.xml accordingly btw.

Let me know what you think!
Jc

On Fri, Apr 6, 2018 at 4:12 PM JC Beyler wrote:

> Hi Karen,
>
> Let me inline my answers, it will probably be easier :)
>
> On Fri, Apr 6, 2018 at 2:40 PM Karen Kinnear wrote:
>
>> JC,
>>
>> Thank you for the updates - really glad you are including the
>> compiler folks. I reviewed the version before this one, so ignore
>> any comments you've already covered (although I did peek at the
>> latest)
>>
>> 1. JDK-8194905 CSR - could you please delete attachments that are
>> not current. It is a bit confusing right now.
>> I have been looking at jvmti_event6.html. I am assuming the rest are
>> obsolete and could be removed please.
>>
>
> I tried but it does not allow me to. It seems that someone with
> administrative rights has to do it :-(. That is why I had to resort to
> this...
>
>> 2. In jvmti_event6.html under Sampled Object Allocation, there is a
>> link to "Heap Sampling Monitoring System".
>> It takes me to the top of the page - seems like something is missing
>> in defining it?
>>
>
> So the Heap Sampling Monitoring System used to have more methods. It
> made sense to have them in a separate category. I have now moved it to
> the memory category to be consistent and grouped there. I also removed
> that link btw.
>
>> 3. Scope of memory allocation tracking
>>
>> I am struggling to understand the extent of memory allocation
>> tracking that you are looking for (probably want to clarify in the
>> JEP and CSR once we work this through).
>>
>> e.g. Heap Sampler vs. JVMTI VMObjectAllocEvent
>> So the current jvmtiVMObjectAllocEvent says:
>>
>> Sent when a method causes the VM to allocate an Object visible to
>> Java and allocation not detectable by other instrumentation
>> mechanisms.
>> Generally detect by instrumenting bytecodes of allocating methods
>> JNI - use JNI function interception
>> e.g. Reflection: java.lang.Class.newInstance()
>> e.g. VM intrinsics
>>
>> comment:
>> Not generated due to bytecodes - e.g. new and newarray VM instructions
>> Not allocation due to JNI: e.g. AllocObject
>> NOT VM internal objects
>> NOT allocations during VM init
>>
>> So from the JEP I can't tell the intended scope of the new event - is
>> this intended to cover all heap allocation?
>> bytecodes
>> JVM_*
>> JNI_*
>> internal VM objects
>> other? (I'm not sure what other there are)
>> - I presume not allocations during VM init - since sent only during
>> live phase
>>
>
> Yes exactly, as much as possible, I am aiming to cover all heap
> allocations. Mostly though, and in practice, I think we care about
> bytecodes and to a lesser extent JNI. Being independent of why the
> memory is being allocated is probably even better: this thread
> allocated Y, no matter where or why.
>
>> OR - is the primary goal to cover allocation for bytecodes so folks
>> can skip instrumentation?
>>
>
> Yes that is the primary goal.
>
>> OR - do you want to get performance numbers and see what is low
>> enough overhead before deciding?
>>
>
> I think it is the same: the system is relatively in place and my
> overhead measurements seem to indicate 0% with the system off, 1% with
> it on but with an empty user callback, and 3% for a naive
> implementation tracking live/GC'd objects.
>
>> 4. The design question is where to put the collectors in the source
>> base - and that of course strongly depends on the scope of the
>> information you want to collect, and on the performance overhead we
>> are willing to incur.
>>
>
> Very true.
>
>> I was trying to figure out a way to put the collectors farther down
>> the call stack so as to both catch more cases and to reduce the
>> maintenance burden - i.e. if you were to add a new code generator,
>> e.g. Graal - if it were to go through an existing interface, that
>> might be a place to already have a collector.
>>
>> I do not know the Graal sources - I did look at
>> jvmci/jvmciRuntime.cpp - and it appears that there are calls to
>> instanceKlass::new_instance, oopFactory::new_typeArray/new_ObjArray
>> and ArrayKlass::multi_allocate, so one possibility would be to put
>> hooks in those calls which would catch many? (I did not do a thorough
>> search) of the slowpath calls for the bytecodes, and then check the
>> fast paths in detail.
>>
>
> I'll come to a major issue with the collector and its placement in the
> next paragraph.
>
>> I had wondered if it made sense to move the hooks even farther down,
>> into CollectedHeap::obj_allocate and array_allocate.
>> I do not think so. First reason is that for multidimensional arrays,
>> in ArrayKlass::multi_allocate the outer dimension array would have an
>> event before storing the inner sub-arrays and I don't think we want
>> that exposed, so that won't work for arrays.
>>
>
> So the major difficulty is that the steps of collection do this:
>
> - An object gets allocated and is decided to be sampled
> - The original pointer placement (where it resides originally in
>   memory) is passed to the collector
> - Now one important thing of note:
>   (a) In the VM code, until the point where the oop is going to be
>       returned, GC is not yet aware of it
>   (b) so the collector can't yet send it out to the user via JVMTI;
>       otherwise, the agent could put a weak reference for example
>
> I'm a bit fuzzy on this and maybe it's just that there would be more
> heavy lifting to make this possible but my initial tests seem to show
> problems when attempting this in the obj_allocate area.
>
>> The second reason is that I strongly suspect the scope you want is
>> bytecodes only. I think once you have added hooks to all the fast
>> paths and slow paths that this will be pushing the performance
>> overhead constraints you proposed and you won't want to see e.g.
>> internal allocations.
>>
>
> Yes agreed, allocations from bytecodes are mostly our concern
> generally :)
>
>> But I think you need to experiment with the set of allocations (or
>> possible alternative sets of allocations) you want recorded.
>>
>> The hooks I see today include:
>> Interpreter: (looking at x86 as a sample)
>> - slowpath in InterpreterRuntime
>> - fastpath tlab allocation - your new threshold check handles that
>>
>
> Agreed
>
>> - allow_shared_alloc (GC specific): for _new isn't handled
>>
>
> Where is that exactly? I can check why we are not catching it?
>
>> C1
>> I don't see changes in c1_Runtime.cpp
>> note: you also want to look for the fast path
>>
>
> I added the calls to c1_Runtime in the latest webrev, but was still
> going through testing before pushing it out. I had waited on this one
> a bit. Fast path would be handled by the threshold check, no?
>
>> C2: changes in opto/runtime.cpp for slow path
>> did you also catch the fast path?
>>
>
> Fast path gets handled by the same threshold check, no? Perhaps I've
> missed something (very likely)?
>
>> 5. Performance -
>> After you get all the collectors added - you need to rerun the
>> performance numbers.
>>
>
> Agreed :)
>
>> thanks,
>> Karen
>>
>> On Apr 5, 2018, at 2:15 PM, JC Beyler wrote:
>>
>> Thanks Boris and Derek for testing it.
>>
>> Yes I was trying to get a new version out that had the tests ported
>> as well but got sidetracked while trying to add tests and two new
>> features.
>>
>> Here is the incremental webrev:
>>
>> Here is the full webrev:
>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/
>>
>> Basically, the new tests assert this:
>> - Only one agent can currently ask for the sampling. I'm currently
>> seeing if I can push the multi-agent support to a next webrev, to
>> start doing a code freeze on this one
>> - The event is not thread-enabled, meaning that, like the
>> VMObjectAllocationEvent, it's an all or nothing event; same as the
>> multi-agent support, I'm going to see if adding the support in a
>> future webrev is a better idea, to freeze this webrev a bit
>>
>> There was another item that I added here and I'm unsure this webrev
>> is stable in debug mode: I added an assertion system to ascertain
>> that all paths leading to a TLAB slow path (and hence a sampling
>> point) have a sampling collector ready to post the event if a user
>> wants it. This might break a few things in debug mode as I'm working
>> through the kinks of that as well. However, in release mode, this new
>> webrev passes all the tests in
>> hotspot/jtreg/serviceability/jvmti/HeapMonitor.
>>
>> Let me know what you think,
>> Jc
>>
>> On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich
>> <boris.ulasevich at bell-sw.com> wrote:
>>
>>> Hi JC,
>>>
>>> I have just checked on arm32: your patch compiles and runs ok.
>>>
>>> As I can see, the jtreg agentlib name "-agentlib:HeapMonitor" does
>>> not correspond to the actual library name: libHeapMonitorTest.c ->
>>> libHeapMonitorTest.so
>>>
>>> Boris
>>>
>>> On 04.04.2018 01:54, White, Derek wrote:
>>> > Thanks JC,
>>> >
>>> > New patch applies cleanly. Compiles and runs (simple test
>>> > programs) on aarch64.
>>> >
>>> > * Derek
>>> >
>>> > *From:* JC Beyler [mailto:jcbeyler at google.com]
>>> > *Sent:* Monday, April 02, 2018 1:17 PM
>>> > *To:* White, Derek
>>> > *Cc:* Erik Österlund; serviceability-dev at openjdk.java.net;
>>> > hotspot-compiler-dev
>>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
>>> >
>>> > Hi Derek,
>>> >
>>> > I know there were a few things that went in that provoked a merge
>>> > conflict. I worked on it and got it up to date. Sadly my lack of
>>> > knowledge makes it a full rebase instead of keeping all the
>>> > history. However, with a newly cloned jdk/hs you should now be
>>> > able to use:
>>> >
>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/
>>> >
>>> > The change you are referring to was done with the others so
>>> > perhaps you were unlucky and I forgot it in a webrev and fixed it
>>> > in another? I don't know but it's been there and I checked, it is
>>> > here:
>>> >
>>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html
>>> >
>>> > I double checked that tlab_end_offset no longer appears in any
>>> > architecture (as far as I can tell :)).
>>> >
>>> > Thanks for testing and let me know if you run into any other
>>> > issues!
>>> >
>>> > Jc
>>> >
>>> > On Fri, Mar 30, 2018 at 4:24 PM White, Derek wrote:
>>> >
>>> >     Hi Jc,
>>> >
>>> >     I've been having trouble getting your patch to apply
>>> >     correctly. I may have based it on the wrong version.
>>> > >>> > In any case, I think there?s a missing update to >>> > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), >>> > where ?JavaThread::tlab_end_offset()? should become >>> > ?JavaThread::tlab_current_end_offset()?. >>> > >>> > This should correspond to the other port?s changes in >>> > templateTable_.cpp files. >>> > >>> > Thanks! >>> > - Derek >>> > >>> > *From:* hotspot-compiler-dev >>> > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net >>> > ] *On Behalf >>> > Of *JC Beyler >>> > *Sent:* Wednesday, March 28, 2018 11:43 AM >>> > *To:* Erik ?sterlund >> > > >>> > *Cc:* serviceability-dev at openjdk.java.net >>> > ; hotspot-compiler-dev >>> > >> > > >>> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling >>> > >>> > Hi all, >>> > >>> > I've been working on deflaking the tests mostly and the wording in >>> > the JVMTI spec. >>> > >>> > Here is the two incremental webrevs: >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ >>> > >>> > Here is the total webrev: >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ >>> > >>> > Here are the notes of this change: >>> > >>> > - Currently the tests pass 100 times in a row, I am working on >>> > checking if they pass 1000 times in a row. >>> > >>> > - The default sampling rate is set to 512k, this is what we use >>> > internally and having a default means that to enable the sampling >>> > with the default, the user only has to do a enable event/disable >>> > event via JVMTI (instead of enable + set sample rate). 
>>> > >>> > - I deprecated the code that was handling the fast path tlab >>> > refill if it happened since this is now deprecated >>> > >>> > - Though I saw that Graal is still using it so I have to see >>> > what needs to be done there exactly >>> > >>> > Finally, using the Dacapo benchmark suite, I noted a 1% overhead >>> for >>> > when the event system is turned on and the callback to the native >>> > agent is just empty. I got a 3% overhead with a 512k sampling rate >>> > with the code I put in the native side of my tests. >>> > >>> > Thanks and comments are appreciated, >>> > >>> > Jc >>> > >>> > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler >> > > wrote: >>> > >>> > Hi all, >>> > >>> > The incremental webrev update is here: >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ >>> > >>> > The full webrev is here: >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ >>> > >>> > Major change here is: >>> > >>> > - I've removed the heapMonitoring.cpp code in favor of just >>> > having the sampling events as per Serguei's request; I still >>> > have to do some overhead measurements but the tests prove the >>> > concept can work >>> > >>> > - Most of the tlab code is unchanged, the only major >>> > part is that now things get sent off to event collectors when >>> > used and enabled. 
>>> > >>> > - Added the interpreter collectors to handle interpreter >>> > execution >>> > >>> > - Updated the name from SetTlabHeapSampling to >>> > SetHeapSampling to be more generic >>> > >>> > - Added a mutex for the thread sampling so that we can >>> > initialize an internal static array safely >>> > >>> > - Ported the tests from the old system to this new one >>> > >>> > I've also updated the JEP and CSR to reflect these changes: >>> > >>> > https://bugs.openjdk.java.net/browse/JDK-8194905 >>> > >>> > https://bugs.openjdk.java.net/browse/JDK-8171119 >>> > >>> > In order to make this have some forward progress, I've removed >>> > the heap sampling code entirely and now rely entirely on the >>> > event sampling system. The tests reflect this by using a >>> > simplified implementation of what an agent could do: >>> > >>> > >>> http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c >>> > >>> > (Search for anything mentioning event_storage). >>> > >>> > I have not taken the time to port the whole code we had >>> > originally in heapMonitoring to this. I hesitate only because >>> > that code was in C++, I'd have to port it to C and this is for >>> > tests so perhaps what I have now is good enough? >>> > >>> > As far as testing goes, I've ported all the relevant tests and >>> > then added a few: >>> > >>> > - Turning the system on/off >>> > >>> > - Testing using various GCs >>> > >>> > - Testing using the interpreter >>> > >>> > - Testing the sampling rate >>> > >>> > - Testing with objects and arrays >>> > >>> > - Testing with various threads >>> > >>> > Finally, as overhead goes, I have the numbers of the system off >>> > vs a clean build and I have 0% overhead, which is what we'd >>> > want. This was using the Dacapo benchmarks. I am now preparing >>> > to run a version with the events on using dacapo and will >>> report >>> > back here. 
>>> > >>> > Any comments are welcome :) >>> > >>> > Jc >>> > >>> > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler >> > > wrote: >>> > >>> > Hi all, >>> > >>> > I apologize for the delay but I wanted to add an event >>> > system and that took a bit longer than expected and I also >>> > reworked the code to take into account the deprecation of >>> > FastTLABRefill. >>> > >>> > This update has four parts: >>> > >>> > A) I moved the implementation from Thread to >>> > ThreadHeapSampler inside of Thread. Would you prefer it as >>> a >>> > pointer inside of Thread or like this works for you? Second >>> > question would be would you rather have an association >>> > outside of Thread altogether that tries to remember when >>> > threads are live and then we would have something like: >>> > >>> > ThreadHeapSampler::get_sampling_size(this_thread); >>> > >>> > I worry about the overhead of this but perhaps it is not >>> too >>> > too bad? >>> > >>> > B) I also have been working on the Allocation event system >>> > that sends out a notification at each sampled event. This >>> > will be practical when wanting to do something at the >>> > allocation point. I'm also looking at if the whole >>> > heapMonitoring code could not reside in the agent code and >>> > not in the JDK. I'm not convinced but I'm talking to >>> Serguei >>> > about it to see/assess :) >>> > >>> > - Also added two tests for the new event subsystem >>> > >>> > C) Removed the slow_path fields inside the TLAB code since >>> > now FastTLABRefill is deprecated >>> > >>> > D) Updated the JVMTI documentation and specification for >>> the >>> > methods. 
>>> > >>> > So the incremental webrev is here: >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ >>> > >>> > and the full webrev is here: >>> > >>> > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 >>> > >>> > I believe I have updated the various JIRA issues that track >>> > this :) >>> > >>> > Thanks for your input, >>> > >>> > Jc >>> > >>> > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler >>> > > wrote: >>> > >>> > Hi Erik, >>> > >>> > I inlined my answers, which the last one seems to >>> answer >>> > Robbin's concerns about the same thing (adding things >>> to >>> > Thread). >>> > >>> > On Wed, Feb 14, 2018 at 2:51 AM, Erik ?sterlund >>> > >> > > wrote: >>> > >>> > Hi JC, >>> > >>> > Comments are inlined below. >>> > >>> > On 2018-02-13 06:18, JC Beyler wrote: >>> > >>> > Hi Erik, >>> > >>> > Thanks for your answers, I've now inlined my >>> own >>> > answers/comments. >>> > >>> > I've done a new webrev here: >>> > >>> > >>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ >>> > < >>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.08/> >>> > >>> > The incremental is here: >>> > >>> > >>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ >>> > < >>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.07_08/> >>> > >>> > Note to all: >>> > >>> > - I've been integrating changes from >>> > Erin/Serguei/David comments so this webrev >>> > incremental is a bit an answer to all comments >>> > in one. I apologize for that :) >>> > >>> > On Mon, Feb 12, 2018 at 6:05 AM, Erik ?sterlund >>> > >> > > wrote: >>> > >>> > Hi JC, >>> > >>> > Sorry for the delayed reply. >>> > >>> > Inlined answers: >>> > >>> > >>> > >>> > On 2018-02-06 00:04, JC Beyler wrote: >>> > >>> > Hi Erik, >>> > >>> > (Renaming this to be folded into the >>> > newly renamed thread :)) >>> > >>> > First off, thanks a lot for reviewing >>> > the webrev! I appreciate it! 
>>> > >>> > I updated the webrev to: >>> > >>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ >>> > < >>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.05a/> >>> > >>> > And the incremental one is here: >>> > >>> http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ >>> > < >>> http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.04_05a/> >>> > >>> > It contains: >>> > - The change for since from 9 to 11 for >>> > the jvmti.xml >>> > - The use of the OrderAccess for >>> initialized >>> > - Clearing the oop >>> > >>> > I also have inlined my answers to your >>> > comments. The biggest question >>> > will come from the multiple *_end >>> > variables. A bit of the logic there >>> > is due to handling the slow path refill >>> > vs fast path refill and >>> > checking that the rug was not pulled >>> > underneath the slowpath. I >>> > believe that a previous comment was >>> that >>> > TlabFastRefill was going to >>> > be deprecated. >>> > >>> > If this is true, we could revert this >>> > code a bit and just do a : if >>> > TlabFastRefill is enabled, disable >>> this. >>> > And then deprecate that when >>> > TlabFastRefill is deprecated. >>> > >>> > This might simplify this webrev and I >>> > can work on a follow-up that >>> > either: removes TlabFastRefill if >>> Robbin >>> > does not have the time to do >>> > it or add the support to the assembly >>> > side to handle this correctly. >>> > What do you think? >>> > >>> > I support removing TlabFastRefill, but I >>> > think it is good to not depend on that >>> > happening first. 
>>> > >>> > >>> > I'm slowly pushing on the FastTLABRefill >>> > ( >>> https://bugs.openjdk.java.net/browse/JDK-8194084), >>> > I agree on keeping both separate for now though >>> > so that we can think of both differently >>> > >>> > Now, below, inlined are my answers: >>> > >>> > On Fri, Feb 2, 2018 at 8:44 AM, Erik >>> > ?sterlund >>> > >> > > >>> wrote: >>> > >>> > Hi JC, >>> > >>> > Hope I am reviewing the right >>> > version of your work. Here goes... >>> > >>> > >>> src/hotspot/share/gc/shared/collectedHeap.inline.hpp: >>> > >>> > 159 >>> > >>> AllocTracer::send_allocation_outside_tlab(klass, result, size * >>> > HeapWordSize, THREAD); >>> > 160 >>> > 161 >>> > >>> THREAD->tlab().handle_sample(THREAD, result, size); >>> > 162 return result; >>> > 163 } >>> > >>> > Should not call tlab()->X without >>> > checking if (UseTLAB) IMO. >>> > >>> > Done! >>> > >>> > >>> > More about this later. >>> > >>> > >>> src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: >>> > >>> > So first of all, there seems to >>> > quite a few ends. There is an >>> "end", >>> > a "hard >>> > end", a "slow path end", and an >>> > "actual end". Moreover, it seems >>> > like the >>> > "hard end" is actually further away >>> > than the "actual end". So the >>> "hard end" >>> > seems like more of a "really >>> > definitely actual end" or >>> something. >>> > I don't >>> > know about you, but I think it >>> looks >>> > kind of messy. In particular, I >>> don't >>> > feel like the name "actual end" >>> > reflects what it represents, >>> > especially when >>> > there is another end that is behind >>> > the "actual end". >>> > >>> > 413 HeapWord* >>> > ThreadLocalAllocBuffer::hard_end() >>> { >>> > 414 // Did a fast TLAB refill >>> > occur? >>> > 415 if (_slow_path_end != >>> _end) { >>> > 416 // Fix up the actual end >>> > to be now the end of this TLAB. 
>>> > 417 _slow_path_end = _end; >>> > 418 _actual_end = _end; >>> > 419 } >>> > 420 >>> > 421 return _actual_end + >>> > alignment_reserve(); >>> > 422 } >>> > >>> > I really do not like making getters >>> > unexpectedly have these kind of >>> side >>> > effects. It is not expected that >>> > when you ask for the "hard end", >>> you >>> > implicitly update the "slow path >>> > end" and "actual end" to new >>> values. >>> > >>> > As I said, a lot of this is due to the >>> > FastTlabRefill. If I make this >>> > not supporting FastTlabRefill, this >>> goes >>> > away. The reason the system >>> > needs to update itself at the get is >>> > that you only know at that get if >>> > things have shifted underneath the tlab >>> > slow path. I am not sure of >>> > really better names (naming is hard!), >>> > perhaps we could do these >>> > names: >>> > >>> > - current_tlab_end // Either the >>> > allocated tlab end or a sampling point >>> > - last_allocation_address // The end >>> of >>> > the tlab allocation >>> > - last_slowpath_allocated_end // In >>> > case a fast refill occurred the >>> > end might have changed, this is to >>> > remember slow vs fast past refills >>> > >>> > the hard_end method can be renamed to >>> > something like: >>> > tlab_end_pointer() // The end of >>> > the lab including a bit of >>> > alignment reserved bytes >>> > >>> > Those names sound better to me. Could you >>> > please provide a mapping from the old names >>> > to the new names so I understand which one >>> > is which please? >>> > >>> > This is my current guess of what you are >>> > proposing: >>> > >>> > end -> current_tlab_end >>> > actual_end -> last_allocation_address >>> > slow_path_end -> >>> last_slowpath_allocated_end >>> > hard_end -> tlab_end_pointer >>> > >>> > Yes that is correct, that was what I was >>> proposing. 
>>> > >>> > I would prefer this naming: >>> > >>> > end -> slow_path_end // the end for taking >>> a >>> > slow path; either due to sampling or >>> refilling >>> > actual_end -> allocation_end // the end for >>> > allocations >>> > slow_path_end -> last_slow_path_end // last >>> > address for slow_path_end (as opposed to >>> > allocation_end) >>> > hard_end -> reserved_end // the end of the >>> > reserved space of the TLAB >>> > >>> > About setting things in the getter... that >>> > still seems like a very unpleasant thing to >>> > me. It would be better to inspect the call >>> > hierarchy and explicitly update the ends >>> > where they need updating, and assert in the >>> > getter that they are in sync, rather than >>> > implicitly setting various ends as a >>> > surprising side effect in a getter. It >>> looks >>> > like the call hierarchy is very small. With >>> > my new naming convention, reserved_end() >>> > would presumably return _allocation_end + >>> > alignment_reserve(), and have an assert >>> > checking that _allocation_end == >>> > _last_slow_path_allocation_end, complaining >>> > that this invariant must hold, and that a >>> > caller to this function, such as >>> > make_parsable(), must first explicitly >>> > synchronize the ends as required, to honor >>> > that invariant. >>> > >>> > >>> > I've renamed the variables to how you preferred >>> > it except for the _end one. I did: >>> > >>> > current_end >>> > >>> > last_allocation_address >>> > >>> > tlab_end_ptr >>> > >>> > The reason is that the architecture dependent >>> > code use the thread.hpp API and it already has >>> > tlab included into the name so it becomes >>> > tlab_current_end (which is better that >>> > tlab_current_tlab_end in my opinion). >>> > >>> > I also moved the update into a separate method >>> > with a TODO that says to remove it when >>> > FastTLABRefill is deprecated >>> > >>> > This looks a lot better now. Thanks. 
>>> > >>> > Note that the following comment now needs updating >>> > accordingly in threadLocalAllocBuffer.hpp: >>> > >>> > 41 // Heap sampling is performed via >>> > the end/actual_end fields. >>> > >>> > 42 // actual_end contains the real >>> end >>> > of the tlab allocation, >>> > >>> > 43 // whereas end can be set to an >>> > arbitrary spot in the tlab to >>> > >>> > 44 // trip the return and sample the >>> > allocation. >>> > >>> > 45 // slow_path_end is used to track >>> > if a fast tlab refill occured >>> > >>> > 46 // between slowpath calls. >>> > >>> > There might be other comments too, I have not >>> looked >>> > in detail. >>> > >>> > This was the only spot that still had an actual_end, I >>> > fixed it now. I'll do a sweep to double check other >>> > comments. >>> > >>> > >>> > >>> > Not sure it's better but before >>> updating >>> > the webrev, I wanted to try >>> > to get input/consensus :) >>> > >>> > (Note hard_end was always further off >>> > than end). >>> > >>> > src/hotspot/share/prims/jvmti.xml: >>> > >>> > 10357 >> > id="can_sample_heap" since="9"> >>> > 10358 >>> > 10359 Can sample the >>> heap. >>> > 10360 If this capability >>> > is enabled then the heap sampling >>> > methods >>> > can be called. >>> > 10361 >>> > 10362 >>> > >>> > Looks like this capability should >>> > not be "since 9" if it gets >>> integrated >>> > now. >>> > >>> > Updated now to 11, crossing my fingers >>> :) >>> > >>> > >>> src/hotspot/share/runtime/heapMonitoring.cpp: >>> > >>> > 448 if >>> > (is_alive->do_object_b(value)) { >>> > 449 // Update the oop to >>> > point to the new object if it is >>> still >>> > alive. >>> > 450 >>> f->do_oop(&(trace.obj)); >>> > 451 >>> > 452 // Copy the old >>> > trace, if it is still live. >>> > 453 >>> > >>> _allocated_traces->at_put(curr_pos++, trace); >>> > 454 >>> > 455 // Store the live >>> > trace in a cache, to be served up >>> on >>> > /heapz. 
>>> > 456 _traces_on_last_full_gc->append(trace);
>>> > 457
>>> > 458 count++;
>>> > 459 } else {
>>> > 460 // If the old trace is no longer live, add it to the list of
>>> > 461 // recently collected garbage.
>>> > 462 store_garbage_trace(trace);
>>> > 463 }
>>> >
>>> > In the case where the oop was not live, I would like it to be
>>> > explicitly cleared.
>>> >
>>> > Done I think how you wanted it. Let me know because I'm not
>>> > familiar with the RootAccess API. I'm unclear if I'm doing this
>>> > right or not so reviews of these parts are highly appreciated.
>>> > Robbin had talked of perhaps later pushing this all into an
>>> > OopStorage, should I do this now do you think? Or can that wait
>>> > for a second webrev later down the road?
>>> >
>>> > I think using handles can and should be done later. You can use the
>>> > Access API now. I noticed that you are missing an #include
>>> > "oops/access.inline.hpp" in your heapMonitoring.cpp file.
>>> >
>>> > The missing header is there for me so I don't know, I made sure it
>>> > is present in the latest webrev. Sorry about that.
>>> >
>>> > + Did I clear it the way you wanted me to or were you thinking of
>>> > something else?
>>> >
>>> > That is precisely how I wanted it to be cleared. Thanks.
>>> >
>>> > + Final question here, seems like if I were to want to not do the
>>> > f->do_oop directly on the trace.obj, I'd need to do something like:
>>> >
>>> > f->do_oop(&value);
>>> > ...
>>> > trace->store_oop(value);
>>> >
>>> > to update the oop internally. Is that right/is that one of the
>>> > advantages of going to the OopStorage sooner than later?
>>> >
>>> > I think you really want to do the do_oop on the root directly. Is
>>> > there a particular reason why you would not want to do that?
>>> > Otherwise, yes - the benefit with using the handle approach is that
>>> > you do not need to call do_oop explicitly in your code.
>>> >
>>> > There is no reason except that now we have a load_oop and a
>>> > get_oop_addr, I was not sure what you would think of that.
>>> >
>>> > That's fine.
>>> >
>>> > Also I see a lot of concurrent-looking use of the following field:
>>> > 267 volatile bool _initialized;
>>> >
>>> > Please note that the "volatile" qualifier does not help with
>>> > reordering here. Reordering between volatile and non-volatile
>>> > fields is completely free for both compiler and hardware, except
>>> > for windows with MSVC, where volatile semantics is defined to use
>>> > acquire/release semantics, and the hardware is TSO. But for the
>>> > general case, I would expect this field to be stored with
>>> > OrderAccess::release_store and loaded with
>>> > OrderAccess::load_acquire. Otherwise it is not thread safe.
>>> >
>>> > Because everything is behind a mutex, I wasn't really worried about
>>> > this. I have a test that has multiple threads trying to hit this
>>> > corner case and it passes.
>>> >
>>> > However, to be paranoid, I updated it to using the OrderAccess API
>>> > now, thanks! Let me know what you think there too!
>>> >
>>> > If it is indeed always supposed to be read and written under a
>>> > mutex, then I would strongly prefer to have it accessed as a normal
>>> > non-volatile member, and have an assertion that the given lock is
>>> > held or we are in a safepoint, as we do in many other places.
>>> > Something like this:
>>> >
>>> > assert(HeapMonitorStorage_lock->owned_by_self()
>>> >        || (SafepointSynchronize::is_at_safepoint()
>>> >            && Thread::current()->is_VM_thread()),
>>> >        "this should not be accessed concurrently");
>>> >
>>> > It would be confusing to people reading the code if there are uses
>>> > of OrderAccess that are actually always protected under a mutex.
>>> >
>>> > Thank you for the exact example to be put in the code! I put it
>>> > around each access/assignment of the _initialized field and found
>>> > one case where yes you can touch it and not have the lock. It
>>> > actually is "ok" because you don't act on the storage until later
>>> > and only when you really want to modify the storage (see the
>>> > object_alloc_do_sample method which calls the add_trace method).
>>> >
>>> > But, because of this, I'm going to put the OrderAccess here, I'll
>>> > do some performance numbers later and if there are issues, I might
>>> > add an "unsafe" read and a "safe" one to make it explicit to the
>>> > reader. But I don't think it will come to that.
>>> >
>>> > Okay. This double return in heapMonitoring.cpp looks wrong:
>>> >
>>> > 283 bool initialized() {
>>> > 284 return OrderAccess::load_acquire(&_initialized) != 0;
>>> > 285 return _initialized;
>>> > 286 }
>>> >
>>> > Since you said object_alloc_do_sample() is the only place where you
>>> > do not hold the mutex while reading initialized(), I had a closer
>>> > look at that. It looks like in its current shape, the lack of a
>>> > mutex may lead to a memory leak. In particular, it first checks if
>>> > (initialized()). Let's assume this is now true. It then allocates a
>>> > bunch of stuff, and checks if the number of frames were over 0.
>>> > If they were, it calls StackTraceStorage::storage()->add_trace()
>>> > seemingly hoping that after grabbing the lock in there,
>>> > initialized() will still return true. But it could now return
>>> > false and skip doing anything, in
>>> >
>> >>

From christoph.langer at sap.com Mon Apr 9 07:08:46 2018
From: christoph.langer at sap.com (Langer, Christoph)
Date: Mon, 9 Apr 2018 07:08:46 +0000
Subject: RFR (M): 8201247: Various cleanups in the attach framework
In-Reply-To: <91d75e2d-47a4-e9ee-5d19-8f3e6dc13428@oracle.com>
References: <14dff9b0cf5a4b888aef1d6452801b57@sap.com> <91d75e2d-47a4-e9ee-5d19-8f3e6dc13428@oracle.com>
Message-ID:

Hi Chris,

thanks for looking into this.

As for ArgumentIterator::next, I must admit, I found this patch in our
code base when taking over the code. I believe that an issue would be
seen if an attach operation has 2 or 3 arguments and the first one is
NULL/empty. I guess such a situation can't happen with the attach
operations currently existing in OpenJDK as none of these ops would
allow this type of argument. However, in our implementation, we have
for instance enhanced the "dump_heap" operation to work with null as
first argument where one usually would specify the desired output file
name. We implemented a mechanism to compute a default filename when the
param is left blank. So we need the fix for that case, I guess.

I'll run the patch through the submission forest now and do some jtreg
testing.

Best regards
Christoph

> -----Original Message-----
> From: Chris Plummer [mailto:chris.plummer at oracle.com]
> Sent: Friday, 6 April 2018 18:37
> To: Langer, Christoph ; serviceability-dev at openjdk.java.net
> Cc: hotspot-dev at openjdk.java.net
> Subject: Re: RFR (M): 8201247: Various cleanups in the attach framework
>
> Hi Christoph,
>
> Can you explain a bit more about "fix handling of null values in
> ArgumentIterator::next".
> When does this turn up? Is there a test case?
>
> Everything else looks good.
>
> thanks,
>
> Chris
>
> On 4/6/18 8:01 AM, Langer, Christoph wrote:
> >
> > Hi,
> >
> > can I please get reviews for a set of clean up changes that I came
> > across when doing some integration work.
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8201247
> >
> > Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8201247.0/
> >
> > Detailed comments about the changes can be found in the bug.
> >
> > Thanks & best regards
> >
> > Christoph
> >

From Pietro.Paolini at alfasystems.com Mon Apr 9 08:33:03 2018
From: Pietro.Paolini at alfasystems.com (Pietro Paolini)
Date: Mon, 9 Apr 2018 08:33:03 +0000
Subject: RE: inspect a thread’s stack
In-Reply-To:
References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
Message-ID: <5D285FC05679A441ACF34A90905BFA92241A7A35@GBEDBP01.chp.co.uk>

Hi Martin,

>Hi Pietro
>Not sure JDI is what you really want, but if you would like to play
>with it I have some code here that uses the PID of the JVM to open a
>connection to itself and among other things print stack frames with
>variables:
>https://github.com/skarsaune/kantega.debug and some demo here:
>https://www.youtube.com/watch?v=5sXxIfjaALg
>So an example of what you can do, but not suitable for anything serious.

I don't want to set up a connection to myself and I was wondering if
that could be avoided altogether. It is more complex than I would like
it to be: for instance, I would need to factor in the connection, what
happens if it goes wrong, etc.

>For inspecting the stack, there is a cool reflection hack to the Java 9
>API demonstrated by Andrei Pangin here that is able to capture stack
>values: https://vimeo.com/233820012

Do you think that is suitable for serious work? I mean, production code.

>For serious work I suppose a JVMTI agent is the best option. Others are
>in a better position to offer guidance on that.
Reading the docs it seems that the agent has to be written in C/C++ and
unfortunately that is not an option on my current project. I quote from
there
(https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#whatIs):

"Tools can be written directly to JVM TI or indirectly through higher
level interfaces. The Java Platform Debugger Architecture includes JVM
TI, but also contains higher-level, out-of-process debugger interfaces.
The higher-level interfaces are more appropriate than JVM TI for many
tools. For more information on the Java Platform Debugger Architecture,
see the Java Platform Debugger Architecture website."

It is easy to get lost among acronyms - me being a newbie in the JVM
related tooling - but when I open
https://docs.oracle.com/javase/7/docs/technotes/guides/jpda/architecture.html
(the Java Platform Debugger Architecture website) it lists three
"things":

1) JVM TI: if it is native, it is not an option
2) JDWP: not sure I need to look into that
3) JDI: which is why I ended up here

Wrapping up, my hope is that the Java 9 reflection hack can work well or
that JDI allows me to inspect frames without the need of having a
connection. Reading your answer, that does not seem to be possible and I
should exclude the possibility altogether. Is that right?

Thanks a lot for the answers.
P.

On Fri, 6 Apr 2018 at 18:14, Pietro Paolini wrote:

Hi all,

I apologise if this is not the right ML for it but I couldn't find
exactly what I was looking for when Googling the problem. I am a bit new
to the JDI world.

I would like to inspect the stack frames of a specific thread. I came
across the StackFrame/ThreadReference classes but I couldn't find
examples where their usage is shown without connecting to the VM
somehow, like a debugger would do.

Is it possible to inspect a thread's stack "locally"? In my mind I would
be able to have a function such as:

static void hook(Thread thread) {
    thread.wait() // stop that thread
    // inspect the frames of that thread doing any needed business with them
}

I'd need this for diagnostic purposes of my application.

Thanks,
Pietro

Pietro Paolini
Consultant
Alfa
e: pietro.paolini at alfasystems.com | w: alfasystems.com

From Pietro.Paolini at alfasystems.com Mon Apr 9 08:40:39 2018
From: Pietro.Paolini at alfasystems.com (Pietro Paolini)
Date: Mon, 9 Apr 2018 08:40:39 +0000
Subject: RE: inspect a thread’s stack
In-Reply-To:
References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk>
Message-ID: <5D285FC05679A441ACF34A90905BFA92241A7A43@GBEDBP01.chp.co.uk>

>Access to stacktraces with locals is demoed in this test
>http://hg.openjdk.java.net/jdk/jdk/file/tip/test/jdk/java/lang/StackWalker/LocalsAndOperands.java

Maybe I haven't read it well enough but isn't that accessible through
https://docs.oracle.com/javase/9/docs/api/java/lang/StackWalker.html ?
As long as you are on Java 9 that should not be a problem.

>but the functionality does not seem to be available (yet!) via a public API.

What do you mean? Isn't that a public API?

Thanks,
P.

From thomas.stuefe at gmail.com Mon Apr 9 09:07:17 2018
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 9 Apr 2018 11:07:17 +0200
Subject: RFR(xxs): 8200384: jcmd help output should be sorted
In-Reply-To: <1f24a1d0-9cac-4a77-6023-918cf864a298@oracle.com>
References: <1f24a1d0-9cac-4a77-6023-918cf864a298@oracle.com>
Message-ID:

Hi Sergey, Christoph,

thanks for the review!
Sure, here you go:

Old output, unsorted:

thomas at mainframe /shared/projects/openjdk/jdk-submit-hs/output-fastdebug $ ./images/jdk/bin/jcmd test3.Example2 help
24278:
The following commands are available:
VM.log
VM.native_memory
ManagementAgent.status
ManagementAgent.stop
ManagementAgent.start_local
ManagementAgent.start
Compiler.directives_clear
Compiler.directives_remove
Compiler.directives_add
Compiler.directives_print
Compiler.CodeHeap_Analytics
VM.print_touched_methods
Compiler.codecache
Compiler.codelist
Compiler.queue
VM.classloader_stats
Thread.print
JVMTI.data_dump
JVMTI.agent_load
VM.metaspace
VM.stringtable
VM.symboltable
VM.class_hierarchy
VM.systemdictionary
GC.class_stats
GC.class_histogram
GC.heap_dump
GC.finalizer_info
GC.heap_info
GC.run_finalization
GC.run
VM.info
VM.uptime
VM.dynlibs
VM.set_flag
VM.flags
VM.system_properties
VM.command_line
VM.version
help

New output, sorted:

thomas at mainframe /shared/projects/openjdk/jdk-submit-hs/output-fastdebug $ ./images/jdk/bin/jcmd test3.Example2 help
30230:
The following commands are available:
Compiler.CodeHeap_Analytics
Compiler.codecache
Compiler.codelist
Compiler.directives_add
Compiler.directives_clear
Compiler.directives_print
Compiler.directives_remove
Compiler.queue
GC.class_histogram
GC.class_stats
GC.finalizer_info
GC.heap_dump
GC.heap_info
GC.run
GC.run_finalization
JVMTI.agent_load
JVMTI.data_dump
ManagementAgent.start
ManagementAgent.start_local
ManagementAgent.status
ManagementAgent.stop
Thread.print
VM.class_hierarchy
VM.classloader_stats
VM.command_line
VM.dynlibs
VM.flags
VM.info
VM.log
VM.metaspace
VM.native_memory
VM.print_touched_methods
VM.set_flag
VM.stringtable
VM.symboltable
VM.system_properties
VM.systemdictionary
VM.uptime
VM.version
help

I'm running submit tests now, if they pass I'll push.
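
For illustration only: the sorted listing above is consistent with a plain case-sensitive comparison (uppercase letters before lowercase, '_' before letters, so "help" ends up last). The following is a minimal Java sketch of that ordering, not the actual change, which lives in the HotSpot C++ sources. The class name SortedCommandList and the method sorted() are hypothetical, and the assumption that the real code compares names this way is mine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortedCommandList {

    /** Returns a sorted copy of the given names; the input list is not modified. */
    public static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        // Natural String order compares UTF-16 code units. For ASCII-only
        // names this gives the same result as a byte-wise comparison.
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // A few names taken from the listing above, in arbitrary order.
        List<String> unsorted = Arrays.asList(
                "VM.log", "Compiler.codecache", "Compiler.CodeHeap_Analytics",
                "VM.systemdictionary", "VM.system_properties", "GC.run", "help");
        for (String name : sorted(unsorted)) {
            System.out.println(name);
        }
    }
}
```

Running main prints the seven names in the same relative order as they appear in the new jcmd output.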
Best Regards, Thomas

On Tue, Apr 3, 2018 at 3:52 AM, serguei.spitsyn at oracle.com <
serguei.spitsyn at oracle.com> wrote:

> Hi Thomas,
>
> Added the serviceability-dev mailing list as it is a Serviceability area.
>
> The fix looks good to me.
> One question:
> Could you, please, post the sorted help output?
> It would be interesting to see how it looks when sorted.
>
> Thanks,
> Serguei
>
> On 3/28/18 13:08, Thomas Stüfe wrote:
>
>> Hi all,
>>
>> may I get reviews for this tiny trivial change which causes jcmd help
>> output (the command list) to be sorted?
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8200384
>> webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8200384-jcmd-help-sorted/webrev.00/webrev/
>>
>> Thanks!
>>
>> Best Regards, Thomas
>>
>

From thomas.stuefe at gmail.com Mon Apr 9 15:24:00 2018
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 9 Apr 2018 17:24:00 +0200
Subject: jcmd, windows x64: cannot see other processes?
Message-ID:

Hi all,

I try to attach a jcmd to a running process on windows x64, but jcmd
only lists its own process.

Both jcmd and process are built from jdk-hs.

Then I tried attaching jdk-hs tip jcmd to an older VM (jdk 9), did not
work either.

Then - and here it gets weird - I tried attaching a jdk9 jcmd to a jdk9
VM, and still nothing... am I making a thinking error here? Do I need
special options on Windows? On Unix this never gave me any trouble.

Both processes run under the same user, from two console windows. I
tried both within and without cygwin too, does not make any difference.

Thanks, Thomas

From markus.gronlund at oracle.com Mon Apr 9 15:30:57 2018
From: markus.gronlund at oracle.com (Markus Gronlund)
Date: Mon, 9 Apr 2018 08:30:57 -0700 (PDT)
Subject: jcmd, windows x64: cannot see other processes?
In-Reply-To:
References:
Message-ID:

Hi Thomas,

Are you running in two separate Terminal Server Sessions?

You need to be in the same WindowsStation
https://msdn.microsoft.com/en-us/library/windows/desktop/ms687096(v=vs.85).aspx

HTH
Markus

From: Thomas Stüfe [mailto:thomas.stuefe at gmail.com]
Sent: 9 April 2018 17:24
To: serviceability-dev at openjdk.java.net
Subject: jcmd, windows x64: cannot see other processes?

Hi all,

I try to attach a jcmd to a running process on windows x64, but jcmd
only lists its own process.

Both jcmd and process are built from jdk-hs.

Then I tried attaching jdk-hs tip jcmd to an older VM (jdk 9), did not
work either.

Then - and here it gets weird - I tried attaching a jdk9 jcmd to a jdk9
VM, and still nothing... am I making a thinking error here? Do I need
special options on Windows? On Unix this never gave me any trouble.

Both processes run under the same user, from two console windows. I
tried both within and without cygwin too, does not make any difference.

Thanks, Thomas

From thomas.stuefe at gmail.com Mon Apr 9 15:33:36 2018
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 9 Apr 2018 17:33:36 +0200
Subject: jcmd, windows x64: cannot see other processes?
In-Reply-To:
References:
Message-ID:

Hi Markus,

On Mon, Apr 9, 2018 at 5:30 PM, Markus Gronlund wrote:

> Hi Thomas,
>
> Are you running in two separate Terminal Server Sessions?
>

no, this is all on my local Laptop.

> You need to be in the same WindowsStation
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms687096(v=vs.85).aspx
>
> HTH
>
> Markus
>

Best Regards, Thomas

> *From:* Thomas Stüfe [mailto:thomas.stuefe at gmail.com]
> *Sent:* 9 April 2018 17:24
> *To:* serviceability-dev at openjdk.java.net
> *Subject:* jcmd, windows x64: cannot see other processes?
>
> Hi all,
>
> I try to attach a jcmd to a running process on windows x64, but jcmd only
> lists its own process.
>
> Both jcmd and process are built from jdk-hs.
>
> Then I tried attaching jdk-hs tip jcmd to an older VM (jdk 9), did not
> work either.
>
> Then - and here it gets weird - I tried attaching a jdk9 jcmd to a jdk9
> VM, and still nothing... am I making a thinking error here? Do I need
> special options on Windows? On Unix this never gave me any trouble.
>
> Both processes run under the same user, from two console windows. I tried
> both within and without cygwin too, does not make any difference.
>
> Thanks, Thomas
>

From thomas.stuefe at gmail.com Mon Apr 9 15:50:33 2018
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 9 Apr 2018 17:50:33 +0200
Subject: jcmd, windows x64: cannot see other processes?
In-Reply-To:
References:
Message-ID:

So, I found that I can attach with jcmd just fine, just the process
listing does not work.

I can only attach via pid, not via command name, which I think stems
from the same error.

Does anyone have any idea? Should I open a bug report?

..Thomas

On Mon, Apr 9, 2018 at 5:24 PM, Thomas Stüfe wrote:

> Hi all,
>
> I try to attach a jcmd to a running process on windows x64, but jcmd only
> lists its own process.
>
> Both jcmd and process are built from jdk-hs.
>
> Then I tried attaching jdk-hs tip jcmd to an older VM (jdk 9), did not
> work either.
>
> Then - and here it gets weird - I tried attaching a jdk9 jcmd to a jdk9
> VM, and still nothing... am I making a thinking error here? Do I need
> special options on Windows? On Unix this never gave me any trouble.
>
> Both processes run under the same user, from two console windows. I tried
> both within and without cygwin too, does not make any difference.
>
> Thanks, Thomas
>

From Alan.Bateman at oracle.com Mon Apr 9 15:57:27 2018
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Mon, 9 Apr 2018 16:57:27 +0100
Subject: jcmd, windows x64: cannot see other processes?
In-Reply-To:
References:
Message-ID: <2e83859a-8623-c40a-00b7-b0b460c4d798@oracle.com>

On 09/04/2018 16:50, Thomas Stüfe wrote:
> So, I found that I can attach with jcmd just fine, just the process
> listing does not work.
>
> I can only attach via pid, not via command name, which I think stems
> from the same error.
>
> Does anyone have any idea? Should I open a bug report?
>
Is this something to do with the value of java.io.tmpdir? Are the
running VMs using their own temp dir?

-Alan

From andrew_m_leonard at uk.ibm.com Mon Apr 9 16:07:27 2018
From: andrew_m_leonard at uk.ibm.com (Andrew Leonard)
Date: Mon, 9 Apr 2018 17:07:27 +0100
Subject: RFR: Fix race condition in jdwp
Message-ID:

Hi,

We discovered in our testing with OpenJ9 that a race condition can occur
in the jdwp under certain circumstances, and we were able to force the
same issue with Hotspot. Normally, the event helper thread suspends all
threads, then the debug loop in the listener thread receives a command
to resume. The debugger may deadlock if the debug loop in the listener
thread starts processing commands (e.g. resume threads) before the event
helper completes the initialization (and suspends threads). This patch
adds synchronization to ensure the event helper completes the
initialization sequence before debugger commands are processed.

Please can I find a sponsor for this contribution? Patch below.

Many thanks
Andrew

diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
--- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
+++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -58,6 +58,7 @@
 static jboolean vmInitialized;
 static jrawMonitorID initMonitor;
 static jboolean initComplete;
+static jboolean VMInitComplete;
 static jbyte currentSessionID;
 
 /*
@@ -617,6 +618,35 @@
     debugMonitorExit(initMonitor);
 }
 
+/*
+ * Signal VM initialization is complete.
+ */
+void
+signalVMInitComplete(void)
+{
+    /*
+     * VM Initialization is complete
+     */
+    LOG_MISC(("signal VM initialization complete"));
+    debugMonitorEnter(initMonitor);
+    VMInitComplete = JNI_TRUE;
+    debugMonitorNotifyAll(initMonitor);
+    debugMonitorExit(initMonitor);
+}
+
+/*
+ * Wait for VM initialization to complete.
+ */
+void
+debugInit_waitVMInitComplete(void)
+{
+    debugMonitorEnter(initMonitor);
+    while (!VMInitComplete) {
+        debugMonitorWait(initMonitor);
+    }
+    debugMonitorExit(initMonitor);
+}
+
 /* All process exit() calls come from here */
 void
 forceExit(int exit_code)
@@ -672,6 +702,7 @@
     LOG_MISC(("Begin initialize()"));
     currentSessionID = 0;
     initComplete = JNI_FALSE;
+    VMInitComplete = JNI_FALSE;
 
     if ( gdata->vmDead ) {
         EXIT_ERROR(AGENT_ERROR_INTERNAL,"VM dead at initialize() time");
diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
--- a/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
+++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugInit.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 1998, 2015, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -39,4 +39,7 @@
 void debugInit_exit(jvmtiError, const char *);
 void forceExit(int);
 
+void debugInit_waitVMInitComplete(void);
+void signalVMInitComplete(void);
+
 #endif
diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
--- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
+++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -98,6 +98,7 @@
     standardHandlers_onConnect();
     threadControl_onConnect();
+    debugInit_waitVMInitComplete();
 
     /* Okay, start reading cmds! */
     while (shouldListen) {
         if (!dequeue(&p)) {
diff --git a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
--- a/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
+++ b/src/jdk.jdwp.agent/share/native/libjdwp/eventHelper.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 1998, 2017, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -580,6 +580,7 @@
         (void)threadControl_suspendThread(command->thread, JNI_FALSE);
     }
 
+    signalVMInitComplete();
     outStream_initCommand(&out, uniqueID(), 0x0,
                           JDWP_COMMAND_SET(Event),
                           JDWP_COMMAND(Event, Composite));

Andrew Leonard
Java Runtimes Development
IBM Hursley
IBM United Kingdom Ltd
Phone internal: 245913, external: 01962 815913
internet email: andrew_m_leonard at uk.ibm.com

From jcbeyler at google.com Mon Apr 9 16:27:11 2018
From: jcbeyler at google.com (JC Beyler)
Date: Mon, 09 Apr 2018 16:27:11 +0000
Subject: Re: inspect a thread’s stack
In-Reply-To: <5D285FC05679A441ACF34A90905BFA92241A7A43@GBEDBP01.chp.co.uk>
References: <5D285FC05679A441ACF34A90905BFA92241A78A1@GBEDBP01.chp.co.uk> <5D285FC05679A441ACF34A90905BFA92241A7A43@GBEDBP01.chp.co.uk>
Message-ID:

I think the conversation will shift a bit if you explain what you mean
with:

"// inspect the frames of that thread doing any needed business with them"

What exactly do you have in mind? Do you want to change the stack in
some way? Because, depending on what you want, Andrew's comment on:

ThreadMXBean.getThreadInfo(id).getStackTrace()

seems reasonable to me :)

Jc

On Mon, Apr 9, 2018 at 1:51 AM Pietro Paolini <
Pietro.Paolini at alfasystems.com> wrote:

>
> >Access to stacktraces with locals is demoed in this test
> >http://hg.openjdk.java.net/jdk/jdk/file/tip/test/jdk/java/lang/StackWalker/LocalsAndOperands.java
>
> Maybe I haven't read it well enough but isn't that accessible through
> https://docs.oracle.com/javase/9/docs/api/java/lang/StackWalker.html ? As
> long as you are on Java 9 that should not
> be a problem.
>
> >but the functionality does not seem to be available (yet!) via a public
> API.
>
> What do you mean ? Isn't that a public API ?
>
> Thanks,
> P.
>

From martinrb at google.com Mon Apr 9 17:54:29 2018
From: martinrb at google.com (Martin Buchholz)
Date: Mon, 9 Apr 2018 10:54:29 -0700
Subject: RFR: 8201327: Make Sensor deeply immutably thread safe
Message-ID:

Another little cleanup to make Google's race detector happy.

8201327: Make Sensor deeply immutably thread safe
http://cr.openjdk.java.net/~martin/webrevs/jdk/Sensor-init/
https://bugs.openjdk.java.net/browse/JDK-8201327