RFR(S): 8200720: Print additional information in thread dump (times, allocated bytes etc.)
David Holmes
david.holmes at oracle.com
Wed Jun 6 02:55:53 UTC 2018
Hi Goetz,
On 5/06/2018 11:07 PM, Lindenmaier, Goetz wrote:
> Hi
>
> this discussion touched a lot of points so far, which seem
> to lead to different conclusions.
>
> I think we should look at the values printed:
>
> 1. cpu=6300.65ms elapsed=123.28s
> Overhead
> cpu time:
> * system calls at thread dump time
> elapsed time:
> * 1 system call at thread creation time
> * 1 64-bit field per thread for the thread start time
> * 1 system call at thread dump time
>
> As I understand, JDK-8143176 would have had to get the
> time at each locking, which is much more time critical
> than doing this during thread creation. While
Correct.
> the time a lock was held would be much more useful,
> the ratio here, combined with knowledge about the application,
> could lead to the conclusion that the thread is wrongly
> blocked at much lower cost.
Agreed. I see no issue with unconditionally adding this information as
it should address some of the concerns of 8143176 as well.
>
> 2. pthread-id=0x109708000
> Overhead:
> * a field access at thread dump time. Negligible I would say.
I'd overlooked this addition, but it is of course the chunk of ifdef
code in osThread.cpp that I objected to. The problem I have here is the
convoluted mess of "thread identifiers" that we already have. We
currently print:
st->print("tid=" INTPTR_FORMAT " ", p2i(this));
which is just the address of the Thread/JavaThread object. Then:
st->print("nid=0x%x ", thread_id());
where 'n' is supposed to represent "native thread id", which is:
- linux: kernel thread id from syscall gettid
- solaris: thread library identity from thr_create() or thr_self()
- windows: thread id from _beginthreadex
- OS X: kernel thread id from pthread_mach_thread_np(pthread_self());
- Other BSD: kernel (?) thread id from syscall thr_self or getthrid
- AIX: thread library identity from pthread_create() or pthread_self()
It's telling that the code Gunter added gets the thread library id on
linux, but a "kernel thread id" on other platforms! Do we want to see a
thread library id and a kernel thread id? IIRC nid is supposed be the id
you need to see the thread in a debugger. In which case I'm unclear what
role the thread id Gunter wants to add is playing? I don't think we need
to see kernel ids in general, and pthread-id isn't useful for debugging
is it?
>
> 3. allocated=242236760B defined_classes=1725
> Overhead:
> * 1 64-bit field per thread.
> * counter increment on class creation
>
> Especially defined_classes seems to be controversial.
> As this only makes sense for Java threads, this could
> be printed in a line of it's own in Threads::print_on_error()
> (which already gets a flag to customize for jstack:
> print_concurrent_locks) by calling a dedicated function in Thread.
> No need for flag PrintExtendedThreadInfo thus.
>
> But I would propose to drop this information as it is
> too controversial.
Agreed.
> This leaves the times and the pthread-id to be decided whether
> they are worth the cost. If so, I think they should be printed
> unconditional.
Agreed. With some refinement of the "thread id" issue.
> If a flag is required to avoid bloating the info in the default case,
> I would include more infos, as os_prio and nid under that flag.
Of course those things exist for good reason. The native id is supposed
to be what aids in you matching what you see in our thread dumps with
what you see in a debugger. Priority is less of an issue these days, but
there was a time ... ;-)
> As jstack is deprecated, changes to jstack could be skipped always
> disabling the new printouts.
>
> Finally, I would propose to move _allocated_bytes into
> threadStatisticInfo.hpp.
Didn't that part get dropped? (You don't need it for the time info or
thread id). That's a GC logging issue to me.
Thanks,
David
> Best regards,
> Goetz.
>
>
>
>
>> -----Original Message-----
>> From: serviceability-dev [mailto:serviceability-dev-
>> bounces at openjdk.java.net] On Behalf Of David Holmes
>> Sent: Dienstag, 5. Juni 2018 04:53
>> To: Haug, Gunter <gunter.haug at sap.com>; Chris Plummer
>> <chris.plummer at oracle.com>; serviceability-dev <serviceability-
>> dev at openjdk.java.net>; Langer, Christoph <christoph.langer at sap.com>;
>> hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(S): 8200720: Print additional information in thread dump
>> (times, allocated bytes etc.)
>>
>> Hi Gunter,
>>
>> On 5/06/2018 1:27 AM, Haug, Gunter wrote:
>>> Hi David, Chris,
>>>
>>> would it be an option to unconditionally print the additional information?
>> Regardless which way this is implemented it will be rather complicated and it
>> only switches on/off a few field in the thread dump.
>>
>> I'm not convinced this is all suitable information for a thread stack
>> dump. I can see the time information being useful if using the dump to
>> try and diagnoze a hang or responsiveness issue. But the allocated-bytes
>> and classes-defined just doesn't seem useful in the context of a thread
>> dump to me. Anyone interested in allocation stats is going to be looking
>> at GC logs etc which is where this belongs. Ditto the class-stats belong
>> in some kind of classloading logging IMHO.
>>
>> I'm very reluctant to burden the implementation with capturing these
>> kinds of stats, just to use for diagnostic purposes that may not even be
>> asked for. I'm very much in the "you shouldn't pay for what you don't
>> use" camp in that regard. (See my comments in JDK-8143176 referenced by
>> Sean.)
>>
>> The ThreadStatisticInfo adds overhead to every thread object - and I'd
>> have to suspect there could be overlap with whatever flight recorder
>> sticks in there too. (Thread has become even more bloated in recent time!).
>>
>> Cheers,
>> David
>>
>>
>>> Thanks,
>>> Gunter
>>>
>>> On 04.06.18, 01:13, "David Holmes" <david.holmes at oracle.com> wrote:
>>>
>>> Hi Gunter, Chris,
>>>
>>> Sorry just trying to catch up and this is only a partial look so far ...
>>>
>>> BTW these changes are not limited to serviceability code so adding in
>>> runtime team as well.
>>>
>>> On 2/06/2018 9:12 AM, Chris Plummer wrote:
>>> > Hi Gunter,
>>> >
>>> > On 6/1/18 3:17 AM, Haug, Gunter wrote:
>>> >> Hi Chris,
>>> >>
>>> >> thanks for looking into this!
>>> >>
>>> >> Re the synchronization: The value is stored only in a VM operation at
>>> >> a safepoint by the VM thread. I was of the opinion, that this could
>>> >> not be interrupted by a second VM operation (of the same type). Or
>> am
>>> >> I missing something else?
>>> > I was really thinking more about the temporary changing of
>>> > PrintExtendedThreadInfo, not the value stored in the VMOp. You may
>> be be
>>> > correct that no more than one VMOp is executing, but while it is
>>> > executing it is has changed the value of PrintExtendedThreadInfo,
>> which
>>> > might have an impact on anything else that triggers printing thread info
>>> > while the VMOp is executing.
>>>
>>> Even if nothing else can possibly be running during the safepoint I find
>>> it extremely bad form to change a command-line flag in this way,
>>> particularly from a safepoint!
>>>
>>> If this flag is intended to be dynamically enabled as part of a dcmd
>>> then it should also be a Manageable flag IMHO.
>>>
>>> That said ...
>>>
>>> >> I did think about passing an argument to the various print_on
>> member
>>> >> functions of the thread classes, but this would require rather
>>> >> extensive code changes for a rather tiny extension. Therefore I feel
>>> >> doing it like this is the lesser evil.
>>>
>>> ... it's obviously not truly a global setting, but one that is needed on
>>> a per-print-request basis. The flag is just the default, but if the
>>> default is off you still want to enable extended printing on a
>>> per-request basis.
>>>
>>> The current print_on mechanics is not set up for this kind of
>>> flexibility. I think this needs more thought.
>>>
>>> ---
>>>
>>> Re osThread.cpp shared code changes ... I really hate to see all the
>>> ifdefs in there. Please add a pd_print_on function to support this if
>>> you truly want platform specific information.
>>>
>>> ---
>>>
>>> threadStatisticInfo.hpp
>>>
>>> typo: getElepsedTime() -> getElapsedTime()
>>>
>>> Thanks,
>>> David
>>>
>>>
>>>
>>> >>
>>> >> Re thread_dump(): I think it's correct or, well, at least it works ;-)
>>> >> In fact jstack will transfer the "-e" and "-l" in only one string,
>>> >> i.e. op->arg(0).
>>> > So you are saying that op->arg(0) is "-e -l" when using jstack? I think
>>> > you really need to clean up the parsing. As it stands right now passing,
>>> > I get the feeling that if a user erroneously asks for help by using
>>> > "jcmd <pid> Thread.Print -help", it will end up executing with -e an -l
>>> > enabled, and no failure for the invalid "-help" option.
>>> >> Christoph has already explained my reasoning. But I agree, it's not
>>> >> nice and I would be happy to do it like Christoph suggested.
>>> > I'll respond separately to his suggestion.
>>> >
>>> > thanks,
>>> >
>>> > Chris
>>> >>
>>> >> And typo fixed, sorry.
>>> >>
>>> >> Thanks again,
>>> >> Gunter
>>> >>
>>> >> On 01.06.18, 00:03, "Chris Plummer" <chris.plummer at oracle.com>
>> wrote:
>>> >>
>>> >> Hi Gunter,
>>> >> globals.hpp: fix typo "informatiuon"
>>> >> I worry a little bit about the synchronizing (if that's the right
>>> >> word)
>>> >> of PrintExtendedThreadInfo and the dcmd's -e flag. When using -e,
>>> >> you
>>> >> are temporarily enabling PrintExtendedThreadInfo if it was false.
>>> >> This
>>> >> temporarily changes the behavior of thread dumps, and could
>>> >> impact other
>>> >> uses that happen in parallel. Also, could two simultaneous uses
>>> >> of -e
>>> >> result in PrintExtendedThreadInfo not getting restored properly?
>>> >> thread_dump() doesn't look right. It looks like you are iterating
>>> >> char
>>> >> by char over the argument, and expect something like "-el" to be
>>> >> specified rather then "-e -l". The loop should be iterating over
>>> >> op->arg(i), not op->arg(0)[i].
>>> >> The rest of the changes look fine.
>>> >> thanks,
>>> >> Chris
>>> >> On 5/30/18 8:12 AM, Haug, Gunter wrote:
>>> >> > Hi all,
>>> >> >
>>> >> > As Chris proposed, I have made an the extended output
>>> >> switchable. There is an VM flag (PrintExtendedThreadInfo), which is
>>> >> false by default. Moreover, there is an Option (-e) which can be used
>>> >> with jcmd Thread.print as well as with jstack. The -e option
>>> >> essentially sets PrintExtendedThreadInfo true just for the respective
>>> >> thread dump.
>>> >> >
>>> >> > Here is the updated webrev:
>>> >> >
>>> >> > http://cr.openjdk.java.net/~ghaug/webrevs/8200720.v2
>>> >> >
>>> >> > (https://bugs.openjdk.java.net/browse/JDK-8200720)
>>> >> >
>>> >> > Thanks,
>>> >> > Gunter
>>> >> >
>>> >> >
>>> >> > On 02.05.18, 17:07, "serviceability-dev on behalf of Haug,
>>> >> Gunter" <serviceability-dev-bounces at openjdk.java.net on behalf of
>>> >> gunter.haug at sap.com> wrote:
>>> >> >
>>> >> > Hi Chris,
>>> >> >
>>> >> > Thanks for looking into this.
>>> >> > You're right, there is a little more we have. We have
>>> >> implemented an IO tracing mechanism which - rather as a byproduct -
>>> >> keeps track of bytes read and written per thread. However, this of
>>> >> course requires changes not only in hotspot. We would be happy to
>>> >> contribute this as well, but this is a far bigger change and will
>>> >> probably lead to a far bigger discussion. Anyway, with the number of
>>> >> bytes read, the number of classes defined doesn't look that arbitrary
>>> >> anymore, as one can correlate IO to class loading.
>>> >> >
>>> >> > Regarding the verbose option I think that's a good idea!
>>> >> >
>>> >> > Thanks again,
>>> >> > Gunter
>>> >> >
>>> >> > On 01.05.18, 22:55, "Chris Plummer"
>>> >> <chris.plummer at oracle.com> wrote:
>>> >> >
>>> >> > Hi Gunter,
>>> >> >
>>> >> > The output you are adding is all useful. I think the
>>> >> question is (and
>>> >> > I'd like to see a few people chime in on this for this
>>> >> review) is
>>> >> > whether or not all of it is the appropriate for a
>>> >> thread's stack dump.
>>> >> > For example, defined_classes is on the fringe of what
>>> >> I would call
>>> >> > generally useful info in a stack dump. Sure, there
>>> >> might be that rare
>>> >> > case when you need it, but how often compared to other
>>> >> useful info
>>> >> > maintained on a per thread basis. How many other bits
>>> >> of useful info are
>>> >> > not being printed in favor of defined_classes? It
>>> >> seems you have more in
>>> >> > the queue. How many? I'm worried about how cluttered
>>> >> the stack dumps
>>> >> > will get. Maybe we should add some sort of verbose
>>> >> thread dumping
>>> >> > option. Just a thought.
>>> >> >
>>> >> > As for the implementation, overall it looks good, but
>>> >> I can't speak to
>>> >> > whether or not you are doing proper accounting of
>>> >> defined_classes and
>>> >> > bytes allocated. You'll need input from someone with
>>> >> more knowledge of
>>> >> > those areas. We'll also need to do some testing to
>>> >> make sure tool tests
>>> >> > are not impacted.
>>> >> >
>>> >> > thanks,
>>> >> >
>>> >> > Chris
>>> >> >
>>> >> > On 4/30/18 2:51 AM, Haug, Gunter wrote:
>>> >> > > Hi,
>>> >> > >
>>> >> > > this is an update to an RFR I posted on hotspot-dev,
>>> >> but it is probably more suitable to post it here. Can I please have a
>>> >> review and a sponsor for the following enhancement:
>>> >> > >
>>> >> > > http://cr.openjdk.java.net/~ghaug/webrevs/8200720.v1
>>> >> > > https://bugs.openjdk.java.net/browse/JDK-8200720
>>> >> > >
>>> >> > > We at SAP have extended the thread dumps (obtained
>>> >> by jstack or jcmd) by several fields providing thread specific
>>> >> information. These extensions are quite popular with our support
>> team.
>>> >> With some knowledge of the architecture of the application, they
>> often
>>> >> allow for quick and simple diagnosis of a running system. Therefore
>> we
>>> >> would like to contribute these enhancements.
>>> >> > >
>>> >> > > I took a few simple examples here, namely cpu time,
>>> >> elapsed time since thread creation, bytes allocated and classes
>>> >> defined by the thread and the pthread-id or equivalent on platforms
>>> >> where it makes sense. Provided it is known how the application
>> should
>>> >> behave, a misbehaving thread can identified easily.
>>> >> > >
>>> >> > > There is no measurable overhead for this
>>> >> enhancement. However, it may be a problem that the format of the
>>> >> output is changed. Tools parsing the output may have to be changed.
>>> >> > >
>>> >> > > Here is an example of the output generated:
>>> >> > >
>>> >> > > ------------------------------------------------------
>>> >> > >
>>> >> > > "main" #1 prio=5 os_prio=31 cpu=6300.65ms
>>> >> elapsed=123.28s allocated=242236760B defined_classes=1725
>>> >> tid=0x00007fa13a806000 nid=0x1c03 pthread-id=0x109708000 waiting
>> on
>>> >> condition [0x0000000109707000]
>>> >> > > java.lang.Thread.State: TIMED_WAITING (sleeping)
>>> >> > > JavaThread state: _thread_blocked
>>> >> > > Thread: 0x00007fa13a806000 [0x1c03] State:
>>> >> _at_safepoint _has_called_back 0 _at_poll_safepoint 0
>>> >> > > JavaThread state: _thread_blocked
>>> >> > > at java.lang.Thread.sleep(java.base/Native Method)
>>> >> > > ...
>>> >> > > ------------------------------------------------------
>>> >> > >
>>> >> > > As mentioned above, we have a whole bunch of other
>>> >> enhancements to the thread dump similar to this one and would be
>>> >> willing to contribute them if there is any interest.
>>> >> > >
>>> >> > > Thanks and best regards,
>>> >> > > Gunter
>>> >> > >
>>> >> > >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>>
More information about the serviceability-dev
mailing list