RFR(S): 8200720: Print additional information in thread dump (times, allocated bytes etc.)

Thomas Stüfe thomas.stuefe at gmail.com
Wed Jun 6 04:45:05 UTC 2018


On Wed, Jun 6, 2018 at 4:55 AM, David Holmes <david.holmes at oracle.com> wrote:
> Hi Goetz,
>
> On 5/06/2018 11:07 PM, Lindenmaier, Goetz wrote:
>>
>> Hi
>>
>> this discussion touched a lot of points so far, which seem
>> to lead to different conclusions.
>>
>> I think we should look at the values printed:
>>
>> 1. cpu=6300.65ms elapsed=123.28s
>> Overhead
>>   cpu time:
>>     * system calls at thread dump time
>>   elapsed time:
>>     * 1 system call at thread creation time
>>     * 1 64-bit field per thread for the thread start time
>>     * 1 system call at thread dump time
>>
>> As I understand, JDK-8143176 would have had to get the
>> time at each locking, which is much more time critical
>> than doing this during thread creation. While
>
>
> Correct.
>
>> the time a lock was held would be much more useful,
>> the ratio here, combined with knowledge about the application,
>> could lead to the conclusion that the thread is wrongly
>> blocked at much lower cost.
>
>
> Agreed. I see no issue with unconditionally adding this information as it
> should address some of the concerns of 8143176 as well.
>
>>
>> 2. pthread-id=0x109708000
>> Overhead:
>>   * a field access at thread dump time. Negligible I would say.
>
>
> I'd overlooked this addition, but it is of course the chunk of ifdef code in
> osThread.cpp that I objected to. The problem I have here is the convoluted
> mess of "thread identifiers" that we already have. We currently print:
>
>  st->print("tid=" INTPTR_FORMAT " ", p2i(this));
>
> which is just the address of the Thread/JavaThread object. Then:
>
>  st->print("nid=0x%x ", thread_id());
>
> where 'n' is supposed to represent "native thread id", which is:
> - linux: kernel thread id from syscall gettid
> - solaris: thread library identity from thr_create() or thr_self()
> - windows: thread id from _beginthreadex
> - OS X: kernel thread id from pthread_mach_thread_np(pthread_self());
> - Other BSD: kernel (?) thread id from syscall thr_self or getthrid
> - AIX: thread library identity from pthread_create() or pthread_self()
>
> It's telling that the code Gunter added gets the thread library id on linux,
> but a "kernel thread id" on other platforms! Do we want to see a thread
> library id and a kernel thread id? IIRC nid is supposed be the id you need
> to see the thread in a debugger. In which case I'm unclear what role the
> thread id Gunter wants to add is playing? I don't think we need to see
> kernel ids in general, and pthread-id isn't useful for debugging is it?
>

Yes, feels just wrong. Plus, not even sufficient, since you can have
both pthread-id and kernel thread id, e.g. on AIX, and depending on
your situation you may need both. But then, you may need a lot of
things, and where do we draw the line? I admit I also feel
uncomfortable with adding all this output to thread dump, which is
noise in 99% of cases.

(BTW what about tools parsing the thread dump in its current form?
Wont we make a lot of scripts out there unhappy?)

--

Code wise, at the very least, I would abstract this kind of coding
into something like

os::print_native_thread_id(outputStream* st)
with the expectation of "only writes a few words, no newline".
Platforms like AIX may then print both.

(Note: not "get_native_thread_id()", since that would stipulates that
there is only one and that is can be expressed as a common type over
all architectures. Also, I would not want anyone to use it as a
number.)

--

As others before me (Kirk) noted, I find the information added by this
patch useful. But after looking closely at the patch, I would strongly
prefer a new separate jcmd for this. Lets call it "VM.thread_stats".

You still would have to collect and carry the statistics by thread,
but output would be separate both from the user perspective and also
in coding - you would not have to force-shove it into
Thread::print_on, for instance. Such a hypothetical command could be
in the future extended with other statistics as we see fit. I do not
see any advantage in overloading the thread dump vehicle in this way.

Just my 5 cent.

Best Regards, Thomas

>>
>> 3. allocated=242236760B defined_classes=1725
>> Overhead:
>>    * 1 64-bit field per thread.
>>    * counter increment on class creation
>>
>> Especially defined_classes seems to be controversial.
>> As this only makes sense for Java threads, this could
>> be printed in a line of it's own in Threads::print_on_error()
>> (which already gets a flag to customize for jstack:
>> print_concurrent_locks) by calling a dedicated function in Thread.
>> No need for flag PrintExtendedThreadInfo thus.
>>
>> But I would propose to drop this information as it is
>> too controversial.
>
>
> Agreed.
>
>> This leaves the times and the pthread-id to be decided whether
>> they are worth the cost. If so, I think they should be printed
>> unconditional.
>
>
> Agreed. With some refinement of the "thread id" issue.
>
>> If a flag is required to avoid bloating the info in the default case,
>> I would include more infos, as os_prio and nid under that flag.
>
>
> Of course those things exist for good reason. The native id is supposed to
> be what aids in you matching what you see in our thread dumps with what you
> see in a debugger. Priority is less of an issue these days, but there was a
> time ... ;-)
>
>> As jstack is deprecated, changes to jstack could be skipped always
>> disabling the new printouts.
>>
>> Finally, I would propose to move _allocated_bytes into
>> threadStatisticInfo.hpp.
>
>
> Didn't that part get dropped? (You don't need it for the time info or thread
> id). That's a GC logging issue to me.
>
> Thanks,
> David
>
>
>> Best regards,
>>    Goetz.
>>
>>
>>
>>
>>> -----Original Message-----
>>> From: serviceability-dev [mailto:serviceability-dev-
>>> bounces at openjdk.java.net] On Behalf Of David Holmes
>>> Sent: Dienstag, 5. Juni 2018 04:53
>>> To: Haug, Gunter <gunter.haug at sap.com>; Chris Plummer
>>> <chris.plummer at oracle.com>; serviceability-dev <serviceability-
>>> dev at openjdk.java.net>; Langer, Christoph <christoph.langer at sap.com>;
>>> hotspot-runtime-dev at openjdk.java.net
>>> Subject: Re: RFR(S): 8200720: Print additional information in thread dump
>>> (times, allocated bytes etc.)
>>>
>>> Hi Gunter,
>>>
>>> On 5/06/2018 1:27 AM, Haug, Gunter wrote:
>>>>
>>>> Hi David, Chris,
>>>>
>>>> would it be an option to unconditionally print the additional
>>>> information?
>>>
>>> Regardless which way this is implemented it will be rather complicated
>>> and it
>>> only switches on/off a few field in the thread dump.
>>>
>>> I'm not convinced this is all suitable information for a thread stack
>>> dump. I can see the time information being useful if using the dump to
>>> try and diagnoze a hang or responsiveness issue. But the allocated-bytes
>>> and classes-defined just doesn't seem useful in the context of a thread
>>> dump to me. Anyone interested in allocation stats is going to be looking
>>> at GC logs etc which is where this belongs. Ditto the class-stats belong
>>> in some kind of classloading logging IMHO.
>>>
>>> I'm very reluctant to burden the implementation with capturing these
>>> kinds of stats, just to use for diagnostic purposes that may not even be
>>> asked for. I'm very much in the "you shouldn't pay for what you don't
>>> use" camp in that regard. (See my comments in JDK-8143176 referenced by
>>> Sean.)
>>>
>>> The ThreadStatisticInfo adds overhead to every thread object - and I'd
>>> have to suspect there could be overlap with whatever flight recorder
>>> sticks in there too. (Thread has become even more bloated in recent
>>> time!).
>>>
>>> Cheers,
>>> David
>>>
>>>
>>>> Thanks,
>>>> Gunter
>>>>
>>>> On 04.06.18, 01:13, "David Holmes" <david.holmes at oracle.com> wrote:
>>>>
>>>>       Hi Gunter, Chris,
>>>>
>>>>       Sorry just trying to catch up and this is only a partial look so
>>>> far ...
>>>>
>>>>       BTW these changes are not limited to serviceability code so adding
>>>> in
>>>>       runtime team as well.
>>>>
>>>>       On 2/06/2018 9:12 AM, Chris Plummer wrote:
>>>>       > Hi Gunter,
>>>>       >
>>>>       > On 6/1/18 3:17 AM, Haug, Gunter wrote:
>>>>       >> Hi Chris,
>>>>       >>
>>>>       >> thanks for looking into this!
>>>>       >>
>>>>       >> Re the synchronization: The value is stored only in a VM
>>>> operation at
>>>>       >> a safepoint by the VM thread. I was of the opinion, that this
>>>> could
>>>>       >> not be interrupted by a second VM operation (of the same type).
>>>> Or
>>>
>>> am
>>>>
>>>>       >> I missing something else?
>>>>       > I was really thinking more about the temporary changing of
>>>>       > PrintExtendedThreadInfo, not the value stored in the VMOp. You
>>>> may
>>>
>>> be be
>>>>
>>>>       > correct that no more than one VMOp is executing, but while it is
>>>>       > executing it is has changed the value of
>>>> PrintExtendedThreadInfo,
>>>
>>> which
>>>>
>>>>       > might have an impact on anything else that triggers printing
>>>> thread info
>>>>       > while the VMOp is executing.
>>>>
>>>>       Even if nothing else can possibly be running during the safepoint
>>>> I find
>>>>       it extremely bad form to change a command-line flag in this way,
>>>>       particularly from a safepoint!
>>>>
>>>>       If this flag is intended to be dynamically enabled as part of a
>>>> dcmd
>>>>       then it should also be a Manageable flag IMHO.
>>>>
>>>>       That said ...
>>>>
>>>>       >>   I did think about passing an argument to the various print_on
>>>
>>> member
>>>>
>>>>       >> functions of the thread classes, but this would require rather
>>>>       >> extensive code changes for a rather tiny extension. Therefore I
>>>> feel
>>>>       >> doing it like this is the lesser evil.
>>>>
>>>>       ... it's obviously not truly a global setting, but one that is
>>>> needed on
>>>>       a per-print-request basis. The flag is just the default, but if
>>>> the
>>>>       default is off you still want to enable extended printing on a
>>>>       per-request basis.
>>>>
>>>>       The current print_on mechanics is not set up for this kind of
>>>>       flexibility. I think this needs more thought.
>>>>
>>>>       ---
>>>>
>>>>       Re osThread.cpp shared code changes ... I really hate to see all
>>>> the
>>>>       ifdefs in there. Please add a pd_print_on function to support this
>>>> if
>>>>       you truly want platform specific information.
>>>>
>>>>       ---
>>>>
>>>>       threadStatisticInfo.hpp
>>>>
>>>>       typo: getElepsedTime() -> getElapsedTime()
>>>>
>>>>       Thanks,
>>>>       David
>>>>
>>>>
>>>>
>>>>       >>
>>>>       >> Re thread_dump(): I think it's correct or, well, at least it
>>>> works ;-)
>>>>       >> In fact jstack will transfer the "-e" and "-l" in only one
>>>> string,
>>>>       >> i.e. op->arg(0).
>>>>       > So you are saying that op->arg(0) is "-e -l" when using jstack?
>>>> I think
>>>>       > you really need to clean up the parsing. As it stands right now
>>>> passing,
>>>>       > I get the feeling that if a user erroneously asks for help by
>>>> using
>>>>       > "jcmd <pid> Thread.Print -help", it will end up executing with
>>>> -e an -l
>>>>       > enabled, and no failure for the invalid "-help" option.
>>>>       >>   Christoph has already explained my reasoning. But I agree,
>>>> it's not
>>>>       >> nice and I would be happy to do it like Christoph suggested.
>>>>       > I'll respond separately to his suggestion.
>>>>       >
>>>>       > thanks,
>>>>       >
>>>>       > Chris
>>>>       >>
>>>>       >> And typo fixed, sorry.
>>>>       >>
>>>>       >> Thanks again,
>>>>       >> Gunter
>>>>       >>
>>>>       >> On 01.06.18, 00:03, "Chris Plummer" <chris.plummer at oracle.com>
>>>
>>> wrote:
>>>>
>>>>       >>
>>>>       >>      Hi Gunter,
>>>>       >>      globals.hpp: fix typo "informatiuon"
>>>>       >>      I worry a little bit about the synchronizing (if that's
>>>> the right
>>>>       >> word)
>>>>       >>      of PrintExtendedThreadInfo and the dcmd's -e flag. When
>>>> using -e,
>>>>       >> you
>>>>       >>      are temporarily enabling PrintExtendedThreadInfo if it was
>>>> false.
>>>>       >> This
>>>>       >>      temporarily changes the behavior of thread dumps, and
>>>> could
>>>>       >> impact other
>>>>       >>      uses that happen in parallel. Also, could two simultaneous
>>>> uses
>>>>       >> of -e
>>>>       >>      result in PrintExtendedThreadInfo not getting restored
>>>> properly?
>>>>       >>      thread_dump() doesn't look right. It looks like you are
>>>> iterating
>>>>       >> char
>>>>       >>      by char over the argument, and expect something like "-el"
>>>> to be
>>>>       >>      specified rather then "-e -l". The loop should be
>>>> iterating over
>>>>       >>      op->arg(i), not op->arg(0)[i].
>>>>       >>      The rest of the changes look fine.
>>>>       >>      thanks,
>>>>       >>      Chris
>>>>       >>      On 5/30/18 8:12 AM, Haug, Gunter wrote:
>>>>       >>      > Hi all,
>>>>       >>      >
>>>>       >>      > As Chris proposed, I have made an the extended output
>>>>       >> switchable. There is an VM flag (PrintExtendedThreadInfo),
>>>> which is
>>>>       >> false by default. Moreover, there is an Option (-e) which can
>>>> be used
>>>>       >> with jcmd Thread.print as well as with jstack. The -e option
>>>>       >> essentially sets PrintExtendedThreadInfo true just for the
>>>> respective
>>>>       >> thread dump.
>>>>       >>      >
>>>>       >>      > Here is the updated webrev:
>>>>       >>      >
>>>>       >>      > http://cr.openjdk.java.net/~ghaug/webrevs/8200720.v2
>>>>       >>      >
>>>>       >>      > (https://bugs.openjdk.java.net/browse/JDK-8200720)
>>>>       >>      >
>>>>       >>      > Thanks,
>>>>       >>      > Gunter
>>>>       >>      >
>>>>       >>      >
>>>>       >>      > On 02.05.18, 17:07, "serviceability-dev on behalf of
>>>> Haug,
>>>>       >> Gunter" <serviceability-dev-bounces at openjdk.java.net on behalf
>>>> of
>>>>       >> gunter.haug at sap.com> wrote:
>>>>       >>      >
>>>>       >>      >      Hi Chris,
>>>>       >>      >
>>>>       >>      >      Thanks for looking into this.
>>>>       >>      >      You're right, there is a little more we have. We
>>>> have
>>>>       >> implemented an IO tracing mechanism which - rather as a
>>>> byproduct -
>>>>       >> keeps track of bytes read and written per thread. However, this
>>>> of
>>>>       >> course requires changes not only in hotspot. We would be happy
>>>> to
>>>>       >> contribute this as well, but this is a far bigger change and
>>>> will
>>>>       >> probably lead to a far bigger discussion. Anyway, with the
>>>> number of
>>>>       >> bytes read, the number of classes defined doesn't look that
>>>> arbitrary
>>>>       >> anymore, as one can correlate IO to class loading.
>>>>       >>      >
>>>>       >>      >      Regarding the verbose option I think that's a good
>>>> idea!
>>>>       >>      >
>>>>       >>      >      Thanks again,
>>>>       >>      >      Gunter
>>>>       >>      >
>>>>       >>      >      On 01.05.18, 22:55, "Chris Plummer"
>>>>       >> <chris.plummer at oracle.com> wrote:
>>>>       >>      >
>>>>       >>      >          Hi Gunter,
>>>>       >>      >
>>>>       >>      >          The output you are adding is all useful. I
>>>> think the
>>>>       >> question is (and
>>>>       >>      >          I'd like to see a few people chime in on this
>>>> for this
>>>>       >> review) is
>>>>       >>      >          whether or not all of it is the appropriate for
>>>> a
>>>>       >> thread's stack dump.
>>>>       >>      >          For example, defined_classes is on the fringe
>>>> of what
>>>>       >> I would call
>>>>       >>      >          generally useful info in a stack dump. Sure,
>>>> there
>>>>       >> might be that rare
>>>>       >>      >          case when you need it, but how often compared
>>>> to other
>>>>       >> useful info
>>>>       >>      >          maintained on a per thread basis. How many
>>>> other bits
>>>>       >> of useful info are
>>>>       >>      >          not being printed in favor of defined_classes?
>>>> It
>>>>       >> seems you have more in
>>>>       >>      >          the queue. How many? I'm worried about how
>>>> cluttered
>>>>       >> the stack dumps
>>>>       >>      >          will get. Maybe we should add some sort of
>>>> verbose
>>>>       >> thread dumping
>>>>       >>      >          option. Just a thought.
>>>>       >>      >
>>>>       >>      >          As for the implementation, overall it looks
>>>> good, but
>>>>       >> I can't speak to
>>>>       >>      >          whether or not you are doing proper accounting
>>>> of
>>>>       >> defined_classes and
>>>>       >>      >          bytes allocated. You'll need input from someone
>>>> with
>>>>       >> more knowledge of
>>>>       >>      >          those areas. We'll also need to do some testing
>>>> to
>>>>       >> make sure tool tests
>>>>       >>      >          are not impacted.
>>>>       >>      >
>>>>       >>      >          thanks,
>>>>       >>      >
>>>>       >>      >          Chris
>>>>       >>      >
>>>>       >>      >          On 4/30/18 2:51 AM, Haug, Gunter wrote:
>>>>       >>      >          > Hi,
>>>>       >>      >          >
>>>>       >>      >          > this is an update to an RFR I posted on
>>>> hotspot-dev,
>>>>       >> but it is probably more suitable to post it here. Can I please
>>>> have a
>>>>       >> review and a sponsor for the following enhancement:
>>>>       >>      >          >
>>>>       >>      >          >
>>>> http://cr.openjdk.java.net/~ghaug/webrevs/8200720.v1
>>>>       >>      >          >
>>>> https://bugs.openjdk.java.net/browse/JDK-8200720
>>>>       >>      >          >
>>>>       >>      >          > We at SAP have extended the thread dumps
>>>> (obtained
>>>>       >> by jstack or jcmd) by several fields providing thread specific
>>>>       >> information. These extensions are quite popular with our
>>>> support
>>>
>>> team.
>>>>
>>>>       >> With some knowledge of the architecture of the application,
>>>> they
>>>
>>> often
>>>>
>>>>       >> allow for quick and simple diagnosis of a running system.
>>>> Therefore
>>>
>>> we
>>>>
>>>>       >> would like to contribute these enhancements.
>>>>       >>      >          >
>>>>       >>      >          > I took a few simple examples here, namely cpu
>>>> time,
>>>>       >> elapsed time since thread creation, bytes allocated and classes
>>>>       >> defined by the thread and the pthread-id or equivalent on
>>>> platforms
>>>>       >> where it makes sense. Provided it is known how the application
>>>
>>> should
>>>>
>>>>       >> behave, a misbehaving thread can identified easily.
>>>>       >>      >          >
>>>>       >>      >          > There is no measurable overhead for this
>>>>       >> enhancement. However, it may be a problem that the format of
>>>> the
>>>>       >> output is changed. Tools parsing the output may have to be
>>>> changed.
>>>>       >>      >          >
>>>>       >>      >          > Here is an example of the output generated:
>>>>       >>      >          >
>>>>       >>      >          >
>>>> ------------------------------------------------------
>>>>       >>      >          >
>>>>       >>      >          > "main" #1 prio=5 os_prio=31 cpu=6300.65ms
>>>>       >> elapsed=123.28s allocated=242236760B defined_classes=1725
>>>>       >> tid=0x00007fa13a806000 nid=0x1c03 pthread-id=0x109708000
>>>> waiting
>>>
>>> on
>>>>
>>>>       >> condition [0x0000000109707000]
>>>>       >>      >          >     java.lang.Thread.State: TIMED_WAITING
>>>> (sleeping)
>>>>       >>      >          >     JavaThread state: _thread_blocked
>>>>       >>      >          > Thread: 0x00007fa13a806000 [0x1c03] State:
>>>>       >> _at_safepoint _has_called_back 0 _at_poll_safepoint 0
>>>>       >>      >          >     JavaThread state: _thread_blocked
>>>>       >>      >          > at java.lang.Thread.sleep(java.base/Native
>>>> Method)
>>>>       >>      >          >             ...
>>>>       >>      >          >
>>>> ------------------------------------------------------
>>>>       >>      >          >
>>>>       >>      >          > As mentioned above, we have a whole bunch of
>>>> other
>>>>       >> enhancements to the thread dump similar to this one and would
>>>> be
>>>>       >> willing to contribute them if there is any interest.
>>>>       >>      >          >
>>>>       >>      >          > Thanks and best regards,
>>>>       >>      >          > Gunter
>>>>       >>      >          >
>>>>       >>      >          >
>>>>       >>      >
>>>>       >>      >
>>>>       >>      >
>>>>       >>      >
>>>>       >>      >
>>>>       >>
>>>>       >
>>>>       >
>>>>
>>>>
>


More information about the hotspot-runtime-dev mailing list