RFR(XS): 8205195 NestedThreadsListHandleInErrorHandlingTest fails because hs_err doesn't contain _nested_thread_list_max

Daniel D. Daugherty daniel.daugherty at oracle.com
Thu Jun 21 12:57:54 UTC 2018


Thomas,

Thanks for the quick review.


On 6/21/18 2:53 AM, Thomas Stüfe wrote:
> Hi Daniel,
>
> yes, that is annoying.
>
> I am okay with your fix, if you want to push it in this form.

Thanks.


> But preparing the test crash in this way feels weird since the whole
> point of this exercise is to test error handling in close-to-real
> scenarios... but it sure does not hurt in this case.

Yup. This is definitely weird, but my goal is to reduce testing noise.
I do acknowledge that real-world use of error reporting may run
into this failure mode, which will suppress a section of the hs_err_pid
output.


> Also, note that at different places we decide differently, see e.g.
> the "printing heap information" STEP - we omit locking Heap_lock in
> VMError::report() and only lock it in VMError::print_vm_info() (where
> we have no secondary signal handling and must not crash). So, in that
> case we are okay with risking a secondary crash in error handling.
> Probably there are just no regression tests for the heap information
> printout whose intermittent failures could annoy us :)

Yup. I recognized that when I wrote the Thread-SMR tests I was making
them picky enough to possibly run into failure modes we never would
have detected before.


> My feeling is that I would like to see a solution on the test
> framework side. Maybe, if a test is marked as "may fail rarely and
> that's okay", the test framework could retry the test and only fail if
> the error happens again.

We currently don't have a way of tagging a test like that and I'm
not convinced that I would really want us to do that. However, this
particular bug truly falls into a no win scenario and that's a
different situation than I've encountered before.
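
For concreteness, the retry idea could be sketched roughly as below. This
is only an illustration under assumptions: RetryOnce and passesWithRetry
are hypothetical names, and nothing like this exists in our test
framework today.

```java
import java.util.function.BooleanSupplier;

public class RetryOnce {
    // Hypothetical helper: runs the test body up to maxAttempts times and
    // reports failure only if every attempt fails.
    static boolean passesWithRetry(BooleanSupplier testBody, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (testBody.getAsBoolean()) {
                return true;   // one clean run is enough to pass
            }
        }
        return false;          // the failure reproduced on every attempt
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated rarely-failing test: fails on the first run only.
        boolean passed = passesWithRetry(() -> ++calls[0] >= 2, 2);
        System.out.println("passed=" + passed + " runs=" + calls[0]);
    }
}
```

The obvious downside is that a retry like this hides a first failure
that might be a real bug, which is part of why I'm not convinced.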

Again, thanks for the review.

Dan


>
> Thanks, Thomas
>
>
>
> On Thu, Jun 21, 2018 at 2:18 AM, Daniel D. Daugherty
> <daniel.daugherty at oracle.com> wrote:
>> Greetings,
>>
>> I have a fix for a recent (very rare) Thread-SMR related test failure.
>>
>> Since the fix is related to the ErrorHandling tests and affects hs_err_pid
>> file generation, this code review is being sent to both the Runtime and
>> the Serviceability teams. Please make sure you reply-all to any responses
>> so we have complete review threads on both aliases.
>>
>> Bug URL: https://bugs.openjdk.java.net/browse/JDK-8205195
>>
>> Webrev URL: http://cr.openjdk.java.net/~dcubed/8205195-webrev/0-for-jdk-jdk/
>>
>> The bug itself contains analysis of the root cause, and the comment
>> updates to the code describe the no-win scenario that the
>> hs_err_pid file generation code is in. Of course, I also have a comment
>> where I was able to harden the ErrorHandling tests. I did manage to
>> resist the urge to mention the "Kobayashi Maru" [1] in the new comments.
>>
>> Testing: Mach5 builds-tier1,jdk-tier1,jdk-tier2,hs-tier1,hs-tier2,hs-tier3
>>           on the usual Oracle platforms.
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>>
>> Dan
>>
>> [1] https://www.urbandictionary.com/define.php?term=Kobayashi%20Maru
>>


