Code Review fix for 6799919 Recursive calls to report_vm_out_of_memory are handled incorrectly

Daniel D. Daugherty daniel.daugherty at oracle.com
Wed Feb 20 15:54:17 PST 2013


Thanks for closing the loop on this thread.
And thanks for the careful reviews!

Dan


On 2/20/13 4:37 PM, David Holmes wrote:
> To be clear, all good from my perspective.
>
> Interesting archeology challenge too.
>
> Thanks,
> David
>
> On 21/02/2013 8:24 AM, Daniel D. Daugherty wrote:
>> Thanks again for the review!
>>
>> Dan
>>
>>
>> On 2/20/13 3:03 PM, Karen Kinnear wrote:
>>> Thanks for the explanation. The guarantee should let us know we are
>>> in that situation.
>>>
>>> thanks,
>>> Karen
>>>
>>> On Feb 20, 2013, at 1:48 PM, Daniel D. Daugherty wrote:
>>>
>>>> Karen,
>>>>
>>>> Thanks for the review!
>>>>
>>>>
>>>> On 2/20/13 11:39 AM, Karen Kinnear wrote:
>>>>> Thank you for fixing this Ron and to both you and Dan for figuring
>>>>> out a way to simulate this problem
>>>>> to test the fix.
>>>>>
>>>>> So what does happen with the UseOSErrorReporting option?
>>>>> If we know the answer, perhaps the comment could not state that we
>>>>> have to figure this out later?
>>>>
>>>> Since I had asked Ron to put in both the comment and the guarantee,
>>>> I'll field that question. The UseOSErrorReporting option in
>>>> VMError.report_and_die() is meant to handle the case where
>>>> report_and_die() is called for every frame during some error
>>>> reporting stack walk; Coleen made that change in report_and_die()
>>>> years ago...
>>>>
>>>> Based on my analysis of how the UseOSErrorReporting option is used,
>>>> I don't expect it to come into play when report_vm_out_of_memory()
>>>> calls VMError.report_and_die(), but I'm paranoid so I asked Ron to
>>>> add a guarantee() so we got some failure indication just in case
>>>> we returned from VMError.report_and_die() at some point in the
>>>> future. I didn't want us to return to the report_vm_out_of_memory()
>>>> caller nor did I want us to simply call abort() like the old code.
>>>>
>>>> Dan
>>>>
>>>>
>>>>
>>>>> On Feb 20, 2013, at 12:46 PM, Daniel D. Daugherty wrote:
>>>>>
>>>>>> On 2/20/13 10:37 AM, Coleen Phillimore wrote:
>>>>>>> On 2/20/2013 12:05 PM, Daniel D. Daugherty wrote:
>>>>>>>> On 2/20/13 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>> On 2/20/13 9:34 AM, Coleen Phillimore wrote:
>>>>>>>>>>
>>>>>>>>>> This looks good.
>>>>>>>>>
>>>>>>>>> Thanks for the review! Don't know if Ron is having the same
>>>>>>>>> connectivity
>>>>>>>>> problems I'm having this morning, but my access into OWAN is
>>>>>>>>> going up
>>>>>>>>> and down...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> It looks like the webrev was updated to get rid of the unused
>>>>>>>>>> variable, so that is good.
>>>>>>>>>
>>>>>>>>> The webrev was not updated.
>>>>>>>
>>>>>>> Yes, I see that now.  Mikael has a much better eye than I do.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Is there a test for ErrorHandlerTest in our repository already?
>>>>>>>>>
>>>>>>>>> ErrorHandlerTest? Is there yet another possible test that we 
>>>>>>>>> don't
>>>>>>>>> know about?
>>>>>>>>
>>>>>>>> OK. I know that test by a different name:
>>>>>>>>
>>>>>>>> $ rgrep ErrorHandlerTest agent make src test
>>>>>>>> src/share/vm/runtime/globals.hpp: notproduct(uintx,
>>>>>>>> ErrorHandlerTest, 0, \
>>>>>>>> src/share/vm/prims/jni.cpp:
>>>>>>>> NOT_PRODUCT(test_error_handler(ErrorHandlerTest));
>>>>>>>> test/runtime/6888954/vmerrors.sh: -XX:ErrorHandlerTest=${i}
>>>>>>>> -version > ${i2}.out 2>&1
>>>>>>>> test/runtime/6888954/vmerrors.sh:    # If ErrorHandlerTest is
>>>>>>>> ignored (product build), stop.
>>>>>>>> test/runtime/6888954/vmerrors.sh: echo "ErrorHandlerTest=$i
>>>>>>>> failed ($f)"
>>>>>>>>
>>>>>>>> Ron had previously explored using vmerror.sh to exercise the
>>>>>>>> vm_exit_out_of_memory() code path as one of his early experiments.
>>>>>>>> He also did some testing using the ErrorHandlerTest command line
>>>>>>>> option. In neither case did it seem possible to get multi-threaded
>>>>>>>> test cases up and running. Perhaps both he and I missed something.
>>>>>>>>
>>>>>>>> It also looks like Ron didn't record the specifics of his testing
>>>>>>>> with either vmerrors.sh or the ErrorHandlerTest command line 
>>>>>>>> option
>>>>>>>> in the bug report. I'll have to rattle his cage about that.
>>>>>>>>
>>>>>>>
>>>>>>> My question was mostly if we had a jtreg test in hotspot/test
>>>>>>> repository that exercised this ErrorHandlerTest option.   The
>>>>>>> second part of the question is whether we should have a test
>>>>>>> because it'll look like it failed.   Maybe this is more of a
>>>>>>> question for Christian Tornqvist and SQE and is not tied to this
>>>>>>> specific change.
>>>>>>>
>>>>>>> I talked to Ron about trying to test this also and we couldn't
>>>>>>> come up with a good strategy either.   We need a good way to test
>>>>>>> out of C heap memory without actually allocating lots of memory
>>>>>>> and running out of C heap.  Such tests don't run well.  Maybe we
>>>>>>> can do something in the future with this ErrorHandlerTest option
>>>>>>> to have the VM return NULL or assert for various malloc calls
>>>>>>> triggered through some heuristic. Having the experiments to date
>>>>>>> recorded somewhere would be great.
>>>>>>
>>>>>> See the READ_ME file attached to the bug report for the craziness 
>>>>>> that
>>>>>> Ron and I had to go through to properly test this. Some of what 
>>>>>> we came
>>>>>> up with should be useful as a diagnostic option, but that should be
>>>>>> done
>>>>>> as a separate bug fix.
>>>>> Totally agree, this should be a separate discussion.
>>>>>
>>>>> thanks,
>>>>> Karen
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Coleen
>>>>>>>>>>
>>>>>>>>>> On 2/19/2013 6:48 PM, Daniel D. Daugherty wrote:
>>>>>>>>>>> Greetings,
>>>>>>>>>>>
>>>>>>>>>>> I'm sponsoring this code review request from Ron Durbin. This
>>>>>>>>>>> change
>>>>>>>>>>> is targeted at JDK8/HSX-25 in the RT_Baseline repo.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have a proposed fix for the following bug:
>>>>>>>>>>>
>>>>>>>>>>>     6799919 Recursive calls to report_vm_out_of_memory are
>>>>>>>>>>> handled incorrectly
>>>>>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6799919
>>>>>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-6799919
>>>>>>>>>>>
>>>>>>>>>>> This is one of those bug fixes where the commit message nicely
>>>>>>>>>>> describes
>>>>>>>>>>> the change:
>>>>>>>>>>>
>>>>>>>>>>> 6799919: Recursive calls to report_vm_out_of_memory are
>>>>>>>>>>> handled incorrectly
>>>>>>>>>>> Summary: report_vm_out_of_memory() should allow
>>>>>>>>>>> VMError.report_and_die() to handle multiple out of native
>>>>>>>>>>> memory errors.
>>>>>>>>>>> Reviewed-by: dcubed, <other-reviewers>
>>>>>>>>>>> Contributed-by ron.durbin at oracle.com
>>>>>>>>>>>
>>>>>>>>>>> Here is the webrev URL:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/for_rdurbin/6799919-webrev/0-hsx25 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Testing:
>>>>>>>>>>>    - See the READ_ME file attached to the JDK-6799919 for the
>>>>>>>>>>> gory details
>>>>>>>>>>>      of the testing needed to reproduce this failure and
>>>>>>>>>>> verify the fix
>>>>>>>>>>>    - regular JPRT test job is in process
>>>>>>>>>>>
>>>>>>>>>>> Comments, questions and suggestions are welcome.
>>>>>>>>>>>
>>>>>>>>>>> Ron
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>



More information about the serviceability-dev mailing list