Code Review fix for 6799919 Recursive calls to report_vm_out_of_memory are handled incorrectly

Daniel D. Daugherty daniel.daugherty at oracle.com
Wed Feb 20 14:24:40 PST 2013


Thanks again for the review!

Dan


On 2/20/13 3:03 PM, Karen Kinnear wrote:
> Thanks for the explanation. The guarantee should let us know we are
> in that situation.
>
> thanks,
> Karen
>
> On Feb 20, 2013, at 1:48 PM, Daniel D. Daugherty wrote:
>
>> Karen,
>>
>> Thanks for the review!
>>
>>
>> On 2/20/13 11:39 AM, Karen Kinnear wrote:
>>> Thank you for fixing this Ron and to both you and Dan for figuring 
>>> out a way to simulate this problem
>>> to test the fix.
>>>
>>> So what does happen with the UseOSErrorReporting option?
>>> If we know the answer, perhaps the comment could not state that we 
>>> have to figure this out later?
>>
>> Since I had asked Ron to put in both the comment and the guarantee,
>> I'll field that question. The UseOSErrorReporting option in
>> VMError.report_and_die() is meant to handle the case where
>> report_and_die() is called for every frame during some error
>> reporting stack walk; Coleen made that change in report_and_die()
>> years ago...
>>
>> Based on my analysis of how the UseOSErrorReporting option is used,
>> I don't expect it to come into play when report_vm_out_of_memory()
>> calls VMError.report_and_die(), but I'm paranoid so I asked Ron to
>> add a guarantee() so we got some failure indication just in case
>> we returned from VMError.report_and_die() at some point in the
>> future. I didn't want us to return to the report_vm_out_of_memory()
>> caller nor did I want us to simply call abort() like the old code.
>>
>> Dan
>>
>>
>>
>>> On Feb 20, 2013, at 12:46 PM, Daniel D. Daugherty wrote:
>>>
>>>> On 2/20/13 10:37 AM, Coleen Phillimore wrote:
>>>>> On 2/20/2013 12:05 PM, Daniel D. Daugherty wrote:
>>>>>> On 2/20/13 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>> On 2/20/13 9:34 AM, Coleen Phillimore wrote:
>>>>>>>>
>>>>>>>> This looks good.
>>>>>>>
>>>>>>> Thanks for the review! Don't know if Ron is having the same 
>>>>>>> connectivity
>>>>>>> problems I'm having this morning, but my access into OWAN is 
>>>>>>> going up
>>>>>>> and down...
>>>>>>>
>>>>>>>
>>>>>>>> It looks like the webrev was updated to get rid of the unused 
>>>>>>>> variable, so that is good.
>>>>>>>
>>>>>>> The webrev was not updated.
>>>>>
>>>>> Yes, I see that now.  Mikael has a much better eye than I do.
>>>>>>>
>>>>>>>
>>>>>>>> Is there a test for ErrorHandlerTest in our repository already?
>>>>>>>
>>>>>>> ErrorHandlerTest? Is there yet another possible test that we don't
>>>>>>> know about?
>>>>>>
>>>>>> OK. I know that test by a different name:
>>>>>>
>>>>>> $ rgrep ErrorHandlerTest agent make src test
>>>>>> src/share/vm/runtime/globals.hpp: notproduct(uintx, 
>>>>>> ErrorHandlerTest, 0, \
>>>>>> src/share/vm/prims/jni.cpp: 
>>>>>> NOT_PRODUCT(test_error_handler(ErrorHandlerTest));
>>>>>> test/runtime/6888954/vmerrors.sh: -XX:ErrorHandlerTest=${i} 
>>>>>> -version > ${i2}.out 2>&1
>>>>>> test/runtime/6888954/vmerrors.sh:    # If ErrorHandlerTest is 
>>>>>> ignored (product build), stop.
>>>>>> test/runtime/6888954/vmerrors.sh: echo "ErrorHandlerTest=$i 
>>>>>> failed ($f)"
>>>>>>
>>>>>> Ron had previously explored using vmerror.sh to exercise the
>>>>>> vm_exit_out_of_memory() code path as one of his early experiments.
>>>>>> He also did some testing using the ErrorHandlerTest command line
>>>>>> option. In neither case did it seem possible to get multi-threaded
>>>>>> test cases up and running. Perhaps both he and I missed something.
>>>>>>
>>>>>> It also looks like Ron didn't record the specifics of his testing
>>>>>> with either vmerrors.sh or the ErrorHandlerTest command line option
>>>>>> in the bug report. I'll have to rattle his cage about that.
>>>>>>
>>>>>
>>>>> My question was mostly if we had a jtreg test in hotspot/test 
>>>>> repository that exercised this ErrorHandlerTest option.   The 
>>>>> second part of the question is whether we should have a test 
>>>>> because it'll look like it failed.   Maybe this is more of a 
>>>>> question for Christian Tornqvist and SQE and is not tied to this 
>>>>> specific change.
>>>>>
>>>>> I talked to Ron about trying to test this also and we couldn't 
>>>>> come up with a good strategy either.   We need a good way to test 
>>>>> out of C heap memory without actually allocating lots of memory 
>>>>> and running out of C heap.  Such tests don't run well.  Maybe we 
>>>>> can do something in the future with this ErrorHandlerTest option 
>>>>> to have the VM return NULL or assert for various malloc calls 
>>>>> triggered through some heuristic. Having the experiments to date 
>>>>> recorded somewhere would be great.
>>>>
>>>> See the READ_ME file attached to the bug report for the craziness that
>>>> Ron and I had to go through to properly test this. Some of what we came
>>>> up with should be useful as a diagnostic option, but that should be 
>>>> done
>>>> as a separate bug fix.
>>> Totally agree, this should be a separate discussion.
>>>
>>> thanks,
>>> Karen
>>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>>> Dan
>>>>>>
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>> On 2/19/2013 6:48 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I'm sponsoring this code review request from Ron Durbin. This 
>>>>>>>>> change
>>>>>>>>> is targeted at JDK8/HSX-25 in the RT_Baseline repo.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I have a proposed fix for the following bug:
>>>>>>>>>
>>>>>>>>>     6799919 Recursive calls to report_vm_out_of_memory are 
>>>>>>>>> handled incorrectly
>>>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6799919
>>>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-6799919
>>>>>>>>>
>>>>>>>>> This is one of those bug fixes where the commit message nicely 
>>>>>>>>> describes
>>>>>>>>> the change:
>>>>>>>>>
>>>>>>>>> 6799919: Recursive calls to report_vm_out_of_memory are 
>>>>>>>>> handled incorrectly
>>>>>>>>> Summary: report_vm_out_of_memory() should allow 
>>>>>>>>> VMError.report_and_die() to handle multiple out of native 
>>>>>>>>> memory errors.
>>>>>>>>> Reviewed-by: dcubed, <other-reviewers>
>>>>>>>>> Contributed-by ron.durbin at oracle.com
>>>>>>>>>
>>>>>>>>> Here is the webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/for_rdurbin/6799919-webrev/0-hsx25 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Testing:
>>>>>>>>>    - See the READ_ME file attached to the JDK-6799919 for the 
>>>>>>>>> gory details
>>>>>>>>>      of the testing needed to reproduce this failure and 
>>>>>>>>> verify the fix
>>>>>>>>>    - regular JPRT test job is in process
>>>>>>>>>
>>>>>>>>> Comments, questions and suggestions are welcome.
>>>>>>>>>
>>>>>>>>> Ron
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20130220/211ff170/attachment-0001.html 


More information about the serviceability-dev mailing list