Code Review fix for 6799919 Recursive calls to report_vm_out_of_memory are handled incorrectly

Karen Kinnear karen.kinnear at oracle.com
Wed Feb 20 14:03:47 PST 2013


Thanks for the explanation. The guarantee should let us know we are
in that situation.

thanks,
Karen

On Feb 20, 2013, at 1:48 PM, Daniel D. Daugherty wrote:

> Karen,
> 
> Thanks for the review!
> 
> 
> On 2/20/13 11:39 AM, Karen Kinnear wrote:
>> Thank you for fixing this Ron and to both you and Dan for figuring out a way to simulate this problem
>> to test the fix.
>> 
>> So what does happen with the UseOSErrorReporting option?
>> If we know the answer, perhaps the comment could not state that we have to figure this out later?
> 
> Since I had asked Ron to put in both the comment and the guarantee,
> I'll field that question. The UseOSErrorReporting option in
> VMError.report_and_die() is meant to handle the case where
> report_and_die() is called for every frame during some error
> reporting stack walk; Coleen made that change in report_and_die()
> years ago...
> 
> Based on my analysis of how the UseOSErrorReporting option is used,
> I don't expect it to come into play when report_vm_out_of_memory()
> calls VMError.report_and_die(), but I'm paranoid so I asked Ron to
> add a guarantee() so we got some failure indication just in case
> we returned from VMError.report_and_die() at some point in the
> future. I didn't want us to return to the report_vm_out_of_memory()
> caller nor did I want us to simply call abort() like the old code.
> 
> Dan
> 
> 
> 
>>  
>> On Feb 20, 2013, at 12:46 PM, Daniel D. Daugherty wrote:
>> 
>>> On 2/20/13 10:37 AM, Coleen Phillimore wrote:
>>>> On 2/20/2013 12:05 PM, Daniel D. Daugherty wrote:
>>>>> On 2/20/13 9:57 AM, Daniel D. Daugherty wrote:
>>>>>> On 2/20/13 9:34 AM, Coleen Phillimore wrote:
>>>>>>> 
>>>>>>> This looks good.
>>>>>> 
>>>>>> Thanks for the review! Don't know if Ron is having the same connectivity
>>>>>> problems I'm having this morning, but my access into OWAN is going up
>>>>>> and down...
>>>>>> 
>>>>>> 
>>>>>>> It looks like the webrev was updated to get rid of the unused variable, so that is good.
>>>>>> 
>>>>>> The webrev was not updated.
>>>> 
>>>> Yes, I see that now.  Mikael has a much better eye than I do.
>>>>>> 
>>>>>> 
>>>>>>> Is there a test for ErrorHandlerTest in our repository already?
>>>>>> 
>>>>>> ErrorHandlerTest? Is there yet another possible test that we don't
>>>>>> know about?
>>>>> 
>>>>> OK. I know that test by a different name:
>>>>> 
>>>>> $ rgrep ErrorHandlerTest agent make src test
>>>>> src/share/vm/runtime/globals.hpp:  notproduct(uintx, ErrorHandlerTest, 0,                                    \
>>>>> src/share/vm/prims/jni.cpp:  NOT_PRODUCT(test_error_handler(ErrorHandlerTest));
>>>>> test/runtime/6888954/vmerrors.sh:        -XX:ErrorHandlerTest=${i} -version > ${i2}.out 2>&1
>>>>> test/runtime/6888954/vmerrors.sh:    # If ErrorHandlerTest is ignored (product build), stop.
>>>>> test/runtime/6888954/vmerrors.sh:            echo "ErrorHandlerTest=$i failed ($f)"
>>>>> 
>>>>> Ron had previously explored using vmerror.sh to exercise the
>>>>> vm_exit_out_of_memory() code path as one of his early experiments.
>>>>> He also did some testing using the ErrorHandlerTest command line
>>>>> option. In neither case did it seem possible to get multi-threaded
>>>>> test cases up and running. Perhaps both he and I missed something.
>>>>> 
>>>>> It also looks like Ron didn't record the specifics of his testing
>>>>> with either vmerrors.sh or the ErrorHandlerTest command line option
>>>>> in the bug report. I'll have to rattle his cage about that.
>>>>> 
>>>> 
>>>> My question was mostly if we had a jtreg test in hotspot/test repository that exercised this ErrorHandlerTest option.   The second part of the question is whether we should have a test because it'll look like it failed.   Maybe this is more of a question for Christian Tornqvist and SQE and is not tied to this specific change.
>>>> 
>>>> I talked to Ron about trying to test this also and we couldn't come up with a good strategy either.   We need a good way to test out of C heap memory without actually allocating lots of memory and running out of C heap.  Such tests don't run well.  Maybe we can do something in the future with this ErrorHandlerTest option to have the VM return NULL or assert for various malloc calls triggered through some heuristic.   Having the experiments to date recorded somewhere would be great.
>>> 
>>> See the READ_ME file attached to the bug report for the craziness that
>>> Ron and I had to go through to properly test this. Some of what we came
>>> up with should be useful as a diagnostic option, but that should be done
>>> as a separate bug fix.
>> Totally agree, this should be a separate discussion.
>> 
>> thanks,
>> Karen
>>> 
>>> Dan
>>> 
>>> 
>>>> 
>>>> Thanks,
>>>> Coleen
>>>> 
>>>>> Dan
>>>>> 
>>>>>> 
>>>>>> Dan
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>> 
>>>>>>> On 2/19/2013 6:48 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings, 
>>>>>>>> 
>>>>>>>> I'm sponsoring this code review request from Ron Durbin. This change 
>>>>>>>> is targeted at JDK8/HSX-25 in the RT_Baseline repo. 
>>>>>>>> 
>>>>>>>> Dan 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I have a proposed fix for the following bug: 
>>>>>>>> 
>>>>>>>>     6799919 Recursive calls to report_vm_out_of_memory are handled incorrectly 
>>>>>>>>     http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6799919 
>>>>>>>>     https://jbs.oracle.com/bugs/browse/JDK-6799919 
>>>>>>>> 
>>>>>>>> This is one of those bug fixes where the commit message nicely describes 
>>>>>>>> the change: 
>>>>>>>> 
>>>>>>>> 6799919: Recursive calls to report_vm_out_of_memory are handled incorrectly 
>>>>>>>> Summary: report_vm_out_of_memory() should allow VMError.report_and_die() to handle multiple out of native memory errors. 
>>>>>>>> Reviewed-by: dcubed, <other-reviewers> 
>>>>>>>> Contributed-by ron.durbin at oracle.com 
>>>>>>>> 
>>>>>>>> Here is the webrev URL: 
>>>>>>>> 
>>>>>>>> http://cr.openjdk.java.net/~dcubed/for_rdurbin/6799919-webrev/0-hsx25 
>>>>>>>> 
>>>>>>>> Testing: 
>>>>>>>>    - See the READ_ME file attached to the JDK-6799919 for the gory details 
>>>>>>>>      of the testing needed to reproduce this failure and verify the fix 
>>>>>>>>    - regular JPRT test job is in process 
>>>>>>>> 
>>>>>>>> Comments, questions and suggestions are welcome. 
>>>>>>>> 
>>>>>>>> Ron 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20130220/870c7394/attachment-0001.html 


More information about the hotspot-runtime-dev mailing list