Code Review fix for 6799919 Recursive calls to report_vm_out_of_memory are handled incorrectly

David Holmes david.holmes at oracle.com
Wed Feb 20 15:37:51 PST 2013


To be clear, all good from my perspective.

Interesting archeology challenge too.

Thanks,
David

On 21/02/2013 8:24 AM, Daniel D. Daugherty wrote:
> Thanks again for the review!
>
> Dan
>
>
> On 2/20/13 3:03 PM, Karen Kinnear wrote:
>> Thanks for the explanation. The guarantee should let us know we are
>> in that situation.
>>
>> thanks,
>> Karen
>>
>> On Feb 20, 2013, at 1:48 PM, Daniel D. Daugherty wrote:
>>
>>> Karen,
>>>
>>> Thanks for the review!
>>>
>>>
>>> On 2/20/13 11:39 AM, Karen Kinnear wrote:
>>>> Thank you for fixing this Ron and to both you and Dan for figuring
>>>> out a way to simulate this problem
>>>> to test the fix.
>>>>
>>>> So what does happen with the UseOSErrorReporting option?
>>>> If we know the answer, perhaps the comment could not state that we
>>>> have to figure this out later?
>>>
>>> Since I had asked Ron to put in both the comment and the guarantee,
>>> I'll field that question. The UseOSErrorReporting option in
>>> VMError.report_and_die() is meant to handle the case where
>>> report_and_die() is called for every frame during some error
>>> reporting stack walk; Coleen made that change in report_and_die()
>>> years ago...
>>>
>>> Based on my analysis of how the UseOSErrorReporting option is used,
>>> I don't expect it to come into play when report_vm_out_of_memory()
>>> calls VMError.report_and_die(), but I'm paranoid so I asked Ron to
>>> add a guarantee() so we got some failure indication just in case
>>> we returned from VMError.report_and_die() at some point in the
>>> future. I didn't want us to return to the report_vm_out_of_memory()
>>> caller nor did I want us to simply call abort() like the old code.
>>>
>>> Dan
>>>
>>>
>>>
>>>> On Feb 20, 2013, at 12:46 PM, Daniel D. Daugherty wrote:
>>>>
>>>>> On 2/20/13 10:37 AM, Coleen Phillimore wrote:
>>>>>> On 2/20/2013 12:05 PM, Daniel D. Daugherty wrote:
>>>>>>> On 2/20/13 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>> On 2/20/13 9:34 AM, Coleen Phillimore wrote:
>>>>>>>>>
>>>>>>>>> This looks good.
>>>>>>>>
>>>>>>>> Thanks for the review! Don't know if Ron is having the same
>>>>>>>> connectivity
>>>>>>>> problems I'm having this morning, but my access into OWAN is
>>>>>>>> going up
>>>>>>>> and down...
>>>>>>>>
>>>>>>>>
>>>>>>>>> It looks like the webrev was updated to get rid of the unused
>>>>>>>>> variable, so that is good.
>>>>>>>>
>>>>>>>> The webrev was not updated.
>>>>>>
>>>>>> Yes, I see that now.  Mikael has a much better eye than I do.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Is there a test for ErrorHandlerTest in our repository already?
>>>>>>>>
>>>>>>>> ErrorHandlerTest? Is there yet another possible test that we don't
>>>>>>>> know about?
>>>>>>>
>>>>>>> OK. I know that test by a different name:
>>>>>>>
>>>>>>> $ rgrep ErrorHandlerTest agent make src test
>>>>>>> src/share/vm/runtime/globals.hpp: notproduct(uintx,
>>>>>>> ErrorHandlerTest, 0, \
>>>>>>> src/share/vm/prims/jni.cpp:
>>>>>>> NOT_PRODUCT(test_error_handler(ErrorHandlerTest));
>>>>>>> test/runtime/6888954/vmerrors.sh: -XX:ErrorHandlerTest=${i}
>>>>>>> -version > ${i2}.out 2>&1
>>>>>>> test/runtime/6888954/vmerrors.sh:    # If ErrorHandlerTest is
>>>>>>> ignored (product build), stop.
>>>>>>> test/runtime/6888954/vmerrors.sh: echo "ErrorHandlerTest=$i
>>>>>>> failed ($f)"
>>>>>>>
>>>>>>> Ron had previously explored using vmerror.sh to exercise the
>>>>>>> vm_exit_out_of_memory() code path as one of his early experiments.
>>>>>>> He also did some testing using the ErrorHandlerTest command line
>>>>>>> option. In neither case did it seem possible to get multi-threaded
>>>>>>> test cases up and running. Perhaps both he and I missed something.
>>>>>>>
>>>>>>> It also looks like Ron didn't record the specifics of his testing
>>>>>>> with either vmerrors.sh or the ErrorHandlerTest command line option
>>>>>>> in the bug report. I'll have to rattle his cage about that.
>>>>>>>
>>>>>>
>>>>>> My question was mostly if we had a jtreg test in hotspot/test
>>>>>> repository that exercised this ErrorHandlerTest option.   The
>>>>>> second part of the question is whether we should have a test
>>>>>> because it'll look like it failed.   Maybe this is more of a
>>>>>> question for Christian Tornqvist and SQE and is not tied to this
>>>>>> specific change.
>>>>>>
>>>>>> I talked to Ron about trying to test this also and we couldn't
>>>>>> come up with a good strategy either.   We need a good way to test
>>>>>> out of C heap memory without actually allocating lots of memory
>>>>>> and running out of C heap.  Such tests don't run well.  Maybe we
>>>>>> can do something in the future with this ErrorHandlerTest option
>>>>>> to have the VM return NULL or assert for various malloc calls
>>>>>> triggered through some heuristic. Having the experiments to date
>>>>>> recorded somewhere would be great.
>>>>>
>>>>> See the READ_ME file attached to the bug report for the craziness that
>>>>> Ron and I had to go through to properly test this. Some of what we came
>>>>> up with should be useful as a diagnostic option, but that should be
>>>>> done
>>>>> as a separate bug fix.
>>>> Totally agree, this should be a separate discussion.
>>>>
>>>> thanks,
>>>> Karen
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>>>
>>>>>>>>> On 2/19/2013 6:48 PM, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> I'm sponsoring this code review request from Ron Durbin. This
>>>>>>>>>> change
>>>>>>>>>> is targeted at JDK8/HSX-25 in the RT_Baseline repo.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have a proposed fix for the following bug:
>>>>>>>>>>
>>>>>>>>>>     6799919 Recursive calls to report_vm_out_of_memory are
>>>>>>>>>> handled incorrectly
>>>>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6799919
>>>>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-6799919
>>>>>>>>>>
>>>>>>>>>> This is one of those bug fixes where the commit message nicely
>>>>>>>>>> describes
>>>>>>>>>> the change:
>>>>>>>>>>
>>>>>>>>>> 6799919: Recursive calls to report_vm_out_of_memory are
>>>>>>>>>> handled incorrectly
>>>>>>>>>> Summary: report_vm_out_of_memory() should allow
>>>>>>>>>> VMError.report_and_die() to handle multiple out of native
>>>>>>>>>> memory errors.
>>>>>>>>>> Reviewed-by: dcubed, <other-reviewers>
>>>>>>>>>> Contributed-by ron.durbin at oracle.com
>>>>>>>>>>
>>>>>>>>>> Here is the webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/for_rdurbin/6799919-webrev/0-hsx25
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Testing:
>>>>>>>>>>    - See the READ_ME file attached to the JDK-6799919 for the
>>>>>>>>>> gory details
>>>>>>>>>>      of the testing needed to reproduce this failure and
>>>>>>>>>> verify the fix
>>>>>>>>>>    - regular JPRT test job is in process
>>>>>>>>>>
>>>>>>>>>> Comments, questions and suggestions are welcome.
>>>>>>>>>>
>>>>>>>>>> Ron
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


More information about the hotspot-runtime-dev mailing list