RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v8]
David Holmes
dholmes at openjdk.java.net
Tue Oct 6 02:59:45 UTC 2020
On Thu, 1 Oct 2020 10:12:54 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:
>>> _Mailing list message from [Kim Barrett](mailto:kim.barrett at oracle.com) on
>>> [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_
>>> I've only looked at scattered pieces, but what I've looked at seemed to be
>>> in good shape. Only a few minor comments.
>>>
>>> ------------------------------------------------------------------------------
>>> src/hotspot/share/runtime/frame.cpp
>>> 456 // for(StackFrameStream fst(thread); !fst.is_done(); fst.next()) {
>>>
>>> Needs to be updated for the new constructor arguments. Just in general, the
>>> class documentation seems to need some updating for this change.
>>
>> Fixed.
>>
>>> ------------------------------------------------------------------------------
>>> src/hotspot/share/runtime/frame.cpp
>>> 466 StackFrameStream(JavaThread *thread, bool update, bool process_frames);
>>>
>>> Something to consider is that bool parameters like that, especially when
>>> there are multiple, are error prone. An alternative is distinct enums, which
>>> likely also obviates the need for comments in calls.
>>
>> Coleen also had the same comment, and we agreed to file a follow-up RFE to clean that up. This also applies to all the
>> existing parameters passed in. So I would still like to do that, but in a follow-up RFE. Said RFE will also make the
>> parameter use in RegisterMap and vframeStream explicit.
>>> ------------------------------------------------------------------------------
>>> src/hotspot/share/runtime/thread.hpp
>>> 956 void set_processed_thread(Thread *thread) { _processed_thread = thread; }
>>>
>>> I think this should assert that either _processed_thread or thread are NULL.
>>> Or maybe the RememberProcessedThread constructor should be asserting that
>>> _cur_thr->processed_thread() is NULL.
>>
>> Fixed.
>>
>>> ------------------------------------------------------------------------------
>>
>> Thanks for the review Kim!
>
> In my last PR update, I included a fix to an exception handling problem that I encountered after lots of stress testing
> that I have been running for a while now. I managed to catch the issue, get a reliable reproducer, and fix it.
> The root problem is that the hook I had placed in SharedRuntime::exception_handler_for_return_address has been ignored.
> The reason is that the stack is not walkable at this point. The hook then just ignores it. This had some unexpected
> consequences. After looking closer at this code, I found that if we did have a walkable stack when we call
> SharedRuntime::raw_exception_handler_for_return_address, that would have been the only hook we need at all for
> exception handling. It is always the common root point where we unwind into a caller frame due to an exception throwing
> into the caller, and we need to look up the rethrow handler of the caller. However, we are indeed not walkable here. To
> deal with this, I have rearranged the exceptino hooks a bit. First of all, I have deleted all before_unwind hooks for
> exception handling, because they should not be needed if the after_unwind hook is reliably called on the caller side
> instead. And those hooks do indeed need to be there, because we do not always have a point where we can call
> before_unwind (e.g. C1 unwind exception code, that just unwinds and looks up the rethrow handler via
> SharedRuntime::exception_handler_for_return_address). I have then traced all paths from
> SharedRuntime::raw_exception_handler_for_return_address into runtime rethrow handlers called, for each rethrow
> exception handler PC exposed in the function. They are:
> * OptoRuntime::rethrow_C when unwinding into C2 code
> * exception_handler_for_pc_helper via Runtime1::handle_exception_from_callee_id when unwinding into C1 code
> * JavaCallWrapper::~JavaCallWrapper when unwinding into a Java call stub
> * InterpreterRuntime::exception_handler_for_exception when unwinding into an interpreted method
> * Deoptimization::fetch_unroll_info (with exec_mode == Unpack_exception) when unwinding into a deoptimized nmethod
>
> Each rethrow handler returned has a corresponding comment saying which rethrow runtime rethrow handler it will end up
> in, once the stack has been walkable and we have transferred control into the caller. And all of those runtime hooks
> now have an after_unwind() hook. The good news is that now the responsibility for who calls the unwind hook for
> exception is clearer: it is never done by the callee, and always done by the caller, in its rethrow handler, at which
> point the stack is walkable. In order to avoid further issues where an unwind hook is ignored, I have changed them to
> assert that there is a last_Java_frame present. Previously I did not assert that, because there was shared code between
> runtime native transitions and the native wrapper, that called an unwind hook. This prevented the unwind hooks to
> assert this, as compiler threads would still perform native transitions, that just ignored the request. I moved the
> problematic hook for native up one level in the hierarchy to a path where it is only called by native wrappers (where
> we always have a last_Java_frame), so that I can finally assert that the unwind hooks always are called at points where
> we have a last_Java_frame. This makes me feel confident that I do not have another hook that is being accidentally
> ignored. However, the relationship for the various exception handling code executed in the caller, the callee, and
> between the two (before we are walkable) is rather complicated. So it would be good to have someone that knows the
> exception code very well have a look at this, to make sure I have not missed anything. I have rerun all testing and
> done a load of stress testing to sanity check this. The reproducer I eventually found that reproduced the issue with
> 100% success rate, was run many times with the new patch, and no longer reproduces any issue.
Hi Erik,
Can you give an overview of the use of the "poll word" and its relation to the "poll page" please?
Thanks,
David
-------------
PR: https://git.openjdk.java.net/jdk/pull/296
More information about the serviceability-dev
mailing list