RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v3]

Axel Boldt-Christmas aboldtch at openjdk.org
Tue Dec 20 14:59:57 UTC 2022


On Mon, 21 Nov 2022 06:01:46 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration.
>> 
>> Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed.
>> 
>> After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values.
>> 
>> Enables the following
>> ```C++
>> REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized())
>>   os::print_register_info_header(st, _context);
>> 
>>   REENTRANT_LOOP_START(os::print_nth_register_info_max_index())
>>     // decode register contents if possible
>>     ResourceMark rm(_thread);
>>     os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context);
>>   REENTRANT_LOOP_END
>> 
>>   st->cr();
>> 
>> 
>> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local)
>
> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix whitespace

Sorry for the late reply. Felt no need to push anything related to this into 20.

> > Added some limitations on reentry of a reentrant step. It will now break the inner loop if:
> > ```
> > * It is the fourth time reentering this step
> > 
> > * It is the eight time reentering any reentrant step
> > 
> > * The stack headroom is less than 64K
> > 
> > * A timeout has been issued
> > ```
> > 
> > 
> >     
> >       
> >     
> > 
> >       
> >     
> > 
> >     
> >   
> > The post loop logic of a reentrant step is given another timeout window. Currently all it does is make sure there are line breaks after the step output, but I imagine this can be useful incase some reentrant step logic is used where the loop builds up some data structure and the post logic prints it.
> > All of the limit constants are just picked rather ad-hoc. Would be nice to have some extra feedback on this.
> 
> I think the approach is better, but I'm not a big fan of the broadened os interface and the fact that error-handling-specifics now leak into the os interface.
> 

I rewrote the code so the os interface is simply:
```c++
  static void print_register_info(outputStream* st, int n, const void* context);
  static int  printable_register_count();

It simply allows querying for how many printable registers there are and asking it to print a register's content on an outputStream given a context. I do not believe that any error handling specifics leaks into the interface, if anything it is more agnostic interface now. The previous `print_register_info` was written to output data which fits in with hs_err printing. This approach just prints one register's info followed by a newline.

> How about letting the print function fill in an opaque continuation object instead? Something like:
> 
> ```
> // Print register information; optionally re-startable. If (*continuation_info) is null,
> // register printing starts with the first register, otherwise beyond whatever point
> // it had been interrupted before.
> void os::print_register_info(const ucontext_t* context, outputStream* st, void** continuation_info);
> ```
> 
> `continuation_info` can have any shape or form the platform-specific implementation wants. It does not have to be visible or known on the outside. It can exist as a struct only in the os_xxx.cpp file. Or, the platform could just hide an integer in the pointer and use it as restart point.
> 
> The standard `os::print_register_info(const ucontext_t*, outputStream*)` can then be implemented with a dummy continuation info that does nothing:
> 
> ```
> os::print_register_info(const ucontext_t* c, outputStream* st) {
>   void* info;
>   os_print_register_info(c, st, &info);
> }
> ```
> 
> If that is too C-ish for you, there can be a C++ equivalent via runtime polymorphy. In any case, the charm would be that you can re-start register printing without having to know specific implementation details. You just hand in the continuation object. In a very primitive form, without the need for further STEP macros, that could even look like this:
> 
> ```
> void* print_reg_continuation_info = nullptr;
> ...
> STEP(print_register_info_attempt_1)
> { os::print_register_info(context, st, &print_reg_continuation_info); }
> STEP(print_register_info_attempt_2)
> { os::print_register_info(context, st, &print_reg_continuation_info); }
> STEP(print_register_info_attempt_3)
> { os::print_register_info(context, st, &print_reg_continuation_info); }
> ```
> 
> (for this simple approach to work, os::print_register_info() would have recognize a completed printing by the content of continuation info).
> 
> In theory, we could use a similar pattern for call stack printing too.
> 

The problem I see with this approach is that if the storage of the continuation must either be dynamic and managed or static and global. Both I do not think fit with error reporting. Dynamic has too many side effects and will be unstable. If the storage lives statically inside the os object file then we have a global state and the `os::print_register_info` will be locked to the error printing or some mechanism has to be added to manage the global state of the program if the `os::os::print_register_info` is to be used from any other location. It is possible to create some construct which can at least hold any platform specific construct and have the storage be inside `vmError`, but this solution is often ugly and hard to make work correctly with the C++ object model. 

Maybe there are different opinions here, but I think such an approach should be avoided as it is harder to reason about. 
I think that having a reentrant step with previously mentioned semantics is easier to reason about and read than having repeated code with some externally managed state. 

> As a side note, I have not been idle since print_register_info() not working bugs me too. I found that many reasons for it not working are caused by only a few bugs. oopDesc::print_on() could do a lot more error checking before printing Klass* for instance. And we also could do more error checking when printing object array elements. I have a patch open, but won't make it before JDK 20 freeze, and probably not before my holidays either. If someone else wants to do it, that would be fine with me too.
> 
> Cheers, Thomas

Sounds good. I will continue to monitor the progress on vmError. Will mostly be focusing on ZGC related issues for the coming months. But will be happy to review any PRs that improve the he_err printing.

-------------

PR: https://git.openjdk.org/jdk/pull/11017


More information about the hotspot-dev mailing list