RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v8]
Christian Hagedorn
chagedorn at openjdk.org
Wed May 10 15:26:40 UTC 2023
On Wed, 10 May 2023 12:55:37 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> **Motivation**
>>
>> - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy.
>> - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`)
>>
>> @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach.
>>
>> **Changes**
>>
>> - Make many containers `NONCOPYABLE`:
>> - `Dict`
>> - `VectorSet`
>> - `Node_Array`, `Node_List`, `Unique_Node_List`
>> - `Node_Stack`
>> - `NodeHash`
>> - `Type_Array`
>> - `Phase`
>> - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them.
>> - Create "global" containers for `Compile`:
>> - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`)
>> - `C->type_array()` (referenced to by `PhaseValues._types`)
>> - `C->node_hash_table()` (referenced to by `PhaseValues._table`)
>> - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them.
>> - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that others still held a outdated value copy of it.
>> - `_table` was passed around from phase to phase, and stored by value in `PhaseValue._table`. It was then `clear()`'ed via the destructors. But since there was only really ever one "owner", the "live" container was only cleared once the last phase was over (it was passed via `replace-with` or the `NodeHash(NodeHash *use_this_state)` constructor). Now that we create the `_table` via `new`, and never actively deconstruct it, we need to `clear()` it explicitly after the last igvn goes out of scope just before `final_graph_reshaping`.
>> - I would have liked these containers to be allocated inside the `Compile` object directly (as values). But that would have lead to cpp-header-file cyclic dependencies between `compile.hpp` and `node.hpp`. So I had to take pointers.
>> - Moved things from `PhaseTransform` to `PhaseValues`:
>> - `_types` (now only by reference) and all type related functions
>> - `ConNode caches`: related fields and functions (needed to move it because I moved `_types`)
>> - `saturate / saturate_and_maybe_push_to_igvn_worklist`
>> - They thematically make more sense there, the class is called `PhaseValues` after all and the comments suggest it was meant for this stuff. And other subclasses of `PhaseTransform` do not use these things anyway. For example `PhaseIdealLoop` always accesses them via the `igvn` that it is passed.
>> - Had to change lots of interfaces from `PhaseTransform*` to `PhaseValues*` because of value/type functionality only available in `PhaseValues`. I considered moving them directly to `PhaseGVN`. That would have worked, but maybe eventually we want to refactor the CCP/IGVN/GVN phases and it is better to have `PhaseValues` as the superclass of all of them.
>> - To make sure that the phase containers were copied around, there are a few cases where we used to overwrite a phase by value. Since we should disable copy by value of phases, I had to find a solution. The solution that @jcking proposed was to destruct/re-construct. So we now use `reset_from_gvn` and `reset_from_igvn`. This does the same as copy by value, but explicitly. We may want to refactor this in the future, maybe there is a better way.
>> - `PhaseGVN.replace_with` made sure that the containers were passed back from igvn to gvn. I was able to remove it since the containers are now at `Compile`, and do not have to be passed any more.
>> - Refactoring around `PhaseRenumberLive`:
>> - New worklists were generated and overwrote the `for_igvn/igvn_worklist`, this looked extremely nasty and used copy by value extensively. Now it is much simpler.
>> - `Unique_Node_List.recompute_idx_set`: after re-numbering, the old worklist has the `VectorSet` invalid (the bits are set for the old idx, but should be set for the new idx now). Instead of creating a new worklist, we can just recompute the `VectorSet`.
>> - The only thing I am not 100% happy with: `gvn->types().swap(_new_type_array);`. We need to re-order the `_types`, because the types need to be at the index of the new idx. I implemented a safe `swap` method for that. Still, it means we lose some memory in the `comp_arena()`. An alternative would be to have one in a local array, and copy it back. Open to suggestions.
>> - `PhaseTransform._nodes` was used by different phases in various and non-consistent ways. I moved it into the subclasses, and renamed it according to what it is actually used for:
>> - `PhaseIdealLoop._loop_ctrl`
>> - `Matcher._new_nodes`
>> - `node_map` in `haseCCP::do_transform`
>> - I removed some old dump functions, which did not explicitly state for what usecase they were: `PhaseTransform::dump_old2new_map, PhaseTransform::dump_new, PhaseTransform::dump_types, PhaseTransform::dump_nodes_and_types`. Most of the functionality can easily be done through other ways. Let me know if the removal is problematic. I would have to move them to the phase that you wish it should work for.
>> - Made `_stack` local to `PhaseIterGVN::remove_globally_dead_node`.
>> - At many places we check if `igvn_worklist()` is empty and clear it, just for good measure. I packaged this into `igvn_worklist().ensure_empty()`.
>>
>> **Future Work**
>> - `igvn.reset...` - can we remove it? The destruct/re-construct could possibly be replaced by calling the proper init-functions to reset the old igvn.
>> - Refactor Phases:
>> - `transform` functions have a very confusing and inconsistent naming, implementation and usage. Many have asserts that ensure they are either never used, or only used the right way. A better design could make it more readable and would allow removal of the asserts, as they would become trivial.
>> - `Value -> GVN -> IGVN -> CCP` nesting does not really make sense. We should probably have `CCP` next to `GVN`. Maybe `IGVN` should also be next to `GVN`, or a subclass.
>> - `init_con_caches` is this really something local, or could it live "globally" at `Compile`?
>> - `Phase._pnum (PhaseNumber)`: do we really need this? Is there not a better solution?
>> - `PhaseValues._iterGVN` used as flag for `PhaseValues.is_IterGVN()`. I guess this is to avoid having a virtual function. But is that really worth it?
>> - Make it clearer which methods are `virtual` (some are tagged `virtual`, but are never overridden), and which ones `override` others.
>> - Generally, the comments/descriptions of the `Phase` classes could be better (or removed) - they seem to be out of date.
>> - This change here also gets us a step closer to modular Phases, which can be easily enabled/disabled and reordered.
>>
>> I filed: [JDK-8307815](https://bugs.openjdk.org/browse/JDK-8307815) C2 Phase structure cleanup
>> (and linked this PR in it)
>>
>> **Testing**
>>
>> Passes up to tier5 and stress testing. Just for good measure I checked it with and without `-XX:VerifyIterativeGVN=10`.
>> **TODO**: performance testing
>>
>> **Discussion**
>>
>> This is a bit of a tricky refactoring, it took me quite some time, and I am still not 100% sure about it. I am open to suggestions.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> Last of 4 suggestion commits from @TobiHartmann
Nice work! That's a great cleanup and makes things much easier to follow.
I only have some comments.
> Value -> GVN -> IGVN -> CCP nesting does not really make sense. We should probably have CCP next to GVN. Maybe IGVN should also be next to GVN, or a subclass.
That was always confusing. I like the idea of having them as separate subclasses of Value.
Maybe we could also investigate in a future RFE to place `_loop_ctrl` in a separate arena as well. I've run into some problems when using a `ResourceMark` for a local `Node_List`. I've called `get_ctrl()` for some unmapped node which initiated the `_loop_ctrl` list to be grown/reallocated. When the local `ResourceMark` went out of scope, it also reverted the newly allocated space for `_loop_ctrl` which was hard to foresee.
This would also allow us to add some more `ResourceMark`s throughout the code (could also be part of this future RFE).
src/hotspot/share/opto/callnode.hpp line 1074:
> 1072:
> 1073: // locking does not modify its arguments
> 1074: virtual bool may_modify(const TypeOopPtr* t_oop, PhaseValues* phase){ return false;}
Suggestion:
virtual bool may_modify(const TypeOopPtr* t_oop, PhaseValues* phase){ return false; }
src/hotspot/share/opto/cfgnode.cpp line 1094:
> 1092: Arena *a = Thread::current()->resource_area();
> 1093: Node_Array node_map;
> 1094: Node_Stack stack(a, C->live_nodes() >> 4);
To further clean this up, you could also use this constructor which implicitly uses `Thread::current()->resource_area()`:
https://github.com/openjdk/jdk/blob/cc396895e5a1dac49f4e341ce91c04b8c092d0af/src/hotspot/share/opto/node.hpp#L1698-L1704
src/hotspot/share/opto/loopnode.cpp line 5195:
> 5193:
> 5194: } else { // Else not a nested loop
> 5195: if( !_loop_ctrl[m->_idx] ) continue; // Dead code has no loop
Suggestion:
if (!_loop_ctrl[m->_idx]) continue; // Dead code has no loop
src/hotspot/share/opto/loopnode.cpp line 5883:
> 5881: uint i = 0;
> 5882: while (true) {
> 5883: assert( _loop_ctrl[n->_idx], "no dead nodes" );
Suggestion:
assert(_loop_ctrl[n->_idx], "no dead nodes");
src/hotspot/share/opto/loopnode.cpp line 5891:
> 5889: // dead in the global sense, but still have local uses so I cannot
> 5890: // easily call 'remove_dead_node'.
> 5891: if( _loop_ctrl[use->_idx] != nullptr || use->is_top() ) { // Not dead?
Suggestion:
if (_loop_ctrl[use->_idx] != nullptr || use->is_top()) { // Not dead?
src/hotspot/share/opto/loopnode.cpp line 6046:
> 6044: #ifdef ASSERT
> 6045: for (DUIterator i1 = n->outs(); n->has_out(i1); i1++) {
> 6046: assert( _loop_ctrl[n->out(i1)->_idx] == nullptr, "all uses must also be dead");
Suggestion:
assert(_loop_ctrl[n->out(i1)->_idx] == nullptr, "all uses must also be dead");
src/hotspot/share/opto/loopnode.hpp line 835:
> 833:
> 834: // Map loop membership for CFG nodes, and ctrl for non-CFG nodes.
> 835: Node_List _loop_ctrl;
Maybe we can rename this to `_loop_or_ctrl` or something like that as I was first reading it as "loop ctrl" (ctrl inside a loop).
src/hotspot/share/opto/phaseX.cpp line 366:
> 364: //------------------------------PhaseRemoveUseless-----------------------------
> 365: // 1) Use a breadthfirst walk to collect useful nodes reachable from root.
> 366: PhaseRemoveUseless::PhaseRemoveUseless(PhaseGVN* gvn, Unique_Node_List& worklist, PhaseNumber phase_num) : Phase(phase_num) {
You could have kept `Unique_Node_List*` and pass `igvn_worklist()` directly instead of `*igvn_worklist()`. But passing by reference is perfectly fine as well.
src/hotspot/share/opto/phaseX.cpp line 404:
> 402: // (2) Type information (the field PhaseGVN::_types) maps type information to each
> 403: // node ID. The mapping is updated to use the new node IDs as well. Updated type
> 404: // information is returned in PhaseGVN::_types.
This comment should also be updated to reflect the change to use the dedicated `C->types()` type array as with comment `(1)`.
src/hotspot/share/opto/phaseX.cpp line 408:
> 406: // Other data structures used by the compiler are not updated. The hash table for value
> 407: // numbering (the field PhaseGVN::_table) is not updated because computing the hash
> 408: // values is not based on node IDs.
Should mention `PhaseValue::_table` and/or `C->node_hash_table()` instead of `PhaseGVN::_table`.
src/hotspot/share/opto/phaseX.cpp line 574:
> 572:
> 573: //------------------------------makecon----------------------------------------
> 574: ConNode* PhaseValues::makecon(const Type *t) {
Suggestion:
ConNode* PhaseValues::makecon(const Type* t) {
src/hotspot/share/opto/phaseX.hpp line 157:
> 155: // list is allocated from current resource area
> 156: public:
> 157: PhaseRemoveUseless(PhaseGVN *gvn, Unique_Node_List &worklist, PhaseNumber phase_num = Remove_Useless);
Suggestion:
PhaseRemoveUseless(PhaseGVN* gvn, Unique_Node_List& worklist, PhaseNumber phase_num = Remove_Useless);
src/hotspot/share/opto/phaseX.hpp line 417:
> 415:
> 416: public:
> 417: PhaseGVN() {}
I think this default constructor can be removed as it will be implicitly defined.
-------------
PR Review: https://git.openjdk.org/jdk/pull/13833#pullrequestreview-1420669327
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189925814
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189932663
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189964654
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189965102
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189965358
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189965741
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189963391
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189972449
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189984965
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189984879
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189989461
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1190016673
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1190021447
More information about the hotspot-dev
mailing list