RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v2]

Tobias Hartmann thartmann at openjdk.org
Wed May 10 11:41:22 UTC 2023


On Wed, 10 May 2023 10:31:34 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> **Motivation**
>> 
>> - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy.
>> - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`)
>> 
>> @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach.
>> 
>> **Changes**
>> 
>> - Make many containers `NONCOPYABLE`:
>>   - `Dict`
>>   - `VectorSet`
>>   - `Node_Array`, `Node_List`, `Unique_Node_List`
>>   - `Node_Stack`
>>   - `NodeHash`
>>   - `Type_Array`
>>   - `Phase`
>>   - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them.
>> - Create "global" containers for `Compile`:
>>   - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`)
>>   - `C->type_array()` (referenced to by `PhaseValues._types`)
>>   - `C->node_hash_table()` (referenced to by `PhaseValues._table`)
>>   - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them.
>>   - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that others still held a outdated value copy of it.
>>   - `_table` was passed around from phase to phase, and stored by value in `PhaseValue._table`. It was then `clear()`'ed via the destructors. But since there was only really ever one "owner", the "live" container was only cleared once the last phase was over (it was passed via `replace-with` or the `NodeHash(NodeHash *use_this_state)` constructor). Now that we create the `_table` via `new`, and never actively deconstruct it, we need to `clear()` it explicitly after the last igvn goes out of scope just before `final_graph_reshaping`.
>>   - I would have liked these containers to be allocated inside the `Compile` object directly (as values). But that would have lead to cpp-header-file cyclic dependencies between `compile.hpp` and `node.hpp`. So I had to take pointers.
>> - Moved things from `PhaseTransform` to `PhaseValues`:
>>   - `_types` (now only by reference) and all type related functions
>>   - `ConNode caches`: related fields and functions (needed to move it because I moved `_types`)
>>   - `saturate / saturate_and_maybe_push_to_igvn_worklist`
>>   - They thematically make more sense there, the class is called `PhaseValues` after all and the comments suggest it was meant for this stuff. And other subclasses of `PhaseTransform` do not use these things anyway. For example `PhaseIdealLoop` always accesses them via the `igvn` that it is passed.
>>   - Had to change lots of interfaces from `PhaseTransform*` to `PhaseValues*` because of value/type functionality only available in `PhaseValues`. I considered moving them directly to `PhaseGVN`. That would have worked, but maybe eventually we want to refactor the CCP/IGVN/GVN phases and it is better to have `PhaseValues` as the superclass of all of them.
>> - To make sure that the phase containers were copied around, there are a few cases where we used to overwrite a phase by value. Since we should disable copy by value of phases, I had to find a solution. The solution that @jcking proposed was to destruct/re-construct. So we now use `reset_from_gvn` and `reset_from_igvn`. This does the same as copy by value, but explicitly. We may want to refactor this in the future, maybe there is a better way.
>> - `PhaseGVN.replace_with` made sure that the containers were passed back from igvn to gvn. I was able to remove it since the containers are now at `Compile`, and do not have to be passed any more.
>> - Refactoring around `PhaseRenumberLive`:
>>   - New worklists were generated and overwrote the `for_igvn/igvn_worklist`, this looked extremely nasty and used copy by value extensively. Now it is much simpler.
>>   - `Unique_Node_List.recompute_idx_set`: after re-numbering, the old worklist has the `VectorSet` invalid (the bits are set for the old idx, but should be set for the new idx now). Instead of creating a new worklist, we can just recompute the `VectorSet`.
>>   - The only thing I am not 100% happy with: `gvn->types().swap(_new_type_array);`. We need to re-order the `_types`, because the types need to be at the index of the new idx. I implemented a safe `swap` method for that. Still, it means we lose some memory in the `comp_arena()`. An alternative would be to have one in a local array, and copy it back. Open to suggestions.
>> - `PhaseTransform._nodes` was used by different phases in various and non-consistent ways. I moved it into the subclasses, and renamed it according to what it is actually used for:
>>   - `PhaseIdealLoop._loop_ctrl`
>>   - `Matcher._new_nodes`
>>   - `node_map` in `haseCCP::do_transform`
>>   - I removed some old dump functions, which did not explicitly state for what usecase they were: `PhaseTransform::dump_old2new_map, PhaseTransform::dump_new, PhaseTransform::dump_types, PhaseTransform::dump_nodes_and_types`. Most of the functionality can easily be done through other ways. Let me know if the removal is problematic. I would have to move them to the phase that you wish it should work for.
>> - Made `_stack` local to `PhaseIterGVN::remove_globally_dead_node`.
>> - At many places we check if `igvn_worklist()` is empty and clear it, just for good measure. I packaged this into `igvn_worklist().ensure_empty()`.
>> 
>> **Future Work**
>> - `igvn.reset...` - can we remove it? The destruct/re-construct could possibly be replaced by calling the proper init-functions to reset the old igvn.
>> - Refactor Phases:
>>   - `transform` functions have a very confusing and inconsistent naming, implementation and usage. Many have asserts that ensure they are either never used, or only used the right way. A better design could make it more readable and would allow removal of the asserts, as they would become trivial.
>>   - `Value -> GVN -> IGVN -> CCP` nesting does not really make sense. We should probably have `CCP` next to `GVN`. Maybe `IGVN` should also be next to `GVN`, or a subclass.
>>   - `init_con_caches` is this really something local, or could it live "globally" at `Compile`?
>>   - `Phase._pnum (PhaseNumber)`: do we really need this? Is there not a better solution?
>>   - `PhaseValues._iterGVN` used as flag for `PhaseValues.is_IterGVN()`. I guess this is to avoid having a virtual function. But is that really worth it?
>>   - Make it clearer which methods are `virtual` (some are tagged `virtual`, but are never overridden), and which ones `override` others.
>>   - Generally, the comments/descriptions of the `Phase` classes could be better (or removed) - they seem to be out of date.
>> - This change here also gets us a step closer to modular Phases, which can be easily enabled/disabled and reordered.
>> 
>> **Testing**
>> 
>> Passes up to tier5 and stress testing. Just for good measure I checked it with and without `-XX:VerifyIterativeGVN=10`.
>> **TODO**: performance testing
>> 
>> **Discussion**
>> 
>> This is a bit of a tricky refactoring, it took me quite some time, and I am still not 100% sure about it. I am open to suggestions.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   update copyright years

Great work, Emanuel! I only have a few minor comments.

> Phase._pnum (PhaseNumber): do we really need this? Is there not a better solution?

Since you removed the least two usages outside of the `Phase` constructor, let's file a follow-up RFE to investigate if we can simply remove it.

src/hotspot/share/libadt/vectset.hpp line 57:

> 55:   VectorSet(Arena* arena);
> 56: 
> 57:   // Allow move constructor for && (eg. capture return of function)

It's not completely clear yet to me why this is required and how it correlates with `NONCOPYABLE` but I leave this to the experts :)

src/hotspot/share/opto/arraycopynode.cpp line 675:

> 673: }
> 674: 
> 675: bool ArrayCopyNode::may_modify_helper(const TypeOopPtr* t_oop, Node* n, PhaseValues* phase, CallNode* &call) {

Suggestion:

bool ArrayCopyNode::may_modify_helper(const TypeOopPtr* t_oop, Node* n, PhaseValues* phase, CallNode*& call) {

src/hotspot/share/opto/arraycopynode.cpp line 686:

> 684: }
> 685: 
> 686: bool ArrayCopyNode::may_modify(const TypeOopPtr* t_oop, MemBarNode* mb, PhaseValues* phase, ArrayCopyNode* &ac) {

Suggestion:

bool ArrayCopyNode::may_modify(const TypeOopPtr* t_oop, MemBarNode* mb, PhaseValues* phase, ArrayCopyNode*& ac) {

src/hotspot/share/opto/arraycopynode.hpp line 114:

> 112:   bool finish_transform(PhaseGVN *phase, bool can_reshape,
> 113:                         Node* ctl, Node *mem);
> 114:   static bool may_modify_helper(const TypeOopPtr* t_oop, Node* n, PhaseValues* phase, CallNode* &call);

Suggestion:

  static bool may_modify_helper(const TypeOopPtr* t_oop, Node* n, PhaseValues* phase, CallNode*& call);

src/hotspot/share/opto/arraycopynode.hpp line 182:

> 180:   bool has_negative_length_guard() const { return _has_negative_length_guard; }
> 181: 
> 182:   static bool may_modify(const TypeOopPtr* t_oop, MemBarNode* mb, PhaseValues* phase, ArrayCopyNode* &ac);

Suggestion:

  static bool may_modify(const TypeOopPtr* t_oop, MemBarNode* mb, PhaseValues* phase, ArrayCopyNode*& ac);

src/hotspot/share/opto/compile.cpp line 410:

> 408: 
> 409: // Disconnect all useless nodes by disconnecting those at the boundary.
> 410: void Compile::disconnect_useless_nodes(Unique_Node_List &useful, Unique_Node_List &worklist) {

Suggestion:

void Compile::disconnect_useless_nodes(Unique_Node_List& useful, Unique_Node_List& worklist) {

src/hotspot/share/opto/compile.hpp line 426:

> 424: 
> 425:   // Shared type array for GVN, IGVN and CCP. It maps node ID -> Type*.
> 426:   Type_Array*           _type_array;

Should we call this `_types` (or `_node_type`) instead?

src/hotspot/share/opto/compile.hpp line 429:

> 427: 
> 428:   // Shared node hash table for GVN, IGVN and CCP.
> 429:   NodeHash*             _node_hash_table;

Should we call this `_node_hash` instead?

src/hotspot/share/opto/compile.hpp line 957:

> 955:     return *_type_array;
> 956:   }
> 957:   NodeHash& node_hash_table() {

Can we just use a pointer return value for these?

src/hotspot/share/opto/compile.hpp line 972:

> 970:   void              identify_useful_nodes(Unique_Node_List &useful);
> 971:   void              update_dead_node_list(Unique_Node_List &useful);
> 972:   void              disconnect_useless_nodes(Unique_Node_List &useful, Unique_Node_List &worklist);

Suggestion:

  void              disconnect_useless_nodes(Unique_Node_List& useful, Unique_Node_List& worklist);

src/hotspot/share/opto/lcm.cpp line 1272:

> 1270:     // Block at same level in dom-tree is not a successor.  It needs a
> 1271:     // PhiNode, the PhiNode uses from the def and IT's uses need fixup.
> 1272:     Node_Array inputs;

Urgh, looks like we had object slicing here before your changes.

src/hotspot/share/opto/matcher.hpp line 91:

> 89:   ResourceArea _states_arena;
> 90: 
> 91:   Node_List   _new_nodes;

Please add a comment.

src/hotspot/share/opto/node.hpp line 1661:

> 1659:   }
> 1660: 
> 1661: #ifndef PRODUCT

Suggestion:

#ifdef ASSERT


We don't need this in the optimized build.

src/hotspot/share/opto/node.hpp line 1662:

> 1660: 
> 1661: #ifndef PRODUCT
> 1662:   bool is_subset_of(Unique_Node_List &other) {

Suggestion:

  bool is_subset_of(Unique_Node_List& other) {

src/hotspot/share/opto/phaseX.cpp line 358:

> 356: //------------------------------PhaseRemoveUseless-----------------------------
> 357: // 1) Use a breadthfirst walk to collect useful nodes reachable from root.
> 358: PhaseRemoveUseless::PhaseRemoveUseless(PhaseGVN* gvn, Unique_Node_List &worklist, PhaseNumber phase_num) : Phase(phase_num) {

Suggestion:

PhaseRemoveUseless::PhaseRemoveUseless(PhaseGVN* gvn, Unique_Node_List& worklist, PhaseNumber phase_num) : Phase(phase_num) {

src/hotspot/share/opto/phaseX.cpp line 402:

> 400: // values is not based on node IDs.
> 401: PhaseRenumberLive::PhaseRenumberLive(PhaseGVN* gvn,
> 402:                                      Unique_Node_List &worklist,

Suggestion:

                                     Unique_Node_List& worklist,

src/hotspot/share/opto/phaseX.cpp line 424:

> 422:   }
> 423: 
> 424:   assert(worklist.is_subset_of(_useful), "sanity");

Please add a more useful error message.

src/hotspot/share/opto/phaseX.cpp line 448:

> 446:   }
> 447: 
> 448:   // VectorSet in Unique_Node_Set must be recomputed, since ID's have changed.

Suggestion:

  // VectorSet in Unique_Node_Set must be recomputed, since IDs have changed.

src/hotspot/share/opto/phaseX.hpp line 28:

> 26: #define SHARE_OPTO_PHASEX_HPP
> 27: 
> 28: #include "utilities/globalDefinitions.hpp"

Should be added in alphabetical order.

src/hotspot/share/opto/phaseX.hpp line 466:

> 464: 
> 465:   // Reset IGVN from GVN: call deconstructor, and placement new.
> 466:   // Acheives the same as the following (but without move constructors):

Suggestion:

  // Achieves the same as the following (but without move constructors):

src/hotspot/share/opto/phaseX.hpp line 476:

> 474: 
> 475:   // Reset IGVN with another: call deconstructor, and placement new.
> 476:   // Acheives the same as the following (but without move constructors):

Suggestion:

  // Achieves the same as the following (but without move constructors):

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13833#pullrequestreview-1420337524
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189754877
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189718073
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189718238
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189718622
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189718768
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189727149
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189730630
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189735492
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189736841
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189740559
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189744321
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189750611
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189756813
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189760080
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189761087
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189761815
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189759140
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189762954
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189765854
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189775800
PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189776325


More information about the hotspot-dev mailing list