RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v6]

Emanuel Peter epeter at openjdk.org
Wed May 10 12:31:14 UTC 2023


> **Motivation**
> 
> - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy.
> - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`)
> 
> @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach.
> 
> **Changes**
> 
> - Make many containers `NONCOPYABLE`:
>   - `Dict`
>   - `VectorSet`
>   - `Node_Array`, `Node_List`, `Unique_Node_List`
>   - `Node_Stack`
>   - `NodeHash`
>   - `Type_Array`
>   - `Phase`
>   - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them.
> - Create "global" containers for `Compile`:
>   - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`)
>   - `C->type_array()` (referenced to by `PhaseValues._types`)
>   - `C->node_hash_table()` (referenced to by `PhaseValues._table`)
>   - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them.
>   - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that others still held a outdated value copy of it.
>   - `_table` was passed around from phase to phase, and stored by value in `PhaseValue._table`. It was then `clear()`'ed via the destructors. But since there was only really ever one "owner", the "live" container was only cleared once the last phase was over (it was passed via `replace-with` or the `NodeHash(NodeHash *use_this_state)` constructor). Now that we create the `_table` via `new`, and never actively deconstruct it, we need to `clear()` it explicitly after the last igvn goes out of scope just before `final_graph_reshaping`.
>   - I would have liked these containers to be allocated inside the `Compile` object directly (as values). But that would have lead to cpp-header-file cyclic dependencies between `compile.hpp` and `node.hpp`. So I had to take pointers.
> - Moved things from `PhaseTransform` to `PhaseValues`:
>   - `_types` (now only by reference) and all type related functions
>   - `ConNode caches`: related fields and functions (needed to move it because I moved `_types`)
>   - `saturate / saturate_and_maybe_push_to_igvn_worklist`
>   - They thematically make more sense there, the class is called `PhaseValues` after all and the comments suggest it was meant for this stuff. And other subclasses of `PhaseTransform` do not use these things anyway. For example `PhaseIdealLoop` always accesses them via the `igvn` that it is passed.
>   - Had to change lots of interfaces from `PhaseTransform*` to `PhaseValues*` because of value/type functionality only available in `PhaseValues`. I considered moving them directly to `PhaseGVN`. That would have worked, but maybe eventually we want to refactor the CCP/IGVN/GVN phases and it is better to have `PhaseValues` as the superclass of all of them.
> - To make sure that the phase containers were copied around, there are a few cases where we used to overwrite a phase by value. Since we should disable copy by value of phases, I had to find a solution. The solution that @jcking proposed was to destruct/re-construct. So we now use `reset_from_gvn` and `reset_from_igvn`. This does the same as copy by value, but explicitly. We may want to refactor this in the future, maybe there is a better way.
> - `PhaseGVN.replace_with` made sure that the containers were passed back from igvn to gvn. I was able to remove it since the containers are now at `Compile`, and do not have to be passed any more.
> - Refactoring around `PhaseRenumberLive`:
>   - New worklists were generated and overwrote the `for_igvn/igvn_worklist`, this looked extremely nasty and used copy by value extensively. Now it is much simpler.
>   - `Unique_Node_List.recompute_idx_set`: after re-numbering, the old worklist has the `VectorSet` invalid (the bits are set for the old idx, but should be set for the new idx now). Instead of creating a new worklist, we can just recompute the `VectorSet`.
>   - The only thing I am not 100% happy with: `gvn->types().swap(_new_type_array);`. We need to re-order the `_types`, because the types need to be at the index of the new idx. I implemented a safe `swap` method for that. Still, it means we lose some memory in the `comp_arena()`. An alternative would be to have one in a local array, and copy it back. Open to suggestions.
> - `PhaseTransform._nodes` was used by different phases in various and non-consistent ways. I moved it into the subclasses, and renamed it according to what it is actually used for:
>   - `PhaseIdealLoop._loop_ctrl`
>   - `Matcher._new_nodes`
>   - `node_map` in `haseCCP::do_transform`
>   - I removed some old dump functions, which did not explicitly state for what usecase they were: `PhaseTransform::dump_old2new_map, PhaseTransform::dump_new, PhaseTransform::dump_types, PhaseTransform::dump_nodes_and_types`. Most of the functionality can easily be done through other ways. Let me know if the removal is problematic. I would have to move them to the phase that you wish it should work for.
> - Made `_stack` local to `PhaseIterGVN::remove_globally_dead_node`.
> - At many places we check if `igvn_worklist()` is empty and clear it, just for good measure. I packaged this into `igvn_worklist().ensure_empty()`.
> 
> **Future Work**
> - `igvn.reset...` - can we remove it? The destruct/re-construct could possibly be replaced by calling the proper init-functions to reset the old igvn.
> - Refactor Phases:
>   - `transform` functions have a very confusing and inconsistent naming, implementation and usage. Many have asserts that ensure they are either never used, or only used the right way. A better design could make it more readable and would allow removal of the asserts, as they would become trivial.
>   - `Value -> GVN -> IGVN -> CCP` nesting does not really make sense. We should probably have `CCP` next to `GVN`. Maybe `IGVN` should also be next to `GVN`, or a subclass.
>   - `init_con_caches` is this really something local, or could it live "globally" at `Compile`?
>   - `Phase._pnum (PhaseNumber)`: do we really need this? Is there not a better solution?
>   - `PhaseValues._iterGVN` used as flag for `PhaseValues.is_IterGVN()`. I guess this is to avoid having a virtual function. But is that really worth it?
>   - Make it clearer which methods are `virtual` (some are tagged `virtual`, but are never overridden), and which ones `override` others.
>   - Generally, the comments/descriptions of the `Phase` classes could be better (or removed) - they seem to be out of date.
> - This change here also gets us a step closer to modular Phases, which can be easily enabled/disabled and reordered.
> 
> **Testing**
> 
> Passes up to tier5 and stress testing. Just for good measure I checked it with and without `-XX:VerifyIterativeGVN=10`.
> **TODO**: performance testing
> 
> **Discussion**
> 
> This is a bit of a tricky refactoring, it took me quite some time, and I am still not 100% sure about it. I am open to suggestions.

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  renamed _type_array to types, and _node_hash_table to _node_hash

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13833/files
  - new: https://git.openjdk.org/jdk/pull/13833/files/2a09fc85..2c8acd18

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=04-05

  Stats: 23 lines in 4 files changed: 0 ins; 0 del; 23 mod
  Patch: https://git.openjdk.org/jdk/pull/13833.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13833/head:pull/13833

PR: https://git.openjdk.org/jdk/pull/13833


More information about the hotspot-dev mailing list