RFR: 8325095: C2: bailout message broken: ResourceArea allocated string used after free

Emanuel Peter epeter at openjdk.org
Thu Feb 15 06:54:21 UTC 2024


On Tue, 6 Feb 2024 10:18:26 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> **Problem**
>> Bailout `failure_reason` strings used to be either `nullptr` or a static string. But with [JDK-8303951](https://bugs.openjdk.org/browse/JDK-8303951) I had introduced dynamic strings (e.g. formatted using `stringStream::as_string()`). These dynamic strings were ResourceArea allocated, and have a very limited life-span, i.e. withing the most recent ResourceMark scope. The pointer is passed to `Compile::_failure_reason`, which already might leave that ResourceMark scope, and the pointer is invalid. And then the pointer is further passed to `ciEnv::_failure_reason`. And finally, it even gets passed up to `CompileBroker::invoke_compiler_on_method`. When the string is finally printed for the bailout message, we already left original `ResourceMark` scope, and also the `Compile` and `ciEnv` scopes. The pointer now points to basically random data.
>> 
>> **Solution**
>> Whenever a string is passed to an outer scope, I make a CHeap copy, and the outer scope is the owner of that copy.
>> This way every scope is the owner of its own copy, and can allocate and deallocate according to its own strategy safely.
>> 
>> I introduced a utility class `CHeapStringHolder`:
>>  - `set`: make local copy on CHeap.
>>  - `clear`: free local copy. We before `set`, and in the destructor. Thus, when the holder goes out of scope, the memory is automatically freed.
>> 
>> We have these 4 scopes:
>> - `ResourceMark`: It allocates from `ResourcArea`, and deallocation is taken care at the end of the ResourceMark scope.
>> - `Compile`: We turn the `_failure_reason` from a `char*` into a `CHeapStringHolder`. We set `_failure_reason` while the `ResourceMark` is live.
>> - `ciEnv`: We turn the `_failure_reason` from a `char*` into a `CHeapStringHolder`. We set `_failure_reason` while `Compile` is live.
>> - `CompileTask`: We used to just set `failure_reason`, which assumes that the string is static or elsewhere managed. Now I use pattern available in other places of `CompileBroker::invoke_compiler_on_method`: duplicate the string with `os::strdup` from some other scope, and set `reason_on_C_heap = true`, which means that we are the owner of that copy, and are responsible for freeing it later. Not sure if that is a nice pattern, but I don't want to refactor that code and it does what I need it to do.
>
> Hi,
> 
> I don't agree with the way this patch works and I think we're bound for trouble. Why not store an instance of this class inside of the task instead? It takes a bit more space (136 bytes) but it actually handles its own lifetime.
> 
> ```c++
> class CompilationFailure {
>   stringStream failure_reason;
>   CompilationFailure() : failure_reason() {}
>   void replace_failure(const char* reason) { 
>     failure_reason.reset();
>     failure_reason.print_raw(reason);
>   }
>   const stringStream& reason() const {
>     return failure_reason;
>   }
> }

@jdksjolen
> I don't agree with the way this patch works and I think we're bound for trouble.

Why?

Essencially, your solution would just be a better version of what we have with `failure_reason_on_C_heap`.

Your suggestion would require that there is only a single failure_reason around at any given time. But I'm not sure if there is really only a single reason around, at least they are set at different places, and can be replaced... That is why I was looking for something that can keep the lifetime of multiple strings.

I'm not saying I'm proud of the draft-solution yet :rofl: 

These strings are stored at the level of the compilation, ciEnv (outlives compilation), and CompileTask (outlives ciEnv). So I need to allocate the string from somewhere that persists when I format the string, and then pass it into `record_method_not_compilable`. Currently, there is no clear ownership of the strings, and we did not need that as long as all the strings were static - but that has changed.

Maybe I can refactor it all and make sure there is only a single reason at any time. Not sure if that is feasible or how much work that would be.

One issue is that failure_reasons, and especially dynamic failure_reasons are rare, and not really tested. So it is hard to see if I would break things.

> I don't like it because you're leaking memory over the lifetime of the `CompileTask` and because there are simpler solutions, as we only ever store one failure reason.
> 
>>Your suggestion would require that there is only a single failure_reason around at any given time. But I'm not sure if there is really only a single reason around, at least they are set at different places, and can be replaced... That is why I was looking for something that can keep the lifetime of multiple strings.
> 
> How would multiple strings be accessed through one const pointer :-)?
> 
>>Essencially, your solution would just be a better version of what we have with failure_reason_on_C_heap.
> 
> Yes, the lifetime is managed correctly and automatically by the RAII object. Plus, the bool can be removed. One issue is that you will probably have to add a bool `has_reason` to the class  I suggested, we can't depend on something being `nullptr` to indicate absence of reason anymore.

@jdksjolen 
> How would multiple strings be accessed through one const pointer :-)?

Well, there are multiple pointers 🙈 
`grep "failure_reason" src/hotspot/share/ -r | grep "char*"`

I know that the strings are passed between `Compile`, `ciEnv` and `CompileTask`. This means we would have multiple owners, and that is not great. Again: maybe that can be refactored, but it would take me some time and it would be risky because we don't have good testing for failure_reason reporting. Feel free to dive into the different `record_method_not_compilable` methods yourself.

What about having a new `CHeapString` class? It could hold a string (or null, or even special case for static string). We pass that around. The copy-constructor just re-allocates the string on CHeap, the destructor frees the string (unless static or null). This all comes at the cost of some more (occasional, probably rare) allocations.

> You don't need a new string class, you can use `stringStream` as that will hold the string memory and its local copy. I've looked at the code and we have 3 separate pointers to a failure reason (as you said), one in ciEnv, one in Compile, and one in CompileTask. Instead of using pointers, use the class I suggested and have these 3 have ownership of their own copies of these strings. The allocations' cost is irrelevant :-).

@jdksjolen
Ok, so this new approach would have every place that currently has a pointer become a proper "owner" that manages the life-time of its string.

I think I prefer a new string-class, that also allows for null and possibly static strings (no allocation). Otherwise, if I use `stringStream`, then I would either have to initialize them as empty, and check for empty (-> no failure_reason). And I also would  have to always export the actual string over `as_string`, which copies one more time to either ResourceArea or CHeap.

Thanks for chiming in here, Johan :)

> Depending on the usage you'll just have to use `.freeze()` or `.base()` which will give you the underlying string array, no copying needed!

@jdksjolen I tried it with such a string class. And then I tried to use it for `ciEnv::_failure_reason`. But now I run into this:


/oracle-work/jdk-fork3/open/src/hotspot/share/runtime/vmStructs.cpp:729:3: note: in expansion of macro 'CHECK_NONSTATIC_VM_STRUCT_ENTRY'
  729 |   nonstatic_field(ciEnv,                       _failure_reason,                               const char*)                           \


Any suggestions with that?

Update: I think it is actually not used by the servicability agent, and I can just delete that reference.

Thanks @jdksjolen for your suggestions! I now found a solution I'm fine with, and that seems to work.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17710#issuecomment-1929217140
PR Comment: https://git.openjdk.org/jdk/pull/17710#issuecomment-1929317038
PR Comment: https://git.openjdk.org/jdk/pull/17710#issuecomment-1929381900
PR Comment: https://git.openjdk.org/jdk/pull/17710#issuecomment-1936517746
PR Comment: https://git.openjdk.org/jdk/pull/17710#issuecomment-1945464104


More information about the hotspot-dev mailing list