RFR: 8257621: JFR StringPool misses cached items across consecutive recordings

Jie Kang jkang at openjdk.java.net
Fri Dec 4 17:14:13 UTC 2020


On Fri, 4 Dec 2020 17:07:05 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

>> This addresses 8257621 by resetting the Java cache when a recording is stopped and there are no other recordings still running. The reproducer described in 8257621 no longer exhibits the bug with this change. However, I'm not sure if there are other edge cases missed, or if this fits with the expected design of the StringPool, Recording and Chunk systems. Any insight would be appreciated; I'm willing to adjust as needed.
>> 
>> On:
>> Linux (Fedora 31), jtreg 5.1-b01 configured with flags --disable-warnings-as-errors --with-jtreg=/path/to/jtreg-5.1-b01/
>> 
>> I have run the following with this patch applied:
>> `make run-test TEST=":jdk_jfr"`
>> `make run-test-tier1`
>> 
>> The test `jdk.jfr.api.event.TestShouldCommit` fails for me but I believe it is fragile and not relevant to the change proposed here.
>
> Great analysis, makes sense. 
> 
> The test paths I see in the test system are about 80 characters.

The problem scenario in more detail is:

StringPool has a Java-side cache for Strings that meet the caching requirements. jfrStringPool.hpp/cpp is the native counterpart; it is called to write a string into the constant pool when the string submitted to the Java-layer cache is not already present, or is submitted for a *new* epoch.

Start recording A, at epoch 0. Emit 2 events with a String "s". The string does not exist in the Java cache, so it is cached and pushed to the native layer to be emitted as a constant pool entry. Stop recording A. The events will have message=1, and the constant pool will have the java.lang.String entry: `1="s"`.

Start recording B, still at epoch 0. Emit 2 events with a String "s". The Java-layer cache still contains "s" for epoch 0, so no request is made to push the string to the native layer to be emitted as a constant pool entry. Stop recording B. The events will have message=1, but there will be no constant pool entry for `1`.
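
To make the scenario concrete, here is a toy model of it. This is only a sketch: the class and method names are hypothetical and are not the real StringPool or jfrStringPool code. The epoch is left out because it stays at 0 throughout, and the "native layer" is reduced to a list of constant pool entries written per recording.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: hypothetical names, not the real JFR StringPool. */
class StringPoolBugSketch {
    // Java-layer cache: string -> constant pool id. It outlives a single recording.
    private final Map<String, Long> javaCache = new HashMap<>();
    private long nextId = 1;
    // Stand-in for the native layer: entries written for the current recording.
    private List<String> constantPool = new ArrayList<>();

    public void startRecording() {
        constantPool = new ArrayList<>(); // each recording gets fresh output
    }

    /** Returns the id stored in the event; writes a pool entry only on a cache miss. */
    public long addString(String s) {
        Long id = javaCache.get(s);
        if (id == null) {
            id = nextId++;
            javaCache.put(s, id);
            constantPool.add(id + "=\"" + s + "\""); // "push to the native layer"
        }
        return id;
    }

    /** Returns the entries written for this recording. The cache is NOT cleared. */
    public List<String> stopRecording() {
        return constantPool;
    }

    public static void main(String[] args) {
        StringPoolBugSketch pool = new StringPoolBugSketch();

        pool.startRecording();                  // recording A
        pool.addString("s");
        long idA = pool.addString("s");
        System.out.println("A: message=" + idA + " pool=" + pool.stopRecording());

        pool.startRecording();                  // recording B, same epoch
        pool.addString("s");
        long idB = pool.addString("s");
        System.out.println("B: message=" + idB + " pool=" + pool.stopRecording());
    }
}
```

In this model recording B ends up with message=1 and an empty pool, which mirrors the missing constant pool entry described above.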

My fix in more detail is:

Reset the Java-layer cache at the recording boundary, specifically when a recording is stopped and no other recordings are running. I also considered rotating the epoch at this boundary, but I'm not sure what impact that would have on the other JFR subsystems.

I have also tested this fix in the simple "concurrent recording" case where I start recording A, start recording B, stop recording A and stop recording B.
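
The idea of the fix, together with the sequential and concurrent cases, can be sketched with a similar toy model. Again, all names are hypothetical and this is not the code in the PR. As a simplifying assumption, the model treats the whole period in which at least one recording is running as one shared output, and it keeps ids increasing after a reset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: hypothetical names, not the code in the PR. */
class StringPoolResetSketch {
    private final Map<String, Long> javaCache = new HashMap<>();
    private long nextId = 1;
    private int runningRecordings = 0;
    // Assumption: output is shared while at least one recording is running.
    private List<String> constantPool = new ArrayList<>();

    public void startRecording() {
        if (runningRecordings++ == 0) {
            constantPool = new ArrayList<>(); // first recording opens fresh output
        }
    }

    public long addString(String s) {
        Long id = javaCache.get(s);
        if (id == null) {
            id = nextId++;
            javaCache.put(s, id);
            constantPool.add(id + "=\"" + s + "\"");
        }
        return id;
    }

    /** Resets the Java-layer cache only when the last running recording stops. */
    public List<String> stopRecording() {
        List<String> written = constantPool;
        if (--runningRecordings == 0) {
            javaCache.clear();
        }
        return written;
    }

    /** True if the pool has an entry for the given id. */
    public static boolean resolves(long id, List<String> pool) {
        return pool.stream().anyMatch(e -> e.startsWith(id + "="));
    }

    public static void main(String[] args) {
        StringPoolResetSketch pool = new StringPoolResetSketch();

        // Sequential: A then B, each emitting "s".
        pool.startRecording();
        long a = pool.addString("s");
        List<String> outA = pool.stopRecording();
        pool.startRecording();
        long b = pool.addString("s");
        List<String> outB = pool.stopRecording();
        System.out.println("sequential: A=" + resolves(a, outA) + " B=" + resolves(b, outB));

        // Concurrent: start A, start B, stop A, stop B.
        pool.startRecording();
        pool.startRecording();
        long c = pool.addString("t");
        pool.stopRecording();              // B still running, cache is kept
        long d = pool.addString("t");      // cache hit, entry already written
        List<String> outAB = pool.stopRecording();
        System.out.println("concurrent: sameId=" + (c == d) + " resolves=" + resolves(d, outAB));
    }
}
```

The reset is tied to the running-recording count rather than to every stop, so a string cached while A and B overlap is not dropped when only A stops.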

-------------

PR: https://git.openjdk.java.net/jdk/pull/1576


More information about the hotspot-jfr-dev mailing list