RFR: 8289688: jfr command hangs when it processes invalid file

Yasumasa Suenaga ysuenaga at openjdk.org
Sun Jul 24 06:51:01 UTC 2022


On Sat, 23 Jul 2022 15:51:27 GMT, Erik Gahlin <egahlin at openjdk.org> wrote:

>> jfr command was in infinite loop when I passed invalid file which are in JFR repository of GraalVM Native Image (I attached the recording file to JBS).
>> 
>> CPU usage was 100% when jfr hanged, and I got following thread stacks:
>> 
>> 
>> "main" #1 prio=5 os_prio=0 cpu=10333.35ms elapsed=10.67s tid=0x00007f4994013810 nid=10079 runnable [0x00007f499adfe000]
>>    java.lang.Thread.State: RUNNABLE
>>         at java.io.RandomAccessFile.seek0([java.base at 18.0.1](mailto:java.base at 18.0.1)/Native Method)
>>         at java.io.RandomAccessFile.seek([java.base at 18.0.1](mailto:java.base at 18.0.1)/RandomAccessFile.java:591)
>>         at jdk.jfr.internal.consumer.RecordingInput.positionPhysical([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/RecordingInput.java:102)
>>         at jdk.jfr.internal.consumer.ChunkHeader.refresh([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkHeader.java:124)
>>         at jdk.jfr.internal.consumer.ChunkHeader.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkHeader.java:108)
>>         at jdk.jfr.internal.consumer.ChunkHeader.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkHeader.java:66)
>>         at jdk.jfr.internal.consumer.ChunkParser.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkParser.java:117)
>>         at jdk.jfr.internal.consumer.ChunkParser.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkParser.java:113)
>>         at jdk.jfr.consumer.RecordingFile.findNext([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/RecordingFile.java:256)
>>         at jdk.jfr.consumer.RecordingFile.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/RecordingFile.java:89)
>>         at jdk.jfr.internal.tool.EventPrintWriter.print([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/EventPrintWriter.java:72)
>>         at jdk.jfr.internal.tool.Print.execute([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/Print.java:164)
>>         at jdk.jfr.internal.tool.Main.main([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/Main.java:87)
>> 
>> 
>> And also I got following log with `-J-Xlog:"jfr*=info"` - chunk size was 0, and the parser hanged in processing first chunk (id = 0):
>> 
>> 
>> $ jfr -J-Xlog:"jfr*=info" print ~/2022_07_04_18_05_11.jfr
>> [0.353s][info][jfr,system,parser] Chunk: 0
>> [0.353s][info][jfr,system,parser] Chunk: file=/home/ysuenaga/2022_07_04_18_05_11.jfr
>> [0.353s][info][jfr,system,parser] Chunk: startPosition=0
>> [0.353s][info][jfr,system,parser] Chunk: major=2
>> [0.353s][info][jfr,system,parser] Chunk: minor=0
>> [0.353s][info][jfr,system,parser] Chunk: chunkSize=0
>> [0.353s][info][jfr,system,parser] Chunk: constantPoolPosition=0
>> [0.353s][info][jfr,system,parser] Chunk: metadataPosition=0
>> [0.353s][info][jfr,system,parser] Chunk: startNanos=0
>> [0.353s][info][jfr,system,parser] Chunk: durationNanos=0
>> [0.354s][info][jfr,system,parser] Chunk: startTicks=9410187
>> [0.354s][info][jfr,system,parser] Chunk: ticksPerSecond=1000000000
>> 
>> 
>> We should abort the process when invalid chunk was detected.
>
> There is a limit of 1000 iterations, unless the recordings is opened using the EventStream or RecordingStream API, which is not the case with the 'jfr' command. 
> 
> It was a compromise, allowing unfinished recording to be parsed using the RecordingFile API (used by the 'jfr' command) in the disk repository, but it might fail if the JVM is not able to write to the chunk header within 1-2 s.
> 
> https://github.com/openjdk/jdk/blob/0599a05f8c7e26d4acae0b2cc805a65bdd6c6f67/src/jdk.jfr/share/classes/jdk/jfr/internal/consumer/RecordingInput.java#L451

@egahlin I pushed new commit to this PR. How about this? It adds loop limit like `RecordingInput::pollWait`.

I confirmed the recording file from Native Image has `0` in `flagByte` (it is called as `feature` in HotSpot) - so root cause of the hang is `metadataPosition` is zero (we can see it in UL). This change can avoid infinite loop in that case.

-------------

PR: https://git.openjdk.org/jdk/pull/9363


More information about the hotspot-jfr-dev mailing list