RFR: 8289688: jfr command hangs when it processes invalid file
    Erik Gahlin 
    egahlin at openjdk.org
       
    Mon Jul 11 03:09:41 UTC 2022
    
    
  
On Mon, 4 Jul 2022 11:33:51 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:
> jfr command was in infinite loop when I passed invalid file which are in JFR repository of GraalVM Native Image (I attached the recording file to JBS).
> 
> CPU usage was 100% when jfr hanged, and I got following thread stacks:
> 
> 
> "main" #1 prio=5 os_prio=0 cpu=10333.35ms elapsed=10.67s tid=0x00007f4994013810 nid=10079 runnable [0x00007f499adfe000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.RandomAccessFile.seek0([java.base at 18.0.1](mailto:java.base at 18.0.1)/Native Method)
>         at java.io.RandomAccessFile.seek([java.base at 18.0.1](mailto:java.base at 18.0.1)/RandomAccessFile.java:591)
>         at jdk.jfr.internal.consumer.RecordingInput.positionPhysical([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/RecordingInput.java:102)
>         at jdk.jfr.internal.consumer.ChunkHeader.refresh([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkHeader.java:124)
>         at jdk.jfr.internal.consumer.ChunkHeader.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkHeader.java:108)
>         at jdk.jfr.internal.consumer.ChunkHeader.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkHeader.java:66)
>         at jdk.jfr.internal.consumer.ChunkParser.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkParser.java:117)
>         at jdk.jfr.internal.consumer.ChunkParser.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkParser.java:113)
>         at jdk.jfr.consumer.RecordingFile.findNext([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/RecordingFile.java:256)
>         at jdk.jfr.consumer.RecordingFile.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/RecordingFile.java:89)
>         at jdk.jfr.internal.tool.EventPrintWriter.print([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/EventPrintWriter.java:72)
>         at jdk.jfr.internal.tool.Print.execute([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/Print.java:164)
>         at jdk.jfr.internal.tool.Main.main([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/Main.java:87)
> 
> 
> And also I got following log with `-J-Xlog:"jfr*=info"` - chunk size was 0, and the parser hanged in processing first chunk (id = 0):
> 
> 
> $ jfr -J-Xlog:"jfr*=info" print ~/2022_07_04_18_05_11.jfr
> [0.353s][info][jfr,system,parser] Chunk: 0
> [0.353s][info][jfr,system,parser] Chunk: file=/home/ysuenaga/2022_07_04_18_05_11.jfr
> [0.353s][info][jfr,system,parser] Chunk: startPosition=0
> [0.353s][info][jfr,system,parser] Chunk: major=2
> [0.353s][info][jfr,system,parser] Chunk: minor=0
> [0.353s][info][jfr,system,parser] Chunk: chunkSize=0
> [0.353s][info][jfr,system,parser] Chunk: constantPoolPosition=0
> [0.353s][info][jfr,system,parser] Chunk: metadataPosition=0
> [0.353s][info][jfr,system,parser] Chunk: startNanos=0
> [0.353s][info][jfr,system,parser] Chunk: durationNanos=0
> [0.354s][info][jfr,system,parser] Chunk: startTicks=9410187
> [0.354s][info][jfr,system,parser] Chunk: ticksPerSecond=1000000000
> 
> 
> We should abort the process when invalid chunk was detected.
It looks like the real problem is how GraalVM Native Image writes recording data.
If the header is complete, is 68 bytes or more, the size of the field should be larger than 0. If the size filed is updated during a recording, it should use the proper protocol (modifying the file state in the header) to make sure the update is atomic.
The new check (c <= 0) is added where the value is read for logging, which is an "unreliable" read. The JVM might be modifying bytes in the header at the same time as it is being read by the parser, so it can be incorrect. Not sure we should add checks here. The magic and version are checked today, but they never change for a recording.
I would try to change this in GraalVM Native Image image instead.
-------------
PR: https://git.openjdk.org/jdk/pull/9363
    
    
More information about the hotspot-jfr-dev
mailing list