RFR: 8289688: jfr command hangs when it processes invalid file [v2]
Yasumasa Suenaga
ysuenaga at openjdk.org
Sun Jul 24 09:56:11 UTC 2022
On Sun, 24 Jul 2022 06:45:03 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:
>> jfr command was in infinite loop when I passed invalid file which are in JFR repository of GraalVM Native Image (I attached the recording file to JBS).
>>
>> CPU usage was 100% when jfr hanged, and I got following thread stacks:
>>
>>
>> "main" #1 prio=5 os_prio=0 cpu=10333.35ms elapsed=10.67s tid=0x00007f4994013810 nid=10079 runnable [0x00007f499adfe000]
>> java.lang.Thread.State: RUNNABLE
>> at java.io.RandomAccessFile.seek0([java.base at 18.0.1](mailto:java.base at 18.0.1)/Native Method)
>> at java.io.RandomAccessFile.seek([java.base at 18.0.1](mailto:java.base at 18.0.1)/RandomAccessFile.java:591)
>> at jdk.jfr.internal.consumer.RecordingInput.positionPhysical([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/RecordingInput.java:102)
>> at jdk.jfr.internal.consumer.ChunkHeader.refresh([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkHeader.java:124)
>> at jdk.jfr.internal.consumer.ChunkHeader.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkHeader.java:108)
>> at jdk.jfr.internal.consumer.ChunkHeader.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkHeader.java:66)
>> at jdk.jfr.internal.consumer.ChunkParser.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkParser.java:117)
>> at jdk.jfr.internal.consumer.ChunkParser.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkParser.java:113)
>> at jdk.jfr.consumer.RecordingFile.findNext([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/RecordingFile.java:256)
>> at jdk.jfr.consumer.RecordingFile.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/RecordingFile.java:89)
>> at jdk.jfr.internal.tool.EventPrintWriter.print([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/EventPrintWriter.java:72)
>> at jdk.jfr.internal.tool.Print.execute([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/Print.java:164)
>> at jdk.jfr.internal.tool.Main.main([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/Main.java:87)
>>
>>
>> And also I got following log with `-J-Xlog:"jfr*=info"` - chunk size was 0, and the parser hanged in processing first chunk (id = 0):
>>
>>
>> $ jfr -J-Xlog:"jfr*=info" print ~/2022_07_04_18_05_11.jfr
>> [0.353s][info][jfr,system,parser] Chunk: 0
>> [0.353s][info][jfr,system,parser] Chunk: file=/home/ysuenaga/2022_07_04_18_05_11.jfr
>> [0.353s][info][jfr,system,parser] Chunk: startPosition=0
>> [0.353s][info][jfr,system,parser] Chunk: major=2
>> [0.353s][info][jfr,system,parser] Chunk: minor=0
>> [0.353s][info][jfr,system,parser] Chunk: chunkSize=0
>> [0.353s][info][jfr,system,parser] Chunk: constantPoolPosition=0
>> [0.353s][info][jfr,system,parser] Chunk: metadataPosition=0
>> [0.353s][info][jfr,system,parser] Chunk: startNanos=0
>> [0.353s][info][jfr,system,parser] Chunk: durationNanos=0
>> [0.354s][info][jfr,system,parser] Chunk: startTicks=9410187
>> [0.354s][info][jfr,system,parser] Chunk: ticksPerSecond=1000000000
>>
>>
>> We should abort the process when invalid chunk was detected.
>
> Yasumasa Suenaga has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
> - Add loop limit to ChunkHeader::refresh
> - Revert "8289688: jfr command hangs when it processes invalid file"
>
> This reverts commit 52c247039332d9e51570fb2085b6b7c180aee079.
> - Merge remote-tracking branch 'upstream/master' into JDK-8289688
> - 8289688: jfr command hangs when it processes invalid file
`ChunkHeader::refresh` need to satisfy to exit loop following conditions:
* File state is "stable" (get same state twice (L123 and L131))
* `metadataPosition` is not (greater than) 0
`fileState1` comes from `readFileState()` which uses `pollWait()` as you mentioned, so we can deem the recording data is "stable" if L134 is true.
123 byte fileState1 = readFileState();
:
131 byte fileState2 = input.readPhysicalByte();
132 input.positionPhysical(absoluteChunkStart + FLAG_BYTE_POSITION);
133 int flagByte = input.readPhysicalByte();
134 if (fileState1 == fileState2) { // valid header
135 finished = fileState1 == 0;
136 if (metadataPosition != 0) {
:
164 return;
165 }
In addition, we can deem that recording data would not be updated if `finished` is set to `true` in L135. Thus I think we can fix this problem with following patch. How about this?
diff --git a/src/jdk.jfr/share/classes/jdk/jfr/internal/consumer/ChunkHeader.java b/src/jdk.jfr/share/classes/jdk/jfr/internal/consumer/ChunkHeader.java
index 8c8ec55f0da..8648c030f06 100644
--- a/src/jdk.jfr/share/classes/jdk/jfr/internal/consumer/ChunkHeader.java
+++ b/src/jdk.jfr/share/classes/jdk/jfr/internal/consumer/ChunkHeader.java
@@ -162,6 +162,8 @@ public final class ChunkHeader {
Logger.log(LogTag.JFR_SYSTEM_PARSER, LogLevel.INFO, "Chunk: finalChunk=" + finalChunk);
absoluteChunkEnd = absoluteChunkStart + chunkSize;
return;
+ } else if (finished) {
+ throw new IOException("metadataPosition is 0 but chunk header is valid.");
}
}
}
I understand that producer (Native Image in this case) should be fixed not to out invalid recording file. However we need to notify errors when the user passes invalid file to jfr command. So I've proposed this to jdk project.
-------------
PR: https://git.openjdk.org/jdk/pull/9363
More information about the hotspot-jfr-dev
mailing list