RFR: 8289688: jfr command hangs when it processes invalid file
Yasumasa Suenaga
ysuenaga at openjdk.org
Fri Jul 22 07:25:02 UTC 2022
On Mon, 4 Jul 2022 11:33:51 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:
> jfr command was in infinite loop when I passed invalid file which are in JFR repository of GraalVM Native Image (I attached the recording file to JBS).
>
> CPU usage was 100% when jfr hanged, and I got following thread stacks:
>
>
> "main" #1 prio=5 os_prio=0 cpu=10333.35ms elapsed=10.67s tid=0x00007f4994013810 nid=10079 runnable [0x00007f499adfe000]
> java.lang.Thread.State: RUNNABLE
> at java.io.RandomAccessFile.seek0([java.base at 18.0.1](mailto:java.base at 18.0.1)/Native Method)
> at java.io.RandomAccessFile.seek([java.base at 18.0.1](mailto:java.base at 18.0.1)/RandomAccessFile.java:591)
> at jdk.jfr.internal.consumer.RecordingInput.positionPhysical([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/RecordingInput.java:102)
> at jdk.jfr.internal.consumer.ChunkHeader.refresh([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkHeader.java:124)
> at jdk.jfr.internal.consumer.ChunkHeader.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkHeader.java:108)
> at jdk.jfr.internal.consumer.ChunkHeader.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkHeader.java:66)
> at jdk.jfr.internal.consumer.ChunkParser.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkParser.java:117)
> at jdk.jfr.internal.consumer.ChunkParser.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/ChunkParser.java:113)
> at jdk.jfr.consumer.RecordingFile.findNext([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/RecordingFile.java:256)
> at jdk.jfr.consumer.RecordingFile.<init>([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/RecordingFile.java:89)
> at jdk.jfr.internal.tool.EventPrintWriter.print([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/EventPrintWriter.java:72)
> at jdk.jfr.internal.tool.Print.execute([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/Print.java:164)
> at jdk.jfr.internal.tool.Main.main([jdk.jfr at 18.0.1](mailto:jdk.jfr at 18.0.1)/Main.java:87)
>
>
> And also I got following log with `-J-Xlog:"jfr*=info"` - chunk size was 0, and the parser hanged in processing first chunk (id = 0):
>
>
> $ jfr -J-Xlog:"jfr*=info" print ~/2022_07_04_18_05_11.jfr
> [0.353s][info][jfr,system,parser] Chunk: 0
> [0.353s][info][jfr,system,parser] Chunk: file=/home/ysuenaga/2022_07_04_18_05_11.jfr
> [0.353s][info][jfr,system,parser] Chunk: startPosition=0
> [0.353s][info][jfr,system,parser] Chunk: major=2
> [0.353s][info][jfr,system,parser] Chunk: minor=0
> [0.353s][info][jfr,system,parser] Chunk: chunkSize=0
> [0.353s][info][jfr,system,parser] Chunk: constantPoolPosition=0
> [0.353s][info][jfr,system,parser] Chunk: metadataPosition=0
> [0.353s][info][jfr,system,parser] Chunk: startNanos=0
> [0.353s][info][jfr,system,parser] Chunk: durationNanos=0
> [0.354s][info][jfr,system,parser] Chunk: startTicks=9410187
> [0.354s][info][jfr,system,parser] Chunk: ticksPerSecond=1000000000
>
>
> We should abort the process when invalid chunk was detected.
I debug'ed JfrHtmlRulesReport in JMC to know how JMC detects the chunk is broken.
I found the error happend to process the second chunk - JMC detected it has invalid magic.
JMC would read the chunk while the index is lower than chunk size as following. On the other hand, jfr command does not rely on chunk size in `ChunkHeader::refresh()`. This is the big difference between JMC and jfr, and it is cause of infinite loop.
https://github.com/openjdk/jmc/blob/8.2.0-ga/core/org.openjdk.jmc.flightrecorder/src/main/java/org/openjdk/jmc/flightrecorder/internal/parser/v1/ChunkLoaderV1.java#L82-L95
// Read events
long index = header.getBodyStartOffset();
while (index < header.getChunkSize()) {
input.seek(index);
int size = input.readInt();
long type = input.readLong();
if (size == 0) {
throw new CouldNotLoadRecordingException("Found event with invalid size (0)"); //$NON-NLS-1$
}
if (type != CONSTANT_POOL_EVENT_TYPE && type != ChunkMetadata.METADATA_EVENT_TYPE) {
manager.readEvent(type, input, size);
}
index += size;
}
I think it is better to refactor jfr code like JMC, but it will big change. So I think this PR is a reasonable to fix the problem.
-------------
PR: https://git.openjdk.org/jdk/pull/9363
More information about the hotspot-jfr-dev
mailing list