RFR: 8307526: [JFR] Better handling of tampered JFR repository [v2]

Erik Gahlin egahlin at openjdk.org
Wed Jun 14 12:27:02 UTC 2023


On Wed, 7 Jun 2023 15:42:44 GMT, Joakim Nordström <jnordstrom at openjdk.org> wrote:

>> This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
>> 
>> With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing.
>> A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
>> 
>> Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
>
> Joakim Nordström has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Merge
>  - 8307526: [JFR] Better handling of tampered JFR repository
>  - 8307526: [JFR] Better handling of tampered JFR repository
>  - 8307526: [JFR] Better handling of tampered JFR repository
>  - 8307526: [JFR] Better handling of tampered JFR repository
>  - 8307526: [JFR] Better handling of tampered JFR repository

src/jdk.jfr/share/classes/jdk/jfr/internal/FilePurger.java line 70:

> 68:                 return true;
> 69:             }
> 70:         } catch (IOException e) {

Looks like there could be a race here. If the file is removed after the exists check, but before the file is deleted, the method may return false incorrectly.

src/jdk.jfr/share/classes/jdk/jfr/internal/RepositoryChunk.java line 202:

> 200:         String message = "Chunkfile \""+ chunkFile.toString() + "\" is missing.";
> 201:         Logger.log(LogTag.JFR, LogLevel.ERROR, message + " Data loss might occur from " + startTime.toString() + (endTime != null ? " to " + endTime.toString() : ""));
> 202:         ErrorThrownEvent.commit(startTime.getNano(), endTime != null ? Duration.between(startTime, endTime).toNanos() : 0L, message, MissingChunkFileError.class);

The event timing is in ticks, not nanos, so the time would not be correct.. I think it would be best to skip the MissingChunkFileError event completely and rely on the log.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14360#discussion_r1229515075
PR Review Comment: https://git.openjdk.org/jdk/pull/14360#discussion_r1229520038


More information about the hotspot-jfr-dev mailing list