RFR: 8307526: [JFR] Better handling of tampered JFR repository
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored. With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight. Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files. ------------- Commit messages: - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository Changes: https://git.openjdk.org/jdk/pull/14360/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307526 Stats: 172 lines in 5 files changed: 137 ins; 11 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/14360.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14360/head:pull/14360 PR: https://git.openjdk.org/jdk/pull/14360
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository ------------- Changes: https://git.openjdk.org/jdk/pull/14360/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=01 Stats: 172 lines in 5 files changed: 137 ins; 11 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/14360.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14360/head:pull/14360 PR: https://git.openjdk.org/jdk/pull/14360
On Wed, 7 Jun 2023 15:42:44 GMT, Joakim Nordström <jnordstrom@openjdk.org> wrote:
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
- Merge - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository
src/jdk.jfr/share/classes/jdk/jfr/internal/PlatformRecorder.java line 460:
458: } 459: } 460: } catch (FileNotFoundException e) {
Suggestion: } catch (FileNotFoundException e) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14360#discussion_r1228308228
On Tue, 13 Jun 2023 15:13:46 GMT, Andrey Turbanov <aturbanov@openjdk.org> wrote:
Joakim Nordström has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
- Merge - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository
src/jdk.jfr/share/classes/jdk/jfr/internal/PlatformRecorder.java line 460:
458: } 459: } 460: } catch (FileNotFoundException e) {
Suggestion:
} catch (FileNotFoundException e) {
Nice catch! Thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14360#discussion_r1229550950
On Wed, 7 Jun 2023 15:42:44 GMT, Joakim Nordström <jnordstrom@openjdk.org> wrote:
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
- Merge - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository
src/jdk.jfr/share/classes/jdk/jfr/internal/PlatformRecorder.java line 526:
524: long minDelta = PeriodicEvents.doPeriodic(); 525: wait = Math.min(minDelta, Options.getWaitInterval()); 526: } catch(Throwable t) {
Suggestion: } catch (Throwable t) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14360#discussion_r1228314125
On Tue, 13 Jun 2023 15:17:46 GMT, Andrey Turbanov <aturbanov@openjdk.org> wrote:
Joakim Nordström has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
- Merge - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository
src/jdk.jfr/share/classes/jdk/jfr/internal/PlatformRecorder.java line 526:
524: long minDelta = PeriodicEvents.doPeriodic(); 525: wait = Math.min(minDelta, Options.getWaitInterval()); 526: } catch(Throwable t) {
Suggestion:
} catch (Throwable t) {
Fixed this too. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14360#discussion_r1230953293
On Wed, 7 Jun 2023 15:42:44 GMT, Joakim Nordström <jnordstrom@openjdk.org> wrote:
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
- Merge - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository
src/jdk.jfr/share/classes/jdk/jfr/internal/FilePurger.java line 70:
68: return true; 69: } 70: } catch (IOException e) {
Looks like there could be a race here. If the file is removed after the exists check, but before the file is deleted, the method may return false incorrectly. src/jdk.jfr/share/classes/jdk/jfr/internal/RepositoryChunk.java line 202:
200: String message = "Chunkfile \""+ chunkFile.toString() + "\" is missing."; 201: Logger.log(LogTag.JFR, LogLevel.ERROR, message + " Data loss might occur from " + startTime.toString() + (endTime != null ? " to " + endTime.toString() : "")); 202: ErrorThrownEvent.commit(startTime.getNano(), endTime != null ? Duration.between(startTime, endTime).toNanos() : 0L, message, MissingChunkFileError.class);
The event timing is in ticks, not nanos, so the time would not be correct.. I think it would be best to skip the MissingChunkFileError event completely and rely on the log. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14360#discussion_r1229515075 PR Review Comment: https://git.openjdk.org/jdk/pull/14360#discussion_r1229520038
On Wed, 14 Jun 2023 12:18:28 GMT, Erik Gahlin <egahlin@openjdk.org> wrote:
Joakim Nordström has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
- Merge - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository
src/jdk.jfr/share/classes/jdk/jfr/internal/FilePurger.java line 70:
68: return true; 69: } 70: } catch (IOException e) {
Looks like there could be a race here. If the file is removed after the exists check, but before the file is deleted, the method may return false incorrectly.
Yes, well spotted. However, does it matter? If this method return false, it means the file will still be in the list of paths to remove, and upon next purge, the SecuritySupport.exists(p) would see the file doesn't exist, and return true. I added it because SecuritySupport.delete() might throw an exception (via Files.delete()) if the file doesn't exists, causing non-existent files to be left in the purge list until removeOldest() removes them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14360#discussion_r1230540737
On Wed, 14 Jun 2023 12:21:58 GMT, Erik Gahlin <egahlin@openjdk.org> wrote:
Joakim Nordström has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
- Merge - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository - 8307526: [JFR] Better handling of tampered JFR repository
src/jdk.jfr/share/classes/jdk/jfr/internal/RepositoryChunk.java line 202:
200: String message = "Chunkfile \""+ chunkFile.toString() + "\" is missing."; 201: Logger.log(LogTag.JFR, LogLevel.ERROR, message + " Data loss might occur from " + startTime.toString() + (endTime != null ? " to " + endTime.toString() : "")); 202: ErrorThrownEvent.commit(startTime.getNano(), endTime != null ? Duration.between(startTime, endTime).toNanos() : 0L, message, MissingChunkFileError.class);
The event timing is in ticks, not nanos, so the time would not be correct.. I think it would be best to skip the MissingChunkFileError event completely and rely on the log.
I've skipped the MissingChunkFileError event when finding missing files during dumping (it would've only shown up in subsequent recordings anyways, so of little use). I instead throw the error when finding missing files in-flight; I think it's important to have something that hints to user that a recording might be missing data, and the error will show up in the recording that might be missing some data. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14360#discussion_r1230952579
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request incrementally with two additional commits since the last revision: - Changed to throw error when missing files in-flight, and changing to only logging when dumping recording to disk. - Fixed copyright year and comment. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14360/files - new: https://git.openjdk.org/jdk/pull/14360/files/78d8e6bc..752186d2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=01-02 Stats: 15 lines in 4 files changed: 0 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/14360.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14360/head:pull/14360 PR: https://git.openjdk.org/jdk/pull/14360
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request incrementally with one additional commit since the last revision: Removed now unused import. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14360/files - new: https://git.openjdk.org/jdk/pull/14360/files/752186d2..05dd37d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14360.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14360/head:pull/14360 PR: https://git.openjdk.org/jdk/pull/14360
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request incrementally with one additional commit since the last revision: Emit jdk.DataLoss event and some other re-work. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14360/files - new: https://git.openjdk.org/jdk/pull/14360/files/05dd37d6..54b536b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=03-04 Stats: 90 lines in 7 files changed: 43 ins; 26 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/14360.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14360/head:pull/14360 PR: https://git.openjdk.org/jdk/pull/14360
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request incrementally with three additional commits since the last revision: - Commit - Commit - Removed MissingChunkFileError ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14360/files - new: https://git.openjdk.org/jdk/pull/14360/files/54b536b7..ad22f815 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=04-05 Stats: 45 lines in 2 files changed: 0 ins; 45 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14360.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14360/head:pull/14360 PR: https://git.openjdk.org/jdk/pull/14360
On Fri, 30 Jun 2023 11:40:05 GMT, Joakim Nordström <jnordstrom@openjdk.org> wrote:
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request incrementally with three additional commits since the last revision:
- Commit - Commit - Removed MissingChunkFileError
I've rewritten some parts of the solution. - Removed the MissingChunkfileError to instead emit a jdk.DataLoss event for each missing chunkfile. The "amount" field contains the number of bytes missing, the "total" field of the event is set to 0, the event time is when the chunkfile was found missing. - Moved the privileged IO into to transferChunksWithRetry method. Please have a look again. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14360#issuecomment-1614537515
On Fri, 30 Jun 2023 11:40:05 GMT, Joakim Nordström <jnordstrom@openjdk.org> wrote:
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request incrementally with three additional commits since the last revision:
- Commit - Commit - Removed MissingChunkFileError
src/hotspot/share/jfr/jni/jfrJniMethod.cpp line 406:
404: 405: JVM_ENTRY_NO_ENV(void, jfr_emit_data_loss(JNIEnv* env, jobject jvm, jlong bytes)) 406: EventDataLoss::commit(bytes, 0L);
If a value is missing, we use min_jlong to indicate not available or not applicable src/jdk.jfr/share/classes/jdk/jfr/internal/PlatformRecorder.java line 525:
523: } catch(Throwable t) { 524: // Catch everything and log, but don't allow it to end the periodic task 525: Logger.log(JFR_SYSTEM, ERROR, "Error in Periodic task: " + t.getClass().getName() + ", " + t.getMessage());
Not sure if it matters, but you could get an OOM when string are concatenated and the thread will end. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14360#discussion_r1250856963 PR Review Comment: https://git.openjdk.org/jdk/pull/14360#discussion_r1250890601
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request incrementally with one additional commit since the last revision: Fix logging ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14360/files - new: https://git.openjdk.org/jdk/pull/14360/files/ad22f815..a5a6ce99 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=05-06 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14360.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14360/head:pull/14360 PR: https://git.openjdk.org/jdk/pull/14360
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request incrementally with one additional commit since the last revision: Fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14360/files - new: https://git.openjdk.org/jdk/pull/14360/files/a5a6ce99..0f3b9ca5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=06-07 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14360.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14360/head:pull/14360 PR: https://git.openjdk.org/jdk/pull/14360
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request incrementally with one additional commit since the last revision: Fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14360/files - new: https://git.openjdk.org/jdk/pull/14360/files/0f3b9ca5..4f7427c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14360.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14360/head:pull/14360 PR: https://git.openjdk.org/jdk/pull/14360
On Tue, 4 Jul 2023 07:37:20 GMT, Joakim Nordström <jnordstrom@openjdk.org> wrote:
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request incrementally with one additional commit since the last revision:
Fixes
Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14360#pullrequestreview-1512719925
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
Joakim Nordström has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Minor fix - Merge - Fixes - Fixes - Fix logging - Commit - Commit - Removed MissingChunkFileError - Emit jdk.DataLoss event and some other re-work. - Removed now unused import. - ... and 8 more: https://git.openjdk.org/jdk/compare/d6578bff...2ebc5e56 ------------- Changes: https://git.openjdk.org/jdk/pull/14360/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14360&range=09 Stats: 144 lines in 8 files changed: 102 ins; 8 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/14360.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14360/head:pull/14360 PR: https://git.openjdk.org/jdk/pull/14360
On Wed, 5 Jul 2023 08:31:24 GMT, Joakim Nordström <jnordstrom@openjdk.org> wrote:
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
### Testing - [x] tier1-3 - [x] jdk_jfr
Joakim Nordström has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits:
- Minor fix - Merge - Fixes - Fixes - Fix logging - Commit - Commit - Removed MissingChunkFileError - Emit jdk.DataLoss event and some other re-work. - Removed now unused import. - ... and 8 more: https://git.openjdk.org/jdk/compare/d6578bff...2ebc5e56
Thanks for reviewing, @turbanoff and @egahlin. Could I also request a sponsor for this? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14360#issuecomment-1622502602
On Wed, 7 Jun 2023 14:48:52 GMT, Joakim Nordström <jnordstrom@openjdk.org> wrote:
This change makes sure that "JFR Periodic Task" isn't interrupted by any errors. This can happen when an unfinished chunkfile is removed "in-flight", which would lead to the chunkfiles not being rotated properly, and the maxsize/maxage being ignored.
With this fix, when an unfinished chunkfile is detected, all of the chunkfiles in the recording are checked for existence (since one likely cause for this could be f.i. 'rm -r /tmp' being invoked, effectively deleting all chunkfiles). Upon seeing missing chunkfiles, an error is logged, and emitted to the recording to signal that some data might be missing. A check to catch any missing chunkfiles is also added when dumping a recording -- this can happen if only finished chunkfiles have been removed, which is not detected in-flight.
Also, a check was added to the file purger to check whether the file had already been deleted to not fill the purge list with already removed files.
### Testing - [x] tier1-3 - [x] jdk_jfr
This pull request has now been integrated. Changeset: 66d27365 Author: Joakim Nordström <jnordstrom@openjdk.org> Committer: Erik Gahlin <egahlin@openjdk.org> URL: https://git.openjdk.org/jdk/commit/66d2736521611fbe7652356894d046c17d7cf743 Stats: 144 lines in 8 files changed: 102 ins; 8 del; 34 mod 8307526: [JFR] Better handling of tampered JFR repository Reviewed-by: egahlin ------------- PR: https://git.openjdk.org/jdk/pull/14360
participants (3)
-
Andrey Turbanov
-
Erik Gahlin
-
Joakim Nordström