[External] : Re: JFR: Scrubbing sensitive information from events

Erwan Viollet erwan.viollet at gmail.com
Fri Jun 13 12:51:22 UTC 2025


I like the simple and pragmatic approach, especially if we can make it the
default.
Writing custom scrubbing logic for these specific events is much more
straightforward than building a generic scrubbing framework.

To address the risk of missing sensitive patterns that are specific to user
environments, I'd suggest adding a customization option:
# Allow users to extend or override default patterns
-XX:FlightRecorderOptions:scrub-sensitive-patterns="password,secret,key,token,credential,auth,pwd,passwd,api[_-]?key,custompattern"

We had to do the same for the Datadog Agent process scrubbing
<https://docs.datadoghq.com/infrastructure/process/?tab=linuxwindows#process-arguments-scrubbing>,
where users can specify `custom_sensitive_words` to handle their specific
use cases.
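As a rough illustration of the proposed option, here is a minimal standalone C++ sketch. It assumes the option value is a comma-separated list in which each entry is a case-insensitive regular expression matched anywhere in the key. The helper names `parse_patterns` and `is_sensitive_key` are hypothetical, and because HotSpot restricts use of the C++ standard library, a real in-JVM implementation would look different.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turns a comma-separated option value such as
// "password,secret,api[_-]?key" into case-insensitive ECMAScript regexes.
// An invalid entry makes the std::regex constructor throw std::regex_error.
std::vector<std::regex> parse_patterns(const std::string& spec) {
  std::vector<std::regex> patterns;
  std::istringstream in(spec);
  std::string entry;
  while (std::getline(in, entry, ',')) {
    if (!entry.empty()) {  // tolerate ",," and a trailing comma
      patterns.emplace_back(entry, std::regex::ECMAScript | std::regex::icase);
    }
  }
  return patterns;
}

// Hypothetical helper: a key is sensitive if any pattern matches anywhere
// in it (regex_search, not regex_match).
bool is_sensitive_key(const std::string& key,
                      const std::vector<std::regex>& patterns) {
  for (const std::regex& re : patterns) {
    if (std::regex_search(key, re)) {
      return true;
    }
  }
  return false;
}
```

Substring matching is what lets `password` catch `javax.net.ssl.keyStorePassword`, but it also means a short entry such as `auth` flags unrelated keys like `app.author`, which is the false-positive risk Erik raises below.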

The remaining questions:
1. Scope limitations: Are there any known use cases that would be impacted
by limiting scrubbing to these four event types? We believe this covers the
vast majority of sensitive data exposure, but want to ensure we're not
missing critical scenarios.

2. Default pattern selection:
We should agree on what a reasonable default would look like. We can use
our JFR data to inform this.

Regards,

Erwan

On Thu, Jun 12, 2025 at 19:24, Erik Gahlin <erik.gahlin at oracle.com> wrote:

> Thanks for the file.
>
> I worry that processing the file in the JVM or creating an intuitive Java
> API for post-processing it will be hard. The context/event determines what
> needs to be redacted. If scrubbing is only necessary for these four events,
> hardcoding the sensitive tokens and logic into the JVM might be a viable
> approach.
>
> Users would specify:
>
>  $ java -XX:FlightRecorderOptions:scrub-sensitive=true
>
> or it might be enabled by default and users would need to opt-out.
>
> Anyway, if enabled, a JfrScrub class (in jfrScrub.cpp) would do the job.
> Something like this:
>
>     EventInitialEnvironmentVariable event(UNTIMED);
>     event.set_starttime(time_stamp);
>     event.set_endtime(time_stamp);
>     event.set_key(key);
>     if (JfrScrub::is_sensitive_key(key)) {
>       event.set_value("[REDACTED]");
>     } else {
>       event.set_value(value);
>     }
>     event.commit();
>
>     EventInitialSystemProperty event(UNTIMED);
>     event.set_key(p->key());
>     if (JfrScrub::is_sensitive_key(p->key())) {
>       event.set_value("[REDACTED]");
>     } else {
>       event.set_value(p->value());
>     }
>     event.set_starttime(time_stamp);
>     event.set_endtime(time_stamp);
>     event.commit();
>
>     EventSystemProcess event(UNTIMED);
>     event.set_pid(pid_buf);
>     event.set_commandLine(JfrScrub::command_line(info));
>     event.set_starttime(start_time);
>     event.set_endtime(end_time);
>     event.commit();
>
>     EventJVMInformation event;
>     event.set_jvmName(VM_Version::vm_name());
>     event.set_jvmVersion(VM_Version::internal_vm_info_string());
>     event.set_javaArguments(JfrScrub::command_line(Arguments::java_command()));
>     event.set_jvmArguments(Arguments::jvm_args());
>     event.set_jvmFlags(Arguments::jvm_flags());
>     event.set_jvmStartTime(Management::vm_init_done_time());
>     event.set_pid(os::current_process_id());
>     event.commit();
>
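The sketch above leaves `JfrScrub::is_sensitive_key` and `JfrScrub::command_line` undefined. One possible shape for them is below, written as standalone C++ with `std::string` instead of HotSpot types. The hardcoded token list and the handling of `name=value` and `--option value` arguments are assumptions, and arguments containing quoted spaces are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in for the JfrScrub class proposed above. It uses
// std::string to stay standalone; HotSpot code would not.
class JfrScrub {
 public:
  // Case-insensitive substring match against a hardcoded token list
  // (the list itself is an assumption).
  static bool is_sensitive_key(const std::string& key) {
    static const char* const kTokens[] = {"password", "passwd", "pwd",
                                          "secret", "token", "credential"};
    std::string lower;
    for (char c : key) {
      lower += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    for (const char* token : kTokens) {
      if (lower.find(token) != std::string::npos) {
        return true;
      }
    }
    return false;
  }

  // Redacts values in a whitespace-separated command line:
  //   name=value (incl. -Dname=value, --name=value): value redacted if name is sensitive
  //   --name value: the following word is redacted if --name is sensitive
  // Words are rejoined with single spaces; quoted arguments are not handled.
  static std::string command_line(const std::string& cmd) {
    std::istringstream in(cmd);
    std::string word;
    std::string out;
    bool redact_next = false;
    while (in >> word) {
      if (!out.empty()) {
        out += ' ';
      }
      if (redact_next) {
        out += "[REDACTED]";
        redact_next = false;
        continue;
      }
      const std::size_t eq = word.find('=');
      if (eq != std::string::npos) {
        out += is_sensitive_key(word.substr(0, eq))
                   ? word.substr(0, eq + 1) + "[REDACTED]"
                   : word;
      } else {
        redact_next = word.rfind('-', 0) == 0 && is_sensitive_key(word);
        out += word;
      }
    }
    return out;
  }
};
```

In Erwan's jdk.JVMInformation example, the password appears in `jvmArguments`, so the same helper would presumably also need to be applied to `Arguments::jvm_args()`, not only to `javaArguments`.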
> It's a bit ugly and not as flexible, but perhaps that's something we need
> to tolerate. Or will it be useless, because new password/key names are
> added all the time or the patterns match false positives, so that more
> advanced logic is needed? Perhaps it will give users the false(?)
> impression that they don't need to worry about sensitive data?
>
> Thanks
> Erik
> ------------------------------
> *From:* Erwan Viollet <erwan.viollet at gmail.com>
> *Sent:* Thursday, June 12, 2025 5:07 PM
> *To:* Erik Gahlin <erik.gahlin at oracle.com>
> *Cc:* hotspot-jfr-dev at openjdk.org <hotspot-jfr-dev at openjdk.org>
> *Subject:* [External] : Re: JFR: Scrubbing sensitive information from
> events
>
> Hello,
>
> Here is an example of the types of events we are concerned about:
>
> Recording
> ├── Event (e.g. jdk.InitialSystemProperty)
> │     ├── eventType: "jdk.InitialSystemProperty"
> │     ├── startTime
> │     ├── duration
> │     ├── fields:
> │     │     ├── key: "javax.net.ssl.keyStorePassword"
> │     │     ├── value: "supersecret"
> │     │     └── ...
> │     └── ...
> ├── Event (e.g. jdk.JVMInformation)
> │     ├── eventType: "jdk.JVMInformation"
> │     ├── jvmArguments: [ "-Xmx4G", "-Djavax.net.ssl.keyStorePassword=supersecret", ... ]
> │     └── ...
> └── ...
>
> The rules are somewhat challenging, as they need to account for key/value
> pairs, arrays and simple fields (like the commandLine field).
> An example scrub file is available here:
> <https://gist.github.com/r1viollet/812ed70c6410c4f62640fd792570d36c>
> I'm happy to consider ways to simplify this proposal. Sample JFR files
> would also help us work out test cases.
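To make the three field shapes concrete, here is a hypothetical C++ sketch in which each shape gets its own helper and the sensitivity test is passed in as a predicate: a key/value pair (jdk.InitialSystemProperty), where the key decides but the value is redacted; an array (jvmArguments, as drawn above), scrubbed element by element; and a simple text field (commandLine), split on whitespace. The function names and the whitespace splitting are assumptions, not part of the scrub file format.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// The sensitivity test is injected so the same shape handling can be reused
// with hardcoded or user-supplied patterns (hypothetical design).
using Predicate = std::function<bool(const std::string&)>;

// Key/value shape: the decision is made on the key field, but it is the
// value field that gets redacted.
std::string scrub_key_value(const std::string& key, const std::string& value,
                            const Predicate& sensitive) {
  return sensitive(key) ? "[REDACTED]" : value;
}

// One "name=value" item: redact the part after '=' if the name is sensitive.
std::string scrub_assignment(const std::string& item, const Predicate& sensitive) {
  const std::size_t eq = item.find('=');
  if (eq != std::string::npos && sensitive(item.substr(0, eq))) {
    return item.substr(0, eq + 1) + "[REDACTED]";
  }
  return item;
}

// Array shape: scrub each element independently.
std::vector<std::string> scrub_array(std::vector<std::string> items,
                                     const Predicate& sensitive) {
  for (std::string& item : items) {
    item = scrub_assignment(item, sensitive);
  }
  return items;
}

// Simple text shape: split on whitespace, scrub each word, rejoin with
// single spaces.
std::string scrub_text(const std::string& text, const Predicate& sensitive) {
  std::istringstream in(text);
  std::string word;
  std::string out;
  while (in >> word) {
    if (!out.empty()) {
      out += ' ';
    }
    out += scrub_assignment(word, sensitive);
  }
  return out;
}
```

Keeping the predicate separate from the shape handling means the list of sensitive names can change without touching the per-event logic, which is where most of the expressiveness question seems to lie.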
> Regards,
>
> Erwan
>
>
> On Tue, Jun 3, 2025 at 11:50, Erik Gahlin <erik.gahlin at oracle.com>
> wrote:
>
> We have discussed it, but we don't understand all the details. We are also
> unsure how to best expose it to the end user. Let's say there was a command
> line option -XX:FlightRecorder:scrub-file=<file>.
>
> What would you fill that file with? I want examples that work on real data
> to understand how expressive the filters must be.
>
> Thanks
> Erik
> ------------------------------
> *From:* hotspot-jfr-dev <hotspot-jfr-dev-retn at openjdk.org> on behalf of
> Erwan Viollet <erwan.viollet at gmail.com>
> *Sent:* Monday, June 2, 2025 3:30 PM
> *To:* hotspot-jfr-dev at openjdk.org <hotspot-jfr-dev at openjdk.org>
> *Subject:* JFR: Scrubbing sensitive information from events
>
> Hello,
>
> I am currently looking into how to remove sensitive information from JFR
> events. The main events that typically contain sensitive information are
> jdk.SystemProcess, jdk.InitialSystemProperty and jdk.JVMInformation.
> Passwords passed on the command line typically end up in these events.
>
> Dropping these events altogether is not ideal, as we need them to make
> relevant performance recommendations to users (e.g. suggesting JVM or
> system setting adjustments).
>
> Dropping them or scrubbing them on the backend side (after the fact)
> requires decompressing and re-writing these events, which is wasteful in
> terms of both compute and storage. The approach is also imperfect, as we
> still end up ingesting and temporarily storing sensitive information.
>
> Ideally, we would like to be able to scrub or redact only the sensitive
> fields within these events (for example, using a simple regex or
> pattern-based rule), rather than dropping the whole event. We also want to
> avoid handling this only after the event has already been written to the
> JFR file, as that does not fully mitigate the risk of exposing sensitive
> data.
>
> At present, it appears there is no public API or supported mechanism to
> intercept or scrub JFR events in-process, before they are persisted. What
> would you think of an API accepting custom scrubbing patterns so that
> sensitive data never leaves the JVM in an unredacted state?
>
> Are there any plans or discussions in this area? I am fairly new to the
> JFR world, so it is likely that I missed previous discussions around this.
>
> Thank you, Best regards,
>
> Erwan Viollet,
>
> Profiling team, Datadog
>
>