Proposal: Expand Filesystem JFR Events

Marius Volkhart marius at volkhart.com
Sat Apr 1 23:55:05 UTC 2023


Hi Erik,

Thanks for reviewing my proposal. I think having the events disabled by default is very reasonable. In JDK 20, JFR events were added for cryptography provider usage: https://seanjmullan.org/blog/2023/03/22/jdk20. These events also have a very short duration, are likely to be quite noisy for certain applications, and are disabled by default.

While I agree that many users are unlikely to turn these events on in production, especially initially, CI/development/QA use cases could really benefit from these types of insights.
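
As a rough illustration of what opt-in could look like, here is a minimal sketch using the public jdk.jfr API. The event name example.FileCheck, the class and method names, and the path field are all made up for this example; they are not part of the JDK or of my prototype. The event is marked as disabled by default and is only recorded once a recording enables it explicitly, which is what a CI or QA job could do.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

import jdk.jfr.Enabled;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;

public class OptInEventSketch {

    // Hypothetical event: the name and field are illustrative, not a JDK event.
    @Name("example.FileCheck")
    @Label("File Check")
    @Enabled(false) // off unless a recording or settings file enables it
    static class FileCheckEvent extends Event {
        @Label("Path")
        String path;
    }

    // Wraps a single existence check in the hypothetical event.
    static boolean instrumentedExists(Path p) {
        FileCheckEvent event = new FileCheckEvent();
        event.begin();
        boolean exists = Files.exists(p);
        event.end();
        if (event.shouldCommit()) { // false while the event is disabled
            event.path = p.toString();
            event.commit();
        }
        return exists;
    }

    // Enables the event for one recording, performs n checks, and returns
    // how many example.FileCheck events ended up in the dumped recording.
    public static long recordChecks(int n) {
        try (Recording recording = new Recording()) {
            recording.enable(FileCheckEvent.class).withoutThreshold();
            recording.start();
            for (int i = 0; i < n; i++) {
                instrumentedExists(Path.of("no-such-file-" + i));
            }
            recording.stop();
            Path dump = Files.createTempFile("fs-events", ".jfr");
            try {
                recording.dump(dump);
                return RecordingFile.readAllEvents(dump).stream()
                        .filter(e -> e.getEventType().getName().equals("example.FileCheck"))
                        .count();
            } finally {
                Files.deleteIfExists(dump);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the enable(...) call, shouldCommit() returns false and the checks run without emitting anything, so applications that never opt in would not see these events in their recordings.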

My reply has taken a while as I wanted to spend some time exploring alternatives after your reply. For example, I looked into using the JMC Agent to instrument the IO I was interested in. This is doable, but it’s really quite tedious and likely to break arbitrarily between JDK releases, as it requires an understanding of JDK internal classes and modification of the boot classpath. I wrote a blog post about my process and experience if you’re interested in the details: https://mariusvolkhart.github.io/Diagnosing-More-Java-Disk-IO.html

I also looked at implementing a FileSystemProvider that instruments calls, and replacing the default FileSystem in use. However, this is unsatisfactory as it:
- only covers the NIO APIs, not the IO variants
- feels like a weird level to instrument at, given alternative FileSystems exist backed by cloud Object Stores, in-memory filesystems for testing, etc.
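
To make the first point concrete, here is a small sketch of the same existence check going through the two API families. The class and method names are hypothetical and only serve the illustration. The NIO call is dispatched through the Path's FileSystemProvider, so a wrapping provider would observe it; the java.io.File call has no provider to hook into and would go unnoticed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TwoApiFamilies {

    // NIO: Files.exists dispatches through p.getFileSystem().provider(),
    // which is where an instrumenting FileSystemProvider would sit.
    public static boolean existsViaNio(Path p) {
        return Files.exists(p);
    }

    // java.io: File.exists() does not involve a FileSystemProvider at all,
    // so a provider-level wrapper never sees this call.
    public static boolean existsViaIo(Path p) {
        return p.toFile().exists();
    }

    // Scheme of the provider that serves the NIO call ("file" for the default one).
    public static String providerScheme(Path p) {
        return p.getFileSystem().provider().getScheme();
    }

    // Runs both checks against a temporary file and reports whether they agree.
    public static boolean bothSeeTempFile() {
        try {
            Path tmp = Files.createTempFile("two-apis", ".tmp");
            try {
                return existsViaNio(tmp) && existsViaIo(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both calls answer the same question about the same file, but only the first one is reachable from a custom provider.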

As a result, I think there is unique value in adding these events to the JDK directly. Disk interactions are a unique service provided by the JVM, the details of which are intentionally abstracted away from applications. It really feels appropriate for the JVM to also make available insights into how that service is used.

Cheers,
Marius 

> On Mar 12, 2023, at 19:25, Erik Gahlin <erik.gahlin at oracle.com> wrote:
> 
> Hi Marius,
> 
> The events you describe have a very short duration, so they can't be thresholded as easily as other I/O events where a minimum duration can be set. Without a means to reduce the number of events, some applications could emit millions of them, flooding the buffers and potentially creating more than 1% overhead in the default configuration. If the events are not enabled in the default configuration, few will use them, and their value will be limited.
> 
> There are plans (and some ongoing work) to introduce a throttling mechanism for Java events, for example, limit to at most 300 events/second, but we are not there yet.
> 
> Cheers,
> Erik
> From: hotspot-jfr-dev <hotspot-jfr-dev-retn at openjdk.org> on behalf of Marius Volkhart <marius at volkhart.com>
> Sent: Tuesday, March 7, 2023 7:01 PM
> To: hotspot-jfr-dev at openjdk.org <hotspot-jfr-dev at openjdk.org>
> Subject: Proposal: Expand Filesystem JFR Events
>  
> Hi devs,
> 
> OpenJDK already has built-in JFR events for file reads and writes. However, there are a variety of additional file system interactions that are not instrumented, despite their potential impact on application performance. I would like to contribute changes that expand the built-in instrumentation to cover these interactions. I am in search of a sponsor and of interested folks who might have opinions or use cases they can share.
> 
> Examples of these interactions include checking if the file exists, creating a directory, creating a symlink, and reading file attributes.
> 
> One of the design questions for something like this is, which Event to use? For example, reading and writing file attributes could be reported as a jdk.FileRead and jdk.FileWrite. However, this might muddy the waters, especially as we get to interactions like creating a directory or symlink. Here, the byte count is unknown.
> 
> Alternatively, a new event could be created. This event might report only that a file interaction occurred, but not go into further details about data count. This is likely still enough information for developers to act upon.
> 
> A concrete example comes from my day job. The desktop application in question works with user files, and customers of this app are very IO-sensitive, due to frequently using network shares and virtualized desktops. A recent dependency update caused a jump in IOPS. This was caught pre-release by Windows OS tooling, but JFR didn’t show any immediate problems. However, digging into the changes revealed that the new dependency version was calling Files.exists(Path), Files.isRegularFile(Path), and getting file attributes far more often than the previous version.
> 
> I’ve done a bit of prototyping and research on the JFR code. At a high level, I think these changes include
> - Instrumenting various subclasses of sun.nio.fs.AbstractFileSystemProvider
> - Instrumenting various subclasses of java.io.FileSystem
> - Modifying JFR instrumentation code to handle the differences between these types and previously instrumented types. Differences include platform-specific implementation classes, and leaf-classes that don’t define all the methods being instrumented.
> 
> I have OpenJDK building and have used jtreg before, but this would be only my second contribution, so I am very much still a beginner on OpenJDK tooling and process. Still, if folks show interest and a sponsor presents themselves, I’m happy to break down this admittedly large-scoped proposal into smaller chunks with more details and concrete questions/decisions.
> 
> Cheers,
> Marius Volkhart