[crac] RFR: Move more FD tracking to java layer [v4]

Anton Kozlov akozlov at openjdk.org
Mon Jun 12 13:48:15 UTC 2023

On Mon, 12 Jun 2023 13:15:05 GMT, Anton Kozlov <akozlov at openjdk.org> wrote:

>> The PR develops the idea of file descriptors tracking in Java started in #43. In general, that PR had two purposes. First, it provides CheckpointExceptions in terms that are clear for Java developers, improving the experience of developing for CRac. So if a FileDescriptor causes an exception, it's possible to look at the heap dump and find references to the offending FD, or to look at the stack trace when FD was created. And second, Java FD tracking is independent of the platform, so that was the first step to bring CRaC to non-Linux platforms, but that is a bit longer road.
>> We can eliminate manual heap inspection, and this is proposed in this PR. A FileDescriptor does not exist on its own but it is owned by some higher-level Java object implementation. So an object can "claim" a FileDescriptor and define how and if to report the FD to the user. E.g. Socket can describe the its port and address without deep inspection of the process internals. Turns out, Socket.toString() provides enough information (but the reporting can be extended later if required).
>> 	Suppressed: jdk.crac.impl.CheckpointOpenSocketException: Socket[addr=localhost/,port=39957,localport=41464]
>> 		at java.base/java.net.SocketImpl$SocketResource.lambda$beforeCheckpoint$0(SocketImpl.java:123)
>> 		at java.base/jdk.crac.Core.lambda$checkpointRestore1$0(Core.java:128)
>> 		... 7 more
>> A FileDescriptor is claiming itself in case there is a bug in JDK that no higher-level object is claiming the FD. FD provides just a very short description just for debugging. With stack trace to FD (which is a very nice debugging aid!), that should be enough to find the containing object and implement claiming.
>> I believe this overlaps with #69, which at first glance would benefit a lot from being able to define policies in the domain objects. I'll comment on this after a closer look at the other PR.
> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>   reSuppress -> resuppress

> I am not talking about the native _detection_, but rather about this 'minimal descriptor'. That's the part that has to know about FD types, and moving that to native does not make it any simpler.

Minimal FD reporting can be removed. I don't wan't to do this right now, as it's very likely java reporting does not cover many cases. But once we've get more java reporting, we'll be able to remove file descriptor altogether. 

Although, I can remove it right now https://github.com/openjdk/crac/pull/79#discussion_r1226574946

> > There should be no major drawback. Can you see something?
> Let's keep the test with FileReader please. I hate removing tested scenarios; the outcome could change (accompanied with a comment) but it's rare where the behaviour does not make sense after the refactoring.

Exception in thread "main" jdk.crac.CheckpointException
	Suppressed: jdk.crac.impl.CheckpointOpenResourceException: FileDescriptor 4: regular: /etc/passwd
		at java.base/java.io.FileDescriptor$Resource.lambda$beforeCheckpoint$0(FileDescriptor.java:76)
		at java.base/jdk.crac.Core.checkpointRestore1(Core.java:141)
		at java.base/jdk.crac.Core.checkpointRestore(Core.java:266)
		at java.base/jdk.crac.Core.checkpointRestore(Core.java:245)
		at OpenFileDetectionTest.exec(OpenFileDetectionTest.java:53)
		at jdk.test.lib.crac.CracTest.run(CracTest.java:157)
		at jdk.test.lib.crac.CracTest.main(CracTest.java:89)
	Caused by: java.lang.Exception: This file descriptor was created by main at epoch:1686575586305 here
		at java.base/jdk.internal.crac.JDKFdResource.<init>(JDKFdResource.java:25)
		at java.base/java.io.FileDescriptor$Resource.<init>(FileDescriptor.java:61)
		at java.base/java.io.FileDescriptor.<init>(FileDescriptor.java:87)
		at java.base/java.io.FileInputStream.<init>(FileInputStream.java:154)
		at java.base/java.io.FileInputStream.<init>(FileInputStream.java:111)
		at java.base/java.io.FileReader.<init>(FileReader.java:60)
		at OpenFileDetectionTest.exec(OpenFileDetectionTest.java:52)
		... 2 more

The existing claiminig code in the RandomFileAccess does nothing in the current code state, as FileInputStream manages its own FileDescriptor. Reverting FileReader in the test would require a new implementation in FileInputStream and extending the test, as that does not cover FileInputStream created with fd. So test was not adequate to the implementation, but now it is. FileReader/FileInputStream are better to be done separately, without bloating this PR.

> > I think rebasing #69 on top of this would benefit the implementations of the policies, it will make them cleaner and more roboust. E.g. policy implementaiton in File can also prevent access to FileDescriptor with CLOSE policy, during small time window between the point FD is closed, but checkpoint is not started yet.
> The way you made `getPath` abstract, expecting the owner of FD to know it has several effects:
> * You'd need to update every use of FD rather than having this in one place.

Yes, this is a design. Repeating code is abstracted into utillity classes, e.g. JDKFileResrouce, but a more specific File class can implement something different compared to the default.

> * Some information (flags, offsets) could not be reproduced reliably, would be complicated or would have to be based on defaults, rather than the actual value. E.g. logical offset of reading differs from physical offset, there's buffering in place (in libc) - I ran into something like this in https://github.com/openjdk/crac/pull/69/files#diff-ac18cee51b4abf767b1ad66d127afe6e609aaef2e66349032ce23635efa6457aR59 (that's why the tested content is past 1MB - I don't remember what was the actual size of the buffer).

That's interesting, but anyway should not be handled by FileDescriptor.


PR Comment: https://git.openjdk.org/crac/pull/79#issuecomment-1587375100

More information about the crac-dev mailing list