[crac] RFR: Move more FD tracking to java layer [v2]

Radim Vansa duke at openjdk.org
Mon Jun 12 08:29:12 UTC 2023

On Fri, 9 Jun 2023 16:33:49 GMT, Anton Kozlov <akozlov at openjdk.org> wrote:

>> The PR develops the idea of file descriptors tracking in Java started in #43. In general, that PR had two purposes. First, it provides CheckpointExceptions in terms that are clear for Java developers, improving the experience of developing for CRac. So if a FileDescriptor causes an exception, it's possible to look at the heap dump and find references to the offending FD, or to look at the stack trace when FD was created. And second, Java FD tracking is independent of the platform, so that was the first step to bring CRaC to non-Linux platforms, but that is a bit longer road.
>> We can eliminate manual heap inspection, and this is proposed in this PR. A FileDescriptor does not exist on its own but it is owned by some higher-level Java object implementation. So an object can "claim" a FileDescriptor and define how and if to report the FD to the user. E.g. Socket can describe the its port and address without deep inspection of the process internals. Turns out, Socket.toString() provides enough information (but the reporting can be extended later if required).
>> 	Suppressed: jdk.crac.impl.CheckpointOpenSocketException: Socket[addr=localhost/,port=39957,localport=41464]
>> 		at java.base/java.net.SocketImpl$SocketResource.lambda$beforeCheckpoint$0(SocketImpl.java:123)
>> 		at java.base/jdk.crac.Core.lambda$checkpointRestore1$0(Core.java:128)
>> 		... 7 more
>> A FileDescriptor is claiming itself in case there is a bug in JDK that no higher-level object is claiming the FD. FD provides just a very short description just for debugging. With stack trace to FD (which is a very nice debugging aid!), that should be enough to find the containing object and implement claiming.
>> I believe this overlaps with #69, which at first glance would benefit a lot from being able to define policies in the domain objects. I'll comment on this after a closer look at the other PR.
> Anton Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 95 commits:
>  - Merge remote-tracking branch 'jdk/crac/crac' into newfd
>  - Fix a bug in classPathMatching
>  - Remove misleading UnreachableSocket
>  - Merge remote-tracking branch 'jdk/crac/crac' into newfd
>  - Cleanup
>  - Remove AcceptingSocketResource
>    Does not need to exist
>  - Cleanup and extend FileDescriptor nativeDesc
>  - Update
>  - Workaround for the blocking context
>  - Claim closing socket
>  - ... and 85 more: https://git.openjdk.org/crac/compare/a282698d...bcfd07a4

> I'm really not sure about Windows. With this, Socket details will be reported on Windows without additional code for reporting.

Being able to retrieve details from the Socket class is certainly nice, though you haven't removed the socket-describing code and you won't, as there are cases where sockets are created natively without the Socket class and you still want to show info about those.

> That requires additional code in the FD that needs to know all possible file descriptors types on the system, and new interfaces to extract information from them, like you had to do with Sockets.

... and this is present again, but this time in the native code. I was trying to extract the bits in a form that Java understands, though getting them was not a one-liner in the native, so I understand that it might be tempting to just format that in native as well and have fewer native methods. I was hoping for eventual reuse.

> Since when this becomes a prerequisite? :) Refactorings and non-functional changes are fine for CRaC Project.

Two points:
- having a case where this is actually helpful explains the general reasoning behind the refactoring
- the example can help verifying that the changes are complete (to an extent)

> Before the patch, claimFdWeak and claimFd were proto versions of this claiming. As you can see, they required "external" ordering between multiple claimers, thus we had PRE_FILE_DESCRIPTOR and NATIVE_PRNG priorities before FILE_DESRIPTOR, to override FileDescriptors throwing an exception. That did not scale well.

The NATIVE_PRNG priority is still present, do you expect to remove those in a follow up?

I have realized that this is removing some technical debt part of the `claimFd`/`claimFdWeak` part, and I am good with tracking the ownership graph. However I am worried that while you have 'simplified' some parts in FileDescriptor #69 would reintroduce those, or be forced to pass the FD state info in an opaque way.

src/java.base/share/classes/java/io/FileDescriptor.java line 67:

> 65:         public void beforeCheckpoint(Context<? extends jdk.crac.Resource> context) throws Exception {
> 66:             if (!closedByNIO && valid()) {
> 67:                 ClaimedFDs ctx = Core.getClaimedFDs();

nit: not a context anymore

Please check other uses of `getClaimedFDs` as well.

src/java.base/share/classes/jdk/crac/impl/ExceptionHolder.java line 30:

> 28:     }
> 29: 
> 30:     public void reSuppress(Exception e) {

Was thinking about a better name... `adoptSuppressed`?

src/java.base/share/classes/jdk/internal/crac/ClaimedFDs.java line 50:

> 48:     }
> 49: 
> 50:     public static class Tuple<T1, T2> {

This class would deserve a name, esp. since it's used outside of this class and doesn't need to be generic.

src/java.base/share/classes/jdk/internal/crac/ClaimedFDs.java line 62:

> 60:     }
> 61: 
> 62:     public void claimFd(FileDescriptor fd, Object claimer, Supplier<Exception> supplier, Object... suppressedClaimers) {

Knowing all potential claimers seems limiting, this would require breaking the encapsulation in a more complex chain.
I suggest to change this to `claim(Object claimed, Object claimer, Suppler<Exception> supplier)` and then assume that `claimed == suppressedClaimer`. The implementation would handle the `claimed == claimer` case for FD a bit differently, but e.g. `ZipFile` would not claim FD itself, but only the `RandomAccessFile zfile`.
All claims would be recorded (not overwriting anything) and the final claimant would be resolved only after all notifications complete.

src/java.base/unix/native/libjava/FileDescriptor_md.c line 117:

> 115: }
> 116: 
> 117: static const char* family2str(int family) {

I don't think this belongs to FileDescriptor. You have tried to let Java part of FileDescriptor not know about its type, but only by moving socket-specific handling to native code which contradicts the title of this PR.


PR Review: https://git.openjdk.org/crac/pull/79#pullrequestreview-1474173204
PR Review Comment: https://git.openjdk.org/crac/pull/79#discussion_r1226195086
PR Review Comment: https://git.openjdk.org/crac/pull/79#discussion_r1226178016
PR Review Comment: https://git.openjdk.org/crac/pull/79#discussion_r1226179869
PR Review Comment: https://git.openjdk.org/crac/pull/79#discussion_r1226192993
PR Review Comment: https://git.openjdk.org/crac/pull/79#discussion_r1226205671

More information about the crac-dev mailing list