InputStream/OutputStream concurrency guarantees

Thu Oct 30 21:39:46 UTC 2025

* Archie Cobbs:

> I think this is still missing the bigger picture. I.e., what is
> "internally consistent" supposed to mean? As Pavel said:
>
>> the array may be changed while it's being copied, which is no
>> different from it being changed while it is being written from.
>
> From a "consistency" point of view, I think the array behaves
> essentially as a bunch of unrelated 8-bit quantities, all of which are
> subject to change, in any order. Even if the adversarial writers were
> writing in some order, the OutputStream method won't necessarily
> observe those writes in that same order (actually I am not 100%
> certain, but I'm pretty sure the memory model doesn't guarantee
> that). So it doesn't matter if OutputStream happens to read the array
> from 0 to N - that doesn't really impose any "order" so to speak.

Okay, maybe I should write down why I'm asking these questions.

I'm trying to imagine a way to get rid of the double buffering in
FileInputStream/FileOutputStream: have the kernel access the heap
directly, instead of a copy.  The naïve approach with critical byte
array operations doesn't work because it won't meet pause time
expectations.  On the other hand, it seems perfectly safe to let the
kernel read or write the array during the GC marking phase because the
two kinds of accesses do not interfere at all.  Evacuation while a
FileInputStream is reading is trickier because it's in principle
observable: the evacuated object would have to be copied over again once
the read comes out of the kernel.
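
To make the double buffering concrete, here is a rough model of the
write path in plain Java.  It is only a sketch: the class and method
names (DoubleBufferSketch, stage, drain, writeViaCopy) are made up for
illustration, the direct ByteBuffer merely stands in for the off-heap
buffer used by the native code, and the OutputStream stands in for the
kernel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical model of the double buffering; not the JDK's native code.
final class DoubleBufferSketch {

    // Step 1: copy the requested range out of the heap array.  Only this
    // step touches the array, so the array may move or change afterwards.
    static ByteBuffer stage(byte[] src, int off, int len) {
        ByteBuffer offHeap = ByteBuffer.allocateDirect(len);
        offHeap.put(src, off, len);
        offHeap.flip();
        return offHeap;
    }

    // Step 2: stand-in for the write system call, which reads only the
    // staged off-heap bytes.
    static void drain(ByteBuffer staged, OutputStream sink) throws IOException {
        byte[] chunk = new byte[staged.remaining()];
        staged.get(chunk);
        sink.write(chunk);
    }

    static void writeViaCopy(OutputStream sink, byte[] src, int off, int len)
            throws IOException {
        drain(stage(src, off, len), sink);
    }

    public static void main(String[] args) throws IOException {
        byte[] src = {1, 2, 3, 4, 5};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ByteBuffer staged = stage(src, 1, 3);
        src[2] = 99;  // a racing writer after the copy is not observed
        drain(staged, sink);
        System.out.println(Arrays.toString(sink.toByteArray()));  // prints [2, 3, 4]
    }
}
```

Removing the copy means merging the two steps, which is exactly what
makes the array's address and contents matter for the whole duration of
the system call.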

However, the real challenge is to get out of the kernel when it is time
to make the old object available for reuse (which may happen pretty
quickly with in-place compaction used by some GCs).  I believe it is
possible to achieve this by sending a signal that is already handled by
the JVM.  There is an inherent race condition there because the required
behavior differs somewhat depending on whether the system call has not
yet started, has started but was interrupted by the signal, has
concluded, or has concluded and the thread previously doing I/O is now
stuck at the safepoint.  It is possible to tell these cases apart if the
JVM does a direct system call (not through libc) from a machine code
sequence with a known layout, and the signal handler examines the
program counter.  (A single system call helper could handle all system
calls, if the system call number is an additional parameter.)  This is
how POSIX thread cancellation is nowadays implemented in musl and glibc,
as it's the only way to figure out if it is safe to act on the
cancellation.
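
The program counter check could look roughly like the following.  The
real thing would be C or assembly inside the signal handler; this is
just a Java model of the decision, and the names (SyscallPcSketch,
Phase, classify) and the three addresses are hypothetical.  The exact
boundaries depend on the architecture and on where the kernel leaves
the program counter for an interrupted call, so treat the ranges as
assumptions.

```java
// Hypothetical model of a signal handler classifying the interrupted
// program counter against a system call helper with a known layout.
final class SyscallPcSketch {

    enum Phase {
        NOT_IN_HELPER,   // thread is elsewhere, e.g. already at the safepoint
        BEFORE_SYSCALL,  // arguments loaded, kernel not (or no longer) entered
        AFTER_SYSCALL    // kernel has returned; result not yet consumed
    }

    // helperStart:  first instruction of the helper (assumed known)
    // afterSyscall: address right after the system call instruction
    // helperEnd:    first address past the helper
    static Phase classify(long pc, long helperStart, long afterSyscall,
                          long helperEnd) {
        if (pc < helperStart || pc >= helperEnd) {
            return Phase.NOT_IN_HELPER;
        }
        if (pc < afterSyscall) {
            return Phase.BEFORE_SYSCALL;
        }
        return Phase.AFTER_SYSCALL;
    }

    public static void main(String[] args) {
        // Made-up addresses, for illustration only.
        long start = 0x1000L, after = 0x1010L, end = 0x1020L;
        System.out.println(classify(0x1004L, start, after, end));
        System.out.println(classify(0x1010L, start, after, end));
        System.out.println(classify(0x2000L, start, after, end));
    }
}
```

In the AFTER_SYSCALL case, telling an interrupted call from a completed
one would presumably also need a look at the saved return value
register, which this model leaves out.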

(All of this is of course very Linux-specific.)

Thanks,
Florian