ATOMIC_MOVE fails under high frequency conditions on Windows

Robert Muir rcmuir at gmail.com
Mon Feb 27 16:56:57 UTC 2023


On Sun, Feb 26, 2023 at 3:37 PM Michael Osipov <michaelo at apache.org> wrote:
> The MoveFileEx does not properly block until the operation completes and
> some Windows-internal locking fails the operation. Even the
> While investigating the problem, it seems to be very common on Windows
> with Jenkins, Kafka, etc. but also Python folks discussed the issue
> (https://github.com/python/cpython/issues/53074).
> Now, if I apply the following patch:
> > -                Files.move( tempFile, file, StandardCopyOption.ATOMIC_MOVE );
> > +                Kernel32.INSTANCE.MoveFileEx( tempFile.toAbsolutePath().toString(), file.toAbsolutePath().toString(),
> > +                    new DWORD( WinBase.MOVEFILE_REPLACE_EXISTING | WinBase.MOVEFILE_WRITE_THROUGH ) );
>
> It just works. MOVEFILE_WRITE_THROUGH guarantees that the operation
> blocks until it is completed.

Hi, we rely on this as well in Apache Lucene, and have some gnarly
tests around it, but we haven't seen this specific issue.
Maybe we dodge it because we always fsync() files before renaming
them? I don't see any synchronization to disk in your sample code.
Syncing could be a possible workaround for your problem, and IMO it
should generally be done anyway rather than relying on hacks like
ext4's auto_da_alloc mount option to do it for us:
https://docs.kernel.org/admin-guide/ext4.html
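
A minimal sketch of the fsync-before-rename pattern described above, using
only java.nio. The class and method names (DurableReplace, writeAndReplace)
are hypothetical and chosen for illustration; this is not the Lucene code,
and whether it avoids the Windows failure reported in this thread is
untested here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: write to a temp file, fsync it, then rename it
// over the target with ATOMIC_MOVE.
final class DurableReplace {

    static void writeAndReplace(Path target, byte[] data) throws IOException {
        // Create the temp file in the target's directory so the rename
        // stays on one file system, which ATOMIC_MOVE generally requires.
        Path dir = target.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "replace-", ".tmp");
        try {
            try (FileChannel ch = FileChannel.open(temp, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(data);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                // fsync: force file content and metadata to the storage device.
                ch.force(true);
            }
            // The channel is closed at this point, so this code holds no
            // open handle on the temp file when the rename happens.
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up if anything failed.
            Files.deleteIfExists(temp);
        }
    }
}
```

A call such as DurableReplace.writeAndReplace(file, bytes) would stand in
for the write-then-Files.move sequence in the patch quoted above. Note that
the Files.move documentation leaves it implementation specific whether
ATOMIC_MOVE replaces an existing target or throws an IOException.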

