ATOMIC_MOVE fails under high frequency conditions on Windows
Michael Osipov
michaelo at apache.org
Thu Mar 2 10:51:55 UTC 2023
On 2023-02-27 at 17:56, Robert Muir wrote:
> On Sun, Feb 26, 2023 at 3:37 PM Michael Osipov <michaelo at apache.org> wrote:
>> The MoveFileEx does not properly block until the operation completes and
>> some Windows-internal locking fails the operation. Even the
>> While investigating the problem, it seems to be very common on Windows
>> with Jenkins, Kafka, etc. but also Python folks discussed the issue
>> (https://github.com/python/cpython/issues/53074).
>> Now, if I apply the following patch:
>>> - Files.move( tempFile, file, StandardCopyOption.ATOMIC_MOVE );
>>> + Kernel32.INSTANCE.MoveFileEx( tempFile.toAbsolutePath().toString(), file.toAbsolutePath().toString(),
>>> + new DWORD( WinBase.MOVEFILE_REPLACE_EXISTING | WinBase.MOVEFILE_WRITE_THROUGH ) );
>>
>> It just works. MOVEFILE_WRITE_THROUGH guarantees that the operation
>> blocks until it is completed.
>
> Hi, we rely on this as well in Apache Lucene, and have some gnarly
> tests around it, but we haven't seen this specific issue.
> Maybe we dodge it because we always fsync() files before renaming
> them? I don't see any synchronization to disk in your sample code.
No, we haven't used fsync() so far; this collocated file stuff is pretty
new. We simply didn't expect this and never had such issues.
Can you guide me to the code spots in Lucene for the optimal behavior?
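
For reference, this is how I understand the fsync-before-rename approach,
as a minimal sketch and not Lucene's actual code. The class name
SyncedAtomicMove and the helper writeSyncAndMove are hypothetical, and
FileChannel.force(true) is assumed to be the Java counterpart of fsync()
here. Whether it actually avoids the failure on Windows is untested:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public final class SyncedAtomicMove {

    // Hypothetical helper: write the data to tempFile, force it to disk,
    // close the channel, and only then rename tempFile to target.
    static void writeSyncAndMove(Path tempFile, Path target, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(tempFile,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Assumption: this plays the role of fsync(); 'true' also
            // requests that file metadata be flushed.
            ch.force(true);
        }
        // The channel is closed at this point, so no handle of ours is
        // still open on tempFile when the move is attempted.
        Files.move(tempFile, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("atomic-move-demo");
        Path temp = dir.resolve("data.tmp");
        Path target = dir.resolve("data.bin");
        writeSyncAndMove(temp, target, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

The temp file and the target sit in the same directory here, since
ATOMIC_MOVE is only expected to work within one file system.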
M