ATOMIC_MOVE fails under high frequency conditions on Windows
Michael Osipov
michaelo at apache.org
Sun Feb 26 20:37:13 UTC 2023
Hi folks,
this is Michael from the Apache Maven team.
We have recently released Apache Maven 3.9.0 and shortly after received
this bug report: https://issues.apache.org/jira/browse/MRESOLVER-325
A short summary of the issue: Maven Resolver 1.9.x now uses same-dir
temp files with atomic moves to the target to reduce the concurrent
update window, in addition to a high-level locking layer. Before the
release we performed our testing on several POSIX-like platforms
(Fedora, macOS and FreeBSD) without any issue. An Eclipse (Tycho)
committer approached us and reported failures in GH Actions on Windows.
The failures happen when the same tracking file is updated for one
artifact from the same thread (no concurrency), but for different
classifiers (none, sources), within a very short period of time. Windows
fails with AccessDeniedException. I was able to reproduce it reliably on
different Windows machines with a standalone example.
Here it is:
> import java.io.IOException;
> import java.io.Writer;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
> import java.nio.file.StandardCopyOption;
>
> public class AtomicMove {
>
>     public static void main(String[] args) throws IOException, InterruptedException {
>
>         Path target = Paths.get("foo");
>
>         for (int i = 0; i < 100; i++) {
>             Path temp = Paths.get("foo." + i);
>             try (Writer w = Files.newBufferedWriter(temp)) {
>                 w.write("I was made for crashing you, baby!");
>             }
>             Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
>         }
>     }
>
> }
fails with:
> Exception in thread "main" java.nio.file.AccessDeniedException: foo.41 -> foo
>     at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83)
>     at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
>     at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
>     at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
>     at java.nio.file.Files.move(Files.java:1395)
>     at com.siemens.sb.sso.example.AtomicMove.main(AtomicMove.java:21)
All of this was observed with Zulu 8+, so 11 and 17 are affected as well
since the JDK code is identical.
The problem is this piece of code in the JDK:
> if (atomicMove) {
>     try {
>         MoveFileEx(sourcePath, targetPath, MOVEFILE_REPLACE_EXISTING);
>     } catch (WindowsException x) {
>         if (x.lastError() == ERROR_NOT_SAME_DEVICE) {
>             throw new AtomicMoveNotSupportedException(
>                 source.getPathForExceptionMessage(),
>                 target.getPathForExceptionMessage(),
>                 x.errorString());
>         }
>         x.rethrowAsIOException(source, target);
>     }
>     return;
> }
The MoveFileEx does not properly block until the operation completes and
some Windows-internal locking fails the operation. Even the
While investigating the problem, I found that it seems to be very common
on Windows with Jenkins, Kafka, etc., and Python folks have discussed the
issue as well (https://github.com/python/cpython/issues/53074).
Now, if I apply the following patch:
> - Files.move( tempFile, file, StandardCopyOption.ATOMIC_MOVE );
> + Kernel32.INSTANCE.MoveFileEx( tempFile.toAbsolutePath().toString(), file.toAbsolutePath().toString(),
> + new DWORD( WinBase.MOVEFILE_REPLACE_EXISTING | WinBase.MOVEFILE_WRITE_THROUGH ) );
It just works. MOVEFILE_WRITE_THROUGH guarantees that the operation
blocks until it is completed.
I know that Windows does *not* provide atomicity, granted, but it can
at least block the operation until completion.
Can you guys at least log a bug with JBS and have a look whether you
could set the flag for plagued Java users on Windows?
For now, we will revert the usage of same-dir temp files with atomic
moves on those tracking files:
https://github.com/apache/maven-resolver/pull/259/files
Note: replacing ATOMIC_MOVE with REPLACE_EXISTING yields another
exception since the JDK just does DeleteFile() and then MoveFileEx()
again, which leads to the same chicken-and-egg problem.
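For illustration only, an application-level stopgap could retry the move when it fails with AccessDeniedException. This is a minimal sketch under assumptions: the class RetryingAtomicMove, the helper moveWithRetry and its parameters (maxAttempts, pauseMillis) are hypothetical names and are not part of the JDK or Maven Resolver. Whether a short pause is enough to get past the failure on Windows is also an assumption, not something verified here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class RetryingAtomicMove {

    // Hypothetical helper: retries Files.move(..., ATOMIC_MOVE) when it fails
    // with AccessDeniedException, pausing a little longer before each new attempt.
    static void moveWithRetry(Path source, Path target, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        AccessDeniedException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
                return;
            } catch (AccessDeniedException e) {
                // Remember the failure and back off before the next attempt.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis * attempt);
                }
            }
        }
        // All attempts failed: surface the last AccessDeniedException.
        throw last;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Same loop as the reproducer above, but in a fresh temp directory
        // and going through the retrying helper.
        Path dir = Files.createTempDirectory("atomic-move");
        Path target = dir.resolve("foo");
        for (int i = 0; i < 100; i++) {
            Path temp = dir.resolve("foo." + i);
            Files.write(temp, ("revision " + i).getBytes(StandardCharsets.UTF_8));
            moveWithRetry(temp, target, 5, 10L);
        }
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

Such a retry only papers over the symptom in the caller; it does not change how the JDK invokes MoveFileEx.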
Many thanks,
Michael