RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)
Chris Hegarty
chris.hegarty at oracle.com
Fri Nov 8 07:35:10 PST 2013
On 08/11/2013 15:01, roger riggs wrote:
> Hi,
>
> Does renaming the file/directory suffer the same delay?
I have not tried, but I read that MoveFileEx does not suffer from this.
> I could see a cleanup mechanism that renames them to hidden files (or
> entirely out of the work directory)
> and then deletes them. That would immediately clear the namespace for
> tests to proceed.
Given the above, I do think that this idea has potential, but I
haven't looked into it further yet. More investigation is needed.
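
For illustration only, the rename-then-delete idea might look something
like the sketch below. The class name, the helper renameThenDelete and the
".deleted-" prefix are hypothetical and not part of the webrev. Whether the
rename succeeds on Windows while another process holds the file open is an
assumption that has not been verified here. It handles a single file or an
empty directory only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RenameThenDelete {
    // Hypothetical helper: move the target to a unique sibling name first,
    // so the original name is free for reuse even if the delete of the
    // renamed entry is still pending on the platform.
    static void renameThenDelete(Path target) throws IOException {
        // Assumes the target has a parent directory (i.e. is not a root).
        Path parent = target.toAbsolutePath().getParent();
        Path tmp = parent.resolve(
                ".deleted-" + System.nanoTime() + "-" + target.getFileName());
        Files.move(target, tmp);   // frees the original name
        Files.delete(tmp);         // may complete later on Windows
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-demo");
        Path f = dir.resolve("test.jar");
        Files.write(f, new byte[] {1, 2, 3});

        renameThenDelete(f);

        // The original name can be reused straight away.
        Files.write(f, new byte[] {4});
        System.out.println("recreated: " + Files.exists(f));

        Files.delete(f);
        Files.delete(dir);
    }
}
```

A test would call renameThenDelete(path) in place of Files.delete(path)
before re-creating a file with the same name.
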
We have used the retry technique in a few places in the jdk tests. All I
am trying to do here is prevent everyone from writing their own version
of this.
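
As a rough sketch of the retry technique discussed in this thread (not the
code in the webrev): retry when the delete throws, keep waiting when the
delete "succeeds" but the file is still there, and throw an IOException
with the InterruptedException as the cause if the sleep is interrupted. The
class name, MAX_ATTEMPTS and RETRY_DELAY_MS are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleteWithRetry {
    // Illustrative values only; the webrev may use different ones.
    private static final int MAX_ATTEMPTS = 10;
    private static final long RETRY_DELAY_MS = 500;

    static void deleteFileWithRetry(Path path) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            try {
                Files.deleteIfExists(path);
                // The delete call returned; only stop once the file is
                // confirmed gone, since the platform delete may be pending.
                if (Files.notExists(path)) {
                    return;
                }
            } catch (IOException e) {
                // e.g. the file is in use by another process; retry.
                last = e;
            }
            try {
                Thread.sleep(RETRY_DELAY_MS);
            } catch (InterruptedException ie) {
                // Surface the interrupt instead of returning quietly.
                // The interrupt status is not restored in this sketch.
                IOException ioe =
                        new IOException("Interrupted while deleting " + path, ie);
                if (last != null) {
                    ioe.addSuppressed(last);
                }
                throw ioe;
            }
        }
        throw new IOException("Unable to delete " + path + " after "
                + MAX_ATTEMPTS + " attempts", last);
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("retry-demo", ".tmp");
        deleteFileWithRetry(f);
        System.out.println("deleted: " + Files.notExists(f));
    }
}
```
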
Maybe we could go with what I have for now (pending reviews), and
revisit later, if needed. I'm scared to open a discussion on where to
move test files to ;-)
-Chris.
>
> That technique should work on all platforms.
>
> Roger
>
> On 11/8/2013 9:47 AM, Chris Hegarty wrote:
>> Alan,
>>
>> > An alternative might be to just throw the IOException with
>> > InterruptedException as the cause.
>>
>> Perfect. Updated in the new webrev.
>>
>> Dan,
>>
>> You are completely correct. I was only catering for the case where
>> the delete fails with
>> "java.nio.file.FileSystemException: <your_file>: The process cannot
>> access the file because it is being used by another process."
>>
>> Where the delete "succeeds", we then need to wait until the underlying
>> platform delete completes, i.e. the file no longer exists.
>>
>> Updated webrev ( with only the diff from the previous ) :
>> http://cr.openjdk.java.net/~chegar/fileUtils.02/webrev/
>>
>> Thanks,
>> -Chris.
>>
>>
>> On 08/11/2013 02:26, Dan Xu wrote:
>>>
>>> On 11/07/2013 11:04 AM, Alan Bateman wrote:
>>>> On 07/11/2013 14:59, Chris Hegarty wrote:
>>>>> Given both Michael's and Alan's comments, I've updated the webrev:
>>>>> http://cr.openjdk.java.net/~chegar/fileUtils.01/webrev/
>>>>>
>>>>> 1) more descriptive method names
>>>>> 2) deleteXXX methods return if interrupted, leaving the
>>>>> interrupt status set
>>>>> 3) Use Files.copy with REPLACE_EXISTING
>>>>> 4) Use SimpleFileVisitor, rather than FileVisitor
>>>>>
>>>> This looks better although interrupting the sleep means that the
>>>> deleteXXX will quietly terminate with the interrupt status set (which
>>>> could be awkward to diagnose if used with tests that are also using
>>>> Thread.interrupt). An alternative might be to just throw the
>>>> IOException with InterruptedException as the cause.
>>>>
>>>> -Alan.
>>>>
>>>>
>>> Hi Chris,
>>>
>>> In the method deleteFileWithRetry0(), it is assumed that if any other
>>> process is accessing the same file, the delete operation,
>>> Files.delete(), will throw an IOException on Windows. But when I
>>> investigated these intermittent test failures, I found that this
>>> assumption does not always hold.
>>>
>>> When Files.delete() is called, it eventually calls the DeleteFile or
>>> RemoveDirectory function, depending on whether the target is a file or
>>> a directory. These Windows APIs only mark the target for deletion on
>>> close and return immediately, without waiting for the operation to
>>> complete. If another process is accessing the file in the meantime, the
>>> delete does not occur and the target file stays in delete-pending
>>> status until that open handle is closed. In effect, DeleteFile and
>>> RemoveDirectory behave like asynchronous operations.
>>> Therefore, we cannot assume that the file/directory is deleted after
>>> Files.delete() returns or File.delete() returns true.
>>>
>>> When checking those intermittent test failures, I found that the
>>> Files.delete() call normally succeeds. But due to interference from
>>> anti-virus software or other Windows daemon services, the target
>>> file goes into delete-pending status. The immediately following
>>> operation then fails because the target file still exists, while our
>>> tests assume it is already gone. Because the delete-pending status of a
>>> file usually lasts for only a very short time, which depends on the
>>> interference source, such failures normally happen when we recursively
>>> delete a folder, or delete and re-create a file with the same name, at
>>> a high frequency.
>>>
>>> It is basically a Windows API design or implementation issue. I have
>>> logged an enhancement, JDK-8024496, to solve it in the Java library
>>> layer. Currently, I have two strategies in mind. One is to make the
>>> delete operation blocking, i.e. to make sure the file/directory is
>>> deleted before returning. The other is to make sure that a
>>> delete-pending file does not cause subsequent file operations to fail.
>>> But both have pros and cons.
>>>
>>> Thanks!
>>>
>>> -Dan
>