RFR: 8022213 Intermittent test failures in java/net/URLClassLoader (Add jdk/testlibrary/FileUtils.java)

Dan Xu dan.xu at oracle.com
Mon Nov 11 09:55:45 PST 2013


Thank you, Chris. The rest looks good to me. Thanks for helping solve this
hard problem!

-Dan

On 11/11/2013 02:41 AM, Chris Hegarty wrote:
> Thanks Dan,
>
> I'll make the changes before pushing.
>
> -Chris.
>
> On 09/11/2013 05:43, Dan Xu wrote:
>> Hi Chris,
>>
>> In deleteFileWithRetry0(), the following lines
>>
>> 79 while (true) {
>> 80 if (Files.notExists(path))
>> 81 break;
>>
>> can be combined to
>>
>> while (Files.exists(path)) {
>> ...
>>
>> And L99
>>
>> 99                Thread.sleep(RETRY_DELETE_MILLIS);
>>
>> does not seem to be indented correctly.
>>
>> Thanks,
>>
>> -Dan
>>
>> On 11/08/2013 07:35 AM, Chris Hegarty wrote:
>>> On 08/11/2013 15:01, roger riggs wrote:
>>>> Hi,
>>>>
>>>> Does renaming the file/directory suffer the same delay?
>>>
>>> I have not tried, but I read that MoveFileEx does not suffer from this.
>>>
>>>> I could see a cleanup mechanism that renames them to hidden files (or
>>>> entirely out of the work directory)
>>>> and then deletes them. That would immediately clear the namespace for
>>>> tests to proceed.
>>>
>>> Given the above, then I do think that this idea has potential, but I
>>> haven't looked into it further, yet. More investigation needed.
>>>
>>> We have used the retry technique in a few places in the jdk tests. All
>>> I am trying to do here is prevent everyone from writing their own
>>> version of this.
>>>
>>> Maybe we could go with what I have for now (pending reviews), and
>>> revisit later, if needed. I'm scared to open a discussion on where to
>>> move test files to ;-)
>>>
>>> -Chris.
>>>
>>>>
>>>> That technique should work on all platforms.
>>>>
>>>> Roger
>>>>
>>>> On 11/8/2013 9:47 AM, Chris Hegarty wrote:
>>>>> Alan,
>>>>>
>>>>> > An alternative might be to just throw the IOException with
>>>>> > InterruptedException as the cause.
>>>>>
>>>>> Perfect. Updated in the new webrev.
>>>>>
>>>>> Dan,
>>>>>
>>>>> You are completely correct. I was only catering for the case of
>>>>> "java.nio.file.FileSystemException: <your_file>: The process cannot
>>>>> access the file because it is being used by another process."
>>>>>
>>>>> Where the delete "succeeds", we also need to wait until the underlying
>>>>> platform delete completes, i.e. the file no longer exists.
>>>>>
>>>>> Updated webrev ( with only the diff from the previous ) :
>>>>> http://cr.openjdk.java.net/~chegar/fileUtils.02/webrev/
>>>>>
>>>>> Thanks,
>>>>> -Chris.
>>>>>
>>>>>
>>>>> On 08/11/2013 02:26, Dan Xu wrote:
>>>>>>
>>>>>> On 11/07/2013 11:04 AM, Alan Bateman wrote:
>>>>>>> On 07/11/2013 14:59, Chris Hegarty wrote:
>>>>>>>> Given both Michael's and Alan's comments, I've updated the webrev:
>>>>>>>> http://cr.openjdk.java.net/~chegar/fileUtils.01/webrev/
>>>>>>>>
>>>>>>>> 1) more descriptive method names
>>>>>>>> 2) deleteXXX methods return if interrupted, leaving the
>>>>>>>> interrupt status set
>>>>>>>> 3) Use Files.copy with REPLACE_EXISTING
>>>>>>>> 4) Use SimpleFileVisitor, rather than FileVisitor
>>>>>>>>
>>>>>>> This looks better, although interrupting the sleep means that the
>>>>>>> deleteXXX methods will quietly terminate with the interrupt status
>>>>>>> set (which could be awkward to diagnose if used with tests that are
>>>>>>> also using Thread.interrupt). An alternative might be to just throw
>>>>>>> the IOException with InterruptedException as the cause.
>>>>>>>
>>>>>>> -Alan.
>>>>>>>
>>>>>>>
>>>>>> Hi Chris,
>>>>>>
>>>>>> The method deleteFileWithRetry0() assumes that if any other process
>>>>>> is accessing the same file, the delete operation, Files.delete(),
>>>>>> will throw an IOException on Windows. But when I investigated the
>>>>>> intermittent test failures, I found that this assumption does not
>>>>>> always hold.
>>>>>>
>>>>>> When Files.delete() is called, it ultimately calls the DeleteFile or
>>>>>> RemoveDirectory function, depending on whether the target is a file
>>>>>> or a directory. These Windows APIs only mark the target for deletion
>>>>>> on close and return immediately, without waiting for the operation
>>>>>> to complete. If another process is accessing the file in the
>>>>>> meantime, the delete operation does not occur and the target file
>>>>>> stays in delete-pending status until that open handle is closed. In
>>>>>> effect, DeleteFile and RemoveDirectory behave like asynchronous
>>>>>> operations. Therefore, we cannot assume that the file/directory is
>>>>>> deleted after Files.delete() returns or File.delete() returns true.
>>>>>>
>>>>>> When checking those intermittent test failures, I found that the
>>>>>> Files.delete() call normally succeeds. But due to interference from
>>>>>> anti-virus or other Windows daemon services, the target file changes
>>>>>> to delete-pending status. The immediately following operation then
>>>>>> fails because the target file still exists, while our tests assume
>>>>>> it is already gone. Because the delete-pending status of a file
>>>>>> usually lasts for a very short time, which depends on the
>>>>>> interference source, such failures normally happen when we
>>>>>> recursively delete a folder or delete-and-create a file with the
>>>>>> same file name at a high frequency.
>>>>>>
>>>>>> It is basically a Windows API design or implementation issue. I have
>>>>>> logged an enhancement, JDK-8024496, to solve it in the Java library
>>>>>> layer. Currently, I have two strategies in mind. One is to make the
>>>>>> delete operation blocking, i.e. to make sure the file/directory is
>>>>>> deleted before the call returns. The other is to make sure a
>>>>>> delete-pending file does not cause subsequent file operations to
>>>>>> fail. But both have pros and cons.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> -Dan
>>>>
>>
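
For readers following the thread, the retry technique discussed above can be sketched roughly as follows. This is not the code from the webrev. It is a minimal illustration that assumes the loop shape Dan suggests (while (Files.exists(path))) and Alan's suggestion of throwing an IOException with the InterruptedException as its cause. The class name RetryDelete, the method name deleteFileWithRetry, the constant MAX_RETRY_DELETE_TIMES, and both constant values are hypothetical; only the name RETRY_DELETE_MILLIS appears in the thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RetryDelete {
    // Hypothetical tuning values; the thread does not state the real ones.
    private static final long RETRY_DELETE_MILLIS = 500L;
    private static final int MAX_RETRY_DELETE_TIMES = 10;

    private RetryDelete() { }

    /**
     * Deletes the file and keeps checking until it is no longer visible,
     * so that a delete-pending file is not mistaken for a completed delete.
     */
    static void deleteFileWithRetry(Path path) throws IOException {
        IOException lastFailure = null;
        for (int attempt = 0; Files.exists(path); attempt++) {
            if (attempt > MAX_RETRY_DELETE_TIMES) {
                throw new IOException("Could not delete " + path + " after "
                        + MAX_RETRY_DELETE_TIMES + " retries", lastFailure);
            }
            if (attempt > 0) {
                try {
                    // Give the platform (or the interfering process) time to finish.
                    Thread.sleep(RETRY_DELETE_MILLIS);
                } catch (InterruptedException ie) {
                    // Surface the interrupt instead of returning quietly.
                    throw new IOException("Interrupted while deleting " + path, ie);
                }
            }
            try {
                Files.deleteIfExists(path);
            } catch (IOException e) {
                // For example "being used by another process"; remember it and retry.
                lastFailure = e;
            }
        }
    }
}
```

A test would call RetryDelete.deleteFileWithRetry(path) in place of Files.delete(path) before recreating a file of the same name. The loop is bounded so that a file that can never be deleted produces a failure rather than a hang.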



