[EXTERNAL] Re: Bug in Files.isSameFile(Path,Path)

Nhat Nguyen honguye at microsoft.com
Tue Feb 2 21:56:27 UTC 2021


Hi Alan,

It has been a while since we last discussed this. I just want to ask whether
you are still interested in investigating this issue further?

Thank you,
Nhat

-----Original Message-----
From: Nhat Nguyen 
Sent: Thursday, November 12, 2020 4:26 PM
To: Alan Bateman <Alan.Bateman at oracle.com>; Nikola Grcevski <Nikola.Grcevski at microsoft.com>; WarnerJan Veldhuis <veldhuis at freedom.nl>; nio-dev at openjdk.java.net
Cc: Brian Stafford <Brian.Stafford at microsoft.com>
Subject: RE: [EXTERNAL] Re: Bug in Files.isSameFile(Path,Path)

Hi Alan,

> If you confirm that GetFinalPathNameByHandle is essentially POSIX 
> realpath then I would be happy to see WindowsLinkSupport.getRealPath 
> changed to use that for the followLinks=true case. It might be that we 
> need a fallback for cases where GetFinalPathNameByHandle fails

I found an interesting discussion from the cpython folks [1]. They recently started using GetFinalPathNameByHandle by default for their realpath implementation [2]. If the call fails, they fall back to a combination of GetFinalPathNameByHandle and manually resolving links [3].

In this discussion, there was a particularly useful comment from Eryk Sun [4] where he mentioned that GetFinalPathNameByHandle is not entirely similar to POSIX realpath when the path contains a mount point:

> Eryk Sun:
> What we need is an implementation of realpath("C:/spam/scripts") that returns "C:\\spam\\scripts"
> when "scripts" is a mount point and returns "C:\\spam\\etc\\scripts" when "scripts" is a symlink.
> This means we need an implementation of realpath() that looks a lot like posixpath.realpath.
> Generally a mount point should be walked over like a directory, just as mount points are handled in Unix.

However, looking at the current implementation of OpenJDK's toRealPath, as soon as we detect that an element of the path has the FILE_ATTRIBUTE_REPARSE_POINT attribute set, we immediately call getFinalPath on the original path. So if a path contains a symlink or a mount point, the result of toRealPath is always the result of getFinalPath (unless the call fails due to broken links, at which point we try to resolve them and recursively call toRealPath again).

So my understanding is that we may still be ok to use getFinalPath for toRealPath, at least in the case where the path contains links/mount points, since the behaviour is the same as what we currently have. However, assuming we always use getFinalPath when followLinks=true, users who mount webdav on drive Z: will now get back the full link "\\sharepointHostName.com@SSL\DavWWWRoot\a.txt" where they used to get "Z:\a.txt". It is still unclear to me what the right behaviour should be in this particular case.
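
To make the flow above concrete, here is a toy model of it in plain Java. It is only a sketch under stated assumptions, not the JDK source: the class name, the set of reparse points and the getFinalPath stand-in are all made up for illustration, and the retry for broken links is left out.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Toy model only. Real code would query file attributes and call the native API.
class RealPathFlowSketch {

    // reparsePoints: path prefixes assumed to carry FILE_ATTRIBUTE_REPARSE_POINT.
    // getFinalPath: stand-in for the native final-path lookup.
    static String toRealPath(String path, Set<String> reparsePoints,
                             UnaryOperator<String> getFinalPath) {
        StringBuilder prefix = new StringBuilder();
        for (String element : path.split("/")) {
            if (prefix.length() > 0) {
                prefix.append('/');
            }
            prefix.append(element);
            if (reparsePoints.contains(prefix.toString())) {
                // A link or mount point was found along the way:
                // hand the ORIGINAL path to getFinalPath and return its answer.
                return getFinalPath.apply(path);
            }
        }
        // No reparse point anywhere: the path is returned as given.
        return path;
    }

    public static void main(String[] args) {
        // Values taken from the WebDAV scenario discussed in this thread.
        String unc = "\\\\sharepointHostName.com@SSL\\DavWWWRoot\\a.txt";
        Set<String> reparsePoints = Set.of("C:/link");
        Map<String, String> finalPaths = Map.of("C:/link", unc, "Z:/a.txt", unc);
        UnaryOperator<String> getFinalPath = p -> finalPaths.getOrDefault(p, p);

        String viaDrive = toRealPath("Z:/a.txt", reparsePoints, getFinalPath);
        String viaLink = toRealPath("C:/link", reparsePoints, getFinalPath);
        System.out.println(viaDrive);
        System.out.println(viaLink);
        // The two real paths differ although both name the same file.
        System.out.println(viaDrive.equals(viaLink));
        // Applying the final-path lookup to both inputs makes them agree.
        System.out.println(getFinalPath.apply("Z:/a.txt").equals(getFinalPath.apply("C:/link")));
    }
}
```

In this model the mapped drive path comes back unchanged while the link comes back in UNC form, which is the mismatch from the earlier message; always going through getFinalPath removes the mismatch but changes what the "Z:" users see.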


> I'd also be okay with changing isSameFile to use toRealPath on both files so that it does not rely on the volume or file index.

Solely for the purpose of file comparison, what is your opinion on using getFinalPath for isSameFile instead of toRealPath, which would also require some changes to its implementation? The paths are only used for comparison and aren't returned to the users, so we won't have to worry about breaking compatibility for existing code. 
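
For reference, the toRealPath-based comparison quoted above can be sketched with public API only. This is a minimal sketch, not the internal isSameFile implementation; the class and method names are hypothetical, and getFinalPath itself is internal to the JDK, so it is not used here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SameFileByRealPath {

    // Resolves both paths with toRealPath (links are followed by default)
    // and compares the results. Throws IOException if either file is missing.
    static boolean isSameFileByRealPath(Path a, Path b) throws IOException {
        return a.toRealPath().equals(b.toRealPath());
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("samefile");
        Path file = Files.createFile(dir.resolve("a.txt"));
        Path other = Files.createFile(dir.resolve("b.txt"));
        // A second spelling of the same file, with a redundant "." element.
        Path viaDot = dir.resolve(".").resolve("a.txt");

        System.out.println(isSameFileByRealPath(file, viaDot));
        System.out.println(isSameFileByRealPath(file, other));
    }
}
```

The resolved paths exist only for the duration of the comparison and are never handed back to the caller, which is why the compatibility concern for toRealPath does not apply in the same way.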

Thanks,
Nhat

[1]: https://bugs.python.org/issue9949
[2]: https://github.com/python/cpython/blob/cc75ab791dd5ae2cb9f6e0c3c5f734a6ae1eb2a9/Lib/ntpath.py#L647
[3]: https://github.com/python/cpython/blob/cc75ab791dd5ae2cb9f6e0c3c5f734a6ae1eb2a9/Lib/ntpath.py#L579
[4]: https://bugs.python.org/msg350138, a direct link to one of the comments in [1]

-----Original Message-----
From: Alan Bateman <Alan.Bateman at oracle.com>
Sent: Thursday, November 5, 2020 6:12 AM
To: Nhat Nguyen <honguye at microsoft.com>; Nikola Grcevski <Nikola.Grcevski at microsoft.com>; WarnerJan Veldhuis <veldhuis at freedom.nl>; nio-dev at openjdk.java.net
Subject: Re: [EXTERNAL] Re: Bug in Files.isSameFile(Path,Path)

On 04/11/2020 00:16, Nhat Nguyen wrote:
> Hi Alan,
>
> Thank you for the suggestion! I tried using toRealPath and noticed this interesting scenario.
>
> Let's assume that we have a drive "Z:" mapped to a sharepoint drive containing file "a.txt" and another soft link "C:/link"
> whose target is "Z:/a.txt". Further assume we want to see if "Z:/a.txt" and "C:/link" are the same.
>
> If we perform toRealPath on both inputs, "Z:/a.txt" will stay as 
> "Z:/a.txt". However, "C:/link" will follow a different code path in WindowsLinkSupport.getRealPath where it notices that the input is a reparse point and eventually calls getFinalPath.
> The result for "C:/link" is then
> "\\sharepointHostName.com@SSL\DavWWWRoot\a.txt". So, we conclude that the files are different even though they are the same.
>
> If we use GetFinalPath for both, we will be able to notice that they 
> are the same. I'd love to know what you think about this scenario and if it's worth supporting it.
>
I agree we need to re-examine WindowsLinkSupport.getRealPath, at least for the followLinks=true case. If I remember correctly, we couldn't rely on GetFinalPathNameByHandle because it wasn't available on Windows XP, but that is history now. There was also an issue with needing to open the file with backup semantics; I need to page in some of the details, but I think that was needed for directories. There was another issue with symlinks on NTFS that linked to files on non-NTFS volumes.

If you confirm that GetFinalPathNameByHandle is essentially POSIX realpath then I would be happy to see WindowsLinkSupport.getRealPath changed to use that for the followLinks=true case. It might be that we need a fallback for cases where GetFinalPathNameByHandle fails. I'd also be okay with changing isSameFile to use toRealPath on both files so that it does not rely on the volume or file index.

-Alan

