RFR: 8298318: (fs) APIs for handling filename extensions [v8]

Anthony Vanelverdinghe duke at openjdk.org
Thu Mar 14 17:38:41 UTC 2024


On Wed, 13 Mar 2024 23:12:50 GMT, Brian Burkhalter <bpb at openjdk.org> wrote:

>> Add to `java.nio.file.Path` a method `getExtension` to retrieve the `Path`'s extension, and companion methods `removeExtension` and `addExtension`.
>
> Brian Burkhalter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8298318: Add parameter checks for withExtension

Let me summarize. For `getExtension` there's:

> whether to include the period

I'm happy to be proven wrong, but I hold that including it is a prerequisite for intuitive behavior in all cases. For example, both `Path.of("test.").withoutExtension().withExtension("txt")` and `Path.of("test.").withExtension("txt")` actually result in `test..txt`, whereas I expect these to result in `test.txt`.
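
To illustrate the difference, here is a minimal sketch on plain file-name strings. It is not the PR's implementation: the class `PeriodConvention` and all of its method names are hypothetical, and the behavior of the "excluding" variant is my assumption of how an API that cannot distinguish `test.` from `test` by extension alone would behave.

```java
final class PeriodConvention {
    // Convention A (hypothetical): the extension EXCLUDES the period.
    static String extExcluding(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    static String withoutExtExcluding(String name) {
        String ext = extExcluding(name);
        // An empty extension gives no hint that a trailing period exists,
        // so the name is returned unchanged.
        return ext.isEmpty() ? name : name.substring(0, name.length() - ext.length() - 1);
    }

    static String withExtExcluding(String name, String ext) {
        return withoutExtExcluding(name) + "." + ext;
    }

    // Convention B (hypothetical): the extension INCLUDES the period.
    static String extIncluding(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot);
    }

    static String withoutExtIncluding(String name) {
        return name.substring(0, name.length() - extIncluding(name).length());
    }

    static String withExtIncluding(String name, String ext) {
        return withoutExtIncluding(name) + ext;
    }
}
```

Under these assumptions, `withExtExcluding("test.", "txt")` yields `test..txt`, whereas `withExtIncluding("test.", ".txt")` yields `test.txt`. Both variants agree on ordinary names such as `a.md`.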

and there's:

> how to define file extension

1. platform and/or provider dependent (matches 2 or 3, depending on the platform and/or provider)
   * pro: intuitive behavior on all platforms
   * con: not consistent across all platforms
2. "anything past the last period, if any, the empty string otherwise" (matches all common Java APIs, Windows, ...)
   * pro: trivial definition
   * pro: compatible with all common Java APIs (`Files::probeContentType`, Guava, Apache Commons)
   * pro: Windows "owns" the concept of file extension, in the sense that it actually relies extensively on file extensions in its treatment of files. So if one global definition must be chosen, it makes sense to adopt that of the platform that "owns" it
3. "anything past the last period, if any and if it is not the first character of the file name, the empty string otherwise" (matches UNIX, ...)
   * pro: intuitive behavior on UNIX
   * con: "if it is not the first character of the file name" cannot be intuitively justified without mentioning the UNIX-specific way of marking a path as hidden
     * it might be motivated by introducing the concept of a file "root" and requiring the root to be non-empty, but "root" is an artificial construct and both Windows and UNIX actually allow empty roots (for UNIX this can't be proven, but it also can't be disproven)
     * note that this is actually a pro if this option is considered in the context of `1.`
4. "anything past the last period, if any and if it is not the first character of the file name and if it is not preceded by nothing but dots, the empty string otherwise" (matches Python, Ruby)
   * con: very uncommon definition
   * con: "if it is not preceded by nothing but dots" is an arbitrary condition
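
The differences between definitions 2, 3 and 4 can be sketched on plain file-name strings. This is only an illustration of the wording above, not code from the PR; the class `ExtensionDefinitions` and its method names are hypothetical, and the extension is returned without the period purely to keep the comparison simple.

```java
final class ExtensionDefinitions {
    // Definition 2: anything past the last period, else the empty string.
    static String lastPeriod(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    // Definition 3: as 2, but the last period must not be the first character.
    static String notFirstChar(String name) {
        int dot = name.lastIndexOf('.');
        return dot <= 0 ? "" : name.substring(dot + 1);
    }

    // Definition 4: as 3, and the last period must be preceded by
    // at least one character that is not a period.
    static String notOnlyDotsBefore(String name) {
        int dot = name.lastIndexOf('.');
        for (int i = 0; i < dot; i++) {
            if (name.charAt(i) != '.') {
                return name.substring(dot + 1);
            }
        }
        // No period at all, or nothing but periods before the last one.
        return "";
    }
}
```

All three agree on `archive.tar.gz` (`gz`) and on `README` (empty). They diverge on `.bashrc`, where 2 gives `bashrc` while 3 and 4 give the empty string, and on `..foo`, where 2 and 3 give `foo` while 4 gives the empty string.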

Personally I prefer `1.`. I agree that consistency across providers is a worthwhile goal though (e.g. to have `jdk.zipfs` be consistent with the default provider). So I'd implement `getExtension` as follows in `Path` itself, and override it in `WindowsPath` (implementing `2.`) and `UnixPath` (implementing `3.`):


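default String getExtension() {
    // avoid StackOverflowError: if this path already belongs to the default
    // provider, delegating to Path.of(...) would end up back in this method
    if (/* java.nio.file.spi.DefaultFileSystemProvider is specified and this path is an instance of the specified default provider */) {
        // implement `2.`
    } else {
        // Path.of takes a String, so convert the file name first;
        // getFileName() may return null (e.g. for a root), not handled here
        return Path.of(getFileName().toString()).getExtension();
    }
}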


If consistency across all platforms and providers is a must, I'm advocating for `2.` for the reasons mentioned above.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16226#issuecomment-1997990007

