Unix paths as bytes
Martin Buchholz
martinrb at google.com
Sun May 3 17:02:42 PDT 2009
When I implemented environment variables in Java, I pondered whether
to expose a native representation in some way, but eventually concluded that
the JDK had already chosen to pretend that there was a one-to-one
correspondence between Strings and external OS entities like file names,
and so in the JDK environment variables are a Map<String,String>.
But as I wrote in http://bugs.sun.com/view_bug.do?bug_id=4899439,
---
When I implemented environment variables, I tried to avoid this sort of bug.
When examined by Java code, an environment variable has only String names
and values, approximations of the underlying real names and values,
but the environment variables themselves will not be corrupted by
being passed through the ProcessBuilder abstraction.
---
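The String-only view described above is visible directly in the API: System.getenv() returns a Map<String,String>, and ProcessBuilder.environment() returns a mutable Map<String,String> that starts as a copy of it. What follows is a minimal sketch, assuming a Unix-like system. The class name EnvAsStrings, the variable EXAMPLE_VAR and the placeholder command "true" are illustrative only, and the process is never started.

```java
import java.util.HashMap;
import java.util.Map;

final class EnvAsStrings {
    // Builds, but never starts, a child process whose environment is the
    // parent's environment plus one extra variable. Java code only ever
    // handles String names and values here.
    static Map<String, String> childEnvironment(String name, String value) {
        ProcessBuilder pb = new ProcessBuilder("true"); // placeholder command, never run
        Map<String, String> env = pb.environment();     // initially a copy of System.getenv()
        env.put(name, value);
        return new HashMap<>(env);                      // snapshot for inspection
    }

    public static void main(String[] args) {
        Map<String, String> env = childEnvironment("EXAMPLE_VAR", "hello");
        System.out.println(env.get("EXAMPLE_VAR")); // prints hello
    }
}
```

The bug comment quoted above says that variables passed through ProcessBuilder are not corrupted even though Java code sees only String approximations of them. The sketch cannot demonstrate that, because only the String view is observable from Java.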
The Python proposal (PEP 383) is interesting,
but it also does not provide real access to the underlying bytes,
and appears to have round-trip preservation problems.
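For comparison, the core of the PEP 383 scheme can be sketched in Java. Each byte of 0x80 or above that is not valid UTF-8 decodes to a lone low surrogate U+DC80..U+DCFF, and the encoder turns such surrogates back into the original bytes. This is a minimal sketch assuming UTF-8 as the file name encoding. The class SurrogateEscape and its methods are hypothetical, not a JDK API, and unlike Python this sketch encodes other unpaired surrogates as '?' (what String.getBytes does) instead of raising an error.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class SurrogateEscape {
    // Decodes UTF-8, replacing each undecodable byte b (>= 0x80) with U+DC00 + b.
    static String decode(byte[] bytes) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than input bytes, and escaping is 1:1.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        while (true) {
            CoderResult r = dec.decode(in, out, true);
            if (r.isUnderflow()) break; // all input consumed
            if (r.isOverflow()) throw new IllegalStateException("output buffer too small");
            // The malformed bytes start at the current input position.
            for (int i = 0; i < r.length(); i++) {
                int b = in.get() & 0xFF;
                out.put(b < 0x80 ? (char) b : (char) (0xDC00 | b));
            }
        }
        dec.flush(out);
        out.flip();
        return out.toString();
    }

    // Encodes to UTF-8, turning each lone surrogate U+DC80..U+DCFF back into one byte.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        StringBuilder run = new StringBuilder(); // ordinary text not yet encoded
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // valid pairs combine; a lone surrogate comes back as itself
            if (cp >= 0xDC80 && cp <= 0xDCFF) {
                flush(run, out);
                out.write(cp & 0xFF); // the escaped raw byte
            } else {
                run.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        flush(run, out);
        return out.toByteArray();
    }

    private static void flush(StringBuilder run, ByteArrayOutputStream out) {
        byte[] chunk = run.toString().getBytes(StandardCharsets.UTF_8);
        out.write(chunk, 0, chunk.length);
        run.setLength(0);
    }

    public static void main(String[] args) {
        byte[] raw = {'a', (byte) 0xFF, (byte) 0xC3, (byte) 0xA9};
        System.out.println(java.util.Arrays.equals(encode(decode(raw)), raw)); // true

        // Escaped fragments from two different names merge into a valid "é" (C3 A9).
        String joined = decode(new byte[] {(byte) 0xC3}) + decode(new byte[] {(byte) 0xA9});
        System.out.println(decode(encode(joined)).equals(joined)); // false
    }
}
```

Bytes to String to bytes round-trips here, but String to bytes to String need not. The last two lines of main show one way this can surface: escaped bytes taken from two different names become an ordinary character once concatenated and re-decoded.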
The Paths API seems to parallel the environment variable API
in that it catches most of the places where file names would be
corrupted by round-trip encoding/decoding, but it is easy to
construct sample code where the abstraction is leaky, e.g. if you
try to construct a file name by concatenating an existing file name
and a suffix defined in Java code as a String.
(Correct me if I'm wrong.)
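The leak can be modeled without touching a file system. Suppose an existing file name is held as raw bytes that are not valid UTF-8 (here, a hypothetical name "café" stored in ISO-8859-1), and Java code derives a backup name through Strings, as in path.resolveSibling(path.getFileName() + ".bak"). Decoding replaces the bad byte with U+FFFD, so the derived name no longer begins with the original bytes. This is a minimal sketch assuming UTF-8 decoding; LeakySuffix and its methods are illustrative names only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LeakySuffix {
    // A file name as the OS stores it is raw bytes. Decoding with the String
    // constructor replaces each malformed sequence with U+FFFD.
    static String decodeLossy(byte[] name) {
        return new String(name, StandardCharsets.UTF_8);
    }

    // The "obvious" Java route: decode, append the suffix, re-encode.
    static byte[] appendSuffixViaString(byte[] name, String suffix) {
        return (decodeLossy(name) + suffix).getBytes(StandardCharsets.UTF_8);
    }

    // What the caller meant: the original bytes followed by the encoded suffix.
    static byte[] appendSuffixAsBytes(byte[] name, String suffix) {
        byte[] s = suffix.getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(name, name.length + s.length);
        System.arraycopy(s, 0, out, name.length, s.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] latin1Name = {'c', 'a', 'f', (byte) 0xE9}; // "café" in ISO-8859-1; invalid UTF-8
        byte[] viaString = appendSuffixViaString(latin1Name, ".bak");
        byte[] wanted = appendSuffixAsBytes(latin1Name, ".bak");
        System.out.println(Arrays.equals(viaString, wanted)); // false
    }
}
```

The String route silently produces the bytes of "caf\uFFFD.bak", a different file from the intended one, while the byte-level route preserves the original name.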
Martin
On Fri, May 1, 2009 at 18:03, Philip Jenvey <pjenvey at underboss.org> wrote:
> UnixPath solves the issue of java.io.File treating unix paths as Strings
> (e.g. http://bugs.sun.com/view_bug.do?bug_id=4899439 ) -- but AFAICT not for
> all situations on the JVM.
>
> For example in Jython, paths are represented by Strings, not wrapper objects
> (JRuby has wrappers but e.g. their Dir.entries() similarly return paths as
> Strings). Without access to the underlying unix path name as bytes we are
> stuck with the same old problem of garbage names -- UnixPaths translate
> their byte representation to Strings by munging invalid characters to the
> 0xFFFD replacement character.
>
> FYI Python 3 will deal with these invalid characters by representing them
> with half surrogates (detailed in PEP 383
> http://www.python.org/dev/peps/pep-0383/ ) -- this allows roundtripping
> those invalid characters back to bytes.
>
> Can we allow access to UnixPath's byte representation of path names and the
> reverse: the ability to create a Path object from said bytes?
>
> --
> Philip Jenvey
>