Unix paths as bytes
Alan Bateman
Alan.Bateman at Sun.COM
Sat May 2 04:21:22 PDT 2009
Philip Jenvey wrote:
> UnixPath solves the issue of java.io.File treating unix paths as
> Strings (e.g. http://bugs.sun.com/view_bug.do?bug_id=4899439 ) -- but
> AFAICT not for all situations on the JVM.
>
> For example in Jython, paths are represented by Strings, not wrapper
> objects (JRuby has wrappers but e.g. their Dir.entries() similarly
> return paths as Strings). Without access to the underlying unix path
> name as bytes we are stuck with the same old problem of garbage names
> -- UnixPaths translate their byte representation to Strings by munging
> invalid characters to the 0xFFFD replacement character.
>
> FYI Python 3 will deal with these invalid characters by representing
> them with half surrogates (detailed in PEP 383
> http://www.python.org/dev/peps/pep-0383/ ) -- this allows
> roundtripping those invalid characters back to bytes.
>
> Can we allow access to UnixPath's byte representation of path names
> and the reverse: the ability to create a Path object from said bytes?
The only way currently to "export" or "import" a path as bytes is via URIs.
When a path is encoded as a URI, the platform representation is used and
characters that aren't legal in the URI path component are escaped. This
gives you the round-trip, but it isn't exactly what you want, in that the
String is a URI rather than a path. I'm not familiar with the Python proposal
but I will examine it - thanks for forwarding.
-Alan.
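
For illustration, the URI round-trip described above might look like the
following. This is only a minimal sketch, assuming the java.nio.file API as
it later shipped in Java 7 (Path.toUri and Paths.get(URI)); the class name,
the helper names, and the sample file name are hypothetical and not taken
from the thread.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class illustrating the URI-based export/import.
class PathUriRoundTrip {

    // "Export": characters that are not legal in a URI path component
    // come out percent-escaped in the resulting string.
    static String exportAsUri(Path path) {
        return path.toUri().toString();
    }

    // "Import": rebuild a Path from a previously exported file: URI.
    static Path importFromUri(String uri) {
        return Paths.get(URI.create(uri));
    }

    public static void main(String[] args) {
        // Sample name (an assumption); the file does not need to exist.
        // toUri() always yields an absolute URI, so start from an
        // absolute path to make the comparison meaningful.
        Path original = Paths.get("name with space.txt").toAbsolutePath();

        String exported = exportAsUri(original);
        Path restored = importFromUri(exported);

        System.out.println("exported: " + exported);
        System.out.println("round-trip equal: " + restored.equals(original));
    }
}
```

Note that the exported value is a file: URI string, not a path string, so
code that expects plain path names (as Jython does) would still need to
convert back to a Path before using it.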