Unix paths as bytes

Philip Jenvey pjenvey at underboss.org
Mon May 4 00:20:04 PDT 2009


On May 3, 2009, at 5:02 PM, Martin Buchholz wrote:

> The python proposal is interesting,
> but also does not provide real access to the underlying bytes,
> and appears to have round-trip preservation problems.

Python does provide direct access to paths as bytes via different  
APIs. Byte versions of the environment and the command line args have  
been discussed and may happen in the future, even with PEP 383.
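
As a rough illustration of the bytes-in, bytes-out behaviour, here is a minimal Python 3 sketch. The helper name list_names_as_bytes is made up for this example, and os.fsencode is a later stdlib addition, so treat this as an assumption about current Python rather than what existed when this was written:

```python
import os

def list_names_as_bytes(directory):
    """Return the entries of `directory` as raw bytes, never decoded.

    os.listdir() returns bytes names when it is given a bytes path.
    """
    # os.fsencode() turns a str path into bytes using the filesystem
    # encoding; a path that is already bytes is passed through unchanged.
    return os.listdir(os.fsencode(directory))
```

Calling list_names_as_bytes(".") returns a list of bytes objects, so names that are not valid in the filesystem encoding come back untouched.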

I mention this new PEP because it's made for the general case of  
working with strings and expecting strings back from these APIs. Our  
UNIX APIs will encode these paths back to their original bytes via the  
filesystem's encoding + the PEP's new encoder error handler, and  
Python code can also encode them back to bytes in the same way. There  
are no round-trip preservation issues.
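
The round trip can be sketched in a few lines of Python 3, assuming the error handler goes by the name it has in the final PEP ("surrogateescape") and assuming a UTF-8 filesystem encoding. The helper names decode_path and encode_path are hypothetical:

```python
def decode_path(raw, encoding="utf-8"):
    """Decode path bytes; undecodable bytes become lone surrogates."""
    return raw.decode(encoding, "surrogateescape")

def encode_path(text, encoding="utf-8"):
    """Encode a path string; escaped surrogates become the original bytes."""
    return text.encode(encoding, "surrogateescape")

# A Latin-1 encoded file name, which is not valid UTF-8.
raw = b"caf\xe9.txt"

text = decode_path(raw)
# The stray byte 0xE9 is mapped to the lone surrogate U+DCE9.
assert text == "caf\udce9.txt"

# Encoding with the same handler restores the original bytes exactly.
assert encode_path(text) == raw
```

Appending an ordinary suffix to the decoded string and encoding the result also preserves the escaped byte, since the handler only touches the surrogate.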

This scheme is similar to what Mono does: http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding

The difference is that Mono uses a NUL followed by the offending byte
as a char. The Mono scheme wasn't chosen because external libs (e.g.
PyGTK) would truncate the strings at NUL upon rendering or reject them
entirely, which is worse than just rendering lone surrogates as garbage.
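
To make the difference concrete, here is a hedged Python 3 sketch that imitates a NUL-based escape with a custom error handler and compares it with the surrogate approach. The handler name "nul_escape_sketch" and the function c_string_view are invented for this example. It only illustrates the idea described above; it is not Mono's implementation:

```python
import codecs

def _nul_escape(err):
    """Decode-error handler: emit NUL + the bad byte as a char (a sketch)."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join("\x00" + chr(b) for b in bad), err.end

codecs.register_error("nul_escape_sketch", _nul_escape)

def c_string_view(text):
    """Simulate a C library that stops reading at the first NUL."""
    return text.split("\x00", 1)[0]

raw = b"caf\xe9.txt"

nul_style = raw.decode("utf-8", "nul_escape_sketch")
surrogate_style = raw.decode("utf-8", "surrogateescape")

# The NUL-escaped string is cut short by anything that treats it as a
# C string; the surrogate-escaped string keeps its full length.
assert c_string_view(nul_style) == "caf"
assert c_string_view(surrogate_style) == surrogate_style
```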

> The Paths API seems to be parallel to the environment variable API
> in that it catches most of the places where file names would be
> corrupted by round-trip encoding/decoding, but it is easy to
> construct sample code where the abstraction is leaky,
> E.g. if you try to construct a file name from the concatenation of
> an existing file name and a suffix defined in Java code as a string.
> (Correct me if I'm wrong)

This example does work for paths as long as you're concatenating via  
Path objects (and the value of suffix is valid according to  
file.encoding). Other JVM languages just don't have the luxury of
always representing paths with nio's Path objects.

We'd also love access to environment variables as bytes but that's a  
whole different story.

--
Philip Jenvey



More information about the nio-dev mailing list