Unix paths as bytes

Philip Jenvey pjenvey at underboss.org
Mon May 4 19:54:31 PDT 2009


On May 4, 2009, at 3:41 PM, Martin Buchholz wrote:
>
> I believe that no implementation based on error handlers can work
> because it cannot handle the situation where two different byte inputs
> are converted to the same char sequence without error.  The original
> byte sequence cannot be reliably re-created.
> What am I missing?

There's no case where two different byte sequences would convert to the
same chars.

Invalid bytes decode to lone low surrogates, which by themselves are
invalid Unicode. These never get a preceding high surrogate, as the
actual decoder would never yield one by itself. So only invalid bytes
produce isolated surrogates, and both are always decoded/encoded
through the error handler.
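
As a rough illustration, Python 3 ships an error handler of this kind
under the name "surrogateescape". Using it here is an assumption made
for the sake of the example, not something the scheme above depends on,
and the file name is hypothetical. A minimal sketch of the round trip:

```python
# Hypothetical file name: Latin-1 bytes that are not valid UTF-8.
raw = b"caf\xe9.txt"

# Decode with the escaping error handler instead of failing.
name = raw.decode("utf-8", "surrogateescape")

# Collect the escape characters: each invalid byte 0xNN maps to U+DCNN,
# a lone low surrogate.
escaped = [hex(ord(c)) for c in name if 0xDC80 <= ord(c) <= 0xDCFF]

# Encoding through the same handler restores the original bytes.
roundtrip = name.encode("utf-8", "surrogateescape")

print(escaped)           # ['0xdce9']
print(roundtrip == raw)  # True
```

The valid bytes pass through the normal UTF-8 decoder untouched; only
the single invalid byte goes through the handler in either direction.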

It's taking advantage of the fact that valid bytes never decode to
isolated surrogates regardless of their encoding. Mono's scheme
similarly takes advantage of the fact that paths never decode to a NUL,
as it's an invalid path character.
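
To make the collision argument concrete, here is a sketch that again
assumes Python 3's "surrogateescape" handler as a stand-in for the
scheme (the helper names decode/encode are hypothetical). The obvious
candidate for a collision with the escaped byte 0x80 is the byte
sequence a lax UTF-8 encoder would emit for U+DC80 itself. A strict
decoder rejects that sequence, so its bytes are escaped one by one and
the two inputs stay distinct:

```python
def decode(raw: bytes) -> str:
    # Strict UTF-8, with invalid bytes escaped as lone low surrogates.
    return raw.decode("utf-8", "surrogateescape")

def encode(text: str) -> bytes:
    # Inverse direction: escaped surrogates turn back into raw bytes.
    return text.encode("utf-8", "surrogateescape")

a = b"\x80"          # a single invalid byte
b = b"\xed\xb2\x80"  # what a lax encoder would produce for U+DC80

da = decode(a)
db = decode(b)

print(ascii(da))  # '\udc80'
print(ascii(db))  # '\udced\udcb2\udc80'

# Exhaustive check over every two-byte input: the round trip is exact,
# so no two of these inputs can decode to the same string.
all_pairs_ok = all(
    encode(decode(bytes([i, j]))) == bytes([i, j])
    for i in range(256)
    for j in range(256)
)
print(all_pairs_ok)  # True
```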

The fact that it's done via an error handler is more out of
convenience -- so it can be piggybacked onto any encoding. The
scheme itself is an actual encoding called utf8b: http://bsittler.livejournal.com/10381.html

>
>>> The Paths API seems to be parallel to the environment variable API
>>> in that it catches most of the places where file names would be
>>> corrupted by round-trip encoding/decoding, but it is easy to
>>> construct sample code where the abstraction is leaky,
>>> E.g. if you try to construct a file name from the concatenation of
>>> an existing file name and a suffix defined in Java code as a string.
>>> (Correct me if I'm wrong)
>>
>> This example does work for paths as long as you're concatenating  
>> via Path
>> objects (and the value of suffix is valid according to  
>> file.encoding).
>
> I don't see any place in the Paths API where manipulation of a Path  
> component
> is supported.  E.g. how would an Emacs implemented in Java append the
> "~" character to the filename to create the backup file?

I misread your example. As Alan pointed out, you're right: this is a
problem.

--
Philip Jenvey




More information about the nio-dev mailing list