Unix paths as bytes

Tue May 5 19:14:28 PDT 2009

On May 4, 2009, at 11:24 PM, Martin Buchholz wrote:
>>
>> There's no case where 2 different sets of bytes would convert to the
>> same chars
>
> I don't understand this.  There are many locales with encodings with
> non-unique representations.  Until the UTF-8 security reform,
> even UTF-8 had non-unique representations.
> The Python PEP seems designed to be used with
> any system encoding, not just UTF-8.

OK, like ISO-2022-JP and Shift_JIS. These did come up in the PEP
discussion on the python-dev mailing list.
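
For illustration, here is a minimal sketch of that non-uniqueness. It
assumes Python 3 and uses the stdlib iso2022_jp codec as a stand-in for
a locale encoding; the byte strings and variable names are made up for
the example.

```python
# Two different byte strings that decode to the same text under ISO-2022-JP.
plain = b"abc"
escaped = b"\x1b(Babc"  # ESC ( B selects ASCII, which is already the initial state

same_text = plain.decode("iso2022_jp") == escaped.decode("iso2022_jp")

# Re-encoding the decoded text cannot reproduce both inputs, so a
# bytes -> str -> bytes round trip is lossy for one of them.
roundtrip_ok = escaped.decode("iso2022_jp").encode("iso2022_jp") == escaped


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xC0 0xAF is an overlong two-byte form of "/"; strict decoders reject it.
overlong_accepted = decodes_as_utf8(b"\xc0\xaf")

print(same_text, roundtrip_ok, overlong_accepted)
```

The redundant escape sequence is what makes the encoding stateful: the
decoder throws it away, so the original bytes can't be recovered from
the decoded string alone.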

They weren't highly regarded, as they're pretty broken as Unix locales.
The POSIX spec describes these "locking shift encodings" as
fishy/invalid for its character set [1], and they're incompatible with
ASCII. Red Hat, Debian and others disable them as locales by default.

These are indeed problematic; I guess they just weren't a deal breaker
for the simpler scheme, which is designed to be used with any system
encoding that isn't annoying. The PEP mentions:

"Encodings that are not compatible with ASCII are not supported by  
this specification; bytes in the ASCII range that fail to decode will  
cause an exception. It is widely agreed that such encodings should not  
be used as locale charsets."
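
A rough sketch of how the scheme behaves for an ASCII-compatible
encoding, using the error handler that Python 3 ships as
"surrogateescape". UTF-8 as the locale encoding and the file name are
assumptions for the example, not anything from the PEP text.

```python
# A file name written as Latin-1; the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# Decoding maps the undecodable byte 0xE9 to the lone surrogate U+DCE9
# instead of raising or substituting a replacement character.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogate back into 0xE9,
# so the original bytes are recovered exactly.
restored = name.encode("utf-8", errors="surrogateescape")

# ascii() is used because a lone surrogate can't be written to most terminals.
print(ascii(name), restored == raw)
```

Only bytes outside the ASCII range are escaped this way, which is why
the scheme depends on the locale encoding being ASCII-compatible.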

[1]: http://opengroup.org/onlinepubs/007908775/xbd/charset.html#tag_001_002

--
Philip Jenvey