Unix paths as bytes
Philip Jenvey
pjenvey at underboss.org
Tue May 5 19:14:28 PDT 2009
On May 4, 2009, at 11:24 PM, Martin Buchholz wrote:
>>
>> There's no case where 2 different sets of bytes would convert to the
>> same chars
>
> I don't understand this. There are many locales with encodings with
> non-unique representations. Until the UTF-8 security reform, even
> UTF-8 had non-unique representations. The Python PEP seems designed
> to be used with any system encoding, not just UTF-8.
OK, like ISO-2022-JP and Shift_JIS. These did come up in the PEP
discussion on the python-dev ML. They weren't highly regarded, as
they're pretty broken as Unix locales. The POSIX spec describes these
"locking shift encodings" as fishy/invalid for its character set [1],
and they're incompatible with ASCII. Red Hat, Debian and others disable
them as locales by default.

These are indeed problematic; I guess they just weren't a deal breaker
for the simpler scheme, which is designed to be used with any system
encoding that isn't annoying. The PEP mentions:
"Encodings that are not compatible with ASCII are not supported by
this specification; bytes in the ASCII range that fail to decode will
cause an exception. It is widely agreed that such encodings should not
be used as locale charsets."
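
To make the scheme concrete, here is a minimal sketch in Python. It assumes the PEP under discussion is the one that shipped as the `surrogateescape` error handler (Python 3.1 and later); the helper names `decode_path` and `encode_path` are made up for illustration and are not part of any API. Each undecodable byte is mapped to its own lone surrogate, so two different byte strings do not collapse to the same chars in an ASCII-compatible encoding such as UTF-8.

```python
# Sketch only: decode_path/encode_path are hypothetical helper names.
# The "surrogateescape" error handler itself is a real codec error handler.

def decode_path(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode a path, mapping each undecodable byte 0xNN to U+DCNN."""
    return raw.decode(encoding, errors="surrogateescape")


def encode_path(name: str, encoding: str = "utf-8") -> bytes:
    """Reverse decode_path, turning U+DCNN back into the raw byte 0xNN."""
    return name.encode(encoding, errors="surrogateescape")


if __name__ == "__main__":
    # A Latin-1 file name that is not valid UTF-8 still round-trips.
    raw = b"caf\xe9.txt"
    name = decode_path(raw)
    assert encode_path(name) == raw

    # The pre-reform "overlong" form of "/" is rejected by the strict
    # decoder, so it is escaped byte by byte and stays distinct from "/".
    overlong_slash = b"\xc0\xaf"
    assert decode_path(overlong_slash) != decode_path(b"/")
    assert encode_path(decode_path(overlong_slash)) == overlong_slash

    # ascii() avoids printing lone surrogates to the terminal directly.
    print(ascii(name), ascii(decode_path(overlong_slash)))
```

This only covers the ASCII-compatible case the PEP targets; it says nothing about stateful encodings like ISO-2022-JP, where escape sequences can be repeated redundantly.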
[1]: http://opengroup.org/onlinepubs/007908775/xbd/charset.html#tag_001_002
--
Philip Jenvey