Rewrite of IBM doublebyte charsets
Xueming Shen
Xueming.Shen at Sun.COM
Thu May 21 19:52:35 UTC 2009
Ulf Zibis wrote:
> On 21.05.2009 01:48, Xueming Shen wrote:
>> Thanks for the 5 minutes:-)
>>
>> Your FindXYZcoderBugs tests are indeed very helpful for catching most
>> of the "inconsistent" behaviors between different paths by feeding
>> them "random" inputs.
>>
>> TestIBMDB.java diffs the behavior of the old implementation against
>> the new implementation with all "decode-able" bytes and "encode-able"
>> chars, so it gives us some guarantee.
>
> Why do we *try* to stick to the old behaviour for malformed and/or
> unmappable input if we don't diff new against old?
> Then we could also *try* to treat malformed and/or unmappable input
> as accurately as possible.
> As you mentioned, most users don't distinguish between those, so they
> won't be affected. On the other hand, users who did make this
> distinction would probably be happy to get more accurate results,
> even if not identical to the current results.
>
This is the approach I decided on to achieve the goals I listed last
time. Sticking with the old behavior for now makes it easy, or rather
possible, to push in such a big change. You don't want to get stuck on
this kind of "arguable" issue when it's not the main goal of the
project, and have to take a detour to defend whether or not this is the
"correct" change and, if it is correct, whether it is right to break
compatibility and whether there are people who depend on the old
behavior. If you are starting a new implementation, you definitely
should do all the "right" things. It is a different story when you
maintain an existing product. As I said last time, with this change the
implementation and the data structures are now really open and ready
for further optimization (instead of looking at a big chunk of data
without knowing where it comes from). You can now work on the issues,
if any, one by one, including starting the argument about which errors
should be "malformed" and which should be "unmappable". We're (I'm) 60%
done after this :-)
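
For readers who have not run into the malformed/unmappable distinction,
here is a minimal sketch of how the two error kinds surface through the
public java.nio.charset API. It is only an illustration and is not taken
from the IBM doublebyte coders or from TestIBMDB.java: the class name
CoderErrorDemo and its helper methods are made up for this example, and
it uses UTF-8 and US-ASCII as stand-ins (via StandardCharsets, which
needs Java 7 or later).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the JDK or of the change under review.
public class CoderErrorDemo {

    // Turn a CoderResult into a short label such as "malformed[1]".
    static String describe(CoderResult cr) {
        if (cr.isMalformed())  return "malformed[" + cr.length() + "]";
        if (cr.isUnmappable()) return "unmappable[" + cr.length() + "]";
        return cr.isUnderflow() ? "underflow" : "overflow";
    }

    // Decode bytes as UTF-8 and report how the decoder classified the input.
    static String decodeUtf8(byte[] input) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CoderResult cr = dec.decode(ByteBuffer.wrap(input),
                                    CharBuffer.allocate(16), true);
        return describe(cr);
    }

    // Encode chars as US-ASCII and report how the encoder classified the input.
    static String encodeAscii(String input) {
        CharsetEncoder enc = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CoderResult cr = enc.encode(CharBuffer.wrap(input),
                                    ByteBuffer.allocate(16), true);
        return describe(cr);
    }

    public static void main(String[] args) {
        // 0xFF can never appear in well-formed UTF-8: a malformed-input error.
        System.out.println(decodeUtf8(new byte[] { (byte) 0xFF }));
        // U+00E9 is a valid char with no US-ASCII encoding: unmappable.
        System.out.println(encodeAscii("\u00e9"));
        // Plain ASCII input is consumed completely: underflow (no error).
        System.out.println(encodeAscii("abc"));
    }
}
```

A caller that only checks cr.isError() sees the first two cases as the
same failure, which is why changing the classification would go
unnoticed by most users but could affect the few that branch on it.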