java-nio-charset-enhanced -- Milestone 4 is released

Sun Mar 29 19:49:24 UTC 2009

On 29.03.2009 20:27, Martin Buchholz wrote:
> On Fri, Mar 27, 2009 at 15:44, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>   
>>
>> I have also coded such a test for full-scan comparison:
>> See CharsetsTest + LegacyCharset (it retrieves the legacy charsets by
>> reflection directly from rt.jar of the patched JDK) here:
>> https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/test/sun/nio/cs/
>>
>> It cost me several nights to get all code points equal, given my special
>> mixture of range-limited direct maps and a full-range indirect map.
>>     
>
> It does look like you've written a lot of good tests.
> It would be nice not to have an explicit list of charsets in
> CharsetsTest.java.PARAMETERS.
>   

The advantage of this list is that I can disable charsets by 
commenting out lines, which speeds up the test while debugging special cases.
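A minimal sketch of the idea, not the actual CharsetsTest code: the NAMES array is a hypothetical stand-in for the explicit list, and firstMismatch() is an illustrative helper that scans all 16-bit char values and compares the bytes two charsets produce for each. Here ISO-8859-1 merely plays the role of the reference implementation.

```java
import java.nio.charset.Charset;

class FullScanCompare {
    // Hypothetical stand-in for an explicit list such as PARAMETERS;
    // entries can be commented out to narrow a debugging run.
    static final String[] NAMES = {
        "ISO-8859-1",
        // "ISO-8859-2",
        "ISO-8859-15",
    };

    /** Returns the first char value whose encoding differs between a and b, or -1. */
    static int firstMismatch(Charset a, Charset b) {
        for (int c = 0; c <= 0xFFFF; c++) {
            String s = String.valueOf((char) c);
            // Charset.encode(String) replaces malformed and unmappable input,
            // so every char yields some byte sequence that can be compared.
            if (!a.encode(s).equals(b.encode(s))) {
                return c;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumption: ISO-8859-1 stands in for the "legacy" charset here.
        Charset reference = Charset.forName("ISO-8859-1");
        for (String name : NAMES) {
            int c = firstMismatch(reference, Charset.forName(name));
            System.out.println(name + ": "
                + (c < 0 ? "identical" : "first mismatch at U+" + Integer.toHexString(c)));
        }
    }
}
```

A real comparison would pass the legacy and the patched implementation of the same charset instead of two different charsets.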

> I guess it's a list of charsets subject to single-byte testing?
>   

Yes, + charsets depending on those. E.g. EUC-JP depends on JIS-X-0201.

> If so, better documentation would be good.
> Charsets named ISO-8859-* are guaranteed to be single-byte,
> it might be good to include those programmatically,
> by filtering Charset.availableCharsets().
>   

Good idea, but how to catch those that internally use single-byte 
charsets such as JIS-X-0201?
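For illustration, the filtering part could look roughly like the sketch below. The class and method names are made up for this example, and it only selects charsets by canonical name prefix; it does not solve the open question of finding charsets that depend on a single-byte charset internally.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class Iso8859Filter {
    /** Collects all installed charsets whose canonical name starts with "ISO-8859-". */
    static List<Charset> iso8859Charsets() {
        List<Charset> result = new ArrayList<>();
        // availableCharsets() maps canonical names to Charset objects.
        for (Charset cs : Charset.availableCharsets().values()) {
            if (cs.name().startsWith("ISO-8859-")) {
                result.add(cs);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Charset cs : iso8859Charsets()) {
            System.out.println(cs.name());
        }
    }
}
```

Dependent charsets like EUC-JP would still have to be listed by hand or discovered some other way.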

> Why include EUC-JP but not UTF-8?
>   

UTF-8 is not affected by my changes to the single-byte charsets.

> It's probably still a good idea to get inspiration from my
> Find*Bugs 

I'll keep this in mind.

> tests which test many other things like
> complete compatibility of exceptions in case of invalid input.
>   

I see, this would affect our discussion about malformed().
Concerning the malformed length on an invalid low surrogate, I now 
understand your philosophy after hacking on the UTF-8 coder. As a result I've 
filed a bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6798515
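As a small illustration of what "malformed length" refers to (this is only a sketch with made-up class and method names, and it does not reproduce the bug report above): the CoderResult returned by an encoder carries the number of input chars it considers malformed, which can be inspected like this for a lone low surrogate fed to the UTF-8 encoder.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

class MalformedLengthDemo {
    /** Encodes a single unpaired low surrogate with UTF-8 and returns the result. */
    static CoderResult encodeLoneLowSurrogate() {
        // A fresh encoder reports malformed input by default (CodingErrorAction.REPORT).
        CharsetEncoder enc = Charset.forName("UTF-8").newEncoder();
        CharBuffer in = CharBuffer.wrap(new char[] { '\uDC00' });
        ByteBuffer out = ByteBuffer.allocate(8);
        return enc.encode(in, out, true);
    }

    public static void main(String[] args) {
        CoderResult cr = encodeLoneLowSurrogate();
        System.out.println("malformed: " + cr.isMalformed());
        if (cr.isError()) {
            // length() is only valid for error results.
            System.out.println("length: " + cr.length());
        }
    }
}
```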

Concerning \uFFFE and \uFFFF, I still think that they are invalid, as 
these code points don't have any valid meaning on the Java VM side, so why 
should they be treated as possibly mappable to other character encodings? 
Handling of BOM etc. should be done differently, e.g. during coder 
initialization or in the flush() method.

> The problem is more human.  One would like to give credit for good ideas
> or good analysis, but the only official way to give credit in a commit
> message is
> via a simple
> Contributed-by: email-address
> which raises legal doubts even when there is no copyrighted material.
> I guess one can abuse the Summary: field to squeeze in thank-yous,
> but it's pretty obvious that you are circumventing the process.
>   

The last paragraph is difficult for me to understand in English. Could 
you please translate it?

-Ulf