java-nio-charset-enhanced -- Milestone 4 is released

Ulf Zibis Ulf.Zibis at gmx.de
Fri Mar 27 22:44:54 UTC 2009


On 27.03.2009 22:49, Martin Buchholz wrote:
> Again, Ulf, I love the sort of stuff you're doing.
>   

Many thanks again for the kind words. :-)

> I hope to be able to contribute some engineering
> to your effort myself someday.
>
> In the meantime, we need some infrastructure to guarantee that
> the behavior of the charsets is completely unchanged as we optimize.
> I have some code left behind at Sun to do that, i.e. compare different
> JDKs w.r.t charset compatibility.
> Hopefully Sun engineers can resurrect that code and perhaps put it
> into a public mercurial repo somewhere.
>
> Another approach is to take the code in tests like my
> Find{En,De}coderBugs.java tests which compare direct
> vs. regular buffers, and retarget it to compare two different jdks.
>   

I have also written such a test for a full-scan comparison.
See CharsetsTest + LegacyCharset (it retrieves the legacy charsets by 
reflection directly from rt.jar of the patched JDK) here:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/test/sun/nio/cs/

It cost me several nights to get all code points to match, given my 
special mixture of range-limited direct maps and full-range indirect maps.
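
As an illustration of what such a full-scan comparison can look like, here is a minimal sketch. It is not the CharsetsTest or LegacyCharset code from the repository; the class and method names (SingleByteScan, decodeByte, countDifferences) are made up for this example, and it only compares two charsets that are both available in the running JVM, one input byte at a time. It uses only the public java.nio.charset API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of java-nio-charset-enhanced.
public class SingleByteScan {

    // Decodes a single byte strictly. Returns the decoded char,
    // or -1 if the byte is malformed or unmappable in this charset.
    static int decodeByte(Charset cs, int b) {
        try {
            CharBuffer out = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(new byte[] { (byte) b }));
            return out.length() == 1 ? out.charAt(0) : -1;
        } catch (CharacterCodingException e) {
            return -1;
        }
    }

    // Scans all 256 byte values and counts those that the two
    // charsets decode differently (including mapped vs. unmapped).
    static int countDifferences(Charset a, Charset b) {
        int diffs = 0;
        for (int i = 0; i < 256; i++) {
            if (decodeByte(a, i) != decodeByte(b, i)) {
                diffs++;
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        Charset ascii = Charset.forName("US-ASCII");
        System.out.println("differences: " + countDifferences(latin1, ascii));
    }
}
```

To compare an optimized charset against its legacy counterpart, the two Charset arguments would have to come from different implementations, for example by loading the legacy class reflectively as described above. That part is left out here.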

> It's too difficult to give credit to external contributors.
> One problem is that the Contributed-by: line is a red flag to
> lawyers and other folks that might cause the legality of the change
> to be questioned without end.  Let's try to get Ulf a proper commit bit
> and make sure the legal questions come to an end.
>   

Aren't "Contributed-by" and "author" comments usual practice in open 
source products?
Even in Sun's JRL sources the author was mentioned. I think the lawyers 
at Sun should rethink this subject.
OK, we will see ...

> Martin
>
> On Fri, Mar 27, 2009 at 13:29, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>   
>> Hi folks,
>>
>> milestone 4 of charset enhancement is released.
>>
>> - I reduced the jar footprint of all single-byte charsets to 7 % of the
>> original JDK 6 binaries, which should also speed up class loading (not to
>> forget: encoder maps are lazily initialized), even though 21 specialized
>> coder algorithms have been added.
>> - In this release there is only 1 class <SingleByteCharset> for all
>> single-byte charsets, which reads the decoder mapping + all names including
>> aliases from a small data file (69..731 bytes, average 250 bytes). This is
>> possible because numerous charsets can inherit their mappings (256 2-byte
>> chars) from each other, and empty or 1:1 ranges (especially \u0000..\u007F)
>> are filled in by the constructor.
>> - Additionally, a set of 7 Decoder and 14 Encoder classes do their work,
>> specially optimised for speed and memory for charsets with diverse
>> character distribution and frequency of occurrence. A special MapCalculator
>> class for playing with different parameters is provided in the test package.
>> - The aliases and historical names no longer need to be statically and
>> entirely loaded, provided and linked from the StandardCharsets class. In
>> addition, they can easily be edited in the files standard-charsets and
>> extended-charsets (see Bug Id: 6795538). If some day they are defined
>> entirely in upper case, they could be omitted completely, as they already
>> exist redundantly, in case-normalised form, in the FastCharsetProvider
>> lookup maps. Determining the 'contains()' references this way would also be
>> reasonable (see Bug Id: 6761481), but containment of ASCII is already
>> calculated automatically.
>>
>> See my project's home: https://java-nio-charset-enhanced.dev.java.net/
>>
>> I believe these techniques could also be used for most multi-byte charsets,
>> especially inheriting maps to reduce the footprint of entire charsets.
>>
>>
>> Outlook for Milestone 5: Final performance optimisation by dedicated
>> inlining, exception catching, surrogate handling etc.
>> Urgently waiting for Christian Thalinger's optimization of "widening
>> conversions".
>>
>>
>> Happy Easter,
>>
>> -Ulf
>>
>> P.S.: I'm in the process of providing changesets slice by slice for
>> OpenJDK 7.
>> BTW: Is there a way to add an author and/or contributor annotation in the
>> sources to honour the work of external collaborators (almost 1 year
>> in my case)?
>>
>>
>>
>>     
>
>
>   
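
For readers who want a concrete picture of the single-class design from the quoted milestone 4 announcement (one decoder map of 256 chars, 1:1 ranges filled by the constructor, mappings inherited from another charset, lazily built encoder map), here is a rough sketch. It is an assumption-laden illustration and not the actual SingleByteCharset code: the class name, the constructor parameters and the use of a HashMap for the encoder side are invented for this example, and the data-file format is left out entirely.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the SingleByteCharset class of the project.
public class SingleByteMapSketch {
    static final char UNMAPPED = '\uFFFD';

    private final char[] b2c = new char[256]; // decoder map, one char per byte
    private Map<Character, Byte> c2b;         // encoder map, built on first use

    // parent == null: start with a 1:1 ASCII range and an unmapped upper half.
    // Otherwise the parent's whole map is inherited first.
    // 'overrides' then replaces the entries starting at byte value 'offset'.
    SingleByteMapSketch(SingleByteMapSketch parent, int offset, String overrides) {
        if (parent != null) {
            System.arraycopy(parent.b2c, 0, b2c, 0, 256);
        } else {
            for (int i = 0; i < 0x80; i++) {
                b2c[i] = (char) i;            // 1:1 range U+0000..U+007F
            }
            Arrays.fill(b2c, 0x80, 256, UNMAPPED);
        }
        overrides.getChars(0, overrides.length(), b2c, offset);
    }

    char decode(byte b) {
        return b2c[b & 0xFF];
    }

    // Returns the byte value (0..255), or -1 if the char is unmappable.
    int encode(char c) {
        if (c2b == null) {                    // lazy initialization
            Map<Character, Byte> m = new HashMap<>();
            for (int i = 0; i < 256; i++) {
                if (b2c[i] != UNMAPPED) {
                    m.put(b2c[i], (byte) i);
                }
            }
            c2b = m;
        }
        Byte b = c2b.get(c);
        return b == null ? -1 : b & 0xFF;
    }

    // Example base map: bytes 0xA0..0xFF map 1:1, as in ISO-8859-1.
    static SingleByteMapSketch latin1Like() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0xA0; i < 0x100; i++) {
            sb.append((char) i);
        }
        return new SingleByteMapSketch(null, 0xA0, sb.toString());
    }
}
```

A derived map then only needs to store its differences. For example, new SingleByteMapSketch(SingleByteMapSketch.latin1Like(), 0xA4, "\u20AC") inherits everything from the base map and replaces only byte 0xA4 with the euro sign, which is one of the several positions where ISO-8859-15 differs from ISO-8859-1.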




More information about the core-libs-dev mailing list