Rewrite of IBM doublebyte charsets

Ulf Zibis Ulf.Zibis at gmx.de
Mon May 18 13:54:46 UTC 2009


On 17.05.2009 23:00, Xueming Shen wrote:
> Ulf Zibis wrote:
>> *** Encoder-Suggestions:
>>
>> (26) Why copy the Strings to char[] in initC2B()? String access should
>> be just as fast:
>>      - char[] sb = b2cSB.toCharArray();
>>      - char[] db = b2c[i].toCharArray();
>>
>> -Ulf
>>
>>
>
> Because the b2c tables need to be updated before they are used to
> generate the c2b tables if there is a b2cNR table. (A b2cNR table means
> that multiple "bytes" are mapped to the same "char"; when doing c->b, we
> need to know which "bytes" to map to, and this is specified in the .nr
> map.) In theory we only need to do that if b2cNR is present, but I don't
> want to keep two paths. A possible optimization is to pass in char[]
> instead of String, and then make a copy only when necessary.
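The "pass in char[] and copy only when necessary" idea could look roughly like the following. This is a minimal sketch, not the actual JDK code: it assumes single-byte tables only, U+FFFD as the "unmappable" marker, and a hypothetical flat char[] of (byte, char) pairs for b2cNR. The class name LazyCopyC2B is made up for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch: single-byte tables only, U+FFFD as the
// "unmappable" marker, b2cNR as a flat char[] of (byte, char) pairs.
class LazyCopyC2B {
    static final char UNMAPPABLE = '\uFFFD';

    // Builds a char -> byte table from b2c. The caller's b2c array is
    // shared as-is unless b2cNR forces us to modify a private copy.
    static char[] initC2B(char[] b2c, char[] b2cNR) {
        char[] table = b2c;
        if (b2cNR != null) {
            table = b2c.clone();                 // copy only on this path
            for (int i = 0; i + 1 < b2cNR.length; i += 2) {
                char b = b2cNR[i];
                char c = b2cNR[i + 1];
                if (table[b] == c) {
                    table[b] = UNMAPPABLE;       // hide the NR byte from c->b
                }
            }
        }
        char[] c2b = new char[0x10000];
        Arrays.fill(c2b, UNMAPPABLE);
        for (int b = 0; b < table.length; b++) {
            char c = table[b];
            if (c != UNMAPPABLE) {
                c2b[c] = (char) b;               // a later byte overwrites an earlier one
            }
        }
        return c2b;
    }

    public static void main(String[] args) {
        char[] b2c = new char[256];
        Arrays.fill(b2c, UNMAPPABLE);
        b2c[0x15] = '\n';
        b2c[0x25] = '\n';
        char[] nr = { (char) 0x25, '\n' };      // "25 --> 000A" from *.nr
        char[] c2b = initC2B(b2c, nr);
        System.out.println(Integer.toHexString(c2b['\n'])); // prints 15
    }
}
```

Without the NR entry, the ascending loop would let byte 25 overwrite byte 15 for U+000A, which is exactly why the b2c entry has to be masked before inverting it.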

Oops, yes; it was late, after hours of thinking digitally.

While thinking about why I didn't have this problem in my code: I didn't
have to manipulate the b2c map, because I transformed all the NRs into the
*.irregularities map file (which you called *.c2b). That file is in fact an
overlay that overwrites entries of the c2b map generated from b2c. (BTW, in
*.nr the 2nd value is redundant, since it can be looked up in *.map, and
could be dropped.)
So if we have
    15 --> 000A
    25 --> 000A
in *.map, instead of
    25 (--> 000A)
in *.nr, we could have
    15 <-- 000A
in *.c2b
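A rough sketch of that transformation: given the b2c map and only the byte values listed in *.nr (the char is taken from b2c, which is why the 2nd value in *.nr is redundant), emit the overriding *.c2b entries. The class NrToC2b, the int[] input and the output format are hypothetical illustrations, and the sketch assumes every NR byte has exactly one round-trip partner byte; if none exists, it emits nothing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical converter from *.nr entries to *.c2b overrides.
class NrToC2b {
    static final char UNMAPPABLE = '\uFFFD';

    // Returns char -> byte overrides, e.g. 000A -> 15 for NR byte 25.
    static Map<Character, Integer> toC2bOverrides(char[] b2c, int[] nrBytes) {
        Set<Integer> nr = new HashSet<>();
        for (int b : nrBytes) nr.add(b);
        Map<Character, Integer> overrides = new LinkedHashMap<>();
        for (int b : nrBytes) {
            char c = b2c[b];                     // the char comes from *.map
            for (int other = 0; other < b2c.length; other++) {
                // the round-trip byte maps to the same char and is not NR itself
                if (b2c[other] == c && !nr.contains(other)) {
                    overrides.put(c, other);
                    break;
                }
            }
            // assumption: if no round-trip byte exists, nothing is emitted
        }
        return overrides;
    }

    public static void main(String[] args) {
        char[] b2c = new char[256];
        Arrays.fill(b2c, UNMAPPABLE);
        b2c[0x15] = '\n';                        // 15 --> 000A
        b2c[0x25] = '\n';                        // 25 --> 000A
        for (Map.Entry<Character, Integer> e
                : toC2bOverrides(b2c, new int[] { 0x25 }).entrySet()) {
            System.out.println(String.format("%02X <-- %04X",
                    e.getValue(), (int) e.getKey().charValue())); // prints 15 <-- 000A
        }
    }
}
```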

So avoiding the copy of the whole b2c map is an additional, serious
argument for my suggestion (21), which I must correct:

(21) Join the *.nr files into the *.c2b files (25->000a becomes 000a->15):
   Benefit[21]: reduces the number of files
   Benefit[22]: simplifies initC2B() (avoids 2 loops and the copy of the
whole b2c map)
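With suggestion (21), initC2B() would only read b2c and then apply the *.c2b overlay, roughly like this. It is a minimal sketch under the same assumptions as before (single-byte tables, U+FFFD as the "unmappable" marker); the class OverlayC2B and the flat (char, byte) pair array for the *.c2b entries are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of initC2B() when *.nr is folded into *.c2b.
class OverlayC2B {
    static final char UNMAPPABLE = '\uFFFD';

    // c2bOverrides: flat (char, byte) pairs read from *.c2b, may be null.
    static char[] initC2B(char[] b2c, char[] c2bOverrides) {
        char[] c2b = new char[0x10000];
        Arrays.fill(c2b, UNMAPPABLE);
        // b2c is only read, never modified, so no defensive copy is needed
        for (int b = 0; b < b2c.length; b++) {
            char c = b2c[b];
            if (c != UNMAPPABLE) {
                c2b[c] = (char) b;
            }
        }
        // *.c2b entries overwrite whatever the inverted b2c produced
        if (c2bOverrides != null) {
            for (int i = 0; i + 1 < c2bOverrides.length; i += 2) {
                c2b[c2bOverrides[i]] = c2bOverrides[i + 1];
            }
        }
        return c2b;
    }

    public static void main(String[] args) {
        char[] b2c = new char[256];
        Arrays.fill(b2c, UNMAPPABLE);
        b2c[0x15] = '\n';
        b2c[0x25] = '\n';
        char[] c2bFile = { '\n', (char) 0x15 };  // "000a->15" from *.c2b
        char[] c2b = initC2B(b2c, c2bFile);
        System.out.println(Integer.toHexString(c2b['\n'])); // prints 15
    }
}
```

Decoding is unaffected in both approaches: bytes 15 and 25 still both decode to U+000A, only the c->b direction is pinned to 15.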

-Ulf







More information about the core-libs-dev mailing list