Unicode script support in Regex and Character class

Ulf Zibis Ulf.Zibis at gmx.de
Tue Apr 27 01:36:28 UTC 2010


On 24.04.2010 01:09, Xueming Shen wrote:
>
> I changed the data file "format" a bit, so now the overall uniName.dat 
> is less than 88k (the last version was 122+k), but
> I can no longer use cpLen as the capacity for the hashmap. I'm now 
> using a hardcoded 20000 for 5.2.

Again, is 88k the compressed or the uncompressed size?

>> -- Is it faster to first copy the whole data into a byte[] and then 
>> use ByteBuffer.getInt etc., compared with using the DataInputStream 
>> methods directly?
>> -- You could create one very long String with the whole data and then 
>> use substring for the individual strings, which could share the same 
>> backing char[].

See attachment.
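
To make the first question concrete, here is a minimal sketch of the two read paths being compared. It is not the content of the attached CharacterName1.java; the class and method names (ReadPaths, viaStream, viaBuffer) are made up for illustration, and it only shows that both paths decode the same big-endian ints. Which one is faster would still have to be measured.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only.
class ReadPaths {

    // Path A: decode ints one by one through DataInputStream.
    static int[] viaStream(byte[] raw) {
        int[] out = new int[raw.length / 4];
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(raw))) {
            for (int i = 0; i < out.length; i++) {
                out[i] = in.readInt();          // big-endian
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // Path B: wrap the already loaded byte[] and decode with ByteBuffer.
    static int[] viaBuffer(byte[] raw) {
        ByteBuffer bb = ByteBuffer.wrap(raw);   // big-endian by default
        int[] out = new int[raw.length / 4];
        for (int i = 0; i < out.length; i++) {
            out[i] = bb.getInt();
        }
        return out;
    }
}
```

Both methods return the same values for the same input, so a benchmark can swap one for the other without touching the rest of the loader.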

>> -- I don't think it's a good idea to hold the whole data in memory, 
>> especially as String objects; in addition, the backing char[]s 
>> take twice the space of a byte[].
>> -- The big new byte[total] and later the huge number of String 
>> objects could result in an OOM error on a small VM heap.
>> -- As a compromise, you could put the cp->nameOff pointers in a 
>> separate, uncompressed data file, hold only this in memory or 
>> access it via DirectByteBuffer, and read the string data from a 
>> separate file only on request from Character.getName(int codePoint). 
>> As an option, a PreHashMap could cache individually loaded strings.
>> -- Anyway, having DirectByteBuffer access on deflated data would be a 
>> performance/footprint gain.
>>
> Sorry, I don't think I fully understand your points here.

See above; I'll try the others tomorrow.
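
For the compromise quoted above (keep only the cp->nameOff pointers in memory and materialize names on request), a rough sketch could look like the following. Everything in it is an assumption for illustration: the class name LazyNameTable, the data layout (one length byte followed by ASCII bytes), and a plain HashMap standing in as the cache. The ByteBuffer passed in could be a heap, direct, or mapped buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the JDK implementation.
class LazyNameTable {
    private final int[] codePoints;  // sorted code points that have a name
    private final int[] offsets;     // offsets[i]: start of name for codePoints[i]
    private final ByteBuffer names;  // assumed layout: length byte + ASCII bytes
    private final Map<Integer, String> cache = new HashMap<>();

    LazyNameTable(int[] codePoints, int[] offsets, ByteBuffer names) {
        this.codePoints = codePoints;
        this.offsets = offsets;
        this.names = names;
    }

    // Returns the name for cp, or null if the table has no entry.
    String getName(int cp) {
        String cached = cache.get(cp);
        if (cached != null) {
            return cached;
        }
        int i = Arrays.binarySearch(codePoints, cp);
        if (i < 0) {
            return null;
        }
        int off = offsets[i];
        int len = names.get(off) & 0xFF;         // absolute read, unsigned length
        byte[] buf = new byte[len];
        for (int k = 0; k < len; k++) {
            buf[k] = names.get(off + 1 + k);     // absolute reads, position untouched
        }
        String name = new String(buf, StandardCharsets.US_ASCII);
        cache.put(cp, name);                     // only requested names become Strings
        return name;
    }
}
```

With this shape, only the two int[] tables stay resident, and a String is created only for code points that are actually asked for.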

-Ulf

-------------- next part --------------
A non-text attachment was scrubbed...
Name: CharacterName1.java
Type: java/*
Size: 3571 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20100427/6e396367/CharacterName1.java>

