Unicode script support in Regex and Character class
Ulf Zibis
Ulf.Zibis at gmx.de
Tue Apr 27 01:36:28 UTC 2010
On 24.04.2010 01:09, Xueming Shen wrote:
>
> I changed the data file "format" a bit, so now the overall uniName.dat
> is less than 88k (the last version was 122+k), but
> I can no longer use cpLen as the capacity for the hashmap. I'm now
> using a hardcoded 20000 for 5.2.
Again, is 88k the compressed or the uncompressed size?
>> -- Is it faster to first copy the whole data into a byte[] and then
>> use ByteBuffer.getInt etc., compared with using the DataInputStream
>> methods directly?
>> -- You could create one very long String with the whole data and then
>> use substring for the individual strings, which could then share the
>> same backing char[].
See attachment.
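
Independent of the attachment, the two quoted points can be illustrated
with a small sketch. It is not the code from CharacterName1.java; the
class name ReadSketch, the three-int entry layout (codePoint, nameOffset,
nameLength) and the two sample names are assumptions made only for this
example. It reads the same table once via DataInputStream and once via
ByteBuffer.getInt, and cuts the individual names out of one long String.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ReadSketch {
    // Assumption: all names concatenated into one String, no separators.
    static final String ALL_NAMES =
            "LATIN CAPITAL LETTER A" + "LATIN CAPITAL LETTER B";

    // Assumption: three big-endian ints per entry:
    // codePoint, nameOffset, nameLength.
    static byte[] sampleTable() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            int[] ints = {0x41, 0, 22, 0x42, 22, 22};
            for (int v : ints) {
                out.writeInt(v);
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Variant 1: pull the ints through DataInputStream.readInt().
    static int[] readWithStream(byte[] raw) {
        int[] result = new int[raw.length / 4];
        try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(raw))) {
            for (int i = 0; i < result.length; i++) {
                result[i] = in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // Variant 2: wrap the already loaded byte[] and use ByteBuffer.getInt().
    static int[] readWithBuffer(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw); // big-endian by default
        int[] result = new int[raw.length / 4];
        for (int i = 0; i < result.length; i++) {
            result[i] = buf.getInt();
        }
        return result;
    }

    // Cut one name out of the long String using the table entry.
    static String nameAt(int[] table, int entry) {
        int off = table[entry * 3 + 1];
        int len = table[entry * 3 + 2];
        return ALL_NAMES.substring(off, off + len);
    }

    public static void main(String[] args) {
        byte[] raw = sampleTable();
        int[] viaStream = readWithStream(raw);
        int[] viaBuffer = readWithBuffer(raw);
        System.out.println(Arrays.equals(viaStream, viaBuffer));
        System.out.println(nameAt(viaBuffer, 1));
    }
}
```

Both variants decode the same values, so which one is faster would have
to be measured on the real data file. Whether substring actually shares
the backing char[] depends on the JDK version in use.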
>> -- I don't think it's a good idea to hold the whole data in memory,
>> especially as String objects; additionally, the backing char[]s
>> occupy twice the space of a byte[].
>> -- The big new byte[total] and later the huge number of String
>> objects could result in an OOM error on a small VM heap.
>> -- As a compromise, you could put the cp->nameOff pointers in a
>> separate uncompressed data file, hold only this in memory or
>> access it via DirectByteBuffer, and read the string data from a
>> separate file only on request from Character.getName(int codePoint).
>> Optionally, a PreHashMap could cache individually loaded strings.
>> -- Anyway, having DirectByteBuffer access to deflated data would be a
>> performance/footprint gain.
>>
> Sorry, I don't think I fully understand your points here.
See above; I will try the others tomorrow.
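
To make the quoted compromise proposal more concrete, here is a minimal
sketch of the idea: only the code point and offset tables are held in
memory, the name bytes stay in a (direct) ByteBuffer and are decoded on
request, and decoded names are cached. The class name LazyNames, the
table layout and the sample data are assumptions for illustration only;
a plain HashMap stands in for the PreHashMap mentioned above, and a real
version would map the name data from a separate file instead of filling
the buffer by hand.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class LazyNames {
    private final int[] codePoints;  // sorted code points
    private final int[] offsets;     // offsets[i]..offsets[i + 1] = name bytes
    private final ByteBuffer nameData;
    private final Map<Integer, String> cache = new HashMap<>();

    LazyNames(int[] codePoints, int[] offsets, ByteBuffer nameData) {
        this.codePoints = codePoints;
        this.offsets = offsets;
        this.nameData = nameData;
    }

    // Decode a name only when it is asked for; return null if unknown.
    String getName(int codePoint) {
        String cached = cache.get(codePoint);
        if (cached != null) {
            return cached;
        }
        int i = Arrays.binarySearch(codePoints, codePoint);
        if (i < 0) {
            return null;
        }
        int start = offsets[i];
        byte[] raw = new byte[offsets[i + 1] - start];
        ByteBuffer view = nameData.duplicate(); // leave shared position alone
        view.position(start);
        view.get(raw);
        String name = new String(raw, StandardCharsets.US_ASCII);
        cache.put(codePoint, name);
        return name;
    }

    int cachedCount() {
        return cache.size();
    }

    // Hypothetical sample data; stands in for the two separate data files.
    static LazyNames sample() {
        String[] names = {"LATIN CAPITAL LETTER A", "LATIN CAPITAL LETTER B"};
        int[] cps = {0x41, 0x42};
        int[] offs = new int[names.length + 1];
        for (int i = 0; i < names.length; i++) {
            offs[i + 1] = offs[i] + names[i].length();
        }
        ByteBuffer data = ByteBuffer.allocateDirect(offs[names.length]);
        for (String n : names) {
            data.put(n.getBytes(StandardCharsets.US_ASCII));
        }
        data.flip();
        return new LazyNames(cps, offs, data);
    }

    public static void main(String[] args) {
        LazyNames names = sample();
        System.out.println(names.getName(0x42));
        System.out.println(names.cachedCount());
    }
}
```

With this shape, the heap only holds the two int[] tables plus whatever
names have actually been requested, which is the footprint argument made
in the quoted list.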
-Ulf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CharacterName1.java
Type: java/*
Size: 3571 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20100427/6e396367/CharacterName1.java>