Unicode script support in Regex and Character class
Ulf Zibis
Ulf.Zibis at gmx.de
Mon May 10 21:56:07 UTC 2010
Some additional thoughts:
- out.writeShort((short)(num & 0xffff)); can be shortened to
out.writeShort((short)num);
- use Arrays.binarySearch() in Character.UnicodeBlock.of().
- The "if (notFirst)" check could be dropped if you appended the first
word to sb before the while loop.
- StringBuilder sb could be initialized with the maximum name length (=83)
as its capacity to avoid resizing.
- Could we reuse the same StringBuilder for multiple invocations of
Character.getName(cp)?
-- make CharacterName.get(cp) an instance method and keep the
CharacterName object in a ThreadLocal used by Character.getName(cp).
-- or synchronize Character.getName(cp).
- Instead of using StringBuilder we could use ByteBuffer, omit the char[]
and build the final String with new String(bb.array(), "ASCII").
-- this saves the char[] for the pool, which is twice as big.
-- I imagine ByteBuffer would perform better than StringBuilder.
- store UnicodeBlocks, BlockStarts and scriptStarts in a file instead of
statically in the class file.
-- e.g. the initialization of scriptStarts is a big waste of bytecode
(7/11 bytes per short/integer entry).
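
To illustrate the first point above: a narrowing cast to short already keeps only the low 16 bits, so the explicit mask should be redundant. The following is only a minimal sketch to check that; the class and method names (WriteShortDemo, writeMasked, writeCast) are made up for this example and do not appear in the webrev.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK sources under review.
final class WriteShortDemo {

    /** Writes num using the explicit mask, as in the longer form. */
    static byte[] writeMasked(int num) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort((short) (num & 0xffff));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Writes num using only the narrowing cast, as in the short form. */
    static byte[] writeCast(int num) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort((short) num);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Sample values chosen to cover sign bit, overflow and negatives.
        int[] samples = {0, 1, 0x7fff, 0x8000, 0xffff, 0x12345, -1};
        for (int num : samples) {
            if (!Arrays.equals(writeMasked(num), writeCast(num))) {
                throw new AssertionError("forms differ for " + num);
            }
        }
        System.out.println("both forms wrote the same two bytes for all samples");
    }
}
```

Since DataOutputStream.writeShort takes an int and writes only its two low-order bytes, the cast could arguably be dropped as well.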
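
The Arrays.binarySearch() suggestion for Character.UnicodeBlock.of() could look roughly like the sketch below. It is not the JDK implementation: the class BlockLookupSketch, its tiny five-block table and the String block names are assumptions made only for illustration, and the real tables are much larger. The idea is to search a sorted array of block start code points and, on a miss, step back to the nearest start at or below the code point.

```java
import java.util.Arrays;

// Hypothetical sketch, not the actual Character.UnicodeBlock code.
final class BlockLookupSketch {

    // Sorted block start code points (small excerpt of the Unicode blocks).
    // The last entry is a sentinel marking the end of the excerpt.
    private static final int[] STARTS = {
        0x0000, 0x0080, 0x0100, 0x0180, 0x0250, 0x02B0
    };

    // NAMES[i] is the block beginning at STARTS[i]; null means "not covered".
    private static final String[] NAMES = {
        "BASIC_LATIN", "LATIN_1_SUPPLEMENT", "LATIN_EXTENDED_A",
        "LATIN_EXTENDED_B", "IPA_EXTENSIONS", null
    };

    /** Returns the block name for cp, or null if the excerpt has no entry. */
    static String blockOf(int cp) {
        if (cp < 0) {
            return null;
        }
        int idx = Arrays.binarySearch(STARTS, cp);
        if (idx < 0) {
            // Miss: binarySearch returns -(insertionPoint) - 1.
            // The containing block starts one slot before the insertion point.
            idx = -idx - 2;
        }
        return NAMES[idx];
    }

    public static void main(String[] args) {
        System.out.println(blockOf('A'));
        System.out.println(blockOf(0x00E9));
        System.out.println(blockOf(0x3000));
    }
}
```

Because STARTS[0] is 0 and negative code points are rejected up front, the adjusted index never becomes negative.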
On 08.05.2010 23:49, Xueming Shen wrote:
> Hi,
>
> The API proposals for Unicode script support below have been approved.
>
> 6945564: Unicode script support in Character class
> 6948903: Make Unicode scripts available for use in regular expressions
>
> (2) Testing suggests there is not much runtime benefit in keeping a
> huge string data pool plus an access hashmap for the getName()
> implementation. The latest implementation now takes Ulf's suggestion
> to keep a relatively small byte[] pool and generate the names at
> runtime.
> (There is an "even smaller" implementation, which consumes about 300K
> of memory at runtime
> http://cr.openjdk.java.net/~sherman/script/webrev.00/
> but it has a "scalability" problem that needs to be addressed when the
> string pool grows beyond 64k, and it is a little slow.)
I'm looking into that.
For a start, my string pool has a size of only 35243.
-Ulf