RFR JDK-8147531: To add named character construct \N{...} to support Unicode name property
Xueming Shen
xueming.shen at oracle.com
Fri Jan 22 03:44:12 UTC 2016
On 1/19/16 11:43 AM, Martin Buchholz wrote:
> Many years ago I considered implementing this cool feature.
> I thought that few would find it worth the cost - it would be hard to
> keep the cost low if this feature is used only rarely. You might want
> an expiring cache of character name mappings, and the JDK doesn't have
> such a thing yet.
As a matter of fact, the compressed data file is about 130 KB in the file
system. The inflated runtime data for the name string table is about
700 KB, the cp->name lookup table is about 160 KB, and the name->cp lookup
mapping is about 400 KB+ (a little more space might be cut from the
homemade hashmap...). So the overall runtime cost of this "cool" feature
is roughly 1.2-1.3 MB. Yes, it's a little bigger than the zt_tw charset,
but considering that you get a round-trip mapping between all the code
points and their names, that might not be too expensive, given that a
"normal" pic now takes a couple of MB of memory.
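To make the round-trip point concrete, here is a minimal sketch using the public `Character.getName(int)` (Java 7+) and `Character.codePointOf(String)` (Java 9+) methods. It assumes JDK 9 or later and only exercises those two calls; it does not show how the webrev's tables back them.

```java
import java.util.Locale;

public final class NameRoundTrip {
    /** Returns the code point recovered from cp's Unicode name, or -1 if cp is unassigned. */
    static int roundTrip(int cp) {
        String name = Character.getName(cp);   // cp -> name; null for an unassigned code point
        if (name == null) {
            return -1;
        }
        return Character.codePointOf(name);    // name -> cp (Java 9+)
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        String name = Character.getName(cp);
        int back = roundTrip(cp);
        // prints "GRINNING FACE -> U+1F600"
        System.out.println(name + " -> U+" + Integer.toHexString(back).toUpperCase(Locale.ROOT));
    }
}
```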
How about you help take a look and see if we can squeeze out more space?
I really need a reviewer :-)
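One possible way to squeeze the name->cp table, sketched as a hypothetical (it is not the webrev's homemade hashmap, and `CompactNameIndex` is an illustrative name): store only code points in an open-addressing `int[]` and recover each key by calling the existing cp->name lookup, so the name strings are not held a second time.

```java
import java.util.function.IntFunction;

/**
 * Hypothetical sketch, not the webrev's code: a name -> code point index that
 * keeps only an int[] of code points. Keys are recovered through an existing
 * cp -> name lookup, so name strings are not stored a second time.
 */
public final class CompactNameIndex {
    private final int[] slots;                // codePoint + 1 per slot; 0 marks an empty slot
    private final int mask;                   // capacity - 1 (capacity is a power of two)
    private final IntFunction<String> nameOf; // assumed existing cp -> name lookup

    public CompactNameIndex(int[] codePoints, IntFunction<String> nameOf) {
        this.nameOf = nameOf;
        int cap = 2;
        while (cap < codePoints.length * 2) { // keep the load factor at or below 0.5
            cap <<= 1;
        }
        this.slots = new int[cap];
        this.mask = cap - 1;
        for (int cp : codePoints) {
            int i = hash(nameOf.apply(cp)) & mask;
            while (slots[i] != 0) {           // linear probing to the next free slot
                i = (i + 1) & mask;
            }
            slots[i] = cp + 1;
        }
    }

    /** Returns the code point named {@code name}, or -1 if it is not in the index. */
    public int lookup(String name) {
        int i = hash(name) & mask;
        while (slots[i] != 0) {
            int cp = slots[i] - 1;
            if (nameOf.apply(cp).equals(name)) { // compare against the recovered key
                return cp;
            }
            i = (i + 1) & mask;
        }
        return -1;
    }

    private static int hash(String s) {
        int h = s.hashCode();
        return h ^ (h >>> 16);                // spread high bits into the masked low bits
    }

    public static void main(String[] args) {
        int[] cps = {'A', 'a', 0x263A};
        CompactNameIndex index = new CompactNameIndex(cps, Character::getName);
        System.out.println(index.lookup("WHITE SMILING FACE") == 0x263A); // true
        System.out.println(index.lookup("NO SUCH NAME"));                 // -1
    }
}
```

The trade-off is that every probe calls the cp->name lookup, so name->cp lookups get slower in exchange for dropping the duplicate key storage; that may be acceptable for a construct that is used rarely.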
-sherman
>
> (I haven't actually reviewed the implementation)
> On Mon, Jan 18, 2016 at 11:52 PM, Xueming Shen <xueming.shen at oracle.com> wrote:
>> Hi,
>>
>> Please help review the change to add \N support in regex.
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8147531
>> webrev: http://cr.openjdk.java.net/~sherman/8147531/webrev
>>
>> This is one of the items we were planning to address via JEP 111
>> http://openjdk.java.net/jeps/111
>> https://bugs.openjdk.java.net/browse/JDK-8046101
>>
>> Some of the constructs had already been added in earlier releases. I'm
>> planning to address the rest as individual RFEs.
>>
>> Thanks,
>> Sherman
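A short usage sketch of the proposed construct, assuming it behaves as the `\N{name}` form later documented in `java.util.regex.Pattern` (JDK 9+): the escape stands for a single named character, so it can take a quantifier, and an unknown name is rejected when the pattern is compiled. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public final class NamedCharRegex {
    /** True if input is one or more WHITE SMILING FACE (U+263A) characters. */
    static boolean allSmiles(String input) {
        return Pattern.compile("\\N{WHITE SMILING FACE}+").matcher(input).matches();
    }

    /** True if the regex compiles; an unknown character name fails at compile time. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (IllegalArgumentException e) { // PatternSyntaxException is a subclass
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(allSmiles("\u263A\u263A"));               // true
        System.out.println(compiles("\\N{NO SUCH CHARACTER NAME}")); // false
    }
}
```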