<i18n dev> RFR JDK-7071819: To support Extended Grapheme Clusters in Regex

Xueming Shen xueming.shen at oracle.com
Wed Feb 3 17:09:39 UTC 2016


Hi Peter,

Thanks for the review and suggestion.

This appears to be a better approach. I was wondering if I should go 
this way to cache
those lookup tables as well, but ...

Webrev has been updated as suggested.

Thanks!
Sherman

On 2/3/16 3:26 AM, Peter Levart wrote:
> Hi Again,
>
> I also have a comment on the implementation of the hash table and it's 
> GC-ness. You keep the string pool under a SoftReference because it is 
> the biggest object so it makes most sense to clear it when heap 
> becomes full. Other arrays are kept strongly reachable, but you 
> re-generate them nevertheless when string pool is cleared by GC. Would 
> it make sense to:
>
> - define all int[] arrays (including strPool) as final instance 
> variables of CharacterName
> - have one static field:
>
> private static SoftReference<CharacterName> refInstance;
>
> - convert initNamePool() into a private constructor.
>
> - convert public static getName/getCodePoint into public instance methods
>
> - introduce public static method:
>
> public static CharacterName getInstance() {
>     SoftReference<CharacterName> ref = refInstance;
>     CharacterName instance;
>     if (ref != null && ((instance = ref.get) != null) {
>         return instance;
>     }
>     instance = new CharacterName();
>     refInstance = new SoftReference<>(instance);
>     return instance;
> }
>
> ...in this arrangement, you don't need volatile fields or explicit 
> fences, as all instance fields can be final and JMM guarantees safe 
> publication in this case.
>
>
> Regards, Peter
>
> On 02/03/2016 12:07 PM, Peter Levart wrote:
>> Hi Sherman,
>>
>> I don't currently have any idea how to squeeze the hashtable any more 
>> further. It is already very compact in my opinion. But I noticed a 
>> data race that could result in navigating the half-initialized data 
>> structure. It is a classical unsafe publication bug. It has been 
>> present before in get(int cp) and it is present now in both 
>> getName(int cp) and getCodePoint(String name). For example:
>>
>>  151     static int getCodePoint(String name) {
>>  152         byte[] strPool = null;
>>  153         if (refStrPool == null || (strPool = refStrPool.get()) 
>> == null) {
>>  154             strPool = initNamePool();
>>  155         }
>>
>> vs.
>>
>> 111             refStrPool = new SoftReference<>(strPool);
>>
>> ...the static refStrPool field is not marked volatile.
>>
>> One way to fix this is to mark field volatile and then rearrange code 
>> in getName/getCodePoint to only read from it once by introducing a 
>> local var. The other would be to change the line 111 into something like:
>>
>> SoftReference<byte[]> rsp = new SoftReference<>(strPool);
>> unsafe.storeFence();
>> refStrPool = rsp;
>>
>> ...*and* also rearrange code in getName/getCodePoint to only read 
>> from field once by introducing a local var.
>>
>>
>> Regards, Peter
>>
>> On 02/02/2016 10:25 PM, Xueming Shen wrote:
>>> Hi,
>>>
>>> Have not heard any feedback on this one so far. I'm adding
>>> a little more to make it attractive for reviewers :-)
>>>
>>> On top of the \N now the webrev includes the proposal to add
>>> two more matchers, \X for unicode extended grapheme cluster
>>> and \b{g} for the corresponding boundary.
>>>
>>> Issue: https://bugs.openjdk.java.net/browse/JDK-7071819
>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8147531
>>> webrev: http://cr.openjdk.java.net/~sherman/8147531_7071819/webrev/
>>>
>>> Thanks!
>>> Sherman
>>>
>>> On 01/18/2016 11:52 PM, Xueming Shen wrote:
>>>> Hi,
>>>>
>>>> Please help review the change to add \N support in regex.
>>>>
>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8147531
>>>> webrev: http://cr.openjdk.java.net/~sherman/8147531/webrev
>>>>
>>>> This is one of the items we were planning to address via JEP111
>>>> http://openjdk.java.net/jeps/111
>>>> https://bugs.openjdk.java.net/browse/JDK-8046101
>>>>
>>>> Some of the constructs had been added already in early release. I'm
>>>> planning to address the rest as individual rfe separately.
>>>>
>>>> Thanks,
>>>> Sherman
>>>
>>
>



More information about the i18n-dev mailing list