<i18n dev> RFR JDK-7071819: To support Extended Grapheme Clusters in Regex

Peter Levart peter.levart at gmail.com
Wed Feb 3 11:26:43 UTC 2016

Hi Again,

I also have a comment on the implementation of the hash table and it's 
GC-ness. You keep the string pool under a SoftReference because it is 
the biggest object so it makes most sense to clear it when heap becomes 
full. Other arrays are kept strongly reachable, but you re-generate them 
nevertheless when string pool is cleared by GC. Would it make sense to:

- define all int[] arrays (including strPool) as final instance 
variables of CharacterName
- have one static field:

private static SoftReference<CharacterName> refInstance;

- convert initNamePool() into a private constructor.

- convert public static getName/getCodePoint into public instance methods

- introduce public static method:

public static CharacterName getInstance() {
     SoftReference<CharacterName> ref = refInstance;
     CharacterName instance;
     if (ref != null && ((instance = ref.get) != null) {
         return instance;
     instance = new CharacterName();
     refInstance = new SoftReference<>(instance);
     return instance;

...in this arrangement, you don't need volatile fields or explicit 
fences, as all instance fields can be final and JMM guarantees safe 
publication in this case.

Regards, Peter

On 02/03/2016 12:07 PM, Peter Levart wrote:
> Hi Sherman,
> I don't currently have any idea how to squeeze the hashtable any more 
> further. It is already very compact in my opinion. But I noticed a 
> data race that could result in navigating the half-initialized data 
> structure. It is a classical unsafe publication bug. It has been 
> present before in get(int cp) and it is present now in both 
> getName(int cp) and getCodePoint(String name). For example:
>  151     static int getCodePoint(String name) {
>  152         byte[] strPool = null;
>  153         if (refStrPool == null || (strPool = refStrPool.get()) == 
> null) {
>  154             strPool = initNamePool();
>  155         }
> vs.
> 111             refStrPool = new SoftReference<>(strPool);
> ...the static refStrPool field is not marked volatile.
> One way to fix this is to mark field volatile and then rearrange code 
> in getName/getCodePoint to only read from it once by introducing a 
> local var. The other would be to change the line 111 into something like:
> SoftReference<byte[]> rsp = new SoftReference<>(strPool);
> unsafe.storeFence();
> refStrPool = rsp;
> ...*and* also rearrange code in getName/getCodePoint to only read from 
> field once by introducing a local var.
> Regards, Peter
> On 02/02/2016 10:25 PM, Xueming Shen wrote:
>> Hi,
>> Have not heard any feedback on this one so far. I'm adding
>> a little more to make it attractive for reviewers :-)
>> On top of the \N now the webrev includes the proposal to add
>> two more matchers, \X for unicode extended grapheme cluster
>> and \b{g} for the corresponding boundary.
>> Issue: https://bugs.openjdk.java.net/browse/JDK-7071819
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8147531
>> webrev: http://cr.openjdk.java.net/~sherman/8147531_7071819/webrev/
>> Thanks!
>> Sherman
>> On 01/18/2016 11:52 PM, Xueming Shen wrote:
>>> Hi,
>>> Please help review the change to add \N support in regex.
>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8147531
>>> webrev: http://cr.openjdk.java.net/~sherman/8147531/webrev
>>> This is one of the items we were planning to address via JEP111
>>> http://openjdk.java.net/jeps/111
>>> https://bugs.openjdk.java.net/browse/JDK-8046101
>>> Some of the constructs had been added already in early release. I'm
>>> planning to address the rest as individual rfe separately.
>>> Thanks,
>>> Sherman

More information about the i18n-dev mailing list