[12] RFR: 8215194: Initial size of UnicodeBlock map is incorrect

Naoto Sato

11 Dec 2018 11 Dec '18

2:51 p.m.

Hi, Please review the fix for the following issue: https://bugs.openjdk.java.net/browse/JDK-8215194 The proposed fix is located at: http://cr.openjdk.java.net/~naoto/8215194/webrev.00/ This one line fix is for the correctness of the initial map size of Character.UnicodeBlock. Naoto

Show replies by date

Hi Naoto, Since the value changes from time to time, it would give it some visibility if it were defined using a private final int (or float) private final int MAP_CAPACITY = 667; Though I suppose the test can't use the value without using reflection. But it would lower the maintenance in the long term. $.02, Roger On 12/11/2018 09:51 AM, Naoto Sato wrote:

...

Hi,

Please review the fix for the following issue:

https://bugs.openjdk.java.net/browse/JDK-8215194

The proposed fix is located at:

http://cr.openjdk.java.net/~naoto/8215194/webrev.00/

This one line fix is for the correctness of the initial map size of Character.UnicodeBlock.

Naoto

Naoto Sato

5:59 p.m.

Hi Roger, Thanks. I updated it as suggested (incl. test using reflection): http://cr.openjdk.java.net/~naoto/8215194/webrev.01/ Naoto On 12/11/18 7:57 AM, Roger Riggs wrote:

...

Hi Naoto,

Since the value changes from time to time, it would give it some visibility if it were defined using a private final int (or float) private final int MAP_CAPACITY = 667;

Though I suppose the test can't use the value without using reflection. But it would lower the maintenance in the long term.

$.02, Roger

On 12/11/2018 09:51 AM, Naoto Sato wrote:

...
Hi,

Please review the fix for the following issue:

https://bugs.openjdk.java.net/browse/JDK-8215194

The proposed fix is located at:

http://cr.openjdk.java.net/~naoto/8215194/webrev.00/

This one line fix is for the correctness of the initial map size of Character.UnicodeBlock.

Naoto

Roger Riggs

6:12 p.m.

Hi Naoto, Looks ok, I intended only the number of elements to be defined as a constant. The other factors can be hard coded. In the test, you will still have to edit the test when the number changes. I meant to avoid that edit. Though then may be there is not need for the test at all. Roger On 12/11/2018 12:59 PM, Naoto Sato wrote:

...

Hi Roger,

Thanks. I updated it as suggested (incl. test using reflection):

http://cr.openjdk.java.net/~naoto/8215194/webrev.01/

Naoto

On 12/11/18 7:57 AM, Roger Riggs wrote:

...
Hi Naoto,

Since the value changes from time to time, it would give it some visibility if it were defined using a private final int (or float) private final int MAP_CAPACITY = 667;

Though I suppose the test can't use the value without using reflection. But it would lower the maintenance in the long term.

$.02, Roger

On 12/11/2018 09:51 AM, Naoto Sato wrote:

...
Hi,

Please review the fix for the following issue:

https://bugs.openjdk.java.net/browse/JDK-8215194

The proposed fix is located at:

http://cr.openjdk.java.net/~naoto/8215194/webrev.00/

This one line fix is for the correctness of the initial map size of Character.UnicodeBlock.

Naoto

Naoto Sato

6:33 p.m.

Hi Roger, On 12/11/18 10:12 AM, Roger Riggs wrote:

...

Hi Naoto,

Looks ok,

I intended only the number of elements to be defined as a constant. The other factors can be hard coded.

OK, I revised it again: http://cr.openjdk.java.net/~naoto/8215194/webrev.02/

...

In the test, you will still have to edit the test when the number changes. I meant to avoid that edit. Though then may be there is not need for the test at all.

Yes, if we just reflectively use the constant in Character.UnicodeBlock in the test, it is guaranteed to succeed no matter what. Thus I added the assertion. Still, the test ofHashMap() succeeds till the block additions surpasses that 1024 capacity (thus this was overlooked on Unicode 11 upgrade). Naoto

...

Roger

On 12/11/2018 12:59 PM, Naoto Sato wrote:

...
Hi Roger,

Thanks. I updated it as suggested (incl. test using reflection):

http://cr.openjdk.java.net/~naoto/8215194/webrev.01/

Naoto

On 12/11/18 7:57 AM, Roger Riggs wrote:

...
Hi Naoto,

Since the value changes from time to time, it would give it some visibility if it were defined using a private final int (or float) private final int MAP_CAPACITY = 667;

Though I suppose the test can't use the value without using reflection. But it would lower the maintenance in the long term.

$.02, Roger

On 12/11/2018 09:51 AM, Naoto Sato wrote:

...
Hi,

Please review the fix for the following issue:

https://bugs.openjdk.java.net/browse/JDK-8215194

The proposed fix is located at:

http://cr.openjdk.java.net/~naoto/8215194/webrev.00/

This one line fix is for the correctness of the initial map size of Character.UnicodeBlock.

Naoto

Ivan Gerasimov

6:49 p.m.

Hi Naoto! The fix looks good, thank you! I wonder if the test can be updated to also verify the optimal capacity of HashMap<String, Character.UnicodeScript> aliases; On 12/11/18 10:33 AM, Naoto Sato wrote:

...

Hi Roger,

On 12/11/18 10:12 AM, Roger Riggs wrote:

...
Hi Naoto,

Looks ok,

I intended only the number of elements to be defined as a constant. The other factors can be hard coded.

OK, I revised it again:

http://cr.openjdk.java.net/~naoto/8215194/webrev.02/

...
In the test, you will still have to edit the test when the number changes. I meant to avoid that edit. Though then may be there is not need for the test at all.

Yes, if we just reflectively use the constant in Character.UnicodeBlock in the test, it is guaranteed to succeed no matter what. OptimalCapacity.ofHashMap uses the initialCapacity to verify that a HashMap created with that initialCapacity will not reallocate its internal array after inserting the factual number of elements.

So, if one day the number of entities is increased, but the constant NUM_ENTITIES is not updated, then the test will catch it. With kind regards, Ivan

...

Thus I added the assertion. Still, the test ofHashMap() succeeds till the block additions surpasses that 1024 capacity (thus this was overlooked on Unicode 11 upgrade).

Naoto

...
Roger

On 12/11/2018 12:59 PM, Naoto Sato wrote:

...
Hi Roger,

Thanks. I updated it as suggested (incl. test using reflection):

http://cr.openjdk.java.net/~naoto/8215194/webrev.01/

Naoto

On 12/11/18 7:57 AM, Roger Riggs wrote:

...
Hi Naoto,

Since the value changes from time to time, it would give it some visibility if it were defined using a private final int (or float) private final int MAP_CAPACITY = 667;

Though I suppose the test can't use the value without using reflection. But it would lower the maintenance in the long term.

$.02, Roger

On 12/11/2018 09:51 AM, Naoto Sato wrote:

...
Hi,

Please review the fix for the following issue:

https://bugs.openjdk.java.net/browse/JDK-8215194

The proposed fix is located at:

http://cr.openjdk.java.net/~naoto/8215194/webrev.00/

This one line fix is for the correctness of the initial map size of Character.UnicodeBlock.

Naoto

-- With kind regards, Ivan Gerasimov

Naoto Sato

6:59 p.m.

Hi Ivan, Thank you for the review. On 12/11/18 10:49 AM, Ivan Gerasimov wrote:

...

OptimalCapacity.ofHashMap uses the initialCapacity to verify that a HashMap created with that initialCapacity will not reallocate its internal array after inserting the factual number of elements.

So, if one day the number of entities is increased, but the constant NUM_ENTITIES is not updated, then the test will catch it.

Actually the reason I am fixing it is that, the test did not detect the discrepancy with Unicode 11 upgrade, where it just increased the entry numbers by the true block increases (11 blocks), without adding their aliases. --- --- a/src/java.base/share/classes/java/lang/Character.java Wed Nov 14 13:15:54 2018 +0100 +++ b/src/java.base/share/classes/java/lang/Character.java Wed Nov 21 14:24:31 2018 +0530 @@ -43,7 +43,7 @@ * a character's category (lowercase letter, digit, etc.) and for converting * characters from uppercase to lowercase and vice versa. * - * Character information is based on the Unicode Standard, version 10.0.0. + * Character information is based on the Unicode Standard, version 11.0.0. * * The methods and data of class {@code Character} are defined by * the information in the UnicodeData file that is part of the @@ -681,11 +681,11 @@ */ public static final class UnicodeBlock extends Subset { /** - * 638 - the expected number of entities + * 649 - the expected number of entities * 0.75 - the default load factor of HashMap */ private static Map<String, UnicodeBlock> map = - new HashMap<>((int)(638 / 0.75f + 1.0f)); + new HashMap<>((int)(649 / 0.75f + 1.0f)); /** * Creates a UnicodeBlock with the given identifier name. --- However, the HashMap allocates the entry size to the nearest power of two number, i.e., 1024 for both numbers 649 and 667. Thus the test passes even if there is discrepancy. Naoto

Claes Redestad

10:31 p.m.

Hi Naoto, while the here fix looks good, I can't help wondering why the map isn't final? /Claes On 2018-12-11 19:59, Naoto Sato wrote:

...

Hi Ivan,

Thank you for the review.

On 12/11/18 10:49 AM, Ivan Gerasimov wrote:

...
OptimalCapacity.ofHashMap uses the initialCapacity to verify that a HashMap created with that initialCapacity will not reallocate its internal array after inserting the factual number of elements.

So, if one day the number of entities is increased, but the constant NUM_ENTITIES is not updated, then the test will catch it.

Actually the reason I am fixing it is that, the test did not detect the discrepancy with Unicode 11 upgrade, where it just increased the entry numbers by the true block increases (11 blocks), without adding their aliases.

---

--- a/src/java.base/share/classes/java/lang/Character.java Wed Nov 14 13:15:54 2018 +0100 +++ b/src/java.base/share/classes/java/lang/Character.java Wed Nov 21 14:24:31 2018 +0530 @@ -43,7 +43,7 @@ * a character's category (lowercase letter, digit, etc.) and for converting * characters from uppercase to lowercase and vice versa. * - * Character information is based on the Unicode Standard, version 10.0.0. + * Character information is based on the Unicode Standard, version 11.0.0. * * The methods and data of class {@code Character} are defined by * the information in the UnicodeData file that is part of the @@ -681,11 +681,11 @@ */ public static final class UnicodeBlock extends Subset { /** - * 638 - the expected number of entities + * 649 - the expected number of entities * 0.75 - the default load factor of HashMap */ private static Map<String, UnicodeBlock> map = - new HashMap<>((int)(638 / 0.75f + 1.0f)); + new HashMap<>((int)(649 / 0.75f + 1.0f));

/** * Creates a UnicodeBlock with the given identifier name. ---

However, the HashMap allocates the entry size to the nearest power of two number, i.e., 1024 for both numbers 649 and 667. Thus the test passes even if there is discrepancy.

Naoto

Naoto Sato

10:29 p.m.

Hi Claes, I don't see any particular reason for that. In fact, I was wondering the same thing, but just left it as it was. Since I've already pushed the changeset, I will fix it on the next occasion. Naoto On 12/11/18 2:31 PM, Claes Redestad wrote:

...

Hi Naoto,

while the here fix looks good, I can't help wondering why the map isn't final?

/Claes

On 2018-12-11 19:59, Naoto Sato wrote:

...
Hi Ivan,

Thank you for the review.

On 12/11/18 10:49 AM, Ivan Gerasimov wrote:

...
OptimalCapacity.ofHashMap uses the initialCapacity to verify that a HashMap created with that initialCapacity will not reallocate its internal array after inserting the factual number of elements.

So, if one day the number of entities is increased, but the constant NUM_ENTITIES is not updated, then the test will catch it.

Actually the reason I am fixing it is that, the test did not detect the discrepancy with Unicode 11 upgrade, where it just increased the entry numbers by the true block increases (11 blocks), without adding their aliases.

---

--- a/src/java.base/share/classes/java/lang/Character.java Wed Nov 14 13:15:54 2018 +0100 +++ b/src/java.base/share/classes/java/lang/Character.java Wed Nov 21 14:24:31 2018 +0530 @@ -43,7 +43,7 @@ * a character's category (lowercase letter, digit, etc.) and for converting * characters from uppercase to lowercase and vice versa. * - * Character information is based on the Unicode Standard, version 10.0.0. + * Character information is based on the Unicode Standard, version 11.0.0. * * The methods and data of class {@code Character} are defined by * the information in the UnicodeData file that is part of the @@ -681,11 +681,11 @@ */ public static final class UnicodeBlock extends Subset { /** - * 638 - the expected number of entities + * 649 - the expected number of entities * 0.75 - the default load factor of HashMap */ private static Map<String, UnicodeBlock> map = - new HashMap<>((int)(638 / 0.75f + 1.0f)); + new HashMap<>((int)(649 / 0.75f + 1.0f));

/** * Creates a UnicodeBlock with the given identifier name. ---

However, the HashMap allocates the entry size to the nearest power of two number, i.e., 1024 for both numbers 649 and 667. Thus the test passes even if there is discrepancy.

Naoto

Roger Riggs

7:04 p.m.

Looks fine. Thanks, Roger On 12/11/2018 01:33 PM, Naoto Sato wrote:

...

Hi Roger,

On 12/11/18 10:12 AM, Roger Riggs wrote:

...
Hi Naoto,

Looks ok,

I intended only the number of elements to be defined as a constant. The other factors can be hard coded.

OK, I revised it again:

http://cr.openjdk.java.net/~naoto/8215194/webrev.02/

...
In the test, you will still have to edit the test when the number changes. I meant to avoid that edit. Though then may be there is not need for the test at all.

Yes, if we just reflectively use the constant in Character.UnicodeBlock in the test, it is guaranteed to succeed no matter what. Thus I added the assertion. Still, the test ofHashMap() succeeds till the block additions surpasses that 1024 capacity (thus this was overlooked on Unicode 11 upgrade).

Naoto

...
Roger

On 12/11/2018 12:59 PM, Naoto Sato wrote:

...
Hi Roger,

Thanks. I updated it as suggested (incl. test using reflection):

http://cr.openjdk.java.net/~naoto/8215194/webrev.01/

Naoto

On 12/11/18 7:57 AM, Roger Riggs wrote:

...
Hi Naoto,

Since the value changes from time to time, it would give it some visibility if it were defined using a private final int (or float) private final int MAP_CAPACITY = 667;

Though I suppose the test can't use the value without using reflection. But it would lower the maintenance in the long term.

$.02, Roger

On 12/11/2018 09:51 AM, Naoto Sato wrote:

...
Hi,

Please review the fix for the following issue:

https://bugs.openjdk.java.net/browse/JDK-8215194

The proposed fix is located at:

http://cr.openjdk.java.net/~naoto/8215194/webrev.00/

This one line fix is for the correctness of the initial map size of Character.UnicodeBlock.

Naoto

Rachna Goel

4:49 p.m.

New subject: <i18n dev> [12] RFR: 8215194: Initial size of UnicodeBlock map is incorrect

Hi Naoto, Thanks for fixing this. Your fix looks good to me. Thanks, Rachna On 12/11/18 8:21 PM, Naoto Sato wrote:

...

Hi,

Please review the fix for the following issue:

https://bugs.openjdk.java.net/browse/JDK-8215194

The proposed fix is located at:

http://cr.openjdk.java.net/~naoto/8215194/webrev.00/

This one line fix is for the correctness of the initial map size of Character.UnicodeBlock.

Naoto

-- Thanks, Rachna

2646

Age (days ago)

2646

Last active (days ago)

List overview

Download

10 comments

5 participants

participants (5)

Claes Redestad
Ivan Gerasimov
Naoto Sato
Rachna Goel
Roger Riggs