RFR [S] 8131034: Cleanup in j.u.regex.Pattern.quote()
Hello! There's a minor issue in the current implementation of Pattern.quote() with possible numeric overflow when calculating initial capacity of StringBuilder. With the fix, some slight optimizations were done and a few typos were fixed. Would you please help review the fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8131034 WEBREV: http://cr.openjdk.java.net/~igerasim/8131034/00/webrev/ Sincerely yours, Ivan
On Jul 12, 2015, at 6:32 PM, Ivan Gerasimov <ivan.gerasimov@oracle.com> wrote:
Hello!
There's a minor issue in the current implementation of Pattern.quote() with possible numeric overflow when calculating initial capacity of StringBuilder. With the fix, some slight optimizations were done and a few typos were fixed.
Would you please help review the fix?
BUGURL: https://bugs.openjdk.java.net/browse/JDK-8131034 WEBREV: http://cr.openjdk.java.net/~igerasim/8131034/00/webrev/
Looks ok. /** - * The pattern is converted to normalizedD form and then a pure group + * The pattern is converted to normalized form and then a pure group * is constructed to match canonical equivalences of the characters. */ The "D" in "normalizedD form" may well be short hand for "canonical decomposition" (Normalizer.Form.NFD) as in "normalized canonical decomposition form". Paul.
Sincerely yours, Ivan
Thank you Paul for your review! On 13.07.2015 12:57, Paul Sandoz wrote:
On Jul 12, 2015, at 6:32 PM, Ivan Gerasimov <ivan.gerasimov@oracle.com> wrote:
Hello!
There's a minor issue in the current implementation of Pattern.quote() with possible numeric overflow when calculating initial capacity of StringBuilder. With the fix, some slight optimizations were done and a few typos were fixed.
Would you please help review the fix?
BUGURL: https://bugs.openjdk.java.net/browse/JDK-8131034 WEBREV: http://cr.openjdk.java.net/~igerasim/8131034/00/webrev/
Looks ok.
/** - * The pattern is converted to normalizedD form and then a pure group + * The pattern is converted to normalized form and then a pure group * is constructed to match canonical equivalences of the characters. */
The "D" in "normalizedD form" may well be short hand for "canonical decomposition" (Normalizer.Form.NFD) as in "normalized canonical decomposition form". Hm. Still looks like a typo to me. A few lines below normalizedPattern has no extra D. Another normalization form, NFKD, also ends with D, so normalizedD wouldn't uniquely identify NFD. grep didn't find other occurrences of normalizedD abbreviation in JDK :)
Sincerely yours, Ivan
Paul.
Sincerely yours, Ivan
I agree that normalizedD looks like a typo (but it probably was not), but "normalized" is too ambiguous (there are too many different kinds of normalization) so a precise term like NFD should be used. Maybe we should add a reference to http://unicode.org/reports/tr15/ On Mon, Jul 13, 2015 at 4:29 AM, Ivan Gerasimov <ivan.gerasimov@oracle.com> wrote:
Thank you Paul for your review!
On 13.07.2015 12:57, Paul Sandoz wrote:
On Jul 12, 2015, at 6:32 PM, Ivan Gerasimov <ivan.gerasimov@oracle.com> wrote:
Hello!
There's a minor issue in the current implementation of Pattern.quote() with possible numeric overflow when calculating initial capacity of StringBuilder. With the fix, some slight optimizations were done and a few typos were fixed.
Would you please help review the fix?
BUGURL: https://bugs.openjdk.java.net/browse/JDK-8131034 WEBREV: http://cr.openjdk.java.net/~igerasim/8131034/00/webrev/
Looks ok.
/** - * The pattern is converted to normalizedD form and then a pure group + * The pattern is converted to normalized form and then a pure group * is constructed to match canonical equivalences of the characters. */
The "D" in "normalizedD form" may well be short hand for "canonical decomposition" (Normalizer.Form.NFD) as in "normalized canonical decomposition form".
Hm. Still looks like a typo to me. A few lines below normalizedPattern has no extra D. Another normalization form, NFKD, also ends with D, so normalizedD wouldn't uniquely identify NFD. grep didn't find other occurrences of normalizedD abbreviation in JDK :)
Sincerely yours, Ivan
Paul.
Sincerely yours,
Ivan
Thanks Martin! On 13.07.2015 21:26, Martin Buchholz wrote:
I agree that normalizedD looks like a typo (but it probably was not), but "normalized" is too ambiguous (there are too many different kinds of normalization) so a precise term like NFD should be used. Maybe we should add a reference to http://unicode.org/reports/tr15/
Okay, what if we have /** - * The pattern is converted to normalized form and then a pure group - * is constructed to match canonical equivalences of the characters. + * The pattern is converted to normalized form (NFD, canonical + * decomposition) and then a pure group is constructed to match + * canonical equivalences of the characters. + * + * @see java.text.Normalizer.Form.NFD */ We've already got a link to http://www.unicode.org/unicode/reports/tr15/tr15-23.html from the doc for java.text.Normalizer.NFD: http://docs.oracle.com/javase/8/docs/api/java/text/Normalizer.Form.html I'd rather not duplicate the reference in that comment. Sincerely yours, Ivan
On Mon, Jul 13, 2015 at 4:29 AM, Ivan Gerasimov <ivan.gerasimov@oracle.com <mailto:ivan.gerasimov@oracle.com>> wrote:
Thank you Paul for your review!
On 13.07.2015 12:57, Paul Sandoz wrote:
On Jul 12, 2015, at 6:32 PM, Ivan Gerasimov <ivan.gerasimov@oracle.com <mailto:ivan.gerasimov@oracle.com>> wrote:
Hello!
There's a minor issue in the current implementation of Pattern.quote() with possible numeric overflow when calculating initial capacity of StringBuilder. With the fix, some slight optimizations were done and a few typos were fixed.
Would you please help review the fix?
BUGURL: https://bugs.openjdk.java.net/browse/JDK-8131034 WEBREV: http://cr.openjdk.java.net/~igerasim/8131034/00/webrev/ <http://cr.openjdk.java.net/%7Eigerasim/8131034/00/webrev/>
Looks ok.
/** - * The pattern is converted to normalizedD form and then a pure group + * The pattern is converted to normalized form and then a pure group * is constructed to match canonical equivalences of the characters. */
The "D" in "normalizedD form" may well be short hand for "canonical decomposition" (Normalizer.Form.NFD) as in "normalized canonical decomposition form".
Hm. Still looks like a typo to me. A few lines below normalizedPattern has no extra D. Another normalization form, NFKD, also ends with D, so normalizedD wouldn't uniquely identify NFD. grep didn't find other occurrences of normalizedD abbreviation in JDK :)
Sincerely yours, Ivan
Paul.
Sincerely yours, Ivan
@see is mostly obsolete, now that we can use {@linkplain java.text.Normalizer.Form.NFD NFD} On Mon, Jul 13, 2015 at 4:51 PM, Ivan Gerasimov <ivan.gerasimov@oracle.com> wrote:
Thanks Martin!
On 13.07.2015 21:26, Martin Buchholz wrote:
I agree that normalizedD looks like a typo (but it probably was not), but "normalized" is too ambiguous (there are too many different kinds of normalization) so a precise term like NFD should be used. Maybe we should add a reference to http://unicode.org/reports/tr15/
Okay, what if we have
/** - * The pattern is converted to normalized form and then a pure group - * is constructed to match canonical equivalences of the characters. + * The pattern is converted to normalized form (NFD, canonical + * decomposition) and then a pure group is constructed to match + * canonical equivalences of the characters. + * + * @see java.text.Normalizer.Form.NFD */
We've already got a link to http://www.unicode.org/unicode/reports/tr15/tr15-23.html from the doc for java.text.Normalizer.NFD: http://docs.oracle.com/javase/8/docs/api/java/text/Normalizer.Form.html
I'd rather not duplicate the reference in that comment.
Sincerely yours, Ivan
On Mon, Jul 13, 2015 at 4:29 AM, Ivan Gerasimov < ivan.gerasimov@oracle.com> wrote:
Thank you Paul for your review!
On 13.07.2015 12:57, Paul Sandoz wrote:
On Jul 12, 2015, at 6:32 PM, Ivan Gerasimov <ivan.gerasimov@oracle.com> wrote:
Hello!
There's a minor issue in the current implementation of Pattern.quote() with possible numeric overflow when calculating initial capacity of StringBuilder. With the fix, some slight optimizations were done and a few typos were fixed.
Would you please help review the fix?
BUGURL: https://bugs.openjdk.java.net/browse/JDK-8131034 WEBREV: http://cr.openjdk.java.net/~igerasim/8131034/00/webrev/
Looks ok.
/** - * The pattern is converted to normalizedD form and then a pure group + * The pattern is converted to normalized form and then a pure group * is constructed to match canonical equivalences of the characters. */
The "D" in "normalizedD form" may well be short hand for "canonical decomposition" (Normalizer.Form.NFD) as in "normalized canonical decomposition form".
Hm. Still looks like a typo to me. A few lines below normalizedPattern has no extra D. Another normalization form, NFKD, also ends with D, so normalizedD wouldn't uniquely identify NFD. grep didn't find other occurrences of normalizedD abbreviation in JDK :)
Sincerely yours, Ivan
Paul.
Sincerely yours,
Ivan
looks good. On 07/12/2015 09:32 AM, Ivan Gerasimov wrote:
Hello!
There's a minor issue in the current implementation of Pattern.quote() with possible numeric overflow when calculating initial capacity of StringBuilder. With the fix, some slight optimizations were done and a few typos were fixed.
Would you please help review the fix?
BUGURL: https://bugs.openjdk.java.net/browse/JDK-8131034 WEBREV: http://cr.openjdk.java.net/~igerasim/8131034/00/webrev/
Sincerely yours, Ivan
participants (4)
-
Ivan Gerasimov
-
Martin Buchholz
-
Paul Sandoz
-
Xueming Shen