JDK-8215626 : Correct [^..&&..] intersection negation behaviour JDK8 vs JDK11 ??
Andrew Leonard
andrew_m_leonard at uk.ibm.com
Mon Jan 7 13:50:34 UTC 2019
Does anyone have a view on which "regex" behaviour is correct, JDK8 or JDK11?
thanks
Andrew
Andrew Leonard
Java Runtimes Development
IBM Hursley
IBM United Kingdom Ltd
Phone internal: 245913, external: 01962 815913
internet email: andrew_m_leonard at uk.ibm.com
From: Andrew Leonard/UK/IBM
To: "OpenJDK Core Libs Developers" <core-libs-dev at openjdk.java.net>
Date: 03/01/2019 11:20
Subject: JDK-8215626 : Correct [^..&&..] intersection negation
behaviour JDK8 vs JDK11 ??
Hi,
I'm currently investigating bug JDK-8215626 and have discovered the
problem is in the Pattern interpretation of the [^..&&..] negation when
applied to "intersected" expressions. So I have simplified the bug example
to a more extreme and obvious example:
Input string: "1234 ABCDEFG !$%^& abcdefg"
pattern RegEx: "[^[A-B]&&[^ef]]"
Operation: pattern.matcher(input).replaceAll("");
JDK8 output:
1234 CDEFG !$%^& abcdefg
JDK11 output:
AB
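A minimal, self-contained way to reproduce the comparison above on whichever JDK is at hand. The class name NegatedIntersectionDemo and the helper strip are made up for this illustration. The output of the nested pattern depends on the JDK version, so it is printed rather than assumed. The explicit class "[^AB]" is included only as a reference point for the "negate the whole class" reading.

```java
import java.util.regex.Pattern;

public class NegatedIntersectionDemo {

    // Hypothetical helper: removes every character of input matched by regex,
    // i.e. the same pattern.matcher(input).replaceAll("") operation as above.
    static String strip(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "1234 ABCDEFG !$%^& abcdefg";

        // Print the runtime version so results from different JDKs can be compared.
        System.out.println("java.version = " + System.getProperty("java.version"));

        // The pattern under discussion; its result differs between JDK8 and JDK11.
        System.out.println("[^[A-B]&&[^ef]] -> \"" + strip("[^[A-B]&&[^ef]]", input) + "\"");

        // Reference: an explicit "anything except A or B" class with no nesting.
        System.out.println("[^AB]           -> \"" + strip("[^AB]", input) + "\"");
    }
}
```

If the two printed results agree, the runtime is negating the whole intersection; if they differ, it is doing something else with the "^".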
So from the "spec" :
A character class is a set of characters enclosed within square brackets.
It specifies the characters that will successfully match a single
character from a given input string
Intersection:
To create a single character class matching only the characters common to
all of its nested classes, use &&, as in [0-9&&[345]].
Negation:
To match all characters except those listed, insert the "^" metacharacter
at the beginning of the character class.
The way I read the "spec", the "^" negation negates the whole character
class within the outer square brackets. Thus, in this example:
"[^[A-B]&&[^ef]]" is equivalent to the negation of "[[A-B]&&[^ef]]",
i.e. the negation of the intersection of the chars A,B with everything
other than e,f,
which is thus the negation of A,B.
Hence the operation above will remove every character in the input
string other than A,B.
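The two readings can be checked independently of java.util.regex by modelling each nested class as a predicate on characters. This is only a sketch of the set arithmetic, not of either JDK's parser. The class name NegationReadings and all of its members are invented for this illustration.

```java
import java.util.function.IntPredicate;

public class NegationReadings {

    // The two nested classes from the example, modelled as predicates on code points.
    static final IntPredicate A_TO_B = c -> c >= 'A' && c <= 'B'; // [A-B]
    static final IntPredicate NOT_EF = c -> c != 'e' && c != 'f'; // [^ef]

    // Reading 1: "^" negates the whole intersection, i.e. not([A-B] && [^ef]).
    static final IntPredicate NEGATE_WHOLE = A_TO_B.and(NOT_EF).negate();

    // Reading 2: "^" negates only [A-B], which is then intersected with [^ef].
    static final IntPredicate NEGATE_FIRST = A_TO_B.negate().and(NOT_EF);

    // Mimics replaceAll(""): keeps only the characters the predicate does NOT match.
    static String removeMatching(IntPredicate matches, String input) {
        StringBuilder kept = new StringBuilder();
        input.codePoints()
             .filter(matches.negate())
             .forEach(kept::appendCodePoint);
        return kept.toString();
    }

    public static void main(String[] args) {
        String input = "1234 ABCDEFG !$%^& abcdefg";
        System.out.println(removeMatching(NEGATE_WHOLE, input)); // prints AB
        System.out.println(removeMatching(NEGATE_FIRST, input)); // prints ABef
    }
}
```

Under reading 1 the result is "AB", matching the JDK11 output. Under reading 2 the result is "ABef", which is not the JDK8 output quoted earlier in this message, so JDK8's actual handling of the "^" may differ from both of these readings.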
Hence, JDK11 in my opinion meets the "spec". It looks as though JDK8 is
applying the ^ negation to just [A-B] and then intersecting it with [^ef],
which to me is the wrong interpretation of the "spec".
Your thoughts please?
If JDK11 is correct and JDK8 is wrong, then the next question is whether we
fix JDK8, given the obvious potential for "behavioural" impact on existing
applications.
Thanks
Andrew
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU