JDK-8215626 : Correct [^..&&..] intersection negation behaviour JDK8 vs JDK11 ??

Andrew Leonard andrew_m_leonard at uk.ibm.com
Wed Jan 9 15:46:44 UTC 2019


Thanks Sherman,
Yes I agree.
Cheers
Andrew

Andrew Leonard
Java Runtimes Development
IBM Hursley
IBM United Kingdom Ltd
Phone internal: 245913, external: 01962 815913
internet email: andrew_m_leonard at uk.ibm.com 




From:   Xueming Shen <xueming.shen at gmail.com>
To:     core-libs-dev at openjdk.java.net
Date:   08/01/2019 16:50
Subject:        Re: JDK-8215626 : Correct [^..&&..] intersection negation 
behaviour JDK8 vs JDK11 ??
Sent by:        "core-libs-dev" <core-libs-dev-bounces at openjdk.java.net>



Hi Andrew,


See [1]/[2] for the background of the fix. I would say the JDK 11 behavior
is correct and expected :-) Anyway, it's a behavior change, so it probably
cannot easily be backported to JDK 8.

Regards,

Sherman


[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-June/006957.html
[2] http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/lambdafunction



On 1/7/19 5:50 AM, Andrew Leonard wrote:
> Anyone got any views on which "regex" behaviour is correct, JDK8 or JDK11?
> thanks
> Andrew
>
> From:   Andrew Leonard/UK/IBM
> To:     "OpenJDK Core Libs Developers" <core-libs-dev at openjdk.java.net>
> Date:   03/01/2019 11:20
> Subject:        JDK-8215626 : Correct [^..&&..] intersection negation
> behaviour JDK8 vs JDK11 ??
>
>
> Hi,
> I'm currently investigating bug JDK-8215626 and have discovered the
> problem is in the Pattern interpretation of the [^..&&..] negation when
> applied to "intersected" expressions. So I have simplified the bug example
> to a more extreme and obvious example:
>      Input string: "1234 ABCDEFG !$%^& abcdefg"
>      pattern RegEx: "[^[A-B]&&[^ef]]"
>      Operation: pattern.matcher(input).replaceAll("");
>
> JDK8 output:
>        1234 CDEFG !$%^& abcdefg
> JDK11 output:
>        AB
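
A minimal sketch for reproducing the comparison above on whichever JDK is at hand. The class and method names (IntersectionNegationDemo, strip) are illustrative and not part of the bug report; the pattern, input, and operation are the ones quoted in the message. Per the thread, JDK11 is reported to print "AB" and JDK8 "1234 CDEFG !$%^& abcdefg", so the output depends on the runtime used.

```java
import java.util.regex.Pattern;

class IntersectionNegationDemo {
    // Removes every character of the input that the regex matches.
    static String strip(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "1234 ABCDEFG !$%^& abcdefg";
        // Print the runtime version so results from different JDKs can be compared.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println(strip("[^[A-B]&&[^ef]]", input));
    }
}
```

Running the same file under both JDK levels should be enough to confirm which behaviour a given runtime has.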
>
> So from the "spec":
> A character class is a set of characters enclosed within square brackets.
> It specifies the characters that will successfully match a single
> character from a given input string.
> Intersection:
> To create a single character class matching only the characters common to
> all of its nested classes, use &&, as in [0-9&&[345]].
> Negation:
> To match all characters except those listed, insert the "^" metacharacter
> at the beginning of the character class.
>
> The way I read the "spec" is that the "^" negation negates the whole
> character class within the outer square brackets, thus in this example:
>      "[^[A-B]&&[^ef]]" is equivalent to the negation of "[[A-B]&&[^ef]]",
>      i.e. the negation of the intersection of chars A,B and everything
>      other than e,f,
>      which is thus the negation of A,B;
>      hence the operation above will remove any character in the input
>      string other than A,B.
> Hence, JDK11 in my opinion meets the "spec". It looks as though JDK8 is
> applying the ^ negation to just [A-B] and then intersecting it with [^ef],
> which to me is the wrong interpretation of the "spec".
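
The readings discussed above can be checked with plain set arithmetic, without java.util.regex at all. The sketch below is an assumption-laden illustration: the class NegationReadings and its predicate names are hypothetical, and it says nothing about how either JDK's Pattern parser is actually implemented. It models each reading as an IntPredicate and removes the matching characters from the same input string.

```java
import java.util.function.IntPredicate;

class NegationReadings {
    // The two operand classes from the pattern, written as plain predicates.
    static final IntPredicate IN_A_TO_B = c -> c >= 'A' && c <= 'B';   // [A-B]
    static final IntPredicate NOT_E_OR_F = c -> c != 'e' && c != 'f';  // [^ef]

    // Reading 1: "^" negates the whole intersection.
    static final IntPredicate WHOLE_NEGATED = IN_A_TO_B.and(NOT_E_OR_F).negate();
    // Reading 2: "^" negates only [A-B], and the result is intersected with [^ef].
    static final IntPredicate FIRST_OPERAND_NEGATED = IN_A_TO_B.negate().and(NOT_E_OR_F);
    // Reading 3 (hypothetical): "^" has no effect on the nested classes.
    static final IntPredicate NEGATION_IGNORED = IN_A_TO_B.and(NOT_E_OR_F);

    // Equivalent of replaceAll("") for a single-character class: drop matching chars.
    static String removeMatching(String input, IntPredicate matches) {
        StringBuilder kept = new StringBuilder();
        input.chars().filter(matches.negate()).forEach(c -> kept.append((char) c));
        return kept.toString();
    }

    public static void main(String[] args) {
        String input = "1234 ABCDEFG !$%^& abcdefg";
        System.out.println(removeMatching(input, WHOLE_NEGATED));         // AB
        System.out.println(removeMatching(input, FIRST_OPERAND_NEGATED)); // ABef
        System.out.println(removeMatching(input, NEGATION_IGNORED));      // 1234 CDEFG !$%^& abcdefg
    }
}
```

Under this model, reading 1 reproduces the reported JDK11 output. Reading 2 yields "ABef" rather than the reported JDK8 output, while reading 3 yields exactly the reported JDK8 string, so JDK8 may in effect be ignoring the "^" for this pattern; that is only an inference from the set arithmetic.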
>
> Your thoughts please?
>
> If JDK11 is correct, and JDK8 wrong, then the next question is: do we fix
> JDK8, given the obvious potential "behavioural" impact on existing
> applications?
>
> Thanks
> Andrew
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU




