RFR 8230365 : Pattern for a control-char matches non-control characters

Stuart Marks stuart.marks at oracle.com
Mon Sep 9 23:39:06 UTC 2019



On 9/5/19 1:43 PM, Ivan Gerasimov wrote:
> Personally, I don't have a strong preference here.
> 
> The compatibility properties are meant to be temporary anyway.
> 
> Maybe if we have a single option that will control several different aspects of 
> behavior, it will be harder to get rid of it?
> 
> Partially, because it will be tempting to reuse it for other similar changes, 
> should they be needed.

OK, let's take an inventory of what behavior changes are being contemplated for 
regexes:

JDK-8230675 restrict IDs for control chars
JDK-xxxxxxx allow case-insensitive IDs for control chars *NOTE*
JDK-8225021 Treat ambiguous embedded flags as parse syntax errors
JDK-8214245 Case insensitive matching doesn't work correctly for some character 
classes

*NOTE* this was part of the original JDK-8230675 proposal, but you removed it 
after discussion. I don't know if we decided never to do this, or whether we're 
merely considering it separately. It seemed to me that there was a possibility 
that we'd do this in the future.

Are these all the behavior changes being contemplated, or is this simply the set 
that we happened to have stumbled across recently? Put another way, if we 
decided to do some further analysis of regexes, would we run across other issues 
where we might say, "Yeah, we ought to fix that, but that would be a potentially 
incompatible behavior change, so we need to add another property"?

In practice, such properties are only removed after a very long time, or perhaps 
even "never." It's not like this change would be added in this release (JDK 14), 
with backward compatibility support removed in a year (say, JDK 16) along with 
the property. The property, and the backward compatibility mode, would stick 
around in the code for many years.

What I want to avoid doing is to introduce behavior changes -- and properties to 
control them -- in a piecemeal fashion. It looks like we might have three or 
four candidates already. Would we want to live with three or four properties? If 
we did this and continued with additional changes, we might end up with six or 
eight or ten properties over time.

I'd like to see us look ahead a bit and take stock of what changes we're 
contemplating, and then decide how to deal with compatibility and migration 
based on that. I'd like to avoid making individual changes (and adding 
properties) one at a time, with decisions made in isolation, because that will 
lead to a proliferation of properties.
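
To make the trade-off concrete, here is a rough sketch of what a single 
property covering several behaviors could look like. Everything in it is 
hypothetical: the property name "example.regex.compat", the enum constants, and 
the comma-separated syntax are placeholders I made up for discussion, not a 
proposal for actual names or an existing JDK mechanism.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

public class RegexCompat {
    // Hypothetical legacy behaviors, one per change discussed in this thread.
    enum Legacy { CONTROL_CHAR_IDS, AMBIGUOUS_FLAGS, CASE_INSENSITIVE_CLASSES }

    // Hypothetical property name; not a real JDK system property.
    static final String PROPERTY = "example.regex.compat";

    // Parses a value such as "all" or "control-char-ids,ambiguous-flags"
    // into the set of legacy behaviors to keep enabled.
    static Set<Legacy> parse(String value) {
        EnumSet<Legacy> result = EnumSet.noneOf(Legacy.class);
        if (value == null || value.trim().isEmpty()) {
            return result; // property unset: all new behaviors apply
        }
        if (value.trim().equalsIgnoreCase("all")) {
            return EnumSet.allOf(Legacy.class);
        }
        for (String token : value.split(",")) {
            String name = token.trim().toUpperCase(Locale.ROOT).replace('-', '_');
            if (name.isEmpty()) {
                continue; // tolerate stray commas
            }
            // Throws IllegalArgumentException for an unknown behavior name.
            result.add(Legacy.valueOf(name));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Legacy> legacy = parse(System.getProperty(PROPERTY));
        System.out.println("Legacy behaviors enabled: " + legacy);
    }
}
```

Running with -Dexample.regex.compat=control-char-ids would keep only the old 
control-char handling. The point is just that one property can carry a set of 
behaviors, so adding a fourth or fifth change means adding a value rather than 
another property. Whether that makes the property harder to retire, as Ivan 
suggests, is still an open question.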

s'marks

