RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

Paul Sandoz paul.sandoz at oracle.com
Wed Feb 11 20:45:45 UTC 2015


On Feb 11, 2015, at 8:23 PM, Stuart Marks <stuart.marks at oracle.com> wrote:

> On 2/11/15 2:25 AM, Paul Sandoz wrote:
>> Hi Stuart,
>> 
>> Thanks for the detailed review.
>> 
>> Here is a possible way forward:
>> 
>> 1) Add the methods to Matcher, as proposed in the initial webrev.
> 
> Yes, I think this is the way to go, and it seems that Sherman concurs.
> 
>> 1.1) Change the specification of Matcher.results to reset the stream before matching, making it consistent with the replace* methods.
> 
> I'm not sure about this. The current replaceAll/replaceFirst methods reset the matcher before doing any matching, so the lambda-based overloads should do the same.

Yes, we are in agreement on that.


> 
> However, the model for
> 
>    Stream<MatchResult> results()
> 
> seems to me to be a stream of matches that would be returned by successive calls to find(). (Indeed, that's how it's implemented.) The no-arg find() call doesn't reset the Matcher,

It cannot, otherwise it would not work for repeated calls to obtain subsequent matches. 


> and it respects the existing region of the Matcher. I think results() should do the same.
> 

That "matches" my original thinking on the matter and is reflected in the patch. It's very simple to support. If the method was named "findAll" then it would be misleading and imply a reset was needed.


> Now there's also a find(int start) overload that does reset the matcher (discarding the region) but starts at the given position. This suggests another overload,
> 
>    Stream<MatchResult> results(int start)
> 
> which also resets the matcher. I don't think this is necessary, though, since I believe the equivalent effect can be achieved by setting the region before calling the no-arg results() -- as long as it doesn't reset the matcher.
> 

Yes.


> No matter what, I think results() will have to check for concurrent modification.

Yes, as in the patch.


> It seems to be in the nature of this API, sigh.
> 
> By the way, I think Matcher.results() is a fine name. The overload of Pattern.matches() is what bothers me.

Ok.


> 
>> 2) Add convenience methods for all replace*() and matches() on Pattern that defer to those on Matcher.
> 
> I'm not sure convenience methods are necessary. After all, there are the existing replaceFirst/replaceAll methods on Matcher that aren't on Pattern.
> 
> In addition, if you don't need to hang onto the Pattern or Matcher instances, my example from earlier can be compressed to:
> 
>    String result = Pattern.compile("a*b")
>        .matcher(input).replaceAll(mr -> mr.group().toUpperCase());
> 
> which ain't too bad.
> 

Agreed.


> 
>> We can do that in two stages, focusing on 1) in this review.
> 
> Yes.
> 

Ok, the first webrev i sent is already implemented as agreed so lets review that code.


>> I was not too concerned about the static method and the stream returning method having the same name as the context is quite different. For stream returning methods there is a de-facto pattern of using a plural of the stream element kind, so i prefer that to findAll. What about the name "Pattern.matchResults"? which chains well with "Pattern.match(...).results()".
>> 
>> --
>> 
>> Regarding the disparity between MatchResult and Matcher. I think that would require a new sub-interface of MatchResult from which Matcher extends from and returns. If we think it important we should do that for 9 otherwise we will be stuck for the stream-based methods.
> 
> Why do you think we need a new sub-interface? Wouldn't it be sufficient to add default methods?

What would the defaults do? Throwing an exception seems a poor solution to me.

My guess it will be possible to evolve Matcher in binary and source compatible way to use the sub-type.

Paul.

> 
> There is of course the possibility of existing 3rd party classes that implement MatchResult having conflicting methods. But I don't think MatchResult is implemented quite as often as the collection interfaces, where we've been reticent to add default methods because of so many implementations.
> 
> I did a quick web search and I did find a few implementations/extensions of MatchResult, but there were no obvious conflicts. This isn't definitive, of course.
> 
> The default implementations of group(String), start(String), and end(String) could throw IllegalArgumentException, since that's what the ones on Matcher do if there's no such named capture group.
> 
> I admit this is a little weird, since it's only safe to use these new methods if you know where the MatchResult instance came from. So logically perhaps it's a different type. My hunch, though, is that MatchResults aren't passed around, they're created from Patterns/Matchers and processed within the same code, so in practice this won't be a problem.
> 
> s'marks




More information about the core-libs-dev mailing list