Regexp with word-boundary followed by unicode character doesn't work in 19, 21

Stefan Norberg stefan at selessia.com
Fri Dec 15 19:29:21 UTC 2023


Thanks, Raffaello!
Found https://bugs.openjdk.org/browse/JDK-8264160 in the
release notes for 19 just now.
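
For anyone finding this thread in the archive, here is a minimal sketch of
the two fixes described in the reply quoted below. The class name
WordBoundaryDemo and the helper countMatches are made up for illustration.
The sample text is the one from the quoted test, with "ö" written as \u00f6
so the source does not depend on file encoding. With either form of the
flag, exactly one match is expected. The result with default flags depends
on the JDK version (see JDK-8264160), so the sketch prints it without
assuming a value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original test.
class WordBoundaryDemo {
    // "\u00f6" is the letter o with diaeresis used in the original test text.
    static final String TEXT = "En sak som \u00f6kas och sedan minskas. Bra va!";
    static final String WORD = "\u00f6kas";

    // Counts how often the regex matches in TEXT when compiled with the given flags.
    static int countMatches(String regex, int flags) {
        Matcher m = Pattern.compile(regex, flags).matcher(TEXT);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String regex = "\\b" + WORD + "\\b";

        // Default flags: the result differs between JDK 17 and JDK 19+.
        System.out.println("default flags:           " + countMatches(regex, 0));

        // Option 1: pass the flag as the 2nd argument to compile().
        System.out.println("UNICODE_CHARACTER_CLASS: "
                + countMatches(regex, Pattern.UNICODE_CHARACTER_CLASS));

        // Option 2: embed the flag in the pattern itself.
        System.out.println("(?U) embedded flag:      "
                + countMatches("(?U)" + regex, 0));
    }
}
```

Both flag variants compile to the same Unicode-aware behaviour; the embedded
(?U) form is handy when the pattern string comes from configuration and the
compile() call cannot be changed.
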
Have a great weekend!

/Stefan

On Fri, Dec 15, 2023 at 8:24 PM Raffaello Giulietti <
raffaello.giulietti at oracle.com> wrote:

> By default, a word boundary only considers ASCII letters and digits. See
> "Predefined character classes" in the documentation.
>
> To add Unicode support, you have a choice between adding a flag as a 2nd
> argument to the compile() method
>
> Pattern p = Pattern.compile("(\\b" + word + "\\b)",
> Pattern.UNICODE_CHARACTER_CLASS);
>
> or adding a flag in the regex pattern, as documented in "Special constructs
> (named-capturing and non-capturing)"
>
> Pattern p = Pattern.compile("(?U)(\\b" + word + "\\b)");
>
>
> Greetings
> Raffaello
>
>
> On 2023-12-15 20:07, Stefan Norberg wrote:
> > The following test works in 17 but fails in 19.0.2, and 21.0.1 on the
> > last assertion. Bug or feature?
> >
> > import org.junit.jupiter.api.Assertions;
> > import org.junit.jupiter.api.Test;
> >
> > import java.util.ArrayList;
> > import java.util.regex.Matcher;
> > import java.util.regex.Pattern;
> >
> > /**
> >  * Test passes in JDK 17 but fails in JDK 19, 21.
> >  *
> >  * The combination of a \b "word boundary" and a unicode char doesn't
> >  * seem to work in 19, 21.
> >  */
> > public class UnicodeTest {
> >     @Test
> >     public void testRegexp() throws Exception {
> >         var text = "En sak som ökas och sedan minskas. Bra va!";
> >         var word = "ökas";
> >         Assertions.assertTrue(text.contains(word));
> >
> >         Pattern p = Pattern.compile("(\\b" + word + "\\b)");
> >         Matcher m = p.matcher(text);
> >         var matches = new ArrayList<>();
> >
> >         while (m.find()) {
> >             String matchString = m.group();
> >             System.out.println(matchString);
> >             matches.add(matchString);
> >         }
> >         Assertions.assertEquals(1, matches.size());
> >     }
> > }
> >
> >
> >
> > openjdk version "21.0.1" 2023-10-17 LTS
> > OpenJDK Runtime Environment Corretto-21.0.1.12.1 (build 21.0.1+12-LTS)
> > OpenJDK 64-Bit Server VM Corretto-21.0.1.12.1 (build 21.0.1+12-LTS,
> > mixed mode, sharing)
> >
> > System Version: macOS 14.2 (23C64)
> > Kernel Version: Darwin 23.2.0
> >
> >
> > Thanks!
> >
> >
> > /Stefan
> >
>