<div dir="ltr"><div class="gmail_default" style="font-size:small">Thanks, Raffaello!</div><div class="gmail_default" style="font-size:small">I just found <a href="https://bugs.openjdk.org/browse/JDK-8264160">https://bugs.openjdk.org/browse/JDK-8264160</a> in the release notes for JDK 19.</div><div class="gmail_default" style="font-size:small">Have a great weekend!<br></div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">/Stefan</div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Dec 15, 2023 at 8:24 PM Raffaello Giulietti &lt;<a href="mailto:raffaello.giulietti@oracle.com">raffaello.giulietti@oracle.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">By default, a word boundary only considers ASCII letters and digits. See <br>
"Predefined character classes" in the documentation.<br>
<br>
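To illustrate the ASCII-only default, here is a minimal sketch that checks <br>
single characters against \w with and without the (?U) flag. The class name <br>
WordCharDemo and the helper isWordChar are hypothetical and used only for <br>
illustration; the letter ö is written as the escape \u00f6 to avoid source <br>
encoding issues.<br>
<br>
```java
import java.util.regex.Pattern;

public class WordCharDemo {
    // Hypothetical helper: true if the whole string s matches a single \w,
    // optionally with Unicode character classes enabled via the (?U) flag.
    static boolean isWordChar(String s, boolean unicode) {
        String regex = (unicode ? "(?U)" : "") + "\\w";
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        String oUmlaut = "\u00f6"; // the letter o with diaeresis

        System.out.println(isWordChar("a", false));     // true: ASCII letter
        System.out.println(isWordChar(oUmlaut, false)); // false: not ASCII
        System.out.println(isWordChar(oUmlaut, true));  // true: Unicode letter
    }
}
```
<br>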
To add Unicode support, you can either pass a flag as the second <br>
argument to the compile() method<br>
<br>
Pattern p = Pattern.compile("(\\b" + word + "\\b)", <br>
Pattern.UNICODE_CHARACTER_CLASS);<br>
<br>
or add the embedded flag (?U) to the regex pattern itself, as documented in <br>
"Special constructs (named-capturing and non-capturing)"<br>
<br>
Pattern p = Pattern.compile("(?U)(\\b" + word + "\\b)");<br>
<br>
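Both variants can be tried side by side. The following is only a sketch, <br>
assuming the text and word from the test quoted below; the class name <br>
WordBoundaryDemo and the helper countMatches are hypothetical, and <br>
Pattern.quote() is added so that the word is always treated literally. <br>
The count for a pattern without the flag depends on the JDK version, so it <br>
is not shown here.<br>
<br>
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper: counts how many times the pattern is found in the text.
    static int countMatches(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // \u00f6 is the letter o with diaeresis, escaped to avoid encoding issues.
        String text = "En sak som \u00f6kas och sedan minskas. Bra va!";
        String word = "\u00f6kas";
        String regex = "\\b" + Pattern.quote(word) + "\\b";

        // Variant 1: flag passed as the second argument to compile().
        Pattern withFlag = Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS);

        // Variant 2: embedded flag at the start of the pattern.
        Pattern embedded = Pattern.compile("(?U)" + regex);

        System.out.println(countMatches(withFlag, text)); // 1
        System.out.println(countMatches(embedded, text)); // 1
    }
}
```
<br>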
<br>
Greetings<br>
Raffaello<br>
<br>
<br>
On 2023-12-15 20:07, Stefan Norberg wrote:<br>
> The following test passes on JDK 17 but fails on 19.0.2 and 21.0.1 at the <br>
> last assertion. Bug or feature?<br>
> <br>
> import org.junit.jupiter.api.Assertions;<br>
> import org.junit.jupiter.api.Test;<br>
> <br>
> import java.util.ArrayList;<br>
> import java.util.regex.Matcher;<br>
> import java.util.regex.Pattern;<br>
> <br>
> /**<br>
> * Test passes in JDK 17 but fails in JDK 19 and 21.<br>
> *<br>
> * The combination of a \b "word boundary" and a Unicode character <br>
> * doesn't seem to work in 19 and 21.<br>
> *<br>
> */<br>
> public class UnicodeTest {<br>
> @Test<br>
> public void testRegexp() throws Exception {<br>
> var text = "En sak som ökas och sedan minskas. Bra va!";<br>
> var word = "ökas";<br>
> Assertions.assertTrue(text.contains(word));<br>
> <br>
> Pattern p = Pattern.compile("(\\b" + word + "\\b)");<br>
> Matcher m = p.matcher(text);<br>
> var matches = new ArrayList<>();<br>
> <br>
> while (m.find()) {<br>
> String matchString = m.group();<br>
> System.out.println(matchString);<br>
> matches.add(matchString);<br>
> }<br>
> Assertions.assertEquals(1, matches.size());<br>
> }<br>
> }<br>
> <br>
> <br>
> <br>
> openjdk version "21.0.1" 2023-10-17 LTS<br>
> <br>
> OpenJDK Runtime Environment Corretto-21.0.1.12.1 (build 21.0.1+12-LTS)<br>
> <br>
> OpenJDK 64-Bit Server VM Corretto-21.0.1.12.1 (build 21.0.1+12-LTS, <br>
> mixed mode, sharing)<br>
> <br>
> <br>
> System Version: macOS 14.2 (23C64)<br>
> <br>
> Kernel Version: Darwin 23.2.0<br>
> <br>
> <br>
> Thanks!<br>
> <br>
> <br>
> /Stefan<br>
> <br>
</blockquote></div>