Regexp with word-boundary followed by unicode character doesn't work in 19, 21
Stefan Norberg
stefan at selessia.com
Fri Dec 15 19:07:03 UTC 2023
The following test passes on JDK 17 but fails on 19.0.2 and 21.0.1 at the
last assertion. Bug or feature?
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
 * Test passes in JDK 17 but fails in JDK 19 and 21.
 *
 * The combination of a \b "word boundary" and a Unicode character doesn't
 * seem to work in 19 and 21.
 */
public class UnicodeTest {
    @Test
    public void testRegexp() throws Exception {
        var text = "En sak som ökas och sedan minskas. Bra va!";
        var word = "ökas";
        Assertions.assertTrue(text.contains(word));
        Pattern p = Pattern.compile("(\\b" + word + "\\b)");
        Matcher m = p.matcher(text);
        var matches = new ArrayList<>();
        while (m.find()) {
            String matchString = m.group();
            System.out.println(matchString);
            matches.add(matchString);
        }
        Assertions.assertEquals(1, matches.size());
    }
}
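To narrow this down, it may help to compare the same pattern compiled with and without Pattern.UNICODE_CHARACTER_CLASS. The sketch below is only a diagnostic under assumptions: the class name BoundaryCheck and the helper countMatches are made up for illustration, and the word is written with a \u escape so that source-file encoding can be ruled out. As far as I can tell, the JDK 19 release notes describe \b as matching on ASCII word characters only by default, which would explain the difference, but I have not confirmed that this is the cause here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original test.
class BoundaryCheck {

    // Counts occurrences of `word` in `text` that have a \b word boundary
    // on both sides, compiling the pattern with the given flags.
    static int countMatches(String text, String word, int flags) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", flags);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // \u00f6 is the letter o with diaeresis, as in the original test string.
        String text = "En sak som \u00f6kas och sedan minskas. Bra va!";
        String word = "\u00f6kas";

        // Default flags: the result here is what differs between JDK versions.
        System.out.println("default flags: " + countMatches(text, word, 0));

        // With Unicode character classes enabled for \b and \w.
        System.out.println("UNICODE_CHARACTER_CLASS: "
                + countMatches(text, word, Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```

If the second count is 1 on 19 and 21 while the first is 0, the embedded flag (?U) at the start of the pattern should be an equivalent way to get the same behavior without changing the Pattern.compile call.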
openjdk version "21.0.1" 2023-10-17 LTS
OpenJDK Runtime Environment Corretto-21.0.1.12.1 (build 21.0.1+12-LTS)
OpenJDK 64-Bit Server VM Corretto-21.0.1.12.1 (build 21.0.1+12-LTS, mixed mode, sharing)
System Version: macOS 14.2 (23C64)
Kernel Version: Darwin 23.2.0
Thanks!
/Stefan