Regexp with word-boundary followed by unicode character doesn't work in 19, 21
Stefan Norberg
stefan at selessia.com
Fri Dec 15 18:50:19 UTC 2023
Hi,
I apologize in advance if this isn't the right forum for posting a bug
report. If not, feel free to point me in the right direction!
The following test passes on JDK 17 but fails on JDK 19 and 21, at the
last assertion. Bug or feature?
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Test passes in JDK 17 but fails in JDK 19 and 21.
 *
 * The combination of a \b "word boundary" and a unicode char doesn't seem
 * to work in 19 and 21.
 */
public class UnicodeTest {

    @Test
    public void testRegexp() throws Exception {
        var text = "En sak som ökas och sedan minskas. Bra va!";
        var word = "ökas";

        // Sanity check: the word is present in the text.
        Assertions.assertTrue(text.contains(word));

        Pattern p = Pattern.compile("(\\b" + word + "\\b)");
        Matcher m = p.matcher(text);

        // Collect every whole-word match.
        var matches = new ArrayList<String>();
        while (m.find()) {
            String matchString = m.group();
            System.out.println(matchString);
            matches.add(matchString);
        }

        // This is the assertion that fails in 19 and 21.
        Assertions.assertEquals(1, matches.size());
    }
}
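
For anyone who wants to try this without JUnit, below is a minimal
standalone sketch of the same check. The class and method names
(WordBoundaryDemo, countMatches, countUnicodeAware) are made up for
illustration. The second call is only an experiment: it assumes, without
confirmation here, that the difference comes from \b treating only ASCII
characters as word characters by default in newer JDKs, and compiles the
same pattern with the existing Pattern.UNICODE_CHARACTER_CLASS flag to
compare. The "ö" is written as a unicode escape so the file does not
depend on the source encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {

    // Counts whole-word occurrences of `word` in `text` using \b on both
    // sides. `flags` is passed straight to Pattern.compile (0 = defaults).
    public static int countMatches(String text, String word, int flags) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", flags);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Same count, but with Unicode-aware character classes and boundaries.
    // Assumption: this flag is what makes \b recognise non-ASCII letters.
    public static int countUnicodeAware(String text, String word) {
        return countMatches(text, word, Pattern.UNICODE_CHARACTER_CLASS);
    }

    public static void main(String[] args) {
        // Same text and word as in the test; the first letter of the word
        // is written as an escape for o-with-diaeresis.
        String text = "En sak som \u00f6kas och sedan minskas. Bra va!";
        String word = "\u00f6kas";

        System.out.println("java.version            = "
                + System.getProperty("java.version"));
        System.out.println("default flags           = "
                + countMatches(text, word, 0));
        System.out.println("UNICODE_CHARACTER_CLASS = "
                + countUnicodeAware(text, word));
    }
}
```

Running it as a single source file (java WordBoundaryDemo.java) on each
JDK and comparing the two counts should show whether the flag changes the
outcome on 19 and 21.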
openjdk version "21.0.1" 2023-10-17 LTS
OpenJDK Runtime Environment Corretto-21.0.1.12.1 (build 21.0.1+12-LTS)
OpenJDK 64-Bit Server VM Corretto-21.0.1.12.1 (build 21.0.1+12-LTS, mixed mode, sharing)
System Version: macOS 14.2 (23C64)
Kernel Version: Darwin 23.2.0
Model Name: MacBook Pro
Model Identifier: Mac14,6
Model Number: Z175000DEKS/A
Chip: Apple M2 Max
Total Number of Cores: 12 (8 performance and 4 efficiency)
Memory: 64 GB
/Stefan