RFR: 8311939: Excessive allocation of Matcher.groups array [v2]
Cristian Vat
duke at openjdk.org
Sat Jul 22 12:29:39 UTC 2023
On Sat, 22 Jul 2023 12:13:10 GMT, Cristian Vat <duke at openjdk.org> wrote:
>> Reduces excessive allocation of Matcher.groups array when the original Pattern has no groups or less than 9 groups.
>>
>> Original clamping to 10 possibly due to documented behavior from javadoc:
>> "In this class, \1 through \9 are always interpreted as back references, "
>>
>> Only with Matcher changes RegExTest.backRefTest fails when backreferences to non-existing groups are present.
>> Added a match failure condition in Pattern that fixes failing tests.
>>
>> As per existing `java.util.regex.Pattern.BackRef#match`: "// If the referenced group didn't match, neither can this"
>>
>> A group that does not exist in the original Pattern can never match so neither can a backref to that group.
>> If the group existed in the original Pattern then it would have had space allocated in Matcher.groups for that group index.
>> So a group index outside groups array length must never match.
>
> Cristian Vat has updated the pull request incrementally with one additional commit since the last revision:
>
> reduce allocations also for Matcher.usePattern
updated to also reduce allocation in `java.util.regex.Matcher#usePattern`
all jdk_util tests passing
ran JMH `java.util.regex.FindPattern` test and times seem better but test is pretty light (large errors compared to avg score)
ran JMH with `-prof gc`:
Benchmark (patternString) (text) Mode Cnt Score Error Units
FindPattern.testFind:·gc.alloc.rate.norm [^A-Za-z0-9] abcdefghijklmnop1234567890ABCDEFGHIJKLMNOP avgt 3 207.999 ± 0.030 B/op
FindPattern.testFind:·gc.alloc.rate.norm [^A-Za-z0-9] ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, avgt 3 208.000 ± 0.001 B/op
FindPattern.testFind:·gc.alloc.rate.norm [A-Za-z0-9] abcdefghijklmnop1234567890ABCDEFGHIJKLMNOP avgt 3 207.861 ± 4.395 B/op
FindPattern.testFind:·gc.alloc.rate.norm [A-Za-z0-9] ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, avgt 3 207.999 ± 0.031 B/op
Benchmark (patternString) (text) Mode Cnt Score Error Units
FindPattern.testFind:·gc.alloc.rate.norm [^A-Za-z0-9] abcdefghijklmnop1234567890ABCDEFGHIJKLMNOP avgt 3 56.181 ± 5.713 B/op
FindPattern.testFind:·gc.alloc.rate.norm [^A-Za-z0-9] ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, avgt 3 136.000 ± 0.001 B/op
FindPattern.testFind:·gc.alloc.rate.norm [A-Za-z0-9] abcdefghijklmnop1234567890ABCDEFGHIJKLMNOP avgt 3 56.000 ± 0.010 B/op
FindPattern.testFind:·gc.alloc.rate.norm [A-Za-z0-9] ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, avgt 3 135.999 ± 0.028 B/op
As expected 72 byte/op reduction for case when pattern doesn't match.
Unexpected: seems double reduction in allocations for the case when pattern matches. Not completely sure where that is coming from. Maybe some optimizations in loops since these don't have groups and array is always 2 elements?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14894#issuecomment-1646572525
More information about the core-libs-dev
mailing list