RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v2]

Claes Redestad redestad at openjdk.java.net
Fri Feb 11 12:11:54 UTC 2022


> I'm requesting comments and, hopefully, some help with this patch to replace `StringCoding.hasNegatives` with `countPositives`. The new method does a very similar pass, but alters the intrinsic to return the number of leading bytes in the `byte[]` range which only has positive bytes. This allows for dealing much more efficiently with those `byte[]`s that has a ASCII prefix, with no measurable cost on ASCII-only or latin1/UTF16-mostly input.
> 
> Microbenchmark results: https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904
> 
> - Only implemented on x86 for now, but I want to verify that implementations of `countPositives` can be implemented with similar efficiency on all platforms that today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. This pretty much means holding up this until it's implemented on all platforms, which can either contributed to this PR or as dependent follow-ups.
> 
> - An alternative to holding up until all platforms are on board is to allow the implementation of `StringCoding.hasNegatives` and `countPositives` to be implemented so that the non-intrinsified method calls into the intrinsified. This requires structuring the implementations differently based on which intrinsic - if any - is actually implemented. One way to do this could be to mimic how `java.nio` handles unaligned accesses and expose which intrinsic is available via `Unsafe` into a `static final` field.
> 
> - There are a few minor regressions (~5%) in the x86 implementation on `encode-/decodeLatin1Short`. Those regressions disappear when mixing inputs, for example `encode-/decodeShortMixed` even see a minor improvement, which makes me consider those corner case regressions with little real world implications (if you have latin1 Strings, you're likely to also have ASCII-only strings in your mix).

Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 23 additional commits since the last revision:

 - Merge branch 'master' into count_positives
 - Restore partial vector checks in AVX2 and SSE intrinsic variants
 - Let countPositives use hasNegatives to allow ports not implementing the countPositives intrinsic to stay neutral
 - Simplify changes to encodeUTF8
 - Fix little-endian error caught by testing
 - Reduce jumps in the ascii path
 - Remove unused tail_mask
 - Remove has_negatives intrinsic on x86 (and hook up 32-bit x86 to use count_positives)
 - Add more comments, simplify tail branching in AVX512 variant
 - Resolve issues in the precise implementation
 - ... and 13 more: https://git.openjdk.java.net/jdk/compare/42073fce...c4bb3612

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7231/files
  - new: https://git.openjdk.java.net/jdk/pull/7231/files/2a855eb6..c4bb3612

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7231&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7231&range=00-01

  Stats: 18287 lines in 533 files changed: 12765 ins; 2983 del; 2539 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7231.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7231/head:pull/7231

PR: https://git.openjdk.java.net/jdk/pull/7231


More information about the core-libs-dev mailing list