RFR: 8293017: Improve hash calculation parallelism in jmod/jlink
Сергей Цыпанов
duke at openjdk.org
Mon Aug 29 09:37:19 UTC 2022
On Mon, 29 Aug 2022 08:55:06 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> `jmod`/`jlink` are executed at build time to produce the `jmod` files and the base modules image. On slow hardware (Raspberry Pi-class, for example) and/or slow VMs (debug or interpreter-only builds, for example) this takes a while. Profiling shows that considerable time is spent hashing modules, either to write out the ModuleHashes attribute or to verify that the hashes are correct.
>
> Those paths can be parallelized to make them considerably faster.
>
> The major contributors to module hashing are `java.base`, `java.desktop` and `jdk.localedata`, so there is a significant opportunity for parallelism here.
>
> Motivational improvements on `make clean-images images`:
>
> ```
> # x86_64 Server, release
>
> # Baseline
> real 0m11.895s
> user 1m4.526s
> sys 0m10.715s
>
> # Patched
> real 0m10.701s ; <--- 1.1x improvement
> user 1m5.097s
> sys 0m11.260s
>
> # x86_64 Zero, release
>
> # Baseline
> real 5m20.335s
> user 7m7.791s
> sys 0m7.258s
>
> # Patched
> real 2m23.551s ; <--- 2.23x improvement
> user 7m14.442s
> sys 0m7.856s
> ```
>
> Additional testing:
> - [x] Linux x86_64 fastdebug, `java/util/jar`
> - [x] Linux x86_64 fastdebug, `tools/jmod`
> - [x] Linux x86_64 fastdebug, `tools/jlink`
> - [x] Linux x86_64 fastdebug, `tier1`
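For illustration, the per-module hashing described above can be sketched in isolation. This is a minimal sketch, not the code from the PR: the `Module` record and the `hashAll` method are hypothetical stand-ins for `ModuleReference` and the hashing done inside `ModuleHashes.generate`, and SHA-256 is an assumed algorithm. Each module's digest does not depend on any other module, so the work can run in parallel; the only shared state is a `ConcurrentHashMap`, and each task creates its own `MessageDigest` because instances are not thread-safe.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ParallelHashSketch {

    // Hypothetical stand-in for a module: a name plus its content bytes.
    record Module(String name, byte[] content) {}

    // Hashes each module independently; tasks share only the concurrent result map.
    static Map<String, byte[]> hashAll(List<Module> modules, String algorithm) {
        Map<String, byte[]> nameToHash = new ConcurrentHashMap<>();
        modules.parallelStream().forEach(m -> {
            try {
                // MessageDigest is not thread-safe, so each task gets its own instance.
                MessageDigest md = MessageDigest.getInstance(algorithm);
                nameToHash.put(m.name(), md.digest(m.content()));
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalArgumentException(e);
            }
        });
        return nameToHash;
    }

    public static void main(String[] args) {
        List<Module> modules = List.of(
                new Module("a", "alpha".getBytes(StandardCharsets.UTF_8)),
                new Module("b", "beta".getBytes(StandardCharsets.UTF_8)),
                new Module("c", "gamma".getBytes(StandardCharsets.UTF_8)));
        Map<String, byte[]> hashes = hashAll(modules, "SHA-256");
        System.out.println(hashes.size() + " hashes, " + hashes.get("a").length + " bytes each");
    }
}
```

Because the digests land in a map keyed by name, the order in which parallel tasks finish does not affect the result.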
src/java.base/share/classes/jdk/internal/module/ModuleHashes.java line 170:
> 168: static ModuleHashes generate(Set<ModuleReference> mrefs, String algorithm) {
> 169: Map<String, byte[]> nameToHash = new ConcurrentHashMap<>();
> 170: mrefs.stream().parallel().forEach(mref -> {
[AFAIK](https://stackoverflow.com/questions/28985704/parallel-stream-from-a-hashset-doesnt-run-in-parallel), streams obtained from a HashSet can parallelize poorly, so maybe it's worth wrapping `mrefs` in an ArrayList?
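A small sketch of the difference, assuming `mrefs` is backed by a plain `java.util.HashSet` (its concrete type is not visible in this excerpt). A HashSet spliterator splits by ranges of hash-table buckets and does not report `SUBSIZED`, so the stream framework only has size estimates for the pieces and may split unevenly; an ArrayList spliterator splits by index and reports exact sizes for every split. The class name and sample module names are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;

public class SpliteratorSketch {

    // Reports whether a spliterator promises exact sizes for all of its splits.
    static boolean subsized(Spliterator<?> s) {
        return s.hasCharacteristics(Spliterator.SUBSIZED);
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Set.of("java.base", "java.desktop", "jdk.localedata"));
        List<String> copy = new ArrayList<>(set); // one O(n) copy before going parallel

        System.out.println("HashSet SUBSIZED:   " + subsized(set.spliterator()));
        System.out.println("ArrayList SUBSIZED: " + subsized(copy.spliterator()));

        // The wrapped collection can then be streamed in parallel as before.
        long count = copy.parallelStream().filter(n -> n.startsWith("java.")).count();
        System.out.println("java.* modules: " + count);
    }
}
```

The copy costs a single pass over a few dozen references, which should be negligible next to hashing whole modules.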
-------------
PR: https://git.openjdk.org/jdk/pull/10060
More information about the core-libs-dev mailing list