RFR: 8358343: [leyden] Drop notify_all in CompilationPolicyUtils::Queue::pop

older
git: openjdk/leyden: premain: Port...

Aleksey Shipilev

3 Jun 2025 3 Jun '25

6:14 p.m.

Found this when reading premain-vs-mainline webrev. Mainline does not have `notify_all` in this method: https://github.com/openjdk/jdk/blob/c382da579884c28f2765b2c6ba68c0ad4fdcb2ce... But if you remove `notify_all()` in `premain`, then tests start to deadlock, see bug for a sample. The culprit is `CompilationPolicy::flush_replay_training_at_init`, which is only present in premain. I fixed it by using timed waits, which obviates the need for extra notifications. We only enter this method with `-XX:+AOTVerifyTrainingData`, so we don't care much about its performance. This is IMO better than doing a questionable `notify_all` followed by `wait` in load-bearing code. Additional testing: - [x] Linux x86_64 server fastdebug, `runtime/cds` (5x, no timeouts yet; still running more iterations) ------------- Commit messages: - Fix Changes: https://git.openjdk.org/leyden/pull/74/files Webrev: https://webrevs.openjdk.org/?repo=leyden&pr=74&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358343 Stats: 5 lines in 2 files changed: 3 ins; 1 del; 1 mod Patch: https://git.openjdk.org/leyden/pull/74.diff Fetch: git fetch https://git.openjdk.org/leyden.git pull/74/head:pull/74 PR: https://git.openjdk.org/leyden/pull/74

Show replies by date

Aleksey Shipilev

3 Jun 3 Jun

8 p.m.

On Tue, 3 Jun 2025 17:59:51 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:

...

@veresov @iwanowww ^ ------------- PR Comment: https://git.openjdk.org/leyden/pull/74#issuecomment-2936964207

Igor Veresov

6 Jun 6 Jun

3:25 a.m.

On Tue, 3 Jun 2025 17:59:51 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:

...

Is there a reason why can't we just do the processing work in the thread calling the flush instead of waiting for the replay thread? That is, why not make it be like this: void CompilationPolicy::flush_replay_training_at_init(TRAPS) { InstaceKlass* ik; do { ik = _training_replay_queue.try_pop(TrainingReplayQueue_lock, THREAD); if (ik != nullptr) { replay_training_at_init_impl(ik, THREAD); } } while (ik != nullptr); } ------------- PR Comment: https://git.openjdk.org/leyden/pull/74#issuecomment-2947960746

Aleksey Shipilev

6:57 a.m.

On Fri, 6 Jun 2025 03:22:05 GMT, Igor Veresov <iveresov@openjdk.org> wrote:

...

You tell me :) I guess one upside of current code is to leave draining/processing in one place/thread, and thus never run into false positives/negatives due to diagnostic code (this hunk, gated by `-XX:+AOTVerifyTrainingData`) doing something that production code does not. So I mildly prefer it in current form. I can change it, if you want. ------------- PR Comment: https://git.openjdk.org/leyden/pull/74#issuecomment-2948283345

Aleksey Shipilev

7:34 a.m.

New subject: RFR: 8358343: [leyden] Drop notify_all in CompilationPolicyUtils::Queue::pop [v2]

...

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'premain' into JDK-8358343-leyden-training-notify-all - Fix ------------- Changes: - all: https://git.openjdk.org/leyden/pull/74/files - new: https://git.openjdk.org/leyden/pull/74/files/1f98bbe5..8af7ae69 Webrevs: - full: https://webrevs.openjdk.org/?repo=leyden&pr=74&range=01 - incr: https://webrevs.openjdk.org/?repo=leyden&pr=74&range=00-01 Stats: 95 lines in 26 files changed: 40 ins; 44 del; 11 mod Patch: https://git.openjdk.org/leyden/pull/74.diff Fetch: git fetch https://git.openjdk.org/leyden.git pull/74/head:pull/74 PR: https://git.openjdk.org/leyden/pull/74

Igor Veresov

8:37 a.m.

New subject: RFR: 8358343: [leyden] Drop notify_all in CompilationPolicyUtils::Queue::pop [v2]

On Fri, 6 Jun 2025 07:34:46 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:

...

Actually the current approach (even with the spin-wait) and my solution too are not really correct. The fact that the queue is empty doesn't mean that every last item has been processed. The last item may have been popped, but still is being worked on. So however you look at it we should set up some kind of a handshake to make sure the replay thread is done processing, not just done popping. ------------- PR Comment: https://git.openjdk.org/leyden/pull/74#issuecomment-2948506858

Aleksey Shipilev

3 Sep 3 Sep

9:40 a.m.

New subject: RFR: 8358343: [leyden] Drop notify_all in CompilationPolicyUtils::Queue::pop [v2]

On Fri, 6 Jun 2025 07:34:46 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:

...

Dang, too late. I think this was fixed in `premain` with: https://github.com/openjdk/leyden/commit/ca21af49a1de66b623d5a7481877766e915... ------------- PR Comment: https://git.openjdk.org/leyden/pull/74#issuecomment-3248479147

Igor Veresov

6 Jun 6 Jun

7:34 a.m.

On Tue, 3 Jun 2025 17:59:51 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:

...

I don't remember. :) I'm just not a big fan of spin-waits even with sleeps inside... ------------- PR Comment: https://git.openjdk.org/leyden/pull/74#issuecomment-2948362494

Igor Veresov

7:37 a.m.

On Tue, 3 Jun 2025 17:59:51 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:

...

It'd be also a potentially nicely reusable method. Provided that it works of course... ------------- PR Comment: https://git.openjdk.org/leyden/pull/74#issuecomment-2948368134

Aleksey Shipilev

3 Sep 3 Sep

9:37 a.m.

New subject: RFR: 8358343: [leyden] Drop notify_all in CompilationPolicyUtils::Queue::pop [v3]

...

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Coordinate with replay thread shutdown - Merge branch 'premain' into JDK-8358343-leyden-training-notify-all - Merge branch 'premain' into JDK-8358343-leyden-training-notify-all - Fix ------------- Changes: - all: https://git.openjdk.org/leyden/pull/74/files - new: https://git.openjdk.org/leyden/pull/74/files/8af7ae69..f8c4debb Webrevs: - full: https://webrevs.openjdk.org/?repo=leyden&pr=74&range=02 - incr: https://webrevs.openjdk.org/?repo=leyden&pr=74&range=01-02 Stats: 182204 lines in 3878 files changed: 112334 ins; 45443 del; 24427 mod Patch: https://git.openjdk.org/leyden/pull/74.diff Fetch: git fetch https://git.openjdk.org/leyden.git pull/74/head:pull/74 PR: https://git.openjdk.org/leyden/pull/74

Aleksey Shipilev

9:40 a.m.

New subject: Withdrawn: 8358343: [leyden] Drop notify_all in CompilationPolicyUtils::Queue::pop

On Tue, 3 Jun 2025 17:59:51 GMT, Aleksey Shipilev <shade@openjdk.org> wrote:

...

This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/leyden/pull/74

189

Age (days ago)

281

Last active (days ago)

List overview

Download

10 comments

2 participants

participants (2)

Aleksey Shipilev
Igor Veresov