RFR: 2065: Update PR labels when new files are touched

Wed Aug 27 20:53:32 UTC 2025

On Mon, 25 Aug 2025 21:23:12 GMT, Zhao Song <zsong at openjdk.org> wrote:

> This patch is trying to make the pr bot be able to update PR labels when new files are touched.
> The main idea is from Erik. For a new PR, the bot will run a labelerWorkItem to auto label the PR first, and the commit hash will be stored in a comment.
> 
> Later, when the user pushes new commits to the PR, the CheckWorkItem will be responsible for updating the labels based on the diff between headHash and stored Hash. I thought about adding the logic of updating labels in LabelerWorkItem, but later I realized that it will require another round of CheckWorkItem, so I choose to let CheckWorkItem update the labels.
> 
> With this patch, here are some new behaviors of the bot:
> (1) The user didn't issue manual label command before auto labeling, LabelerWorkItem will do the initial auto labeling and store the commit hash in the comment. 
> (2) If the user issued manual label command before auto labeling, the initial auto labeling will be skipped, the bot would still post a comment and store the hash in the comment.
> (3) The user pushes a new commit or few commits to a pr that already auto labeled, the bot will evaluate the diff between stored hash and current head, then add new labels or upgrading labels to group labels, in the end, update the stored hash in the comment.
> (4) The user force pushes to the pr that already auto labeled, the bot will evaluate the diff between baseHash and current head.
> (5) The user issues a command to add a label to the pr(or even the user add the label via the forge UI), the bot will check if the labels can be upgraded to group labels.
> 
> The side effect of this feature I can imagine is that a user thinks a file is not related to a component and removed it manually, but later, every time when he touches the file, the label will be added back.

I understand wanting to optimize for performance by putting this in CheckRun, but it really comes at a cost of maintainability IMO. Having two different WorkItems share the responsibility of adding labels in different situations seems bad. We also have the race situation between labels and comment which isn't easily solved in a nice way within CheckRun. 

Are you sure we would need to spawn another CheckWorkItem after the labeler has run? I can see that it would be forced if `rfr` wasn't set yet and that it would be triggered if we actually added a label, but for most situations where nothing changed, the only thing that would happen is that a comment got updated. That comment would have the bot user as author, and those are filtered out already for the updated check, aren't they? I may be missing something.

If we do this in the LabelerWorkItem, we can more easily control the order of label changes and comment update, to avoid the risk of inconsistent states.

bots/pr/src/main/java/org/openjdk/skara/bots/pr/CheckRun.java line 293:

> 291:                             "(<!-- PullRequest Bot label commit ')[^']*(' -->)",
> 292:                             "$1" + pr.headHash().toString() + "$2"
> 293:                     ));

This creates a race where if the `WorkItem` is killed after updating the comment, but before applying `newLabels`, a rerun will not actually retry adding those labels.

bots/pr/src/test/java/org/openjdk/skara/bots/pr/LabelTests.java line 598:

> 596: 
> 597: 
> 598: 

That's a lot of empty lines.

-------------

PR Review: https://git.openjdk.org/skara/pull/1735#pullrequestreview-3161683972
PR Review Comment: https://git.openjdk.org/skara/pull/1735#discussion_r2305251911
PR Review Comment: https://git.openjdk.org/skara/pull/1735#discussion_r2305256943