RFR: 1028: Skara notify bot fails to retry if JBS update fails at the wrong time

Kevin Rushforth kcr at openjdk.java.net
Fri Jun 4 12:07:16 UTC 2021


On Thu, 3 Jun 2021 21:32:45 GMT, Erik Joelsson <erikj at openjdk.org> wrote:

> This patch makes the JBS notifier more resilient to failures. The Notify bot has to deal with different kinds of listeners, some of which can handle repeated calls with the same notification (typically JBS) and some of which cannot (mail, slack). This is currently handled in a less than ideal way.
> 
> Each notification is recorded in a history repo to avoid repeating it and limit reevaluation of potential notifications. If all listeners could handle repeat notifications, we would simply notify first, and update the history after. That would guarantee that every notification was eventually sent even if there was a failure. The problem is that we don't want any risk of things like emails being sent multiple times. So the current solution is to update the history first, and then notify the listener. If there is a recoverable failure, we then attempt to roll back. This works most of the time, but we have seen situations where bad timing causes JBS bugs to not be updated, requiring manual admin work to fix.
> 
> The solution I present here is a new listener property, "idempotent". If true, the notifier will call the listener first and update the history after. If false, it will employ the old strategy (and attempt the rollback unless a NonRetriableException is thrown). The code gets quite messy, so I tried to explain all this in a big class comment.
> 
> I didn't write any new tests for this as I can't think of a way to explicitly test the race with a failure. The existing tests should be good enough.
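
For readers following the description above, the two orderings can be sketched roughly as follows. This is only an illustration, not the code in the PR: the names NotifySketch, Listener, handle, History and deliver are hypothetical, the history repo is replaced by an in-memory set, and only the "idempotent" property and NonRetriableException are taken from the description.

```java
import java.util.HashSet;
import java.util.Set;

public class NotifySketch {
    /** Hypothetical listener; only the "idempotent" property comes from the description. */
    public interface Listener {
        boolean idempotent();
        void handle(String event);
    }

    /** Stand-in for the NonRetriableException mentioned in the description. */
    public static class NonRetriableException extends RuntimeException {
        public NonRetriableException(String message) { super(message); }
    }

    /** In-memory stand-in for the history repo (assumption for illustration). */
    public static class History {
        private final Set<String> seen = new HashSet<>();
        public boolean contains(String event) { return seen.contains(event); }
        public void record(String event) { seen.add(event); }
        public void rollback(String event) { seen.remove(event); }
    }

    /** Returns true if the event is recorded in the history once this call ends. */
    public static boolean deliver(Listener listener, History history, String event) {
        if (history.contains(event)) {
            return true; // already handled, never repeat
        }
        if (listener.idempotent()) {
            // Notify first, record after: a failure leaves no history entry,
            // so the same event is simply attempted again on the next run.
            // (How non-retriable errors are treated here is not covered above.)
            try {
                listener.handle(event);
            } catch (RuntimeException e) {
                return false;
            }
            history.record(event);
            return true;
        }
        // Old strategy: record first so a mail is never sent twice.
        history.record(event);
        try {
            listener.handle(event);
        } catch (NonRetriableException e) {
            return true; // keep the history entry, retrying would not help
        } catch (RuntimeException e) {
            history.rollback(event); // recoverable failure: best-effort rollback
            return false;
        }
        return true;
    }
}
```

In this sketch, an idempotent listener that fails once is called again on the next deliver for the same event, while a non-idempotent listener relies on the rollback having succeeded before it gets a second attempt.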

Marked as reviewed by kcr (Reviewer).

-------------

PR: https://git.openjdk.java.net/skara/pull/1181

