RFR: 1513: Reduce polling of mailing list archives
kcr at openjdk.org
Fri Jul 29 18:47:01 UTC 2022
On Fri, 29 Jul 2022 17:31:23 GMT, Erik Joelsson <erikj at openjdk.org> wrote:
> This patch changes the strategy used by the MailmanListReader for polling the mailman archives. The current implementation relies on the server supporting "etag" in order to trust any cached results. Recent testing has shown that etags aren't supported by mail.openjdk.org, which means no results are ever cached; we just keep spamming the mailman archives for the last 12 months over and over.
> My new implementation assumes that new emails will only ever appear in the current and previous months' archives. (If this proves to be wrong, I still think that would be rare enough that it doesn't matter, as the full 12 months will be re-evaluated on bot restart.) So for anything older than the previous month, all successful (200) or non-existent (404) results will be cached and never re-queried.
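The caching rule described above might look roughly like the following. This is only a minimal sketch under stated assumptions, not the actual MailmanListReader code: the class `ArchiveCachePolicy` and its methods `isFrozen`, `shouldFetch`, and `record` are hypothetical names chosen for illustration.

```java
import java.time.YearMonth;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the polling rule; not the real Skara class.
final class ArchiveCachePolicy {
    // Month -> HTTP status of a fetch whose result is considered final.
    private final Map<YearMonth, Integer> settled = new HashMap<>();

    /** A month is "frozen" when it is older than the previous month. */
    static boolean isFrozen(YearMonth month, YearMonth now) {
        return month.isBefore(now.minusMonths(1));
    }

    /** The current and previous months are always polled; frozen months only until settled. */
    boolean shouldFetch(YearMonth month, YearMonth now) {
        return !(isFrozen(month, now) && settled.containsKey(month));
    }

    /** Remember only 200 and 404 results, and only for frozen months. */
    void record(YearMonth month, YearMonth now, int status) {
        if (isFrozen(month, now) && (status == 200 || status == 404)) {
            settled.put(month, status);
        }
    }
}
```

Because the map lives only in memory in this sketch, a restart empties it and all 12 months are fetched again, which matches the fallback mentioned above. Using `YearMonth.minusMonths` also handles the January case, where the previous month falls in the preceding year.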
> The reason mlbridge needs to query emails for up to a year back is that it needs to piece together conversations and trace them back to the original post in order to correctly identify the PR link associating the conversation with a certain PR. (It's possible that this could be made more efficient in a separate change.)
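To illustrate why older archives matter for this step, here is a hedged sketch of walking a reply chain back to its first message. It assumes that replies are linked through a parent message id (for example from an In-Reply-To header); the description above does not say how mlbridge actually links messages, and `ThreadRootFinder`, `addReply`, and `rootOf` are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the real mlbridge data model.
final class ThreadRootFinder {
    // Child message id -> parent message id (assumed to come from a reply header).
    private final Map<String, String> parentOf = new HashMap<>();

    void addReply(String messageId, String inReplyTo) {
        parentOf.put(messageId, inReplyTo);
    }

    /** Follow parent links until reaching an id with no known parent. */
    String rootOf(String messageId) {
        Set<String> seen = new HashSet<>(); // guards against malformed cyclic links
        String current = messageId;
        while (parentOf.containsKey(current) && seen.add(current)) {
            current = parentOf.get(current);
        }
        return current;
    }
}
```

In this model the walk only reaches the original post if every archive month containing a message in the chain has been loaded. A long-running conversation can therefore require archives from many months back.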
> The change itself is rather small, but in order to test it, I needed to expand functionality in the TestMailmanServer. The existing tests did not verify any calls to archives for months other than the current one, so I needed to add support for actually handling that. I also moved the data to in-memory storage in HashMaps instead of writing to temp files.
> My only worry here is that I messed up with the test so that it will start failing on certain days of the year.
Marked as reviewed by kcr (Reviewer).