RFR: 1801: Exclude some GraphQL calls from RestRequestCache rate limiter

Erik Joelsson erikj at openjdk.org
Tue Jan 24 17:00:15 UTC 2023

On Tue, 24 Jan 2023 14:41:46 GMT, Erik Joelsson <erikj at openjdk.org> wrote:

> In [SKARA-1731](https://bugs.openjdk.org/browse/SKARA-1731), the best practices regarding non GET requests on Github became better enforced. Unfortunately that has lead to some unexpected performance issues. I believe this is mostly caused by GraphQL requests, which are always POST, being treated as a non GET REST requests by the limiter and limited to one per second. This limitation on a subset of our github queries is currently causing a queue buildup for that lock, which stabilizes to about 1m wait time. This delay is slowing down response times from the bots massively.
> Reading the Integrator best practices document again (https://docs.github.com/en/rest/guides/best-practices-for-integrators?apiVersion=2022-11-28) and the GraphQL resource limitations (https://docs.github.com/en/graphql/overview/resource-limitations) I'm pretty sure you aren't supposed to limit GraphQL to one call per second, but rather just make sure you stay within the points limit. The one per second recommendation is for the REST API.
> I want to make it possible to exclude certain calls from this rate limiter to see if this can alleviate the contention we are currently seeing. I have run with this patch for the pr bot since yesterday afternoon (PT) and it seems to be working well.
> Two PRs were unable to integrate yesterday due to this contention:
> https://github.com/openjdk/jdk/pull/12139
> https://github.com/openjdk/jdk/pull/11922

> You don't think we need to explicitly track GraphQL "points" to allow the rate limiter in RestRequest allow for future GraphQL requests which are more expensive? Your comment seem to indicate that you have manually analyzed our current GraphQL requests and come to the conclusion that they are worth 1 point each.

> Basically, what I think I am suggesting, is that instead of a `skipLimit` boolean, we should have a `graphQlPoints`, which we can set to 0 for normal REST calls. And if it is non-zero, the rate limits counts towards the graph QL point limit instead.
> If we know that we stay way below the 5000 per hour, we can skip the actual limitation for now, but that would make it easy to inject such a check in the future. Or, we could do it properly from the start, and add a check that the accumulated QL points do not exceed the limit.

That's an interesting idea. Can we keep it i mind for future improvements? Especially if we end up trying to implement more complex GraphQL queries in the future. I think implementing what you suggest in a useful way is a bit more complex. With the current set of queries, I would prefer not to make things more complicated right now.

I have monitored the current bot using a manual GraphQL query for current points. Our bot user has 12500 points per hour and we are currently well within that.


PR: https://git.openjdk.org/skara/pull/1461

More information about the skara-dev mailing list