Proposal to add another option to configure the DNS cache

Sergey Bylokhov bylokhov at amazon.com
Mon Mar 13 18:20:26 UTC 2023


Hi, Daniel.

> Hi Sergey,
> Could you tell us a bit about the problem you're trying to solve?

One of the main issues I am trying to solve is how the cache handles intermittent DNS server 
outages caused by overloading or network connectivity problems.


> - when the cache policy is set to NEVER, all requests for the same
> server are serialized - if 100 threads try to resolve the same server,
> 100 requests will be made one after another.
> However, with the default cache timeouts of 30/10 seconds, these
> problems are usually unnoticeable.

The default timeout for positive responses is good enough to "have recent DNS records" and to 
"minimize the number of requests to the DNS server".

But the cache for negative responses is problematic, and this is the problem I would like to solve. 
Caching a negative response means that for **10** seconds the application will not be able to 
connect to the server.

Possible solutions:
  1. Decreasing the timeout "for the negative responses": unfortunately, sending more requests to 
the server during a "DNS outage" causes even more issues, since this is not the right moment to 
put more load on the network/server.
  2. Increasing the timeout "for the positive responses": this will decrease the chance of getting 
an error, but the cache will keep serving stale data for longer.
  3. This proposal: ignore the negative response and continue to use the result of the last 
"successful lookup" until some additional timeout expires.
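
For reference, options 1 and 2 only tune the two existing security properties. A minimal sketch of 
how an application might set them programmatically; the class and method names are hypothetical, 
and I am assuming the values are applied before the first lookup, since the cache policy is 
typically read once:

```java
import java.security.Security;

public class DnsCacheSettings {
    /**
     * Sets the existing positive and negative DNS cache timeouts, in seconds.
     * Hypothetical helper: only the two property names are part of the JDK.
     */
    public static void apply(int positiveTtlSeconds, int negativeTtlSeconds) {
        // Timeout for successful lookups (options 2 would raise this value).
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        // Timeout for failed lookups (option 1 would lower this value).
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) {
        // The 30/10 second values discussed in this thread.
        apply(30, 10);
    }
}
```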

> I'd rather not extend the default caching periods; I've seen DNS
> responses with 30 second TTL, and if we cache the results longer than
> that, we might break someone's design.


The idea is to separate the notion of the TTL from the timeout used for the cache. When the TTL for 
a record expires we should request new data from the server. If this request succeeds we will 
update the record; if it fails we will continue to use the cached data until the next sync.

For example, if the new property "networkaddress.extended.cache.ttl" is set to 10 minutes, then we 
will cache a positive response for up to 10 minutes but will try to sync it every 30 seconds. If 
the new property is not set then, as before, we will cache a positive response for 30 seconds and 
a negative response for 10 seconds.
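
To make the intended behavior concrete, here is a rough sketch of the lookup logic, not the actual 
InetAddress implementation. The class StaleOnErrorCache, the Resolver hook and the injected clock 
are all hypothetical names used for illustration. The sketch also assumes a failed sync is retried 
after the regular TTL, and it leaves out negative caching and serialization of concurrent lookups:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

public class StaleOnErrorCache {
    /** Hypothetical resolver hook, so the sketch can run without a real DNS server. */
    public interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    private static final class Entry {
        final InetAddress[] addresses;
        final long refreshAt; // time of the next sync attempt (regular TTL)
        final long expiresAt; // time after which stale data is no longer served

        Entry(InetAddress[] addresses, long refreshAt, long expiresAt) {
            this.addresses = addresses;
            this.refreshAt = refreshAt;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Resolver resolver;
    private final LongSupplier clockMillis;
    private final long ttlMillis;         // e.g. 30 seconds
    private final long extendedTtlMillis; // e.g. 10 minutes

    public StaleOnErrorCache(Resolver resolver, LongSupplier clockMillis,
                             long ttlMillis, long extendedTtlMillis) {
        this.resolver = resolver;
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
        this.extendedTtlMillis = extendedTtlMillis;
    }

    public InetAddress[] lookup(String host) throws UnknownHostException {
        long now = clockMillis.getAsLong();
        Entry cached = cache.get(host);
        if (cached != null && now < cached.refreshAt) {
            return cached.addresses; // TTL has not expired yet, no request is made
        }
        try {
            InetAddress[] fresh = resolver.resolve(host);
            cache.put(host, new Entry(fresh, now + ttlMillis, now + extendedTtlMillis));
            return fresh;
        } catch (UnknownHostException ex) {
            if (cached != null && now < cached.expiresAt) {
                // Sync failed: keep the last successful result and retry after one TTL.
                // The extended deadline is NOT moved, so stale data eventually expires.
                cache.put(host, new Entry(cached.addresses, now + ttlMillis, cached.expiresAt));
                return cached.addresses;
            }
            cache.remove(host);
            throw ex; // no usable result left, report the failure
        }
    }
}
```

With a real resolver this could be wired up as 
new StaleOnErrorCache(InetAddress::getAllByName, System::currentTimeMillis, 30_000, 600_000), 
which matches the 30 second / 10 minute example above.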


-- 
Best regards, Sergey.