Proposal of introducing new JFR events for DNS lookups
Alan Bateman
alan.bateman at oracle.com
Fri Nov 14 11:48:43 UTC 2025
It's a good topic to bring up.
Have you tried the JFR support for method timing and tracing events
that JEP 520 introduced in JDK 25? I'm wondering if
-XX:StartFlightRecording:jdk.MethodTrace#filter=java.net.InetAddress::getByName
records events that could help here.
If new events are introduced then I could imagine them having
"NameService" rather than "Dns" in the name, as the JDK doesn't use DNS
directly (except in the JNDI-DNS provider); it uses whatever name service
is configured on the system.
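
To make the naming point concrete, here is a minimal application-level
sketch of such an event, written with the public jdk.jfr custom event API.
It is only an illustration: the event name "example.NameServiceLookup",
the class names, and the timedLookup helper are hypothetical, and this is
not the JDK's implementation or the code in the PR. The host, result, and
success fields follow the draft quoted below; cached, ttl, and stale are
left out because a wrapper around InetAddress cannot observe the
internal cache.

```java
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.stream.Collectors;

public class NameServiceLookupSketch {

    // Hypothetical event: the name avoids "Dns" because the lookup goes to
    // whatever name service the system is configured to use.
    @Name("example.NameServiceLookup")
    @Label("Name Service Lookup")
    @Category("Example")
    static class NameServiceLookupEvent extends Event {
        @Label("Host")
        String host;

        @Label("Result")
        String result;

        @Label("Success")
        boolean success;
    }

    // Resolves the host, records one event around the call, and returns the
    // string stored in the event's result field (comma-separated addresses,
    // or the exception message if the lookup failed).
    static String timedLookup(String host) {
        NameServiceLookupEvent event = new NameServiceLookupEvent();
        event.host = host;
        event.begin();
        try {
            InetAddress[] addresses = InetAddress.getAllByName(host);
            event.result = Arrays.stream(addresses)
                    .map(InetAddress::getHostAddress)
                    .collect(Collectors.joining(","));
            event.success = true;
        } catch (UnknownHostException e) {
            event.result = String.valueOf(e.getMessage());
            event.success = false;
        } finally {
            // commit() ends the event if end() was not called explicitly;
            // nothing is written unless a recording has the event enabled.
            event.commit();
        }
        return event.result;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + timedLookup(host));
    }
}
```

Running this with -XX:StartFlightRecording:filename=recording.jfr and then
"jfr print --events example.NameServiceLookup recording.jfr" should list
one event per call, with the duration covering the getAllByName call.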
-Alan
On 13/11/2025 11:07, hashjangcyber at gmail.com wrote:
> Hello,
> I would like to start a discussion on introducing new JFR events for
> DNS lookups. While many lookups are DNS in cloud-native environments,
> the JDK uses the configured name service, so the event naming and
> semantics should not imply DNS-only behavior. I’m seeking feedback on
> scope, naming, and payload fields.
> Motivation
>
> * High-frequency, latency-sensitive lookups are critical for service
> discovery.
> * Current gaps:
> o Cannot distinguish cache hits vs. network lookups
> o Hard to trace lookup latency and diagnose timeouts/failures
> o Concurrent libraries may cause redundant lookups
> * Value:
> o End-to-end observability: lookup → socket connect → data transfer
> o Troubleshooting: identify timeouts, resolution failures
> o Performance: evaluate cache policies, detect hotspot names
> o Security: audit external domains accessed
>
> *Proposed event (initial draft)*
> *Event name:* jdk.DnsLookup
> *When:* Emitted around DNS hostname resolution call boundaries, including:
>
> * Actual network DNS queries (when cache is disabled or cache miss
> occurs)
> * Cache hits (when result is retrieved from DNS cache)
> * Stale data usage (when expired but still valid cached data is used)
> * Background DNS cache refresh operations
>
> *Key fields (feedback welcome):*
>
> * host (String): The hostname being resolved
> * result (String): Comma-separated list of resolved IP addresses, or
> error message if lookup failed
> * success (boolean): Whether the DNS lookup was successful
> * cached (boolean): Whether the result was retrieved from cache
> (true) or from actual DNS network query (false). This helps
> distinguish between three use cases:
> o Actual network queries (cached=false) - represents real DNS
> network traffic
> o Cache hits (cached=true, stale=false) - repeated lookups using
> fresh cached data
> o Stale data usage (cached=true, stale=true) - application
> continues with expired but still valid cached data when DNS
> refresh fails
> * ttl (long, seconds): Time to live in seconds. Values:
> o 0 or -1: Not cached or forever cached
> o > 0: Actual remaining TTL if cached
> * stale (boolean): Whether stale cached data was used (only valid
> when cached=true). Helps identify semi-error scenarios where DNS
> errors occur but application continues using stale cached records
>
> *Event name:* jdk.DnsCacheStatistics
> *When:* Periodic event emitted at configurable intervals (default: 5
> seconds in default.jfc, 1 second in profile.jfc). This is a statistics
> event similar to jdk.ExceptionStatistics, providing aggregate metrics
> about the DNS cache state.
> *Key fields (feedback welcome):*
>
> * cacheSize (long): Current number of entries in the DNS cache.
> Useful for monitoring cache growth and understanding cache
> utilization patterns.
> * staleEntries (long): Number of stale entries currently in the
> cache (entries that have expired but are still within the stale
> period). Helps identify how many entries are using stale data,
> which is important for understanding cache behavior in scenarios
> where DNS refresh fails.
> * entriesRemoved (long): Number of entries that have been removed
> during cache cleanup operations. This metric tracks cache eviction
> and helps understand cache churn patterns, which is particularly
> useful in Kubernetes and cloud-native environments where DNS
> entries may change frequently.
>
> *Use cases:*
>
> * Monitoring DNS cache size growth over time
> * Identifying cache cleanup frequency and patterns
> * Understanding stale data usage in production environments
> * Troubleshooting DNS-related performance issues in microservices
> architectures
> * Observing cache behavior during DNS server failures or network
> partitions
>
> Prototype/PR
>
> * A preliminary PR is available for context and discussion:
> o https://git.openjdk.org/jdk/pull/28110
> * I will update the design/implementation per feedback from this thread.
>
> Thanks in advance for your feedback!
> Best regards,
> NeayGuyCoding