Proposal of introducing new JFR events for DNS lookups

Alan Bateman alan.bateman at oracle.com
Fri Nov 14 11:48:43 UTC 2025


It's a good topic to bring up.

Have you tried the JFR support for method timing and tracing events 
that JEP 520 introduced in JDK 25? I'm wondering if 
-XX:StartFlightRecording:jdk.MethodTrace#filter=java.net.InetAddress::getByName 
records events that could help here.
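
To make that concrete, here is a rough sketch of how such a recording
could be inspected afterwards with the jdk.jfr.consumer API. The class
name MethodTraceSummary, the helper describe and the default file name
recording.jfr are made up for illustration, and the "method" field is
what I expect jdk.MethodTrace to carry, so the code checks for it
rather than relying on it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedMethod;
import jdk.jfr.consumer.RecordingFile;

public class MethodTraceSummary {

    // Hypothetical helper: one line of output per traced call.
    static String describe(String method, Duration duration) {
        return method + " took " + (duration.toNanos() / 1000) + " us";
    }

    public static void main(String[] args) throws IOException {
        // Assumed file name; pass the real recording as the first argument.
        Path file = Path.of(args.length > 0 ? args[0] : "recording.jfr");
        if (!Files.exists(file)) {
            System.out.println("No recording found at " + file);
            return;
        }
        try (RecordingFile recording = new RecordingFile(file)) {
            while (recording.hasMoreEvents()) {
                RecordedEvent event = recording.readEvent();
                if (!"jdk.MethodTrace".equals(event.getEventType().getName())) {
                    continue; // skip everything except method trace events
                }
                // Assumption: the traced method is stored in a "method" field.
                String method = "(unknown)";
                if (event.hasField("method")) {
                    Object value = event.getValue("method");
                    if (value instanceof RecordedMethod m) {
                        method = m.getType().getName() + "." + m.getName();
                    }
                }
                System.out.println(describe(method, event.getDuration()));
            }
        }
    }
}
```

Adding filename=recording.jfr as a further comma-separated option to
-XX:StartFlightRecording should produce a file this can read, and
"jfr print --events jdk.MethodTrace recording.jfr" is a quicker way to
look at the same events without writing any code.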

If new events are introduced then I could imagine them having 
"NameService" rather than "Dns" in the name as the JDK doesn't use DNS 
directly (except the JNDI-DNS provider), it is whatever is configured on 
the system.
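
As a rough illustration of the naming point, an application can already
approximate part of the quoted draft today with a custom event built on
the public jdk.jfr API. Everything named here (NameServiceLookupSketch,
the event name example.NameServiceLookup, the lookup helper) is
hypothetical and not a proposed JDK API. It only covers host, result
and success; the cached, ttl and stale fields from the draft depend on
the internal InetAddress cache, which application code cannot see.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.stream.Collectors;
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

public class NameServiceLookupSketch {

    // Hypothetical application-level event, not a JDK event.
    @Name("example.NameServiceLookup")
    @Label("Name Service Lookup")
    @Category("Example")
    static class NameServiceLookupEvent extends Event {
        @Label("Host")
        String host;

        @Label("Result")
        String result;

        @Label("Success")
        boolean success;
    }

    // Resolves the host through whatever name service the system is
    // configured to use and records the outcome as a JFR event.
    // Returns the comma-separated addresses, or the error message.
    static String lookup(String host) {
        NameServiceLookupEvent event = new NameServiceLookupEvent();
        event.host = host;
        event.begin();
        try {
            InetAddress[] addresses = InetAddress.getAllByName(host);
            event.result = Arrays.stream(addresses)
                    .map(InetAddress::getHostAddress)
                    .collect(Collectors.joining(","));
            event.success = true;
        } catch (UnknownHostException e) {
            event.result = String.valueOf(e.getMessage());
            event.success = false;
        }
        // commit() ends the timing; nothing is written unless a
        // recording is running and the event is enabled.
        event.commit();
        return event.result;
    }

    public static void main(String[] args) {
        System.out.println(lookup(args.length > 0 ? args[0] : "localhost"));
    }
}
```

This only sees lookups that go through the wrapper, which is part of
why events inside the JDK would be more useful than this kind of
workaround.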

-Alan


On 13/11/2025 11:07, hashjangcyber at gmail.com wrote:
> Hello,
> I would like to start a discussion on introducing new JFR events for 
> DNS lookups. While many lookups are DNS in cloud-native environments, 
> the JDK uses the configured name service, so the event naming and 
> semantics should not imply DNS-only behavior. I’m seeking feedback on 
> scope, naming, and payload fields.
> Motivation
>
>   * High-frequency, latency-sensitive lookups are critical for service
>     discovery.
>   * Current gaps:
>       o Cannot distinguish cache hits vs. network lookups
>       o Hard to trace lookup latency and diagnose timeouts/failures
>       o Concurrent libraries may cause redundant lookups
>   * Value:
>       o End-to-end observability: lookup → socket connect → data transfer
>       o Troubleshooting: identify timeouts, resolution failures
>       o Performance: evaluate cache policies, detect hotspot names
>       o Security: audit external domains accessed
>
> *Proposed event (initial draft)*
> *Event name:* jdk.DnsLookup
> *When:* Emitted around DNS hostname resolution call boundaries, including:
>
>   * Actual network DNS queries (when cache is disabled or cache miss
>     occurs)
>   * Cache hits (when result is retrieved from DNS cache)
>   * Stale data usage (when expired but still valid cached data is used)
>   * Background DNS cache refresh operations
>
> *Key fields (feedback welcome):*
>
>   * host (String): The hostname being resolved
>   * result (String): Comma-separated list of resolved IP addresses, or
>     error message if lookup failed
>   * success (boolean): Whether the DNS lookup was successful
>   * cached (boolean): Whether the result was retrieved from cache
>     (true) or from actual DNS network query (false). This helps
>     distinguish between three use cases:
>       o Actual network queries (cached=false) - represents real DNS
>         network traffic
>       o Cache hits (cached=true, stale=false) - repeated lookups using
>         fresh cached data
>       o Stale data usage (cached=true, stale=true) - application
>         continues with expired but still valid cached data when DNS
>         refresh fails
>   * ttl (long, seconds): Time to live in seconds. Values:
>       o 0 or -1: Not cached or forever cached
>       o > 0: Actual remaining TTL if cached
>   * stale (boolean): Whether stale cached data was used (only valid
>     when cached=true). Helps identify semi-error scenarios where DNS
>     errors occur but application continues using stale cached records
>
> *Event name:* jdk.DnsCacheStatistics
> *When:* Periodic event emitted at configurable intervals (default: 5 
> seconds in default.jfc, 1 second in profile.jfc). This is a statistics 
> event similar to jdk.ExceptionStatistics, providing aggregate metrics 
> about the DNS cache state.
> *Key fields (feedback welcome):*
>
>   * cacheSize (long): Current number of entries in the DNS cache.
>     Useful for monitoring cache growth and understanding cache
>     utilization patterns.
>   * staleEntries (long): Number of stale entries currently in the
>     cache (entries that have expired but are still within the stale
>     period). Helps identify how many entries are using stale data,
>     which is important for understanding cache behavior in scenarios
>     where DNS refresh fails.
>   * entriesRemoved (long): Number of entries that have been removed
>     during cache cleanup operations. This metric tracks cache eviction
>     and helps understand cache churn patterns, which is particularly
>     useful in Kubernetes and cloud-native environments where DNS
>     entries may change frequently.
>
> *Use cases:*
>
>   * Monitoring DNS cache size growth over time
>   * Identifying cache cleanup frequency and patterns
>   * Understanding stale data usage in production environments
>   * Troubleshooting DNS-related performance issues in microservices
>     architectures
>   * Observing cache behavior during DNS server failures or network
>     partitions
>
> Prototype/PR
>
>   * A preliminary PR is available for context and discussion:
>       o https://git.openjdk.org/jdk/pull/28110
>   * I will update the design/implementation per feedback from this thread.
>
> Thanks in advance for your feedback!
> Best regards,
> NeayGuyCoding