[jdk21] JDK-8337994 REDO backport failure analysis - Missing prerequisite changes from JDK-8316241

Cetin, Ozan ozanctn at amazon.com
Wed Jan 7 09:37:43 UTC 2026


Hi,

I've been investigating the test failures that caused JDK-8346108 (the revert of JDK-8337994 REDO in JDK21). This is related to the native memory leak when not recording any JFR events (JDK-8335121).

Summary
Our investigation suggests that the JDK-8337994 (REDO) backport to JDK21 failed because it depends on API changes introduced by the original JDK-8316241 fix, which was never backported to JDK21. Our theory is that the REDO fix assumes infrastructure that exists only in later mainline releases.

Root Cause Analysis
The Missing Prerequisite

The original JDK-8316241 fix (commit b2a39c576706622b624314c89fa6d10d0b422f86) introduced several key changes to jfrTypeSetUtils.hpp/.cpp:

  1.  API Change: should_do_loader_klass(const Klass* k) → should_do_cld_klass(const Klass* k, bool leakp)
  2.  New Data Structure: Added _klass_loader_leakp_set for separate tracking of leakp (leak profiler) path klasses
  3.  New Function: get_cld_klass(CldPtr cld, bool leakp) in jfrTypeSet.cpp that properly enqueues CLD klasses via JfrTraceId::load()

What Happens Without These Changes

The REDO fix attempts to use get_cld_klass(), which calls should_do_cld_klass(klass, leakp), but in the JDK21 backport:

  *   JDK21 still has the old API: should_do_loader_klass(const Klass* k) (no leakp parameter)
  *   JDK21 lacks _klass_loader_leakp_set for separate tracking
  *   The get_cld_klass() function doesn't exist in the JDK21 codebase

This causes assert(IS_SERIALIZED(class_loader_klass)) to fail in write_cld(), because the CLD's class_loader_klass is never properly enqueued for serialization on the leakp path.
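To make the API difference concrete, here is a toy model of the two shapes. It is only a sketch: Klass is an opaque stand-in, std::set replaces GrowableArray, and it assumes that "should do" means "first time this klass is seen in the given set", with the klass recorded as a side effect. Whether this is the exact path that fails in JDK21 is not established by the model; it only shows how a single shared set differs from separate regular and leakp sets.

```cpp
#include <cassert>
#include <set>

struct Klass {};  // opaque stand-in, NOT the HotSpot Klass

// Old shape (JDK21 today): one set shared by every caller.
struct OldArtifactSet {
  std::set<const Klass*> klass_loader_set;

  // Assumption: returns true only on the first sighting of k.
  bool should_do_loader_klass(const Klass* k) {
    assert(k != nullptr);
    return klass_loader_set.insert(k).second;
  }
};

// New shape (JDK-8316241): the leakp path has its own set.
struct NewArtifactSet {
  std::set<const Klass*> klass_loader_set;
  std::set<const Klass*> klass_loader_leakp_set;

  // Same first-sighting assumption, but tracked per path.
  bool should_do_cld_klass(const Klass* k, bool leakp) {
    assert(k != nullptr);
    return (leakp ? klass_loader_leakp_set : klass_loader_set).insert(k).second;
  }
};
```

In this model, once the regular path has visited a loader klass, the old API answers "no" to every later caller, including a leakp one, while the new API still answers "yes" to the first leakp query for the same klass.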


Test Failure Mechanism (TestChunkIntegrity.java)

1. TestClassLoader loads MyClass
2. Event commits with clazz = MyClass
3. JFR rotation writes MyClass to chunk
4. MyClass's CLD references TestClassLoader Klass
5. BUG: TestClassLoader Klass not serialized (leakp path broken)
6. Chunk written with broken reference
7. In slowdebug: assert(IS_SERIALIZED(class_loader_klass)) fails
8. In release: "Events don't match" when comparing chunks
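The "broken reference" in steps 4 to 6 can be pictured with a small standalone model. Everything here is hypothetical: ClassEntry, CldEntry, Chunk and the numeric ids are illustration-only stand-ins and do not reflect the real JFR chunk format. An id of 0 is assumed to mean "no reference".

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified constant-pool entries (not the JFR chunk layout).
struct ClassEntry { long id; std::string name; long cld_id; };
struct CldEntry   { long id; long loader_klass_id; };

struct Chunk {
  std::vector<ClassEntry> classes;
  std::vector<CldEntry> clds;
};

// Collects every id that some entry refers to but that was never written.
std::vector<long> dangling_references(const Chunk& chunk) {
  std::unordered_set<long> class_ids, cld_ids;
  for (const ClassEntry& c : chunk.classes) class_ids.insert(c.id);
  for (const CldEntry& d : chunk.clds) cld_ids.insert(d.id);

  std::vector<long> dangling;
  for (const ClassEntry& c : chunk.classes) {
    if (c.cld_id != 0 && cld_ids.count(c.cld_id) == 0) dangling.push_back(c.cld_id);
  }
  for (const CldEntry& d : chunk.clds) {
    if (d.loader_klass_id != 0 && class_ids.count(d.loader_klass_id) == 0) {
      dangling.push_back(d.loader_klass_id);
    }
  }
  return dangling;
}

// MyClass (id 1) lives in CLD 10, whose loader klass (id 2) was never written.
Chunk broken_chunk() {
  Chunk c;
  c.classes.push_back({1, "MyClass", 10});
  c.clds.push_back({10, 2});
  return c;
}

// Same chunk, but the TestClassLoader klass (id 2) was serialized as well.
Chunk intact_chunk() {
  Chunk c = broken_chunk();
  c.classes.push_back({2, "TestClassLoader", 0});
  return c;
}
```

In this model the broken chunk reports one dangling id (the loader klass), which corresponds to the condition the slowdebug assert guards against.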


The Fix
I've been able to get a local jdk21 build passing all tests (including slowdebug) by backporting JDK-8316241 and resolving the resulting conflicts. The key changes are:
1. jfrTypeSetUtils.hpp
// OLD (JDK21 current)
bool should_do_loader_klass(const Klass* k);

// NEW (with leakp support)
bool should_do_cld_klass(const Klass* k, bool leakp);

2. jfrTypeSetUtils.cpp

// Added _klass_loader_leakp_set member
GrowableArray<const Klass*>* _klass_loader_leakp_set;

// Updated implementation
bool JfrArtifactSet::should_do_cld_klass(const Klass* k, bool leakp) {
  assert(k != nullptr, "invariant");
  assert(_klass_loader_set != nullptr, "invariant");
  assert(_klass_loader_leakp_set != nullptr, "invariant");
  return not_in_set(leakp ? _klass_loader_leakp_set : _klass_loader_set, k);
}


3. jfrTypeSet.cpp - Added get_cld_klass()

static inline KlassPtr get_cld_klass(CldPtr cld, bool leakp) {
  if (cld == nullptr) {
    return nullptr;
  }
  assert(leakp ? IS_LEAKP(cld) : used(cld), "invariant");
  KlassPtr cld_klass = cld->class_loader_klass();
  if (cld_klass == nullptr) {
    return nullptr;
  }
  if (should_do_cld_klass(cld_klass, leakp)) {
    if (current_epoch()) {
      // KEY FIX: Enqueue the klass for serialization
      JfrTraceId::load(cld_klass);
    } else {
      artifact_tag(cld_klass, leakp);
    }
    return cld_klass;
  }
  return nullptr;
}
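For anyone who wants to step through the branches of get_cld_klass() outside a HotSpot build, the control flow can be mirrored with stubs. This is a sketch under loud assumptions: Klass, ClassLoaderData and Model are hypothetical stand-ins, the enqueued and tagged vectors merely record where the real code calls JfrTraceId::load() and artifact_tag(), and should_do_cld_klass() is assumed to be a first-sighting test per set.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stubs; none of these are the real HotSpot types.
struct Klass {};
struct ClassLoaderData {
  const Klass* loader_klass = nullptr;
  bool leakp = false;   // stands in for IS_LEAKP(cld)
  bool in_use = true;   // stands in for used(cld)
  const Klass* class_loader_klass() const { return loader_klass; }
};

struct Model {
  bool current_epoch = true;
  std::set<const Klass*> loader_set, loader_leakp_set;
  std::vector<const Klass*> enqueued;  // where the real code calls JfrTraceId::load()
  std::vector<const Klass*> tagged;    // where the real code calls artifact_tag()

  // Assumption: true only the first time k is seen on the given path.
  bool should_do_cld_klass(const Klass* k, bool leakp) {
    return (leakp ? loader_leakp_set : loader_set).insert(k).second;
  }

  // Mirrors the branch structure of the excerpt above.
  const Klass* get_cld_klass(const ClassLoaderData* cld, bool leakp) {
    if (cld == nullptr) return nullptr;
    assert(leakp ? cld->leakp : cld->in_use);
    const Klass* cld_klass = cld->class_loader_klass();
    if (cld_klass == nullptr) return nullptr;
    if (!should_do_cld_klass(cld_klass, leakp)) return nullptr;
    if (current_epoch) {
      enqueued.push_back(cld_klass);
    } else {
      tagged.push_back(cld_klass);
    }
    return cld_klass;
  }
};
```

In this model each path returns the loader klass exactly once, and in the current epoch that single return always coincides with an enqueue.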


Proposed Action
Based on this, it appears that backporting JDK-8337994 (REDO) alone may not be sufficient, and that some or all of the prerequisite infrastructure changes from JDK-8316241 may also need to be backported.

Additionally, other upstream commits made on top of JDK-8316241 in JDK24 (such as 8323631<https://github.com/openjdk/jdk/commit/e2d6023cb9667dc9911e0af421d6dd0c78f6bf58>) may also be required so that the fix does not introduce other errors. We would appreciate guidance on identifying any additional changes that need to be included in the backport.

If this direction makes sense, I'm happy to prepare a proper patch for review.

References

  *   JDK-8335121<https://bugs.openjdk.org/browse/JDK-8335121>: Native memory leak when JFR is enabled but no events are emitted
  *   JDK-8316241<https://github.com/openjdk/jdk/commit/b2a39c576706622b624314c89fa6d10d0b422f86#diff-e9a35c652aa2e65265e7027d3093298a6c59d2137cbf3fa6a5b25b895d77beb1L73>: Test jdk/jdk/jfr/jvm/TestChunkIntegrity.java failed (original fix)
  *   JDK-8337994<https://github.com/openjdk/jdk/commit/6a9a867d645b8fe86f4ca2b04a43bf5aa8f9d487>: [REDO] Native memory leak when not recording any events
  *   JDK-8346108<https://bugs.openjdk.org/browse/JDK-8346108>: Revert of REDO in JDK21u due to test failures


Best Regards,
Ozan



More information about the hotspot-jfr-dev mailing list