RFR: 8337517: Redacted Heap Dumps

Thu Aug 1 14:19:31 UTC 2024

On Thu, 1 Aug 2024 03:37:26 GMT, David Holmes <dholmes at openjdk.org> wrote:

> I must be missing something in the approach. The vast majority of confidential data will be in strings yet you focus on primitives that would rarely (if ever for boolean float/double) contain anything that could be recognised as such.

Notes from the field, looking through real-world heap dumps: while most of the time the confidential data is in primitive arrays (key material, cipher buffers, string contents), primitive fields carry identifiable data as well, e.g. numeric account/transaction IDs. Even doubles/floats often carry such data: think financial data or even (pants heavily) LLM weights.

A good approach is to strip everything that is not needed to follow up on heap occupancy problems, since diagnosing those is by far the dominant use case for heap dumps. I think the approach of "strip everything but the shape of the object graph and the shape of the objects" is a very reasonable thing to do. This is what zeroing out all primitive fields and primitive array contents achieves.
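
To make the "keep the shape, drop the values" idea concrete, here is a minimal Java sketch. It is not the code in the PR (the real heap dumper lives in HotSpot and writes HPROF records); `RedactedDumpSketch`, `DumpedObject` and `redact` are hypothetical names, and the object model is a deliberately simplified assumption: each object has a class name, some primitive field slots, an optional primitive array payload, and outgoing references.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class RedactedDumpSketch {

    /** Hypothetical, simplified model of one dumped object (not the HPROF layout). */
    static final class DumpedObject {
        final String className;
        final long[] primitiveFields;   // raw bits of each primitive field
        final byte[] arrayContents;     // payload of a primitive array, empty otherwise
        final List<DumpedObject> references = new ArrayList<>();

        DumpedObject(String className, long[] primitiveFields, byte[] arrayContents) {
            this.className = className;
            this.primitiveFields = primitiveFields;
            this.arrayContents = arrayContents;
        }
    }

    /**
     * Returns a redacted copy of the graph reachable from obj.
     * Class names, field counts, array lengths and reference edges are kept;
     * every primitive value and every array element becomes zero.
     */
    static DumpedObject redact(DumpedObject obj, Map<DumpedObject, DumpedObject> seen) {
        DumpedObject copy = seen.get(obj);
        if (copy != null) {
            return copy; // already copied: preserves sharing and cycles
        }
        // Freshly allocated arrays are zero-filled, so only the sizes carry over.
        copy = new DumpedObject(obj.className,
                                new long[obj.primitiveFields.length],
                                new byte[obj.arrayContents.length]);
        seen.put(obj, copy);
        for (DumpedObject ref : obj.references) {
            copy.references.add(redact(ref, seen));
        }
        return copy;
    }

    public static void main(String[] args) {
        // Made-up example data: an account object pointing at key material.
        DumpedObject key = new DumpedObject("byte[]", new long[0], new byte[] {0x13, 0x37, 0x42});
        DumpedObject account = new DumpedObject("Account", new long[] {123456789L}, new byte[0]);
        account.references.add(key);

        DumpedObject redacted = redact(account, new IdentityHashMap<>());
        System.out.println("class:        " + redacted.className);
        System.out.println("field value:  " + redacted.primitiveFields[0]);
        System.out.println("array length: " + redacted.references.get(0).arrayContents.length);
    }
}
```

The redacted copy still answers the heap occupancy questions (which classes, how many objects, how large, who references whom), while the account number and the key bytes are gone.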

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20409#issuecomment-2263195036