ClassLoaderStats VM operation can be quadratic in some cases

Tue May 12 15:05:33 UTC 2020

Hi Thomas,

Thanks for the fix, tested it and it works perfectly.

On Tue, May 12, 2020 at 8:35 AM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>
> So, turns out the hash function was wrong. It produced constant values which caused the table to degenerate to a list.
>
> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-May/039451.html
>
> Feel free to review.
>
> VM op takes now 13ms (down from 1.5 seconds in the unpatched version)
>
> Cheers, Thomas
>
>
>
> On Mon, May 11, 2020 at 2:09 PM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>>
>> Hi Jean Philippe,
>>
>> These lookups are needed since we accumulate the cost for anonymous/hidden classes as well as official classes to the loader, and the latter are loaded under a separate CLD.
>>
>> Capping the hashtable content? How would this work. You would get incomplete results.
>>
>> One could make the hashtable somewhat broader, however the hashtable size is a compile time argument, and which size is the right size? With a loader count of 40K, even 1000 or 2000 buckets would be overbooked by factor 20 or so.
>>
>> Looking more closely at the code, I see at least one optimization possibility in ResourceHashMap: the typical "if (table.get(k)==NULL) table.put(k,v);" pattern calls lookup twice. This could be done with one lookup. I opened a follow up issue: https://bugs.openjdk.java.net/browse/JDK-8244733 .
>>
>> Also, it is even worse than o(n^2) since we also call ClassLoaderStatsClosure::addEmptyParents() to collect stats for empty parent class loaders for every loader, which for every loader crawls up the loader chain and looks up the loader. However, I am not sure at all why we do this. I am not a JFR developer, maybe someone from the Oracle JFR team can answer that? The resulting numbers for these empty loaders would be of course zero, since they have not loaded anything. Is it just to make them show up in the class loader list?
>>
>> Note I am not a JFR developer. I'll wait for the JFR devs to chime in.
>>
>> Cheers, Thomas
>>
>>
>> On Thu, May 7, 2020 at 12:03 PM Jean-Philippe BEMPEL <jean-philippe.bempel at datadoghq.com> wrote:
>>>
>>> Hello all,
>>>
>>> We have a customer complaining about long VM operations especially
>>> ClassLoaderStats using JFR. This operation can last several seconds!
>>>
>>> Looking into the recording of this application, we realize that there is
>>> 40K+ ClassLoaders, which is unusual, maybe a classloader leak, but this is
>>> not the point, as without JFR the application is running fine.
>>>
>>> I can reproduce the issue with this simple class:
>>>
>>> package com.bempel.sandbox;
>>>
>>> import java.net.URL;
>>> import java.net.URLClassLoader;
>>> import java.util.ArrayList;
>>> import java.util.List;
>>>
>>> public class JFRClassLoaderStatsIssue {
>>>
>>>     private static List<ClassLoader> classLoaderList = new ArrayList<>();
>>>
>>>     public static void main(String[] args) throws Exception {
>>>         URL location =
>>> JFRClassLoaderStatsIssue.class.getProtectionDomain().getCodeSource().getLocation();
>>>         for (int i = 0; i < 40_000; i++) {
>>>             URLClassLoader classLoader = new URLClassLoader(new
>>> URL[]{location});
>>>             Class.forName(JFRClassLoaderStatsIssue.class.getCanonicalName(),
>>> true, classLoader);
>>>             classLoaderList.add(classLoader);
>>>         }
>>>     }
>>> }
>>>
>>> Running with on JDK11 with the following command line:
>>>
>>> java -Xlog:safepoint
>>> -XX:StartFlightRecording=dumponexit=true,filename=issue.jfr
>>> -XX:-UseBiasedLocking -cp build/classes/java/main
>>> com.bempel.sandbox.JFRClassLoaderStatsIssue
>>>
>>> output:
>>> [...]
>>> [0.861s][info][safepoint] Entering safepoint region:
>>> ClassLoaderStatsOperation
>>> [6.593s][info][safepoint] Leaving safepoint region
>>> [6.593s][info][safepoint] Total time for which application threads were
>>> stopped: 5.7324145 seconds, Stopping threads took: 0.0000127 seconds
>>> [...]
>>>
>>> Looking into ClassLoaderStatsClosure::do_cld method [1] we can see that
>>> each classloader is inserted into a hashtable for the duration of the VM
>>> operation.
>>> but this hashtable is fixed in size to 256 buckets. At some point, lookups
>>> and insertions become linear (too many collisions). Those lookups and
>>> insertions are done for each Classloader => O(n^2)
>>>
>>> With a small number of classloaders it's fine, but with 40K it's not ok :)
>>> My thought about this issue would be to cap the number of classloaders
>>> inserted into the hashmap based on _total_loaders field counter (which BTW
>>> seems not used at all).
>>>
>>> Thanks for looking into it.
>>>
>>> Kind Regards,
>>> Jean-Philippe Bempel
>>> [1]
>>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/classfile/classLoaderStats.cpp#L48