RFR (L) 7133093: Improve system dictionary performance

Mon Jul 17 20:10:12 UTC 2017

Feedback on the initial web rev:
Thank you for using web rev - that way I can do searches of the overall patch!

First - many thanks for doing this exercise so carefully, for following the specification in continuing
to record the initiating loader and for splitting this effort into multiple phases to allow further exploration
of the current expectations/behavior relative to class unloading.

And thank you for all the testing.

Just wanted to ensure that I understand some of the assumptions:

1) lock free reads of dictionaries - yes
2) lock free adds of entries to dictionaries - yes
3) removal of entries or entire dictionaries - at safe point only
4) rehashing - none today and none planned
5) resizing of dictionaries - none today, some planned
6) use of SystemDictionary_lock: essentially unchanged
7) atomicity guarantees relative to checking loader constraints and updates to the dictionary: unchanged

I did not review the SA/vmStructs - sounds like you had Jini George do that - great.

Minor questions/comments:

SystemDictionary::verify: "Verify of system dictionary failed" ->
looks like "Verify of placeholders failed"

thank you for asserts dictionary != NULL
Dictionary::print
  is it worth an assert that loader_data != NULL?
Can you please sanity check if there are other cases in which we need
to use dictionary_or_null() rather than dictionary()?
 - are there cases in which we could be using a dictionary from an anonymous class?
  e.g. there are a lot of places in systemDictionary.cpp where we
  call class_loader_data(class_loader); loader_data->dictionary();
     - was the lazily allocated dictionary allocated yet (see next question)

CLD.cpp
  Would it be less complex to eagerly create the dictionary for a CLD
  since it is extremely rare to not need one for non-anonymous CLDs?
  You could create before publishing the CLD

Terminology:
  How are you still using System Dictionary term?
  It still shows up in some comments and in some printed strings

biasedLocking.cpp, line 59
jvmtiGetLoadedClasses: line 298
dictionary.hpp lines 113,114, 120
dictionary.cpp 262-263, 478
dictionary.cpp 491
  fix comments e.g. "Iterate the system dictionary" ...
  - probably worth checking for other comments in the sources too
  unless these all appear to match new usage model

hashTable.hpp
line 268: what does this comment mean?
  "Don't create and use freelist"?
  I assume this means that this is a direct malloc, not using any block
  allocation, e.g. metaspace chunks, and this requires explicit freeing?

dictionary.hpp line 161
  could you possibly clean up comment while you are in here?
  I presume "a protection" is "a protection domain"?

dictionary.cpp
  why change from Hashtable::free_entry to explicit
unlink_entry/FREE_C_HEAP_ARRAY?

Future optimization:
I think that none of the built-in loaders is ever recorded as an initiating
loader for any class in a loader which permits unloading.
Note: we need to ensure that built-in includes boot, platform and
application class loader (and clean up any confusion with system
class loader - which can be set by the user to a custom class loader)

Any chance of changing DictionaryEntry* probe->klass() to be
probe->instanceKlass()?

Thank you for assertion _is_at_safepoint for try_get_next_class

CLD.cpp
  If the names of the classes_do vs. all_entries_do were modified to be
  defining_loader_classes_do vs. all_entries_classes_do or some way
  to make the distinction in the name that would be helpful.
  Otherwise, if you could please add comments to the CLDG::dictionary_classes_do
  etc. to distinguish these.

CLD.cpp
  lines 1200-1201
  Can you please fix the comment?
  I think you need separate comments for the ModuleEntry reads/PackageEntry's
exports which are looking for dead modules.
  IIUC, the do_unloading is actually looking for classes for which this
  is the initating loader, but their defining loader is dead.

systemDictionary.cpp: 445-447
  THANK you for the comments about the java_mirror handle keeping the class
  and class loader alive
  Could you clarify that the class_loader handle is not for the class
  we are checking, but rather for the initiating, i.e. requesting class loader
  (is that accurate?) So we are keeping both the requestor and the
  requested class alive

You removed the comment that compute_hash could lead to a safepoint/GC
  Is that not possible?
  comment in ObjectSynchronizer::FastHashCode says that with UseBiasedLocking
    we can revoke biases and must be able to handle a safepoint.

systemDictionary.cpp:
   NoSafepointVerifier
     If we add resizing logic later, and that happens at a safepoint,
     then the number of buckets change and the d_index is no longer valid
     so we would need to add back the NSpV?
    - this is a more general problem if compute_hash can lead to a safepoint/GC

systemDictionary.cpp: parse_stream
   why did you remove the record_dependency?

Who calls Dictionary::~Dictionary?

Performance:
  Do you have performance runs of our existing benchmarks?
  Initial key requirement is that this not slow down throughput through
  slowing down lookups.
  I know you are very aware of longer-term goals for reducing safepoint pause
  times and not increasing footprint significantly.

hashtable sizes: future rfe?
  given that we now have a lot more dictionaries, and they perhaps need
  individual sizing
   - would it be helpful to add statistics information about
   - e.g. lookup_count, lookup_length, _lookup_warning
   - i.e. number of entries, max lookup depth, etc. to use later for tuning?
   - might make sense to make this JFR data?

   Might think about a follow-up with more tuning parameters than
   a single global PredictedLoadedClassCount.

thanks,
Karen

> On Jul 17, 2017, at 2:37 PM, coleen.phillimore at oracle.com wrote:
> 
> 
> Hi, I have made changes for Ioi's comments below.  The largest change since .01 is that PredictedLoadedClassCount is used to size dictionaries for class loader data, and should be used to specify the average number of classes loaded by each class loader.  Since this is an option available via UnlockExperimentalVMOptions, I'm not sure what the process is for changing the meaning of this option. We will work with our customers who have used this option for specifying the total number of classes loaded.   We are also working on the change to allow automatic resizing of the dictionaries during safepoints.
> 
> See:
> open webrev at http://cr.openjdk.java.net/~coleenp/7133093.02/webrev
> 
> Tested with the full nightly test set (jdk, nsk, hotspot jtreg tests) on linux x64.
> 
> Thanks,
> Coleen
> 
> On 6/30/17 4:45 PM, coleen.phillimore at oracle.com wrote:
>> 
>> Ioi,  Thank you for looking at this change.
>> 
>> On 6/30/17 2:39 PM, Ioi Lam wrote:
>>> Hi Coleen,
>>> 
>>> Maybe the bug should be renamed to "Use one Dictionary per class loader instance"? That way it's more obvious what it is when you look at the repo history.
>> 
>> I can do that.  I made it One Dictionary per ClassLoaderData
>>> 
>>> 1.
>>> 
>>> Why is assert_locked_or_safepoint(SystemDictionary_lock) necessary in SystemDictionary::find_class (line 1826), but not necessary SystemDictionary::find (line 951)? Since you removed NoSafepointVerifier nosafepoint in the latter, maybe this means it's safe to remove the assert_locked_or_safepoint in the former?
>> 
>> The call to SystemDictionary::find() is the (I believe) usual lock free lookup and the SystemDictionary::find_class() is used to verify that a class we're about to add or want to add hasn't been added by another thread.   Or certain cases where we already have a lock to do something else, like add loader constraints.  I took out the NoSafepointVerifier because it assumes that system dictionary entries would be moved by GC, which they won't.  The old hash function could safepoint when getting the hash for the class_loader object.
>> 
>>> 
>>> 2.
>>> 
>>> 455 static ClassLoaderData* _current_loader_data = NULL;
>>> 456 static Klass*           _current_class_entry = NULL;
>>> 457
>>> 458 InstanceKlass* ClassLoaderDataGraph::try_get_next_class() {
>>> 
>>> How about moving the static fields into an iterator object. That way you don't need to keep track of the globals
>>> 
>>> ClassLoaderDataGraphIterator {
>>>   ClassLoaderData* _current_loader_data
>>>   Klass*           _current_class_entry;
>>> 
>>>   InstanceKlass* try_get_next_class() { ....}
>>> };
>> 
>> Ok, there's a different iterator that iterates over all of the classes for GC.  I will adapt that for this use.   That would be better.
>> 
>>> 
>>> 3. Double check locking in ClassLoaderData::dictionary() -- someone else should look at this :-)
>> 
>> I copied code that David Holmes added for modules.
>> 
>>> 
>>> 4. We may need a better strategy for deciding the size of each dictionary.
>>> 
>>> 565 const int  _primelist[10] = {1, 107, 1009};
>>> 571 Dictionary* ClassLoaderData::dictionary() {
>>> 579     if ((dictionary = _dictionary) == NULL) {
>>> 580       int size;
>>> 581       if (this == the_null_class_loader_data() || is_system_class_loader_data()) {
>>> 582         size = _primelist[2];
>>> 583       } else if (class_loader()->is_a(SystemDictionary::reflect_DelegatingClassLoader_klass())) {
>>> 584         size = _primelist[0];
>>> 585       } else {
>>> 586         size = _primelist[1];
>>> 587       }
>>> 588       dictionary = new Dictionary(this, size);
>>> 
>>> I'll do some investigation on this issue and get back to you.
>>> 
>> 
>> How about if someone uses PredictedLoadedClassCount, then we use that to size all but the reflection and boot class loader?  Then if there's an application that has a class loader with a huge amount of classes loaded in it, that would help this?  It might cost some footprint but an oversized table would simply be a bigger array of pointers, so it might not be that bad to oversize.
>> 
>> I think the long term solution is being able to resize these entries and make the application provide arguments.   Please let me know what you find in your investigation and if that would work.
>>> The rest of the changes look good to me.
>> 
>> Thank you!
>> Coleen
>>> 
>>> Thanks
>>> - Ioi
>>> 
>>> 
>>> On 6/23/17 4:42 PM, coleen.phillimore at oracle.com wrote:
>>>> Summary: Implement one dictionary per ClassLoaderData for faster lookup and removal during class unloading
>>>> 
>>>> See RFE for more details.
>>>> 
>>>> open webrev at http://cr.openjdk.java.net/~coleenp/7133093.01/webrev
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-7133093
>>>> 
>>>> Tested with full "nightly" run in rbt, plus locally class loading and unloading tests:
>>>> 
>>>> jtreg hotspot/test/runtime/ClassUnload
>>>> 
>>>> jtreg hotspot/test/runtime/modules
>>>> 
>>>> jtreg hotspot/test/gc/class_unloading
>>>> 
>>>> make test-hotspot-closed-tonga FILTER=quick TEST_JOBS=4 TEST=vm.parallel_class_loading
>>>> 
>>>> csh ~/testing/run_jck9  (vm/lang/java_lang)
>>>> 
>>>> runThese -jck - uses class loader isolation to run each jck test and unloads tests when done (at -gc:5 intervals)
>>>> 
>>>> 
>>>> Thanks,
>>>> Coleen
>>>> 
>>>> 
>>> 
>> 
>