RFR (L) 7133093: Improve system dictionary performance

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Fri Jul 21 18:28:05 UTC 2017


 From internal discussions, the only sizing of the dictionary for the 
java.system.class.loader will follow PredictedLoadedClassCount until 
dictionary resizing is done.  The latest full webrev is here:

open webrev at http://cr.openjdk.java.net/~coleenp/7133093.04/webrev

The only change from .03 is

http://cr.openjdk.java.net/~coleenp/7133093.04/webrev/src/share/vm/classfile/classLoaderData.cpp.frames.html

lines 595-630.   Please review!

Thanks,
Coleen


On 7/19/17 10:52 PM, coleen.phillimore at oracle.com wrote:
>
> Hi, I've made the changes in the replies below and retested with both 
> our internal NSK tests, jtreg tests and jck tests.  Also I've rerun 
> the performance tests, including SPECjbb2015.
>
> An incremental webrev can be found at:
> http://cr.openjdk.java.net/~coleenp/7133093.03.incremental/webrev/
>
> Latest version at:
>
> http://cr.openjdk.java.net/~coleenp/7133093.03/webrev/
>
> Thanks,
> Coleen
>
>
> On 7/18/17 10:56 AM, coleen.phillimore at oracle.com wrote:
>>
>> Karen,  Thank you for reviewing the code and for the discussions and 
>> consulting along the way.
>>
>> On 7/17/17 4:10 PM, Karen Kinnear wrote:
>>> Feedback on the initial web rev:
>>> Thank you for using web rev - that way I can do searches of the 
>>> overall patch!
>>>
>>> First - many thanks for doing this exercise so carefully, for 
>>> following the specification in continuing
>>> to record the initiating loader and for splitting this effort into 
>>> multiple phases to allow further exploration
>>> of the current expectations/behavior relative to class unloading.
>>>
>>> And thank you for all the testing.
>>>
>>> Just wanted to ensure that I understand some of the assumptions:
>>>
>>> 1) lock free reads of dictionaries - yes
>> Yes.
>>> 2) lock free adds of entries to dictionaries - yes
>> No, adds still take out the SystemDictionary_lock.
>>> 3) removal of entries or entire dictionaries - at safe point only
>> Yes.
>>> 4) rehashing - none today and none planned
>> Yes.
>>> 5) resizing of dictionaries - none today, some planned
>> Yes.
>>> 6) use of SystemDictionary_lock: essentially unchanged
>> Yes.
>>> 7) atomicity guarantees relative to checking loader constraints and 
>>> updates to the dictionary: unchanged
>> Yes.
>>>
>>> I did not review the SA/vmStructs - sounds like you had Jini George 
>>> do that - great.
>>>
>>> Minor questions/comments:
>>>
>>> SystemDictionary::verify: "Verify of system dictionary failed" ->
>>> looks like "Verify of placeholders failed"
>>
>> Fixed.  Good find.  It's an odd assert:
>>
>>   guarantee(placeholders()->number_of_entries() >= 0,
>>             "Verify of placeholders failed");
>>
>>>
>>> thank you for asserts dictionary != NULL
>>> Dictionary::print
>>>   is it worth an assert that loader_data != NULL?
>>
>> Added.  It's already asserted in Dictionary::verify()
>>
>>> Can you please sanity check if there are other cases in which we need
>>> to use dictionary_or_null() rather than dictionary()?
>>>  - are there cases in which we could be using a dictionary from an 
>>> anonymous class?
>>>   e.g. there are a lot of places in systemDictionary.cpp where we
>>>   call class_loader_data(class_loader); loader_data->dictionary();
>>>      - was the lazily allocated dictionary allocated yet (see next 
>>> question)
>>
>> There's an assert in dictionary() that the CLD isn't anonymous. In 
>> these calls we get class_loader_data from the class_loader which is 
>> always the host class loader for anonymous classes, so this doesn't 
>> break with the current design.
>>
>>>
>>> CLD.cpp
>>>   Would it be less complex to eagerly create the dictionary for a CLD
>>>   since it is extremely rare to not need one for non-anonymous CLDs?
>>>   You could create before publishing the CLD
>>
>> Yes, I agree it would be better this way.  I was looking at some data 
>> and it does seem rare to have a CLD with no classes.
>>>
>>> Terminology:
>>>   How are you still using System Dictionary term?
>>>   It still shows up in some comments and in some printed strings
>>
>> Bad habit, we've been calling it that for years. :)  I'll update the 
>> comments where it makes sense.
>> I tried to refer to the ClassLoaderDataGraph dictionaries 
>> collectively as the "System Dictionary", as well as the loader 
>> constraints and placeholder tables, and the other things that 
>> SystemDictionary contains.
>>
>> I guess it gets confusing, but it's less awkward than "class loader 
>> data dictionaries", and more meaningful than simply "dictionaries". 
>> So that's why I didn't change a lot of the comments.
>>
>>>
>>> biasedLocking.cpp, line 59
>>> jvmtiGetLoadedClasses: line 298
>>> dictionary.hpp lines 113,114, 120
>>> dictionary.cpp 262-263, 478
>>> dictionary.cpp 491
>>>   fix comments e.g. "Iterate the system dictionary" ...
>>>   - probably worth checking for other comments in the sources too
>>>   unless these all appear to match new usage model
>>
>> Fixed.  I've found a couple more and some other comments that are 
>> incorrect.
>>>
>>> hashTable.hpp
>>> line 268: what does this comment mean?
>>>   "Don't create and use freelist"?
>>>   I assume this means that this is a direct malloc, not using any block
>>>   allocation, e.g. metaspace chunks, and this requires explicit 
>>> freeing?
>>>
>>
>>   // Don't create and use freelist of HashtableEntry.
>>
>> The default Hashtable<> allocates entries from a block of entries and 
>> uses a freelist for freed entries.
>>
>>>
>>> dictionary.hpp line 161
>>>   could you possibly clean up comment while you are in here?
>>>   I presume "a protection" is "a protection domain"?
>>
>> Fixed.
>>
>>>
>>> dictionary.cpp
>>>   why change from Hashtable::free_entry to explicit
>>> unlink_entry/FREE_C_HEAP_ARRAY?
>>
>> I used the same code as the ModuleTable.  The default hashtable 
>> doesn't delete the whole thing and left some memory around, if I 
>> remember correctly.  I'll recheck it.
>>>
>>> Future optimization:
>>> I think that none of the built-in loaders is ever recorded as an 
>>> initiating
>>> loader for any class in a loader which permits unloading.
>>> Note: we need to ensure that built-in includes boot, platform and
>>> application class loader (and clean up any confusion with system
>>> class loader - which can be set by the user to a custom class loader)
>>>
>> I think AppClassLoader can initiate loading and is a builtin loader.
>>
>> Dictionary for class loader class loader 0x00007f6abc6e3210 a 
>> 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000101c8df00}
>> Java system dictionary (table_size=107, classes=1)
>> ^ indicates that initiating loader is different from defining loader
>>  106: ^java.util.ArrayList, loader NULL class_loader
>>
>>>
>>> Any chance of changing DictionaryEntry* probe->klass() to be
>>> probe->instanceKlass()?
>>
>> Okay.  Changed to instance_klass() as per coding standard.
>>>
>>> Thank you for assertion _is_at_safepoint for try_get_next_class
>>>
>>> CLD.cpp
>>>   If the names of the classes_do vs. all_entries_do were modified to be
>>>   defining_loader_classes_do vs. all_entries_classes_do or some way
>>>   to make the distinction in the name that would be helpful.
>>>   Otherwise, if you could please add comments to the 
>>> CLDG::dictionary_classes_do
>>>   etc. to distinguish these.
>>
>> I added comments to distinguish these.
>>>
>>> CLD.cpp
>>>   lines 1200-1201
>>>   Can you please fix the comment?
>>>   I think you need separate comments for the ModuleEntry 
>>> reads/PackageEntry's
>>> exports which are looking for dead modules.
>>>   IIUC, the do_unloading is actually looking for classes for which this
>>>   is the initating loader, but their defining loader is dead.
>>       // Remove entries in the dictionary of live class loader that have
>>       // initiated loading classes in a dead class loader.
>>       data->dictionary()->do_unloading();
>>
>> Moved the module comment to before the module code and added this for 
>> dictionary->do_unloading() call.
>>>
>>> systemDictionary.cpp: 445-447
>>>   THANK you for the comments about the java_mirror handle keeping 
>>> the class
>>>   and class loader alive
>>>   Could you clarify that the class_loader handle is not for the class
>>>   we are checking, but rather for the initiating, i.e. requesting 
>>> class loader
>>>   (is that accurate?) So we are keeping both the requestor and the
>>>   requested class alive
>>
>> Yes, that is true.   I added
>>
>>   // The class_loader handle passed in is the initiating loader.
>>
>>>
>>> You removed the comment that compute_hash could lead to a safepoint/GC
>>>   Is that not possible?
>>>   comment in ObjectSynchronizer::FastHashCode says that with 
>>> UseBiasedLocking
>>>     we can revoke biases and must be able to handle a safepoint.
>>
>> I made it not possible with this change.   The hashcode is now 
>> computed by the Symbol* of the Klass name, which is random 
>> (os::random plus some bits from the address and first two characters, 
>> which we did a study of for bug 
>> https://bugs.openjdk.java.net/browse/JDK-8181450).
>>
>> Also, I changed ClassLoaderData::identity_hash() to use the address 
>> >> 3 because the addresses of class loader datas are random enough 
>> for the one use of this function. Previously, the loader_data 
>> identity hash used to go to the class_loader oop and get to 
>> ObjectSynchronizer::FastHashCode (which was also os::random but now 
>> changed).
>>
>> The dictionary hashing doesn't need to add in the hash for the 
>> loader_data now.
>>>
>>> systemDictionary.cpp:
>>>    NoSafepointVerifier
>>>      If we add resizing logic later, and that happens at a safepoint,
>>>      then the number of buckets change and the d_index is no longer 
>>> valid
>>>      so we would need to add back the NSpV?
>>>     - this is a more general problem if compute_hash can lead to a 
>>> safepoint/GC
>>
>> Yes this is true, except that compute_hash doesn't safepoint. 
>> Computing the hash and index should be done together with the 
>> dictionary operations.  The function resolve_instance_class_or_null 
>> computes the hash and index once at the beginning and reuses it for 
>> several lookups, which may or may not affect performance.  So we need 
>> to measure this in isolation before changing this.
>>>
>>> systemDictionary.cpp: parse_stream
>>>    why did you remove the record_dependency?
>>
>> I realized it's trivially unnecessary.   The class_loader is reached 
>> by the ClassLoaderData::_class_loader field, so we don't have to 
>> record a dependency from host_klass->class_loader in the anonymous 
>> class loader data.  (I just checked this again because it's confusing).
>>
>>>
>>> Who calls Dictionary::~Dictionary?
>>
>> The ClassLoaderData::~ClassLoaderData() calls delete _dictionary, 
>> which calls the destructor.
>>>
>>> Performance:
>>>   Do you have performance runs of our existing benchmarks?
>>
>> Yes, I ran our internal benchmarks and added links to the internal 
>> wiki page.   I'll rerun them again before checkin.
>>
>>>   Initial key requirement is that this not slow down throughput through
>>>   slowing down lookups.
>>>   I know you are very aware of longer-term goals for reducing 
>>> safepoint pause
>>>   times and not increasing footprint significantly.
>>>
>>> hashtable sizes: future rfe?
>>>   given that we now have a lot more dictionaries, and they perhaps need
>>>   individual sizing
>>>    - would it be helpful to add statistics information about
>>>    - e.g. lookup_count, lookup_length, _lookup_warning
>>>    - i.e. number of entries, max lookup depth, etc. to use later for 
>>> tuning?
>>
>> All the previous statistics didn't make very much sense to us, so 
>> they were removed.  There is now logging for all the hashtables:
>>
>> busaa027.us.oracle.com% java -Xlog:hashtables -version
>> java version "10-internal"
>> Java(TM) SE Runtime Environment (fastdebug build 
>> 10-internal+0-2017-07-17-1456035.cphillim.10system-dictionary2)
>> Java HotSpot(TM) 64-Bit Server VM (fastdebug build 
>> 10-internal+0-2017-07-17-1456035.cphillim.10system-dictionary2, mixed 
>> mode)
>> [0.284s][info][hashtables] System Dictionary for 
>> jdk/internal/loader/ClassLoaders$AppClassLoader max bucket size 1 
>> bucket 91 element count 1 table size 107
>> [0.285s][info][hashtables] System Dictionary for <bootloader> max 
>> bucket size 4 bucket 263 element count 468 table size 1009
>> [0.285s][info][hashtables] Placeholder Table max bucket size 0 bucket 
>> 0 element count 0 table size 1009
>> [0.285s][info][hashtables] Protection Domain Table max bucket size 0 
>> bucket 0 element count 0 table size 1009
>>
>>>    - might make sense to make this JFR data?
>>
>> I'll file an RFE to add tracing data with our tools.  I'll need some 
>> help with this part but the data is there.
>>>
>>>    Might think about a follow-up with more tuning parameters than
>>>    a single global PredictedLoadedClassCount.
>>>
>>
>> I filed RFE to work on resizing, as this work was almost complete, 
>> but didn't show performance improvements at the time. It would be 
>> nice to have customers not need tuning parameters, but I guess these 
>> things are sometimes necessary.
>>
>> Thank you for the thorough review and all of your help with this! 
>> I've made these changes and will rerun testing and send out a new 
>> webrev and an incremental webrev.
>>
>> Thanks!
>> Coleen
>>
>>>
>>> thanks,
>>> Karen
>>>
>>>
>>>> On Jul 17, 2017, at 2:37 PM, coleen.phillimore at oracle.com 
>>>> <mailto:coleen.phillimore at oracle.com> wrote:
>>>>
>>>>
>>>> Hi, I have made changes for Ioi's comments below.  The largest 
>>>> change since .01 is that PredictedLoadedClassCount is used to size 
>>>> dictionaries for class loader data, and should be used to specify 
>>>> the average number of classes loaded by each class loader.  Since 
>>>> this is an option available via UnlockExperimentalVMOptions, I'm 
>>>> not sure what the process is for changing the meaning of this 
>>>> option. We will work with our customers who have used this option 
>>>> for specifying the total number of classes loaded.   We are also 
>>>> working on the change to allow automatic resizing of the 
>>>> dictionaries during safepoints.
>>>>
>>>> See:
>>>> open webrev at 
>>>> http://cr.openjdk.java.net/~coleenp/7133093.02/webrev 
>>>> <http://cr.openjdk.java.net/%7Ecoleenp/7133093.02/webrev>
>>>>
>>>> Tested with the full nightly test set (jdk, nsk, hotspot jtreg 
>>>> tests) on linux x64.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>> On 6/30/17 4:45 PM, coleen.phillimore at oracle.com 
>>>> <mailto:coleen.phillimore at oracle.com> wrote:
>>>>>
>>>>> Ioi,  Thank you for looking at this change.
>>>>>
>>>>> On 6/30/17 2:39 PM, Ioi Lam wrote:
>>>>>> Hi Coleen,
>>>>>>
>>>>>> Maybe the bug should be renamed to "Use one Dictionary per class 
>>>>>> loader instance"? That way it's more obvious what it is when you 
>>>>>> look at the repo history.
>>>>>
>>>>> I can do that.  I made it One Dictionary per ClassLoaderData
>>>>>>
>>>>>> 1.
>>>>>>
>>>>>> Why is assert_locked_or_safepoint(SystemDictionary_lock) 
>>>>>> necessary in SystemDictionary::find_class (line 1826), but not 
>>>>>> necessary SystemDictionary::find (line 951)? Since you removed 
>>>>>> NoSafepointVerifier nosafepoint in the latter, maybe this means 
>>>>>> it's safe to remove the assert_locked_or_safepoint in the former?
>>>>>
>>>>> The call to SystemDictionary::find() is the (I believe) usual lock 
>>>>> free lookup and the SystemDictionary::find_class() is used to 
>>>>> verify that a class we're about to add or want to add hasn't been 
>>>>> added by another thread.   Or certain cases where we already have 
>>>>> a lock to do something else, like add loader constraints.  I took 
>>>>> out the NoSafepointVerifier because it assumes that system 
>>>>> dictionary entries would be moved by GC, which they won't.  The 
>>>>> old hash function could safepoint when getting the hash for the 
>>>>> class_loader object.
>>>>>
>>>>>>
>>>>>> 2.
>>>>>>
>>>>>> 455 static ClassLoaderData* _current_loader_data = NULL;
>>>>>> 456 static Klass*           _current_class_entry = NULL;
>>>>>> 457
>>>>>> 458 InstanceKlass* ClassLoaderDataGraph::try_get_next_class() {
>>>>>>
>>>>>> How about moving the static fields into an iterator object. That 
>>>>>> way you don't need to keep track of the globals
>>>>>>
>>>>>> ClassLoaderDataGraphIterator {
>>>>>>   ClassLoaderData* _current_loader_data
>>>>>>   Klass*           _current_class_entry;
>>>>>>
>>>>>>   InstanceKlass* try_get_next_class() { ....}
>>>>>> };
>>>>>
>>>>> Ok, there's a different iterator that iterates over all of the 
>>>>> classes for GC.  I will adapt that for this use. That would be 
>>>>> better.
>>>>>
>>>>>>
>>>>>> 3. Double check locking in ClassLoaderData::dictionary() -- 
>>>>>> someone else should look at this :-)
>>>>>
>>>>> I copied code that David Holmes added for modules.
>>>>>
>>>>>>
>>>>>> 4. We may need a better strategy for deciding the size of each 
>>>>>> dictionary.
>>>>>>
>>>>>> 565 const int  _primelist[10] = {1, 107, 1009};
>>>>>> 571 Dictionary* ClassLoaderData::dictionary() {
>>>>>> 579     if ((dictionary = _dictionary) == NULL) {
>>>>>> 580       int size;
>>>>>> 581       if (this == the_null_class_loader_data() || 
>>>>>> is_system_class_loader_data()) {
>>>>>> 582         size = _primelist[2];
>>>>>> 583       } else if 
>>>>>> (class_loader()->is_a(SystemDictionary::reflect_DelegatingClassLoader_klass())) 
>>>>>> {
>>>>>> 584         size = _primelist[0];
>>>>>> 585       } else {
>>>>>> 586         size = _primelist[1];
>>>>>> 587       }
>>>>>> 588       dictionary = new Dictionary(this, size);
>>>>>>
>>>>>> I'll do some investigation on this issue and get back to you.
>>>>>>
>>>>>
>>>>> How about if someone uses PredictedLoadedClassCount, then we use 
>>>>> that to size all but the reflection and boot class loader?  Then 
>>>>> if there's an application that has a class loader with a huge 
>>>>> amount of classes loaded in it, that would help this?  It might 
>>>>> cost some footprint but an oversized table would simply be a 
>>>>> bigger array of pointers, so it might not be that bad to oversize.
>>>>>
>>>>> I think the long term solution is being able to resize these 
>>>>> entries and make the application provide arguments.   Please let 
>>>>> me know what you find in your investigation and if that would work.
>>>>>> The rest of the changes look good to me.
>>>>>
>>>>> Thank you!
>>>>> Coleen
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>> On 6/23/17 4:42 PM, coleen.phillimore at oracle.com 
>>>>>> <mailto:coleen.phillimore at oracle.com> wrote:
>>>>>>> Summary: Implement one dictionary per ClassLoaderData for faster 
>>>>>>> lookup and removal during class unloading
>>>>>>>
>>>>>>> See RFE for more details.
>>>>>>>
>>>>>>> open webrev at 
>>>>>>> http://cr.openjdk.java.net/~coleenp/7133093.01/webrev 
>>>>>>> <http://cr.openjdk.java.net/%7Ecoleenp/7133093.01/webrev>
>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-7133093
>>>>>>>
>>>>>>> Tested with full "nightly" run in rbt, plus locally class 
>>>>>>> loading and unloading tests:
>>>>>>>
>>>>>>> jtreg hotspot/test/runtime/ClassUnload
>>>>>>>
>>>>>>> jtreg hotspot/test/runtime/modules
>>>>>>>
>>>>>>> jtreg hotspot/test/gc/class_unloading
>>>>>>>
>>>>>>> make test-hotspot-closed-tonga FILTER=quick TEST_JOBS=4 
>>>>>>> TEST=vm.parallel_class_loading
>>>>>>>
>>>>>>> csh ~/testing/run_jck9  (vm/lang/java_lang)
>>>>>>>
>>>>>>> runThese -jck - uses class loader isolation to run each jck test 
>>>>>>> and unloads tests when done (at -gc:5 intervals)
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the hotspot-dev mailing list