RFR (L) 7133093: Improve system dictionary performance
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Fri Jul 21 18:28:05 UTC 2017
From internal discussions, the only sizing of the dictionary for the
java.system.class.loader will follow PredictedLoadedClassCount until
dictionary resizing is done. The latest full webrev is here:
open webrev at http://cr.openjdk.java.net/~coleenp/7133093.04/webrev
The only change from .03 is
http://cr.openjdk.java.net/~coleenp/7133093.04/webrev/src/share/vm/classfile/classLoaderData.cpp.frames.html
lines 595-630. Please review!
Thanks,
Coleen
On 7/19/17 10:52 PM, coleen.phillimore at oracle.com wrote:
>
> Hi, I've made the changes in the replies below and retested with both
> our internal NSK tests, jtreg tests and jck tests. Also I've rerun
> the performance tests, including SPECjbb2015.
>
> An incremental webrev can be found at:
> http://cr.openjdk.java.net/~coleenp/7133093.03.incremental/webrev/
>
> Latest version at:
>
> http://cr.openjdk.java.net/~coleenp/7133093.03/webrev/
>
> Thanks,
> Coleen
>
>
> On 7/18/17 10:56 AM, coleen.phillimore at oracle.com wrote:
>>
>> Karen, Thank you for reviewing the code and for the discussions and
>> consulting along the way.
>>
>> On 7/17/17 4:10 PM, Karen Kinnear wrote:
>>> Feedback on the initial web rev:
>>> Thank you for using web rev - that way I can do searches of the
>>> overall patch!
>>>
>>> First - many thanks for doing this exercise so carefully, for
>>> following the specification in continuing
>>> to record the initiating loader and for splitting this effort into
>>> multiple phases to allow further exploration
>>> of the current expectations/behavior relative to class unloading.
>>>
>>> And thank you for all the testing.
>>>
>>> Just wanted to ensure that I understand some of the assumptions:
>>>
>>> 1) lock free reads of dictionaries - yes
>> Yes.
>>> 2) lock free adds of entries to dictionaries - yes
>> No, adds still take out the SystemDictionary_lock.
>>> 3) removal of entries or entire dictionaries - at safe point only
>> Yes.
>>> 4) rehashing - none today and none planned
>> Yes.
>>> 5) resizing of dictionaries - none today, some planned
>> Yes.
>>> 6) use of SystemDictionary_lock: essentially unchanged
>> Yes.
>>> 7) atomicity guarantees relative to checking loader constraints and
>>> updates to the dictionary: unchanged
>> Yes.
>>>
>>> I did not review the SA/vmStructs - sounds like you had Jini George
>>> do that - great.
>>>
>>> Minor questions/comments:
>>>
>>> SystemDictionary::verify: "Verify of system dictionary failed" ->
>>> looks like "Verify of placeholders failed"
>>
>> Fixed. Good find. It's an odd assert:
>>
>> guarantee(placeholders()->number_of_entries() >= 0,
>> "Verify of placeholders failed");
>>
>>>
>>> thank you for asserts dictionary != NULL
>>> Dictionary::print
>>> is it worth an assert that loader_data != NULL?
>>
>> Added. It's already asserted in Dictionary::verify()
>>
>>> Can you please sanity check if there are other cases in which we need
>>> to use dictionary_or_null() rather than dictionary()?
>>> - are there cases in which we could be using a dictionary from an
>>> anonymous class?
>>> e.g. there are a lot of places in systemDictionary.cpp where we
>>> call class_loader_data(class_loader); loader_data->dictionary();
>>> - was the lazily allocated dictionary allocated yet (see next
>>> question)
>>
>> There's an assert in dictionary() that the CLD isn't anonymous. In
>> these calls we get class_loader_data from the class_loader which is
>> always the host class loader for anonymous classes, so this doesn't
>> break with the current design.
>>
>>>
>>> CLD.cpp
>>> Would it be less complex to eagerly create the dictionary for a CLD
>>> since it is extremely rare to not need one for non-anonymous CLDs?
>>> You could create before publishing the CLD
>>
>> Yes, I agree it would be better this way. I was looking at some data
>> and it does seem rare to have a CLD with no classes.
>>>
>>> Terminology:
>>> How are you still using System Dictionary term?
>>> It still shows up in some comments and in some printed strings
>>
>> Bad habit, we've been calling it that for years. :) I'll update the
>> comments where it makes sense.
>> I tried to refer to the ClassLoaderDataGraph dictionaries
>> collectively as the "System Dictionary", as well as the loader
>> constraints and placeholder tables, and the other things that
>> SystemDictionary contains.
>>
>> I guess it gets confusing, but it's less awkward than "class loader
>> data dictionaries", and more meaningful than simply "dictionaries".
>> So that's why I didn't change a lot of the comments.
>>
>>>
>>> biasedLocking.cpp, line 59
>>> jvmtiGetLoadedClasses: line 298
>>> dictionary.hpp lines 113,114, 120
>>> dictionary.cpp 262-263, 478
>>> dictionary.cpp 491
>>> fix comments e.g. "Iterate the system dictionary" ...
>>> - probably worth checking for other comments in the sources too
>>> unless these all appear to match new usage model
>>
>> Fixed. I've found a couple more and some other comments that are
>> incorrect.
>>>
>>> hashTable.hpp
>>> line 268: what does this comment mean?
>>> "Don't create and use freelist"?
>>> I assume this means that this is a direct malloc, not using any block
>>> allocation, e.g. metaspace chunks, and this requires explicit
>>> freeing?
>>>
>>
>> // Don't create and use freelist of HashtableEntry.
>>
>> The default Hashtable<> allocates entries from a block of entries and
>> uses a freelist for freed entries.
>>
>>>
>>> dictionary.hpp line 161
>>> could you possibly clean up comment while you are in here?
>>> I presume "a protection" is "a protection domain"?
>>
>> Fixed.
>>
>>>
>>> dictionary.cpp
>>> why change from Hashtable::free_entry to explicit
>>> unlink_entry/FREE_C_HEAP_ARRAY?
>>
>> I used the same code as the ModuleTable. The default hashtable
>> doesn't delete the whole thing and left some memory around, if I
>> remember correctly. I'll recheck it.
>>>
>>> Future optimization:
>>> I think that none of the built-in loaders is ever recorded as an
>>> initiating
>>> loader for any class in a loader which permits unloading.
>>> Note: we need to ensure that built-in includes boot, platform and
>>> application class loader (and clean up any confusion with system
>>> class loader - which can be set by the user to a custom class loader)
>>>
>> I think AppClassLoader can initiate loading and is a builtin loader.
>>
>> Dictionary for class loader class loader 0x00007f6abc6e3210 a
>> 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000101c8df00}
>> Java system dictionary (table_size=107, classes=1)
>> ^ indicates that initiating loader is different from defining loader
>> 106: ^java.util.ArrayList, loader NULL class_loader
>>
>>>
>>> Any chance of changing DictionaryEntry* probe->klass() to be
>>> probe->instanceKlass()?
>>
>> Okay. Changed to instance_klass() as per coding standard.
>>>
>>> Thank you for assertion _is_at_safepoint for try_get_next_class
>>>
>>> CLD.cpp
>>> If the names of the classes_do vs. all_entries_do were modified to be
>>> defining_loader_classes_do vs. all_entries_classes_do or some way
>>> to make the distinction in the name that would be helpful.
>>> Otherwise, if you could please add comments to the
>>> CLDG::dictionary_classes_do
>>> etc. to distinguish these.
>>
>> I added comments to distinguish these.
>>>
>>> CLD.cpp
>>> lines 1200-1201
>>> Can you please fix the comment?
>>> I think you need separate comments for the ModuleEntry
>>> reads/PackageEntry's
>>> exports which are looking for dead modules.
>>> IIUC, the do_unloading is actually looking for classes for which this
>>> is the initating loader, but their defining loader is dead.
>> // Remove entries in the dictionary of live class loader that have
>> // initiated loading classes in a dead class loader.
>> data->dictionary()->do_unloading();
>>
>> Moved the module comment to before the module code and added this for
>> dictionary->do_unloading() call.
>>>
>>> systemDictionary.cpp: 445-447
>>> THANK you for the comments about the java_mirror handle keeping
>>> the class
>>> and class loader alive
>>> Could you clarify that the class_loader handle is not for the class
>>> we are checking, but rather for the initiating, i.e. requesting
>>> class loader
>>> (is that accurate?) So we are keeping both the requestor and the
>>> requested class alive
>>
>> Yes, that is true. I added
>>
>> // The class_loader handle passed in is the initiating loader.
>>
>>>
>>> You removed the comment that compute_hash could lead to a safepoint/GC
>>> Is that not possible?
>>> comment in ObjectSynchronizer::FastHashCode says that with
>>> UseBiasedLocking
>>> we can revoke biases and must be able to handle a safepoint.
>>
>> I made it not possible with this change. The hashcode is now
>> computed by the Symbol* of the Klass name, which is random
>> (os::random plus some bits from the address and first two characters,
>> which we did a study of for bug
>> https://bugs.openjdk.java.net/browse/JDK-8181450).
>>
>> Also, I changed ClassLoaderData::identity_hash() to use the address
>> >> 3 because the addresses of class loader datas are random enough
>> for the one use of this function. Previously, the loader_data
>> identity hash used to go to the class_loader oop and get to
>> ObjectSynchronizer::FastHashCode (which was also os::random but now
>> changed).
>>
>> The dictionary hashing doesn't need to add in the hash for the
>> loader_data now.
>>>
>>> systemDictionary.cpp:
>>> NoSafepointVerifier
>>> If we add resizing logic later, and that happens at a safepoint,
>>> then the number of buckets change and the d_index is no longer
>>> valid
>>> so we would need to add back the NSpV?
>>> - this is a more general problem if compute_hash can lead to a
>>> safepoint/GC
>>
>> Yes this is true, except that compute_hash doesn't safepoint.
>> Computing the hash and index should be done together with the
>> dictionary operations. The function resolve_instance_class_or_null
>> computes the hash and index once at the beginning and reuses it for
>> several lookups, which may or may not affect performance. So we need
>> to measure this in isolation before changing this.
>>>
>>> systemDictionary.cpp: parse_stream
>>> why did you remove the record_dependency?
>>
>> I realized it's trivially unnecessary. The class_loader is reached
>> by the ClassLoaderData::_class_loader field, so we don't have to
>> record a dependency from host_klass->class_loader in the anonymous
>> class loader data. (I just checked this again because it's confusing).
>>
>>>
>>> Who calls Dictionary::~Dictionary?
>>
>> The ClassLoaderData::~ClassLoaderData() calls delete _dictionary,
>> which calls the destructor.
>>>
>>> Performance:
>>> Do you have performance runs of our existing benchmarks?
>>
>> Yes, I ran our internal benchmarks and added links to the internal
>> wiki page. I'll rerun them again before checkin.
>>
>>> Initial key requirement is that this not slow down throughput through
>>> slowing down lookups.
>>> I know you are very aware of longer-term goals for reducing
>>> safepoint pause
>>> times and not increasing footprint significantly.
>>>
>>> hashtable sizes: future rfe?
>>> given that we now have a lot more dictionaries, and they perhaps need
>>> individual sizing
>>> - would it be helpful to add statistics information about
>>> - e.g. lookup_count, lookup_length, _lookup_warning
>>> - i.e. number of entries, max lookup depth, etc. to use later for
>>> tuning?
>>
>> All the previous statistics didn't make very much sense to us, so
>> they were removed. There is now logging for all the hashtables:
>>
>> busaa027.us.oracle.com% java -Xlog:hashtables -version
>> java version "10-internal"
>> Java(TM) SE Runtime Environment (fastdebug build
>> 10-internal+0-2017-07-17-1456035.cphillim.10system-dictionary2)
>> Java HotSpot(TM) 64-Bit Server VM (fastdebug build
>> 10-internal+0-2017-07-17-1456035.cphillim.10system-dictionary2, mixed
>> mode)
>> [0.284s][info][hashtables] System Dictionary for
>> jdk/internal/loader/ClassLoaders$AppClassLoader max bucket size 1
>> bucket 91 element count 1 table size 107
>> [0.285s][info][hashtables] System Dictionary for <bootloader> max
>> bucket size 4 bucket 263 element count 468 table size 1009
>> [0.285s][info][hashtables] Placeholder Table max bucket size 0 bucket
>> 0 element count 0 table size 1009
>> [0.285s][info][hashtables] Protection Domain Table max bucket size 0
>> bucket 0 element count 0 table size 1009
>>
>>> - might make sense to make this JFR data?
>>
>> I'll file an RFE to add tracing data with our tools. I'll need some
>> help with this part but the data is there.
>>>
>>> Might think about a follow-up with more tuning parameters than
>>> a single global PredictedLoadedClassCount.
>>>
>>
>> I filed RFE to work on resizing, as this work was almost complete,
>> but didn't show performance improvements at the time. It would be
>> nice to have customers not need tuning parameters, but I guess these
>> things are sometimes necessary.
>>
>> Thank you for the thorough review and all of your help with this!
>> I've made these changes and will rerun testing and send out a new
>> webrev and an incremental webrev.
>>
>> Thanks!
>> Coleen
>>
>>>
>>> thanks,
>>> Karen
>>>
>>>
>>>> On Jul 17, 2017, at 2:37 PM, coleen.phillimore at oracle.com
>>>> <mailto:coleen.phillimore at oracle.com> wrote:
>>>>
>>>>
>>>> Hi, I have made changes for Ioi's comments below. The largest
>>>> change since .01 is that PredictedLoadedClassCount is used to size
>>>> dictionaries for class loader data, and should be used to specify
>>>> the average number of classes loaded by each class loader. Since
>>>> this is an option available via UnlockExperimentalVMOptions, I'm
>>>> not sure what the process is for changing the meaning of this
>>>> option. We will work with our customers who have used this option
>>>> for specifying the total number of classes loaded. We are also
>>>> working on the change to allow automatic resizing of the
>>>> dictionaries during safepoints.
>>>>
>>>> See:
>>>> open webrev at
>>>> http://cr.openjdk.java.net/~coleenp/7133093.02/webrev
>>>> <http://cr.openjdk.java.net/%7Ecoleenp/7133093.02/webrev>
>>>>
>>>> Tested with the full nightly test set (jdk, nsk, hotspot jtreg
>>>> tests) on linux x64.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>> On 6/30/17 4:45 PM, coleen.phillimore at oracle.com
>>>> <mailto:coleen.phillimore at oracle.com> wrote:
>>>>>
>>>>> Ioi, Thank you for looking at this change.
>>>>>
>>>>> On 6/30/17 2:39 PM, Ioi Lam wrote:
>>>>>> Hi Coleen,
>>>>>>
>>>>>> Maybe the bug should be renamed to "Use one Dictionary per class
>>>>>> loader instance"? That way it's more obvious what it is when you
>>>>>> look at the repo history.
>>>>>
>>>>> I can do that. I made it One Dictionary per ClassLoaderData
>>>>>>
>>>>>> 1.
>>>>>>
>>>>>> Why is assert_locked_or_safepoint(SystemDictionary_lock)
>>>>>> necessary in SystemDictionary::find_class (line 1826), but not
>>>>>> necessary SystemDictionary::find (line 951)? Since you removed
>>>>>> NoSafepointVerifier nosafepoint in the latter, maybe this means
>>>>>> it's safe to remove the assert_locked_or_safepoint in the former?
>>>>>
>>>>> The call to SystemDictionary::find() is the (I believe) usual lock
>>>>> free lookup and the SystemDictionary::find_class() is used to
>>>>> verify that a class we're about to add or want to add hasn't been
>>>>> added by another thread. Or certain cases where we already have
>>>>> a lock to do something else, like add loader constraints. I took
>>>>> out the NoSafepointVerifier because it assumes that system
>>>>> dictionary entries would be moved by GC, which they won't. The
>>>>> old hash function could safepoint when getting the hash for the
>>>>> class_loader object.
>>>>>
>>>>>>
>>>>>> 2.
>>>>>>
>>>>>> 455 static ClassLoaderData* _current_loader_data = NULL;
>>>>>> 456 static Klass* _current_class_entry = NULL;
>>>>>> 457
>>>>>> 458 InstanceKlass* ClassLoaderDataGraph::try_get_next_class() {
>>>>>>
>>>>>> How about moving the static fields into an iterator object. That
>>>>>> way you don't need to keep track of the globals
>>>>>>
>>>>>> ClassLoaderDataGraphIterator {
>>>>>> ClassLoaderData* _current_loader_data
>>>>>> Klass* _current_class_entry;
>>>>>>
>>>>>> InstanceKlass* try_get_next_class() { ....}
>>>>>> };
>>>>>
>>>>> Ok, there's a different iterator that iterates over all of the
>>>>> classes for GC. I will adapt that for this use. That would be
>>>>> better.
>>>>>
>>>>>>
>>>>>> 3. Double check locking in ClassLoaderData::dictionary() --
>>>>>> someone else should look at this :-)
>>>>>
>>>>> I copied code that David Holmes added for modules.
>>>>>
>>>>>>
>>>>>> 4. We may need a better strategy for deciding the size of each
>>>>>> dictionary.
>>>>>>
>>>>>> 565 const int _primelist[10] = {1, 107, 1009};
>>>>>> 571 Dictionary* ClassLoaderData::dictionary() {
>>>>>> 579 if ((dictionary = _dictionary) == NULL) {
>>>>>> 580 int size;
>>>>>> 581 if (this == the_null_class_loader_data() ||
>>>>>> is_system_class_loader_data()) {
>>>>>> 582 size = _primelist[2];
>>>>>> 583 } else if
>>>>>> (class_loader()->is_a(SystemDictionary::reflect_DelegatingClassLoader_klass()))
>>>>>> {
>>>>>> 584 size = _primelist[0];
>>>>>> 585 } else {
>>>>>> 586 size = _primelist[1];
>>>>>> 587 }
>>>>>> 588 dictionary = new Dictionary(this, size);
>>>>>>
>>>>>> I'll do some investigation on this issue and get back to you.
>>>>>>
>>>>>
>>>>> How about if someone uses PredictedLoadedClassCount, then we use
>>>>> that to size all but the reflection and boot class loader? Then
>>>>> if there's an application that has a class loader with a huge
>>>>> amount of classes loaded in it, that would help this? It might
>>>>> cost some footprint but an oversized table would simply be a
>>>>> bigger array of pointers, so it might not be that bad to oversize.
>>>>>
>>>>> I think the long term solution is being able to resize these
>>>>> entries and make the application provide arguments. Please let
>>>>> me know what you find in your investigation and if that would work.
>>>>>> The rest of the changes look good to me.
>>>>>
>>>>> Thank you!
>>>>> Coleen
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>> On 6/23/17 4:42 PM, coleen.phillimore at oracle.com
>>>>>> <mailto:coleen.phillimore at oracle.com> wrote:
>>>>>>> Summary: Implement one dictionary per ClassLoaderData for faster
>>>>>>> lookup and removal during class unloading
>>>>>>>
>>>>>>> See RFE for more details.
>>>>>>>
>>>>>>> open webrev at
>>>>>>> http://cr.openjdk.java.net/~coleenp/7133093.01/webrev
>>>>>>> <http://cr.openjdk.java.net/%7Ecoleenp/7133093.01/webrev>
>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-7133093
>>>>>>>
>>>>>>> Tested with full "nightly" run in rbt, plus locally class
>>>>>>> loading and unloading tests:
>>>>>>>
>>>>>>> jtreg hotspot/test/runtime/ClassUnload
>>>>>>>
>>>>>>> jtreg hotspot/test/runtime/modules
>>>>>>>
>>>>>>> jtreg hotspot/test/gc/class_unloading
>>>>>>>
>>>>>>> make test-hotspot-closed-tonga FILTER=quick TEST_JOBS=4
>>>>>>> TEST=vm.parallel_class_loading
>>>>>>>
>>>>>>> csh ~/testing/run_jck9 (vm/lang/java_lang)
>>>>>>>
>>>>>>> runThese -jck - uses class loader isolation to run each jck test
>>>>>>> and unloads tests when done (at -gc:5 intervals)
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-dev
mailing list