RFR (S) 8213587 - Speed up CDS dump time by using resizable hashtables

Tue Nov 20 18:10:56 UTC 2018

Hi Gerard,

I've updated the patch according to your comments. Please take a look.

http://cr.openjdk.java.net/~iklam/jdk12/8213587-resize-cds-hashtables.v05-delta/

Thanks

- Ioi

On 11/20/18 7:53 AM, Gerard Ziemski wrote:
> hi Ioi,
>
> Please see my comments in-line.
>
>> On Nov 20, 2018, at 12:01 AM, Ioi Lam <ioi.lam at oracle.com> wrote:
>>
>> Hi Gerard,
>>
>> Thanks for the review. Please see my comments in-line.
>>
>>
>> On 11/19/18 8:11 AM, Gerard Ziemski wrote:
>>> Hi Ioi,
>>>
>>> Very nice! I only have couple of questions.
>>>
>>> #1 Regarding “maybe_grow()” in src/hotspot/share/utilities/hashtable.hpp:
>>>
>>> 235 // Grow the number of buckets if the average entries per bucket is over the load_factor
>>> 236 bool maybe_grow(int load_factor = 8) {
>>> 237   if (number_of_entries() / table_size() > load_factor) {
>>> 238     resize(table_size() * 2);
>>> 239     return true;
>>> 240   } else {
>>> 241     return false;
>>> 242   }
>>> 243 }
>>>
>>> a) A suggestion: from my personal experience this didn’t make a difference, but technically a hash map benefits from a size that is a prime number. Maybe you could make a comparison run, with the code as is vs with prime number when resizing and see if you get an even better performance?
>> In my patch, all tables that call maybe_grow started with a table size of a prime number. E.g., in metaspaceShared.cpp:
>>
>> 1074   typedef KVHashtable<
>> 1075       address, address> RelocationTable;
>> 1081     _new_loc_table = new RelocationTable(8087);
>>
>> So hopefully this will at least be better than starting with a simple power of 2.
>>
>> Anyway, in the LotsOfClasses test, the two tables grew to 254224 and 258784 entries. I tried hard-coding them to start with a large prime number (254249 and  258787 entries, respectively):
>>
>>      BEFORE: 24.004s
>>      AFTER : 22.476s
>>
>> The AFTER run also saves all the costs of expanding the tables gradually. So I think doing the simple multiply-by-two is good enough.
> Thank you for checking. I do like the simplicity of doubling the size, so if the performance benefit of calculating a prime is small (as I was suspecting), then I’m more than fine with the simple solution.
>
> I just happened to look at https://wiki.openjdk.java.net/display/HotSpot/StyleGuide yesterday and can’t help, but note here, that according to it, we should “Assign names to constant literals and use the names instead”, so we should replace the literals like “8087” with “static const size_t INITIAL_TABLE_SIZE = 8087” (I know this is a pre-existent code)
>
>>> b) I am a little worried that we don’t have an upper bound here. In our other hash table we always have upper bounds (or don’t resize).
>> With 35,000 classes, the tables will have over 1.5 million entries and we expand the table to about 250,000. I think we should support dumping at least 100,000 classes. That means the upper bound for the table size should probably be at least 1,000,000.
>>
>> What do you think?
> If someone hits this bound and it causes an issue, they can tell us and we can always increase it later, but with your fix and the proposed bound, it’s way much better compared to the current behavior.
>
>
>>> c) Is this table only used from one thread? We are guaranteed here that no other thread will insert/delete anything while we resize?
>> Currently maybe_grow() is called only within a safepoint. resize() already has an assert for safepoint. I will add the same assert into maybe_grow:
>>
>>
>>    bool maybe_grow(int load_factor = 8) {
>> +  assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint");
>>      if (number_of_entries() / table_size() > load_factor) {
> Thanks.
>
>
>> Thanks
>> - Ioi
>>
>>> Cheers
>>>
>>>> On Nov 18, 2018, at 9:50 PM, Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>
>>>>
>>>>
>>>> On 11/17/18 10:28 PM, Jiangli Zhou wrote:
>>>>> Hi Ioi,
>>>>>
>>>>> On 11/17/18 6:48 PM, Ioi Lam wrote:
>>>>>
>>>>>> On 11/17/18 1:54 PM, Jiangli Zhou wrote:
>>>>>>> Hi Ioi,
>>>>>>>
>>>>>>> This looks good. Glad to see the change that utilizes the existing BasicHashtable. Thanks Coleen for the suggestion!
>>>>>>>
>>>>>>> To avoid changing the existing assumptions about BasicHashtable and Hashtable, how about adding a destructor to the new KVHashtable instead of BasicHashtable? Just did a quick test, which seems to work properly.
>>>>>>>
>>>>>> Hi Jiangli,
>>>>>>
>>>>>> Thanks for the review.
>>>>>>
>>>>>> Not freeing the entries when a BasicHashtable is destructed is a bug. None of the existing BasicHashtables (and subclasses thereof) are ever destructed, so I don't think my code would impact them. Also, if any code depends on the entries not freed even if the table is destructed, that's clearly a bug in that code, and it should be fixed.
>>>>>>
>>>>>> If I don't add the destructor to BasicHashtable, the next person who wants to allocate/free BasicHashtables will run in to the same issue.
>>>>> Dictionary, PackageEntryTable, ModuleEntryTable and G1CodeRootSetTable are derived from Hashtable and they call free_buckets() in their destructors. So free_buckets() will be called twice when those tables are freed although it's harmless. If you want to avoid the duplicated call, you could remove the free_buckets() calls from those classes' destructors and keep the current BasicHashtables changes. I don't have a strong opinion on this, I'll leave it to you to decide.
>>>>>
>>>>>
>>>> Hi Jiangli,
>>>>
>>>> Good catch. I've remove the call to free_buckets() in the destructor of those 4 classes. Here's the delta from the last webrev:
>>>>
>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213587-resize-cds-hashtables.v04-delta/
>>>>
>>>> Also, I found 2 problems while doing this. Hashtables in HotSpot are really a mess.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8214029
>>>> https://bugs.openjdk.java.net/browse/JDK-8214030
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>> Thanks,
>>>>>
>>>>> Jiangli
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jiangli
>>>>>>>
>>>>>>>
>>>>>>> On 11/16/18 7:55 PM, Ioi Lam wrote:
>>>>>>>> Hi Coleen,
>>>>>>>>
>>>>>>>> I deleted the default value for MEMFLAGS as you suggested. For my instantiated templates, I still use mtInternal, though, since I can't find anything better for using at CDS dump time.
>>>>>>>>
>>>>>>>> Also, Jiangli noted that there's a memory leak, because I allocate and free the KVHashtable dynamically. So I added a destructor to BasicHashtable to free the buckets and the block-allocated entries.
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213587-resize-cds-hashtables.v03/
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213587-resize-cds-hashtables.v03-delta/
>>>>>>>>
>>>>>>>> This comment around Hashtable::allocate_new_entry() is wrong now -- "The allocator in blocks is preferable but doesn't have free semantics". Maybe I should change it to this?
>>>>>>>>
>>>>>>>> "The block allocator in BasicHashtable has less fragmentation, but the memory is not freed until the whole table is freed. Use allocate_new_entry() if you want to immediately free the memory used by each entry".
>>>>>>>>
>>>>>>>> I am rerunning hs-tiers{1,2,3,4} to catch any issues. I also tested the solaris/x64 build since it seems to break every time you do something with templates :-(
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> - Ioi
>>>>>>>>
>>>>>>>> On 11/16/18 1:36 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>> Hi Ioi,  I really like this new wrapper on the old hashtable to not have to write the boilerplate code!
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213587-resize-cds-hashtables.v02/src/hotspot/share/utilities/hashtable.hpp.udiff.html
>>>>>>>>>
>>>>>>>>> + MEMFLAGS F = mtInternal,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think you should require the mt type though and not make it a default parameter. mtInternal is not very useful to finding memory leaks.
>>>>>>>>>
>>>>>>>>> Apart from this (which I don't need to see another version), your change looks good and nice to get good performance benefits from this.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> Coleen
>>>>>>>>>
>>>>>>>>> On 11/15/18 12:31 AM, Ioi Lam wrote:
>>>>>>>>>> Coleen pointed out to me off-line that the good old (and ugly) BasicHashtable already supports resizing. I think that might be a better starting point for this RFE:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213587-resize-cds-hashtables.v02/
>>>>>>>>>>
>>>>>>>>>> I wrote a new template class called "KVHashtable" (copying the style from ResourceHashtable). That way, you can instantiate different (Key -> Value) mappings without writing tons of boilerplate code. The performance is similar to my previous version, and the code is much cleaner.
>>>>>>>>>>
>>>>>>>>>> I also renamed the RFE title, as well as the subject line of this RFR e-mail.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> - Ioi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/14/18 5:14 PM, Jiangli Zhou wrote:
>>>>>>>>>>> Hi Ioi,
>>>>>>>>>>>
>>>>>>>>>>> On 11/14/18 9:09 AM, Ioi Lam wrote:
>>>>>>>>>>>> On 11/13/18 4:05 PM, Jiangli Zhou wrote:
>>>>>>>>>>>>> Hi Ioi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The change looks reasonable to me in general. It would be helpful to see the performance difference with the expendable table. Do you have any data when large number of classes are loaded (>20000)? How much saving does it provide?
>>>>>>>>>>>>>
>>>>>>>>>>>> Hi Jiangli, thanks for the review. For dumping 30292 classes:
>>>>>>>>>>>>
>>>>>>>>>>>> BEFORE: 93.971 sec
>>>>>>>>>>>> AFTER:  34.761 sec
>>>>>>>>>>> Thanks for the data! That's about 2.6x improvement with large set of classes.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jiangli
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> - Ioi
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jiangli
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 11/8/18 10:35 PM, Ioi Lam wrote:
>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8213587
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213587-configurable-resource-hash.v01/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> TL;DR: -- add a subclass to ResourceHashtable to allow the table size to be
>>>>>>>>>>>>>>            dynamically specified when the table is constructed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>          *** C++ template guru alert ***
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't know much about C++ templates, so my attempt on doing this may be
>>>>>>>>>>>>>> ill-advised.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I *think* that with my patch, the performance of existing code, which uses
>>>>>>>>>>>>>> a statically-defined SIZE,  should not be affected, as the C++ compiler
>>>>>>>>>>>>>> should be able to constant-propagate and reduce the new code:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    ALWAYSINLINE unsigned size() const {
>>>>>>>>>>>>>>      if (SIZE != CONFIGURABLE_SIZE) {
>>>>>>>>>>>>>>        return SIZE;
>>>>>>>>>>>>>>      } else {
>>>>>>>>>>>>>>        return _configured_table_size;
>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    ALWAYSINLINE Node** get_table() const {
>>>>>>>>>>>>>>      if (SIZE != CONFIGURABLE_SIZE) {
>>>>>>>>>>>>>>        return (Node**)(&_static_table[0]);
>>>>>>>>>>>>>>      } else {
>>>>>>>>>>>>>>        return _configured_table;
>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    Node** lookup_node(unsigned hash, K const& key) {
>>>>>>>>>>>>>>      unsigned index = hash % size();    <-----
>>>>>>>>>>>>>>      Node** table = get_table();
>>>>>>>>>>>>>>      Node** ptr = &table[index];   <-----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> back to the old code:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    Node** lookup_node(unsigned hash, K const& key) {
>>>>>>>>>>>>>>      unsigned index = hash % SIZE;      <-----
>>>>>>>>>>>>>>      Node** ptr = &_table[index]; <-----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If anyone has a better way of doing this, I'd love to hear it!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> - Ioi