RFR(L): 8029075 - String deduplication in G1

Tue Mar 18 14:44:02 UTC 2014

Thanks for making the changes.  I like pop better now.  Ship it!
Coleen

On 3/18/14 8:43 AM, Per Liden wrote:
> Thanks Coleen!
>
> Updated webrev available here:
>
> http://cr.openjdk.java.net/~pliden/8029075/webrev.4/
>
> Diff against previous webrev here:
>
> http://cr.openjdk.java.net/~pliden/8029075/webrev.4-diff_3vs4/
>
> I received some additional feedback from Thomas which is also included 
> in this update. The additional changes are:
> - Fixed some other typos in comments.
> - Added a check for the "String Dedup Fixup" in the 
> gc/g1/TestGCLogMessages test.
> - Included the memory occupied by the entry cache in the "Memory 
> usage" number printed to the GC log.
>
> Some additional comments inline below.
>
> On 2014-03-18 00:40, Coleen Phillimore wrote:
>> Per,
>>
>> From g1StringDedupQueue.hpp.html
>>    49 // Pushing to the queue is thread safe (this relies on each thread using a unique worker
>>    50 // id), but only allowed during a safepoint. Poping from the queue is NOT thread safe and
>>    51 // can only be done outside of safepoint.
>>    52 //
>>
>> Poping should be "popping". This comment had me guessing.
>
> Fixed :)
>
>> Maybe if you added at the end of 51 "only by the deduplicating 
>> thread" or "only by the deduplicating thead outside a safepoint".
>
> Good idea, fixed.
>
>> Also as a matter of style, I don't like non-const reference 
>> parameters.  It's unclear at the callsite that the value is being 
>> modified, which is why most people hate them.  It's not part of the 
>> Hotspot coding style afaik.
>
> Agree, changed it to return the oop instead (and NULL if the queue is 
> empty), looks better.
>
>>
>> One thing that I found with switching the hash code is that in 
>> lookup_or_add(), that the hash could change to use a different 
>> hashing algorithm and table transferred while you take the 
>> StringDedupTable_lock.   I think you're okay because it's a 
>> no_safepoint_check_flag lock.
>>
>> This looks really good.  It's a significant piece of work!
>
> Thanks for reviewing!
>
> /Per
>
>>
>> Coleen
>>
>>
>> On 3/17/14 5:51 AM, Bengt Rutisson wrote:
>>>
>>> Hi Per,
>>>
>>> On 2014-03-14 20:07, Per Liden wrote:
>>>> Hi,
>>>>
>>>> Here's an updated webrev, based on feedback from primarily Thomas 
>>>> and Coleen (Thanks!).
>>>>
>>>> http://cr.openjdk.java.net/~pliden/8029075/webrev.2/
>>>
>>> Looks good to me.
>>>
>>> Bengt
>>>
>>>>
>>>> I hope to have addressed all issues that were raised. The main 
>>>> theme of course is the addition of documentation/comments and the 
>>>> splitting of some classes into separate files. I think I've 
>>>> addressed most, if not all, of the other minor issues Thomas 
>>>> mentioned. The only thing I can think of that wasn't changed was 
>>>> the categorization of the VM flags, which I motivated in a 
>>>> different mail.
>>>>
>>>> I didn't think it made sense to upload a diff against the previous 
>>>> webrev as that diff turned out to be a lot bigger than real webrev 
>>>> itself. Sorry if this makes reviewing harder.
>>>>
>>>> cheers,
>>>> /Per
>>>>
>>>> On 2014-03-14 14:19, Per Liden wrote:
>>>>> Hi Coleen,
>>>>>
>>>>> Inline below...
>>>>>
>>>>> On 2014-03-14 13:19, Coleen Phillimore wrote:
>>>>>>
>>>>>> Per,  3 quick comments below.  Thank you for the reorganization 
>>>>>> and additional comments.
>>>>>>
>>>>>> On 3/13/14 11:02 AM, Per Liden wrote:
>>>>>>> Hi Coleen,
>>>>>>>
>>>>>>> Thanks for looking at the patch! Comment below.
>>>>>>>
>>>>>>> On 2014-03-12 23:10, Coleen Phillimore wrote:
>>>>>>>>
>>>>>>>> Hi Per,
>>>>>>>>
>>>>>>>> Why not add the call for StringDedup::unlink() inside the 
>>>>>>>> function unlink_string_and_symbol_table rather than just above 
>>>>>>>> in two places, since they go together.
>>>>>>>>
>>>>>>>> +  // Delete dead entries in string deduplication queue/table
>>>>>>>> +  StringDedup::unlink(&GenMarkSweep::is_alive);
>>>>>>>> +
>>>>>>>>    // Delete entries for dead interned string and clean up 
>>>>>>>> unreferenced symbols in symbol table.
>>>>>>>> G1CollectedHeap::heap()->unlink_string_and_symbol_table(&GenMarkSweep::is_alive); 
>>>>>>>>
>>>>>>>
>>>>>>> Yep, I'll move that call into unlink_string_and_symbol_table(). 
>>>>>>> The unlink_string_and_symbol_table() is a fairly new addition, 
>>>>>>> there used to be StringTable/SymbolTable calls there when I 
>>>>>>> added the StrDedup call, that's why it looks like that.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I didn't see anything else specific to change.  Some general 
>>>>>>>> comments which are similar to Thomas's. There seems to be a 
>>>>>>>> lack of comments in the deduplication file.   I hate huge block 
>>>>>>>> comments too but some comments about what it does in general 
>>>>>>>> would be nice. Like 3 sentences per class.  The file is 
>>>>>>>> overwhelming (ie, I completely lose interest) with the 
>>>>>>>> statistics printing at the front.   Maybe you could put the 
>>>>>>>> most interesting classes at the front of the file?   Maybe 
>>>>>>>> statistics printing could be a separate file.
>>>>>>>>
>>>>>>>> One of the conventions that we have is that accessors for 
>>>>>>>> classes are on one line, so you could make the file smaller by 
>>>>>>>> doing that reformatting.
>>>>>>>>
>>>>>>>> It is also a general convention to have comments above the 
>>>>>>>> declaration of a class to say generally what it's for, when 
>>>>>>>> it's used, etc.  It's not easy to tell by the name or we could 
>>>>>>>> guess wrong.   What is StringDedupQueue for, eg? That would 
>>>>>>>> make reviewing the code easier.
>>>>>>>
>>>>>>> Agree, I'll be adding comments to each class and a general 
>>>>>>> description of what dedup does.
>>>>>>>
>>>>>>>>
>>>>>>>> There seems to be a lot of duplication of code with the 
>>>>>>>> hashtable in hashtable.hpp, only different.  We have lots of 
>>>>>>>> hashtables in hotspot.  Did you try to reinstantiate this 
>>>>>>>> hashtable and why didn't that not work?  There are interesting 
>>>>>>>> memory allocation functions for the entries in this normal 
>>>>>>>> hashtable that might help with fragmentation, for instance.
>>>>>>>
>>>>>>> Very early on I actually did use an instance of Hashtable and 
>>>>>>> even played with the idea of having a combined 
>>>>>>> StringTable/DedupTable. I decided not to use that though, mostly 
>>>>>>> because I felt I would need to make too many modifications to 
>>>>>>> non-dedup related code and because Hashtable<> carries concepts, 
>>>>>>> like shared entries, which didn't quite fit with dedup. Using 
>>>>>>> the existing StringTable would mean it would have to have logic 
>>>>>>> to manage entries which point to either Strings or char[], which 
>>>>>>> again seemed like an too intrusive change. The Hashtable 
>>>>>>> allocation strategy (with allocating blocks of entries) seems 
>>>>>>> nice from a performance perspective, but freeing those becomes a 
>>>>>>> bit of a challenge, which I guess is why it never frees any 
>>>>>>> entries. Since the dedup table is there to in the end save 
>>>>>>> memory, I felt I needed better control over the table's memory 
>>>>>>> usage. At one point I actually started writing a 
>>>>>>> WeakOopsHashTable<>, which I hoped could serve as the base for 
>>>>>>> both StringTable and DedupTable, but again I felt I was losing 
>>>>>>> focus a bit as it would mean modifying large parts of non-GC and 
>>>>>>> non-dedeup related stuff. The potential connection to CDS here 
>>>>>>> also made me a bit scared of going in that direction, as I don't 
>>>>>>> have a complete understanding of the implications of that.
>>>>>>
>>>>>> Ok, I'd figured they didn't map that well.   There's a lot of 
>>>>>> duplication already in symbol and string tables also.
>>>>>>>
>>>>>>>>
>>>>>>>> Lastly, why is the hash code in the string written to and used? 
>>>>>>>> The StringTable uses the same hash algorithm until it has to 
>>>>>>>> change the hash code, but the new hash code isn't stored in the 
>>>>>>>> string because it would be incompatible.
>>>>>>>
>>>>>>> I think StringTable and the dedup table is doing the same thing 
>>>>>>> here. The hash code in the String object is only used/updated in 
>>>>>>> the case we're using the standard/compatible hash function. As 
>>>>>>> soon as the table is rehashed and switches to another hash 
>>>>>>> function it no longer updates the hash code in the String. 
>>>>>>> use_java_hash() tells use if we're using the standard/compatible 
>>>>>>> hash and it's used here:
>>>>>>>
>>>>>>>   if (use_java_hash()) {
>>>>>>>     // Get hash code from cache
>>>>>>>     hash = java_lang_String::hash(java_string);
>>>>>>>   }
>>>>>>>
>>>>>>> and later here:
>>>>>>>
>>>>>>>   if (use_java_hash() && hash != 0) {
>>>>>>>     // Store hash code in cache
>>>>>>>     java_lang_String::set_hash(java_string, hash);
>>>>>>>   }
>>>>>>>
>>>>>>> Concerns?
>>>>>>
>>>>>> I have to look again but wanted to answer quickly (forgot to last 
>>>>>> night).  This is good, but why do we have to update the hash code 
>>>>>> in the java/lang/String instance at all?
>>>>>
>>>>> We don't strictly have to, but it mimics the behavior of 
>>>>> String.hashCode() and it is seems like the natural thing to do to 
>>>>> given that the hash field in String is a cache for this purpose so 
>>>>> that future users of String.hashCode() don't have to do the same 
>>>>> calculation. Do you have a reason in mind why we wouldn't want to 
>>>>> write it back?
>>>>>
>>>>> cheers,
>>>>> /Per
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> I would like to see this be a few different files. It seems 
>>>>>>>> like this file does three things conceptually, interacts with 
>>>>>>>> G1 somehow to determine which strings are candidates for 
>>>>>>>> deduplication, stores these things in a rehashable, resizable 
>>>>>>>> hash table and gathers statistics.
>>>>>>>>
>>>>>>>> I'm sorry to have so many comments so late and I haven't really 
>>>>>>>> tried to read the code for correctness.   I think some 
>>>>>>>> reformatting and reorganization would make the reviewers job 
>>>>>>>> easier. It's a big change and worthwhile.
>>>>>>>
>>>>>>> Great, I've no problems splitting this up. I'm more of a "one 
>>>>>>> class per file" person myself, but I had the (apparently 
>>>>>>> incorrect) impression that wasn't the hotspot way. I'm thinking 
>>>>>>> I'll split this up into the following files (while I'm at it 
>>>>>>> I'll also add the G1 prefix as Thomas suggested):
>>>>>>>
>>>>>>> g1StringDedup - Main interface for the GC, would contain entry 
>>>>>>> point and things like the task and closure classes
>>>>>>> g1StringDedupTable - The table, cache and entry classes
>>>>>>> g1StringDedupQueue - The queue
>>>>>>> g1StringDedupStat - The statistics sutff
>>>>>>> g1StringDedupThread - The dedup thread
>>>>>>
>>>>>> Yes, this would be nice.   symbolTable.cpp should be split up too 
>>>>>> if that is what gave you the impression. I will file an RFE to do 
>>>>>> that, it's been at the back of my mind to do so.
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>>
>>>>>>> Sounds good?
>>>>>>>
>>>>>>> Will send out and updated webrev.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> /Per
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/12/14 9:34 AM, Per Liden wrote:
>>>>>>>>> Hi Thomas,
>>>>>>>>>
>>>>>>>>> Thanks for reviewing. Comments inline.
>>>>>>>>>
>>>>>>>>> On 03/11/2014 01:39 PM, Thomas Schatzl wrote:
>>>>>>>>>> Hi Per,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, 2014-03-07 at 15:13 +0100, Per Liden wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Here's an updated webrev based on the feedback from Bengt 
>>>>>>>>>>> and Thomas.
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~pliden/8029075/webrev.1/
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> See further below.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> And as suggested, two changes in the initial webrev were 
>>>>>>>>>>> broken out into
>>>>>>>>>>> separate CRs, available here:
>>>>>>>>>>>
>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8036672
>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~pliden/8036672/webrev.0/
>>>>>>>>>>
>>>>>>>>>> Looks good.
>>>>>>>>>>
>>>>>>>>>> Did you check the impact on something like specjbb2005 which 
>>>>>>>>>> does
>>>>>>>>>> nothing but object copying? That code is somewhat performance 
>>>>>>>>>> sensitive.
>>>>>>>>>
>>>>>>>>> Yes, I did Aurora runs with jbb2005 and jbb2013, didn't see 
>>>>>>>>> any difference.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8036673
>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~pliden/8036673/webrev.0/
>>>>>>>>>>
>>>>>>>>>> Looks good.
>>>>>>>>>>
>>>>>>>>>> Here are my comments: there are a few minor issues, and a 
>>>>>>>>>> single major:
>>>>>>>>>> basically the documentation for this feature is lacking imo. 
>>>>>>>>>> I listed a
>>>>>>>>>> few particular issues of mine with it.
>>>>>>>>>>
>>>>>>>>>> First the minor issues:
>>>>>>>>>>
>>>>>>>>>> - could you check on copyright dates on the modified files? 
>>>>>>>>>> They have
>>>>>>>>>> not been updated anywhere afaics.
>>>>>>>>>>
>>>>>>>>>> - not sure if good, but maybe make 
>>>>>>>>>> PrintStringDeduplicationStatistics
>>>>>>>>>> diagnostic? It seems to really be some diagnosis functions.
>>>>>>>>>>
>>>>>>>>>> - maybe make the new *ALot flags notproduct. These seem to be 
>>>>>>>>>> debugging
>>>>>>>>>> helpers similar to other *ALot flags.
>>>>>>>>>
>>>>>>>>> I had a conversion with Bengt about this some time ago and I 
>>>>>>>>> think the options are categorized correctly. The ALot flags 
>>>>>>>>> needs be available in product builds because they are used by 
>>>>>>>>> tests. Most Print*Statistics are product therefore it makes 
>>>>>>>>> sense to follow that pattern for 
>>>>>>>>> PrintStringDeduplicationStatistics.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> - StringDeduplicationAgeThreshold: maybe also needs to 
>>>>>>>>>> mention that if
>>>>>>>>>> the object is promoted, that it is always considered as 
>>>>>>>>>> candidate.
>>>>>>>>>
>>>>>>>>> Check.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> marksweep.inline.hpp
>>>>>>>>>>
>>>>>>>>>> - MarkSweep::mark_object(): did you measure impact on other 
>>>>>>>>>> collectors
>>>>>>>>>> using this code? It is enabled unconditionally. Also I would 
>>>>>>>>>> prefer
>>>>>>>>>> that !UseG1GC automatically disables UseStringDeduplication 
>>>>>>>>>> (or use a
>>>>>>>>>> global variable for that) to avoid the two checks. (The 
>>>>>>>>>> latter option is
>>>>>>>>>> probably better).
>>>>>>>>>
>>>>>>>>> Yes, ran jbb2005/2013.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> symbolTable.cpp:
>>>>>>>>>>
>>>>>>>>>> - again both UseG1GC and UseStringDedupliation are paired. 
>>>>>>>>>> Could this be
>>>>>>>>>> put in a method/global variable somewhere that is set up 
>>>>>>>>>> correctly at
>>>>>>>>>> startup?
>>>>>>>>>
>>>>>>>>> I think using the options is much more readable. Adding 
>>>>>>>>> another state which tells us if G1 & Dedup is enabled means we 
>>>>>>>>> need a pre-init stage to set up that state, which I don't 
>>>>>>>>> think is very nice.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> g1CollectedHeap.cpp:
>>>>>>>>>>
>>>>>>>>>> - StringDedup::initialize() is run unconditionally (the check 
>>>>>>>>>> for
>>>>>>>>>> UseStringDeduplication is inside the method) while it's 
>>>>>>>>>> outside for
>>>>>>>>>> everything else. Just a note, keep it if wanted.
>>>>>>>>>
>>>>>>>>> They are hidden inside dedup everywhere, except where the 
>>>>>>>>> performance overhead of a call might be an issue (like mark 
>>>>>>>>> and copy).
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> TestStringDeduplicationTools.java:
>>>>>>>>>>
>>>>>>>>>> - maybe put retrieval of the unsafe object into the test 
>>>>>>>>>> library.
>>>>>>>>>> Browsing through the code shows quite a lot copy+paste here 
>>>>>>>>>> already.
>>>>>>>>>>
>>>>>>>>>> - "qeueue" -> "queue"
>>>>>>>>>
>>>>>>>>> Check.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> all tests:
>>>>>>>>>>
>>>>>>>>>> - missing bug ids
>>>>>>>>>
>>>>>>>>> Check.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  From here it is mostly about comments/documentation:
>>>>>>>>>
>>>>>>>>> I agree that adding a feature overview and some additional 
>>>>>>>>> comments is probably a good idea. However, I also know that 
>>>>>>>>> Bengt and StefanK aren't too crazy about large comment blobs 
>>>>>>>>> so I tried to kept the noise down. I also disagree with some 
>>>>>>>>> of the other comments below which tend to be about personal 
>>>>>>>>> taste rather than code quality or breaking some coding 
>>>>>>>>> convention we have, e.g. the static helpers, readability, 
>>>>>>>>> naming, etc.
>>>>>>>>>
>>>>>>>>> I'll add some comments and make some other changes and send 
>>>>>>>>> out an updated webrev shortly.
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>> /Per
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> stringdedup.?pp:
>>>>>>>>>>
>>>>>>>>>> - is it possible to put an overview about how the string 
>>>>>>>>>> deduplication
>>>>>>>>>> works here? There is absolutely no place in the code where 
>>>>>>>>>> e.g. the
>>>>>>>>>> string deduplication cycle is explained. There is no 
>>>>>>>>>> comprehensive
>>>>>>>>>> information about this feature anywhere, just bits about 
>>>>>>>>>> corner cases or
>>>>>>>>>> particular here and there, but nothing that can be used as 
>>>>>>>>>> entry point
>>>>>>>>>> to understand what this is actually doing.
>>>>>>>>>>
>>>>>>>>>> To some extent, the comments seem to be repetitions of what 
>>>>>>>>>> the code
>>>>>>>>>> already shows. E.g. "some_constant = (1 << 10); // 1024 (must 
>>>>>>>>>> be power
>>>>>>>>>> of 2)" or "// Store hash code in cache" followed by
>>>>>>>>>> "java_lang_String::set_hash(java_string, hash);".
>>>>>>>>>>
>>>>>>>>>> There is also no reference to the JEP (I am not sure if it is 
>>>>>>>>>> customary
>>>>>>>>>> to put references to outside documents - I would prefer 
>>>>>>>>>> some); however
>>>>>>>>>> the JEP does not replace documentation in the source as it is 
>>>>>>>>>> not
>>>>>>>>>> detailed enough, and the JEP cannot be changed any more. The 
>>>>>>>>>> JEP is
>>>>>>>>>> interesting for the alternatives considered though.
>>>>>>>>>>
>>>>>>>>>> Ideally, to reasonably understand the feature I would like to 
>>>>>>>>>> not need
>>>>>>>>>> to spend hours looking through the implementation.
>>>>>>>>>>
>>>>>>>>>> - how many threads does this feature add, and are supported 
>>>>>>>>>> by this code
>>>>>>>>>> (for which parts)? Synchronization between them?
>>>>>>>>>>
>>>>>>>>>> - the comments "For XY" in the StringDedup header file do not 
>>>>>>>>>> give any
>>>>>>>>>> hint about what they are doing (not sure if it is interesting 
>>>>>>>>>> as it is
>>>>>>>>>> easy to find out the only callers). It seems that this 
>>>>>>>>>> information has
>>>>>>>>>> been put at the place where these methods are invoked (or 
>>>>>>>>>> just somewhere
>>>>>>>>>> in the cpp file). Imo the methods and members (except 
>>>>>>>>>> trivial) should be
>>>>>>>>>> documented in the hpp file in the class body, i.e. in the 
>>>>>>>>>> interface.
>>>>>>>>>>
>>>>>>>>>> - there are a few methods (also in the StringDedupTable 
>>>>>>>>>> class) in
>>>>>>>>>> StringDedup where the results of the methods do not seem 
>>>>>>>>>> obvious and
>>>>>>>>>> should be commented.
>>>>>>>>>>
>>>>>>>>>> - I really dislike the location of the "Candidate selection" 
>>>>>>>>>> explanation
>>>>>>>>>> within the cpp file. I would not have found it if I were not 
>>>>>>>>>> looking for
>>>>>>>>>> some documentation ;) It would probably fit very well in the 
>>>>>>>>>> above
>>>>>>>>>> mentioned overview paragraphs.
>>>>>>>>>>
>>>>>>>>>> - please add the static helper functions that implement the 
>>>>>>>>>> actual
>>>>>>>>>> policies (is_candidate_from_mark, 
>>>>>>>>>> is_candidate_from_evacuation) to the
>>>>>>>>>> StringDedup class (as private static methods) so that they 
>>>>>>>>>> can be found
>>>>>>>>>> easily. Static non-class helper functions in cpp files should 
>>>>>>>>>> only be
>>>>>>>>>> used for stuff that is really unimportant and not relevant to 
>>>>>>>>>> the code
>>>>>>>>>> imo. However these seem to be _the_ central policy methods 
>>>>>>>>>> for string
>>>>>>>>>> dedup.
>>>>>>>>>> So it would be nice to have them mentioned in the class/hpp 
>>>>>>>>>> file and
>>>>>>>>>> documented there.
>>>>>>>>>>
>>>>>>>>>> - facts about the code are repeated, e.g. that the table size 
>>>>>>>>>> must be a
>>>>>>>>>> power of two. This is nice, but has a few drawbacks: the 
>>>>>>>>>> compiler does
>>>>>>>>>> not care about comments at all, so it might be useful to put 
>>>>>>>>>> asserts in
>>>>>>>>>> these places instead, and second, it's just duplication of 
>>>>>>>>>> text with the
>>>>>>>>>> usual problem of getting out of date.
>>>>>>>>>>
>>>>>>>>>> - I am not completely sure why the class has not been called
>>>>>>>>>> G1StringDedup. It won't work without G1 anyway and calls G1 
>>>>>>>>>> internals
>>>>>>>>>> all the time. There does not seem to be a way to make it more 
>>>>>>>>>> generic
>>>>>>>>>> without considerable refactoring. It is fine to keep the name 
>>>>>>>>>> though,
>>>>>>>>>> just asking. I also know that this feature does not want to 
>>>>>>>>>> be a generic
>>>>>>>>>> solution.
>>>>>>>>>>
>>>>>>>>>> - is it possible to separate out the statistics variables of
>>>>>>>>>> StringDedupTable into either a separate class instead of 
>>>>>>>>>> using globals
>>>>>>>>>> or separate them a little from the other unrelated members? (And
>>>>>>>>>> describe them) This would improve readability.
>>>>>>>>>>
>>>>>>>>>> I *think* this is everything from 
>>>>>>>>>> StringDedupTable::_entries_added to
>>>>>>>>>> _rehash_count? I have no idea without searching for every use 
>>>>>>>>>> and
>>>>>>>>>> deducing from that.
>>>>>>>>>>
>>>>>>>>>> - when implementing the hash table for the StringDedupTable, 
>>>>>>>>>> did you
>>>>>>>>>> consider trying to reuse one of the existing hashmap 
>>>>>>>>>> implementations?
>>>>>>>>>> And potentially extend them.
>>>>>>>>>>
>>>>>>>>>> Imo we already have way too many ad-hoc implementations of 
>>>>>>>>>> often used
>>>>>>>>>> data structures (particularly) in G1 and GC code.
>>>>>>>>>>
>>>>>>>>>> - please document the properties of the StringDedupTable 
>>>>>>>>>> (type, hashing,
>>>>>>>>>> automatic resizing behavior, thread safety, ...).
>>>>>>>>>>
>>>>>>>>>> - please explain what StringDedupTable/Cache are and what 
>>>>>>>>>> they are used
>>>>>>>>>> for in context of the feature.
>>>>>>>>>>
>>>>>>>>>> - stringdedup.cpp, line 1309: candicate -> candidate
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Thomas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>