RFR(L): 8029075 - String deduplication in G1

Tue Mar 18 17:24:03 UTC 2014

Thanks Thomas, Coleen and Bengt for the reviewing!

cheers,
/Per

On 03/18/2014 03:44 PM, Coleen Phillimore wrote:
>
> Thanks for making the changes.  I like pop better now.  Ship it!
> Coleen
>
> On 3/18/14 8:43 AM, Per Liden wrote:
>> Thanks Coleen!
>>
>> Updated webrev available here:
>>
>> http://cr.openjdk.java.net/~pliden/8029075/webrev.4/
>>
>> Diff against previous webrev here:
>>
>> http://cr.openjdk.java.net/~pliden/8029075/webrev.4-diff_3vs4/
>>
>> I received some additional feedback from Thomas which is also included
>> in this update. The additional changes are:
>> - Fixed some other typos in comments.
>> - Added a check for the "String Dedup Fixup" in the
>> gc/g1/TestGCLogMessages test.
>> - Included the memory occupied by the entry cache in the "Memory
>> usage" number printed to the GC log.
>>
>> Some additional comments inline below.
>>
>> On 2014-03-18 00:40, Coleen Phillimore wrote:
>>> Per,
>>>
>>> From g1StringDedupQueue.hpp.html
>>>    49 // Pushing to the queue is thread safe (this relies on each thread using a unique worker
>>>    50 // id), but only allowed during a safepoint. Poping from the queue is NOT thread safe and
>>>    51 // can only be done outside of safepoint.
>>>    52 //
>>>
>>> Poping should be "popping". This comment had me guessing.
>>
>> Fixed :)
>>
>>> Maybe if you added at the end of 51 "only by the deduplicating
>>> thread" or "only by the deduplicating thead outside a safepoint".
>>
>> Good idea, fixed.
>>
>>> Also as a matter of style, I don't like non-const reference
>>> parameters.  It's unclear at the callsite that the value is being
>>> modified, which is why most people hate them.  It's not part of the
>>> Hotspot coding style afaik.
>>
>> Agree, changed it to return the oop instead (and NULL if the queue is
>> empty), looks better.
>>
>>>
>>> One thing that I found with switching the hash code is that in
>>> lookup_or_add(), that the hash could change to use a different
>>> hashing algorithm and table transferred while you take the
>>> StringDedupTable_lock.   I think you're okay because it's a
>>> no_safepoint_check_flag lock.
>>>
>>> This looks really good.  It's a significant piece of work!
>>
>> Thanks for reviewing!
>>
>> /Per
>>
>>>
>>> Coleen
>>>
>>>
>>> On 3/17/14 5:51 AM, Bengt Rutisson wrote:
>>>>
>>>> Hi Per,
>>>>
>>>> On 2014-03-14 20:07, Per Liden wrote:
>>>>> Hi,
>>>>>
>>>>> Here's an updated webrev, based on feedback from primarily Thomas
>>>>> and Coleen (Thanks!).
>>>>>
>>>>> http://cr.openjdk.java.net/~pliden/8029075/webrev.2/
>>>>
>>>> Looks good to me.
>>>>
>>>> Bengt
>>>>
>>>>>
>>>>> I hope to have addressed all issues that were raised. The main
>>>>> theme of course is the addition of documentation/comments and the
>>>>> splitting of some classes into separate files. I think I've
>>>>> addressed most, if not all, of the other minor issues Thomas
>>>>> mentioned. The only thing I can think of that wasn't changed was
>>>>> the categorization of the VM flags, which I motivated in a
>>>>> different mail.
>>>>>
>>>>> I didn't think it made sense to upload a diff against the previous
>>>>> webrev as that diff turned out to be a lot bigger than real webrev
>>>>> itself. Sorry if this makes reviewing harder.
>>>>>
>>>>> cheers,
>>>>> /Per
>>>>>
>>>>> On 2014-03-14 14:19, Per Liden wrote:
>>>>>> Hi Coleen,
>>>>>>
>>>>>> Inline below...
>>>>>>
>>>>>> On 2014-03-14 13:19, Coleen Phillimore wrote:
>>>>>>>
>>>>>>> Per,  3 quick comments below.  Thank you for the reorganization
>>>>>>> and additional comments.
>>>>>>>
>>>>>>> On 3/13/14 11:02 AM, Per Liden wrote:
>>>>>>>> Hi Coleen,
>>>>>>>>
>>>>>>>> Thanks for looking at the patch! Comment below.
>>>>>>>>
>>>>>>>> On 2014-03-12 23:10, Coleen Phillimore wrote:
>>>>>>>>>
>>>>>>>>> Hi Per,
>>>>>>>>>
>>>>>>>>> Why not add the call for StringDedup::unlink() inside the
>>>>>>>>> function unlink_string_and_symbol_table rather than just above
>>>>>>>>> in two places, since they go together.
>>>>>>>>>
>>>>>>>>> +  // Delete dead entries in string deduplication queue/table
>>>>>>>>> +  StringDedup::unlink(&GenMarkSweep::is_alive);
>>>>>>>>> +
>>>>>>>>>    // Delete entries for dead interned string and clean up
>>>>>>>>> unreferenced symbols in symbol table.
>>>>>>>>> G1CollectedHeap::heap()->unlink_string_and_symbol_table(&GenMarkSweep::is_alive);
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yep, I'll move that call into unlink_string_and_symbol_table().
>>>>>>>> The unlink_string_and_symbol_table() is a fairly new addition,
>>>>>>>> there used to be StringTable/SymbolTable calls there when I
>>>>>>>> added the StrDedup call, that's why it looks like that.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I didn't see anything else specific to change.  Some general
>>>>>>>>> comments which are similar to Thomas's. There seems to be a
>>>>>>>>> lack of comments in the deduplication file.   I hate huge block
>>>>>>>>> comments too but some comments about what it does in general
>>>>>>>>> would be nice. Like 3 sentences per class.  The file is
>>>>>>>>> overwhelming (ie, I completely lose interest) with the
>>>>>>>>> statistics printing at the front.   Maybe you could put the
>>>>>>>>> most interesting classes at the front of the file?   Maybe
>>>>>>>>> statistics printing could be a separate file.
>>>>>>>>>
>>>>>>>>> One of the conventions that we have is that accessors for
>>>>>>>>> classes are on one line, so you could make the file smaller by
>>>>>>>>> doing that reformatting.
>>>>>>>>>
>>>>>>>>> It is also a general convention to have comments above the
>>>>>>>>> declaration of a class to say generally what it's for, when
>>>>>>>>> it's used, etc.  It's not easy to tell by the name or we could
>>>>>>>>> guess wrong.   What is StringDedupQueue for, eg? That would
>>>>>>>>> make reviewing the code easier.
>>>>>>>>
>>>>>>>> Agree, I'll be adding comments to each class and a general
>>>>>>>> description of what dedup does.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> There seems to be a lot of duplication of code with the
>>>>>>>>> hashtable in hashtable.hpp, only different.  We have lots of
>>>>>>>>> hashtables in hotspot.  Did you try to reinstantiate this
>>>>>>>>> hashtable and why didn't that not work?  There are interesting
>>>>>>>>> memory allocation functions for the entries in this normal
>>>>>>>>> hashtable that might help with fragmentation, for instance.
>>>>>>>>
>>>>>>>> Very early on I actually did use an instance of Hashtable and
>>>>>>>> even played with the idea of having a combined
>>>>>>>> StringTable/DedupTable. I decided not to use that though, mostly
>>>>>>>> because I felt I would need to make too many modifications to
>>>>>>>> non-dedup related code and because Hashtable<> carries concepts,
>>>>>>>> like shared entries, which didn't quite fit with dedup. Using
>>>>>>>> the existing StringTable would mean it would have to have logic
>>>>>>>> to manage entries which point to either Strings or char[], which
>>>>>>>> again seemed like an too intrusive change. The Hashtable
>>>>>>>> allocation strategy (with allocating blocks of entries) seems
>>>>>>>> nice from a performance perspective, but freeing those becomes a
>>>>>>>> bit of a challenge, which I guess is why it never frees any
>>>>>>>> entries. Since the dedup table is there to in the end save
>>>>>>>> memory, I felt I needed better control over the table's memory
>>>>>>>> usage. At one point I actually started writing a
>>>>>>>> WeakOopsHashTable<>, which I hoped could serve as the base for
>>>>>>>> both StringTable and DedupTable, but again I felt I was losing
>>>>>>>> focus a bit as it would mean modifying large parts of non-GC and
>>>>>>>> non-dedeup related stuff. The potential connection to CDS here
>>>>>>>> also made me a bit scared of going in that direction, as I don't
>>>>>>>> have a complete understanding of the implications of that.
>>>>>>>
>>>>>>> Ok, I'd figured they didn't map that well.   There's a lot of
>>>>>>> duplication already in symbol and string tables also.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Lastly, why is the hash code in the string written to and used?
>>>>>>>>> The StringTable uses the same hash algorithm until it has to
>>>>>>>>> change the hash code, but the new hash code isn't stored in the
>>>>>>>>> string because it would be incompatible.
>>>>>>>>
>>>>>>>> I think StringTable and the dedup table is doing the same thing
>>>>>>>> here. The hash code in the String object is only used/updated in
>>>>>>>> the case we're using the standard/compatible hash function. As
>>>>>>>> soon as the table is rehashed and switches to another hash
>>>>>>>> function it no longer updates the hash code in the String.
>>>>>>>> use_java_hash() tells use if we're using the standard/compatible
>>>>>>>> hash and it's used here:
>>>>>>>>
>>>>>>>>   if (use_java_hash()) {
>>>>>>>>     // Get hash code from cache
>>>>>>>>     hash = java_lang_String::hash(java_string);
>>>>>>>>   }
>>>>>>>>
>>>>>>>> and later here:
>>>>>>>>
>>>>>>>>   if (use_java_hash() && hash != 0) {
>>>>>>>>     // Store hash code in cache
>>>>>>>>     java_lang_String::set_hash(java_string, hash);
>>>>>>>>   }
>>>>>>>>
>>>>>>>> Concerns?
>>>>>>>
>>>>>>> I have to look again but wanted to answer quickly (forgot to last
>>>>>>> night).  This is good, but why do we have to update the hash code
>>>>>>> in the java/lang/String instance at all?
>>>>>>
>>>>>> We don't strictly have to, but it mimics the behavior of
>>>>>> String.hashCode() and it is seems like the natural thing to do to
>>>>>> given that the hash field in String is a cache for this purpose so
>>>>>> that future users of String.hashCode() don't have to do the same
>>>>>> calculation. Do you have a reason in mind why we wouldn't want to
>>>>>> write it back?
>>>>>>
>>>>>> cheers,
>>>>>> /Per
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I would like to see this be a few different files. It seems
>>>>>>>>> like this file does three things conceptually, interacts with
>>>>>>>>> G1 somehow to determine which strings are candidates for
>>>>>>>>> deduplication, stores these things in a rehashable, resizable
>>>>>>>>> hash table and gathers statistics.
>>>>>>>>>
>>>>>>>>> I'm sorry to have so many comments so late and I haven't really
>>>>>>>>> tried to read the code for correctness.   I think some
>>>>>>>>> reformatting and reorganization would make the reviewers job
>>>>>>>>> easier. It's a big change and worthwhile.
>>>>>>>>
>>>>>>>> Great, I've no problems splitting this up. I'm more of a "one
>>>>>>>> class per file" person myself, but I had the (apparently
>>>>>>>> incorrect) impression that wasn't the hotspot way. I'm thinking
>>>>>>>> I'll split this up into the following files (while I'm at it
>>>>>>>> I'll also add the G1 prefix as Thomas suggested):
>>>>>>>>
>>>>>>>> g1StringDedup - Main interface for the GC, would contain entry
>>>>>>>> point and things like the task and closure classes
>>>>>>>> g1StringDedupTable - The table, cache and entry classes
>>>>>>>> g1StringDedupQueue - The queue
>>>>>>>> g1StringDedupStat - The statistics sutff
>>>>>>>> g1StringDedupThread - The dedup thread
>>>>>>>
>>>>>>> Yes, this would be nice.   symbolTable.cpp should be split up too
>>>>>>> if that is what gave you the impression. I will file an RFE to do
>>>>>>> that, it's been at the back of my mind to do so.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>>
>>>>>>>> Sounds good?
>>>>>>>>
>>>>>>>> Will send out and updated webrev.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> /Per
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/12/14 9:34 AM, Per Liden wrote:
>>>>>>>>>> Hi Thomas,
>>>>>>>>>>
>>>>>>>>>> Thanks for reviewing. Comments inline.
>>>>>>>>>>
>>>>>>>>>> On 03/11/2014 01:39 PM, Thomas Schatzl wrote:
>>>>>>>>>>> Hi Per,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 2014-03-07 at 15:13 +0100, Per Liden wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Here's an updated webrev based on the feedback from Bengt
>>>>>>>>>>>> and Thomas.
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~pliden/8029075/webrev.1/
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> See further below.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> And as suggested, two changes in the initial webrev were
>>>>>>>>>>>> broken out into
>>>>>>>>>>>> separate CRs, available here:
>>>>>>>>>>>>
>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8036672
>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~pliden/8036672/webrev.0/
>>>>>>>>>>>
>>>>>>>>>>> Looks good.
>>>>>>>>>>>
>>>>>>>>>>> Did you check the impact on something like specjbb2005 which
>>>>>>>>>>> does
>>>>>>>>>>> nothing but object copying? That code is somewhat performance
>>>>>>>>>>> sensitive.
>>>>>>>>>>
>>>>>>>>>> Yes, I did Aurora runs with jbb2005 and jbb2013, didn't see
>>>>>>>>>> any difference.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8036673
>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~pliden/8036673/webrev.0/
>>>>>>>>>>>
>>>>>>>>>>> Looks good.
>>>>>>>>>>>
>>>>>>>>>>> Here are my comments: there are a few minor issues, and a
>>>>>>>>>>> single major:
>>>>>>>>>>> basically the documentation for this feature is lacking imo.
>>>>>>>>>>> I listed a
>>>>>>>>>>> few particular issues of mine with it.
>>>>>>>>>>>
>>>>>>>>>>> First the minor issues:
>>>>>>>>>>>
>>>>>>>>>>> - could you check on copyright dates on the modified files?
>>>>>>>>>>> They have
>>>>>>>>>>> not been updated anywhere afaics.
>>>>>>>>>>>
>>>>>>>>>>> - not sure if good, but maybe make
>>>>>>>>>>> PrintStringDeduplicationStatistics
>>>>>>>>>>> diagnostic? It seems to really be some diagnosis functions.
>>>>>>>>>>>
>>>>>>>>>>> - maybe make the new *ALot flags notproduct. These seem to be
>>>>>>>>>>> debugging
>>>>>>>>>>> helpers similar to other *ALot flags.
>>>>>>>>>>
>>>>>>>>>> I had a conversion with Bengt about this some time ago and I
>>>>>>>>>> think the options are categorized correctly. The ALot flags
>>>>>>>>>> needs be available in product builds because they are used by
>>>>>>>>>> tests. Most Print*Statistics are product therefore it makes
>>>>>>>>>> sense to follow that pattern for
>>>>>>>>>> PrintStringDeduplicationStatistics.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> - StringDeduplicationAgeThreshold: maybe also needs to
>>>>>>>>>>> mention that if
>>>>>>>>>>> the object is promoted, that it is always considered as
>>>>>>>>>>> candidate.
>>>>>>>>>>
>>>>>>>>>> Check.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> marksweep.inline.hpp
>>>>>>>>>>>
>>>>>>>>>>> - MarkSweep::mark_object(): did you measure impact on other
>>>>>>>>>>> collectors
>>>>>>>>>>> using this code? It is enabled unconditionally. Also I would
>>>>>>>>>>> prefer
>>>>>>>>>>> that !UseG1GC automatically disables UseStringDeduplication
>>>>>>>>>>> (or use a
>>>>>>>>>>> global variable for that) to avoid the two checks. (The
>>>>>>>>>>> latter option is
>>>>>>>>>>> probably better).
>>>>>>>>>>
>>>>>>>>>> Yes, ran jbb2005/2013.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> symbolTable.cpp:
>>>>>>>>>>>
>>>>>>>>>>> - again both UseG1GC and UseStringDedupliation are paired.
>>>>>>>>>>> Could this be
>>>>>>>>>>> put in a method/global variable somewhere that is set up
>>>>>>>>>>> correctly at
>>>>>>>>>>> startup?
>>>>>>>>>>
>>>>>>>>>> I think using the options is much more readable. Adding
>>>>>>>>>> another state which tells us if G1 & Dedup is enabled means we
>>>>>>>>>> need a pre-init stage to set up that state, which I don't
>>>>>>>>>> think is very nice.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> g1CollectedHeap.cpp:
>>>>>>>>>>>
>>>>>>>>>>> - StringDedup::initialize() is run unconditionally (the check
>>>>>>>>>>> for
>>>>>>>>>>> UseStringDeduplication is inside the method) while it's
>>>>>>>>>>> outside for
>>>>>>>>>>> everything else. Just a note, keep it if wanted.
>>>>>>>>>>
>>>>>>>>>> They are hidden inside dedup everywhere, except where the
>>>>>>>>>> performance overhead of a call might be an issue (like mark
>>>>>>>>>> and copy).
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> TestStringDeduplicationTools.java:
>>>>>>>>>>>
>>>>>>>>>>> - maybe put retrieval of the unsafe object into the test
>>>>>>>>>>> library.
>>>>>>>>>>> Browsing through the code shows quite a lot copy+paste here
>>>>>>>>>>> already.
>>>>>>>>>>>
>>>>>>>>>>> - "qeueue" -> "queue"
>>>>>>>>>>
>>>>>>>>>> Check.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> all tests:
>>>>>>>>>>>
>>>>>>>>>>> - missing bug ids
>>>>>>>>>>
>>>>>>>>>> Check.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  From here it is mostly about comments/documentation:
>>>>>>>>>>
>>>>>>>>>> I agree that adding a feature overview and some additional
>>>>>>>>>> comments is probably a good idea. However, I also know that
>>>>>>>>>> Bengt and StefanK aren't too crazy about large comment blobs
>>>>>>>>>> so I tried to kept the noise down. I also disagree with some
>>>>>>>>>> of the other comments below which tend to be about personal
>>>>>>>>>> taste rather than code quality or breaking some coding
>>>>>>>>>> convention we have, e.g. the static helpers, readability,
>>>>>>>>>> naming, etc.
>>>>>>>>>>
>>>>>>>>>> I'll add some comments and make some other changes and send
>>>>>>>>>> out an updated webrev shortly.
>>>>>>>>>>
>>>>>>>>>> cheers,
>>>>>>>>>> /Per
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> stringdedup.?pp:
>>>>>>>>>>>
>>>>>>>>>>> - is it possible to put an overview about how the string
>>>>>>>>>>> deduplication
>>>>>>>>>>> works here? There is absolutely no place in the code where
>>>>>>>>>>> e.g. the
>>>>>>>>>>> string deduplication cycle is explained. There is no
>>>>>>>>>>> comprehensive
>>>>>>>>>>> information about this feature anywhere, just bits about
>>>>>>>>>>> corner cases or
>>>>>>>>>>> particular here and there, but nothing that can be used as
>>>>>>>>>>> entry point
>>>>>>>>>>> to understand what this is actually doing.
>>>>>>>>>>>
>>>>>>>>>>> To some extent, the comments seem to be repetitions of what
>>>>>>>>>>> the code
>>>>>>>>>>> already shows. E.g. "some_constant = (1 << 10); // 1024 (must
>>>>>>>>>>> be power
>>>>>>>>>>> of 2)" or "// Store hash code in cache" followed by
>>>>>>>>>>> "java_lang_String::set_hash(java_string, hash);".
>>>>>>>>>>>
>>>>>>>>>>> There is also no reference to the JEP (I am not sure if it is
>>>>>>>>>>> customary
>>>>>>>>>>> to put references to outside documents - I would prefer
>>>>>>>>>>> some); however
>>>>>>>>>>> the JEP does not replace documentation in the source as it is
>>>>>>>>>>> not
>>>>>>>>>>> detailed enough, and the JEP cannot be changed any more. The
>>>>>>>>>>> JEP is
>>>>>>>>>>> interesting for the alternatives considered though.
>>>>>>>>>>>
>>>>>>>>>>> Ideally, to reasonably understand the feature I would like to
>>>>>>>>>>> not need
>>>>>>>>>>> to spend hours looking through the implementation.
>>>>>>>>>>>
>>>>>>>>>>> - how many threads does this feature add, and are supported
>>>>>>>>>>> by this code
>>>>>>>>>>> (for which parts)? Synchronization between them?
>>>>>>>>>>>
>>>>>>>>>>> - the comments "For XY" in the StringDedup header file do not
>>>>>>>>>>> give any
>>>>>>>>>>> hint about what they are doing (not sure if it is interesting
>>>>>>>>>>> as it is
>>>>>>>>>>> easy to find out the only callers). It seems that this
>>>>>>>>>>> information has
>>>>>>>>>>> been put at the place where these methods are invoked (or
>>>>>>>>>>> just somewhere
>>>>>>>>>>> in the cpp file). Imo the methods and members (except
>>>>>>>>>>> trivial) should be
>>>>>>>>>>> documented in the hpp file in the class body, i.e. in the
>>>>>>>>>>> interface.
>>>>>>>>>>>
>>>>>>>>>>> - there are a few methods (also in the StringDedupTable
>>>>>>>>>>> class) in
>>>>>>>>>>> StringDedup where the results of the methods do not seem
>>>>>>>>>>> obvious and
>>>>>>>>>>> should be commented.
>>>>>>>>>>>
>>>>>>>>>>> - I really dislike the location of the "Candidate selection"
>>>>>>>>>>> explanation
>>>>>>>>>>> within the cpp file. I would not have found it if I were not
>>>>>>>>>>> looking for
>>>>>>>>>>> some documentation ;) It would probably fit very well in the
>>>>>>>>>>> above
>>>>>>>>>>> mentioned overview paragraphs.
>>>>>>>>>>>
>>>>>>>>>>> - please add the static helper functions that implement the
>>>>>>>>>>> actual
>>>>>>>>>>> policies (is_candidate_from_mark,
>>>>>>>>>>> is_candidate_from_evacuation) to the
>>>>>>>>>>> StringDedup class (as private static methods) so that they
>>>>>>>>>>> can be found
>>>>>>>>>>> easily. Static non-class helper functions in cpp files should
>>>>>>>>>>> only be
>>>>>>>>>>> used for stuff that is really unimportant and not relevant to
>>>>>>>>>>> the code
>>>>>>>>>>> imo. However these seem to be _the_ central policy methods
>>>>>>>>>>> for string
>>>>>>>>>>> dedup.
>>>>>>>>>>> So it would be nice to have them mentioned in the class/hpp
>>>>>>>>>>> file and
>>>>>>>>>>> documented there.
>>>>>>>>>>>
>>>>>>>>>>> - facts about the code are repeated, e.g. that the table size
>>>>>>>>>>> must be a
>>>>>>>>>>> power of two. This is nice, but has a few drawbacks: the
>>>>>>>>>>> compiler does
>>>>>>>>>>> not care about comments at all, so it might be useful to put
>>>>>>>>>>> asserts in
>>>>>>>>>>> these places instead, and second, it's just duplication of
>>>>>>>>>>> text with the
>>>>>>>>>>> usual problem of getting out of date.
>>>>>>>>>>>
>>>>>>>>>>> - I am not completely sure why the class has not been called
>>>>>>>>>>> G1StringDedup. It won't work without G1 anyway and calls G1
>>>>>>>>>>> internals
>>>>>>>>>>> all the time. There does not seem to be a way to make it more
>>>>>>>>>>> generic
>>>>>>>>>>> without considerable refactoring. It is fine to keep the name
>>>>>>>>>>> though,
>>>>>>>>>>> just asking. I also know that this feature does not want to
>>>>>>>>>>> be a generic
>>>>>>>>>>> solution.
>>>>>>>>>>>
>>>>>>>>>>> - is it possible to separate out the statistics variables of
>>>>>>>>>>> StringDedupTable into either a separate class instead of
>>>>>>>>>>> using globals
>>>>>>>>>>> or separate them a little from the other unrelated members? (And
>>>>>>>>>>> describe them) This would improve readability.
>>>>>>>>>>>
>>>>>>>>>>> I *think* this is everything from
>>>>>>>>>>> StringDedupTable::_entries_added to
>>>>>>>>>>> _rehash_count? I have no idea without searching for every use
>>>>>>>>>>> and
>>>>>>>>>>> deducing from that.
>>>>>>>>>>>
>>>>>>>>>>> - when implementing the hash table for the StringDedupTable,
>>>>>>>>>>> did you
>>>>>>>>>>> consider trying to reuse one of the existing hashmap
>>>>>>>>>>> implementations?
>>>>>>>>>>> And potentially extend them.
>>>>>>>>>>>
>>>>>>>>>>> Imo we already have way too many ad-hoc implementations of
>>>>>>>>>>> often used
>>>>>>>>>>> data structures (particularly) in G1 and GC code.
>>>>>>>>>>>
>>>>>>>>>>> - please document the properties of the StringDedupTable
>>>>>>>>>>> (type, hashing,
>>>>>>>>>>> automatic resizing behavior, thread safety, ...).
>>>>>>>>>>>
>>>>>>>>>>> - please explain what StringDedupTable/Cache are and what
>>>>>>>>>>> they are used
>>>>>>>>>>> for in context of the feature.
>>>>>>>>>>>
>>>>>>>>>>> - stringdedup.cpp, line 1309: candicate -> candidate
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Thomas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>