RFR(L) 8195099: Low latency hashtable for read-mostly scenarios
Robbin Ehn
robbin.ehn at oracle.com
Thu May 3 08:17:22 UTC 2018
Hi Martin,
On 05/02/2018 07:53 PM, Doerr, Martin wrote:
> Hi Robbin,
>
> thanks for contributing and especially for having quite a few memory barriers already in place.
>
> Yes, C++11 Atomics would be nice.
>
> The current version accesses _next without any compiler barriers so I'm kind of concerned about compilers which tend to reload and reorder accesses. We should double-check if nothing can go wrong when not using volatile.
No, we read it via:
return OrderAccess::load_acquire(&_next);
In store case we trust the implied barriers from CAS:
new_node->set_next(first_at_start);
}
if (bucket->cas_first(new_node, first_at_start)) {
It now actually have a stronger than needed ordering. We do not always need
acquire on load, but volatile semantics is always wrong, we either have stronger
or lesser.
E.g. in internal_remove loading the _next pointer should not need acquire.
After CAS:ing in the lock bit, the chain is stable and we unlock with a
release_store. Loads between the CAS and the release_store my be re-ordered
since the chain is stable.
>
> I guess this code hasn't been tested on weak memory platforms?
> So we should better take a close look at the barriers.
An earlier version have been successfully tested on aarch64. But it did not have
the CAS inserts.
>
> It'd be great if we could use precise ordering semantics for the Atomic::cmpxchg and Atomic::add (which is currently being reviewed as JDK-8202080).
I did on purpose not do any special handling for platforms not following the
documented semantics. My thought was either you would do 8202080 or just add
needed barrier in e.g. Bucket::cas_first.
Btw, I think 8202080 looks good, specially like conservative as default.
With conservative as default on cmpxchg and add for global counter, this should
work on weaker platforms, but not alpha.
/Robbin
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn
> Sent: Mittwoch, 2. Mai 2018 15:57
> To: David Holmes <david.holmes at oracle.com>; Gerard Ziemski <gerard.ziemski at oracle.com>
> Cc: hotspot-dev developers <hotspot-dev at openjdk.java.net>
> Subject: Re: RFR(L) 8195099: Low latency hashtable for read-mostly scenarios
>
> On 2018-05-02 14:40, David Holmes wrote:
>> Not a review, one comment ...
>>
>> On 2/05/2018 6:31 PM, Robbin Ehn wrote:
>>> On 2018-04-30 21:23, Gerard Ziemski wrote:
>>>> #3 How come "_first" doesn't need to be volatile to be used in Atomic::cmpxchg ?
>>>>
>>>> Node* _first;
>>>> ...
>>>> if (Atomic::cmpxchg(node, &_first, expect) == expect) {
>>>
>>> An implicit cast to the same compatible type which is volatile-qualified is okey.
>>
>> Any variable used in a lock-free manner should be declared volatile as general
>> good practice IMHO.
>
> Generally I agree with you. But in this case I didn't find the volatile
> qualifier helpful, it just clobbered the code. I didn't see any readability
> improvements from having essential all lines with Node* also having volatile in
> them.
>
> What I would like is a type that forces the use of the correct ordering, e.g.
> Atomic<Node*, memory_order_acq_rel> _next;
>
> /Robbin
>
>>
>> David
>> -----
>>
>>>>
>>>>
>>>> #4 Inconsistent parameter names - in:
>>>>
>>>> template <typename LOOKUP_FUNC, typename VALUE_FUNC>
>>>> bool get_insert_lazy(Thread* thread, LOOKUP_FUNC& lookup, VALUE_FUNC& val_f,
>>>> bool* grow_hint = NULL) {
>>>> return get_insert_lazy(thread, lookup, val_f, noOp, grow_hint);
>>>> }
>>>>
>>>> We use "val_f" with "_f” suffix, so we should have "lookup_f", not "lookup"
>>>> (many other instances) and "found_f", not "foundf" in
>>>>
>>>> template <typename LOOKUP_FUNC, typename FOUND_FUNC>
>>>> bool get(Thread* thread, LOOKUP_FUNC& lookup, FOUND_FUNC& foundf,
>>>> bool* grow_hint = NULL);
>>>>
>>>
>>> Fixed.
>>>
>>>>
>>>> #5 I wasn't familiar with CRTP, so I read up on it, but I still don't see the
>>>> CRTP in BaseConfig - which is the base class and which is derived? In
>>>> particular I don't see "class Derived : public Base<Derived>" pattern here?
>>>
>>> It's the supplied config that extends the BaseConfig which is the derived.
>>>
>>> In gtest (test_concurrentHashtable.cpp) you will see this:
>>>
>>> // Simplest working CRPT implementation for the hash-table.
>>> struct Pointer : public SimpleTestTable::BaseConfig {
>>>
>>>>
>>>> More to come…
>>>>
>>>>
>>>> btw. I edited the subject of the email slightly by adding the bug number to
>>>> it, so it’s easy to search for.
>>>>
>>>
>>> Thanks, I'll send the update to the original mail.
>>>
>>> /Robbin
>>>
>>>
>>>
>>>>
>>>> cheers
>>>>
>>>>
>>>>> On Apr 26, 2018, at 2:38 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>>>>>
>>>>>
>>>>> Hi all, please review.
>>>>>
>>>>> The lower latency of the new gcs have higher demands on runtime data-structure
>>>>> in terms of latency. This is a concurrent hashtable using the global-counter
>>>>> (8195099).
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~rehn/8195098/v0/webrev/
>>>>>
>>>>> Bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8195098
>>>>>
>>>>> * Readers never blocks or spins.
>>>>> * Inserts do CAS from a read-side, need to re-CAS during re-size/deletion in
>>>>> targeted bucket or insert collision. (inserts are conceptually a reader)
>>>>> * Deletes locks the targeted bucket, all other buckets are free for operations.
>>>>> * Re-size locks one bucket at the time, all other buckets are free for
>>>>> operations.
>>>>>
>>>>> It does concurrent re-size by doubling size and use one more bit from hash.
>>>>> That means a bucket in the old table contains nodes to either one of two
>>>>> buckets
>>>>> in the new table. Each node in the chain is sorted into one of the two buckets
>>>>> with concurrent readers. Shrinking just take two node chains and put them
>>>>> together in the corresponding bucket. To keep track of what is visible to the
>>>>> concurrent readers it's uses a global-counter, needed during re-size and for
>>>>> deletes.
>>>>>
>>>>> A gtest is provided which passes on our platforms, we also have a prototype
>>>>> of the stringtable using this which passes tier 1-5 on our platforms.
>>>>>
>>>>> Various people have pre-reviewed various parts, thanks! And a special thanks to
>>>>> Coleen for a lot of reviewing!
>>>>>
>>>>> Thanks, Robbin
>>>>
More information about the hotspot-dev
mailing list