RFR: 8067790: Better support for native implementations of Atomic::cmpxchg

Fri Jan 9 02:29:27 UTC 2015

Hi Erik,

I don't want to get into it on this thread but please start a new thread 
on the OrderAccess changes you are proposing. Bear in mind our official 
compilers are never the most recent (by the time we ship) and we're 
unlikely to be able to use C++11 specifics in the near term, even if we 
wanted too (which we may not).

Thanks,
David

On 8/01/2015 11:12 PM, Erik Österlund wrote:
> Hi Coleen,
>
>> On 08 Jan 2015, at 06:36, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
>>
>> I'm sorry for coming in late to this thread and not agreeing with the code.  It's really hard to have these discussions remotely and as a distraction to other work.
>
> I completely understand, no worries.
>
>> I don't think those of us paying loose attention realized the extent of new template features were coming into the code as part of this clean up.   One webrev I saw had a small template trait (which needed explaining to me at the time).  I think using this particular clean up as an opportunity to add these bonus templates is not something that we want to do.
>
> As pointed out by Kim earlier, the webrev massively exaggerates “the extent”. It’s really only ~100 LoC template stuff with a bunch of copyright boilerplate and testing, which are not negative maintenance overheads (we can all agree it’s a bad idea to remove testing and bunch everything together in one file to make it look less extensive right?).
>
> Actually this whole code size thing has been a bit confusing. First Kim said it looked rather extensive too. So I made a code minimal version (20 LoC only to get the job done but less reusability) and a reuse max version (~100 LoC). Then after looking closer, Kim concluded code size probably was not a big issue after closer inspection (copyrights, testing). So I dropped the code minimum version after that was ruled out as an issue. And now other voices are raised again about the perceived extensiveness of the solution.
>
>> If you're adding something general to utilities, I argue that it should be well understood so that it gets the most utilization.  If such a mechanism is needed for something larger like automatic closure specialization (not sure what that is) then this may be greater motivation.  It depends on the benefits.
>
> I thought the decision was already made to not go ahead with this.
>
> But if you want to know, I can give a few examples of things that benefits from the proposed template infrastructure with brief descriptions. The uninterested reader can skip ahead. ;)
>
> ** BEGIN BENEFITS **
>
> 1) Automatic Closure Specialization: In the GC code, OopClosures and ExtendedOopClosures do arbitrary stuff with oops. It’s a neat abstraction but since the stuff to be done is arbitrary, it results in a virtual call. To work around this, mad people driven by dark forces invented a manual closure specialization mechanism to remove the virtual calls to the closures. It comes with a threefold of problems: 1) It requires closures to be manually added to a macro to be specialized (and hence get rid of the virtual call), 2) This technique fundamentally cannot handle composite closure types (closure filtering oop then calling another closure). It loses type information and knowledge of the composite structure is lost. 3) It smells. Using template metaprogramming (and some of the traits in this change) all those problems can be resolved: automatic specialization without manual labour, full type awareness meaning composite closure types can be specialized too, much cleaner, and we won’t h
ave to define both do_oop and do_oop_nv all over the place. No problem for me to add stuff needed here at a later time though.
>
> 2) OrderAccess refactoring: OrderAccess has a number of problems in terms of maintainability and correctness, and I’m currently cleaning it up. There are two main issues, but I think they are related. (I suspect correctness is inconsistent due to poor maintainability)
>
> 2.a) OrderAccess Maintainability: Different concerns are now joined to a big blob: Which barrier to use (acquire, release, fence) and where to put it w.r.t. to memory accesses, how to implement those barriers, how to make atomic memory accesses, how to specialize more efficient variants are all different concerns, now flattened out to be done over and over again for each platform even though the behaviour is very general. I intend to move everything except specialization and barrier implementation to shared code and separate their concerns, i.e. how to make atomic memory accesses and where to put the barriers is the same for everyone except the specialized variants. This removes most of the code for most of the individual platforms and makes it easier to reason about the different concerns in isolation without changing stuff all over the place. To still allow specializations with inline assembly that deviate from the norm, this conditional class selection is once again needed and 
could benefit from the exact same pattern I intended to push here for specialization and generalization.
>
> 2.b) OrderAccess Correctness: Correctness concerns are inconsistent over different platforms and dangerous which is my second issue with OrderAccess. Specifically, for TSO architectures like SPARC and x86, OrderAccess relies on C++ volatile for acquire release semantics, which in most cases is wrong. In every compiler I know of except MSVC 2008, this can not be relied upon as the C++ standard section 1.9 is vague whether volatiles may be reordered w.r.t. non-volatiles and has been interpreted differently by compiler implementors. It only really promises reordering constraints between volatiles which is not enough for acquire release semantics. As a matter of fact, GCC explicitly states they may be arbitrarily reordered at will, and therefore we need compiler barriers to prevent this. This problem was observed on linux and fixed by Mikael, on Linux. But the exact same problem (relying on volatile to have acquire release semantics) still lurks around on the other TSO architectures, 
waiting for the right moment to strike. With this refactoring, correcting such an issue (which I also intend to do because I think the current implementation is flawed) can be easily done with 3 LoC per platform - the barrier selection for memory accesses, instead of for each type of memory access on nearly every TSO platform. This will also make a later transition to C++11/14 easier when compilers are ready, also with only a few LoC.
>
> So in summary the provided metatemplating stuff could help automating closure specialization, it could help generalize/specialize Atomics (in review now), and it could help generalize/specialize OrderedAccess which IMO needs refactoring. There are also some others on the top of my head as well like removing the fence in G1’s post reference write barrier using asymmetric dekker synchronization when the platform supports it using the same reusable generalization/specialization pattern specifying which fencing mechanism to use, if we decide we want to do that, and potentially using it as a straight forward pattern for building commercial variation points to the open source code without affecting the open source code base. And Kim had some additional uses too.
>
> Of course, #2 can use conditional #define too to get the job done which is what I’m doing now since this idea was not appreciated in the end, just saying there are more uses as you asked.
>
> ** END BENEFITS **
>
> I did not elect to go for a reusable trait variant only because metaprogramming is fun. I genuinely think they are powerful tools that we could benefit from in different areas in runtime and gc. :)
>
>> Generally, if it is a clean solution to a bunch of problems, I'd be all for it.  I'd also like it to be as close to standard library code as possible and I didn't get the sense of that from my reading of the discussion.
>
> I disagreed with Kim at one point (details not necessary I think) about the necessity for compliance with standard library code at a specific point, but that has already been fixed in favour of maximum compliance, which in the end everyone were happy with.
>
>> This is very cool stuff, but we need to proceed with caution and make sure we're all up to speed with the new C++ world and can use it effectively.
>
> I think this looks like a chicken and egg problem here. We need the tools to be available to learn how to use them and integrate them to our work flow. But we won’t make the tools available unless we know how to use them in our work flow. :/
>
>> Too complicated is subjective, certainly, but closing this issue and not adding this code for atomics is my preference.
>
> 2 reviewers accepted (Volker and Paul), 2 reviewers rejected (you I take it, and David), 1 neutral (Kim) but willing to accept after an argument about whether the specialized and generalized classes should be connected by inheritance by contract (decided to remove the check and with it some traits. I'm in the middle, optimistic about the solution but confused as to how this discussion got so long it broke my email client. ;)
>
> I am fine with this decision. I never imagined this (~100 LoC metaprogramming) would lead to such a long discussion. If I knew the extent of the discussion, I would never have suggested it. I’m perfectly fine with leaving this idea behind if opinions are a bit split on this one.
>
> Thanks,
> /Erik
>