RFR (S): CR 8004318/JEP 171 Fences intrinsics

Wed Dec 5 08:55:14 PST 2012

Hi Doug,

Thank you for engaging in an honest dialog.

I have, in a previous email, posted an outline for an @Contended alternative. I cannot say that it paints a complete picture, but then I've not really started thinking about this until recently and, as I've mentioned to David, I'm really buried in the GC world at the moment (with adaptive sizing). I will repost a terse version.

1) We need to be able to detect the problem. I mentioned that we can do that with an MSR. IMHO, it would be better to focus some effort on putting MSR access into the serviceability APIs. Thus we could detect (for almost free) when these types of problems arise: not just cache miss/hit ratios due to some pathological variable layout, but also other things such as TLB hit/miss ratios, which in a self-tuning JVM world (something Oracle is putting on the boards for a possible Java 9 delivery) could result in an adjustment to use large pages. Without this type of measurement, introducing this annotation is a purely speculative measure.

2) We need to determine the code which is responsible for the problem. Again, this information is buried in the JVM; we just need to get to it.

3) We need to make the appropriate adjustment to the code (arrange to have the values end up in different cache lines). Again, the JVM may need some rework to get this done, but it's in the best position to make this adjustment. In fact, some of Aleksey's code would/could be helpful in making this happen.
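To make step 3 concrete, here is a rough sketch of what "ending up in different cache lines" looks like when done by hand today. This is not the @Contended implementation and not Aleksey's code. The class and method names (PaddedCounters, PaddedLong, runBoth) are made up for illustration, and the 64-byte cache line is an assumption. The JVM is also free to reorder or drop the unused fields, which is exactly why I think the run time is better placed to do this.

```java
public class PaddedCounters {

    // Hypothetical holder: one hot field followed by seven unused longs.
    // Assumption: 64-byte cache lines. The padding is only a hint; the JVM
    // may lay out fields differently, so separation is not guaranteed.
    static final class PaddedLong {
        volatile long value;
        long p1, p2, p3, p4, p5, p6, p7;
    }

    // Two threads each increment their own counter. Each counter has a
    // single writer, so the final values are exact; the padding only
    // affects how often the two writers invalidate each other's cache line.
    static long[] runBoth(long iterations) {
        final PaddedLong a = new PaddedLong();
        final PaddedLong b = new PaddedLong();
        Thread ta = new Thread(() -> {
            for (long i = 0; i < iterations; i++) a.value++;
        });
        Thread tb = new Thread(() -> {
            for (long i = 0; i < iterations; i++) b.value++;
        });
        ta.start();
        tb.start();
        try {
            ta.join();
            tb.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { a.value, b.value };
    }

    public static void main(String[] args) {
        long[] r = runBoth(1_000_000);
        System.out.println(r[0] + " " + r[1]);
    }
}
```

Whether the padding helps or just wastes cache can only be told by measuring, which is the point of steps 1 and 2.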

I have Java code that runs on Linux and programs the MSRs. Unfortunately, it's in a form that I cannot release here in this forum. If I could, I would. That said, I would be happy to work up a fragment that I could release. Unfortunately, I'm not in a position to do this this week or next. If it's not too late, my paying job will allow me to post code, but only after Dec 15th. I'll happily post code then (or sooner if I can squeeze it in).

With all due respect to those like yourself who have done so much to further the platform, I would say that this can be done without the need to overhaul the JMM. Some people will misuse whatever you give them, so I don't really go for the mothering attitude either. However, I do think that delegating the decision to a time when better information is available has been a brilliant success. In other words, from a performance perspective, it is the run time that has the highest quality information to understand whether an optimization or adjustment to the code is going to make a difference (or not). And in the case of @Contended, I would argue that it is once again the run time that will have the highest quality information needed to understand whether variables should be pushed away from each other, or whether this is simply unneeded overhead that reduces the effectiveness of the CPU caches.

Kind regards,
Kirk

PS, as for the JEP process itself: it's not a very well-known process. I come in contact with dozens of developers in the course of my performance workshops and tuning engagements, and I would say that very few of them have ever heard of a JEP. In fact, I would go further and say that, IME, very few of those one might consider to be luminaries in the industry are aware of the JEP process. So my earlier comment was more to the point that while things may have been collecting dust for a while, I and many others didn't know about it. Now that I do, I'm paying attention (can't speak for the others).

On 2012-12-05, at 5:14 PM, Doug Lea <dl at cs.oswego.edu> wrote:

> On 12/05/12 10:50, Kirk Pepperdine wrote:
> 
>> their applications are fighting with the hardware... That said, I really
>> wished that we had a better... safer way to achieve the same effect than
>> exposing people to unsafe.
>> 
> 
> Of course, we all do. Mere wishing has had the effect of not
> exposing these obviously-needed intrinsics for 11 years now.
> (I regret letting people talk me out of exposing them
> back when I was first involved in the discussions of the
> semantics of c2 acquire/release/volatile membars.) And the past
> decade's worth of discussion, periodically re-raised
> on concurrency-interest list (http://altair.cs.oswego.edu/mailman/listinfo/concurrency-interest) usually boil down to: (1) We
> cannot expose these methods because we cannot spec them
> without overhauling the JMM, and (2) We don't like it
> that some people will misuse them. I'm sure that
> if you had a better solution for a better outcome, you
> would have posted it there :-)
> 
> So, the current @Contended and Fences proposals concede
> these arguments. They do not provide support in Java(tm) the
> language or platform, but are available via an
> unstandardized API on openJDK and any other JVM
> that wants to offer this functionality. My intent
> was that these concessions should then allow this
> support to at least be in place for use now
> inside core libraries, and so that when the more
> controversial aspects are resolved, we don't
> stall yet again on the implementation side.
> 
> -Doug
>