VarHandles on non-int-sized fields and atomic operations
I believe C++ allows anything to be atomic using their atomic<> template. Having the atomic annotation at the declaration site:
- makes it obvious to the human and the compiler that the data must be accessed atomically.
- allows the compiler to lay out the data so that native CPU instructions can access the data efficiently and atomically. So it is likely that atomic<short> will be stored in a 32-bit field, since the CPU is likely to have a 32-bit CAS, but not a 16-bit CAS.
With VarHandles, the declared fields are Ordinary Java Fields, so they are likely to be laid out as with normal fields. As a result, it looks like you can't do a CAS with a VarHandle to a short field. If that's true, then the hardware is intruding into the API. What happens when a platform with only a 64-bit CAS comes along?
Since atomic fields need different field layout from regular fields, it seems to make sense to require that fields which will be accessed via a VarHandle are clearly marked as being "atomic" in some way. With the Unsafe API, we sort-of got this by requiring that fields be volatile (although some users surely cheated here), and because there was no Unsafe.compareAndSwapShort (but there is with VarHandles!)
My original motivation was to be able to replace an AtomicBoolean with a VarHandle to a boolean field.
On 05/23/2016 11:16 PM, Martin Buchholz wrote:
With VarHandles, the declared fields are Ordinary Java Fields, so they are likely to be laid out as with normal fields. As a result, it looks like you can't do a CAS with a VarHandle to a short field. If that's true, then the hardware is intruding into the API. What happens when a platform with only a 64-bit CAS comes along?
Then you will get an exception trying to do the CAS.
Since atomic fields need different field layout from regular fields, it seems to make sense to require that fields which will be accessed via a VarHandle are clearly marked as being "atomic" in some way.
Mark that as "volatile int" :) Because if marking boolean field with some "atomic" quantifier would blow up its storage to at least int, that's what you get in the end anyhow.
With the Unsafe API, we sort-of got this by requiring that fields be volatile (although some users surely cheated here), and because there was no Unsafe.compareAndSwapShort (but there is with VarHandles!)
Notice there is still no magic: we still can do CASes only on ints, longs, references. (This is still a common denominator among all supported platforms).
My original motivation was to be able to replace an AtomicBoolean with a VarHandle to a boolean field.
No can do. AtomicBoolean, alas, should still keep "volatile int". See the relevant discussion about this here: http://mail.openjdk.java.net/pipermail/jmm-dev/2015-August/000193.html Thanks, -Aleksey
On Mon, May 23, 2016 at 3:15 PM, Aleksey Shipilev <aleksey.shipilev@oracle.com> wrote:
Since atomic fields need different field layout from regular fields, it seems to make sense to require that fields which will be accessed via a VarHandle are clearly marked as being "atomic" in some way.
Mark that as "volatile int" :) Because if marking boolean field with some "atomic" quantifier would blow up its storage to at least int, that's what you get in the end anyhow.
But that's violating the abstraction boundary! You can use AtomicBoolean without being aware of whether you are actually generating 1-bit, 8-bit or 32-bit CASes. That's the JVM's job! It's going back to Evil Old C if we start having to encode our booleans using ints, just because of the instructions common on today's CPUs. It's true that users had to do this with Unsafe, but aren't VarHandles supposed to be a public high level replacement? I think we're losing something if any VarHandles to primitive types fail to have a CAS.
On 05/24/2016 01:32 AM, Martin Buchholz wrote:
On Mon, May 23, 2016 at 3:15 PM, Aleksey Shipilev <aleksey.shipilev@oracle.com> wrote:
Since atomic fields need different field layout from regular fields, it seems to make sense to require that fields which will be accessed via a VarHandle are clearly marked as being "atomic" in some way.
Mark that as "volatile int" :) Because if marking boolean field with some "atomic" quantifier would blow up its storage to at least int, that's what you get in the end anyhow.
But that's violating the abstraction boundary!
You can use AtomicBoolean without being aware of whether you are actually generating 1-bit, 8-bit or 32-bit CASes. That's the JVM's job!
Even the JVM is just a program and cannot perform miracles. By the way, what exactly are you winning with a 1-byte field in AtomicBoolean?
java.util.concurrent.atomic.AtomicBoolean object internals:
 OFFSET  SIZE  TYPE  DESCRIPTION            VALUE
      0     4        (object header)        N/A
      4     4        (object header)        N/A
      8     4        (object header)        N/A
     12     4   int  AtomicBoolean.value    0
Instance size: 16 bytes
Making AtomicBoolean.value a boolean would not decrease instance size, because it will still be rounded up to 16 bytes, with a 3-byte alignment shadow.
It's going back to Evil Old C if we start having to encode our booleans using ints, just because of the instructions common on today's CPUs. It's true that users had to do this with Unsafe, but aren't VarHandles supposed to be a public high level replacement?
VarHandles is hardly "high level", pretty much like MethodHandles are not high level. Those are building blocks to be used in library/runtime code. What Unsafe accesses could do, VarHandles can also do (sometimes more) -- without violating correctness and platform reliability.
I think we're losing something if any VarHandles to primitive types fail to have a CAS.
Not "any", int/long/reference VarHandles still do CASes. Note that the current API does not preclude implementing CASes for other types if you can come up with a plausible mechanics of doing so. The thing is, there does not seem to be a plausible fallback strategy when platform cannot do a subword CAS. Or at least I cannot see it. Artisanal Unsafe.compareAndSwapBoolean implementations are welcome :) Thanks, -Aleksey
On Mon, May 23, 2016 at 3:48 PM, Aleksey Shipilev <aleksey.shipilev@oracle.com> wrote:
By the way, what exactly are you winning with 1-byte field in AtomicBoolean?
I'm not trying to replace the int field inside the AtomicBoolean with a boolean - that's an implementation detail. (Although I would take it if I could declare an atomic boolean inside AtomicBoolean and let the JVM choose the best size for the platform. It *is* a small weakness that we need to use the "int" type here in java code) I'm trying to allow regular programmers to declare their primitive fields with the natural Java type and have all the atomic operations available.
Not "any", int/long/reference VarHandles still do CASes. Note that the current API does not preclude implementing CASes for other types if you can come up with a plausible mechanics of doing so.
The thing is, there does not seem to be a plausible fallback strategy when platform cannot do a subword CAS. Or at least I cannot see it. Artisanal Unsafe.compareAndSwapBoolean implementations are welcome :)
As I said in a previous message, you can implement subword CAS using fullword CAS in a loop.
cas8bit(expect, update) {
  for (;;) {
    fullword = atomicRead32();
    if ((fullword & 0xff) != expect) return false;
    if (cas32(fullword, (fullword & ~0xff) | update)) return true;
  }
}
but it would probably be more efficient if a fullword was allocated for the subword field.
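For readers who want to try the loop above, here is a minimal Java sketch. It assumes an AtomicInteger stands in for the enclosing 32-bit word and that the subword of interest is the low byte. LowByteCas and snapshot() are hypothetical names made up for illustration, not part of any JDK API.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: emulates an 8-bit CAS on the low byte of a 32-bit word. */
final class LowByteCas {
    // Assumption: an AtomicInteger stands in for the enclosing 32-bit storage.
    private final AtomicInteger word;

    LowByteCas(int initialWord) {
        word = new AtomicInteger(initialWord);
    }

    /** Strong CAS on bits 0..7; the upper 24 bits are carried over unchanged. */
    boolean cas8bit(byte expect, byte update) {
        for (;;) {
            int fullword = word.get();                     // atomicRead32()
            if ((byte) fullword != expect) {
                return false;                              // the low byte really differs
            }
            int replacement = (fullword & ~0xff) | (update & 0xff);
            if (word.compareAndSet(fullword, replacement)) {
                return true;                               // cas32() succeeded
            }
            // The word changed under us, possibly only in a neighbouring byte: retry.
        }
    }

    int snapshot() {
        return word.get();
    }

    public static void main(String[] args) {
        LowByteCas c = new LowByteCas(0x11223344);
        System.out.println(c.cas8bit((byte) 0x44, (byte) 0x7f));
        System.out.println(Integer.toHexString(c.snapshot()));
    }
}
```

The retry branch is what makes the emulated CAS strong: a failure of the 32-bit CAS caused only by activity on the other three bytes is never reported to the caller.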
On May 23, 2016, at 4:20 PM, Martin Buchholz <martinrb@google.com> wrote:
As I said in a previous message, you can implement subword CAS using fullword CAS in a loop.
cas8bit(expect, update) {
  for (;;) {
    fullword = atomicRead32();
    if ((fullword & 0xff) != expect) return false;
    if (cas32(fullword, (fullword & ~0xff) | update)) return true;
  }
}
Yes, that's the "artisanal" version I would reach for. It doesn't scale well if there is unrelated activity on nearby bytes. But, for that matter, "nearby" can mean "within 64 bytes", which is why we have @Contended for when we really need it. — John
On 05/24/2016 05:43 AM, John Rose wrote:
On May 23, 2016, at 4:20 PM, Martin Buchholz <martinrb@google.com> wrote:
As I said in a previous message, you can implement subword CAS using fullword CAS in a loop.
cas8bit(expect, update) {
  for (;;) {
    fullword = atomicRead32();
    if ((fullword & 0xff) != expect) return false;
    if (cas32(fullword, (fullword & ~0xff) | update)) return true;
  }
}
Yes, stupid me! I was under impression that loops are no-no to emulate strong CAS. But we do loops already with LL/SC...
Yes, that's the "artisanal" version I would reach for. It doesn't scale well if there is unrelated activity on nearby bytes.
Okay, we are exploring it here: https://bugs.openjdk.java.net/browse/JDK-8157726 I was able to intrinsify subword accesses on x86_64, and their performance is on par with int versions. Plain Martin-style Java loops are around 2x slower than direct intrinsics in a few basic tests (I expect them to be even slower on contended cases and/or non-x86 platforms). But first, we need to hook them up to VarHandles (in progress now). Thanks, -Aleksey
On 24 May 2016, at 21:29, Aleksey Shipilev <aleksey.shipilev@oracle.com> wrote:
On 05/24/2016 05:43 AM, John Rose wrote:
On May 23, 2016, at 4:20 PM, Martin Buchholz <martinrb@google.com> wrote:
As I said in a previous message, you can implement subword CAS using fullword CAS in a loop.
cas8bit(expect, update) {
  for (;;) {
    fullword = atomicRead32();
    if ((fullword & 0xff) != expect) return false;
    if (cas32(fullword, (fullword & ~0xff) | update)) return true;
  }
}
Yes, stupid me! I was under impression that loops are no-no to emulate strong CAS. But we do loops already with LL/SC…
Indeed, doh! Martin, many thanks for persisting with this.
Yes, that's the "artisanal" version I would reach for. It doesn't scale well if there is unrelated activity on nearby bytes.
Okay, we are exploring it here: https://bugs.openjdk.java.net/browse/JDK-8157726
I was able to intrinsify subword accesses on x86_64, and their performance is on par with int versions. Plain Martin-style Java loops are around 2x slower than direct intrinsics in a few basic tests (I expect them to be even slower on contended cases and/or non-x86 platforms). But first, we need to hook them up to VarHandles (in progress now).
Nice work! This is looking very promising on x86. Paul.
More high-level observations on low-level operations: We already sort-of have an existing field qualifier for atomic: "volatile" ! It is already the case that e.g. volatile long is atomic while unadorned long is not. But atomics without CAS make us sad, so we're adding them. Also, by analogy, Atomic*FieldUpdaters must refer to a volatile variable. It seems not unreasonable to require that VarHandles also refer to a volatile field. If a field is declared volatile boolean; then the VM should ensure not only that it can be reasonably efficiently updated using ordinary read/write as is already the case but also that it can be reasonably efficiently CAS'ed, and that may mean giving it 32 bits instead of 8 on some platforms. But it would be a VM implementation detail. It would be even nicer if the field qualifier was literally "atomic", but I don't think that is going to happen. The best we can hope for is: "volatile" is how you spell "atomic" in Java.
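The Atomic*FieldUpdater precedent mentioned above can be checked directly. The following is a small sketch, assuming a hypothetical class UpdaterDemo with one volatile and one plain int field. AtomicIntegerFieldUpdater.newUpdater is documented to throw IllegalArgumentException when the field is not a volatile integer type.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

/** Hypothetical demo class; the field names are illustrative only. */
final class UpdaterDemo {
    volatile int counted;   // eligible: volatile int
    int plain;              // not volatile, so newUpdater rejects it

    /** Returns true if a field updater can be created for the named field. */
    static boolean canCreateUpdater(String fieldName) {
        try {
            AtomicIntegerFieldUpdater.newUpdater(UpdaterDemo.class, fieldName);
            return true;
        } catch (IllegalArgumentException e) {
            return false;   // thrown for a non-volatile field
        }
    }

    public static void main(String[] args) {
        System.out.println("counted: " + canCreateUpdater("counted"));
        System.out.println("plain:   " + canCreateUpdater("plain"));

        AtomicIntegerFieldUpdater<UpdaterDemo> updater =
                AtomicIntegerFieldUpdater.newUpdater(UpdaterDemo.class, "counted");
        UpdaterDemo demo = new UpdaterDemo();
        System.out.println(updater.compareAndSet(demo, 0, 1) + " " + demo.counted);
    }
}
```

VarHandle lookup has no such check on the volatile modifier, which is the difference this part of the thread is debating.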
On 25 May 2016, at 01:15, Martin Buchholz <martinrb@google.com> wrote:
More high-level observations on low-level operations:
We already sort-of have an existing field qualifier for atomic: "volatile" ! It is already the case that e.g. volatile long is atomic while unadorned long is not. But atomics without CAS make us sad, so we're adding them. Also, by analogy, Atomic*FieldUpdaters must refer to a volatile variable. It seems not unreasonable to require that VarHandles also refer to a volatile field.
We wanted the flexibility to perform "normal" plain access mixed with other accesses using a VarHandle; of course, that requires very careful use. Furthermore, other operations override the volatile semantics anyway. (Relatedly, there is no qualifier on the components of an array.)
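As an illustration of that flexibility, here is a sketch using the VarHandle API as released in JDK 9 and later. ModesDemo and its field are hypothetical names. The field is deliberately not volatile, and each call site picks its own access mode. The array handle shows the same thing for array components, which cannot carry a qualifier at all.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

/** Hypothetical demo: one non-volatile field, several access modes. */
final class ModesDemo {
    int state;   // deliberately not declared volatile

    static final VarHandle STATE;
    static final VarHandle INT_ELEMENT =
            MethodHandles.arrayElementVarHandle(int[].class);

    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(ModesDemo.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {
        ModesDemo d = new ModesDemo();
        d.state = 1;                                      // ordinary plain write
        int plain = (int) STATE.get(d);                   // plain read through the handle
        STATE.setRelease(d, 2);                           // release write
        int acquired = (int) STATE.getAcquire(d);         // acquire read
        boolean swapped = STATE.compareAndSet(d, 2, 3);   // CAS with volatile semantics

        int[] slots = new int[4];
        INT_ELEMENT.setVolatile(slots, 1, 7);             // volatile write to an element
        boolean elementSwapped = INT_ELEMENT.compareAndSet(slots, 1, 7, 8);

        System.out.println(plain + " " + acquired + " " + swapped + " "
                + d.state + " " + slots[1] + " " + elementSwapped);
    }
}
```

The ordering strength lives at the call site rather than at the declaration, so mixing modes on one field is possible but needs the careful use described above.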
If a field is declared volatile boolean; then the VM should ensure not only that it can be reasonably efficiently updated using ordinary read/write as is already the case but also that it can be reasonably efficiently CAS'ed, and that may mean giving it 32 bits instead of 8 on some platforms. But it would be a VM implementation detail.
That flexibility may be required for SPARC [*], though I don't know how much work would be required to support such platform-specific object layouts.
It would be even nicer if the field qualifier was literally "atomic", but I don't think that is going to happen. The best we can hope for is:
"volatile" is how you spell "atomic" in Java.
That is a reasonable assumption under the circumstances, although I don't like the way volatile conflates atomicity and memory ordering guarantees, and arguably what you propose adds to the conflation, since int field access is already atomic.
I would still prefer the flexibility of not requiring a field covered by a VarHandle to be declared as such, even if volatile is conflated further (as mostly an implementation detail) to imply efficient CAS operations can be performed, perhaps at the expense of using more memory.
Aleksey did some analysis to indicate we might be able to achieve access atomicity (not conflated with being able to perform an efficient CAS) by default without qualification for all types: http://shipilev.net/blog/2014/all-accesses-are-atomic/ and you can even use an experimental flag -XX:+AlwaysAtomicAccesses and try it out.
Paul.
[*] IIUC SPARC has just 32/64-bit CAS operations. ARM has byte/short LL/SC instructions.
On Wed, May 25, 2016 at 12:53 AM, Paul Sandoz <paul.sandoz@oracle.com> wrote:
On 25 May 2016, at 01:15, Martin Buchholz <martinrb@google.com> wrote:
We already sort-of have an existing field qualifier for atomic: "volatile" ! It is already the case that e.g. volatile long is atomic while unadorned long is not. But atomics without CAS make us sad, so we're adding them. Also, by analogy, Atomic*FieldUpdaters must refer to a volatile variable. It seems not unreasonable to require that VarHandles also refer to a volatile field.
We wanted the flexibility to perform "normal" plain access mixed with other accesses using a VarHandle; of course, that requires very careful use. Furthermore, other operations override the volatile semantics anyway.
A*FU also provides API to weaken the normal semantics of volatile, e.g. (the poorly named and specified) lazySet. So it is already the case that there is no global sequentially consistent order in Java for all memory operations on fields marked volatile! (It's true that there is no "volatile" for array elements.)
The dynamic nature of VarHandle bothers me a little. Static analysis tools cannot discover with certainty which fields are used with atomic operations, even though in practice most uses of VarHandles will be to fields declared "nearby".
On Wed, May 25, 2016 at 12:53 AM, Paul Sandoz <paul.sandoz@oracle.com> wrote:
Aleksey did some analysis to indicate we might be able to achieve access atomicity (not conflated with being able to perform an efficient CAS) by default without qualification for all types:
I don't do "mobile Java", but I suspect platforms without 64-bit atomic instructions are still important.
and you can even use an experimental flag -XX:+AlwaysAtomicAccesses and try it out.
We should imagine that Value Types are coming soon, and what we're developing now will need to fit into that world. We will have value types that are too small and too big for machine CAS instructions. The ones that are too big will be a bigger problem (need to use a lock!) than the ones that are too small (need to use over-sized CAS in a loop, or reserve more space)
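To make the "too big" case concrete, here is a rough sketch of a lock-based compare-and-set over a 128-bit value. Java has no value types yet, so an ordinary class with two long fields stands in for one. WidePair and all of its members are hypothetical and say nothing about how a VM would actually implement this.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical 128-bit value, too wide for a single machine CAS on most platforms. */
final class WidePair {
    private final ReentrantLock lock = new ReentrantLock();
    private long hi;
    private long lo;

    /** Lock-based stand-in for a CAS over both halves at once. */
    boolean compareAndSet(long expectHi, long expectLo, long newHi, long newLo) {
        lock.lock();
        try {
            if (hi != expectHi || lo != expectLo) {
                return false;   // some other update got there first
            }
            hi = newHi;
            lo = newLo;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Reads both halves under the same lock so the pair is never torn. */
    long[] get() {
        lock.lock();
        try {
            return new long[] {hi, lo};
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        WidePair p = new WidePair();
        System.out.println(p.compareAndSet(0L, 0L, 1L, 2L));
        System.out.println(p.get()[0] + " " + p.get()[1]);
    }
}
```

Every read has to take the same lock, which is why the oversized case costs more than the undersized one: it affects plain reads as well as CAS.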
On 25/05/2016 5:29 AM, Aleksey Shipilev wrote:
On 05/24/2016 05:43 AM, John Rose wrote:
On May 23, 2016, at 4:20 PM, Martin Buchholz <martinrb@google.com> wrote:
As I said in a previous message, you can implement subword CAS using fullword CAS in a loop.
cas8bit(expect, update) {
  for (;;) {
    fullword = atomicRead32();
    if ((fullword & 0xff) != expect) return false;
    if (cas32(fullword, (fullword & ~0xff) | update)) return true;
  }
}
Yes, stupid me! I was under impression that loops are no-no to emulate strong CAS. But we do loops already with LL/SC...
The VM also does it for jbyte variant.
Yes, that's the "artisanal" version I would reach for. It doesn't scale well if there is unrelated activity on nearby bytes.
Okay, we are exploring it here: https://bugs.openjdk.java.net/browse/JDK-8157726
Don't overlook we are past FC for hotspot so this will need approval before it can be pushed. David -----
I was able to intrinsify subword accesses on x86_64, and their performance is on par with int versions. Plain Martin-style Java loops are around 2x slower than direct intrinsics in a few basic tests (I expect them to be even slower on contended cases and/or non-x86 platforms). But first, we need to hook them up to VarHandles (in progress now).
Thanks, -Aleksey
On 25/05/2016 6:43 AM, David Holmes wrote:
On 25/05/2016 5:29 AM, Aleksey Shipilev wrote:
On 05/24/2016 05:43 AM, John Rose wrote:
On May 23, 2016, at 4:20 PM, Martin Buchholz <martinrb@google.com> wrote:
As I said in a previous message, you can implement subword CAS using fullword CAS in a loop.
cas8bit(expect, update) {
  for (;;) {
    fullword = atomicRead32();
    if ((fullword & 0xff) != expect) return false;
    if (cas32(fullword, (fullword & ~0xff) | update)) return true;
  }
}
Yes, stupid me! I was under impression that loops are no-no to emulate strong CAS. But we do loops already with LL/SC...
The VM also does it for jbyte variant.
Yes, that's the "artisanal" version I would reach for. It doesn't scale well if there is unrelated activity on nearby bytes.
Okay, we are exploring it here: https://bugs.openjdk.java.net/browse/JDK-8157726
Don't overlook we are past FC for hotspot so this will need approval before it can be pushed.
Never mind I see the JEP 193 exemption - though it is a bit odd to see an exception for a JEP that is already marked integrated. David
David -----
I was able to intrinsify subword accesses on x86_64, and their performance is on par with int versions. Plain Martin-style Java loops are around 2x slower than direct intrinsics in a few basic tests (I expect them to be even slower on contended cases and/or non-x86 platforms). But first, we need to hook them up to VarHandles (in progress now).
Thanks, -Aleksey
On 25 May 2016, at 01:14, David Holmes <david.holmes@oracle.com> wrote:
Yes, that's the "artisanal" version I would reach for. It doesn't scale well if there is unrelated activity on nearby bytes.
Okay, we are exploring it here: https://bugs.openjdk.java.net/browse/JDK-8157726
Don't overlook we are past FC for hotspot so this will need approval before it can be pushed.
Never mind I see the JEP 193 exemption - though it is a bit odd to see an exception for a JEP that is already marked integrated.
Because the main feature was integrated into master. The exception is primarily for finishing up some of the stress testing, which can proceed beyond the integration of the feature. In the interim new API enhancements have arrived, and may arrive in the future especially given the long soak time. It is not unusual to process additional enhancements after a JEP is integrated or completes, but it certainly gets more complicated post FC! Paul.
On Mon, May 23, 2016 at 7:43 PM, John Rose <john.r.rose@oracle.com> wrote:
It doesn't scale well if there is unrelated activity on nearby bytes. But, for that matter, "nearby" can mean "within 64 bytes", which is why we have @Contended for when we really need it.
Because "nearby" means "within 64 bytes", I think it's a non-issue for small atomic value types. There is no significant difference between atomic and plain fields. --- As for the storage for a boolean, 1-bit, 8-bit and 32-bit are all plausible choices for VMs today. 1-bit storage is most compact of course, but is rejected as too expensive because modern machines don't have instructions to modify single bits, and because sequences of adjacent booleans are rare. But note that we implement BitSet using a long[] kludge because of the lack of a compact boolean[] with 1-bit/boolean.
On 24/05/2016 9:20 AM, Martin Buchholz wrote:
On Mon, May 23, 2016 at 3:48 PM, Aleksey Shipilev <aleksey.shipilev@oracle.com> wrote:
By the way, what exactly are you winning with 1-byte field in AtomicBoolean?
I'm not trying to replace the int field inside the AtomicBoolean with a boolean - that's an implementation detail. (Although I would take it if I could declare an atomic boolean inside AtomicBoolean and let the JVM choose the best size for the platform. It *is* a small weakness that we need to use the "int" type here in java code)
I'm trying to allow regular programmers to declare their primitive fields with the natural Java type and have all the atomic operations available.
Not "any", int/long/reference VarHandles still do CASes. Note that the current API does not preclude implementing CASes for other types if you can come up with a plausible mechanics of doing so.
The thing is, there does not seem to be a plausible fallback strategy when platform cannot do a subword CAS. Or at least I cannot see it. Artisanal Unsafe.compareAndSwapBoolean implementations are welcome :)
As I said in a previous message, you can implement subword CAS using fullword CAS in a loop.
cas8bit(expect, update) {
  for (;;) {
    fullword = atomicRead32();
    if ((fullword & 0xff) != expect) return false;
    if (cas32(fullword, (fullword & ~0xff) | update)) return true;
  }
}
but it would probably be more efficient if a fullword was allocated for the subword field.
The above will only work if the subword field is suitably aligned within the word, i.e. atomicRead32() needs to know the address of the subword of interest.
I don't see the VarHandle situation as being any different from the Atomic*FieldUpdater one. The practicalities of implementation limitations shaped the API so that we don't give the illusion of delivering something we aren't.
My only problem with VarHandles is that I can't see anything that defines when the various AccessModes are unsupported. ??
Cheers, David
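The alignment concern can be made concrete with a sketch that derives the enclosing word and the byte's position from an index. It assumes a hypothetical PackedBytes class that stores bytes four to an int in an AtomicIntegerArray, with lane 0 as the least significant byte. A real VM would work from field addresses and the platform's byte order instead.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Hypothetical byte store backed by 32-bit words, with an emulated per-byte CAS. */
final class PackedBytes {
    private final AtomicIntegerArray words;

    PackedBytes(int byteCount) {
        words = new AtomicIntegerArray((byteCount + 3) >>> 2);
    }

    byte get(int index) {
        return (byte) (words.get(index >>> 2) >>> shiftOf(index));
    }

    boolean compareAndSet(int index, byte expect, byte update) {
        int wordIndex = index >>> 2;      // which aligned 32-bit word holds the byte
        int shift = shiftOf(index);       // where the byte sits inside that word
        int mask = 0xff << shift;         // selects exactly that byte
        for (;;) {
            int fullword = words.get(wordIndex);
            if ((byte) (fullword >>> shift) != expect) {
                return false;
            }
            int replacement = (fullword & ~mask) | ((update & 0xff) << shift);
            if (words.compareAndSet(wordIndex, fullword, replacement)) {
                return true;
            }
            // Interference from a neighbouring byte in the same word: retry.
        }
    }

    // Assumption: lane 0 is the least significant byte of its word.
    private static int shiftOf(int index) {
        return (index & 3) << 3;
    }

    public static void main(String[] args) {
        PackedBytes bytes = new PackedBytes(6);
        System.out.println(bytes.compareAndSet(5, (byte) 0, (byte) -1));
        System.out.println(bytes.get(5) + " " + bytes.get(4));
    }
}
```

Both the word index and the shift come from the byte's position, so the emulation only works when the byte never straddles two words. That holds trivially for single bytes and requires natural alignment for shorts.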
Expansion ... On 24/05/2016 4:50 PM, David Holmes wrote:
On 24/05/2016 9:20 AM, Martin Buchholz wrote:
On Mon, May 23, 2016 at 3:48 PM, Aleksey Shipilev <aleksey.shipilev@oracle.com> wrote:
By the way, what exactly are you winning with 1-byte field in AtomicBoolean?
I'm not trying to replace the int field inside the AtomicBoolean with a boolean - that's an implementation detail. (Although I would take it if I could declare an atomic boolean inside AtomicBoolean and let the JVM choose the best size for the platform. It *is* a small weakness that we need to use the "int" type here in java code)
I'm trying to allow regular programmers to declare their primitive fields with the natural Java type and have all the atomic operations available.
Not "any", int/long/reference VarHandles still do CASes. Note that the current API does not preclude implementing CASes for other types if you can come up with a plausible mechanics of doing so.
The thing is, there does not seem to be a plausible fallback strategy when platform cannot do a subword CAS. Or at least I cannot see it. Artisanal Unsafe.compareAndSwapBoolean implementations are welcome :)
As I said in a previous message, you can implement subword CAS using fullword CAS in a loop.
cas8bit(expect, update) {
  for (;;) {
    fullword = atomicRead32();
    if ((fullword & 0xff) != expect) return false;
    if (cas32(fullword, (fullword & ~0xff) | update)) return true;
  }
}
but it would probably be more efficient if a fullword was allocated for the subword field.
The above will only work if the subword field is suitably aligned within the word ie atomicRead32() needs to know the address of the subword of interest.
... and you need to know how to form the correct mask. David -----
I don't see the VarHandle situation as being any different from the Atomic*FieldUpdater one. The practicalities of implementation limitations shaped the API so that we don't give the illusion of delivering something we aren't. My only problem with VarHandles is that I can't see anything that defines when the various AccessModes are unsupported. ??
Cheers, David
On 24 May 2016, at 08:50, David Holmes <david.holmes@oracle.com> wrote: My only problem with VarHandles is that I can't see anything that defines when the various AccessModes are unsupported. ??
Each producer of a VarHandle (factory method) specifies the supported access modes. The documentation on VarHandle categorises the access modes and the factory method refers to those categories. The method VarHandle.isAccessModeSupported can be used at runtime to query if the access mode is supported. If we choose to support subword CAS that would bring some clarity to this situation. A concern is it might be misleading in terms of performance expectations. Paul.
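A short sketch of the runtime query, using the API as it appears in released JDKs (9 and later). AccessModeProbe and its fields are hypothetical. No expected output is listed because the set of supported modes depends on the JDK in use, which is the point of the query.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.VarHandle.AccessMode;

/** Hypothetical probe class; its fields exist only to be queried. */
final class AccessModeProbe {
    boolean flag;
    final int constant = 42;

    /** Asks the VarHandle for the named field whether it supports the given mode. */
    static boolean supports(String field, Class<?> type, AccessMode mode) {
        try {
            VarHandle handle = MethodHandles.lookup()
                    .findVarHandle(AccessModeProbe.class, field, type);
            return handle.isAccessModeSupported(mode);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        AccessMode[] modes = {
            AccessMode.GET, AccessMode.SET,
            AccessMode.COMPARE_AND_SET, AccessMode.GET_AND_ADD
        };
        for (AccessMode mode : modes) {
            System.out.println(mode
                    + " flag=" + supports("flag", boolean.class, mode)
                    + " constant=" + supports("constant", int.class, mode));
        }
    }
}
```

Calling an unsupported mode throws UnsupportedOperationException, so a library that must run on several JDK versions can use this check to choose a fallback up front.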
Another way to look at it: We have an existing field boolean closed; which should be updated using CAS for correctness, but currently is not. I should not have to change the type of the field to int just so that I can get something that is CASable.
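What this request looks like in code is sketched below. Resource is a hypothetical class. The sketch assumes a JDK whose VarHandle implementation supports compareAndSet on boolean fields, which released JDK 9 and later document for non-final fields. At the time of this message it was not supported.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

/** Hypothetical resource; only the close-once logic matters here. */
final class Resource {
    private volatile boolean closed;   // keeps its natural Java type
    private int cleanups;              // written only by the thread that wins the CAS

    private static final VarHandle CLOSED;

    static {
        try {
            CLOSED = MethodHandles.lookup()
                    .findVarHandle(Resource.class, "closed", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Runs the cleanup at most once, however many times close() is called. */
    void close() {
        if (CLOSED.compareAndSet(this, false, true)) {
            cleanups++;   // stand-in for releasing real resources
        }
    }

    boolean isClosed() {
        return closed;
    }

    int cleanups() {
        return cleanups;
    }

    public static void main(String[] args) {
        Resource r = new Resource();
        r.close();
        r.close();
        System.out.println(r.isClosed() + " " + r.cleanups());
    }
}
```

The field stays a boolean in the source; whether the VM widens its storage or emulates the CAS on the enclosing word is left to the implementation.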
On 24 May 2016, at 02:55, Martin Buchholz <martinrb@google.com> wrote:
Another way to look at it: We have an existing field
boolean closed;
which should be updated using CAS for correctness, but currently is not. I should not have to change the type of the field to int just so that I can get something that is CASable.
I think you would have to qualify that field in such a manner to inform the VM how best to lay out the memory for expected operations on that field. At the moment we don't have such a qualifier: "this is a boolean but I don't care about size, please use whatever bit size and alignment is needed so I can perform hardware-based atomic operations on it".
As Aleksey said, VarHandle is low-level. It can replace the use of Unsafe within AtomicBoolean. If you currently want to achieve what AtomicBoolean does without the boxing you need to use an int field and operate on it directly.
It seems quite possible to expose VarHandle field views covering an int for the shorter types boolean, byte, char and short. Internally those types are widened to int for basic signatures, so I think it's just a matter of rewiring to the VH int field support with adjusted method types. (The boolean case may require a little more care given the disparity between Java source and bytecode.)
However, I am not sure that is very helpful. I would be reluctant at this point (and we were already reluctant) to investigate sub-word atomic ops, and the implications that might have for guarantees/locking/tearing/performance etc.
Paul.
On Mon, May 23, 2016 at 1:16 PM, Martin Buchholz <martinrb@google.com> wrote:
So it is likely that atomic<short> will be stored in a 32-bit field, since the cpu is likely to have a 32-bit cas, but not 16-bit cas.
So apparently you can still store your atomic<short> in 16 bits, by implementing strong CAS and atomic assignment using read+mask+cas on enclosing 32 bits in a loop. But if you knew ahead of time that your short was going to be atomic, you might want to give it those extra 16 bits anyways?
participants (5)
- Aleksey Shipilev
- David Holmes
- John Rose
- Martin Buchholz
- Paul Sandoz