hg: zgc/zgc: 208 new changesets
Stuart Monteith
stuart.monteith at linaro.org
Mon May 28 16:42:02 UTC 2018
Thanks for taking a look Erik.
On 25 May 2018 at 19:21, Erik Osterlund <erik.osterlund at oracle.com> wrote:
> Hi,
>
>> On 25 May 2018, at 16:35, Stuart Monteith <stuart.monteith at linaro.org> wrote:
>>
>> Hello,
>> I've continued to work on ZGC on aarch64. It isn't quite working
>> yet, and here are my latest patches:
>>
>> http://cr.openjdk.java.net/~smonteith/zgc/webrev-zgc-1/
>>
>> It appears to work ok using the Interpreter, it mostly works with C1,
>> and C2 seems to have issues in code generation.
>>
>> Some observations:
>> 1. The code is a mix of the SPARC and the x86 port. The os_cpu code
>> is essentially the linux_x86 code, but with the multimapping removed.
>
> On that note, it seems like the backing file stuff in your patch could be removed.
>
Per has said the same. I'll bite the bullet and remove it.
>> The aarch64 architecture allows the top 8 bits of pointers to be
>> ignored. Provided we don't need the tags to be in signal contexts, we
>> should be ok.
>
> Sounds exactly like SPARC. The top byte was used. I’m glad you could use that as a template for the AArch64 port.
The only difference really is that it is a fixed 8 bits. One change
I'll do is to shift the 4 bits we use to the end, in case that number
gets reduced.
>
>> 2. The load barriers are tragically unoptimised - I'm essentially
>> spilling all registers on the slow path. The load barriers can be
>> called in situations where the scratch registers are active, so some
>> saving is required. I've not fully considered the x86
>> register-specific stubs either.
>
> On that note, I noticed that in the barrier set assembler, you get a tmp1 register passed in. But it is not used, and instead rscratch2 is used (unless it has a collision with dst). Perhaps tmp1 could be used instead. Oh and maybe load the bad mask from rthread instead of ExternalAddress to make it position independent.
This requires more investigation on my part. Using the tmp register a
few revisions back was causing an issue in the interpreter - t turned
out the temporary register was not as available as I'd like. Also, tmp
is often set to noreg, and we need to find a useful register.
I've observed using the badmask from rthread - that should improve the
code in most cases, so I'll implement that next.
>
>> 3. The aarch64 port was written using 48-bit literal oops, which
>> strips off the colours in compiled code. I'm currently working on a
>> patch to enable 64-bit literals, this is a separate patch:
>> http://cr.openjdk.java.net/~smonteith/zgc/webrev-oop64-1/
>> Having 64-bit literal oops will help allow the 52-bit virtual
>> address space in future Arm architectures to be supported.
>> 4. The 64-bit literals don't work fully with G1GC and C2 - the gc
>> benchmark I use for testing passes with C1. 64-bit literal oops are
>> only enabled with -XX:+Use64BitLiteralOops - +UseZGC doesn't enable it
>> yet.
>
> I am curious if you could just materialize the oops without the colours instead. They will just be masked off by the HW anyway and are invariantly always good, and do not use load barriers. Although that would probably make comparisons and stores with such literals involved a bit awkward. So probably best to support 64 bit oops instead anyway.
I was hoping the former could be done, but I'll go with the latter
from now, as there are other benefits, namely support for 52-bit
addressing.
>
>> 5. Some of the constants that need to change are in enums - so the
>> Use64BitLiteralOops option has to be set, otherwise you'll get a
>> horrible mix of 48-bit constants and some 64-bit calculations.
>> 6. This is based on today's code on the ZGC branch. I've not spent
>> much time on the C2 barriers as of yet, but it seems no worse than
>> before.
>
> You might want to use the bad bit C2 node to map it against the rheapbase register if you get that working. Oh, and you don’t need the CompareAndSwap2I matching rules any longer; it was only part of a medium-slow path that has since long been removed.
Thanks - have a look at the AddrBadBitNode and I'll remove the
CompareAndSwap2I rules.
>
>> 7. It is assumed that the literal oops aren't patched outside of
>> safepoints. I'm aware this is an assumption that needs to be checked.
>> As constants are spread over 4 instructions (movz, movk, movk, movk),
>> then we need to look at another scheme.
>
> Today that should be true. We walk the code heap in safepoints only. The plan for concurrent class unloading going forward is to use nmethod entry barriers (triggered upon first entry into an nmethod after the mark start pause to mark objects and make the oops good) that patches the oops concurrently.
That's good to know. I presume the entry barriers would gate the entry
and fixup to a single thread, which would then do the patching before
issuing the correct barriers and letting the threads continue.
>
>> 8. Unlike SPARC and x86, AArch64 has a weak memory model - I haven't
>> considered yet when
>
> What could possibly go wrong...
Nothing, it'll all be OK in the end :-)
>
>> 9. SRDM - those are my initials - I've used them to highlight to
>> myself things that need done.
>>
>> This is all very much a work in progress, but I would appreciate people's input.
>
> Thank you for taking ZGC to AArch64. And nice to see more use of hardware VA masking. :)
>
Thanks, it'll be exciting to get it working, and working well.
> Thanks,
> /Erik
>
>> Thanks,
>> Stuart
>>
>>
>>> On 16 April 2018 at 20:47, Per Liden <per.liden at oracle.com> wrote:
>>> Hi Stuart,
>>>
>>>> On 04/16/2018 04:46 PM, Stuart Monteith wrote:
>>>>
>>>> Thanks for that Per - I do appreciate being able to rebase my aarch64
>>>> work on top of your periodic drops.
>>>
>>>
>>> Cool. As you might have noticed, the ZGC repo now includes the latest
>>> BarrierSetAssembler changes that Erik Österlund has been working on. This
>>> greatly simplifies the task of adding CPU-specific load barriers needed for
>>> the interpreter.
>>>
>>>>
>>>> I've gotten to the point where I am running with a slowdebug build running
>>>> with:
>>>> -XX:+VerifyOops -XX:+ZVerifyForwarding -XX:+ZVerifyMarking -Xint
>>>>
>>>> The benchmark I'm running is perhaps cruel and unusual, but even with
>>>> the VM pinned to one CPU, there is an issue with verify_oop getting a
>>>> bad address (0x8).
>>>
>>>
>>> Please note that -XX:+VerifyOops has been broken in jdk/hs (and therefore
>>> also in zgc/zgc), so watch out a bit there. There's a bug for that here:
>>> https://bugs.openjdk.java.net/browse/JDK-8187078
>>> The problem you're running into might be real, just saying don't blindly
>>> trust -XX:+VerifyOops at the moment.
>>>
>>> Would you mind uploading your current patch to cr.openjdk.java.net? I don't
>>> have a aarch64 to test/debug on, but it would be interesting to have a look
>>> anyway.
>>>
>>> cheers,
>>> Per
>>>
>>>>
>>>> BR,
>>>> Stuart
>>>>
>>>>> On 16 April 2018 at 15:02, Per Liden <per.liden at oracle.com> wrote:
>>>>>
>>>>> FYI, this rebase brings the ZGC repo in sync with jdk/hs (as it looked
>>>>> mid
>>>>> Thursday last week). As jdk/hs is closing we'll be shifting over to
>>>>> jdk/jdk
>>>>> as upstream.
>>>>>
>>>>> cheers,
>>>>> Per
>>>>>
>>>>>> On 04/16/2018 03:48 PM, per.liden at oracle.com wrote:
>>>>>>
>>>>>>
>>>>>> Changeset: 2520a95cddf7
>>>>>>-8 Snip
More information about the zgc-dev
mailing list