From simone.bordet at gmail.com Thu Apr 4 07:04:15 2019 From: simone.bordet at gmail.com (Simone Bordet) Date: Thu, 4 Apr 2019 09:04:15 +0200 Subject: Load Barrier Assembly Message-ID: Hi, I have been looking at the load barrier assembly, and found out that (at least in JDK 12) the code is (via -XX:+PrintAssembly): test %rsi, 0x20(%r15) jne slow_path This is slightly different from what was reported in Per's presentations, where it was: test %rsi, (0x16)%r15 jnz slow_path I'm not an assembly expert; is the second version a typo? But the question I have is: what's loaded in r15, and why is the bad mask 32 bytes after that address? Can the bad mask be stored in a register (at the cost of losing one register)? Thanks! -- Simone Bordet --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. Victoria Livschitz From shade at redhat.com Thu Apr 4 07:40:13 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 4 Apr 2019 09:40:13 +0200 Subject: Load Barrier Assembly In-Reply-To: References: Message-ID: <443a2e99-ca9c-a07a-bd4c-25c7d6bf1ec6@redhat.com> On 4/4/19 9:04 AM, Simone Bordet wrote: > test %rsi, 0x20(%r15) > jne slow_path > > This is slightly different from what reported in Per's presentations > where it was: > > test %rsi, (0x16)%r15 > jnz slow_path > > I'm not an assembly expert, is the second version is a typo? The second version has a typo; it should be 0x16(%r15). > But the question I have is: what's loaded in r15, and why the bad mask > is 32 bytes after that address? 
%r15 is the pointer to thread-local storage: http://hg.openjdk.java.net/jdk/jdk/file/5c7418757bad/src/hotspot/cpu/x86/x86_64.ad#l12887 The bad mask is at offset 0x20 there: http://hg.openjdk.java.net/jdk/jdk/file/5c7418757bad/src/hotspot/share/runtime/thread.hpp#l147 http://hg.openjdk.java.net/jdk/jdk/file/5c7418757bad/src/hotspot/share/gc/z/zThreadLocalData.hpp#l35 > Can the bad mask be stored in a registry (at the cost of losing one registry)? Well, in Shenandoah, we store the thread-local gc state in a similar way and check it on the barrier fastpath. We did experiment with putting it into a register, and the short answer is: losing one of the registers means a significant drawback when register pressure is high, think of a heavily unrolled and pipelined loop. Additionally, you'd need to handle the restoration of the register value when the flag/mask finally changes (which happens during a safepoint/handshake poll). It is doable, but tedious. In Shenandoah, there is ShenandoahCommonGCStateLoads, which just caches the value between the safepoint polls. The bigger idea for eliminating barrier costs is to use nmethod entry barriers to hot-patch the code (e.g. nop the barriers out), removing the barrier overhead altogether. I don't think it has actually been tried for G1, Shenandoah, or ZGC. -- Thanks, -Aleksey From per.liden at oracle.com Thu Apr 4 07:50:49 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 4 Apr 2019 09:50:49 +0200 Subject: Load Barrier Assembly In-Reply-To: References: Message-ID: <47425eff-2f30-a298-8a07-59e835f99a34@oracle.com> Hi, On 4/4/19 9:04 AM, Simone Bordet wrote: > Hi, > > I have been looking at the load barrier assembly, and found out that > (at least in JDK 12) the code is (via -XX:+PrintAssembly): > > test %rsi, 0x20(%r15) > jne slow_path > > This is slightly different from what reported in Per's presentations > where it was: > > test %rsi, (0x16)%r15 > jnz slow_path > > I'm not an assembly expert, is the second version is a typo? 
jne and jnz are the same instruction (same opcode), and jump if the zero flag (ZF) is cleared (0). > > But the question I have is: what's loaded in r15, and why the bad mask > is 32 bytes after that address? r15 is the Thread pointer. Each thread has an instance of ZThreadLocalData, where it keeps its thread-local address bad mask (which happens to be at offset 0x20 at this time). > Can the bad mask be stored in a registry (at the cost of losing one registry)? Yes. That has been prototyped (using r12), but so far we've opted not to go down that path. The performance gain turns out to be fairly small, and there are a few drawbacks, like the need to carefully restore that register at various places where it's destroyed. cheers, Per > > Thanks! > From per.liden at oracle.com Thu Apr 4 07:57:53 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 4 Apr 2019 09:57:53 +0200 Subject: Load Barrier Assembly In-Reply-To: <443a2e99-ca9c-a07a-bd4c-25c7d6bf1ec6@redhat.com> References: <443a2e99-ca9c-a07a-bd4c-25c7d6bf1ec6@redhat.com> Message-ID: On 4/4/19 9:40 AM, Aleksey Shipilev wrote: > On 4/4/19 9:04 AM, Simone Bordet wrote: >> test %rsi, 0x20(%r15) >> jne slow_path >> >> This is slightly different from what reported in Per's presentations >> where it was: >> >> test %rsi, (0x16)%r15 >> jnz slow_path >> >> I'm not an assembly expert, is the second version is a typo? > > Second version has a typo, it should be 0x16(%r15). Ah, missed that typo. > >> But the question I have is: what's loaded in r15, and why the bad mask >> is 32 bytes after that address? 
> > %r15 is the pointer to thread-local storage: > http://hg.openjdk.java.net/jdk/jdk/file/5c7418757bad/src/hotspot/cpu/x86/x86_64.ad#l12887 > > Bad mask is at offset 0x20 there: > http://hg.openjdk.java.net/jdk/jdk/file/5c7418757bad/src/hotspot/share/runtime/thread.hpp#l147 > http://hg.openjdk.java.net/jdk/jdk/file/5c7418757bad/src/hotspot/share/gc/z/zThreadLocalData.hpp#l35 > >> Can the bad mask be stored in a registry (at the cost of losing one registry)? > > Well, in Shenandoah, we store thread-local gc state the similar way and check it on barrier > fastpath. We did experiment with putting it into register and the short answer is: losing one of the > registers means significant drawback when register pressure is high, think heavily unrolled and > pipelined loop. Additionally, you'd need to handle the restoration of the register value when the > flag/mask finally changes (happens during safepoint/handshake poll). It is doable, but tedious. In > Shenandoah, there is ShenandoahCommonGCStateLoads that just caches the value between the safepoint > polls. > > The greater idea to eliminate barrier costs is to use nmethod entry barriers to hot-patch the code > (e.g. nop them out) eliminating barrier overhead altogether. I don't think it was actually tried for > either G1, Shenandoah or ZGC. > Yep, not yet tried in ZGC. For phases where barriers are needed, and can't be completely eliminated, we could have the test instruction take the bad mask as an immediate instead of loading it, and let the nmethod barrier patch that immediate bad mask as needed. 
cheers, Per From rkennke at redhat.com Thu Apr 4 08:09:38 2019 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 4 Apr 2019 10:09:38 +0200 Subject: Load Barrier Assembly In-Reply-To: <443a2e99-ca9c-a07a-bd4c-25c7d6bf1ec6@redhat.com> References: <443a2e99-ca9c-a07a-bd4c-25c7d6bf1ec6@redhat.com> Message-ID: <23895046-f0ea-8acb-e688-f0ee79923cc8@redhat.com> >> test %rsi, 0x20(%r15) >> jne slow_path >> >> This is slightly different from what reported in Per's presentations >> where it was: >> >> test %rsi, (0x16)%r15 >> jnz slow_path >> >> I'm not an assembly expert, is the second version is a typo? > > Second version has a typo, it should be 0x16(%r15). > >> But the question I have is: what's loaded in r15, and why the bad mask >> is 32 bytes after that address? > > %r15 is the pointer to thread-local storage: > http://hg.openjdk.java.net/jdk/jdk/file/5c7418757bad/src/hotspot/cpu/x86/x86_64.ad#l12887 > > Bad mask is at offset 0x20 there: > http://hg.openjdk.java.net/jdk/jdk/file/5c7418757bad/src/hotspot/share/runtime/thread.hpp#l147 > http://hg.openjdk.java.net/jdk/jdk/file/5c7418757bad/src/hotspot/share/gc/z/zThreadLocalData.hpp#l35 > >> Can the bad mask be stored in a registry (at the cost of losing one registry)? > > Well, in Shenandoah, we store thread-local gc state the similar way and check it on barrier > fastpath. We did experiment with putting it into register and the short answer is: losing one of the > registers means significant drawback when register pressure is high, think heavily unrolled and > pipelined loop. Additionally, you'd need to handle the restoration of the register value when the > flag/mask finally changes (happens during safepoint/handshake poll). It is doable, but tedious. I have the prototype. There are two problems: 1. as you say, we're losing if register pressure is high 2. gc-state reloads (at safepoints, call-sites, etc.) appear to be more frequent than barriers, which makes this approach not very attractive. 
Roman From thomas.schatzl at oracle.com Thu Apr 4 08:24:36 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 04 Apr 2019 10:24:36 +0200 Subject: Load Barrier Assembly In-Reply-To: <443a2e99-ca9c-a07a-bd4c-25c7d6bf1ec6@redhat.com> References: <443a2e99-ca9c-a07a-bd4c-25c7d6bf1ec6@redhat.com> Message-ID: <5e09279b53e44ae600fec3f7876c32a969a08659.camel@oracle.com> Hi, On Thu, 2019-04-04 at 09:40 +0200, Aleksey Shipilev wrote: > On 4/4/19 9:04 AM, Simone Bordet wrote: > > test %rsi, 0x20(%r15) > > jne slow_path > > > > [...] > The greater idea to eliminate barrier costs is to use nmethod entry > barriers to hot-patch the code (e.g. nop them out) eliminating > barrier overhead altogether. I don't think it was actually tried for > either G1, Shenandoah or ZGC. > It hasn't been tried, but it has been thought about for a long time for G1; also, nmethod barrier functionality is not that recent :) Thanks, Thomas From per.liden at oracle.com Thu Apr 4 09:03:26 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 4 Apr 2019 11:03:26 +0200 Subject: hg: zgc/zgc: Implementation of JEP 351: ZGC: Uncommit Unused Memory (Preview 1) In-Reply-To: <201903291158.x2TBwrOL025279@aojmv0008.oracle.com> References: <201903291158.x2TBwrOL025279@aojmv0008.oracle.com> Message-ID: As the title of this commit suggests, this is a preview of the capability to uncommit unused memory. See JEP 351 for more information, http://openjdk.java.net/jeps/351. If you're interested in this feature, please take it for a spin. Feedback welcome. You can control the uncommit delay using -XX:ZUncommitDelay= (defaults to 5 min). And unlike what the JEP currently suggests, this patch has an explicit option to turn this feature off (-XX:-ZUncommit). Uncommitting of unused memory sort of plays together with proactive GCs, which typically kick in when the allocation rate has been low for a longer period. 
If proactive GCs aren't happening at a pace that fits your workload, you can use -XX:ZCollectionInterval= to make a GC happen at least once within the given interval. cheers, Per On 3/29/19 12:58 PM, per.liden at oracle.com wrote: > Changeset: ffab403eaf14 > Author: pliden > Date: 2019-03-29 12:58 +0100 > URL: http://hg.openjdk.java.net/zgc/zgc/rev/ffab403eaf14 > > Implementation of JEP 351: ZGC: Uncommit Unused Memory (Preview 1) > > ! src/hotspot/os/linux/gc/z/zNUMA_linux.cpp > ! src/hotspot/os_cpu/linux_x86/gc/z/zBackingFile_linux_x86.cpp > ! src/hotspot/os_cpu/linux_x86/gc/z/zBackingFile_linux_x86.hpp > ! src/hotspot/os_cpu/linux_x86/gc/z/zBackingPath_linux_x86.cpp > ! src/hotspot/os_cpu/linux_x86/gc/z/zPhysicalMemoryBacking_linux_x86.cpp > ! src/hotspot/os_cpu/linux_x86/gc/z/zPhysicalMemoryBacking_linux_x86.hpp > ! src/hotspot/share/gc/z/vmStructs_z.hpp > ! src/hotspot/share/gc/z/zCollectedHeap.cpp > ! src/hotspot/share/gc/z/zCollectedHeap.hpp > ! src/hotspot/share/gc/z/zHeap.cpp > ! src/hotspot/share/gc/z/zHeap.hpp > ! src/hotspot/share/gc/z/zMemory.cpp > ! src/hotspot/share/gc/z/zMemory.hpp > ! src/hotspot/share/gc/z/zPage.cpp > ! src/hotspot/share/gc/z/zPage.hpp > ! src/hotspot/share/gc/z/zPage.inline.hpp > ! src/hotspot/share/gc/z/zPageAllocator.cpp > ! src/hotspot/share/gc/z/zPageAllocator.hpp > ! src/hotspot/share/gc/z/zPhysicalMemory.cpp > ! src/hotspot/share/gc/z/zPhysicalMemory.hpp > ! src/hotspot/share/gc/z/zPhysicalMemory.inline.hpp > - src/hotspot/share/gc/z/zPreMappedMemory.cpp > - src/hotspot/share/gc/z/zPreMappedMemory.hpp > - src/hotspot/share/gc/z/zPreMappedMemory.inline.hpp > + src/hotspot/share/gc/z/zUncommitter.cpp > + src/hotspot/share/gc/z/zUncommitter.hpp > ! src/hotspot/share/gc/z/zVirtualMemory.cpp > ! src/hotspot/share/gc/z/zVirtualMemory.hpp > ! src/hotspot/share/gc/z/zVirtualMemory.inline.hpp > ! src/hotspot/share/gc/z/z_globals.hpp > ! 
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/z/ZPageAllocator.java > - src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/z/ZPhysicalMemoryManager.java > ! test/hotspot/gtest/gc/z/test_zForwarding.cpp > ! test/hotspot/gtest/gc/z/test_zPhysicalMemory.cpp > - test/hotspot/gtest/gc/z/test_zVirtualMemory.cpp > ! test/hotspot/jtreg/ProblemList-zgc.txt > + test/hotspot/jtreg/gc/z/TestUncommit.java > From zgu at redhat.com Mon Apr 8 15:27:26 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 8 Apr 2019 11:27:26 -0400 Subject: Questions on concurrent class unloading Message-ID: <66c30512-86e1-5b56-ef5d-30831e513e21@redhat.com> Hi, I am studying concurrent class unloading in ZGC, and it looks to me that nothing gets unlinked (ZNMethod::unlink()) and purged (ZNMethod::purge()). Inside ZNMethodUnlinkClosure::do_nmethod() [1] it appears that nm->is_unloading() can never be true. Otherwise, the assertion inside nm->flush_dependencies(false) should fail, because neither Universe::heap()->is_gc_active() (true for STW GCs) nor is_ConcurrentGC_thread() is true. Similarly with ZNMethodPurgeClosure [2], nm->make_unloaded() has a similar assertion. What did I miss? 
Thanks, -Zhengyu [1] http://hg.openjdk.java.net/jdk/jdk/file/542735f2a53e/src/hotspot/share/gc/z/zNMethod.cpp#l279 [2] http://hg.openjdk.java.net/jdk/jdk/file/542735f2a53e/src/hotspot/share/gc/z/zNMethod.cpp#l357 From stuart.monteith at linaro.org Mon Apr 8 16:39:07 2019 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Mon, 8 Apr 2019 17:39:07 +0100 Subject: abort in ZForwarding Message-ID: Hello, I'm currently getting the following with -XX:+ZVerifyForwarding: # Internal Error (/home/stumon01/repos/jdk/src/hotspot/share/gc/z/zForwarding.cpp:72), pid=13355, tid=13359 # guarantee(entry.from_index() != other.from_index()) failed: Duplicate from Current thread (0x00007fd65c04f3a0): GCTaskThread "ZWorker#2" [stack: 0x00007fd661542000,0x00007fd661642000] [id=13359] Stack: [0x00007fd661542000,0x00007fd661642000], sp=0x00007fd661640c30, free space=1019k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x13729b2] ZForwarding::verify() const+0x1c6 V [libjvm.so+0x13948c2] ZRelocate::work(ZRelocationSetParallelIterator*)+0x7a V [libjvm.so+0x139557d] ZRelocateTask::work()+0x27 V [libjvm.so+0x139f55c] ZTask::GangTask::work(unsigned int)+0x38 V [libjvm.so+0x135353b] GangWorker::run_task(WorkData)+0xab V [libjvm.so+0x13535f3] GangWorker::loop()+0x37 V [libjvm.so+0x135327a] AbstractGangWorker::run()+0x3e V [libjvm.so+0x125ce2b] Thread::call_run()+0x195 V [libjvm.so+0xfb9fae] thread_native_entry(Thread*)+0x1ee I had assumed that this was because of something I'd done wrong on AArch64, but I find it also occurs on x86. ZForwarding::verify() is diligent enough to skip over empty entries in the outer loop. However, this is not done for the inner loop. 
This resolves the issue: diff -r 6eb8c555644a src/hotspot/share/gc/z/zForwarding.cpp --- a/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 09:44:49 2019 +0100 +++ b/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 17:30:18 2019 +0100 @@ -69,6 +69,11 @@ // Check for duplicates for (ZForwardingCursor j = i + 1; j < _entries.length(); j++) { const ZForwardingEntry other = at(&j); + if (!other.populated()) { + // Skip empty entries + continue; + } + guarantee(entry.from_index() != other.from_index(), "Duplicate from"); guarantee(entry.to_offset() != other.to_offset(), "Duplicate to"); } Have you already encountered this? Shall I create a bug + patchset? BR, Stuart From erik.osterlund at oracle.com Mon Apr 8 18:49:52 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Mon, 8 Apr 2019 20:49:52 +0200 Subject: Questions on concurrent class unloading In-Reply-To: <66c30512-86e1-5b56-ef5d-30831e513e21@redhat.com> References: <66c30512-86e1-5b56-ef5d-30831e513e21@redhat.com> Message-ID: <780D3CA4-81A4-4578-844E-A587EC0D60D6@oracle.com> Hi Zhengyu, The code is called by concurrent GC threads, so is_ConcurrentGC_thread() should return true. is_unloading() returns true if the nmethod has a dead oop due to GC. /Erik > On 8 Apr 2019, at 17:27, Zhengyu Gu wrote: > > Hi, > > I am studying concurrent class unloading in ZGC, it looks to me that nothing gets unlinked (ZNMethod::unlink())and purged (ZNMethod::purge()). > > Inside ZNMethodUnlinkClosure::do_nmethod() [1] > > It appears that nm->is_unloading() can never be true. Otherwise, assertion should fail inside nm->flush_dependencies(false), cause neither Universe::heap()->is_gc_active() (true for STW GC) nor is_ConcurrentGC_thread() is true. > > Similar with ZNMethodPurgeClosure [2] , nm->make_unloaded() has similar assertion. > > What did I miss? 
> > Thanks, > > -Zhengyu > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/542735f2a53e/src/hotspot/share/gc/z/zNMethod.cpp#l279 > > [2] http://hg.openjdk.java.net/jdk/jdk/file/542735f2a53e/src/hotspot/share/gc/z/zNMethod.cpp#l357 > > > From zgu at redhat.com Mon Apr 8 19:02:46 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 8 Apr 2019 15:02:46 -0400 Subject: Questions on concurrent class unloading In-Reply-To: <780D3CA4-81A4-4578-844E-A587EC0D60D6@oracle.com> References: <66c30512-86e1-5b56-ef5d-30831e513e21@redhat.com> <780D3CA4-81A4-4578-844E-A587EC0D60D6@oracle.com> Message-ID: <14a9d997-cc02-4fea-470e-2b4a3e4236de@redhat.com> On 4/8/19 2:49 PM, Erik Osterlund wrote: > Hi Zhengyu, > > The code is called by concurrent GC threads, so is_ConcurrentGC_thread() should return true. > E.g. ZNMethodUnlinkClosure is only used by ZNMethodUnlinkTask, and the task is executed by worker threads, which are not concurrent GC threads. No? I also added assert(false, "debug") under the nm->is_unloading() branch; it never fired. Thanks, -Zhengyu > is_unloading() returns true if the nmethod has a dead oop due to GC. > > /Erik >> On 8 Apr 2019, at 17:27, Zhengyu Gu wrote: >> >> Hi, >> >> I am studying concurrent class unloading in ZGC, it looks to me that nothing gets unlinked (ZNMethod::unlink())and purged (ZNMethod::purge()). >> >> Inside ZNMethodUnlinkClosure::do_nmethod() [1] >> >> It appears that nm->is_unloading() can never be true. Otherwise, assertion should fail inside nm->flush_dependencies(false), cause neither Universe::heap()->is_gc_active() (true for STW GC) nor is_ConcurrentGC_thread() is true. >> >> Similar with ZNMethodPurgeClosure [2] , nm->make_unloaded() has similar assertion. >> >> What did I miss? 
>> >> Thanks, >> >> -Zhengyu >> >> >> [1] http://hg.openjdk.java.net/jdk/jdk/file/542735f2a53e/src/hotspot/share/gc/z/zNMethod.cpp#l279 >> >> [2] http://hg.openjdk.java.net/jdk/jdk/file/542735f2a53e/src/hotspot/share/gc/z/zNMethod.cpp#l357 >> >> >> > From erik.osterlund at oracle.com Mon Apr 8 19:28:35 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 8 Apr 2019 21:28:35 +0200 Subject: Questions on concurrent class unloading In-Reply-To: <14a9d997-cc02-4fea-470e-2b4a3e4236de@redhat.com> References: <66c30512-86e1-5b56-ef5d-30831e513e21@redhat.com> <780D3CA4-81A4-4578-844E-A587EC0D60D6@oracle.com> <14a9d997-cc02-4fea-470e-2b4a3e4236de@redhat.com> Message-ID: <2ad783db-0fc5-f1c6-7f8b-75ba651fb542@oracle.com> Hi Zhengyu, On 2019-04-08 21:02, Zhengyu Gu wrote: > > > On 4/8/19 2:49 PM, Erik Osterlund wrote: >> Hi Zhengyu, >> >> The code is called by concurrent GC threads, so >> is_ConcurrentGC_thread() should return true. >> > E.g ZNMethodUnlinkClosure is only used by ZNMethodUnlinkTask, and the > task is executed by worker threads, which are not concurrent gc thread. No? The workgang we use for concurrent execution uses concurrent GC threads. > I also added assert(false, "debug") inside under nm->is_unloading() > branch, it never fired. So... are you running a workload that needs to unload nmethods? Because we are definitely unloading nmethods. I did the same experiment, and hit the assert as expected. /Erik > Thanks, > > -Zhengyu > > > >> is_unloading() returns true if the nmethod has a dead oop due to GC. >> >> /Erik >> >>> On 8 Apr 2019, at 17:27, Zhengyu Gu wrote: >>> >>> Hi, >>> >>> I am studying concurrent class unloading in ZGC, it looks to me that >>> nothing gets unlinked (ZNMethod::unlink())and purged >>> (ZNMethod::purge()). >>> >>> Inside ZNMethodUnlinkClosure::do_nmethod() [1] >>> >>> It appears that nm->is_unloading() can never be true. 
Otherwise, >>> assertion should fail inside nm->flush_dependencies(false), cause >>> neither Universe::heap()->is_gc_active() (true for STW GC) nor >>> is_ConcurrentGC_thread() is true. >>> >>> Similar with ZNMethodPurgeClosure [2] , nm->make_unloaded() has >>> similar assertion. >>> >>> What did I miss? >>> >>> Thanks, >>> >>> -Zhengyu >>> >>> >>> [1] >>> http://hg.openjdk.java.net/jdk/jdk/file/542735f2a53e/src/hotspot/share/gc/z/zNMethod.cpp#l279 >>> >>> >>> [2] >>> http://hg.openjdk.java.net/jdk/jdk/file/542735f2a53e/src/hotspot/share/gc/z/zNMethod.cpp#l357 >>> >>> >>> >>> >> From zgu at redhat.com Mon Apr 8 20:10:14 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 8 Apr 2019 16:10:14 -0400 Subject: Questions on concurrent class unloading In-Reply-To: <2ad783db-0fc5-f1c6-7f8b-75ba651fb542@oracle.com> References: <66c30512-86e1-5b56-ef5d-30831e513e21@redhat.com> <780D3CA4-81A4-4578-844E-A587EC0D60D6@oracle.com> <14a9d997-cc02-4fea-470e-2b4a3e4236de@redhat.com> <2ad783db-0fc5-f1c6-7f8b-75ba651fb542@oracle.com> Message-ID: <4a566137-cfde-170a-0c7e-cb26cf08dc33@redhat.com> On 4/8/19 3:28 PM, Erik ?sterlund wrote: > Hi Zhengyu, > > On 2019-04-08 21:02, Zhengyu Gu wrote: >> >> >> On 4/8/19 2:49 PM, Erik Osterlund wrote: >>> Hi Zhengyu, >>> >>> The code is called by concurrent GC threads, so >>> is_ConcurrentGC_thread() should return true. >>> >> E.g ZNMethodUnlinkClosure is only used by ZNMethodUnlinkTask, and the >> task is executed by worker threads, which are not concurrent gc >> thread. No? > > The workgang we use for concurrent execution, uses concurrent GC threads. Got it. Thanks, -Zhengyu > >> I also added assert(false, "debug") inside under nm->is_unloading() >> branch, it never fired. > > So... are you running a workload that needs to unload nmethods? Because > we are definitely unloading nmethods. I did the same experiment, and hit > the assert as expected. 
> > /Erik > >> Thanks, >> >> -Zhengyu >> >> >> >>> is_unloading() returns true if the nmethod has a dead oop due to GC. >>> >>> /Erik >>> >>>> On 8 Apr 2019, at 17:27, Zhengyu Gu wrote: >>>> >>>> Hi, >>>> >>>> I am studying concurrent class unloading in ZGC, it looks to me that >>>> nothing gets unlinked (ZNMethod::unlink())and purged >>>> (ZNMethod::purge()). >>>> >>>> Inside ZNMethodUnlinkClosure::do_nmethod() [1] >>>> >>>> It appears that nm->is_unloading() can never be true. Otherwise, >>>> assertion should fail inside nm->flush_dependencies(false), cause >>>> neither Universe::heap()->is_gc_active() (true for STW GC) nor >>>> is_ConcurrentGC_thread() is true. >>>> >>>> Similar with ZNMethodPurgeClosure [2] , nm->make_unloaded() has >>>> similar assertion. >>>> >>>> What did I miss? >>>> >>>> Thanks, >>>> >>>> -Zhengyu >>>> >>>> >>>> [1] >>>> http://hg.openjdk.java.net/jdk/jdk/file/542735f2a53e/src/hotspot/share/gc/z/zNMethod.cpp#l279 >>>> >>>> >>>> [2] >>>> http://hg.openjdk.java.net/jdk/jdk/file/542735f2a53e/src/hotspot/share/gc/z/zNMethod.cpp#l357 >>>> >>>> >>>> >>>> >>> From per.liden at oracle.com Mon Apr 8 20:16:35 2019 From: per.liden at oracle.com (Per Liden) Date: Mon, 8 Apr 2019 22:16:35 +0200 Subject: abort in ZForwarding In-Reply-To: References: Message-ID: Hi Stuart, On 04/08/2019 06:39 PM, Stuart Monteith wrote: [...] 
> This resolves the issue: > > diff -r 6eb8c555644a src/hotspot/share/gc/z/zForwarding.cpp > --- a/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 09:44:49 2019 +0100 > +++ b/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 17:30:18 2019 +0100 > @@ -69,6 +69,11 @@ > // Check for duplicates > for (ZForwardingCursor j = i + 1; j < _entries.length(); j++) { > const ZForwardingEntry other = at(&j); > + if (!other.populated()) { > + // Skip empty entries > + continue; > + } > + > guarantee(entry.from_index() != other.from_index(), "Duplicate from"); > guarantee(entry.to_offset() != other.to_offset(), "Duplicate to"); > } > > Have you already encountered this, shall I create a bug+ patchset ? Ah, yes. That looks like an oversight after JDK-8221540. Feel free to create a bug and send a patch to hotspot-gc-dev. I'll sponsor it. /Per From stuart.monteith at linaro.org Tue Apr 9 09:52:12 2019 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Tue, 9 Apr 2019 10:52:12 +0100 Subject: abort in ZForwarding In-Reply-To: References: Message-ID: Thanks Per. I've opened JDK-8222180 and put the patch here: http://cr.openjdk.java.net/~smonteith/8222180/webrev/ It passes the gtests and runs fine with -XX:+ZVerifyForwarding enabled. BR, Stuart On Mon, 8 Apr 2019 at 21:18, Per Liden wrote: > > Hi Stuart, > > On 04/08/2019 06:39 PM, Stuart Monteith wrote: > [...] 
> > This resolves the issue: > > > > diff -r 6eb8c555644a src/hotspot/share/gc/z/zForwarding.cpp > > --- a/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 09:44:49 2019 +0100 > > +++ b/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 17:30:18 2019 +0100 > > @@ -69,6 +69,11 @@ > > // Check for duplicates > > for (ZForwardingCursor j = i + 1; j < _entries.length(); j++) { > > const ZForwardingEntry other = at(&j); > > + if (!other.populated()) { > > + // Skip empty entries > > + continue; > > + } > > + > > guarantee(entry.from_index() != other.from_index(), "Duplicate from"); > > guarantee(entry.to_offset() != other.to_offset(), "Duplicate to"); > > } > > > > Have you already encountered this, shall I create a bug+ patchset ? > > Ah, yes. That looks like an oversight after JDK-8221540. Feel free to > create a bug and send a patch to hotspot-gc-dev. I'll sponsor it. > > /Per From per.liden at oracle.com Tue Apr 9 10:50:40 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 9 Apr 2019 12:50:40 +0200 Subject: abort in ZForwarding In-Reply-To: References: Message-ID: (Including hotspot-gc-dev) On 4/9/19 11:52 AM, Stuart Monteith wrote: > Thanks Per. > > I've opened JDK-8222180 and put the patch here: > http://cr.openjdk.java.net/~smonteith/8222180/webrev/ Looks good! /Per > > It passes the gtests and runs fine with -XX:+ZVerifyForwarding enabled. > > BR, > Stuart > > > On Mon, 8 Apr 2019 at 21:18, Per Liden wrote: >> >> Hi Stuart, >> >> On 04/08/2019 06:39 PM, Stuart Monteith wrote: >> [...] 
>>> This resolves the issue: >>> >>> diff -r 6eb8c555644a src/hotspot/share/gc/z/zForwarding.cpp >>> --- a/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 09:44:49 2019 +0100 >>> +++ b/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 17:30:18 2019 +0100 >>> @@ -69,6 +69,11 @@ >>> // Check for duplicates >>> for (ZForwardingCursor j = i + 1; j < _entries.length(); j++) { >>> const ZForwardingEntry other = at(&j); >>> + if (!other.populated()) { >>> + // Skip empty entries >>> + continue; >>> + } >>> + >>> guarantee(entry.from_index() != other.from_index(), "Duplicate from"); >>> guarantee(entry.to_offset() != other.to_offset(), "Duplicate to"); >>> } >>> >>> Have you already encountered this, shall I create a bug+ patchset ? >> >> Ah, yes. That looks like an oversight after JDK-8221540. Feel free to >> create a bug and send a patch to hotspot-gc-dev. I'll sponsor it. >> >> /Per From erik.osterlund at oracle.com Wed Apr 10 09:22:59 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Wed, 10 Apr 2019 11:22:59 +0200 Subject: abort in ZForwarding In-Reply-To: References: Message-ID: Hi Stuart, Good catch. Looks good. Thanks, /Erik On 2019-04-09 11:52, Stuart Monteith wrote: > Thanks Per. > > I've opened JDK-8222180 and put the patch here: > http://cr.openjdk.java.net/~smonteith/8222180/webrev/ > > It passes the gtests and runs fine with -XX:+ZVerifyForwarding enabled. > > BR, > Stuart > > > On Mon, 8 Apr 2019 at 21:18, Per Liden wrote: >> Hi Stuart, >> >> On 04/08/2019 06:39 PM, Stuart Monteith wrote: >> [...] 
>>> This resolves the issue: >>> >>> diff -r 6eb8c555644a src/hotspot/share/gc/z/zForwarding.cpp >>> --- a/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 09:44:49 2019 +0100 >>> +++ b/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 17:30:18 2019 +0100 >>> @@ -69,6 +69,11 @@ >>> // Check for duplicates >>> for (ZForwardingCursor j = i + 1; j < _entries.length(); j++) { >>> const ZForwardingEntry other = at(&j); >>> + if (!other.populated()) { >>> + // Skip empty entries >>> + continue; >>> + } >>> + >>> guarantee(entry.from_index() != other.from_index(), "Duplicate from"); >>> guarantee(entry.to_offset() != other.to_offset(), "Duplicate to"); >>> } >>> >>> Have you already encountered this, shall I create a bug+ patchset ? >> Ah, yes. That looks like an oversight after JDK-8221540. Feel free to >> create a bug and send a patch to hotspot-gc-dev. I'll sponsor it. >> >> /Per From per.liden at oracle.com Wed Apr 10 10:54:13 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 10 Apr 2019 12:54:13 +0200 Subject: abort in ZForwarding In-Reply-To: References: Message-ID: Pushed /Per On 04/10/2019 11:22 AM, Erik ?sterlund wrote: > Hi Stuart, > > Good catch. Looks good. > > Thanks, > /Erik > > On 2019-04-09 11:52, Stuart Monteith wrote: >> Thanks Per. >> >> I've opened JDK-8222180 and put the patch here: >> http://cr.openjdk.java.net/~smonteith/8222180/webrev/ >> >> It passes the gtests and runs fine with -XX:+ZVerifyForwarding enabled. >> >> BR, >> Stuart >> >> >> On Mon, 8 Apr 2019 at 21:18, Per Liden wrote: >>> Hi Stuart, >>> >>> On 04/08/2019 06:39 PM, Stuart Monteith wrote: >>> [...] 
>>>> This resolves the issue: >>>> >>>> diff -r 6eb8c555644a src/hotspot/share/gc/z/zForwarding.cpp >>>> --- a/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 09:44:49 >>>> 2019 +0100 >>>> +++ b/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 17:30:18 >>>> 2019 +0100 >>>> @@ -69,6 +69,11 @@ >>>> // Check for duplicates >>>> for (ZForwardingCursor j = i + 1; j < _entries.length(); j++) { >>>> const ZForwardingEntry other = at(&j); >>>> + if (!other.populated()) { >>>> + // Skip empty entries >>>> + continue; >>>> + } >>>> + >>>> guarantee(entry.from_index() != other.from_index(), >>>> "Duplicate from"); >>>> guarantee(entry.to_offset() != other.to_offset(), >>>> "Duplicate to"); >>>> } >>>> >>>> Have you already encountered this, shall I create a bug+ patchset ? >>> Ah, yes. That looks like an oversight after JDK-8221540. Feel free to >>> create a bug and send a patch to hotspot-gc-dev. I'll sponsor it. >>> >>> /Per > From stuart.monteith at linaro.org Wed Apr 10 11:15:27 2019 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Wed, 10 Apr 2019 12:15:27 +0100 Subject: abort in ZForwarding In-Reply-To: References: Message-ID: Thanks! On Wed, 10 Apr 2019, 11:54 Per Liden, wrote: > Pushed > > /Per > > On 04/10/2019 11:22 AM, Erik ?sterlund wrote: > > Hi Stuart, > > > > Good catch. Looks good. > > > > Thanks, > > /Erik > > > > On 2019-04-09 11:52, Stuart Monteith wrote: > >> Thanks Per. > >> > >> I've opened JDK-8222180 and put the patch here: > >> http://cr.openjdk.java.net/~smonteith/8222180/webrev/ > >> > >> It passes the gtests and runs fine with -XX:+ZVerifyForwarding enabled. > >> > >> BR, > >> Stuart > >> > >> > >> On Mon, 8 Apr 2019 at 21:18, Per Liden wrote: > >>> Hi Stuart, > >>> > >>> On 04/08/2019 06:39 PM, Stuart Monteith wrote: > >>> [...] 
> >>>> This resolves the issue: > >>>> > >>>> diff -r 6eb8c555644a src/hotspot/share/gc/z/zForwarding.cpp > >>>> --- a/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 09:44:49 > >>>> 2019 +0100 > >>>> +++ b/src/hotspot/share/gc/z/zForwarding.cpp Mon Apr 08 17:30:18 > >>>> 2019 +0100 > >>>> @@ -69,6 +69,11 @@ > >>>> // Check for duplicates > >>>> for (ZForwardingCursor j = i + 1; j < _entries.length(); j++) { > >>>> const ZForwardingEntry other = at(&j); > >>>> + if (!other.populated()) { > >>>> + // Skip empty entries > >>>> + continue; > >>>> + } > >>>> + > >>>> guarantee(entry.from_index() != other.from_index(), > >>>> "Duplicate from"); > >>>> guarantee(entry.to_offset() != other.to_offset(), > >>>> "Duplicate to"); > >>>> } > >>>> > >>>> Have you already encountered this, shall I create a bug+ patchset ? > >>> Ah, yes. That looks like an oversight after JDK-8221540. Feel free to > >>> create a bug and send a patch to hotspot-gc-dev. I'll sponsor it. > >>> > >>> /Per > > From simone.bordet at gmail.com Wed Apr 10 14:43:57 2019 From: simone.bordet at gmail.com (Simone Bordet) Date: Wed, 10 Apr 2019 16:43:57 +0200 Subject: Failure scenarios Message-ID: Hi, I would like to ask a few questions about how ZGC handles failure scenarios. If there is an allocation failure, but the GC is currently running, my understanding is that the allocating thread will pause until the GC can make space for the requested allocation. AFAIK, this is called "Allocation Stall" and it's reported by ZGC logging. Is my understanding correct? Also AFAIK there is no fall back to Full GCs. Is this correct? If ZGC cannot free space, there is no retry and no more stalling of the allocating thread, but just an OOME. Is that right? Thanks! -- Simone Bordet --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless.
Victoria Livschitz From per.liden at oracle.com Wed Apr 10 15:38:16 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 10 Apr 2019 17:38:16 +0200 Subject: Failure scenarios In-Reply-To: References: Message-ID: Hi, On 04/10/2019 04:43 PM, Simone Bordet wrote: > Hi, > > I would like to ask a few questions about how ZGC handles failure scenarios. > > If there is an allocation failure, but the GC is currently running, my > understanding is that the allocating thread will pause until the GC > can make space for the request allocation. > AFAIK, this is called "Allocation Stall" and it's reported by ZGC logging. > Is my understanding correct? That's correct. And note that space can become available before the GC cycle has completed. For example, during relocation set selection, ZPages that have zero live objects are immediately freed up. And during relocation, as ZPages are emptied they are immediately made available for new allocations. > > Also AFAIK there is no fall back to Full GCs. Is this correct? Correct. Since ZGC is compacting (as opposed to copying) there's no need for any fall back. Put another way, if the normal GC cycle failed to free up memory, then any fallback "full GC" will also fail. > > If ZGC cannot free space, there is no retry and no more stalling of > the allocating thread, but just a OOME. Is that right? That's right. A stalled Java thread will give up and throw OOME if one complete GC cycle has passed and there's still no memory available. So, if a thread stalls when a GC cycle is in progress then it will not throw OOME until another cycle has been completed. cheers, Per > > Thanks! > From simone.bordet at gmail.com Wed Apr 10 15:49:01 2019 From: simone.bordet at gmail.com (Simone Bordet) Date: Wed, 10 Apr 2019 17:49:01 +0200 Subject: Failure scenarios In-Reply-To: References: Message-ID: Thanks!
On Wed, Apr 10, 2019 at 5:38 PM Per Liden wrote: > > Hi, > > On 04/10/2019 04:43 PM, Simone Bordet wrote: > > Hi, > > > > I would like to ask a few questions about how ZGC handles failure scenarios. > > > > If there is an allocation failure, but the GC is currently running, my > > understanding is that the allocating thread will pause until the GC > > can make space for the request allocation. > > AFAIK, this is called "Allocation Stall" and it's reported by ZGC logging. > > Is my understanding correct? > > That's correct. And note that space can become available before the GC > cycle has completed. For example, during relocation set selection, > ZPages that have zero live objects are immediately free up. And during > relocation, as ZPages are emptied they are immediately made available > for new allocations. > > > > > Also AFAIK there is no fall back to Full GCs. Is this correct? > > Correct. Since ZGC is compacting (as opposed to copying) there's no need > for any fall back. Put another way, if the normal GC cycle failed to > free up memory, then any fallback "full GC" will also fail. > > > > > If ZGC cannot free space, there is no retry and no more stalling of > > the allocating thread, but just a OOME. Is that right? > > That's right. A stalled Java thread will give up and throw OOME if one > complete GC cycle has passed and there's still no memory available. So, > if a thread stalls when a GC cycle is in progress then it will not throw > OOME until another cycle has been completed. > > cheers, > Per > > > > > Thanks! > > -- Simone Bordet --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless. 
Victoria Livschitz From per.liden at oracle.com Wed Apr 10 15:49:52 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 10 Apr 2019 17:49:52 +0200 Subject: Failure scenarios In-Reply-To: References: Message-ID: Just a small clarification below. On 04/10/2019 05:38 PM, Per Liden wrote: > Hi, > > On 04/10/2019 04:43 PM, Simone Bordet wrote: >> Hi, >> >> I would like to ask a few questions about how ZGC handles failure >> scenarios. >> >> If there is an allocation failure, but the GC is currently running, my >> understanding is that the allocating thread will pause until the GC >> can make space for the request allocation. >> AFAIK, this is called "Allocation Stall" and it's reported by ZGC >> logging. >> Is my understanding correct? > > That's correct. And note that space can become available before the GC > cycle has completed. For example, during relocation set selection, > ZPages that have zero live objects are immediately free up. And during > relocation, as ZPages are emptied they are immediately made available > for new allocations. > >> >> Also AFAIK there is no fall back to Full GCs. Is this correct? > > Correct. Since ZGC is compacting (as opposed to copying) there's no need > for any fall back. Put another way, if the normal GC cycle failed to > free up memory, then any fallback "full GC" will also fail. > >> >> If ZGC cannot free space, there is no retry and no more stalling of >> the allocating thread, but just a OOME. Is that right? > > That's right. A stalled Java thread will give up and throw OOME if one > complete GC cycle has passed and there's still no memory available. So, > if a thread stalls when a GC cycle is in progress then it will not throw > OOME until another cycle has been completed. ... and there's still no memory available. /Per > > cheers, > Per > >> >> Thanks!
>> From stuart.monteith at linaro.org Sat Apr 13 22:33:45 2019 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Sat, 13 Apr 2019 23:33:45 +0100 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> Message-ID: Hello, At some point earlier on another thread the connection is being closed, hence the XIOError here. This error is being caused because the libx11 library is writing some image data to its socket with "writev(int fd, const struct iovec *iov, int iovcnt);" returning an error. The errno associated with that is EFAULT - which means the kernel was being passed a bad address. The reason running with AWT is exiting is that tagged addresses are regarded as bad by the kernel, and we are passing an address derived from a Java byte array that is on the heap. The AWT JNI code calls jni_GetPrimitiveArrayCritical to get that address. Patching that one JNI call is sufficient to start IDEA, and any of the jfc demos. You can also try "-XX:+CheckJNICalls", which works as it forces JNI to always copy arrays that are passed to/from JNI to a temporary array that does not have a tagged address. If I pursue the tagged addresses, I have to make sure that tagged addresses are masked in all of the correct areas. As well as this part of JNI I must consider the likes of the Unsafe API, and, a bit unrelated, I know of another instance where I get an Exception "sun.jvm.hotspot.debugger.UnmappedAddressException" in the serviceability/sa tests. An alternative is to reproduce the multimapping that is on x86, where all addresses are real. This has the advantage that when the Memory Tagging Extensions (MTE) in AArch64 are implemented, they won't encroach on the ZGC coloured bits, and vice versa. 
Realistically if we are ever to use MTE, ZGC may prevent that from happening. I believe the only reason we should continue with using the aarch64 TBI tagged addresses is if it confers a good performance advantage over multimapping. My plan is to patch JNI for now in my ZGC patch, and then work on a new patch but with x86 style multimapping, which ought to be a straight copy. As I understand it, only one bit of colour is significant in ZGC at any given time, so the TLB impact might not be so bad for the majority of the time, but we'll need to check that. The 64-bit literal addresses aren't, strictly speaking, needed at that point. Per's patch for expanding maximum heap size to 16TB will still fit in 48-bits (with no bits to spare), and so it would only be necessary for 52-bit VA support on aarch64, or if we ever feel the need to offer 32TB VA. BR, Stuart On Fri, 15 Mar 2019 at 18:52, Andrew Haley wrote: > > On 3/14/19 5:26 PM, Stuart Monteith wrote: > > The patches are here: > > > > http://cr.openjdk.java.net/~smonteith/zgc/20190314/ > > > > I'm doing some more testing and then move onto RFRs. > > After https://www.jetbrains.com/idea/ starts up, I get an exit in the "AWT-XAWT" thread: > > #0 0x000003ff7eea77d8 in exit () from /lib64/libc.so.6 > #1 0x000003fd40a14bac in _XIOError () from /lib64/libX11.so.6 > #2 0x000003fd40a125ec in _XEventsQueued () from /lib64/libX11.so.6 > #3 0x000003fd40a0452c in XEventsQueued () from /lib64/libX11.so.6 > #4 0x000003fd40bb773c in Java_sun_awt_X11_XlibWrapper_XEventsQueued (env=0x3fcf032bc10, > clazz=0x3fd4146e0f0, display=4384894979008, mode=1) > at /local/jdk-zgc-new/src/java.desktop/unix/native/libawt_xawt/xawt/XlibWrapper.c:804 > > 0x000003fd4146e1b0: 0x000003fd4146e250 #2 method sun.awt.X11.XToolkit.run(Z)V @ 63 > - 8 locals 7 max stack > > If I had to guess I'd say something was leaking. The nice thing is, though, > that the failure is entirely repeatable. > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd.
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Sun Apr 14 11:57:52 2019 From: aph at redhat.com (Andrew Haley) Date: Sun, 14 Apr 2019 12:57:52 +0100 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> Message-ID: On 4/13/19 11:33 PM, Stuart Monteith wrote: > At some point earlier on another thread the connection is being > closed, hence the XIOError here. This error is being caused because > the libx11 library is writing some image data to its socket with > "writev(int fd, const struct iovec *iov, int iovcnt);" returning an > error. The errno associated with that is EFAULT - which means the > kernel was being passed a bad address. > The reason running with AWT is exiting is that tagged addresses are > regarded as bad by the kernel, Great catch. That sounds like a kernel bug to me. The upper bits are supposed to be ignored by the system. Can you raise this with a suitable kernel engineer? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From stuart.monteith at linaro.org Mon Apr 15 08:56:18 2019 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Mon, 15 Apr 2019 09:56:18 +0100 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> Message-ID: Hello, I'm afraid this is the expected behaviour: https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt So it is explicit in stating that we don't pass tagged addresses to the kernel. 
BR, Stuart On Sun, 14 Apr 2019 at 12:57, Andrew Haley wrote: > > On 4/13/19 11:33 PM, Stuart Monteith wrote: > > At some point earlier on another thread the connection is being > > closed, hence the XIOError here. This error is being caused because > > the libx11 library is writing some image data to its socket with > > "writev(int fd, const struct iovec *iov, int iovcnt);" returning an > > error. The errno associated with that is EFAULT - which means the > > kernel was being passed a bad address. > > The reason running with AWT is exiting is that tagged addresses are > > regarded as bad by the kernel, > > Great catch. > > That sounds like a kernel bug to me. The upper bits are supposed to be > ignored by the system. Can you raise this with a suitable kernel engineer? > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Mon Apr 15 09:33:56 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 15 Apr 2019 10:33:56 +0100 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> Message-ID: On 4/13/19 11:33 PM, Stuart Monteith wrote: > An alternative is to reproduce the multimapping that is on x86, where > all addresses are real. This has the advantage that when the Memory > Tagging Extensions (MTE) in AArch64 are implemented, they won't > encroach on the ZGC coloured bits, and vice versa. Realistically if we > are ever to use MTE, ZGC may prevent that from happening. > > I believe the only reason we should continue with using the aarch64 > TBI tagged addresses is if it confers a good performance advantage > over multimapping.
My plan is to patch JNI for now in my ZGC patch, > and then work on a new patch but with x86 style multimapping, which > ought to be a straight copy. As I understand it, only one bit of > colour is significant in ZGC at any given time, so the TLB impact > might not be so bad for the majority of the time, but we'll need to > check that. This seems a shame. Multi-mapping is a kludge that we shouldn't need on AArch64. Do we know that the colour bits will conflict with MTE? I would have thought that the only place you're likely to see a problem is with the GetPrimitiveArrayCritical functions, and they can be corrected for ZGC. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From per.liden at oracle.com Mon Apr 15 09:41:15 2019 From: per.liden at oracle.com (Per Liden) Date: Mon, 15 Apr 2019 11:41:15 +0200 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> Message-ID: On 04/15/2019 11:33 AM, Andrew Haley wrote: > On 4/13/19 11:33 PM, Stuart Monteith wrote: >> An alternative is to reproduce the multimapping that is on x86, where >> all addresses are real. This has the advantage that when the Memory >> Tagging Extensions (MTE) in AArch64 are implemented, they won't >> encroach on the ZGC coloured bits, and vice versa. Realistically if we >> are ever to use MTE, ZGC may prevent that from happening. >> >> I believe the only reason we should continue with using the aarch64 >> TBI tagged addresses is if it confers a good performance advantage >> over multimapping. My plan is to patch JNI for now in my ZGC patch, >> and then work on a new patch but with x86 style multimapping, which >> ought to be a straight copy. 
As I understand it, only one bit of >> colour is significant in ZGC at any given time, so the TLB impact >> might not be so bad for the majority of the time, but we'll need to >> check that. > > This seems a shame. Multi-mapping is a kludge that we shouldn't need > on AArch64. Do we know that the colour bits will conflict with MTE? I > would have thought that the only place you're likely to see a problem > is with the GetPrimitiveArrayCritical functions, and they can be > corrected for ZGC. > I think so too. I.e. shaving off the colors in the return path for GetStringCritical and GetPrimitiveArrayCritical should be enough. cheers, Per From per.liden at oracle.com Mon Apr 15 09:51:31 2019 From: per.liden at oracle.com (Per Liden) Date: Mon, 15 Apr 2019 11:51:31 +0200 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> Message-ID: On 04/15/2019 11:33 AM, Andrew Haley wrote: > On 4/13/19 11:33 PM, Stuart Monteith wrote: >> An alternative is to reproduce the multimapping that is on x86, where >> all addresses are real. This has the advantage that when the Memory >> Tagging Extensions (MTE) in AArch64 are implemented, they won't >> encroach on the ZGC coloured bits, and vice versa. Realistically if we >> are ever to use MTE, ZGC may prevent that from happening. >> >> I believe the only reason we should continue with using the aarch64 >> TBI tagged addresses is if it confers a good performance advantage >> over multimapping. My plan is to patch JNI for now in my ZGC patch, >> and then work on a new patch but with x86 style multimapping, which >> ought to be a straight copy. 
As I understand it, only one bit of >> colour is significant in ZGC at any given time, so the TLB impact >> might not be so bad for the majority of the time, but we'll need to >> check that. > > This seems a shame. Multi-mapping is a kludge that we shouldn't need Btw, it's only a bit of a kludge if you're on an old kernel, that doesn't have memfd_create() support (kernel < 3.17 and kernel < 4.14 when using large pages). With memfd_create() support it's pretty much transparent, with no need to mount file system, etc. cheers, Per > on AArch64. Do we know that the colour bits will conflict with MTE? I > would have thought that the only place you're likely to see a problem > is with the GetPrimitiveArrayCritical functions, and they can be > corrected for ZGC. > From aph at redhat.com Mon Apr 15 10:53:00 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 15 Apr 2019 11:53:00 +0100 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> Message-ID: On 4/15/19 10:51 AM, Per Liden wrote: > Btw, it's only a bit of a kludge if you're on an old kernel, that > doesn't have memfd_create() support (kernel < 3.17 and kernel < 4.14 > when using large pages). With memfd_create() support it's pretty much > transparent, with no need to mount file system, etc. I agree, but you still end up with aliases in the TLB. Given that the L1 TLB is maybe 10-32 items in size, we really shouldn't be using scarce resources if we can avoid doing so. This is something of a pain point: Java tends to be memory heavy, with lots of pointer chasing, and Arm cores tend to be a little smaller than Intel's. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From per.liden at oracle.com Mon Apr 15 11:24:34 2019 From: per.liden at oracle.com (Per Liden) Date: Mon, 15 Apr 2019 13:24:34 +0200 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> Message-ID: <5d04b17a-1109-1f19-5b89-1ae0cd7391c4@oracle.com> On 04/15/2019 12:53 PM, Andrew Haley wrote: > On 4/15/19 10:51 AM, Per Liden wrote: >> Btw, it's only a bit of a kludge if you're on an old kernel, that >> doesn't have memfd_create() support (kernel < 3.17 and kernel < 4.14 >> when using large pages). With memfd_create() support it's pretty much >> transparent, with no need to mount file system, etc. > > I agree, but you still end up with aliases in the TLB. Given that the > L1 TLB is maybe 10-32 items in size, we really shouldn't be using > scarce resources if we can avoid doing so. This is something of a pain > point: Java tends to be memory heavy, with lots of pointer chasing, > and Arm cores tend to be a little smaller than Intel's. Only one of the three heap views (heap mappings) is actively accessed by threads at any given time, so the TLB will be fully utilized with no need to keep aliases around. We're only switching heap views twice per GC cycle (which could be many seconds or minutes apart), so the effect multi-mapping has on the TLB is negligible. 
cheers, Per From aph at redhat.com Mon Apr 15 11:30:42 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 15 Apr 2019 12:30:42 +0100 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: <5d04b17a-1109-1f19-5b89-1ae0cd7391c4@oracle.com> References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> <5d04b17a-1109-1f19-5b89-1ae0cd7391c4@oracle.com> Message-ID: On 4/15/19 12:24 PM, Per Liden wrote: > On 04/15/2019 12:53 PM, Andrew Haley wrote: >> On 4/15/19 10:51 AM, Per Liden wrote: >>> Btw, it's only a bit of a kludge if you're on an old kernel, that >>> doesn't have memfd_create() support (kernel < 3.17 and kernel < 4.14 >>> when using large pages). With memfd_create() support it's pretty much >>> transparent, with no need to mount file system, etc. >> >> I agree, but you still end up with aliases in the TLB. Given that the >> L1 TLB is maybe 10-32 items in size, we really shouldn't be using >> scarce resources if we can avoid doing so. This is something of a pain >> point: Java tends to be memory heavy, with lots of pointer chasing, >> and Arm cores tend to be a little smaller than Intel's. > > Only one of the three heap views (heap mappings) is actively > accessed by threads at any given time, so the TLB will be fully > utilized with no need to keep aliases around. We're only switching > heap views twice per GC cycle (which could be many seconds or > minutes apart), so the effect multi-mapping has on the TLB is > negligible. Thank you, good point. So the only real effect of the multi-mapping will be some small additional use of kernel resources and a reduction in address space. So, I think it's perhaps not so important to use the AArch64 tag bits for practical reasons, but I think it's still worth a try, -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From per.liden at oracle.com Mon Apr 15 12:10:14 2019 From: per.liden at oracle.com (Per Liden) Date: Mon, 15 Apr 2019 14:10:14 +0200 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> <5d04b17a-1109-1f19-5b89-1ae0cd7391c4@oracle.com> Message-ID: <1d97a659-b708-d905-49da-8a4c5a5606e3@oracle.com> On 04/15/2019 01:30 PM, Andrew Haley wrote: > On 4/15/19 12:24 PM, Per Liden wrote: >> On 04/15/2019 12:53 PM, Andrew Haley wrote: >>> On 4/15/19 10:51 AM, Per Liden wrote: >>>> Btw, it's only a bit of a kludge if you're on an old kernel, that >>>> doesn't have memfd_create() support (kernel < 3.17 and kernel < 4.14 >>>> when using large pages). With memfd_create() support it's pretty much >>>> transparent, with no need to mount file system, etc. >>> >>> I agree, but you still end up with aliases in the TLB. Given that the >>> L1 TLB is maybe 10-32 items in size, we really shouldn't be using >>> scarce resources if we can avoid doing so. This is something of a pain >>> point: Java tends to be memory heavy, with lots of pointer chasing, >>> and Arm cores tend to be a little smaller than Intel's. >> >> Only one of the three heap views (heap mappings) is actively >> accessed by threads at any given time, so the TLB will be fully >> utilized with no need to keep aliases around. We're only switching >> heap views twice per GC cycle (which could be many seconds or >> minutes apart), so the effect multi-mapping has on the TLB is >> negligible. > > Thank you, good point. So the only real effect of the multi-mapping > will be some small additional use of kernel resources and a reduction > in address space. Yes, that's correct. The main problem we've seen so far is more of an educational challenge. 
People often use tools like 'top' and 'ps' to see the %MEM or RSS of the process. Unfortunately, RSS is a pretty meaningless number when a process has shared mappings (the ZGC heap is a shared mapping). The RSS becomes inflated and can confuse users about the actual process size. Fortunately, Linux also reports PSS, which is the more correct and non-inflated version of RSS, that people should look at. Tools like 'smem', 'ps_mem', 'procrank', etc. report this (instead of RSS). > > So, I think it's perhaps not so important to use the AArch64 tag bits > for practical reasons, but I think it's still worth a try, > I completely agree. /Per From aph at redhat.com Mon Apr 15 14:56:48 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 15 Apr 2019 15:56:48 +0100 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: <3feb3501-89f2-4e25-f3f6-1e02cf5daf0d@redhat.com> References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> <3feb3501-89f2-4e25-f3f6-1e02cf5daf0d@redhat.com> Message-ID: <8457301d-d13f-72f0-4963-0d668f84b0a7@redhat.com> On 4/15/19 3:00 PM, Andrew Dinn wrote: > Yikes! Does that not imply that when we return from a SIGSEGV into a > handler that an oop held in a register may have its tags wiped? We'll have to try it. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From stuart.monteith at linaro.org Mon Apr 15 14:57:44 2019 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Mon, 15 Apr 2019 15:57:44 +0100 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: <3feb3501-89f2-4e25-f3f6-1e02cf5daf0d@redhat.com> References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> <3feb3501-89f2-4e25-f3f6-1e02cf5daf0d@redhat.com> Message-ID: Hello, If I understand this correctly, siginfo_t would contain the faulting address with the tag wiped. The ucontext would still contain the full 64-bit registers, unmolested. I've asked internally, but that is how I've interpreted it. BR, Stuart On Mon, 15 Apr 2019 at 15:00, Andrew Dinn wrote: > > On 15/04/2019 09:56, Stuart Monteith wrote: > > Hello, > > I'm afraid this is the expected behaviour: > > https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt > > > > So it is explicit in stating that we don't pass tagged addresses to the kernel. > "Non-zero tags are not preserved when delivering signals. This means > that signal handlers in applications making use of tags cannot rely on > the tag information for user virtual addresses being maintained for > fields inside siginfo_t. One exception to this rule is for signals > raised in response to watchpoint debug exceptions, where the tag > information will be preserved." > > Yikes! Does that not imply that when we return from a SIGSEGV into a > handler that an oop held in a register may have its tags wiped? > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 
03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From stuart.monteith at linaro.org Mon Apr 15 14:59:53 2019 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Mon, 15 Apr 2019 15:59:53 +0100 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: <8457301d-d13f-72f0-4963-0d668f84b0a7@redhat.com> References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> <3feb3501-89f2-4e25-f3f6-1e02cf5daf0d@redhat.com> <8457301d-d13f-72f0-4963-0d668f84b0a7@redhat.com> Message-ID: Furthermore, if the top 8 bits of GPRs were being wiped on SIGSEGV, and not restored on return, then we'd already be experiencing that problem. ZGC simply makes use of an existing feature - I've not turned it on. It just so happens that for GPRs with the pointers, the top 8 bits are now significant. On Mon, 15 Apr 2019 at 15:56, Andrew Haley wrote: > > On 4/15/19 3:00 PM, Andrew Dinn wrote: > > Yikes! Does that not imply that when we return from a SIGSEGV into a > > handler that an oop held in a register may have its tags wiped? > > We'll have to try it. > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Mon Apr 15 15:25:41 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 15 Apr 2019 16:25:41 +0100 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> <3feb3501-89f2-4e25-f3f6-1e02cf5daf0d@redhat.com> Message-ID: <3a560c90-805e-7553-5847-86de5f655d04@redhat.com> On 4/15/19 3:57 PM, Stuart Monteith wrote: > If I understand this correctly, siginfo_t would contain the > faulting address with the tag wiped.
The ucontext would still contain > the full 64-bit registers, unmolested. I've asked internally, but that > is how I've interpreted it. That sounds sensible. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From per.liden at oracle.com Mon Apr 15 15:33:13 2019 From: per.liden at oracle.com (Per Liden) Date: Mon, 15 Apr 2019 17:33:13 +0200 Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far In-Reply-To: <3a560c90-805e-7553-5847-86de5f655d04@redhat.com> References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com> <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com> <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com> <3feb3501-89f2-4e25-f3f6-1e02cf5daf0d@redhat.com> <3a560c90-805e-7553-5847-86de5f655d04@redhat.com> Message-ID: On 04/15/2019 05:25 PM, Andrew Haley wrote: > On 4/15/19 3:57 PM, Stuart Monteith wrote: >> If I understand this correctly, siginfo_t would contain the >> faulting address with the tag wiped. The ucontext would still contain >> the full 64-bit registers, unmolested. I've asked internally, but that >> is how I've interpreted it. > > That sounds sensible. Also note that depending on what this oop is used for in the signal handler, you might not need the tag bits (they typically are only useful for ZGC). But if you really do need them, you can call ZAddress::good(oop) to slap the bits on again. 
cheers,
Per

From stuart.monteith at linaro.org Mon Apr 15 16:28:53 2019
From: stuart.monteith at linaro.org (Stuart Monteith)
Date: Mon, 15 Apr 2019 17:28:53 +0100
Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far
In-Reply-To:
References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com>
 <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com>
 <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com>
 <3feb3501-89f2-4e25-f3f6-1e02cf5daf0d@redhat.com>
 <3a560c90-805e-7553-5847-86de5f655d04@redhat.com>
Message-ID:

Thanks Per, I've not encountered any issues so far with that. With the
fix to JNI, I expect ZGC should be functioning properly now... until
the next edge-case when it is not.

This is my current set of patches:
http://cr.openjdk.java.net/~smonteith/zgc/20190415/

It was built on tip from this morning. Meanwhile, I'll work on a patch
for multi-map ZGC for comparison.

BR,
Stuart

On Mon, 15 Apr 2019 at 16:33, Per Liden wrote:
>
> On 04/15/2019 05:25 PM, Andrew Haley wrote:
> > On 4/15/19 3:57 PM, Stuart Monteith wrote:
> >> If I understand this correctly, siginfo_t would contain the
> >> faulting address with the tag wiped. The ucontext would still contain
> >> the full 64-bit registers, unmolested. I've asked internally, but that
> >> is how I've interpreted it.
> >
> > That sounds sensible.
>
> Also note that depending on what this oop is used for in the signal
> handler, you might not need the tag bits (they typically are only useful
> for ZGC). But if you really do need them, you can call
> ZAddress::good(oop) to slap the bits on again.
>
> cheers,
> Per

From stuart.monteith at linaro.org Tue Apr 30 17:36:08 2019
From: stuart.monteith at linaro.org (Stuart Monteith)
Date: Tue, 30 Apr 2019 18:36:08 +0100
Subject: [aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far
In-Reply-To:
References: <877egxmpx4.fsf@redhat.com> <87d0qokk25.fsf@redhat.com>
 <87o9a6k55g.fsf@redhat.com> <875zw9k63l.fsf@redhat.com>
 <090f5722-daa7-bbb0-cff4-f017514a46fb@redhat.com>
 <3feb3501-89f2-4e25-f3f6-1e02cf5daf0d@redhat.com>
 <3a560c90-805e-7553-5847-86de5f655d04@redhat.com>
Message-ID:

I've made a patchset with multi-mapping for aarch64, which is literally
a copy from x86. It doesn't require the 64-bit literal addresses patch,
as the bits all fit within 48 bits:
http://cr.openjdk.java.net/~smonteith/zgc-mm/20190430/

There is also an updated patchset for the tagged pointers, with a fix
for the JNI "Critical" functions to mask out the tags:
http://cr.openjdk.java.net/~smonteith/zgc/20190430/

These patchsets apply to the tip today, so they take into account the
new 4/8/16TB heap limits Per introduced.

I'm currently regression testing, and benchmarking with and without
tagging. With tags there are 64-bit literals that ought to have a
negative effect on performance, and with multi-mapping there might be
some impact on the TLB. Having said that, the impact on the TLB is
expected to be small, as Per explained.

Once I've settled on an approach, I'll post RFRs - I'd really like for
ZGC for Aarch64 to get into OpenJDK for JDK 13.

BR,
Stuart

On Mon, 15 Apr 2019 at 17:28, Stuart Monteith wrote:
>
> Thanks Per, I've not encountered any issues so far with that. With the
> fix to JNI, I expect ZGC should be functioning properly now... until
> the next edge-case when it is not.
>
> This is my current set of patches:
> http://cr.openjdk.java.net/~smonteith/zgc/20190415/
>
> It was built on tip from this morning. Meanwhile, I'll work on a patch
> for multi-map ZGC for comparison.
>
> BR,
> Stuart
>
>
> On Mon, 15 Apr 2019 at 16:33, Per Liden wrote:
> >
> > On 04/15/2019 05:25 PM, Andrew Haley wrote:
> > > On 4/15/19 3:57 PM, Stuart Monteith wrote:
> > >> If I understand this correctly, siginfo_t would contain the
> > >> faulting address with the tag wiped. The ucontext would still contain
> > >> the full 64-bit registers, unmolested. I've asked internally, but that
> > >> is how I've interpreted it.
> > >
> > > That sounds sensible.
> >
> > Also note that depending on what this oop is used for in the signal
> > handler, you might not need the tag bits (they typically are only useful
> > for ZGC). But if you really do need them, you can call
> > ZAddress::good(oop) to slap the bits on again.
> >
> > cheers,
> > Per
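The ZAddress::good() call discussed in this thread - re-applying the GC
"color" after the kernel reports an address with the tag wiped - comes down
to bit arithmetic on the colored pointer. A minimal stand-alone sketch in C;
note the bit positions follow ZGC's original 4TB layout (heap offset in bits
0-41, metadata just above) and shift with the configured max heap size, and
the helper names z_untag/z_good are invented for this illustration:

```c
#include <stdint.h>

/* Illustrative ZGC-style colored pointer layout (original 4TB config):
 * the low 42 bits hold the heap offset, the bits just above hold the GC
 * metadata "color". The real positions depend on the max heap size. */
#define Z_OFFSET_BITS 42
#define Z_OFFSET_MASK ((UINT64_C(1) << Z_OFFSET_BITS) - 1)
#define Z_MARKED0     (UINT64_C(1) << Z_OFFSET_BITS)  /* the current "good" bit */

/* What siginfo_t reports: the faulting address with the tag wiped. */
static uint64_t z_untag(uint64_t colored) {
    return colored & Z_OFFSET_MASK;
}

/* The moral equivalent of ZAddress::good(): re-apply the good color. */
static uint64_t z_good(uint64_t untagged) {
    return (untagged & Z_OFFSET_MASK) | Z_MARKED0;
}
```

A pointer whose color is not the current good one is exactly what makes the
`test %rsi, 0x20(%r15)` fast path above fall through to the slow path.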
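Stuart's multi-mapping patchset relies on the same mechanism as the x86
port: the heap's physical memory appears at several virtual addresses, one
per pointer color, so a load through any colored view reaches the same
data. The effect can be reproduced with plain POSIX calls; this is a
file-backed sketch of the idea only, not ZGC's actual reservation code:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Map the same backing memory at two distinct virtual addresses, the way
 * ZGC multi-maps its heap views. Returns 1 if a write through one view is
 * visible through the other, i.e. both views share physical pages. */
static int multimap_views_share_memory(void) {
    long page = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();                 /* stand-in for ZGC's heap backing */
    if (f == NULL) return 0;
    int fd = fileno(f);
    if (ftruncate(fd, page) != 0) { fclose(f); return 0; }

    char *view0 = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *view1 = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (view0 == MAP_FAILED || view1 == MAP_FAILED) { fclose(f); return 0; }

    strcpy(view0, "written via view0");  /* write through the first view... */
    int shared = (view0 != view1) &&     /* ...distinct virtual addresses,  */
        (strcmp(view1, "written via view0") == 0); /* same bytes via second */

    munmap(view0, (size_t)page);
    munmap(view1, (size_t)page);
    fclose(f);
    return shared;
}
```

In the real collector the backing is anonymous shared memory rather than a
temporary file, and the views sit at fixed, color-specific offsets instead
of wherever mmap chooses.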