From kim.barrett at oracle.com  Fri Nov  1 01:15:56 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 31 Oct 2019 21:15:56 -0400
Subject: RFR: 8233359: Add global sized operator delete definitions 
Message-ID: <4954BF57-8C96-41F9-A2D2-1E78D1505037@oracle.com>

Please review this addition of replacement implementations for the
global sized deallocation functions that were added by C++14.

Since Visual Studio 2017 or later always provides C++14 or later, we
should be including these when using those compiler versions.

We also need these definitions when doing experimental C++14 builds
with gcc (in preparation for JEP 347), to avoid -Wsized-deallocation
warnings (enabled by the recent addition of -Wextra).

Rather than trying to determine whether the definitions are needed or
not, we add them unconditionally. It's harmless to provide such
definitions in non-product builds for pre-C++14 compilers; they just
won't ever be called.

CR:
https://bugs.openjdk.java.net/browse/JDK-8233359

Webrev:
https://cr.openjdk.java.net/~kbarrett/8233359/open.00/

Testing:
mach5 tier1
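
For readers unfamiliar with the C++14 sized deallocation functions mentioned
above, here is a minimal sketch of what a replacement pair can look like. It
is not the code in the webrev. It assumes the file is compiled as C++14 or
later; the counter g_sized_delete_calls, the malloc-based operator new and
the helper demo_sized_delete() are hypothetical and exist only so the effect
can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, for illustration only.
static std::size_t g_sized_delete_calls = 0;

// Replacement allocation function backed by malloc.
void* operator new(std::size_t size) {
  void* p = std::malloc(size == 0 ? 1 : size);
  if (p == nullptr) throw std::bad_alloc();
  return p;
}

// Replacement unsized deallocation function (exists since C++98).
void operator delete(void* p) noexcept {
  std::free(p);
}

// Replacement sized deallocation function (added by C++14). Defining it
// next to the unsized form is what keeps gcc's -Wsized-deallocation quiet.
void operator delete(void* p, std::size_t /*size*/) noexcept {
  ++g_sized_delete_calls;
  std::free(p);
}

// Usage example: call the sized form explicitly and check the counter.
inline void demo_sized_delete() {
  void* p = ::operator new(16);
  std::size_t before = g_sized_delete_calls;
  ::operator delete(p, 16);
  assert(g_sized_delete_calls == before + 1);
}
```

A pre-C++14 compiler treats the two-argument form as an ordinary function
that is never selected by a delete expression, which matches the remark
above that such a definition "just won't ever be called".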


From david.holmes at oracle.com  Fri Nov  1 05:07:38 2019
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 1 Nov 2019 15:07:38 +1000
Subject: RFR: 8233359: Add global sized operator delete definitions
In-Reply-To: <4954BF57-8C96-41F9-A2D2-1E78D1505037@oracle.com>
References: <4954BF57-8C96-41F9-A2D2-1E78D1505037@oracle.com>
Message-ID: <9d617173-f53a-ea9e-33c4-cb7127c71530@oracle.com>

Hi Kim,

That looks fine and trivial IMO.

Thanks,
David

On 1/11/2019 11:15 am, Kim Barrett wrote:
> Please review this addition of replacement implementations for the
> global sized deallocation functions that were added by C++14.
> 
> Since Visual Studio 2017 or later always provides C++14 or later, we
> should be including these when using those compiler versions.
> 
> We also need these definitions when doing experimental C++14 builds
> with gcc (in preparation for JEP 347), to avoid -Wsized-deallocation
> warnings (enabled by the recent addition of -Wextra).
> 
> Rather than trying to determine whether the definitions are needed or
> not, we add them unconditionally. It's harmless to provide such
> definitions in non-product builds for pre-C++14 compilers; they just
> won't ever be called.
> 
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8233359
> 
> Webrev:
> https://cr.openjdk.java.net/~kbarrett/8233359/open.00/
> 
> Testing:
> mach5 tier1
> 

From kim.barrett at oracle.com  Fri Nov  1 05:30:14 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Fri, 1 Nov 2019 01:30:14 -0400
Subject: RFR: 8233359: Add global sized operator delete definitions
In-Reply-To: <9d617173-f53a-ea9e-33c4-cb7127c71530@oracle.com>
References: <4954BF57-8C96-41F9-A2D2-1E78D1505037@oracle.com>
 <9d617173-f53a-ea9e-33c4-cb7127c71530@oracle.com>
Message-ID: <0B2FC930-99B3-43C7-A20B-B394DAB63D02@oracle.com>

> On Nov 1, 2019, at 1:07 AM, David Holmes <david.holmes at oracle.com> wrote:
> 
> Hi Kim,
> 
> That looks fine and trivial IMO.

Thanks.

> 
> Thanks,
> David
> 
> On 1/11/2019 11:15 am, Kim Barrett wrote:
>> Please review this addition of replacement implementations for the
>> global sized deallocation functions that were added by C++14.
>> Since Visual Studio 2017 or later always provides C++14 or later, we
>> should be including these when using those compiler versions.
>> We also need these definitions when doing experimental C++14 builds
>> with gcc (in preparation for JEP 347), to avoid -Wsized-deallocation
>> warnings (enabled by the recent addition of -Wextra).
>> Rather than trying to determine whether the definitions are needed or
>> not, we add them unconditionally. It's harmless to provide such
>> definitions in non-product builds for pre-C++14 compilers; they just
>> won't ever be called.
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8233359
>> Webrev:
>> https://cr.openjdk.java.net/~kbarrett/8233359/open.00/
>> Testing:
>> mach5 tier1


From christoph.langer at sap.com  Fri Nov  1 07:35:48 2019
From: christoph.langer at sap.com (Langer, Christoph)
Date: Fri, 1 Nov 2019 07:35:48 +0000
Subject: RFR(S): 8232980: Cleanup initialization of function pointers into
 java.base from classloader.cpp
In-Reply-To: <8ae9cc08-397d-2ee1-76b0-15bef8aae770@oracle.com>
References: <PR1PR02MB481021B96BB4795C1AA9B22A8A6A0@PR1PR02MB4810.eurprd02.prod.outlook.com>
 <4cd57959-2c13-03d7-4eea-4ece37fd067b@oracle.com>
 <AM6PR02MB4801E9752B2B4F758CCE4CD18A610@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <f7ebad08-1888-cac6-9b29-a335e66af045@oracle.com>
 <AM6PR02MB480179F3158896678970F60E8A600@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <8ae9cc08-397d-2ee1-76b0-15bef8aae770@oracle.com>
Message-ID: <AM6PR02MB48013D26F8F2F791864EC0258A620@AM6PR02MB4801.eurprd02.prod.outlook.com>

Hi Ioi, Calvin,

thanks for reviewing. I've pushed now after running through submit.

Best regards
Christoph

> -----Original Message-----
> From: Calvin Cheung <calvin.cheung at oracle.com>
> Sent: Mittwoch, 30. Oktober 2019 16:55
> To: Langer, Christoph <christoph.langer at sap.com>; Ioi Lam
> <ioi.lam at oracle.com>; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(S): 8232980: Cleanup initialization of function pointers into
> java.base from classloader.cpp
> 
> Hi Christoph,
> 
> The updated webrev looks good.
> 
> thanks,
> 
> Calvin
> 
> On 10/30/19 12:48 AM, Langer, Christoph wrote:
> > Hi Ioi,
> >
> > you're right, we should prefer methods over macros - that's way nicer.
> So, I changed the macro into a method and I also removed the comments
> which don't do more than stating the obvious.
> >
> > Please check: http://cr.openjdk.java.net/~clanger/webrevs/8232980.2/
> >
> > Thanks
> > Christoph
> >
> >> -----Original Message-----
> >> From: Ioi Lam <ioi.lam at oracle.com>
> >> Sent: Dienstag, 29. Oktober 2019 16:35
> >> To: Langer, Christoph <christoph.langer at sap.com>; hotspot-runtime-
> >> dev at openjdk.java.net; Calvin Cheung <calvin.cheung at oracle.com>
> >> Subject: Re: RFR(S): 8232980: Cleanup initialization of function pointers
> into
> >> java.base from classloader.cpp
> >>
> >> Hi Christoph,
> >>
> >> I think in general we should avoid macros to improve debuggability. It
> >> may be better to use a helper function. Also, this way you don't need to
> >> come up with a different error message for each function.
> >>
> >> void* ClassLoader::dll_lookup(void* lib, const char* name) {
> >>   void* func = os::dll_lookup(lib, name);
> >>   if (func == NULL) {
> >>     vm_exit_during_initialization("function %s not found", name);
> >>   }
> >>   return func;
> >> }
> >>
> >> CanonicalizeEntry = CAST_TO_FN_PTR(canonicalize_fn_t,
> >> dll_lookup(javalib_handle, "Canonicalize"));
> >>
> >>
> >>
> >> Also, I think this comment is not necessary as it's clear what the code
> >> is trying to do:
> >>
> >> // Lookup canonicalize entry in libjava.dll
> >>
> >> Thanks
> >> - Ioi
> >>
> >>
> >> On 10/29/19 3:08 AM, Langer, Christoph wrote:
> >>> Hi Ioi, Calvin,
> >>>
> >>> thanks for looking at my RFR. I've addressed your points in an updated
> >> webrev: http://cr.openjdk.java.net/~clanger/webrevs/8232980.1/
> >>> - I removed ClassLoader::decompress and resolving of ZipInflateFully.
> >>> - For the checking of the resolved symbols, I defined macro
> >> CHECK_RESOLVED_OR_EXIT. It'll check for NULL and in that case, exit via
> >> vm_exit_during_initialization.
> >>> - in the destructor ClassPathZipEntry::~ClassPathZipEntry, ZipClose is
> called
> >> unconditionally (without checking for ZipClose not being null).
> >>> - ClassLoader::crc32: assert removed, since failing to resolve the Crc32
> >> pointer will cause vm_exit_during_initialization
> >>> Please check the new version.
> >>>
> >>> Thanks
> >>> Christoph
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: hotspot-runtime-dev <hotspot-runtime-dev-
> >>>> bounces at openjdk.java.net> On Behalf Of Ioi Lam
> >>>> Sent: Donnerstag, 24. Oktober 2019 21:00
> >>>> To: hotspot-runtime-dev at openjdk.java.net
> >>>> Subject: Re: RFR(S): 8232980: Cleanup initialization of function pointers
> >> into
> >>>> java.base from classloader.cpp
> >>>>
> >>>> Hi Christoph,
> >>>>
> >>>> Thanks for fixing this.
> >>>>
> >>>> I think it is OK to leave the 3 functions separate, as each of them is
> >>>> non-trivial.
> >>>>
> >>>> In fact, we can probably delay calling of
> >>>> ClassLoader::load_zip_library() in ClassPathZipEntry::open_entry, so
> >>>> apps that don't use JAR files can start up a little faster. I'll file an
> >>>> RFE for this: JDK-8232989.
> >>>>
> >>>> For repeated calls like this:
> >>>>
> >>>>
> >>>> ZipOpen = CAST_TO_FN_PTR(ZipOpen_t, os::dll_lookup(handle, "ZIP_Open"));
> >>>> if (ZipOpen == NULL) {
> >>>>   vm_exit_during_initialization("Corrupted ZIP library: ZIP_Open missing", path);
> >>>> }
> >>>> ZipClose = CAST_TO_FN_PTR(ZipClose_t, os::dll_lookup(handle, "ZIP_Close"));
> >>>> if (ZipClose == NULL) {
> >>>>   vm_exit_during_initialization("Corrupted ZIP library: ZIP_Close missing", path);
> >>>> }
> >>>>
> >>>>
> >>>> I think it's better to use a utility function to do the check and exiting.
> >>>>
> >>>> Thanks
> >>>> - Ioi
> >>>>
> >>>> On 10/24/19 8:41 AM, Langer, Christoph wrote:
> >>>>> Hi,
> >>>>>
> >>>>> please help reviewing a cleanup patch to classLoader.cpp.
> >>>>>
> >>>>> The methods load_zip_library() and load_jimage_library() can be
> >> cleaned
> >>>> up a little bit. In my patch, I also extracted the initialization of the one
> >> symbol
> >>>> coming from libjava to a new method load_java_library(). However, I'm
> >> not
> >>>> fully sure on whether it would be nicer to have all these 3 methods
> >>>> consolidated into one. What do you think?
> >>>>> In my patch I check for all needed symbols since it's all coming from
> the
> >> JDK
> >>>> and we can assume consistency. Should there be a problem in resolving
> >>>> some symbol, then VM initialization should fail.
> >>>>> Furthermore, I'm wondering, whether to use guarantee or
> >>>> vm_exit_during_initialization for the NULL checks of the resolved
> >> symbols.
> >>>> Currently we have both but I think we should use one consistent
> >> approach. I
> >>>> think vm_exit_during_initialization would be the best fit. Opinions?
> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232980
> >>>>> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8232980.0/
> >>>>>
> >>>>> Thanks
> >>>>> Christoph
> >>>>>
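
The "one helper that does the lookup, the check and the exit" idea from this
thread can be sketched in a self-contained way as follows. This is not the
code from the webrev: fake_dll_lookup() is a hypothetical stand-in for
os::dll_lookup(), lookup_or_throw() throws an exception where HotSpot would
call vm_exit_during_initialization(), and zip_open_fn_t is an invented
signature.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>
#include <string>

// Generic function pointer type used to carry looked-up symbols around.
using generic_fn_t = void (*)();
// Hypothetical signature of the one symbol our fake library exports.
using zip_open_fn_t = int (*)();

static int fake_zip_open() { return 42; }

// Hypothetical stand-in for os::dll_lookup(): returns nullptr on a miss.
static generic_fn_t fake_dll_lookup(const char* name) {
  if (std::strcmp(name, "ZIP_Open") == 0) {
    return reinterpret_cast<generic_fn_t>(&fake_zip_open);
  }
  return nullptr;
}

// Single helper that performs the lookup and the failure handling, so
// callers do not repeat the NULL check or invent per-symbol messages.
static generic_fn_t lookup_or_throw(const char* name) {
  generic_fn_t fn = fake_dll_lookup(name);
  if (fn == nullptr) {
    throw std::runtime_error(std::string("function ") + name + " not found");
  }
  return fn;
}

// Usage example: resolve one symbol and call through the typed pointer.
inline int demo_lookup() {
  zip_open_fn_t open_fn =
      reinterpret_cast<zip_open_fn_t>(lookup_or_throw("ZIP_Open"));
  assert(open_fn != nullptr);
  return open_fn();
}
```

Each call site then shrinks to a single cast of the helper's result, and the
error text is derived from the symbol name in one place.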

From suenaga at oss.nttdata.com  Fri Nov  1 09:41:26 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Fri, 1 Nov 2019 18:41:26 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
Message-ID: <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>

Forward to hotspot-runtime-dev.

As David commented in JBS, it may need to be fixed in JFR code.
But it is unclear to me why the thread state is not recovered.

I'd like to hear about this from JFR folks.
If it is just a bug in JFR, I will create a patch which recovers it in JFR code.


Thanks,

Yasumasa


-------- Forwarded Message --------
Subject: RFR: 8233375: JFR emergency dump do not recover thread state
Date: Fri, 1 Nov 2019 17:08:42 +0900
From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
To: hotspot-jfr-dev at openjdk.java.net
CC: yasuenag at gmail.com <yasuenag at gmail.com>

Hi all,

Please review this change:

   JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/

If JFR is running when the JVM crashes, JFR will dump data to hs_err_pid<PID>.jfr.
This is done in prepare_for_emergency_dump().
However, this function transitions the thread state to "_thread_in_vm" and does not restore it.

This change has been tested on submit repo as mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
It failed at compiler/types/correctness/CorrectnessTest.java.
However, this test is for the JIT compiler, and a related issue has been reported as JDK-8225620.
So I think this patch can go through.


Thanks,

Yasumasa
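
One generic way to make sure a temporary state change is undone on every exit
path is a scope guard. The sketch below only illustrates that idea. It is not
the change in the webrev and it is not HotSpot code: FakeThread, ThreadState,
ThreadStateRestorer and emergency_dump() are all hypothetical names.

```cpp
#include <cassert>

// Hypothetical, simplified thread states.
enum class ThreadState { in_native, in_vm };

// Hypothetical stand-in for a thread object.
struct FakeThread {
  ThreadState state;
};

// Scope guard: remembers the state at construction and puts it back
// when the guard goes out of scope.
class ThreadStateRestorer {
 public:
  explicit ThreadStateRestorer(FakeThread* thread)
      : _thread(thread), _saved(thread->state) {}
  ~ThreadStateRestorer() { _thread->state = _saved; }
  ThreadStateRestorer(const ThreadStateRestorer&) = delete;
  ThreadStateRestorer& operator=(const ThreadStateRestorer&) = delete;

 private:
  FakeThread* _thread;
  ThreadState _saved;
};

// Hypothetical dump routine: switches to in_vm for the duration of the
// dump and returns the state that was in effect while dumping.
ThreadState emergency_dump(FakeThread* thread) {
  ThreadStateRestorer restorer(thread);
  thread->state = ThreadState::in_vm;
  assert(thread->state == ThreadState::in_vm);
  return thread->state;  // The guard restores the old state after this.
}
```

Whether the real fix belongs in the caller or in the JFR code itself is the
open question in this thread; the sketch takes no position on that.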

From thomas.stuefe at gmail.com  Fri Nov  1 10:36:41 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Fri, 1 Nov 2019 11:36:41 +0100
Subject: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
Message-ID: <CAA-vtUwwJJtJsc1Um0ruorZs34=zKJdWat+02-fmpkajzAfV3Q@mail.gmail.com>

Hi Yasumasa,

I see that we call JFR::on_vm_shutdown() before error reporting has run. Is that
really necessary? Error reporting should happen as close as possible to the
error point - ideally, as little code as possible should run between the
crash/assert and the generation of the hs-err file. I suggest moving the
call to JFR::on_vm_shutdown()
down to a point after error reporting, e.g. to where we print the NMT
report on shutdown.

Cheers, Thomas


On Fri, Nov 1, 2019 at 10:41 AM Yasumasa Suenaga <suenaga at oss.nttdata.com>
wrote:

> Forward to hotspot-runtime-dev.
>
> As David commented in JBS, it may need to be fixed in JFR code.
> But it is unclear to me why the thread state is not recovered.
>
> I'd like to hear about this from JFR folks.
> If it is just a bug in JFR, I will create a patch which recovers it in JFR
> code.
>
>
> Thanks,
>
> Yasumasa
>
>
> -------- Forwarded Message --------
> Subject: RFR: 8233375: JFR emergency dump do not recover thread state
> Date: Fri, 1 Nov 2019 17:08:42 +0900
> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
> To: hotspot-jfr-dev at openjdk.java.net
> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>
> Hi all,
>
> Please review this change:
>
>    JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>
> If JFR is running when the JVM crashes, JFR will dump data to
> hs_err_pid<PID>.jfr.
> This is done in prepare_for_emergency_dump().
> However, this function transitions the thread state to "_thread_in_vm" and does not restore it.
>
> This change has been tested on submit repo as
> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
> It failed at compiler/types/correctness/CorrectnessTest.java
> However this test is for JIT compiler, and related issue has been reported
> as JDK-8225620.
> So I think this patch can go through.
>
>
> Thanks,
>
> Yasumasa
>

From coleen.phillimore at oracle.com  Fri Nov  1 13:01:33 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Fri, 1 Nov 2019 09:01:33 -0400
Subject: RFR (T) 8229894: SEGV during class loading in
 InstanceKlass::print_class_load_logging
Message-ID: <bf331b30-e7ae-f2c7-6647-cefa2df8f5d3@oracle.com>

Summary: NULL initialize decorations that are not used to make crash 
easier to understand next time.

This doesn't fix the bug (I can't find or reproduce it), but might be 
helpful for debugging if it happens again.

Tested with tier1 all Oracle platforms, and tier2,3 linux-x64-debug.

open webrev at http://cr.openjdk.java.net/~coleenp/2019/8229894.01/webrev
bug link https://bugs.openjdk.java.net/browse/JDK-8229894

Thanks,
Coleen

From shade at redhat.com  Fri Nov  1 13:05:24 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 1 Nov 2019 14:05:24 +0100
Subject: RFR (T) 8229894: SEGV during class loading in
 InstanceKlass::print_class_load_logging
In-Reply-To: <bf331b30-e7ae-f2c7-6647-cefa2df8f5d3@oracle.com>
References: <bf331b30-e7ae-f2c7-6647-cefa2df8f5d3@oracle.com>
Message-ID: <5b7099f9-4850-2214-1315-ed3a93d9eed3@redhat.com>

On 11/1/19 2:01 PM, coleen.phillimore at oracle.com wrote:
> Summary: NULL initialize decorations that are not used to make crash easier to understand next time.
> 
> This doesn't fix the bug (I can't find or reproduce it), but might be helpful for debugging if it
> happens again.

But, if this change does not fix the reported bug, then it should not go under its bug ID?

> open webrev at http://cr.openjdk.java.net/~coleenp/2019/8229894.01/webrev

The patch itself looks good.

-- 
Thanks,
-Aleksey


From coleen.phillimore at oracle.com  Fri Nov  1 13:08:13 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Fri, 1 Nov 2019 09:08:13 -0400
Subject: RFR (T) 8229894: SEGV during class loading in
 InstanceKlass::print_class_load_logging
In-Reply-To: <bf331b30-e7ae-f2c7-6647-cefa2df8f5d3@oracle.com>
References: <bf331b30-e7ae-f2c7-6647-cefa2df8f5d3@oracle.com>
Message-ID: <51cc65fb-59d9-d59c-20c7-16cb1d0d3d39@oracle.com>

This is actually a fix for the trivial change:

8233386: Initialize NULL fields for unused decorations

Thanks,
Coleen

On 11/1/19 9:01 AM, coleen.phillimore at oracle.com wrote:
> Summary: NULL initialize decorations that are not used to make crash 
> easier to understand next time.
>
> This doesn't fix the bug (I can't find or reproduce it), but might be 
> helpful for debugging if it happens again.
>
> Tested with tier1 all Oracle platforms, and tier2,3 linux-x64-debug.
>
> open webrev at http://cr.openjdk.java.net/~coleenp/2019/8229894.01/webrev
> bug link https://bugs.openjdk.java.net/browse/JDK-8229894
>
> Thanks,
> Coleen


From harold.seigel at oracle.com  Fri Nov  1 13:33:08 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Fri, 1 Nov 2019 09:33:08 -0400
Subject: RFR (T) 8229894: SEGV during class loading in
 InstanceKlass::print_class_load_logging
In-Reply-To: <51cc65fb-59d9-d59c-20c7-16cb1d0d3d39@oracle.com>
References: <bf331b30-e7ae-f2c7-6647-cefa2df8f5d3@oracle.com>
 <51cc65fb-59d9-d59c-20c7-16cb1d0d3d39@oracle.com>
Message-ID: <7f2a9b13-fb74-6b22-99e6-d4a3dc7f0338@oracle.com>

Looks good and trivial.

Thanks, Harold

On 11/1/2019 9:08 AM, coleen.phillimore at oracle.com wrote:
> This is actually a fix for the trivial change:
>
> 8233386: Initialize NULL fields for unused decorations
>
> Thanks,
> Coleen
>
> On 11/1/19 9:01 AM, coleen.phillimore at oracle.com wrote:
>> Summary: NULL initialize decorations that are not used to make crash 
>> easier to understand next time.
>>
>> This doesn't fix the bug (I can't find or reproduce it), but might be 
>> helpful for debugging if it happens again.
>>
>> Tested with tier1 all Oracle platforms, and tier2,3 linux-x64-debug.
>>
>> open webrev at 
>> http://cr.openjdk.java.net/~coleenp/2019/8229894.01/webrev
>> bug link https://bugs.openjdk.java.net/browse/JDK-8229894
>>
>> Thanks,
>> Coleen
>

From daniel.daugherty at oracle.com  Fri Nov  1 13:52:33 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 1 Nov 2019 09:52:33 -0400
Subject: RFR (T) 8229894: SEGV during class loading in
 InstanceKlass::print_class_load_logging
In-Reply-To: <7f2a9b13-fb74-6b22-99e6-d4a3dc7f0338@oracle.com>
References: <bf331b30-e7ae-f2c7-6647-cefa2df8f5d3@oracle.com>
 <51cc65fb-59d9-d59c-20c7-16cb1d0d3d39@oracle.com>
 <7f2a9b13-fb74-6b22-99e6-d4a3dc7f0338@oracle.com>
Message-ID: <3191fb6b-9e36-aba4-0389-7ca14b755f9e@oracle.com>

Thumbs up. Please be careful to push the change using:

8233386: Initialize NULL fields for unused decorations

Dan

On 11/1/19 9:33 AM, Harold Seigel wrote:
> Looks good and trivial.
>
> Thanks, Harold
>
> On 11/1/2019 9:08 AM, coleen.phillimore at oracle.com wrote:
>> This is actually a fix for the trivial change:
>>
>> 8233386: Initialize NULL fields for unused decorations
>>
>> Thanks,
>> Coleen
>>
>> On 11/1/19 9:01 AM, coleen.phillimore at oracle.com wrote:
>>> Summary: NULL initialize decorations that are not used to make crash 
>>> easier to understand next time.
>>>
>>> This doesn't fix the bug (I can't find or reproduce it), but might 
>>> be helpful for debugging if it happens again.
>>>
>>> Tested with tier1 all Oracle platforms, and tier2,3 linux-x64-debug.
>>>
>>> open webrev at 
>>> http://cr.openjdk.java.net/~coleenp/2019/8229894.01/webrev
>>> bug link https://bugs.openjdk.java.net/browse/JDK-8229894
>>>
>>> Thanks,
>>> Coleen
>>
>


From coleen.phillimore at oracle.com  Fri Nov  1 14:02:31 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Fri, 1 Nov 2019 10:02:31 -0400
Subject: RFR (T) 8229894: SEGV during class loading in
 InstanceKlass::print_class_load_logging
In-Reply-To: <3191fb6b-9e36-aba4-0389-7ca14b755f9e@oracle.com>
References: <bf331b30-e7ae-f2c7-6647-cefa2df8f5d3@oracle.com>
 <51cc65fb-59d9-d59c-20c7-16cb1d0d3d39@oracle.com>
 <7f2a9b13-fb74-6b22-99e6-d4a3dc7f0338@oracle.com>
 <3191fb6b-9e36-aba4-0389-7ca14b755f9e@oracle.com>
Message-ID: <dd64c55f-5f11-d35a-0ac1-78aa147f65f7@oracle.com>


Thanks Aleksey, Harold and Dan. I fixed the bugid in my commit
comments, which my checkin script picks up. I also regenerated the
webrev, so there's a match.

open webrev at http://cr.openjdk.java.net/~coleenp/2019/8233386.01/webrev
bug link https://bugs.openjdk.java.net/browse/JDK-8233386

Thanks for checking!
Coleen

On 11/1/19 9:52 AM, Daniel D. Daugherty wrote:
> Thumbs up. Please be careful to push the change using:
>
> 8233386: Initialize NULL fields for unused decorations
>
> Dan
>
> On 11/1/19 9:33 AM, Harold Seigel wrote:
>> Looks good and trivial.
>>
>> Thanks, Harold
>>
>> On 11/1/2019 9:08 AM, coleen.phillimore at oracle.com wrote:
>>> This is actually a fix for the trivial change:
>>>
>>> 8233386: Initialize NULL fields for unused decorations
>>>
>>> Thanks,
>>> Coleen
>>>
>>> On 11/1/19 9:01 AM, coleen.phillimore at oracle.com wrote:
>>>> Summary: NULL initialize decorations that are not used to make 
>>>> crash easier to understand next time.
>>>>
>>>> This doesn't fix the bug (I can't find or reproduce it), but might 
>>>> be helpful for debugging if it happens again.
>>>>
>>>> Tested with tier1 all Oracle platforms, and tier2,3 linux-x64-debug.
>>>>
>>>> open webrev at 
>>>> http://cr.openjdk.java.net/~coleenp/2019/8229894.01/webrev
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8229894
>>>>
>>>> Thanks,
>>>> Coleen
>>>
>>
>


From daniel.daugherty at oracle.com  Fri Nov  1 16:08:48 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 1 Nov 2019 12:08:48 -0400
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <4636ebe3-b783-a104-0ea6-23ab7fb47e0a@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <23107754-b388-f3fb-c3a1-0b85523e4049@oracle.com>
 <91674cd2-f9e2-bafe-b3bb-36c66d963de9@oracle.com>
 <a67d549c-b3a5-6da6-ca45-3c64a7a75724@oracle.com>
 <7c82dcf9-e7c3-cb0c-9375-15130848f7d7@oracle.com>
 <efe728a7-1a77-fda3-f598-dd3ced64197e@oracle.com>
 <4636ebe3-b783-a104-0ea6-23ab7fb47e0a@oracle.com>
Message-ID: <bae398b2-ce0f-dd4c-1085-b66966ba4be1@oracle.com>

Hi David,

Thanks for closing the loop on the load_acquire() discussion.
More below...


On 10/31/19 12:00 AM, David Holmes wrote:
> Hi Dan,
>
> First I've seen your additional tracking emails, but for now just want 
> to respond to the load_acquire discussion below ...
>
> On 30/10/2019 6:18 am, Daniel D. Daugherty wrote:
>> Hi David,
>>
>> Sorry for the delay in replying to this email. I spent most of last week
>> migrating my development environment from my dying MBP13 to a new MBP13.
>> I think I have everything recovered, but only time will tell...
>>
>> More below...
>>
>> On 10/23/19 12:00 AM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> Trimming to the open issues.
>>
>> I'll try to do the same (but was only able to trim one thing)...
>>
>>
>>>
>>> On 23/10/2019 6:46 am, Daniel D. Daugherty wrote:
>>>> Okay, time to return to my uses of release_store() and load_acquire().
>>>> Consider this bit of code:
>>>>
>>>> src/hotspot/share/runtime/synchronizer.cpp:
>>>>
>>>>      L351: // Take an ObjectMonitor from the start of the specified list. Also
>>>>      L352: // decrements the specified counter. Returns NULL if none are available.
>>>>      L353: static ObjectMonitor* take_from_start_of_common(ObjectMonitor* volatile * list_p,
>>>>      L354:                                                 int volatile * count_p) {
>>>>      L355:   ObjectMonitor* next = NULL;
>>>>      L356:   ObjectMonitor* take = NULL;
>>>>      L357:   // Mark the list head to guard against A-B-A race:
>>>>      L358:   if (!mark_list_head(list_p, &take, &next)) {
>>>>      L359:     return NULL;  // None are available.
>>>>      L360:   }
>>>>      L361:   // Switch marked list head to next (which unmarks the list head, but
>>>>      L362:   // leaves take marked):
>>>>      L363:   OrderAccess::release_store(list_p, next);
>>>>      L364:   Atomic::dec(count_p);
>>>>      L365:   // Unmark take, but leave the next value for any lagging list
>>>>      L366:   // walkers. It will get cleaned up when take is prepended to
>>>>      L367:   // the in-use list:
>>>>      L368:   set_next(take, next);
>>>>      L369:   return take;
>>>>      L370: }
>>>>
>>>>
>>>> On L358, we call mark_list_head() and if that call returns true, then
>>>> the calling thread is the "owner" of the list head. My idea is that
>>>> once my thread "owns" the list head, I can use release_store() as a
>>>> smaller hammer than cmpxchg() to sync out the new value ('next') as
>>>> the new list head.
>>>>
>>>> Should I be doing something other than release_store() here?
>>>>
>>>> My thinking is that release_store() pairs up with the load_acquire()
>>>> of any reader thread that's trying to simply walk the list. And that
>>>> release_store() also pairs up with any _other_ writer thread that's
>>>> trying to mark the list head using:
>>>>
>>>>      mark_list_head() -> mark_next() -> cmpxchg()
>>>>
>>>> Here's mark_list_head():
>>>>
>>>>      L194: // Mark the next field in the list head ObjectMonitor. If marking was
>>>>      L195: // successful, then the mid and the unmarked next field are returned
>>>>      L196: // via parameter and true is returned. Otherwise false is returned.
>>>>      L197: static bool mark_list_head(ObjectMonitor* volatile * list_p,
>>>>      L198:                            ObjectMonitor** mid_p, ObjectMonitor** next_p) {
>>>>      L199:   while (true) {
>>>>      L200:     ObjectMonitor* mid = OrderAccess::load_acquire(list_p);
>>>>      L201:     if (mid == NULL) {
>>>>      L202:       return false;  // The list is empty so nothing to mark.
>>>>      L203:     }
>>>>      L204:     if (mark_next(mid, next_p)) {
>>>>      L205:       if (OrderAccess::load_acquire(list_p) != mid) {
>>>>      L206:         // The list head changed so we have to retry.
>>>>      L207:         set_next(mid, *next_p);  // unmark mid
>>>>      L208:         continue;
>>>>      L209:       }
>>>>      L210:       // We marked next field to guard against races.
>>>>      L211:       *mid_p = mid;
>>>>      L212:       return true;
>>>>      L213:     }
>>>>      L214:   }
>>>>      L215: }
>>>>
>>>> which does a matching load_acquire() to get the current list head
>>>> and calls mark_next() to try and mark it. It then calls load_acquire()
>>>> again to verify that the list head hasn't changed while doing the
>>>> mark_next().
>>>>
>>>> So here's mark_next():
>>>>
>>>>   161 // Mark the next field in an ObjectMonitor. If marking was successful,
>>>>   162 // then the unmarked next field is returned via parameter and true is
>>>>   163 // returned. Otherwise false is returned.
>>>>   164 static bool mark_next(ObjectMonitor* om, ObjectMonitor** next_p) {
>>>>   165   // Get current next field without any marking value.
>>>>   166   ObjectMonitor* next = (ObjectMonitor*)
>>>>   167       ((intptr_t)OrderAccess::load_acquire(&om->_next_om) & ~0x1);
>>>>   168   if (Atomic::cmpxchg(mark_om_ptr(next), &om->_next_om, next) != next) {
>>>>   169     return false;  // Could not mark the next field or it was already marked.
>>>>   170   }
>>>>   171   *next_p = next;
>>>>   172   return true;
>>>>   173 }
>>>>
>>>> We use load_acquire() on L166-7 to make sure that we have the latest
>>>> value in the next_om field. That load_acquire() matches up with the
>>>> release_store() done by another thread to unmark the next field (see
>>>> the set_next() function). Or it matches with the cmpxchg() done by
>>>> another to mark the next field. Yes, we could do a regular load when
>>>> we know that the field is only changed by cmpxchg(), but it could
>>>> have been changed by release_store() and my thinking is that we need
>>>> a matching load_acquire().
>>>>
>>>> We have to use cmpxchg() here to try and set the mark on the next 
>>>> field.
>>>> If successful, then we return the value of the unmarked next field via
>>>> the 'next_p' param and we return true. However, just because we marked
>>>> the next field in the ObjectMonitor doesn't mean that we marked the 
>>>> list
>>>> head. That's what the second load_acquire() on L205 above verifies.
>>>>
>>>>
>>>> Okay, that was pretty gory on the details of my thinking! I think it's
>>>> pretty clear that I'm doing load_acquire()/release_store() pairs on
>>>> the list head pointers or the next_om fields to make sure that I match
>>>> loads and stores of _just_ the list head pointer or the next_om field
>>>> in question. I'm not trying to affect other fields in the ObjectMonitor
>>>> or at least I don't think I have to do that.
>>>>
>>>> Maybe I've misunderstood the load_acquire()/release_store() stuff 
>>>> (again)
>>>> and this code is doing way too much! I would be happy to have my 
>>>> thinking
>>>> about this load_acquire()/release_store() stuff corrected and the code
>>>> simplified. Maybe in this sequence:
>>>>
>>>>      mark_list_head() -> mark_next() -> cmpxchg()
>>>>
>>>> we don't need all the load_acquire() calls because the sequence 
>>>> only ends
>>>> successfully with a cmpxchg() and that makes memory happy 
>>>> everywhere with
>>>> just simple loads. Dunno. I definitely need more feedback here!
>>>
>>> Okay that was a massive deep dive :) but it is exactly what I was 
>>> referring to when I said:
>>>
>>> "My main comment on all the list management code pertains to the
>>> difficulty in seeing which release_store is paired with which
>>> load_acquire, and why."
>>
>> Yup. And that comment is exactly why I wrote the "massive deep dive"
>> reply... so I _think_ I replied to the comment and thus proved the
>> point that this stuff is difficult all at the same time. :-)
>>
>>
>>> To understand the use of acquire/release/cmpxchg in detail you 
>>> really need to see all that code inlined/flattened so that you can 
>>> see how they actually arrange themselves.
>>
>> Agreed...
>
> To be clear I can't see the code that way so it is difficult to give 
> concrete opinions on appropriateness of load_acquire/release_store at 
> individual code locations.
>
>>
>>> I'm immediately suspicious of a load_acquire that is just before a 
>>> cmpxchg on the same field because the cmpxchg will subsume any 
>>> memory effects of the load-acquire.
>>
>> Okay, I'm trying to figure out if this comment is meant for L166, L167
>> and L168 of mark_next():
>>
>>    165   // Get current next field without any marking value.
>>    166   ObjectMonitor* next = (ObjectMonitor*)
>>    167       ((intptr_t)OrderAccess::load_acquire(&om->_next_om) & ~0x1);
>>    168   if (Atomic::cmpxchg(mark_om_ptr(next), &om->_next_om, next) != next) {
>>    169     return false;  // Could not mark the next field or it was already marked.
>>    170   }
>>    171   *next_p = next;
>>    172   return true;
>>
>> or if this is just a general comment or both... :-)
>
> Both.
>
>>
>>> But the devil is in the detail.
>>
>> Agreed and with this project there are so many details. :-(
>>
>>
>>> If a field can be set by release_store, and there are previous 
>>> stores related to that which must be visible, then a read of that 
>>> field can't be via a plain load (but either load_acquire or as part 
>>> of cmpxchg).
>>
>> As the comment on L166 says, the goal of L167 and L168 is to get the
>> current next field without any marking value. Since the next field 
>> can be
>> changed by either cmpxchg() or release_store() and we have no idea which
>> one made the most recent update, we have to use a load_acquire() to get
>> the latest value of the next field.
>>
>> As for the cmpxchg() on L168, the potential update operation has to be
>> done with cmpxchg() in order to safely and exclusively allow only one
>> thread to mark the ObjectMonitor's next field (or no threads if the next
>> field is already marked).
>>
>> So... I think I have good reasons for a load_acquire() to be immediately
>> followed by a cmpxchg(), in this case anyway.
>
> Actually no. :) The purpose of a load_acquire is to ensure that 
> subsequent loads of other variables will return values written before 
> the release_store that is paired with the load_acquire. In this code 
> there are no subsequent loads between the load_acquire and the 
> cmpxchg. The cmpxchg itself provides full bi-directional fence 
> semantics, so it deals with any loads after the cmpxchg. Hence the 
> acquire part of the load_acquire is simply not needed.

So instead of this:

   166   ObjectMonitor* next = (ObjectMonitor*)
   167       ((intptr_t)OrderAccess::load_acquire(&om->_next_om) & ~0x1);
   168   if (Atomic::cmpxchg(mark_om_ptr(next), &om->_next_om, next) != next) {

we can do the simpler:

   166   ObjectMonitor* next = (ObjectMonitor*)((intptr_t)om->_next_om & ~0x1);

and the simple load of the _next_om field will do the trick. Okay,
I'll include that change in my next round of v2.08 testing.
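
For anyone following along outside HotSpot, here is a minimal sketch of
that load-then-cmpxchg shape in standard C++. It is not the HotSpot code:
Node, kMarkBit, mark_ptr() and try_mark_next() are hypothetical names,
std::atomic stands in for OrderAccess/Atomic, and the default seq_cst
compare-exchange is only the closest standard analogue of the fully
fenced Atomic::cmpxchg() described above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ObjectMonitor; only the next field matters here.
struct Node {
  std::atomic<std::uintptr_t> next_om{0};  // low bit doubles as the "marked" flag
};

constexpr std::uintptr_t kMarkBit = 0x1;

inline std::uintptr_t mark_ptr(std::uintptr_t p) { return p | kMarkBit; }

// Try to mark n's next field. On success, store the unmarked next value in
// *next_p and return true. Return false if the field was already marked or
// another thread changed it between the load and the compare-exchange.
bool try_mark_next(Node* n, std::uintptr_t* next_p) {
  assert(n != nullptr && next_p != nullptr);
  // Relaxed load: nothing is read through this value before the
  // compare-exchange, so no acquire ordering is requested here.
  std::uintptr_t next = n->next_om.load(std::memory_order_relaxed) & ~kMarkBit;
  // The compare-exchange uses the default seq_cst ordering, so a successful
  // exchange orders the loads that follow it.
  if (!n->next_om.compare_exchange_strong(next, mark_ptr(next))) {
    return false;
  }
  *next_p = next;
  return true;
}
```

A second call on the same node fails because the stored value now carries
the mark bit while the expected value has it stripped, which is the "only
one thread gets to mark" property the cmpxchg() is there for.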


>
>>
>> Of course, I'm assuming that this is the load_acquire(), cmpxchg() pair
>> that you are suspicious of. It could be some other pair or it could be
>> the pairs in general.
>
> Anywhere there are no intervening loads, between the load_acquire, and 
> the cmpxchg, the acquire part is not needed.

I didn't find any other load_acquire(), cmpxchg() pairs when I looked
the other day.


>
>>
>>> But there are so many fields at play here that it is very difficult 
>>> to keep track of everything.
>>>
>>> You mention above:
>>>
>>> "My idea is that once my thread "owns" the list head, I can use 
>>> release_store() as a smaller hammer than cmpxchg() to sync out the 
>>> new value ('next') as the new list head."
>>>
>>> and I tend to agree that the second cmpxchg is certainly not needed 
>>> as there cannot be any contended update after you've marked the 
>>> "next" field.
>>
>> Not quite as strong of an agreement as I was hoping for, but I'll 
>> take it. :-)
>>
>> Is there a different memory update/sync operation I should be using
>> for my "smaller hammer"? I don't think so, but I could be missing
>> something...
>>
>> For example, I don't think I can switch from 
>> OrderAccess::release_store()
>> to Atomic::store() and from OrderAccess::load_acquire() to 
>> Atomic::load()
>> because I'll lose the {release, acquire} matching up component.
>
> Well you wouldn't need Atomic operations regardless.
>
> Whether you could use plain load/store depends, as we've already 
> agreed, on whether other accesses are dependent on the release/acquire 
> semantics. And I can't see whether they are or not with the current 
> structure of the code, so I have no concrete suggestions here, so let's 
> move on.

Moving on.


>
>>
>> I also need to clarify one thing. In my reply to your previous set of
>> comments, I wrote:
>>
>>> I think it's
>>> pretty clear that I'm doing load_acquire()/release_store() pairs on
>>> the list head pointers or the next_om fields to make sure that I match
>>> loads and stores of _just_ the list head pointer or the next_om field
>>> in question. I'm not trying to affect other fields in the ObjectMonitor
>>> or at least I don't think I have to do that.
>>
>> Now that I've had time to think about it, I have to retract that last
>> sentence. When doing these lock free list manipulations, we often will
>> change some other field in the ObjectMonitor as part of that 
>> manipulation.
>>
>> For example, as part of deflation, we will clear the _object field when
>> we extract the ObjectMonitor from the containing in-use list and before
>> we prepend it to the global free list. A release_store() is used to 
>> update
>> the next field in the (cur_mid_in_use) ObjectMonitor that refers to the
>> deflated ObjectMonitor. That release_store() is important for sync'ing
>> out the clearing of the _object field because once the deflated
>> ObjectMonitor is unlinked from an in-use list, that field is no longer
>> updated by GC.
>
> Agreed.
>
>>
>>> In essence the marking is implementing a spin lock
>>
>> I like the "spin lock" analogy.
>>
>>
>>> so once you have the "lock" you can use normal load/stores to make 
>>> updates
>>
>> Yup. I get it, e.g., we mark an ObjectMonitor so that we can deflate it
>> and part of deflation is making some updates...
>>
>>
>>> - but that assumes all updates occur under the "lock", that 
>>> releasing the "lock" also has the right barriers and that any reads 
>>> sync correctly with the writes. If any of that doesn't hold then 
>>> release_store and load_acquire will be needed rather than plain 
>>> stores and loads.
>>
>> I _think_ that all updates that we make as part of list manipulation are
>> done after marking the ObjectMonitor's next field and before we unmark
>> that same next field. So I _think_ the "lock" is properly held...
>>
>> Of course, it's always possible that I missed something, but that's what
>> we have code reviewers for right? :-)
>
> I love your sense of humour ;-)

I try... it's helpful to have a sense of humour with ObjectMonitors... :-)
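
The spin lock analogy quoted above can also be sketched in standard C++.
This is an illustration under assumptions, not the ObjectMonitor code:
Monitor, lock_next() and unlock_next() are hypothetical names, the mark
bit in the next field plays the role of the lock, and the release store
that unmarks the field is what publishes the plain writes made while the
mark was held.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ObjectMonitor.
struct Monitor {
  std::atomic<std::uintptr_t> next_om{0};  // low bit set means "marked" (locked)
  void* object = nullptr;                  // plain field, written only while marked
};

constexpr std::uintptr_t kMarkBit = 0x1;

// Spin until this thread is the one that sets the mark bit.
// Returns the unmarked next value that was in the field.
std::uintptr_t lock_next(Monitor* m) {
  for (;;) {
    std::uintptr_t next = m->next_om.load(std::memory_order_relaxed) & ~kMarkBit;
    // Acquire on success: writes published by the previous unlock_next()
    // are visible to this thread once the mark is taken.
    if (m->next_om.compare_exchange_weak(next, next | kMarkBit,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed)) {
      return next;
    }
  }
}

// Drop the mark by storing an unmarked value. The release ordering makes
// the plain writes done under the mark visible to the next thread that
// marks the field or load-acquires it.
void unlock_next(Monitor* m, std::uintptr_t new_next) {
  assert((new_next & kMarkBit) == 0);
  m->next_om.store(new_next, std::memory_order_release);
}
```

This only holds together if every writer of the plain fields goes through
lock_next()/unlock_next() and every reader either does the same or reads
the next field with acquire ordering, which is the caveat raised in the
quoted text.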


>
>>
>>> So again you'd need to look at all the flattened code paths to see 
>>> how this all hangs together.
>>
>> For my load_acquire() calls I have matched them up with the 
>> corresponding
>> release_store() call. And in some cases the load_acquire() call site 
>> matches
>> up with a call site that does release_store() on one branch and 
>> cmpxchg()
>> on the other. Yes, complicated and mind numbing... Sorry about that...
>>
>> I've also looked at the release_store() calls and matched them up with a
>> corresponding load_acquire().
>>
>> However, I've looked at this code so many different times that I 
>> might be
>> blind to some aspects of it so another pair of eyes looking for missing
>> matches is always welcome.
>
> Without seeing an inlined/flattened version of the code it's near 
> impossible (for me at least) to see the code paths in detail.
>
>>
>>>
>>>>> src/hotspot/share/runtime/serviceThread.cpp
>>>>>
>>>>> I have some concerns that the serviceThread must now wakeup 
>>>>> periodically instead of only on demand. It has a lot of tasks to 
>>>>> check now and it is not obvious how complex or trivial those 
>>>>> checks are. We don't currently gather any statistics in this area 
>>>>> so we have no idea how long the serviceThread typically waits 
>>>>> between requests. The fact the serviceThread is woken on demand 
>>>>> means that the desire to be "checked at the same interval" seems 
>>>>> exceedingly unlikely - the serviceThread would have to be waiting for 
>>>>> extended periods of time for the timed-wait to have any real 
>>>>> effect on the frequency of the async monitor deflation.
>>>>
>>>> Periodically waking up the ServiceThread is done to match the 
>>>> safepoint
>>>> cleanup period of GuaranteedSafepointInterval when doing safepoint 
>>>> based
>>>> deflation. Without waking up periodically, we could go a long time 
>>>> before
>>>> doing any deflation via the ServiceThread and that would definitely be
>>>> an observable change in behavior relative to safepoint based 
>>>> deflation.
>>>> In some of my test runs, I had seen us go for 8-27 seconds without 
>>>> doing
>>>> any async monitor deflation. Of course, that meant that there was 
>>>> more to
>>>> cleanup when we did do it.
>>>
>>> How does that occur? If the ServiceThread is woken when there are 
>>> monitors to deflate then that implies that if it was not woken then 
>>> there were none to deflate. Or turning it around, if after waiting 
>>> for GuaranteedSafepointInterval ms the Service Thread finds there 
>>> are monitors to deflate why wasn't it directly notified of that?
>>
>> I think there's a bit of confusion here.
>>
>> We don't wake up the ServiceThread when there are ObjectMonitors to
>> deflate. We wake up the ServiceThread to see if it can deflate any
>> ObjectMonitors. ObjectMonitor idleness or suitability to be deflated is
>> determined by the ServiceThread. ObjectMonitor idleness is not a 
>> property
>> determined by another part of the system and the ServiceThread is woken
>> to handle it.
>
> Right my mistake.
>
> That's it for this email.

And that's it for me. Thanks again for closing the loop on this
load_acquire() stuff.

Dan


>
> Thanks,
> David
> -----
>
>>
>>>
>>>> I've been thinking about having my own AsyncDeflateIdleMonitorsThread 
>>>> and moving
>>>> all the async deflation code from ServiceThread there. So many 
>>>> things are
>>>> being added to the ServiceThread that I do wonder whether it is 
>>>> overloaded,
>>>> but (as you said) there's no way to know that (yet).
>>>
>>> Some stuff just got moved out of the ServiceThread to a new 
>>> NotificationThread (which is a non-hidden Java thread), so the 
>>> ServiceThread has a little more capacity (potentially) to handle 
>>> async monitor deflation. But yes if the ServiceThread becomes a 
>>> bottleneck it will defeat the purpose of offloading work from the 
>>> VMThread at safepoints.
>>
>> The other "benefit" of having a dedicated AsyncDeflateIdleMonitorsThread
>> is that it would be easier for us to determine how much actual work is
>> done by the whole ObjectMonitor deflation procedure. We can say XX% of
>> the VM's execution time went to the AsyncDeflateIdleMonitorsThread.
>>
>> Also, the work moved out of the ServiceThread is not an "always on" 
>> thing
>> so the help for async deflation is limited at best.
>>
>>
>>>
>>>> BTW, I think GuaranteedSafepointInterval might be "broken" at this 
>>>> point
>>>> in that we don't guarantee a safepoint at that interval anymore. 
>>>> This has
>>>> kind of slipped through the cracks...
>>>
>>> We can look at that separately.
>>
>> Agreed. Not directly related to the Async Monitor Deflation project
>> except as the explanation for how I went so long without seeing an
>> async monitor deflation... Without that cleanup safepoint happening
>> every GuaranteedSafepointInterval (and without the wait() timeout 
>> value),
>> we don't have ObjectSynchronizer::do_safepoint_work() periodically
>> waking up the ServiceThread to do an async monitor deflation sweep...
>>
>>
>>
>>>
>>>>
>>>>> ---
>>>>>
>>>>> src/hotspot/share/runtime/thread.cpp
>>>>>
>>>>> No changes to this file.
>>>>>
>>>>> ---
>>>>>
>>>>> test/jdk/tools/jlink/multireleasejar/JLinkMultiReleaseJarTest.java
>>>>>
>>>>> No changes to this file.
>>>>
>>>> Correct. I'm unsure how to deal with this type of thing. This project
>>>> is managed as a series of patches. Earlier patches changed those 
>>>> files,
>>>> but a later patch undid those changes. So depending on where you are
>>>> in the patch queue, the logical 'files.list' contents are different.
>>>> I'm currently using a 'files.list' that reflects every file touched by
>>>> the current set of patches and that's why those two files are included
>>>> with no changes.
>>>>
>>>> Suggestions?
>>>
>>> When you think you have the final version ready for review then you 
>>> should flatten out the patches (I don't know the right terminology) 
>>> so that the final webrev is accurate.
>>
>> Good point. I've been so focused on different reviewers wanting to come
>> at this mass of code from different phases of the development process
>> that I kept all the various version patches intact. I think we're
>> past the point where we are going to back up to, e.g., v2.05, and then
>> do a completely different lock free monitor list as a new v2.06.
>>
>> Okay. When we roll to v2.08, v2.07 and predecessors will get 
>> flattened so
>> that we have an easier review process.
>>
>>
>>>
>>>> Thanks for the partial review! I'm looking forward to resolving the
>>>> queries about and your next set of review comments.
>>>
>>> I will try to look at the ObjectMonitor changes themselves this 
>>> afternoon. So many non-trivial reviews/discussions at the moment, 
>>> plus my own work :)
>>
>> Thanks for spending so much time and energy on this code.
>>
>> Dan
>>
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>> ---
>>>>>
>>>>>
>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> The Async Monitor Deflation project is reaching the end game. I 
>>>>>>> have no
>>>>>>> changes planned for the project at this time so all that is left 
>>>>>>> is code
>>>>>>> review and any changes that result from those reviews.
>>>>>>>
>>>>>>> Carsten and Roman! Time for you guys to chime in again on the 
>>>>>>> code reviews.
>>>>>>>
>>>>>>> I have attached the list of fixes from CR6 to CR7 instead of 
>>>>>>> putting it
>>>>>>> in the main body of this email.
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-14+19.
>>>>>>>
>>>>>>> Here's the full webrev URL for those folks that want to see all 
>>>>>>> of the
>>>>>>> current Async Monitor Deflation code in one go (v2.07 full):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full 
>>>>>>>
>>>>>>>
>>>>>>> Some folks might want to see just what has changed since the 
>>>>>>> last review
>>>>>>> cycle so here's a webrev for that (v2.07 inc):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/ 
>>>>>>>
>>>>>>>
>>>>>>> The OpenJDK wiki has been updated to match the 
>>>>>>> CR7/v2.07/10-for-jdk14 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>
>>>>>>>
>>>>>>> The jdk-14+18 based v2.07 version of the patch has been thru 
>>>>>>> Mach5 tier[1-8]
>>>>>>> testing on Oracle's usual set of platforms. It has also been 
>>>>>>> through my usual
>>>>>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with 
>>>>>>> the addition
>>>>>>> of Robbin's "MoCrazy 1024" test running in parallel with the 
>>>>>>> other tests in
>>>>>>> my lab.
>>>>>>>
>>>>>>> The jdk-14+19 based v2.07 version of the patch has been thru 
>>>>>>> Mach5 tier[1-3]
>>>>>>> test on Oracle's usual set of platforms. Mach5 tier[4-8] are in 
>>>>>>> process.
>>>>>>>
>>>>>>> I did another round of SPECjbb2015 testing in Oracle's Aurora 
>>>>>>> Performance lab
>>>>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>>>>>>
>>>>>>>     - "base" is jdk-14+18
>>>>>>>     - "v2.07" is the latest version and includes C2
>>>>>>>       inc_om_ref_count() support on LP64 X64 and the new
>>>>>>>       HandshakeAfterDeflateIdleMonitors option
>>>>>>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>>>>>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors
>>>>>>>       specified
>>>>>>>
>>>>>>>          hbIR           hbIR
>>>>>>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>>>>>     ---------------  ---------  --------  -------------  -------
>>>>>>>            34282.00   30635.90  28831.30       20969.20  3841.30  base
>>>>>>>            34282.00   30973.00  29345.80       21025.20  3964.10  v2.07
>>>>>>>            34282.00   31105.60  29174.30       21074.00  3931.30  v2.07_handshake
>>>>>>>            34282.00   30789.70  27151.60       19839.10  3850.20  v2.07_off
>>>>>>>
>>>>>>>     - The Aurora Perf comparison tool reports:
>>>>>>>
>>>>>>>         Comparison              max-jOPS              critical-jOPS
>>>>>>>         ----------------------  --------------------  --------------------
>>>>>>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>>>>>>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>>>>>>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>>>>>>>
>>>>>>>         (s) - significant  (ns) - not-significant
>>>>>>>
>>>>>>>     - For historical comparison, the Aurora Perf comparison tool
>>>>>>>       reported for v2.06 with a baseline of jdk-13+31:
>>>>>>>
>>>>>>>         Comparison              max-jOPS              critical-jOPS
>>>>>>>         ----------------------  --------------------  --------------------
>>>>>>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>>>>>>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>>>>>>>
>>>>>>>         (s) - significant  (ns) - not-significant
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>
>>>> <trimmed older review invites>
>>>>
>>


From calvin.cheung at oracle.com  Fri Nov  1 16:30:47 2019
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Fri, 1 Nov 2019 09:30:47 -0700
Subject: RFR(T) 8233363: Clarify the DumpSharedSpaces condition in
 InstanceKlass::verify_on
Message-ID: <d470f375-a4ac-b9f2-8f45-a65e7fbe2b39@oracle.com>

bug: https://bugs.openjdk.java.net/browse/JDK-8233363

Summary: change DumpSharedSpaces to Arguments::is_dumping_archive().

bash-4.2$ hg diff src/hotspot/share/oops/instanceKlass.cpp
diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp
--- a/src/hotspot/share/oops/instanceKlass.cpp
+++ b/src/hotspot/share/oops/instanceKlass.cpp
@@ -3626,7 +3626,7 @@
     Array<int>* method_ordering = this->method_ordering();
     int length = method_ordering->length();
     if (JvmtiExport::can_maintain_original_method_order() ||
-        ((UseSharedSpaces || DumpSharedSpaces) && length != 0)) {
+        ((UseSharedSpaces || Arguments::is_dumping_archive()) && length != 0)) {
       guarantee(length == methods()->length(), "invalid method ordering length");
       jlong sum = 0;
       for (int j = 0; j < length; j++) {

Ran CDS and AppCDS tests locally on linux-x64.

thanks,

Calvin


From IOI.LAM at ORACLE.COM  Fri Nov  1 16:44:05 2019
From: IOI.LAM at ORACLE.COM (Ioi Lam)
Date: Fri, 1 Nov 2019 09:44:05 -0700
Subject: RFR(T) 8233363: Clarify the DumpSharedSpaces condition in
 InstanceKlass::verify_on
In-Reply-To: <d470f375-a4ac-b9f2-8f45-a65e7fbe2b39@oracle.com>
References: <d470f375-a4ac-b9f2-8f45-a65e7fbe2b39@oracle.com>
Message-ID: <FE3FB5BD-49BB-42E5-A174-50BB1350E4AC@ORACLE.COM>

Hi Calvin, this looks good and trivial to me.

Thanks
Ioi

Sent from my iPad

> On Nov 1, 2019, at 9:31 AM, Calvin Cheung <Calvin.Cheung at oracle.com> wrote:
> 
> bug: https://bugs.openjdk.java.net/browse/JDK-8233363
> 
> Summary: change DumpSharedSpaces to Arguments::is_dumping_archive().
> 
> bash-4.2$ hg diff src/hotspot/share/oops/instanceKlass.cpp
> diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp
> --- a/src/hotspot/share/oops/instanceKlass.cpp
> +++ b/src/hotspot/share/oops/instanceKlass.cpp
> @@ -3626,7 +3626,7 @@
>      Array<int>* method_ordering = this->method_ordering();
>      int length = method_ordering->length();
>      if (JvmtiExport::can_maintain_original_method_order() ||
> -        ((UseSharedSpaces || DumpSharedSpaces) && length != 0)) {
> +        ((UseSharedSpaces || Arguments::is_dumping_archive()) && length != 0)) {
>        guarantee(length == methods()->length(), "invalid method ordering length");
>        jlong sum = 0;
>        for (int j = 0; j < length; j++) {
> 
> Ran CDS and AppCDS tests locally on linux-x64.
> 
> thanks,
> 
> Calvin
> 


From calvin.cheung at oracle.com  Fri Nov  1 17:15:44 2019
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Fri, 1 Nov 2019 10:15:44 -0700
Subject: RFR(T) 8233363: Clarify the DumpSharedSpaces condition in
 InstanceKlass::verify_on
In-Reply-To: <FE3FB5BD-49BB-42E5-A174-50BB1350E4AC@ORACLE.COM>
References: <d470f375-a4ac-b9f2-8f45-a65e7fbe2b39@oracle.com>
 <FE3FB5BD-49BB-42E5-A174-50BB1350E4AC@ORACLE.COM>
Message-ID: <558dff84-5b32-c215-e60b-56543f86db46@oracle.com>

Thanks!

On 11/1/19 9:44 AM, Ioi Lam wrote:
> Hi Calvin, this looks good and trivial to me.
>
> Thanks
> Ioi
>
> Sent from my iPad
>
>> On Nov 1, 2019, at 9:31 AM, Calvin Cheung <Calvin.Cheung at oracle.com> wrote:
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8233363
>>
>> Summary: change DumpSharedSpaces to Arguments::is_dumping_archive().
>>
>> bash-4.2$ hg diff src/hotspot/share/oops/instanceKlass.cpp
>> diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp
>> --- a/src/hotspot/share/oops/instanceKlass.cpp
>> +++ b/src/hotspot/share/oops/instanceKlass.cpp
>> @@ -3626,7 +3626,7 @@
>>       Array<int>* method_ordering = this->method_ordering();
>>       int length = method_ordering->length();
>>       if (JvmtiExport::can_maintain_original_method_order() ||
>> -        ((UseSharedSpaces || DumpSharedSpaces) && length != 0)) {
>> +        ((UseSharedSpaces || Arguments::is_dumping_archive()) && length != 0)) {
>>         guarantee(length == methods()->length(), "invalid method ordering length");
>>         jlong sum = 0;
>>         for (int j = 0; j < length; j++) {
>>
>> Ran CDS and AppCDS tests locally on linux-x64.
>>
>> thanks,
>>
>> Calvin
>>

From coleen.phillimore at oracle.com  Fri Nov  1 17:27:46 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Fri, 1 Nov 2019 13:27:46 -0400
Subject: RFR(T) 8233363: Clarify the DumpSharedSpaces condition in
 InstanceKlass::verify_on
In-Reply-To: <558dff84-5b32-c215-e60b-56543f86db46@oracle.com>
References: <d470f375-a4ac-b9f2-8f45-a65e7fbe2b39@oracle.com>
 <FE3FB5BD-49BB-42E5-A174-50BB1350E4AC@ORACLE.COM>
 <558dff84-5b32-c215-e60b-56543f86db46@oracle.com>
Message-ID: <77d65ee2-4128-2485-fe2e-01a44f0df13b@oracle.com>

Me too. thanks for fixing this one.
Coleen

On 11/1/19 1:15 PM, Calvin Cheung wrote:
> Thanks!
>
> On 11/1/19 9:44 AM, Ioi Lam wrote:
>> Hi Calvin, this looks good and trivial to me.
>>
>> Thanks
>> Ioi
>>
>> Sent from my iPad
>>
>>> On Nov 1, 2019, at 9:31 AM, Calvin Cheung <Calvin.Cheung at oracle.com> 
>>> wrote:
>>>
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8233363
>>>
>>> Summary: change DumpSharedSpaces to Arguments::is_dumping_archive().
>>>
>>> bash-4.2$ hg diff src/hotspot/share/oops/instanceKlass.cpp
>>> diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp
>>> --- a/src/hotspot/share/oops/instanceKlass.cpp
>>> +++ b/src/hotspot/share/oops/instanceKlass.cpp
>>> @@ -3626,7 +3626,7 @@
>>>      Array<int>* method_ordering = this->method_ordering();
>>>      int length = method_ordering->length();
>>>      if (JvmtiExport::can_maintain_original_method_order() ||
>>> -        ((UseSharedSpaces || DumpSharedSpaces) && length != 0)) {
>>> +        ((UseSharedSpaces || Arguments::is_dumping_archive()) && length != 0)) {
>>>        guarantee(length == methods()->length(), "invalid method ordering length");
>>>        jlong sum = 0;
>>>        for (int j = 0; j < length; j++) {
>>>
>>> Ran CDS and AppCDS tests locally on linux-x64.
>>>
>>> thanks,
>>>
>>> Calvin
>>>


From calvin.cheung at oracle.com  Fri Nov  1 18:09:52 2019
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Fri, 1 Nov 2019 11:09:52 -0700
Subject: RFR(T) 8233363: Clarify the DumpSharedSpaces condition in
 InstanceKlass::verify_on
In-Reply-To: <77d65ee2-4128-2485-fe2e-01a44f0df13b@oracle.com>
References: <d470f375-a4ac-b9f2-8f45-a65e7fbe2b39@oracle.com>
 <FE3FB5BD-49BB-42E5-A174-50BB1350E4AC@ORACLE.COM>
 <558dff84-5b32-c215-e60b-56543f86db46@oracle.com>
 <77d65ee2-4128-2485-fe2e-01a44f0df13b@oracle.com>
Message-ID: <b7d5e075-e645-7505-5805-9925ac746bb8@oracle.com>

Thanks, Coleen!

On 11/1/19 10:27 AM, coleen.phillimore at oracle.com wrote:
> Me too. thanks for fixing this one.
> Coleen
>
> On 11/1/19 1:15 PM, Calvin Cheung wrote:
>> Thanks!
>>
>> On 11/1/19 9:44 AM, Ioi Lam wrote:
>>> Hi Calvin, this looks good and trivial to me.
>>>
>>> Thanks
>>> Ioi
>>>
>>> Sent from my iPad
>>>
>>>> On Nov 1, 2019, at 9:31 AM, Calvin Cheung 
>>>> <Calvin.Cheung at oracle.com> wrote:
>>>>
>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8233363
>>>>
>>>> Summary: change DumpSharedSpaces to Arguments::is_dumping_archive().
>>>>
>>>> bash-4.2$ hg diff src/hotspot/share/oops/instanceKlass.cpp
>>>> diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp
>>>> --- a/src/hotspot/share/oops/instanceKlass.cpp
>>>> +++ b/src/hotspot/share/oops/instanceKlass.cpp
>>>> @@ -3626,7 +3626,7 @@
>>>>      Array<int>* method_ordering = this->method_ordering();
>>>>      int length = method_ordering->length();
>>>>      if (JvmtiExport::can_maintain_original_method_order() ||
>>>> -        ((UseSharedSpaces || DumpSharedSpaces) && length != 0)) {
>>>> +        ((UseSharedSpaces || Arguments::is_dumping_archive()) && length != 0)) {
>>>>        guarantee(length == methods()->length(), "invalid method ordering length");
>>>>        jlong sum = 0;
>>>>        for (int j = 0; j < length; j++) {
>>>>
>>>> Ran CDS and AppCDS tests locally on linux-x64.
>>>>
>>>> thanks,
>>>>
>>>> Calvin
>>>>
>

From m.sundar85 at gmail.com  Fri Nov  1 19:39:46 2019
From: m.sundar85 at gmail.com (Sundara Mohan M)
Date: Fri, 1 Nov 2019 12:39:46 -0700
Subject: JVM stuck/looping in futex call
Message-ID: <CACGCMVreB5xu=f1DRh+8KND+nvLVuKGzKTCS1Wv4Qi2nO4LTew@mail.gmail.com>

Hi,
    I am running openjdk12/Linux on our systems and see jvm not responding
to jstack or any diagnostic command (jcmd VM.info/Thread.print). Though
application is running fine.

I see the following stack trace

Process 115586 attached
futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
restarted if SA_RESTART is set)
--- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000}
---
futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn()                          = 202
futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
restarted if SA_RESTART is set)
--- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000}
---
futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn()                          = 202
futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
restarted if SA_RESTART is set)
--- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000}
---
futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn()                          = 202
futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
restarted if SA_RESTART is set)
--- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000}
---
rt_sigreturn()                          = 202
futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
restarted if SA_RESTART is set)
--- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000}
---
rt_sigreturn()                          = 202
futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL^CProcess 115586 detached
 <detached ...>

Can someone help me understand what is happening here?

Please redirect me to the proper list if this is not the correct list for
these types of questions.

TIA
Sundar

From david.holmes at oracle.com  Fri Nov  1 23:03:31 2019
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 2 Nov 2019 09:03:31 +1000
Subject: JVM stuck/looping in futex call
In-Reply-To: <CACGCMVreB5xu=f1DRh+8KND+nvLVuKGzKTCS1Wv4Qi2nO4LTew@mail.gmail.com>
References: <CACGCMVreB5xu=f1DRh+8KND+nvLVuKGzKTCS1Wv4Qi2nO4LTew@mail.gmail.com>
Message-ID: <507d7b80-a93a-4e51-4842-8b329beab486@oracle.com>

Hi Sundar,

On 2/11/2019 5:39 am, Sundara Mohan M wrote:
> Hi,
>      I am running openjdk12/Linux on our systems and see jvm not responding
> to jstack or any diagnostic command (jcmd VM.info/Thread.print). Though
> application is running fine.

That would sound like the attach thread (which would respond to the 
jstack or other diagnostic command) is in some kind of bad state.

> I see the following stack trace
> 
> Process 115586 attached
> futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
> restarted if SA_RESTART is set)
> --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000}
> ---
> futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
> rt_sigreturn()                          = 202
> futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
> restarted if SA_RESTART is set)
> --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000}
> ---
> futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
> rt_sigreturn()                          = 202
> futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
> restarted if SA_RESTART is set)
> --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000}
> ---
> futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
> rt_sigreturn()                          = 202
> futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
> restarted if SA_RESTART is set)
> --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000}
> ---
> rt_sigreturn()                          = 202
> futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
> restarted if SA_RESTART is set)
> --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000}
> ---
> rt_sigreturn()                          = 202
> futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL^CProcess 115586 detached
>   <detached ...>
> 
> Can someone help me understand what is happening here?

It appears that in responding to the SIGQUIT that is used to trigger the 
starting of the attach listener thread, something is going wrong. 
We appear to be continually restarting an operation that still sees the 
signal pending - which doesn't really make sense to me. Can you get a 
complete stack trace using gdb?

> Please redirect me to the proper list if this is not the correct list for
> these types of questions.

This list is fine. It may end up being an issue for serviceability-dev 
but we can deal with that later. :)

Thanks,
David
-----

> 
> TIA
> Sundar
> 

From suenaga at oss.nttdata.com  Sat Nov  2 12:46:27 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Sat, 2 Nov 2019 21:46:27 +0900
Subject: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <CAA-vtUwwJJtJsc1Um0ruorZs34=zKJdWat+02-fmpkajzAfV3Q@mail.gmail.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <CAA-vtUwwJJtJsc1Um0ruorZs34=zKJdWat+02-fmpkajzAfV3Q@mail.gmail.com>
Message-ID: <e7f23e19-f208-0a8a-56ac-4be23deabc65@oss.nttdata.com>

Hi Thomas,

I agree with you. Also CI Replay will be generated after NMT report.
But I think it should be another issue.

If that is OK with you, I will file it in JBS and create a patch.


Thanks,

Yasumasa


On 2019/11/01 19:36, Thomas Stüfe wrote:
> Hi Yasumasa,
> 
> I see that we do JFR::on_vm_shutdown() before error reporting ran. Is that really necessary? Error reporting should happen as close as possible to the error point - ideally, as little code as possible should run between the crash/assert and the generation of the hs-err file. I suggest moving the call to JFR::on_vm_shutdown()
> down to a point after error reporting, e.g. to where we print the NMT report on shutdown.
> 
> Cheers, Thomas
> 
> 
> On Fri, Nov 1, 2019 at 10:41 AM Yasumasa Suenaga <suenaga at oss.nttdata.com <mailto:suenaga at oss.nttdata.com>> wrote:
> 
>     Forwarding to hotspot-runtime-dev.
> 
>     As David commented in JBS, it may need to be fixed in JFR code.
>     But I'm not clear why the thread state is not recovered.
> 
>     I'd like to hear about this from JFR folks.
>     If it is just a bug in JFR, I will create a patch which recovers it in JFR code.
> 
> 
>     Thanks,
> 
>     Yasumasa
> 
> 
>     -------- Forwarded Message --------
>     Subject: RFR: 8233375: JFR emergency dump do not recover thread state
>     Date: Fri, 1 Nov 2019 17:08:42 +0900
>     From: Yasumasa Suenaga <suenaga at oss.nttdata.com <mailto:suenaga at oss.nttdata.com>>
>     To: hotspot-jfr-dev at openjdk.java.net <mailto:hotspot-jfr-dev at openjdk.java.net>
>     CC: yasuenag at gmail.com <mailto:yasuenag at gmail.com> <yasuenag at gmail.com <mailto:yasuenag at gmail.com>>
> 
>     Hi all,
> 
>     Please review this change:
> 
>        JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>        webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
> 
>     If JFR is running when the JVM crashes, JFR will dump data to hs_err_pid<PID>.jfr.
>     The dump is prepared in prepare_for_emergency_dump().
>     However, this function transitions the thread state to "_thread_in_vm" and does not restore it.
> 
>     This change has been tested on the submit repo as mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>     It failed at compiler/types/correctness/CorrectnessTest.java.
>     However, this test is for the JIT compiler, and a related issue has been reported as JDK-8225620.
>     So I think this patch can go through.
> 
> 
>     Thanks,
> 
>     Yasumasa
> 

From daniel.daugherty at oracle.com  Sat Nov  2 13:15:53 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Sat, 2 Nov 2019 09:15:53 -0400
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <58713599-b5ea-57bb-0413-fce3b8bd5d2f@oracle.com>
 <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
Message-ID: <7388c7fc-39c4-1ec6-1608-02b08e562ab3@oracle.com>

Erik,

David H. made a comment during this review cycle that should interest you.

The longer version of this comment came up in early reviews of the Async
Monitor Deflation code because I copied the code and the longer comment
from threadSMR.cpp. I updated the comment based on your input and review
and changed the comment and code in threadSMR.cpp and in the Async Monitor
Deflation project code.

The change in threadSMR.cpp was done with this changeset:

$ hg log -v -r 54517
changeset:   54517:c201ca660afd
user:        dcubed
date:        Thu Apr 11 14:14:30 2019 -0400
files:       src/hotspot/share/runtime/threadSMR.cpp
description:
8222034: Thread-SMR functions should be updated to remove work around
Reviewed-by: mdoerr, eosterlund

Here's one of the two diffs to jog your memory:

 void ThreadsList::dec_nested_handle_cnt() {
-  // The decrement needs to be MO_ACQ_REL. At the moment, the Atomic::dec
-  // backend on PPC does not yet conform to these requirements. Therefore
-  // the decrement is simulated with an Atomic::sub(1, &addr).
-  // Without this MO_ACQ_REL Atomic::dec simulation, the nested SMR mechanism
-  // is not generally safe to use.
-  Atomic::sub(1, &_nested_handle_cnt);
+  // The decrement only needs to be MO_ACQ_REL since the reference
+  // counter is volatile (and the hazard ptr is already NULL).
+  Atomic::dec(&_nested_handle_cnt);
 }

Below is David's comment about the code comment...

Dan


Trimming down to just that issue...

On 10/29/19 4:20 PM, Daniel D. Daugherty wrote:
> On 10/24/19 7:00 AM, David Holmes wrote:
>
> src/hotspot/share/runtime/objectMonitor.inline.hpp
>
>>  199   // The decrement only needs to be MO_ACQ_REL since the reference
>>  200   // counter is volatile.
>>  201   Atomic::dec(&_ref_count);
>>
>> volatile is irrelevant with regards to memory ordering as it is a 
>> compiler annotation. And you haven't specified any memory order value 
>> so the default is conservative ie. implied full fence. (I see the 
>> same incorrect comment is in threadSMR.cpp!)
>
> I got that wording from threadSMR.cpp and Erik O. confirmed my use of
> that wording previously. I'll chase it down with Erik and get back to you.
>
>
>>  208   // The increment needs to be MO_SEQ_CST so that the reference
>>  209   // counter update is seen as soon as possible in a race with the
>>  210   // async deflation protocol.
>>  211   Atomic::inc(&_ref_count);
>>
>> Ditto you haven't specified any ordering - and inc() and dec() will 
>> have the same default.
>
> And again, I'll have to chase this down with Erik O. and get back to you.
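
David's point about volatile can be illustrated outside HotSpot with
std::atomic. The sketch below is only an analogy, not HotSpot code:
RefCounted and its methods are made-up names, and std::atomic stands in
for the Atomic:: API. It shows that the ordering comes from the atomic
operation itself (its default, or an explicit argument), not from a
volatile qualifier on the counter.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a reference-counted object; not HotSpot code.
struct RefCounted {
  std::atomic<int> ref_count{0};

  // No order argument: std::atomic defaults to memory_order_seq_cst, the
  // strongest ordering. David describes HotSpot's default as similarly
  // conservative (an implied full fence).
  void inc() { ref_count.fetch_add(1); }

  // A weaker ordering has to be requested explicitly at the call site.
  // Declaring the counter volatile would not select it.
  void dec_acq_rel() { ref_count.fetch_sub(1, std::memory_order_acq_rel); }

  int count() const { return ref_count.load(std::memory_order_acquire); }
};
```

With this shape, a code comment that says "only needs to be MO_ACQ_REL"
is only accurate if the call actually passes that ordering.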


From suenaga at oss.nttdata.com  Sat Nov  2 15:56:53 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Sun, 3 Nov 2019 00:56:53 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
Message-ID: <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>

Hi,

Markus commented in JBS that this change should be kept local to JFR,
so I updated the webrev. Could you review it?

   http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/

This change passed all tests on the submit repo (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).


Thanks,

Yasumasa


On 2019/11/01 18:41, Yasumasa Suenaga wrote:
> Forwarding to hotspot-runtime-dev.
> 
> As David commented in JBS, it may need to be fixed in JFR code.
> But I'm not clear why the thread state is not recovered.
> 
> I'd like to hear about this from JFR folks.
> If it is just a bug in JFR, I will create a patch which recovers it in JFR code.
> 
> 
> Thanks,
> 
> Yasumasa
> 
> 
> -------- Forwarded Message --------
> Subject: RFR: 8233375: JFR emergency dump do not recover thread state
> Date: Fri, 1 Nov 2019 17:08:42 +0900
> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
> To: hotspot-jfr-dev at openjdk.java.net
> CC: yasuenag at gmail.com <yasuenag at gmail.com>
> 
> Hi all,
> 
> Please review this change:
> 
>    JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
> 
> If JFR is running when the JVM crashes, JFR will dump data to hs_err_pid<PID>.jfr.
> The dump is prepared in prepare_for_emergency_dump().
> However, this function transitions the thread state to "_thread_in_vm" and does not restore it.
> 
> This change has been tested on the submit repo as mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
> It failed at compiler/types/correctness/CorrectnessTest.java.
> However, this test is for the JIT compiler, and a related issue has been reported as JDK-8225620.
> So I think this patch can go through.
> 
> 
> Thanks,
> 
> Yasumasa

From markus.gronlund at oracle.com  Sun Nov  3 16:22:44 2019
From: markus.gronlund at oracle.com (Markus Gronlund)
Date: Sun, 3 Nov 2019 08:22:44 -0800 (PST)
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
Message-ID: <df25a046-9f1e-417c-9ae2-f24786d850e6@default>

Hi Yasumasa,

I think you can simplify it to something like this:

http://cr.openjdk.java.net/~mgronlun/8233375/webrev/

Thanks
Markus


From david.holmes at oracle.com  Sun Nov  3 22:38:55 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 4 Nov 2019 08:38:55 +1000
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
Message-ID: <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>

On 4/11/2019 2:22 am, Markus Gronlund wrote:
> Hi Yasumasa,
> 
> I think you can simplify it to something like this:
> 
> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/

That is more like I had envisaged for this. Reusing existing 
thread-state transition code is preferable to adding more custom code 
that directly manipulates thread-state.

Thanks,
David


From suenaga at oss.nttdata.com  Mon Nov  4 01:19:48 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Mon, 4 Nov 2019 10:19:48 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
Message-ID: <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>

On 2019/11/04 7:38, David Holmes wrote:
> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>> Hi Yasumasa,
>>
>> I think you can simplify it to something like this:
>>
>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
> 
> That is more like I had envisaged for this. Reusing existing thread-state transition code is preferable to adding more custom code that directly manipulates thread-state.

I do not agree with this change.

VMError::report_and_die() takes a "Thread* thread" argument, so Thread::current() might be different from it.
In addition, ThreadInVMfromUnknown uses transition_from_native() to change the thread state.
It checks (and manipulates?) something related to safepoints.

Thus I added ThreadInVMForJFR in my new webrev.


Thanks,

Yasumasa



From david.holmes at oracle.com  Mon Nov  4 02:11:26 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 4 Nov 2019 12:11:26 +1000
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
Message-ID: <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>

On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
> On 2019/11/04 7:38, David Holmes wrote:
>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>> Hi Yasumasa,
>>>
>>> I think you can simplify it to something like this:
>>>
>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>
>> That is more like I had envisaged for this. Reusing existing 
>> thread-state transition code is preferable to adding more custom code 
>> that directly manipulates thread-state.
> 
> I do not agree with this change.
> 
> VMError::report_and_die() takes a "Thread* thread" argument, so
> Thread::current() might be different from it.

Not sure what you mean. You only ever manipulate the thread state of the 
current thread.

> In addition, ThreadInVMfromUnknown uses transition_from_native() to 
> change the thread state.
> It checks (and manipulates?) something which relates to safepoint.

Yes it does - which would be a problem if a safepoint (or handshake) 
were pending. But the path through before_exit already has safepoint 
checks when you acquire the BeforeExit_lock.

The main problem with the suggestion is it seems we may not be running 
in a JavaThread:

  349   Thread* const thread = Thread::current();
  350   if (thread->is_Watcher_thread()) {

so we can't use the existing thread-state helpers, unless we narrow the 
scope (as you do) to after the check for the WatcherThread.

David
-----

> Thus I added ThreadInVMForJFR in my new webrev.

Your change still seems overly complicated.


From david.holmes at oracle.com  Mon Nov  4 02:24:55 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 4 Nov 2019 12:24:55 +1000
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
Message-ID: <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>

Correction ...

On 4/11/2019 12:11 pm, David Holmes wrote:
> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>> On 2019/11/04 7:38, David Holmes wrote:
>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>> Hi Yasumasa,
>>>>
>>>> I think you can simplify it to something like this:
>>>>
>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>
>>> That is more like I had envisaged for this. Reusing existing 
>>> thread-state transition code is preferable to adding more custom code 
>>> that directly manipulates thread-state.
>>
>> I do not agree with this change.
>>
>> VMError::report_and_die() takes a "Thread* thread" argument, so 
>> Thread::current() might be different from it.
> 
> Not sure what you mean. You only ever manipulate the thread state of the 
> current thread.
> 
>> In addition, ThreadInVMfromUnknown uses transition_from_native() to 
>> change the thread state.
>> It checks (and manipulates?) something which relates to safepoint.
> 
> Yes it does - which would be a problem if a safepoint (or handshake) 
> were pending. But the path through before_exit already has safepoint 
> checks when you acquire the BeforeExit_lock.

But that isn't relevant. The issue is we don't want a safepoint check on 
the report_and_die() path. So a custom transition helper is needed to 
avoid that.
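
As a rough illustration of what such a custom helper could look like,
here is a minimal, self-contained sketch. It assumes a simplified model:
FakeThread, ThreadState, ScopedInVM and state_during_dump are hypothetical
names, not the types in either webrev. The guard only saves the current
state, switches to thread_in_vm, and restores the saved state on scope
exit; it deliberately performs no safepoint check.

```cpp
#include <cassert>

// Hypothetical, simplified model of a thread and its state.
enum ThreadState { thread_in_native, thread_in_vm, thread_in_Java };

struct FakeThread {
  ThreadState state = thread_in_Java;
};

// RAII guard: remember the state on entry, force thread_in_vm, and put
// the original state back on exit. Nothing else happens here: no
// safepoint poll and no blocking, which is the property wanted on an
// error-reporting path.
class ScopedInVM {
  FakeThread* _thread;
  ThreadState _saved;
 public:
  explicit ScopedInVM(FakeThread* t) : _thread(t), _saved(t->state) {
    _thread->state = thread_in_vm;
  }
  ~ScopedInVM() { _thread->state = _saved; }
  ScopedInVM(const ScopedInVM&) = delete;
  ScopedInVM& operator=(const ScopedInVM&) = delete;
};

// Stand-in for the emergency dump work: reports the state seen while
// the guard is active.
ThreadState state_during_dump(FakeThread* t) {
  ScopedInVM guard(t);
  return t->state;
}
```

In this model the caller sees its original state again as soon as
state_during_dump() returns, whichever state it started in.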

David


From jianglizhou at google.com  Mon Nov  4 02:34:31 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Sun, 3 Nov 2019 18:34:31 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
Message-ID: <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>

Hi Ioi,

Sorry for the delay again. Will try to put this on the top of my list
next week and reduce the turn-around time. The updates look good in
general.

We might want to have a better strategy for choosing the metadata
relocation address (when relocation is needed). Some
applications/benchmarks may be more sensitive to cache locality and
memory/data layout. There was a bug,
https://bugs.openjdk.java.net/browse/JDK-8213713, that caused a 1G gap
between Java heap data and metadata before JDK 12. The gap seemed to
cause a small but noticeable runtime effect in one case that I came
across.

Here are some additional comments (minor).

Could you please fix the long lines in the following?

1237 void java_lang_Class::update_archived_primitive_mirror_native_pointers(oop archived_mirror) {
1238   if (MetaspaceShared::relocation_delta() != 0) {
1239     assert(archived_mirror->metadata_field(_klass_offset) == NULL, "must be for primitive class");
1240
1241     Klass* ak = ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
1242     if (ak != NULL) {
1243       archived_mirror->metadata_field_put(_array_klass_offset, (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
1244     }
1245   }
1246 }

src/hotspot/share/memory/dynamicArchive.cpp

 889   Thread* THREAD = Thread::current();
 890   Method::sort_methods(ik->methods(), /*set_idnums=*/true, dynamic_dump_method_comparator);
 891   if (ik->default_methods() != NULL) {
 892     Method::sort_methods(ik->default_methods(), /*set_idnums=*/false, dynamic_dump_method_comparator);
 893   }


Please see inlined comments below.

On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>
> Hi Jiangli,
>
> Thanks for the review. I've updated the patch according to your comments:
>
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
>
> (the delta is on top of 8231610-relocate-cds-archive.v03.delta in my
> reply to Calvin's comments).
>
> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
> > Hi Ioi,
> >
> > Sorry for the delay. Here are my remaining comments.
> >
> > - src/hotspot/share/memory/dynamicArchive.cpp
> >
> > 128   static intx _method_comparator_name_delta;
> >
> > The name of the above variable is confusing. It's the value of
> > _buffer_to_target_delta. It's better to use _buffer_to_target_delta
> > directly.
>
> _buffer_to_target_delta is a non-static field, but
> dynamic_dump_method_comparator() must be a static function so it can't
> use the non-static field easily.


It sounds like an issue. _buffer_to_target_delta was made
non-static mostly because we might support more than one dynamic
archive in the future. However, today's usages bake in an assumption
that _buffer_to_target_delta is a singleton value. It is cleaner to
either make _buffer_to_target_delta a static variable for now, or
add an access API in DynamicArchiveBuilder to allow other code to
use the value properly and correctly.

>
> > Also, we can do a quick pointer comparison of 'a_name' and
> > 'b_name' first before adjusting the pointers.
>
> I added this:
>
>      if (a_name == b_name) {
>        return 0;
>      }
>
> > ---
> >
> > 934 void DynamicArchiveBuilder::relocate_buffer_to_target() {
> > ...
> >   944
> >   945   ArchivePtrMarker::compact(relocatable_base, relocatable_end);
> > ...
> >
> >   974     SharedDataRelocator patcher((address*)patch_base,
> > (address*)patch_end, valid_old_base, valid_old_end,
> >   975                                 valid_new_base, valid_new_end, addr_delta);
> >   976     ArchivePtrMarker::ptrmap()->iterate(&patcher);
> >
> > Could we reduce the number of data re-iterations to help archive
> > dumping performance? The ArchivePtrMarker::compact operation can be
> > combined with the patching iteration. The ArchivePtrMarker::compact
> > API can then be removed.
>
> That's a good idea. I implemented it using a template parameter so that
> we can have max performance when relocating the archive at run time.
>
> I added comments to explain why the relocation is done here. The
> relocation is pretty rare (only when the base archive was not mapped at
> the default location).
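
The idea of folding compaction into the patching pass through a template
parameter can be sketched as follows. This is a simplified model under
assumptions, not the patch itself: relocate_slots, slots and marked are
hypothetical names, a std::vector<bool> stands in for the pointer bitmap,
and the non-compacting mode assumes the bitmap was already compacted so
that no marked slot holds a null value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// slots[i] is a pointer-sized value; marked[i] says whether slot i holds
// a pointer that must be relocated by delta.
// COMPACTING == true : also clear marks on null slots and trim the bitmap
//                      (one pass instead of a separate compact step).
// COMPACTING == false: patch only; the extra branch is compiled out.
// Returns one past the index of the last slot that was patched.
template <bool COMPACTING>
std::size_t relocate_slots(std::vector<std::uintptr_t>& slots,
                           std::vector<bool>& marked,
                           std::intptr_t delta) {
  const std::size_t n = std::min(slots.size(), marked.size());
  std::size_t used = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (!marked[i]) {
      continue;
    }
    if (COMPACTING) {
      if (slots[i] == 0) {
        marked[i] = false;  // a null pointer needs no relocation
        continue;
      }
    }
    slots[i] += static_cast<std::uintptr_t>(delta);
    used = i + 1;
  }
  if (COMPACTING) {
    marked.resize(used);  // drop the trailing part with no live marks
  }
  return used;
}
```

At dump time the compacting instantiation would be used once; a later
relocation pass could use relocate_slots<false> over the already trimmed
bitmap.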
>
> > ---
> >
> >   967     address valid_new_base =
> > (address)Arguments::default_SharedBaseAddress();
> >   968     address valid_new_end  = valid_new_base + base_plus_top_size;
> >
> > The debugging only code can be included under #ifdef ASSERT.
> These values are actually also used in debug logging so they can't be
> ifdef'ed out.
>
> Also, the C++ compiler is pretty good at eliding code that's not
> actually used. If I comment out all the logging code in
> DynamicArchiveBuilder::relocate_buffer_to_target()  and
> SharedDataRelocator, gcc elides all the unused fields and their
> assignments. So no code is generated for this, etc.
>
>      address valid_new_base =
> (address)Arguments::default_SharedBaseAddress();
>
> Since #ifdef ASSERT makes the code harder to read, I think we should use
> it only when really necessary.

It seems cleaner to get rid of these debugging only variables, by
using 'relocatable_base' and
'(address)Arguments::default_SharedBaseAddress()' in the logging code.

>
> > ---
> >
> >   993   dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
> >
> > We could combine the archived heap data bitmap into the new region as
> > well? It can be handled as a separate RFE.
>
> I've filed https://bugs.openjdk.java.net/browse/JDK-8233093
>
> >
> > - src/hotspot/share/memory/filemap.cpp
> >
> > 1038     if (is_static()) {
> > 1039       if (errno == ENOENT) {
> > 1040         // Not locating the shared archive is ok.
> > 1041         fail_continue("Specified shared archive not found (%s).",
> > _full_path);
> > 1042       } else {
> > 1043         fail_continue("Failed to open shared archive file (%s).",
> > 1044                       os::strerror(errno));
> > 1045       }
> > 1046     } else {
> > 1047       log_warning(cds, dynamic)("specified dynamic archive
> > doesn't exist: %s", _full_path);
> > 1048     }
> >
> > If the top layer is explicitly specified by the user, a warning does
> > not seem to be a proper behavior if the VM fails to open the archive
> > file.
> >
> > It might be better to handle the code unrelated to relocation in a
> > separate changeset and track it with a separate RFE.
>
> This code was moved from
>
> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
>
> so I am not changing the behavior. If you want, we can file an RFE to
> change the behavior.

Ok. A new RFE sounds like the right way to re-evaluate the usage
issue here. Thanks.

>
> > ---
> >
> > 1148 void FileMapInfo::write_region(int region, char* base, size_t size,
> > 1149                                bool read_only, bool allow_exec) {
> > ...
> > 1154
> > 1155   if (region == MetaspaceShared::bm) {
> > 1156     target_base = NULL;
> > 1157   } else if (DynamicDumpSharedSpaces) {
> >
> > It's not too clear to me how the bitmap (bm) region is handled for the
> > base layer and top layer. Could you please explain?
>
> The bm region for both layers are mapped at an address picked by the OS:
>
> char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
>    FileMapRegion* si = space_at(MetaspaceShared::bm);
>    bitmap_size = si->used_aligned();
>    bool read_only = true, allow_exec = false;
>    char* requested_addr = NULL; // allow OS to pick any location
>    char* bitmap_base = os::map_memory(_fd, _full_path, si->file_offset(),
>                                       requested_addr, bitmap_size,
> read_only, allow_exec);
>

Ok, after staring at the code for a few seconds I saw that's intended.
If the current region is 'bm', then 'target_base' is NULL regardless
of whether it's a static or dynamic archive. Otherwise, 'target_base'
is handled differently for the static and dynamic cases. The following
would be cleaner and more robust.

   char* target_base = NULL;

   // The target_base is NULL for 'bm' region.
   if (region != MetaspaceShared::bm) {
     if (DynamicDumpSharedSpaces) {
       assert(!HeapShared::is_heap_region(region),
              "dynamic archive doesn't support heap regions");
       target_base = DynamicArchive::buffer_to_target(base);
     } else {
       target_base = base;
     }
   }


>
> >
> > ---
> >
> > 1362   DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
> >
> > Could you please explain the above?
>
> I added the comments
>
>    // Make sure we don't attempt to use header()->mapped_base_address()
> unless
>    // it's been successfully mapped.
> DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>
> >
> > ---
> >
> > 1359   FileMapRegion* last_region = NULL;
> >
> > 1371     if (last_region != NULL) {
> > 1372       // Ensure that the OS won't be able to allocate new memory
> > spaces between any mapped
> > 1373       // regions, or else it would mess up the simple comparision
> > in MetaspaceObj::is_shared().
> > 1374       assert(si->mapped_base() == last_region->mapped_end(),
> > "must have no gaps");
> >
> > 1379     last_region = si;
> >
> > Can you please place 'last_region' related code under #ifdef ASSERT?
>
> I think that will make the code more cluttered. The compiler will
> optimize that away.

It's cleaner to define debugging-only variables only in debug
builds. You can wrap it and its related usage in DEBUG_ONLY.
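
The suggestion can be sketched roughly as follows. This is only an illustration, not the HotSpot source: the DEBUG_ONLY macro is re-created locally (assuming it expands to its argument only when ASSERT is defined), and Region and total_mapped are hypothetical stand-ins for FileMapRegion and the mapping loop.

```cpp
#include <cassert>
#include <cstddef>

// Local re-creation of a DEBUG_ONLY-style macro (an assumption of this
// sketch): the argument is compiled only when ASSERT is defined.
#ifdef ASSERT
#define DEBUG_ONLY(code) code
#else
#define DEBUG_ONLY(code)
#endif

struct Region {  // hypothetical stand-in for FileMapRegion
  std::size_t base;
  std::size_t size;
  std::size_t end() const { return base + size; }
};

// Returns the total mapped size. In debug builds it also checks that each
// region starts exactly where the previous one ended.
std::size_t total_mapped(const Region* regions, int count) {
  DEBUG_ONLY(const Region* last_region = nullptr;)
  std::size_t total = 0;
  for (int i = 0; i < count; i++) {
    const Region* si = &regions[i];
    DEBUG_ONLY(
      if (last_region != nullptr) {
        assert(si->base == last_region->end() && "must have no gaps");
      }
      last_region = si;
    )
    total += si->size;
  }
  return total;
}
```

With this shape, 'last_region' does not exist at all in product builds, at the cost of two extra macro invocations in the loop.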

>
> > ---
> >
> > 1478 char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
> > 1479   FileMapRegion* si = space_at(MetaspaceShared::bm);
> > 1480   bitmap_size = si->used_aligned();
> > 1481   bool read_only = true, allow_exec = false;
> > 1482   char* requested_addr = NULL; // allow OS to pick any location
> > 1483   char* bitmap_base = os::map_memory(_fd, _full_path, si->file_offset(),
> > 1484                                      requested_addr, bitmap_size,
> > read_only, allow_exec);
> >
> > We need to handle mapping failure here.
>
> It's handled here:
>
> bool FileMapInfo::relocate_pointers(intx addr_delta) {
>    log_debug(cds, reloc)("runtime archive relocation start");
>    size_t bitmap_size;
>    char* bitmap_base = map_relocation_bitmap(bitmap_size);
>    if (bitmap_base != NULL) {
>    ...
>    } else {
>      log_error(cds)("failed to map relocation bitmap");
>      return false;
>    }
>

'bitmap_base' is used immediately after map_memory(). So the check
needs to be done immediately after map_memory(), but not in the caller
of map_relocation_bitmap().

1490   char* bitmap_base = os::map_memory(_fd, _full_path, si->file_offset(),
1491                                      requested_addr, bitmap_size,
read_only, allow_exec);
1492
1493   if (VerifySharedSpaces && bitmap_base != NULL &&
!region_crc_check(bitmap_base, bitmap_size, si->crc())) {
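
One way to make the failure check explicit at the mapping site is sketched below. It is a simplified illustration under stated assumptions: map_fn and crc_ok_fn are hypothetical stand-ins for os::map_memory() and region_crc_check(), and map_bitmap is not the actual FileMapInfo function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins: map_fn plays the role of os::map_memory() and may
// return nullptr; crc_ok_fn plays the role of region_crc_check().
typedef char* (*map_fn)(std::size_t size);
typedef bool (*crc_ok_fn)(const char* base, std::size_t size);

// Returns the mapped bitmap, or nullptr if mapping or verification fails.
// The nullptr check sits right after the mapping call, before any use.
char* map_bitmap(map_fn map, crc_ok_fn crc_ok, std::size_t size, bool verify) {
  char* bitmap_base = map(size);
  if (bitmap_base == nullptr) {
    return nullptr;  // mapping failed; bitmap_base is never dereferenced
  }
  if (verify && !crc_ok(bitmap_base, size)) {
    return nullptr;  // mapped, but the contents failed verification
  }
  return bitmap_base;
}
```

The caller still needs its own NULL check, since both failure modes are reported the same way in this sketch.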


> > ---
> >
> > 1513     // debug only -- the current value of the pointers to be
> > patched must be within this
> > 1514     // range (i.e., must be between the requesed base address,
> > and the of the current archive).
> > 1515     // Note: top archive may point to objects in the base
> > archive, but not the other way around.
> > 1516     address valid_old_base = (address)header()->requested_base_address();
> > 1517     address valid_old_end  = valid_old_base + mapping_end_offset();
> >
> > Please place all FileMapInfo::relocate_pointers debugging only code
> > under #ifdef ASSERT.
>
> Ditto about ifdef ASSERT
>
> >
> > - src/hotspot/share/memory/heapShared.cpp
> >
> >   441 void HeapShared::initialize_from_archived_subgraph(Klass* k) {
> >   442   if (!open_archive_heap_region_mapped() || !MetaspaceObj::is_shared(k)) {
> >   443     return; // nothing to do
> >   444   }
> >
> > When do we call HeapShared::initialize_from_archived_subgraph for a
> > klass that's not shared?
>
> I've removed the !MetaspaceObj::is_shared(k). I probably added that for
> debugging purposes only.
>
> >
> >   616   DEBUG_ONLY({
> >   617       Klass* klass = orig_obj->klass();
> >   618       assert(klass != SystemDictionary::Module_klass() &&
> >   619              klass != SystemDictionary::ResolvedMethodName_klass() &&
> >   620              klass != SystemDictionary::MemberName_klass() &&
> >   621              klass != SystemDictionary::Context_klass() &&
> >   622              klass != SystemDictionary::ClassLoader_klass(), "we
> > can only relocate metaspace object pointers inside java_lang_Class
> > instances");
> >   623     });
> >
> > Let's leave the above for a separate RFE. I think assert is not
> > sufficient for the check. Also, why ResolvedMethodName, Module and
> > MemberName cannot be part of the graph?
> >
> >
> I added the following comment:
>
>    DEBUG_ONLY({
>        // The following are classes in share/classfile/javaClasses.cpp
> that have injected native pointers
>        // to metaspace objects. To support these classes, we need to add
> relocation code similar to
>        // java_lang_Class::update_archived_mirror_native_pointers.
>        Klass* klass = orig_obj->klass();
>        assert(klass != SystemDictionary::Module_klass() &&
>               klass != SystemDictionary::ResolvedMethodName_klass() &&
>

It's too restrictive to exclude those objects from the archived object
graph because of metadata relocation, since metadata relocation is rare.
The trade-off doesn't seem to buy us much.

Do you plan to add the needed relocation code?

> >
> > - src/hotspot/share/memory/metaspace.cpp
> >
> > 1036   metaspace_rs = ReservedSpace(compressed_class_space_size(),
> > 1037                                              _reserve_alignment,
> > 1038                                              large_pages,
> > 1039                                              requested_addr);
> >
> > Please fix indentation.
>
> Fixed.
>
> >
> > - src/hotspot/share/memory/metaspaceClosure.hpp
> >
> >    78   enum SpecialRef {
> >    79     _method_entry_ref
> >    80   };
> >
> > Are there other pointers that are not references to MetaspaceObj? If
> > _method_entry_ref is the only type, it's probably not worth defining
> > SpecialRef?
>
> There may be more types in the future, so I want to have a stable API
> that can be easily expanded without touching all the code that uses it.
>
>
> >
> > - src/hotspot/share/memory/metaspaceShared.hpp
> >
> >    42 enum MapArchiveResult {
> >    43   MAP_ARCHIVE_SUCCESS,
> >    44   MAP_ARCHIVE_MMAP_FAILURE,
> >    45   MAP_ARCHIVE_OTHER_FAILURE
> >    46 };
> >
> > If we want to define different failure types, it's probably worth
> > using separate types for relocation failure and validation failure.
>
> For now, I just need to distinguish between MMAP_FAILURE (where I should
> attempt to remap at an alternative address) and OTHER_FAILURE (where the
> CDS archive loading will fail -- due to validation error, insufficient
> memory, etc -- without attempting to remap.)
>
> >
> > ---
> >
> >   193   static intx _mapping_delta; // FIXME rename
> >
> > How about _relocation_delta?
>
> Changed as suggested.
>
> >
> > - src/hotspot/share/oops/instanceKlass
> >
> > 1573 bool InstanceKlass::_disable_method_binary_search = false;
> >
> > The use of _disable_method_binary_search is not necessary. You can use
> > DynamicDumpSharedSpaces for the purpose. That would make things
> > cleaner.
>
> If we always disable the binary search when DynamicDumpSharedSpaces is
> true, it will slow down normal execution of the Java program when
> -XX:ArchiveClassesAtExit has been specified, but the program hasn't exited.

Could you please add some comments to _disable_method_binary_search
with the above explanation? Thanks.
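
For illustration, a flag of this kind could gate the lookup as in the sketch below. MethodTable, its int keys, and find are hypothetical; the sketch also assumes that the flag is set only while the sort order is temporarily unreliable (for example during the dump itself), which is an interpretation rather than something stated in the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for InstanceKlass and its
// _disable_method_binary_search flag.
struct MethodTable {
  static bool disable_binary_search;

  // Returns the index of key in names, or -1 if absent. names must be
  // sorted unless binary search has been disabled.
  static int find(const std::vector<int>& names, int key) {
    if (disable_binary_search) {
      // Slow but order-independent path.
      for (std::size_t i = 0; i < names.size(); i++) {
        if (names[i] == key) return static_cast<int>(i);
      }
      return -1;
    }
    // Fast path: classic binary search over a sorted array.
    int lo = 0;
    int hi = static_cast<int>(names.size()) - 1;
    while (lo <= hi) {
      int mid = lo + (hi - lo) / 2;
      if (names[mid] == key) return mid;
      if (names[mid] < key) {
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return -1;
  }
};
bool MethodTable::disable_binary_search = false;
```

Keeping the flag separate from DynamicDumpSharedSpaces means the fast path stays in use for the whole normal run of the program.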

>
> >
> > - test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
> >
> >    76                     if (name.equals("s0") || name.equals("s1")) {
> >    77                       // String regions are listed at the end and
> > they may not be fully occupied.
> >    78                       break;
> >    79                     } else if (name.equals("bm")) {
> >    80                       // Bitmap space does not have a requested address.
> >    81                       break;
> >
> > It's not part of your change, but could you please fix line 76 - 78
> > since it is trivial. It seems the lines can be removed.
>
> Removed.
>
> >
> > - /src/hotspot/share/memory/archiveUtils.hpp
> > The file name does not match with the macro '#ifndef
> > SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please rename
> > archiveUtils.* ? archiveRelocator.hpp and archiveRelocator.cpp are
> > more descriptive.
> I named the file archiveUtils.hpp so we can move other misc stuff used
> by dumping into this file (e.g., DumpRegion, WriteClosure from
> metaspaceShared.hpp), since these are not used by the majority of the
> files that use metaspaceShared.hpp.
>
> I fixed the ifdef.
>
> >
> > - src/hotspot/share/memory/archiveUtils.cpp
> >
> >    36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap, address*
> > ptr_base, address* ptr_end) {
> >    37   assert(_ptrmap == NULL, "initialize only once");
> >    38   _ptr_base = ptr_base;
> >    39   _ptr_end = ptr_end;
> >    40   _compacted = false;
> >    41   _ptrmap = ptrmap;
> >    42   _ptrmap->initialize(12 * M / sizeof(intptr_t)); // default
> > archive is about 12MB.
> >    43 }
> >
> > Could we do a better estimate here? We could guesstimate the size
> > based on the current used class space and metaspace size. It's okay if
> > a larger bitmap is used, since it can be reduced after all marking is
> > done.
>
> The bitmap is automatically expanded when necessary in
> ArchivePtrMarker::mark_pointer(). It's only about 1/32 or 1/64 of the
> total archive size, so even if we do expand, the cost will be trivial.

The initial value is based on the default CDS archive. When dealing
with a really large archive, it would have to re-grow many times.
Also, using a hard-coded value is less desirable.
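
To put rough numbers on this, the arithmetic can be sketched as below. The helper names are hypothetical, one bitmap bit is assumed to cover one pointer-sized slot (which matches the "1/32 or 1/64" figure above), and the doubling growth policy is purely an assumption of the sketch, since the actual expansion policy in ArchivePtrMarker::mark_pointer() is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One bit in the pointer bitmap covers one pointer-sized slot of the archive.
std::size_t bitmap_bits_for(std::size_t archive_bytes) {
  return archive_bytes / sizeof(std::intptr_t);
}

// Bytes needed to hold that many bits, rounded up.
std::size_t bitmap_bytes_for(std::size_t archive_bytes) {
  return (bitmap_bits_for(archive_bytes) + 7) / 8;
}

// Number of doublings needed to grow from an initial capacity to a required
// one. The doubling policy is an assumption of this sketch.
int doublings_needed(std::size_t initial_bits, std::size_t required_bits) {
  if (initial_bits == 0) {
    initial_bits = 1;  // avoid looping forever on an empty bitmap
  }
  int n = 0;
  while (initial_bits < required_bits) {
    initial_bits *= 2;
    n++;
  }
  return n;
}
```

Under these assumptions, starting from the 12MB default and dumping a 768MB archive would need six expansions, while an estimate derived from the used metaspace size would need none.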

Thanks,
Jiangli

>
> Thanks
> - Ioi
>
>
> > Thanks,
> > Jiangli
> >
> >
> >
> > On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou <jianglizhou at google.com> wrote:
> >> Hi Ioi,
> >>
> >> This is another great step for CDS usability improvement. Thank you!
> >>
> >> I have a high level question (or request): could we consider
> >> separating the relocation work for 'direct' class metadata from other
> >> types of metadata (such as the shared system dictionary, symbol table,
> >> etc)? Initially we only relocate the tables and other archived global
> >> data. When each archived class is being loaded, we can relocate all
> >> the pointers within the current class. We could find the segment (for
> >> the current class) in the bitmap and update the pointers within the
> >> segment. That way we can reduce initial startup costs and also avoid
> >> relocating class data that's not used at runtime. In some real world
> >> large systems, an archive may contain an extremely large number of
> >> classes.
> >>
> >> Following are partial review comments so we can move things forward.
> >> Still going through the rest of the changes.
> >>
> >> - src/hotspot/share/classfile/javaClasses.cpp
> >>
> >> 1218 void java_lang_Class::update_archived_mirror_native_pointers(oop
> >> archived_mirror) {
> >> 1219   Klass* k = ((Klass*)archived_mirror->metadata_field(_klass_offset));
> >> 1220   if (k != NULL) { // k is NULL for the primitive classes such as
> >> java.lang.Byte::TYPE <<<<<<<<<<<
> >> 1221     archived_mirror->metadata_field_put(_klass_offset,
> >> (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
> >> 1222   }
> >> 1223 ...
> >>
> >> Primitive type mirrors are handled separately. Could you please verify
> >> if this call path happens for primitive type mirror?
> >>
> >> To answer my question above, looks like you added the following, which
> >> is to be used for primitive type mirrors. That seems to be the reason
> >> why update_archived_mirror_native_pointers is trying to also cover
> >> primitive type. It is better to have a separate API for primitive type
> >> mirrors, which is cleaner. We can also replace the above check at
> >> line 1220 with an assert for regular mirrors.
> >>
> >> +void ReadClosure::do_mirror_oop(oop *p) {
> >> +  do_oop(p);
> >> +  oop mirror = *p;
> >> +  if (mirror != NULL) {
> >> +    java_lang_Class::update_archived_mirror_native_pointers(mirror);
> >> +  }
> >> +}
> >> +
> >>
> >> How about renaming update_archived_mirror_native_pointers to
> >> update_archived_mirror_klass_pointers.
> >>
> >> It would be good to pass the current klass as an argument. We can
> >> verify the relocated pointer matches with the current klass pointer.
> >>
> >> We should also check if relocation is necessary before spending cycles
> >> to obtain the klass pointer from the mirror.
> >>
> >> 1252   update_archived_mirror_native_pointers(m);
> >> 1253
> >> 1254   // mirror is archived, restore
> >> 1255   assert(HeapShared::is_archived_object(m), "must be archived
> >> mirror object");
> >> 1256   Handle mirror(THREAD, m);
> >>
> >> Could we move the line at 1252 after the assert at line 1255?
> >>
> >> - src/hotspot/share/include/cds.h
> >>
> >>    47   int     _mapped_from_file;  // Is this region mapped from a file?
> >>    48                               // If false, this region was
> >> initialized using os::read().
> >>
> >> Is the new field truly needed? It seems we could use _mapped_base to
> >> determine if a region is mapped or not?
> >>
> >> - src/hotspot/share/memory/dynamicArchive.cpp
> >>
> >> Could you please remove the debugging print code in
> >> dynamic_dump_method_comparator? Or convert those to logging output if
> >> they are helpful.
> >>
> >> Will send out the rest of the review comments later.
> >>
> >> Best,
> >>
> >> Jiangli
> >>
> >>
> >>
> >>
> >> On Thu, Oct 10, 2019 at 6:00 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >>> Bug:
> >>> https://bugs.openjdk.java.net/browse/JDK-8231610
> >>>
> >>> Webrev:
> >>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v01/
> >>>
> >>> Design:
> >>> http://cr.openjdk.java.net/~iklam/jdk14/design/8231610-relocate-cds-archive.txt
> >>>
> >>>
> >>> Overview:
> >>>
> >>> The CDS archive is mmaped to a fixed address range (starting at
> >>> SharedBaseAddress, usually 0x800000000). Previously, if this
> >>> requested address range is not available (usually due to Address
> >>> Space Layout Randomization (ASLR) [2]), the JVM will give up and
> >>> will load classes dynamically using class files.
> >>>
> >>> [a] This causes slow down in JVM start-up.
> >>> [b] Handling of mapping failures causes unnecessary complication in
> >>>       the CDS tests.
> >>>
> >>> Here are some preliminary benchmarking results (using default CDS archive,
> >>> running helloworld):
> >>>
> >>> (a) 47.1ms (CDS enabled, mapped at requested addr)
> >>> (b) 53.8ms (CDS enabled, mapped at alternate addr)
> >>> (c) 86.2ms (CDS disabled)
> >>>
> >>> The small degradation in (b) is caused by the relocation of
> >>> absolute pointers embedded in the CDS archive. However, it is
> >>> still a big improvement over case (c)
> >>>
> >>> Please see the design doc (link above) for details.
> >>>
> >>> Thanks
> >>> - Ioi
> >>>
>

From david.holmes at oracle.com  Mon Nov  4 04:24:21 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 4 Nov 2019 14:24:21 +1000
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
Message-ID: <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>

So looking at Yasumasa's proposed fix ...

I don't think it is worth the disruption to pass the "thread" all the 
way through these API's. It is simpler/cleaner to just call 
Thread::current_or_null_safe() when you need the current thread.

357   assert(thread->is_Java_thread() && 
(((JavaThread*)thread)->thread_state() == _thread_in_vm), "invariant");

This assertion is incorrect. As this can be called via 
VMError::report_or_die() there is AFAICS absolutely no guarantee that we 
need be in a JavaThread at all.

  428 class ThreadInVMForJFR : public StackObj {

Can I suggest JavaThreadInVM to make it clear this only affects 
JavaThreads. And as it is local we don't need the "forJFR" part.

Based on Markus's proposed change, and with a view to constrain the 
scope even further can I suggest the following:

if (!guard_reentrancy()) {
   return;
} else {
   // Ensure a JavaThread is _thread_in_vm when we make this call
   JavaThreadInVM jtivm(Thread::current_or_null_safe());
   if (!prepare_for_emergency_dump()) {
     return;
   }
}
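
A rough, self-contained sketch of what such a local helper might look like follows. Thread and ThreadState here are simplified hypothetical stand-ins, not the HotSpot types, and the real helper would go in the JFR emergency dump code.

```cpp
#include <cassert>

enum ThreadState { thread_in_native, thread_in_vm };  // hypothetical subset

struct Thread {  // hypothetical stand-in for the HotSpot Thread hierarchy
  bool is_java_thread;
  ThreadState state;
};

// Puts a JavaThread into thread_in_vm for the guard's lifetime and restores
// the previous state on scope exit. Does nothing for a null thread or a
// non-Java thread.
class JavaThreadInVM {
  Thread* _jt;
  ThreadState _saved;

 public:
  explicit JavaThreadInVM(Thread* t)
      : _jt((t != nullptr && t->is_java_thread) ? t : nullptr),
        _saved(thread_in_vm) {
    if (_jt != nullptr) {
      _saved = _jt->state;
      _jt->state = thread_in_vm;  // direct store: no safepoint check
    }
  }

  ~JavaThreadInVM() {
    if (_jt != nullptr) {
      _jt->state = _saved;  // recover the original thread state
    }
  }

  JavaThreadInVM(const JavaThreadInVM&) = delete;
  JavaThreadInVM& operator=(const JavaThreadInVM&) = delete;
};
```

Because the state is written directly rather than through the usual transition helpers, no safepoint check is performed, which is the property wanted on the report_and_die() path.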

Thanks,
David
-----


On 4/11/2019 12:24 pm, David Holmes wrote:
> Correction ...
> 
> On 4/11/2019 12:11 pm, David Holmes wrote:
>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>> On 2019/11/04 7:38, David Holmes wrote:
>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>> Hi Yasumasa,
>>>>>
>>>>> I think you can simplify it to something like this:
>>>>>
>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>
>>>> That is more like I had envisaged for this. Reusing existing 
>>>> thread-state transition code is preferable to adding more custom 
>>>> code that directly manipulates thread-state.
>>>
>>> I do not agree with this change.
>>>
>>> VMError::report_and_die() has "Thread* thread" in its arguments. So 
>>> Thread::current() might be different from it.
>>
>> Not sure what you mean. You only ever manipulate the thread state of 
>> the current thread.
>>
>>> In addition, ThreadInVMfromUnknown uses transition_from_native() to 
>>> change the thread state.
>>> It checks (and manipulates?) something which relates to safepoint.
>>
>> Yes it does - which would be a problem if a safepoint (or handshake) 
>> were pending. But the path through before_exit already has safepoint 
>> checks when you acquire the BeforeExit_lock.
> 
> But that isn't relevant. The issue is we don't want a safepoint check on 
> the report_and_die() path. So a custom transition helper is needed to 
> avoid that.
> 
> David
> 
>> The main problem with the suggestion is it seems we may not be running 
>> in a JavaThread:
>>
>>   349   Thread* const thread = Thread::current();
>>   350   if (thread->is_Watcher_thread()) {
>>
>> so we can't use the existing thread-state helpers, unless we narrow 
>> the scope (as you do) to after the check for the WatcherThread.
>>
>> David
>> -----
>>
>>> Thus I added ThreadInVMForJFR to my new webrev.
>>
>> Your change still seems overly complicated.
>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Thanks
>>>>> Markus
>>>>>
>>>>> -----Original Message-----
>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>> Sent: den 2 november 2019 16:57
>>>>> To: hotspot-jfr-dev at openjdk.java.net; 
>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>> Cc: David Holmes <david.holmes at oracle.com>; yasuenag at gmail.com; 
>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover 
>>>>> thread state
>>>>>
>>>>> Hi,
>>>>>
>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>> So I updated webrev. Could you review it?
>>>>>
>>>>>   http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>
>>>>> This change passed all tests on submit repo 
>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>> Forward to hotspot-runtime-dev.
>>>>>>
>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>> But I'm not clear on why the thread state is not recovered.
>>>>>>
>>>>>> I'd like to hear about this from JFR folks.
>>>>>> If it is just a bug in JFR, I will create a patch which recovers it
>>>>>> in JFR code.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> -------- Forwarded Message --------
>>>>>> Subject: RFR: 8233375: JFR emergency dump do not recover thread state
>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Please review this change:
>>>>>>
>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>
>>>>>> If JFR is running when the JVM crashes, JFR will dump data to
>>>>>> hs_err_pid<PID>.jfr.
>>>>>> This is performed in prepare_for_emergency_dump().
>>>>>> However, this function transitions the thread state to
>>>>>> "_thread_in_vm" and does not recover it.
>>>>>>
>>>>>> This change has been tested on submit repo as 
>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>> It failed at compiler/types/correctness/CorrectnessTest.java
>>>>>> However this test is for JIT compiler, and related issue has been 
>>>>>> reported as JDK-8225620.
>>>>>> So I think this patch can go through.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa

From ioi.lam at oracle.com  Mon Nov  4 06:27:08 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Sun, 3 Nov 2019 22:27:08 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
Message-ID: <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>

Hi Jiangli,

Thank you so much for spending time reviewing this RFE!

On 11/3/19 6:34 PM, Jiangli Zhou wrote:
> Hi Ioi,
>
> Sorry for the delay again. Will try to put this on the top of my list
> next week and reduce the turn-around time. The updates look good in
> general.
>
> We might want to have a better strategy when choosing metadata
> relocation address (when relocation is needed). Some
> applications/benchmarks may be more sensitive to cache locality and
> memory/data layout. There was a bug,
> https://bugs.openjdk.java.net/browse/JDK-8213713 that caused 1G gap
> between Java heap data and metadata before JDK 12. The gap seemed to
> cause a small but noticeable runtime effect in one case that I came
> across.

I guess you're saying we should try to relocate the archive into 
somewhere under 32GB?

Could you elaborate more on the performance issue, especially
cache locality? I looked at JDK-8213713 but it didn't mention
performance.

Also, by default, we have non-zero narrow_klass_base and 
narrow_klass_shift = 3, and archive relocation doesn't change that:

$ java -Xlog:cds=debug -version
... narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
$ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
... narrow_klass_base = 0x00007f1e8b499000, narrow_klass_shift = 3

We always use narrow_klass_shift due to this:

   // CDS uses LogKlassAlignmentInBytes for narrow_klass_shift. See
   // MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
   // how dump time narrow_klass_shift is set. Although, CDS can work
   // with zero-shift mode also, to be consistent with AOT it uses
   // LogKlassAlignmentInBytes for klass shift so archived java heap objects
   // can be used at same time as AOT code.
   if (!UseSharedSpaces
       && (uint64_t)(higher_address - lower_base) <= UnscaledClassSpaceMax) {
     CompressedKlassPointers::set_shift(0);
   } else {
     CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
   }
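
To make the role of the base and shift concrete, here is a minimal sketch of the usual base-plus-shifted-offset encoding. decode_klass and encode_klass are hypothetical helpers, not the CompressedKlassPointers API; the two base values are the ones printed in the log output above.

```cpp
#include <cassert>
#include <cstdint>

// Decode: address = base + (narrow << shift).
std::uint64_t decode_klass(std::uint64_t base, int shift, std::uint32_t narrow) {
  return base + (static_cast<std::uint64_t>(narrow) << shift);
}

// Encode is the inverse; addr must be >= base and aligned to (1 << shift).
std::uint32_t encode_klass(std::uint64_t base, int shift, std::uint64_t addr) {
  return static_cast<std::uint32_t>((addr - base) >> shift);
}
```

With shift = 3, a klass at offset 0x1238 from the base encodes to 0x247 under either base, so only narrow_klass_base has to change when the archive is mapped at a different address.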

> Here are some additional comments (minor).
>
> Could you please fix the long lines in the following?
>
> 1237 void java_lang_Class::update_archived_primitive_mirror_native_pointers(oop
> archived_mirror) {
> 1238   if (MetaspaceShared::relocation_delta() != 0) {
> 1239     assert(archived_mirror->metadata_field(_klass_offset) ==
> NULL, "must be for primitive class");
> 1240
> 1241     Klass* ak =
> ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
> 1242     if (ak != NULL) {
> 1243       archived_mirror->metadata_field_put(_array_klass_offset,
> (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
> 1244     }
> 1245   }
> 1246 }
>
> src/hotspot/share/memory/dynamicArchive.cpp
>
>   889   Thread* THREAD = Thread::current();
>   890   Method::sort_methods(ik->methods(), /*set_idnums=*/true,
> dynamic_dump_method_comparator);
>   891   if (ik->default_methods() != NULL) {
>   892     Method::sort_methods(ik->default_methods(),
> /*set_idnums=*/false, dynamic_dump_method_comparator);
>   893   }
>

OK will do.

> Please see inlined comments below.
>
> On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>> Hi Jiangli,
>>
>> Thanks for the review. I've updated the patch according to your comments:
>>
>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
>>
>> (the delta is on top of 8231610-relocate-cds-archive.v03.delta in my
>> reply to Calvin's comments).
>>
>> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
>>> Hi Ioi,
>>>
>>> Sorry for the delay. Here are my remaining comments.
>>>
>>> - src/hotspot/share/memory/dynamicArchive.cpp
>>>
>>> 128   static intx _method_comparator_name_delta;
>>>
>>> The name of the above variable is confusing. It's the value of
>>> _buffer_to_target_delta. It's better to use _buffer_to_target_delta
>>> directly.
>> _buffer_to_target_delta is a non-static field, but
>> dynamic_dump_method_comparator() must be a static function so it can't
>> use the non-static field easily.
>
> It sounds like an issue. _buffer_to_target_delta was made
> non-static mostly because we might support more than one dynamic
> archive in the future. However, today's usages bake in an assumption
> that _buffer_to_target_delta is a singleton value. It is cleaner to
> either make _buffer_to_target_delta a static variable for now, or
> add an accessor API in DynamicArchiveBuilder so that other code can
> use the value correctly.

OK, I'll move it to a static variable.
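
Roughly, the arrangement would look like the sketch below. It is a simplified illustration: Builder stands in for DynamicArchiveBuilder, names are represented as plain addresses, and the buffer bounds and helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for DynamicArchiveBuilder. The delta is a static
// member so that a static comparator can read it without an instance.
struct Builder {
  static std::uintptr_t buffer_base;
  static std::uintptr_t buffer_end;
  static std::intptr_t buffer_to_target_delta;

  // Only addresses inside the dump buffer are shifted to their target location.
  static std::uintptr_t to_target(std::uintptr_t p) {
    if (p >= buffer_base && p < buffer_end) {
      return p + static_cast<std::uintptr_t>(buffer_to_target_delta);
    }
    return p;
  }

  // Orders two name addresses by where they will live in the target archive.
  static int compare_names(std::uintptr_t a_name, std::uintptr_t b_name) {
    if (a_name == b_name) {
      return 0;  // quick exit before adjusting anything
    }
    std::uintptr_t a = to_target(a_name);
    std::uintptr_t b = to_target(b_name);
    return (a < b) ? -1 : ((a > b) ? 1 : 0);
  }
};
std::uintptr_t Builder::buffer_base = 0;
std::uintptr_t Builder::buffer_end = 0;
std::intptr_t Builder::buffer_to_target_delta = 0;
```

The static member bakes in the single-archive assumption discussed above, so it would need to be revisited if more than one dynamic archive is ever supported.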

>
>>> Also, we can do a quick pointer comparison of 'a_name' and
>>> 'b_name' first before adjusting the pointers.
>> I added this:
>>
>>       if (a_name == b_name) {
>>         return 0;
>>       }
>>
>>> ---
>>>
>>> 934 void DynamicArchiveBuilder::relocate_buffer_to_target() {
>>> ...
>>>    944
>>>    945   ArchivePtrMarker::compact(relocatable_base, relocatable_end);
>>> ...
>>>
>>>    974     SharedDataRelocator patcher((address*)patch_base,
>>> (address*)patch_end, valid_old_base, valid_old_end,
>>>    975                                 valid_new_base, valid_new_end, addr_delta);
>>>    976     ArchivePtrMarker::ptrmap()->iterate(&patcher);
>>>
>>> Could we reduce the number of data re-iterations to help archive
>>> dumping performance. The ArchivePtrMarker::compact operation can be
>>> combined with the patching iteration. ArchivePtrMarker::compact API
>>> can be removed.
>> That's a good idea. I implemented it using a template parameter so that
>> we can have max performance when relocating the archive at run time.
>>
>> I added comments to explain why the relocation is done here. The
>> relocation is pretty rare (only when the base archive was not mapped at
>> the default location).
>>
>>> ---
>>>
>>>    967     address valid_new_base =
>>> (address)Arguments::default_SharedBaseAddress();
>>>    968     address valid_new_end  = valid_new_base + base_plus_top_size;
>>>
>>> The debugging only code can be included under #ifdef ASSERT.
>> These values are actually also used in debug logging so they can't be
>> ifdef'ed out.
>>
>> Also, the C++ compiler is pretty good at eliding code that's not
>> actually used. If I comment out all the logging code in
>> DynamicArchiveBuilder::relocate_buffer_to_target()  and
>> SharedDataRelocator, gcc elides all the unused fields and their
>> assignments. So no code is generated for this, etc.
>>
>>       address valid_new_base =
>> (address)Arguments::default_SharedBaseAddress();
>>
>> Since #ifdef ASSERT makes the code harder to read, I think we should use
>> it only when really necessary.
> It seems cleaner to get rid of these debugging only variables, by
> using 'relocatable_base' and
> '(address)Arguments::default_SharedBaseAddress()' in the logging code.

SharedDataRelocator is used in 3 different situations. These six
variables (patch_base, patch_end, valid_old_base, valid_old_end,
valid_new_base, valid_new_end) describe what is being patched, and what
the expectations are, for each situation. The code will be hard to
understand without them.

Please note there's also logging code in the SharedDataRelocator 
constructor that prints out these values.

I think I'll just remove the 'debug only' comment to avoid confusion.
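
To make the role of the six variables concrete, here is a minimal sketch
of a relocator that patches one pointer slot at a time and checks both the
old and the new value against their expected ranges. It is only an
illustration, not the SharedDataRelocator code in the webrev: the struct
name, the do_slot() method and its bool return value are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for SharedDataRelocator; not the HotSpot class.
struct RelocatorSketch {
  uintptr_t* patch_base;      // first pointer slot that may be patched
  uintptr_t* patch_end;       // one past the last slot that may be patched
  uintptr_t  valid_old_base;  // every old pointer value must lie in
  uintptr_t  valid_old_end;   //   [valid_old_base, valid_old_end)
  uintptr_t  valid_new_base;  // every patched value must lie in
  uintptr_t  valid_new_end;   //   [valid_new_base, valid_new_end)
  intptr_t   delta;           // new address minus old address

  // Called once per marked slot, identified by its index from patch_base.
  // Returns false and leaves the slot untouched if an expectation fails.
  bool do_slot(size_t index) {
    uintptr_t* slot = patch_base + index;
    if (slot >= patch_end) return false;
    uintptr_t old_value = *slot;
    if (old_value < valid_old_base || old_value >= valid_old_end) return false;
    uintptr_t new_value = old_value + static_cast<uintptr_t>(delta);
    if (new_value < valid_new_base || new_value >= valid_new_end) return false;
    *slot = new_value;
    return true;
  }
};
```

In the real code the expectation checks would presumably be asserts rather
than return values; the sketch returns a bool only so that the behavior is
easy to observe.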

>
>>> ---
>>>
>>>    993   dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
>>>
>>> We could combine the archived heap data bitmap into the new region as
>>> well? It can be handled as a separate RFE.
>> I've filed https://bugs.openjdk.java.net/browse/JDK-8233093
>>
>>> - src/hotspot/share/memory/filemap.cpp
>>>
>>> 1038     if (is_static()) {
>>> 1039       if (errno == ENOENT) {
>>> 1040         // Not locating the shared archive is ok.
>>> 1041         fail_continue("Specified shared archive not found (%s).",
>>> _full_path);
>>> 1042       } else {
>>> 1043         fail_continue("Failed to open shared archive file (%s).",
>>> 1044                       os::strerror(errno));
>>> 1045       }
>>> 1046     } else {
>>> 1047       log_warning(cds, dynamic)("specified dynamic archive
>>> doesn't exist: %s", _full_path);
>>> 1048     }
>>>
>>> If the top layer is explicitly specified by the user, a warning does
>>> not seem to be a proper behavior if the VM fails to open the archive
>>> file.
>>>
>>> It might be better to handle the code unrelated to relocation in a
>>> separate changeset and track it with a separate RFE.
>> This code was moved from
>>
>> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
>>
>> so I am not changing the behavior. If you want, we can file an RFE to
>> change the behavior.
> Ok. A new RFE sounds like the right thing to re-evaluate the usage
> issue here. Thanks.

I created https://bugs.openjdk.java.net/browse/JDK-8233446

>>> ---
>>>
>>> 1148 void FileMapInfo::write_region(int region, char* base, size_t size,
>>> 1149                                bool read_only, bool allow_exec) {
>>> ...
>>> 1154
>>> 1155   if (region == MetaspaceShared::bm) {
>>> 1156     target_base = NULL;
>>> 1157   } else if (DynamicDumpSharedSpaces) {
>>>
>>> It's not too clear to me how the bitmap (bm) region is handled for the
>>> base layer and top layer. Could you please explain?
>> The bm region for both layers are mapped at an address picked by the OS:
>>
>> char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
>>     FileMapRegion* si = space_at(MetaspaceShared::bm);
>>     bitmap_size = si->used_aligned();
>>     bool read_only = true, allow_exec = false;
>>     char* requested_addr = NULL; // allow OS to pick any location
>>     char* bitmap_base = os::map_memory(_fd, _full_path, si->file_offset(),
>>                                        requested_addr, bitmap_size,
>> read_only, allow_exec);
>>
> Ok, after staring at the code for a few seconds I saw that's intended.
> If the current region is 'bm', then the 'target_base' is NULL
> regardless if it's static or dynamic archive. Otherwise, the
> 'target_base' is handled differently for the static and dynamic case.
> The following would be cleaner and has better reliability.
>
>     char* target_base = NULL;
>
>     // The target_base is NULL for 'bm' region.
>     if (region != MetaspaceShared::bm) {
>       if (DynamicDumpSharedSpaces) {
>         assert(!HeapShared::is_heap_region(region),
>                "dynamic archive doesn't support heap regions");
>         target_base = DynamicArchive::buffer_to_target(base);
>       } else {
>         target_base = base;
>       }
>     }

How about this?

  char* target_base;
  if (region == MetaspaceShared::bm) {
    target_base = NULL; // always NULL for bm region.
  } else {
    if (DynamicDumpSharedSpaces) {
      assert(!HeapShared::is_heap_region(region),
             "dynamic archive doesn't support heap regions");
      target_base = DynamicArchive::buffer_to_target(base);
    } else {
      target_base = base;
    }
  }


>
>>> ---
>>>
>>> 1362   DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>>>
>>> Could you please explain the above?
>> I added the comments
>>
>>     // Make sure we don't attempt to use header()->mapped_base_address()
>> unless
>>     // it's been successfully mapped.
>> DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>>
>>> ---
>>>
>>> 1359   FileMapRegion* last_region = NULL;
>>>
>>> 1371     if (last_region != NULL) {
>>> 1372       // Ensure that the OS won't be able to allocate new memory
>>> spaces between any mapped
>>> 1373       // regions, or else it would mess up the simple comparision
>>> in MetaspaceObj::is_shared().
>>> 1374       assert(si->mapped_base() == last_region->mapped_end(),
>>> "must have no gaps");
>>>
>>> 1379     last_region = si;
>>>
>>> Can you please place 'last_region' related code under #ifdef ASSERT?
>> I think that will make the code more cluttered. The compiler will
>> optimize that away.
> It's cleaner to define a debugging-only variable only for debug
> builds. You can wrap it and its related usage with DEBUG_ONLY.

OK, will do.
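
For reference, a minimal sketch of what wrapping the last_region
bookkeeping looks like. SKETCH_DEBUG_ONLY is a hypothetical analogue of
HotSpot's DEBUG_ONLY, keyed on NDEBUG here instead of ASSERT, and
RegionSketch and visit_regions() are made-up names for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical analogue of HotSpot's DEBUG_ONLY, keyed on NDEBUG here.
#ifndef NDEBUG
#define SKETCH_DEBUG_ONLY(code) code
#else
#define SKETCH_DEBUG_ONLY(code)
#endif

struct RegionSketch {
  size_t base;  // start offset of the mapped region
  size_t end;   // one past the last byte of the mapped region
};

// Walks the regions in order and returns how many were visited.
// The last_region variable and the gap check exist only in debug builds.
inline size_t visit_regions(const std::vector<RegionSketch>& regions) {
  size_t visited = 0;
  SKETCH_DEBUG_ONLY(const RegionSketch* last_region = nullptr;)
  for (const RegionSketch& r : regions) {
    SKETCH_DEBUG_ONLY(
      if (last_region != nullptr) {
        assert(r.base == last_region->end && "must have no gaps");
      }
      last_region = &r;
    )
    visited++;
  }
  return visited;
}
```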

>
>>> ---
>>>
>>> 1478 char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
>>> 1479   FileMapRegion* si = space_at(MetaspaceShared::bm);
>>> 1480   bitmap_size = si->used_aligned();
>>> 1481   bool read_only = true, allow_exec = false;
>>> 1482   char* requested_addr = NULL; // allow OS to pick any location
>>> 1483   char* bitmap_base = os::map_memory(_fd, _full_path, si->file_offset(),
>>> 1484                                      requested_addr, bitmap_size,
>>> read_only, allow_exec);
>>>
>>> We need to handle mapping failure here.
>> It's handled here:
>>
>> bool FileMapInfo::relocate_pointers(intx addr_delta) {
>>     log_debug(cds, reloc)("runtime archive relocation start");
>>     size_t bitmap_size;
>>     char* bitmap_base = map_relocation_bitmap(bitmap_size);
>>     if (bitmap_base != NULL) {
>>     ...
>>     } else {
>>       log_error(cds)("failed to map relocation bitmap");
>>       return false;
>>     }
>>
> 'bitmap_base' is used immediately after map_memory(). So the check
> needs to be done immediately after map_memory(), but not in the caller
> of map_relocation_bitmap().
>
> 1490   char* bitmap_base = os::map_memory(_fd, _full_path, si->file_offset(),
> 1491                                      requested_addr, bitmap_size,
> read_only, allow_exec);
> 1492
> 1493   if (VerifySharedSpaces && bitmap_base != NULL &&
> !region_crc_check(bitmap_base, bitmap_size, si->crc())) {

OK, I'll fix that.

>
>
>>> ---
>>>
>>> 1513     // debug only -- the current value of the pointers to be
>>> patched must be within this
>>> 1514     // range (i.e., must be between the requested base address,
>>> and the end of the current archive).
>>> 1515     // Note: top archive may point to objects in the base
>>> archive, but not the other way around.
>>> 1516     address valid_old_base = (address)header()->requested_base_address();
>>> 1517     address valid_old_end  = valid_old_base + mapping_end_offset();
>>>
>>> Please place all FileMapInfo::relocate_pointers debugging only code
>>> under #ifdef ASSERT.
>> Ditto about ifdef ASSERT
>>
>>> - src/hotspot/share/memory/heapShared.cpp
>>>
>>>    441 void HeapShared::initialize_from_archived_subgraph(Klass* k) {
>>>    442   if (!open_archive_heap_region_mapped() || !MetaspaceObj::is_shared(k)) {
>>>    443     return; // nothing to do
>>>    444   }
>>>
>>> When do we call HeapShared::initialize_from_archived_subgraph for a
>>> klass that's not shared?
>> I've removed the !MetaspaceObj::is_shared(k). I probably added that for
>> debugging purposes only.
>>
>>>    616   DEBUG_ONLY({
>>>    617       Klass* klass = orig_obj->klass();
>>>    618       assert(klass != SystemDictionary::Module_klass() &&
>>>    619              klass != SystemDictionary::ResolvedMethodName_klass() &&
>>>    620              klass != SystemDictionary::MemberName_klass() &&
>>>    621              klass != SystemDictionary::Context_klass() &&
>>>    622              klass != SystemDictionary::ClassLoader_klass(), "we
>>> can only relocate metaspace object pointers inside java_lang_Class
>>> instances");
>>>    623     });
>>>
>>> Let's leave the above for a separate RFE. I think assert is not
>>> sufficient for the check. Also, why ResolvedMethodName, Module and
>>> MemberName cannot be part of the graph?
>>>
>>>
>> I added the following comment:
>>
>>     DEBUG_ONLY({
>>         // The following are classes in share/classfile/javaClasses.cpp
>> that have injected native pointers
>>         // to metaspace objects. To support these classes, we need to add
>> relocation code similar to
>>         // java_lang_Class::update_archived_mirror_native_pointers.
>>         Klass* klass = orig_obj->klass();
>>         assert(klass != SystemDictionary::Module_klass() &&
>>                klass != SystemDictionary::ResolvedMethodName_klass() &&
>>
> It's too restrictive to exclude those objects from the archived object
> graph because of metadata relocation, since metadata relocation is rare.
> The trade-off doesn't seem to buy us much.
>
> Do you plan to add the needed relocation code?

I looked more into this. Actually we cannot handle these 5 classes at 
all, even without archive relocation:

[1] #define MODULE_INJECTED_FIELDS(macro) \
  macro(java_lang_Module, module_entry, intptr_signature, false)

-> module_entry is malloc'ed

[2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
  macro(java_lang_invoke_ResolvedMethodName, vmholder, object_signature, false) \
  macro(java_lang_invoke_ResolvedMethodName, vmtarget, intptr_signature, false)

-> these fields are related to method handles and lambda forms, etc.
They can't easily be archived without implementing lambda form
archiving. (I did a prototype; it's very complex and fragile).

[3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
  macro(java_lang_invoke_MethodHandleNatives_CallSiteContext, vmdependencies, intptr_signature, false) \
  macro(java_lang_invoke_MethodHandleNatives_CallSiteContext, last_cleanup, long_signature, false)

-> vmdependencies is malloc'ed.

[4] #define MEMBERNAME_INJECTED_FIELDS(macro) \
  macro(java_lang_invoke_MemberName, vmindex, intptr_signature, false)

-> this one is probably OK. Despite being declared as 
'intptr_signature', it seems to be used just as an integer. However, 
MemberNames are typically used with [2] and [3]. So let's just forbid it 
to be safe.

[2] [3] [4] are not used directly by regular Java code and are unlikely 
to be referenced (directly or indirectly) by static fields (except for 
the static fields in the classes in java.lang.invoke, which we probably 
won't support for heap archiving due to the problem I described for 
[2]). Objects of these types are typically referenced via constant pool 
entries.

[5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
  macro(java_lang_ClassLoader, loader_data, intptr_signature, false)

-> loader_data is malloc'ed.

So, I will change the DEBUG_ONLY into a product-mode check, and quit 
dumping if these objects are found in the object subgraph.

Maybe we should backport the check to older versions as well?
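
As a rough illustration of the product-mode check (as opposed to a
debug-only assert), the sketch below models each object in a subgraph by
its class name and reports whether the subgraph can be archived. This is
an assumption-laden toy, not the planned HotSpot change: the function
name, the use of class-name strings and the exact name spellings are all
hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model: each object in the subgraph is represented by the
// name of its class. Returns false if any object is of a class that
// carries injected native pointers, so the caller can quit dumping.
inline bool subgraph_is_archivable(const std::vector<std::string>& object_classes) {
  static const std::unordered_set<std::string> forbidden = {
    "java/lang/Module",
    "java/lang/invoke/ResolvedMethodName",
    "java/lang/invoke/MethodHandleNatives$CallSiteContext",
    "java/lang/invoke/MemberName",
    "java/lang/ClassLoader",
  };
  for (const std::string& name : object_classes) {
    if (forbidden.count(name) != 0) {
      return false;  // checked in all builds, not just debug builds
    }
  }
  return true;
}
```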

>
>>> - src/hotspot/share/memory/metaspace.cpp
>>>
>>> 1036   metaspace_rs = ReservedSpace(compressed_class_space_size(),
>>> 1037                                              _reserve_alignment,
>>> 1038                                              large_pages,
>>> 1039                                              requested_addr);
>>>
>>> Please fix indentation.
>> Fixed.
>>
>>> - src/hotspot/share/memory/metaspaceClosure.hpp
>>>
>>>     78   enum SpecialRef {
>>>     79     _method_entry_ref
>>>     80   };
>>>
>>> Are there other pointers that are not references to MetaspaceObj? If
>>> _method_entry_ref is the only type, it's probably not worth defining
>>> SpecialRef?
>> There may be more types in the future, so I want to have a stable API
>> that can be easily expanded without touching all the code that uses it.
>>
>>
>>> - src/hotspot/share/memory/metaspaceShared.hpp
>>>
>>>     42 enum MapArchiveResult {
>>>     43   MAP_ARCHIVE_SUCCESS,
>>>     44   MAP_ARCHIVE_MMAP_FAILURE,
>>>     45   MAP_ARCHIVE_OTHER_FAILURE
>>>     46 };
>>>
>>> If we want to define different failure types, it's probably worth
>>> using separate types for relocation failure and validation failure.
>> For now, I just need to distinguish between MMAP_FAILURE (where I should
>> attempt to remap at an alternative address) and OTHER_FAILURE (where the
>> CDS archive loading will fail -- due to validation error, insufficient
>> memory, etc -- without attempting to remap.)
>>
>>> ---
>>>
>>>    193   static intx _mapping_delta; // FIXME rename
>>>
>>> How about _relocation_delta?
>> Changed as suggested.
>>
>>> - src/hotspot/share/oops/instanceKlass
>>>
>>> 1573 bool InstanceKlass::_disable_method_binary_search = false;
>>>
>>> The use of _disable_method_binary_search is not necessary. You can use
>>> DynamicDumpSharedSpaces for the purpose. That would make things
>>> cleaner.
>> If we always disable the binary search when DynamicDumpSharedSpaces is
>> true, it will slow down normal execution of the Java program when
>> -XX:ArchiveClassesAtExit has been specified, but the program hasn't exited.
> Could you please add some comments to _disable_method_binary_search
> with the above explanation? Thanks.

OK
>
>>> - test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
>>>
>>>     76                     if (name.equals("s0") || name.equals("s1")) {
>>>     77                       // String regions are listed at the end and
>>> they may not be fully occupied.
>>>     78                       break;
>>>     79                     } else if (name.equals("bm")) {
>>>     80                       // Bitmap space does not have a requested address.
>>>     81                       break;
>>>
>>> It's not part of your change, but could you please fix line 76 - 78
>>> since it is trivial. It seems the lines can be removed.
>> Removed.
>>
>>> - /src/hotspot/share/memory/archiveUtils.hpp
>>> The file name does not match with the macro '#ifndef
>>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please rename
>>> archiveUtils.* ? archiveRelocator.hpp and archiveRelocator.cpp are
>>> more descriptive.
>> I named the file archiveUtils.hpp so we can move other misc stuff used
>> by dumping into this file (e.g., DumpRegion, WriteClosure from
>> metaspaceShared.hpp), since these are not used by the majority of the
>> files that use metaspaceShared.hpp.
>>
>> I fixed the ifdef.
>>
>>> - src/hotspot/share/memory/archiveUtils.cpp
>>>
>>>     36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap, address*
>>> ptr_base, address* ptr_end) {
>>>     37   assert(_ptrmap == NULL, "initialize only once");
>>>     38   _ptr_base = ptr_base;
>>>     39   _ptr_end = ptr_end;
>>>     40   _compacted = false;
>>>     41   _ptrmap = ptrmap;
>>>     42   _ptrmap->initialize(12 * M / sizeof(intptr_t)); // default
>>> archive is about 12MB.
>>>     43 }
>>>
>>> Could we do a better estimate here? We could guesstimate the size
>>> based on the current used class space and metaspace size. It's okay if
>>> a larger bitmap is used, since it can be reduced after all marking is
>>> done.
>> The bitmap is automatically expanded when necessary in
>> ArchivePtrMarker::mark_pointer(). It's only about 1/32 or 1/64 of the
>> total archive size, so even if we do expand, the cost will be trivial.
> The initial value is based on the default CDS archive. When dealing
> with a really large archive, it would have to re-grow many times.
> Also, using a hard-coded value is less desirable.

OK, I changed it to the following

  // Use this as initial guesstimate. We should need less space in the
  // archive, but if we're wrong the bitmap will be expanded automatically.
  size_t estimated_archive_size = MetaspaceGC::capacity_until_GC();
  // But set it smaller in debug builds so we always test the expansion code.
  // (Default archive is about 12MB).
  DEBUG_ONLY(estimated_archive_size = 6 * M);

  // We need one bit per pointer in the archive.
  _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
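
To spell out the sizing arithmetic: with one bit per pointer-sized slot,
the bitmap needs (archive bytes / sizeof(intptr_t)) bits, which is 1/64 of
the archive size with 8-byte pointers and 1/32 with 4-byte pointers. The
sketch below shows that estimate together with growth on demand. It is a
hypothetical stand-in, not CHeapBitMap or ArchivePtrMarker, and the
doubling policy is an assumption of mine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical growable pointer bitmap; not HotSpot's CHeapBitMap.
class PtrBitmapSketch {
  std::vector<bool> _bits;
 public:
  // One bit per pointer-sized slot in an archive of the estimated size.
  explicit PtrBitmapSketch(size_t estimated_archive_bytes)
    : _bits(estimated_archive_bytes / sizeof(intptr_t), false) {}

  size_t size_in_bits() const { return _bits.size(); }

  // Marks the slot at slot_index, doubling the bitmap until the index
  // fits if the initial estimate turned out to be too small.
  void mark(size_t slot_index) {
    if (slot_index >= _bits.size()) {
      size_t new_size = _bits.size() == 0 ? 1 : _bits.size();
      while (new_size <= slot_index) new_size *= 2;
      _bits.resize(new_size, false);
    }
    _bits[slot_index] = true;
  }

  bool is_marked(size_t slot_index) const {
    return slot_index < _bits.size() && _bits[slot_index];
  }
};
```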


Thanks!
- Ioi

>
>>>
>>>
>>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou <jianglizhou at google.com> wrote:
>>>> Hi Ioi,
>>>>
>>>> This is another great step for CDS usability improvement. Thank you!
>>>>
>>>> I have a high level question (or request): could we consider
>>>> separating the relocation work for 'direct' class metadata from other
>>>> types of metadata (such as the shared system dictionary, symbol table,
>>>> etc)? Initially we only relocate the tables and other archived global
>>>> data. When each archived class is being loaded, we can relocate all
>>>> the pointers within the current class. We could find the segment (for
>>>> the current class) in the bitmap and update the pointers within the
>>>> segment. That way we can reduce initial startup costs and also avoid
>>>> relocating class data that's not used at runtime. In some real world
>>>> large systems, an archive may contain an extremely large number of
>>>> classes.
>>>>
>>>> Following are partial review comments so we can move things forward.
>>>> Still going through the rest of the changes.
>>>>
>>>> - src/hotspot/share/classfile/javaClasses.cpp
>>>>
>>>> 1218 void java_lang_Class::update_archived_mirror_native_pointers(oop
>>>> archived_mirror) {
>>>> 1219   Klass* k = ((Klass*)archived_mirror->metadata_field(_klass_offset));
>>>> 1220   if (k != NULL) { // k is NULL for the primitive classes such as
>>>> java.lang.Byte::TYPE <<<<<<<<<<<
>>>> 1221     archived_mirror->metadata_field_put(_klass_offset,
>>>> (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
>>>> 1222   }
>>>> 1223 ...
>>>>
>>>> Primitive type mirrors are handled separately. Could you please verify
>>>> if this call path happens for primitive type mirror?
>>>>
>>>> To answer my question above, looks like you added the following, which
>>>> is to be used for primitive type mirrors. That seems to be the reason
>>>> why update_archived_mirror_native_pointers is trying to also cover
>>>> primitive type. It would be cleaner to have a separate API for primitive
>>>> type mirrors. We could then also replace the above check at
>>>> line 1220 with an assert for regular mirrors.
>>>>
>>>> +void ReadClosure::do_mirror_oop(oop *p) {
>>>> +  do_oop(p);
>>>> +  oop mirror = *p;
>>>> +  if (mirror != NULL) {
>>>> +    java_lang_Class::update_archived_mirror_native_pointers(mirror);
>>>> +  }
>>>> +}
>>>> +
>>>>
>>>> How about renaming update_archived_mirror_native_pointers to
>>>> update_archived_mirror_klass_pointers.
>>>>
>>>> It would be good to pass the current klass as an argument. We can
>>>> verify the relocated pointer matches with the current klass pointer.
>>>>
>>>> We should also check if relocation is necessary before spending cycles
>>>> to obtain the klass pointer from the mirror.
>>>>
>>>> 1252   update_archived_mirror_native_pointers(m);
>>>> 1253
>>>> 1254   // mirror is archived, restore
>>>> 1255   assert(HeapShared::is_archived_object(m), "must be archived
>>>> mirror object");
>>>> 1256   Handle mirror(THREAD, m);
>>>>
>>>> Could we move the line at 1252 after the assert at line 1255?
>>>>
>>>> - src/hotspot/share/include/cds.h
>>>>
>>>>     47   int     _mapped_from_file;  // Is this region mapped from a file?
>>>>     48                               // If false, this region was
>>>> initialized using os::read().
>>>>
>>>> Is the new field truly needed? It seems we could use _mapped_base to
>>>> determine if a region is mapped or not?
>>>>
>>>> - src/hotspot/share/memory/dynamicArchive.cpp
>>>>
>>>> Could you please remove the debugging print code in
>>>> dynamic_dump_method_comparator? Or convert those to logging output if
>>>> they are helpful.
>>>>
>>>> Will send out the rest of the review comments later.
>>>>
>>>> Best,
>>>>
>>>> Jiangli
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Oct 10, 2019 at 6:00 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>> Bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8231610
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v01/
>>>>>
>>>>> Design:
>>>>> http://cr.openjdk.java.net/~iklam/jdk14/design/8231610-relocate-cds-archive.txt
>>>>>
>>>>>
>>>>> Overview:
>>>>>
>>>>> The CDS archive is mmaped to a fixed address range (starting at
>>>>> SharedBaseAddress, usually 0x800000000). Previously, if this
>>>>> requested address range is not available (usually due to Address
>>>>> Space Layout Randomization (ASLR) [2]), the JVM will give up and
>>>>> will load classes dynamically using class files.
>>>>>
>>>>> [a] This causes slow down in JVM start-up.
>>>>> [b] Handling of mapping failures causes unnecessary complication in
>>>>>        the CDS tests.
>>>>>
>>>>> Here are some preliminary benchmarking results (using default CDS archive,
>>>>> running helloworld):
>>>>>
>>>>> (a) 47.1ms (CDS enabled, mapped at requested addr)
>>>>> (b) 53.8ms (CDS enabled, mapped at alternate addr)
>>>>> (c) 86.2ms (CDS disabled)
>>>>>
>>>>> The small degradation in (b) is caused by the relocation of
>>>>> absolute pointers embedded in the CDS archive. However, it is
>>>>> still a big improvement over case (c)
>>>>>
>>>>> Please see the design doc (link above) for details.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>


From david.holmes at oracle.com  Mon Nov  4 06:28:35 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 4 Nov 2019 16:28:35 +1000
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <58713599-b5ea-57bb-0413-fce3b8bd5d2f@oracle.com>
 <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
Message-ID: <a20e42b4-85f7-29de-4573-76cc477e39a0@oracle.com>

Hi Dan,

A few follow ups to your responses, with trimming ...

On 30/10/2019 6:20 am, Daniel D. Daugherty wrote:
> On 10/24/19 7:00 AM, David Holmes wrote:
>>  122 // Set _owner field to new_value; current value must match old_value.
>>  123 inline void ObjectMonitor::set_owner_from(void* new_value, void* old_value) {
>>  124   void* prev = Atomic::cmpxchg(new_value, &_owner, old_value);
>>  125   ADIM_guarantee(prev == old_value, "unexpected prev owner=" INTPTR_FORMAT
>>
>> The use of cmpxchg seems a little strange here if you are asserting 
>> that when this is called _owner must equal old_value. That means you 
>> don't expect any race and if there is no race with another thread 
>> writing to _owner then you don't need the cmpxchg. A normal:
>>
>> if (_owner == old_value) {
>>   Atomic::store(&_owner, new_value);
>>   log(...);
>> } else {
>>   guarantee(false, " unexpected old owner ...");
>> }
> 
> The two parameter version of set_owner_from() is only called from three
> places and we'll cover two of them here:
> 
> src/hotspot/share/runtime/objectMonitor.cpp:
> 
> 1041     if (AsyncDeflateIdleMonitors) {
> 1042       set_owner_from(NULL, Self);
> 1043     } else {
> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the lock
> 1045       OrderAccess::storeload();                         // See if we need to wake a successor
> 1046     }
> 
> and:
> 
> 1221   if (AsyncDeflateIdleMonitors) {
> 1222     set_owner_from(NULL, Self);
> 1223   } else {
> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
> 1225     OrderAccess::fence();                               // ST _owner vs LD in unpark()
> 1226   }
> 
> So I've replaced the existing {release_store(), storeload()} combo for one
> call site and the existing {release_store(), fence()} combo for the other
> call site with a cmpxchg(). I chose cmpxchg() for these reasons:
> 
> 1) I wanted the same memory sync behavior at both call sites.
> 2) I wanted similar/same memory sync behavior as the original
>    code at those call sites.

Why? The memory sync requirements for non-async deflation may be
completely different to those required for async deflation (given all
the other bits of the protocol).

> 3) I wanted the return value from cmpxchg() for my state machine
>    sanity check.

I'm somewhat dubious about using cmpxchg just for the side-effect of 
getting the existing value.

> I don't think that using 'Atomic::store(&_owner, new_value)' is the
> right choice for these two call sites.

If you don't actually need the cmpxchg to handle concurrent updates to 
the _owner field, then a plain store (not an Atomic::store - that was an 
error on my part) does not seem unreasonable; or if there are still 
memory sync issues here, perhaps a release_store.

If you use cmpxchg then anyone reading the code will assume there is a 
concurrent update that you are guarding against.
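
To make the two alternatives concrete, here is a minimal sketch using
std::atomic rather than HotSpot's Atomic/OrderAccess APIs. The struct and
method names are hypothetical, and the release ordering on the store is
just one possible choice, not a statement about what the monitor code
needs.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the ObjectMonitor _owner field.
struct OwnerSketch {
  std::atomic<void*> owner{nullptr};

  // CAS variant: installs new_value only if owner == old_value, and is
  // safe even if another thread writes owner concurrently.
  // Returns true if the swap happened.
  bool set_owner_from_cas(void* new_value, void* old_value) {
    void* expected = old_value;
    return owner.compare_exchange_strong(expected, new_value);
  }

  // Check-then-store variant: only correct when no other thread can
  // write owner between the load and the store.
  bool set_owner_from_store(void* new_value, void* old_value) {
    if (owner.load(std::memory_order_relaxed) != old_value) {
      return false;
    }
    owner.store(new_value, std::memory_order_release);
    return true;
  }
};
```

Both variants give the same sanity check in the single-writer case; the
difference is what a reader infers about concurrent writers.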

> The last two parameter set_owner_from() is talked about in the
> next reply.
> 
> 
>> Similarly for the old_value1/old_value2 version.
> 
> The three parameter version of set_owner_from() is only called from one
> place and the last two parameter version is called from the same place:
> 
> src/hotspot/share/runtime/synchronizer.cpp:
> 
> 1903       if (AsyncDeflateIdleMonitors) {
> 1904         m->set_owner_from(mark.locker(), NULL, DEFLATER_MARKER);
> 1905       } else {
> 1906         m->set_owner_from(mark.locker(), NULL);
> 1907       }
> 
> The original code was:
> 
> 1399       m->set_owner(mark.locker());
> 
> The original set_owner() code was defined like this:
> 
>    87 inline void ObjectMonitor::set_owner(void* owner) {
>    88   _owner = owner;
>    89 }
> 
> So the original code didn't do any memory sync'ing at all and I've
> changed that to a cmpxchg() on both code paths. That appears to be
> overkill for that callsite...

Again I'm not sure any memory sync requirements from the non-async case 
should necessarily transfer over to the async case. Even if you end up 
requiring similar memory sync the reasoning would be quite different I 
would expect.

> 
> We're in ObjectSynchronizer::inflate(), in the "CASE: stack-locked"
> section of the code. We've gotten our ObjectMonitor from om_alloc()
> and are initializing a number of fields in the ObjectMonitor. The
> ObjectMonitor is not published until we do:
> 
> 1916       object->release_set_mark(markWord::encode(m));
> 
> So we don't need the memory sync'ing features of the cmpxchg() for
> either of the set_owner_from() calls and all that leaves is the
> state machine sanity check.
> 
> I really like the state machine sanity check on the owner field but
> that's just because it came in handy when chasing the recent races.
> It would be easy to change the three parameter version of
> set_owner_from() to not do memory sync'ing, but still do the state
> machine sanity check.
> 
> Update: Changing the three parameter version of set_owner_from()
> may impact the changes to owner_is_DEFLATER_MARKER() discussed
> above. Sigh...
> Update 2: Probably no impact because the three parameter version of
> set_owner_from() is only used before the ObjectMonitor is published
> and owner_is_DEFLATER_MARKER() is used after the ObjectMonitor has
> appeared on an in-use list.
> 
> However, the two parameter version of set_owner_from() needs its
> memory sync'ing behavior for its objectMonitor.cpp call sites so
> this call site would need something different.
> 
> I'm not sure which solution I'm going to pick yet, but I definitely
> have to change something here since we don't need cmpxchg() at this
> call site. More thought is required.

I will look to see where this ended up.

>> src/hotspot/share/runtime/objectMonitor.cpp
>>
>>
>>  267   if (AsyncDeflateIdleMonitors &&
>>  268       try_set_owner_from(Self, DEFLATER_MARKER) == DEFLATER_MARKER) {
> 
> For more context, we are in:
> 
>   241 void ObjectMonitor::enter(TRAPS) {
> 
> 
>> I don't see why you need to call try_set_owner_from again here as 
>> "cur" will already be DEFLATER_MARKER from the previous try_set_owner.
> 
> I assume the previous try_set_owner() call you mean is this one:
> 
>   248   void* cur = try_set_owner_from(Self, NULL);
> 
> This first try_set_owner() is for the most common case of no owner.
> 
> The second try_set_owner() call is for a different condition than the 
> first:
> 
>   268       try_set_owner_from(Self, DEFLATER_MARKER) == DEFLATER_MARKER) {
> 
> L248 is trying to change the _owner field from NULL -> 'Self'.
> L268 is trying to change the _owner field from DEFLATER_MARKER to 'Self'.
> 
> If the try_set_owner() call on L248 fails, 'cur' can be several possible
> values:
> 
>   - the calling thread (recursive enter is handled on L254-7)
>   - other owning thread value (BasicLock* or Thread*)
>   - DEFLATER_MARKER

I'll give a cautious okay to that explanation (the deficiency being in my
understanding, not your explaining :) ).

>> Further, I don't see how installing self as the _owner here is valid 
>> and means you acquired the monitor, as the fact it was DEFLATER_MARKER 
>> means it is still being deflated by another thread doesn't it ???
> 
> I guess the comment after L268 didn't work for you:
> 
>   269     // The deflation protocol finished the first part (setting owner),
>   270     // but it failed the second part (making ref_count negative) and
>   271     // bailed. Or the ObjectMonitor was async deflated and reused.
> 
> It means that the deflater thread was racing with this enter and
> managed to set the owner field to DEFLATER_MARKER as the first step
> in the deflation protocol. Our entering thread actually won the race
> when it managed to set the ref_count to a positive value as part of
> the ObjectMonitorHandle stuff done in the inflate() call that preceded
> the enter() call. However, the deflater thread hasn't realized that it
> lost the race yet and hasn't restored the owner field back to NULL.

You're right the comment didn't work for me as it required me to be 
holding too much of the protocol in my head. Makes more sense now.
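
For anyone else trying to hold this in their head, the two CAS attempts
can be sketched as below. This is a deliberately simplified model using
std::atomic: it leaves out ref_count, the AsyncDeflateIdleMonitors flag
and everything else in enter(), and MonitorSketch, try_enter() and
DEFLATER_MARKER_SKETCH are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical marker value; HotSpot defines its own DEFLATER_MARKER.
static char deflater_marker_storage;
static void* const DEFLATER_MARKER_SKETCH = &deflater_marker_storage;

struct MonitorSketch {
  std::atomic<void*> owner{nullptr};

  // Returns the previous owner; the swap happened iff that equals old_value.
  void* try_set_owner_from(void* new_value, void* old_value) {
    void* expected = old_value;
    owner.compare_exchange_strong(expected, new_value);
    return expected;  // unchanged on success, current owner on failure
  }

  // Fast-path part of enter(): returns true if self now owns the monitor.
  bool try_enter(void* self) {
    void* cur = try_set_owner_from(self, nullptr);
    if (cur == nullptr) return true;  // most common case: no owner
    if (cur == self) return true;     // recursive enter
    // A deflater set the marker but lost the race and has not restored
    // the owner field yet; take ownership directly from the marker.
    return try_set_owner_from(self, DEFLATER_MARKER_SKETCH) == DEFLATER_MARKER_SKETCH;
  }
};
```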

Thanks,
David
-----

From fujie at loongson.cn  Mon Nov  4 09:16:51 2019
From: fujie at loongson.cn (Jie Fu)
Date: Mon, 4 Nov 2019 17:16:51 +0800
Subject: RFR: 8233454: Test fails with assert(!is_init_completed(), "should
 only happen during init") after JDK-8229516
Message-ID: <ac7a98cf-6730-bbd2-aa52-7bb972a37873@loongson.cn>

Hi all,

JBS:    https://bugs.openjdk.java.net/browse/JDK-8233454
Webrev: http://cr.openjdk.java.net/~jiefu/8233454/webrev.00/

According to the comment [1], the assert seems to miss the case for 
threads attached via JNI.
For more info, please refer to the JBS.

Could you please review it and give me some advice?

Thanks a lot.
Best regards,
Jie

[1] 
http://hg.openjdk.java.net/jdk/jdk/file/2700c409ff10/src/hotspot/share/runtime/thread.hpp#l1249


From felix.yang at huawei.com  Mon Nov  4 11:42:01 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Mon, 4 Nov 2019 11:42:01 +0000
Subject: [aarch64-port-dev ] RFR (trivial) : fix aarch64-8u type profile
 bug
In-Reply-To: <8ad0aa09-bfb1-c891-e17a-be7d14b3a2ae@redhat.com>
References: <DA41BE1DDCA941489001C7FBD7A8820ED5FE7C9A@dggeml527-mbx.china.huawei.com>
 <222f9c0b-7320-8d22-cd44-c4f3af7c1311@redhat.com>
 <DA41BE1DDCA941489001C7FBD7A8820ED5FE7E9C@dggeml527-mbx.china.huawei.com>
 <880f5072-91ba-66bd-94be-429556e7c132@redhat.com>
 <DA41BE1DDCA941489001C7FBD7A8820ED5FF3BEC@dggeml527-mbx.china.huawei.com>
 <8ad0aa09-bfb1-c891-e17a-be7d14b3a2ae@redhat.com>
Message-ID: <DA41BE1DDCA941489001C7FBD7A8820ED602769E@dggeml527-mbx.china.huawei.com>

> -----Original Message-----
> From: Andrew Haley [mailto:aph at redhat.com]
> Sent: Thursday, October 17, 2019 9:06 PM
> To: Yangfei (Felix) <felix.yang at huawei.com>;
> aarch64-port-dev at openjdk.java.net
> Cc: hotspot-runtime-dev at openjdk.java.net
> Subject: Re: [aarch64-port-dev ] RFR (trivial) : fix aarch64-8u type profile bug
> 
> On 9/26/19 2:59 AM, Yangfei (Felix) wrote:
> > CCing to hotspot-runtime-dev list.
> >
> > This has passed hotspot jtreg test on aarch64-linux.  Is it OK to go?
> 
> I'll have a look.

Hi,

    I opened a new bug for this: https://bugs.openjdk.java.net/browse/JDK-8233466
    Webrev: http://cr.openjdk.java.net/~fyang/8233466/webrev.00/
    Passed tier1-3 testing.  Is it OK to go?

Thanks,
Felix

From erik.osterlund at oracle.com  Mon Nov  4 13:09:48 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Mon, 4 Nov 2019 14:09:48 +0100
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <7388c7fc-39c4-1ec6-1608-02b08e562ab3@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <58713599-b5ea-57bb-0413-fce3b8bd5d2f@oracle.com>
 <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
 <7388c7fc-39c4-1ec6-1608-02b08e562ab3@oracle.com>
Message-ID: <dbffc304-e84f-b1ec-b997-7978c1bced6f@oracle.com>

Hi,

TL/DR: David is right; the commentary is weird and does not capture what 
the real constraints are.

As the comment implied before "8222034: Thread-SMR functions should be 
updated to remove work around", the PPC port used to have incorrect 
memory ordering, and the code guarded against that. inc/dec used to be 
memory_order_relaxed and add/sub used to be memory_order_acq_rel on PPC, 
despite the shared contract promising memory_order_conservative.

The implication for the nested counter in the Thread SMR project was 
that I wanted to use the inc/dec API but knew it was not gonna work as 
expected on PPC because we really needed *at least* memory_order_acq_rel 
when decrementing (and memory_order_conservative when incrementing, 
which was simulated in a CAS loop... yuck), but would find ourselves 
getting memory_order_relaxed. Rather than treating it as a bug in the 
PPC atomics implementation, and having the code be broken while we 
waited for a fix, I changed the use to sub when decrementing (which gave 
me the required memory_order_acq_rel ordering I needed), and the 
horrible CAS loop when incrementing, as a workaround, and alerted Martin 
Doerr that this would need to be sorted out in the PPC code. Since 
then, the PPC code did indeed get cleaned up so that inc/dec stopped 
being relaxed-only and worked as advertised.
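
To make the workaround concrete, here is a minimal sketch of the two 
operations. It uses std::atomic rather than HotSpot's Atomic class, and 
the helper names cas_increment and sub_decrement are hypothetical. 
std::memory_order_seq_cst is only an approximation of 
memory_order_conservative, which the standard library has no direct 
equivalent for.

```cpp
#include <atomic>

// Hypothetical helper: increment through a compare-and-swap loop so the
// ordering comes from the CAS rather than from a plain inc.
inline int cas_increment(std::atomic<int>& counter) {
  int observed = counter.load(std::memory_order_relaxed);
  // On failure compare_exchange_weak reloads `observed`, so the loop
  // retries with the latest value until the swap succeeds.
  while (!counter.compare_exchange_weak(observed, observed + 1,
                                        std::memory_order_seq_cst,
                                        std::memory_order_relaxed)) {
  }
  return observed + 1;
}

// Hypothetical helper: decrement expressed as a subtraction with
// acquire-release ordering, mirroring the sub-based workaround.
inline int sub_decrement(std::atomic<int>& counter) {
  return counter.fetch_sub(1, std::memory_order_acq_rel) - 1;
}
```

Both helpers return the new counter value, which is a choice made for 
this sketch only.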

After that, the "8222034: Thread-SMR functions should be updated to 
remove work around" change removed the workaround that was no longer 
required from the code, and put back the desired inc/dec calls (which 
now used an overly conservative memory_order_conservative ordering, 
which is suboptimal, in particular for decrements, but importantly not 
incorrect). Since the nested case would almost never run and is possibly 
the coldest code path in the VM, I did not care to comment in that 
review thread about optimizing it by explicitly passing in a weaker 
ordering. However, I should have commented on the comment that was 
changed, which does indeed look a bit confused. David is right that the 
stuff about volatile has nothing to do with why this is correct. The 
correctness required memory_order_acq_rel for decrements, but the 
implementation provided more, which is fine.

The actual reason why I wanted memory_order_conservative for correctness 
when incrementing and memory_order_acq_rel when decrementing, was to 
prevent accesses inside of the critical section (in particular - reading 
Thread*s from the acquired ThreadsList), from floating outside of the 
reference increment and decrement that marks reading the list as safe to 
access without the underlying list blowing up. In practice, it might 
have been possible to relax it a bit by relying on side effects of other 
unrelated parts of the protocol to have spurious fencing... but I did 
not want to get the protocol tangled in that way because it would be 
difficult to reason about.
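
As a rough illustration of that constraint, the sketch below brackets 
the reads of a list with an ordered increment and decrement. It is 
written against std::atomic, not the HotSpot Atomic API; GuardedList, 
inc_nested, dec_nested and sum_entries are made-up names, and seq_cst 
stands in for memory_order_conservative as the closest standard 
ordering.

```cpp
#include <atomic>
#include <vector>

// Hypothetical stand-in for a list protected by a nested reference count.
struct GuardedList {
  std::atomic<int> nested_cnt{0};
  std::vector<int> entries;  // stands in for the Thread* entries

  // Strong ordering on entry: the reads of `entries` that follow cannot
  // be reordered ahead of the increment.
  void inc_nested() { nested_cnt.fetch_add(1, std::memory_order_seq_cst); }

  // Acquire-release on exit: the reads of `entries` that precede cannot
  // be reordered after the decrement.
  void dec_nested() { nested_cnt.fetch_sub(1, std::memory_order_acq_rel); }
};

// Reads the list only while the count marks it as in use.
inline int sum_entries(GuardedList& list) {
  list.inc_nested();
  int sum = 0;
  for (int v : list.entries) sum += v;
  list.dec_nested();
  return sum;
}
```

The sketch shows only the reader side; whatever reclaims the list would 
have to observe the counter at zero before freeing it.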

Hope this explanation clears up that confusion.

Thanks,
/Erik

On 11/2/19 2:15 PM, Daniel D. Daugherty wrote:
> Erik,
>
> David H. made a comment during this review cycle that should interest 
> you.
>
> The longer version of this comment came up in early reviews of the Async
> Monitor Deflation code because I copied the code and the longer comment
> from threadSMR.cpp. I updated the comment based on your input and review
> and changed the comment and code in threadSMR.cpp and in the Async 
> Monitor
> Deflation project code.
>
> The change in threadSMR.cpp was done with this changeset:
>
> $ hg log -v -r 54517
> changeset:   54517:c201ca660afd
> user:        dcubed
> date:        Thu Apr 11 14:14:30 2019 -0400
> files:       src/hotspot/share/runtime/threadSMR.cpp
> description:
> 8222034: Thread-SMR functions should be updated to remove work around
> Reviewed-by: mdoerr, eosterlund
>
> Here's one of the two diffs to jog your memory:
>
>  void ThreadsList::dec_nested_handle_cnt() {
> -  // The decrement needs to be MO_ACQ_REL. At the moment, the Atomic::dec
> -  // backend on PPC does not yet conform to these requirements. Therefore
> -  // the decrement is simulated with an Atomic::sub(1, &addr).
> -  // Without this MO_ACQ_REL Atomic::dec simulation, the nested SMR mechanism
> -  // is not generally safe to use.
> -  Atomic::sub(1, &_nested_handle_cnt);
> +  // The decrement only needs to be MO_ACQ_REL since the reference
> +  // counter is volatile (and the hazard ptr is already NULL).
> +  Atomic::dec(&_nested_handle_cnt);
>  }
>
> Below is David's comment about the code comment...
>
> Dan
>
>
> Trimming down to just that issue...
>
> On 10/29/19 4:20 PM, Daniel D. Daugherty wrote:
>> On 10/24/19 7:00 AM, David Holmes wrote:
> >
> > src/hotspot/share/runtime/objectMonitor.inline.hpp
>>
>>>  199   // The decrement only needs to be MO_ACQ_REL since the reference
>>>  200   // counter is volatile.
>>>  201   Atomic::dec(&_ref_count);
>>>
>>> volatile is irrelevant with regards to memory ordering as it is a 
>>> compiler annotation. And you haven't specified any memory order 
>>> value so the default is conservative ie. implied full fence. (I see 
>>> the same incorrect comment is in threadSMR.cpp!)
>>
>> I got that wording from threadSMR.cpp and Erik O. confirmed my use of 
>> that
>> wording previously. I'll chase it down with Erik and get back to you.
>>
>>
>>>  208   // The increment needs to be MO_SEQ_CST so that the reference
>>>  209   // counter update is seen as soon as possible in a race with the
>>>  210   // async deflation protocol.
>>>  211   Atomic::inc(&_ref_count);
>>>
>>> Ditto you haven't specified any ordering - and inc() and dec() will 
>>> have the same default.
>>
>> And again, I'll have to chase this down with Erik O. and get back to 
>> you.
>


From david.holmes at oracle.com  Mon Nov  4 13:13:18 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 4 Nov 2019 23:13:18 +1000
Subject: RFR: 8233454: Test fails with assert(!is_init_completed(),
 "should only happen during init") after JDK-8229516
In-Reply-To: <ac7a98cf-6730-bbd2-aa52-7bb972a37873@loongson.cn>
References: <ac7a98cf-6730-bbd2-aa52-7bb972a37873@loongson.cn>
Message-ID: <e478a4a2-7105-8de1-d72e-0ce10d8d34ae@oracle.com>

Hi Jie,

I will need to take a deeper look at this. This is a problem specific to 
Shenandoah GC as it is triggering a sleep whilst a thread is still in the 
process of attaching to the JVM :(

Thanks,
David

On 4/11/2019 7:16 pm, Jie Fu wrote:
> Hi all,
> 
> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233454
> Webrev: http://cr.openjdk.java.net/~jiefu/8233454/webrev.00/
> 
> According to the comment [1], the assert seems to miss the case for 
> threads attached via JNI.
> For more info, please refer to the JBS.
> 
> Could you please review it and give me some advice?
> 
> Thanks a lot.
> Best regards,
> Jie
> 
> [1] 
> http://hg.openjdk.java.net/jdk/jdk/file/2700c409ff10/src/hotspot/share/runtime/thread.hpp#l1249 
> 
> 
> 

From markus.gronlund at oracle.com  Mon Nov  4 13:23:46 2019
From: markus.gronlund at oracle.com (Markus Gronlund)
Date: Mon, 4 Nov 2019 05:23:46 -0800 (PST)
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
Message-ID: <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>

Hi Yasumasa and David,

Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.

Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. A thread in _thread_in_java transitions to _thread_in_vm via stubs in SharedRuntime (I believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument right now for where this invariant is enforced, so let's work with the original thread state as you suggested, Yasumasa.

If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.

I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.

Perhaps we can do it like this?

http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/ 

Thanks for your patience investigating this

Markus

-----Original Message-----
From: David Holmes 
Sent: den 4 november 2019 05:24
To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus Gronlund <markus.gronlund at oracle.com>; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Cc: yasuenag at gmail.com
Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state

So looking at Yasumasa's proposed fix ...

I don't think it is worth the disruption to pass the "thread" all the way through these API's. It is simpler/cleaner to just call
Thread::current_or_null_safe() when you need the current thread.

357   assert(thread->is_Java_thread() && 
(((JavaThread*)thread)->thread_state() == _thread_in_vm), "invariant");

This assertion is incorrect. As this can be called via
VMError::report_and_die() there is AFAICS absolutely no guarantee that we need be in a JavaThread at all.

  428 class ThreadInVMForJFR : public StackObj {

Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.

Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:

if (!guard_reentrancy()) {
   return;
} else {
   // Ensure a JavaThread is _thread_in_vm when we make this call
   JavaThreadInVM jtivm(Thread::current_or_null_safe());
   if (!prepare_for_emergency_dump()) {
     return;
   }
}
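
For readers following along, a scoped helper of this kind could look 
roughly like the sketch below. It is self-contained and uses simplified 
stand-in types, not the real HotSpot Thread, JavaThread or thread-state 
definitions; only the name JavaThreadInVM comes from the suggestion 
above, and the body is an assumption about how such a guard might 
behave. It changes the state without any safepoint check and restores 
it only if it actually changed.

```cpp
// All types here are hypothetical stand-ins, not the HotSpot classes.
enum ThreadState { thread_in_native, thread_in_java, thread_in_vm };

struct Thread {
  virtual ~Thread() = default;
  virtual bool is_java_thread() const { return false; }
};

struct JavaThread : Thread {
  ThreadState state = thread_in_native;
  bool is_java_thread() const override { return true; }
};

// Scoped guard: moves a JavaThread to thread_in_vm and puts the original
// state back on scope exit, but only if the state was changed.
class JavaThreadInVM {
  JavaThread* _jt = nullptr;            // non-null only if we changed state
  ThreadState _original = thread_in_vm;

 public:
  explicit JavaThreadInVM(Thread* t) {
    // A null thread or a non-Java thread is left untouched.
    if (t != nullptr && t->is_java_thread()) {
      JavaThread* jt = static_cast<JavaThread*>(t);
      if (jt->state != thread_in_vm) {
        _jt = jt;
        _original = jt->state;
        jt->state = thread_in_vm;
      }
    }
  }

  ~JavaThreadInVM() {
    if (_jt != nullptr) _jt->state = _original;
  }

  JavaThreadInVM(const JavaThreadInVM&) = delete;
  JavaThreadInVM& operator=(const JavaThreadInVM&) = delete;
};
```

Accepting a possibly null Thread* matches the 
Thread::current_or_null_safe() call in the snippet above, so the caller 
needs no separate null check.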

Thanks,
David
-----


On 4/11/2019 12:24 pm, David Holmes wrote:
> Correction ...
> 
> On 4/11/2019 12:11 pm, David Holmes wrote:
>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>> On 2019/11/04 7:38, David Holmes wrote:
>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>> Hi Yasumasa,
>>>>>
>>>>> I think you can simplify it to something like this:
>>>>>
>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>
>>>> That is more like I had envisaged for this. Reusing existing 
>>>> thread-state transition code is preferable to adding more custom 
>>>> code that directly manipulates thread-state.
>>>
>>> I do not agree with this change.
>>>
>>> VMError::report_and_die() has "Thread* thread" in its arguments. So
>>> Thread::current() might be different with it.
>>
>> Not sure what you mean. You only ever manipulate the thread state of 
>> the current thread.
>>
>>> In addition, ThreadInVMfromUnknown uses transition_from_native() to 
>>> change the thread state.
>>> It checks (and manipulates?) something which relates to safepoint.
>>
>> Yes it does - which would be a problem if a safepoint (or handshake) 
>> were pending. But the path through before_exit already has safepoint 
>> checks when you acquire the BeforeExit_lock.
> 
> But that isn't relevant. The issue is we don't want a safepoint check 
> on the report_and_die() path. So a custom transition helper is needed 
> to avoid that.
> 
> David
> 
>> The main problem with the suggestion is it seems we may not be 
>> running in a JavaThread:
>>
>>   349   Thread* const thread = Thread::current();
>>   350   if (thread->is_Watcher_thread()) {
>>
>> so we can't use the existing thread-state helpers, unless we narrow 
>> the scope (as you do) to after the check for the WatcherThread.
>>
>> David
>> -----
>>
>>> Thus I added ThreadInVMForJFR to new my webrev.
>>
>> Your change still seems overly complicated.
>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Thanks
>>>>> Markus
>>>>>
>>>>> -----Original Message-----
>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>> Sent: den 2 november 2019 16:57
>>>>> To: hotspot-jfr-dev at openjdk.java.net; 
>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>> Cc: David Holmes <david.holmes at oracle.com>; yasuenag at gmail.com; 
>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover 
>>>>> thread state
>>>>>
>>>>> Hi,
>>>>>
>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>> So I updated webrev. Could you review it?
>>>>>
>>>>>     http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>
>>>>> This change passed all tests on submit repo 
>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>> Forward to hotspot-runtime-dev.
>>>>>>
>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>> But I'm unclear why the thread state is not recovered.
>>>>>>
>>>>>> I'd like to hear about this from JFR folks.
>>>>>> If it is just a bug in JFR, I will create a patch which recovers 
>>>>>> it in JFR code.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> -------- Forwarded Message --------
>>>>>> Subject: RFR: 8233375: JFR emergency dump do not recover thread 
>>>>>> state
>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Please review this change:
>>>>>>
>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>     webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>
>>>>>> If JFR is running when the JVM crashes, JFR will dump data to 
>>>>>> hs_err_pid<PID>.jfr.
>>>>>> This is performed in prepare_for_emergency_dump().
>>>>>> However, this function transitions the thread state to "_thread_in_vm".
>>>>>>
>>>>>> This change has been tested on submit repo as 
>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>> It failed at compiler/types/correctness/CorrectnessTest.java
>>>>>> However this test is for JIT compiler, and related issue has been 
>>>>>> reported as JDK-8225620.
>>>>>> So I think this patch can go through.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa

From suenaga at oss.nttdata.com  Mon Nov  4 13:43:35 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Mon, 4 Nov 2019 22:43:35 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
Message-ID: <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>

Hi Markus,

I had a similar change in mind, and it is running on the submit repo:

   http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea

If it passes all tests, I will send review request again.


Thanks,

Yasumasa


On 2019/11/04 22:23, Markus Gronlund wrote:
> Hi Yasumasa and David,
> 
> Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
> 
> Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. A thread in _thread_in_java transitions to _thread_in_vm via stubs in SharedRuntime (I believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument right now for where this invariant is enforced, so let's work with the original thread state as you suggested, Yasumasa.
> 
> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
> 
> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
> 
> Perhaps we can do it like this?
> 
> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
> 
> Thanks for your patience investigating this
> 
> Markus
> 
> -----Original Message-----
> From: David Holmes
> Sent: den 4 november 2019 05:24
> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus Gronlund <markus.gronlund at oracle.com>; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Cc: yasuenag at gmail.com
> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
> 
> So looking at Yasumasa's proposed fix ...
> 
> I don't think it is worth the disruption to pass the "thread" all the way through these API's. It is simpler/cleaner to just call
> Thread::current_or_null_safe() when you need the current thread.
> 
> 357   assert(thread->is_Java_thread() &&
> (((JavaThread*)thread)->thread_state() == _thread_in_vm), "invariant");
> 
> This assertion is incorrect. As this can be called via
> VMError::report_and_die() there is AFAICS absolutely no guarantee that we need be in a JavaThread at all.
> 
>    428 class ThreadInVMForJFR : public StackObj {
> 
> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
> 
> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
> 
> if (!guard_reentrancy()) {
>     return;
> } else {
>     // Ensure a JavaThread is _thread_in_vm when we make this call
>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>     if (!prepare_for_emergency_dump()) {
>       return;
>     }
> }
> 
> Thanks,
> David
> -----
> 
> 
> 
> On 4/11/2019 12:24 pm, David Holmes wrote:
>> Correction ...
>>
>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> I think you can simplify it to something like this:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>
>>>>> That is more like I had envisaged for this. Reusing existing
>>>>> thread-state transition code is preferable to adding more custom
>>>>> code that directly manipulates thread-state.
>>>>
>>>> I do not agree with this change.
>>>>
>>>> VMError::report_and_die() has "Thread* thread" in its arguments. So
>>>> Thread::current() might be different with it.
>>>
>>> Not sure what you mean. You only ever manipulate the thread state of
>>> the current thread.
>>>
>>>> In addition, ThreadInVMfromUnknown uses transition_from_native() to
>>>> change the thread state.
>>>> It checks (and manipulates?) something which relates to safepoint.
>>>
>>> Yes it does - which would be a problem if a safepoint (or handshake)
>>> were pending. But the path through before_exit already has safepoint
>>> checks when you acquire the BeforeExit_lock.
>>
>> But that isn't relevant. The issue is we don't want a safepoint check
>> on the report_and_die() path. So a custom transition helper is needed
>> to avoid that.
>>
>> David
>>
>>> The main problem with the suggestion is it seems we may not be
>>> running in a JavaThread:
>>>
>>>   349   Thread* const thread = Thread::current();
>>>   350   if (thread->is_Watcher_thread()) {
>>>
>>> so we can't use the existing thread-state helpers, unless we narrow
>>> the scope (as you do) to after the check for the WatcherThread.
>>>
>>> David
>>> -----
>>>
>>>> Thus I added ThreadInVMForJFR to new my webrev.
>>>
>>> Your change still seems overly complicated.
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Thanks
>>>>>> Markus
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>> Sent: den 2 november 2019 16:57
>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>> Cc: David Holmes <david.holmes at oracle.com>; yasuenag at gmail.com;
>>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>>>>>> thread state
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>> So I updated webrev. Could you review it?
>>>>>>
>>>>>>     http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>
>>>>>> This change passed all tests on submit repo
>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>
>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>> But I'm unclear why the thread state is not recovered.
>>>>>>>
>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>> If it is just a bug in JFR, I will create a patch which recovers
>>>>>>> it in JFR code.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> -------- Forwarded Message --------
>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not recover thread
>>>>>>> state
>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Please review this change:
>>>>>>>
>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>     webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>
>>>>>>> If JFR is running when JVM crashes, JFR will dump data to
>>>>>>> hs_err_pid<PID>.jfr .
>>>>>>> It would perform in prepare_for_emergency_dump().
>>>>>>> However this function transits thread state to "_thread_in_vm".
>>>>>>>
>>>>>>> This change has been tested on submit repo as
>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>> It failed at compiler/types/correctness/CorrectnessTest.java
>>>>>>> However this test is for JIT compiler, and related issue has been
>>>>>>> reported as JDK-8225620.
>>>>>>> So I think this patch can go through.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa

From claes.redestad at oracle.com  Mon Nov  4 13:53:32 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Mon, 4 Nov 2019 14:53:32 +0100
Subject: RFR: 8233494: Avoid calling MallocTracker::record_malloc and
 record_free when NMT is off
Message-ID: <e014f422-c100-c22b-b50c-7525fd2c07e9@oracle.com>

Hi,

this patch removes some small but measurable NMT-related overheads
when NMT is disabled, by moving NMT_off checks out into MemTracker where
they can be more aggressively inlined.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8233494
Webrev: http://cr.openjdk.java.net/~redestad/8233494/open.00/

Motivation:

The overhead of calling MallocTracker methods accounts for ~15-25% of
instructions retired by os::malloc/realloc/free. On a "Hello World" on
my laptop (no large pages) we already do roughly 9k os::malloc calls, so
this improvement means a reduction in instructions retired by ~250k, or
~0.2% of the total. There is no discernible difference in behavior or
overhead for the case when NMT is enabled.
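
The shape of the change can be sketched as follows. This is a 
self-contained illustration with hypothetical names (Tracker, 
TrackingLevel, record_malloc_slow, tracked_malloc), not the MemTracker 
or MallocTracker code from the webrev: the cheap level check lives in 
an inline wrapper, so the out-of-line bookkeeping call is skipped 
entirely when tracking is off.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical names throughout; this is not the HotSpot NMT API.
enum class TrackingLevel { off, summary, detail };

struct Tracker {
  static TrackingLevel level;
  static std::size_t recorded_calls;

  // Stands in for the comparatively expensive out-of-line bookkeeping.
  static void record_malloc_slow(void* p, std::size_t size);

  // Cheap inline check: with tracking off, the slow path is never called.
  static inline void record_malloc(void* p, std::size_t size) {
    if (level == TrackingLevel::off) return;
    record_malloc_slow(p, size);
  }
};

TrackingLevel Tracker::level = TrackingLevel::off;
std::size_t Tracker::recorded_calls = 0;

void Tracker::record_malloc_slow(void*, std::size_t) { ++recorded_calls; }

// Allocation wrapper that reports to the tracker through the inline check.
inline void* tracked_malloc(std::size_t size) {
  void* p = std::malloc(size);
  Tracker::record_malloc(p, size);
  return p;
}
```

Whether a compiler actually inlines the check depends on where the 
wrapper is defined, which is why this sketch keeps it in the class body.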

Testing: tier1-2

Thanks!

/Claes

From martin.doerr at sap.com  Mon Nov  4 14:14:05 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 4 Nov 2019 14:14:05 +0000
Subject: RFR: 8233494: Avoid calling MallocTracker::record_malloc and
 record_free when NMT is off
In-Reply-To: <e014f422-c100-c22b-b50c-7525fd2c07e9@oracle.com>
References: <e014f422-c100-c22b-b50c-7525fd2c07e9@oracle.com>
Message-ID: <VI1PR0201MB2479CC1C7B488257ACC5193C9A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Claes,

this makes sense. Change looks good to me.

Best regards,
Martin


> -----Original Message-----
> From: hotspot-runtime-dev <hotspot-runtime-dev-
> bounces at openjdk.java.net> On Behalf Of Claes Redestad
> Sent: Montag, 4. November 2019 14:54
> To: Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>
> Subject: RFR: 8233494: Avoid calling MallocTracker::record_malloc and
> record_free when NMT is off
> 
> Hi,
> 
> this patch removes some small but measurable NMT-related overheads
> when NMT is disabled, by moving NMT_off checks out into MemTracker
> where they can be more aggressively inlined.
> 
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8233494
> Webrev: http://cr.openjdk.java.net/~redestad/8233494/open.00/
> 
> Motivation:
> 
> The overhead of calling MallocTracker methods accounts for ~15-25% of
> instructions retired by os::malloc/realloc/free. On a "Hello World" on
> my laptop (no large pages) we already do roughly 9k os::malloc calls, so
> this improvement means a reduction in instructions retired by ~250k, or
> ~0.2% of the total. There is no discernible difference in behavior or
> overhead for the case when NMT is enabled.
> 
> Testing: tier1-2
> 
> Thanks!
> 
> /Claes

From zgu at redhat.com  Mon Nov  4 14:18:02 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 4 Nov 2019 09:18:02 -0500
Subject: RFR: 8233494: Avoid calling MallocTracker::record_malloc and
 record_free when NMT is off
In-Reply-To: <VI1PR0201MB2479CC1C7B488257ACC5193C9A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <e014f422-c100-c22b-b50c-7525fd2c07e9@oracle.com>
 <VI1PR0201MB2479CC1C7B488257ACC5193C9A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <531a0bdb-d4ca-ff5b-11a9-4aa18dd735c7@redhat.com>

Looks good to me too.

Thanks,

-Zhengyu

On 11/4/19 9:14 AM, Doerr, Martin wrote:
> Hi Claes,
> 
> this makes sense. Change looks good to me.
> 
> Best regards,
> Martin
> 
> 
>> -----Original Message-----
>> From: hotspot-runtime-dev <hotspot-runtime-dev-
>> bounces at openjdk.java.net> On Behalf Of Claes Redestad
>> Sent: Montag, 4. November 2019 14:54
>> To: Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>
>> Subject: RFR: 8233494: Avoid calling MallocTracker::record_malloc and
>> record_free when NMT is off
>>
>> Hi,
>>
>> this patch removes some small but measurable NMT-related overheads
>> when NMT is disabled, by moving NMT_off checks out into MemTracker
>> where they can be more aggressively inlined.
>>
>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8233494
>> Webrev: http://cr.openjdk.java.net/~redestad/8233494/open.00/
>>
>> Motivation:
>>
>> The overhead of calling MallocTracker methods accounts for ~15-25% of
>> instructions retired by os::malloc/realloc/free. On a "Hello World" on
>> my laptop (no large pages) we already do roughly 9k os::malloc calls, so
>> this improvement means a reduction in instructions retired by ~250k, or
>> ~0.2% of the total. There is no discernible difference in behavior or
>> overhead for the case when NMT is enabled.
>>
>> Testing: tier1-2
>>
>> Thanks!
>>
>> /Claes


From claes.redestad at oracle.com  Mon Nov  4 14:38:29 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Mon, 4 Nov 2019 15:38:29 +0100
Subject: RFR: 8233494: Avoid calling MallocTracker::record_malloc and
 record_free when NMT is off
In-Reply-To: <531a0bdb-d4ca-ff5b-11a9-4aa18dd735c7@redhat.com>
References: <e014f422-c100-c22b-b50c-7525fd2c07e9@oracle.com>
 <VI1PR0201MB2479CC1C7B488257ACC5193C9A7F0@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <531a0bdb-d4ca-ff5b-11a9-4aa18dd735c7@redhat.com>
Message-ID: <dd45816f-2250-2ad2-6c11-cde3a4263e4b@oracle.com>

Martin, Zhengyu,

thank you for reviewing!

/Claes

On 2019-11-04 15:18, Zhengyu Gu wrote:
> Looks good to me too.
> 
> Thanks,
> 
> -Zhengyu
> 
> On 11/4/19 9:14 AM, Doerr, Martin wrote:
>> Hi Claes,
>>
>> this makes sense. Change looks good to me.
>>
>> Best regards,
>> Martin
>>
>>
>>> -----Original Message-----
>>> From: hotspot-runtime-dev <hotspot-runtime-dev-
>>> bounces at openjdk.java.net> On Behalf Of Claes Redestad
>>> Sent: Montag, 4. November 2019 14:54
>>> To: Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>
>>> Subject: RFR: 8233494: Avoid calling MallocTracker::record_malloc and
>>> record_free when NMT is off
>>>
>>> Hi,
>>>
>>> this patch removes some small but measurable NMT-related overheads
>>> when NMT is disabled, by moving NMT_off checks out into MemTracker
>>> where they can be more aggressively inlined.
>>>
>>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8233494
>>> Webrev: http://cr.openjdk.java.net/~redestad/8233494/open.00/
>>>
>>> Motivation:
>>>
>>> Overhead of calling MallocTracker methods accounts for ~15-25% of
>>> instructions retired by os::malloc/realloc/free. On a "Hello World" on
>>> my laptop (no large pages) we already do roughly 9k os::malloc calls, so
>>> this improvement means a reduction in instructions retired by ~250k, or
>>> ~0.2% of the total. There is no discernible difference in behavior or
>>> overhead for the case when NMT is enabled.
>>>
>>> Testing: tier1-2
>>>
>>> Thanks!
>>>
>>> /Claes
> 
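
The idea in the quoted RFR (test the NMT_off case in an inlinable wrapper
before calling into the tracker) can be sketched roughly as follows. This is
a minimal illustration only: TrackingLevel, SlowTracker, FastTracker and
slow_calls_for are hypothetical stand-ins, not the HotSpot declarations, and
the real patch may differ in detail.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins; not the HotSpot declarations.
enum TrackingLevel { kOff, kSummary };

struct SlowTracker {
  static std::size_t calls;
  // Imagine this defined in another translation unit, so that a call to it
  // is a real call with its own prologue and checks.
  static void* record_malloc(void* base, std::size_t size);
};

struct FastTracker {
  static TrackingLevel level;
  // Defined in the class body, so the level check can be inlined into callers.
  static inline void* record_malloc(void* base, std::size_t size) {
    if (level == kOff) {
      return base;  // tracking is off: skip the out-of-line call entirely
    }
    return SlowTracker::record_malloc(base, size);
  }
};

std::size_t SlowTracker::calls = 0;
TrackingLevel FastTracker::level = kOff;

void* SlowTracker::record_malloc(void* base, std::size_t /*size*/) {
  ++calls;  // stands in for the real bookkeeping
  return base;
}

// Returns how many times the slow path ran for one tracked allocation.
std::size_t slow_calls_for(TrackingLevel level) {
  SlowTracker::calls = 0;
  FastTracker::level = level;
  int dummy = 0;
  void* p = FastTracker::record_malloc(&dummy, sizeof dummy);
  assert(p == &dummy);
  return SlowTracker::calls;
}
```

With the level set to kOff the slow path is never entered; with kSummary it
runs once per call, which is the "no discernible difference when NMT is
enabled" half of the claim.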

From claes.redestad at oracle.com  Mon Nov  4 14:49:04 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Mon, 4 Nov 2019 15:49:04 +0100
Subject: RFR[T]: 8233495: Some fieldDescriptor methods can pass existing
 constantPoolHandle
Message-ID: <abc575f3-ada1-570e-7bfb-cbfc28c3058f@oracle.com>

Hi,

(trivial?) patch to avoid dereferencing and subsequent implicit
re-handleification of a constantPoolHandle in two places in
fieldDescriptor.inline.hpp. This has a minor impact on field resolution
overhead in various places.
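
A rough sketch of the pattern, for readers who have not looked at the webrev.
Everything here is hypothetical (Pool, PoolHandle, num_fields and the helper
names are made up for illustration); it only shows why passing the raw
pointer to a function that takes a handle costs an extra wrap.

```cpp
#include <cassert>

struct Pool { int field_count; };

// Hypothetical handle type; the converting constructor counts each wrap.
struct PoolHandle {
  static int wraps;
  Pool* raw;
  PoolHandle(Pool* p) : raw(p) { ++wraps; }  // implicit on purpose
  Pool* operator()() const { return raw; }   // "dereference" to the raw pointer
};
int PoolHandle::wraps = 0;

int num_fields(const PoolHandle& cp) { return cp()->field_count; }

// Dereferences the handle, then implicitly builds a new handle for the callee.
int count_via_raw(const PoolHandle& cp) { return num_fields(cp()); }

// Passes the existing handle straight through.
int count_via_handle(const PoolHandle& cp) { return num_fields(cp); }

// Returns how many extra handles were created while running fn once.
int extra_wraps(int (*fn)(const PoolHandle&)) {
  Pool pool = { 3 };
  PoolHandle cp(&pool);
  int before = PoolHandle::wraps;
  int n = fn(cp);
  assert(n == 3);
  return PoolHandle::wraps - before;
}
```

Both variants return the same answer; only count_via_raw creates a second
handle along the way.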

Bug:    https://bugs.openjdk.java.net/browse/JDK-8233495
Webrev: http://cr.openjdk.java.net/~redestad/8233495/open.00/

Testing: tier1

Thanks!

/Claes

From lois.foltan at oracle.com  Mon Nov  4 15:11:11 2019
From: lois.foltan at oracle.com (Lois Foltan)
Date: Mon, 4 Nov 2019 10:11:11 -0500
Subject: RFR[T]: 8233495: Some fieldDescriptor methods can pass existing
 constantPoolHandle
In-Reply-To: <abc575f3-ada1-570e-7bfb-cbfc28c3058f@oracle.com>
References: <abc575f3-ada1-570e-7bfb-cbfc28c3058f@oracle.com>
Message-ID: <8ed60bf3-cf51-2a61-504f-65c4a36b7e08@oracle.com>

Looks good Claes.  However, because the change is subtle, probably not
trivial.
Lois

On 11/4/2019 9:49 AM, Claes Redestad wrote:
> Hi,
>
> (trivial?) patch to avoid dereferencing and subsequent implicit
> re-handleification of a constantPoolHandle in two places in
> fieldDescriptor.inline.hpp. This has a minor impact on field resolution
> overhead in various places.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8233495
> Webrev: http://cr.openjdk.java.net/~redestad/8233495/open.00/
>
> Testing: tier1
>
> Thanks!
>
> /Claes


From claes.redestad at oracle.com  Mon Nov  4 15:19:41 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Mon, 4 Nov 2019 16:19:41 +0100
Subject: RFR[T]: 8233495: Some fieldDescriptor methods can pass existing
 constantPoolHandle
In-Reply-To: <8ed60bf3-cf51-2a61-504f-65c4a36b7e08@oracle.com>
References: <abc575f3-ada1-570e-7bfb-cbfc28c3058f@oracle.com>
 <8ed60bf3-cf51-2a61-504f-65c4a36b7e08@oracle.com>
Message-ID: <a3797ccc-8175-0d0c-25d0-118190e619b7@oracle.com>


On 2019-11-04 16:11, Lois Foltan wrote:
> Looks good Claes.  However, because the change is subtle, probably not
> trivial.

Thanks, Lois!

Fair enough that the subtlety of this probably makes it non-trivial.

/Claes

From coleen.phillimore at oracle.com  Mon Nov  4 18:51:09 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Mon, 4 Nov 2019 13:51:09 -0500
Subject: RFR[T]: 8233495: Some fieldDescriptor methods can pass existing
 constantPoolHandle
In-Reply-To: <a3797ccc-8175-0d0c-25d0-118190e619b7@oracle.com>
References: <abc575f3-ada1-570e-7bfb-cbfc28c3058f@oracle.com>
 <8ed60bf3-cf51-2a61-504f-65c4a36b7e08@oracle.com>
 <a3797ccc-8175-0d0c-25d0-118190e619b7@oracle.com>
Message-ID: <4ed53905-54d2-3dec-2d26-8f0ae599d736@oracle.com>


Looks good to me also.  I think it is trivial enough.  It would fail
immediately if not.
thanks,
Coleen

On 11/4/19 10:19 AM, Claes Redestad wrote:
>
>
> On 2019-11-04 16:11, Lois Foltan wrote:
>> Looks good Claes.  However, because the change is subtle, probably
>> not trivial.
>
> Thanks, Lois!
>
> Fair enough that the subtlety of this probably makes it non-trivial.
>
> /Claes


From thomas.stuefe at gmail.com  Mon Nov  4 19:17:10 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 4 Nov 2019 20:17:10 +0100
Subject: RFR: 8233359: Add global sized operator delete definitions
In-Reply-To: <0B2FC930-99B3-43C7-A20B-B394DAB63D02@oracle.com>
References: <4954BF57-8C96-41F9-A2D2-1E78D1505037@oracle.com>
 <9d617173-f53a-ea9e-33c4-cb7127c71530@oracle.com>
 <0B2FC930-99B3-43C7-A20B-B394DAB63D02@oracle.com>
Message-ID: <CAA-vtUyN0zuKcy3tzxjabQYR0pOagxvvWE+2q9j=P5Y6ooSJfA@mail.gmail.com>

Hi,

I now get:

/shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/memory/operator_new.cpp:92:6:
error: 'void operator delete(void*, size_t)' is a usual (non-placement)
deallocation function in C++14 (or with -fsized-deallocation)
[-Werror=c++14-compat]
 void operator delete(void* p, size_t size) throw() {
      ^
/shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/memory/operator_new.cpp:96:6:
error: 'void operator delete [](void*, size_t)' is a usual (non-placement)
deallocation function in C++14 (or with -fsized-deallocation)
[-Werror=c++14-compat]
 void operator delete [](void* p, size_t size) throw() {
      ^

when building on Ubuntu 16.04 using gcc 5.4.0.

This used to build without problems on Ubuntu 16.04 with the stock compiler.
This is really a pain :(

..Thomas


On Fri, Nov 1, 2019 at 6:32 AM Kim Barrett <kim.barrett at oracle.com> wrote:

> > On Nov 1, 2019, at 1:07 AM, David Holmes <david.holmes at oracle.com>
> wrote:
> >
> > Hi Kim,
> >
> > That looks fine and trivial IMO.
>
> Thanks.
>
> >
> > Thanks,
> > David
> >
> > On 1/11/2019 11:15 am, Kim Barrett wrote:
> >> Please review this addition of replacement implementations for the
> >> global sized deallocation functions that were added by C++14.
> >> Since Visual Studio 2017 or later always provides C++14 or later, we
> >> should be including these when using those compiler versions.
> >> We also need these definitions when doing experimental C++14 builds
> >> with gcc (in preparation for JEP 347), to avoid -Wsized-deallocation
> >> warnings (enabled by the recent addition of -Wextra).
> >> Rather than trying to determine whether the definitions are needed or
> >> not, we add them unconditionally. It's harmless to provide such
> >> definitions in non-product builds for pre-C++14 compilers; they just
> >> won't ever be called.
> >> CR:
> >> https://bugs.openjdk.java.net/browse/JDK-8233359
> >> Webrev:
> >> https://cr.openjdk.java.net/~kbarrett/8233359/open.00/
> >> Testing:
> >> mach5 tier1
>
>
>

From kim.barrett at oracle.com  Mon Nov  4 19:34:18 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Mon, 4 Nov 2019 14:34:18 -0500
Subject: RFR: 8233359: Add global sized operator delete definitions
In-Reply-To: <CAA-vtUyN0zuKcy3tzxjabQYR0pOagxvvWE+2q9j=P5Y6ooSJfA@mail.gmail.com>
References: <4954BF57-8C96-41F9-A2D2-1E78D1505037@oracle.com>
 <9d617173-f53a-ea9e-33c4-cb7127c71530@oracle.com>
 <0B2FC930-99B3-43C7-A20B-B394DAB63D02@oracle.com>
 <CAA-vtUyN0zuKcy3tzxjabQYR0pOagxvvWE+2q9j=P5Y6ooSJfA@mail.gmail.com>
Message-ID: <E01863DC-CB9F-42F4-8A2F-E5A5DBD04ABE@oracle.com>

> On Nov 4, 2019, at 2:17 PM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> 
> Hi,
> 
> I now get:
> 
> /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/memory/operator_new.cpp:92:6: error: 'void operator delete(void*, size_t)' is a usual (non-placement) deallocation function in C++14 (or with -fsized-deallocation) [-Werror=c++14-compat]
>  void operator delete(void* p, size_t size) throw() {
>       ^
> /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/memory/operator_new.cpp:96:6: error: 'void operator delete [](void*, size_t)' is a usual (non-placement) deallocation function in C++14 (or with -fsized-deallocation) [-Werror=c++14-compat]
>  void operator delete [](void* p, size_t size) throw() {
>       ^

Where is -Wc++14-compat being turned on?  Hm, that's in -Wall?  That's really rude!
So why doesn't a more recent gcc version (we're using 8.3 now) complain too?

> when building on Ubuntu 16.04 using gcc 5.4.0.
> 
> This used to build without problems on Ubuntu 16.04 with the stock compiler. This is really a pain :(

Agreed.

I guess we can add "-Wno-c++11-compat -Wno-c++14-compat -Wno-c++17-compat" for now.


From kim.barrett at oracle.com  Mon Nov  4 20:01:08 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Mon, 4 Nov 2019 15:01:08 -0500
Subject: RFR: 8233359: Add global sized operator delete definitions
In-Reply-To: <E01863DC-CB9F-42F4-8A2F-E5A5DBD04ABE@oracle.com>
References: <4954BF57-8C96-41F9-A2D2-1E78D1505037@oracle.com>
 <9d617173-f53a-ea9e-33c4-cb7127c71530@oracle.com>
 <0B2FC930-99B3-43C7-A20B-B394DAB63D02@oracle.com>
 <CAA-vtUyN0zuKcy3tzxjabQYR0pOagxvvWE+2q9j=P5Y6ooSJfA@mail.gmail.com>
 <E01863DC-CB9F-42F4-8A2F-E5A5DBD04ABE@oracle.com>
Message-ID: <CF79DE99-DE3F-4796-88DD-DBD2F3E76B46@oracle.com>

> On Nov 4, 2019, at 2:34 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>> when building on Ubuntu 16.04 using gcc 5.4.0.
>> 
>> This used to build without problems on Ubuntu 16.04 with the stock compiler. This is really a pain :(
> 
> Agreed.
> 
> I guess we can add "-Wno-c++11-compat -Wno-c++14-compat -Wno-c++17-compat" for now.

https://bugs.openjdk.java.net/browse/JDK-8233530

An alternative would be to suppress -Wc++14-compat around those two definitions.  The compat
options have probably been somewhat helpful, with -Wnarrowing and such.
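
For illustration, the local-suppression alternative might look roughly like
the sketch below. It is not the code in operator_new.cpp: the function bodies
here just forward to malloc/free, the counter and the helper
sized_calls_after_direct_call are hypothetical, only the scalar forms are
shown, and it assumes the compiler honors the diagnostic pragma for this
particular warning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

static int sized_delete_calls = 0;  // hypothetical counter, for the demo only

// Replacement allocation function, paired with the deallocators below.
void* operator new(std::size_t size) {
  if (void* p = std::malloc(size ? size : 1)) return p;
  throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }

// Silence -Wc++14-compat only around the sized definition, instead of
// disabling it for the whole build.
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wc++14-compat"
#endif

void operator delete(void* p, std::size_t size) noexcept {
  (void)size;            // the size is not needed when forwarding to free()
  ++sized_delete_calls;
  std::free(p);
}

#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif

// Calls the sized form explicitly, so the result does not depend on whether
// the compiler emits sized deallocation for delete-expressions.
int sized_calls_after_direct_call() {
  int before = sized_delete_calls;
  void* p = ::operator new(16);
  ::operator delete(p, static_cast<std::size_t>(16));
  assert(p != nullptr);
  return sized_delete_calls - before;
}
```

The trade-off against the command-line flags is scope: the pragma keeps the
compat warnings active everywhere else in the file and the build.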


From daniel.daugherty at oracle.com  Mon Nov  4 21:03:09 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Mon, 4 Nov 2019 16:03:09 -0500
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
Message-ID: <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>

Greetings,

I have made changes to the Async Monitor Deflation code in response to
the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., Robbin
and Erik O. for their comments!

JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async Monitor
Deflation project needs to push before Nov. 12, 2019 in order to allow
for sufficient bake time for such a big change. Nov. 12 is _next_ Tuesday
so we have 8 days from today to finish this code review cycle and push
this code for JDK14.

Carsten and Roman! Time for you guys to chime in again on the code reviews.

I have attached the change list from CR7 to CR8 instead of putting it in
the body of this email. I've also added a link to the CR7-to-CR8-changes
file to the webrevs so it should be easy to find.

Main bug URL:

    JDK-8153224 Monitor deflation prolong safepoints
    https://bugs.openjdk.java.net/browse/JDK-8153224

The project is currently baselined on jdk-14+21.

Here's the full webrev URL for those folks that want to see all of the
current Async Monitor Deflation code in one go (v2.08 full):

http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full

Some folks might want to see just what has changed since the last review
cycle so here's a webrev for that (v2.08 inc):

http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/

The OpenJDK wiki did not need any changes for this round:

https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation

The jdk-14+21 based v2.08 version of the patch has been thru Mach5 tier[1-8]
testing on Oracle's usual set of platforms. It has also been through my usual
set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
my lab. Some testing is still running, but so far there are no new regressions.

I have not yet done a SPECjbb2015 round on the CR8/v2.08/11-for-jdk14 bits.

Thanks, in advance, for any questions, comments or suggestions.

Dan


On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
> Greetings,
>
> The Async Monitor Deflation project is reaching the end game. I have no
> changes planned for the project at this time so all that is left is code
> review and any changes that result from those reviews.
>
> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>
> I have attached the list of fixes from CR6 to CR7 instead of putting it
> in the main body of this email.
>
> Main bug URL:
>
>     JDK-8153224 Monitor deflation prolong safepoints
>     https://bugs.openjdk.java.net/browse/JDK-8153224
>
> The project is currently baselined on jdk-14+19.
>
> Here's the full webrev URL for those folks that want to see all of the
> current Async Monitor Deflation code in one go (v2.07 full):
>
> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full
>
> Some folks might want to see just what has changed since the last review
> cycle so here's a webrev for that (v2.07 inc):
>
> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/
>
> The OpenJDK wiki has been updated to match the CR7/v2.07/10-for-jdk14 changes:
>
> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>
> The jdk-14+18 based v2.07 version of the patch has been thru Mach5 tier[1-8]
> testing on Oracle's usual set of platforms. It has also been through my usual
> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
> my lab.
>
> The jdk-14+19 based v2.07 version of the patch has been thru Mach5 tier[1-3]
> testing on Oracle's usual set of platforms. Mach5 tier[4-8] are in process.
>
> I did another round of SPECjbb2015 testing in Oracle's Aurora Performance lab
> using their tuned SPECjbb2015 Linux-X64 G1 configs:
>
>     - "base" is jdk-14+18
>     - "v2.07" is the latest version and includes C2 inc_om_ref_count() support
>       on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors option
>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors specified
>
>          hbIR           hbIR
>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>     ---------------  ---------  --------  -------------  -------
>            34282.00   30635.90  28831.30       20969.20  3841.30 base
>            34282.00   30973.00  29345.80       21025.20  3964.10 v2.07
>            34282.00   31105.60  29174.30       21074.00  3931.30 v2.07_handshake
>            34282.00   30789.70  27151.60       19839.10  3850.20 v2.07_off
>
>     - The Aurora Perf comparison tool reports:
>
>         Comparison              max-jOPS              critical-jOPS
>         ----------------------  --------------------  --------------------
>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>
>         (s) - significant  (ns) - not-significant
>
>     - For historical comparison, the Aurora Perf comparison tool
>         reported for v2.06 with a baseline of jdk-13+31:
>
>         Comparison              max-jOPS              critical-jOPS
>         ----------------------  --------------------  --------------------
>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>
>         (s) - significant  (ns) - not-significant
>
> Thanks, in advance, for any questions, comments or suggestions.
>
> Dan
>
>
> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> The Async Monitor Deflation project has rebased to JDK14 so it's time
>> for our first code review in that new context!!
>>
>> I've been focused on changing the monitor list management code to be
>> lock-free in order to make SPECjbb2015 happier. Of course with a change
>> like that, it takes a while to chase down all the new and wonderful
>> races. At this point, I have the code back to the same stability that
>> I had with CR5/v2.05/8-for-jdk13.
>>
>> To lay the ground work for this round of review, I pushed the following
>> two fixes to jdk/jdk earlier today:
>>
>>     JDK-8230184 rename, whitespace, indent and comments changes in
>>                 preparation for lock free Monitor lists
>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>
>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails after 8230184
>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>
>> I have attached the list of fixes from CR5 to CR6 instead of putting it
>> in the main body of this email.
>>
>> Main bug URL:
>>
>>     JDK-8153224 Monitor deflation prolong safepoints
>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>
>> The project is currently baselined on jdk-14+11 plus the fixes for
>> JDK-8230184 and JDK-8230317.
>>
>> Here's the full webrev URL for those folks that want to see all of the
>> current Async Monitor Deflation code in one go (v2.06 full):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/ 
>>
>>
>>
>> The primary focus of this review cycle is on the lock-free Monitor List
>> management changes so here's a webrev for just that patch (v2.06c):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/ 
>>
>>
>> The secondary focus of this review cycle is on the bug fixes that have
>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for just that
>> patch (v2.06b):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/ 
>>
>>
>> The third and final bucket for this review cycle is the rename, whitespace,
>> indent and comments changes made in preparation for lock free Monitor list
>> management. Almost all of that was extracted into JDK-8230184 for the
>> baseline so this bucket now has just a few comment changes relative to
>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/ 
>>
>>
>>
>> Some folks might want to see just what has changed since the last review
>> cycle so here's a webrev for that (v2.06 inc):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/
>>
>>
>> Last, but not least, some folks might want to see the code before the
>> addition of lock-free Monitor List management so here's a webrev for
>> that (v2.00 -> v2.05):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/
>>
>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>
>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>
>> but that should only be changes to describe per-thread list async monitor
>> deflation being done by the ServiceThread.
>>
>> (I did update the OpenJDK wiki for the CR5 changes back on 2019.08.14)
>>
>> This version of the patch has been thru Mach5 tier[1-8] testing on
>> Oracle's usual set of platforms. It has also been through my usual set
>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>
>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora Performance lab
>> using their tuned SPECjbb2015 Linux-X64 G1 configs. This was using
>> this patch baselined on jdk-13+31 (for stability):
>>
>>           hbIR           hbIR
>>      (max attempted)  (settled)  max-jOPS  critical-jOPS runtime
>>      ---------------  ---------  --------  ------------- -------
>>             34282.00   28837.20  27905.20       19817.40 3658.10 base
>>             34965.70   29798.80  27814.90       19959.00 3514.60 v2.06d
>>             34282.00   29100.70  28042.50       19577.00 3701.90 v2.06d_off
>>             34282.00   29218.50  27562.80       19397.30 3657.60 v2.06d_ocache
>>             34965.70   29838.30  26512.40       19170.60 3569.90 v2.05
>>             34282.00   28926.10  27734.00       19835.10 3588.40 v2.05_off
>>
>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors specified and
>> the "ocache" config is with 128 byte cache line sizes instead of 64 byte
>> cache line sizes. "v2.06d" is the last set of changes that I made before
>> those changes were distributed into the "v2.06a", "v2.06b" and "v2.06c"
>> buckets for this review cycle.
>>
>>
>> Thanks, in advance, for any questions, comments or suggestions.
>>
>> Dan
>>
>>
>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I've been focused on chasing down and fixing the test failures
>>> that only pop up rarely. So this round is primarily fixes for races
>>> with a few additional fixes that came from Karen's review of CR4.
>>> Thanks Karen!
>>>
>>> I have attached the list of fixes from CR4 to CR5 instead of putting it
>>> in the main body of this email.
>>>
>>> Main bug URL:
>>>
>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>
>>> The project is currently baselined on jdk-13+29. This will likely be
>>> the last JDK13 baseline for this project and I'll roll to the JDK14
>>> (jdk/jdk) repo soon...
>>>
>>> Here's the full webrev URL:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/
>>>
>>> Here's the incremental webrev URL:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>
>>> I have not yet checked the OpenJDK wiki to see if it needs any updates
>>> to match the CR5 changes:
>>>
>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>
>>> (I did update the OpenJDK wiki for the CR4 changes back on 2019.06.26)
>>>
>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>> on Linux-X64, macOSX and Solaris-X64 as those machines become available.
>>> Since I haven't made any performance changes in this round, I'll only
>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>
>>> Next up:
>>>
>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>   Linux-X64 and we've determined that some of that comes from
>>>   contention on the gListLock. So I'm going to investigate removing
>>>   the gListLock. Yes, another lock free set of changes is coming!
>>> - Of course, going lock free often causes new races and new failures
>>>   so that's a good reason for making those changes isolated in their
>>>   own round (and not holding up CR5/v2.05/8-for-jdk13 anymore).
>>> - I finally have a potential fix for the Win* failure with
>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>   but I haven't run it through Mach5 yet so it'll be in the next round.
>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>   monitor related failures there. I suspect that I need to go take a
>>>   look at the C2 RTM macro assembler code and look for things that might
>>>   conflict with Async Monitor Deflation. If you're interested in that kind
>>>   of issue, then see the macroAssembler_x86.cpp sanity check that I
>>>   added in this round!
>>>
>>> Thanks, in advance, for any questions, comments or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I have a fix for an issue that came up during performance testing.
>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>> experiments.
>>>>
>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>> verbose due to the complexity of the issue, but the changes
>>>> themselves are not that big.
>>>>
>>>> Functional:
>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>       monitors_used_above_threshold() for safepoint based
>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>       there is a special deflation request, e.g., System.gc()
>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>         safepoints that simply request async deflation which
>>>>         keeps the async JavaThreads from making progress on
>>>>         their async deflation work.
>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>       Async deflate idle monitors every so many milliseconds when
>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>     - is_async_deflation_needed() returns true when
>>>>       is_async_cleanup_requested() is true or when
>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>       AsyncDeflationInterval).
>>>>     - if AsyncDeflateIdleMonitors Service_lock->wait() now waits for
>>>>       at most GuaranteedSafepointInterval millis:
>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>         the same interval as GuaranteedSafepointInterval.
>>>>         (default is 1000 millis/1 second)
>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>         This is to prevent async deflation from swamping the
>>>>         ServiceThread.
>>>>   - The ServiceThread still handles async deflation of the global
>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>     of their in-use lists.
>>>>     - The ServiceThread will check for async deflation work every
>>>>       GuaranteedSafepointInterval.
>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>       async deflation work via is_async_deflation_requested.
>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>     flag to reduce the in-use monitor population that is reported by
>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
>>>>
>>>> Test update:
>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>     AsyncDeflateIdleMonitors.
>>>>
>>>> Collateral:
>>>>   - Add/clarify/update some logging messages.
>>>>
>>>> Cleanup:
>>>>   - Updated comments based on Karen's code review.
>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>     'async cleanup' -> 'async deflation'.
>>>>     - comment and function name changes
>>>>   - Clarify MonitorUsedDeflationThreshold description;
>>>>
>>>>
>>>> Main bug URL:
>>>>
>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>
>>>> The project is currently baselined on jdk-13+22.
>>>>
>>>> Here's the full webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>>>
>>>> Here's the incremental webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>>>
>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>
>>>> The wiki doesn't say a whole lot about the async deflation invocation
>>>> mechanism so I have to figure out how to add that content.
>>>>
>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>> running now. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have to run
>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I had some discussions with Karen about a race that was in the
>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>>>>> theoretical and I had no test failures due to it. The fix is pretty
>>>>> simple: remove the special case code for async deflation in the
>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>> for ObjectMonitor::enter() protection.
>>>>>
>>>>> During those discussions Karen also floated the idea of using the
>>>>> ref_count field instead of the contentions field for the Async
>>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>>> change and I have run it through the usual stress and Mach5 testing
>>>>> with no issues. It's also known as v2.03 (for those with the
>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>> Sorry for all the names...
>>>>>
>>>>> Main bug URL:
>>>>>
>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> The project is currently baselined on jdk-13+18.
>>>>>
>>>>> Here's the full webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>>>>
>>>>> Here's the incremental webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>>>
>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>> Solaris-X64 release had the usual "Too large time diff" complaints.
>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>> stress kit is running right now.
>>>>>
>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>>> the results and analyze them.
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I have a small but important bug fix for the Async Monitor Deflation
>>>>>> project ready to go. It's also known as v2.02 (for those with the
>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs). Sorry
>>>>>> for all the names...
>>>>>>
>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline patch
>>>>>> is out of our hair.
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>
>>>>>> Here's the full webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/
>>>>>>
>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>>>>
>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 changes:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>
>>>>>>
>>>>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is 
>>>>>> running
>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on product,
>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>>>>> I'll start my stress kit on Linux-X64 sometime on Sunday (after
>>>>>> my jdk-13+18 stress run is done).
>>>>>>
>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>> testing is done.
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I finally have CR1 for the Async Monitor Deflation project ready to
>>>>>>> go. It's also known as v2.01 (for those with the patches) 
>>>>>>> and as
>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>>>>> names...
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> Baseline bug fixes URL:
>>>>>>>
>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor 
>>>>>>> Deflation project
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>
>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>
>>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295 
>>>>>>>
>>>>>>>
>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/
>>>>>>>
>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/
>>>>>>>
>>>>>>> So I'm looking for reviews for both JDK-8222295 and the latest 
>>>>>>> version
>>>>>>> of JDK-8153224...
>>>>>>>
>>>>>>> I still have to update the OpenJDK wiki to reflect the CR changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>>>> Mach5 tier[78] will be run later today. My stress kit on 
>>>>>>> Solaris-X64
>>>>>>> is running now. Linux-X64 stress testing will start on Sunday. I'm
>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's 
>>>>>>>> work on:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's the webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>>
>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>
>>>>>>>> Earlier versions of this patch have been through several rounds of
>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>>> their own environments (including specJBB2015).
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run 
>>>>>>>> Kitchensink
>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, 
>>>>>>>> fastdebug
>>>>>>>> and slowdebug). Earlier versions have run my monitor inflation 
>>>>>>>> stress
>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>>> fastdebug and slowdebug).
>>>>>>>>
>>>>>>>> All of the testing done on earlier versions will be redone on the
>>>>>>>> latest version of the patch.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>> P.S.
>>>>>>>> One subtest in 
>>>>>>>> gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>>>>> to characterize/analyze this failure for more than a week now. At
>>>>>>>> this point I'm convinced that Async Monitor Deflation is 
>>>>>>>> aggravating
>>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

-------------- next part --------------
Functional:
  David H. discussion about JDK-8230249:
  - Use ref_counts to protect ObjectMonitor* cached in JavaThread
    current_pending_monitor and current_waiting_monitor fields
  - Update current_pending_monitor() and current_waiting_monitor()
    callers to use ObjectMonitorHandles and new set_om_ptr_if_safe()
  David H. 1 of 3 CR7 review:
  - Remove unnecessary load_acquire() and release_store() calls from
    the counter fields.
  Robbin CR7 review:
  - Add mark_next_for_traversal() for implementing safe traversal of
    lists by concurrent readers.
  - Update ObjectMonitor audit and checking functions to do safe
    traversal of the lists where needed.
  David H. 3 of 3 CR7 review:
  - owner_is_DEFLATER_MARKER() and ref_count no longer use load_acquire().
  - Rename three param set_owner_from() to simply_set_owner_from() and
    drop cmpxchg() so the set of the _owner field to new_value happens
    without any memory sync.
  - Add two param simply_set_owner_from() that sets the _owner field to
    new_value without any memory sync.
  - Rename set_owner_from_BasicLock() to simply_set_owner_from_BasicLock(),
    drop "assert(is_lock_owned(basic_lock_p))" since all the callers
    already check that.
  Erik O. bug fix:
  - The BasicLock needs to be initialized to markWord::unused_mark before
    C2 inc_om_ref_count() can take a branch to DONE_LABEL for the slow path.
  David H. CR7 follow-up:
  - mark_next() does not need load_acquire(&_next_om) since it is followed
    by cmpxchg(); regular load will do.
  - g_block_list is only changed by cmpxchg() so load_acquire(&g_block_list)
    is not needed; missed these from a previous David H. comment.
  Self-review:
  - Verify that all primary list head changes are made with mark_list_head()
    which uses cmpxchg(); list head loads can be done with regular loads
    instead of load_acquire() and all unmarking of list heads can be done
    with a regular store followed by an OrderAccess::storestore() instead
    of a release_store().

Test update:
  - no changes

Collateral:
  - Drop 'ObjectMonitorHandle(ObjectMonitor*)' ctr and use new
    set_om_ptr_if_safe() in monitors_iterate()
  - Drop 'ObjectMonitor::is_active()' and use '!ObjectMonitor::is_free()'
    directly.

Cleanup:
  - Add spaces around a few binary operators.
  - ObjectSynchronizer::finish_deflate_idle_monitors() should do nothing
    with async deflation unless a special deflation has been requested.
    - restore finish_deflate_idle_monitors() to mostly match the baseline
    - audit_and_print_stats() is now called from do_safepoint_work()
      after async deflation is requested.
    - deflate_idle_monitors_using_JT() now logs the global list counters
      at the Info level for async deflation.
  David H. 3 of 3 CR7 review:
  - Update assert()'s and guarantee()'s that refer to ref_count() to save
    a local copy of the value, do the check with the local copy and report
    any failure with the local copy and the current ref_count() value for
    comparison.
  - Update (monitorinflation, owner) log messages to provide more caller
    context.
  - Restore "assert(this->object() != NULL)" in ObjectMonitor::enter()
    to the baseline version.
  - Add a clarifying comment in is_busy_to_string().
  - Delete stale baseline comment in ObjectMonitor::exit().
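
The "save a local copy" item in the David H. 3 of 3 CR7 review entry above can be illustrated with a minimal sketch. It uses std::atomic rather than HotSpot's own APIs, and check_ref_count_positive() is a hypothetical helper, not a function from the patch. The point is that the check and the failure report both use one snapshot of the counter, while the current value is read again only for comparison in the message:

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: returns an empty string when the snapshot of the
// counter is positive, otherwise a message that shows both the snapshot
// that failed the check and the counter's current value.
inline std::string check_ref_count_positive(const std::atomic<int>& ref_count) {
  const int l_ref_count = ref_count.load();  // single snapshot used for the check
  if (l_ref_count > 0) {
    return std::string();
  }
  char buf[96];
  const int written = std::snprintf(buf, sizeof(buf),
                                    "l_ref_count=%d, ref_count=%d",
                                    l_ref_count, ref_count.load());
  // The message is short and fixed-format, so it must fit in the buffer.
  assert(written > 0 && written < static_cast<int>(sizeof(buf)));
  (void)written;  // avoid an unused-variable warning when NDEBUG is defined
  return std::string(buf);
}
```

Without the local copy, a racing update between the check and the report could make the failure message show a value that would have passed the check.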

Temporary:
  - no changes

From daniel.daugherty at oracle.com  Mon Nov  4 21:20:00 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Mon, 4 Nov 2019 16:20:00 -0500
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <a20e42b4-85f7-29de-4573-76cc477e39a0@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <58713599-b5ea-57bb-0413-fce3b8bd5d2f@oracle.com>
 <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
 <a20e42b4-85f7-29de-4573-76cc477e39a0@oracle.com>
Message-ID: <9961fc91-5136-b4a6-3e07-66c76ebc4e4d@oracle.com>

Hi David,

This set of comments is not addressed in the CR8/v2.08/11-for-jdk14
code review request that I just sent out.

I'll have to go through these comments and address them as part of the
CR8 resolution cycle.

Dan


On 11/4/19 1:28 AM, David Holmes wrote:
> Hi Dan,
>
> A few follow ups to your responses, with trimming ...
>
> On 30/10/2019 6:20 am, Daniel D. Daugherty wrote:
>> On 10/24/19 7:00 AM, David Holmes wrote:
>>>  122 // Set _owner field to new_value; current value must match 
>>> old_value.
>>>  123 inline void ObjectMonitor::set_owner_from(void* new_value, 
>>> void* old_value) {
>>>  124   void* prev = Atomic::cmpxchg(new_value, &_owner, old_value);
>>>  125   ADIM_guarantee(prev == old_value, "unexpected prev owner=" 
>>> INTPTR_FORMAT
>>>
>>> The use of cmpxchg seems a little strange here if you are asserting 
>>> that when this is called _owner must equal old_value. That means you 
>>> don't expect any race and if there is no race with another thread 
>>> writing to _owner then you don't need the cmpxchg. A normal:
>>>
>>> if (_owner == old_value) {
>>>    Atomic::store(&_owner, new_value);
>>>    log(...);
>>> } else {
>>>    guarantee(false, " unexpected old owner ...");
>>> }
>>
>> The two parameter version of set_owner_from() is only called from three
>> places and we'll cover two of them here:
>>
>> src/hotspot/share/runtime/objectMonitor.cpp:
>>
>> 1041     if (AsyncDeflateIdleMonitors) {
>> 1042       set_owner_from(NULL, Self);
>> 1043     } else {
>> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the lock
>> 1045       OrderAccess::storeload();                        // See if we need to wake a successor
>> 1046     }
>>
>> and:
>>
>> 1221   if (AsyncDeflateIdleMonitors) {
>> 1222     set_owner_from(NULL, Self);
>> 1223   } else {
>> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
>> 1225     OrderAccess::fence();                               // ST _owner vs LD in unpark()
>> 1226   }
>>
>> So I've replaced the existing {release_store(), storeload()} combo 
>> for one
>> call site and the existing {release_store(), fence()} combo for the 
>> other
>> call site with a cmpxchg(). I chose cmpxchg() for these reasons:
>>
>> 1) I wanted the same memory sync behavior at both call sites.
>> 2) I wanted similar/same memory sync behavior as the original
>>     code at those call sites.
>
> Why? The memory sync requirements for non-async deflation may be 
> completely different to those required for async-deflation (given all 
> the other bits of the protocol).
>
>> 3) I wanted the return value from cmpxchg() for my state machine
>>     sanity check.
>
> I'm somewhat dubious about using cmpxchg just for the side-effect of 
> getting the existing value.
>
>> I don't think that using 'Atomic::store(&_owner, new_value)' is the
>> right choice for these two call sites.
>
> If you don't actually need the cmpxchg to handle concurrent updates to 
> the _owner field, then a plain store (not an Atomic::store - that was 
> an error on my part) does not seem unreasonable; or if there are still 
> memory sync issues here, perhaps a release_store.
>
> If you use cmpxchg then anyone reading the code will assume there is a 
> concurrent update that you are guarding against.
>
>> The last two parameter set_owner_from() is talked about in the
>> next reply.
>>
>>
>>> Similarly for the old_value1/old_value2 version.
>>
>> The three parameter version of set_owner_from() is only called from one
>> place and the last two parameter version is called from the same place:
>>
>> src/hotspot/share/runtime/synchronizer.cpp:
>>
>> 1903       if (AsyncDeflateIdleMonitors) {
>> 1904         m->set_owner_from(mark.locker(), NULL, DEFLATER_MARKER);
>> 1905       } else {
>> 1906         m->set_owner_from(mark.locker(), NULL);
>> 1907       }
>>
>> The original code was:
>>
>> 1399       m->set_owner(mark.locker());
>>
>> The original set_owner() code was defined like this:
>>
>>    87 inline void ObjectMonitor::set_owner(void* owner) {
>>    88   _owner = owner;
>>    89 }
>>
>> So the original code didn't do any memory sync'ing at all and I've
>> changed that to a cmpxchg() on both code paths. That appears to be
>> overkill for that callsite...
>
> Again I'm not sure any memory sync requirements from the non-async 
> case should necessarily transfer over to the async case. Even if you 
> end up requiring similar memory sync the reasoning would be quite 
> different I would expect.
>
>>
>> We're in ObjectSynchronizer::inflate(), in the "CASE: stack-locked"
>> section of the code. We've gotten our ObjectMonitor from om_alloc()
>> and are initializing a number of fields in the ObjectMonitor. The
>> ObjectMonitor is not published until we do:
>>
>> 1916       object->release_set_mark(markWord::encode(m));
>>
>> So we don't need the memory sync'ing features of the cmpxchg() for
>> either of the set_owner_from() calls and all that leaves is the
>> state machine sanity check.
>>
>> I really like the state machine sanity check on the owner field but
>> that's just because it came in handy when chasing the recent races.
>> It would be easy to change the three parameter version of
>> set_owner_from() to not do memory sync'ing, but still do the state
>> machine sanity check.
>>
>> Update: Changing the three parameter version of set_owner_from()
>> may impact the changes to owner_is_DEFLATER_MARKER() discussed
>> above. Sigh...
>> Update 2: Probably no impact because the three parameter version of
>> set_owner_from() is only used before the ObjectMonitor is published
>> and owner_is_DEFLATER_MARKER() is used after the ObjectMonitor has
>> appeared on an in-use list.
>>
>> However, the two parameter version of set_owner_from() needs its
>> memory sync'ing behavior for its objectMonitor.cpp call sites so
>> this call site would need something different.
>>
>> I'm not sure which solution I'm going to pick yet, but I definitely
>> have to change something here since we don't need cmpxchg() at this
>> call site. More thought is required.
>
> I will look to see where this ended up.
>
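
The two alternatives discussed above can be sketched side by side. This is only an illustration written with std::atomic rather than HotSpot's Atomic/OrderAccess APIs; the Monitor struct and both function names are hypothetical and are not the patch's code. The first function uses a compare-and-exchange and reports whether the previous owner matched. The second keeps the same state-machine sanity check but uses a plain load plus a release store, which is only appropriate when no other thread can write the owner field concurrently (for example, before the monitor is published):

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the ObjectMonitor owner field.
struct Monitor {
  std::atomic<void*> owner{nullptr};
};

// CAS flavor: atomically change old_value -> new_value.
// Returns false if the previous owner was not old_value.
inline bool set_owner_from_cas(Monitor& m, void* new_value, void* old_value) {
  void* expected = old_value;
  return m.owner.compare_exchange_strong(expected, new_value);
}

// Check-then-store flavor: same sanity check, no atomic read-modify-write.
// Assumes the caller is the only possible writer of the owner field.
inline bool set_owner_from_checked(Monitor& m, void* new_value, void* old_value) {
  void* prev = m.owner.load(std::memory_order_relaxed);
  if (prev != old_value) {
    return false;  // a real implementation would report the unexpected owner
  }
  m.owner.store(new_value, std::memory_order_release);
  return true;
}
```

Which ordering the store needs (relaxed, release, or something stronger) depends on the rest of the deflation protocol, which is the open question in this part of the thread.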
>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>
>>>
>>>  267   if (AsyncDeflateIdleMonitors &&
>>>  268       try_set_owner_from(Self, DEFLATER_MARKER) == 
>>> DEFLATER_MARKER) {
>>
>> For more context, we are in:
>>
>>   241 void ObjectMonitor::enter(TRAPS) {
>>
>>
>>> I don't see why you need to call try_set_owner_from again here as 
>>> "cur" will already be DEFLATER_MARKER from the previous try_set_owner.
>>
>> I assume the previous try_set_owner() call you mean is this one:
>>
>>   248   void* cur = try_set_owner_from(Self, NULL);
>>
>> This first try_set_owner() is for the most common case of no owner.
>>
>> The second try_set_owner() call is for a different condition than the 
>> first:
>>
>>   268       try_set_owner_from(Self, DEFLATER_MARKER) == 
>> DEFLATER_MARKER) {
>>
>> L248 is trying to change the _owner field from NULL -> 'Self'.
>> L268 is trying to change the _owner field from DEFLATER_MARKER to 
>> 'Self'.
>>
>> If the try_set_owner() call on L248 fails, 'cur' can be several possible
>> values:
>>
>>    - the calling thread (recursive enter is handled on L254-7)
>>    - other owning thread value (BasicLock* or Thread*)
>>    - DEFLATER_MARKER
>
> I'll give a cautious okay to that explanation (the deficiency being in 
> my understanding, not your explaining :) ).
>
>>> Further, I don't see how installing self as the _owner here is valid 
>>> and means you acquired the monitor, as the fact it was 
>>> DEFLATER_MARKER means it is still being deflated by another thread 
>>> doesn't it ???
>>
>> I guess the comment after L268 didn't work for you:
>>
>>   269     // The deflation protocol finished the first part (setting owner),
>>   270     // but it failed the second part (making ref_count negative) and
>>   271     // bailed. Or the ObjectMonitor was async deflated and reused.
>>
>> It means that the deflater thread was racing with this enter and
>> managed to set the owner field to DEFLATER_MARKER as the first step
>> in the deflation protocol. Our entering thread actually won the race
>> when it managed to set the ref_count to a positive value as part of
>> the ObjectMonitorHandle stuff done in the inflate() call that preceded
>> the enter() call. However, the deflater thread hasn't realized that it
>> lost the race yet and hasn't restored the owner field back to NULL.
>
> You're right the comment didn't work for me as it required me to be 
> holding too much of the protocol in my head. Makes more sense now.
>
> Thanks,
> David
> -----
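
For readers who do not have the whole protocol in their heads either, here is a minimal sketch of the two compare-and-exchange attempts in the enter() path discussed above. It is written with std::atomic; the Monitor struct, deflater_marker(), EnterResult and try_enter() are hypothetical names for illustration only. The ref_count half of the deflation protocol is deliberately left out:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the fields of ObjectMonitor used here.
struct Monitor {
  std::atomic<void*> owner{nullptr};
  int recursions = 0;
};

// Hypothetical sentinel playing the role of DEFLATER_MARKER.
inline void* deflater_marker() {
  static char marker;
  return &marker;
}

// Returns the previous owner; it equals old_value when the swap succeeded.
inline void* try_set_owner_from(Monitor& m, void* new_value, void* old_value) {
  void* expected = old_value;
  m.owner.compare_exchange_strong(expected, new_value);
  return expected;  // updated to the observed owner on failure
}

enum class EnterResult { Acquired, Recursive, AcquiredFromDeflater, Contended };

inline EnterResult try_enter(Monitor& m, void* self) {
  // First attempt: the common case of no owner (NULL -> self).
  void* cur = try_set_owner_from(m, self, nullptr);
  if (cur == nullptr) return EnterResult::Acquired;
  if (cur == self) { ++m.recursions; return EnterResult::Recursive; }
  // Second attempt: a deflater set the marker but has not yet noticed
  // that it lost the race (DEFLATER_MARKER -> self).
  if (try_set_owner_from(m, self, deflater_marker()) == deflater_marker()) {
    return EnterResult::AcquiredFromDeflater;
  }
  return EnterResult::Contended;  // owned by another thread: slow path
}
```

The second attempt is a separate compare-and-exchange rather than a test of the value returned by the first one, because the owner field may have changed between the two steps.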


From daniel.daugherty at oracle.com  Mon Nov  4 21:25:20 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Mon, 4 Nov 2019 16:25:20 -0500
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <dbffc304-e84f-b1ec-b997-7978c1bced6f@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <58713599-b5ea-57bb-0413-fce3b8bd5d2f@oracle.com>
 <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
 <7388c7fc-39c4-1ec6-1608-02b08e562ab3@oracle.com>
 <dbffc304-e84f-b1ec-b997-7978c1bced6f@oracle.com>
Message-ID: <17b0ec76-6f32-fd7d-6486-4df21582ce03@oracle.com>

Hi David and Erik,

Thanks for chiming in here Erik...

This set of comments is not addressed in the CR8/v2.08/11-for-jdk14
code review request that I just sent out.

I've read this response twice and I'm not quite sure what to do with it
relative to David's CR comment. I'll repeat those here:

 >  199 // The decrement only needs to be MO_ACQ_REL since the reference
 >  200   // counter is volatile.
 >  201   Atomic::dec(&_ref_count);
 >
 > volatile is irrelevant with regards to memory ordering as it is a compiler
 > annotation. And you haven't specified any memory order value so the default
 > is conservative ie. implied full fence. (I see the same incorrect comment
 > is in threadSMR.cpp!)

Should I delete this comment? Or should it be changed? If changed, then
what text do you recommend here?


 >  208   // The increment needs to be MO_SEQ_CST so that the reference
 >  209   // counter update is seen as soon as possible in a race with the
 >  210   // async deflation protocol.
 >  211   Atomic::inc(&_ref_count);
 >
 > Ditto you haven't specified any ordering - and inc() and dec() will have
 > the same default.

Should I delete this comment? Or should it be changed? If changed, then
what text do you recommend here?

Dan


On 11/4/19 8:09 AM, erik.osterlund at oracle.com wrote:
> Hi,
>
> TL/DR: David is right; the commentary is weird and does not capture 
> what the real constraints are.
>
> As the comment implied before "8222034: Thread-SMR functions should be 
> updated to remove work around", the PPC port used to have incorrect 
> memory ordering, and the code guarded against that. inc/dec used to be 
> memory_order_relaxed and add/sub used to be memory_order_acq_rel on 
> PPC, despite the shared contract promising memory_order_conservative.
>
> The implication for the nested counter in the Thread SMR project was 
> that I wanted to use the inc/dec API but knew it was not gonna work as 
> expected on PPC because we really needed *at least* 
> memory_order_acq_rel when decrementing (and memory_order_conservative 
> when incrementing, which was simulated in a CAS loop... yuck), but 
> would find ourselves getting memory_order_relaxed. Rather than 
> treating it as a bug in the PPC atomics implementation, and having the 
> code be broken while we waited for a fix, I changed the use to sub 
> when decrementing (which gave me the required memory_order_acq_rel 
> ordering I needed), and the horrible CAS loop when incrementing, as a 
> workaround, and alerted Martin Doerr that this would need to be 
> sorted out in the PPC code. Since then, the PPC code did indeed get 
> cleaned up so that inc/dec stopped being relaxed-only and worked as 
> advertised.
>
> After that, the "8222034: Thread-SMR functions should be updated to 
> remove work around" change removed the workaround that was no longer 
> required from the code, and put back the desired inc/dec calls (which 
> now used an overly conservative memory_order_conservative ordering, 
> which is suboptimal, in particular for decrements, but importantly not 
> incorrect). Since the nested case would almost never run and is 
> possibly the coldest code path in the VM, I did not care to comment in 
> that review thread about optimizing it by explicitly passing in a 
> weaker ordering. However, I should have commented on the comment that 
> was changed, which does indeed look a bit confused. David is right 
> that the stuff about volatile has nothing to do with why this is 
> correct. The correctness required memory_order_acq_rel for decrements, 
> but the implementation provided more, which is fine.
>
> The actual reason why I wanted memory_order_conservative for 
> correctness when incrementing and memory_order_acq_rel when 
> decrementing, was to prevent accesses inside of the critical section 
> (in particular - reading Thread*s from the acquired ThreadsList), from 
> floating outside of the reference increment and decrement that marks 
> reading the list as safe to access without the underlying list blowing 
> up. In practice, it might have been possible to relax it a bit by 
> relying on side effects of other unrelated parts of the protocol to 
> have spurious fencing... but I did not want to get the protocol 
> tangled in that way because it would be difficult to reason about.
>
> Hope this explanation clears up that confusion.
>
> Thanks,
> /Erik
>
> On 11/2/19 2:15 PM, Daniel D. Daugherty wrote:
>> Erik,
>>
>> David H. made a comment during this review cycle that should interest 
>> you.
>>
>> The longer version of this comment came up in early reviews of the Async
>> Monitor Deflation code because I copied the code and the longer comment
>> from threadSMR.cpp. I updated the comment based on your input and review
>> and changed the comment and code in threadSMR.cpp and in the Async 
>> Monitor
>> Deflation project code.
>>
>> The change in threadSMR.cpp was done with this changeset:
>>
>> $ hg log -v -r 54517
>> changeset:   54517:c201ca660afd
>> user:        dcubed
>> date:        Thu Apr 11 14:14:30 2019 -0400
>> files:       src/hotspot/share/runtime/threadSMR.cpp
>> description:
>> 8222034: Thread-SMR functions should be updated to remove work around
>> Reviewed-by: mdoerr, eosterlund
>>
>> Here's one of the two diffs to jog your memory:
>>
>>  void ThreadsList::dec_nested_handle_cnt() {
>> -  // The decrement needs to be MO_ACQ_REL. At the moment, the Atomic::dec
>> -  // backend on PPC does not yet conform to these requirements. Therefore
>> -  // the decrement is simulated with an Atomic::sub(1, &addr).
>> -  // Without this MO_ACQ_REL Atomic::dec simulation, the nested SMR mechanism
>> -  // is not generally safe to use.
>> -  Atomic::sub(1, &_nested_handle_cnt);
>> +  // The decrement only needs to be MO_ACQ_REL since the reference
>> +  // counter is volatile (and the hazard ptr is already NULL).
>> +  Atomic::dec(&_nested_handle_cnt);
>>  }
>>
>> Below is David's comment about the code comment...
>>
>> Dan
>>
>>
>> Trimming down to just that issue...
>>
>> On 10/29/19 4:20 PM, Daniel D. Daugherty wrote:
>>> On 10/24/19 7:00 AM, David Holmes wrote:
>> >
>> > src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>
>>>>  199 // The decrement only needs to be MO_ACQ_REL since the reference
>>>>  200   // counter is volatile.
>>>>  201   Atomic::dec(&_ref_count);
>>>>
>>>> volatile is irrelevant with regards to memory ordering as it is a 
>>>> compiler annotation. And you haven't specified any memory order 
>>>> value so the default is conservative ie. implied full fence. (I see 
>>>> the same incorrect comment is in threadSMR.cpp!)
>>>
>>> I got that wording from threadSMR.cpp and Erik O. confirmed my use 
>>> of that
>>> wording previously. I'll chase it down with Erik and get back to you.
>>>
>>>
>>>>  208   // The increment needs to be MO_SEQ_CST so that the reference
>>>>  209   // counter update is seen as soon as possible in a race with 
>>>> the
>>>>  210   // async deflation protocol.
>>>>  211   Atomic::inc(&_ref_count);
>>>>
>>>> Ditto you haven't specified any ordering - and inc() and dec() will 
>>>> have the same default.
>>>
>>> And again, I'll have to chase this down with Erik O. and get back to 
>>> you.
>>
>
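
Erik's point about what the counter ordering is actually for can be sketched as an RAII guard. This is an illustration under stated assumptions, not code from threadSMR.cpp or the Async Monitor Deflation patch: RefCounted and RefGuard are hypothetical names, and std::memory_order_seq_cst is used as the closest standard C++ analogue of HotSpot's memory_order_conservative, which is not identical to it:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object protected by a reference counter.
struct RefCounted {
  std::atomic<int> ref_count{0};
  int payload = 0;
};

// Hypothetical guard: the counter is held for the lifetime of the guard.
class RefGuard {
 public:
  explicit RefGuard(RefCounted& obj) : _obj(obj) {
    // Strong ordering on the increment so that reads of the protected
    // data cannot be reordered ahead of taking the reference.
    _obj.ref_count.fetch_add(1, std::memory_order_seq_cst);
  }
  ~RefGuard() {
    // At least acquire/release on the decrement so that accesses made
    // while the reference was held cannot sink below the release.
    _obj.ref_count.fetch_sub(1, std::memory_order_acq_rel);
  }
  RefGuard(const RefGuard&) = delete;
  RefGuard& operator=(const RefGuard&) = delete;

  int read() const { return _obj.payload; }

 private:
  RefCounted& _obj;
};
```

Spelling the ordering out at the call site documents the real constraint, which a comment about the counter being volatile does not.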


From m.sundar85 at gmail.com  Mon Nov  4 22:43:57 2019
From: m.sundar85 at gmail.com (Sundara Mohan M)
Date: Mon, 4 Nov 2019 14:43:57 -0800
Subject: JVM stuck/looping in futex call
In-Reply-To: <507d7b80-a93a-4e51-4842-8b329beab486@oracle.com>
References: <CACGCMVreB5xu=f1DRh+8KND+nvLVuKGzKTCS1Wv4Qi2nO4LTew@mail.gmail.com>
 <507d7b80-a93a-4e51-4842-8b329beab486@oracle.com>
Message-ID: <CACGCMVqRdOnjstXgFZ3AGd2Fxo8LBrTeHkC6r7EJZ311-M_a=w@mail.gmail.com>

Hi David,
    Did you mean to get a stack trace of that process? I can attach with gdb
but I'm not sure where to set a breakpoint.
More info on how to get this would be helpful.


Thanks
Sundar

On Fri, Nov 1, 2019 at 4:03 PM David Holmes <david.holmes at oracle.com> wrote:

> Hi Sundar,
>
> On 2/11/2019 5:39 am, Sundara Mohan M wrote:
> > Hi,
> >      I am running openjdk12/Linux on our systems and see jvm not
> responding
> > to jstack or any diagnostic command (jcmd VM.info/Thread.print). Though
> > application is running fine.
>
> That would sound like the attach thread (which would respond to the
> jstack or other diagnostic command) is in some kind of bad state.
>
> > I see the following stack trace
> >
> > Process 115586 attached
> > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
> > restarted if SA_RESTART is set)
> > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
> si_uid=1000}
> > ---
> > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
> > rt_sigreturn()                          = 202
> > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
> > restarted if SA_RESTART is set)
> > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
> si_uid=1000}
> > ---
> > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
> > rt_sigreturn()                          = 202
> > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
> > restarted if SA_RESTART is set)
> > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
> si_uid=1000}
> > ---
> > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
> > rt_sigreturn()                          = 202
> > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
> > restarted if SA_RESTART is set)
> > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
> si_uid=1000}
> > ---
> > rt_sigreturn()                          = 202
> > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be
> > restarted if SA_RESTART is set)
> > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
> si_uid=1000}
> > ---
> > rt_sigreturn()                          = 202
> > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL^CProcess 115586 detached
> >   <detached ...>
> >
> > Can someone help me understand what is happening here?
>
> It appears that in responding to the SIGQUIT that is used to trigger the
> starting of the attach listener thread, that something is going wrong.
> We appear to be continually restarting an operation that still sees the
> signal pending - which doesn't really make sense to me. Can you get a
> complete stack trace using gdb?
>
> > Please redirect me to proper list if this is not correct list for these
> > type of questions.
>
> This list is fine. It may end up being an issue for serviceability-dev
> but we can deal with that later. :)
>
> Thanks,
> David
> -----
>
> >
> > TIA
> > Sundar
> >
>

From david.holmes at oracle.com  Mon Nov  4 23:16:54 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 09:16:54 +1000
Subject: JVM stuck/looping in futex call
In-Reply-To: <CACGCMVqRdOnjstXgFZ3AGd2Fxo8LBrTeHkC6r7EJZ311-M_a=w@mail.gmail.com>
References: <CACGCMVreB5xu=f1DRh+8KND+nvLVuKGzKTCS1Wv4Qi2nO4LTew@mail.gmail.com>
 <507d7b80-a93a-4e51-4842-8b329beab486@oracle.com>
 <CACGCMVqRdOnjstXgFZ3AGd2Fxo8LBrTeHkC6r7EJZ311-M_a=w@mail.gmail.com>
Message-ID: <350a6e97-41ee-f49f-0354-ec655d6490da@oracle.com>

On 5/11/2019 8:43 am, Sundara Mohan M wrote:
> HI David,
>      Did you mean to get stack trace of that process? I could attach to 
> gdb but not sure where to keep breakpoint.
> More info on how to get this will be helpful.

I need to see the stack before we hit the looping call, to see what it 
is that triggers the loop. Can you tell what thread is involved?

Is there something special/different about your Linux environment? Do 
you have native threads attached to the VM?

Thanks,
David

> 
> Thanks
> Sundar
> 
> On Fri, Nov 1, 2019 at 4:03 PM David Holmes <david.holmes at oracle.com 
> <mailto:david.holmes at oracle.com>> wrote:
> 
>     Hi Sundar,
> 
>     On 2/11/2019 5:39 am, Sundara Mohan M wrote:
>      > Hi,
>      >      I am running openjdk12/Linux on our systems and see jvm not
>     responding
>      > to jstack or any diagnostic command (jcmd VM.info/Thread.print).
>     Though
>      > application is running fine.
> 
>     That would sound like the attach thread (which would respond to the
>     jstack or other diagnostic command) is in some kind of bad state.
> 
>      > I see following stack track
>      >
>      > Process 115586 attached
>      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS
>     (To be
>      > restarted if SA_RESTART is set)
>      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
>     si_uid=1000}
>      > ---
>      > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
>      > rt_sigreturn()                          = 202
>      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS
>     (To be
>      > restarted if SA_RESTART is set)
>      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
>     si_uid=1000}
>      > ---
>      > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
>      > rt_sigreturn()? ? ? ? ? ? ? ? ? ? ? ? ? = 202
>      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS
>     (To be
>      > restarted if SA_RESTART is set)
>      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
>     si_uid=1000}
>      > ---
>      > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
>      > rt_sigreturn()? ? ? ? ? ? ? ? ? ? ? ? ? = 202
>      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS
>     (To be
>      > restarted if SA_RESTART is set)
>      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
>     si_uid=1000}
>      > ---
>      > rt_sigreturn()? ? ? ? ? ? ? ? ? ? ? ? ? = 202
>      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS
>     (To be
>      > restarted if SA_RESTART is set)
>      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
>     si_uid=1000}
>      > ---
>      > rt_sigreturn()? ? ? ? ? ? ? ? ? ? ? ? ? = 202
>      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL^CProcess 115586
>     detached
>      >? ?<detached ...>
>      >
>      > Can someone help me understand what is happening here?
> 
>     It appears that something is going wrong in responding to the SIGQUIT
>     that is used to trigger the start of the attach listener thread.
>     We appear to be continually restarting an operation that still sees the
>     signal pending - which doesn't really make sense to me. Can you get a
>     complete stack trace using gdb?
> 
>      > Please redirect me to the proper list if this is not the correct list
>      > for these types of questions.
> 
>     This list is fine. It may end up being an issue for serviceability-dev
>     but we can deal with that later. :)
> 
>     Thanks,
>     David
>     -----
> 
>      >
>      > TIA
>      > Sundar
>      >
> 
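
To illustrate the restart behaviour visible in the trace above, here is a
minimal sketch in plain POSIX C++ (not HotSpot code). It shows the user-space
side of what strace reports as ERESTARTSYS: when a signal handler is installed
without SA_RESTART, a blocking call that is interrupted by the signal fails
with EINTR; with SA_RESTART the kernel would transparently re-issue the call,
which is what the repeated futex(FUTEX_WAIT) lines appear to show. The function
name blocking_read_sees_eintr, the use of SIGALRM and the 20 ms interval timer
are assumptions made for the illustration only.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstring>
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

// Counts handler invocations; sig_atomic_t is safe to write from a handler.
static volatile sig_atomic_t g_ticks = 0;
static void count_tick(int) { g_ticks = g_ticks + 1; }

// Blocks in read() on an empty pipe and reports whether the call was
// interrupted by SIGALRM (read() == -1 with errno == EINTR).
// Hypothetical helper for illustration only.
bool blocking_read_sees_eintr() {
  int fds[2];
  if (pipe(fds) != 0) return false;

  struct sigaction sa;
  struct sigaction old_sa;
  std::memset(&sa, 0, sizeof(sa));
  sa.sa_handler = count_tick;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;  // deliberately no SA_RESTART, so read() is not restarted
  sigaction(SIGALRM, &sa, &old_sa);

  // Repeating timer: even if the first tick arrives before read() blocks,
  // a later tick still interrupts it.
  struct itimerval timer;
  std::memset(&timer, 0, sizeof(timer));
  timer.it_value.tv_usec = 20000;
  timer.it_interval.tv_usec = 20000;
  setitimer(ITIMER_REAL, &timer, nullptr);

  char byte;
  errno = 0;
  ssize_t n = read(fds[0], &byte, 1);  // nothing is ever written to the pipe
  int saved_errno = errno;

  std::memset(&timer, 0, sizeof(timer));
  setitimer(ITIMER_REAL, &timer, nullptr);  // disarm the timer
  sigaction(SIGALRM, &old_sa, nullptr);     // restore the previous handler
  close(fds[0]);
  close(fds[1]);

  return n == -1 && saved_errno == EINTR && g_ticks > 0;
}
```

Setting sa_flags to SA_RESTART in the sketch would make read() block
indefinitely, because each interrupted call would be restarted by the kernel.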

From david.holmes at oracle.com  Mon Nov  4 23:26:44 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 09:26:44 +1000
Subject: RFR: 8233454: Test fails with assert(!is_init_completed(),
 "should only happen during init") after JDK-8229516
In-Reply-To: <e478a4a2-7105-8de1-d72e-0ce10d8d34ae@oracle.com>
References: <ac7a98cf-6730-bbd2-aa52-7bb972a37873@loongson.cn>
 <e478a4a2-7105-8de1-d72e-0ce10d8d34ae@oracle.com>
Message-ID: <ff418d45-a848-e28e-1716-a777fd9ed5a3@oracle.com>

Hi Jie,

Thanks for filing this and attempting a fix. As per the bug report the 
underlying issue has now been fixed in Shenandoah, but I want to make 
the interrupt code more resilient as well:

http://cr.openjdk.java.net/~dholmes/8233454/webrev/

I was unable to reproduce the Shenandoah crash so if you could test this 
patch I would appreciate it - thanks. (Without the Shenandoah fix of 
course :) )

Meanwhile I'm putting the patch through other testing.

Thanks,
David
-----

On 4/11/2019 11:13 pm, David Holmes wrote:
> Hi Jie,
> 
> I will need to take a deeper look at this. This is a problem specific to 
> Shenandoah GC as it is triggering a sleep whilst a thread is still in the 
> process of attaching to the JVM :(
> 
> Thanks,
> David
> 
> On 4/11/2019 7:16 pm, Jie Fu wrote:
>> Hi all,
>>
>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233454
>> Webrev: http://cr.openjdk.java.net/~jiefu/8233454/webrev.00/
>>
>> According to the comment [1], the assert seems to miss the case for 
>> threads attached via JNI.
>> For more info, please refer to the JBS.
>>
>> Could you please review it and give me some advice?
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>> [1] 
>> http://hg.openjdk.java.net/jdk/jdk/file/2700c409ff10/src/hotspot/share/runtime/thread.hpp#l1249 
>>
>>
>>

From david.holmes at oracle.com  Mon Nov  4 23:44:51 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 09:44:51 +1000
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <17b0ec76-6f32-fd7d-6486-4df21582ce03@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <58713599-b5ea-57bb-0413-fce3b8bd5d2f@oracle.com>
 <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
 <7388c7fc-39c4-1ec6-1608-02b08e562ab3@oracle.com>
 <dbffc304-e84f-b1ec-b997-7978c1bced6f@oracle.com>
 <17b0ec76-6f32-fd7d-6486-4df21582ce03@oracle.com>
Message-ID: <2c16521b-5694-7f6e-0f54-ee2bddf5563f@oracle.com>

Hi Dan,

Just delete the comments.

Thanks,
David

On 5/11/2019 7:25 am, Daniel D. Daugherty wrote:
> Hi David and Erik,
> 
> Thanks for chiming in here Erik...
> 
> This set of comments is not addressed in the CR8/v2.08/11-for-jdk14
> code review request that I just sent out.
> 
> I've read this response twice and I'm not quite sure what to do with it
> relative to David's CR comment. I'll repeat those here:
> 
>  >  199   // The decrement only needs to be MO_ACQ_REL since the reference
>  >  200   // counter is volatile.
>  >  201   Atomic::dec(&_ref_count);
>  >
>  > volatile is irrelevant with regards to memory ordering as it is a compiler
>  > annotation. And you haven't specified any memory order value so the default
>  > is conservative ie. implied full fence. (I see the same incorrect comment
>  > is in threadSMR.cpp!)
> 
> Should I delete this comment? Or should it be changed? If changed, then
> what text do you recommend here?
> 
> 
>  >  208   // The increment needs to be MO_SEQ_CST so that the reference
>  >  209   // counter update is seen as soon as possible in a race with the
>  >  210   // async deflation protocol.
>  >  211   Atomic::inc(&_ref_count);
>  >
>  > Ditto you haven't specified any ordering - and inc() and dec() will have
>  > the same default.
> 
> Should I delete this comment? Or should it be changed? If changed, then
> what text do you recommend here?
> 
> Dan
> 
> 
> On 11/4/19 8:09 AM, erik.osterlund at oracle.com wrote:
>> Hi,
>>
>> TL/DR: David is right; the commentary is weird and does not capture 
>> what the real constraints are.
>>
>> As the comment implied before "8222034: Thread-SMR functions should be 
>> updated to remove work around", the PPC port used to have incorrect 
>> memory ordering, and the code guarded against that. inc/dec used to be 
>> memory_order_relaxed and add/sub used to be memory_order_acq_rel on 
>> PPC, despite the shared contract promising memory_order_conservative.
>>
>> The implication for the nested counter in the Thread SMR project was 
>> that I wanted to use the inc/dec API but knew it was not gonna work as 
>> expected on PPC because we really needed *at least* 
>> memory_order_acq_rel when decrementing (and memory_order_conservative 
>> when incrementing, which was simulated in a CAS loop... yuck), but 
>> would find ourselves getting memory_order_relaxed. Rather than 
>> treating it as a bug in the PPC atomics implementation, and having the 
>> code be broken while we waited for a fix, I changed the use to sub 
>> when decrementing (which gave me the required memory_order_acq_rel 
>> ordering I needed), and the horrible CAS loop when incrementing, as a 
>> workaround, and alerted Martin Doerr that this would need to be 
>> sorted out in the PPC code. Since then, the PPC code did indeed get 
>> cleaned up so that inc/dec stopped being relaxed-only and worked as 
>> advertised.
>>
>> After that, the "8222034: Thread-SMR functions should be updated to 
>> remove work around" change removed the workaround that was no longer 
>> required from the code, and put back the desired inc/dec calls (which 
>> now used an overly conservative memory_order_conservative ordering, 
>> which is suboptimal, in particular for decrements, but importantly not 
>> incorrect). Since the nested case would almost never run and is 
>> possibly the coldest code path in the VM, I did not care to comment in 
>> that review thread about optimizing it by explicitly passing in a 
>> weaker ordering. However, I should have commented on the comment that 
>> was changed, which does indeed look a bit confused. David is right 
>> that the stuff about volatile has nothing to do with why this is 
>> correct. The correctness required memory_order_acq_rel for decrements, 
>> but the implementation provided more, which is fine.
>>
>> The actual reason why I wanted memory_order_conservative for 
>> correctness when incrementing and memory_order_acq_rel when 
>> decrementing, was to prevent accesses inside of the critical section 
>> (in particular - reading Thread*s from the acquired ThreadsList), from 
>> floating outside of the reference increment and decrement that marks 
>> reading the list as safe to access without the underlying list blowing 
>> up. In practice, it might have been possible to relax it a bit by 
>> relying on side effects of other unrelated parts of the protocol to 
>> have spurious fencing... but I did not want to get the protocol 
>> tangled in that way because it would be difficult to reason about.
>>
>> Hope this explanation clears up that confusion.
>>
>> Thanks,
>> /Erik
>>
>> On 11/2/19 2:15 PM, Daniel D. Daugherty wrote:
>>> Erik,
>>>
>>> David H. made a comment during this review cycle that should interest 
>>> you.
>>>
>>> The longer version of this comment came up in early reviews of the Async
>>> Monitor Deflation code because I copied the code and the longer comment
>>> from threadSMR.cpp. I updated the comment based on your input and review
>>> and changed the comment and code in threadSMR.cpp and in the Async 
>>> Monitor
>>> Deflation project code.
>>>
>>> The change in threadSMR.cpp was done with this changeset:
>>>
>>> $ hg log -v -r 54517
>>> changeset:   54517:c201ca660afd
>>> user:        dcubed
>>> date:        Thu Apr 11 14:14:30 2019 -0400
>>> files:       src/hotspot/share/runtime/threadSMR.cpp
>>> description:
>>> 8222034: Thread-SMR functions should be updated to remove work around
>>> Reviewed-by: mdoerr, eosterlund
>>>
>>> Here's one of the two diffs to jog your memory:
>>>
>>>  void ThreadsList::dec_nested_handle_cnt() {
>>> -  // The decrement needs to be MO_ACQ_REL. At the moment, the Atomic::dec
>>> -  // backend on PPC does not yet conform to these requirements. Therefore
>>> -  // the decrement is simulated with an Atomic::sub(1, &addr).
>>> -  // Without this MO_ACQ_REL Atomic::dec simulation, the nested SMR mechanism
>>> -  // is not generally safe to use.
>>> -  Atomic::sub(1, &_nested_handle_cnt);
>>> +  // The decrement only needs to be MO_ACQ_REL since the reference
>>> +  // counter is volatile (and the hazard ptr is already NULL).
>>> +  Atomic::dec(&_nested_handle_cnt);
>>>  }
>>>
>>> Below is David's comment about the code comment...
>>>
>>> Dan
>>>
>>>
>>> Trimming down to just that issue...
>>>
>>> On 10/29/19 4:20 PM, Daniel D. Daugherty wrote:
>>>> On 10/24/19 7:00 AM, David Holmes wrote:
>>> >
>>> > src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>
>>>>>  199   // The decrement only needs to be MO_ACQ_REL since the reference
>>>>>  200   // counter is volatile.
>>>>>  201   Atomic::dec(&_ref_count);
>>>>>
>>>>> volatile is irrelevant with regards to memory ordering as it is a 
>>>>> compiler annotation. And you haven't specified any memory order 
>>>>> value so the default is conservative ie. implied full fence. (I see 
>>>>> the same incorrect comment is in threadSMR.cpp!)
>>>>
>>>> I got that wording from threadSMR.cpp and Erik O. confirmed my use 
>>>> of that
>>>> wording previously. I'll chase it down with Erik and get back to you.
>>>>
>>>>
>>>>>  208   // The increment needs to be MO_SEQ_CST so that the reference
>>>>>  209   // counter update is seen as soon as possible in a race with the
>>>>>  210   // async deflation protocol.
>>>>>  211   Atomic::inc(&_ref_count);
>>>>>
>>>>> Ditto you haven't specified any ordering - and inc() and dec() will 
>>>>> have the same default.
>>>>
>>>> And again, I'll have to chase this down with Erik O. and get back to 
>>>> you.
>>>
>>
> 
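
The ordering requirements Erik describes can be sketched with standard C++
atomics. This is only an analogy under stated assumptions: HotSpot's Atomic
API and its memory_order_conservative are not the same thing as std::atomic
and std::memory_order_seq_cst (conservative is the stronger, fully fenced
notion), and the names RefCounted, inc_ref, dec_ref and read_guarded are
hypothetical. The sketch shows the two points made in the review: a volatile
qualifier plays no role in the ordering, and the ordering is whatever is
passed (or defaulted) at the call site.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an object protected by a reference count.
class RefCounted {
 public:
  // Increment with the strongest standard ordering, so that accesses in the
  // guarded region are not ordered before the count becomes visible.
  void inc_ref() { _ref_count.fetch_add(1, std::memory_order_seq_cst); }

  // Decrement with acquire-release ordering, so that accesses in the guarded
  // region are not ordered after the count is dropped.
  void dec_ref() { _ref_count.fetch_sub(1, std::memory_order_acq_rel); }

  // No ordering argument: std::atomic defaults to memory_order_seq_cst,
  // comparable in spirit to an Atomic::inc() call that names no order.
  void inc_ref_default() { _ref_count.fetch_add(1); }

  int ref_count() const { return _ref_count.load(std::memory_order_acquire); }

 private:
  std::atomic<int> _ref_count{0};
};

// Reads the payload only while the reference count is held.
int read_guarded(RefCounted& holder, const int& payload) {
  holder.inc_ref();
  int value = payload;  // the guarded access
  holder.dec_ref();
  return value;
}
```

Passing an explicit, weaker order to the decrement is the kind of optimization
Erik mentions choosing not to pursue for such a cold path; using the default
everywhere is stronger than required but not incorrect.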

From suenaga at oss.nttdata.com  Mon Nov  4 23:56:42 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Tue, 5 Nov 2019 08:56:42 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
Message-ID: <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>

On 2019/11/04 22:43, Yasumasa Suenaga wrote:
> Hi Markus,
> 
> I had a similar change in mind, and it is running on the submit repo:
> 
>     http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
> 
> If it passes all tests, I will send the review request again.

This change passed all tests on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
Could you review again?

   http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/

With Markus's change, the emergency dump is not performed when Thread::current_or_null_safe() returns NULL.
However, that case does occur: when a core-dumping signal (e.g. SIGSEGV) is sent to the PID with the `kill` command, the main thread of the process is already detached (outside the JVM).
The crash might also happen in a native thread created by pthread_create (on Linux) from JNI code.

Thus we should continue to perform the emergency dump even if Thread::current_or_null_safe() returns NULL.


Thanks,

Yasumasa


> Thanks,
> 
> Yasumasa
> 
> 
> On 2019/11/04 22:23, Markus Gronlund wrote:
>> Hi Yasumasa and David,
>>
>> Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
>>
>> Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. _thread_in_java transitions to _thread_in_vm via stubs in SharedRuntime (I believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument right now for where this invariant is enforced, so let's work with the original thread state as you suggested, Yasumasa.
>>
>> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
>>
>> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
>>
>> Perhaps we can do it like this?
>>
>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>
>> Thanks for your patience investigating this
>>
>> Markus
>>
>> -----Original Message-----
>> From: David Holmes
>> Sent: den 4 november 2019 05:24
>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus Gronlund <markus.gronlund at oracle.com>; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>> Cc: yasuenag at gmail.com
>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
>>
>> So looking at Yasumasa's proposed fix ...
>>
>> I don't think it is worth the disruption to pass the "thread" all the way through these API's. It is simpler/cleaner to just call
>> Thread::current_or_null_safe() when you need the current thread.
>>
>> 357   assert(thread->is_Java_thread() &&
>> (((JavaThread*)thread)->thread_state() == _thread_in_vm), "invariant");
>>
>> This assertion is incorrect. As this can be called via
>> VMError::report_and_die() there is AFAICS absolutely no guarantee that we need be in a JavaThread at all.
>>
>>    428 class ThreadInVMForJFR : public StackObj {
>>
>> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
>>
>> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
>>
>> if (!guard_reentrancy()) {
>>     return;
>> } else {
>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>     if (!prepare_for_emergency_dump()) {
>>       return;
>>     }
>> }
>>
>> Thanks,
>> David
>> -----
>>
>>
>>
>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>> Correction ...
>>>
>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>> Hi Yasumasa,
>>>>>>>
>>>>>>> I think you can simplify it to something like this:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>
>>>>>> That is more like I had envisaged for this. Reusing existing
>>>>>> thread-state transition code is preferable to adding more custom
>>>>>> code that directly manipulates thread-state.
>>>>>
>>>>> I do not agree with this change.
>>>>>
>>>>> VMError::report_and_die() has "Thread* thread" in its arguments. So
>>>>> Thread::current() might be different from it.
>>>>
>>>> Not sure what you mean. You only ever manipulate the thread state of
>>>> the current thread.
>>>>
>>>>> In addition, ThreadInVMfromUnknown uses transition_from_native() to
>>>>> change the thread state.
>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>
>>>> Yes it does - which would be a problem if a safepoint (or handshake)
>>>> were pending. But the path through before_exit already has safepoint
>>>> checks when you acquire the BeforeExit_lock.
>>>
>>> But that isn't relevant. The issue is we don't want a safepoint check
>>> on the report_and_die() path. So a custom transition helper is needed
>>> to avoid that.
>>>
>>> David
>>>
>>>> The main problem with the suggestion is it seems we may not be
>>>> running in a JavaThread:
>>>>
>>>>    349   Thread* const thread = Thread::current();
>>>>    350   if (thread->is_Watcher_thread()) {
>>>>
>>>> so we can't use the existing thread-state helpers, unless we narrow
>>>> the scope (as you do) to after the check for the WatcherThread.
>>>>
>>>> David
>>>> -----
>>>>
>>>>> Thus I added ThreadInVMForJFR to my new webrev.
>>>>
>>>> Your change still seems overly complicated.
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> Thanks
>>>>>>> Markus
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; yasuenag at gmail.com;
>>>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>>>>>>> thread state
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>> So I updated webrev. Could you review it?
>>>>>>>
>>>>>>>      http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>
>>>>>>> This change passed all tests on submit repo
>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>
>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>> But I'm unclear why the thread state is not recovered.
>>>>>>>>
>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>> If it is just a bug in JFR, I will create a patch which recovers
>>>>>>>> it in JFR code.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> -------- Forwarded Message --------
>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not recover thread
>>>>>>>> state
>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Please review this change:
>>>>>>>>
>>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>     webrev:
>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>
>>>>>>>> If JFR is running when the JVM crashes, JFR will dump data to
>>>>>>>> hs_err_pid<PID>.jfr .
>>>>>>>> This is done in prepare_for_emergency_dump().
>>>>>>>> However, this function transitions the thread state to "_thread_in_vm"
>>>>>>>> and does not restore it.
>>>>>>>>
>>>>>>>> This change has been tested on submit repo as
>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>> It failed at compiler/types/correctness/CorrectnessTest.java
>>>>>>>> However this test is for JIT compiler, and related issue has been
>>>>>>>> reported as JDK-8225620.
>>>>>>>> So I think this patch can go through.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
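
As a minimal sketch of the scoped helper discussed in this message, the code
below models a guard that puts a JavaThread into the in-VM state for the
duration of a call and restores the previous state only if it actually changed
it. It assumes simplified, hypothetical stand-in types (SketchThread,
SketchJavaThread, SketchThreadState); only the helper name JavaThreadInVM is
taken from David's suggestion, and this is not the code in any of the webrevs.
The guard sets the state directly rather than going through a regular
transition, reflecting the point that a safepoint check is not wanted on the
report_and_die() path, and it tolerates a NULL or non-Java current thread.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the HotSpot thread types.
enum SketchThreadState { sketch_in_native, sketch_in_vm, sketch_in_java };

struct SketchThread {
  virtual ~SketchThread() {}
  virtual bool is_Java_thread() const { return false; }
};

struct SketchJavaThread : SketchThread {
  SketchThreadState state = sketch_in_native;
  bool is_Java_thread() const override { return true; }
};

// Scoped guard: forces a JavaThread into the in-VM state and restores the
// original state on scope exit, but only if the constructor changed it.
class JavaThreadInVM {
 public:
  explicit JavaThreadInVM(SketchThread* t) : _jt(nullptr), _saved(sketch_in_vm) {
    // A NULL thread or a non-Java thread is left untouched.
    if (t != nullptr && t->is_Java_thread()) {
      SketchJavaThread* jt = static_cast<SketchJavaThread*>(t);
      if (jt->state != sketch_in_vm) {
        _jt = jt;            // remember that a restore is needed
        _saved = jt->state;
        jt->state = sketch_in_vm;
      }
    }
  }

  ~JavaThreadInVM() {
    if (_jt != nullptr) {
      _jt->state = _saved;   // restore only when there was an actual change
    }
  }

  JavaThreadInVM(const JavaThreadInVM&) = delete;
  JavaThreadInVM& operator=(const JavaThreadInVM&) = delete;

 private:
  SketchJavaThread* _jt;
  SketchThreadState _saved;
};
```

Because the restore happens in the destructor, every return path out of the
guarded scope leaves the thread in the state it had on entry.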

From david.holmes at oracle.com  Tue Nov  5 00:17:38 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 10:17:38 +1000
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
Message-ID: <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>

On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>> Hi Markus,
>>
>> I had a similar change in mind, and it is running on the submit repo:
>>
>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>
>> If it passes all tests, I will send the review request again.
> 
> This change passed all tests on submit repo 
> (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
> Could you review again?
> 
>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
> 
> With Markus's change, the emergency dump is not performed when
> Thread::current_or_null_safe() returns NULL.
> However, that case does occur: when a core-dumping signal (e.g. SIGSEGV)
> is sent to the PID with the `kill` command, the main thread of the
> process is already detached (outside the JVM).
> The crash might also happen in a native thread created by pthread_create
> (on Linux) from JNI code.
> 
> Thus we should continue to perform the emergency dump even if
> Thread::current_or_null_safe() returns NULL.

I didn't quite follow all that, but if there is no current thread then 
prepare_for_emergency_dump() is either going to assert here:

  348   Thread* const thread = Thread::current();

or crash here:

  349   if (thread->is_Watcher_thread()) {

David
-----
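
The failure mode described above can be sketched as follows: a lookup of the
current thread that may legitimately return NULL (for example on a thread that
never attached), and a caller that checks the result before dereferencing it.
All names here (SketchThread, current_or_null, attach_current, detach_current,
current_is_watcher) are hypothetical stand-ins backed by a plain thread_local
pointer; this is an assumption-laden illustration, not HotSpot's
Thread::current_or_null_safe() or the code in the webrev.

```cpp
#include <cassert>

// Hypothetical stand-in for a VM thread object.
struct SketchThread {
  bool watcher;
};

// Per-thread pointer to the attached thread object; NULL until attached.
static thread_local SketchThread* t_current = nullptr;

SketchThread* current_or_null() { return t_current; }
void attach_current(SketchThread* t) { t_current = t; }
void detach_current() { t_current = nullptr; }

// Checked variant of "thread->is_Watcher_thread()": a missing current thread
// is reported as "not the watcher" instead of being dereferenced.
bool current_is_watcher() {
  SketchThread* const thread = current_or_null();
  return thread != nullptr && thread->watcher;
}
```

With a check of this kind, a caller running on an unattached thread can decide
explicitly whether to skip the thread-dependent steps or to bail out, rather
than hitting an assertion or a NULL dereference.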

> 
> Thanks,
> 
> Yasumasa
> 
> 
>> Thanks,
>>
>> Yasumasa
>>
>>
>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>> Hi Yasumasa and David,
>>>
>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I 
>>> realized later that, as you have pointed out, it would perform a real 
>>> thread transition. Sorry.
>>>
>>> Taking some input from ThreadInVMFromUnknown and the situations I 
>>> have seen at this location, I think the only case we need to be 
>>> concerned about here is when a JavaThread is _thread_in_native. 
>>> _thread_in_java transition to _thread_in_vm via stubs in 
>>> SharedRuntime (i believe) as part of coming out of the exception 
>>> handler(s). Unfortunately I cannot give a proper argument now to give 
>>> the premises where this invariant is enforced, so let's work with the 
>>> original thread state as you suggested Yasumasa.
>>>
>>> If we can avoid passing the thread all the way through, I think that 
>>> is preferable (this is not performance critical code). David also 
>>> alluded to the fact that you always manipulate the current thread 
>>> anyway. Although very unlikely, we could have run into an issue with 
>>> thread local storage, so it makes sense to test this up front. If we 
>>> cannot read the thread local, the operations we intend to perform 
>>> will fail, so we might just bail out already.
>>>
>>> I took the liberty to tighten up the transition class a little bit; 
>>> you only need to restore the thread state if there was an actual change.
>>>
>>> Perhaps we can do it like this?
>>>
>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>
>>> Thanks for your patience investigating this
>>>
>>> Markus
>>>
>>> -----Original Message-----
>>> From: David Holmes
>>> Sent: den 4 november 2019 05:24
>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus Gronlund 
>>> <markus.gronlund at oracle.com>; hotspot-jfr-dev at openjdk.java.net; 
>>> hotspot-runtime-dev at openjdk.java.net
>>> Cc: yasuenag at gmail.com
>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover 
>>> thread state
>>>
>>> So looking at Yasumasa's proposed fix ...
>>>
>>> I don't think it is worth the disruption to pass the "thread" all the 
>>> way through these API's. It is simpler/cleaner to just call
>>> Thread::current_or_null_safe() when you need the current thread.
>>>
>>> 357?? assert(thread->is_Java_thread() &&
>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm), "invariant");
>>>
>>> This assertion is incorrect. As this can be called via
>>> VMError::report_or_die() there is AFAICS absolutely no guarantee that 
>>> we need be in a JavaThread at all.
>>>
>>> ?? 428 class ThreadInVMForJFR : public StackObj {
>>>
>>> Can I suggest JavaThreadInVM to make it clear this only affects 
>>> JavaThreads. And as it is local we don't need the "forJFR" part.
>>>
>>> Based on Markus's proposed change, and with a view to constrain the 
>>> scope even further can I suggest the following:
>>>
>>> if (!guard_reentrancy()) {
>>> ??? return;
>>> } else {
>>> ??? // Ensure a JavaThread is _thread_in_vm when we make this call
>>> ??? JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>> ??? if (!prepare_for_emergency_dump()) {
>>> ????? return;
>>> ??? }
>>> }
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>
>>>
>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>> Correction ...
>>>>
>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>> Hi Yasumasa,
>>>>>>>>
>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>
>>>>>>> That is more like I had envisaged for this. Reusing existing
>>>>>>> thread-state transition code is preferable to adding more custom
>>>>>>> code that directly manipulates thread-state.
>>>>>>
>>>>>> I do not agree with this change.
>>>>>>
>>>>>> VMError::report_and_die() has "Thread* thread" in its arguments. So
>>>>>> Thread::current() might be different with it.
>>>>>
>>>>> Not sure what you mean. You only ever manipulate the thread state of
>>>>> the current thread.
>>>>>
>>>>>> In addition, ThreadInVMfromUnknown uses transition_from_native() to
>>>>>> change the thread state.
>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>
>>>>> Yes it does - which would be a problem if a safepoint (or handshake)
>>>>> were pending. But the path through before_exit already has safepoint
>>>>> checks when you acquire the BeforeExit_lock.
>>>>
>>>> But that isn't relevant. The issue is we don't want a safepoint check
>>>> on the report_and_die() path. So a custom transition helper is needed
>>>> to avoid that.
>>>>
>>>> David
>>>>
>>>>> The main problem with the suggestion is it seems we may not be
>>>>> running in a JavaThread:
>>>>>
>>>>> ???349?? Thread* const thread = Thread::current();
>>>>> ???350?? if (thread->is_Watcher_thread()) {
>>>>>
>>>>> so we can't use the existing thread-state helpers, unless we narrow
>>>>> the scope (as you do) to after the check for the WatcherThread.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> Thus I added ThreadInVMForJFR to new my webrev.
>>>>>
>>>>> Your change still seems overly complicated.
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Markus
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; yasuenag at gmail.com;
>>>>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>>>>>>>> thread state
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>
>>>>>>>> ???? http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>
>>>>>>>> This change passed all tests on submit repo
>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>
>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>> But I'm not unclear why thread state is not recover.
>>>>>>>>>
>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>> If it is just a bug in JFR, I will create a patch which recover
>>>>>>>>> it in JFR code.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not recover thread
>>>>>>>>> state
>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Please review this change:
>>>>>>>>>
>>>>>>>>> ?? ? JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>> ?? ? webrev:
>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>
>>>>>>>>> If JFR is running when the JVM crashes, JFR will dump data to
>>>>>>>>> hs_err_pid<PID>.jfr .
>>>>>>>>> This is done in prepare_for_emergency_dump().
>>>>>>>>> However, this function transitions the thread state to "_thread_in_vm".
>>>>>>>>>
>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>> It failed at compiler/types/correctness/CorrectnessTest.java.
>>>>>>>>> However, this test is for the JIT compiler, and a related issue has
>>>>>>>>> been reported as JDK-8225620.
>>>>>>>>> So I think this patch can go through.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa

From fujie at loongson.cn  Tue Nov  5 00:49:44 2019
From: fujie at loongson.cn (Jie Fu)
Date: Tue, 5 Nov 2019 08:49:44 +0800
Subject: RFR: 8233454: Test fails with assert(!is_init_completed(),
 "should only happen during init") after JDK-8229516
In-Reply-To: <ff418d45-a848-e28e-1716-a777fd9ed5a3@oracle.com>
References: <ac7a98cf-6730-bbd2-aa52-7bb972a37873@loongson.cn>
 <e478a4a2-7105-8de1-d72e-0ce10d8d34ae@oracle.com>
 <ff418d45-a848-e28e-1716-a777fd9ed5a3@oracle.com>
Message-ID: <36fffa89-05de-5f48-4dfe-ac6c9f4e981a@loongson.cn>

Thanks David for fixing this issue.

I'd like to test your patch and will let you know as soon as possible.

Thanks a lot.
Best regards,
Jie

On 2019/11/5 7:26, David Holmes wrote:
> Hi Jie,
>
> Thanks for filing this and attempting a fix. As per the bug report the 
> underlying issue has now been fixed in Shenandoah, but I want to make 
> the interrupt code more resilient as well:
>
> http://cr.openjdk.java.net/~dholmes/8233454/webrev/
>
> I was unable to reproduce the Shenandoah crash so if you could test 
> this patch I would appreciate it - thanks. (Without the Shenandoah fix 
> of course :) )
>
> Meanwhile I'm putting the patch through other testing.
>
> Thanks,
> David
> -----
>
> On 4/11/2019 11:13 pm, David Holmes wrote:
>> Hi Jie,
>>
>> I will need to take a deeper look at this. This is a problem specific
>> to Shenandoah GC as it is triggering a sleep whilst a thread is still
>> in the process of attaching to the JVM :(
>>
>> Thanks,
>> David
>>
>> On 4/11/2019 7:16 pm, Jie Fu wrote:
>>> Hi all,
>>>
>>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233454
>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233454/webrev.00/
>>>
>>> According to the comment [1], the assert seems to miss the case for 
>>> threads attached via JNI.
>>> For more info, please refer to the JBS.
>>>
>>> Could you please review it and give me some advice?
>>>
>>> Thanks a lot.
>>> Best regards,
>>> Jie
>>>
>>> [1] 
>>> http://hg.openjdk.java.net/jdk/jdk/file/2700c409ff10/src/hotspot/share/runtime/thread.hpp#l1249 
>>>
>>>
>>>


From daniel.daugherty at oracle.com  Tue Nov  5 01:31:49 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Mon, 4 Nov 2019 20:31:49 -0500
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <a20e42b4-85f7-29de-4573-76cc477e39a0@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <58713599-b5ea-57bb-0413-fce3b8bd5d2f@oracle.com>
 <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
 <a20e42b4-85f7-29de-4573-76cc477e39a0@oracle.com>
Message-ID: <32f8e268-7c15-82f6-3b9b-398c33c160cb@oracle.com>

Hi David,

Thanks for continuing to provide feedback on the Async Monitor Deflation
project! I appreciate your reviews very much...

Responses embedded below (as usual)...


On 11/4/19 1:28 AM, David Holmes wrote:
> Hi Dan,
>
> A few follow ups to your responses, with trimming ...
>
> On 30/10/2019 6:20 am, Daniel D. Daugherty wrote:
>> On 10/24/19 7:00 AM, David Holmes wrote:
>>>  122 // Set _owner field to new_value; current value must match old_value.
>>>  123 inline void ObjectMonitor::set_owner_from(void* new_value, void* old_value) {
>>>  124   void* prev = Atomic::cmpxchg(new_value, &_owner, old_value);
>>>  125   ADIM_guarantee(prev == old_value, "unexpected prev owner=" INTPTR_FORMAT
>>>
>>> The use of cmpxchg seems a little strange here if you are asserting 
>>> that when this is called _owner must equal old_value. That means you 
>>> don't expect any race and if there is no race with another thread 
>>> writing to _owner then you don't need the cmpxchg. A normal:
>>>
>>> if (_owner == old_value) {
>>>    Atomic::store(&_owner, new_value);
>>>    log(...);
>>> } else {
>>>    guarantee(false, " unexpected old owner ...");
>>> }
>>
>> The two parameter version of set_owner_from() is only called from three
>> places and we'll cover two of them here:
>>
>> src/hotspot/share/runtime/objectMonitor.cpp:
>>
>> 1041     if (AsyncDeflateIdleMonitors) {
>> 1042       set_owner_from(NULL, Self);
>> 1043     } else {
>> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the lock
>> 1045       OrderAccess::storeload();                        // See if we need to wake a successor
>> 1046     }
>>
>> and:
>>
>> 1221   if (AsyncDeflateIdleMonitors) {
>> 1222     set_owner_from(NULL, Self);
>> 1223   } else {
>> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
>> 1225     OrderAccess::fence();                               // ST _owner vs LD in unpark()
>> 1226   }
>>
>> So I've replaced the existing {release_store(), storeload()} combo 
>> for one
>> call site and the existing {release_store(), fence()} combo for the 
>> other
>> call site with a cmpxchg(). I chose cmpxchg() for these reasons:
>>
>> 1) I wanted the same memory sync behavior at both call sites.
>> 2) I wanted similar/same memory sync behavior as the original
>>     code at those call sites.
>
> Why? The memory sync requirements for non-async deflation may be
> completely different to those required for async-deflation (given all
> the other bits of the protocol).

Good point!

For context, the first code block above (L1041-6) is in ObjectMonitor::exit()
and the second code block above (L1221-6) is in ObjectMonitor::ExitEpilog()
which is called from two different places by ObjectMonitor::exit(). In both
cases, we are setting the _owner field to NULL which will potentially make
the ObjectMonitor async deflatible (depending on ref_count).

For async deflation, I want the full fence semantics after setting the
_owner field to NULL in both locations:

src/hotspot/share/runtime/orderAccess.hpp:
//                       Constraint     x86          sparc TSO          ppc
// ---------------------------------------------------------------------------
// fence                 LoadStore  |   lock         membar #StoreLoad  sync
//                       StoreStore |   addl 0,(sp)
//                       LoadLoad   |
//                       StoreLoad
//
// release               LoadStore  |                                   lwsync
//                       StoreStore
I don't want any loads or stores floating into or out of the critical 
region.


*** Side bar here ****

I just noticed something with the original code:

1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the lock
1045       OrderAccess::storeload();                        // See if we need to wake a successor

For constraints, this gives us:
           {LoadStore | StoreStore}
           {StoreLoad}
at L1044-5. So the original code is just "missing" LoadLoad relative
to a full fence(). I'm not sure why this kind of load is allowed to
float into the critical region, but the code has been this way for a
very long time.

And for this original code:

1224     OrderAccess::release_store(&_owner, (void*)NULL);
1225     OrderAccess::fence();                               // ST _owner vs LD in unpark()

For constraints, this gives us:
         {LoadStore | StoreStore}
         {LoadStore | StoreStore | LoadLoad | StoreLoad}
at L1224-5. Again this code has been this way for a very long time.

It seems to me that L1224-5 could be written like this:

1224     _owner = NULL;
1225     OrderAccess::fence();                               // ST _owner vs LD in unpark()

with a plain store on L1224. Is that correct?

*** End side bar ***
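
To make the side-bar question concrete, here is a minimal sketch of the two variants written with std::atomic instead of HotSpot's OrderAccess. The struct and function names (ExitSketch, exit_release_then_fence, exit_plain_then_fence) are hypothetical, and mapping OrderAccess::fence() to a seq_cst fence is an assumption made only for illustration:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the two ObjectMonitor fields involved; not HotSpot code.
struct ExitSketch {
  std::atomic<void*> owner{nullptr};
  std::atomic<void*> succ{nullptr};
};

// Variant mirroring L1224-5: release store of NULL, then a full fence,
// then the load that must not be reordered before the store.
inline void* exit_release_then_fence(ExitSketch& m) {
  m.owner.store(nullptr, std::memory_order_release);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return m.succ.load(std::memory_order_relaxed);
}

// Variant from the side bar: the store itself carries no ordering and
// only the fence that follows it does.
inline void* exit_plain_then_fence(ExitSketch& m) {
  m.owner.store(nullptr, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return m.succ.load(std::memory_order_relaxed);
}
```

Both variants keep the store to owner ahead of the later load of succ. In the C++ memory model, a fence placed after the store does not by itself order earlier critical-section accesses before that store, so the sketch does not settle whether the release can be dropped; it only shows what is being compared.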


>
>> 3) I wanted the return value from cmpxchg() for my state machine
>> ??? sanity check.
>
> I'm somewhat dubious about using cmpxchg just for the side-effect of 
> getting the existing value.

But I'm not "using cmpxchg just for the side-effect of getting the 
existing value".

That's the third thing on my list of three reasons. The most important
thing is I want the full fence that cmpxchg() gives me. Above I said:

 > 1) I wanted the same memory sync behavior at both call sites.
 > 2) I wanted similar/same memory sync behavior as the original
 >     code at those call sites.

Using cmpxchg() gives me the full fence I want and that's similar to
this baseline code at this call site:

1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the lock
1045       OrderAccess::storeload();                        // See if we need to wake a successor

I'm getting the LoadLoad that the baseline site doesn't have.

The cmpxchg() gives me the same memory constraints as this baseline code
at this call site:

1224     OrderAccess::release_store(&_owner, (void*)NULL);
1225     OrderAccess::fence();                               // ST _owner vs LD in unpark()

Note: Actually, I don't have the extra {LoadStore | StoreStore} from
the release_store() that I mentioned in the side bar above...

The last thing that I get is the existing value...


Okay, so I thought it was a pretty cool use of cmpxchg(), but I'm
obviously confusing code readers. So here's the v2.08 set_owner_from():

// Set _owner field to new_value; current value must match old_value.
inline void ObjectMonitor::set_owner_from(void* new_value, void* old_value) {
  void* prev = Atomic::cmpxchg(new_value, &_owner, old_value);
  ADIM_guarantee(prev == old_value, "unexpected prev owner=" INTPTR_FORMAT
                 ", expected=" INTPTR_FORMAT, p2i(prev), p2i(old_value));
  log_trace(monitorinflation, owner)("set_owner_from(): mid=" INTPTR_FORMAT
                                     ", prev=" INTPTR_FORMAT ", new="
                                     INTPTR_FORMAT, p2i(this), p2i(prev),
                                     p2i(new_value));
}

I could change it like this:

// Set _owner field to new_value; current value must match old_value.
inline void ObjectMonitor::set_owner_from(void* new_value, void* old_value) {
  void* prev = _owner;
  ADIM_guarantee(prev == old_value, "unexpected prev owner=" INTPTR_FORMAT
                 ", expected=" INTPTR_FORMAT, p2i(prev), p2i(old_value));
  _owner = new_value;
  OrderAccess::fence();
  log_trace(monitorinflation, owner)("set_owner_from(): mid=" INTPTR_FORMAT
                                     ", prev=" INTPTR_FORMAT ", new="
                                     INTPTR_FORMAT, p2i(this), p2i(prev),
                                     p2i(new_value));
}

It's two lines longer, but it should require less head scratching to
figure out what I'm trying to do. Would this be acceptable?
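
For readers who want to try the two shapes outside of HotSpot, here is a minimal sketch using std::atomic. OwnerSketch and its member names are hypothetical, the seq_cst operations are only an approximation of HotSpot's conservative ordering, and the sketch returns the previous value instead of calling ADIM_guarantee():

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the _owner field; not HotSpot code.
struct OwnerSketch {
  std::atomic<void*> owner{nullptr};

  // CAS form: the exchange provides the ordering and the previous value.
  // The caller checks the returned value against old_value.
  void* set_owner_from_cas(void* new_value, void* old_value) {
    void* prev = old_value;
    // On failure, compare_exchange_strong writes the observed value into prev.
    owner.compare_exchange_strong(prev, new_value);
    return prev;
  }

  // Load, check, store, fence form: no concurrent writer is expected, so the
  // check is a sanity check. Unlike the real code, which would abort in the
  // guarantee, this sketch simply skips the store on a mismatch.
  void* set_owner_from_store(void* new_value, void* old_value) {
    void* prev = owner.load(std::memory_order_relaxed);
    if (prev == old_value) {
      owner.store(new_value, std::memory_order_relaxed);
      std::atomic_thread_fence(std::memory_order_seq_cst);
    }
    return prev;
  }
};
```

The second form makes the intent visible: the check documents the expected state, and the fence documents the ordering requirement, without suggesting that a racing writer is being guarded against.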


>
>> I don't think that using 'Atomic::store(&_owner, new_value)' is the
>> right choice for these two call sites.
>
> If you don't actually need the cmpxchg to handle concurrent updates to 
> the _owner field, then a plain store (not an Atomic::store - that was 
> an error on my part) does not seem unreasonable; or if there are still 
> memory sync issues here, perhaps a release_store.

So in the above proposed code I switched to a plain store followed by
a fence().


> If you use cmpxchg then anyone reading the code will assume there is a 
> concurrent update that you are guarding against.

Yup. I concede the point that I'm obviously confusing the other
code readers... sorry about that...


>
>> The last two parameter set_owner_from() is talked about in the
>> next reply.
>>
>>
>>> Similarly for the old_value1/old_value2 version.
>>
>> The three parameter version of set_owner_from() is only called from one
>> place and the last two parameter version is called from the same place:
>>
>> src/hotspot/share/runtime/synchronizer.cpp:
>>
>> 1903       if (AsyncDeflateIdleMonitors) {
>> 1904         m->set_owner_from(mark.locker(), NULL, DEFLATER_MARKER);
>> 1905       } else {
>> 1906         m->set_owner_from(mark.locker(), NULL);
>> 1907       }
>>
>> The original code was:
>>
>> 1399       m->set_owner(mark.locker());
>>
>> The original set_owner() code was defined like this:
>>
>>    87 inline void ObjectMonitor::set_owner(void* owner) {
>>    88   _owner = owner;
>>    89 }
>>
>> So the original code didn't do any memory sync'ing at all and I've
>> changed that to a cmpxchg() on both code paths. That appears to be
>> overkill for that callsite...
>
> Again I'm not sure any memory sync requirements from the non-async 
> case should necessarily transfer over to the async case. Even if you 
> end up requiring similar memory sync the reasoning would be quite 
> different I would expect.

In this case, both async deflation and safepoint based deflation are
happy with the same memory sync because the newly allocated ObjectMonitor
isn't published yet so it is not deflatible by either mechanism. Also the
act of publishing the ObjectMonitor* will take care of the memory sync.
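
A minimal sketch of that "initialize privately, then publish" pattern, using std::atomic. All names here (InflateSketchMonitor, HeaderSketch, inflate_sketch, read_owner_sketch) are hypothetical, and the release store only stands in for what release_set_mark() is described as doing in this thread:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins; not HotSpot code.
struct InflateSketchMonitor {
  void* owner = nullptr;   // plain field, written before publication
  void* object = nullptr;
};

struct HeaderSketch {
  std::atomic<InflateSketchMonitor*> mark{nullptr};
};

// Fields are set with plain stores because no other thread can reach the
// monitor yet; the release store that publishes the pointer orders them.
inline void inflate_sketch(HeaderSketch& obj, InflateSketchMonitor* m, void* locker) {
  m->owner = locker;
  m->object = &obj;
  obj.mark.store(m, std::memory_order_release);
}

// A reader that acquires the published pointer sees the initialized fields.
inline void* read_owner_sketch(const HeaderSketch& obj) {
  InflateSketchMonitor* m = obj.mark.load(std::memory_order_acquire);
  return m == nullptr ? nullptr : m->owner;
}
```

Under this assumption, neither a CAS nor a fence is needed on the owner field itself at this call site; only the sanity check on the previous value remains useful.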


>
>>
>> We're in ObjectSynchronizer::inflate(), in the "CASE: stack-locked"
>> section of the code. We've gotten our ObjectMonitor from om_alloc()
>> and are initializing a number of fields in the ObjectMonitor. The
>> ObjectMonitor is not published until we do:
>>
>> 1916       object->release_set_mark(markWord::encode(m));
>>
>> So we don't need the memory sync'ing features of the cmpxchg() for
>> either of the set_owner_from() calls and all that leaves is the
>> state machine sanity check.
>>
>> I really like the state machine sanity check on the owner field but
>> that's just because it came in handy when chasing the recent races.
>> It would be easy to change the three parameter version of
>> set_owner_from() to not do memory sync'ing, but still do the state
>> machine sanity check.
>>
>> Update: Changing the three parameter version of set_owner_from()
>> may impact the changes to owner_is_DEFLATER_MARKER() discussed
>> above. Sigh...
>> Update 2: Probably no impact because the three parameter version of
>> set_owner_from() is only used before the ObjectMonitor is published
>> and owner_is_DEFLATER_MARKER() is used after the ObjectMonitor has
>> appeared on an in-use list.
>>
>> However, the two parameter version of set_owner_from() needs its
>> memory sync'ing behavior for its objectMonitor.cpp call sites so
>> this call site would need something different.
>>
>> I'm not sure which solution I'm going to pick yet, but I definitely
>> have to change something here since we don't need cmpxchg() at this
>> call site. More thought is required.
>
> I will look to see where this ended up.

I'll wait to see if you can live with the v2.08 version. I hope so...


>
>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>
>>>
>>>  267   if (AsyncDeflateIdleMonitors &&
>>>  268       try_set_owner_from(Self, DEFLATER_MARKER) == DEFLATER_MARKER) {
>>
>> For more context, we are in:
>>
>>   241 void ObjectMonitor::enter(TRAPS) {
>>
>>
>>> I don't see why you need to call try_set_owner_from again here as 
>>> "cur" will already be DEFLATER_MARKER from the previous try_set_owner.
>>
>> I assume the previous try_set_owner() call you mean is this one:
>>
>>   248   void* cur = try_set_owner_from(Self, NULL);
>>
>> This first try_set_owner() is for the most common case of no owner.
>>
>> The second try_set_owner() call is for a different condition than the 
>> first:
>>
>>   268       try_set_owner_from(Self, DEFLATER_MARKER) == DEFLATER_MARKER) {
>>
>> L248 is trying to change the _owner field from NULL -> 'Self'.
>> L268 is trying to change the _owner field from DEFLATER_MARKER to 
>> 'Self'.
>>
>> If the try_set_owner() call on L248 fails, 'cur' can be several possible
>> values:
>>
>>    - the calling thread (recursive enter is handled on L254-7)
>>    - other owning thread value (BasicLock* or Thread*)
>>    - DEFLATER_MARKER
>
> I'll give a cautious okay to that explanation (the deficiency being in
> my understanding, not your explaining :) ).

Thanks. I'll take it!


>
>>> Further, I don't see how installing self as the _owner here is valid 
>>> and means you acquired the monitor, as the fact it was 
>>> DEFLATER_MARKER means it is still being deflated by another thread 
>>> doesn't it ???
>>
>> I guess the comment after L268 didn't work for you:
>>
>>   269     // The deflation protocol finished the first part (setting owner),
>>   270     // but it failed the second part (making ref_count negative) and
>>   271     // bailed. Or the ObjectMonitor was async deflated and reused.
>>
>> It means that the deflater thread was racing with this enter and
>> managed to set the owner field to DEFLATER_MARKER as the first step
>> in the deflation protocol. Our entering thread actually won the race
>> when it managed to set the ref_count to a positive value as part of
>> the ObjectMonitorHandle stuff done in the inflate() call that preceded
>> the enter() call. However, the deflater thread hasn't realized that it
>> lost the race yet and hasn't restored the owner field back to NULL.
>
> You're right the comment didn't work for me as it required me to be 
> holding too much of the protocol in my head. Makes more sense now.

Good to hear!
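
As a recap of the enter() fast path discussed above, here is a minimal sketch with std::atomic. EnterSketch, EnterResult and kDeflaterMarkerSketch are hypothetical names, and the sketch leaves out ref_count entirely; in the real protocol the takeover from DEFLATER_MARKER is only valid because the entering thread already made ref_count positive in inflate():

```cpp
#include <atomic>
#include <cassert>

// Hypothetical marker value; the real DEFLATER_MARKER is defined in HotSpot.
static char deflater_marker_storage_sketch;
static void* const kDeflaterMarkerSketch = &deflater_marker_storage_sketch;

enum class EnterResult { Acquired, Recursive, TookOverFromDeflater, Contended };

struct EnterSketch {
  std::atomic<void*> owner{nullptr};

  // Returns the previous owner value; equals old_value when the CAS succeeded.
  void* try_set_owner_from(void* new_value, void* old_value) {
    void* prev = old_value;
    owner.compare_exchange_strong(prev, new_value);
    return prev;
  }

  EnterResult try_enter(void* self) {
    // First attempt covers the common case: NULL -> self.
    void* cur = try_set_owner_from(self, nullptr);
    if (cur == nullptr) return EnterResult::Acquired;
    if (cur == self) return EnterResult::Recursive;
    // Second attempt is a different transition: DEFLATER_MARKER -> self.
    // It covers a deflater that set the owner but has not yet noticed that
    // it lost the race and restored the field to NULL.
    if (try_set_owner_from(self, kDeflaterMarkerSketch) == kDeflaterMarkerSketch) {
      return EnterResult::TookOverFromDeflater;
    }
    return EnterResult::Contended;
  }
};
```

The second CAS is needed even though cur may already have been DEFLATER_MARKER, because the first CAS only observed that value and did not change the field.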


>
> Thanks,
> David
> -----

Thanks again for the thorough reviews!

Dan


From daniel.daugherty at oracle.com  Tue Nov  5 01:34:43 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Mon, 4 Nov 2019 20:34:43 -0500
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <2c16521b-5694-7f6e-0f54-ee2bddf5563f@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <58713599-b5ea-57bb-0413-fce3b8bd5d2f@oracle.com>
 <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
 <7388c7fc-39c4-1ec6-1608-02b08e562ab3@oracle.com>
 <dbffc304-e84f-b1ec-b997-7978c1bced6f@oracle.com>
 <17b0ec76-6f32-fd7d-6486-4df21582ce03@oracle.com>
 <2c16521b-5694-7f6e-0f54-ee2bddf5563f@oracle.com>
Message-ID: <b692dcca-a676-f462-67cb-21f64b05923e@oracle.com>

I can do that. I assume delete them in the Async Monitor Deflation
code and delete them in threadSMR.cpp. For the threadSMR.cpp, I'll
roll that change into my latest baseline cleanup bug:

     JDK-8230876 baseline cleanups from Async Monitor Deflation v2.07
     https://bugs.openjdk.java.net/browse/JDK-8230876

Erik, please chime in here... Thanks!

Dan


On 11/4/19 6:44 PM, David Holmes wrote:
> Hi Dan,
>
> Just delete the comments.
>
> Thanks,
> David
>
> On 5/11/2019 7:25 am, Daniel D. Daugherty wrote:
>> Hi David and Erik,
>>
>> Thanks for chiming in here Erik...
>>
>> This set of comments is not addressed in the CR8/v2.08/11-for-jdk14
>> code review request that I just sent out.
>>
>> I've read this response twice and I'm not quite sure what to do with it
>> relative to David's CR comment. I'll repeat those here:
>>
>>  >  199 // The decrement only needs to be MO_ACQ_REL since the reference
>>  >  200   // counter is volatile.
>>  >  201   Atomic::dec(&_ref_count);
>>  >
>>  > volatile is irrelevant with regards to memory ordering as it is a
>>  > compiler annotation. And you haven't specified any memory order value
>>  > so the default is conservative ie. implied full fence. (I see the same
>>  > incorrect comment is in threadSMR.cpp!)
>>
>> Should I delete this comment? Or should it be changed? If changed, then
>> what text do you recommend here?
>>
>>
>>  >  208   // The increment needs to be MO_SEQ_CST so that the reference
>>  >  209   // counter update is seen as soon as possible in a race with the
>>  >  210   // async deflation protocol.
>>  >  211   Atomic::inc(&_ref_count);
>>  >
>>  > Ditto you haven't specified any ordering - and inc() and dec()
>>  > will have the same default.
>>
>> Should I delete this comment? Or should it be changed? If changed, then
>> what text do you recommend here?
>>
>> Dan
>>
>>
>> On 11/4/19 8:09 AM, erik.osterlund at oracle.com wrote:
>>> Hi,
>>>
>>> TL/DR: David is right; the commentary is weird and does not capture 
>>> what the real constraints are.
>>>
>>> As the comment implied before "8222034: Thread-SMR functions should 
>>> be updated to remove work around", the PPC port used to have 
>>> incorrect memory ordering, and the code guarded against that. 
>>> inc/dec used to be memory_order_relaxed and add/sub used to be 
>>> memory_order_acq_rel on PPC, despite the shared contract promising 
>>> memory_order_conservative.
>>>
>>> The implication for the nested counter in the Thread SMR project was 
>>> that I wanted to use the inc/dec API but knew it was not gonna work 
>>> as expected on PPC because we really needed *at least* 
>>> memory_order_acq_rel when decrementing (and 
>>> memory_order_conservative when incrementing, which was simulated in 
>>> a CAS loop... yuck), but would find ourselves getting 
>>> memory_order_relaxed. Rather than treating it as a bug in the PPC 
>>> atomics implementation, and having the code be broken while we 
>>> waited for a fix, I changed the use to sub when decrementing (which 
>>> gave me the required memory_order_acq_rel ordering I needed), and 
>>> the horrible CAS loop when incrementing, as a workaround, and 
>>> alerted Martin Doerr that this would need to be sorted out in the
>>> PPC code. Since then, the PPC code did indeed get cleaned up so that 
>>> inc/dec stopped being relaxed-only and worked as advertised.
>>>
>>> After that, the "8222034: Thread-SMR functions should be updated to 
>>> remove work around" change removed the workaround that was no longer 
>>> required from the code, and put back the desired inc/dec calls 
>>> (which now used an overly conservative memory_order_conservative 
>>> ordering, which is suboptimal, in particular for decrements, but 
>>> importantly not incorrect). Since the nested case would almost never 
>>> run and is possibly the coldest code path in the VM, I did not care 
>>> to comment in that review thread about optimizing it by explicitly 
>>> passing in a weaker ordering. However, I should have commented on 
>>> the comment that was changed, which does indeed look a bit confused. 
>>> David is right that the stuff about volatile has nothing to do with 
>>> why this is correct. The correctness required memory_order_acq_rel 
>>> for decrements, but the implementation provided more, which is fine.
>>>
>>> The actual reason why I wanted memory_order_conservative for 
>>> correctness when incrementing and memory_order_acq_rel when 
>>> decrementing, was to prevent accesses inside of the critical section 
>>> (in particular - reading Thread*s from the acquired ThreadsList), 
>>> from floating outside of the reference increment and decrement that 
>>> marks reading the list as safe to access without the underlying list 
>>> blowing up. In practice, it might have been possible to relax it a 
>>> bit by relying on side effects of other unrelated parts of the 
>>> protocol to have spurious fencing... but I did not want to get the 
>>> protocol tangled in that way because it would be difficult to reason 
>>> about.
>>>
>>> Hope this explanation clears up that confusion.
>>>
>>> Thanks,
>>> /Erik
>>>
>>> On 11/2/19 2:15 PM, Daniel D. Daugherty wrote:
>>>> Erik,
>>>>
>>>> David H. made a comment during this review cycle that should 
>>>> interest you.
>>>>
>>>> The longer version of this comment came up in early reviews of the 
>>>> Async
>>>> Monitor Deflation code because I copied the code and the longer 
>>>> comment
>>>> from threadSMR.cpp. I updated the comment based on your input and 
>>>> review
>>>> and changed the comment and code in threadSMR.cpp and in the Async 
>>>> Monitor
>>>> Deflation project code.
>>>>
>>>> The change in threadSMR.cpp was done with this changeset:
>>>>
>>>> $ hg log -v -r 54517
>>>> changeset:   54517:c201ca660afd
>>>> user:        dcubed
>>>> date:        Thu Apr 11 14:14:30 2019 -0400
>>>> files:       src/hotspot/share/runtime/threadSMR.cpp
>>>> description:
>>>> 8222034: Thread-SMR functions should be updated to remove work around
>>>> Reviewed-by: mdoerr, eosterlund
>>>>
>>>> Here's one of the two diffs to jog your memory:
>>>>
>>>>  void ThreadsList::dec_nested_handle_cnt() {
>>>> -  // The decrement needs to be MO_ACQ_REL. At the moment, the Atomic::dec
>>>> -  // backend on PPC does not yet conform to these requirements. Therefore
>>>> -  // the decrement is simulated with an Atomic::sub(1, &addr).
>>>> -  // Without this MO_ACQ_REL Atomic::dec simulation, the nested SMR mechanism
>>>> -  // is not generally safe to use.
>>>> -  Atomic::sub(1, &_nested_handle_cnt);
>>>> +  // The decrement only needs to be MO_ACQ_REL since the reference
>>>> +  // counter is volatile (and the hazard ptr is already NULL).
>>>> +  Atomic::dec(&_nested_handle_cnt);
>>>>  }
>>>>
>>>> Below is David's comment about the code comment...
>>>>
>>>> Dan
>>>>
>>>>
>>>> Trimming down to just that issue...
>>>>
>>>> On 10/29/19 4:20 PM, Daniel D. Daugherty wrote:
>>>>> On 10/24/19 7:00 AM, David Holmes wrote:
>>>> >
>>>> > src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>>
>>>>>>  199 // The decrement only needs to be MO_ACQ_REL since the reference
>>>>>>  200   // counter is volatile.
>>>>>>  201   Atomic::dec(&_ref_count);
>>>>>>
>>>>>> volatile is irrelevant with regards to memory ordering as it is a 
>>>>>> compiler annotation. And you haven't specified any memory order 
>>>>>> value so the default is conservative ie. implied full fence. (I 
>>>>>> see the same incorrect comment is in threadSMR.cpp!)
>>>>>
>>>>> I got that wording from threadSMR.cpp and Erik O. confirmed my use 
>>>>> of that
>>>>> wording previously. I'll chase it down with Erik and get back to you.
>>>>>
>>>>>
>>>>>>  208   // The increment needs to be MO_SEQ_CST so that the reference
>>>>>>  209   // counter update is seen as soon as possible in a race with the
>>>>>>  210   // async deflation protocol.
>>>>>>  211   Atomic::inc(&_ref_count);
>>>>>>
>>>>>> Ditto you haven't specified any ordering - and inc() and dec() 
>>>>>> will have the same default.
>>>>>
>>>>> And again, I'll have to chase this down with Erik O. and get back 
>>>>> to you.
>>>>
>>>
>>
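
The ordering requirements Erik describes in the quoted text above can be written down as a minimal sketch with std::atomic. RefCountSketch is a hypothetical name, and mapping memory_order_conservative to std::memory_order_seq_cst is only an approximation made for illustration (HotSpot's conservative ordering is described in the thread as an implied full fence):

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference counter; not the HotSpot implementation.
class RefCountSketch {
  std::atomic<int> count_{0};

 public:
  // Entering the critical section: accesses inside it must not float
  // above the increment, so the strongest ordering is requested.
  void inc() { count_.fetch_add(1, std::memory_order_seq_cst); }

  // Leaving the critical section: accesses inside it must not float
  // below the decrement; acquire-release is the stated minimum.
  void dec() { count_.fetch_sub(1, std::memory_order_acq_rel); }

  int value() const { return count_.load(std::memory_order_acquire); }
};
```

The ordering comes from the explicit memory_order argument, not from any volatile qualifier on the counter, which is the point of David's comment.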


From jianglizhou at google.com  Tue Nov  5 01:52:08 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Mon, 4 Nov 2019 17:52:08 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
Message-ID: <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>

On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam <ioi.lam at oracle.com> wrote:

> Hi Jiangli,
>
> Thank you so much for spending time reviewing this RFE!
>
> On 11/3/19 6:34 PM, Jiangli Zhou wrote:
> > Hi Ioi,
> >
> > Sorry for the delay again. Will try to put this on the top of my list
> > next week and reduce the turn-around time. The updates look good in
> > general.
> >
> > We might want to have a better strategy when choosing metadata
> > relocation address (when relocation is needed). Some
> > applications/benchmarks may be more sensitive to cache locality and
> > memory/data layout. There was a bug,
> > https://bugs.openjdk.java.net/browse/JDK-8213713 that caused 1G gap
> > between Java heap data and metadata before JDK 12. The gap seemed to
> > cause a small but noticeable runtime effect in one case that I came
> > across.
>
> I guess you're saying we should try to relocate the archive into
> somewhere under 32GB?
>

I don't yet have sufficient data that determines whether mapping at low 32G
produces better runtime performance. I experimented with that, but didn't
see noticeable difference when comparing to mapping at the current default
address. It doesn't hurt, I think. So it may be a better choice than
relocating to a random address in high 32G space (when Java heap is in low
32G address space).


>
> Could you elaborate more about the performance issue, especially about
> cache locality? I looked at JDK-8213713 but it didn't mention
> performance.
>

When enabling CDS we noticed a small runtime overhead in JDK 11 recently
with a benchmark. After I backported JDK-8213713 to 11, it seemed to reduce
the runtime overhead that the benchmark was experiencing.


>
> Also, by default, we have non-zero narrow_klass_base and
> narrow_klass_shift = 3, and archive relocation doesn't change that:
>
> $ java -Xlog:cds=debug -version
> ... narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
> $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
> ... narrow_klass_base = 0x00007f1e8b499000, narrow_klass_shift = 3
>
> We always use narrow_klass_shift due to this:
>
>    // CDS uses LogKlassAlignmentInBytes for narrow_klass_shift. See
>    // MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
>    // how dump time narrow_klass_shift is set. Although, CDS can work
>    // with zero-shift mode also, to be consistent with AOT it uses
>    // LogKlassAlignmentInBytes for klass shift so archived java heap
> objects
>    // can be used at same time as AOT code.
>    if (!UseSharedSpaces
>        && (uint64_t)(higher_address - lower_base) <=
> UnscaledClassSpaceMax) {
>      CompressedKlassPointers::set_shift(0);
>    } else {
>      CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
>    }
>

Right. If we relocate to the low 32G space, we need to make sure that the
range containing the mapped class data and class space is encodable.
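
To illustrate what "encodable" means here, a minimal sketch of base-plus-shift narrow klass encoding follows. NarrowKlassSketch and its members are hypothetical names and this is not the CompressedKlassPointers implementation; it only assumes a 32-bit narrow value, which with shift = 3 gives a 32G window starting at the base:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of narrow klass encoding; not HotSpot code.
struct NarrowKlassSketch {
  std::uint64_t base;
  int shift;

  // An address is encodable if it is at or above the base, aligned to the
  // shift, and its shifted offset fits in 32 bits.
  bool is_encodable(std::uint64_t addr) const {
    if (addr < base) return false;
    std::uint64_t offset = addr - base;
    if ((offset & ((std::uint64_t{1} << shift) - 1)) != 0) return false;
    return (offset >> shift) <= UINT32_MAX;
  }

  std::uint32_t encode(std::uint64_t addr) const {
    return static_cast<std::uint32_t>((addr - base) >> shift);
  }

  std::uint64_t decode(std::uint32_t narrow) const {
    return base + (static_cast<std::uint64_t>(narrow) << shift);
  }

  // The whole range [lo, hi) must fall inside the window covered by the base.
  bool range_is_encodable(std::uint64_t lo, std::uint64_t hi) const {
    return lo >= base && hi >= lo &&
           (hi - base) <= (std::uint64_t{1} << (32 + shift));
  }
};
```

With the default base shown earlier in the thread (0x0000000800000000) and shift 3, any relocation target would have to keep both the mapped archive and the class space within that single window.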


>
> > Here are some additional comments (minor).
> >
> > Could you please fix the long lines in the following?
> >
> > 1237 void
> java_lang_Class::update_archived_primitive_mirror_native_pointers(oop
> > archived_mirror) {
> > 1238   if (MetaspaceShared::relocation_delta() != 0) {
> > 1239     assert(archived_mirror->metadata_field(_klass_offset) ==
> > NULL, "must be for primitive class");
> > 1240
> > 1241     Klass* ak =
> > ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
> > 1242     if (ak != NULL) {
> > 1243       archived_mirror->metadata_field_put(_array_klass_offset,
> > (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
> > 1244     }
> > 1245   }
> > 1246 }
> >
> > src/hotspot/share/memory/dynamicArchive.cpp
> >
> >   889   Thread* THREAD = Thread::current();
> >   890   Method::sort_methods(ik->methods(), /*set_idnums=*/true,
> > dynamic_dump_method_comparator);
> >   891   if (ik->default_methods() != NULL) {
> >   892     Method::sort_methods(ik->default_methods(),
> > /*set_idnums=*/false, dynamic_dump_method_comparator);
> >   893   }
> >
>
> OK will do.
>
> > Please see inlined comments below.
> >
> > On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >> Hi Jiangli,
> >>
> >> Thanks for the review. I've updated the patch according to your comments:
> >>
> >> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
> >> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
> >>
> >> (the delta is on top of 8231610-relocate-cds-archive.v03.delta in my
> >> reply to Calvin's comments).
> >>
> >> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
> >>> Hi Ioi,
> >>>
> >>> Sorry for the delay. Here are my remaining comments.
> >>>
> >>> - src/hotspot/share/memory/dynamicArchive.cpp
> >>>
> >>> 128   static intx _method_comparator_name_delta;
> >>>
> >>> The name of the above variable is confusing. It's the value of
> >>> _buffer_to_target_delta. It's better to use _buffer_to_target_delta
> >>> directly.
> >> _buffer_to_target_delta is a non-static field, but
> >> dynamic_dump_method_comparator() must be a static function so it can't
> >> use the non-static field easily.
> >
> > It sounds like an issue. _buffer_to_target_delta was made as a
> > non-static mostly because we might support more than one dynamic
> > archives in the future. However, today's usages bake in an assumption
> > that _buffer_to_target_delta is a singleton value. It is cleaner to
> > either make _buffer_to_target_delta as a static variable for now, or
> > adding an access API in DynamicArchiveBuilder to allow other code to
> > properly and correctly use the value.
>
> OK, I'll move it to a static variable.
>
> >
> >>> Also, we can do a quick pointer comparison of 'a_name' and
> >>> 'b_name' first before adjusting the pointers.
> >> I added this:
> >>
> >>       if (a_name == b_name) {
> >>         return 0;
> >>       }
> >>
> >>> ---
> >>>
> >>> 934 void DynamicArchiveBuilder::relocate_buffer_to_target() {
> >>> ...
> >>>    944
> >>>    945   ArchivePtrMarker::compact(relocatable_base, relocatable_end);
> >>> ...
> >>>
> >>>    974     SharedDataRelocator patcher((address*)patch_base,
> >>> (address*)patch_end, valid_old_base, valid_old_end,
> >>>    975                                 valid_new_base, valid_new_end,
> addr_delta);
> >>>    976     ArchivePtrMarker::ptrmap()->iterate(&patcher);
> >>>
> >>> Could we reduce the number of data re-iterations to help archive
> >>> dumping performance? The ArchivePtrMarker::compact operation can be
> >>> combined with the patching iteration. ArchivePtrMarker::compact API
> >>> can be removed.
> >> That's a good idea. I implemented it using a template parameter so that
> >> we can have max performance when relocating the archive at run time.
> >>
> >> I added comments to explain why the relocation is done here. The
> >> relocation is pretty rare (only when the base archive was not mapped at
> >> the default location).
> >>
> >>> ---
> >>>
> >>>    967     address valid_new_base =
> >>> (address)Arguments::default_SharedBaseAddress();
> >>>    968     address valid_new_end  = valid_new_base +
> base_plus_top_size;
> >>>
> >>> The debugging only code can be included under #ifdef ASSERT.
> >> These values are actually also used in debug logging so they can't be
> >> ifdef'ed out.
> >>
> >> Also, the C++ compiler is pretty good at eliding code that's not
> >> actually used. If I comment out all the logging code in
> >> DynamicArchiveBuilder::relocate_buffer_to_target()  and
> >> SharedDataRelocator, gcc elides all the unused fields and their
> >> assignments. So no code is generated for this, etc.
> >>
> >>       address valid_new_base =
> >> (address)Arguments::default_SharedBaseAddress();
> >>
> >> Since #ifdef ASSERT makes the code harder to read, I think we should use
> >> it only when really necessary.
> > It seems cleaner to get rid of these debugging only variables, by
> > using 'relocatable_base' and
> > '(address)Arguments::default_SharedBaseAddress()' in the logging code.
>
> SharedDataRelocator is used in 3 different situations. These six
> variables (patch_base, patch_end, valid_old_base, valid_old_end,
> valid_new_base, valid_new_end) describe what is being patched, and what
> the expectations are, for each situation. The code will be hard to
> understand without them.
>
> Please note there's also logging code in the SharedDataRelocator
> constructor that prints out these values.
>
> I think I'll just remove the 'debug only' comment to avoid confusion.
>

Ok.
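
For readers following along, a rough sketch of how those six values fit
together when the bitmap is walked. This is illustrative only and not the
SharedDataRelocator code from the webrev: RelocatorSketch, do_bit() and
relocate() are hypothetical names, and std::vector<bool> stands in for the
pointer bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the relocator; the field names mirror the six
// variables discussed above.
struct RelocatorSketch {
  uintptr_t* patch_base;      // first pointer-sized slot that may be patched
  uintptr_t* patch_end;       // one past the last such slot
  uintptr_t  valid_old_base;  // a stored pointer must currently lie in
  uintptr_t  valid_old_end;   //   [valid_old_base, valid_old_end)
  uintptr_t  valid_new_base;  // and after patching must lie in
  uintptr_t  valid_new_end;   //   [valid_new_base, valid_new_end)
  intptr_t   delta;           // new address minus old address

  // Bit 'offset' in the bitmap marks slot patch_base[offset] as a pointer.
  void do_bit(size_t offset) const {
    uintptr_t* slot = patch_base + offset;
    assert(slot < patch_end);
    uintptr_t old_ptr = *slot;
    assert(valid_old_base <= old_ptr && old_ptr < valid_old_end);
    uintptr_t new_ptr = old_ptr + static_cast<uintptr_t>(delta);
    assert(valid_new_base <= new_ptr && new_ptr < valid_new_end);
    *slot = new_ptr;
  }
};

// Walk the bitmap once and patch every marked slot; unmarked slots hold
// non-pointer data and are left alone.
void relocate(const std::vector<bool>& ptrmap, const RelocatorSketch& patcher) {
  for (size_t i = 0; i < ptrmap.size(); i++) {
    if (ptrmap[i]) {
      patcher.do_bit(i);
    }
  }
}
```

The range checks are what make the three situations distinguishable: the
same patching loop runs each time, only the expected old and new ranges
differ.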


>
> >
> >>> ---
> >>>
> >>>    993   dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
> >>>
> >>> We could combine the archived heap data bitmap into the new region as
> >>> well? It can be handled as a separate RFE.
> >> I've filed https://bugs.openjdk.java.net/browse/JDK-8233093
> >>
> >>> - src/hotspot/share/memory/filemap.cpp
> >>>
> >>> 1038     if (is_static()) {
> >>> 1039       if (errno == ENOENT) {
> >>> 1040         // Not locating the shared archive is ok.
> >>> 1041         fail_continue("Specified shared archive not found (%s).",
> >>> _full_path);
> >>> 1042       } else {
> >>> 1043         fail_continue("Failed to open shared archive file (%s).",
> >>> 1044                       os::strerror(errno));
> >>> 1045       }
> >>> 1046     } else {
> >>> 1047       log_warning(cds, dynamic)("specified dynamic archive
> >>> doesn't exist: %s", _full_path);
> >>> 1048     }
> >>>
> >>> If the top layer is explicitly specified by the user, a warning does
> >>> not seem to be a proper behavior if the VM fails to open the archive
> >>> file.
> >>>
> >>> It might be better to handle the relocation-unrelated code in a separate
> >>> changeset and track it with a separate RFE.
> >> This code was moved from
> >>
> >>
> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
> >>
> >> so I am not changing the behavior. If you want, we can file an RFE to
> >> change the behavior.
> > Ok. A new RFE sounds like the right thing to re-evaluate the usage
> > issue here. Thanks.
>
> I created https://bugs.openjdk.java.net/browse/JDK-8233446
>
> >>> ---
> >>>
> >>> 1148 void FileMapInfo::write_region(int region, char* base, size_t
> size,
> >>> 1149                                bool read_only, bool allow_exec) {
> >>> ...
> >>> 1154
> >>> 1155   if (region == MetaspaceShared::bm) {
> >>> 1156     target_base = NULL;
> >>> 1157   } else if (DynamicDumpSharedSpaces) {
> >>>
> >>> It's not too clear to me how the bitmap (bm) region is handled for the
> >>> base layer and top layer. Could you please explain?
> >> The bm region for both layers are mapped at an address picked by the OS:
> >>
> >> char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
> >>     FileMapRegion* si = space_at(MetaspaceShared::bm);
> >>     bitmap_size = si->used_aligned();
> >>     bool read_only = true, allow_exec = false;
> >>     char* requested_addr = NULL; // allow OS to pick any location
> >>     char* bitmap_base = os::map_memory(_fd, _full_path,
> si->file_offset(),
> >>                                        requested_addr, bitmap_size,
> >> read_only, allow_exec);
> >>
> > Ok, after staring at the code for a few seconds I saw that's intended.
> > If the current region is 'bm', then the 'target_base' is NULL
> > regardless if it's static or dynamic archive. Otherwise, the
> > 'target_base' is handled differently for the static and dynamic case.
> > The following would be cleaner and has better reliability.
> >
> >     char* target_base = NULL;
> >
> >     // The target_base is NULL for 'bm' region.
> >     if (region != MetaspaceShared::bm) {
> >       if (DynamicDumpSharedSpaces) {
> >         assert(!HeapShared::is_heap_region(region), "dynamic archive
> > doesn't support heap regions");
> >         target_base = DynamicArchive::buffer_to_target(base);
> >       } else {
> >         target_base = base;
> >       }
> >    }
>
> How about this?
>
>    char* target_base;
>    if (region == MetaspaceShared::bm) {
>      target_base = NULL; // always NULL for bm region.
>    } else {
>      if (DynamicDumpSharedSpaces) {
>          assert(!HeapShared::is_heap_region(region), "dynamic archive
> doesn't support heap regions");
>          target_base = DynamicArchive::buffer_to_target(base);
>      } else {
>          target_base = base;
>      }
>    }
>
>
No objection if you prefer the extra 'else' block.


> >
> >>> ---
> >>>
> >>> 1362   DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
> >>>
> >>> Could you please explain the above?
> >> I added the comments
> >>
> >>     // Make sure we don't attempt to use header()->mapped_base_address()
> >> unless
> >>     // it's been successfully mapped.
> >>
> DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
> >>
> >>> ---
> >>>
> >>> 1359   FileMapRegion* last_region = NULL;
> >>>
> >>> 1371     if (last_region != NULL) {
> >>> 1372       // Ensure that the OS won't be able to allocate new memory
> >>> spaces between any mapped
> >>> 1373       // regions, or else it would mess up the simple comparison
> >>> in MetaspaceObj::is_shared().
> >>> 1374       assert(si->mapped_base() == last_region->mapped_end(),
> >>> "must have no gaps");
> >>>
> >>> 1379     last_region = si;
> >>>
> >>> Can you please place 'last_region' related code under #ifdef ASSERT?
> >> I think that will make the code more cluttered. The compiler will
> >> optimize that away.
> > It's cleaner to define debugging-only variables for debugging-only
> > builds. You can wrap it and the related usage with DEBUG_ONLY.
>
> OK, will do.
>
> >
> >>> ---
> >>>
> >>> 1478 char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
> >>> 1479   FileMapRegion* si = space_at(MetaspaceShared::bm);
> >>> 1480   bitmap_size = si->used_aligned();
> >>> 1481   bool read_only = true, allow_exec = false;
> >>> 1482   char* requested_addr = NULL; // allow OS to pick any location
> >>> 1483   char* bitmap_base = os::map_memory(_fd, _full_path,
> si->file_offset(),
> >>> 1484                                      requested_addr, bitmap_size,
> >>> read_only, allow_exec);
> >>>
> >>> We need to handle mapping failure here.
> >> It's handled here:
> >>
> >> bool FileMapInfo::relocate_pointers(intx addr_delta) {
> >>     log_debug(cds, reloc)("runtime archive relocation start");
> >>     size_t bitmap_size;
> >>     char* bitmap_base = map_relocation_bitmap(bitmap_size);
> >>     if (bitmap_base != NULL) {
> >>     ...
> >>     } else {
> >>       log_error(cds)("failed to map relocation bitmap");
> >>       return false;
> >>     }
> >>
> > 'bitmap_base' is used immediately after map_memory(). So the check
> > needs to be done immediately after map_memory(), but not in the caller
> > of map_relocation_bitmap().
> >
> > 1490   char* bitmap_base = os::map_memory(_fd, _full_path,
> si->file_offset(),
> > 1491                                      requested_addr, bitmap_size,
> > read_only, allow_exec);
> > 1492
> > 1493   if (VerifySharedSpaces && bitmap_base != NULL &&
> > !region_crc_check(bitmap_base, bitmap_size, si->crc())) {
>
> OK, I'll fix that.
>
> >
> >
> >>> ---
> >>>
> >>> 1513     // debug only -- the current value of the pointers to be
> >>> patched must be within this
> >>> 1514     // range (i.e., must be between the requested base address,
> >>> and the end of the current archive).
> >>> 1515     // Note: top archive may point to objects in the base
> >>> archive, but not the other way around.
> >>> 1516     address valid_old_base =
> (address)header()->requested_base_address();
> >>> 1517     address valid_old_end  = valid_old_base +
> mapping_end_offset();
> >>>
> >>> Please place all FileMapInfo::relocate_pointers debugging only code
> >>> under #ifdef ASSERT.
> >> Ditto about ifdef ASSERT
> >>
> >>> - src/hotspot/share/memory/heapShared.cpp
> >>>
> >>>    441 void HeapShared::initialize_from_archived_subgraph(Klass* k) {
> >>>    442   if (!open_archive_heap_region_mapped() ||
> !MetaspaceObj::is_shared(k)) {
> >>>    443     return; // nothing to do
> >>>    444   }
> >>>
> >>> When do we call HeapShared::initialize_from_archived_subgraph for a
> >>> klass that's not shared?
> >> I've removed the !MetaspaceObj::is_shared(k). I probably added that for
> >> debugging purposes only.
> >>
> >>>    616   DEBUG_ONLY({
> >>>    617       Klass* klass = orig_obj->klass();
> >>>    618       assert(klass != SystemDictionary::Module_klass() &&
> >>>    619              klass !=
> SystemDictionary::ResolvedMethodName_klass() &&
> >>>    620              klass != SystemDictionary::MemberName_klass() &&
> >>>    621              klass != SystemDictionary::Context_klass() &&
> >>>    622              klass != SystemDictionary::ClassLoader_klass(), "we
> >>> can only relocate metaspace object pointers inside java_lang_Class
> >>> instances");
> >>>    623     });
> >>>
> >>> Let's leave the above for a separate RFE. I think assert is not
> >>> sufficient for the check. Also, why ResolvedMethodName, Module and
> >>> MemberName cannot be part of the graph?
> >>>
> >>>
> >> I added the following comment:
> >>
> >>     DEBUG_ONLY({
> >>         // The following are classes in share/classfile/javaClasses.cpp
> >> that have injected native pointers
> >>         // to metaspace objects. To support these classes, we need to
> add
> >> relocation code similar to
> >>         // java_lang_Class::update_archived_mirror_native_pointers.
> >>         Klass* klass = orig_obj->klass();
> >>         assert(klass != SystemDictionary::Module_klass() &&
> >>                klass != SystemDictionary::ResolvedMethodName_klass() &&
> >>
> > It's too restrictive to exclude those objects from the archived object
> > graph because of metadata relocation, since metadata relocation is rare.
> > The trade-off doesn't seem to buy us much.
> >
> > Do you plan to add the needed relocation code?
>
> I looked more into this. Actually we cannot handle these 5 classes at
> all, even without archive relocation:
>
> [1] #define MODULE_INJECTED_FIELDS(macro) \
>    macro(java_lang_Module, module_entry, intptr_signature, false)
>
> ->  module_entry is malloc'ed
>
> [2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
>    macro(java_lang_invoke_ResolvedMethodName, vmholder,
> object_signature, false) \
>    macro(java_lang_invoke_ResolvedMethodName, vmtarget,
> intptr_signature, false)
>
> -> these fields are related to method handles and lambda forms, etc.
> They can't easily be archived without implementing lambda form
> archiving. (I did a prototype; it's very complex and fragile).
>
> [3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
>    macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
> vmdependencies, intptr_signature, false) \
>    macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
> last_cleanup, long_signature, false)
>
> -> vmdependencies is malloc'ed.
>
> [4] #define
> MEMBERNAME_INJECTED_FIELDS(macro)                               \
>    macro(java_lang_invoke_MemberName, vmindex,  intptr_signature, false)
>
> -> this one is probably OK. Despite being declared as
> 'intptr_signature', it seems to be used just as an integer. However,
> MemberNames are typically used with [2] and [3]. So let's just forbid it
> to be safe.
>
> [2] [3] [4] are not used directly by regular Java code and are unlikely
> to be referenced (directly or indirectly) by static fields (except for
> the static fields in the classes in java.lang.invoke, which we probably
> won't support for heap archiving due to the problem I described for
> [2]). Objects of these types are typically referenced via constant pool
> entries.
>
> [5] #define CLASSLOADER_INJECTED_FIELDS(macro)                            \
>    macro(java_lang_ClassLoader, loader_data,  intptr_signature, false)
>
> -> loader_data is malloc'ed.
>
> So, I will change the DEBUG_ONLY into a product-mode check, and quit
> dumping if these objects are found in the object subgraph.
>

Sounds good. Can you please also add a comment with an explanation?

For ClassLoader and Module, it is worth considering caching the additional
native data some time in the future. Lois had suggested the Module part a
while ago.


> Maybe we should backport the check to older versions as well?
>

We should discuss backports to JDK 11 update releases with Andrew Haley.
Since the current OpenJDK 11 only applies Java heap archiving to a
restricted set of JDK library code, I think it is safe without the new
check.

For non-LTS releases, it might not be worthwhile as they may not be widely
used?

Thanks,
Jiangli


> >
> >>> - src/hotspot/share/memory/metaspace.cpp
> >>>
> >>> 1036   metaspace_rs = ReservedSpace(compressed_class_space_size(),
> >>> 1037                                              _reserve_alignment,
> >>> 1038                                              large_pages,
> >>> 1039                                              requested_addr);
> >>>
> >>> Please fix indentation.
> >> Fixed.
> >>
> >>> - src/hotspot/share/memory/metaspaceClosure.hpp
> >>>
> >>>     78   enum SpecialRef {
> >>>     79     _method_entry_ref
> >>>     80   };
> >>>
> >>> Are there other pointers that are not references to MetaspaceObj? If
> >>> _method_entry_ref is the only type, it's probably not worth defining
> >>> SpecialRef?
> >> There may be more types in the future, so I want to have a stable API
> >> that can be easily expanded without touching all the code that uses it.
> >>
> >>
> >>> - src/hotspot/share/memory/metaspaceShared.hpp
> >>>
> >>>     42 enum MapArchiveResult {
> >>>     43   MAP_ARCHIVE_SUCCESS,
> >>>     44   MAP_ARCHIVE_MMAP_FAILURE,
> >>>     45   MAP_ARCHIVE_OTHER_FAILURE
> >>>     46 };
> >>>
> >>> If we want to define different failure types, it's probably worth
> >>> using separate types for relocation failure and validation failure.
> >> For now, I just need to distinguish between MMAP_FAILURE (where I should
> >> attempt to remap at an alternative address) and OTHER_FAILURE (where the
> >> CDS archive loading will fail -- due to validation error, insufficient
> >> memory, etc -- without attempting to remap.)
> >>
> >>> ---
> >>>
> >>>    193   static intx _mapping_delta; // FIXME rename
> >>>
> >>> How about _relocation_delta?
> >> Changed as suggested.
> >>
> >>> - src/hotspot/share/oops/instanceKlass
> >>>
> >>> 1573 bool InstanceKlass::_disable_method_binary_search = false;
> >>>
> >>> The use of _disable_method_binary_search is not necessary. You can use
> >>> DynamicDumpSharedSpaces for the purpose. That would make things
> >>> cleaner.
> >> If we always disable the binary search when DynamicDumpSharedSpaces is
> >> true, it will slow down normal execution of the Java program when
> >> -XX:ArchiveClassesAtExit has been specified, but the program hasn't
> exited.
> > Could you please add some comments to _disable_method_binary_search
> > with the above explanation? Thanks.
>
> OK
> >
> >>> - test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
> >>>
> >>>     76                     if (name.equals("s0") || name.equals("s1"))
> {
> >>>     77                       // String regions are listed at the end
> and
> >>> they may not be fully occupied.
> >>>     78                       break;
> >>>     79                     } else if (name.equals("bm")) {
> >>>     80                       // Bitmap space does not have a requested
> address.
> >>>     81                       break;
> >>>
> >>> It's not part of your change, but could you please fix line 76 - 78
> >>> since it is trivial. It seems the lines can be removed.
> >> Removed.
> >>
> >>> - /src/hotspot/share/memory/archiveUtils.hpp
> >>> The file name does not match with the macro '#ifndef
> >>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please rename
> >>> archiveUtils.* ? archiveRelocator.hpp and archiveRelocator.cpp are
> >>> more descriptive.
> >> I named the file archiveUtils.hpp so we can move other misc stuff used
> >> by dumping into this file (e.g., DumpRegion, WriteClosure from
> >> metaspaceShared.hpp), since these are not used by the majority of the
> >> files that use metaspaceShared.hpp.
> >>
> >> I fixed the ifdef.
> >>
> >>> - src/hotspot/share/memory/archiveUtils.cpp
> >>>
> >>>     36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap, address*
> >>> ptr_base, address* ptr_end) {
> >>>     37   assert(_ptrmap == NULL, "initialize only once");
> >>>     38   _ptr_base = ptr_base;
> >>>     39   _ptr_end = ptr_end;
> >>>     40   _compacted = false;
> >>>     41   _ptrmap = ptrmap;
> >>>     42   _ptrmap->initialize(12 * M / sizeof(intptr_t)); // default
> >>> archive is about 12MB.
> >>>     43 }
> >>>
> >>> Could we do a better estimate here? We could guesstimate the size
> >>> based on the current used class space and metaspace size. It's okay if
> >>> a larger bitmap is used, since it can be reduced after all marking is
> >>> done.
> >> The bitmap is automatically expanded when necessary in
> >> ArchivePtrMarker::mark_pointer(). It's only about 1/32 or 1/64 of the
> >> total archive size, so even if we do expand, the cost will be trivial.
> > The initial value is based on the default CDS archive. When dealing
> > with a really large archive, it would have to re-grow many times.
> > Also, using a hard-coded value is less desirable.
>
> OK, I changed it to the following
>
>    // Use this as initial guesstimate. We should need less space in the
>    // archive, but if we're wrong the bitmap will be expanded
> automatically.
>    size_t estimated_archive_size = MetaspaceGC::capacity_until_GC();
>    // But set it smaller in debug builds so we always test the expansion
> code.
>    // (Default archive is about 12MB).
>    DEBUG_ONLY(estimated_archive_size = 6 * M);
>
>    // We need one bit per pointer in the archive.
>    _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
>
>
> Thanks!
> - Ioi
>
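
The sizing discussed above (one bit per pointer-sized slot, with automatic
expansion when the initial guess is too small) can be sketched as follows.
This is a simplified illustration, not the ArchivePtrMarker implementation:
PtrBitmapSketch and its methods are hypothetical names, the doubling growth
policy is an assumption, and std::vector<bool> stands in for CHeapBitMap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Self-expanding pointer bitmap: one bit per pointer-sized slot of the
// archive buffer.
class PtrBitmapSketch {
 public:
  // estimated_archive_bytes is only an initial guess; marking beyond it
  // grows the bitmap.
  explicit PtrBitmapSketch(size_t estimated_archive_bytes)
      : _bits(estimated_archive_bytes / sizeof(intptr_t), false) {}

  // Record that the slot at byte_offset from the buffer base holds a pointer.
  void mark(size_t byte_offset) {
    assert(byte_offset % sizeof(intptr_t) == 0);
    size_t idx = byte_offset / sizeof(intptr_t);
    if (idx >= _bits.size()) {
      // Assumed policy: double until the index fits.
      size_t new_size = _bits.empty() ? 1 : _bits.size();
      while (new_size <= idx) {
        new_size *= 2;
      }
      _bits.resize(new_size, false);
    }
    _bits[idx] = true;
  }

  bool is_marked(size_t byte_offset) const {
    size_t idx = byte_offset / sizeof(intptr_t);
    return idx < _bits.size() && _bits[idx];
  }

  size_t size_in_bits() const { return _bits.size(); }

 private:
  std::vector<bool> _bits;
};
```

Because each slot costs a single bit, an undersized initial estimate only
costs a few resize operations, which is why a deliberately small value in
debug builds is a cheap way to exercise the expansion path.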
> >
> >>>
> >>>
> >>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou <jianglizhou at google.com>
> wrote:
> >>>> Hi Ioi,
> >>>>
> >>>> This is another great step for CDS usability improvement. Thank you!
> >>>>
> >>>> I have a high level question (or request): could we consider
> >>>> separating the relocation work for 'direct' class metadata from other
> >>>> types of metadata (such as the shared system dictionary, symbol table,
> >>>> etc)? Initially we only relocate the tables and other archived global
> >>>> data. When each archived class is being loaded, we can relocate all
> >>>> the pointers within the current class. We could find the segment (for
> >>>> the current class) in the bitmap and update the pointers within the
> >>>> segment. That way we can reduce initial startup costs and also avoid
> >>>> relocating class data that's not used at runtime. In some real world
> >>>> large systems, an archive may contain an extremely large number of
> >>>> classes.
> >>>>
> >>>> Following are partial review comments so we can move things forward.
> >>>> Still going through the rest of the changes.
> >>>>
> >>>> - src/hotspot/share/classfile/javaClasses.cpp
> >>>>
> >>>> 1218 void java_lang_Class::update_archived_mirror_native_pointers(oop
> >>>> archived_mirror) {
> >>>> 1219   Klass* k =
> ((Klass*)archived_mirror->metadata_field(_klass_offset));
> >>>> 1220   if (k != NULL) { // k is NULL for the primitive classes such as
> >>>> java.lang.Byte::TYPE <<<<<<<<<<<
> >>>> 1221     archived_mirror->metadata_field_put(_klass_offset,
> >>>> (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
> >>>> 1222   }
> >>>> 1223 ...
> >>>>
> >>>> Primitive type mirrors are handled separately. Could you please verify
> >>>> if this call path happens for primitive type mirror?
> >>>>
> >>>> To answer my question above, looks like you added the following, which
> >>>> is to be used for primitive type mirrors. That seems to be the reason
> >>>> why update_archived_mirror_native_pointers is trying to also cover
> >>>> primitive type. It's better to have a separate API for primitive type
> >>>> mirror, which is cleaner. And, we also can replace the above check at
> >>>> line 1220 to be an assert for regular mirrors.
> >>>>
> >>>> +void ReadClosure::do_mirror_oop(oop *p) {
> >>>> +  do_oop(p);
> >>>> +  oop mirror = *p;
> >>>> +  if (mirror != NULL) {
> >>>> +    java_lang_Class::update_archived_mirror_native_pointers(mirror);
> >>>> +  }
> >>>> +}
> >>>> +
> >>>>
> >>>> How about renaming update_archived_mirror_native_pointers to
> >>>> update_archived_mirror_klass_pointers.
> >>>>
> >>>> It would be good to pass the current klass as an argument. We can
> >>>> verify the relocated pointer matches with the current klass pointer.
> >>>>
> >>>> We should also check if relocation is necessary before spending cycles
> >>>> to obtain the klass pointer from the mirror.
> >>>>
> >>>> 1252   update_archived_mirror_native_pointers(m);
> >>>> 1253
> >>>> 1254   // mirror is archived, restore
> >>>> 1255   assert(HeapShared::is_archived_object(m), "must be archived
> >>>> mirror object");
> >>>> 1256   Handle mirror(THREAD, m);
> >>>>
> >>>> Could we move the line at 1252 after the assert at line 1255?
> >>>>
> >>>> - src/hotspot/share/include/cds.h
> >>>>
> >>>>     47   int     _mapped_from_file;  // Is this region mapped from a
> file?
> >>>>     48                               // If false, this region was
> >>>> initialized using os::read().
> >>>>
> >>>> Is the new field truly needed? It seems we could use _mapped_base to
> >>>> determine if a region is mapped or not?
> >>>>
> >>>> - src/hotspot/share/memory/dynamicArchive.cpp
> >>>>
> >>>> Could you please remove the debugging print code in
> >>>> dynamic_dump_method_comparator? Or convert those to logging output if
> >>>> they are helpful.
> >>>>
> >>>> Will send out the rest of the review comments later.
> >>>>
> >>>> Best,
> >>>>
> >>>> Jiangli
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Oct 10, 2019 at 6:00 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >>>>> Bug:
> >>>>> https://bugs.openjdk.java.net/browse/JDK-8231610
> >>>>>
> >>>>> Webrev:
> >>>>>
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v01/
> >>>>>
> >>>>> Design:
> >>>>>
> http://cr.openjdk.java.net/~iklam/jdk14/design/8231610-relocate-cds-archive.txt
> >>>>>
> >>>>>
> >>>>> Overview:
> >>>>>
> >>>>> The CDS archive is mmaped to a fixed address range (starting at
> >>>>> SharedBaseAddress, usually 0x800000000). Previously, if this
> >>>>> requested address range is not available (usually due to Address
> >>>>> Space Layout Randomization (ASLR) [2]), the JVM will give up and
> >>>>> will load classes dynamically using class files.
> >>>>>
> >>>>> [a] This causes slow down in JVM start-up.
> >>>>> [b] Handling of mapping failures causes unnecessary complication in
> >>>>>        the CDS tests.
> >>>>>
> >>>>> Here are some preliminary benchmarking results (using default CDS
> archive,
> >>>>> running helloworld):
> >>>>>
> >>>>> (a) 47.1ms (CDS enabled, mapped at requested addr)
> >>>>> (b) 53.8ms (CDS enabled, mapped at alternate addr)
> >>>>> (c) 86.2ms (CDS disabled)
> >>>>>
> >>>>> The small degradation in (b) is caused by the relocation of
> >>>>> absolute pointers embedded in the CDS archive. However, it is
> >>>>> still a big improvement over case (c)
> >>>>>
> >>>>> Please see the design doc (link above) for details.
> >>>>>
> >>>>> Thanks
> >>>>> - Ioi
> >>>>>
>
>

From fujie at loongson.cn  Tue Nov  5 02:49:39 2019
From: fujie at loongson.cn (Jie Fu)
Date: Tue, 5 Nov 2019 10:49:39 +0800
Subject: RFR: 8233454: Test fails with assert(!is_init_completed(),
 "should only happen during init") after JDK-8229516
In-Reply-To: <ff418d45-a848-e28e-1716-a777fd9ed5a3@oracle.com>
References: <ac7a98cf-6730-bbd2-aa52-7bb972a37873@loongson.cn>
 <e478a4a2-7105-8de1-d72e-0ce10d8d34ae@oracle.com>
 <ff418d45-a848-e28e-1716-a777fd9ed5a3@oracle.com>
Message-ID: <a705e140-8a6f-3178-b229-88325a8e2584@loongson.cn>

Hi David,

I have tested your patch (without the Shenandoah fix) on VMs with and without
the JFR feature, and both passed the particular reproducer.
So thanks again for fixing it in the shared runtime code.

Best regards,
Jie

On 2019/11/5 7:26, David Holmes wrote:
> Hi Jie,
>
> Thanks for filing this and attempting a fix. As per the bug report the 
> underlying issue has now been fixed in Shenandoah, but I want to make 
> the interrupt code more resilient as well:
>
> http://cr.openjdk.java.net/~dholmes/8233454/webrev/
>
> I was unable to reproduce the Shenandoah crash so if you could test 
> this patch I would appreciate it - thanks. (Without the Shenandoah fix 
> of course :) )
>
> Meanwhile I'm putting the patch through other testing.
>
> Thanks,
> David
> -----
>
> On 4/11/2019 11:13 pm, David Holmes wrote:
>> Hi Jie,
>>
>> I will need to take a deeper look at this. This is a problem specific 
>> to Shenandoah GC as it is triggering a sleep whilst a thread is still
>> in the process of attaching to the JVM :(
>>
>> Thanks,
>> David
>>
>> On 4/11/2019 7:16 pm, Jie Fu wrote:
>>> Hi all,
>>>
>>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233454
>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233454/webrev.00/
>>>
>>> According to the comment [1], the assert seems to miss the case for 
>>> threads attached via JNI.
>>> For more info, please refer to the JBS.
>>>
>>> Could you please review it and give me some advice?
>>>
>>> Thanks a lot.
>>> Best regards,
>>> Jie
>>>
>>> [1] 
>>> http://hg.openjdk.java.net/jdk/jdk/file/2700c409ff10/src/hotspot/share/runtime/thread.hpp#l1249 
>>>
>>>
>>>


From suenaga at oss.nttdata.com  Tue Nov  5 03:56:24 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Tue, 5 Nov 2019 12:56:24 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
Message-ID: <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>

On 2019/11/05 9:17, David Holmes wrote:
> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>> Hi Markus,
>>>
>>> I had a similar change in mind, and it is running on the submit repo:
>>>
>>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>
>>> If it passes all tests, I will send review request again.
>>
>> This change passed all tests on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>> Could you review again?
>>
>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>
>> In Markus's change, the emergency dump will not be performed when Thread::current_or_null_safe() returns NULL.
>> However, that can happen when a core-dump signal (e.g. SIGSEGV) is sent to the PID with the `kill` command: the main thread of the process will already be detached (outside the JVM).
>> The crash might also happen in a native thread created by pthread_create (on Linux) from JNI code.
>>
>> Thus we should still perform the emergency dump even if Thread::current_or_null_safe() returns NULL.
> 
> I didn't quite follow all that, but if there is no current thread then prepare_for_emergency_dump() is either going to assert here:
> 
>   348   Thread* const thread = Thread::current();
> 
> or crash here:
> 
>   349   if (thread->is_Watcher_thread()) {

Thanks David!
I fixed it in new webrev:

   http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/

It works fine on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).


Yasumasa


> David
> -----
> 
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>> Hi Yasumasa and David,
>>>>
>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
>>>>
>>>> Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. _thread_in_java transition to _thread_in_vm via stubs in SharedRuntime (i believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument now to give the premises where this invariant is enforced, so let's work with the original thread state as you suggested Yasumasa.
>>>>
>>>> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
>>>>
>>>> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
>>>>
>>>> Perhaps we can do it like this?
>>>>
>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>
>>>> Thanks for your patience investigating this
>>>>
>>>> Markus
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes
>>>> Sent: den 4 november 2019 05:24
>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus Gronlund <markus.gronlund at oracle.com>; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>> Cc: yasuenag at gmail.com
>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
>>>>
>>>> So looking at Yasumasa's proposed fix ...
>>>>
>>>> I don't think it is worth the disruption to pass the "thread" all the way through these API's. It is simpler/cleaner to just call
>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>
>>>> 357   assert(thread->is_Java_thread() &&
>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm), "invariant");
>>>>
>>>> This assertion is incorrect. As this can be called via
>>>> VMError::report_and_die() there is AFAICS absolutely no guarantee that we need be in a JavaThread at all.
>>>>
>>>>    428 class ThreadInVMForJFR : public StackObj {
>>>>
>>>> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>
>>>> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
>>>>
>>>> if (!guard_reentrancy()) {
>>>>     return;
>>>> } else {
>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>     if (!prepare_for_emergency_dump()) {
>>>>       return;
>>>>     }
>>>> }
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>
>>>>
>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>> Correction ...
>>>>>
>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>> Hi Yasumasa,
>>>>>>>>>
>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>
>>>>>>>> That is more like I had envisaged for this. Reusing existing
>>>>>>>> thread-state transition code is preferable to adding more custom
>>>>>>>> code that directly manipulates thread-state.
>>>>>>>
>>>>>>> I do not agree with this change.
>>>>>>>
>>>>>>> VMError::report_and_die() has "Thread* thread" in its arguments, so
>>>>>>> that thread might differ from Thread::current().
>>>>>>
>>>>>> Not sure what you mean. You only ever manipulate the thread state of
>>>>>> the current thread.
>>>>>>
>>>>>>> In addition, ThreadInVMfromUnknown uses transition_from_native() to
>>>>>>> change the thread state.
>>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>>
>>>>>> Yes it does - which would be a problem if a safepoint (or handshake)
>>>>>> were pending. But the path through before_exit already has safepoint
>>>>>> checks when you acquire the BeforeExit_lock.
>>>>>
>>>>> But that isn't relevant. The issue is we don't want a safepoint check
>>>>> on the report_and_die() path. So a custom transition helper is needed
>>>>> to avoid that.
>>>>>
>>>>> David
>>>>>
>>>>>> The main problem with the suggestion is it seems we may not be
>>>>>> running in a JavaThread:
>>>>>>
>>>>>>    349   Thread* const thread = Thread::current();
>>>>>>    350   if (thread->is_Watcher_thread()) {
>>>>>>
>>>>>> so we can't use the existing thread-state helpers, unless we narrow
>>>>>> the scope (as you do) to after the check for the WatcherThread.
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Thus I added ThreadInVMForJFR to my new webrev.
>>>>>>
>>>>>> Your change still seems overly complicated.
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Markus
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; yasuenag at gmail.com;
>>>>>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>>>>>>>>> thread state
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>
>>>>>>>>>      http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>
>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>
>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>>> But it is unclear to me why the thread state is not recovered.
>>>>>>>>>>
>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>> If it is just a bug in JFR, I will create a patch which recovers
>>>>>>>>>> it in JFR code.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not recover thread
>>>>>>>>>> state
>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Please review this change:
>>>>>>>>>>
>>>>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>     webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>
>>>>>>>>>> If JFR is running when the JVM crashes, JFR will dump data to
>>>>>>>>>> hs_err_pid<PID>.jfr.
>>>>>>>>>> The dump goes through prepare_for_emergency_dump().
>>>>>>>>>> However, this function transitions the thread state to "_thread_in_vm".
>>>>>>>>>>
>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>> It failed at compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>> However this test is for JIT compiler, and related issue has been
>>>>>>>>>> reported as JDK-8225620.
>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa

From david.holmes at oracle.com  Tue Nov  5 04:50:46 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 14:50:46 +1000
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <b692dcca-a676-f462-67cb-21f64b05923e@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <58713599-b5ea-57bb-0413-fce3b8bd5d2f@oracle.com>
 <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
 <7388c7fc-39c4-1ec6-1608-02b08e562ab3@oracle.com>
 <dbffc304-e84f-b1ec-b997-7978c1bced6f@oracle.com>
 <17b0ec76-6f32-fd7d-6486-4df21582ce03@oracle.com>
 <2c16521b-5694-7f6e-0f54-ee2bddf5563f@oracle.com>
 <b692dcca-a676-f462-67cb-21f64b05923e@oracle.com>
Message-ID: <9c3f8b52-88da-33d3-8737-db3854d6c20c@oracle.com>

On 5/11/2019 11:34 am, Daniel D. Daugherty wrote:
> I can do that. I assume delete them in the Async Monitor Deflation
> code and delete them in threadSMR.cpp. For the threadSMR.cpp, I'll
> roll that change into my latest baseline cleanup bug:
> 
>      JDK-8230876 baseline cleanups from Async Monitor Deflation v2.07
>      https://bugs.openjdk.java.net/browse/JDK-8230876

Sure.
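
For anyone following along, the point about the comments can be shown with standard C++ atomics. This is only an analogy, assuming std::atomic in place of HotSpot's Atomic class so that the sketch is self-contained; RefCount and its method names are made up for illustration. std::atomic's default order (memory_order_seq_cst) is not identical to HotSpot's memory_order_conservative, but the shape of the issue is the same: with no order argument you get the strong default, and neither volatile nor a comment changes that.

```cpp
#include <atomic>

// Hypothetical reference counter, modelled with std::atomic rather than
// HotSpot's Atomic::inc()/Atomic::dec().
class RefCount {
  std::atomic<int> _count{0};
 public:
  // No order argument: fetch_add/fetch_sub default to
  // std::memory_order_seq_cst, the strongest ordering on offer.
  void inc() { _count.fetch_add(1); }
  void dec() { _count.fetch_sub(1); }

  // A weaker ordering only applies when it is requested explicitly.
  void dec_acq_rel() { _count.fetch_sub(1, std::memory_order_acq_rel); }

  int value() const { return _count.load(); }
};
```

So a comment saying a decrement "only needs to be" acquire/release describes a requirement, not what the call actually does, unless the weaker order is passed in.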

Thanks,
David
-----

> Erik, please chime in here... Thanks!
> 
> Dan
> 
> 
> On 11/4/19 6:44 PM, David Holmes wrote:
>> Hi Dan,
>>
>> Just delete the comments.
>>
>> Thanks,
>> David
>>
>> On 5/11/2019 7:25 am, Daniel D. Daugherty wrote:
>>> Hi David and Erik,
>>>
>>> Thanks for chiming in here Erik...
>>>
>>> This set of comments is not addressed in the CR8/v2.08/11-for-jdk14
>>> code review request that I just sent out.
>>>
>>> I've read this response twice and I'm not quite sure what to do with it
>>> relative to David's CR comment. I'll repeat those here:
>>>
>>>  >  199   // The decrement only needs to be MO_ACQ_REL since the reference
>>>  >  200   // counter is volatile.
>>>  >  201   Atomic::dec(&_ref_count);
>>>  >
>>>  > volatile is irrelevant with regards to memory ordering as it is a compiler
>>>  > annotation. And you haven't specified any memory order value so the default
>>>  > is conservative ie. implied full fence. (I see the same incorrect comment
>>>  > is in threadSMR.cpp!)
>>>
>>> Should I delete this comment? Or should it be changed? If changed, then
>>> what text do you recommend here?
>>>
>>>
>>>  >  208   // The increment needs to be MO_SEQ_CST so that the reference
>>>  >  209   // counter update is seen as soon as possible in a race with the
>>>  >  210   // async deflation protocol.
>>>  >  211   Atomic::inc(&_ref_count);
>>>  >
>>>  > Ditto you haven't specified any ordering - and inc() and dec() will have the same default.
>>>
>>> Should I delete this comment? Or should it be changed? If changed, then
>>> what text do you recommend here?
>>>
>>> Dan
>>>
>>>
>>> On 11/4/19 8:09 AM, erik.osterlund at oracle.com wrote:
>>>> Hi,
>>>>
>>>> TL/DR: David is right; the commentary is weird and does not capture 
>>>> what the real constraints are.
>>>>
>>>> As the comment implied before "8222034: Thread-SMR functions should 
>>>> be updated to remove work around", the PPC port used to have 
>>>> incorrect memory ordering, and the code guarded against that. 
>>>> inc/dec used to be memory_order_relaxed and add/sub used to be 
>>>> memory_order_acq_rel on PPC, despite the shared contract promising 
>>>> memory_order_conservative.
>>>>
>>>> The implication for the nested counter in the Thread SMR project was 
>>>> that I wanted to use the inc/dec API but knew it was not gonna work 
>>>> as expected on PPC because we really needed *at least* 
>>>> memory_order_acq_rel when decrementing (and 
>>>> memory_order_conservative when incrementing, which was simulated in 
>>>> a CAS loop... yuck), but would find ourselves getting 
>>>> memory_order_relaxed. Rather than treating it as a bug in the PPC 
>>>> atomics implementation, and having the code be broken while we 
>>>> waited for a fix, I changed the use to sub when decrementing (which 
>>>> gave me the required memory_order_acq_rel ordering I needed), and 
>>>> the horrible CAS loop when incrementing, as a workaround, and 
>>>> alerted Martin Doerr that this would need to be sorted out in the 
>>>> PPC code. Since then, the PPC code did indeed get cleaned up so that 
>>>> inc/dec stopped being relaxed-only and worked as advertised.
>>>>
>>>> After that, the "8222034: Thread-SMR functions should be updated to 
>>>> remove work around" change removed the workaround that was no longer 
>>>> required from the code, and put back the desired inc/dec calls 
>>>> (which now used an overly conservative memory_order_conservative 
>>>> ordering, which is suboptimal, in particular for decrements, but 
>>>> importantly not incorrect). Since the nested case would almost never 
>>>> run and is possibly the coldest code path in the VM, I did not care 
>>>> to comment in that review thread about optimizing it by explicitly 
>>>> passing in a weaker ordering. However, I should have commented on 
>>>> the comment that was changed, which does indeed look a bit confused. 
>>>> David is right that the stuff about volatile has nothing to do with 
>>>> why this is correct. The correctness required memory_order_acq_rel 
>>>> for decrements, but the implementation provided more, which is fine.
>>>>
>>>> The actual reason why I wanted memory_order_conservative for 
>>>> correctness when incrementing and memory_order_acq_rel when 
>>>> decrementing, was to prevent accesses inside of the critical section 
>>>> (in particular - reading Thread*s from the acquired ThreadsList), 
>>>> from floating outside of the reference increment and decrement that 
>>>> marks reading the list as safe to access without the underlying list 
>>>> blowing up. In practice, it might have been possible to relax it a 
>>>> bit by relying on side effects of other unrelated parts of the 
>>>> protocol to have spurious fencing... but I did not want to get the 
>>>> protocol tangled in that way because it would be difficult to reason 
>>>> about.
>>>>
>>>> Hope this explanation clears up that confusion.
>>>>
>>>> Thanks,
>>>> /Erik
>>>>
>>>> On 11/2/19 2:15 PM, Daniel D. Daugherty wrote:
>>>>> Erik,
>>>>>
>>>>> David H. made a comment during this review cycle that should 
>>>>> interest you.
>>>>>
>>>>> The longer version of this comment came up in early reviews of the Async
>>>>> Monitor Deflation code because I copied the code and the longer comment
>>>>> from threadSMR.cpp. I updated the comment based on your input and review
>>>>> and changed the comment and code in threadSMR.cpp and in the Async Monitor
>>>>> Deflation project code.
>>>>>
>>>>> The change in threadSMR.cpp was done with this changeset:
>>>>>
>>>>> $ hg log -v -r 54517
>>>>> changeset:   54517:c201ca660afd
>>>>> user:        dcubed
>>>>> date:        Thu Apr 11 14:14:30 2019 -0400
>>>>> files:       src/hotspot/share/runtime/threadSMR.cpp
>>>>> description:
>>>>> 8222034: Thread-SMR functions should be updated to remove work around
>>>>> Reviewed-by: mdoerr, eosterlund
>>>>>
>>>>> Here's one of the two diffs to jog your memory:
>>>>>
>>>>>  void ThreadsList::dec_nested_handle_cnt() {
>>>>> -  // The decrement needs to be MO_ACQ_REL. At the moment, the Atomic::dec
>>>>> -  // backend on PPC does not yet conform to these requirements. Therefore
>>>>> -  // the decrement is simulated with an Atomic::sub(1, &addr).
>>>>> -  // Without this MO_ACQ_REL Atomic::dec simulation, the nested SMR mechanism
>>>>> -  // is not generally safe to use.
>>>>> -  Atomic::sub(1, &_nested_handle_cnt);
>>>>> +  // The decrement only needs to be MO_ACQ_REL since the reference
>>>>> +  // counter is volatile (and the hazard ptr is already NULL).
>>>>> +  Atomic::dec(&_nested_handle_cnt);
>>>>>  }
>>>>>
>>>>> Below is David's comment about the code comment...
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> Trimming down to just that issue...
>>>>>
>>>>> On 10/29/19 4:20 PM, Daniel D. Daugherty wrote:
>>>>>> On 10/24/19 7:00 AM, David Holmes wrote:
>>>>> >
>>>>> > src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>>>
>>>>>>>  199   // The decrement only needs to be MO_ACQ_REL since the reference
>>>>>>>  200   // counter is volatile.
>>>>>>>  201   Atomic::dec(&_ref_count);
>>>>>>>
>>>>>>> volatile is irrelevant with regards to memory ordering as it is a 
>>>>>>> compiler annotation. And you haven't specified any memory order 
>>>>>>> value so the default is conservative ie. implied full fence. (I 
>>>>>>> see the same incorrect comment is in threadSMR.cpp!)
>>>>>>
>>>>>> I got that wording from threadSMR.cpp and Erik O. confirmed my use 
>>>>>> of that
>>>>>> wording previously. I'll chase it down with Erik and get back to you.
>>>>>>
>>>>>>
>>>>>>>  208   // The increment needs to be MO_SEQ_CST so that the reference
>>>>>>>  209   // counter update is seen as soon as possible in a race with the
>>>>>>>  210   // async deflation protocol.
>>>>>>>  211   Atomic::inc(&_ref_count);
>>>>>>>
>>>>>>> Ditto you haven't specified any ordering - and inc() and dec() 
>>>>>>> will have the same default.
>>>>>>
>>>>>> And again, I'll have to chase this down with Erik O. and get back 
>>>>>> to you.
>>>>>
>>>>
>>>
> 

From david.holmes at oracle.com  Tue Nov  5 04:52:16 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 14:52:16 +1000
Subject: RFR: 8233454: Test fails with assert(!is_init_completed(),
 "should only happen during init") after JDK-8229516
In-Reply-To: <a705e140-8a6f-3178-b229-88325a8e2584@loongson.cn>
References: <ac7a98cf-6730-bbd2-aa52-7bb972a37873@loongson.cn>
 <e478a4a2-7105-8de1-d72e-0ce10d8d34ae@oracle.com>
 <ff418d45-a848-e28e-1716-a777fd9ed5a3@oracle.com>
 <a705e140-8a6f-3178-b229-88325a8e2584@loongson.cn>
Message-ID: <90322374-4ae4-15c5-96f1-a2781b7e35b5@oracle.com>

Hi Jie,

On 5/11/2019 12:49 pm, Jie Fu wrote:
> Hi David,
> 
> I had tested your patch (without the Shenandoah fix) on VMs with/without 
> the JFR feature and both of them had passed for the particular reproducer.
> So thanks again for fixing it in the shared runtime code.

Thanks for verifying that. My own testing has also been good so far.

Just need an official Reviewer now.

Thanks again,
David
-----

> Best regards,
> Jie
> 
> On 2019/11/5 7:26, David Holmes wrote:
>> Hi Jie,
>>
>> Thanks for filing this and attempting a fix. As per the bug report the 
>> underlying issue has now been fixed in Shenandoah, but I want to make 
>> the interrupt code more resilient as well:
>>
>> http://cr.openjdk.java.net/~dholmes/8233454/webrev/
>>
>> I was unable to reproduce the Shenandoah crash so if you could test 
>> this patch I would appreciate it - thanks. (Without the Shenandoah fix 
>> of course :) )
>>
>> Meanwhile I'm putting the patch through other testing.
>>
>> Thanks,
>> David
>> -----
>>
>> On 4/11/2019 11:13 pm, David Holmes wrote:
>>> Hi Jie,
>>>
>>> I will need to take a deeper look at this. This is a problem specific 
>>> to Shenandoah GC as it is triggering a sleep whilst a thread is still 
>>> in the process of attaching to the JVM :(
>>>
>>> Thanks,
>>> David
>>>
>>> On 4/11/2019 7:16 pm, Jie Fu wrote:
>>>> Hi all,
>>>>
>>>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233454
>>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233454/webrev.00/
>>>>
>>>> According to the comment [1], the assert seems to miss the case for 
>>>> threads attached via JNI.
>>>> For more info, please refer to the JBS.
>>>>
>>>> Could you please review it and give me some advice?
>>>>
>>>> Thanks a lot.
>>>> Best regards,
>>>> Jie
>>>>
>>>> [1] 
>>>> http://hg.openjdk.java.net/jdk/jdk/file/2700c409ff10/src/hotspot/share/runtime/thread.hpp#l1249 
>>>>
>>>>
>>>>
> 

From david.holmes at oracle.com  Tue Nov  5 04:56:42 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 14:56:42 +1000
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
Message-ID: <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>

On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
> On 2019/11/05 9:17, David Holmes wrote:
>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>> Hi Markus,
>>>>
>>>> I had a similar change in mind, and it is running on the submit repo:
>>>>
>>>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>
>>>> If it passes all tests, I will send a review request again.
>>>
>>> This change passed all tests on the submit repo 
>>> (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>> Could you review it again?
>>>
>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>
>>> In Markus's change, the emergency dump is not performed when 
>>> Thread::current_or_null_safe() returns NULL.
>>> However, that case occurs when a core dump signal (e.g. SIGSEGV) is 
>>> sent to the PID with the `kill` command, because the main thread of 
>>> the process will already be detached (out of the JVM).
>>> The crash might also happen in a native thread created by 
>>> pthread_create (on Linux) from JNI code.
>>>
>>> Thus we should still perform the emergency dump even if 
>>> Thread::current_or_null_safe() returns NULL.
>>
>> I didn't quite follow all that, but if there is no current thread then 
>> prepare_for_emergency_dump() is either going to assert here:
>>
>>   348   Thread* const thread = Thread::current();
>>
>> or crash here:
>>
>>   349   if (thread->is_Watcher_thread()) {
> 
> Thanks David!
> I fixed it in new webrev:
> 
>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/

It would be cleaner/simpler if prepare_for_emergency_dump() took the 
thread as an argument, as it is just a static function and changing it 
doesn't impact anything else.
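
Roughly the shape I have in mind is sketched below. This is an illustration only, not the JFR code: the Thread struct and its two fields are invented stand-ins so the example compiles on its own, and treating a NULL thread as "nothing to release, so return true" is an assumption that would need to be checked against what the function is really expected to report.

```cpp
// Hypothetical stand-in for the HotSpot Thread class; the fields exist
// only to model "this thread holds a critical lock".
struct Thread {
  bool holds_critical_lock = false;
  bool can_release = true;
};

// The caller fetches the current thread once and passes it in, so this
// function no longer calls Thread::current() itself.
// Returns false only if a critical lock could not be released.
// Assumption: a null thread owns no locks, so there is nothing to do.
static bool prepare_for_emergency_dump(Thread* thread) {
  if (thread == nullptr) {
    return true;
  }
  if (thread->holds_critical_lock) {
    if (!thread->can_release) {
      return false;
    }
    thread->holds_critical_lock = false;
  }
  return true;
}
```

The caller would then pass in the result of Thread::current_or_null_safe(), the same value it already needs for the thread-state guard.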

Thanks,
David

> It works fine on submit repo 
> (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
> 
> 
> Yasumasa
> 
> 
>> David
>> -----
>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>> Hi Yasumasa and David,
>>>>>
>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I 
>>>>> realized later that, as you have pointed out, it would perform a 
>>>>> real thread transition. Sorry.
>>>>>
>>>>> Taking some input from ThreadInVMFromUnknown and the situations I 
>>>>> have seen at this location, I think the only case we need to be 
>>>>> concerned about here is when a JavaThread is _thread_in_native. 
>>>>> _thread_in_java transition to _thread_in_vm via stubs in 
>>>>> SharedRuntime (i believe) as part of coming out of the exception 
>>>>> handler(s). Unfortunately I cannot give a proper argument now to 
>>>>> give the premises where this invariant is enforced, so let's work 
>>>>> with the original thread state as you suggested Yasumasa.
>>>>>
>>>>> If we can avoid passing the thread all the way through, I think 
>>>>> that is preferable (this is not performance critical code). David 
>>>>> also alluded to the fact that you always manipulate the current 
>>>>> thread anyway. Although very unlikely, we could have run into an 
>>>>> issue with thread local storage, so it makes sense to test this up 
>>>>> front. If we cannot read the thread local, the operations we intend 
>>>>> to perform will fail, so we might just bail out already.
>>>>>
>>>>> I took the liberty to tighten up the transition class a little bit; 
>>>>> you only need to restore the thread state if there was an actual 
>>>>> change.
>>>>>
>>>>> Perhaps we can do it like this?
>>>>>
>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>
>>>>> Thanks for your patience investigating this
>>>>>
>>>>> Markus
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes
>>>>> Sent: den 4 november 2019 05:24
>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus Gronlund 
>>>>> <markus.gronlund at oracle.com>; hotspot-jfr-dev at openjdk.java.net; 
>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>> Cc: yasuenag at gmail.com
>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover 
>>>>> thread state
>>>>>
>>>>> So looking at Yasumasa's proposed fix ...
>>>>>
>>>>> I don't think it is worth the disruption to pass the "thread" all 
>>>>> the way through these API's. It is simpler/cleaner to just call
>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>
>>>>> 357   assert(thread->is_Java_thread() &&
>>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm), 
>>>>> "invariant");
>>>>>
>>>>> This assertion is incorrect. As this can be called via
>>>>> VMError::report_and_die() there is AFAICS absolutely no guarantee 
>>>>> that we need be in a JavaThread at all.
>>>>>
>>>>>    428 class ThreadInVMForJFR : public StackObj {
>>>>>
>>>>> Can I suggest JavaThreadInVM to make it clear this only affects 
>>>>> JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>
>>>>> Based on Markus's proposed change, and with a view to constrain the 
>>>>> scope even further can I suggest the following:
>>>>>
>>>>> if (!guard_reentrancy()) {
>>>>>     return;
>>>>> } else {
>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>       return;
>>>>>     }
>>>>> }
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>
>>>>>
>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>> Correction ...
>>>>>>
>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>
>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>
>>>>>>>>> That is more like I had envisaged for this. Reusing existing
>>>>>>>>> thread-state transition code is preferable to adding more custom
>>>>>>>>> code that directly manipulates thread-state.
>>>>>>>>
>>>>>>>> I do not agree with this change.
>>>>>>>>
>>>>>>>> VMError::report_and_die() has "Thread* thread" in its arguments, so
>>>>>>>> that thread might differ from Thread::current().
>>>>>>>
>>>>>>> Not sure what you mean. You only ever manipulate the thread state of
>>>>>>> the current thread.
>>>>>>>
>>>>>>>> In addition, ThreadInVMfromUnknown uses transition_from_native() to
>>>>>>>> change the thread state.
>>>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>>>
>>>>>>> Yes it does - which would be a problem if a safepoint (or handshake)
>>>>>>> were pending. But the path through before_exit already has safepoint
>>>>>>> checks when you acquire the BeforeExit_lock.
>>>>>>
>>>>>> But that isn't relevant. The issue is we don't want a safepoint check
>>>>>> on the report_and_die() path. So a custom transition helper is needed
>>>>>> to avoid that.
>>>>>>
>>>>>> David
>>>>>>
>>>>>>> The main problem with the suggestion is it seems we may not be
>>>>>>> running in a JavaThread:
>>>>>>>
>>>>>>>    349   Thread* const thread = Thread::current();
>>>>>>>    350   if (thread->is_Watcher_thread()) {
>>>>>>>
>>>>>>> so we can't use the existing thread-state helpers, unless we narrow
>>>>>>> the scope (as you do) to after the check for the WatcherThread.
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> Thus I added ThreadInVMForJFR to my new webrev.
>>>>>>>
>>>>>>> Your change still seems overly complicated.
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Markus
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; yasuenag at gmail.com;
>>>>>>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>>>>>>>>>> thread state
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>
>>>>>>>>>>      http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>
>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>
>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>>>> But it is unclear to me why the thread state is not recovered.
>>>>>>>>>>>
>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>> If it is just a bug in JFR, I will create a patch which recovers
>>>>>>>>>>> it in JFR code.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not recover thread
>>>>>>>>>>> state
>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Please review this change:
>>>>>>>>>>>
>>>>>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>     webrev:
>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>
>>>>>>>>>>> If JFR is running when the JVM crashes, JFR will dump data to
>>>>>>>>>>> hs_err_pid<PID>.jfr.
>>>>>>>>>>> The dump goes through prepare_for_emergency_dump().
>>>>>>>>>>> However, this function transitions the thread state to "_thread_in_vm".
>>>>>>>>>>>
>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>> It failed at compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>> However this test is for JIT compiler, and related issue has 
>>>>>>>>>>> been
>>>>>>>>>>> reported as JDK-8225620.
>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa

From suenaga at oss.nttdata.com  Tue Nov  5 05:13:37 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Tue, 5 Nov 2019 14:13:37 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
Message-ID: <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>

Hi David,

> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.

prepare_for_emergency_dump() returns false if some critical locks could not be unlocked.
So what should we return if NULL is passed as the argument? true?

It might break the semantics of this function, so I did not add an argument to prepare_for_emergency_dump() in this webrev.


Thanks,

Yasumasa


On 2019/11/05 13:56, David Holmes wrote:
> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>> On 2019/11/05 9:17, David Holmes wrote:
>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>> Hi Markus,
>>>>>
>>>>> I thought similar change, and it is running on submit repo:
>>>>>
>>>>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>
>>>>> If it passes all tests, I will send review request again.
>>>>
>>>> This change passed all tests on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>> Could you review again?
>>>>
>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>
>>>> In Markus's change, emergency dump will not perform when Thread::current_or_null_safe() returns NULL.
>>>> However it will occur when coredump signal (e.g. SIGSEGV) throws to PID by `kill` command - main thread of the process will be already detached (out of JVM).
>>>> Also the crash might happen in native thread - created by pthread_create (on Linux) from JNI code.
>>>>
>>>> Thus we should continue to perform emergency dump even if Thread::current_or_null_safe() returns NULL.
>>>
>>> I didn't quite follow all that, but if there is no current thread then prepare_for_emergency_dump() is either going to assert here:
>>>
>>>   348   Thread* const thread = Thread::current();
>>>
>>> or crash here:
>>>
>>>   349   if (thread->is_Watcher_thread()) {
>>
>> Thanks David!
>> I fixed it in new webrev:
>>
>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
> 
> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
> 
> Thanks,
> David
> 
>> It works fine on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>
>>
>> Yasumasa
>>
>>
>>> David
>>> -----
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>> Hi Yasumasa and David,
>>>>>>
>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
>>>>>>
>>>>>> Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. _thread_in_java transition to _thread_in_vm via stubs in SharedRuntime (i believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument now to give the premises where this invariant is enforced, so let's work with the original thread state as you suggested Yasumasa.
>>>>>>
>>>>>> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
>>>>>>
>>>>>> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
>>>>>>
>>>>>> Perhaps we can do it like this?
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>
>>>>>> Thanks for your patience investigating this
>>>>>>
>>>>>> Markus
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes
>>>>>> Sent: den 4 november 2019 05:24
>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus Gronlund <markus.gronlund at oracle.com>; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>>>> Cc: yasuenag at gmail.com
>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
>>>>>>
>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>
>>>>>> I don't think it is worth the disruption to pass the "thread" all the way through these API's. It is simpler/cleaner to just call
>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>
>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>              (((JavaThread*)thread)->thread_state() == _thread_in_vm), "invariant");
>>>>>>
>>>>>> This assertion is incorrect. As this can be called via
>>>>>> VMError::report_or_die() there is AFAICS absolutely no guarantee that we need be in a JavaThread at all.
>>>>>>
>>>>>>    428 class ThreadInVMForJFR : public StackObj {
>>>>>>
>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>
>>>>>> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
>>>>>>
>>>>>> if (!guard_reentrancy()) {
>>>>>>     return;
>>>>>> } else {
>>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>>       return;
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>> Correction ...
>>>>>>>
>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>
>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>
>>>>>>>>>> That is more like I had envisaged for this. Reusing existing
>>>>>>>>>> thread-state transition code is preferable to adding more custom
>>>>>>>>>> code that directly manipulates thread-state.
>>>>>>>>>
>>>>>>>>> I do not agree with this change.
>>>>>>>>>
>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its arguments. So
>>>>>>>>> Thread::current() might be different with it.
>>>>>>>>
>>>>>>>> Not sure what you mean. You only ever manipulate the thread state of
>>>>>>>> the current thread.
>>>>>>>>
>>>>>>>>> In addition, ThreadInVMfromUnknown uses transition_from_native() to
>>>>>>>>> change the thread state.
>>>>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>>>>
>>>>>>>> Yes it does - which would be a problem if a safepoint (or handshake)
>>>>>>>> were pending. But the path through before_exit already has safepoint
>>>>>>>> checks when you acquire the BeforeExit_lock.
>>>>>>>
>>>>>>> But that isn't relevant. The issue is we don't want a safepoint check
>>>>>>> on the report_and_die() path. So a custom transition helper is needed
>>>>>>> to avoid that.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>> The main problem with the suggestion is it seems we may not be
>>>>>>>> running in a JavaThread:
>>>>>>>>
>>>>>>>>    349   Thread* const thread = Thread::current();
>>>>>>>>    350   if (thread->is_Watcher_thread()) {
>>>>>>>>
>>>>>>>> so we can't use the existing thread-state helpers, unless we narrow
>>>>>>>> the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> Thus I added ThreadInVMForJFR to new my webrev.
>>>>>>>>
>>>>>>>> Your change still seems overly complicated.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Markus
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; yasuenag at gmail.com;
>>>>>>>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>>>>>>>>>>> thread state
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>
>>>>>>>>>>>      http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>
>>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>
>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>>>>> But I'm not clear why the thread state is not recovered.
>>>>>>>>>>>>
>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch which recover
>>>>>>>>>>>> it in JFR code.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not recover thread
>>>>>>>>>>>> state
>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>
>>>>>>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>     webrev:
>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>
>>>>>>>>>>>> If JFR is running when JVM crashes, JFR will dump data to
>>>>>>>>>>>> hs_err_pid<PID>.jfr .
>>>>>>>>>>>> It would perform in prepare_for_emergency_dump().
>>>>>>>>>>>> However this function transits thread state to "_thread_in_vm".
>>>>>>>>>>>>
>>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>> It failed at compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>> However this test is for JIT compiler, and related issue has been
>>>>>>>>>>>> reported as JDK-8225620.
>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa

From david.holmes at oracle.com  Tue Nov  5 05:31:32 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 15:31:32 +1000
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <32f8e268-7c15-82f6-3b9b-398c33c160cb@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <58713599-b5ea-57bb-0413-fce3b8bd5d2f@oracle.com>
 <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
 <a20e42b4-85f7-29de-4573-76cc477e39a0@oracle.com>
 <32f8e268-7c15-82f6-3b9b-398c33c160cb@oracle.com>
Message-ID: <f1e071e6-c92d-a6be-7536-df1a30fe7e6b@oracle.com>

Hi Dan,

On 5/11/2019 11:31 am, Daniel D. Daugherty wrote:
> Hi David,
> 
> Thanks for continuing to provide feedback on the Async Monitor Deflation
> project! I appreciate your reviews very much...
> 
> Responses embedded below (as usual)...

Ditto. :)

> 
> On 11/4/19 1:28 AM, David Holmes wrote:
>> Hi Dan,
>>
>> A few follow ups to your responses, with trimming ...
>>
>> On 30/10/2019 6:20 am, Daniel D. Daugherty wrote:
>>> On 10/24/19 7:00 AM, David Holmes wrote:
>>>>  122 // Set _owner field to new_value; current value must match old_value.
>>>>  123 inline void ObjectMonitor::set_owner_from(void* new_value, void* old_value) {
>>>>  124   void* prev = Atomic::cmpxchg(new_value, &_owner, old_value);
>>>>  125   ADIM_guarantee(prev == old_value, "unexpected prev owner=" INTPTR_FORMAT
>>>>
>>>> The use of cmpxchg seems a little strange here if you are asserting 
>>>> that when this is called _owner must equal old_value. That means you 
>>>> don't expect any race and if there is no race with another thread 
>>>> writing to _owner then you don't need the cmpxchg. A normal:
>>>>
>>>> if (_owner == old_value) {
>>>>    Atomic::store(&_owner, new_value);
>>>>    log(...);
>>>> } else {
>>>>    guarantee(false, " unexpected old owner ...");
>>>> }
>>>
>>> The two parameter version of set_owner_from() is only called from three
>>> places and we'll cover two of them here:
>>>
>>> src/hotspot/share/runtime/objectMonitor.cpp:
>>>
>>> 1041     if (AsyncDeflateIdleMonitors) {
>>> 1042       set_owner_from(NULL, Self);
>>> 1043     } else {
>>> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the lock
>>> 1045       OrderAccess::storeload();                        // See if we need to wake a successor
>>> 1046     }
>>>
>>> and:
>>>
>>> 1221   if (AsyncDeflateIdleMonitors) {
>>> 1222     set_owner_from(NULL, Self);
>>> 1223   } else {
>>> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
>>> 1225     OrderAccess::fence();                               // ST _owner vs LD in unpark()
>>> 1226   }
>>>
>>> So I've replaced the existing {release_store(), storeload()} combo 
>>> for one
>>> call site and the existing {release_store(), fence()} combo for the 
>>> other
>>> call site with a cmpxchg(). I chose cmpxchg() for these reasons:
>>>
>>> 1) I wanted the same memory sync behavior at both call sites.
>>> 2) I wanted similar/same memory sync behavior as the original
>>>    code at those call sites.
>>
>> Why? The memory sync requirements for non-async deflation may be 
>> completely different to those required for async deflation (given all 
>> the other bits of the protocol).
> 
> Good point!
> 
> For context, the first code block above (L1041-6) is in 
> ObjectMonitor::exit()
> and the second code block above (L1221-6) is in ObjectMonitor::ExitEpilog()
> which is called from two different places by ObjectMonitor::exit(). In both
> cases, we are setting the _owner field to NULL which will potentially make
> the ObjectMonitor async deflatible (depending on ref_count).
> 
> For async deflation, I want the full fence semantics after setting the
> _owner field to NULL in both locations:
> 
> src/hotspot/share/runtime/orderAccess.hpp:
> //                       Constraint     x86          sparc TSO          ppc
> // ---------------------------------------------------------------------------
> // fence                 LoadStore  |   lock         membar #StoreLoad  sync
> //                       StoreStore |   addl 0,(sp)
> //                       LoadLoad   |
> //                       StoreLoad
> //
> // release               LoadStore  |                                   lwsync
> //                       StoreStore
> 
> I don't want any loads or stores floating into or out of the critical 
> region.
> 
> 
> *** Side bar here ****
> 
> I just noticed something with the original code:
> 
> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the lock
> 1045       OrderAccess::storeload();                        // See if we need to wake a successor
> 
> For constraints, this gives us:
>            {LoadStore | StoreStore}
>            {StoreLoad}
> at L1044-5. So the original code is just "missing" LoadLoad relative
> to a full fence(). I'm not sure why this kind of load is allowed to
> float into the critical region, but the code has been this way for a
> very long time.

You seem to be overlooking the fact that your store appears between the 
various memory barriers e.g.

              {LoadStore | StoreStore}
              ST _owner, 0
              {StoreLoad}

which establishes the effects of those barriers with respect to that 
store. So loadload() would be superfluous as we've already ensured that 
no loads can float above the store, due to the storeload barrier.

A full fence is logically all 4 barriers in that it ensures all loads 
and all stores remain on their respective sides of the fence - nothing 
can cross it.

> And for this original code:
> 
> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
> 1225     OrderAccess::fence();                               // ST _owner vs LD in unpark()
> 
> For constraints, this gives us:
>          {LoadStore | StoreStore}
>          {LoadStore | StoreStore | LoadLoad | StoreLoad}
> at L1224-5. Again this code has been this way for a very long time.
> 
> It seems to me that L1224-5 could be written like this:
> 
> 1224     _owner = NULL;
> 1225     OrderAccess::fence();                               // ST _owner vs LD in unpark()
> 
> with a plain store on L1224. Is that correct?

No, I don't believe so.

What we are also in danger of overlooking here is the distinction between 
the memory synchronization instructions related to the semantics of a 
synchronized code block in Java, and the memory synchronization 
instructions needed for the correct implementation of the 
synchronization subsystem itself. Specifically given:

OrderAccess::release_store(&_owner, (void*)NULL);
OrderAccess::<some other sync op>

The release store ensures that the releasing of the monitor cannot be 
reordered with respect to any of the stores that occurred within the 
synchronized block at the Java-level. (And it _might_ also ensure some 
property of the sync implementation.) While "some other op" is typically 
needed only because of the way we implement the synchronization 
subsystem - as per the comments e.g.

storeload(); // See if we need to wake a successor

we don't want to load a potential successor before we set _owner to NULL 
else we might read the wrong value.
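
To make the ordering concrete, here is a minimal sketch of the exit path 
using standard C++ atomics rather than HotSpot's OrderAccess. The 
MiniMonitor type, its succ field, and release_and_find_successor() are 
hypothetical names for illustration only. C++ has no pure StoreLoad 
fence, so a seq_cst fence (which is stronger) stands in for storeload().

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified monitor; the field names only mirror the discussion.
struct MiniMonitor {
  std::atomic<void*> owner{nullptr};
  std::atomic<void*> succ{nullptr};
};

// Drops the lock, then returns the successor (if any) that should be woken.
inline void* release_and_find_successor(MiniMonitor& m) {
  // Release store: stores made while holding the lock cannot be reordered
  // after the store that publishes owner == nullptr.
  m.owner.store(nullptr, std::memory_order_release);
  // Stand-in for OrderAccess::storeload(): keeps the successor load below
  // from being performed before the owner store above.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  // Only now is the potential successor read.
  return m.succ.load(std::memory_order_relaxed);
}
```

The release store covers the Java-level semantics of leaving the 
synchronized block; the fence after it exists only for the benefit of 
the successor check.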

> *** End side bar ***
> 
> 
>>
>>> 3) I wanted the return value from cmpxchg() for my state machine
>>>    sanity check.
>>
>> I'm somewhat dubious about using cmpxchg just for the side-effect of 
>> getting the existing value.
> 
> But I'm not "using cmpxchg just for the side-effect of getting the 
> existing value".
> 
> That's the third thing on my list of three reasons. The most important
> thing is I want the full fence that cmpxchg() gives me. Above I said:
> 
>  > 1) I wanted the same memory sync behavior at both call sites.
>  > 2) I wanted similar/same memory sync behavior as the original
>  >    code at those call sites.
> 
> Using cmpxchg() gives me the full fence I want and that's similar to
> this baseline code at this call site:

Yes, but the memory sync effects of cmpxchg are secondary to its primary 
purpose, which is to provide an atomic compare-and-exchange in the face 
of concurrent updates to a variable.

> 
> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the lock
> 1045       OrderAccess::storeload();                        // See if we need to wake a successor
> 
> I'm getting the LoadLoad that the baseline site doesn't have.

And which it doesn't need.

> The cmpxchg() gives me the same memory constraints as this baseline code
> at this call site:
> 
> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
> 1225     OrderAccess::fence();                               // ST _owner vs LD in unpark()
> 
> Note: Actually, I don't have the extra {LoadStore | StoreStore} from
> the release_store() that I mentioned in the side bar above...
> 
> The last thing that I get is the existing value...
> 
> 
> Okay, so I thought it was a pretty cool use of cmpxchg(), but I'm
> obviously confusing code readers. So here's the v2.08 set_owner_from():
> 
>   124 // Set _owner field to new_value; current value must match old_value.
>   125 inline void ObjectMonitor::set_owner_from(void* new_value, void* old_value) {
>   126   void* prev = Atomic::cmpxchg(new_value, &_owner, old_value);
>   127   ADIM_guarantee(prev == old_value, "unexpected prev owner=" INTPTR_FORMAT
>   128                  ", expected=" INTPTR_FORMAT, p2i(prev), p2i(old_value));
>   129   log_trace(monitorinflation, owner)("set_owner_from(): mid=" INTPTR_FORMAT
>   130                                      ", prev=" INTPTR_FORMAT ", new="
>   131                                      INTPTR_FORMAT, p2i(this), p2i(prev),
>   132                                      p2i(new_value));
>   133 }
> 
> I could change it like this:
> 
>   124 // Set _owner field to new_value; current value must match old_value.
>   125 inline void ObjectMonitor::set_owner_from(void* new_value, void* old_value) {
>   126   void* prev = _owner;
>   127   ADIM_guarantee(prev == old_value, "unexpected prev owner=" INTPTR_FORMAT
>   128                  ", expected=" INTPTR_FORMAT, p2i(prev), p2i(old_value));
>   129   _owner = new_value;
>   130   OrderAccess::fence();
>   131   log_trace(monitorinflation, owner)("set_owner_from(): mid=" INTPTR_FORMAT
>   132                                      ", prev=" INTPTR_FORMAT ", new="
>   133                                      INTPTR_FORMAT, p2i(this), p2i(prev),
>   134                                      p2i(new_value));
>   135 }
> 
> It's two lines longer, but it should require less head scratching to
> figure out what I'm trying to do. Would this be acceptable?

As per previous discussion I think you still need a release_store of 
_owner (at least in the case where you are releasing the monitor).
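
As a rough illustration of the two shapes being compared, here is a 
sketch with standard C++ atomics. OwnerField and both member functions 
are hypothetical stand-ins, not the HotSpot code; the bool return 
replaces ADIM_guarantee so the sanity check can be observed. The first 
function does a checked release store followed by a fence, the second 
uses a compare-and-exchange and returns the previous value.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical holder for an owner field; names are illustrative only.
struct OwnerField {
  std::atomic<void*> owner{nullptr};

  // Checked store: valid only when no other thread can write owner
  // concurrently. Returns false where HotSpot would fail a guarantee.
  bool set_owner_from(void* new_value, void* old_value) {
    void* prev = owner.load(std::memory_order_relaxed);
    if (prev != old_value) {
      return false;  // state machine sanity check failed
    }
    // Release store so earlier stores are not reordered after it.
    owner.store(new_value, std::memory_order_release);
    // Full fence after the store, as in the proposed v2.08 variant.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return true;
  }

  // CAS variant: for use when concurrent updates are really possible.
  // Returns the value observed before the attempt.
  void* try_set_owner_from(void* new_value, void* old_value) {
    void* expected = old_value;
    // On failure, expected is overwritten with the current value.
    owner.compare_exchange_strong(expected, new_value);
    return expected;
  }
};
```

A reader who sees the second form will assume a race is being guarded 
against, which is the readability concern raised above.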

That's it on this thread. I still have to look at version 2.08 in full.

Thanks,
David
-----

> 
> 
>>
>>> I don't think that using 'Atomic::store(&_owner, new_value)' is the
>>> right choice for these two call sites.
>>
>> If you don't actually need the cmpxchg to handle concurrent updates to 
>> the _owner field, then a plain store (not an Atomic::store - that was 
>> an error on my part) does not seem unreasonable; or if there are still 
>> memory sync issues here, perhaps a release_store.
> 
> So in the above proposed code I switched to a plain store followed by
> a fence().
> 
> 
>> If you use cmpxchg then anyone reading the code will assume there is a 
>> concurrent update that you are guarding against.
> 
> Yup. I concede the point that I'm obviously confusing the other
> code readers... sorry about that...
> 
> 
>>
>>> The last two parameter set_owner_from() is talked about in the
>>> next reply.
>>>
>>>
>>>> Similarly for the old_value1/old_value2 version.
>>>
>>> The three parameter version of set_owner_from() is only called from one
>>> place and the last two parameter version is called from the same place:
>>>
>>> src/hotspot/share/runtime/synchronizer.cpp:
>>>
>>> 1903       if (AsyncDeflateIdleMonitors) {
>>> 1904         m->set_owner_from(mark.locker(), NULL, DEFLATER_MARKER);
>>> 1905       } else {
>>> 1906         m->set_owner_from(mark.locker(), NULL);
>>> 1907       }
>>>
>>> The original code was:
>>>
>>> 1399       m->set_owner(mark.locker());
>>>
>>> The original set_owner() code was defined like this:
>>>
>>>    87 inline void ObjectMonitor::set_owner(void* owner) {
>>>    88   _owner = owner;
>>>    89 }
>>>
>>> So the original code didn't do any memory sync'ing at all and I've
>>> changed that to a cmpxchg() on both code paths. That appears to be
>>> overkill for that callsite...
>>
>> Again I'm not sure any memory sync requirements from the non-async 
>> case should necessarily transfer over to the async case. Even if you 
>> end up requiring similar memory sync the reasoning would be quite 
>> different I would expect.
> 
> In this case, both async deflation and safepoint based deflation are
> happy with the same memory sync because the newly allocated ObjectMonitor
> isn't published yet so it is not deflatible by either mechanism. Also the
> act of publishing the ObjectMonitor* will take care of the memory sync.
> 
> 
>>
>>>
>>> We're in ObjectSynchronizer::inflate(), in the "CASE: stack-locked"
>>> section of the code. We've gotten our ObjectMonitor from om_alloc()
>>> and are initializing a number of fields in the ObjectMonitor. The
>>> ObjectMonitor is not published until we do:
>>>
>>> 1916       object->release_set_mark(markWord::encode(m));
>>>
>>> So we don't need the memory sync'ing features of the cmpxchg() for
>>> either of the set_owner_from() calls and all that leaves is the
>>> state machine sanity check.
>>>
>>> I really like the state machine sanity check on the owner field but
>>> that's just because it came in handy when chasing the recent races.
>>> It would be easy to change the three parameter version of
>>> set_owner_from() to not do memory sync'ing, but still do the state
>>> machine sanity check.
>>>
>>> Update: Changing the three parameter version of set_owner_from()
>>> may impact the changes to owner_is_DEFLATER_MARKER() discussed
>>> above. Sigh...
>>> Update 2: Probably no impact because the three parameter version of
>>> set_owner_from() is only used before the ObjectMonitor is published
>>> and owner_is_DEFLATER_MARKER() is used after the ObjectMonitor has
>>> appeared on an in-use list.
>>>
>>> However, the two parameter version of set_owner_from() needs its
>>> memory sync'ing behavior for its objectMonitor.cpp call sites so
>>> this call site would need something different.
>>>
>>> I'm not sure which solution I'm going to pick yet, but I definitely
>>> have to change something here since we don't need cmpxchg() at this
>>> call site. More thought is required.
>>
>> I will look to see where this ended up.
> 
> I'll wait to see if you can live with the v2.08 version. I hope so...
> 
> 
>>
>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>>
>>>>
>>>>  267   if (AsyncDeflateIdleMonitors &&
>>>>  268       try_set_owner_from(Self, DEFLATER_MARKER) == DEFLATER_MARKER) {
>>>
>>> For more context, we are in:
>>>
>>>   241 void ObjectMonitor::enter(TRAPS) {
>>>
>>>
>>>> I don't see why you need to call try_set_owner_from again here as 
>>>> "cur" will already be DEFLATER_MARKER from the previous try_set_owner.
>>>
>>> I assume the previous try_set_owner() call you mean is this one:
>>>
>>>   248   void* cur = try_set_owner_from(Self, NULL);
>>>
>>> This first try_set_owner() is for the most common case of no owner.
>>>
>>> The second try_set_owner() call is for a different condition than the 
>>> first:
>>>
>>>   268       try_set_owner_from(Self, DEFLATER_MARKER) == DEFLATER_MARKER) {
>>>
>>> L248 is trying to change the _owner field from NULL -> 'Self'.
>>> L268 is trying to change the _owner field from DEFLATER_MARKER to 
>>> 'Self'.
>>>
>>> If the try_set_owner() call on L248 fails, 'cur' can be several possible
>>> values:
>>>
>>>    - the calling thread (recursive enter is handled on L254-7)
>>>    - other owning thread value (BasicLock* or Thread*)
>>>    - DEFLATER_MARKER
>>
>> I'll give a cautious okay to that explanation (the deficiency being in 
>> my understanding, not your explaining :) ).
> 
> Thanks. I'll take it!
> 
> 
>>
>>>> Further, I don't see how installing self as the _owner here is valid 
>>>> and means you acquired the monitor, as the fact it was 
>>>> DEFLATER_MARKER means it is still being deflated by another thread 
>>>> doesn't it ???
>>>
>>> I guess the comment after L268 didn't work for you:
>>>
>>>   269     // The deflation protocol finished the first part (setting owner),
>>>   270     // but it failed the second part (making ref_count negative) and
>>>   271     // bailed. Or the ObjectMonitor was async deflated and reused.
>>>
>>> It means that the deflater thread was racing with this enter and
>>> managed to set the owner field to DEFLATER_MARKER as the first step
>>> in the deflation protocol. Our entering thread actually won the race
>>> when it managed to set the ref_count to a positive value as part of
>>> the ObjectMonitorHandle stuff done in the inflate() call that preceded
>>> the enter() call. However, the deflater thread hasn't realized that it
>>> lost the race yet and hasn't restored the owner field back to NULL.
>>
>> You're right the comment didn't work for me as it required me to be 
>> holding too much of the protocol in my head. Makes more sense now.
> 
> Good to hear!
> 
> 
>>
>> Thanks,
>> David
>> -----
> 
> Thanks again for the thorough reviews!
> 
> Dan
> 

From david.holmes at oracle.com  Tue Nov  5 05:34:48 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 15:34:48 +1000
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
Message-ID: <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>

On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
> Hi David,
> 
>> It would be cleaner/simpler if prepare_for_emergency_dump takes the 
>> thread argument. As it is just a static function that doesn't impact 
>> anything else.
> 
> prepare_for_emergency_dump() returns false if some critical locks could 
> not unlock.
> So what should we return if NULL is passed as the argument? true?

But you're not calling prepare_for_emergency_dump when the thread is NULL:

  454   Thread* thread = Thread::current_or_null_safe();
  455
  456   // Ensure a JavaThread is _thread_in_vm when we make this call
  457   JavaThreadInVM jtivm(thread);
  458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
  459     return;
  460   }

All I'm saying is that you pass "thread" as a parameter so you can then 
delete the existing call to Thread::current() that is inside 
prepare_for_emergency_dump.
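
To make the suggestion concrete, here is a minimal, self-contained sketch 
(not the actual HotSpot code). The Thread struct, g_current, 
g_prepare_calls and on_vm_error() are hypothetical stand-ins; only the 
names prepare_for_emergency_dump and current_or_null_safe come from this 
discussion:

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's Thread; only what the sketch needs.
struct Thread {
  bool watcher;
  bool is_Watcher_thread() const { return watcher; }
};

// Hypothetical stand-in for Thread::current_or_null_safe().
Thread* g_current = nullptr;
Thread* current_or_null_safe() { return g_current; }

// Counts calls so the effect of the NULL check is observable (sketch only).
int g_prepare_calls = 0;

// Takes the thread the caller already looked up, so the function no
// longer needs its own Thread::current() call.
bool prepare_for_emergency_dump(Thread* thread) {
  assert(thread != nullptr);  // the caller handles the NULL case
  ++g_prepare_calls;
  if (thread->is_Watcher_thread()) {
    // the real code would do watcher-specific handling here
  }
  return true;  // pretend all critical locks could be released
}

// Returns true if the emergency dump would go ahead.
bool on_vm_error() {
  Thread* thread = current_or_null_safe();
  if ((thread != nullptr) && !prepare_for_emergency_dump(thread)) {
    return false;
  }
  return true;
}
```

The NULL check stays in the caller exactly as it is today; the only change 
is that the thread is looked up once and handed down.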

David
-----

> It might break semantics of this function, so I did not add argument to 
> prepare_for_emergency_dump() in this webrev.
> 
> 
> Thanks,
> 
> Yasumasa
> 
> 
> On 2019/11/05 13:56, David Holmes wrote:
>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>> On 2019/11/05 9:17, David Holmes wrote:
>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>> Hi Markus,
>>>>>>
>>>>>> I thought similar change, and it is running on submit repo:
>>>>>>
>>>>>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>
>>>>>> If it passes all tests, I will send review request again.
>>>>>
>>>>> This change passed all tests on submit repo 
>>>>> (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>> Could you review again?
>>>>>
>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>
>>>>> In Markus's change, emergency dump will not perform when 
>>>>> Thread::current_or_null_safe() returns NULL.
>>>>> However it will occur when coredump signal (e.g. SIGSEGV) throws to 
>>>>> PID by `kill` command - main thread of the process will be already 
>>>>> detached (out of JVM).
>>>>> Also the crash might happen in native thread - created by 
>>>>> pthread_create (on Linux) from JNI code.
>>>>>
>>>>> Thus we should continue to perform emergency dump even if 
>>>>> Thread::current_or_null_safe() returns NULL.
>>>>
>>>> I didn't quite follow all that, but if there is no current thread 
>>>> then prepare_for_emergency_dump() is either going to assert here:
>>>>
>>>>   348   Thread* const thread = Thread::current();
>>>>
>>>> or crash here:
>>>>
>>>>   349   if (thread->is_Watcher_thread()) {
>>>
>>> Thanks David!
>>> I fixed it in new webrev:
>>>
>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>
>> It would be cleaner/simpler if prepare_for_emergency_dump takes the 
>> thread argument. As it is just a static function that doesn't impact 
>> anything else.
>>
>> Thanks,
>> David
>>
>>> It works fine on submit repo 
>>> (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>
>>>
>>> Yasumasa
>>>
>>>
>>>> David
>>>> -----
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>> Hi Yasumasa and David,
>>>>>>>
>>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I 
>>>>>>> realized later that, as you have pointed out, it would perform a 
>>>>>>> real thread transition. Sorry.
>>>>>>>
>>>>>>> Taking some input from ThreadInVMFromUnknown and the situations I 
>>>>>>> have seen at this location, I think the only case we need to be 
>>>>>>> concerned about here is when a JavaThread is _thread_in_native. 
>>>>>>> _thread_in_java transition to _thread_in_vm via stubs in 
>>>>>>> SharedRuntime (i believe) as part of coming out of the exception 
>>>>>>> handler(s). Unfortunately I cannot give a proper argument now to 
>>>>>>> give the premises where this invariant is enforced, so let's work 
>>>>>>> with the original thread state as you suggested Yasumasa.
>>>>>>>
>>>>>>> If we can avoid passing the thread all the way through, I think 
>>>>>>> that is preferable (this is not performance critical code). David 
>>>>>>> also alluded to the fact that you always manipulate the current 
>>>>>>> thread anyway. Although very unlikely, we could have run into an 
>>>>>>> issue with thread local storage, so it makes sense to test this 
>>>>>>> up front. If we cannot read the thread local, the operations we 
>>>>>>> intend to perform will fail, so we might just bail out already.
>>>>>>>
>>>>>>> I took the liberty to tighten up the transition class a little 
>>>>>>> bit; you only need to restore the thread state if there was an 
>>>>>>> actual change.
>>>>>>>
>>>>>>> Perhaps we can do it like this?
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>
>>>>>>> Thanks for your patience investigating this
>>>>>>>
>>>>>>> Markus
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes
>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus Gronlund 
>>>>>>> <markus.gronlund at oracle.com>; hotspot-jfr-dev at openjdk.java.net; 
>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>> Cc: yasuenag at gmail.com
>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover 
>>>>>>> thread state
>>>>>>>
>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>
>>>>>>> I don't think it is worth the disruption to pass the "thread" all 
>>>>>>> the way through these API's. It is simpler/cleaner to just call
>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>
>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm), 
>>>>>>> "invariant");
>>>>>>>
>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>> VMError::report_and_die() there is AFAICS absolutely no guarantee 
>>>>>>> that we need be in a JavaThread at all.
>>>>>>>
>>>>>>>    428 class ThreadInVMForJFR : public StackObj {
>>>>>>>
>>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects 
>>>>>>> JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>>
>>>>>>> Based on Markus's proposed change, and with a view to constrain 
>>>>>>> the scope even further can I suggest the following:
>>>>>>>
>>>>>>> if (!guard_reentrancy()) {
>>>>>>>     return;
>>>>>>> } else {
>>>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>>>       return;
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>> Correction ...
>>>>>>>>
>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>
>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>
>>>>>>>>>>> That is more like I had envisaged for this. Reusing existing
>>>>>>>>>>> thread-state transition code is preferable to adding more custom
>>>>>>>>>>> code that directly manipulates thread-state.
>>>>>>>>>>
>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>
>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its 
>>>>>>>>>> arguments. So
>>>>>>>>>> Thread::current() might be different with it.
>>>>>>>>>
>>>>>>>>> Not sure what you mean. You only ever manipulate the thread 
>>>>>>>>> state of
>>>>>>>>> the current thread.
>>>>>>>>>
>>>>>>>>>> In addition, ThreadInVMfromUnknown uses 
>>>>>>>>>> transition_from_native() to
>>>>>>>>>> change the thread state.
>>>>>>>>>> It checks (and manipulates?) something which relates to 
>>>>>>>>>> safepoint.
>>>>>>>>>
>>>>>>>>> Yes it does - which would be a problem if a safepoint (or 
>>>>>>>>> handshake)
>>>>>>>>> were pending. But the path through before_exit already has 
>>>>>>>>> safepoint
>>>>>>>>> checks when you acquire the BeforeExit_lock.
>>>>>>>>
>>>>>>>> But that isn't relevant. The issue is we don't want a safepoint 
>>>>>>>> check
>>>>>>>> on the report_and_die() path. So a custom transition helper is 
>>>>>>>> needed
>>>>>>>> to avoid that.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>> The main problem with the suggestion is it seems we may not be
>>>>>>>>> running in a JavaThread:
>>>>>>>>>
>>>>>>>>>    349   Thread* const thread = Thread::current();
>>>>>>>>>    350   if (thread->is_Watcher_thread()) {
>>>>>>>>>
>>>>>>>>> so we can't use the existing thread-state helpers, unless we 
>>>>>>>>> narrow
>>>>>>>>> the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> Thus I added ThreadInVMForJFR to new my webrev.
>>>>>>>>>
>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Markus
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; yasuenag at gmail.com;
>>>>>>>>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not 
>>>>>>>>>>>> recover
>>>>>>>>>>>> thread state
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Markus commented in JBS this change should be kept local to 
>>>>>>>>>>>> JFR.
>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>
>>>>>>>>>>>>      
>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>
>>>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR 
>>>>>>>>>>>>> code.
>>>>>>>>>>>>> But I'm not clear why the thread state is not recovered.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch which 
>>>>>>>>>>>>> recover
>>>>>>>>>>>>> it in JFR code.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not recover 
>>>>>>>>>>>>> thread
>>>>>>>>>>>>> state
>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>     webrev:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>
>>>>>>>>>>>>> If JFR is running when JVM crashes, JFR will dump data to
>>>>>>>>>>>>> hs_err_pid<PID>.jfr .
>>>>>>>>>>>>> It would perform in prepare_for_emergency_dump().
>>>>>>>>>>>>> However this function transits thread state to 
>>>>>>>>>>>>> "_thread_in_vm".
>>>>>>>>>>>>>
>>>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>> It failed at compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>> However this test is for JIT compiler, and related issue 
>>>>>>>>>>>>> has been
>>>>>>>>>>>>> reported as JDK-8225620.
>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa

From suenaga at oss.nttdata.com  Tue Nov  5 05:48:05 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Tue, 5 Nov 2019 14:48:05 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
Message-ID: <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>

On 2019/11/05 14:34, David Holmes wrote:
> On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
>> Hi David,
>>
>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>
>> prepare_for_emergency_dump() returns false if some critical locks could not unlock.
>> So what should we return if NULL is passed as the argument? true?
> 
> But you're not calling prepare_for_emergency_dump when the thread is NULL:
> 
>   454   Thread* thread = Thread::current_or_null_safe();
>   455
>   456   // Ensure a JavaThread is _thread_in_vm when we make this call
>   457   JavaThreadInVM jtivm(thread);
>   458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
>   459     return;
>   460   }

Oh, sorry, I made a mistake!
I want to change it as below:

```
+  Thread* thread = Thread::current_or_null_safe();
+
+  // Ensure a JavaThread is _thread_in_vm when we make this call
+  JavaThreadInVM jtivm(thread);
+  if (thread != NULL) {
+    if (!prepare_for_emergency_dump()) {
+      return;
+    }
+  }
```
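
For readers following the thread, a rough sketch of what a guard such as 
JavaThreadInVM could look like, assuming the behaviour discussed above: it 
tolerates a NULL or non-Java thread, and it restores the state only if it 
actually changed it. The State enum and the simplified Thread/JavaThread 
types are hypothetical stand-ins, not the HotSpot declarations:

```cpp
#include <cassert>

// Simplified stand-ins for _thread_in_native, _thread_in_vm, _thread_in_Java.
enum class State { thread_in_native, thread_in_vm, thread_in_Java };

// Hypothetical, stripped-down thread types for the sketch.
struct Thread {
  virtual ~Thread() {}
  virtual bool is_Java_thread() const { return false; }
};

struct JavaThread : Thread {
  State state;
  explicit JavaThread(State s) : state(s) {}
  bool is_Java_thread() const override { return true; }
};

// Scoped guard: puts a JavaThread into thread_in_vm for its lifetime.
class JavaThreadInVM {
  JavaThread* _jt;   // non-null only if the constructor changed the state
  State _original;   // state to restore in the destructor
 public:
  explicit JavaThreadInVM(Thread* t)
      : _jt(nullptr), _original(State::thread_in_vm) {
    if (t != nullptr && t->is_Java_thread()) {
      JavaThread* jt = static_cast<JavaThread*>(t);
      if (jt->state != State::thread_in_vm) {
        _original = jt->state;
        jt->state = State::thread_in_vm;  // plain store, no safepoint check
        _jt = jt;
      }
    }
  }
  ~JavaThreadInVM() {
    if (_jt != nullptr) {
      _jt->state = _original;  // restore only if we changed it
    }
  }
};
```

With a guard like this, constructing it with a NULL thread is a no-op, so 
the NULL check is only needed around prepare_for_emergency_dump().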

> All I'm saying is that you pass "thread" as a parameter so you can then delete the existing call to Thread::current() that is inside prepare_for_emergency_dump.

Is it preferable to pass "thread" to prepare_for_emergency_dump() instead of the above?
If so, I will push a new changeset to the submit repo and send a new review request.


Yasumasa


> David
> -----
> 
>> It might break semantics of this function, so I did not add argument to prepare_for_emergency_dump() in this webrev.
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>> On 2019/11/05 13:56, David Holmes wrote:
>>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>>> On 2019/11/05 9:17, David Holmes wrote:
>>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>>> Hi Markus,
>>>>>>>
>>>>>>> I thought similar change, and it is running on submit repo:
>>>>>>>
>>>>>>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>>
>>>>>>> If it passes all tests, I will send review request again.
>>>>>>
>>>>>> This change passed all tests on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>>> Could you review again?
>>>>>>
>>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>>
>>>>>> In Markus's change, emergency dump will not perform when Thread::current_or_null_safe() returns NULL.
>>>>>> However it will occur when coredump signal (e.g. SIGSEGV) throws to PID by `kill` command - main thread of the process will be already detached (out of JVM).
>>>>>> Also the crash might happen in native thread - created by pthread_create (on Linux) from JNI code.
>>>>>>
>>>>>> Thus we should continue to perform emergency dump even if Thread::current_or_null_safe() returns NULL.
>>>>>
>>>>> I didn't quite follow all that, but if there is no current thread then prepare_for_emergency_dump() is either going to assert here:
>>>>>
>>>>>   348   Thread* const thread = Thread::current();
>>>>>
>>>>> or crash here:
>>>>>
>>>>>   349   if (thread->is_Watcher_thread()) {
>>>>
>>>> Thanks David!
>>>> I fixed it in new webrev:
>>>>
>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>>
>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>
>>> Thanks,
>>> David
>>>
>>>> It works fine on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>>
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>>> Hi Yasumasa and David,
>>>>>>>>
>>>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
>>>>>>>>
>>>>>>>> Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. _thread_in_java transition to _thread_in_vm via stubs in SharedRuntime (i believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument now to give the premises where this invariant is enforced, so let's work with the original thread state as you suggested Yasumasa.
>>>>>>>>
>>>>>>>> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
>>>>>>>>
>>>>>>>> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
>>>>>>>>
>>>>>>>> Perhaps we can do it like this?
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>>
>>>>>>>> Thanks for your patience investigating this
>>>>>>>>
>>>>>>>> Markus
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David Holmes
>>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus Gronlund <markus.gronlund at oracle.com>; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>>>>>> Cc: yasuenag at gmail.com
>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
>>>>>>>>
>>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>>
>>>>>>>> I don't think it is worth the disruption to pass the "thread" all the way through these API's. It is simpler/cleaner to just call
>>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>>
>>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm), "invariant");
>>>>>>>>
>>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>>> VMError::report_and_die() there is AFAICS absolutely no guarantee that we need be in a JavaThread at all.
>>>>>>>>
>>>>>>>>    428 class ThreadInVMForJFR : public StackObj {
>>>>>>>>
>>>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>>>
>>>>>>>> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
>>>>>>>>
>>>>>>>> if (!guard_reentrancy()) {
>>>>>>>>     return;
>>>>>>>> } else {
>>>>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>>>>       return;
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>>> Correction ...
>>>>>>>>>
>>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>>
>>>>>>>>>>>> That is more like I had envisaged for this. Reusing existing
>>>>>>>>>>>> thread-state transition code is preferable to adding more custom
>>>>>>>>>>>> code that directly manipulates thread-state.
>>>>>>>>>>>
>>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>>
>>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its arguments. So
>>>>>>>>>>> Thread::current() might be different with it.
>>>>>>>>>>
>>>>>>>>>> Not sure what you mean. You only ever manipulate the thread state of
>>>>>>>>>> the current thread.
>>>>>>>>>>
>>>>>>>>>>> In addition, ThreadInVMfromUnknown uses transition_from_native() to
>>>>>>>>>>> change the thread state.
>>>>>>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>>>>>>
>>>>>>>>>> Yes it does - which would be a problem if a safepoint (or handshake)
>>>>>>>>>> were pending. But the path through before_exit already has safepoint
>>>>>>>>>> checks when you acquire the BeforeExit_lock.
>>>>>>>>>
>>>>>>>>> But that isn't relevant. The issue is we don't want a safepoint check
>>>>>>>>> on the report_and_die() path. So a custom transition helper is needed
>>>>>>>>> to avoid that.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> The main problem with the suggestion is it seems we may not be
>>>>>>>>>> running in a JavaThread:
>>>>>>>>>>
>>>>>>>>>>    349   Thread* const thread = Thread::current();
>>>>>>>>>>    350   if (thread->is_Watcher_thread()) {
>>>>>>>>>>
>>>>>>>>>> so we can't use the existing thread-state helpers, unless we narrow
>>>>>>>>>> the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>> Thus I added ThreadInVMForJFR to new my webrev.
>>>>>>>>>>
>>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; yasuenag at gmail.com;
>>>>>>>>>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>>>>>>>>>>>>> thread state
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>>
>>>>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>>>>>>> But I'm not clear why the thread state is not recovered.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch which recover
>>>>>>>>>>>>>> it in JFR code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not recover thread
>>>>>>>>>>>>>> state
>>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>>     webrev:
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If JFR is running when JVM crashes, JFR will dump data to
>>>>>>>>>>>>>> hs_err_pid<PID>.jfr .
>>>>>>>>>>>>>> It would perform in prepare_for_emergency_dump().
>>>>>>>>>>>>>> However this function transits thread state to "_thread_in_vm".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>>> It failed at compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>>> However this test is for JIT compiler, and related issue has been
>>>>>>>>>>>>>> reported as JDK-8225620.
>>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa

From david.holmes at oracle.com  Tue Nov  5 05:56:14 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 15:56:14 +1000
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
 <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
Message-ID: <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>

On 5/11/2019 3:48 pm, Yasumasa Suenaga wrote:
> On 2019/11/05 14:34, David Holmes wrote:
>> On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
>>> Hi David,
>>>
>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the 
>>>> thread argument. As it is just a static function that doesn't impact 
>>>> anything else.
>>>
>>> prepare_for_emergency_dump() returns false if some critical locks 
>>> could not unlock.
>>> So what should we return if NULL is passed as the argument? true?
>>
>> But you're not calling prepare_for_emergency_dump when the thread is 
>> NULL:
>>
>>   454   Thread* thread = Thread::current_or_null_safe();
>>   455
>>   456   // Ensure a JavaThread is _thread_in_vm when we make this call
>>   457   JavaThreadInVM jtivm(thread);
>>   458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
>>   459     return;
>>   460   }
> 
> Oh, sorry, I made a mistake!
> I want to change it as below:
> 
> ```
> +  Thread* thread = Thread::current_or_null_safe();
> +
> +  // Ensure a JavaThread is _thread_in_vm when we make this call
> +  JavaThreadInVM jtivm(thread);
> +  if (thread != NULL) {
> +    if (!prepare_for_emergency_dump()) {
> +      return;
> +    }
> +  }
> ```

but that is the same logic ??

>> All I'm saying is that you pass "thread" as a parameter so you can 
>> then delete the existing call to Thread::current() that is inside 
>> prepare_for_emergency_dump.
> 
> Is it preferable to pass "thread" to prepare_for_emergency_dump() instead of 
> the above?

??? The two are not related. If you've already obtained the current 
thread you can pass it to prepare_for_emergency_dump and avoid the need 
to call Thread::current() (in whatever form) again. How you handle a NULL 
current thread is independent of that.

> If so, I will push a new changeset to the submit repo and send a new 
> review request.

I'd send the review request first and get agreement before wasting time 
on the submit repo.

Thanks,
David
-----

> 
> Yasumasa
> 
> 
>> David
>> -----
>>
>>> It might break semantics of this function, so I did not add argument 
>>> to prepare_for_emergency_dump() in this webrev.
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> On 2019/11/05 13:56, David Holmes wrote:
>>>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>>>> On 2019/11/05 9:17, David Holmes wrote:
>>>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>>>> Hi Markus,
>>>>>>>>
>>>>>>>> I thought similar change, and it is running on submit repo:
>>>>>>>>
>>>>>>>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>>>
>>>>>>>> If it passes all tests, I will send review request again.
>>>>>>>
>>>>>>> This change passed all tests on submit repo 
>>>>>>> (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>>>> Could you review again?
>>>>>>>
>>>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>>>
>>>>>>> In Markus's change, emergency dump will not perform when 
>>>>>>> Thread::current_or_null_safe() returns NULL.
>>>>>>> However it will occur when coredump signal (e.g. SIGSEGV) throws 
>>>>>>> to PID by `kill` command - main thread of the process will be 
>>>>>>> already detached (out of JVM).
>>>>>>> Also the crash might happen in native thread - created by 
>>>>>>> pthread_create (on Linux) from JNI code.
>>>>>>>
>>>>>>> Thus we should continue to perform emergency dump even if 
>>>>>>> Thread::current_or_null_safe() returns NULL.
>>>>>>
>>>>>> I didn't quite follow all that, but if there is no current thread 
>>>>>> then prepare_for_emergency_dump() is either going to assert here:
>>>>>>
>>>>>>   348   Thread* const thread = Thread::current();
>>>>>>
>>>>>> or crash here:
>>>>>>
>>>>>>   349   if (thread->is_Watcher_thread()) {
>>>>>
>>>>> Thanks David!
>>>>> I fixed it in new webrev:
>>>>>
>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>>>
>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the 
>>>> thread argument. As it is just a static function that doesn't impact 
>>>> anything else.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> It works fine on submit repo 
>>>>> (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>>>
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>>>> Hi Yasumasa and David,
>>>>>>>>>
>>>>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I 
>>>>>>>>> realized later that, as you have pointed out, it would perform 
>>>>>>>>> a real thread transition. Sorry.
>>>>>>>>>
>>>>>>>>> Taking some input from ThreadInVMFromUnknown and the situations 
>>>>>>>>> I have seen at this location, I think the only case we need to 
>>>>>>>>> be concerned about here is when a JavaThread is 
>>>>>>>>> _thread_in_native. _thread_in_java transition to _thread_in_vm 
>>>>>>>>> via stubs in SharedRuntime (i believe) as part of coming out of 
>>>>>>>>> the exception handler(s). Unfortunately I cannot give a proper 
>>>>>>>>> argument now to give the premises where this invariant is 
>>>>>>>>> enforced, so let's work with the original thread state as you 
>>>>>>>>> suggested Yasumasa.
>>>>>>>>>
>>>>>>>>> If we can avoid passing the thread all the way through, I think 
>>>>>>>>> that is preferable (this is not performance critical code). 
>>>>>>>>> David also alluded to the fact that you always manipulate the 
>>>>>>>>> current thread anyway. Although very unlikely, we could have 
>>>>>>>>> run into an issue with thread local storage, so it makes sense 
>>>>>>>>> to test this up front. If we cannot read the thread local, the 
>>>>>>>>> operations we intend to perform will fail, so we might just 
>>>>>>>>> bail out already.
>>>>>>>>>
>>>>>>>>> I took the liberty to tighten up the transition class a little 
>>>>>>>>> bit; you only need to restore the thread state if there was an 
>>>>>>>>> actual change.
>>>>>>>>>
>>>>>>>>> Perhaps we can do it like this?
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>>>
>>>>>>>>> Thanks for your patience investigating this
>>>>>>>>>
>>>>>>>>> Markus
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: David Holmes
>>>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus Gronlund 
>>>>>>>>> <markus.gronlund at oracle.com>; hotspot-jfr-dev at openjdk.java.net; 
>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>> Cc: yasuenag at gmail.com
>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not 
>>>>>>>>> recover thread state
>>>>>>>>>
>>>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>>>
>>>>>>>>> I don't think it is worth the disruption to pass the "thread" 
>>>>>>>>> all the way through these API's. It is simpler/cleaner to just 
>>>>>>>>> call
>>>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>>>
>>>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm),
>>>>>>>>> "invariant");
>>>>>>>>>
>>>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>>>> VMError::report_and_die() there is AFAICS absolutely no
>>>>>>>>> guarantee that we are in a JavaThread at all.
>>>>>>>>>
>>>>>>>>>    428 class ThreadInVMForJFR : public StackObj {
>>>>>>>>>
>>>>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects 
>>>>>>>>> JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>>>>
>>>>>>>>> Based on Markus's proposed change, and with a view to constrain 
>>>>>>>>> the scope even further can I suggest the following:
>>>>>>>>>
>>>>>>>>> if (!guard_reentrancy()) {
>>>>>>>>>     return;
>>>>>>>>> } else {
>>>>>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>>>>>       return;
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>>>> Correction ...
>>>>>>>>>>
>>>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>>>
>>>>>>>>>>>>> That is more like I had envisaged for this. Reusing existing
>>>>>>>>>>>>> thread-state transition code is preferable to adding more 
>>>>>>>>>>>>> custom
>>>>>>>>>>>>> code that directly manipulates thread-state.
>>>>>>>>>>>>
>>>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>>>
>>>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its 
>>>>>>>>>>>> arguments. So
>>>>>>>>>>>> Thread::current() might be different with it.
>>>>>>>>>>>
>>>>>>>>>>> Not sure what you mean. You only ever manipulate the thread 
>>>>>>>>>>> state of
>>>>>>>>>>> the current thread.
>>>>>>>>>>>
>>>>>>>>>>>> In addition, ThreadInVMfromUnknown uses 
>>>>>>>>>>>> transition_from_native() to
>>>>>>>>>>>> change the thread state.
>>>>>>>>>>>> It checks (and manipulates?) something which relates to 
>>>>>>>>>>>> safepoint.
>>>>>>>>>>>
>>>>>>>>>>> Yes it does - which would be a problem if a safepoint (or 
>>>>>>>>>>> handshake)
>>>>>>>>>>> were pending. But the path through before_exit already has 
>>>>>>>>>>> safepoint
>>>>>>>>>>> checks when you acquire the BeforeExit_lock.
>>>>>>>>>>
>>>>>>>>>> But that isn't relevant. The issue is we don't want a 
>>>>>>>>>> safepoint check
>>>>>>>>>> on the report_and_die() path. So a custom transition helper is 
>>>>>>>>>> needed
>>>>>>>>>> to avoid that.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>> The main problem with the suggestion is it seems we may not be
>>>>>>>>>>> running in a JavaThread:
>>>>>>>>>>>
>>>>>>>>>>>    349   Thread* const thread = Thread::current();
>>>>>>>>>>>    350   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>
>>>>>>>>>>> so we can't use the existing thread-state helpers, unless we 
>>>>>>>>>>> narrow
>>>>>>>>>>> the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>> Thus I added ThreadInVMForJFR to my new webrev.
>>>>>>>>>>>
>>>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; 
>>>>>>>>>>>>>> yasuenag at gmail.com;
>>>>>>>>>>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not 
>>>>>>>>>>>>>> recover
>>>>>>>>>>>>>> thread state
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Markus commented in JBS this change should be kept local 
>>>>>>>>>>>>>> to JFR.
>>>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR 
>>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>> But I'm not clear why the thread state is not recovered.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch which 
>>>>>>>>>>>>>>> recover
>>>>>>>>>>>>>>> it in JFR code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not recover 
>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>> state
>>>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>>>     webrev:
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If JFR is running when JVM crashes, JFR will dump data to
>>>>>>>>>>>>>>> hs_err_pid<PID>.jfr .
>>>>>>>>>>>>>>> It would perform in prepare_for_emergency_dump().
>>>>>>>>>>>>>>> However this function transits thread state to 
>>>>>>>>>>>>>>> "_thread_in_vm".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>>>> It failed at compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>>>> However this test is for JIT compiler, and related issue 
>>>>>>>>>>>>>>> has been
>>>>>>>>>>>>>>> reported as JDK-8225620.
>>>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa

From suenaga at oss.nttdata.com  Tue Nov  5 06:08:52 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Tue, 5 Nov 2019 15:08:52 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <ba73b7c2-0117-cb85-da13-79541cdb8c8e@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
 <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
 <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>
Message-ID: <5debfd23-ba51-6a37-52cc-476e2813e2fd@oss.nttdata.com>

Hi David,

Sorry, I was confused :)
This is the new webrev. Could you check it again?

   http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.04/
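
For readers without the webrev at hand, the scoped helper discussed in this thread could look roughly like the sketch below. It is not the code from the webrev. Thread, JavaThread and ThreadState here are simplified stand-ins for the HotSpot types, transitioned() is a hypothetical accessor added only so the behaviour can be checked, and the rule "only transition a JavaThread that is in native" is an assumption taken from Markus's remark that _thread_in_native is the only case of concern. The state is restored only if it was actually changed, and a NULL or non-Java thread is left alone.

```cpp
#include <cassert>

// Simplified stand-ins for HotSpot's _thread_in_native / _thread_in_vm.
enum ThreadState { thread_in_native, thread_in_vm };

// Stand-in for HotSpot's Thread; only what this sketch needs.
class Thread {
 public:
  virtual ~Thread() {}
  virtual bool is_Java_thread() const { return false; }
};

// Stand-in for HotSpot's JavaThread, carrying just a thread state.
class JavaThread : public Thread {
  ThreadState _state;
 public:
  explicit JavaThread(ThreadState s) : _state(s) {}
  bool is_Java_thread() const override { return true; }
  ThreadState thread_state() const { return _state; }
  void set_thread_state(ThreadState s) { _state = s; }
};

// Scoped guard: if given a JavaThread that is in native, set it to in-vm
// directly (no safepoint check) and put the original state back on exit.
class JavaThreadInVM {
  JavaThread* _jt;        // non-null only when the state was changed
  ThreadState _original;  // state to restore in the destructor
 public:
  explicit JavaThreadInVM(Thread* t) : _jt(nullptr), _original(thread_in_vm) {
    if (t != nullptr && t->is_Java_thread()) {
      JavaThread* jt = static_cast<JavaThread*>(t);
      if (jt->thread_state() == thread_in_native) {
        _original = jt->thread_state();
        jt->set_thread_state(thread_in_vm);
        _jt = jt;
      }
    }
  }
  JavaThreadInVM(const JavaThreadInVM&) = delete;
  JavaThreadInVM& operator=(const JavaThreadInVM&) = delete;
  ~JavaThreadInVM() {
    if (_jt != nullptr) {
      _jt->set_thread_state(_original);
    }
  }
  // Hypothetical accessor, for illustration only.
  bool transitioned() const { return _jt != nullptr; }
};
```

Constructing the guard with the result of a current-thread lookup that may return NULL is therefore harmless in this sketch: nothing is changed and nothing is restored.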


Yasumasa


On 2019/11/05 14:56, David Holmes wrote:
> On 5/11/2019 3:48 pm, Yasumasa Suenaga wrote:
>> On 2019/11/05 14:34, David Holmes wrote:
>>> On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
>>>> Hi David,
>>>>
>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>
>>>> prepare_for_emergency_dump() returns false if some critical locks could not unlock.
>>>> So what should we return if NULL is passed as the argument? true?
>>>
>>> But you're not calling prepare_for_emergency_dump when the thread is NULL:
>>>
>>>   454   Thread* thread = Thread::current_or_null_safe();
>>>   455
>>>   456   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>   457   JavaThreadInVM jtivm(thread);
>>>   458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
>>>   459     return;
>>>   460   }
>>
>> Oh, sorry, I made a mistake!
>> I want to change it as below:
>>
>> ```
>> +  Thread* thread = Thread::current_or_null_safe();
>> +
>> +  // Ensure a JavaThread is _thread_in_vm when we make this call
>> +  JavaThreadInVM jtivm(thread);
>> +  if (thread != NULL) {
>> +    if (!prepare_for_emergency_dump()) {
>> +      return;
>> +    }
>> +  }
>> ```
> 
> but that is the same logic ??
> 
>>> All I'm saying is that you pass "thread" as a parameter so you can then delete the existing call to Thread::current() that is inside prepare_for_emergency_dump.
>>
>> Is it preferable to pass "thread" to prepare_for_emergency_dump() instead of the above?
> 
> ??? The two are not related. If you've already obtained the current thread you can pass it to prepare_for_emergency_dump and avoid the need to call Thread::current() (in whatever form) again. How you handle a NULL current thread is independent of that.
> 
>> If so, I will push new changeset to submit repo, and will send new review request.
> 
> I'd send the review request first and get agreement before wasting time on the submit repo.
> 
> Thanks,
> David
> -----
> 
>>
>> Yasumasa
>>
>>
>>> David
>>> -----
>>>
>>>> It might break semantics of this function, so I did not add argument to prepare_for_emergency_dump() in this webrev.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2019/11/05 13:56, David Holmes wrote:
>>>>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>>>>> On 2019/11/05 9:17, David Holmes wrote:
>>>>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>>>>> Hi Markus,
>>>>>>>>>
>>>>>>>>> I thought similar change, and it is running on submit repo:
>>>>>>>>>
>>>>>>>>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>>>>
>>>>>>>>> If it passes all tests, I will send review request again.
>>>>>>>>
>>>>>>>> This change passed all tests on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>>>>> Could you review again?
>>>>>>>>
>>>>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>>>>
>>>>>>>> In Markus's change, the emergency dump is not performed when Thread::current_or_null_safe() returns NULL.
>>>>>>>> However, that can happen when a core-dumping signal (e.g. SIGSEGV) is sent to the PID by the `kill` command: the main thread of the process will already be detached (out of the JVM).
>>>>>>>> The crash might also happen in a native thread created by pthread_create (on Linux) from JNI code.
>>>>>>>>
>>>>>>>> Thus we should continue to perform emergency dump even if Thread::current_or_null_safe() returns NULL.
>>>>>>>
>>>>>>> I didn't quite follow all that, but if there is no current thread then prepare_for_emergency_dump() is either going to assert here:
>>>>>>>
>>>>>>>   348   Thread* const thread = Thread::current();
>>>>>>>
>>>>>>> or crash here:
>>>>>>>
>>>>>>>   349   if (thread->is_Watcher_thread()) {
>>>>>>
>>>>>> Thanks David!
>>>>>> I fixed it in new webrev:
>>>>>>
>>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>>>>
>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> It works fine on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>>>>
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>>>>> Hi Yasumasa and David,
>>>>>>>>>>
>>>>>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
>>>>>>>>>>
>>>>>>>>>> Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. _thread_in_java transition to _thread_in_vm via stubs in SharedRuntime (i believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument now to give the premises where this invariant is enforced, so let's work with the original thread state as you suggested Yasumasa.
>>>>>>>>>>
>>>>>>>>>> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
>>>>>>>>>>
>>>>>>>>>> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
>>>>>>>>>>
>>>>>>>>>> Perhaps we can do it like this?
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>>>>
>>>>>>>>>> Thanks for your patience investigating this
>>>>>>>>>>
>>>>>>>>>> Markus
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: David Holmes
>>>>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus Gronlund <markus.gronlund at oracle.com>; hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>> Cc: yasuenag at gmail.com
>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
>>>>>>>>>>
>>>>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>>>>
>>>>>>>>>> I don't think it is worth the disruption to pass the "thread" all the way through these API's. It is simpler/cleaner to just call
>>>>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>>>>
>>>>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm), "invariant");
>>>>>>>>>>
>>>>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>>>>> VMError::report_and_die() there is AFAICS absolutely no guarantee that we are in a JavaThread at all.
>>>>>>>>>>
>>>>>>>>>>    428 class ThreadInVMForJFR : public StackObj {
>>>>>>>>>>
>>>>>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>>>>>
>>>>>>>>>> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
>>>>>>>>>>
>>>>>>>>>> if (!guard_reentrancy()) {
>>>>>>>>>>     return;
>>>>>>>>>> } else {
>>>>>>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>>>>>>       return;
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>>>>> Correction ...
>>>>>>>>>>>
>>>>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That is more like I had envisaged for this. Reusing existing
>>>>>>>>>>>>>> thread-state transition code is preferable to adding more custom
>>>>>>>>>>>>>> code that directly manipulates thread-state.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>>>>
>>>>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its arguments. So
>>>>>>>>>>>>> Thread::current() might be different with it.
>>>>>>>>>>>>
>>>>>>>>>>>> Not sure what you mean. You only ever manipulate the thread state of
>>>>>>>>>>>> the current thread.
>>>>>>>>>>>>
>>>>>>>>>>>>> In addition, ThreadInVMfromUnknown uses transition_from_native() to
>>>>>>>>>>>>> change the thread state.
>>>>>>>>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes it does - which would be a problem if a safepoint (or handshake)
>>>>>>>>>>>> were pending. But the path through before_exit already has safepoint
>>>>>>>>>>>> checks when you acquire the BeforeExit_lock.
>>>>>>>>>>>
>>>>>>>>>>> But that isn't relevant. The issue is we don't want a safepoint check
>>>>>>>>>>> on the report_and_die() path. So a custom transition helper is needed
>>>>>>>>>>> to avoid that.
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>> The main problem with the suggestion is it seems we may not be
>>>>>>>>>>>> running in a JavaThread:
>>>>>>>>>>>>
>>>>>>>>>>>>    349   Thread* const thread = Thread::current();
>>>>>>>>>>>>    350   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>
>>>>>>>>>>>> so we can't use the existing thread-state helpers, unless we narrow
>>>>>>>>>>>> the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>>>>>
>>>>>>>>>>>> David
>>>>>>>>>>>> -----
>>>>>>>>>>>>
>>>>>>>>>>>>> Thus I added ThreadInVMForJFR to my new webrev.
>>>>>>>>>>>>
>>>>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; yasuenag at gmail.com;
>>>>>>>>>>>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>>>>>>>>>>>>>>> thread state
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>>>>>>>>> But I'm not clear why the thread state is not recovered.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch which recover
>>>>>>>>>>>>>>>> it in JFR code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not recover thread
>>>>>>>>>>>>>>>> state
>>>>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>>>>     webrev:
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If JFR is running when JVM crashes, JFR will dump data to
>>>>>>>>>>>>>>>> hs_err_pid<PID>.jfr .
>>>>>>>>>>>>>>>> It would perform in prepare_for_emergency_dump().
>>>>>>>>>>>>>>>> However this function transits thread state to "_thread_in_vm".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>>>>> It failed at compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>>>>> However this test is for JIT compiler, and related issue has been
>>>>>>>>>>>>>>>> reported as JDK-8225620.
>>>>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa

From m.sundar85 at gmail.com  Tue Nov  5 06:13:08 2019
From: m.sundar85 at gmail.com (Sundara Mohan M)
Date: Mon, 4 Nov 2019 22:13:08 -0800
Subject: JVM stuck/looping in futex call
In-Reply-To: <350a6e97-41ee-f49f-0354-ec655d6490da@oracle.com>
References: <CACGCMVreB5xu=f1DRh+8KND+nvLVuKGzKTCS1Wv4Qi2nO4LTew@mail.gmail.com>
 <507d7b80-a93a-4e51-4842-8b329beab486@oracle.com>
 <CACGCMVqRdOnjstXgFZ3AGd2Fxo8LBrTeHkC6r7EJZ311-M_a=w@mail.gmail.com>
 <350a6e97-41ee-f49f-0354-ec655d6490da@oracle.com>
Message-ID: <CACGCMVqQZ66ty164P1QxgFk-QKNhmrh+Y99XQ5vDapA2V+JDxw@mail.gmail.com>

Hi David,
    I will try to get the stack when it happens.
I think the main thread is where the loop happens (in my case it is a Jetty
server).
There is no special environment, just a Jetty server, and no native threads are
attached. We also try to avoid JNI-related code as much as possible.


Thanks
Sundar

On Mon, Nov 4, 2019 at 3:17 PM David Holmes <david.holmes at oracle.com> wrote:

> On 5/11/2019 8:43 am, Sundara Mohan M wrote:
> > HI David,
> >      Did you mean to get stack trace of that process? I could attach to
> > gdb but not sure where to keep breakpoint.
> > More info on how to get this will be helpful.
>
> I need to see the stack before we hit the looping call, to see what it
> is that triggers the loop. Can you tell what thread is involved?
>
> Is there something special/different about your Linux environment? Do
> you have native threads attached to the VM?
>
> Thanks,
> David
>
> >
> > Thanks
> > Sundar
> >
> > On Fri, Nov 1, 2019 at 4:03 PM David Holmes <david.holmes at oracle.com
> > <mailto:david.holmes at oracle.com>> wrote:
> >
> >     Hi Sundar,
> >
> >     On 2/11/2019 5:39 am, Sundara Mohan M wrote:
> >      > Hi,
> >      >      I am running openjdk12/Linux on our systems and see jvm not
> >      > responding to jstack or any diagnostic command (jcmd
> >      > VM.info/Thread.print). Though application is running fine.
> >
> >     That would sound like the attach thread (which would respond to the
> >     jstack or other diagnostic command) is in some kind of bad state.
> >
> >      > I see the following stack trace
> >      >
> >      > Process 115586 attached
> >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000} ---
> >      > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
> >      > rt_sigreturn()                          = 202
> >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000} ---
> >      > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
> >      > rt_sigreturn()                          = 202
> >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000} ---
> >      > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
> >      > rt_sigreturn()                          = 202
> >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000} ---
> >      > rt_sigreturn()                          = 202
> >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587, si_uid=1000} ---
> >      > rt_sigreturn()                          = 202
> >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL^CProcess 115586 detached
> >      >   <detached ...>
> >      >
> >      > Can someone help me understand what is happening here?
> >
> >     It appears that in responding to the SIGQUIT that is used to trigger
> >     the starting of the attach listener thread, something is going wrong.
> >     We appear to be continually restarting an operation that still sees
> >     the signal pending - which doesn't really make sense to me. Can you
> >     get a complete stack trace using gdb?
> >
> >      > Please redirect me to the proper list if this is not the correct
> >      > list for these types of questions.
> >
> >     This list is fine. It may end up being an issue for serviceability-dev
> >     but we can deal with that later. :)
> >
> >     Thanks,
> >     David
> >     -----
> >
> >      >
> >      > TIA
> >      > Sundar
> >      >
> >
>

From david.holmes at oracle.com  Tue Nov  5 06:19:47 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 16:19:47 +1000
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <5debfd23-ba51-6a37-52cc-476e2813e2fd@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
 <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
 <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>
 <5debfd23-ba51-6a37-52cc-476e2813e2fd@oss.nttdata.com>
Message-ID: <851fa9e8-ddfa-7872-b937-596efae2d6e4@oracle.com>

On 5/11/2019 4:08 pm, Yasumasa Suenaga wrote:
> Hi David,
> 
> Sorry, I was confused :)
> This is new webrev. Could you check again?
> 
>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.04/

Okay, structurally I'm fine with that.

What I don't know, and will leave to Markus to determine, is whether the 
rest of the code in on_vm_shutdown can actually execute okay if there is 
no current thread.
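
The structure agreed on above (look up the current thread once, pass it down, and treat the NULL case separately) might be sketched as follows. This is an illustration under stated assumptions, not the webrev.04 code: Thread is a one-field stand-in, g_current_thread and current_or_null_safe() imitate the thread-local lookup, critical_locks_released models "some critical locks could not be unlocked", and on_vm_shutdown_sketch() with its DumpResult values is a hypothetical name. Whether the rest of the real on_vm_shutdown is safe without a current thread is exactly the open question and is not answered here.

```cpp
#include <cassert>

// Stand-in for HotSpot's Thread; the single field models whether the
// critical locks could be released during preparation.
struct Thread {
  bool critical_locks_released;
};

// Hypothetical replacement for the thread-local lookup; may be null.
static Thread* g_current_thread = nullptr;
static Thread* current_or_null_safe() { return g_current_thread; }

// Takes the thread as a parameter instead of looking it up again.
// The caller is responsible for not passing null.
static bool prepare_for_emergency_dump(Thread* thread) {
  assert(thread != nullptr);
  return thread->critical_locks_released;
}

enum DumpResult { dump_skipped, dump_attempted };

// Hypothetical caller: one lookup, then the null case is handled here,
// independently of how prepare_for_emergency_dump gets its thread.
static DumpResult on_vm_shutdown_sketch() {
  Thread* const thread = current_or_null_safe();
  if (thread != nullptr && !prepare_for_emergency_dump(thread)) {
    return dump_skipped;
  }
  // Reached with a prepared thread, or with no current thread at all.
  return dump_attempted;
}
```

With no current thread the sketch skips the preparation step and still attempts the dump, which matches the behaviour Yasumasa argued for earlier in the thread.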

Thanks,
David

> 
> Yasumasa
> 
> 
> On 2019/11/05 14:56, David Holmes wrote:
>> On 5/11/2019 3:48 pm, Yasumasa Suenaga wrote:
>>> On 2019/11/05 14:34, David Holmes wrote:
>>>> On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
>>>>> Hi David,
>>>>>
>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes 
>>>>>> the thread argument. As it is just a static function that doesn't 
>>>>>> impact anything else.
>>>>>
>>>>> prepare_for_emergency_dump() returns false if some critical locks 
>>>>> could not unlock.
>>>>> So what should we return if NULL is passed as the argument? true?
>>>>
>>>> But you're not calling prepare_for_emergency_dump when the thread is 
>>>> NULL:
>>>>
>>>>   454   Thread* thread = Thread::current_or_null_safe();
>>>>   455
>>>>   456   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>   457   JavaThreadInVM jtivm(thread);
>>>>   458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
>>>>   459     return;
>>>>   460   }
>>>
>>> Oh, sorry, I made a mistake!
>>> I want to change it as below:
>>>
>>> ```
>>> +  Thread* thread = Thread::current_or_null_safe();
>>> +
>>> +  // Ensure a JavaThread is _thread_in_vm when we make this call
>>> +  JavaThreadInVM jtivm(thread);
>>> +  if (thread != NULL) {
>>> +    if (!prepare_for_emergency_dump()) {
>>> +      return;
>>> +    }
>>> +  }
>>> ```
>>
>> but that is the same logic ??
>>
>>>> All I'm saying is that you pass "thread" as a parameter so you can 
>>>> then delete the existing call to Thread::current() that is inside 
>>>> prepare_for_emergency_dump.
>>>
>>> Is it preferable to pass "thread" to prepare_for_emergency_dump() 
>>> instead of the above?
>>
>> ??? The two are not related. If you've already obtained the current 
>> thread you can pass it to prepare_for_emergency_dump and avoid the 
>> need to call Thread::current() (in whatever form) again. How you handle 
>> a NULL current thread is independent of that.
>>
>>> If so, I will push new changeset to submit repo, and will send new 
>>> review request.
>>
>> I'd send the review request first and get agreement before wasting 
>> time on the submit repo.
>>
>> Thanks,
>> David
>> -----
>>
>>>
>>> Yasumasa
>>>
>>>
>>>> David
>>>> -----
>>>>
>>>>> It might break the semantics of this function, so I did not add an 
>>>>> argument to prepare_for_emergency_dump() in this webrev.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2019/11/05 13:56, David Holmes wrote:
>>>>>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>>>>>> On 2019/11/05 9:17, David Holmes wrote:
>>>>>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>>>>>> Hi Markus,
>>>>>>>>>>
>>>>>>>>>> I thought of a similar change, and it is running on the submit repo:
>>>>>>>>>>
>>>>>>>>>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>>>>>
>>>>>>>>>> If it passes all tests, I will send review request again.
>>>>>>>>>
>>>>>>>>> This change passed all tests on submit repo 
>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>>>>>> Could you review again?
>>>>>>>>>
>>>>>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>>>>>
>>>>>>>>> In Markus's change, the emergency dump will not be performed when 
>>>>>>>>> Thread::current_or_null_safe() returns NULL.
>>>>>>>>> However, that can happen when a core-dump signal (e.g. SIGSEGV) is 
>>>>>>>>> sent to the PID with the `kill` command - the main thread of the 
>>>>>>>>> process will already be detached (out of the JVM).
>>>>>>>>> The crash might also happen in a native thread - one created by 
>>>>>>>>> pthread_create (on Linux) from JNI code.
>>>>>>>>>
>>>>>>>>> Thus we should continue to perform the emergency dump even if 
>>>>>>>>> Thread::current_or_null_safe() returns NULL.
>>>>>>>>
>>>>>>>> I didn't quite follow all that, but if there is no current 
>>>>>>>> thread then prepare_for_emergency_dump() is either going to 
>>>>>>>> assert here:
>>>>>>>>
>>>>>>>>   Thread* const thread = Thread::current();
>>>>>>>>
>>>>>>>> or crash here:
>>>>>>>>
>>>>>>>>   if (thread->is_Watcher_thread()) {
>>>>>>>
>>>>>>> Thanks David!
>>>>>>> I fixed it in new webrev:
>>>>>>>
>>>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>>>>>
>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes 
>>>>>> the thread argument. As it is just a static function that doesn't 
>>>>>> impact anything else.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> It works fine on submit repo 
>>>>>>> (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>>>>>
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>>>>>> Hi Yasumasa and David,
>>>>>>>>>>>
>>>>>>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. 
>>>>>>>>>>> I realized later that, as you have pointed out, it would 
>>>>>>>>>>> perform a real thread transition. Sorry.
>>>>>>>>>>>
>>>>>>>>>>> Taking some input from ThreadInVMFromUnknown and the 
>>>>>>>>>>> situations I have seen at this location, I think the only 
>>>>>>>>>>> case we need to be concerned about here is when a JavaThread 
>>>>>>>>>>> is _thread_in_native. _thread_in_java transitions to 
>>>>>>>>>>> _thread_in_vm via stubs in SharedRuntime (I believe) as part 
>>>>>>>>>>> of coming out of the exception handler(s). Unfortunately I 
>>>>>>>>>>> cannot give a proper argument now to give the premises where 
>>>>>>>>>>> this invariant is enforced, so let's work with the original 
>>>>>>>>>>> thread state as you suggested Yasumasa.
>>>>>>>>>>>
>>>>>>>>>>> If we can avoid passing the thread all the way through, I 
>>>>>>>>>>> think that is preferable (this is not performance critical 
>>>>>>>>>>> code). David also alluded to the fact that you always 
>>>>>>>>>>> manipulate the current thread anyway. Although very unlikely, 
>>>>>>>>>>> we could have run into an issue with thread local storage, so 
>>>>>>>>>>> it makes sense to test this up front. If we cannot read the 
>>>>>>>>>>> thread local, the operations we intend to perform will fail, 
>>>>>>>>>>> so we might just bail out already.
>>>>>>>>>>>
>>>>>>>>>>> I took the liberty to tighten up the transition class a 
>>>>>>>>>>> little bit; you only need to restore the thread state if 
>>>>>>>>>>> there was an actual change.
>>>>>>>>>>>
>>>>>>>>>>> Perhaps we can do it like this?
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your patience investigating this
>>>>>>>>>>>
>>>>>>>>>>> Markus
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: David Holmes
>>>>>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus 
>>>>>>>>>>> Gronlund <markus.gronlund at oracle.com>; 
>>>>>>>>>>> hotspot-jfr-dev at openjdk.java.net; 
>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>> Cc: yasuenag at gmail.com
>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not 
>>>>>>>>>>> recover thread state
>>>>>>>>>>>
>>>>>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>>>>>
>>>>>>>>>>> I don't think it is worth the disruption to pass the "thread" 
>>>>>>>>>>> all the way through these API's. It is simpler/cleaner to 
>>>>>>>>>>> just call
>>>>>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>>>>>
>>>>>>>>>>>   assert(thread->is_Java_thread() &&
>>>>>>>>>>>          (((JavaThread*)thread)->thread_state() == _thread_in_vm),
>>>>>>>>>>>          "invariant");
>>>>>>>>>>>
>>>>>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>>>>>> VMError::report_and_die() there is AFAICS absolutely no 
>>>>>>>>>>> guarantee that we need be in a JavaThread at all.
>>>>>>>>>>>
>>>>>>>>>>>   class ThreadInVMForJFR : public StackObj {
>>>>>>>>>>>
>>>>>>>>>>> Can I suggest JavaThreadInVM to make it clear this only 
>>>>>>>>>>> affects JavaThreads. And as it is local we don't need the 
>>>>>>>>>>> "forJFR" part.
>>>>>>>>>>>
>>>>>>>>>>> Based on Markus's proposed change, and with a view to 
>>>>>>>>>>> constrain the scope even further can I suggest the following:
>>>>>>>>>>>
>>>>>>>>>>> if (!guard_reentrancy()) {
>>>>>>>>>>>     return;
>>>>>>>>>>> } else {
>>>>>>>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>>>>>>>       return;
>>>>>>>>>>>     }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>>>>>> Correction ...
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That is more like I had envisaged for this. Reusing existing
>>>>>>>>>>>>>>> thread-state transition code is preferable to adding more 
>>>>>>>>>>>>>>> custom
>>>>>>>>>>>>>>> code that directly manipulates thread-state.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its 
>>>>>>>>>>>>>> arguments. So
>>>>>>>>>>>>>> Thread::current() might be different from it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Not sure what you mean. You only ever manipulate the thread 
>>>>>>>>>>>>> state of
>>>>>>>>>>>>> the current thread.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> In addition, ThreadInVMfromUnknown uses 
>>>>>>>>>>>>>> transition_from_native() to
>>>>>>>>>>>>>> change the thread state.
>>>>>>>>>>>>>> It checks (and manipulates?) something which relates to 
>>>>>>>>>>>>>> safepoint.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes it does - which would be a problem if a safepoint (or 
>>>>>>>>>>>>> handshake)
>>>>>>>>>>>>> were pending. But the path through before_exit already has 
>>>>>>>>>>>>> safepoint
>>>>>>>>>>>>> checks when you acquire the BeforeExit_lock.
>>>>>>>>>>>>
>>>>>>>>>>>> But that isn't relevant. The issue is we don't want a 
>>>>>>>>>>>> safepoint check
>>>>>>>>>>>> on the report_and_die() path. So a custom transition helper 
>>>>>>>>>>>> is needed
>>>>>>>>>>>> to avoid that.
>>>>>>>>>>>>
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>> The main problem with the suggestion is it seems we may not be
>>>>>>>>>>>>> running in a JavaThread:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   Thread* const thread = Thread::current();
>>>>>>>>>>>>>   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>>
>>>>>>>>>>>>> so we can't use the existing thread-state helpers, unless 
>>>>>>>>>>>>> we narrow
>>>>>>>>>>>>> the scope (as you do) to after the check for the 
>>>>>>>>>>>>> WatcherThread.
>>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thus I added ThreadInVMForJFR to my new webrev.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; 
>>>>>>>>>>>>>>>> yasuenag at gmail.com;
>>>>>>>>>>>>>>>> Markus Gronlund <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do 
>>>>>>>>>>>>>>>> not recover
>>>>>>>>>>>>>>>> thread state
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Markus commented in JBS this change should be kept local 
>>>>>>>>>>>>>>>> to JFR.
>>>>>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in 
>>>>>>>>>>>>>>>>> JFR code.
>>>>>>>>>>>>>>>>> But it is unclear to me why the thread state is not recovered.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch which 
>>>>>>>>>>>>>>>>> recovers it in JFR code.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not 
>>>>>>>>>>>>>>>>> recover thread
>>>>>>>>>>>>>>>>> state
>>>>>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>>>>>     webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If JFR is running when JVM crashes, JFR will dump data to
>>>>>>>>>>>>>>>>> hs_err_pid<PID>.jfr .
>>>>>>>>>>>>>>>>> It would perform in prepare_for_emergency_dump().
>>>>>>>>>>>>>>>>> However, this function transitions the thread state to 
>>>>>>>>>>>>>>>>> "_thread_in_vm".
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>>>>>> It failed at 
>>>>>>>>>>>>>>>>> compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>>>>>> However this test is for JIT compiler, and related 
>>>>>>>>>>>>>>>>> issue has been
>>>>>>>>>>>>>>>>> reported as JDK-8225620.
>>>>>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yasumasa

From suenaga at oss.nttdata.com  Tue Nov  5 06:36:27 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Tue, 5 Nov 2019 15:36:27 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <851fa9e8-ddfa-7872-b937-596efae2d6e4@oracle.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
 <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
 <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>
 <5debfd23-ba51-6a37-52cc-476e2813e2fd@oss.nttdata.com>
 <851fa9e8-ddfa-7872-b937-596efae2d6e4@oracle.com>
Message-ID: <8b3a28c0-3bff-84d2-2b3c-35944ff0bb94@oss.nttdata.com>

Thanks David!
I will wait for Markus's review.


Yasumasa



From thomas.stuefe at gmail.com  Tue Nov  5 07:06:58 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Tue, 5 Nov 2019 08:06:58 +0100
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
Message-ID: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>

Hi all,

may I please have reviews for this small build fix:

Issue: https://bugs.openjdk.java.net/browse/JDK-8233530
Webrev:
http://cr.openjdk.java.net/~stuefe/webrevs/8233530-wc14-compat/webrev.00/webrev/
Prior discussion:
https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/036726.html

Thank you,

Thomas

From david.holmes at oracle.com  Tue Nov  5 07:19:45 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Nov 2019 17:19:45 +1000
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
In-Reply-To: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
References: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
Message-ID: <08fd3fc7-c270-f734-ab0d-6204552c13a1@oracle.com>

Hi Thomas,

On 5/11/2019 5:06 pm, Thomas Stüfe wrote:
> Hi all,
> 
> may I please have reviews for this small build fix:
> 
> Issue: https://bugs.openjdk.java.net/browse/JDK-8233530
> Webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8233530-wc14-compat/webrev.00/webrev/
> Prior discussion:
> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/036726.html

Seems fine in principle. :) This seems to be the first case of local 
warning suppression in the sources, so I didn't have anything to compare 
against.

I did find the use of

#ifdef __GNUG__

surprising as we don't use that define anywhere else in the sources. I'm 
guessing the 'G' refers to g++ and technically is more correct than using

#ifdef __GNUC__

but we always seem to use the latter even for C++. ??
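
For reference, a minimal sketch of the kind of local warning suppression under discussion. This is not the code from the webrev: all function names are hypothetical stand-ins, and the guarded function merely takes the place of whatever definition triggers the warning. The only facts relied on are GCC's documented diagnostic pragmas and that testing __GNUG__ is equivalent to testing (__GNUC__ && __cplusplus), so in a C++ source file either guard selects the same compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical illustration; names below are not from the HotSpot sources.

#if defined(__GNUC__)
// Save the current diagnostic state, then silence one warning locally.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wc++14-compat"
#endif

// Stand-in for the definition that would otherwise trigger the warning.
// It releases the block and reports the size it was handed.
std::size_t sketch_sized_release(void* p, std::size_t size) {
  std::free(p);
  return size;
}

#if defined(__GNUC__)
// Restore the diagnostic state saved by the push above.
#pragma GCC diagnostic pop
#endif

// True when the compiler defines __GNUC__ (gcc, g++, and compatible compilers).
bool sketch_has_gnuc() {
#ifdef __GNUC__
  return true;
#else
  return false;
#endif
}

// True when the compiler defines __GNUG__ (GNU-compatible C++ compilers only).
bool sketch_has_gnug() {
#ifdef __GNUG__
  return true;
#else
  return false;
#endif
}
```

Because this file is compiled as C++, sketch_has_gnuc() and sketch_has_gnug() return the same value, which is why switching the guard to __GNUC__ would not change which compilers see the pragmas.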

Thanks,
David
-----

> Thank you,
> 
> Thomas
> 

From thomas.stuefe at gmail.com  Tue Nov  5 07:52:05 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Tue, 5 Nov 2019 08:52:05 +0100
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
In-Reply-To: <08fd3fc7-c270-f734-ab0d-6204552c13a1@oracle.com>
References: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
 <08fd3fc7-c270-f734-ab0d-6204552c13a1@oracle.com>
Message-ID: <CAA-vtUyab+NBQh==YcA76+vvzk8qLi=MmvvwsZKFuhbqOSe2Tw@mail.gmail.com>

Hi David,

On Tue, Nov 5, 2019 at 8:19 AM David Holmes <david.holmes at oracle.com> wrote:

> Hi Thomas,
>
> On 5/11/2019 5:06 pm, Thomas Stüfe wrote:
> > Hi all,
> >
> > may I please have reviews for this small build fix:
> >
> > Issue: https://bugs.openjdk.java.net/browse/JDK-8233530
> > Webrev:
> > http://cr.openjdk.java.net/~stuefe/webrevs/8233530-wc14-compat/webrev.00/webrev/
> > Prior discussion:
> > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/036726.html
>
> Seems fine in principle. :) This seems to be the first case of local
> warning suppression in the sources, so I didn't have anything to compare
> against.
>
>
This was Kim's explicit wish :) but I think it makes sense.


> I did find the use of
>
> #ifdef __GNUG__
>
> surprising as we don't use that define anywhere else in the sources. I'm
> guessing the 'G' refers to g++ and technically is more correct than using
>
> #ifdef __GNUC__
>
> but we always seem to use the latter even for C++. ??
>
>
Hmm.. I do not have strong emotions. I can use __GNUC__ if you prefer.

Thanks for looking!

..Thomas

> Thanks,
> David
> -----
>
> > Thank you,
> >
> > Thomas
> >
>

From markus.gronlund at oracle.com  Tue Nov  5 10:35:21 2019
From: markus.gronlund at oracle.com (Markus Gronlund)
Date: Tue, 5 Nov 2019 02:35:21 -0800 (PST)
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <8b3a28c0-3bff-84d2-2b3c-35944ff0bb94@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <df25a046-9f1e-417c-9ae2-f24786d850e6@default>
 <1982d421-943c-027b-65c4-5311311eb5aa@oracle.com>
 <a94b0b91-d5e0-1e84-6569-15dcc17373cb@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
 <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
 <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>
 <5debfd23-ba51-6a37-52cc-476e2813e2fd@oss.nttdata.com>
 <851fa9e8-ddfa-7872-b937-596efae2d6e4@oracle.com>
 <8b3a28c0-3bff-84d2-2b3c-35944ff0bb94@oss.nttdata.com>
Message-ID: <3b974302-ddd7-49b6-a9e2-aa4f9a8d0b58@default>

Hi again,

The comments in my previous email still apply:

"...although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already. I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change."

1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation(). 
2. You don't need to restore _thread_in_vm back to threads already running in the correct state. The purpose of the transition helper class is to move Java threads not running in _thread_in_vm (i.e. will be _thread_in_native). Move this logic to the fore to better clarify the intent of the helper class.

http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
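
The two points above can be sketched as a small scoped helper. This is a self-contained illustration under stated assumptions, not the class from the webrev: SketchThread, SketchThreadState and SketchJavaThreadInVM are hypothetical stand-ins for the HotSpot types, and the state is changed by a plain store to mirror the requirement that no safepoint check happens on this path.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for HotSpot's thread states and Thread class.
enum SketchThreadState { sketch_in_native, sketch_in_vm, sketch_in_java };

struct SketchThread {
  bool is_java_thread;
  SketchThreadState state;
};

// Scoped helper: moves a Java thread that is not in the "in VM" state into
// that state, and restores the original state only if it actually changed.
class SketchJavaThreadInVM {
  SketchThread* const _jt;      // NULL when there is nothing to manage
  SketchThreadState _original;  // meaningful only when _changed is true
  bool _changed;

 public:
  explicit SketchJavaThreadInVM(SketchThread* t)
    : _jt((t != NULL && t->is_java_thread) ? t : NULL),
      _original(sketch_in_vm),
      _changed(false) {
    if (_jt != NULL && _jt->state != sketch_in_vm) {
      _original = _jt->state;
      _jt->state = sketch_in_vm;  // direct store, no transition machinery
      _changed = true;
    }
  }

  ~SketchJavaThreadInVM() {
    if (_changed) {
      _jt->state = _original;  // undo only what the constructor did
    }
  }

  bool changed() const { return _changed; }
};
```

A thread that is already in the target state, a non-Java thread, and a NULL thread all pass through untouched, so the destructor has nothing to restore in those cases.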

Thanks
Markus


-----Original Message-----
From: Yasumasa Suenaga <suenaga at oss.nttdata.com> 
Sent: den 5 november 2019 07:36
To: David Holmes <david.holmes at oracle.com>; Markus Gronlund <markus.gronlund at oracle.com>
Cc: hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state

Thanks David!
I wait for Markus's review.


Yasumasa


On 2019/11/05 15:19, David Holmes wrote:
> On 5/11/2019 4:08 pm, Yasumasa Suenaga wrote:
>> Hi David,
>>
>> Sorry, I was confused :)
>> This is new webrev. Could you check again?
>>
>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.04/
> 
> Okay structurally I'm fine with that.
> 
> What I don't know, and will leave to Markus to determine, is whether the rest of the code in on_vm_shutdown can actually execute okay if there is no current thread.
> 
> Thanks,
> David
> 
>>
>> Yasumasa
>>
>>
>> On 2019/11/05 14:56, David Holmes wrote:
>>> On 5/11/2019 3:48 pm, Yasumasa Suenaga wrote:
>>>> On 2019/11/05 14:34, David Holmes wrote:
>>>>> On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
>>>>>> Hi David,
>>>>>>
>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>
>>>>>> prepare_for_emergency_dump() returns false if some critical locks could not unlock.
>>>>>> So what should we return if NULL is passed as the argument? true?
>>>>>
>>>>> But you're not calling prepare_for_emergency_dump when the thread is NULL:
>>>>>
>>>>>   454   Thread* thread = Thread::current_or_null_safe();
>>>>>   455
>>>>>   456   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>   457   JavaThreadInVM jtivm(thread);
>>>>>   458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
>>>>>   459     return;
>>>>>   460   }
>>>>
>>>> Oh, sorry, I have a mistake!
>>>> I want to change as below:
>>>>
>>>> ```
>>>> +  Thread* thread = Thread::current_or_null_safe();
>>>> +
>>>> +  // Ensure a JavaThread is _thread_in_vm when we make this call
>>>> +  JavaThreadInVM jtivm(thread);
>>>> +  if (thread != NULL) {
>>>> +    if (!prepare_for_emergency_dump()) {
>>>> +      return;
>>>> +    }
>>>> +  }
>>>> ```
>>>
>>> but that is the same logic ??
>>>
>>>>> All I'm saying is that you pass "thread" as a parameter so you can then delete the existing call to Thread::current() that is inside prepare_for_emergency_dump.
>>>>
>>>> Is it prefer pass "thread" to prepare_for_emergency_dump() instead of above?
>>>
>>> ??? The two are not related. If you've already obtained the current thread you can pass it to prepare_for_emergency_dump and avoid the need to call Thread:current() (in whatever form) again. How you handle a NULL current thread is independent of that.
>>>
>>>> If so, I will push new changeset to submit repo, and will send new review request.
>>>
>>> I'd send the review request first and get agreement before wasting time on the submit repo.
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> It might break semantics of this function, so I did not add argument to prepare_for_emergency_dump() in this webrev.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2019/11/05 13:56, David Holmes wrote:
>>>>>>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>>>>>>> On 2019/11/05 9:17, David Holmes wrote:
>>>>>>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>>>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>
>>>>>>>>>>> I thought similar change, and it is running on submit repo:
>>>>>>>>>>>
>>>>>>>>>>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>>>>>>
>>>>>>>>>>> If it passes all tests, I will send review request again.
>>>>>>>>>>
>>>>>>>>>> This change passed all tests on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>>>>>>> Could you review again?
>>>>>>>>>>
>>>>>>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>>>>>>
>>>>>>>>>> In Markus's change, emergency dump will not perform when Thread::current_or_null_safe() returns NULL.
>>>>>>>>>> However it will occur when coredump signal (e.g. SIGSEGV) throws to PID by `kill` command - main thread of the process will be already detached (out of JVM).
>>>>>>>>>> Also the crash might happen in native thread - created by pthread_create (on Linux) from JNI code.
>>>>>>>>>>
>>>>>>>>>> Thus we should continue to perform emergency dump even if Thread::current_or_null_safe() returns NULL.
>>>>>>>>>
>>>>>>>>> I didn't quite follow all that, but if there is no current thread then prepare_for_emergency_dump() is either going to assert here:
>>>>>>>>>
>>>>>>>>>   348   Thread* const thread = Thread::current();
>>>>>>>>>
>>>>>>>>> or crash here:
>>>>>>>>>
>>>>>>>>>   349   if (thread->is_Watcher_thread()) {
>>>>>>>>
>>>>>>>> Thanks David!
>>>>>>>> I fixed it in new webrev:
>>>>>>>>
>>>>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>>>>>>
>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> It works fine on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>>>>>>
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>>>>>>> Hi Yasumasa and David,
>>>>>>>>>>>>
>>>>>>>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
>>>>>>>>>>>>
>>>>>>>>>>>> Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. _thread_in_java transition to _thread_in_vm via stubs in SharedRuntime (i believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument now to give the premises where this invariant is enforced, so let's work with the original thread state as you suggested Yasumasa.
>>>>>>>>>>>>
>>>>>>>>>>>> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
>>>>>>>>>>>>
>>>>>>>>>>>> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
>>>>>>>>>>>>
>>>>>>>>>>>> Perhaps we can do it like this?
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your patience investigating this
>>>>>>>>>>>>
>>>>>>>>>>>> Markus
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: David Holmes
>>>>>>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus 
>>>>>>>>>>>> Gronlund <markus.gronlund at oracle.com>; 
>>>>>>>>>>>> hotspot-jfr-dev at openjdk.java.net; 
>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>> Cc: yasuenag at gmail.com
>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not 
>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>
>>>>>>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think it is worth the disruption to pass the 
>>>>>>>>>>>> "thread" all the way through these API's. It is 
>>>>>>>>>>>> simpler/cleaner to just call
>>>>>>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>>>>>>
>>>>>>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>>>>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm), 
>>>>>>>>>>>> "invariant");
>>>>>>>>>>>>
>>>>>>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>>>>>>> VMError::report_and_die() there is AFAICS absolutely no guarantee that we are in a JavaThread at all.
>>>>>>>>>>>>
>>>>>>>>>>>>    428 class ThreadInVMForJFR : public StackObj {
>>>>>>>>>>>>
>>>>>>>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>>>>>>>
>>>>>>>>>>>> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
>>>>>>>>>>>>
>>>>>>>>>>>> if (!guard_reentrancy()) {
>>>>>>>>>>>>     return;
>>>>>>>>>>>> } else {
>>>>>>>>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>>>>>>>>       return;
>>>>>>>>>>>>     }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> David
>>>>>>>>>>>> -----
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>>>>>>> Correction ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That is more like I had envisaged for this. Reusing 
>>>>>>>>>>>>>>>> existing thread-state transition code is preferable to 
>>>>>>>>>>>>>>>> adding more custom code that directly manipulates thread-state.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its 
>>>>>>>>>>>>>>> arguments. So
>>>>>>>>>>>>>>> Thread::current() might be different with it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Not sure what you mean. You only ever manipulate the 
>>>>>>>>>>>>>> thread state of the current thread.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In addition, ThreadInVMfromUnknown uses 
>>>>>>>>>>>>>>> transition_from_native() to change the thread state.
>>>>>>>>>>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes it does - which would be a problem if a safepoint (or 
>>>>>>>>>>>>>> handshake) were pending. But the path through before_exit 
>>>>>>>>>>>>>> already has safepoint checks when you acquire the BeforeExit_lock.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But that isn't relevant. The issue is we don't want a 
>>>>>>>>>>>>> safepoint check on the report_and_die() path. So a custom 
>>>>>>>>>>>>> transition helper is needed to avoid that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The main problem with the suggestion is it seems we may 
>>>>>>>>>>>>>> not be running in a JavaThread:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    349   Thread* const thread = Thread::current();
>>>>>>>>>>>>>>    350   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> so we can't use the existing thread-state helpers, unless 
>>>>>>>>>>>>>> we narrow the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thus I added ThreadInVMForJFR to new my webrev.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net; 
>>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; 
>>>>>>>>>>>>>>>>> yasuenag at gmail.com; Markus Gronlund 
>>>>>>>>>>>>>>>>> <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do 
>>>>>>>>>>>>>>>>> not recover thread state
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This change passed all tests on submit repo 
>>>>>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>>>>>>>>>>> But it is unclear to me why the thread state is not recovered.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch 
>>>>>>>>>>>>>>>>>> which recover it in JFR code.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not 
>>>>>>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>>>>>>     webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If JFR is running when JVM crashes, JFR will dump 
>>>>>>>>>>>>>>>>>> data to hs_err_pid<PID>.jfr .
>>>>>>>>>>>>>>>>>> It would perform in prepare_for_emergency_dump().
>>>>>>>>>>>>>>>>>> However this function transits thread state to "_thread_in_vm".
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This change has been tested on submit repo as 
>>>>>>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>>>>>>> It failed at 
>>>>>>>>>>>>>>>>>> compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>>>>>>> However this test is for JIT compiler, and related 
>>>>>>>>>>>>>>>>>> issue has been reported as JDK-8225620.
>>>>>>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yasumasa

From suenaga at oss.nttdata.com  Tue Nov  5 11:56:09 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Tue, 5 Nov 2019 20:56:09 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <3b974302-ddd7-49b6-a9e2-aa4f9a8d0b58@default>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
 <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
 <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>
 <5debfd23-ba51-6a37-52cc-476e2813e2fd@oss.nttdata.com>
 <851fa9e8-ddfa-7872-b937-596efae2d6e4@oracle.com>
 <8b3a28c0-3bff-84d2-2b3c-35944ff0bb94@oss.nttdata.com>
 <3b974302-ddd7-49b6-a9e2-aa4f9a8d0b58@default>
Message-ID: <c98ca024-7ec6-c378-01a7-49476151f428@oss.nttdata.com>

Hi Markus,

> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().

Do you mean that the emergency dump would not be performed when Thread::current() returns NULL?
That will happen when a crash occurs in a detached thread [1]. Shouldn't we consider that case?
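
To make the open question concrete, here is a minimal sketch of the two behaviours being weighed when no current thread is available. It is an illustration only: SketchThread, SketchNullPolicy and both functions are hypothetical, the critical_locks_released field is an assumed stand-in for whatever prepare_for_emergency_dump() actually checks, and nothing here claims to match either webrev.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a VM thread object.
struct SketchThread {
  bool critical_locks_released;  // assumed outcome of the lock handling
};

// The two positions in this thread, expressed as a policy (hypothetical).
enum SketchNullPolicy {
  sketch_bail_out,                // skip the dump when there is no current thread
  sketch_continue_without_thread  // still attempt the dump (detached thread case)
};

// Stand-in for a prepare step that receives the thread as an argument
// instead of looking it up again. It requires a non-NULL thread.
bool sketch_prepare_for_emergency_dump(SketchThread* thread) {
  assert(thread != NULL);
  return thread->critical_locks_released;
}

// Decides whether the dump should proceed. The NULL case is handled here,
// independently of how the thread is passed to the prepare step.
bool sketch_should_dump(SketchThread* thread, SketchNullPolicy policy) {
  if (thread == NULL) {
    return policy == sketch_continue_without_thread;
  }
  return sketch_prepare_for_emergency_dump(thread);
}
```

With sketch_bail_out a crash in a detached thread produces no dump, while sketch_continue_without_thread lets it proceed; whether the code after this point can run safely without a current thread is exactly what remains to be decided.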


Thanks,

Yasumasa


[1] https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000822.html


On 2019/11/05 19:35, Markus Gronlund wrote:
> Hi again,
> 
> The comments in my previous email still apply:
> 
> "...although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already. I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change."
> 
> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().
> 2. You don't need to restore _thread_in_vm back to threads already running in the correct state. The purpose of the transition helper class is to move Java threads not running in _thread_in_vm (i.e. will be _thread_in_native). Move this logic to the fore to better clarify the intent of the helper class.
> 
> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
> 
> Thanks
> Markus
> 
> 
> 
> -----Original Message-----
> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
> Sent: den 5 november 2019 07:36
> To: David Holmes <david.holmes at oracle.com>; Markus Gronlund <markus.gronlund at oracle.com>
> Cc: hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
> 
> Thanks David!
> I wait for Markus's review.
> 
> 
> Yasumasa
> 
> 
> On 2019/11/05 15:19, David Holmes wrote:
>> On 5/11/2019 4:08 pm, Yasumasa Suenaga wrote:
>>> Hi David,
>>>
>>> Sorry, I was confused :)
>>> This is new webrev. Could you check again?
>>>
>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.04/
>>
>> Okay structurally I'm fine with that.
>>
>> What I don't know, and will leave to Markus to determine, is whether the rest of the code in on_vm_shutdown can actually execute okay if there is no current thread.
>>
>> Thanks,
>> David
>>
>>>
>>> Yasumasa
>>>
>>>
>>> On 2019/11/05 14:56, David Holmes wrote:
>>>> On 5/11/2019 3:48 pm, Yasumasa Suenaga wrote:
>>>>> On 2019/11/05 14:34, David Holmes wrote:
>>>>>> On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>
>>>>>>> prepare_for_emergency_dump() returns false if some critical locks could not unlock.
>>>>>>> So what should we return if NULL is passed as the argument? true?
>>>>>>
>>>>>> But you're not calling prepare_for_emergency_dump when the thread is NULL:
>>>>>>
>>>>>>   454   Thread* thread = Thread::current_or_null_safe();
>>>>>>   455
>>>>>>   456   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>   457   JavaThreadInVM jtivm(thread);
>>>>>>   458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
>>>>>>   459     return;
>>>>>>   460   }
>>>>>
>>>>> Oh, sorry, I have a mistake!
>>>>> I want to change as below:
>>>>>
>>>>> ```
>>>>> +  Thread* thread = Thread::current_or_null_safe();
>>>>> +
>>>>> +  // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>> +  JavaThreadInVM jtivm(thread);
>>>>> +  if (thread != NULL) {
>>>>> +    if (!prepare_for_emergency_dump()) {
>>>>> +      return;
>>>>> +    }
>>>>> +  }
>>>>> ```
>>>>
>>>> but that is the same logic ??
>>>>
>>>>>> All I'm saying is that you pass "thread" as a parameter so you can then delete the existing call to Thread::current() that is inside prepare_for_emergency_dump.
>>>>>
>>>>> Is it prefer pass "thread" to prepare_for_emergency_dump() instead of above?
>>>>
>>>> ??? The two are not related. If you've already obtained the current thread you can pass it to prepare_for_emergency_dump and avoid the need to call Thread:current() (in whatever form) again. How you handle a NULL current thread is independent of that.
>>>>
>>>>> If so, I will push new changeset to submit repo, and will send new review request.
>>>>
>>>> I'd send the review request first and get agreement before wasting time on the submit repo.
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> It might break semantics of this function, so I did not add argument to prepare_for_emergency_dump() in this webrev.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2019/11/05 13:56, David Holmes wrote:
>>>>>>>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>>>>>>>> On 2019/11/05 9:17, David Holmes wrote:
>>>>>>>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>>>>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>
>>>>>>>>>>>> I thought similar change, and it is running on submit repo:
>>>>>>>>>>>>
>>>>>>>>>>>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>>>>>>>
>>>>>>>>>>>> If it passes all tests, I will send review request again.
>>>>>>>>>>>
>>>>>>>>>>> This change passed all tests on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>>>>>>>> Could you review again?
>>>>>>>>>>>
>>>>>>>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>>>>>>>
>>>>>>>>>>> In Markus's change, emergency dump will not perform when Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>> However it will occur when coredump signal (e.g. SIGSEGV) throws to PID by `kill` command - main thread of the process will be already detached (out of JVM).
>>>>>>>>>>> Also the crash might happen in native thread - created by pthread_create (on Linux) from JNI code.
>>>>>>>>>>>
>>>>>>>>>>> Thus we should continue to perform emergency dump even if Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>
>>>>>>>>>> I didn't quite follow all that, but if there is no current thread then prepare_for_emergency_dump() is either going to assert here:
>>>>>>>>>>
>>>>>>>>>>   348   Thread* const thread = Thread::current();
>>>>>>>>>>
>>>>>>>>>> or crash here:
>>>>>>>>>>
>>>>>>>>>>   349   if (thread->is_Watcher_thread()) {
>>>>>>>>>
>>>>>>>>> Thanks David!
>>>>>>>>> I fixed it in new webrev:
>>>>>>>>>
>>>>>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>>>>>>>
>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> It works fine on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>>>>>>>> Hi Yasumasa and David,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. _thread_in_java transition to _thread_in_vm via stubs in SharedRuntime (i believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument now to give the premises where this invariant is enforced, so let's work with the original thread state as you suggested Yasumasa.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps we can do it like this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your patience investigating this
>>>>>>>>>>>>>
>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: David Holmes
>>>>>>>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus
>>>>>>>>>>>>> Gronlund <markus.gronlund at oracle.com>;
>>>>>>>>>>>>> hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>> Cc: yasuenag at gmail.com
>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not
>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>
>>>>>>>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think it is worth the disruption to pass the
>>>>>>>>>>>>> "thread" all the way through these API's. It is
>>>>>>>>>>>>> simpler/cleaner to just call
>>>>>>>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>>>>>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm),
>>>>>>>>>>>>> "invariant");
>>>>>>>>>>>>>
>>>>>>>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>>>>>>>> VMError::report_and_die() there is AFAICS absolutely no guarantee that we are in a JavaThread at all.
>>>>>>>>>>>>>
>>>>>>>>>>>>>    428 class ThreadInVMForJFR : public StackObj {
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> if (!guard_reentrancy()) {
>>>>>>>>>>>>>   return;
>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>>>>   JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>>>>>>>   if (!prepare_for_emergency_dump()) {
>>>>>>>>>>>>>     return;
>>>>>>>>>>>>>   }
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>>>>>>>> Correction ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That is more like I had envisaged for this. Reusing
>>>>>>>>>>>>>>>>> existing thread-state transition code is preferable to
>>>>>>>>>>>>>>>>> adding more custom code that directly manipulates thread-state.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its
>>>>>>>>>>>>>>>> arguments. So
>>>>>>>>>>>>>>>> Thread::current() might be different from it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Not sure what you mean. You only ever manipulate the
>>>>>>>>>>>>>>> thread state of the current thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In addition, ThreadInVMfromUnknown uses
>>>>>>>>>>>>>>>> transition_from_native() to change the thread state.
>>>>>>>>>>>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes it does - which would be a problem if a safepoint (or
>>>>>>>>>>>>>>> handshake) were pending. But the path through before_exit
>>>>>>>>>>>>>>> already has safepoint checks when you acquire the BeforeExit_lock.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But that isn't relevant. The issue is we don't want a
>>>>>>>>>>>>>> safepoint check on the report_and_die() path. So a custom
>>>>>>>>>>>>>> transition helper is needed to avoid that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The main problem with the suggestion is it seems we may
>>>>>>>>>>>>>>> not be running in a JavaThread:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   349   Thread* const thread = Thread::current();
>>>>>>>>>>>>>>>   350   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> so we can't use the existing thread-state helpers, unless
>>>>>>>>>>>>>>> we narrow the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thus I added ThreadInVMForJFR to my new webrev.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>;
>>>>>>>>>>>>>>>>>> yasuenag at gmail.com; Markus Gronlund
>>>>>>>>>>>>>>>>>> <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do
>>>>>>>>>>>>>>>>>> not recover thread state
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>>>>>>>>>>>> But it is unclear to me why the thread state is not recovered.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch
>>>>>>>>>>>>>>>>>>> which recovers it in JFR code.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not
>>>>>>>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>>>>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If JFR is running when the JVM crashes, JFR will dump
>>>>>>>>>>>>>>>>>>> data to hs_err_pid<PID>.jfr.
>>>>>>>>>>>>>>>>>>> This is done via prepare_for_emergency_dump().
>>>>>>>>>>>>>>>>>>> However, this function transitions the thread state to "_thread_in_vm".
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>>>>>>>> It failed at
>>>>>>>>>>>>>>>>>>> compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>>>>>>>> However this test is for JIT compiler, and related
>>>>>>>>>>>>>>>>>>> issue has been reported as JDK-8225620.
>>>>>>>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yasumasa

From markus.gronlund at oracle.com  Tue Nov  5 12:54:32 2019
From: markus.gronlund at oracle.com (Markus Gronlund)
Date: Tue, 5 Nov 2019 04:54:32 -0800 (PST)
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <c98ca024-7ec6-c378-01a7-49476151f428@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9d0a3618-1b21-ca7f-b17d-dee8577cd885@oracle.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
 <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
 <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>
 <5debfd23-ba51-6a37-52cc-476e2813e2fd@oss.nttdata.com>
 <851fa9e8-ddfa-7872-b937-596efae2d6e4@oracle.com>
 <8b3a28c0-3bff-84d2-2b3c-35944ff0bb94@oss.nttdata.com>
 <3b974302-ddd7-49b6-a9e2-aa4f9a8d0b58@default>
 <c98ca024-7ec6-c378-01a7-49476151f428@oss.nttdata.com>
Message-ID: <f0245ca4-d35a-4752-8fca-0e0e67399338@default>

The current dump mechanism reuses most of the regular logic because it has to perform quite a lot of work to construct a recording. For example, it needs to collect all tagged artifacts in the system (Klass, Method, ClassLoaderData, Symbols, Modules and more) so that the events in an emergency recording file are fully parsable. This is non-trivial and requires a thread that is at least part of the VM, with most thread-local data structures preserved.

We should remember that dumping an emergency recording is only a best-effort attempt. As an analogy, compare it to other routines in VMError::report() that are conditional and require a non-NULL thread object (printing the current Compile Task, printing the VM Operation, or even printing the JavaStack for a thread, for example).
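
To illustrate the best-effort idea, here is a minimal sketch, not the actual HotSpot code. MockThread, guard_reentrancy_sketch() and emergency_dump_sketch() are hypothetical names invented for this example, and the order of the two checks is an assumption of the sketch:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a VM thread object; not a HotSpot declaration.
struct MockThread { int id; };

// Set once the first dump attempt has started.
static std::atomic<bool> dump_in_progress{false};

// Returns true only for the first caller, so that a crash while dumping
// cannot re-enter the dump logic.
bool guard_reentrancy_sketch() {
  bool expected = false;
  return dump_in_progress.compare_exchange_strong(expected, true);
}

enum class DumpResult { Skipped, Dumped };

// Best effort: give up quietly when the preconditions are not met.
DumpResult emergency_dump_sketch(MockThread* current) {
  if (current == nullptr) {
    // No usable current thread (detached thread, or unreadable thread-local).
    return DumpResult::Skipped;
  }
  if (!guard_reentrancy_sketch()) {
    return DumpResult::Skipped;
  }
  // The real work (collecting artifacts, writing chunks) would go here.
  return DumpResult::Dumped;
}
```

With this shape, a NULL current thread simply skips the dump instead of asserting or crashing further down.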

With Event Streaming, which was recently checked in, it is easier (although still not easy) to extend this support so that the emergency dumper thread does not do so much internal VM work.
The reason is that Event Streaming changed the architecture a bit: existing data now moves to the chunk file more frequently (including artifacts), so it might be possible to create a non-VM-internal dump mechanism (limiting it mainly to closing the underlying file and copying chunks).

It is unclear at this point if the value-add is high enough to warrant the work.

Thanks
Markus


-----Original Message-----
From: Yasumasa Suenaga <suenaga at oss.nttdata.com> 
Sent: den 5 november 2019 12:56
To: Markus Gronlund <markus.gronlund at oracle.com>; David Holmes <david.holmes at oracle.com>
Cc: hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state

Hi Markus,

> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().

Do you mean that the dump would not be performed when Thread::current() returns NULL?
That happens when the crash occurs in a detached thread [1]. Can't we consider that case?


Thanks,

Yasumasa


[1] https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000822.html


On 2019/11/05 19:35, Markus Gronlund wrote:
> Hi again,
> 
> The comments in my previous email still apply:
> 
> "...although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already. I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change."
> 
> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().
> 2. You don't need to restore _thread_in_vm back to threads already running in the correct state. The purpose of the transition helper class is to move Java threads not running in _thread_in_vm (i.e. will be _thread_in_native). Move this logic to the fore to better clarify the intent of the helper class.
> 
> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
> 
> Thanks
> Markus
> 
> 
> 
> -----Original Message-----
> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
> Sent: den 5 november 2019 07:36
> To: David Holmes <david.holmes at oracle.com>; Markus Gronlund 
> <markus.gronlund at oracle.com>
> Cc: hotspot-jfr-dev at openjdk.java.net; 
> hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover 
> thread state
> 
> Thanks David!
> I'll wait for Markus's review.
> 
> 
> Yasumasa
> 
> 
> On 2019/11/05 15:19, David Holmes wrote:
>> On 5/11/2019 4:08 pm, Yasumasa Suenaga wrote:
>>> Hi David,
>>>
>>> Sorry, I was confused :)
>>> This is new webrev. Could you check again?
>>>
>>>   http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.04/
>>
>> Okay structurally I'm fine with that.
>>
>> What I don't know, and will leave to Markus to determine, is whether the rest of the code in on_vm_shutdown can actually execute okay if there is no current thread.
>>
>> Thanks,
>> David
>>
>>>
>>> Yasumasa
>>>
>>>
>>> On 2019/11/05 14:56, David Holmes wrote:
>>>> On 5/11/2019 3:48 pm, Yasumasa Suenaga wrote:
>>>>> On 2019/11/05 14:34, David Holmes wrote:
>>>>>> On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>
>>>>>>> prepare_for_emergency_dump() returns false if some critical locks could not unlock.
>>>>>>> So what should we return if NULL is passed as the argument? true?
>>>>>>
>>>>>> But you're not calling prepare_for_emergency_dump when the thread is NULL:
>>>>>>
>>>>>>   454   Thread* thread = Thread::current_or_null_safe();
>>>>>>   455
>>>>>>   456   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>   457   JavaThreadInVM jtivm(thread);
>>>>>>   458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
>>>>>>   459     return;
>>>>>>   460   }
>>>>>
>>>>> Oh, sorry, I made a mistake!
>>>>> I want to change it as below:
>>>>>
>>>>> ```
>>>>> +  Thread* thread = Thread::current_or_null_safe();
>>>>> +
>>>>> +  // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>> +  JavaThreadInVM jtivm(thread);
>>>>> +  if (thread != NULL) {
>>>>> +    if (!prepare_for_emergency_dump()) {
>>>>> +      return;
>>>>> +    }
>>>>> +  }
>>>>> ```
>>>>
>>>> but that is the same logic ??
>>>>
>>>>>> All I'm saying is that you pass "thread" as a parameter so you can then delete the existing call to Thread::current() that is inside prepare_for_emergency_dump.
>>>>>
>>>>> Is it preferable to pass "thread" to prepare_for_emergency_dump() instead of the above?
>>>>
>>>> ??? The two are not related. If you've already obtained the current thread you can pass it to prepare_for_emergency_dump and avoid the need to call Thread::current() (in whatever form) again. How you handle a NULL current thread is independent of that.
>>>>
>>>>> If so, I will push new changeset to submit repo, and will send new review request.
>>>>
>>>> I'd send the review request first and get agreement before wasting time on the submit repo.
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> It might break semantics of this function, so I did not add argument to prepare_for_emergency_dump() in this webrev.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2019/11/05 13:56, David Holmes wrote:
>>>>>>>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>>>>>>>> On 2019/11/05 9:17, David Holmes wrote:
>>>>>>>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>>>>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>
>>>>>>>>>>>> I thought of a similar change, and it is running on the submit repo:
>>>>>>>>>>>>
>>>>>>>>>>>>   http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>>>>>>>
>>>>>>>>>>>> If it passes all tests, I will send review request again.
>>>>>>>>>>>
>>>>>>>>>>> This change passed all tests on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>>>>>>>> Could you review again?
>>>>>>>>>>>
>>>>>>>>>>>   http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>>>>>>>
>>>>>>>>>>> In Markus's change, the emergency dump is not performed when Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>> However, that happens when a core-dump signal (e.g. SIGSEGV) is sent to the PID with the `kill` command, because the main thread of the process is already detached (outside the JVM).
>>>>>>>>>>> The crash might also happen in a native thread created by pthread_create (on Linux) from JNI code.
>>>>>>>>>>>
>>>>>>>>>>> Thus we should still perform the emergency dump even if Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>
>>>>>>>>>> I didn't quite follow all that, but if there is no current thread then prepare_for_emergency_dump() is either going to assert here:
>>>>>>>>>>
>>>>>>>>>>   348   Thread* const thread = Thread::current();
>>>>>>>>>>
>>>>>>>>>> or crash here:
>>>>>>>>>>
>>>>>>>>>>   349   if (thread->is_Watcher_thread()) {
>>>>>>>>>
>>>>>>>>> Thanks David!
>>>>>>>>> I fixed it in new webrev:
>>>>>>>>>
>>>>>>>>>   http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>>>>>>>
>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> It works fine on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>>>>>>>> Hi Yasumasa and David,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. _thread_in_java transition to _thread_in_vm via stubs in SharedRuntime (i believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument now to give the premises where this invariant is enforced, so let's work with the original thread state as you suggested Yasumasa.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps we can do it like this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your patience investigating this
>>>>>>>>>>>>>
>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: David Holmes
>>>>>>>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus 
>>>>>>>>>>>>> Gronlund <markus.gronlund at oracle.com>; 
>>>>>>>>>>>>> hotspot-jfr-dev at openjdk.java.net; 
>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>> Cc: yasuenag at gmail.com
>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not 
>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>
>>>>>>>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think it is worth the disruption to pass the 
>>>>>>>>>>>>> "thread" all the way through these API's. It is 
>>>>>>>>>>>>> simpler/cleaner to just call
>>>>>>>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>>>>>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm), 
>>>>>>>>>>>>> "invariant");
>>>>>>>>>>>>>
>>>>>>>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>>>>>>>> VMError::report_and_die() there is AFAICS absolutely no guarantee that we need be in a JavaThread at all.
>>>>>>>>>>>>>
>>>>>>>>>>>>>   428 class ThreadInVMForJFR : public StackObj {
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> if (!guard_reentrancy()) {
>>>>>>>>>>>>>   return;
>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>>>>   JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>>>>>>>   if (!prepare_for_emergency_dump()) {
>>>>>>>>>>>>>     return;
>>>>>>>>>>>>>   }
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>>>>>>>> Correction ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That is more like I had envisaged for this. Reusing 
>>>>>>>>>>>>>>>>> existing thread-state transition code is preferable to 
>>>>>>>>>>>>>>>>> adding more custom code that directly manipulates thread-state.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its 
>>>>>>>>>>>>>>>> arguments. So
>>>>>>>>>>>>>>>> Thread::current() might be different from it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Not sure what you mean. You only ever manipulate the 
>>>>>>>>>>>>>>> thread state of the current thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In addition, ThreadInVMfromUnknown uses
>>>>>>>>>>>>>>>> transition_from_native() to change the thread state.
>>>>>>>>>>>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes it does - which would be a problem if a safepoint 
>>>>>>>>>>>>>>> (or
>>>>>>>>>>>>>>> handshake) were pending. But the path through 
>>>>>>>>>>>>>>> before_exit already has safepoint checks when you acquire the BeforeExit_lock.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But that isn't relevant. The issue is we don't want a 
>>>>>>>>>>>>>> safepoint check on the report_and_die() path. So a custom 
>>>>>>>>>>>>>> transition helper is needed to avoid that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The main problem with the suggestion is it seems we may 
>>>>>>>>>>>>>>> not be running in a JavaThread:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   349   Thread* const thread = Thread::current();
>>>>>>>>>>>>>>>   350   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> so we can't use the existing thread-state helpers, 
>>>>>>>>>>>>>>> unless we narrow the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thus I added ThreadInVMForJFR to my new webrev.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net; 
>>>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; 
>>>>>>>>>>>>>>>>>> yasuenag at gmail.com; Markus Gronlund 
>>>>>>>>>>>>>>>>>> <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do 
>>>>>>>>>>>>>>>>>> not recover thread state
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This change passed all tests on submit repo 
>>>>>>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>>>>>>>>>>>> But it is unclear to me why the thread state is not recovered.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch
>>>>>>>>>>>>>>>>>>> which recovers it in JFR code.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not 
>>>>>>>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>>>>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If JFR is running when the JVM crashes, JFR will dump
>>>>>>>>>>>>>>>>>>> data to hs_err_pid<PID>.jfr.
>>>>>>>>>>>>>>>>>>> This is done via prepare_for_emergency_dump().
>>>>>>>>>>>>>>>>>>> However, this function transitions the thread state to "_thread_in_vm".
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This change has been tested on submit repo as 
>>>>>>>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>>>>>>>> It failed at
>>>>>>>>>>>>>>>>>>> compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>>>>>>>> However this test is for JIT compiler, and related 
>>>>>>>>>>>>>>>>>>> issue has been reported as JDK-8225620.
>>>>>>>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yasumasa

From suenaga at oss.nttdata.com  Tue Nov  5 13:34:28 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Tue, 5 Nov 2019 22:34:28 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <f0245ca4-d35a-4752-8fca-0e0e67399338@default>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <03ada71c-6a47-b0aa-cb22-60bc3e7b4f5a@oracle.com>
 <217f1155-bce1-127d-5c30-ed3cd2fccb8d@oracle.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
 <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
 <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>
 <5debfd23-ba51-6a37-52cc-476e2813e2fd@oss.nttdata.com>
 <851fa9e8-ddfa-7872-b937-596efae2d6e4@oracle.com>
 <8b3a28c0-3bff-84d2-2b3c-35944ff0bb94@oss.nttdata.com>
 <3b974302-ddd7-49b6-a9e2-aa4f9a8d0b58@default>
 <c98ca024-7ec6-c378-01a7-49476151f428@oss.nttdata.com>
 <f0245ca4-d35a-4752-8fca-0e0e67399338@default>
Message-ID: <a41eccb7-758b-dad9-5295-cb6d29c27c00@oss.nttdata.com>

Hi Markus,

Thanks for the explanation. I tweaked your webrev:

   http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/

If you and David are ok, I will push it.
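
For reference, the transition helper discussed in this thread can be sketched roughly as below. This is a simplified, self-contained illustration and not the code in the webrev: MockThread, ThreadState and JavaThreadInVMSketch are hypothetical stand-ins for the HotSpot types, and the state is stored directly so that no safepoint check is involved.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for HotSpot's thread and thread-state types.
enum ThreadState { thread_in_native, thread_in_vm, thread_in_java };

struct MockThread {
  bool is_java_thread;
  ThreadState state;
};

// RAII helper: puts a Java thread into thread_in_vm for the lifetime of
// the guard and restores the previous state only if it actually changed.
// A NULL thread or a non-Java thread is left untouched.
class JavaThreadInVMSketch {
  MockThread* const _thread;
  ThreadState _saved;
  bool _changed;

 public:
  explicit JavaThreadInVMSketch(MockThread* t)
      : _thread(t), _saved(thread_in_vm), _changed(false) {
    if (_thread != nullptr && _thread->is_java_thread &&
        _thread->state != thread_in_vm) {
      _saved = _thread->state;
      _thread->state = thread_in_vm;  // plain store, no transition logic
      _changed = true;
    }
  }

  ~JavaThreadInVMSketch() {
    if (_changed) {
      _thread->state = _saved;  // undo only what the constructor did
    }
  }

  bool changed() const { return _changed; }
};
```

A thread that is already in thread_in_vm goes through the guard without any write, which is the "restore only if there was an actual change" point Markus made.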


> The reason is that Event Streaming changed the architecture a bit where existing data will move to the chunk file more frequently (including artifacts) and it might therefore be possible to create a non-VM internal dump mechanism (bounding it to mainly closing the underlying file and copying chunks).

I think it would be very helpful for troubleshooting!
If the crash report shows the location of the JFR file or repository, it is even more useful.
So I've sent a review request for it (JDK-8233373):

   https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000808.html


Yasumasa


On 2019/11/05 21:54, Markus Gronlund wrote:
> The current dump mechanism is reusing most of the regular logic because it has to perform quite a lot of work to construct a recording. For example, it needs to collect all tagged artifacts in the system (Klass, Method, ClassLoaderData, Symbols, Modules and more) to have the events in an emergency recording file be fully parsable. This is non-trivial requiring a thread to at least be part of the VM, with most thread local data structures preserved.
> 
> We should remember that dumping an emergency recording is only a best-effort attempt. As an analogy, compare it to other routines in VMError::report() that are conditional and require a non-NULL thread object (printing the current Compile Task, printing the VM Operation, or even printing the JavaStack for a thread, for example).
> 
> With Event Streaming that was recently checked in, it is easier (although not easy, but easier compared to before ) to extend this support to not have the emergency dumper thread do so much internal VM work.
> The reason is that Event Streaming changed the architecture a bit where existing data will move to the chunk file more frequently (including artifacts) and it might therefore be possible to create a non-VM internal dump mechanism (bounding it to mainly closing the underlying file and copying chunks).
> 
> It is unclear at this point if the value-add is high enough to warrant the work.
> 
> Thanks
> Markus
> 
> 
> -----Original Message-----
> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
> Sent: den 5 november 2019 12:56
> To: Markus Gronlund <markus.gronlund at oracle.com>; David Holmes <david.holmes at oracle.com>
> Cc: hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
> 
> Hi Markus,
> 
>> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().
> 
> Do you mean that the dump would not be performed when Thread::current() returns NULL?
> That happens when the crash occurs in a detached thread [1]. Can't we consider that case?
> 
> 
> Thanks,
> 
> Yasumasa
> 
> 
> [1] https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000822.html
> 
> 
> On 2019/11/05 19:35, Markus Gronlund wrote:
>> Hi again,
>>
>> The comments in my previous email still apply:
>>
>> "...although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already. I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change."
>>
>> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().
>> 2. You don't need to restore _thread_in_vm back to threads already running in the correct state. The purpose of the transition helper class is to move Java threads not running in _thread_in_vm (i.e. will be _thread_in_native). Move this logic to the fore to better clarify the intent of the helper class.
>>
>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>
>> Thanks
>> Markus
>>
>>
>>
>> -----Original Message-----
>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>> Sent: den 5 november 2019 07:36
>> To: David Holmes <david.holmes at oracle.com>; Markus Gronlund
>> <markus.gronlund at oracle.com>
>> Cc: hotspot-jfr-dev at openjdk.java.net;
>> hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>> thread state
>>
>> Thanks David!
>> I wait for Markus's review.
>>
>>
>> Yasumasa
>>
>>
>> On 2019/11/05 15:19, David Holmes wrote:
>>> On 5/11/2019 4:08 pm, Yasumasa Suenaga wrote:
>>>> Hi David,
>>>>
>>>> Sorry, I was confused :)
>>>> This is new webrev. Could you check again?
>>>>
>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.04/
>>>
>>> Okay structurally I'm fine with that.
>>>
>>> What I don't know, and will leave to Markus to determine, is whether the rest of the code in on_vm_shutdown can actually execute okay if there is no current thread.
>>>
>>> Thanks,
>>> David
>>>
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2019/11/05 14:56, David Holmes wrote:
>>>>> On 5/11/2019 3:48 pm, Yasumasa Suenaga wrote:
>>>>>> On 2019/11/05 14:34, David Holmes wrote:
>>>>>>> On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>>
>>>>>>>> prepare_for_emergency_dump() returns false if some critical locks could not be unlocked.
>>>>>>>> So what should we return if NULL is passed as the argument? true?
>>>>>>>
>>>>>>> But you're not calling prepare_for_emergency_dump when the thread is NULL:
>>>>>>>
>>>>>>>    454   Thread* thread = Thread::current_or_null_safe();
>>>>>>>    455
>>>>>>>    456   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>    457   JavaThreadInVM jtivm(thread);
>>>>>>>    458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
>>>>>>>    459     return;
>>>>>>>    460   }
>>>>>>
>>>>>> Oh, sorry, I have a mistake!
>>>>>> I want to change as below:
>>>>>>
>>>>>> ```
>>>>>> +  Thread* thread = Thread::current_or_null_safe();
>>>>>> +
>>>>>> +  // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>> +  JavaThreadInVM jtivm(thread);
>>>>>> +  if (thread != NULL) {
>>>>>> +    if (!prepare_for_emergency_dump()) {
>>>>>> +      return;
>>>>>> +    }
>>>>>> +  }
>>>>>> ```
>>>>>
>>>>> but that is the same logic ??
>>>>>
>>>>>>> All I'm saying is that you pass "thread" as a parameter so you can then delete the existing call to Thread::current() that is inside prepare_for_emergency_dump.
>>>>>>
>>>>>> Is it preferable to pass "thread" to prepare_for_emergency_dump() instead of the above?
>>>>>
>>>>> ??? The two are not related. If you've already obtained the current thread you can pass it to prepare_for_emergency_dump and avoid the need to call Thread::current() (in whatever form) again. How you handle a NULL current thread is independent of that.
>>>>>
>>>>>> If so, I will push new changeset to submit repo, and will send new review request.
>>>>>
>>>>> I'd send the review request first and get agreement before wasting time on the submit repo.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> It might break semantics of this function, so I did not add argument to prepare_for_emergency_dump() in this webrev.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2019/11/05 13:56, David Holmes wrote:
>>>>>>>>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>>>>>>>>> On 2019/11/05 9:17, David Holmes wrote:
>>>>>>>>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I thought similar change, and it is running on submit repo:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>>>>>>>>
>>>>>>>>>>>>> If it passes all tests, I will send review request again.
>>>>>>>>>>>>
>>>>>>>>>>>> This change passed all tests on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>>>>>>>>> Could you review again?
>>>>>>>>>>>>
>>>>>>>>>>>>      
>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>>>>>>>>
>>>>>>>>>>>> In Markus's change, the emergency dump will not be performed when Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>>> However, that will happen when a core-dumping signal (e.g. SIGSEGV) is sent to the PID with the `kill` command: the main thread of the process will already be detached (outside the JVM).
>>>>>>>>>>>> The crash might also happen in a native thread created by pthread_create (on Linux) from JNI code.
>>>>>>>>>>>>
>>>>>>>>>>>> Thus we should continue to perform the emergency dump even if Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>>
>>>>>>>>>>> I didn't quite follow all that, but if there is no current thread then prepare_for_emergency_dump() is either going to assert here:
>>>>>>>>>>>
>>>>>>>>>>>    348   Thread* const thread = Thread::current();
>>>>>>>>>>>
>>>>>>>>>>> or crash here:
>>>>>>>>>>>
>>>>>>>>>>>    349   if (thread->is_Watcher_thread()) {
>>>>>>>>>>
>>>>>>>>>> Thanks David!
>>>>>>>>>> I fixed it in new webrev:
>>>>>>>>>>
>>>>>>>>>>      
>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>>>>>>>>
>>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> It works fine on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>>>>>>>>> Hi Yasumasa and David,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. _thread_in_java transition to _thread_in_vm via stubs in SharedRuntime (i believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument now to give the premises where this invariant is enforced, so let's work with the original thread state as you suggested Yasumasa.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Perhaps we can do it like this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your patience investigating this
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: David Holmes
>>>>>>>>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>>>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus
>>>>>>>>>>>>>> Gronlund <markus.gronlund at oracle.com>;
>>>>>>>>>>>>>> hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>> Cc: yasuenag at gmail.com
>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not
>>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't think it is worth the disruption to pass the
>>>>>>>>>>>>>> "thread" all the way through these API's. It is
>>>>>>>>>>>>>> simpler/cleaner to just call
>>>>>>>>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>>>>>>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm),
>>>>>>>>>>>>>> "invariant");
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>>>>>>>>> VMError::report_and_die() there is AFAICS absolutely no guarantee that we need be in a JavaThread at all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    428 class ThreadInVMForJFR : public StackObj {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> if (!guard_reentrancy()) {
>>>>>>>>>>>>>>   return;
>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>>>>>   JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>>>>>>>>   if (!prepare_for_emergency_dump()) {
>>>>>>>>>>>>>>     return;
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>>>>>>>>> Correction ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> That is more like I had envisaged for this. Reusing
>>>>>>>>>>>>>>>>>> existing thread-state transition code is preferable to
>>>>>>>>>>>>>>>>>> adding more custom code that directly manipulates thread-state.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its
>>>>>>>>>>>>>>>>> arguments. So
>>>>>>>>>>>>>>>>> Thread::current() might be different with it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Not sure what you mean. You only ever manipulate the
>>>>>>>>>>>>>>>> thread state of the current thread.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In addition, ThreadInVMfromUnknown uses
>>>>>>>>>>>>>>>>> transition_from_native() to change the thread state.
>>>>>>>>>>>>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes it does - which would be a problem if a safepoint
>>>>>>>>>>>>>>>> (or
>>>>>>>>>>>>>>>> handshake) were pending. But the path through
>>>>>>>>>>>>>>>> before_exit already has safepoint checks when you acquire the BeforeExit_lock.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But that isn't relevant. The issue is we don't want a
>>>>>>>>>>>>>>> safepoint check on the report_and_die() path. So a custom
>>>>>>>>>>>>>>> transition helper is needed to avoid that.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The main problem with the suggestion is it seems we may
>>>>>>>>>>>>>>>> not be running in a JavaThread:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    349   Thread* const thread = Thread::current();
>>>>>>>>>>>>>>>>    350   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> so we can't use the existing thread-state helpers,
>>>>>>>>>>>>>>>> unless we narrow the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thus I added ThreadInVMForJFR to new my webrev.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>;
>>>>>>>>>>>>>>>>>>> yasuenag at gmail.com; Markus Gronlund
>>>>>>>>>>>>>>>>>>> <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do
>>>>>>>>>>>>>>>>>>> not recover thread state
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>>>>>>>>>>>>> But it is unclear to me why the thread state is not recovered.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch
>>>>>>>>>>>>>>>>>>>> which recover it in JFR code.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not
>>>>>>>>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>>>>>>>>     webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If JFR is running when the JVM crashes, JFR will dump
>>>>>>>>>>>>>>>>>>>> data to hs_err_pid<PID>.jfr.
>>>>>>>>>>>>>>>>>>>> This happens in prepare_for_emergency_dump().
>>>>>>>>>>>>>>>>>>>> However, this function transitions the thread state to "_thread_in_vm".
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>>>>>>>>> It failed at
>>>>>>>>>>>>>>>>>>>> compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>>>>>>>>> However this test is for JIT compiler, and related
>>>>>>>>>>>>>>>>>>>> issue has been reported as JDK-8225620.
>>>>>>>>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yasumasa

From suenaga at oss.nttdata.com  Tue Nov  5 13:40:10 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Tue, 5 Nov 2019 22:40:10 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <a41eccb7-758b-dad9-5295-cb6d29c27c00@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <e72e8009-e58b-4e9c-8796-4f23ba4c18e5@default>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
 <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
 <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>
 <5debfd23-ba51-6a37-52cc-476e2813e2fd@oss.nttdata.com>
 <851fa9e8-ddfa-7872-b937-596efae2d6e4@oracle.com>
 <8b3a28c0-3bff-84d2-2b3c-35944ff0bb94@oss.nttdata.com>
 <3b974302-ddd7-49b6-a9e2-aa4f9a8d0b58@default>
 <c98ca024-7ec6-c378-01a7-49476151f428@oss.nttdata.com>
 <f0245ca4-d35a-4752-8fca-0e0e67399338@default>
 <a41eccb7-758b-dad9-5295-cb6d29c27c00@oss.nttdata.com>
Message-ID: <458e9a65-9d05-f237-9fa5-63ff69515daf@oss.nttdata.com>

On 2019/11/05 22:34, Yasumasa Suenaga wrote:
> Hi Markus,
> 
> Thanks for explanation. I tweaked your webrev:
> 
>    http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/

Sorry, the webrev is here:

   http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.05/
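
For anyone reading without the webrev at hand, the idea discussed in this thread can be sketched roughly as below. This is only a standalone illustration, not the code in the webrev: MockThread, ThreadState, ScopedInVM and try_emergency_dump are hypothetical stand-ins for the HotSpot types and for JavaThreadInVM, and the dump itself is stubbed out. It assumes the behaviour Markus described: bail out up front when there is no current thread, and restore the thread state only if it was actually changed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for HotSpot types; not the real JavaThread API.
enum ThreadState { THREAD_IN_NATIVE, THREAD_IN_VM };

struct MockThread {
  bool is_java_thread;
  ThreadState state;
};

// RAII helper: moves a Java thread to THREAD_IN_VM for the enclosing scope
// and restores the original state only if a change was actually made.
class ScopedInVM {
  MockThread* const _thread;
  ThreadState _original;
  bool _changed;

 public:
  explicit ScopedInVM(MockThread* t)
      : _thread(t), _original(THREAD_IN_VM), _changed(false) {
    // Non-Java threads and threads already in the VM are left untouched.
    if (t != NULL && t->is_java_thread && t->state != THREAD_IN_VM) {
      _original = t->state;
      t->state = THREAD_IN_VM;
      _changed = true;
    }
  }

  ~ScopedInVM() {
    if (_changed) {
      _thread->state = _original;
    }
  }

  bool transitioned() const { return _changed; }

  // Copying would restore the state twice, so forbid it.
  ScopedInVM(const ScopedInVM&) = delete;
  ScopedInVM& operator=(const ScopedInVM&) = delete;
};

// Sketch of the call site: give up if there is no current thread,
// otherwise run the (stubbed) dump work with the state adjusted.
bool try_emergency_dump(MockThread* current) {
  if (current == NULL) {
    return false;  // no usable thread-local, so do not attempt the dump
  }
  ScopedInVM in_vm(current);
  // ... the dump work would go here ...
  return true;
}
```

With this sketch, a Java thread that enters in THREAD_IN_NATIVE is in THREAD_IN_VM only for the duration of try_emergency_dump() and is back in THREAD_IN_NATIVE afterwards, while a NULL current thread makes the call return false without touching anything.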

Yasumasa

> If you and David are ok, I will push it.
> 
> 
>> The reason is that Event Streaming changed the architecture a bit where existing data will move to the chunk file more frequently (including artifacts) and it might therefore be possible to create a non-VM internal dump mechanism (bounding it to mainly closing the underlying file and copying chunks).
> 
> I think it is very helpful for troubleshooting!
> If crash report shows the location of JFR file or repository, it is more useful.
> So I've sent review request for it (JDK-8233373):
> 
>    https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000808.html
> 
> 
> Yasumasa
> 
> 
> On 2019/11/05 21:54, Markus Gronlund wrote:
>> The current dump mechanism is reusing most of the regular logic because it has to perform quite a lot of work to construct a recording. For example, it needs to collect all tagged artifacts in the system (Klass, Method, ClassLoaderData, Symbols, Modules and more) to have the events in an emergency recording file be fully parsable. This is non-trivial requiring a thread to at least be part of the VM, with most thread local data structures preserved.
>>
>> We should remember that dumping an emergency recording is only a best effort attempt. As an analogy, compare it to other routines in VMError::report() that are conditional and require a non-NULL thread object (print the current Compile Task, print VM Operation or even printing the JavaStack for a thread for example).
>>
>> With Event Streaming that was recently checked in, it is easier (although still not easy) to extend this support so that the emergency dumper thread does not have to do so much internal VM work.
>> The reason is that Event Streaming changed the architecture a bit where existing data will move to the chunk file more frequently (including artifacts) and it might therefore be possible to create a non-VM internal dump mechanism (bounding it to mainly closing the underlying file and copying chunks).
>>
>> It is unclear at this point if the value-add is high enough to warrant the work.
>>
>> Thanks
>> Markus
>>
>>
>> -----Original Message-----
>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>> Sent: den 5 november 2019 12:56
>> To: Markus Gronlund <markus.gronlund at oracle.com>; David Holmes <david.holmes at oracle.com>
>> Cc: hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
>>
>> Hi Markus,
>>
>>> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().
>>
>> Do you mean that the dump would not be performed when Thread::current() returns NULL?
>> That will happen when a crash occurs in a detached thread [1]. Shouldn't we consider that case?
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000822.html
>>
>>
>> On 2019/11/05 19:35, Markus Gronlund wrote:
>>> Hi again,
>>>
>>> The comments in my previous email still apply:
>>>
>>> "...although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already. I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change."
>>>
>>> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().
>>> 2. You don't need to restore _thread_in_vm back to threads already running in the correct state. The purpose of the transition helper class is to move Java threads not running in _thread_in_vm (i.e. will be _thread_in_native). Move this logic to the fore to better clarify the intent of the helper class.
>>>
>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>
>>> Thanks
>>> Markus
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>> Sent: den 5 november 2019 07:36
>>> To: David Holmes <david.holmes at oracle.com>; Markus Gronlund
>>> <markus.gronlund at oracle.com>
>>> Cc: hotspot-jfr-dev at openjdk.java.net;
>>> hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>>> thread state
>>>
>>> Thanks David!
>>> I wait for Markus's review.
>>>
>>>
>>> Yasumasa
>>>
>>>
>>> On 2019/11/05 15:19, David Holmes wrote:
>>>> On 5/11/2019 4:08 pm, Yasumasa Suenaga wrote:
>>>>> Hi David,
>>>>>
>>>>> Sorry, I was confused :)
>>>>> This is new webrev. Could you check again?
>>>>>
>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.04/
>>>>
>>>> Okay structurally I'm fine with that.
>>>>
>>>> What I don't know, and will leave to Markus to determine, is whether the rest of the code in on_vm_shutdown can actually execute okay if there is no current thread.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2019/11/05 14:56, David Holmes wrote:
>>>>>> On 5/11/2019 3:48 pm, Yasumasa Suenaga wrote:
>>>>>>> On 2019/11/05 14:34, David Holmes wrote:
>>>>>>>> On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>>>
>>>>>>>>> prepare_for_emergency_dump() returns false if some critical locks could not be unlocked.
>>>>>>>>> So what should we return if NULL is passed as the argument? true?
>>>>>>>>
>>>>>>>> But you're not calling prepare_for_emergency_dump when the thread is NULL:
>>>>>>>>
>>>>>>>>    454   Thread* thread = Thread::current_or_null_safe();
>>>>>>>>    455
>>>>>>>>    456   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>    457   JavaThreadInVM jtivm(thread);
>>>>>>>>    458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
>>>>>>>>    459     return;
>>>>>>>>    460   }
>>>>>>>
>>>>>>> Oh, sorry, I have a mistake!
>>>>>>> I want to change as below:
>>>>>>>
>>>>>>> ```
>>>>>>> +  Thread* thread = Thread::current_or_null_safe();
>>>>>>> +
>>>>>>> +  // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>> +  JavaThreadInVM jtivm(thread);
>>>>>>> +  if (thread != NULL) {
>>>>>>> +    if (!prepare_for_emergency_dump()) {
>>>>>>> +      return;
>>>>>>> +    }
>>>>>>> +  }
>>>>>>> ```
>>>>>>
>>>>>> but that is the same logic ??
>>>>>>
>>>>>>>> All I'm saying is that you pass "thread" as a parameter so you can then delete the existing call to Thread::current() that is inside prepare_for_emergency_dump.
>>>>>>>
>>>>>>> Is it preferable to pass "thread" to prepare_for_emergency_dump() instead of the above?
>>>>>>
>>>>>> ??? The two are not related. If you've already obtained the current thread you can pass it to prepare_for_emergency_dump and avoid the need to call Thread::current() (in whatever form) again. How you handle a NULL current thread is independent of that.
>>>>>>
>>>>>>> If so, I will push new changeset to submit repo, and will send new review request.
>>>>>>
>>>>>> I'd send the review request first and get agreement before wasting time on the submit repo.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> It might break semantics of this function, so I did not add argument to prepare_for_emergency_dump() in this webrev.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2019/11/05 13:56, David Holmes wrote:
>>>>>>>>>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>> On 2019/11/05 9:17, David Holmes wrote:
>>>>>>>>>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I thought similar change, and it is running on submit repo:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If it passes all tests, I will send review request again.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This change passed all tests on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>>>>>>>>>> Could you review again?
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>>>>>>>>>
>>>>>>>>>>>>> In Markus's change, the emergency dump will not be performed when Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>>>> However, that will happen when a core-dumping signal (e.g. SIGSEGV) is sent to the PID with the `kill` command: the main thread of the process will already be detached (outside the JVM).
>>>>>>>>>>>>> The crash might also happen in a native thread created by pthread_create (on Linux) from JNI code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thus we should continue to perform the emergency dump even if Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>>>
>>>>>>>>>>>> I didn't quite follow all that, but if there is no current thread then prepare_for_emergency_dump() is either going to assert here:
>>>>>>>>>>>>
>>>>>>>>>>>>    348   Thread* const thread = Thread::current();
>>>>>>>>>>>>
>>>>>>>>>>>> or crash here:
>>>>>>>>>>>>
>>>>>>>>>>>>    349   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>
>>>>>>>>>>> Thanks David!
>>>>>>>>>>> I fixed it in new webrev:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>>>>>>>>>
>>>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>> It works fine on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> David
>>>>>>>>>>>> -----
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>>>>>>>>>> Hi Yasumasa and David,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. _thread_in_java transition to _thread_in_vm via stubs in SharedRuntime (i believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument now to give the premises where this invariant is enforced, so let's work with the original thread state as you suggested Yasumasa.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Perhaps we can do it like this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your patience investigating this
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: David Holmes
>>>>>>>>>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>>>>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus
>>>>>>>>>>>>>>> Gronlund <markus.gronlund at oracle.com>;
>>>>>>>>>>>>>>> hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>> Cc: yasuenag at gmail.com
>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not
>>>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't think it is worth the disruption to pass the
>>>>>>>>>>>>>>> "thread" all the way through these API's. It is
>>>>>>>>>>>>>>> simpler/cleaner to just call
>>>>>>>>>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>>>>>>>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm),
>>>>>>>>>>>>>>> "invariant");
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>>>>>>>>>> VMError::report_and_die() there is AFAICS absolutely no guarantee that we are in a JavaThread at all.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     428 class ThreadInVMForJFR : public StackObj {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> if (!guard_reentrancy()) {
>>>>>>>>>>>>>>>     return;
>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>>>>>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>>>>>>>>>>>       return;
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>>>>>>>>>> Correction ...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> That is more like I had envisaged for this. Reusing
>>>>>>>>>>>>>>>>>>> existing thread-state transition code is preferable to
>>>>>>>>>>>>>>>>>>> adding more custom code that directly manipulates thread-state.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its
>>>>>>>>>>>>>>>>>> arguments. So
>>>>>>>>>>>>>>>>>> Thread::current() might be different from it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Not sure what you mean. You only ever manipulate the
>>>>>>>>>>>>>>>>> thread state of the current thread.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In addition, ThreadInVMfromUnknown uses
>>>>>>>>>>>>>>>>>> transition_from_native() to change the thread state.
>>>>>>>>>>>>>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes it does - which would be a problem if a safepoint
>>>>>>>>>>>>>>>>> (or
>>>>>>>>>>>>>>>>> handshake) were pending. But the path through
>>>>>>>>>>>>>>>>> before_exit already has safepoint checks when you acquire the BeforeExit_lock.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> But that isn't relevant. The issue is we don't want a
>>>>>>>>>>>>>>>> safepoint check on the report_and_die() path. So a custom
>>>>>>>>>>>>>>>> transition helper is needed to avoid that.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The main problem with the suggestion is it seems we may
>>>>>>>>>>>>>>>>> not be running in a JavaThread:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     349   Thread* const thread = Thread::current();
>>>>>>>>>>>>>>>>>     350   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> so we can't use the existing thread-state helpers,
>>>>>>>>>>>>>>>>> unless we narrow the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thus I added ThreadInVMForJFR to my new webrev.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>;
>>>>>>>>>>>>>>>>>>>> yasuenag at gmail.com; Markus Gronlund
>>>>>>>>>>>>>>>>>>>> <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do
>>>>>>>>>>>>>>>>>>>> not recover thread state
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>>>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>>>>>>>>>>>>>> But I'm unclear why the thread state is not recovered.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch
>>>>>>>>>>>>>>>>>>>>> which recovers it in JFR code.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not
>>>>>>>>>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     JBS:
>>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>>>>>>>>>     webrev:
>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> If JFR is running when the JVM crashes, JFR will dump
>>>>>>>>>>>>>>>>>>>>> data to hs_err_pid<PID>.jfr .
>>>>>>>>>>>>>>>>>>>>> This is performed in prepare_for_emergency_dump().
>>>>>>>>>>>>>>>>>>>>> However, this function changes the thread state to "_thread_in_vm"
>>>>>>>>>>>>>>>>>>>>> and does not restore it.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>>>>>>>>>> It failed at
>>>>>>>>>>>>>>>>>>>>> compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>>>>>>>>>> However this test is for JIT compiler, and related
>>>>>>>>>>>>>>>>>>>>> issue has been reported as JDK-8225620.
>>>>>>>>>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yasumasa
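
The scoped transition helper discussed in the quoted thread above (act only
on a JavaThread that is in native state, tolerate a missing current thread,
and restore the state only if it was actually changed) can be sketched in
isolation. This is a minimal illustration, not the code from either webrev:
MockThread, ThreadState, ScopedInVM and state_inside_guard are hypothetical
stand-ins, and the real HotSpot thread classes are not used.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot's thread types; not the real API.
enum ThreadState { thread_in_native, thread_in_vm, thread_in_java };

struct MockThread {
  bool is_java_thread;
  ThreadState state;
};

// Scoped guard: moves a Java thread that is in native state to in-VM state
// and restores the original state on scope exit, but only if it changed it.
class ScopedInVM {
  MockThread* const _thread;  // may be null, e.g. if thread-local lookup failed
  ThreadState _original;
  bool _changed;

 public:
  explicit ScopedInVM(MockThread* t)
      : _thread(t), _original(thread_in_vm), _changed(false) {
    if (_thread != nullptr && _thread->is_java_thread &&
        _thread->state == thread_in_native) {
      _original = _thread->state;
      _thread->state = thread_in_vm;  // plain store, no safepoint check
      _changed = true;
    }
  }

  ~ScopedInVM() {
    if (_changed) {
      _thread->state = _original;  // restore only what we modified
    }
  }

  ScopedInVM(const ScopedInVM&) = delete;
  ScopedInVM& operator=(const ScopedInVM&) = delete;
};

// Returns the state observed while the guard is active.
inline ThreadState state_inside_guard(MockThread* t) {
  ScopedInVM guard(t);
  return t != nullptr ? t->state : thread_in_vm;
}
```

A non-Java thread (such as the WatcherThread case mentioned above) and a null
thread pointer both pass through the guard untouched in this sketch.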

From thomas.stuefe at gmail.com  Tue Nov  5 13:42:32 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Tue, 5 Nov 2019 14:42:32 +0100
Subject: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <e7f23e19-f208-0a8a-56ac-4be23deabc65@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <CAA-vtUwwJJtJsc1Um0ruorZs34=zKJdWat+02-fmpkajzAfV3Q@mail.gmail.com>
 <e7f23e19-f208-0a8a-56ac-4be23deabc65@oss.nttdata.com>
Message-ID: <CAA-vtUyKcSrmvZgCmmdZs74nQQ8_NN0D0=mFdrxDKq8rZwq=Qw@mail.gmail.com>

On Sat, Nov 2, 2019 at 1:46 PM Yasumasa Suenaga <suenaga at oss.nttdata.com>
wrote:

> Hi Thomas,
>
> I agree with you. Also CI Replay will be generated after NMT report.
> But I think it should be another issue.
>
> If that is OK with you, I will file it in JBS and create a patch.
>
>
>
Why not fix it in this patch, while you are on it?

..Thomas


> Thanks,
>
> Yasumasa
>
>
> On 2019/11/01 19:36, Thomas Stüfe wrote:
> > Hi Yasumasa,
> >
> > I see that we do JFR::on_vm_shutdown() before error reporting ran. Is
> that really necessary? Error reporting should happen as close as possible
> to the error point - ideally, as little code as possible should run between
> the crash/assert and the generation of the hs-err file. I suggest moving
> the call to JFR::on_vm_shutdown()
> > down to a point after error reporting, e.g. to where we print the NMT
> report on shutdown.
> >
> > Cheers, Thomas
> >
> >
> > On Fri, Nov 1, 2019 at 10:41 AM Yasumasa Suenaga <suenaga at oss.nttdata.com> wrote:
> >
> >     Forward to hotspot-runtime-dev.
> >
> >     As David commented in JBS, it may need to be fixed in JFR code.
> >     But I'm unclear why the thread state is not recovered.
> >
> >     I'd like to hear about this from JFR folks.
> >     If it is just a bug in JFR, I will create a patch which recovers it
> in JFR code.
> >
> >
> >     Thanks,
> >
> >     Yasumasa
> >
> >
> >     -------- Forwarded Message --------
> >     Subject: RFR: 8233375: JFR emergency dump do not recover thread state
> >     Date: Fri, 1 Nov 2019 17:08:42 +0900
> >     From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
> >     To: hotspot-jfr-dev at openjdk.java.net
> >     CC: yasuenag at gmail.com <yasuenag at gmail.com>
> >
> >     Hi all,
> >
> >     Please review this change:
> >
> >         JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
> >         webrev:
> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
> >
> >     If JFR is running when the JVM crashes, JFR will dump data to
> hs_err_pid<PID>.jfr .
> >     This is performed in prepare_for_emergency_dump().
> >     However, this function changes the thread state to "_thread_in_vm"
> >     and does not restore it.
> >
> >     This change has been tested on submit repo as
> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
> >     It failed at compiler/types/correctness/CorrectnessTest.java
> >     However this test is for JIT compiler, and related issue has been
> reported as JDK-8225620.
> >     So I think this patch can go through.
> >
> >
> >     Thanks,
> >
> >     Yasumasa
> >
>

From harold.seigel at oracle.com  Tue Nov  5 14:27:17 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Tue, 5 Nov 2019 09:27:17 -0500
Subject: RFR: CSR JVM support for records
Message-ID: <fb8c0662-78c0-c442-3fd8-db3178c27127@oracle.com>

Hi,

Please review this draft of the CSR for JVM support for records.

CSR: https://bugs.openjdk.java.net/browse/JDK-8233595

Thanks, Harold


From erik.osterlund at oracle.com  Tue Nov  5 14:34:31 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Tue, 5 Nov 2019 15:34:31 +0100
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <9c3f8b52-88da-33d3-8737-db3854d6c20c@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <58713599-b5ea-57bb-0413-fce3b8bd5d2f@oracle.com>
 <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
 <7388c7fc-39c4-1ec6-1608-02b08e562ab3@oracle.com>
 <dbffc304-e84f-b1ec-b997-7978c1bced6f@oracle.com>
 <17b0ec76-6f32-fd7d-6486-4df21582ce03@oracle.com>
 <2c16521b-5694-7f6e-0f54-ee2bddf5563f@oracle.com>
 <b692dcca-a676-f462-67cb-21f64b05923e@oracle.com>
 <9c3f8b52-88da-33d3-8737-db3854d6c20c@oracle.com>
Message-ID: <86713b7c-3842-7a9e-89e0-76c3e1c3d4ea@oracle.com>

Sounds good.

Thanks,
/Erik

On 11/5/19 5:50 AM, David Holmes wrote:
> On 5/11/2019 11:34 am, Daniel D. Daugherty wrote:
>> I can do that. I assume delete them in the Async Monitor Deflation
>> code and delete them in threadSMR.cpp. For the threadSMR.cpp, I'll
>> roll that change into my latest baseline cleanup bug:
>>
>>      JDK-8230876 baseline cleanups from Async Monitor Deflation v2.07
>>      https://bugs.openjdk.java.net/browse/JDK-8230876
>
> Sure.
>
> Thanks,
> David
> -----
>
>> Erik, please chime in here... Thanks!
>>
>> Dan
>>
>>
>> On 11/4/19 6:44 PM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> Just delete the comments.
>>>
>>> Thanks,
>>> David
>>>
>>> On 5/11/2019 7:25 am, Daniel D. Daugherty wrote:
>>>> Hi David and Erik,
>>>>
>>>> Thanks for chiming in here Erik...
>>>>
>>>> This set of comments is not addressed in the CR8/v2.08/11-for-jdk14
>>>> code review request that I just sent out.
>>>>
>>>> I've read this response twice and I'm not quite sure what to do 
>>>> with it
>>>> relative to David's CR comment. I'll repeat those here:
>>>>
>>>>  >  199 // The decrement only needs to be MO_ACQ_REL since the reference
>>>>  >  200   // counter is volatile.
>>>>  >  201   Atomic::dec(&_ref_count);
>>>>  >
>>>>  > volatile is irrelevant with regards to memory ordering as it is a compiler
>>>>  > annotation. And you haven't specified any memory order value so the default
>>>>  > is conservative ie. implied full fence. (I see the same incorrect comment
>>>>  > is in threadSMR.cpp!)
>>>>
>>>> Should I delete this comment? Or should it be changed? If changed, 
>>>> then
>>>> what text do you recommend here?
>>>>
>>>>
>>>>  > 208   // The increment needs to be MO_SEQ_CST so that the reference
>>>>  >  209   // counter update is seen as soon as possible in a race with the
>>>>  >  210   // async deflation protocol.
>>>>  >  211   Atomic::inc(&_ref_count);
>>>>  >
>>>>  > Ditto you haven't specified any ordering - and inc() and dec() will have the same default.
>>>>
>>>> Should I delete this comment? Or should it be changed? If changed, 
>>>> then
>>>> what text do you recommend here?
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 11/4/19 8:09 AM, erik.osterlund at oracle.com wrote:
>>>>> Hi,
>>>>>
>>>>> TL/DR: David is right; the commentary is weird and does not 
>>>>> capture what the real constraints are.
>>>>>
>>>>> As the comment implied before "8222034: Thread-SMR functions 
>>>>> should be updated to remove work around", the PPC port used to 
>>>>> have incorrect memory ordering, and the code guarded against that. 
>>>>> inc/dec used to be memory_order_relaxed and add/sub used to be 
>>>>> memory_order_acq_rel on PPC, despite the shared contract promising 
>>>>> memory_order_conservative.
>>>>>
>>>>> The implication for the nested counter in the Thread SMR project 
>>>>> was that I wanted to use the inc/dec API but knew it was not gonna 
>>>>> work as expected on PPC because we really needed *at least* 
>>>>> memory_order_acq_rel when decrementing (and 
>>>>> memory_order_conservative when incrementing, which was simulated 
>>>>> in a CAS loop... yuck), but would find ourselves getting 
>>>>> memory_order_relaxed. Rather than treating it as a bug in the PPC 
>>>>> atomics implementation, and having the code be broken while we 
>>>>> waited for a fix, I changed the use to sub when decrementing 
>>>>> (which gave me the required memory_order_acq_rel ordering I 
>>>>> needed), and the horrible CAS loop when incrementing, as a 
>>>>> workaround, and alerted Martin Doerr that this would need to be 
>>>>> sorted out in the PPC code. Since then, the PPC code did indeed 
>>>>> get cleaned up so that inc/dec stopped being relaxed-only and 
>>>>> worked as advertised.
>>>>>
>>>>> After that, the "8222034: Thread-SMR functions should be updated 
>>>>> to remove work around" change removed the workaround that was no 
>>>>> longer required from the code, and put back the desired inc/dec 
>>>>> calls (which now used an overly conservative 
>>>>> memory_order_conservative ordering, which is suboptimal, in 
>>>>> particular for decrements, but importantly not incorrect). Since 
>>>>> the nested case would almost never run and is possibly the coldest 
>>>>> code path in the VM, I did not care to comment in that review 
>>>>> thread about optimizing it by explicitly passing in a weaker 
>>>>> ordering. However, I should have commented on the comment that was 
>>>>> changed, which does indeed look a bit confused. David is right 
>>>>> that the stuff about volatile has nothing to do with why this is 
>>>>> correct. The correctness required memory_order_acq_rel for 
>>>>> decrements, but the implementation provided more, which is fine.
>>>>>
>>>>> The actual reason why I wanted memory_order_conservative for 
>>>>> correctness when incrementing and memory_order_acq_rel when 
>>>>> decrementing, was to prevent accesses inside of the critical 
>>>>> section (in particular - reading Thread*s from the acquired 
>>>>> ThreadsList), from floating outside of the reference increment and 
>>>>> decrement that marks reading the list as safe to access without 
>>>>> the underlying list blowing up. In practice, it might have been 
>>>>> possible to relax it a bit by relying on side effects of other 
>>>>> unrelated parts of the protocol to have spurious fencing... but I 
>>>>> did not want to get the protocol tangled in that way because it 
>>>>> would be difficult to reason about.
>>>>>
>>>>> Hope this explanation clears up that confusion.
>>>>>
>>>>> Thanks,
>>>>> /Erik
>>>>>
>>>>> On 11/2/19 2:15 PM, Daniel D. Daugherty wrote:
>>>>>> Erik,
>>>>>>
>>>>>> David H. made a comment during this review cycle that should 
>>>>>> interest you.
>>>>>>
>>>>>> The longer version of this comment came up in early reviews of 
>>>>>> the Async
>>>>>> Monitor Deflation code because I copied the code and the longer 
>>>>>> comment
>>>>>> from threadSMR.cpp. I updated the comment based on your input and 
>>>>>> review
>>>>>> and changed the comment and code in threadSMR.cpp and in the 
>>>>>> Async Monitor
>>>>>> Deflation project code.
>>>>>>
>>>>>> The change in threadSMR.cpp was done with this changeset:
>>>>>>
>>>>>> $ hg log -v -r 54517
>>>>>> changeset:   54517:c201ca660afd
>>>>>> user:        dcubed
>>>>>> date:        Thu Apr 11 14:14:30 2019 -0400
>>>>>> files:       src/hotspot/share/runtime/threadSMR.cpp
>>>>>> description:
>>>>>> 8222034: Thread-SMR functions should be updated to remove work 
>>>>>> around
>>>>>> Reviewed-by: mdoerr, eosterlund
>>>>>>
>>>>>> Here's one of the two diffs to jog your memory:
>>>>>>
>>>>>>  void ThreadsList::dec_nested_handle_cnt() {
>>>>>> -  // The decrement needs to be MO_ACQ_REL. At the moment, the Atomic::dec
>>>>>> -  // backend on PPC does not yet conform to these requirements. Therefore
>>>>>> -  // the decrement is simulated with an Atomic::sub(1, &addr).
>>>>>> -  // Without this MO_ACQ_REL Atomic::dec simulation, the nested SMR mechanism
>>>>>> -  // is not generally safe to use.
>>>>>> -  Atomic::sub(1, &_nested_handle_cnt);
>>>>>> +  // The decrement only needs to be MO_ACQ_REL since the reference
>>>>>> +  // counter is volatile (and the hazard ptr is already NULL).
>>>>>> +  Atomic::dec(&_nested_handle_cnt);
>>>>>>  }
>>>>>>
>>>>>> Below is David's comment about the code comment...
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> Trimming down to just that issue...
>>>>>>
>>>>>> On 10/29/19 4:20 PM, Daniel D. Daugherty wrote:
>>>>>>> On 10/24/19 7:00 AM, David Holmes wrote:
>>>>>> >
>>>>>> > src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>>>>
>>>>>>>>  199 // The decrement only needs to be MO_ACQ_REL since the reference
>>>>>>>>  200   // counter is volatile.
>>>>>>>>  201   Atomic::dec(&_ref_count);
>>>>>>>>
>>>>>>>> volatile is irrelevant with regards to memory ordering as it is 
>>>>>>>> a compiler annotation. And you haven't specified any memory 
>>>>>>>> order value so the default is conservative ie. implied full 
>>>>>>>> fence. (I see the same incorrect comment is in threadSMR.cpp!)
>>>>>>>
>>>>>>> I got that wording from threadSMR.cpp and Erik O. confirmed my 
>>>>>>> use of that
>>>>>>> wording previously. I'll chase it down with Erik and get back to 
>>>>>>> you.
>>>>>>>
>>>>>>>
>>>>>>>> 208   // The increment needs to be MO_SEQ_CST so that the reference
>>>>>>>>  209   // counter update is seen as soon as possible in a race with the
>>>>>>>>  210   // async deflation protocol.
>>>>>>>>  211   Atomic::inc(&_ref_count);
>>>>>>>>
>>>>>>>> Ditto you haven't specified any ordering - and inc() and dec() 
>>>>>>>> will have the same default.
>>>>>>>
>>>>>>> And again, I'll have to chase this down with Erik O. and get 
>>>>>>> back to you.
>>>>>>
>>>>>
>>>>
>>
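
The ordering requirement Erik describes in the quoted thread (a strong
ordering on the increment, at least acquire-release on the decrement, so that
accesses inside the critical section cannot float outside the counter
updates) can be illustrated with standard C++ atomics. This is only a sketch
under assumptions: NestedRefCount and demo_nested_refs are hypothetical
names, std::atomic is used instead of HotSpot's Atomic class, and
std::memory_order_seq_cst is merely the nearest standard analogue of
HotSpot's memory_order_conservative, not a claim of equivalence.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference counter; the ordering is stated explicitly on each
// operation rather than implied by a comment or by 'volatile'.
class NestedRefCount {
  std::atomic<int> _count{0};

 public:
  // Entering the critical section: strongest standard ordering, so later
  // reads of the protected data are not reordered before the increment.
  void inc() { _count.fetch_add(1, std::memory_order_seq_cst); }

  // Leaving the critical section: acquire-release, so earlier reads and
  // writes of the protected data are not reordered after the decrement.
  void dec() { _count.fetch_sub(1, std::memory_order_acq_rel); }

  int value() const { return _count.load(std::memory_order_acquire); }
};

// Two nested entries followed by one exit leave one reference outstanding.
inline int demo_nested_refs() {
  NestedRefCount rc;
  rc.inc();
  rc.inc();
  rc.dec();
  return rc.value();
}
```

Passing the memory order at the call site makes the intent visible in the
code itself, which is the point David raises about the original comments.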


From suenaga at oss.nttdata.com  Tue Nov  5 14:35:46 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Tue, 5 Nov 2019 23:35:46 +0900
Subject: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <CAA-vtUyKcSrmvZgCmmdZs74nQQ8_NN0D0=mFdrxDKq8rZwq=Qw@mail.gmail.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <CAA-vtUwwJJtJsc1Um0ruorZs34=zKJdWat+02-fmpkajzAfV3Q@mail.gmail.com>
 <e7f23e19-f208-0a8a-56ac-4be23deabc65@oss.nttdata.com>
 <CAA-vtUyKcSrmvZgCmmdZs74nQQ8_NN0D0=mFdrxDKq8rZwq=Qw@mail.gmail.com>
Message-ID: <24ad04c1-942a-55b4-c57a-943ccada3748@oss.nttdata.com>

On 2019/11/05 22:42, Thomas Stüfe wrote:
> 
> On Sat, Nov 2, 2019 at 1:46 PM Yasumasa Suenaga <suenaga at oss.nttdata.com> wrote:
> 
>     Hi Thomas,
> 
>     I agree with you. Also CI Replay will be generated after NMT report.
>     But I think it should be another issue.
> 
>     If that is OK with you, I will file it in JBS and create a patch.
> 
> 
> 
> Why not fix it in this patch, while you are on it?

It does not affect thread state directly.

In addition, I've sent a review request for JDK-8233373. That change affects the location of JFR::on_vm_shutdown().
Thus I want to fix it after JDK-8233373 and JDK-8233375.


Yasumasa


> ..Thomas
> 
>     Thanks,
> 
>     Yasumasa
> 
> 
>     On 2019/11/01 19:36, Thomas Stüfe wrote:
>      > Hi Yasumasa,
>      >
>      > I see that we do JFR::on_vm_shutdown() before error reporting ran. Is that really necessary? Error reporting should happen as close as possible to the error point - ideally, as little code as possible should run between the crash/assert and the generation of the hs-err file. I suggest moving the call to JFR::on_vm_shutdown()
>      > down to a point after error reporting, e.g. to where we print the NMT report on shutdown.
>      >
>      > Cheers, Thomas
>      >
>      >
>      > On Fri, Nov 1, 2019 at 10:41 AM Yasumasa Suenaga <suenaga at oss.nttdata.com> wrote:
>      >
>      >     Forward to hotspot-runtime-dev.
>      >
>      >     As David commented in JBS, it may need to be fixed in JFR code.
>      >     But I'm unclear why the thread state is not recovered.
>      >
>      >     I'd like to hear about this from JFR folks.
>      >     If it is just a bug in JFR, I will create a patch which recovers it in JFR code.
>      >
>      >
>      >     Thanks,
>      >
>      >     Yasumasa
>      >
>      >
>      >     -------- Forwarded Message --------
>      >     Subject: RFR: 8233375: JFR emergency dump do not recover thread state
>      >     Date: Fri, 1 Nov 2019 17:08:42 +0900
>      >     From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>      >     To: hotspot-jfr-dev at openjdk.java.net
>      >     CC: yasuenag at gmail.com <yasuenag at gmail.com>
>      >
>      >     Hi all,
>      >
>      >     Please review this change:
>      >
>      >         JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>      >         webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>      >
>      >     If JFR is running when the JVM crashes, JFR will dump data to hs_err_pid<PID>.jfr .
>      >     This is performed in prepare_for_emergency_dump().
>      >     However, this function changes the thread state to "_thread_in_vm" and does not restore it.
>      >
>      >     This change has been tested on submit repo as mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>      >     It failed at compiler/types/correctness/CorrectnessTest.java
>      >     However this test is for JIT compiler, and related issue has been reported as JDK-8225620.
>      >     So I think this patch can go through.
>      >
>      >
>      >     Thanks,
>      >
>      >     Yasumasa
>      >
> 

From goetz.lindenmaier at sap.com  Tue Nov  5 15:22:22 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 5 Nov 2019 15:22:22 +0000
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
In-Reply-To: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
References: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
Message-ID: <AM6PR02MB5347F55B15D032B53B301DBCEC7E0@AM6PR02MB5347.eurprd02.prod.outlook.com>

Hi,

Is this happening only with gcc 5.4, or also with 7.3?
7.3 was the compiler listed for jdk 13: 
https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms

In other places, we check for the specific gcc version:
http://hg.openjdk.java.net/jdk/jdk/file/8623f75be895/src/hotspot/share/utilities/debug.hpp#l157
http://hg.openjdk.java.net/jdk/jdk/file/8623f75be895/src/hotspot/share/utilities/compilerWarnings_gcc.hpp#l62
This warning seems not to happen with gcc 8.3.
So should we, similarly, protect this place with __GNUG__ < 6? or < 8?
Else I would like to see a comment that states that a warning
was seen with gcc 5.4, so that this can be removed more easily
later on.
Don't need a new webrev for that...

Also, the places above make decisions for gcc 4.
Do we still support gcc 4?
This change might not work on gcc 4. PRAGMA_DIAG_PUSH is defined 
empty for gcc < 4.6.  (I think this does not matter.)
... Maybe, in a follow up, we should remove these checks for gcc 4 and 
require at least gcc 5 in jdk 14? 

Best regards,
  Goetz.


> -----Original Message-----
> From: hotspot-runtime-dev <hotspot-runtime-dev-bounces at openjdk.java.net>
> On Behalf Of Thomas Stüfe
> Sent: Dienstag, 5. November 2019 08:07
> To: Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>
> Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after JDK-8233359
> 
> Hi all,
> 
> may I please have reviews for this small build fix:
> 
> Issue: https://bugs.openjdk.java.net/browse/JDK-8233530
> Webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8233530-wc14-compat/webrev.00/webrev/
> Prior discussion:
> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/036726.html
> 
> Thank you,
> 
> Thomas
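
The version-guarded suppression asked about above can be sketched as follows.
This is an illustration under assumptions, not the change in the webrev: the
EXAMPLE_* macro and both helper functions are hypothetical, the threshold of
6 is just one of the values suggested in the message above, and HotSpot's own
PRAGMA_DIAG_PUSH style macros are deliberately not used here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical guard: enable the suppression only for real gcc older than 6.
// clang also defines __GNUG__, so it is excluded explicitly.
#if defined(__GNUG__) && !defined(__clang__) && (__GNUG__ < 6)
#define EXAMPLE_SUPPRESS_CXX14_COMPAT 1
#else
#define EXAMPLE_SUPPRESS_CXX14_COMPAT 0
#endif

#if EXAMPLE_SUPPRESS_CXX14_COMPAT
// Warning seen with gcc 5.4; remove once that compiler is no longer supported.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wc++14-compat"
#endif

// Hypothetical helper standing in for the code that triggers the warning.
inline void example_sized_free(void* p, std::size_t /*size*/) {
  std::free(p);
}

#if EXAMPLE_SUPPRESS_CXX14_COMPAT
#pragma GCC diagnostic pop
#endif

// Reports whether the suppression was compiled in for this compiler.
inline bool example_suppression_active() {
  return EXAMPLE_SUPPRESS_CXX14_COMPAT != 0;
}
```

Keeping the compiler version in the guard and in a comment next to it makes
the workaround easy to find and delete later, as requested in the review.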

From daniel.daugherty at oracle.com  Tue Nov  5 15:36:37 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 5 Nov 2019 10:36:37 -0500
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <f1e071e6-c92d-a6be-7536-df1a30fe7e6b@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <58713599-b5ea-57bb-0413-fce3b8bd5d2f@oracle.com>
 <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
 <a20e42b4-85f7-29de-4573-76cc477e39a0@oracle.com>
 <32f8e268-7c15-82f6-3b9b-398c33c160cb@oracle.com>
 <f1e071e6-c92d-a6be-7536-df1a30fe7e6b@oracle.com>
Message-ID: <e7e76d46-dd1f-e3f4-3db3-500c03507661@oracle.com>

Hi David,

On 11/5/19 12:31 AM, David Holmes wrote:
> Hi Dan,
>
> On 5/11/2019 11:31 am, Daniel D. Daugherty wrote:
>> Hi David,
>>
>> Thanks for continuing to provide feedback on the Async Monitor Deflation
>> project! I appreciate your reviews very much...
>>
>> Responses embedded below (as usual)...
>
> Ditto. :)

And again...


>
>>
>> On 11/4/19 1:28 AM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> A few follow ups to your responses, with trimming ...
>>>
>>> On 30/10/2019 6:20 am, Daniel D. Daugherty wrote:
>>>> On 10/24/19 7:00 AM, David Holmes wrote:
>>>>>  122 // Set _owner field to new_value; current value must match old_value.
>>>>>  123 inline void ObjectMonitor::set_owner_from(void* new_value, void* old_value) {
>>>>>  124   void* prev = Atomic::cmpxchg(new_value, &_owner, old_value);
>>>>>  125   ADIM_guarantee(prev == old_value, "unexpected prev owner=" INTPTR_FORMAT
>>>>>
>>>>> The use of cmpxchg seems a little strange here if you are 
>>>>> asserting that when this is called _owner must equal old_value. 
>>>>> That means you don't expect any race and if there is no race with 
>>>>> another thread writing to _owner then you don't need the cmpxchg. 
>>>>> A normal:
>>>>>
>>>>> if (_owner == old_value) {
>>>>>    Atomic::store(&_owner, new_value);
>>>>>    log(...);
>>>>> } else {
>>>>>    guarantee(false, " unexpected old owner ...");
>>>>> }
>>>>
>>>> The two parameter version of set_owner_from() is only called from 
>>>> three
>>>> places and we'll cover two of them here:
>>>>
>>>> src/hotspot/share/runtime/objectMonitor.cpp:
>>>>
>>>> 1041     if (AsyncDeflateIdleMonitors) {
>>>> 1042       set_owner_from(NULL, Self);
>>>> 1043     } else {
>>>> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the lock
>>>> 1045       OrderAccess::storeload(); // See if we need to wake a successor
>>>> 1046     }
>>>>
>>>> and:
>>>>
>>>> 1221   if (AsyncDeflateIdleMonitors) {
>>>> 1222     set_owner_from(NULL, Self);
>>>> 1223   } else {
>>>> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
>>>> 1225     OrderAccess::fence(); // ST _owner vs LD in unpark()
>>>> 1226   }
>>>>
>>>> So I've replaced the existing {release_store(), storeload()} combo for one
>>>> call site and the existing {release_store(), fence()} combo for the other
>>>> call site with a cmpxchg(). I chose cmpxchg() for these reasons:
>>>>
>>>> 1) I wanted the same memory sync behavior at both call sites.
>>>> 2) I wanted similar/same memory sync behavior as the original
>>>>    code at those call sites.
>>>
>>> Why? The memory sync requirements for non-async deflation may be 
>>> completely different to those required for async deflation (given 
>>> all the other bits of the protocol).
>>
>> Good point!
>>
>> For context, the first code block above (L1041-6) is in ObjectMonitor::exit()
>> and the second code block above (L1221-6) is in ObjectMonitor::ExitEpilog()
>> which is called from two different places by ObjectMonitor::exit(). In both
>> cases, we are setting the _owner field to NULL which will potentially make
>> the ObjectMonitor async deflatable (depending on ref_count).
>>
>> For async deflation, I want the full fence semantics after setting the
>> _owner field to NULL in both locations:
>>
>> src/hotspot/share/runtime/orderAccess.hpp:
>> //                       Constraint     x86          sparc TSO          ppc
>> // ---------------------------------------------------------------------------
>> // fence                 LoadStore  |   lock         membar #StoreLoad  sync
>> //                       StoreStore |   addl 0,(sp)
>> //                       LoadLoad   |
>> //                       StoreLoad
>> //
>> // release               LoadStore  |                                   lwsync
>> //                       StoreStore
>>
>> I don't want any loads or stores floating into or out of the critical 
>> region.
>>
>>
>> *** Side bar here ****
>>
>> I just noticed something with the original code:
>>
>> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the lock
>> 1045       OrderAccess::storeload(); // See if we need to wake a successor
>>
>> For constraints, this gives us:
>>             {LoadStore | StoreStore}
>>             {StoreLoad}
>> at L1044-5. So the original code is just "missing" LoadLoad relative
>> to a full fence(). I'm not sure why this kind of load is allowed to
>> float into the critical region, but the code has been this way for a
>> very long time.
>
> You seem to be overlooking the fact that your store appears between the 
> various memory barriers, e.g.
>
>              {LoadStore | StoreStore}
>              ST _owner, 0
>              {StoreLoad}
>
> which establishes the effects of those barriers with respect to that 
> store. So loadload() would be superfluous as we've already ensured 
> that no loads can float above the store, due to the storeload barrier.

Ouch! And yes I did overlook the store. This is what I get for writing
a reply late in the day and not letting it sit for re-review the next
morning. My apologies for being in a hurry.
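
The barrier placement David describes above can be sketched with standard
C++ atomics. This is only an illustration under stated assumptions, not
HotSpot code: ExitSketch, exit_and_check_successor() and the field names are
hypothetical. Standard C++ has no StoreLoad-only barrier, so a sequentially
consistent fence (roughly the analogue of OrderAccess::fence()) stands in
for OrderAccess::storeload().

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the two ObjectMonitor fields involved.
struct ExitSketch {
  std::atomic<void*> owner{nullptr};
  std::atomic<void*> succ{nullptr};
};

// Drops the lock, then reports whether a successor is already set.
inline bool exit_and_check_successor(ExitSketch& m) {
  // {LoadStore | StoreStore}: accesses made while holding the lock
  // cannot be reordered below this store.
  m.owner.store(nullptr, std::memory_order_release);
  // {StoreLoad}: keeps the store above ordered before the load below.
  // Standard C++ only offers a full fence for this (an assumption of
  // this sketch; HotSpot has a dedicated storeload()).
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return m.succ.load(std::memory_order_relaxed) != nullptr;
}
```

The store sits between the two barriers, which is why no separate LoadLoad
is needed for the successor check that follows.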


> A full fence is logically all 4 barriers in that it ensures all loads 
> and all stores remain on their respective sides of the fence - nothing 
> can cross it.
>
>> And for this original code:
>>
>> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
>> 1225     OrderAccess::fence(); // ST _owner vs LD in unpark()
>>
>> For constraints, this gives us:
>>           {LoadStore | StoreStore}
>>           {LoadStore | StoreStore | LoadLoad | StoreLoad}
>> at L1224-5. Again this code has been this way for a very long time.
>>
>> It seems to me that L1224-5 could be written like this:
>>
>> 1224     _owner = NULL;
>> 1225     OrderAccess::fence(); // ST _owner vs LD in unpark()
>>
>> with a plain store on L1224. Is that correct?
>
> No, I don't believe so.

And neither do I in the morning light. :-)


> What we are also in danger of overlooking here is the presence of 
> memory synchronization instructions related to the semantics of a 
> synchronized code block in Java, and the presence of memory 
> synchronization instructions needed for the correct implementation of 
> the synchronization subsystem itself. Specifically given:
>
> OrderAccess::release_store(&_owner, (void*)NULL);
> OrderAccess::<some other sync op>
>
> The release store ensures that the releasing of the monitor cannot be 
> reordered with respect to any of the stores that occurred within the 
> synchronized block at the Java-level. (And it _might_ also ensure some 
> property of the sync implementation.) While "some other op" is 
> typically needed only because of the way we implement the 
> synchronization subsystem - as per the comments e.g.
>
> storeload(); // See if we need to wake a successor
>
> we don't want to load a potential successor before we set _owner to 
> NULL else we might read the wrong value.

Agreed.


>
>> *** End side bar ***
>>
>>
>>>
>>>> 3) I wanted the return value from cmpxchg() for my state machine
>>>> ??? sanity check.
>>>
>>> I'm somewhat dubious about using cmpxchg just for the side-effect of 
>>> getting the existing value.
>>
>> But I'm not "using cmpxchg just for the side-effect of getting the 
>> existing value".
>>
>> That's the third thing on my list of three reasons. The most important
>> thing is I want the full fence that cmpxchg() gives me. Above I said:
>>
>>  > 1) I wanted the same memory sync behavior at both call sites.
>>  > 2) I wanted similar/same memory sync behavior as the original
>>  >    code at those call sites.
>>
>> Using cmpxchg() gives me the full fence I want and that's similar to
>> this baseline code at this call site:
>
> Yes, but the memory sync effects of cmpxchg are secondary to its primary 
> purpose, which is to provide an atomic compare-and-exchange in the 
> face of concurrent updates to a variable.

Agreed.


>
>> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the lock
>> 1045       OrderAccess::storeload(); // See if we need to wake a successor
>>
>> I'm getting the LoadLoad that the baseline site doesn't have.
>
> And which it doesn't need.

And my brain returns to the more fundamental question of why do we have
OrderAccess::storeload() at L1045 and OrderAccess::fence() at L1225?
Both sites are trying to separate the release_store(&_owner, NULL) from
a subsequent load. In the first case:

1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the lock
1045       OrderAccess::storeload(); // See if we need to wake a successor
<snip>
1047     if ((intptr_t(_EntryList)|intptr_t(_cxq)) == 0 || _succ != NULL) {

In the second case:

1224     OrderAccess::release_store(&_owner, (void*)NULL);
1225     OrderAccess::fence(); // ST _owner vs LD in unpark()
<snip>
1229   Trigger->unpark();

The code has been this way for a very long time, but why?

Of course, this question about the baseline code is still a sidebar.
An interesting sidebar, but...


For Async Monitor Deflation, I think we need fence() at both locations
for proper interaction with the deflater thread.


>
>> The cmpxchg() gives me the same memory constraints as this baseline code
>> at this call site:
>>
>> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
>> 1225     OrderAccess::fence(); // ST _owner vs LD in unpark()
>>
>> Note: Actually, I don't have the extra {LoadStore | StoreStore} from
>> the release_store() that I mentioned in the side bar above...
>>
>> The last thing that I get is the existing value...
>>
>>
>> Okay, so I thought it was a pretty cool use of cmpxchg(), but I'm
>> obviously confusing code readers. So here's the v2.08 set_owner_from():
>>
>>   124 // Set _owner field to new_value; current value must match old_value.
>>   125 inline void ObjectMonitor::set_owner_from(void* new_value, void* old_value) {
>>   126   void* prev = Atomic::cmpxchg(new_value, &_owner, old_value);
>>   127   ADIM_guarantee(prev == old_value, "unexpected prev owner=" INTPTR_FORMAT
>>   128                  ", expected=" INTPTR_FORMAT, p2i(prev), p2i(old_value));
>>   129   log_trace(monitorinflation, owner)("set_owner_from(): mid=" INTPTR_FORMAT
>>   130                                      ", prev=" INTPTR_FORMAT ", new="
>>   131                                      INTPTR_FORMAT, p2i(this), p2i(prev),
>>   132                                      p2i(new_value));
>>   133 }
>>
>> I could change it like this:
>>
>>   124 // Set _owner field to new_value; current value must match old_value.
>>   125 inline void ObjectMonitor::set_owner_from(void* new_value, void* old_value) {
>>   126   void* prev = _owner;
>>   127   ADIM_guarantee(prev == old_value, "unexpected prev owner=" INTPTR_FORMAT
>>   128                  ", expected=" INTPTR_FORMAT, p2i(prev), p2i(old_value));
>>   129   _owner = new_value;
>>   130   OrderAccess::fence();
>>   131   log_trace(monitorinflation, owner)("set_owner_from(): mid=" INTPTR_FORMAT
>>   132                                      ", prev=" INTPTR_FORMAT ", new="
>>   133                                      INTPTR_FORMAT, p2i(this), p2i(prev),
>>   134                                      p2i(new_value));
>>   135 }
>>
>> It's two lines longer, but it should require less head scratching to
>> figure out what I'm trying to do. Would this be acceptable?
>
> As per previous discussion I think you still need a release_store of 
> _owner (at least in the case where you are releasing the monitor).

I've made a mistake with this encapsulation. I made it look like a
general setter of a new value. In reality, both callers specify
new_value == NULL so we don't actually need the new_value parameter.

I think it needs to be something like this:

   124 // Clear _owner field; current value must match old_value.
   125 inline void ObjectMonitor::clear_owner_from(void* old_value) {
   126   void* prev = _owner;
   127   ADIM_guarantee(prev == old_value, "unexpected prev owner=" INTPTR_FORMAT
   128                  ", expected=" INTPTR_FORMAT, p2i(prev), p2i(old_value));
   129   OrderAccess::release_store(&_owner, (void*)NULL);
   130   OrderAccess::fence();
   131   log_trace(monitorinflation, owner)("clear_owner_from(): mid=" INTPTR_FORMAT
   132                                      ", prev=" INTPTR_FORMAT, p2i(this), p2i(prev));
   133 }
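
For readers without a HotSpot tree at hand, the shape of the proposed
clear_owner_from() can be approximated with standard C++ atomics. This is a
minimal sketch under stated assumptions, not the actual patch: OwnerSketch
is a hypothetical stand-in for ObjectMonitor, the fprintf()/abort() pair
stands in for ADIM_guarantee(), and a seq_cst fence stands in for
OrderAccess::fence().

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for ObjectMonitor; only the owner field is modeled.
struct OwnerSketch {
  std::atomic<void*> owner{nullptr};

  // Clear owner; the current value must match old_value.
  void clear_owner_from(void* old_value) {
    // State machine sanity check: a plain read is enough because no
    // concurrent writer is expected while the caller owns the monitor.
    void* prev = owner.load(std::memory_order_relaxed);
    if (prev != old_value) {  // stands in for ADIM_guarantee()
      std::fprintf(stderr, "unexpected prev owner=%p, expected=%p\n",
                   prev, old_value);
      std::abort();
    }
    // Release store: work done under the lock stays above the unlock.
    owner.store(nullptr, std::memory_order_release);
    // Full fence: the unlock is ordered before any later loads.
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
};
```

The point of this shape is that the check, the store and the fence are each
visible as separate steps, so a reader does not infer a guarded concurrent
update the way they would from a cmpxchg().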

Thanks for sticking with this part of the review.


> That's it on this thread. I still have to look at version 2.08 in full.

I look forward to your feedback!

Dan


>
> Thanks,
> David
> -----
>
>>
>>
>>>
>>>> I don't think that using 'Atomic::store(&_owner, new_value)' is the
>>>> right choice for these two call sites.
>>>
>>> If you don't actually need the cmpxchg to handle concurrent updates 
>>> to the _owner field, then a plain store (not an Atomic::store - that 
>>> was an error on my part) does not seem unreasonable; or if there are 
>>> still memory sync issues here, perhaps a release_store.
>>
>> So in the above proposed code I switched to a plain store followed by
>> a fence().
>>
>>
>>> If you use cmpxchg then anyone reading the code will assume there is 
>>> a concurrent update that you are guarding against.
>>
>> Yup. I concede the point that I'm obviously confusing the other
>> code readers... sorry about that...
>>
>>
>>>
>>>> The last two parameter set_owner_from() is talked about in the
>>>> next reply.
>>>>
>>>>
>>>>> Similarly for the old_value1/old_value2 version.
>>>>
>>>> The three parameter version of set_owner_from() is only called from one
>>>> place and the last two parameter version is called from the same place:
>>>>
>>>> src/hotspot/share/runtime/synchronizer.cpp:
>>>>
>>>> 1903       if (AsyncDeflateIdleMonitors) {
>>>> 1904         m->set_owner_from(mark.locker(), NULL, DEFLATER_MARKER);
>>>> 1905       } else {
>>>> 1906         m->set_owner_from(mark.locker(), NULL);
>>>> 1907       }
>>>>
>>>> The original code was:
>>>>
>>>> 1399       m->set_owner(mark.locker());
>>>>
>>>> The original set_owner() code was defined like this:
>>>>
>>>>    87 inline void ObjectMonitor::set_owner(void* owner) {
>>>>    88   _owner = owner;
>>>>    89 }
>>>>
>>>> So the original code didn't do any memory sync'ing at all and I've
>>>> changed that to a cmpxchg() on both code paths. That appears to be
>>>> overkill for that callsite...
>>>
>>> Again I'm not sure any memory sync requirements from the non-async 
>>> case should necessarily transfer over to the async case. Even if you 
>>> end up requiring similar memory sync the reasoning would be quite 
>>> different I would expect.
>>
>> In this case, both async deflation and safepoint based deflation are
>> happy with the same memory sync because the newly allocated ObjectMonitor
>> isn't published yet so it is not deflatable by either mechanism. Also, the
>> act of publishing the ObjectMonitor* will take care of the memory sync.
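
The "initialize with plain stores, then publish" argument can be sketched
with standard C++ atomics. Everything named here is a hypothetical stand-in
(MonitorSketch, ObjectSketch, inflate_sketch(), owner_if_inflated()); the
release store plays the role of object->release_set_mark(), and the real
inflate() path does considerably more.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins; not the HotSpot types.
struct MonitorSketch {
  void* owner = nullptr;  // plain field: written only before publication
  int recursions = 0;
};

struct ObjectSketch {
  std::atomic<MonitorSketch*> mark{nullptr};
};

// Initialize the unpublished monitor with plain stores, then publish it.
inline void inflate_sketch(ObjectSketch& obj, MonitorSketch* m, void* locker) {
  m->owner = locker;  // no other thread can see *m yet
  m->recursions = 0;
  // The release store publishes m; the stores above cannot sink below it.
  obj.mark.store(m, std::memory_order_release);
}

// A reader that observes the published pointer also sees the initialized
// fields, because its acquire load pairs with the release store.
inline void* owner_if_inflated(const ObjectSketch& obj) {
  MonitorSketch* m = obj.mark.load(std::memory_order_acquire);
  return m == nullptr ? nullptr : m->owner;
}
```

Under these assumptions no cmpxchg() or fence is needed on the owner field
itself at this call site; the publication step carries the ordering.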
>>
>>
>>>
>>>>
>>>> We're in ObjectSynchronizer::inflate(), in the "CASE: stack-locked"
>>>> section of the code. We've gotten our ObjectMonitor from om_alloc()
>>>> and are initializing a number of fields in the ObjectMonitor. The
>>>> ObjectMonitor is not published until we do:
>>>>
>>>> 1916?????? object->release_set_mark(markWord::encode(m));
>>>>
>>>> So we don't need the memory sync'ing features of the cmpxchg() for
>>>> either of the set_owner_from() calls and all that leaves is the
>>>> state machine sanity check.
>>>>
>>>> I really like the state machine sanity check on the owner field but
>>>> that's just because it came in handy when chasing the recent races.
>>>> It would be easy to change the three parameter version of
>>>> set_owner_from() to not do memory sync'ing, but still do the state
>>>> machine sanity check.
>>>>
>>>> Update: Changing the three parameter version of set_owner_from()
>>>> may impact the changes to owner_is_DEFLATER_MARKER() discussed
>>>> above. Sigh...
>>>> Update 2: Probably no impact because the three parameter version of
>>>> set_owner_from() is only used before the ObjectMonitor is published
>>>> and owner_is_DEFLATER_MARKER() is used after the ObjectMonitor has
>>>> appeared on an in-use list.
>>>>
>>>> However, the two parameter version of set_owner_from() needs its
>>>> memory sync'ing behavior for its objectMonitor.cpp call sites so
>>>> this call site would need something different.
>>>>
>>>> I'm not sure which solution I'm going to pick yet, but I definitely
>>>> have to change something here since we don't need cmpxchg() at this
>>>> call site. More thought is required.
>>>
>>> I will look to see where this ended up.
>>
>> I'll wait to see if you can live with the v2.08 version. I hope so...
>>
>>
>>>
>>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>>>
>>>>>
>>>>>  267   if (AsyncDeflateIdleMonitors &&
>>>>>  268       try_set_owner_from(Self, DEFLATER_MARKER) == DEFLATER_MARKER) {
>>>>
>>>> For more context, we are in:
>>>>
>>>>   241 void ObjectMonitor::enter(TRAPS) {
>>>>
>>>>
>>>>> I don't see why you need to call try_set_owner_from again here as 
>>>>> "cur" will already be DEFLATER_MARKER from the previous 
>>>>> try_set_owner.
>>>>
>>>> I assume the previous try_set_owner() call you mean is this one:
>>>>
>>>>   248   void* cur = try_set_owner_from(Self, NULL);
>>>>
>>>> This first try_set_owner() is for the most common case of no owner.
>>>>
>>>> The second try_set_owner() call is for a different condition than the first:
>>>>
>>>>   268       try_set_owner_from(Self, DEFLATER_MARKER) == DEFLATER_MARKER) {
>>>>
>>>> L248 is trying to change the _owner field from NULL -> 'Self'.
>>>> L268 is trying to change the _owner field from DEFLATER_MARKER to 'Self'.
>>>>
>>>> If the try_set_owner() call on L248 fails, 'cur' can be several possible
>>>> values:
>>>>
>>>>    - the calling thread (recursive enter is handled on L254-7)
>>>>    - other owning thread value (BasicLock* or Thread*)
>>>>    - DEFLATER_MARKER
>>>
>>> I'll give a cautious okay to that explanation (the deficiency being 
>>> in my understanding, not your explaining :) ).
>>
>> Thanks. I'll take it!
>>
>>
>>>
>>>>> Further, I don't see how installing self as the _owner here is 
>>>>> valid and means you acquired the monitor, as the fact it was 
>>>>> DEFLATER_MARKER means it is still being deflated by another thread 
>>>>> doesn't it ???
>>>>
>>>> I guess the comment after L268 didn't work for you:
>>>>
>>>>   269     // The deflation protocol finished the first part (setting owner),
>>>>   270     // but it failed the second part (making ref_count negative) and
>>>>   271     // bailed. Or the ObjectMonitor was async deflated and reused.
>>>>
>>>> It means that the deflater thread was racing with this enter and
>>>> managed to set the owner field to DEFLATER_MARKER as the first step
>>>> in the deflation protocol. Our entering thread actually won the race
>>>> when it managed to set the ref_count to a positive value as part of
>>>> the ObjectMonitorHandle stuff done in the inflate() call that preceded
>>>> the enter() call. However, the deflater thread hasn't realized that it
>>>> lost the race yet and hasn't restored the owner field back to NULL.
>>>
>>> You're right the comment didn't work for me as it required me to be 
>>> holding too much of the protocol in my head. Makes more sense now.
>>
>> Good to hear!
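
The two compare-and-swap attempts in enter() discussed above can be modeled
as follows. This is a simplified sketch with hypothetical names
(EnterSketch, deflater_marker(), try_enter()): it ignores ref_count, the
AsyncDeflateIdleMonitors flag, recursion counting and the slow path, and it
uses std::atomic in place of Atomic::cmpxchg().

```cpp
#include <atomic>
#include <cassert>

struct EnterSketch {
  std::atomic<void*> owner{nullptr};

  // Hypothetical marker value; any unique non-null address works here.
  static void* deflater_marker() {
    static char marker;
    return &marker;
  }

  // CAS owner from old_value to new_value; returns the value seen before.
  void* try_set_owner_from(void* new_value, void* old_value) {
    void* expected = old_value;
    owner.compare_exchange_strong(expected, new_value);
    return expected;  // old_value on success, else the current owner
  }

  // Returns true if self ends up as owner via one of the fast paths.
  bool try_enter(void* self) {
    void* cur = try_set_owner_from(self, nullptr);
    if (cur == nullptr) return true;  // common case: no owner
    if (cur == self) return true;     // recursive enter
    // The deflater set the marker but has not yet noticed that it lost
    // the race; take ownership directly from the marker.
    return try_set_owner_from(self, deflater_marker()) == deflater_marker();
  }
};
```

Here the cmpxchg is doing its primary job: both the entering thread and the
deflater thread may write the owner field concurrently, so the second
attempt can still fail if the deflater restores NULL first.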
>>
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>
>> Thanks again for the thorough reviews!
>>
>> Dan
>>


From alex.buckley at oracle.com  Tue Nov  5 15:31:48 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Tue, 5 Nov 2019 07:31:48 -0800
Subject: RFR: CSR JVM support for records
In-Reply-To: <fb8c0662-78c0-c442-3fd8-db3178c27127@oracle.com>
References: <fb8c0662-78c0-c442-3fd8-db3178c27127@oracle.com>
Message-ID: <11dcb894-cee5-2c1d-644c-3f3dcff3640f@oracle.com>

I expected the HTML file containing JVMS changes to be attached. A 
misformatted copy-paste of 4.7.30 isn't as good as the HTML file.

If no reflection is performed (BTW a short list of the germane 
reflection methods would be useful), then does the abstract JVM care 
about the Record attribute in any way? (Please answer in the CSR, not 
here.) Title should probably be "Reflection support for Records".

Alex

On 11/5/2019 6:27 AM, Harold Seigel wrote:
> Hi,
> 
> Please review this draft of the CSR for JVM support for records.
> 
> CSR: https://bugs.openjdk.java.net/browse/JDK-8233595
> 
> Thanks, Harold
> 

From thomas.stuefe at gmail.com  Tue Nov  5 15:51:37 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Tue, 5 Nov 2019 16:51:37 +0100
Subject: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <24ad04c1-942a-55b4-c57a-943ccada3748@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <CAA-vtUwwJJtJsc1Um0ruorZs34=zKJdWat+02-fmpkajzAfV3Q@mail.gmail.com>
 <e7f23e19-f208-0a8a-56ac-4be23deabc65@oss.nttdata.com>
 <CAA-vtUyKcSrmvZgCmmdZs74nQQ8_NN0D0=mFdrxDKq8rZwq=Qw@mail.gmail.com>
 <24ad04c1-942a-55b4-c57a-943ccada3748@oss.nttdata.com>
Message-ID: <CAA-vtUwkKNtTRf+9=bAVgnXJywzH4HwrmkPkpgNXLWz9x_goAQ@mail.gmail.com>

Okay, fair enough. Thanks for fixing this.

..Thomas

On Tue, Nov 5, 2019 at 3:35 PM Yasumasa Suenaga <suenaga at oss.nttdata.com>
wrote:

> On 2019/11/05 22:42, Thomas Stüfe wrote:
> >
> > On Sat, Nov 2, 2019 at 1:46 PM Yasumasa Suenaga <suenaga at oss.nttdata.com> wrote:
> >
> >     Hi Thomas,
> >
> >     I agree with you. Also, CI Replay will be generated after the NMT report.
> >     But I think it should be a separate issue.
> >
> >     If that is OK with you, I will file it in JBS and create a patch.
> >
> >
> >
> > Why not fix it in this patch, while you are on it?
>
> It does not affect thread state directly.
>
> In addition, I've sent a review request for JDK-8233373. That change affects
> the location of JFR::on_vm_shutdown().
> Thus I want to fix it after JDK-8233373 and JDK-8233375.
>
>
> Yasumasa
>
>
> > ..Thomas
> >
> >     Thanks,
> >
> >     Yasumasa
> >
> >
> >     On 2019/11/01 19:36, Thomas Stüfe wrote:
> >      > Hi Yasumasa,
> >      >
> >      > I see that we do JFR::on_vm_shutdown() before error reporting
> >      > ran. Is that really necessary? Error reporting should happen as
> >      > close as possible to the error point - ideally, as little code as
> >      > possible should run between the crash/assert and the generation of
> >      > the hs-err file. I suggest moving the call to JFR::on_vm_shutdown()
> >      > down to a point after error reporting, e.g. to where we print the
> >      > NMT report on shutdown.
> >      >
> >      > Cheers, Thomas
> >      >
> >      >
> >      > On Fri, Nov 1, 2019 at 10:41 AM Yasumasa Suenaga <suenaga at oss.nttdata.com> wrote:
> >      >
> >      >     Forward to hotspot-runtime-dev.
> >      >
> >      >     As David commented in JBS, it may need to be fixed in JFR code.
> >      >     But I'm not clear on why the thread state is not recovered.
> >      >
> >      >     I'd like to hear about this from JFR folks.
> >      >     If it is just a bug in JFR, I will create a patch which
> >      >     recovers it in JFR code.
> >      >
> >      >
> >      >     Thanks,
> >      >
> >      >     Yasumasa
> >      >
> >      >
> >      >     -------- Forwarded Message --------
> >      >     Subject: RFR: 8233375: JFR emergency dump do not recover thread state
> >      >     Date: Fri, 1 Nov 2019 17:08:42 +0900
> >      >     From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
> >      >     To: hotspot-jfr-dev at openjdk.java.net
> >      >     CC: yasuenag at gmail.com <yasuenag at gmail.com>
> >      >
> >      >     Hi all,
> >      >
> >      >     Please review this change:
> >      >
> >      >         JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
> >      >         webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
> >      >
> >      >     If JFR is running when the JVM crashes, JFR will dump data to
> >      >     hs_err_pid<PID>.jfr. This is done in prepare_for_emergency_dump().
> >      >     However, this function transitions the thread state to
> >      >     "_thread_in_vm".
> >      >
> >      >     This change has been tested on the submit repo as
> >      >     mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
> >      >     It failed at compiler/types/correctness/CorrectnessTest.java.
> >      >     However, this test is for the JIT compiler, and a related issue has
> >      >     been reported as JDK-8225620.
> >      >     So I think this patch can go through.
> >      >
> >      >
> >      >     Thanks,
> >      >
> >      >     Yasumasa
> >      >
> >
>

From thomas.stuefe at gmail.com  Tue Nov  5 15:54:59 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Tue, 5 Nov 2019 16:54:59 +0100
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
In-Reply-To: <AM6PR02MB5347F55B15D032B53B301DBCEC7E0@AM6PR02MB5347.eurprd02.prod.outlook.com>
References: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
 <AM6PR02MB5347F55B15D032B53B301DBCEC7E0@AM6PR02MB5347.eurprd02.prod.outlook.com>
Message-ID: <CAA-vtUx5K8v64xP22LXheJ9wr0y9D0hu2oHr418gvxS+hMG3uw@mail.gmail.com>

Hi Goetz,

thanks for the review.

On Tue, Nov 5, 2019 at 4:22 PM Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
wrote:

> Hi,
>
> Is this happening only with gcc 5.4, or also with 7.3?
> 7.3 was the compiler listed for jdk 13:
> https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms
>
> in other places, we check for the specific gcc version:
>
> http://hg.openjdk.java.net/jdk/jdk/file/8623f75be895/src/hotspot/share/utilities/debug.hpp#l157
>
> http://hg.openjdk.java.net/jdk/jdk/file/8623f75be895/src/hotspot/share/utilities/compilerWarnings_gcc.hpp#l62
> This warning seems not to happen with gcc 8.3.
> So should we, similarly, protect this place with __GNUG__ < 6? or < 8?
> Else I would like to see a comment that states that a warning
> was seen with gcc 5.4, so that this can be removed more easily
> later on.
> Don't need a new webrev for that...
>
>
Okay, I'll add a comment.


> Also, above places make decisions for gcc 4.
> Do we still support gcc 4?
> This change might not be working on gcc 4. PRAGMA_DIAG_PUSH is defined
> empty for gcc < 4.6.  (I think this does not matter.)
> ... Maybe, in a follow up, we should remove these checks for gcc 4 and
> force at least gcc 5 in jdk 14?
>

Okay. If you think this is needed, please open a follow up bug.


>
> Best regards,
>   Goetz.
>
>
Thanks!

Thomas


>
> > -----Original Message-----
> > From: hotspot-runtime-dev <hotspot-runtime-dev-bounces at openjdk.java.net>
> > On Behalf Of Thomas St?fe
> > Sent: Dienstag, 5. November 2019 08:07
> > To: Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>
> > Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after JDK-
> > 8233359
> >
> > Hi all,
> >
> > may I please have reviews for this small build fix:
> >
> > Issue: https://bugs.openjdk.java.net/browse/JDK-8233530
> > Webrev:
> > http://cr.openjdk.java.net/~stuefe/webrevs/8233530-wc14-compat/webrev.00/webrev/
> > Prior discussion:
> > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/036726.html
> >
> > Thank you,
> >
> > Thomas
>

From aoqi at loongson.cn  Tue Nov  5 17:31:44 2019
From: aoqi at loongson.cn (Ao Qi)
Date: Wed, 6 Nov 2019 01:31:44 +0800
Subject: RFR(trivial): JDK-8233608: Minimal build broken after JDK-8233494
Message-ID: <CALjzQn6b=8Kfif0ADdAJCEs+Ej69pnmuBwougms9+31g4YRkYQ@mail.gmail.com>

Hi,

Minimal build is broken after JDK-8233494. Could I please get reviews for this?

JBS:
https://bugs.openjdk.java.net/browse/JDK-8233608

Fix:
http://cr.openjdk.java.net/~aoqi/8233608/webrev.00/

Testing:
linux-x86_64-{server, minimal, zero}-{fastdebug, release} build and
hotspot:tier1

Thanks,
Ao Qi


From shade at redhat.com  Tue Nov  5 17:34:38 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 5 Nov 2019 18:34:38 +0100
Subject: RFR(trivial): JDK-8233608: Minimal build broken after JDK-8233494
In-Reply-To: <CALjzQn6b=8Kfif0ADdAJCEs+Ej69pnmuBwougms9+31g4YRkYQ@mail.gmail.com>
References: <CALjzQn6b=8Kfif0ADdAJCEs+Ej69pnmuBwougms9+31g4YRkYQ@mail.gmail.com>
Message-ID: <2540d5f7-af41-3e02-5d6b-f0a4c9c03722@redhat.com>

On 11/5/19 6:31 PM, Ao Qi wrote:
> Fix:
> http://cr.openjdk.java.net/~aoqi/8233608/webrev.00/

Looks good and trivial. It matches the method signature when INCLUDE_NMT is true.

-- 
Thanks,
-Aleksey


From aoqi at loongson.cn  Tue Nov  5 17:52:19 2019
From: aoqi at loongson.cn (Ao Qi)
Date: Wed, 6 Nov 2019 01:52:19 +0800
Subject: RFR(trivial): JDK-8233608: Minimal build broken after JDK-8233494
In-Reply-To: <2540d5f7-af41-3e02-5d6b-f0a4c9c03722@redhat.com>
References: <CALjzQn6b=8Kfif0ADdAJCEs+Ej69pnmuBwougms9+31g4YRkYQ@mail.gmail.com>
 <2540d5f7-af41-3e02-5d6b-f0a4c9c03722@redhat.com>
Message-ID: <CALjzQn4X5NKhnndyN3OWcGN8diMSkB24uTAm7HzNoOd3cXKVuQ@mail.gmail.com>

On Wed, Nov 6, 2019 at 1:34 AM Aleksey Shipilev <shade at redhat.com> wrote:
>
> On 11/5/19 6:31 PM, Ao Qi wrote:
> > Fix:
> > http://cr.openjdk.java.net/~aoqi/8233608/webrev.00/
>
> Looks good and trivial. It matches the method signature when INCLUDE_NMT is true.

Thank you Aleksey. Could you sponsor it please? Updated (just added
you as the reviewer):
http://cr.openjdk.java.net/~aoqi/8233608/webrev.01

Cheers,
Ao Qi


From shade at redhat.com  Tue Nov  5 18:32:42 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 5 Nov 2019 19:32:42 +0100
Subject: RFR(trivial): JDK-8233608: Minimal build broken after JDK-8233494
In-Reply-To: <CALjzQn4X5NKhnndyN3OWcGN8diMSkB24uTAm7HzNoOd3cXKVuQ@mail.gmail.com>
References: <CALjzQn6b=8Kfif0ADdAJCEs+Ej69pnmuBwougms9+31g4YRkYQ@mail.gmail.com>
 <2540d5f7-af41-3e02-5d6b-f0a4c9c03722@redhat.com>
 <CALjzQn4X5NKhnndyN3OWcGN8diMSkB24uTAm7HzNoOd3cXKVuQ@mail.gmail.com>
Message-ID: <68d02171-aab9-3447-920b-0aace2c83c96@redhat.com>

On 11/5/19 6:52 PM, Ao Qi wrote:
> On Wed, Nov 6, 2019 at 1:34 AM Aleksey Shipilev <shade at redhat.com> wrote:
>>
>> On 11/5/19 6:31 PM, Ao Qi wrote:
>>> Fix:
>>> http://cr.openjdk.java.net/~aoqi/8233608/webrev.00/
>>
>> Looks good and trivial. It matches the method signature when INCLUDE_NMT is true.
> 
> Thank you Aleksey. Could you sponsor it please? Updated (just added
> you as the reviewer):
> http://cr.openjdk.java.net/~aoqi/8233608/webrev.01

Sure, here:
  https://hg.openjdk.java.net/jdk/jdk/rev/e767fa6a1d45

-- 
Thanks,
-Aleksey


From aoqi at loongson.cn  Tue Nov  5 18:39:11 2019
From: aoqi at loongson.cn (Ao Qi)
Date: Wed, 6 Nov 2019 02:39:11 +0800
Subject: RFR(trivial): JDK-8233608: Minimal build broken after JDK-8233494
In-Reply-To: <68d02171-aab9-3447-920b-0aace2c83c96@redhat.com>
References: <CALjzQn6b=8Kfif0ADdAJCEs+Ej69pnmuBwougms9+31g4YRkYQ@mail.gmail.com>
 <2540d5f7-af41-3e02-5d6b-f0a4c9c03722@redhat.com>
 <CALjzQn4X5NKhnndyN3OWcGN8diMSkB24uTAm7HzNoOd3cXKVuQ@mail.gmail.com>
 <68d02171-aab9-3447-920b-0aace2c83c96@redhat.com>
Message-ID: <CALjzQn7-q7LOH85_7mUei4F5CXcO03HHv9woKtyZ9YgNS6KNFg@mail.gmail.com>

Thanks Aleksey!

On Wed, Nov 6, 2019 at 2:32 AM Aleksey Shipilev <shade at redhat.com> wrote:
>
> On 11/5/19 6:52 PM, Ao Qi wrote:
> > On Wed, Nov 6, 2019 at 1:34 AM Aleksey Shipilev <shade at redhat.com> wrote:
> >>
> >> On 11/5/19 6:31 PM, Ao Qi wrote:
> >>> Fix:
> >>> http://cr.openjdk.java.net/~aoqi/8233608/webrev.00/
> >>
> >> Looks good and trivial. It matches the method signature when INCLUDE_NMT is true.
> >
> > Thank you Aleksey. Could you sponsor it please? Updated (just added
> > you as the reviewer):
> > http://cr.openjdk.java.net/~aoqi/8233608/webrev.01
>
> Sure, here:
>   https://hg.openjdk.java.net/jdk/jdk/rev/e767fa6a1d45
>
> --
> Thanks,
> -Aleksey
>


From harold.seigel at oracle.com  Tue Nov  5 21:30:56 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Tue, 5 Nov 2019 16:30:56 -0500
Subject: RFR: CSR JVM support for records
In-Reply-To: <11dcb894-cee5-2c1d-644c-3f3dcff3640f@oracle.com>
References: <fb8c0662-78c0-c442-3fd8-db3178c27127@oracle.com>
 <11dcb894-cee5-2c1d-644c-3f3dcff3640f@oracle.com>
Message-ID: <8ae89784-f82f-b878-c35e-4ea1acf0f5ca@oracle.com>

Hi Alex,

Thanks for looking at the CSR.  I've made the edits that you requested, 
except for attaching the JVMS changes.  JBS claimed that the JVMS 
changes potentially contained a virus and would not let me upload them.  
So, instead, I included a link to the changes in the CSR.

Thanks, Harold

On 11/5/2019 10:31 AM, Alex Buckley wrote:
> I expected the HTML file containing JVMS changes to be attached. A 
> misformatted copy-paste of 4.7.30 isn't as good as the HTML file.
>
> If no reflection is performed (BTW a short list of the germane 
> reflection methods would be useful), then does the abstract JVM care 
> about the Record attribute in any way? (Please answer in the CSR, not 
> here.) Title should probably be "Reflection support for Records".
>
> Alex
>
> On 11/5/2019 6:27 AM, Harold Seigel wrote:
>> Hi,
>>
>> Please review this draft of the CSR for JVM support for records.
>>
>> CSR: https://bugs.openjdk.java.net/browse/JDK-8233595
>>
>> Thanks, Harold
>>

From david.holmes at oracle.com  Tue Nov  5 23:26:49 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 6 Nov 2019 09:26:49 +1000
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <458e9a65-9d05-f237-9fa5-63ff69515daf@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <6cf0f903-38f2-c744-ca6e-66f529cdbd6f@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
 <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
 <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>
 <5debfd23-ba51-6a37-52cc-476e2813e2fd@oss.nttdata.com>
 <851fa9e8-ddfa-7872-b937-596efae2d6e4@oracle.com>
 <8b3a28c0-3bff-84d2-2b3c-35944ff0bb94@oss.nttdata.com>
 <3b974302-ddd7-49b6-a9e2-aa4f9a8d0b58@default>
 <c98ca024-7ec6-c378-01a7-49476151f428@oss.nttdata.com>
 <f0245ca4-d35a-4752-8fca-0e0e67399338@default>
 <a41eccb7-758b-dad9-5295-cb6d29c27c00@oss.nttdata.com>
 <458e9a65-9d05-f237-9fa5-63ff69515daf@oss.nttdata.com>
Message-ID: <62fac22d-adc3-0b0f-43ad-801d1ccd3579@oracle.com>

On 5/11/2019 11:40 pm, Yasumasa Suenaga wrote:
> On 2019/11/05 22:34, Yasumasa Suenaga wrote:
>> Hi Markus,
>>
>> Thanks for explanation. I tweaked your webrev:
>>
>>    http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>
> Sorry, webrev is here:
>
>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.05/

This seems okay to me.

Thanks,
David

> Yasumasa
> 
>> If you and David are ok, I will push it.
>>
>>
>>> The reason is that Event Streaming changed the architecture a bit 
>>> where existing data will move to the chunk file more frequently 
>>> (including artifacts) and it might therefore be possible to create a 
>>> non-VM internal dump mechanism (bounding it to mainly closing the 
>>> underlying file and copying chunks).
>>
>> I think it is very helpful for troubleshooting!
>> If crash report shows the location of JFR file or repository, it is 
>> more useful.
>> So I've sent review request for it (JDK-8233373):
>>
>>    
>> https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000808.html 
>>
>>
>>
>> Yasumasa
>>
>>
>> On 2019/11/05 21:54, Markus Gronlund wrote:
>>> The current dump mechanism is reusing most of the regular logic 
>>> because it has to perform quite a lot of work to construct a 
>>> recording. For example, it needs to collect all tagged artifacts in 
>>> the system (Klass, Method, ClassLoaderData, Symbols, Modules and 
>>> more) to have the events in an emergency recording file be fully 
>>> parsable. This is non-trivial requiring a thread to at least be part 
>>> of the VM, with most thread local data structures preserved.
>>>
>>> We should remember that dumping an emergency recording is only a best 
>>> effort attempt. As an analogy, compare it to other routines in 
>>> VMError::report() that are conditional and require a non-NULL thread 
>>> object (print the current Compile Task, print VM Operation or even
>>> printing the JavaStack for a thread, for example).
>>>
>>> With Event Streaming that was recently checked in, it is easier 
>>> (although not easy, but easier compared to before ) to extend this 
>>> support to not have the emergency dumper thread do so much internal 
>>> VM work.
>>> The reason is that Event Streaming changed the architecture a bit 
>>> where existing data will move to the chunk file more frequently 
>>> (including artifacts) and it might therefore be possible to create a 
>>> non-VM internal dump mechanism (bounding it to mainly closing the 
>>> underlying file and copying chunks).
>>>
>>> It is unclear at this point if the value-add is high enough to 
>>> warrant the work.
>>>
>>> Thanks
>>> Markus
>>>
>>>
>>> -----Original Message-----
>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>> Sent: den 5 november 2019 12:56
>>> To: Markus Gronlund <markus.gronlund at oracle.com>; David Holmes 
>>> <david.holmes at oracle.com>
>>> Cc: hotspot-jfr-dev at openjdk.java.net; 
>>> hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover 
>>> thread state
>>>
>>> Hi Markus,
>>>
>>>> 1. An invalid thread local will give immediate problems downstream, 
>>>> for example see JfrRecorderService::prepare_for_vm_error_rotation().
>>>
>>> Do you mean that the dump would not be performed when
>>> Thread::current() returns NULL?
>>> That will happen when a crash occurs in a detached thread [1]. Can't we
>>> consider that case?
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> [1] 
>>> https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000822.html 
>>>
>>>
>>>
>>> On 2019/11/05 19:35, Markus Gronlund wrote:
>>>> Hi again,
>>>>
>>>> The comments in my previous email still apply:
>>>>
>>>> "...although very unlikely, we could have run into an issue with 
>>>> thread local storage, so it makes sense to test this up front. If we 
>>>> cannot read the thread local, the operations we intend to perform 
>>>> will fail, so we might just bail out already. I took the liberty to 
>>>> tighten up the transition class a little bit; you only need to 
>>>> restore the thread state if there was an actual change."
>>>>
>>>> 1. An invalid thread local will give immediate problems downstream, 
>>>> for example see JfrRecorderService::prepare_for_vm_error_rotation().
>>>> 2. You don't need to restore _thread_in_vm back to threads already 
>>>> running in the correct state. The purpose of the transition helper 
>>>> class is to move Java threads not running in _thread_in_vm (i.e. 
>>>> will be _thread_in_native). Move this logic to the fore to better 
>>>> clarify the intent of the helper class.
>>>>
>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>
>>>> Thanks
>>>> Markus
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>> Sent: den 5 november 2019 07:36
>>>> To: David Holmes <david.holmes at oracle.com>; Markus Gronlund
>>>> <markus.gronlund at oracle.com>
>>>> Cc: hotspot-jfr-dev at openjdk.java.net;
>>>> hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>>>> thread state
>>>>
>>>> Thanks David!
>>>> I wait for Markus's review.
>>>>
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2019/11/05 15:19, David Holmes wrote:
>>>>> On 5/11/2019 4:08 pm, Yasumasa Suenaga wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> Sorry, I was confused :)
>>>>>> This is new webrev. Could you check again?
>>>>>>
>>>>>>      http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.04/
>>>>>
>>>>> Okay structurally I'm fine with that.
>>>>>
>>>>> What I don't know, and will leave to Markus to determine, is 
>>>>> whether the rest of the code in on_vm_shutdown can actually execute 
>>>>> okay if there is no current thread.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2019/11/05 14:56, David Holmes wrote:
>>>>>>> On 5/11/2019 3:48 pm, Yasumasa Suenaga wrote:
>>>>>>>> On 2019/11/05 14:34, David Holmes wrote:
>>>>>>>>> On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump 
>>>>>>>>>>> takes the thread argument. As it is just a static function 
>>>>>>>>>>> that doesn't impact anything else.
>>>>>>>>>>
>>>>>>>>>> prepare_for_emergency_dump() returns false if some critical 
>>>>>>>>>> locks could not unlock.
>>>>>>>>>> So what should we return if NULL is passed as the argument? true?
>>>>>>>>>
>>>>>>>>> But you're not calling prepare_for_emergency_dump when the 
>>>>>>>>> thread is NULL:
>>>>>>>>>
>>>>>>>>>     454   Thread* thread = Thread::current_or_null_safe();
>>>>>>>>>     455
>>>>>>>>>     456   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>     457   JavaThreadInVM jtivm(thread);
>>>>>>>>>     458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
>>>>>>>>>     459     return;
>>>>>>>>>     460   }
>>>>>>>>
>>>>>>>> Oh, sorry, I made a mistake!
>>>>>>>> I want to change it as below:
>>>>>>>>
>>>>>>>> ```
>>>>>>>> +  Thread* thread = Thread::current_or_null_safe();
>>>>>>>> +
>>>>>>>> +  // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>> +  JavaThreadInVM jtivm(thread);
>>>>>>>> +  if (thread != NULL) {
>>>>>>>> +    if (!prepare_for_emergency_dump()) {
>>>>>>>> +      return;
>>>>>>>> +    }
>>>>>>>> +  }
>>>>>>>> ```
>>>>>>>
>>>>>>> but that is the same logic ??
>>>>>>>
>>>>>>>>> All I'm saying is that you pass "thread" as a parameter so you 
>>>>>>>>> can then delete the existing call to Thread::current() that is 
>>>>>>>>> inside prepare_for_emergency_dump.
>>>>>>>>
>>>>>>>> Is it preferable to pass "thread" to prepare_for_emergency_dump()
>>>>>>>> instead of the above?
>>>>>>>
>>>>>>> ??? The two are not related. If you've already obtained the 
>>>>>>> current thread you can pass it to prepare_for_emergency_dump and 
>>>>>>> avoid the need to call Thread::current() (in whatever form) again.
>>>>>>> How you handle a NULL current thread is independent of that.
>>>>>>>
>>>>>>>> If so, I will push new changeset to submit repo, and will send 
>>>>>>>> new review request.
>>>>>>>
>>>>>>> I'd send the review request first and get agreement before 
>>>>>>> wasting time on the submit repo.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> It might break semantics of this function, so I did not add 
>>>>>>>>>> argument to prepare_for_emergency_dump() in this webrev.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2019/11/05 13:56, David Holmes wrote:
>>>>>>>>>>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>> On 2019/11/05 9:17, David Holmes wrote:
>>>>>>>>>>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I thought similar change, and it is running on submit repo:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If it passes all tests, I will send review request again.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This change passed all tests on submit repo 
>>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>>>>>>>>>>> Could you review again?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In Markus's change, the emergency dump will not be performed
>>>>>>>>>>>>>> when Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>>>>> However, that happens when a core-dumping signal (e.g. SIGSEGV)
>>>>>>>>>>>>>> is sent to the PID by the `kill` command - the main thread of
>>>>>>>>>>>>>> the process will already be detached (out of the JVM).
>>>>>>>>>>>>>> Also, the crash might happen in a native thread created by
>>>>>>>>>>>>>> pthread_create (on Linux) from JNI code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thus we should continue to perform emergency dump even if 
>>>>>>>>>>>>>> Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I didn't quite follow all that, but if there is no current 
>>>>>>>>>>>>> thread then prepare_for_emergency_dump() is either going to 
>>>>>>>>>>>>> assert here:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     348   Thread* const thread = Thread::current();
>>>>>>>>>>>>>
>>>>>>>>>>>>> or crash here:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     349   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks David!
>>>>>>>>>>>> I fixed it in new webrev:
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>>>>>>>>>>
>>>>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump 
>>>>>>>>>>> takes the thread argument. As it is just a static function 
>>>>>>>>>>> that doesn't impact anything else.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>> It works fine on submit repo 
>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>>>>>>>>>>> Hi Yasumasa and David,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Apologies about the suggestion to use 
>>>>>>>>>>>>>>>> ThreadInVMFromUnknown. I realized later that, as you 
>>>>>>>>>>>>>>>> have pointed out, it would perform a real thread 
>>>>>>>>>>>>>>>> transition. Sorry.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Taking some input from ThreadInVMFromUnknown and the 
>>>>>>>>>>>>>>>> situations I have seen at this location, I think the 
>>>>>>>>>>>>>>>> only case we need to be concerned about here is when a 
>>>>>>>>>>>>>>>> JavaThread is _thread_in_native. _thread_in_java 
>>>>>>>>>>>>>>>> transitions to _thread_in_vm via stubs in SharedRuntime
>>>>>>>>>>>>>>>> (I believe) as part of coming out of the exception
>>>>>>>>>>>>>>>> handler(s). Unfortunately I cannot give a proper 
>>>>>>>>>>>>>>>> argument now to give the premises where this invariant 
>>>>>>>>>>>>>>>> is enforced, so let's work with the original thread 
>>>>>>>>>>>>>>>> state as you suggested Yasumasa.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If we can avoid passing the thread all the way through, 
>>>>>>>>>>>>>>>> I think that is preferable (this is not performance 
>>>>>>>>>>>>>>>> critical code). David also alluded to the fact that you 
>>>>>>>>>>>>>>>> always manipulate the current thread anyway. Although 
>>>>>>>>>>>>>>>> very unlikely, we could have run into an issue with 
>>>>>>>>>>>>>>>> thread local storage, so it makes sense to test this up 
>>>>>>>>>>>>>>>> front. If we cannot read the thread local, the 
>>>>>>>>>>>>>>>> operations we intend to perform will fail, so we might 
>>>>>>>>>>>>>>>> just bail out already.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I took the liberty to tighten up the transition class a 
>>>>>>>>>>>>>>>> little bit; you only need to restore the thread state if 
>>>>>>>>>>>>>>>> there was an actual change.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Perhaps we can do it like this?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for your patience investigating this
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: David Holmes
>>>>>>>>>>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>>>>>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus
>>>>>>>>>>>>>>>> Gronlund <markus.gronlund at oracle.com>;
>>>>>>>>>>>>>>>> hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>> Cc: yasuenag at gmail.com
>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not
>>>>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't think it is worth the disruption to pass the
>>>>>>>>>>>>>>>> "thread" all the way through these API's. It is
>>>>>>>>>>>>>>>> simpler/cleaner to just call
>>>>>>>>>>>>>>>> Thread::current_or_null_safe() when you need the current 
>>>>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>>>>>>>>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm),
>>>>>>>>>>>>>>>> "invariant");
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>>>>>>>>>>> VMError::report_and_die() there is AFAICS absolutely no
>>>>>>>>>>>>>>>> guarantee that we need be in a JavaThread at all.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>      428 class ThreadInVMForJFR : public StackObj {
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can I suggest JavaThreadInVM to make it clear this only 
>>>>>>>>>>>>>>>> affects JavaThreads. And as it is local we don't need 
>>>>>>>>>>>>>>>> the "forJFR" part.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Based on Markus's proposed change, and with a view to 
>>>>>>>>>>>>>>>> constrain the scope even further can I suggest the 
>>>>>>>>>>>>>>>> following:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> if (!guard_reentrancy()) {
>>>>>>>>>>>>>>>>     return;
>>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>>>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>>>>>>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>>>>>>>>>>>>       return;
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>>>>>>>>>>> Correction ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>>>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> That is more like I had envisaged for this. Reusing
>>>>>>>>>>>>>>>>>>>> existing thread-state transition code is preferable to
>>>>>>>>>>>>>>>>>>>> adding more custom code that directly manipulates 
>>>>>>>>>>>>>>>>>>>> thread-state.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its
>>>>>>>>>>>>>>>>>>> arguments. So
>>>>>>>>>>>>>>>>>>> Thread::current() might be different with it.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Not sure what you mean. You only ever manipulate the
>>>>>>>>>>>>>>>>>> thread state of the current thread.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In addition, ThreadInVMfromUnknown uses
>>>>>>>>>>>>>>>>>>> transition_from_native() to change the thread state.
>>>>>>>>>>>>>>>>>>> It checks (and manipulates?) something which relates 
>>>>>>>>>>>>>>>>>>> to safepoint.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes it does - which would be a problem if a safepoint
>>>>>>>>>>>>>>>>>> (or
>>>>>>>>>>>>>>>>>> handshake) were pending. But the path through
>>>>>>>>>>>>>>>>>> before_exit already has safepoint checks when you 
>>>>>>>>>>>>>>>>>> acquire the BeforeExit_lock.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> But that isn't relevant. The issue is we don't want a
>>>>>>>>>>>>>>>>> safepoint check on the report_and_die() path. So a custom
>>>>>>>>>>>>>>>>> transition helper is needed to avoid that.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The main problem with the suggestion is it seems we may
>>>>>>>>>>>>>>>>>> not be running in a JavaThread:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>      349   Thread* const thread = Thread::current();
>>>>>>>>>>>>>>>>>>      350   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> so we can't use the existing thread-state helpers,
>>>>>>>>>>>>>>>>>> unless we narrow the scope (as you do) to after the 
>>>>>>>>>>>>>>>>>> check for the WatcherThread.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thus I added ThreadInVMForJFR to my new webrev.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>;
>>>>>>>>>>>>>>>>>>>>> yasuenag at gmail.com; Markus Gronlund
>>>>>>>>>>>>>>>>>>>>> <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do
>>>>>>>>>>>>>>>>>>>>> not recover thread state
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Markus commented in JBS this change should be kept 
>>>>>>>>>>>>>>>>>>>>> local to JFR.
>>>>>>>>>>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703). 
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed 
>>>>>>>>>>>>>>>>>>>>>> in JFR code.
>>>>>>>>>>>>>>>>>>>>>> But it is unclear to me why the thread state is not recovered.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch
>>>>>>>>>>>>>>>>>>>>>> which recover it in JFR code.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not
>>>>>>>>>>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       JBS:
>>>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>>>>>>>>>>       webrev:
>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> If JFR is running when JVM crashes, JFR will dump
>>>>>>>>>>>>>>>>>>>>>> data to hs_err_pid<PID>.jfr .
>>>>>>>>>>>>>>>>>>>>>> It would perform in prepare_for_emergency_dump().
>>>>>>>>>>>>>>>>>>>>>> However this function transits thread state to 
>>>>>>>>>>>>>>>>>>>>>> "_thread_in_vm".
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>>>>>>>>>>> It failed at
>>>>>>>>>>>>>>>>>>>>>> compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>>>>>>>>>>> However this test is for JIT compiler, and related
>>>>>>>>>>>>>>>>>>>>>> issue has been reported as JDK-8225620.
>>>>>>>>>>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yasumasa
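
A minimal sketch of the scoped transition helper discussed in this thread, under stated assumptions: `Thread`, `JavaThread`, `ThreadState` and `transitioned()` below are hypothetical stand-ins rather than the HotSpot declarations, and the body is not taken from any of the webrevs. It only illustrates the two properties the reviewers asked for: the helper acts only on a JavaThread that is not already in the VM state, and it restores the state only if it actually changed it. A NULL or non-Java thread is left untouched.

```cpp
#include <cassert>

// Hypothetical stand-ins for the VM types; not the HotSpot declarations.
enum ThreadState { thread_in_native, thread_in_vm };

struct Thread {
  virtual ~Thread() = default;
  virtual bool is_java_thread() const { return false; }
};

struct JavaThread : Thread {
  ThreadState state = thread_in_native;
  bool is_java_thread() const override { return true; }
};

// Scoped helper: moves a JavaThread that is not in the VM state into it,
// and restores the original state on scope exit only if a change was made.
class JavaThreadInVM {
  JavaThread* _jt = nullptr;             // non-null only when a restore is pending
  ThreadState _original = thread_in_vm;  // state to restore, valid when _jt is set

 public:
  explicit JavaThreadInVM(Thread* t) {
    if (t != nullptr && t->is_java_thread()) {
      JavaThread* jt = static_cast<JavaThread*>(t);
      if (jt->state != thread_in_vm) {
        _original = jt->state;
        jt->state = thread_in_vm;
        _jt = jt;
      }
    }
  }

  ~JavaThreadInVM() {
    if (_jt != nullptr) {
      _jt->state = _original;  // undo only what the constructor changed
    }
  }

  // Illustrative accessor (an assumption, added for testing the sketch).
  bool transitioned() const { return _jt != nullptr; }

  JavaThreadInVM(const JavaThreadInVM&) = delete;
  JavaThreadInVM& operator=(const JavaThreadInVM&) = delete;
};
```

Because the restore lives in the destructor, every early `return` inside the guarded scope leaves the thread in the state it had on entry. The sketch deliberately performs a plain state write with no safepoint check, which is the reason given above for not reusing the existing transition helpers on the error-reporting path.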

From suenaga at oss.nttdata.com  Tue Nov  5 23:40:13 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Wed, 6 Nov 2019 08:40:13 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <62fac22d-adc3-0b0f-43ad-801d1ccd3579@oracle.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
 <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
 <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>
 <5debfd23-ba51-6a37-52cc-476e2813e2fd@oss.nttdata.com>
 <851fa9e8-ddfa-7872-b937-596efae2d6e4@oracle.com>
 <8b3a28c0-3bff-84d2-2b3c-35944ff0bb94@oss.nttdata.com>
 <3b974302-ddd7-49b6-a9e2-aa4f9a8d0b58@default>
 <c98ca024-7ec6-c378-01a7-49476151f428@oss.nttdata.com>
 <f0245ca4-d35a-4752-8fca-0e0e67399338@default>
 <a41eccb7-758b-dad9-5295-cb6d29c27c00@oss.nttdata.com>
 <458e9a65-9d05-f237-9fa5-63ff69515daf@oss.nttdata.com>
 <62fac22d-adc3-0b0f-43ad-801d1ccd3579@oracle.com>
Message-ID: <b55b67c6-3099-8757-5850-1003e0dd1a1b@oss.nttdata.com>

Thanks David!
Markus, please review it:

>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.05/


Yasumasa
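
For readers following the quoted discussion below, here is a minimal sketch of the two points David raised, using hypothetical stand-ins (the `Thread` struct, its `critical_locks_released` flag, and both `should_dump_*` functions are assumptions for illustration, not HotSpot code and not the contents of webrev.05). It shows the current thread being obtained once by the caller and passed down as a parameter, and it shows that the combined condition and the nested condition quoted in the thread are the same logic.

```cpp
#include <cassert>

// Hypothetical stand-in for a VM thread; not the HotSpot Thread class.
struct Thread {
  bool critical_locks_released = true;  // illustrative flag only
};

// Takes the thread as a parameter, so it does not need to look up the
// current thread again. Returns false if critical locks could not be
// released (modelled here by the flag above).
bool prepare_for_emergency_dump(Thread* thread) {
  assert(thread != nullptr);
  return thread->critical_locks_released;
}

// Form 1: combined condition, as in the numbered snippet quoted below.
bool should_dump_combined(Thread* thread) {
  if ((thread != nullptr) && !prepare_for_emergency_dump(thread)) {
    return false;
  }
  return true;
}

// Form 2: nested condition, as in the proposed diff quoted below.
bool should_dump_nested(Thread* thread) {
  if (thread != nullptr) {
    if (!prepare_for_emergency_dump(thread)) {
      return false;
    }
  }
  return true;
}
```

In both forms a NULL thread skips the preparation step and the caller carries on. Whether the rest of the dump can run without a current thread is a separate question, which this sketch does not answer.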


On 2019/11/06 8:26, David Holmes wrote:
> On 5/11/2019 11:40 pm, Yasumasa Suenaga wrote:
>> On 2019/11/05 22:34, Yasumasa Suenaga wrote:
>>> Hi Markus,
>>>
>>> Thanks for explanation. I tweaked your webrev:
>>>
>>>    http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>
>> Sorry, webrev is here:
>>
>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.05/
> 
> This seems okay to me.
> 
> Thanks,
> David
> 
>> Yasumasa
>>
>>> If you and David are ok, I will push it.
>>>
>>>
>>>> The reason is that Event Streaming changed the architecture a bit where existing data will move to the chunk file more frequently (including artifacts) and it might therefore be possible to create a non-VM internal dump mechanism (bounding it to mainly closing the underlying file and copying chunks).
>>>
>>> I think it is very helpful for troubleshooting!
>>> If crash report shows the location of JFR file or repository, it is more useful.
>>> So I've sent review request for it (JDK-8233373):
>>>
>>> https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000808.html
>>>
>>>
>>> Yasumasa
>>>
>>>
>>> On 2019/11/05 21:54, Markus Gronlund wrote:
>>>> The current dump mechanism is reusing most of the regular logic because it has to perform quite a lot of work to construct a recording. For example, it needs to collect all tagged artifacts in the system (Klass, Method, ClassLoaderData, Symbols, Modules and more) to have the events in an emergency recording file be fully parsable. This is non-trivial requiring a thread to at least be part of the VM, with most thread local data structures preserved.
>>>>
>>>> We should remember that dumping an emergency recording is only a best effort attempt. As an analogy, compare it to other routines in VMError::report() that are conditional and require a non-NULL thread object (print the current Compile Task, print VM Operation or even printing the JavaStack for a thread, for example).
>>>>
>>>> With Event Streaming that was recently checked in, it is easier (although not easy, but easier compared to before ) to extend this support to not have the emergency dumper thread do so much internal VM work.
>>>> The reason is that Event Streaming changed the architecture a bit where existing data will move to the chunk file more frequently (including artifacts) and it might therefore be possible to create a non-VM internal dump mechanism (bounding it to mainly closing the underlying file and copying chunks).
>>>>
>>>> It is unclear at this point if the value-add is high enough to warrant the work.
>>>>
>>>> Thanks
>>>> Markus
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>> Sent: den 5 november 2019 12:56
>>>> To: Markus Gronlund <markus.gronlund at oracle.com>; David Holmes <david.holmes at oracle.com>
>>>> Cc: hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
>>>>
>>>> Hi Markus,
>>>>
>>>>> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().
>>>>
>>>> Do you mean that the dump would not be performed when Thread::current() returns NULL?
>>>> That will happen when a crash occurs in a detached thread [1]. Can't we consider that case?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> [1] https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000822.html
>>>>
>>>>
>>>> On 2019/11/05 19:35, Markus Gronlund wrote:
>>>>> Hi again,
>>>>>
>>>>> The comments in my previous email still apply:
>>>>>
>>>>> "...although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already. I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change."
>>>>>
>>>>> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().
>>>>> 2. You don't need to restore _thread_in_vm back to threads already running in the correct state. The purpose of the transition helper class is to move Java threads not running in _thread_in_vm (i.e. will be _thread_in_native). Move this logic to the fore to better clarify the intent of the helper class.
>>>>>
>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>
>>>>> Thanks
>>>>> Markus
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>> Sent: den 5 november 2019 07:36
>>>>> To: David Holmes <david.holmes at oracle.com>; Markus Gronlund
>>>>> <markus.gronlund at oracle.com>
>>>>> Cc: hotspot-jfr-dev at openjdk.java.net;
>>>>> hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>>>>> thread state
>>>>>
>>>>> Thanks David!
>>>>> I wait for Markus's review.
>>>>>
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2019/11/05 15:19, David Holmes wrote:
>>>>>> On 5/11/2019 4:08 pm, Yasumasa Suenaga wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Sorry, I was confused :)
>>>>>>> This is new webrev. Could you check again?
>>>>>>>
>>>>>>>      http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.04/
>>>>>>
>>>>>> Okay structurally I'm fine with that.
>>>>>>
>>>>>> What I don't know, and will leave to Markus to determine, is whether the rest of the code in on_vm_shutdown can actually execute okay if there is no current thread.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2019/11/05 14:56, David Holmes wrote:
>>>>>>>> On 5/11/2019 3:48 pm, Yasumasa Suenaga wrote:
>>>>>>>>> On 2019/11/05 14:34, David Holmes wrote:
>>>>>>>>>> On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>> Hi David,
>>>>>>>>>>>
>>>>>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>>>>>
>>>>>>>>>>> prepare_for_emergency_dump() returns false if some critical locks could not unlock.
>>>>>>>>>>> So what should we return if NULL is passed as the argument? true?
>>>>>>>>>>
>>>>>>>>>> But you're not calling prepare_for_emergency_dump when the thread is NULL:
>>>>>>>>>>
>>>>>>>>>>     454   Thread* thread = Thread::current_or_null_safe();
>>>>>>>>>>     455
>>>>>>>>>>     456   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>     457   JavaThreadInVM jtivm(thread);
>>>>>>>>>>     458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
>>>>>>>>>>     459     return;
>>>>>>>>>>     460   }
>>>>>>>>>
>>>>>>>>> Oh, sorry, I made a mistake!
>>>>>>>>> I want to change it as below:
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> +  Thread* thread = Thread::current_or_null_safe();
>>>>>>>>> +
>>>>>>>>> +  // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>> +  JavaThreadInVM jtivm(thread);
>>>>>>>>> +  if (thread != NULL) {
>>>>>>>>> +    if (!prepare_for_emergency_dump()) {
>>>>>>>>> +      return;
>>>>>>>>> +    }
>>>>>>>>> +  }
>>>>>>>>> ```
>>>>>>>>
>>>>>>>> but that is the same logic ??
>>>>>>>>
>>>>>>>>>> All I'm saying is that you pass "thread" as a parameter so you can then delete the existing call to Thread::current() that is inside prepare_for_emergency_dump.
>>>>>>>>>
>>>>>>>>> Is it preferable to pass "thread" to prepare_for_emergency_dump() instead of the above?
>>>>>>>>
>>>>>>>> ??? The two are not related. If you've already obtained the current thread you can pass it to prepare_for_emergency_dump and avoid the need to call Thread::current() (in whatever form) again. How you handle a NULL current thread is independent of that.
>>>>>>>>
>>>>>>>>> If so, I will push new changeset to submit repo, and will send new review request.
>>>>>>>>
>>>>>>>> I'd send the review request first and get agreement before wasting time on the submit repo.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>> It might break semantics of this function, so I did not add argument to prepare_for_emergency_dump() in this webrev.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2019/11/05 13:56, David Holmes wrote:
>>>>>>>>>>>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>> On 2019/11/05 9:17, David Holmes wrote:
>>>>>>>>>>>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I thought similar change, and it is running on submit repo:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If it passes all tests, I will send review request again.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This change passed all tests on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>>>>>>>>>>>> Could you review again?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In Markus's change, the emergency dump will not be performed when Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>>>>>> However, that can happen when a core-dump signal (e.g. SIGSEGV) is sent to the PID with the `kill` command - the main thread of the process will already be detached (outside the JVM).
>>>>>>>>>>>>>>> Also, the crash might happen in a native thread - one created by pthread_create (on Linux) from JNI code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thus we should still perform the emergency dump even if Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I didn't quite follow all that, but if there is no current thread then prepare_for_emergency_dump() is either going to assert here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     348   Thread* const thread = Thread::current();
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> or crash here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     349   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks David!
>>>>>>>>>>>>> I fixed it in new webrev:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>>>>>>>>>>>
>>>>>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>> It works fine on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>>>>>>>>>>>> Hi Yasumasa and David,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Apologies about the suggestion to use ThreadInVMfromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Taking some input from ThreadInVMfromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. _thread_in_java transitions to _thread_in_vm via stubs in SharedRuntime (I believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument now to give the premises where this invariant is enforced, so let's work with the original thread state as you suggested Yasumasa.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Perhaps we can do it like this?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for your patience investigating this
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>> From: David Holmes
>>>>>>>>>>>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>>>>>>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus
>>>>>>>>>>>>>>>>> Gronlund <markus.gronlund at oracle.com>;
>>>>>>>>>>>>>>>>> hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>>> Cc: yasuenag at gmail.com
>>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not
>>>>>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don't think it is worth the disruption to pass the
>>>>>>>>>>>>>>>>> "thread" all the way through these API's. It is
>>>>>>>>>>>>>>>>> simpler/cleaner to just call
>>>>>>>>>>>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>>>>>>>>>>>> (((JavaThread*)thread)->thread_state() == _thread_in_vm),
>>>>>>>>>>>>>>>>> "invariant");
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>>>>>>>>>>>> VMError::report_and_die() there is AFAICS absolutely no guarantee that we need be in a JavaThread at all.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     428 class ThreadInVMForJFR : public StackObj {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> if (!guard_reentrancy()) {
>>>>>>>>>>>>>>>>>     return;
>>>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>>>>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>>>>>>>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>>>>>>>>>>>>>       return;
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>>>>>>>>>>>> Correction ...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>>>>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> That is more like I had envisaged for this. Reusing
>>>>>>>>>>>>>>>>>>>>> existing thread-state transition code is preferable to
>>>>>>>>>>>>>>>>>>>>> adding more custom code that directly manipulates thread-state.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in its
>>>>>>>>>>>>>>>>>>>> arguments. So
>>>>>>>>>>>>>>>>>>>> Thread::current() might be different with it.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Not sure what you mean. You only ever manipulate the
>>>>>>>>>>>>>>>>>>> thread state of the current thread.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In addition, ThreadInVMfromUnknown uses
>>>>>>>>>>>>>>>>>>>> transition_from_native() to change the thread state.
>>>>>>>>>>>>>>>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes it does - which would be a problem if a safepoint (or
>>>>>>>>>>>>>>>>>>> handshake) were pending. But the path through before_exit
>>>>>>>>>>>>>>>>>>> already has safepoint checks when you acquire the BeforeExit_lock.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> But that isn't relevant. The issue is we don't want a
>>>>>>>>>>>>>>>>>> safepoint check on the report_and_die() path. So a custom
>>>>>>>>>>>>>>>>>> transition helper is needed to avoid that.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The main problem with the suggestion is it seems we may
>>>>>>>>>>>>>>>>>>> not be running in a JavaThread:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>      349   Thread* const thread = Thread::current();
>>>>>>>>>>>>>>>>>>>      350   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> so we can't use the existing thread-state helpers,
>>>>>>>>>>>>>>>>>>> unless we narrow the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thus I added ThreadInVMForJFR to my new webrev.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>;
>>>>>>>>>>>>>>>>>>>>>> yasuenag at gmail.com; Markus Gronlund
>>>>>>>>>>>>>>>>>>>>>> <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do
>>>>>>>>>>>>>>>>>>>>>> not recover thread state
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>>>>>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>>>>>>>>>>>>>>>> But I'm unclear why the thread state is not recovered.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>>>>>>>>>>>> If it is just a bug in JFR, I will create a patch
>>>>>>>>>>>>>>>>>>>>>>> which recovers it in JFR code.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not
>>>>>>>>>>>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>>>>>>>>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> If JFR is running when the JVM crashes, JFR will dump
>>>>>>>>>>>>>>>>>>>>>>> data to hs_err_pid<PID>.jfr .
>>>>>>>>>>>>>>>>>>>>>>> This is performed in prepare_for_emergency_dump().
>>>>>>>>>>>>>>>>>>>>>>> However, this function transitions the thread state to "_thread_in_vm".
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>>>>>>>>>>>> It failed at
>>>>>>>>>>>>>>>>>>>>>>> compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>>>>>>>>>>>> However this test is for JIT compiler, and related
>>>>>>>>>>>>>>>>>>>>>>> issue has been reported as JDK-8225620.
>>>>>>>>>>>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
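
For reference, the transition helper discussed in this thread can be sketched as a small RAII guard. This is a minimal, self-contained illustration and not the code from any of the webrevs: Thread, JavaThread, ThreadState and the body of JavaThreadInVM below are simplified stand-ins for the HotSpot types. It assumes that the current thread may be NULL or a non-Java thread, that no safepoint check is wanted on this path, and that the state is restored only if it was actually changed.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for the HotSpot types (hypothetical, illustration only).
enum ThreadState { _thread_in_native, _thread_in_vm, _thread_in_Java };

struct Thread {
  virtual ~Thread() {}
  virtual bool is_Java_thread() const { return false; }
};

struct JavaThread : Thread {
  ThreadState _state;
  explicit JavaThread(ThreadState s) : _state(s) {}
  virtual bool is_Java_thread() const { return true; }
  ThreadState thread_state() const { return _state; }
  void set_thread_state(ThreadState s) { _state = s; }
};

// Puts a JavaThread into _thread_in_vm for the lifetime of the guard by
// setting the state directly (no safepoint check), and restores the original
// state on scope exit. A NULL or non-Java thread is left untouched.
class JavaThreadInVM {
  JavaThread* _jt;
  ThreadState _original_state;
 public:
  explicit JavaThreadInVM(Thread* t)
    : _jt((t != NULL && t->is_Java_thread()) ? static_cast<JavaThread*>(t) : NULL),
      _original_state(_thread_in_vm) {
    if (_jt != NULL) {
      _original_state = _jt->thread_state();
      if (_original_state != _thread_in_vm) {
        _jt->set_thread_state(_thread_in_vm);
      }
    }
  }
  ~JavaThreadInVM() {
    // Restore only if the constructor actually changed the state.
    if (_jt != NULL && _original_state != _thread_in_vm) {
      _jt->set_thread_state(_original_state);
    }
  }
};

// Returns the state observed while the guard is active.
ThreadState state_inside_guard(JavaThread* jt) {
  JavaThreadInVM jtivm(jt);
  return jt->thread_state();
}
```

With a JavaThread that starts out _thread_in_native, state_inside_guard() observes _thread_in_vm and the thread is back in _thread_in_native once the guard goes out of scope. Whether the code that follows the guard can run without a current thread is a separate question, as noted in the review.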

From david.holmes at oracle.com  Tue Nov  5 23:43:29 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 6 Nov 2019 09:43:29 +1000
Subject: RFR (trivial): 8233599: ARM32 Build failed due to 8232050 missing an
 include
Message-ID: <9994b67c-7008-e7c8-3b9a-0e923b76c890@oracle.com>

Bug: https://bugs.openjdk.java.net/browse/JDK-8233599

Contributed patch from Markus Knetschke:

diff --git a/src/hotspot/cpu/arm/vtableStubs_arm.cpp
b/src/hotspot/cpu/arm/vtableStubs_arm.cpp
index 2c564b8189..f84e11662c 100644
--- a/src/hotspot/cpu/arm/vtableStubs_arm.cpp
+++ b/src/hotspot/cpu/arm/vtableStubs_arm.cpp
@@ -32,6 +32,7 @@
  #include "oops/compiledICHolder.hpp"
  #include "oops/instanceKlass.hpp"
  #include "oops/klassVtable.hpp"
+#include "oops/klass.inline.hpp"
  #include "runtime/sharedRuntime.hpp"
  #include "vmreg_arm.inline.hpp"
  #ifdef COMPILER2

I am reviewing and sponsoring.

Thanks,
David

From david.holmes at oracle.com  Tue Nov  5 23:50:09 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 6 Nov 2019 09:50:09 +1000
Subject: JVM stuck/looping in futex call
In-Reply-To: <CACGCMVqQZ66ty164P1QxgFk-QKNhmrh+Y99XQ5vDapA2V+JDxw@mail.gmail.com>
References: <CACGCMVreB5xu=f1DRh+8KND+nvLVuKGzKTCS1Wv4Qi2nO4LTew@mail.gmail.com>
 <507d7b80-a93a-4e51-4842-8b329beab486@oracle.com>
 <CACGCMVqRdOnjstXgFZ3AGd2Fxo8LBrTeHkC6r7EJZ311-M_a=w@mail.gmail.com>
 <350a6e97-41ee-f49f-0354-ec655d6490da@oracle.com>
 <CACGCMVqQZ66ty164P1QxgFk-QKNhmrh+Y99XQ5vDapA2V+JDxw@mail.gmail.com>
Message-ID: <96bb9df0-105a-8b51-b5b1-6f01f1ad8abc@oracle.com>

On 5/11/2019 4:13 pm, Sundara Mohan M wrote:
> Hi David,
>      Will try to get stack when it happens.
> I think the main thread is where the loop happens (in my case it is
> jetty server).
> No special environment just jetty server and no native threads are
> attached. We also try to avoid JNI related stuff as much as possible.

Can you provide full details on OS kernel and glibc versions?

No one else has reported anything like this.

Thanks,
David

> 
> Thanks
> Sundar
> 
> On Mon, Nov 4, 2019 at 3:17 PM David Holmes <david.holmes at oracle.com 
> <mailto:david.holmes at oracle.com>> wrote:
> 
>     On 5/11/2019 8:43 am, Sundara Mohan M wrote:
>      > HI David,
>      >      Did you mean to get stack trace of that process? I could
>     attach to
>      > gdb but not sure where to keep breakpoint.
>      > More info on how to get this will be helpful.
> 
>     I need to see the stack before we hit the looping call, to see what it
>     is that triggers the loop. Can you tell what thread is involved?
> 
>     Is there something special/different about your Linux environment? Do
>     you have native threads attached to the VM?
> 
>     Thanks,
>     David
> 
>      >
>      > Thanks
>      > Sundar
>      >
>      > On Fri, Nov 1, 2019 at 4:03 PM David Holmes
>     <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>      > <mailto:david.holmes at oracle.com
>     <mailto:david.holmes at oracle.com>>> wrote:
>      >
>      >     Hi Sundar,
>      >
>      >     On 2/11/2019 5:39 am, Sundara Mohan M wrote:
>      >      > Hi,
>      >      >      I am running openjdk12/Linux on our systems and see
>     jvm not
>      >     responding
>      >      > to jstack or any diagnostic command (jcmd
>     VM.info/Thread.print).
>      >     Though
>      >      > application is running fine.
>      >
>      >     That would sound like the attach thread (which would respond
>     to the
>      >     jstack or other diagnostic command) is in some kind of bad state.
>      >
>      >      > I see following stack trace
>      >      >
>      >      > Process 115586 attached
>      >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ?
>     ERESTARTSYS
>      >     (To be
>      >      > restarted if SA_RESTART is set)
>      >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
>      >     si_uid=1000}
>      >      > ---
>      >      > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
>      >      > rt_sigreturn()                          = 202
>      >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ?
>     ERESTARTSYS
>      >     (To be
>      >      > restarted if SA_RESTART is set)
>      >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
>      >     si_uid=1000}
>      >      > ---
>      >      > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
>      >      > rt_sigreturn()                          = 202
>      >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ?
>     ERESTARTSYS
>      >     (To be
>      >      > restarted if SA_RESTART is set)
>      >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
>      >     si_uid=1000}
>      >      > ---
>      >      > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
>      >      > rt_sigreturn()                          = 202
>      >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ?
>     ERESTARTSYS
>      >     (To be
>      >      > restarted if SA_RESTART is set)
>      >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
>      >     si_uid=1000}
>      >      > ---
>      >      > rt_sigreturn()                          = 202
>      >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ?
>     ERESTARTSYS
>      >     (To be
>      >      > restarted if SA_RESTART is set)
>      >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=115587,
>      >     si_uid=1000}
>      >      > ---
>      >      > rt_sigreturn()                          = 202
>      >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL^CProcess 115586
>      >     detached
>      >      >   <detached ...>
>      >      >
>      >      > Can someone help me understand what is happening here?
>      >
>      >     It appears that in responding to the SIGQUIT that is used to
>     trigger
>      >     the
>      >     starting of the attach listener thread, that something is
>     going wrong.
>      >     We appear to be continually restarting an operation that
>     still sees the
>      >     signal pending - which doesn't really make sense to me. Can
>     you get a
>      >     complete stack trace using gdb?
>      >
>      >      > Please redirect me to proper list if this is not correct list
>      >     for these
>      >      > type of questions.
>      >
>      >     This list is fine. It may end up being an issue for
>     serviceability-dev
>      >     but we can deal with that later. :)
>      >
>      >     Thanks,
>      >     David
>      >     -----
>      >
>      >      >
>      >      > TIA
>      >      > Sundar
>      >      >
>      >
> 
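
A note on the "ERESTARTSYS (To be restarted if SA_RESTART is set)" lines in the strace output quoted above: they mark a blocking system call that was interrupted by a signal and is restarted by the kernel once the handler returns. The following is a minimal, self-contained sketch of that mechanism on Linux, not JVM code. The pipe, the SIGALRM timer and the names on_alarm and read_across_signal are assumptions made purely for illustration; the JVM case involves SIGQUIT and a futex wait instead.

```cpp
#include <cassert>
#include <cstddef>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int g_write_fd = -1;
static volatile sig_atomic_t g_handled = 0;

// Async-signal-safe handler: records the signal and makes one byte available
// so that the restarted read() can complete.
static void on_alarm(int) {
  g_handled = 1;
  const char byte = 'x';
  ssize_t ignored = write(g_write_fd, &byte, 1);
  (void)ignored;
}

// Blocks in read() on an empty pipe until SIGALRM arrives and returns what
// read() reported. With SA_RESTART the kernel restarts the interrupted call
// after the handler returns; without it, read() typically fails with EINTR.
ssize_t read_across_signal(bool use_sa_restart) {
  int fds[2];
  if (pipe(fds) != 0) {
    return -2;
  }
  g_write_fd = fds[1];
  g_handled = 0;

  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
  sigaction(SIGALRM, &sa, NULL);

  struct itimerval tv;
  memset(&tv, 0, sizeof(tv));
  tv.it_value.tv_usec = 100 * 1000;  // one-shot timer, 100 ms
  setitimer(ITIMER_REAL, &tv, NULL);

  char buf = 0;
  ssize_t n = read(fds[0], &buf, 1);  // interrupted by SIGALRM while blocked

  close(fds[0]);
  close(fds[1]);
  return n;
}
```

Calling read_across_signal(true) returns 1: the handler runs, and the restarted read() picks up the byte the handler wrote. Seen this way, each SIGQUIT/rt_sigreturn pair followed by the same futex call in the trace is what one signal delivery to a blocked thread looks like.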

From kim.barrett at oracle.com  Wed Nov  6 01:38:09 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 5 Nov 2019 20:38:09 -0500
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
In-Reply-To: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
References: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
Message-ID: <58113575-CE3A-4F52-92F0-05C247F0623A@oracle.com>

> On Nov 5, 2019, at 2:06 AM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> 
> Hi all,
> 
> may I please have reviews for this small build fix:
> 
> Issue: https://bugs.openjdk.java.net/browse/JDK-8233530
> Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8233530-wc14-compat/webrev.00/webrev/
> Prior discussion: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/036726.html
> 
> Thank you,
> 
> Thomas

The "#ifdef __GNUG__" should not be needed or used.

There are a dozen existing uses of PRAGMA_DIAG_PUSH.

While there aren't currently any other direct uses of
PRAGMA_DISABLE_GCC_WARNING, there are indirect uses via other
PRAGMA_DISABLE_xxx macros.

None of those have __GNUX__ protections (for any X).

> On Nov 5, 2019, at 10:22 AM, Lindenmaier, Goetz <goetz.lindenmaier at sap.com> wrote:
> […] Do we still support gcc 4?
> This change might not be working on gcc 4. PRAGMA_DIAG_PUSH is defined 
> empty for gcc < 4.6.  (I think this does not matter.)

gcc < 4.6 doesn't matter.

Minimum supported gcc version according to make/autoconf/toolchains.mk is 4.8.
(I'm not sure I actually believe that :)

gcc7.3 is the earliest version listed for jdk13 here:
https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms
(Though there's no mention here of the BellSoft 32bit linux port, which I possibly
mis-remember as using gcc4.9.)


From david.holmes at oracle.com  Wed Nov  6 01:44:58 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 6 Nov 2019 11:44:58 +1000
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
In-Reply-To: <58113575-CE3A-4F52-92F0-05C247F0623A@oracle.com>
References: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
 <58113575-CE3A-4F52-92F0-05C247F0623A@oracle.com>
Message-ID: <b7710fae-84f7-4e6f-ffd8-d0a2d8c99978@oracle.com>

On 6/11/2019 11:38 am, Kim Barrett wrote:
>> On Nov 5, 2019, at 2:06 AM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>>
>> Hi all,
>>
>> may I please have reviews for this small build fix:
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8233530
>> Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8233530-wc14-compat/webrev.00/webrev/
>> Prior discussion: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/036726.html
>>
>> Thank you,
>>
>> Thomas
> 
> The "#ifdef __GNUG__" should not be needed or used.
> 
> There are a dozen existing uses of PRAGMA_DIAG_PUSH.
> 
> While there aren't currently any other direct uses of
> PRAGMA_DISABLE_GCC_WARNING, there are indirect uses via other
> PRAGMA_DISABLE_xxx macros.
> 
> None of those have __GNUX__ protections (for any X).

Well AFAICS they reside in a gcc specific file: 
compilerWarnings_gcc.hpp. But we also take steps in compilerWarnings.hpp 
to accommodate their use in source code when other compilers are used.

#ifndef PRAGMA_DISABLE_GCC_WARNING
#define PRAGMA_DISABLE_GCC_WARNING(name)
#endif

so I agree we should not need any guard to make this gcc only - unless 
we really do want to control the version.
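
To illustrate why no guard is needed, here is a minimal, self-contained sketch of the pattern. The macro definitions below are simplified assumptions modelled on the idea described above, not copies of compilerWarnings.hpp or compilerWarnings_gcc.hpp, and answer() is a hypothetical function that exists only to give the pragmas something to wrap.

```cpp
#include <cassert>

// Assumed, simplified gcc-specific definitions (illustration only).
#if defined(__GNUC__)
#define PRAGMA_DIAG_PUSH _Pragma("GCC diagnostic push")
#define PRAGMA_DIAG_POP  _Pragma("GCC diagnostic pop")
#define PRAGMA_DISABLE_GCC_WARNING_AUX(x) _Pragma(#x)
#define PRAGMA_DISABLE_GCC_WARNING(option_string) \
  PRAGMA_DISABLE_GCC_WARNING_AUX(GCC diagnostic ignored option_string)
#endif

// Generic fallbacks: any macro the compiler-specific part did not define
// expands to nothing, so shared source can use it without an #ifdef.
#ifndef PRAGMA_DIAG_PUSH
#define PRAGMA_DIAG_PUSH
#endif
#ifndef PRAGMA_DIAG_POP
#define PRAGMA_DIAG_POP
#endif
#ifndef PRAGMA_DISABLE_GCC_WARNING
#define PRAGMA_DISABLE_GCC_WARNING(name)
#endif

// Use site: no compiler guard around the pragmas.
PRAGMA_DIAG_PUSH
PRAGMA_DISABLE_GCC_WARNING("-Wunused-variable")
int answer() {
  int unused = 0;  // would otherwise be reported under -Wunused-variable
  return 42;
}
PRAGMA_DIAG_POP
```

On a compiler that does not define __GNUC__, the three macros expand to nothing and the function compiles unchanged, which is the behaviour the fallback definitions are meant to provide.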

David
-----


>> On Nov 5, 2019, at 10:22 AM, Lindenmaier, Goetz <goetz.lindenmaier at sap.com> wrote:
>> […] Do we still support gcc 4?
>> This change might not be working on gcc 4. PRAGMA_DIAG_PUSH is defined
>> empty for gcc < 4.6.  (I think this does not matter.)
> 
> gcc < 4.6 doesn't matter.
> 
> Minimum supported gcc version according to make/autoconf/toolchains.mk is 4.8.
> (I'm not sure I actually believe that :)
> 
> gcc7.3 is the earliest version listed for jdk13 here:
> https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms
> (Though there's no mention here of the BellSoft 32bit linux port, which I possibly
> mis-remember as using gcc4.9.)
> 

From m.sundar85 at gmail.com  Wed Nov  6 02:05:17 2019
From: m.sundar85 at gmail.com (Sundara Mohan M)
Date: Tue, 5 Nov 2019 18:05:17 -0800
Subject: JVM stuck/looping in futex call
In-Reply-To: <96bb9df0-105a-8b51-b5b1-6f01f1ad8abc@oracle.com>
References: <CACGCMVreB5xu=f1DRh+8KND+nvLVuKGzKTCS1Wv4Qi2nO4LTew@mail.gmail.com>
 <507d7b80-a93a-4e51-4842-8b329beab486@oracle.com>
 <CACGCMVqRdOnjstXgFZ3AGd2Fxo8LBrTeHkC6r7EJZ311-M_a=w@mail.gmail.com>
 <350a6e97-41ee-f49f-0354-ec655d6490da@oracle.com>
 <CACGCMVqQZ66ty164P1QxgFk-QKNhmrh+Y99XQ5vDapA2V+JDxw@mail.gmail.com>
 <96bb9df0-105a-8b51-b5b1-6f01f1ad8abc@oracle.com>
Message-ID: <CACGCMVqztJBMSFP1DdFL_03L0To-M0s-s=fV57=R7Rb8tupz1Q@mail.gmail.com>

glibc-2.12-1.212.el6.x86_64
compat-glibc-2.5-46.2.x86_64
glibc-utils-2.12-1.212.el6.x86_64
glibc-common-2.12-1.212.el6.x86_64
Linux 2.6.32-754.10.1.el6.20190116.16.x86_64 #1 SMP Wed Jan 16 21:27:59 UTC
2019 x86_64 x86_64 x86_64 GNU/Linux (RHEL-6.10)

Thanks
Sundar


On Tue, Nov 5, 2019 at 3:52 PM David Holmes <david.holmes at oracle.com> wrote:

> On 5/11/2019 4:13 pm, Sundara Mohan M wrote:
> > Hi David,
> >      Will try to get stack when it happens.
> > I think the main thread is where the loop happens (in my case it is
> > jetty server).
> > No special environment just jetty server and no native threads are
> > attached. We also try to avoid JNI related stuff as much as possible.
>
> Can you provide full details on OS kernel and glibc versions?
>
> No one else has reported anything like this.
>
> Thanks,
> David
>
> >
> > Thanks
> > Sundar
> >
> > On Mon, Nov 4, 2019 at 3:17 PM David Holmes <david.holmes at oracle.com
> > <mailto:david.holmes at oracle.com>> wrote:
> >
> >     On 5/11/2019 8:43 am, Sundara Mohan M wrote:
> >      > HI David,
> >      >      Did you mean to get stack trace of that process? I could
> >     attach to
> >      > gdb but not sure where to keep breakpoint.
> >      > More info on how to get this will be helpful.
> >
> >     I need to see the stack before we hit the looping call, to see what
> it
> >     is that triggers the loop. Can you tell what thread is involved?
> >
> >     Is there something special/different about your Linux environment? Do
> >     you have native threads attached to the VM?
> >
> >     Thanks,
> >     David
> >
> >      >
> >      > Thanks
> >      > Sundar
> >      >
> >      > On Fri, Nov 1, 2019 at 4:03 PM David Holmes
> >     <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
> >      > <mailto:david.holmes at oracle.com
> >     <mailto:david.holmes at oracle.com>>> wrote:
> >      >
> >      >     Hi Sundar,
> >      >
> >      >     On 2/11/2019 5:39 am, Sundara Mohan M wrote:
> >      >      > Hi,
> >      >      >      I am running openjdk12/Linux on our systems and see
> >     jvm not
> >      >     responding
> >      >      > to jstack or any diagnostic command (jcmd
> >     VM.info/Thread.print).
> >      >     Though
> >      >      > application is running fine.
> >      >
> >      >     That would sound like the attach thread (which would respond
> >     to the
> >      >     jstack or other diagnostic command) is in some kind of bad
> state.
> >      >
> >      >      > I see following stack trace
> >      >      >
> >      >      > Process 115586 attached
> >      >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ?
> >     ERESTARTSYS
> >      >     (To be
> >      >      > restarted if SA_RESTART is set)
> >      >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER,
> si_pid=115587,
> >      >     si_uid=1000}
> >      >      > ---
> >      >      > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
> >      >      > rt_sigreturn()                          = 202
> >      >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ?
> >     ERESTARTSYS
> >      >     (To be
> >      >      > restarted if SA_RESTART is set)
> >      >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER,
> si_pid=115587,
> >      >     si_uid=1000}
> >      >      > ---
> >      >      > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
> >      >      > rt_sigreturn()                          = 202
> >      >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ?
> >     ERESTARTSYS
> >      >     (To be
> >      >      > restarted if SA_RESTART is set)
> >      >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER,
> si_pid=115587,
> >      >     si_uid=1000}
> >      >      > ---
> >      >      > futex(0x7feee8008bf8, FUTEX_WAKE_PRIVATE, 1) = 1
> >      >      > rt_sigreturn()                          = 202
> >      >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ?
> >     ERESTARTSYS
> >      >     (To be
> >      >      > restarted if SA_RESTART is set)
> >      >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER,
> si_pid=115587,
> >      >     si_uid=1000}
> >      >      > ---
> >      >      > rt_sigreturn()                          = 202
> >      >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL) = ?
> >     ERESTARTSYS
> >      >     (To be
> >      >      > restarted if SA_RESTART is set)
> >      >      > --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER,
> si_pid=115587,
> >      >     si_uid=1000}
> >      >      > ---
> >      >      > rt_sigreturn()                          = 202
> >      >      > futex(0x7feeeffa19d0, FUTEX_WAIT, 116198, NULL^CProcess
> 115586
> >      >     detached
> >      >      >   <detached ...>
> >      >      >
> >      >      > Can someone help me understand what is happening here?
> >      >
> >      >     It appears that in responding to the SIGQUIT that is used to
> >     trigger
> >      >     the
> >      >     starting of the attach listener thread, that something is
> >     going wrong.
> >      >     We appear to be continually restarting an operation that
> >     still sees the
> >      >     signal pending - which doesn't really make sense to me. Can
> >     you get a
> >      >     complete stack trace using gdb?
> >      >
> >      >      > Please redirect me to proper list if this is not correct
> list
> >      >     for these
> >      >      > type of questions.
> >      >
> >      >     This list is fine. It may end up being an issue for
> >     serviceability-dev
> >      >     but we can deal with that later. :)
> >      >
> >      >     Thanks,
> >      >     David
> >      >     -----
> >      >
> >      >      >
> >      >      > TIA
> >      >      > Sundar
> >      >      >
> >      >
> >
>

From fujie at loongson.cn  Wed Nov  6 03:07:15 2019
From: fujie at loongson.cn (Jie Fu)
Date: Wed, 6 Nov 2019 11:07:15 +0800
Subject: RFR(trivial): 8233659: [TESTBUG]
 runtime/cds/appcds/CommandLineFlagCombo.java fails when jfr is disabled
Message-ID: <db4ae219-8663-1cd3-3bbe-6668fed4f837@loongson.cn>

Hi all,

May I get reviews for the one-line change?

JBS:    https://bugs.openjdk.java.net/browse/JDK-8233659
Webrev: http://cr.openjdk.java.net/~jiefu/8233659/webrev.00/

Thanks a lot.
Best regards,
Jie


From ioi.lam at oracle.com  Wed Nov  6 03:59:45 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Tue, 5 Nov 2019 19:59:45 -0800
Subject: RFR(trivial): 8233659: [TESTBUG]
 runtime/cds/appcds/CommandLineFlagCombo.java fails when jfr is disabled
In-Reply-To: <db4ae219-8663-1cd3-3bbe-6668fed4f837@loongson.cn>
References: <db4ae219-8663-1cd3-3bbe-6668fed4f837@loongson.cn>
Message-ID: <9e1d7498-3e8a-d40a-a740-93703c01c2f2@oracle.com>

Hi Jie,

I think the better fix is to call WhiteBox.isJFRIncludedInVmBuild() 
inside CommandLineFlagCombo.skipTestCase(). That way you can test other 
flags even when JFR is not included.

Thanks
- Ioi

On 11/5/19 7:07 PM, Jie Fu wrote:
> Hi all,
>
> May I get reviews for the one-line change?
>
> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233659
> Webrev: http://cr.openjdk.java.net/~jiefu/8233659/webrev.00/
>
> Thanks a lot.
> Best regards,
> Jie
>
>


From ioi.lam at oracle.com  Wed Nov  6 04:14:57 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Tue, 5 Nov 2019 20:14:57 -0800
Subject: RFR (S) 8233086 [TESTBUG] need to test field layout style difference
 between CDS dump time and run time
Message-ID: <6c43a4c4-a5df-544a-6472-e921cd94aa56@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8233086
http://cr.openjdk.java.net/~iklam/jdk14/8233086-cds-field-layout-test.v01/

These VM options control how non-static fields are laid out.

    FieldsAllocationStyle
    CompactFields
    EnableContended
    ContendedPaddingWidth
    RestrictContended

It's possible to set different values for these options between CDS
dump time and program run time. As a result, it's possible to have
an archived superclass that has one type of field layout, while a
non-archived subclass has a different type of field layout.

I added a new test case to verify that the VM works properly when this
happens.

Thanks
- Ioi

From david.holmes at oracle.com  Wed Nov  6 05:24:32 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 6 Nov 2019 15:24:32 +1000
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR7/v2.07/10-for-jdk14)
In-Reply-To: <e7e76d46-dd1f-e3f4-3db3-500c03507661@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <58713599-b5ea-57bb-0413-fce3b8bd5d2f@oracle.com>
 <66b55bf0-a194-cecb-a9a1-cea7a952619b@oracle.com>
 <a20e42b4-85f7-29de-4573-76cc477e39a0@oracle.com>
 <32f8e268-7c15-82f6-3b9b-398c33c160cb@oracle.com>
 <f1e071e6-c92d-a6be-7536-df1a30fe7e6b@oracle.com>
 <e7e76d46-dd1f-e3f4-3db3-500c03507661@oracle.com>
Message-ID: <dd3ac131-1066-cdfb-601a-9105136c12ea@oracle.com>

Hi Dan,

Trimming ...

On 6/11/2019 1:36 am, Daniel D. Daugherty wrote:
> 
> And my brain returns to the more fundamental question of why do we have
> OrderAccess::storeload() at L1045 and OrderAccess::fence() at L1225?
> Both sites are trying to separate the release_store(&_owner, NULL) from
> a subsequent load. In the first case:
> 
> 1044       OrderAccess::release_store(&_owner, (void*)NULL); // drop the lock
> 1045       OrderAccess::storeload();                        // See if we need to wake a successor
> <snip>
> 1047     if ((intptr_t(_EntryList)|intptr_t(_cxq)) == 0 || _succ != NULL) {
>
> In the second case:
> 
> 1224     OrderAccess::release_store(&_owner, (void*)NULL);
> 1225     OrderAccess::fence();                               // ST _owner vs LD in unpark()
> <snip>
> 1229   Trigger->unpark();
> 
> The code has been this way for a very long time, but why?

The devil is in the detail and this is a very complex piece of code with 
very complex usage protocols. Generally speaking when you release the 
lock you want a fence() to ensure immediate visibility to other threads 
that the lock is now free. That happens in paths that will complete the 
monitor operation quickly e.g. in ExitEpilog. In other places after 
releasing the lock we may have more work to do - like wake a successor - 
so we only need the storeload() after the release_store to make sure we 
don't read the queue heads until after we've released the lock; the code 
that follows that can then have additional memory sync operations (like 
an atomic op) that implicitly achieve the effects of the fence.
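
As a rough illustration of the two patterns described above, here is a minimal sketch written with standard C++ atomics rather than HotSpot's OrderAccess API. ExitProtocolSketch and its fields are hypothetical stand-ins for ObjectMonitor, not the real class. Standard C++ has no storeload-only barrier, so std::atomic_thread_fence(std::memory_order_seq_cst) is used in both places; the sketch therefore shows where the barrier sits, not the difference in cost between storeload() and fence().

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for ObjectMonitor; field names are illustrative only.
struct ExitProtocolSketch {
  std::atomic<void*> owner{nullptr};
  std::atomic<void*> entry_list{nullptr};  // stands in for _EntryList | _cxq
  std::atomic<void*> succ{nullptr};        // stands in for _succ

  // Slow-exit pattern: drop the lock, then keep the loads of the queue
  // heads from being reordered before the store to owner.
  // Returns true if the exiting thread still has to wake a successor.
  bool release_and_check_for_waiters() {
    owner.store(nullptr, std::memory_order_release);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // store-load barrier
    bool nobody_waiting = entry_list.load(std::memory_order_relaxed) == nullptr;
    bool successor_chosen = succ.load(std::memory_order_relaxed) != nullptr;
    return !(nobody_waiting || successor_chosen);
  }

  // Quick-exit pattern: drop the lock and follow it with a full fence,
  // because no later atomic operation on this path would provide one.
  void release_and_publish() {
    owner.store(nullptr, std::memory_order_release);
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
};
```

In the first function the code that follows the check may itself perform atomic operations, which is the reason given above for not needing a heavier barrier at that site.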

> Of course, this question about the baseline code is still a sidebar.
> An interesting sidebar, but...
> 
> 
> For Async Monitor Deflation, I think we need fence() at both locations
> for proper interaction with the deflater thread.

Possibly. The devil is in the detail of the actual code paths.

>> As per previous discussion I think you still need a release_store of 
>> _owner (at least in the case where you are releasing the monitor).
> 
> I've made a mistake with this encapsulation. I made it look like a
> general setter of a new value. In reality, both callers specify
> new_value == NULL so we don't actually need the new_value parameter.
> 
> I think it needs to be something like this:
> 
>   124 // Clear _owner field; current value must match old_value.
>   125 inline void ObjectMonitor::clear_owner_from(void* old_value) {
>   126   void* prev = _owner;
>   127   ADIM_guarantee(prev == old_value, "unexpected prev owner=" INTPTR_FORMAT
>   128                  ", expected=" INTPTR_FORMAT, p2i(prev), p2i(old_value));
>   129   OrderAccess::release_store(&_owner, (void*)NULL);
>   130   OrderAccess::fence();
>   131   log_trace(monitorinflation, owner)("clear_owner_from(): mid=" INTPTR_FORMAT
>   132                                      ", prev=" INTPTR_FORMAT, p2i(this), p2i(prev));
>   133 }

When factored out like this you are forced to use the heaviest memory 
barrier needed by a given callsite and then apply it to all - ie using 
the fence() always when it might not be needed.

I'm really not a fan of this kind of factoring out in complex lock-free 
code as it hides the details of the memory sync operations. You could 
make this more obvious in the naming e.g.

inline void ObjectMonitor::release_clear_owner_and_fence(void* expected_owner)

or if the fence is not always needed you could parameterise the final 
memory barrier as well.
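
A minimal sketch of what such a parameterised helper might look like, again using standard C++ atomics instead of OrderAccess. OwnerSlot, TrailingBarrier and release_clear_owner are hypothetical names chosen for illustration; none of them come from the webrev.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical: which barrier, if any, follows the release store.
enum class TrailingBarrier { kNone, kFull };

// Hypothetical stand-in for the _owner field of ObjectMonitor.
class OwnerSlot {
 public:
  explicit OwnerSlot(void* owner) : _owner(owner) {}

  // Clear the owner; the current value must match expected_owner.
  // The caller states at the call site which trailing barrier it needs,
  // so the memory-ordering cost stays visible instead of being hidden.
  void release_clear_owner(void* expected_owner, TrailingBarrier barrier) {
    void* prev = _owner.load(std::memory_order_relaxed);
    assert(prev == expected_owner && "unexpected previous owner");
    (void)prev;  // avoid an unused-variable warning when asserts are off
    _owner.store(nullptr, std::memory_order_release);
    if (barrier == TrailingBarrier::kFull) {
      std::atomic_thread_fence(std::memory_order_seq_cst);
    }
  }

  void* owner() const { return _owner.load(std::memory_order_acquire); }

 private:
  std::atomic<void*> _owner;
};
```

A call such as slot.release_clear_owner(self, TrailingBarrier::kFull) then documents the barrier choice where it is made, which is the point of the naming suggestion above.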

> Thanks for sticking with this part of the review.
> 
> 
>> That's it on this thread. I still have to look at version 2.08 in full.
> 
> I look forward to your feedback!

Hopefully tomorrow. Trying to track down a memory corruption issue.

Cheers,
David


From kim.barrett at oracle.com  Wed Nov  6 05:37:44 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 6 Nov 2019 00:37:44 -0500
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
In-Reply-To: <b7710fae-84f7-4e6f-ffd8-d0a2d8c99978@oracle.com>
References: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
 <58113575-CE3A-4F52-92F0-05C247F0623A@oracle.com>
 <b7710fae-84f7-4e6f-ffd8-d0a2d8c99978@oracle.com>
Message-ID: <C5373951-4E14-40D8-8328-370566273940@oracle.com>

> On Nov 5, 2019, at 8:44 PM, David Holmes <david.holmes at oracle.com> wrote:
> 
> On 6/11/2019 11:38 am, Kim Barrett wrote:
>>> On Nov 5, 2019, at 2:06 AM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> may I please have reviews for this small build fix:
>>> 
>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8233530
>>> Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8233530-wc14-compat/webrev.00/webrev/
>>> Prior discussion: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/036726.html
>>> 
>>> Thank you,
>>> 
>>> Thomas
>> The "#ifdef __GNUG__" should not be needed or used.
>> There are a dozen existing uses of PRAGMA_DIAG_PUSH.
>> While there aren't currently any other direct uses of
>> PRAGMA_DISABLE_GCC_WARNING, there are indirect uses via other
>> PRAGMA_DISABLE_xxx macros.
>> None of those have __GNUX__ protections (for any X).
> 
> Well AFAICS they reside in a gcc specific file: compilerWarnings_gcc.hpp. But we also take steps in compilerWarnings.hpp to accommodate their use in source code when other compilers are used.
> 
> #ifndef PRAGMA_DISABLE_GCC_WARNING
> #define PRAGMA_DISABLE_GCC_WARNING(name)
> #endif
> 
> so I agree we should not need any guard to make this gcc only - unless we really do want to control the version.

If we need to control the version it's applied to, we can use PRAGMA_STRINGOP_TRUNCATION_IGNORED as a model.
Or just version-conditionalize at the one place it's needed, for now.  If more -Wc++14-compat issues come up later
then we can introduce a new macro.
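
One possible shape for version-conditionalizing at a single place, as a sketch only. The SKETCH_* macro names and the `__GNUC__ >= 5` threshold are assumptions made for illustration and are not the HotSpot definitions; the function between push and pop is a placeholder standing in for the sized operator delete definitions.

```cpp
#include <cstddef>

// Hypothetical macros. The version threshold below is an assumption.
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 5)
#define SKETCH_PRAGMA(x) _Pragma(#x)
#define SKETCH_DIAG_PUSH SKETCH_PRAGMA(GCC diagnostic push)
#define SKETCH_DIAG_POP SKETCH_PRAGMA(GCC diagnostic pop)
#define SKETCH_IGNORE_CXX14_COMPAT \
  SKETCH_PRAGMA(GCC diagnostic ignored "-Wc++14-compat")
#else
// Other compilers and older gcc: expand to nothing, so the use site
// needs no #ifdef of its own.
#define SKETCH_DIAG_PUSH
#define SKETCH_DIAG_POP
#define SKETCH_IGNORE_CXX14_COMPAT
#endif

SKETCH_DIAG_PUSH
SKETCH_IGNORE_CXX14_COMPAT
// Placeholder for the code that would otherwise trigger the warning.
inline std::size_t sized_release_placeholder(void* p, std::size_t size) {
  (void)p;
  return size;
}
SKETCH_DIAG_POP
```

Because the fallback branch defines the macros as empty, the guarded region compiles unchanged with compilers that do not understand the pragma.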


From fujie at loongson.cn  Wed Nov  6 05:49:10 2019
From: fujie at loongson.cn (Jie Fu)
Date: Wed, 6 Nov 2019 13:49:10 +0800
Subject: RFR(trivial): 8233659: [TESTBUG]
 runtime/cds/appcds/CommandLineFlagCombo.java fails when jfr is disabled
In-Reply-To: <9e1d7498-3e8a-d40a-a740-93703c01c2f2@oracle.com>
References: <db4ae219-8663-1cd3-3bbe-6668fed4f837@loongson.cn>
 <9e1d7498-3e8a-d40a-a740-93703c01c2f2@oracle.com>
Message-ID: <59d63e82-b4c2-195f-0569-e41cef0e3b72@loongson.cn>

Hi Ioi,

Thanks for your review and valuable comments.

Very good idea.
Updated: http://cr.openjdk.java.net/~jiefu/8233659/webrev.01/

Hope you can sponsor it if you are OK with the change.

Thanks a lot.
Best regards,
Jie

On 2019/11/6 11:59 AM, Ioi Lam wrote:
> Hi Jie,
>
> I think the better fix is to call WhiteBox.isJFRIncludedInVmBuild() 
> inside CommandLineFlagCombo.skipTestCase(). That way you can test 
> other flags even when JFR is not included.
>
> Thanks
> - Ioi
>
> On 11/5/19 7:07 PM, Jie Fu wrote:
>> Hi all,
>>
>> May I get reviews for the one-line change?
>>
>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233659
>> Webrev: http://cr.openjdk.java.net/~jiefu/8233659/webrev.00/
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>>
>


From ioi.lam at oracle.com  Wed Nov  6 05:54:19 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Tue, 5 Nov 2019 21:54:19 -0800
Subject: RFR(trivial): 8233659: [TESTBUG]
 runtime/cds/appcds/CommandLineFlagCombo.java fails when jfr is disabled
In-Reply-To: <59d63e82-b4c2-195f-0569-e41cef0e3b72@loongson.cn>
References: <db4ae219-8663-1cd3-3bbe-6668fed4f837@loongson.cn>
 <9e1d7498-3e8a-d40a-a740-93703c01c2f2@oracle.com>
 <59d63e82-b4c2-195f-0569-e41cef0e3b72@loongson.cn>
Message-ID: <9530bd21-8cd2-bbf8-5c5c-80d7dec19ad9@oracle.com>

Looks good. I'll sponsor it.

Thanks
- Ioi

On 11/5/19 9:49 PM, Jie Fu wrote:
> Hi Ioi,
>
> Thanks for your review and valuable comments.
>
> Very good idea.
> Updated: http://cr.openjdk.java.net/~jiefu/8233659/webrev.01/
>
> Hope you can sponsor it if you are OK with the change.
>
> Thanks a lot.
> Best regards,
> Jie
>
> On 2019/11/6 11:59 AM, Ioi Lam wrote:
>> Hi Jie,
>>
>> I think the better fix is to call WhiteBox.isJFRIncludedInVmBuild() 
>> inside CommandLineFlagCombo.skipTestCase(). That way you can test 
>> other flags even when JFR is not included.
>>
>> Thanks
>> - Ioi
>>
>> On 11/5/19 7:07 PM, Jie Fu wrote:
>>> Hi all,
>>>
>>> May I get reviews for the one-line change?
>>>
>>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233659
>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233659/webrev.00/
>>>
>>> Thanks a lot.
>>> Best regards,
>>> Jie
>>>
>>>
>>
>


From fujie at loongson.cn  Wed Nov  6 05:59:03 2019
From: fujie at loongson.cn (Jie Fu)
Date: Wed, 6 Nov 2019 13:59:03 +0800
Subject: RFR(trivial): 8233659: [TESTBUG]
 runtime/cds/appcds/CommandLineFlagCombo.java fails when jfr is disabled
In-Reply-To: <9530bd21-8cd2-bbf8-5c5c-80d7dec19ad9@oracle.com>
References: <db4ae219-8663-1cd3-3bbe-6668fed4f837@loongson.cn>
 <9e1d7498-3e8a-d40a-a740-93703c01c2f2@oracle.com>
 <59d63e82-b4c2-195f-0569-e41cef0e3b72@loongson.cn>
 <9530bd21-8cd2-bbf8-5c5c-80d7dec19ad9@oracle.com>
Message-ID: <abffa174-5fdf-c02f-3565-2a8bc3067786@loongson.cn>

Thank you so much, Ioi.

On 2019/11/6 1:54 PM, Ioi Lam wrote:
> I'll sponsor it.


From fujie at loongson.cn  Wed Nov  6 07:24:13 2019
From: fujie at loongson.cn (Jie Fu)
Date: Wed, 6 Nov 2019 15:24:13 +0800
Subject: RFR(trivial): 8233671: [TESTBUG]
 runtime/cds/appcds/sharedStrings/FlagCombo.java fails to compile without jfr
Message-ID: <b0f4c15f-0345-2054-cb03-050b22e5741b@loongson.cn>

Hi all,

May I get reviews for the one-line change?

JBS:    https://bugs.openjdk.java.net/browse/JDK-8233671
Webrev: http://cr.openjdk.java.net/~jiefu/8233671/webrev.00/

Thanks a lot.
Best regards,
Jie


From markus.gronlund at oracle.com  Wed Nov  6 09:17:11 2019
From: markus.gronlund at oracle.com (Markus Gronlund)
Date: Wed, 6 Nov 2019 01:17:11 -0800 (PST)
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <b55b67c6-3099-8757-5850-1003e0dd1a1b@oss.nttdata.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <d26e15bd-3b6b-f800-7d6f-b8e08c96cad9@oss.nttdata.com>
 <05e465ed-ee58-74c8-c1a5-450415a0b29f@oracle.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
 <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
 <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>
 <5debfd23-ba51-6a37-52cc-476e2813e2fd@oss.nttdata.com>
 <851fa9e8-ddfa-7872-b937-596efae2d6e4@oracle.com>
 <8b3a28c0-3bff-84d2-2b3c-35944ff0bb94@oss.nttdata.com>
 <3b974302-ddd7-49b6-a9e2-aa4f9a8d0b58@default>
 <c98ca024-7ec6-c378-01a7-49476151f428@oss.nttdata.com>
 <f0245ca4-d35a-4752-8fca-0e0e67399338@default>
 <a41eccb7-758b-dad9-5295-cb6d29c27c00@oss.nttdata.com>
 <458e9a65-9d05-f237-9fa5-63ff69515daf@oss.nttdata.com>
 <62fac22d-adc3-0b0f-43ad-801d1ccd3579@oracle.com>
 <b55b67c6-3099-8757-5850-1003e0dd1a1b@oss.nttdata.com>
Message-ID: <09ca29f2-6e63-4fc3-8df9-02271594d3c1@default>

Looks good.

Markus
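
For readers following the quoted thread below, the transition helper under discussion can be sketched roughly as follows. This is a simplified model and not the code in the webrev: ToyThread, ToyThreadState and ToyJavaThreadInVM are hypothetical stand-ins for Thread, JavaThreadState and the proposed JavaThreadInVM, and the state change is modelled as a plain field write with no safepoint check.

```cpp
#include <cassert>

// Hypothetical, reduced model of the thread states that matter here.
enum ToyThreadState { toy_thread_in_native, toy_thread_in_vm };

// Hypothetical stand-in for Thread/JavaThread.
struct ToyThread {
  bool is_java_thread;
  ToyThreadState state;
};

// RAII helper: puts a Java thread into the "in VM" state for the scope
// and restores the previous state only if it actually changed it.
// A null thread or a non-Java thread is left untouched.
class ToyJavaThreadInVM {
 public:
  explicit ToyJavaThreadInVM(ToyThread* t)
      : _thread(nullptr), _saved(toy_thread_in_vm) {
    if (t != nullptr && t->is_java_thread && t->state != toy_thread_in_vm) {
      _thread = t;
      _saved = t->state;
      t->state = toy_thread_in_vm;  // direct write, no transition protocol
    }
  }

  ~ToyJavaThreadInVM() {
    if (_thread != nullptr) {
      _thread->state = _saved;  // undo only what the constructor changed
    }
  }

  bool changed() const { return _thread != nullptr; }

  ToyJavaThreadInVM(const ToyJavaThreadInVM&) = delete;
  ToyJavaThreadInVM& operator=(const ToyJavaThreadInVM&) = delete;

 private:
  ToyThread* _thread;     // non-null only when a restore is required
  ToyThreadState _saved;  // state to restore in the destructor
};
```

Constructing the guard from the result of a current-thread lookup that may return null mirrors the usage discussed in the thread; the guard is then simply a no-op for that case.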

-----Original Message-----
From: Yasumasa Suenaga <suenaga at oss.nttdata.com> 
Sent: den 6 november 2019 00:40
To: David Holmes <david.holmes at oracle.com>; Markus Gronlund <markus.gronlund at oracle.com>
Cc: hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state

Thanks David!
Markus, please review it:

>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.05/


Yasumasa


On 2019/11/06 8:26, David Holmes wrote:
> On 5/11/2019 11:40 pm, Yasumasa Suenaga wrote:
>> On 2019/11/05 22:34, Yasumasa Suenaga wrote:
>>> Hi Markus,
>>>
>>> Thanks for explanation. I tweaked your webrev:
>>>
>>>    http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>
>> Sorry, webrev is here:
>>
>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.05/
> 
> This seems okay to me.
> 
> Thanks,
> David
> 
>> Yasumasa
>>
>>> If you and David are ok, I will push it.
>>>
>>>
>>>> The reason is that Event Streaming changed the architecture a bit where existing data will move to the chunk file more frequently (including artifacts) and it might therefore be possible to create a non-VM internal dump mechanism (bounding it to mainly closing the underlying file and copying chunks).
>>>
>>> I think it is very helpful for troubleshooting!
>>> If crash report shows the location of JFR file or repository, it is more useful.
>>> So I've sent review request for it (JDK-8233373):
>>>
>>> https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000808.html
>>>
>>>
>>> Yasumasa
>>>
>>>
>>> On 2019/11/05 21:54, Markus Gronlund wrote:
>>>> The current dump mechanism is reusing most of the regular logic because it has to perform quite a lot of work to construct a recording. For example, it needs to collect all tagged artifacts in the system (Klass, Method, ClassLoaderData, Symbols, Modules and more) to have the events in an emergency recording file be fully parsable. This is non-trivial requiring a thread to at least be part of the VM, with most thread local data structures preserved.
>>>>
>>>> We should remember that dumping an emergency recording is only a best effort attempt. As an analogy, compare it to other routines in VMError::report() that are conditional and require a non-NULL thread object (print the current Compile Task, print VM Operation or even printing the JavaStack for a thread for example).
>>>>
>>>> With Event Streaming that was recently checked in, it is easier (although not easy, but easier compared to before ) to extend this support to not have the emergency dumper thread do so much internal VM work.
>>>> The reason is that Event Streaming changed the architecture a bit where existing data will move to the chunk file more frequently (including artifacts) and it might therefore be possible to create a non-VM internal dump mechanism (bounding it to mainly closing the underlying file and copying chunks).
>>>>
>>>> It is unclear at this point if the value-add is high enough to warrant the work.
>>>>
>>>> Thanks
>>>> Markus
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>> Sent: den 5 november 2019 12:56
>>>> To: Markus Gronlund <markus.gronlund at oracle.com>; David Holmes 
>>>> <david.holmes at oracle.com>
>>>> Cc: hotspot-jfr-dev at openjdk.java.net; 
>>>> hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover 
>>>> thread state
>>>>
>>>> Hi Markus,
>>>>
>>>>> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().
>>>>
>>>> Do you mean that the dump would not be performed when Thread::current() returns NULL?
>>>> That will happen when a crash occurs in a detached thread [1]. Shouldn't we consider that case?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> [1] 
>>>> https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000822.html
>>>>
>>>>
>>>> On 2019/11/05 19:35, Markus Gronlund wrote:
>>>>> Hi again,
>>>>>
>>>>> The comments in my previous email still apply:
>>>>>
>>>>> "...although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already. I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change."
>>>>>
>>>>> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().
>>>>> 2. You don't need to restore _thread_in_vm back to threads already running in the correct state. The purpose of the transition helper class is to move Java threads not running in _thread_in_vm (i.e. will be _thread_in_native). Move this logic to the fore to better clarify the intent of the helper class.
>>>>>
>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>
>>>>> Thanks
>>>>> Markus
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>> Sent: den 5 november 2019 07:36
>>>>> To: David Holmes <david.holmes at oracle.com>; Markus Gronlund 
>>>>> <markus.gronlund at oracle.com>
>>>>> Cc: hotspot-jfr-dev at openjdk.java.net; 
>>>>> hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover 
>>>>> thread state
>>>>>
>>>>> Thanks David!
>>>>> I wait for Markus's review.
>>>>>
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2019/11/05 15:19, David Holmes wrote:
>>>>>> On 5/11/2019 4:08 pm, Yasumasa Suenaga wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Sorry, I was confused :)
>>>>>>> This is new webrev. Could you check again?
>>>>>>>
>>>>>>>      http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.04/
>>>>>>
>>>>>> Okay structurally I'm fine with that.
>>>>>>
>>>>>> What I don't know, and will leave to Markus to determine, is whether the rest of the code in on_vm_shutdown can actually execute okay if there is no current thread.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2019/11/05 14:56, David Holmes wrote:
>>>>>>>> On 5/11/2019 3:48 pm, Yasumasa Suenaga wrote:
>>>>>>>>> On 2019/11/05 14:34, David Holmes wrote:
>>>>>>>>>> On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>> Hi David,
>>>>>>>>>>>
>>>>>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>>>>>
>>>>>>>>>>> prepare_for_emergency_dump() returns false if some critical locks could not unlock.
>>>>>>>>>>> So what should we return if NULL is passed as the argument? true?
>>>>>>>>>>
>>>>>>>>>> But you're not calling prepare_for_emergency_dump when the thread is NULL:
>>>>>>>>>>
>>>>>>>>>>     454   Thread* thread = Thread::current_or_null_safe();
>>>>>>>>>>     455
>>>>>>>>>>     456   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>     457   JavaThreadInVM jtivm(thread);
>>>>>>>>>>     458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
>>>>>>>>>>     459     return;
>>>>>>>>>>     460   }
>>>>>>>>>
>>>>>>>>> Oh, sorry, I have a mistake!
>>>>>>>>> I want to change as below:
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> +  Thread* thread = Thread::current_or_null_safe();
>>>>>>>>> +
>>>>>>>>> +  // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>> +  JavaThreadInVM jtivm(thread);
>>>>>>>>> +  if (thread != NULL) {
>>>>>>>>> +    if (!prepare_for_emergency_dump()) {
>>>>>>>>> +      return;
>>>>>>>>> +    }
>>>>>>>>> +  }
>>>>>>>>> ```
>>>>>>>>
>>>>>>>> but that is the same logic ??
>>>>>>>>
>>>>>>>>>> All I'm saying is that you pass "thread" as a parameter so you can then delete the existing call to Thread::current() that is inside prepare_for_emergency_dump.
>>>>>>>>>
>>>>>>>>> Is it preferable to pass "thread" to prepare_for_emergency_dump() instead of the above?
>>>>>>>>
>>>>>>>> ??? The two are not related. If you've already obtained the current thread you can pass it to prepare_for_emergency_dump and avoid the need to call Thread:current() (in whatever form) again. How you handle a NULL current thread is independent of that.
>>>>>>>>
>>>>>>>>> If so, I will push new changeset to submit repo, and will send new review request.
>>>>>>>>
>>>>>>>> I'd send the review request first and get agreement before wasting time on the submit repo.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>> It might break semantics of this function, so I did not add argument to prepare_for_emergency_dump() in this webrev.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2019/11/05 13:56, David Holmes wrote:
>>>>>>>>>>>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>> On 2019/11/05 9:17, David Holmes wrote:
>>>>>>>>>>>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I thought similar change, and it is running on submit repo:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>      http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If it passes all tests, I will send review request again.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This change passed all tests on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>>>>>>>>>>>> Could you review again?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In Markus's change, the emergency dump will not be performed when Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>>>>>> However, that will occur when a coredump signal (e.g. SIGSEGV) is sent to the PID by the `kill` command - the main thread of the process will already be detached (out of the JVM).
>>>>>>>>>>>>>>> Also, the crash might happen in a native thread - created by pthread_create (on Linux) from JNI code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thus we should continue to perform emergency dump even if Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I didn't quite follow all that, but if there is no current thread then prepare_for_emergency_dump() is either going to assert here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     348   Thread* const thread = Thread::current();
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> or crash here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     349   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks David!
>>>>>>>>>>>>> I fixed it in new webrev:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>>>>>>>>>>>
>>>>>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>> It works fine on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>>>>>>>>>>>> Hi Yasumasa and David,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. _thread_in_java transition to _thread_in_vm via stubs in SharedRuntime (I believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument now to give the premises where this invariant is enforced, so let's work with the original thread state as you suggested Yasumasa.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Perhaps we can do it like this?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for your patience investigating this
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>> From: David Holmes
>>>>>>>>>>>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>>>>>>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus 
>>>>>>>>>>>>>>>>> Gronlund <markus.gronlund at oracle.com>; 
>>>>>>>>>>>>>>>>> hotspot-jfr-dev at openjdk.java.net; 
>>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>>> Cc: yasuenag at gmail.com
>>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do 
>>>>>>>>>>>>>>>>> not recover thread state
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don't think it is worth the disruption to pass the 
>>>>>>>>>>>>>>>>> "thread" all the way through these API's. It is 
>>>>>>>>>>>>>>>>> simpler/cleaner to just call
>>>>>>>>>>>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>>>>>>>>>>>>       (((JavaThread*)thread)->thread_state() == _thread_in_vm), "invariant");
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>>>>>>>>>>>> VMError::report_and_die() there is AFAICS absolutely no guarantee that we need be in a JavaThread at all.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>      428 class ThreadInVMForJFR : public StackObj {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> if (!guard_reentrancy()) {
>>>>>>>>>>>>>>>>>     return;
>>>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>>>>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>>>>>>>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>>>>>>>>>>>>>       return;
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>>>>>>>>>>>> Correction ...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>>>>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> That is more like I had envisaged for this. 
>>>>>>>>>>>>>>>>>>>>> Reusing existing thread-state transition code is 
>>>>>>>>>>>>>>>>>>>>> preferable to adding more custom code that directly manipulates thread-state.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in
>>>>>>>>>>>>>>>>>>>> its arguments, so Thread::current() might differ from it.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Not sure what you mean. You only ever manipulate the 
>>>>>>>>>>>>>>>>>>> thread state of the current thread.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In addition, ThreadInVMfromUnknown uses
>>>>>>>>>>>>>>>>>>>> transition_from_native() to change the thread state.
>>>>>>>>>>>>>>>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes it does - which would be a problem if a 
>>>>>>>>>>>>>>>>>>> safepoint (or
>>>>>>>>>>>>>>>>>>> handshake) were pending. But the path through 
>>>>>>>>>>>>>>>>>>> before_exit already has safepoint checks when you acquire the BeforeExit_lock.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> But that isn't relevant. The issue is we don't want a 
>>>>>>>>>>>>>>>>>> safepoint check on the report_and_die() path. So a 
>>>>>>>>>>>>>>>>>> custom transition helper is needed to avoid that.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The main problem with the suggestion is it seems we 
>>>>>>>>>>>>>>>>>>> may not be running in a JavaThread:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     349   Thread* const thread = Thread::current();
>>>>>>>>>>>>>>>>>>>     350   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> so we can't use the existing thread-state helpers, 
>>>>>>>>>>>>>>>>>>> unless we narrow the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thus I added ThreadInVMForJFR to my new webrev.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net; 
>>>>>>>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>; 
>>>>>>>>>>>>>>>>>>>>>> yasuenag at gmail.com; Markus Gronlund 
>>>>>>>>>>>>>>>>>>>>>> <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency 
>>>>>>>>>>>>>>>>>>>>>> dump do not recover thread state
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>>>>>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This change passed all tests on submit repo 
>>>>>>>>>>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>>>>>>>>>>>>>>>> But it is unclear to me why the thread state is not recovered.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>>>>>>>>>>>> If it is just a bug in JFR, I will create a
>>>>>>>>>>>>>>>>>>>>>>> patch which recovers it in JFR code.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not 
>>>>>>>>>>>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>    JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>>>>>>>>>>>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> If JFR is running when the JVM crashes, JFR will
>>>>>>>>>>>>>>>>>>>>>>> dump data to hs_err_pid<PID>.jfr.
>>>>>>>>>>>>>>>>>>>>>>> The dump goes through prepare_for_emergency_dump().
>>>>>>>>>>>>>>>>>>>>>>> However, this function transitions the thread state to "_thread_in_vm".
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> This change has been tested on submit repo as 
>>>>>>>>>>>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>>>>>>>>>>>> It failed at
>>>>>>>>>>>>>>>>>>>>>>> compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>>>>>>>>>>>> However this test is for JIT compiler, and 
>>>>>>>>>>>>>>>>>>>>>>> related issue has been reported as JDK-8225620.
>>>>>>>>>>>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
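
The scoped transition helper discussed in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: SketchThread, SketchState, SketchJavaThreadInVM, sketch_prepare and sketch_on_vm_shutdown are hypothetical stand-ins, not the HotSpot declarations and not the code in any of the webrevs. The sketch shows three points raised above: the state is changed by a plain store with no safepoint check, it is restored only if it was actually changed, and the current thread is obtained once and passed down. Bailing out on a NULL current thread is just one possible policy.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot's thread types and states.
// These are NOT the real declarations; they only model the idea.
enum SketchState { sketch_in_native, sketch_in_java, sketch_in_vm };

struct SketchThread {
  bool is_java_thread;  // false models a watcher or other non-Java thread
  SketchState state;    // only meaningful for Java threads
};

// Scoped helper in the spirit of the proposed JavaThreadInVM:
// a plain state store with no safepoint check, undone on scope exit,
// and only when the constructor really changed something.
class SketchJavaThreadInVM {
 public:
  explicit SketchJavaThreadInVM(SketchThread* t)
      : _thread(t), _original(sketch_in_vm), _changed(false) {
    // A NULL thread or a non-Java thread is left untouched.
    if (t != nullptr && t->is_java_thread && t->state != sketch_in_vm) {
      _original = t->state;
      t->state = sketch_in_vm;
      _changed = true;
    }
  }
  ~SketchJavaThreadInVM() {
    if (_changed) {
      assert(_thread != nullptr);
      _thread->state = _original;  // restore only after an actual change
    }
  }
  SketchJavaThreadInVM(const SketchJavaThreadInVM&) = delete;
  SketchJavaThreadInVM& operator=(const SketchJavaThreadInVM&) = delete;

 private:
  SketchThread* const _thread;
  SketchState _original;
  bool _changed;
};

// Stand-in for the preparation step. It takes the already-obtained thread,
// so it does not look the thread up again, and reports the state it saw.
static bool sketch_prepare(SketchThread* current, SketchState* seen) {
  if (current->is_java_thread) {
    *seen = current->state;
  }
  return true;
}

// Stand-in for the shutdown hook. Returns false when the dump was skipped.
bool sketch_on_vm_shutdown(SketchThread* current, SketchState* seen) {
  if (current == nullptr) {
    return false;  // one possible policy for "no current thread": bail out
  }
  SketchJavaThreadInVM guard(current);
  return sketch_prepare(current, seen);
}
```

With this shape, a Java thread that enters in the native state is seen as in-VM by the preparation step and is back in the native state once the hook returns, while a non-Java thread is never modified.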

From markus.gronlund at oracle.com  Wed Nov  6 09:19:39 2019
From: markus.gronlund at oracle.com (Markus Gronlund)
Date: Wed, 6 Nov 2019 01:19:39 -0800 (PST)
Subject: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <CAA-vtUwwJJtJsc1Um0ruorZs34=zKJdWat+02-fmpkajzAfV3Q@mail.gmail.com>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <CAA-vtUwwJJtJsc1Um0ruorZs34=zKJdWat+02-fmpkajzAfV3Q@mail.gmail.com>
Message-ID: <f72e8b0f-1004-48cc-a514-06221b4fc5f9@default>

Hi Thomas,

Thanks for bringing this to attention. I agree with you that it should move to a better location to have minimal impact on error reporting.

Thanks again
Markus

-----Original Message-----
From: Thomas Stüfe <thomas.stuefe at gmail.com>
Sent: den 1 november 2019 11:37
To: Yasumasa Suenaga <suenaga at oss.nttdata.com>
Cc: hotspot-jfr-dev at openjdk.java.net; yasuenag at gmail.com; Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>
Subject: Re: RFR: 8233375: JFR emergency dump do not recover thread state

Hi Yasumasa,

I see that we call JFR::on_vm_shutdown() before error reporting has run. Is that really necessary? Error reporting should happen as close as possible to the error point - ideally, as little code as possible should run between the crash/assert and the generation of the hs-err file. I suggest moving the call to JFR::on_vm_shutdown() down to a point after error reporting, e.g. to where we print the NMT report on shutdown.

Cheers, Thomas


On Fri, Nov 1, 2019 at 10:41 AM Yasumasa Suenaga <suenaga at oss.nttdata.com>
wrote:

> Forward to hotspot-runtime-dev.
>
> As David commented in JBS, it may need to be fixed in JFR code.
> But it is unclear to me why the thread state is not recovered.
>
> I'd like to hear about this from JFR folks.
> If it is just a bug in JFR, I will create a patch which recovers it in
> JFR code.
>
>
> Thanks,
>
> Yasumasa
>
>
> -------- Forwarded Message --------
> Subject: RFR: 8233375: JFR emergency dump do not recover thread state
> Date: Fri, 1 Nov 2019 17:08:42 +0900
> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
> To: hotspot-jfr-dev at openjdk.java.net
> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>
> Hi all,
>
> Please review this change:
>
>    JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>
> If JFR is running when the JVM crashes, JFR will dump data to
> hs_err_pid<PID>.jfr.
> The dump goes through prepare_for_emergency_dump().
> However, this function transitions the thread state to "_thread_in_vm".
>
> This change has been tested on submit repo as 
> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
> It failed at compiler/types/correctness/CorrectnessTest.java
> However this test is for JIT compiler, and related issue has been 
> reported as JDK-8225620.
> So I think this patch can go through.
>
>
> Thanks,
>
> Yasumasa
>

From shade at redhat.com  Wed Nov  6 11:33:15 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 6 Nov 2019 12:33:15 +0100
Subject: RFR (XS) 8233698: GCC 4.8.5 build failure after JDK-8233530
Message-ID: <14ff7e37-c7a1-9c74-584d-f00c7816696c@redhat.com>

Bug:
  https://bugs.openjdk.java.net/browse/JDK-8233698

Our current RHEL-based CIs fail to compile jdk/jdk. That C++14 compat is the gift that keeps on
giving! The fix is to get even deeper into the warning disabling story:

diff -r bb2a436e616c src/hotspot/share/memory/operator_new.cpp
--- a/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 13:43:25 2019 +0800
+++ b/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 12:31:23 2019 +0100
@@ -89,11 +89,13 @@
   fatal("Should not call global delete []");
 }

 #ifdef __GNUG__
 // Warning disabled for gcc 5.4
+// Warning for unknown warning disabled for gcc 4.8.5
 PRAGMA_DIAG_PUSH
+PRAGMA_DISABLE_GCC_WARNING("-Wpragmas")
 PRAGMA_DISABLE_GCC_WARNING("-Wc++14-compat")
 #endif // __GNUG__

 void operator delete(void* p, size_t size) throw() {
   fatal("Should not call global sized delete");

Testing: gcc 4.8.5 build
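
The ordering in the patch above can be illustrated with raw pragmas. This is a minimal sketch, assuming that the HotSpot macros boil down to `#pragma GCC diagnostic` directives; sketch_sized_release and sketch_last_size are hypothetical names used only for illustration. An older GCC that does not recognize a warning name reports the unknown name itself under -Wpragmas, so -Wpragmas is ignored first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#if defined(__GNUC__)
#pragma GCC diagnostic push
// Silence the complaint about an unknown warning name first, so that
// the next line is harmless on compilers that do not know the name.
#pragma GCC diagnostic ignored "-Wpragmas"
#pragma GCC diagnostic ignored "-Wc++14-compat"
#endif

// Hypothetical helper, not part of HotSpot: frees a block and records the
// size it was told about, standing in for the code the pragmas protect.
static std::size_t sketch_last_size = 0;

void sketch_sized_release(void* p, std::size_t size) {
  // A null block is only expected together with a zero size.
  assert(p != nullptr || size == 0);
  sketch_last_size = size;
  std::free(p);
}

#if defined(__GNUC__)
// Restore the diagnostic state that was active before the push.
#pragma GCC diagnostic pop
#endif
```

The push/pop pair keeps both suppressions local to the guarded definitions, so later code in the same file still gets the usual diagnostics.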

-- 
Thanks,
-Aleksey


From thomas.stuefe at gmail.com  Wed Nov  6 11:55:36 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Wed, 6 Nov 2019 12:55:36 +0100
Subject: RFR (XS) 8233698: GCC 4.8.5 build failure after JDK-8233530
In-Reply-To: <14ff7e37-c7a1-9c74-584d-f00c7816696c@redhat.com>
References: <14ff7e37-c7a1-9c74-584d-f00c7816696c@redhat.com>
Message-ID: <CAA-vtUx_ncGkebuiVXK-VD6V+qPPg3q3gDZD9k8N3Lmb2JOA7g@mail.gmail.com>

:(

Thanks for fixing.

Looks good and trivial. I will keep out of the discussion of whether or not
we should support gcc < 5. Personally, I think supporting it makes sense as
long as it does not hurt too much.

Thomas


On Wed, Nov 6, 2019 at 12:34 PM Aleksey Shipilev <shade at redhat.com> wrote:

> Bug:
>   https://bugs.openjdk.java.net/browse/JDK-8233698
>
> Our current RHEL-based CIs fail to compile jdk/jdk. That C++14 compat is the
> gift that keeps on giving! The fix is to get even deeper into the warning
> disabling story:
>
> diff -r bb2a436e616c src/hotspot/share/memory/operator_new.cpp
> --- a/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 13:43:25 2019 +0800
> +++ b/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 12:31:23 2019 +0100
> @@ -89,11 +89,13 @@
>    fatal("Should not call global delete []");
>  }
>
>  #ifdef __GNUG__
>  // Warning disabled for gcc 5.4
> +// Warning for unknown warning disabled for gcc 4.8.5
>  PRAGMA_DIAG_PUSH
> +PRAGMA_DISABLE_GCC_WARNING("-Wpragmas")
>  PRAGMA_DISABLE_GCC_WARNING("-Wc++14-compat")
>  #endif // __GNUG__
>
>  void operator delete(void* p, size_t size) throw() {
>    fatal("Should not call global sized delete");
>
> Testing: gcc 4.8.5 build
>
> --
> Thanks,
> -Aleksey
>
>

From suenaga at oss.nttdata.com  Wed Nov  6 12:25:50 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Wed, 6 Nov 2019 21:25:50 +0900
Subject: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <09ca29f2-6e63-4fc3-8df9-02271594d3c1@default>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <34be18ce-c8ca-66e4-fcd0-51c5b12b65a2@oss.nttdata.com>
 <e5127d51-9164-ecbc-6c6b-6c5c6cced24f@oracle.com>
 <37396c04-694f-c228-1102-594425bee67f@oss.nttdata.com>
 <912210bb-e714-d188-1032-542a132eb4cd@oracle.com>
 <014cdc04-818d-45b2-798e-62ead95b6ecc@oss.nttdata.com>
 <fc05ae1b-5dca-b3b9-ff07-7eacb701fad8@oracle.com>
 <5debfd23-ba51-6a37-52cc-476e2813e2fd@oss.nttdata.com>
 <851fa9e8-ddfa-7872-b937-596efae2d6e4@oracle.com>
 <8b3a28c0-3bff-84d2-2b3c-35944ff0bb94@oss.nttdata.com>
 <3b974302-ddd7-49b6-a9e2-aa4f9a8d0b58@default>
 <c98ca024-7ec6-c378-01a7-49476151f428@oss.nttdata.com>
 <f0245ca4-d35a-4752-8fca-0e0e67399338@default>
 <a41eccb7-758b-dad9-5295-cb6d29c27c00@oss.nttdata.com>
 <458e9a65-9d05-f237-9fa5-63ff69515daf@oss.nttdata.com>
 <62fac22d-adc3-0b0f-43ad-801d1ccd3579@oracle.com>
 <b55b67c6-3099-8757-5850-1003e0dd1a1b@oss.nttdata.com>
 <09ca29f2-6e63-4fc3-8df9-02271594d3c1@default>
Message-ID: <c046a60f-d820-634c-319a-b9c322f641d0@oss.nttdata.com>

Thanks Markus!

Yasumasa

On 2019/11/06 18:17, Markus Gronlund wrote:
> Looks good.
> 
> Markus
> 
> -----Original Message-----
> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
> Sent: den 6 november 2019 00:40
> To: David Holmes <david.holmes at oracle.com>; Markus Gronlund <markus.gronlund at oracle.com>
> Cc: hotspot-jfr-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover thread state
> 
> Thanks David!
> Markus, please review it:
> 
>>>     http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.05/
> 
> 
> Yasumasa
> 
> 
> On 2019/11/06 8:26, David Holmes wrote:
>> On 5/11/2019 11:40 pm, Yasumasa Suenaga wrote:
>>> On 2019/11/05 22:34, Yasumasa Suenaga wrote:
>>>> Hi Markus,
>>>>
>>>> Thanks for explanation. I tweaked your webrev:
>>>>
>>>>    http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>
>>> Sorry, the webrev is here:
>>>
>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.05/
>>
>> This seems okay to me.
>>
>> Thanks,
>> David
>>
>>> Yasumasa
>>>
>>>> If you and David are ok, I will push it.
>>>>
>>>>
>>>>> The reason is that Event Streaming changed the architecture a bit where existing data will move to the chunk file more frequently (including artifacts) and it might therefore be possible to create a non-VM internal dump mechanism (bounding it to mainly closing the underlying file and copying chunks).
>>>>
>>>> I think it is very helpful for troubleshooting!
>>>> If crash report shows the location of JFR file or repository, it is more useful.
>>>> So I've sent review request for it (JDK-8233373):
>>>>
>>>> https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000808.html
>>>>
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2019/11/05 21:54, Markus Gronlund wrote:
>>>>> The current dump mechanism is reusing most of the regular logic because it has to perform quite a lot of work to construct a recording. For example, it needs to collect all tagged artifacts in the system (Klass, Method, ClassLoaderData, Symbols, Modules and more) to have the events in an emergency recording file be fully parsable. This is non-trivial requiring a thread to at least be part of the VM, with most thread local data structures preserved.
>>>>>
>>>>> We should remember that dumping an emergency recording is only a best effort attempt. As an analogy, compare it to other routines in VMError::report() that are conditional and require a non-NULL thread object (printing the current Compile Task, printing the VM Operation, or even printing the Java stack for a thread, for example).
>>>>>
>>>>> With Event Streaming that was recently checked in, it is easier (although not easy, but easier compared to before) to extend this support to not have the emergency dumper thread do so much internal VM work.
>>>>> The reason is that Event Streaming changed the architecture a bit where existing data will move to the chunk file more frequently (including artifacts) and it might therefore be possible to create a non-VM internal dump mechanism (bounding it to mainly closing the underlying file and copying chunks).
>>>>>
>>>>> It is unclear at this point if the value-add is high enough to warrant the work.
>>>>>
>>>>> Thanks
>>>>> Markus
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>> Sent: den 5 november 2019 12:56
>>>>> To: Markus Gronlund <markus.gronlund at oracle.com>; David Holmes
>>>>> <david.holmes at oracle.com>
>>>>> Cc: hotspot-jfr-dev at openjdk.java.net;
>>>>> hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>>>>> thread state
>>>>>
>>>>> Hi Markus,
>>>>>
>>>>>> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().
>>>>>
>>>>> Do you mean that the dump would not be performed when Thread::current() returns NULL?
>>>>> That will happen when a crash occurs in a detached thread [1]. Shouldn't we consider that case?
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> [1]
>>>>> https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2019-November/000822.html
>>>>>
>>>>>
>>>>> On 2019/11/05 19:35, Markus Gronlund wrote:
>>>>>> Hi again,
>>>>>>
>>>>>> The comments in my previous email still apply:
>>>>>>
>>>>>> "...although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already. I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change."
>>>>>>
>>>>>> 1. An invalid thread local will give immediate problems downstream, for example see JfrRecorderService::prepare_for_vm_error_rotation().
>>>>>> 2. You don't need to restore _thread_in_vm back to threads already running in the correct state. The purpose of the transition helper class is to move Java threads not running in _thread_in_vm (i.e. will be _thread_in_native). Move this logic to the fore to better clarify the intent of the helper class.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>
>>>>>> Thanks
>>>>>> Markus
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>> Sent: den 5 november 2019 07:36
>>>>>> To: David Holmes <david.holmes at oracle.com>; Markus Gronlund
>>>>>> <markus.gronlund at oracle.com>
>>>>>> Cc: hotspot-jfr-dev at openjdk.java.net;
>>>>>> hotspot-runtime-dev at openjdk.java.net; yasuenag at gmail.com
>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do not recover
>>>>>> thread state
>>>>>>
>>>>>> Thanks David!
>>>>>> I will wait for Markus's review.
>>>>>>
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2019/11/05 15:19, David Holmes wrote:
>>>>>>> On 5/11/2019 4:08 pm, Yasumasa Suenaga wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> Sorry, I was confused :)
>>>>>>>> This is new webrev. Could you check again?
>>>>>>>>
>>>>>>>>     http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.04/
>>>>>>>
>>>>>>> Okay structurally I'm fine with that.
>>>>>>>
>>>>>>> What I don't know, and will leave to Markus to determine, is whether the rest of the code in on_vm_shutdown can actually execute okay if there is no current thread.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2019/11/05 14:56, David Holmes wrote:
>>>>>>>>> On 5/11/2019 3:48 pm, Yasumasa Suenaga wrote:
>>>>>>>>>> On 2019/11/05 14:34, David Holmes wrote:
>>>>>>>>>>> On 5/11/2019 3:13 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>
>>>>>>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>>>>>>
>>>>>>>>>>>> prepare_for_emergency_dump() returns false if some critical locks could not be unlocked.
>>>>>>>>>>>> So what should we return if NULL is passed as the argument? true?
>>>>>>>>>>>
>>>>>>>>>>> But you're not calling prepare_for_emergency_dump when the thread is NULL:
>>>>>>>>>>>
>>>>>>>>>>>     454   Thread* thread = Thread::current_or_null_safe();
>>>>>>>>>>>     455
>>>>>>>>>>>     456   // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>>     457   JavaThreadInVM jtivm(thread);
>>>>>>>>>>>     458   if ((thread != NULL) && !prepare_for_emergency_dump()) {
>>>>>>>>>>>     459     return;
>>>>>>>>>>>     460   }
>>>>>>>>>>
>>>>>>>>>> Oh, sorry, I made a mistake!
>>>>>>>>>> I want to change it as below:
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> +  Thread* thread = Thread::current_or_null_safe();
>>>>>>>>>> +
>>>>>>>>>> +  // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>> +  JavaThreadInVM jtivm(thread);
>>>>>>>>>> +  if (thread != NULL) {
>>>>>>>>>> +    if (!prepare_for_emergency_dump()) {
>>>>>>>>>> +      return;
>>>>>>>>>> +    }
>>>>>>>>>> +  }
>>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> but that is the same logic ??
>>>>>>>>>
>>>>>>>>>>> All I'm saying is that you pass "thread" as a parameter so you can then delete the existing call to Thread::current() that is inside prepare_for_emergency_dump.
>>>>>>>>>>
>>>>>>>>>> Is it preferable to pass "thread" to prepare_for_emergency_dump() instead of the above?
>>>>>>>>>
>>>>>>>>> ??? The two are not related. If you've already obtained the current thread you can pass it to prepare_for_emergency_dump and avoid the need to call Thread::current() (in whatever form) again. How you handle a NULL current thread is independent of that.
>>>>>>>>>
>>>>>>>>>> If so, I will push new changeset to submit repo, and will send new review request.
>>>>>>>>>
>>>>>>>>> I'd send the review request first and get agreement before wasting time on the submit repo.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>> It might break semantics of this function, so I did not add argument to prepare_for_emergency_dump() in this webrev.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2019/11/05 13:56, David Holmes wrote:
>>>>>>>>>>>>> On 5/11/2019 1:56 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>> On 2019/11/05 9:17, David Holmes wrote:
>>>>>>>>>>>>>>> On 5/11/2019 9:56 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>> On 2019/11/04 22:43, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>> Hi Markus,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I thought of a similar change, and it is running on the submit repo:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/submit/rev/76703c4ec1ea
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If it passes all tests, I will send review request again.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This change passed all tests on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191104-1317-6405064).
>>>>>>>>>>>>>>>> Could you review again?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.02/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In Markus's change, the emergency dump will not be performed when Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>>>>>>> However, that will occur when a core-dumping signal (e.g. SIGSEGV) is sent to the PID with the `kill` command - the main thread of the process will already be detached (out of the JVM).
>>>>>>>>>>>>>>>> Also, the crash might happen in a native thread created by pthread_create (on Linux) from JNI code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thus we should continue to perform emergency dump even if Thread::current_or_null_safe() returns NULL.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I didn't quite follow all that, but if there is no current thread then prepare_for_emergency_dump() is either going to assert here:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     348   Thread* const thread = Thread::current();
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> or crash here:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     349   if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks David!
>>>>>>>>>>>>>> I fixed it in new webrev:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.03/
>>>>>>>>>>>>>
>>>>>>>>>>>>> It would be cleaner/simpler if prepare_for_emergency_dump takes the thread argument. As it is just a static function that doesn't impact anything else.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It works fine on submit repo (mach5-one-ysuenaga-JDK-8233375-2-20191105-0117-6422777).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 2019/11/04 22:23, Markus Gronlund wrote:
>>>>>>>>>>>>>>>>>> Hi Yasumasa and David,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Apologies about the suggestion to use ThreadInVMFromUnknown. I realized later that, as you have pointed out, it would perform a real thread transition. Sorry.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Taking some input from ThreadInVMFromUnknown and the situations I have seen at this location, I think the only case we need to be concerned about here is when a JavaThread is _thread_in_native. A thread that is _thread_in_java transitions to _thread_in_vm via stubs in SharedRuntime (I believe) as part of coming out of the exception handler(s). Unfortunately I cannot give a proper argument now for where this invariant is enforced, so let's work with the original thread state as you suggested, Yasumasa.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If we can avoid passing the thread all the way through, I think that is preferable (this is not performance critical code). David also alluded to the fact that you always manipulate the current thread anyway. Although very unlikely, we could have run into an issue with thread local storage, so it makes sense to test this up front. If we cannot read the thread local, the operations we intend to perform will fail, so we might just bail out already.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I took the liberty to tighten up the transition class a little bit; you only need to restore the thread state if there was an actual change.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Perhaps we can do it like this?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev01/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for your patience investigating this
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>> From: David Holmes
>>>>>>>>>>>>>>>>>> Sent: den 4 november 2019 05:24
>>>>>>>>>>>>>>>>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Markus
>>>>>>>>>>>>>>>>>> Gronlund <markus.gronlund at oracle.com>;
>>>>>>>>>>>>>>>>>> hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>> Cc: yasuenag at gmail.com
>>>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency dump do
>>>>>>>>>>>>>>>>>> not recover thread state
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So looking at Yasumasa's proposed fix ...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I don't think it is worth the disruption to pass the
>>>>>>>>>>>>>>>>>> "thread" all the way through these API's. It is
>>>>>>>>>>>>>>>>>> simpler/cleaner to just call
>>>>>>>>>>>>>>>>>> Thread::current_or_null_safe() when you need the current thread.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 357   assert(thread->is_Java_thread() &&
>>>>>>>>>>>>>>>>>> (((JavaThread*)thread)->thread_state() ==
>>>>>>>>>>>>>>>>>> _thread_in_vm), "invariant");
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This assertion is incorrect. As this can be called via
>>>>>>>>>>>>>>>>>> VMError::report_and_die() there is AFAICS absolutely no guarantee that we will be in a JavaThread at all.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     428 class ThreadInVMForJFR : public StackObj {
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Can I suggest JavaThreadInVM to make it clear this only affects JavaThreads. And as it is local we don't need the "forJFR" part.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Based on Markus's proposed change, and with a view to constrain the scope even further can I suggest the following:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> if (!guard_reentrancy()) {
>>>>>>>>>>>>>>>>>>     return;
>>>>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>>>>     // Ensure a JavaThread is _thread_in_vm when we make this call
>>>>>>>>>>>>>>>>>>     JavaThreadInVM jtivm(Thread::current_or_null_safe());
>>>>>>>>>>>>>>>>>>     if (!prepare_for_emergency_dump()) {
>>>>>>>>>>>>>>>>>>       return;
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 4/11/2019 12:24 pm, David Holmes wrote:
>>>>>>>>>>>>>>>>>>> Correction ...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 4/11/2019 12:11 pm, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>> On 4/11/2019 11:19 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>> On 2019/11/04 7:38, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>>> On 4/11/2019 2:22 am, Markus Gronlund wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I think you can simplify it to something like this:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mgronlun/8233375/webrev/
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> That is more like I had envisaged for this.
>>>>>>>>>>>>>>>>>>>>>> Reusing existing thread-state transition code is
>>>>>>>>>>>>>>>>>>>>>> preferable to adding more custom code that directly manipulates thread-state.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I do not agree with this change.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> VMError::report_and_die() has "Thread* thread" in
>>>>>>>>>>>>>>>>>>>>> its arguments, so Thread::current() might differ from it.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Not sure what you mean. You only ever manipulate the
>>>>>>>>>>>>>>>>>>>> thread state of the current thread.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In addition, ThreadInVMfromUnknown uses
>>>>>>>>>>>>>>>>>>>>> transition_from_native() to change the thread state.
>>>>>>>>>>>>>>>>>>>>> It checks (and manipulates?) something which relates to safepoint.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes it does - which would be a problem if a
>>>>>>>>>>>>>>>>>>>> safepoint (or
>>>>>>>>>>>>>>>>>>>> handshake) were pending. But the path through
>>>>>>>>>>>>>>>>>>>> before_exit already has safepoint checks when you acquire the BeforeExit_lock.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> But that isn't relevant. The issue is we don't want a
>>>>>>>>>>>>>>>>>>> safepoint check on the report_and_die() path. So a
>>>>>>>>>>>>>>>>>>> custom transition helper is needed to avoid that.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The main problem with the suggestion is it seems we
>>>>>>>>>>>>>>>>>>>> may not be running in a JavaThread:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     Thread* const thread = Thread::current();
>>>>>>>>>>>>>>>>>>>>     if (thread->is_Watcher_thread()) {
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> so we can't use the existing thread-state helpers,
>>>>>>>>>>>>>>>>>>>> unless we narrow the scope (as you do) to after the check for the WatcherThread.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thus I added ThreadInVMForJFR to my new webrev.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Your change still seems overly complicated.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>> Markus
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>>>>>> Sent: den 2 november 2019 16:57
>>>>>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net;
>>>>>>>>>>>>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>>>>>> Cc: David Holmes <david.holmes at oracle.com>;
>>>>>>>>>>>>>>>>>>>>>>> yasuenag at gmail.com; Markus Gronlund
>>>>>>>>>>>>>>>>>>>>>>> <markus.gronlund at oracle.com>
>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: Fwd: RFR: 8233375: JFR emergency
>>>>>>>>>>>>>>>>>>>>>>> dump do not recover thread state
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Markus commented in JBS this change should be kept local to JFR.
>>>>>>>>>>>>>>>>>>>>>>> So I updated webrev. Could you review it?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.01/
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> This change passed all tests on submit repo
>>>>>>>>>>>>>>>>>>>>>>> (mach5-one-ysuenaga-JDK-8233375-1-20191102-1354-6373703).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 2019/11/01 18:41, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Forward to hotspot-runtime-dev.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> As David commented in JBS, it may need to be fixed in JFR code.
>>>>>>>>>>>>>>>>>>>>>>>> But I'm not clear why the thread state is not recovered.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I'd like to hear about this from JFR folks.
>>>>>>>>>>>>>>>>>>>>>>>> If it is just a bug in JFR, I will create a
>>>>>>>>>>>>>>>>>>>>>>>> patch which recovers it in JFR code.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>>>>>>>>>>>> Subject: RFR: 8233375: JFR emergency dump do not
>>>>>>>>>>>>>>>>>>>>>>>> recover thread state
>>>>>>>>>>>>>>>>>>>>>>>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>>>>>>>>>>>>>>>>>>>>>>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>>>>>>>>>>>>>>>>>>>>>>> To: hotspot-jfr-dev at openjdk.java.net
>>>>>>>>>>>>>>>>>>>>>>>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Please review this change:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>>>>>>>>>>>>>>>>>>>>>>>     webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> If JFR is running when the JVM crashes, JFR will
>>>>>>>>>>>>>>>>>>>>>>>> dump data to hs_err_pid<PID>.jfr .
>>>>>>>>>>>>>>>>>>>>>>>> This is performed in prepare_for_emergency_dump().
>>>>>>>>>>>>>>>>>>>>>>>> However, this function transitions the thread state to "_thread_in_vm".
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> This change has been tested on submit repo as
>>>>>>>>>>>>>>>>>>>>>>>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>>>>>>>>>>>>>>>>>>>>>>>> It failed at
>>>>>>>>>>>>>>>>>>>>>>>> compiler/types/correctness/CorrectnessTest.java
>>>>>>>>>>>>>>>>>>>>>>>> However this test is for JIT compiler, and
>>>>>>>>>>>>>>>>>>>>>>>> related issue has been reported as JDK-8225620.
>>>>>>>>>>>>>>>>>>>>>>>> So I think this patch can go through.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
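
The bug described above (the emergency dump path switches the thread
state to "_thread_in_vm" and never switches it back) is the kind of
problem a scope guard addresses. What follows is only a minimal sketch
under stated assumptions: DemoThread, DemoThreadState and DemoInVMScope
are hypothetical names invented for illustration and are not HotSpot
types, and the plain store stands in for a transition that, as David
notes, should not perform a safepoint check on the report_and_die()
path.

```cpp
#include <cassert>

// Hypothetical, simplified thread states; NOT HotSpot's JavaThreadState.
enum class DemoThreadState { in_native, in_vm, in_java };

// Hypothetical stand-in for a thread object that carries a state field.
struct DemoThread {
  DemoThreadState state;
};

// Scope guard: remembers the state on entry, switches to in_vm, and
// restores the remembered state on scope exit. The transition is a plain
// store, i.e. this sketch deliberately has no safepoint check.
class DemoInVMScope {
 public:
  explicit DemoInVMScope(DemoThread* t) : _thread(t), _saved(t->state) {
    _thread->state = DemoThreadState::in_vm;
  }
  ~DemoInVMScope() { _thread->state = _saved; }

  DemoInVMScope(const DemoInVMScope&) = delete;
  DemoInVMScope& operator=(const DemoInVMScope&) = delete;

 private:
  DemoThread* _thread;
  DemoThreadState _saved;
};

// Hypothetical dump routine: reports the state seen while "dumping".
DemoThreadState demo_emergency_dump(DemoThread* t) {
  assert(t != nullptr);
  DemoInVMScope scope(t);
  return t->state;  // in_vm for the duration of the dump
}
```

Because the restore lives in the destructor, every return path out of
the dump routine puts the original state back, which is the property
the original code was missing.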

From suenaga at oss.nttdata.com  Wed Nov  6 12:39:22 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Wed, 6 Nov 2019 21:39:22 +0900
Subject: RFR: 8233375: JFR emergency dump do not recover thread state
In-Reply-To: <f72e8b0f-1004-48cc-a514-06221b4fc5f9@default>
References: <6efb3578-18a6-0a11-3d3a-7edb1bb6f3a7@oss.nttdata.com>
 <9dca9ea3-d8ed-bcdf-ec59-7c4b9e2724e6@oss.nttdata.com>
 <CAA-vtUwwJJtJsc1Um0ruorZs34=zKJdWat+02-fmpkajzAfV3Q@mail.gmail.com>
 <f72e8b0f-1004-48cc-a514-06221b4fc5f9@default>
Message-ID: <e155cbc1-715d-4557-9349-92f0fe07141b@oss.nttdata.com>

I filed it in JBS:
   https://bugs.openjdk.java.net/browse/JDK-8233706

I will come back to this after JDK-8233373.


Yasumasa


On 2019/11/06 18:19, Markus Gronlund wrote:
> Hi Thomas,
> 
> Thanks for bringing this to attention. I agree with you that it should move to a better location to have minimal impact on error reporting.
> 
> Thanks again
> Markus
> 
> -----Original Message-----
> From: Thomas Stüfe <thomas.stuefe at gmail.com>
> Sent: den 1 november 2019 11:37
> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>
> Cc: hotspot-jfr-dev at openjdk.java.net; yasuenag at gmail.com; Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>
> Subject: Re: RFR: 8233375: JFR emergency dump do not recover thread state
> 
> Hi Yasumasa,
> 
> I see that we call JFR::on_vm_shutdown() before error reporting has run. Is that really necessary? Error reporting should happen as close as possible to the error point - ideally, as little code as possible should run between the crash/assert and the generation of the hs-err file. I suggest moving the call to JFR::on_vm_shutdown() down to a point after error reporting, e.g. to where we print the NMT report on shutdown.
> 
> Cheers, Thomas
> 
> 
> On Fri, Nov 1, 2019 at 10:41 AM Yasumasa Suenaga <suenaga at oss.nttdata.com>
> wrote:
> 
>> Forward to hotspot-runtime-dev.
>>
>> As David commented in JBS, it may need to be fixed in JFR code.
>> But I'm not clear why the thread state is not recovered.
>>
>> I'd like to hear about this from JFR folks.
>> If it is just a bug in JFR, I will create a patch which recovers it in
>> JFR code.
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>> -------- Forwarded Message --------
>> Subject: RFR: 8233375: JFR emergency dump do not recover thread state
>> Date: Fri, 1 Nov 2019 17:08:42 +0900
>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>> To: hotspot-jfr-dev at openjdk.java.net
>> CC: yasuenag at gmail.com <yasuenag at gmail.com>
>>
>> Hi all,
>>
>> Please review this change:
>>
>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8233375
>>     webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233375/webrev.00/
>>
>> If JFR is running when the JVM crashes, JFR will dump data to
>> hs_err_pid<PID>.jfr .
>> This is performed in prepare_for_emergency_dump().
>> However, this function transitions the thread state to "_thread_in_vm".
>>
>> This change has been tested on submit repo as
>> mach5-one-ysuenaga-JDK-8233375-20191101-0651-6334762.
>> It failed at compiler/types/correctness/CorrectnessTest.java
>> However this test is for JIT compiler, and related issue has been
>> reported as JDK-8225620.
>> So I think this patch can go through.
>>
>>
>> Thanks,
>>
>> Yasumasa
>>

From shade at redhat.com  Wed Nov  6 13:00:41 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 6 Nov 2019 14:00:41 +0100
Subject: RFR (XS) 8233698: GCC 4.8.5 build failure after JDK-8233530
In-Reply-To: <14ff7e37-c7a1-9c74-584d-f00c7816696c@redhat.com>
References: <14ff7e37-c7a1-9c74-584d-f00c7816696c@redhat.com>
Message-ID: <176e0ce1-1c92-6503-162d-9f5c76f5eff4@redhat.com>

On 11/6/19 12:33 PM, Aleksey Shipilev wrote:
> Bug:
>   https://bugs.openjdk.java.net/browse/JDK-8233698
> 
> Our current RHEL-based CIs fail to compile jdk/jdk. That C++14 compat is the gift that keeps on
> giving! The fix is to get even deeper into the warning disabling story:
> 
> diff -r bb2a436e616c src/hotspot/share/memory/operator_new.cpp
> --- a/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 13:43:25 2019 +0800
> +++ b/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 12:31:23 2019 +0100
> @@ -89,11 +89,13 @@
>    fatal("Should not call global delete []");
>  }
> 
>  #ifdef __GNUG__
>  // Warning disabled for gcc 5.4
> +// Warning for unknown warning disabled for gcc 4.8.5
>  PRAGMA_DIAG_PUSH
> +PRAGMA_DISABLE_GCC_WARNING("-Wpragmas")
>  PRAGMA_DISABLE_GCC_WARNING("-Wc++14-compat")
>  #endif // __GNUG__
> 
>  void operator delete(void* p, size_t size) throw() {
>    fatal("Should not call global sized delete");
> 
> Testing: gcc 4.8.5 build

This passes jdk-submit too, so Oracle CI is not affected by this change.

Trivial, right?

-- 
Thanks,
-Aleksey
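
The ordering in the patch matters: "-Wpragmas" has to be ignored before
the pragma that names an option the older compiler does not know,
otherwise the unknown option itself produces a warning. Below is a
small sketch of that ordering, assuming a gcc-compatible compiler. The
DEMO_* macros are hypothetical stand-ins written for this example, not
the definitions in compilerWarnings_gcc.hpp, and the class-level sized
operator delete is used only so the example does not replace the global
operator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for HotSpot's PRAGMA_* macros (illustrative names).
#if defined(__GNUC__)
#define DEMO_PRAGMA_AUX(x) _Pragma(#x)
#define DEMO_DIAG_PUSH DEMO_PRAGMA_AUX(GCC diagnostic push)
#define DEMO_DIAG_POP DEMO_PRAGMA_AUX(GCC diagnostic pop)
#define DEMO_DISABLE_GCC_WARNING(name) \
  DEMO_PRAGMA_AUX(GCC diagnostic ignored name)
#else
// No-op fallbacks so the same source compiles with other compilers.
#define DEMO_DIAG_PUSH
#define DEMO_DIAG_POP
#define DEMO_DISABLE_GCC_WARNING(name)
#endif

DEMO_DIAG_PUSH
// First silence the "unknown option after #pragma" diagnostic ...
DEMO_DISABLE_GCC_WARNING("-Wpragmas")
// ... then name the option that an older gcc may not recognize.
DEMO_DISABLE_GCC_WARNING("-Wc++14-compat")

static std::size_t g_last_sized_delete = 0;

struct Tracked {
  int payload;

  static void* operator new(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr) std::abort();
    return p;
  }

  // Sized deallocation function: records the size it was handed.
  static void operator delete(void* p, std::size_t size) {
    g_last_sized_delete = size;
    std::free(p);
  }
};

DEMO_DIAG_POP

// Allocates and frees one object, returning the size seen by delete.
std::size_t demo_sized_delete_roundtrip() {
  Tracked* t = new Tracked();
  assert(t != nullptr);
  delete t;
  return g_last_sized_delete;
}
```

The push/pop pair keeps both suppressions local to the definitions in
between, so the rest of the translation unit still sees the normal
warning set.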


From david.holmes at oracle.com  Wed Nov  6 13:08:41 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 6 Nov 2019 23:08:41 +1000
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
In-Reply-To: <C5373951-4E14-40D8-8328-370566273940@oracle.com>
References: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
 <58113575-CE3A-4F52-92F0-05C247F0623A@oracle.com>
 <b7710fae-84f7-4e6f-ffd8-d0a2d8c99978@oracle.com>
 <C5373951-4E14-40D8-8328-370566273940@oracle.com>
Message-ID: <0e6e45bf-cad5-d4cb-7816-a8b5be6c730c@oracle.com>

I was surprised to see this got pushed before Kim had had a chance to 
respond and when there was apparently still an open question to me. :(

And now we see it breaks gcc 4.8.5.

David

On 6/11/2019 3:37 pm, Kim Barrett wrote:
>> On Nov 5, 2019, at 8:44 PM, David Holmes <david.holmes at oracle.com> wrote:
>>
>> On 6/11/2019 11:38 am, Kim Barrett wrote:
>>>> On Nov 5, 2019, at 2:06 AM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> may I please have reviews for this small build fix:
>>>>
>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8233530
>>>> Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8233530-wc14-compat/webrev.00/webrev/
>>>> Prior discussion: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/036726.html
>>>>
>>>> Thank you,
>>>>
>>>> Thomas
>>> The "#ifdef __GNUG__" should not be needed or used.
>>> There are a dozen existing uses of PRAGMA_DIAG_PUSH.
>>> While there aren't currently any other direct uses of
>>> PRAGMA_DISABLE_GCC_WARNING, there are indirect uses via other
>>> PRAGMA_DISABLE_xxx macros.
>>> None of those have __GNUX__ protections (for any X).
>>
>> Well AFAICS they reside in a gcc specific file: compilerWarnings_gcc.hpp. But we also take steps in compilerWarnings.hpp to accommodate their use in source code when other compilers are used.
>>
>> #ifndef PRAGMA_DISABLE_GCC_WARNING
>> #define PRAGMA_DISABLE_GCC_WARNING(name)
>> #endif
>>
>> so I agree we should not need any guard to make this gcc only - unless we really do want to control the version.
> 
> If we need to control the version it's applied to, we can use PRAGMA_STRINGOP_TRUNCATION_IGNORED as a model.
> Or just version-conditionalize at the one place it's needed, for now.  If more -Wc++14-compat issues come up later
> then we can introduce a new macro.
> 
> 
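
For illustration, a version-conditional macro along the lines Kim
suggests could look roughly like the sketch below. This is not the
actual definition of PRAGMA_STRINGOP_TRUNCATION_IGNORED, which is not
shown in this thread. The DEMO_* names are hypothetical, and the
"__GNUC__ >= 5" cut-off is an assumption: the thread only establishes
that gcc 5.4 knows -Wc++14-compat and gcc 4.8.5 does not.

```cpp
#include <cassert>

// Hypothetical helper: turns its argument into a _Pragma directive.
#define DEMO_PRAGMA_AUX(x) _Pragma(#x)

#if defined(__GNUC__)
#define DEMO_DIAG_PUSH DEMO_PRAGMA_AUX(GCC diagnostic push)
#define DEMO_DIAG_POP DEMO_PRAGMA_AUX(GCC diagnostic pop)
#else
#define DEMO_DIAG_PUSH
#define DEMO_DIAG_POP
#endif

// Assumption: the option is available from gcc 5 onwards. Older gcc and
// all other compilers get an empty macro, so no unknown-option warning
// can be triggered and no #ifdef is needed at the point of use.
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 5)
#define DEMO_CXX14_COMPAT_IGNORED \
  DEMO_PRAGMA_AUX(GCC diagnostic ignored "-Wc++14-compat")
#define DEMO_CXX14_COMPAT_APPLIED 1
#else
#define DEMO_CXX14_COMPAT_IGNORED
#define DEMO_CXX14_COMPAT_APPLIED 0
#endif

DEMO_DIAG_PUSH
DEMO_CXX14_COMPAT_IGNORED
// Code that would otherwise trigger the warning goes here.
inline int demo_guarded_function() { return 42; }
DEMO_DIAG_POP

// Reports whether the suppression was active for this compiler.
inline bool demo_suppression_applied() {
  return DEMO_CXX14_COMPAT_APPLIED != 0;
}
```

With the version test folded into the macro, the use site reads the
same for every compiler, which is the point of dropping the
"#ifdef __GNUG__" guard.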

From david.holmes at oracle.com  Wed Nov  6 13:17:04 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 6 Nov 2019 23:17:04 +1000
Subject: RFR (XS) 8233698: GCC 4.8.5 build failure after JDK-8233530
In-Reply-To: <176e0ce1-1c92-6503-162d-9f5c76f5eff4@redhat.com>
References: <14ff7e37-c7a1-9c74-584d-f00c7816696c@redhat.com>
 <176e0ce1-1c92-6503-162d-9f5c76f5eff4@redhat.com>
Message-ID: <50db634c-70d5-ae1f-7971-31be2fca79be@oracle.com>

On 6/11/2019 11:00 pm, Aleksey Shipilev wrote:
> On 11/6/19 12:33 PM, Aleksey Shipilev wrote:
>> Bug:
>>    https://bugs.openjdk.java.net/browse/JDK-8233698
>>
>> Our current RHEL-based CIs fail to compile jdk/jdk. That C++14 compat is the gift that keeps on
>> giving! The fix is to get even deeper into the warning disabling story:
>>
>> diff -r bb2a436e616c src/hotspot/share/memory/operator_new.cpp
>> --- a/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 13:43:25 2019 +0800
>> +++ b/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 12:31:23 2019 +0100
>> @@ -89,11 +89,13 @@
>>     fatal("Should not call global delete []");
>>   }
>>
>>   #ifdef __GNUG__
>>   // Warning disabled for gcc 5.4
>> +// Warning for unknown warning disabled for gcc 4.8.5
>>   PRAGMA_DIAG_PUSH
>> +PRAGMA_DISABLE_GCC_WARNING("-Wpragmas")
>>   PRAGMA_DISABLE_GCC_WARNING("-Wc++14-compat")
>>   #endif // __GNUG__
>>
>>   void operator delete(void* p, size_t size) throw() {
>>     fatal("Should not call global sized delete");
>>
>> Testing: gcc 4.8.5 build
> 
> This passes jdk-submit too, so Oracle CI is not affected by this change.
> 
> Trivial, right?

I've given up trying to guess what might be trivial when it comes to 
these interactions with compilers :(

The proof of this change is in the building and you have done that so it 
seems to be good.

Thanks,
David


From thomas.stuefe at gmail.com  Wed Nov  6 13:17:14 2019
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Wed, 6 Nov 2019 14:17:14 +0100
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
In-Reply-To: <0e6e45bf-cad5-d4cb-7816-a8b5be6c730c@oracle.com>
References: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
 <58113575-CE3A-4F52-92F0-05C247F0623A@oracle.com>
 <b7710fae-84f7-4e6f-ffd8-d0a2d8c99978@oracle.com>
 <C5373951-4E14-40D8-8328-370566273940@oracle.com>
 <0e6e45bf-cad5-d4cb-7816-a8b5be6c730c@oracle.com>
Message-ID: <CAA-vtUxC6FrE8U+yTNMR4qt_3KjhZHNiHPdpidAMJ83-JVNN+Q@mail.gmail.com>

Hi David,

sorry for that. I had marked this as trivial, understood your initial
answer as valid Review and pushed this after getting a second review from
Goetz.

But no problem, I can do a follow up patch. What are the remaining
criticisms? Removal of the __GNUG__ guard?

Best Regards, Thomas


On Wed, Nov 6, 2019 at 2:09 PM David Holmes <david.holmes at oracle.com> wrote:

> I was surprised to see this got pushed before Kim had had a chance to
> respond and when there was apparently still an open question to me. :(
>
> And now we see it breaks gcc 4.8.5.
>
> David
>
> On 6/11/2019 3:37 pm, Kim Barrett wrote:
> >> On Nov 5, 2019, at 8:44 PM, David Holmes <david.holmes at oracle.com>
> wrote:
> >>
> >> On 6/11/2019 11:38 am, Kim Barrett wrote:
> >>>> On Nov 5, 2019, at 2:06 AM, Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> may I please have reviews for this small build fix:
> >>>>
> >>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8233530
> >>>> Webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8233530-wc14-compat/webrev.00/webrev/
> >>>> Prior discussion:
> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/036726.html
> >>>>
> >>>> Thank you,
> >>>>
> >>>> Thomas
> >>> The "#ifdef __GNUG__" should not be needed or used.
> >>> There are a dozen existing uses of PRAGMA_DIAG_PUSH.
> >>> While there aren't currently any other direct uses of
> >>> PRAGMA_DISABLE_GCC_WARNING, there are indirect uses via other
> >>> PRAGMA_DISABLE_xxx macros.
> >>> None of those have __GNUX__ protections (for any X).
> >>
> >> Well AFAICS they reside in a gcc specific file:
> compilerWarnings_gcc.hpp. But we also take steps in compilerWarnings.hpp to
> accommodate their use in source code when other compilers are used.
> >>
> >> #ifndef PRAGMA_DISABLE_GCC_WARNING
> >> #define PRAGMA_DISABLE_GCC_WARNING(name)
> >> #endif
> >>
> >> so I agree we should not need any guard to make this gcc only - unless
> we really do want to control the version.
> >
> > If we need to control the version it's applied to, we can use
> PRAGMA_STRINGOP_TRUNCATION_IGNORED as a model.
> > Or just version-conditionalize at the one place it's needed, for now.
> If more -Wc++14-compat issues come up later
> > then we can introduce a new macro.
> >
> >
>

From david.holmes at oracle.com  Wed Nov  6 13:21:19 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 6 Nov 2019 23:21:19 +1000
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
In-Reply-To: <CAA-vtUxC6FrE8U+yTNMR4qt_3KjhZHNiHPdpidAMJ83-JVNN+Q@mail.gmail.com>
References: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
 <58113575-CE3A-4F52-92F0-05C247F0623A@oracle.com>
 <b7710fae-84f7-4e6f-ffd8-d0a2d8c99978@oracle.com>
 <C5373951-4E14-40D8-8328-370566273940@oracle.com>
 <0e6e45bf-cad5-d4cb-7816-a8b5be6c730c@oracle.com>
 <CAA-vtUxC6FrE8U+yTNMR4qt_3KjhZHNiHPdpidAMJ83-JVNN+Q@mail.gmail.com>
Message-ID: <1bbf1195-023e-1d0a-36be-29b12be49041@oracle.com>

On 6/11/2019 11:17 pm, Thomas Stüfe wrote:
> Hi David,
> 
> sorry for that. I had marked this as trivial, understood your initial 
> answer as valid Review and pushed this after getting a second review 
> from Goetz.
> 
> But no problem, I can do a follow up patch. What are the remaining 
> criticisms? Removal of the __GNUG__ guard?

I'll let Kim decide whether it is worth following up on this.

Thanks,
David

> Best Regards, Thomas
> 
> 
> On Wed, Nov 6, 2019 at 2:09 PM David Holmes <david.holmes at oracle.com 
> <mailto:david.holmes at oracle.com>> wrote:
> 
>     I was surprised to see this got pushed before Kim had had a chance to
>     respond and when there was apparently still an open question to me. :(
> 
>     And now we see it breaks gcc 4.8.5.
> 
>     David
> 
>     On 6/11/2019 3:37 pm, Kim Barrett wrote:
>      >> On Nov 5, 2019, at 8:44 PM, David Holmes
>     <david.holmes at oracle.com <mailto:david.holmes at oracle.com>> wrote:
>      >>
>      >> On 6/11/2019 11:38 am, Kim Barrett wrote:
>      >>>> On Nov 5, 2019, at 2:06 AM, Thomas Stüfe
>     <thomas.stuefe at gmail.com <mailto:thomas.stuefe at gmail.com>> wrote:
>      >>>>
>      >>>> Hi all,
>      >>>>
>      >>>> may I please have reviews for this small build fix:
>      >>>>
>      >>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8233530
>      >>>> Webrev:
>     http://cr.openjdk.java.net/~stuefe/webrevs/8233530-wc14-compat/webrev.00/webrev/
>      >>>> Prior discussion:
>     https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/036726.html
>      >>>>
>      >>>> Thank you,
>      >>>>
>      >>>> Thomas
>      >>> The "#ifdef __GNUG__" should not be needed or used.
>      >>> There are a dozen existing uses of PRAGMA_DIAG_PUSH.
>      >>> While there aren't currently any other direct uses of
>      >>> PRAGMA_DISABLE_GCC_WARNING, there are indirect uses via other
>      >>> PRAGMA_DISABLE_xxx macros.
>      >>> None of those have __GNUX__ protections (for any X).
>      >>
>      >> Well AFAICS they reside in a gcc specific file:
>     compilerWarnings_gcc.hpp. But we also take steps in
>     compilerWarnings.hpp to accommodate their use in source code when
>     other compilers are used.
>      >>
>      >> #ifndef PRAGMA_DISABLE_GCC_WARNING
>      >> #define PRAGMA_DISABLE_GCC_WARNING(name)
>      >> #endif
>      >>
>      >> so I agree we should not need any guard to make this gcc only -
>     unless we really do want to control the version.
>      >
>      > If we need to control the version it's applied to, we can use
>     PRAGMA_STRINGOP_TRUNCATION_IGNORED as a model.
>      > Or just version-conditionalize at the one place it's needed, for
>     now.  If more -Wc++14-compat issues come up later
>      > then we can introduce a new macro.
>      >
>      >
> 

From shade at redhat.com  Wed Nov  6 14:07:47 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 6 Nov 2019 15:07:47 +0100
Subject: RFR (XS) 8233698: GCC 4.8.5 build failure after JDK-8233530
In-Reply-To: <50db634c-70d5-ae1f-7971-31be2fca79be@oracle.com>
References: <14ff7e37-c7a1-9c74-584d-f00c7816696c@redhat.com>
 <176e0ce1-1c92-6503-162d-9f5c76f5eff4@redhat.com>
 <50db634c-70d5-ae1f-7971-31be2fca79be@oracle.com>
Message-ID: <60d2af76-a008-8fdd-bae3-5ae23c18a0ff@redhat.com>

On 11/6/19 2:17 PM, David Holmes wrote:
> On 6/11/2019 11:00 pm, Aleksey Shipilev wrote:
>> On 11/6/19 12:33 PM, Aleksey Shipilev wrote:
>>> Bug:
>>>    https://bugs.openjdk.java.net/browse/JDK-8233698
>>>
>>> Our current RHEL-based CIs fail to compile jdk/jdk. That C++14 compat is the gift that keeps on
>>> giving! The fix is to get even deeper into the warning disabling story:
>>>
>>> diff -r bb2a436e616c src/hotspot/share/memory/operator_new.cpp
>>> --- a/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 13:43:25 2019 +0800
>>> +++ b/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 12:31:23 2019 +0100
>>> @@ -89,11 +89,13 @@
>>>     fatal("Should not call global delete []");
>>>   }
>>>
>>>   #ifdef __GNUG__
>>>   // Warning disabled for gcc 5.4
>>> +// Warning for unknown warning disabled for gcc 4.8.5
>>>   PRAGMA_DIAG_PUSH
>>> +PRAGMA_DISABLE_GCC_WARNING("-Wpragmas")
>>>   PRAGMA_DISABLE_GCC_WARNING("-Wc++14-compat")
>>>   #endif // __GNUG__
>>>
>>>   void operator delete(void* p, size_t size) throw() {
>>>     fatal("Should not call global sized delete");
>>>
>>> Testing: gcc 4.8.5 build
>>
>> This passes jdk-submit too, so Oracle CI is not affected by this change.
>>
>> Trivial, right?
> 
> I've given up trying to guess what might be trivial when it comes to these interactions with
> compilers :(
> 
> The proof of this change is in the building and you have done that so it seems to be good.

I am pushing it then to unbreak our CIs. OK?

-- 
Thanks,
-Aleksey


From kim.barrett at oracle.com  Wed Nov  6 14:46:26 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 6 Nov 2019 09:46:26 -0500
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
In-Reply-To: <1bbf1195-023e-1d0a-36be-29b12be49041@oracle.com>
References: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
 <58113575-CE3A-4F52-92F0-05C247F0623A@oracle.com>
 <b7710fae-84f7-4e6f-ffd8-d0a2d8c99978@oracle.com>
 <C5373951-4E14-40D8-8328-370566273940@oracle.com>
 <0e6e45bf-cad5-d4cb-7816-a8b5be6c730c@oracle.com>
 <CAA-vtUxC6FrE8U+yTNMR4qt_3KjhZHNiHPdpidAMJ83-JVNN+Q@mail.gmail.com>
 <1bbf1195-023e-1d0a-36be-29b12be49041@oracle.com>
Message-ID: <03F3C48D-B8C6-412D-A117-75AAF0752F00@oracle.com>

> On Nov 6, 2019, at 8:21 AM, David Holmes <david.holmes at oracle.com> wrote:
> 
> On 6/11/2019 11:17 pm, Thomas Stüfe wrote:
>> Hi David,
>> sorry for that. I had marked this as trivial, understood your initial answer as valid Review and pushed this after getting a second review from Goetz.
>> But no problem, I can do a follow up patch. What are the remaining criticisms? Removal of the __GNUG__ guard?
> 
> I'll let Kim decide whether it is worth following up on this.

Those #ifdefs look really odd, and this file might not be touched again for a while.
What's one more time through the spin cycle? :)  Sigh.  I thought 8233359 was going to be easy.

BTW, the unofficial protocol for "trivial" involves explicit reviewer agreement (or suggestion).
(Maybe that part ought to be made explicit in https://wiki.openjdk.java.net/display/HotSpot/Pushing+a+HotSpot+change)


From daniel.daugherty at oracle.com  Wed Nov  6 15:18:31 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Wed, 6 Nov 2019 10:18:31 -0500
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
Message-ID: <80e9b031-2c1a-22c6-ce75-495515518dfe@oracle.com>

Ping!

I'm still looking for reviewers on this one. David H. is planning to
review this round and I would love to have two more reviewers...

Carsten and/or Roman! Are you guys still out there? Haven't heard from
either of you in quite a while...

Dan


On 11/4/19 4:03 PM, Daniel D. Daugherty wrote:
> Greetings,
>
> I have made changes to the Async Monitor Deflation code in response to
> the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., Robbin
> and Erik O. for their comments!
>
> JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async Monitor
> Deflation project needs to push before Nov. 12, 2019 in order to allow
> for sufficient bake time for such a big change. Nov. 12 is _next_ Tuesday
> so we have 8 days from today to finish this code review cycle and push
> this code for JDK14.
>
> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>
> I have attached the change list from CR7 to CR8 instead of putting it in
> the body of this email. I've also added a link to the CR7-to-CR8-changes
> file to the webrevs so it should be easy to find.
>
> Main bug URL:
>
>     JDK-8153224 Monitor deflation prolong safepoints
>     https://bugs.openjdk.java.net/browse/JDK-8153224
>
> The project is currently baselined on jdk-14+21.
>
> Here's the full webrev URL for those folks that want to see all of the
> current Async Monitor Deflation code in one go (v2.08 full):
>
> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full
>
> Some folks might want to see just what has changed since the last review
> cycle so here's a webrev for that (v2.08 inc):
>
> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/
>
> The OpenJDK wiki did not need any changes for this round:
>
> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>
> The jdk-14+21 based v2.08 version of the patch has been thru Mach5 tier[1-8]
> testing on Oracle's usual set of platforms. It has also been through my usual
> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
> my lab. Some testing is still running, but so far there are no new regressions.
>
> I have not yet done a SPECjbb2015 round on the CR8/v2.08/11-for-jdk14 bits.
>
> Thanks, in advance, for any questions, comments or suggestions.
>
> Dan
>
>
> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> The Async Monitor Deflation project is reaching the end game. I have no
>> changes planned for the project at this time so all that is left is code
>> review and any changes that result from those reviews.
>>
>> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>>
>> I have attached the list of fixes from CR6 to CR7 instead of putting it
>> in the main body of this email.
>>
>> Main bug URL:
>>
>>     JDK-8153224 Monitor deflation prolong safepoints
>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>
>> The project is currently baselined on jdk-14+19.
>>
>> Here's the full webrev URL for those folks that want to see all of the
>> current Async Monitor Deflation code in one go (v2.07 full):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full 
>>
>>
>> Some folks might want to see just what has changed since the last review
>> cycle so here's a webrev for that (v2.07 inc):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/ 
>>
>>
>> The OpenJDK wiki has been updated to match the CR7/v2.07/10-for-jdk14 changes:
>>
>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>
>> The jdk-14+18 based v2.07 version of the patch has been thru Mach5 tier[1-8]
>> testing on Oracle's usual set of platforms. It has also been through my usual
>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
>> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
>> my lab.
>>
>> The jdk-14+19 based v2.07 version of the patch has been thru Mach5 tier[1-3]
>> testing on Oracle's usual set of platforms. Mach5 tier[4-8] are in process.
>>
>> I did another round of SPECjbb2015 testing in Oracle's Aurora Performance lab
>> using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>
>>     - "base" is jdk-14+18
>>     - "v2.07" is the latest version and includes C2 inc_om_ref_count() support
>>       on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors option
>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors specified
>>
>>          hbIR           hbIR
>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>     ---------------  ---------  --------  -------------  -------
>>            34282.00   30635.90  28831.30       20969.20  3841.30 base
>>            34282.00   30973.00  29345.80       21025.20  3964.10 v2.07
>>            34282.00   31105.60  29174.30       21074.00  3931.30 v2.07_handshake
>>            34282.00   30789.70  27151.60       19839.10  3850.20 v2.07_off
>>
>>     - The Aurora Perf comparison tool reports:
>>
>>         Comparison              max-jOPS              critical-jOPS
>>         ----------------------  --------------------  --------------------
>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>>
>>         (s) - significant  (ns) - not-significant
>>
>>     - For historical comparison, the Aurora Perf comparison tool
>>         reported for v2.06 with a baseline of jdk-13+31:
>>
>>         Comparison              max-jOPS              critical-jOPS
>>         ----------------------  --------------------  --------------------
>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>>
>>         (s) - significant  (ns) - not-significant
>>
>> Thanks, in advance, for any questions, comments or suggestions.
>>
>> Dan
>>
>>
>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> The Async Monitor Deflation project has rebased to JDK14 so it's time
>>> for our first code review in that new context!!
>>>
>>> I've been focused on changing the monitor list management code to be
>>> lock-free in order to make SPECjbb2015 happier. Of course with a change
>>> like that, it takes a while to chase down all the new and wonderful
>>> races. At this point, I have the code back to the same stability that
>>> I had with CR5/v2.05/8-for-jdk13.
>>>
>>> To lay the ground work for this round of review, I pushed the following
>>> two fixes to jdk/jdk earlier today:
>>>
>>>     JDK-8230184 rename, whitespace, indent and comments changes in preparation
>>>                 for lock free Monitor lists
>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>
>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails after 8230184
>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>
>>> I have attached the list of fixes from CR5 to CR6 instead of putting
>>> it in the main body of this email.
>>>
>>> Main bug URL:
>>>
>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>
>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>> JDK-8230184 and JDK-8230317.
>>>
>>> Here's the full webrev URL for those folks that want to see all of the
>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/ 
>>>
>>>
>>>
>>> The primary focus of this review cycle is on the lock-free Monitor List
>>> management changes so here's a webrev for just that patch (v2.06c):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/ 
>>>
>>>
>>> The secondary focus of this review cycle is on the bug fixes that have
>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for just that
>>> patch (v2.06b):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/ 
>>>
>>>
>>> The third and final bucket for this review cycle is the rename, whitespace,
>>> indent and comments changes made in preparation for lock free Monitor list
>>> management. Almost all of that was extracted into JDK-8230184 for the
>>> baseline so this bucket now has just a few comment changes relative to
>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/ 
>>>
>>>
>>>
>>> Some folks might want to see just what has changed since the last 
>>> review
>>> cycle so here's a webrev for that (v2.06 inc):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/ 
>>>
>>>
>>>
>>> Last, but not least, some folks might want to see the code before the
>>> addition of lock-free Monitor List management so here's a webrev for
>>> that (v2.00 -> v2.05):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/ 
>>>
>>>
>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>
>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>
>>> but that should only be changes to describe per-thread list async monitor
>>> deflation being done by the ServiceThread.
>>>
>>> (I did update the OpenJDK wiki for the CR5 changes back on 2019.08.14)
>>>
>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>> Oracle's usual set of platforms. It has also been through my usual set
>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>
>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>> using their tuned SPECjbb2015 Linux-X64 G1 configs. This was using
>>> this patch baselined on jdk-13+31 (for stability):
>>>
>>>           hbIR           hbIR
>>>      (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>      ---------------  ---------  --------  -------------  -------
>>>             34282.00   28837.20  27905.20       19817.40  3658.10  base
>>>             34965.70   29798.80  27814.90       19959.00  3514.60  v2.06d
>>>             34282.00   29100.70  28042.50       19577.00  3701.90  v2.06d_off
>>>             34282.00   29218.50  27562.80       19397.30  3657.60  v2.06d_ocache
>>>             34965.70   29838.30  26512.40       19170.60  3569.90  v2.05
>>>             34282.00   28926.10  27734.00       19835.10  3588.40  v2.05_off
>>>
>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors specified and
>>> the "ocache" config is with 128 byte cache line sizes instead of 64 byte
>>> cache line sizes. "v2.06d" is the last set of changes that I made before
>>> those changes were distributed into the "v2.06a", "v2.06b" and "v2.06c"
>>> buckets for this review cycle.
>>>
>>>
>>> Thanks, in advance, for any questions, comments or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I've been focused on chasing down and fixing the test failures
>>>> that only pop up rarely. So this round is primarily fixes for races
>>>> with a few additional fixes that came from Karen's review of CR4.
>>>> Thanks Karen!
>>>>
>>>> I have attached the list of fixes from CR4 to CR5 instead of putting
>>>> it in the main body of this email.
>>>>
>>>> Main bug URL:
>>>>
>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>
>>>> The project is currently baselined on jdk-13+29. This will likely be
>>>> the last JDK13 baseline for this project and I'll roll to the JDK14
>>>> (jdk/jdk) repo soon...
>>>>
>>>> Here's the full webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/
>>>>
>>>> Here's the incremental webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>
>>>> I have not yet checked the OpenJDK wiki to see if it needs any updates
>>>> to match the CR5 changes:
>>>>
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>
>>>> (I did update the OpenJDK wiki for the CR4 changes back on 2019.06.26)
>>>>
>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become 
>>>> available.
>>>> Since I haven't made any performance changes in this round, I'll only
>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>
>>>> Next up:
>>>>
>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>   Linux-X64 and we've determined that some of that comes from
>>>>   contention on the gListLock. So I'm going to investigate removing
>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>> - Of course, going lock free often causes new races and new failures
>>>>   so that's a good reason to make those changes isolated in their
>>>>   own round (and not hold up CR5/v2.05/8-for-jdk13 anymore).
>>>> - I finally have a potential fix for the Win* failure with
>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>   but I haven't run it through Mach5 yet so it'll be in the next round.
>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>>   monitor related failures there. I suspect that I need to go take a
>>>>   look at the C2 RTM macro assembler code and look for things that might
>>>>   conflict with Async Monitor Deflation. If you're interested in that kind
>>>>   of issue, then see the macroAssembler_x86.cpp sanity check that I
>>>>   added in this round!
>>>>
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I have a fix for an issue that came up during performance testing.
>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>> experiments.
>>>>>
>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>> verbose due to the complexity of the issue, but the changes
>>>>> themselves are not that big.
>>>>>
>>>>> Functional:
>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>         safepoints that simply request async deflation which
>>>>>         keeps the async JavaThreads from making progress on
>>>>>         their async deflation work.
>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>     - is_async_deflation_needed() returns true when
>>>>>       is_async_cleanup_requested() is true or when
>>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>>       AsyncDeflationInterval).
>>>>>     - if AsyncDeflateIdleMonitors Service_lock->wait() now waits for
>>>>>       at most GuaranteedSafepointInterval millis:
>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>         (default is 1000 millis/1 second)
>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>>         This is to prevent async deflation from swamping the
>>>>>         ServiceThread.
>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>>     of their in-use lists.
>>>>>     - The ServiceThread will check for async deflation work every
>>>>>       GuaranteedSafepointInterval.
>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>       async deflation work via is_async_deflation_requested.
>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>     flag to reduce the in-use monitor population that is reported by
>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
>>>>>
>>>>> Test update:
>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>>     AsyncDeflateIdleMonitors.
>>>>>
>>>>> Collateral:
>>>>>   - Add/clarify/update some logging messages.
>>>>>
>>>>> Cleanup:
>>>>>   - Updated comments based on Karen's code review.
>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>     'async cleanup' -> 'async deflation'.
>>>>>     - comment and function name changes
>>>>>   - Clarify MonitorUsedDeflationThreshold description;
>>>>>
>>>>> Main bug URL:
>>>>>
>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> The project is currently baselined on jdk-13+22.
>>>>>
>>>>> Here's the full webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>>>>
>>>>> Here's the incremental webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>>>>
>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> The wiki doesn't say a whole lot about the async deflation invocation
>>>>> mechanism so I have to figure out how to add that content.
>>>>>
>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>> running now. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have to run
>>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I had some discussions with Karen about a race that was in the
>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>>>>>> theoretical and I had no test failures due to it. The fix is pretty
>>>>>> simple: remove the special case code for async deflation in the
>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>> for ObjectMonitor::enter() protection.
>>>>>>
>>>>>> During those discussions Karen also floated the idea of using the
>>>>>> ref_count field instead of the contentions field for the Async
>>>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>>>> change and I have run it through the usual stress and Mach5 testing
>>>>>> with no issues. It's also known as v2.03 (for those with the
>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>> Sorry for all the names...
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>
>>>>>> Here's the full webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>>>>>
>>>>>> Here's the incremental webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>>>>
>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>
>>>>>>
>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>> Solaris-X64 release had the usual "Too large time diff" complaints.
>>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>> stress kit is running right now.
>>>>>>
>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>>>> the results and analyze them.
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I have a small but important bug fix for the Async Monitor Deflation
>>>>>>> project ready to go. It's also known as v2.02 (for those with the
>>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs). Sorry
>>>>>>> for all the names...
>>>>>>>
>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline patch
>>>>>>> is out of our hair.
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>
>>>>>>> Here's the full webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/
>>>>>>>
>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>>>>>
>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is running
>>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on product,
>>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>>>>>> I'll start my stress kit on Linux-X64 sometime on Sunday (after
>>>>>>> my jdk-13+18 stress run is done).
>>>>>>>
>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>> testing is done.
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I finally have CR1 for the Async Monitor Deflation project ready to
>>>>>>>> go. It's also known as v2.01 (for those with the patches) and as
>>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>>>>>> names...
>>>>>>>>
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> Baseline bug fixes URL:
>>>>>>>>
>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>
>>>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295 
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/
>>>>>>>>
>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the latest 
>>>>>>>> version
>>>>>>>> of JDK-8153224...
>>>>>>>>
>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now 
>>>>>>>> and
>>>>>>>> Mach5 tier[78] will be run later today. My stress kit on 
>>>>>>>> Solaris-X64
>>>>>>>> is running now. Linux-X64 stress testing will start on Sunday. I'm
>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's 
>>>>>>>>> work on:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>>>
>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>
>>>>>>>>> Earlier versions of this patch have been through several 
>>>>>>>>> rounds of
>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>>>> their own environments (including specJBB2015).
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run Kitchensink
>>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>>>> fastdebug and slowdebug).
>>>>>>>>>
>>>>>>>>> All of the testing done on earlier versions will be redone on the
>>>>>>>>> latest version of the patch.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> P.S.
>>>>>>>>> One subtest in 
>>>>>>>>> gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been 
>>>>>>>>> trying
>>>>>>>>> to characterize/analyze this failure for more than a week now. At
>>>>>>>>> this point I'm convinced that Async Monitor Deflation is 
>>>>>>>>> aggravating
>>>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
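
The CR4 notes quoted above say that is_async_deflation_needed() returns
true for an explicit request, or when the monitor threshold is exceeded
but no more often than AsyncDeflationInterval (0 is off). A minimal
sketch of that gating follows. It is an illustration only and not the
code from the webrev: the DeflationGate type, its members and the
function signature are all assumptions, and the clock is passed in as a
plain millisecond count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the state the ServiceThread would consult.
// The type and its members are illustrative and not taken from the webrev.
struct DeflationGate {
  int64_t interval_ms;      // plays the role of AsyncDeflationInterval (0 is off)
  int64_t last_trigger_ms;  // time of the last threshold-based trigger
  bool    has_triggered;    // false until the first threshold-based trigger

  explicit DeflationGate(int64_t interval)
    : interval_ms(interval), last_trigger_ms(0), has_triggered(false) {}
};

// Returns true when async deflation should run now:
//  - always for an explicit request;
//  - for the threshold case only if the interval is enabled and at least
//    interval_ms has elapsed since the previous threshold-based trigger.
bool is_async_deflation_needed(DeflationGate& gate, int64_t now_ms,
                               bool requested, bool above_threshold) {
  if (requested) {
    return true;
  }
  if (gate.interval_ms <= 0 || !above_threshold) {
    return false;
  }
  if (gate.has_triggered && now_ms - gate.last_trigger_ms < gate.interval_ms) {
    return false;  // too soon; avoid swamping the service thread
  }
  gate.has_triggered = true;
  gate.last_trigger_ms = now_ms;
  return true;
}
```

In this sketch an explicit request bypasses the interval check and does
not reset it; whether the real code behaves that way is not stated in
the notes.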


From alex.buckley at oracle.com  Wed Nov  6 17:44:14 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Wed, 6 Nov 2019 09:44:14 -0800
Subject: RFR: CSR JVM support for records
In-Reply-To: <fb8c0662-78c0-c442-3fd8-db3178c27127@oracle.com>
References: <fb8c0662-78c0-c442-3fd8-db3178c27127@oracle.com>
Message-ID: <1d080dde-df36-d0b1-5e33-f439d70336e7@oracle.com>

Thanks for getting the JVMS changes attached to the CSR. I did some 
light editing on the body of the CSR as part of trying to figure out 
what reflection methods are being supported. It would be nice if the CSR 
compared its support for j.l.r.RecordComponent to support for other APIs 
like j.l.r.Parameter and j.l.Class::getAnnotations which also perform 
consistency checking only when the API is called ... but I don't want to 
hold it up.

Alex

On 11/5/2019 6:27 AM, Harold Seigel wrote:
> Hi,
> 
> Please review this draft of the CSR for JVM support for records.
> 
> CSR: https://bugs.openjdk.java.net/browse/JDK-8233595
> 
> Thanks, Harold

From kim.barrett at oracle.com  Wed Nov  6 17:50:33 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 6 Nov 2019 12:50:33 -0500
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
In-Reply-To: <03F3C48D-B8C6-412D-A117-75AAF0752F00@oracle.com>
References: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
 <58113575-CE3A-4F52-92F0-05C247F0623A@oracle.com>
 <b7710fae-84f7-4e6f-ffd8-d0a2d8c99978@oracle.com>
 <C5373951-4E14-40D8-8328-370566273940@oracle.com>
 <0e6e45bf-cad5-d4cb-7816-a8b5be6c730c@oracle.com>
 <CAA-vtUxC6FrE8U+yTNMR4qt_3KjhZHNiHPdpidAMJ83-JVNN+Q@mail.gmail.com>
 <1bbf1195-023e-1d0a-36be-29b12be49041@oracle.com>
 <03F3C48D-B8C6-412D-A117-75AAF0752F00@oracle.com>
Message-ID: <52E805BA-991D-4211-ABE8-6AE6DDB619F7@oracle.com>

> On Nov 6, 2019, at 9:46 AM, Kim Barrett <kim.barrett at oracle.com> wrote:
> 
>> On Nov 6, 2019, at 8:21 AM, David Holmes <david.holmes at oracle.com> wrote:
>> 
>> On 6/11/2019 11:17 pm, Thomas Stüfe wrote:
>>> Hi David,
>>> sorry for that. I had marked this as trivial, understood your initial answer as valid Review and pushed this after getting a second review from Goetz.
>>> But no problem, I can do a follow up patch. What are the remaining criticisms? Removal of the __GNUG__ define?
>> 
>> I'll let Kim decide whether it is worth following up on this.
> 
> Those #ifdefs look really odd, and this file might not be touched again for a while.
> What's one more time through the spin cycle? :)  Sigh.  I thought 8233359 was going to be easy.

Let's leave it to post JEP 347 cleanup: https://bugs.openjdk.java.net/browse/JDK-8233724


From kim.barrett at oracle.com  Wed Nov  6 17:51:32 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 6 Nov 2019 12:51:32 -0500
Subject: RFR (XS) 8233698: GCC 4.8.5 build failure after JDK-8233530
In-Reply-To: <60d2af76-a008-8fdd-bae3-5ae23c18a0ff@redhat.com>
References: <14ff7e37-c7a1-9c74-584d-f00c7816696c@redhat.com>
 <176e0ce1-1c92-6503-162d-9f5c76f5eff4@redhat.com>
 <50db634c-70d5-ae1f-7971-31be2fca79be@oracle.com>
 <60d2af76-a008-8fdd-bae3-5ae23c18a0ff@redhat.com>
Message-ID: <63245994-B8AA-4693-8941-852F8493D5B4@oracle.com>

> On Nov 6, 2019, at 9:07 AM, Aleksey Shipilev <shade at redhat.com> wrote:
> 
> On 11/6/19 2:17 PM, David Holmes wrote:
>> On 6/11/2019 11:00 pm, Aleksey Shipilev wrote:
>>> On 11/6/19 12:33 PM, Aleksey Shipilev wrote:
>>>> Bug:
>>>>    https://bugs.openjdk.java.net/browse/JDK-8233698
>>>> 
>>>> Our current RHEL-based CIs fail to compile jdk/jdk. That C++14 compat is the gift that keeps on
>>>> giving! The fix is to get even deeper into the warning disabling story:
>>>> 
>>>> diff -r bb2a436e616c src/hotspot/share/memory/operator_new.cpp
>>>> --- a/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 13:43:25 2019 +0800
>>>> +++ b/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 12:31:23 2019 +0100
>>>> @@ -89,11 +89,13 @@
>>>>     fatal("Should not call global delete []");
>>>>   }
>>>> 
>>>>   #ifdef __GNUG__
>>>>   // Warning disabled for gcc 5.4
>>>> +// Warning for unknown warning disabled for gcc 4.8.5
>>>>   PRAGMA_DIAG_PUSH
>>>> +PRAGMA_DISABLE_GCC_WARNING("-Wpragmas")
>>>>   PRAGMA_DISABLE_GCC_WARNING("-Wc++14-compat")
>>>>   #endif // __GNUG__
>>>> 
>>>>   void operator delete(void* p, size_t size) throw() {
>>>>     fatal("Should not call global sized delete");
>>>> 
>>>> Testing: gcc 4.8.5 build
>>> 
>>> This passes jdk-submit too, so Oracle CI is not affected by this change.
>>> 
>>> Trivial, right?
>> 
>> I've given up trying to guess what might be trivial when it comes to these interactions with
>> compilers :(
>> 
>> The proof of this change is in the building and you have done that so it seems to be good.
> 
> I am pushing it then to unbreak our CIs. OK?

For the record, this solution looks okay to me too.
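
For readers unfamiliar with the idiom in the diff above, here is a
minimal sketch written with raw GCC pragmas instead of HotSpot's
PRAGMA_* macros. Ignoring -Wpragmas first keeps a compiler that does
not know -Wc++14-compat from warning about the unknown option. The
release_block() function is a hypothetical stand-in for the guarded
definitions and is not taken from operator_new.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#if defined(__GNUC__)
#pragma GCC diagnostic push
// First silence the "unknown option" complaint that older compilers raise
// for warning names they do not recognize ...
#pragma GCC diagnostic ignored "-Wpragmas"
// ... then disable the warning that only newer compilers know about.
#pragma GCC diagnostic ignored "-Wc++14-compat"
#endif

// Hypothetical stand-in for the definitions guarded by the pragmas.
// It frees the block and reports the size it was told about.
std::size_t release_block(void* p, std::size_t size) {
  std::free(p);
  return size;
}

#if defined(__GNUC__)
// Restore the diagnostic state saved by the push above.
#pragma GCC diagnostic pop
#endif
```

The order of the two "ignored" lines matters: with -Wpragmas disabled
second, the unknown-option warning would already have been issued.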


From thomas.stuefe at gmail.com  Wed Nov  6 17:51:55 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Wed, 6 Nov 2019 18:51:55 +0100
Subject: RFR(T): 8233530: gcc 5.4 build warning -Wc++14-compat after
 JDK-8233359
In-Reply-To: <52E805BA-991D-4211-ABE8-6AE6DDB619F7@oracle.com>
References: <CAA-vtUzzVyOR=1ttrjALjPXPt5p9YM7kMhrkxB9eby-epKK9wA@mail.gmail.com>
 <58113575-CE3A-4F52-92F0-05C247F0623A@oracle.com>
 <b7710fae-84f7-4e6f-ffd8-d0a2d8c99978@oracle.com>
 <C5373951-4E14-40D8-8328-370566273940@oracle.com>
 <0e6e45bf-cad5-d4cb-7816-a8b5be6c730c@oracle.com>
 <CAA-vtUxC6FrE8U+yTNMR4qt_3KjhZHNiHPdpidAMJ83-JVNN+Q@mail.gmail.com>
 <1bbf1195-023e-1d0a-36be-29b12be49041@oracle.com>
 <03F3C48D-B8C6-412D-A117-75AAF0752F00@oracle.com>
 <52E805BA-991D-4211-ABE8-6AE6DDB619F7@oracle.com>
Message-ID: <CAA-vtUz8=+jM7k8Drf6gd7=d=zSQB2Fk+QskTuQe4bcN13uVZw@mail.gmail.com>

Okay I'm fine with that.

Thanks, Thomas

On Wed, Nov 6, 2019, 18:50 Kim Barrett <kim.barrett at oracle.com> wrote:

> > On Nov 6, 2019, at 9:46 AM, Kim Barrett <kim.barrett at oracle.com> wrote:
> >
> >> On Nov 6, 2019, at 8:21 AM, David Holmes <david.holmes at oracle.com>
> wrote:
> >>
> >> On 6/11/2019 11:17 pm, Thomas Stüfe wrote:
> >>> Hi David,
> >>> sorry for that. I had marked this as trivial, understood your initial
> answer as valid Review and pushed this after getting a second review from
> Goetz.
> >>> But no problem, I can do a follow up patch. What are the remaining
> criticisms? Removal of the __GNUG__ define?
> >>
> >> I'll let Kim decide whether it is worth following up on this.
> >
> > Those #ifdefs look really odd, and this file might not be touched again
> for a while.
> > What's one more time through the spin cycle? :)  Sigh.  I thought
> 8233359 was going to be easy.
>
> Let's leave it to post JEP 347 cleanup:
> https://bugs.openjdk.java.net/browse/JDK-8233724
>
>

From frederic.parain at oracle.com  Wed Nov  6 17:57:23 2019
From: frederic.parain at oracle.com (Frederic Parain)
Date: Wed, 6 Nov 2019 12:57:23 -0500
Subject: RFR (S) 8233086 [TESTBUG] need to test field layout style
 difference between CDS dump time and run time
In-Reply-To: <6c43a4c4-a5df-544a-6472-e921cd94aa56@oracle.com>
References: <6c43a4c4-a5df-544a-6472-e921cd94aa56@oracle.com>
Message-ID: <683231E8-F2EA-4BB7-81AC-F3EAB3FDC299@oracle.com>

Looks good to me.

Typo in FieldLayoutFlags.java:94
   we can it -> we can use it

Fred


> On Nov 5, 2019, at 23:14, Ioi Lam <ioi.lam at oracle.com> wrote:
> 
> https://bugs.openjdk.java.net/browse/JDK-8233086
> http://cr.openjdk.java.net/~iklam/jdk14/8233086-cds-field-layout-test.v01/
> 
> These VM options control how non-static fields are laid out.
> 
>     FieldsAllocationStyle
>     CompactFields
>     EnableContended
>     ContendedPaddingWidth
>     RestrictContended
> 
> It's possible to set different values for these options between CDS
> dump time and program run time. As a result, it's possible to have
> an archived super class that has one type of field layout, while a
> non-archived sub class has a different type of field layout.
> 
> I added a new test case to verify that the VM works properly when this
> happens.
> 
> Thanks
> - Ioi


From vladimir.kozlov at oracle.com  Wed Nov  6 18:01:44 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 6 Nov 2019 10:01:44 -0800
Subject: RFR(XS): 8233491: Crash in AdapterHandlerLibrary::get_adapter
 with CDS due to code cache exhaustion
In-Reply-To: <c44d6d02-8244-2ef3-ac8b-ba7f12e65eb4@oracle.com>
References: <c44d6d02-8244-2ef3-ac8b-ba7f12e65eb4@oracle.com>
Message-ID: <af6e664b-8942-96b3-aba2-220679ba4b18@oracle.com>

CC to runtime group too.

Looks good to me.

Thanks,
Vladimir

On 11/6/19 5:34 AM, Tobias Hartmann wrote:
> Hi,
> 
> please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8233491
> http://cr.openjdk.java.net/~thartmann/8233491/webrev.00/
> 
> When running a stress test with CDS, we fail to create adapters when linking a method from a shared
> class because the code cache is full. This case is not properly handled by the CDS specific code and
> instead of throwing a VirtualMachineError, we crash because "entry" is NULL.
> 
> I'm able to spuriously reproduce this with a test (see [1]) but since the problem depends on the
> class loading sequence, I was not able to make it more reliable or convert it to a robust jtreg
> test. However, I've verified that the patch fixes the problem.
> 
> Thanks,
> Tobias
> 
> [1]
> https://bugs.openjdk.java.net/browse/JDK-8233491?focusedCommentId=14298462&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14298462
> 
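
The failure mode described in the quoted review request (a NULL "entry"
used without a check when the code cache is full) can be illustrated
with a small standalone sketch. This is not the patch under review:
create_adapter(), link_method() and AdapterEntry are hypothetical names,
and std::runtime_error merely stands in for raising a
VirtualMachineError, since HotSpot itself does not use C++ exceptions.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical adapter record; the real AdapterHandlerEntry is not modeled.
struct AdapterEntry { int id; };

// Hypothetical allocator: returning nullptr models "code cache is full".
AdapterEntry* create_adapter(bool cache_full) {
  static AdapterEntry entry = { 1 };
  return cache_full ? nullptr : &entry;
}

// The caller checks the result before use and reports a recoverable
// error instead of dereferencing a null pointer.
const AdapterEntry& link_method(bool cache_full) {
  AdapterEntry* entry = create_adapter(cache_full);
  if (entry == nullptr) {
    throw std::runtime_error("out of space in code cache for adapters");
  }
  return *entry;
}
```

The point is only that the exhaustion case has to be handled on every
path that creates an adapter, including the CDS-specific one.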

From ioi.lam at oracle.com  Wed Nov  6 18:12:53 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 6 Nov 2019 10:12:53 -0800
Subject: RFR(XS): 8233491: Crash in AdapterHandlerLibrary::get_adapter
 with CDS due to code cache exhaustion
In-Reply-To: <af6e664b-8942-96b3-aba2-220679ba4b18@oracle.com>
References: <c44d6d02-8244-2ef3-ac8b-ba7f12e65eb4@oracle.com>
 <af6e664b-8942-96b3-aba2-220679ba4b18@oracle.com>
Message-ID: <5800caa0-b7d4-fadd-0516-53351eb014fe@oracle.com>

Looks good to me. Thanks for fixing this. I think I introduced this bug :-(

- Ioi

On 11/6/19 10:01 AM, Vladimir Kozlov wrote:
> CC to runtime group too.
>
> Looks good to me.
>
> Thanks,
> Vladimir
>
> On 11/6/19 5:34 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8233491
>> http://cr.openjdk.java.net/~thartmann/8233491/webrev.00/
>>
>> When running a stress test with CDS, we fail to create adapters when 
>> linking a method from a shared
>> class because the code cache is full. This case is not properly 
>> handled by the CDS specific code and
>> instead of throwing a VirtualMachineError, we crash because "entry" 
>> is NULL.
>>
>> I'm able to spuriously reproduce this with a test (see [1]) but since 
>> the problem depends on the
>> class loading sequence, I was not able to make it more reliable or 
>> convert it to a robust jtreg
>> test. However, I've verified that the patch fixes the problem.
>>
>> Thanks,
>> Tobias
>>
>> [1]
>> https://bugs.openjdk.java.net/browse/JDK-8233491?focusedCommentId=14298462&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14298462 
>>
>>


From harold.seigel at oracle.com  Wed Nov  6 18:14:05 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Wed, 6 Nov 2019 13:14:05 -0500
Subject: RFR: CSR JVM support for records
In-Reply-To: <1d080dde-df36-d0b1-5e33-f439d70336e7@oracle.com>
References: <fb8c0662-78c0-c442-3fd8-db3178c27127@oracle.com>
 <1d080dde-df36-d0b1-5e33-f439d70336e7@oracle.com>
Message-ID: <6af860a2-20d4-a194-d4ad-c17e58df6ffa@oracle.com>

Hi Alex,

Thanks for your help and for reviewing it!

Note that the JVM does consistency-check the Record attribute at class
load time, not at first use by reflection. So, perhaps change this sentence:

    Note that if no reflection is performed then the abstract JVM does
    not care about the Record attribute in any way.

to something like

    The format of the record attribute is checked even if no reflection
    is performed.

Harold

On 11/6/2019 12:44 PM, Alex Buckley wrote:
> Thanks for getting the JVMS changes attached to the CSR. I did some 
> light editing on the body of the CSR as part of trying to figure out 
> what reflection methods are being supported. It would be nice if the 
> CSR compared its support for j.l.r.RecordComponent to support for 
> other APIs like j.l.r.Parameter and j.l.Class::getAnnotations which 
> also perform consistency checking only when the API is called ... but 
> I don't want to hold it up.
>
> Alex
>
> On 11/5/2019 6:27 AM, Harold Seigel wrote:
>> Hi,
>>
>> Please review this draft of the CSR for JVM support for records.
>>
>> CSR: https://bugs.openjdk.java.net/browse/JDK-8233595
>>
>> Thanks, Harold

From ioi.lam at oracle.com  Wed Nov  6 18:22:28 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 6 Nov 2019 10:22:28 -0800
Subject: RFR(trivial): 8233671: [TESTBUG]
 runtime/cds/appcds/sharedStrings/FlagCombo.java fails to compile without jfr
In-Reply-To: <b0f4c15f-0345-2054-cb03-050b22e5741b@loongson.cn>
References: <b0f4c15f-0345-2054-cb03-050b22e5741b@loongson.cn>
Message-ID: <fa4f3233-b632-cc91-a222-261296bcb047@oracle.com>

Hi Jie,

Looks good and trivial to me. Thanks for fixing this. I've sponsored the 
changes and pushed.

http://hg.openjdk.java.net/jdk/jdk/rev/38d4202154f2

- Ioi

On 11/5/19 11:24 PM, Jie Fu wrote:
> Hi all,
>
> May I get reviews for the one-line change?
>
> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233671
> Webrev: http://cr.openjdk.java.net/~jiefu/8233671/webrev.00/
>
> Thanks a lot.
> Best regards,
> Jie
>
>


From alex.buckley at oracle.com  Wed Nov  6 18:28:12 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Wed, 6 Nov 2019 10:28:12 -0800
Subject: RFR: CSR JVM support for records
In-Reply-To: <6af860a2-20d4-a194-d4ad-c17e58df6ffa@oracle.com>
References: <fb8c0662-78c0-c442-3fd8-db3178c27127@oracle.com>
 <1d080dde-df36-d0b1-5e33-f439d70336e7@oracle.com>
 <6af860a2-20d4-a194-d4ad-c17e58df6ffa@oracle.com>
Message-ID: <91e9ce5f-b636-ef6d-850c-6c9fa454946b@oracle.com>

On 11/6/2019 10:14 AM, Harold Seigel wrote:
> Note that the JVM does consistency-check the Record attribute at class
> load time, not at first use by reflection. So, perhaps change this sentence:
> 
>     Note that if no reflection is performed then the abstract JVM does
>     not care about the Record attribute in any way.
> 
> to something like
> 
>     The format of the record attribute is checked even if no reflection
>     is performed.

Given how Record is described in JVMS 4.7 ("each of these attributes 
must be recognized and correctly read by an implementation of the Java 
Virtual Machine"), it makes sense that the HotSpot JVM is format 
checking a Record attribute at load time. So, yes, please make the 
change you describe above, and please explicitly compare Record to the 
format checking performed for Exceptions, InnerClasses, etc, and 
contrast Record with the lack of format checking for MethodParameters, 
Module, etc.

Alex

From harold.seigel at oracle.com  Wed Nov  6 19:17:31 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Wed, 6 Nov 2019 14:17:31 -0500
Subject: RFR: CSR JVM support for records
In-Reply-To: <91e9ce5f-b636-ef6d-850c-6c9fa454946b@oracle.com>
References: <fb8c0662-78c0-c442-3fd8-db3178c27127@oracle.com>
 <1d080dde-df36-d0b1-5e33-f439d70336e7@oracle.com>
 <6af860a2-20d4-a194-d4ad-c17e58df6ffa@oracle.com>
 <91e9ce5f-b636-ef6d-850c-6c9fa454946b@oracle.com>
Message-ID: <fb4cbab3-2a0f-2c1b-cd2e-47cf24692f33@oracle.com>

Hi Alex,

I updated the CSR, hopefully with the info you requested.

Thanks, Harold

On 11/6/2019 1:28 PM, Alex Buckley wrote:
> On 11/6/2019 10:14 AM, Harold Seigel wrote:
>> Note that the JVM does consistency-check the Record attribute at
>> class load time, not at first use by reflection. So, perhaps change
>> this sentence:
>>
>>     Note that if no reflection is performed then the abstract JVM does
>>     not care about the Record attribute in any way.
>>
>> to something like
>>
>>     The format of the record attribute is checked even if no reflection
>>     is performed.
>
> Given how Record is described in JVMS 4.7 ("each of these attributes 
> must be recognized and correctly read by an implementation of the Java 
> Virtual Machine"), it makes sense that the HotSpot JVM is format 
> checking a Record attribute at load time. So, yes, please make the 
> change you describe above, and please explicitly compare Record to the 
> format checking performed for Exceptions, InnerClasses, etc, and 
> contrast Record with the lack of format checking for MethodParameters, 
> Module, etc.
>
> Alex

From ioi.lam at oracle.com  Wed Nov  6 21:17:03 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 6 Nov 2019 13:17:03 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
 <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
Message-ID: <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>

Hi Jiangli,

I've uploaded the webrev after integrating your comments:

http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/
http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05-delta/

Please see more replies below:


On 11/4/19 5:52 PM, Jiangli Zhou wrote:
> On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam <ioi.lam at oracle.com 
> <mailto:ioi.lam at oracle.com>> wrote:
>
>     Hi Jiangli,
>
>     Thank you so much for spending time reviewing this RFE!
>
>     On 11/3/19 6:34 PM, Jiangli Zhou wrote:
>     > Hi Ioi,
>     >
>     > Sorry for the delay again. Will try to put this on the top of my
>     list
>     > next week and reduce the turn-around time. The updates look good in
>     > general.
>     >
>     > We might want to have a better strategy when choosing metadata
>     > relocation address (when relocation is needed). Some
>     > applications/benchmarks may be more sensitive to cache locality and
>     > memory/data layout. There was a bug,
>     > https://bugs.openjdk.java.net/browse/JDK-8213713 that caused 1G gap
>     > between Java heap data and metadata before JDK 12. The gap seemed to
>     > cause a small but noticeable runtime effect in one case that I came
>     > across.
>
>     I guess you're saying we should try to relocate the archive into
>     somewhere under 32GB?
>
>
> I don't yet have sufficient data to determine whether mapping at low 32G
> produces better runtime performance. I experimented with that, but
> didn't see a noticeable difference when comparing to mapping at the
> current default address. It doesn't hurt, I think. So it may be a
> better choice than relocating to a random address in high 32G space
> (when Java heap is in low 32G address space).

Maybe we should reconsider this when we have more concrete data for the 
benefits of moving the compressed class space to under 32G.

Please note that in metaspace.cpp, when CDS is disabled and the VM
fails to allocate the class space at the requested address (0x7c000000
for 16GB heap), it also just allocates from a random address (without
trying to search under 32GB):

http://hg.openjdk.java.net/jdk/jdk/annotate/e767fa6a1d45/src/hotspot/share/memory/metaspace.cpp#l1128

This code has been there since 2013 and we have not seen any issues.
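
For illustration only, the fallback described above (try the preferred
address first, otherwise accept whatever address the OS hands out) can be
sketched as follows. This is a minimal standalone sketch, not the code in
metaspace.cpp: Reserver, Reservation and reserve_with_fallback are
hypothetical names, and the callback stands in for the real OS
reservation call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for an OS reservation call (assumption, not a
// HotSpot API). It returns the base address that was reserved, or 0 on
// failure. A requested address of 0 means "let the OS pick any location".
using Reserver = std::function<std::uint64_t(std::uint64_t requested,
                                             std::size_t size)>;

struct Reservation {
  std::uint64_t base;   // 0 if nothing could be reserved
  bool at_requested;    // true only if the preferred address was honored
};

// Try the preferred address first; if that fails, fall back to an
// OS-chosen address without searching for another "good" range.
Reservation reserve_with_fallback(const Reserver& reserve,
                                  std::uint64_t requested,
                                  std::size_t size) {
  if (requested != 0) {
    std::uint64_t base = reserve(requested, size);
    if (base != 0) {
      return {base, true};
    }
  }
  return {reserve(0, size), false};
}
```

A caller would inspect at_requested to decide whether any follow-up
work (such as relocation) is needed for the range it got back.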


>
>     Could you elaborate more about the performance issue, especially
>     about cache locality? I looked at JDK-8213713 but it didn't mention
>     performance.
>
>
> When enabling CDS we noticed a small runtime overhead in JDK 11 
> recently with a benchmark. After I backported JDK-8213713 to 11, it 
> seemed to reduce the runtime overhead that the benchmark was experiencing.
>
>
>     Also, by default, we have non-zero narrow_klass_base and
>     narrow_klass_shift = 3, and archive relocation doesn't change that:
>
>     $ java -Xlog:cds=debug -version
>     ... narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
>     $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
>     ... narrow_klass_base = 0x00007f1e8b499000, narrow_klass_shift = 3
>
>     We always use narrow_klass_shift due to this:
>
>        // CDS uses LogKlassAlignmentInBytes for narrow_klass_shift. See
>        // MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
>        // how dump time narrow_klass_shift is set. Although, CDS can work
>        // with zero-shift mode also, to be consistent with AOT it uses
>        // LogKlassAlignmentInBytes for klass shift so archived java heap objects
>        // can be used at same time as AOT code.
>        if (!UseSharedSpaces
>            && (uint64_t)(higher_address - lower_base) <= UnscaledClassSpaceMax) {
>          CompressedKlassPointers::set_shift(0);
>        } else {
>          CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
>        }
>
>
> Right. If we relocate to low 32G space, we need to make sure that the
> range containing the mapped class data and class space is encodable.
>
>
>     > Here are some additional comments (minor).
>     >
>     > Could you please fix the long lines in the following?
>     >
>     > 1237 void java_lang_Class::update_archived_primitive_mirror_native_pointers(oop archived_mirror) {
>     > 1238   if (MetaspaceShared::relocation_delta() != 0) {
>     > 1239     assert(archived_mirror->metadata_field(_klass_offset) == NULL, "must be for primitive class");
>     > 1240
>     > 1241     Klass* ak = ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
>     > 1242     if (ak != NULL) {
>     > 1243       archived_mirror->metadata_field_put(_array_klass_offset, (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
>     > 1244     }
>     > 1245   }
>     > 1246 }
>     >
>     > src/hotspot/share/memory/dynamicArchive.cpp
>     >
>     >   889   Thread* THREAD = Thread::current();
>     >   890   Method::sort_methods(ik->methods(), /*set_idnums=*/true, dynamic_dump_method_comparator);
>     >   891   if (ik->default_methods() != NULL) {
>     >   892     Method::sort_methods(ik->default_methods(), /*set_idnums=*/false, dynamic_dump_method_comparator);
>     >   893   }
>     >
>
>     OK will do.
>
>     > Please see inlined comments below.
>     >
>     > On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com
>     <mailto:ioi.lam at oracle.com>> wrote:
>     >> Hi Jiangli,
>     >>
>     >> Thanks for the review. I've updated the patch according to your
>     comments:
>     >>
>     >>
>     http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
>     >>
>     http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
>     >>
>     >> (the delta is on top of 8231610-relocate-cds-archive.v03.delta
>     in my
>     >> reply to Calvin's comments).
>     >>
>     >> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
>     >>> Hi Ioi,
>     >>>
>     >>> Sorry for the delay. Here are my remaining comments.
>     >>>
>     >>> - src/hotspot/share/memory/dynamicArchive.cpp
>     >>>
>     >>> 128   static intx _method_comparator_name_delta;
>     >>>
>     >>> The name of the above variable is confusing. It's the value of
>     >>> _buffer_to_target_delta. It's better to use _buffer_to_target_delta
>     >>> directly.
>     >> _buffer_to_target_delta is a non-static field, but
>     >> dynamic_dump_method_comparator() must be a static function so
>     it can't
>     >> use the non-static field easily.
>     >
>     > It sounds like an issue. _buffer_to_target_delta was made
>     > non-static mostly because we might support more than one dynamic
>     > archive in the future. However, today's usages bake in an assumption
>     > that _buffer_to_target_delta is a singleton value. It is cleaner to
>     > either make _buffer_to_target_delta a static variable for now, or
>     > add an accessor API in DynamicArchiveBuilder to allow other code to
>     > properly and correctly use the value.
>
>     OK, I'll move it to a static variable.
>
>     >
>     >>> Also, we can do a quick pointer comparison of 'a_name' and
>     >>> 'b_name' first before adjusting the pointers.
>     >> I added this:
>     >>
>     >>       if (a_name == b_name) {
>     >>         return 0;
>     >>       }
>     >>
>     >>> ---
>     >>>
>     >>> 934 void DynamicArchiveBuilder::relocate_buffer_to_target() {
>     >>> ...
>     >>>    944
>     >>>    945   ArchivePtrMarker::compact(relocatable_base, relocatable_end);
>     >>> ...
>     >>>
>     >>>    974     SharedDataRelocator patcher((address*)patch_base, (address*)patch_end, valid_old_base, valid_old_end,
>     >>>    975                                 valid_new_base, valid_new_end, addr_delta);
>     >>>    976     ArchivePtrMarker::ptrmap()->iterate(&patcher);
>     >>>
>     >>> Could we reduce the number of data re-iterations to help archive
>     >>> dumping performance? The ArchivePtrMarker::compact operation can be
>     >>> combined with the patching iteration. The ArchivePtrMarker::compact
>     >>> API can be removed.
>     >> That's a good idea. I implemented it using a template parameter
>     so that
>     >> we can have max performance when relocating the archive at run
>     time.
>     >>
>     >> I added comments to explain why the relocation is done here. The
>     >> relocation is pretty rare (only when the base archive was not
>     mapped at
>     >> the default location).
>     >>
>     >>> ---
>     >>>
>     >>>    967     address valid_new_base = (address)Arguments::default_SharedBaseAddress();
>     >>>    968     address valid_new_end  = valid_new_base + base_plus_top_size;
>     >>>
>     >>> The debugging only code can be included under #ifdef ASSERT.
>     >> These values are actually also used in debug logging so they
>     can't be
>     >> ifdef'ed out.
>     >>
>     >> Also, the C++ compiler is pretty good at eliding code that's not
>     >> actually used. If I comment out all the logging code in
>     >> DynamicArchiveBuilder::relocate_buffer_to_target() and
>     >> SharedDataRelocator, gcc elides all the unused fields and their
>     >> assignments. So no code is generated for this, etc.
>     >>
>     >>       address valid_new_base = (address)Arguments::default_SharedBaseAddress();
>     >>
>     >> Since #ifdef ASSERT makes the code harder to read, I think we
>     should use
>     >> it only when really necessary.
>     > It seems cleaner to get rid of these debugging only variables, by
>     > using 'relocatable_base' and
>     > '(address)Arguments::default_SharedBaseAddress()' in the logging
>     code.
>
>     SharedDataRelocator is used in three different situations. These six
>     variables (patch_base, patch_end, valid_old_base, valid_old_end,
>     valid_new_base, valid_new_end) describe what is being patched, and
>     what the expectations are, for each situation. The code will be hard
>     to understand without them.
>
>     Please note there's also logging code in the SharedDataRelocator
>     constructor that prints out these values.
>
>     I think I'll just remove the 'debug only' comment to avoid confusion.
>
>
> Ok.
>
>
>     >
>     >>> ---
>     >>>
>     >>>    993   dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
>     >>>
>     >>> We could combine the archived heap data bitmap into the new
>     region as
>     >>> well? It can be handled as a separate RFE.
>     >> I've filed https://bugs.openjdk.java.net/browse/JDK-8233093
>     >>
>     >>> - src/hotspot/share/memory/filemap.cpp
>     >>>
>     >>> 1038     if (is_static()) {
>     >>> 1039       if (errno == ENOENT) {
>     >>> 1040         // Not locating the shared archive is ok.
>     >>> 1041         fail_continue("Specified shared archive not found (%s).", _full_path);
>     >>> 1042       } else {
>     >>> 1043         fail_continue("Failed to open shared archive file (%s).",
>     >>> 1044                       os::strerror(errno));
>     >>> 1045       }
>     >>> 1046     } else {
>     >>> 1047       log_warning(cds, dynamic)("specified dynamic archive doesn't exist: %s", _full_path);
>     >>> 1048     }
>     >>>
>     >>> If the top layer is explicitly specified by the user, a warning
>     >>> does not seem to be the proper behavior if the VM fails to open
>     >>> the archive file.
>     >>>
>     >>> It might be better to handle the relocation-unrelated code in a
>     >>> separate changeset and track it with a separate RFE.
>     >> This code was moved from
>     >>
>     >>
>     http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
>     >>
>     >> so I am not changing the behavior. If you want, we can file an
>     >> RFE to change the behavior.
>     > Ok. A new RFE sounds like the right thing to re-evaluate the usage
>     > issue here. Thanks.
>
>     I created https://bugs.openjdk.java.net/browse/JDK-8233446
>
>     >>> ---
>     >>>
>     >>> 1148 void FileMapInfo::write_region(int region, char* base, size_t size,
>     >>> 1149                                 bool read_only, bool allow_exec) {
>     >>> ...
>     >>> 1154
>     >>> 1155   if (region == MetaspaceShared::bm) {
>     >>> 1156     target_base = NULL;
>     >>> 1157   } else if (DynamicDumpSharedSpaces) {
>     >>>
>     >>> It's not too clear to me how the bitmap (bm) region is handled
>     for the
>     >>> base layer and top layer. Could you please explain?
>     >> The bm region for both layers is mapped at an address picked by the OS:
>     >>
>     >> char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
>     >>     FileMapRegion* si = space_at(MetaspaceShared::bm);
>     >>     bitmap_size = si->used_aligned();
>     >>     bool read_only = true, allow_exec = false;
>     >>     char* requested_addr = NULL; // allow OS to pick any location
>     >>     char* bitmap_base = os::map_memory(_fd, _full_path, si->file_offset(),
>     >>                                        requested_addr, bitmap_size,
>     >>                                        read_only, allow_exec);
>     >>
>     > Ok, after staring at the code for a few seconds I saw that's intended.
>     > If the current region is 'bm', then the 'target_base' is NULL
>     > regardless of whether it's a static or dynamic archive. Otherwise, the
>     > 'target_base' is handled differently for the static and dynamic case.
>     > The following would be cleaner and more reliable.
>     >
>     >     char* target_base = NULL;
>     >
>     >     // The target_base is NULL for 'bm' region.
>     >     if (region != MetaspaceShared::bm) {
>     >       if (DynamicDumpSharedSpaces) {
>     >         assert(!HeapShared::is_heap_region(region), "dynamic archive doesn't support heap regions");
>     >         target_base = DynamicArchive::buffer_to_target(base);
>     >       } else {
>     >         target_base = base;
>     >       }
>     >     }
>
>     How about this?
>
>        char* target_base;
>        if (region == MetaspaceShared::bm) {
>          target_base = NULL; // always NULL for bm region.
>        } else {
>          if (DynamicDumpSharedSpaces) {
>            assert(!HeapShared::is_heap_region(region), "dynamic archive doesn't support heap regions");
>            target_base = DynamicArchive::buffer_to_target(base);
>          } else {
>            target_base = base;
>          }
>        }
>
>
> No objection if you prefer the extra 'else' block.
>
>
>     >
>     >>> ---
>     >>>
>     >>> 1362   DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>     >>>
>     >>> Could you please explain the above?
>     >> I added the comments
>     >>
>     >>     // Make sure we don't attempt to use header()->mapped_base_address() unless
>     >>     // it's been successfully mapped.
>     >>     DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>     >>
>     >>> ---
>     >>>
>     >>> 1359   FileMapRegion* last_region = NULL;
>     >>>
>     >>> 1371     if (last_region != NULL) {
>     >>> 1372       // Ensure that the OS won't be able to allocate new memory spaces between any mapped
>     >>> 1373       // regions, or else it would mess up the simple comparison in MetaspaceObj::is_shared().
>     >>> 1374       assert(si->mapped_base() == last_region->mapped_end(), "must have no gaps");
>     >>>
>     >>> 1379     last_region = si;
>     >>>
>     >>> Can you please place 'last_region' related code under #ifdef
>     ASSERT?
>     >> I think that will make the code more cluttered. The compiler will
>     >> optimize that away.
>     > It's cleaner to define debugging-only variables for debugging-only
>     > builds. You can wrap it and related usage with DEBUG_ONLY.
>
>     OK, will do.
>
>     >
>     >>> ---
>     >>>
>     >>> 1478 char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
>     >>> 1479   FileMapRegion* si = space_at(MetaspaceShared::bm);
>     >>> 1480   bitmap_size = si->used_aligned();
>     >>> 1481   bool read_only = true, allow_exec = false;
>     >>> 1482   char* requested_addr = NULL; // allow OS to pick any location
>     >>> 1483   char* bitmap_base = os::map_memory(_fd, _full_path, si->file_offset(),
>     >>> 1484                                      requested_addr, bitmap_size, read_only, allow_exec);
>     >>>
>     >>> We need to handle mapping failure here.
>     >> It's handled here:
>     >>
>     >> bool FileMapInfo::relocate_pointers(intx addr_delta) {
>     >>     log_debug(cds, reloc)("runtime archive relocation start");
>     >>     size_t bitmap_size;
>     >>     char* bitmap_base = map_relocation_bitmap(bitmap_size);
>     >>     if (bitmap_base != NULL) {
>     >>     ...
>     >>     } else {
>     >>       log_error(cds)("failed to map relocation bitmap");
>     >>       return false;
>     >>     }
>     >>
>     > 'bitmap_base' is used immediately after map_memory(). So the check
>     > needs to be done immediately after map_memory(), but not in the
>     caller
>     > of map_relocation_bitmap().
>     >
>     > 1490   char* bitmap_base = os::map_memory(_fd, _full_path, si->file_offset(),
>     > 1491                                      requested_addr, bitmap_size, read_only, allow_exec);
>     > 1492
>     > 1493   if (VerifySharedSpaces && bitmap_base != NULL && !region_crc_check(bitmap_base, bitmap_size, si->crc())) {
>
>     OK, I'll fix that.
>
>     >
>     >
>     >>> ---
>     >>>
>     >>> 1513     // debug only -- the current value of the pointers to be patched must be within this
>     >>> 1514     // range (i.e., must be between the requested base address, and the end of the current archive).
>     >>> 1515     // Note: top archive may point to objects in the base archive, but not the other way around.
>     >>> 1516     address valid_old_base = (address)header()->requested_base_address();
>     >>> 1517     address valid_old_end  = valid_old_base + mapping_end_offset();
>     >>>
>     >>> Please place all FileMapInfo::relocate_pointers debugging only
>     code
>     >>> under #ifdef ASSERT.
>     >> Ditto about ifdef ASSERT
>     >>
>     >>> - src/hotspot/share/memory/heapShared.cpp
>     >>>
>     >>>    441 void HeapShared::initialize_from_archived_subgraph(Klass* k) {
>     >>>    442   if (!open_archive_heap_region_mapped() || !MetaspaceObj::is_shared(k)) {
>     >>>    443     return; // nothing to do
>     >>>    444   }
>     >>>
>     >>> When do we call HeapShared::initialize_from_archived_subgraph
>     for a
>     >>> klass that's not shared?
>     >> I've removed the !MetaspaceObj::is_shared(k). I probably added
>     that for
>     >> debugging purposes only.
>     >>
>     >>>    616   DEBUG_ONLY({
>     >>>    617       Klass* klass = orig_obj->klass();
>     >>>    618       assert(klass != SystemDictionary::Module_klass() &&
>     >>>    619              klass != SystemDictionary::ResolvedMethodName_klass() &&
>     >>>    620              klass != SystemDictionary::MemberName_klass() &&
>     >>>    621              klass != SystemDictionary::Context_klass() &&
>     >>>    622              klass != SystemDictionary::ClassLoader_klass(), "we can only relocate metaspace object pointers inside java_lang_Class instances");
>     >>>    623     });
>     >>>
>     >>> Let's leave the above for a separate RFE. I think assert is not
>     >>> sufficient for the check. Also, why ResolvedMethodName, Module and
>     >>> MemberName cannot be part of the graph?
>     >>>
>     >>>
>     >> I added the following comment:
>     >>
>     >>     DEBUG_ONLY({
>     >>         // The following are classes in share/classfile/javaClasses.cpp that have injected native pointers
>     >>         // to metaspace objects. To support these classes, we need to add relocation code similar to
>     >>         // java_lang_Class::update_archived_mirror_native_pointers.
>     >>         Klass* klass = orig_obj->klass();
>     >>         assert(klass != SystemDictionary::Module_klass() &&
>     >>                klass != SystemDictionary::ResolvedMethodName_klass() &&
>     >>
>     > It's too restrictive to exclude those objects from the archived object
>     > graph because of metadata relocation, since metadata relocation is rare.
>     > The trade-off doesn't seem to buy us much.
>     >
>     > Do you plan to add the needed relocation code?
>
>     I looked more into this. Actually we cannot handle these 5 classes at
>     all, even without archive relocation:
>
>     [1] #define MODULE_INJECTED_FIELDS(macro) \
>        macro(java_lang_Module, module_entry, intptr_signature, false)
>
>     -> module_entry is malloc'ed
>
>     [2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
>        macro(java_lang_invoke_ResolvedMethodName, vmholder, object_signature, false) \
>        macro(java_lang_invoke_ResolvedMethodName, vmtarget, intptr_signature, false)
>
>     -> these fields are related to method handles and lambda forms, etc.
>     They can't easily be archived without implementing lambda form
>     archiving. (I did a prototype; it's very complex and fragile).
>
>     [3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
>        macro(java_lang_invoke_MethodHandleNatives_CallSiteContext, vmdependencies, intptr_signature, false) \
>        macro(java_lang_invoke_MethodHandleNatives_CallSiteContext, last_cleanup, long_signature, false)
>
>     -> vmdependencies is malloc'ed.
>
>     [4] #define MEMBERNAME_INJECTED_FIELDS(macro) \
>        macro(java_lang_invoke_MemberName, vmindex, intptr_signature, false)
>
>     -> this one is probably OK. Despite being declared as
>     'intptr_signature', it seems to be used just as an integer. However,
>     MemberNames are typically used with [2] and [3]. So let's just forbid
>     it to be safe.
>
>     [2] [3] [4] are not used directly by regular Java code and are unlikely
>     to be referenced (directly or indirectly) by static fields (except for
>     the static fields in the classes in java.lang.invoke, which we probably
>     won't support for heap archiving due to the problem I described for
>     [2]). Objects of these types are typically referenced via constant pool
>     entries.
>
>     [5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
>        macro(java_lang_ClassLoader, loader_data, intptr_signature, false)
>
>     -> loader_data is malloc'ed.
>
>     So, I will change the DEBUG_ONLY into a product-mode check, and quit
>     dumping if these objects are found in the object subgraph.
>
>
> Sounds good. Can you please also add a comment with explanation.
>
> For ClassLoader and Module, it is worth considering caching the
> additional native data some time in the future. Lois had suggested the
> Module part a while ago.

I think we can do that if/when we archive Modules directly into the 
shared heap.


>
>
>
>
>
>     Maybe we should backport the check to older versions as well?
>
>
> We should discuss with Andrew Haley for backports to JDK 11 update 
> releases. Since the current OpenJDK 11 only applies Java heap 
> archiving to a restricted set of JDK library code, I think it is safe 
> without the new check.
>
> For non-LTS releases, it might not be worthwhile as they may not be 
> widely used?

I agree. FYI, we (Oracle) have no plan for backporting more types of
heap object archiving, so the decision would be up to whoever decides
to do so.

Thanks
- Ioi


>
> Thanks,
> Jiangli
>
>
>     >
>     >>> - src/hotspot/share/memory/metaspace.cpp
>     >>>
>     >>> 1036   metaspace_rs = ReservedSpace(compressed_class_space_size(),
>     >>> 1037   _reserve_alignment,
>     >>> 1038   large_pages,
>     >>> 1039   requested_addr);
>     >>>
>     >>> Please fix indentation.
>     >> Fixed.
>     >>
>     >>> - src/hotspot/share/memory/metaspaceClosure.hpp
>     >>>
>     >>>     78   enum SpecialRef {
>     >>>     79     _method_entry_ref
>     >>>     80   };
>     >>>
>     >>> Are there other pointers that are not references to
>     MetaspaceObj? If
>     >>> _method_entry_ref is the only type, it's probably not worth
>     defining
>     >>> SpecialRef?
>     >> There may be more types in the future, so I want to have a
>     stable API
>     >> that can be easily expanded without touching all the code that
>     uses it.
>     >>
>     >>
>     >>> - src/hotspot/share/memory/metaspaceShared.hpp
>     >>>
>     >>>     42 enum MapArchiveResult {
>     >>>     43   MAP_ARCHIVE_SUCCESS,
>     >>>     44   MAP_ARCHIVE_MMAP_FAILURE,
>     >>>     45   MAP_ARCHIVE_OTHER_FAILURE
>     >>>     46 };
>     >>>
>     >>> If we want to define different failure types, it's probably worth
>     >>> using separate types for relocation failure and validation
>     failure.
>     >> For now, I just need to distinguish between MMAP_FAILURE (where
>     I should
>     >> attempt to remap at an alternative address) and OTHER_FAILURE
>     (where the
>     >> CDS archive loading will fail -- due to validation error,
>     insufficient
>     >> memory, etc -- without attempting to remap.)
>     >>
>     >>> ---
>     >>>
>     >>>    193   static intx _mapping_delta; // FIXME rename
>     >>>
>     >>> How about _relocation_delta?
>     >> Changed as suggested.
>     >>
>     >>> - src/hotspot/share/oops/instanceKlass
>     >>>
>     >>> 1573 bool InstanceKlass::_disable_method_binary_search = false;
>     >>>
>     >>> The use of _disable_method_binary_search is not necessary. You
>     can use
>     >>> DynamicDumpSharedSpaces for the purpose. That would make things
>     >>> cleaner.
>     >> If we always disable the binary search when
>     DynamicDumpSharedSpaces is
>     >> true, it will slow down normal execution of the Java program when
>     >> -XX:ArchiveClassesAtExit has been specified, but the program
>     hasn't exited.
>     > Could you please add some comments to _disable_method_binary_search
>     > with the above explanation? Thanks.
>
>     OK
>     >
>     >>> - test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
>     >>>
>     >>>     76                     if (name.equals("s0") || name.equals("s1")) {
>     >>>     77                       // String regions are listed at the end and they may not be fully occupied.
>     >>>     78                       break;
>     >>>     79                     } else if (name.equals("bm")) {
>     >>>     80                       // Bitmap space does not have a requested address.
>     >>>     81                       break;
>     >>>
>     >>> It's not part of your change, but could you please fix line 76
>     - 78
>     >>> since it is trivial. It seems the lines can be removed.
>     >> Removed.
>     >>
>     >>> - /src/hotspot/share/memory/archiveUtils.hpp
>     >>> The file name does not match with the macro '#ifndef
>     >>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please rename
>     >>> archiveUtils.* ? archiveRelocator.hpp and archiveRelocator.cpp are
>     >>> more descriptive.
>     >> I named the file archiveUtils.hpp so we can move other misc
>     stuff used
>     >> by dumping into this file (e.g., DumpRegion, WriteClosure from
>     >> metaspaceShared.hpp), since these are not used by the majority
>     of the
>     >> files that use metaspaceShared.hpp.
>     >>
>     >> I fixed the ifdef.
>     >>
>     >>> - src/hotspot/share/memory/archiveUtils.cpp
>     >>>
>     >>>     36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap, address* ptr_base, address* ptr_end) {
>     >>>     37   assert(_ptrmap == NULL, "initialize only once");
>     >>>     38   _ptr_base = ptr_base;
>     >>>     39   _ptr_end = ptr_end;
>     >>>     40   _compacted = false;
>     >>>     41   _ptrmap = ptrmap;
>     >>>     42   _ptrmap->initialize(12 * M / sizeof(intptr_t)); // default archive is about 12MB.
>     >>>     43 }
>     >>>
>     >>> Could we do a better estimate here? We could guesstimate the size
>     >>> based on the current used class space and metaspace size. It's
>     okay if
>     >>> a larger bitmap is used, since it can be reduced after all
>     marking are
>     >>> done.
>     >> The bitmap is automatically expanded when necessary in
>     >> ArchivePtrMarker::mark_pointer(). It's only about 1/32 or 1/64
>     of the
>     >> total archive size, so even if we do expand, the cost will be
>     trivial.
>     > The initial value is based on the default CDS archive. When dealing
>     > with a really large archive, it would have to re-grow many times.
>     > Also, using a hard-coded value is less desirable.
>
>     OK, I changed it to the following
>
>        // Use this as initial guesstimate. We should need less space in the
>        // archive, but if we're wrong the bitmap will be expanded automatically.
>        size_t estimated_archive_size = MetaspaceGC::capacity_until_GC();
>        // But set it smaller in debug builds so we always test the expansion code.
>        // (Default archive is about 12MB).
>        DEBUG_ONLY(estimated_archive_size = 6 * M);
>
>        // We need one bit per pointer in the archive.
>        _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
>
>
>     Thanks!
>     - Ioi
>
>     >
>     >>>
>     >>>
>     >>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou <jianglizhou at google.com> wrote:
>     >>>> Hi Ioi,
>     >>>>
>     >>>> This is another great step for CDS usability improvement.
>     Thank you!
>     >>>>
>     >>>> I have a high level question (or request): could we consider
>     >>>> separating the relocation work for 'direct' class metadata
>     from other
>     >>>> types of metadata (such as the shared system dictionary,
>     symbol table,
>     >>>> etc)? Initially we only relocate the tables and other
>     archived global
>     >>>> data. When each archived class is being loaded, we can
>     relocate all
>     >>>> the pointers within the current class. We could find the
>     segment (for
>     >>>> the current class) in the bitmap and update the pointers
>     within the
>     >>>> segment. That way we can reduce initial startup costs and
>     also avoid
>     >>>> relocating class data that's not used at runtime. In some
>     real world
>     >>>> large systems, an archive may contain an extremely large number of
>     >>>> classes.
>     >>>>
>     >>>> Following are partial review comments so we can move things
>     forward.
>     >>>> Still going through the rest of the changes.
>     >>>>
>     >>>> - src/hotspot/share/classfile/javaClasses.cpp
>     >>>>
>     >>>> 1218 void java_lang_Class::update_archived_mirror_native_pointers(oop archived_mirror) {
>     >>>> 1219   Klass* k = ((Klass*)archived_mirror->metadata_field(_klass_offset));
>     >>>> 1220   if (k != NULL) { // k is NULL for the primitive classes such as java.lang.Byte::TYPE <<<<<<<<<<<
>     >>>> 1221     archived_mirror->metadata_field_put(_klass_offset, (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
>     >>>> 1222   }
>     >>>> 1223 ...
>     >>>>
>     >>>> Primitive type mirrors are handled separately. Could you
>     please verify
>     >>>> if this call path happens for primitive type mirror?
>     >>>>
>     >>>> To answer my question above, looks like you added the
>     following, which
>     >>>> is to be used for primitive type mirrors. That seems to be
>     the reason
>     >>>> why update_archived_mirror_native_pointers is trying to also
>     cover
>     >>>> primitive type. It is better to have a separate API for
>     primitive type
>     >>>> mirror, which is cleaner. And, we also can replace the above
>     check at
>     >>>> line 1220 to be an assert for regular mirrors.
>     >>>>
>     >>>> +void ReadClosure::do_mirror_oop(oop *p) {
>     >>>> +  do_oop(p);
>     >>>> +  oop mirror = *p;
>     >>>> +  if (mirror != NULL) {
>     >>>> +    java_lang_Class::update_archived_mirror_native_pointers(mirror);
>     >>>> +  }
>     >>>> +}
>     >>>> +
>     >>>>
>     >>>> How about renaming update_archived_mirror_native_pointers to
>     >>>> update_archived_mirror_klass_pointers.
>     >>>>
>     >>>> It would be good to pass the current klass as an argument. We can
>     >>>> verify the relocated pointer matches with the current klass
>     pointer.
>     >>>>
>     >>>> We should also check if relocation is necessary before
>     spending cycles
>     >>>> to obtain the klass pointer from the mirror.
>     >>>>
>     >>>> 1252   update_archived_mirror_native_pointers(m);
>     >>>> 1253
>     >>>> 1254   // mirror is archived, restore
>     >>>> 1255   assert(HeapShared::is_archived_object(m), "must be archived mirror object");
>     >>>> 1256   Handle mirror(THREAD, m);
>     >>>>
>     >>>> Could we move the line at 1252 after the assert at line 1255?
>     >>>>
>     >>>> - src/hotspot/share/include/cds.h
>     >>>>
>     >>>>     47   int     _mapped_from_file;  // Is this region mapped from a file?
>     >>>>     48                               // If false, this region was initialized using os::read().
>     >>>>
>     >>>> Is the new field truly needed? It seems we could use
>     _mapped_base to
>     >>>> determine if a region is mapped or not?
>     >>>>
>     >>>> - src/hotspot/share/memory/dynamicArchive.cpp
>     >>>>
>     >>>> Could you please remove the debugging print code in
>     >>>> dynamic_dump_method_comparator? Or convert those to logging
>     output if
>     >>>> they are helpful.
>     >>>>
>     >>>> Will send out the rest of the review comments later.
>     >>>>
>     >>>> Best,
>     >>>>
>     >>>> Jiangli
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>> On Thu, Oct 10, 2019 at 6:00 PM Ioi Lam <ioi.lam at oracle.com
>     <mailto:ioi.lam at oracle.com>> wrote:
>     >>>>> Bug:
>     >>>>> https://bugs.openjdk.java.net/browse/JDK-8231610
>     >>>>>
>     >>>>> Webrev:
>     >>>>>
>     http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v01/
>     >>>>>
>     >>>>> Design:
>     >>>>>
>     http://cr.openjdk.java.net/~iklam/jdk14/design/8231610-relocate-cds-archive.txt
>     >>>>>
>     >>>>>
>     >>>>> Overview:
>     >>>>>
>     >>>>> The CDS archive is mmaped to a fixed address range (starting at
>     >>>>> SharedBaseAddress, usually 0x800000000). Previously, if this
>     >>>>> requested address range is not available (usually due to Address
>     >>>>> Space Layout Randomization (ASLR) [2]), the JVM will give up and
>     >>>>> will load classes dynamically using class files.
>     >>>>>
>     >>>>> [a] This causes slow down in JVM start-up.
>     >>>>> [b] Handling of mapping failures causes unnecessary complication in
>     >>>>>     the CDS tests.
>     >>>>>
>     >>>>> Here are some preliminary benchmarking results (using
>     default CDS archive,
>     >>>>> running helloworld):
>     >>>>>
>     >>>>> (a) 47.1ms (CDS enabled, mapped at requested addr)
>     >>>>> (b) 53.8ms (CDS enabled, mapped at alternate addr)
>     >>>>> (c) 86.2ms (CDS disabled)
>     >>>>>
>     >>>>> The small degradation in (b) is caused by the relocation of
>     >>>>> absolute pointers embedded in the CDS archive. However, it is
>     >>>>> still a big improvement over case (c)
>     >>>>>
>     >>>>> Please see the design doc (link above) for details.
>     >>>>>
>     >>>>> Thanks
>     >>>>> - Ioi
>     >>>>>
>
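
For readers following the relocation discussion in this thread: the idea of
keeping one bit per pointer-sized slot in the archive, and adding a delta to
every marked slot when the archive ends up at an alternate address, can be
sketched roughly as below. This is only an illustration under stated
assumptions, not the code from the webrev. The names PtrBitmap, mark_pointer
and relocate are hypothetical; the actual change uses CHeapBitMap and
ArchivePtrMarker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the pointer bitmap: one bit per pointer-sized slot.
class PtrBitmap {
  std::vector<bool> _bits;
 public:
  explicit PtrBitmap(std::size_t initial_slots) : _bits(initial_slots, false) {}

  // Mark slot `idx` as holding a pointer. If the initial size estimate was
  // too small, the bitmap grows automatically.
  void mark_pointer(std::size_t idx) {
    if (idx >= _bits.size()) {
      _bits.resize((idx + 1) * 2, false);
    }
    assert(idx < _bits.size());
    _bits[idx] = true;
  }

  bool is_marked(std::size_t idx) const {
    return idx < _bits.size() && _bits[idx];
  }

  std::size_t size() const { return _bits.size(); }
};

// Add `delta` to every marked slot and leave unmarked slots (plain data)
// untouched. Returns the number of slots that were relocated.
inline std::size_t relocate(std::vector<std::uintptr_t>& slots,
                            const PtrBitmap& map,
                            std::intptr_t delta) {
  std::size_t relocated = 0;
  for (std::size_t i = 0; i < slots.size(); i++) {
    if (map.is_marked(i)) {
      // Unsigned arithmetic wraps, so a negative delta also works.
      slots[i] = slots[i] + static_cast<std::uintptr_t>(delta);
      relocated++;
    }
  }
  return relocated;
}
```

Only marked slots are touched, which is why non-pointer data that happens to
look like an address is left alone.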


From alex.buckley at oracle.com  Wed Nov  6 23:57:46 2019
From: alex.buckley at oracle.com (Alex Buckley)
Date: Wed, 6 Nov 2019 15:57:46 -0800
Subject: RFR: CSR JVM support for records
In-Reply-To: <fb4cbab3-2a0f-2c1b-cd2e-47cf24692f33@oracle.com>
References: <fb8c0662-78c0-c442-3fd8-db3178c27127@oracle.com>
 <1d080dde-df36-d0b1-5e33-f439d70336e7@oracle.com>
 <6af860a2-20d4-a194-d4ad-c17e58df6ffa@oracle.com>
 <91e9ce5f-b636-ef6d-850c-6c9fa454946b@oracle.com>
 <fb4cbab3-2a0f-2c1b-cd2e-47cf24692f33@oracle.com>
Message-ID: <bfbc0759-17d6-e909-9047-2781f6f09d32@oracle.com>

Thanks for updating. I made small edits (assuming that "The format of 
the record attribute is consistently checked" is meant to refer to 
consistency checking a.k.a. format checking, and not to checking being 
performed on a consistent basis) ... a CSR isn't really the place to 
suggest that a Record attribute could potentially be useful for X or Y, 
but it's time to move on.

Alex

On 11/6/2019 11:17 AM, Harold Seigel wrote:
> Hi Alex,
> 
> I updated the CSR, hopefully with the info you requested.
> 
> Thanks, Harold
> 
> On 11/6/2019 1:28 PM, Alex Buckley wrote:
>> On 11/6/2019 10:14 AM, Harold Seigel wrote:
>>> Note that the JVM does consistency check the Record attribute at 
>>> class load time, not at first use by reflection. So, perhaps this 
>>> sentence:
>>>
>>>     Note that if no reflection is performed then the abstract JVM does
>>>     not care about the Record attribute in any way.
>>>
>>> could be changed to something like
>>>
>>>     The format of the record attribute is checked even if no reflection
>>>     is performed.
>>
>> Given how Record is described in JVMS 4.7 ("each of these attributes 
>> must be recognized and correctly read by an implementation of the Java 
>> Virtual Machine"), it makes sense that the HotSpot JVM is format 
>> checking a Record attribute at load time. So, yes, please make the 
>> change you describe above, and please explicitly compare Record to the 
>> format checking performed for Exceptions, InnerClasses, etc, and 
>> contrast Record with the lack of format checking for MethodParameters, 
>> Module, etc.
>>
>> Alex

From david.holmes at oracle.com  Thu Nov  7 00:03:39 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 7 Nov 2019 10:03:39 +1000
Subject: RFR: 8233454: Test fails with assert(!is_init_completed(),
 "should only happen during init") after JDK-8229516
In-Reply-To: <90322374-4ae4-15c5-96f1-a2781b7e35b5@oracle.com>
References: <ac7a98cf-6730-bbd2-aa52-7bb972a37873@loongson.cn>
 <e478a4a2-7105-8de1-d72e-0ce10d8d34ae@oracle.com>
 <ff418d45-a848-e28e-1716-a777fd9ed5a3@oracle.com>
 <a705e140-8a6f-3178-b229-88325a8e2584@loongson.cn>
 <90322374-4ae4-15c5-96f1-a2781b7e35b5@oracle.com>
Message-ID: <8a29b793-89cf-8a79-3e30-1142f829feff@oracle.com>

Ping! Need a Reviewer please:

http://cr.openjdk.java.net/~dholmes/8233454/webrev/

Thanks,
David

On 5/11/2019 2:52 pm, David Holmes wrote:
> Hi Jie,
> 
> On 5/11/2019 12:49 pm, Jie Fu wrote:
>> Hi David,
>>
>> I had tested your patch (without the Shenandoah fix) on VMs 
>> with/without the JFR feature and both of them had passed for the 
>> particular reproducer.
>> So thanks again for fixing it in the shared runtime code.
> 
> Thanks for verifying that. My own testing has also been good so far.
> 
> Just need an official Reviewer now.
> 
> Thanks again,
> David
> -----
> 
>> Best regards,
>> Jie
>>
>> On 2019/11/5 7:26, David Holmes wrote:
>>> Hi Jie,
>>>
>>> Thanks for filing this and attempting a fix. As per the bug report 
>>> the underlying issue has now been fixed in Shenandoah, but I want to 
>>> make the interrupt code more resilient as well:
>>>
>>> http://cr.openjdk.java.net/~dholmes/8233454/webrev/
>>>
>>> I was unable to reproduce the Shenandoah crash so if you could test 
>>> this patch I would appreciate it - thanks. (Without the Shenandoah 
>>> fix of course :) )
>>>
>>> Meanwhile I'm putting the patch through other testing.
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>> On 4/11/2019 11:13 pm, David Holmes wrote:
>>>> Hi Jie,
>>>>
>>>> I will need to take a deeper look at this. This is a problem 
>>>> specific to Shenandoah GC as it is triggering a sleep whilst a thread 
>>>> is still in the process of attaching to the JVM :(
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 4/11/2019 7:16 pm, Jie Fu wrote:
>>>>> Hi all,
>>>>>
>>>>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233454
>>>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233454/webrev.00/
>>>>>
>>>>> According to the comment [1], the assert seems to miss the case for 
>>>>> threads attached via JNI.
>>>>> For more info, please refer to the JBS.
>>>>>
>>>>> Could you please review it and give me some advice?
>>>>>
>>>>> Thanks a lot.
>>>>> Best regards,
>>>>> Jie
>>>>>
>>>>> [1] 
>>>>> http://hg.openjdk.java.net/jdk/jdk/file/2700c409ff10/src/hotspot/share/runtime/thread.hpp#l1249 
>>>>>
>>>>>
>>>>>
>>

From daniel.daugherty at oracle.com  Thu Nov  7 00:23:10 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Wed, 6 Nov 2019 19:23:10 -0500
Subject: RFR: 8233454: Test fails with assert(!is_init_completed(),
 "should only happen during init") after JDK-8229516
In-Reply-To: <8a29b793-89cf-8a79-3e30-1142f829feff@oracle.com>
References: <ac7a98cf-6730-bbd2-aa52-7bb972a37873@loongson.cn>
 <e478a4a2-7105-8de1-d72e-0ce10d8d34ae@oracle.com>
 <ff418d45-a848-e28e-1716-a777fd9ed5a3@oracle.com>
 <a705e140-8a6f-3178-b229-88325a8e2584@loongson.cn>
 <90322374-4ae4-15c5-96f1-a2781b7e35b5@oracle.com>
 <8a29b793-89cf-8a79-3e30-1142f829feff@oracle.com>
Message-ID: <207ee953-6365-8d8b-1912-912df3588df8@oracle.com>

On 11/6/19 7:03 PM, David Holmes wrote:
> Ping! Need a Reviewer please:
>
> http://cr.openjdk.java.net/~dholmes/8233454/webrev/

src/hotspot/share/classfile/javaClasses.cpp
     No comments.

src/hotspot/share/runtime/thread.cpp
     No comments.

Thumbs up.

Dan


>
> Thanks,
> David
>
> On 5/11/2019 2:52 pm, David Holmes wrote:
>> Hi Jie,
>>
>> On 5/11/2019 12:49 pm, Jie Fu wrote:
>>> Hi David,
>>>
>>> I had tested your patch (without the Shenandoah fix) on VMs 
>>> with/without the JFR feature and both of them had passed for the 
>>> particular reproducer.
>>> So thanks again for fixing it in the shared runtime code.
>>
>> Thanks for verifying that. My own testing has also been good so far.
>>
>> Just need an official Reviewer now.
>>
>> Thanks again,
>> David
>> -----
>>
>>> Best regards,
>>> Jie
>>>
>>> On 2019/11/5 7:26, David Holmes wrote:
>>>> Hi Jie,
>>>>
>>>> Thanks for filing this and attempting a fix. As per the bug report 
>>>> the underlying issue has now been fixed in Shenandoah, but I want 
>>>> to make the interrupt code more resilient as well:
>>>>
>>>> http://cr.openjdk.java.net/~dholmes/8233454/webrev/
>>>>
>>>> I was unable to reproduce the Shenandoah crash so if you could test 
>>>> this patch I would appreciate it - thanks. (Without the Shenandoah 
>>>> fix of course :) )
>>>>
>>>> Meanwhile I'm putting the patch through other testing.
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>> On 4/11/2019 11:13 pm, David Holmes wrote:
>>>>> Hi Jie,
>>>>>
>>>>> I will need to take a deeper look at this. This is a problem 
>>>>> specific to Shenandoah GC as it is triggering a sleep whilst a 
>>>>> thread is still in the process of attaching to the JVM :(
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 4/11/2019 7:16 pm, Jie Fu wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233454
>>>>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233454/webrev.00/
>>>>>>
>>>>>> According to the comment [1], the assert seems to miss the case 
>>>>>> for threads attached via JNI.
>>>>>> For more info, please refer to the JBS.
>>>>>>
>>>>>> Could you please review it and give me some advice?
>>>>>>
>>>>>> Thanks a lot.
>>>>>> Best regards,
>>>>>> Jie
>>>>>>
>>>>>> [1] 
>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/2700c409ff10/src/hotspot/share/runtime/thread.hpp#l1249 
>>>>>>
>>>>>>
>>>>>>
>>>


From david.holmes at oracle.com  Thu Nov  7 00:24:46 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 7 Nov 2019 10:24:46 +1000
Subject: RFR: 8233454: Test fails with assert(!is_init_completed(),
 "should only happen during init") after JDK-8229516
In-Reply-To: <207ee953-6365-8d8b-1912-912df3588df8@oracle.com>
References: <ac7a98cf-6730-bbd2-aa52-7bb972a37873@loongson.cn>
 <e478a4a2-7105-8de1-d72e-0ce10d8d34ae@oracle.com>
 <ff418d45-a848-e28e-1716-a777fd9ed5a3@oracle.com>
 <a705e140-8a6f-3178-b229-88325a8e2584@loongson.cn>
 <90322374-4ae4-15c5-96f1-a2781b7e35b5@oracle.com>
 <8a29b793-89cf-8a79-3e30-1142f829feff@oracle.com>
 <207ee953-6365-8d8b-1912-912df3588df8@oracle.com>
Message-ID: <daae0155-8579-4819-48d4-962d2be79126@oracle.com>

Thanks Dan! Much appreciated.

David

On 7/11/2019 10:23 am, Daniel D. Daugherty wrote:
> On 11/6/19 7:03 PM, David Holmes wrote:
>> Ping! Need a Reviewer please:
>>
>> http://cr.openjdk.java.net/~dholmes/8233454/webrev/
> 
> src/hotspot/share/classfile/javaClasses.cpp
>      No comments.
> 
> src/hotspot/share/runtime/thread.cpp
>      No comments.
> 
> Thumbs up.
> 
> Dan
> 
> 
>>
>> Thanks,
>> David
>>
>> On 5/11/2019 2:52 pm, David Holmes wrote:
>>> Hi Jie,
>>>
>>> On 5/11/2019 12:49 pm, Jie Fu wrote:
>>>> Hi David,
>>>>
>>>> I had tested your patch (without the Shenandoah fix) on VMs 
>>>> with/without the JFR feature and both of them had passed for the 
>>>> particular reproducer.
>>>> So thanks again for fixing it in the shared runtime code.
>>>
>>> Thanks for verifying that. My own testing has also been good so far.
>>>
>>> Just need an official Reviewer now.
>>>
>>> Thanks again,
>>> David
>>> -----
>>>
>>>> Best regards,
>>>> Jie
>>>>
>>>> On 2019/11/5 7:26, David Holmes wrote:
>>>>> Hi Jie,
>>>>>
>>>>> Thanks for filing this and attempting a fix. As per the bug report 
>>>>> the underlying issue has now been fixed in Shenandoah, but I want 
>>>>> to make the interrupt code more resilient as well:
>>>>>
>>>>> http://cr.openjdk.java.net/~dholmes/8233454/webrev/
>>>>>
>>>>> I was unable to reproduce the Shenandoah crash so if you could test 
>>>>> this patch I would appreciate it - thanks. (Without the Shenandoah 
>>>>> fix of course :) )
>>>>>
>>>>> Meanwhile I'm putting the patch through other testing.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>> On 4/11/2019 11:13 pm, David Holmes wrote:
>>>>>> Hi Jie,
>>>>>>
>>>>>> I will need to take a deeper look at this. This is a problem 
>>>>>> specific to Shenandoah GC as it is triggering a sleep whilst a 
>>>>>> thread is still in the process of attaching to the JVM :(
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> On 4/11/2019 7:16 pm, Jie Fu wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233454
>>>>>>> Webrev: http://cr.openjdk.java.net/~jiefu/8233454/webrev.00/
>>>>>>>
>>>>>>> According to the comment [1], the assert seems to miss the case 
>>>>>>> for threads attached via JNI.
>>>>>>> For more info, please refer to the JBS.
>>>>>>>
>>>>>>> Could you please review it and give me some advice?
>>>>>>>
>>>>>>> Thanks a lot.
>>>>>>> Best regards,
>>>>>>> Jie
>>>>>>>
>>>>>>> [1] 
>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/2700c409ff10/src/hotspot/share/runtime/thread.hpp#l1249 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>
> 

From fujie at loongson.cn  Thu Nov  7 00:42:36 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 7 Nov 2019 08:42:36 +0800
Subject: RFR(trivial): 8233671: [TESTBUG]
 runtime/cds/appcds/sharedStrings/FlagCombo.java fails to compile without jfr
In-Reply-To: <fa4f3233-b632-cc91-a222-261296bcb047@oracle.com>
References: <b0f4c15f-0345-2054-cb03-050b22e5741b@loongson.cn>
 <fa4f3233-b632-cc91-a222-261296bcb047@oracle.com>
Message-ID: <f061a242-bbf6-25bb-b8bb-a135e095e03d@loongson.cn>

Thank you so much, Ioi.

On 2019/11/7 2:22, Ioi Lam wrote:
> Hi Jie,
>
> Looks good and trivial to me. Thanks for fixing this. I've sponsored 
> the changes and pushed.
>
> http://hg.openjdk.java.net/jdk/jdk/rev/38d4202154f2
>
> - Ioi
>
> On 11/5/19 11:24 PM, Jie Fu wrote:
>> Hi all,
>>
>> May I get reviews for the one-line change?
>>
>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8233671
>> Webrev: http://cr.openjdk.java.net/~jiefu/8233671/webrev.00/
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>>
>


From felix.yang at huawei.com  Thu Nov  7 01:17:05 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Thu, 7 Nov 2019 01:17:05 +0000
Subject: RFR(XS): 8233466: aarch64: remove unnecessary load of mdo when
 profiling return and parameters type
Message-ID: <DA41BE1DDCA941489001C7FBD7A8820ED6027D6E@dggeml527-mbx.china.huawei.com>

Hi,

   Please review the following patch:

      Bug: https://bugs.openjdk.java.net/browse/JDK-8233466

Webrev: http://cr.openjdk.java.net/~fyang/8233466/webrev.00/


When profiling return and parameters type from the interpreter on aarch64 platform, 'mdp' is loaded by test_method_data_pointer which is called by profile_return_type & profile_parameters_type.

It's not necessary to load mdo before calling __ profile_return_type or __ profile_parameters_type.


Passed tier1-3 testing.


Thanks,

Felix

From tobias.hartmann at oracle.com  Thu Nov  7 05:54:45 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 7 Nov 2019 06:54:45 +0100
Subject: RFR(XS): 8233491: Crash in AdapterHandlerLibrary::get_adapter
 with CDS due to code cache exhaustion
In-Reply-To: <af6e664b-8942-96b3-aba2-220679ba4b18@oracle.com>
References: <c44d6d02-8244-2ef3-ac8b-ba7f12e65eb4@oracle.com>
 <af6e664b-8942-96b3-aba2-220679ba4b18@oracle.com>
Message-ID: <3e45e244-31a0-3bd2-4b6c-acd1478ace5f@oracle.com>

Thanks Vladimir.

Best regards,
Tobias

On 06.11.19 19:01, Vladimir Kozlov wrote:
> CC to runtime group too.
> 
> Looks good to me.
> 
> Thanks,
> Vladimir
> 
> On 11/6/19 5:34 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8233491
>> http://cr.openjdk.java.net/~thartmann/8233491/webrev.00/
>>
>> When running a stress test with CDS, we fail to create adapters when linking a method from a shared
>> class because the code cache is full. This case is not properly handled by the CDS specific code and
>> instead of throwing a VirtualMachineError, we crash because "entry" is NULL.
>>
>> I'm able to sporadically reproduce this with a test (see [1]) but since the problem depends on the
>> class loading sequence, I was not able to make it more reliable or convert it to a robust jtreg
>> test. However, I've verified that the patch fixes the problem.
>>
>> Thanks,
>> Tobias
>>
>> [1]
>> https://bugs.openjdk.java.net/browse/JDK-8233491?focusedCommentId=14298462&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14298462
>>
>>
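
The failure mode described in the quoted report (a lookup that can
legitimately return NULL when the code cache is full, followed by an unchecked
use of the result) can be sketched as follows. This is not the HotSpot code:
Adapter, make_adapter and link_method are hypothetical names, and a C++
exception stands in for the VirtualMachineError that the JVM would throw.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for an adapter entry.
struct Adapter {
  int entry_point;
};

// Hypothetical factory: returns nullptr when the (simulated) code cache
// does not have enough free space for another adapter.
inline Adapter* make_adapter(std::size_t free_bytes, Adapter* storage) {
  assert(storage != nullptr);
  if (free_bytes < sizeof(Adapter)) {
    return nullptr;
  }
  storage->entry_point = 1;
  return storage;
}

// The caller must check for nullptr and report an error instead of
// dereferencing the result unconditionally.
inline int link_method(std::size_t free_bytes) {
  Adapter storage;
  Adapter* entry = make_adapter(free_bytes, &storage);
  if (entry == nullptr) {
    throw std::runtime_error("out of space in code cache for adapters");
  }
  return entry->entry_point;
}
```

The point is only that every call path, including a special-cased one such as
the CDS path mentioned above, needs the same NULL check.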

From tobias.hartmann at oracle.com  Thu Nov  7 06:00:43 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 7 Nov 2019 07:00:43 +0100
Subject: RFR(XS): 8233491: Crash in AdapterHandlerLibrary::get_adapter
 with CDS due to code cache exhaustion
In-Reply-To: <5800caa0-b7d4-fadd-0516-53351eb014fe@oracle.com>
References: <c44d6d02-8244-2ef3-ac8b-ba7f12e65eb4@oracle.com>
 <af6e664b-8942-96b3-aba2-220679ba4b18@oracle.com>
 <5800caa0-b7d4-fadd-0516-53351eb014fe@oracle.com>
Message-ID: <42249c53-ad83-7ed7-8926-5f6f28212f8c@oracle.com>

Thanks Ioi!

Best regards,
Tobias

On 06.11.19 19:12, Ioi Lam wrote:
> Looks good to me. Thanks for fixing this. I think I introduced this bug :-(
> 
> - Ioi
> 
> On 11/6/19 10:01 AM, Vladimir Kozlov wrote:
>> CC to runtime group too.
>>
>> Looks good to me.
>>
>> Thanks,
>> Vladimir
>>
>> On 11/6/19 5:34 AM, Tobias Hartmann wrote:
>>> Hi,
>>>
>>> please review the following patch:
>>> https://bugs.openjdk.java.net/browse/JDK-8233491
>>> http://cr.openjdk.java.net/~thartmann/8233491/webrev.00/
>>>
>>> When running a stress test with CDS, we fail to create adapters when linking a method from a shared
>>> class because the code cache is full. This case is not properly handled by the CDS specific code and
>>> instead of throwing a VirtualMachineError, we crash because "entry" is NULL.
>>>
>>> I'm able to sporadically reproduce this with a test (see [1]) but since the problem depends on the
>>> class loading sequence, I was not able to make it more reliable or convert it to a robust jtreg
>>> test. However, I've verified that the patch fixes the problem.
>>>
>>> Thanks,
>>> Tobias
>>>
>>> [1]
>>> https://bugs.openjdk.java.net/browse/JDK-8233491?focusedCommentId=14298462&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14298462
>>>
>>>
> 

From goetz.lindenmaier at sap.com  Thu Nov  7 10:24:06 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Thu, 7 Nov 2019 10:24:06 +0000
Subject: RFR (XS) 8233698: GCC 4.8.5 build failure after JDK-8233530
In-Reply-To: <14ff7e37-c7a1-9c74-584d-f00c7816696c@redhat.com>
References: <14ff7e37-c7a1-9c74-584d-f00c7816696c@redhat.com>
Message-ID: <AM6PR02MB53475BBD24B9F49F50C629D7EC780@AM6PR02MB5347.eurprd02.prod.outlook.com>

Hi,

if your CI assures proper building with gcc 4.8.5, why don't you 
state that on the "supported build platforms" page?
https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms

(If you don't have access there, we could put it up there 
for you.)

@Oracle: it would also be nice to find there that you switched to 
gcc 8.  

Best regards,
  Goetz.

> -----Original Message-----
> From: hotspot-runtime-dev <hotspot-runtime-dev-bounces at openjdk.java.net>
> On Behalf Of Aleksey Shipilev
> Sent: Mittwoch, 6. November 2019 12:33
> To: hotspot-runtime-dev at openjdk.java.net
> Subject: RFR (XS) 8233698: GCC 4.8.5 build failure after JDK-8233530
> 
> Bug:
>   https://bugs.openjdk.java.net/browse/JDK-8233698
> 
> Our current RHEL-based CIs fail to compile jdk/jdk. That C++14 compat is the
> gift that keeps on
> giving! The fix is to get even deeper into the warning disabling story:
> 
> diff -r bb2a436e616c src/hotspot/share/memory/operator_new.cpp
> --- a/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 13:43:25 2019 +0800
> +++ b/src/hotspot/share/memory/operator_new.cpp Wed Nov 06 12:31:23 2019 +0100
> @@ -89,11 +89,13 @@
>    fatal("Should not call global delete []");
>  }
> 
>  #ifdef __GNUG__
>  // Warning disabled for gcc 5.4
> +// Warning for unknown warning disabled for gcc 4.8.5
>  PRAGMA_DIAG_PUSH
> +PRAGMA_DISABLE_GCC_WARNING("-Wpragmas")
>  PRAGMA_DISABLE_GCC_WARNING("-Wc++14-compat")
>  #endif // __GNUG__
> 
>  void operator delete(void* p, size_t size) throw() {
>    fatal("Should not call global sized delete");
> 
> Testing: gcc 4.8.5 build
> 
> --
> Thanks,
> -Aleksey
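
The ordering in the quoted patch matters: older GCC versions warn about a
"#pragma GCC diagnostic" that names a warning option they do not know, and
that warning is itself controlled by -Wpragmas, so -Wpragmas has to be
silenced first. A minimal sketch using raw pragmas rather than HotSpot's
PRAGMA_* macros, assuming GCC or a GCC-compatible compiler; sized_release is a
hypothetical function that exists only so there is something between the push
and the pop:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#if defined(__GNUC__)
#pragma GCC diagnostic push
// First silence the "unknown option after #pragma GCC diagnostic" warning ...
#pragma GCC diagnostic ignored "-Wpragmas"
// ... so that naming a warning this compiler may not know is harmless.
#pragma GCC diagnostic ignored "-Wc++14-compat"
#endif

// Hypothetical stand-in for the code the pragmas are meant to cover.
inline std::size_t sized_release(void* p, std::size_t size) {
  assert(p != nullptr);
  std::free(p);
  return size;
}

#if defined(__GNUC__)
// Restore the diagnostic state that was in effect before the push.
#pragma GCC diagnostic pop
#endif
```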


From david.holmes at oracle.com  Thu Nov  7 10:45:01 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 7 Nov 2019 20:45:01 +1000
Subject: RFR: 8233784: ProblemList failing JVMTI scenario tests
Message-ID: <9cb34006-e88f-c0a9-776d-fb7c6791a4dd@oracle.com>

Bug: https://bugs.openjdk.java.net/browse/JDK-8233784

Patch below.

Getting the fix tested will take a little while and we are getting 
numerous failures in our CI testing. I ran all the scenario tests 
multiple times to try and find all that fail due to this problem. It may 
not be exhaustive, so if needed I'll add more later.

Thanks,
David
-----

diff -r bb2a436e616c test/hotspot/jtreg/ProblemList.txt
--- a/test/hotspot/jtreg/ProblemList.txt
+++ b/test/hotspot/jtreg/ProblemList.txt
@@ -206,4 +206,14 @@

 vmTestbase/nsk/jdwp/ThreadReference/ForceEarlyReturn/forceEarlyReturn001/forceEarlyReturn001.java 7199837 generic-all

-#############################################################################
+vmTestbase/nsk/jvmti/scenarios/allocation/AP01/ap01t001/TestDescription.java 8233549 generic-all
+vmTestbase/nsk/jvmti/scenarios/allocation/AP04/ap04t002/TestDescription.java 8233549 generic-all
+vmTestbase/nsk/jvmti/scenarios/allocation/AP04/ap04t003/TestDescription.java 8233549 generic-all
+vmTestbase/nsk/jvmti/scenarios/allocation/AP12/ap12t001/TestDescription.java 8233549 generic-all
+vmTestbase/nsk/jvmti/scenarios/capability/CM02/cm02t001/TestDescription.java 8233549 generic-all
+vmTestbase/nsk/jvmti/scenarios/events/EM02/em02t002/TestDescription.java 8233549 generic-all
+vmTestbase/nsk/jvmti/scenarios/events/EM02/em02t003/TestDescription.java 8233549 generic-all
+vmTestbase/nsk/jvmti/scenarios/events/EM02/em02t005/TestDescription.java 8233549 generic-all
+vmTestbase/nsk/jvmti/scenarios/events/EM02/em02t006/TestDescription.java 8233549 generic-all
+
+#############################################################################

From goetz.lindenmaier at sap.com  Thu Nov  7 11:39:52 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Thu, 7 Nov 2019 11:39:52 +0000
Subject: RFR: 8233784: ProblemList failing JVMTI scenario tests
In-Reply-To: <9cb34006-e88f-c0a9-776d-fb7c6791a4dd@oracle.com>
References: <9cb34006-e88f-c0a9-776d-fb7c6791a4dd@oracle.com>
Message-ID: <AM6PR02MB53470D5D7CF9D85459B310BEEC780@AM6PR02MB5347.eurprd02.prod.outlook.com>

Hi David, 

we also see these failures in our CI. They are intermittent and occur on all platforms.
They have been happening since Nov 4.

We saw the following ones you already listed:
  vmTestbase/nsk/jvmti/scenarios/allocation/AP12/ap12t001/TestDescription.java
  vmTestbase/nsk/jvmti/scenarios/events/EM02/em02t005/TestDescription.java

We also saw this one; could you please add it, too?
  vmTestbase/nsk/jvmti/scenarios/events/EM07/em07t002/TestDescription.java
It failed once with the same kind of message.

Besides that, the change looks good.

We also saw these failing, but I'm not sure it's the same issue:
  vmTestbase/nsk/jvmti/scenarios/allocation/AP04/ap04t001/TestDescription.java
  vmTestbase/nsk/jvmti/scenarios/allocation/AP04/ap04t002/TestDescription.java
with this message:

CASE #3:
Allocating objects...
Start heap iteration thread and field modification loop
thread3 started.
- ap04t002.cpp, 346: Calling IterateOverObjectsReachableFromObject...
- ap04t002.cpp, 352: IterateOverObjectsReachableFromObject finished.
- ap04t002.cpp, 354: Iterations count: 247089
- ap04t002.cpp, 355: Modifications count: 14
- ap04t002.cpp, 358: Errors detected: 0
Wait for completion thread to finish
thread3 finished.
Cleaning tags and references to objects...
The following fake exception stacktrace is for failure analysis. 
nsk.share.Fake_Exception_for_RULE_Creation: (jvmti_tools.cpp:683) error
	at nsk_lvcomplain(nsk_tools.cpp:172)
# ERROR: jvmti_tools.cpp, 683: error
#   jvmti error: code=52, name=JVMTI_ERROR_INTERRUPT
CASE #3 finished.

CASE #4:
Allocating objects...
----------System.err:(18/4306)----------
java.lang.AssertionError: .../jvm_14/bin/java, -Xmx768m, -Djava.awt.headless=true,  ... -agentlib:ap04t002=-waittime=5 -verbose, nsk.jvmti.scenarios.allocation.AP04.ap04t002] exit code is 52
	at ExecDriver.main(ExecDriver.java:137)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at PropertyResolvingWrapper.main(PropertyResolvingWrapper.java:104)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
	at java.base/java.lang.Thread.run(Thread.java:833)



From david.holmes at oracle.com  Thu Nov  7 12:23:45 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 7 Nov 2019 22:23:45 +1000
Subject: RFR: 8233784: ProblemList failing JVMTI scenario tests
In-Reply-To: <AM6PR02MB53470D5D7CF9D85459B310BEEC780@AM6PR02MB5347.eurprd02.prod.outlook.com>
References: <9cb34006-e88f-c0a9-776d-fb7c6791a4dd@oracle.com>
 <AM6PR02MB53470D5D7CF9D85459B310BEEC780@AM6PR02MB5347.eurprd02.prod.outlook.com>
Message-ID: <17f834a6-eb52-f69f-c827-d5856280955e@oracle.com>

Hi Goetz,

Thanks for looking at this.

On 7/11/2019 9:39 pm, Lindenmaier, Goetz wrote:
> Hi David,
> 
> we also see these failures in our CI. They are intermittent and occur on all platforms.
> They have been happening since Nov 4.
> 
> We saw the following ones you already listed:
>    vmTestbase/nsk/jvmti/scenarios/allocation/AP12/ap12t001/TestDescription.java
>    vmTestbase/nsk/jvmti/scenarios/events/EM02/em02t005/TestDescription.java
> 
> We also saw this one; could you please add it, too?
>    vmTestbase/nsk/jvmti/scenarios/events/EM07/em07t002/TestDescription.java
> It failed once with the same kind of message.

My latest test run just saw that one fail too.

> Besides that, the change looks good.
> 
> We also saw these failing, but I'm not sure it's the same issue:
>    vmTestbase/nsk/jvmti/scenarios/allocation/AP04/ap04t001/TestDescription.java
>    vmTestbase/nsk/jvmti/scenarios/allocation/AP04/ap04t002/TestDescription.java
> with this message:

Yes this is the same issue. I've ensured both are in the list.

I now have 11 on the list, which leaves 151 that have not yet been seen 
to fail. Any test that uses the AgentThread functionality and uses a 
RawMonitorWait is potentially affected. Unfortunately there seems to be 
no easy way to actually determine all the tests affected due to the use 
of utility functions.

I'll push what I have now so that we can stem the failures. 
Unfortunately I'm out of the office tomorrow morning.

Thanks,
David
-----


From suenaga at oss.nttdata.com  Thu Nov  7 12:28:07 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Thu, 7 Nov 2019 21:28:07 +0900
Subject: RFR: 8233785: Incorrect JDK version is reported in hs_err log
Message-ID: <d9a24903-9053-06a0-e74b-7bfb43370767@oss.nttdata.com>

Hi all,

Please review this change:

   JBS: https://bugs.openjdk.java.net/browse/JDK-8233785
   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233785/webrev.00/

If a JVM configured with --with-version-patch crashes, the JDK version in the hs_err log is incorrect.
The hs_err log header contains the following when we run configure with "--with-version-update=0 --with-version-patch=1":

```
# JRE version: OpenJDK Runtime Environment (14.0.1+2) (build 14.0.0.1+2-TypeS)
```

The valid JDK version is "14.0.0.1", but the log shows "14.0.1".
It is a bug in JDK_Version::to_string().
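
To illustrate the expected behavior (this is a sketch, not the HotSpot code): under the JDK version-string scheme, trailing zero elements are dropped, but a zero in the middle must be kept. The function name and the vector-based signature below are hypothetical; the real JDK_Version class stores its elements in separate fields.

```cpp
#include <string>
#include <vector>

// Hypothetical helper for illustration; not JDK_Version::to_string() itself.
// Formats version elements as a dotted number, dropping only TRAILING zeros,
// then appends "+<build>" when a build number is present.
std::string format_version(const std::vector<int>& elements, int build) {
  std::size_t last = elements.size();
  // Interior zeros must survive: only strip zeros from the end.
  while (last > 1 && elements[last - 1] == 0) {
    --last;
  }
  std::string out;
  for (std::size_t i = 0; i < last; ++i) {
    if (i > 0) out += '.';
    out += std::to_string(elements[i]);
  }
  if (build > 0) {
    out += '+';
    out += std::to_string(build);
  }
  return out;
}
```

With update=0 and patch=1, the elements {14, 0, 0, 1} and build 2 should produce "14.0.0.1+2"; dropping the interior zero is what yields the incorrect "14.0.1+2".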


Thanks,

Yasumasa

From david.holmes at oracle.com  Thu Nov  7 12:39:18 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 7 Nov 2019 22:39:18 +1000
Subject: RFR: 8233785: Incorrect JDK version is reported in hs_err log
In-Reply-To: <d9a24903-9053-06a0-e74b-7bfb43370767@oss.nttdata.com>
References: <d9a24903-9053-06a0-e74b-7bfb43370767@oss.nttdata.com>
Message-ID: <6811d542-a530-5d70-5fd6-bea47de81d35@oracle.com>

Hi Yasumasa,

On 7/11/2019 10:28 pm, Yasumasa Suenaga wrote:
> Hi all,
> 
> Please review this change:
> 
>    JBS: https://bugs.openjdk.java.net/browse/JDK-8233785
>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233785/webrev.00/
> 
> If a JVM configured with --with-version-patch crashes, the JDK 
> version in the hs_err log is incorrect.
> The hs_err log header contains the following when we run 
> configure with "--with-version-update=0 --with-version-patch=1":
> 
> ```
> # JRE version: OpenJDK Runtime Environment (14.0.1+2) (build 14.0.0.1+2-TypeS)
> ```
> 
> The valid JDK version is "14.0.0.1", but the log shows "14.0.1".
> It is a bug in JDK_Version::to_string().

I initially missed the fact that you always print _security along with 
_patch.

I think what you have looks correct, but I'd want to double check that 
against the versioning spec to be sure.

Thanks,
David

> 
> Thanks,
> 
> Yasumasa

From goetz.lindenmaier at sap.com  Thu Nov  7 13:17:32 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Thu, 7 Nov 2019 13:17:32 +0000
Subject: RFR: 8233784: ProblemList failing JVMTI scenario tests
In-Reply-To: <17f834a6-eb52-f69f-c827-d5856280955e@oracle.com>
References: <9cb34006-e88f-c0a9-776d-fb7c6791a4dd@oracle.com>
 <AM6PR02MB53470D5D7CF9D85459B310BEEC780@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <17f834a6-eb52-f69f-c827-d5856280955e@oracle.com>
Message-ID: <AM6PR02MB53478A370098332AD77107D7EC780@AM6PR02MB5347.eurprd02.prod.outlook.com>

Hi David, 

thanks for adding our failures.

Best regards,
  Goetz.


From harold.seigel at oracle.com  Thu Nov  7 14:08:37 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Thu, 7 Nov 2019 09:08:37 -0500
Subject: RFR: CSR JVM support for records
In-Reply-To: <bfbc0759-17d6-e909-9047-2781f6f09d32@oracle.com>
References: <fb8c0662-78c0-c442-3fd8-db3178c27127@oracle.com>
 <1d080dde-df36-d0b1-5e33-f439d70336e7@oracle.com>
 <6af860a2-20d4-a194-d4ad-c17e58df6ffa@oracle.com>
 <91e9ce5f-b636-ef6d-850c-6c9fa454946b@oracle.com>
 <fb4cbab3-2a0f-2c1b-cd2e-47cf24692f33@oracle.com>
 <bfbc0759-17d6-e909-9047-2781f6f09d32@oracle.com>
Message-ID: <911fedf0-246c-3bde-fcc5-6a33408aa730@oracle.com>

Thanks Alex!

Harold

On 11/6/2019 6:57 PM, Alex Buckley wrote:
> Thanks for updating. I made small edits (assuming that "The format of 
> the record attribute is consistently checked" is meant to refer to 
> consistency checking a.k.a. format checking, and not to checking being 
> performed on a consistent basis) ... a CSR isn't really the place to 
> suggest that a Record attribute could potentially be useful for X or 
> Y, but it's time to move on.
>
> Alex
>
> On 11/6/2019 11:17 AM, Harold Seigel wrote:
>> Hi Alex,
>>
>> I updated the CSR, hopefully with the info you requested.
>>
>> Thanks, Harold
>>
>> On 11/6/2019 1:28 PM, Alex Buckley wrote:
>>> On 11/6/2019 10:14 AM, Harold Seigel wrote:
>>>> Note that the JVM does consistency check the Records attribute at 
>>>> class load time, not at first use by reflection. So, perhaps change 
>>>> this sentence:
>>>>
>>>>     Note that if no reflection is performed then the abstract JVM does
>>>>     not care about the Record attribute in any way.
>>>>
>>>> to something like
>>>>
>>>>     The format of the record attribute is checked even if no
>>>>     reflection is performed.
>>>
>>> Given how Record is described in JVMS 4.7 ("each of these attributes 
>>> must be recognized and correctly read by an implementation of the 
>>> Java Virtual Machine"), it makes sense that the HotSpot JVM is 
>>> format checking a Record attribute at load time. So, yes, please 
>>> make the change you describe above, and please explicitly compare 
>>> Record to the format checking performed for Exceptions, 
>>> InnerClasses, etc, and contrast Record with the lack of format 
>>> checking for MethodParameters, Module, etc.
>>>
>>> Alex

From kim.barrett at oracle.com  Thu Nov  7 19:47:29 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 7 Nov 2019 14:47:29 -0500
Subject: RFR (XS) 8233698: GCC 4.8.5 build failure after JDK-8233530
In-Reply-To: <AM6PR02MB53475BBD24B9F49F50C629D7EC780@AM6PR02MB5347.eurprd02.prod.outlook.com>
References: <14ff7e37-c7a1-9c74-584d-f00c7816696c@redhat.com>
 <AM6PR02MB53475BBD24B9F49F50C629D7EC780@AM6PR02MB5347.eurprd02.prod.outlook.com>
Message-ID: <F5466D90-798B-4D5F-932F-25A53D0E3B0A@oracle.com>

> On Nov 7, 2019, at 5:24 AM, Lindenmaier, Goetz <goetz.lindenmaier at sap.com> wrote:
> @Oracle: it would also be nice to find there that you switched to 
> gcc 8.  

That got done today.  Thanks for the reminder.


From harold.seigel at oracle.com  Thu Nov  7 22:03:52 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Thu, 7 Nov 2019 17:03:52 -0500
Subject: RFR 8230055: ModuleStressGC.java times out on Win*
Message-ID: <e7c8b0c1-49fe-ac92-5e18-53f63df97e02@oracle.com>

Hi,

Please review this small change to help prevent test 
runtime/modules/ModuleStress/ModuleStressGC.java from timing out. The 
change reduces the number of loop iterations in the test by 40%.

Open Webrev: 
http://cr.openjdk.java.net/~hseigel/bug_8230055/webrev/index.html

JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8230055

The change was tested by running Mach5 tier2 tests on Linux-x64, 
Solaris, Windows, and Mac OS X.

Thanks, Harold


From coleen.phillimore at oracle.com  Thu Nov  7 22:46:17 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Thu, 7 Nov 2019 17:46:17 -0500
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
 <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
 <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>
Message-ID: <7f0d558c-c161-ce41-90c6-ede8fddcf8b8@oracle.com>

Hi, I've done a more high level code review of this and it looks good!

http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/src/hotspot/share/memory/archiveUtils.hpp.html

I think these classes require comments on what they do and why.  The 
comments you sent me offline look good.

Also .hpp files shouldn't include .inline.hpp files, like 
bitMap.inline.hpp.  Hopefully it's just a case of moving do_bit() into 
the cpp file.

I wonder if the exception list of classes to exclude should be a 
function in javaClasses.hpp/cpp where the explanation would make more 
sense?  i.e. bool JavaClasses::has_injected_native_pointers(InstanceKlass* k);

Is there already an RFE to move the DumpSharedSpaces output from 
tty->print() to log_info() ?

Thanks,
Coleen

On 11/6/19 4:17 PM, Ioi Lam wrote:
> Hi Jiangli,
>
> I've uploaded the webrev after integrating your comments:
>
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05-delta/ 
>
>
> Please see more replies below:
>
>
> On 11/4/19 5:52 PM, Jiangli Zhou wrote:
>> On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam <ioi.lam at oracle.com 
>> <mailto:ioi.lam at oracle.com>> wrote:
>>
>>     Hi Jiangli,
>>
>>     Thank you so much for spending time reviewing this RFE!
>>
>>     On 11/3/19 6:34 PM, Jiangli Zhou wrote:
>>     > Hi Ioi,
>>     >
>>     > Sorry for the delay again. Will try to put this on the top of my
>>     > list next week and reduce the turn-around time. The updates look
>>     > good in general.
>>     >
>>     > We might want to have a better strategy when choosing metadata
>>     > relocation address (when relocation is needed). Some
>>     > applications/benchmarks may be more sensitive to cache locality
>>     > and memory/data layout. There was a bug,
>>     > https://bugs.openjdk.java.net/browse/JDK-8213713 that caused 1G
>>     > gap between Java heap data and metadata before JDK 12. The gap
>>     > seemed to cause a small but noticeable runtime effect in one case
>>     > that I came across.
>>
>>     I guess you're saying we should try to relocate the archive into
>>     somewhere under 32GB?
>>
>>
>> I don't yet have sufficient data to determine whether mapping in the low 32G 
>> produces better runtime performance. I experimented with that, but 
>> didn't see a noticeable difference compared to mapping at the 
>> current default address. It doesn't hurt, I think. So it may be a 
>> better choice than relocating to a random address in the high 32G space 
>> (when the Java heap is in the low 32G address space).
>
> Maybe we should reconsider this when we have more concrete data for 
> the benefits of moving the compressed class space to under 32G.
>
> Please note that in metaspace.cpp, when CDS is disabled and the VM 
> fails to allocate the class space at the requested address (0x7c000000 
> for 16GB heap), it also just allocates from a random address (without 
> trying to search under 32GB):
>
> http://hg.openjdk.java.net/jdk/jdk/annotate/e767fa6a1d45/src/hotspot/share/memory/metaspace.cpp#l1128 
>
>
> This code has been there since 2013 and we have not seen any issues.
>
>
>
>
>>
>>     Could you elaborate more on the performance issue, especially
>>     about cache locality? I looked at JDK-8213713 but it didn't mention
>>     performance.
>>
>>
>> When enabling CDS we noticed a small runtime overhead in JDK 11 
>> recently with a benchmark. After I backported JDK-8213713 to 11, it 
>> seemed to reduce the runtime overhead that the benchmark was 
>> experiencing.
>>
>>
>>     Also, by default, we have non-zero narrow_klass_base and
>>     narrow_klass_shift = 3, and archive relocation doesn't change that:
>>
>>     $ java -Xlog:cds=debug -version
>>     ... narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
>>     $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
>>     ... narrow_klass_base = 0x00007f1e8b499000, narrow_klass_shift = 3
>>
>>     We always use narrow_klass_shift due to this:
>>
>>       // CDS uses LogKlassAlignmentInBytes for narrow_klass_shift. See
>>       // MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
>>       // how dump time narrow_klass_shift is set. Although, CDS can work
>>       // with zero-shift mode also, to be consistent with AOT it uses
>>       // LogKlassAlignmentInBytes for klass shift so archived java heap objects
>>       // can be used at same time as AOT code.
>>       if (!UseSharedSpaces
>>           && (uint64_t)(higher_address - lower_base) <= UnscaledClassSpaceMax) {
>>         CompressedKlassPointers::set_shift(0);
>>       } else {
>>         CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
>>       }
>>
>>
>> Right. If we relocate to the low 32G space, we need to make sure that 
>> the range containing the mapped class data and class space is 
>> encodable.
>>
>>
>>     > Here are some additional comments (minor).
>>     >
>>     > Could you please fix the long lines in the following?
>>     >
>>     > 1237 void java_lang_Class::update_archived_primitive_mirror_native_pointers(oop archived_mirror) {
>>     > 1238   if (MetaspaceShared::relocation_delta() != 0) {
>>     > 1239     assert(archived_mirror->metadata_field(_klass_offset) == NULL, "must be for primitive class");
>>     > 1240
>>     > 1241     Klass* ak = ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
>>     > 1242     if (ak != NULL) {
>>     > 1243       archived_mirror->metadata_field_put(_array_klass_offset, (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
>>     > 1244     }
>>     > 1245   }
>>     > 1246 }
>>     >
>>     > src/hotspot/share/memory/dynamicArchive.cpp
>>     >
>>     >  889   Thread* THREAD = Thread::current();
>>     >  890   Method::sort_methods(ik->methods(), /*set_idnums=*/true, dynamic_dump_method_comparator);
>>     >  891   if (ik->default_methods() != NULL) {
>>     >  892     Method::sort_methods(ik->default_methods(), /*set_idnums=*/false, dynamic_dump_method_comparator);
>>     >  893   }
>>     >
>>
>>     OK will do.
>>
>> ??? > Please see inlined comments below.
>> ??? >
>> ??? > On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com
>> ??? <mailto:ioi.lam at oracle.com>> wrote:
>> ??? >> Hi Jiangli,
>> ??? >>
>> ??? >> Thanks for the review. I've updated the patch according to your
>> ??? comments:
>> ??? >>
>> ??? >>
>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
>> ??? >>
>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
>> ??? >>
>> ??? >> (the delta is on top of 8231610-relocate-cds-archive.v03.delta
>> ??? in my
>> ??? >> reply to Calvin's comments).
>> ??? >>
>> ??? >> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
>> ??? >>> Hi Ioi,
>> ??? >>>
>> ??? >>> Sorry for the delay. Here are my remaining comments.
>> ??? >>>
>> ??? >>> - src/hotspot/share/memory/dynamicArchive.cpp
>> ??? >>>
>> ??? >>> 128? ?static intx _method_comparator_name_delta;
>> ??? >>>
>>     >>> The name of the above variable is confusing. It's the value of
>>     >>> _buffer_to_target_delta. It's better to use _buffer_to_target_delta
>>     >>> directly.
>> ??? >> _buffer_to_target_delta is a non-static field, but
>> ??? >> dynamic_dump_method_comparator() must be a static function so
>> ??? it can't
>> ??? >> use the non-static field easily.
>> ??? >
>>     > It sounds like an issue. _buffer_to_target_delta was made
>>     > non-static mostly because we might support more than one dynamic
>>     > archive in the future. However, today's usages bake in an
>>     > assumption that _buffer_to_target_delta is a singleton value. It is
>>     > cleaner to either make _buffer_to_target_delta a static variable
>>     > for now, or add an access API in DynamicArchiveBuilder to allow
>>     > other code to properly and correctly use the value.
>>
>> ??? OK, I'll move it to a static variable.
>>
>> ??? >
>> ??? >>> Also, we can do a quick pointer comparison of 'a_name' and
>> ??? >>> 'b_name' first before adjusting the pointers.
>> ??? >> I added this:
>> ??? >>
>> ??? >>? ? ? ?if (a_name == b_name) {
>> ??? >>? ? ? ? ?return 0;
>> ??? >>? ? ? ?}
>> ??? >>
>> ??? >>> ---
>> ??? >>>
>> ??? >>> 934 void DynamicArchiveBuilder::relocate_buffer_to_target() {
>> ??? >>> ...
>> ??? >>>? ? 944
>> ??? >>>? ? 945 ?ArchivePtrMarker::compact(relocatable_base,
>> ??? relocatable_end);
>> ??? >>> ...
>> ??? >>>
>> ??? >>>? ? 974? ? ?SharedDataRelocator patcher((address*)patch_base,
>> ??? >>> (address*)patch_end, valid_old_base, valid_old_end,
>> ??? >>>? ? 975 ?valid_new_base, valid_new_end, addr_delta);
>> ??? >>>? ? 976 ?ArchivePtrMarker::ptrmap()->iterate(&patcher);
>> ??? >>>
>>     >>> Could we reduce the number of data re-iterations to help archive
>>     >>> dumping performance? The ArchivePtrMarker::compact operation can be
>>     >>> combined with the patching iteration. The ArchivePtrMarker::compact
>>     >>> API can then be removed.
>> ??? >> That's a good idea. I implemented it using a template parameter
>> ??? so that
>> ??? >> we can have max performance when relocating the archive at run
>> ??? time.
>> ??? >>
>> ??? >> I added comments to explain why the relocation is done here. The
>> ??? >> relocation is pretty rare (only when the base archive was not
>> ??? mapped at
>> ??? >> the default location).
>> ??? >>
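A minimal sketch of the "combine compaction with patching" idea discussed above. It is not the HotSpot code: Relocator, slots, ptrmap and live_end are hypothetical names, and std::vector<bool> stands in for the real bitmap. It only illustrates how a compile-time flag lets one pass both patch pointers and trim the bitmap at dump time, while the run-time instantiation skips the bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a relocation closure; not the HotSpot class.
// COMPACTING is a compile-time constant, so in the non-compacting
// instantiation the compiler can drop the bookkeeping branches.
template <bool COMPACTING>
struct Relocator {
  std::vector<std::uintptr_t>& slots;  // pointer-sized words of the archive
  std::vector<bool>& ptrmap;           // bit i set => slots[i] holds a pointer
  std::intptr_t delta;                 // added to every non-null pointer
  std::size_t live_end = 0;            // one past the highest live bit

  Relocator(std::vector<std::uintptr_t>& s, std::vector<bool>& m,
            std::intptr_t d)
      : slots(s), ptrmap(m), delta(d) {
    assert(slots.size() == ptrmap.size());
  }

  // Called once for every set bit, in ascending order.
  void do_bit(std::size_t i) {
    if (slots[i] == 0) {
      if (COMPACTING) ptrmap[i] = false;  // null slot needs no relocation bit
      return;
    }
    slots[i] += static_cast<std::uintptr_t>(delta);  // wraps for negative delta
    if (COMPACTING) live_end = i + 1;
  }

  void run() {
    for (std::size_t i = 0; i < ptrmap.size(); i++) {
      if (ptrmap[i]) do_bit(i);
    }
    if (COMPACTING) ptrmap.resize(live_end);  // drop trailing dead bits
  }
};
```

In this sketch, Relocator<true> would be used once at dump time and Relocator<false> whenever the archive has to be relocated at start-up.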
>> ??? >>> ---
>> ??? >>>
>> ??? >>>? ? 967? ? ?address valid_new_base =
>> ??? >>> (address)Arguments::default_SharedBaseAddress();
>> ??? >>>? ? 968? ? ?address valid_new_end? = valid_new_base +
>> ??? base_plus_top_size;
>> ??? >>>
>> ??? >>> The debugging only code can be included under #ifdef ASSERT.
>> ??? >> These values are actually also used in debug logging so they
>> ??? can't be
>> ??? >> ifdef'ed out.
>> ??? >>
>>     >> Also, the C++ compiler is pretty good at eliding code that's not
>>     >> actually used. If I comment out all the logging code in
>>     >> DynamicArchiveBuilder::relocate_buffer_to_target() and
>>     >> SharedDataRelocator, gcc elides all the unused fields and their
>>     >> assignments, so no code is generated for them.
>> ??? >>
>> ??? >>? ? ? ?address valid_new_base =
>> ??? >> (address)Arguments::default_SharedBaseAddress();
>> ??? >>
>> ??? >> Since #ifdef ASSERT makes the code harder to read, I think we
>> ??? should use
>> ??? >> it only when really necessary.
>> ??? > It seems cleaner to get rid of these debugging only variables, by
>> ??? > using 'relocatable_base' and
>> ??? > '(address)Arguments::default_SharedBaseAddress()' in the logging
>> ??? code.
>>
>>     SharedDataRelocator is used in three different situations. These six
>>     variables (patch_base, patch_end, valid_old_base, valid_old_end,
>>     valid_new_base, valid_new_end) describe what is being patched, and
>>     what the expectations are, for each situation. The code will be hard
>>     to understand without them.
>>
>> ??? Please note there's also logging code in the SharedDataRelocator
>> ??? constructor that prints out these values.
>>
>> ??? I think I'll just remove the 'debug only' comment to avoid 
>> confusion.
>>
>>
>> Ok.
>>
>>
>> ??? >
>> ??? >>> ---
>> ??? >>>
>> ??? >>>? ? 993
>> ?dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
>> ??? >>>
>> ??? >>> We could combine the archived heap data bitmap into the new
>> ??? region as
>> ??? >>> well? It can be handled as a separate RFE.
>> ??? >> I've filed https://bugs.openjdk.java.net/browse/JDK-8233093
>> ??? >>
>> ??? >>> - src/hotspot/share/memory/filemap.cpp
>> ??? >>>
>> ??? >>> 1038? ? ?if (is_static()) {
>> ??? >>> 1039? ? ? ?if (errno == ENOENT) {
>> ??? >>> 1040? ? ? ? ?// Not locating the shared archive is ok.
>> ??? >>> 1041? ? ? ? ?fail_continue("Specified shared archive not found
>> ??? (%s).",
>> ??? >>> _full_path);
>> ??? >>> 1042? ? ? ?} else {
>> ??? >>> 1043? ? ? ? ?fail_continue("Failed to open shared archive file
>> ??? (%s).",
>> ??? >>> 1044 ?os::strerror(errno));
>> ??? >>> 1045? ? ? ?}
>> ??? >>> 1046? ? ?} else {
>> ??? >>> 1047? ? ? ?log_warning(cds, dynamic)("specified dynamic archive
>> ??? >>> doesn't exist: %s", _full_path);
>> ??? >>> 1048? ? ?}
>> ??? >>>
>>     >>> If the top layer is explicitly specified by the user, a warning
>>     >>> does not seem to be the proper behavior if the VM fails to open
>>     >>> the archive file.
>>     >>>
>>     >>> It might be better to handle the relocation-unrelated code in a
>>     >>> separate changeset and track it with a separate RFE.
>> ??? >> This code was moved from
>> ??? >>
>> ??? >>
>> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
>> ??? >>
>>     >> so I am not changing the behavior. If you want, we can file an
>>     >> RFE to change the behavior.
>>     > Ok. A new RFE sounds like the right thing to re-evaluate the usage
>>     > issue here. Thanks.
>>
>> ??? I created https://bugs.openjdk.java.net/browse/JDK-8233446
>>
>> ??? >>> ---
>> ??? >>>
>> ??? >>> 1148 void FileMapInfo::write_region(int region, char* base,
>> ??? size_t size,
>> ??? >>> 1149? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? bool read_only, bool
>> ??? allow_exec) {
>> ??? >>> ...
>> ??? >>> 1154
>> ??? >>> 1155? ?if (region == MetaspaceShared::bm) {
>> ??? >>> 1156? ? ?target_base = NULL;
>> ??? >>> 1157? ?} else if (DynamicDumpSharedSpaces) {
>> ??? >>>
>> ??? >>> It's not too clear to me how the bitmap (bm) region is handled
>> ??? for the
>> ??? >>> base layer and top layer. Could you please explain?
>> ??? >> The bm region for both layers are mapped at an address picked
>> ??? by the OS:
>> ??? >>
>> ??? >> char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
>> ??? >>? ? ?FileMapRegion* si = space_at(MetaspaceShared::bm);
>> ??? >>? ? ?bitmap_size = si->used_aligned();
>> ??? >>? ? ?bool read_only = true, allow_exec = false;
>> ??? >>? ? ?char* requested_addr = NULL; // allow OS to pick any location
>> ??? >>? ? ?char* bitmap_base = os::map_memory(_fd, _full_path,
>> ??? si->file_offset(),
>> ??? >> requested_addr, bitmap_size,
>> ??? >> read_only, allow_exec);
>> ??? >>
>> ??? > Ok, after staring at the code for a few seconds I saw that's
>> ??? intended.
>> ??? > If the current region is 'bm', then the 'target_base' is NULL
>> ??? > regardless if it's static or dynamic archive. Otherwise, the
>> ??? > 'target_base' is handled differently for the static and dynamic
>> ??? case.
>> ??? > The following would be cleaner and has better reliability.
>> ??? >
>>     >     char* target_base = NULL;
>>     >
>>     >     // The target_base is NULL for 'bm' region.
>>     >     if (region != MetaspaceShared::bm) {
>>     >       if (DynamicDumpSharedSpaces) {
>>     >         assert(!HeapShared::is_heap_region(region),
>>     >                "dynamic archive doesn't support heap regions");
>>     >         target_base = DynamicArchive::buffer_to_target(base);
>>     >       } else {
>>     >         target_base = base;
>>     >       }
>>     >     }
>>
>> ??? How about this?
>>
>>        char* target_base;
>>        if (region == MetaspaceShared::bm) {
>>          target_base = NULL; // always NULL for bm region.
>>        } else {
>>          if (DynamicDumpSharedSpaces) {
>>            assert(!HeapShared::is_heap_region(region),
>>                   "dynamic archive doesn't support heap regions");
>>            target_base = DynamicArchive::buffer_to_target(base);
>>          } else {
>>            target_base = base;
>>          }
>>        }
>>
>>
>> No objection if you prefer the extra 'else' block.
>>
>>
>> ??? >
>> ??? >>> ---
>> ??? >>>
>> ??? >>> 1362
>> ?DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>> ??? >>>
>> ??? >>> Could you please explain the above?
>> ??? >> I added the comments
>> ??? >>
>> ??? >>? ? ?// Make sure we don't attempt to use
>> ??? header()->mapped_base_address()
>> ??? >> unless
>> ??? >>? ? ?// it's been successfully mapped.
>> ??? >>
>> DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>> ??? >>
>> ??? >>> ---
>> ??? >>>
>> ??? >>> 1359? ?FileMapRegion* last_region = NULL;
>> ??? >>>
>> ??? >>> 1371? ? ?if (last_region != NULL) {
>> ??? >>> 1372? ? ? ?// Ensure that the OS won't be able to allocate new
>> ??? memory
>> ??? >>> spaces between any mapped
>> ??? >>> 1373? ? ? ?// regions, or else it would mess up the simple
>> ??? comparision
>> ??? >>> in MetaspaceObj::is_shared().
>> ??? >>> 1374? ? ? ?assert(si->mapped_base() == 
>> last_region->mapped_end(),
>> ??? >>> "must have no gaps");
>> ??? >>>
>> ??? >>> 1379? ? ?last_region = si;
>> ??? >>>
>> ??? >>> Can you please place 'last_region' related code under #ifdef
>> ??? ASSERT?
>>     >> I think that will make the code more cluttered. The compiler will
>>     >> optimize that away.
>>     > It's cleaner to define debugging-only variables only in debug
>>     > builds. You can wrap it and the related usage with DEBUG_ONLY.
>>
>> ??? OK, will do.
>>
>> ??? >
>> ??? >>> ---
>> ??? >>>
>> ??? >>> 1478 char* FileMapInfo::map_relocation_bitmap(size_t&
>> ??? bitmap_size) {
>> ??? >>> 1479? ?FileMapRegion* si = space_at(MetaspaceShared::bm);
>> ??? >>> 1480? ?bitmap_size = si->used_aligned();
>> ??? >>> 1481? ?bool read_only = true, allow_exec = false;
>> ??? >>> 1482? ?char* requested_addr = NULL; // allow OS to pick any
>> ??? location
>> ??? >>> 1483? ?char* bitmap_base = os::map_memory(_fd, _full_path,
>> ??? si->file_offset(),
>> ??? >>> 1484 requested_addr, bitmap_size,
>> ??? >>> read_only, allow_exec);
>> ??? >>>
>> ??? >>> We need to handle mapping failure here.
>> ??? >> It's handled here:
>> ??? >>
>> ??? >> bool FileMapInfo::relocate_pointers(intx addr_delta) {
>> ??? >>? ? ?log_debug(cds, reloc)("runtime archive relocation start");
>> ??? >>? ? ?size_t bitmap_size;
>> ??? >>? ? ?char* bitmap_base = map_relocation_bitmap(bitmap_size);
>> ??? >>? ? ?if (bitmap_base != NULL) {
>> ??? >>? ? ?...
>> ??? >>? ? ?} else {
>> ??? >>? ? ? ?log_error(cds)("failed to map relocation bitmap");
>> ??? >>? ? ? ?return false;
>> ??? >>? ? ?}
>> ??? >>
>> ??? > 'bitmap_base' is used immediately after map_memory(). So the check
>> ??? > needs to be done immediately after map_memory(), but not in the
>> ??? caller
>> ??? > of map_relocation_bitmap().
>> ??? >
>> ??? > 1490? ?char* bitmap_base = os::map_memory(_fd, _full_path,
>> ??? si->file_offset(),
>> ??? > 1491 requested_addr, bitmap_size,
>> ??? > read_only, allow_exec);
>> ??? > 1492
>> ??? > 1493? ?if (VerifySharedSpaces && bitmap_base != NULL &&
>> ??? > !region_crc_check(bitmap_base, bitmap_size, si->crc())) {
>>
>> ??? OK, I'll fix that.
>>
>> ??? >
>> ??? >
>> ??? >>> ---
>> ??? >>>
>>     >>> 1513     // debug only -- the current value of the pointers to be
>>     >>> patched must be within this
>>     >>> 1514     // range (i.e., must be between the requested base address,
>>     >>> and the end of the current archive).
>>     >>> 1515     // Note: top archive may point to objects in the base
>>     >>> archive, but not the other way around.
>> ??? >>> 1516? ? ?address valid_old_base =
>> ??? (address)header()->requested_base_address();
>> ??? >>> 1517? ? ?address valid_old_end? = valid_old_base +
>> ??? mapping_end_offset();
>> ??? >>>
>> ??? >>> Please place all FileMapInfo::relocate_pointers debugging only
>> ??? code
>> ??? >>> under #ifdef ASSERT.
>> ??? >> Ditto about ifdef ASSERT
>> ??? >>
>> ??? >>> - src/hotspot/share/memory/heapShared.cpp
>> ??? >>>
>> ??? >>>? ? 441 void
>> ??? HeapShared::initialize_from_archived_subgraph(Klass* k) {
>> ??? >>>? ? 442? ?if (!open_archive_heap_region_mapped() ||
>> ??? !MetaspaceObj::is_shared(k)) {
>> ??? >>>? ? 443? ? ?return; // nothing to do
>> ??? >>>? ? 444? ?}
>> ??? >>>
>> ??? >>> When do we call HeapShared::initialize_from_archived_subgraph
>> ??? for a
>> ??? >>> klass that's not shared?
>> ??? >> I've removed the !MetaspaceObj::is_shared(k). I probably added
>> ??? that for
>> ??? >> debugging purposes only.
>> ??? >>
>> ??? >>>? ? 616? ?DEBUG_ONLY({
>> ??? >>>? ? 617? ? ? ?Klass* klass = orig_obj->klass();
>> ??? >>>? ? 618? ? ? ?assert(klass != SystemDictionary::Module_klass() &&
>> ??? >>>? ? 619? ? ? ? ? ? ? klass !=
>> ??? SystemDictionary::ResolvedMethodName_klass() &&
>> ??? >>>? ? 620? ? ? ? ? ? ? klass !=
>> ??? SystemDictionary::MemberName_klass() &&
>> ??? >>>? ? 621? ? ? ? ? ? ? klass != 
>> SystemDictionary::Context_klass() &&
>> ??? >>>? ? 622? ? ? ? ? ? ? klass !=
>> ??? SystemDictionary::ClassLoader_klass(), "we
>> ??? >>> can only relocate metaspace object pointers inside 
>> java_lang_Class
>> ??? >>> instances");
>> ??? >>>? ? 623? ? ?});
>> ??? >>>
>> ??? >>> Let's leave the above for a separate RFE. I think assert is not
>> ??? >>> sufficient for the check. Also, why ResolvedMethodName, 
>> Module and
>> ??? >>> MemberName cannot be part of the graph?
>> ??? >>>
>> ??? >>>
>> ??? >> I added the following comment:
>> ??? >>
>> ??? >>? ? ?DEBUG_ONLY({
>> ??? >>? ? ? ? ?// The following are classes in
>> ??? share/classfile/javaClasses.cpp
>> ??? >> that have injected native pointers
>> ??? >>? ? ? ? ?// to metaspace objects. To support these classes, we
>> ??? need to add
>> ??? >> relocation code similar to
>> ??? >>? ? ? ? ?// 
>> java_lang_Class::update_archived_mirror_native_pointers.
>> ??? >>? ? ? ? ?Klass* klass = orig_obj->klass();
>> ??? >>? ? ? ? ?assert(klass != SystemDictionary::Module_klass() &&
>> ??? >>? ? ? ? ? ? ? ? klass !=
>> ??? SystemDictionary::ResolvedMethodName_klass() &&
>> ??? >>
>>     > It's too restrictive to exclude those objects from the archived
>>     > object graph because of metadata relocation, since metadata
>>     > relocation is rare. The trade-off doesn't seem to buy us much.
>> ??? >
>> ??? > Do you plan to add the needed relocation code?
>>
>>     I looked more into this. Actually we cannot handle these 5 classes
>>     at all, even without archive relocation:
>>
>> ??? [1] #define MODULE_INJECTED_FIELDS(macro) \
>> ??? ?? macro(java_lang_Module, module_entry, intptr_signature, false)
>>
>> ??? ->? module_entry is malloc'ed
>>
>> ??? [2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
>> ??? ?? macro(java_lang_invoke_ResolvedMethodName, vmholder,
>> ??? object_signature, false) \
>> ??? ?? macro(java_lang_invoke_ResolvedMethodName, vmtarget,
>> ??? intptr_signature, false)
>>
>>     -> these fields are related to method handles and lambda forms, etc.
>>     They can't easily be archived without implementing lambda form
>>     archiving. (I did a prototype; it's very complex and fragile).
>>
>> ??? [3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
>> ??? macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
>> ??? vmdependencies, intptr_signature, false) \
>> ??? macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
>> ??? last_cleanup, long_signature, false)
>>
>> ??? -> vmdependencies is malloc'ed.
>>
>>     [4] #define MEMBERNAME_INJECTED_FIELDS(macro) \
>>        macro(java_lang_invoke_MemberName, vmindex, intptr_signature, false)
>>
>>     -> this one is probably OK. Despite being declared as
>>     'intptr_signature', it seems to be used just as an integer. However,
>>     MemberNames are typically used with [2] and [3]. So let's just
>>     forbid it to be safe.
>>
>>     [2] [3] [4] are not used directly by regular Java code and are
>>     unlikely to be referenced (directly or indirectly) by static fields
>>     (except for the static fields in the classes in java.lang.invoke,
>>     which we probably won't support for heap archiving due to the
>>     problem I described for [2]). Objects of these types are typically
>>     referenced via constant pool entries.
>>
>> ??? [5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
>> ??? ?? macro(java_lang_ClassLoader, loader_data, intptr_signature, 
>> false)
>>
>> ??? -> loader_data is malloc'ed.
>>
>> ??? So, I will change the DEBUG_ONLY into a product-mode check, and quit
>> ??? dumping if these objects are found in the object subgraph.
>>
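A rough sketch of what such a product-mode check could look like, assuming the five classes are identified by their internal names (derived here from the injected-field macros listed above). The function name has_unsupported_injected_fields is hypothetical, and unlike a real VM check this only matches exact class names, not subclasses such as custom ClassLoaders.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: returns true if objects of this class carry
// injected native pointers that heap archiving cannot handle, so the
// dump should stop when one is found in an archived object subgraph.
bool has_unsupported_injected_fields(const std::string& class_name) {
  static const std::unordered_set<std::string> excluded = {
    "java/lang/Module",                                      // [1] module_entry
    "java/lang/invoke/ResolvedMethodName",                   // [2] vmholder, vmtarget
    "java/lang/invoke/MethodHandleNatives$CallSiteContext",  // [3] vmdependencies
    "java/lang/invoke/MemberName",                           // [4] vmindex
    "java/lang/ClassLoader",                                 // [5] loader_data
  };
  return excluded.count(class_name) != 0;
}
```

Because this runs in product builds too, a dump would fail loudly instead of silently writing an archive with dangling native pointers.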
>>
>> Sounds good. Can you please also add a comment with an explanation?
>>
>> For ClassLoader and Module, it is worth considering caching the 
>> additional native data some time in the future. Lois had suggested 
>> the Module part a while ago.
>
> I think we can do that if/when we archive Modules directly into the 
> shared heap.
>
>
>
>>
>>
>>
>>
>>
>> ??? Maybe we should backport the check to older versions as well?
>>
>>
>> We should discuss with Andrew Haley for backports to JDK 11 update 
>> releases. Since the current OpenJDK 11 only applies Java heap 
>> archiving to a restricted set of JDK library code, I think it is safe 
>> without the new check.
>>
>> For non-LTS releases, it might not be worthwhile as they may not be 
>> widely used?
>
> I agree. FYI, we (Oracle) have no plan for backporting more types of 
> heap object archiving, so the decision would be up to whoever decides 
> to do so.
>
> Thanks
> - Ioi
>
>
>>
>> Thanks,
>> Jiangli
>>
>>
>> ??? >
>> ??? >>> - src/hotspot/share/memory/metaspace.cpp
>> ??? >>>
>> ??? >>> 1036? ?metaspace_rs = 
>> ReservedSpace(compressed_class_space_size(),
>> ??? >>> 1037 ? _reserve_alignment,
>> ??? >>> 1038 ? large_pages,
>> ??? >>> 1039 ? requested_addr);
>> ??? >>>
>> ??? >>> Please fix indentation.
>> ??? >> Fixed.
>> ??? >>
>> ??? >>> - src/hotspot/share/memory/metaspaceClosure.hpp
>> ??? >>>
>> ??? >>>? ? ?78? ?enum SpecialRef {
>> ??? >>>? ? ?79? ? ?_method_entry_ref
>> ??? >>>? ? ?80? ?};
>> ??? >>>
>> ??? >>> Are there other pointers that are not references to
>> ??? MetaspaceObj? If
>> ??? >>> _method_entry_ref is the only type, it's probably not worth
>> ??? defining
>> ??? >>> SpecialRef?
>> ??? >> There may be more types in the future, so I want to have a
>> ??? stable API
>> ??? >> that can be easily expanded without touching all the code that
>> ??? uses it.
>> ??? >>
>> ??? >>
>> ??? >>> - src/hotspot/share/memory/metaspaceShared.hpp
>> ??? >>>
>> ??? >>>? ? ?42 enum MapArchiveResult {
>> ??? >>>? ? ?43? ?MAP_ARCHIVE_SUCCESS,
>> ??? >>>? ? ?44? ?MAP_ARCHIVE_MMAP_FAILURE,
>> ??? >>>? ? ?45? ?MAP_ARCHIVE_OTHER_FAILURE
>> ??? >>>? ? ?46 };
>> ??? >>>
>> ??? >>> If we want to define different failure types, it's probably 
>> worth
>> ??? >>> using separate types for relocation failure and validation
>> ??? failure.
>> ??? >> For now, I just need to distinguish between MMAP_FAILURE (where
>> ??? I should
>> ??? >> attempt to remap at an alternative address) and OTHER_FAILURE
>> ??? (where the
>> ??? >> CDS archive loading will fail -- due to validation error,
>> ??? insufficient
>> ??? >> memory, etc -- without attempting to remap.)
>> ??? >>
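The distinction described above can be sketched as follows. This is only an illustration under stated assumptions: the enum values are the ones quoted from metaspaceShared.hpp, but map_with_fallback, MapFn and the "address 0 means let the OS choose" convention are hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

enum MapArchiveResult {
  MAP_ARCHIVE_SUCCESS,
  MAP_ARCHIVE_MMAP_FAILURE,
  MAP_ARCHIVE_OTHER_FAILURE
};

// Hypothetical mapper callback: tries to map the archive at `addr`;
// addr == 0 means "let the OS pick any address".
using MapFn = std::function<MapArchiveResult(std::uintptr_t addr)>;

// Returns true if the archive ended up mapped. *relocated is set when the
// mapping succeeded only at an alternative address, in which case the
// embedded pointers would still have to be patched by the caller.
bool map_with_fallback(const MapFn& try_map, std::uintptr_t requested,
                       bool* relocated) {
  assert(relocated != nullptr);
  *relocated = false;
  MapArchiveResult r = try_map(requested);
  if (r == MAP_ARCHIVE_MMAP_FAILURE) {
    // Only an mmap failure is worth a second attempt elsewhere.
    r = try_map(0);
    *relocated = (r == MAP_ARCHIVE_SUCCESS);
  }
  // OTHER_FAILURE (validation error, out of memory, ...) is final.
  return r == MAP_ARCHIVE_SUCCESS;
}
```

Two failure kinds are enough here because the caller only needs to know whether retrying at another address can help.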
>> ??? >>> ---
>> ??? >>>
>> ??? >>>? ? 193? ?static intx _mapping_delta; // FIXME rename
>> ??? >>>
>> ??? >>> How about _relocation_delta?
>> ??? >> Changed as suggested.
>> ??? >>
>> ??? >>> - src/hotspot/share/oops/instanceKlass
>> ??? >>>
>> ??? >>> 1573 bool InstanceKlass::_disable_method_binary_search = false;
>> ??? >>>
>> ??? >>> The use of _disable_method_binary_search is not necessary. You
>> ??? can use
>> ??? >>> DynamicDumpSharedSpaces for the purpose. That would make things
>> ??? >>> cleaner.
>> ??? >> If we always disable the binary search when
>> ??? DynamicDumpSharedSpaces is
>> ??? >> true, it will slow down normal execution of the Java program when
>> ??? >> -XX:ArchiveClassesAtExit has been specified, but the program
>> ??? hasn't exited.
>> ??? > Could you please add some comments to 
>> _disable_method_binary_search
>> ??? > with the above explanation? Thanks.
>>
>> ??? OK
>> ??? >
>> ??? >>> - test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
>> ??? >>>
>> ??? >>>? ? ?76? ? ? ? ? ? ? ? ? ? ?if (name.equals("s0") ||
>> ??? name.equals("s1")) {
>> ??? >>>? ? ?77? ? ? ? ? ? ? ? ? ? ? ?// String regions are listed at
>> ??? the end and
>> ??? >>> they may not be fully occupied.
>> ??? >>>? ? ?78? ? ? ? ? ? ? ? ? ? ? ?break;
>> ??? >>>? ? ?79? ? ? ? ? ? ? ? ? ? ?} else if (name.equals("bm")) {
>> ??? >>>? ? ?80? ? ? ? ? ? ? ? ? ? ? ?// Bitmap space does not have a
>> ??? requested address.
>> ??? >>>? ? ?81? ? ? ? ? ? ? ? ? ? ? ?break;
>> ??? >>>
>> ??? >>> It's not part of your change, but could you please fix line 76
>> ??? - 78
>> ??? >>> since it is trivial. It seems the lines can be removed.
>> ??? >> Removed.
>> ??? >>
>> ??? >>> - /src/hotspot/share/memory/archiveUtils.hpp
>> ??? >>> The file name does not match with the macro '#ifndef
>> ??? >>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please rename
>> ??? >>> archiveUtils.* ? archiveRelocator.hpp and 
>> archiveRelocator.cpp are
>> ??? >>> more descriptive.
>>     >> I named the file archiveUtils.hpp so we can move other misc stuff
>>     >> used by dumping into this file (e.g., DumpRegion, WriteClosure from
>>     >> metaspaceShared.hpp), since these are not used by the majority of
>>     >> the files that use metaspaceShared.hpp.
>> ??? >>
>> ??? >> I fixed the ifdef.
>> ??? >>
>> ??? >>> - src/hotspot/share/memory/archiveUtils.cpp
>> ??? >>>
>> ??? >>>? ? ?36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap,
>> ??? address*
>> ??? >>> ptr_base, address* ptr_end) {
>> ??? >>>? ? ?37? ?assert(_ptrmap == NULL, "initialize only once");
>> ??? >>>? ? ?38? ?_ptr_base = ptr_base;
>> ??? >>>? ? ?39? ?_ptr_end = ptr_end;
>> ??? >>>? ? ?40? ?_compacted = false;
>> ??? >>>? ? ?41? ?_ptrmap = ptrmap;
>> ??? >>>? ? ?42? ?_ptrmap->initialize(12 * M / sizeof(intptr_t)); //
>> ??? default
>> ??? >>> archive is about 12MB.
>> ??? >>>? ? ?43 }
>> ??? >>>
>>     >>> Could we do a better estimate here? We could guesstimate the size
>>     >>> based on the current used class space and metaspace size. It's
>>     >>> okay if a larger bitmap is used, since it can be reduced after all
>>     >>> marking is done.
>> ??? >> The bitmap is automatically expanded when necessary in
>> ??? >> ArchivePtrMarker::mark_pointer(). It's only about 1/32 or 1/64
>> ??? of the
>> ??? >> total archive size, so even if we do expand, the cost will be
>> ??? trivial.
>> ??? > The initial value is based on the default CDS archive. When 
>> dealing
>> ??? > with a really large archive, it would have to re-grow many times.
>> ??? > Also, using a hard-coded value is less desirable.
>>
>> ??? OK, I changed it to the following
>>
>>        // Use this as initial guesstimate. We should need less space in the
>>        // archive, but if we're wrong the bitmap will be expanded automatically.
>>        size_t estimated_archive_size = MetaspaceGC::capacity_until_GC();
>>        // But set it smaller in debug builds so we always test the expansion code.
>>        // (Default archive is about 12MB).
>>        DEBUG_ONLY(estimated_archive_size = 6 * M);
>>
>>        // We need one bit per pointer in the archive.
>>        _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
>>
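To illustrate the sizing logic quoted above: with one bit per pointer-sized word, the bitmap needs archive_bytes / sizeof(intptr_t) bits, which is 1/64 of the archive size on a 64-bit VM. The sketch below is a stand-alone approximation, assuming a doubling growth policy; PtrBitmap and mark are hypothetical names, not the ArchivePtrMarker API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical growable pointer bitmap: one bit per pointer-sized slot.
struct PtrBitmap {
  std::vector<bool> bits;

  // Initial size comes from an estimate of the archive size in bytes.
  explicit PtrBitmap(std::size_t estimated_archive_bytes)
      : bits(estimated_archive_bytes / sizeof(std::intptr_t), false) {}

  // Mark slot_index as holding a pointer, expanding the bitmap if the
  // initial estimate turned out to be too small.
  void mark(std::size_t slot_index) {
    if (slot_index >= bits.size()) {
      std::size_t new_size = bits.empty() ? 1 : bits.size();
      while (new_size <= slot_index) new_size *= 2;  // assumed growth policy
      bits.resize(new_size, false);
    }
    bits[slot_index] = true;
  }
};
```

A deliberately small initial estimate, as in the debug-build setting above, exercises the expansion path on every run.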
>>
>> ??? Thanks!
>> ??? - Ioi
>>
>> ??? >
>> ??? >>>
>> ??? >>>
>> ??? >>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou
>> ??? <jianglizhou at google.com <mailto:jianglizhou at google.com>> wrote:
>> ??? >>>> Hi Ioi,
>> ??? >>>>
>> ??? >>>> This is another great step for CDS usability improvement.
>> ??? Thank you!
>> ??? >>>>
>> ??? >>>> I have a high level question (or request): could we consider
>> ??? >>>> separating the relocation work for 'direct' class metadata
>> ??? from other
>> ??? >>>> types of metadata (such as the shared system dictionary,
>> ??? symbol table,
>> ??? >>>> etc)? Initially we only relocate the tables and other
>> ??? archived global
>> ??? >>>> data. When each archived class is being loaded, we can
>> ??? relocate all
>> ??? >>>> the pointers within the current class. We could find the
>> ??? segment (for
>> ??? >>>> the current class) in the bitmap and update the pointers
>> ??? within the
>> ??? >>>> segment. That way we can reduce initial startup costs and
>> ??? also avoid
>>     >>>> relocating class data that's not used at runtime. In some real
>>     >>>> world large systems, an archive may contain an extremely large
>>     >>>> number of classes.
>> ??? >>>>
>> ??? >>>> Following are partial review comments so we can move things
>> ??? forward.
>> ??? >>>> Still going through the rest of the changes.
>> ??? >>>>
>> ??? >>>> - src/hotspot/share/classfile/javaClasses.cpp
>> ??? >>>>
>> ??? >>>> 1218 void
>> ??? java_lang_Class::update_archived_mirror_native_pointers(oop
>> ??? >>>> archived_mirror) {
>> ??? >>>> 1219? ?Klass* k =
>> ??? ((Klass*)archived_mirror->metadata_field(_klass_offset));
>> ??? >>>> 1220? ?if (k != NULL) { // k is NULL for the primitive
>> ??? classes such as
>> ??? >>>> java.lang.Byte::TYPE <<<<<<<<<<<
>> ??? >>>> 1221 ?archived_mirror->metadata_field_put(_klass_offset,
>> ??? >>>> (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
>> ??? >>>> 1222? ?}
>> ??? >>>> 1223 ...
>> ??? >>>>
>> ??? >>>> Primitive type mirrors are handled separately. Could you
>> ??? please verify
>> ??? >>>> if this call path happens for primitive type mirror?
>> ??? >>>>
>>     >>>> To answer my question above, it looks like you added the
>>     >>>> following, which is to be used for primitive type mirrors. That
>>     >>>> seems to be the reason why update_archived_mirror_native_pointers
>>     >>>> is trying to also cover primitive types. It is better to have a
>>     >>>> separate API for primitive type mirrors, which is cleaner. We can
>>     >>>> then also replace the check at line 1220 above with an assert for
>>     >>>> regular mirrors.
>> ??? >>>>
>> ??? >>>> +void ReadClosure::do_mirror_oop(oop *p) {
>> ??? >>>> +? do_oop(p);
>> ??? >>>> +? oop mirror = *p;
>> ??? >>>> +? if (mirror != NULL) {
>> ??? >>>> +
>> java_lang_Class::update_archived_mirror_native_pointers(mirror);
>> ??? >>>> +? }
>> ??? >>>> +}
>> ??? >>>> +
>> ??? >>>>
>> ??? >>>> How about renaming update_archived_mirror_native_pointers to
>> ??? >>>> update_archived_mirror_klass_pointers.
>> ??? >>>>
>> ??? >>>> It would be good to pass the current klass as an argument. 
>> We can
>> ??? >>>> verify the relocated pointer matches with the current klass
>> ??? pointer.
>> ??? >>>>
>> ??? >>>> We should also check if relocation is necessary before
>> ??? spending cycles
>> ??? >>>> to obtain the klass pointer from the mirror.
>> ??? >>>>
>> ??? >>>> 1252 ?update_archived_mirror_native_pointers(m);
>> ??? >>>> 1253
>> ??? >>>> 1254? ?// mirror is archived, restore
>> ??? >>>> 1255 ?assert(HeapShared::is_archived_object(m), "must be 
>> archived
>> ??? >>>> mirror object");
>> ??? >>>> 1256? ?Handle mirror(THREAD, m);
>> ??? >>>>
>> ??? >>>> Could we move the line at 1252 after the assert at line 1255?
>> ??? >>>>
>> ??? >>>> - src/hotspot/share/include/cds.h
>> ??? >>>>
>> ??? >>>>? ? ?47? ?int? ? ?_mapped_from_file;? // Is this region mapped
>> ??? from a file?
>> ??? >>>>? ? ?48? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?// If false, this 
>> region was
>> ??? >>>> initialized using os::read().
>> ??? >>>>
>> ??? >>>> Is the new field truly needed? It seems we could use
>> ??? _mapped_base to
>> ??? >>>> determine if a region is mapped or not?
>> ??? >>>>
>> ??? >>>> - src/hotspot/share/memory/dynamicArchive.cpp
>> ??? >>>>
>> ??? >>>> Could you please remove the debugging print code in
>> ??? >>>> dynamic_dump_method_comparator? Or convert those to logging
>> ??? output if
>> ??? >>>> they are helpful.
>> ??? >>>>
>> ??? >>>> Will send out the rest of the review comments later.
>> ??? >>>>
>> ??? >>>> Best,
>> ??? >>>>
>> ??? >>>> Jiangli
>> ??? >>>>
>> ??? >>>>
>> ??? >>>>
>> ??? >>>>
>> ??? >>>> On Thu, Oct 10, 2019 at 6:00 PM Ioi Lam <ioi.lam at oracle.com
>> ??? <mailto:ioi.lam at oracle.com>> wrote:
>> ??? >>>>> Bug:
>> ??? >>>>> https://bugs.openjdk.java.net/browse/JDK-8231610
>> ??? >>>>>
>> ??? >>>>> Webrev:
>> ??? >>>>>
>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v01/
>> ??? >>>>>
>> ??? >>>>> Design:
>> ??? >>>>>
>> http://cr.openjdk.java.net/~iklam/jdk14/design/8231610-relocate-cds-archive.txt
>> ??? >>>>>
>> ??? >>>>>
>> ??? >>>>> Overview:
>> ??? >>>>>
>> ??? >>>>> The CDS archive is mmaped to a fixed address range 
>> (starting at
>> ??? >>>>> SharedBaseAddress, usually 0x800000000). Previously, if this
>> ??? >>>>> requested address range is not available (usually due to 
>> Address
>> ??? >>>>> Space Layout Randomization (ASLR) [2]), the JVM will give 
>> up and
>> ??? >>>>> will load classes dynamically using class files.
>> ??? >>>>>
>> ??? >>>>> [a] This causes slow down in JVM start-up.
>> ??? >>>>> [b] Handling of mapping failures causes unnecessary
>> ??? complication in
>> ??? >>>>>? ? ? ? the CDS tests.
>> ??? >>>>>
>> ??? >>>>> Here are some preliminary benchmarking results (using
>> ??? default CDS archive,
>> ??? >>>>> running helloworld):
>> ??? >>>>>
>> ??? >>>>> (a) 47.1ms (CDS enabled, mapped at requested addr)
>> ??? >>>>> (b) 53.8ms (CDS enabled, mapped at alternate addr)
>> ??? >>>>> (c) 86.2ms (CDS disabled)
>> ??? >>>>>
>> ??? >>>>> The small degradation in (b) is caused by the relocation of
>> ??? >>>>> absolute pointers embedded in the CDS archive. However, it is
>> ??? >>>>> still a big improvement over case (c)
>> ??? >>>>>
>> ??? >>>>> Please see the design doc (link above) for details.
>> ??? >>>>>
>> ??? >>>>> Thanks
>> ??? >>>>> - Ioi
>> ??? >>>>>
>>
>


From mikhailo.seledtsov at oracle.com  Fri Nov  8 00:03:00 2019
From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com)
Date: Thu, 7 Nov 2019 16:03:00 -0800
Subject: RFR 8230055: ModuleStressGC.java times out on Win*
In-Reply-To: <e7c8b0c1-49fe-ac92-5e18-53f63df97e02@oracle.com>
References: <e7c8b0c1-49fe-ac92-5e18-53f63df97e02@oracle.com>
Message-ID: <82556da6-5e85-eb8e-d6b2-fa8cec141c73@oracle.com>

Looks good to me,

Misha

On 11/7/19 2:03 PM, Harold Seigel wrote:
> Hi,
>
> Please review this small change to help prevent test 
> runtime/modules/ModuleStress/ModuleStressGC.java from timing out. The 
> change reduces the number of loop iterations in the test by 40%.
>
> Open Webrev: 
> http://cr.openjdk.java.net/~hseigel/bug_8230055/webrev/index.html
>
> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8230055
>
> The change was tested by running Mach5 tier2 tests on Linux-x64, 
> Solaris, Windows, and Mac OS X.
>
> Thanks, Harold
>

From ioi.lam at oracle.com  Fri Nov  8 00:22:14 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 7 Nov 2019 16:22:14 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <7f0d558c-c161-ce41-90c6-ede8fddcf8b8@oracle.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
 <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
 <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>
 <7f0d558c-c161-ce41-90c6-ede8fddcf8b8@oracle.com>
Message-ID: <99030987-a044-53fb-784b-62408333137a@oracle.com>

Hi Coleen,

Thanks for the review. Here's a webrev that incorporates your 
suggestions:

http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v06-delta/

Please see comments in-line

On 11/7/19 2:46 PM, coleen.phillimore at oracle.com wrote:
> Hi, I've done a more high level code review of this and it looks good!
>
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/src/hotspot/share/memory/archiveUtils.hpp.html 
>
>
> I think these classes require comments on what they do and why. The 
> comments you sent me offline look good.

I added more comments for ArchivePtrMarker::_compacted per your offline 
request.

>
> Also .hpp files shouldn't include .inline.hpp files, like 
> bitMap.inline.hpp. Hopefully it's just a case of moving do_bit() into 
> the cpp file.

I moved the do_bit() function into archiveUtils.inline.hpp, since it is 
used by 3 .cpp files, and performance is important.
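
To illustrate why do_bit() is performance sensitive, here is a minimal 
sketch of bitmap-driven pointer patching. It is not the HotSpot code: 
PtrBitmap, Relocator and relocate_example are made-up names, and a 
std::vector<bool> stands in for the real bitmap. The idea it shows is 
that do_bit() runs once per marked pointer slot, so it should be 
inlinable into each caller's iteration loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the pointer bitmap: bit i set means that
// slot i of the archive holds a pointer that must be patched.
struct PtrBitmap {
  std::vector<bool> bits;

  // Calls closure.do_bit(i) for every set bit, stopping early if the
  // closure returns false.
  template <typename Closure>
  void iterate(Closure& closure) const {
    for (std::size_t i = 0; i < bits.size(); i++) {
      if (bits[i] && !closure.do_bit(i)) {
        return;
      }
    }
  }
};

// Hypothetical relocator: adds 'delta' to every marked slot and checks
// that the old value lies in the expected [valid_old_base, valid_old_end).
struct Relocator {
  std::uintptr_t* slots;
  std::uintptr_t valid_old_base;
  std::uintptr_t valid_old_end;
  std::intptr_t delta;

  // Defined in the class body, so it is implicitly inline.
  bool do_bit(std::size_t offset) {
    std::uintptr_t old_ptr = slots[offset];
    assert(old_ptr >= valid_old_base && old_ptr < valid_old_end);
    slots[offset] = old_ptr + delta;
    return true;  // keep iterating
  }
};

// Patches a three-slot "archive" in which slots 0 and 2 hold pointers.
inline std::vector<std::uintptr_t> relocate_example() {
  std::vector<std::uintptr_t> slots = {0x1000, 42, 0x1010};
  PtrBitmap map{{true, false, true}};
  Relocator r{slots.data(), 0x1000, 0x2000, 0x500};
  map.iterate(r);
  return slots;
}
```

Slot 1 is not marked in the bitmap, so its value (42) is left alone 
even though it is never range-checked.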

>
> I wonder if the exception list of classes to exclude should be a 
> function in javaClasses.hpp/cpp where the explanation would make more 
> sense? I.e., bool 
> JavaClasses::has_injected_native_pointers(InstanceKlass* k);

I moved the checking code to javaClasses.cpp. Since we do (partially) 
support java.lang.Class, which has injected native pointers, I named the 
function JavaClasses::is_supported_for_archiving instead. I also 
massaged the comments a little for clarification.
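
As a rough sketch of the shape of such a check (not the code in the 
webrev): the real function takes a klass, while this illustration 
compares class-name strings so that it stands alone. The five names are 
the classes with injected native pointers discussed further down in this 
thread. A name comparison would not catch subclasses of ClassLoader, so 
treat this purely as an illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, string-based stand-in for
// JavaClasses::is_supported_for_archiving(). Returns false for classes
// whose instances carry injected native pointers that cannot be archived.
inline bool is_supported_for_archiving(const char* class_name) {
  static const char* const unsupported[] = {
    "java/lang/Module",                                       // module_entry
    "java/lang/invoke/ResolvedMethodName",                    // vmholder, vmtarget
    "java/lang/invoke/MethodHandleNatives$CallSiteContext",   // vmdependencies
    "java/lang/invoke/MemberName",                            // vmindex
    "java/lang/ClassLoader"                                   // loader_data
  };
  for (std::size_t i = 0; i < sizeof(unsupported) / sizeof(unsupported[0]); i++) {
    if (std::strcmp(class_name, unsupported[i]) == 0) {
      return false;
    }
  }
  // java/lang/Class also has injected native pointers, but it is
  // (partially) supported, so it is deliberately not in the list.
  return true;
}
```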

>
> Is there already an RFE to move the DumpSharedSpaces output from 
> tty->print() to log_info() ?

I created https://bugs.openjdk.java.net/browse/JDK-8233826 (Change CDS 
dumping tty->print_cr() to unified logging).

Thanks
- Ioi

>
> Thanks,
> Coleen
>
> On 11/6/19 4:17 PM, Ioi Lam wrote:
>> Hi Jiangli,
>>
>> I've uploaded the webrev after integrating your comments:
>>
>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/ 
>>
>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05-delta/ 
>>
>>
>> Please see more replies below:
>>
>>
>> On 11/4/19 5:52 PM, Jiangli Zhou wrote:
>>> On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam <ioi.lam at oracle.com 
>>> <mailto:ioi.lam at oracle.com>> wrote:
>>>
>>> ??? Hi Jiangli,
>>>
>>> ??? Thank you so much for spending time reviewing this RFE!
>>>
>>> ??? On 11/3/19 6:34 PM, Jiangli Zhou wrote:
>>> ??? > Hi Ioi,
>>> ??? >
>>> ??? > Sorry for the delay again. Will try to put this on the top of my
>>> ??? list
>>> ??? > next week and reduce the turn-around time. The updates look 
>>> good in
>>> ??? > general.
>>> ??? >
>>> ??? > We might want to have a better strategy when choosing metadata
>>> ??? > relocation address (when relocation is needed). Some
>>> ??? > applications/benchmarks may be more sensitive to cache 
>>> locality and
>>> ??? > memory/data layout. There was a bug,
>>> ??? > https://bugs.openjdk.java.net/browse/JDK-8213713 that caused 
>>> 1G gap
>>> ??? > between Java heap data and metadata before JDK 12. The gap 
>>> seemed to
>>> ??? > cause a small but noticeable runtime effect in one case that I 
>>> came
>>> ??? > across.
>>>
>>> ??? I guess you're saying we should try to relocate the archive into
>>> ??? somewhere under 32GB?
>>>
>>>
>>> I don't yet have sufficient data to determine whether mapping in the 
>>> low 32G produces better runtime performance. I experimented with that, 
>>> but didn't see a noticeable difference compared to mapping at 
>>> the current default address. It doesn't hurt, I think. So it may be 
>>> a better choice than relocating to a random address in the high 32G 
>>> space (when the Java heap is in the low 32G address space).
>>
>> Maybe we should reconsider this when we have more concrete data for 
>> the benefits of moving the compressed class space to under 32G.
>>
>> Please note that in metaspace.cpp, when CDS is disabled and the VM 
>> fails to allocate the class space at the requested address 
>> (0x7c000000 for 16GB heap), it also just allocates from a random 
>> address (without trying to search under 32GB):
>>
>> http://hg.openjdk.java.net/jdk/jdk/annotate/e767fa6a1d45/src/hotspot/share/memory/metaspace.cpp#l1128 
>>
>>
>> This code has been there since 2013 and we have not seen any issues.
>>
>>
>>
>>
>>>
>>>     Could you elaborate more on the performance issue, especially
>>>     regarding cache locality? I looked at JDK-8213713 but it doesn't
>>>     mention performance.
>>>
>>>
>>> When enabling CDS we noticed a small runtime overhead in JDK 11 
>>> recently with a benchmark. After I backported JDK-8213713 to 11, it 
>>> seemed to reduce the runtime overhead that the benchmark was 
>>> experiencing.
>>>
>>>
>>> ??? Also, by default, we have non-zero narrow_klass_base and
>>> ??? narrow_klass_shift = 3, and archive relocation doesn't change that:
>>>
>>> ??? $ java -Xlog:cds=debug -version
>>> ??? ... narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
>>> ??? $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
>>> ??? ... narrow_klass_base = 0x00007f1e8b499000, narrow_klass_shift = 3
>>>
>>>     We always use narrow_klass_shift due to this:
>>>
>>>        // CDS uses LogKlassAlignmentInBytes for narrow_klass_shift. See
>>>        // MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
>>>        // how dump time narrow_klass_shift is set. Although, CDS can work
>>>        // with zero-shift mode also, to be consistent with AOT it uses
>>>        // LogKlassAlignmentInBytes for klass shift so archived java heap objects
>>>        // can be used at same time as AOT code.
>>>        if (!UseSharedSpaces
>>>            && (uint64_t)(higher_address - lower_base) <= UnscaledClassSpaceMax) {
>>>          CompressedKlassPointers::set_shift(0);
>>>        } else {
>>>          CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
>>>        }
>>>
>>>
>>> Right. If we relocate to the low 32G space, we need to make sure that 
>>> the range containing the mapped class data and class space is 
>>> encodable.
>>>
>>>
>>> ??? > Here are some additional comments (minor).
>>> ??? >
>>> ??? > Could you please fix the long lines in the following?
>>> ??? >
>>>     > 1237 void java_lang_Class::update_archived_primitive_mirror_native_pointers(oop archived_mirror) {
>>>     > 1238   if (MetaspaceShared::relocation_delta() != 0) {
>>>     > 1239     assert(archived_mirror->metadata_field(_klass_offset) == NULL, "must be for primitive class");
>>>     > 1240
>>>     > 1241     Klass* ak = ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
>>>     > 1242     if (ak != NULL) {
>>>     > 1243       archived_mirror->metadata_field_put(_array_klass_offset, (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
>>>     > 1244     }
>>>     > 1245   }
>>>     > 1246 }
>>>     >
>>>     > src/hotspot/share/memory/dynamicArchive.cpp
>>>     >
>>>     >  889   Thread* THREAD = Thread::current();
>>>     >  890   Method::sort_methods(ik->methods(), /*set_idnums=*/true, dynamic_dump_method_comparator);
>>>     >  891   if (ik->default_methods() != NULL) {
>>>     >  892     Method::sort_methods(ik->default_methods(), /*set_idnums=*/false, dynamic_dump_method_comparator);
>>>     >  893   }
>>> ??? >
>>>
>>> ??? OK will do.
>>>
>>> ??? > Please see inlined comments below.
>>> ??? >
>>> ??? > On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com
>>> ??? <mailto:ioi.lam at oracle.com>> wrote:
>>> ??? >> Hi Jiangli,
>>> ??? >>
>>> ??? >> Thanks for the review. I've updated the patch according to your
>>> ??? comments:
>>> ??? >>
>>> ??? >>
>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/ 
>>>
>>> ??? >>
>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/ 
>>>
>>> ??? >>
>>> ??? >> (the delta is on top of 8231610-relocate-cds-archive.v03.delta
>>> ??? in my
>>> ??? >> reply to Calvin's comments).
>>> ??? >>
>>> ??? >> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
>>> ??? >>> Hi Ioi,
>>> ??? >>>
>>> ??? >>> Sorry for the delay. Here are my remaining comments.
>>> ??? >>>
>>> ??? >>> - src/hotspot/share/memory/dynamicArchive.cpp
>>> ??? >>>
>>> ??? >>> 128? ?static intx _method_comparator_name_delta;
>>> ??? >>>
>>>     >>> The name of the above variable is confusing. It's the value of
>>>     >>> _buffer_to_target_delta. It's better to use _buffer_to_target_delta
>>>     >>> directly.
>>> ??? >> _buffer_to_target_delta is a non-static field, but
>>> ??? >> dynamic_dump_method_comparator() must be a static function so
>>> ??? it can't
>>> ??? >> use the non-static field easily.
>>> ??? >
>>>     > It sounds like an issue. _buffer_to_target_delta was made
>>>     > non-static mostly because we might support more than one dynamic
>>>     > archive in the future. However, today's usages bake in an
>>>     > assumption that _buffer_to_target_delta is a singleton value. It is
>>>     > cleaner to either make _buffer_to_target_delta a static variable
>>>     > for now, or add an access API in DynamicArchiveBuilder to allow
>>>     > other code to use the value properly and correctly.
>>>
>>> ??? OK, I'll move it to a static variable.
>>>
>>> ??? >
>>>     >>> Also, we can do a quick pointer comparison of 'a_name' and
>>>     >>> 'b_name' first before adjusting the pointers.
>>>     >> I added this:
>>>     >>
>>>     >>       if (a_name == b_name) {
>>>     >>         return 0;
>>>     >>       }
>>> ??? >>
>>> ??? >>> ---
>>> ??? >>>
>>> ??? >>> 934 void DynamicArchiveBuilder::relocate_buffer_to_target() {
>>> ??? >>> ...
>>> ??? >>>? ? 944
>>> ??? >>>? ? 945 ?ArchivePtrMarker::compact(relocatable_base,
>>> ??? relocatable_end);
>>> ??? >>> ...
>>> ??? >>>
>>> ??? >>>? ? 974? ? ?SharedDataRelocator patcher((address*)patch_base,
>>> ??? >>> (address*)patch_end, valid_old_base, valid_old_end,
>>> ??? >>>? ? 975 ?valid_new_base, valid_new_end, addr_delta);
>>> ??? >>>? ? 976 ?ArchivePtrMarker::ptrmap()->iterate(&patcher);
>>> ??? >>>
>>>     >>> Could we reduce the number of data re-iterations to help archive
>>>     >>> dumping performance? The ArchivePtrMarker::compact operation can be
>>>     >>> combined with the patching iteration. The ArchivePtrMarker::compact
>>>     >>> API can then be removed.
>>>     >> That's a good idea. I implemented it using a template parameter so that
>>>     >> we can have max performance when relocating the archive at run time.
>>> ??? >>
>>> ??? >> I added comments to explain why the relocation is done here. The
>>> ??? >> relocation is pretty rare (only when the base archive was not
>>> ??? mapped at
>>> ??? >> the default location).
>>> ??? >>
>>> ??? >>> ---
>>> ??? >>>
>>> ??? >>>? ? 967? ? ?address valid_new_base =
>>> ??? >>> (address)Arguments::default_SharedBaseAddress();
>>> ??? >>>? ? 968? ? ?address valid_new_end? = valid_new_base +
>>> ??? base_plus_top_size;
>>> ??? >>>
>>> ??? >>> The debugging only code can be included under #ifdef ASSERT.
>>> ??? >> These values are actually also used in debug logging so they
>>> ??? can't be
>>> ??? >> ifdef'ed out.
>>> ??? >>
>>>     >> Also, the C++ compiler is pretty good at eliding code that's not
>>>     >> actually used. If I comment out all the logging code in
>>>     >> DynamicArchiveBuilder::relocate_buffer_to_target() and
>>>     >> SharedDataRelocator, gcc elides all the unused fields and their
>>>     >> assignments. So no code is generated for this, etc.
>>> ??? >>
>>> ??? >>? ? ? ?address valid_new_base =
>>> ??? >> (address)Arguments::default_SharedBaseAddress();
>>> ??? >>
>>> ??? >> Since #ifdef ASSERT makes the code harder to read, I think we
>>> ??? should use
>>> ??? >> it only when really necessary.
>>> ??? > It seems cleaner to get rid of these debugging only variables, by
>>> ??? > using 'relocatable_base' and
>>> ??? > '(address)Arguments::default_SharedBaseAddress()' in the logging
>>> ??? code.
>>>
>>>     SharedDataRelocator is used in 3 different situations. These six
>>>     variables (patch_base, patch_end, valid_old_base, valid_old_end,
>>>     valid_new_base, valid_new_end) describe what is being patched, and
>>>     what the expectations are, for each situation. The code will be hard
>>>     to understand without them.
>>>
>>> ??? Please note there's also logging code in the SharedDataRelocator
>>> ??? constructor that prints out these values.
>>>
>>> ??? I think I'll just remove the 'debug only' comment to avoid 
>>> confusion.
>>>
>>>
>>> Ok.
>>>
>>>
>>> ??? >
>>> ??? >>> ---
>>> ??? >>>
>>> ??? >>>? ? 993
>>> ?dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
>>> ??? >>>
>>> ??? >>> We could combine the archived heap data bitmap into the new
>>> ??? region as
>>> ??? >>> well? It can be handled as a separate RFE.
>>> ??? >> I've filed https://bugs.openjdk.java.net/browse/JDK-8233093
>>> ??? >>
>>> ??? >>> - src/hotspot/share/memory/filemap.cpp
>>> ??? >>>
>>> ??? >>> 1038? ? ?if (is_static()) {
>>> ??? >>> 1039? ? ? ?if (errno == ENOENT) {
>>> ??? >>> 1040? ? ? ? ?// Not locating the shared archive is ok.
>>> ??? >>> 1041? ? ? ? ?fail_continue("Specified shared archive not found
>>> ??? (%s).",
>>> ??? >>> _full_path);
>>> ??? >>> 1042? ? ? ?} else {
>>> ??? >>> 1043? ? ? ? ?fail_continue("Failed to open shared archive file
>>> ??? (%s).",
>>> ??? >>> 1044 ?os::strerror(errno));
>>> ??? >>> 1045? ? ? ?}
>>> ??? >>> 1046? ? ?} else {
>>> ??? >>> 1047? ? ? ?log_warning(cds, dynamic)("specified dynamic archive
>>> ??? >>> doesn't exist: %s", _full_path);
>>> ??? >>> 1048? ? ?}
>>> ??? >>>
>>>     >>> If the top layer is explicitly specified by the user, a warning does
>>>     >>> not seem to be proper behavior if the VM fails to open the archive
>>>     >>> file.
>>>     >>>
>>>     >>> It might be better to handle the relocation-unrelated code in a
>>>     >>> separate changeset and track it with a separate RFE.
>>> ??? >> This code was moved from
>>> ??? >>
>>> ??? >>
>>> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070 
>>>
>>> ??? >>
>>>     >> so I am not changing the behavior. If you want, we can file an RFE to
>>>     >> change the behavior.
>>>     > Ok. A new RFE sounds like the right thing to re-evaluate the usage
>>>     > issue here. Thanks.
>>>
>>> ??? I created https://bugs.openjdk.java.net/browse/JDK-8233446
>>>
>>> ??? >>> ---
>>> ??? >>>
>>> ??? >>> 1148 void FileMapInfo::write_region(int region, char* base,
>>> ??? size_t size,
>>> ??? >>> 1149? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? bool read_only, bool
>>> ??? allow_exec) {
>>> ??? >>> ...
>>> ??? >>> 1154
>>> ??? >>> 1155? ?if (region == MetaspaceShared::bm) {
>>> ??? >>> 1156? ? ?target_base = NULL;
>>> ??? >>> 1157? ?} else if (DynamicDumpSharedSpaces) {
>>> ??? >>>
>>> ??? >>> It's not too clear to me how the bitmap (bm) region is handled
>>> ??? for the
>>> ??? >>> base layer and top layer. Could you please explain?
>>>     >> The bm regions for both layers are mapped at an address picked by the OS:
>>> ??? >>
>>> ??? >> char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
>>> ??? >>? ? ?FileMapRegion* si = space_at(MetaspaceShared::bm);
>>> ??? >>? ? ?bitmap_size = si->used_aligned();
>>> ??? >>? ? ?bool read_only = true, allow_exec = false;
>>> ??? >>? ? ?char* requested_addr = NULL; // allow OS to pick any 
>>> location
>>> ??? >>? ? ?char* bitmap_base = os::map_memory(_fd, _full_path,
>>> ??? si->file_offset(),
>>> ??? >> requested_addr, bitmap_size,
>>> ??? >> read_only, allow_exec);
>>> ??? >>
>>>     > Ok, after staring at the code for a few seconds I saw that's intended.
>>>     > If the current region is 'bm', then the 'target_base' is NULL
>>>     > regardless of whether it's a static or dynamic archive. Otherwise, the
>>>     > 'target_base' is handled differently for the static and dynamic case.
>>>     > The following would be cleaner and more reliable.
>>>     >
>>>     >     char* target_base = NULL;
>>>     >
>>>     >     // The target_base is NULL for 'bm' region.
>>>     >     if (region != MetaspaceShared::bm) {
>>>     >       if (DynamicDumpSharedSpaces) {
>>>     >         assert(!HeapShared::is_heap_region(region), "dynamic archive doesn't support heap regions");
>>>     >         target_base = DynamicArchive::buffer_to_target(base);
>>>     >       } else {
>>>     >         target_base = base;
>>>     >       }
>>>     >     }
>>>
>>>     How about this?
>>>
>>>        char* target_base;
>>>        if (region == MetaspaceShared::bm) {
>>>          target_base = NULL; // always NULL for bm region.
>>>        } else {
>>>          if (DynamicDumpSharedSpaces) {
>>>            assert(!HeapShared::is_heap_region(region), "dynamic archive doesn't support heap regions");
>>>            target_base = DynamicArchive::buffer_to_target(base);
>>>          } else {
>>>            target_base = base;
>>>          }
>>>        }
>>>
>>>
>>> No objection if you prefer the extra 'else' block.
>>>
>>>
>>> ??? >
>>> ??? >>> ---
>>> ??? >>>
>>> ??? >>> 1362
>>> ?DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);) 
>>>
>>> ??? >>>
>>> ??? >>> Could you please explain the above?
>>> ??? >> I added the comments
>>> ??? >>
>>> ??? >>? ? ?// Make sure we don't attempt to use
>>> ??? header()->mapped_base_address()
>>> ??? >> unless
>>> ??? >>? ? ?// it's been successfully mapped.
>>> ??? >>
>>> DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);) 
>>>
>>> ??? >>
>>> ??? >>> ---
>>> ??? >>>
>>> ??? >>> 1359? ?FileMapRegion* last_region = NULL;
>>> ??? >>>
>>> ??? >>> 1371? ? ?if (last_region != NULL) {
>>> ??? >>> 1372? ? ? ?// Ensure that the OS won't be able to allocate new
>>> ??? memory
>>> ??? >>> spaces between any mapped
>>> ??? >>> 1373? ? ? ?// regions, or else it would mess up the simple
>>> ??? comparision
>>> ??? >>> in MetaspaceObj::is_shared().
>>> ??? >>> 1374? ? ? ?assert(si->mapped_base() == 
>>> last_region->mapped_end(),
>>> ??? >>> "must have no gaps");
>>> ??? >>>
>>> ??? >>> 1379? ? ?last_region = si;
>>> ??? >>>
>>>     >>> Can you please place 'last_region' related code under #ifdef ASSERT?
>>>     >> I think that will make the code more cluttered. The compiler will
>>>     >> optimize that away.
>>>     > It's cleaner to define debugging-only variables for debugging-only
>>>     > builds. You can wrap it and related usage with DEBUG_ONLY.
>>>
>>> ??? OK, will do.
>>>
>>> ??? >
>>> ??? >>> ---
>>> ??? >>>
>>> ??? >>> 1478 char* FileMapInfo::map_relocation_bitmap(size_t&
>>> ??? bitmap_size) {
>>> ??? >>> 1479? ?FileMapRegion* si = space_at(MetaspaceShared::bm);
>>> ??? >>> 1480? ?bitmap_size = si->used_aligned();
>>> ??? >>> 1481? ?bool read_only = true, allow_exec = false;
>>> ??? >>> 1482? ?char* requested_addr = NULL; // allow OS to pick any
>>> ??? location
>>> ??? >>> 1483? ?char* bitmap_base = os::map_memory(_fd, _full_path,
>>> ??? si->file_offset(),
>>> ??? >>> 1484 requested_addr, bitmap_size,
>>> ??? >>> read_only, allow_exec);
>>> ??? >>>
>>> ??? >>> We need to handle mapping failure here.
>>> ??? >> It's handled here:
>>> ??? >>
>>> ??? >> bool FileMapInfo::relocate_pointers(intx addr_delta) {
>>> ??? >>? ? ?log_debug(cds, reloc)("runtime archive relocation start");
>>> ??? >>? ? ?size_t bitmap_size;
>>> ??? >>? ? ?char* bitmap_base = map_relocation_bitmap(bitmap_size);
>>> ??? >>? ? ?if (bitmap_base != NULL) {
>>> ??? >>? ? ?...
>>> ??? >>? ? ?} else {
>>> ??? >>? ? ? ?log_error(cds)("failed to map relocation bitmap");
>>> ??? >>? ? ? ?return false;
>>> ??? >>? ? ?}
>>> ??? >>
>>> ??? > 'bitmap_base' is used immediately after map_memory(). So the 
>>> check
>>> ??? > needs to be done immediately after map_memory(), but not in the
>>> ??? caller
>>> ??? > of map_relocation_bitmap().
>>> ??? >
>>> ??? > 1490? ?char* bitmap_base = os::map_memory(_fd, _full_path,
>>> ??? si->file_offset(),
>>> ??? > 1491 requested_addr, bitmap_size,
>>> ??? > read_only, allow_exec);
>>> ??? > 1492
>>> ??? > 1493? ?if (VerifySharedSpaces && bitmap_base != NULL &&
>>> ??? > !region_crc_check(bitmap_base, bitmap_size, si->crc())) {
>>>
>>> ??? OK, I'll fix that.
>>>
>>> ??? >
>>> ??? >
>>> ??? >>> ---
>>> ??? >>>
>>>     >>> 1513     // debug only -- the current value of the pointers to be patched must be within this
>>>     >>> 1514     // range (i.e., must be between the requested base address, and the end of the current archive).
>>>     >>> 1515     // Note: top archive may point to objects in the base archive, but not the other way around.
>>>     >>> 1516     address valid_old_base = (address)header()->requested_base_address();
>>>     >>> 1517     address valid_old_end  = valid_old_base + mapping_end_offset();
>>> ??? >>>
>>> ??? >>> Please place all FileMapInfo::relocate_pointers debugging only
>>> ??? code
>>> ??? >>> under #ifdef ASSERT.
>>> ??? >> Ditto about ifdef ASSERT
>>> ??? >>
>>> ??? >>> - src/hotspot/share/memory/heapShared.cpp
>>> ??? >>>
>>> ??? >>>? ? 441 void
>>> ??? HeapShared::initialize_from_archived_subgraph(Klass* k) {
>>> ??? >>>? ? 442? ?if (!open_archive_heap_region_mapped() ||
>>> ??? !MetaspaceObj::is_shared(k)) {
>>> ??? >>>? ? 443? ? ?return; // nothing to do
>>> ??? >>>? ? 444? ?}
>>> ??? >>>
>>> ??? >>> When do we call HeapShared::initialize_from_archived_subgraph
>>> ??? for a
>>> ??? >>> klass that's not shared?
>>> ??? >> I've removed the !MetaspaceObj::is_shared(k). I probably added
>>> ??? that for
>>> ??? >> debugging purposes only.
>>> ??? >>
>>> ??? >>>? ? 616? ?DEBUG_ONLY({
>>> ??? >>>? ? 617? ? ? ?Klass* klass = orig_obj->klass();
>>> ??? >>>? ? 618? ? ? ?assert(klass != 
>>> SystemDictionary::Module_klass() &&
>>> ??? >>>? ? 619? ? ? ? ? ? ? klass !=
>>> ??? SystemDictionary::ResolvedMethodName_klass() &&
>>> ??? >>>? ? 620? ? ? ? ? ? ? klass !=
>>> ??? SystemDictionary::MemberName_klass() &&
>>> ??? >>>? ? 621? ? ? ? ? ? ? klass != 
>>> SystemDictionary::Context_klass() &&
>>> ??? >>>? ? 622? ? ? ? ? ? ? klass !=
>>> ??? SystemDictionary::ClassLoader_klass(), "we
>>> ??? >>> can only relocate metaspace object pointers inside 
>>> java_lang_Class
>>> ??? >>> instances");
>>> ??? >>>? ? 623? ? ?});
>>> ??? >>>
>>> ??? >>> Let's leave the above for a separate RFE. I think assert is not
>>> ??? >>> sufficient for the check. Also, why ResolvedMethodName, 
>>> Module and
>>> ??? >>> MemberName cannot be part of the graph?
>>> ??? >>>
>>> ??? >>>
>>> ??? >> I added the following comment:
>>> ??? >>
>>> ??? >>? ? ?DEBUG_ONLY({
>>> ??? >>? ? ? ? ?// The following are classes in
>>> ??? share/classfile/javaClasses.cpp
>>> ??? >> that have injected native pointers
>>> ??? >>? ? ? ? ?// to metaspace objects. To support these classes, we
>>> ??? need to add
>>> ??? >> relocation code similar to
>>> ??? >>? ? ? ? ?// 
>>> java_lang_Class::update_archived_mirror_native_pointers.
>>> ??? >>? ? ? ? ?Klass* klass = orig_obj->klass();
>>> ??? >>? ? ? ? ?assert(klass != SystemDictionary::Module_klass() &&
>>> ??? >>? ? ? ? ? ? ? ? klass !=
>>> ??? SystemDictionary::ResolvedMethodName_klass() &&
>>> ??? >>
>>>     > It's too restrictive to exclude those objects from the archived object
>>>     > graph because of metadata relocation, since metadata relocation is rare.
>>>     > The trade-off doesn't seem to buy us much.
>>> ??? >
>>> ??? > Do you plan to add the needed relocation code?
>>>
>>>     I looked more into this. Actually we cannot handle these 5 classes at
>>>     all, even without archive relocation:
>>>
>>>     [1] #define MODULE_INJECTED_FIELDS(macro) \
>>>        macro(java_lang_Module, module_entry, intptr_signature, false)
>>>
>>>     -> module_entry is malloc'ed
>>>
>>>     [2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
>>>        macro(java_lang_invoke_ResolvedMethodName, vmholder, object_signature, false) \
>>>        macro(java_lang_invoke_ResolvedMethodName, vmtarget, intptr_signature, false)
>>>
>>>     -> these fields are related to method handles and lambda forms, etc.
>>>     They can't easily be archived without implementing lambda form
>>>     archiving. (I did a prototype; it's very complex and fragile).
>>>
>>>     [3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
>>>        macro(java_lang_invoke_MethodHandleNatives_CallSiteContext, vmdependencies, intptr_signature, false) \
>>>        macro(java_lang_invoke_MethodHandleNatives_CallSiteContext, last_cleanup, long_signature, false)
>>>
>>>     -> vmdependencies is malloc'ed.
>>>
>>>     [4] #define MEMBERNAME_INJECTED_FIELDS(macro) \
>>>        macro(java_lang_invoke_MemberName, vmindex, intptr_signature, false)
>>>
>>>     -> this one is probably OK. Despite being declared as
>>>     'intptr_signature', it seems to be used just as an integer. However,
>>>     MemberNames are typically used with [2] and [3]. So let's just forbid
>>>     it to be safe.
>>>
>>>     [2] [3] [4] are not used directly by regular Java code and are unlikely
>>>     to be referenced (directly or indirectly) by static fields (except for
>>>     the static fields in the classes in java.lang.invoke, which we probably
>>>     won't support for heap archiving due to the problem I described for
>>>     [2]). Objects of these types are typically referenced via constant pool
>>>     entries.
>>>
>>>     [5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
>>>        macro(java_lang_ClassLoader, loader_data, intptr_signature, false)
>>>
>>>     -> loader_data is malloc'ed.
>>>
>>>     So, I will change the DEBUG_ONLY into a product-mode check, and quit
>>>     dumping if these objects are found in the object subgraph.
>>>
>>>
>>> Sounds good. Can you please also add a comment with an explanation?
>>>
>>> For ClassLoader and Module, it's worth considering caching the 
>>> additional native data some time in the future. Lois had suggested 
>>> the Module part a while ago.
>>
>> I think we can do that if/when we archive Modules directly into the 
>> shared heap.
>>
>>
>>
>>>
>>>
>>>
>>>
>>>
>>> ??? Maybe we should backport the check to older versions as well?
>>>
>>>
>>> We should discuss with Andrew Haley for backports to JDK 11 update 
>>> releases. Since the current OpenJDK 11 only applies Java heap 
>>> archiving to a restricted set of JDK library code, I think it is 
>>> safe without the new check.
>>>
>>> For non-LTS releases, it might not be worthwhile as they may not be 
>>> widely used?
>>
>> I agree. FYI, we (Oracle) have no plan for backporting more types of 
>> heap object archiving, so the decision would be up to whoever 
>> decides to do so.
>>
>> Thanks
>> - Ioi
>>
>>
>>>
>>> Thanks,
>>> Jiangli
>>>
>>>
>>> ??? >
>>> ??? >>> - src/hotspot/share/memory/metaspace.cpp
>>> ??? >>>
>>> ??? >>> 1036? ?metaspace_rs = 
>>> ReservedSpace(compressed_class_space_size(),
>>> ??? >>> 1037 ? _reserve_alignment,
>>> ??? >>> 1038 ? large_pages,
>>> ??? >>> 1039 ? requested_addr);
>>> ??? >>>
>>> ??? >>> Please fix indentation.
>>> ??? >> Fixed.
>>> ??? >>
>>> ??? >>> - src/hotspot/share/memory/metaspaceClosure.hpp
>>> ??? >>>
>>> ??? >>>? ? ?78? ?enum SpecialRef {
>>> ??? >>>? ? ?79? ? ?_method_entry_ref
>>> ??? >>>? ? ?80? ?};
>>> ??? >>>
>>> ??? >>> Are there other pointers that are not references to
>>> ??? MetaspaceObj? If
>>> ??? >>> _method_entry_ref is the only type, it's probably not worth
>>> ??? defining
>>> ??? >>> SpecialRef?
>>> ??? >> There may be more types in the future, so I want to have a
>>> ??? stable API
>>> ??? >> that can be easily expanded without touching all the code that
>>> ??? uses it.
>>> ??? >>
>>> ??? >>
>>> ??? >>> - src/hotspot/share/memory/metaspaceShared.hpp
>>> ??? >>>
>>> ??? >>>? ? ?42 enum MapArchiveResult {
>>> ??? >>>? ? ?43? ?MAP_ARCHIVE_SUCCESS,
>>> ??? >>>? ? ?44? ?MAP_ARCHIVE_MMAP_FAILURE,
>>> ??? >>>? ? ?45? ?MAP_ARCHIVE_OTHER_FAILURE
>>> ??? >>>? ? ?46 };
>>> ??? >>>
>>> ??? >>> If we want to define different failure types, it's probably 
>>> worth
>>> ??? >>> using separate types for relocation failure and validation
>>> ??? failure.
>>> ??? >> For now, I just need to distinguish between MMAP_FAILURE (where
>>> ??? I should
>>> ??? >> attempt to remap at an alternative address) and OTHER_FAILURE
>>> ??? (where the
>>> ??? >> CDS archive loading will fail -- due to validation error,
>>> ??? insufficient
>>> ??? >> memory, etc -- without attempting to remap.)
>>> ??? >>
>>> ??? >>> ---
>>> ??? >>>
>>> ??? >>>? ? 193? ?static intx _mapping_delta; // FIXME rename
>>> ??? >>>
>>> ??? >>> How about _relocation_delta?
>>> ??? >> Changed as suggested.
>>> ??? >>
>>> ??? >>> - src/hotspot/share/oops/instanceKlass
>>> ??? >>>
>>> ??? >>> 1573 bool InstanceKlass::_disable_method_binary_search = false;
>>> ??? >>>
>>> ??? >>> The use of _disable_method_binary_search is not necessary. You
>>> ??? can use
>>> ??? >>> DynamicDumpSharedSpaces for the purpose. That would make things
>>> ??? >>> cleaner.
>>> ??? >> If we always disable the binary search when
>>> ??? DynamicDumpSharedSpaces is
>>> ??? >> true, it will slow down normal execution of the Java program 
>>> when
>>> ??? >> -XX:ArchiveClassesAtExit has been specified, but the program
>>> ??? hasn't exited.
>>> ??? > Could you please add some comments to 
>>> _disable_method_binary_search
>>> ??? > with the above explanation? Thanks.
>>>
>>> ??? OK
>>> ??? >
>>> ??? >>> - test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
>>> ??? >>>
>>> ??? >>>? ? ?76? ? ? ? ? ? ? ? ? ? ?if (name.equals("s0") ||
>>> ??? name.equals("s1")) {
>>> ??? >>>? ? ?77? ? ? ? ? ? ? ? ? ? ? ?// String regions are listed at
>>> ??? the end and
>>> ??? >>> they may not be fully occupied.
>>> ??? >>>? ? ?78? ? ? ? ? ? ? ? ? ? ? ?break;
>>> ??? >>>? ? ?79? ? ? ? ? ? ? ? ? ? ?} else if (name.equals("bm")) {
>>> ??? >>>? ? ?80? ? ? ? ? ? ? ? ? ? ? ?// Bitmap space does not have a
>>> ??? requested address.
>>> ??? >>>? ? ?81? ? ? ? ? ? ? ? ? ? ? ?break;
>>> ??? >>>
>>> ??? >>> It's not part of your change, but could you please fix line 76
>>> ??? - 78
>>> ??? >>> since it is trivial. It seems the lines can be removed.
>>> ??? >> Removed.
>>> ??? >>
>>> ??? >>> - /src/hotspot/share/memory/archiveUtils.hpp
>>> ??? >>> The file name does not match with the macro '#ifndef
>>> ??? >>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please rename
>>> ??? >>> archiveUtils.* ? archiveRelocator.hpp and 
>>> archiveRelocator.cpp are
>>> ??? >>> more descriptive.
>>>     >> I named the file archiveUtils.hpp so we can move other misc stuff used
>>>     >> by dumping into this file (e.g., DumpRegion, WriteClosure from
>>>     >> metaspaceShared.hpp), since these are not used by the majority of the
>>>     >> files that use metaspaceShared.hpp.
>>> ??? >>
>>> ??? >> I fixed the ifdef.
>>> ??? >>
>>>     >>> - src/hotspot/share/memory/archiveUtils.cpp
>>>     >>>
>>>     >>>     36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap, address* ptr_base, address* ptr_end) {
>>>     >>>     37   assert(_ptrmap == NULL, "initialize only once");
>>>     >>>     38   _ptr_base = ptr_base;
>>>     >>>     39   _ptr_end = ptr_end;
>>>     >>>     40   _compacted = false;
>>>     >>>     41   _ptrmap = ptrmap;
>>>     >>>     42   _ptrmap->initialize(12 * M / sizeof(intptr_t)); // default archive is about 12MB.
>>>     >>>     43 }
>>>     >>>
>>>     >>> Could we do a better estimate here? We could guesstimate the size
>>>     >>> based on the current used class space and metaspace size. It's okay if
>>>     >>> a larger bitmap is used, since it can be reduced after all marking is
>>>     >>> done.
>>>     >> The bitmap is automatically expanded when necessary in
>>>     >> ArchivePtrMarker::mark_pointer(). It's only about 1/32 or 1/64 of the
>>>     >> total archive size, so even if we do expand, the cost will be trivial.
>>>     > The initial value is based on the default CDS archive. When dealing
>>>     > with a really large archive, it would have to re-grow many times.
>>>     > Also, using a hard-coded value is less desirable.
>>>
>>>     OK, I changed it to the following
>>>
>>>        // Use this as initial guesstimate. We should need less space in the
>>>        // archive, but if we're wrong the bitmap will be expanded automatically.
>>>        size_t estimated_archive_size = MetaspaceGC::capacity_until_GC();
>>>        // But set it smaller in debug builds so we always test the expansion code.
>>>        // (Default archive is about 12MB).
>>>        DEBUG_ONLY(estimated_archive_size = 6 * M);
>>>
>>>        // We need one bit per pointer in the archive.
>>>        _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
>>>
>>>
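A rough sketch of the sizing arithmetic in the exchange above, assuming
one bit per pointer-sized slot as the quoted comment says. The helper
names are hypothetical and are not HotSpot APIs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a HotSpot API): number of bits needed to track
// every pointer-sized slot in an archive of `archive_bytes` bytes.
constexpr std::size_t ptrmap_bits(std::size_t archive_bytes, std::size_t ptr_size) {
  return archive_bytes / ptr_size;
}

// Bytes occupied by a bitmap of `bits` bits, rounded up to whole bytes.
constexpr std::size_t ptrmap_bytes(std::size_t bits) {
  return (bits + 7) / 8;
}
```

For a 12MB archive with 8-byte pointers this gives 1,572,864 bits, or
192KB, which is the 1/64 ratio mentioned above (1/32 with 4-byte
pointers).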
>>>     Thanks!
>>>     - Ioi
>>>
>>>     >
>>>     >>>
>>>     >>>
>>>     >>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou
>>>     >>> <jianglizhou at google.com <mailto:jianglizhou at google.com>> wrote:
>>>     >>>> Hi Ioi,
>>>     >>>>
>>>     >>>> This is another great step for CDS usability improvement. Thank you!
>>>     >>>>
>>>     >>>> I have a high level question (or request): could we consider
>>>     >>>> separating the relocation work for 'direct' class metadata from other
>>>     >>>> types of metadata (such as the shared system dictionary, symbol table,
>>>     >>>> etc)? Initially we only relocate the tables and other archived global
>>>     >>>> data. When each archived class is being loaded, we can relocate all
>>>     >>>> the pointers within the current class. We could find the segment (for
>>>     >>>> the current class) in the bitmap and update the pointers within the
>>>     >>>> segment. That way we can reduce initial startup costs and also avoid
>>>     >>>> relocating class data that's not used at runtime. In some real world
>>>     >>>> large systems, an archive may contain an extremely large number of
>>>     >>>> classes.
>>>     >>>>
>>>     >>>> Following are partial review comments so we can move things forward.
>>>     >>>> Still going through the rest of the changes.
>>>     >>>>
>>>     >>>> - src/hotspot/share/classfile/javaClasses.cpp
>>>     >>>>
>>>     >>>> 1218 void java_lang_Class::update_archived_mirror_native_pointers(oop archived_mirror) {
>>>     >>>> 1219   Klass* k = ((Klass*)archived_mirror->metadata_field(_klass_offset));
>>>     >>>> 1220   if (k != NULL) { // k is NULL for the primitive classes such as java.lang.Byte::TYPE <<<<<<<<<<<
>>>     >>>> 1221     archived_mirror->metadata_field_put(_klass_offset, (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
>>>     >>>> 1222   }
>>>     >>>> 1223 ...
>>>     >>>>
>>>     >>>> Primitive type mirrors are handled separately. Could you please verify
>>>     >>>> if this call path happens for primitive type mirrors?
>>>     >>>>
>>>     >>>> To answer my question above, it looks like you added the following, which
>>>     >>>> is to be used for primitive type mirrors. That seems to be the reason
>>>     >>>> why update_archived_mirror_native_pointers is trying to also cover
>>>     >>>> primitive types. It would be better to have a separate API for primitive
>>>     >>>> type mirrors, which is cleaner. We can then also replace the above check
>>>     >>>> at line 1220 with an assert for regular mirrors.
>>>     >>>>
>>>     >>>> +void ReadClosure::do_mirror_oop(oop *p) {
>>>     >>>> +  do_oop(p);
>>>     >>>> +  oop mirror = *p;
>>>     >>>> +  if (mirror != NULL) {
>>>     >>>> +    java_lang_Class::update_archived_mirror_native_pointers(mirror);
>>>     >>>> +  }
>>>     >>>> +}
>>>     >>>> +
>>>     >>>>
>>>     >>>> How about renaming update_archived_mirror_native_pointers to
>>>     >>>> update_archived_mirror_klass_pointers?
>>>     >>>>
>>>     >>>> It would be good to pass the current klass as an argument. We can
>>>     >>>> verify the relocated pointer matches the current klass pointer.
>>>     >>>>
>>>     >>>> We should also check if relocation is necessary before spending cycles
>>>     >>>> to obtain the klass pointer from the mirror.
>>>     >>>>
>>>     >>>> 1252   update_archived_mirror_native_pointers(m);
>>>     >>>> 1253
>>>     >>>> 1254   // mirror is archived, restore
>>>     >>>> 1255   assert(HeapShared::is_archived_object(m), "must be archived mirror object");
>>>     >>>> 1256   Handle mirror(THREAD, m);
>>>     >>>>
>>>     >>>> Could we move the line at 1252 after the assert at line 1255?
>>>     >>>>
>>>     >>>> - src/hotspot/share/include/cds.h
>>>     >>>>
>>>     >>>>     47   int     _mapped_from_file;  // Is this region mapped from a file?
>>>     >>>>     48                               // If false, this region was initialized using os::read().
>>>     >>>>
>>>     >>>> Is the new field truly needed? It seems we could use _mapped_base to
>>>     >>>> determine if a region is mapped or not?
>>>     >>>>
>>>     >>>> - src/hotspot/share/memory/dynamicArchive.cpp
>>>     >>>>
>>>     >>>> Could you please remove the debugging print code in
>>>     >>>> dynamic_dump_method_comparator? Or convert those to logging output if
>>>     >>>> they are helpful.
>>>     >>>>
>>>     >>>> Will send out the rest of the review comments later.
>>>     >>>>
>>>     >>>> Best,
>>>     >>>>
>>>     >>>> Jiangli
>>>     >>>>
>>>     >>>>
>>>     >>>> On Thu, Oct 10, 2019 at 6:00 PM Ioi Lam <ioi.lam at oracle.com
>>>     >>>> <mailto:ioi.lam at oracle.com>> wrote:
>>>     >>>>> Bug:
>>>     >>>>> https://bugs.openjdk.java.net/browse/JDK-8231610
>>>     >>>>>
>>>     >>>>> Webrev:
>>>     >>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v01/
>>>     >>>>>
>>>     >>>>> Design:
>>>     >>>>> http://cr.openjdk.java.net/~iklam/jdk14/design/8231610-relocate-cds-archive.txt
>>>     >>>>>
>>>     >>>>>
>>>     >>>>> Overview:
>>>     >>>>>
>>>     >>>>> The CDS archive is mmaped to a fixed address range (starting at
>>>     >>>>> SharedBaseAddress, usually 0x800000000). Previously, if this
>>>     >>>>> requested address range was not available (usually due to Address
>>>     >>>>> Space Layout Randomization (ASLR) [2]), the JVM would give up and
>>>     >>>>> load classes dynamically using class files.
>>>     >>>>>
>>>     >>>>> [a] This causes a slowdown in JVM start-up.
>>>     >>>>> [b] Handling of mapping failures causes unnecessary complication in
>>>     >>>>>     the CDS tests.
>>>     >>>>>
>>>     >>>>> Here are some preliminary benchmarking results (using default CDS archive,
>>>     >>>>> running helloworld):
>>>     >>>>>
>>>     >>>>> (a) 47.1ms (CDS enabled, mapped at requested addr)
>>>     >>>>> (b) 53.8ms (CDS enabled, mapped at alternate addr)
>>>     >>>>> (c) 86.2ms (CDS disabled)
>>>     >>>>>
>>>     >>>>> The small degradation in (b) is caused by the relocation of
>>>     >>>>> absolute pointers embedded in the CDS archive. However, it is
>>>     >>>>> still a big improvement over case (c).
>>>     >>>>>
>>>     >>>>> Please see the design doc (link above) for details.
>>>     >>>>>
>>>     >>>>> Thanks
>>>     >>>>> - Ioi
>>>     >>>>>
>>>
>>
>


From jianglizhou at google.com  Fri Nov  8 02:11:16 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Thu, 7 Nov 2019 18:11:16 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <99030987-a044-53fb-784b-62408333137a@oracle.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
 <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
 <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>
 <7f0d558c-c161-ce41-90c6-ede8fddcf8b8@oracle.com>
 <99030987-a044-53fb-784b-62408333137a@oracle.com>
Message-ID: <CALrW1jwTHJ1qg+gTfNeM9MbLdF042bF2ysKf7nGHKFNj9uKe+w@mail.gmail.com>

I looked at both the 05.full and 06.delta webrevs. They look good.

I still feel a bit uneasy about the potential runtime impact when data
does get relocated. Long-running apps/services may shy away from
enabling the archive at runtime if there is a detectable overhead, even
though it may only occur rarely. As relocation is enabled by default
and users cannot turn it off, disabling CDS entirely with -Xshare:off
would become the only choice. Could you please create a new RFE
(possibly with higher priority) to investigate the potential effect,
or provide an option for users to opt in to relocation with a
command-line switch?

Regards,
Jiangli

On Thu, Nov 7, 2019 at 4:22 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>
> Hi Coleen,
>
> Thanks for the review. Here's a webrev that incorporates your
> suggestions:
>
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v06-delta/
>
> Please see comments in-line
>
> On 11/7/19 2:46 PM, coleen.phillimore at oracle.com wrote:
> > Hi, I've done a more high level code review of this and it looks good!
> >
> > http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/src/hotspot/share/memory/archiveUtils.hpp.html
> >
> >
> > I think these classes require comments on what they do and why. The
> > comments you sent me offline look good.
>
> I added more comments for ArchivePtrMarker::_compacted per your offline
> request.
>
> >
> > Also .hpp files shouldn't include .inline.hpp files, like
> > bitMap.inline.hpp.  Hopefully it's just a case of moving do_bit() into
> > the cpp file.
>
> I moved the do_bit() function into archiveUtils.inline.hpp, since it is
> used by 3 .cpp files, and performance is important.
>
> >
> > I wonder if the exception list of classes to exclude should be a
> > function in javaClasses.hpp/cpp where the explanation would make more
> > sense?  ie bool
> > JavaClasses::has_injected_native_pointers(InstanceKlass* k);
>
> I moved the checking code to javaClasses.cpp. Since we do (partially)
> support java.lang.Class, which has injected native pointers, I named the
> function as JavaClasses::is_supported_for_archiving instead. I also
> massaged the comments a little for clarification.
>
> >
> > Is there already an RFE to move the DumpSharedSpaces output from
> > tty->print() to log_info() ?
>
> I created https://bugs.openjdk.java.net/browse/JDK-8233826 (Change CDS
> dumping tty->print_cr() to unified logging).
>
> Thanks
> - Ioi
>
> >
> > Thanks,
> > Coleen
> >
> > On 11/6/19 4:17 PM, Ioi Lam wrote:
> >> Hi Jiangli,
> >>
> >> I've uploaded the webrev after integrating your comments:
> >>
> >> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/
> >>
> >> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05-delta/
> >>
> >>
> >> Please see more replies below:
> >>
> >>
> >> On 11/4/19 5:52 PM, Jiangli Zhou wrote:
> >>> On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam <ioi.lam at oracle.com
> >>> <mailto:ioi.lam at oracle.com>> wrote:
> >>>
> >>>     Hi Jiangli,
> >>>
> >>>     Thank you so much for spending time reviewing this RFE!
> >>>
> >>>     On 11/3/19 6:34 PM, Jiangli Zhou wrote:
> >>>     > Hi Ioi,
> >>>     >
> >>>     > Sorry for the delay again. Will try to put this on the top of my
> >>>     list
> >>>     > next week and reduce the turn-around time. The updates look
> >>> good in
> >>>     > general.
> >>>     >
> >>>     > We might want to have a better strategy when choosing metadata
> >>>     > relocation address (when relocation is needed). Some
> >>>     > applications/benchmarks may be more sensitive to cache
> >>> locality and
> >>>     > memory/data layout. There was a bug,
> >>>     > https://bugs.openjdk.java.net/browse/JDK-8213713 that caused
> >>> 1G gap
> >>>     > between Java heap data and metadata before JDK 12. The gap
> >>> seemed to
> >>>     > cause a small but noticeable runtime effect in one case that I
> >>> came
> >>>     > across.
> >>>
> >>>     I guess you're saying we should try to relocate the archive into
> >>>     somewhere under 32GB?
> >>>
> >>>
> >>> I don't yet have sufficient data that determines if mapping at low
> >>> 32G produces better runtime performance. I experimented with that,
> >>> but didn't see noticeable difference when comparing to mapping at
> >>> the current default address. It doesn't hurt, I think. So it may be
> >>> a better choice than relocating to a random address in high 32G
> >>> space (when Java heap is in low 32G address space).
> >>
> >> Maybe we should reconsider this when we have more concrete data for
> >> the benefits of moving the compressed class space to under 32G.
> >>
> >> Please note that in metaspace.cpp, when CDS is disabled and the VM
> >> fails to allocate the class space at the requested address
> >> (0x7c000000 for 16GB heap), it also just allocates from a random
> >> address (without trying to search under 32GB):
> >>
> >> http://hg.openjdk.java.net/jdk/jdk/annotate/e767fa6a1d45/src/hotspot/share/memory/metaspace.cpp#l1128
> >>
> >>
> >> This code has been there since 2013 and we have not seen any issues.
> >>
> >>
> >>
> >>
> >>>
> >>>     Could you elaborate more about the performance issue, especially
> >>>     about cache locality? I looked at JDK-8213713 but it didn't mention
> >>>     performance.
> >>>
> >>>
> >>> When enabling CDS we noticed a small runtime overhead in JDK 11
> >>> recently with a benchmark. After I backported JDK-8213713 to 11, it
> >>> seemed to reduce the runtime overhead that the benchmark was
> >>> experiencing.
> >>>
> >>>
> >>>     Also, by default, we have non-zero narrow_klass_base and
> >>>     narrow_klass_shift = 3, and archive relocation doesn't change that:
> >>>
> >>>     $ java -Xlog:cds=debug -version
> >>>     ... narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
> >>>     $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
> >>>     ... narrow_klass_base = 0x00007f1e8b499000, narrow_klass_shift = 3
> >>>
> >>>     We always use narrow_klass_shift due to this:
> >>>
> >>>        // CDS uses LogKlassAlignmentInBytes for narrow_klass_shift. See
> >>>        //
> >>> MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
> >>>        // how dump time narrow_klass_shift is set. Although, CDS can
> >>> work
> >>>        // with zero-shift mode also, to be consistent with AOT it uses
> >>>        // LogKlassAlignmentInBytes for klass shift so archived java
> >>>     heap objects
> >>>        // can be used at same time as AOT code.
> >>>        if (!UseSharedSpaces
> >>>            && (uint64_t)(higher_address - lower_base) <=
> >>>     UnscaledClassSpaceMax) {
> >>>          CompressedKlassPointers::set_shift(0);
> >>>        } else {
> >>> CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
> >>>        }
> >>>
> >>>
> >>> Right. If we relocate to low 32G space, we need to make sure that
> >>> the range containing the mapped class data and class space is
> >>> encodable.
> >>>
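A minimal sketch of what "encodable" means in the exchange above,
assuming a narrow klass value is a 32-bit offset from the base, scaled
by the shift. The function names are hypothetical; this is not the
HotSpot implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: every address in [lower, higher) can be encoded as
// a 32-bit value (addr - base) >> shift only if the span from base to
// higher does not exceed 2^32 scaled by the shift.
inline bool range_is_encodable(uint64_t base, uint64_t lower,
                               uint64_t higher, int shift) {
  if (lower < base || higher < lower) return false;
  const uint64_t max_span = (uint64_t{1} << 32) << shift;  // 4GB unscaled, 32GB with shift 3
  return (higher - base) <= max_span;
}

// Encode and decode assume addr is aligned to (1 << shift).
inline uint32_t encode(uint64_t base, uint64_t addr, int shift) {
  return static_cast<uint32_t>((addr - base) >> shift);
}

inline uint64_t decode(uint64_t base, uint32_t narrow, int shift) {
  return base + (static_cast<uint64_t>(narrow) << shift);
}
```

With base 0 and shift 3, anything mapped at or above 0x800000000 (32GB)
falls outside the encodable range, which is why a non-zero
narrow_klass_base shows up in the log output quoted above.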
> >>>
> >>>     > Here are some additional comments (minor).
> >>>     >
> >>>     > Could you please fix the long lines in the following?
> >>>     >
> >>>     > 1237 void
> >>> java_lang_Class::update_archived_primitive_mirror_native_pointers(oop
> >>>     > archived_mirror) {
> >>>     > 1238   if (MetaspaceShared::relocation_delta() != 0) {
> >>>     > 1239  assert(archived_mirror->metadata_field(_klass_offset) ==
> >>>     > NULL, "must be for primitive class");
> >>>     > 1240
> >>>     > 1241     Klass* ak =
> >>>     > ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
> >>>     > 1242     if (ak != NULL) {
> >>>     > 1243  archived_mirror->metadata_field_put(_array_klass_offset,
> >>>     > (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
> >>>     > 1244     }
> >>>     > 1245   }
> >>>     > 1246 }
> >>>     >
> >>>     > src/hotspot/share/memory/dynamicArchive.cpp
> >>>     >
> >>>     >   889   Thread* THREAD = Thread::current();
> >>>     >   890   Method::sort_methods(ik->methods(), /*set_idnums=*/true,
> >>>     > dynamic_dump_method_comparator);
> >>>     >   891   if (ik->default_methods() != NULL) {
> >>>     >   892  Method::sort_methods(ik->default_methods(),
> >>>     > /*set_idnums=*/false, dynamic_dump_method_comparator);
> >>>     >   893   }
> >>>     >
> >>>
> >>>     OK will do.
> >>>
> >>>     > Please see inlined comments below.
> >>>     >
> >>>     > On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com
> >>>     <mailto:ioi.lam at oracle.com>> wrote:
> >>>     >> Hi Jiangli,
> >>>     >>
> >>>     >> Thanks for the review. I've updated the patch according to your
> >>>     comments:
> >>>     >>
> >>>     >>
> >>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
> >>>
> >>>     >>
> >>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
> >>>
> >>>     >>
> >>>     >> (the delta is on top of 8231610-relocate-cds-archive.v03.delta
> >>>     in my
> >>>     >> reply to Calvin's comments).
> >>>     >>
> >>>     >> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
> >>>     >>> Hi Ioi,
> >>>     >>>
> >>>     >>> Sorry for the delay. Here are my remaining comments.
> >>>     >>>
> >>>     >>> - src/hotspot/share/memory/dynamicArchive.cpp
> >>>     >>>
> >>>     >>> 128   static intx _method_comparator_name_delta;
> >>>     >>>
> >>>     >>> The name of the above variable is confusing. It's the value of
> >>>     >>> _buffer_to_target_delta. It's better to use _buffer_to_target_delta
> >>>     >>> directly.
> >>>     >> _buffer_to_target_delta is a non-static field, but
> >>>     >> dynamic_dump_method_comparator() must be a static function so
> >>>     it can't
> >>>     >> use the non-static field easily.
> >>>     >
> >>>     > It sounds like an issue. _buffer_to_target_delta was made
> >>>     > non-static mostly because we might support more than one dynamic
> >>>     > archive in the future. However, today's usages bake in an assumption
> >>>     > that _buffer_to_target_delta is a singleton value. It is cleaner to
> >>>     > either make _buffer_to_target_delta a static variable for now, or
> >>>     > add an access API in DynamicArchiveBuilder to allow other code to
> >>>     > properly and correctly use the value.
> >>>
> >>>     OK, I'll move it to a static variable.
> >>>
> >>>     >
> >>>     >>> Also, we can do a quick pointer comparison of 'a_name' and
> >>>     >>> 'b_name' first before adjusting the pointers.
> >>>     >> I added this:
> >>>     >>
> >>>     >>       if (a_name == b_name) {
> >>>     >>         return 0;
> >>>     >>       }
> >>>     >>
> >>>     >>> ---
> >>>     >>>
> >>>     >>> 934 void DynamicArchiveBuilder::relocate_buffer_to_target() {
> >>>     >>> ...
> >>>     >>>    944
> >>>     >>>    945  ArchivePtrMarker::compact(relocatable_base,
> >>>     relocatable_end);
> >>>     >>> ...
> >>>     >>>
> >>>     >>>    974     SharedDataRelocator patcher((address*)patch_base,
> >>>     >>> (address*)patch_end, valid_old_base, valid_old_end,
> >>>     >>>    975  valid_new_base, valid_new_end, addr_delta);
> >>>     >>>    976  ArchivePtrMarker::ptrmap()->iterate(&patcher);
> >>>     >>>
> >>>     >>> Could we reduce the number of data re-iterations to help
> >>> archive
> >>>     >>> dumping performance. The ArchivePtrMarker::compact operation
> >>>     can be
> >>>     >>> combined with the patching iteration.
> >>>     ArchivePtrMarker::compact API
> >>>     >>> can be removed.
> >>>     >> That's a good idea. I implemented it using a template parameter
> >>>     so that
> >>>     >> we can have max performance when relocating the archive at run
> >>>     time.
> >>>     >>
> >>>     >> I added comments to explain why the relocation is done here. The
> >>>     >> relocation is pretty rare (only when the base archive was not
> >>>     mapped at
> >>>     >> the default location).
> >>>     >>
> >>>     >>> ---
> >>>     >>>
> >>>     >>>    967     address valid_new_base =
> >>>     >>> (address)Arguments::default_SharedBaseAddress();
> >>>     >>>    968     address valid_new_end  = valid_new_base +
> >>>     base_plus_top_size;
> >>>     >>>
> >>>     >>> The debugging only code can be included under #ifdef ASSERT.
> >>>     >> These values are actually also used in debug logging so they
> >>>     can't be
> >>>     >> ifdef'ed out.
> >>>     >>
> >>>     >> Also, the C++ compiler is pretty good at eliding code that's not
> >>>     >> actually used. If I comment out all the logging code in
> >>>     >> DynamicArchiveBuilder::relocate_buffer_to_target() and
> >>>     >> SharedDataRelocator, gcc elides all the unused fields and their
> >>>     >> assignments. So no code is generated for this, etc.
> >>>     >>
> >>>     >>       address valid_new_base =
> >>>     >> (address)Arguments::default_SharedBaseAddress();
> >>>     >>
> >>>     >> Since #ifdef ASSERT makes the code harder to read, I think we
> >>>     should use
> >>>     >> it only when really necessary.
> >>>     > It seems cleaner to get rid of these debugging only variables, by
> >>>     > using 'relocatable_base' and
> >>>     > '(address)Arguments::default_SharedBaseAddress()' in the logging
> >>>     code.
> >>>
> >>>     SharedDataRelocator is used under 3 different situations. These six
> >>>     variables (patch_base, patch_end, valid_old_base, valid_old_end,
> >>>     valid_new_base, valid_new_end) describe what is being patched, and what
> >>>     the expectations are, for each situation. The code will be hard to
> >>>     understand without them.
> >>>
> >>>     Please note there's also logging code in the SharedDataRelocator
> >>>     constructor that prints out these values.
> >>>
> >>>     I think I'll just remove the 'debug only' comment to avoid
> >>> confusion.
> >>>
> >>>
> >>> Ok.
> >>>
> >>>
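A simplified sketch of how the six range variables discussed above could
fit together, assuming one bit per pointer-sized slot and a do_bit()
callback per set bit. RelocatorSketch and relocate_all are hypothetical
stand-ins, not the SharedDataRelocator code from the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the relocator; field names mirror the six
// variables named in the thread, plus the address delta.
struct RelocatorSketch {
  uintptr_t* patch_base;      // first pointer-sized slot that may be patched
  uintptr_t* patch_end;       // one past the last slot
  uintptr_t valid_old_base;   // old pointer values must lie in
  uintptr_t valid_old_end;    //   [valid_old_base, valid_old_end)
  uintptr_t valid_new_base;   // patched values must lie in
  uintptr_t valid_new_end;    //   [valid_new_base, valid_new_end)
  intptr_t delta;             // amount added to every marked pointer

  // Called for each set bit; bit i marks patch_base[i] as holding a pointer.
  bool do_bit(std::size_t offset) const {
    uintptr_t* p = patch_base + offset;
    assert(p < patch_end);
    uintptr_t old_ptr = *p;
    assert(valid_old_base <= old_ptr && old_ptr < valid_old_end);
    uintptr_t new_ptr = old_ptr + delta;
    assert(valid_new_base <= new_ptr && new_ptr < valid_new_end);
    *p = new_ptr;
    return true;  // keep iterating
  }
};

// Visit every set bit of a simple bitmap (std::vector<bool> for clarity).
inline void relocate_all(const std::vector<bool>& ptrmap, const RelocatorSketch& r) {
  for (std::size_t i = 0; i < ptrmap.size(); ++i) {
    if (ptrmap[i]) r.do_bit(i);
  }
}
```

The range checks are plain asserts here, which is one way to read why the
valid_* values are described as expectations rather than inputs to the
patching itself.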
> >>>     >
> >>>     >>> ---
> >>>     >>>
> >>>     >>>    993
> >>>  dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
> >>>     >>>
> >>>     >>> We could combine the archived heap data bitmap into the new
> >>>     region as
> >>>     >>> well? It can be handled as a separate RFE.
> >>>     >> I've filed https://bugs.openjdk.java.net/browse/JDK-8233093
> >>>     >>
> >>>     >>> - src/hotspot/share/memory/filemap.cpp
> >>>     >>>
> >>>     >>> 1038     if (is_static()) {
> >>>     >>> 1039       if (errno == ENOENT) {
> >>>     >>> 1040         // Not locating the shared archive is ok.
> >>>     >>> 1041         fail_continue("Specified shared archive not found
> >>>     (%s).",
> >>>     >>> _full_path);
> >>>     >>> 1042       } else {
> >>>     >>> 1043         fail_continue("Failed to open shared archive file
> >>>     (%s).",
> >>>     >>> 1044  os::strerror(errno));
> >>>     >>> 1045       }
> >>>     >>> 1046     } else {
> >>>     >>> 1047       log_warning(cds, dynamic)("specified dynamic archive
> >>>     >>> doesn't exist: %s", _full_path);
> >>>     >>> 1048     }
> >>>     >>>
> >>>     >>> If the top layer is explicitly specified by the user, a
> >>>     warning does
> >>>     >>> not seem to be a proper behavior if the VM fails to open the
> >>>     archive
> >>>     >>> file.
> >>>     >>>
> >>>     >>> It might be better to handle the relocation-unrelated code in a
> >>>     >>> separate changeset and track it with a separate RFE.
> >>>     >> This code was moved from
> >>>     >>
> >>>     >>
> >>> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
> >>>
> >>>     >>
> >>>     >> so I am not changing the behavior. If you want, we can file an RFE to
> >>>     >> change the behavior.
> >>>     > Ok. A new RFE sounds like the right thing to re-evaluate the usage
> >>>     > issue here. Thanks.
> >>>
> >>>     I created https://bugs.openjdk.java.net/browse/JDK-8233446
> >>>
> >>>     >>> ---
> >>>     >>>
> >>>     >>> 1148 void FileMapInfo::write_region(int region, char* base,
> >>>     size_t size,
> >>>     >>> 1149                                bool read_only, bool
> >>>     allow_exec) {
> >>>     >>> ...
> >>>     >>> 1154
> >>>     >>> 1155   if (region == MetaspaceShared::bm) {
> >>>     >>> 1156     target_base = NULL;
> >>>     >>> 1157   } else if (DynamicDumpSharedSpaces) {
> >>>     >>>
> >>>     >>> It's not too clear to me how the bitmap (bm) region is handled
> >>>     for the
> >>>     >>> base layer and top layer. Could you please explain?
> >>>     >> The bm regions for both layers are mapped at an address picked by the OS:
> >>>     >>
> >>>     >> char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
> >>>     >>     FileMapRegion* si = space_at(MetaspaceShared::bm);
> >>>     >>     bitmap_size = si->used_aligned();
> >>>     >>     bool read_only = true, allow_exec = false;
> >>>     >>     char* requested_addr = NULL; // allow OS to pick any
> >>> location
> >>>     >>     char* bitmap_base = os::map_memory(_fd, _full_path,
> >>>     si->file_offset(),
> >>>     >> requested_addr, bitmap_size,
> >>>     >> read_only, allow_exec);
> >>>     >>
> >>>     > Ok, after staring at the code for a few seconds I saw that's
> >>>     intended.
> >>>     > If the current region is 'bm', then the 'target_base' is NULL
> >>>     > regardless if it's static or dynamic archive. Otherwise, the
> >>>     > 'target_base' is handled differently for the static and dynamic
> >>>     case.
> >>>     > The following would be cleaner and has better reliability.
> >>>     >
> >>>     >     char* target_base = NULL;
> >>>     >
> >>>     >     // The target_base is NULL for 'bm' region.
> >>>     >     if (region != MetaspaceShared::bm) {
> >>>     >       if (DynamicDumpSharedSpaces) {
> >>>     >         assert(!HeapShared::is_heap_region(region), "dynamic
> >>> archive
> >>>     > doesn't support heap regions");
> >>>     >         target_base = DynamicArchive::buffer_to_target(base);
> >>>     >       } else {
> >>>     >         target_base = base;
> >>>     >       }
> >>>     >    }
> >>>
> >>>     How about this?
> >>>
> >>>        char* target_base;
> >>>        if (region == MetaspaceShared::bm) {
> >>>          target_base = NULL; // always NULL for bm region.
> >>>        } else {
> >>>          if (DynamicDumpSharedSpaces) {
> >>>              assert(!HeapShared::is_heap_region(region), "dynamic
> >>> archive
> >>>     doesn't support heap regions");
> >>>              target_base = DynamicArchive::buffer_to_target(base);
> >>>          } else {
> >>>              target_base = base;
> >>>          }
> >>>        }
> >>>
> >>>
> >>> No objection if you prefer the extra 'else' block.
> >>>
> >>>
> >>>     >
> >>>     >>> ---
> >>>     >>>
> >>>     >>> 1362
> >>>  DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
> >>>
> >>>     >>>
> >>>     >>> Could you please explain the above?
> >>>     >> I added the comments
> >>>     >>
> >>>     >>     // Make sure we don't attempt to use
> >>>     header()->mapped_base_address()
> >>>     >> unless
> >>>     >>     // it's been successfully mapped.
> >>>     >>
> >>> DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
> >>>
> >>>     >>
> >>>     >>> ---
> >>>     >>>
> >>>     >>> 1359   FileMapRegion* last_region = NULL;
> >>>     >>>
> >>>     >>> 1371     if (last_region != NULL) {
> >>>     >>> 1372       // Ensure that the OS won't be able to allocate new memory spaces between any mapped
> >>>     >>> 1373       // regions, or else it would mess up the simple comparison in MetaspaceObj::is_shared().
> >>>     >>> 1374       assert(si->mapped_base() ==
> >>> last_region->mapped_end(),
> >>>     >>> "must have no gaps");
> >>>     >>>
> >>>     >>> 1379     last_region = si;
> >>>     >>>
> >>>     >>> Can you please place 'last_region' related code under #ifdef
> >>>     ASSERT?
> >>>     >> I think that will make the code more cluttered. The compiler will
> >>>     >> optimize that away.
> >>>     > It's cleaner to define debugging-only variables for debugging-only
> >>>     > builds. You can wrap it and related usage with DEBUG_ONLY.
> >>>
> >>>     OK, will do.
> >>>
> >>>     >
> >>>     >>> ---
> >>>     >>>
> >>>     >>> 1478 char* FileMapInfo::map_relocation_bitmap(size_t&
> >>>     bitmap_size) {
> >>>     >>> 1479   FileMapRegion* si = space_at(MetaspaceShared::bm);
> >>>     >>> 1480   bitmap_size = si->used_aligned();
> >>>     >>> 1481   bool read_only = true, allow_exec = false;
> >>>     >>> 1482   char* requested_addr = NULL; // allow OS to pick any
> >>>     location
> >>>     >>> 1483   char* bitmap_base = os::map_memory(_fd, _full_path,
> >>>     si->file_offset(),
> >>>     >>> 1484 requested_addr, bitmap_size,
> >>>     >>> read_only, allow_exec);
> >>>     >>>
> >>>     >>> We need to handle mapping failure here.
> >>>     >> It's handled here:
> >>>     >>
> >>>     >> bool FileMapInfo::relocate_pointers(intx addr_delta) {
> >>>     >>     log_debug(cds, reloc)("runtime archive relocation start");
> >>>     >>     size_t bitmap_size;
> >>>     >>     char* bitmap_base = map_relocation_bitmap(bitmap_size);
> >>>     >>     if (bitmap_base != NULL) {
> >>>     >>     ...
> >>>     >>     } else {
> >>>     >>       log_error(cds)("failed to map relocation bitmap");
> >>>     >>       return false;
> >>>     >>     }
> >>>     >>
> >>>     > 'bitmap_base' is used immediately after map_memory(). So the
> >>> check
> >>>     > needs to be done immediately after map_memory(), but not in the
> >>>     caller
> >>>     > of map_relocation_bitmap().
> >>>     >
> >>>     > 1490   char* bitmap_base = os::map_memory(_fd, _full_path,
> >>>     si->file_offset(),
> >>>     > 1491 requested_addr, bitmap_size,
> >>>     > read_only, allow_exec);
> >>>     > 1492
> >>>     > 1493   if (VerifySharedSpaces && bitmap_base != NULL &&
> >>>     > !region_crc_check(bitmap_base, bitmap_size, si->crc())) {
> >>>
> >>>     OK, I'll fix that.
> >>>
> >>>     >
> >>>     >
> >>>     >>> ---
> >>>     >>>
> >>>     >>> 1513     // debug only -- the current value of the pointers to be patched must be within this
> >>>     >>> 1514     // range (i.e., must be between the requested base address and the end of the current archive).
> >>>     >>> 1515     // Note: top archive may point to objects in the base archive, but not the other way around.
> >>>     >>> 1516     address valid_old_base =
> >>>     (address)header()->requested_base_address();
> >>>     >>> 1517     address valid_old_end  = valid_old_base +
> >>>     mapping_end_offset();
> >>>     >>>
> >>>     >>> Please place all FileMapInfo::relocate_pointers debugging only
> >>>     code
> >>>     >>> under #ifdef ASSERT.
> >>>     >> Ditto about ifdef ASSERT
> >>>     >>
> >>>     >>> - src/hotspot/share/memory/heapShared.cpp
> >>>     >>>
> >>>     >>>    441 void
> >>>     HeapShared::initialize_from_archived_subgraph(Klass* k) {
> >>>     >>>    442   if (!open_archive_heap_region_mapped() ||
> >>>     !MetaspaceObj::is_shared(k)) {
> >>>     >>>    443     return; // nothing to do
> >>>     >>>    444   }
> >>>     >>>
> >>>     >>> When do we call HeapShared::initialize_from_archived_subgraph
> >>>     for a
> >>>     >>> klass that's not shared?
> >>>     >> I've removed the !MetaspaceObj::is_shared(k). I probably added
> >>>     that for
> >>>     >> debugging purposes only.
> >>>     >>
> >>>     >>>    616   DEBUG_ONLY({
> >>>     >>>    617       Klass* klass = orig_obj->klass();
> >>>     >>>    618       assert(klass !=
> >>> SystemDictionary::Module_klass() &&
> >>>     >>>    619              klass !=
> >>>     SystemDictionary::ResolvedMethodName_klass() &&
> >>>     >>>    620              klass !=
> >>>     SystemDictionary::MemberName_klass() &&
> >>>     >>>    621              klass !=
> >>> SystemDictionary::Context_klass() &&
> >>>     >>>    622              klass !=
> >>>     SystemDictionary::ClassLoader_klass(), "we
> >>>     >>> can only relocate metaspace object pointers inside
> >>> java_lang_Class
> >>>     >>> instances");
> >>>     >>>    623     });
> >>>     >>>
> >>>     >>> Let's leave the above for a separate RFE. I think assert is not
> >>>     >>> sufficient for the check. Also, why ResolvedMethodName,
> >>> Module and
> >>>     >>> MemberName cannot be part of the graph?
> >>>     >>>
> >>>     >>>
> >>>     >> I added the following comment:
> >>>     >>
> >>>     >>     DEBUG_ONLY({
> >>>     >>         // The following are classes in
> >>>     share/classfile/javaClasses.cpp
> >>>     >> that have injected native pointers
> >>>     >>         // to metaspace objects. To support these classes, we
> >>>     need to add
> >>>     >> relocation code similar to
> >>>     >>         //
> >>> java_lang_Class::update_archived_mirror_native_pointers.
> >>>     >>         Klass* klass = orig_obj->klass();
> >>>     >>         assert(klass != SystemDictionary::Module_klass() &&
> >>>     >>                klass !=
> >>>     SystemDictionary::ResolvedMethodName_klass() &&
> >>>     >>
> >>>     > It's too restrictive to exclude those objects from the archived
> >>>     object
> >>>     > graph because of metadata relocation, since metadata relocation is
> >>>     rare.
> >>>     > The trade-off doesn't seem to buy us much.
> >>>     >
> >>>     > Do you plan to add the needed relocation code?
> >>>
> >>>     I looked more into this. Actually we cannot handle these 5
> >>> classes at
> >>>     all, even without archive relocation:
> >>>
> >>>     [1] #define MODULE_INJECTED_FIELDS(macro) \
> >>>        macro(java_lang_Module, module_entry, intptr_signature, false)
> >>>
> >>>     ->  module_entry is malloc'ed
> >>>
> >>>     [2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
> >>>        macro(java_lang_invoke_ResolvedMethodName, vmholder,
> >>>     object_signature, false) \
> >>>        macro(java_lang_invoke_ResolvedMethodName, vmtarget,
> >>>     intptr_signature, false)
> >>>
> >>>     -> these fields are related to method handles and lambda forms,
> >>> etc.
> >>>     They can't easily be archived without implementing lambda form
> >>>     archiving. (I did a prototype; it's very complex and fragile).
> >>>
> >>>     [3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
> >>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
> >>>     vmdependencies, intptr_signature, false) \
> >>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
> >>>     last_cleanup, long_signature, false)
> >>>
> >>>     -> vmdependencies is malloc'ed.
> >>>
> >>>     [4] #define
> >>> MEMBERNAME_INJECTED_FIELDS(macro) \
> >>>        macro(java_lang_invoke_MemberName, vmindex, intptr_signature,
> >>>     false)
> >>>
> >>>     -> this one is probably OK. Despite being declared as
> >>>     'intptr_signature', it seems to be used just as an integer.
> >>> However,
> >>>     MemberNames are typically used with [2] and [3]. So let's just
> >>>     forbid it
> >>>     to be safe.
> >>>
> >>>     [2] [3] [4] are not used directly by regular Java code and are
> >>>     unlikely
> >>>     to be referenced (directly or indirectly) by static fields (except
> >>>     for
> >>>     the static fields in the classes in java.lang.invoke, which we
> >>>     probably
> >>>     won't support for heap archiving due to the problem I described for
> >>>     [2]). Objects of these types are typically referenced via constant
> >>>     pool
> >>>     entries.
> >>>
> >>>     [5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
> >>>        macro(java_lang_ClassLoader, loader_data, intptr_signature,
> >>> false)
> >>>
> >>>     -> loader_data is malloc'ed.
> >>>
> >>>     So, I will change the DEBUG_ONLY into a product-mode check, and
> >>> quit
> >>>     dumping if these objects are found in the object subgraph.
> >>>
> >>>
> >>> Sounds good. Can you please also add a comment with an explanation?
> >>>
> >>> For ClassLoader and Module, it is worth considering caching the
> >>> additional native data some time in the future. Lois had suggested
> >>> the Module part a while ago.
> >>
> >> I think we can do that if/when we archive Modules directly into the
> >> shared heap.
> >>
> >>
> >>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>     Maybe we should backport the check to older versions as well?
> >>>
> >>>
> >>> We should discuss with Andrew Haley for backports to JDK 11 update
> >>> releases. Since the current OpenJDK 11 only applies Java heap
> >>> archiving to a restricted set of JDK library code, I think it is
> >>> safe without the new check.
> >>>
> >>> For non-LTS releases, it might not be worthwhile as they may not be
> >>> widely used?
> >>
> >> I agree. FYI, we (Oracle) have no plan for backporting more types of
> >> heap object archiving, so the decision would be up to whoever
> >> decides to do so.
> >>
> >> Thanks
> >> - Ioi
> >>
> >>
> >>>
> >>> Thanks,
> >>> Jiangli
> >>>
> >>>
> >>>     >
> >>>     >>> - src/hotspot/share/memory/metaspace.cpp
> >>>     >>>
> >>>     >>> 1036   metaspace_rs =
> >>> ReservedSpace(compressed_class_space_size(),
> >>>     >>> 1037   _reserve_alignment,
> >>>     >>> 1038   large_pages,
> >>>     >>> 1039   requested_addr);
> >>>     >>>
> >>>     >>> Please fix indentation.
> >>>     >> Fixed.
> >>>     >>
> >>>     >>> - src/hotspot/share/memory/metaspaceClosure.hpp
> >>>     >>>
> >>>     >>>     78   enum SpecialRef {
> >>>     >>>     79     _method_entry_ref
> >>>     >>>     80   };
> >>>     >>>
> >>>     >>> Are there other pointers that are not references to
> >>>     MetaspaceObj? If
> >>>     >>> _method_entry_ref is the only type, it's probably not worth
> >>>     defining
> >>>     >>> SpecialRef?
> >>>     >> There may be more types in the future, so I want to have a
> >>>     stable API
> >>>     >> that can be easily expanded without touching all the code that
> >>>     uses it.
> >>>     >>
> >>>     >>
> >>>     >>> - src/hotspot/share/memory/metaspaceShared.hpp
> >>>     >>>
> >>>     >>>     42 enum MapArchiveResult {
> >>>     >>>     43   MAP_ARCHIVE_SUCCESS,
> >>>     >>>     44   MAP_ARCHIVE_MMAP_FAILURE,
> >>>     >>>     45   MAP_ARCHIVE_OTHER_FAILURE
> >>>     >>>     46 };
> >>>     >>>
> >>>     >>> If we want to define different failure types, it's probably
> >>> worth
> >>>     >>> using separate types for relocation failure and validation
> >>>     failure.
> >>>     >> For now, I just need to distinguish between MMAP_FAILURE (where
> >>>     I should
> >>>     >> attempt to remap at an alternative address) and OTHER_FAILURE
> >>>     (where the
> >>>     >> CDS archive loading will fail -- due to validation error,
> >>>     insufficient
> >>>     >> memory, etc -- without attempting to remap.)
> >>>     >>
> >>>     >>> ---
> >>>     >>>
> >>>     >>>    193   static intx _mapping_delta; // FIXME rename
> >>>     >>>
> >>>     >>> How about _relocation_delta?
> >>>     >> Changed as suggested.
> >>>     >>
> >>>     >>> - src/hotspot/share/oops/instanceKlass
> >>>     >>>
> >>>     >>> 1573 bool InstanceKlass::_disable_method_binary_search = false;
> >>>     >>>
> >>>     >>> The use of _disable_method_binary_search is not necessary. You
> >>>     can use
> >>>     >>> DynamicDumpSharedSpaces for the purpose. That would make things
> >>>     >>> cleaner.
> >>>     >> If we always disable the binary search when
> >>>     DynamicDumpSharedSpaces is
> >>>     >> true, it will slow down normal execution of the Java program
> >>> when
> >>>     >> -XX:ArchiveClassesAtExit has been specified, but the program
> >>>     hasn't exited.
> >>>     > Could you please add some comments to
> >>> _disable_method_binary_search
> >>>     > with the above explanation? Thanks.
> >>>
> >>>     OK
> >>>     >
> >>>     >>> - test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
> >>>     >>>
> >>>     >>>     76                     if (name.equals("s0") ||
> >>>     name.equals("s1")) {
> >>>     >>>     77                       // String regions are listed at
> >>>     the end and
> >>>     >>> they may not be fully occupied.
> >>>     >>>     78                       break;
> >>>     >>>     79                     } else if (name.equals("bm")) {
> >>>     >>>     80                       // Bitmap space does not have a
> >>>     requested address.
> >>>     >>>     81                       break;
> >>>     >>>
> >>>     >>> It's not part of your change, but could you please fix line 76
> >>>     - 78
> >>>     >>> since it is trivial. It seems the lines can be removed.
> >>>     >> Removed.
> >>>     >>
> >>>     >>> - /src/hotspot/share/memory/archiveUtils.hpp
> >>>     >>> The file name does not match with the macro '#ifndef
> >>>     >>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please rename
> >>>     >>> archiveUtils.* ? archiveRelocator.hpp and
> >>> archiveRelocator.cpp are
> >>>     >>> more descriptive.
> >>>     >> I named the file archiveUtils.hpp so we can move other misc
> >>>     stuff used
> >>>     >> by dumping into this file (e.g., DumpRegion, WriteClosure from
> >>>     >> metaspaceShared.hpp), since these are not used by the majority
> >>>     of the
> >>>     >> files that use metaspaceShared.hpp.
> >>>     >>
> >>>     >> I fixed the ifdef.
> >>>     >>
> >>>     >>> - src/hotspot/share/memory/archiveUtils.cpp
> >>>     >>>
> >>>     >>>     36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap,
> >>>     address*
> >>>     >>> ptr_base, address* ptr_end) {
> >>>     >>>     37   assert(_ptrmap == NULL, "initialize only once");
> >>>     >>>     38   _ptr_base = ptr_base;
> >>>     >>>     39   _ptr_end = ptr_end;
> >>>     >>>     40   _compacted = false;
> >>>     >>>     41   _ptrmap = ptrmap;
> >>>     >>>     42   _ptrmap->initialize(12 * M / sizeof(intptr_t)); //
> >>>     default
> >>>     >>> archive is about 12MB.
> >>>     >>>     43 }
> >>>     >>>
> >>>     >>> Could we do a better estimate here? We could guesstimate the
> >>> size
> >>>     >>> based on the current used class space and metaspace size. It's
> >>>     okay if
> >>>     >>> a larger bitmap is used, since it can be reduced after all
> >>>     marking is
> >>>     >>> done.
> >>>     >> The bitmap is automatically expanded when necessary in
> >>>     >> ArchivePtrMarker::mark_pointer(). It's only about 1/32 or 1/64
> >>>     of the
> >>>     >> total archive size, so even if we do expand, the cost will be
> >>>     trivial.
> >>>     > The initial value is based on the default CDS archive. When
> >>> dealing
> >>>     > with a really large archive, it would have to re-grow many times.
> >>>     > Also, using a hard-coded value is less desirable.
> >>>
> >>>     OK, I changed it to the following
> >>>
> >>>        // Use this as initial guesstimate. We should need less space
> >>>     in the
> >>>        // archive, but if we're wrong the bitmap will be expanded
> >>>     automatically.
> >>>        size_t estimated_archive_size =
> >>> MetaspaceGC::capacity_until_GC();
> >>>        // But set it smaller in debug builds so we always test the
> >>>     expansion
> >>>     code.
> >>>        // (Default archive is about 12MB).
> >>>        DEBUG_ONLY(estimated_archive_size = 6 * M);
> >>>
> >>>        // We need one bit per pointer in the archive.
> >>>        _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
> >>>
> >>>
> >>>     Thanks!
> >>>     - Ioi
> >>>
> >>>     >
> >>>     >>>
> >>>     >>>
> >>>     >>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou
> >>>     <jianglizhou at google.com> wrote:
> >>>     >>>> Hi Ioi,
> >>>     >>>>
> >>>     >>>> This is another great step for CDS usability improvement.
> >>>     Thank you!
> >>>     >>>>
> >>>     >>>> I have a high level question (or request): could we consider
> >>>     >>>> separating the relocation work for 'direct' class metadata
> >>>     from other
> >>>     >>>> types of metadata (such as the shared system dictionary,
> >>>     symbol table,
> >>>     >>>> etc)? Initially we only relocate the tables and other
> >>>     archived global
> >>>     >>>> data. When each archived class is being loaded, we can
> >>>     relocate all
> >>>     >>>> the pointers within the current class. We could find the
> >>>     segment (for
> >>>     >>>> the current class) in the bitmap and update the pointers
> >>>     within the
> >>>     >>>> segment. That way we can reduce initial startup costs and
> >>>     also avoid
> >>>     >>>> relocating class data that's not used at runtime. In some
> >>>     real world
> >>>     >>>> large systems, an archive may contain extremely large
> >>> number of
> >>>     >>>> classes.
> >>>     >>>>
> >>>     >>>> Following are partial review comments so we can move things
> >>>     forward.
> >>>     >>>> Still going through the rest of the changes.
> >>>     >>>>
> >>>     >>>> - src/hotspot/share/classfile/javaClasses.cpp
> >>>     >>>>
> >>>     >>>> 1218 void
> >>> java_lang_Class::update_archived_mirror_native_pointers(oop
> >>>     >>>> archived_mirror) {
> >>>     >>>> 1219   Klass* k =
> >>> ((Klass*)archived_mirror->metadata_field(_klass_offset));
> >>>     >>>> 1220   if (k != NULL) { // k is NULL for the primitive
> >>>     classes such as
> >>>     >>>> java.lang.Byte::TYPE <<<<<<<<<<<
> >>>     >>>> 1221  archived_mirror->metadata_field_put(_klass_offset,
> >>>     >>>> (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
> >>>     >>>> 1222   }
> >>>     >>>> 1223 ...
> >>>     >>>>
> >>>     >>>> Primitive type mirrors are handled separately. Could you
> >>>     please verify
> >>>     >>>> if this call path happens for primitive type mirror?
> >>>     >>>>
> >>>     >>>> To answer my question above, looks like you added the
> >>>     following, which
> >>>     >>>> is to be used for primitive type mirrors. That seems to be
> >>>     the reason
> >>>     >>>> why update_archived_mirror_native_pointers is trying to also
> >>>     cover
> >>>     >>>> primitive type. It's better to have a separate API for
> >>>     primitive type
> >>>     >>>> mirror, which is cleaner. And, we also can replace the above
> >>>     check at
> >>>     >>>> line 1220 to be an assert for regular mirrors.
> >>>     >>>>
> >>>     >>>> +void ReadClosure::do_mirror_oop(oop *p) {
> >>>     >>>> +  do_oop(p);
> >>>     >>>> +  oop mirror = *p;
> >>>     >>>> +  if (mirror != NULL) {
> >>>     >>>> +
> >>> java_lang_Class::update_archived_mirror_native_pointers(mirror);
> >>>     >>>> +  }
> >>>     >>>> +}
> >>>     >>>> +
> >>>     >>>>
> >>>     >>>> How about renaming update_archived_mirror_native_pointers to
> >>>     >>>> update_archived_mirror_klass_pointers.
> >>>     >>>>
> >>>     >>>> It would be good to pass the current klass as an argument.
> >>> We can
> >>>     >>>> verify the relocated pointer matches with the current klass
> >>>     pointer.
> >>>     >>>>
> >>>     >>>> We should also check if relocation is necessary before
> >>>     spending cycles
> >>>     >>>> to obtain the klass pointer from the mirror.
> >>>     >>>>
> >>>     >>>> 1252  update_archived_mirror_native_pointers(m);
> >>>     >>>> 1253
> >>>     >>>> 1254   // mirror is archived, restore
> >>>     >>>> 1255  assert(HeapShared::is_archived_object(m), "must be
> >>> archived
> >>>     >>>> mirror object");
> >>>     >>>> 1256   Handle mirror(THREAD, m);
> >>>     >>>>
> >>>     >>>> Could we move the line at 1252 after the assert at line 1255?
> >>>     >>>>
> >>>     >>>> - src/hotspot/share/include/cds.h
> >>>     >>>>
> >>>     >>>>     47   int     _mapped_from_file;  // Is this region mapped
> >>>     from a file?
> >>>     >>>>     48                               // If false, this
> >>> region was
> >>>     >>>> initialized using os::read().
> >>>     >>>>
> >>>     >>>> Is the new field truly needed? It seems we could use
> >>>     _mapped_base to
> >>>     >>>> determine if a region is mapped or not?
> >>>     >>>>
> >>>     >>>> - src/hotspot/share/memory/dynamicArchive.cpp
> >>>     >>>>
> >>>     >>>> Could you please remove the debugging print code in
> >>>     >>>> dynamic_dump_method_comparator? Or convert those to logging
> >>>     output if
> >>>     >>>> they are helpful.
> >>>     >>>>
> >>>     >>>> Will send out the rest of the review comments later.
> >>>     >>>>
> >>>     >>>> Best,
> >>>     >>>>
> >>>     >>>> Jiangli
> >>>     >>>>
> >>>     >>>>
> >>>     >>>>
> >>>     >>>>
> >>>     >>>> On Thu, Oct 10, 2019 at 6:00 PM Ioi Lam <ioi.lam at oracle.com>
> >>>     wrote:
> >>>     >>>>> Bug:
> >>>     >>>>> https://bugs.openjdk.java.net/browse/JDK-8231610
> >>>     >>>>>
> >>>     >>>>> Webrev:
> >>>     >>>>>
> >>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v01/
> >>>
> >>>     >>>>>
> >>>     >>>>> Design:
> >>>     >>>>>
> >>> http://cr.openjdk.java.net/~iklam/jdk14/design/8231610-relocate-cds-archive.txt
> >>>
> >>>     >>>>>
> >>>     >>>>>
> >>>     >>>>> Overview:
> >>>     >>>>>
> >>>     >>>>> The CDS archive is mmaped to a fixed address range
> >>> (starting at
> >>>     >>>>> SharedBaseAddress, usually 0x800000000). Previously, if this
> >>>     >>>>> requested address range is not available (usually due to
> >>> Address
> >>>     >>>>> Space Layout Randomization (ASLR) [2]), the JVM will give
> >>> up and
> >>>     >>>>> will load classes dynamically using class files.
> >>>     >>>>>
> >>>     >>>>> [a] This causes a slowdown in JVM start-up.
> >>>     >>>>> [b] Handling of mapping failures causes unnecessary
> >>>     complication in
> >>>     >>>>>        the CDS tests.
> >>>     >>>>>
> >>>     >>>>> Here are some preliminary benchmarking results (using
> >>>     default CDS archive,
> >>>     >>>>> running helloworld):
> >>>     >>>>>
> >>>     >>>>> (a) 47.1ms (CDS enabled, mapped at requested addr)
> >>>     >>>>> (b) 53.8ms (CDS enabled, mapped at alternate addr)
> >>>     >>>>> (c) 86.2ms (CDS disabled)
> >>>     >>>>>
> >>>     >>>>> The small degradation in (b) is caused by the relocation of
> >>>     >>>>> absolute pointers embedded in the CDS archive. However, it is
> >>>     >>>>> still a big improvement over case (c)
> >>>     >>>>>
> >>>     >>>>> Please see the design doc (link above) for details.
> >>>     >>>>>
> >>>     >>>>> Thanks
> >>>     >>>>> - Ioi
> >>>     >>>>>
> >>>
> >>
> >
>
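
The relocation discussed in this thread walks a bitmap that has one bit per
pointer-sized slot and adds a delta to every marked slot. The following is a
minimal sketch of that idea only, not the HotSpot implementation:
relocate_marked_slots, the std::vector containers, and the bounds check are
hypothetical stand-ins for SharedDataRelocator and
ArchivePtrMarker::ptrmap().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an archive region: an array of pointer-sized slots.
// ptrmap holds one bit per slot; a set bit means the slot contains an
// absolute pointer that must be patched when the archive is relocated.
//
// Returns false as soon as a marked slot holds a value outside
// [old_base, old_end). Slots visited before the failure stay patched.
bool relocate_marked_slots(std::vector<std::uintptr_t>& slots,
                           const std::vector<bool>& ptrmap,
                           std::uintptr_t old_base,
                           std::uintptr_t old_end,
                           std::intptr_t delta) {
  assert(slots.size() == ptrmap.size());
  for (std::size_t i = 0; i < slots.size(); i++) {
    if (!ptrmap[i]) {
      continue;  // not a pointer: leave the slot untouched
    }
    std::uintptr_t old_value = slots[i];
    if (old_value < old_base || old_value >= old_end) {
      return false;  // marked slot does not point into the old mapping
    }
    // Unsigned arithmetic wraps, so a negative delta also works here.
    slots[i] = old_value + static_cast<std::uintptr_t>(delta);
  }
  return true;
}
```

Unmarked slots are never read as pointers, which is why non-pointer data
that happens to look like an address inside the old range is left alone.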

From jianglizhou at google.com  Fri Nov  8 02:34:27 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Thu, 7 Nov 2019 18:34:27 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <CALrW1jwTHJ1qg+gTfNeM9MbLdF042bF2ysKf7nGHKFNj9uKe+w@mail.gmail.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
 <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
 <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>
 <7f0d558c-c161-ce41-90c6-ede8fddcf8b8@oracle.com>
 <99030987-a044-53fb-784b-62408333137a@oracle.com>
 <CALrW1jwTHJ1qg+gTfNeM9MbLdF042bF2ysKf7nGHKFNj9uKe+w@mail.gmail.com>
Message-ID: <CALrW1jy5_4jrMRSZPAXV-c8a92Jy8y4eoK+f_t8ErwaZRGMoyw@mail.gmail.com>

On Thu, Nov 7, 2019 at 6:11 PM Jiangli Zhou <jianglizhou at google.com> wrote:
>
> I looked at both the 05.full and 06.delta webrevs. They look good.
>
> I still feel a bit uneasy about the potential runtime impact when data
> does get relocated. Long running apps/services may shy away from
> enabling the archive at runtime, if there is a detectable overhead even
> though it may only occur rarely. As relocation is enabled by default
> and users cannot turn it off, disabling with -Xshare:off entirely
> would become the only choice. Could you please create a new RFE
> (possibly with higher priority) to investigate the potential effect,
> or provide an option for users to opt in to relocation with a
> command-line switch?

Forgot to say that when the Java heap can fit into the low 32G space, the
VM takes the class space size into account and leaves the needed space right
above the heap (also in the low 32G space) when reserving the heap, for
!UseSharedSpaces. In that case, it's more likely the class data and heap
data can be colocated successfully.
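
As a rough illustration of why the 32G boundary matters: with a shift of 3,
a 32-bit narrow value can cover (2^32 << 3) = 32G of address space starting
at the encoding base. The sketch below models only that arithmetic.
NarrowEncoding and its members are hypothetical names for illustration, not
the CompressedKlassPointers API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of base+shift pointer compression; not HotSpot code.
struct NarrowEncoding {
  std::uint64_t base;  // address that narrow value 0 decodes to
  unsigned shift;      // log2 of the alignment of encoded addresses

  // Size of the address range that a 32-bit narrow value can cover.
  std::uint64_t encodable_range() const {
    return (std::uint64_t{1} << 32) << shift;
  }

  // True if [lo, hi) lies entirely inside [base, base + encodable_range()).
  bool can_encode_range(std::uint64_t lo, std::uint64_t hi) const {
    return lo >= base && hi >= lo && (hi - base) <= encodable_range();
  }

  std::uint32_t encode(std::uint64_t addr) const {
    assert(addr >= base && (addr - base) < encodable_range());
    assert(((addr - base) & ((std::uint64_t{1} << shift) - 1)) == 0);
    return static_cast<std::uint32_t>((addr - base) >> shift);
  }

  std::uint64_t decode(std::uint32_t narrow) const {
    return base + (static_cast<std::uint64_t>(narrow) << shift);
  }
};
```

In this model, a zero base with shift 3 cannot encode a range that starts at
0x800000000 (exactly 32G), while a base of 0x800000000 can. That is
consistent with the non-zero narrow_klass_base shown in the -Xlog:cds=debug
output quoted further down.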

Thanks,
Jiangli

>
> Regards,
> Jiangli
>
> On Thu, Nov 7, 2019 at 4:22 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >
> > Hi Coleen,
> >
> > Thanks for the review. Here's a webrev that has incorporated your
> > suggestions:
> >
> > http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v06-delta/
> >
> > Please see comments in-line
> >
> > On 11/7/19 2:46 PM, coleen.phillimore at oracle.com wrote:
> > > Hi, I've done a more high level code review of this and it looks good!
> > >
> > > http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/src/hotspot/share/memory/archiveUtils.hpp.html
> > >
> > >
> > > I think these classes require comments on what they do and why. The
> > > comments you sent me offline look good.
> >
> > I added more comments for ArchivePtrMarker::_compacted per your offline
> > request.
> >
> > >
> > > Also .hpp files shouldn't include .inline.hpp files, like
> > > bitMap.inline.hpp.  Hopefully it's just a case of moving do_bit() into
> > > the cpp file.
> >
> > I moved the do_bit() function into archiveUtils.inline.hpp, since it is
> > used by 3 .cpp files, and performance is important.
> >
> > >
> > > I wonder if the exception list of classes to exclude should be a
> > > function in javaClasses.hpp/cpp where the explanation would make more
> > > sense?  ie bool
> > > JavaClasses::has_injected_native_pointers(InstanceKlass* k);
> >
> > I moved the checking code to javaClasses.cpp. Since we do (partially)
> > support java.lang.Class, which has injected native pointers, I named the
> > function as JavaClasses::is_supported_for_archiving instead. I also
> > massaged the comments a little for clarification.
> >
> > >
> > > Is there already an RFE to move the DumpSharedSpaces output from
> > > tty->print() to log_info() ?
> >
> > I created https://bugs.openjdk.java.net/browse/JDK-8233826 (Change CDS
> > dumping tty->print_cr() to unified logging).
> >
> > Thanks
> > - Ioi
> >
> > >
> > > Thanks,
> > > Coleen
> > >
> > > On 11/6/19 4:17 PM, Ioi Lam wrote:
> > >> Hi Jiangli,
> > >>
> > >> I've uploaded the webrev after integrating your comments:
> > >>
> > >> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/
> > >>
> > >> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05-delta/
> > >>
> > >>
> > >> Please see more replies below:
> > >>
> > >>
> > >> On 11/4/19 5:52 PM, Jiangli Zhou wrote:
> > >>> On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam <ioi.lam at oracle.com>
> > >>> wrote:
> > >>>
> > >>>     Hi Jiangli,
> > >>>
> > >>>     Thank you so much for spending time reviewing this RFE!
> > >>>
> > >>>     On 11/3/19 6:34 PM, Jiangli Zhou wrote:
> > >>>     > Hi Ioi,
> > >>>     >
> > >>>     > Sorry for the delay again. Will try to put this on the top of my
> > >>>     list
> > >>>     > next week and reduce the turn-around time. The updates look
> > >>> good in
> > >>>     > general.
> > >>>     >
> > >>>     > We might want to have a better strategy when choosing metadata
> > >>>     > relocation address (when relocation is needed). Some
> > >>>     > applications/benchmarks may be more sensitive to cache
> > >>> locality and
> > >>>     > memory/data layout. There was a bug,
> > >>>     > https://bugs.openjdk.java.net/browse/JDK-8213713 that caused
> > >>> 1G gap
> > >>>     > between Java heap data and metadata before JDK 12. The gap
> > >>> seemed to
> > >>>     > cause a small but noticeable runtime effect in one case that I
> > >>> came
> > >>>     > across.
> > >>>
> > >>>     I guess you're saying we should try to relocate the archive into
> > >>>     somewhere under 32GB?
> > >>>
> > >>>
> > >>> I don't yet have sufficient data that determines if mapping at low
> > >>> 32G produces better runtime performance. I experimented with that,
> > >>> but didn't see noticeable difference when comparing to mapping at
> > >>> the current default address. It doesn't hurt, I think. So it may be
> > >>> a better choice than relocating to a random address in high 32G
> > >>> space (when Java heap is in low 32G address space).
> > >>
> > >> Maybe we should reconsider this when we have more concrete data for
> > >> the benefits of moving the compressed class space to under 32G.
> > >>
> > >> Please note that in metaspace.cpp, when CDS is disabled and  the VM
> > >> fails to allocate the class space at the requested address
> > >> (0x7c000000 for 16GB heap), it also just allocates from a random
> > >> address (without trying to search under 32GB):
> > >>
> > >> http://hg.openjdk.java.net/jdk/jdk/annotate/e767fa6a1d45/src/hotspot/share/memory/metaspace.cpp#l1128
> > >>
> > >>
> > >> This code has been there since 2013 and we have not seen any issues.
> > >>
> > >>
> > >>
> > >>
> > >>>
> > >>>     Could you elaborate more about the performance issue, especially
> > >>>     about
> > >>>     cache locality? I looked at JDK-8213713 but it didn't mention
> > >>>     performance.
> > >>>
> > >>>
> > >>> When enabling CDS we noticed a small runtime overhead in JDK 11
> > >>> recently with a benchmark. After I backported JDK-8213713 to 11, it
> > >>> seemed to reduce the runtime overhead that the benchmark was
> > >>> experiencing.
> > >>>
> > >>>
> > >>>     Also, by default, we have non-zero narrow_klass_base and
> > >>>     narrow_klass_shift = 3, and archive relocation doesn't change that:
> > >>>
> > >>>     $ java -Xlog:cds=debug -version
> > >>>     ... narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
> > >>>     $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
> > >>>     ... narrow_klass_base = 0x00007f1e8b499000, narrow_klass_shift = 3
> > >>>
> > >>>     We always use narrow_klass_shift due to this:
> > >>>
> > >>>        // CDS uses LogKlassAlignmentInBytes for narrow_klass_shift. See
> > >>>        //
> > >>> MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
> > >>>        // how dump time narrow_klass_shift is set. Although, CDS can
> > >>> work
> > >>>        // with zero-shift mode also, to be consistent with AOT it uses
> > >>>        // LogKlassAlignmentInBytes for klass shift so archived java
> > >>>     heap objects
> > >>>        // can be used at same time as AOT code.
> > >>>        if (!UseSharedSpaces
> > >>>            && (uint64_t)(higher_address - lower_base) <=
> > >>>     UnscaledClassSpaceMax) {
> > >>>          CompressedKlassPointers::set_shift(0);
> > >>>        } else {
> > >>> CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
> > >>>        }
> > >>>
> > >>>
> > >>> Right. If we relocate to low 32G space, we need to make sure that
> > >>> the range containing the mapped class data and class space is
> > >>> encodable.
> > >>>
> > >>>
> > >>>     > Here are some additional comments (minor).
> > >>>     >
> > >>>     > Could you please fix the long lines in the following?
> > >>>     >
> > >>>     > 1237 void
> > >>> java_lang_Class::update_archived_primitive_mirror_native_pointers(oop
> > >>>     > archived_mirror) {
> > >>>     > 1238   if (MetaspaceShared::relocation_delta() != 0) {
> > >>>     > 1239  assert(archived_mirror->metadata_field(_klass_offset) ==
> > >>>     > NULL, "must be for primitive class");
> > >>>     > 1240
> > >>>     > 1241     Klass* ak =
> > >>>     > ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
> > >>>     > 1242     if (ak != NULL) {
> > >>>     > 1243  archived_mirror->metadata_field_put(_array_klass_offset,
> > >>>     > (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
> > >>>     > 1244     }
> > >>>     > 1245   }
> > >>>     > 1246 }
> > >>>     >
> > >>>     > src/hotspot/share/memory/dynamicArchive.cpp
> > >>>     >
> > >>>     >   889   Thread* THREAD = Thread::current();
> > >>>     >   890   Method::sort_methods(ik->methods(), /*set_idnums=*/true,
> > >>>     > dynamic_dump_method_comparator);
> > >>>     >   891   if (ik->default_methods() != NULL) {
> > >>>     >   892  Method::sort_methods(ik->default_methods(),
> > >>>     > /*set_idnums=*/false, dynamic_dump_method_comparator);
> > >>>     >   893   }
> > >>>     >
> > >>>
> > >>>     OK will do.
> > >>>
> > >>>     > Please see inlined comments below.
> > >>>     >
> > >>>     > On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com>
> > >>>     wrote:
> > >>>     >> Hi Jiangli,
> > >>>     >>
> > >>>     >> Thanks for the review. I've updated the patch according to your
> > >>>     comments:
> > >>>     >>
> > >>>     >>
> > >>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
> > >>>
> > >>>     >>
> > >>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
> > >>>
> > >>>     >>
> > >>>     >> (the delta is on top of 8231610-relocate-cds-archive.v03.delta
> > >>>     in my
> > >>>     >> reply to Calvin's comments).
> > >>>     >>
> > >>>     >> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
> > >>>     >>> Hi Ioi,
> > >>>     >>>
> > >>>     >>> Sorry for the delay. Here are my remaining comments.
> > >>>     >>>
> > >>>     >>> - src/hotspot/share/memory/dynamicArchive.cpp
> > >>>     >>>
> > >>>     >>> 128   static intx _method_comparator_name_delta;
> > >>>     >>>
> > >>>     >>> The name of the above variable is confusing. It's the value of
> > >>>     >>> _buffer_to_target_delta. It's better to use _buffer_to_target_delta
> > >>>     >>> directly.
> > >>>     >> _buffer_to_target_delta is a non-static field, but
> > >>>     >> dynamic_dump_method_comparator() must be a static function so
> > >>>     it can't
> > >>>     >> use the non-static field easily.
> > >>>     >
> > >>>     > It sounds like an issue. _buffer_to_target_delta was made
> > >>>     > non-static mostly because we might support more than one dynamic
> > >>>     > archive in the future. However, today's usages bake in an
> > >>>     assumption
> > >>>     > that _buffer_to_target_delta is a singleton value. It is
> > >>> cleaner to
> > >>>     > either make _buffer_to_target_delta a static variable for
> > >>> now, or
> > >>>     > add an access API in DynamicArchiveBuilder to allow other
> > >>> code to
> > >>>     > properly and correctly use the value.
> > >>>
> > >>>     OK, I'll move it to a static variable.
> > >>>
> > >>>     >
> > >>>     >>> Also, we can do a quick pointer comparison of 'a_name' and
> > >>>     >>> 'b_name' first before adjusting the pointers.
> > >>>     >> I added this:
> > >>>     >>
> > >>>     >>       if (a_name == b_name) {
> > >>>     >>         return 0;
> > >>>     >>       }
> > >>>     >>
> > >>>     >>> ---
> > >>>     >>>
> > >>>     >>> 934 void DynamicArchiveBuilder::relocate_buffer_to_target() {
> > >>>     >>> ...
> > >>>     >>>    944
> > >>>     >>>    945  ArchivePtrMarker::compact(relocatable_base,
> > >>>     relocatable_end);
> > >>>     >>> ...
> > >>>     >>>
> > >>>     >>>    974     SharedDataRelocator patcher((address*)patch_base,
> > >>>     >>> (address*)patch_end, valid_old_base, valid_old_end,
> > >>>     >>>    975  valid_new_base, valid_new_end, addr_delta);
> > >>>     >>>    976  ArchivePtrMarker::ptrmap()->iterate(&patcher);
> > >>>     >>>
> > >>>     >>> Could we reduce the number of data re-iterations to help
> > >>> archive
> > >>>     >>> dumping performance. The ArchivePtrMarker::compact operation
> > >>>     can be
> > >>>     >>> combined with the patching iteration.
> > >>>     ArchivePtrMarker::compact API
> > >>>     >>> can be removed.
> > >>>     >> That's a good idea. I implemented it using a template parameter
> > >>>     so that
> > >>>     >> we can have max performance when relocating the archive at run
> > >>>     time.
> > >>>     >>
> > >>>     >> I added comments to explain why the relocation is done here. The
> > >>>     >> relocation is pretty rare (only when the base archive was not
> > >>>     mapped at
> > >>>     >> the default location).
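
A minimal sketch of the single-pass idea discussed above, not the actual
SharedDataRelocator code. A bool template parameter selects at compile time
whether the patching pass also compacts the bitmap, so the non-compacting
instantiation pays nothing for the extra bookkeeping. PtrBitmap,
relocate_slots and the slot layout are hypothetical names chosen for
illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the pointer bitmap: bit i is set when slot i
// of the archive was marked as holding a pointer.
using PtrBitmap = std::vector<bool>;

// Walks the marked slots once and adds `delta` to every non-null pointer.
// When COMPACTING is true, the same pass clears bits whose slot is null and
// trims the bitmap after the last slot that still holds a pointer, so no
// separate compaction pass is needed. Returns the number of bits kept.
// Assumes slots.size() >= bitmap.size().
template <bool COMPACTING>
std::size_t relocate_slots(std::vector<std::uintptr_t>& slots,
                           PtrBitmap& bitmap,
                           std::intptr_t delta) {
  std::size_t keep = COMPACTING ? 0 : bitmap.size();
  for (std::size_t i = 0; i < bitmap.size(); ++i) {
    if (!bitmap[i]) continue;            // slot was never marked
    if (slots[i] == 0) {                 // marked, but holds a null pointer
      if (COMPACTING) bitmap[i] = false;
      continue;
    }
    slots[i] += static_cast<std::uintptr_t>(delta);  // patch the pointer
    if (COMPACTING) keep = i + 1;
  }
  if (COMPACTING) bitmap.resize(keep);
  return keep;
}
```

Under these assumptions, relocate_slots<true> would be the dump-time variant
and relocate_slots<false> the variant used when relocating at run time.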
> > >>>     >>
> > >>>     >>> ---
> > >>>     >>>
> > >>>     >>>    967     address valid_new_base =
> > >>>     >>> (address)Arguments::default_SharedBaseAddress();
> > >>>     >>>    968     address valid_new_end  = valid_new_base +
> > >>>     base_plus_top_size;
> > >>>     >>>
> > >>>     >>> The debugging only code can be included under #ifdef ASSERT.
> > >>>     >> These values are actually also used in debug logging so they
> > >>>     can't be
> > >>>     >> ifdef'ed out.
> > >>>     >>
> > >>>     >> Also, the c++ compiler is pretty good with eliding code
> > >>> that's not
> > >>>     >> actually used. If I comment out all the logging code in
> > >>>     >> DynamicArchiveBuilder::relocate_buffer_to_target() and
> > >>>     >> SharedDataRelocator, gcc elides all the unused fields and their
> > >>>     >> assignments. So no code is generated for this, etc.
> > >>>     >>
> > >>>     >>       address valid_new_base =
> > >>>     >> (address)Arguments::default_SharedBaseAddress();
> > >>>     >>
> > >>>     >> Since #ifdef ASSERT makes the code harder to read, I think we
> > >>>     should use
> > >>>     >> it only when really necessary.
> > >>>     > It seems cleaner to get rid of these debugging only variables, by
> > >>>     > using 'relocatable_base' and
> > >>>     > '(address)Arguments::default_SharedBaseAddress()' in the logging
> > >>>     code.
> > >>>
> > >>>     SharedDataRelocator is used under 3 different situations. These six
> > >>>     variables (patch_base, patch_end, valid_old_base, valid_old_end,
> > >>>     valid_new_base, valid_new_end) describe what is being patched,
> > >>>     and what
> > >>>     the expectations are, for each situation. The code will be hard to
> > >>>     understand without them.
> > >>>
> > >>>     Please note there's also logging code in the SharedDataRelocator
> > >>>     constructor that prints out these values.
> > >>>
> > >>>     I think I'll just remove the 'debug only' comment to avoid
> > >>> confusion.
> > >>>
> > >>>
> > >>> Ok.
> > >>>
> > >>>
> > >>>     >
> > >>>     >>> ---
> > >>>     >>>
> > >>>     >>>    993
> > >>>  dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
> > >>>     >>>
> > >>>     >>> We could combine the archived heap data bitmap into the new
> > >>>     region as
> > >>>     >>> well? It can be handled as a separate RFE.
> > >>>     >> I've filed https://bugs.openjdk.java.net/browse/JDK-8233093
> > >>>     >>
> > >>>     >>> - src/hotspot/share/memory/filemap.cpp
> > >>>     >>>
> > >>>     >>> 1038     if (is_static()) {
> > >>>     >>> 1039       if (errno == ENOENT) {
> > >>>     >>> 1040         // Not locating the shared archive is ok.
> > >>>     >>> 1041         fail_continue("Specified shared archive not found
> > >>>     (%s).",
> > >>>     >>> _full_path);
> > >>>     >>> 1042       } else {
> > >>>     >>> 1043         fail_continue("Failed to open shared archive file
> > >>>     (%s).",
> > >>>     >>> 1044  os::strerror(errno));
> > >>>     >>> 1045       }
> > >>>     >>> 1046     } else {
> > >>>     >>> 1047       log_warning(cds, dynamic)("specified dynamic archive
> > >>>     >>> doesn't exist: %s", _full_path);
> > >>>     >>> 1048     }
> > >>>     >>>
> > >>>     >>> If the top layer is explicitly specified by the user, a
> > >>>     warning does
> > >>>     >>> not seem to be a proper behavior if the VM fails to open the
> > >>>     archive
> > >>>     >>> file.
> > >>>     >>>
> > >>>     >>> It might be better to handle the relocation unrelated code in
> > >>>     separate
> > >>>     >>> changeset and track with a separate RFE.
> > >>>     >> This code was moved from
> > >>>     >>
> > >>>     >>
> > >>> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
> > >>>
> > >>>     >>
> > >>>     >> so I am not changing the behavior. If you want, we can file an
> > >>>     RFE to
> > >>>     >> change the behavior.
> > >>>     > Ok. A new RFE sounds like the right thing to re-evaluate the
> > >>> usage
> > >>>     > issue here. Thanks.
> > >>>
> > >>>     I created https://bugs.openjdk.java.net/browse/JDK-8233446
> > >>>
> > >>>     >>> ---
> > >>>     >>>
> > >>>     >>> 1148 void FileMapInfo::write_region(int region, char* base,
> > >>>     size_t size,
> > >>>     >>> 1149                                bool read_only, bool
> > >>>     allow_exec) {
> > >>>     >>> ...
> > >>>     >>> 1154
> > >>>     >>> 1155   if (region == MetaspaceShared::bm) {
> > >>>     >>> 1156     target_base = NULL;
> > >>>     >>> 1157   } else if (DynamicDumpSharedSpaces) {
> > >>>     >>>
> > >>>     >>> It's not too clear to me how the bitmap (bm) region is handled
> > >>>     for the
> > >>>     >>> base layer and top layer. Could you please explain?
> > >>>     >> The bm region for both layers are mapped at an address picked
> > >>>     by the OS:
> > >>>     >>
> > >>>     >> char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
> > >>>     >>     FileMapRegion* si = space_at(MetaspaceShared::bm);
> > >>>     >>     bitmap_size = si->used_aligned();
> > >>>     >>     bool read_only = true, allow_exec = false;
> > >>>     >>     char* requested_addr = NULL; // allow OS to pick any
> > >>> location
> > >>>     >>     char* bitmap_base = os::map_memory(_fd, _full_path,
> > >>>     si->file_offset(),
> > >>>     >> requested_addr, bitmap_size,
> > >>>     >> read_only, allow_exec);
> > >>>     >>
> > >>>     > Ok, after staring at the code for a few seconds I saw that's
> > >>>     intended.
> > >>>     > If the current region is 'bm', then the 'target_base' is NULL
> > >>>     > regardless if it's static or dynamic archive. Otherwise, the
> > >>>     > 'target_base' is handled differently for the static and dynamic
> > >>>     case.
> > >>>     > The following would be cleaner and has better reliability.
> > >>>     >
> > >>>     >     char* target_base = NULL;
> > >>>     >
> > >>>     >     // The target_base is NULL for 'bm' region.
> > >>>     >     if (region != MetaspaceShared::bm) {
> > >>>     >       if (DynamicDumpSharedSpaces) {
> > >>>     >         assert(!HeapShared::is_heap_region(region), "dynamic
> > >>> archive
> > >>>     > doesn't support heap regions");
> > >>>     >         target_base = DynamicArchive::buffer_to_target(base);
> > >>>     >       } else {
> > >>>     >         target_base = base;
> > >>>     >       }
> > >>>     >    }
> > >>>
> > >>>     How about this?
> > >>>
> > >>>        char* target_base;
> > >>>        if (region == MetaspaceShared::bm) {
> > >>>          target_base = NULL; // always NULL for bm region.
> > >>>        } else {
> > >>>          if (DynamicDumpSharedSpaces) {
> > >>>              assert(!HeapShared::is_heap_region(region), "dynamic
> > >>> archive
> > >>>     doesn't support heap regions");
> > >>>              target_base = DynamicArchive::buffer_to_target(base);
> > >>>          } else {
> > >>>              target_base = base;
> > >>>          }
> > >>>        }
> > >>>
> > >>>
> > >>> No objection if you prefer the extra 'else' block.
> > >>>
> > >>>
> > >>>     >
> > >>>     >>> ---
> > >>>     >>>
> > >>>     >>> 1362
> > >>>  DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
> > >>>
> > >>>     >>>
> > >>>     >>> Could you please explain the above?
> > >>>     >> I added the comments
> > >>>     >>
> > >>>     >>     // Make sure we don't attempt to use
> > >>>     header()->mapped_base_address()
> > >>>     >> unless
> > >>>     >>     // it's been successfully mapped.
> > >>>     >>
> > >>> DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
> > >>>
> > >>>     >>
> > >>>     >>> ---
> > >>>     >>>
> > >>>     >>> 1359   FileMapRegion* last_region = NULL;
> > >>>     >>>
> > >>>     >>> 1371     if (last_region != NULL) {
> > >>>     >>> 1372       // Ensure that the OS won't be able to allocate new
> > >>>     memory
> > >>>     >>> spaces between any mapped
> > >>>     >>> 1373       // regions, or else it would mess up the simple
> > >>>     comparison
> > >>>     >>> in MetaspaceObj::is_shared().
> > >>>     >>> 1374       assert(si->mapped_base() ==
> > >>> last_region->mapped_end(),
> > >>>     >>> "must have no gaps");
> > >>>     >>>
> > >>>     >>> 1379     last_region = si;
> > >>>     >>>
> > >>>     >>> Can you please place 'last_region' related code under #ifdef
> > >>>     ASSERT?
> > >>>     >> I think that will make the code more cluttered. The compiler
> > >>> will
> > >>>     >> optimize out that away.
> > >>>     > It's cleaner to define debugging only variable for debugging only
> > >>>     > builds. You can wrap it and its related usage in DEBUG_ONLY.
> > >>>
> > >>>     OK, will do.
> > >>>
> > >>>     >
> > >>>     >>> ---
> > >>>     >>>
> > >>>     >>> 1478 char* FileMapInfo::map_relocation_bitmap(size_t&
> > >>>     bitmap_size) {
> > >>>     >>> 1479   FileMapRegion* si = space_at(MetaspaceShared::bm);
> > >>>     >>> 1480   bitmap_size = si->used_aligned();
> > >>>     >>> 1481   bool read_only = true, allow_exec = false;
> > >>>     >>> 1482   char* requested_addr = NULL; // allow OS to pick any
> > >>>     location
> > >>>     >>> 1483   char* bitmap_base = os::map_memory(_fd, _full_path,
> > >>>     si->file_offset(),
> > >>>     >>> 1484 requested_addr, bitmap_size,
> > >>>     >>> read_only, allow_exec);
> > >>>     >>>
> > >>>     >>> We need to handle mapping failure here.
> > >>>     >> It's handled here:
> > >>>     >>
> > >>>     >> bool FileMapInfo::relocate_pointers(intx addr_delta) {
> > >>>     >>     log_debug(cds, reloc)("runtime archive relocation start");
> > >>>     >>     size_t bitmap_size;
> > >>>     >>     char* bitmap_base = map_relocation_bitmap(bitmap_size);
> > >>>     >>     if (bitmap_base != NULL) {
> > >>>     >>     ...
> > >>>     >>     } else {
> > >>>     >>       log_error(cds)("failed to map relocation bitmap");
> > >>>     >>       return false;
> > >>>     >>     }
> > >>>     >>
> > >>>     > 'bitmap_base' is used immediately after map_memory(). So the
> > >>> check
> > >>>     > needs to be done immediately after map_memory(), but not in the
> > >>>     caller
> > >>>     > of map_relocation_bitmap().
> > >>>     >
> > >>>     > 1490   char* bitmap_base = os::map_memory(_fd, _full_path,
> > >>>     si->file_offset(),
> > >>>     > 1491 requested_addr, bitmap_size,
> > >>>     > read_only, allow_exec);
> > >>>     > 1492
> > >>>     > 1493   if (VerifySharedSpaces && bitmap_base != NULL &&
> > >>>     > !region_crc_check(bitmap_base, bitmap_size, si->crc())) {
> > >>>
> > >>>     OK, I'll fix that.
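
A minimal sketch of the ordering requested above, assuming a mapping
routine that returns a null pointer on failure: the null check happens
before the mapped memory is touched, and only then is the optional checksum
verified. simple_sum and checked_bitmap are hypothetical helpers; the
additive sum merely stands in for the real CRC routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical checksum used only for this sketch (not a real CRC).
std::uint32_t simple_sum(const char* base, std::size_t size) {
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < size; ++i) {
    sum += static_cast<unsigned char>(base[i]);
  }
  return sum;
}

// Returns `mapped` if it is usable, or nullptr if the mapping failed or the
// checksum does not match. `mapped` is whatever the mapping call returned.
const char* checked_bitmap(const char* mapped, std::size_t size,
                           bool verify, std::uint32_t expected_sum) {
  if (mapped == nullptr) {
    return nullptr;  // mapping failed: bail out before any use of the memory
  }
  if (verify && simple_sum(mapped, size) != expected_sum) {
    return nullptr;  // mapped, but the contents failed verification
  }
  return mapped;
}
```

With this shape the caller still sees a single null result for both kinds
of failure, so its existing error path does not need to change.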
> > >>>
> > >>>     >
> > >>>     >
> > >>>     >>> ---
> > >>>     >>>
> > >>>     >>> 1513     // debug only -- the current value of the pointers
> > >>> to be
> > >>>     >>> patched must be within this
> > >>>     >>> 1514     // range (i.e., must be between the requested base
> > >>>     address,
> > >>>     >>> and the end of the current archive).
> > >>>     >>> 1515     // Note: top archive may point to objects in the base
> > >>>     >>> archive, but not the other way around.
> > >>>     >>> 1516     address valid_old_base =
> > >>>     (address)header()->requested_base_address();
> > >>>     >>> 1517     address valid_old_end  = valid_old_base +
> > >>>     mapping_end_offset();
> > >>>     >>>
> > >>>     >>> Please place all FileMapInfo::relocate_pointers debugging only
> > >>>     code
> > >>>     >>> under #ifdef ASSERT.
> > >>>     >> Ditto about ifdef ASSERT
> > >>>     >>
> > >>>     >>> - src/hotspot/share/memory/heapShared.cpp
> > >>>     >>>
> > >>>     >>>    441 void
> > >>>     HeapShared::initialize_from_archived_subgraph(Klass* k) {
> > >>>     >>>    442   if (!open_archive_heap_region_mapped() ||
> > >>>     !MetaspaceObj::is_shared(k)) {
> > >>>     >>>    443     return; // nothing to do
> > >>>     >>>    444   }
> > >>>     >>>
> > >>>     >>> When do we call HeapShared::initialize_from_archived_subgraph
> > >>>     for a
> > >>>     >>> klass that's not shared?
> > >>>     >> I've removed the !MetaspaceObj::is_shared(k). I probably added
> > >>>     that for
> > >>>     >> debugging purposes only.
> > >>>     >>
> > >>>     >>>    616   DEBUG_ONLY({
> > >>>     >>>    617       Klass* klass = orig_obj->klass();
> > >>>     >>>    618       assert(klass !=
> > >>> SystemDictionary::Module_klass() &&
> > >>>     >>>    619              klass !=
> > >>>     SystemDictionary::ResolvedMethodName_klass() &&
> > >>>     >>>    620              klass !=
> > >>>     SystemDictionary::MemberName_klass() &&
> > >>>     >>>    621              klass !=
> > >>> SystemDictionary::Context_klass() &&
> > >>>     >>>    622              klass !=
> > >>>     SystemDictionary::ClassLoader_klass(), "we
> > >>>     >>> can only relocate metaspace object pointers inside
> > >>> java_lang_Class
> > >>>     >>> instances");
> > >>>     >>>    623     });
> > >>>     >>>
> > >>>     >>> Let's leave the above for a separate RFE. I think assert is not
> > >>>     >>> sufficient for the check. Also, why ResolvedMethodName,
> > >>> Module and
> > >>>     >>> MemberName cannot be part of the graph?
> > >>>     >>>
> > >>>     >>>
> > >>>     >> I added the following comment:
> > >>>     >>
> > >>>     >>     DEBUG_ONLY({
> > >>>     >>         // The following are classes in
> > >>>     share/classfile/javaClasses.cpp
> > >>>     >> that have injected native pointers
> > >>>     >>         // to metaspace objects. To support these classes, we
> > >>>     need to add
> > >>>     >> relocation code similar to
> > >>>     >>         //
> > >>> java_lang_Class::update_archived_mirror_native_pointers.
> > >>>     >>         Klass* klass = orig_obj->klass();
> > >>>     >>         assert(klass != SystemDictionary::Module_klass() &&
> > >>>     >>                klass !=
> > >>>     SystemDictionary::ResolvedMethodName_klass() &&
> > >>>     >>
> > >>>     > It's too restrictive to exclude those objects from the archived
> > >>>     object
> > >>>     > graph because of metadata relocation, since metadata relocation is
> > >>>     rare.
> > >>>     > The trade-off doesn't seem to buy us much.
> > >>>     >
> > >>>     > Do you plan to add the needed relocation code?
> > >>>
> > >>>     I looked more into this. Actually we cannot handle these 5
> > >>> classes at
> > >>>     all, even without archive relocation:
> > >>>
> > >>>     [1] #define MODULE_INJECTED_FIELDS(macro) \
> > >>>        macro(java_lang_Module, module_entry, intptr_signature, false)
> > >>>
> > >>>     ->  module_entry is malloc'ed
> > >>>
> > >>>     [2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
> > >>>        macro(java_lang_invoke_ResolvedMethodName, vmholder,
> > >>>     object_signature, false) \
> > >>>        macro(java_lang_invoke_ResolvedMethodName, vmtarget,
> > >>>     intptr_signature, false)
> > >>>
> > >>>     -> these fields are related to method handles and lambda forms,
> > >>> etc.
> > >>>     They can't easily be archived without implementing lambda form
> > >>>     archiving. (I did a prototype; it's very complex and fragile).
> > >>>
> > >>>     [3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
> > >>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
> > >>>     vmdependencies, intptr_signature, false) \
> > >>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
> > >>>     last_cleanup, long_signature, false)
> > >>>
> > >>>     -> vmdependencies is malloc'ed.
> > >>>
> > >>>     [4] #define
> > >>> MEMBERNAME_INJECTED_FIELDS(macro) \
> > >>>        macro(java_lang_invoke_MemberName, vmindex, intptr_signature,
> > >>>     false)
> > >>>
> > >>>     -> this one is probably OK. Despite being declared as
> > >>>     'intptr_signature', it seems to be used just as an integer.
> > >>> However,
> > >>>     MemberNames are typically used with [2] and [3]. So let's just
> > >>>     forbid it
> > >>>     to be safe.
> > >>>
> > >>>     [2] [3] [4] are not used directly by regular Java code and are
> > >>>     unlikely
> > >>>     to be referenced (directly or indirectly) by static fields (except
> > >>>     for
> > >>>     the static fields in the classes in java.lang.invoke, which we
> > >>>     probably
> > >>>     won't support for heap archiving due to the problem I described for
> > >>>     [2]). Objects of these types are typically referenced via constant
> > >>>     pool
> > >>>     entries.
> > >>>
> > >>>     [5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
> > >>>        macro(java_lang_ClassLoader, loader_data, intptr_signature,
> > >>> false)
> > >>>
> > >>>     -> loader_data is malloc'ed.
> > >>>
> > >>>     So, I will change the DEBUG_ONLY into a product-mode check, and
> > >>> quit
> > >>>     dumping if these objects are found in the object subgraph.
> > >>>
> > >>>
> > >>> Sounds good. Can you please also add a comment with explanation.
> > >>>
> > >>> For ClassLoader and Module, it is worth considering caching the
> > >>> additional native data some time in the future. Lois had suggested
> > >>> the Module part a while ago.
> > >>
> > >> I think we can do that if/when we archive Modules directly into the
> > >> shared heap.
> > >>
> > >>
> > >>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>     Maybe we should backport the check to older versions as well?
> > >>>
> > >>>
> > >>> We should discuss with Andrew Haley for backports to JDK 11 update
> > >>> releases. Since the current OpenJDK 11 only applies Java heap
> > >>> archiving to a restricted set of JDK library code, I think it is
> > >>> safe without the new check.
> > >>>
> > >>> For non-LTS releases, it might not be worthwhile as they may not be
> > >>> widely used?
> > >>
> > >> I agree. FYI, we (Oracle) have no plan for backporting more types of
> > >> heap object archiving, so the decision would be up to whoever
> > >> decides to do so.
> > >>
> > >> Thanks
> > >> - Ioi
> > >>
> > >>
> > >>>
> > >>> Thanks,
> > >>> Jiangli
> > >>>
> > >>>
> > >>>     >
> > >>>     >>> - src/hotspot/share/memory/metaspace.cpp
> > >>>     >>>
> > >>>     >>> 1036   metaspace_rs =
> > >>> ReservedSpace(compressed_class_space_size(),
> > >>>     >>> 1037   _reserve_alignment,
> > >>>     >>> 1038   large_pages,
> > >>>     >>> 1039   requested_addr);
> > >>>     >>>
> > >>>     >>> Please fix indentation.
> > >>>     >> Fixed.
> > >>>     >>
> > >>>     >>> - src/hotspot/share/memory/metaspaceClosure.hpp
> > >>>     >>>
> > >>>     >>>     78   enum SpecialRef {
> > >>>     >>>     79     _method_entry_ref
> > >>>     >>>     80   };
> > >>>     >>>
> > >>>     >>> Are there other pointers that are not references to
> > >>>     MetaspaceObj? If
> > >>>     >>> _method_entry_ref is the only type, it's probably not worth
> > >>>     defining
> > >>>     >>> SpecialRef?
> > >>>     >> There may be more types in the future, so I want to have a
> > >>>     stable API
> > >>>     >> that can be easily expanded without touching all the code that
> > >>>     uses it.
> > >>>     >>
> > >>>     >>
> > >>>     >>> - src/hotspot/share/memory/metaspaceShared.hpp
> > >>>     >>>
> > >>>     >>>     42 enum MapArchiveResult {
> > >>>     >>>     43   MAP_ARCHIVE_SUCCESS,
> > >>>     >>>     44   MAP_ARCHIVE_MMAP_FAILURE,
> > >>>     >>>     45   MAP_ARCHIVE_OTHER_FAILURE
> > >>>     >>>     46 };
> > >>>     >>>
> > >>>     >>> If we want to define different failure types, it's probably
> > >>> worth
> > >>>     >>> using separate types for relocation failure and validation
> > >>>     failure.
> > >>>     >> For now, I just need to distinguish between MMAP_FAILURE (where
> > >>>     I should
> > >>>     >> attempt to remap at an alternative address) and OTHER_FAILURE
> > >>>     (where the
> > >>>     >> CDS archive loading will fail -- due to validation error,
> > >>>     insufficient
> > >>>     >> memory, etc -- without attempting to remap.)
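
A minimal sketch of how the two failure kinds could drive the retry
decision described above. The enum is copied from the quoted header; Mapper
and map_with_fallback are hypothetical, and passing address 0 to mean "let
the OS choose" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

enum MapArchiveResult {
  MAP_ARCHIVE_SUCCESS,
  MAP_ARCHIVE_MMAP_FAILURE,
  MAP_ARCHIVE_OTHER_FAILURE
};

// Hypothetical mapper: tries to map the archive at `addr`
// (0 means "any address the OS picks" in this sketch).
using Mapper = std::function<MapArchiveResult(std::uintptr_t addr)>;

// Tries the requested address first. Only an mmap failure triggers a second
// attempt at an alternative address; any other failure (validation error,
// insufficient memory, ...) gives up immediately. Sets *relocated when the
// archive ended up mapped at the alternative address.
bool map_with_fallback(const Mapper& try_map, std::uintptr_t requested_addr,
                       bool* relocated) {
  *relocated = false;
  MapArchiveResult result = try_map(requested_addr);
  if (result == MAP_ARCHIVE_MMAP_FAILURE) {
    result = try_map(0);
    *relocated = (result == MAP_ARCHIVE_SUCCESS);
  }
  return result == MAP_ARCHIVE_SUCCESS;
}
```

A caller would then apply pointer relocation only when *relocated is true.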
> > >>>     >>
> > >>>     >>> ---
> > >>>     >>>
> > >>>     >>>    193   static intx _mapping_delta; // FIXME rename
> > >>>     >>>
> > >>>     >>> How about _relocation_delta?
> > >>>     >> Changed as suggested.
> > >>>     >>
> > >>>     >>> - src/hotspot/share/oops/instanceKlass
> > >>>     >>>
> > >>>     >>> 1573 bool InstanceKlass::_disable_method_binary_search = false;
> > >>>     >>>
> > >>>     >>> The use of _disable_method_binary_search is not necessary. You
> > >>>     can use
> > >>>     >>> DynamicDumpSharedSpaces for the purpose. That would make things
> > >>>     >>> cleaner.
> > >>>     >> If we always disable the binary search when
> > >>>     DynamicDumpSharedSpaces is
> > >>>     >> true, it will slow down normal execution of the Java program
> > >>> when
> > >>>     >> -XX:ArchiveClassesAtExit has been specified, but the program
> > >>>     hasn't exited.
> > >>>     > Could you please add some comments to
> > >>> _disable_method_binary_search
> > >>>     > with the above explanation? Thanks.
> > >>>
> > >>>     OK
> > >>>     >
> > >>>     >>> - test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
> > >>>     >>>
> > >>>     >>>     76                     if (name.equals("s0") ||
> > >>>     name.equals("s1")) {
> > >>>     >>>     77                       // String regions are listed at
> > >>>     the end and
> > >>>     >>> they may not be fully occupied.
> > >>>     >>>     78                       break;
> > >>>     >>>     79                     } else if (name.equals("bm")) {
> > >>>     >>>     80                       // Bitmap space does not have a
> > >>>     requested address.
> > >>>     >>>     81                       break;
> > >>>     >>>
> > >>>     >>> It's not part of your change, but could you please fix line 76
> > >>>     - 78
> > >>>     >>> since it is trivial. It seems the lines can be removed.
> > >>>     >> Removed.
> > >>>     >>
> > >>>     >>> - /src/hotspot/share/memory/archiveUtils.hpp
> > >>>     >>> The file name does not match with the macro '#ifndef
> > >>>     >>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please rename
> > >>>     >>> archiveUtils.* ? archiveRelocator.hpp and
> > >>> archiveRelocator.cpp are
> > >>>     >>> more descriptive.
> > >>>     >> I named the file archiveUtils.hpp so we can move other misc
> > >>>     stuff used
> > >>>     >> by dumping into this file (e.g., DumpRegion, WriteClosure from
> > >>>     >> metaspaceShared.hpp), since these are not used by the majority
> > >>>     of the
> > >>>     >> files that use metaspaceShared.hpp.
> > >>>     >>
> > >>>     >> I fixed the ifdef.
> > >>>     >>
> > >>>     >>> - src/hotspot/share/memory/archiveUtils.cpp
> > >>>     >>>
> > >>>     >>>     36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap,
> > >>>     address*
> > >>>     >>> ptr_base, address* ptr_end) {
> > >>>     >>>     37   assert(_ptrmap == NULL, "initialize only once");
> > >>>     >>>     38   _ptr_base = ptr_base;
> > >>>     >>>     39   _ptr_end = ptr_end;
> > >>>     >>>     40   _compacted = false;
> > >>>     >>>     41   _ptrmap = ptrmap;
> > >>>     >>>     42   _ptrmap->initialize(12 * M / sizeof(intptr_t)); //
> > >>>     default
> > >>>     >>> archive is about 12MB.
> > >>>     >>>     43 }
> > >>>     >>>
> > >>>     >>> Could we do a better estimate here? We could guesstimate the
> > >>> size
> > >>>     >>> based on the current used class space and metaspace size. It's
> > >>>     okay if
> > >>>     >>> a larger bitmap is used, since it can be reduced after all
> > >>>     marking are
> > >>>     >>> done.
> > >>>     >> The bitmap is automatically expanded when necessary in
> > >>>     >> ArchivePtrMarker::mark_pointer(). It's only about 1/32 or 1/64
> > >>>     of the
> > >>>     >> total archive size, so even if we do expand, the cost will be
> > >>>     trivial.
> > >>>     > The initial value is based on the default CDS archive. When
> > >>> dealing
> > >>>     > with a really large archive, it would have to re-grow many times.
> > >>>     > Also, using a hard-coded value is less desirable.
> > >>>
> > >>>     OK, I changed it to the following
> > >>>
> > >>>        // Use this as initial guesstimate. We should need less space
> > >>>     in the
> > >>>        // archive, but if we're wrong the bitmap will be expanded
> > >>>     automatically.
> > >>>        size_t estimated_archive_size =
> > >>> MetaspaceGC::capacity_until_GC();
> > >>>        // But set it smaller in debug builds so we always test the
> > >>>     expansion
> > >>>     code.
> > >>>        // (Default archive is about 12MB).
> > >>>        DEBUG_ONLY(estimated_archive_size = 6 * M);
> > >>>
> > >>>        // We need one bit per pointer in the archive.
> > >>>        _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
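
A small worked example of the sizing arithmetic above, assuming M is
1024 * 1024 bytes and one bit per pointer-sized slot. bitmap_bits and
bitmap_bytes are hypothetical helpers, not HotSpot functions.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t M = 1024 * 1024;

// One bit is needed for every pointer-sized slot in the archive.
constexpr std::size_t bitmap_bits(std::size_t archive_bytes,
                                  std::size_t pointer_bytes) {
  return archive_bytes / pointer_bytes;
}

// Bytes needed to store that many bits, rounded up.
constexpr std::size_t bitmap_bytes(std::size_t archive_bytes,
                                   std::size_t pointer_bytes) {
  return (bitmap_bits(archive_bytes, pointer_bytes) + 7) / 8;
}

// 12MB archive, 8-byte pointers: 1572864 bits, i.e. 192KB (1/64 of 12MB).
static_assert(bitmap_bits(12 * M, 8) == 1572864, "bits for 64-bit");
static_assert(bitmap_bytes(12 * M, 8) == 192 * 1024, "bytes for 64-bit");
// 4-byte pointers double the bit count: 384KB (1/32 of 12MB).
static_assert(bitmap_bytes(12 * M, 4) == 384 * 1024, "bytes for 32-bit");
```

This matches the 1/32 or 1/64 ratio mentioned earlier in the thread, which
is why an occasional expansion of the bitmap stays cheap.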
> > >>>
> > >>>
> > >>>     Thanks!
> > >>>     - Ioi
> > >>>
> > >>>     >
> > >>>     >>>
> > >>>     >>>
> > >>>     >>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou
> > >>>     <jianglizhou at google.com <mailto:jianglizhou at google.com>> wrote:
> > >>>     >>>> Hi Ioi,
> > >>>     >>>>
> > >>>     >>>> This is another great step for CDS usability improvement.
> > >>>     Thank you!
> > >>>     >>>>
> > >>>     >>>> I have a high level question (or request): could we consider
> > >>>     >>>> separating the relocation work for 'direct' class metadata
> > >>>     from other
> > >>>     >>>> types of metadata (such as the shared system dictionary,
> > >>>     symbol table,
> > >>>     >>>> etc)? Initially we only relocate the tables and other
> > >>>     archived global
> > >>>     >>>> data. When each archived class is being loaded, we can
> > >>>     relocate all
> > >>>     >>>> the pointers within the current class. We could find the
> > >>>     segment (for
> > >>>     >>>> the current class) in the bitmap and update the pointers
> > >>>     within the
> > >>>     >>>> segment. That way we can reduce initial startup costs and
> > >>>     also avoid
> > >>>     >>>> relocating class data that's not used at runtime. In some
> > >>>     real world
> > >>>     >>>> large systems, an archive may contain extremely large
> > >>> number of
> > >>>     >>>> classes.
> > >>>     >>>>
> > >>>     >>>> Following are partial review comments so we can move things
> > >>>     forward.
> > >>>     >>>> Still going through the rest of the changes.
> > >>>     >>>>
> > >>>     >>>> - src/hotspot/share/classfile/javaClasses.cpp
> > >>>     >>>>
> > >>>     >>>> 1218 void
> > >>> java_lang_Class::update_archived_mirror_native_pointers(oop
> > >>>     >>>> archived_mirror) {
> > >>>     >>>> 1219   Klass* k =
> > >>> ((Klass*)archived_mirror->metadata_field(_klass_offset));
> > >>>     >>>> 1220   if (k != NULL) { // k is NULL for the primitive
> > >>>     classes such as
> > >>>     >>>> java.lang.Byte::TYPE <<<<<<<<<<<
> > >>>     >>>> 1221  archived_mirror->metadata_field_put(_klass_offset,
> > >>>     >>>> (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
> > >>>     >>>> 1222   }
> > >>>     >>>> 1223 ...
> > >>>     >>>>
> > >>>     >>>> Primitive type mirrors are handled separately. Could you
> > >>>     please verify
> > >>>     >>>> if this call path happens for primitive type mirror?
> > >>>     >>>>
> > >>>     >>>> To answer my question above, looks like you added the
> > >>>     following, which
> > >>>     >>>> is to be used for primitive type mirrors. That seems to be
> > >>>     the reason
> > >>>     >>>> why update_archived_mirror_native_pointers is trying to also
> > >>>     cover
> > >>>     >>>> primitive type. It's better to have a separate API for
> > >>>     primitive type
> > >>>     >>>> mirror, which is cleaner. And, we also can replace the above
> > >>>     check at
> > >>>     >>>> line 1220 with an assert for regular mirrors.
> > >>>     >>>>
> > >>>     >>>> +void ReadClosure::do_mirror_oop(oop *p) {
> > >>>     >>>> +  do_oop(p);
> > >>>     >>>> +  oop mirror = *p;
> > >>>     >>>> +  if (mirror != NULL) {
> > >>>     >>>> +
> > >>> java_lang_Class::update_archived_mirror_native_pointers(mirror);
> > >>>     >>>> +  }
> > >>>     >>>> +}
> > >>>     >>>> +
> > >>>     >>>>
> > >>>     >>>> How about renaming update_archived_mirror_native_pointers to
> > >>>     >>>> update_archived_mirror_klass_pointers.
> > >>>     >>>>
> > >>>     >>>> It would be good to pass the current klass as an argument.
> > >>> We can
> > >>>     >>>> verify the relocated pointer matches with the current klass
> > >>>     pointer.
> > >>>     >>>>
> > >>>     >>>> We should also check if relocation is necessary before
> > >>>     spending cycles
> > >>>     >>>> to obtain the klass pointer from the mirror.
> > >>>     >>>>
> > >>>     >>>> 1252  update_archived_mirror_native_pointers(m);
> > >>>     >>>> 1253
> > >>>     >>>> 1254   // mirror is archived, restore
> > >>>     >>>> 1255  assert(HeapShared::is_archived_object(m), "must be
> > >>> archived
> > >>>     >>>> mirror object");
> > >>>     >>>> 1256   Handle mirror(THREAD, m);
> > >>>     >>>>
> > >>>     >>>> Could we move the line at 1252 after the assert at line 1255?
> > >>>     >>>>
> > >>>     >>>> - src/hotspot/share/include/cds.h
> > >>>     >>>>
> > >>>     >>>>     47   int     _mapped_from_file;  // Is this region mapped
> > >>>     from a file?
> > >>>     >>>>     48                               // If false, this
> > >>> region was
> > >>>     >>>> initialized using os::read().
> > >>>     >>>>
> > >>>     >>>> Is the new field truly needed? It seems we could use
> > >>>     _mapped_base to
> > >>>     >>>> determine if a region is mapped or not?
> > >>>     >>>>
> > >>>     >>>> - src/hotspot/share/memory/dynamicArchive.cpp
> > >>>     >>>>
> > >>>     >>>> Could you please remove the debugging print code in
> > >>>     >>>> dynamic_dump_method_comparator? Or convert those to logging
> > >>>     output if
> > >>>     >>>> they are helpful.
> > >>>     >>>>
> > >>>     >>>> Will send out the rest of the review comments later.
> > >>>     >>>>
> > >>>     >>>> Best,
> > >>>     >>>>
> > >>>     >>>> Jiangli
> > >>>     >>>>
> > >>>     >>>>
> > >>>     >>>>
> > >>>     >>>>
> > >>>     >>>> On Thu, Oct 10, 2019 at 6:00 PM Ioi Lam <ioi.lam at oracle.com
> > >>>     <mailto:ioi.lam at oracle.com>> wrote:
> > >>>     >>>>> Bug:
> > >>>     >>>>> https://bugs.openjdk.java.net/browse/JDK-8231610
> > >>>     >>>>>
> > >>>     >>>>> Webrev:
> > >>>     >>>>>
> > >>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v01/
> > >>>
> > >>>     >>>>>
> > >>>     >>>>> Design:
> > >>>     >>>>>
> > >>> http://cr.openjdk.java.net/~iklam/jdk14/design/8231610-relocate-cds-archive.txt
> > >>>
> > >>>     >>>>>
> > >>>     >>>>>
> > >>>     >>>>> Overview:
> > >>>     >>>>>
> > >>>     >>>>> The CDS archive is mmaped to a fixed address range
> > >>> (starting at
> > >>>     >>>>> SharedBaseAddress, usually 0x800000000). Previously, if this
> > >>>     >>>>> requested address range is not available (usually due to
> > >>> Address
> > >>>     >>>>> Space Layout Randomization (ASLR) [2]), the JVM will give
> > >>> up and
> > >>>     >>>>> will load classes dynamically using class files.
> > >>>     >>>>>
> > >>>     >>>>> [a] This causes slow down in JVM start-up.
> > >>>     >>>>> [b] Handling of mapping failures causes unnecessary
> > >>>     complication in
> > >>>     >>>>>        the CDS tests.
> > >>>     >>>>>
> > >>>     >>>>> Here are some preliminary benchmarking results (using
> > >>>     default CDS archive,
> > >>>     >>>>> running helloworld):
> > >>>     >>>>>
> > >>>     >>>>> (a) 47.1ms (CDS enabled, mapped at requested addr)
> > >>>     >>>>> (b) 53.8ms (CDS enabled, mapped at alternate addr)
> > >>>     >>>>> (c) 86.2ms (CDS disabled)
> > >>>     >>>>>
> > >>>     >>>>> The small degradation in (b) is caused by the relocation of
> > >>>     >>>>> absolute pointers embedded in the CDS archive. However, it is
> > >>>     >>>>> still a big improvement over case (c)
> > >>>     >>>>>
> > >>>     >>>>> Please see the design doc (link above) for details.
> > >>>     >>>>>
> > >>>     >>>>> Thanks
> > >>>     >>>>> - Ioi
> > >>>     >>>>>
> > >>>
> > >>
> > >
> >

From leonid.mesnik at oracle.com  Fri Nov  8 04:32:31 2019
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Thu, 7 Nov 2019 20:32:31 -0800
Subject: RFR 8230055: ModuleStressGC.java times out on Win*
In-Reply-To: <e7c8b0c1-49fe-ac92-5e18-53f63df97e02@oracle.com>
References: <e7c8b0c1-49fe-ac92-5e18-53f63df97e02@oracle.com>
Message-ID: <2a082ca0-5cf6-12d9-3d62-30e99e593536@oracle.com>

Hi

I am not sure that it is a good idea. The bug says that the test times out
and takes more than 40 minutes.

However, it usually takes less than 2 minutes to complete the test (on any
platform). So I think that it is not just a long test but rather some
outlier which should be investigated.

Leonid

On 11/7/19 2:03 PM, Harold Seigel wrote:
> Hi,
>
> Please review this small change to help prevent test 
> runtime/modules/ModuleStress/ModuleStressGC.java from timing out. The 
> change reduces the number of loop iterations in the test by 40%.
>
> Open Webrev: 
> http://cr.openjdk.java.net/~hseigel/bug_8230055/webrev/index.html
>
> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8230055
>
> The change was tested by running Mach5 tier2 tests on Linux-x64, 
> Solaris, Windows, and Mac OS X.
>
> Thanks, Harold
>

From goetz.lindenmaier at sap.com  Fri Nov  8 07:04:04 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 8 Nov 2019 07:04:04 +0000
Subject: RFR (XS) 8233698: GCC 4.8.5 build failure after JDK-8233530
In-Reply-To: <F5466D90-798B-4D5F-932F-25A53D0E3B0A@oracle.com>
References: <14ff7e37-c7a1-9c74-584d-f00c7816696c@redhat.com>
 <AM6PR02MB53475BBD24B9F49F50C629D7EC780@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <F5466D90-798B-4D5F-932F-25A53D0E3B0A@oracle.com>
Message-ID: <AM6PR02MB5347F148C0DC5BAA68E35F39EC7B0@AM6PR02MB5347.eurprd02.prod.outlook.com>

Hi Kim,

thanks a lot!

Best regards,
  Goetz.

> -----Original Message-----
> From: Kim Barrett <kim.barrett at oracle.com>
> Sent: Donnerstag, 7. November 2019 20:47
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> Cc: Aleksey Shipilev <shade at redhat.com>; hotspot-runtime-
> dev at openjdk.java.net
> Subject: Re: RFR (XS) 8233698: GCC 4.8.5 build failure after JDK-8233530
> 
> > On Nov 7, 2019, at 5:24 AM, Lindenmaier, Goetz
> <goetz.lindenmaier at sap.com> wrote:
> > @Oracle: it would also be nice to find there that you switched to
> > gcc 8.
> 
> That got done today.  Thanks for the reminder.


From felix.yang at huawei.com  Fri Nov  8 08:30:00 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Fri, 8 Nov 2019 08:30:00 +0000
Subject: 8233839: aarch64: missing memory barrier in NewObjectArrayStub and
 NewTypeArrayStub  
Message-ID: <DA41BE1DDCA941489001C7FBD7A8820ED60280EF@dggeml527-mbx.china.huawei.com>

Hi,

I witnessed random failures of one jcstress test on my 128-core aarch64 server: "org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.StringTest"
Bug: https://bugs.openjdk.java.net/browse/JDK-8233839

I used the latest aarch64 jdk8u release build.  Please refer to the bug report for details and the analysis.
I checked the assembler code emitted by LIR_Assembler::emit_alloc_array:
For the fast path, the StoreStore memory barrier is there.  But it's not the case for the slow path.

Patch adding the missing barrier for jdk14:

diff -r ad157fab6bf5 src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp
--- a/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp   Thu Nov 07 16:26:57 2019 -0800
+++ b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp   Fri Nov 08 16:10:08 2019 +0800
@@ -840,6 +840,7 @@
           __ sub(arr_size, arr_size, t1);  // body length
           __ add(t1, t1, obj);       // body start
           __ initialize_body(t1, arr_size, 0, t2);
+          __ membar(Assembler::StoreStore);
           __ verify_oop(obj);

           __ ret(lr);

The JDK builds OK and passes the tier1 tests.
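
To illustrate why the barrier matters, here is a minimal sketch in portable C++ rather than HotSpot assembler code. It assumes a hypothetical FakeArray type and a hypothetical global slot g_published, neither of which is in the patch. A release fence is used as a stand-in for the StoreStore barrier: it is at least as strong, ordering the zeroing stores before the store that makes the array visible to other threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a freshly allocated array object.
struct FakeArray {
    std::size_t length;
    int body[8];
};

// Hypothetical slot through which another thread could observe the array.
std::atomic<FakeArray*> g_published{nullptr};

// Zero the body, then order those stores before the publishing store.
void allocate_and_publish(FakeArray* arr, std::size_t length) {
    assert(arr != nullptr && length <= 8);
    arr->length = length;
    std::memset(arr->body, 0, sizeof(arr->body));  // plays the role of initialize_body()
    // Plays the role of membar(StoreStore): the stores above may not be
    // reordered after the publishing store below.
    std::atomic_thread_fence(std::memory_order_release);
    g_published.store(arr, std::memory_order_relaxed);
}
```

Without the fence, a weakly ordered CPU could make the pointer visible before the zeroed elements, which would match the non-default values that the jcstress test observed.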

Thanks,
Felix

From adinn at redhat.com  Fri Nov  8 09:04:08 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 8 Nov 2019 09:04:08 +0000
Subject: 8233839: aarch64: missing memory barrier in NewObjectArrayStub
 and NewTypeArrayStub
In-Reply-To: <DA41BE1DDCA941489001C7FBD7A8820ED60280EF@dggeml527-mbx.china.huawei.com>
References: <DA41BE1DDCA941489001C7FBD7A8820ED60280EF@dggeml527-mbx.china.huawei.com>
Message-ID: <382f7448-c392-5d98-ecf5-ac22e86f4888@redhat.com>

On 08/11/2019 08:30, Yangfei (Felix) wrote:
> I witnessed random fail of one jcstress test on my 128-core aarch64 server: "org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.StringTest"
> Bug: https://bugs.openjdk.java.net/browse/JDK-8233839
> 
> I used the latest aarch64 jdk8u release build.  Please refer to the bugzilla for details and the analysis.
>   I checked the assembler code emitted by LIR_Assembler::emit_alloc_array:
> For the fast path, the StoreStore memory barrier is there.  But it's not the case for the slow path.
> 
>   Patch adding the missing barrier for 14:
> 
> diff -r ad157fab6bf5 src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp
> --- a/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp   Thu Nov 07 16:26:57 2019 -0800
> +++ b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp   Fri Nov 08 16:10:08 2019 +0800
> @@ -840,6 +840,7 @@
>            __ sub(arr_size, arr_size, t1);  // body length
>            __ add(t1, t1, obj);       // body start
>            __ initialize_body(t1, arr_size, 0, t2);
> +          __ membar(Assembler::StoreStore);
>            __ verify_oop(obj);
> 
>            __ ret(lr);
> 
>   JDK builds OK and passed tier1 test.
Very nice detective work finding that one!

The jdk14 patch looks good. Also the same patch for jdk11 and the
variant for jdk8 are good.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill


From claes.redestad at oracle.com  Fri Nov  8 11:57:56 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Fri, 8 Nov 2019 12:57:56 +0100
Subject: RFR: 8233497: Optimize default method generation by data structure
 reuse
Message-ID: <5991863e-28cf-0daa-3549-905609ce94a9@oracle.com>

Hi,

When loading classes with complex hierarchies and many default methods,
we can end up spending significant time in
DefaultMethods::generate_default_methods.

This optimization reduces work done and memory requirements by reusing
allocated data structures, for example by maintaining free lists of
allocated Node objects.
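
As an illustration only, a free list of reusable nodes could look like the sketch below. The Node and NodePool types here are hypothetical and are not the classes in the webrev; the point is just that a released node is handed out again instead of allocating a new one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical node type; the real Node in defaultMethods.cpp differs.
struct Node {
    int payload = 0;
};

// Hypothetical pool that owns all nodes and recycles released ones.
class NodePool {
    std::vector<std::unique_ptr<Node>> _owned;  // every node ever allocated
    std::vector<Node*> _free;                   // nodes available for reuse
public:
    Node* acquire() {
        if (!_free.empty()) {
            Node* n = _free.back();
            _free.pop_back();
            *n = Node{};  // reset state before reuse
            return n;
        }
        _owned.push_back(std::unique_ptr<Node>(new Node()));
        return _owned.back().get();
    }

    void release(Node* n) {
        assert(n != nullptr);
        _free.push_back(n);
    }

    std::size_t allocated() const { return _owned.size(); }
};
```

With this shape, the number of allocations is bounded by the peak number of live nodes rather than by the total number of nodes requested.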

Bug:    https://bugs.openjdk.java.net/browse/JDK-8233497
Webrev: http://cr.openjdk.java.net/~redestad/8233497/open.00/

Testing: Tier1-3, will make sure tier4-7 pass before push

Performance notes: On one of our more complex startup tests we see a 3%
improvement in total execution time, and much less on simpler
applications.

I've not done a formal complexity analysis, but I think the memory
complexity is now down from O(N*M) to O(N+M) where N is the number of
classes and interfaces in the hierarchy and M the number of methods of
interest in that hierarchy. Algorithmic complexity is probably O(N*M)
still, but with much better constants.

Special thanks to Lois for patience and persistence over several rounds
of pre-review!

Thanks!

/Claes

From harold.seigel at oracle.com  Fri Nov  8 12:59:02 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Fri, 8 Nov 2019 07:59:02 -0500
Subject: RFR 8230055: ModuleStressGC.java times out on Win*
In-Reply-To: <82556da6-5e85-eb8e-d6b2-fa8cec141c73@oracle.com>
References: <e7c8b0c1-49fe-ac92-5e18-53f63df97e02@oracle.com>
 <82556da6-5e85-eb8e-d6b2-fa8cec141c73@oracle.com>
Message-ID: <32a3f006-7c47-d63c-4e9b-71c73130b3cb@oracle.com>

Thanks Misha!

Harold

On 11/7/2019 7:03 PM, mikhailo.seledtsov at oracle.com wrote:
> Looks good to me,
>
> Misha
>
> On 11/7/19 2:03 PM, Harold Seigel wrote:
>> Hi,
>>
>> Please review this small change to help prevent test 
>> runtime/modules/ModuleStress/ModuleStressGC.java from timing out. The 
>> change reduces the number of loop iterations in the test by 40%.
>>
>> Open Webrev: 
>> http://cr.openjdk.java.net/~hseigel/bug_8230055/webrev/index.html
>>
>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8230055
>>
>> The change was tested by running Mach5 tier2 tests on Linux-x64, 
>> Solaris, Windows, and Mac OS X.
>>
>> Thanks, Harold
>>

From harold.seigel at oracle.com  Fri Nov  8 12:59:44 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Fri, 8 Nov 2019 07:59:44 -0500
Subject: RFR 8230055: ModuleStressGC.java times out on Win*
In-Reply-To: <2a082ca0-5cf6-12d9-3d62-30e99e593536@oracle.com>
References: <e7c8b0c1-49fe-ac92-5e18-53f63df97e02@oracle.com>
 <2a082ca0-5cf6-12d9-3d62-30e99e593536@oracle.com>
Message-ID: <778431ec-85f5-92c4-7aaf-0727a3ff4768@oracle.com>

Thanks Leonid.

I'll withdraw the webrev.

Harold

On 11/7/2019 11:32 PM, Leonid Mesnik wrote:
> Hi
>
> I am not sure that it is a good idea. The bug says that test times out 
> and take more 40 minutes.
>
> However usually it takes less than 2 minutes to complete test (on any 
> platform). So I think that it is not just a long test but rather some 
> outlier which should be investigated.
>
> Leonid
>
> On 11/7/19 2:03 PM, Harold Seigel wrote:
>> Hi,
>>
>> Please review this small change to help prevent test 
>> runtime/modules/ModuleStress/ModuleStressGC.java from timing out. The 
>> change reduces the number of loop iterations in the test by 40%.
>>
>> Open Webrev: 
>> http://cr.openjdk.java.net/~hseigel/bug_8230055/webrev/index.html
>>
>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8230055
>>
>> The change was tested by running Mach5 tier2 tests on Linux-x64, 
>> Solaris, Windows, and Mac OS X.
>>
>> Thanks, Harold
>>

From boris.ulasevich at bell-sw.com  Fri Nov  8 13:28:47 2019
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Fri, 8 Nov 2019 16:28:47 +0300
Subject: RFR(S): 8233113: ARM32: assert on UnsafeJlong mutex rank check
Message-ID: <5124def3-3bf7-8425-557d-c6cba6192927@bell-sw.com>

Hi,

The recent JDK-8184732 change added an assertion that fires on the
UnsafeJlong mutex rank check on platforms without 64-bit atomic
compare-and-exchange support. In preliminary review (thanks to Coleen
and David!) it was suggested to remove the assertion and the
corresponding test code.

http://bugs.openjdk.java.net/browse/JDK-8233113
http://cr.openjdk.java.net/~bulasevich/8233113/webrev.01

Thanks,
Boris

From robbin.ehn at oracle.com  Fri Nov  8 13:35:31 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Fri, 8 Nov 2019 14:35:31 +0100
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
Message-ID: <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>

Hi Dan,

Thanks for looking into this, some comments on v8:

##################
src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
src/hotspot/cpu/x86/globalDefinitions_x86.hpp
src/hotspot/share/logging/logTag.hpp
src/hotspot/share/oops/markWord.hpp
src/hotspot/share/runtime/basicLock.cpp
src/hotspot/share/runtime/safepoint.cpp
src/hotspot/share/runtime/serviceThread.cpp
src/hotspot/share/runtime/sharedRuntime.cpp
src/hotspot/share/runtime/synchronizer.hpp
src/hotspot/share/runtime/vmOperations.cpp
src/hotspot/share/runtime/vmOperations.hpp
src/hotspot/share/runtime/vmStructs.cpp
src/hotspot/share/runtime/vmThread.cpp
test/hotspot/gtest/oops/test_markWord.cpp

No comments.

##################
I don't see the benefit of having the -HandshakeAfterDeflateIdleMonitors code paths.
Removing that option would mean these files can be reverted:
src/hotspot/cpu/aarch64/globals_aarch64.hpp
src/hotspot/cpu/arm/globals_arm.hpp
src/hotspot/cpu/ppc/globals_ppc.hpp
src/hotspot/cpu/s390/globals_s390.hpp
src/hotspot/cpu/sparc/globals_sparc.hpp
src/hotspot/cpu/x86/globals_x86.hpp
src/hotspot/cpu/x86/macroAssembler_x86.cpp
src/hotspot/cpu/x86/macroAssembler_x86.hpp
src/hotspot/cpu/zero/globals_zero.hpp

And one less option here:
src/hotspot/share/runtime/globals.hpp

##################
src/hotspot/share/prims/jvm.cpp

Unclear if this is a good idea.

##################
src/hotspot/share/prims/whitebox.cpp

This would assume the test expects the right thing, but that is not obvious.

##################
src/hotspot/share/prims/jvmtiEnvBase.cpp

The current pending and waiting monitor is only changed by the JavaThread itself.
It only sets it after _contentions is increased.
It clears it before _contentions is decreased.
We depend on being at a safepoint or on the thread being suspended, so the
monitor can't be deflated since _contentions is > 0.
Plus the thread has already increased the ref count and can't decrease it
(since it is at a safepoint or suspended).

##################
src/hotspot/share/runtime/objectMonitor.cpp

###1
You have several of these (and in other files):
242   jint l_ref_count = ref_count();
243   ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d, 
ref_count=%d", l_ref_count, ref_count());
Please use Atomic::load() in ref_count().
This code depends on ref_count being volatile; otherwise the compiler may
only do one load.
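
A minimal sketch of the idea, using std::atomic in place of HotSpot's Atomic::load (an assumption made so the example is self-contained; RefCountedMonitor is a hypothetical type). The accessor performs an explicit atomic load, so callers do not rely on the field being declared volatile.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the ref-count part of ObjectMonitor.
struct RefCountedMonitor {
    std::atomic<int> _ref_count{0};

    // Explicit atomic load on every call.
    int ref_count() const {
        return _ref_count.load(std::memory_order_relaxed);
    }

    void inc_ref_count() { _ref_count.fetch_add(1); }

    void dec_ref_count() {
        int prev = _ref_count.fetch_sub(1);
        assert(prev > 0);  // must have been positive before the decrement
        (void)prev;
    }
};
```

A caller can then snapshot the value once into a local (like l_ref_count above) and still get a fresh read from a second ref_count() call.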

###2
307   // Prevent deflation. See ObjectSynchronizer::deflate_monitor(),
...
311   Atomic::add(1, &_contentions);
In ObjectSynchronizer::deflate_monitor, if you checked the ref count instead of
_contentions, we could remove _contentions.
Since all waiters also have a ref count, it looks like we don't need _waiters either.
In ObjectSynchronizer::deflate_monitor:
if (mid->_contentions != 0 || mid->_waiters != 0) {
Why not just do:
if (mid->ref_count()) {
?

##################
src/hotspot/share/runtime/objectMonitor.hpp

###1
  252   intptr_t is_busy() const {
  253     // TODO-FIXME: assert _owner == null implies _recursions = 0
  254     // We do not include _ref_count in the is_busy() check because
  255     // _ref_count is for indicating that the ObjectMonitor* is in
  256     // use which is orthogonal to whether the ObjectMonitor itself
  257     // is in use for a locking operation.

But in the non-debug code we always check:
+  if (mid->is_busy() || mid->ref_count() != 0) {

So it seems like you should have a method that includes the ref count.
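
A rough sketch of such a method, on a simplified and hypothetical MonitorStateSketch type (the real is_busy() checks more fields and returns intptr_t; the name is_busy_or_referenced is made up for illustration):

```cpp
#include <atomic>
#include <cassert>

// Simplified, hypothetical stand-in for ObjectMonitor state.
struct MonitorStateSketch {
    void* _owner = nullptr;
    int _contentions = 0;
    int _waiters = 0;
    std::atomic<int> _ref_count{0};

    // Locking-related state only, as in the existing is_busy().
    bool is_busy() const {
        return _owner != nullptr || _contentions != 0 || _waiters != 0;
    }

    int ref_count() const {
        return _ref_count.load(std::memory_order_relaxed);
    }

    // Hypothetical combined predicate for the deflation check.
    bool is_busy_or_referenced() const {
        return is_busy() || ref_count() != 0;
    }
};
```

Callers that currently write `mid->is_busy() || mid->ref_count() != 0` would then use the single predicate.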

##################
src/hotspot/share/runtime/objectMonitor.inline.hpp

Use Atomic::load for ref count.

##################
src/hotspot/share/runtime/synchronizer.cpp

###1
  139 static volatile int g_om_free_count = 0;    // # on g_free_list
  140 static volatile int g_om_in_use_count = 0;  // # on g_om_in_use_list
  141 static volatile int g_om_population = 0;    // # Extant -- in circulation
  142 static volatile int g_om_wait_count = 0;    // # on g_wait_list
No padding here, aren't they more contended than the fields in the OM?

###2
151 static bool is_next_marked(ObjectMonitor* om) {

Is only used in ObjectSynchronizer::om_flush.
Here you fetch an OM and read the next field; this does not need LA
(load-acquire) semantics on supported platforms.
This would only need Atomic::load.

###3
191 static void set_next(ObjectMonitor* om, ObjectMonitor* value) {

You do not need SR (store-release) semantics anywhere. The only places where it would make a difference are:
  345       OrderAccess::storestore();
  346       set_next(cur, next);  // Unmark the previous list head.
and
1714     OrderAccess::storestore();
1715     set_next(in_use_list, next);

You have a storestore already!

This code reads as:
OrderAccess::storestore();
OrderAccess::loadstore();
OrderAccess::storestore();
om->_next_om = value

So it should be an Atomic::store.
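
A sketch of the difference, using std::atomic as a stand-in for OrderAccess and Atomic (an assumption; OmSketch and the function names are hypothetical). The release store bundles ordering with the store, while the relaxed store leaves ordering to a single explicit fence in the caller.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an ObjectMonitor list node.
struct OmSketch {
    std::atomic<OmSketch*> _next_om{nullptr};
};

// Like OrderAccess::release_store: ordering is part of the store.
void set_next_release(OmSketch* om, OmSketch* value) {
    om->_next_om.store(value, std::memory_order_release);
}

// Like Atomic::store: an atomic store with no ordering attached.
void set_next_relaxed(OmSketch* om, OmSketch* value) {
    om->_next_om.store(value, std::memory_order_relaxed);
}

// Caller pattern from the review: one explicit barrier, then a plain
// atomic store, instead of a barrier followed by a release store.
void publish_next(OmSketch* cur, OmSketch* next) {
    std::atomic_thread_fence(std::memory_order_release);
    set_next_relaxed(cur, next);
}
```

Calling set_next_release() after the explicit fence would issue the ordering twice, which is the redundancy pointed out above.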

###4
198 static bool mark_list_head(ObjectMonitor* volatile * list_p

Since the mark is an embedded spinlock, I think the terminology should be
changed (that the spinlock is inside the next pointer should be abstracted away).
E.g. mark_next_loop would just be lock.
The load of the list heads should use Atomic::load.
It also seems a bit weird to return next from the locking method.
The output parameter can just be returned, with NULL returned if the list head is NULL.
E.g.

  198 static ObjectMonitor* get_list_head_locked(ObjectMonitor* volatile * list_p) {
  200   while (true) {
  201     ObjectMonitor* mid = Atomic::load(list_p);
  202     if (mid == NULL) {
  203       return NULL;  // The list is empty.
  204     }
  205     if (try_lock(mid)) {
  206       if (Atomic::load(list_p) != mid) {
  207         // The list head changed so we have to retry.
  208         unlock(mid);
  210       } else {
              return mid;
            }
  214     }
          // Yield ?
  215   }
  216 }

With collateral changes.
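
For reference, a self-contained version of the same idea in standard C++. This is only a sketch under assumptions: OmNode, om_try_lock and om_unlock are hypothetical names, std::atomic replaces HotSpot's Atomic, and the lock is kept in bit 0 of the next pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical list node: the next pointer doubles as a spinlock (bit 0).
struct OmNode {
    std::atomic<std::uintptr_t> _next_om{0};
};

// Try to set the lock bit; fails if the node is already locked.
inline bool om_try_lock(OmNode* om) {
    std::uintptr_t next = om->_next_om.load(std::memory_order_relaxed);
    if ((next & 1) != 0) {
        return false;
    }
    return om->_next_om.compare_exchange_strong(
        next, next | 1, std::memory_order_acq_rel, std::memory_order_relaxed);
}

// Clear the lock bit; the caller must hold the lock.
inline void om_unlock(OmNode* om) {
    std::uintptr_t next = om->_next_om.load(std::memory_order_relaxed);
    assert((next & 1) != 0);
    om->_next_om.store(next & ~static_cast<std::uintptr_t>(1),
                       std::memory_order_release);
}

// Return the list head with its lock held, or nullptr if the list is empty.
OmNode* get_list_head_locked(std::atomic<OmNode*>* list_p) {
    while (true) {
        OmNode* mid = list_p->load(std::memory_order_acquire);
        if (mid == nullptr) {
            return nullptr;  // The list is empty.
        }
        if (om_try_lock(mid)) {
            if (list_p->load(std::memory_order_acquire) == mid) {
                return mid;
            }
            om_unlock(mid);  // The list head changed, so retry.
        }
        std::this_thread::yield();
    }
}
```

The caller gets either nullptr or a locked head, so no separate output parameter is needed.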

###5
220 static ObjectMonitor* unmarked_next(ObjectMonitor* om)
Atomic::store is what is needed.

###6
333 static void prepend_to_common(

  345       OrderAccess::storestore();
  346       set_next(cur, next);  // Unmark the previous list head.
Double storestore. (fixed by changing set_next to Atomic::store)

###7
  375 static ObjectMonitor* take_from_start_of_common(ObjectMonitor* volatile * 
list_p,

Triple storestore here.

  386   Atomic::dec(count_p);
  387   // mark_list_head() used cmpxchg() above, switching list head can be lazier:
  388   OrderAccess::storestore();
  389   // Unmark take, but leave the next value for any lagging list
  390   // walkers. It will get cleaned up when take is prepended to
  391   // the in-use list:
  392   set_next(take, next);
  393   return take;

Reads:
count_p--
OrderAccess::loadstore();
OrderAccess::storestore();
OrderAccess::storestore();
OrderAccess::loadstore();
OrderAccess::storestore();
take->_next_om = next;

Fixed by changing set_next to Atomic::store and removing the 
OrderAccess::storestore();

###8
ObjectSynchronizer::om_release(

1591       if (m == mid) {
1592         // We found 'm' on the per-thread in-use list so try to extract it.
1593         if (cur_mid_in_use == NULL) {
1594           // mid is the list head and it is marked. Switch the list head
1595           // to next which unmarks the list head, but leaves mid marked:
1596           self->om_in_use_list = next;
1597           // mark_list_head() used cmpxchg() above, switching list head can 
be lazier:
1598           OrderAccess::storestore();
1599         } else {
1600           // mid and cur_mid_in_use are marked. Switch cur_mid_in_use's
1601           // next field to next which unmarks cur_mid_in_use, but leaves
1602           // mid marked:
1603           OrderAccess::release_store(&cur_mid_in_use->_next_om, next);
1604         }
1605         extracted = true;
1606         Atomic::dec(&self->om_in_use_count);
1607         // Unmark mid, but leave the next value for any lagging list
1608         // walkers. It will get cleaned up when mid is prepended to
1609         // the thread's free list:
1610         set_next(mid, next);
1611         break;
1612       }

This does not look correct. Before taking this branch we have done a cmpxchg in 
mark_list_head or mark_next_loop.
This is how it reads:
OrderAccess::storestore(); // from previous cmpxchg
OrderAccess::loadstore(); // from previous cmpxchg
1591       if (m == mid) {
1593         if (cur_mid_in_use == NULL) {
1596           self->om_in_use_list = next;
1598           OrderAccess::storestore();
1599         } else {
                OrderAccess::storestore();
                OrderAccess::loadstore();
1603           cur_mid_in_use->_next_om = next;
1604         }
1605         extracted = true;
              OrderAccess::storestore();
              OrderAccess::fence(); // storestore|storeload|loadstore|loadload
              self->om_in_use_count--; // Atomic::dec
              OrderAccess::storestore();
              OrderAccess::loadstore();
              OrderAccess::storestore();
              OrderAccess::loadstore();
              mid->_next_om = next; // Atomic::store
1611         break;
1612       }

extracted is a local variable, so you do not need any OrderAccess before it is set.
Fixed by changing set_next to Atomic::store, removing the 
OrderAccess::storestore() and changing OrderAccess::release_store to 
Atomic::store();

###9
1653 void ObjectSynchronizer::om_flush(Thread* self) {

1714     OrderAccess::storestore();
1715     set_next(in_use_list, next);
Fixed by changing set_next to Atomic::store.

###10
1737     self->om_free_list = NULL;
1738     OrderAccess::storestore();  // Lazier memory is okay for list walkers.

prepend_list_to_g_free_list/prepend_list_to_g_om_in_use_list do a cmpxchg as
the first thing, so there is no need for this storestore.

###11
1797 void ObjectSynchronizer::inflate(ObjectMonitorHandle* omh_p, Thread* self,

1938       // Once ObjectMonitor is configured and the object is associated
1939       // with the ObjectMonitor, it is safe to allow async deflation:
1940       assert(m->is_new(), "freshly allocated monitor must be new");
1941       m->set_allocation_state(ObjectMonitor::Old);

So we use ref count, contention, waiter, owner and allocation state to keep the OM
alive in different scenarios.
There is no way for me to keep track of that. I don't see why you would need
more than owner and ref count.
If you allocate the OM with ref count 1, you can remove _allocation_state and
just decrease the ref count here instead.

###12
2079 bool ObjectSynchronizer::deflate_monitor

2112     if (AsyncDeflateIdleMonitors) {
2113       // clear() expects the owner field to be NULL and we won't race
2114       // with the simple C2 ObjectMonitor

The macro assembler code is not just executed by C2, so this comment is a bit
misleading (there are some more like it).

###13
2306 int ObjectSynchronizer::deflate_monitor_list(

Same issue as ObjectSynchronizer::om_release.
Fixed by changing set_next to Atomic::store, removing the 
OrderAccess::storestore() and changing OrderAccess::release_store to 
Atomic::store();

###14
2474       if (SafepointSynchronize::is_synchronizing() &&

This is the wrong method to call, it should be
SafepointMechanism::should_block(Thread* thread);

###15
2578 void ObjectSynchronizer::deflate_idle_monitors_using_JT() {

2616     g_wait_list = NULL;
2617     OrderAccess::storestore();  // Lazier memory sync is okay for list walkers.

I don't see that g_wait_list is ever read simultaneously.
Is it accessed either by the serviceThread outside a safepoint or by the VMThread
inside a safepoint?

It looks like g_wait_list can just be a local in:
void ObjectSynchronizer::deflate_idle_monitors_using_JT()

(disregarding the debug code that might read it in a safepoint)

###16
2722         assert(SafepointSynchronize::is_synchronizing(), "sanity check");

This is the wrong method to call, it should be
SafepointMechanism::should_block(Thread* thread);

##################
src/hotspot/share/runtime/vframe.cpp

We are at a safepoint, or on the current thread, or in a handshake, so the
current pending and waiting monitor is already stable.

##################
src/hotspot/share/services/threadService.cpp

These changes are only needed for the -HandshakeAfterDeflateIdleMonitors path.

##################
test/jdk/java/rmi/server/UnicastRemoteObject/unexportObject/UnexportLeak.java

Note: if the OM had a weak reference to the object instead, this would not be needed.

Thanks, Robbin


On 11/4/19 10:03 PM, Daniel D. Daugherty wrote:
> Greetings,
> 
> I have made changes to the Async Monitor Deflation code in response to
> the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., Robbin
> and Erik O. for their comments!
> 
> JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async Monitor
> Deflation project needs to push before Nov. 12, 2019 in order to allow
> for sufficient bake time for such a big change. Nov. 12 is _next_ Tuesday
> so we have 8 days from today to finish this code review cycle and push
> this code for JDK14.
> 
> Carsten and Roman! Time for you guys to chime in again on the code reviews.
> 
> I have attached the change list from CR7 to CR8 instead of putting it in
> the body of this email. I've also added a link to the CR7-to-CR8-changes
> file to the webrevs so it should be easy to find.
> 
> Main bug URL:
> 
>      JDK-8153224 Monitor deflation prolong safepoints
>      https://bugs.openjdk.java.net/browse/JDK-8153224
> 
> The project is currently baselined on jdk-14+21.
> 
> Here's the full webrev URL for those folks that want to see all of the
> current Async Monitor Deflation code in one go (v2.08 full):
> 
> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full
> 
> Some folks might want to see just what has changed since the last review
> cycle so here's a webrev for that (v2.08 inc):
> 
> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/
> 
> The OpenJDK wiki did not need any changes for this round:
> 
> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
> 
> The jdk-14+21 based v2.08 version of the patch has been thru Mach5 tier[1-8]
> testing on Oracle's usual set of platforms. It has also been through my usual
> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
> my lab. Some testing is still running, but so far there are no new regressions.
> 
> I have not yet done a SPECjbb2015 round on the CR8/v2.08/11-for-jdk14 bits.
> 
> Thanks, in advance, for any questions, comments or suggestions.
> 
> Dan
> 
> 
> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> The Async Monitor Deflation project is reaching the end game. I have no
>> changes planned for the project at this time so all that is left is code
>> review and any changes that results from those reviews.
>>
>> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>>
>> I have attached the list of fixes from CR6 to CR7 instead of putting it
>> in the main body of this email.
>>
>> Main bug URL:
>>
>>     JDK-8153224 Monitor deflation prolong safepoints
>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>
>> The project is currently baselined on jdk-14+19.
>>
>> Here's the full webrev URL for those folks that want to see all of the
>> current Async Monitor Deflation code in one go (v2.07 full):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full
>>
>> Some folks might want to see just what has changed since the last review
>> cycle so here's a webrev for that (v2.07 inc):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/
>>
>> The OpenJDK wiki has been updated to match the CR7/v2.07/10-for-jdk14 changes:
>>
>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>
>> The jdk-14+18 based v2.07 version of the patch has been thru Mach5 tier[1-8]
>> testing on Oracle's usual set of platforms. It has also been through my usual
>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
>> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
>> my lab.
>>
>> The jdk-14+19 based v2.07 version of the patch has been thru Mach5 tier[1-3]
>> test on Oracle's usual set of platforms. Mach5 tier[4-8] are in process.
>>
>> I did another round of SPECjbb2015 testing in Oracle's Aurora Performance lab
>> using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>
>>     - "base" is jdk-14+18
>>     - "v2.07" is the latest version and includes C2 inc_om_ref_count() support
>>       on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors option
>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors specified
>>
>>          hbIR           hbIR
>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>     ---------------  ---------  --------  -------------  -------
>>            34282.00   30635.90  28831.30       20969.20  3841.30 base
>>            34282.00   30973.00  29345.80       21025.20  3964.10 v2.07
>>            34282.00   31105.60  29174.30       21074.00  3931.30 v2.07_handshake
>>            34282.00   30789.70  27151.60       19839.10  3850.20 v2.07_off
>>
>>     - The Aurora Perf comparison tool reports:
>>
>>         Comparison              max-jOPS             critical-jOPS
>>         ----------------------  -------------------- --------------------
>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>>
>>         (s) - significant  (ns) - not-significant
>>
>>     - For historical comparison, the Aurora Perf comparison tool
>>         reported for v2.06 with a baseline of jdk-13+31:
>>
>>         Comparison              max-jOPS             critical-jOPS
>>         ----------------------  -------------------- --------------------
>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>>
>>         (s) - significant  (ns) - not-significant
>>
>> Thanks, in advance, for any questions, comments or suggestions.
>>
>> Dan
>>
>>
>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> The Async Monitor Deflation project has rebased to JDK14 so it's time
>>> for our first code review in that new context!!
>>>
>>> I've been focused on changing the monitor list management code to be
>>> lock-free in order to make SPECjbb2015 happier. Of course with a change
>>> like that, it takes a while to chase down all the new and wonderful
>>> races. At this point, I have the code back to the same stability that
>>> I had with CR5/v2.05/8-for-jdk13.
>>>
>>> To lay the ground work for this round of review, I pushed the following
>>> two fixes to jdk/jdk earlier today:
>>>
>>>     JDK-8230184 rename, whitespace, indent and comments changes in preparation
>>>                 for lock free Monitor lists
>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>
>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails after 8230184
>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>
>>> I have attached the list of fixes from CR5 to CR6 instead of putting
>>> in the main body of this email.
>>>
>>> Main bug URL:
>>>
>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>
>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>> JDK-8230184 and JDK-8230317.
>>>
>>> Here's the full webrev URL for those folks that want to see all of the
>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/
>>>
>>>
>>> The primary focus of this review cycle is on the lock-free Monitor List
>>> management changes so here's a webrev for just that patch (v2.06c):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/
>>>
>>> The secondary focus of this review cycle is on the bug fixes that have
>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for just that
>>> patch (v2.06b):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/
>>>
>>> The third and final bucket for this review cycle is the rename, whitespace,
>>> indent and comments changes made in preparation for lock free Monitor list
>>> management. Almost all of that was extracted into JDK-8230184 for the
>>> baseline so this bucket now has just a few comment changes relative to
>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/
>>>
>>>
>>> Some folks might want to see just what has changed since the last review
>>> cycle so here's a webrev for that (v2.06 inc):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/
>>>
>>>
>>> Last, but not least, some folks might want to see the code before the
>>> addition of lock-free Monitor List management so here's a webrev for
>>> that (v2.00 -> v2.05):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/
>>>
>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>
>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>
>>> but that should only be changes to describe per-thread list async monitor
>>> deflation being done by the ServiceThread.
>>>
>>> (I did update the OpenJDK wiki for the CR5 changes back on 2019.08.14)
>>>
>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>> Oracle's usual set of platforms. It has also been through my usual set
>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>
>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>> using their tuned SPECjbb2015 Linux-X64 G1 configs. This was using
>>> this patch baselined on jdk-13+31 (for stability):
>>>
>>>           hbIR           hbIR
>>>      (max attempted)  (settled)  max-jOPS  critical-jOPS runtime
>>>      ---------------  ---------  --------  ------------- -------
>>>             34282.00   28837.20  27905.20       19817.40 3658.10 base
>>>             34965.70   29798.80  27814.90       19959.00 3514.60 v2.06d
>>>             34282.00   29100.70  28042.50       19577.00 3701.90 v2.06d_off
>>>             34282.00   29218.50  27562.80       19397.30 3657.60 v2.06d_ocache
>>>             34965.70   29838.30  26512.40       19170.60 3569.90 v2.05
>>>             34282.00   28926.10  27734.00       19835.10 3588.40 v2.05_off
>>>
>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors specified and
>>> the "ocache" config is with 128 byte cache line sizes instead of 64 byte
>>> cache line sizes. "v2.06d" is the last set of changes that I made before
>>> those changes were distributed into the "v2.06a", "v2.06b" and "v2.06c"
>>> buckets for this review cycle.
>>>
>>>
>>> Thanks, in advance, for any questions, comments or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I've been focused on chasing down and fixing the test failures
>>>> that only pop up rarely. So this round is primarily fixes for races
>>>> with a few additional fixes that came from Karen's review of CR4.
>>>> Thanks Karen!
>>>>
>>>> I have attached the list of fixes from CR4 to CR5 instead of putting
>>>> in the main body of this email.
>>>>
>>>> Main bug URL:
>>>>
>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>
>>>> The project is currently baselined on jdk-13+29. This will likely be
>>>> the last JDK13 baseline for this project and I'll roll to the JDK14
>>>> (jdk/jdk) repo soon...
>>>>
>>>> Here's the full webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/
>>>>
>>>> Here's the incremental webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>
>>>> I have not yet checked the OpenJDK wiki to see if it needs any updates
>>>> to match the CR5 changes:
>>>>
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>
>>>> (I did update the OpenJDK wiki for the CR4 changes back on 2019.06.26)
>>>>
>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become available.
>>>> Since I haven't made any performance changes in this round, I'll only
>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>
>>>> Next up:
>>>>
>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>   Linux-X64 and we've determined that some of that comes from
>>>>   contention on the gListLock. So I'm going to investigate removing
>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>> - Of course, going lock free often causes new races and new failures
>>>>   so that's a good reason to make those changes isolated in their
>>>>   own round (and not hold up CR5/v2.05/8-for-jdk13 anymore).
>>>> - I finally have a potential fix for the Win* failure with
>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>   but I haven't run it through Mach5 yet so it'll be in the next round.
>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>>   monitor related failures there. I suspect that I need to go take a
>>>>   look at the C2 RTM macro assembler code and look for things that might
>>>>   conflict with Async Monitor Deflation. If you're interested in that kind
>>>>   of issue, then see the macroAssembler_x86.cpp sanity check that I
>>>>   added in this round!
>>>>
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I have a fix for an issue that came up during performance testing.
>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>> experiments.
>>>>>
>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>> verbose due to the complexity of the issue, but the changes
>>>>> themselves are not that big.
>>>>>
>>>>> Functional:
>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>         safepoints that simply request async deflation which
>>>>>         keeps the async JavaThreads from making progress on
>>>>>         their async deflation work.
>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>     - is_async_deflation_needed() returns true when
>>>>>       is_async_cleanup_requested() is true or when
>>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>>       AsyncDeflationInterval).
>>>>>     - if AsyncDeflateIdleMonitors Service_lock->wait() now waits for
>>>>>       at most GuaranteedSafepointInterval millis:
>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>         (default is 1000 millis/1 second)
>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>>         This is to prevent async deflation from swamping the
>>>>>         ServiceThread.
>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>>     of their in-use lists.
>>>>>     - The ServiceThread will check for async deflation work every
>>>>>       GuaranteedSafepointInterval.
>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>       async deflation work via is_async_deflation_requested.
>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>     flag to reduce the in-use monitor population that is reported by
>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
>>>>>
>>>>> Test update:
>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>>     AsyncDeflateIdleMonitors.
>>>>>
>>>>> Collateral:
>>>>>   - Add/clarify/update some logging messages.
>>>>>
>>>>> Cleanup:
>>>>>   - Updated comments based on Karen's code review.
>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>     'async cleanup' -> 'async deflation'.
>>>>>     - comment and function name changes
>>>>>   - Clarify MonitorUsedDeflationThreshold description.
>>>>>
>>>>>
>>>>> Main bug URL:
>>>>>
>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> The project is currently baselined on jdk-13+22.
>>>>>
>>>>> Here's the full webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>>>>
>>>>> Here's the incremental webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>>>>
>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> The wiki doesn't say a whole lot about the async deflation invocation
>>>>> mechanism so I have to figure out how to add that content.
>>>>>
>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>> running now. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have to run
>>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I had some discussions with Karen about a race that was in the
>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>>>>>> theoretical and I had no test failures due to it. The fix is pretty
>>>>>> simple: remove the special case code for async deflation in the
>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>> for ObjectMonitor::enter() protection.
>>>>>>
>>>>>> During those discussions Karen also floated the idea of using the
>>>>>> ref_count field instead of the contentions field for the Async
>>>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>>>> change and I have run it through the usual stress and Mach5 testing
>>>>>> with no issues. It's also known as v2.03 (for those with the
>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>> Sorry for all the names...
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>
>>>>>> Here's the full webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>>>>>
>>>>>> Here's the incremental webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>>>>
>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>
>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>> Solaris-X64 release had the usual "Too large time diff" complaints.
>>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>> stress kit is running right now.
>>>>>>
>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>>>> the results and analyze them.
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I have a small but important bug fix for the Async Monitor Deflation
>>>>>>> project ready to go. It's also known as v2.02 (for those with the
>>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs). Sorry
>>>>>>> for all the names...
>>>>>>>
>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline patch
>>>>>>> is out of our hair.
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>
>>>>>>> Here's the full webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/
>>>>>>>
>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>>>>>
>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is running
>>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on product,
>>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>>>>>> I'll start my stress kit on Linux-X64 sometime on Sunday (after
>>>>>>> my jdk-13+18 stress run is done).
>>>>>>>
>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>> testing is done.
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I finally have CR1 for the Async Monitor Deflation project ready to
>>>>>>>> go. It's also known as v2.01 (for those with the patches) and as
>>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>>>>>> names...
>>>>>>>>
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> Baseline bug fixes URL:
>>>>>>>>
>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>
>>>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295
>>>>>>>>
>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/
>>>>>>>>
>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/
>>>>>>>>
>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the latest version
>>>>>>>> of JDK-8153224...
>>>>>>>>
>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>>>>> Mach5 tier[78] will be run later today. My stress kit on Solaris-X64
>>>>>>>> is running now. Linux-X64 stress testing will start on Sunday. I'm
>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's work on:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>
>>>>>>>>> Here's the webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>>>
>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>
>>>>>>>>> Earlier versions of this patch have been through several rounds of
>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>>>> their own environments (including specJBB2015).
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run Kitchensink
>>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>>>> fastdebug and slowdebug).
>>>>>>>>>
>>>>>>>>> All of the testing done on earlier versions will be redone on the
>>>>>>>>> latest version of the patch.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> P.S.
>>>>>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>>>>>> to characterize/analyze this failure for more than a week now. At
>>>>>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
> 

From coleen.phillimore at oracle.com  Fri Nov  8 14:03:07 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Fri, 8 Nov 2019 09:03:07 -0500
Subject: RFR 8230055: ModuleStressGC.java times out on Win*
In-Reply-To: <778431ec-85f5-92c4-7aaf-0727a3ff4768@oracle.com>
References: <e7c8b0c1-49fe-ac92-5e18-53f63df97e02@oracle.com>
 <2a082ca0-5cf6-12d9-3d62-30e99e593536@oracle.com>
 <778431ec-85f5-92c4-7aaf-0727a3ff4768@oracle.com>
Message-ID: <825dcc03-5389-3a75-1d7b-50fed2b73f75@oracle.com>


I had a look at the artifacts from the failed run, and there's no good 
information to tell why the test stopped for 40 minutes (both on 
Windows).  It could have been swapped out.

Reducing the loop count so that we verify what the test intends to 
verify makes sense to me.  It'll run faster on all platforms.

I think this change looks fine.

Coleen

On 11/8/19 7:59 AM, Harold Seigel wrote:
> Thanks Leonid.
>
> I'll withdraw the webrev.
>
> Harold
>
> On 11/7/2019 11:32 PM, Leonid Mesnik wrote:
>> Hi
>>
>> I am not sure that it is a good idea. The bug says that the test times 
>> out and takes more than 40 minutes.
>>
>> However, it usually takes less than 2 minutes to complete the test (on any 
>> platform). So I think that it is not just a long test but rather some 
>> outlier which should be investigated.
>>
>> Leonid
>>
>> On 11/7/19 2:03 PM, Harold Seigel wrote:
>>> Hi,
>>>
>>> Please review this small change to help prevent test 
>>> runtime/modules/ModuleStress/ModuleStressGC.java from timing out. 
>>> The change reduces the number of loop iterations in the test by 40%.
>>>
>>> Open Webrev: 
>>> http://cr.openjdk.java.net/~hseigel/bug_8230055/webrev/index.html
>>>
>>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8230055
>>>
>>> The change was tested by running Mach5 tier2 tests on Linux-x64, 
>>> Solaris, Windows, and Mac OS X.
>>>
>>> Thanks, Harold
>>>


From daniel.daugherty at oracle.com  Fri Nov  8 14:10:22 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 8 Nov 2019 09:10:22 -0500
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
 <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
Message-ID: <2b956948-96d2-8d66-63ea-0c9c634dd4c3@oracle.com>

Robbin,

Thanks for doing such a thorough crawl thru review! I very much appreciate
the feedback. It will take a bit of time to go thru and address all of
these comments which I'll do in another reply.

So this is just an ACK that I've gotten the review email... :-)

Dan


On 11/8/19 8:35 AM, Robbin Ehn wrote:
> Hi Dan,
>
> Thanks for looking into this, some comments on v8:
>
> ##################
> src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
> src/hotspot/cpu/x86/globalDefinitions_x86.hpp
> src/hotspot/share/logging/logTag.hpp
> src/hotspot/share/oops/markWord.hpp
> src/hotspot/share/runtime/basicLock.cpp
> src/hotspot/share/runtime/safepoint.cpp
> src/hotspot/share/runtime/serviceThread.cpp
> src/hotspot/share/runtime/sharedRuntime.cpp
> src/hotspot/share/runtime/synchronizer.hpp
> src/hotspot/share/runtime/vmOperations.cpp
> src/hotspot/share/runtime/vmOperations.hpp
> src/hotspot/share/runtime/vmStructs.cpp
> src/hotspot/share/runtime/vmThread.cpp
> test/hotspot/gtest/oops/test_markWord.cpp
>
> No comments.
>
> ##################
> I don't see the benefit of having the 
> -HandshakeAfterDeflateIdleMonitors code paths.
> Removing that option would mean these files can be reverted:
> src/hotspot/cpu/aarch64/globals_aarch64.hpp
> src/hotspot/cpu/arm/globals_arm.hpp
> src/hotspot/cpu/ppc/globals_ppc.hpp
> src/hotspot/cpu/s390/globals_s390.hpp
> src/hotspot/cpu/sparc/globals_sparc.hpp
> src/hotspot/cpu/x86/globals_x86.hpp
> src/hotspot/cpu/x86/macroAssembler_x86.cpp
> src/hotspot/cpu/x86/macroAssembler_x86.hpp
> src/hotspot/cpu/zero/globals_zero.hpp
>
> And one less option here:
> src/hotspot/share/runtime/globals.hpp
>
> ##################
> src/hotspot/share/prims/jvm.cpp
>
> Unclear if this is a good idea.
>
> ##################
> src/hotspot/share/prims/whitebox.cpp
>
> This would assume the test expects the right thing, but that is not 
> obvious.
>
> ##################
> src/hotspot/share/prims/jvmtiEnvBase.cpp
>
> The current pending and waiting monitor is only changed by the 
> JavaThread itself.
> It only sets it after _contentions is increased.
> It clears it before _contentions is decreased.
> We are depending on safepoint or the thread is suspended, so it can't 
> be deflated since _contentions are > 0.
> Plus the thread have already increased the ref count and can't 
> decrease it (since at safepoint or suspended).
>
> ##################
> src/hotspot/share/runtime/objectMonitor.cpp
>
> ###1
> You have several of these (and in other files):
> 242   jint l_ref_count = ref_count();
> 243   ADIM_guarantee(l_ref_count > 0, "must be positive: 
> l_ref_count=%d, ref_count=%d", l_ref_count, ref_count());
> Please use Atomic::load() in ref_count, since this depends on 
> ref_count being volatile; otherwise the compiler may only do one load.
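
A minimal sketch of the point above, written with standard C++ std::atomic
rather than HotSpot's Atomic class. RefCounted and check_positive are
hypothetical names used only for illustration; the idea is that each read
of the counter is an explicit atomic load and does not rely on the field
being declared volatile.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a monitor-like object with a reference count.
struct RefCounted {
  std::atomic<int> _ref_count{0};

  // Each call is an explicit atomic load; the read no longer depends on
  // the field being declared volatile.
  int ref_count() const {
    return _ref_count.load(std::memory_order_relaxed);
  }
};

// Mirrors the shape of the guarantee quoted above: take a local snapshot,
// check it, and re-read the field only for the diagnostic message.
inline bool check_positive(const RefCounted& m, int* snapshot, int* reread) {
  *snapshot = m.ref_count();  // value that is actually checked
  *reread = m.ref_count();    // second, separately requested load
  return *snapshot > 0;
}
```

In a multi-threaded run the two values may differ, which is exactly why the
original message prints both.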
>
> ###2
> 307   // Prevent deflation. See ObjectSynchronizer::deflate_monitor(),
> ...
> 311   Atomic::add(1, &_contentions);
> In ObjectSynchronizer::deflate_monitor, if you would check ref count 
> instead of _contentions, we could remove _contentions.
> Since all waiters also have a ref count it looks like we don't need 
> waiters either.
> In ObjectSynchronizer::deflate_monitor:
> if (mid->_contentions != 0 || mid->_waiters != 0) {
> Why not just do:
> if (mid->ref_count()) {
> ?
>
> ##################
> src/hotspot/share/runtime/objectMonitor.hpp
>
> ###1
>  252   intptr_t is_busy() const {
>  253     // TODO-FIXME: assert _owner == null implies _recursions = 0
>  254     // We do not include _ref_count in the is_busy() check because
>  255     // _ref_count is for indicating that the ObjectMonitor* is in
>  256     // use which is orthogonal to whether the ObjectMonitor itself
>  257     // is in use for a locking operation.
>
> But in the non-debug code we always check:
> +  if (mid->is_busy() || mid->ref_count() != 0) {
>
> So it seems like you should have a method including ref count.
>
> ##################
> src/hotspot/share/runtime/objectMonitor.inline.hpp
>
> Use Atomic::load for ref count.
>
> ##################
> src/hotspot/share/runtime/synchronizer.cpp
>
> ###1
>  139 static volatile int g_om_free_count = 0;    // # on g_free_list
>  140 static volatile int g_om_in_use_count = 0;  // # on g_om_in_use_list
>  141 static volatile int g_om_population = 0;    // # Extant -- in 
> circulation
>  142 static volatile int g_om_wait_count = 0;    // # on g_wait_list
> No padding here, aren't they more contended than the fields in the OM?
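
One way the padding question could be addressed, sketched in standard C++
and not taken from the webrev. PaddedCounter, ListCounters and the 64 byte
kCacheLineSize are assumptions for illustration; HotSpot has its own
padding helpers and platform-specific line sizes.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumed cache line size; real code would take this from the platform.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical padded counter: alignas rounds sizeof up to a full line,
// so two adjacent counters never share a cache line.
struct alignas(kCacheLineSize) PaddedCounter {
  std::atomic<int> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLineSize,
              "each counter occupies exactly one cache line");

// Hypothetical group standing in for the four global list counters.
struct ListCounters {
  PaddedCounter free_count;
  PaddedCounter in_use_count;
  PaddedCounter population;
  PaddedCounter wait_count;
};
```

The cost is memory (four lines instead of one), which is usually acceptable
for a handful of globals that many threads update.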
>
> ###2
> 151 static bool is_next_marked(ObjectMonitor* om) {
>
> Is only used in ObjectSynchronizer::om_flush.
> Here you fetch an OM and read the next field; this does not need LA 
> semantics on supported platforms.
> This would only need Atomic::load.
>
> ###3
> 191 static void set_next(ObjectMonitor* om, ObjectMonitor* value) {
>
> In no place do you need SR; the only places where it would make a difference:
>  345       OrderAccess::storestore();
>  346       set_next(cur, next);  // Unmark the previous list head.
> and
> 1714     OrderAccess::storestore();
> 1715     set_next(in_use_list, next);
> You have a storestore already!
>
> This code reads as:
> OrderAccess::storestore();
> OrderAccess::loadstore();
> OrderAccess::storestore();
> om->_next_om = value
>
> So it should be an Atomic::store.
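
The barrier-plus-plain-store shape argued for above can be sketched in
standard C++ as follows. Node, set_next and publish are hypothetical names,
and std::atomic_thread_fence(release) is used as a stand-in for
OrderAccess::storestore() (it is somewhat stronger, since it also orders
earlier loads before later stores).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical list node; only the next pointer matters here.
struct Node {
  int payload = 0;
  std::atomic<Node*> next{nullptr};
};

// Plain atomic store with no ordering of its own, analogous to the
// suggested Atomic::store.
inline void set_next(Node* n, Node* value) {
  n->next.store(value, std::memory_order_relaxed);
}

// Publish `value` as n's successor. One fence orders the earlier payload
// write before the pointer store; set_next adds no second barrier.
inline void publish(Node* n, Node* value, int payload) {
  value->payload = payload;
  std::atomic_thread_fence(std::memory_order_release);  // stand-in for storestore
  set_next(n, value);
}
```

If set_next itself were a release store, the explicit fence in publish would
be redundant, which is the duplication the review comment points out.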
>
> ###4
> 198 static bool mark_list_head(ObjectMonitor* volatile * list_p
>
> Since the mark is an embedded spinlock I think the terminology should 
> be changed. (That the spinlock is inside the next pointer should be 
> abstracted away.)
> E.g. mark_next_loop would just be lock.
> The load of the list heads should use Atomic::load.
> It also seems a bit weird to return next from the locking method.
> An output parameter can just be returned, and return NULL if the list 
> head is NULL.
> E.g.
>
>  198 static ObjectMonitor* get_list_head_locked(ObjectMonitor* 
> volatile * list_p) {
>  200   while (true) {
>  201     ObjectMonitor* mid = Atomic::load(list_p);
>  202     if (mid == NULL) {
>  203       return NULL;  // The list is empty.
>  204     }
>  205     if (try_lock(mid)) {
>  206       if (Atomic::load(list_p) != mid) {
>  207         // The list head changed so we have to retry.
>  208         unlock(mid);
>  210       } else {
>              return mid;
>        }
>  214     }
>          // Yield ?
>  215   }
>  216 }
>
> With collateral changes.
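
A self-contained sketch of the lock/unlock abstraction suggested above,
using std::atomic instead of HotSpot's Atomic class. Node, next_bits,
kLockBit, is_locked and the memory orderings chosen here are assumptions
for illustration; only the names try_lock, unlock and get_list_head_locked
come from the review comment.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical node: bit 0 of next_bits is the spinlock, the remaining
// bits hold the pointer to the next node.
struct Node {
  std::atomic<std::uintptr_t> next_bits{0};
};

constexpr std::uintptr_t kLockBit = 1;

// Try to set the lock bit; fails if it is already set or the field changed.
inline bool try_lock(Node* n) {
  std::uintptr_t cur = n->next_bits.load(std::memory_order_relaxed);
  if (cur & kLockBit) return false;  // already locked by someone else
  return n->next_bits.compare_exchange_strong(cur, cur | kLockBit,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed);
}

// Clear the lock bit and leave the next pointer untouched.
inline void unlock(Node* n) {
  n->next_bits.fetch_and(~kLockBit, std::memory_order_release);
}

inline bool is_locked(const Node* n) {
  return (n->next_bits.load(std::memory_order_relaxed) & kLockBit) != 0;
}

// Returns the list head with its lock held, or nullptr if the list is empty.
inline Node* get_list_head_locked(std::atomic<Node*>* list_p) {
  while (true) {
    Node* mid = list_p->load(std::memory_order_acquire);
    if (mid == nullptr) return nullptr;  // the list is empty
    if (try_lock(mid)) {
      if (list_p->load(std::memory_order_acquire) == mid) return mid;
      unlock(mid);  // the list head changed, so retry
    }
  }
}
```

Callers see only lock, unlock and a returned head; the fact that the lock
lives in the low bit of the next field stays inside these helpers.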
>
> ###5
> 220 static ObjectMonitor* unmarked_next(ObjectMonitor* om)
> Atomic::store is what is needed.
>
> ###6
> 333 static void prepend_to_common(
>
>  345       OrderAccess::storestore();
>  346       set_next(cur, next);  // Unmark the previous list head.
> Double storestore. (fixed by changing set_next to Atomic::store)
>
> ###7
>  375 static ObjectMonitor* take_from_start_of_common(ObjectMonitor* 
> volatile * list_p,
>
> Triple storestore here.
>
>  386   Atomic::dec(count_p);
>  387   // mark_list_head() used cmpxchg() above, switching list head 
> can be lazier:
>  388   OrderAccess::storestore();
>  389   // Unmark take, but leave the next value for any lagging list
>  390   // walkers. It will get cleaned up when take is prepended to
>  391   // the in-use list:
>  392   set_next(take, next);
>  393   return take;
>
> Reads:
> count_p--
> OrderAccess::loadstore();
> OrderAccess::storestore();
> OrderAccess::storestore();
> OrderAccess::loadstore();
> OrderAccess::storestore();
> take->_next_om = next;
>
> Fixed by changing set_next to Atomic::store and removing the 
> OrderAccess::storestore();
>
> ###8
> ObjectSynchronizer::om_release(
>
> 1591       if (m == mid) {
> 1592         // We found 'm' on the per-thread in-use list so try to 
> extract it.
> 1593         if (cur_mid_in_use == NULL) {
> 1594           // mid is the list head and it is marked. Switch the 
> list head
> 1595           // to next which unmarks the list head, but leaves mid 
> marked:
> 1596           self->om_in_use_list = next;
> 1597           // mark_list_head() used cmpxchg() above, switching 
> list head can be lazier:
> 1598           OrderAccess::storestore();
> 1599         } else {
> 1600           // mid and cur_mid_in_use are marked. Switch 
> cur_mid_in_use's
> 1601           // next field to next which unmarks cur_mid_in_use, but 
> leaves
> 1602           // mid marked:
> 1603           OrderAccess::release_store(&cur_mid_in_use->_next_om, next);
> 1604         }
> 1605         extracted = true;
> 1606         Atomic::dec(&self->om_in_use_count);
> 1607         // Unmark mid, but leave the next value for any lagging list
> 1608         // walkers. It will get cleaned up when mid is prepended to
> 1609         // the thread's free list:
> 1610         set_next(mid, next);
> 1611         break;
> 1612       }
>
> This does not look correct. Before taking this branch we have done a 
> cmpxchg in mark_list_head or mark_next_loop.
> This is how it reads:
> OrderAccess::storestore(); // from previous cmpxchg
> OrderAccess::loadstore(); // from previous cmpxchg
> 1591       if (m == mid) {
> 1593         if (cur_mid_in_use == NULL) {
> 1596           self->om_in_use_list = next;
> 1598           OrderAccess::storestore();
> 1599         } else {
>                OrderAccess::storestore();
>                OrderAccess::loadstore();
> 1603           cur_mid_in_use->_next_om = next;
> 1604         }
> 1605         extracted = true;
>              OrderAccess::storestore();
>              OrderAccess::fence(); // storestore|storeload|loadstore|loadload
>              self->om_in_use_count--; // Atomic::dec
>              OrderAccess::storestore();
>              OrderAccess::loadstore();
>              OrderAccess::storestore();
>              OrderAccess::loadstore();
>              mid->_next_om = next; // Atomic::store
> 1611         break;
> 1612       }
>
> extracted is a local variable so you do not need any OrderAccess before 
> it is set.
> Fixed by changing set_next to Atomic::store, removing the 
> OrderAccess::storestore() and changing OrderAccess::release_store to 
> Atomic::store();
>
> ###9
> 1653 void ObjectSynchronizer::om_flush(Thread* self) {
>
> 1714     OrderAccess::storestore();
> 1715     set_next(in_use_list, next);
> Fixed by changing set_next to Atomic::store.
>
> ###10
> 1737     self->om_free_list = NULL;
> 1738     OrderAccess::storestore();  // Lazier memory is okay for list walkers.
>
> prepend_list_to_g_free_list/prepend_list_to_g_om_in_use_list does a 
> cmpxchg as its first step, so there is no need for this storestore.
>
> ###11
> 1797 void ObjectSynchronizer::inflate(ObjectMonitorHandle* omh_p, 
> Thread* self,
>
> 1938       // Once ObjectMonitor is configured and the object is associated
> 1939       // with the ObjectMonitor, it is safe to allow async deflation:
> 1940       assert(m->is_new(), "freshly allocated monitor must be new");
> 1941       m->set_allocation_state(ObjectMonitor::Old);
>
> So we use ref count, contention, waiter, owner and allocation state to 
> keep the OM alive in different scenarios.
> There is no way for me to keep track of that. I don't see why you 
> would need more than owner and ref count.
> If you allocate the OM with ref count 1 you can remove 
> _allocation_state and just decrease ref count here instead.
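
A minimal sketch of the suggestion in ###11, assuming a monitor that is born with a ref count of 1 which the inflating thread drops once setup is complete. Monitor, finish_inflation and try_deflate are hypothetical names, and the "0 -> -1 means deflating" convention is an assumption made for the illustration, not a description of the patch. The sketch also ignores the race between the owner check and the ref count swing.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified monitor: only the two fields the review
// comment says should be enough to keep the monitor alive.
struct Monitor {
  std::atomic<int> ref_count{1};        // starts at 1: held by the inflating thread
  std::atomic<void*> owner{nullptr};
};

// Called by the inflating thread once the monitor is configured and
// associated with its object. Dropping the initial reference takes the
// place of flipping an allocation state from New to Old.
void finish_inflation(Monitor& m) {
  m.ref_count.fetch_sub(1);
}

// Deflation is only attempted on an unowned monitor and only succeeds
// if nobody holds a reference (assumed convention: 0 -> -1 claims it).
bool try_deflate(Monitor& m) {
  if (m.owner.load() != nullptr) return false;
  int expected = 0;
  return m.ref_count.compare_exchange_strong(expected, -1);
}
```

With this shape, a freshly allocated monitor cannot be deflated until finish_inflation() has run, without a separate allocation state field.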
>
> ###12
> 2079 bool ObjectSynchronizer::deflate_monitor
>
> 2112     if (AsyncDeflateIdleMonitors) {
> 2113       // clear() expects the owner field to be NULL and we won't race
> 2114       // with the simple C2 ObjectMonitor
>
> The macro assembler code is not just executed by C2, so this comment 
> is a bit misleading. (there are some more also)
>
> ###13
> 2306 int ObjectSynchronizer::deflate_monitor_list(
>
> Same issue as ObjectSynchronizer::om_release.
> Fixed by changing set_next to Atomic::store, removing the 
> OrderAccess::storestore() and changing OrderAccess::release_store to 
> Atomic::store();
>
> ###14
> 2474       if (SafepointSynchronize::is_synchronizing() &&
>
> This is the wrong method to call, it should be 
> SafepointMechanism::should_block(Thread* thread);
>
> ###15
> 2578 void ObjectSynchronizer::deflate_idle_monitors_using_JT() {
>
> 2616     g_wait_list = NULL;
> 2617     OrderAccess::storestore();  // Lazier memory sync is okay for list walkers.
>
> I don't see that g_wait_list is ever simultaneously read.
> Either it is accessed by serviceThread outside a safepoint or by 
> VMThread inside a safepoint?
>
> It looks like g_wait_list can just be a local in:
> void ObjectSynchronizer::deflate_idle_monitors_using_JT()
>
> (disregarding the debug code that might read it in a safepoint)
>
> ###16
> 2722         assert(SafepointSynchronize::is_synchronizing(), "sanity check");
>
> This is the wrong method to call, it should be 
> SafepointMechanism::should_block(Thread* thread);
>
> ##################
> src/hotspot/share/runtime/vframe.cpp
>
> We are either at a safepoint, in the current thread, or in a handshake, 
> so the current pending and waiting monitors are already stable.
>
> ##################
> src/hotspot/share/services/threadService.cpp
>
> These changes are only needed for the 
> -HandshakeAfterDeflateIdleMonitors path.
>
> ##################
> test/jdk/java/rmi/server/UnicastRemoteObject/unexportObject/UnexportLeak.java 
>
>
> Note: if OM had a weak reference to the object instead, this would not be needed.
>
> Thanks, Robbin
>
>
> On 11/4/19 10:03 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have made changes to the Async Monitor Deflation code in response to
>> the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., Robbin
>> and Erik O. for their comments!
>>
>> JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async 
>> Monitor
>> Deflation project needs to push before Nov. 12, 2019 in order to allow
>> for sufficient bake time for such a big change. Nov. 12 is _next_ 
>> Tuesday
>> so we have 8 days from today to finish this code review cycle and push
>> this code for JDK14.
>>
>> Carsten and Roman! Time for you guys to chime in again on the code 
>> reviews.
>>
>> I have attached the change list from CR7 to CR8 instead of putting it in
>> the body of this email. I've also added a link to the CR7-to-CR8-changes
>> file to the webrevs so it should be easy to find.
>>
>> Main bug URL:
>>
>>     JDK-8153224 Monitor deflation prolong safepoints
>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>
>> The project is currently baselined on jdk-14+21.
>>
>> Here's the full webrev URL for those folks that want to see all of the
>> current Async Monitor Deflation code in one go (v2.08 full):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full 
>>
>>
>> Some folks might want to see just what has changed since the last review
>> cycle so here's a webrev for that (v2.08 inc):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/ 
>>
>>
>> The OpenJDK wiki did not need any changes for this round:
>>
>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>
>> The jdk-14+21 based v2.08 version of the patch has been thru Mach5 
>> tier[1-8]
>> testing on Oracle's usual set of platforms. It has also been through 
>> my usual
>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the 
>> addition
>> of Robbin's "MoCrazy 1024" test running in parallel with the other 
>> tests in
>> my lab. Some testing is still running, but so far there are no new 
>> regressions.
>>
>> I have not yet done a SPECjbb2015 round on the CR8/v2.08/11-for-jdk14 
>> bits.
>>
>> Thanks, in advance, for any questions, comments or suggestions.
>>
>> Dan
>>
>>
>> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> The Async Monitor Deflation project is reaching the end game. I have no
>>> changes planned for the project at this time so all that is left is 
>>> code
>>> review and any changes that result from those reviews.
>>>
>>> Carsten and Roman! Time for you guys to chime in again on the code 
>>> reviews.
>>>
>>> I have attached the list of fixes from CR6 to CR7 instead of putting it
>>> in the main body of this email.
>>>
>>> Main bug URL:
>>>
>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>
>>> The project is currently baselined on jdk-14+19.
>>>
>>> Here's the full webrev URL for those folks that want to see all of the
>>> current Async Monitor Deflation code in one go (v2.07 full):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full 
>>>
>>>
>>> Some folks might want to see just what has changed since the last 
>>> review
>>> cycle so here's a webrev for that (v2.07 inc):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/ 
>>>
>>>
>>> The OpenJDK wiki has been updated to match the 
>>> CR7/v2.07/10-for-jdk14 changes:
>>>
>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>
>>> The jdk-14+18 based v2.07 version of the patch has been thru Mach5 
>>> tier[1-8]
>>> testing on Oracle's usual set of platforms. It has also been through 
>>> my usual
>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the 
>>> addition
>>> of Robbin's "MoCrazy 1024" test running in parallel with the other 
>>> tests in
>>> my lab.
>>>
>>> The jdk-14+19 based v2.07 version of the patch has been thru Mach5 
>>> tier[1-3]
>>> test on Oracle's usual set of platforms. Mach5 tier[4-8] are in 
>>> process.
>>>
>>> I did another round of SPECjbb2015 testing in Oracle's Aurora 
>>> Performance lab
>>> using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>>
>>>     - "base" is jdk-14+18
>>>     - "v2.07" is the latest version and includes C2 inc_om_ref_count() support
>>>       on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors option
>>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors specified
>>>
>>>          hbIR           hbIR
>>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>     ---------------  ---------  --------  -------------  -------
>>>            34282.00   30635.90  28831.30       20969.20  3841.30  base
>>>            34282.00   30973.00  29345.80       21025.20  3964.10  v2.07
>>>            34282.00   31105.60  29174.30       21074.00  3931.30  v2.07_handshake
>>>            34282.00   30789.70  27151.60       19839.10  3850.20  v2.07_off
>>>
>>>     - The Aurora Perf comparison tool reports:
>>>
>>>         Comparison              max-jOPS              critical-jOPS
>>>         ----------------------  --------------------  --------------------
>>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>>>
>>>         (s) - significant  (ns) - not-significant
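
As a worked check on the comparison figures, the relative change can be computed directly from the max-jOPS and critical-jOPS columns of the results table. The helper below is a hypothetical sketch, not part of the Aurora tooling. Computing (value - base) / base * 100 from the table reproduces most of the reported deltas to within rounding (for example +1.78% and -5.83% for max-jOPS). The handshake critical-jOPS figure comes out near +0.50% rather than the reported +0.58%, so the tool presumably derives its percentages from something other than these summary values.

```cpp
#include <cassert>
#include <cmath>

// Relative change of 'value' against 'base', in percent.
// Hypothetical helper for checking the table by hand.
double percent_change(double base, double value) {
  return (value - base) / base * 100.0;
}
```

For instance, percent_change(28831.30, 29345.80) corresponds to the "base vs 2.07" max-jOPS entry.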
>>>
>>>     - For historical comparison, the Aurora Perf comparison tool
>>>         reported for v2.06 with a baseline of jdk-13+31:
>>>
>>>         Comparison              max-jOPS              critical-jOPS
>>>         ----------------------  --------------------  --------------------
>>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>>>
>>>         (s) - significant  (ns) - not-significant
>>>
>>> Thanks, in advance, for any questions, comments or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> The Async Monitor Deflation project has rebased to JDK14 so it's time
>>>> for our first code review in that new context!!
>>>>
>>>> I've been focused on changing the monitor list management code to be
>>>> lock-free in order to make SPECjbb2015 happier. Of course with a 
>>>> change
>>>> like that, it takes a while to chase down all the new and wonderful
>>>> races. At this point, I have the code back to the same stability that
>>>> I had with CR5/v2.05/8-for-jdk13.
>>>>
>>>> To lay the ground work for this round of review, I pushed the 
>>>> following
>>>> two fixes to jdk/jdk earlier today:
>>>>
>>>>     JDK-8230184 rename, whitespace, indent and comments changes in preparation
>>>>                 for lock free Monitor lists
>>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>>
>>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails after 8230184
>>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>>
>>>> I have attached the list of fixes from CR5 to CR6 instead of putting
>>>> it in the main body of this email.
>>>>
>>>> Main bug URL:
>>>>
>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>
>>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>>> JDK-8230184 and JDK-8230317.
>>>>
>>>> Here's the full webrev URL for those folks that want to see all of the
>>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/ 
>>>>
>>>>
>>>>
>>>> The primary focus of this review cycle is on the lock-free Monitor 
>>>> List
>>>> management changes so here's a webrev for just that patch (v2.06c):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/ 
>>>>
>>>>
>>>> The secondary focus of this review cycle is on the bug fixes that have
>>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for just that
>>>> patch (v2.06b):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/ 
>>>>
>>>>
>>>> The third and final bucket for this review cycle is the rename, 
>>>> whitespace,
>>>> indent and comments changes made in preparation for lock free 
>>>> Monitor list
>>>> management. Almost all of that was extracted into JDK-8230184 for the
>>>> baseline so this bucket now has just a few comment changes relative to
>>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/ 
>>>>
>>>>
>>>>
>>>> Some folks might want to see just what has changed since the last 
>>>> review
>>>> cycle so here's a webrev for that (v2.06 inc):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/ 
>>>>
>>>>
>>>>
>>>> Last, but not least, some folks might want to see the code before the
>>>> addition of lock-free Monitor List management so here's a webrev for
>>>> that (v2.00 -> v2.05):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/ 
>>>>
>>>>
>>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>>
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>
>>>> but that should only be changes to describe per-thread list async 
>>>> monitor
>>>> deflation being done by the ServiceThread.
>>>>
>>>> (I did update the OpenJDK wiki for the CR5 changes back on 2019.08.14)
>>>>
>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>> Oracle's usual set of platforms. It has also been through my usual set
>>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>>
>>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora Performance 
>>>> lab
>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs. This was 
>>>> using
>>>> this patch baselined on jdk-13+31 (for stability):
>>>>
>>>>          hbIR           hbIR
>>>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>>     ---------------  ---------  --------  -------------  -------
>>>>            34282.00   28837.20  27905.20       19817.40  3658.10  base
>>>>            34965.70   29798.80  27814.90       19959.00  3514.60  v2.06d
>>>>            34282.00   29100.70  28042.50       19577.00  3701.90  v2.06d_off
>>>>            34282.00   29218.50  27562.80       19397.30  3657.60  v2.06d_ocache
>>>>            34965.70   29838.30  26512.40       19170.60  3569.90  v2.05
>>>>            34282.00   28926.10  27734.00       19835.10  3588.40  v2.05_off
>>>>
>>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors specified and
>>>> the "ocache" config is with 128 byte cache line sizes instead of 64 
>>>> byte
>>>> cache line sizes. "v2.06d" is the last set of changes that I made 
>>>> before
>>>> those changes were distributed into the "v2.06a", "v2.06b" and 
>>>> "v2.06c"
>>>> buckets for this review cycle.
>>>>
>>>>
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I've been focused on chasing down and fixing the test failures
>>>>> that only pop up rarely. So this round is primarily fixes for races
>>>>> with a few additional fixes that came from Karen's review of CR4.
>>>>> Thanks Karen!
>>>>>
>>>>> I have attached the list of fixes from CR4 to CR5 instead of putting
>>>>> it in the main body of this email.
>>>>>
>>>>> Main bug URL:
>>>>>
>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> The project is currently baselined on jdk-13+29. This will likely be
>>>>> the last JDK13 baseline for this project and I'll roll to the JDK14
>>>>> (jdk/jdk) repo soon...
>>>>>
>>>>> Here's the full webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/
>>>>>
>>>>> Here's the incremental webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>>
>>>>> I have not yet checked the OpenJDK wiki to see if it needs any 
>>>>> updates
>>>>> to match the CR5 changes:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> (I did update the OpenJDK wiki for the CR4 changes back on 
>>>>> 2019.06.26)
>>>>>
>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become 
>>>>> available.
>>>>> Since I haven't made any performance changes in this round, I'll only
>>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>>
>>>>> Next up:
>>>>>
>>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>>   Linux-X64 and we've determined that some of that comes from
>>>>>   contention on the gListLock. So I'm going to investigate removing
>>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>>> - Of course, going lock free often causes new races and new failures
>>>>>   so that's a good reason for making those changes isolated in their
>>>>>   own round (and not holding up CR5/v2.05/8-for-jdk13 anymore).
>>>>> - I finally have a potential fix for the Win* failure with
>>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>   but I haven't run it through Mach5 yet so it'll be in the next round.
>>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>>>   monitor related failures there. I suspect that I need to go take a
>>>>>   look at the C2 RTM macro assembler code and look for things that might
>>>>>   conflict with Async Monitor Deflation. If you're interested in that kind
>>>>>   of issue, then see the macroAssembler_x86.cpp sanity check that I
>>>>>   added in this round!
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I have a fix for an issue that came up during performance testing.
>>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>>> experiments.
>>>>>>
>>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>>> verbose due to the complexity of the issue, but the changes
>>>>>> themselves are not that big.
>>>>>>
>>>>>> Functional:
>>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>>         safepoints that simply request async deflation which
>>>>>>         keeps the async JavaThreads from making progress on
>>>>>>         their async deflation work.
>>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>>     - is_async_deflation_needed() returns true when
>>>>>>       is_async_cleanup_requested() is true or when
>>>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>>>       AsyncDeflationInterval).
>>>>>>     - if AsyncDeflateIdleMonitors Service_lock->wait() now waits for
>>>>>>       at most GuaranteedSafepointInterval millis:
>>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>>         (default is 1000 millis/1 second)
>>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>>>         This is to prevent async deflation from swamping the
>>>>>>         ServiceThread.
>>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>>>     of their in-use lists.
>>>>>>     - The ServiceThread will check for async deflation work every
>>>>>>       GuaranteedSafepointInterval.
>>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>>       async deflation work via is_async_deflation_requested.
>>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>>     flag to reduce the in-use monitor population that is reported by
>>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
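
The rate limiting described in the Functional list (an explicit request always triggers async deflation; the threshold check triggers it no more often than AsyncDeflationInterval, with 0 meaning off) can be sketched as follows. This is a simplified illustration and not the patch itself: DeflationPolicy and its fields are hypothetical names, and the clock is passed in as a parameter so the logic can be exercised without a real timer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the "is async deflation needed?" check.
struct DeflationPolicy {
  int64_t interval_ms;       // stands in for AsyncDeflationInterval (0 is off)
  int64_t last_ms = 0;       // time of the last threshold-triggered deflation
  bool has_deflated = false; // false until the first threshold trigger
  bool requested = false;    // stands in for an explicit async deflation request

  explicit DeflationPolicy(int64_t interval) : interval_ms(interval) {}

  // above_threshold stands in for monitors_used_above_threshold().
  bool is_async_deflation_needed(bool above_threshold, int64_t now_ms) {
    if (requested) {
      return true;  // an explicit request is not rate limited
    }
    if (interval_ms > 0 && above_threshold &&
        (!has_deflated || now_ms - last_ms >= interval_ms)) {
      // Remember when we said yes so that the next yes is at least
      // interval_ms away; this keeps deflation from swamping the caller.
      last_ms = now_ms;
      has_deflated = true;
      return true;
    }
    return false;
  }
};
```

A periodic caller (the ServiceThread in the description above) would invoke the check on every wakeup and start deflation work only when it returns true.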
>>>>>>
>>>>>> Test update:
>>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>>>     AsyncDeflateIdleMonitors.
>>>>>>
>>>>>> Collateral:
>>>>>>   - Add/clarify/update some logging messages.
>>>>>>
>>>>>> Cleanup:
>>>>>>   - Updated comments based on Karen's code review.
>>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>>     'async cleanup' -> 'async deflation'.
>>>>>>     - comment and function name changes
>>>>>>   - Clarify MonitorUsedDeflationThreshold description.
>>>>>>
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-13+22.
>>>>>>
>>>>>> Here's the full webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>>>>>
>>>>>> Here's the incremental webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>>>>>
>>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>
>>>>>>
>>>>>> The wiki doesn't say a whole lot about the async deflation 
>>>>>> invocation
>>>>>> mechanism so I have to figure out how to add that content.
>>>>>>
>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>>> running now. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have to 
>>>>>> run
>>>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I had some discussions with Karen about a race that was in the
>>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>>>>>>> theoretical and I had no test failures due to it. The fix is pretty
>>>>>>> simple: remove the special case code for async deflation in the
>>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>>> for ObjectMonitor::enter() protection.
>>>>>>>
>>>>>>> During those discussions Karen also floated the idea of using the
>>>>>>> ref_count field instead of the contentions field for the Async
>>>>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>>>>> change and I have run it through the usual stress and Mach5 testing
>>>>>>> with no issues. It's also known as v2.03 (for those with the
>>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>>> Sorry for all the names...
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>>
>>>>>>> Here's the full webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>>>>>>
>>>>>>> Here's the incremental webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>>>>>
>>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>>> Solaris-X64 release had the usual "Too large time diff" complaints.
>>>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>>> stress kit is running right now.
>>>>>>>
>>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>>>>> the results and analyze them.
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I have a small but important bug fix for the Async Monitor 
>>>>>>>> Deflation
>>>>>>>> project ready to go. It's also known as v2.02 (for those 
>>>>>>>> with the
>>>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev 
>>>>>>>> URLs). Sorry
>>>>>>>> for all the names...
>>>>>>>>
>>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline 
>>>>>>>> patch
>>>>>>>> is out of our hair.
>>>>>>>>
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>>
>>>>>>>> Here's the full webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>>>>>>
>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 
>>>>>>>> changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is 
>>>>>>>> running
>>>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>>>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on product,
>>>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>> I'll start my stress kit on Linux-X64 sometime on Sunday (after
>>>>>>>> my jdk-13+18 stress run is done).
>>>>>>>>
>>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>>> testing is done.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I finally have CR1 for the Async Monitor Deflation project 
>>>>>>>>> ready to
>>>>>>>>> go. It's also known as v2.01 (for those with the patches) 
>>>>>>>>> and as
>>>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all 
>>>>>>>>> the
>>>>>>>>> names...
>>>>>>>>>
>>>>>>>>> Main bug URL:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> Baseline bug fixes URL:
>>>>>>>>>
>>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>>
>>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>>
>>>>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the latest 
>>>>>>>>> version
>>>>>>>>> of JDK-8153224...
>>>>>>>>>
>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR 
>>>>>>>>> changes:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] 
>>>>>>>>> testing on
>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running 
>>>>>>>>> now and
>>>>>>>>> Mach5 tier[78] will be run later today. My stress kit on 
>>>>>>>>> Solaris-X64
>>>>>>>>> is running now. Linux-X64 stress testing will start on Sunday. 
>>>>>>>>> I'm
>>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's 
>>>>>>>>>> work on:
>>>>>>>>>>
>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>
>>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>>
>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here's the webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>>>>
>>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>>
>>>>>>>>>> Earlier versions of this patch have been through several 
>>>>>>>>>> rounds of
>>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>>>>> their own environments (including specJBB2015).
>>>>>>>>>>
>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] 
>>>>>>>>>> testing on
>>>>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run 
>>>>>>>>>> Kitchensink
>>>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, 
>>>>>>>>>> fastdebug
>>>>>>>>>> and slowdebug). Earlier versions have run my monitor 
>>>>>>>>>> inflation stress
>>>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 
>>>>>>>>>> (product,
>>>>>>>>>> fastdebug and slowdebug).
>>>>>>>>>>
>>>>>>>>>> All of the testing done on earlier versions will be redone on 
>>>>>>>>>> the
>>>>>>>>>> latest version of the patch.
>>>>>>>>>>
>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>> P.S.
>>>>>>>>>> One subtest in 
>>>>>>>>>> gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been 
>>>>>>>>>> trying
>>>>>>>>>> to characterize/analyze this failure for more than a week 
>>>>>>>>>> now. At
>>>>>>>>>> this point I'm convinced that Async Monitor Deflation is 
>>>>>>>>>> aggravating
>>>>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>


From daniel.daugherty at oracle.com  Fri Nov  8 14:29:45 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 8 Nov 2019 09:29:45 -0500
Subject: RFR 8230055: ModuleStressGC.java times out on Win*
In-Reply-To: <778431ec-85f5-92c4-7aaf-0727a3ff4768@oracle.com>
References: <e7c8b0c1-49fe-ac92-5e18-53f63df97e02@oracle.com>
 <2a082ca0-5cf6-12d9-3d62-30e99e593536@oracle.com>
 <778431ec-85f5-92c4-7aaf-0727a3ff4768@oracle.com>
Message-ID: <6e69b475-4f70-0c0a-ed45-55d32813c040@oracle.com>

Chris Plummer has been chasing some test failures where some system
calls just take a really long time to complete. I _think_ he has
isolated his sightings to specific system calls and machine configs
in our test farm.

You might want to check with him to see if your test is running into
something similar...

Dan


On 11/8/19 7:59 AM, Harold Seigel wrote:
> Thanks Leonid.
>
> I'll withdraw the webrev.
>
> Harold
>
> On 11/7/2019 11:32 PM, Leonid Mesnik wrote:
>> Hi
>>
>> I am not sure that it is a good idea. The bug says that the test times 
>> out and takes more than 40 minutes.
>>
>> However, it usually takes less than 2 minutes to complete the test (on any 
>> platform). So I think that it is not just a long test but rather some 
>> outlier which should be investigated.
>>
>> Leonid
>>
>> On 11/7/19 2:03 PM, Harold Seigel wrote:
>>> Hi,
>>>
>>> Please review this small change to help prevent test 
>>> runtime/modules/ModuleStress/ModuleStressGC.java from timing out. 
>>> The change reduces the number of loop iterations in the test by 40%.
>>>
>>> Open Webrev: 
>>> http://cr.openjdk.java.net/~hseigel/bug_8230055/webrev/index.html
>>>
>>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8230055
>>>
>>> The change was tested by running Mach5 tier2 tests on Linux-x64, 
>>> Solaris, Windows, and Mac OS X.
>>>
>>> Thanks, Harold
>>>


From goetz.lindenmaier at sap.com  Fri Nov  8 15:32:48 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 8 Nov 2019 15:32:48 +0000
Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
In-Reply-To: <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
References: <AM6PR02MB534783144A9A6CF30C1049F8EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com>
 <AM6PR02MB534729CACDA052E9B80ED146EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
Message-ID: <AM6PR02MB5347AB9322749A90AED3128BEC7B0@AM6PR02MB5347.eurprd02.prod.outlook.com>

Hi,

I waited for https://bugs.openjdk.java.net/browse/JDK-8233081
which makes one of the fixes unnecessary.
Also, I had to fix the argument of verify_oop_helper 
from oop to oopDesc* for the fastdebug build.

New webrev:
http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/

Best regards,
  Goetz.

> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Friday, 18 October 2019 01:38
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
> compiler-dev at openjdk.java.net>
> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> 
> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote:
> > Hi David,
> >
> > you are right, thanks for pointing me to that!
> > Doing one test for vm.bits=64 and one for 32 should fix it:
> > http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> 
> s/01/02/ :)
> 
> For the 32-bit case you can delete the line:
> 
>     * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9")
> 
> For the 64-bit case you can delete the "sparc" check from the same line.
> 
> Thanks,
> David
> 
> >
> > Best regards,
> >    Goetz.
> >
> >> -----Original Message-----
> >> From: David Holmes <david.holmes at oracle.com>
> >> Sent: Thursday, 17 October 2019 13:18
> >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> >> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
> >> compiler-dev at openjdk.java.net>
> >> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> >>
> >> Hi Goetz,
> >>
> >> UseCompressedOops is a 64-bit flag only so your change will break the
> >> test on 32-bit systems.
> >>
> >> David
> >>
> >> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote:
> >>> Hi,
> >>>
> >>> 8231058 introduced a test that enables +VerifyOops.
> >>> This fails on ppc, because this was not used in a very
> >>> long time.
> >>>
> >>> The crash is caused by passing compressed oops from
> >>> LIR_Assembler::store() to the checker routine.
> >>> I fix this by implementing a checker routine verify_coop
> >>> that first decompresses the coop.  This makes the new
> >>> test pass.
> >>>
> >>> Further testing showed that the additional checker
> >>> coding makes Patching Stubs overflow. These
> >>> can not be increased in size to fit the code. I
> >>> disable generating verify_oop code in LIRAssembler::load()
> >>> which fixes the issue.
> >>>
> >>> Further I extended the message printed when verification
> >>> of an oop failed. First, I print the location in the source
> >>> code where the checker code was generated. Second,
> >>> I print the faulty oop.
> >>>
> >>> I also improved the message printed when PatchingStubs
> >>> overflow.
> >>>
> >>> Finally, I improve the test to run with and without compressed
> >>> Oops.
> >>>
> >>> Please review:
> >>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> >>>
> >>> @runtime as I modify the test introduced there
> >>> @compiler as the error is in C1.
> >>>
> >>> Best regards,
> >>>     Goetz.
> >>>

From ioi.lam at oracle.com  Fri Nov  8 18:52:04 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Fri, 8 Nov 2019 10:52:04 -0800
Subject: RFR(T) 8233855 [TESTBUG] appcds/FieldLayoutFlags.java failed to clean
 up files after test
Message-ID: <7d4e295a-2822-bc8d-af78-8164bc01b514@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8233855

In the test library code used by this test, the following close() is
missing. For some reason, the problem shows up only in this test, probably
because this test has opened the zipfile more often than other test cases.

$ hg diff
diff -r ad157fab6bf5 test/hotspot/jtreg/runtime/cds/appcds/TestCommon.java
--- a/test/hotspot/jtreg/runtime/cds/appcds/TestCommon.java    Thu Nov 07 16:26:57 2019 -0800
+++ b/test/hotspot/jtreg/runtime/cds/appcds/TestCommon.java    Fri Nov 08 10:50:08 2019 -0800
@@ -343,6 +343,7 @@
             newFile.renameTo(oldFile);
             System.out.println("firstJar = " + firstJar + " Modified");
         } else {
+            zipFile.close();
             System.out.println("firstJar = " + firstJar);
         }
     }

I ran the test again in tier4 and now it passed.

Thanks
- Ioi

From harold.seigel at oracle.com  Fri Nov  8 19:00:05 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Fri, 8 Nov 2019 14:00:05 -0500
Subject: RFR(T) 8233855 [TESTBUG] appcds/FieldLayoutFlags.java failed to
 clean up files after test
In-Reply-To: <7d4e295a-2822-bc8d-af78-8164bc01b514@oracle.com>
References: <7d4e295a-2822-bc8d-af78-8164bc01b514@oracle.com>
Message-ID: <c5c1d95e-ffc8-da1f-5e61-168aaa4fdf89@oracle.com>

Looks good and trivial.

Thanks, Harold

On 11/8/2019 1:52 PM, Ioi Lam wrote:
> https://bugs.openjdk.java.net/browse/JDK-8233855
>
> In the test library code used by this test, the following close() is 
> missing.
> For some reason, the problem shows up only in this test, probably 
> because this
> test has opened the zipfile more often than other test cases.
>
> $ hg diff
> diff -r ad157fab6bf5 test/hotspot/jtreg/runtime/cds/appcds/TestCommon.java
> --- a/test/hotspot/jtreg/runtime/cds/appcds/TestCommon.java    Thu Nov 07 16:26:57 2019 -0800
> +++ b/test/hotspot/jtreg/runtime/cds/appcds/TestCommon.java    Fri Nov 08 10:50:08 2019 -0800
> @@ -343,6 +343,7 @@
>              newFile.renameTo(oldFile);
>              System.out.println("firstJar = " + firstJar + " Modified");
>          } else {
> +            zipFile.close();
>              System.out.println("firstJar = " + firstJar);
>          }
>      }
>
> I ran the test again in tier4 and now it passed.
>
> Thanks
> - Ioi

From ioi.lam at oracle.com  Fri Nov  8 19:01:29 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Fri, 8 Nov 2019 11:01:29 -0800
Subject: RFR(T) 8233855 [TESTBUG] appcds/FieldLayoutFlags.java failed to
 clean up files after test
In-Reply-To: <c5c1d95e-ffc8-da1f-5e61-168aaa4fdf89@oracle.com>
References: <7d4e295a-2822-bc8d-af78-8164bc01b514@oracle.com>
 <c5c1d95e-ffc8-da1f-5e61-168aaa4fdf89@oracle.com>
Message-ID: <42aa52d5-1d7c-b1f2-1dad-049a0fd17a8b@oracle.com>

Thanks Harold!

- Ioi

On 11/8/19 11:00 AM, Harold Seigel wrote:
> Looks good and trivial.
>
> Thanks, Harold
>
> On 11/8/2019 1:52 PM, Ioi Lam wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8233855
>>
>> In the test library code used by this test, the following close() is 
>> missing.
>> For some reason, the problem shows up only in this test, probably 
>> because this
>> test has opened the zipfile more often than other test cases.
>>
>> $ hg diff
>> diff -r ad157fab6bf5 test/hotspot/jtreg/runtime/cds/appcds/TestCommon.java
>> --- a/test/hotspot/jtreg/runtime/cds/appcds/TestCommon.java    Thu Nov 07 16:26:57 2019 -0800
>> +++ b/test/hotspot/jtreg/runtime/cds/appcds/TestCommon.java    Fri Nov 08 10:50:08 2019 -0800
>> @@ -343,6 +343,7 @@
>>              newFile.renameTo(oldFile);
>>              System.out.println("firstJar = " + firstJar + " Modified");
>>          } else {
>> +            zipFile.close();
>>              System.out.println("firstJar = " + firstJar);
>>          }
>>      }
>>
>> I ran the test again in tier4 and now it passed.
>>
>> Thanks
>> - Ioi


From ioi.lam at oracle.com  Fri Nov  8 21:35:36 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Fri, 8 Nov 2019 13:35:36 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <CALrW1jy5_4jrMRSZPAXV-c8a92Jy8y4eoK+f_t8ErwaZRGMoyw@mail.gmail.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
 <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
 <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>
 <7f0d558c-c161-ce41-90c6-ede8fddcf8b8@oracle.com>
 <99030987-a044-53fb-784b-62408333137a@oracle.com>
 <CALrW1jwTHJ1qg+gTfNeM9MbLdF042bF2ysKf7nGHKFNj9uKe+w@mail.gmail.com>
 <CALrW1jy5_4jrMRSZPAXV-c8a92Jy8y4eoK+f_t8ErwaZRGMoyw@mail.gmail.com>
Message-ID: <52c473ef-5915-9ca0-8ed8-d4c2846965be@oracle.com>

Hi Jiangli,

Thanks for your comments. Please see my replies in-line:

On 11/7/19 6:34 PM, Jiangli Zhou wrote:
> On Thu, Nov 7, 2019 at 6:11 PM Jiangli Zhou <jianglizhou at google.com> wrote:
>> I looked at both the 05.full and 06.delta webrevs. They look good.
>>
>> I still feel a bit uneasy about the potential runtime impact when data
>> does get relocated. Long-running apps/services may shy away from
>> enabling the archive at runtime if there is a detectable overhead, even
>> though it may only occur rarely. As relocation is enabled by default
>> and users cannot turn it off, disabling CDS entirely with -Xshare:off
>> would become the only choice. Could you please create a new RFE
>> (possibly with higher priority) to investigate the potential effect,
>> or provide an option for users to opt in to relocation with a
>> command-line switch?

I created https://bugs.openjdk.java.net/browse/JDK-8233862
Investigate performance benefit of relocating CDS archive to under 32G

As I noted in the bug report, I ran benchmarks with CDS relocation 
on/off, and there's no sign of regression when the CDS archive is 
relocated. Please see the bug report for how to configure the VM to do 
the comparison.

As you said before: "When enabling CDS we [Google] noticed a small 
runtime overhead in JDK 11 recently with a benchmark. After I backported 
JDK-8213713 to 11, it seemed to reduce the runtime overhead that the 
benchmark was experiencing."

Can you confirm whether this is stock JDK 11 or a special Google build? 
Which test case did you use? Is it possible for you to run the tests 
again (using the exact before/after bits that you had when backporting 
JDK-8213713)? Can you check if narrow_klass_base and narrow_klass_shift 
are the same in your before/after builds?

> Forgot to say that when the Java heap can fit into the low 32G space, it takes
> the class space size into account and leaves the needed space right above
> (also in the low 32G space) when reserving the heap, for !UseSharedSpaces. In
> that case, it's more likely the class data and heap data can be
> colocated successfully.

The reason is not for "colocation". It's so that narrow_klass_base can 
be zero, and the klass pointer can be uncompressed with a shift (without 
also doing an addition).

But with CDS enabled, we always hard-code a non-zero 
narrow_klass_base and a 3-bit shift (for AOT). So by just relocating the 
CDS archive to under 32GB, without modifying how CDS handles 
narrow_klass_base/shift, I don't think we can expect any benefit.
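
To make the arithmetic concrete, here is a minimal sketch of the decoding
being discussed. It is not HotSpot code: NarrowKlassEncoding, decode_klass,
encode_klass and encodable_range are hypothetical names made up for this
illustration, and the real logic lives in CompressedKlassPointers. It only
assumes that a narrow klass value is a 32-bit offset that is shifted and
then added to a base.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the VM's narrow-klass settings (illustration only).
struct NarrowKlassEncoding {
  std::uint64_t base;   // plays the role of narrow_klass_base
  unsigned      shift;  // plays the role of narrow_klass_shift
};

// Decode a 32-bit narrow klass value into a full 64-bit address.
// With base == 0 the addition is a no-op, so only the shift remains.
inline std::uint64_t decode_klass(const NarrowKlassEncoding& enc,
                                  std::uint32_t narrow) {
  assert(enc.shift < 32 && "shift is expected to be small (0 or 3 here)");
  return enc.base + (static_cast<std::uint64_t>(narrow) << enc.shift);
}

// Encode a full address. It must lie at or above the base, be aligned to
// (1 << shift), and fit into 32 bits after shifting.
inline std::uint32_t encode_klass(const NarrowKlassEncoding& enc,
                                  std::uint64_t addr) {
  assert(addr >= enc.base);
  const std::uint64_t offset = addr - enc.base;
  assert((offset & ((1ull << enc.shift) - 1)) == 0);
  assert((offset >> enc.shift) <= UINT32_MAX);
  return static_cast<std::uint32_t>(offset >> enc.shift);
}

// Number of bytes reachable from the base for a given shift:
// 4GB for shift 0, 32GB for shift 3.
inline std::uint64_t encodable_range(unsigned shift) {
  return (1ull << 32) << shift;
}
```

With base 0x0000000800000000 and shift 3 (the default values in the
-Xlog:cds=debug output quoted below), decode_klass(enc, 0x10) yields
0x0000000800000080. Moving the archive below 32GB does not remove the
addition unless the base itself becomes zero, which is the point above.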

For modern architectures, I am not aware of any inherent speed benefit 
from simply putting data (in our case much larger than a page) "close to 
each other" in the virtual address space. If you have any reference for 
that, please let me know.

Thanks
- Ioi

>
> Thanks,
> Jiangli
>
>> Regards,
>> Jiangli
>>
>> On Thu, Nov 7, 2019 at 4:22 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>> Hi Coleen,
>>>
>>> Thanks for the review. Here's a webrev that has incorporated your
>>> suggestions:
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v06-delta/
>>>
>>> Please see comments in-line
>>>
>>> On 11/7/19 2:46 PM, coleen.phillimore at oracle.com wrote:
>>>> Hi, I've done a more high level code review of this and it looks good!
>>>>
>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/src/hotspot/share/memory/archiveUtils.hpp.html
>>>>
>>>>
>>>> I think these classes require comments on what they do and why. The
>>>> comments you sent me offline look good.
>>> I added more comments for ArchivePtrMarker::_compacted per your offline
>>> request.
>>>
>>>> Also .hpp files shouldn't include .inline.hpp files, like
>>>> bitMap.inline.hpp.  Hopefully it's just a case of moving do_bit() into
>>>> the cpp file.
>>> I moved the do_bit() function into archiveUtils.inline.hpp, since it is
>>> used by 3 .cpp files, and performance is important.
>>>
>>>> I wonder if the exception list of classes to exclude should be a
>>>> function in javaClasses.hpp/cpp where the explanation would make more
>>>> sense?  ie bool
>>>> JavaClasses::has_injected_native_pointers(InstanceKlass* k);
>>> I moved the checking code to javaClasses.cpp. Since we do (partially)
>>> support java.lang.Class, which has injected native pointers, I named the
>>> function as JavaClasses::is_supported_for_archiving instead. I also
>>> massaged the comments a little for clarification.
>>>
>>>> Is there already an RFE to move the DumpSharedSpaces output from
>>>> tty->print() to log_info() ?
>>> I created https://bugs.openjdk.java.net/browse/JDK-8233826 (Change CDS
>>> dumping tty->print_cr() to unified logging).
>>>
>>> Thanks
>>> - Ioi
>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>> On 11/6/19 4:17 PM, Ioi Lam wrote:
>>>>> Hi Jiangli,
>>>>>
>>>>> I've uploaded the webrev after integrating your comments:
>>>>>
>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/
>>>>>
>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05-delta/
>>>>>
>>>>>
>>>>> Please see more replies below:
>>>>>
>>>>>
>>>>> On 11/4/19 5:52 PM, Jiangli Zhou wrote:
>>>>>> On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam <ioi.lam at oracle.com
>>>>>> <mailto:ioi.lam at oracle.com>> wrote:
>>>>>>
>>>>>>      Hi Jiangli,
>>>>>>
>>>>>>      Thank you so much for spending time reviewing this RFE!
>>>>>>
>>>>>>      On 11/3/19 6:34 PM, Jiangli Zhou wrote:
>>>>>>      > Hi Ioi,
>>>>>>      >
>>>>>>      > Sorry for the delay again. Will try to put this on the top of my
>>>>>>      list
>>>>>>      > next week and reduce the turn-around time. The updates look
>>>>>> good in
>>>>>>      > general.
>>>>>>      >
>>>>>>      > We might want to have a better strategy when choosing metadata
>>>>>>      > relocation address (when relocation is needed). Some
>>>>>>      > applications/benchmarks may be more sensitive to cache
>>>>>> locality and
>>>>>>      > memory/data layout. There was a bug,
>>>>>>      > https://bugs.openjdk.java.net/browse/JDK-8213713 that caused
>>>>>> 1G gap
>>>>>>      > between Java heap data and metadata before JDK 12. The gap
>>>>>> seemed to
>>>>>>      > cause a small but noticeable runtime effect in one case that I
>>>>>> came
>>>>>>      > across.
>>>>>>
>>>>>>      I guess you're saying we should try to relocate the archive into
>>>>>>      somewhere under 32GB?
>>>>>>
>>>>>>
>>>>>> I don't yet have sufficient data that determines if mapping at low
>>>>>> 32G produces better runtime performance. I experimented with that,
>>>>>> but didn't see noticeable difference when comparing to mapping at
>>>>>> the current default address. It doesn't hurt, I think. So it may be
>>>>>> a better choice than relocating to a random address in high 32G
>>>>>> space (when Java heap is in low 32G address space).
>>>>> Maybe we should reconsider this when we have more concrete data for
>>>>> the benefits of moving the compressed class space to under 32G.
>>>>>
>>>>> Please note that in metaspace.cpp, when CDS is disabled and the VM
>>>>> fails to allocate the class space at the requested address
>>>>> (0x7c000000 for 16GB heap), it also just allocates from a random
>>>>> address (without trying to search under 32GB):
>>>>>
>>>>> http://hg.openjdk.java.net/jdk/jdk/annotate/e767fa6a1d45/src/hotspot/share/memory/metaspace.cpp#l1128
>>>>>
>>>>>
>>>>> This code has been there since 2013 and we have not seen any issues.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>      Could you elaborate more about the performance issue, especially
>>>>>>      about
>>>>>>      cache locality? I looked at JDK-8213713 but it didn't mention
>>>>>>      performance.
>>>>>>
>>>>>>
>>>>>> When enabling CDS we noticed a small runtime overhead in JDK 11
>>>>>> recently with a benchmark. After I backported JDK-8213713 to 11, it
>>>>>> seemed to reduce the runtime overhead that the benchmark was
>>>>>> experiencing.
>>>>>>
>>>>>>
>>>>>>      Also, by default, we have non-zero narrow_klass_base and
>>>>>>      narrow_klass_shift = 3, and archive relocation doesn't change that:
>>>>>>
>>>>>>      $ java -Xlog:cds=debug -version
>>>>>>      ... narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
>>>>>>      $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
>>>>>>      ... narrow_klass_base = 0x00007f1e8b499000, narrow_klass_shift = 3
>>>>>>
>>>>>>      We always use narrow_klass_shift due to this:
>>>>>>
>>>>>>         // CDS uses LogKlassAlignmentInBytes for narrow_klass_shift. See
>>>>>>         //
>>>>>> MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
>>>>>>         // how dump time narrow_klass_shift is set. Although, CDS can
>>>>>> work
>>>>>>         // with zero-shift mode also, to be consistent with AOT it uses
>>>>>>         // LogKlassAlignmentInBytes for klass shift so archived java
>>>>>>      heap objects
>>>>>>         // can be used at same time as AOT code.
>>>>>>         if (!UseSharedSpaces
>>>>>>             && (uint64_t)(higher_address - lower_base) <=
>>>>>>      UnscaledClassSpaceMax) {
>>>>>>           CompressedKlassPointers::set_shift(0);
>>>>>>         } else {
>>>>>> CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
>>>>>>         }
>>>>>>
>>>>>>
>>>>>> Right. If we relocate to low 32G space, it needs to make sure that
>>>>>> the range containing the mapped class data and class space must be
>>>>>> encodable.
>>>>>>
>>>>>>
>>>>>>      > Here are some additional comments (minor).
>>>>>>      >
>>>>>>      > Could you please fix the long lines in the following?
>>>>>>      >
>>>>>>      > 1237 void
>>>>>> java_lang_Class::update_archived_primitive_mirror_native_pointers(oop
>>>>>>      > archived_mirror) {
>>>>>>      > 1238   if (MetaspaceShared::relocation_delta() != 0) {
>>>>>>      > 1239  assert(archived_mirror->metadata_field(_klass_offset) ==
>>>>>>      > NULL, "must be for primitive class");
>>>>>>      > 1240
>>>>>>      > 1241     Klass* ak =
>>>>>>      > ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
>>>>>>      > 1242     if (ak != NULL) {
>>>>>>      > 1243  archived_mirror->metadata_field_put(_array_klass_offset,
>>>>>>      > (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
>>>>>>      > 1244     }
>>>>>>      > 1245   }
>>>>>>      > 1246 }
>>>>>>      >
>>>>>>      > src/hotspot/share/memory/dynamicArchive.cpp
>>>>>>      >
>>>>>>      >   889   Thread* THREAD = Thread::current();
>>>>>>      >   890   Method::sort_methods(ik->methods(), /*set_idnums=*/true,
>>>>>>      > dynamic_dump_method_comparator);
>>>>>>      >   891   if (ik->default_methods() != NULL) {
>>>>>>      >   892  Method::sort_methods(ik->default_methods(),
>>>>>>      > /*set_idnums=*/false, dynamic_dump_method_comparator);
>>>>>>      >   893   }
>>>>>>      >
>>>>>>
>>>>>>      OK will do.
>>>>>>
>>>>>>      > Please see inlined comments below.
>>>>>>      >
>>>>>>      > On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com
>>>>>>      <mailto:ioi.lam at oracle.com>> wrote:
>>>>>>      >> Hi Jiangli,
>>>>>>      >>
>>>>>>      >> Thanks for the review. I've updated the patch according to your
>>>>>>      comments:
>>>>>>      >>
>>>>>>      >>
>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
>>>>>>
>>>>>>      >>
>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
>>>>>>
>>>>>>      >>
>>>>>>      >> (the delta is on top of 8231610-relocate-cds-archive.v03.delta
>>>>>>      in my
>>>>>>      >> reply to Calvin's comments).
>>>>>>      >>
>>>>>>      >> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
>>>>>>      >>> Hi Ioi,
>>>>>>      >>>
>>>>>>      >>> Sorry for the delay. Here are my remaining comments.
>>>>>>      >>>
>>>>>>      >>> - src/hotspot/share/memory/dynamicArchive.cpp
>>>>>>      >>>
>>>>>>      >>> 128   static intx _method_comparator_name_delta;
>>>>>>      >>>
>>>>>>      >>> The name of the above variable is confusing. It's the value of
>>>>>>      >>> _buffer_to_target_delta. It's better to use _buffer_to_target_delta
>>>>>>      >>> directly.
>>>>>>      >> _buffer_to_target_delta is a non-static field, but
>>>>>>      >> dynamic_dump_method_comparator() must be a static function so
>>>>>>      it can't
>>>>>>      >> use the non-static field easily.
>>>>>>      >
>>>>>>      > It sounds like an issue. _buffer_to_target_delta was made as a
>>>>>>      > non-static mostly because we might support more than one dynamic
>>>>>>      > archives in the future. However, today's usages bake in an
>>>>>>      assumption
>>>>>>      > that _buffer_to_target_delta is a singleton value. It is
>>>>>> cleaner to
>>>>>>      > either make _buffer_to_target_delta as a static variable for
>>>>>> now, or
>>>>>>      > adding an access API in DynamicArchiveBuilder to allow other
>>>>>> code to
>>>>>>      > properly and correctly use the value.
>>>>>>
>>>>>>      OK, I'll move it to a static variable.
>>>>>>
>>>>>>      >
>>>>>>      >>> Also, we can do a quick pointer comparison of 'a_name' and
>>>>>>      >>> 'b_name' first before adjusting the pointers.
>>>>>>      >> I added this:
>>>>>>      >>
>>>>>>      >>       if (a_name == b_name) {
>>>>>>      >>         return 0;
>>>>>>      >>       }
>>>>>>      >>
>>>>>>      >>> ---
>>>>>>      >>>
>>>>>>      >>> 934 void DynamicArchiveBuilder::relocate_buffer_to_target() {
>>>>>>      >>> ...
>>>>>>      >>>    944
>>>>>>      >>>    945  ArchivePtrMarker::compact(relocatable_base,
>>>>>>      relocatable_end);
>>>>>>      >>> ...
>>>>>>      >>>
>>>>>>      >>>    974     SharedDataRelocator patcher((address*)patch_base,
>>>>>>      >>> (address*)patch_end, valid_old_base, valid_old_end,
>>>>>>      >>>    975  valid_new_base, valid_new_end, addr_delta);
>>>>>>      >>>    976  ArchivePtrMarker::ptrmap()->iterate(&patcher);
>>>>>>      >>>
>>>>>>      >>> Could we reduce the number of data re-iterations to help
>>>>>> archive
>>>>>>      >>> dumping performance. The ArchivePtrMarker::compact operation
>>>>>>      can be
>>>>>>      >>> combined with the patching iteration.
>>>>>>      ArchivePtrMarker::compact API
>>>>>>      >>> can be removed.
>>>>>>      >> That's a good idea. I implemented it using a template parameter
>>>>>>      so that
>>>>>>      >> we can have max performance when relocating the archive at run
>>>>>>      time.
>>>>>>      >>
>>>>>>      >> I added comments to explain why the relocation is done here. The
>>>>>>      >> relocation is pretty rare (only when the base archive was not
>>>>>>      mapped at
>>>>>>      >> the default location).
>>>>>>      >>
>>>>>>      >>> ---
>>>>>>      >>>
>>>>>>      >>>    967     address valid_new_base =
>>>>>>      >>> (address)Arguments::default_SharedBaseAddress();
>>>>>>      >>>    968     address valid_new_end  = valid_new_base +
>>>>>>      base_plus_top_size;
>>>>>>      >>>
>>>>>>      >>> The debugging only code can be included under #ifdef ASSERT.
>>>>>>      >> These values are actually also used in debug logging so they
>>>>>>      can't be
>>>>>>      >> ifdef'ed out.
>>>>>>      >>
>>>>>>      >> Also, the c++ compiler is pretty good at eliding code that's not
>>>>>>      >> actually used. If I comment out all the logging code in
>>>>>>      >> DynamicArchiveBuilder::relocate_buffer_to_target() and
>>>>>>      >> SharedDataRelocator, gcc elides all the unused fields and their
>>>>>>      >> assignments. So no code is generated for this, etc.
>>>>>>      >>
>>>>>>      >>       address valid_new_base =
>>>>>>      >> (address)Arguments::default_SharedBaseAddress();
>>>>>>      >>
>>>>>>      >> Since #ifdef ASSERT makes the code harder to read, I think we
>>>>>>      should use
>>>>>>      >> it only when really necessary.
>>>>>>      > It seems cleaner to get rid of these debugging only variables, by
>>>>>>      > using 'relocatable_base' and
>>>>>>      > '(address)Arguments::default_SharedBaseAddress()' in the logging
>>>>>>      code.
>>>>>>
>>>>>>      SharedDataRelocator is used under 3 different situations. These six
>>>>>>      variables (patch_base, patch_end, valid_old_base, valid_old_end,
>>>>>>      valid_new_base, valid_new_end) describes what is being patched,
>>>>>>      and what
>>>>>>      the expectations are, for each situation. The code will be hard to
>>>>>>      understand without them.
>>>>>>
>>>>>>      Please note there's also logging code in the SharedDataRelocator
>>>>>>      constructor that prints out these values.
>>>>>>
>>>>>>      I think I'll just remove the 'debug only' comment to avoid
>>>>>> confusion.
>>>>>>
>>>>>>
>>>>>> Ok.
>>>>>>
>>>>>>
>>>>>>      >
>>>>>>      >>> ---
>>>>>>      >>>
>>>>>>      >>>    993
>>>>>>   dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
>>>>>>      >>>
>>>>>>      >>> We could combine the archived heap data bitmap into the new
>>>>>>      region as
>>>>>>      >>> well? It can be handled as a separate RFE.
>>>>>>      >> I've filed https://bugs.openjdk.java.net/browse/JDK-8233093
>>>>>>      >>
>>>>>>      >>> - src/hotspot/share/memory/filemap.cpp
>>>>>>      >>>
>>>>>>      >>> 1038     if (is_static()) {
>>>>>>      >>> 1039       if (errno == ENOENT) {
>>>>>>      >>> 1040         // Not locating the shared archive is ok.
>>>>>>      >>> 1041         fail_continue("Specified shared archive not found
>>>>>>      (%s).",
>>>>>>      >>> _full_path);
>>>>>>      >>> 1042       } else {
>>>>>>      >>> 1043         fail_continue("Failed to open shared archive file
>>>>>>      (%s).",
>>>>>>      >>> 1044  os::strerror(errno));
>>>>>>      >>> 1045       }
>>>>>>      >>> 1046     } else {
>>>>>>      >>> 1047       log_warning(cds, dynamic)("specified dynamic archive
>>>>>>      >>> doesn't exist: %s", _full_path);
>>>>>>      >>> 1048     }
>>>>>>      >>>
>>>>>>      >>> If the top layer is explicitly specified by the user, a
>>>>>>      warning does
>>>>>>      >>> not seem to be a proper behavior if the VM fails to open the
>>>>>>      archive
>>>>>>      >>> file.
>>>>>>      >>>
>>>>>>      >>> It might be better to handle the relocation-unrelated code in
>>>>>>      separate
>>>>>>      >>> changeset and track with a separate RFE.
>>>>>>      >> This code was moved from
>>>>>>      >>
>>>>>>      >>
>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
>>>>>>
>>>>>>      >>
>>>>>>      >> so I am not changing the behavior. If you want, we can file an
>>>>>>      RFE to
>>>>>>      >> change the behavior.
>>>>>>      > Ok. A new RFE sounds like the right thing to re-evaluate the usage
>>>>>>      > issue here. Thanks.
>>>>>>
>>>>>>      I created https://bugs.openjdk.java.net/browse/JDK-8233446
>>>>>>
>>>>>>      >>> ---
>>>>>>      >>>
>>>>>>      >>> 1148 void FileMapInfo::write_region(int region, char* base,
>>>>>>      size_t size,
>>>>>>      >>> 1149                                bool read_only, bool
>>>>>>      allow_exec) {
>>>>>>      >>> ...
>>>>>>      >>> 1154
>>>>>>      >>> 1155   if (region == MetaspaceShared::bm) {
>>>>>>      >>> 1156     target_base = NULL;
>>>>>>      >>> 1157   } else if (DynamicDumpSharedSpaces) {
>>>>>>      >>>
>>>>>>      >>> It's not too clear to me how the bitmap (bm) region is handled
>>>>>>      for the
>>>>>>      >>> base layer and top layer. Could you please explain?
>>>>>>      >> The bm region for both layers are mapped at an address picked
>>>>>>      by the OS:
>>>>>>      >>
>>>>>>      >> char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
>>>>>>      >>     FileMapRegion* si = space_at(MetaspaceShared::bm);
>>>>>>      >>     bitmap_size = si->used_aligned();
>>>>>>      >>     bool read_only = true, allow_exec = false;
>>>>>>      >>     char* requested_addr = NULL; // allow OS to pick any
>>>>>> location
>>>>>>      >>     char* bitmap_base = os::map_memory(_fd, _full_path,
>>>>>>      si->file_offset(),
>>>>>>      >> requested_addr, bitmap_size,
>>>>>>      >> read_only, allow_exec);
>>>>>>      >>
>>>>>>      > Ok, after staring at the code for a few seconds I saw that's
>>>>>>      intended.
>>>>>>      > If the current region is 'bm', then the 'target_base' is NULL
>>>>>>      > regardless if it's static or dynamic archive. Otherwise, the
>>>>>>      > 'target_base' is handled differently for the static and dynamic
>>>>>>      case.
>>>>>>      > The following would be cleaner and has better reliability.
>>>>>>      >
>>>>>>      >     char* target_base = NULL;
>>>>>>      >
>>>>>>      >     // The target_base is NULL for 'bm' region.
>>>>>>      >     if (region != MetaspaceShared::bm) {
>>>>>>      >       if (DynamicDumpSharedSpaces) {
>>>>>>      >         assert(!HeapShared::is_heap_region(region), "dynamic
>>>>>> archive
>>>>>>      > doesn't support heap regions");
>>>>>>      >         target_base = DynamicArchive::buffer_to_target(base);
>>>>>>      >       } else {
>>>>>>      >         target_base = base;
>>>>>>      >       }
>>>>>>      >    }
>>>>>>
>>>>>>      How about this?
>>>>>>
>>>>>>         char* target_base;
>>>>>>         if (region == MetaspaceShared::bm) {
>>>>>>           target_base = NULL; // always NULL for bm region.
>>>>>>         } else {
>>>>>>           if (DynamicDumpSharedSpaces) {
>>>>>>               assert(!HeapShared::is_heap_region(region), "dynamic
>>>>>> archive
>>>>>>      doesn't support heap regions");
>>>>>>               target_base = DynamicArchive::buffer_to_target(base);
>>>>>>           } else {
>>>>>>               target_base = base;
>>>>>>           }
>>>>>>         }
>>>>>>
>>>>>>
>>>>>> No objection If you prefer the extra 'else' block.
>>>>>>
>>>>>>
>>>>>>      >
>>>>>>      >>> ---
>>>>>>      >>>
>>>>>>      >>> 1362
>>>>>>   DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>>>>>>
>>>>>>      >>>
>>>>>>      >>> Could you please explain the above?
>>>>>>      >> I added the comments
>>>>>>      >>
>>>>>>      >>     // Make sure we don't attempt to use
>>>>>>      header()->mapped_base_address()
>>>>>>      >> unless
>>>>>>      >>     // it's been successfully mapped.
>>>>>>      >>
>>>>>> DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>>>>>>
>>>>>>      >>
>>>>>>      >>> ---
>>>>>>      >>>
>>>>>>      >>> 1359   FileMapRegion* last_region = NULL;
>>>>>>      >>>
>>>>>>      >>> 1371     if (last_region != NULL) {
>>>>>>      >>> 1372       // Ensure that the OS won't be able to allocate new
>>>>>>      memory
>>>>>>      >>> spaces between any mapped
>>>>>>      >>> 1373       // regions, or else it would mess up the simple
>>>>>>      comparision
>>>>>>      >>> in MetaspaceObj::is_shared().
>>>>>>      >>> 1374       assert(si->mapped_base() ==
>>>>>> last_region->mapped_end(),
>>>>>>      >>> "must have no gaps");
>>>>>>      >>>
>>>>>>      >>> 1379     last_region = si;
>>>>>>      >>>
>>>>>>      >>> Can you please place 'last_region' related code under #ifdef
>>>>>>      ASSERT?
>>>>>>      >> I think that will make the code more cluttered. The compiler
>>>>>>      >> will optimize that away.
>>>>>>      > It's cleaner to define debugging-only variables only in debug
>>>>>>      > builds. You can wrap it and related usage with DEBUG_ONLY.
>>>>>>
>>>>>>      OK, will do.
>>>>>>
>>>>>>      >
>>>>>>      >>> ---
>>>>>>      >>>
>>>>>>      >>> 1478 char* FileMapInfo::map_relocation_bitmap(size_t&
>>>>>>      bitmap_size) {
>>>>>>      >>> 1479   FileMapRegion* si = space_at(MetaspaceShared::bm);
>>>>>>      >>> 1480   bitmap_size = si->used_aligned();
>>>>>>      >>> 1481   bool read_only = true, allow_exec = false;
>>>>>>      >>> 1482   char* requested_addr = NULL; // allow OS to pick any
>>>>>>      location
>>>>>>      >>> 1483   char* bitmap_base = os::map_memory(_fd, _full_path,
>>>>>>      si->file_offset(),
>>>>>>      >>> 1484 requested_addr, bitmap_size,
>>>>>>      >>> read_only, allow_exec);
>>>>>>      >>>
>>>>>>      >>> We need to handle mapping failure here.
>>>>>>      >> It's handled here:
>>>>>>      >>
>>>>>>      >> bool FileMapInfo::relocate_pointers(intx addr_delta) {
>>>>>>      >>     log_debug(cds, reloc)("runtime archive relocation start");
>>>>>>      >>     size_t bitmap_size;
>>>>>>      >>     char* bitmap_base = map_relocation_bitmap(bitmap_size);
>>>>>>      >>     if (bitmap_base != NULL) {
>>>>>>      >>     ...
>>>>>>      >>     } else {
>>>>>>      >>       log_error(cds)("failed to map relocation bitmap");
>>>>>>      >>       return false;
>>>>>>      >>     }
>>>>>>      >>
>>>>>>      > 'bitmap_base' is used immediately after map_memory(). So the
>>>>>> check
>>>>>>      > needs to be done immediately after map_memory(), but not in the
>>>>>>      caller
>>>>>>      > of map_relocation_bitmap().
>>>>>>      >
>>>>>>      > 1490   char* bitmap_base = os::map_memory(_fd, _full_path,
>>>>>>      si->file_offset(),
>>>>>>      > 1491 requested_addr, bitmap_size,
>>>>>>      > read_only, allow_exec);
>>>>>>      > 1492
>>>>>>      > 1493   if (VerifySharedSpaces && bitmap_base != NULL &&
>>>>>>      > !region_crc_check(bitmap_base, bitmap_size, si->crc())) {
>>>>>>
>>>>>>      OK, I'll fix that.
>>>>>>
>>>>>>      >
>>>>>>      >
>>>>>>      >>> ---
>>>>>>      >>>
>>>>>>      >>> 1513     // debug only -- the current value of the pointers
>>>>>> to be
>>>>>>      >>> patched must be within this
>>>>>>      >>> 1514     // range (i.e., must be between the requesed base
>>>>>>      address,
>>>>>>      >>> and the of the current archive).
>>>>>>      >>> 1515     // Note: top archive may point to objects in the base
>>>>>>      >>> archive, but not the other way around.
>>>>>>      >>> 1516     address valid_old_base =
>>>>>>      (address)header()->requested_base_address();
>>>>>>      >>> 1517     address valid_old_end  = valid_old_base +
>>>>>>      mapping_end_offset();
>>>>>>      >>>
>>>>>>      >>> Please place all FileMapInfo::relocate_pointers debugging only
>>>>>>      code
>>>>>>      >>> under #ifdef ASSERT.
>>>>>>      >> Ditto about ifdef ASSERT
>>>>>>      >>
>>>>>>      >>> - src/hotspot/share/memory/heapShared.cpp
>>>>>>      >>>
>>>>>>      >>>    441 void
>>>>>>      HeapShared::initialize_from_archived_subgraph(Klass* k) {
>>>>>>      >>>    442   if (!open_archive_heap_region_mapped() ||
>>>>>>      !MetaspaceObj::is_shared(k)) {
>>>>>>      >>>    443     return; // nothing to do
>>>>>>      >>>    444   }
>>>>>>      >>>
>>>>>>      >>> When do we call HeapShared::initialize_from_archived_subgraph
>>>>>>      for a
>>>>>>      >>> klass that's not shared?
>>>>>>      >> I've removed the !MetaspaceObj::is_shared(k). I probably added
>>>>>>      that for
>>>>>>      >> debugging purposes only.
>>>>>>      >>
>>>>>>      >>>    616   DEBUG_ONLY({
>>>>>>      >>>    617       Klass* klass = orig_obj->klass();
>>>>>>      >>>    618       assert(klass !=
>>>>>> SystemDictionary::Module_klass() &&
>>>>>>      >>>    619              klass !=
>>>>>>      SystemDictionary::ResolvedMethodName_klass() &&
>>>>>>      >>>    620              klass !=
>>>>>>      SystemDictionary::MemberName_klass() &&
>>>>>>      >>>    621              klass !=
>>>>>> SystemDictionary::Context_klass() &&
>>>>>>      >>>    622              klass !=
>>>>>>      SystemDictionary::ClassLoader_klass(), "we
>>>>>>      >>> can only relocate metaspace object pointers inside
>>>>>> java_lang_Class
>>>>>>      >>> instances");
>>>>>>      >>>    623     });
>>>>>>      >>>
>>>>>>      >>> Let's leave the above for a separate RFE. I think assert is not
>>>>>>      >>> sufficient for the check. Also, why ResolvedMethodName,
>>>>>> Module and
>>>>>>      >>> MemberName cannot be part of the graph?
>>>>>>      >>>
>>>>>>      >>>
>>>>>>      >> I added the following comment:
>>>>>>      >>
>>>>>>      >>     DEBUG_ONLY({
>>>>>>      >>         // The following are classes in
>>>>>>      share/classfile/javaClasses.cpp
>>>>>>      >> that have injected native pointers
>>>>>>      >>         // to metaspace objects. To support these classes, we
>>>>>>      need to add
>>>>>>      >> relocation code similar to
>>>>>>      >>         //
>>>>>> java_lang_Class::update_archived_mirror_native_pointers.
>>>>>>      >>         Klass* klass = orig_obj->klass();
>>>>>>      >>         assert(klass != SystemDictionary::Module_klass() &&
>>>>>>      >>                klass !=
>>>>>>      SystemDictionary::ResolvedMethodName_klass() &&
>>>>>>      >>
>>>>>>      > It's too restrictive to exclude those objects from the archived
>>>>>>      > object graph because of metadata relocation, since metadata
>>>>>>      > relocation is rare. The trade-off doesn't seem to buy us much.
>>>>>>      >
>>>>>>      > Do you plan to add the needed relocation code?
>>>>>>
>>>>>>      I looked more into this. Actually we cannot handle these 5
>>>>>> classes at
>>>>>>      all, even without archive relocation:
>>>>>>
>>>>>>      [1] #define MODULE_INJECTED_FIELDS(macro) \
>>>>>>         macro(java_lang_Module, module_entry, intptr_signature, false)
>>>>>>
>>>>>>      ->  module_entry is malloc'ed
>>>>>>
>>>>>>      [2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
>>>>>>         macro(java_lang_invoke_ResolvedMethodName, vmholder,
>>>>>>      object_signature, false) \
>>>>>>         macro(java_lang_invoke_ResolvedMethodName, vmtarget,
>>>>>>      intptr_signature, false)
>>>>>>
>>>>>>      -> these fields are related to method handles and lambda forms,
>>>>>>      etc. They can't easily be archived without implementing lambda form
>>>>>>      archiving. (I did a prototype; it's very complex and fragile).
>>>>>>
>>>>>>      [3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
>>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
>>>>>>      vmdependencies, intptr_signature, false) \
>>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
>>>>>>      last_cleanup, long_signature, false)
>>>>>>
>>>>>>      -> vmdependencies is malloc'ed.
>>>>>>
>>>>>>      [4] #define
>>>>>> MEMBERNAME_INJECTED_FIELDS(macro) \
>>>>>>         macro(java_lang_invoke_MemberName, vmindex, intptr_signature,
>>>>>>      false)
>>>>>>
>>>>>>      -> this one is probably OK. Despite being declared as
>>>>>>      'intptr_signature', it seems to be used just as an integer.
>>>>>> However,
>>>>>>      MemberNames are typically used with [2] and [3]. So let's just
>>>>>>      forbid it
>>>>>>      to be safe.
>>>>>>
>>>>>>      [2] [3] [4] are not used directly by regular Java code and are
>>>>>>      unlikely
>>>>>>      to be referenced (directly or indirectly) by static fields (except
>>>>>>      for
>>>>>>      the static fields in the classes in java.lang.invoke, which we
>>>>>>      probably
>>>>>>      won't support for heap archiving due to the problem I described for
>>>>>>      [2]). Objects of these types are typically referenced via constant
>>>>>>      pool
>>>>>>      entries.
>>>>>>
>>>>>>      [5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
>>>>>>         macro(java_lang_ClassLoader, loader_data, intptr_signature,
>>>>>> false)
>>>>>>
>>>>>>      -> loader_data is malloc'ed.
>>>>>>
>>>>>>      So, I will change the DEBUG_ONLY into a product-mode check, and
>>>>>> quit
>>>>>>      dumping if these objects are found in the object subgraph.
>>>>>>
>>>>>>
>>>>>> Sounds good. Can you please also add a comment with explanation.
>>>>>>
>>>>>> For ClassLoader and Module, it is worth considering caching the
>>>>>> additional native data some time in the future. Lois had suggested
>>>>>> the Module part a while ago.
>>>>> I think we can do that if/when we archive Modules directly into the
>>>>> shared heap.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>      Maybe we should backport the check to older versions as well?
>>>>>>
>>>>>>
>>>>>> We should discuss with Andrew Haley for backports to JDK 11 update
>>>>>> releases. Since the current OpenJDK 11 only applies Java heap
>>>>>> archiving to a restricted set of JDK library code, I think it is
>>>>>> safe without the new check.
>>>>>>
>>>>>> For non-LTS releases, it might not be worthwhile as they may not be
>>>>>> widely used?
>>>>> I agree. FYI, we (Oracle) have no plan for backporting more types of
>>>>> heap object archiving, so the decision would be up to whoever
>>>>> decides to do so.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Jiangli
>>>>>>
>>>>>>
>>>>>>      >
>>>>>>      >>> - src/hotspot/share/memory/metaspace.cpp
>>>>>>      >>>
>>>>>>      >>> 1036   metaspace_rs =
>>>>>> ReservedSpace(compressed_class_space_size(),
>>>>>>      >>> 1037   _reserve_alignment,
>>>>>>      >>> 1038   large_pages,
>>>>>>      >>> 1039   requested_addr);
>>>>>>      >>>
>>>>>>      >>> Please fix indentation.
>>>>>>      >> Fixed.
>>>>>>      >>
>>>>>>      >>> - src/hotspot/share/memory/metaspaceClosure.hpp
>>>>>>      >>>
>>>>>>      >>>     78   enum SpecialRef {
>>>>>>      >>>     79     _method_entry_ref
>>>>>>      >>>     80   };
>>>>>>      >>>
>>>>>>      >>> Are there other pointers that are not references to
>>>>>>      MetaspaceObj? If
>>>>>>      >>> _method_entry_ref is the only type, it's probably not worth
>>>>>>      defining
>>>>>>      >>> SpecialRef?
>>>>>>      >> There may be more types in the future, so I want to have a
>>>>>>      stable API
>>>>>>      >> that can be easily expanded without touching all the code that
>>>>>>      uses it.
>>>>>>      >>
>>>>>>      >>
>>>>>>      >>> - src/hotspot/share/memory/metaspaceShared.hpp
>>>>>>      >>>
>>>>>>      >>>     42 enum MapArchiveResult {
>>>>>>      >>>     43   MAP_ARCHIVE_SUCCESS,
>>>>>>      >>>     44   MAP_ARCHIVE_MMAP_FAILURE,
>>>>>>      >>>     45   MAP_ARCHIVE_OTHER_FAILURE
>>>>>>      >>>     46 };
>>>>>>      >>>
>>>>>>      >>> If we want to define different failure types, it's probably
>>>>>> worth
>>>>>>      >>> using separate types for relocation failure and validation
>>>>>>      failure.
>>>>>>      >> For now, I just need to distinguish between MMAP_FAILURE (where
>>>>>>      I should
>>>>>>      >> attempt to remap at an alternative address) and OTHER_FAILURE
>>>>>>      (where the
>>>>>>      >> CDS archive loading will fail -- due to validation error,
>>>>>>      insufficient
>>>>>>      >> memory, etc -- without attempting to remap.)
>>>>>>      >>
>>>>>>      >>> ---
>>>>>>      >>>
>>>>>>      >>>    193   static intx _mapping_delta; // FIXME rename
>>>>>>      >>>
>>>>>>      >>> How about _relocation_delta?
>>>>>>      >> Changed as suggested.
>>>>>>      >>
>>>>>>      >>> - src/hotspot/share/oops/instanceKlass
>>>>>>      >>>
>>>>>>      >>> 1573 bool InstanceKlass::_disable_method_binary_search = false;
>>>>>>      >>>
>>>>>>      >>> The use of _disable_method_binary_search is not necessary. You
>>>>>>      can use
>>>>>>      >>> DynamicDumpSharedSpaces for the purpose. That would make things
>>>>>>      >>> cleaner.
>>>>>>      >> If we always disable the binary search when
>>>>>>      DynamicDumpSharedSpaces is
>>>>>>      >> true, it will slow down normal execution of the Java program
>>>>>> when
>>>>>>      >> -XX:ArchiveClassesAtExit has been specified, but the program
>>>>>>      hasn't exited.
>>>>>>      > Could you please add some comments to
>>>>>> _disable_method_binary_search
>>>>>>      > with the above explanation? Thanks.
>>>>>>
>>>>>>      OK
>>>>>>      >
>>>>>>      >>> - test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
>>>>>>      >>>
>>>>>>      >>>     76                     if (name.equals("s0") ||
>>>>>>      name.equals("s1")) {
>>>>>>      >>>     77                       // String regions are listed at
>>>>>>      the end and
>>>>>>      >>> they may not be fully occupied.
>>>>>>      >>>     78                       break;
>>>>>>      >>>     79                     } else if (name.equals("bm")) {
>>>>>>      >>>     80                       // Bitmap space does not have a
>>>>>>      requested address.
>>>>>>      >>>     81                       break;
>>>>>>      >>>
>>>>>>      >>> It's not part of your change, but could you please fix line 76
>>>>>>      - 78
>>>>>>      >>> since it is trivial. It seems the lines can be removed.
>>>>>>      >> Removed.
>>>>>>      >>
>>>>>>      >>> - /src/hotspot/share/memory/archiveUtils.hpp
>>>>>>      >>> The file name does not match with the macro '#ifndef
>>>>>>      >>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please rename
>>>>>>      >>> archiveUtils.* ? archiveRelocator.hpp and
>>>>>> archiveRelocator.cpp are
>>>>>>      >>> more descriptive.
>>>>>>      >> I named the file archiveUtils.hpp so we can move other misc
>>>>>>      >> stuff used by dumping into this file (e.g., DumpRegion,
>>>>>>      >> WriteClosure from metaspaceShared.hpp), since these are not used
>>>>>>      >> by the majority of the files that use metaspaceShared.hpp.
>>>>>>      >>
>>>>>>      >> I fixed the ifdef.
>>>>>>      >>
>>>>>>      >>> - src/hotspot/share/memory/archiveUtils.cpp
>>>>>>      >>>
>>>>>>      >>>     36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap,
>>>>>>      address*
>>>>>>      >>> ptr_base, address* ptr_end) {
>>>>>>      >>>     37   assert(_ptrmap == NULL, "initialize only once");
>>>>>>      >>>     38   _ptr_base = ptr_base;
>>>>>>      >>>     39   _ptr_end = ptr_end;
>>>>>>      >>>     40   _compacted = false;
>>>>>>      >>>     41   _ptrmap = ptrmap;
>>>>>>      >>>     42   _ptrmap->initialize(12 * M / sizeof(intptr_t)); //
>>>>>>      default
>>>>>>      >>> archive is about 12MB.
>>>>>>      >>>     43 }
>>>>>>      >>>
>>>>>>      >>> Could we do a better estimate here? We could guesstimate the
>>>>>>      >>> size based on the current used class space and metaspace size.
>>>>>>      >>> It's okay if a larger bitmap is used, since it can be reduced
>>>>>>      >>> after all marking is done.
>>>>>>      >> The bitmap is automatically expanded when necessary in
>>>>>>      >> ArchivePtrMarker::mark_pointer(). It's only about 1/32 or 1/64
>>>>>>      of the
>>>>>>      >> total archive size, so even if we do expand, the cost will be
>>>>>>      trivial.
>>>>>>      > The initial value is based on the default CDS archive. When
>>>>>> dealing
>>>>>>      > with a really large archive, it would have to re-grow many times.
>>>>>>      > Also, using a hard-coded value is less desirable.
>>>>>>
>>>>>>      OK, I changed it to the following
>>>>>>
>>>>>>         // Use this as initial guesstimate. We should need less space
>>>>>>      in the
>>>>>>         // archive, but if we're wrong the bitmap will be expanded
>>>>>>      automatically.
>>>>>>         size_t estimated_archive_size =
>>>>>> MetaspaceGC::capacity_until_GC();
>>>>>>         // But set it smaller in debug builds so we always test the
>>>>>>      expansion
>>>>>>      code.
>>>>>>         // (Default archive is about 12MB).
>>>>>>         DEBUG_ONLY(estimated_archive_size = 6 * M);
>>>>>>
>>>>>>         // We need one bit per pointer in the archive.
>>>>>>         _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
>>>>>>
>>>>>>
>>>>>>      Thanks!
>>>>>>      - Ioi
>>>>>>
>>>>>>      >
>>>>>>      >>>
>>>>>>      >>>
>>>>>>      >>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou
>>>>>>      <jianglizhou at google.com> wrote:
>>>>>>      >>>> Hi Ioi,
>>>>>>      >>>>
>>>>>>      >>>> This is another great step for CDS usability improvement.
>>>>>>      Thank you!
>>>>>>      >>>>
>>>>>>      >>>> I have a high level question (or request): could we consider
>>>>>>      >>>> separating the relocation work for 'direct' class metadata
>>>>>>      from other
>>>>>>      >>>> types of metadata (such as the shared system dictionary,
>>>>>>      symbol table,
>>>>>>      >>>> etc)? Initially we only relocate the tables and other
>>>>>>      archived global
>>>>>>      >>>> data. When each archived class is being loaded, we can
>>>>>>      relocate all
>>>>>>      >>>> the pointers within the current class. We could find the
>>>>>>      segment (for
>>>>>>      >>>> the current class) in the bitmap and update the pointers
>>>>>>      within the
>>>>>>      >>>> segment. That way we can reduce initial startup costs and
>>>>>>      also avoid
>>>>>>      >>>> relocating class data that's not used at runtime. In some
>>>>>>      real world
>>>>>>      >>>> large systems, an archive may contain extremely large
>>>>>> number of
>>>>>>      >>>> classes.
>>>>>>      >>>>
>>>>>>      >>>> Following are partial review comments so we can move things
>>>>>>      forward.
>>>>>>      >>>> Still going through the rest of the changes.
>>>>>>      >>>>
>>>>>>      >>>> - src/hotspot/share/classfile/javaClasses.cpp
>>>>>>      >>>>
>>>>>>      >>>> 1218 void
>>>>>> java_lang_Class::update_archived_mirror_native_pointers(oop
>>>>>>      >>>> archived_mirror) {
>>>>>>      >>>> 1219   Klass* k =
>>>>>> ((Klass*)archived_mirror->metadata_field(_klass_offset));
>>>>>>      >>>> 1220   if (k != NULL) { // k is NULL for the primitive
>>>>>>      classes such as
>>>>>>      >>>> java.lang.Byte::TYPE <<<<<<<<<<<
>>>>>>      >>>> 1221  archived_mirror->metadata_field_put(_klass_offset,
>>>>>>      >>>> (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
>>>>>>      >>>> 1222   }
>>>>>>      >>>> 1223 ...
>>>>>>      >>>>
>>>>>>      >>>> Primitive type mirrors are handled separately. Could you
>>>>>>      please verify
>>>>>>      >>>> if this call path happens for primitive type mirror?
>>>>>>      >>>>
>>>>>>      >>>> To answer my question above, looks like you added the
>>>>>>      >>>> following, which is to be used for primitive type mirrors. That
>>>>>>      >>>> seems to be the reason why
>>>>>>      >>>> update_archived_mirror_native_pointers is trying to also cover
>>>>>>      >>>> primitive types. It would be better to have a separate API for
>>>>>>      >>>> primitive type mirrors, which is cleaner. We can then also
>>>>>>      >>>> replace the check at line 1220 with an assert for regular mirrors.
>>>>>>      >>>>
>>>>>>      >>>> +void ReadClosure::do_mirror_oop(oop *p) {
>>>>>>      >>>> +  do_oop(p);
>>>>>>      >>>> +  oop mirror = *p;
>>>>>>      >>>> +  if (mirror != NULL) {
>>>>>>      >>>> +
>>>>>> java_lang_Class::update_archived_mirror_native_pointers(mirror);
>>>>>>      >>>> +  }
>>>>>>      >>>> +}
>>>>>>      >>>> +
>>>>>>      >>>>
>>>>>>      >>>> How about renaming update_archived_mirror_native_pointers to
>>>>>>      >>>> update_archived_mirror_klass_pointers.
>>>>>>      >>>>
>>>>>>      >>>> It would be good to pass the current klass as an argument.
>>>>>> We can
>>>>>>      >>>> verify the relocated pointer matches with the current klass
>>>>>>      pointer.
>>>>>>      >>>>
>>>>>>      >>>> We should also check if relocation is necessary before
>>>>>>      spending cycles
>>>>>>      >>>> to obtain the klass pointer from the mirror.
>>>>>>      >>>>
>>>>>>      >>>> 1252  update_archived_mirror_native_pointers(m);
>>>>>>      >>>> 1253
>>>>>>      >>>> 1254   // mirror is archived, restore
>>>>>>      >>>> 1255  assert(HeapShared::is_archived_object(m), "must be
>>>>>> archived
>>>>>>      >>>> mirror object");
>>>>>>      >>>> 1256   Handle mirror(THREAD, m);
>>>>>>      >>>>
>>>>>>      >>>> Could we move the line at 1252 after the assert at line 1255?
>>>>>>      >>>>
>>>>>>      >>>> - src/hotspot/share/include/cds.h
>>>>>>      >>>>
>>>>>>      >>>>     47   int     _mapped_from_file;  // Is this region mapped
>>>>>>      from a file?
>>>>>>      >>>>     48                               // If false, this
>>>>>> region was
>>>>>>      >>>> initialized using os::read().
>>>>>>      >>>>
>>>>>>      >>>> Is the new field truly needed? It seems we could use
>>>>>>      _mapped_base to
>>>>>>      >>>> determine if a region is mapped or not?
>>>>>>      >>>>
>>>>>>      >>>> - src/hotspot/share/memory/dynamicArchive.cpp
>>>>>>      >>>>
>>>>>>      >>>> Could you please remove the debugging print code in
>>>>>>      >>>> dynamic_dump_method_comparator? Or convert those to logging
>>>>>>      output if
>>>>>>      >>>> they are helpful.
>>>>>>      >>>>
>>>>>>      >>>> Will send out the rest of the review comments later.
>>>>>>      >>>>
>>>>>>      >>>> Best,
>>>>>>      >>>>
>>>>>>      >>>> Jiangli
>>>>>>      >>>>
>>>>>>      >>>>
>>>>>>      >>>>
>>>>>>      >>>>
>>>>>>      >>>> On Thu, Oct 10, 2019 at 6:00 PM Ioi Lam
>>>>>>      <ioi.lam at oracle.com> wrote:
>>>>>>      >>>>> Bug:
>>>>>>      >>>>> https://bugs.openjdk.java.net/browse/JDK-8231610
>>>>>>      >>>>>
>>>>>>      >>>>> Webrev:
>>>>>>      >>>>>
>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v01/
>>>>>>
>>>>>>      >>>>>
>>>>>>      >>>>> Design:
>>>>>>      >>>>>
>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/design/8231610-relocate-cds-archive.txt
>>>>>>
>>>>>>      >>>>>
>>>>>>      >>>>>
>>>>>>      >>>>> Overview:
>>>>>>      >>>>>
>>>>>>      >>>>> The CDS archive is mmaped to a fixed address range
>>>>>> (starting at
>>>>>>      >>>>> SharedBaseAddress, usually 0x800000000). Previously, if this
>>>>>>      >>>>> requested address range is not available (usually due to
>>>>>> Address
>>>>>>      >>>>> Space Layout Randomization (ASLR) [2]), the JVM will give
>>>>>> up and
>>>>>>      >>>>> will load classes dynamically using class files.
>>>>>>      >>>>>
>>>>>>      >>>>> [a] This causes slow down in JVM start-up.
>>>>>>      >>>>> [b] Handling of mapping failures causes unnecessary
>>>>>>      complication in
>>>>>>      >>>>>        the CDS tests.
>>>>>>      >>>>>
>>>>>>      >>>>> Here are some preliminary benchmarking results (using
>>>>>>      default CDS archive,
>>>>>>      >>>>> running helloworld):
>>>>>>      >>>>>
>>>>>>      >>>>> (a) 47.1ms (CDS enabled, mapped at requested addr)
>>>>>>      >>>>> (b) 53.8ms (CDS enabled, mapped at alternate addr)
>>>>>>      >>>>> (c) 86.2ms (CDS disabled)
>>>>>>      >>>>>
>>>>>>      >>>>> The small degradation in (b) is caused by the relocation of
>>>>>>      >>>>> absolute pointers embedded in the CDS archive. However, it is
>>>>>>      >>>>> still a big improvement over case (c)
>>>>>>      >>>>>
>>>>>>      >>>>> Please see the design doc (link above) for details.
>>>>>>      >>>>>
>>>>>>      >>>>> Thanks
>>>>>>      >>>>> - Ioi
>>>>>>      >>>>>
>>>>>>
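
The relocation step discussed in the thread above (patching absolute
pointers embedded in the archive, guided by a bitmap with one bit per
pointer-sized slot) can be sketched roughly as follows. This is a minimal
illustration, not the HotSpot implementation: relocate_marked_slots and
ptrmap_bits are hypothetical names, the archive is modelled as a plain
vector of words, and skipping NULL slots is an assumption made here for
the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of bits needed when the bitmap holds one bit
// per pointer-sized slot of the archive. On a 64-bit VM this makes the
// bitmap 1/64 of the archive size (1/32 on a 32-bit VM).
inline std::size_t ptrmap_bits(std::size_t archive_bytes) {
  return archive_bytes / sizeof(std::intptr_t);
}

// Hypothetical helper: add `delta` to every slot whose bit is set in
// `ptrmap`. Bit i of `ptrmap` says whether slots[i] holds an absolute
// pointer that was written assuming the requested base address.
inline void relocate_marked_slots(std::vector<std::intptr_t>& slots,
                                  const std::vector<bool>& ptrmap,
                                  std::intptr_t delta) {
  const std::size_t n =
      slots.size() < ptrmap.size() ? slots.size() : ptrmap.size();
  for (std::size_t i = 0; i < n; ++i) {
    // Assumption for this sketch: NULL pointers are left untouched.
    if (ptrmap[i] && slots[i] != 0) {
      slots[i] += delta;
    }
  }
}
```

Here delta plays the role of the difference between the address where the
archive was actually mapped and the address it was dumped for; slots whose
bit is clear hold non-pointer data and are never modified.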


From coleen.phillimore at oracle.com  Fri Nov  8 23:20:36 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Fri, 8 Nov 2019 18:20:36 -0500
Subject: RFR (XS) 8232735: Convert PrintJNIResolving to Unified Logging
Message-ID: <52dd271d-07ec-5e1c-9b9b-6966935f4b9f@oracle.com>

Summary: converted the existing output at debug level because it is noisy

Tested with tier1 on all Oracle platforms, with os's linux, bsd, solaris 
and windows.

open webrev at http://cr.openjdk.java.net/~coleenp/2019/8232735.01/webrev
bug link https://bugs.openjdk.java.net/browse/JDK-8232735

Thanks,
Coleen

From ioi.lam at oracle.com  Sat Nov  9 02:23:43 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Fri, 8 Nov 2019 18:23:43 -0800
Subject: RFR (XS) 8232735: Convert PrintJNIResolving to Unified Logging
In-Reply-To: <52dd271d-07ec-5e1c-9b9b-6966935f4b9f@oracle.com>
References: <52dd271d-07ec-5e1c-9b9b-6966935f4b9f@oracle.com>
Message-ID: <def1f5b2-f66b-7574-8f2f-0edcaa96b4aa@oracle.com>

Hi Coleen,

Looks good to me.

Thanks
- Ioi

On 11/8/19 3:20 PM, coleen.phillimore at oracle.com wrote:
> Summary: converted the existing output at debug level because it is noisy
>
> Tested with tier1 on all Oracle platforms, with os's linux, bsd, 
> solaris and windows.
>
> open webrev at http://cr.openjdk.java.net/~coleenp/2019/8232735.01/webrev
> bug link https://bugs.openjdk.java.net/browse/JDK-8232735
>
> Thanks,
> Coleen


From jianglizhou at google.com  Sun Nov 10 04:25:22 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Sat, 9 Nov 2019 20:25:22 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <52c473ef-5915-9ca0-8ed8-d4c2846965be@oracle.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
 <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
 <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>
 <7f0d558c-c161-ce41-90c6-ede8fddcf8b8@oracle.com>
 <99030987-a044-53fb-784b-62408333137a@oracle.com>
 <CALrW1jwTHJ1qg+gTfNeM9MbLdF042bF2ysKf7nGHKFNj9uKe+w@mail.gmail.com>
 <CALrW1jy5_4jrMRSZPAXV-c8a92Jy8y4eoK+f_t8ErwaZRGMoyw@mail.gmail.com>
 <52c473ef-5915-9ca0-8ed8-d4c2846965be@oracle.com>
Message-ID: <CALrW1jzk+1XAqw2w55Y=ouyb-ZDB8tu5uWKNiXN9uA5Ku2XaCg@mail.gmail.com>

Hi Ioi,

On Fri, Nov 8, 2019 at 1:35 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>
> Hi Jiangli,
>
> Thanks for your comments. Please see my replies in-line:
>
> On 11/7/19 6:34 PM, Jiangli Zhou wrote:
> > On Thu, Nov 7, 2019 at 6:11 PM Jiangli Zhou <jianglizhou at google.com> wrote:
> >> I looked at both the 05.full and 06.delta webrevs. They look good.
> >>
> >> I still feel a bit uneasy about the potential runtime impact when data
> >> does get relocated. Long-running apps/services may shy away from
> >> enabling the archive at runtime if there is a detectable overhead, even
> >> though it may only occur rarely. As relocation is enabled by default
> >> and users cannot turn it off, disabling CDS entirely with -Xshare:off
> >> would become the only choice. Could you please create a new RFE
> >> (possibly with higher priority) to investigate the potential effect,
> >> or provide a command-line switch so that users can opt in to
> >> relocation?
>
> I created https://bugs.openjdk.java.net/browse/JDK-8233862
> Investigate performance benefit of relocating CDS archive to under 32G
>
> As I noted in the bug report, I ran benchmarks with CDS relocation
> on/off, and there's no sign of regression when the CDS archive is
> relocated. Please see the bug report for how to configure the VM to do
> the comparison.
>
> As you said before: "When enabling CDS we [google] noticed a small
> runtime overhead in JDK 11 recently with a benchmark. After I backported
> JDK-8213713 to 11, it seemed to reduce the runtime overhead that the
> benchmark was experiencing":
>
> Can you confirm whether this is stock JDK 11 or a special google build?
> Which test case did you use? Is it possible for you to run the tests
> again (using the exact before/after bits that you had when backporting
> JDK-8213713)? Can you check if narrow_klass_base and narrow_klass_shift
> are the same in your before/after builds?

Thanks for creating the RFE.

JDK-8213713 closes the 1G gap between the shared space and the class
space; everything else is unaffected. The compressed class base and
shift were the same before and after applying JDK-8213713. The effect
on the benchmark was observed statistically: the difference was very
small and could fall within the noise level of a single-run comparison.
Even a small difference can matter for some use cases, so it needs to
be taken into consideration when designing and implementing new
changes.

A new command-line option for archived metadata relocation may still be
valuable. It would also be helpful for debugging and diagnosis.
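
For anyone weighing the overhead question, here is a minimal sketch of
bitmap-driven pointer relocation. It is not the code in the webrev: the
function name, the vector-based layout, and the choice to skip null
slots are all assumptions made for illustration. The point is only that
the work is proportional to the number of slots marked in the pointer
bitmap, and that a zero delta costs nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: 'slots' stands for the mapped archive viewed as
// pointer-sized words, and ptrmap[i] is true when slots[i] holds a pointer
// into the archive. These are illustration-only names, not HotSpot types.
inline std::size_t relocate_marked_pointers(std::vector<std::uintptr_t>& slots,
                                            const std::vector<bool>& ptrmap,
                                            std::intptr_t delta) {
  assert(ptrmap.size() == slots.size());  // one bit per slot
  if (delta == 0) {
    return 0;  // archive mapped at its requested address: nothing to patch
  }
  std::size_t patched = 0;
  for (std::size_t i = 0; i < slots.size(); i++) {
    // Assumption: null pointers are left alone; only marked, non-null
    // slots are shifted by the relocation delta.
    if (ptrmap[i] && slots[i] != 0) {
      // Unsigned wrap-around makes a negative delta behave as subtraction.
      slots[i] += static_cast<std::uintptr_t>(delta);
      patched++;
    }
  }
  return patched;
}
```

The returned count is just a convenience for logging or for comparing
runs with and without relocation.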

>
> > Forgot to say that when Java heap can fit into low 32G space, it takes
> > the class space size into account and leaves the needed space right above
> > (also in low 32G space) when reserving heap, for !UseSharedSpaces. In
> > that case, it's more likely the class data and heap data can be
> > colocated successfully.
>
> The reason is not for "colocation". It's so that narrow_klass_base can
> be zero, and the klass pointer can be uncompressed with a shift (without
> also doing an addition).
>
> But with CDS enabled, we always hard code to use non-zero
> narrow_klass_base and 3 bit shift (for AOT). So by just relocating the
> CDS archive to under 32GB, without modifying how CDS handles
> narrow_klass_base/shift, I don't think we can expect any benefit.

I experimented with mapping the shared space in the low 32G, placed
right above the Java heap. In the experiment the class space was also
allocated in the low 32G, after the mapped shared space. The
compressed class encoding used a 0 base and a shift of 3, the same as
the encoding when CDS was disabled. I didn't observe a runtime
performance difference when comparing that configuration with the
normal CDS mapping scheme (the shared space starts at 32G and the
encoding uses a non-zero base and a shift of 3).
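
To make the two encodings concrete, here is a minimal sketch of
narrow klass encode/decode. KlassEncoding, decode_klass, encode_klass
and encodable_range are hypothetical names for illustration, not the
CompressedKlassPointers API. With a zero base the decode is a shift
only; with a non-zero base it is a shift plus an addition, which is the
only difference between the two configurations compared above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the VM's narrow klass encoding parameters.
struct KlassEncoding {
  std::uint64_t base;   // plays the role of narrow_klass_base
  int           shift;  // plays the role of narrow_klass_shift
};

// Decode a 32-bit narrow klass value into a full address.
// Zero base: shift only. Non-zero base: shift plus one addition.
inline std::uint64_t decode_klass(const KlassEncoding& enc, std::uint32_t narrow) {
  return enc.base + (static_cast<std::uint64_t>(narrow) << enc.shift);
}

// Encode a full address. The address must lie at or above the base, be
// aligned to (1 << shift) bytes, and fit in 32 bits after shifting.
inline std::uint32_t encode_klass(const KlassEncoding& enc, std::uint64_t addr) {
  assert(addr >= enc.base);
  const std::uint64_t offset = addr - enc.base;
  assert((offset & ((std::uint64_t(1) << enc.shift) - 1)) == 0);
  assert((offset >> enc.shift) <= UINT32_MAX);
  return static_cast<std::uint32_t>(offset >> enc.shift);
}

// Bytes addressable above the base: 4G with shift 0, 32G with shift 3.
inline std::uint64_t encodable_range(const KlassEncoding& enc) {
  return (std::uint64_t(1) << 32) << enc.shift;
}
```

For example, with base 0 and shift 3 the narrow value 0x10 decodes to
0x80, and with base 0x800000000 (32G) it decodes to 0x800000080.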

Thanks,
Jiangli
>
> For modern architectures, I am not aware of any inherent speed benefit
> simply by putting data (in our case much larger than a page) "close to
> each other" in the virtual address space. If you have any reference for
> that, please let me know.
>
> Thanks
> - Ioi
>
> >
> > Thanks,
> > Jiangli
> >
> >> Regards,
> >> Jiangli
> >>
> >> On Thu, Nov 7, 2019 at 4:22 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >>> Hi Coleen,
> >>>
> >>> Thanks for the review. Here's a webrev that has incorporated your
> >>> suggestions:
> >>>
> >>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v06-delta/
> >>>
> >>> Please see comments in-line
> >>>
> >>> On 11/7/19 2:46 PM, coleen.phillimore at oracle.com wrote:
> >>>> Hi, I've done a more high level code review of this and it looks good!
> >>>>
> >>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/src/hotspot/share/memory/archiveUtils.hpp.html
> >>>>
> >>>>
> >>>> I think these classes require comments on what they do and why. The
> >>>> comments you sent me offline look good.
> >>> I added more comments for ArchivePtrMarker::_compacted per your offline
> >>> request.
> >>>
> >>>> Also .hpp files shouldn't include .inline.hpp files, like
> >>>> bitMap.inline.hpp.  Hopefully it's just a case of moving do_bit() into
> >>>> the cpp file.
> >>> I moved the do_bit() function into archiveUtils.inline.hpp, since it is
> >>> used by 3 .cpp files, and performance is important.
> >>>
> >>>> I wonder if the exception list of classes to exclude should be a
> >>>> function in javaClasses.hpp/cpp where the explanation would make more
> >>>> sense?  ie bool
> >>>> JavaClasses::has_injected_native_pointers(InstanceKlass* k);
> >>> I moved the checking code to javaClasses.cpp. Since we do (partially)
> >>> support java.lang.Class, which has injected native pointers, I named the
> >>> function as JavaClasses::is_supported_for_archiving instead. I also
> >>> massaged the comments a little for clarification.
> >>>
> >>>> Is there already an RFE to move the DumpSharedSpaces output from
> >>>> tty->print() to log_info() ?
> >>> I created https://bugs.openjdk.java.net/browse/JDK-8233826 (Change CDS
> >>> dumping tty->print_cr() to unified logging).
> >>>
> >>> Thanks
> >>> - Ioi
> >>>
> >>>> Thanks,
> >>>> Coleen
> >>>>
> >>>> On 11/6/19 4:17 PM, Ioi Lam wrote:
> >>>>> Hi Jiangli,
> >>>>>
> >>>>> I've uploaded the webrev after integrating your comments:
> >>>>>
> >>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/
> >>>>>
> >>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05-delta/
> >>>>>
> >>>>>
> >>>>> Please see more replies below:
> >>>>>
> >>>>>
> >>>>> On 11/4/19 5:52 PM, Jiangli Zhou wrote:
> >>>>>> On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam <ioi.lam at oracle.com
> >>>>>> <mailto:ioi.lam at oracle.com>> wrote:
> >>>>>>
> >>>>>>      Hi Jiangli,
> >>>>>>
> >>>>>>      Thank you so much for spending time reviewing this RFE!
> >>>>>>
> >>>>>>      On 11/3/19 6:34 PM, Jiangli Zhou wrote:
> >>>>>>      > Hi Ioi,
> >>>>>>      >
> >>>>>>      > Sorry for the delay again. Will try to put this on the top of my
> >>>>>>      list
> >>>>>>      > next week and reduce the turn-around time. The updates look
> >>>>>> good in
> >>>>>>      > general.
> >>>>>>      >
> >>>>>>      > We might want to have a better strategy when choosing metadata
> >>>>>>      > relocation address (when relocation is needed). Some
> >>>>>>      > applications/benchmarks may be more sensitive to cache
> >>>>>> locality and
> >>>>>>      > memory/data layout. There was a bug,
> >>>>>>      > https://bugs.openjdk.java.net/browse/JDK-8213713 that caused a
> >>>>>>      > 1G gap
> >>>>>>      > between Java heap data and metadata before JDK 12. The gap
> >>>>>> seemed to
> >>>>>>      > cause a small but noticeable runtime effect in one case that I
> >>>>>> came
> >>>>>>      > across.
> >>>>>>
> >>>>>>      I guess you're saying we should try to relocate the archive into
> >>>>>>      somewhere under 32GB?
> >>>>>>
> >>>>>>
> >>>>>> I don't yet have sufficient data to determine whether mapping at low
> >>>>>> 32G produces better runtime performance. I experimented with that,
> >>>>>> but didn't see a noticeable difference when comparing to mapping at
> >>>>>> the current default address. It doesn't hurt, I think. So it may be
> >>>>>> a better choice than relocating to a random address in high 32G
> >>>>>> space (when Java heap is in low 32G address space).
> >>>>> Maybe we should reconsider this when we have more concrete data for
> >>>>> the benefits of moving the compressed class space to under 32G.
> >>>>>
> >>>>> Please note that in metaspace.cpp, when CDS is disabled and the VM
> >>>>> fails to allocate the class space at the requested address
> >>>>> (0x7c000000 for 16GB heap), it also just allocates from a random
> >>>>> address (without trying to search under 32GB):
> >>>>>
> >>>>> http://hg.openjdk.java.net/jdk/jdk/annotate/e767fa6a1d45/src/hotspot/share/memory/metaspace.cpp#l1128
> >>>>>
> >>>>>
> >>>>> This code has been there since 2013 and we have not seen any issues.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>      Could you elaborate more about the performance issue, especially
> >>>>>>      about
> >>>>>>      cache locality? I looked at JDK-8213713 but it didn't mention
> >>>>>>      performance.
> >>>>>>
> >>>>>>
> >>>>>> When enabling CDS we noticed a small runtime overhead in JDK 11
> >>>>>> recently with a benchmark. After I backported JDK-8213713 to 11, it
> >>>>>> seemed to reduce the runtime overhead that the benchmark was
> >>>>>> experiencing.
> >>>>>>
> >>>>>>
> >>>>>>      Also, by default, we have non-zero narrow_klass_base and
> >>>>>>      narrow_klass_shift = 3, and archive relocation doesn't change that:
> >>>>>>
> >>>>>>      $ java -Xlog:cds=debug -version
> >>>>>>      ... narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
> >>>>>>      $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
> >>>>>>      ... narrow_klass_base = 0x00007f1e8b499000, narrow_klass_shift = 3
> >>>>>>
> >>>>>>      We always use narrow_klass_shift = 3 due to this:
> >>>>>>
> >>>>>>         // CDS uses LogKlassAlignmentInBytes for narrow_klass_shift. See
> >>>>>>         //
> >>>>>> MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
> >>>>>>         // how dump time narrow_klass_shift is set. Although, CDS can
> >>>>>> work
> >>>>>>         // with zero-shift mode also, to be consistent with AOT it uses
> >>>>>>         // LogKlassAlignmentInBytes for klass shift so archived java
> >>>>>>      heap objects
> >>>>>>         // can be used at same time as AOT code.
> >>>>>>         if (!UseSharedSpaces
> >>>>>>             && (uint64_t)(higher_address - lower_base) <=
> >>>>>>      UnscaledClassSpaceMax) {
> >>>>>>           CompressedKlassPointers::set_shift(0);
> >>>>>>         } else {
> >>>>>> CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
> >>>>>>         }
> >>>>>>
> >>>>>>
> >>>>>> Right. If we relocate to low 32G space, we need to make sure that
> >>>>>> the range containing the mapped class data and class space is
> >>>>>> encodable.
> >>>>>>
> >>>>>>
> >>>>>>      > Here are some additional comments (minor).
> >>>>>>      >
> >>>>>>      > Could you please fix the long lines in the following?
> >>>>>>      >
> >>>>>>      > 1237 void
> >>>>>> java_lang_Class::update_archived_primitive_mirror_native_pointers(oop
> >>>>>>      > archived_mirror) {
> >>>>>>      > 1238   if (MetaspaceShared::relocation_delta() != 0) {
> >>>>>>      > 1239  assert(archived_mirror->metadata_field(_klass_offset) ==
> >>>>>>      > NULL, "must be for primitive class");
> >>>>>>      > 1240
> >>>>>>      > 1241     Klass* ak =
> >>>>>>      > ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
> >>>>>>      > 1242     if (ak != NULL) {
> >>>>>>      > 1243  archived_mirror->metadata_field_put(_array_klass_offset,
> >>>>>>      > (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
> >>>>>>      > 1244     }
> >>>>>>      > 1245   }
> >>>>>>      > 1246 }
> >>>>>>      >
> >>>>>>      > src/hotspot/share/memory/dynamicArchive.cpp
> >>>>>>      >
> >>>>>>      >   889   Thread* THREAD = Thread::current();
> >>>>>>      >   890   Method::sort_methods(ik->methods(), /*set_idnums=*/true,
> >>>>>>      > dynamic_dump_method_comparator);
> >>>>>>      >   891   if (ik->default_methods() != NULL) {
> >>>>>>      >   892  Method::sort_methods(ik->default_methods(),
> >>>>>>      > /*set_idnums=*/false, dynamic_dump_method_comparator);
> >>>>>>      >   893   }
> >>>>>>      >
> >>>>>>
> >>>>>>      OK will do.
> >>>>>>
> >>>>>>      > Please see inlined comments below.
> >>>>>>      >
> >>>>>>      > On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com
> >>>>>>      <mailto:ioi.lam at oracle.com>> wrote:
> >>>>>>      >> Hi Jiangli,
> >>>>>>      >>
> >>>>>>      >> Thanks for the review. I've updated the patch according to your
> >>>>>>      comments:
> >>>>>>      >>
> >>>>>>      >>
> >>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
> >>>>>>
> >>>>>>      >>
> >>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
> >>>>>>
> >>>>>>      >>
> >>>>>>      >> (the delta is on top of 8231610-relocate-cds-archive.v03.delta
> >>>>>>      in my
> >>>>>>      >> reply to Calvin's comments).
> >>>>>>      >>
> >>>>>>      >> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
> >>>>>>      >>> Hi Ioi,
> >>>>>>      >>>
> >>>>>>      >>> Sorry for the delay. Here are my remaining comments.
> >>>>>>      >>>
> >>>>>>      >>> - src/hotspot/share/memory/dynamicArchive.cpp
> >>>>>>      >>>
> >>>>>>      >>> 128   static intx _method_comparator_name_delta;
> >>>>>>      >>>
> >>>>>>      >>> The name of the above variable is confusing. It's the value of
> >>>>>>      >>> _buffer_to_target_delta. It's better to use _buffer_to_target_delta
> >>>>>>      >>> directly.
> >>>>>>      >> _buffer_to_target_delta is a non-static field, but
> >>>>>>      >> dynamic_dump_method_comparator() must be a static function so
> >>>>>>      it can't
> >>>>>>      >> use the non-static field easily.
> >>>>>>      >
> >>>>>>      > It sounds like an issue. _buffer_to_target_delta was made
> >>>>>>      > non-static mostly because we might support more than one dynamic
> >>>>>>      > archive in the future. However, today's usages bake in an
> >>>>>>      > assumption that _buffer_to_target_delta is a singleton value. It
> >>>>>>      > is cleaner to either make _buffer_to_target_delta a static
> >>>>>>      > variable for now, or add an access API in DynamicArchiveBuilder
> >>>>>>      > to allow other code to properly and correctly use the value.
> >>>>>>
> >>>>>>      OK, I'll move it to a static variable.
> >>>>>>
> >>>>>>      >
> >>>>>>      >>> Also, we can do a quick pointer comparison of 'a_name' and
> >>>>>>      >>> 'b_name' first before adjusting the pointers.
> >>>>>>      >> I added this:
> >>>>>>      >>
> >>>>>>      >>       if (a_name == b_name) {
> >>>>>>      >>         return 0;
> >>>>>>      >>       }
> >>>>>>      >>
> >>>>>>      >>> ---
> >>>>>>      >>>
> >>>>>>      >>> 934 void DynamicArchiveBuilder::relocate_buffer_to_target() {
> >>>>>>      >>> ...
> >>>>>>      >>>    944
> >>>>>>      >>>    945  ArchivePtrMarker::compact(relocatable_base,
> >>>>>>      relocatable_end);
> >>>>>>      >>> ...
> >>>>>>      >>>
> >>>>>>      >>>    974     SharedDataRelocator patcher((address*)patch_base,
> >>>>>>      >>> (address*)patch_end, valid_old_base, valid_old_end,
> >>>>>>      >>>    975  valid_new_base, valid_new_end, addr_delta);
> >>>>>>      >>>    976  ArchivePtrMarker::ptrmap()->iterate(&patcher);
> >>>>>>      >>>
> >>>>>>      >>> Could we reduce the number of data re-iterations to help
> >>>>>> archive
> >>>>>>      >>> dumping performance. The ArchivePtrMarker::compact operation
> >>>>>>      can be
> >>>>>>      >>> combined with the patching iteration.
> >>>>>>      ArchivePtrMarker::compact API
> >>>>>>      >>> can be removed.
> >>>>>>      >> That's a good idea. I implemented it using a template parameter
> >>>>>>      so that
> >>>>>>      >> we can have max performance when relocating the archive at run
> >>>>>>      time.
> >>>>>>      >>
> >>>>>>      >> I added comments to explain why the relocation is done here. The
> >>>>>>      >> relocation is pretty rare (only when the base archive was not
> >>>>>>      mapped at
> >>>>>>      >> the default location).
> >>>>>>      >>
> >>>>>>      >>> ---
> >>>>>>      >>>
> >>>>>>      >>>    967     address valid_new_base =
> >>>>>>      >>> (address)Arguments::default_SharedBaseAddress();
> >>>>>>      >>>    968     address valid_new_end  = valid_new_base +
> >>>>>>      base_plus_top_size;
> >>>>>>      >>>
> >>>>>>      >>> The debugging only code can be included under #ifdef ASSERT.
> >>>>>>      >> These values are actually also used in debug logging so they
> >>>>>>      can't be
> >>>>>>      >> ifdef'ed out.
> >>>>>>      >>
> >>>>>>      >> Also, the C++ compiler is pretty good at eliding code that's not
> >>>>>>      >> actually used. If I comment out all the logging code in
> >>>>>>      >> DynamicArchiveBuilder::relocate_buffer_to_target() and
> >>>>>>      >> SharedDataRelocator, gcc elides all the unused fields and their
> >>>>>>      >> assignments. So no code is generated for this, etc.
> >>>>>>      >>
> >>>>>>      >>       address valid_new_base =
> >>>>>>      >> (address)Arguments::default_SharedBaseAddress();
> >>>>>>      >>
> >>>>>>      >> Since #ifdef ASSERT makes the code harder to read, I think we
> >>>>>>      should use
> >>>>>>      >> it only when really necessary.
> >>>>>>      > It seems cleaner to get rid of these debugging only variables, by
> >>>>>>      > using 'relocatable_base' and
> >>>>>>      > '(address)Arguments::default_SharedBaseAddress()' in the logging
> >>>>>>      code.
> >>>>>>
> >>>>>>      SharedDataRelocator is used under 3 different situations. These six
> >>>>>>      variables (patch_base, patch_end, valid_old_base, valid_old_end,
> >>>>>>      valid_new_base, valid_new_end) describe what is being patched,
> >>>>>>      and what
> >>>>>>      the expectations are, for each situation. The code will be hard to
> >>>>>>      understand without them.
> >>>>>>
> >>>>>>      Please note there's also logging code in the SharedDataRelocator
> >>>>>>      constructor that prints out these values.
> >>>>>>
> >>>>>>      I think I'll just remove the 'debug only' comment to avoid
> >>>>>> confusion.
> >>>>>>
> >>>>>>
> >>>>>> Ok.
> >>>>>>
> >>>>>>
> >>>>>>      >
> >>>>>>      >>> ---
> >>>>>>      >>>
> >>>>>>      >>>    993
> >>>>>>   dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
> >>>>>>      >>>
> >>>>>>      >>> We could combine the archived heap data bitmap into the new
> >>>>>>      region as
> >>>>>>      >>> well? It can be handled as a separate RFE.
> >>>>>>      >> I've filed https://bugs.openjdk.java.net/browse/JDK-8233093
> >>>>>>      >>
> >>>>>>      >>> - src/hotspot/share/memory/filemap.cpp
> >>>>>>      >>>
> >>>>>>      >>> 1038     if (is_static()) {
> >>>>>>      >>> 1039       if (errno == ENOENT) {
> >>>>>>      >>> 1040         // Not locating the shared archive is ok.
> >>>>>>      >>> 1041         fail_continue("Specified shared archive not found
> >>>>>>      (%s).",
> >>>>>>      >>> _full_path);
> >>>>>>      >>> 1042       } else {
> >>>>>>      >>> 1043         fail_continue("Failed to open shared archive file
> >>>>>>      (%s).",
> >>>>>>      >>> 1044  os::strerror(errno));
> >>>>>>      >>> 1045       }
> >>>>>>      >>> 1046     } else {
> >>>>>>      >>> 1047       log_warning(cds, dynamic)("specified dynamic archive
> >>>>>>      >>> doesn't exist: %s", _full_path);
> >>>>>>      >>> 1048     }
> >>>>>>      >>>
> >>>>>>      >>> If the top layer is explicitly specified by the user, a
> >>>>>>      warning does
> >>>>>>      >>> not seem to be a proper behavior if the VM fails to open the
> >>>>>>      archive
> >>>>>>      >>> file.
> >>>>>>      >>>
> >>>>>>      >>> It might be better to handle the relocation-unrelated code in a
> >>>>>>      >>> separate changeset and track it with a separate RFE.
> >>>>>>      >> This code was moved from
> >>>>>>      >>
> >>>>>>      >>
> >>>>>> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
> >>>>>>
> >>>>>>      >>
> >>>>>>      >> so I am not changing the behavior. If you want, we can file an
> >>>>>>      >> RFE to change the behavior.
> >>>>>>      > Ok. A new RFE sounds like the right thing to re-evaluate the
> >>>>>>      > usage issue here. Thanks.
> >>>>>>
> >>>>>>      I created https://bugs.openjdk.java.net/browse/JDK-8233446
> >>>>>>
> >>>>>>      >>> ---
> >>>>>>      >>>
> >>>>>>      >>> 1148 void FileMapInfo::write_region(int region, char* base,
> >>>>>>      size_t size,
> >>>>>>      >>> 1149                                bool read_only, bool
> >>>>>>      allow_exec) {
> >>>>>>      >>> ...
> >>>>>>      >>> 1154
> >>>>>>      >>> 1155   if (region == MetaspaceShared::bm) {
> >>>>>>      >>> 1156     target_base = NULL;
> >>>>>>      >>> 1157   } else if (DynamicDumpSharedSpaces) {
> >>>>>>      >>>
> >>>>>>      >>> It's not too clear to me how the bitmap (bm) region is handled
> >>>>>>      for the
> >>>>>>      >>> base layer and top layer. Could you please explain?
> >>>>>>      >> The bm region for each layer is mapped at an address picked
> >>>>>>      >> by the OS:
> >>>>>>      >>
> >>>>>>      >> char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
> >>>>>>      >>     FileMapRegion* si = space_at(MetaspaceShared::bm);
> >>>>>>      >>     bitmap_size = si->used_aligned();
> >>>>>>      >>     bool read_only = true, allow_exec = false;
> >>>>>>      >>     char* requested_addr = NULL; // allow OS to pick any
> >>>>>> location
> >>>>>>      >>     char* bitmap_base = os::map_memory(_fd, _full_path,
> >>>>>>      si->file_offset(),
> >>>>>>      >> requested_addr, bitmap_size,
> >>>>>>      >> read_only, allow_exec);
> >>>>>>      >>
> >>>>>>      > Ok, after staring at the code for a few seconds I saw that's
> >>>>>>      intended.
> >>>>>>      > If the current region is 'bm', then the 'target_base' is NULL
> >>>>>>      > regardless if it's static or dynamic archive. Otherwise, the
> >>>>>>      > 'target_base' is handled differently for the static and dynamic
> >>>>>>      case.
> >>>>>>      > The following would be cleaner and more reliable.
> >>>>>>      >
> >>>>>>      >     char* target_base = NULL;
> >>>>>>      >
> >>>>>>      >     // The target_base is NULL for 'bm' region.
> >>>>>>      >     if (region != MetaspaceShared::bm) {
> >>>>>>      >       if (DynamicDumpSharedSpaces) {
> >>>>>>      >         assert(!HeapShared::is_heap_region(region), "dynamic
> >>>>>> archive
> >>>>>>      > doesn't support heap regions");
> >>>>>>      >         target_base = DynamicArchive::buffer_to_target(base);
> >>>>>>      >       } else {
> >>>>>>      >         target_base = base;
> >>>>>>      >       }
> >>>>>>      >    }
> >>>>>>
> >>>>>>      How about this?
> >>>>>>
> >>>>>>         char* target_base;
> >>>>>>         if (region == MetaspaceShared::bm) {
> >>>>>>           target_base = NULL; // always NULL for bm region.
> >>>>>>         } else {
> >>>>>>           if (DynamicDumpSharedSpaces) {
> >>>>>>               assert(!HeapShared::is_heap_region(region), "dynamic
> >>>>>> archive
> >>>>>>      doesn't support heap regions");
> >>>>>>               target_base = DynamicArchive::buffer_to_target(base);
> >>>>>>           } else {
> >>>>>>               target_base = base;
> >>>>>>           }
> >>>>>>         }
> >>>>>>
> >>>>>>
> >>>>>> No objection if you prefer the extra 'else' block.
> >>>>>>
> >>>>>>
> >>>>>>      >
> >>>>>>      >>> ---
> >>>>>>      >>>
> >>>>>>      >>> 1362
> >>>>>>   DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
> >>>>>>
> >>>>>>      >>>
> >>>>>>      >>> Could you please explain the above?
> >>>>>>      >> I added the comments
> >>>>>>      >>
> >>>>>>      >>     // Make sure we don't attempt to use
> >>>>>>      header()->mapped_base_address()
> >>>>>>      >> unless
> >>>>>>      >>     // it's been successfully mapped.
> >>>>>>      >>
> >>>>>> DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
> >>>>>>
> >>>>>>      >>
> >>>>>>      >>> ---
> >>>>>>      >>>
> >>>>>>      >>> 1359   FileMapRegion* last_region = NULL;
> >>>>>>      >>>
> >>>>>>      >>> 1371     if (last_region != NULL) {
> >>>>>>      >>> 1372       // Ensure that the OS won't be able to allocate new
> >>>>>>      memory
> >>>>>>      >>> spaces between any mapped
> >>>>>>      >>> 1373       // regions, or else it would mess up the simple
> >>>>>>      comparision
> >>>>>>      >>> in MetaspaceObj::is_shared().
> >>>>>>      >>> 1374       assert(si->mapped_base() ==
> >>>>>> last_region->mapped_end(),
> >>>>>>      >>> "must have no gaps");
> >>>>>>      >>>
> >>>>>>      >>> 1379     last_region = si;
> >>>>>>      >>>
> >>>>>>      >>> Can you please place 'last_region' related code under #ifdef
> >>>>>>      ASSERT?
> >>>>>>      >> I think that will make the code more cluttered. The compiler
> >>>>>>      >> will optimize that away.
> >>>>>>      > It's cleaner to define debugging-only variables only in debug
> >>>>>>      > builds. You can wrap it and its related usage with DEBUG_ONLY.
> >>>>>>
> >>>>>>      OK, will do.
> >>>>>>
> >>>>>>      >
> >>>>>>      >>> ---
> >>>>>>      >>>
> >>>>>>      >>> 1478 char* FileMapInfo::map_relocation_bitmap(size_t&
> >>>>>>      bitmap_size) {
> >>>>>>      >>> 1479   FileMapRegion* si = space_at(MetaspaceShared::bm);
> >>>>>>      >>> 1480   bitmap_size = si->used_aligned();
> >>>>>>      >>> 1481   bool read_only = true, allow_exec = false;
> >>>>>>      >>> 1482   char* requested_addr = NULL; // allow OS to pick any
> >>>>>>      location
> >>>>>>      >>> 1483   char* bitmap_base = os::map_memory(_fd, _full_path,
> >>>>>>      si->file_offset(),
> >>>>>>      >>> 1484 requested_addr, bitmap_size,
> >>>>>>      >>> read_only, allow_exec);
> >>>>>>      >>>
> >>>>>>      >>> We need to handle mapping failure here.
> >>>>>>      >> It's handled here:
> >>>>>>      >>
> >>>>>>      >> bool FileMapInfo::relocate_pointers(intx addr_delta) {
> >>>>>>      >>     log_debug(cds, reloc)("runtime archive relocation start");
> >>>>>>      >>     size_t bitmap_size;
> >>>>>>      >>     char* bitmap_base = map_relocation_bitmap(bitmap_size);
> >>>>>>      >>     if (bitmap_base != NULL) {
> >>>>>>      >>     ...
> >>>>>>      >>     } else {
> >>>>>>      >>       log_error(cds)("failed to map relocation bitmap");
> >>>>>>      >>       return false;
> >>>>>>      >>     }
> >>>>>>      >>
> >>>>>>      > 'bitmap_base' is used immediately after map_memory(). So the
> >>>>>>      > check needs to be done immediately after map_memory(), not in
> >>>>>>      > the caller of map_relocation_bitmap().
> >>>>>>      >
> >>>>>>      > 1490   char* bitmap_base = os::map_memory(_fd, _full_path,
> >>>>>>      si->file_offset(),
> >>>>>>      > 1491 requested_addr, bitmap_size,
> >>>>>>      > read_only, allow_exec);
> >>>>>>      > 1492
> >>>>>>      > 1493   if (VerifySharedSpaces && bitmap_base != NULL &&
> >>>>>>      > !region_crc_check(bitmap_base, bitmap_size, si->crc())) {
> >>>>>>
> >>>>>>      OK, I'll fix that.
> >>>>>>
> >>>>>>      >
> >>>>>>      >
> >>>>>>      >>> ---
> >>>>>>      >>>
> >>>>>>      >>> 1513     // debug only -- the current value of the pointers
> >>>>>> to be
> >>>>>>      >>> patched must be within this
> >>>>>>      >>> 1514     // range (i.e., must be between the requesed base
> >>>>>>      address,
> >>>>>>      >>> and the of the current archive).
> >>>>>>      >>> 1515     // Note: top archive may point to objects in the base
> >>>>>>      >>> archive, but not the other way around.
> >>>>>>      >>> 1516     address valid_old_base =
> >>>>>>      (address)header()->requested_base_address();
> >>>>>>      >>> 1517     address valid_old_end  = valid_old_base +
> >>>>>>      mapping_end_offset();
> >>>>>>      >>>
> >>>>>>      >>> Please place all FileMapInfo::relocate_pointers debugging only
> >>>>>>      code
> >>>>>>      >>> under #ifdef ASSERT.
> >>>>>>      >> Ditto about ifdef ASSERT
> >>>>>>      >>
> >>>>>>      >>> - src/hotspot/share/memory/heapShared.cpp
> >>>>>>      >>>
> >>>>>>      >>>    441 void
> >>>>>>      HeapShared::initialize_from_archived_subgraph(Klass* k) {
> >>>>>>      >>>    442   if (!open_archive_heap_region_mapped() ||
> >>>>>>      !MetaspaceObj::is_shared(k)) {
> >>>>>>      >>>    443     return; // nothing to do
> >>>>>>      >>>    444   }
> >>>>>>      >>>
> >>>>>>      >>> When do we call HeapShared::initialize_from_archived_subgraph
> >>>>>>      for a
> >>>>>>      >>> klass that's not shared?
> >>>>>>      >> I've removed the !MetaspaceObj::is_shared(k). I probably added
> >>>>>>      that for
> >>>>>>      >> debugging purposes only.
> >>>>>>      >>
> >>>>>>      >>>    616   DEBUG_ONLY({
> >>>>>>      >>>    617       Klass* klass = orig_obj->klass();
> >>>>>>      >>>    618       assert(klass !=
> >>>>>> SystemDictionary::Module_klass() &&
> >>>>>>      >>>    619              klass !=
> >>>>>>      SystemDictionary::ResolvedMethodName_klass() &&
> >>>>>>      >>>    620              klass !=
> >>>>>>      SystemDictionary::MemberName_klass() &&
> >>>>>>      >>>    621              klass !=
> >>>>>> SystemDictionary::Context_klass() &&
> >>>>>>      >>>    622              klass !=
> >>>>>>      SystemDictionary::ClassLoader_klass(), "we
> >>>>>>      >>> can only relocate metaspace object pointers inside
> >>>>>> java_lang_Class
> >>>>>>      >>> instances");
> >>>>>>      >>>    623     });
> >>>>>>      >>>
> >>>>>>      >>> Let's leave the above for a separate RFE. I think assert is not
> >>>>>>      >>> sufficient for the check. Also, why ResolvedMethodName,
> >>>>>> Module and
> >>>>>>      >>> MemberName cannot be part of the graph?
> >>>>>>      >>>
> >>>>>>      >>>
> >>>>>>      >> I added the following comment:
> >>>>>>      >>
> >>>>>>      >>     DEBUG_ONLY({
> >>>>>>      >>         // The following are classes in
> >>>>>>      share/classfile/javaClasses.cpp
> >>>>>>      >> that have injected native pointers
> >>>>>>      >>         // to metaspace objects. To support these classes, we
> >>>>>>      need to add
> >>>>>>      >> relocation code similar to
> >>>>>>      >>         //
> >>>>>> java_lang_Class::update_archived_mirror_native_pointers.
> >>>>>>      >>         Klass* klass = orig_obj->klass();
> >>>>>>      >>         assert(klass != SystemDictionary::Module_klass() &&
> >>>>>>      >>                klass !=
> >>>>>>      SystemDictionary::ResolvedMethodName_klass() &&
> >>>>>>      >>
> >>>>>>      > It's too restrictive to exclude those objects from the archived
> >>>>>>      > object graph because of metadata relocation, since metadata
> >>>>>>      > relocation is rare. The trade-off doesn't seem to buy us much.
> >>>>>>      >
> >>>>>>      > Do you plan to add the needed relocation code?
> >>>>>>
> >>>>>>      I looked more into this. Actually we cannot handle these 5
> >>>>>> classes at
> >>>>>>      all, even without archive relocation:
> >>>>>>
> >>>>>>      [1] #define MODULE_INJECTED_FIELDS(macro) \
> >>>>>>         macro(java_lang_Module, module_entry, intptr_signature, false)
> >>>>>>
> >>>>>>      ->  module_entry is malloc'ed
> >>>>>>
> >>>>>>      [2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
> >>>>>>         macro(java_lang_invoke_ResolvedMethodName, vmholder,
> >>>>>>      object_signature, false) \
> >>>>>>         macro(java_lang_invoke_ResolvedMethodName, vmtarget,
> >>>>>>      intptr_signature, false)
> >>>>>>
> >>>>>>      -> these fields are related to method handles and lambda forms,
> >>>>>> etc.
> >>>>>>      They can't easily be archived without implementing lambda form
> >>>>>>      archiving. (I did a prototype; it's very complex and fragile).
> >>>>>>
> >>>>>>      [3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
> >>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
> >>>>>>      vmdependencies, intptr_signature, false) \
> >>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
> >>>>>>      last_cleanup, long_signature, false)
> >>>>>>
> >>>>>>      -> vmdependencies is malloc'ed.
> >>>>>>
> >>>>>>      [4] #define
> >>>>>> MEMBERNAME_INJECTED_FIELDS(macro) \
> >>>>>>         macro(java_lang_invoke_MemberName, vmindex, intptr_signature,
> >>>>>>      false)
> >>>>>>
> >>>>>>      -> this one is probably OK. Despite being declared as
> >>>>>>      'intptr_signature', it seems to be used just as an integer.
> >>>>>> However,
> >>>>>>      MemberNames are typically used with [2] and [3]. So let's just
> >>>>>>      forbid it
> >>>>>>      to be safe.
> >>>>>>
> >>>>>>      [2] [3] [4] are not used directly by regular Java code and are
> >>>>>>      unlikely
> >>>>>>      to be referenced (directly or indirectly) by static fields (except
> >>>>>>      for
> >>>>>>      the static fields in the classes in java.lang.invoke, which we
> >>>>>>      probably
> >>>>>>      won't support for heap archiving due to the problem I described for
> >>>>>>      [2]). Objects of these types are typically referenced via constant
> >>>>>>      pool
> >>>>>>      entries.
> >>>>>>
> >>>>>>      [5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
> >>>>>>         macro(java_lang_ClassLoader, loader_data, intptr_signature,
> >>>>>> false)
> >>>>>>
> >>>>>>      -> loader_data is malloc'ed.
> >>>>>>
> >>>>>>      So, I will change the DEBUG_ONLY into a product-mode check, and
> >>>>>> quit
> >>>>>>      dumping if these objects are found in the object subgraph.
> >>>>>>
> >>>>>>
> >>>>>> Sounds good. Can you please also add a comment with explanation.
> >>>>>>
> >>>>>> For ClassLoader and Module, it is worth considering caching the
> >>>>>> additional native data some time in the future. Lois had suggested
> >>>>>> the Module part a while ago.
> >>>>> I think we can do that if/when we archive Modules directly into the
> >>>>> shared heap.
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>      Maybe we should backport the check to older versions as well?
> >>>>>>
> >>>>>>
> >>>>>> We should discuss with Andrew Haley for backports to JDK 11 update
> >>>>>> releases. Since the current OpenJDK 11 only applies Java heap
> >>>>>> archiving to a restricted set of JDK library code, I think it is
> >>>>>> safe without the new check.
> >>>>>>
> >>>>>> For non-LTS releases, it might not be worthwhile as they may not be
> >>>>>> widely used?
> >>>>> I agree. FYI, we (Oracle) have no plan for backporting more types of
> >>>>> heap object archiving, so the decision would be up to whoever that
> >>>>> decides to do so.
> >>>>>
> >>>>> Thanks
> >>>>> - Ioi
> >>>>>
> >>>>>
> >>>>>> Thanks,
> >>>>>> Jiangli
> >>>>>>
> >>>>>>
> >>>>>>      >
> >>>>>>      >>> - src/hotspot/share/memory/metaspace.cpp
> >>>>>>      >>>
> >>>>>>      >>> 1036   metaspace_rs =
> >>>>>> ReservedSpace(compressed_class_space_size(),
> >>>>>>      >>> 1037   _reserve_alignment,
> >>>>>>      >>> 1038   large_pages,
> >>>>>>      >>> 1039   requested_addr);
> >>>>>>      >>>
> >>>>>>      >>> Please fix indentation.
> >>>>>>      >> Fixed.
> >>>>>>      >>
> >>>>>>      >>> - src/hotspot/share/memory/metaspaceClosure.hpp
> >>>>>>      >>>
> >>>>>>      >>>     78   enum SpecialRef {
> >>>>>>      >>>     79     _method_entry_ref
> >>>>>>      >>>     80   };
> >>>>>>      >>>
> >>>>>>      >>> Are there other pointers that are not references to
> >>>>>>      MetaspaceObj? If
> >>>>>>      >>> _method_entry_ref is the only type, it's probably not worth
> >>>>>>      defining
> >>>>>>      >>> SpecialRef?
> >>>>>>      >> There may be more types in the future, so I want to have a
> >>>>>>      stable API
> >>>>>>      >> that can be easily expanded without touching all the code that
> >>>>>>      uses it.
> >>>>>>      >>
> >>>>>>      >>
> >>>>>>      >>> - src/hotspot/share/memory/metaspaceShared.hpp
> >>>>>>      >>>
> >>>>>>      >>>     42 enum MapArchiveResult {
> >>>>>>      >>>     43   MAP_ARCHIVE_SUCCESS,
> >>>>>>      >>>     44   MAP_ARCHIVE_MMAP_FAILURE,
> >>>>>>      >>>     45   MAP_ARCHIVE_OTHER_FAILURE
> >>>>>>      >>>     46 };
> >>>>>>      >>>
> >>>>>>      >>> If we want to define different failure types, it's probably
> >>>>>> worth
> >>>>>>      >>> using separate types for relocation failure and validation
> >>>>>>      failure.
> >>>>>>      >> For now, I just need to distinguish between MMAP_FAILURE (where
> >>>>>>      I should
> >>>>>>      >> attempt to remap at an alternative address) and OTHER_FAILURE
> >>>>>>      (where the
> >>>>>>      >> CDS archive loading will fail -- due to validation error,
> >>>>>>      insufficient
> >>>>>>      >> memory, etc -- without attempting to remap.)
> >>>>>>      >>
> >>>>>>      >>> ---
> >>>>>>      >>>
> >>>>>>      >>>    193   static intx _mapping_delta; // FIXME rename
> >>>>>>      >>>
> >>>>>>      >>> How about _relocation_delta?
> >>>>>>      >> Changed as suggested.
> >>>>>>      >>
> >>>>>>      >>> - src/hotspot/share/oops/instanceKlass
> >>>>>>      >>>
> >>>>>>      >>> 1573 bool InstanceKlass::_disable_method_binary_search = false;
> >>>>>>      >>>
> >>>>>>      >>> The use of _disable_method_binary_search is not necessary. You
> >>>>>>      can use
> >>>>>>      >>> DynamicDumpSharedSpaces for the purpose. That would make things
> >>>>>>      >>> cleaner.
> >>>>>>      >> If we always disable the binary search when
> >>>>>>      DynamicDumpSharedSpaces is
> >>>>>>      >> true, it will slow down normal execution of the Java program
> >>>>>> when
> >>>>>>      >> -XX:ArchiveClassesAtExit has been specified, but the program
> >>>>>>      hasn't exited.
> >>>>>>      > Could you please add some comments to
> >>>>>> _disable_method_binary_search
> >>>>>>      > with the above explanation? Thanks.
> >>>>>>
> >>>>>>      OK
> >>>>>>      >
> >>>>>>      >>> - test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
> >>>>>>      >>>
> >>>>>>      >>>     76                     if (name.equals("s0") ||
> >>>>>>      name.equals("s1")) {
> >>>>>>      >>>     77                       // String regions are listed at
> >>>>>>      the end and
> >>>>>>      >>> they may not be fully occupied.
> >>>>>>      >>>     78                       break;
> >>>>>>      >>>     79                     } else if (name.equals("bm")) {
> >>>>>>      >>>     80                       // Bitmap space does not have a
> >>>>>>      requested address.
> >>>>>>      >>>     81                       break;
> >>>>>>      >>>
> >>>>>>      >>> It's not part of your change, but could you please fix line 76
> >>>>>>      - 78
> >>>>>>      >>> since it is trivial. It seems the lines can be removed.
> >>>>>>      >> Removed.
> >>>>>>      >>
> >>>>>>      >>> - /src/hotspot/share/memory/archiveUtils.hpp
> >>>>>>      >>> The file name does not match with the macro '#ifndef
> >>>>>>      >>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please rename
> >>>>>>      >>> archiveUtils.* ? archiveRelocator.hpp and
> >>>>>> archiveRelocator.cpp are
> >>>>>>      >>> more descriptive.
> >>>>>>      >> I named the file archiveUtils.hpp so we can move other misc
> >>>>>>      stuff used
> >>>>>>      >> by dumping into this file (e.g., DumpRegion, WriteClosure from
> >>>>>>      >> metaspaceShared.hpp), since these are not used by the majority
> >>>>>>      of the
> >>>>>>      >> files that use metaspaceShared.hpp.
> >>>>>>      >>
> >>>>>>      >> I fixed the ifdef.
> >>>>>>      >>
> >>>>>>      >>> - src/hotspot/share/memory/archiveUtils.cpp
> >>>>>>      >>>
> >>>>>>      >>>     36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap,
> >>>>>>      address*
> >>>>>>      >>> ptr_base, address* ptr_end) {
> >>>>>>      >>>     37   assert(_ptrmap == NULL, "initialize only once");
> >>>>>>      >>>     38   _ptr_base = ptr_base;
> >>>>>>      >>>     39   _ptr_end = ptr_end;
> >>>>>>      >>>     40   _compacted = false;
> >>>>>>      >>>     41   _ptrmap = ptrmap;
> >>>>>>      >>>     42   _ptrmap->initialize(12 * M / sizeof(intptr_t)); //
> >>>>>>      default
> >>>>>>      >>> archive is about 12MB.
> >>>>>>      >>>     43 }
> >>>>>>      >>>
> >>>>>>      >>> Could we do a better estimate here? We could guesstimate the
> >>>>>> size
> >>>>>>      >>> based on the current used class space and metaspace size. It's
> >>>>>>      okay if
> >>>>>>      >>> a larger bitmap used, since it can be reduced after all
> >>>>>>      marking are
> >>>>>>      >>> done.
> >>>>>>      >> The bitmap is automatically expanded when necessary in
> >>>>>>      >> ArchivePtrMarker::mark_pointer(). It's only about 1/32 or 1/64
> >>>>>>      of the
> >>>>>>      >> total archive size, so even if we do expand, the cost will be
> >>>>>>      trivial.
> >>>>>>      > The initial value is based on the default CDS archive. When
> >>>>>> dealing
> >>>>>>      > with a really large archive, it would have to re-grow many times.
> >>>>>>      > Also, using a hard-coded value is less desirable.
> >>>>>>
> >>>>>>      OK, I changed it to the following
> >>>>>>
> >>>>>>         // Use this as initial guesstimate. We should need less space
> >>>>>>      in the
> >>>>>>         // archive, but if we're wrong the bitmap will be expanded
> >>>>>>      automatically.
> >>>>>>         size_t estimated_archive_size =
> >>>>>> MetaspaceGC::capacity_until_GC();
> >>>>>>         // But set it smaller in debug builds so we always test the
> >>>>>>      expansion
> >>>>>>      code.
> >>>>>>         // (Default archive is about 12MB).
> >>>>>>         DEBUG_ONLY(estimated_archive_size = 6 * M);
> >>>>>>
> >>>>>>         // We need one bit per pointer in the archive.
> >>>>>>         _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
> >>>>>>
> >>>>>>
> >>>>>>      Thanks!
> >>>>>>      - Ioi
> >>>>>>
> >>>>>>      >
> >>>>>>      >>>
> >>>>>>      >>>
> >>>>>>      >>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou
> >>>>>>      <jianglizhou at google.com <mailto:jianglizhou at google.com>> wrote:
> >>>>>>      >>>> Hi Ioi,
> >>>>>>      >>>>
> >>>>>>      >>>> This is another great step for CDS usability improvement.
> >>>>>>      Thank you!
> >>>>>>      >>>>
> >>>>>>      >>>> I have a high level question (or request): could we consider
> >>>>>>      >>>> separating the relocation work for 'direct' class metadata
> >>>>>>      from other
> >>>>>>      >>>> types of metadata (such as the shared system dictionary,
> >>>>>>      symbol table,
> >>>>>>      >>>> etc)? Initially we only relocate the tables and other
> >>>>>>      archived global
> >>>>>>      >>>> data. When each archived class is being loaded, we can
> >>>>>>      relocate all
> >>>>>>      >>>> the pointers within the current class. We could find the
> >>>>>>      segment (for
> >>>>>>      >>>> the current class) in the bitmap and update the pointers
> >>>>>>      within the
> >>>>>>      >>>> segment. That way we can reduce initial startup costs and
> >>>>>>      also avoid
> >>>>>>      >>>> relocating class data that's not used at runtime. In some
> >>>>>>      real world
> >>>>>>      >>>> large systems, an archive may contain extremely large
> >>>>>> number of
> >>>>>>      >>>> classes.
> >>>>>>      >>>>
> >>>>>>      >>>> Following are partial review comments so we can move things
> >>>>>>      forward.
> >>>>>>      >>>> Still going through the rest of the changes.
> >>>>>>      >>>>
> >>>>>>      >>>> - src/hotspot/share/classfile/javaClasses.cpp
> >>>>>>      >>>>
> >>>>>>      >>>> 1218 void
> >>>>>> java_lang_Class::update_archived_mirror_native_pointers(oop
> >>>>>>      >>>> archived_mirror) {
> >>>>>>      >>>> 1219   Klass* k =
> >>>>>> ((Klass*)archived_mirror->metadata_field(_klass_offset));
> >>>>>>      >>>> 1220   if (k != NULL) { // k is NULL for the primitive
> >>>>>>      classes such as
> >>>>>>      >>>> java.lang.Byte::TYPE <<<<<<<<<<<
> >>>>>>      >>>> 1221  archived_mirror->metadata_field_put(_klass_offset,
> >>>>>>      >>>> (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
> >>>>>>      >>>> 1222   }
> >>>>>>      >>>> 1223 ...
> >>>>>>      >>>>
> >>>>>>      >>>> Primitive type mirrors are handled separately. Could you
> >>>>>>      please verify
> >>>>>>      >>>> if this call path happens for primitive type mirror?
> >>>>>>      >>>>
> >>>>>>      >>>> To answer my question above, looks like you added the
> >>>>>>      following, which
> >>>>>>      >>>> is to be used for primitive type mirrors. That seems to be
> >>>>>>      the reason
> >>>>>>      >>>> why update_archived_mirror_native_pointers is trying to also
> >>>>>>      cover
> >>>>>>      >>>> primitive type. It is better to have a separate API for
> >>>>>>      primitive type
> >>>>>>      >>>> mirror, which is cleaner. And, we also can replace the above
> >>>>>>      check at
> >>>>>>      >>>> line 1220 to be an assert for regular mirrors.
> >>>>>>      >>>>
> >>>>>>      >>>> +void ReadClosure::do_mirror_oop(oop *p) {
> >>>>>>      >>>> +  do_oop(p);
> >>>>>>      >>>> +  oop mirror = *p;
> >>>>>>      >>>> +  if (mirror != NULL) {
> >>>>>>      >>>> +
> >>>>>> java_lang_Class::update_archived_mirror_native_pointers(mirror);
> >>>>>>      >>>> +  }
> >>>>>>      >>>> +}
> >>>>>>      >>>> +
> >>>>>>      >>>>
> >>>>>>      >>>> How about renaming update_archived_mirror_native_pointers to
> >>>>>>      >>>> update_archived_mirror_klass_pointers.
> >>>>>>      >>>>
> >>>>>>      >>>> It would be good to pass the current klass as an argument.
> >>>>>> We can
> >>>>>>      >>>> verify the relocated pointer matches with the current klass
> >>>>>>      pointer.
> >>>>>>      >>>>
> >>>>>>      >>>> We should also check if relocation is necessary before
> >>>>>>      spending cycles
> >>>>>>      >>>> to obtain the klass pointer from the mirror.
> >>>>>>      >>>>
> >>>>>>      >>>> 1252  update_archived_mirror_native_pointers(m);
> >>>>>>      >>>> 1253
> >>>>>>      >>>> 1254   // mirror is archived, restore
> >>>>>>      >>>> 1255  assert(HeapShared::is_archived_object(m), "must be
> >>>>>> archived
> >>>>>>      >>>> mirror object");
> >>>>>>      >>>> 1256   Handle mirror(THREAD, m);
> >>>>>>      >>>>
> >>>>>>      >>>> Could we move the line at 1252 after the assert at line 1255?
> >>>>>>      >>>>
> >>>>>>      >>>> - src/hotspot/share/include/cds.h
> >>>>>>      >>>>
> >>>>>>      >>>>     47   int     _mapped_from_file;  // Is this region mapped
> >>>>>>      from a file?
> >>>>>>      >>>>     48                               // If false, this
> >>>>>> region was
> >>>>>>      >>>> initialized using os::read().
> >>>>>>      >>>>
> >>>>>>      >>>> Is the new field truly needed? It seems we could use
> >>>>>>      _mapped_base to
> >>>>>>      >>>> determine if a region is mapped or not?
> >>>>>>      >>>>
> >>>>>>      >>>> - src/hotspot/share/memory/dynamicArchive.cpp
> >>>>>>      >>>>
> >>>>>>      >>>> Could you please remove the debugging print code in
> >>>>>>      >>>> dynamic_dump_method_comparator? Or convert those to logging
> >>>>>>      output if
> >>>>>>      >>>> they are helpful.
> >>>>>>      >>>>
> >>>>>>      >>>> Will send out the rest of the review comments later.
> >>>>>>      >>>>
> >>>>>>      >>>> Best,
> >>>>>>      >>>>
> >>>>>>      >>>> Jiangli
> >>>>>>      >>>>
> >>>>>>      >>>>
> >>>>>>      >>>>
> >>>>>>      >>>>
> >>>>>>      >>>> On Thu, Oct 10, 2019 at 6:00 PM Ioi Lam <ioi.lam at oracle.com
> >>>>>>      <mailto:ioi.lam at oracle.com>> wrote:
> >>>>>>      >>>>> Bug:
> >>>>>>      >>>>> https://bugs.openjdk.java.net/browse/JDK-8231610
> >>>>>>      >>>>>
> >>>>>>      >>>>> Webrev:
> >>>>>>      >>>>>
> >>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v01/
> >>>>>>
> >>>>>>      >>>>>
> >>>>>>      >>>>> Design:
> >>>>>>      >>>>>
> >>>>>> http://cr.openjdk.java.net/~iklam/jdk14/design/8231610-relocate-cds-archive.txt
> >>>>>>
> >>>>>>      >>>>>
> >>>>>>      >>>>>
> >>>>>>      >>>>> Overview:
> >>>>>>      >>>>>
> >>>>>>      >>>>> The CDS archive is mmaped to a fixed address range
> >>>>>> (starting at
> >>>>>>      >>>>> SharedBaseAddress, usually 0x800000000). Previously, if this
> >>>>>>      >>>>> requested address range is not available (usually due to
> >>>>>> Address
> >>>>>>      >>>>> Space Layout Randomization (ASLR) [2]), the JVM will give
> >>>>>> up and
> >>>>>>      >>>>> will load classes dynamically using class files.
> >>>>>>      >>>>>
> >>>>>>      >>>>> [a] This causes slow down in JVM start-up.
> >>>>>>      >>>>> [b] Handling of mapping failures causes unnecessary
> >>>>>>      complication in
> >>>>>>      >>>>>        the CDS tests.
> >>>>>>      >>>>>
> >>>>>>      >>>>> Here are some preliminary benchmarking results (using
> >>>>>>      default CDS archive,
> >>>>>>      >>>>> running helloworld):
> >>>>>>      >>>>>
> >>>>>>      >>>>> (a) 47.1ms (CDS enabled, mapped at requested addr)
> >>>>>>      >>>>> (b) 53.8ms (CDS enabled, mapped at alternate addr)
> >>>>>>      >>>>> (c) 86.2ms (CDS disabled)
> >>>>>>      >>>>>
> >>>>>>      >>>>> The small degradation in (b) is caused by the relocation of
> >>>>>>      >>>>> absolute pointers embedded in the CDS archive. However, it is
> >>>>>>      >>>>> still a big improvement over case (c)
> >>>>>>      >>>>>
> >>>>>>      >>>>> Please see the design doc (link above) for details.
> >>>>>>      >>>>>
> >>>>>>      >>>>> Thanks
> >>>>>>      >>>>> - Ioi
> >>>>>>      >>>>>
> >>>>>>
>

From ivan.gerasimov at oracle.com  Sun Nov 10 10:26:19 2019
From: ivan.gerasimov at oracle.com (Ivan Gerasimov)
Date: Sun, 10 Nov 2019 02:26:19 -0800
Subject: RFR: 8233497: Optimize default method generation by data
 structure reuse
In-Reply-To: <5991863e-28cf-0daa-3549-905609ce94a9@oracle.com>
References: <5991863e-28cf-0daa-3549-905609ce94a9@oracle.com>
Message-ID: <c48bde91-bdb7-9e6d-9c3c-2be762b17fa1@oracle.com>

Hi Claes!

I see that 'buffer' is never deleted in create_defaults_and_exceptions().

With kind regards,

Ivan

On 11/8/19 3:57 AM, Claes Redestad wrote:
> Hi,
>
> when loading classes with complex hierarchies and many default methods,
> we can end up spending significant time in
> DefaultMethods::generate_default_methods
>
> This optimization reduces work done and memory requirements by reusing
> allocated data structures. For example by maintaining free lists of
> allocated Node objects.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8233497
> Webrev: http://cr.openjdk.java.net/~redestad/8233497/open.00/
>
> Testing: Tier1-3, will make sure tier4-7 pass before push
>
> Performance notes: On one of our more complex startup tests we see a 3%
> improvement on the execution time total. Much less on simpler
> applications.
>
> I've not done a formal complexity analysis, but I think the memory
> complexity is now down from O(N*M) to O(N+M) where N is the number of
> classes and interfaces in the hierarchy and M the number of methods of
> interest in that hierarchy. Algorithmic complexity is probably O(N*M)
> still, but with much better constants.
>
> Special thanks to Lois for patience and persistence over several rounds
> of pre-review!
>
> Thanks!
>
> /Claes
>
-- 
With kind regards,
Ivan Gerasimov


From claes.redestad at oracle.com  Sun Nov 10 21:58:39 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Sun, 10 Nov 2019 22:58:39 +0100
Subject: RFR: 8233497: Optimize default method generation by data
 structure reuse
In-Reply-To: <c48bde91-bdb7-9e6d-9c3c-2be762b17fa1@oracle.com>
References: <5991863e-28cf-0daa-3549-905609ce94a9@oracle.com>
 <c48bde91-bdb7-9e6d-9c3c-2be762b17fa1@oracle.com>
Message-ID: <e3da1b9f-f088-a667-83ab-322e9c9fdf5d@oracle.com>

Hi Ivan,

unless I'm gravely mistaken, ResourceObjs like 'buffer' do not need to be
explicitly freed as long as they are allocated within a ResourceMark scope.
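
To illustrate the scoping behavior described above, here is a minimal
sketch of a mark-and-rollback arena. ToyArena and ToyMark are
hypothetical stand-ins written for this example; they are not HotSpot's
ResourceArea or ResourceMark, and they only model the idea that leaving
the mark's scope releases everything allocated inside it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy bump-pointer arena (hypothetical; stands in for a resource area).
class ToyArena {
 public:
  explicit ToyArena(std::size_t capacity) : _storage(capacity), _top(0) {}

  // Hand out 'size' bytes. There is deliberately no per-object free().
  void* allocate(std::size_t size) {
    const std::size_t align = alignof(std::max_align_t);
    std::size_t start = (_top + align - 1) / align * align;
    assert(start + size <= _storage.size() && "toy arena exhausted");
    _top = start + size;
    return _storage.data() + start;
  }

  std::size_t top() const { return _top; }
  void rollback_to(std::size_t saved_top) { _top = saved_top; }

 private:
  std::vector<unsigned char> _storage;
  std::size_t _top;
};

// Records the arena's top on entry and restores it on exit, so all
// allocations made inside the scope are released in one step.
class ToyMark {
 public:
  explicit ToyMark(ToyArena& arena) : _arena(arena), _saved_top(arena.top()) {}
  ~ToyMark() { _arena.rollback_to(_saved_top); }
  ToyMark(const ToyMark&) = delete;
  ToyMark& operator=(const ToyMark&) = delete;

 private:
  ToyArena& _arena;
  std::size_t _saved_top;
};

// Allocates inside a marked scope without freeing, then reports the
// arena's top after the scope has ended.
std::size_t top_after_scoped_allocation(ToyArena& arena) {
  {
    ToyMark mark(arena);
    void* buffer = arena.allocate(256);  // no matching delete/free
    (void)buffer;
    assert(arena.top() >= 256);
  }  // ~ToyMark rolls the arena back here
  return arena.top();
}
```

The point of the sketch is only that the release happens at scope exit
rather than per object, which is why an explicit delete is unnecessary
in this model.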

/Claes

On 2019-11-10 11:26, Ivan Gerasimov wrote:
> Hi Claes!
> 
> I see that 'buffer' is never deleted in create_defaults_and_exceptions().
> 
> With kind regards,
> 
> Ivan
> 
> On 11/8/19 3:57 AM, Claes Redestad wrote:
>> Hi,
>>
>> when loading classes with complex hierarchies and many default methods,
>> we can end up spending significant time in
>> DefaultMethods::generate_default_methods
>>
>> This optimization reduces work done and memory requirements by reusing
>> allocated data structures. For example by maintaining free lists of
>> allocated Node objects.
>>
>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8233497
>> Webrev: http://cr.openjdk.java.net/~redestad/8233497/open.00/
>>
>> Testing: Tier1-3, will make sure tier4-7 pass before push
>>
>> Performance notes: On one of our more complex startup tests we see a 3%
>> improvement on the execution time total. Much less on simpler
>> applications.
>>
>> I've not done a formal complexity analysis, but I think the memory
>> complexity is now down from O(N*M) to O(N+M) where N is the number of
>> classes and interfaces in the hierarchy and M the number of methods of
>> interest in that hierarchy. Algorithmic complexity is probably O(N*M)
>> still, but with much better constants.
>>
>> Special thanks to Lois for patience and persistence over several rounds
>> of pre-review!
>>
>> Thanks!
>>
>> /Claes
>>

From ioi.lam at oracle.com  Sun Nov 10 23:13:43 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Sun, 10 Nov 2019 15:13:43 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <CALrW1jzk+1XAqw2w55Y=ouyb-ZDB8tu5uWKNiXN9uA5Ku2XaCg@mail.gmail.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
 <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
 <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>
 <7f0d558c-c161-ce41-90c6-ede8fddcf8b8@oracle.com>
 <99030987-a044-53fb-784b-62408333137a@oracle.com>
 <CALrW1jwTHJ1qg+gTfNeM9MbLdF042bF2ysKf7nGHKFNj9uKe+w@mail.gmail.com>
 <CALrW1jy5_4jrMRSZPAXV-c8a92Jy8y4eoK+f_t8ErwaZRGMoyw@mail.gmail.com>
 <52c473ef-5915-9ca0-8ed8-d4c2846965be@oracle.com>
 <CALrW1jzk+1XAqw2w55Y=ouyb-ZDB8tu5uWKNiXN9uA5Ku2XaCg@mail.gmail.com>
Message-ID: <96ad8c62-fd62-1a1b-6f3c-e009e5e8a6f3@oracle.com>


On 11/9/19 8:25 PM, Jiangli Zhou wrote:
> Hi Ioi,
>
> On Fri, Nov 8, 2019 at 1:35 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>> Hi Jiangli,
>>
>> Thanks for your comments. Please see my replies in-line:
>>
>> On 11/7/19 6:34 PM, Jiangli Zhou wrote:
>>> On Thu, Nov 7, 2019 at 6:11 PM Jiangli Zhou <jianglizhou at google.com> wrote:
>>>> I looked at both 05.full and 06.delta webrevs. They look good.
>>>>
>>>> I still feel a bit uneasy about the potential runtime impact when data
>>>> does get relocated. Long running apps/services may shy away from
>>>> enabling archive at runtime, if there is a detectable overhead even
>>>> though it may only occur rarely. As relocation is enabled by default
>>>> and users cannot turn it off, disabling with -Xshare:off entirely
>>>> would become the only choice. Could you please create a new RFE
>>>> (possibly with higher priority) to investigate the potential effect,
>>>> or provide an option for users to opt-in relocation with the
>>>> command-line switch?
>> I created https://bugs.openjdk.java.net/browse/JDK-8233862
>> Investigate performance benefit of relocating CDS archive to under 32G
>>
>> As I noted in the bug report, I ran benchmarks with CDS relocation
>> on/off, and there's no sign of regression when the CDS archive is
>> relocated. Please see the bug report for how to configure the VM to do
>> the comparison.
>>
>> As you said before: "When enabling CDS we [google] noticed a small
>> runtime overhead in JDK 11 recently with a benchmark. After I backported
>> JDK-8213713 to 11, it seemed to reduce the runtime overhead that the
>> benchmark was experiencing":
>>
>> Can you confirm whether this is stock JDK 11 or a special google build?
>> Which test case did you use? Is it possible for you to run the tests
>> again (using the exact before/after bits that you had when backporting
>> JDK-8213713)? Can you check if narrow_klass_base and narrow_klass_shift
>> are the same in your before/after builds?
> Thanks for creating the RFE.
>
> JDK-8213713 closes the 1G gap between the shared space and class space
> and everything else is unaffected. The compressed class base and shift
> were the same for before and after applying JDK-8213713. The effect
> was statistically observed for the benchmark since the difference was
> very small and could be within noise level for single run comparison.
> A small difference could still be important for some use cases so it
> needs to be taken into consideration when designing and implementing
> new changes.

Hi Jiangli,

Thanks for taking the time for doing the performance measurements.

I also ran benchmarks in all 3 modes (no CDS, CDS without relocation, 
CDS with relocation), and did not see any significant performance difference with
Octane-DeltaBlue, Octane-NavierStokes, SPECjbb2005-Tuned, 
JFR-SPECjbb2005-Tuned, SPECjvm2008-Serial-G1 and Tools-Javac-Hello.


>
> A new command-line for archived metadata relocation may still be
> valuable. It would also be helpful for debugging and diagnosis.
>

How about a diagnostic flag ArchiveRelocationMode:

0: (default) first map at preferred address, and if unsuccessful, map to 
alternative address;
1: always map to alternative address;
2: always map at preferred address, and if unsuccessful, do not map the 
archive;

1 is for testing relocation, as well as for easy performance measurement 
(replaces the use of -XX:SharedBaseAddress=0 in my current patch).
2 is for avoiding potential regression that may be introduced by 
relocation (revert to JDK 13 behavior).
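
To make the three modes concrete, here is a minimal sketch of the
decision logic. The enum and function names are hypothetical, and the
two booleans merely stand in for the outcome of the real mapping
attempts; this is an illustration of the proposal, not code from the
patch.

```cpp
#include <cassert>

// Hypothetical names throughout; models the proposed flag values only.
enum class RelocationMode {
  PreferRequestedThenRelocate = 0,  // default
  AlwaysRelocate = 1,               // testing and performance measurement
  RequestedOnly = 2                 // never relocate (JDK 13 behavior)
};

enum class MapOutcome { MappedAtRequested, MappedAtAlternative, NotMapped };

// 'requested_available': would mapping at the preferred address succeed?
// 'alternative_available': could the archive be mapped somewhere else?
MapOutcome decide_mapping(RelocationMode mode,
                          bool requested_available,
                          bool alternative_available) {
  switch (mode) {
    case RelocationMode::AlwaysRelocate:
      // Mode 1: skip the preferred address entirely.
      return alternative_available ? MapOutcome::MappedAtAlternative
                                   : MapOutcome::NotMapped;
    case RelocationMode::RequestedOnly:
      // Mode 2: give up on the archive if the preferred address is taken.
      return requested_available ? MapOutcome::MappedAtRequested
                                 : MapOutcome::NotMapped;
    case RelocationMode::PreferRequestedThenRelocate:
    default:
      // Mode 0: try the preferred address first, then fall back.
      if (requested_available) return MapOutcome::MappedAtRequested;
      return alternative_available ? MapOutcome::MappedAtAlternative
                                   : MapOutcome::NotMapped;
  }
}
```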

What do you think? If you like this I'll open a CSR.

Thanks
- Ioi


>>> Forgot to say that when Java heap can fit into low 32G space, it takes
>>> the class space size into account and leaves the needed space right above
>>> (also in low 32G space) when reserving heap, for !UseSharedSpace. In
>>> that case, it's more likely the class data and heap data can be
>>> colocated successfully.
>> The reason is not for "colocation". It's so that narrow_klass_base can
>> be zero, and the klass pointer can be uncompressed with a shift (without
>> also doing an addition).
>>
>> But with CDS enabled, we always hard code to use non-zero
>> narrow_klass_base and 3 bit shift (for AOT). So by just relocating the
>> CDS archive to under 32GB, without modifying how CDS handles
>> narrow_klass_base/shift, I don't think we can expect any benefit.
> I experimented with mapping the shared space in low 32G and placed
> right above the Java heap. The class space was also allocated in the
> low 32G space and after the mapped shared space in the experiment. The
> compress class encoding was using 0 base and 3 shift, which was the
> same as the encoding when CDS was disabled. I didn't observe runtime
> performance difference when comparing that specific configuration with
> the normal CDS mapping scheme (the shared space start at 32G and the
> encoding is non-zero base and 3 shift).
>
> Thanks,
> Jiangli
>> For modern architectures, I am not aware of any inherent speed benefit
>> simply by putting data (in our case much larger than a page) "close to
>> each other" in the virtual address space. If you have any reference of
>> that, please let me know.
>>
>> Thanks
>> - Ioi
>>
>>> Thanks,
>>> Jiangli
>>>
>>>> Regards,
>>>> Jiangli
>>>>
>>>> On Thu, Nov 7, 2019 at 4:22 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>> Hi Coleen,
>>>>>
>>>>> Thanks for the review. Here's a webrev that has incorporated your
>>>>> suggestions:
>>>>>
>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v06-delta/
>>>>>
>>>>> Please see comments in-line
>>>>>
>>>>> On 11/7/19 2:46 PM, coleen.phillimore at oracle.com wrote:
>>>>>> Hi, I've done a more high level code review of this and it looks good!
>>>>>>
>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/src/hotspot/share/memory/archiveUtils.hpp.html
>>>>>>
>>>>>>
>>>>>> I think these classes require comments on what they do and why. The
>>>>>> comments you sent me offline look good.
>>>>> I added more comments for ArchivePtrMarker::_compacted per your offline
>>>>> request.
>>>>>
>>>>>> Also .hpp files shouldn't include .inline.hpp files, like
>>>>>> bitMap.inline.hpp.  Hopefully it's just a case of moving do_bit() into
>>>>>> the cpp file.
>>>>> I moved the do_bit() function into archiveUtils.inline.hpp, since it is
>>>>> used by 3 .cpp files, and performance is important.
>>>>>
>>>>>> I wonder if the exception list of classes to exclude should be a
>>>>>> function in javaClasses.hpp/cpp where the explanation would make more
>>>>>> sense?  ie bool
>>>>>> JavaClasses::has_injected_native_pointers(InstanceKlass* k);
>>>>> I moved the checking code to javaClasses.cpp. Since we do (partially)
>>>>> support java.lang.Class, which has injected native pointers, I named the
>>>>> function as JavaClasses::is_supported_for_archiving instead. I also
>>>>> massaged the comments a little for clarification.
>>>>>
>>>>>> Is there already an RFE to move the DumpSharedSpaces output from
>>>>>> tty->print() to log_info() ?
>>>>> I created https://bugs.openjdk.java.net/browse/JDK-8233826 (Change CDS
>>>>> dumping tty->print_cr() to unified logging).
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>> On 11/6/19 4:17 PM, Ioi Lam wrote:
>>>>>>> Hi Jiangli,
>>>>>>>
>>>>>>> I've uploaded the webrev after integrating your comments:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05-delta/
>>>>>>>
>>>>>>>
>>>>>>> Please see more replies below:
>>>>>>>
>>>>>>>
>>>>>>> On 11/4/19 5:52 PM, Jiangli Zhou wrote:
>>>>>>>> On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam <ioi.lam at oracle.com
>>>>>>>> <mailto:ioi.lam at oracle.com>> wrote:
>>>>>>>>
>>>>>>>>       Hi Jiangli,
>>>>>>>>
>>>>>>>>       Thank you so much for spending time reviewing this RFE!
>>>>>>>>
>>>>>>>>       On 11/3/19 6:34 PM, Jiangli Zhou wrote:
>>>>>>>>       > Hi Ioi,
>>>>>>>>       >
>>>>>>>>       > Sorry for the delay again. Will try to put this on the top of my
>>>>>>>>       list
>>>>>>>>       > next week and reduce the turn-around time. The updates look
>>>>>>>> good in
>>>>>>>>       > general.
>>>>>>>>       >
>>>>>>>>       > We might want to have a better strategy when choosing metadata
>>>>>>>>       > relocation address (when relocation is needed). Some
>>>>>>>>       > applications/benchmarks may be more sensitive to cache
>>>>>>>> locality and
>>>>>>>>       > memory/data layout. There was a bug,
>>>>>>>>       > https://bugs.openjdk.java.net/browse/JDK-8213713 that caused
>>>>>>>> 1G gap
>>>>>>>>       > between Java heap data and metadata before JDK 12. The gap
>>>>>>>> seemed to
>>>>>>>>       > cause a small but noticeable runtime effect in one case that I
>>>>>>>> came
>>>>>>>>       > across.
>>>>>>>>
>>>>>>>>       I guess you're saying we should try to relocate the archive into
>>>>>>>>       somewhere under 32GB?
>>>>>>>>
>>>>>>>>
>>>>>>>> I don't yet have sufficient data that determines if mapping at low
>>>>>>>> 32G produces better runtime performance. I experimented with that,
>>>>>>>> but didn't see noticeable difference when comparing to mapping at
>>>>>>>> the current default address. It doesn't hurt, I think. So it may be
>>>>>>>> a better choice than relocating to a random address in high 32G
>>>>>>>> space (when Java heap is in low 32G address space).
>>>>>>> Maybe we should reconsider this when we have more concrete data for
>>>>>>> the benefits of moving the compressed class space to under 32G.
>>>>>>>
>>>>>>> Please note that in metaspace.cpp, when CDS is disabled and  the VM
>>>>>>> fails to allocate the class space at the requested address
>>>>>>> (0x7c000000 for 16GB heap), it also just allocates from a random
>>>>>>> address (without trying to search under 32GB):
>>>>>>>
>>>>>>> http://hg.openjdk.java.net/jdk/jdk/annotate/e767fa6a1d45/src/hotspot/share/memory/metaspace.cpp#l1128
>>>>>>>
>>>>>>>
>>>>>>> This code has been there since 2013 and we have not seen any issues.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>       Could you elaborate more about the performance issue, especially
>>>>>>>>       about
>>>>>>>>       cache locality? I looked at JDK-8213713 but it didn't mention about
>>>>>>>>       performance.
>>>>>>>>
>>>>>>>>
>>>>>>>> When enabling CDS we noticed a small runtime overhead in JDK 11
>>>>>>>> recently with a benchmark. After I backported JDK-8213713 to 11, it
>>>>>>>> seemed to reduce the runtime overhead that the benchmark was
>>>>>>>> experiencing.
>>>>>>>>
>>>>>>>>
>>>>>>>>       Also, by default, we have non-zero narrow_klass_base and
>>>>>>>>       narrow_klass_shift = 3, and archive relocation doesn't change that:
>>>>>>>>
>>>>>>>>       $ java -Xlog:cds=debug -version
>>>>>>>>       ... narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
>>>>>>>>       $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
>>>>>>>>       ... narrow_klass_base = 0x00007f1e8b499000, narrow_klass_shift = 3
>>>>>>>>
>>>>>>>>       We always use narrow_klass_shift due to this:
>>>>>>>>
>>>>>>>>          // CDS uses LogKlassAlignmentInBytes for narrow_klass_shift. See
>>>>>>>>          //
>>>>>>>> MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
>>>>>>>>          // how dump time narrow_klass_shift is set. Although, CDS can
>>>>>>>> work
>>>>>>>>          // with zero-shift mode also, to be consistent with AOT it uses
>>>>>>>>          // LogKlassAlignmentInBytes for klass shift so archived java
>>>>>>>>       heap objects
>>>>>>>>          // can be used at same time as AOT code.
>>>>>>>>          if (!UseSharedSpaces
>>>>>>>>              && (uint64_t)(higher_address - lower_base) <=
>>>>>>>>       UnscaledClassSpaceMax) {
>>>>>>>>            CompressedKlassPointers::set_shift(0);
>>>>>>>>          } else {
>>>>>>>> CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
>>>>>>>>          }
>>>>>>>>
>>>>>>>>
>>>>>>>> Right. If we relocate to low 32G space, it needs to make sure that
>>>>>>>> the range containing the mapped class data and class space must be
>>>>>>>> encodable.
>>>>>>>>
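A minimal sketch of what "encodable" means in the exchange above, assuming a 32-bit narrow value that is decoded as base + (narrow << shift). The struct and its member names are hypothetical illustrations, not HotSpot's CompressedKlassPointers API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the VM's narrow-klass settings (not HotSpot code).
struct NarrowKlassEncoding {
  uint64_t base;   // plays the role of narrow_klass_base
  int      shift;  // plays the role of narrow_klass_shift (3 => 8-byte alignment)

  // Size in bytes of the window above `base` that a 32-bit value can address.
  uint64_t max_offset() const { return uint64_t{1} << (32 + shift); }

  // True if every address in [lo, hi) can be encoded relative to `base`.
  bool range_is_encodable(uint64_t lo, uint64_t hi) const {
    return lo >= base && hi >= lo && (hi - base) <= max_offset();
  }

  // Compress an aligned address inside the encodable window.
  uint32_t encode(uint64_t addr) const {
    assert(addr >= base && (addr - base) < max_offset());
    assert(((addr - base) & ((uint64_t{1} << shift) - 1)) == 0);
    return static_cast<uint32_t>((addr - base) >> shift);
  }

  // Recover the full address from a narrow value.
  uint64_t decode(uint32_t narrow) const {
    return base + (static_cast<uint64_t>(narrow) << shift);
  }
};
```

With shift = 3 the window is 32GB wide, so the mapped class data and the class space together have to fit inside one such window starting at the chosen base.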
>>>>>>>>
>>>>>>>>       > Here are some additional comments (minor).
>>>>>>>>       >
>>>>>>>>       > Could you please fix the long lines in the following?
>>>>>>>>       >
>>>>>>>>       > 1237 void
>>>>>>>> java_lang_Class::update_archived_primitive_mirror_native_pointers(oop
>>>>>>>>       > archived_mirror) {
>>>>>>>>       > 1238   if (MetaspaceShared::relocation_delta() != 0) {
>>>>>>>>       > 1239  assert(archived_mirror->metadata_field(_klass_offset) ==
>>>>>>>>       > NULL, "must be for primitive class");
>>>>>>>>       > 1240
>>>>>>>>       > 1241     Klass* ak =
>>>>>>>>       > ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
>>>>>>>>       > 1242     if (ak != NULL) {
>>>>>>>>       > 1243  archived_mirror->metadata_field_put(_array_klass_offset,
>>>>>>>>       > (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
>>>>>>>>       > 1244     }
>>>>>>>>       > 1245   }
>>>>>>>>       > 1246 }
>>>>>>>>       >
>>>>>>>>       > src/hotspot/share/memory/dynamicArchive.cpp
>>>>>>>>       >
>>>>>>>>       >   889   Thread* THREAD = Thread::current();
>>>>>>>>       >   890   Method::sort_methods(ik->methods(), /*set_idnums=*/true,
>>>>>>>>       > dynamic_dump_method_comparator);
>>>>>>>>       >   891   if (ik->default_methods() != NULL) {
>>>>>>>>       >   892  Method::sort_methods(ik->default_methods(),
>>>>>>>>       > /*set_idnums=*/false, dynamic_dump_method_comparator);
>>>>>>>>       >   893   }
>>>>>>>>       >
>>>>>>>>
>>>>>>>>       OK will do.
>>>>>>>>
>>>>>>>>       > Please see inlined comments below.
>>>>>>>>       >
>>>>>>>>       > On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com
>>>>>>>>       <mailto:ioi.lam at oracle.com>> wrote:
>>>>>>>>       >> Hi Jiangli,
>>>>>>>>       >>
>>>>>>>>       >> Thanks for the review. I've updated the patch according to your
>>>>>>>>       comments:
>>>>>>>>       >>
>>>>>>>>       >>
>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
>>>>>>>>
>>>>>>>>       >>
>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
>>>>>>>>
>>>>>>>>       >>
>>>>>>>>       >> (the delta is on top of 8231610-relocate-cds-archive.v03.delta
>>>>>>>>       in my
>>>>>>>>       >> reply to Calvin's comments).
>>>>>>>>       >>
>>>>>>>>       >> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
>>>>>>>>       >>> Hi Ioi,
>>>>>>>>       >>>
>>>>>>>>       >>> Sorry for the delay. Here are my remaining comments.
>>>>>>>>       >>>
>>>>>>>>       >>> - src/hotspot/share/memory/dynamicArchive.cpp
>>>>>>>>       >>>
>>>>>>>>       >>> 128   static intx _method_comparator_name_delta;
>>>>>>>>       >>>
>>>>>>>>       >>> The name of the above variable is confusing. It's the value of
>>>>>>>>       >>> _buffer_to_target_delta. It's better to use _buffer_to_target_delta
>>>>>>>>       >>> directly.
>>>>>>>>       >> _buffer_to_target_delta is a non-static field, but
>>>>>>>>       >> dynamic_dump_method_comparator() must be a static function so
>>>>>>>>       it can't
>>>>>>>>       >> use the non-static field easily.
>>>>>>>>       >
>>>>>>>>       > It sounds like an issue. _buffer_to_target_delta was made as a
>>>>>>>>       > non-static mostly because we might support more than one dynamic
>>>>>>>>       > archive in the future. However, today's usages bake in an
>>>>>>>>       assumption
>>>>>>>>       > that _buffer_to_target_delta is a singleton value. It is
>>>>>>>> cleaner to
>>>>>>>>       > either make _buffer_to_target_delta as a static variable for
>>>>>>>> now, or
>>>>>>>>       > add an access API in DynamicArchiveBuilder to allow other
>>>>>>>> code to
>>>>>>>>       > properly and correctly use the value.
>>>>>>>>
>>>>>>>>       OK, I'll move it to a static variable.
>>>>>>>>
>>>>>>>>       >
>>>>>>>>       >>> Also, we can do a quick pointer comparison of 'a_name' and
>>>>>>>>       >>> 'b_name' first before adjusting the pointers.
>>>>>>>>       >> I added this:
>>>>>>>>       >>
>>>>>>>>       >>       if (a_name == b_name) {
>>>>>>>>       >>         return 0;
>>>>>>>>       >>       }
>>>>>>>>       >>
>>>>>>>>       >>> ---
>>>>>>>>       >>>
>>>>>>>>       >>> 934 void DynamicArchiveBuilder::relocate_buffer_to_target() {
>>>>>>>>       >>> ...
>>>>>>>>       >>>    944
>>>>>>>>       >>>    945  ArchivePtrMarker::compact(relocatable_base,
>>>>>>>>       relocatable_end);
>>>>>>>>       >>> ...
>>>>>>>>       >>>
>>>>>>>>       >>>    974     SharedDataRelocator patcher((address*)patch_base,
>>>>>>>>       >>> (address*)patch_end, valid_old_base, valid_old_end,
>>>>>>>>       >>>    975  valid_new_base, valid_new_end, addr_delta);
>>>>>>>>       >>>    976  ArchivePtrMarker::ptrmap()->iterate(&patcher);
>>>>>>>>       >>>
>>>>>>>>       >>> Could we reduce the number of data re-iterations to help
>>>>>>>> archive
>>>>>>>>       >>> dumping performance? The ArchivePtrMarker::compact operation
>>>>>>>>       can be
>>>>>>>>       >>> combined with the patching iteration.
>>>>>>>>       ArchivePtrMarker::compact API
>>>>>>>>       >>> can be removed.
>>>>>>>>       >> That's a good idea. I implemented it using a template parameter
>>>>>>>>       so that
>>>>>>>>       >> we can have max performance when relocating the archive at run
>>>>>>>>       time.
>>>>>>>>       >>
>>>>>>>>       >> I added comments to explain why the relocation is done here. The
>>>>>>>>       >> relocation is pretty rare (only when the base archive was not
>>>>>>>>       mapped at
>>>>>>>>       >> the default location).
>>>>>>>>       >>
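A minimal sketch of the idea discussed above: folding bitmap compaction into the patching pass and selecting it with a template parameter, so the run-time (non-compacting) instantiation pays nothing for it. The Relocator type, its fields, and the use of std::vector are assumptions made for illustration; this is not the SharedDataRelocator code from the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical relocator: marked[i] says slot i holds a pointer to patch.
// COMPACTING is a compile-time switch, so the `if (COMPACTING)` branches
// vanish from the non-compacting instantiation.
template <bool COMPACTING>
struct Relocator {
  std::vector<uintptr_t>& slots;
  std::vector<bool>&      marked;
  intptr_t                delta;
  size_t                  live_bits;  // one past the last bit still needed

  Relocator(std::vector<uintptr_t>& s, std::vector<bool>& m, intptr_t d)
      : slots(s), marked(m), delta(d), live_bits(0) {
    assert(slots.size() == marked.size());
  }

  // Single pass: patch every marked non-null slot; optionally compact.
  void run() {
    for (size_t i = 0; i < marked.size(); i++) {
      if (!marked[i]) continue;
      if (slots[i] == 0) {
        // A marked slot that ended up null needs no patching.
        if (COMPACTING) marked[i] = false;
        continue;
      }
      slots[i] += static_cast<uintptr_t>(delta);
      if (COMPACTING) live_bits = i + 1;
    }
    // Drop the trailing bits that no longer describe any pointer.
    if (COMPACTING) marked.resize(live_bits);
  }
};
```

Under these assumptions, dump time would use Relocator<true> (patch and shrink the bitmap in one iteration), while run-time relocation would use Relocator<false>.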
>>>>>>>>       >>> ---
>>>>>>>>       >>>
>>>>>>>>       >>>    967     address valid_new_base =
>>>>>>>>       >>> (address)Arguments::default_SharedBaseAddress();
>>>>>>>>       >>>    968     address valid_new_end  = valid_new_base +
>>>>>>>>       base_plus_top_size;
>>>>>>>>       >>>
>>>>>>>>       >>> The debugging only code can be included under #ifdef ASSERT.
>>>>>>>>       >> These values are actually also used in debug logging so they
>>>>>>>>       can't be
>>>>>>>>       >> ifdef'ed out.
>>>>>>>>       >>
>>>>>>>>       >> Also, the c++ compiler is pretty good with eliding code
>>>>>>>> that's not
>>>>>>>>       >> actually used. If I comment out all the logging code in
>>>>>>>>       >> DynamicArchiveBuilder::relocate_buffer_to_target() and
>>>>>>>>       >> SharedDataRelocator, gcc elides all the unused fields and their
>>>>>>>>       >> assignments. So no code is generated for this, etc.
>>>>>>>>       >>
>>>>>>>>       >>       address valid_new_base =
>>>>>>>>       >> (address)Arguments::default_SharedBaseAddress();
>>>>>>>>       >>
>>>>>>>>       >> Since #ifdef ASSERT makes the code harder to read, I think we
>>>>>>>>       should use
>>>>>>>>       >> it only when really necessary.
>>>>>>>>       > It seems cleaner to get rid of these debugging only variables, by
>>>>>>>>       > using 'relocatable_base' and
>>>>>>>>       > '(address)Arguments::default_SharedBaseAddress()' in the logging
>>>>>>>>       code.
>>>>>>>>
>>>>>>>>       SharedDataRelocator is used under 3 different situations. These six
>>>>>>>>       variables (patch_base, patch_end, valid_old_base, valid_old_end,
>>>>>>>>       valid_new_base, valid_new_end) describe what is being patched,
>>>>>>>>       and what
>>>>>>>>       the expectations are, for each situation. The code will be hard to
>>>>>>>>       understand without them.
>>>>>>>>
>>>>>>>>       Please note there's also logging code in the SharedDataRelocator
>>>>>>>>       constructor that prints out these values.
>>>>>>>>
>>>>>>>>       I think I'll just remove the 'debug only' comment to avoid
>>>>>>>> confusion.
>>>>>>>>
>>>>>>>>
>>>>>>>> Ok.
>>>>>>>>
>>>>>>>>
>>>>>>>>       >
>>>>>>>>       >>> ---
>>>>>>>>       >>>
>>>>>>>>       >>>    993
>>>>>>>>    dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
>>>>>>>>       >>>
>>>>>>>>       >>> We could combine the archived heap data bitmap into the new
>>>>>>>>       region as
>>>>>>>>       >>> well? It can be handled as a separate RFE.
>>>>>>>>       >> I've filed https://bugs.openjdk.java.net/browse/JDK-8233093
>>>>>>>>       >>
>>>>>>>>       >>> - src/hotspot/share/memory/filemap.cpp
>>>>>>>>       >>>
>>>>>>>>       >>> 1038     if (is_static()) {
>>>>>>>>       >>> 1039       if (errno == ENOENT) {
>>>>>>>>       >>> 1040         // Not locating the shared archive is ok.
>>>>>>>>       >>> 1041         fail_continue("Specified shared archive not found
>>>>>>>>       (%s).",
>>>>>>>>       >>> _full_path);
>>>>>>>>       >>> 1042       } else {
>>>>>>>>       >>> 1043         fail_continue("Failed to open shared archive file
>>>>>>>>       (%s).",
>>>>>>>>       >>> 1044  os::strerror(errno));
>>>>>>>>       >>> 1045       }
>>>>>>>>       >>> 1046     } else {
>>>>>>>>       >>> 1047       log_warning(cds, dynamic)("specified dynamic archive
>>>>>>>>       >>> doesn't exist: %s", _full_path);
>>>>>>>>       >>> 1048     }
>>>>>>>>       >>>
>>>>>>>>       >>> If the top layer is explicitly specified by the user, a
>>>>>>>>       warning does
>>>>>>>>       >>> not seem to be a proper behavior if the VM fails to open the
>>>>>>>>       archive
>>>>>>>>       >>> file.
>>>>>>>>       >>>
>>>>>>>>       >>> It might be better to handle the relocation unrelated code in
>>>>>>>>       separate
>>>>>>>>       >>> changeset and track with a separate RFE.
>>>>>>>>       >> This code was moved from
>>>>>>>>       >>
>>>>>>>>       >>
>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
>>>>>>>>
>>>>>>>>       >>
>>>>>>>>       >> so I am not changing the behavior. If you want, we can file an
>>>>>>>>       RFE to
>>>>>>>>       >> change the behavior.
>>>>>>>>       > Ok. A new RFE sounds like the right thing to re-evaluate the
>>>>>>>> usage
>>>>>>>>       > issue here. Thanks.
>>>>>>>>
>>>>>>>>       I created https://bugs.openjdk.java.net/browse/JDK-8233446
>>>>>>>>
>>>>>>>>       >>> ---
>>>>>>>>       >>>
>>>>>>>>       >>> 1148 void FileMapInfo::write_region(int region, char* base,
>>>>>>>>       size_t size,
>>>>>>>>       >>> 1149                                bool read_only, bool
>>>>>>>>       allow_exec) {
>>>>>>>>       >>> ...
>>>>>>>>       >>> 1154
>>>>>>>>       >>> 1155   if (region == MetaspaceShared::bm) {
>>>>>>>>       >>> 1156     target_base = NULL;
>>>>>>>>       >>> 1157   } else if (DynamicDumpSharedSpaces) {
>>>>>>>>       >>>
>>>>>>>>       >>> It's not too clear to me how the bitmap (bm) region is handled
>>>>>>>>       for the
>>>>>>>>       >>> base layer and top layer. Could you please explain?
>>>>>>>>       >> The bm region for both layers are mapped at an address picked
>>>>>>>>       by the OS:
>>>>>>>>       >>
>>>>>>>>       >> char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
>>>>>>>>       >>     FileMapRegion* si = space_at(MetaspaceShared::bm);
>>>>>>>>       >>     bitmap_size = si->used_aligned();
>>>>>>>>       >>     bool read_only = true, allow_exec = false;
>>>>>>>>       >>     char* requested_addr = NULL; // allow OS to pick any
>>>>>>>> location
>>>>>>>>       >>     char* bitmap_base = os::map_memory(_fd, _full_path,
>>>>>>>>       si->file_offset(),
>>>>>>>>       >> requested_addr, bitmap_size,
>>>>>>>>       >> read_only, allow_exec);
>>>>>>>>       >>
>>>>>>>>       > Ok, after staring at the code for a few seconds I saw that's
>>>>>>>>       intended.
>>>>>>>>       > If the current region is 'bm', then the 'target_base' is NULL
>>>>>>>>       > regardless if it's static or dynamic archive. Otherwise, the
>>>>>>>>       > 'target_base' is handled differently for the static and dynamic
>>>>>>>>       case.
>>>>>>>>       > The following would be cleaner and has better reliability.
>>>>>>>>       >
>>>>>>>>       >     char* target_base = NULL;
>>>>>>>>       >
>>>>>>>>       >     // The target_base is NULL for 'bm' region.
>>>>>>>>       >     if (region != MetaspaceShared::bm) {
>>>>>>>>       >       if (DynamicDumpSharedSpaces) {
>>>>>>>>       >         assert(!HeapShared::is_heap_region(region), "dynamic
>>>>>>>> archive
>>>>>>>>       > doesn't support heap regions");
>>>>>>>>       >         target_base = DynamicArchive::buffer_to_target(base);
>>>>>>>>       >       } else {
>>>>>>>>       >         target_base = base;
>>>>>>>>       >       }
>>>>>>>>       >    }
>>>>>>>>
>>>>>>>>       How about this?
>>>>>>>>
>>>>>>>>          char* target_base;
>>>>>>>>          if (region == MetaspaceShared::bm) {
>>>>>>>>            target_base = NULL; // always NULL for bm region.
>>>>>>>>          } else {
>>>>>>>>            if (DynamicDumpSharedSpaces) {
>>>>>>>>                assert(!HeapShared::is_heap_region(region), "dynamic
>>>>>>>> archive
>>>>>>>>       doesn't support heap regions");
>>>>>>>>                target_base = DynamicArchive::buffer_to_target(base);
>>>>>>>>            } else {
>>>>>>>>                target_base = base;
>>>>>>>>            }
>>>>>>>>          }
>>>>>>>>
>>>>>>>>
>>>>>>>> No objection if you prefer the extra 'else' block.
>>>>>>>>
>>>>>>>>
>>>>>>>>       >
>>>>>>>>       >>> ---
>>>>>>>>       >>>
>>>>>>>>       >>> 1362
>>>>>>>>    DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>>>>>>>>
>>>>>>>>       >>>
>>>>>>>>       >>> Could you please explain the above?
>>>>>>>>       >> I added the comments
>>>>>>>>       >>
>>>>>>>>       >>     // Make sure we don't attempt to use
>>>>>>>>       header()->mapped_base_address()
>>>>>>>>       >> unless
>>>>>>>>       >>     // it's been successfully mapped.
>>>>>>>>       >>
>>>>>>>> DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>>>>>>>>
>>>>>>>>       >>
>>>>>>>>       >>> ---
>>>>>>>>       >>>
>>>>>>>>       >>> 1359   FileMapRegion* last_region = NULL;
>>>>>>>>       >>>
>>>>>>>>       >>> 1371     if (last_region != NULL) {
>>>>>>>>       >>> 1372       // Ensure that the OS won't be able to allocate new
>>>>>>>>       memory
>>>>>>>>       >>> spaces between any mapped
>>>>>>>>       >>> 1373       // regions, or else it would mess up the simple
>>>>>>>>       comparison
>>>>>>>>       >>> in MetaspaceObj::is_shared().
>>>>>>>>       >>> 1374       assert(si->mapped_base() ==
>>>>>>>> last_region->mapped_end(),
>>>>>>>>       >>> "must have no gaps");
>>>>>>>>       >>>
>>>>>>>>       >>> 1379     last_region = si;
>>>>>>>>       >>>
>>>>>>>>       >>> Can you please place 'last_region' related code under #ifdef
>>>>>>>>       ASSERT?
>>>>>>>>       >> I think that will make the code more cluttered. The compiler
>>>>>>>> will
>>>>>>>>       >> optimize out that away.
>>>>>>>>       > It's cleaner to define debugging only variable for debugging only
>>>>>>>>       > builds. You can wrap it and related usage with DEBUG_ONLY.
>>>>>>>>
>>>>>>>>       OK, will do.
>>>>>>>>
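A minimal sketch of the pattern being agreed on above: a variable that exists only to support an assert is declared and updated inside a debug-only macro, so product builds never see it. SKETCH_DEBUG_ONLY, Region, and total_mapped are hypothetical names, and the macro is keyed on NDEBUG here as an assumption, whereas HotSpot keys its DEBUG_ONLY on ASSERT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical equivalent of a debug-only wrapper macro.
#ifndef NDEBUG
#define SKETCH_DEBUG_ONLY(code) code
#else
#define SKETCH_DEBUG_ONLY(code)
#endif

struct Region { size_t base; size_t end; };

// Sums the region sizes; debug builds also verify there are no gaps
// between consecutive regions.
static size_t total_mapped(const std::vector<Region>& regions) {
  size_t total = 0;
  SKETCH_DEBUG_ONLY(const Region* last_region = nullptr;)
  for (const Region& r : regions) {
    SKETCH_DEBUG_ONLY(
      if (last_region != nullptr) {
        assert(r.base == last_region->end && "must have no gaps");
      }
      last_region = &r;
    )
    total += r.end - r.base;
  }
  return total;
}
```

Because both the declaration and every use sit inside the macro, a product build contains no trace of last_region and produces no unused-variable warning.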
>>>>>>>>       >
>>>>>>>>       >>> ---
>>>>>>>>       >>>
>>>>>>>>       >>> 1478 char* FileMapInfo::map_relocation_bitmap(size_t&
>>>>>>>>       bitmap_size) {
>>>>>>>>       >>> 1479   FileMapRegion* si = space_at(MetaspaceShared::bm);
>>>>>>>>       >>> 1480   bitmap_size = si->used_aligned();
>>>>>>>>       >>> 1481   bool read_only = true, allow_exec = false;
>>>>>>>>       >>> 1482   char* requested_addr = NULL; // allow OS to pick any
>>>>>>>>       location
>>>>>>>>       >>> 1483   char* bitmap_base = os::map_memory(_fd, _full_path,
>>>>>>>>       si->file_offset(),
>>>>>>>>       >>> 1484 requested_addr, bitmap_size,
>>>>>>>>       >>> read_only, allow_exec);
>>>>>>>>       >>>
>>>>>>>>       >>> We need to handle mapping failure here.
>>>>>>>>       >> It's handled here:
>>>>>>>>       >>
>>>>>>>>       >> bool FileMapInfo::relocate_pointers(intx addr_delta) {
>>>>>>>>       >>     log_debug(cds, reloc)("runtime archive relocation start");
>>>>>>>>       >>     size_t bitmap_size;
>>>>>>>>       >>     char* bitmap_base = map_relocation_bitmap(bitmap_size);
>>>>>>>>       >>     if (bitmap_base != NULL) {
>>>>>>>>       >>     ...
>>>>>>>>       >>     } else {
>>>>>>>>       >>       log_error(cds)("failed to map relocation bitmap");
>>>>>>>>       >>       return false;
>>>>>>>>       >>     }
>>>>>>>>       >>
>>>>>>>>       > 'bitmap_base' is used immediately after map_memory(). So the
>>>>>>>> check
>>>>>>>>       > needs to be done immediately after map_memory(), but not in the
>>>>>>>>       caller
>>>>>>>>       > of map_relocation_bitmap().
>>>>>>>>       >
>>>>>>>>       > 1490   char* bitmap_base = os::map_memory(_fd, _full_path,
>>>>>>>>       si->file_offset(),
>>>>>>>>       > 1491 requested_addr, bitmap_size,
>>>>>>>>       > read_only, allow_exec);
>>>>>>>>       > 1492
>>>>>>>>       > 1493   if (VerifySharedSpaces && bitmap_base != NULL &&
>>>>>>>>       > !region_crc_check(bitmap_base, bitmap_size, si->crc())) {
>>>>>>>>
>>>>>>>>       OK, I'll fix that.
>>>>>>>>
>>>>>>>>       >
>>>>>>>>       >
>>>>>>>>       >>> ---
>>>>>>>>       >>>
>>>>>>>>       >>> 1513     // debug only -- the current value of the pointers
>>>>>>>> to be
>>>>>>>>       >>> patched must be within this
>>>>>>>>       >>> 1514     // range (i.e., must be between the requested base
>>>>>>>>       address,
>>>>>>>>       >>> and the end of the current archive).
>>>>>>>>       >>> 1515     // Note: top archive may point to objects in the base
>>>>>>>>       >>> archive, but not the other way around.
>>>>>>>>       >>> 1516     address valid_old_base =
>>>>>>>>       (address)header()->requested_base_address();
>>>>>>>>       >>> 1517     address valid_old_end  = valid_old_base +
>>>>>>>>       mapping_end_offset();
>>>>>>>>       >>>
>>>>>>>>       >>> Please place all FileMapInfo::relocate_pointers debugging only
>>>>>>>>       code
>>>>>>>>       >>> under #ifdef ASSERT.
>>>>>>>>       >> Ditto about ifdef ASSERT
>>>>>>>>       >>
>>>>>>>>       >>> - src/hotspot/share/memory/heapShared.cpp
>>>>>>>>       >>>
>>>>>>>>       >>>    441 void
>>>>>>>>       HeapShared::initialize_from_archived_subgraph(Klass* k) {
>>>>>>>>       >>>    442   if (!open_archive_heap_region_mapped() ||
>>>>>>>>       !MetaspaceObj::is_shared(k)) {
>>>>>>>>       >>>    443     return; // nothing to do
>>>>>>>>       >>>    444   }
>>>>>>>>       >>>
>>>>>>>>       >>> When do we call HeapShared::initialize_from_archived_subgraph
>>>>>>>>       for a
>>>>>>>>       >>> klass that's not shared?
>>>>>>>>       >> I've removed the !MetaspaceObj::is_shared(k). I probably added
>>>>>>>>       that for
>>>>>>>>       >> debugging purposes only.
>>>>>>>>       >>
>>>>>>>>       >>>    616   DEBUG_ONLY({
>>>>>>>>       >>>    617       Klass* klass = orig_obj->klass();
>>>>>>>>       >>>    618       assert(klass !=
>>>>>>>> SystemDictionary::Module_klass() &&
>>>>>>>>       >>>    619              klass !=
>>>>>>>>       SystemDictionary::ResolvedMethodName_klass() &&
>>>>>>>>       >>>    620              klass !=
>>>>>>>>       SystemDictionary::MemberName_klass() &&
>>>>>>>>       >>>    621              klass !=
>>>>>>>> SystemDictionary::Context_klass() &&
>>>>>>>>       >>>    622              klass !=
>>>>>>>>       SystemDictionary::ClassLoader_klass(), "we
>>>>>>>>       >>> can only relocate metaspace object pointers inside
>>>>>>>> java_lang_Class
>>>>>>>>       >>> instances");
>>>>>>>>       >>>    623     });
>>>>>>>>       >>>
>>>>>>>>       >>> Let's leave the above for a separate RFE. I think assert is not
>>>>>>>>       >>> sufficient for the check. Also, why ResolvedMethodName,
>>>>>>>> Module and
>>>>>>>>       >>> MemberName cannot be part of the graph?
>>>>>>>>       >>>
>>>>>>>>       >>>
>>>>>>>>       >> I added the following comment:
>>>>>>>>       >>
>>>>>>>>       >>     DEBUG_ONLY({
>>>>>>>>       >>         // The following are classes in
>>>>>>>>       share/classfile/javaClasses.cpp
>>>>>>>>       >> that have injected native pointers
>>>>>>>>       >>         // to metaspace objects. To support these classes, we
>>>>>>>>       need to add
>>>>>>>>       >> relocation code similar to
>>>>>>>>       >>         //
>>>>>>>> java_lang_Class::update_archived_mirror_native_pointers.
>>>>>>>>       >>         Klass* klass = orig_obj->klass();
>>>>>>>>       >>         assert(klass != SystemDictionary::Module_klass() &&
>>>>>>>>       >>                klass !=
>>>>>>>>       SystemDictionary::ResolvedMethodName_klass() &&
>>>>>>>>       >>
>>>>>>>>       > It's too restrictive to exclude those objects from the archived
>>>>>>>>       object
>>>>>>>>       > graph because of metadata relocation, since metadata relocation is
>>>>>>>>       rare.
>>>>>>>>       > The trade-off doesn't seem to buy us much.
>>>>>>>>       >
>>>>>>>>       > Do you plan to add the needed relocation code?
>>>>>>>>
>>>>>>>>       I looked more into this. Actually we cannot handle these 5
>>>>>>>> classes at
>>>>>>>>       all, even without archive relocation:
>>>>>>>>
>>>>>>>>       [1] #define MODULE_INJECTED_FIELDS(macro) \
>>>>>>>>          macro(java_lang_Module, module_entry, intptr_signature, false)
>>>>>>>>
>>>>>>>>       ->  module_entry is malloc'ed
>>>>>>>>
>>>>>>>>       [2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
>>>>>>>>          macro(java_lang_invoke_ResolvedMethodName, vmholder,
>>>>>>>>       object_signature, false) \
>>>>>>>>          macro(java_lang_invoke_ResolvedMethodName, vmtarget,
>>>>>>>>       intptr_signature, false)
>>>>>>>>
>>>>>>>>       -> these fields are related to method handles and lambda forms,
>>>>>>>> etc.
>>>>>>>>       They can't easily be archived without implementing lambda form
>>>>>>>>       archiving. (I did a prototype; it's very complex and fragile).
>>>>>>>>
>>>>>>>>       [3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
>>>>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
>>>>>>>>       vmdependencies, intptr_signature, false) \
>>>>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
>>>>>>>>       last_cleanup, long_signature, false)
>>>>>>>>
>>>>>>>>       -> vmdependencies is malloc'ed.
>>>>>>>>
>>>>>>>>       [4] #define
>>>>>>>> MEMBERNAME_INJECTED_FIELDS(macro) \
>>>>>>>>          macro(java_lang_invoke_MemberName, vmindex, intptr_signature,
>>>>>>>>       false)
>>>>>>>>
>>>>>>>>       -> this one is probably OK. Despite being declared as
>>>>>>>>       'intptr_signature', it seems to be used just as an integer.
>>>>>>>> However,
>>>>>>>>       MemberNames are typically used with [2] and [3]. So let's just
>>>>>>>>       forbid it
>>>>>>>>       to be safe.
>>>>>>>>
>>>>>>>>       [2] [3] [4] are not used directly by regular Java code and are
>>>>>>>>       unlikely
>>>>>>>>       to be referenced (directly or indirectly) by static fields (except
>>>>>>>>       for
>>>>>>>>       the static fields in the classes in java.lang.invoke, which we
>>>>>>>>       probably
>>>>>>>>       won't support for heap archiving due to the problem I described for
>>>>>>>>       [2]). Objects of these types are typically referenced via constant
>>>>>>>>       pool
>>>>>>>>       entries.
>>>>>>>>
>>>>>>>>       [5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
>>>>>>>>          macro(java_lang_ClassLoader, loader_data, intptr_signature,
>>>>>>>> false)
>>>>>>>>
>>>>>>>>       -> loader_data is malloc'ed.
>>>>>>>>
>>>>>>>>       So, I will change the DEBUG_ONLY into a product-mode check, and
>>>>>>>> quit
>>>>>>>>       dumping if these objects are found in the object subgraph.
>>>>>>>>
>>>>>>>>
>>>>>>>> Sounds good. Can you please also add a comment with explanation.
>>>>>>>>
>>>>>>>> For ClassLoader and Module, it is worth considering caching the
>>>>>>>> additional native data some time in the future. Lois had suggested
>>>>>>>> the Module part a while ago.
>>>>>>> I think we can do that if/when we archive Modules directly into the
>>>>>>> shared heap.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>       Maybe we should backport the check to older versions as well?
>>>>>>>>
>>>>>>>>
>>>>>>>> We should discuss with Andrew Haley for backports to JDK 11 update
>>>>>>>> releases. Since the current OpenJDK 11 only applies Java heap
>>>>>>>> archiving to a restricted set of JDK library code, I think it is
>>>>>>>> safe without the new check.
>>>>>>>>
>>>>>>>> For non-LTS releases, it might not be worthwhile as they may not be
>>>>>>>> widely used?
>>>>>>> I agree. FYI, we (Oracle) have no plan for backporting more types of
>>>>>>> heap object archiving, so the decision would be up to whoever
>>>>>>> decides to do so.
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jiangli
>>>>>>>>
>>>>>>>>
>>>>>>>>       >
>>>>>>>>       >>> - src/hotspot/share/memory/metaspace.cpp
>>>>>>>>       >>>
>>>>>>>>       >>> 1036   metaspace_rs =
>>>>>>>> ReservedSpace(compressed_class_space_size(),
>>>>>>>>       >>> 1037   _reserve_alignment,
>>>>>>>>       >>> 1038   large_pages,
>>>>>>>>       >>> 1039   requested_addr);
>>>>>>>>       >>>
>>>>>>>>       >>> Please fix indentation.
>>>>>>>>       >> Fixed.
>>>>>>>>       >>
>>>>>>>>       >>> - src/hotspot/share/memory/metaspaceClosure.hpp
>>>>>>>>       >>>
>>>>>>>>       >>>     78   enum SpecialRef {
>>>>>>>>       >>>     79     _method_entry_ref
>>>>>>>>       >>>     80   };
>>>>>>>>       >>>
>>>>>>>>       >>> Are there other pointers that are not references to
>>>>>>>>       MetaspaceObj? If
>>>>>>>>       >>> _method_entry_ref is the only type, it's probably not worth
>>>>>>>>       defining
>>>>>>>>       >>> SpecialRef?
>>>>>>>>       >> There may be more types in the future, so I want to have a
>>>>>>>>       stable API
>>>>>>>>       >> that can be easily expanded without touching all the code that
>>>>>>>>       uses it.
>>>>>>>>       >>
>>>>>>>>       >>
>>>>>>>>       >>> - src/hotspot/share/memory/metaspaceShared.hpp
>>>>>>>>       >>>
>>>>>>>>       >>>     42 enum MapArchiveResult {
>>>>>>>>       >>>     43   MAP_ARCHIVE_SUCCESS,
>>>>>>>>       >>>     44   MAP_ARCHIVE_MMAP_FAILURE,
>>>>>>>>       >>>     45   MAP_ARCHIVE_OTHER_FAILURE
>>>>>>>>       >>>     46 };
>>>>>>>>       >>>
>>>>>>>>       >>> If we want to define different failure types, it's probably
>>>>>>>> worth
>>>>>>>>       >>> using separate types for relocation failure and validation
>>>>>>>>       failure.
>>>>>>>>       >> For now, I just need to distinguish between MMAP_FAILURE (where
>>>>>>>>       I should
>>>>>>>>       >> attempt to remap at an alternative address) and OTHER_FAILURE
>>>>>>>>       (where the
>>>>>>>>       >> CDS archive loading will fail -- due to validation error,
>>>>>>>>       insufficient
>>>>>>>>       >> memory, etc -- without attempting to remap.)
>>>>>>>>       >>
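A minimal sketch of how the two failure kinds described above could drive the caller: only MAP_ARCHIVE_MMAP_FAILURE triggers a second attempt at an alternative address, while MAP_ARCHIVE_OTHER_FAILURE is returned as-is. The enum is copied from the quoted header; map_with_fallback, the try_map callback, and the convention that address 0 means "let the OS pick" are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

enum MapArchiveResult {
  MAP_ARCHIVE_SUCCESS,
  MAP_ARCHIVE_MMAP_FAILURE,
  MAP_ARCHIVE_OTHER_FAILURE
};

// Hypothetical driver. try_map(addr) attempts the mapping at addr;
// addr == 0 is assumed to mean "any address the OS chooses".
static MapArchiveResult map_with_fallback(
    const std::function<MapArchiveResult(uintptr_t)>& try_map,
    uintptr_t requested_addr) {
  assert(try_map && "a mapping callback is required");
  MapArchiveResult result = try_map(requested_addr);
  if (result == MAP_ARCHIVE_MMAP_FAILURE) {
    // The requested range was unavailable: retry elsewhere. The archive
    // would then need its pointers relocated.
    result = try_map(0);
  }
  // Validation errors and similar failures are not retried.
  return result;
}
```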
>>>>>>>>       >>> ---
>>>>>>>>       >>>
>>>>>>>>       >>>    193   static intx _mapping_delta; // FIXME rename
>>>>>>>>       >>>
>>>>>>>>       >>> How about _relocation_delta?
>>>>>>>>       >> Changed as suggested.
>>>>>>>>       >>
>>>>>>>>       >>> - src/hotspot/share/oops/instanceKlass
>>>>>>>>       >>>
>>>>>>>>       >>> 1573 bool InstanceKlass::_disable_method_binary_search = false;
>>>>>>>>       >>>
>>>>>>>>       >>> The use of _disable_method_binary_search is not necessary. You
>>>>>>>>       can use
>>>>>>>>       >>> DynamicDumpSharedSpaces for the purpose. That would make things
>>>>>>>>       >>> cleaner.
>>>>>>>>       >> If we always disable the binary search when
>>>>>>>>       DynamicDumpSharedSpaces is
>>>>>>>>       >> true, it will slow down normal execution of the Java program
>>>>>>>> when
>>>>>>>>       >> -XX:ArchiveClassesAtExit has been specified, but the program
>>>>>>>>       hasn't exited.
>>>>>>>>       > Could you please add some comments to
>>>>>>>> _disable_method_binary_search
>>>>>>>>       > with the above explanation? Thanks.
>>>>>>>>
>>>>>>>>       OK
>>>>>>>>       >
>>>>>>>>       >>> - test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
>>>>>>>>       >>>
>>>>>>>>       >>>     76                     if (name.equals("s0") ||
>>>>>>>>       name.equals("s1")) {
>>>>>>>>       >>>     77                       // String regions are listed at
>>>>>>>>       the end and
>>>>>>>>       >>> they may not be fully occupied.
>>>>>>>>       >>>     78                       break;
>>>>>>>>       >>>     79                     } else if (name.equals("bm")) {
>>>>>>>>       >>>     80                       // Bitmap space does not have a
>>>>>>>>       requested address.
>>>>>>>>       >>>     81                       break;
>>>>>>>>       >>>
>>>>>>>>       >>> It's not part of your change, but could you please fix line 76
>>>>>>>>       - 78
>>>>>>>>       >>> since it is trivial. It seems the lines can be removed.
>>>>>>>>       >> Removed.
>>>>>>>>       >>
>>>>>>>>       >>> - /src/hotspot/share/memory/archiveUtils.hpp
>>>>>>>>       >>> The file name does not match with the macro '#ifndef
>>>>>>>>       >>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please rename
>>>>>>>>       >>> archiveUtils.* ? archiveRelocator.hpp and
>>>>>>>> archiveRelocator.cpp are
>>>>>>>>       >>> more descriptive.
>>>>>>>>       >> I named the file archiveUtils.hpp so we can move other misc
>>>>>>>>       stuff used
>>>>>>>>       >> by dumping into this file (e.g., DumpRegion, WriteClosure from
>>>>>>>>       >> metaspaceShared.hpp), since these are not used by the majority
>>>>>>>>       of the
>>>>>>>>       >> files that use metaspaceShared.hpp.
>>>>>>>>       >>
>>>>>>>>       >> I fixed the ifdef.
>>>>>>>>       >>
>>>>>>>>       >>> - src/hotspot/share/memory/archiveUtils.cpp
>>>>>>>>       >>>
>>>>>>>>       >>>     36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap,
>>>>>>>>       address*
>>>>>>>>       >>> ptr_base, address* ptr_end) {
>>>>>>>>       >>>     37   assert(_ptrmap == NULL, "initialize only once");
>>>>>>>>       >>>     38   _ptr_base = ptr_base;
>>>>>>>>       >>>     39   _ptr_end = ptr_end;
>>>>>>>>       >>>     40   _compacted = false;
>>>>>>>>       >>>     41   _ptrmap = ptrmap;
>>>>>>>>       >>>     42   _ptrmap->initialize(12 * M / sizeof(intptr_t)); //
>>>>>>>>       default
>>>>>>>>       >>> archive is about 12MB.
>>>>>>>>       >>>     43 }
>>>>>>>>       >>>
>>>>>>>>       >>> Could we do a better estimate here? We could guesstimate the
>>>>>>>> size
>>>>>>>>       >>> based on the current used class space and metaspace size. It's
>>>>>>>>       okay if
>>>>>>>>       >>> a larger bitmap is used, since it can be reduced after all
>>>>>>>>       marking is
>>>>>>>>       >>> done.
>>>>>>>>       >> The bitmap is automatically expanded when necessary in
>>>>>>>>       >> ArchivePtrMarker::mark_pointer(). It's only about 1/32 or 1/64
>>>>>>>>       of the
>>>>>>>>       >> total archive size, so even if we do expand, the cost will be
>>>>>>>>       trivial.
>>>>>>>>       > The initial value is based on the default CDS archive. When
>>>>>>>> dealing
>>>>>>>>       > with a really large archive, it would have to re-grow many times.
>>>>>>>>       > Also, using a hard-coded value is less desirable.
>>>>>>>>
>>>>>>>>       OK, I changed it to the following
>>>>>>>>
>>>>>>>>          // Use this as initial guesstimate. We should need less space
>>>>>>>>       in the
>>>>>>>>          // archive, but if we're wrong the bitmap will be expanded
>>>>>>>>       automatically.
>>>>>>>>          size_t estimated_archive_size =
>>>>>>>> MetaspaceGC::capacity_until_GC();
>>>>>>>>          // But set it smaller in debug builds so we always test the
>>>>>>>>       expansion
>>>>>>>>       code.
>>>>>>>>          // (Default archive is about 12MB).
>>>>>>>>          DEBUG_ONLY(estimated_archive_size = 6 * M);
>>>>>>>>
>>>>>>>>          // We need one bit per pointer in the archive.
>>>>>>>>          _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
>>>>>>>>
>>>>>>>>
>>>>>>>>       Thanks!
>>>>>>>>       - Ioi
>>>>>>>>
>>>>>>>>       >
>>>>>>>>       >>>
>>>>>>>>       >>>
>>>>>>>>       >>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou
>>>>>>>>       <jianglizhou at google.com <mailto:jianglizhou at google.com>> wrote:
>>>>>>>>       >>>> Hi Ioi,
>>>>>>>>       >>>>
>>>>>>>>       >>>> This is another great step for CDS usability improvement.
>>>>>>>>       Thank you!
>>>>>>>>       >>>>
>>>>>>>>       >>>> I have a high level question (or request): could we consider
>>>>>>>>       >>>> separating the relocation work for 'direct' class metadata
>>>>>>>>       from other
>>>>>>>>       >>>> types of metadata (such as the shared system dictionary,
>>>>>>>>       symbol table,
>>>>>>>>       >>>> etc)? Initially we only relocate the tables and other
>>>>>>>>       archived global
>>>>>>>>       >>>> data. When each archived class is being loaded, we can
>>>>>>>>       relocate all
>>>>>>>>       >>>> the pointers within the current class. We could find the
>>>>>>>>       segment (for
>>>>>>>>       >>>> the current class) in the bitmap and update the pointers
>>>>>>>>       within the
>>>>>>>>       >>>> segment. That way we can reduce initial startup costs and
>>>>>>>>       also avoid
>>>>>>>>       >>>> relocating class data that's not used at runtime. In some
>>>>>>>>       real world
>>>>>>>>       >>>> large systems, an archive may contain extremely large
>>>>>>>> number of
>>>>>>>>       >>>> classes.
>>>>>>>>       >>>>
>>>>>>>>       >>>> Following are partial review comments so we can move things
>>>>>>>>       forward.
>>>>>>>>       >>>> Still going through the rest of the changes.
>>>>>>>>       >>>>
>>>>>>>>       >>>> - src/hotspot/share/classfile/javaClasses.cpp
>>>>>>>>       >>>>
>>>>>>>>       >>>> 1218 void
>>>>>>>> java_lang_Class::update_archived_mirror_native_pointers(oop
>>>>>>>>       >>>> archived_mirror) {
>>>>>>>>       >>>> 1219   Klass* k =
>>>>>>>> ((Klass*)archived_mirror->metadata_field(_klass_offset));
>>>>>>>>       >>>> 1220   if (k != NULL) { // k is NULL for the primitive
>>>>>>>>       classes such as
>>>>>>>>       >>>> java.lang.Byte::TYPE <<<<<<<<<<<
>>>>>>>>       >>>> 1221  archived_mirror->metadata_field_put(_klass_offset,
>>>>>>>>       >>>> (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
>>>>>>>>       >>>> 1222   }
>>>>>>>>       >>>> 1223 ...
>>>>>>>>       >>>>
>>>>>>>>       >>>> Primitive type mirrors are handled separately. Could you
>>>>>>>>       please verify
>>>>>>>>       >>>> if this call path happens for primitive type mirror?
>>>>>>>>       >>>>
>>>>>>>>       >>>> To answer my question above, looks like you added the
>>>>>>>>       following, which
>>>>>>>>       >>>> is to be used for primitive type mirrors. That seems to be
>>>>>>>>       the reason
>>>>>>>>       >>>> why update_archived_mirror_native_pointers is trying to also
>>>>>>>>       cover
>>>>>>>>       >>>> primitive type. It is better to have a separate API for
>>>>>>>>       primitive type
>>>>>>>>       >>>> mirror, which is cleaner. And, we also can replace the above
>>>>>>>>       check at
>>>>>>>>       >>>> line 1220 to be an assert for regular mirrors.
>>>>>>>>       >>>>
>>>>>>>>       >>>> +void ReadClosure::do_mirror_oop(oop *p) {
>>>>>>>>       >>>> +  do_oop(p);
>>>>>>>>       >>>> +  oop mirror = *p;
>>>>>>>>       >>>> +  if (mirror != NULL) {
>>>>>>>>       >>>> +
>>>>>>>> java_lang_Class::update_archived_mirror_native_pointers(mirror);
>>>>>>>>       >>>> +  }
>>>>>>>>       >>>> +}
>>>>>>>>       >>>> +
>>>>>>>>       >>>>
>>>>>>>>       >>>> How about renaming update_archived_mirror_native_pointers to
>>>>>>>>       >>>> update_archived_mirror_klass_pointers.
>>>>>>>>       >>>>
>>>>>>>>       >>>> It would be good to pass the current klass as an argument.
>>>>>>>> We can
>>>>>>>>       >>>> verify the relocated pointer matches with the current klass
>>>>>>>>       pointer.
>>>>>>>>       >>>>
>>>>>>>>       >>>> We should also check if relocation is necessary before
>>>>>>>>       spending cycles
>>>>>>>>       >>>> to obtain the klass pointer from the mirror.
>>>>>>>>       >>>>
>>>>>>>>       >>>> 1252  update_archived_mirror_native_pointers(m);
>>>>>>>>       >>>> 1253
>>>>>>>>       >>>> 1254   // mirror is archived, restore
>>>>>>>>       >>>> 1255  assert(HeapShared::is_archived_object(m), "must be
>>>>>>>> archived
>>>>>>>>       >>>> mirror object");
>>>>>>>>       >>>> 1256   Handle mirror(THREAD, m);
>>>>>>>>       >>>>
>>>>>>>>       >>>> Could we move the line at 1252 after the assert at line 1255?
>>>>>>>>       >>>>
>>>>>>>>       >>>> - src/hotspot/share/include/cds.h
>>>>>>>>       >>>>
>>>>>>>>       >>>>     47   int     _mapped_from_file;  // Is this region mapped
>>>>>>>>       from a file?
>>>>>>>>       >>>>     48                               // If false, this
>>>>>>>> region was
>>>>>>>>       >>>> initialized using os::read().
>>>>>>>>       >>>>
>>>>>>>>       >>>> Is the new field truly needed? It seems we could use
>>>>>>>>       _mapped_base to
>>>>>>>>       >>>> determine if a region is mapped or not?
>>>>>>>>       >>>>
>>>>>>>>       >>>> - src/hotspot/share/memory/dynamicArchive.cpp
>>>>>>>>       >>>>
>>>>>>>>       >>>> Could you please remove the debugging print code in
>>>>>>>>       >>>> dynamic_dump_method_comparator? Or convert those to logging
>>>>>>>>       output if
>>>>>>>>       >>>> they are helpful.
>>>>>>>>       >>>>
>>>>>>>>       >>>> Will send out the rest of the review comments later.
>>>>>>>>       >>>>
>>>>>>>>       >>>> Best,
>>>>>>>>       >>>>
>>>>>>>>       >>>> Jiangli
>>>>>>>>       >>>>
>>>>>>>>       >>>>
>>>>>>>>       >>>>
>>>>>>>>       >>>>
>>>>>>>>       >>>> On Thu, Oct 10, 2019 at 6:00 PM Ioi Lam <ioi.lam at oracle.com
>>>>>>>>       <mailto:ioi.lam at oracle.com>> wrote:
>>>>>>>>       >>>>> Bug:
>>>>>>>>       >>>>> https://bugs.openjdk.java.net/browse/JDK-8231610
>>>>>>>>       >>>>>
>>>>>>>>       >>>>> Webrev:
>>>>>>>>       >>>>>
>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v01/
>>>>>>>>
>>>>>>>>       >>>>>
>>>>>>>>       >>>>> Design:
>>>>>>>>       >>>>>
>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/design/8231610-relocate-cds-archive.txt
>>>>>>>>
>>>>>>>>       >>>>>
>>>>>>>>       >>>>>
>>>>>>>>       >>>>> Overview:
>>>>>>>>       >>>>>
>>>>>>>>       >>>>> The CDS archive is mmaped to a fixed address range
>>>>>>>> (starting at
>>>>>>>>       >>>>> SharedBaseAddress, usually 0x800000000). Previously, if this
>>>>>>>>       >>>>> requested address range is not available (usually due to
>>>>>>>> Address
>>>>>>>>       >>>>> Space Layout Randomization (ASLR) [2]), the JVM will give
>>>>>>>> up and
>>>>>>>>       >>>>> will load classes dynamically using class files.
>>>>>>>>       >>>>>
>>>>>>>>       >>>>> [a] This causes slow down in JVM start-up.
>>>>>>>>       >>>>> [b] Handling of mapping failures causes unnecessary
>>>>>>>>       complication in
>>>>>>>>       >>>>>        the CDS tests.
>>>>>>>>       >>>>>
>>>>>>>>       >>>>> Here are some preliminary benchmarking results (using
>>>>>>>>       default CDS archive,
>>>>>>>>       >>>>> running helloworld):
>>>>>>>>       >>>>>
>>>>>>>>       >>>>> (a) 47.1ms (CDS enabled, mapped at requested addr)
>>>>>>>>       >>>>> (b) 53.8ms (CDS enabled, mapped at alternate addr)
>>>>>>>>       >>>>> (c) 86.2ms (CDS disabled)
>>>>>>>>       >>>>>
>>>>>>>>       >>>>> The small degradation in (b) is caused by the relocation of
>>>>>>>>       >>>>> absolute pointers embedded in the CDS archive. However, it is
>>>>>>>>       >>>>> still a big improvement over case (c)
>>>>>>>>       >>>>>
>>>>>>>>       >>>>> Please see the design doc (link above) for details.
>>>>>>>>       >>>>>
>>>>>>>>       >>>>> Thanks
>>>>>>>>       >>>>> - Ioi
>>>>>>>>       >>>>>
>>>>>>>>
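
The relocation described in the overview above (absolute pointers embedded
in the archive, tracked by a bitmap with one bit per pointer-sized slot, and
patched by a fixed delta when the archive is mapped elsewhere) can be
illustrated with a small standalone sketch. This is not the HotSpot code:
PtrMap, relocate and bitmap_bits are hypothetical names, and a
std::vector<bool> stands in for CHeapBitMap.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the archive's pointer bitmap: bit i is set when
// the i-th pointer-sized slot of the archive holds an absolute pointer.
struct PtrMap {
  std::vector<bool> bits;

  void mark(std::size_t slot) {
    if (slot >= bits.size()) {
      bits.resize(slot + 1, false);  // expand on demand, as the thread describes
    }
    bits[slot] = true;
  }
};

// Add `delta` to every marked slot and leave unmarked slots untouched.
// Returns the number of slots that were patched.
inline std::size_t relocate(std::vector<std::uintptr_t>& slots,
                            const PtrMap& map,
                            std::intptr_t delta) {
  std::size_t patched = 0;
  for (std::size_t i = 0; i < map.bits.size() && i < slots.size(); i++) {
    if (map.bits[i]) {
      // Unsigned arithmetic wraps, so a negative delta also works.
      slots[i] += static_cast<std::uintptr_t>(delta);
      patched++;
    }
  }
  return patched;
}

// One bit per pointer-sized slot in the archive.
constexpr std::size_t bitmap_bits(std::size_t archive_bytes) {
  return archive_bytes / sizeof(std::intptr_t);
}
```

Assuming 8-byte pointers, bitmap_bits(12 * 1024 * 1024) is 1572864 bits,
which is 192KB, or 1/64 of a 12MB archive. That matches the "1/32 or 1/64"
figure quoted above for 32-bit and 64-bit pointers respectively.
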


From jianglizhou at google.com  Mon Nov 11 01:14:40 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Sun, 10 Nov 2019 17:14:40 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <96ad8c62-fd62-1a1b-6f3c-e009e5e8a6f3@oracle.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
 <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
 <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>
 <7f0d558c-c161-ce41-90c6-ede8fddcf8b8@oracle.com>
 <99030987-a044-53fb-784b-62408333137a@oracle.com>
 <CALrW1jwTHJ1qg+gTfNeM9MbLdF042bF2ysKf7nGHKFNj9uKe+w@mail.gmail.com>
 <CALrW1jy5_4jrMRSZPAXV-c8a92Jy8y4eoK+f_t8ErwaZRGMoyw@mail.gmail.com>
 <52c473ef-5915-9ca0-8ed8-d4c2846965be@oracle.com>
 <CALrW1jzk+1XAqw2w55Y=ouyb-ZDB8tu5uWKNiXN9uA5Ku2XaCg@mail.gmail.com>
 <96ad8c62-fd62-1a1b-6f3c-e009e5e8a6f3@oracle.com>
Message-ID: <CALrW1jye1Oua7e3LCNV6-c_pkYa3Ujni7own-ntXaFqv8tM6-Q@mail.gmail.com>

On Sun, Nov 10, 2019, 3:13 PM Ioi Lam <ioi.lam at oracle.com> wrote:

>
>
> On 11/9/19 8:25 PM, Jiangli Zhou wrote:
> > Hi Ioi,
> >
> > On Fri, Nov 8, 2019 at 1:35 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >> Hi Jiangli,
> >>
> >> Thanks for your comments. Please see my replies in-line:
> >>
> >> On 11/7/19 6:34 PM, Jiangli Zhou wrote:
> >>> On Thu, Nov 7, 2019 at 6:11 PM Jiangli Zhou <jianglizhou at google.com>
> wrote:
> >>>> I looked both 05.full and 06.delta webrevs. They look good.
> >>>>
> >>>> I still feel a bit uneasy about the potential runtime impact when data
> >>>> does get relocated. Long running apps/services may shy away from
> >>>> enabling archive at runtime, if there is a detectable overhead even
> >>>> though it may only occur rarely. As relocation is enabled by default
> >>>> and users cannot turn it off, disabling with -Xshare:off entirely
> >>>> would become the only choice. Could you please create a new RFE
> >>>> (possibly with higher priority) to investigate the potential effect,
> >>>> or provide an option for users to opt-in relocation with the
> >>>> command-line switch?
> >> I created https://bugs.openjdk.java.net/browse/JDK-8233862
> >> Investigate performance benefit of relocating CDS archive to under 32G
> >>
> >> As I noted in the bug report, I ran benchmarks with CDS relocation
> >> on/off, and there's no sign of regression when the CDS archive is
> >> relocated. Please see the bug report for how to configure the VM to do
> >> the comparison.
> >>
> >> As you said before: "When enabling CDS we [google] noticed a small
> >> runtime overhead in JDK 11 recently with a benchmark. After I backported
> >> JDK-8213713 to 11, it seemed to reduce the runtime overhead that the
> >> benchmark was experiencing":
> >>
> >> Can you confirm whether this is stock JDK 11 or a special google build?
> >> Which test case did you use? Is it possible for you to run the tests
> >> again (using the exact before/after bits that you had when backporting
> >> JDK-8213713)? Can you check if narrow_klass_base and narrow_klass_shift
> >> are the same in your before/after builds?
> > Thanks for creating the RFE.
> >
> > JDK-8213713 closes the 1G gap between the shared space and class space
> > and everything else is unaffected. The compressed class base and shift
> > were the same for before and after applying JDK-8213713. The effect
> > was statistically observed for the benchmark since the difference was
> > very small and could be within noise level for single run comparison.
> > A small difference could still be important for some use cases so it
> > needs to be taken into consideration when designing and implementing
> > new changes.
>
> Hi Jiangli,
>
> Thanks for taking the time for doing the performance measurements.
>
> I also ran benchmarks in all 3 modes (no CDS, CDS without relocation,
> CDS with relocation), and did not see any significant performance difference with
> Octane-DeltaBlue, Octane-NavierStokes, SPECjbb2005-Tuned,
> JFR-SPECjbb2005-Tuned, SPECjvm2008-Serial-G1 and Tools-Javac-Hello.
>
>
> >
> > A new command-line for archived metadata relocation may still be
> > valuable. It would also be helpful for debugging and diagnosis.
> >
>
> How about a diagnostic flag ArchiveRelocationMode:
>
> 0: (default) first map at preferred address, and if unsuccessful, map to
> alternative address;
> 1: always map to alternative address;
> 2: always map at preferred address, and if unsuccessful, do not map the
> archive;
>
> 1 is for testing relocation, as well as for easy performance measurement
> (replaces the use of -XX:SharedBaseAddress=0 in my current patch.).
> 2 is for avoiding potential regression that may be introduced by
> relocation (revert to JDK 13 behavior).
>
> What do you think? If you like this I'll open a CSR.
>


That sounds good to me!

Regards,
Jiangli
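
For reference, the three proposed ArchiveRelocationMode values quoted above
can be read as the following decision table. This is only a sketch of the
proposal as worded in this thread: MapResult and choose_mapping are
hypothetical names, the two booleans stand in for the outcome of the actual
mapping attempts, and treating unknown values like mode 0 is an assumption.

```cpp
// Hypothetical result type for the sketch; not a HotSpot type.
enum class MapResult { MappedAtPreferred, MappedAtAlternative, NotMapped };

// mode:           proposed ArchiveRelocationMode value (0, 1 or 2).
// preferred_ok:   the requested address range can be mapped.
// alternative_ok: some other address range can be mapped.
inline MapResult choose_mapping(int mode, bool preferred_ok, bool alternative_ok) {
  switch (mode) {
    case 1:
      // Always map to an alternative address (exercises relocation).
      return alternative_ok ? MapResult::MappedAtAlternative : MapResult::NotMapped;
    case 2:
      // Preferred address only; on failure the archive is not mapped.
      return preferred_ok ? MapResult::MappedAtPreferred : MapResult::NotMapped;
    case 0:
    default:
      // Default: preferred address first, then fall back to an alternative.
      // (Handling unknown values like mode 0 is an assumption of this sketch.)
      if (preferred_ok) {
        return MapResult::MappedAtPreferred;
      }
      return alternative_ok ? MapResult::MappedAtAlternative : MapResult::NotMapped;
  }
}
```

Mode 2 with preferred_ok == false yields NotMapped, which corresponds to the
"revert to JDK 13 behavior" case mentioned in the proposal.
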


> Thanks
> - Ioi
>
>
>
> >>> Forgot to say that when Java heap can fit into low 32G space, it takes
> >>> the class space size into account and leaves the needed space right above
> >>> (also in low 32G space) when reserving heap, for !UseSharedSpace. In
> >>> that case, it's more likely the class data and heap data can be
> >>> colocated successfully.
> >> The reason is not for "colocation". It's so that narrow_klass_base can
> >> be zero, and the klass pointer can be uncompressed with a shift (without
> >> also doing an addition).
> >>
> >> But with CDS enabled, we always hard code to use non-zero
> >> narrow_klass_base and 3 bit shift (for AOT). So by just relocating the
> >> CDS archive to under 32GB, without modifying how CDS handles
> >> narrow_klass_base/shift, I don't think we can expect any benefit.
> > I experimented with mapping the shared space in low 32G and placed
> > right above the Java heap. The class space was also allocated in the
> > low 32G space and after the mapped shared space in the experiment. The
> > compressed class encoding was using 0 base and 3 shift, which was the
> > same as the encoding when CDS was disabled. I didn't observe runtime
> > performance difference when comparing that specific configuration with
> > the normal CDS mapping scheme (the shared space starts at 32G and the
> > encoding is non-zero base and 3 shift).
> >
> > Thanks,
> > Jiangli
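
To make the base/shift discussion above concrete, here is a minimal sketch of
how a narrow klass value maps to a full address under the two encodings being
compared (non-zero base with 3-bit shift, and zero base with 3-bit shift).
KlassEncoding, decode and encode are hypothetical names for illustration and
are not the CompressedKlassPointers API.

```cpp
#include <cstdint>

// Hypothetical description of a compressed class pointer encoding.
struct KlassEncoding {
  std::uint64_t base;   // narrow_klass_base
  unsigned      shift;  // narrow_klass_shift
};

// Full address = base + (narrow << shift). With a zero base the addition
// disappears and decoding is a shift only.
constexpr std::uint64_t decode(KlassEncoding e, std::uint32_t narrow) {
  return e.base + (static_cast<std::uint64_t>(narrow) << e.shift);
}

// Inverse of decode, valid for addresses inside the encodable range.
constexpr std::uint32_t encode(KlassEncoding e, std::uint64_t addr) {
  return static_cast<std::uint32_t>((addr - e.base) >> e.shift);
}
```

With base 0x800000000 and shift 3, narrow value 0x10 decodes to 0x800000080;
with base 0 and shift 3 the same narrow value decodes to 0x80.
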
> >> For modern architectures, I am not aware of any inherent speed benefit
> >> simply by putting data (in our case much larger than a page) "close to
> >> each other" in the virtual address space. If you have any reference of
> >> that, please let me know.
> >>
> >> Thanks
> >> - Ioi
> >>
> >>> Thanks,
> >>> Jiangli
> >>>
> >>>> Regards,
> >>>> Jiangli
> >>>>
> >>>> On Thu, Nov 7, 2019 at 4:22 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >>>>> Hi Coleen,
> >>>>>
> >>>>> Thanks for the review. Here's a webrev that has incorporated your
> >>>>> suggestions:
> >>>>>
> >>>>>
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v06-delta/
> >>>>>
> >>>>> Please see comments in-line
> >>>>>
> >>>>> On 11/7/19 2:46 PM, coleen.phillimore at oracle.com wrote:
> >>>>>> Hi, I've done a more high level code review of this and it looks
> good!
> >>>>>>
> >>>>>>
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/src/hotspot/share/memory/archiveUtils.hpp.html
> >>>>>>
> >>>>>>
> >>>>>> I think these classes require comments on what they do and why. The
> >>>>>> comments you sent me offline look good.
> >>>>> I added more comments for ArchivePtrMarker::_compacted per your
> offline
> >>>>> request.
> >>>>>
> >>>>>> Also .hpp files shouldn't include .inline.hpp files, like
> >>>>>> bitMap.inline.hpp.  Hopefully it's just a case of moving do_bit()
> into
> >>>>>> the cpp file.
> >>>>> I moved the do_bit() function into archiveUtils.inline.hpp, since it is
> >>>>> used by 3 .cpp files, and performance is important.
> >>>>>
> >>>>>> I wonder if the exception list of classes to exclude should be a
> >>>>>> function in javaClasses.hpp/cpp where the explanation would make
> more
> >>>>>> sense?  ie bool
> >>>>>> JavaClasses::has_injected_native_pointers(InstanceKlass* k);
> >>>>> I moved the checking code to javaClasses.cpp. Since we do (partially)
> >>>>> support java.lang.Class, which has injected native pointers, I named
> the
> >>>>> function as JavaClasses::is_supported_for_archiving instead. I also
> >>>>> massaged the comments a little for clarification.
> >>>>>
> >>>>>> Is there already an RFE to move the DumpSharedSpaces output from
> >>>>>> tty->print() to log_info() ?
> >>>>> I created https://bugs.openjdk.java.net/browse/JDK-8233826 (Change
> CDS
> >>>>> dumping tty->print_cr() to unified logging).
> >>>>>
> >>>>> Thanks
> >>>>> - Ioi
> >>>>>
> >>>>>> Thanks,
> >>>>>> Coleen
> >>>>>>
> >>>>>> On 11/6/19 4:17 PM, Ioi Lam wrote:
> >>>>>>> Hi Jiangli,
> >>>>>>>
> >>>>>>> I've uploaded the webrev after integrating your comments:
> >>>>>>>
> >>>>>>>
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/
> >>>>>>>
> >>>>>>>
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05-delta/
> >>>>>>>
> >>>>>>>
> >>>>>>> Please see more replies below:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 11/4/19 5:52 PM, Jiangli Zhou wrote:
> >>>>>>>> On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam <ioi.lam at oracle.com
> >>>>>>>> <mailto:ioi.lam at oracle.com>> wrote:
> >>>>>>>>
> >>>>>>>>       Hi Jiangli,
> >>>>>>>>
> >>>>>>>>       Thank you so much for spending time reviewing this RFE!
> >>>>>>>>
> >>>>>>>>       On 11/3/19 6:34 PM, Jiangli Zhou wrote:
> >>>>>>>>       > Hi Ioi,
> >>>>>>>>       >
> >>>>>>>>       > Sorry for the delay again. Will try to put this on the
> top of my
> >>>>>>>>       list
> >>>>>>>>       > next week and reduce the turn-around time. The updates
> look
> >>>>>>>> good in
> >>>>>>>>       > general.
> >>>>>>>>       >
> >>>>>>>>       > We might want to have a better strategy when choosing
> metadata
> >>>>>>>>       > relocation address (when relocation is needed). Some
> >>>>>>>>       > applications/benchmarks may be more sensitive to cache
> >>>>>>>> locality and
> >>>>>>>>       > memory/data layout. There was a bug,
> >>>>>>>>       > https://bugs.openjdk.java.net/browse/JDK-8213713 that
> caused
> >>>>>>>> 1G gap
> >>>>>>>>       > between Java heap data and metadata before JDK 12. The gap
> >>>>>>>> seemed to
> >>>>>>>>       > cause a small but noticeable runtime effect in one case
> that I
> >>>>>>>> came
> >>>>>>>>       > across.
> >>>>>>>>
> >>>>>>>>       I guess you're saying we should try to relocate the archive
> into
> >>>>>>>>       somewhere under 32GB?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I don't yet have sufficient data that determines if mapping at low
> >>>>>>>> 32G produces better runtime performance. I experimented with that,
> >>>>>>>> but didn't see noticeable difference when comparing to mapping at
> >>>>>>>> the current default address. It doesn't hurt, I think. So it may
> be
> >>>>>>>> a better choice than relocating to a random address in high 32G
> >>>>>>>> space (when Java heap is in low 32G address space).
> >>>>>>> Maybe we should reconsider this when we have more concrete data for
> >>>>>>> the benefits of moving the compressed class space to under 32G.
> >>>>>>>
> >>>>>>> Please note that in metaspace.cpp, when CDS is disabled and  the VM
> >>>>>>> fails to allocate the class space at the requested address
> >>>>>>> (0x7c000000 for 16GB heap), it also just allocates from a random
> >>>>>>> address (without trying to search under 32GB):
> >>>>>>>
> >>>>>>>
> http://hg.openjdk.java.net/jdk/jdk/annotate/e767fa6a1d45/src/hotspot/share/memory/metaspace.cpp#l1128
> >>>>>>>
> >>>>>>>
> >>>>>>> This code has been there since 2013 and we have not seen any
> issues.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>       Could you elaborate more about the performance issue,
> especially
> >>>>>>>>       about
> >>>>>>>>       cache locality? I looked at JDK-8213713 but it didn't
> mention
> >>>>>>>>       performance.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> When enabling CDS we noticed a small runtime overhead in JDK 11
> >>>>>>>> recently with a benchmark. After I backported JDK-8213713 to 11,
> it
> >>>>>>>> seemed to reduce the runtime overhead that the benchmark was
> >>>>>>>> experiencing.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>       Also, by default, we have non-zero narrow_klass_base and
> >>>>>>>>       narrow_klass_shift = 3, and archive relocation doesn't
> change that:
> >>>>>>>>
> >>>>>>>>       $ java -Xlog:cds=debug -version
> >>>>>>>>       ... narrow_klass_base = 0x0000000800000000,
> narrow_klass_shift = 3
> >>>>>>>>       $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
> >>>>>>>>       ... narrow_klass_base = 0x00007f1e8b499000,
> narrow_klass_shift = 3
> >>>>>>>>
> >>>>>>>>       We always use narrow_klass_shift due to this:
> >>>>>>>>
> >>>>>>>>          // CDS uses LogKlassAlignmentInBytes for
> narrow_klass_shift. See
> >>>>>>>>          //
> >>>>>>>> MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
> >>>>>>>>          // how dump time narrow_klass_shift is set. Although,
> CDS can
> >>>>>>>> work
> >>>>>>>>          // with zero-shift mode also, to be consistent with AOT
> it uses
> >>>>>>>>          // LogKlassAlignmentInBytes for klass shift so archived
> java
> >>>>>>>>       heap objects
> >>>>>>>>          // can be used at same time as AOT code.
> >>>>>>>>          if (!UseSharedSpaces
> >>>>>>>>              && (uint64_t)(higher_address - lower_base) <=
> >>>>>>>>       UnscaledClassSpaceMax) {
> >>>>>>>>            CompressedKlassPointers::set_shift(0);
> >>>>>>>>          } else {
> >>>>>>>> CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
> >>>>>>>>          }
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Right. If we relocate to low 32G space, it needs to make sure that
> >>>>>>>> the range containing the mapped class data and class space must be
> >>>>>>>> encodable.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>       > Here are some additional comments (minor).
> >>>>>>>>       >
> >>>>>>>>       > Could you please fix the long lines in the following?
> >>>>>>>>       >
> >>>>>>>>       > 1237 void
> >>>>>>>>
> java_lang_Class::update_archived_primitive_mirror_native_pointers(oop
> >>>>>>>>       > archived_mirror) {
> >>>>>>>>       > 1238   if (MetaspaceShared::relocation_delta() != 0) {
> >>>>>>>>       > 1239
> assert(archived_mirror->metadata_field(_klass_offset) ==
> >>>>>>>>       > NULL, "must be for primitive class");
> >>>>>>>>       > 1240
> >>>>>>>>       > 1241     Klass* ak =
> >>>>>>>>       >
> ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
> >>>>>>>>       > 1242     if (ak != NULL) {
> >>>>>>>>       > 1243
> archived_mirror->metadata_field_put(_array_klass_offset,
> >>>>>>>>       > (Klass*)(address(ak) +
> MetaspaceShared::relocation_delta()));
> >>>>>>>>       > 1244     }
> >>>>>>>>       > 1245   }
> >>>>>>>>       > 1246 }
> >>>>>>>>       >
> >>>>>>>>       > src/hotspot/share/memory/dynamicArchive.cpp
> >>>>>>>>       >
> >>>>>>>>       >   889   Thread* THREAD = Thread::current();
> >>>>>>>>       >   890   Method::sort_methods(ik->methods(),
> /*set_idnums=*/true,
> >>>>>>>>       > dynamic_dump_method_comparator);
> >>>>>>>>       >   891   if (ik->default_methods() != NULL) {
> >>>>>>>>       >   892  Method::sort_methods(ik->default_methods(),
> >>>>>>>>       > /*set_idnums=*/false, dynamic_dump_method_comparator);
> >>>>>>>>       >   893   }
> >>>>>>>>       >
> >>>>>>>>
> >>>>>>>>       OK will do.
> >>>>>>>>
> >>>>>>>>       > Please see inlined comments below.
> >>>>>>>>       >
> >>>>>>>>       > On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <
> ioi.lam at oracle.com
> >>>>>>>>       <mailto:ioi.lam at oracle.com>> wrote:
> >>>>>>>>       >> Hi Jiangli,
> >>>>>>>>       >>
> >>>>>>>>       >> Thanks for the review. I've updated the patch according
> to your
> >>>>>>>>       comments:
> >>>>>>>>       >>
> >>>>>>>>       >>
> >>>>>>>>
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
> >>>>>>>>
> >>>>>>>>       >>
> >>>>>>>>
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
> >>>>>>>>
> >>>>>>>>       >>
> >>>>>>>>       >> (the delta is on top of
> 8231610-relocate-cds-archive.v03.delta
> >>>>>>>>       in my
> >>>>>>>>       >> reply to Calvin's comments).
> >>>>>>>>       >>
> >>>>>>>>       >> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
> >>>>>>>>       >>> Hi Ioi,
> >>>>>>>>       >>>
> >>>>>>>>       >>> Sorry for the delay. Here are my remaining comments.
> >>>>>>>>       >>>
> >>>>>>>>       >>> - src/hotspot/share/memory/dynamicArchive.cpp
> >>>>>>>>       >>>
> >>>>>>>>       >>> 128   static intx _method_comparator_name_delta;
> >>>>>>>>       >>>
> >>>>>>>>       >>> The name of the above variable is confusing. It's the
> value of
> >>>>>>>>       >>> _buffer_to_target_delta. It's better to use
> _buffer_to_target_delta
> >>>>>>>>       >>> directly.
> >>>>>>>>       >> _buffer_to_target_delta is a non-static field, but
> >>>>>>>>       >> dynamic_dump_method_comparator() must be a static
> function so
> >>>>>>>>       it can't
> >>>>>>>>       >> use the non-static field easily.
> >>>>>>>>       >
> >>>>>>>>       > It sounds like an issue. _buffer_to_target_delta was made
> as a
> >>>>>>>>       > non-static mostly because we might support more than one
> dynamic
> >>>>>>>>       > archives in the future. However, today's usages bake in an
> >>>>>>>>       assumption
> >>>>>>>>       > that _buffer_to_target_delta is a singleton value. It is
> >>>>>>>> cleaner to
> >>>>>>>>       > either make _buffer_to_target_delta as a static variable
> for
> >>>>>>>> now, or
> >>>>>>>>       > adding an access API in DynamicArchiveBuilder to allow
> other
> >>>>>>>> code to
> >>>>>>>>       > properly and correctly use the value.
> >>>>>>>>
> >>>>>>>>       OK, I'll move it to a static variable.
> >>>>>>>>
> >>>>>>>>       >
> >>>>>>>>       >>> Also, we can do a quick pointer comparison of 'a_name'
> and
> >>>>>>>>       >>> 'b_name' first before adjusting the pointers.
> >>>>>>>>       >> I added this:
> >>>>>>>>       >>
> >>>>>>>>       >>       if (a_name == b_name) {
> >>>>>>>>       >>         return 0;
> >>>>>>>>       >>       }
> >>>>>>>>       >>
> >>>>>>>>       >>> ---
> >>>>>>>>       >>>
> >>>>>>>>       >>> 934 void
> DynamicArchiveBuilder::relocate_buffer_to_target() {
> >>>>>>>>       >>> ...
> >>>>>>>>       >>>    944
> >>>>>>>>       >>>    945  ArchivePtrMarker::compact(relocatable_base,
> >>>>>>>>       relocatable_end);
> >>>>>>>>       >>> ...
> >>>>>>>>       >>>
> >>>>>>>>       >>>    974     SharedDataRelocator
> patcher((address*)patch_base,
> >>>>>>>>       >>> (address*)patch_end, valid_old_base, valid_old_end,
> >>>>>>>>       >>>    975  valid_new_base, valid_new_end, addr_delta);
> >>>>>>>>       >>>    976  ArchivePtrMarker::ptrmap()->iterate(&patcher);
> >>>>>>>>       >>>
> >>>>>>>>       >>> Could we reduce the number of data re-iterations to help
> >>>>>>>> archive
> >>>>>>>>       >>> dumping performance. The ArchivePtrMarker::compact
> operation
> >>>>>>>>       can be
> >>>>>>>>       >>> combined with the patching iteration.
> >>>>>>>>       ArchivePtrMarker::compact API
> >>>>>>>>       >>> can be removed.
> >>>>>>>>       >> That's a good idea. I implemented it using a template
> parameter
> >>>>>>>>       so that
> >>>>>>>>       >> we can have max performance when relocating the archive
> at run
> >>>>>>>>       time.
> >>>>>>>>       >>
> >>>>>>>>       >> I added comments to explain why the relocation is done
> here. The
> >>>>>>>>       >> relocation is pretty rare (only when the base archive
> was not
> >>>>>>>>       mapped at
> >>>>>>>>       >> the default location).
> >>>>>>>>       >>
> >>>>>>>>       >>> ---
> >>>>>>>>       >>>
> >>>>>>>>       >>>    967     address valid_new_base =
> >>>>>>>>       >>> (address)Arguments::default_SharedBaseAddress();
> >>>>>>>>       >>>    968     address valid_new_end  = valid_new_base +
> >>>>>>>>       base_plus_top_size;
> >>>>>>>>       >>>
> >>>>>>>>       >>> The debugging only code can be included under #ifdef
> ASSERT.
> >>>>>>>>       >> These values are actually also used in debug logging so
> they
> >>>>>>>>       can't be
> >>>>>>>>       >> ifdef'ed out.
> >>>>>>>>       >>
> >>>>>>>>       >> Also, the c++ compiler is pretty good with eliding code
> >>>>>>>> that's no
> >>>>>>>>       >> actually used. If I comment out all the logging code in
> >>>>>>>>       >> DynamicArchiveBuilder::relocate_buffer_to_target() and
> >>>>>>>>       >> SharedDataRelocator, gcc elides all the unused fields
> and their
> >>>>>>>>       >> assignments. So no code is generated for this, etc.
> >>>>>>>>       >>
> >>>>>>>>       >>       address valid_new_base =
> >>>>>>>>       >> (address)Arguments::default_SharedBaseAddress();
> >>>>>>>>       >>
> >>>>>>>>       >> Since #ifdef ASSERT makes the code harder to read, I
> think we
> >>>>>>>>       should use
> >>>>>>>>       >> it only when really necessary.
> >>>>>>>>       > It seems cleaner to get rid of these debugging only
> variables, by
> >>>>>>>>       > using 'relocatable_base' and
> >>>>>>>>       > '(address)Arguments::default_SharedBaseAddress()' in the
> logging
> >>>>>>>>       code.
> >>>>>>>>
> >>>>>>>>       SharedDataRelocator is used under 3 different situations.
> These six
> >>>>>>>>       variables (patch_base, patch_end, valid_old_base,
> valid_old_end,
> >>>>>>>>       valid_new_base, valid_new_end) describe what is being
> patched,
> >>>>>>>>       and what
> >>>>>>>>       the expectations are, for each situation. The code will be
> hard to
> >>>>>>>>       understand without them.
> >>>>>>>>
> >>>>>>>>       Please note there's also logging code in the
> SharedDataRelocator
> >>>>>>>>       constructor that prints out these values.
> >>>>>>>>
> >>>>>>>>       I think I'll just remove the 'debug only' comment to avoid
> >>>>>>>> confusion.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Ok.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>       >
> >>>>>>>>       >>> ---
> >>>>>>>>       >>>
> >>>>>>>>       >>>    993
> >>>>>>>>    dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
> >>>>>>>>       >>>
> >>>>>>>>       >>> We could combine the archived heap data bitmap into the
> new
> >>>>>>>>       region as
> >>>>>>>>       >>> well? It can be handled as a separate RFE.
> >>>>>>>>       >> I've filed
> https://bugs.openjdk.java.net/browse/JDK-8233093
> >>>>>>>>       >>
> >>>>>>>>       >>> - src/hotspot/share/memory/filemap.cpp
> >>>>>>>>       >>>
> >>>>>>>>       >>> 1038     if (is_static()) {
> >>>>>>>>       >>> 1039       if (errno == ENOENT) {
> >>>>>>>>       >>> 1040         // Not locating the shared archive is ok.
> >>>>>>>>       >>> 1041         fail_continue("Specified shared archive
> not found
> >>>>>>>>       (%s).",
> >>>>>>>>       >>> _full_path);
> >>>>>>>>       >>> 1042       } else {
> >>>>>>>>       >>> 1043         fail_continue("Failed to open shared
> archive file
> >>>>>>>>       (%s).",
> >>>>>>>>       >>> 1044  os::strerror(errno));
> >>>>>>>>       >>> 1045       }
> >>>>>>>>       >>> 1046     } else {
> >>>>>>>>       >>> 1047       log_warning(cds, dynamic)("specified dynamic
> archive
> >>>>>>>>       >>> doesn't exist: %s", _full_path);
> >>>>>>>>       >>> 1048     }
> >>>>>>>>       >>>
> >>>>>>>>       >>> If the top layer is explicitly specified by the user, a
> >>>>>>>>       warning does
> >>>>>>>>       >>> not seem to be a proper behavior if the VM fails to
> open the
> >>>>>>>>       archive
> >>>>>>>>       >>> file.
> >>>>>>>>       >>>
> >>>>>>>>       >>> It might be better to handle the relocation unrelated
> code in
> >>>>>>>>       separate
> >>>>>>>>       >>> changeset and track with a separate RFE.
> >>>>>>>>       >> This code was moved from
> >>>>>>>>       >>
> >>>>>>>>       >>
> >>>>>>>>
> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
> >>>>>>>>
> >>>>>>>>       >>
> >>>>>>>>       >> so I am not changing the behavior. If you want, we can
> file an
> >>>>>>>>       RFE to
> >>>>>>>>       >> change the behavior.
> >>>>>>>>       > Ok. A new RFE sounds like the right thing to re-evaluate
> the
> >>>>>>>> usage
> >>>>>>>>       > issue here. Thanks.
> >>>>>>>>
> >>>>>>>>       I created https://bugs.openjdk.java.net/browse/JDK-8233446
> >>>>>>>>
> >>>>>>>>       >>> ---
> >>>>>>>>       >>>
> >>>>>>>>       >>> 1148 void FileMapInfo::write_region(int region, char*
> base,
> >>>>>>>>       size_t size,
> >>>>>>>>       >>> 1149                                bool read_only, bool
> >>>>>>>>       allow_exec) {
> >>>>>>>>       >>> ...
> >>>>>>>>       >>> 1154
> >>>>>>>>       >>> 1155   if (region == MetaspaceShared::bm) {
> >>>>>>>>       >>> 1156     target_base = NULL;
> >>>>>>>>       >>> 1157   } else if (DynamicDumpSharedSpaces) {
> >>>>>>>>       >>>
> >>>>>>>>       >>> It's not too clear to me how the bitmap (bm) region is
> handled
> >>>>>>>>       for the
> >>>>>>>>       >>> base layer and top layer. Could you please explain?
> >>>>>>>>       >> The bm region for both layers are mapped at an address
> picked
> >>>>>>>>       by the OS:
> >>>>>>>>       >>
> >>>>>>>>       >> char* FileMapInfo::map_relocation_bitmap(size_t&
> bitmap_size) {
> >>>>>>>>       >>     FileMapRegion* si = space_at(MetaspaceShared::bm);
> >>>>>>>>       >>     bitmap_size = si->used_aligned();
> >>>>>>>>       >>     bool read_only = true, allow_exec = false;
> >>>>>>>>       >>     char* requested_addr = NULL; // allow OS to pick any
> >>>>>>>> location
> >>>>>>>>       >>     char* bitmap_base = os::map_memory(_fd, _full_path,
> >>>>>>>>       si->file_offset(),
> >>>>>>>>       >> requested_addr, bitmap_size,
> >>>>>>>>       >> read_only, allow_exec);
> >>>>>>>>       >>
> >>>>>>>>       > Ok, after staring at the code for a few seconds I saw
> that's
> >>>>>>>>       intended.
> >>>>>>>>       > If the current region is 'bm', then the 'target_base' is
> NULL
> >>>>>>>>       > regardless if it's static or dynamic archive. Otherwise,
> the
> >>>>>>>>       > 'target_base' is handled differently for the static and
> dynamic
> >>>>>>>>       case.
> >>>>>>>>       > The following would be cleaner and has better reliability.
> >>>>>>>>       >
> >>>>>>>>       >     char* target_base = NULL;
> >>>>>>>>       >
> >>>>>>>>       >     // The target_base is NULL for 'bm' region.
> >>>>>>>>       >     if (region != MetaspaceShared::bm) {
> >>>>>>>>       >       if (DynamicDumpSharedSpaces) {
> >>>>>>>>       >         assert(!HeapShared::is_heap_region(region),
> "dynamic
> >>>>>>>> archive
> >>>>>>>>       > doesn't support heap regions");
> >>>>>>>>       >         target_base =
> DynamicArchive::buffer_to_target(base);
> >>>>>>>>       >       } else {
> >>>>>>>>       >         target_base = base;
> >>>>>>>>       >       }
> >>>>>>>>       >    }
> >>>>>>>>
> >>>>>>>>       How about this?
> >>>>>>>>
> >>>>>>>>          char* target_base;
> >>>>>>>>          if (region == MetaspaceShared::bm) {
> >>>>>>>>            target_base = NULL; // always NULL for bm region.
> >>>>>>>>          } else {
> >>>>>>>>            if (DynamicDumpSharedSpaces) {
> >>>>>>>>                assert(!HeapShared::is_heap_region(region),
> "dynamic
> >>>>>>>> archive
> >>>>>>>>       doesn't support heap regions");
> >>>>>>>>                target_base =
> DynamicArchive::buffer_to_target(base);
> >>>>>>>>            } else {
> >>>>>>>>                target_base = base;
> >>>>>>>>            }
> >>>>>>>>          }
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> No objection if you prefer the extra 'else' block.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>       >
> >>>>>>>>       >>> ---
> >>>>>>>>       >>>
> >>>>>>>>       >>> 1362
> >>>>>>>>
> DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
> >>>>>>>>
> >>>>>>>>       >>>
> >>>>>>>>       >>> Could you please explain the above?
> >>>>>>>>       >> I added the comments
> >>>>>>>>       >>
> >>>>>>>>       >>     // Make sure we don't attempt to use
> >>>>>>>>       header()->mapped_base_address()
> >>>>>>>>       >> unless
> >>>>>>>>       >>     // it's been successfully mapped.
> >>>>>>>>       >>
> >>>>>>>>
> DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
> >>>>>>>>
> >>>>>>>>       >>
> >>>>>>>>       >>> ---
> >>>>>>>>       >>>
> >>>>>>>>       >>> 1359   FileMapRegion* last_region = NULL;
> >>>>>>>>       >>>
> >>>>>>>>       >>> 1371     if (last_region != NULL) {
> >>>>>>>>       >>> 1372       // Ensure that the OS won't be able to
> allocate new
> >>>>>>>>       memory
> >>>>>>>>       >>> spaces between any mapped
> >>>>>>>>       >>> 1373       // regions, or else it would mess up the
> simple
> >>>>>>>>       comparision
> >>>>>>>>       >>> in MetaspaceObj::is_shared().
> >>>>>>>>       >>> 1374       assert(si->mapped_base() ==
> >>>>>>>> last_region->mapped_end(),
> >>>>>>>>       >>> "must have no gaps");
> >>>>>>>>       >>>
> >>>>>>>>       >>> 1379     last_region = si;
> >>>>>>>>       >>>
> >>>>>>>>       >>> Can you please place 'last_region' related code under
> #ifdef
> >>>>>>>>       ASSERT?
> >>>>>>>>       >> I think that will make the code more cluttered. The
> compiler
> >>>>>>>> will
> >>>>>>>>       >> optimize that away.
> >>>>>>>>       > It's cleaner to define debugging only variable for
> debugging only
> >>>>>>>>       > builds. You can wrap it and related usage with
> DEBUG_ONLY.
> >>>>>>>>
> >>>>>>>>       OK, will do.
> >>>>>>>>
> >>>>>>>>       >
> >>>>>>>>       >>> ---
> >>>>>>>>       >>>
> >>>>>>>>       >>> 1478 char* FileMapInfo::map_relocation_bitmap(size_t&
> >>>>>>>>       bitmap_size) {
> >>>>>>>>       >>> 1479   FileMapRegion* si =
> space_at(MetaspaceShared::bm);
> >>>>>>>>       >>> 1480   bitmap_size = si->used_aligned();
> >>>>>>>>       >>> 1481   bool read_only = true, allow_exec = false;
> >>>>>>>>       >>> 1482   char* requested_addr = NULL; // allow OS to pick
> any
> >>>>>>>>       location
> >>>>>>>>       >>> 1483   char* bitmap_base = os::map_memory(_fd,
> _full_path,
> >>>>>>>>       si->file_offset(),
> >>>>>>>>       >>> 1484 requested_addr, bitmap_size,
> >>>>>>>>       >>> read_only, allow_exec);
> >>>>>>>>       >>>
> >>>>>>>>       >>> We need to handle mapping failure here.
> >>>>>>>>       >> It's handled here:
> >>>>>>>>       >>
> >>>>>>>>       >> bool FileMapInfo::relocate_pointers(intx addr_delta) {
> >>>>>>>>       >>     log_debug(cds, reloc)("runtime archive relocation
> start");
> >>>>>>>>       >>     size_t bitmap_size;
> >>>>>>>>       >>     char* bitmap_base =
> map_relocation_bitmap(bitmap_size);
> >>>>>>>>       >>     if (bitmap_base != NULL) {
> >>>>>>>>       >>     ...
> >>>>>>>>       >>     } else {
> >>>>>>>>       >>       log_error(cds)("failed to map relocation bitmap");
> >>>>>>>>       >>       return false;
> >>>>>>>>       >>     }
> >>>>>>>>       >>
> >>>>>>>>       > 'bitmap_base' is used immediately after map_memory(). So
> the
> >>>>>>>> check
> >>>>>>>>       > needs to be done immediately after map_memory(), but not
> in the
> >>>>>>>>       caller
> >>>>>>>>       > of map_relocation_bitmap().
> >>>>>>>>       >
> >>>>>>>>       > 1490   char* bitmap_base = os::map_memory(_fd, _full_path,
> >>>>>>>>       si->file_offset(),
> >>>>>>>>       > 1491 requested_addr, bitmap_size,
> >>>>>>>>       > read_only, allow_exec);
> >>>>>>>>       > 1492
> >>>>>>>>       > 1493   if (VerifySharedSpaces && bitmap_base != NULL &&
> >>>>>>>>       > !region_crc_check(bitmap_base, bitmap_size, si->crc())) {
> >>>>>>>>
> >>>>>>>>       OK, I'll fix that.
> >>>>>>>>
> >>>>>>>>       >
> >>>>>>>>       >
> >>>>>>>>       >>> ---
> >>>>>>>>       >>>
> >>>>>>>>       >>> 1513     // debug only -- the current value of the
> pointers
> >>>>>>>> to be
> >>>>>>>>       >>> patched must be within this
> >>>>>>>>       >>> 1514     // range (i.e., must be between the requesed
> base
> >>>>>>>>       address,
> >>>>>>>>       >>> and the of the current archive).
> >>>>>>>>       >>> 1515     // Note: top archive may point to objects in
> the base
> >>>>>>>>       >>> archive, but not the other way around.
> >>>>>>>>       >>> 1516     address valid_old_base =
> >>>>>>>>       (address)header()->requested_base_address();
> >>>>>>>>       >>> 1517     address valid_old_end  = valid_old_base +
> >>>>>>>>       mapping_end_offset();
> >>>>>>>>       >>>
> >>>>>>>>       >>> Please place all FileMapInfo::relocate_pointers
> debugging only
> >>>>>>>>       code
> >>>>>>>>       >>> under #ifdef ASSERT.
> >>>>>>>>       >> Ditto about ifdef ASSERT
> >>>>>>>>       >>
> >>>>>>>>       >>> - src/hotspot/share/memory/heapShared.cpp
> >>>>>>>>       >>>
> >>>>>>>>       >>>    441 void
> >>>>>>>>       HeapShared::initialize_from_archived_subgraph(Klass* k) {
> >>>>>>>>       >>>    442   if (!open_archive_heap_region_mapped() ||
> >>>>>>>>       !MetaspaceObj::is_shared(k)) {
> >>>>>>>>       >>>    443     return; // nothing to do
> >>>>>>>>       >>>    444   }
> >>>>>>>>       >>>
> >>>>>>>>       >>> When do we call
> HeapShared::initialize_from_archived_subgraph
> >>>>>>>>       for a
> >>>>>>>>       >>> klass that's not shared?
> >>>>>>>>       >> I've removed the !MetaspaceObj::is_shared(k). I probably
> added
> >>>>>>>>       that for
> >>>>>>>>       >> debugging purposes only.
> >>>>>>>>       >>
> >>>>>>>>       >>>    616   DEBUG_ONLY({
> >>>>>>>>       >>>    617       Klass* klass = orig_obj->klass();
> >>>>>>>>       >>>    618       assert(klass !=
> >>>>>>>> SystemDictionary::Module_klass() &&
> >>>>>>>>       >>>    619              klass !=
> >>>>>>>>       SystemDictionary::ResolvedMethodName_klass() &&
> >>>>>>>>       >>>    620              klass !=
> >>>>>>>>       SystemDictionary::MemberName_klass() &&
> >>>>>>>>       >>>    621              klass !=
> >>>>>>>> SystemDictionary::Context_klass() &&
> >>>>>>>>       >>>    622              klass !=
> >>>>>>>>       SystemDictionary::ClassLoader_klass(), "we
> >>>>>>>>       >>> can only relocate metaspace object pointers inside
> >>>>>>>> java_lang_Class
> >>>>>>>>       >>> instances");
> >>>>>>>>       >>>    623     });
> >>>>>>>>       >>>
> >>>>>>>>       >>> Let's leave the above for a separate RFE. I think
> assert is not
> >>>>>>>>       >>> sufficient for the check. Also, why ResolvedMethodName,
> >>>>>>>> Module and
> >>>>>>>>       >>> MemberName cannot be part of the graph?
> >>>>>>>>       >>>
> >>>>>>>>       >>>
> >>>>>>>>       >> I added the following comment:
> >>>>>>>>       >>
> >>>>>>>>       >>     DEBUG_ONLY({
> >>>>>>>>       >>         // The following are classes in
> >>>>>>>>       share/classfile/javaClasses.cpp
> >>>>>>>>       >> that have injected native pointers
> >>>>>>>>       >>         // to metaspace objects. To support these
> classes, we
> >>>>>>>>       need to add
> >>>>>>>>       >> relocation code similar to
> >>>>>>>>       >>         //
> >>>>>>>> java_lang_Class::update_archived_mirror_native_pointers.
> >>>>>>>>       >>         Klass* klass = orig_obj->klass();
> >>>>>>>>       >>         assert(klass != SystemDictionary::Module_klass()
> &&
> >>>>>>>>       >>                klass !=
> >>>>>>>>       SystemDictionary::ResolvedMethodName_klass() &&
> >>>>>>>>       >>
> >>>>>>>>       > It's too restrictive to exclude those objects from the
> archived
> >>>>>>>>       object
> >>>>>>>>       > graph because of metadata relocation, since metadata
> relocation is
> >>>>>>>>       rare.
> >>>>>>>>       > The trade-off doesn't seem to buy us much.
> >>>>>>>>       >
> >>>>>>>>       > Do you plan to add the needed relocation code?
> >>>>>>>>
> >>>>>>>>       I looked more into this. Actually we cannot handle these 5
> >>>>>>>> classes at
> >>>>>>>>       all, even without archive relocation:
> >>>>>>>>
> >>>>>>>>       [1] #define MODULE_INJECTED_FIELDS(macro) \
> >>>>>>>>          macro(java_lang_Module, module_entry, intptr_signature,
> false)
> >>>>>>>>
> >>>>>>>>       ->  module_entry is malloc'ed
> >>>>>>>>
> >>>>>>>>       [2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
> >>>>>>>>          macro(java_lang_invoke_ResolvedMethodName, vmholder,
> >>>>>>>>       object_signature, false) \
> >>>>>>>>          macro(java_lang_invoke_ResolvedMethodName, vmtarget,
> >>>>>>>>       intptr_signature, false)
> >>>>>>>>
> >>>>>>>>       -> these fields are related to method handles and lambda
> forms,
> >>>>>>>> etc.
> >>>>>>>>       They can't easily be archived without implementing
> lambda form
> >>>>>>>>       archiving. (I did a prototype; it's very complex and
> fragile).
> >>>>>>>>
> >>>>>>>>       [3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
> >>>>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
> >>>>>>>>       vmdependencies, intptr_signature, false) \
> >>>>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
> >>>>>>>>       last_cleanup, long_signature, false)
> >>>>>>>>
> >>>>>>>>       -> vmdependencies is malloc'ed.
> >>>>>>>>
> >>>>>>>>       [4] #define
> >>>>>>>> MEMBERNAME_INJECTED_FIELDS(macro) \
> >>>>>>>>          macro(java_lang_invoke_MemberName, vmindex,
> intptr_signature,
> >>>>>>>>       false)
> >>>>>>>>
> >>>>>>>>       -> this one is probably OK. Despite being declared as
> >>>>>>>>       'intptr_signature', it seems to be used just as an integer.
> >>>>>>>> However,
> >>>>>>>>       MemberNames are typically used with [2] and [3]. So let's
> just
> >>>>>>>>       forbid it
> >>>>>>>>       to be safe.
> >>>>>>>>
> >>>>>>>>       [2] [3] [4] are not used directly by regular Java code and
> are
> >>>>>>>>       unlikely
> >>>>>>>>       to be referenced (directly or indirectly) by static fields
> (except
> >>>>>>>>       for
> >>>>>>>>       the static fields in the classes in java.lang.invoke, which
> we
> >>>>>>>>       probably
> >>>>>>>>       won't support for heap archiving due to the problem I
> described for
> >>>>>>>>       [2]). Objects of these types are typically referenced via
> constant
> >>>>>>>>       pool
> >>>>>>>>       entries.
> >>>>>>>>
> >>>>>>>>       [5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
> >>>>>>>>          macro(java_lang_ClassLoader, loader_data,
> intptr_signature,
> >>>>>>>> false)
> >>>>>>>>
> >>>>>>>>       -> loader_data is malloc'ed.
> >>>>>>>>
> >>>>>>>>       So, I will change the DEBUG_ONLY into a product-mode check,
> and
> >>>>>>>> quit
> >>>>>>>>       dumping if these objects are found in the object subgraph.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Sounds good. Can you please also add a comment with explanation.
> >>>>>>>>
> >>>>>>>> For ClassLoader and Module, it is worth considering caching the
> >>>>>>>> additional native data some time in the future. Lois had suggested
> >>>>>>>> the Module part a while ago.
> >>>>>>> I think we can do that if/when we archive Modules directly into the
> >>>>>>> shared heap.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>       Maybe we should backport the check to older versions as
> well?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> We should discuss with Andrew Haley for backports to JDK 11 update
> >>>>>>>> releases. Since the current OpenJDK 11 only applies Java heap
> >>>>>>>> archiving to a restricted set of JDK library code, I think it is
> >>>>>>>> safe without the new check.
> >>>>>>>>
> >>>>>>>> For non-LTS releases, it might not be worthwhile as they may not
> be
> >>>>>>>> widely used?
> >>>>>>> I agree. FYI, we (Oracle) have no plan for backporting more types
> of
> >>>>>>> heap object archiving, so the decision would be up to whoever
> >>>>>>> decides to do so.
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> - Ioi
> >>>>>>>
> >>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Jiangli
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>       >
> >>>>>>>>       >>> - src/hotspot/share/memory/metaspace.cpp
> >>>>>>>>       >>>
> >>>>>>>>       >>> 1036   metaspace_rs =
> >>>>>>>> ReservedSpace(compressed_class_space_size(),
> >>>>>>>>       >>> 1037   _reserve_alignment,
> >>>>>>>>       >>> 1038   large_pages,
> >>>>>>>>       >>> 1039   requested_addr);
> >>>>>>>>       >>>
> >>>>>>>>       >>> Please fix indentation.
> >>>>>>>>       >> Fixed.
> >>>>>>>>       >>
> >>>>>>>>       >>> - src/hotspot/share/memory/metaspaceClosure.hpp
> >>>>>>>>       >>>
> >>>>>>>>       >>>     78   enum SpecialRef {
> >>>>>>>>       >>>     79     _method_entry_ref
> >>>>>>>>       >>>     80   };
> >>>>>>>>       >>>
> >>>>>>>>       >>> Are there other pointers that are not references to
> >>>>>>>>       MetaspaceObj? If
> >>>>>>>>       >>> _method_entry_ref is the only type, it's probably not
> worth
> >>>>>>>>       defining
> >>>>>>>>       >>> SpecialRef?
> >>>>>>>>       >> There may be more types in the future, so I want to have
> a
> >>>>>>>>       stable API
> >>>>>>>>       >> that can be easily expanded without touching all the
> code that
> >>>>>>>>       uses it.
> >>>>>>>>       >>
> >>>>>>>>       >>
> >>>>>>>>       >>> - src/hotspot/share/memory/metaspaceShared.hpp
> >>>>>>>>       >>>
> >>>>>>>>       >>>     42 enum MapArchiveResult {
> >>>>>>>>       >>>     43   MAP_ARCHIVE_SUCCESS,
> >>>>>>>>       >>>     44   MAP_ARCHIVE_MMAP_FAILURE,
> >>>>>>>>       >>>     45   MAP_ARCHIVE_OTHER_FAILURE
> >>>>>>>>       >>>     46 };
> >>>>>>>>       >>>
> >>>>>>>>       >>> If we want to define different failure types, it's
> probably
> >>>>>>>> worth
> >>>>>>>>       >>> using separate types for relocation failure and
> validation
> >>>>>>>>       failure.
> >>>>>>>>       >> For now, I just need to distinguish between MMAP_FAILURE
> (where
> >>>>>>>>       I should
> >>>>>>>>       >> attempt to remap at an alternative address) and
> OTHER_FAILURE
> >>>>>>>>       (where the
> >>>>>>>>       >> CDS archive loading will fail -- due to validation error,
> >>>>>>>>       insufficient
> >>>>>>>>       >> memory, etc -- without attempting to remap.)
> >>>>>>>>       >>
> >>>>>>>>       >>> ---
> >>>>>>>>       >>>
> >>>>>>>>       >>>    193   static intx _mapping_delta; // FIXME rename
> >>>>>>>>       >>>
> >>>>>>>>       >>> How about _relocation_delta?
> >>>>>>>>       >> Changed as suggested.
> >>>>>>>>       >>
> >>>>>>>>       >>> - src/hotspot/share/oops/instanceKlass
> >>>>>>>>       >>>
> >>>>>>>>       >>> 1573 bool InstanceKlass::_disable_method_binary_search
> = false;
> >>>>>>>>       >>>
> >>>>>>>>       >>> The use of _disable_method_binary_search is not
> necessary. You
> >>>>>>>>       can use
> >>>>>>>>       >>> DynamicDumpSharedSpaces for the purpose. That would
> make things
> >>>>>>>>       >>> cleaner.
> >>>>>>>>       >> If we always disable the binary search when
> >>>>>>>>       DynamicDumpSharedSpaces is
> >>>>>>>>       >> true, it will slow down normal execution of the Java
> program
> >>>>>>>> when
> >>>>>>>>       >> -XX:ArchiveClassesAtExit has been specified, but the
> program
> >>>>>>>>       hasn't exited.
> >>>>>>>>       > Could you please add some comments to
> >>>>>>>> _disable_method_binary_search
> >>>>>>>>       > with the above explanation? Thanks.
> >>>>>>>>
> >>>>>>>>       OK
> >>>>>>>>       >
> >>>>>>>>       >>> -
> test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
> >>>>>>>>       >>>
> >>>>>>>>       >>>     76                     if (name.equals("s0") ||
> >>>>>>>>       name.equals("s1")) {
> >>>>>>>>       >>>     77                       // String regions are
> listed at
> >>>>>>>>       the end and
> >>>>>>>>       >>> they may not be fully occupied.
> >>>>>>>>       >>>     78                       break;
> >>>>>>>>       >>>     79                     } else if
> (name.equals("bm")) {
> >>>>>>>>       >>>     80                       // Bitmap space does not
> have a
> >>>>>>>>       requested address.
> >>>>>>>>       >>>     81                       break;
> >>>>>>>>       >>>
> >>>>>>>>       >>> It's not part of your change, but could you please fix
> line 76
> >>>>>>>>       - 78
> >>>>>>>>       >>> since it is trivial. It seems the lines can be removed.
> >>>>>>>>       >> Removed.
> >>>>>>>>       >>
> >>>>>>>>       >>> - /src/hotspot/share/memory/archiveUtils.hpp
> >>>>>>>>       >>> The file name does not match with the macro '#ifndef
> >>>>>>>>       >>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please
> rename
> >>>>>>>>       >>> archiveUtils.* ? archiveRelocator.hpp and
> >>>>>>>> archiveRelocator.cpp are
> >>>>>>>>       >>> more descriptive.
> >>>>>>>>       >> I named the file archiveUtils.hpp so we can move other
> misc
> >>>>>>>>       stuff used
> >>>>>>>>       >> by dumping into this file (e.g., DumpRegion,
> WriteClosure from
> >>>>>>>>       >> metaspaceShared.hpp), since these are not used by the
> majority
> >>>>>>>>       of the
> >>>>>>>>       >> files that use metaspaceShared.hpp.
> >>>>>>>>       >>
> >>>>>>>>       >> I fixed the ifdef.
> >>>>>>>>       >>
> >>>>>>>>       >>> - src/hotspot/share/memory/archiveUtils.cpp
> >>>>>>>>       >>>
> >>>>>>>>       >>>     36 void ArchivePtrMarker::initialize(CHeapBitMap*
> ptrmap,
> >>>>>>>>       address*
> >>>>>>>>       >>> ptr_base, address* ptr_end) {
> >>>>>>>>       >>>     37   assert(_ptrmap == NULL, "initialize only
> once");
> >>>>>>>>       >>>     38   _ptr_base = ptr_base;
> >>>>>>>>       >>>     39   _ptr_end = ptr_end;
> >>>>>>>>       >>>     40   _compacted = false;
> >>>>>>>>       >>>     41   _ptrmap = ptrmap;
> >>>>>>>>       >>>     42   _ptrmap->initialize(12 * M /
> sizeof(intptr_t)); //
> >>>>>>>>       default
> >>>>>>>>       >>> archive is about 12MB.
> >>>>>>>>       >>>     43 }
> >>>>>>>>       >>>
> >>>>>>>>       >>> Could we do a better estimate here? We could
> guesstimate the
> >>>>>>>> size
> >>>>>>>>       >>> based on the current used class space and metaspace
> size. It's
> >>>>>>>>       okay if
> >>>>>>>>       >>> a larger bitmap is used, since it can be reduced after all
> >>>>>>>>       marking is
> >>>>>>>>       >>> done.
> >>>>>>>>       >> The bitmap is automatically expanded when necessary in
> >>>>>>>>       >> ArchivePtrMarker::mark_pointer(). It's only about 1/32
> or 1/64
> >>>>>>>>       of the
> >>>>>>>>       >> total archive size, so even if we do expand, the cost
> will be
> >>>>>>>>       trivial.
> >>>>>>>>       > The initial value is based on the default CDS archive.
> When
> >>>>>>>> dealing
> >>>>>>>>       > with a really large archive, it would have to re-grow
> many times.
> >>>>>>>>       > Also, using a hard-coded value is less desirable.
> >>>>>>>>
> >>>>>>>>       OK, I changed it to the following
> >>>>>>>>
> >>>>>>>>          // Use this as initial guesstimate. We should need less
> space
> >>>>>>>>       in the
> >>>>>>>>          // archive, but if we're wrong the bitmap will be
> expanded
> >>>>>>>>       automatically.
> >>>>>>>>          size_t estimated_archive_size =
> >>>>>>>> MetaspaceGC::capacity_until_GC();
> >>>>>>>>          // But set it smaller in debug builds so we always test
> the
> >>>>>>>>       expansion
> >>>>>>>>       code.
> >>>>>>>>          // (Default archive is about 12MB).
> >>>>>>>>          DEBUG_ONLY(estimated_archive_size = 6 * M);
> >>>>>>>>
> >>>>>>>>          // We need one bit per pointer in the archive.
> >>>>>>>>          _ptrmap->initialize(estimated_archive_size /
> sizeof(intptr_t));
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>       Thanks!
> >>>>>>>>       - Ioi
> >>>>>>>>
> >>>>>>>>       >
> >>>>>>>>       >>>
> >>>>>>>>       >>>
> >>>>>>>>       >>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou
> >>>>>>>>       <jianglizhou at google.com>
> wrote:
> >>>>>>>>       >>>> Hi Ioi,
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> This is another great step for CDS usability
> improvement.
> >>>>>>>>       Thank you!
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> I have a high level question (or request): could we
> consider
> >>>>>>>>       >>>> separating the relocation work for 'direct' class
> metadata
> >>>>>>>>       from other
> >>>>>>>>       >>>> types of metadata (such as the shared system
> dictionary,
> >>>>>>>>       symbol table,
> >>>>>>>>       >>>> etc)? Initially we only relocate the tables and other
> >>>>>>>>       archived global
> >>>>>>>>       >>>> data. When each archived class is being loaded, we can
> >>>>>>>>       relocate all
> >>>>>>>>       >>>> the pointers within the current class. We could find
> the
> >>>>>>>>       segment (for
> >>>>>>>>       >>>> the current class) in the bitmap and update the
> pointers
> >>>>>>>>       within the
> >>>>>>>>       >>>> segment. That way we can reduce initial startup costs
> and
> >>>>>>>>       also avoid
> >>>>>>>>       >>>> relocating class data that's not used at runtime. In
> some
> >>>>>>>>       real world
> >>>>>>>>       >>>> large systems, an archive may contain extremely large
> >>>>>>>> number of
> >>>>>>>>       >>>> classes.
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> Following are partial review comments so we can move
> things
> >>>>>>>>       forward.
> >>>>>>>>       >>>> Still going through the rest of the changes.
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> - src/hotspot/share/classfile/javaClasses.cpp
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> 1218 void
> >>>>>>>> java_lang_Class::update_archived_mirror_native_pointers(oop
> >>>>>>>>       >>>> archived_mirror) {
> >>>>>>>>       >>>> 1219   Klass* k =
> >>>>>>>> ((Klass*)archived_mirror->metadata_field(_klass_offset));
> >>>>>>>>       >>>> 1220   if (k != NULL) { // k is NULL for the primitive
> >>>>>>>>       classes such as
> >>>>>>>>       >>>> java.lang.Byte::TYPE <<<<<<<<<<<
> >>>>>>>>       >>>> 1221
> archived_mirror->metadata_field_put(_klass_offset,
> >>>>>>>>       >>>> (Klass*)(address(k) +
> MetaspaceShared::mapping_delta()));
> >>>>>>>>       >>>> 1222   }
> >>>>>>>>       >>>> 1223 ...
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> Primitive type mirrors are handled separately. Could
> you
> >>>>>>>>       please verify
> >>>>>>>>       >>>> if this call path happens for primitive type mirror?
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> To answer my question above, looks like you added the
> >>>>>>>>       following, which
> >>>>>>>>       >>>> is to be used for primitive type mirrors. That seems
> to be
> >>>>>>>>       the reason
> >>>>>>>>       >>>> why update_archived_mirror_native_pointers is trying
> to also
> >>>>>>>>       cover
> >>>>>>>>       >>>> primitive type. It is better to have a separate API for
> >>>>>>>>       primitive type
> >>>>>>>>       >>>> mirror, which is cleaner. And, we also can replace the
> above
> >>>>>>>>       check at
> >>>>>>>>       >>>> line 1220 to be an assert for regular mirrors.
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> +void ReadClosure::do_mirror_oop(oop *p) {
> >>>>>>>>       >>>> +  do_oop(p);
> >>>>>>>>       >>>> +  oop mirror = *p;
> >>>>>>>>       >>>> +  if (mirror != NULL) {
> >>>>>>>>       >>>> +
> >>>>>>>> java_lang_Class::update_archived_mirror_native_pointers(mirror);
> >>>>>>>>       >>>> +  }
> >>>>>>>>       >>>> +}
> >>>>>>>>       >>>> +
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> How about renaming
> update_archived_mirror_native_pointers to
> >>>>>>>>       >>>> update_archived_mirror_klass_pointers.
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> It would be good to pass the current klass as an
> argument.
> >>>>>>>> We can
> >>>>>>>>       >>>> verify the relocated pointer matches with the current
> klass
> >>>>>>>>       pointer.
> >>>>>>>>       >>>>
> >>>>>>>>       >>>> We should also check if relocation is necessary before
> >>>>>>>>       spending cycles
> >>>>>>>>       >>>> to obtain the

From felix.yang at huawei.com  Mon Nov 11 01:41:22 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Mon, 11 Nov 2019 01:41:22 +0000
Subject: 8233839: aarch64: missing memory barrier in NewObjectArrayStub
 and NewTypeArrayStub
In-Reply-To: <382f7448-c392-5d98-ecf5-ac22e86f4888@redhat.com>
References: <DA41BE1DDCA941489001C7FBD7A8820ED60280EF@dggeml527-mbx.china.huawei.com>
 <382f7448-c392-5d98-ecf5-ac22e86f4888@redhat.com>
Message-ID: <DA41BE1DDCA941489001C7FBD7A8820ED60327A2@dggeml527-mbx.china.huawei.com>

> -----Original Message-----
> From: Andrew Dinn [mailto:adinn at redhat.com]
> Sent: Friday, November 8, 2019 5:04 PM
> To: Yangfei (Felix) <felix.yang at huawei.com>;
> hotspot-runtime-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
> Subject: Re: 8233839: aarch64: missing memory barrier in NewObjectArrayStub
> and NewTypeArrayStub
> 
> On 08/11/2019 08:30, Yangfei (Felix) wrote:
> > I witnessed a random failure of one jcstress test on my 128-core aarch64 server:
> "org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.StringTest"
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8233839
> >
> > I used the latest aarch64 jdk8u release build.  Please refer to the bugzilla for
> details and the analysis.
> >   I checked the assembler code emitted by
> LIR_Assembler::emit_alloc_array:
> > For the fast path, the StoreStore memory barrier is there.  But it's not the
> case for the slow path.
> >
> >   Patch adding the missing barrier for 14:
> >
> > diff -r ad157fab6bf5 src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp
> > --- a/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp   Thu Nov 07
> 16:26:57 2019 -0800
> > +++ b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp   Fri Nov 08
> 16:10:08 2019 +0800
> > @@ -840,6 +840,7 @@
> >            __ sub(arr_size, arr_size, t1);  // body length
> >            __ add(t1, t1, obj);       // body start
> >            __ initialize_body(t1, arr_size, 0, t2);
> > +          __ membar(Assembler::StoreStore);
> >            __ verify_oop(obj);
> >
> >            __ ret(lr);
> >
> >   JDK builds OK and passed tier1 test.
> Very nice detective work finding that one!
> 
> The jdk14 patch looks good. Also the same patch for jdk11 and the variant for
> jdk8 are good.
> 

Thanks for reviewing this.  
The jdk14 patch has been pushed as: https://hg.openjdk.java.net/jdk/jdk/rev/90cf1d4e712f  
Will push to aarch64 jdk8u after the jdk11u-fix-request is approved.  

Felix
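
P.S. For readers unfamiliar with why the barrier matters, here is a
minimal portable sketch of the same publication pattern. It is not the
HotSpot stub code: FakeArray, allocate_and_publish,
count_nonzero_after_publish and run_demo are hypothetical names made up
for illustration, and a C++ release fence (which is at least as strong
as StoreStore) stands in for "__ membar(Assembler::StoreStore)". The
acquire load on the reader side is only there to keep the C++ example
well-defined.

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>
#include <thread>

// Hypothetical stand-in for a freshly allocated Java array.
struct FakeArray {
  std::size_t length;
  int body[16];
};

static FakeArray storage;
static std::atomic<FakeArray*> published{nullptr};

// Writer: initialize header and body, fence, then publish the reference.
void allocate_and_publish(std::size_t length) {
  storage.length = length;
  std::memset(storage.body, 0, sizeof(storage.body));  // default values
  // Stand-in for the StoreStore barrier: the initializing stores above
  // must become visible before the publishing store below.
  std::atomic_thread_fence(std::memory_order_release);
  published.store(&storage, std::memory_order_relaxed);
}

// Reader: wait for the reference, then count elements that do not hold
// the default value. With the fence in place this should be 0.
int count_nonzero_after_publish() {
  FakeArray* a;
  while ((a = published.load(std::memory_order_acquire)) == nullptr) {
    std::this_thread::yield();
  }
  int bad = 0;
  for (std::size_t i = 0; i < a->length; i++) {
    if (a->body[i] != 0) bad++;
  }
  return bad;
}

// Runs one writer and one reader and returns the reader's count.
int run_demo() {
  // Pre-fill with garbage so a missed initialization would be visible.
  std::memset(&storage, 0xAB, sizeof(storage));
  published.store(nullptr, std::memory_order_relaxed);
  int bad = -1;
  std::thread reader([&bad] { bad = count_nonzero_after_publish(); });
  std::thread writer([] { allocate_and_publish(16); });
  writer.join();
  reader.join();
  return bad;
}
```

Without the fence, a weakly ordered CPU may make the publishing store
visible before the zeroing stores, which is the kind of stale,
non-default element value the jcstress test reports.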

From david.holmes at oracle.com  Mon Nov 11 07:56:21 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 11 Nov 2019 17:56:21 +1000
Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
In-Reply-To: <AM6PR02MB5347AB9322749A90AED3128BEC7B0@AM6PR02MB5347.eurprd02.prod.outlook.com>
References: <AM6PR02MB534783144A9A6CF30C1049F8EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com>
 <AM6PR02MB534729CACDA052E9B80ED146EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
 <AM6PR02MB5347AB9322749A90AED3128BEC7B0@AM6PR02MB5347.eurprd02.prod.outlook.com>
Message-ID: <cb3b92d1-5dee-8d51-1298-2d01a016ef2b@oracle.com>

Hi Goetz,

Please note I only looked at the test initially and have not reviewed 
this overall fix as I don't know the PPC code.

The updated test seems fine.

Thanks,
David

On 9/11/2019 1:32 am, Lindenmaier, Goetz wrote:
> Hi,
> 
> I waited for https://bugs.openjdk.java.net/browse/JDK-8233081
> which makes one of the fixes unnecessary.
> Also, I had to fix the argument of verify_oop_helper
> from oop to oopDesc* for the fastdebug build.
> 
> New webrev:
> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/
> 
> Best regards,
>    Goetz.
> 
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Freitag, 18. Oktober 2019 01:38
>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
>> compiler-dev at openjdk.java.net>
>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
>>
>> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote:
>>> Hi David,
>>>
>>> you are right, thanks for pointing me to that!
>>> Doing one test for vm.bits=64 and one for 32 should fix it:
>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
>>
>> s/01/02/ :)
>>
>> For the 32-bit case you can delete the line:
>>
>>      * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9")
>>
>> For the 64-bit case you can delete the "sparc" check from the same line.
>>
>> Thanks,
>> David
>>
>>>
>>> Best regards,
>>>     Goetz.
>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Donnerstag, 17. Oktober 2019 13:18
>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
>>>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
>>>> compiler-dev at openjdk.java.net>
>>>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
>>>>
>>>> Hi Goetz,
>>>>
>>>> UseCompressedOops is a 64-bit flag only so your change will break the
>>>> test on 32-bit systems.
>>>>
>>>> David
>>>>
>>>> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote:
>>>>> Hi,
>>>>>
>>>>> 8231058 introduced a test that enables +VerifyOops.
>>>>> This fails on ppc, because this was not used in a very
>>>>> long time.
>>>>>
>>>>> The crash is caused by passing compressed oops from
>>>>> LIR_Assembler::store() to the checker routine.
>>>>> I fix this by implementing a checker routine verify_coop
>>>>> that first decompresses the coop.  This makes the new
>>>>> test pass.
>>>>>
>>>>> Further testing showed that the additional checker
>>>>> coding makes Patching Stubs overflow. These
>>>>> can not be increased in size to fit the code. I
>>>>> disable generating verify_oop code in LIRAssembler::load()
>>>>> which fixes the issue.
>>>>>
>>>>> Further I extended the message printed when verification
>>>>> of an oop failed. First, I print the location in the source
>>>>> code where the checker code was generated. Second,
>>>>> I print the faulty oop.
>>>>>
>>>>> I also improved the message printed when PatchingStubs
>>>>> overflow.
>>>>>
>>>>> Finally, I improve the test to run with and without compressed
>>>>> Oops.
>>>>>
>>>>> Please review:
>>>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
>>>>>
>>>>> @runtime as I modify the test introduced there
>>>>> @compiler as the error is in C1.
>>>>>
>>>>> Best regards,
>>>>>      Goetz.
>>>>>

From david.holmes at oracle.com  Mon Nov 11 10:01:44 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 11 Nov 2019 20:01:44 +1000
Subject: RFR (XS) 8232735: Convert PrintJNIResolving to Unified Logging
In-Reply-To: <52dd271d-07ec-5e1c-9b9b-6966935f4b9f@oracle.com>
References: <52dd271d-07ec-5e1c-9b9b-6966935f4b9f@oracle.com>
Message-ID: <4d6facef-3971-5325-32f5-c045015d40c1@oracle.com>

Hi Coleen,

On 9/11/2019 9:20 am, coleen.phillimore at oracle.com wrote:
> Summary: converted the existing output at debug level because it is noisy
> 
> Tested with tier1 on all Oracle platforms, with os's linux, bsd, solaris 
> and windows.
> 
> open webrev at http://cr.openjdk.java.net/~coleenp/2019/8232735.01/webrev
> bug link https://bugs.openjdk.java.net/browse/JDK-8232735

Looks good to me.

Possible missing includes of the logging headers:
- src/hotspot/share/jvmci/jvmciCompilerToVM.cpp
- src/hotspot/share/oops/method.cpp

Thanks,
David

> Thanks,
> Coleen

From aph at redhat.com  Mon Nov 11 10:07:10 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 11 Nov 2019 10:07:10 +0000
Subject: [aarch64-port-dev ] 8233839: aarch64: missing memory barrier in
 NewObjectArrayStub and NewTypeArrayStub
In-Reply-To: <382f7448-c392-5d98-ecf5-ac22e86f4888@redhat.com>
References: <DA41BE1DDCA941489001C7FBD7A8820ED60280EF@dggeml527-mbx.china.huawei.com>
 <382f7448-c392-5d98-ecf5-ac22e86f4888@redhat.com>
Message-ID: <9270b589-736e-fcce-064b-dcc6b6570406@redhat.com>

On 11/8/19 9:04 AM, Andrew Dinn wrote:
> On 08/11/2019 08:30, Yangfei (Felix) wrote:
>> I witnessed random fail of one jcstress test on my 128-core aarch64 server: "org.openjdk.jcstress.tests.defaultValues.arrays.small.plain.StringTest"
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233839
>>   JDK builds OK and passed tier1 test.
> Very nice detective work finding that one!
> 
> The jdk14 patch looks good. Also the same patch for jdk11 and the
> variant for jdk8 are good.

Looks like ARM32 does not have the same bug. PowerPC doesn't even
attempt a fast path in this case.
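
The ordering that the reviewed patch restores can be modelled in portable
C++. This is only a sketch under stated assumptions: FakeArray,
g_published, init_and_publish and reader_sees_zeroed_body are hypothetical
names, not HotSpot code, and a C++ release fence is used as a stand-in for
membar(StoreStore) (it is somewhat stronger, since it also orders earlier
loads before later stores).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a freshly allocated array; not a HotSpot type.
struct FakeArray {
  std::size_t length;
  int body[8];
};

// Shared slot through which another thread may observe the array.
std::atomic<FakeArray*> g_published{nullptr};

// Zero the body, then publish. The release fence plays the role that
// membar(StoreStore) plays in the patch: the initializing stores may not
// be reordered after the publishing store.
void init_and_publish(FakeArray* a, std::size_t len) {
  assert(len <= 8);
  a->length = len;
  for (std::size_t i = 0; i < len; i++) {
    a->body[i] = 0;  // corresponds to initialize_body in the stub
  }
  std::atomic_thread_fence(std::memory_order_release);
  g_published.store(a, std::memory_order_relaxed);
}

// Reader side: returns true if no array is published yet, or if the
// published array is observed fully zeroed.
bool reader_sees_zeroed_body() {
  FakeArray* a = g_published.load(std::memory_order_acquire);
  if (a == nullptr) return true;
  for (std::size_t i = 0; i < a->length; i++) {
    if (a->body[i] != 0) return false;
  }
  return true;
}
```

Without the fence, a reader on a weakly ordered machine could observe the
published pointer before the zeroing stores, which matches the symptom
reported by the jcstress defaultValues test.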

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From david.holmes at oracle.com  Mon Nov 11 10:28:42 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 11 Nov 2019 20:28:42 +1000
Subject: RFR: 8233785: Incorrect JDK version is reported in hs_err log
In-Reply-To: <6811d542-a530-5d70-5fd6-bea47de81d35@oracle.com>
References: <d9a24903-9053-06a0-e74b-7bfb43370767@oss.nttdata.com>
 <6811d542-a530-5d70-5fd6-bea47de81d35@oracle.com>
Message-ID: <317c088d-687c-c9a5-cc7f-c6744ea275a5@oracle.com>

Sorry for the delay.

Just confirming I've verified against the spec from JEP 223 and this fix 
is correct.
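
The JEP 223 rule in question (trailing zero elements are dropped from the
version number, interior zeros are kept) can be sketched as follows. The
function name format_version_number is hypothetical and this is not the
JDK_Version::to_string() code from the webrev, only an illustration of the
expected output.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins version elements with '.', after dropping trailing zero elements.
// Zero elements in the middle are kept, and the first element is always kept.
std::string format_version_number(std::vector<int> elems) {
  assert(!elems.empty());
  while (elems.size() > 1 && elems.back() == 0) {
    elems.pop_back();
  }
  std::string out;
  for (std::size_t i = 0; i < elems.size(); i++) {
    if (i != 0) out += '.';
    out += std::to_string(elems[i]);
  }
  return out;
}
```

With major=14, minor=0, security=0, patch=1 this yields "14.0.0.1", which
is the value the hs_err header is expected to show.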

Thanks,
David

On 7/11/2019 10:39 pm, David Holmes wrote:
> Hi Yasumasa,
> 
> On 7/11/2019 10:28 pm, Yasumasa Suenaga wrote:
>> Hi all,
>>
>> Please review this change:
>>
>>    JBS: https://bugs.openjdk.java.net/browse/JDK-8233785
>>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233785/webrev.00/
>>
>> If a JVM configured with --with-version-patch crashes, the JDK 
>> version in the hs_err log is incorrect.
>> We get an hs_err log which contains the following in its header when we 
>> configure with "--with-version-update=0 
>> --with-version-patch=1":
>>
>> ```
>> # JRE version: OpenJDK Runtime Environment (14.0.1+2) (build 
>> 14.0.0.1+2-TypeS)
>> ```
>>
>> The valid JDK version is "14.0.0.1", however the header reports "14.0.1".
>> It is a bug in JDK_Version::to_string().
> 
> I initially missed the fact that you always print _security along with 
> _patch.
> 
> I think what you have looks correct, but I'd want to double check that 
> against the versioning spec to be sure.
> 
> Thanks,
> David
> 
>>
>> Thanks,
>>
>> Yasumasa

From david.holmes at oracle.com  Mon Nov 11 10:52:54 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 11 Nov 2019 20:52:54 +1000
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
 <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
Message-ID: <590a7395-8fda-caf3-402a-29393fd34469@oracle.com>

Hi Robbin,

Can you clarify your comments regarding use of Atomic::load and 
Atomic::store please. Seems to me you are suggesting using those for 
some memory ordering effect, not for any atomicity effect per se.

For example you say:

 > 242   jint l_ref_count = ref_count();
 > 243   ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d,
 > ref_count=%d", l_ref_count, ref_count());
 > Please use Atomic::load() in ref_count.

But it seems to me that to solve the problem of the compiler not 
reissuing the load of ref_count, you should be using 
OrderAccess::loadload() in the ref_count() method.

Or are you simply saying that given:

jint l1 = ref_count();
jint l2 = ref_count();

where ref_count simply does "return _ref_count;"

the compiler could treat the above as:

jint l1 = ref_count();
jint l2 = l1;

whereas if we have ref_count defined as "return 
Atomic::load(&_ref_count);" then the compiler cannot do that?

I don't like seeing Atomic::load/store being used just to trick the 
compiler that way. I thought we already relied on use of volatile to 
disallow such optimisations and that this was the accepted way to do it.
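
The distinction being discussed can be sketched in standard C++. This is
an illustration only: MonitorSketch and its members are hypothetical, and
std::atomic with relaxed ordering is used as an approximate stand-in for
HotSpot's Atomic::load, which is an assumption rather than a statement
about how Atomic::load is implemented.

```cpp
#include <atomic>

// Hypothetical stand-in for ObjectMonitor; only the counter is modelled.
struct MonitorSketch {
  int plain_count = 0;               // two adjacent reads may be folded into one
  volatile int volatile_count = 0;   // every read must be performed as written
  std::atomic<int> atomic_count{0};  // race-free access, no ordering requested

  int ref_count_plain() const { return plain_count; }
  int ref_count_volatile() const { return volatile_count; }
  int ref_count_atomic() const {
    return atomic_count.load(std::memory_order_relaxed);
  }
};

// Mirrors the reviewed pattern: two reads that are meant to be independent.
// With the volatile accessor the compiler has to issue both loads; with the
// plain accessor it is free to compute l2 from l1.
bool two_reads_agree(const MonitorSketch& m) {
  int l1 = m.ref_count_volatile();
  int l2 = m.ref_count_volatile();
  return l1 == l2;
}
```

Neither the volatile read nor the relaxed atomic load adds any ordering
against other memory accesses; they only control how the access itself is
compiled.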

Thanks,
David


On 8/11/2019 11:35 pm, Robbin Ehn wrote:
> Hi Dan,
> 
> Thanks for looking into this, some comments on v8:
> 
> ##################
> src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
> src/hotspot/cpu/x86/globalDefinitions_x86.hpp
> src/hotspot/share/logging/logTag.hpp
> src/hotspot/share/oops/markWord.hpp
> src/hotspot/share/runtime/basicLock.cpp
> src/hotspot/share/runtime/safepoint.cpp
> src/hotspot/share/runtime/serviceThread.cpp
> src/hotspot/share/runtime/sharedRuntime.cpp
> src/hotspot/share/runtime/synchronizer.hpp
> src/hotspot/share/runtime/vmOperations.cpp
> src/hotspot/share/runtime/vmOperations.hpp
> src/hotspot/share/runtime/vmStructs.cpp
> src/hotspot/share/runtime/vmThread.cpp
> test/hotspot/gtest/oops/test_markWord.cpp
> 
> No comments.
> 
> ##################
> I don't see the benefit of having the -HandshakeAfterDeflateIdleMonitors 
> code paths.
> Removing that option would mean these files can be reverted:
> src/hotspot/cpu/aarch64/globals_aarch64.hpp
> src/hotspot/cpu/arm/globals_arm.hpp
> src/hotspot/cpu/ppc/globals_ppc.hpp
> src/hotspot/cpu/s390/globals_s390.hpp
> src/hotspot/cpu/sparc/globals_sparc.hpp
> src/hotspot/cpu/x86/globals_x86.hpp
> src/hotspot/cpu/x86/macroAssembler_x86.cpp
> src/hotspot/cpu/x86/macroAssembler_x86.hpp
> src/hotspot/cpu/zero/globals_zero.hpp
> 
> And one less option here:
> src/hotspot/share/runtime/globals.hpp
> 
> ##################
> src/hotspot/share/prims/jvm.cpp
> 
> Unclear if this is a good idea.
> 
> ##################
> src/hotspot/share/prims/whitebox.cpp
> 
> This would assume the test expects the right thing, but that is not 
> obvious.
> 
> ##################
> src/hotspot/share/prims/jvmtiEnvBase.cpp
> 
> The current pending and waiting monitor is only changed by the 
> JavaThread itself.
> It only sets it after _contentions is increased.
> It clears it before _contentions is decreased.
> We depend on being at a safepoint or on the thread being suspended, so 
> it can't be deflated since _contentions is > 0.
> Plus the thread has already increased the ref count and can't decrease 
> it (since it is at a safepoint or suspended).
> 
> ##################
> src/hotspot/share/runtime/objectMonitor.cpp
> 
> ###1
> You have several of these (and in other files):
> 242   jint l_ref_count = ref_count();
> 243   ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d, 
> ref_count=%d", l_ref_count, ref_count());
> Please use Atomic::load() in ref_count.
> This depends on ref_count being volatile; otherwise the 
> compiler may only do one load.
> 
> ###2
> 307   // Prevent deflation. See ObjectSynchronizer::deflate_monitor(),
> ...
> 311   Atomic::add(1, &_contentions);
> In ObjectSynchronizer::deflate_monitor if you would check ref count 
> instead of _contentions, we could remove _contentions.
> Since all waiters also have a ref count it looks like we don't need 
> waiters either.
> In ObjectSynchronizer::deflate_monitor:
> if (mid->_contentions != 0 || mid->_waiters != 0) {
> Why not just do:
> if (mid->ref_count()) {
> ?
> 
> ##################
> src/hotspot/share/runtime/objectMonitor.hpp
> 
> ###1
>   252   intptr_t is_busy() const {
>   253     // TODO-FIXME: assert _owner == null implies _recursions = 0
>   254     // We do not include _ref_count in the is_busy() check because
>   255     // _ref_count is for indicating that the ObjectMonitor* is in
>   256     // use which is orthogonal to whether the ObjectMonitor itself
>   257     // is in use for a locking operation.
> 
> But in the non-debug code we always check:
> +  if (mid->is_busy() || mid->ref_count() != 0) {
> 
> So it seems like you should have a method including ref count.
> 
> ##################
> src/hotspot/share/runtime/objectMonitor.inline.hpp
> 
> Use Atomic::load for ref count.
> 
> ##################
> src/hotspot/share/runtime/synchronizer.cpp
> 
> ###1
>   139 static volatile int g_om_free_count = 0;    // # on g_free_list
>   140 static volatile int g_om_in_use_count = 0;  // # on g_om_in_use_list
>   141 static volatile int g_om_population = 0;    // # Extant -- in circulation
>   142 static volatile int g_om_wait_count = 0;    // # on g_wait_list
> No padding here, aren't they more contended than the fields in the OM?
> 
> ###2
> 151 static bool is_next_marked(ObjectMonitor* om) {
> 
> It is only used in ObjectSynchronizer::om_flush.
> Here you fetch an OM and read the next field; this does not need LA 
> semantics on supported platforms.
> This would only need Atomic::load.
> 
> ###3
> 191 static void set_next(ObjectMonitor* om, ObjectMonitor* value) {
> 
> Nowhere do you need SR; the only places where it would have made a difference are:
>   345       OrderAccess::storestore();
>   346       set_next(cur, next);  // Unmark the previous list head.
> and
> 1714     OrderAccess::storestore();
> 1715     set_next(in_use_list, next);
> 
> You have a storestore already!
> 
> This code reads as:
> OrderAccess::storestore();
> OrderAccess::loadstore();
> OrderAccess::storestore();
> om->_next_om = value
> 
> So it should be an Atomic::store.
> 
> ###4
> 198 static bool mark_list_head(ObjectMonitor* volatile * list_p
> 
> Since the mark is an embedded spinlock I think the terminology should be 
> changed (that the spinlock is inside the next pointer should be 
> abstracted away).
> E.g. mark_next_loop would just be lock.
> The load of the list heads should use Atomic::load.
> It also seems a bit weird to return next for the locking method.
> An output parameter can just be returned, and return NULL if list head 
> is NULL.
> E.g.
> 
>   198 static ObjectMonitor* get_list_head_locked(ObjectMonitor* volatile 
> * list_p) {
>   200   while (true) {
>   201     ObjectMonitor* mid = Atomic::load(list_p);
>   202     if (mid == NULL) {
>   203       return NULL;  // The list is empty.
>   204     }
>   205     if (try_lock(mid)) {
>   206       if (Atomic::load(list_p) != mid) {
>   207         // The list head changed so we have to retry.
>   208         unlock(mid);
>   210       } else {
>               return mid;
>             }
>   214     }
>           // Yield ?
>   215   }
>   216 }
> 
> With collateral changes.
> 
> ###5
> 220 static ObjectMonitor* unmarked_next(ObjectMonitor* om)
> Atomic::store is what needed.
> 
> ###6
> 333 static void prepend_to_common(
> 
>   345       OrderAccess::storestore();
>   346       set_next(cur, next);  // Unmark the previous list head.
> Double storestore. (fixed by changing set_next to Atomic::store)
> 
> ###7
>   375 static ObjectMonitor* take_from_start_of_common(ObjectMonitor* 
> volatile * list_p,
> 
> Triple storestore here.
> 
>   386   Atomic::dec(count_p);
>   387   // mark_list_head() used cmpxchg() above, switching list head 
> can be lazier:
>   388   OrderAccess::storestore();
>   389   // Unmark take, but leave the next value for any lagging list
>   390   // walkers. It will get cleaned up when take is prepended to
>   391   // the in-use list:
>   392   set_next(take, next);
>   393   return take;
> 
> Reads:
> count_p--
> OrderAccess::loadstore();
> OrderAccess::storestore();
> OrderAccess::storestore();
> OrderAccess::loadstore();
> OrderAccess::storestore();
> take->_next_om = next;
> 
> Fixed by changing set_next to Atomic::store and removing the 
> OrderAccess::storestore();
> 
> ###8
> ObjectSynchronizer::om_release(
> 
> 1591       if (m == mid) {
> 1592         // We found 'm' on the per-thread in-use list so try to extract it.
> 1593         if (cur_mid_in_use == NULL) {
> 1594           // mid is the list head and it is marked. Switch the list head
> 1595           // to next which unmarks the list head, but leaves mid marked:
> 1596           self->om_in_use_list = next;
> 1597           // mark_list_head() used cmpxchg() above, switching list head can be lazier:
> 1598           OrderAccess::storestore();
> 1599         } else {
> 1600           // mid and cur_mid_in_use are marked. Switch cur_mid_in_use's
> 1601           // next field to next which unmarks cur_mid_in_use, but leaves
> 1602           // mid marked:
> 1603           OrderAccess::release_store(&cur_mid_in_use->_next_om, next);
> 1604         }
> 1605         extracted = true;
> 1606         Atomic::dec(&self->om_in_use_count);
> 1607         // Unmark mid, but leave the next value for any lagging list
> 1608         // walkers. It will get cleaned up when mid is prepended to
> 1609         // the thread's free list:
> 1610         set_next(mid, next);
> 1611         break;
> 1612       }
> 
> This does not look correct. Before taking this branch we have done a 
> cmpxchg in mark_list_head or mark_next_loop.
> This is how it reads:
> OrderAccess::storestore(); // from previous cmpxchg
> OrderAccess::loadstore(); // from previous cmpxchg
> 1591       if (m == mid) {
> 1593         if (cur_mid_in_use == NULL) {
> 1596           self->om_in_use_list = next;
> 1598           OrderAccess::storestore();
> 1599         } else {
>                OrderAccess::storestore();
>                OrderAccess::loadstore();
> 1603           cur_mid_in_use->_next_om = next;
> 1604         }
> 1605         extracted = true;
>              OrderAccess::storestore();
>              OrderAccess::fence(); // storestore|storeload|loadstore|loadload
>              self->om_in_use_count--; // Atomic::dec
>              OrderAccess::storestore();
>              OrderAccess::loadstore();
>              OrderAccess::storestore();
>              OrderAccess::loadstore();
>              mid->_next_om = next; // Atomic::store
> 1611         break;
> 1612       }
> 
> extracted is a local variable so you do not need any OrderAccess before 
> it is set.
> Fixed by changing set_next to Atomic::store, removing the 
> OrderAccess::storestore() and changing OrderAccess::release_store to 
> Atomic::store();
> 
> ###9
> 1653 void ObjectSynchronizer::om_flush(Thread* self) {
> 
> 1714     OrderAccess::storestore();
> 1715     set_next(in_use_list, next);
> Fixed by changing set_next to Atomic::store.
> 
> ###10
> 1737     self->om_free_list = NULL;
> 1738     OrderAccess::storestore();  // Lazier memory is okay for list walkers.
> 
> prepend_list_to_g_free_list/prepend_list_to_g_om_in_use_list does first 
> thing cmpxchg so there is no need for this storestore.
> 
> ###11
> 1797 void ObjectSynchronizer::inflate(ObjectMonitorHandle* omh_p, 
> Thread* self,
> 
> 1938       // Once ObjectMonitor is configured and the object is associated
> 1939       // with the ObjectMonitor, it is safe to allow async deflation:
> 1940       assert(m->is_new(), "freshly allocated monitor must be new");
> 1941       m->set_allocation_state(ObjectMonitor::Old);
> 
> So we use ref count, contention, waiter, owner and allocation state to 
> keep OM alive in different scenarios.
> There is no way for me to keep track of that. I don't see why you would 
> need more than owner and ref count.
> If you allocate the om with ref count 1 you can remove _allocation_state 
> and just decrease ref count here instead.
> 
> ###12
> 2079 bool ObjectSynchronizer::deflate_monitor
> 
> 2112     if (AsyncDeflateIdleMonitors) {
> 2113       // clear() expects the owner field to be NULL and we won't race
> 2114       // with the simple C2 ObjectMonitor
> 
> The macro assembler code is not just executed by C2, so this comment is 
> a bit misleading. (There are some more also.)
> 
> ###13
> 2306 int ObjectSynchronizer::deflate_monitor_list(
> 
> Same issue as ObjectSynchronizer::om_release.
> Fixed by changing set_next to Atomic::store, removing the 
> OrderAccess::storestore() and changing OrderAccess::release_store to 
> Atomic::store();
> 
> ###14
> 2474       if (SafepointSynchronize::is_synchronizing() &&
> 
> This is the wrong method to call, it should be 
> SafepointMechanism::should_block(Thread* thread);
> 
> ###15
> 2578 void ObjectSynchronizer::deflate_idle_monitors_using_JT() {
> 
> 2616     g_wait_list = NULL;
> 2617     OrderAccess::storestore();  // Lazier memory sync is okay for list walkers.
> 
> I don't see that g_wait_list is ever simultaneously read.
> Either it is accessed by serviceThread outside a safepoint or by 
> VMThread inside a safepoint?
> 
> It looks like g_wait_list can just be a local in:
> void ObjectSynchronizer::deflate_idle_monitors_using_JT()
> 
> (disregarding the debug code that might read it in a safepoint)
> 
> ###16
> 2722         assert(SafepointSynchronize::is_synchronizing(), "sanity check");
> 
> This is the wrong method to call, it should be 
> SafepointMechanism::should_block(Thread* thread);
> 
> ##################
> src/hotspot/share/runtime/vframe.cpp
> 
> We are at safepoint or current thread or in a handshake, current pending 
> and waiting monitor is already stable.
> 
> ##################
> src/hotspot/share/services/threadService.cpp
> 
> These changes are only needed for the -HandshakeAfterDeflateIdleMonitors 
> path.
> 
> ##################
> test/jdk/java/rmi/server/UnicastRemoteObject/unexportObject/UnexportLeak.java 
> 
> 
> Note: if OM had a weak to object instead this would not be needed.
> 
> Thanks, Robbin
> 
> 
> On 11/4/19 10:03 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have made changes to the Async Monitor Deflation code in response to
>> the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., Robbin
>> and Erik O. for their comments!
>>
>> JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async Monitor
>> Deflation project needs to push before Nov. 12, 2019 in order to allow
>> for sufficient bake time for such a big change. Nov. 12 is _next_ Tuesday
>> so we have 8 days from today to finish this code review cycle and push
>> this code for JDK14.
>>
>> Carsten and Roman! Time for you guys to chime in again on the code 
>> reviews.
>>
>> I have attached the change list from CR7 to CR8 instead of putting it in
>> the body of this email. I've also added a link to the CR7-to-CR8-changes
>> file to the webrevs so it should be easy to find.
>>
>> Main bug URL:
>>
>>      JDK-8153224 Monitor deflation prolong safepoints
>>      https://bugs.openjdk.java.net/browse/JDK-8153224
>>
>> The project is currently baselined on jdk-14+21.
>>
>> Here's the full webrev URL for those folks that want to see all of the
>> current Async Monitor Deflation code in one go (v2.08 full):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full
>>
>> Some folks might want to see just what has changed since the last review
>> cycle so here's a webrev for that (v2.08 inc):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/
>>
>> The OpenJDK wiki did not need any changes for this round:
>>
>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>
>> The jdk-14+21 based v2.08 version of the patch has been thru Mach5 
>> tier[1-8]
>> testing on Oracle's usual set of platforms. It has also been through 
>> my usual
>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the 
>> addition
>> of Robbin's "MoCrazy 1024" test running in parallel with the other 
>> tests in
>> my lab. Some testing is still running, but so far there are no new 
>> regressions.
>>
>> I have not yet done a SPECjbb2015 round on the CR8/v2.08/11-for-jdk14 
>> bits.
>>
>> Thanks, in advance, for any questions, comments or suggestions.
>>
>> Dan
>>
>>
>> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> The Async Monitor Deflation project is reaching the end game. I have no
>>> changes planned for the project at this time so all that is left is code
>>> review and any changes that result from those reviews.
>>>
>>> Carsten and Roman! Time for you guys to chime in again on the code 
>>> reviews.
>>>
>>> I have attached the list of fixes from CR6 to CR7 instead of putting it
>>> in the main body of this email.
>>>
>>> Main bug URL:
>>>
>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>
>>> The project is currently baselined on jdk-14+19.
>>>
>>> Here's the full webrev URL for those folks that want to see all of the
>>> current Async Monitor Deflation code in one go (v2.07 full):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full 
>>>
>>>
>>> Some folks might want to see just what has changed since the last review
>>> cycle so here's a webrev for that (v2.07 inc):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/ 
>>>
>>>
>>> The OpenJDK wiki has been updated to match the CR7/v2.07/10-for-jdk14 
>>> changes:
>>>
>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>
>>> The jdk-14+18 based v2.07 version of the patch has been thru Mach5 
>>> tier[1-8]
>>> testing on Oracle's usual set of platforms. It has also been through 
>>> my usual
>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the 
>>> addition
>>> of Robbin's "MoCrazy 1024" test running in parallel with the other 
>>> tests in
>>> my lab.
>>>
>>> The jdk-14+19 based v2.07 version of the patch has been thru Mach5 
>>> tier[1-3]
>>> test on Oracle's usual set of platforms. Mach5 tier[4-8] are in process.
>>>
>>> I did another round of SPECjbb2015 testing in Oracle's Aurora 
>>> Performance lab
>>> using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>>
>>>     - "base" is jdk-14+18
>>>     - "v2.07" is the latest version and includes C2 inc_om_ref_count()
>>>       support on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors
>>>       option
>>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors specified
>>>
>>>          hbIR           hbIR
>>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>     ---------------  ---------  --------  -------------  -------
>>>            34282.00   30635.90  28831.30       20969.20  3841.30 base
>>>            34282.00   30973.00  29345.80       21025.20  3964.10 v2.07
>>>            34282.00   31105.60  29174.30       21074.00  3931.30 v2.07_handshake
>>>            34282.00   30789.70  27151.60       19839.10  3850.20 v2.07_off
>>>
>>>     - The Aurora Perf comparison tool reports:
>>>
>>>         Comparison              max-jOPS              critical-jOPS
>>>         ----------------------  --------------------  --------------------
>>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>>>
>>>         (s) - significant  (ns) - not-significant
>>>
>>>     - For historical comparison, the Aurora Perf comparison tool
>>>         reported for v2.06 with a baseline of jdk-13+31:
>>>
>>>         Comparison              max-jOPS              critical-jOPS
>>>         ----------------------  --------------------  --------------------
>>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>>>
>>>         (s) - significant  (ns) - not-significant
>>>
>>> Thanks, in advance, for any questions, comments or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> The Async Monitor Deflation project has rebased to JDK14 so it's time
>>>> for our first code review in that new context!!
>>>>
>>>> I've been focused on changing the monitor list management code to be
>>>> lock-free in order to make SPECjbb2015 happier. Of course with a change
>>>> like that, it takes a while to chase down all the new and wonderful
>>>> races. At this point, I have the code back to the same stability that
>>>> I had with CR5/v2.05/8-for-jdk13.
>>>>
>>>> To lay the ground work for this round of review, I pushed the following
>>>> two fixes to jdk/jdk earlier today:
>>>>
>>>>     JDK-8230184 rename, whitespace, indent and comments changes in
>>>>                 preparation for lock free Monitor lists
>>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>>
>>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails after 8230184
>>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>>
>>>> I have attached the list of fixes from CR5 to CR6 instead of putting
>>>> it in the main body of this email.
>>>>
>>>> Main bug URL:
>>>>
>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>
>>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>>> JDK-8230184 and JDK-8230317.
>>>>
>>>> Here's the full webrev URL for those folks that want to see all of the
>>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/ 
>>>>
>>>>
>>>>
>>>> The primary focus of this review cycle is on the lock-free Monitor List
>>>> management changes so here's a webrev for just that patch (v2.06c):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/ 
>>>>
>>>>
>>>> The secondary focus of this review cycle is on the bug fixes that have
>>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for just that
>>>> patch (v2.06b):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/ 
>>>>
>>>>
>>>> The third and final bucket for this review cycle is the rename, 
>>>> whitespace,
>>>> indent and comments changes made in preparation for lock free 
>>>> Monitor list
>>>> management. Almost all of that was extracted into JDK-8230184 for the
>>>> baseline so this bucket now has just a few comment changes relative to
>>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/ 
>>>>
>>>>
>>>>
>>>> Some folks might want to see just what has changed since the last 
>>>> review
>>>> cycle so here's a webrev for that (v2.06 inc):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/ 
>>>>
>>>>
>>>>
>>>> Last, but not least, some folks might want to see the code before the
>>>> addition of lock-free Monitor List management so here's a webrev for
>>>> that (v2.00 -> v2.05):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/ 
>>>>
>>>>
>>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>>
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>
>>>> but that should only be changes to describe per-thread list async
>>>> monitor deflation being done by the ServiceThread.
>>>>
>>>> (I did update the OpenJDK wiki for the CR5 changes back on 2019.08.14)
>>>>
>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>> Oracle's usual set of platforms. It has also been through my usual set
>>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>>
>>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs. This was using
>>>> this patch baselined on jdk-13+31 (for stability):
>>>>
>>>>            hbIR           hbIR
>>>>      (max attempted)  (settled)  max-jOPS  critical-jOPS runtime
>>>>      ---------------  ---------  --------  ------------- -------
>>>>             34282.00   28837.20  27905.20       19817.40 3658.10 base
>>>>             34965.70   29798.80  27814.90       19959.00 3514.60 v2.06d
>>>>             34282.00   29100.70  28042.50       19577.00 3701.90 v2.06d_off
>>>>             34282.00   29218.50  27562.80       19397.30 3657.60 v2.06d_ocache
>>>>             34965.70   29838.30  26512.40       19170.60 3569.90 v2.05
>>>>             34282.00   28926.10  27734.00       19835.10 3588.40 v2.05_off
>>>>
>>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors specified and
>>>> the "ocache" config is with 128 byte cache line sizes instead of 64
>>>> byte cache line sizes. "v2.06d" is the last set of changes that I made
>>>> before those changes were distributed into the "v2.06a", "v2.06b" and
>>>> "v2.06c" buckets for this review cycle.
>>>>
>>>>
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I've been focused on chasing down and fixing the test failures
>>>>> that only pop up rarely. So this round is primarily fixes for races
>>>>> with a few additional fixes that came from Karen's review of CR4.
>>>>> Thanks Karen!
>>>>>
>>>>> I have attached the list of fixes from CR4 to CR5 instead of putting
>>>>> it in the main body of this email.
>>>>>
>>>>> Main bug URL:
>>>>>
>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> The project is currently baselined on jdk-13+29. This will likely be
>>>>> the last JDK13 baseline for this project and I'll roll to the JDK14
>>>>> (jdk/jdk) repo soon...
>>>>>
>>>>> Here's the full webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/
>>>>>
>>>>> Here's the incremental webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>>
>>>>> I have not yet checked the OpenJDK wiki to see if it needs any updates
>>>>> to match the CR5 changes:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> (I did update the OpenJDK wiki for the CR4 changes back on 2019.06.26)
>>>>>
>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become available.
>>>>> Since I haven't made any performance changes in this round, I'll only
>>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>>
>>>>> Next up:
>>>>>
>>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>>   Linux-X64 and we've determined that some of that comes from
>>>>>   contention on the gListLock. So I'm going to investigate removing
>>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>>> - Of course, going lock free often causes new races and new failures
>>>>>   so that's a good reason to make those changes isolated in their
>>>>>   own round (and not holding up CR5/v2.05/8-for-jdk13 anymore).
>>>>> - I finally have a potential fix for the Win* failure with
>>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>   but I haven't run it through Mach5 yet so it'll be in the next
>>>>>   round.
>>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>>>   monitor related failures there. I suspect that I need to go take a
>>>>>   look at the C2 RTM macro assembler code and look for things that
>>>>>   might conflict with Async Monitor Deflation. If you're interested in
>>>>>   that kind of issue, then see the macroAssembler_x86.cpp sanity check
>>>>>   that I added in this round!
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I have a fix for an issue that came up during performance testing.
>>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>>> experiments.
>>>>>>
>>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>>> verbose due to the complexity of the issue, but the changes
>>>>>> themselves are not that big.
>>>>>>
>>>>>> Functional:
>>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>>         safepoints that simply request async deflation which
>>>>>>         keeps the async JavaThreads from making progress on
>>>>>>         their async deflation work.
>>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>>     - is_async_deflation_needed() returns true when
>>>>>>       is_async_cleanup_requested() is true or when
>>>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>>>       AsyncDeflationInterval).
>>>>>>     - if AsyncDeflateIdleMonitors Service_lock->wait() now waits for
>>>>>>       at most GuaranteedSafepointInterval millis:
>>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>>         (default is 1000 millis/1 second)
>>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>>>         This is to prevent async deflation from swamping the
>>>>>>         ServiceThread.
>>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>>>     of their in-use lists.
>>>>>>     - The ServiceThread will check for async deflation work every
>>>>>>       GuaranteedSafepointInterval.
>>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>>       async deflation work via is_async_deflation_requested.
>>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>>     flag to reduce the in-use monitor population that is reported by
>>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
>>>>>>
>>>>>> Test update:
>>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>>>     AsyncDeflateIdleMonitors.
>>>>>>
>>>>>> Collateral:
>>>>>>   - Add/clarify/update some logging messages.
>>>>>>
>>>>>> Cleanup:
>>>>>>   - Updated comments based on Karen's code review.
>>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>>     'async cleanup' -> 'async deflation'.
>>>>>>     - comment and function name changes
>>>>>>   - Clarify MonitorUsedDeflationThreshold description.
>>>>>>
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-13+22.
>>>>>>
>>>>>> Here's the full webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>>>>>
>>>>>> Here's the incremental webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>>>>>
>>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>
>>>>>> The wiki doesn't say a whole lot about the async deflation invocation
>>>>>> mechanism so I have to figure out how to add that content.
>>>>>>
>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>>> running now. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have to run
>>>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I had some discussions with Karen about a race that was in the
>>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>>>>>>> theoretical and I had no test failures due to it. The fix is pretty
>>>>>>> simple: remove the special case code for async deflation in the
>>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>>> for ObjectMonitor::enter() protection.
>>>>>>>
>>>>>>> During those discussions Karen also floated the idea of using the
>>>>>>> ref_count field instead of the contentions field for the Async
>>>>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>>>>> change and I have run it through the usual stress and Mach5 testing
>>>>>>> with no issues. It's also known as v2.03 (for those with the
>>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>>> Sorry for all the names...
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>>
>>>>>>> Here's the full webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>>>>>>
>>>>>>> Here's the incremental webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>>>>>
>>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>>> Solaris-X64 release had the usual "Too large time diff" complaints.
>>>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>>> stress kit is running right now.
>>>>>>>
>>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>>>>> the results and analyze them.
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I have a small but important bug fix for the Async Monitor
>>>>>>>> Deflation project ready to go. It's also known as v2.02 (for those
>>>>>>>> with the patches) and as webrev/5-for-jdk13 (for those with webrev
>>>>>>>> URLs). Sorry for all the names...
>>>>>>>>
>>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline
>>>>>>>> patch is out of our hair.
>>>>>>>>
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>>
>>>>>>>> Here's the full webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/
>>>>>>>>
>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>>>>>>
>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is
>>>>>>>> running now on product, fastdebug, and slowdebug bits on Linux-X64,
>>>>>>>> MacOSX and Solaris-X64. 12 hour Inflate2 runs are running now on
>>>>>>>> product, fastdebug and slowdebug bits on Linux-X64, MacOSX and
>>>>>>>> Solaris-X64. I'll start my stress kit on Linux-X64 sometime on
>>>>>>>> Sunday (after my jdk-13+18 stress run is done).
>>>>>>>>
>>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>>> testing is done.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I finally have CR1 for the Async Monitor Deflation project ready
>>>>>>>>> to go. It's also known as v2.01 (for those with the patches) and
>>>>>>>>> as webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all
>>>>>>>>> the names...
>>>>>>>>>
>>>>>>>>> Main bug URL:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> Baseline bug fixes URL:
>>>>>>>>>
>>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>>
>>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>>
>>>>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/
>>>>>>>>>
>>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the latest
>>>>>>>>> version of JDK-8153224...
>>>>>>>>>
>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR changes:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>>>>>> Mach5 tier[78] will be run later today. My stress kit on Solaris-X64
>>>>>>>>> is running now. Linux-X64 stress testing will start on Sunday. I'm
>>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's 
>>>>>>>>>> work on:
>>>>>>>>>>
>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>
>>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>>
>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here's the webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>>>>
>>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>>
>>>>>>>>>> Earlier versions of this patch have been through several rounds of
>>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>>>>> their own environments (including specJBB2015).
>>>>>>>>>>
>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run
>>>>>>>>>> Kitchensink for 12 hours on MacOSX, Linux-X64 and Solaris-X64
>>>>>>>>>> (product, fastdebug and slowdebug). Earlier versions have run my
>>>>>>>>>> monitor inflation stress tests for 12 hours on MacOSX, Linux-X64
>>>>>>>>>> and Solaris-X64 (product, fastdebug and slowdebug).
>>>>>>>>>>
>>>>>>>>>> All of the testing done on earlier versions will be redone on the
>>>>>>>>>> latest version of the patch.
>>>>>>>>>>
>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>> P.S.
>>>>>>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>>>>>>> to characterize/analyze this failure for more than a week now. At
>>>>>>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

From david.holmes at oracle.com  Mon Nov 11 10:56:21 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 11 Nov 2019 20:56:21 +1000
Subject: RFR(S): 8233113: ARM32: assert on UnsafeJlong mutex rank check
In-Reply-To: <5124def3-3bf7-8425-557d-c6cba6192927@bell-sw.com>
References: <5124def3-3bf7-8425-557d-c6cba6192927@bell-sw.com>
Message-ID: <df1adebe-31d5-8cc2-d8c7-a22e2397c7b9@oracle.com>

Hi Boris,

This seems fine to me.

Thanks,
David

On 8/11/2019 11:28 pm, Boris Ulasevich wrote:
> Hi,
> 
> Recent JDK-8184732 change adds the assertion that fires on UnsafeJlong 
> mutex rank check, on platforms without 64 bit atomics 
> compare-and-exchange support. On preliminary review (thanks to Coleen 
> and David!) it is suggested to remove the assertion and corresponding 
> test codes.
> 
> http://bugs.openjdk.java.net/browse/JDK-8233113
> http://cr.openjdk.java.net/~bulasevich/8233113/webrev.01
> 
> Thanks,
> Boris

From coleen.phillimore at oracle.com  Mon Nov 11 12:31:12 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Mon, 11 Nov 2019 07:31:12 -0500
Subject: RFR (XS) 8232735: Convert PrintJNIResolving to Unified Logging
In-Reply-To: <4d6facef-3971-5325-32f5-c045015d40c1@oracle.com>
References: <52dd271d-07ec-5e1c-9b9b-6966935f4b9f@oracle.com>
 <4d6facef-3971-5325-32f5-c045015d40c1@oracle.com>
Message-ID: <d7209926-1ed9-5102-e226-7eed91895862@oracle.com>


On 11/11/19 5:01 AM, David Holmes wrote:
> Hi Coleen,
>
> On 9/11/2019 9:20 am, coleen.phillimore at oracle.com wrote:
>> Summary: converted the existing output at debug level because it is 
>> noisy
>>
>> Tested with tier1 on all Oracle platforms, with os's linux, bsd, 
>> solaris and windows.
>>
>> open webrev at 
>> http://cr.openjdk.java.net/~coleenp/2019/8232735.01/webrev
>> bug link https://bugs.openjdk.java.net/browse/JDK-8232735
>
> Looks good to me.
>
> Possible missing includes of the logging headers:
> - src/hotspot/share/jvmci/jvmciCompilerToVM.cpp
> - src/hotspot/share/oops/method.cpp

They must be transitively included, because I compile without 
precompiled headers, but I'll add them. Thank you for finding this.
Coleen
>
> Thanks,
> David
>
>> Thanks,
>> Coleen


From coleen.phillimore at oracle.com  Mon Nov 11 12:32:38 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Mon, 11 Nov 2019 07:32:38 -0500
Subject: RFR (XS) 8232735: Convert PrintJNIResolving to Unified Logging
In-Reply-To: <def1f5b2-f66b-7574-8f2f-0edcaa96b4aa@oracle.com>
References: <52dd271d-07ec-5e1c-9b9b-6966935f4b9f@oracle.com>
 <def1f5b2-f66b-7574-8f2f-0edcaa96b4aa@oracle.com>
Message-ID: <56ae7a59-b525-a51f-a0a0-7ab9b4612551@oracle.com>

Thanks, Ioi!
Coleen

On 11/8/19 9:23 PM, Ioi Lam wrote:
> Hi Coleen,
>
> Looks good to me.
>
> Thanks
> - Ioi
>
> On 11/8/19 3:20 PM, coleen.phillimore at oracle.com wrote:
>> Summary: converted the existing output at debug level because it is 
>> noisy
>>
>> Tested with tier1 on all Oracle platforms, with os's linux, bsd, 
>> solaris and windows.
>>
>> open webrev at 
>> http://cr.openjdk.java.net/~coleenp/2019/8232735.01/webrev
>> bug link https://bugs.openjdk.java.net/browse/JDK-8232735
>>
>> Thanks,
>> Coleen
>


From robbin.ehn at oracle.com  Mon Nov 11 12:41:17 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 11 Nov 2019 13:41:17 +0100
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <590a7395-8fda-caf3-402a-29393fd34469@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
 <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
 <590a7395-8fda-caf3-402a-29393fd34469@oracle.com>
Message-ID: <7dedd65e-f004-b0b4-0c04-63bc484198ac@oracle.com>

Hi David,

On 2019-11-11 11:52, David Holmes wrote:
> Hi Robbin,
> 
> Can you clarify your comments regarding use of Atomic::load and Atomic::store 
> please. Seems to me you are suggesting using those for some memory ordering 
> effect not for any atomicity effect per-se.

No I'm not. I'm talking about atomicity as in no word-tearing, single store/load.

> 
> For example you say:
> 
>  > 242   jint l_ref_count = ref_count();
>  > 243   ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d, ref_count=%d", l_ref_count, ref_count());
>  > Please use Atomic::load() in ref_count.
> 
> But it seems to me that to solve the problem of the compiler not reissuing the 
> load of ref_count, you should be using OrderAccess::loadload() in the 
> ref_count() method.
> 
> Or are you simply saying that given:
> 
> jint l1 = ref_count();
> jint l2 = ref_count();
> 
> where ref_count simply does "return _ref_count;"
> 
> the compiler could treat the above as:
> 
> jint l1 = ref_count();
> jint l2 = l1;
> 
> whereas if we have ref_count defined as "return Atomic_load(&_ref_count);" then 
> the compiler cannot do that?

Yes. (not considering _ref_count is volatile)

> 
> I don't like seeing Atomic::load/store being used just to trick the compiler 
> that way. I thought we already relied on use of volatile to disallow such 
> optimisations and that this was the accepted way to do it.

What would the reason for using Atomic::load/store be if not to guarantee an 
atomic load/store ?

Yes, we use volatile for that.
The problem with using volatile is that it also affects ordering.
And you never want to rely on that ordering (the compiler does not reorder 
volatile accesses, but the CPU might...).

By using either Atomic::load/store or OrderAccess::release_store/load_acquire 
(or stronger), you get the semantics that are appropriate.
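
To make the distinction concrete, here is a minimal standard C++ sketch,
not HotSpot code. It assumes that std::memory_order_relaxed is a rough
analogue of Atomic::load/store (one untorn access, no ordering) and that
release/acquire is a rough analogue of
OrderAccess::release_store/load_acquire. Channel, publish, try_consume
and published_count are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical example type; not part of HotSpot.
struct Channel {
  int payload = 0;                 // plain data, published via 'ready'
  std::atomic<bool> ready{false};  // publication flag, needs ordering
  std::atomic<int> published{0};   // statistic, needs atomicity only
};

// Writer side: the release store keeps the payload write from being
// reordered after the flag write (analogue of release_store).
void publish(Channel& c, int value) {
  c.payload = value;
  c.ready.store(true, std::memory_order_release);
  // Only atomicity is needed for the counter, so relaxed is enough
  // (analogue of a plain Atomic access without ordering).
  c.published.fetch_add(1, std::memory_order_relaxed);
}

// Reader side: the acquire load keeps the payload read from being
// reordered before the flag read (analogue of load_acquire).
bool try_consume(const Channel& c, int* out) {
  if (!c.ready.load(std::memory_order_acquire)) {
    return false;  // nothing published yet
  }
  *out = c.payload;
  return true;
}

// A single untorn load; says nothing about ordering with other fields.
int published_count(const Channel& c) {
  return c.published.load(std::memory_order_relaxed);
}
```

In real use publish() and try_consume() would run on different threads;
the point of the sketch is only that each access states the semantics it
needs instead of inheriting whatever volatile happens to give.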

Also in this patch there is already Atomic::store/load on "volatile markWord 
_header;".

Arguably the above should be written as:
jint l_ref_count = ref_count(); // Atomic::load()
if (l_ref_count > 0) {
	OrderAccess::loadload();
	ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d, ref_count=%d",
	               l_ref_count, ref_count());
}

But since _ref_count could have been changed many times before the second load I 
didn't see the point of printing the same value again.
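
The single-load variant could look roughly like this in standard C++.
This is only a sketch under assumptions: MonitorSketch and
check_ref_count are hypothetical names, and a relaxed std::atomic load
stands in for Atomic::load(). The count is read once into a local, and
both the check and the message use that one value.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical stand-in for an object with a reference count.
struct MonitorSketch {
  std::atomic<int> _ref_count{0};

  // One untorn load; the compiler may not merge it with another load.
  int ref_count() const {
    return _ref_count.load(std::memory_order_relaxed);
  }
  void inc_ref_count() { _ref_count.fetch_add(1, std::memory_order_relaxed); }
  void dec_ref_count() { _ref_count.fetch_sub(1, std::memory_order_relaxed); }
};

// Load the count exactly once and report that same value. A second
// load could observe a later value that no longer matches the check.
std::string check_ref_count(const MonitorSketch& m) {
  const int l_ref_count = m.ref_count();
  if (l_ref_count > 0) {
    return "ok: l_ref_count=" + std::to_string(l_ref_count);
  }
  return "must be positive: l_ref_count=" + std::to_string(l_ref_count);
}
```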

Now there are a zillion places where we use volatile instead of Atomic::load/store.
Those cases have either too strong or too weak ordering.

Thanks, Robbin

> 
> Thanks,
> David
> 
> 
> On 8/11/2019 11:35 pm, Robbin Ehn wrote:
>> Hi Dan,
>>
>> Thanks for looking into this, some comments on v8:
>>
>> ##################
>> src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
>> src/hotspot/cpu/x86/globalDefinitions_x86.hpp
>> src/hotspot/share/logging/logTag.hpp
>> src/hotspot/share/oops/markWord.hpp
>> src/hotspot/share/runtime/basicLock.cpp
>> src/hotspot/share/runtime/safepoint.cpp
>> src/hotspot/share/runtime/serviceThread.cpp
>> src/hotspot/share/runtime/sharedRuntime.cpp
>> src/hotspot/share/runtime/synchronizer.hpp
>> src/hotspot/share/runtime/vmOperations.cpp
>> src/hotspot/share/runtime/vmOperations.hpp
>> src/hotspot/share/runtime/vmStructs.cpp
>> src/hotspot/share/runtime/vmThread.cpp
>> test/hotspot/gtest/oops/test_markWord.cpp
>>
>> No comments.
>>
>> ##################
>> I don't see the benefit of having the -HandshakeAfterDeflateIdleMonitors code 
>> paths.
>> Removing that option would mean these files can be reverted:
>> src/hotspot/cpu/aarch64/globals_aarch64.hpp
>> src/hotspot/cpu/arm/globals_arm.hpp
>> src/hotspot/cpu/ppc/globals_ppc.hpp
>> src/hotspot/cpu/s390/globals_s390.hpp
>> src/hotspot/cpu/sparc/globals_sparc.hpp
>> src/hotspot/cpu/x86/globals_x86.hpp
>> src/hotspot/cpu/x86/macroAssembler_x86.cpp
>> src/hotspot/cpu/x86/macroAssembler_x86.hpp
>> src/hotspot/cpu/zero/globals_zero.hpp
>>
>> And one less option here:
>> src/hotspot/share/runtime/globals.hpp
>>
>> ##################
>> src/hotspot/share/prims/jvm.cpp
>>
>> Unclear if this is a good idea.
>>
>> ##################
>> src/hotspot/share/prims/whitebox.cpp
>>
>> This would assume the test expects the right thing, but that is not obvious.
>>
>> ##################
>> src/hotspot/share/prims/jvmtiEnvBase.cpp
>>
>> The current pending and waiting monitor is only changed by the JavaThread itself.
>> It only sets it after _contentions is increased.
>> It clears it before _contentions is decreased.
>> We are depending on a safepoint or the thread being suspended, so it can't be 
>> deflated since _contentions is > 0.
>> Plus the thread has already increased the ref count and can't decrease it 
>> (since it is at a safepoint or suspended).
>>
>> ##################
>> src/hotspot/share/runtime/objectMonitor.cpp
>>
>> ###1
>> You have several of these (and in other files):
>> 242   jint l_ref_count = ref_count();
>> 243   ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d, ref_count=%d", l_ref_count, ref_count());
>> Please use Atomic::load() in ref_count.
>> This depends on ref_count being volatile; otherwise the compiler 
>> may only do one load.
>>
>> ###2
>> 307?? // Prevent deflation. See ObjectSynchronizer::deflate_monitor(),
>> ...
>> 311?? Atomic::add(1, &_contentions);
>> In ObjectSynchronizer::deflate_monitor, if you checked the ref count instead of 
>> _contentions, we could remove _contentions.
>> Since all waiters also have a ref count it looks like we don't need _waiters 
>> either.
>> In ObjectSynchronizer::deflate_monitor:
>> if (mid->_contentions != 0 || mid->_waiters != 0) {
>> Why not just do:
>> if (mid->ref_count()) {
>> ?
>>
>> ##################
>> src/hotspot/share/runtime/objectMonitor.hpp
>>
>> ###1
>>   252   intptr_t is_busy() const {
>>   253     // TODO-FIXME: assert _owner == null implies _recursions = 0
>>   254     // We do not include _ref_count in the is_busy() check because
>>   255     // _ref_count is for indicating that the ObjectMonitor* is in
>>   256     // use which is orthogonal to whether the ObjectMonitor itself
>>   257     // is in use for a locking operation.
>>
>> But in the non-debug code we always check:
>> +  if (mid->is_busy() || mid->ref_count() != 0) {
>>
>> So it seems like you should have a method that includes the ref count.
>>
>> ##################
>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>
>> Use Atomic::load for ref count.
>>
>> ##################
>> src/hotspot/share/runtime/synchronizer.cpp
>>
>> ###1
>>   139 static volatile int g_om_free_count = 0;    // # on g_free_list
>>   140 static volatile int g_om_in_use_count = 0;  // # on g_om_in_use_list
>>   141 static volatile int g_om_population = 0;    // # Extant -- in circulation
>>   142 static volatile int g_om_wait_count = 0;    // # on g_wait_list
>> No padding here, aren't they more contended than the fields in the OM?
>>
>> ###2
>> 151 static bool is_next_marked(ObjectMonitor* om) {
>>
>> Is only used in ObjectSynchronizer::om_flush.
>> Here you fetch an OM and read the next field; this does not need LA semantics on 
>> supported platforms.
>> This would only need Atomic::load.
>>
>> ###3
>> 191 static void set_next(ObjectMonitor* om, ObjectMonitor* value) {
>>
>> In no place do you need SR; the only places where it would make a difference are:
>>   345       OrderAccess::storestore();
>>   346       set_next(cur, next);  // Unmark the previous list head.
>> and
>> 1714     OrderAccess::storestore();
>> 1715     set_next(in_use_list, next);
>>
>> You have a storestore already!
>>
>> This code reads as:
>> OrderAccess::storestore();
>> OrderAccess::loadstore();
>> OrderAccess::storestore();
>> om->_next_om = value
>>
>> So it should be an Atomic::store.
>>
>> ###4
>> 198 static bool mark_list_head(ObjectMonitor* volatile * list_p
>>
>> Since the mark is an embedded spinlock I think the terminology should be 
>> changed. (That the spinlock is inside the next pointer should be abstracted 
>> away.)
>> E.g. mark_next_loop would just be lock.
>> The load of the list heads should use Atomic::load.
>> It also seems a bit weird to return next from the locking method.
>> The output parameter can just be returned, with NULL returned if the list head is NULL.
>> E.g.
>>
>> static ObjectMonitor* get_list_head_locked(ObjectMonitor* volatile * list_p) {
>>   while (true) {
>>     ObjectMonitor* mid = Atomic::load(list_p);
>>     if (mid == NULL) {
>>       return NULL;  // The list is empty.
>>     }
>>     if (try_lock(mid)) {
>>       if (Atomic::load(list_p) != mid) {
>>         // The list head changed so we have to retry.
>>         unlock(mid);
>>       } else {
>>         return mid;
>>       }
>>     }
>>     // Yield ?
>>   }
>> }
>>
>> With collateral changes.
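
For readers who want to experiment with the idea in ###4, here is a self-contained sketch using std::atomic. It assumes the lock is the low bit of the next field and that nodes are at least 2-byte aligned; `Monitor`, `kLockBit`, `next_of` and the memory orders chosen are illustrative assumptions, not the HotSpot code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical node: the next pointer and the lock bit share one word.
struct Monitor {
  std::atomic<std::uintptr_t> next_bits{0};
};

constexpr std::uintptr_t kLockBit = 1;  // assumption: low pointer bit is free

// Try to take the spinlock embedded in m's next field.
inline bool try_lock(Monitor* m) {
  std::uintptr_t cur = m->next_bits.load(std::memory_order_relaxed);
  if (cur & kLockBit) return false;  // already locked by someone else
  return m->next_bits.compare_exchange_strong(
      cur, cur | kLockBit,
      std::memory_order_acquire, std::memory_order_relaxed);
}

// Release the embedded spinlock, leaving the next pointer untouched.
inline void unlock(Monitor* m) {
  m->next_bits.fetch_and(~kLockBit, std::memory_order_release);
}

// Read the next pointer with the lock bit masked off.
inline Monitor* next_of(const Monitor* m) {
  return reinterpret_cast<Monitor*>(
      m->next_bits.load(std::memory_order_relaxed) & ~kLockBit);
}

// Returns the list head with its lock held, or nullptr if the list is empty.
inline Monitor* get_list_head_locked(std::atomic<Monitor*>* list_p) {
  while (true) {
    Monitor* mid = list_p->load(std::memory_order_acquire);
    if (mid == nullptr) return nullptr;  // The list is empty.
    if (try_lock(mid)) {
      if (list_p->load(std::memory_order_acquire) == mid) return mid;
      unlock(mid);  // The list head changed, so retry.
    }
    std::this_thread::yield();
  }
}
```

Callers get either a locked head or nullptr, so no output parameter is needed; the next pointer is fetched separately through `next_of`.
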
>>
>> ###5
>> 220 static ObjectMonitor* unmarked_next(ObjectMonitor* om)
>> Atomic::store is what is needed.
>>
>> ###6
>> 333 static void prepend_to_common(
>>
>>   345       OrderAccess::storestore();
>>   346       set_next(cur, next);  // Unmark the previous list head.
>> Double storestore. (fixed by changing set_next to Atomic::store)
>>
>> ###7
>>   375 static ObjectMonitor* take_from_start_of_common(ObjectMonitor* volatile * list_p,
>>
>> Triple storestore here.
>>
>>   386   Atomic::dec(count_p);
>>   387   // mark_list_head() used cmpxchg() above, switching list head can be lazier:
>>   388   OrderAccess::storestore();
>>   389   // Unmark take, but leave the next value for any lagging list
>>   390   // walkers. It will get cleaned up when take is prepended to
>>   391   // the in-use list:
>>   392   set_next(take, next);
>>   393   return take;
>>
>> Reads:
>> count_p--
>> OrderAccess::loadstore();
>> OrderAccess::storestore();
>> OrderAccess::storestore();
>> OrderAccess::loadstore();
>> OrderAccess::storestore();
>> take->_next_om = next;
>>
>> Fixed by changing set_next to Atomic::store and removing the 
>> OrderAccess::storestore();
>>
>> ###8
>> ObjectSynchronizer::om_release(
>>
>> 1591       if (m == mid) {
>> 1592         // We found 'm' on the per-thread in-use list so try to extract it.
>> 1593         if (cur_mid_in_use == NULL) {
>> 1594           // mid is the list head and it is marked. Switch the list head
>> 1595           // to next which unmarks the list head, but leaves mid marked:
>> 1596           self->om_in_use_list = next;
>> 1597           // mark_list_head() used cmpxchg() above, switching list head can be lazier:
>> 1598           OrderAccess::storestore();
>> 1599         } else {
>> 1600           // mid and cur_mid_in_use are marked. Switch cur_mid_in_use's
>> 1601           // next field to next which unmarks cur_mid_in_use, but leaves
>> 1602           // mid marked:
>> 1603           OrderAccess::release_store(&cur_mid_in_use->_next_om, next);
>> 1604         }
>> 1605         extracted = true;
>> 1606         Atomic::dec(&self->om_in_use_count);
>> 1607         // Unmark mid, but leave the next value for any lagging list
>> 1608         // walkers. It will get cleaned up when mid is prepended to
>> 1609         // the thread's free list:
>> 1610         set_next(mid, next);
>> 1611         break;
>> 1612       }
>>
>> This does not look correct. Before taking this branch we have done a cmpxchg 
>> in mark_list_head or mark_next_loop.
>> This is how it reads:
>> OrderAccess::storestore(); // from previous cmpxchg
>> OrderAccess::loadstore(); // from previous cmpxchg
>> 1591       if (m == mid) {
>> 1593         if (cur_mid_in_use == NULL) {
>> 1596           self->om_in_use_list = next;
>> 1598           OrderAccess::storestore();
>> 1599         } else {
>>                OrderAccess::storestore();
>>                OrderAccess::loadstore();
>> 1603           cur_mid_in_use->_next_om = next;
>> 1604         }
>> 1605         extracted = true;
>>              OrderAccess::storestore();
>>              OrderAccess::fence(); // storestore|storeload|loadstore|loadload
>>              self->om_in_use_count--; // Atomic::dec
>>              OrderAccess::storestore();
>>              OrderAccess::loadstore();
>>              OrderAccess::storestore();
>>              OrderAccess::loadstore();
>>              mid->_next_om = next; // Atomic::store
>> 1611         break;
>> 1612       }
>>
>> extracted is a local variable, so you do not need any OrderAccess before it is set.
>> Fixed by changing set_next to Atomic::store, removing the
>> OrderAccess::storestore() and changing OrderAccess::release_store to
>> Atomic::store().
>>
>> ###9
>> 1653 void ObjectSynchronizer::om_flush(Thread* self) {
>>
>> 1714     OrderAccess::storestore();
>> 1715     set_next(in_use_list, next);
>> Fixed by changing set_next to Atomic::store.
>>
>> ###10
>> 1737     self->om_free_list = NULL;
>> 1738     OrderAccess::storestore();  // Lazier memory is okay for list walkers.
>>
>> prepend_list_to_g_free_list/prepend_list_to_g_om_in_use_list does a cmpxchg
>> first thing, so there is no need for this storestore.
>>
>> ###11
>> 1797 void ObjectSynchronizer::inflate(ObjectMonitorHandle* omh_p, Thread* self,
>>
>> 1938       // Once ObjectMonitor is configured and the object is associated
>> 1939       // with the ObjectMonitor, it is safe to allow async deflation:
>> 1940       assert(m->is_new(), "freshly allocated monitor must be new");
>> 1941       m->set_allocation_state(ObjectMonitor::Old);
>>
>> So we use ref count, contention, waiter, owner and allocation state to keep the OM
>> alive in different scenarios.
>> There is no way for me to keep track of that. I don't see why you would need
>> more than owner and ref count.
>> If you allocate the OM with ref count 1 you can remove _allocation_state and
>> just decrease the ref count here instead.
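
The suggestion in ###11 could look roughly like the sketch below. It is an illustration only: `MonitorSketch`, `finish_inflation` and `is_deflation_candidate` are hypothetical names, and the real deflation protocol checks more than these two fields.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an ObjectMonitor; only the fields needed here.
struct MonitorSketch {
  std::atomic<int> ref_count{1};      // starts at 1: the creator's reference
  std::atomic<void*> owner{nullptr};  // non-null while some thread owns it

  // Called once the monitor is configured and associated with its object.
  // Dropping the creator's reference takes the place of flipping an
  // allocation state from New to Old.
  void finish_inflation() {
    int prev = ref_count.fetch_sub(1, std::memory_order_release);
    assert(prev > 0);
    (void)prev;  // silence unused-variable warnings when asserts are off
  }

  // A monitor can be considered for deflation only when nobody references
  // it and nobody owns it.
  bool is_deflation_candidate() const {
    return ref_count.load(std::memory_order_acquire) == 0 &&
           owner.load(std::memory_order_acquire) == nullptr;
  }
};
```

With this shape a freshly allocated monitor is protected by the same mechanism that protects a monitor in use, so there is one less state to reason about.
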
>>
>> ###12
>> 2079 bool ObjectSynchronizer::deflate_monitor
>>
>> 2112     if (AsyncDeflateIdleMonitors) {
>> 2113       // clear() expects the owner field to be NULL and we won't race
>> 2114       // with the simple C2 ObjectMonitor
>>
>> The macro assembler code is not just executed by C2, so this comment is a bit
>> misleading. (There are a few more like it.)
>>
>> ###13
>> 2306 int ObjectSynchronizer::deflate_monitor_list(
>>
>> Same issue as ObjectSynchronizer::om_release.
>> Fixed by changing set_next to Atomic::store, removing the
>> OrderAccess::storestore() and changing OrderAccess::release_store to
>> Atomic::store().
>>
>> ###14
>> 2474       if (SafepointSynchronize::is_synchronizing() &&
>>
>> This is the wrong method to call; it should be
>> SafepointMechanism::should_block(Thread* thread);
>>
>> ###15
>> 2578 void ObjectSynchronizer::deflate_idle_monitors_using_JT() {
>>
>> 2616     g_wait_list = NULL;
>> 2617     OrderAccess::storestore();  // Lazier memory sync is okay for list walkers.
>>
>> I don't see that g_wait_list is ever read simultaneously.
>> Either it is accessed by the ServiceThread outside a safepoint or by the VMThread
>> inside a safepoint?
>>
>> It looks like g_wait_list can just be a local in:
>> void ObjectSynchronizer::deflate_idle_monitors_using_JT()
>>
>> (disregarding the debug code that might read it in a safepoint)
>>
>> ###16
>> 2722         assert(SafepointSynchronize::is_synchronizing(), "sanity check");
>>
>> This is the wrong method to call; it should be
>> SafepointMechanism::should_block(Thread* thread);
>>
>> ##################
>> src/hotspot/share/runtime/vframe.cpp
>>
>> We are at a safepoint, on the current thread, or in a handshake, so the current
>> pending and waiting monitors are already stable.
>>
>> ##################
>> src/hotspot/share/services/threadService.cpp
>>
>> These changes are only needed for the -HandshakeAfterDeflateIdleMonitors path.
>>
>> ##################
>> test/jdk/java/rmi/server/UnicastRemoteObject/unexportObject/UnexportLeak.java
>>
>> Note: if the OM had a weak reference to the object instead, this would not be needed.
>>
>> Thanks, Robbin
>>
>>
>> On 11/4/19 10:03 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I have made changes to the Async Monitor Deflation code in response to
>>> the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., Robbin
>>> and Erik O. for their comments!
>>>
>>> JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async Monitor
>>> Deflation project needs to push before Nov. 12, 2019 in order to allow
>>> for sufficient bake time for such a big change. Nov. 12 is _next_ Tuesday
>>> so we have 8 days from today to finish this code review cycle and push
>>> this code for JDK14.
>>>
>>> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>>>
>>> I have attached the change list from CR7 to CR8 instead of putting it in
>>> the body of this email. I've also added a link to the CR7-to-CR8-changes
>>> file to the webrevs so it should be easy to find.
>>>
>>> Main bug URL:
>>>
>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>
>>> The project is currently baselined on jdk-14+21.
>>>
>>> Here's the full webrev URL for those folks that want to see all of the
>>> current Async Monitor Deflation code in one go (v2.08 full):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full
>>>
>>> Some folks might want to see just what has changed since the last review
>>> cycle so here's a webrev for that (v2.08 inc):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/
>>>
>>> The OpenJDK wiki did not need any changes for this round:
>>>
>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>
>>> The jdk-14+21 based v2.08 version of the patch has been thru Mach5 tier[1-8]
>>> testing on Oracle's usual set of platforms. It has also been through my usual
>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
>>> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
>>> my lab. Some testing is still running, but so far there are no new regressions.
>>>
>>> I have not yet done a SPECjbb2015 round on the CR8/v2.08/11-for-jdk14 bits.
>>>
>>> Thanks, in advance, for any questions, comments or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> The Async Monitor Deflation project is reaching the end game. I have no
>>>> changes planned for the project at this time so all that is left is code
>>>> review and any changes that result from those reviews.
>>>>
>>>> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>>>>
>>>> I have attached the list of fixes from CR6 to CR7 instead of putting it
>>>> in the main body of this email.
>>>>
>>>> Main bug URL:
>>>>
>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>
>>>> The project is currently baselined on jdk-14+19.
>>>>
>>>> Here's the full webrev URL for those folks that want to see all of the
>>>> current Async Monitor Deflation code in one go (v2.07 full):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full
>>>>
>>>> Some folks might want to see just what has changed since the last review
>>>> cycle so here's a webrev for that (v2.07 inc):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/
>>>>
>>>> The OpenJDK wiki has been updated to match the CR7/v2.07/10-for-jdk14 changes:
>>>>
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>
>>>> The jdk-14+18 based v2.07 version of the patch has been thru Mach5 tier[1-8]
>>>> testing on Oracle's usual set of platforms. It has also been through my usual
>>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
>>>> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
>>>> my lab.
>>>>
>>>> The jdk-14+19 based v2.07 version of the patch has been thru Mach5 tier[1-3]
>>>> test on Oracle's usual set of platforms. Mach5 tier[4-8] are in process.
>>>>
>>>> I did another round of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>>>
>>>>     - "base" is jdk-14+18
>>>>     - "v2.07" is the latest version and includes C2 inc_om_ref_count() support
>>>>       on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors option
>>>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors specified
>>>>
>>>>          hbIR           hbIR
>>>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>>     ---------------  ---------  --------  -------------  -------
>>>>            34282.00   30635.90  28831.30       20969.20  3841.30 base
>>>>            34282.00   30973.00  29345.80       21025.20  3964.10 v2.07
>>>>            34282.00   31105.60  29174.30       21074.00  3931.30 v2.07_handshake
>>>>            34282.00   30789.70  27151.60       19839.10  3850.20 v2.07_off
>>>>
>>>>     - The Aurora Perf comparison tool reports:
>>>>
>>>>         Comparison              max-jOPS              critical-jOPS
>>>>         ----------------------  --------------------  --------------------
>>>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>>>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>>>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>>>>
>>>>         (s) - significant  (ns) - not-significant
>>>>
>>>>     - For historical comparison, the Aurora Perf comparison tool
>>>>         reported for v2.06 with a baseline of jdk-13+31:
>>>>
>>>>         Comparison              max-jOPS              critical-jOPS
>>>>         ----------------------  --------------------  --------------------
>>>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>>>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>>>>
>>>>         (s) - significant  (ns) - not-significant
>>>>
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> The Async Monitor Deflation project has rebased to JDK14 so it's time
>>>>> for our first code review in that new context!!
>>>>>
>>>>> I've been focused on changing the monitor list management code to be
>>>>> lock-free in order to make SPECjbb2015 happier. Of course with a change
>>>>> like that, it takes a while to chase down all the new and wonderful
>>>>> races. At this point, I have the code back to the same stability that
>>>>> I had with CR5/v2.05/8-for-jdk13.
>>>>>
>>>>> To lay the ground work for this round of review, I pushed the following
>>>>> two fixes to jdk/jdk earlier today:
>>>>>
>>>>>     JDK-8230184 rename, whitespace, indent and comments changes in preparation
>>>>>                 for lock free Monitor lists
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>>>
>>>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails after 8230184
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>>>
>>>>> I have attached the list of fixes from CR5 to CR6 instead of putting it
>>>>> in the main body of this email.
>>>>>
>>>>> Main bug URL:
>>>>>
>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>>>> JDK-8230184 and JDK-8230317.
>>>>>
>>>>> Here's the full webrev URL for those folks that want to see all of the
>>>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/
>>>>>
>>>>>
>>>>> The primary focus of this review cycle is on the lock-free Monitor List
>>>>> management changes so here's a webrev for just that patch (v2.06c):
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/
>>>>>
>>>>> The secondary focus of this review cycle is on the bug fixes that have
>>>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for just that
>>>>> patch (v2.06b):
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/
>>>>>
>>>>> The third and final bucket for this review cycle is the rename, whitespace,
>>>>> indent and comments changes made in preparation for lock free Monitor list
>>>>> management. Almost all of that was extracted into JDK-8230184 for the
>>>>> baseline so this bucket now has just a few comment changes relative to
>>>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/
>>>>>
>>>>>
>>>>> Some folks might want to see just what has changed since the last review
>>>>> cycle so here's a webrev for that (v2.06 inc):
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/
>>>>>
>>>>>
>>>>> Last, but not least, some folks might want to see the code before the
>>>>> addition of lock-free Monitor List management so here's a webrev for
>>>>> that (v2.00 -> v2.05):
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/
>>>>>
>>>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> but that should only be changes to describe per-thread list async monitor
>>>>> deflation being done by the ServiceThread.
>>>>>
>>>>> (I did update the OpenJDK wiki for the CR5 changes back on 2019.08.14)
>>>>>
>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>> Oracle's usual set of platforms. It has also been through my usual set
>>>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>>>
>>>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs. This was using
>>>>> this patch baselined on jdk-13+31 (for stability):
>>>>>
>>>>>           hbIR           hbIR
>>>>>      (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>>>      ---------------  ---------  --------  -------------  -------
>>>>>             34282.00   28837.20  27905.20       19817.40  3658.10 base
>>>>>             34965.70   29798.80  27814.90       19959.00  3514.60 v2.06d
>>>>>             34282.00   29100.70  28042.50       19577.00  3701.90 v2.06d_off
>>>>>             34282.00   29218.50  27562.80       19397.30  3657.60 v2.06d_ocache
>>>>>             34965.70   29838.30  26512.40       19170.60  3569.90 v2.05
>>>>>             34282.00   28926.10  27734.00       19835.10  3588.40 v2.05_off
>>>>>
>>>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors specified and
>>>>> the "ocache" config is with 128 byte cache line sizes instead of 64 byte
>>>>> cache line sizes. "v2.06d" is the last set of changes that I made before
>>>>> those changes were distributed into the "v2.06a", "v2.06b" and "v2.06c"
>>>>> buckets for this review cycle.
>>>>>
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I've been focused on chasing down and fixing the test failures
>>>>>> that pop up only rarely. So this round is primarily fixes for races
>>>>>> with a few additional fixes that came from Karen's review of CR4.
>>>>>> Thanks Karen!
>>>>>>
>>>>>> I have attached the list of fixes from CR4 to CR5 instead of putting it
>>>>>> in the main body of this email.
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-13+29. This will likely be
>>>>>> the last JDK13 baseline for this project and I'll roll to the JDK14
>>>>>> (jdk/jdk) repo soon...
>>>>>>
>>>>>> Here's the full webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/
>>>>>>
>>>>>> Here's the incremental webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>>>
>>>>>> I have not yet checked the OpenJDK wiki to see if it needs any updates
>>>>>> to match the CR5 changes:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>
>>>>>> (I did update the OpenJDK wiki for the CR4 changes back on 2019.06.26)
>>>>>>
>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become available.
>>>>>> Since I haven't made any performance changes in this round, I'll only
>>>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>>>
>>>>>> Next up:
>>>>>>
>>>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>>>   Linux-X64 and we've determined that some of that comes from
>>>>>>   contention on the gListLock. So I'm going to investigate removing
>>>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>>>> - Of course, going lock free often causes new races and new failures
>>>>>>   so that's a good reason to make those changes isolated in their
>>>>>>   own round (and not holding up CR5/v2.05/8-for-jdk13 anymore).
>>>>>> - I finally have a potential fix for the Win* failure with
>>>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>   but I haven't run it through Mach5 yet so it'll be in the next round.
>>>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>>>>   monitor related failures there. I suspect that I need to go take a
>>>>>>   look at the C2 RTM macro assembler code and look for things that might
>>>>>>   conflict with Async Monitor Deflation. If you're interested in that kind
>>>>>>   of issue, then see the macroAssembler_x86.cpp sanity check that I
>>>>>>   added in this round!
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I have a fix for an issue that came up during performance testing.
>>>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>>>> experiments.
>>>>>>>
>>>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>>>> verbose due to the complexity of the issue, but the changes
>>>>>>> themselves are not that big.
>>>>>>>
>>>>>>> Functional:
>>>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>>>         safepoints that simply request async deflation which
>>>>>>>         keeps the async JavaThreads from making progress on
>>>>>>>         their async deflation work.
>>>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>>>     - is_async_deflation_needed() returns true when
>>>>>>>       is_async_cleanup_requested() is true or when
>>>>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>>>>       AsyncDeflationInterval).
>>>>>>>     - if AsyncDeflateIdleMonitors Service_lock->wait() now waits for
>>>>>>>       at most GuaranteedSafepointInterval millis:
>>>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>>>         (default is 1000 millis/1 second)
>>>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>>>>         This is to prevent async deflation from swamping the
>>>>>>>         ServiceThread.
>>>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>>>>     of their in-use lists.
>>>>>>>     - The ServiceThread will check for async deflation work every
>>>>>>>       GuaranteedSafepointInterval.
>>>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>>>       async deflation work via is_async_deflation_requested.
>>>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>>>     flag to reduce the in-use monitor population that is reported by
>>>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
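
The interval gating described for is_async_deflation_needed() in the list above can be sketched as follows. This is a simplified model under stated assumptions: `DeflationGate`, its fields and the injected `now_ms` clock are hypothetical, and whether the real code starts the interval at startup or also restarts it on an explicit request is not stated in the email.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the "no more often than AsyncDeflationInterval" rule.
struct DeflationGate {
  std::int64_t interval_ms;      // stands in for AsyncDeflationInterval; 0 is off
  std::int64_t last_trigger_ms;  // time of the last threshold-based trigger

  DeflationGate(std::int64_t interval, std::int64_t start_ms)
      : interval_ms(interval), last_trigger_ms(start_ms) {}

  // requested:       an explicit async deflation request is pending
  // above_threshold: the monitor population exceeds the usage threshold
  // now_ms:          current time, injected so the logic is easy to test
  bool is_async_deflation_needed(bool requested, bool above_threshold,
                                 std::int64_t now_ms) {
    if (requested) {
      return true;  // explicit requests are never rate limited in this sketch
    }
    if (interval_ms > 0 && above_threshold &&
        now_ms - last_trigger_ms >= interval_ms) {
      last_trigger_ms = now_ms;  // start a new quiet period
      return true;
    }
    return false;
  }
};
```

A caller that polls once per GuaranteedSafepointInterval would then see at most one threshold-based trigger per AsyncDeflationInterval, which is the swamping protection the list describes.
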
>>>>>>>
>>>>>>> Test update:
>>>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>>>>     AsyncDeflateIdleMonitors.
>>>>>>>
>>>>>>> Collateral:
>>>>>>>   - Add/clarify/update some logging messages.
>>>>>>>
>>>>>>> Cleanup:
>>>>>>>   - Updated comments based on Karen's code review.
>>>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>>>     'async cleanup' -> 'async deflation'.
>>>>>>>     - comment and function name changes
>>>>>>>   - Clarify MonitorUsedDeflationThreshold description.
>>>>>>>
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-13+22.
>>>>>>>
>>>>>>> Here's the full webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>>>>>>
>>>>>>> Here's the incremental webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>>>>>>
>>>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>
>>>>>>> The wiki doesn't say a whole lot about the async deflation invocation
>>>>>>> mechanism so I have to figure out how to add that content.
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>>>> running now. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have to run
>>>>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I had some discussions with Karen about a race that was in the
>>>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>>>>>>>> theoretical and I had no test failures due to it. The fix is pretty
>>>>>>>> simple: remove the special case code for async deflation in the
>>>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>>>> for ObjectMonitor::enter() protection.
>>>>>>>>
>>>>>>>> During those discussions Karen also floated the idea of using the
>>>>>>>> ref_count field instead of the contentions field for the Async
>>>>>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>>>>>> change and I have run it through the usual stress and Mach5 testing
>>>>>>>> with no issues. It's also known as v2.03 (for those working with the
>>>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>>>> Sorry for all the names...
>>>>>>>>
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>>>
>>>>>>>> Here's the full webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>>>>>>>
>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>>>>>>
>>>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>>>> Solaris-X64 release had the usual "Too large time diff" complaints.
>>>>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>>>> stress kit is running right now.
>>>>>>>>
>>>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>>>>>> the results and analyze them.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I have a small but important bug fix for the Async Monitor Deflation
>>>>>>>>> project ready to go. It's also known as v2.02 (for those working with the
>>>>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs). Sorry
>>>>>>>>> for all the names...
>>>>>>>>>
>>>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline patch
>>>>>>>>> is out of our hair.
>>>>>>>>>
>>>>>>>>> Main bug URL:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>>>
>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/
>>>>>>>>>
>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>>>>>>>
>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 changes:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is running
>>>>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>>>>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on product,
>>>>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>> I'll start my stress kit on Linux-X64 sometime on Sunday (after
>>>>>>>>> my jdk-13+18 stress run is done).
>>>>>>>>>
>>>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>>>> testing is done.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> I finally have CR1 for the Async Monitor Deflation project ready to
>>>>>>>>>> go. It's also known as v2.01 (for those working with the patches) and as
>>>>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>>>>>>>> names...
>>>>>>>>>>
>>>>>>>>>> Main bug URL:
>>>>>>>>>>
>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>
>>>>>>>>>> Baseline bug fixes URL:
>>>>>>>>>>
>>>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation
>>>>>>>>>>     project
>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>>>
>>>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>>>
>>>>>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295
>>>>>>>>>>
>>>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/
>>>>>>>>>>
>>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/
>>>>>>>>>>
>>>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the latest version
>>>>>>>>>> of JDK-8153224...
>>>>>>>>>>
>>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR changes:
>>>>>>>>>>
>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>>
>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>>>>>>> Mach5 tier[78] will be run later today. My stress kit on Solaris-X64
>>>>>>>>>> is running now. Linux-X64 stress testing will start on Sunday. I'm
>>>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>>
>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>> Greetings,
>>>>>>>>>>>
>>>>>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's work on:
>>>>>>>>>>>
>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>
>>>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>>>
>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>>>
>>>>>>>>>>> Here's the webrev URL:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>>>>>
>>>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>>>
>>>>>>>>>>> Earlier versions of this patch have been through several rounds of
>>>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>>>>>> their own environments (including specJBB2015).
>>>>>>>>>>>
>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run Kitchensink
>>>>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>>>>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>>>>>> fastdebug and slowdebug).
>>>>>>>>>>>
>>>>>>>>>>> All of the testing done on earlier versions will be redone on the
>>>>>>>>>>> latest version of the patch.
>>>>>>>>>>>
>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>> P.S.
>>>>>>>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>>>>>>>> to characterize/analyze this failure for more than a week now. At
>>>>>>>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>

From david.holmes at oracle.com  Mon Nov 11 14:03:01 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 12 Nov 2019 00:03:01 +1000
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <7dedd65e-f004-b0b4-0c04-63bc484198ac@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
 <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
 <590a7395-8fda-caf3-402a-29393fd34469@oracle.com>
 <7dedd65e-f004-b0b4-0c04-63bc484198ac@oracle.com>
Message-ID: <383a1330-1e3d-66db-c95b-9e6f9910641f@oracle.com>

Hi Robbin,

On 11/11/2019 10:41 pm, Robbin Ehn wrote:
> Hi David,
> 
> On 2019-11-11 11:52, David Holmes wrote:
>> Hi Robbin,
>>
>> Can you clarify your comments regarding use of Atomic::load and 
>> Atomic::store please. Seems to me you are suggesting using those for 
>> some memory ordering effect not for any atomicity effect per se.
> 
> No I'm not. I'm talking about atomicity as in no word-tearing, single 
> store/load.

Word-tearing is only a potential issue for 16-bit or smaller accesses, 
or unaligned 32-bit or 64-bit accesses. But we don't (shouldn't) use 
unaligned 32-bit and 64-bit accesses to ensure there is no possibility 
of word-tearing. Otherwise we would need to use Atomic::load/store for 
every lock-free algorithm and data-structure that we have.

Atomic::load/store was primarily needed for 64-bit values on 32-bit 
platforms.

>>
>> For example you say:
>>
>>  > 242   jint l_ref_count = ref_count();
>>  > 243   ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d,
>>  > ref_count=%d", l_ref_count, ref_count());
>>  > Please use Atomic::load() in ref_count.
>>
>> But it seems to me that to solve the problem of the compiler not 
>> reissuing the load of ref_count, you should be using 
>> OrderAccess::loadload() in the ref_count() method.
>>
>> Or are you simply saying that given:
>>
>> jint l1 = ref_count();
>> jint l2 = ref_count();
>>
>> where ref_count simply does "return _ref_count;"
>>
>> the compiler could treat the above as:
>>
>> jint l1 = ref_count();
>> jint l2 = l1;
>>
>> whereas if we have ref_count defined as "return 
>> Atomic::load(&_ref_count);" then the compiler cannot do that?
> 
> Yes. (not considering _ref_count is volatile)
> 
>>
>> I don't like seeing Atomic::load/store being used just to trick the 
>> compiler that way. I thought we already relied on use of volatile to 
>> disallow such optimisations and that this was the accepted way to do it.
> 
> What would the reason for using Atomic::load/store be if not to 
> guarantee an atomic load/store ?

The point is we shouldn't need to guarantee atomic load/store for 32-bit 
or 64-bit values using the Atomic class because it is the implicit mode 
in which we operate.

But I can see quite a number of uses have crept into the code base. :( 
Go back to Java 7 and the only use of Atomic was for dealing with 
64-bit on 32-bit platforms.

I also can't see how/where the Atomic class is in fact doing anything to 
guarantee atomicity for 32-bit or 64-bit values.

> 
> Yes, we use volatile for that.
> The problem with using volatile is that it also affects ordering.
> And you never want to use that ordering (the compiler does not re-order 
> volatile accesses, but the CPU might...).

We only use volatile as a flag for the compiler, it tells us nothing 
about what the hardware may do. If you need hardware ordering guarantees 
you have to use OrderAccess.
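
To make that split concrete, here is a minimal sketch in standard C++. It is
not HotSpot code: std::atomic with release/acquire ordering stands in for
OrderAccess::release_store/load_acquire, and Publisher, publish and
try_consume are hypothetical names used only for illustration.

```cpp
#include <atomic>

// Hypothetical example type, not part of HotSpot.
struct Publisher {
  int payload = 0;                 // plain data, written before publication
  std::atomic<bool> ready{false};  // publication flag

  // Writer side: the release store orders the payload write before the
  // flag becomes visible to a reader that acquires it.
  void publish(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);
  }

  // Reader side: the acquire load orders the payload read after the flag
  // read. Returns -1 if nothing has been published yet.
  int try_consume() const {
    if (!ready.load(std::memory_order_acquire)) {
      return -1;
    }
    return payload;
  }
};
```

A volatile flag alone would stop the compiler from caching the flag, but it
would say nothing about how the hardware orders the payload and flag accesses.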

> By using either Atomic::load/store or 
> OrderAccess::release_store/load_acquire (or stronger), you get the 
> semantic that is appropriate.
> 
> Also in this patch there is already Atomic::store/load on "volatile 
> markWord _header;".

And I've flagged the inappropriateness of using these with Dan. Though I 
see we already have a couple of pre-existing occurrences which have 
snuck in - again this seems to be a misunderstanding about the need for 
Atomic use in these cases.

> Arguably above should be written as:
> jint l_ref_count = ref_count(); // Atomic::load()
> if (l_ref_count > 0) {
>     OrderAccess::loadload();
>     ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d,
>     ref_count=%d", l_ref_count, ref_count());
> }
> 
> But since _ref_count could have been changed many times before the 
> second load I didn't see the point of printing the same value again.

I'm not at all clear in that example why we care what ref_count may have 
changed its value to, or how it relates to the failed guarantee.

> Now there is a zillion places where we use volatile instead of 
> Atomic::load/store.
> Either those cases have too strong or too weak ordering.

volatile is only used to prevent basic compiler optimizations from being 
applied. It is used for any concurrently modified variable that is 
accessed lock-free to at least request the compiler to not try to be 
clever when accessing this variable. This may not have any well 
specified semantics according to language specifications but we have 
always used compilers in good faith that they do the right thing. Use of 
volatile has nothing to do with any perceived atomicity of access, nor 
does it suggest anything about hardware reordering.
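
As an illustration of that compiler-only role, here is a small sketch in
standard C++. RefCounted and stable_across_two_reads are hypothetical names,
only loosely modelled on the ref_count() example above; this is not the
ObjectMonitor code.

```cpp
#include <cstdint>

// Hypothetical counter holder, not the real ObjectMonitor.
class RefCounted {
 public:
  explicit RefCounted(std::int32_t initial) : _ref_count(initial) {}

  // Each call performs a real load because the field is volatile-qualified;
  // the compiler may not reuse a value it loaded earlier.
  std::int32_t ref_count() const { return _ref_count; }

  void set_ref_count(std::int32_t v) { _ref_count = v; }

 private:
  volatile std::int32_t _ref_count;
};

// Reads the counter twice. With a volatile field these are two separate
// loads, so a concurrent writer could make them differ. Nothing here
// constrains how the hardware orders the loads relative to other accesses.
inline bool stable_across_two_reads(const RefCounted& rc) {
  std::int32_t l1 = rc.ref_count();
  std::int32_t l2 = rc.ref_count();
  return l1 == l2;
}
```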

Atomicity of access for 32-bit and 64-bit values is implicitly obtained 
by using plain load/stores and having suitably aligned variables. That's 
the way it is supposed to work so that we don't need to write 
Atomic::load and Atomic::store on every variable used in lock-free 
contexts. But it seems that message has not been passed on through the 
years. I can point you to an internal wiki that I wrote up in 2010 where 
it states:

"In addition the Java platform requires that basic accesses (simple 
loads and stores, but not compound operations like increment) are atomic 
for all 32-bit Java data types (all except long and double). This is 
usually trivially achieved by aligning values on 32-bit boundaries on 
32-bit, or 64-bit systems."
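
The alignment condition in that passage can be checked with something like
the sketch below. is_naturally_aligned is a hypothetical helper in standard
C++, not an existing HotSpot function, and natural alignment is only the
precondition: whether the hardware then performs a single untorn access is a
platform assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if the address is a multiple of the access size,
// i.e. a value of that size stored there is naturally aligned.
inline bool is_naturally_aligned(const void* p, std::size_t size) {
  return reinterpret_cast<std::uintptr_t>(p) % size == 0;
}
```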

David
-----

> Thanks, Robbin
> 
>>
>> Thanks,
>> David
>>
>>
>> On 8/11/2019 11:35 pm, Robbin Ehn wrote:
>>> Hi Dan,
>>>
>>> Thanks for looking into this, some comments on v8:
>>>
>>> ##################
>>> src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
>>> src/hotspot/cpu/x86/globalDefinitions_x86.hpp
>>> src/hotspot/share/logging/logTag.hpp
>>> src/hotspot/share/oops/markWord.hpp
>>> src/hotspot/share/runtime/basicLock.cpp
>>> src/hotspot/share/runtime/safepoint.cpp
>>> src/hotspot/share/runtime/serviceThread.cpp
>>> src/hotspot/share/runtime/sharedRuntime.cpp
>>> src/hotspot/share/runtime/synchronizer.hpp
>>> src/hotspot/share/runtime/vmOperations.cpp
>>> src/hotspot/share/runtime/vmOperations.hpp
>>> src/hotspot/share/runtime/vmStructs.cpp
>>> src/hotspot/share/runtime/vmThread.cpp
>>> test/hotspot/gtest/oops/test_markWord.cpp
>>>
>>> No comments.
>>>
>>> ##################
>>> I don't see the benefit of having the 
>>> -HandshakeAfterDeflateIdleMonitors code paths.
>>> Removing that option would mean these files can be reverted:
>>> src/hotspot/cpu/aarch64/globals_aarch64.hpp
>>> src/hotspot/cpu/arm/globals_arm.hpp
>>> src/hotspot/cpu/ppc/globals_ppc.hpp
>>> src/hotspot/cpu/s390/globals_s390.hpp
>>> src/hotspot/cpu/sparc/globals_sparc.hpp
>>> src/hotspot/cpu/x86/globals_x86.hpp
>>> src/hotspot/cpu/x86/macroAssembler_x86.cpp
>>> src/hotspot/cpu/x86/macroAssembler_x86.hpp
>>> src/hotspot/cpu/zero/globals_zero.hpp
>>>
>>> And one less option here:
>>> src/hotspot/share/runtime/globals.hpp
>>>
>>> ##################
>>> src/hotspot/share/prims/jvm.cpp
>>>
>>> Unclear if this is a good idea.
>>>
>>> ##################
>>> src/hotspot/share/prims/whitebox.cpp
>>>
>>> This would assume the test expects the right thing, but that is not 
>>> obvious.
>>>
>>> ##################
>>> src/hotspot/share/prims/jvmtiEnvBase.cpp
>>>
>>> The current pending and waiting monitor is only changed by the 
>>> JavaThread itself.
>>> It only sets it after _contentions is increased.
>>> It clears it before _contentions is decreased.
>>> We depend on being at a safepoint or on the thread being suspended, so 
>>> it can't be deflated since _contentions is > 0.
>>> Plus the thread has already increased the ref count and can't 
>>> decrease it (since at safepoint or suspended).
>>>
>>> ##################
>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>
>>> ###1
>>> You have several of these (and in other files):
>>> 242   jint l_ref_count = ref_count();
>>> 243   ADIM_guarantee(l_ref_count > 0, "must be positive: 
>>> l_ref_count=%d, ref_count=%d", l_ref_count, ref_count());
>>> Please use Atomic::load() in ref_count.
>>> This depends on ref_count being volatile; otherwise the 
>>> compiler may only do one load.
>>>
>>> ###2
>>> 307   // Prevent deflation. See ObjectSynchronizer::deflate_monitor(),
>>> ...
>>> 311   Atomic::add(1, &_contentions);
>>> In ObjectSynchronizer::deflate_monitor, if you checked the ref count 
>>> instead of _contentions, we could remove _contentions.
>>> Since all waiters also have a ref count it looks like we don't need 
>>> waiters either.
>>> In ObjectSynchronizer::deflate_monitor:
>>> if (mid->_contentions != 0 || mid->_waiters != 0) {
>>> Why not just do:
>>> if (mid->ref_count()) {
>>> ?
>>>
>>> ##################
>>> src/hotspot/share/runtime/objectMonitor.hpp
>>>
>>> ###1
>>>   252   intptr_t is_busy() const {
>>>   253     // TODO-FIXME: assert _owner == null implies _recursions = 0
>>>   254     // We do not include _ref_count in the is_busy() check because
>>>   255     // _ref_count is for indicating that the ObjectMonitor* is in
>>>   256     // use which is orthogonal to whether the ObjectMonitor itself
>>>   257     // is in use for a locking operation.
>>>
>>> But in the non-debug code we always check:
>>> +  if (mid->is_busy() || mid->ref_count() != 0) {
>>>
>>> So it seems like you should have a method including ref count.
>>>
>>> ##################
>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>
>>> Use Atomic::load for ref count.
>>>
>>> ##################
>>> src/hotspot/share/runtime/synchronizer.cpp
>>>
>>> ###1
>>>   139 static volatile int g_om_free_count = 0;    // # on g_free_list
>>>   140 static volatile int g_om_in_use_count = 0;  // # on g_om_in_use_list
>>>   141 static volatile int g_om_population = 0;    // # Extant -- in circulation
>>>   142 static volatile int g_om_wait_count = 0;    // # on g_wait_list
>>> No padding here, aren't they more contended than the fields in the OM?
>>>
>>> ###2
>>> 151 static bool is_next_marked(ObjectMonitor* om) {
>>>
>>> Is only used in ObjectSynchronizer::om_flush.
>>> Here you fetch an OM and read the next field; this does not need LA 
>>> semantics on supported platforms.
>>> This would only need Atomic::load.
>>>
>>> ###3
>>> 191 static void set_next(ObjectMonitor* om, ObjectMonitor* value) {
>>>
>>> In no place do you need SR; the only places where it would make a 
>>> difference are:
>>>   345       OrderAccess::storestore();
>>>   346       set_next(cur, next);  // Unmark the previous list head.
>>> and
>>> 1714     OrderAccess::storestore();
>>> 1715     set_next(in_use_list, next);
>>>
>>> You have a storestore already!
>>>
>>> This code reads as:
>>> OrderAccess::storestore();
>>> OrderAccess::loadstore();
>>> OrderAccess::storestore();
>>> om->_next_om = value
>>>
>>> So it should be an Atomic::store.
>>>
>>> ###4
>>> 198 static bool mark_list_head(ObjectMonitor* volatile * list_p
>>>
>>> Since the mark is an embedded spinlock I think the terminology should 
>>> be changed. (That the spinlock is inside the next pointer should be 
>>> abstracted away.)
>>> E.g. mark_next_loop would just be lock.
>>> The load of the list heads should use Atomic::load.
>>> It also seems a bit weird to return next for the locking method.
>>> An output parameter can just be returned, and return NULL if list 
>>> head is NULL.
>>> E.g.
>>>
>>>   198 static ObjectMonitor* get_list_head_locked(ObjectMonitor* 
>>> volatile * list_p) {
>>>   200   while (true) {
>>>   201     ObjectMonitor* mid = Atomic::load(list_p);
>>>   202     if (mid == NULL) {
>>>   203       return NULL;  // The list is empty.
>>>   204     }
>>>   205     if (try_lock(mid)) {
>>>   206       if (Atomic::load(list_p) != mid) {
>>>   207         // The list head changed so we have to retry.
>>>   208         unlock(mid);
>>>   210       } else {
>>>               return mid;
>>>             }
>>>   214     }
>>>           // Yield ?
>>>   215   }
>>>   216 }
>>>
>>> With collateral changes.
>>>
>>> ###5
>>> 220 static ObjectMonitor* unmarked_next(ObjectMonitor* om)
>>> Atomic::store is what needed.
>>>
>>> ###6
>>> 333 static void prepend_to_common(
>>>
>>>   345       OrderAccess::storestore();
>>>   346       set_next(cur, next);  // Unmark the previous list head.
>>> Double storestore. (fixed by changing set_next to Atomic::store)
>>>
>>> ###7
>>>   375 static ObjectMonitor* take_from_start_of_common(ObjectMonitor* 
>>> volatile * list_p,
>>>
>>> Triple storestore here.
>>>
>>>   386   Atomic::dec(count_p);
>>>   387   // mark_list_head() used cmpxchg() above, switching list head can be lazier:
>>>   388   OrderAccess::storestore();
>>>   389   // Unmark take, but leave the next value for any lagging list
>>>   390   // walkers. It will get cleaned up when take is prepended to
>>>   391   // the in-use list:
>>>   392   set_next(take, next);
>>>   393   return take;
>>>
>>> Reads:
>>> count_p--
>>> OrderAccess::loadstore();
>>> OrderAccess::storestore();
>>> OrderAccess::storestore();
>>> OrderAccess::loadstore();
>>> OrderAccess::storestore();
>>> take->_next_om = next;
>>>
>>> Fixed by changing set_next to Atomic::store and removing the 
>>> OrderAccess::storestore();
>>>
>>> ###8
>>> ObjectSynchronizer::om_release(
>>>
>>> 1591       if (m == mid) {
>>> 1592         // We found 'm' on the per-thread in-use list so try to extract it.
>>> 1593         if (cur_mid_in_use == NULL) {
>>> 1594           // mid is the list head and it is marked. Switch the list head
>>> 1595           // to next which unmarks the list head, but leaves mid marked:
>>> 1596           self->om_in_use_list = next;
>>> 1597           // mark_list_head() used cmpxchg() above, switching list head can be lazier:
>>> 1598           OrderAccess::storestore();
>>> 1599         } else {
>>> 1600           // mid and cur_mid_in_use are marked. Switch cur_mid_in_use's
>>> 1601           // next field to next which unmarks cur_mid_in_use, but leaves
>>> 1602           // mid marked:
>>> 1603           OrderAccess::release_store(&cur_mid_in_use->_next_om, next);
>>> 1604         }
>>> 1605         extracted = true;
>>> 1606         Atomic::dec(&self->om_in_use_count);
>>> 1607         // Unmark mid, but leave the next value for any lagging list
>>> 1608         // walkers. It will get cleaned up when mid is prepended to
>>> 1609         // the thread's free list:
>>> 1610         set_next(mid, next);
>>> 1611         break;
>>> 1612       }
>>>
>>> This does not look correct. Before taking this branch we have done a 
>>> cmpxchg in mark_list_head or mark_next_loop.
>>> This is how it reads:
>>> OrderAccess::storestore(); // from previous cmpxchg
>>> OrderAccess::loadstore(); // from previous cmpxchg
>>> 1591       if (m == mid) {
>>> 1593         if (cur_mid_in_use == NULL) {
>>> 1596           self->om_in_use_list = next;
>>> 1598           OrderAccess::storestore();
>>> 1599         } else {
>>>                OrderAccess::storestore();
>>>                OrderAccess::loadstore();
>>> 1603           cur_mid_in_use->_next_om = next;
>>> 1604         }
>>> 1605         extracted = true;
>>>              OrderAccess::storestore();
>>>              OrderAccess::fence(); // storestore|storeload|loadstore|loadload
>>>              self->om_in_use_count--; // Atomic::dec
>>>              OrderAccess::storestore();
>>>              OrderAccess::loadstore();
>>>              OrderAccess::storestore();
>>>              OrderAccess::loadstore();
>>>              mid->_next_om = next; // Atomic::store
>>> 1611         break;
>>> 1612       }
>>>
>>> extracted is a local variable so you do not need any OrderAccess 
>>> before it is set.
>>> Fixed by changing set_next to Atomic::store, removing the 
>>> OrderAccess::storestore() and changing OrderAccess::release_store to 
>>> Atomic::store();
>>>
>>> ###9
>>> 1653 void ObjectSynchronizer::om_flush(Thread* self) {
>>>
>>> 1714???? OrderAccess::storestore();
>>> 1715???? set_next(in_use_list, next);
>>> Fixed by changing set_next to Atomic::store.
>>>
>>> ###10
>>> 1737     self->om_free_list = NULL;
>>> 1738     OrderAccess::storestore();  // Lazier memory is okay for 
>>> list walkers.
>>>
>>> prepend_list_to_g_free_list/prepend_list_to_g_om_in_use_list do a 
>>> cmpxchg first thing so there is no need for this storestore.
>>>
>>> ###11
>>> 1797 void ObjectSynchronizer::inflate(ObjectMonitorHandle* omh_p, 
>>> Thread* self,
>>>
>>> 1938       // Once ObjectMonitor is configured and the object is associated
>>> 1939       // with the ObjectMonitor, it is safe to allow async deflation:
>>> 1940       assert(m->is_new(), "freshly allocated monitor must be new");
>>> 1941       m->set_allocation_state(ObjectMonitor::Old);
>>>
>>> So we use ref count, contention, waiter, owner and allocation state 
>>> to keep OM alive in different scenarios.
>>> There is no way for me to keep track of that. I don't see why you 
>>> would need more than owner and ref count.
>>> If you allocate the om with ref count 1 you can remove 
>>> _allocation_state and just decrease ref count here instead.
>>>
>>> ###12
>>> 2079 bool ObjectSynchronizer::deflate_monitor
>>>
>>> 2112     if (AsyncDeflateIdleMonitors) {
>>> 2113       // clear() expects the owner field to be NULL and we won't race
>>> 2114       // with the simple C2 ObjectMonitor
>>>
>>> The macro assembler code is not just executed by C2, so this comment 
>>> is a bit misleading. (There are some more also.)
>>>
>>> ###13
>>> 2306 int ObjectSynchronizer::deflate_monitor_list(
>>>
>>> Same issue as ObjectSynchronizer::om_release.
>>> Fixed by changing set_next to Atomic::store, removing the 
>>> OrderAccess::storestore() and changing OrderAccess::release_store to 
>>> Atomic::store();
>>>
>>> ###14
>>> 2474       if (SafepointSynchronize::is_synchronizing() &&
>>>
>>> This is the wrong method to call, it should be 
>>> SafepointMechanism::should_block(Thread* thread);
>>>
>>> ###15
>>> 2578 void ObjectSynchronizer::deflate_idle_monitors_using_JT() {
>>>
>>> 2616     g_wait_list = NULL;
>>> 2617     OrderAccess::storestore();  // Lazier memory sync is okay for list walkers.
>>>
>>> I don't see that g_wait_list is ever simultaneously read.
>>> Either it is accessed by serviceThread outside a safepoint or by 
>>> VMThread inside a safepoint?
>>>
>>> It looks like g_wait_list can just be a local in:
>>> void ObjectSynchronizer::deflate_idle_monitors_using_JT()
>>>
>>> (disregarding the debug code that might read it in a safepoint)
>>>
>>> ###16
>>> 2722         assert(SafepointSynchronize::is_synchronizing(), "sanity check");
>>>
>>> This is the wrong method to call, it should be 
>>> SafepointMechanism::should_block(Thread* thread);
>>>
>>> ##################
>>> src/hotspot/share/runtime/vframe.cpp
>>>
>>> We are at safepoint or current thread or in a handshake, current 
>>> pending and waiting monitor is already stable.
>>>
>>> ##################
>>> src/hotspot/share/services/threadService.cpp
>>>
>>> These changes are only needed for the 
>>> -HandshakeAfterDeflateIdleMonitors path.
>>>
>>> ##################
>>> test/jdk/java/rmi/server/UnicastRemoteObject/unexportObject/UnexportLeak.java 
>>>
>>>
>>> Note: if OM had a weak to object instead this would not be needed.
>>>
>>> Thanks, Robbin
>>>
>>>
>>> On 11/4/19 10:03 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I have made changes to the Async Monitor Deflation code in response to
>>>> the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., 
>>>> Robbin
>>>> and Erik O. for their comments!
>>>>
>>>> JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async 
>>>> Monitor
>>>> Deflation project needs to push before Nov. 12, 2019 in order to allow
>>>> for sufficient bake time for such a big change. Nov. 12 is _next_ 
>>>> Tuesday
>>>> so we have 8 days from today to finish this code review cycle and push
>>>> this code for JDK14.
>>>>
>>>> Carsten and Roman! Time for you guys to chime in again on the code 
>>>> reviews.
>>>>
>>>> I have attached the change list from CR7 to CR8 instead of putting 
>>>> it in
>>>> the body of this email. I've also added a link to the 
>>>> CR7-to-CR8-changes
>>>> file to the webrevs so it should be easy to find.
>>>>
>>>> Main bug URL:
>>>>
>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>
>>>> The project is currently baselined on jdk-14+21.
>>>>
>>>> Here's the full webrev URL for those folks that want to see all of the
>>>> current Async Monitor Deflation code in one go (v2.08 full):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full 
>>>>
>>>>
>>>> Some folks might want to see just what has changed since the last 
>>>> review
>>>> cycle so here's a webrev for that (v2.08 inc):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/ 
>>>>
>>>>
>>>> The OpenJDK wiki did not need any changes for this round:
>>>>
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>
>>>> The jdk-14+21 based v2.08 version of the patch has been thru Mach5 
>>>> tier[1-8]
>>>> testing on Oracle's usual set of platforms. It has also been through 
>>>> my usual
>>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the 
>>>> addition
>>>> of Robbin's "MoCrazy 1024" test running in parallel with the other 
>>>> tests in
>>>> my lab. Some testing is still running, but so far there are no new 
>>>> regressions.
>>>>
>>>> I have not yet done a SPECjbb2015 round on the 
>>>> CR8/v2.08/11-for-jdk14 bits.
>>>>
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> The Async Monitor Deflation project is reaching the end game. I 
>>>>> have no
>>>>> changes planned for the project at this time so all that is left is 
>>>>> code
>>>>> review and any changes that result from those reviews.
>>>>>
>>>>> Carsten and Roman! Time for you guys to chime in again on the code 
>>>>> reviews.
>>>>>
>>>>> I have attached the list of fixes from CR6 to CR7 instead of 
>>>>> putting it
>>>>> in the main body of this email.
>>>>>
>>>>> Main bug URL:
>>>>>
>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> The project is currently baselined on jdk-14+19.
>>>>>
>>>>> Here's the full webrev URL for those folks that want to see all of the
>>>>> current Async Monitor Deflation code in one go (v2.07 full):
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full 
>>>>>
>>>>>
>>>>> Some folks might want to see just what has changed since the last 
>>>>> review
>>>>> cycle so here's a webrev for that (v2.07 inc):
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/ 
>>>>>
>>>>>
>>>>> The OpenJDK wiki has been updated to match the 
>>>>> CR7/v2.07/10-for-jdk14 changes:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> The jdk-14+18 based v2.07 version of the patch has been thru Mach5 
>>>>> tier[1-8]
>>>>> testing on Oracle's usual set of platforms. It has also been 
>>>>> through my usual
>>>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the 
>>>>> addition
>>>>> of Robbin's "MoCrazy 1024" test running in parallel with the other 
>>>>> tests in
>>>>> my lab.
>>>>>
>>>>> The jdk-14+19 based v2.07 version of the patch has been thru Mach5 
>>>>> tier[1-3]
>>>>> test on Oracle's usual set of platforms. Mach5 tier[4-8] are in 
>>>>> process.
>>>>>
>>>>> I did another round of SPECjbb2015 testing in Oracle's Aurora 
>>>>> Performance lab
>>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>>>>
>>>>>     - "base" is jdk-14+18
>>>>>     - "v2.07" is the latest version and includes C2 inc_om_ref_count()
>>>>>       support on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors
>>>>>       option
>>>>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>>>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors specified
>>>>>
>>>>>          hbIR           hbIR
>>>>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>>>     ---------------  ---------  --------  -------------  -------
>>>>>            34282.00   30635.90  28831.30       20969.20  3841.30 base
>>>>>            34282.00   30973.00  29345.80       21025.20  3964.10 v2.07
>>>>>            34282.00   31105.60  29174.30       21074.00  3931.30 v2.07_handshake
>>>>>            34282.00   30789.70  27151.60       19839.10  3850.20 v2.07_off
>>>>>
>>>>>     - The Aurora Perf comparison tool reports:
>>>>>
>>>>>         Comparison              max-jOPS              critical-jOPS
>>>>>         ----------------------  --------------------  --------------------
>>>>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>>>>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>>>>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>>>>>
>>>>>         (s) - significant  (ns) - not-significant
>>>>>
>>>>>     - For historical comparison, the Aurora Perf comparison tool
>>>>>         reported for v2.06 with a baseline of jdk-13+31:
>>>>>
>>>>>         Comparison              max-jOPS              critical-jOPS
>>>>>         ----------------------  --------------------  --------------------
>>>>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>>>>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>>>>>
>>>>>         (s) - significant  (ns) - not-significant
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> The Async Monitor Deflation project has rebased to JDK14 so it's time
>>>>>> for our first code review in that new context!!
>>>>>>
>>>>>> I've been focused on changing the monitor list management code to be
>>>>>> lock-free in order to make SPECjbb2015 happier. Of course with a 
>>>>>> change
>>>>>> like that, it takes a while to chase down all the new and wonderful
>>>>>> races. At this point, I have the code back to the same stability that
>>>>>> I had with CR5/v2.05/8-for-jdk13.
>>>>>>
>>>>>> To lay the ground work for this round of review, I pushed the 
>>>>>> following
>>>>>> two fixes to jdk/jdk earlier today:
>>>>>>
>>>>>>     JDK-8230184 rename, whitespace, indent and comments changes in
>>>>>>                 preparation for lock free Monitor lists
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>>>>
>>>>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails after 8230184
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>>>>
>>>>>> I have attached the list of fixes from CR5 to CR6 instead of putting
>>>>>> it in the main body of this email.
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>>>>> JDK-8230184 and JDK-8230317.
>>>>>>
>>>>>> Here's the full webrev URL for those folks that want to see all of 
>>>>>> the
>>>>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/ 
>>>>>>
>>>>>>
>>>>>>
>>>>>> The primary focus of this review cycle is on the lock-free Monitor 
>>>>>> List
>>>>>> management changes so here's a webrev for just that patch (v2.06c):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/ 
>>>>>>
>>>>>>
>>>>>> The secondary focus of this review cycle is on the bug fixes that 
>>>>>> have
>>>>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for just 
>>>>>> that
>>>>>> patch (v2.06b):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/ 
>>>>>>
>>>>>>
>>>>>> The third and final bucket for this review cycle is the rename, 
>>>>>> whitespace,
>>>>>> indent and comments changes made in preparation for lock free 
>>>>>> Monitor list
>>>>>> management. Almost all of that was extracted into JDK-8230184 for the
>>>>>> baseline so this bucket now has just a few comment changes 
>>>>>> relative to
>>>>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/ 
>>>>>>
>>>>>>
>>>>>>
>>>>>> Some folks might want to see just what has changed since the last 
>>>>>> review
>>>>>> cycle so here's a webrev for that (v2.06 inc):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/ 
>>>>>>
>>>>>>
>>>>>>
>>>>>> Last, but not least, some folks might want to see the code before the
>>>>>> addition of lock-free Monitor List management so here's a webrev for
>>>>>> that (v2.00 -> v2.05):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/ 
>>>>>>
>>>>>>
>>>>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>
>>>>>> but that should only be changes to describe per-thread list async 
>>>>>> monitor
>>>>>> deflation being done by the ServiceThread.
>>>>>>
>>>>>> (I did update the OpenJDK wiki for the CR5 changes back on 
>>>>>> 2019.08.14)
>>>>>>
>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>> Oracle's usual set of platforms. It has also been through my usual 
>>>>>> set
>>>>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>>>>
>>>>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs. This was using
>>>>>> this patch baselined on jdk-13+31 (for stability):
>>>>>>
>>>>>>           hbIR           hbIR
>>>>>>      (max attempted)  (settled)  max-jOPS  critical-jOPS runtime
>>>>>>      ---------------  ---------  --------  ------------- -------
>>>>>>             34282.00   28837.20  27905.20       19817.40 3658.10 base
>>>>>>             34965.70   29798.80  27814.90       19959.00 3514.60 v2.06d
>>>>>>             34282.00   29100.70  28042.50       19577.00 3701.90 v2.06d_off
>>>>>>             34282.00   29218.50  27562.80       19397.30 3657.60 v2.06d_ocache
>>>>>>             34965.70   29838.30  26512.40       19170.60 3569.90 v2.05
>>>>>>             34282.00   28926.10  27734.00       19835.10 3588.40 v2.05_off
>>>>>>
>>>>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors specified and
>>>>>> the "ocache" config is with 128 byte cache line sizes instead of 64 byte
>>>>>> cache line sizes. "v2.06d" is the last set of changes that I made before
>>>>>> those changes were distributed into the "v2.06a", "v2.06b" and "v2.06c"
>>>>>> buckets for this review cycle.
>>>>>>
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I've been focused on chasing down and fixing the test failures
>>>>>>> that only pop up rarely. So this round is primarily fixes for races
>>>>>>> with a few additional fixes that came from Karen's review of CR4.
>>>>>>> Thanks Karen!
>>>>>>>
>>>>>>> I have attached the list of fixes from CR4 to CR5 instead of putting
>>>>>>> it in the main body of this email.
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-13+29. This will likely be
>>>>>>> the last JDK13 baseline for this project and I'll roll to the JDK14
>>>>>>> (jdk/jdk) repo soon...
>>>>>>>
>>>>>>> Here's the full webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/
>>>>>>>
>>>>>>> Here's the incremental webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>>>>
>>>>>>> I have not yet checked the OpenJDK wiki to see if it needs any 
>>>>>>> updates
>>>>>>> to match the CR5 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>
>>>>>>>
>>>>>>> (I did update the OpenJDK wiki for the CR4 changes back on 
>>>>>>> 2019.06.26)
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become 
>>>>>>> available.
>>>>>>> Since I haven't made any performance changes in this round, I'll 
>>>>>>> only
>>>>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>>>>
>>>>>>> Next up:
>>>>>>>
>>>>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>>>>   Linux-X64 and we've determined that some of that comes from
>>>>>>>   contention on the gListLock. So I'm going to investigate removing
>>>>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>>>>> - Of course, going lock free often causes new races and new failures
>>>>>>>   so that's a good reason to make those changes isolated in their
>>>>>>>   own round (and not holding up CR5/v2.05/8-for-jdk13 anymore).
>>>>>>> - I finally have a potential fix for the Win* failure with
>>>>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>   but I haven't run it through Mach5 yet so it'll be in the next round.
>>>>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>>>>>   monitor related failures there. I suspect that I need to go take a
>>>>>>>   look at the C2 RTM macro assembler code and look for things that might
>>>>>>>   conflict with Async Monitor Deflation. If you're interested in that kind
>>>>>>>   of issue, then see the macroAssembler_x86.cpp sanity check that I
>>>>>>>   added in this round!
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I have a fix for an issue that came up during performance testing.
>>>>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>>>>> experiments.
>>>>>>>>
>>>>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>>>>> verbose due to the complexity of the issue, but the changes
>>>>>>>> themselves are not that big.
>>>>>>>>
>>>>>>>> Functional:
>>>>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>>>>         safepoints that simply request async deflation which
>>>>>>>>         keeps the async JavaThreads from making progress on
>>>>>>>>         their async deflation work.
>>>>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>>>>     - is_async_deflation_needed() returns true when
>>>>>>>>       is_async_cleanup_requested() is true or when
>>>>>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>>>>>       AsyncDeflationInterval).
>>>>>>>>     - If AsyncDeflateIdleMonitors is enabled, Service_lock->wait() now
>>>>>>>>       waits for at most GuaranteedSafepointInterval millis:
>>>>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>>>>         (default is 1000 millis/1 second)
>>>>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>>>>         generally cannot return true again until AsyncDeflationInterval
>>>>>>>>         has elapsed. This is to prevent async deflation from swamping
>>>>>>>>         the ServiceThread.
>>>>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>>>>>     of their in-use lists.
>>>>>>>>     - The ServiceThread will check for async deflation work every
>>>>>>>>       GuaranteedSafepointInterval.
>>>>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>>>>       async deflation work via is_async_deflation_requested.
>>>>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>>>>     flag to reduce the in-use monitor population that is reported by
>>>>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
>>>>>>>>
>>>>>>>> Test update:
>>>>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>>>>>     AsyncDeflateIdleMonitors.
>>>>>>>>
>>>>>>>> Collateral:
>>>>>>>>   - Add/clarify/update some logging messages.
>>>>>>>>
>>>>>>>> Cleanup:
>>>>>>>>   - Updated comments based on Karen's code review.
>>>>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>>>>     'async cleanup' -> 'async deflation'.
>>>>>>>>     - comment and function name changes
>>>>>>>>   - Clarify MonitorUsedDeflationThreshold description.
>>>>>>>>
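The rate limiting described in the "Functional" list above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the webrev: DeflationState, its fields and deflation_needed_sketch() are hypothetical names, and time is passed in as a plain millisecond counter.

```cpp
#include <cstdint>

// Hypothetical stand-in for the state the predicate consults.
struct DeflationState {
  bool    async_requested;    // an explicit async deflation request is pending
  bool    above_threshold;    // stands in for monitors_used_above_threshold()
  int64_t interval_ms;        // stands in for AsyncDeflationInterval (0 is off)
  int64_t last_deflation_ms;  // time of the last threshold-triggered "true"
};

// Returns true when async deflation work should be started at now_ms.
bool deflation_needed_sketch(DeflationState& s, int64_t now_ms) {
  if (s.async_requested) {
    return true;  // an explicit request is not rate limited in this sketch
  }
  if (s.interval_ms > 0 && s.above_threshold &&
      now_ms - s.last_deflation_ms >= s.interval_ms) {
    // Remember when we last said yes so that the threshold check cannot
    // fire again until the interval has elapsed.
    s.last_deflation_ms = now_ms;
    return true;
  }
  return false;
}
```

A caller that wakes up periodically (in the list above, the ServiceThread waking at most every GuaranteedSafepointInterval) would evaluate the predicate on each wakeup; the interval check keeps a persistently exceeded threshold from triggering deflation on every wakeup.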
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-13+22.
>>>>>>>>
>>>>>>>> Here's the full webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>>>>>>>
>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>>>>>>>
>>>>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>
>>>>>>>>
>>>>>>>> The wiki doesn't say a whole lot about the async deflation 
>>>>>>>> invocation
>>>>>>>> mechanism so I have to figure out how to add that content.
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>>>>> running now. Kitchensink8H on product, fastdebug, and slowdebug 
>>>>>>>> bits
>>>>>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have 
>>>>>>>> to run
>>>>>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I had some discussions with Karen about a race that was in the
>>>>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race 
>>>>>>>>> was
>>>>>>>>> theoretical and I had no test failures due to it. The fix is 
>>>>>>>>> pretty
>>>>>>>>> simple: remove the special case code for async deflation in the
>>>>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>>>>> for ObjectMonitor::enter() protection.
>>>>>>>>>
>>>>>>>>> During those discussions Karen also floated the idea of using the
>>>>>>>>> ref_count field instead of the contentions field for the Async
>>>>>>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>>>>>>> change and I have run it through the usual stress and Mach5 
>>>>>>>>> testing
>>>>>>>>> with no issues. It's also known as v2.03 (for those with the
>>>>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>>>>> Sorry for all the names...
>>>>>>>>>
>>>>>>>>> Main bug URL:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>>>>
>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>>>>>>>
>>>>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>>>>> Solaris-X64 release had the usual "Too large time diff" 
>>>>>>>>> complaints.
>>>>>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>>>>> stress kit is running right now.
>>>>>>>>>
>>>>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>>>>>>> the results and analyze them.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> I have a small but important bug fix for the Async Monitor Deflation
>>>>>>>>>> project ready to go. It's also known as v2.02 (for those with the
>>>>>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs). Sorry
>>>>>>>>>> for all the names...
>>>>>>>>>>
>>>>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that 
>>>>>>>>>> baseline patch
>>>>>>>>>> is out of our hair.
>>>>>>>>>>
>>>>>>>>>> Main bug URL:
>>>>>>>>>>
>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>
>>>>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>>>>
>>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/ 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/ 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 
>>>>>>>>>> changes:
>>>>>>>>>>
>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-6] 
>>>>>>>>>> testing on
>>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is 
>>>>>>>>>> running
>>>>>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, 
>>>>>>>>>> MacOSX
>>>>>>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on 
>>>>>>>>>> product,
>>>>>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and 
>>>>>>>>>> Solaris-X64.
>>>>>>>>>> I'll start my stress kit on Linux-X64 sometime on Sunday (after
>>>>>>>>>> my jdk-13+18 stress run is done).
>>>>>>>>>>
>>>>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>>>>> testing is done.
>>>>>>>>>>
>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>> Greetings,
>>>>>>>>>>>
>>>>>>>>>>> I finally have CR1 for the Async Monitor Deflation project ready to
>>>>>>>>>>> go. It's also known as v2.01 (for those with the patches) and as
>>>>>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>>>>>>>>> names...
>>>>>>>>>>>
>>>>>>>>>>> Main bug URL:
>>>>>>>>>>>
>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>
>>>>>>>>>>> Baseline bug fixes URL:
>>>>>>>>>>>
>>>>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>>>>
>>>>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>>>>
>>>>>>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/ 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/ 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the 
>>>>>>>>>>> latest version
>>>>>>>>>>> of JDK-8153224...
>>>>>>>>>>>
>>>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR 
>>>>>>>>>>> changes:
>>>>>>>>>>>
>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] 
>>>>>>>>>>> testing on
>>>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running 
>>>>>>>>>>> now and
>>>>>>>>>>> Mach5 tier[78] will be run later today. My stress kit on 
>>>>>>>>>>> Solaris-X64
>>>>>>>>>>> is running now. Linux-X64 stress testing will start on 
>>>>>>>>>>> Sunday. I'm
>>>>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>>>
>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>
>>>>>>>>>>>> Welcome to the OpenJDK review thread for my port of 
>>>>>>>>>>>> Carsten's work on:
>>>>>>>>>>>>
>>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>>
>>>>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>>>>
>>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Here's the webrev URL:
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>>>>>>
>>>>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>>>>
>>>>>>>>>>>> Earlier versions of this patch have been through several 
>>>>>>>>>>>> rounds of
>>>>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>>>>>> Roman for their preliminary code review comments. A very 
>>>>>>>>>>>> special
>>>>>>>>>>>> thanks to Robbin and Roman for building and testing the 
>>>>>>>>>>>> patch in
>>>>>>>>>>>> their own environments (including specJBB2015).
>>>>>>>>>>>>
>>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] 
>>>>>>>>>>>> testing on
>>>>>>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run
>>>>>>>>>>>> Kitchensink
>>>>>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, 
>>>>>>>>>>>> fastdebug
>>>>>>>>>>>> and slowdebug). Earlier versions have run my monitor 
>>>>>>>>>>>> inflation stress
>>>>>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 
>>>>>>>>>>>> (product,
>>>>>>>>>>>> fastdebug and slowdebug).
>>>>>>>>>>>>
>>>>>>>>>>>> All of the testing done on earlier versions will be redone 
>>>>>>>>>>>> on the
>>>>>>>>>>>> latest version of the patch.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>>
>>>>>>>>>>>> P.S.
>>>>>>>>>>>> One subtest in 
>>>>>>>>>>>> gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been 
>>>>>>>>>>>> trying
>>>>>>>>>>>> to characterize/analyze this failure for more than a week 
>>>>>>>>>>>> now. At
>>>>>>>>>>>> this point I'm convinced that Async Monitor Deflation is 
>>>>>>>>>>>> aggravating
>>>>>>>>>>>> an existing bug. However, I plan to have a better handle on 
>>>>>>>>>>>> that
>>>>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>

From suenaga at oss.nttdata.com  Mon Nov 11 14:07:27 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Mon, 11 Nov 2019 23:07:27 +0900
Subject: RFR: 8233785: Incorrect JDK version is reported in hs_err log
In-Reply-To: <317c088d-687c-c9a5-cc7f-c6744ea275a5@oracle.com>
References: <d9a24903-9053-06a0-e74b-7bfb43370767@oss.nttdata.com>
 <6811d542-a530-5d70-5fd6-bea47de81d35@oracle.com>
 <317c088d-687c-c9a5-cc7f-c6744ea275a5@oracle.com>
Message-ID: <e4a5a5c4-c9de-a93a-5f59-3528c534a6df@oss.nttdata.com>

Thanks David!
I'll wait for a second reviewer.

Yasumasa

On 2019/11/11 19:28, David Holmes wrote:
> Sorry for the delay.
> 
> Just confirming I've verified against the spec from JEP 223 and this fix is correct.
> 
> Thanks,
> David
> 
> On 7/11/2019 10:39 pm, David Holmes wrote:
>> Hi Yasumasa,
>>
>> On 7/11/2019 10:28 pm, Yasumasa Suenaga wrote:
>>> Hi all,
>>>
>>> Please review this change:
>>>
>>> ?? JBS: https://bugs.openjdk.java.net/browse/JDK-8233785
>>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233785/webrev.00/
>>>
>>> If a JVM configured with --with-version-patch crashes, the JDK version in the hs_err log is incorrect.
>>> We get an hs_err log which contains the following in its header when we run configure with "--with-version-update=0 --with-version-patch=1":
>>>
>>> ```
>>> # JRE version: OpenJDK Runtime Environment (14.0.1+2) (build 14.0.0.1+2-TypeS)
>>> ```
>>>
>>> The valid JDK version is "14.0.0.1", but the log reports "14.0.1".
>>> It is a bug in JDK_Version::to_string().
>>
>> I initially missed the fact that you always print _security along with _patch.
>>
>> I think what you have looks correct, but I'd want to double check that against the versioning spec to be sure.
>>
>> Thanks,
>> David
>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
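
For reference, a minimal sketch of the formatting rule discussed above, assuming the version string is built from four numeric elements joined by '.' with trailing zero elements dropped. The function name format_version and its parameters are hypothetical; this is not the code of JDK_Version::to_string() or of the webrev.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not HotSpot code): joins the version elements with
// '.', dropping trailing zero elements but never the leading one.
std::string format_version(int feature, int interim, int update, int patch) {
  std::vector<int> elems = {feature, interim, update, patch};
  while (elems.size() > 1 && elems.back() == 0) {
    elems.pop_back();  // only trailing zeros go; interior zeros must stay
  }
  std::string out;
  for (std::size_t i = 0; i < elems.size(); ++i) {
    if (i > 0) {
      out += '.';
    }
    out += std::to_string(elems[i]);
  }
  return out;
}
```

With update=0 and patch=1 the sketch yields 14.0.0.1: the zero update element has to be printed because a non-zero element follows it, which is the case the hs_err header got wrong.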

From martin.doerr at sap.com  Mon Nov 11 15:06:54 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 11 Nov 2019 15:06:54 +0000
Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
In-Reply-To: <cb3b92d1-5dee-8d51-1298-2d01a016ef2b@oracle.com>
References: <AM6PR02MB534783144A9A6CF30C1049F8EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com>
 <AM6PR02MB534729CACDA052E9B80ED146EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
 <AM6PR02MB5347AB9322749A90AED3128BEC7B0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <cb3b92d1-5dee-8d51-1298-2d01a016ef2b@oracle.com>
Message-ID: <VI1PR0201MB247982FA923CE145853156219A740@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Götz,

the PPC64 code looks good, too.
Thanks for fixing and improving it.

Best regards,
Martin


> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-
> bounces at openjdk.java.net> On Behalf Of David Holmes
> Sent: Montag, 11. November 2019 08:56
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
> compiler-dev at openjdk.java.net>
> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> 
> Hi Goetz,
> 
> Please note I only looked at the test initially and have not reviewed
> this overall fix as I don't know the PPC code.
> 
> The updated test seems fine.
> 
> Thanks,
> David
> 
> On 9/11/2019 1:32 am, Lindenmaier, Goetz wrote:
> > Hi,
> >
> > I waited for https://bugs.openjdk.java.net/browse/JDK-8233081
> > which makes one of the fixes unnecessary.
> > Also, I had to fix the argument of verify_oop_helper
> > from oop to oopDesc* for the fastdebug build.
> >
> > New webrev:
> > http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/
> >
> > Best regards,
> >    Goetz.
> >
> >> -----Original Message-----
> >> From: David Holmes <david.holmes at oracle.com>
> >> Sent: Freitag, 18. Oktober 2019 01:38
> >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> >> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net'
> <hotspot-
> >> compiler-dev at openjdk.java.net>
> >> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> >>
> >> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote:
> >>> Hi David,
> >>>
> >>> you are right, thanks for pointing me to that!
> >>> Doing one test for vm.bits=64 and one for 32 should fix it:
> >>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> >>
> >> s/01/02/ :)
> >>
> >> For the 32-bit case you can delete the line:
> >>
> >>      * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9")
> >>
> >> For the 64-bit case you can delete the "sparc" check from the same line.
> >>
> >> Thanks,
> >> David
> >>
> >>>
> >>> Best regards,
> >>>     Goetz.
> >>>
> >>>> -----Original Message-----
> >>>> From: David Holmes <david.holmes at oracle.com>
> >>>> Sent: Donnerstag, 17. Oktober 2019 13:18
> >>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
> runtime-
> >>>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net'
> <hotspot-
> >>>> compiler-dev at openjdk.java.net>
> >>>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> >>>>
> >>>> Hi Goetz,
> >>>>
> >>>> UseCompressedOops is a 64-bit flag only so your change will break the
> >>>> test on 32-bit systems.
> >>>>
> >>>> David
> >>>>
> >>>> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote:
> >>>>> Hi,
> >>>>>
> >>>>> 8231058 introduced a test that enables +VerifyOops.
> >>>>> This fails on ppc, because this was not used in a very
> >>>>> long time.
> >>>>>
> >>>>> The crash is caused by passing compressed oops from
> >>>>> LIR_Assembler::store() to the checker routine.
> >>>>> I fix this by implementing a checker routine verify_coop
> >>>>> that first decompresses the coop.  This makes the new
> >>>>> test pass.
> >>>>>
> >>>>> Further testing showed that the additional checker
> >>>>> coding makes Patching Stubs overflow. These
> >>>>> can not be increased in size to fit the code. I
> >>>>> disable generating verify_oop code in LIRAssembler::load()
> >>>>> which fixes the issue.
> >>>>>
> >>>>> Further I extended the message printed when verification
> >>>>> of an oop failed. First, I print the location in the source
> >>>>> code where the checker code was generated. Second,
> >>>>> I print the faulty oop.
> >>>>>
> >>>>> I also improved the message printed when PatchingStubs
> >>>>> overflow.
> >>>>>
> >>>>> Finally, I improve the test to run with and without compressed
> >>>>> Oops.
> >>>>>
> >>>>> Please review:
> >>>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> >>>>>
> >>>>> @runtime as I modify the test introduced there
> >>>>> @compiler as the error is in C1.
> >>>>>
> >>>>> Best regards,
> >>>>>      Goetz.
> >>>>>
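
A rough sketch of why a checker that expects a full oop fails on a compressed one, and of the decode-then-verify idea behind verify_coop. All names here (NarrowOopEncoding, decode_narrow, verify_oop_sketch, verify_coop_sketch) and the plausibility check itself are hypothetical; the real PPC code is generated assembly in the webrev, and the base-plus-shifted-offset decoding is an assumption about how the compressed value is encoded.

```cpp
#include <cstdint>

// Hypothetical encoding parameters for compressed oops.
struct NarrowOopEncoding {
  uint64_t base;   // heap base address added back when decoding
  unsigned shift;  // log2 of the object alignment
};

// Decode a 32-bit compressed oop into a full address; 0 stays null.
uint64_t decode_narrow(const NarrowOopEncoding& enc, uint32_t narrow) {
  if (narrow == 0) {
    return 0;
  }
  return enc.base + (static_cast<uint64_t>(narrow) << enc.shift);
}

// Stand-in for an oop check: null is fine, anything else must be an
// 8-byte aligned address inside [heap_start, heap_end).
bool verify_oop_sketch(uint64_t addr, uint64_t heap_start, uint64_t heap_end) {
  if (addr == 0) {
    return true;
  }
  return addr >= heap_start && addr < heap_end && (addr % 8) == 0;
}

// Decompress first, then run the same check on the full address.
bool verify_coop_sketch(const NarrowOopEncoding& enc, uint32_t narrow,
                        uint64_t heap_start, uint64_t heap_end) {
  return verify_oop_sketch(decode_narrow(enc, narrow), heap_start, heap_end);
}
```

Handing the raw 32-bit value to verify_oop_sketch() rejects it because it does not look like a heap address, while verify_coop_sketch() accepts the same value once it has been decoded.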

From goetz.lindenmaier at sap.com  Mon Nov 11 15:17:23 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 11 Nov 2019 15:17:23 +0000
Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
In-Reply-To: <cb3b92d1-5dee-8d51-1298-2d01a016ef2b@oracle.com>
References: <AM6PR02MB534783144A9A6CF30C1049F8EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com>
 <AM6PR02MB534729CACDA052E9B80ED146EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
 <AM6PR02MB5347AB9322749A90AED3128BEC7B0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <cb3b92d1-5dee-8d51-1298-2d01a016ef2b@oracle.com>
Message-ID: <AM6PR02MB534728BE141A46D33E45A298EC740@AM6PR02MB5347.eurprd02.prod.outlook.com>

Hi David,

thanks for looking again. 
Martin checked the PPC code.

Best regards,
  Goetz.

> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Montag, 11. November 2019 08:56
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
> compiler-dev at openjdk.java.net>
> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> 
> Hi Goetz,
> 
> Please note I only looked at the test initially and have not reviewed
> this overall fix as I don't know the PPC code.
> 
> The updated test seems fine.
> 
> Thanks,
> David
> 
> On 9/11/2019 1:32 am, Lindenmaier, Goetz wrote:
> > Hi,
> >
> > I waited for https://bugs.openjdk.java.net/browse/JDK-8233081
> > which makes one of the fixes unnecessary.
> > Also, I had to fix the argument of verify_oop_helper
> > from oop to oopDesc* for the fastdebug build.
> >
> > New webrev:
> > http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/
> >
> > Best regards,
> >    Goetz.
> >
> >> -----Original Message-----
> >> From: David Holmes <david.holmes at oracle.com>
> >> Sent: Freitag, 18. Oktober 2019 01:38
> >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> >> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
> >> compiler-dev at openjdk.java.net>
> >> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> >>
> >> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote:
> >>> Hi David,
> >>>
> >>> you are right, thanks for pointing me to that!
> >>> Doing one test for vm.bits=64 and one for 32 should fix it:
> >>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> >>
> >> s/01/02/ :)
> >>
> >> For the 32-bit case you can delete the line:
> >>
> >>      * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9")
> >>
> >> For the 64-bit case you can delete the "sparc" check from the same line.
> >>
> >> Thanks,
> >> David
> >>
> >>>
> >>> Best regards,
> >>>     Goetz.
> >>>
> >>>> -----Original Message-----
> >>>> From: David Holmes <david.holmes at oracle.com>
> >>>> Sent: Donnerstag, 17. Oktober 2019 13:18
> >>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> >>>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net'
> <hotspot-
> >>>> compiler-dev at openjdk.java.net>
> >>>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> >>>>
> >>>> Hi Goetz,
> >>>>
> >>>> UseCompressedOops is a 64-bit flag only so your change will break the
> >>>> test on 32-bit systems.
> >>>>
> >>>> David
> >>>>
> >>>> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote:
> >>>>> Hi,
> >>>>>
> >>>>> 8231058 introduced a test that enables +VerifyOops.
> >>>>> This fails on ppc, because this was not used in a very
> >>>>> long time.
> >>>>>
> >>>>> The crash is caused by passing compressed oops from
> >>>>> LIR_Assembler::store() to the checker routine.
> >>>>> I fix this by implementing a checker routine verify_coop
> >>>>> that first decompresses the coop.  This makes the new
> >>>>> test pass.
> >>>>>
> >>>>> Further testing showed that the additional checker
> >>>>> coding makes Patching Stubs overflow. These
> >>>>> can not be increased in size to fit the code. I
> >>>>> disable generating verify_oop code in LIRAssembler::load()
> >>>>> which fixes the issue.
> >>>>>
> >>>>> Further I extended the message printed when verification
> >>>>> of an oop failed. First, I print the location in the source
> >>>>> code where the checker code was generated. Second,
> >>>>> I print the faulty oop.
> >>>>>
> >>>>> I also improved the message printed when PatchingStubs
> >>>>> overflow.
> >>>>>
> >>>>> Finally, I improve the test to run with and without compressed
> >>>>> Oops.
> >>>>>
> >>>>> Please review:
> >>>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> >>>>>
> >>>>> @runtime as I modify the test introduced there
> >>>>> @compiler as the error is in C1.
> >>>>>
> >>>>> Best regards,
> >>>>>      Goetz.
> >>>>>

From goetz.lindenmaier at sap.com  Mon Nov 11 15:18:03 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 11 Nov 2019 15:18:03 +0000
Subject: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
In-Reply-To: <VI1PR0201MB247982FA923CE145853156219A740@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <AM6PR02MB534783144A9A6CF30C1049F8EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <3f1be78d-b5e3-192b-d05f-e81ed520d65a@oracle.com>
 <AM6PR02MB534729CACDA052E9B80ED146EC6D0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <5e283ed2-1f93-f3a1-1762-da6d5cf465a2@oracle.com>
 <AM6PR02MB5347AB9322749A90AED3128BEC7B0@AM6PR02MB5347.eurprd02.prod.outlook.com>
 <cb3b92d1-5dee-8d51-1298-2d01a016ef2b@oracle.com>
 <VI1PR0201MB247982FA923CE145853156219A740@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <AM6PR02MB534721086E3BEDC657A5436BEC740@AM6PR02MB5347.eurprd02.prod.outlook.com>

Hi Martin, 

thanks for looking at this, 
and thanks for resolving the patching stub issue!

Best regards,
  Goetz.

> -----Original Message-----
> From: Doerr, Martin <martin.doerr at sap.com>
> Sent: Montag, 11. November 2019 16:07
> To: David Holmes <david.holmes at oracle.com>; Lindenmaier, Goetz
> <goetz.lindenmaier at sap.com>; hotspot-runtime-dev at openjdk.java.net;
> 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-
> dev at openjdk.java.net>
> Subject: RE: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> 
> Hi Götz,
> 
> the PPC64 code looks good, too.
> Thanks for fixing and improving it.
> 
> Best regards,
> Martin
> 
> 
> > -----Original Message-----
> > From: hotspot-compiler-dev <hotspot-compiler-dev-
> > bounces at openjdk.java.net> On Behalf Of David Holmes
> > Sent: Montag, 11. November 2019 08:56
> > To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> > dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-
> > compiler-dev at openjdk.java.net>
> > Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> >
> > Hi Goetz,
> >
> > Please note I only looked at the test initially and have not reviewed
> > this overall fix as I don't know the PPC code.
> >
> > The updated test seems fine.
> >
> > Thanks,
> > David
> >
> > On 9/11/2019 1:32 am, Lindenmaier, Goetz wrote:
> > > Hi,
> > >
> > > I waited for https://bugs.openjdk.java.net/browse/JDK-8233081
> > > which makes one of the fixes unnecessary.
> > > Also, I had to fix the argument of verify_oop_helper
> > > from oop to oopDesc* for the fastdebug build.
> > >
> > > New webrev:
> > > http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/03/
> > >
> > > Best regards,
> > >    Goetz.
> > >
> > >> -----Original Message-----
> > >> From: David Holmes <david.holmes at oracle.com>
> > >> Sent: Freitag, 18. Oktober 2019 01:38
> > >> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-runtime-
> > >> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net'
> > <hotspot-
> > >> compiler-dev at openjdk.java.net>
> > >> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> > >>
> > >> On 18/10/2019 12:10 am, Lindenmaier, Goetz wrote:
> > >>> Hi David,
> > >>>
> > >>> you are right, thanks for pointing me to that!
> > >>> Doing one test for vm.bits=64 and one for 32 should fix it:
> > >>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> > >>
> > >> s/01/02/ :)
> > >>
> > >> For the 32-bit case you can delete the line:
> > >>
> > >>      * @requires vm.debug & (os.arch != "sparc") & (os.arch != "sparcv9")
> > >>
> > >> For the 64-bit case you can delete the "sparc" check from the same line.
> > >>
> > >> Thanks,
> > >> David
> > >>
> > >>>
> > >>> Best regards,
> > >>>     Goetz.
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: David Holmes <david.holmes at oracle.com>
> > >>>> Sent: Donnerstag, 17. Oktober 2019 13:18
> > >>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; hotspot-
> > runtime-
> > >>>> dev at openjdk.java.net; 'hotspot-compiler-dev at openjdk.java.net'
> > <hotspot-
> > >>>> compiler-dev at openjdk.java.net>
> > >>>> Subject: Re: 8231757: [ppc] Fix VerifyOops. Errors show since 8231058.
> > >>>>
> > >>>> Hi Goetz,
> > >>>>
> > >>>> UseCompressedOops is a 64-bit flag only so your change will break the
> > >>>> test on 32-bit systems.
> > >>>>
> > >>>> David
> > >>>>
> > >>>> On 17/10/2019 8:55 pm, Lindenmaier, Goetz wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> 8231058 introduced a test that enables +VerifyOops.
> > >>>>> This fails on ppc, because this was not used in a very
> > >>>>> long time.
> > >>>>>
> > >>>>> The crash is caused by passing compressed oops from
> > >>>>> LIR_Assembler::store() to the checker routine.
> > >>>>> I fix this by implementing a checker routine verify_coop
> > >>>>> that first decompresses the coop.  This makes the new
> > >>>>> test pass.
> > >>>>>
> > >>>>> Further testing showed that the additional checker
> > >>>>> coding makes Patching Stubs overflow. These
> > >>>>> can not be increased in size to fit the code. I
> > >>>>> disable generating verify_oop code in LIRAssembler::load()
> > >>>>> which fixes the issue.
> > >>>>>
> > >>>>> Further I extended the message printed when verification
> > >>>>> of an oop failed. First, I print the location in the source
> > >>>>> code where the checker code was generated. Second,
> > >>>>> I print the faulty oop.
> > >>>>>
> > >>>>> I also improved the message printed when PatchingStubs
> > >>>>> overflow.
> > >>>>>
> > >>>>> Finally, I improve the test to run with and without compressed
> > >>>>> Oops.
> > >>>>>
> > >>>>> Please review:
> > >>>>> http://cr.openjdk.java.net/~goetz/wr19/8231757-fix_VerifyOops/01/
> > >>>>>
> > >>>>> @runtime as I modify the test introduced there
> > >>>>> @compiler as the error is in C1.
> > >>>>>
> > >>>>> Best regards,
> > >>>>>      Goetz.
> > >>>>>

From robbin.ehn at oracle.com  Mon Nov 11 20:24:54 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 11 Nov 2019 21:24:54 +0100
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <383a1330-1e3d-66db-c95b-9e6f9910641f@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
 <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
 <590a7395-8fda-caf3-402a-29393fd34469@oracle.com>
 <7dedd65e-f004-b0b4-0c04-63bc484198ac@oracle.com>
 <383a1330-1e3d-66db-c95b-9e6f9910641f@oracle.com>
Message-ID: <73dec889-15c1-4f59-11d7-1df89ee99150@oracle.com>

Hi David,

On 2019-11-11 15:03, David Holmes wrote:
> Word-tearing is only a potential issue for 16-bit or smaller accesses, or 
> unaligned 32-bit or 64-bit accesses. But we don't (shouldn't) use unaligned 
> 32-bit and 64-bit accesses to ensure there is no possibility of word-tearing. 
> Otherwise we would need to use Atomic::load/store for every lock-free algorithm 
> and data-structure that we have.

Yes, but we must make sure the compiler generates such accesses.

Here is what the Linux docs say.
https://www.kernel.org/doc/Documentation/memory-barriers.txt:

  (*) For aligned memory locations whose size allows them to be accessed
      with a single memory-reference instruction, prevents "load tearing"
      and "store tearing," in which a single large access is replaced by
      multiple smaller accesses.  For example, given an architecture having
      16-bit store instructions with 7-bit immediate fields, the compiler
      might be tempted to use two 16-bit store-immediate instructions to
      implement the following 32-bit store:

	p = 0x00010002;

      Please note that GCC really does use this sort of optimization,
      which is not surprising given that it would likely take more
      than two instructions to build the constant and then store it.
      This optimization can therefore be a win in single-threaded code.
      In fact, a recent bug (since fixed) caused GCC to incorrectly use
      this optimization in a volatile store.  In the absence of such bugs,
      use of WRITE_ONCE() prevents store tearing in the following example:

	WRITE_ONCE(p, 0x00010002);


WRITE_ONCE is implemented as a cast to volatile followed by a store:
https://elixir.bootlin.com/linux/latest/source/include/linux/compiler.h#L220

WRITE_ONCE and Atomic::store are the same.
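
To make the "cast to volatile then store" idea concrete, here is a minimal
C++ sketch. The names store_once and load_once are hypothetical and exist only
for this illustration; this is not the actual HotSpot Atomic:: code or the
kernel macro. It assumes an aligned location whose size fits a single
memory-reference instruction, which is the case the kernel text describes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only (not HotSpot's Atomic:: API).
// The access goes through a volatile-qualified lvalue, so the compiler has
// to perform it as written instead of eliding or duplicating it.
template <typename T>
inline void store_once(T* dest, T value) {
  *const_cast<volatile T*>(dest) = value;
}

template <typename T>
inline T load_once(const T* src) {
  return *const_cast<const volatile T*>(src);
}

// Usage: the 32-bit store from the kernel documentation example.
void store_once_example() {
  std::uint32_t p = 0;
  store_once(&p, std::uint32_t{0x00010002});
  assert(load_once(&p) == 0x00010002u);
}
```

Whether such an access is also free of tearing is a property of the compiler
and the target, not something the C++ standard promises for volatile.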

> 
> Atomic::load/store was primarily needed for 64-bit values on 32-bit platforms.
> 
> 
> The point is we shouldn't need to guarantee atomic load/store for 32-bit or 
> 64-bit values using the Atomic class because it is the implicit mode in which we 
> operate.

I'm not sure what the implicit mode is.
The compiler may generate basically whatever it feels like, as long as a 
single-threaded execution has the same final result.
The compiler may elide the stores/loads completely, it may fuse accesses, it may 
add accesses, etc.

E.g.

if (b == 7) a = 7;
else a = 0;
c = 42;

Compiler may emit:
a = 0; // extra store
c = 42;
if (b == 7) a = 7;

Making it volatile will prevent that and turn it into an atomic store in both 
branches.
But volatile might also force the compiler to order it with c (depending on 
whether c is volatile or not).
If we use Atomic::load/store it's much more obvious that we allow the compiler 
to move the store of c before the store of a.
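
The if/else example can be sketched with standard C++ atomics as a stand-in
for Atomic::store. This is an assumption made for illustration, since HotSpot
uses its own Atomic class. The names shared_a, plain_c and update are
hypothetical. With a relaxed atomic store the compiler should not invent the
extra "a = 0" store, yet nothing orders the store of c against the store of a.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> shared_a{0};  // written with relaxed atomic stores
int plain_c = 0;               // plain variable, unordered relative to shared_a

void update(int b) {
  if (b == 7) {
    shared_a.store(7, std::memory_order_relaxed);
  } else {
    shared_a.store(0, std::memory_order_relaxed);
  }
  // Relaxed ordering: the compiler is free to move this store
  // before or after the stores to shared_a.
  plain_c = 42;
}

// Single-threaded usage; only checks the final values.
void update_example() {
  update(7);
  assert(shared_a.load(std::memory_order_relaxed) == 7);
  assert(plain_c == 42);
  update(3);
  assert(shared_a.load(std::memory_order_relaxed) == 0);
}
```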

> 
> But I can see quite a number of uses have crept in to the code base. :( Go back 
> to Java 7 and the only use of Atomic's was for dealing with 64-bit on 32-bit 
> platforms.
> 
> I also can't see how/where the Atomic class is in fact doing anything to 
> guarantee atomicity for 32-bit or 64-bit values.

It casts to volatile AFAIK.
When we can, it should be a C++ atomic store with memory_order_relaxed,
so the compiler may reorder atomic stores/loads.

> volatile is only used to prevent basic compiler optimizations from being 
> applied. 

And this prevention is either too weak or too strong; it's always wrong.
I guess that is also the reason for the C++ proposal to deprecate volatile.

> It is used for any concurrently modified variable that is accessed
> lock-free to at least request the compiler to not try to be clever when 
> accessing this variable. This may not have any well specified semantics 
> according to language specifications but we have always used compilers in good 
> faith that they do the right thing. Use of volatile has nothing to do with any 
> perceived atomicity of access, nor does it suggest anything about hardware 
> reordering.

volatile forces the compiler to generate plain store/load instructions which 
gives us the atomic property we want, but the reordering constraints are an 
unwanted side-effect. Using Atomic::load/store gives us exactly what we need.

> 
> Atomicity of access for 32-bit and 64-bit values is implicitly obtained by using 
> plain load/stores and having suitable aligned variables. That's the way it is 
> supposed to work so that we don't need to write Atomic::load and Atomic::store 
> on every variables used in lock-free contexts. But it seems that message has not 
> been passed on through the years. I can point you to an internal wiki that I 
> wrote up on 2010 where it states:

The only way to guarantee that the compiler will emit plain stores/loads is to use 
Atomic:: or volatile. Often, if you have taken the time to write something 
lock-free, you have done that for performance, and any unneeded ordering just 
reduces performance. So using volatile semantics is wrong: either you have a bug 
or too strict ordering.

> 
> "In addition the Java platform requires that basic accesses (simple loads and 
> stores, but not compound operations like increment) are atomic for all 32-bit 
> Java data types (all except long and double). This is usually trivially achieved 
> by aligning values on 32-bit boundaries on 32-bit, or 64-bit systems."

We are compiling with gcc/clang, not C2, and they do not care about the above.

Now, I'm not suggesting changing anything, but I think new code should use 
the correct semantics. This code is a bit mixed since the preexisting code uses 
volatile.

We don't have consume yet and acquire is too strong, so I suggested relaxed, 
i.e. Atomic::load(), which I think is correct for all use cases of ref_count() 
except the ADIM_guarantee where we double load the ref count, where I think 
consume is correct. But consume does not help, since if the ref count is wrong 
who knows what the second load will be.
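
A rough sketch of the double-load pattern follows, again using standard C++
atomics in place of Atomic::load(). MonitorSketch and check_positive are
hypothetical names, and the relaxed increment is a simplification made for
this example. It is not a statement about the ordering the real ObjectMonitor
code needs.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for ObjectMonitor; only the ref count is modeled.
class MonitorSketch {
  std::atomic<int> _ref_count{0};

 public:
  // Relaxed load: atomic, but implies no ordering with other accesses.
  int ref_count() const { return _ref_count.load(std::memory_order_relaxed); }
  void inc_ref_count() { _ref_count.fetch_add(1, std::memory_order_relaxed); }
};

// Mirrors the shape of the ADIM_guarantee call: check a local snapshot and,
// on failure, load the ref count a second time for the diagnostic message.
bool check_positive(const MonitorSketch& m) {
  int l_ref_count = m.ref_count();  // first load: the value being checked
  if (l_ref_count > 0) {
    return true;
  }
  std::fprintf(stderr, "must be positive: l_ref_count=%d, ref_count=%d\n",
               l_ref_count, m.ref_count());  // second, independent load
  return false;
}

void check_positive_example() {
  MonitorSketch m;
  assert(!check_positive(m));  // ref count is still 0
  m.inc_ref_count();
  assert(check_positive(m));
}
```

The second load may observe a different value than the first if another
thread changes the ref count in between, which is why the message prints both.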

/Robbin

> 
> David
> -----
> 
>> Thanks, Robbin
>>
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> On 8/11/2019 11:35 pm, Robbin Ehn wrote:
>>>> Hi Dan,
>>>>
>>>> Thanks for looking into this, some comments on v8:
>>>>
>>>> ##################
>>>> src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
>>>> src/hotspot/cpu/x86/globalDefinitions_x86.hpp
>>>> src/hotspot/share/logging/logTag.hpp
>>>> src/hotspot/share/oops/markWord.hpp
>>>> src/hotspot/share/runtime/basicLock.cpp
>>>> src/hotspot/share/runtime/safepoint.cpp
>>>> src/hotspot/share/runtime/serviceThread.cpp
>>>> src/hotspot/share/runtime/sharedRuntime.cpp
>>>> src/hotspot/share/runtime/synchronizer.hpp
>>>> src/hotspot/share/runtime/vmOperations.cpp
>>>> src/hotspot/share/runtime/vmOperations.hpp
>>>> src/hotspot/share/runtime/vmStructs.cpp
>>>> src/hotspot/share/runtime/vmThread.cpp
>>>> test/hotspot/gtest/oops/test_markWord.cpp
>>>>
>>>> No comments.
>>>>
>>>> ##################
>>>> I don't see the benefit of having the -HandshakeAfterDeflateIdleMonitors 
>>>> code paths.
>>>> Removing that option would mean these files can be reverted:
>>>> src/hotspot/cpu/aarch64/globals_aarch64.hpp
>>>> src/hotspot/cpu/arm/globals_arm.hpp
>>>> src/hotspot/cpu/ppc/globals_ppc.hpp
>>>> src/hotspot/cpu/s390/globals_s390.hpp
>>>> src/hotspot/cpu/sparc/globals_sparc.hpp
>>>> src/hotspot/cpu/x86/globals_x86.hpp
>>>> src/hotspot/cpu/x86/macroAssembler_x86.cpp
>>>> src/hotspot/cpu/x86/macroAssembler_x86.hpp
>>>> src/hotspot/cpu/zero/globals_zero.hpp
>>>>
>>>> And one less option here:
>>>> src/hotspot/share/runtime/globals.hpp
>>>>
>>>> ##################
>>>> src/hotspot/share/prims/jvm.cpp
>>>>
>>>> Unclear if this is a good idea.
>>>>
>>>> ##################
>>>> src/hotspot/share/prims/whitebox.cpp
>>>>
>>>> This would assume the test expects the right thing, but that is not obvious.
>>>>
>>>> ##################
>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp
>>>>
>>>> The current pending and waiting monitor is only changed by the JavaThread 
>>>> itself.
>>>> It only sets it after _contentions is increased.
>>>> It clears it before _contentions is decreased.
>>>> We depend on being at a safepoint or on the thread being suspended, so the 
>>>> monitor can't be deflated since _contentions is > 0.
>>>> Plus the thread has already increased the ref count and can't decrease it 
>>>> (since it is at a safepoint or suspended).
>>>>
>>>> ##################
>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>>
>>>> ###1
>>>> You have several of these (also in other files):
>>>> 242   jint l_ref_count = ref_count();
>>>> 243   ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d, 
>>>> ref_count=%d", l_ref_count, ref_count());
>>>> Please use Atomic::load() in ref_count(),
>>>> since this depends on ref_count being volatile; otherwise the compiler 
>>>> may only do one load.
>>>>
>>>> ###2
>>>> 307   // Prevent deflation. See ObjectSynchronizer::deflate_monitor(),
>>>> ...
>>>> 311   Atomic::add(1, &_contentions);
>>>> If ObjectSynchronizer::deflate_monitor checked the ref count instead 
>>>> of _contentions, we could remove _contentions.
>>>> Since all waiters also have a ref count, it looks like we don't need 
>>>> _waiters either.
>>>> In ObjectSynchronizer::deflate_monitor:
>>>> if (mid->_contentions != 0 || mid->_waiters != 0) {
>>>> Why not just do:
>>>> if (mid->ref_count()) {
>>>> ?
>>>>
>>>> ##################
>>>> src/hotspot/share/runtime/objectMonitor.hpp
>>>>
>>>> ###1
>>>>   252   intptr_t is_busy() const {
>>>>   253     // TODO-FIXME: assert _owner == null implies _recursions = 0
>>>>   254     // We do not include _ref_count in the is_busy() check because
>>>>   255     // _ref_count is for indicating that the ObjectMonitor* is in
>>>>   256     // use which is orthogonal to whether the ObjectMonitor itself
>>>>   257     // is in use for a locking operation.
>>>>
>>>> But in the non-debug code we always check:
>>>> +  if (mid->is_busy() || mid->ref_count() != 0) {
>>>>
>>>> So it seems like you should have a method that includes the ref count.
>>>>
>>>> ##################
>>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>
>>>> Use Atomic::load for ref count.
>>>>
>>>> ##################
>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>>
>>>> ###1
>>>>   139 static volatile int g_om_free_count = 0;    // # on g_free_list
>>>>   140 static volatile int g_om_in_use_count = 0;  // # on g_om_in_use_list
>>>>   141 static volatile int g_om_population = 0;    // # Extant -- in circulation
>>>>   142 static volatile int g_om_wait_count = 0;    // # on g_wait_list
>>>> No padding here; aren't they more contended than the fields in the OM?
>>>>
>>>> ###2
>>>> 151 static bool is_next_marked(ObjectMonitor* om) {
>>>>
>>>> This is only used in ObjectSynchronizer::om_flush.
>>>> Here you fetch an OM and read the next field; this does not need LA 
>>>> (load-acquire) semantics on supported platforms.
>>>> It only needs Atomic::load.
>>>>
>>>> ###3
>>>> 191 static void set_next(ObjectMonitor* om, ObjectMonitor* value) {
>>>>
>>>> Nowhere do you need SR (store-release); the only places where it would 
>>>> make a difference are:
>>>>   345       OrderAccess::storestore();
>>>>   346       set_next(cur, next);  // Unmark the previous list head.
>>>> and
>>>> 1714     OrderAccess::storestore();
>>>> 1715     set_next(in_use_list, next);
>>>>
>>>> You have a storestore already!
>>>>
>>>> This code reads as:
>>>> OrderAccess::storestore();
>>>> OrderAccess::loadstore();
>>>> OrderAccess::storestore();
>>>> om->_next_om = value
>>>>
>>>> So it should be an Atomic::store.
>>>>
>>>> ###4
>>>> 198 static bool mark_list_head(ObjectMonitor* volatile * list_p
>>>>
>>>> Since the mark is an embedded spinlock I think the terminology should be 
>>>> changed (the fact that the spinlock lives inside the next pointer should 
>>>> be abstracted away).
>>>> E.g. mark_next_loop would just be lock.
>>>> The load of the list heads should use Atomic::load.
>>>> It also seems a bit weird to return next from the locking method.
>>>> The output parameter can just be returned, with NULL returned if the list 
>>>> head is NULL.
>>>> E.g.
>>>>
>>>> static ObjectMonitor* get_list_head_locked(ObjectMonitor* volatile * list_p) {
>>>>   while (true) {
>>>>     ObjectMonitor* mid = Atomic::load(list_p);
>>>>     if (mid == NULL) {
>>>>       return NULL;  // The list is empty.
>>>>     }
>>>>     if (try_lock(mid)) {
>>>>       if (Atomic::load(list_p) != mid) {
>>>>         // The list head changed so we have to retry.
>>>>         unlock(mid);
>>>>       } else {
>>>>         return mid;
>>>>       }
>>>>     }
>>>>     // Yield ?
>>>>   }
>>>> }
>>>>
>>>> With collateral changes.
>>>>
>>>> ###5
>>>> 220 static ObjectMonitor* unmarked_next(ObjectMonitor* om)
>>>> Atomic::store is what needed.
>>>>
>>>> ###6
>>>> 333 static void prepend_to_common(
>>>>
>>>>   345       OrderAccess::storestore();
>>>>   346       set_next(cur, next);  // Unmark the previous list head.
>>>> Double storestore. (fixed by changing set_next to Atomic::store)
>>>>
>>>> ###7
>>>>   375 static ObjectMonitor* take_from_start_of_common(ObjectMonitor* 
>>>> volatile * list_p,
>>>>
>>>> Triple storestore here.
>>>>
>>>>   386   Atomic::dec(count_p);
>>>>   387   // mark_list_head() used cmpxchg() above, switching list head can be 
>>>> lazier:
>>>>   388   OrderAccess::storestore();
>>>>   389   // Unmark take, but leave the next value for any lagging list
>>>>   390   // walkers. It will get cleaned up when take is prepended to
>>>>   391   // the in-use list:
>>>>   392   set_next(take, next);
>>>>   393   return take;
>>>>
>>>> Reads:
>>>> count_p--
>>>> OrderAccess::loadstore();
>>>> OrderAccess::storestore();
>>>> OrderAccess::storestore();
>>>> OrderAccess::loadstore();
>>>> OrderAccess::storestore();
>>>> take->_next_om = next;
>>>>
>>>> Fixed by changing set_next to Atomic::store and removing the 
>>>> OrderAccess::storestore();
>>>>
>>>> ###8
>>>> ObjectSynchronizer::om_release(
>>>>
>>>> 1591       if (m == mid) {
>>>> 1592         // We found 'm' on the per-thread in-use list so try to extract 
>>>> it.
>>>> 1593         if (cur_mid_in_use == NULL) {
>>>> 1594           // mid is the list head and it is marked. Switch the list head
>>>> 1595           // to next which unmarks the list head, but leaves mid marked:
>>>> 1596           self->om_in_use_list = next;
>>>> 1597           // mark_list_head() used cmpxchg() above, switching list head 
>>>> can be lazier:
>>>> 1598           OrderAccess::storestore();
>>>> 1599         } else {
>>>> 1600           // mid and cur_mid_in_use are marked. Switch cur_mid_in_use's
>>>> 1601           // next field to next which unmarks cur_mid_in_use, but leaves
>>>> 1602           // mid marked:
>>>> 1603           OrderAccess::release_store(&cur_mid_in_use->_next_om, next);
>>>> 1604         }
>>>> 1605         extracted = true;
>>>> 1606         Atomic::dec(&self->om_in_use_count);
>>>> 1607         // Unmark mid, but leave the next value for any lagging list
>>>> 1608         // walkers. It will get cleaned up when mid is prepended to
>>>> 1609         // the thread's free list:
>>>> 1610         set_next(mid, next);
>>>> 1611         break;
>>>> 1612       }
>>>>
>>>> This does not look correct. Before taking this branch we have done a cmpxchg 
>>>> in mark_list_head or mark_next_loop.
>>>> This is how it reads:
>>>> OrderAccess::storestore(); // from previous cmpxchg
>>>> OrderAccess::loadstore(); // from previous cmpxchg
>>>> 1591       if (m == mid) {
>>>> 1593         if (cur_mid_in_use == NULL) {
>>>> 1596           self->om_in_use_list = next;
>>>> 1598           OrderAccess::storestore();
>>>> 1599         } else {
>>>>                OrderAccess::storestore();
>>>>                OrderAccess::loadstore();
>>>> 1603           cur_mid_in_use->_next_om = next;
>>>> 1604         }
>>>> 1605         extracted = true;
>>>>              OrderAccess::storestore();
>>>>              OrderAccess::fence(); // storestore|storeload|loadstore|loadload
>>>>              self->om_in_use_count--; // Atomic::dec
>>>>              OrderAccess::storestore();
>>>>              OrderAccess::loadstore();
>>>>              OrderAccess::storestore();
>>>>              OrderAccess::loadstore();
>>>>              mid->_next_om = next; // Atomic::store
>>>> 1611         break;
>>>> 1612       }
>>>>
>>>> extracted is a local variable so you do not need any OrderAccess before it 
>>>> is set.
>>>> Fixed by changing set_next to Atomic::store, removing the 
>>>> OrderAccess::storestore() and changing OrderAccess::release_store to 
>>>> Atomic::store();
>>>>
>>>> ###9
>>>> 1653 void ObjectSynchronizer::om_flush(Thread* self) {
>>>>
>>>> 1714     OrderAccess::storestore();
>>>> 1715     set_next(in_use_list, next);
>>>> Fixed by changing set_next to Atomic::store.
>>>>
>>>> ###10
>>>> 1737     self->om_free_list = NULL;
>>>> 1738     OrderAccess::storestore();  // Lazier memory is okay for list walkers.
>>>>
>>>> prepend_list_to_g_free_list/prepend_list_to_g_om_in_use_list do a cmpxchg 
>>>> as their first step, so there is no need for this storestore.
>>>>
>>>> ###11
>>>> 1797 void ObjectSynchronizer::inflate(ObjectMonitorHandle* omh_p, Thread* self,
>>>>
>>>> 1938       // Once ObjectMonitor is configured and the object is associated
>>>> 1939       // with the ObjectMonitor, it is safe to allow async deflation:
>>>> 1940       assert(m->is_new(), "freshly allocated monitor must be new");
>>>> 1941       m->set_allocation_state(ObjectMonitor::Old);
>>>>
>>>> So we use ref count, contention, waiter, owner and allocation state to keep 
>>>> the OM alive in different scenarios.
>>>> There is no way for me to keep track of that. I don't see why you would 
>>>> need more than owner and ref count.
>>>> If you allocate the OM with ref count 1 you can remove _allocation_state and 
>>>> just decrease the ref count here instead.
>>>>
>>>> ###12
>>>> 2079 bool ObjectSynchronizer::deflate_monitor
>>>>
>>>> 2112     if (AsyncDeflateIdleMonitors) {
>>>> 2113       // clear() expects the owner field to be NULL and we won't race
>>>> 2114       // with the simple C2 ObjectMonitor
>>>>
>>>> The macro assembler code is not just executed by C2, so this comment is a 
>>>> bit misleading. (There are some more like it as well.)
>>>>
>>>> ###13
>>>> 2306 int ObjectSynchronizer::deflate_monitor_list(
>>>>
>>>> Same issue as ObjectSynchronizer::om_release.
>>>> Fixed by changing set_next to Atomic::store, removing the 
>>>> OrderAccess::storestore() and changing OrderAccess::release_store to 
>>>> Atomic::store();
>>>>
>>>> ###14
>>>> 2474       if (SafepointSynchronize::is_synchronizing() &&
>>>>
>>>> This is the wrong method to call; it should be 
>>>> SafepointMechanism::should_block(Thread* thread);
>>>>
>>>> ###15
>>>> 2578 void ObjectSynchronizer::deflate_idle_monitors_using_JT() {
>>>>
>>>> 2616     g_wait_list = NULL;
>>>> 2617     OrderAccess::storestore();  // Lazier memory sync is okay for list 
>>>> walkers.
>>>>
>>>> I don't see that g_wait_list is ever read simultaneously.
>>>> Isn't it accessed either by the serviceThread outside a safepoint or by the 
>>>> VMThread inside a safepoint?
>>>>
>>>> It looks like g_wait_list can just be a local in:
>>>> void ObjectSynchronizer::deflate_idle_monitors_using_JT()
>>>>
>>>> (disregarding the debug code that might read it in a safepoint)
>>>>
>>>> ###16
>>>> 2722         assert(SafepointSynchronize::is_synchronizing(), "sanity check");
>>>>
>>>> This is the wrong method to call; it should be 
>>>> SafepointMechanism::should_block(Thread* thread);
>>>>
>>>> ##################
>>>> src/hotspot/share/runtime/vframe.cpp
>>>>
>>>> We are at safepoint or current thread or in a handshake, current pending and 
>>>> waiting monitor is already stable.
>>>>
>>>> ##################
>>>> src/hotspot/share/services/threadService.cpp
>>>>
>>>> These changes are only needed for the -HandshakeAfterDeflateIdleMonitors path.
>>>>
>>>> ##################
>>>> test/jdk/java/rmi/server/UnicastRemoteObject/unexportObject/UnexportLeak.java
>>>>
>>>> Note: if OM had a weak to object instead this would not be needed.
>>>>
>>>> Thanks, Robbin
>>>>
>>>>
>>>> On 11/4/19 10:03 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I have made changes to the Async Monitor Deflation code in response to
>>>>> the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., Robbin
>>>>> and Erik O. for their comments!
>>>>>
>>>>> JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async Monitor
>>>>> Deflation project needs to push before Nov. 12, 2019 in order to allow
>>>>> for sufficient bake time for such a big change. Nov. 12 is _next_ Tuesday
>>>>> so we have 8 days from today to finish this code review cycle and push
>>>>> this code for JDK14.
>>>>>
>>>>> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>>>>>
>>>>> I have attached the change list from CR7 to CR8 instead of putting it in
>>>>> the body of this email. I've also added a link to the CR7-to-CR8-changes
>>>>> file to the webrevs so it should be easy to find.
>>>>>
>>>>> Main bug URL:
>>>>>
>>>>>      JDK-8153224 Monitor deflation prolong safepoints
>>>>>      https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> The project is currently baselined on jdk-14+21.
>>>>>
>>>>> Here's the full webrev URL for those folks that want to see all of the
>>>>> current Async Monitor Deflation code in one go (v2.08 full):
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full
>>>>>
>>>>> Some folks might want to see just what has changed since the last review
>>>>> cycle so here's a webrev for that (v2.08 inc):
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/
>>>>>
>>>>> The OpenJDK wiki did not need any changes for this round:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> The jdk-14+21 based v2.08 version of the patch has been thru Mach5 tier[1-8]
>>>>> testing on Oracle's usual set of platforms. It has also been through my usual
>>>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
>>>>> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
>>>>> my lab. Some testing is still running, but so far there are no new 
>>>>> regressions.
>>>>>
>>>>> I have not yet done a SPECjbb2015 round on the CR8/v2.08/11-for-jdk14 bits.
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> The Async Monitor Deflation project is reaching the end game. I have no
>>>>>> changes planned for the project at this time so all that is left is code
>>>>>> review and any changes that result from those reviews.
>>>>>>
>>>>>> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>>>>>>
>>>>>> I have attached the list of fixes from CR6 to CR7 instead of putting it
>>>>>> in the main body of this email.
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-14+19.
>>>>>>
>>>>>> Here's the full webrev URL for those folks that want to see all of the
>>>>>> current Async Monitor Deflation code in one go (v2.07 full):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full
>>>>>>
>>>>>> Some folks might want to see just what has changed since the last review
>>>>>> cycle so here's a webrev for that (v2.07 inc):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/
>>>>>>
>>>>>> The OpenJDK wiki has been updated to match the CR7/v2.07/10-for-jdk14 
>>>>>> changes:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>
>>>>>> The jdk-14+18 based v2.07 version of the patch has been thru Mach5 tier[1-8]
>>>>>> testing on Oracle's usual set of platforms. It has also been through my usual
>>>>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
>>>>>> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
>>>>>> my lab.
>>>>>>
>>>>>> The jdk-14+19 based v2.07 version of the patch has been thru Mach5 tier[1-3]
>>>>>> test on Oracle's usual set of platforms. Mach5 tier[4-8] are in process.
>>>>>>
>>>>>> I did another round of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>>>>>
>>>>>>     - "base" is jdk-14+18
>>>>>>     - "v2.07" is the latest version and includes C2 inc_om_ref_count() support
>>>>>>       on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors option
>>>>>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>>>>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors specified
>>>>>>
>>>>>>          hbIR           hbIR
>>>>>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>>>>     ---------------  ---------  --------  -------------  -------
>>>>>>            34282.00   30635.90  28831.30       20969.20  3841.30 base
>>>>>>            34282.00   30973.00  29345.80       21025.20  3964.10 v2.07
>>>>>>            34282.00   31105.60  29174.30       21074.00  3931.30 v2.07_handshake
>>>>>>            34282.00   30789.70  27151.60       19839.10  3850.20 v2.07_off
>>>>>>
>>>>>>     - The Aurora Perf comparison tool reports:
>>>>>>
>>>>>>         Comparison              max-jOPS              critical-jOPS
>>>>>>         ----------------------  --------------------  --------------------
>>>>>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>>>>>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>>>>>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>>>>>>
>>>>>>         (s) - significant  (ns) - not-significant
>>>>>>
>>>>>>     - For historical comparison, the Aurora Perf comparison tool
>>>>>>       reported for v2.06 with a baseline of jdk-13+31:
>>>>>>
>>>>>>         Comparison              max-jOPS              critical-jOPS
>>>>>>         ----------------------  --------------------  --------------------
>>>>>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>>>>>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>>>>>>
>>>>>>         (s) - significant  (ns) - not-significant
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> The Async Monitor Deflation project has rebased to JDK14 so it's time
>>>>>>> for our first code review in that new context!!
>>>>>>>
>>>>>>> I've been focused on changing the monitor list management code to be
>>>>>>> lock-free in order to make SPECjbb2015 happier. Of course with a change
>>>>>>> like that, it takes a while to chase down all the new and wonderful
>>>>>>> races. At this point, I have the code back to the same stability that
>>>>>>> I had with CR5/v2.05/8-for-jdk13.
>>>>>>>
>>>>>>> To lay the ground work for this round of review, I pushed the following
>>>>>>> two fixes to jdk/jdk earlier today:
>>>>>>>
>>>>>>>     JDK-8230184 rename, whitespace, indent and comments changes in preparation
>>>>>>>                 for lock free Monitor lists
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>>>>>
>>>>>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails after 8230184
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>>>>>
>>>>>>> I have attached the list of fixes from CR5 to CR6 instead of putting it
>>>>>>> in the main body of this email.
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>>>>>> JDK-8230184 and JDK-8230317.
>>>>>>>
>>>>>>> Here's the full webrev URL for those folks that want to see all of the
>>>>>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/
>>>>>>>
>>>>>>>
>>>>>>> The primary focus of this review cycle is on the lock-free Monitor List
>>>>>>> management changes so here's a webrev for just that patch (v2.06c):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/
>>>>>>>
>>>>>>> The secondary focus of this review cycle is on the bug fixes that have
>>>>>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for just that
>>>>>>> patch (v2.06b):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/
>>>>>>>
>>>>>>> The third and final bucket for this review cycle is the rename, whitespace,
>>>>>>> indent and comments changes made in preparation for lock free Monitor list
>>>>>>> management. Almost all of that was extracted into JDK-8230184 for the
>>>>>>> baseline so this bucket now has just a few comment changes relative to
>>>>>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/
>>>>>>>
>>>>>>>
>>>>>>> Some folks might want to see just what has changed since the last review
>>>>>>> cycle so here's a webrev for that (v2.06 inc):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/
>>>>>>>
>>>>>>>
>>>>>>> Last, but not least, some folks might want to see the code before the
>>>>>>> addition of lock-free Monitor List management so here's a webrev for
>>>>>>> that (v2.00 -> v2.05):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/
>>>>>>>
>>>>>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>
>>>>>>> but that should only be changes to describe per-thread list async monitor
>>>>>>> deflation being done by the ServiceThread.
>>>>>>>
>>>>>>> (I did update the OpenJDK wiki for the CR5 changes back on 2019.08.14)
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>> Oracle's usual set of platforms. It has also been through my usual set
>>>>>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>>>>>
>>>>>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>>>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs. This was using
>>>>>>> this patch baselined on jdk-13+31 (for stability):
>>>>>>>
>>>>>>>           hbIR           hbIR
>>>>>>>      (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>>>>>      ---------------  ---------  --------  -------------  -------
>>>>>>>             34282.00   28837.20  27905.20       19817.40  3658.10 base
>>>>>>>             34965.70   29798.80  27814.90       19959.00  3514.60 v2.06d
>>>>>>>             34282.00   29100.70  28042.50       19577.00  3701.90 v2.06d_off
>>>>>>>             34282.00   29218.50  27562.80       19397.30  3657.60 v2.06d_ocache
>>>>>>>             34965.70   29838.30  26512.40       19170.60  3569.90 v2.05
>>>>>>>             34282.00   28926.10  27734.00       19835.10  3588.40 v2.05_off
>>>>>>>
>>>>>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors specified and
>>>>>>> the "ocache" config is with 128 byte cache line sizes instead of 64 byte
>>>>>>> cache line sizes. "v2.06d" is the last set of changes that I made before
>>>>>>> those changes were distributed into the "v2.06a", "v2.06b" and "v2.06c"
>>>>>>> buckets for this review cycle.
>>>>>>>
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I've been focused on chasing down and fixing the test failures
>>>>>>>> that only pop up rarely. So this round is primarily fixes for races
>>>>>>>> with a few additional fixes that came from Karen's review of CR4.
>>>>>>>> Thanks Karen!
>>>>>>>>
>>>>>>>> I have attached the list of fixes from CR4 to CR5 instead of putting it
>>>>>>>> in the main body of this email.
>>>>>>>>
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-13+29. This will likely be
>>>>>>>> the last JDK13 baseline for this project and I'll roll to the JDK14
>>>>>>>> (jdk/jdk) repo soon...
>>>>>>>>
>>>>>>>> Here's the full webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/
>>>>>>>>
>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>>>>>
>>>>>>>> I have not yet checked the OpenJDK wiki to see if it needs any updates
>>>>>>>> to match the CR5 changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>
>>>>>>>> (I did update the OpenJDK wiki for the CR4 changes back on 2019.06.26)
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>>>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become available.
>>>>>>>> Since I haven't made any performance changes in this round, I'll only
>>>>>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>>>>>
>>>>>>>> Next up:
>>>>>>>>
>>>>>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>>>>>   Linux-X64 and we've determined that some of that comes from
>>>>>>>>   contention on the gListLock. So I'm going to investigate removing
>>>>>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>>>>>> - Of course, going lock free often causes new races and new failures
>>>>>>>>   so that's a good reason to make those changes isolated in their
>>>>>>>>   own round (and not holding up CR5/v2.05/8-for-jdk13 anymore).
>>>>>>>> - I finally have a potential fix for the Win* failure with
>>>>>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>   but I haven't run it through Mach5 yet so it'll be in the next round.
>>>>>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>>>>>>   monitor related failures there. I suspect that I need to go take a
>>>>>>>>   look at the C2 RTM macro assembler code and look for things that might
>>>>>>>>   conflict with Async Monitor Deflation. If you're interested in that kind
>>>>>>>>   of issue, then see the macroAssembler_x86.cpp sanity check that I
>>>>>>>>   added in this round!
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I have a fix for an issue that came up during performance testing.
>>>>>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>>>>>> experiments.
>>>>>>>>>
>>>>>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>>>>>> verbose due to the complexity of the issue, but the changes
>>>>>>>>> themselves are not that big.
>>>>>>>>>
>>>>>>>>> Functional:
>>>>>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>>>>>         safepoints that simply request async deflation which
>>>>>>>>>         keeps the async JavaThreads from making progress on
>>>>>>>>>         their async deflation work.
>>>>>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>>>>>     - is_async_deflation_needed() returns true when
>>>>>>>>>       is_async_cleanup_requested() is true or when
>>>>>>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>>>>>>       AsyncDeflationInterval).
>>>>>>>>>     - if AsyncDeflateIdleMonitors Service_lock->wait() now waits for
>>>>>>>>>       at most GuaranteedSafepointInterval millis:
>>>>>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>>>>>         (default is 1000 millis/1 second)
>>>>>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>>>>>>         This is to prevent async deflation from swamping the
>>>>>>>>>         ServiceThread.
>>>>>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>>>>>>     of their in-use lists.
>>>>>>>>>     - The ServiceThread will check for async deflation work every
>>>>>>>>>       GuaranteedSafepointInterval.
>>>>>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>>>>>       async deflation work via is_async_deflation_requested.
>>>>>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>>>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>>>>>     flag to reduce the in-use monitor population that is reported by
>>>>>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
>>>>>>>>>
>>>>>>>>> Test update:
>>>>>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>>>>>>     AsyncDeflateIdleMonitors.
>>>>>>>>>
>>>>>>>>> Collateral:
>>>>>>>>>   - Add/clarify/update some logging messages.
>>>>>>>>>
>>>>>>>>> Cleanup:
>>>>>>>>>   - Updated comments based on Karen's code review.
>>>>>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>>>>>     'async cleanup' -> 'async deflation'.
>>>>>>>>>     - comment and function name changes
>>>>>>>>>   - Clarify MonitorUsedDeflationThreshold description.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Main bug URL:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> The project is currently baselined on jdk-13+22.
>>>>>>>>>
>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>>>>>>>>
>>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>>>>>>>>
>>>>>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>
>>>>>>>>> The wiki doesn't say a whole lot about the async deflation invocation
>>>>>>>>> mechanism so I have to figure out how to add that content.
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>>>>>> running now. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have to run
>>>>>>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> I had some discussions with Karen about a race that was in the
>>>>>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>>>>>>>>>> theoretical and I had no test failures due to it. The fix is pretty
>>>>>>>>>> simple: remove the special case code for async deflation in the
>>>>>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>>>>>> for ObjectMonitor::enter() protection.
>>>>>>>>>>
>>>>>>>>>> During those discussions Karen also floated the idea of using the
>>>>>>>>>> ref_count field instead of the contentions field for the Async
>>>>>>>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>>>>>>>> change and I have run it through the usual stress and Mach5 testing
>>>>>>>>>> with no issues. It's also known as v2.03 (for those with the
>>>>>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>>>>>> Sorry for all the names...
>>>>>>>>>>
>>>>>>>>>> Main bug URL:
>>>>>>>>>>
>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>
>>>>>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>>>>>
>>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>>>>>>>>>
>>>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>>>>>>>>
>>>>>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>>>>>
>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>>
>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>>>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>>>>>> Solaris-X64 release had the usual "Too large time diff" complaints.
>>>>>>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>>>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>>>>>> stress kit is running right now.
>>>>>>>>>>
>>>>>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>>>>>>>> the results and analyze them.
>>>>>>>>>>
>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>>>>>> Greetings,
>>>>>>>>>>>
>>>>>>>>>>> I have a small but important bug fix for the Async Monitor Deflation
>>>>>>>>>>> project ready to go. It's also known as v2.02 (for those with the
>>>>>>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs). Sorry
>>>>>>>>>>> for all the names...
>>>>>>>>>>>
>>>>>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline patch
>>>>>>>>>>> is out of our hair.
>>>>>>>>>>>
>>>>>>>>>>> Main bug URL:
>>>>>>>>>>>
>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>
>>>>>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>>>>>
>>>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/
>>>>>>>>>>>
>>>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>>>>>>>>>
>>>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 changes:
>>>>>>>>>>>
>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>>>
>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is running
>>>>>>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>>>>>>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on product,
>>>>>>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>>> I'll start my stress kit on Linux-X64 sometime on Sunday (after
>>>>>>>>>>> my jdk-13+18 stress run is done).
>>>>>>>>>>>
>>>>>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>>>>>> testing is done.
>>>>>>>>>>>
>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>
>>>>>>>>>>>> I finally have CR1 for the Async Monitor Deflation project ready to
>>>>>>>>>>>> go. It's also known as v2.01 (for those with the patches) and as
>>>>>>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>>>>>>>>>> names...
>>>>>>>>>>>>
>>>>>>>>>>>> Main bug URL:
>>>>>>>>>>>>
>>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>>
>>>>>>>>>>>> Baseline bug fixes URL:
>>>>>>>>>>>>
>>>>>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>>>>>
>>>>>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>>>>>
>>>>>>>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295
>>>>>>>>>>>>
>>>>>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/
>>>>>>>>>>>>
>>>>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/
>>>>>>>>>>>>
>>>>>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the latest version
>>>>>>>>>>>> of JDK-8153224...
>>>>>>>>>>>>
>>>>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR changes:
>>>>>>>>>>>>
>>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>>>>
>>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>>>>>>>>> Mach5 tier[78] will be run later today. My stress kit on Solaris-X64
>>>>>>>>>>>> is running now. Linux-X64 stress testing will start on Sunday. I'm
>>>>>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's work on:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's the webrev URL:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Earlier versions of this patch have been through several rounds of
>>>>>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>>>>>>>> their own environments (including SPECjbb2015).
>>>>>>>>>>>>>
>>>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run Kitchensink
>>>>>>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>>>>>>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>>>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>>>>>>>> fastdebug and slowdebug).
>>>>>>>>>>>>>
>>>>>>>>>>>>> All of the testing done on earlier versions will be redone on the
>>>>>>>>>>>>> latest version of the patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>> P.S.
>>>>>>>>>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>>>>>>>>>> to characterize/analyze this failure for more than a week now. At
>>>>>>>>>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>>>>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
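
The rate limiting described in the quoted CR3-to-CR4 change list for
is_async_deflation_needed() can be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not the HotSpot code: the
struct AsyncDeflationPolicy and its fields are hypothetical names, and the
time source is passed in as a plain millisecond count so the logic can be
read in isolation.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch only: the struct and field names below are
// hypothetical and are not taken from the HotSpot sources.
struct AsyncDeflationPolicy {
  int64_t interval_ms;           // stands in for AsyncDeflationInterval (0 is off)
  int64_t last_trigger_ms = -1;  // time of the last threshold-based trigger; -1 means never
  bool    requested = false;     // stands in for an explicit async deflation request

  explicit AsyncDeflationPolicy(int64_t interval) : interval_ms(interval) {
    assert(interval >= 0);
  }

  // Returns true when the caller should start async deflation.
  // above_threshold stands in for monitors_used_above_threshold().
  bool is_async_deflation_needed(bool above_threshold, int64_t now_ms) {
    if (requested) {
      // An explicit request is honored regardless of the interval.
      return true;
    }
    if (interval_ms > 0 && above_threshold &&
        (last_trigger_ms < 0 || now_ms - last_trigger_ms >= interval_ms)) {
      // Threshold-based triggers fire no more often than interval_ms.
      last_trigger_ms = now_ms;
      return true;
    }
    return false;
  }
};
```

In this sketch the caller is assumed to clear the requested flag once the
deflation work has been done, and to poll the function on its own wait
interval.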

From jianglizhou at google.com  Mon Nov 11 22:12:21 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Mon, 11 Nov 2019 14:12:21 -0800
Subject: RFR: JDK-8230413: Support Pre JDK 6 class with CDS
Message-ID: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>

Please review the following change that allows archiving
pre-JAVA_6_VERSION classes with -Xverify:none.

webrev: http://cr.openjdk.java.net/~jiangli/8230413/webrev.00/
RFE: https://bugs.openjdk.java.net/browse/JDK-8230413

Currently there is still a large number of existing (pre-built) classes
with older class versions (< 50) in real-world applications. Those
classes are missing the benefit of archiving. In particular, in some
use cases class verification can be safely disabled. For those use
cases, supporting archiving of pre-JDK 6 classes shows a good performance
benefit. We can re-evaluate this support when -Xverify:none is removed
in the future; hopefully the need to support class versions < 50
will no longer be significant at that time.

This change brings back the pre-JDK-8198849 behavior. The runtime makes
sure that the dump-time verification mode is the same as or stronger than
the current mode.
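
As a rough illustration of the "same or stronger" rule, here is a minimal
sketch. It is not the code in the webrev: the VerifyMode enum and the
archive_usable() function are hypothetical names, and the ordering simply
mirrors the -Xverify:none, -Xverify:remote and -Xverify:all settings from
weakest to strongest.

```cpp
#include <cassert>

// Hypothetical ordering of verification strength, weakest first.
// These names are illustrative only and are not HotSpot identifiers.
enum class VerifyMode { None = 0, Remote = 1, All = 2 };

// Returns true if an archive dumped under dump_mode may be used by a
// runtime running under runtime_mode: the dump-time mode must be the
// same as or stronger than the current mode.
bool archive_usable(VerifyMode dump_mode, VerifyMode runtime_mode) {
  return static_cast<int>(dump_mode) >= static_cast<int>(runtime_mode);
}

// Example: an archive dumped without verification is only usable by a
// runtime that also runs without verification.
void example_checks() {
  assert(archive_usable(VerifyMode::None, VerifyMode::None));
  assert(!archive_usable(VerifyMode::None, VerifyMode::Remote));
}
```

Under this assumed ordering, an archive dumped with a stronger mode stays
usable when the runtime asks for a weaker one, but not the other way around.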

A CSR may be needed for the change. Any thoughts on that?

Tested with jtreg appcds tests.

Best,
Jiangli

From david.holmes at oracle.com  Mon Nov 11 23:12:18 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 12 Nov 2019 09:12:18 +1000
Subject: RFR: JDK-8230413: Support Pre JDK 6 class with CDS
In-Reply-To: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>
References: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>
Message-ID: <c69bf94f-e52d-f7cd-e9f7-252a71204f7b@oracle.com>

Hi Jiangli,

On 12/11/2019 8:12 am, Jiangli Zhou wrote:
> Please review the following change that allows archiving
> pre-JAVA_6_VERSION classes with -Xverify:none.
> 
> webrev: http://cr.openjdk.java.net/~jiangli/8230413/webrev.00/
> RFE: https://bugs.openjdk.java.net/browse/JDK-8230413
> 
> Currently there is still a large number of existing (pre-built) classes
> with older class versions (< 50) in real-world applications. Those
> classes are missing the benefit of archiving. In particular, in some
> use cases class verification can be safely disabled. For those use
> cases, supporting archiving of pre-JDK 6 classes shows a good performance
> benefit. We can re-evaluate this support when -Xverify:none is removed
> in the future; hopefully the need to support class versions < 50
> will no longer be significant at that time.
> 
> This change brings back the pre-JDK-8198849 behavior. The runtime makes
> sure that the dump-time verification mode is the same as or stronger than
> the current mode.
> 
> A CSR may be needed for the change. Any thoughts on that?

A CSR request is definitely required given that you are proposing to 
undo a change that was itself put in place via a CSR request! And given 
this is relaxing a "defense-in-depth" check which will result in 
increasing exploitability, I think you will need a very strong argument 
to justify this.

Further this not only undoes JDK-8197972 but it also invalidates 
JDK-8155671 being closed as a duplicate of JDK-8197972. JDK-8155671 
requested a way to know if verification had been disabled, to help with 
analyzing crash reports, but instead we decided to not allow 
verification to be disabled.

David
-----


> Tested with jtreg appcds tests.
> 
> Best,
> Jiangli
> 

From jianglizhou at google.com  Tue Nov 12 00:25:27 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Mon, 11 Nov 2019 16:25:27 -0800
Subject: RFR: JDK-8230413: Support Pre JDK 6 class with CDS
In-Reply-To: <c69bf94f-e52d-f7cd-e9f7-252a71204f7b@oracle.com>
References: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>
 <c69bf94f-e52d-f7cd-e9f7-252a71204f7b@oracle.com>
Message-ID: <CALrW1jyog+Q5H-BXQxQEkPdtLhr_aB7ekOVzuq9Qi4qLQ-jqOg@mail.gmail.com>

Hi David,

Thanks for the quick response!

On Mon, Nov 11, 2019 at 3:12 PM David Holmes <david.holmes at oracle.com> wrote:
>
> Hi Jiangli,
>
> On 12/11/2019 8:12 am, Jiangli Zhou wrote:
> > Please review the following change that allows archiving
> > pre-JAVA_6_VERSION classes with -Xverify:none.
> >
> > webrev: http://cr.openjdk.java.net/~jiangli/8230413/webrev.00/
> > RFE: https://bugs.openjdk.java.net/browse/JDK-8230413
> >
> > Currently there are still a large number of existing (pre-built) classes
> > with older class versions (< 50) in real-world applications. Those
> > classes are missing the benefit of archiving. In particular, in some
> > use cases, class verification can be safely disabled. For those use
> > cases, supporting archiving of pre-JDK 6 classes shows a good performance
> > benefit. We can re-evaluate this support when -Xverify:none is removed
> > in the future; hopefully the need to support class versions < 50
> > will no longer be significant by then.
> >
> > This change brings back the pre-JDK-8198849 behavior. The runtime makes
> > sure the dump-time verification mode is the same as or stronger than
> > the current mode.
> >
> > A CSR may be needed for the change. Any thoughts on that?
>
> A CSR request is definitely required given that you are proposing to
> undo a change that was itself put in place via a CSR request! And given
> this is relaxing a "defense-in-depth" check which will result in
> increasing exploitability, I think you will need a very strong argument
> to justify this.

Thanks for confirming this! Will do.

>
> Further this not only undoes JDK-8197972 but it also invalidates
> JDK-8155671 being closed as a duplicate of JDK-8197972. JDK-8155671
> requested a way to know if verification had been disabled, to help with
> analyzing crash reports, but instead we decided to not allow
> verification to be disabled.

I had some concerns about JDK-8155671 initially before making the
change, as it's a closed bug and my memory about the specific issue
was flushed out. I brought up the question in the bug. My take on
Ioi's response to my query about JDK-8155671 was that the
pre-JDK-8197972 behavior would not cause any security hole.

Re-evaluating this particular behavior, I think the pre-JDK-8155671
behavior would actually match user intention better. If the user decides
to turn off verification in safe use cases, it seems to be a good idea to
honor that. With the new dynamic archiving capability, an archive could
be created the first time a particular application is run.
Not forcing verification when the user has chosen to disable it can avoid
unnecessary/unwanted overhead.

If verification is turned off at dump time for application classes,
runtime does not allow execution without also turning off
verification. We can determine a crash is not caused by relaxed dump
time verification.
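
A minimal sketch of the "same or stronger" rule, for illustration only.
The enum, its ordering and the function name are hypothetical and are not
taken from the webrev; they only mirror the none/remote/all values that
-Xverify accepts:

```cpp
// Hypothetical ordering of verification modes, weakest first.
enum class VerifyMode { None = 0, Remote = 1, All = 2 };

// An archive is usable only if it was dumped with verification at least
// as strong as the mode requested for the current run.
inline bool archive_usable(VerifyMode dump_time, VerifyMode run_time) {
  return static_cast<int>(dump_time) >= static_cast<int>(run_time);
}
```

Under this assumption, an archive dumped with verification off is only
accepted by a run that also has verification off.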

Regards,
Jiangli

>
> David
> -----
>
>
>
> > Tested with jtreg appcds tests.
> >
> > Best,
> > Jiangli
> >

From david.holmes at oracle.com  Tue Nov 12 02:20:37 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 12 Nov 2019 12:20:37 +1000
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <73dec889-15c1-4f59-11d7-1df89ee99150@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
 <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
 <590a7395-8fda-caf3-402a-29393fd34469@oracle.com>
 <7dedd65e-f004-b0b4-0c04-63bc484198ac@oracle.com>
 <383a1330-1e3d-66db-c95b-9e6f9910641f@oracle.com>
 <73dec889-15c1-4f59-11d7-1df89ee99150@oracle.com>
Message-ID: <ae08de29-d831-452e-786b-1e60908062eb@oracle.com>

Hi Robbin,

tl;dr I can see us moving to a new style of using Atomic::load/store to 
replace plain load/store and declaring variables as volatile. But I'd 
like to see it discussed and agreed upon and written up clearly in the 
wiki so we can consistently apply it. Only the new lock-free queue 
management code should attempt that in this set of changes IMO.

On 12/11/2019 6:24 am, Robbin Ehn wrote:
> Hi David,
> 
> On 2019-11-11 15:03, David Holmes wrote:
>> Word-tearing is only a potential issue for 16-bit or smaller accesses, 
>> or unaligned 32-bit or 64-bit accesses. But we don't (shouldn't) use 
>> unaligned 32-bit and 64-bit accesses to ensure there is no possibility 
>> of word-tearing. Otherwise we would need to use Atomic::load/store for 
>> every lock-free algorithm and data-structure that we have.
> 
> Yes, but we must make sure the compiler generates such accesses.
> 
> Here is what Linux docs says.
> https://www.kernel.org/doc/Documentation/memory-barriers.txt:
> 
>   (*) For aligned memory locations whose size allows them to be accessed
>       with a single memory-reference instruction, prevents "load tearing"
>       and "store tearing," in which a single large access is replaced by
>       multiple smaller accesses.  For example, given an architecture having
>       16-bit store instructions with 7-bit immediate fields, the compiler
>       might be tempted to use two 16-bit store-immediate instructions to
>       implement the following 32-bit store:
> 
>           p = 0x00010002;
> 
>       Please note that GCC really does use this sort of optimization,
>       which is not surprising given that it would likely take more
>       than two instructions to build the constant and then store it.
>       This optimization can therefore be a win in single-threaded code.
>       In fact, a recent bug (since fixed) caused GCC to incorrectly use
>       this optimization in a volatile store.  In the absence of such bugs,
>       use of WRITE_ONCE() prevents store tearing in the following example:
> 
>           WRITE_ONCE(p, 0x00010002);
> 
> 
> WRITE_ONCE is implemented as a cast to volatile followed by a store.
> https://elixir.bootlin.com/linux/latest/source/include/linux/compiler.h#L220 
> 
> 
> WRITE_ONCE and Atomic::store are the same.

The document you quote goes on to say:

"All that aside, it is never necessary to use READ_ONCE() and
WRITE_ONCE() on a variable that has been marked volatile. "

So our use of "volatile" is already addressing this aspect.
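
For illustration, the cast-to-volatile idiom behind WRITE_ONCE() can be
sketched in C++ roughly as follows. The helper names store_once() and
load_once() are made up for this example; they are neither the kernel
macros nor the HotSpot Atomic API, and the single-access behaviour is what
compilers do in practice rather than a language guarantee:

```cpp
#include <cstdint>

// Hypothetical helpers: perform the access through a volatile-qualified
// lvalue so the compiler is expected to emit one access of the full
// width instead of splitting, fusing or dropping it.
template <typename T>
inline void store_once(T* dest, T value) {
  *const_cast<volatile T*>(dest) = value;
}

template <typename T>
inline T load_once(const T* src) {
  return *const_cast<const volatile T*>(src);
}

// Example use with the 32-bit constant from the kernel document.
inline uint32_t store_and_reload(uint32_t* p) {
  store_once(p, static_cast<uint32_t>(0x00010002));
  return load_once(p);
}
```

A variable that is itself declared volatile gets the same treatment on
every access, which is why the extra helpers are redundant in that case.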

>>
>> Atomic::load/store was primarily needed for 64-bit values on 32-bit 
>> platforms.
>>
>>
>> The point is we shouldn't need to guarantee atomic load/store for 
>> 32-bit or 64-bit values using the Atomic class because it is the 
>> implicit mode in which we operate.
> 
> Not sure what the implicit mode is.

The implicit mode is where we assume/require/demand that accesses to 
aligned 32-bit and 64-bit fields are atomic - as per the comment from 
the wiki. The wiki may have been referring to Java variables but it was 
not specifically about C2 code, and in fact the comment about atomic 
accesses was a fundamental expectation of the compiler across the whole VM.

> The compiler may generate basically whatever it feels like as long as a 
> single-threaded execution has the same final result.
> The compiler may elide the stores/loads completely, it may fuse 
> accesses, it may add accesses, etc...

Sure. I hadn't registered the fact it was the "volatile" that was 
ensuring this but now I do, so that is fine. Compilers may not guarantee 
this but we were always working with very specific compilers, and 
specific versions, and only updated after verifying the compiler 
appeared to be doing the expected "right thing".

> E.g.
> 
> if (b == 7) a = 7;
> else a = 0;
> c = 42;
> 
> Compiler may emit:
> a = 0; // extra store
> c = 42;
> if (b == 7) a = 7;
> 
> Making it volatile will prevent that and turn it into an atomic store 
> in both branches.
> But volatile might also force the compiler to order it with c (depending 
> on whether c is volatile or not).
> If we use Atomic::load/store it's much more obvious that we allow the 
> compiler to move the store of c before the store of a.

I would be surprised if we ever actually care about this aspect. Most of 
the time we will have ordering dependencies that require specific 
OrderAccess operations.

>>
>> But I can see quite a number of uses have crept in to the code base. 
>> :( Go back to Java 7 and the only use of Atomic's was for dealing with 
>> 64-bit on 32-bit platforms.
>>
>> I also can't see how/where the Atomic class is in fact doing anything 
>> to guarantee atomicity for 32-bit or 64-bit values.
> 
> It casts to volatile AFAIK.

Okay and that is sufficient (albeit still a non-guaranteed property IIUC).

> When we can, it should be a C++ atomic store with memory_order_relaxed.
> So the compiler may re-order atomic stores/loads.

I can buy into this approach ...

>> volatile is only used to prevent basic compiler optimizations from 
>> being applied. 
> 
> And this prevention is either too weak or too strong; it's always wrong.
> I guess that is also the reason for the C++ proposal to deprecate volatile.
> 
>> It is used for any concurrently modified variable that is accessed
>> lock-free to at least request the compiler to not try to be clever 
>> when accessing this variable. This may not have any well specified 
>> semantics according to language specifications but we have always used 
>> compilers in good faith that they do the right thing. Use of volatile 
>> has nothing to do with any perceived atomicity of access, nor does it 
>> suggest anything about hardware reordering.
> 
> volatile forces the compiler to generate plain store/load instructions 
> which gives us the atomic property we want, but the reordering 
> constraints are an unwanted side-effect. Using Atomic::load/store gives 
> us exactly what we need.

... as I said I'm not sure these reordering side-effects ever really 
adversely affect us, but I can buy into an approach that separates the 
two concerns.
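
To make the idea concrete, here is a minimal sketch of Robbin's a/c
example written with relaxed atomics. It assumes standard C++11
std::atomic rather than the HotSpot Atomic class, and the function name
update() is made up for illustration:

```cpp
#include <atomic>

std::atomic<int> a{0};
std::atomic<int> c{0};

// Each store is atomic (no tearing), but memory_order_relaxed imposes
// no ordering between the store to 'a' and the store to 'c'. Any
// ordering that is really required has to be requested separately.
inline void update(int b) {
  a.store(b == 7 ? 7 : 0, std::memory_order_relaxed);
  c.store(42, std::memory_order_relaxed);
}
```

The atomicity requirement is visible at each access, and the absence of an
ordering requirement is stated explicitly instead of being implied by
whether 'c' happens to be volatile.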

>>
>> Atomicity of access for 32-bit and 64-bit values is implicitly 
>> obtained by using plain load/stores and having suitably aligned 
>> variables. That's the way it is supposed to work so that we don't need 
>> to write Atomic::load and Atomic::store on every variable used in 
>> lock-free contexts. But it seems that message has not been passed on 
>> through the years. I can point you to an internal wiki that I wrote up 
>> in 2010 where it states:
> 
> The only way to guarantee that the compiler will emit plain stores/loads 
> is to use Atomic:: or volatile. Often if you have taken the time to 
> write something lock-free you have done that for performance, and any 
> unneeded ordering just reduces performance. So using volatile semantics 
> is wrong: either you have a bug or too strict an ordering.
> 
>>
>> "In addition the Java platform requires that basic accesses (simple 
>> loads and stores, but not compound operations like increment) are 
>> atomic for all 32-bit Java data types (all except long and double). 
>> This is usually trivially achieved by aligning values on 32-bit 
>> boundaries on 32-bit, or 64-bit systems."
> 
> We are compiling with gcc/clang, not C2, and they do not care about the above.

The above is not specifically about C2.

> Now, I'm not suggesting changing anything, but I think new code should 
> use the correct semantics. This code is a bit mixed since the pre-existing 
> code uses volatile.

As I said I can buy into this style of programming _but_ this is 
something that should be discussed and agreed upon and implemented using 
clear and consistent styles, not applied ad-hoc here and there based on 
the preference of the current developer. Our use of "volatile" is not 
incorrect but may be suboptimal (though I doubt it is really an issue).

In reference to the async monitor deflation code I could see the new 
lock-free list management using this new style (a style we need to 
clearly document on the hotspot wiki), but I would not want to see a mix 
of Atomic and plain accesses on existing volatile variables at this 
stage. (We can adapt existing code to the new style as a later 
enhancement.) [ Note: I'm making an assumption about how well isolated 
the list management code is, and it may not quite be what I think.]

> We don't have consume yet and acquire is too strong, so I suggested relaxed, 
> Atomic::load(). 

I don't think Atomic::load/store should have any memory-ordering 
properties - so yes "relaxed". We have OrderAccess for imposing true 
memory ordering and load_acquire/release_store etc use 
Atomic::load/store internally.
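
As a rough sketch of that layering, assuming standard C++11 atomics rather
than HotSpot's Atomic and OrderAccess classes (the function names below are
hypothetical), an ordered access can be built from a relaxed access plus an
explicit fence:

```cpp
#include <atomic>

// Hypothetical stand-in for release_store(): fence first, then a
// relaxed (ordering-free) atomic store.
template <typename T>
inline void release_store_sketch(std::atomic<T>& dest, T value) {
  std::atomic_thread_fence(std::memory_order_release);
  dest.store(value, std::memory_order_relaxed);
}

// Hypothetical stand-in for load_acquire(): relaxed atomic load first,
// then the fence that orders later accesses after it.
template <typename T>
inline T load_acquire_sketch(const std::atomic<T>& src) {
  T v = src.load(std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_acquire);
  return v;
}
```

The relaxed load/store supplies atomicity only; all ordering comes from
the fence.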

> Which I think is correct for all use cases of ref_count() 
> except the ADIM_guarantee where we double load ref count, where I think 
> consume is correct. But consume does not help, since if the ref count is 
> wrong who knows what the second load will be.

I'm not at all clear on "consume" (didn't they decide to scrap that 
access mode?). Anyway this is way too much overthinking in relation to 
the ADIM_guarantee. The guarantee initially checked the value of the 
local variable but reported the current value of ref_count() which could 
have changed - so you could see an inconsistent message of the form 
"assert failed: expected > 0 got 1". So Dan fixed that to report the 
value of the local, but also report the current value of ref_count(). 
This is useful in the case that ref_count() has in fact changed as you 
can more obviously see there is a race - but it does depend on the two 
calls to ref_count() not being compacted into one (which is certainly 
not an issue with the way it is/was implemented).
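
A minimal sketch of that check-then-report pattern, for illustration. The
MonitorSketch type and check_ref_count() are hypothetical stand-ins, not
the ObjectMonitor code or the ADIM_guarantee macro:

```cpp
#include <cstdio>
#include <string>

struct MonitorSketch {
  volatile int _ref_count;
  // Volatile read: each call is expected to be a separate load.
  int ref_count() const { return _ref_count; }
};

// Returns an empty string if the check passes, otherwise a diagnostic
// that shows both the tested snapshot and a fresh re-read.
inline std::string check_ref_count(const MonitorSketch& m) {
  int l_ref_count = m.ref_count();  // the value that is actually tested
  if (l_ref_count > 0) {
    return std::string();
  }
  char buf[96];
  std::snprintf(buf, sizeof(buf),
                "must be positive: l_ref_count=%d, ref_count=%d",
                l_ref_count, m.ref_count());  // second load, may differ
  return std::string(buf);
}
```

If another thread changes the count between the two loads, the two numbers
in the message differ, which makes the race visible in the report.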

Cheers,
David
-----

> /Robbin
> 
>>
>> David
>> -----
>>
>>> Thanks, Robbin
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>
>>>> On 8/11/2019 11:35 pm, Robbin Ehn wrote:
>>>>> Hi Dan,
>>>>>
>>>>> Thanks for looking into this, some comments on v8:
>>>>>
>>>>> ##################
>>>>> src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
>>>>> src/hotspot/cpu/x86/globalDefinitions_x86.hpp
>>>>> src/hotspot/share/logging/logTag.hpp
>>>>> src/hotspot/share/oops/markWord.hpp
>>>>> src/hotspot/share/runtime/basicLock.cpp
>>>>> src/hotspot/share/runtime/safepoint.cpp
>>>>> src/hotspot/share/runtime/serviceThread.cpp
>>>>> src/hotspot/share/runtime/sharedRuntime.cpp
>>>>> src/hotspot/share/runtime/synchronizer.hpp
>>>>> src/hotspot/share/runtime/vmOperations.cpp
>>>>> src/hotspot/share/runtime/vmOperations.hpp
>>>>> src/hotspot/share/runtime/vmStructs.cpp
>>>>> src/hotspot/share/runtime/vmThread.cpp
>>>>> test/hotspot/gtest/oops/test_markWord.cpp
>>>>>
>>>>> No comments.
>>>>>
>>>>> ##################
>>>>> I don't see the benefit of having the 
>>>>> -HandshakeAfterDeflateIdleMonitors code paths.
>>>>> Removing that option would mean these files can be reverted:
>>>>> src/hotspot/cpu/aarch64/globals_aarch64.hpp
>>>>> src/hotspot/cpu/arm/globals_arm.hpp
>>>>> src/hotspot/cpu/ppc/globals_ppc.hpp
>>>>> src/hotspot/cpu/s390/globals_s390.hpp
>>>>> src/hotspot/cpu/sparc/globals_sparc.hpp
>>>>> src/hotspot/cpu/x86/globals_x86.hpp
>>>>> src/hotspot/cpu/x86/macroAssembler_x86.cpp
>>>>> src/hotspot/cpu/x86/macroAssembler_x86.hpp
>>>>> src/hotspot/cpu/zero/globals_zero.hpp
>>>>>
>>>>> And one less option here:
>>>>> src/hotspot/share/runtime/globals.hpp
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/prims/jvm.cpp
>>>>>
>>>>> Unclear if this is a good idea.
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/prims/whitebox.cpp
>>>>>
>>>>> This would assume the test expects the right thing, but that is not 
>>>>> obvious.
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp
>>>>>
>>>>> The current pending and waiting monitor is only changed by the 
>>>>> JavaThread itself.
>>>>> It only sets it after _contentions is increased.
>>>>> It clears it before _contentions is decreased.
>>>>> We are depending on a safepoint or on the thread being suspended, so it 
>>>>> can't be deflated since _contentions is > 0.
>>>>> Plus the thread has already increased the ref count and can't 
>>>>> decrease it (since it is at a safepoint or suspended).
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>>>
>>>>> ###1
>>>>> You have several of these (and in other files):
>>>>> 242   jint l_ref_count = ref_count();
>>>>> 243   ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d, ref_count=%d", l_ref_count, ref_count());
>>>>> Please use Atomic::load() in ref_count.
>>>>> This depends on ref_count being volatile; otherwise the 
>>>>> compiler may only do one load.
>>>>>
>>>>> ###2
>>>>> 307   // Prevent deflation. See ObjectSynchronizer::deflate_monitor(),
>>>>> ...
>>>>> 311   Atomic::add(1, &_contentions);
>>>>> In ObjectSynchronizer::deflate_monitor, if you checked the ref count 
>>>>> instead of _contentions, we could remove _contentions.
>>>>> Since all waiters also have a ref count it looks like we don't need 
>>>>> waiters either.
>>>>> In ObjectSynchronizer::deflate_monitor:
>>>>> if (mid->_contentions != 0 || mid->_waiters != 0) {
>>>>> Why not just do:
>>>>> if (mid->ref_count()) {
>>>>> ?
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/runtime/objectMonitor.hpp
>>>>>
>>>>> ###1
>>>>>   252   intptr_t is_busy() const {
>>>>>   253     // TODO-FIXME: assert _owner == null implies _recursions = 0
>>>>>   254     // We do not include _ref_count in the is_busy() check because
>>>>>   255     // _ref_count is for indicating that the ObjectMonitor* is in
>>>>>   256     // use which is orthogonal to whether the ObjectMonitor itself
>>>>>   257     // is in use for a locking operation.
>>>>>
>>>>> But in the non-debug code we always check:
>>>>> +  if (mid->is_busy() || mid->ref_count() != 0) {
>>>>>
>>>>> So it seems like you should have a method that includes the ref count.
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>>
>>>>> Use Atomic::load for ref count.
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>>>
>>>>> ###1
>>>>>   139 static volatile int g_om_free_count = 0;    // # on g_free_list
>>>>>   140 static volatile int g_om_in_use_count = 0;  // # on g_om_in_use_list
>>>>>   141 static volatile int g_om_population = 0;    // # Extant -- in circulation
>>>>>   142 static volatile int g_om_wait_count = 0;    // # on g_wait_list
>>>>> No padding here; aren't they more contended than the fields in the OM?
>>>>> ###2
>>>>> 151 static bool is_next_marked(ObjectMonitor* om) {
>>>>>
>>>>> It is only used in ObjectSynchronizer::om_flush.
>>>>> Here you fetch an OM and read the next field; this does not need LA 
>>>>> semantics on supported platforms.
>>>>> This would only need Atomic::load.
>>>>>
>>>>> ###3
>>>>> 191 static void set_next(ObjectMonitor* om, ObjectMonitor* value) {
>>>>>
>>>>> Nowhere do you need SR; the only places where it would make a 
>>>>> difference are:
>>>>>   345       OrderAccess::storestore();
>>>>>   346       set_next(cur, next);  // Unmark the previous list head.
>>>>> and
>>>>> 1714     OrderAccess::storestore();
>>>>> 1715     set_next(in_use_list, next);
>>>>>
>>>>> You have a storestore already!
>>>>>
>>>>> This code reads as:
>>>>> OrderAccess::storestore();
>>>>> OrderAccess::loadstore();
>>>>> OrderAccess::storestore();
>>>>> om->_next_om = value
>>>>>
>>>>> So it should be an Atomic::store.
>>>>>
>>>>> ###4
>>>>> 198 static bool mark_list_head(ObjectMonitor* volatile * list_p
>>>>>
>>>>> Since the mark is an embedded spinlock I think the terminology 
>>>>> should be changed (the fact that the spinlock is inside the next 
>>>>> pointer should be abstracted away).
>>>>> E.g. mark_next_loop would just be lock.
>>>>> The load of the list heads should use Atomic::load.
>>>>> It also seems a bit weird to return next from the locking method.
>>>>> An output parameter can just be returned, returning NULL if the list 
>>>>> head is NULL.
>>>>> E.g.
>>>>>
>>>>>   198 static ObjectMonitor* get_list_head_locked(ObjectMonitor* volatile * list_p) {
>>>>>   200   while (true) {
>>>>>   201     ObjectMonitor* mid = Atomic::load(list_p);
>>>>>   202     if (mid == NULL) {
>>>>>   203       return NULL;  // The list is empty.
>>>>>   204     }
>>>>>   205     if (try_lock(mid)) {
>>>>>   206       if (Atomic::load(list_p) != mid) {
>>>>>   207         // The list head changed so we have to retry.
>>>>>   208         unlock(mid);
>>>>>   210       } else {
>>>>>               return mid;
>>>>>             }
>>>>>   214     }
>>>>>           // Yield ?
>>>>>   215   }
>>>>>   216 }
>>>>>
>>>>> With collateral changes.
>>>>>
>>>>> ###5
>>>>> 220 static ObjectMonitor* unmarked_next(ObjectMonitor* om)
>>>>> Atomic::store is what is needed.
>>>>>
>>>>> ###6
>>>>> 333 static void prepend_to_common(
>>>>>
>>>>>   345       OrderAccess::storestore();
>>>>>   346       set_next(cur, next);  // Unmark the previous list head.
>>>>> Double storestore. (fixed by changing set_next to Atomic::store)
>>>>>
>>>>> ###7
>>>>>   375 static ObjectMonitor* take_from_start_of_common(ObjectMonitor* volatile * list_p,
>>>>>
>>>>> Triple storestore here.
>>>>>
>>>>>   386   Atomic::dec(count_p);
>>>>>   387   // mark_list_head() used cmpxchg() above, switching list head can be lazier:
>>>>>   388   OrderAccess::storestore();
>>>>>   389   // Unmark take, but leave the next value for any lagging list
>>>>>   390   // walkers. It will get cleaned up when take is prepended to
>>>>>   391   // the in-use list:
>>>>>   392   set_next(take, next);
>>>>>   393   return take;
>>>>>
>>>>> Reads:
>>>>> count_p--
>>>>> OrderAccess::loadstore();
>>>>> OrderAccess::storestore();
>>>>> OrderAccess::storestore();
>>>>> OrderAccess::loadstore();
>>>>> OrderAccess::storestore();
>>>>> take->_next_om = next;
>>>>>
>>>>> Fixed by changing set_next to Atomic::store and removing the 
>>>>> OrderAccess::storestore();
>>>>>
>>>>> ###8
>>>>> ObjectSynchronizer::om_release(
>>>>>
>>>>> 1591       if (m == mid) {
>>>>> 1592         // We found 'm' on the per-thread in-use list so try to extract it.
>>>>> 1593         if (cur_mid_in_use == NULL) {
>>>>> 1594           // mid is the list head and it is marked. Switch the list head
>>>>> 1595           // to next which unmarks the list head, but leaves mid marked:
>>>>> 1596           self->om_in_use_list = next;
>>>>> 1597           // mark_list_head() used cmpxchg() above, switching list head can be lazier:
>>>>> 1598           OrderAccess::storestore();
>>>>> 1599         } else {
>>>>> 1600           // mid and cur_mid_in_use are marked. Switch cur_mid_in_use's
>>>>> 1601           // next field to next which unmarks cur_mid_in_use, but leaves
>>>>> 1602           // mid marked:
>>>>> 1603           OrderAccess::release_store(&cur_mid_in_use->_next_om, next);
>>>>> 1604         }
>>>>> 1605         extracted = true;
>>>>> 1606         Atomic::dec(&self->om_in_use_count);
>>>>> 1607         // Unmark mid, but leave the next value for any lagging list
>>>>> 1608         // walkers. It will get cleaned up when mid is prepended to
>>>>> 1609         // the thread's free list:
>>>>> 1610         set_next(mid, next);
>>>>> 1611         break;
>>>>> 1612       }
>>>>>
>>>>> This does not look correct. Before taking this branch we have done 
>>>>> a cmpxchg in mark_list_head or mark_next_loop.
>>>>> This is how it reads:
>>>>> OrderAccess::storestore(); // from previous cmpxchg
>>>>> OrderAccess::loadstore(); // from previous cmpxchg
>>>>> 1591       if (m == mid) {
>>>>> 1593         if (cur_mid_in_use == NULL) {
>>>>> 1596           self->om_in_use_list = next;
>>>>> 1598           OrderAccess::storestore();
>>>>> 1599         } else {
>>>>>                OrderAccess::storestore();
>>>>>                OrderAccess::loadstore();
>>>>> 1603           cur_mid_in_use->_next_om = next;
>>>>> 1604         }
>>>>> 1605         extracted = true;
>>>>>              OrderAccess::storestore();
>>>>>              OrderAccess::fence(); // storestore|storeload|loadstore|loadload
>>>>>              self->om_in_use_count--; // Atomic::dec
>>>>>              OrderAccess::storestore();
>>>>>              OrderAccess::loadstore();
>>>>>              OrderAccess::storestore();
>>>>>              OrderAccess::loadstore();
>>>>>              mid->_next_om = next; // Atomic::store
>>>>> 1611         break;
>>>>> 1612       }
>>>>>
>>>>> extracted is a local variable so you do not need any OrderAccess 
>>>>> before it is set.
>>>>> Fixed by changing set_next to Atomic::store, removing the 
>>>>> OrderAccess::storestore() and changing OrderAccess::release_store 
>>>>> to Atomic::store();
>>>>>
>>>>> ###9
>>>>> 1653 void ObjectSynchronizer::om_flush(Thread* self) {
>>>>>
>>>>> 1714     OrderAccess::storestore();
>>>>> 1715     set_next(in_use_list, next);
>>>>> Fixed by changing set_next to Atomic::store.
>>>>>
>>>>> ###10
>>>>> 1737     self->om_free_list = NULL;
>>>>> 1738     OrderAccess::storestore();  // Lazier memory is okay for list walkers.
>>>>>
>>>>> prepend_list_to_g_free_list/prepend_list_to_g_om_in_use_list do a 
>>>>> cmpxchg as their first step, so there is no need for this storestore.
>>>>>
>>>>> ###11
>>>>> 1797 void ObjectSynchronizer::inflate(ObjectMonitorHandle* omh_p, 
>>>>> Thread* self,
>>>>>
>>>>> 1938       // Once ObjectMonitor is configured and the object is associated
>>>>> 1939       // with the ObjectMonitor, it is safe to allow async deflation:
>>>>> 1940       assert(m->is_new(), "freshly allocated monitor must be new");
>>>>> 1941       m->set_allocation_state(ObjectMonitor::Old);
>>>>>
>>>>> So we use ref count, contention, waiter, owner and allocation state 
>>>>> to keep the OM alive in different scenarios.
>>>>> There is no way for me to keep track of that. I don't see why you 
>>>>> would need more than owner and ref count.
>>>>> If you allocate the OM with ref count 1 you can remove 
>>>>> _allocation_state and just decrease the ref count here instead.
>>>>>
>>>>> ###12
>>>>> 2079 bool ObjectSynchronizer::deflate_monitor
>>>>>
>>>>> 2112     if (AsyncDeflateIdleMonitors) {
>>>>> 2113       // clear() expects the owner field to be NULL and we won't race
>>>>> 2114       // with the simple C2 ObjectMonitor
>>>>>
>>>>> The macro assembler code is not just executed by C2, so this 
>>>>> comment is a bit misleading. (There are a few more like it.)
>>>>>
>>>>> ###13
>>>>> 2306 int ObjectSynchronizer::deflate_monitor_list(
>>>>>
>>>>> Same issue as ObjectSynchronizer::om_release.
>>>>> Fixed by changing set_next to Atomic::store, removing the 
>>>>> OrderAccess::storestore() and changing OrderAccess::release_store 
>>>>> to Atomic::store();
>>>>>
>>>>> ###14
>>>>> 2474       if (SafepointSynchronize::is_synchronizing() &&
>>>>>
>>>>> This is the wrong method to call; it should be 
>>>>> SafepointMechanism::should_block(Thread* thread);
>>>>>
>>>>> ###15
>>>>> 2578 void ObjectSynchronizer::deflate_idle_monitors_using_JT() {
>>>>>
>>>>> 2616     g_wait_list = NULL;
>>>>> 2617     OrderAccess::storestore();  // Lazier memory sync is okay for list walkers.
>>>>>
>>>>> I don't see that g_wait_list is ever read simultaneously.
>>>>> Either it is accessed by serviceThread outside a safepoint or by 
>>>>> VMThread inside a safepoint?
>>>>>
>>>>> It looks like g_wait_list can just be a local in:
>>>>> void ObjectSynchronizer::deflate_idle_monitors_using_JT()
>>>>>
>>>>> (disregarding the debug code that might read it in a safepoint)
>>>>>
>>>>> ###16
>>>>> 2722         assert(SafepointSynchronize::is_synchronizing(), "sanity check");
>>>>>
>>>>> This is the wrong method to call; it should be 
>>>>> SafepointMechanism::should_block(Thread* thread);
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/runtime/vframe.cpp
>>>>>
>>>>> We are at a safepoint, or are the current thread, or are in a handshake, 
>>>>> so the current pending and waiting monitors are already stable.
>>>>>
>>>>> ##################
>>>>> src/hotspot/share/services/threadService.cpp
>>>>>
>>>>> These changes are only needed for the 
>>>>> -HandshakeAfterDeflateIdleMonitors path.
>>>>>
>>>>> ##################
>>>>> test/jdk/java/rmi/server/UnicastRemoteObject/unexportObject/UnexportLeak.java 
>>>>>
>>>>>
>>>>> Note: if the OM had a weak reference to the object instead, this would not be needed.
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>>
>>>>> On 11/4/19 10:03 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I have made changes to the Async Monitor Deflation code in 
>>>>>> response to
>>>>>> the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., 
>>>>>> Robbin
>>>>>> and Erik O. for their comments!
>>>>>>
>>>>>> JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async 
>>>>>> Monitor
>>>>>> Deflation project needs to push before Nov. 12, 2019 in order to 
>>>>>> allow
>>>>>> for sufficient bake time for such a big change. Nov. 12 is _next_ 
>>>>>> Tuesday
>>>>>> so we have 8 days from today to finish this code review cycle and 
>>>>>> push
>>>>>> this code for JDK14.
>>>>>>
>>>>>> Carsten and Roman! Time for you guys to chime in again on the code 
>>>>>> reviews.
>>>>>>
>>>>>> I have attached the change list from CR7 to CR8 instead of putting 
>>>>>> it in
>>>>>> the body of this email. I've also added a link to the 
>>>>>> CR7-to-CR8-changes
>>>>>> file to the webrevs so it should be easy to find.
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>      JDK-8153224 Monitor deflation prolong safepoints
>>>>>>      https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-14+21.
>>>>>>
>>>>>> Here's the full webrev URL for those folks that want to see all of 
>>>>>> the
>>>>>> current Async Monitor Deflation code in one go (v2.08 full):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full 
>>>>>>
>>>>>>
>>>>>> Some folks might want to see just what has changed since the last 
>>>>>> review
>>>>>> cycle so here's a webrev for that (v2.08 inc):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/ 
>>>>>>
>>>>>>
>>>>>> The OpenJDK wiki did not need any changes for this round:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>
>>>>>> The jdk-14+21 based v2.08 version of the patch has been thru Mach5 
>>>>>> tier[1-8]
>>>>>> testing on Oracle's usual set of platforms. It has also been 
>>>>>> through my usual
>>>>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with 
>>>>>> the addition
>>>>>> of Robbin's "MoCrazy 1024" test running in parallel with the other 
>>>>>> tests in
>>>>>> my lab. Some testing is still running, but so far there are no new 
>>>>>> regressions.
>>>>>>
>>>>>> I have not yet done a SPECjbb2015 round on the 
>>>>>> CR8/v2.08/11-for-jdk14 bits.
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> The Async Monitor Deflation project is reaching the end game. I 
>>>>>>> have no
>>>>>>> changes planned for the project at this time so all that is left 
>>>>>>> is code
>>>>>>> review and any changes that result from those reviews.
>>>>>>>
>>>>>>> Carsten and Roman! Time for you guys to chime in again on the 
>>>>>>> code reviews.
>>>>>>>
>>>>>>> I have attached the list of fixes from CR6 to CR7 instead of 
>>>>>>> putting it
>>>>>>> in the main body of this email.
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-14+19.
>>>>>>>
>>>>>>> Here's the full webrev URL for those folks that want to see all 
>>>>>>> of the
>>>>>>> current Async Monitor Deflation code in one go (v2.07 full):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full 
>>>>>>>
>>>>>>>
>>>>>>> Some folks might want to see just what has changed since the last 
>>>>>>> review
>>>>>>> cycle so here's a webrev for that (v2.07 inc):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/ 
>>>>>>>
>>>>>>>
>>>>>>> The OpenJDK wiki has been updated to match the 
>>>>>>> CR7/v2.07/10-for-jdk14 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>
>>>>>>>
>>>>>>> The jdk-14+18 based v2.07 version of the patch has been thru 
>>>>>>> Mach5 tier[1-8]
>>>>>>> testing on Oracle's usual set of platforms. It has also been 
>>>>>>> through my usual
>>>>>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with 
>>>>>>> the addition
>>>>>>> of Robbin's "MoCrazy 1024" test running in parallel with the 
>>>>>>> other tests in
>>>>>>> my lab.
>>>>>>>
>>>>>>> The jdk-14+19 based v2.07 version of the patch has been thru 
>>>>>>> Mach5 tier[1-3]
>>>>>>> test on Oracle's usual set of platforms. Mach5 tier[4-8] are in 
>>>>>>> process.
>>>>>>>
>>>>>>> I did another round of SPECjbb2015 testing in Oracle's Aurora 
>>>>>>> Performance lab
>>>>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>>>>>>
>>>>>>>     - "base" is jdk-14+18
>>>>>>>     - "v2.07" is the latest version and includes C2 inc_om_ref_count() support
>>>>>>>       on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors option
>>>>>>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>>>>>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors specified
>>>>>>>
>>>>>>>          hbIR           hbIR
>>>>>>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>>>>>     ---------------  ---------  --------  -------------  -------
>>>>>>>            34282.00   30635.90  28831.30       20969.20  3841.30  base
>>>>>>>            34282.00   30973.00  29345.80       21025.20  3964.10  v2.07
>>>>>>>            34282.00   31105.60  29174.30       21074.00  3931.30  v2.07_handshake
>>>>>>>            34282.00   30789.70  27151.60       19839.10  3850.20  v2.07_off
>>>>>>>
>>>>>>>     - The Aurora Perf comparison tool reports:
>>>>>>>
>>>>>>>         Comparison              max-jOPS              critical-jOPS
>>>>>>>         ----------------------  --------------------  --------------------
>>>>>>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>>>>>>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>>>>>>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>>>>>>>
>>>>>>>         (s) - significant  (ns) - not-significant
>>>>>>>
>>>>>>>     - For historical comparison, the Aurora Perf comparison tool
>>>>>>>         reported for v2.06 with a baseline of jdk-13+31:
>>>>>>>
>>>>>>>         Comparison              max-jOPS              critical-jOPS
>>>>>>>         ----------------------  --------------------  --------------------
>>>>>>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>>>>>>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>>>>>>>
>>>>>>>         (s) - significant  (ns) - not-significant
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> The Async Monitor Deflation project has rebased to JDK14 so it's 
>>>>>>>> time
>>>>>>>> for our first code review in that new context!!
>>>>>>>>
>>>>>>>> I've been focused on changing the monitor list management code 
>>>>>>>> to be
>>>>>>>> lock-free in order to make SPECjbb2015 happier. Of course with a 
>>>>>>>> change
>>>>>>>> like that, it takes a while to chase down all the new and wonderful
>>>>>>>> races. At this point, I have the code back to the same stability 
>>>>>>>> that
>>>>>>>> I had with CR5/v2.05/8-for-jdk13.
>>>>>>>>
>>>>>>>> To lay the ground work for this round of review, I pushed the 
>>>>>>>> following
>>>>>>>> two fixes to jdk/jdk earlier today:
>>>>>>>>
>>>>>>>>     JDK-8230184 rename, whitespace, indent and comments changes in preparation
>>>>>>>>                 for lock free Monitor lists
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>>>>>>
>>>>>>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails after 8230184
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>>>>>>
>>>>>>>> I have attached the list of fixes from CR5 to CR6 instead of
>>>>>>>> putting it in the main body of this email.
>>>>>>>>
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>>>>>>> JDK-8230184 and JDK-8230317.
>>>>>>>>
>>>>>>>> Here's the full webrev URL for those folks that want to see all 
>>>>>>>> of the
>>>>>>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/ 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The primary focus of this review cycle is on the lock-free 
>>>>>>>> Monitor List
>>>>>>>> management changes so here's a webrev for just that patch (v2.06c):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> The secondary focus of this review cycle is on the bug fixes 
>>>>>>>> that have
>>>>>>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for 
>>>>>>>> just that
>>>>>>>> patch (v2.06b):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> The third and final bucket for this review cycle is the rename, 
>>>>>>>> whitespace,
>>>>>>>> indent and comments changes made in preparation for lock free 
>>>>>>>> Monitor list
>>>>>>>> management. Almost all of that was extracted into JDK-8230184 
>>>>>>>> for the
>>>>>>>> baseline so this bucket now has just a few comment changes 
>>>>>>>> relative to
>>>>>>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/ 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Some folks might want to see just what has changed since the 
>>>>>>>> last review
>>>>>>>> cycle so here's a webrev for that (v2.06 inc):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/ 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Last, but not least, some folks might want to see the code 
>>>>>>>> before the
>>>>>>>> addition of lock-free Monitor List management so here's a webrev 
>>>>>>>> for
>>>>>>>> that (v2.00 -> v2.05):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>
>>>>>>>>
>>>>>>>> but that should only be changes to describe per-thread list 
>>>>>>>> async monitor
>>>>>>>> deflation being done by the ServiceThread.
>>>>>>>>
>>>>>>>> (I did update the OpenJDK wiki for the CR5 changes back on 
>>>>>>>> 2019.08.14)
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>> Oracle's usual set of platforms. It has also been through my 
>>>>>>>> usual set
>>>>>>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>>>>>>
>>>>>>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora 
>>>>>>>> Performance lab
>>>>>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs. This 
>>>>>>>> was using
>>>>>>>> this patch baselined on jdk-13+31 (for stability):
>>>>>>>>
>>>>>>>>           hbIR           hbIR
>>>>>>>>      (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>>>>>>      ---------------  ---------  --------  -------------  -------
>>>>>>>>             34282.00   28837.20  27905.20       19817.40  3658.10  base
>>>>>>>>             34965.70   29798.80  27814.90       19959.00  3514.60  v2.06d
>>>>>>>>             34282.00   29100.70  28042.50       19577.00  3701.90  v2.06d_off
>>>>>>>>             34282.00   29218.50  27562.80       19397.30  3657.60  v2.06d_ocache
>>>>>>>>             34965.70   29838.30  26512.40       19170.60  3569.90  v2.05
>>>>>>>>             34282.00   28926.10  27734.00       19835.10  3588.40  v2.05_off
>>>>>>>>
>>>>>>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors 
>>>>>>>> specified and
>>>>>>>> the "ocache" config is with 128 byte cache line sizes instead of
>>>>>>>> 64 byte cache line sizes. "v2.06d" is the last set of changes that I
>>>>>>>> made before those changes were distributed into the "v2.06a",
>>>>>>>> "v2.06b" and "v2.06c" buckets for this review cycle.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I've been focused on chasing down and fixing the test failures
>>>>>>>>> that pop up only rarely. So this round is primarily fixes for
>>>>>>>>> races
>>>>>>>>> with a few additional fixes that came from Karen's review of CR4.
>>>>>>>>> Thanks Karen!
>>>>>>>>>
>>>>>>>>> I have attached the list of fixes from CR4 to CR5 instead of
>>>>>>>>> putting it in the main body of this email.
>>>>>>>>>
>>>>>>>>> Main bug URL:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> The project is currently baselined on jdk-13+29. This will 
>>>>>>>>> likely be
>>>>>>>>> the last JDK13 baseline for this project and I'll roll to the 
>>>>>>>>> JDK14
>>>>>>>>> (jdk/jdk) repo soon...
>>>>>>>>>
>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>>>>>>
>>>>>>>>> I have not yet checked the OpenJDK wiki to see if it needs any 
>>>>>>>>> updates
>>>>>>>>> to match the CR5 changes:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (I did update the OpenJDK wiki for the CR4 changes back on 
>>>>>>>>> 2019.06.26)
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now 
>>>>>>>>> and
>>>>>>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>>>>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become 
>>>>>>>>> available.
>>>>>>>>> Since I haven't made any performance changes in this round, 
>>>>>>>>> I'll only
>>>>>>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>>>>>>
>>>>>>>>> Next up:
>>>>>>>>>
>>>>>>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>>>>>>   Linux-X64 and we've determined that some of that comes from
>>>>>>>>>   contention on the gListLock. So I'm going to investigate removing
>>>>>>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>>>>>>> - Of course, going lock free often causes new races and new failures
>>>>>>>>>   so that's a good reason for making those changes isolated in their
>>>>>>>>>   own round (and not holding up CR5/v2.05/8-for-jdk13 anymore).
>>>>>>>>> - I finally have a potential fix for the Win* failure with
>>>>>>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>>   but I haven't run it through Mach5 yet so it'll be in the next round.
>>>>>>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>>>>>>>   monitor related failures there. I suspect that I need to go take a
>>>>>>>>>   look at the C2 RTM macro assembler code and look for things that might
>>>>>>>>>   conflict with Async Monitor Deflation. If you're interested in that kind
>>>>>>>>>   of issue, then see the macroAssembler_x86.cpp sanity check that I
>>>>>>>>>   added in this round!
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> I have a fix for an issue that came up during performance 
>>>>>>>>>> testing.
>>>>>>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>>>>>>> experiments.
>>>>>>>>>>
>>>>>>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>>>>>>> verbose due to the complexity of the issue, but the changes
>>>>>>>>>> themselves are not that big.
>>>>>>>>>>
>>>>>>>>>> Functional:
>>>>>>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>>>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>>>>>>         safepoints that simply request async deflation which
>>>>>>>>>>         keeps the async JavaThreads from making progress on
>>>>>>>>>>         their async deflation work.
>>>>>>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>>>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>>>>>>     - is_async_deflation_needed() returns true when
>>>>>>>>>>       is_async_cleanup_requested() is true or when
>>>>>>>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>>>>>>>       AsyncDeflationInterval).
>>>>>>>>>>     - if AsyncDeflateIdleMonitors Service_lock->wait() now waits for
>>>>>>>>>>       at most GuaranteedSafepointInterval millis:
>>>>>>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>>>>>>         (default is 1000 millis/1 second)
>>>>>>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>>>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>>>>>>>         This is to prevent async deflation from swamping the
>>>>>>>>>>         ServiceThread.
>>>>>>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>>>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>>>>>>>     of their in-use lists.
>>>>>>>>>>     - The ServiceThread will check for async deflation work every
>>>>>>>>>>       GuaranteedSafepointInterval.
>>>>>>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>>>>>>       async deflation work via is_async_deflation_requested.
>>>>>>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>>>>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>>>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>>>>>>     flag to reduce the in-use monitor population that is reported by
>>>>>>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
>>>>>>>>>>
>>>>>>>>>> Test update:
>>>>>>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>>>>>>>     AsyncDeflateIdleMonitors.
>>>>>>>>>>
>>>>>>>>>> Collateral:
>>>>>>>>>>   - Add/clarify/update some logging messages.
>>>>>>>>>>
>>>>>>>>>> Cleanup:
>>>>>>>>>>   - Updated comments based on Karen's code review.
>>>>>>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>>>>>>     'async cleanup' -> 'async deflation'.
>>>>>>>>>>     - comment and function name changes
>>>>>>>>>>   - Clarify MonitorUsedDeflationThreshold description.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Main bug URL:
>>>>>>>>>>
>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>
>>>>>>>>>> The project is currently baselined on jdk-13+22.
>>>>>>>>>>
>>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/ 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/ 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>>>>>>
>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The wiki doesn't say a whole lot about the async deflation 
>>>>>>>>>> invocation
>>>>>>>>>> mechanism so I have to figure out how to add that content.
>>>>>>>>>>
>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] 
>>>>>>>>>> testing on
>>>>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>>>>>>> running now. Kitchensink8H on product, fastdebug, and 
>>>>>>>>>> slowdebug bits
>>>>>>>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have 
>>>>>>>>>> to run
>>>>>>>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>>>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>>
>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>> Greetings,
>>>>>>>>>>>
>>>>>>>>>>> I had some discussions with Karen about a race that was in the
>>>>>>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This 
>>>>>>>>>>> race was
>>>>>>>>>>> theoretical and I had no test failures due to it. The fix is 
>>>>>>>>>>> pretty
>>>>>>>>>>> simple: remove the special case code for async deflation in the
>>>>>>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>>>>>>> for ObjectMonitor::enter() protection.
>>>>>>>>>>>
>>>>>>>>>>> During those discussions Karen also floated the idea of using 
>>>>>>>>>>> the
>>>>>>>>>>> ref_count field instead of the contentions field for the Async
>>>>>>>>>>> Monitor Deflation protocol. I decided to go ahead and code up 
>>>>>>>>>>> that
>>>>>>>>>>> change and I have run it through the usual stress and Mach5 
>>>>>>>>>>> testing
>>>>>>>>>>> with no issues. It's also known as v2.03 (for those with the
>>>>>>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>>>>>>> Sorry for all the names...
>>>>>>>>>>>
>>>>>>>>>>> Main bug URL:
>>>>>>>>>>>
>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>
>>>>>>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>>>>>>
>>>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/ 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/ 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>>>>>>
>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] 
>>>>>>>>>>> testing on
>>>>>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit 
>>>>>>>>>>> run had
>>>>>>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug 
>>>>>>>>>>> bits
>>>>>>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>>>>>>> Solaris-X64 release had the usual "Too large time diff" 
>>>>>>>>>>> complaints.
>>>>>>>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug 
>>>>>>>>>>> bits on
>>>>>>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>>>>>>> stress kit is running right now.
>>>>>>>>>>>
>>>>>>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to 
>>>>>>>>>>> gather
>>>>>>>>>>> the results and analyze them.
>>>>>>>>>>>
>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>
>>>>>>>>>>>> I have a small but important bug fix for the Async Monitor 
>>>>>>>>>>>> Deflation
>>>>>>>>>>>> project ready to go. It's also known as v2.02 (for those 
>>>>>>>>>>>> with the
>>>>>>>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev 
>>>>>>>>>>>> URLs). Sorry
>>>>>>>>>>>> for all the names...
>>>>>>>>>>>>
>>>>>>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that 
>>>>>>>>>>>> baseline patch
>>>>>>>>>>>> is out of our hair.
>>>>>>>>>>>>
>>>>>>>>>>>> Main bug URL:
>>>>>>>>>>>>
>>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>>
>>>>>>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>>>>>>
>>>>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/ 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/ 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 
>>>>>>>>>>>> changes:
>>>>>>>>>>>>
>>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-6] 
>>>>>>>>>>>> testing on
>>>>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running 
>>>>>>>>>>>> now.
>>>>>>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H 
>>>>>>>>>>>> is running
>>>>>>>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, 
>>>>>>>>>>>> MacOSX
>>>>>>>>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on 
>>>>>>>>>>>> product,
>>>>>>>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and 
>>>>>>>>>>>> Solaris-X64.
>>>>>>>>>>>> I'll start my stress kit on Linux-X64 sometime on Sunday 
>>>>>>>>>>>> (after
>>>>>>>>>>>> my jdk-13+18 stress run is done).
>>>>>>>>>>>>
>>>>>>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>>>>>>> testing is done.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I finally have CR1 for the Async Monitor Deflation project 
>>>>>>>>>>>>> ready to
>>>>>>>>>>>>> go. It's also known as v2.01 (for those with the 
>>>>>>>>>>>>> patches) and as
>>>>>>>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for 
>>>>>>>>>>>>> all the
>>>>>>>>>>>>> names...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Main bug URL:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>>>
>>>>>>>>>>>>> Baseline bug fixes URL:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>>>>>>
>>>>>>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's the webrev for the latest baseline changes 
>>>>>>>>>>>>> (JDK-8222295):
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295 
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/ 
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/ 
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the 
>>>>>>>>>>>>> latest version
>>>>>>>>>>>>> of JDK-8153224...
>>>>>>>>>>>>>
>>>>>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR 
>>>>>>>>>>>>> changes:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] 
>>>>>>>>>>>>> testing on
>>>>>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running 
>>>>>>>>>>>>> now and
>>>>>>>>>>>>> Mach5 tier[78] will be run later today. My stress kit on 
>>>>>>>>>>>>> Solaris-X64
>>>>>>>>>>>>> is running now. Linux-X64 stress testing will start on 
>>>>>>>>>>>>> Sunday. I'm
>>>>>>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my 
>>>>>>>>>>>>> monitor
>>>>>>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, in advance, for any questions, comments or 
>>>>>>>>>>>>> suggestions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Welcome to the OpenJDK review thread for my port of 
>>>>>>>>>>>>>> Carsten's work on:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's the webrev URL:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/ 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Earlier versions of this patch have been through several 
>>>>>>>>>>>>>> rounds of
>>>>>>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, 
>>>>>>>>>>>>>> Robbin, and
>>>>>>>>>>>>>> Roman for their preliminary code review comments. A very 
>>>>>>>>>>>>>> special
>>>>>>>>>>>>>> thanks to Robbin and Roman for building and testing the 
>>>>>>>>>>>>>> patch in
>>>>>>>>>>>>>> their own environments (including SPECjbb2015).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] 
>>>>>>>>>>>>>> testing on
>>>>>>>>>>>>>> Oracle's usual set of platforms. Earlier versions have 
>>>>>>>>>>>>>> been run
>>>>>>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run 
>>>>>>>>>>>>>> Kitchensink
>>>>>>>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 
>>>>>>>>>>>>>> (product, fastdebug
>>>>>>>>>>>>>> and slowdebug). Earlier versions have run my monitor 
>>>>>>>>>>>>>> inflation stress
>>>>>>>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 
>>>>>>>>>>>>>> (product,
>>>>>>>>>>>>>> fastdebug and slowdebug).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> All of the testing done on earlier versions will be redone 
>>>>>>>>>>>>>> on the
>>>>>>>>>>>>>> latest version of the patch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks, in advance, for any questions, comments or 
>>>>>>>>>>>>>> suggestions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> P.S.
>>>>>>>>>>>>>> One subtest in 
>>>>>>>>>>>>>> gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've 
>>>>>>>>>>>>>> been trying
>>>>>>>>>>>>>> to characterize/analyze this failure for more than a week 
>>>>>>>>>>>>>> now. At
>>>>>>>>>>>>>> this point I'm convinced that Async Monitor Deflation is 
>>>>>>>>>>>>>> aggravating
>>>>>>>>>>>>>> an existing bug. However, I plan to have a better handle 
>>>>>>>>>>>>>> on that
>>>>>>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>

From david.holmes at oracle.com  Tue Nov 12 04:52:42 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 12 Nov 2019 14:52:42 +1000
Subject: RFR: 8233549: Thread interrupted state must only be accessed when not
 in a safepoint-safe state
Message-ID: <fe942c6d-4db1-2166-739f-702c1f6cb2ca@oracle.com>

webrev: http://cr.openjdk.java.net/~dholmes/8233549/webrev/
bug: https://bugs.openjdk.java.net/browse/JDK-8233549

In JDK-8229516 I moved the interrupted state of a thread from the 
osThread in the VM to the java.lang.Thread instance. In doing that I 
overlooked a critical aspect, which is that to access the field of a 
Java object the JavaThread must not be in a safepoint-safe state** - 
otherwise the oop, and anything referenced from it, could be relocated 
by the GC whilst the JavaThread is accessing it. This manifested in a 
number of tests using JVM TI Agent threads and JVM TI RawMonitors 
because the JavaThreads were marked _thread_blocked and hence 
safepoint-safe, and we read a non-zero value for the interrupted field 
even though we had never been interrupted.

This problem existed in all the code that checks for interruption when 
"waiting":

- Parker::park (the code underpinning 
java.util.concurrent.LockSupport.park())

To fix this code I simply deleted a late check of the interrupted field. 
The check was not needed because if an interrupt has occurred then we 
will find the ParkEvent in a signalled state.

- ObjectMonitor::wait

Here the late check of the interrupted state is essential as we reset 
the ParkEvent after an earlier check of the interrupted state. But the 
fix was simply achieved by moving the check slightly earlier, before we 
use ThreadBlockInVM to become _thread_blocked.

- RawMonitor::wait

This fix was much more involved. The RawMonitor code directly 
transitions the JavaThread from _thread_in_native to _thread_blocked. 
This is safe from a safepoint perspective because they are equivalent 
safepoint-safe states. To allow access to the interrupted field I have 
to transition from native to _thread_in_vm, and that has to be done by 
proper thread-state transitions to ensure correct access to the oop and 
its fields. Having done that I can then use ThreadBlockInVM for the 
transitions to blocked. However, as the old code noted, it can't use 
proper thread-state transitions as this will lead to deadlocks with the 
VMThread that can also use RawMonitors when executing various event 
callbacks. To deal with that we have to note that the real constraint is 
that the JavaThread cannot block at a safepoint whilst it holds the 
RawMonitor. Hence the fix was to push all the interrupt checking code and 
the thread-state transitions to the lowest level of RawMonitorWait, 
around the final park() call, after we have enqueued the waiter and 
released the monitor. That avoids any deadlock possibility.

I also added checks to is_interrupted/interrupted to ensure they are 
only called by a thread in a suitable state. This should only be the 
VMThread (as a consequence of the Thread.stop implementation occurring 
at a safepoint and issuing a JavaThread::interrupt() call to unblock the 
target); or a JavaThread that is not _thread_in_native or _thread_blocked.
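
The state rule described above can be sketched as a small standalone 
model. This is only an illustration under stated assumptions: the names 
ThreadState, ModelThread and can_read_interrupted_field are hypothetical 
and are not the HotSpot types or the code in the webrev.

```cpp
#include <cassert>

// Hypothetical, simplified model of the thread states discussed above.
// These are NOT the real HotSpot JavaThreadState values.
enum class ThreadState { in_native, in_vm, in_java, blocked };

// Hypothetical stand-in for a thread: either the VMThread or a JavaThread
// with a current state.
struct ModelThread {
  bool is_vm_thread;
  ThreadState state;
};

// Returns true when the calling thread may read the interrupted field of
// the java.lang.Thread object: the VMThread always may, and a JavaThread
// may only when it is not in a safepoint-safe state (native or blocked).
bool can_read_interrupted_field(const ModelThread& t) {
  if (t.is_vm_thread) {
    return true;
  }
  return t.state != ThreadState::in_native &&
         t.state != ThreadState::blocked;
}
```

In a debug build a check of this shape would presumably sit in an 
assert at the top of is_interrupted/interrupted.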

Testing: (still finalizing)
  - tiers 1 - 6 (Oracle platforms)
  - Local Linux testing
   - vmTestbase/nsk/monitoring/
   - vmTestbase/nsk/jdwp
   - vmTestbase/nsk/jdb/
   - vmTestbase/nsk/jdi/
   - vmTestbase/nsk/jvmti/
   - serviceability/jvmti/
   - serviceability/jdwp
   - JDK: java/lang/management
          com/sun/management

** Note that this applies to all accesses we make via code in 
javaClasses.*. For this particular code I thought about adding a guard 
in JavaThread::threadObj() but it turns out when we generate a crash 
report we access the Thread's name() field and that can happen when in 
any state, so we'd always trigger a secondary assertion failure during 
error reporting if we did that. Note that accessing name() can still 
easily lead to secondary assertion failures as I discovered when trying 
to debug this and print the thread name out - I would see an is_instance 
assertion fail checking that the Thread name() is an instance of 
java.lang.String!

Thanks,
David
-----

From ioi.lam at oracle.com  Tue Nov 12 04:53:48 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Mon, 11 Nov 2019 20:53:48 -0800
Subject: RFR: JDK-8230413: Support Pre JDK 6 class with CDS
In-Reply-To: <CALrW1jyog+Q5H-BXQxQEkPdtLhr_aB7ekOVzuq9Qi4qLQ-jqOg@mail.gmail.com>
References: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>
 <c69bf94f-e52d-f7cd-e9f7-252a71204f7b@oracle.com>
 <CALrW1jyog+Q5H-BXQxQEkPdtLhr_aB7ekOVzuq9Qi4qLQ-jqOg@mail.gmail.com>
Message-ID: <f2b33c85-e811-48f5-ed82-6ee398ea1935@oracle.com>

I wonder if there's a safer alternative. Are there tools that can add 
stackmaps to pre-JDK6 classes? That way they can be verified with the 
split verifier during CDS dump time.

Thanks
- Ioi

On 11/11/19 4:25 PM, Jiangli Zhou wrote:
> Hi David,
>
> Thanks for quick response!
>
> On Mon, Nov 11, 2019 at 3:12 PM David Holmes <david.holmes at oracle.com> wrote:
>> Hi Jiangli,
>>
>> On 12/11/2019 8:12 am, Jiangli Zhou wrote:
>>> Please review the following change that allows archiving
>>> pre-JAVA_6_VERSION classes with -Xverify:none.
>>>
>>> webrev: http://cr.openjdk.java.net/~jiangli/8230413/webrev.00/
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8230413
>>>
>>> Currently there is still a large number of existing classes (pre-built)
>>> with older class versions (< 50) in real world applications. Those
>>> classes are missing the benefit of archiving. Particularly, in some
>>> use cases, class verification can be safely disabled. For those use
>>> cases, supporting archiving pre JDK 6 classes shows good performance
>>> benefit. We can re-evaluate this support when -Xverify:none is removed
>>> in the future; hopefully the need to support class versions < 50
>>> is no longer significant by then.
>>>
>>> This change brings back the pre-JDK-8198849 behavior. The runtime makes
>>> sure the dump-time verification mode is the same as or stronger than
>>> the current mode.
>>>
>>> A CSR may be needed for the change. Any thoughts on that?
>> A CSR request is definitely required given that you are proposing to
>> undo a change that was itself put in place via a CSR request! And given
>> this is relaxing a "defense-in-depth" check which will result in
>> increasing exploitability, I think you will need a very strong argument
>> to justify this.
> Thanks for confirming this! Will do.
>
>> Further this not only undoes JDK-8197972 but it also invalidates
>> JDK-8155671 being closed as a duplicate of JDK-8197972. JDK-8155671
>> requested a way to know if verification had been disabled, to help with
>> analyzing crash reports, but instead we decided to not allow
>> verification to be disabled.
> I had some concerns about JDK-8155671 initially before making the
> change, as it's a closed bug and my memory about the specific issue
> was flushed out. I brought up the question in the bug. My take on
> Ioi's response to my query about JDK-8155671 was that the
> pre-JDK-8197972 behavior would not cause any security hole.
>
> Re-evaluating this particular behavior, I think the pre-JDK-8155671
> behavior would actually match user intention better. If the user decides
> to turn off verification in safe use cases, it seems to be a good idea to
> honor that. With the new dynamic archiving capability, an archive could
> be created the first time a particular application is run.
> Not forcing verification when the user decides to disable it can avoid
> unnecessary/unwanted overhead.
>
> If verification is turned off at dump time for application classes,
> runtime does not allow execution without also turning off
> verification. We can determine a crash is not caused by relaxed dump
> time verification.
>
> Regards,
> Jiangli
>
>> David
>> -----
>>
>>
>>
>>> Tested with jtreg appcds tests.
>>>
>>> Best,
>>> Jiangli
>>>


From robbin.ehn at oracle.com  Tue Nov 12 08:16:31 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Tue, 12 Nov 2019 09:16:31 +0100
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <ae08de29-d831-452e-786b-1e60908062eb@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
 <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
 <590a7395-8fda-caf3-402a-29393fd34469@oracle.com>
 <7dedd65e-f004-b0b4-0c04-63bc484198ac@oracle.com>
 <383a1330-1e3d-66db-c95b-9e6f9910641f@oracle.com>
 <73dec889-15c1-4f59-11d7-1df89ee99150@oracle.com>
 <ae08de29-d831-452e-786b-1e60908062eb@oracle.com>
Message-ID: <62310a8b-4736-3b49-a94a-d46424f06955@oracle.com>

Hi David, in short.

On 2019-11-12 03:20, David Holmes wrote:
> Hi Robbin,
> 
> tl;dr I can see us moving to a new style of using Atomic::load/store to replace 
> plain load/store and declaring variables as volatile. But I'd like to see it 
> discussed and agreed upon and written up clearly in the wiki so we can 
> consistently apply it. Only the new lock-free queue management code should 
> attempt that in this set of changes IMO.

Ok, agreed.

*truncated*

> "All that aside, it is never necessary to use READ_ONCE() and
> WRITE_ONCE() on a variable that has been marked volatile. "
> 
> So our use of "volatile" is already addressing this aspect.

Yes

> 
> In reference to the async monitor deflation code I could see the new lock-free 
> list management using this new style (a style we need to clearly document on the 
> hotspot wiki), but I would not want to see a mix of Atomic and plain accesses on 
> existing volatile variables at this stage. (We can adapt existing code to the 
> new style as a later enhancement.) [ Note: I'm making an assumption about how 
> well isolated the list management code and it may not quite be what I think.]
> 

Yes, the argument not to mix the two is perfectly fine.

>> We don't have consume yet and acquire is too strong, I suggested relaxed, 
>> Atomic::load(). 
> 
> I don't think Atomic::load/store should have any memory-ordering properties - so 
> yes "relaxed". We have OrderAccess for imposing true memory ordering and 
> load_acquire/release_store etc use Atomic::load/store internally.

No. I mean a "OrderAccess::store_consume()".
Atomic::load/store should be relaxed, yes.
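
For the ref_count() case in this thread, a relaxed load can be sketched 
with standard C++ atomics. This is a minimal standalone model, not the 
HotSpot code: MonitorModel, load_ref_count and check_ref_count are 
hypothetical names, and std::atomic with memory_order_relaxed stands in 
for Atomic::load(). The C++ standard does not strictly forbid a compiler 
from fusing two adjacent relaxed loads, so treat the "two loads" as what 
compilers do in practice rather than a guarantee.

```cpp
#include <atomic>
#include <string>

// Hypothetical stand-in for an ObjectMonitor with a reference count.
struct MonitorModel {
  std::atomic<int> ref_count{0};

  // Relaxed load: atomic, but imposes no ordering on other memory accesses.
  int load_ref_count() const {
    return ref_count.load(std::memory_order_relaxed);
  }
};

// Models the guarantee pattern: test a local snapshot, and on failure
// report both the snapshot and a freshly loaded value so that a racing
// update becomes visible in the message.
std::string check_ref_count(const MonitorModel& m) {
  const int l_ref_count = m.load_ref_count();  // the value that is tested
  if (l_ref_count > 0) {
    return "ok";
  }
  const int current = m.load_ref_count();      // second, independent load
  return "must be positive: l_ref_count=" + std::to_string(l_ref_count) +
         ", ref_count=" + std::to_string(current);
}
```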

> 
>> Which I think is correct for all usecases of ref_count() except the 
>> ADIM_guarantee where we double load ref count, where I think consume is 
>> correct. But consume does not help, since if the ref count is wrong who knows 
>> what the second load will be.
> 
> I'm not at all clear on "consume" (didn't they decide to scrap that access 
> mode?).

I think they re-added it in C++20 with fixed semantics, not sure... :)

Thanks, Robbin

> Anyway this is way too much overthinking in relation to the 
> ADIM_guarantee. The guarantee initially checked the value of the local variable 
> but reported the current value of ref_count() which could have changed - so you 
> could see an inconsistent message of the form "assert failed: expected > 0 got 
> 1". So Dan fixed that to report the value of the local, but also report the 
> current value of ref_count(). This is useful in the case that ref_count() has in 
> fact changed as you can more obviously see there is a race - but it does depend 
> on the two calls to ref_count() not being compacted into one (which is certainly 
> not an issue with the way it is/was implemented).
> 
> Cheers,
> David
> -----
> 
>> /Robbin
>>
>>>
>>> David
>>> -----
>>>
>>>> Thanks, Robbin
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>> On 8/11/2019 11:35 pm, Robbin Ehn wrote:
>>>>>> Hi Dan,
>>>>>>
>>>>>> Thanks for looking into this, some comments on v8:
>>>>>>
>>>>>> ##################
>>>>>> src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
>>>>>> src/hotspot/cpu/x86/globalDefinitions_x86.hpp
>>>>>> src/hotspot/share/logging/logTag.hpp
>>>>>> src/hotspot/share/oops/markWord.hpp
>>>>>> src/hotspot/share/runtime/basicLock.cpp
>>>>>> src/hotspot/share/runtime/safepoint.cpp
>>>>>> src/hotspot/share/runtime/serviceThread.cpp
>>>>>> src/hotspot/share/runtime/sharedRuntime.cpp
>>>>>> src/hotspot/share/runtime/synchronizer.hpp
>>>>>> src/hotspot/share/runtime/vmOperations.cpp
>>>>>> src/hotspot/share/runtime/vmOperations.hpp
>>>>>> src/hotspot/share/runtime/vmStructs.cpp
>>>>>> src/hotspot/share/runtime/vmThread.cpp
>>>>>> test/hotspot/gtest/oops/test_markWord.cpp
>>>>>>
>>>>>> No comments.
>>>>>>
>>>>>> ##################
>>>>>> I don't see the benefit of having the -HandshakeAfterDeflateIdleMonitors 
>>>>>> code paths.
>>>>>> Removing that option would mean these files can be reverted:
>>>>>> src/hotspot/cpu/aarch64/globals_aarch64.hpp
>>>>>> src/hotspot/cpu/arm/globals_arm.hpp
>>>>>> src/hotspot/cpu/ppc/globals_ppc.hpp
>>>>>> src/hotspot/cpu/s390/globals_s390.hpp
>>>>>> src/hotspot/cpu/sparc/globals_sparc.hpp
>>>>>> src/hotspot/cpu/x86/globals_x86.hpp
>>>>>> src/hotspot/cpu/x86/macroAssembler_x86.cpp
>>>>>> src/hotspot/cpu/x86/macroAssembler_x86.hpp
>>>>>> src/hotspot/cpu/zero/globals_zero.hpp
>>>>>>
>>>>>> And one less option here:
>>>>>> src/hotspot/share/runtime/globals.hpp
>>>>>>
>>>>>> ##################
>>>>>> src/hotspot/share/prims/jvm.cpp
>>>>>>
>>>>>> Unclear if this is a good idea.
>>>>>>
>>>>>> ##################
>>>>>> src/hotspot/share/prims/whitebox.cpp
>>>>>>
>>>>>> This would assume the test expects the right thing, but that is not obvious.
>>>>>>
>>>>>> ##################
>>>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp
>>>>>>
>>>>>> The current pending and waiting monitor is only changed by the JavaThread 
>>>>>> itself.
>>>>>> It only sets it after _contentions is increased.
>>>>>> It clears it before _contentions is decreased.
>>>>>> We depend on being at a safepoint or on the thread being suspended, so it 
>>>>>> can't be deflated since _contentions is > 0.
>>>>>> Plus the thread has already increased the ref count and can't decrease it 
>>>>>> (since at safepoint or suspended).
>>>>>>
>>>>>> ##################
>>>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>>>>
>>>>>> ###1
>>>>>> You have several of these (and in other files):
>>>>>> 242   jint l_ref_count = ref_count();
>>>>>> 243   ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d, 
>>>>>> ref_count=%d", l_ref_count, ref_count());
>>>>>> Please use Atomic::load() in ref_count.
>>>>>> Since this is dependent on ref_count being volatile, otherwise the 
>>>>>> compiler may only do one load.
>>>>>>
>>>>>> ###2
>>>>>> 307   // Prevent deflation. See ObjectSynchronizer::deflate_monitor(),
>>>>>> ...
>>>>>> 311   Atomic::add(1, &_contentions);
>>>>>> In ObjectSynchronizer::deflate_monitor if you would check ref count 
>>>>>> instead of _contentions, we could remove _contentions.
>>>>>> Since all waiters also have a ref count it looks like we don't need 
>>>>>> waiters either.
>>>>>> In ObjectSynchronizer::deflate_monitor:
>>>>>> if (mid->_contentions != 0 || mid->_waiters != 0) {
>>>>>> Why not just do:
>>>>>> if (mid->ref_count()) {
>>>>>> ?
>>>>>>
>>>>>> ##################
>>>>>> src/hotspot/share/runtime/objectMonitor.hpp
>>>>>>
>>>>>> ###1
>>>>>>   252   intptr_t is_busy() const {
>>>>>>   253     // TODO-FIXME: assert _owner == null implies _recursions = 0
>>>>>>   254     // We do not include _ref_count in the is_busy() check because
>>>>>>   255     // _ref_count is for indicating that the ObjectMonitor* is in
>>>>>>   256     // use which is orthogonal to whether the ObjectMonitor itself
>>>>>>   257     // is in use for a locking operation.
>>>>>>
>>>>>> But in the non-debug code we always check:
>>>>>> +  if (mid->is_busy() || mid->ref_count() != 0) {
>>>>>>
>>>>>> So it seems like you should have a method including ref count.
>>>>>>
>>>>>> ##################
>>>>>> src/hotspot/share/runtime/objectMonitor.inline.hpp
>>>>>>
>>>>>> Use Atomic::load for ref count.
>>>>>>
>>>>>> ##################
>>>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>>>>
>>>>>> ###1
>>>>>>   139 static volatile int g_om_free_count = 0;    // # on g_free_list
>>>>>>   140 static volatile int g_om_in_use_count = 0;  // # on g_om_in_use_list
>>>>>>   141 static volatile int g_om_population = 0;    // # Extant -- in circulation
>>>>>>   142 static volatile int g_om_wait_count = 0;    // # on g_wait_list
>>>>>> No padding here, aren't they more contended than the fields in the OM?
>>>>>>
>>>>>> ###2
>>>>>> 151 static bool is_next_marked(ObjectMonitor* om) {
>>>>>>
>>>>>> Is only used in ObjectSynchronizer::om_flush.
>>>>>> Here you fetch an OM and read the next field; this does not need LA 
>>>>>> semantics on supported platforms.
>>>>>> This would only need Atomic::load.
>>>>>>
>>>>>> ###3
>>>>>> 191 static void set_next(ObjectMonitor* om, ObjectMonitor* value) {
>>>>>>
>>>>>> Nowhere do you need SR; in the only places where it would make a difference:
>>>>>>   345       OrderAccess::storestore();
>>>>>>   346       set_next(cur, next);  // Unmark the previous list head.
>>>>>> and
>>>>>> 1714     OrderAccess::storestore();
>>>>>> 1715     set_next(in_use_list, next);
>>>>>>
>>>>>> You have a storestore already!
>>>>>>
>>>>>> This code reads as:
>>>>>> OrderAccess::storestore();
>>>>>> OrderAccess::loadstore();
>>>>>> OrderAccess::storestore();
>>>>>> om->_next_om = value
>>>>>>
>>>>>> So it should be an Atomic::store.
>>>>>>
>>>>>> ###4
>>>>>> 198 static bool mark_list_head(ObjectMonitor* volatile * list_p
>>>>>>
>>>>>> Since the mark is an embedded spinlock I think the terminology should be 
>>>>>> changed. (That the spinlock is inside the next pointer should be 
>>>>>> abstracted away.)
>>>>>> E.g. mark_next_loop would just be lock.
>>>>>> The load of the list heads should use Atomic::load.
>>>>>> It also seems a bit weird to return next for the locking method.
>>>>>> An output parameter can just be returned, and return NULL if list head is 
>>>>>> NULL.
>>>>>> E.g.
>>>>>>
>>>>>>   198 static ObjectMonitor* get_list_head_locked(ObjectMonitor* volatile * 
>>>>>> list_p) {
>>>>>>   200   while (true) {
>>>>>>   201     ObjectMonitor* mid = Atomic::load(list_p);
>>>>>>   202     if (mid == NULL) {
>>>>>>   203       return NULL;  // The list is empty.
>>>>>>   204     }
>>>>>>   205     if (try_lock(mid)) {
>>>>>>   206       if (Atomic::load(list_p) != mid) {
>>>>>>   207         // The list head changed so we have to retry.
>>>>>>   208         unlock(mid);
>>>>>>   210       } else {
>>>>>>               return mid;
>>>>>>             }
>>>>>>   214     }
>>>>>>           // Yield ?
>>>>>>   215   }
>>>>>>   216 }
>>>>>>
>>>>>> With collateral changes.
>>>>>>
>>>>>> ###5
>>>>>> 220 static ObjectMonitor* unmarked_next(ObjectMonitor* om)
>>>>>> Atomic::store is what is needed.
>>>>>>
>>>>>> ###6
>>>>>> 333 static void prepend_to_common(
>>>>>>
>>>>>>   345       OrderAccess::storestore();
>>>>>>   346       set_next(cur, next);  // Unmark the previous list head.
>>>>>> Double storestore. (fixed by changing set_next to Atomic::store)
>>>>>>
>>>>>> ###7
>>>>>>   375 static ObjectMonitor* take_from_start_of_common(ObjectMonitor* volatile * list_p,
>>>>>>
>>>>>> Triple storestore here.
>>>>>>
>>>>>>   386   Atomic::dec(count_p);
>>>>>>   387   // mark_list_head() used cmpxchg() above, switching list head can be lazier:
>>>>>>   388   OrderAccess::storestore();
>>>>>>   389   // Unmark take, but leave the next value for any lagging list
>>>>>>   390   // walkers. It will get cleaned up when take is prepended to
>>>>>>   391   // the in-use list:
>>>>>>   392   set_next(take, next);
>>>>>>   393   return take;
>>>>>>
>>>>>> Reads:
>>>>>> count_p--
>>>>>> OrderAccess::loadstore();
>>>>>> OrderAccess::storestore();
>>>>>> OrderAccess::storestore();
>>>>>> OrderAccess::loadstore();
>>>>>> OrderAccess::storestore();
>>>>>> take->_next_om = next;
>>>>>>
>>>>>> Fixed by changing set_next to Atomic::store and removing the 
>>>>>> OrderAccess::storestore();
>>>>>>
>>>>>> ###8
>>>>>> ObjectSynchronizer::om_release(
>>>>>>
>>>>>> 1591       if (m == mid) {
>>>>>> 1592         // We found 'm' on the per-thread in-use list so try to extract it.
>>>>>> 1593         if (cur_mid_in_use == NULL) {
>>>>>> 1594           // mid is the list head and it is marked. Switch the list head
>>>>>> 1595           // to next which unmarks the list head, but leaves mid marked:
>>>>>> 1596           self->om_in_use_list = next;
>>>>>> 1597           // mark_list_head() used cmpxchg() above, switching list head can be lazier:
>>>>>> 1598           OrderAccess::storestore();
>>>>>> 1599         } else {
>>>>>> 1600           // mid and cur_mid_in_use are marked. Switch cur_mid_in_use's
>>>>>> 1601           // next field to next which unmarks cur_mid_in_use, but leaves
>>>>>> 1602           // mid marked:
>>>>>> 1603           OrderAccess::release_store(&cur_mid_in_use->_next_om, next);
>>>>>> 1604         }
>>>>>> 1605         extracted = true;
>>>>>> 1606         Atomic::dec(&self->om_in_use_count);
>>>>>> 1607         // Unmark mid, but leave the next value for any lagging list
>>>>>> 1608         // walkers. It will get cleaned up when mid is prepended to
>>>>>> 1609         // the thread's free list:
>>>>>> 1610         set_next(mid, next);
>>>>>> 1611         break;
>>>>>> 1612       }
>>>>>>
>>>>>> This does not look correct. Before taking this branch we have done a 
>>>>>> cmpxchg in mark_list_head or mark_next_loop.
>>>>>> This is how it reads:
>>>>>> OrderAccess::storestore(); // from previous cmpxchg
>>>>>> OrderAccess::loadstore(); // from previous cmpxchg
>>>>>> 1591       if (m == mid) {
>>>>>> 1593         if (cur_mid_in_use == NULL) {
>>>>>> 1596           self->om_in_use_list = next;
>>>>>> 1598           OrderAccess::storestore();
>>>>>> 1599         } else {
>>>>>>                OrderAccess::storestore();
>>>>>>                OrderAccess::loadstore();
>>>>>> 1603           cur_mid_in_use->_next_om = next;
>>>>>> 1604         }
>>>>>> 1605         extracted = true;
>>>>>>              OrderAccess::storestore();
>>>>>>              OrderAccess::fence(); // storestore|storeload|loadstore|loadload
>>>>>>              self->om_in_use_count--; // Atomic::dec
>>>>>>              OrderAccess::storestore();
>>>>>>              OrderAccess::loadstore();
>>>>>>              OrderAccess::storestore();
>>>>>>              OrderAccess::loadstore();
>>>>>>              mid->_next_om = next; // Atomic::store
>>>>>> 1611         break;
>>>>>> 1612       }
>>>>>>
>>>>>> extracted is a local variable so you do not need any orderaccess before it is set.
>>>>>> Fixed by changing set_next to Atomic::store, removing the 
>>>>>> OrderAccess::storestore() and changing OrderAccess::release_store to 
>>>>>> Atomic::store();
>>>>>>
>>>>>> ###9
>>>>>> 1653 void ObjectSynchronizer::om_flush(Thread* self) {
>>>>>>
>>>>>> 1714     OrderAccess::storestore();
>>>>>> 1715     set_next(in_use_list, next);
>>>>>> Fixed by changing set_next to Atomic::store.
>>>>>>
>>>>>> ###10
>>>>>> 1737     self->om_free_list = NULL;
>>>>>> 1738     OrderAccess::storestore();  // Lazier memory is okay for list walkers.
>>>>>>
>>>>>> prepend_list_to_g_free_list/prepend_list_to_g_om_in_use_list do a cmpxchg 
>>>>>> as the first thing, so there is no need for this storestore.
>>>>>>
>>>>>> ###11
>>>>>> 1797 void ObjectSynchronizer::inflate(ObjectMonitorHandle* omh_p, Thread* 
>>>>>> self,
>>>>>>
>>>>>> 1938       // Once ObjectMonitor is configured and the object is associated
>>>>>> 1939       // with the ObjectMonitor, it is safe to allow async deflation:
>>>>>> 1940       assert(m->is_new(), "freshly allocated monitor must be new");
>>>>>> 1941       m->set_allocation_state(ObjectMonitor::Old);
>>>>>>
>>>>>> So we use ref count, contention, waiter, owner and allocation state to 
>>>>>> keep OM alive in different scenarios.
>>>>>> There is no way for me to keep track of that. I don't see why you would 
>>>>>> need more than owner and ref count.
>>>>>> If you allocate the om with ref count 1 you can remove _allocation_state 
>>>>>> and just decrease ref count here instead.
>>>>>>
>>>>>> ###12
>>>>>> 2079 bool ObjectSynchronizer::deflate_monitor
>>>>>>
>>>>>> 2112     if (AsyncDeflateIdleMonitors) {
>>>>>> 2113       // clear() expects the owner field to be NULL and we won't race
>>>>>> 2114       // with the simple C2 ObjectMonitor
>>>>>>
>>>>>> The macro assembler code is not just executed by C2, so this comment is a 
>>>>>> bit misleading. (There are some more as well.)
>>>>>>
>>>>>> ###13
>>>>>> 2306 int ObjectSynchronizer::deflate_monitor_list(
>>>>>>
>>>>>> Same issue as ObjectSynchronizer::om_release.
>>>>>> Fixed by changing set_next to Atomic::store, removing the 
>>>>>> OrderAccess::storestore() and changing OrderAccess::release_store to 
>>>>>> Atomic::store();
>>>>>>
>>>>>> ###14
>>>>>> 2474       if (SafepointSynchronize::is_synchronizing() &&
>>>>>>
>>>>>> This is the wrong method to call, it should be 
>>>>>> SafepointMechanism::should_block(Thread* thread);
>>>>>>
>>>>>> ###15
>>>>>> 2578 void ObjectSynchronizer::deflate_idle_monitors_using_JT() {
>>>>>>
>>>>>> 2616     g_wait_list = NULL;
>>>>>> 2617     OrderAccess::storestore();  // Lazier memory sync is okay for list walkers.
>>>>>>
>>>>>> I don't see that g_wait_list is ever simultaneously read.
>>>>>> Either it is accessed by serviceThread outside a safepoint or by VMThread 
>>>>>> inside a safepoint?
>>>>>>
>>>>>> It looks like g_wait_list can just be a local in:
>>>>>> void ObjectSynchronizer::deflate_idle_monitors_using_JT()
>>>>>>
>>>>>> (disregarding the debug code that might read it in a safepoint)
>>>>>>
>>>>>> ###16
>>>>>> 2722         assert(SafepointSynchronize::is_synchronizing(), "sanity check");
>>>>>>
>>>>>> This is the wrong method to call, it should be 
>>>>>> SafepointMechanism::should_block(Thread* thread);
>>>>>>
>>>>>> ##################
>>>>>> src/hotspot/share/runtime/vframe.cpp
>>>>>>
>>>>>> We are at a safepoint, on the current thread, or in a handshake, so the 
>>>>>> current pending and waiting monitors are already stable.
>>>>>>
>>>>>> ##################
>>>>>> src/hotspot/share/services/threadService.cpp
>>>>>>
>>>>>> These changes are only needed for the -HandshakeAfterDeflateIdleMonitors 
>>>>>> path.
>>>>>>
>>>>>> ##################
>>>>>> test/jdk/java/rmi/server/UnicastRemoteObject/unexportObject/UnexportLeak.java
>>>>>>
>>>>>> Note: if the OM had a weak reference to the object instead, this would not be needed.
>>>>>>
>>>>>> Thanks, Robbin
>>>>>>
>>>>>>
>>>>>> On 11/4/19 10:03 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I have made changes to the Async Monitor Deflation code in response to
>>>>>>> the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., Robbin
>>>>>>> and Erik O. for their comments!
>>>>>>>
>>>>>>> JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async Monitor
>>>>>>> Deflation project needs to push before Nov. 12, 2019 in order to allow
>>>>>>> for sufficient bake time for such a big change. Nov. 12 is _next_ Tuesday
>>>>>>> so we have 8 days from today to finish this code review cycle and push
>>>>>>> this code for JDK14.
>>>>>>>
>>>>>>> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>>>>>>>
>>>>>>> I have attached the change list from CR7 to CR8 instead of putting it in
>>>>>>> the body of this email. I've also added a link to the CR7-to-CR8-changes
>>>>>>> file to the webrevs so it should be easy to find.
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>      JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>      https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-14+21.
>>>>>>>
>>>>>>> Here's the full webrev URL for those folks that want to see all of the
>>>>>>> current Async Monitor Deflation code in one go (v2.08 full):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full
>>>>>>>
>>>>>>> Some folks might want to see just what has changed since the last review
>>>>>>> cycle so here's a webrev for that (v2.08 inc):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/
>>>>>>>
>>>>>>> The OpenJDK wiki did not need any changes for this round:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>
>>>>>>> The jdk-14+21 based v2.08 version of the patch has been thru Mach5 tier[1-8]
>>>>>>> testing on Oracle's usual set of platforms. It has also been through my 
>>>>>>> usual
>>>>>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
>>>>>>> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
>>>>>>> my lab. Some testing is still running, but so far there are no new 
>>>>>>> regressions.
>>>>>>>
>>>>>>> I have not yet done a SPECjbb2015 round on the CR8/v2.08/11-for-jdk14 bits.
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> The Async Monitor Deflation project is reaching the end game. I have no
>>>>>>>> changes planned for the project at this time so all that is left is code
>>>>>>>> review and any changes that results from those reviews.
>>>>>>>>
>>>>>>>> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>>>>>>>>
>>>>>>>> I have attached the list of fixes from CR6 to CR7 instead of putting it
>>>>>>>> in the main body of this email.
>>>>>>>>
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-14+19.
>>>>>>>>
>>>>>>>> Here's the full webrev URL for those folks that want to see all of the
>>>>>>>> current Async Monitor Deflation code in one go (v2.07 full):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full
>>>>>>>>
>>>>>>>> Some folks might want to see just what has changed since the last review
>>>>>>>> cycle so here's a webrev for that (v2.07 inc):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/
>>>>>>>>
>>>>>>>> The OpenJDK wiki has been updated to match the CR7/v2.07/10-for-jdk14 
>>>>>>>> changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>
>>>>>>>> The jdk-14+18 based v2.07 version of the patch has been thru Mach5 
>>>>>>>> tier[1-8]
>>>>>>>> testing on Oracle's usual set of platforms. It has also been through my 
>>>>>>>> usual
>>>>>>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the 
>>>>>>>> addition
>>>>>>>> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
>>>>>>>> my lab.
>>>>>>>>
>>>>>>>> The jdk-14+19 based v2.07 version of the patch has been thru Mach5 
>>>>>>>> tier[1-3]
>>>>>>>> test on Oracle's usual set of platforms. Mach5 tier[4-8] are in process.
>>>>>>>>
>>>>>>>> I did another round of SPECjbb2015 testing in Oracle's Aurora Performance
>>>>>>>> lab using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>>>>>>>
>>>>>>>>     - "base" is jdk-14+18
>>>>>>>>     - "v2.07" is the latest version and includes C2 inc_om_ref_count() support
>>>>>>>>       on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors option
>>>>>>>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>>>>>>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors specified
>>>>>>>>
>>>>>>>>          hbIR           hbIR
>>>>>>>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>>>>>>     ---------------  ---------  --------  -------------  -------
>>>>>>>>            34282.00   30635.90  28831.30       20969.20  3841.30 base
>>>>>>>>            34282.00   30973.00  29345.80       21025.20  3964.10 v2.07
>>>>>>>>            34282.00   31105.60  29174.30       21074.00  3931.30 v2.07_handshake
>>>>>>>>            34282.00   30789.70  27151.60       19839.10  3850.20 v2.07_off
>>>>>>>>
>>>>>>>>     - The Aurora Perf comparison tool reports:
>>>>>>>>
>>>>>>>>         Comparison              max-jOPS              critical-jOPS
>>>>>>>>         ----------------------  --------------------  --------------------
>>>>>>>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>>>>>>>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>>>>>>>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>>>>>>>>
>>>>>>>>         (s) - significant  (ns) - not-significant
>>>>>>>>
>>>>>>>>     - For historical comparison, the Aurora Perf comparison tool
>>>>>>>>         reported for v2.06 with a baseline of jdk-13+31:
>>>>>>>>
>>>>>>>>         Comparison              max-jOPS              critical-jOPS
>>>>>>>>         ----------------------  --------------------  --------------------
>>>>>>>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>>>>>>>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>>>>>>>>
>>>>>>>>         (s) - significant  (ns) - not-significant
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> The Async Monitor Deflation project has rebased to JDK14 so it's time
>>>>>>>>> for our first code review in that new context!!
>>>>>>>>>
>>>>>>>>> I've been focused on changing the monitor list management code to be
>>>>>>>>> lock-free in order to make SPECjbb2015 happier. Of course with a change
>>>>>>>>> like that, it takes a while to chase down all the new and wonderful
>>>>>>>>> races. At this point, I have the code back to the same stability that
>>>>>>>>> I had with CR5/v2.05/8-for-jdk13.
>>>>>>>>>
>>>>>>>>> To lay the ground work for this round of review, I pushed the following
>>>>>>>>> two fixes to jdk/jdk earlier today:
>>>>>>>>>
>>>>>>>>>     JDK-8230184 rename, whitespace, indent and comments changes in preparation
>>>>>>>>>                 for lock free Monitor lists
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>>>>>>>
>>>>>>>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails after 8230184
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>>>>>>>
>>>>>>>>> I have attached the list of fixes from CR5 to CR6 instead of putting
>>>>>>>>> it in the main body of this email.
>>>>>>>>>
>>>>>>>>> Main bug URL:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>>>>>>>> JDK-8230184 and JDK-8230317.
>>>>>>>>>
>>>>>>>>> Here's the full webrev URL for those folks that want to see all of the
>>>>>>>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The primary focus of this review cycle is on the lock-free Monitor List
>>>>>>>>> management changes so here's a webrev for just that patch (v2.06c):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/
>>>>>>>>>
>>>>>>>>> The secondary focus of this review cycle is on the bug fixes that have
>>>>>>>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for just that
>>>>>>>>> patch (v2.06b):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/
>>>>>>>>>
>>>>>>>>> The third and final bucket for this review cycle is the rename, 
>>>>>>>>> whitespace,
>>>>>>>>> indent and comments changes made in preparation for lock free Monitor list
>>>>>>>>> management. Almost all of that was extracted into JDK-8230184 for the
>>>>>>>>> baseline so this bucket now has just a few comment changes relative to
>>>>>>>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Some folks might want to see just what has changed since the last review
>>>>>>>>> cycle so here's a webrev for that (v2.06 inc):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Last, but not least, some folks might want to see the code before the
>>>>>>>>> addition of lock-free Monitor List management so here's a webrev for
>>>>>>>>> that (v2.00 -> v2.05):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/
>>>>>>>>>
>>>>>>>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>
>>>>>>>>> but that should only be changes to describe per-thread list async monitor
>>>>>>>>> deflation being done by the ServiceThread.
>>>>>>>>>
>>>>>>>>> (I did update the OpenJDK wiki for the CR5 changes back on 2019.08.14)
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>> Oracle's usual set of platforms. It has also been through my usual set
>>>>>>>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>>>>>>>
>>>>>>>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>>>>>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs. This was using
>>>>>>>>> this patch baselined on jdk-13+31 (for stability):
>>>>>>>>>
>>>>>>>>>           hbIR           hbIR
>>>>>>>>>      (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>>>>>>>      ---------------  ---------  --------  -------------  -------
>>>>>>>>>             34282.00   28837.20  27905.20       19817.40  3658.10  base
>>>>>>>>>             34965.70   29798.80  27814.90       19959.00  3514.60  v2.06d
>>>>>>>>>             34282.00   29100.70  28042.50       19577.00  3701.90  v2.06d_off
>>>>>>>>>             34282.00   29218.50  27562.80       19397.30  3657.60  v2.06d_ocache
>>>>>>>>>             34965.70   29838.30  26512.40       19170.60  3569.90  v2.05
>>>>>>>>>             34282.00   28926.10  27734.00       19835.10  3588.40  v2.05_off
>>>>>>>>>
>>>>>>>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors specified and
>>>>>>>>> the "ocache" config is with 128 byte cache line sizes instead of 64 byte
>>>>>>>>> cache line sizes. "v2.06d" is the last set of changes that I made before
>>>>>>>>> those changes were distributed into the "v2.06a", "v2.06b" and "v2.06c"
>>>>>>>>> buckets for this review cycle.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> I've been focused on chasing down and fixing the test failures
>>>>>>>>>> that pop up only rarely. So this round is primarily fixes for races
>>>>>>>>>> with a few additional fixes that came from Karen's review of CR4.
>>>>>>>>>> Thanks Karen!
>>>>>>>>>>
>>>>>>>>>> I have attached the list of fixes from CR4 to CR5 instead of putting
>>>>>>>>>> it in the main body of this email.
>>>>>>>>>>
>>>>>>>>>> Main bug URL:
>>>>>>>>>>
>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>
>>>>>>>>>> The project is currently baselined on jdk-13+29. This will likely be
>>>>>>>>>> the last JDK13 baseline for this project and I'll roll to the JDK14
>>>>>>>>>> (jdk/jdk) repo soon...
>>>>>>>>>>
>>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/
>>>>>>>>>>
>>>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>>>>>>>
>>>>>>>>>> I have not yet checked the OpenJDK wiki to see if it needs any updates
>>>>>>>>>> to match the CR5 changes:
>>>>>>>>>>
>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>>
>>>>>>>>>> (I did update the OpenJDK wiki for the CR4 changes back on 2019.06.26)
>>>>>>>>>>
>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>>>>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>>>>>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become available.
>>>>>>>>>> Since I haven't made any performance changes in this round, I'll only
>>>>>>>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>>>>>>>
>>>>>>>>>> Next up:
>>>>>>>>>>
>>>>>>>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>>>>>>>   Linux-X64 and we've determined that some of that comes from
>>>>>>>>>>   contention on the gListLock. So I'm going to investigate removing
>>>>>>>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>>>>>>>> - Of course, going lock free often causes new races and new failures
>>>>>>>>>>   so that's a good reason to make those changes isolated in their
>>>>>>>>>>   own round (and not hold up CR5/v2.05/8-for-jdk13 anymore).
>>>>>>>>>> - I finally have a potential fix for the Win* failure with
>>>>>>>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>>>   but I haven't run it through Mach5 yet so it'll be in the next round.
>>>>>>>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>>>>>>>>   monitor related failures there. I suspect that I need to go take a
>>>>>>>>>>   look at the C2 RTM macro assembler code and look for things that might
>>>>>>>>>>   conflict with Async Monitor Deflation. If you're interested in that kind
>>>>>>>>>>   of issue, then see the macroAssembler_x86.cpp sanity check that I
>>>>>>>>>>   added in this round!
>>>>>>>>>>
>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>>>>>>>> Greetings,
>>>>>>>>>>>
>>>>>>>>>>> I have a fix for an issue that came up during performance testing.
>>>>>>>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>>>>>>>> experiments.
>>>>>>>>>>>
>>>>>>>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>>>>>>>> verbose due to the complexity of the issue, but the changes
>>>>>>>>>>> themselves are not that big.
>>>>>>>>>>>
>>>>>>>>>>> Functional:
>>>>>>>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>>>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>>>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>>>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>>>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>>>>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>>>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>>>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>>>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>>>>>>>         safepoints that simply request async deflation which
>>>>>>>>>>>         keeps the async JavaThreads from making progress on
>>>>>>>>>>>         their async deflation work.
>>>>>>>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>>>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>>>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>>>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>>>>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>>>>>>>     - is_async_deflation_needed() returns true when
>>>>>>>>>>>       is_async_cleanup_requested() is true or when
>>>>>>>>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>>>>>>>>       AsyncDeflationInterval).
>>>>>>>>>>>     - if AsyncDeflateIdleMonitors Service_lock->wait() now waits for
>>>>>>>>>>>       at most GuaranteedSafepointInterval millis:
>>>>>>>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>>>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>>>>>>>         (default is 1000 millis/1 second)
>>>>>>>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>>>>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>>>>>>>>         This is to prevent async deflation from swamping the
>>>>>>>>>>>         ServiceThread.
>>>>>>>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>>>>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>>>>>>>>     of their in-use lists.
>>>>>>>>>>>     - The ServiceThread will check for async deflation work every
>>>>>>>>>>>       GuaranteedSafepointInterval.
>>>>>>>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>>>>>>>       async deflation work via is_async_deflation_requested.
>>>>>>>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>>>>>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>>>>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>>>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>>>>>>>     flag to reduce the in-use monitor population that is reported by
>>>>>>>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
>>>>>>>>>>>
>>>>>>>>>>> Test update:
>>>>>>>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>>>>>>>>     AsyncDeflateIdleMonitors.
>>>>>>>>>>>
>>>>>>>>>>> Collateral:
>>>>>>>>>>>   - Add/clarify/update some logging messages.
>>>>>>>>>>>
>>>>>>>>>>> Cleanup:
>>>>>>>>>>>   - Updated comments based on Karen's code review.
>>>>>>>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>>>>>>>     'async cleanup' -> 'async deflation'.
>>>>>>>>>>>     - comment and function name changes
>>>>>>>>>>>   - Clarify MonitorUsedDeflationThreshold description.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Main bug URL:
>>>>>>>>>>>
>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>
>>>>>>>>>>> The project is currently baselined on jdk-13+22.
>>>>>>>>>>>
>>>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>>>>>>>>>>
>>>>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>>>>>>>>>>
>>>>>>>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>>>>>>>
>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>>>
>>>>>>>>>>> The wiki doesn't say a whole lot about the async deflation invocation
>>>>>>>>>>> mechanism so I have to figure out how to add that content.
>>>>>>>>>>>
>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>>>>>>>> running now. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>>>>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have to run
>>>>>>>>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>>>>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>>>
>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>
>>>>>>>>>>>> I had some discussions with Karen about a race that was in the
>>>>>>>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>>>>>>>>>>>> theoretical and I had no test failures due to it. The fix is pretty
>>>>>>>>>>>> simple: remove the special case code for async deflation in the
>>>>>>>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>>>>>>>> for ObjectMonitor::enter() protection.
>>>>>>>>>>>>
>>>>>>>>>>>> During those discussions Karen also floated the idea of using the
>>>>>>>>>>>> ref_count field instead of the contentions field for the Async
>>>>>>>>>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>>>>>>>>>> change and I have run it through the usual stress and Mach5 testing
>>>>>>>>>>>> with no issues. It's also known as v2.03 (for those with the
>>>>>>>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>>>>>>>> Sorry for all the names...
>>>>>>>>>>>>
>>>>>>>>>>>> Main bug URL:
>>>>>>>>>>>>
>>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>>
>>>>>>>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>>>>>>>
>>>>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>>>>>>>>>>>
>>>>>>>>>>>> Here's the incremental webrev URL:
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>>>>>>>>>>
>>>>>>>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>>>>>>>
>>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>>>>
>>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>>>>>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>>>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>>>>>>>> Solaris-X64 release had the usual "Too large time diff" complaints.
>>>>>>>>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>>>>>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>>>>>>>> stress kit is running right now.
>>>>>>>>>>>>
>>>>>>>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>>>>>>>>>> the results and analyze them.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a small but important bug fix for the Async Monitor Deflation
>>>>>>>>>>>>> project ready to go. It's also known as v2.02 (for those with the
>>>>>>>>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs). Sorry
>>>>>>>>>>>>> for all the names...
>>>>>>>>>>>>>
>>>>>>>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline patch
>>>>>>>>>>>>> is out of our hair.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Main bug URL:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>>>
>>>>>>>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's the full webrev URL:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>>>>>>>>>>>
>>>>>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 changes:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>>>>>
>>>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>>>>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>>>>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is running
>>>>>>>>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>>>>>>>>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on product,
>>>>>>>>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>>>>> I'll start my stress kit on Linux-X64 sometime on Sunday (after
>>>>>>>>>>>>> my jdk-13+18 stress run is done).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>>>>>>>> testing is done.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I finally have CR1 for the Async Monitor Deflation project ready to
>>>>>>>>>>>>>> go. It's also known as v2.01 (for those with the patches) and as
>>>>>>>>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>>>>>>>>>>>> names...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Main bug URL:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Baseline bug fixes URL:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the latest version
>>>>>>>>>>>>>> of JDK-8153224...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR changes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>>>>>>>>>>> Mach5 tier[78] will be run later today. My stress kit on Solaris-X64
>>>>>>>>>>>>>> is running now. Linux-X64 stress testing will start on Sunday. I'm
>>>>>>>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>>>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's work on:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here's the webrev URL:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Earlier versions of this patch have been through several rounds of
>>>>>>>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>>>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>>>>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>>>>>>>>>> their own environments (including specJBB2015).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run Kitchensink
>>>>>>>>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>>>>>>>>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>>>>>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>>>>>>>>>> fastdebug and slowdebug).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> All of the testing done on earlier versions will be redone on the
>>>>>>>>>>>>>>> latest version of the patch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> P.S.
>>>>>>>>>>>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>>>>>>>>>>>> to characterize/analyze this failure for more than a week now. At
>>>>>>>>>>>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>>>>>>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>

From jianglizhou at google.com  Tue Nov 12 15:49:35 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Tue, 12 Nov 2019 07:49:35 -0800
Subject: RFR: JDK-8230413: Support Pre JDK 6 class with CDS
In-Reply-To: <f2b33c85-e811-48f5-ed82-6ee398ea1935@oracle.com>
References: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>
 <c69bf94f-e52d-f7cd-e9f7-252a71204f7b@oracle.com>
 <CALrW1jyog+Q5H-BXQxQEkPdtLhr_aB7ekOVzuq9Qi4qLQ-jqOg@mail.gmail.com>
 <f2b33c85-e811-48f5-ed82-6ee398ea1935@oracle.com>
Message-ID: <CALrW1jzcLJd+z7+OhAjS2j==nxO0nYA8WhLRBxzNWKdWd+kvXw@mail.gmail.com>

The use case for this RFE is when verification is explicitly disabled
with -Xverify:none by users. For example, users want to disable
verification when running trusted tools.
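
As a rough illustration of the runtime rule described in the quoted RFR
below (the archive's dump-time verification mode has to be at least as
strong as the mode of the current run), here is a minimal C++ sketch. The
VerifyMode enum, its ordering and archive_usable() are hypothetical names
made up for this note; they are not HotSpot identifiers and the real check
may be structured differently.

```cpp
// Hypothetical ordering of verification strictness, weakest first.
// These names are illustrative only and are not HotSpot identifiers.
enum class VerifyMode { None = 0, Remote = 1, All = 2 };

// Assumed rule: an archive is usable only if it was dumped with a
// verification mode at least as strong as the one requested at runtime.
bool archive_usable(VerifyMode dump_time, VerifyMode run_time) {
  return static_cast<int>(dump_time) >= static_cast<int>(run_time);
}
```

Under this assumed rule, an archive dumped with verification disabled is
only accepted by a run that also disables verification, which is the
property relied on in the crash-analysis argument quoted below.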

Regards,
Jiangli

On Mon, Nov 11, 2019 at 8:55 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>
> I wonder if there's a safer alternative. Are there tools that can add
> stackmaps to pre-JDK6 classes? That way they can be verified with the
> split verifier during CDS dump time.
>
> Thanks
> - Ioi
>
> On 11/11/19 4:25 PM, Jiangli Zhou wrote:
> > Hi David,
> >
> > Thanks for quick response!
> >
> > On Mon, Nov 11, 2019 at 3:12 PM David Holmes <david.holmes at oracle.com> wrote:
> >> Hi Jiangli,
> >>
> >> On 12/11/2019 8:12 am, Jiangli Zhou wrote:
> >>> Please review the following change that allows archiving
> >>> pre-JAVA_6_VERSION classes with -Xverify:none.
> >>>
> >>> webrev: http://cr.openjdk.java.net/~jiangli/8230413/webrev.00/
> >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8230413
> >>>
> > >>> Currently there are still a large number of existing classes (pre-built)
> > >>> with older class versions (< 50) in real world applications. Those
> > >>> classes are missing the benefit of archiving. Particularly, in some
> > >>> use cases, class verification can be safely disabled. For those use
> > >>> cases, supporting archiving pre JDK 6 classes shows good performance
> > >>> benefit. We can re-evaluate this support when -Xverify:none is removed
> > >>> in the future; hopefully the need to support class versions < 50
> > >>> will no longer be significant at that time.
> >>>
> > >>> This change brings back the pre-JDK-8198849 behavior. The runtime makes
> > >>> sure the dump-time verification mode is the same as or stronger than
> > >>> the current mode.
> >>>
> >>> A CSR may be needed for the change. Any thoughts on that?
> >> A CSR request is definitely required given that you are proposing to
> >> undo a change that was itself put in place via a CSR request! And given
> >> this is relaxing a "defense-in-depth" check which will result in
> >> increasing exploitability, I think you will need a very strong argument
> >> to justify this.
> > Thanks for confirming this! Will do.
> >
> >> Further this not only undoes JDK-8197972 but it also invalidates
> >> JDK-8155671 being closed as a duplicate of JDK-8197972. JDK-8155671
> >> requested a way to know if verification had been disabled, to help with
> >> analyzing crash reports, but instead we decided to not allow
> >> verification to be disabled.
> > I had some concerns about JDK-8155671 initially before making the
> > change, as it's a closed bug and my memory about the specific issue
> > was flushed out. I brought up the question in the bug. My take on
> > Ioi's response to my query about JDK-8155671 was that the
> > pre-JDK-8197972 behavior would not cause any security hole.
> >
> > Re-evaluating this particular behavior, I think the pre-JDK-8155671
> > behavior would actually match user intention better. If the user decides
> > to turn off verification in safe use cases, it seems to be a good idea to
> > honor that. With the new dynamic archiving capability, an archive could
> > be created the first time a particular application is run.
> > Not forcing verification when the user decides to disable it can avoid
> > unnecessary/unwanted overhead.
> >
> > If verification is turned off at dump time for application classes,
> > runtime does not allow execution without also turning off
> > verification. We can determine a crash is not caused by relaxed dump
> > time verification.
> >
> > Regards,
> > Jiangli
> >
> >> David
> >> -----
> >>
> >>
> >>
> >>> Tested with jtreg appcds tests.
> >>>
> >>> Best,
> >>> Jiangli
> >>>
>

From daniel.daugherty at oracle.com  Tue Nov 12 17:18:12 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 12 Nov 2019 12:18:12 -0500
Subject: RFR: 8233549: Thread interrupted state must only be accessed when
 not in a safepoint-safe state
In-Reply-To: <fe942c6d-4db1-2166-739f-702c1f6cb2ca@oracle.com>
References: <fe942c6d-4db1-2166-739f-702c1f6cb2ca@oracle.com>
Message-ID: <05b4ec18-1a93-7d3d-fb17-1ce2f5c27e11@oracle.com>

On 11/11/19 11:52 PM, David Holmes wrote:
> webrev: http://cr.openjdk.java.net/~dholmes/8233549/webrev/

src/hotspot/os/posix/os_posix.cpp
    L2078:   // Can't access interrupt state now we are _thread_blocked. If we've been
    L2079:   // interrupted since we checked above then _counter will be > 0.
        nit - grammar. Please consider:
             // Can't access interrupt state now that we are _thread_blocked. If we've
             // been interrupted since we checked above then _counter will be > 0.
src/hotspot/os/solaris/os_solaris.cpp
    L4924:   // Can't access interrupt state now we are _thread_blocked. If we've been
    L4925:   // interrupted since we checked above then _counter will be > 0.
        nit - grammar. Please consider:
             // Can't access interrupt state now that we are _thread_blocked. If we've
             // been interrupted since we checked above then _counter will be > 0.

src/hotspot/share/classfile/javaClasses.cpp
    No comments.

src/hotspot/share/prims/jvmtiEnv.cpp
    Hmmm... did the "non-JavaThread can't be interrupted" check also get
    pushed down?
    Update: Similar check is now in JvmtiRawMonitor::raw_wait().

src/hotspot/share/prims/jvmtiRawMonitor.cpp
    L239:     ThreadInVMfromNative tivm(jt);
    L240:     if (jt->is_interrupted(true)) {
    L241:         ret = M_INTERRUPTED;
    L242:     } else {
    L243:       ThreadBlockInVM tbivm(jt);
    L244:       jt->set_suspend_equivalent();
    L245:       if (millis <= 0) {
    L246:         self->_ParkEvent->park();
    L247:       } else {
    L248:         self->_ParkEvent->park(millis);
    L249:       }
    L250:     }
    L251:     // Return to VM before post-check of interrupt state
    L252:     if (jt->is_interrupted(true)) {
        The comment on L251 is better between L249 and L250 since that
        is where 'tbivm' gets destroyed and you transition back.

        You could have this comment before L252:

               // Must be in VM to safely access interrupt state:

        if you think you really need a comment there.
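
To make the scoping point concrete, here is a minimal, self-contained
sketch of a scoped state transition. FakeThread, State, ScopedState and
state_at_post_check() are hypothetical stand-ins invented for this note;
they only mimic the shape of L239-L252 and are not the HotSpot classes.

```cpp
// Hypothetical stand-ins for thread states; not the HotSpot enum.
enum class State { InNative, InVM, Blocked };

struct FakeThread {
  State state = State::InNative;
};

// Scoped transition: the constructor switches the thread to 'to' and the
// destructor restores the previous state when the enclosing block ends.
class ScopedState {
 public:
  ScopedState(FakeThread* t, State to) : _t(t), _saved(t->state) {
    t->state = to;
  }
  ~ScopedState() { _t->state = _saved; }
  ScopedState(const ScopedState&) = delete;
  ScopedState& operator=(const ScopedState&) = delete;

 private:
  FakeThread* _t;
  State _saved;
};

// Mimics the shape of L239-L252 and returns the state seen at the point
// where the post-check of the interrupt state would run.
State state_at_post_check(FakeThread* t, bool interrupted_early) {
  ScopedState in_vm(t, State::InVM);         // plays the role of 'tivm'
  if (!interrupted_early) {
    ScopedState blocked(t, State::Blocked);  // plays the role of 'tbivm'
    // A park() call would sit here; the thread is Blocked in this scope only.
  }  // 'blocked' is destroyed here, so the state is InVM again
  return t->state;  // evaluated before 'in_vm' is destroyed
}
```

In this sketch the transition back happens at the closing brace of the
else branch, which is why a "return to VM" comment reads most naturally
at that brace rather than after it.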

src/hotspot/share/prims/jvmtiRawMonitor.hpp
    No comments.

src/hotspot/share/runtime/objectMonitor.cpp
    You've moved the is_interrupted() check from after ThreadBlockInVM
    to before it. ThreadBlockInVM can block for a safepoint which widens
    the window for an interrupt to come in after the check on L1272 and
    before the thread parks on L1286 or L1288.

    Can this result in an unexpected park() where before we would have
    taken the "Intentionally empty" code path on L1283?

    What I'm worried about is whether we've opened a window where we
    do Object.wait(0) and that wait() is supposed to be interrupted.
    However, we lose that interrupt because it arrives in the now wider
    window between L1272 and L1286 and we never return from the wait(0).

    It is possible that I'm not remembering something about how interrupt()
    interacts with park().
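
One way to reason about this window is with a toy permit-style event. The
sketch below is only a model: PermitEvent and
waiter_returns_after_late_interrupt() are hypothetical names, and it
ASSUMES that the interrupting side also signals the event the waiter
parks on and that nothing resets the event inside the window. Whether
both hold for ObjectMonitor::wait() is exactly the open question above.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical permit-style event, loosely modelled on the idea of a park
// event; it is not the HotSpot ParkEvent class.
class PermitEvent {
 public:
  // Blocks until a permit is available, then consumes it.
  void park() {
    std::unique_lock<std::mutex> lock(_m);
    _cv.wait(lock, [this] { return _permit; });
    _permit = false;
  }
  // Makes a permit available; it stays set if nobody is parked yet.
  void unpark() {
    {
      std::lock_guard<std::mutex> lock(_m);
      _permit = true;
    }
    _cv.notify_one();
  }

 private:
  std::mutex _m;
  std::condition_variable _cv;
  bool _permit = false;
};

// Replays the ordering in question on one thread: early check, interrupt
// arriving in the window, then the park.
bool waiter_returns_after_late_interrupt() {
  PermitEvent ev;
  bool interrupted = false;
  bool saw_interrupt_early = interrupted;  // early check: nothing seen yet
  interrupted = true;                      // interrupt lands in the window
  ev.unpark();                             // assumed: interrupt signals event
  ev.park();                               // does not block: permit was set
  return !saw_interrupt_early && interrupted;
}
```

If either assumption fails (no signal, or a reset of the event after the
signal), the same ordering would leave the waiter parked, which is the
lost-interrupt scenario described above.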

test/hotspot/jtreg/ProblemList.txt
    Thanks for remembering to update the ProblemList.

The only part I'm worried about is ObjectMonitor::wait(). If my worry is
baseless, then thumbs up.

I have a couple of nits above. If you choose to fix those, then I don't
need to see another webrev.

Dan


> bug: https://bugs.openjdk.java.net/browse/JDK-8233549
>
> In JDK-8229516 I moved the interrupted state of a thread from the 
> osThread in the VM to the java.lang.Thread instance. In doing that I 
> overlooked a critical aspect, which is that to access the field of a 
> Java object the JavaThread must not be in a safepoint-safe state** - 
> otherwise the oop, and anything referenced there from could be 
> relocated by the GC whilst the JavaThread is accessing it. This 
> manifested in a number of tests using JVM TI Agent threads and JVM TI 
> RawMonitors because the JavaThread's were marked _thread_blocked and 
> hence safepoint-safe, and we read a non-zero value for the interrupted 
> field even though we had never been interrupted.
>
> This problem existed in all the code that checks for interruption when 
> "waiting":
>
> - Parker::park (the code underpinning 
> java.util.concurrent.LockSupport.park())
>
> To fix this code I simply deleted a late check of the interrupted 
> field. The check was not needed because if an interrupt has occurred 
> then we will find the ParkEvent in a signalled state.
>
> - ObjectMonitor::wait
>
> Here the late check of the interrupted state is essential as we reset 
> the ParkEvent after an earlier check of the interrupted state. But the 
> fix was simply achieved by moving the check slightly earlier, before we 
> use ThreadBlockInVM to become _thread_blocked.
>
> - RawMonitor::wait
>
> This fix was much more involved. The RawMonitor code directly 
> transitions the JavaThread from _thread_in_native to _thread_blocked. 
> This is safe from a safepoint perspective because they are equivalent 
> safepoint-safe states. To allow access to the interrupted field I have 
> to transition from native to _thread_in_vm, and that has to be done by 
> proper thread-state transitions to ensure correct access to the oop 
> and its fields. Having done that I can then use ThreadBlockInVM for 
> the transitions to blocked. However, as the old code noted it can't 
> use proper thread-state transitions as this will lead to deadlocks 
> with the VMThread that can also use RawMonitors when executing various 
> event callbacks. To deal with that we have to note that the real 
> constraint is that the JavaThread cannot block at a safepoint whilst 
> it holds the RawMonitor. Hence the fix was to push all the interrupt 
> checking code and the thread-state transitions to the lowest level of 
> RawMonitorWait, around the final park() call, after we have enqueued 
> the waiter and released the monitor. That avoids any deadlock 
> possibility.
>
> I also added checks to is_interrupted/interrupted to ensure they are 
> only called by a thread in a suitable state. This should only be the 
> VMThread (as a consequence of the Thread.stop implementation occurring 
> at a safepoint and issuing a JavaThread::interrupt() call to unblock 
> the target); or a JavaThread that is not _thread_in_native or 
> _thread_blocked.
>
> Testing: (still finalizing)
>  - tiers 1 - 6 (Oracle platforms)
>  - Local Linux testing
>    - vmTestbase/nsk/monitoring/
>    - vmTestbase/nsk/jdwp
>    - vmTestbase/nsk/jdb/
>    - vmTestbase/nsk/jdi/
>    - vmTestbase/nsk/jvmti/
>    - serviceability/jvmti/
>    - serviceability/jdwp
>    - JDK: java/lang/management
>           com/sun/management
>
> ** Note that this applies to all accesses we make via code in 
> javaClasses.*. For this particular code I thought about adding a guard 
> in JavaThread::threadObj() but it turns out when we generate a crash 
> report we access the Thread's name() field and that can happen when in 
> any state, so we'd always trigger a secondary assertion failure during 
> error reporting if we did that. Note that accessing name() can still 
> easily lead to secondary assertion failures as I discovered when 
> trying to debug this and print the thread name out - I would see an 
> is_instance assertion fail checking that the Thread name() is an 
> instance of java.lang.String!
>
> Thanks,
> David
> -----


From Alan.Hayward at arm.com  Tue Nov 12 18:03:02 2019
From: Alan.Hayward at arm.com (Alan Hayward)
Date: Tue, 12 Nov 2019 18:03:02 +0000
Subject: [aarch64-port-dev ] RFR 8231841: AArch64: Add entry to pns output in
 help()
Message-ID: <BCE2D257-945C-4BFC-930E-D8922C59B6A6@arm.com>

Please could you review this change which adds AArch64 to the pns section of the help() output.

Bug: https://bugs.openjdk.java.net/browse/JDK-8231841
Webrev: http://cr.openjdk.java.net/~smonteith/8231841/webrev.0/


Built and ran tier1 on x86 and AArch64.


Thanks,
Alan.

From thomas.stuefe at gmail.com  Tue Nov 12 20:15:38 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Tue, 12 Nov 2019 21:15:38 +0100
Subject: RFR: 8233785: Incorrect JDK version is reported in hs_err log
In-Reply-To: <e4a5a5c4-c9de-a93a-5f59-3528c534a6df@oss.nttdata.com>
References: <d9a24903-9053-06a0-e74b-7bfb43370767@oss.nttdata.com>
 <6811d542-a530-5d70-5fd6-bea47de81d35@oracle.com>
 <317c088d-687c-c9a5-cc7f-c6744ea275a5@oracle.com>
 <e4a5a5c4-c9de-a93a-5f59-3528c534a6df@oss.nttdata.com>
Message-ID: <CAA-vtUzMfY34XR-NXSrpoyOpfsTTC5DVyfvk=mWtDU=qmKVgSg@mail.gmail.com>

Hi Yasumasa,

looks good. Could you please add a small comment mentioning JEP 223? E.g.
just a simple "/* See JEP 223 */"

I do not need to see another webrev.

Thanks, Thomas


On Mon, Nov 11, 2019 at 3:07 PM Yasumasa Suenaga <suenaga at oss.nttdata.com>
wrote:

> Thanks David!
> I'll wait for a second reviewer.
>
> Yasumasa
>
> On 2019/11/11 19:28, David Holmes wrote:
> > Sorry for the delay.
> >
> > Just confirming I've verified against the spec from JEP 223 and this fix
> > is correct.
> >
> > Thanks,
> > David
> >
> > On 7/11/2019 10:39 pm, David Holmes wrote:
> >> Hi Yasumasa,
> >>
> >> On 7/11/2019 10:28 pm, Yasumasa Suenaga wrote:
> >>> Hi all,
> >>>
> >>> Please review this change:
> >>>
> >>>    JBS: https://bugs.openjdk.java.net/browse/JDK-8233785
> >>>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233785/webrev.00/
> >>>
> >>> If a JVM configured with --with-version-patch crashes, the JDK
> >>> version in the hs_err log is incorrect.
> >>> We get an hs_err log that contains the following in its header when we
> >>> configure with "--with-version-update=0 --with-version-patch=1":
> >>>
> >>> ```
> >>> # JRE version: OpenJDK Runtime Environment (14.0.1+2) (build 14.0.0.1+2-TypeS)
> >>> ```
> >>>
> >>> The valid JDK version is "14.0.0.1", but the header shows "14.0.1".
> >>> It is a bug in JDK_Version::to_string().
> >>
> >> I initially missed the fact that you always print _security along with
> >> _patch.
> >>
> >> I think what you have looks correct, but I'd want to double check that
> >> against the versioning spec to be sure.
> >>
> >> Thanks,
> >> David
> >>
> >>>
> >>> Thanks,
> >>>
> >>> Yasumasa
>
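
For readers following the quoted report, the intended formatting rule can
be sketched as below. This is only a minimal sketch under stated
assumptions, not the actual JDK_Version::to_string() code: VersionSketch
and to_version_string() are hypothetical names, and the rule shown (print
the security element whenever a patch element follows it, even if the
security element is zero) is inferred from the "14.0.0.1" vs "14.0.1"
example above.

```cpp
#include <string>

// Hypothetical stand-in for the version fields discussed in the thread;
// this is not the real JDK_Version class.
struct VersionSketch {
  int major_version;
  int minor_version;
  int security;
  int patch;
  int build;
};

// Formats "<major>.<minor>[.<security>[.<patch>]][+<build>]".
// Assumption: a non-zero patch forces the security element to be printed,
// otherwise the patch value would slide into the security position.
std::string to_version_string(const VersionSketch& v) {
  std::string s = std::to_string(v.major_version) + "." +
                  std::to_string(v.minor_version);
  if (v.security > 0 || v.patch > 0) {
    s += "." + std::to_string(v.security);
  }
  if (v.patch > 0) {
    s += "." + std::to_string(v.patch);
  }
  if (v.build > 0) {
    s += "+" + std::to_string(v.build);
  }
  return s;
}
```

With security == 0 and patch == 1, the sketch keeps the zero security
element in place instead of dropping it, which is the difference between
the two strings in the report.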

From harold.seigel at oracle.com  Tue Nov 12 20:40:40 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Tue, 12 Nov 2019 15:40:40 -0500
Subject: RFR: JDK-8230413: Support Pre JDK 6 class with CDS
In-Reply-To: <f2b33c85-e811-48f5-ed82-6ee398ea1935@oracle.com>
References: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>
 <c69bf94f-e52d-f7cd-e9f7-252a71204f7b@oracle.com>
 <CALrW1jyog+Q5H-BXQxQEkPdtLhr_aB7ekOVzuq9Qi4qLQ-jqOg@mail.gmail.com>
 <f2b33c85-e811-48f5-ed82-6ee398ea1935@oracle.com>
Message-ID: <c3a4ff42-e3b5-98a9-138c-358d14ac1ada@oracle.com>

Hi Jiangli,

I think this change is going in the wrong direction.  We are trying to 
discourage disabling verification, not encourage it.  We also do not 
want to create more use-cases for preserving -Xverify:none.

It looks like your change would allow archiving of unverified pre-JDK6 
classes, but not allow archiving of verified pre-JDK6 classes.  If so, 
that seems backward.

Thanks, Harold

On 11/11/2019 11:53 PM, Ioi Lam wrote:
> I wonder if there's a safer alternative. Are there tools that can add 
> stackmaps to pre-JDK6 classes? That way they can be verified with the 
> split verifier during CDS dump time.
>
> Thanks
> - Ioi
>
> On 11/11/19 4:25 PM, Jiangli Zhou wrote:
>> Hi David,
>>
>> Thanks for quick response!
>>
>> On Mon, Nov 11, 2019 at 3:12 PM David Holmes 
>> <david.holmes at oracle.com> wrote:
>>> Hi Jiangli,
>>>
>>> On 12/11/2019 8:12 am, Jiangli Zhou wrote:
>>>> Please review the following change that allows archiving
>>>> pre-JAVA_6_VERSION classes with -Xverify:none.
>>>>
>>>> webrev: http://cr.openjdk.java.net/~jiangli/8230413/webrev.00/
>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8230413
>>>>
>>>> Currently there are still a large number of existing classes (pre-built)
>>>> with older class versions (< 50) in real world applications. Those
>>>> classes are missing the benefit of archiving. Particularly, in some
>>>> use cases, class verification can be safely disabled. For those use
>>>> cases, supporting archiving pre JDK 6 classes shows good performance
>>>> benefit. We can re-evaluate this support when -Xverify:none is removed
>>>> in the future, hopefully the needs for supporting class version < 50
>>>> is no longer significant at that time.
>>>>
>>>> This change brings back the pre-JDK-8198849 behavior. Runtime makes
>>>> sure the dump-time verification mode must be the same or stronger than
>>>> the current mode.
>>>>
>>>> A CSR may be needed for the change. Any thoughts on that?
>>> A CSR request is definitely required given that you are proposing to
>>> undo a change that was itself put in place via a CSR request! And given
>>> this is relaxing a "defense-in-depth" check which will result in
>>> increasing exploitability, I think you will need a very strong argument
>>> to justify this.
>> Thanks for confirming this! Will do.
>>
>>> Further this not only undoes JDK-8197972 but it also invalidates
>>> JDK-8155671 being closed as a duplicate of JDK-8197972. JDK-8155671
>>> requested a way to know if verification had been disabled, to help with
>>> analyzing crash reports, but instead we decided to not allow
>>> verification to be disabled.
>> I had some concerns about JDK-8155671 initially before making the
>> change, as it's a closed bug and my memory about the specific issue
>> was flushed out. I brought up the question in the bug. My take on
>> Ioi's response to my query about JDK-8155671 was that the
>> pre-JDK-8197972 behavior would not cause any security hole.
>>
>> Re-evaluating this particular behavior, I think the pre-JDK-8155671
>> behavior would actually match user intention better. If user decides to turn
>> off verification in safe use cases, it seems to be a good idea to
>> honor that. With the new dynamic archiving capability, archive could
>> be created at the first time when running a particular application.
>> Not forcing verification when user decides to can avoid
>> unnecessary/unwanted overhead.
>>
>> If verification is turned off at dump time for application classes,
>> runtime does not allow execution without also turning off
>> verification. We can determine a crash is not caused by relaxed dump
>> time verification.
>>
>> Regards,
>> Jiangli
>>
>>> David
>>> -----
>>>
>>>
>>>
>>>> Tested with jtreg appcds tests.
>>>>
>>>> Best,
>>>> Jiangli
>>>>
>

From daniel.daugherty at oracle.com  Tue Nov 12 22:24:24 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 12 Nov 2019 17:24:24 -0500
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <383a1330-1e3d-66db-c95b-9e6f9910641f@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
 <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
 <590a7395-8fda-caf3-402a-29393fd34469@oracle.com>
 <7dedd65e-f004-b0b4-0c04-63bc484198ac@oracle.com>
 <383a1330-1e3d-66db-c95b-9e6f9910641f@oracle.com>
Message-ID: <fc503d62-b1f6-5842-85c7-f230fc942f5b@oracle.com>

Greetings,

I'm only going to jump in on a single item here (at this time).


On 11/11/19 9:03 AM, David Holmes wrote:
> Hi Robbin,
>
> On 11/11/2019 10:41 pm, Robbin Ehn wrote:
>>
>> Also in this patch there is already Atomic::store/load on "volatile 
>> markWord _header;".
>
> And I've flagged the inappropriateness of using these with Dan. Though 
> I see we already have a couple of pre-existing occurrences which have 
> snuck in - again this seems to be a misunderstanding about the need 
> for Atomic use in these cases.

I read this comment from Robbin and David's reply and my brain said: What?
It might have been just the common three letter acronym, but I 
digress... :-)

So I searched the patch:

$ grep _header 11-for-jdk14.v2.08.full/open.patch | egrep 'store|load'
    assert(Atomic::load(&_header).value() != 0, "must be non-zero");
+  Atomic::store(markWord::zero(), &_header);
-  Atomic::store(markWord::zero(), &_header);

Oh... that code... If you look at the 8153224 webrev you'll see that the
ObjectMonitor::clear() function was refactored into two parts:

    ObjectMonitor::clear()
    ObjectMonitor::clear_using_JT()

This line:

    Atomic::store(markWord::zero(), &_header);

is in the original ObjectMonitor::clear() function and it is in
the new ObjectMonitor::clear() function, but there's a lot of
code motion in between those two functions so... diff took the
easy way out and showed this:

    +  Atomic::store(markWord::zero(), &_header);

as a new line in the shorter, new clear() function
and as a deleted line in the longer, old clear() function:

    -  Atomic::store(markWord::zero(), &_header);


Okay, so where did that come from? I've been tweaking a lot of
ObjectMonitor code lately, but I don't think that line is mine...

$ hg annot src/hotspot/share/runtime/objectMonitor.inline.hpp | grep 'Atomic::store(markWord::zero(), &_header);'
56006:   Atomic::store(markWord::zero(), &_header);

$ hg log -r 56006
changeset:   56006:90ead0febf56
user:        stefank
date:        Tue Aug 06 10:48:21 2019 +0200
summary:     8229258: Rework markOop and markOopDesc into a simpler mark word value carrier

Okay, I remember this bug:

https://bugs.openjdk.java.net/browse/JDK-8229258

and I even reviewed it... :-)  It looks like the reviewers are:

 > Reviewed-by: rkennke, coleenp, kbarrett, dcubed


Roman posted this comment during the 8229258 review:

On 8/15/19 1:06 PM, Roman Kennke wrote:

> Out of curiosity, what's with the changes in objectMonitor.inline.hpp to
> access the markWord atomically?:
>
> -inline markOop ObjectMonitor::header() const {
> -  return _header;
> +inline markWord ObjectMonitor::header() const {
> +  return Atomic::load(&_header);
>   }
>
> I guess this is good (equal or stronger than before) but is there a
> rationale behind these changes?

and Stefan K replied with this:

On 8/15/19 3:26 PM, Stefan Karlsson wrote:

> Ahh. Right. That was done to solve the problems I was having with 
> volatiles. For example:
> src/hotspot/share/runtime/objectMonitor.inline.hpp:38:10: error: 
> binding reference of type 'const markWord&' to 'const volatile 
> markWord' discards qualifiers
>    return _header;
>
> and:
> src/hotspot/share/runtime/basicLock.hpp:40:74: error: implicit 
> dereference will not access object of type 'volatile markWord' in 
> statement [-Werror]
>   void         set_displaced_header(markWord header) { 
> _displaced_header = header; }
>
> Kim suggested that the fact that these fields were volatile was an 
> indication that we should be doing some kind of atomic/ordered 
> operation. By replacing these loads and stores with calls to the 
> Atomic APIs, and providing the 
> PrimitiveConversions::Translate<markWord> specialization, we could 
> solve that problem. 

So it appears that Stefan has a good rationale for making the
Atomic::load() and Atomic::store() changes with the _header field.
Since I've added more volatile fields to ObjectMonitor, it would
follow that I should make similar changes...

However, it's not clear that David agrees with the above change so
I'm hesitant to make the similar changes to my patch...

How do we resolve this issue?

Dan

From david.holmes at oracle.com  Tue Nov 12 22:50:15 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 13 Nov 2019 08:50:15 +1000
Subject: RFR: 8233549: Thread interrupted state must only be accessed when
 not in a safepoint-safe state
In-Reply-To: <05b4ec18-1a93-7d3d-fb17-1ce2f5c27e11@oracle.com>
References: <fe942c6d-4db1-2166-739f-702c1f6cb2ca@oracle.com>
 <05b4ec18-1a93-7d3d-fb17-1ce2f5c27e11@oracle.com>
Message-ID: <96cf9e10-a3df-fc5a-2cfa-ac156c10f99c@oracle.com>

Hi Dan,

Thanks for taking a look so quickly!

On 13/11/2019 3:18 am, Daniel D. Daugherty wrote:
> On 11/11/19 11:52 PM, David Holmes wrote:
>> webrev: http://cr.openjdk.java.net/~dholmes/8233549/webrev/
> 
> src/hotspot/os/posix/os_posix.cpp
>      L2078:   // Can't access interrupt state now we are _thread_blocked. If we've been
>      L2079:   // interrupted since we checked above then _counter will be > 0.
>          nit - grammar. Please consider:
>               // Can't access interrupt state now that we are _thread_blocked. If we've
>               // been interrupted since we checked above then _counter will be > 0.
> 
> src/hotspot/os/solaris/os_solaris.cpp
>      L4924:   // Can't access interrupt state now we are _thread_blocked. If we've been
>      L4925:   // interrupted since we checked above then _counter will be > 0.
>          nit - grammar. Please consider:
>               // Can't access interrupt state now that we are _thread_blocked. If we've
>               // been interrupted since we checked above then _counter will be > 0.

Will fix grammatical nits.

> src/hotspot/share/classfile/javaClasses.cpp
>      No comments.
> 
> src/hotspot/share/prims/jvmtiEnv.cpp
>      Hmmm... did the "non-JavaThread can't be interrupted" check also get
>      pushed down?
>      Update: Similar check is now in JvmtiRawMonitor::raw_wait().
> 
> src/hotspot/share/prims/jvmtiRawMonitor.cpp
>      L239:     ThreadInVMfromNative tivm(jt);
>      L240:     if (jt->is_interrupted(true)) {
>      L241:         ret = M_INTERRUPTED;
>      L242:     } else {
>      L243:       ThreadBlockInVM tbivm(jt);
>      L244:       jt->set_suspend_equivalent();
>      L245:       if (millis <= 0) {
>      L246:         self->_ParkEvent->park();
>      L247:       } else {
>      L248:         self->_ParkEvent->park(millis);
>      L249:       }
>      L250:     }
>      L251:     // Return to VM before post-check of interrupt state
>      L252:     if (jt->is_interrupted(true)) {
>          The comment on L251 is better between L249 and L250 since that
>          is where 'tbivm' gets destroyed and you transition back.
> 
>          You could have this comment before L252:
> 
>                 // Must be in VM to safely access interrupt state:
> 
>          if you think you really need a comment there.

Will move comment up as suggested.
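
To illustrate why the comment belongs at the end of the else block, here
is a minimal, self-contained sketch of scope-based state transitions. All
names (State, ThreadModel, ScopedState, raw_wait_sketch) are hypothetical
stand-ins and not HotSpot classes; the guard merely plays the role that
ThreadInVMfromNative and ThreadBlockInVM play in the quoted code, and the
park() call is reduced to a log entry.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical thread states; they only mirror the names used in the review.
enum class State { in_native, in_vm, blocked };

struct ThreadModel {
  State state = State::in_native;
  bool interrupted = false;
  std::vector<std::string> log;

  // Mirrors the rule from the review: only read the flag while in_vm.
  bool is_interrupted() {
    assert(state == State::in_vm);
    log.push_back("check");
    return interrupted;
  }
};

// RAII guard: enters a state on construction, restores the previous state
// on destruction.
class ScopedState {
  ThreadModel& _t;
  State _saved;
  std::string _name;
 public:
  ScopedState(ThreadModel& t, State to, const std::string& name)
      : _t(t), _saved(t.state), _name(name) {
    _t.state = to;
    _t.log.push_back("enter " + _name);
  }
  ~ScopedState() {
    _t.state = _saved;
    _t.log.push_back("leave " + _name);
  }
  ScopedState(const ScopedState&) = delete;
  ScopedState& operator=(const ScopedState&) = delete;
};

bool raw_wait_sketch(ThreadModel& t) {
  ScopedState tivm(t, State::in_vm, "in_vm");
  if (!t.is_interrupted()) {
    ScopedState tbivm(t, State::blocked, "blocked");
    t.log.push_back("park");  // stand-in for the real park() call
    // Return to VM before post-check of interrupt state:
    // tbivm is destroyed at the closing brace below.
  }
  // Must be in VM to safely access interrupt state.
  return t.is_interrupted();
}
```

In this sketch the "leave blocked" entry is always logged before the
second "check", because the guard's destructor runs at the end of the
inner block.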

> src/hotspot/share/prims/jvmtiRawMonitor.hpp
>      No comments.
> 
> src/hotspot/share/runtime/objectMonitor.cpp
>      You've moved the is_interrupted() check from after ThreadBlockInVM
>      to before it. ThreadBlockInVM can block for a safepoint which widens
>      the window for an interrupt to come in after the check on L1272 and
>      before the thread parks on L1286 or L1288.
> 
>      Can this result in an unexpected park() where before we would have
>      taken the "Intentionally empty" code path on L1283?
> 
>      What I'm worried about is whether we've opened a window where we
>      do Object.wait(0) and that wait() is supposed to be interrupted.
>      However, we lose that interrupt because it arrives in the now wider
>      window between L1272 and L1286 and we never return from the wait(0).
> 
>      It is possible that I'm not remembering something about how interrupt()
>      interacts with park().

The interrupt() not only sets the field but also issues an unpark() to 
the ParkEvent. So if we are interrupted whilst processing through the 
TBIVM, the call to park() will return immediately as the ParkEvent will 
be in the signalled state.
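
A minimal sketch of that argument, using standard C++ primitives rather
than HotSpot's ParkEvent: EventSketch, WaiterSketch, interrupt() and
wait_sketch() are all hypothetical names, and the "window" callback is
only an assumption used to model an interrupt arriving after the check
but before the park.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Hypothetical event with a sticky "signalled" permit.
class EventSketch {
  std::mutex _m;
  std::condition_variable _cv;
  bool _signalled = false;
 public:
  void reset() {
    std::lock_guard<std::mutex> g(_m);
    _signalled = false;
  }
  void unpark() {
    {
      std::lock_guard<std::mutex> g(_m);
      _signalled = true;
    }
    _cv.notify_one();
  }
  // Blocks until signalled; returns at once if unpark() already happened.
  void park() {
    std::unique_lock<std::mutex> g(_m);
    _cv.wait(g, [this] { return _signalled; });
    _signalled = false;  // consume the permit
  }
};

struct WaiterSketch {
  std::atomic<bool> interrupted{false};
  EventSketch event;
};

// Sets the flag and also signals the event, as described above.
void interrupt(WaiterSketch& w) {
  w.interrupted.store(true);
  w.event.unpark();
}

// Returns true if the wait ended because of an interrupt.
bool wait_sketch(WaiterSketch& w, void (*window)(WaiterSketch&)) {
  w.event.reset();
  if (w.interrupted.load()) {
    return true;              // check made before becoming blocked
  }
  window(w);                  // models the widened window during the transition
  w.event.park();             // does not block if the interrupt arrived in the window
  return w.interrupted.load();
}
```

If the interrupt is delivered inside the window, park() finds the event
already signalled and returns immediately, so the interrupt is not lost.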

> test/hotspot/jtreg/ProblemList.txt
>      Thanks for remembering to update the ProblemList.
> 
> The only part I'm worried about is ObjectMonitor::wait(). If my worry is
> baseless, then thumbs up.

Worry is baseless :)

> I have a couple of nits above. If you choose to fix those, then I don't
> need to see another webrev.

Thanks again!

David
-----

> Dan
> 
> 
>> bug: https://bugs.openjdk.java.net/browse/JDK-8233549
>>
>> In JDK-8229516 I moved the interrupted state of a thread from the 
>> osThread in the VM to the java.lang.Thread instance. In doing that I 
>> overlooked a critical aspect, which is that to access the field of a 
>> Java object the JavaThread must not be in a safepoint-safe state** - 
>> otherwise the oop, and anything referenced there from could be 
>> relocated by the GC whilst the JavaThread is accessing it. This 
>> manifested in a number of tests using JVM TI Agent threads and JVM TI 
>> RawMonitors because the JavaThread's were marked _thread_blocked and 
>> hence safepoint-safe, and we read a non-zero value for the interrupted 
>> field even though we had never been interrupted.
>>
>> This problem existed in all the code that checks for interruption when 
>> "waiting":
>>
>> - Parker::park (the code underpinning 
>> java.util.concurrent.LockSupport.park())
>>
>> To fix this code I simply deleted a late check of the interrupted 
>> field. The check was not needed because if an interrupt has occurred 
>> then we will find the ParkEvent in a signalled state.
>>
>> - ObjectMonitor::wait
>>
>> Here the late check of the interrupted state is essential as we reset 
>> the ParkEvent after an earlier check of the interrupted state. But the 
>> fix was simply achieved by moving the check slightly earlier, before we 
>> use ThreadBlockInVM to become _thread_blocked.
>>
>> - RawMonitor::wait
>>
>> This fix was much more involved. The RawMonitor code directly 
>> transitions the JavaThread from _thread_in_native to _thread_blocked. 
>> This is safe from a safepoint perspective because they are equivalent 
>> safepoint-safe states. To allow access to the interrupted field I have 
>> to transition from native to _thread_in_vm, and that has to be done by 
>> proper thread-state transitions to ensure correct access to the oop 
>> and its fields. Having done that I can then use ThreadBlockInVM for 
>> the transitions to blocked. However, as the old code noted it can't 
>> use proper thread-state transitions as this will lead to deadlocks 
>> with the VMThread that can also use RawMonitors when executing various 
>> event callbacks. To deal with that we have to note that the real 
>> constraint is that the JavaThread cannot block at a safepoint whilst 
>> it holds the RawMonitor. Hence the fix was to push all the interrupt 
>> checking code and the thread-state transitions to the lowest level of 
>> RawMonitorWait, around the final park() call, after we have enqueued 
>> the waiter and released the monitor. That avoids any deadlock 
>> possibility.
>>
>> I also added checks to is_interrupted/interrupted to ensure they are 
>> only called by a thread in a suitable state. This should only be the 
>> VMThread (as a consequence of the Thread.stop implementation occurring 
>> at a safepoint and issuing a JavaThread::interrupt() call to unblock 
>> the target); or a JavaThread that is not _thread_in_native or 
>> _thread_blocked.
>>
>> Testing: (still finalizing)
>>  - tiers 1 - 6 (Oracle platforms)
>>  - Local Linux testing
>>    - vmTestbase/nsk/monitoring/
>>    - vmTestbase/nsk/jdwp
>>    - vmTestbase/nsk/jdb/
>>    - vmTestbase/nsk/jdi/
>>    - vmTestbase/nsk/jvmti/
>>    - serviceability/jvmti/
>>    - serviceability/jdwp
>>    - JDK: java/lang/management
>>           com/sun/management
>>
>> ** Note that this applies to all accesses we make via code in 
>> javaClasses.*. For this particular code I thought about adding a guard 
>> in JavaThread::threadObj() but it turns out when we generate a crash 
>> report we access the Thread's name() field and that can happen when in 
>> any state, so we'd always trigger a secondary assertion failure during 
>> error reporting if we did that. Note that accessing name() can still 
>> easily lead to secondary assertion failures as I discovered when 
>> trying to debug this and print the thread name out - I would see an 
>> is_instance assertion fail checking that the Thread name() is an 
>> instance of java.lang.String!
>>
>> Thanks,
>> David
>> -----
> 

From david.holmes at oracle.com  Tue Nov 12 23:12:14 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 13 Nov 2019 09:12:14 +1000
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <fc503d62-b1f6-5842-85c7-f230fc942f5b@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
 <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
 <590a7395-8fda-caf3-402a-29393fd34469@oracle.com>
 <7dedd65e-f004-b0b4-0c04-63bc484198ac@oracle.com>
 <383a1330-1e3d-66db-c95b-9e6f9910641f@oracle.com>
 <fc503d62-b1f6-5842-85c7-f230fc942f5b@oracle.com>
Message-ID: <fa7d8cb8-a948-10aa-6c82-d81a2271af49@oracle.com>

Hi Dan,

On 13/11/2019 8:24 am, Daniel D. Daugherty wrote:
> Greetings,
> 
> I'm only going to jump in on a single item here (at this time).
> 
> 
> On 11/11/19 9:03 AM, David Holmes wrote:
>> Hi Robbin,
>>
>> On 11/11/2019 10:41 pm, Robbin Ehn wrote:
>>>
>>> Also in this patch there is already Atomic::store/load on "volatile 
>>> markWord _header;".
>>
>> And I've flagged the inappropriateness of using these with Dan. Though 
>> I see we already have a couple of pre-existing occurrences which have 
>> snuck in - again this seems to be a misunderstanding about the need 
>> for Atomic use in these cases.
> 
> I read this comment from Robbin and David's reply and my brain said: What?
> It might have been just the common three letter acronym, but I 
> digress... :-)

First let me clarify that my comment quoted above may have been too 
broad/general. I wasn't recalling specific changes where you added 
Atomic::load/store but email exchanges where you said words to the 
effect of "I could replace ... with Atomic::store ..." and I replied 
that there was no need to use Atomic::load/store.

> So I searched the patch:

Thanks for digging that out, as I hadn't recalled those details.

Short version: I've agreed with Robbin that we should move to use 
Atomic::load/store to get compiler-based atomicity rather than relying 
on use of "volatile" on variables. But for the purposes of this patch 
(where Robbin made a number of suggestions on where to use 
Atomic::load/store) we should try to limit this new style to 
inherently new code (i.e. lock-free list management) rather than 
retrofitting all existing usages.
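
As a rough illustration of the "new style", here is a minimal sketch that
uses std::atomic from standard C++ as a stand-in for HotSpot's Atomic
API. MarkWordSketch and MonitorSketch are hypothetical types, and the use
of relaxed ordering is an assumption chosen to model a plain atomic
load/store without additional ordering.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical value class standing in for markWord.
class MarkWordSketch {
  uintptr_t _value;
 public:
  explicit MarkWordSketch(uintptr_t v) : _value(v) {}
  static MarkWordSketch zero() { return MarkWordSketch(0); }
  uintptr_t value() const { return _value; }
};

// Hypothetical monitor with a single header field. Reads and writes go
// through explicit atomic accessors instead of a volatile-qualified field,
// which also avoids copying from a volatile object of class type.
class MonitorSketch {
  std::atomic<MarkWordSketch> _header;
 public:
  explicit MonitorSketch(MarkWordSketch m) : _header(m) {}

  MarkWordSketch header() const {
    return _header.load(std::memory_order_relaxed);
  }
  void set_header(MarkWordSketch m) {
    _header.store(m, std::memory_order_relaxed);
  }
  void clear() {
    set_header(MarkWordSketch::zero());
  }
};
```

The accessors make every cross-thread access visible at the call site,
which is the property the volatile qualifier alone does not guarantee.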

Hope that clarifies somewhat.

Thanks,
David
-----

> $ grep _header 11-for-jdk14.v2.08.full/open.patch | egrep 'store|load'
>      assert(Atomic::load(&_header).value() != 0, "must be non-zero");
> +  Atomic::store(markWord::zero(), &_header);
> -  Atomic::store(markWord::zero(), &_header);
> 
> Oh... that code... If you look at the 8153224 webrev you'll see that the
> ObjectMonitor::clear() function was refactored into two parts:
> 
>      ObjectMonitor::clear()
>      ObjectMonitor::clear_using_JT()
> 
> This line:
> 
>      Atomic::store(markWord::zero(), &_header);
> 
> is in the original ObjectMonitor::clear() function and it is in
> the new ObjectMonitor::clear() function, but there's a lot of
> code motion in between those two functions so... diff took the
> easy way out and showed this:
> 
>      +  Atomic::store(markWord::zero(), &_header);
> 
> as a new line in the shorter, new clear() function
> and as a deleted line in the longer, old clear() function:
> 
>      -  Atomic::store(markWord::zero(), &_header);
> 
> 
> Okay, so where did that come from? I've been tweaking a lot of
> ObjectMonitor code lately, but I don't think that line is mine...
> 
> $ hg annot src/hotspot/share/runtime/objectMonitor.inline.hpp | grep 'Atomic::store(markWord::zero(), &_header);'
> 56006:   Atomic::store(markWord::zero(), &_header);
> 
> $ hg log -r 56006
> changeset:   56006:90ead0febf56
> user:        stefank
> date:        Tue Aug 06 10:48:21 2019 +0200
> summary:     8229258: Rework markOop and markOopDesc into a simpler mark word value carrier
> 
> Okay, I remember this bug:
> 
> https://bugs.openjdk.java.net/browse/JDK-8229258
> 
> and I even reviewed it... :-)  It looks like the reviewers are:
> 
>  > Reviewed-by: rkennke, coleenp, kbarrett, dcubed
> 
> 
> Roman posted this comment during the 8229258 review:
> 
> On 8/15/19 1:06 PM, Roman Kennke wrote:
> 
>> Out of curiosity, what's with the changes in objectMonitor.inline.hpp to
>> access the markWord atomically?:
>>
>> -inline markOop ObjectMonitor::header() const {
>> -  return _header;
>> +inline markWord ObjectMonitor::header() const {
>> +  return Atomic::load(&_header);
>>   }
>>
>> I guess this is good (equal or stronger than before) but is there a
>> rationale behind these changes?
> 
> and Stefan K replied with this:
> 
> On 8/15/19 3:26 PM, Stefan Karlsson wrote:
> 
>> Ahh. Right. That was done to solve the problems I was having with 
>> volatiles. For example:
>> src/hotspot/share/runtime/objectMonitor.inline.hpp:38:10: error: 
>> binding reference of type 'const markWord&' to 'const volatile 
>> markWord' discards qualifiers
>>    return _header;
>>
>> and:
>> src/hotspot/share/runtime/basicLock.hpp:40:74: error: implicit 
>> dereference will not access object of type 'volatile markWord' in 
>> statement [-Werror]
>>   void         set_displaced_header(markWord header) { 
>> _displaced_header = header; }
>>
>> Kim suggested that the fact that these fields were volatile was an 
>> indication that we should be doing some kind of atomic/ordered 
>> operation. By replacing these loads and stores with calls to the 
>> Atomic APIs, and providing the 
>> PrimitiveConversions::Translate<markWord> specialization, we could 
>> solve that problem. 
> 
> So it appears that Stefan has a good rationale for making the
> Atomic::load() and Atomic::store() changes with the _header field.
> Since I've added more volatile fields to ObjectMonitor, it would
> follow that I should make similar changes...
> 
> However, it's not clear that David agrees with the above change so
> I'm hesitant to make the similar changes to my patch...
> 
> How do we resolve this issue?
> 
> Dan

From ioi.lam at oracle.com  Wed Nov 13 00:16:43 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Tue, 12 Nov 2019 16:16:43 -0800
Subject: RFR: JDK-8230413: Support Pre JDK 6 class with CDS
In-Reply-To: <c3a4ff42-e3b5-98a9-138c-358d14ac1ada@oracle.com>
References: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>
 <c69bf94f-e52d-f7cd-e9f7-252a71204f7b@oracle.com>
 <CALrW1jyog+Q5H-BXQxQEkPdtLhr_aB7ekOVzuq9Qi4qLQ-jqOg@mail.gmail.com>
 <f2b33c85-e811-48f5-ed82-6ee398ea1935@oracle.com>
 <c3a4ff42-e3b5-98a9-138c-358d14ac1ada@oracle.com>
Message-ID: <c322d943-8c6a-5c6e-345f-af4c902d4a57@oracle.com>

I am also a little worried that this might send the wrong message -- "if 
you want to archive pre-JDK6 classes, you need to disable verification 
altogether for all classes in your entire app".

Thanks
- Ioi

On 11/12/19 12:40 PM, Harold Seigel wrote:
> Hi Jiangli,
>
> I think this change is going in the wrong direction.  We are trying to 
> discourage disabling verification, not encourage it.  We also do not 
> want to create more use-cases for preserving -Xverify:none.
>
> It looks like your change would allow archiving of unverified pre-JDK6 
> classes, but not allow archiving of verified pre-JDK6 classes.  If so, 
> that seems backward.
>
> Thanks, Harold
>
> On 11/11/2019 11:53 PM, Ioi Lam wrote:
>> I wonder if there's a safer alternative. Are there tools that can add 
>> stackmaps to pre-JDK6 classes? That way they can be verified with the 
>> split verifier during CDS dump time.
>>
>> Thanks
>> - Ioi
>>
>> On 11/11/19 4:25 PM, Jiangli Zhou wrote:
>>> Hi David,
>>>
>>> Thanks for quick response!
>>>
>>> On Mon, Nov 11, 2019 at 3:12 PM David Holmes 
>>> <david.holmes at oracle.com> wrote:
>>>> Hi Jiangli,
>>>>
>>>> On 12/11/2019 8:12 am, Jiangli Zhou wrote:
>>>>> Please review the following change that allows archiving
>>>>> pre-JAVA_6_VERSION classes with -Xverify:none.
>>>>>
>>>>> webrev: http://cr.openjdk.java.net/~jiangli/8230413/webrev.00/
>>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8230413
>>>>>
>>>>> Currently there are still large number of existing classes 
>>>>> (pre-built)
>>>>> with older class versions (< 50) in real world applications. Those
>>>>> classes are missing the benefit of archiving. Particularly, in some
>>>>> use cases, class verification can be safely disabled. For those use
>>>>> cases, supporting archiving pre JDK 6 classes shows good performance
>>>>> benefit. We can re-evaluate this support when -Xverify:none is 
>>>>> removed
>>>>> in the future, hopefully the needs for supporting class version < 50
>>>>> is no longer significant at that time.
>>>>>
>>>>> This change brings back the pre-JDK-8198849 behavior. Runtime makes
>>>>> sure the dump-time verification mode must be the same or stronger 
>>>>> than
>>>>> the current mode.
>>>>>
>>>>> A CSR may be needed for the change. Any thoughts on that?
>>>> A CSR request is definitely required given that you are proposing to
>>>> undo a change that was itself put in place via a CSR request! And 
>>>> given
>>>> this is relaxing a "defense-in-depth" check which will result in
>>>> increasing exploitability, I think you will need a very strong 
>>>> argument
>>>> to justify this.
>>> Thanks for confirming this! Will do.
>>>
>>>> Further this not only undoes JDK-8197972 but it also invalidates
>>>> JDK-8155671 being closed as a duplicate of JDK-8197972. JDK-8155671
>>>> requested a way to know if verification had been disabled, to help 
>>>> with
>>>> analyzing crash reports, but instead we decided to not allow
>>>> verification to be disabled.
>>> I had some concerns about JDK-8155671 initially before making the
>>> change, as it's a closed bug and my memory about the specific issue
>>> was flushed out. I brought up the question in the bug. My take on
>>> Ioi's response to my query about JDK-8155671 was that the
>>> pre-JDK-8197972 behavior would not cause any security hole.
>>>
>>> Re-evaluating this particular behavior, I think the pre-JDK-8155671
>>> would actually matches user intention better. If user decides to turn
>>> off verification in safe use cases, it seems to be a good idea to
>>> honor that. With the new dynamic archiving capability, archive could
>>> be created at the first time when running a particular application.
>>> Not forcing verification when user decides to can avoid
>>> unnecessary/unwanted overhead.
>>>
>>> If verification is turned off at dump time for application classes,
>>> runtime does not allow execution without also turning off
>>> verification. We can determine a crash is not caused by relaxed dump
>>> time verification.
>>>
>>> Regards,
>>> Jiangli
>>>
>>>> David
>>>> -----
>>>>
>>>>
>>>>
>>>>> Tested with jtreg appcds tests.
>>>>>
>>>>> Best,
>>>>> Jiangli
>>>>>
>>


From suenaga at oss.nttdata.com  Wed Nov 13 01:20:47 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Wed, 13 Nov 2019 10:20:47 +0900
Subject: RFR: 8233785: Incorrect JDK version is reported in hs_err log
In-Reply-To: <CAA-vtUzMfY34XR-NXSrpoyOpfsTTC5DVyfvk=mWtDU=qmKVgSg@mail.gmail.com>
References: <d9a24903-9053-06a0-e74b-7bfb43370767@oss.nttdata.com>
 <6811d542-a530-5d70-5fd6-bea47de81d35@oracle.com>
 <317c088d-687c-c9a5-cc7f-c6744ea275a5@oracle.com>
 <e4a5a5c4-c9de-a93a-5f59-3528c534a6df@oss.nttdata.com>
 <CAA-vtUzMfY34XR-NXSrpoyOpfsTTC5DVyfvk=mWtDU=qmKVgSg@mail.gmail.com>
Message-ID: <7a5a850b-2513-8de0-6069-5da50c3c8cc8@oss.nttdata.com>

Thanks Stufe!

I will add the comment just before the JDK_Version::to_string() declaration.


Yasumasa


On 2019/11/13 5:15, Thomas Stüfe wrote:
> Hi Yasumasa,
> 
> looks good. Could you please add a small comment mentioning JEP 223? E.g. just a simple "/* See JEP 223 */"
> 
> I do not need to see another webrev.
> 
> Thanks, Thomas
> 
> 
> 
> On Mon, Nov 11, 2019 at 3:07 PM Yasumasa Suenaga <suenaga at oss.nttdata.com <mailto:suenaga at oss.nttdata.com>> wrote:
> 
>     Thanks David!
>     I'll wait for a second reviewer.
> 
>     Yasumasa
> 
>     On 2019/11/11 19:28, David Holmes wrote:
>      > Sorry for the delay.
>      >
>      > Just confirming I've verified against the spec from JEP 223 and this fix is correct.
>      >
>      > Thanks,
>      > David
>      >
>      > On 7/11/2019 10:39 pm, David Holmes wrote:
>      >> Hi Yasumasa,
>      >>
>      >> On 7/11/2019 10:28 pm, Yasumasa Suenaga wrote:
>      >>> Hi all,
>      >>>
>      >>> Please review this change:
>      >>>
>      >>>    JBS: https://bugs.openjdk.java.net/browse/JDK-8233785
>      >>>    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8233785/webrev.00/
>      >>>
>      >>> If a JVM configured with --with-version-patch crashes, the JDK version in the hs_err log is incorrect.
>      >>> We get an hs_err log containing the following header when we configure with "--with-version-update=0 --with-version-patch=1":
>      >>>
>      >>> ```
>      >>> # JRE version: OpenJDK Runtime Environment (14.0.1+2) (build 14.0.0.1+2-TypeS)
>      >>> ```
>      >>>
>      >>> The valid JDK version is "14.0.0.1", but the log reports "14.0.1".
>      >>> It is a bug in JDK_Version::to_string().
>      >>
>      >> I initially missed the fact that you always print _security along with _patch.
>      >>
>      >> I think what you have looks correct, but I'd want to double check that against the versioning spec to be sure.
>      >>
>      >> Thanks,
>      >> David
>      >>
>      >>>
>      >>> Thanks,
>      >>>
>      >>> Yasumasa
> 
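
As an illustration of the rule being discussed, here is a small sketch of a version formatter that follows the JEP 223 convention of omitting trailing zero elements while keeping interior zeros. It is not the JDK_Version::to_string() code from the webrev; the function name format_version and its parameters are hypothetical, and the build suffix (such as "+2") is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: formats MAJOR.MINOR.SECURITY.PATCH, dropping only
// trailing zero elements. Interior zeros must stay, otherwise a patch
// number would be reported in the security position.
std::string format_version(int major, int minor, int security, int patch) {
  assert(major >= 0 && minor >= 0 && security >= 0 && patch >= 0);
  std::vector<int> parts = {major, minor, security, patch};
  // Remove trailing zeros, but always keep the major element.
  while (parts.size() > 1 && parts.back() == 0) {
    parts.pop_back();
  }
  std::string out;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i > 0) out += '.';
    out += std::to_string(parts[i]);
  }
  return out;
}
```

With this sketch, a build configured with update 0 and patch 1 is formatted as "14.0.0.1", whereas a formatter that skipped every zero element would produce the misleading "14.0.1".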

From jianglizhou at google.com  Wed Nov 13 02:20:18 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Tue, 12 Nov 2019 18:20:18 -0800
Subject: RFR: JDK-8230413: Support Pre JDK 6 class with CDS
In-Reply-To: <c322d943-8c6a-5c6e-345f-af4c902d4a57@oracle.com>
References: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>
 <c69bf94f-e52d-f7cd-e9f7-252a71204f7b@oracle.com>
 <CALrW1jyog+Q5H-BXQxQEkPdtLhr_aB7ekOVzuq9Qi4qLQ-jqOg@mail.gmail.com>
 <f2b33c85-e811-48f5-ed82-6ee398ea1935@oracle.com>
 <c3a4ff42-e3b5-98a9-138c-358d14ac1ada@oracle.com>
 <c322d943-8c6a-5c6e-345f-af4c902d4a57@oracle.com>
Message-ID: <CALrW1jyqUuZNjmS0UZyiR-ZQfZVVy38opvSCaMotwTzY0i+S1A@mail.gmail.com>

Hi Harold and Ioi,

Thanks a lot for the additional feedback.

I did some quick research today about -Xverify:none usage. My findings
show that the use of -Xverify:none is not uncommon in some
cases. Here are some of the usages:

- trusted tools
- some limited testing environment

CDS (particularly with the dynamic archiving capability) may help avoid
runtime verification overhead by verifying classes at dump time, and so
reduce the need for -Xverify:none. It would be good to have
strategies for the following scenarios as well when removing
-Xverify:none:

1) Cases where the shared archive is disabled at runtime (I hope these
are not common)
2) Cases where users want to reduce the overhead caused by verification
at archive dump time

Thoughts?

Best,
Jiangli

On Tue, Nov 12, 2019 at 4:16 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>
> I am also a little worried that this might send the wrong message -- "if
> you want to archive pre-JDK6 classes, you need to disable verification
> altogether for all classes in your entire app".
>
> Thanks
> - Ioi
>
> On 11/12/19 12:40 PM, Harold Seigel wrote:
> > Hi Jiangli,
> >
> > I think this change is going in the wrong direction.  We are trying to
> > discourage disabling verification, not encourage it.  We also do not
> > want to create more use-cases for preserving -Xverify:none.
> >
> > It looks like your change would allow archiving of unverified pre-JDK6
> > classes, but not allow archiving of verified pre-JDK6 classes.  If so,
> > that seems backward.
> >
> > Thanks, Harold
> >
> > On 11/11/2019 11:53 PM, Ioi Lam wrote:
> >> I wonder if there's a safer alternative. Are there tools that can add
> >> stackmaps to pre-JDK6 classes? That way they can be verified with the
> >> split verifier during CDS dump time.
> >>
> >> Thanks
> >> - Ioi
> >>
> >> On 11/11/19 4:25 PM, Jiangli Zhou wrote:
> >>> Hi David,
> >>>
> >>> Thanks for quick response!
> >>>
> >>> On Mon, Nov 11, 2019 at 3:12 PM David Holmes
> >>> <david.holmes at oracle.com> wrote:
> >>>> Hi Jiangli,
> >>>>
> >>>> On 12/11/2019 8:12 am, Jiangli Zhou wrote:
> >>>>> Please review the following change that allows archiving
> >>>>> pre-JAVA_6_VERSION classes with -Xverify:none.
> >>>>>
> >>>>> webrev: http://cr.openjdk.java.net/~jiangli/8230413/webrev.00/
> >>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8230413
> >>>>>
> >>>>> Currently there are still large number of existing classes
> >>>>> (pre-built)
> >>>>> with older class versions (< 50) in real world applications. Those
> >>>>> classes are missing the benefit of archiving. Particularly, in some
> >>>>> use cases, class verification can be safely disabled. For those use
> >>>>> cases, supporting archiving pre JDK 6 classes shows good performance
> >>>>> benefit. We can re-evaluate this support when -Xverify:none is
> >>>>> removed
> >>>>> in the future, hopefully the needs for supporting class version < 50
> >>>>> is no longer significant at that time.
> >>>>>
> >>>>> This change brings back the pre-JDK-8198849 behavior. Runtime makes
> >>>>> sure the dump-time verification mode must be the same or stronger
> >>>>> than
> >>>>> the current mode.
> >>>>>
> >>>>> A CSR may be needed for the change. Any thoughts on that?
> >>>> A CSR request is definitely required given that you are proposing to
> >>>> undo a change that was itself put in place via a CSR request! And
> >>>> given
> >>>> this is relaxing a "defense-in-depth" check which will result in
> >>>> increasing exploitability, I think you will need a very strong
> >>>> argument
> >>>> to justify this.
> >>> Thanks for confirming this! Will do.
> >>>
> >>>> Further this not only undoes JDK-8197972 but it also invalidates
> >>>> JDK-8155671 being closed as a duplicate of JDK-8197972. JDK-8155671
> >>>> requested a way to know if verification had been disabled, to help
> >>>> with
> >>>> analyzing crash reports, but instead we decided to not allow
> >>>> verification to be disabled.
> >>> I had some concerns about JDK-8155671 initially before making the
> >>> change, as it's a closed bug and my memory about the specific issue
> >>> was flushed out. I brought up the question in the bug. My take on
> >>> Ioi's response to my query about JDK-8155671 was that the
> >>> pre-JDK-8197972 behavior would not cause any security hole.
> >>>
> >>> Re-evaluating this particular behavior, I think the pre-JDK-8155671
> >>> would actually matches user intention better. If user decides to turn
> >>> off verification in safe use cases, it seems to be a good idea to
> >>> honor that. With the new dynamic archiving capability, archive could
> >>> be created at the first time when running a particular application.
> >>> Not forcing verification when user decides to can avoid
> >>> unnecessary/unwanted overhead.
> >>>
> >>> If verification is turned off at dump time for application classes,
> >>> runtime does not allow execution without also turning off
> >>> verification. We can determine a crash is not caused by relaxed dump
> >>> time verification.
> >>>
> >>> Regards,
> >>> Jiangli
> >>>
> >>>> David
> >>>> -----
> >>>>
> >>>>
> >>>>
> >>>>> Tested with jtreg appcds tests.
> >>>>>
> >>>>> Best,
> >>>>> Jiangli
> >>>>>
> >>
>

From ioi.lam at oracle.com  Wed Nov 13 03:51:59 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Tue, 12 Nov 2019 19:51:59 -0800
Subject: RFR (M) 8233913: Remove implicit conversion from Method* to
 methodHandle
In-Reply-To: <c7d96b9b-5ad5-2b91-9bbc-d10db36862d9@oracle.com>
References: <c7d96b9b-5ad5-2b91-9bbc-d10db36862d9@oracle.com>
Message-ID: <18ecffa1-36be-4a1b-616d-5badf50d87ab@oracle.com>

Hi Coleen,

I've scanned through all the changes. It looks good. Just a few small nits:


[1] Not sure if you want to handle it in this patch, but MethodData 
initialization is a bit messy:

For MethodData::MethodData() -> MethodData::initialize() -> 
MethodData::init(), I think you can pass in both the THREAD and the 
methodHandle, so you don't need to query the current thread again. This 
can skip two Thread::current() calls for each allocated MethodData.

(But you'd also need no-arg variants of initialize() and init() for other 
callers, such as reprofile in jvmciCompilerToVM.cpp .... )

and why do we have both MethodData::initialize() and MethodData::init()?

[2] Not a big deal, but should the variables be renamed from mh to m?

void TieredThresholdPolicy::print_counters(const char* prefix, Method* mh) {

void TieredThresholdPolicy::print_event(EventType type, Method* mh, 
Method* imh,
                                        int bci, CompLevel level) {


Thanks
- Ioi

From ioi.lam at oracle.com  Wed Nov 13 05:10:21 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Tue, 12 Nov 2019 21:10:21 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <CALrW1jye1Oua7e3LCNV6-c_pkYa3Ujni7own-ntXaFqv8tM6-Q@mail.gmail.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
 <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
 <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>
 <7f0d558c-c161-ce41-90c6-ede8fddcf8b8@oracle.com>
 <99030987-a044-53fb-784b-62408333137a@oracle.com>
 <CALrW1jwTHJ1qg+gTfNeM9MbLdF042bF2ysKf7nGHKFNj9uKe+w@mail.gmail.com>
 <CALrW1jy5_4jrMRSZPAXV-c8a92Jy8y4eoK+f_t8ErwaZRGMoyw@mail.gmail.com>
 <52c473ef-5915-9ca0-8ed8-d4c2846965be@oracle.com>
 <CALrW1jzk+1XAqw2w55Y=ouyb-ZDB8tu5uWKNiXN9uA5Ku2XaCg@mail.gmail.com>
 <96ad8c62-fd62-1a1b-6f3c-e009e5e8a6f3@oracle.com>
 <CALrW1jye1Oua7e3LCNV6-c_pkYa3Ujni7own-ntXaFqv8tM6-Q@mail.gmail.com>
Message-ID: <0fec66c6-b8a2-6019-655b-467f84404386@oracle.com>


On 11/10/19 5:14 PM, Jiangli Zhou wrote:
>
>
> On Sun, Nov 10, 2019, 3:13 PM Ioi Lam <ioi.lam at oracle.com 
> <mailto:ioi.lam at oracle.com>> wrote:
>
>
>
>     On 11/9/19 8:25 PM, Jiangli Zhou wrote:
>     > Hi Ioi,
>     >
>     > On Fri, Nov 8, 2019 at 1:35 PM Ioi Lam <ioi.lam at oracle.com
>     <mailto:ioi.lam at oracle.com>> wrote:
>     >> Hi Jiangli,
>     >>
>     >> Thanks for your comments. Please see my replies in-line:
>     >>
>     >> On 11/7/19 6:34 PM, Jiangli Zhou wrote:
>     >>> On Thu, Nov 7, 2019 at 6:11 PM Jiangli Zhou
>     <jianglizhou at google.com <mailto:jianglizhou at google.com>> wrote:
>     >>>> I looked both 05.full and 06.delta webrevs. They look good.
>     >>>>
>     >>>> I still feel a bit uneasy about the potential runtime impact
>     when data
>     >>>> does get relocated. Long running apps/services may be shy
>     away from
>     >>>> enabling archive at runtime, if there is a detectable
>     overhead even
>     >>>> though it may only occur rarely. As relocation is enabled by
>     default
>     >>>> and users cannot turn it off, disabling with -Xshare:off entirely
>     >>>> would become the only choice. Could you please create a new RFE
>     >>>> (possibly with higher priority) to investigate the potential
>     effect,
>     >>>> or provide an option for users to opt-in relocation with the
>     >>>> command-line switch?
>     >> I created https://bugs.openjdk.java.net/browse/JDK-8233862
>     >> Investigate performance benefit of relocating CDS archive to
>     under 32G
>     >>
>     >> As I noted in the bug report, I ran benchmarks with CDS relocation
>     >> on/off, and there's no sign of regression when the CDS archive is
>     >> relocated. Please see the bug report for how to configure the
>     VM to do
>     >> the comparison.
>     >>
>     >> As you said before: "When enabling CDS we [google] noticed a small
>     >> runtime overhead in JDK 11 recently with a benchmark. After I
>     backported
>     >> JDK-8213713 to 11, it seemed to reduce the runtime overhead
>     that the
>     >> benchmark was experiencing":
>     >>
>     >> Can you confirm whether this is stock JDK 11 or a special
>     google build?
>     >> Which test case did you use? Is it possible for you to run the
>     tests
>     >> again (using the exact before/after bits that you had when
>     backporting
>     >> JDK-8213713)? Can you check if narrow_klass_base and
>     narrow_klass_shift
>     >> are the same in your before/after builds?
>     > Thanks for creating the RFE.
>     >
>     > JDK-8213713 closes the 1G gap between the shared space and class
>     space
>     > and everything else is unaffected. The compressed class base and
>     shift
>     > were the same for before and after applying JDK-8213713. The effect
>     > was statistically observed for the benchmark since the
>     difference was
>     > very small and could be within noise level for single run
>     comparison.
>     > A small difference could still be important for some use cases so it
>     > needs to be taken into consideration when designing and implementing
>     > new changes.
>
>     Hi Jiangli,
>
>     Thanks for taking the time for doing the performance measurements.
>
>     I also ran benchmarks in all 3 modes (no CDS, CDS without relocation,
>     CDS with relocation), and did not see any significant performance difference with
>     Octane-DeltaBlue, Octane-NavierStokes, SPECjbb2005-Tuned,
>     JFR-SPECjbb2005-Tuned, SPECjvm2008-Serial-G1 and Tools-Javac-Hello.
>
>
>     >
>     > A new command-line for archived metadata relocation may still be
>     > valuable. It would also be helpful for debugging and diagnosis.
>     >
>
>     How about a diagnostic flag ArchiveRelocationMode:
>
>     0: (default) first map at preferred address, and if unsuccessful,
>     map to
>     alternative address;
>     1: always map to alternative address;
>     2: always map at preferred address, and if unsuccessful, do not
>     map the
>     archive;
>
>     1 is for testing relocation, as well as for easy performance
>     measurement
>     (replaces the use of -XX:SharedBaseAddress=0 in my current patch.).
>     2 is for avoiding potential regression that may be introduced by
>     relocation (revert to JDK 13 behavior).
>
>     What do you think? If you like this I'll open a CSR.
>
>
>
> That sounds good to me!

Hi Jiangli,

It turns out that CSR is not needed for adding a diagnostic flag.

I implemented the flag as described above. See:

http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v07-delta/
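
A rough sketch of the three ArchiveRelocationMode values described above, written as a small decision function. This is not the code in the webrev: the names MapResult and choose_mapping are hypothetical, and the sketch assumes that mapping at the alternative address always succeeds, which the real VM cannot assume.

```cpp
#include <cassert>

// Hypothetical outcome of trying to map the CDS archive.
enum class MapResult { AtPreferred, AtAlternative, NotMapped };

// mode follows the proposal: 0 (default), 1 (always relocate),
// 2 (never relocate). preferred_available says whether the preferred
// address range could be mapped.
MapResult choose_mapping(int mode, bool preferred_available) {
  assert(mode >= 0 && mode <= 2);
  switch (mode) {
    case 1:
      // Always use an alternative address, e.g. to test relocation.
      return MapResult::AtAlternative;
    case 2:
      // Preferred address only; give up on the archive otherwise.
      return preferred_available ? MapResult::AtPreferred
                                 : MapResult::NotMapped;
    default:
      // Mode 0: try the preferred address first, then fall back.
      return preferred_available ? MapResult::AtPreferred
                                 : MapResult::AtAlternative;
  }
}
```

In this sketch mode 2 corresponds to the JDK 13 behavior mentioned above, where the archive is simply not used if the preferred address is unavailable.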


Thanks
- Ioi

>
> Regards,
> Jiangli
>
>
>     Thanks
>     - Ioi
>
>
>
>     >>> Forgot to say that when Java heap can fit into low 32G space,
>     it takes
>     >>> the class space size into account and leaves need space right
>     above
>     >>> (also in low 32G space) when reserving heap, for
>     !UseSharedSpace. In
>     >>> that case, it's more likely the class data and heap data can be
>     >>> colocated successfully.
>     >> The reason is not for "colocation". It's so that
>     narrow_klass_base can
>     >> be zero, and the klass pointer can be uncompressed with a shift
>     (without
>     >> also doing an addition).
>     >>
>     >> But with CDS enabled, we always hard code to use non-zero
>     >> narrow_klass_base and 3 bit shift (for AOT). So by just
>     relocating the
>     >> CDS archive to under 32GB, without modifying how CDS handles
>     >> narrow_klass_base/shift, I don't think we can expect any benefit.
>     > I experimented with mapping the shared space in low 32G and placed
>     > right above the Java heap. The class space was also allocated in the
>     > low 32G space and after the mapped shared space in the
>     experiment. The
>     > compress class encoding was using 0 base and 3 shift, which was the
>     > same as the encoding when CDS was disabled. I didn't observe runtime
>     > performance difference when comparing that specific
>     configuration with
>     > the normal CDS mapping scheme (the shared space start at 32G and the
>     > encoding is non-zero base and 3 shift).
>     >
>     > Thanks,
>     > Jiangli
>     >> For modern architectures, I am not aware of any inherent speed
>     benefit
>     >> simply by putting data (in our case much larger than a page)
>     "close to
>     >> each other" in the virtual address space. If you have any
>     reference of
>     >> that, please let me know.
>     >>
>     >> Thanks
>     >> - Ioi
>     >>
>     >>> Thanks,
>     >>> Jiangli
>     >>>
>     >>>> Regards,
>     >>>> Jiangli
>     >>>>
>     >>>> On Thu, Nov 7, 2019 at 4:22 PM Ioi Lam <ioi.lam at oracle.com
>     <mailto:ioi.lam at oracle.com>> wrote:
>     >>>>> Hi Coleen,
>     >>>>>
>     >>>>> Thanks for the review. Here's a webrev that has
>     incorporated your
>     >>>>> suggestions:
>     >>>>>
>     >>>>>
>     http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v06-delta/
>     >>>>>
>     >>>>> Please see comments in-line
>     >>>>>
>     >>>>> On 11/7/19 2:46 PM, coleen.phillimore at oracle.com
>     <mailto:coleen.phillimore at oracle.com> wrote:
>     >>>>>> Hi, I've done a more high level code review of this and it
>     looks good!
>     >>>>>>
>     >>>>>>
>     http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/src/hotspot/share/memory/archiveUtils.hpp.html
>     >>>>>>
>     >>>>>>
>     >>>>>> I think these classes require comments on what they do and
>     why. The
>     >>>>>> comments you sent me offline look good.
>     >>>>> I added more comments for ArchivePtrMarker::_compacted per
>     your offline
>     >>>>> request.
>     >>>>>
>     >>>>>> Also .hpp files shouldn't include .inline.hpp files, like
>     >>>>>> bitMap.inline.hpp.  Hopefully it's just a case of moving
>     do_bit() into
>     >>>>>> the cpp file.
>     >>>>> I moved the do_bit() function into archiveUtils.inline.hpp,
>     since is
>     >>>>> used by 3 .cpp files, and performance is important.
>     >>>>>
>     >>>>>> I wonder if the exception list of classes to exclude should
>     be a
>     >>>>>> function in javaClasses.hpp/cpp where the explanation would
>     make more
>     >>>>>> sense?  ie bool
>     >>>>>> JavaClasses::has_injected_native_pointers(InstanceKlass* k);
>     >>>>> I moved the checking code to javaClasses.cpp. Since we do
>     (partially)
>     >>>>> support java.lang.Class, which has injected native pointers,
>     I named the
>     >>>>> function as JavaClasses::is_supported_for_archiving instead.
>     I also
>     >>>>> massaged the comments a little for clarification.
>     >>>>>
>     >>>>>> Is there already an RFE to move the DumpSharedSpaces output
>     from
>     >>>>>> tty->print() to log_info() ?
>     >>>>> I created https://bugs.openjdk.java.net/browse/JDK-8233826
>     (Change CDS
>     >>>>> dumping tty->print_cr() to unified logging).
>     >>>>>
>     >>>>> Thanks
>     >>>>> - Ioi
>     >>>>>
>     >>>>>> Thanks,
>     >>>>>> Coleen
>     >>>>>>
>     >>>>>> On 11/6/19 4:17 PM, Ioi Lam wrote:
>     >>>>>>> Hi Jiangli,
>     >>>>>>>
>     >>>>>>> I've uploaded the webrev after integrating your comments:
>     >>>>>>>
>     >>>>>>>
>     http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/
>     >>>>>>>
>     >>>>>>>
>     http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05-delta/
>     >>>>>>>
>     >>>>>>>
>     >>>>>>> Please see more replies below:
>     >>>>>>>
>     >>>>>>>
>     >>>>>>> On 11/4/19 5:52 PM, Jiangli Zhou wrote:
>     >>>>>>>> On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam
>     <ioi.lam at oracle.com <mailto:ioi.lam at oracle.com>
>     >>>>>>>> <mailto:ioi.lam at oracle.com <mailto:ioi.lam at oracle.com>>>
>     wrote:
>     >>>>>>>>
>     >>>>>>>>       Hi Jiangli,
>     >>>>>>>>
>     >>>>>>>>       Thank you so much for spending time reviewing this RFE!
>     >>>>>>>>
>     >>>>>>>>       On 11/3/19 6:34 PM, Jiangli Zhou wrote:
>     >>>>>>>>       > Hi Ioi,
>     >>>>>>>>       >
>     >>>>>>>>       > Sorry for the delay again. Will try to put this
>     on the top of my
>     >>>>>>>>       list
>     >>>>>>>>       > next week and reduce the turn-around time. The
>     updates look
>     >>>>>>>> good in
>     >>>>>>>>       > general.
>     >>>>>>>>       >
>     >>>>>>>>       > We might want to have a better strategy when
>     choosing metadata
>     >>>>>>>>       > relocation address (when relocation is needed). Some
>     >>>>>>>>       > applications/benchmarks may be more sensitive to
>     cache
>     >>>>>>>> locality and
>     >>>>>>>>       > memory/data layout. There was a bug,
>     >>>>>>>>       > https://bugs.openjdk.java.net/browse/JDK-8213713
>     that caused
>     >>>>>>>> 1G gap
>     >>>>>>>>       > between Java heap data and metadata before JDK
>     12. The gap
>     >>>>>>>> seemed to
>     >>>>>>>>       > cause a small but noticeable runtime effect in
>     one case that I
>     >>>>>>>> came
>     >>>>>>>>       > across.
>     >>>>>>>>
>     >>>>>>>>       I guess you're saying we should try to relocate the
>     archive into
>     >>>>>>>>       somewhere under 32GB?
>     >>>>>>>>
>     >>>>>>>> I don't yet have sufficient data that determines if
>     mapping at low
>     >>>>>>>> 32G produces better runtime performance. I experimented
>     with that,
>     >>>>>>>> but didn't see noticeable difference when comparing to
>     mapping at
>     >>>>>>>> the current default address. It doesn't hurt, I think. So
>     it may be
>     >>>>>>>> a better choice than relocating to a random address in
>     high 32G
>     >>>>>>>> space (when Java heap is in low 32G address space).
>     >>>>>>> Maybe we should reconsider this when we have more concrete
>     data for
>     >>>>>>> the benefits of moving the compressed class space to under
>     32G.
>     >>>>>>>
>     >>>>>>> Please note that in metaspace.cpp, when CDS is disabled
>     and the VM
>     >>>>>>> fails to allocate the class space at the requested address
>     >>>>>>> (0x7c000000 for 16GB heap), it also just allocates from a
>     random
>     >>>>>>> address (without trying to search under 32GB):
>     >>>>>>>
>     >>>>>>>
>     http://hg.openjdk.java.net/jdk/jdk/annotate/e767fa6a1d45/src/hotspot/share/memory/metaspace.cpp#l1128
>     >>>>>>>
>     >>>>>>>
>     >>>>>>> This code has been there since 2013 and we have not seen
>     any issues.
>     >>>>>>>
>     >>>>>>>
>     >>>>>>>
>     >>>>>>>
>     >>>>>>>>       Could you elaborate more about the performance
>     issue, especially
>     >>>>>>>>       about
>     >>>>>>>>       cache locality? I looked at JDK-8213713 but it
>     didn't mention about
>     >>>>>>>>       performance.
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>> When enabling CDS we noticed a small runtime overhead in
>     JDK 11
>     >>>>>>>> recently with a benchmark. After I backported JDK-8213713
>     to 11, it
>     >>>>>>>> seemed to reduce the runtime overhead that the benchmark was
>     >>>>>>>> experiencing.
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>>       Also, by default, we have non-zero
>     narrow_klass_base and
>     >>>>>>>>       narrow_klass_shift = 3, and archive relocation
>     doesn't change that:
>     >>>>>>>>
>     >>>>>>>>       $ java -Xlog:cds=debug -version
>     >>>>>>>>       ... narrow_klass_base = 0x0000000800000000,
>     narrow_klass_shift = 3
>     >>>>>>>>       $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
>     >>>>>>>>       ... narrow_klass_base = 0x00007f1e8b499000,
>     narrow_klass_shift = 3
>     >>>>>>>>
>     >>>>>>>>       We always use narrow_klass_shift due to this:
>     >>>>>>>>
>     >>>>>>>>          // CDS uses LogKlassAlignmentInBytes for
>     narrow_klass_shift. See
>     >>>>>>>>          //
>     >>>>>>>>
>     MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
>     >>>>>>>>          // how dump time narrow_klass_shift is set.
>     Although, CDS can
>     >>>>>>>> work
>     >>>>>>>>          // with zero-shift mode also, to be consistent
>     with AOT it uses
>     >>>>>>>>          // LogKlassAlignmentInBytes for klass shift so
>     archived java
>     >>>>>>>>       heap objects
>     >>>>>>>>          // can be used at same time as AOT code.
>     >>>>>>>>          if (!UseSharedSpaces
>     >>>>>>>>              && (uint64_t)(higher_address - lower_base) <=
>     >>>>>>>>   UnscaledClassSpaceMax) {
>     >>>>>>>> CompressedKlassPointers::set_shift(0);
>     >>>>>>>>          } else {
>     >>>>>>>> CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
>     >>>>>>>>          }
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>> Right. If we relocate to low 32G space, it needs to make
>     sure that
>     >>>>>>>> the range containing the mapped class data and class
>     space must be
>     >>>>>>>> encodable.
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>>       > Here are some additional comments (minor).
>     >>>>>>>>       >
>     >>>>>>>>       > Could you please fix the long lines in the following?
>     >>>>>>>>       >
>     >>>>>>>>       > 1237 void
>     >>>>>>>>
>     java_lang_Class::update_archived_primitive_mirror_native_pointers(oop
>     >>>>>>>>       > archived_mirror) {
>     >>>>>>>>       > 1238   if (MetaspaceShared::relocation_delta() !=
>     0) {
>     >>>>>>>>       > 1239
>     assert(archived_mirror->metadata_field(_klass_offset) ==
>     >>>>>>>>       > NULL, "must be for primitive class");
>     >>>>>>>>       > 1240
>     >>>>>>>>       > 1241  Klass* ak =
>     >>>>>>>>       >
>     ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
>     >>>>>>>>       > 1242     if (ak != NULL) {
>     >>>>>>>>       > 1243
>     archived_mirror->metadata_field_put(_array_klass_offset,
>     >>>>>>>>       > (Klass*)(address(ak) +
>     MetaspaceShared::relocation_delta()));
>     >>>>>>>>       > 1244     }
>     >>>>>>>>       > 1245   }
>     >>>>>>>>       > 1246 }
>     >>>>>>>>       >
>     >>>>>>>>       > src/hotspot/share/memory/dynamicArchive.cpp
>     >>>>>>>>       >
>     >>>>>>>>       >   889  Thread* THREAD = Thread::current();
>     >>>>>>>>       >   890  Method::sort_methods(ik->methods(),
>     /*set_idnums=*/true,
>     >>>>>>>>       > dynamic_dump_method_comparator);
>     >>>>>>>>       >   891   if (ik->default_methods() != NULL) {
>     >>>>>>>>       >   892 Method::sort_methods(ik->default_methods(),
>     >>>>>>>>       > /*set_idnums=*/false,
>     dynamic_dump_method_comparator);
>     >>>>>>>>       >   893   }
>     >>>>>>>>       >
>     >>>>>>>>
>     >>>>>>>>       OK will do.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?> Please see inlined comments below.
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?> On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam
>     <ioi.lam at oracle.com> wrote:
>     >>>>>>>>? ? ? ?>> Hi Jiangli,
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>> Thanks for the review. I've updated the patch
>     according to your
>     >>>>>>>>? ? ? ?comments:
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>
>     http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
>     >>>>>>>>
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>
>     http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
>     >>>>>>>>
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>> (the delta is on top of
>     8231610-relocate-cds-archive.v03.delta
>     >>>>>>>>? ? ? ?in my
>     >>>>>>>>? ? ? ?>> reply to Calvin's comments).
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
>     >>>>>>>>? ? ? ?>>> Hi Ioi,
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> Sorry for the delay. Here are my remaining
>     comments.
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> - src/hotspot/share/memory/dynamicArchive.cpp
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 128 ?static intx _method_comparator_name_delta;
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> The name of the above variable is confusing.
>     It's the value of
>     >>>>>>>>? ? ? ?>>> _buffer_to_target_delta. It's better to use
>     _buffer_to_target_delta
>     >>>>>>>>? ? ? ?>>> directly.
>     >>>>>>>>? ? ? ?>> _buffer_to_target_delta is a non-static field, but
>     >>>>>>>>? ? ? ?>> dynamic_dump_method_comparator() must be a
>     static function so
>     >>>>>>>>? ? ? ?it can't
>     >>>>>>>>? ? ? ?>> use the non-static field easily.
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?> It sounds like an issue. _buffer_to_target_delta
>     was made as a
>     >>>>>>>>? ? ? ?> non-static mostly because we might support more
>     than one dynamic
>     >>>>>>>>? ? ? ?> archive in the future. However, today's usages
>     bake in an
>     >>>>>>>>? ? ? ?assumption
>     >>>>>>>>? ? ? ?> that _buffer_to_target_delta is a singleton
>     value. It is
>     >>>>>>>> cleaner to
>     >>>>>>>>? ? ? ?> either make _buffer_to_target_delta a static
>     variable for
>     >>>>>>>> now, or
>     >>>>>>>>? ? ? ?> add an accessor API in DynamicArchiveBuilder to
>     allow other
>     >>>>>>>> code to
>     >>>>>>>>? ? ? ?> properly and correctly use the value.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?OK, I'll move it to a static variable.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?>>> Also, we can do a quick pointer comparison of
>     'a_name' and
>     >>>>>>>>? ? ? ?>>> 'b_name' first before adjusting the pointers.
>     >>>>>>>>? ? ? ?>> I added this:
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>? ? ? ?if (a_name == b_name) {
>     >>>>>>>>? ? ? ?>> ?return 0;
>     >>>>>>>>? ? ? ?>>? ? ? ?}
>     >>>>>>>>? ? ? ?>>
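A minimal sketch of a comparator that takes the quick pointer-equality exit before adjusting addresses. It assumes, for illustration, that only pointers inside the dump buffer are shifted by the delta; `RelocatingComparator` and its fields are hypothetical names, not the code in the webrev.

```cpp
#include <cstdint>

// Hypothetical comparator: orders two name addresses by where they will live
// in the target (mapped) address space.
struct RelocatingComparator {
  uintptr_t buffer_base;  // start of the dump buffer (assumed)
  uintptr_t buffer_end;   // one past the end of the dump buffer (assumed)
  intptr_t  delta;        // buffer-to-target delta

  // Addresses inside the buffer are shifted; all others are left unchanged.
  uintptr_t to_target(uintptr_t p) const {
    return (p >= buffer_base && p < buffer_end) ? p + static_cast<uintptr_t>(delta) : p;
  }

  int compare(uintptr_t a_name, uintptr_t b_name) const {
    if (a_name == b_name) {
      return 0;  // identical pointers: no adjustment needed
    }
    uintptr_t a = to_target(a_name);
    uintptr_t b = to_target(b_name);
    return (a < b) ? -1 : ((a > b) ? 1 : 0);
  }
};
```

Keeping the delta in the comparator object (or in a static, as agreed in the thread) is a design choice; a plain function-pointer comparator cannot capture per-instance state.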
>     >>>>>>>>? ? ? ?>>> ---
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 934 void
>     DynamicArchiveBuilder::relocate_buffer_to_target() {
>     >>>>>>>>? ? ? ?>>> ...
>     >>>>>>>>? ? ? ?>>>? ? 944
>     >>>>>>>>? ? ? ?>>> 945? ArchivePtrMarker::compact(relocatable_base,
>     >>>>>>>>? ? ? ?relocatable_end);
>     >>>>>>>>? ? ? ?>>> ...
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 974? ? ?SharedDataRelocator
>     patcher((address*)patch_base,
>     >>>>>>>>? ? ? ?>>> (address*)patch_end, valid_old_base, valid_old_end,
>     >>>>>>>>? ? ? ?>>> 975? valid_new_base, valid_new_end, addr_delta);
>     >>>>>>>>? ? ? ?>>> 976? ArchivePtrMarker::ptrmap()->iterate(&patcher);
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> Could we reduce the number of data
>     re-iterations to help
>     >>>>>>>> archive
>     >>>>>>>>? ? ? ?>>> dumping performance? The
>     ArchivePtrMarker::compact operation
>     >>>>>>>>? ? ? ?can be
>     >>>>>>>>? ? ? ?>>> combined with the patching iteration.
>     >>>>>>>> ?ArchivePtrMarker::compact API
>     >>>>>>>>? ? ? ?>>> can be removed.
>     >>>>>>>>? ? ? ?>> That's a good idea. I implemented it using a
>     template parameter
>     >>>>>>>>? ? ? ?so that
>     >>>>>>>>? ? ? ?>> we can have max performance when relocating the
>     archive at run
>     >>>>>>>>? ? ? ?time.
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>> I added comments to explain why the relocation
>     is done here. The
>     >>>>>>>>? ? ? ?>> relocation is pretty rare (only when the base
>     archive was not
>     >>>>>>>>? ? ? ?mapped at
>     >>>>>>>>? ? ? ?>> the default location).
>     >>>>>>>>? ? ? ?>>
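A minimal sketch of folding compaction into the patching pass with a template parameter, so the run-time relocation path pays nothing for the bookkeeping. `PointerPatcher` and its members are hypothetical names for illustration; the real SharedDataRelocator may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical patcher: visits marked pointer slots and relocates them by a delta.
// When COMPACTING is true it also records the highest slot that holds a non-null
// pointer, so the caller can shrink the bitmap afterwards without a second pass.
template <bool COMPACTING>
class PointerPatcher {
 public:
  PointerPatcher(std::vector<uintptr_t>& slots, intptr_t delta)
      : _slots(slots), _delta(delta), _max_non_null(0) {}

  void do_bit(size_t index) {
    uintptr_t old_ptr = _slots[index];
    if (old_ptr == 0) {
      return;  // null pointers are never relocated
    }
    if (COMPACTING && index > _max_non_null) {
      _max_non_null = index;  // bookkeeping only in the dump-time instantiation
    }
    _slots[index] = old_ptr + static_cast<uintptr_t>(_delta);
  }

  size_t max_non_null() const { return _max_non_null; }

 private:
  std::vector<uintptr_t>& _slots;
  intptr_t _delta;
  size_t _max_non_null;
};
```

Because `COMPACTING` is a compile-time constant, the `PointerPatcher<false>` instantiation can drop the bookkeeping branch entirely.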
>     >>>>>>>>? ? ? ?>>> ---
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 967? ? ?address valid_new_base =
>     >>>>>>>>? ? ? ?>>> (address)Arguments::default_SharedBaseAddress();
>     >>>>>>>>? ? ? ?>>> 968? ? ?address valid_new_end? = valid_new_base +
>     >>>>>>>>? ? ? ?base_plus_top_size;
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> The debugging only code can be included under
>     #ifdef ASSERT.
>     >>>>>>>>? ? ? ?>> These values are actually also used in debug
>     logging so they
>     >>>>>>>>? ? ? ?can't be
>     >>>>>>>>? ? ? ?>> ifdef'ed out.
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>> Also, the c++ compiler is pretty good with
>     eliding code
>     >>>>>>>> that's not
>     >>>>>>>>? ? ? ?>> actually used. If I comment out all the logging
>     code in
>     >>>>>>>>? ? ? ?>>
>     DynamicArchiveBuilder::relocate_buffer_to_target() and
>     >>>>>>>>? ? ? ?>> SharedDataRelocator, gcc elides all the unused
>     fields and their
>     >>>>>>>>? ? ? ?>> assignments. So no code is generated for this, etc.
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>> ?address valid_new_base =
>     >>>>>>>>? ? ? ?>> (address)Arguments::default_SharedBaseAddress();
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>> Since #ifdef ASSERT makes the code harder to
>     read, I think we
>     >>>>>>>>? ? ? ?should use
>     >>>>>>>>? ? ? ?>> it only when really necessary.
>     >>>>>>>>? ? ? ?> It seems cleaner to get rid of these debugging
>     only variables, by
>     >>>>>>>>? ? ? ?> using 'relocatable_base' and
>     >>>>>>>>? ? ? ?> '(address)Arguments::default_SharedBaseAddress()'
>     in the logging
>     >>>>>>>>? ? ? ?code.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?SharedDataRelocator is used under 3 different
>     situations. These six
>     >>>>>>>>? ? ? ?variables (patch_base, patch_end, valid_old_base,
>     valid_old_end,
>     >>>>>>>>? ? ? ?valid_new_base, valid_new_end) describe what is
>     being patched,
>     >>>>>>>>? ? ? ?and what
>     >>>>>>>>? ? ? ?the expectations are, for each situation. The code
>     will be hard to
>     >>>>>>>>? ? ? ?understand without them.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?Please note there's also logging code in the
>     SharedDataRelocator
>     >>>>>>>>? ? ? ?constructor that prints out these values.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?I think I'll just remove the 'debug only' comment
>     to avoid
>     >>>>>>>> confusion.
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>> Ok.
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?>>> ---
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>>? ? 993
>     >>>>>>>>
>     dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> We could combine the archived heap data bitmap
>     into the new
>     >>>>>>>>? ? ? ?region as
>     >>>>>>>>? ? ? ?>>> well? It can be handled as a separate RFE.
>     >>>>>>>>? ? ? ?>> I've filed
>     https://bugs.openjdk.java.net/browse/JDK-8233093
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>> - src/hotspot/share/memory/filemap.cpp
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 1038 ? ?if (is_static()) {
>     >>>>>>>>? ? ? ?>>> 1039 ? ? ?if (errno == ENOENT) {
>     >>>>>>>>? ? ? ?>>> 1040 ? ? ? ?// Not locating the shared archive
>     is ok.
>     >>>>>>>>? ? ? ?>>> 1041 ? ? ? ?fail_continue("Specified shared
>     archive not found
>     >>>>>>>>? ? ? ?(%s).",
>     >>>>>>>>? ? ? ?>>> _full_path);
>     >>>>>>>>? ? ? ?>>> 1042 ? ? ?} else {
>     >>>>>>>>? ? ? ?>>> 1043 ? ? ? ?fail_continue("Failed to open
>     shared archive file
>     >>>>>>>>? ? ? ?(%s).",
>     >>>>>>>>? ? ? ?>>> 1044 os::strerror(errno));
>     >>>>>>>>? ? ? ?>>> 1045 ? ? ?}
>     >>>>>>>>? ? ? ?>>> 1046 ? ?} else {
>     >>>>>>>>? ? ? ?>>> 1047 ? ? ?log_warning(cds, dynamic)("specified
>     dynamic archive
>     >>>>>>>>? ? ? ?>>> doesn't exist: %s", _full_path);
>     >>>>>>>>? ? ? ?>>> 1048 ? ?}
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> If the top layer is explicitly specified by the
>     user, a
>     >>>>>>>>? ? ? ?warning does
>     >>>>>>>>? ? ? ?>>> not seem to be a proper behavior if the VM
>     fails to open the
>     >>>>>>>>? ? ? ?archive
>     >>>>>>>>? ? ? ?>>> file.
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> It might be better to handle the relocation
>     unrelated code in
>     >>>>>>>>? ? ? ?separate
>     >>>>>>>>? ? ? ?>>> changeset and track with a separate RFE.
>     >>>>>>>>? ? ? ?>> This code was moved from
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>
>     http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
>     >>>>>>>>
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>> so I am not changing the behavior. If you want,
>     we can file an
>     >>>>>>>>? ? ? ?RFE to
>     >>>>>>>>? ? ? ?>> change the behavior.
>     >>>>>>>>? ? ? ?> Ok. A new RFE sounds like the right thing to
>     re-evaluate the
>     >>>>>>>> usage
>     >>>>>>>>? ? ? ?> issue here. Thanks.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?I created
>     https://bugs.openjdk.java.net/browse/JDK-8233446
>     >>>>>>>>
>     >>>>>>>>? ? ? ?>>> ---
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 1148 void FileMapInfo::write_region(int region,
>     char* base,
>     >>>>>>>>? ? ? ?size_t size,
>     >>>>>>>>? ? ? ?>>> 1149 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? bool
>     read_only, bool
>     >>>>>>>>? ? ? ?allow_exec) {
>     >>>>>>>>? ? ? ?>>> ...
>     >>>>>>>>? ? ? ?>>> 1154
>     >>>>>>>>? ? ? ?>>> 1155 ?if (region == MetaspaceShared::bm) {
>     >>>>>>>>? ? ? ?>>> 1156 ? ?target_base = NULL;
>     >>>>>>>>? ? ? ?>>> 1157 ?} else if (DynamicDumpSharedSpaces) {
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> It's not too clear to me how the bitmap (bm)
>     region is handled
>     >>>>>>>>? ? ? ?for the
>     >>>>>>>>? ? ? ?>>> base layer and top layer. Could you please explain?
>     >>>>>>>>? ? ? ?>> The bm regions for both layers are mapped at an
>     address picked
>     >>>>>>>>? ? ? ?by the OS:
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>> char* FileMapInfo::map_relocation_bitmap(size_t&
>     bitmap_size) {
>     >>>>>>>>? ? ? ?>> ?FileMapRegion* si = space_at(MetaspaceShared::bm);
>     >>>>>>>>? ? ? ?>> ?bitmap_size = si->used_aligned();
>     >>>>>>>>? ? ? ?>>? ? ?bool read_only = true, allow_exec = false;
>     >>>>>>>>? ? ? ?>>? ? ?char* requested_addr = NULL; // allow OS to
>     pick any
>     >>>>>>>> location
>     >>>>>>>>? ? ? ?>>? ? ?char* bitmap_base = os::map_memory(_fd,
>     _full_path,
>     >>>>>>>> ?si->file_offset(),
>     >>>>>>>>? ? ? ?>> requested_addr, bitmap_size,
>     >>>>>>>>? ? ? ?>> read_only, allow_exec);
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?> Ok, after staring at the code for a few seconds I
>     saw that's
>     >>>>>>>>? ? ? ?intended.
>     >>>>>>>>? ? ? ?> If the current region is 'bm', then the
>     'target_base' is NULL
>     >>>>>>>>? ? ? ?> regardless of whether it's a static or dynamic archive.
>     Otherwise, the
>     >>>>>>>>? ? ? ?> 'target_base' is handled differently for the
>     static and dynamic
>     >>>>>>>>? ? ? ?case.
>     >>>>>>>>? ? ? ?> The following would be cleaner and has better
>     reliability.
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?>? ? ?char* target_base = NULL;
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?>? ? ?// The target_base is NULL for 'bm' region.
>     >>>>>>>>? ? ? ?>? ? ?if (region != MetaspaceShared::bm) {
>     >>>>>>>>? ? ? ?>? ? ? ?if (DynamicDumpSharedSpaces) {
>     >>>>>>>>? ? ? ?> ?assert(!HeapShared::is_heap_region(region), "dynamic
>     >>>>>>>> archive
>     >>>>>>>>? ? ? ?> doesn't support heap regions");
>     >>>>>>>>? ? ? ?> ?target_base =
>     DynamicArchive::buffer_to_target(base);
>     >>>>>>>>? ? ? ?>? ? ? ?} else {
>     >>>>>>>>? ? ? ?> ?target_base = base;
>     >>>>>>>>? ? ? ?>? ? ? ?}
>     >>>>>>>>? ? ? ?>? ? }
>     >>>>>>>>
>     >>>>>>>>? ? ? ?How about this?
>     >>>>>>>>
>     >>>>>>>>? ? ? ? ? char* target_base;
>     >>>>>>>>? ? ? ? ? if (region == MetaspaceShared::bm) {
>     >>>>>>>>? ? ? ? ? ? target_base = NULL; // always NULL for bm region.
>     >>>>>>>>? ? ? ? ? } else {
>     >>>>>>>>? ? ? ? ? ? if (DynamicDumpSharedSpaces) {
>     >>>>>>>> assert(!HeapShared::is_heap_region(region), "dynamic
>     >>>>>>>> archive
>     >>>>>>>>? ? ? ?doesn't support heap regions");
>     >>>>>>>> target_base = DynamicArchive::buffer_to_target(base);
>     >>>>>>>>? ? ? ? ? ? } else {
>     >>>>>>>> target_base = base;
>     >>>>>>>>? ? ? ? ? ? }
>     >>>>>>>>? ? ? ? ? }
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>> No objection if you prefer the extra 'else' block.
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?>>> ---
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 1362
>     >>>>>>>>
>     DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>     >>>>>>>>
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> Could you please explain the above?
>     >>>>>>>>? ? ? ?>> I added the comments
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>? ? ?// Make sure we don't attempt to use
>     >>>>>>>> ?header()->mapped_base_address()
>     >>>>>>>>? ? ? ?>> unless
>     >>>>>>>>? ? ? ?>>? ? ?// it's been successfully mapped.
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>
>     DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>     >>>>>>>>
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>> ---
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 1359 ?FileMapRegion* last_region = NULL;
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 1371 ? ?if (last_region != NULL) {
>     >>>>>>>>? ? ? ?>>> 1372 ? ? ?// Ensure that the OS won't be able
>     to allocate new
>     >>>>>>>>? ? ? ?memory
>     >>>>>>>>? ? ? ?>>> spaces between any mapped
>     >>>>>>>>? ? ? ?>>> 1373 ? ? ?// regions, or else it would mess up
>     the simple
>     >>>>>>>>? ? ? ?comparison
>     >>>>>>>>? ? ? ?>>> in MetaspaceObj::is_shared().
>     >>>>>>>>? ? ? ?>>> 1374 ? ? ?assert(si->mapped_base() ==
>     >>>>>>>> last_region->mapped_end(),
>     >>>>>>>>? ? ? ?>>> "must have no gaps");
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 1379 ? ?last_region = si;
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> Can you please place 'last_region' related code
>     under #ifdef
>     >>>>>>>>? ? ? ?ASSERT?
>     >>>>>>>>? ? ? ?>> I think that will make the code more cluttered.
>     The compiler
>     >>>>>>>> will
>     >>>>>>>>? ? ? ?>> optimize that away.
>     >>>>>>>>? ? ? ?> It's cleaner to define debugging only variable
>     for debugging only
>     >>>>>>>>? ? ? ?> builds. You can wrap it and the related usage with
>     DEBUG_ONLY.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?OK, will do.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?>>> ---
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 1478 char*
>     FileMapInfo::map_relocation_bitmap(size_t&
>     >>>>>>>>? ? ? ?bitmap_size) {
>     >>>>>>>>? ? ? ?>>> 1479 ?FileMapRegion* si =
>     space_at(MetaspaceShared::bm);
>     >>>>>>>>? ? ? ?>>> 1480 ?bitmap_size = si->used_aligned();
>     >>>>>>>>? ? ? ?>>> 1481 ?bool read_only = true, allow_exec = false;
>     >>>>>>>>? ? ? ?>>> 1482 ?char* requested_addr = NULL; // allow OS
>     to pick any
>     >>>>>>>>? ? ? ?location
>     >>>>>>>>? ? ? ?>>> 1483 ?char* bitmap_base = os::map_memory(_fd,
>     _full_path,
>     >>>>>>>> ?si->file_offset(),
>     >>>>>>>>? ? ? ?>>> 1484 requested_addr, bitmap_size,
>     >>>>>>>>? ? ? ?>>> read_only, allow_exec);
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> We need to handle mapping failure here.
>     >>>>>>>>? ? ? ?>> It's handled here:
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>> bool FileMapInfo::relocate_pointers(intx
>     addr_delta) {
>     >>>>>>>>? ? ? ?>> ?log_debug(cds, reloc)("runtime archive
>     relocation start");
>     >>>>>>>>? ? ? ?>>? ? ?size_t bitmap_size;
>     >>>>>>>>? ? ? ?>>? ? ?char* bitmap_base =
>     map_relocation_bitmap(bitmap_size);
>     >>>>>>>>? ? ? ?>>? ? ?if (bitmap_base != NULL) {
>     >>>>>>>>? ? ? ?>>? ? ?...
>     >>>>>>>>? ? ? ?>>? ? ?} else {
>     >>>>>>>>? ? ? ?>> ?log_error(cds)("failed to map relocation bitmap");
>     >>>>>>>>? ? ? ?>> ?return false;
>     >>>>>>>>? ? ? ?>>? ? ?}
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?> 'bitmap_base' is used immediately after
>     map_memory(). So the
>     >>>>>>>> check
>     >>>>>>>>? ? ? ?> needs to be done immediately after map_memory(),
>     but not in the
>     >>>>>>>>? ? ? ?caller
>     >>>>>>>>? ? ? ?> of map_relocation_bitmap().
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?> 1490? ?char* bitmap_base = os::map_memory(_fd,
>     _full_path,
>     >>>>>>>> ?si->file_offset(),
>     >>>>>>>>? ? ? ?> 1491 requested_addr, bitmap_size,
>     >>>>>>>>? ? ? ?> read_only, allow_exec);
>     >>>>>>>>? ? ? ?> 1492
>     >>>>>>>>? ? ? ?> 1493? ?if (VerifySharedSpaces && bitmap_base !=
>     NULL &&
>     >>>>>>>>? ? ? ?> !region_crc_check(bitmap_base, bitmap_size,
>     si->crc())) {
>     >>>>>>>>
>     >>>>>>>>? ? ? ?OK, I'll fix that.
>     >>>>>>>>
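A minimal sketch of checking the mapping result immediately after the map call, before the base pointer is used for verification. The `Mapper` callback and `simple_checksum` are hypothetical stand-ins for os::map_memory() and the CRC check; they are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical mapper: returns the mapped base, or nullptr on failure.
using Mapper = std::function<char*(size_t size)>;

// Hypothetical checksum (a plain byte sum), standing in for a real CRC.
uint32_t simple_checksum(const char* base, size_t size) {
  uint32_t sum = 0;
  for (size_t i = 0; i < size; i++) {
    sum += static_cast<unsigned char>(base[i]);
  }
  return sum;
}

// Map a region and optionally verify it. The null check happens right after
// the mapping, so the verification code never touches an unmapped base.
char* map_and_verify(const Mapper& map, size_t size, uint32_t expected, bool verify) {
  char* base = map(size);
  if (base == nullptr) {
    return nullptr;  // mapping failed: report it here, not in the caller
  }
  if (verify && simple_checksum(base, size) != expected) {
    return nullptr;  // mapped, but contents do not match
  }
  return base;
}
```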
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?>>> ---
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 1513 ? ?// debug only -- the current value of
>     the pointers
>     >>>>>>>> to be
>     >>>>>>>>? ? ? ?>>> patched must be within this
>     >>>>>>>>? ? ? ?>>> 1514 ? ?// range (i.e., must be between the
>     requested base
>     >>>>>>>>? ? ? ?address,
>     >>>>>>>>? ? ? ?>>> and the end of the current archive).
>     >>>>>>>>? ? ? ?>>> 1515 ? ?// Note: top archive may point to
>     objects in the base
>     >>>>>>>>? ? ? ?>>> archive, but not the other way around.
>     >>>>>>>>? ? ? ?>>> 1516 ? ?address valid_old_base =
>     >>>>>>>> ?(address)header()->requested_base_address();
>     >>>>>>>>? ? ? ?>>> 1517 ? ?address valid_old_end? = valid_old_base +
>     >>>>>>>> ?mapping_end_offset();
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> Please place all FileMapInfo::relocate_pointers
>     debugging only
>     >>>>>>>>? ? ? ?code
>     >>>>>>>>? ? ? ?>>> under #ifdef ASSERT.
>     >>>>>>>>? ? ? ?>> Ditto about ifdef ASSERT
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>> - src/hotspot/share/memory/heapShared.cpp
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>>? ? 441 void
>     >>>>>>>> ?HeapShared::initialize_from_archived_subgraph(Klass* k) {
>     >>>>>>>>? ? ? ?>>> 442? ?if (!open_archive_heap_region_mapped() ||
>     >>>>>>>> ?!MetaspaceObj::is_shared(k)) {
>     >>>>>>>>? ? ? ?>>> 443? ? ?return; // nothing to do
>     >>>>>>>>? ? ? ?>>> 444? ?}
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> When do we call
>     HeapShared::initialize_from_archived_subgraph
>     >>>>>>>>? ? ? ?for a
>     >>>>>>>>? ? ? ?>>> klass that's not shared?
>     >>>>>>>>? ? ? ?>> I've removed the !MetaspaceObj::is_shared(k). I
>     probably added
>     >>>>>>>>? ? ? ?that for
>     >>>>>>>>? ? ? ?>> debugging purposes only.
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>> 616? ?DEBUG_ONLY({
>     >>>>>>>>? ? ? ?>>> 617? ? ? ?Klass* klass = orig_obj->klass();
>     >>>>>>>>? ? ? ?>>> 618? ? ? ?assert(klass !=
>     >>>>>>>> SystemDictionary::Module_klass() &&
>     >>>>>>>>? ? ? ?>>> 619? ? ? ? ? ? ? klass !=
>     >>>>>>>> ?SystemDictionary::ResolvedMethodName_klass() &&
>     >>>>>>>>? ? ? ?>>> 620? ? ? ? ? ? ? klass !=
>     >>>>>>>> ?SystemDictionary::MemberName_klass() &&
>     >>>>>>>>? ? ? ?>>> 621? ? ? ? ? ? ? klass !=
>     >>>>>>>> SystemDictionary::Context_klass() &&
>     >>>>>>>>? ? ? ?>>> 622? ? ? ? ? ? ? klass !=
>     >>>>>>>> ?SystemDictionary::ClassLoader_klass(), "we
>     >>>>>>>>? ? ? ?>>> can only relocate metaspace object pointers inside
>     >>>>>>>> java_lang_Class
>     >>>>>>>>? ? ? ?>>> instances");
>     >>>>>>>>? ? ? ?>>> 623? ? ?});
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> Let's leave the above for a separate RFE. I
>     think assert is not
>     >>>>>>>>? ? ? ?>>> sufficient for the check. Also, why
>     ResolvedMethodName,
>     >>>>>>>> Module and
>     >>>>>>>>? ? ? ?>>> MemberName cannot be part of the graph?
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>> I added the following comment:
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>> ?DEBUG_ONLY({
>     >>>>>>>>? ? ? ?>>? ? ? ? ?// The following are classes in
>     >>>>>>>> ?share/classfile/javaClasses.cpp
>     >>>>>>>>? ? ? ?>> that have injected native pointers
>     >>>>>>>>? ? ? ?>>? ? ? ? ?// to metaspace objects. To support
>     these classes, we
>     >>>>>>>>? ? ? ?need to add
>     >>>>>>>>? ? ? ?>> relocation code similar to
>     >>>>>>>>? ? ? ?>>? ? ? ? ?//
>     >>>>>>>> java_lang_Class::update_archived_mirror_native_pointers.
>     >>>>>>>>? ? ? ?>> ?Klass* klass = orig_obj->klass();
>     >>>>>>>>? ? ? ?>> ?assert(klass != SystemDictionary::Module_klass() &&
>     >>>>>>>>? ? ? ?>> ? ? klass !=
>     >>>>>>>> ?SystemDictionary::ResolvedMethodName_klass() &&
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?> It's too restrictive to exclude those objects
>     from the archived
>     >>>>>>>>? ? ? ?object
>     >>>>>>>>? ? ? ?> graph because of metadata relocation, since metadata
>     relocation is
>     >>>>>>>>? ? ? ?rare.
>     >>>>>>>>? ? ? ?> The trade-off doesn't seem to buy us much.
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?> Do you plan to add the needed relocation code?
>     >>>>>>>>
>     >>>>>>>>? ? ? ?I looked more into this. Actually we cannot handle
>     these 5
>     >>>>>>>> classes at
>     >>>>>>>>? ? ? ?all, even without archive relocation:
>     >>>>>>>>
>     >>>>>>>>? ? ? ?[1] #define MODULE_INJECTED_FIELDS(macro) \
>     >>>>>>>> macro(java_lang_Module, module_entry, intptr_signature,
>     false)
>     >>>>>>>>
>     >>>>>>>>? ? ? ?->? module_entry is malloc'ed
>     >>>>>>>>
>     >>>>>>>>? ? ? ?[2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
>     >>>>>>>> macro(java_lang_invoke_ResolvedMethodName, vmholder,
>     >>>>>>>>? ? ? ?object_signature, false) \
>     >>>>>>>> macro(java_lang_invoke_ResolvedMethodName, vmtarget,
>     >>>>>>>>? ? ? ?intptr_signature, false)
>     >>>>>>>>
>     >>>>>>>>? ? ? ?-> these fields are related to method handles and
>     lambda forms,
>     >>>>>>>> etc.
>     >>>>>>>>? ? ? ?They can't easily be archived without
>     implementing lambda form
>     >>>>>>>>? ? ? ?archiving. (I did a prototype; it's very complex
>     and fragile).
>     >>>>>>>>
>     >>>>>>>>? ? ? ?[3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
>     >>>>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
>     >>>>>>>>? ? ? ?vmdependencies, intptr_signature, false) \
>     >>>>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
>     >>>>>>>>? ? ? ?last_cleanup, long_signature, false)
>     >>>>>>>>
>     >>>>>>>>? ? ? ?-> vmdependencies is malloc'ed.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?[4] #define
>     >>>>>>>> MEMBERNAME_INJECTED_FIELDS(macro) \
>     >>>>>>>> macro(java_lang_invoke_MemberName, vmindex, intptr_signature,
>     >>>>>>>>? ? ? ?false)
>     >>>>>>>>
>     >>>>>>>>? ? ? ?-> this one is probably OK. Despite being declared as
>     >>>>>>>>? ? ? ?'intptr_signature', it seems to be used just as an
>     integer.
>     >>>>>>>> However,
>     >>>>>>>>? ? ? ?MemberNames are typically used with [2] and [3]. So
>     let's just
>     >>>>>>>>? ? ? ?forbid it
>     >>>>>>>>? ? ? ?to be safe.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?[2] [3] [4] are not used directly by regular Java
>     code and are
>     >>>>>>>>? ? ? ?unlikely
>     >>>>>>>>? ? ? ?to be referenced (directly or indirectly) by static
>     fields (except
>     >>>>>>>>? ? ? ?for
>     >>>>>>>>? ? ? ?the static fields in the classes in
>     java.lang.invoke, which we
>     >>>>>>>>? ? ? ?probably
>     >>>>>>>>? ? ? ?won't support for heap archiving due to the problem
>     I described for
>     >>>>>>>>? ? ? ?[2]). Objects of these types are typically
>     referenced via constant
>     >>>>>>>>? ? ? ?pool
>     >>>>>>>>? ? ? ?entries.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?[5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
>     >>>>>>>> macro(java_lang_ClassLoader, loader_data, intptr_signature,
>     >>>>>>>> false)
>     >>>>>>>>
>     >>>>>>>>? ? ? ?-> loader_data is malloc'ed.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?So, I will change the DEBUG_ONLY into a
>     product-mode check, and
>     >>>>>>>> quit
>     >>>>>>>>? ? ? ?dumping if these objects are found in the object
>     subgraph.
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>> Sounds good. Can you please also add a comment with
>     explanation.
>     >>>>>>>>
>     >>>>>>>> For ClassLoader and Module, it is worth considering caching the
>     >>>>>>>> additional native data some time in the future. Lois had
>     suggested
>     >>>>>>>> the Module part a while ago.
>     >>>>>>> I think we can do that if/when we archive Modules directly
>     into the
>     >>>>>>> shared heap.
>     >>>>>>>
>     >>>>>>>
>     >>>>>>>
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>>? ? ? ?Maybe we should backport the check to older
>     versions as well?
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>> We should discuss with Andrew Haley for backports to JDK
>     11 update
>     >>>>>>>> releases. Since the current OpenJDK 11 only applies Java heap
>     >>>>>>>> archiving to a restricted set of JDK library code, I
>     think it is
>     >>>>>>>> safe without the new check.
>     >>>>>>>>
>     >>>>>>>> For non-LTS releases, it might not be worthwhile as they
>     may not be
>     >>>>>>>> widely used?
>     >>>>>>> I agree. FYI, we (Oracle) have no plan for backporting
>     more types of
>     >>>>>>> heap object archiving, so the decision would be up to
>     whoever
>     >>>>>>> decides to do so.
>     >>>>>>>
>     >>>>>>> Thanks
>     >>>>>>> - Ioi
>     >>>>>>>
>     >>>>>>>
>     >>>>>>>> Thanks,
>     >>>>>>>> Jiangli
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?>>> - src/hotspot/share/memory/metaspace.cpp
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 1036 ?metaspace_rs =
>     >>>>>>>> ReservedSpace(compressed_class_space_size(),
>     >>>>>>>>? ? ? ?>>> 1037 ?_reserve_alignment,
>     >>>>>>>>? ? ? ?>>> 1038 ?large_pages,
>     >>>>>>>>? ? ? ?>>> 1039 ?requested_addr);
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> Please fix indentation.
>     >>>>>>>>? ? ? ?>> Fixed.
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>> - src/hotspot/share/memory/metaspaceClosure.hpp
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> ?78? ?enum SpecialRef {
>     >>>>>>>>? ? ? ?>>> ?79? ? ?_method_entry_ref
>     >>>>>>>>? ? ? ?>>> ?80? ?};
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> Are there other pointers that are not references to
>     >>>>>>>>? ? ? ?MetaspaceObj? If
>     >>>>>>>>? ? ? ?>>> _method_entry_ref is the only type, it's
>     probably not worth
>     >>>>>>>>? ? ? ?defining
>     >>>>>>>>? ? ? ?>>> SpecialRef?
>     >>>>>>>>? ? ? ?>> There may be more types in the future, so I want
>     to have a
>     >>>>>>>>? ? ? ?stable API
>     >>>>>>>>? ? ? ?>> that can be easily expanded without touching all
>     the code that
>     >>>>>>>>? ? ? ?uses it.
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>> - src/hotspot/share/memory/metaspaceShared.hpp
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>>? ? ?42 enum MapArchiveResult {
>     >>>>>>>>? ? ? ?>>> ?43? ?MAP_ARCHIVE_SUCCESS,
>     >>>>>>>>? ? ? ?>>> ?44? ?MAP_ARCHIVE_MMAP_FAILURE,
>     >>>>>>>>? ? ? ?>>> ?45? ?MAP_ARCHIVE_OTHER_FAILURE
>     >>>>>>>>? ? ? ?>>>? ? ?46 };
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> If we want to define different failure types,
>     it's probably
>     >>>>>>>> worth
>     >>>>>>>>? ? ? ?>>> using separate types for relocation failure and
>     validation
>     >>>>>>>>? ? ? ?failure.
>     >>>>>>>>? ? ? ?>> For now, I just need to distinguish between
>     MMAP_FAILURE (where
>     >>>>>>>>? ? ? ?I should
>     >>>>>>>>? ? ? ?>> attempt to remap at an alternative address) and
>     OTHER_FAILURE
>     >>>>>>>>? ? ? ?(where the
>     >>>>>>>>? ? ? ?>> CDS archive loading will fail -- due to
>     validation error,
>     >>>>>>>>? ? ? ?insufficient
>     >>>>>>>>? ? ? ?>> memory, etc -- without attempting to remap.)
>     >>>>>>>>? ? ? ?>>
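A minimal sketch of how the two failure kinds can drive the retry decision described above: only an mmap failure triggers a second attempt at an alternative address. The `MapAttempt` callback and `map_archive_with_fallback` are hypothetical names used for illustration.

```cpp
#include <functional>

enum MapArchiveResult {
  MAP_ARCHIVE_SUCCESS,
  MAP_ARCHIVE_MMAP_FAILURE,
  MAP_ARCHIVE_OTHER_FAILURE
};

// Hypothetical mapping attempt: the flag says whether to use the requested
// (default) address or let the OS pick an alternative one.
using MapAttempt = std::function<MapArchiveResult(bool use_requested_address)>;

// Returns true if the archive ends up mapped. Validation errors and other
// non-mmap failures are final and are not retried.
bool map_archive_with_fallback(const MapAttempt& try_map) {
  MapArchiveResult result = try_map(true);
  if (result == MAP_ARCHIVE_MMAP_FAILURE) {
    result = try_map(false);  // remap at an alternative address
  }
  return result == MAP_ARCHIVE_SUCCESS;
}
```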
>     >>>>>>>>? ? ? ?>>> ---
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 193? ?static intx _mapping_delta; // FIXME rename
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> How about _relocation_delta?
>     >>>>>>>>? ? ? ?>> Changed as suggested.
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>> - src/hotspot/share/oops/instanceKlass
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> 1573 bool
>     InstanceKlass::_disable_method_binary_search = false;
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> The use of _disable_method_binary_search is not
>     necessary. You
>     >>>>>>>>? ? ? ?can use
>     >>>>>>>>? ? ? ?>>> DynamicDumpSharedSpaces for the purpose. That
>     would make things
>     >>>>>>>>? ? ? ?>>> cleaner.
>     >>>>>>>>? ? ? ?>> If we always disable the binary search when
>     >>>>>>>> ?DynamicDumpSharedSpaces is
>     >>>>>>>>? ? ? ?>> true, it will slow down normal execution of the
>     Java program
>     >>>>>>>> when
>     >>>>>>>>? ? ? ?>> -XX:ArchiveClassesAtExit has been specified, but
>     the program
>     >>>>>>>>? ? ? ?hasn't exited.
>     >>>>>>>>? ? ? ?> Could you please add some comments to
>     >>>>>>>> _disable_method_binary_search
>     >>>>>>>>? ? ? ?> with the above explanation? Thanks.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?OK
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?>>> -
>     test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> ?76? ? ? ? ? ? ? ? ? ? ?if (name.equals("s0") ||
>     >>>>>>>>? ? ? ?name.equals("s1")) {
>     >>>>>>>>? ? ? ?>>> ?77? ? ? ? ? ? ? ? ? ? ? ?// String regions are
>     listed at
>     >>>>>>>>? ? ? ?the end and
>     >>>>>>>>? ? ? ?>>> they may not be fully occupied.
>     >>>>>>>>? ? ? ?>>> ?78? ? ? ? ? ? ? ? ? ? ? ?break;
>     >>>>>>>>? ? ? ?>>> ?79? ? ? ? ? ? ? ? ? ? ?} else if
>     (name.equals("bm")) {
>     >>>>>>>>? ? ? ?>>> ?80? ? ? ? ? ? ? ? ? ? ? ?// Bitmap space does
>     not have a
>     >>>>>>>>? ? ? ?requested address.
>     >>>>>>>>? ? ? ?>>> ?81? ? ? ? ? ? ? ? ? ? ? ?break;
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> It's not part of your change, but could you
>     please fix line 76
>     >>>>>>>>? ? ? ?- 78
>     >>>>>>>>? ? ? ?>>> since it is trivial. It seems the lines can be
>     removed.
>     >>>>>>>>? ? ? ?>> Removed.
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>> - /src/hotspot/share/memory/archiveUtils.hpp
>     >>>>>>>>? ? ? ?>>> The file name does not match with the macro
>     '#ifndef
>     >>>>>>>>? ? ? ?>>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could
>     you please rename
>     >>>>>>>>? ? ? ?>>> archiveUtils.* ? archiveRelocator.hpp and
>     >>>>>>>> archiveRelocator.cpp are
>     >>>>>>>>? ? ? ?>>> more descriptive.
>     >>>>>>>>? ? ? ?>> I named the file archiveUtils.hpp so we can move
>     other misc
>     >>>>>>>>? ? ? ?stuff used
>     >>>>>>>>? ? ? ?>> by dumping into this file (e.g., DumpRegion,
>     WriteClosure from
>     >>>>>>>>? ? ? ?>> metaspaceShared.hpp), since theses are not used
>     by the majority
>     >>>>>>>>? ? ? ?of the
>     >>>>>>>>? ? ? ?>> files that use metaspaceShared.hpp.
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>> I fixed the ifdef.
>     >>>>>>>>? ? ? ?>>
>     >>>>>>>>? ? ? ?>>> - src/hotspot/share/memory/archiveUtils.cpp
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>>? ? ?36 void
>     ArchivePtrMarker::initialize(CHeapBitMap* ptrmap,
>     >>>>>>>>? ? ? ?address*
>     >>>>>>>>? ? ? ?>>> ptr_base, address* ptr_end) {
>     >>>>>>>>? ? ? ?>>> ?37? ?assert(_ptrmap == NULL, "initialize only
>     once");
>     >>>>>>>>? ? ? ?>>> ?38? ?_ptr_base = ptr_base;
>     >>>>>>>>? ? ? ?>>> ?39? ?_ptr_end = ptr_end;
>     >>>>>>>>? ? ? ?>>> ?40? ?_compacted = false;
>     >>>>>>>>? ? ? ?>>> ?41? ?_ptrmap = ptrmap;
>     >>>>>>>>? ? ? ?>>> ?42? ?_ptrmap->initialize(12 * M /
>     sizeof(intptr_t)); //
>     >>>>>>>>? ? ? ?default
>     >>>>>>>>? ? ? ?>>> archive is about 12MB.
>     >>>>>>>>? ? ? ?>>>? ? ?43 }
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> Could we do a better estimate here? We could
>     guesstimate the
>     >>>>>>>> size
>     >>>>>>>>? ? ? ?>>> based on the current used class space and
>     metaspace size. It's
>     >>>>>>>>? ? ? ?okay if
>     >>>>>>>>? ? ? ?>>> a larger bitmap used, since it can be reduced
>     after all
>     >>>>>>>>? ? ? ?marking are
>     >>>>>>>>? ? ? ?>>> done.
>     >>>>>>>>? ? ? ?>> The bitmap is automatically expanded when
>     necessary in
>     >>>>>>>>? ? ? ?>> ArchivePtrMarker::mark_pointer(). It's only
>     about 1/32 or 1/64
>     >>>>>>>>? ? ? ?of the
>     >>>>>>>>? ? ? ?>> total archive size, so even if we do expand, the
>     cost will be
>     >>>>>>>>? ? ? ?trivial.
>     >>>>>>>>? ? ? ?> The initial value is based on the default CDS
>     archive. When
>     >>>>>>>> dealing
>     >>>>>>>>? ? ? ?> with a really large archive, it would have to
>     re-grow many times.
>     >>>>>>>>? ? ? ?> Also, using a hard-coded value is less desirable.
>     >>>>>>>>
>     >>>>>>>>? ? ? ?OK, I changed it to the following
>     >>>>>>>>
>     >>>>>>>>          // Use this as initial guesstimate. We should need less space in the
>     >>>>>>>>          // archive, but if we're wrong the bitmap will be expanded automatically.
>     >>>>>>>>          size_t estimated_archive_size = MetaspaceGC::capacity_until_GC();
>     >>>>>>>>          // But set it smaller in debug builds so we always test the expansion code.
>     >>>>>>>>          // (Default archive is about 12MB).
>     >>>>>>>>          DEBUG_ONLY(estimated_archive_size = 6 * M);
>     >>>>>>>>
>     >>>>>>>>          // We need one bit per pointer in the archive.
>     >>>>>>>>          _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>>? ? ? ?Thanks!
>     >>>>>>>>? ? ? ?- Ioi
>     >>>>>>>>
>     >>>>>>>>? ? ? ?>
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>>
>     >>>>>>>>? ? ? ?>>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou
>     >>>>>>>>? ? ? ?<jianglizhou at google.com
>     <mailto:jianglizhou at google.com> <mailto:jianglizhou at google.com
>     <mailto:jianglizhou at google.com>>> wrote:
>     >>>>>>>>? ? ? ?>>>> Hi Ioi,
>     >>>>>>>>? ? ? ?>>>>
>     >>>>>>>>? ? ? ?>>>> This is another great step for CDS usability
>     improvement.
>     >>>>>>>>? ? ? ?Thank you!
>     >>>>>>>>? ? ? ?>>>>
>     >>>>>>>>? ? ? ?>>>> I have a high level question (or request):
>     could we consider
>     >>>>>>>>? ? ? ?>>>> separating the relocation work for 'direct'
>     class metadata
>     >>>>>>>>? ? ? ?from other
>     >>>>>>>>? ? ? ?>>>> types of metadata (such as the shared system
>     dictionary,
>     >>>>>>>>? ? ? ?symbol table,
>     >>>>>>>>? ? ? ?>>>> etc)? Initially we only relocate the tables
>     and other
>     >>>>>>>>? ? ? ?archived global
>     >>>>>>>>? ? ? ?>>>> data. When each archived class is being
>     loaded, we can
>     >>>>>>>>? ? ? ?relocate all
>     >>>>>>>>? ? ? ?>>>> the pointers within the current class. We
>     could find the
>     >>>>>>>>? ? ? ?segment (for
>     >>>>>>>>? ? ? ?>>>> the current class) in the bitmap and update
>     the pointers
>     >>>>>>>>? ? ? ?within the
>     >>>>>>>>? ? ? ?>>>> segment. That way we can reduce initial
>     startup costs and
>     >>>>>>>>? ? ? ?also avoid
>     >>>>>>>>? ? ? ?>>>> relocating class data that's not used at
>     runtime. In some
>     >>>>>>>>? ? ? ?real world
>     >>>>>>>>? ? ? ?>>>> large systems, an archive may contain
>     extremely large
>     >>>>>>>> number of
>     >>>>>>>>? ? ? ?>>>> classes.
>     >>>>>>>>? ? ? ?>>>>
>     >>>>>>>>? ? ? ?>>>> Following are partial review comments so we
>     can move things
>     >>>>>>>>? ? ? ?forward.
>     >>>>>>>>? ? ? ?>>>> Still going through the rest of the changes.
>     >>>>>>>>? ? ? ?>>>>
>     >>>>>>>>? ? ? ?>>>> - src/hotspot/share/classfile/javaClasses.cpp
>     >>>>>>>>? ? ? ?>>>>
>     >>>>>>>>? ? ? ?>>>> 1218 void
>     >>>>>>>> java_lang_Class::update_archived_mirror_native_pointers(oop
>     >>>>>>>>? ? ? ?>>>> archived_mirror) {
>     >>>>>>>>? ? ? ?>>>> 1219? ?Klass* k =
>     >>>>>>>> ((Klass*)archived_mirror->metadata_field(_klass_offset));
>     >>>>>>>>? ? ? ?>>>> 1220? ?if (k != NULL) { // k is NULL for the
>     primitive
>     >>>>>>>>? ? ? ?classes such as
>     >>>>>>>>? ? ? ?>>>> java.lang.Byte::TYPE <<<<<<<<<<<
>     >>>>>>>>? ? ? ?>>>> 1221
>     archived_mirror->metadata_field_put(_klass_offset,
>     >>>>>>>>? ? ? ?>>>> (Klass*)(address(k) +
>     MetaspaceShared::mapping_delta()));
>     >>>>>>>>? ? ? ?>>>> 1222? ?}
>     >>>>>>>>? ? ? ?>>>> 1223 ...
>     >>>>>>>>? ? ? ?>>>>
>     >>>>>>>>? ? ? ?>>>> Primitive type mirrors are handled separately.
>     Could you
>     >>>>>>>>? ? ? ?please verify
>     >>>>>>>>? ? ? ?>>>> if this call path happens for primitive type
>     mirror?
>     >>>>>>>>? ? ? ?>>>>
>     >>>>>>>>? ? ? ?>>>> To answer my question above, looks like you
>     added the
>     >>>>>>>>? ? ? ?following, which
>     >>>>>>>>? ? ? ?>>>> is to be used for primitive type mirrors. That
>     seems to be
>     >>>>>>>>? ? ? ?the reason
>     >>>>>>>>? ? ? ?>>>> why update_archived_mirror_native_pointers is
>     trying to also
>     >>>>>>>>? ? ? ?cover
>     >>>>>>>>? ? ? ?>>>> primitive type. It better to have a separate
>     API for
>     >>>>>>>>? ? ? ?primitive type
>     >>>>>>>>? ? ? ?>>>> mirror, which is cleaner. And, we also can
>     replace the above
>     >>>>>>>>? ? ? ?check at
>     >>>>>>>>? ? ? ?>>>> line 1220 to be an assert for regular mirrors.
>     >>>>>>>>? ? ? ?>>>>
>     >>>>>>>>? ? ? ?>>>> +void ReadClosure::do_mirror_oop(oop *p) {
>     >>>>>>>>? ? ? ?>>>> + do_oop(p);
>     >>>>>>>>? ? ? ?>>>> + oop mirror = *p;
>     >>>>>>>>? ? ? ?>>>> + if (mirror != NULL) {
>     >>>>>>>>? ? ? ?>>>> +
>     >>>>>>>>
>     java_lang_Class::update_archived_mirror_native_pointers(mirror);
>     >>>>>>>>? ? ? ?>>>> + }
>     >>>>>>>>? ? ? ?>>>> +}
>     >>>>>>>>? ? ? ?>>>> +
>     >>>>>>>>? ? ? ?>>>>
>     >>>>>>>>? ? ? ?>>>> How about renaming
>     update_archived_mirror_native_pointers to
>     >>>>>>>>? ? ? ?>>>> update_archived_mirror_klass_pointers.
>     >>>>>>>>? ? ? ?>>>>
>     >>>>>>>>? ? ? ?>>>> It would be good to pass the current klass as
>     an argument.
>     >>>>>>>> We can
>     >>>>>>>>? ? ? ?>>>> verify the relocated pointer matches with the
>     current klass
>     >>>>>>>>? ? ? ?pointer.
>     >>>>>>>>? ? ? ?>>>>
>     >>>>>>>>? ? ? ?>>>> We should also check if relocation is
>     necessary before
>     >>>>>>>>? ? ? ?spending cycles
>     >>>>>>>>? ? ? ?>>>> to obtain the 
>
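
The bitmap sizing discussed in the quoted review above ("one bit per pointer
in the archive", and "only about 1/32 or 1/64 of the total archive size") can
be checked with a small sketch. The function names below are hypothetical and
are not part of HotSpot; the only assumption is that each bit covers one
pointer-sized slot of the archive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bits needed when one bit tracks one
// pointer-sized slot of an archive of the given size.
constexpr size_t sketch_ptrmap_bits(size_t archive_bytes) {
  return archive_bytes / sizeof(intptr_t);
}

// Hypothetical helper: the same bitmap expressed in bytes (rounded up).
constexpr size_t sketch_ptrmap_bytes(size_t archive_bytes) {
  return (sketch_ptrmap_bits(archive_bytes) + 7) / 8;
}

// The ratio of bitmap bytes to archive bytes is 1 / (8 * sizeof(intptr_t)):
// 1/64 with 8-byte pointers and 1/32 with 4-byte pointers.
constexpr size_t sketch_ptrmap_ratio() {
  return 8 * sizeof(intptr_t);
}
```

Under these assumptions, a 12MB archive on a 64-bit platform needs a bitmap
of 12MB / 64, which is why growing it on demand is expected to be cheap.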


From david.holmes at oracle.com  Wed Nov 13 05:59:54 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 13 Nov 2019 15:59:54 +1000
Subject: RFR: JDK-8230413: Support Pre JDK 6 class with CDS
In-Reply-To: <CALrW1jyqUuZNjmS0UZyiR-ZQfZVVy38opvSCaMotwTzY0i+S1A@mail.gmail.com>
References: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>
 <c69bf94f-e52d-f7cd-e9f7-252a71204f7b@oracle.com>
 <CALrW1jyog+Q5H-BXQxQEkPdtLhr_aB7ekOVzuq9Qi4qLQ-jqOg@mail.gmail.com>
 <f2b33c85-e811-48f5-ed82-6ee398ea1935@oracle.com>
 <c3a4ff42-e3b5-98a9-138c-358d14ac1ada@oracle.com>
 <c322d943-8c6a-5c6e-345f-af4c902d4a57@oracle.com>
 <CALrW1jyqUuZNjmS0UZyiR-ZQfZVVy38opvSCaMotwTzY0i+S1A@mail.gmail.com>
Message-ID: <fd3d57f2-6f1b-7b3d-0601-6f2c36182559@oracle.com>

Hi Jiangli,

On 13/11/2019 12:20 pm, Jiangli Zhou wrote:
> Hi Harold and Ioi,
> 
> Thanks a lot for the additional feedback.
> 
> I did some quick research today about -Xverify:none usages. My finding
> showed that the use of -Xverify:none is not very uncommon in some
> cases. Here are some of the usages:
> 
> - trusted tools

But what is the context? Is it:

"I trust this tool, and all other classes, so I'll optimize by disabling 
verification,"; or

"This tool produces non-verifiable classfiles, but I trust the tool and 
so will disable verification" (which implicitly means all 
classes/libraries have to be fully trusted)

?

I'm not sure you can use any existing uses of -Xverify:none to infer the 
applicability or not to what is being proposed here for CDS.

> - some limited testing environment
> 
> CDS (particularly with dynamic archiving capability) may help avoid
> runtime verification overhead by verifying classes at dump time and
> reduce the needs for -Xverify:none. It would be good to have
> strategies for the following scenarios as well when removing
> -Xverify:none:
> 
> 1) In cases when shared archive is disabled at runtime (I hope it's
> not common cases)

I'm not quite sure what you are saying here. If a pre-verified archive 
can't be used at runtime then normal verification should occur as 
classes are not being loaded from a known pre-verified location.

> 2) When users want to reduce the overhead caused by verification
> during archiving dump time

I would not expect dumping to be such a time critical activity that 
users would care about the "overhead" of verification.

Cheers,
David

> Thoughts?
> 
> Best,
> Jiangli
> 
> On Tue, Nov 12, 2019 at 4:16 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>
>> I am also a little worried that this might send the wrong message -- "if
>> you want to archive pre-JDK6 classes, you need to disable verification
>> altogether for all classes in your entire app".
>>
>> Thanks
>> - Ioi
>>
>> On 11/12/19 12:40 PM, Harold Seigel wrote:
>>> Hi Jiangli,
>>>
>>> I think this change is going in the wrong direction.  We are trying to
>>> discourage disabling verification, not encourage it.  We also do not
>>> want to create more use-cases for preserving -Xverify:none.
>>>
>>> It looks like your change would allow archiving of unverified pre-JDK6
>>> classes, but not allow archiving of verified pre-JDK6 classes.  If so,
>>> that seems backward.
>>>
>>> Thanks, Harold
>>>
>>> On 11/11/2019 11:53 PM, Ioi Lam wrote:
>>>> I wonder if there's a safer alternative. Are there tools that can add
>>>> stackmaps to pre-JDK6 classes? That way they can be verified with the
>>>> split verifier during CDS dump time.
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>> On 11/11/19 4:25 PM, Jiangli Zhou wrote:
>>>>> Hi David,
>>>>>
>>>>> Thanks for quick response!
>>>>>
>>>>> On Mon, Nov 11, 2019 at 3:12 PM David Holmes
>>>>> <david.holmes at oracle.com> wrote:
>>>>>> Hi Jiangli,
>>>>>>
>>>>>> On 12/11/2019 8:12 am, Jiangli Zhou wrote:
>>>>>>> Please review the following change that allows archiving
>>>>>>> pre-JAVA_6_VERSION classes with -Xverify:none.
>>>>>>>
>>>>>>> webrev: http://cr.openjdk.java.net/~jiangli/8230413/webrev.00/
>>>>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8230413
>>>>>>>
>>>>>>> Currently there are still large number of existing classes
>>>>>>> (pre-built)
>>>>>>> with older class versions (< 50) in real world applications. Those
>>>>>>> classes are missing the benefit of archiving. Particularly, in some
>>>>>>> use cases, class verification can be safely disabled. For those use
>>>>>>> cases, supporting archiving pre JDK 6 classes shows good performance
>>>>>>> benefit. We can re-evaluate this support when -Xverify:none is
>>>>>>> removed
>>>>>>> in the future, hopefully the needs for supporting class version < 50
>>>>>>> is no longer significant at that time.
>>>>>>>
>>>>>>> This change brings back the pre-JDK-8198849 behavior. Runtime makes
>>>>>>> sure the dump-time verification mode must be the same or stronger
>>>>>>> than
>>>>>>> the current mode.
>>>>>>>
>>>>>>> A CSR may be needed for the change. Any thoughts on that?
>>>>>> A CSR request is definitely required given that you are proposing to
>>>>>> undo a change that was itself put in place via a CSR request! And
>>>>>> given
>>>>>> this is relaxing a "defense-in-depth" check which will result in
>>>>>> increasing exploitability, I think you will need a very strong
>>>>>> argument
>>>>>> to justify this.
>>>>> Thanks for confirming this! Will do.
>>>>>
>>>>>> Further this not only undoes JDK-8197972 but it also invalidates
>>>>>> JDK-8155671 being closed as a duplicate of JDK-8197972. JDK-8155671
>>>>>> requested a way to know if verification had been disabled, to help
>>>>>> with
>>>>>> analyzing crash reports, but instead we decided to not allow
>>>>>> verification to be disabled.
>>>>> I had some concerns about JDK-8155671 initially before making the
>>>>> change, as it's a closed bug and my memory about the specific issue
>>>>> was flushed out. I brought up the question in the bug. My take on
>>>>> Ioi's response to my query about JDK-8155671 was that the
>>>>> pre-JDK-8197972 behavior would not cause any security hole.
>>>>>
>>>>> Re-evaluating this particular behavior, I think the pre-JDK-8155671
>>>>> behavior would actually match user intention better. If user decides to turn
>>>>> off verification in safe use cases, it seems to be a good idea to
>>>>> honor that. With the new dynamic archiving capability, archive could
>>>>> be created at the first time when running a particular application.
>>>>> Not forcing verification when user decides to can avoid
>>>>> unnecessary/unwanted overhead.
>>>>>
>>>>> If verification is turned off at dump time for application classes,
>>>>> runtime does not allow execution without also turning off
>>>>> verification. We can determine a crash is not caused by relaxed dump
>>>>> time verification.
>>>>>
>>>>> Regards,
>>>>> Jiangli
>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Tested with jtreg appcds tests.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jiangli
>>>>>>>
>>>>
>>

From adinn at redhat.com  Wed Nov 13 08:55:57 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 13 Nov 2019 08:55:57 +0000
Subject: [aarch64-port-dev ] RFR 8231841: AArch64: Add entry to pns output
 in help()
In-Reply-To: <BCE2D257-945C-4BFC-930E-D8922C59B6A6@arm.com>
References: <BCE2D257-945C-4BFC-930E-D8922C59B6A6@arm.com>
Message-ID: <eca78d92-145e-8ab8-ab5a-6a848464b0fe@redhat.com>


On 12/11/2019 18:03, Alan Hayward wrote:
> Please could you review this change which adds AArch64 to the pns section of the help() output.
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8231841
> Webrev: http://cr.openjdk.java.net/~smonteith/8231841/webrev.0/
> 
> 
> Built and ran tier1 on x86 and AArch64.
Yes, that's good to push thanks.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill


From coleen.phillimore at oracle.com  Wed Nov 13 12:17:35 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Wed, 13 Nov 2019 07:17:35 -0500
Subject: RFR (M) 8233913: Remove implicit conversion from Method* to
 methodHandle
In-Reply-To: <18ecffa1-36be-4a1b-616d-5badf50d87ab@oracle.com>
References: <c7d96b9b-5ad5-2b91-9bbc-d10db36862d9@oracle.com>
 <18ecffa1-36be-4a1b-616d-5badf50d87ab@oracle.com>
Message-ID: <0b4acfe7-0147-1198-8993-d7ab3b4ce838@oracle.com>


On 11/12/19 10:51 PM, Ioi Lam wrote:
> Hi Coleen,
>
> I've scanned through all the changes. It looks good. Just a few small 
> nits:

Ioi,

Thank you for spending so much time looking at this!
>
>
> [1] Not sure if you want to handle it in this patch, but MethodData 
> initialization is a bit messy:
>
> For MethodData::MethodData() -> MethodData::initialize() -> 
> MethodData::init(), I think you can pass in both the THREAD and the 
> methodHandle, so you don't need to query the current thread again. This 
> can skip two Thread::current() calls for each allocated MethodData.
>
> (But you'd also need no-arg variants of initialize() and init() for other 
> callers, such as reprofile in jvmciCompilerToVM.cpp .... )
>
> and why do we have MethodData::initialize() and MethodData::init()??

Ew, I don't really want to change this with this patch.  The implicit 
calls are now explicit so they look bad, so maybe someone can rewrite 
these.  I don't know why both exist either!
>
> [2] Not a big deal, but should the variables be renamed from mh to m?
>
> void TieredThresholdPolicy::print_counters(const char* prefix, Method* 
> mh) {
>
> void TieredThresholdPolicy::print_event(EventType type, Method* mh, 
> Method* imh,
>                                         int bci, CompLevel level) {
>

I fixed these and retested.

Thanks!
Coleen
>
> Thanks
> - Ioi


From daniel.daugherty at oracle.com  Wed Nov 13 14:17:58 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Wed, 13 Nov 2019 09:17:58 -0500
Subject: RFR: 8233549: Thread interrupted state must only be accessed when
 not in a safepoint-safe state
In-Reply-To: <96cf9e10-a3df-fc5a-2cfa-ac156c10f99c@oracle.com>
References: <fe942c6d-4db1-2166-739f-702c1f6cb2ca@oracle.com>
 <05b4ec18-1a93-7d3d-fb17-1ce2f5c27e11@oracle.com>
 <96cf9e10-a3df-fc5a-2cfa-ac156c10f99c@oracle.com>
Message-ID: <89859c60-97bd-76f0-74dc-3c74fb919d3b@oracle.com>

On 11/12/19 5:50 PM, David Holmes wrote:
> Hi Dan,
>
> Thanks for taking a look so quickly!

You're welcome! I figured you would prefer to get this one out of
the way quickly.


>
> On 13/11/2019 3:18 am, Daniel D. Daugherty wrote:
>> On 11/11/19 11:52 PM, David Holmes wrote:
>>> webrev: http://cr.openjdk.java.net/~dholmes/8233549/webrev/
>>
>> src/hotspot/os/posix/os_posix.cpp
>>     L2078:   // Can't access interrupt state now we are 
>> _thread_blocked. If we've been
>>     L2079:   // interrupted since we checked above then _counter 
>> will be > 0.
>>         nit - grammar. Please consider:
>>               // Can't access interrupt state now that we are 
>> _thread_blocked. If we've
>>               // been interrupted since we checked above then 
>> _counter will be > 0.
>>
>> src/hotspot/os/solaris/os_solaris.cpp
>>     L4924:   // Can't access interrupt state now we are 
>> _thread_blocked. If we've been
>>     L4925:   // interrupted since we checked above then _counter 
>> will be > 0.
>>         nit - grammar. Please consider:
>>               // Can't access interrupt state now that we are 
>> _thread_blocked. If we've
>>               // been interrupted since we checked above then 
>> _counter will be > 0.
>
> Will fix grammatical nits.
>
>> src/hotspot/share/classfile/javaClasses.cpp
>>     No comments.
>>
>> src/hotspot/share/prims/jvmtiEnv.cpp
>>     Hmmm... did the "non-JavaThread can't be interrupted" check also 
>> get
>>     pushed down?
>>     Update: Similar check is now in JvmtiRawMonitor::raw_wait().
>>
>> src/hotspot/share/prims/jvmtiRawMonitor.cpp
>>     L239:     ThreadInVMfromNative tivm(jt);
>>     L240:     if (jt->is_interrupted(true)) {
>>     L241:       ret = M_INTERRUPTED;
>>     L242:     } else {
>>     L243:       ThreadBlockInVM tbivm(jt);
>>     L244:       jt->set_suspend_equivalent();
>>     L245:       if (millis <= 0) {
>>     L246:         self->_ParkEvent->park();
>>     L247:       } else {
>>     L248:         self->_ParkEvent->park(millis);
>>     L249:       }
>>     L250:     }
>>     L251:     // Return to VM before post-check of interrupt state
>>     L252:     if (jt->is_interrupted(true)) {
>>         The comment on L251 is better between L249 and L250 since that
>>         is where 'tbivm' gets destroyed and you transition back.
>>
>>         You could have this comment before L252:
>>
>>                 // Must be in VM to safely access interrupt state:
>>
>>         if you think you really need a comment there.
>
> Will move comment up as suggested.
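
The point about where 'tbivm' gets destroyed is easier to see in a small
standalone sketch. This is not HotSpot code: the types below are hypothetical
stand-ins that only model a scoped state transition, assuming the transition
object switches the thread to a blocked state in its constructor and back in
its destructor.

```cpp
#include <cassert>

// Hypothetical thread states; only the two needed for the sketch.
enum class SketchState { in_vm, blocked };

// Hypothetical stand-in for a JavaThread.
struct SketchJavaThread {
  SketchState state = SketchState::in_vm;
};

// Hypothetical stand-in for a scoped transition such as ThreadBlockInVM:
// the constructor enters the blocked state, the destructor leaves it.
class SketchBlockInVM {
 public:
  explicit SketchBlockInVM(SketchJavaThread* t) : _t(t) {
    assert(_t->state == SketchState::in_vm);
    _t->state = SketchState::blocked;
  }
  ~SketchBlockInVM() { _t->state = SketchState::in_vm; }
  SketchBlockInVM(const SketchBlockInVM&) = delete;
  SketchBlockInVM& operator=(const SketchBlockInVM&) = delete;

 private:
  SketchJavaThread* _t;
};

// Returns the state observed at the point where park() would be called.
SketchState sketch_raw_wait(SketchJavaThread* jt) {
  SketchState during_park;
  {
    SketchBlockInVM tbivm(jt);
    during_park = jt->state;  // park() would happen here, while blocked
  }  // 'tbivm' is destroyed here: this brace is the return to the VM
  // Only from this point on would it be safe to read interrupt state.
  return during_park;
}
```

In this sketch the transition back happens at the closing brace of the inner
scope, which is why a "return to VM" comment reads most naturally there.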
>
>> src/hotspot/share/prims/jvmtiRawMonitor.hpp
>>     No comments.
>>
>> src/hotspot/share/runtime/objectMonitor.cpp
>>     You've moved the is_interrupted() check from after ThreadBlockInVM
>>     to before it. ThreadBlockInVM can block for a safepoint which 
>> widens
>>     the window for an interrupt to come in after the check on L1272 and
>>     before the thread parks on L1286 or L1288.
>>
>>     Can this result in an unexpected park() where before we would have
>>     taken the "Intentionally empty" code path on L1283?
>>
>>     What I'm worried about is whether we've opened a window where we
>>     do Object.wait(0) and that wait() is supposed to be interrupted.
>>     However, we lose that interrupt because it arrives in the now wider
>>     window between L1272 and L1286 and we never return from the 
>> wait(0).
>>
>>     It is possible that I'm not remembering something about how 
>> interrupt()
>>     interacts with park().
>
> The interrupt() not only sets the field but also issues an unpark() to 
> the ParkEvent. So if we are interrupted whilst processing through the 
> TBIVM, the call to park() will return immediately as the ParkEvent 
> will be in the signalled state.

That was the piece I wasn't remembering. Thanks for filling in the detail.
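
For anyone else following along, here is a minimal standalone sketch of that
interaction. It is not the HotSpot ParkEvent implementation: the class and
function names are hypothetical, and the only assumption is the behaviour
David describes, namely that interrupt() sets the flag and also signals the
event, so a later park() returns immediately.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a ParkEvent: a single sticky signal.
class SketchParkEvent {
 public:
  // Blocks until signalled, then consumes the signal. If unpark() was
  // already called, this returns without blocking.
  void park() {
    std::unique_lock<std::mutex> lock(_mutex);
    _cv.wait(lock, [this] { return _signalled; });
    _signalled = false;
  }

  // Leaves the event in the signalled state and wakes a parked thread.
  void unpark() {
    {
      std::lock_guard<std::mutex> lock(_mutex);
      _signalled = true;
    }
    _cv.notify_one();
  }

 private:
  std::mutex _mutex;
  std::condition_variable _cv;
  bool _signalled = false;
};

// Hypothetical stand-in for a thread with interrupt state.
struct SketchThread {
  std::atomic<bool> interrupted{false};
  SketchParkEvent event;

  // Models interrupt(): set the field and also unpark the event.
  void interrupt() {
    interrupted.store(true);
    event.unpark();
  }
};

// Models the wait path: early check, then a window in which an interrupt
// may arrive, then park(). Returns true if the wait ended by interrupt.
bool sketch_wait(SketchThread& t, bool interrupt_in_window) {
  if (t.interrupted.load()) return true;    // early check
  if (interrupt_in_window) t.interrupt();   // arrives after the check
  t.event.park();                           // does not block: already signalled
  return t.interrupted.load();
}
```

With interrupt_in_window set, the interrupt lands after the early check but
park() still returns at once, so the wakeup is not lost in this model.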


>
>> test/hotspot/jtreg/ProblemList.txt
>>     Thanks for remembering to update the ProblemList.
>>
>> The only part I'm worried about is ObjectMonitor::wait(). If my worry is
>> baseless, then thumbs up.
>
> Worry is baseless :)

Agreed!


>
>> I have a couple of nits above. If you choose to fix those, then I don't
>> need to see another webrev.
>
> Thanks again!

You're welcome.

Dan

>
> David
> -----
>
>> Dan
>>
>>
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8233549
>>>
>>> In JDK-8229516 I moved the interrupted state of a thread from the 
>>> osThread in the VM to the java.lang.Thread instance. In doing that I 
>>> overlooked a critical aspect, which is that to access the field of a 
>>> Java object the JavaThread must not be in a safepoint-safe state** - 
>>> otherwise the oop, and anything referenced there from could be 
>>> relocated by the GC whilst the JavaThread is accessing it. This 
>>> manifested in a number of tests using JVM TI Agent threads and JVM 
>>> TI RawMonitors because the JavaThreads were marked _thread_blocked 
>>> and hence safepoint-safe, and we read a non-zero value for the 
>>> interrupted field even though we had never been interrupted.
>>>
>>> This problem existed in all the code that checks for interruption 
>>> when "waiting":
>>>
>>> - Parker::park (the code underpinning 
>>> java.util.concurrent.LockSupport.park())
>>>
>>> To fix this code I simply deleted a late check of the interrupted 
>>> field. The check was not needed because if an interrupt has occurred 
>>> then we will find the ParkEvent in a signalled state.
>>>
>>> - ObjectMonitor::wait
>>>
>>> Here the late check of the interrupted state is essential as we 
>>> reset the ParkEvent after an earlier check of the interrupted state. 
>>> But the fix was simply achieved by moving the check slightly earlier 
>>> before we use ThreadBlockInVM to become _thread_blocked.
>>>
>>> - RawMonitor::wait
>>>
>>> This fix was much more involved. The RawMonitor code directly 
>>> transitions the JavaThread from _thread_in_native to 
>>> _thread_blocked. This is safe from a safepoint perspective because 
>>> they are equivalent safepoint-safe states. To allow access to the 
>>> interrupted field I have to transition from native to _thread_in_vm, 
>>> and that has to be done by proper thread-state transitions to ensure 
>>> correct access to the oop and its fields. Having done that I can 
>>> then use ThreadBlockInVM for the transitions to blocked. However, as 
>>> the old code noted it can't use proper thread-state transitions as 
>>> this will lead to deadlocks with the VMThread that can also use 
>>> RawMonitors when executing various event callbacks. To deal with 
>>> that we have to note that the real constraint is that the JavaThread 
>>> cannot block at a safepoint whilst it holds the RawMonitor. Hence 
>>> the fix was to push all the interrupt checking code and the 
>>> thread-state transitions to the lowest level of RawMonitorWait, 
>>> around the final park() call, after we have enqueued the waiter and 
>>> released the monitor. That avoids any deadlock possibility.
>>>
>>> I also added checks to is_interrupted/interrupted to ensure they are 
>>> only called by a thread in a suitable state. This should only be the 
>>> VMThread (as a consequence of the Thread.stop implementation 
>>> occurring at a safepoint and issuing a JavaThread::interrupt() call 
>>> to unblock the target); or a JavaThread that is not 
>>> _thread_in_native or _thread_blocked.
>>>
>>> Testing: (still finalizing)
>>>  - tiers 1 - 6 (Oracle platforms)
>>>  - Local Linux testing
>>>   - vmTestbase/nsk/monitoring/
>>>   - vmTestbase/nsk/jdwp
>>>   - vmTestbase/nsk/jdb/
>>>   - vmTestbase/nsk/jdi/
>>>   - vmTestbase/nsk/jvmti/
>>>   - serviceability/jvmti/
>>>   - serviceability/jdwp
>>>   - JDK: java/lang/management
>>>          com/sun/management
>>>
>>> ** Note that this applies to all accesses we make via code in 
>>> javaClasses.*. For this particular code I thought about adding a 
>>> guard in JavaThread::threadObj() but it turns out when we generate a 
>>> crash report we access the Thread's name() field and that can happen 
>>> when in any state, so we'd always trigger a secondary assertion 
>>> failure during error reporting if we did that. Note that accessing 
>>> name() can still easily lead to secondary assertion failures as I 
>>> discovered when trying to debug this and print the thread name out - 
>>> I would see an is_instance assertion fail checking that the Thread 
>>> name() is an instance of java.lang.String!
>>>
>>> Thanks,
>>> David
>>> -----
>>


From daniel.daugherty at oracle.com  Wed Nov 13 14:28:28 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Wed, 13 Nov 2019 09:28:28 -0500
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <fa7d8cb8-a948-10aa-6c82-d81a2271af49@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
 <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
 <590a7395-8fda-caf3-402a-29393fd34469@oracle.com>
 <7dedd65e-f004-b0b4-0c04-63bc484198ac@oracle.com>
 <383a1330-1e3d-66db-c95b-9e6f9910641f@oracle.com>
 <fc503d62-b1f6-5842-85c7-f230fc942f5b@oracle.com>
 <fa7d8cb8-a948-10aa-6c82-d81a2271af49@oracle.com>
Message-ID: <a57bd7ca-f1e5-5536-c017-f4f6fe6c9ffe@oracle.com>

Hi David,

On 11/12/19 6:12 PM, David Holmes wrote:
> Hi Dan,
>
> On 13/11/2019 8:24 am, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I'm only going to jump in on a single item here (at this time).
>>
>>
>> On 11/11/19 9:03 AM, David Holmes wrote:
>>> Hi Robbin,
>>>
>>> On 11/11/2019 10:41 pm, Robbin Ehn wrote:
>>>>
>>>> Also in this patch there is already Atomic::store/load on "volatile 
>>>> markWord _header;".
>>>
>>> And I've flagged the inappropriateness of using these with Dan. 
>>> Though I see we already have a couple of pre-existing occurrences 
>>> which have snuck in - again this seems to be a misunderstanding 
>>> about the need for Atomic use in these cases.
>>
>> I read this comment from Robbin and David's reply and my brain said: 
>> What?
>> It might have been just the common three letter acronym, but I 
>> digress... :-)
>
> First let me clarify that my comment quoted above may have been too 
> broad/general. I wasn't recalling specific changes where you added 
> Atomic::load/store but email exchanges where you said words to the 
> effect of "I could replace ... with Atomic::store ..." and I replied 
> that there was no need to use Atomic::load/store.

Actually it was Robbin that said the change was in my patch... and it is...
I just wanted to clarify that it was due to code motion. :-)

I do remember your general comments about Atomic::load/store. Of course,
because of those comments, I was planning to ping you about the use of
Atomic::load/store with the _header field. Mostly because I hadn't done
that with all of the new volatiles I added... So this sub-thread in the
8153224 review provided the perfect opportunity for me to chase that
little to-do item to ground...


>
>> So I searched the patch:
>
> Thanks for digging that out, as I hadn't recalled those details.
>
> Short version: I've agreed with Robbin that we should move to use 
> Atomic::load/store to get compiler-based-atomicity rather than relying 
> on use of "volatile" on variables. But for the purposes of this patch 
> (where Robbin made a number of suggestions on where to use 
> Atomic::load/store) that we try to limit using this new style to 
> inherently new code (ie lock-free list management) rather than 
> retrofitting all existing usages.

Agreed. I have a bit of work to do there to address all of Robbin's
comments. We'll see if I can get it right in the next round...
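
To make the "new style" concrete, here is a rough standalone sketch of the
idea. It does not use HotSpot's Atomic class: std::atomic with relaxed
ordering is swapped in as a stand-in, and the class and field names are
hypothetical. The assumption is only that the goal is atomicity of the
individual load and store, not any additional ordering.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical monitor-like class. Instead of declaring the field
// 'volatile' and using plain reads and writes, every access goes through
// an explicit atomic load or store.
class SketchMonitor {
 public:
  // Atomic read of the header; relaxed ordering models "atomicity only".
  uintptr_t header() const {
    return _header.load(std::memory_order_relaxed);
  }

  // Atomic write of the header.
  void set_header(uintptr_t value) {
    _header.store(value, std::memory_order_relaxed);
  }

  // Models a clear() that resets the header to zero.
  void clear() { set_header(0); }

 private:
  std::atomic<uintptr_t> _header{0};
};
```

The design point in this sketch is that the access sites say what is
intended, rather than the field declaration implying it.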


> Hope that clarifies somewhat.

It does. Thanks! I'll ping you, Robbin and Erik off thread if I need
any clarifications...

Dan


>
> Thanks,
> David
> -----
>
>> $ grep _header 11-for-jdk14.v2.08.full/open.patch | egrep 'store|load'
>>     assert(Atomic::load(&_header).value() != 0, "must be non-zero");
>> +  Atomic::store(markWord::zero(), &_header);
>> -  Atomic::store(markWord::zero(), &_header);
>>
>> Oh... that code... If you look at the 8153224 webrev you'll see that the
>> ObjectMonitor::clear() function was refactored into two parts:
>>
>>     ObjectMonitor::clear()
>>     ObjectMonitor::clear_using_JT()
>>
>> This line:
>>
>>     Atomic::store(markWord::zero(), &_header);
>>
>> is in the original ObjectMonitor::clear() function and it is in
>> the new ObjectMonitor::clear() function, but there's a lot of
>> code motion in between those two functions so... diff took the
>> easy way out and showed this:
>>
>>     +  Atomic::store(markWord::zero(), &_header);
>>
>> as a new line in the shorter, new clear() function
>> and as a deleted line in the longer, old clear() function:
>>
>>     -  Atomic::store(markWord::zero(), &_header);
>>
>>
>> Okay, so where did that come from? I've been tweaking a lot of
>> ObjectMonitor code lately, but I don't think that line is mine...
>>
>> $ hg annot src/hotspot/share/runtime/objectMonitor.inline.hpp | grep 
>> 'Atomic::store(markWord::zero(), &_header);'
>> 56006:   Atomic::store(markWord::zero(), &_header);
>>
>> $ hg log -r 56006
>> changeset:   56006:90ead0febf56
>> user:        stefank
>> date:        Tue Aug 06 10:48:21 2019 +0200
>> summary:     8229258: Rework markOop and markOopDesc into a simpler 
>> mark word value carrier
>>
>> Okay, I remember this bug:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8229258
>>
>> and I even reviewed it... :-)  It looks like the reviewers are:
>>
>>  > Reviewed-by: rkennke, coleenp, kbarrett, dcubed
>>
>>
>> Roman posted this comment during the 8229258 review:
>>
>> On 8/15/19 1:06 PM, Roman Kennke wrote:
>>
>>> Out of curiosity, what's with the changes in 
>>> objectMonitor.inline.hpp to
>>> access the markWord atomically?:
>>>
>>> -inline markOop ObjectMonitor::header() const {
>>> -  return _header;
>>> +inline markWord ObjectMonitor::header() const {
>>> +  return Atomic::load(&_header);
>>>   }
>>>
>>> I guess this is good (equal or stronger than before) but is there a
>>> rationale behind these changes?
>>
>> and Stefan K replied with this:
>>
>> On 8/15/19 3:26 PM, Stefan Karlsson wrote:
>>
>>> Ahh. Right. That was done to solve the problems I was having with 
>>> volatiles. For example:
>>> src/hotspot/share/runtime/objectMonitor.inline.hpp:38:10: error: 
>>> binding reference of type 'const markWord&' to 'const volatile 
>>> markWord' discards qualifiers
>>>    return _header;
>>>
>>> and:
>>> src/hotspot/share/runtime/basicLock.hpp:40:74: error: implicit 
>>> dereference will not access object of type 'volatile markWord' in 
>>> statement [-Werror]
>>>   void         set_displaced_header(markWord header) { 
>>> _displaced_header = header; }
>>>
>>> Kim suggested that the fact that these fields were volatile was an 
>>> indication that we should be doing some kind of atomic/ordered 
>>> operation. By replacing these loads and stores with calls to the 
>>> Atomic APIs, and providing the 
>>> PrimitiveConversions::Translate<markWord> specialization, we could 
>>> solve that problem. 
>>
>> So it appears that Stefan has a good rationale for making the
>> Atomic::load() and Atomic::store() changes with the _header field.
>> Since I've added more volatile fields to ObjectMonitor, it would
>> follow that I should make similar changes...
>>
>> However, it's not clear that David agrees with the above change so
>> I'm hesitant to make the similar changes to my patch...
>>
>> How do we resolve this issue?
>>
>> Dan


From jianglizhou at google.com  Wed Nov 13 15:37:17 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Wed, 13 Nov 2019 07:37:17 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <0fec66c6-b8a2-6019-655b-467f84404386@oracle.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <CALrW1jyGu8eGdu+WiyUCuRyPDMGvFazPJWXdBawWM_O=7j6NnA@mail.gmail.com>
 <CALrW1jzTSrFnw1LWeT3o5ZLd=7+NOCTDMxY7Ex5r-EHWhbSAow@mail.gmail.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
 <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
 <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>
 <7f0d558c-c161-ce41-90c6-ede8fddcf8b8@oracle.com>
 <99030987-a044-53fb-784b-62408333137a@oracle.com>
 <CALrW1jwTHJ1qg+gTfNeM9MbLdF042bF2ysKf7nGHKFNj9uKe+w@mail.gmail.com>
 <CALrW1jy5_4jrMRSZPAXV-c8a92Jy8y4eoK+f_t8ErwaZRGMoyw@mail.gmail.com>
 <52c473ef-5915-9ca0-8ed8-d4c2846965be@oracle.com>
 <CALrW1jzk+1XAqw2w55Y=ouyb-ZDB8tu5uWKNiXN9uA5Ku2XaCg@mail.gmail.com>
 <96ad8c62-fd62-1a1b-6f3c-e009e5e8a6f3@oracle.com>
 <CALrW1jye1Oua7e3LCNV6-c_pkYa3Ujni7own-ntXaFqv8tM6-Q@mail.gmail.com>
 <0fec66c6-b8a2-6019-655b-467f84404386@oracle.com>
Message-ID: <CALrW1jxDLNH3Mp89YuU++NSnjo=OQGBdH1OSUJa9SZO6pjMo2A@mail.gmail.com>

Looks good!

Best,
Jiangli

On Tue, Nov 12, 2019 at 9:12 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>
>
>
> On 11/10/19 5:14 PM, Jiangli Zhou wrote:
>
>
>
> On Sun, Nov 10, 2019, 3:13 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>
>>
>>
>> On 11/9/19 8:25 PM, Jiangli Zhou wrote:
>> > Hi Ioi,
>> >
>> > On Fri, Nov 8, 2019 at 1:35 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>> >> Hi Jiangli,
>> >>
>> >> Thanks for your comments. Please see my replies in-line:
>> >>
>> >> On 11/7/19 6:34 PM, Jiangli Zhou wrote:
>> >>> On Thu, Nov 7, 2019 at 6:11 PM Jiangli Zhou <jianglizhou at google.com> wrote:
>> >>>> I looked at both the 05.full and 06.delta webrevs. They look good.
>> >>>>
>> >>>> I still feel a bit uneasy about the potential runtime impact when data
>> >>>> does get relocated. Long running apps/services may shy away from
>> >>>> enabling the archive at runtime, if there is a detectable overhead even
>> >>>> though it may only occur rarely. As relocation is enabled by default
>> >>>> and users cannot turn it off, disabling with -Xshare:off entirely
>> >>>> would become the only choice. Could you please create a new RFE
>> >>>> (possibly with higher priority) to investigate the potential effect,
>> >>>> or provide an option for users to opt-in relocation with the
>> >>>> command-line switch?
>> >> I created https://bugs.openjdk.java.net/browse/JDK-8233862
>> >> Investigate performance benefit of relocating CDS archive to under 32G
>> >>
>> >> As I noted in the bug report, I ran benchmarks with CDS relocation
>> >> on/off, and there's no sign of regression when the CDS archive is
>> >> relocated. Please see the bug report for how to configure the VM to do
>> >> the comparison.
>> >>
>> >> As you said before: "When enabling CDS we [google] noticed a small
>> >> runtime overhead in JDK 11 recently with a benchmark. After I backported
>> >> JDK-8213713 to 11, it seemed to reduce the runtime overhead that the
>> >> benchmark was experiencing":
>> >>
>> >> Can you confirm whether this is stock JDK 11 or a special google build?
>> >> Which test case did you use? Is it possible for you to run the tests
>> >> again (using the exact before/after bits that you had when backporting
>> >> JDK-8213713)? Can you check if narrow_klass_base and narrow_klass_shift
>> >> are the same in your before/after builds?
>> > Thanks for creating the RFE.
>> >
>> > JDK-8213713 closes the 1G gap between the shared space and class space
>> > and everything else is unaffected. The compressed class base and shift
>> > were the same for before and after applying JDK-8213713. The effect
>> > was statistically observed for the benchmark since the difference was
>> > very small and could be within noise level for single run comparison.
>> > A small difference could still be important for some use cases so it
>> > needs to be taken into consideration when designing and implementing
>> > new changes.
>>
>> Hi Jiangli,
>>
>> Thanks for taking the time for doing the performance measurements.
>>
>> I also ran benchmarks in all 3 modes (no CDS, CDS without relocation,
>> CDS with relocation), and did not see any significant performance difference with
>> Octane-DeltaBlue, Octane-NavierStokes, SPECjbb2005-Tuned,
>> JFR-SPECjbb2005-Tuned, SPECjvm2008-Serial-G1 and Tools-Javac-Hello.
>>
>>
>> >
>> > A new command-line for archived metadata relocation may still be
>> > valuable. It would also be helpful for debugging and diagnosis.
>> >
>>
>> How about a diagnostic flag ArchiveRelocationMode:
>>
>> 0: (default) first map at preferred address, and if unsuccessful, map to
>> alternative address;
>> 1: always map to alternative address;
>> 2: always map at preferred address, and if unsuccessful, do not map the
>> archive;
>>
>> 1 is for testing relocation, as well as for easy performance measurement
>> (replaces the use of -XX:SharedBaseAddress=0 in my current patch.).
>> 2 is for avoiding potential regression that may be introduced by
>> relocation (revert to JDK 13 behavior).
>>
>> What do you think? If you like this I'll open a CSR.
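
A minimal sketch of the three ArchiveRelocationMode values as described
just above, to make the fallback behavior concrete. Everything here is
hypothetical: map_archive(), MapResult and the two booleans (which simulate
whether a mapping attempt at each address would succeed) are illustrative
names, not the code in the webrev.

```cpp
#include <cassert>

// Outcome of an attempt to map the CDS archive (hypothetical type).
enum class MapResult { MappedAtPreferred, MappedAtAlternative, NotMapped };

// mode 0: try the preferred address, fall back to an alternative address.
// mode 1: always map at an alternative address (exercises relocation).
// mode 2: try the preferred address only; on failure, do not map at all.
MapResult map_archive(int mode, bool preferred_ok, bool alternative_ok) {
  assert(mode >= 0 && mode <= 2 && "unknown relocation mode");

  // Modes 0 and 2 start with the preferred (requested) address.
  if (mode != 1 && preferred_ok) {
    return MapResult::MappedAtPreferred;
  }
  // Modes 0 and 1 are allowed to relocate to an alternative address.
  if (mode != 2 && alternative_ok) {
    return MapResult::MappedAtAlternative;
  }
  return MapResult::NotMapped;
}
```

With these definitions, mode 2 with a failed preferred mapping yields
NotMapped, which is the "revert to JDK 13 behavior" case.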
>
>
>
> That sounds good to me!
>
>
> Hi Jiangli,
>
> It turns out that CSR is not needed for adding a diagnostic flag.
>
> I implemented the flag as described above. See:
>
> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v07-delta/
>
>
> Thanks
> - Ioi
>
>
> Regards,
> Jiangli
>
>>
>> Thanks
>> - Ioi
>>
>>
>>
>> >>> Forgot to say that when Java heap can fit into low 32G space, it takes
>> >>> the class space size into account and leaves the needed space right above
>> >>> (also in low 32G space) when reserving heap, for !UseSharedSpace. In
>> >>> that case, it's more likely the class data and heap data can be
>> >>> colocated successfully.
>> >> The reason is not for "colocation". It's so that narrow_klass_base can
>> >> be zero, and the klass pointer can be uncompressed with a shift (without
>> >> also doing an addition).
>> >>
>> >> But with CDS enabled, we always hard code to use non-zero
>> >> narrow_klass_base and 3 bit shift (for AOT). So by just relocating the
>> >> CDS archive to under 32GB, without modifying how CDS handles
>> >> narrow_klass_base/shift, I don't think we can expect any benefit.
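
To illustrate the base/shift point above, here is a minimal sketch of
compressed klass pointer arithmetic. The function names are hypothetical
and the addresses are plain integers; the only thing taken from the thread
is the scheme itself (address = base + (narrow << shift)) and the example
base 0x0000000800000000 with shift 3.

```cpp
#include <cstdint>

// Encode a 64-bit address as a 32-bit narrow value relative to base.
// Assumes addr >= base, addr is aligned to (1 << shift), and the
// result fits in 32 bits.
inline std::uint32_t encode_klass(std::uint64_t addr, std::uint64_t base,
                                  int shift) {
  return static_cast<std::uint32_t>((addr - base) >> shift);
}

// Decode with a non-zero base: one shift plus one addition.
inline std::uint64_t decode_klass(std::uint32_t narrow, std::uint64_t base,
                                  int shift) {
  return base + (static_cast<std::uint64_t>(narrow) << shift);
}

// Decode when the base is zero: the addition disappears, only the
// shift remains. This requires all klass addresses to lie below
// (1ULL << (32 + shift)), i.e. below 32GB for shift == 3.
inline std::uint64_t decode_klass_zero_base(std::uint32_t narrow, int shift) {
  return static_cast<std::uint64_t>(narrow) << shift;
}
```

Relocating the archive alone does not change which of the two decode
forms is in use; that is decided by how the base and shift are chosen.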
>> > I experimented with mapping the shared space in low 32G and placed
>> > right above the Java heap. The class space was also allocated in the
>> > low 32G space and after the mapped shared space in the experiment. The
>> > compressed class encoding was using 0 base and 3 shift, which was the
>> > same as the encoding when CDS was disabled. I didn't observe runtime
>> > performance difference when comparing that specific configuration with
>> > the normal CDS mapping scheme (the shared space start at 32G and the
>> > encoding is non-zero base and 3 shift).
>> >
>> > Thanks,
>> > Jiangli
>> >> For modern architectures, I am not aware of any inherent speed benefit
>> >> simply by putting data (in our case much larger than a page) "close to
>> >> each other" in the virtual address space. If you have any reference of
>> >> that, please let me know.
>> >>
>> >> Thanks
>> >> - Ioi
>> >>
>> >>> Thanks,
>> >>> Jiangli
>> >>>
>> >>>> Regards,
>> >>>> Jiangli
>> >>>>
>> >>>> On Thu, Nov 7, 2019 at 4:22 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>> >>>>> Hi Coleen,
>> >>>>>
>> >>>>> Thanks for the review. Here's a webrev that has incorporated your
>> >>>>> suggestions:
>> >>>>>
>> >>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v06-delta/
>> >>>>>
>> >>>>> Please see comments in-line
>> >>>>>
>> >>>>> On 11/7/19 2:46 PM, coleen.phillimore at oracle.com wrote:
>> >>>>>> Hi, I've done a more high level code review of this and it looks good!
>> >>>>>>
>> >>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/src/hotspot/share/memory/archiveUtils.hpp.html
>> >>>>>>
>> >>>>>>
>> >>>>>> I think these classes require comments on what they do and why. The
>> >>>>>> comments you sent me offline look good.
>> >>>>> I added more comments for ArchivePtrMarker::_compacted per your offline
>> >>>>> request.
>> >>>>>
>> >>>>>> Also .hpp files shouldn't include .inline.hpp files, like
>> >>>>>> bitMap.inline.hpp.  Hopefully it's just a case of moving do_bit() into
>> >>>>>> the cpp file.
>> >>>>> I moved the do_bit() function into archiveUtils.inline.hpp, since it is
>> >>>>> used by 3 .cpp files, and performance is important.
>> >>>>>
>> >>>>>> I wonder if the exception list of classes to exclude should be a
>> >>>>>> function in javaClasses.hpp/cpp where the explanation would make more
>> >>>>>> sense?  ie bool
>> >>>>>> JavaClasses::has_injected_native_pointers(InstanceKlass* k);
>> >>>>> I moved the checking code to javaClasses.cpp. Since we do (partially)
>> >>>>> support java.lang.Class, which has injected native pointers, I named the
>> >>>>> function as JavaClasses::is_supported_for_archiving instead. I also
>> >>>>> massaged the comments a little for clarification.
>> >>>>>
>> >>>>>> Is there already an RFE to move the DumpSharedSpaces output from
>> >>>>>> tty->print() to log_info() ?
>> >>>>> I created https://bugs.openjdk.java.net/browse/JDK-8233826 (Change CDS
>> >>>>> dumping tty->print_cr() to unified logging).
>> >>>>>
>> >>>>> Thanks
>> >>>>> - Ioi
>> >>>>>
>> >>>>>> Thanks,
>> >>>>>> Coleen
>> >>>>>>
>> >>>>>> On 11/6/19 4:17 PM, Ioi Lam wrote:
>> >>>>>>> Hi Jiangli,
>> >>>>>>>
>> >>>>>>> I've uploaded the webrev after integrating your comments:
>> >>>>>>>
>> >>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/
>> >>>>>>>
>> >>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05-delta/
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Please see more replies below:
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On 11/4/19 5:52 PM, Jiangli Zhou wrote:
>> >>>>>>>> On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam <ioi.lam at oracle.com
>> >>>>>>>> <mailto:ioi.lam at oracle.com>> wrote:
>> >>>>>>>>
>> >>>>>>>>       Hi Jiangli,
>> >>>>>>>>
>> >>>>>>>>       Thank you so much for spending time reviewing this RFE!
>> >>>>>>>>
>> >>>>>>>>       On 11/3/19 6:34 PM, Jiangli Zhou wrote:
>> >>>>>>>>       > Hi Ioi,
>> >>>>>>>>       >
>> >>>>>>>>       > Sorry for the delay again. Will try to put this on the top of my
>> >>>>>>>>       list
>> >>>>>>>>       > next week and reduce the turn-around time. The updates look
>> >>>>>>>> good in
>> >>>>>>>>       > general.
>> >>>>>>>>       >
>> >>>>>>>>       > We might want to have a better strategy when choosing metadata
>> >>>>>>>>       > relocation address (when relocation is needed). Some
>> >>>>>>>>       > applications/benchmarks may be more sensitive to cache
>> >>>>>>>> locality and
>> >>>>>>>>       > memory/data layout. There was a bug,
>> >>>>>>>>       > https://bugs.openjdk.java.net/browse/JDK-8213713 that caused
>> >>>>>>>> 1G gap
>> >>>>>>>>       > between Java heap data and metadata before JDK 12. The gap
>> >>>>>>>> seemed to
>> >>>>>>>>       > cause a small but noticeable runtime effect in one case that I
>> >>>>>>>> came
>> >>>>>>>>       > across.
>> >>>>>>>>
>> >>>>>>>>       I guess you're saying we should try to relocate the archive into
>> >>>>>>>>       somewhere under 32GB?
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> I don't yet have sufficient data that determines if mapping at low
>> >>>>>>>> 32G produces better runtime performance. I experimented with that,
>> >>>>>>>> but didn't see noticeable difference when comparing to mapping at
>> >>>>>>>> the current default address. It doesn't hurt, I think. So it may be
>> >>>>>>>> a better choice than relocating to a random address in high 32G
>> >>>>>>>> space (when Java heap is in low 32G address space).
>> >>>>>>> Maybe we should reconsider this when we have more concrete data for
>> >>>>>>> the benefits of moving the compressed class space to under 32G.
>> >>>>>>>
>> >>>>>>> Please note that in metaspace.cpp, when CDS is disabled and  the VM
>> >>>>>>> fails to allocate the class space at the requested address
>> >>>>>>> (0x7c000000 for 16GB heap), it also just allocates from a random
>> >>>>>>> address (without trying to search under 32GB):
>> >>>>>>>
>> >>>>>>> http://hg.openjdk.java.net/jdk/jdk/annotate/e767fa6a1d45/src/hotspot/share/memory/metaspace.cpp#l1128
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> This code has been there since 2013 and we have not seen any issues.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>>       Could you elaborate more about the performance issue, especially
>> >>>>>>>>       about
>> >>>>>>>>       cache locality? I looked at JDK-8213713 but it didn't mention
>> >>>>>>>>       performance.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> When enabling CDS we noticed a small runtime overhead in JDK 11
>> >>>>>>>> recently with a benchmark. After I backported JDK-8213713 to 11, it
>> >>>>>>>> seemed to reduce the runtime overhead that the benchmark was
>> >>>>>>>> experiencing.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>       Also, by default, we have non-zero narrow_klass_base and
>> >>>>>>>>       narrow_klass_shift = 3, and archive relocation doesn't change that:
>> >>>>>>>>
>> >>>>>>>>       $ java -Xlog:cds=debug -version
>> >>>>>>>>       ... narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
>> >>>>>>>>       $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
>> >>>>>>>>       ... narrow_klass_base = 0x00007f1e8b499000, narrow_klass_shift = 3
>> >>>>>>>>
>> >>>>>>>>       We always use narrow_klass_shift due to this:
>> >>>>>>>>
>> >>>>>>>>          // CDS uses LogKlassAlignmentInBytes for narrow_klass_shift. See
>> >>>>>>>>          //
>> >>>>>>>> MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
>> >>>>>>>>          // how dump time narrow_klass_shift is set. Although, CDS can
>> >>>>>>>> work
>> >>>>>>>>          // with zero-shift mode also, to be consistent with AOT it uses
>> >>>>>>>>          // LogKlassAlignmentInBytes for klass shift so archived java
>> >>>>>>>>       heap objects
>> >>>>>>>>          // can be used at same time as AOT code.
>> >>>>>>>>          if (!UseSharedSpaces
>> >>>>>>>>              && (uint64_t)(higher_address - lower_base) <=
>> >>>>>>>>       UnscaledClassSpaceMax) {
>> >>>>>>>>            CompressedKlassPointers::set_shift(0);
>> >>>>>>>>          } else {
>> >>>>>>>> CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
>> >>>>>>>>          }
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Right. If we relocate to low 32G space, it needs to make sure that
>> >>>>>>>> the range containing the mapped class data and class space must be
>> >>>>>>>> encodable.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>       > Here are some additional comments (minor).
>> >>>>>>>>       >
>> >>>>>>>>       > Could you please fix the long lines in the following?
>> >>>>>>>>       >
>> >>>>>>>>       > 1237 void
>> >>>>>>>> java_lang_Class::update_archived_primitive_mirror_native_pointers(oop
>> >>>>>>>>       > archived_mirror) {
>> >>>>>>>>       > 1238   if (MetaspaceShared::relocation_delta() != 0) {
>> >>>>>>>>       > 1239  assert(archived_mirror->metadata_field(_klass_offset) ==
>> >>>>>>>>       > NULL, "must be for primitive class");
>> >>>>>>>>       > 1240
>> >>>>>>>>       > 1241     Klass* ak =
>> >>>>>>>>       > ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
>> >>>>>>>>       > 1242     if (ak != NULL) {
>> >>>>>>>>       > 1243  archived_mirror->metadata_field_put(_array_klass_offset,
>> >>>>>>>>       > (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
>> >>>>>>>>       > 1244     }
>> >>>>>>>>       > 1245   }
>> >>>>>>>>       > 1246 }
>> >>>>>>>>       >
>> >>>>>>>>       > src/hotspot/share/memory/dynamicArchive.cpp
>> >>>>>>>>       >
>> >>>>>>>>       >   889   Thread* THREAD = Thread::current();
>> >>>>>>>>       >   890   Method::sort_methods(ik->methods(), /*set_idnums=*/true,
>> >>>>>>>>       > dynamic_dump_method_comparator);
>> >>>>>>>>       >   891   if (ik->default_methods() != NULL) {
>> >>>>>>>>       >   892  Method::sort_methods(ik->default_methods(),
>> >>>>>>>>       > /*set_idnums=*/false, dynamic_dump_method_comparator);
>> >>>>>>>>       >   893   }
>> >>>>>>>>       >
>> >>>>>>>>
>> >>>>>>>>       OK will do.
>> >>>>>>>>
>> >>>>>>>>       > Please see inlined comments below.
>> >>>>>>>>       >
>> >>>>>>>>       > On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com
>> >>>>>>>>       <mailto:ioi.lam at oracle.com>> wrote:
>> >>>>>>>>       >> Hi Jiangli,
>> >>>>>>>>       >>
>> >>>>>>>>       >> Thanks for the review. I've updated the patch according to your
>> >>>>>>>>       comments:
>> >>>>>>>>       >>
>> >>>>>>>>       >>
>> >>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
>> >>>>>>>>
>> >>>>>>>>       >>
>> >>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
>> >>>>>>>>
>> >>>>>>>>       >>
>> >>>>>>>>       >> (the delta is on top of 8231610-relocate-cds-archive.v03.delta
>> >>>>>>>>       in my
>> >>>>>>>>       >> reply to Calvin's comments).
>> >>>>>>>>       >>
>> >>>>>>>>       >> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
>> >>>>>>>>       >>> Hi Ioi,
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> Sorry for the delay. Here are my remaining comments.
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> - src/hotspot/share/memory/dynamicArchive.cpp
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> 128   static intx _method_comparator_name_delta;
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> The name of the above variable is confusing. It's the value of
>> >>>>>>>>       >>> _buffer_to_target_delta. It's better to use _buffer_to_target_delta
>> >>>>>>>>       >>> directly.
>> >>>>>>>>       >> _buffer_to_target_delta is a non-static field, but
>> >>>>>>>>       >> dynamic_dump_method_comparator() must be a static function so
>> >>>>>>>>       it can't
>> >>>>>>>>       >> use the non-static field easily.
>> >>>>>>>>       >
>> >>>>>>>>       > It sounds like an issue. _buffer_to_target_delta was made as a
>> >>>>>>>>       > non-static mostly because we might support more than one dynamic
>> >>>>>>>>       > archives in the future. However, today's usages bake in an
>> >>>>>>>>       assumption
>> >>>>>>>>       > that _buffer_to_target_delta is a singleton value. It is
>> >>>>>>>> cleaner to
>> >>>>>>>>       > either make _buffer_to_target_delta as a static variable for
>> >>>>>>>> now, or
>> >>>>>>>>       > adding an access API in DynamicArchiveBuilder to allow other
>> >>>>>>>> code to
>> >>>>>>>>       > properly and correctly use the value.
>> >>>>>>>>
>> >>>>>>>>       OK, I'll move it to a static variable.
>> >>>>>>>>
>> >>>>>>>>       >
>> >>>>>>>>       >>> Also, we can do a quick pointer comparison of 'a_name' and
>> >>>>>>>>       >>> 'b_name' first before adjusting the pointers.
>> >>>>>>>>       >> I added this:
>> >>>>>>>>       >>
>> >>>>>>>>       >>       if (a_name == b_name) {
>> >>>>>>>>       >>         return 0;
>> >>>>>>>>       >>       }
>> >>>>>>>>       >>
>> >>>>>>>>       >>> ---
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> 934 void DynamicArchiveBuilder::relocate_buffer_to_target() {
>> >>>>>>>>       >>> ...
>> >>>>>>>>       >>>    944
>> >>>>>>>>       >>>    945  ArchivePtrMarker::compact(relocatable_base,
>> >>>>>>>>       relocatable_end);
>> >>>>>>>>       >>> ...
>> >>>>>>>>       >>>
>> >>>>>>>>       >>>    974     SharedDataRelocator patcher((address*)patch_base,
>> >>>>>>>>       >>> (address*)patch_end, valid_old_base, valid_old_end,
>> >>>>>>>>       >>>    975  valid_new_base, valid_new_end, addr_delta);
>> >>>>>>>>       >>>    976  ArchivePtrMarker::ptrmap()->iterate(&patcher);
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> Could we reduce the number of data re-iterations to help
>> >>>>>>>> archive
>> >>>>>>>>       >>> dumping performance. The ArchivePtrMarker::compact operation
>> >>>>>>>>       can be
>> >>>>>>>>       >>> combined with the patching iteration.
>> >>>>>>>>       ArchivePtrMarker::compact API
>> >>>>>>>>       >>> can be removed.
>> >>>>>>>>       >> That's a good idea. I implemented it using a template parameter
>> >>>>>>>>       so that
>> >>>>>>>>       >> we can have max performance when relocating the archive at run
>> >>>>>>>>       time.
>> >>>>>>>>       >>
>> >>>>>>>>       >> I added comments to explain why the relocation is done here. The
>> >>>>>>>>       >> relocation is pretty rare (only when the base archive was not
>> >>>>>>>>       mapped at
>> >>>>>>>>       >> the default location).
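
A minimal sketch of what folding the compact step into the patching pass
could look like. This is only an illustration under stated assumptions: the
buffer is modeled as a vector of word-sized slots, the pointer bitmap as a
vector<bool>, and "compaction" as finding the index just past the last
marked, non-null slot. relocate_and_measure() is a hypothetical name and
is not the SharedDataRelocator or ArchivePtrMarker code from the webrev.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Walk the pointer bitmap once. For every marked slot holding a non-null
// pointer, add delta to it. Returns the number of leading bitmap entries
// that are still needed (index of the last patched slot plus one), which a
// caller could use to shrink the bitmap without a second iteration.
std::size_t relocate_and_measure(std::vector<std::uintptr_t>& slots,
                                 const std::vector<bool>& ptrmap,
                                 std::intptr_t delta) {
  std::size_t used = 0;
  const std::size_t n =
      slots.size() < ptrmap.size() ? slots.size() : ptrmap.size();
  for (std::size_t i = 0; i < n; ++i) {
    if (!ptrmap[i] || slots[i] == 0) {
      continue;  // not a pointer, or a null pointer: nothing to patch
    }
    // Unsigned arithmetic wraps, so a negative delta also works.
    slots[i] += static_cast<std::uintptr_t>(delta);
    used = i + 1;
  }
  return used;
}
```

The design point is simply that both jobs visit the same marked slots, so
one pass can do the patching and collect the information compaction needs.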
>> >>>>>>>>       >>
>> >>>>>>>>       >>> ---
>> >>>>>>>>       >>>
>> >>>>>>>>       >>>    967     address valid_new_base =
>> >>>>>>>>       >>> (address)Arguments::default_SharedBaseAddress();
>> >>>>>>>>       >>>    968     address valid_new_end  = valid_new_base +
>> >>>>>>>>       base_plus_top_size;
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> The debugging only code can be included under #ifdef ASSERT.
>> >>>>>>>>       >> These values are actually also used in debug logging so they
>> >>>>>>>>       can't be
>> >>>>>>>>       >> ifdef'ed out.
>> >>>>>>>>       >>
>> >>>>>>>>       >> Also, the c++ compiler is pretty good with eliding code
>> >>>>>>>> that's not
>> >>>>>>>>       >> actually used. If I comment out all the logging code in
>> >>>>>>>>       >> DynamicArchiveBuilder::relocate_buffer_to_target() and
>> >>>>>>>>       >> SharedDataRelocator, gcc elides all the unused fields and their
>> >>>>>>>>       >> assignments. So no code is generated for this, etc.
>> >>>>>>>>       >>
>> >>>>>>>>       >>       address valid_new_base =
>> >>>>>>>>       >> (address)Arguments::default_SharedBaseAddress();
>> >>>>>>>>       >>
>> >>>>>>>>       >> Since #ifdef ASSERT makes the code harder to read, I think we
>> >>>>>>>>       should use
>> >>>>>>>>       >> it only when really necessary.
>> >>>>>>>>       > It seems cleaner to get rid of these debugging only variables, by
>> >>>>>>>>       > using 'relocatable_base' and
>> >>>>>>>>       > '(address)Arguments::default_SharedBaseAddress()' in the logging
>> >>>>>>>>       code.
>> >>>>>>>>
>> >>>>>>>>       SharedDataRelocator is used under 3 different situations. These six
>> >>>>>>>>       variables (patch_base, patch_end, valid_old_base, valid_old_end,
>> >>>>>>>>       valid_new_base, valid_new_end) describe what is being patched,
>> >>>>>>>>       and what
>> >>>>>>>>       the expectations are, for each situation. The code will be hard to
>> >>>>>>>>       understand without them.
>> >>>>>>>>
>> >>>>>>>>       Please note there's also logging code in the SharedDataRelocator
>> >>>>>>>>       constructor that prints out these values.
>> >>>>>>>>
>> >>>>>>>>       I think I'll just remove the 'debug only' comment to avoid
>> >>>>>>>> confusion.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Ok.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>       >
>> >>>>>>>>       >>> ---
>> >>>>>>>>       >>>
>> >>>>>>>>       >>>    993
>> >>>>>>>>    dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> We could combine the archived heap data bitmap into the new
>> >>>>>>>>       region as
>> >>>>>>>>       >>> well? It can be handled as a separate RFE.
>> >>>>>>>>       >> I've filed https://bugs.openjdk.java.net/browse/JDK-8233093
>> >>>>>>>>       >>
>> >>>>>>>>       >>> - src/hotspot/share/memory/filemap.cpp
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> 1038     if (is_static()) {
>> >>>>>>>>       >>> 1039       if (errno == ENOENT) {
>> >>>>>>>>       >>> 1040         // Not locating the shared archive is ok.
>> >>>>>>>>       >>> 1041         fail_continue("Specified shared archive not found
>> >>>>>>>>       (%s).",
>> >>>>>>>>       >>> _full_path);
>> >>>>>>>>       >>> 1042       } else {
>> >>>>>>>>       >>> 1043         fail_continue("Failed to open shared archive file
>> >>>>>>>>       (%s).",
>> >>>>>>>>       >>> 1044  os::strerror(errno));
>> >>>>>>>>       >>> 1045       }
>> >>>>>>>>       >>> 1046     } else {
>> >>>>>>>>       >>> 1047       log_warning(cds, dynamic)("specified dynamic archive
>> >>>>>>>>       >>> doesn't exist: %s", _full_path);
>> >>>>>>>>       >>> 1048     }
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> If the top layer is explicitly specified by the user, a
>> >>>>>>>>       warning does
>> >>>>>>>>       >>> not seem to be a proper behavior if the VM fails to open the
>> >>>>>>>>       archive
>> >>>>>>>>       >>> file.
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> It might be better to handle the relocation unrelated code in
>> >>>>>>>>       separate
>> >>>>>>>>       >>> changeset and track with a separate RFE.
>> >>>>>>>>       >> This code was moved from
>> >>>>>>>>       >>
>> >>>>>>>>       >>
>> >>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
>> >>>>>>>>
>> >>>>>>>>       >>
>> >>>>>>>>       >> so I am not changing the behavior. If you want, we can file an
>> >>>>>>>>       RFE to
>> >>>>>>>>       >> change the behavior.
>> >>>>>>>>       > Ok. A new RFE sounds like the right thing to re-evaluate the
>> >>>>>>>> usage
>> >>>>>>>>       > issue here. Thanks.
>> >>>>>>>>
>> >>>>>>>>       I created https://bugs.openjdk.java.net/browse/JDK-8233446
>> >>>>>>>>
>> >>>>>>>>       >>> ---
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> 1148 void FileMapInfo::write_region(int region, char* base,
>> >>>>>>>>       size_t size,
>> >>>>>>>>       >>> 1149                                bool read_only, bool
>> >>>>>>>>       allow_exec) {
>> >>>>>>>>       >>> ...
>> >>>>>>>>       >>> 1154
>> >>>>>>>>       >>> 1155   if (region == MetaspaceShared::bm) {
>> >>>>>>>>       >>> 1156     target_base = NULL;
>> >>>>>>>>       >>> 1157   } else if (DynamicDumpSharedSpaces) {
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> It's not too clear to me how the bitmap (bm) region is handled
>> >>>>>>>>       for the
>> >>>>>>>>       >>> base layer and top layer. Could you please explain?
>> >>>>>>>>       >> The bm regions for both layers are mapped at an address picked
>> >>>>>>>>       by the OS:
>> >>>>>>>>       >>
>> >>>>>>>>       >> char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
>> >>>>>>>>       >>     FileMapRegion* si = space_at(MetaspaceShared::bm);
>> >>>>>>>>       >>     bitmap_size = si->used_aligned();
>> >>>>>>>>       >>     bool read_only = true, allow_exec = false;
>> >>>>>>>>       >>     char* requested_addr = NULL; // allow OS to pick any
>> >>>>>>>> location
>> >>>>>>>>       >>     char* bitmap_base = os::map_memory(_fd, _full_path,
>> >>>>>>>>       si->file_offset(),
>> >>>>>>>>       >> requested_addr, bitmap_size,
>> >>>>>>>>       >> read_only, allow_exec);
>> >>>>>>>>       >>
>> >>>>>>>>       > Ok, after staring at the code for a few seconds I saw that's
>> >>>>>>>>       intended.
>> >>>>>>>>       > If the current region is 'bm', then the 'target_base' is NULL
>> >>>>>>>>       > regardless if it's static or dynamic archive. Otherwise, the
>> >>>>>>>>       > 'target_base' is handled differently for the static and dynamic
>> >>>>>>>>       case.
>> >>>>>>>>       > The following would be cleaner and has better reliability.
>> >>>>>>>>       >
>> >>>>>>>>       >     char* target_base = NULL;
>> >>>>>>>>       >
>> >>>>>>>>       >     // The target_base is NULL for 'bm' region.
>> >>>>>>>>       >     if (region != MetaspaceShared::bm) {
>> >>>>>>>>       >       if (DynamicDumpSharedSpaces) {
>> >>>>>>>>       >         assert(!HeapShared::is_heap_region(region), "dynamic
>> >>>>>>>> archive
>> >>>>>>>>       > doesn't support heap regions");
>> >>>>>>>>       >         target_base = DynamicArchive::buffer_to_target(base);
>> >>>>>>>>       >       } else {
>> >>>>>>>>       >         target_base = base;
>> >>>>>>>>       >       }
>> >>>>>>>>       >    }
>> >>>>>>>>
>> >>>>>>>>       How about this?
>> >>>>>>>>
>> >>>>>>>>          char* target_base;
>> >>>>>>>>          if (region == MetaspaceShared::bm) {
>> >>>>>>>>            target_base = NULL; // always NULL for bm region.
>> >>>>>>>>          } else {
>> >>>>>>>>            if (DynamicDumpSharedSpaces) {
>> >>>>>>>>                assert(!HeapShared::is_heap_region(region), "dynamic
>> >>>>>>>> archive
>> >>>>>>>>       doesn't support heap regions");
>> >>>>>>>>                target_base = DynamicArchive::buffer_to_target(base);
>> >>>>>>>>            } else {
>> >>>>>>>>                target_base = base;
>> >>>>>>>>            }
>> >>>>>>>>          }
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> No objection if you prefer the extra 'else' block.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>       >
>> >>>>>>>>       >>> ---
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> 1362
>> >>>>>>>>    DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>> >>>>>>>>
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> Could you please explain the above?
>> >>>>>>>>       >> I added the comments
>> >>>>>>>>       >>
>> >>>>>>>>       >>     // Make sure we don't attempt to use
>> >>>>>>>>       header()->mapped_base_address()
>> >>>>>>>>       >> unless
>> >>>>>>>>       >>     // it's been successfully mapped.
>> >>>>>>>>       >>
>> >>>>>>>> DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>> >>>>>>>>
>> >>>>>>>>       >>
>> >>>>>>>>       >>> ---
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> 1359   FileMapRegion* last_region = NULL;
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> 1371     if (last_region != NULL) {
>> >>>>>>>>       >>> 1372       // Ensure that the OS won't be able to allocate new
>> >>>>>>>>       memory
>> >>>>>>>>       >>> spaces between any mapped
>> >>>>>>>>       >>> 1373       // regions, or else it would mess up the simple
>> >>>>>>>>       comparision
>> >>>>>>>>       >>> in MetaspaceObj::is_shared().
>> >>>>>>>>       >>> 1374       assert(si->mapped_base() ==
>> >>>>>>>> last_region->mapped_end(),
>> >>>>>>>>       >>> "must have no gaps");
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> 1379     last_region = si;
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> Can you please place 'last_region' related code under #ifdef
>> >>>>>>>>       ASSERT?
>> >>>>>>>>       >> I think that will make the code more cluttered. The compiler
>> >>>>>>>> will
>> >>>>>>>>       >> optimize that away.
>> >>>>>>>>       > It's cleaner to define a debugging-only variable only in debug
>> >>>>>>>>       > builds. You can wrap it and its related usage in DEBUG_ONLY.
>> >>>>>>>>
>> >>>>>>>>       OK, will do.
>> >>>>>>>>
>> >>>>>>>>       >
>> >>>>>>>>       >>> ---
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> 1478 char* FileMapInfo::map_relocation_bitmap(size_t&
>> >>>>>>>>       bitmap_size) {
>> >>>>>>>>       >>> 1479   FileMapRegion* si = space_at(MetaspaceShared::bm);
>> >>>>>>>>       >>> 1480   bitmap_size = si->used_aligned();
>> >>>>>>>>       >>> 1481   bool read_only = true, allow_exec = false;
>> >>>>>>>>       >>> 1482   char* requested_addr = NULL; // allow OS to pick any
>> >>>>>>>>       location
>> >>>>>>>>       >>> 1483   char* bitmap_base = os::map_memory(_fd, _full_path,
>> >>>>>>>>       si->file_offset(),
>> >>>>>>>>       >>> 1484 requested_addr, bitmap_size,
>> >>>>>>>>       >>> read_only, allow_exec);
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> We need to handle mapping failure here.
>> >>>>>>>>       >> It's handled here:
>> >>>>>>>>       >>
>> >>>>>>>>       >> bool FileMapInfo::relocate_pointers(intx addr_delta) {
>> >>>>>>>>       >>     log_debug(cds, reloc)("runtime archive relocation start");
>> >>>>>>>>       >>     size_t bitmap_size;
>> >>>>>>>>       >>     char* bitmap_base = map_relocation_bitmap(bitmap_size);
>> >>>>>>>>       >>     if (bitmap_base != NULL) {
>> >>>>>>>>       >>     ...
>> >>>>>>>>       >>     } else {
>> >>>>>>>>       >>       log_error(cds)("failed to map relocation bitmap");
>> >>>>>>>>       >>       return false;
>> >>>>>>>>       >>     }
>> >>>>>>>>       >>
>> >>>>>>>>       > 'bitmap_base' is used immediately after map_memory(). So the
>> >>>>>>>> check
>> >>>>>>>>       > needs to be done immediately after map_memory(), but not in the
>> >>>>>>>>       caller
>> >>>>>>>>       > of map_relocation_bitmap().
>> >>>>>>>>       >
>> >>>>>>>>       > 1490   char* bitmap_base = os::map_memory(_fd, _full_path,
>> >>>>>>>>       si->file_offset(),
>> >>>>>>>>       > 1491 requested_addr, bitmap_size,
>> >>>>>>>>       > read_only, allow_exec);
>> >>>>>>>>       > 1492
>> >>>>>>>>       > 1493   if (VerifySharedSpaces && bitmap_base != NULL &&
>> >>>>>>>>       > !region_crc_check(bitmap_base, bitmap_size, si->crc())) {
>> >>>>>>>>
>> >>>>>>>>       OK, I'll fix that.
>> >>>>>>>>
>> >>>>>>>>       >
>> >>>>>>>>       >
>> >>>>>>>>       >>> ---
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> 1513     // debug only -- the current value of the pointers
>> >>>>>>>> to be
>> >>>>>>>>       >>> patched must be within this
>> >>>>>>>>       >>> 1514     // range (i.e., must be between the requesed base
>> >>>>>>>>       address,
>> >>>>>>>>       >>> and the of the current archive).
>> >>>>>>>>       >>> 1515     // Note: top archive may point to objects in the base
>> >>>>>>>>       >>> archive, but not the other way around.
>> >>>>>>>>       >>> 1516     address valid_old_base =
>> >>>>>>>>       (address)header()->requested_base_address();
>> >>>>>>>>       >>> 1517     address valid_old_end  = valid_old_base +
>> >>>>>>>>       mapping_end_offset();
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> Please place all FileMapInfo::relocate_pointers debugging only
>> >>>>>>>>       code
>> >>>>>>>>       >>> under #ifdef ASSERT.
>> >>>>>>>>       >> Ditto about ifdef ASSERT
>> >>>>>>>>       >>
>> >>>>>>>>       >>> - src/hotspot/share/memory/heapShared.cpp
>> >>>>>>>>       >>>
>> >>>>>>>>       >>>    441 void
>> >>>>>>>>       HeapShared::initialize_from_archived_subgraph(Klass* k) {
>> >>>>>>>>       >>>    442   if (!open_archive_heap_region_mapped() ||
>> >>>>>>>>       !MetaspaceObj::is_shared(k)) {
>> >>>>>>>>       >>>    443     return; // nothing to do
>> >>>>>>>>       >>>    444   }
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> When do we call HeapShared::initialize_from_archived_subgraph
>> >>>>>>>>       for a
>> >>>>>>>>       >>> klass that's not shared?
>> >>>>>>>>       >> I've removed the !MetaspaceObj::is_shared(k). I probably added
>> >>>>>>>>       that for
>> >>>>>>>>       >> debugging purposes only.
>> >>>>>>>>       >>
>> >>>>>>>>       >>>    616   DEBUG_ONLY({
>> >>>>>>>>       >>>    617       Klass* klass = orig_obj->klass();
>> >>>>>>>>       >>>    618       assert(klass !=
>> >>>>>>>> SystemDictionary::Module_klass() &&
>> >>>>>>>>       >>>    619              klass !=
>> >>>>>>>>       SystemDictionary::ResolvedMethodName_klass() &&
>> >>>>>>>>       >>>    620              klass !=
>> >>>>>>>>       SystemDictionary::MemberName_klass() &&
>> >>>>>>>>       >>>    621              klass !=
>> >>>>>>>> SystemDictionary::Context_klass() &&
>> >>>>>>>>       >>>    622              klass !=
>> >>>>>>>>       SystemDictionary::ClassLoader_klass(), "we
>> >>>>>>>>       >>> can only relocate metaspace object pointers inside
>> >>>>>>>> java_lang_Class
>> >>>>>>>>       >>> instances");
>> >>>>>>>>       >>>    623     });
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> Let's leave the above for a separate RFE. I think assert is not
>> >>>>>>>>       >>> sufficient for the check. Also, why ResolvedMethodName,
>> >>>>>>>> Module and
>> >>>>>>>>       >>> MemberName cannot be part of the graph?
>> >>>>>>>>       >>>
>> >>>>>>>>       >>>
>> >>>>>>>>       >> I added the following comment:
>> >>>>>>>>       >>
>> >>>>>>>>       >>     DEBUG_ONLY({
>> >>>>>>>>       >>         // The following are classes in
>> >>>>>>>>       share/classfile/javaClasses.cpp
>> >>>>>>>>       >> that have injected native pointers
>> >>>>>>>>       >>         // to metaspace objects. To support these classes, we
>> >>>>>>>>       need to add
>> >>>>>>>>       >> relocation code similar to
>> >>>>>>>>       >>         //
>> >>>>>>>> java_lang_Class::update_archived_mirror_native_pointers.
>> >>>>>>>>       >>         Klass* klass = orig_obj->klass();
>> >>>>>>>>       >>         assert(klass != SystemDictionary::Module_klass() &&
>> >>>>>>>>       >>                klass !=
>> >>>>>>>>       SystemDictionary::ResolvedMethodName_klass() &&
>> >>>>>>>>       >>
>> >>>>>>>>       > It's too restrictive to exclude those objects from the archived
>> >>>>>>>>       > object graph because of metadata relocation, since metadata
>> >>>>>>>>       > relocation is rare.
>> >>>>>>>>       > The trade-off doesn't seem to buy us much.
>> >>>>>>>>       >
>> >>>>>>>>       > Do you plan to add the needed relocation code?
>> >>>>>>>>
>> >>>>>>>>       I looked more into this. Actually we cannot handle these 5
>> >>>>>>>> classes at
>> >>>>>>>>       all, even without archive relocation:
>> >>>>>>>>
>> >>>>>>>>       [1] #define MODULE_INJECTED_FIELDS(macro) \
>> >>>>>>>>          macro(java_lang_Module, module_entry, intptr_signature, false)
>> >>>>>>>>
>> >>>>>>>>       ->  module_entry is malloc'ed
>> >>>>>>>>
>> >>>>>>>>       [2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
>> >>>>>>>>          macro(java_lang_invoke_ResolvedMethodName, vmholder,
>> >>>>>>>>       object_signature, false) \
>> >>>>>>>>          macro(java_lang_invoke_ResolvedMethodName, vmtarget,
>> >>>>>>>>       intptr_signature, false)
>> >>>>>>>>
>> >>>>>>>>       -> these fields are related to method handles and lambda forms,
>> >>>>>>>> etc.
>> >>>>>>>>       They can't easily be archived without implementing lambda form
>> >>>>>>>>       archiving. (I did a prototype; it's very complex and fragile).
>> >>>>>>>>
>> >>>>>>>>       [3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
>> >>>>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
>> >>>>>>>>       vmdependencies, intptr_signature, false) \
>> >>>>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
>> >>>>>>>>       last_cleanup, long_signature, false)
>> >>>>>>>>
>> >>>>>>>>       -> vmdependencies is malloc'ed.
>> >>>>>>>>
>> >>>>>>>>       [4] #define
>> >>>>>>>> MEMBERNAME_INJECTED_FIELDS(macro) \
>> >>>>>>>>          macro(java_lang_invoke_MemberName, vmindex, intptr_signature,
>> >>>>>>>>       false)
>> >>>>>>>>
>> >>>>>>>>       -> this one is probably OK. Despite being declared as
>> >>>>>>>>       'intptr_signature', it seems to be used just as an integer.
>> >>>>>>>> However,
>> >>>>>>>>       MemberNames are typically used with [2] and [3]. So let's just
>> >>>>>>>>       forbid it
>> >>>>>>>>       to be safe.
>> >>>>>>>>
>> >>>>>>>>       [2] [3] [4] are not used directly by regular Java code and are
>> >>>>>>>>       unlikely
>> >>>>>>>>       to be referenced (directly or indirectly) by static fields (except
>> >>>>>>>>       for
>> >>>>>>>>       the static fields in the classes in java.lang.invoke, which we
>> >>>>>>>>       probably
>> >>>>>>>>       won't support for heap archiving due to the problem I described for
>> >>>>>>>>       [2]). Objects of these types are typically referenced via constant
>> >>>>>>>>       pool
>> >>>>>>>>       entries.
>> >>>>>>>>
>> >>>>>>>>       [5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
>> >>>>>>>>          macro(java_lang_ClassLoader, loader_data, intptr_signature,
>> >>>>>>>> false)
>> >>>>>>>>
>> >>>>>>>>       -> loader_data is malloc'ed.
>> >>>>>>>>
>> >>>>>>>>       So, I will change the DEBUG_ONLY into a product-mode check, and
>> >>>>>>>> quit
>> >>>>>>>>       dumping if these objects are found in the object subgraph.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Sounds good. Can you please also add a comment with explanation.
>> >>>>>>>>
>> >>>>>>>> For ClassLoader and Module, it is worth considering caching the
>> >>>>>>>> additional native data some time in the future. Lois had suggested
>> >>>>>>>> the Module part a while ago.
>> >>>>>>> I think we can do that if/when we archive Modules directly into the
>> >>>>>>> shared heap.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>       Maybe we should backport the check to older versions as well?
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> We should discuss with Andrew Haley for backports to JDK 11 update
>> >>>>>>>> releases. Since the current OpenJDK 11 only applies Java heap
>> >>>>>>>> archiving to a restricted set of JDK library code, I think it is
>> >>>>>>>> safe without the new check.
>> >>>>>>>>
>> >>>>>>>> For non-LTS releases, it might not be worthwhile as they may not be
>> >>>>>>>> widely used?
>> >>>>>>> I agree. FYI, we (Oracle) have no plan for backporting more types of
>> >>>>>>> heap object archiving, so the decision would be up to whoever
>> >>>>>>> decides to do so.
>> >>>>>>>
>> >>>>>>> Thanks
>> >>>>>>> - Ioi
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> Jiangli
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>       >
>> >>>>>>>>       >>> - src/hotspot/share/memory/metaspace.cpp
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> 1036   metaspace_rs =
>> >>>>>>>> ReservedSpace(compressed_class_space_size(),
>> >>>>>>>>       >>> 1037   _reserve_alignment,
>> >>>>>>>>       >>> 1038   large_pages,
>> >>>>>>>>       >>> 1039   requested_addr);
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> Please fix indentation.
>> >>>>>>>>       >> Fixed.
>> >>>>>>>>       >>
>> >>>>>>>>       >>> - src/hotspot/share/memory/metaspaceClosure.hpp
>> >>>>>>>>       >>>
>> >>>>>>>>       >>>     78   enum SpecialRef {
>> >>>>>>>>       >>>     79     _method_entry_ref
>> >>>>>>>>       >>>     80   };
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> Are there other pointers that are not references to
>> >>>>>>>>       MetaspaceObj? If
>> >>>>>>>>       >>> _method_entry_ref is the only type, it's probably not worth
>> >>>>>>>>       defining
>> >>>>>>>>       >>> SpecialRef?
>> >>>>>>>>       >> There may be more types in the future, so I want to have a
>> >>>>>>>>       stable API
>> >>>>>>>>       >> that can be easily expanded without touching all the code that
>> >>>>>>>>       uses it.
>> >>>>>>>>       >>
>> >>>>>>>>       >>
>> >>>>>>>>       >>> - src/hotspot/share/memory/metaspaceShared.hpp
>> >>>>>>>>       >>>
>> >>>>>>>>       >>>     42 enum MapArchiveResult {
>> >>>>>>>>       >>>     43   MAP_ARCHIVE_SUCCESS,
>> >>>>>>>>       >>>     44   MAP_ARCHIVE_MMAP_FAILURE,
>> >>>>>>>>       >>>     45   MAP_ARCHIVE_OTHER_FAILURE
>> >>>>>>>>       >>>     46 };
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> If we want to define different failure types, it's probably
>> >>>>>>>> worth
>> >>>>>>>>       >>> using separate types for relocation failure and validation
>> >>>>>>>>       failure.
>> >>>>>>>>       >> For now, I just need to distinguish between MMAP_FAILURE (where
>> >>>>>>>>       I should
>> >>>>>>>>       >> attempt to remap at an alternative address) and OTHER_FAILURE
>> >>>>>>>>       (where the
>> >>>>>>>>       >> CDS archive loading will fail -- due to validation error,
>> >>>>>>>>       insufficient
>> >>>>>>>>       >> memory, etc -- without attempting to remap.)
>> >>>>>>>>       >>
>> >>>>>>>>       >>> ---
>> >>>>>>>>       >>>
>> >>>>>>>>       >>>    193   static intx _mapping_delta; // FIXME rename
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> How about _relocation_delta?
>> >>>>>>>>       >> Changed as suggested.
>> >>>>>>>>       >>
>> >>>>>>>>       >>> - src/hotspot/share/oops/instanceKlass
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> 1573 bool InstanceKlass::_disable_method_binary_search = false;
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> The use of _disable_method_binary_search is not necessary. You
>> >>>>>>>>       can use
>> >>>>>>>>       >>> DynamicDumpSharedSpaces for the purpose. That would make things
>> >>>>>>>>       >>> cleaner.
>> >>>>>>>>       >> If we always disable the binary search when
>> >>>>>>>>       DynamicDumpSharedSpaces is
>> >>>>>>>>       >> true, it will slow down normal execution of the Java program
>> >>>>>>>> when
>> >>>>>>>>       >> -XX:ArchiveClassesAtExit has been specified, but the program
>> >>>>>>>>       hasn't exited.
>> >>>>>>>>       > Could you please add some comments to
>> >>>>>>>> _disable_method_binary_search
>> >>>>>>>>       > with the above explanation? Thanks.
>> >>>>>>>>
>> >>>>>>>>       OK
>> >>>>>>>>       >
>> >>>>>>>>       >>> - test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
>> >>>>>>>>       >>>
>> >>>>>>>>       >>>     76                     if (name.equals("s0") ||
>> >>>>>>>>       name.equals("s1")) {
>> >>>>>>>>       >>>     77                       // String regions are listed at
>> >>>>>>>>       the end and
>> >>>>>>>>       >>> they may not be fully occupied.
>> >>>>>>>>       >>>     78                       break;
>> >>>>>>>>       >>>     79                     } else if (name.equals("bm")) {
>> >>>>>>>>       >>>     80                       // Bitmap space does not have a
>> >>>>>>>>       requested address.
>> >>>>>>>>       >>>     81                       break;
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> It's not part of your change, but could you please fix line 76
>> >>>>>>>>       - 78
>> >>>>>>>>       >>> since it is trivial. It seems the lines can be removed.
>> >>>>>>>>       >> Removed.
>> >>>>>>>>       >>
>> >>>>>>>>       >>> - /src/hotspot/share/memory/archiveUtils.hpp
>> >>>>>>>>       >>> The file name does not match with the macro '#ifndef
>> >>>>>>>>       >>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please rename
>> >>>>>>>>       >>> archiveUtils.* ? archiveRelocator.hpp and
>> >>>>>>>> archiveRelocator.cpp are
>> >>>>>>>>       >>> more descriptive.
>> >>>>>>>>       >> I named the file archiveUtils.hpp so we can move other misc
>> >>>>>>>>       stuff used
>> >>>>>>>>       >> by dumping into this file (e.g., DumpRegion, WriteClosure from
>> >>>>>>>>       >> metaspaceShared.hpp), since these are not used by the majority
>> >>>>>>>>       of the
>> >>>>>>>>       >> files that use metaspaceShared.hpp.
>> >>>>>>>>       >>
>> >>>>>>>>       >> I fixed the ifdef.
>> >>>>>>>>       >>
>> >>>>>>>>       >>> - src/hotspot/share/memory/archiveUtils.cpp
>> >>>>>>>>       >>>
>> >>>>>>>>       >>>     36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap,
>> >>>>>>>>       address*
>> >>>>>>>>       >>> ptr_base, address* ptr_end) {
>> >>>>>>>>       >>>     37   assert(_ptrmap == NULL, "initialize only once");
>> >>>>>>>>       >>>     38   _ptr_base = ptr_base;
>> >>>>>>>>       >>>     39   _ptr_end = ptr_end;
>> >>>>>>>>       >>>     40   _compacted = false;
>> >>>>>>>>       >>>     41   _ptrmap = ptrmap;
>> >>>>>>>>       >>>     42   _ptrmap->initialize(12 * M / sizeof(intptr_t)); //
>> >>>>>>>>       default
>> >>>>>>>>       >>> archive is about 12MB.
>> >>>>>>>>       >>>     43 }
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> Could we do a better estimate here? We could guesstimate the
>> >>>>>>>> size
>> >>>>>>>>       >>> based on the current used class space and metaspace size. It's
>> >>>>>>>>       okay if
>> >>>>>>>>       >>> a larger bitmap is used, since it can be reduced after all
>> >>>>>>>>       >>> marking is done.
>> >>>>>>>>       >> The bitmap is automatically expanded when necessary in
>> >>>>>>>>       >> ArchivePtrMarker::mark_pointer(). It's only about 1/32 or 1/64
>> >>>>>>>>       of the
>> >>>>>>>>       >> total archive size, so even if we do expand, the cost will be
>> >>>>>>>>       trivial.
>> >>>>>>>>       > The initial value is based on the default CDS archive. When
>> >>>>>>>> dealing
>> >>>>>>>>       > with a really large archive, it would have to re-grow many times.
>> >>>>>>>>       > Also, using a hard-coded value is less desirable.
>> >>>>>>>>
>> >>>>>>>>       OK, I changed it to the following
>> >>>>>>>>
>> >>>>>>>>          // Use this as initial guesstimate. We should need less space
>> >>>>>>>>       in the
>> >>>>>>>>          // archive, but if we're wrong the bitmap will be expanded
>> >>>>>>>>       automatically.
>> >>>>>>>>          size_t estimated_archive_size =
>> >>>>>>>> MetaspaceGC::capacity_until_GC();
>> >>>>>>>>          // But set it smaller in debug builds so we always test the
>> >>>>>>>>       expansion
>> >>>>>>>>       code.
>> >>>>>>>>          // (Default archive is about 12MB).
>> >>>>>>>>          DEBUG_ONLY(estimated_archive_size = 6 * M);
>> >>>>>>>>
>> >>>>>>>>          // We need one bit per pointer in the archive.
>> >>>>>>>>          _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>       Thanks!
>> >>>>>>>>       - Ioi
>> >>>>>>>>
>> >>>>>>>>       >
>> >>>>>>>>       >>>
>> >>>>>>>>       >>>
>> >>>>>>>>       >>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou
>> >>>>>>>>       <jianglizhou at google.com> wrote:
>> >>>>>>>>       >>>> Hi Ioi,
>> >>>>>>>>       >>>>
>> >>>>>>>>       >>>> This is another great step for CDS usability improvement.
>> >>>>>>>>       Thank you!
>> >>>>>>>>       >>>>
>> >>>>>>>>       >>>> I have a high level question (or request): could we consider
>> >>>>>>>>       >>>> separating the relocation work for 'direct' class metadata
>> >>>>>>>>       from other
>> >>>>>>>>       >>>> types of metadata (such as the shared system dictionary,
>> >>>>>>>>       symbol table,
>> >>>>>>>>       >>>> etc)? Initially we only relocate the tables and other
>> >>>>>>>>       archived global
>> >>>>>>>>       >>>> data. When each archived class is being loaded, we can
>> >>>>>>>>       relocate all
>> >>>>>>>>       >>>> the pointers within the current class. We could find the
>> >>>>>>>>       segment (for
>> >>>>>>>>       >>>> the current class) in the bitmap and update the pointers
>> >>>>>>>>       within the
>> >>>>>>>>       >>>> segment. That way we can reduce initial startup costs and
>> >>>>>>>>       also avoid
>> >>>>>>>>       >>>> relocating class data that's not used at runtime. In some
>> >>>>>>>>       real world
>> >>>>>>>>       >>>> large systems, an archive may contain extremely large
>> >>>>>>>> number of
>> >>>>>>>>       >>>> classes.
>> >>>>>>>>       >>>>
>> >>>>>>>>       >>>> Following are partial review comments so we can move things
>> >>>>>>>>       forward.
>> >>>>>>>>       >>>> Still going through the rest of the changes.
>> >>>>>>>>       >>>>
>> >>>>>>>>       >>>> - src/hotspot/share/classfile/javaClasses.cpp
>> >>>>>>>>       >>>>
>> >>>>>>>>       >>>> 1218 void
>> >>>>>>>> java_lang_Class::update_archived_mirror_native_pointers(oop
>> >>>>>>>>       >>>> archived_mirror) {
>> >>>>>>>>       >>>> 1219   Klass* k =
>> >>>>>>>> ((Klass*)archived_mirror->metadata_field(_klass_offset));
>> >>>>>>>>       >>>> 1220   if (k != NULL) { // k is NULL for the primitive
>> >>>>>>>>       classes such as
>> >>>>>>>>       >>>> java.lang.Byte::TYPE <<<<<<<<<<<
>> >>>>>>>>       >>>> 1221  archived_mirror->metadata_field_put(_klass_offset,
>> >>>>>>>>       >>>> (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
>> >>>>>>>>       >>>> 1222   }
>> >>>>>>>>       >>>> 1223 ...
>> >>>>>>>>       >>>>
>> >>>>>>>>       >>>> Primitive type mirrors are handled separately. Could you
>> >>>>>>>>       please verify
>> >>>>>>>>       >>>> if this call path happens for primitive type mirror?
>> >>>>>>>>       >>>>
>> >>>>>>>>       >>>> To answer my question above, looks like you added the
>> >>>>>>>>       following, which
>> >>>>>>>>       >>>> is to be used for primitive type mirrors. That seems to be
>> >>>>>>>>       the reason
>> >>>>>>>>       >>>> why update_archived_mirror_native_pointers is trying to also
>> >>>>>>>>       cover
>> >>>>>>>>       >>>> primitive type. It would be better to have a separate API for
>> >>>>>>>>       primitive type
>> >>>>>>>>       >>>> mirror, which is cleaner. And, we also can replace the above
>> >>>>>>>>       check at
>> >>>>>>>>       >>>> line 1220 to be an assert for regular mirrors.
>> >>>>>>>>       >>>>
>> >>>>>>>>       >>>> +void ReadClosure::do_mirror_oop(oop *p) {
>> >>>>>>>>       >>>> +  do_oop(p);
>> >>>>>>>>       >>>> +  oop mirror = *p;
>> >>>>>>>>       >>>> +  if (mirror != NULL) {
>> >>>>>>>>       >>>> +
>> >>>>>>>> java_lang_Class::update_archived_mirror_native_pointers(mirror);
>> >>>>>>>>       >>>> +  }
>> >>>>>>>>       >>>> +}
>> >>>>>>>>       >>>> +
>> >>>>>>>>       >>>>
>> >>>>>>>>       >>>> How about renaming update_archived_mirror_native_pointers to
>> >>>>>>>>       >>>> update_archived_mirror_klass_pointers.
>> >>>>>>>>       >>>>
>> >>>>>>>>       >>>> It would be good to pass the current klass as an argument.
>> >>>>>>>> We can
>> >>>>>>>>       >>>> verify the relocated pointer matches with the current klass
>> >>>>>>>>       pointer.
>> >>>>>>>>       >>>>
>> >>>>>>>>       >>>> We should also check if relocation is necessary before
>> >>>>>>>>       spending cycles
>> >>>>>>>>       >>>> to obtain the
>
>
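
The bitmap-driven relocation discussed in this thread (one bit per
pointer-sized slot in the archive, with each marked slot adjusted by a
delta) can be illustrated with a small standalone sketch. This is not the
HotSpot code: relocate_marked_slots and bitmap_bits_for are hypothetical
names, std::vector<bool> stands in for the real bitmap class, and skipping
NULL slots is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: 'slots' is the archive viewed as an array of
// pointer-sized words, and 'ptrmap' holds one flag per slot, set when the
// slot contains a pointer into the archive.
void relocate_marked_slots(std::vector<std::uintptr_t>& slots,
                           const std::vector<bool>& ptrmap,
                           std::intptr_t delta) {
  for (std::size_t i = 0; i < slots.size() && i < ptrmap.size(); ++i) {
    // Assumption for this sketch: NULL pointers are left untouched.
    if (ptrmap[i] && slots[i] != 0) {
      // Unsigned addition wraps, so a negative delta moves the pointer down.
      slots[i] += static_cast<std::uintptr_t>(delta);
    }
  }
}

// One bit per pointer-sized slot, matching the "one bit per pointer"
// size estimate quoted earlier in this message.
std::size_t bitmap_bits_for(std::size_t archive_bytes) {
  return archive_bytes / sizeof(std::intptr_t);
}
```

With a 12MB archive on a 64-bit platform this gives 12MB / 8 = 1.5M bits,
i.e. a bitmap of roughly 1/64 of the archive size, which is consistent with
the ratio mentioned in the discussion.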

From coleen.phillimore at oracle.com  Wed Nov 13 15:54:24 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Wed, 13 Nov 2019 10:54:24 -0500
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <CALrW1jxDLNH3Mp89YuU++NSnjo=OQGBdH1OSUJa9SZO6pjMo2A@mail.gmail.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
 <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
 <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>
 <7f0d558c-c161-ce41-90c6-ede8fddcf8b8@oracle.com>
 <99030987-a044-53fb-784b-62408333137a@oracle.com>
 <CALrW1jwTHJ1qg+gTfNeM9MbLdF042bF2ysKf7nGHKFNj9uKe+w@mail.gmail.com>
 <CALrW1jy5_4jrMRSZPAXV-c8a92Jy8y4eoK+f_t8ErwaZRGMoyw@mail.gmail.com>
 <52c473ef-5915-9ca0-8ed8-d4c2846965be@oracle.com>
 <CALrW1jzk+1XAqw2w55Y=ouyb-ZDB8tu5uWKNiXN9uA5Ku2XaCg@mail.gmail.com>
 <96ad8c62-fd62-1a1b-6f3c-e009e5e8a6f3@oracle.com>
 <CALrW1jye1Oua7e3LCNV6-c_pkYa3Ujni7own-ntXaFqv8tM6-Q@mail.gmail.com>
 <0fec66c6-b8a2-6019-655b-467f84404386@oracle.com>
 <CALrW1jxDLNH3Mp89YuU++NSnjo=OQGBdH1OSUJa9SZO6pjMo2A@mail.gmail.com>
Message-ID: <84c8f6aa-f715-4915-1928-69c4131528cb@oracle.com>


I agree, the new diagnostic option looks good.  Better than 
SharedBaseAddress=0.

Thanks,
Coleen
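
For readers skimming the thread, the three ArchiveRelocationMode values
described in the quoted message below can be sketched as a small policy
function. This is only an illustration under stated assumptions, not the
code in the webrev: MapResult, TryMap and map_archive are hypothetical
names, and the retry-only-on-mmap-failure rule follows the
MMAP_FAILURE / OTHER_FAILURE distinction explained earlier in the review.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins; none of these names come from HotSpot.
enum class MapResult { Success, MmapFailure, OtherFailure };

// try_map(relocate) attempts to map the archive. relocate == false means
// "at the preferred (requested) address"; true means "at an alternative
// address".
using TryMap = std::function<MapResult(bool relocate)>;

// Returns true if the archive ended up mapped.
bool map_archive(int archive_relocation_mode, const TryMap& try_map) {
  assert(archive_relocation_mode >= 0 && archive_relocation_mode <= 2);
  switch (archive_relocation_mode) {
    case 1:  // always map to an alternative address
      return try_map(true) == MapResult::Success;
    case 2:  // preferred address only; do not map the archive on failure
      return try_map(false) == MapResult::Success;
    default: {  // 0: preferred address first, then fall back
      MapResult r = try_map(false);
      if (r == MapResult::Success) return true;
      // Only an mmap failure is retried elsewhere; other failures
      // (e.g. validation errors) would fail again.
      if (r != MapResult::MmapFailure) return false;
      return try_map(true) == MapResult::Success;
    }
  }
}
```

Mode 1 exercises the relocation path on every run, which is what makes it
usable for testing and performance comparison; mode 2 never relocates.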

On 11/13/19 10:37 AM, Jiangli Zhou wrote:
> Looks good!
>
> Best,
> Jiangli
>
> On Tue, Nov 12, 2019 at 9:12 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>
>>
>> On 11/10/19 5:14 PM, Jiangli Zhou wrote:
>>
>>
>>
>> On Sun, Nov 10, 2019, 3:13 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>
>>>
>>> On 11/9/19 8:25 PM, Jiangli Zhou wrote:
>>>> Hi Ioi,
>>>>
>>>> On Fri, Nov 8, 2019 at 1:35 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>> Hi Jiangli,
>>>>>
>>>>> Thanks for your comments. Please see my replies in-line:
>>>>>
>>>>> On 11/7/19 6:34 PM, Jiangli Zhou wrote:
>>>>>> On Thu, Nov 7, 2019 at 6:11 PM Jiangli Zhou <jianglizhou at google.com> wrote:
>>>>>>> I looked at both the 05.full and 06.delta webrevs. They look good.
>>>>>>>
>>>>>>> I still feel a bit uneasy about the potential runtime impact when data
>>>>>>> does get relocated. Long running apps/services may shy away from
>>>>>>> enabling archive at runtime, if there is a detectable overhead even
>>>>>>> though it may only occur rarely. As relocation is enabled by default
>>>>>>> and users cannot turn it off, disabling with -Xshare:off entirely
>>>>>>> would become the only choice. Could you please create a new RFE
>>>>>>> (possibly with higher priority) to investigate the potential effect,
>>>>>>> or provide an option for users to opt-in relocation with the
>>>>>>> command-line switch?
>>>>> I created https://bugs.openjdk.java.net/browse/JDK-8233862
>>>>> Investigate performance benefit of relocating CDS archive to under 32G
>>>>>
>>>>> As I noted in the bug report, I ran benchmarks with CDS relocation
>>>>> on/off, and there's no sign of regression when the CDS archive is
>>>>> relocated. Please see the bug report for how to configure the VM to do
>>>>> the comparison.
>>>>>
>>>>> As you said before: "When enabling CDS we [google] noticed a small
>>>>> runtime overhead in JDK 11 recently with a benchmark. After I backported
>>>>> JDK-8213713 to 11, it seemed to reduce the runtime overhead that the
>>>>> benchmark was experiencing":
>>>>>
>>>>> Can you confirm whether this is stock JDK 11 or a special google build?
>>>>> Which test case did you use? Is it possible for you to run the tests
>>>>> again (using the exact before/after bits that you had when backporting
>>>>> JDK-8213713)? Can you check if narrow_klass_base and narrow_klass_shift
>>>>> are the same in your before/after builds?
>>>> Thanks for creating the RFE.
>>>>
>>>> JDK-8213713 closes the 1G gap between the shared space and class space
>>>> and everything else is unaffected. The compressed class base and shift
>>>> were the same for before and after applying JDK-8213713. The effect
>>>> was statistically observed for the benchmark since the difference was
>>>> very small and could be within noise level for single run comparison.
>>>> A small difference could still be important for some use cases so it
>>>> needs to be taken into consideration when designing and implementing
>>>> new changes.
>>> Hi Jiangli,
>>>
>>> Thanks for taking the time for doing the performance measurements.
>>>
>>> I also ran benchmarks in all 3 modes (no CDS, CDS without relocation,
>>> CDS with relocation), and did not see any significant performance
>>> difference with Octane-DeltaBlue, Octane-NavierStokes, SPECjbb2005-Tuned,
>>> JFR-SPECjbb2005-Tuned, SPECjvm2008-Serial-G1 and Tools-Javac-Hello.
>>>
>>>
>>>> A new command-line option for archived metadata relocation may still be
>>>> valuable. It would also be helpful for debugging and diagnosis.
>>>>
>>> How about a diagnostic flag ArchiveRelocationMode:
>>>
>>> 0: (default) first map at preferred address, and if unsuccessful, map to
>>> alternative address;
>>> 1: always map to alternative address;
>>> 2: always map at preferred address, and if unsuccessful, do not map the
>>> archive;
>>>
>>> 1 is for testing relocation, as well as for easy performance measurement
>>> (replaces the use of -XX:SharedBaseAddress=0 in my current patch.).
>>> 2 is for avoiding potential regression that may be introduced by
>>> relocation (revert to JDK 13 behavior).
>>>
>>> What do you think? If you like this I'll open a CSR.
>>
>>
>> That sounds good to me!
>>
>>
>> Hi Jiangli,
>>
>> It turns out that CSR is not needed for adding a diagnostic flag.
>>
>> I implemented the flag as described above. See:
>>
>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v07-delta/
>>
>>
>> Thanks
>> - Ioi
>>
>>
>> Regards,
>> Jiangli
>>
>>> Thanks
>>> - Ioi
>>>
>>>
>>>
>>>>>> Forgot to say that when Java heap can fit into low 32G space, it takes
>>>>>> the class space size into account and leaves the needed space right above
>>>>>> (also in low 32G space) when reserving heap, for !UseSharedSpace. In
>>>>>> that case, it's more likely the class data and heap data can be
>>>>>> colocated successfully.
>>>>> The reason is not for "colocation". It's so that narrow_klass_base can
>>>>> be zero, and the klass pointer can be uncompressed with a shift (without
>>>>> also doing an addition).
>>>>>
>>>>> But with CDS enabled, we always hard code to use non-zero
>>>>> narrow_klass_base and 3 bit shift (for AOT). So by just relocating the
>>>>> CDS archive to under 32GB, without modifying how CDS handles
>>>>> narrow_klass_base/shift, I don't think we can expect any benefit.
>>>> I experimented with mapping the shared space in low 32G and placed
>>>> right above the Java heap. The class space was also allocated in the
>>>> low 32G space and after the mapped shared space in the experiment. The
>>>> compressed class encoding was using 0 base and 3 shift, which was the
>>>> same as the encoding when CDS was disabled. I didn't observe runtime
>>>> performance difference when comparing that specific configuration with
>>>> the normal CDS mapping scheme (the shared space start at 32G and the
>>>> encoding is non-zero base and 3 shift).
>>>>
>>>> Thanks,
>>>> Jiangli
>>>>> For modern architectures, I am not aware of any inherent speed benefit
>>>>> simply by putting data (in our case much larger than a page) "close to
>>>>> each other" in the virtual address space. If you have any reference of
>>>>> that, please let me know.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>> Thanks,
>>>>>> Jiangli
>>>>>>
>>>>>>> Regards,
>>>>>>> Jiangli
>>>>>>>
>>>>>>> On Thu, Nov 7, 2019 at 4:22 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>>>>> Hi Coleen,
>>>>>>>>
>>>>>>>> Thanks for the review. Here's a webrev that has incorporated your
>>>>>>>> suggestions:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v06-delta/
>>>>>>>>
>>>>>>>> Please see comments in-line
>>>>>>>>
>>>>>>>> On 11/7/19 2:46 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>> Hi, I've done a more high level code review of this and it looks good!
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/src/hotspot/share/memory/archiveUtils.hpp.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think these classes require comments on what they do and why. The
>>>>>>>>> comments you sent me offline look good.
>>>>>>>> I added more comments for ArchivePtrMarker::_compacted per your offline
>>>>>>>> request.
>>>>>>>>
>>>>>>>>> Also .hpp files shouldn't include .inline.hpp files, like
>>>>>>>>> bitMap.inline.hpp.  Hopefully it's just a case of moving do_bit() into
>>>>>>>>> the cpp file.
>>>>>>>> I moved the do_bit() function into archiveUtils.inline.hpp, since it is
>>>>>>>> used by 3 .cpp files, and performance is important.
>>>>>>>>
>>>>>>>>> I wonder if the exception list of classes to exclude should be a
>>>>>>>>> function in javaClasses.hpp/cpp where the explanation would make more
>>>>>>>>> sense?  ie bool
>>>>>>>>> JavaClasses::has_injected_native_pointers(InstanceKlass* k);
>>>>>>>> I moved the checking code to javaClasses.cpp. Since we do (partially)
>>>>>>>> support java.lang.Class, which has injected native pointers, I named the
>>>>>>>> function as JavaClasses::is_supported_for_archiving instead. I also
>>>>>>>> massaged the comments a little for clarification.
>>>>>>>>
>>>>>>>>> Is there already an RFE to move the DumpSharedSpaces output from
>>>>>>>>> tty->print() to log_info() ?
>>>>>>>> I created https://bugs.openjdk.java.net/browse/JDK-8233826 (Change CDS
>>>>>>>> dumping tty->print_cr() to unified logging).
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> - Ioi
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>>>
>>>>>>>>> On 11/6/19 4:17 PM, Ioi Lam wrote:
>>>>>>>>>> Hi Jiangli,
>>>>>>>>>>
>>>>>>>>>> I've uploaded the webrev after integrating your comments:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05-delta/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please see more replies below:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/4/19 5:52 PM, Jiangli Zhou wrote:
>>>>>>>>>>> On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam <ioi.lam at oracle.com
>>>>>>>>>>> <mailto:ioi.lam at oracle.com>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>        Hi Jiangli,
>>>>>>>>>>>
>>>>>>>>>>>        Thank you so much for spending time reviewing this RFE!
>>>>>>>>>>>
>>>>>>>>>>>        On 11/3/19 6:34 PM, Jiangli Zhou wrote:
>>>>>>>>>>>        > Hi Ioi,
>>>>>>>>>>>        >
>>>>>>>>>>>        > Sorry for the delay again. Will try to put this on the top of my
>>>>>>>>>>>        list
>>>>>>>>>>>        > next week and reduce the turn-around time. The updates look
>>>>>>>>>>> good in
>>>>>>>>>>>        > general.
>>>>>>>>>>>        >
>>>>>>>>>>>        > We might want to have a better strategy when choosing metadata
>>>>>>>>>>>        > relocation address (when relocation is needed). Some
>>>>>>>>>>>        > applications/benchmarks may be more sensitive to cache
>>>>>>>>>>> locality and
>>>>>>>>>>>        > memory/data layout. There was a bug,
>>>>>>>>>>>        > https://bugs.openjdk.java.net/browse/JDK-8213713 that caused
>>>>>>>>>>> 1G gap
>>>>>>>>>>>        > between Java heap data and metadata before JDK 12. The gap
>>>>>>>>>>> seemed to
>>>>>>>>>>>        > cause a small but noticeable runtime effect in one case that I
>>>>>>>>>>> came
>>>>>>>>>>>        > across.
>>>>>>>>>>>
>>>>>>>>>>>        I guess you're saying we should try to relocate the archive into
>>>>>>>>>>>        somewhere under 32GB?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I don't yet have sufficient data that determines if mapping at low
>>>>>>>>>>> 32G produces better runtime performance. I experimented with that,
>>>>>>>>>>> but didn't see noticeable difference when comparing to mapping at
>>>>>>>>>>> the current default address. It doesn't hurt, I think. So it may be
>>>>>>>>>>> a better choice than relocating to a random address in high 32G
>>>>>>>>>>> space (when Java heap is in low 32G address space).
>>>>>>>>>> Maybe we should reconsider this when we have more concrete data for
>>>>>>>>>> the benefits of moving the compressed class space to under 32G.
>>>>>>>>>>
>>>>>>>>>> Please note that in metaspace.cpp, when CDS is disabled and the VM
>>>>>>>>>> fails to allocate the class space at the requested address
>>>>>>>>>> (0x7c000000 for 16GB heap), it also just allocates from a random
>>>>>>>>>> address (without trying to search under 32GB):
>>>>>>>>>>
>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/annotate/e767fa6a1d45/src/hotspot/share/memory/metaspace.cpp#l1128
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This code has been there since 2013 and we have not seen any issues.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>        Could you elaborate more about the performance issue, especially
>>>>>>>>>>>        about
>>>>>>>>>>>        cache locality? I looked at JDK-8213713 but it didn't mention
>>>>>>>>>>>        performance.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> When enabling CDS we noticed a small runtime overhead in JDK 11
>>>>>>>>>>> recently with a benchmark. After I backported JDK-8213713 to 11, it
>>>>>>>>>>> seemed to reduce the runtime overhead that the benchmark was
>>>>>>>>>>> experiencing.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>        Also, by default, we have non-zero narrow_klass_base and
>>>>>>>>>>>        narrow_klass_shift = 3, and archive relocation doesn't change that:
>>>>>>>>>>>
>>>>>>>>>>>        $ java -Xlog:cds=debug -version
>>>>>>>>>>>        ... narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
>>>>>>>>>>>        $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
>>>>>>>>>>>        ... narrow_klass_base = 0x00007f1e8b499000, narrow_klass_shift = 3
>>>>>>>>>>>
>>>>>>>>>>>        We always use narrow_klass_shift = 3 with CDS due to this:
>>>>>>>>>>>
>>>>>>>>>>>           // CDS uses LogKlassAlignmentInBytes for narrow_klass_shift. See
>>>>>>>>>>>           // MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
>>>>>>>>>>>           // how dump time narrow_klass_shift is set. Although, CDS can work
>>>>>>>>>>>           // with zero-shift mode also, to be consistent with AOT it uses
>>>>>>>>>>>           // LogKlassAlignmentInBytes for klass shift so archived java heap objects
>>>>>>>>>>>           // can be used at same time as AOT code.
>>>>>>>>>>>           if (!UseSharedSpaces
>>>>>>>>>>>               && (uint64_t)(higher_address - lower_base) <= UnscaledClassSpaceMax) {
>>>>>>>>>>>             CompressedKlassPointers::set_shift(0);
>>>>>>>>>>>           } else {
>>>>>>>>>>>             CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
>>>>>>>>>>>           }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Right. If we relocate to low 32G space, it needs to make sure that
>>>>>>>>>>> the range containing the mapped class data and class space must be
>>>>>>>>>>> encodable.
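As a rough illustration of what "encodable" means here: a 32-bit narrow value scaled by the shift can only reach a limited span above the base. The helper below is a hypothetical sketch, not the check HotSpot performs (the real code has additional alignment and boundary considerations); it only shows the arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: can every address in [base, range_end) be expressed
// as base + (narrow << shift) with a 32-bit narrow value?
inline bool range_is_encodable(uint64_t base, uint64_t range_end, int shift) {
  assert(shift >= 0 && shift < 32);
  // 2^32 narrow values, each scaled by 2^shift bytes.
  const uint64_t span = uint64_t(1) << (32 + shift);
  return range_end >= base && (range_end - base) <= span;
}
```

With a zero base and a shift of 3, the reachable span is 32GB, which is where the "low 32G" figure in this discussion comes from. With a zero shift it drops to 4GB.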
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>        > Here are some additional comments (minor).
>>>>>>>>>>>        >
>>>>>>>>>>>        > Could you please fix the long lines in the following?
>>>>>>>>>>>        >
>>>>>>>>>>>        > 1237 void
>>>>>>>>>>> java_lang_Class::update_archived_primitive_mirror_native_pointers(oop
>>>>>>>>>>>        > archived_mirror) {
>>>>>>>>>>>        > 1238   if (MetaspaceShared::relocation_delta() != 0) {
>>>>>>>>>>>        > 1239  assert(archived_mirror->metadata_field(_klass_offset) ==
>>>>>>>>>>>        > NULL, "must be for primitive class");
>>>>>>>>>>>        > 1240
>>>>>>>>>>>        > 1241     Klass* ak =
>>>>>>>>>>>        > ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
>>>>>>>>>>>        > 1242     if (ak != NULL) {
>>>>>>>>>>>        > 1243  archived_mirror->metadata_field_put(_array_klass_offset,
>>>>>>>>>>>        > (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
>>>>>>>>>>>        > 1244     }
>>>>>>>>>>>        > 1245   }
>>>>>>>>>>>        > 1246 }
>>>>>>>>>>>        >
>>>>>>>>>>>        > src/hotspot/share/memory/dynamicArchive.cpp
>>>>>>>>>>>        >
>>>>>>>>>>>        >   889   Thread* THREAD = Thread::current();
>>>>>>>>>>>        >   890   Method::sort_methods(ik->methods(), /*set_idnums=*/true,
>>>>>>>>>>>        > dynamic_dump_method_comparator);
>>>>>>>>>>>        >   891   if (ik->default_methods() != NULL) {
>>>>>>>>>>>        >   892  Method::sort_methods(ik->default_methods(),
>>>>>>>>>>>        > /*set_idnums=*/false, dynamic_dump_method_comparator);
>>>>>>>>>>>        >   893   }
>>>>>>>>>>>        >
>>>>>>>>>>>
>>>>>>>>>>>        OK will do.
>>>>>>>>>>>
>>>>>>>>>>>        > Please see inlined comments below.
>>>>>>>>>>>        >
>>>>>>>>>>>        > On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com
>>>>>>>>>>>        <mailto:ioi.lam at oracle.com>> wrote:
>>>>>>>>>>>        >> Hi Jiangli,
>>>>>>>>>>>        >>
>>>>>>>>>>>        >> Thanks for the review. I've updated the patch according to your
>>>>>>>>>>>        comments:
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>
>>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
>>>>>>>>>>>
>>>>>>>>>>>        >>
>>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
>>>>>>>>>>>
>>>>>>>>>>>        >>
>>>>>>>>>>>        >> (the delta is on top of 8231610-relocate-cds-archive.v03.delta
>>>>>>>>>>>        in my
>>>>>>>>>>>        >> reply to Calvin's comments).
>>>>>>>>>>>        >>
>>>>>>>>>>>        >> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
>>>>>>>>>>>        >>> Hi Ioi,
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> Sorry for the delay. Here are my remaining comments.
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> - src/hotspot/share/memory/dynamicArchive.cpp
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> 128   static intx _method_comparator_name_delta;
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> The name of the above variable is confusing. It's the value of
>>>>>>>>>>>        >>> _buffer_to_target_delta. It's better to use _buffer_to_target_delta
>>>>>>>>>>>        >>> directly.
>>>>>>>>>>>        >> _buffer_to_target_delta is a non-static field, but
>>>>>>>>>>>        >> dynamic_dump_method_comparator() must be a static function so
>>>>>>>>>>>        it can't
>>>>>>>>>>>        >> use the non-static field easily.
>>>>>>>>>>>        >
>>>>>>>>>>>        > It sounds like an issue. _buffer_to_target_delta was made
>>>>>>>>>>>        > non-static mostly because we might support more than one dynamic
>>>>>>>>>>>        > archive in the future. However, today's usages bake in an
>>>>>>>>>>>        > assumption that _buffer_to_target_delta is a singleton value. It
>>>>>>>>>>>        > is cleaner to either make _buffer_to_target_delta a static
>>>>>>>>>>>        > variable for now, or add an accessor API in DynamicArchiveBuilder
>>>>>>>>>>>        > to allow other code to properly and correctly use the value.
>>>>>>>>>>>
>>>>>>>>>>>        OK, I'll move it to a static variable.
>>>>>>>>>>>
>>>>>>>>>>>        >
>>>>>>>>>>>        >>> Also, we can do a quick pointer comparison of 'a_name' and
>>>>>>>>>>>        >>> 'b_name' first before adjusting the pointers.
>>>>>>>>>>>        >> I added this:
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>       if (a_name == b_name) {
>>>>>>>>>>>        >>         return 0;
>>>>>>>>>>>        >>       }
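A small sketch may help show why the comparator needs the delta at all, and where the pointer-equality fast path fits. Everything below is hypothetical: the BufferInfo struct, the file-scope g_info variable, and the rule that only addresses inside the dump buffer get adjusted are assumptions made for illustration, not the code in the webrev. Plain integers stand in for the name pointers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the dump-time buffer and its relocation delta.
struct BufferInfo {
  uintptr_t buffer_base;
  uintptr_t buffer_end;
  intptr_t  buffer_to_target_delta;
};

// File-scope state, because a static comparator cannot reach instance fields.
static BufferInfo g_info = {0, 0, 0};

// Translate a buffer address to the address it will have in the archive.
// Addresses outside the buffer are assumed to be at their final location.
static uintptr_t to_target(uintptr_t p) {
  assert(g_info.buffer_base <= g_info.buffer_end);
  if (p >= g_info.buffer_base && p < g_info.buffer_end) {
    return p + static_cast<uintptr_t>(g_info.buffer_to_target_delta);
  }
  return p;
}

// Order two names by the address they will have at run time.
int compare_names_by_target_address(uintptr_t a_name, uintptr_t b_name) {
  if (a_name == b_name) {
    return 0;  // fast path: identical pointers need no adjustment
  }
  uintptr_t a = to_target(a_name);
  uintptr_t b = to_target(b_name);
  return (a < b) ? -1 : ((a > b) ? 1 : 0);
}
```

The interesting case is a name inside the buffer compared with one outside it: their raw order and their target order can differ, so sorting on raw addresses would produce the wrong order for run time.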
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>> ---
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> 934 void DynamicArchiveBuilder::relocate_buffer_to_target() {
>>>>>>>>>>>        >>> ...
>>>>>>>>>>>        >>>    944
>>>>>>>>>>>        >>>    945  ArchivePtrMarker::compact(relocatable_base,
>>>>>>>>>>>        relocatable_end);
>>>>>>>>>>>        >>> ...
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>>    974     SharedDataRelocator patcher((address*)patch_base,
>>>>>>>>>>>        >>> (address*)patch_end, valid_old_base, valid_old_end,
>>>>>>>>>>>        >>>    975  valid_new_base, valid_new_end, addr_delta);
>>>>>>>>>>>        >>>    976  ArchivePtrMarker::ptrmap()->iterate(&patcher);
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> Could we reduce the number of data re-iterations to help archive
>>>>>>>>>>>        >>> dumping performance? The ArchivePtrMarker::compact operation can
>>>>>>>>>>>        >>> be combined with the patching iteration, and the
>>>>>>>>>>>        >>> ArchivePtrMarker::compact API can be removed.
>>>>>>>>>>>        >> That's a good idea. I implemented it using a template parameter
>>>>>>>>>>>        so that
>>>>>>>>>>>        >> we can have max performance when relocating the archive at run
>>>>>>>>>>>        time.
>>>>>>>>>>>        >>
>>>>>>>>>>>        >> I added comments to explain why the relocation is done here. The
>>>>>>>>>>>        >> relocation is pretty rare (only when the base archive was not
>>>>>>>>>>>        mapped at
>>>>>>>>>>>        >> the default location).
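The idea of folding compaction into the patching pass, selected by a template parameter, can be sketched roughly as follows. This is a simplified illustration and not the SharedDataRelocator from the webrev: the function name, the use of std::vector<bool> as the pointer bitmap, and the treatment of null slots are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit i of ptrmap is set when slot i of the buffer holds a pointer.
// COMPACT = true: also drop bits for null slots and trim the bitmap (dump time).
// COMPACT = false: patch only, with no extra bookkeeping (run time).
template <bool COMPACT>
size_t relocate_marked(std::vector<uintptr_t>& slots,
                       std::vector<bool>& ptrmap,
                       intptr_t delta) {
  assert(slots.size() == ptrmap.size());
  size_t used = 0;  // one past the last slot holding a non-null pointer
  for (size_t i = 0; i < slots.size(); i++) {
    if (!ptrmap[i]) {
      continue;  // not a pointer slot
    }
    if (slots[i] == 0) {
      if (COMPACT) {
        ptrmap[i] = false;  // null pointers never need patching later
      }
      continue;
    }
    slots[i] += static_cast<uintptr_t>(delta);  // patch in the same pass
    if (COMPACT) {
      used = i + 1;
    }
  }
  if (COMPACT) {
    ptrmap.resize(used);  // trailing bits are all clear, so drop them
    return used;
  }
  return ptrmap.size();
}
```

Because COMPACT is a compile-time constant, the run-time instantiation carries no compaction branches, which matches the performance motivation given above.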
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>> ---
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>>    967     address valid_new_base =
>>>>>>>>>>>        >>> (address)Arguments::default_SharedBaseAddress();
>>>>>>>>>>>        >>>    968     address valid_new_end  = valid_new_base +
>>>>>>>>>>>        base_plus_top_size;
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> The debugging only code can be included under #ifdef ASSERT.
>>>>>>>>>>>        >> These values are actually also used in debug logging so they
>>>>>>>>>>>        can't be
>>>>>>>>>>>        >> ifdef'ed out.
>>>>>>>>>>>        >>
>>>>>>>>>>>        >> Also, the C++ compiler is pretty good at eliding code that's not
>>>>>>>>>>>        >> actually used. If I comment out all the logging code in
>>>>>>>>>>>        >> DynamicArchiveBuilder::relocate_buffer_to_target() and
>>>>>>>>>>>        >> SharedDataRelocator, gcc elides all the unused fields and their
>>>>>>>>>>>        >> assignments. So no code is generated for lines like this:
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>       address valid_new_base =
>>>>>>>>>>>        >> (address)Arguments::default_SharedBaseAddress();
>>>>>>>>>>>        >>
>>>>>>>>>>>        >> Since #ifdef ASSERT makes the code harder to read, I think we
>>>>>>>>>>>        should use
>>>>>>>>>>>        >> it only when really necessary.
>>>>>>>>>>>        > It seems cleaner to get rid of these debugging only variables, by
>>>>>>>>>>>        > using 'relocatable_base' and
>>>>>>>>>>>        > '(address)Arguments::default_SharedBaseAddress()' in the logging
>>>>>>>>>>>        code.
>>>>>>>>>>>
>>>>>>>>>>>        SharedDataRelocator is used in 3 different situations. These six
>>>>>>>>>>>        variables (patch_base, patch_end, valid_old_base, valid_old_end,
>>>>>>>>>>>        valid_new_base, valid_new_end) describe what is being patched,
>>>>>>>>>>>        and what the expectations are, for each situation. The code will
>>>>>>>>>>>        be hard to understand without them.
>>>>>>>>>>>
>>>>>>>>>>>        Please note there's also logging code in the SharedDataRelocator
>>>>>>>>>>>        constructor that prints out these values.
>>>>>>>>>>>
>>>>>>>>>>>        I think I'll just remove the 'debug only' comment to avoid
>>>>>>>>>>> confusion.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Ok.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>        >
>>>>>>>>>>>        >>> ---
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>>    993 dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> We could combine the archived heap data bitmap into the new
>>>>>>>>>>>        region as
>>>>>>>>>>>        >>> well? It can be handled as a separate RFE.
>>>>>>>>>>>        >> I've filed https://bugs.openjdk.java.net/browse/JDK-8233093
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>> - src/hotspot/share/memory/filemap.cpp
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> 1038     if (is_static()) {
>>>>>>>>>>>        >>> 1039       if (errno == ENOENT) {
>>>>>>>>>>>        >>> 1040         // Not locating the shared archive is ok.
>>>>>>>>>>>        >>> 1041         fail_continue("Specified shared archive not found
>>>>>>>>>>>        (%s).",
>>>>>>>>>>>        >>> _full_path);
>>>>>>>>>>>        >>> 1042       } else {
>>>>>>>>>>>        >>> 1043         fail_continue("Failed to open shared archive file
>>>>>>>>>>>        (%s).",
>>>>>>>>>>>        >>> 1044  os::strerror(errno));
>>>>>>>>>>>        >>> 1045       }
>>>>>>>>>>>        >>> 1046     } else {
>>>>>>>>>>>        >>> 1047       log_warning(cds, dynamic)("specified dynamic archive
>>>>>>>>>>>        >>> doesn't exist: %s", _full_path);
>>>>>>>>>>>        >>> 1048     }
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> If the top layer is explicitly specified by the user, a warning
>>>>>>>>>>>        >>> does not seem to be the proper behavior if the VM fails to open
>>>>>>>>>>>        >>> the archive file.
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> It might be better to handle the code unrelated to relocation in
>>>>>>>>>>>        >>> a separate changeset and track it with a separate RFE.
>>>>>>>>>>>        >> This code was moved from
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>
>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070
>>>>>>>>>>>
>>>>>>>>>>>        >>
>>>>>>>>>>>        >> so I am not changing the behavior. If you want, we can file an
>>>>>>>>>>>        >> RFE to change the behavior.
>>>>>>>>>>>        > Ok. A new RFE sounds like the right thing to re-evaluate the
>>>>>>>>>>>        > usage issue here. Thanks.
>>>>>>>>>>>
>>>>>>>>>>>        I created https://bugs.openjdk.java.net/browse/JDK-8233446
>>>>>>>>>>>
>>>>>>>>>>>        >>> ---
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> 1148 void FileMapInfo::write_region(int region, char* base,
>>>>>>>>>>>        size_t size,
>>>>>>>>>>>        >>> 1149                                bool read_only, bool
>>>>>>>>>>>        allow_exec) {
>>>>>>>>>>>        >>> ...
>>>>>>>>>>>        >>> 1154
>>>>>>>>>>>        >>> 1155   if (region == MetaspaceShared::bm) {
>>>>>>>>>>>        >>> 1156     target_base = NULL;
>>>>>>>>>>>        >>> 1157   } else if (DynamicDumpSharedSpaces) {
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> It's not too clear to me how the bitmap (bm) region is handled
>>>>>>>>>>>        for the
>>>>>>>>>>>        >>> base layer and top layer. Could you please explain?
>>>>>>>>>>>        >> The bm region for both layers is mapped at an address picked
>>>>>>>>>>>        >> by the OS:
>>>>>>>>>>>        >>
>>>>>>>>>>>        >> char* FileMapInfo::map_relocation_bitmap(size_t& bitmap_size) {
>>>>>>>>>>>        >>     FileMapRegion* si = space_at(MetaspaceShared::bm);
>>>>>>>>>>>        >>     bitmap_size = si->used_aligned();
>>>>>>>>>>>        >>     bool read_only = true, allow_exec = false;
>>>>>>>>>>>        >>     char* requested_addr = NULL; // allow OS to pick any
>>>>>>>>>>> location
>>>>>>>>>>>        >>     char* bitmap_base = os::map_memory(_fd, _full_path,
>>>>>>>>>>>        si->file_offset(),
>>>>>>>>>>>        >> requested_addr, bitmap_size,
>>>>>>>>>>>        >> read_only, allow_exec);
>>>>>>>>>>>        >>
>>>>>>>>>>>        > Ok, after staring at the code for a few seconds I saw that's
>>>>>>>>>>>        > intended. If the current region is 'bm', then the 'target_base'
>>>>>>>>>>>        > is NULL regardless of whether it's a static or dynamic archive.
>>>>>>>>>>>        > Otherwise, the 'target_base' is handled differently for the
>>>>>>>>>>>        > static and dynamic case. The following would be cleaner and more
>>>>>>>>>>>        > reliable.
>>>>>>>>>>>        >
>>>>>>>>>>>        >     char* target_base = NULL;
>>>>>>>>>>>        >
>>>>>>>>>>>        >     // The target_base is NULL for 'bm' region.
>>>>>>>>>>>        >     if (region != MetaspaceShared::bm) {
>>>>>>>>>>>        >       if (DynamicDumpSharedSpaces) {
>>>>>>>>>>>        >         assert(!HeapShared::is_heap_region(region),
>>>>>>>>>>>        >                "dynamic archive doesn't support heap regions");
>>>>>>>>>>>        >         target_base = DynamicArchive::buffer_to_target(base);
>>>>>>>>>>>        >       } else {
>>>>>>>>>>>        >         target_base = base;
>>>>>>>>>>>        >       }
>>>>>>>>>>>        >     }
>>>>>>>>>>>
>>>>>>>>>>>        How about this?
>>>>>>>>>>>
>>>>>>>>>>>           char* target_base;
>>>>>>>>>>>           if (region == MetaspaceShared::bm) {
>>>>>>>>>>>             target_base = NULL; // always NULL for bm region.
>>>>>>>>>>>           } else {
>>>>>>>>>>>             if (DynamicDumpSharedSpaces) {
>>>>>>>>>>>                 assert(!HeapShared::is_heap_region(region),
>>>>>>>>>>>                        "dynamic archive doesn't support heap regions");
>>>>>>>>>>>                 target_base = DynamicArchive::buffer_to_target(base);
>>>>>>>>>>>             } else {
>>>>>>>>>>>                 target_base = base;
>>>>>>>>>>>             }
>>>>>>>>>>>           }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> No objection if you prefer the extra 'else' block.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>        >
>>>>>>>>>>>        >>> ---
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> 1362 DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>>>>>>>>>>>
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> Could you please explain the above?
>>>>>>>>>>>        >> I added this comment:
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>     // Make sure we don't attempt to use header()->mapped_base_address()
>>>>>>>>>>>        >>     // unless it's been successfully mapped.
>>>>>>>>>>>        >>     DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>>>>>>>>>>>
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>> ---
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> 1359   FileMapRegion* last_region = NULL;
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> 1371     if (last_region != NULL) {
>>>>>>>>>>>        >>> 1372       // Ensure that the OS won't be able to allocate new memory spaces between any mapped
>>>>>>>>>>>        >>> 1373       // regions, or else it would mess up the simple comparison in MetaspaceObj::is_shared().
>>>>>>>>>>>        >>> 1374       assert(si->mapped_base() == last_region->mapped_end(), "must have no gaps");
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> 1379     last_region = si;
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> Can you please place 'last_region' related code under #ifdef
>>>>>>>>>>>        ASSERT?
>>>>>>>>>>>        >> I think that will make the code more cluttered. The compiler
>>>>>>>>>>>        >> will optimize that away.
>>>>>>>>>>>        > It's cleaner to define debugging-only variables for debugging-only
>>>>>>>>>>>        > builds. You can wrap it and related usage with DEBUG_ONLY.
>>>>>>>>>>>
>>>>>>>>>>>        OK, will do.
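For readers unfamiliar with the pattern being agreed on, here is a minimal standalone sketch of wrapping a debug-only variable and its uses in DEBUG_ONLY. The macro definition mirrors the usual "expand to the code only when ASSERT is defined" shape; the Region struct and total_mapped function are hypothetical and exist only to give the macro something to wrap.

```cpp
#include <cassert>
#include <cstddef>

// Expands to its argument only in debug (ASSERT) builds.
#ifdef ASSERT
#define DEBUG_ONLY(code) code
#else
#define DEBUG_ONLY(code)
#endif

// Hypothetical stand-in for a mapped archive region.
struct Region {
  size_t base;
  size_t size;
  size_t end() const { return base + size; }
};

// Sums the region sizes. In debug builds it also checks that each region
// starts exactly where the previous one ended.
size_t total_mapped(const Region* regions, size_t n) {
  DEBUG_ONLY(const Region* last_region = nullptr;)
  size_t total = 0;
  for (size_t i = 0; i < n; i++) {
    const Region* si = &regions[i];
    DEBUG_ONLY(if (last_region != nullptr) {
      assert(si->base == last_region->end() && "must have no gaps");
    })
    total += si->size;
    DEBUG_ONLY(last_region = si;)
  }
  return total;
}
```

In a product build the last_region variable does not exist at all, so there is nothing for the optimizer to remove and no risk of an unused-variable warning.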
>>>>>>>>>>>
>>>>>>>>>>>        >
>>>>>>>>>>>        >>> ---
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> 1478 char* FileMapInfo::map_relocation_bitmap(size_t&
>>>>>>>>>>>        bitmap_size) {
>>>>>>>>>>>        >>> 1479   FileMapRegion* si = space_at(MetaspaceShared::bm);
>>>>>>>>>>>        >>> 1480   bitmap_size = si->used_aligned();
>>>>>>>>>>>        >>> 1481   bool read_only = true, allow_exec = false;
>>>>>>>>>>>        >>> 1482   char* requested_addr = NULL; // allow OS to pick any
>>>>>>>>>>>        location
>>>>>>>>>>>        >>> 1483   char* bitmap_base = os::map_memory(_fd, _full_path,
>>>>>>>>>>>        si->file_offset(),
>>>>>>>>>>>        >>> 1484 requested_addr, bitmap_size,
>>>>>>>>>>>        >>> read_only, allow_exec);
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> We need to handle mapping failure here.
>>>>>>>>>>>        >> It's handled here:
>>>>>>>>>>>        >>
>>>>>>>>>>>        >> bool FileMapInfo::relocate_pointers(intx addr_delta) {
>>>>>>>>>>>        >>     log_debug(cds, reloc)("runtime archive relocation start");
>>>>>>>>>>>        >>     size_t bitmap_size;
>>>>>>>>>>>        >>     char* bitmap_base = map_relocation_bitmap(bitmap_size);
>>>>>>>>>>>        >>     if (bitmap_base != NULL) {
>>>>>>>>>>>        >>     ...
>>>>>>>>>>>        >>     } else {
>>>>>>>>>>>        >>       log_error(cds)("failed to map relocation bitmap");
>>>>>>>>>>>        >>       return false;
>>>>>>>>>>>        >>     }
>>>>>>>>>>>        >>
>>>>>>>>>>>        > 'bitmap_base' is used immediately after map_memory(). So the
>>>>>>>>>>>        > check needs to be done immediately after map_memory(), not in
>>>>>>>>>>>        > the caller of map_relocation_bitmap().
>>>>>>>>>>>        >
>>>>>>>>>>>        > 1490   char* bitmap_base = os::map_memory(_fd, _full_path,
>>>>>>>>>>>        si->file_offset(),
>>>>>>>>>>>        > 1491 requested_addr, bitmap_size,
>>>>>>>>>>>        > read_only, allow_exec);
>>>>>>>>>>>        > 1492
>>>>>>>>>>>        > 1493   if (VerifySharedSpaces && bitmap_base != NULL &&
>>>>>>>>>>>        > !region_crc_check(bitmap_base, bitmap_size, si->crc())) {
>>>>>>>>>>>
>>>>>>>>>>>        OK, I'll fix that.
>>>>>>>>>>>
>>>>>>>>>>>        >
>>>>>>>>>>>        >
>>>>>>>>>>>        >>> ---
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> 1513     // debug only -- the current value of the pointers to be patched must be within this
>>>>>>>>>>>        >>> 1514     // range (i.e., must be between the requested base address, and the end of the current archive).
>>>>>>>>>>>        >>> 1515     // Note: top archive may point to objects in the base archive, but not the other way around.
>>>>>>>>>>>        >>> 1516     address valid_old_base = (address)header()->requested_base_address();
>>>>>>>>>>>        >>> 1517     address valid_old_end  = valid_old_base + mapping_end_offset();
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> Please place all FileMapInfo::relocate_pointers debugging only
>>>>>>>>>>>        code
>>>>>>>>>>>        >>> under #ifdef ASSERT.
>>>>>>>>>>>        >> Ditto about ifdef ASSERT
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>> - src/hotspot/share/memory/heapShared.cpp
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>>    441 void
>>>>>>>>>>>        HeapShared::initialize_from_archived_subgraph(Klass* k) {
>>>>>>>>>>>        >>>    442   if (!open_archive_heap_region_mapped() ||
>>>>>>>>>>>        !MetaspaceObj::is_shared(k)) {
>>>>>>>>>>>        >>>    443     return; // nothing to do
>>>>>>>>>>>        >>>    444   }
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> When do we call HeapShared::initialize_from_archived_subgraph
>>>>>>>>>>>        for a
>>>>>>>>>>>        >>> klass that's not shared?
>>>>>>>>>>>        >> I've removed the !MetaspaceObj::is_shared(k). I probably added
>>>>>>>>>>>        that for
>>>>>>>>>>>        >> debugging purposes only.
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>>    616   DEBUG_ONLY({
>>>>>>>>>>>        >>>    617       Klass* klass = orig_obj->klass();
>>>>>>>>>>>        >>>    618       assert(klass !=
>>>>>>>>>>> SystemDictionary::Module_klass() &&
>>>>>>>>>>>        >>>    619              klass !=
>>>>>>>>>>>        SystemDictionary::ResolvedMethodName_klass() &&
>>>>>>>>>>>        >>>    620              klass !=
>>>>>>>>>>>        SystemDictionary::MemberName_klass() &&
>>>>>>>>>>>        >>>    621              klass !=
>>>>>>>>>>> SystemDictionary::Context_klass() &&
>>>>>>>>>>>        >>>    622              klass !=
>>>>>>>>>>>        SystemDictionary::ClassLoader_klass(), "we
>>>>>>>>>>>        >>> can only relocate metaspace object pointers inside
>>>>>>>>>>> java_lang_Class
>>>>>>>>>>>        >>> instances");
>>>>>>>>>>>        >>>    623     });
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> Let's leave the above for a separate RFE. I think assert is not
>>>>>>>>>>>        >>> sufficient for the check. Also, why ResolvedMethodName,
>>>>>>>>>>> Module and
>>>>>>>>>>>        >>> MemberName cannot be part of the graph?
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >> I added the following comment:
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>     DEBUG_ONLY({
>>>>>>>>>>>        >>         // The following are classes in
>>>>>>>>>>>        share/classfile/javaClasses.cpp
>>>>>>>>>>>        >> that have injected native pointers
>>>>>>>>>>>        >>         // to metaspace objects. To support these classes, we
>>>>>>>>>>>        need to add
>>>>>>>>>>>        >> relocation code similar to
>>>>>>>>>>>        >>         //
>>>>>>>>>>> java_lang_Class::update_archived_mirror_native_pointers.
>>>>>>>>>>>        >>         Klass* klass = orig_obj->klass();
>>>>>>>>>>>        >>         assert(klass != SystemDictionary::Module_klass() &&
>>>>>>>>>>>        >>                klass !=
>>>>>>>>>>>        SystemDictionary::ResolvedMethodName_klass() &&
>>>>>>>>>>>        >>
>>>>>>>>>>>        > It's too restrictive to exclude those objects from the archived
>>>>>>>>>>>        > object graph because of metadata relocation, since metadata
>>>>>>>>>>>        > relocation is rare. The trade-off doesn't seem to buy us much.
>>>>>>>>>>>        >
>>>>>>>>>>>        > Do you plan to add the needed relocation code?
>>>>>>>>>>>
>>>>>>>>>>>        I looked more into this. Actually we cannot handle these 5
>>>>>>>>>>> classes at
>>>>>>>>>>>        all, even without archive relocation:
>>>>>>>>>>>
>>>>>>>>>>>        [1] #define MODULE_INJECTED_FIELDS(macro) \
>>>>>>>>>>>           macro(java_lang_Module, module_entry, intptr_signature, false)
>>>>>>>>>>>
>>>>>>>>>>>        ->  module_entry is malloc'ed
>>>>>>>>>>>
>>>>>>>>>>>        [2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
>>>>>>>>>>>           macro(java_lang_invoke_ResolvedMethodName, vmholder,
>>>>>>>>>>>        object_signature, false) \
>>>>>>>>>>>           macro(java_lang_invoke_ResolvedMethodName, vmtarget,
>>>>>>>>>>>        intptr_signature, false)
>>>>>>>>>>>
>>>>>>>>>>>        -> these fields are related to method handles and lambda forms,
>>>>>>>>>>>        etc. They can't easily be archived without implementing lambda
>>>>>>>>>>>        form archiving. (I did a prototype; it's very complex and fragile).
>>>>>>>>>>>
>>>>>>>>>>>        [3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
>>>>>>>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
>>>>>>>>>>>        vmdependencies, intptr_signature, false) \
>>>>>>>>>>> macro(java_lang_invoke_MethodHandleNatives_CallSiteContext,
>>>>>>>>>>>        last_cleanup, long_signature, false)
>>>>>>>>>>>
>>>>>>>>>>>        -> vmdependencies is malloc'ed.
>>>>>>>>>>>
>>>>>>>>>>>        [4] #define MEMBERNAME_INJECTED_FIELDS(macro) \
>>>>>>>>>>>           macro(java_lang_invoke_MemberName, vmindex, intptr_signature, false)
>>>>>>>>>>>
>>>>>>>>>>>        -> this one is probably OK. Despite being declared as
>>>>>>>>>>>        'intptr_signature', it seems to be used just as an integer.
>>>>>>>>>>> However,
>>>>>>>>>>>        MemberNames are typically used with [2] and [3]. So let's just
>>>>>>>>>>>        forbid it
>>>>>>>>>>>        to be safe.
>>>>>>>>>>>
>>>>>>>>>>>        [2] [3] [4] are not used directly by regular Java code and are
>>>>>>>>>>>        unlikely
>>>>>>>>>>>        to be referenced (directly or indirectly) by static fields (except
>>>>>>>>>>>        for
>>>>>>>>>>>        the static fields in the classes in java.lang.invoke, which we
>>>>>>>>>>>        probably
>>>>>>>>>>>        won't support for heap archiving due to the problem I described for
>>>>>>>>>>>        [2]). Objects of these types are typically referenced via constant
>>>>>>>>>>>        pool
>>>>>>>>>>>        entries.
>>>>>>>>>>>
>>>>>>>>>>>        [5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
>>>>>>>>>>>           macro(java_lang_ClassLoader, loader_data, intptr_signature,
>>>>>>>>>>> false)
>>>>>>>>>>>
>>>>>>>>>>>        -> loader_data is malloc'ed.
>>>>>>>>>>>
>>>>>>>>>>>        So, I will change the DEBUG_ONLY into a product-mode check, and
>>>>>>>>>>> quit
>>>>>>>>>>>        dumping if these objects are found in the object subgraph.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sounds good. Can you please also add a comment with explanation.
>>>>>>>>>>>
>>>>>>>>>>> For ClassLoader and Module, it is worth considering caching the
>>>>>>>>>>> additional native data some time in the future. Lois had suggested
>>>>>>>>>>> the Module part a while ago.
>>>>>>>>>> I think we can do that if/when we archive Modules directly into the
>>>>>>>>>> shared heap.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>        Maybe we should backport the check to older versions as well?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> We should discuss with Andrew Haley for backports to JDK 11 update
>>>>>>>>>>> releases. Since the current OpenJDK 11 only applies Java heap
>>>>>>>>>>> archiving to a restricted set of JDK library code, I think it is
>>>>>>>>>>> safe without the new check.
>>>>>>>>>>>
>>>>>>>>>>> For non-LTS releases, it might not be worthwhile as they may not be
>>>>>>>>>>> widely used?
>>>>>>>>>> I agree. FYI, we (Oracle) have no plan for backporting more types of
>>>>>>>>>> heap object archiving, so the decision would be up to whoever
>>>>>>>>>> decides to do so.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> - Ioi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jiangli
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>        >
>>>>>>>>>>>        >>> - src/hotspot/share/memory/metaspace.cpp
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> 1036   metaspace_rs =
>>>>>>>>>>> ReservedSpace(compressed_class_space_size(),
>>>>>>>>>>>        >>> 1037   _reserve_alignment,
>>>>>>>>>>>        >>> 1038   large_pages,
>>>>>>>>>>>        >>> 1039   requested_addr);
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> Please fix indentation.
>>>>>>>>>>>        >> Fixed.
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>> - src/hotspot/share/memory/metaspaceClosure.hpp
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>>     78   enum SpecialRef {
>>>>>>>>>>>        >>>     79     _method_entry_ref
>>>>>>>>>>>        >>>     80   };
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> Are there other pointers that are not references to
>>>>>>>>>>>        MetaspaceObj? If
>>>>>>>>>>>        >>> _method_entry_ref is the only type, it's probably not worth
>>>>>>>>>>>        defining
>>>>>>>>>>>        >>> SpecialRef?
>>>>>>>>>>>        >> There may be more types in the future, so I want to have a
>>>>>>>>>>>        stable API
>>>>>>>>>>>        >> that can be easily expanded without touching all the code that
>>>>>>>>>>>        uses it.
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>> - src/hotspot/share/memory/metaspaceShared.hpp
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>>     42 enum MapArchiveResult {
>>>>>>>>>>>        >>>     43   MAP_ARCHIVE_SUCCESS,
>>>>>>>>>>>        >>>     44   MAP_ARCHIVE_MMAP_FAILURE,
>>>>>>>>>>>        >>>     45   MAP_ARCHIVE_OTHER_FAILURE
>>>>>>>>>>>        >>>     46 };
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> If we want to define different failure types, it's probably
>>>>>>>>>>> worth
>>>>>>>>>>>        >>> using separate types for relocation failure and validation
>>>>>>>>>>>        failure.
>>>>>>>>>>>        >> For now, I just need to distinguish between MMAP_FAILURE (where
>>>>>>>>>>>        I should
>>>>>>>>>>>        >> attempt to remap at an alternative address) and OTHER_FAILURE
>>>>>>>>>>>        (where the
>>>>>>>>>>>        >> CDS archive loading will fail -- due to validation error,
>>>>>>>>>>>        insufficient
>>>>>>>>>>>        >> memory, etc -- without attempting to remap.)
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>> ---
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>>    193   static intx _mapping_delta; // FIXME rename
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> How about _relocation_delta?
>>>>>>>>>>>        >> Changed as suggested.
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>> - src/hotspot/share/oops/instanceKlass
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> 1573 bool InstanceKlass::_disable_method_binary_search = false;
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> The use of _disable_method_binary_search is not necessary. You
>>>>>>>>>>>        can use
>>>>>>>>>>>        >>> DynamicDumpSharedSpaces for the purpose. That would make things
>>>>>>>>>>>        >>> cleaner.
>>>>>>>>>>>        >> If we always disable the binary search when
>>>>>>>>>>>        DynamicDumpSharedSpaces is
>>>>>>>>>>>        >> true, it will slow down normal execution of the Java program
>>>>>>>>>>> when
>>>>>>>>>>>        >> -XX:ArchiveClassesAtExit has been specified, but the program
>>>>>>>>>>>        hasn't exited.
>>>>>>>>>>>        > Could you please add some comments to
>>>>>>>>>>> _disable_method_binary_search
>>>>>>>>>>>        > with the above explanation? Thanks.
>>>>>>>>>>>
>>>>>>>>>>>        OK
>>>>>>>>>>>        >
>>>>>>>>>>>        >>> - test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>>     76                     if (name.equals("s0") ||
>>>>>>>>>>>        name.equals("s1")) {
>>>>>>>>>>>        >>>     77                       // String regions are listed at
>>>>>>>>>>>        the end and
>>>>>>>>>>>        >>> they may not be fully occupied.
>>>>>>>>>>>        >>>     78                       break;
>>>>>>>>>>>        >>>     79                     } else if (name.equals("bm")) {
>>>>>>>>>>>        >>>     80                       // Bitmap space does not have a
>>>>>>>>>>>        requested address.
>>>>>>>>>>>        >>>     81                       break;
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> It's not part of your change, but could you please fix line 76
>>>>>>>>>>>        - 78
>>>>>>>>>>>        >>> since it is trivial. It seems the lines can be removed.
>>>>>>>>>>>        >> Removed.
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>> - /src/hotspot/share/memory/archiveUtils.hpp
>>>>>>>>>>>        >>> The file name does not match with the macro '#ifndef
>>>>>>>>>>>        >>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you please rename
>>>>>>>>>>>        >>> archiveUtils.* ? archiveRelocator.hpp and
>>>>>>>>>>> archiveRelocator.cpp are
>>>>>>>>>>>        >>> more descriptive.
>>>>>>>>>>>        >> I named the file archiveUtils.hpp so we can move other misc
>>>>>>>>>>>        stuff used
>>>>>>>>>>>        >> by dumping into this file (e.g., DumpRegion, WriteClosure from
>>>>>>>>>>>        >> metaspaceShared.hpp), since these are not used by the majority
>>>>>>>>>>>        of the
>>>>>>>>>>>        >> files that use metaspaceShared.hpp.
>>>>>>>>>>>        >>
>>>>>>>>>>>        >> I fixed the ifdef.
>>>>>>>>>>>        >>
>>>>>>>>>>>        >>> - src/hotspot/share/memory/archiveUtils.cpp
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>>     36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap,
>>>>>>>>>>>        address*
>>>>>>>>>>>        >>> ptr_base, address* ptr_end) {
>>>>>>>>>>>        >>>     37   assert(_ptrmap == NULL, "initialize only once");
>>>>>>>>>>>        >>>     38   _ptr_base = ptr_base;
>>>>>>>>>>>        >>>     39   _ptr_end = ptr_end;
>>>>>>>>>>>        >>>     40   _compacted = false;
>>>>>>>>>>>        >>>     41   _ptrmap = ptrmap;
>>>>>>>>>>>        >>>     42   _ptrmap->initialize(12 * M / sizeof(intptr_t)); //
>>>>>>>>>>>        default
>>>>>>>>>>>        >>> archive is about 12MB.
>>>>>>>>>>>        >>>     43 }
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> Could we do a better estimate here? We could guesstimate the
>>>>>>>>>>> size
>>>>>>>>>>>        >>> based on the current used class space and metaspace size. It's
>>>>>>>>>>>        okay if
>>>>>>>>>>>        >>> a larger bitmap is used, since it can be reduced after all
>>>>>>>>>>>        marking is
>>>>>>>>>>>        >>> done.
>>>>>>>>>>>        >> The bitmap is automatically expanded when necessary in
>>>>>>>>>>>        >> ArchivePtrMarker::mark_pointer(). It's only about 1/32 or 1/64
>>>>>>>>>>>        of the
>>>>>>>>>>>        >> total archive size, so even if we do expand, the cost will be
>>>>>>>>>>>        trivial.
>>>>>>>>>>>        > The initial value is based on the default CDS archive. When
>>>>>>>>>>> dealing
>>>>>>>>>>>        > with a really large archive, it would have to re-grow many times.
>>>>>>>>>>>        > Also, using a hard-coded value is less desirable.
>>>>>>>>>>>
>>>>>>>>>>>        OK, I changed it to the following
>>>>>>>>>>>
>>>>>>>>>>>           // Use this as initial guesstimate. We should need less space
>>>>>>>>>>>        in the
>>>>>>>>>>>           // archive, but if we're wrong the bitmap will be expanded
>>>>>>>>>>>        automatically.
>>>>>>>>>>>           size_t estimated_archive_size =
>>>>>>>>>>> MetaspaceGC::capacity_until_GC();
>>>>>>>>>>>           // But set it smaller in debug builds so we always test the
>>>>>>>>>>>        expansion
>>>>>>>>>>>        code.
>>>>>>>>>>>           // (Default archive is about 12MB).
>>>>>>>>>>>           DEBUG_ONLY(estimated_archive_size = 6 * M);
>>>>>>>>>>>
>>>>>>>>>>>           // We need one bit per pointer in the archive.
>>>>>>>>>>>           _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
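A minimal sketch of the sizing arithmetic discussed above, assuming one bit per pointer-sized slot of the archive. The helper names `ptrmap_bits` and `ptrmap_bytes` are hypothetical and are not part of HotSpot; they only illustrate why the bitmap comes to roughly 1/64 of the archive size with 8-byte pointers (1/32 with 4-byte pointers).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bits needed when every pointer-sized
// slot in the archive gets exactly one bit in the pointer bitmap.
inline std::size_t ptrmap_bits(std::size_t archive_bytes, std::size_t ptr_size) {
  assert(ptr_size > 0);
  return archive_bytes / ptr_size;
}

// Hypothetical helper: bytes of storage for that bitmap, rounded up
// to a whole byte.
inline std::size_t ptrmap_bytes(std::size_t archive_bytes, std::size_t ptr_size) {
  return (ptrmap_bits(archive_bytes, ptr_size) + 7) / 8;
}
```

For a 12MB archive with 8-byte pointers this gives 1572864 bits, i.e. 192KB of bitmap, so an initial estimate that turns out too small is cheap to grow.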
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>        Thanks!
>>>>>>>>>>>        - Ioi
>>>>>>>>>>>
>>>>>>>>>>>        >
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>>
>>>>>>>>>>>        >>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou
>>>>>>>>>>>        <jianglizhou at google.com> wrote:
>>>>>>>>>>>        >>>> Hi Ioi,
>>>>>>>>>>>        >>>>
>>>>>>>>>>>        >>>> This is another great step for CDS usability improvement.
>>>>>>>>>>>        Thank you!
>>>>>>>>>>>        >>>>
>>>>>>>>>>>        >>>> I have a high level question (or request): could we consider
>>>>>>>>>>>        >>>> separating the relocation work for 'direct' class metadata
>>>>>>>>>>>        from other
>>>>>>>>>>>        >>>> types of metadata (such as the shared system dictionary,
>>>>>>>>>>>        symbol table,
>>>>>>>>>>>        >>>> etc)? Initially we only relocate the tables and other
>>>>>>>>>>>        archived global
>>>>>>>>>>>        >>>> data. When each archived class is being loaded, we can
>>>>>>>>>>>        relocate all
>>>>>>>>>>>        >>>> the pointers within the current class. We could find the
>>>>>>>>>>>        segment (for
>>>>>>>>>>>        >>>> the current class) in the bitmap and update the pointers
>>>>>>>>>>>        within the
>>>>>>>>>>>        >>>> segment. That way we can reduce initial startup costs and
>>>>>>>>>>>        also avoid
>>>>>>>>>>>        >>>> relocating class data that's not used at runtime. In some
>>>>>>>>>>>        real world
>>>>>>>>>>>        >>>> large systems, an archive may contain extremely large
>>>>>>>>>>> number of
>>>>>>>>>>>        >>>> classes.
>>>>>>>>>>>        >>>>
>>>>>>>>>>>        >>>> Following are partial review comments so we can move things
>>>>>>>>>>>        forward.
>>>>>>>>>>>        >>>> Still going through the rest of the changes.
>>>>>>>>>>>        >>>>
>>>>>>>>>>>        >>>> - src/hotspot/share/classfile/javaClasses.cpp
>>>>>>>>>>>        >>>>
>>>>>>>>>>>        >>>> 1218 void
>>>>>>>>>>> java_lang_Class::update_archived_mirror_native_pointers(oop
>>>>>>>>>>>        >>>> archived_mirror) {
>>>>>>>>>>>        >>>> 1219   Klass* k =
>>>>>>>>>>> ((Klass*)archived_mirror->metadata_field(_klass_offset));
>>>>>>>>>>>        >>>> 1220   if (k != NULL) { // k is NULL for the primitive
>>>>>>>>>>>        classes such as
>>>>>>>>>>>        >>>> java.lang.Byte::TYPE <<<<<<<<<<<
>>>>>>>>>>>        >>>> 1221  archived_mirror->metadata_field_put(_klass_offset,
>>>>>>>>>>>        >>>> (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
>>>>>>>>>>>        >>>> 1222   }
>>>>>>>>>>>        >>>> 1223 ...
>>>>>>>>>>>        >>>>
>>>>>>>>>>>        >>>> Primitive type mirrors are handled separately. Could you
>>>>>>>>>>>        please verify
>>>>>>>>>>>        >>>> if this call path happens for primitive type mirror?
>>>>>>>>>>>        >>>>
>>>>>>>>>>>        >>>> To answer my question above, looks like you added the
>>>>>>>>>>>        following, which
>>>>>>>>>>>        >>>> is to be used for primitive type mirrors. That seems to be
>>>>>>>>>>>        the reason
>>>>>>>>>>>        >>>> why update_archived_mirror_native_pointers is trying to also
>>>>>>>>>>>        cover
>>>>>>>>>>>        >>>> primitive type. It is better to have a separate API for
>>>>>>>>>>>        primitive type
>>>>>>>>>>>        >>>> mirror, which is cleaner. And, we also can replace the above
>>>>>>>>>>>        check at
>>>>>>>>>>>        >>>> line 1220 to be an assert for regular mirrors.
>>>>>>>>>>>        >>>>
>>>>>>>>>>>        >>>> +void ReadClosure::do_mirror_oop(oop *p) {
>>>>>>>>>>>        >>>> +  do_oop(p);
>>>>>>>>>>>        >>>> +  oop mirror = *p;
>>>>>>>>>>>        >>>> +  if (mirror != NULL) {
>>>>>>>>>>>        >>>> +
>>>>>>>>>>> java_lang_Class::update_archived_mirror_native_pointers(mirror);
>>>>>>>>>>>        >>>> +  }
>>>>>>>>>>>        >>>> +}
>>>>>>>>>>>        >>>> +
>>>>>>>>>>>        >>>>
>>>>>>>>>>>        >>>> How about renaming update_archived_mirror_native_pointers to
>>>>>>>>>>>        >>>> update_archived_mirror_klass_pointers.
>>>>>>>>>>>        >>>>
>>>>>>>>>>>        >>>> It would be good to pass the current klass as an argument.
>>>>>>>>>>> We can
>>>>>>>>>>>        >>>> verify the relocated pointer matches with the current klass
>>>>>>>>>>>        pointer.
>>>>>>>>>>>        >>>>
>>>>>>>>>>>        >>>> We should also check if relocation is necessary before
>>>>>>>>>>>        spending cycles
>>>>>>>>>>>        >>>> to obtain the
>>


From ioi.lam at oracle.com  Wed Nov 13 17:19:16 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 13 Nov 2019 09:19:16 -0800
Subject: RFR(L) 8231610 Relocate the CDS archive if it cannot be mapped to
 the requested address
In-Reply-To: <84c8f6aa-f715-4915-1928-69c4131528cb@oracle.com>
References: <b554b386-11da-5852-be68-38869036fef8@oracle.com>
 <51245e17-51eb-ea1c-db2b-bbbdc7f768e8@oracle.com>
 <CALrW1jzLxU4xOQhwpi8E8EzW34L0OUeJbZGCsG=uNgSGbV0GQg@mail.gmail.com>
 <33b0b106-d29f-db2e-c909-5329fa60eca8@oracle.com>
 <CALrW1jzBxzBLA6epG_fcNB0Xr3k_8k_qA9fwdCM=9e77gKbEsg@mail.gmail.com>
 <8e5e6248-2b7d-f2a6-ae2b-0e673d74fc63@oracle.com>
 <7f0d558c-c161-ce41-90c6-ede8fddcf8b8@oracle.com>
 <99030987-a044-53fb-784b-62408333137a@oracle.com>
 <CALrW1jwTHJ1qg+gTfNeM9MbLdF042bF2ysKf7nGHKFNj9uKe+w@mail.gmail.com>
 <CALrW1jy5_4jrMRSZPAXV-c8a92Jy8y4eoK+f_t8ErwaZRGMoyw@mail.gmail.com>
 <52c473ef-5915-9ca0-8ed8-d4c2846965be@oracle.com>
 <CALrW1jzk+1XAqw2w55Y=ouyb-ZDB8tu5uWKNiXN9uA5Ku2XaCg@mail.gmail.com>
 <96ad8c62-fd62-1a1b-6f3c-e009e5e8a6f3@oracle.com>
 <CALrW1jye1Oua7e3LCNV6-c_pkYa3Ujni7own-ntXaFqv8tM6-Q@mail.gmail.com>
 <0fec66c6-b8a2-6019-655b-467f84404386@oracle.com>
 <CALrW1jxDLNH3Mp89YuU++NSnjo=OQGBdH1OSUJa9SZO6pjMo2A@mail.gmail.com>
 <84c8f6aa-f715-4915-1928-69c4131528cb@oracle.com>
Message-ID: <48c9f6e1-469e-44a8-7c84-3341d5473437@oracle.com>

Thanks Jiangli & Coleen for your reviews!

Looks like we are close to the end. I will do more testing and push soon.

Thanks
- Ioi

On 11/13/19 7:54 AM, coleen.phillimore at oracle.com wrote:
>
> I agree, the new diagnostic option looks good. Better than 
> SharedBaseAddress=0.
>
> Thanks,
> Coleen
>
> On 11/13/19 10:37 AM, Jiangli Zhou wrote:
>> Looks good!
>>
>> Best,
>> Jiangli
>>
>> On Tue, Nov 12, 2019 at 9:12 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>
>>>
>>> On 11/10/19 5:14 PM, Jiangli Zhou wrote:
>>>
>>>
>>>
>>> On Sun, Nov 10, 2019, 3:13 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>
>>>>
>>>> On 11/9/19 8:25 PM, Jiangli Zhou wrote:
>>>>> Hi Ioi,
>>>>>
>>>>> On Fri, Nov 8, 2019 at 1:35 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>>> Hi Jiangli,
>>>>>>
>>>>>> Thanks for your comments. Please see my replies in-line:
>>>>>>
>>>>>> On 11/7/19 6:34 PM, Jiangli Zhou wrote:
>>>>>>> On Thu, Nov 7, 2019 at 6:11 PM Jiangli Zhou 
>>>>>>> <jianglizhou at google.com> wrote:
>>>>>>>> I looked at both the 05.full and 06.delta webrevs. They look good.
>>>>>>>>
>>>>>>>> I still feel a bit uneasy about the potential runtime impact 
>>>>>>>> when data
>>>>>>>> does get relocated. Long running apps/services may shy away 
>>>>>>>> from
>>>>>>>> enabling archive at runtime, if there is a detectable overhead 
>>>>>>>> even
>>>>>>>> though it may only occur rarely. As relocation is enabled by 
>>>>>>>> default
>>>>>>>> and users cannot turn it off, disabling with -Xshare:off entirely
>>>>>>>> would become the only choice. Could you please create a new RFE
>>>>>>>> (possibly with higher priority) to investigate the potential 
>>>>>>>> effect,
>>>>>>>> or provide an option for users to opt-in relocation with the
>>>>>>>> command-line switch?
>>>>>> I created https://bugs.openjdk.java.net/browse/JDK-8233862
>>>>>> Investigate performance benefit of relocating CDS archive to 
>>>>>> under 32G
>>>>>>
>>>>>> As I noted in the bug report, I ran benchmarks with CDS relocation
>>>>>> on/off, and there's no sign of regression when the CDS archive is
>>>>>> relocated. Please see the bug report for how to configure the VM 
>>>>>> to do
>>>>>> the comparison.
>>>>>>
>>>>>> As you said before: "When enabling CDS we [google] noticed a small
>>>>>> runtime overhead in JDK 11 recently with a benchmark. After I 
>>>>>> backported
>>>>>> JDK-8213713 to 11, it seemed to reduce the runtime overhead that the
>>>>>> benchmark was experiencing":
>>>>>>
>>>>>> Can you confirm whether this is stock JDK 11 or a special google 
>>>>>> build?
>>>>>> Which test case did you use? Is it possible for you to run the tests
>>>>>> again (using the exact before/after bits that you had when 
>>>>>> backporting
>>>>>> JDK-8213713)? Can you check if narrow_klass_base and 
>>>>>> narrow_klass_shift
>>>>>> are the same in your before/after builds?
>>>>> Thanks for creating the RFE.
>>>>>
>>>>> JDK-8213713 closes the 1G gap between the shared space and class 
>>>>> space
>>>>> and everything else is unaffected. The compressed class base and 
>>>>> shift
>>>>> were the same for before and after applying JDK-8213713. The effect
>>>>> was statistically observed for the benchmark since the difference was
>>>>> very small and could be within noise level for single run comparison.
>>>>> A small difference could still be important for some use cases so it
>>>>> needs to be taken into consideration when designing and implementing
>>>>> new changes.
>>>> Hi Jiangli,
>>>>
>>>> Thanks for taking the time for doing the performance measurements.
>>>>
>>>> I also ran benchmarks in all 3 modes (no CDS, CDS without relocation,
>>>> CDS with relocation), and did not see any significant performance difference with
>>>> Octane-DeltaBlue, Octane-NavierStokes, SPECjbb2005-Tuned,
>>>> JFR-SPECjbb2005-Tuned, SPECjvm2008-Serial-G1 and Tools-Javac-Hello.
>>>>
>>>>
>>>>> A new command-line for archived metadata relocation may still be
>>>>> valuable. It would also be helpful for debugging and diagnosis.
>>>>>
>>>> How about a diagnostic flag ArchiveRelocationMode:
>>>>
>>>> 0: (default) first map at preferred address, and if unsuccessful, 
>>>> map to
>>>> alternative address;
>>>> 1: always map to alternative address;
>>>> 2: always map at preferred address, and if unsuccessful, do not map 
>>>> the
>>>> archive;
>>>>
>>>> 1 is for testing relocation, as well as for easy performance 
>>>> measurement
>>>> (replaces the use of -XX:SharedBaseAddress=0 in my current patch.).
>>>> 2 is for avoiding potential regression that may be introduced by
>>>> relocation (revert to JDK 13 behavior).
>>>>
>>>> What do you think? If you like this I'll open a CSR.
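A minimal sketch of the three proposed ArchiveRelocationMode behaviours, written as a plain decision function. Everything here (`MapResult`, `map_archive`, the `try_map` callback) is a hypothetical illustration of the description above, not the HotSpot implementation; `try_map(true)` stands for an attempt at the preferred address and `try_map(false)` for an attempt at an alternative address.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result type for this sketch.
enum class MapResult { MappedAtPreferred, MappedAtAlternative, NotMapped };

// Hypothetical driver: 'mode' follows the 0/1/2 meanings proposed above.
// try_map(true)  -> attempt to map at the preferred (requested) address
// try_map(false) -> attempt to map at an alternative address
// Both return true on success.
inline MapResult map_archive(int mode, const std::function<bool(bool)>& try_map) {
  assert(mode >= 0 && mode <= 2);
  switch (mode) {
    case 1:  // always map to an alternative address (exercises relocation)
      return try_map(false) ? MapResult::MappedAtAlternative
                            : MapResult::NotMapped;
    case 2:  // preferred address only; give up on failure (no relocation)
      return try_map(true) ? MapResult::MappedAtPreferred
                           : MapResult::NotMapped;
    default: // 0: preferred address first, then fall back to an alternative
      if (try_map(true)) {
        return MapResult::MappedAtPreferred;
      }
      return try_map(false) ? MapResult::MappedAtAlternative
                            : MapResult::NotMapped;
  }
}
```

Mode 1 never touches the preferred address, which is what makes it usable for testing relocation; mode 2 never relocates, so a failed mapping simply leaves the archive unused.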
>>>
>>>
>>> That sounds good to me!
>>>
>>>
>>> Hi Jiangli,
>>>
>>> It turns out that CSR is not needed for adding a diagnostic flag.
>>>
>>> I implemented the flag as described above. See:
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v07-delta/ 
>>>
>>>
>>>
>>> Thanks
>>> - Ioi
>>>
>>>
>>> Regards,
>>> Jiangli
>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>
>>>>
>>>>>>> Forgot to say that when Java heap can fit into low 32G space, it 
>>>>>>> takes
>>>>>>> the class space size into account and leaves the needed space right above
>>>>>>> (also in low 32G space) when reserving heap, for 
>>>>>>> !UseSharedSpace. In
>>>>>>> that case, it's more likely the class data and heap data can be
>>>>>>> colocated successfully.
>>>>>> The reason is not for "colocation". It's so that 
>>>>>> narrow_klass_base can
>>>>>> be zero, and the klass pointer can be uncompressed with a shift 
>>>>>> (without
>>>>>> also doing an addition).
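A minimal sketch of the decoding arithmetic referred to above, assuming the usual base-plus-shifted-offset scheme. The function name `decode_narrow_klass` is hypothetical and the code is not taken from HotSpot; it only shows that a zero base reduces the decode to a single shift.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: expand a 32-bit compressed class pointer into a
// full 64-bit address. With base == 0 the addition is a no-op, so a real
// decoder could emit just the shift.
inline std::uint64_t decode_narrow_klass(std::uint64_t base, unsigned shift,
                                         std::uint32_t narrow) {
  assert(shift < 32);
  return base + (static_cast<std::uint64_t>(narrow) << shift);
}
```

With a 3-bit shift, 32-bit narrow values can address 32GB above the base, which is why placing the class data under 32GB allows a zero base.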
>>>>>>
>>>>>> But with CDS enabled, we always hard code to use non-zero
>>>>>> narrow_klass_base and 3 bit shift (for AOT). So by just 
>>>>>> relocating the
>>>>>> CDS archive to under 32GB, without modifying how CDS handles
>>>>>> narrow_klass_base/shift, I don't think we can expect any benefit.
>>>>> I experimented with mapping the shared space in low 32G and placed
>>>>> right above the Java heap. The class space was also allocated in the
>>>>> low 32G space and after the mapped shared space in the experiment. 
>>>>> The
>>>>> compressed class encoding was using 0 base and 3 shift, which was the
>>>>> same as the encoding when CDS was disabled. I didn't observe runtime
>>>>> performance difference when comparing that specific configuration 
>>>>> with
>>>>> the normal CDS mapping scheme (the shared space start at 32G and the
>>>>> encoding is non-zero base and 3 shift).
>>>>>
>>>>> Thanks,
>>>>> Jiangli
>>>>>> For modern architectures, I am not aware of any inherent speed 
>>>>>> benefit
>>>>>> simply by putting data (in our case much larger than a page) 
>>>>>> "close to
>>>>>> each other" in the virtual address space. If you have any 
>>>>>> reference of
>>>>>> that, please let me know.
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>> Thanks,
>>>>>>> Jiangli
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Jiangli
>>>>>>>>
>>>>>>>> On Thu, Nov 7, 2019 at 4:22 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>>>>>> Hi Coleen,
>>>>>>>>>
>>>>>>>>> Thanks for the review. Here's a webrev that has incorporated 
>>>>>>>>> your
>>>>>>>>> suggestions:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v06-delta/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Please see comments in-line
>>>>>>>>>
>>>>>>>>> On 11/7/19 2:46 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>>> Hi, I've done a more high level code review of this and it 
>>>>>>>>>> looks good!
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/src/hotspot/share/memory/archiveUtils.hpp.html 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think these classes require comments on what they do and 
>>>>>>>>>> why. The
>>>>>>>>>> comments you sent me offline look good.
>>>>>>>>> I added more comments for ArchivePtrMarker::_compacted per 
>>>>>>>>> your offline
>>>>>>>>> request.
>>>>>>>>>
>>>>>>>>>> Also .hpp files shouldn't include .inline.hpp files, like
>>>>>>>>>> bitMap.inline.hpp. Hopefully it's just a case of moving 
>>>>>>>>>> do_bit() into
>>>>>>>>>> the cpp file.
>>>>>>>>> I moved the do_bit() function into archiveUtils.inline.hpp, 
>>>>>>>>> since it is
>>>>>>>>> used by 3 .cpp files, and performance is important.
>>>>>>>>>
>>>>>>>>>> I wonder if the exception list of classes to exclude should be a
>>>>>>>>>> function in javaClasses.hpp/cpp where the explanation would 
>>>>>>>>>> make more
>>>>>>>>>> sense? i.e., bool
>>>>>>>>>> JavaClasses::has_injected_native_pointers(InstanceKlass* k);
>>>>>>>>> I moved the checking code to javaClasses.cpp. Since we do 
>>>>>>>>> (partially)
>>>>>>>>> support java.lang.Class, which has injected native pointers, I 
>>>>>>>>> named the
>>>>>>>>> function as JavaClasses::is_supported_for_archiving instead. I 
>>>>>>>>> also
>>>>>>>>> massaged the comments a little for clarification.
>>>>>>>>>
>>>>>>>>>> Is there already an RFE to move the DumpSharedSpaces output from
>>>>>>>>>> tty->print() to log_info() ?
>>>>>>>>> I created https://bugs.openjdk.java.net/browse/JDK-8233826 
>>>>>>>>> (Change CDS
>>>>>>>>> dumping tty->print_cr() to unified logging).
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> - Ioi
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Coleen
>>>>>>>>>>
>>>>>>>>>> On 11/6/19 4:17 PM, Ioi Lam wrote:
>>>>>>>>>>> Hi Jiangli,
>>>>>>>>>>>
>>>>>>>>>>> I've uploaded the webrev after integrating your comments:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05/ 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v05-delta/ 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Please see more replies below:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11/4/19 5:52 PM, Jiangli Zhou wrote:
>>>>>>>>>>>> On Sun, Nov 3, 2019 at 10:27 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>        Hi Jiangli,
>>>>>>>>>>>>
>>>>>>>>>>>>        Thank you so much for spending time reviewing this RFE!
>>>>>>>>>>>>
>>>>>>>>>>>>        On 11/3/19 6:34 PM, Jiangli Zhou wrote:
>>>>>>>>>>>>        > Hi Ioi,
>>>>>>>>>>>>        >
>>>>>>>>>>>>        > Sorry for the delay again. Will try to put this on the top of my
>>>>>>>>>>>>        > list next week and reduce the turn-around time. The updates look
>>>>>>>>>>>>        > good in general.
>>>>>>>>>>>>        >
>>>>>>>>>>>>        > We might want to have a better strategy when choosing metadata
>>>>>>>>>>>>        > relocation address (when relocation is needed). Some
>>>>>>>>>>>>        > applications/benchmarks may be more sensitive to cache locality
>>>>>>>>>>>>        > and memory/data layout. There was a bug,
>>>>>>>>>>>>        > https://bugs.openjdk.java.net/browse/JDK-8213713 that caused 1G
>>>>>>>>>>>>        > gap between Java heap data and metadata before JDK 12. The gap
>>>>>>>>>>>>        > seemed to cause a small but noticeable runtime effect in one case
>>>>>>>>>>>>        > that I came across.
>>>>>>>>>>>>
>>>>>>>>>>>>        I guess you're saying we should try to relocate the archive into
>>>>>>>>>>>>        somewhere under 32GB?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I don't yet have sufficient data that determines if mapping 
>>>>>>>>>>>> at low
>>>>>>>>>>>> 32G produces better runtime performance. I experimented 
>>>>>>>>>>>> with that,
>>>>>>>>>>>> but didn't see noticeable difference when comparing to 
>>>>>>>>>>>> mapping at
>>>>>>>>>>>> the current default address. It doesn't hurt, I think. So 
>>>>>>>>>>>> it may be
>>>>>>>>>>>> a better choice than relocating to a random address in high 
>>>>>>>>>>>> 32G
>>>>>>>>>>>> space (when Java heap is in low 32G address space).
>>>>>>>>>>> Maybe we should reconsider this when we have more concrete 
>>>>>>>>>>> data for
>>>>>>>>>>> the benefits of moving the compressed class space to under 32G.
>>>>>>>>>>>
>>>>>>>>>>> Please note that in metaspace.cpp, when CDS is disabled and the VM
>>>>>>>>>>> fails to allocate the class space at the requested address
>>>>>>>>>>> (0x7c000000 for 16GB heap), it also just allocates from a random
>>>>>>>>>>> address (without trying to search under 32GB):
>>>>>>>>>>>
>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/annotate/e767fa6a1d45/src/hotspot/share/memory/metaspace.cpp#l1128 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This code has been there since 2013 and we have not seen any 
>>>>>>>>>>> issues.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>        Could you elaborate more about the performance issue, especially
>>>>>>>>>>>>        about cache locality? I looked at JDK-8213713 but it didn't
>>>>>>>>>>>>        mention performance.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> When enabling CDS we noticed a small runtime overhead in 
>>>>>>>>>>>> JDK 11
>>>>>>>>>>>> recently with a benchmark. After I backported JDK-8213713 
>>>>>>>>>>>> to 11, it
>>>>>>>>>>>> seemed to reduce the runtime overhead that the benchmark was
>>>>>>>>>>>> experiencing.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>        Also, by default, we have non-zero narrow_klass_base and
>>>>>>>>>>>>        narrow_klass_shift = 3, and archive relocation doesn't change that:
>>>>>>>>>>>>
>>>>>>>>>>>>        $ java -Xlog:cds=debug -version
>>>>>>>>>>>>        ... narrow_klass_base = 0x0000000800000000, narrow_klass_shift = 3
>>>>>>>>>>>>        $ java -Xlog:cds=debug -XX:SharedBaseAddress=0 -version
>>>>>>>>>>>>        ... narrow_klass_base = 0x00007f1e8b499000, narrow_klass_shift = 3
>>>>>>>>>>>>
>>>>>>>>>>>>        We always use narrow_klass_shift due to this:
>>>>>>>>>>>>
>>>>>>>>>>>>           // CDS uses LogKlassAlignmentInBytes for narrow_klass_shift. See
>>>>>>>>>>>>           // MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() for
>>>>>>>>>>>>           // how dump time narrow_klass_shift is set. Although, CDS can work
>>>>>>>>>>>>           // with zero-shift mode also, to be consistent with AOT it uses
>>>>>>>>>>>>           // LogKlassAlignmentInBytes for klass shift so archived java heap objects
>>>>>>>>>>>>           // can be used at same time as AOT code.
>>>>>>>>>>>>           if (!UseSharedSpaces
>>>>>>>>>>>>               && (uint64_t)(higher_address - lower_base) <= UnscaledClassSpaceMax) {
>>>>>>>>>>>>             CompressedKlassPointers::set_shift(0);
>>>>>>>>>>>>           } else {
>>>>>>>>>>>>             CompressedKlassPointers::set_shift(LogKlassAlignmentInBytes);
>>>>>>>>>>>>           }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Right. If we relocate to low 32G space, it needs to make 
>>>>>>>>>>>> sure that
>>>>>>>>>>>> the range containing the mapped class data and class space 
>>>>>>>>>>>> must be
>>>>>>>>>>>> encodable.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>        > Here are some additional comments (minor).
>>>>>>>>>>>>        >
>>>>>>>>>>>>        > Could you please fix the long lines in the following?
>>>>>>>>>>>>        >
>>>>>>>>>>>>        > 1237 void
>>>>>>>>>>>>        > java_lang_Class::update_archived_primitive_mirror_native_pointers(oop
>>>>>>>>>>>>        > archived_mirror) {
>>>>>>>>>>>>        > 1238   if (MetaspaceShared::relocation_delta() != 0) {
>>>>>>>>>>>>        > 1239     assert(archived_mirror->metadata_field(_klass_offset) ==
>>>>>>>>>>>>        > NULL, "must be for primitive class");
>>>>>>>>>>>>        > 1240
>>>>>>>>>>>>        > 1241     Klass* ak =
>>>>>>>>>>>>        > ((Klass*)archived_mirror->metadata_field(_array_klass_offset));
>>>>>>>>>>>>        > 1242     if (ak != NULL) {
>>>>>>>>>>>>        > 1243       archived_mirror->metadata_field_put(_array_klass_offset,
>>>>>>>>>>>>        > (Klass*)(address(ak) + MetaspaceShared::relocation_delta()));
>>>>>>>>>>>>        > 1244     }
>>>>>>>>>>>>        > 1245   }
>>>>>>>>>>>>        > 1246 }
>>>>>>>>>>>>        >
>>>>>>>>>>>>        > src/hotspot/share/memory/dynamicArchive.cpp
>>>>>>>>>>>>        >
>>>>>>>>>>>>        >   889   Thread* THREAD = Thread::current();
>>>>>>>>>>>>        >   890   Method::sort_methods(ik->methods(), /*set_idnums=*/true,
>>>>>>>>>>>>        > dynamic_dump_method_comparator);
>>>>>>>>>>>>        >   891   if (ik->default_methods() != NULL) {
>>>>>>>>>>>>        >   892     Method::sort_methods(ik->default_methods(),
>>>>>>>>>>>>        > /*set_idnums=*/false, dynamic_dump_method_comparator);
>>>>>>>>>>>>        >   893   }
>>>>>>>>>>>>        >
>>>>>>>>>>>>
>>>>>>>>>>>>        OK will do.
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? > Please see inlined comments below.
>>>>>>>>>>>> ?????? >
>>>>>>>>>>>>        > On Mon, Oct 28, 2019 at 9:05 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>>>>>>>>> ?????? >> Hi Jiangli,
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>>        >> Thanks for the review. I've updated the patch according to your comments:
>>>>>>>>>>>>        >>
>>>>>>>>>>>>        >> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04/
>>>>>>>>>>>>        >> http://cr.openjdk.java.net/~iklam/jdk14/8231610-relocate-cds-archive.v04.delta/
>>>>>>>>>>>>        >>
>>>>>>>>>>>>        >> (the delta is on top of 8231610-relocate-cds-archive.v03.delta in my
>>>>>>>>>>>>        >> reply to Calvin's comments).
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >> On 10/27/19 9:13 PM, Jiangli Zhou wrote:
>>>>>>>>>>>> ?????? >>> Hi Ioi,
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> Sorry for the delay. Here are my remaining 
>>>>>>>>>>>> comments.
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> - src/hotspot/share/memory/dynamicArchive.cpp
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> 128?? static intx _method_comparator_name_delta;
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>> The name of the above variable is confusing. It's the value of
>>>>>>>>>>>>        >>> _buffer_to_target_delta. It's better to use _buffer_to_target_delta
>>>>>>>>>>>>        >>> directly.
>>>>>>>>>>>>        >> _buffer_to_target_delta is a non-static field, but
>>>>>>>>>>>>        >> dynamic_dump_method_comparator() must be a static function so it can't
>>>>>>>>>>>>        >> use the non-static field easily.
>>>>>>>>>>>> ?????? >
>>>>>>>>>>>>        > It sounds like an issue. _buffer_to_target_delta was made
>>>>>>>>>>>>        > non-static mostly because we might support more than one dynamic
>>>>>>>>>>>>        > archive in the future. However, today's usages bake in an assumption
>>>>>>>>>>>>        > that _buffer_to_target_delta is a singleton value. It is cleaner to
>>>>>>>>>>>>        > either make _buffer_to_target_delta a static variable for now, or
>>>>>>>>>>>>        > add an accessor API in DynamicArchiveBuilder to allow other code to
>>>>>>>>>>>>        > properly and correctly use the value.
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? OK, I'll move it to a static variable.
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? >
>>>>>>>>>>>> ?????? >>> Also, we can do a quick pointer comparison of 
>>>>>>>>>>>> 'a_name' and
>>>>>>>>>>>> ?????? >>> 'b_name' first before adjusting the pointers.
>>>>>>>>>>>> ?????? >> I added this:
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >>?????? if (a_name == b_name) {
>>>>>>>>>>>> ?????? >>???????? return 0;
>>>>>>>>>>>> ?????? >>?????? }
>>>>>>>>>>>> ?????? >>
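As a rough illustration of the early-out discussed here, the sketch below orders two name pointers by address after shifting both by a delta, and returns 0 immediately when the pointers are identical. The function name and the `buffer_to_target_delta` parameter are hypothetical stand-ins rather than the HotSpot code, and applying the same delta to both pointers is an assumption made only for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the comparator discussed above: orders two
// name pointers by the address they would have after adding a delta.
int compare_names_by_target_address(const void* a_name, const void* b_name,
                                    std::intptr_t buffer_to_target_delta) {
  // Quick check first: identical pointers compare equal, so no
  // address adjustment is needed.
  if (a_name == b_name) {
    return 0;
  }
  // Adjust both addresses by the delta using unsigned arithmetic, then
  // compare the adjusted addresses.
  const std::uintptr_t delta = static_cast<std::uintptr_t>(buffer_to_target_delta);
  const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(a_name) + delta;
  const std::uintptr_t b = reinterpret_cast<std::uintptr_t>(b_name) + delta;
  return (a < b) ? -1 : 1;
}
```

The pointer comparison costs one branch and skips the two additions whenever a method list contains repeated names.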
>>>>>>>>>>>> ?????? >>> ---
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> 934 void 
>>>>>>>>>>>> DynamicArchiveBuilder::relocate_buffer_to_target() {
>>>>>>>>>>>> ?????? >>> ...
>>>>>>>>>>>> ?????? >>>??? 944
>>>>>>>>>>>> ?????? >>>??? 945 ArchivePtrMarker::compact(relocatable_base,
>>>>>>>>>>>> ?????? relocatable_end);
>>>>>>>>>>>> ?????? >>> ...
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>>??? 974 SharedDataRelocator 
>>>>>>>>>>>> patcher((address*)patch_base,
>>>>>>>>>>>> ?????? >>> (address*)patch_end, valid_old_base, valid_old_end,
>>>>>>>>>>>> ?????? >>>??? 975? valid_new_base, valid_new_end, addr_delta);
>>>>>>>>>>>> ?????? >>>??? 976 
>>>>>>>>>>>> ArchivePtrMarker::ptrmap()->iterate(&patcher);
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>> Could we reduce the number of data re-iterations to help archive
>>>>>>>>>>>>        >>> dumping performance? The ArchivePtrMarker::compact operation can be
>>>>>>>>>>>>        >>> combined with the patching iteration. The ArchivePtrMarker::compact
>>>>>>>>>>>>        >>> API can be removed.
>>>>>>>>>>>>        >> That's a good idea. I implemented it using a template parameter so that
>>>>>>>>>>>>        >> we can have max performance when relocating the archive at run time.
>>>>>>>>>>>>        >>
>>>>>>>>>>>>        >> I added comments to explain why the relocation is done here. The
>>>>>>>>>>>>        >> relocation is pretty rare (only when the base archive was not mapped at
>>>>>>>>>>>>        >> the default location).
>>>>>>>>>>>> ?????? >>
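To make the patching iteration concrete, here is a minimal sketch of a bitmap-driven relocation pass: every slot whose bit is set has a delta added to it, after a range check on the old value. It is not the SharedDataRelocator implementation. The function name, the use of std::vector, and the choice to leave null slots untouched are all assumptions for illustration, and the sketch shows only the patching step, not the compaction that the review suggests folding into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: walk a pointer bitmap and add `delta` to every marked
// slot. Bit i of `ptrmap` says whether slots[i] holds a pointer to relocate.
// Returns the number of slots that were patched.
std::size_t relocate_marked_slots(std::vector<std::uintptr_t>& slots,
                                  const std::vector<bool>& ptrmap,
                                  std::uintptr_t valid_old_base,
                                  std::uintptr_t valid_old_end,
                                  std::intptr_t delta) {
  assert(ptrmap.size() <= slots.size() && "bitmap must not exceed the slot range");
  std::size_t patched = 0;
  for (std::size_t i = 0; i < ptrmap.size(); ++i) {
    if (!ptrmap[i]) {
      continue;  // not a pointer slot, leave the value alone
    }
    const std::uintptr_t old_ptr = slots[i];
    if (old_ptr == 0) {
      continue;  // assumption: null pointers stay null
    }
    // The old value must lie in the range the archive was written for.
    assert(valid_old_base <= old_ptr && old_ptr < valid_old_end &&
           "pointer outside expected old range");
    slots[i] = old_ptr + static_cast<std::uintptr_t>(delta);
    ++patched;
  }
  return patched;
}
```

The valid_old_base/valid_old_end pair plays the role described in the thread: it states the expectation for what is being patched, and it costs nothing in builds where assertions are compiled out.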
>>>>>>>>>>>> ?????? >>> ---
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>>??? 967???? address valid_new_base =
>>>>>>>>>>>> ?????? >>> (address)Arguments::default_SharedBaseAddress();
>>>>>>>>>>>> ?????? >>>??? 968???? address valid_new_end? = 
>>>>>>>>>>>> valid_new_base +
>>>>>>>>>>>> ?????? base_plus_top_size;
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>> The debugging only code can be included under #ifdef ASSERT.
>>>>>>>>>>>>        >> These values are actually also used in debug logging so they can't be
>>>>>>>>>>>>        >> ifdef'ed out.
>>>>>>>>>>>>        >>
>>>>>>>>>>>>        >> Also, the C++ compiler is pretty good at eliding code that's not
>>>>>>>>>>>>        >> actually used. If I comment out all the logging code in
>>>>>>>>>>>>        >> DynamicArchiveBuilder::relocate_buffer_to_target() and
>>>>>>>>>>>>        >> SharedDataRelocator, gcc elides all the unused fields and their
>>>>>>>>>>>>        >> assignments. So no code is generated for this, etc.
>>>>>>>>>>>>        >>
>>>>>>>>>>>>        >>       address valid_new_base =
>>>>>>>>>>>>        >>         (address)Arguments::default_SharedBaseAddress();
>>>>>>>>>>>>        >>
>>>>>>>>>>>>        >> Since #ifdef ASSERT makes the code harder to read, I think we should use
>>>>>>>>>>>>        >> it only when really necessary.
>>>>>>>>>>>>        > It seems cleaner to get rid of these debugging only variables, by
>>>>>>>>>>>>        > using 'relocatable_base' and
>>>>>>>>>>>>        > '(address)Arguments::default_SharedBaseAddress()' in the logging code.
>>>>>>>>>>>>
>>>>>>>>>>>>        SharedDataRelocator is used under 3 different situations. These six
>>>>>>>>>>>>        variables (patch_base, patch_end, valid_old_base, valid_old_end,
>>>>>>>>>>>>        valid_new_base, valid_new_end) describe what is being patched, and what
>>>>>>>>>>>>        the expectations are, for each situation. The code will be hard to
>>>>>>>>>>>>        understand without them.
>>>>>>>>>>>>
>>>>>>>>>>>>        Please note there's also logging code in the SharedDataRelocator
>>>>>>>>>>>>        constructor that prints out these values.
>>>>>>>>>>>>
>>>>>>>>>>>>        I think I'll just remove the 'debug only' comment to avoid confusion.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Ok.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? >
>>>>>>>>>>>> ?????? >>> ---
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>>??? 993
>>>>>>>>>>>> dynamic_info->write_bitmap_region(ArchivePtrMarker::ptrmap());
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> We could combine the archived heap data bitmap 
>>>>>>>>>>>> into the new
>>>>>>>>>>>> ?????? region as
>>>>>>>>>>>> ?????? >>> well? It can be handled as a separate RFE.
>>>>>>>>>>>> ?????? >> I've filed 
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8233093
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >>> - src/hotspot/share/memory/filemap.cpp
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> 1038???? if (is_static()) {
>>>>>>>>>>>> ?????? >>> 1039?????? if (errno == ENOENT) {
>>>>>>>>>>>> ?????? >>> 1040???????? // Not locating the shared archive 
>>>>>>>>>>>> is ok.
>>>>>>>>>>>> ?????? >>> 1041 fail_continue("Specified shared archive not 
>>>>>>>>>>>> found
>>>>>>>>>>>> ?????? (%s).",
>>>>>>>>>>>> ?????? >>> _full_path);
>>>>>>>>>>>> ?????? >>> 1042?????? } else {
>>>>>>>>>>>> ?????? >>> 1043 fail_continue("Failed to open shared 
>>>>>>>>>>>> archive file
>>>>>>>>>>>> ?????? (%s).",
>>>>>>>>>>>> ?????? >>> 1044 os::strerror(errno));
>>>>>>>>>>>> ?????? >>> 1045?????? }
>>>>>>>>>>>> ?????? >>> 1046???? } else {
>>>>>>>>>>>> ?????? >>> 1047 log_warning(cds, dynamic)("specified 
>>>>>>>>>>>> dynamic archive
>>>>>>>>>>>> ?????? >>> doesn't exist: %s", _full_path);
>>>>>>>>>>>> ?????? >>> 1048???? }
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>> If the top layer is explicitly specified by the user, a warning does
>>>>>>>>>>>>        >>> not seem to be the proper behavior if the VM fails to open the archive
>>>>>>>>>>>>        >>> file.
>>>>>>>>>>>>        >>>
>>>>>>>>>>>>        >>> It might be better to handle the code unrelated to relocation in a
>>>>>>>>>>>>        >>> separate changeset and track it with a separate RFE.
>>>>>>>>>>>> ?????? >> This code was moved from
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/d3382812b788/src/hotspot/share/memory/dynamicArchive.cpp#l1070 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>>        >> so I am not changing the behavior. If you want, we can file an RFE to
>>>>>>>>>>>>        >> change the behavior.
>>>>>>>>>>>>        > Ok. A new RFE sounds like the right thing to re-evaluate the usage
>>>>>>>>>>>>        > issue here. Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? I created 
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8233446
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? >>> ---
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> 1148 void FileMapInfo::write_region(int region, 
>>>>>>>>>>>> char* base,
>>>>>>>>>>>> ?????? size_t size,
>>>>>>>>>>>> ?????? >>> 1149??????????????????????????????? bool 
>>>>>>>>>>>> read_only, bool
>>>>>>>>>>>> ?????? allow_exec) {
>>>>>>>>>>>> ?????? >>> ...
>>>>>>>>>>>> ?????? >>> 1154
>>>>>>>>>>>> ?????? >>> 1155?? if (region == MetaspaceShared::bm) {
>>>>>>>>>>>> ?????? >>> 1156???? target_base = NULL;
>>>>>>>>>>>> ?????? >>> 1157?? } else if (DynamicDumpSharedSpaces) {
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>> It's not too clear to me how the bitmap (bm) region is handled for the
>>>>>>>>>>>>        >>> base layer and top layer. Could you please explain?
>>>>>>>>>>>>        >> The bm region for both layers is mapped at an address picked by the OS:
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >> char* FileMapInfo::map_relocation_bitmap(size_t& 
>>>>>>>>>>>> bitmap_size) {
>>>>>>>>>>>> ?????? >>???? FileMapRegion* si = 
>>>>>>>>>>>> space_at(MetaspaceShared::bm);
>>>>>>>>>>>> ?????? >>???? bitmap_size = si->used_aligned();
>>>>>>>>>>>> ?????? >>???? bool read_only = true, allow_exec = false;
>>>>>>>>>>>> ?????? >>???? char* requested_addr = NULL; // allow OS to 
>>>>>>>>>>>> pick any
>>>>>>>>>>>> location
>>>>>>>>>>>> ?????? >>???? char* bitmap_base = os::map_memory(_fd, 
>>>>>>>>>>>> _full_path,
>>>>>>>>>>>> ?????? si->file_offset(),
>>>>>>>>>>>> ?????? >> requested_addr, bitmap_size,
>>>>>>>>>>>> ?????? >> read_only, allow_exec);
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>>        > Ok, after staring at the code for a few seconds I saw that's intended.
>>>>>>>>>>>>        > If the current region is 'bm', then the 'target_base' is NULL
>>>>>>>>>>>>        > regardless of whether it's a static or dynamic archive. Otherwise, the
>>>>>>>>>>>>        > 'target_base' is handled differently for the static and dynamic case.
>>>>>>>>>>>>        > The following would be cleaner and more reliable.
>>>>>>>>>>>>        >
>>>>>>>>>>>>        >     char* target_base = NULL;
>>>>>>>>>>>>        >
>>>>>>>>>>>>        >     // The target_base is NULL for 'bm' region.
>>>>>>>>>>>>        >     if (region != MetaspaceShared::bm) {
>>>>>>>>>>>>        >       if (DynamicDumpSharedSpaces) {
>>>>>>>>>>>>        >         assert(!HeapShared::is_heap_region(region),
>>>>>>>>>>>>        >                "dynamic archive doesn't support heap regions");
>>>>>>>>>>>>        >         target_base = DynamicArchive::buffer_to_target(base);
>>>>>>>>>>>>        >       } else {
>>>>>>>>>>>>        >         target_base = base;
>>>>>>>>>>>>        >       }
>>>>>>>>>>>>        >     }
>>>>>>>>>>>>
>>>>>>>>>>>>        How about this?
>>>>>>>>>>>>
>>>>>>>>>>>>           char* target_base;
>>>>>>>>>>>>           if (region == MetaspaceShared::bm) {
>>>>>>>>>>>>             target_base = NULL; // always NULL for bm region.
>>>>>>>>>>>>           } else {
>>>>>>>>>>>>             if (DynamicDumpSharedSpaces) {
>>>>>>>>>>>>               assert(!HeapShared::is_heap_region(region),
>>>>>>>>>>>>                      "dynamic archive doesn't support heap regions");
>>>>>>>>>>>>               target_base = DynamicArchive::buffer_to_target(base);
>>>>>>>>>>>>             } else {
>>>>>>>>>>>>               target_base = base;
>>>>>>>>>>>>             }
>>>>>>>>>>>>           }
>>>>>>>>>>>>
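The selection logic being negotiated above can be tried in isolation. The following is only a sketch under stated assumptions: `Region`, `select_target_base`, and the `buffer_to_target_delta` parameter are hypothetical stand-ins for MetaspaceShared's region ids, the body of FileMapInfo::write_region, and DynamicArchive::buffer_to_target respectively.

```cpp
#include <cstddef>

// Hypothetical region ids; only `bm` (the bitmap region) matters here.
enum Region { rw, ro, bm };

// Picks the address recorded for a region when it is written out.
char* select_target_base(Region region, char* base, bool dynamic_dump,
                         std::ptrdiff_t buffer_to_target_delta) {
  if (region == bm) {
    return nullptr;  // the bitmap region never has a requested address
  }
  if (dynamic_dump) {
    // Dynamic dump: translate the buffer address to its target address.
    return base + buffer_to_target_delta;
  }
  return base;  // static dump: the buffer address is the target address
}
```

Returning early for the bitmap region keeps the static/dynamic distinction in one place, which is the property both variants in the thread aim for.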
>>>>>>>>>>>>
>>>>>>>>>>>> No objection if you prefer the extra 'else' block.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? >
>>>>>>>>>>>> ?????? >>> ---
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>> 1362 DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>>>>>>>>>>>>        >>>
>>>>>>>>>>>>        >>> Could you please explain the above?
>>>>>>>>>>>>        >> I added the comments
>>>>>>>>>>>>        >>
>>>>>>>>>>>>        >>     // Make sure we don't attempt to use header()->mapped_base_address() unless
>>>>>>>>>>>>        >>     // it's been successfully mapped.
>>>>>>>>>>>>        >>     DEBUG_ONLY(header()->set_mapped_base_address((char*)(uintptr_t)0xdeadbeef);)
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >>> ---
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> 1359?? FileMapRegion* last_region = NULL;
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> 1371???? if (last_region != NULL) {
>>>>>>>>>>>> ?????? >>> 1372?????? // Ensure that the OS won't be able 
>>>>>>>>>>>> to allocate new
>>>>>>>>>>>> ?????? memory
>>>>>>>>>>>> ?????? >>> spaces between any mapped
>>>>>>>>>>>> ?????? >>> 1373?????? // regions, or else it would mess up 
>>>>>>>>>>>> the simple
>>>>>>>>>>>> ?????? comparision
>>>>>>>>>>>> ?????? >>> in MetaspaceObj::is_shared().
>>>>>>>>>>>> ?????? >>> 1374 assert(si->mapped_base() ==
>>>>>>>>>>>> last_region->mapped_end(),
>>>>>>>>>>>> ?????? >>> "must have no gaps");
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> 1379???? last_region = si;
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>> Can you please place 'last_region' related code under #ifdef ASSERT?
>>>>>>>>>>>>        >> I think that will make the code more cluttered. The compiler will
>>>>>>>>>>>>        >> optimize that away.
>>>>>>>>>>>>        > It's cleaner to define debugging-only variables for debugging-only
>>>>>>>>>>>>        > builds. You can wrap it and related usage with DEBUG_ONLY.
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? OK, will do.
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? >
>>>>>>>>>>>> ?????? >>> ---
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> 1478 char* 
>>>>>>>>>>>> FileMapInfo::map_relocation_bitmap(size_t&
>>>>>>>>>>>> ?????? bitmap_size) {
>>>>>>>>>>>> ?????? >>> 1479?? FileMapRegion* si = 
>>>>>>>>>>>> space_at(MetaspaceShared::bm);
>>>>>>>>>>>> ?????? >>> 1480?? bitmap_size = si->used_aligned();
>>>>>>>>>>>> ?????? >>> 1481?? bool read_only = true, allow_exec = false;
>>>>>>>>>>>> ?????? >>> 1482?? char* requested_addr = NULL; // allow OS 
>>>>>>>>>>>> to pick any
>>>>>>>>>>>> ?????? location
>>>>>>>>>>>> ?????? >>> 1483?? char* bitmap_base = os::map_memory(_fd, 
>>>>>>>>>>>> _full_path,
>>>>>>>>>>>> ?????? si->file_offset(),
>>>>>>>>>>>> ?????? >>> 1484 requested_addr, bitmap_size,
>>>>>>>>>>>> ?????? >>> read_only, allow_exec);
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> We need to handle mapping failure here.
>>>>>>>>>>>> ?????? >> It's handled here:
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >> bool FileMapInfo::relocate_pointers(intx 
>>>>>>>>>>>> addr_delta) {
>>>>>>>>>>>> ?????? >>???? log_debug(cds, reloc)("runtime archive 
>>>>>>>>>>>> relocation start");
>>>>>>>>>>>> ?????? >>???? size_t bitmap_size;
>>>>>>>>>>>> ?????? >>???? char* bitmap_base = 
>>>>>>>>>>>> map_relocation_bitmap(bitmap_size);
>>>>>>>>>>>> ?????? >>???? if (bitmap_base != NULL) {
>>>>>>>>>>>> ?????? >>???? ...
>>>>>>>>>>>> ?????? >>???? } else {
>>>>>>>>>>>> ?????? >>?????? log_error(cds)("failed to map relocation 
>>>>>>>>>>>> bitmap");
>>>>>>>>>>>> ?????? >>?????? return false;
>>>>>>>>>>>> ?????? >>???? }
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>>        > 'bitmap_base' is used immediately after map_memory(). So the check
>>>>>>>>>>>>        > needs to be done immediately after map_memory(), not in the caller
>>>>>>>>>>>>        > of map_relocation_bitmap().
>>>>>>>>>>>>        >
>>>>>>>>>>>>        > 1490   char* bitmap_base = os::map_memory(_fd, _full_path, si->file_offset(),
>>>>>>>>>>>>        > 1491                                      requested_addr, bitmap_size, read_only, allow_exec);
>>>>>>>>>>>>        > 1492
>>>>>>>>>>>>        > 1493   if (VerifySharedSpaces && bitmap_base != NULL && !region_crc_check(bitmap_base, bitmap_size, si->crc())) {
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? OK, I'll fix that.
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? >
>>>>>>>>>>>> ?????? >
>>>>>>>>>>>> ?????? >>> ---
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>> 1513     // debug only -- the current value of the pointers to be patched must be within this
>>>>>>>>>>>>        >>> 1514     // range (i.e., must be between the requested base address, and the end of the current archive).
>>>>>>>>>>>>        >>> 1515     // Note: top archive may point to objects in the base archive, but not the other way around.
>>>>>>>>>>>>        >>> 1516     address valid_old_base = (address)header()->requested_base_address();
>>>>>>>>>>>>        >>> 1517     address valid_old_end  = valid_old_base + mapping_end_offset();
>>>>>>>>>>>>        >>>
>>>>>>>>>>>>        >>> Please place all FileMapInfo::relocate_pointers debugging only code
>>>>>>>>>>>>        >>> under #ifdef ASSERT.
>>>>>>>>>>>>        >> Ditto about ifdef ASSERT
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >>> - src/hotspot/share/memory/heapShared.cpp
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>>??? 441 void
>>>>>>>>>>>> HeapShared::initialize_from_archived_subgraph(Klass* k) {
>>>>>>>>>>>> ?????? >>>??? 442?? if (!open_archive_heap_region_mapped() ||
>>>>>>>>>>>> ?????? !MetaspaceObj::is_shared(k)) {
>>>>>>>>>>>> ?????? >>>??? 443???? return; // nothing to do
>>>>>>>>>>>> ?????? >>>??? 444?? }
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> When do we call 
>>>>>>>>>>>> HeapShared::initialize_from_archived_subgraph
>>>>>>>>>>>> ?????? for a
>>>>>>>>>>>> ?????? >>> klass that's not shared?
>>>>>>>>>>>> ?????? >> I've removed the !MetaspaceObj::is_shared(k). I 
>>>>>>>>>>>> probably added
>>>>>>>>>>>> ?????? that for
>>>>>>>>>>>> ?????? >> debugging purposes only.
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >>>??? 616?? DEBUG_ONLY({
>>>>>>>>>>>> ?????? >>>??? 617?????? Klass* klass = orig_obj->klass();
>>>>>>>>>>>> ?????? >>>??? 618 assert(klass !=
>>>>>>>>>>>> SystemDictionary::Module_klass() &&
>>>>>>>>>>>> ?????? >>>??? 619 klass !=
>>>>>>>>>>>> SystemDictionary::ResolvedMethodName_klass() &&
>>>>>>>>>>>> ?????? >>>??? 620 klass !=
>>>>>>>>>>>> ?????? SystemDictionary::MemberName_klass() &&
>>>>>>>>>>>> ?????? >>>??? 621 klass !=
>>>>>>>>>>>> SystemDictionary::Context_klass() &&
>>>>>>>>>>>> ?????? >>>??? 622 klass !=
>>>>>>>>>>>> SystemDictionary::ClassLoader_klass(), "we
>>>>>>>>>>>> ?????? >>> can only relocate metaspace object pointers inside
>>>>>>>>>>>> java_lang_Class
>>>>>>>>>>>> ?????? >>> instances");
>>>>>>>>>>>> ?????? >>>??? 623???? });
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>> Let's leave the above for a separate RFE. I think assert is not
>>>>>>>>>>>>        >>> sufficient for the check. Also, why can't ResolvedMethodName, Module
>>>>>>>>>>>>        >>> and MemberName be part of the graph?
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >> I added the following comment:
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >>???? DEBUG_ONLY({
>>>>>>>>>>>> ?????? >>???????? // The following are classes in
>>>>>>>>>>>> ?????? share/classfile/javaClasses.cpp
>>>>>>>>>>>> ?????? >> that have injected native pointers
>>>>>>>>>>>> ?????? >>???????? // to metaspace objects. To support these 
>>>>>>>>>>>> classes, we
>>>>>>>>>>>> ?????? need to add
>>>>>>>>>>>> ?????? >> relocation code similar to
>>>>>>>>>>>> ?????? >>???????? //
>>>>>>>>>>>> java_lang_Class::update_archived_mirror_native_pointers.
>>>>>>>>>>>> ?????? >>???????? Klass* klass = orig_obj->klass();
>>>>>>>>>>>> ?????? >>???????? assert(klass != 
>>>>>>>>>>>> SystemDictionary::Module_klass() &&
>>>>>>>>>>>> ?????? >>??????????????? klass !=
>>>>>>>>>>>> SystemDictionary::ResolvedMethodName_klass() &&
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>>        > It's too restrictive to exclude those objects from the archived object
>>>>>>>>>>>>        > graph because of metadata relocation, since metadata relocation is rare.
>>>>>>>>>>>>        > The trade-off doesn't seem to buy us much.
>>>>>>>>>>>>        >
>>>>>>>>>>>>        > Do you plan to add the needed relocation code?
>>>>>>>>>>>>
>>>>>>>>>>>>        I looked more into this. Actually we cannot handle these 5 classes at
>>>>>>>>>>>>        all, even without archive relocation:
>>>>>>>>>>>>
>>>>>>>>>>>>        [1] #define MODULE_INJECTED_FIELDS(macro) \
>>>>>>>>>>>>              macro(java_lang_Module, module_entry, intptr_signature, false)
>>>>>>>>>>>>
>>>>>>>>>>>>        -> module_entry is malloc'ed
>>>>>>>>>>>>
>>>>>>>>>>>>        [2] #define RESOLVEDMETHOD_INJECTED_FIELDS(macro) \
>>>>>>>>>>>>              macro(java_lang_invoke_ResolvedMethodName, vmholder, object_signature, false) \
>>>>>>>>>>>>              macro(java_lang_invoke_ResolvedMethodName, vmtarget, intptr_signature, false)
>>>>>>>>>>>>
>>>>>>>>>>>>        -> these fields are related to method handles and lambda forms, etc.
>>>>>>>>>>>>        They can't easily be archived without implementing lambda form
>>>>>>>>>>>>        archiving. (I did a prototype; it's very complex and fragile).
>>>>>>>>>>>>
>>>>>>>>>>>>        [3] #define CALLSITECONTEXT_INJECTED_FIELDS(macro) \
>>>>>>>>>>>>              macro(java_lang_invoke_MethodHandleNatives_CallSiteContext, vmdependencies, intptr_signature, false) \
>>>>>>>>>>>>              macro(java_lang_invoke_MethodHandleNatives_CallSiteContext, last_cleanup, long_signature, false)
>>>>>>>>>>>>
>>>>>>>>>>>>        -> vmdependencies is malloc'ed.
>>>>>>>>>>>>
>>>>>>>>>>>>        [4] #define MEMBERNAME_INJECTED_FIELDS(macro) \
>>>>>>>>>>>>              macro(java_lang_invoke_MemberName, vmindex, intptr_signature, false)
>>>>>>>>>>>>
>>>>>>>>>>>>        -> this one is probably OK. Despite being declared as
>>>>>>>>>>>>        'intptr_signature', it seems to be used just as an integer. However,
>>>>>>>>>>>>        MemberNames are typically used with [2] and [3]. So let's just forbid it
>>>>>>>>>>>>        to be safe.
>>>>>>>>>>>>
>>>>>>>>>>>>        [2] [3] [4] are not used directly by regular Java code and are unlikely
>>>>>>>>>>>>        to be referenced (directly or indirectly) by static fields (except for
>>>>>>>>>>>>        the static fields in the classes in java.lang.invoke, which we probably
>>>>>>>>>>>>        won't support for heap archiving due to the problem I described for
>>>>>>>>>>>>        [2]). Objects of these types are typically referenced via constant pool
>>>>>>>>>>>>        entries.
>>>>>>>>>>>>
>>>>>>>>>>>>        [5] #define CLASSLOADER_INJECTED_FIELDS(macro) \
>>>>>>>>>>>>              macro(java_lang_ClassLoader, loader_data, intptr_signature, false)
>>>>>>>>>>>>
>>>>>>>>>>>>        -> loader_data is malloc'ed.
>>>>>>>>>>>>
>>>>>>>>>>>>        So, I will change the DEBUG_ONLY into a product-mode check, and quit
>>>>>>>>>>>>        dumping if these objects are found in the object subgraph.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Sounds good. Can you please also add a comment with an explanation?
>>>>>>>>>>>>
>>>>>>>>>>>> For ClassLoader and Module, it is worth considering caching the
>>>>>>>>>>>> additional native data some time in the future. Lois had suggested
>>>>>>>>>>>> the Module part a while ago.
>>>>>>>>>>> I think we can do that if/when we archive Modules directly 
>>>>>>>>>>> into the
>>>>>>>>>>> shared heap.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? Maybe we should backport the check to older versions 
>>>>>>>>>>>> as well?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> We should discuss with Andrew Haley for backports to JDK 11 update
>>>>>>>>>>>> releases. Since the current OpenJDK 11 only applies Java heap
>>>>>>>>>>>> archiving to a restricted set of JDK library code, I think it is
>>>>>>>>>>>> safe without the new check.
>>>>>>>>>>>>
>>>>>>>>>>>> For non-LTS releases, it might not be worthwhile as they may not be
>>>>>>>>>>>> widely used?
>>>>>>>>>>> I agree. FYI, we (Oracle) have no plan for backporting more types of
>>>>>>>>>>> heap object archiving, so the decision would be up to whoever
>>>>>>>>>>> decides to do so.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> - Ioi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Jiangli
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? >
>>>>>>>>>>>> ?????? >>> - src/hotspot/share/memory/metaspace.cpp
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> 1036?? metaspace_rs =
>>>>>>>>>>>> ReservedSpace(compressed_class_space_size(),
>>>>>>>>>>>> ?????? >>> 1037 _reserve_alignment,
>>>>>>>>>>>> ?????? >>> 1038?? large_pages,
>>>>>>>>>>>> ?????? >>> 1039?? requested_addr);
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> Please fix indentation.
>>>>>>>>>>>> ?????? >> Fixed.
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >>> - src/hotspot/share/memory/metaspaceClosure.hpp
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>>???? 78?? enum SpecialRef {
>>>>>>>>>>>> ?????? >>>???? 79 _method_entry_ref
>>>>>>>>>>>> ?????? >>>???? 80?? };
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>> Are there other pointers that are not references to MetaspaceObj? If
>>>>>>>>>>>>        >>> _method_entry_ref is the only type, it's probably not worth defining
>>>>>>>>>>>>        >>> SpecialRef?
>>>>>>>>>>>>        >> There may be more types in the future, so I want to have a stable API
>>>>>>>>>>>>        >> that can be easily expanded without touching all the code that uses it.
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >>> - src/hotspot/share/memory/metaspaceShared.hpp
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>>???? 42 enum MapArchiveResult {
>>>>>>>>>>>> ?????? >>>???? 43 MAP_ARCHIVE_SUCCESS,
>>>>>>>>>>>> ?????? >>>???? 44 MAP_ARCHIVE_MMAP_FAILURE,
>>>>>>>>>>>> ?????? >>>???? 45 MAP_ARCHIVE_OTHER_FAILURE
>>>>>>>>>>>> ?????? >>>???? 46 };
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>> If we want to define different failure types, it's probably worth
>>>>>>>>>>>>        >>> using separate types for relocation failure and validation failure.
>>>>>>>>>>>>        >> For now, I just need to distinguish between MMAP_FAILURE (where I should
>>>>>>>>>>>>        >> attempt to remap at an alternative address) and OTHER_FAILURE (where the
>>>>>>>>>>>>        >> CDS archive loading will fail -- due to validation error, insufficient
>>>>>>>>>>>>        >> memory, etc -- without attempting to remap.)
>>>>>>>>>>>> ?????? >>
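The distinction between the two failure kinds can be sketched as a small retry driver. The enum values are the ones quoted above; `map_with_fallback` and the `try_map` callback are hypothetical names introduced for this sketch, and the real mapping code is not shown in this thread.

```cpp
enum MapArchiveResult {
  MAP_ARCHIVE_SUCCESS,
  MAP_ARCHIVE_MMAP_FAILURE,
  MAP_ARCHIVE_OTHER_FAILURE
};

// Hypothetical driver: try_map(true) attempts the requested address,
// try_map(false) lets the OS choose an alternative address.
template <typename TryMap>
MapArchiveResult map_with_fallback(TryMap try_map) {
  MapArchiveResult result = try_map(/*use_requested_address=*/true);
  if (result == MAP_ARCHIVE_MMAP_FAILURE) {
    // Only an mmap failure is worth a second attempt elsewhere.
    result = try_map(/*use_requested_address=*/false);
  }
  // Any other failure (validation error, insufficient memory, ...) is
  // returned as-is without remapping.
  return result;
}
```

With only two failure values the caller needs a single comparison to decide whether to retry, which matches the stated reason for not splitting the failure types further for now.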
>>>>>>>>>>>> ?????? >>> ---
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>>??? 193?? static intx _mapping_delta; // FIXME 
>>>>>>>>>>>> rename
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> How about _relocation_delta?
>>>>>>>>>>>> ?????? >> Changed as suggested.
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >>> - src/hotspot/share/oops/instanceKlass
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> 1573 bool 
>>>>>>>>>>>> InstanceKlass::_disable_method_binary_search = false;
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>> The use of _disable_method_binary_search is not necessary. You can use
>>>>>>>>>>>>        >>> DynamicDumpSharedSpaces for the purpose. That would make things
>>>>>>>>>>>>        >>> cleaner.
>>>>>>>>>>>>        >> If we always disable the binary search when DynamicDumpSharedSpaces is
>>>>>>>>>>>>        >> true, it will slow down normal execution of the Java program when
>>>>>>>>>>>>        >> -XX:ArchiveClassesAtExit has been specified, but the program hasn't exited.
>>>>>>>>>>>>        > Could you please add some comments to _disable_method_binary_search
>>>>>>>>>>>>        > with the above explanation? Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? OK
>>>>>>>>>>>> ?????? >
>>>>>>>>>>>> ?????? >>> - 
>>>>>>>>>>>> test/hotspot/jtreg/runtime/cds/SpaceUtilizationCheck.java
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>> 76                     if (name.equals("s0") || name.equals("s1")) {
>>>>>>>>>>>>        >>> 77                       // String regions are listed at the end and they may not be fully occupied.
>>>>>>>>>>>>        >>> 78                       break;
>>>>>>>>>>>>        >>> 79                     } else if (name.equals("bm")) {
>>>>>>>>>>>>        >>> 80                       // Bitmap space does not have a requested address.
>>>>>>>>>>>>        >>> 81                       break;
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> It's not part of your change, but could you 
>>>>>>>>>>>> please fix line 76
>>>>>>>>>>>> ?????? - 78
>>>>>>>>>>>> ?????? >>> since it is trivial. It seems the lines can be 
>>>>>>>>>>>> removed.
>>>>>>>>>>>> ?????? >> Removed.
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >>> - /src/hotspot/share/memory/archiveUtils.hpp
>>>>>>>>>>>> ?????? >>> The file name does not match with the macro 
>>>>>>>>>>>> '#ifndef
>>>>>>>>>>>> ?????? >>> SHARE_MEMORY_SHAREDDATARELOCATOR_HPP'. Could you 
>>>>>>>>>>>> please rename
>>>>>>>>>>>> ?????? >>> archiveUtils.* ? archiveRelocator.hpp and
>>>>>>>>>>>> archiveRelocator.cpp are
>>>>>>>>>>>> ?????? >>> more descriptive.
>>>>>>>>>>>> ?????? >> I named the file archiveUtils.hpp so we can move 
>>>>>>>>>>>> other misc
>>>>>>>>>>>> ?????? stuff used
>>>>>>>>>>>> ?????? >> by dumping into this file (e.g., DumpRegion, 
>>>>>>>>>>>> WriteClosure from
>>>>>>>>>>>> ?????? >> metaspaceShared.hpp), since theses are not used 
>>>>>>>>>>>> by the majority
>>>>>>>>>>>> ?????? of the
>>>>>>>>>>>> ?????? >> files that use metaspaceShared.hpp.
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >> I fixed the ifdef.
>>>>>>>>>>>> ?????? >>
>>>>>>>>>>>> ?????? >>> - src/hotspot/share/memory/archiveUtils.cpp
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>>        >>>     36 void ArchivePtrMarker::initialize(CHeapBitMap* ptrmap, address* ptr_base, address* ptr_end) {
>>>>>>>>>>>>        >>>     37   assert(_ptrmap == NULL, "initialize only once");
>>>>>>>>>>>>        >>>     38   _ptr_base = ptr_base;
>>>>>>>>>>>>        >>>     39   _ptr_end = ptr_end;
>>>>>>>>>>>>        >>>     40   _compacted = false;
>>>>>>>>>>>>        >>>     41   _ptrmap = ptrmap;
>>>>>>>>>>>>        >>>     42   _ptrmap->initialize(12 * M / sizeof(intptr_t)); // default archive is about 12MB.
>>>>>>>>>>>>        >>>     43 }
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> Could we do a better estimate here? We could 
>>>>>>>>>>>> guesstimate the
>>>>>>>>>>>> size
>>>>>>>>>>>> ?????? >>> based on the current used class space and 
>>>>>>>>>>>> metaspace size. It's
>>>>>>>>>>>> ?????? okay if
>>>>>>>>>>>> ?????? >>> a larger bitmap used, since it can be reduced 
>>>>>>>>>>>> after all
>>>>>>>>>>>> ?????? marking are
>>>>>>>>>>>> ?????? >>> done.
>>>>>>>>>>>> ?????? >> The bitmap is automatically expanded when 
>>>>>>>>>>>> necessary in
>>>>>>>>>>>> ?????? >> ArchivePtrMarker::mark_pointer(). It's only about 
>>>>>>>>>>>> 1/32 or 1/64
>>>>>>>>>>>> ?????? of the
>>>>>>>>>>>> ?????? >> total archive size, so even if we do expand, the 
>>>>>>>>>>>> cost will be
>>>>>>>>>>>> ?????? trivial.
>>>>>>>>>>>> ?????? > The initial value is based on the default CDS 
>>>>>>>>>>>> archive. When
>>>>>>>>>>>> dealing
>>>>>>>>>>>> ?????? > with a really large archive, it would have to 
>>>>>>>>>>>> re-grow many times.
>>>>>>>>>>>> ?????? > Also, using a hard-coded value is less desirable.
>>>>>>>>>>>>
>>>>>>>>>>>>        OK, I changed it to the following
>>>>>>>>>>>>
>>>>>>>>>>>>           // Use this as initial guesstimate. We should need less space in the
>>>>>>>>>>>>           // archive, but if we're wrong the bitmap will be expanded automatically.
>>>>>>>>>>>>           size_t estimated_archive_size = MetaspaceGC::capacity_until_GC();
>>>>>>>>>>>>           // But set it smaller in debug builds so we always test the expansion code.
>>>>>>>>>>>>           // (Default archive is about 12MB).
>>>>>>>>>>>>           DEBUG_ONLY(estimated_archive_size = 6 * M);
>>>>>>>>>>>>
>>>>>>>>>>>>           // We need one bit per pointer in the archive.
>>>>>>>>>>>>           _ptrmap->initialize(estimated_archive_size / sizeof(intptr_t));
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? Thanks!
>>>>>>>>>>>> ?????? - Ioi
>>>>>>>>>>>>
>>>>>>>>>>>> ?????? >
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>>
>>>>>>>>>>>> ?????? >>> On Wed, Oct 16, 2019 at 4:58 PM Jiangli Zhou
>>>>>>>>>>>> ?????? <jianglizhou at google.com 
>>>>>>>>>>>> <mailto:jianglizhou at google.com>> wrote:
>>>>>>>>>>>> ?????? >>>> Hi Ioi,
>>>>>>>>>>>> ?????? >>>>
>>>>>>>>>>>> ?????? >>>> This is another great step for CDS usability 
>>>>>>>>>>>> improvement.
>>>>>>>>>>>> ?????? Thank you!
>>>>>>>>>>>> ?????? >>>>
>>>>>>>>>>>> ?????? >>>> I have a high level question (or request): 
>>>>>>>>>>>> could we consider
>>>>>>>>>>>> ?????? >>>> separating the relocation work for 'direct' 
>>>>>>>>>>>> class metadata
>>>>>>>>>>>> ?????? from other
>>>>>>>>>>>> ?????? >>>> types of metadata (such as the shared system 
>>>>>>>>>>>> dictionary,
>>>>>>>>>>>> ?????? symbol table,
>>>>>>>>>>>> ?????? >>>> etc)? Initially we only relocate the tables and 
>>>>>>>>>>>> other
>>>>>>>>>>>> ?????? archived global
>>>>>>>>>>>> ?????? >>>> data. When each archived class is being loaded, 
>>>>>>>>>>>> we can
>>>>>>>>>>>> ?????? relocate all
>>>>>>>>>>>> ?????? >>>> the pointers within the current class. We could 
>>>>>>>>>>>> find the
>>>>>>>>>>>> ?????? segment (for
>>>>>>>>>>>> ?????? >>>> the current class) in the bitmap and update the 
>>>>>>>>>>>> pointers
>>>>>>>>>>>> ?????? within the
>>>>>>>>>>>> ?????? >>>> segment. That way we can reduce initial startup 
>>>>>>>>>>>> costs and
>>>>>>>>>>>> ?????? also avoid
>>>>>>>>>>>> ?????? >>>> relocating class data that's not used at 
>>>>>>>>>>>> runtime. In some
>>>>>>>>>>>> ?????? real world
>>>>>>>>>>>> ?????? >>>> large systems, an archive may contain extremely 
>>>>>>>>>>>> large
>>>>>>>>>>>> number of
>>>>>>>>>>>> ?????? >>>> classes.
>>>>>>>>>>>> ?????? >>>>
>>>>>>>>>>>> ?????? >>>> Following are partial review comments so we can 
>>>>>>>>>>>> move things
>>>>>>>>>>>> ?????? forward.
>>>>>>>>>>>> ?????? >>>> Still going through the rest of the changes.
>>>>>>>>>>>> ?????? >>>>
>>>>>>>>>>>> ?????? >>>> - src/hotspot/share/classfile/javaClasses.cpp
>>>>>>>>>>>> ?????? >>>>
>>>>>>>>>>>>        >>>> 1218 void java_lang_Class::update_archived_mirror_native_pointers(oop archived_mirror) {
>>>>>>>>>>>>        >>>> 1219   Klass* k = ((Klass*)archived_mirror->metadata_field(_klass_offset));
>>>>>>>>>>>>        >>>> 1220   if (k != NULL) { // k is NULL for the primitive classes such as java.lang.Byte::TYPE <<<<<<<<<<<
>>>>>>>>>>>>        >>>> 1221     archived_mirror->metadata_field_put(_klass_offset, (Klass*)(address(k) + MetaspaceShared::mapping_delta()));
>>>>>>>>>>>>        >>>> 1222   }
>>>>>>>>>>>>        >>>> 1223 ...
>>>>>>>>>>>> ?????? >>>>
>>>>>>>>>>>> ?????? >>>> Primitive type mirrors are handled separately. 
>>>>>>>>>>>> Could you
>>>>>>>>>>>> ?????? please verify
>>>>>>>>>>>> ?????? >>>> if this call path happens for primitive type 
>>>>>>>>>>>> mirror?
>>>>>>>>>>>> ?????? >>>>
>>>>>>>>>>>> ?????? >>>> To answer my question above, looks like you 
>>>>>>>>>>>> added the
>>>>>>>>>>>> ?????? following, which
>>>>>>>>>>>> ?????? >>>> is to be used for primitive type mirrors. That 
>>>>>>>>>>>> seems to be
>>>>>>>>>>>> ?????? the reason
>>>>>>>>>>>> ?????? >>>> why update_archived_mirror_native_pointers is 
>>>>>>>>>>>> trying to also
>>>>>>>>>>>> ?????? cover
>>>>>>>>>>>> ?????? >>>> primitive type. It better to have a separate 
>>>>>>>>>>>> API for
>>>>>>>>>>>> ?????? primitive type
>>>>>>>>>>>> ?????? >>>> mirror, which is cleaner. And, we also can 
>>>>>>>>>>>> replace the above
>>>>>>>>>>>> ?????? check at
>>>>>>>>>>>> ?????? >>>> line 1220 to be an assert for regular mirrors.
>>>>>>>>>>>> ?????? >>>>
>>>>>>>>>>>>        >>>> +void ReadClosure::do_mirror_oop(oop *p) {
>>>>>>>>>>>>        >>>> +  do_oop(p);
>>>>>>>>>>>>        >>>> +  oop mirror = *p;
>>>>>>>>>>>>        >>>> +  if (mirror != NULL) {
>>>>>>>>>>>>        >>>> +    java_lang_Class::update_archived_mirror_native_pointers(mirror);
>>>>>>>>>>>>        >>>> +  }
>>>>>>>>>>>>        >>>> +}
>>>>>>>>>>>>        >>>> +
>>>>>>>>>>>> ?????? >>>>
>>>>>>>>>>>> ?????? >>>> How about renaming 
>>>>>>>>>>>> update_archived_mirror_native_pointers to
>>>>>>>>>>>> ?????? >>>> update_archived_mirror_klass_pointers.
>>>>>>>>>>>> ?????? >>>>
>>>>>>>>>>>> ?????? >>>> It would be good to pass the current klass as 
>>>>>>>>>>>> an argument.
>>>>>>>>>>>> We can
>>>>>>>>>>>> ?????? >>>> verify the relocated pointer matches with the 
>>>>>>>>>>>> current klass
>>>>>>>>>>>> ?????? pointer.
>>>>>>>>>>>> ?????? >>>>
>>>>>>>>>>>> ?????? >>>> We should also check if relocation is necessary 
>>>>>>>>>>>> before
>>>>>>>>>>>> ?????? spending cycles
>>>>>>>>>>>> ?????? >>>> to obtain the
>>>
>
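
The bitmap sizing discussed in the quoted thread above (one bit per pointer-sized slot, so roughly 1/32 or 1/64 of the archive size, with automatic expansion if the estimate is too small) can be illustrated with a minimal standalone sketch. This is not the ArchivePtrMarker code from the webrev: the class PtrBitmap and its methods are hypothetical names, and std::vector<bool> stands in for HotSpot's CHeapBitMap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical growable bitmap: one bit per pointer-sized slot in an archive.
// std::vector<bool> is used here only as a stand-in for a real bitmap class.
class PtrBitmap {
  std::vector<bool> _bits;

 public:
  // Initial size is only an estimate: one bit for every sizeof(intptr_t) bytes.
  explicit PtrBitmap(size_t estimated_archive_bytes)
    : _bits(estimated_archive_bytes / sizeof(intptr_t), false) {}

  // Mark the slot at byte_offset, doubling the bitmap until the slot fits.
  void mark(size_t byte_offset) {
    size_t idx = byte_offset / sizeof(intptr_t);
    if (idx >= _bits.size()) {
      size_t new_size = _bits.empty() ? 1 : _bits.size();
      while (new_size <= idx) {
        new_size *= 2;
      }
      _bits.resize(new_size, false);
    }
    _bits[idx] = true;
  }

  // Slots beyond the current size were never marked.
  bool is_marked(size_t byte_offset) const {
    size_t idx = byte_offset / sizeof(intptr_t);
    return idx < _bits.size() && _bits[idx];
  }

  size_t size_in_bits() const { return _bits.size(); }
};
```

With doubling growth, an underestimate costs only a logarithmic number of resizes, which is one way to read the point that a wrong initial guess is cheap.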


From jianglizhou at google.com  Wed Nov 13 17:23:30 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Wed, 13 Nov 2019 09:23:30 -0800
Subject: RFR: JDK-8230413: Support Pre JDK 6 class with CDS
In-Reply-To: <fd3d57f2-6f1b-7b3d-0601-6f2c36182559@oracle.com>
References: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>
 <c69bf94f-e52d-f7cd-e9f7-252a71204f7b@oracle.com>
 <CALrW1jyog+Q5H-BXQxQEkPdtLhr_aB7ekOVzuq9Qi4qLQ-jqOg@mail.gmail.com>
 <f2b33c85-e811-48f5-ed82-6ee398ea1935@oracle.com>
 <c3a4ff42-e3b5-98a9-138c-358d14ac1ada@oracle.com>
 <c322d943-8c6a-5c6e-345f-af4c902d4a57@oracle.com>
 <CALrW1jyqUuZNjmS0UZyiR-ZQfZVVy38opvSCaMotwTzY0i+S1A@mail.gmail.com>
 <fd3d57f2-6f1b-7b3d-0601-6f2c36182559@oracle.com>
Message-ID: <CALrW1jx1uv=PU3Y2bfhOLM3ibCcQQ4pt1z=TiC45S2Mprcr0gg@mail.gmail.com>

Hi David,

On Tue, Nov 12, 2019 at 10:00 PM David Holmes <david.holmes at oracle.com> wrote:
>
> Hi Jiangli,
>
> On 13/11/2019 12:20 pm, Jiangli Zhou wrote:
> > Hi Harold and Ioi,
> >
> > Thanks a lot for the additional feedback.
> >
> > I did some quick research today about -Xverify:none usages. My finding
> > showed that the use of -Xverify:none is not very uncommon in some
> > cases. Here are some of the usages:
> >
> > - trusted tools
>
> But what is the context? Is it:
>
> "I trust this tool, and all other classes, so I'll optimize by disabling
> verification,"; or

This is the case. For a tool that's developed by a user and properly
compiled by javac, the user may want to disable class verification when
running the tool.

>
> "This tool produces non-verifiable classfiles, but I trust the tool and
> so will disable verification" (which implicitly means all
> classes/libraries have to be fully trusted)
>
> ?
>
> I'm not sure you can use any existing uses of -Xverify:none to infer the
> applicability or not to what is being proposed here for CDS.

In the above example, CDS dump time forces verification for the tool's
classes as long as they are placed in the -cp path. Without CDS involved,
the user's choice is honored. I feel this usage may be a lurking issue when
more users start to use CDS/AppCDS.

Harold, Ioi and I have been discussing pre-jdk-6 verification off the
mailing list, since verification is security related and may be
sensitive. I'll loop you in. It's possible we may be able to separate
the pre-jdk-6 class problem from the general CDS -Xverify:none topics.

>
> > - some limited testing environment
> >
> > CDS (particularly with dynamic archiving capability) may help avoid
> > runtime verification overhead by verifying classes at dump time and
> > reduce the needs for -Xverify:none. It would be good to have
> > strategies for the following scenarios as well when removing
> > -Xverify:none:
> >
> > 1) In cases when shared archive is disabled at runtime (I hope it's
> > not common cases)
>
> I'm not quite sure what you are saying here. If a pre-verified archive
> can't be used at runtime then normal verification should occur as
> classes are not being loaded from a known pre-verified location.

CDS/AppCDS are not widely adopted yet. When users start to learn
more about CDS/AppCDS capability, they may still choose to not use the
feature based on their specific requirements. For example, a user may
choose to not use AppCDS and also turn off the default CDS.

>
> > 2) When users want to reduce the overhead caused by verification
> > during archiving dump time
>
> I would not expect dumping to be such a time critical activity that
> users would care about the "overhead" of verification.

With dynamic archiving, dump time performance can be more important to users.

Best,

Jiangli

>
> Cheers,
> David
>
> > Thoughts?
> >
> > Best,
> > Jiangli
> >
> > On Tue, Nov 12, 2019 at 4:16 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >>
> >> I am also a little worried that this might send the wrong message -- "if
> >> you want to archive pre-JDK6 classes, you need to disable verification
> >> altogether for all classes in your entire app".
> >>
> >> Thanks
> >> - Ioi
> >>
> >> On 11/12/19 12:40 PM, Harold Seigel wrote:
> >>> Hi Jiangli,
> >>>
> >>> I think this change is going in the wrong direction.  We are trying to
> >>> discourage disabling verification, not encourage it.  We also do not
> >>> want to create more use-cases for preserving -Xverify:none.
> >>>
> >>> It looks like your change would allow archiving of unverified pre-JDK6
> >>> classes, but not allow archiving of verified pre-JDK6 classes.  If so,
> >>> that seems backward.
> >>>
> >>> Thanks, Harold
> >>>
> >>> On 11/11/2019 11:53 PM, Ioi Lam wrote:
> >>>> I wonder if there's a safer alternative. Are there tools that can add
> >>>> stackmaps to pre-JDK6 classes? That way they can be verified with the
> >>>> split verifier during CDS dump time.
> >>>>
> >>>> Thanks
> >>>> - Ioi
> >>>>
> >>>> On 11/11/19 4:25 PM, Jiangli Zhou wrote:
> >>>>> Hi David,
> >>>>>
> >>>>> Thanks for quick response!
> >>>>>
> >>>>> On Mon, Nov 11, 2019 at 3:12 PM David Holmes
> >>>>> <david.holmes at oracle.com> wrote:
> >>>>>> Hi Jiangli,
> >>>>>>
> >>>>>> On 12/11/2019 8:12 am, Jiangli Zhou wrote:
> >>>>>>> Please review the following change that allows archiving
> >>>>>>> pre-JAVA_6_VERSION classes with -Xverify:none.
> >>>>>>>
> >>>>>>> webrev: http://cr.openjdk.java.net/~jiangli/8230413/webrev.00/
> >>>>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8230413
> >>>>>>>
> >>>>>>> Currently there are still large number of existing classes
> >>>>>>> (pre-built)
> >>>>>>> with older class versions (< 50) in real world applications. Those
> >>>>>>> classes are missing the benefit of archiving. Particularly, in some
> >>>>>>> use cases, class verification can be safely disabled. For those use
> >>>>>>> cases, supporting archiving pre JDK 6 classes shows good performance
> >>>>>>> benefit. We can re-evaluate this support when -Xverify:none is
> >>>>>>> removed
> >>>>>>> in the future, hopefully the needs for supporting class version < 50
> >>>>>>> is no longer significant at that time.
> >>>>>>>
> >>>>>>> This change brings back the pre-JDK-8198849 behavior. Runtime makes
> >>>>>>> sure the dump-time verification mode must be the same or stronger
> >>>>>>> than
> >>>>>>> the current mode.
> >>>>>>>
> >>>>>>> A CSR may be needed for the change. Any thoughts on that?
> >>>>>> A CSR request is definitely required given that you are proposing to
> >>>>>> undo a change that was itself put in place via a CSR request! And
> >>>>>> given
> >>>>>> this is relaxing a "defense-in-depth" check which will result in
> >>>>>> increasing exploitability, I think you will need a very strong
> >>>>>> argument
> >>>>>> to justify this.
> >>>>> Thanks for confirming this! Will do.
> >>>>>
> >>>>>> Further this not only undoes JDK-8197972 but it also invalidates
> >>>>>> JDK-8155671 being closed as a duplicate of JDK-8197972. JDK-8155671
> >>>>>> requested a way to know if verification had been disabled, to help
> >>>>>> with
> >>>>>> analyzing crash reports, but instead we decided to not allow
> >>>>>> verification to be disabled.
> >>>>> I had some concerns about JDK-8155671 initially before making the
> >>>>> change, as it's a closed bug and my memory about the specific issue
> >>>>> was flushed out. I brought up the question in the bug. My take on
> >>>>> Ioi's response to my query about JDK-8155671 was that the
> >>>>> pre-JDK-8197972 behavior would not cause any security hole.
> >>>>>
> >>>>> Re-evaluating this particular behavior, I think the pre-JDK-8155671
> >>>>> would actually matches user intention better. If user decides to turn
> >>>>> off verification in safe use cases, it seems to be a good idea to
> >>>>> honor that. With the new dynamic archiving capability, archive could
> >>>>> be created at the first time when running a particular application.
> >>>>> Not forcing verification when user decides to can avoid
> >>>>> unnecessary/unwanted overhead.
> >>>>>
> >>>>> If verification is turned off at dump time for application classes,
> >>>>> runtime does not allow execution without also turning off
> >>>>> verification. We can determine a crash is not caused by relaxed dump
> >>>>> time verification.
> >>>>>
> >>>>> Regards,
> >>>>> Jiangli
> >>>>>
> >>>>>> David
> >>>>>> -----
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> Tested with jtreg appcds tests.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jiangli
> >>>>>>>
> >>>>
> >>
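
The rule quoted above from the original RFR ("Runtime makes sure the dump-time verification mode must be the same or stronger than the current mode") amounts to an ordering check. A minimal sketch follows, assuming three strictness levels ordered weakest to strongest; the enum VerifyMode and the function archive_usable are hypothetical names for illustration and are not taken from the webrev or from HotSpot.

```cpp
#include <cassert>

// Hypothetical strictness levels, weakest first (an assumption for this sketch).
enum class VerifyMode { None = 0, Remote = 1, All = 2 };

// An archive dumped under dump_mode is acceptable at runtime only if it was
// verified at least as strictly as the mode the runtime is asking for.
constexpr bool archive_usable(VerifyMode dump_mode, VerifyMode runtime_mode) {
  return static_cast<int>(dump_mode) >= static_cast<int>(runtime_mode);
}
```

Under this reading, an archive dumped with verification off is only usable when the runtime also has verification off, which matches the statement that runtime does not allow execution without also turning off verification.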

From thomas.stuefe at gmail.com  Wed Nov 13 17:34:00 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Wed, 13 Nov 2019 18:34:00 +0100
Subject: [11u] RFR 8221539: [metaspace] Improve
 MetaspaceObj::is_metaspace_obj() and friends
Message-ID: <CAA-vtUxfM6BJ8W_+nH3xPrHBix4omkONceGxM02sJ9Bk74zQEQ@mail.gmail.com>

Hi,

may I have a review for a backport please.

Original bug:
https://bugs.openjdk.java.net/browse/JDK-8221539
http://hg.openjdk.java.net/jdk/jdk/rev/2ae93028bef3

11u webrev:
http://cr.openjdk.java.net/~stuefe/webrevs/11u-8221539/webrev.00/webrev/

Original patch does not apply cleanly to 11u because it relies on older
changes in metaspace verification coding.

It would need 8218988 "Improve metaspace verifications" to apply cleanly.

That one would need 8177710 "Convert TestMetaspaceUtils_test to GTest".

Which has a number of problems and leads down a rabbit hole of unnecessary
and potentially dangerous test changes I do not really want to backport.

I had to make a cut somewhere, so I made one below this change: 8221539.

The fix is simply to call the older version of VirtualSpaceNode::verify()
which did not yet take a boolean parameter. It's all good.

diff -r 77e43317f4f7 -r 22efab1c724c src/hotspot/share/memory/metaspace/virtualSpaceList.cpp
--- a/src/hotspot/share/memory/metaspace/virtualSpaceList.cpp   Wed Mar 27 14:13:34 2019 +0100
+++ b/src/hotspot/share/memory/metaspace/virtualSpaceList.cpp   Wed Nov 13 18:15:55 2019 +0100
@@ -410,7 +410,7 @@
   while (iter.repeat()) {
     VirtualSpaceNode* node = iter.get_next();
     if (slow) {
-      node->verify(true);
+      node->verify();
     }
     // Check that the node resides fully within our envelope.
     assert((address)node->low_boundary() >= _envelope_lo && (address)node->high_boundary() <= _envelope_hi,


Thank you,

Thomas

From daniel.daugherty at oracle.com  Wed Nov 13 20:05:10 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Wed, 13 Nov 2019 15:05:10 -0500
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <7dedd65e-f004-b0b4-0c04-63bc484198ac@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
 <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
 <590a7395-8fda-caf3-402a-29393fd34469@oracle.com>
 <7dedd65e-f004-b0b4-0c04-63bc484198ac@oracle.com>
Message-ID: <40599ebe-5486-96e4-bf4a-c1c4b163cba2@oracle.com>

Greetings,

I'm only going to jump in on one more single item here. This one is
more difficult than the last one since I have to do more snipping
to tie together the contexts here...


On 11/11/19 7:41 AM, Robbin Ehn wrote:
> Hi David,
>
> On 2019-11-11 11:52, David Holmes wrote:
>
>> For example you say:
>>
>>  > 242   jint l_ref_count = ref_count();
>>  > 243   ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d, ref_count=%d", l_ref_count, ref_count());
>>  > Please use Atomic::load() in ref_count.
>>
<snip>
> Arguably the above should be written as:
> jint l_ref_count = ref_count(); // Atomic::load()
> if (l_ref_count > 0) {
>     OrderAccess::loadload();
>     ADIM_guarantee(l_ref_count > 0, "must be positive: l_ref_count=%d, ref_count=%d", l_ref_count, ref_count());
> }
>
> But since _ref_count could have been changed many times before the 
> second load I didn't see the point of printing the same value again.

Two things about this sub-thread:

- The reason for the second "ref_count=%d" is additional info
  in the case of a failure. If the l_ref_count value fails the
  check and the second printing shows a value that matches the
  condition, then I have one more tidbit of info about the race.
  And I get that information without looking at a core file in a
  debugger which doesn't always work.
- The above rewrite is not equivalent logic because this:

      if (l_ref_count > 0) {

  will prevent ADIM_guarantee(l_ref_count > 0, ...) from ever failing.

My plan is to update the ref_count() function to do an
Atomic::load() of the _ref_count field and I think that will
address the sub-thread to everyone's satisfaction.
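
A minimal standalone sketch of the two points above, using std::atomic in place of HotSpot's Atomic::load() and a plain stderr message in place of ADIM_guarantee. The names RefCounted, check_unguarded and check_guarded are hypothetical and are not from the webrev.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for an object whose reference count can change
// concurrently. ref_count() does one atomic load per call.
struct RefCounted {
  std::atomic<int> _ref_count{0};
  int ref_count() const { return _ref_count.load(std::memory_order_relaxed); }
};

// Unguarded form: check the snapshot and, on failure, also print a fresh
// re-read. If the two values differ, that is a hint about the race.
bool check_unguarded(const RefCounted& obj) {
  int l_ref_count = obj.ref_count();
  if (!(l_ref_count > 0)) {
    std::fprintf(stderr, "must be positive: l_ref_count=%d, ref_count=%d\n",
                 l_ref_count, obj.ref_count());
    return false;
  }
  return true;
}

// Guarded form from the proposed rewrite: the inner check is unreachable
// with a bad value, so a non-positive count is silently accepted.
bool check_guarded(const RefCounted& obj) {
  int l_ref_count = obj.ref_count();
  if (l_ref_count > 0) {
    assert(l_ref_count > 0);  // can never fire inside this branch
  }
  return true;
}
```

With a count of zero, check_unguarded reports the failure while check_guarded returns as if nothing were wrong, which is the non-equivalence described above.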

Dan


From manc at google.com  Thu Nov 14 01:00:21 2019
From: manc at google.com (Man Cao)
Date: Wed, 13 Nov 2019 17:00:21 -0800
Subject: RFR (XS): 8234127: BasicHashtable does not support small table_size
Message-ID: <CA+w6HxbgH12gwMmT_P9_f36JH=C8denWPPjanToirE=-SStWMw@mail.gmail.com>

Hi all,

Can I have reviews for this small bug fix?
Webrev: https://cr.openjdk.java.net/~manc/8234127/webrev.00/
Bug: https://bugs.openjdk.java.net/browse/JDK-8234127

I'm trying to make use of KVHashtable in JDK-8087198 and encountered this
bug.

-Man

From fujie at loongson.cn  Thu Nov 14 02:09:20 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 14 Nov 2019 10:09:20 +0800
Subject: RFR(XS): 8234130: Zero VM build broken after 8233913
Message-ID: <00217ce4-cbc0-ed9e-b891-dddb4f2ec340@loongson.cn>

Hi all,

May I get reviews for the small fix?

JBS:    https://bugs.openjdk.java.net/browse/JDK-8234130
Webrev: http://cr.openjdk.java.net/~jiefu/8234130/webrev.00/

Thanks a lot.
Best regards,
Jie


From coleen.phillimore at oracle.com  Thu Nov 14 02:13:05 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Wed, 13 Nov 2019 21:13:05 -0500
Subject: RFR(XS): 8234130: Zero VM build broken after 8233913
In-Reply-To: <00217ce4-cbc0-ed9e-b891-dddb4f2ec340@loongson.cn>
References: <00217ce4-cbc0-ed9e-b891-dddb4f2ec340@loongson.cn>
Message-ID: <339b4fe0-3483-9224-806f-e3462fe5dd4b@oracle.com>


Looks good + trivial.  Do you need a sponsor?
thanks,
Coleen

On 11/13/19 9:09 PM, Jie Fu wrote:
> Hi all,
>
> May I get reviews for the small fix?
>
> JBS:    https://bugs.openjdk.java.net/browse/JDK-8234130
> Webrev: http://cr.openjdk.java.net/~jiefu/8234130/webrev.00/
>
> Thanks a lot.
> Best regards,
> Jie
>
>


From fujie at loongson.cn  Thu Nov 14 02:16:23 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 14 Nov 2019 10:16:23 +0800
Subject: RFR(XS): 8234130: Zero VM build broken after 8233913
In-Reply-To: <339b4fe0-3483-9224-806f-e3462fe5dd4b@oracle.com>
References: <00217ce4-cbc0-ed9e-b891-dddb4f2ec340@loongson.cn>
 <339b4fe0-3483-9224-806f-e3462fe5dd4b@oracle.com>
Message-ID: <0e7371b5-5340-e561-3d09-1c2e5b53fe29@loongson.cn>

Thank you so much, Coleen.

Yes, I need a sponsor. Thanks.

Best regards,
Jie

On 2019/11/14 10:13, coleen.phillimore at oracle.com wrote:
>
> Looks good + trivial.  Do you need a sponsor?
> thanks,
> Coleen
>
> On 11/13/19 9:09 PM, Jie Fu wrote:
>> Hi all,
>>
>> May I get reviews for the small fix?
>>
>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8234130
>> Webrev: http://cr.openjdk.java.net/~jiefu/8234130/webrev.00/
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>>
>


From coleen.phillimore at oracle.com  Thu Nov 14 02:22:12 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Wed, 13 Nov 2019 21:22:12 -0500
Subject: RFR(XS): 8234130: Zero VM build broken after 8233913
In-Reply-To: <0e7371b5-5340-e561-3d09-1c2e5b53fe29@loongson.cn>
References: <00217ce4-cbc0-ed9e-b891-dddb4f2ec340@loongson.cn>
 <339b4fe0-3483-9224-806f-e3462fe5dd4b@oracle.com>
 <0e7371b5-5340-e561-3d09-1c2e5b53fe29@loongson.cn>
Message-ID: <cd6aaa3d-1ae4-4460-34c2-8d52263b1a45@oracle.com>

Done!
thanks for fixing it so quickly.
Coleen

On 11/13/19 9:16 PM, Jie Fu wrote:
> Thank you so much, Coleen.
>
> Yes, I need a sponsor. Thanks.
>
> Best regards,
> Jie
>
> On 2019/11/14 10:13, coleen.phillimore at oracle.com wrote:
>>
>> Looks good + trivial.  Do you need a sponsor?
>> thanks,
>> Coleen
>>
>> On 11/13/19 9:09 PM, Jie Fu wrote:
>>> Hi all,
>>>
>>> May I get reviews for the small fix?
>>>
>>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8234130
>>> Webrev: http://cr.openjdk.java.net/~jiefu/8234130/webrev.00/
>>>
>>> Thanks a lot.
>>> Best regards,
>>> Jie
>>>
>>>
>>
>


From ioi.lam at oracle.com  Thu Nov 14 03:52:04 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 13 Nov 2019 19:52:04 -0800
Subject: RFR(T): 8234133 VM build broken due to memory/archiveUtils.inline.hpp
Message-ID: <29647327-3980-c717-d104-faa04f441a0a@oracle.com>

Sorry I forgot to add this file:

http://cr.openjdk.java.net/~iklam/jdk14/8234133-missing-archiveUtils.inline.hpp/

Thanks
- Ioi

From ioi.lam at oracle.com  Thu Nov 14 03:53:44 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 13 Nov 2019 19:53:44 -0800
Subject: RFR(T): 8234133 VM build broken due to
 memory/archiveUtils.inline.hpp
In-Reply-To: <9e357836-3857-b6bf-5c5e-f946744d87ef@oracle.com>
References: <29647327-3980-c717-d104-faa04f441a0a@oracle.com>
 <9e357836-3857-b6bf-5c5e-f946744d87ef@oracle.com>
Message-ID: <218bbc85-f8f5-de1b-f1a5-e11607815fc6@oracle.com>

Thanks David and Jie for spotting this. Pushing now.

On 11/13/19 7:53 PM, David Holmes wrote:
> Looks good and trivial. Please push.
>
> Thanks,
> David
>
> On 14/11/2019 1:52 pm, Ioi Lam wrote:
>> Sorry I forgot to add this file:
>>
>> http://cr.openjdk.java.net/~iklam/jdk14/8234133-missing-archiveUtils.inline.hpp/ 
>>
>>
>> Thanks
>> - Ioi


From david.holmes at oracle.com  Thu Nov 14 03:53:15 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 14 Nov 2019 13:53:15 +1000
Subject: RFR(T): 8234133 VM build broken due to
 memory/archiveUtils.inline.hpp
In-Reply-To: <29647327-3980-c717-d104-faa04f441a0a@oracle.com>
References: <29647327-3980-c717-d104-faa04f441a0a@oracle.com>
Message-ID: <9e357836-3857-b6bf-5c5e-f946744d87ef@oracle.com>

Looks good and trivial. Please push.

Thanks,
David

On 14/11/2019 1:52 pm, Ioi Lam wrote:
> Sorry I forgot to add this file:
> 
> http://cr.openjdk.java.net/~iklam/jdk14/8234133-missing-archiveUtils.inline.hpp/ 
> 
> 
> Thanks
> - Ioi

From robbin.ehn at oracle.com  Thu Nov 14 07:37:29 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 14 Nov 2019 08:37:29 +0100
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <40599ebe-5486-96e4-bf4a-c1c4b163cba2@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
 <172b7c1a-e707-02a9-cc96-4c788b759d72@oracle.com>
 <590a7395-8fda-caf3-402a-29393fd34469@oracle.com>
 <7dedd65e-f004-b0b4-0c04-63bc484198ac@oracle.com>
 <40599ebe-5486-96e4-bf4a-c1c4b163cba2@oracle.com>
Message-ID: <77365bdf-d98b-b7c7-4211-7516e9f1c51c@oracle.com>

Hi Dan,

On 2019-11-13 21:05, Daniel D. Daugherty wrote:
>> if (l_ref_count > 0) {
...
> - The above rewrite is not equivalent logic because this:
> 
>       if (l_ref_count > 0) {
> 
>   will prevent ADIM_guarantee(l_ref_count > 0, ...) from ever failing.

Yes, that one should be reversed.

> 
> My plan is to update the ref_count() function to do an
> Atomic::load() of the _ref_count field and I think that will
> address the sub-thread to everyone's satisfaction.

Ok!

Thanks, Robbin

> 
> Dan
> 

From martin.doerr at sap.com  Thu Nov 14 11:36:25 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 14 Nov 2019 11:36:25 +0000
Subject: [11u] RFR 8221539: [metaspace] Improve
 MetaspaceObj::is_metaspace_obj() and friends
In-Reply-To: <CAA-vtUxfM6BJ8W_+nH3xPrHBix4omkONceGxM02sJ9Bk74zQEQ@mail.gmail.com>
References: <CAA-vtUxfM6BJ8W_+nH3xPrHBix4omkONceGxM02sJ9Bk74zQEQ@mail.gmail.com>
Message-ID: <VI1PR0201MB24792B5D32C023CE746CEA3D9A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Thomas,

thanks for backporting it. Change looks good.

Best regards,
Martin


> -----Original Message-----
> From: jdk-updates-dev <jdk-updates-dev-bounces at openjdk.java.net> On Behalf Of Thomas Stüfe
> Sent: Wednesday, 13 November 2019 18:34
> To: Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>; jdk-updates-dev at openjdk.java.net
> Subject: [11u] RFR 8221539: [metaspace] Improve MetaspaceObj::is_metaspace_obj() and friends
> 
> Hi,
> 
> may I have a review for a backport please.
> 
> Original bug:
> https://bugs.openjdk.java.net/browse/JDK-8221539
> http://hg.openjdk.java.net/jdk/jdk/rev/2ae93028bef3
> 
> 11u webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/11u-8221539/webrev.00/webrev/
> 
> Original patch does not apply cleanly to 11u because it relies on older
> changes in metaspace verification coding.
> 
> It would need 8218988 "Improve metaspace verifications" to apply cleanly.
> 
> That one would need 8177710 "Convert TestMetaspaceUtils_test to GTest".
> 
> Which has a number of problems and leads down a rabbit hole of unnecessary
> and potentially dangerous test changes I do not really want to backport.
> 
> I had to make a cut somewhere, so I made one below this change: 8221539.
> 
> The fix is simply to call the older version of VirtualSpaceNode::verify()
> which did not yet take a boolean parameter. It's all good.
> 
> diff -r 77e43317f4f7 -r 22efab1c724c src/hotspot/share/memory/metaspace/virtualSpaceList.cpp
> --- a/src/hotspot/share/memory/metaspace/virtualSpaceList.cpp   Wed Mar 27 14:13:34 2019 +0100
> +++ b/src/hotspot/share/memory/metaspace/virtualSpaceList.cpp   Wed Nov 13 18:15:55 2019 +0100
> @@ -410,7 +410,7 @@
>    while (iter.repeat()) {
>      VirtualSpaceNode* node = iter.get_next();
>      if (slow) {
> -      node->verify(true);
> +      node->verify();
>      }
>      // Check that the node resides fully within our envelope.
>      assert((address)node->low_boundary() >= _envelope_lo && (address)node->high_boundary() <= _envelope_hi,
> 
> 
> Thank you,
> 
> Thomas

From thomas.stuefe at gmail.com  Thu Nov 14 11:40:36 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Thu, 14 Nov 2019 12:40:36 +0100
Subject: [11u] RFR 8221539: [metaspace] Improve
 MetaspaceObj::is_metaspace_obj() and friends
In-Reply-To: <VI1PR0201MB24792B5D32C023CE746CEA3D9A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <CAA-vtUxfM6BJ8W_+nH3xPrHBix4omkONceGxM02sJ9Bk74zQEQ@mail.gmail.com>
 <VI1PR0201MB24792B5D32C023CE746CEA3D9A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <CAA-vtUzAqU6ibxoqS=vbZq2K4+c_WWd9mL1Q-mf228F_HfVxBg@mail.gmail.com>

Thank you Martin!

On Thu, Nov 14, 2019 at 12:36 PM Doerr, Martin <martin.doerr at sap.com> wrote:

> Hi Thomas,
>
> thanks for backporting it. Change looks good.
>
> Best regards,
> Martin
>
>
> > -----Original Message-----
> > From: jdk-updates-dev <jdk-updates-dev-bounces at openjdk.java.net> On Behalf Of Thomas Stüfe
> > Sent: Wednesday, 13 November 2019 18:34
> > To: Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>; jdk-updates-dev at openjdk.java.net
> > Subject: [11u] RFR 8221539: [metaspace] Improve
> > MetaspaceObj::is_metaspace_obj() and friends
> >
> > Hi,
> >
> > may I have a review for a backport please.
> >
> > Original bug:
> > https://bugs.openjdk.java.net/browse/JDK-8221539
> > http://hg.openjdk.java.net/jdk/jdk/rev/2ae93028bef3
> >
> > 11u webrev:
> > http://cr.openjdk.java.net/~stuefe/webrevs/11u-8221539/webrev.00/webrev/
> >
> > Original patch does not apply cleanly to 11u because it relies on older
> > changes in metaspace verification coding.
> >
> > It would need 8218988 "Improve metaspace verifications" to apply cleanly.
> >
> > That one would need 8177710 "Convert TestMetaspaceUtils_test to GTest".
> >
> > Which has a number of problems and leads down a rabbit hole of unnecessary
> > and potentially dangerous test changes I do not really want to backport.
> >
> > I had to make a cut somewhere, so I made one below this change: 8221539.
> >
> > The fix is simply to call the older version of VirtualSpaceNode::verify()
> > which did not yet take a boolean parameter. It's all good.
> >
> > diff -r 77e43317f4f7 -r 22efab1c724c src/hotspot/share/memory/metaspace/virtualSpaceList.cpp
> > --- a/src/hotspot/share/memory/metaspace/virtualSpaceList.cpp   Wed Mar 27 14:13:34 2019 +0100
> > +++ b/src/hotspot/share/memory/metaspace/virtualSpaceList.cpp   Wed Nov 13 18:15:55 2019 +0100
> > @@ -410,7 +410,7 @@
> >    while (iter.repeat()) {
> >      VirtualSpaceNode* node = iter.get_next();
> >      if (slow) {
> > -      node->verify(true);
> > +      node->verify();
> >      }
> >      // Check that the node resides fully within our envelope.
> >      assert((address)node->low_boundary() >= _envelope_lo && (address)node->high_boundary() <= _envelope_hi,
> >
> >
> > Thank you,
> >
> > Thomas
>

From christoph.langer at sap.com  Thu Nov 14 15:37:21 2019
From: christoph.langer at sap.com (Langer, Christoph)
Date: Thu, 14 Nov 2019 15:37:21 +0000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between libjava, 
 hotspot and libinstrument
Message-ID: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>

Hi,

please review this cleanup change regarding function "canonicalize" of libjava.

Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/


The goal is to clean up how this function is defined and used. First, there was an unnecessary wrapper function "Canonicalize" in jni_util.c that merely wrapped the call to "canonicalize"; we can get rid of this wrapper. Unfortunately, it is not possible to simply export "canonicalize", since that would conflict with a function signature from the math library, at least on modern Linux systems. So I decided to name the function JDK_Canonicalize and declare it properly in jdk_util.h, which can be included everywhere.


Hotspot's classloader.cpp will dynamically resolve this method, so I add a local declaration of the function pointer in there.
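
To illustrate the idea of a locally declared function pointer that is resolved by name, here is a minimal, self-contained sketch. Everything in it is an assumption for illustration: the signature given to the pointer type, the stand-in function demo_canonicalize, and the lookup_symbol helper (which only mimics a dynamic symbol lookup with a small table) are hypothetical and are not taken from the webrev.

```cpp
#include <cstring>
#include <map>
#include <string>

// Hypothetical signature, assumed only for this sketch.
typedef int (*canonicalize_fn_t)(const char* orig, char* out, int len);

// Stand-in for the function exported by the library: it just copies
// the input path if it fits into the output buffer.
extern "C" int demo_canonicalize(const char* orig, char* out, int len) {
  if (orig == nullptr || out == nullptr || len <= 0) return -1;
  if (std::strlen(orig) >= static_cast<std::size_t>(len)) return -1;
  std::strcpy(out, orig);
  return 0;
}

// Hypothetical stand-in for a dynamic symbol lookup. A real VM would ask
// the platform's dynamic loader; here a table maps names to addresses.
static void* lookup_symbol(const std::string& name) {
  static const std::map<std::string, void*> table = {
    {"JDK_Canonicalize", reinterpret_cast<void*>(&demo_canonicalize)}
  };
  auto it = table.find(name);
  return it == table.end() ? nullptr : it->second;
}

// Resolve the entry point once and cache it in a local function pointer.
// Callers must handle a null result if the symbol is missing.
static canonicalize_fn_t resolve_canonicalize() {
  static canonicalize_fn_t fn =
      reinterpret_cast<canonicalize_fn_t>(lookup_symbol("JDK_Canonicalize"));
  return fn;
}
```

A caller would obtain the pointer via resolve_canonicalize(), check it against nullptr, and then invoke it like an ordinary function.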


This change is intended as a predecessor of JDK-8223261, for which a review has already been started here: https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-November/063398.html

Thanks
Christoph


From serguei.spitsyn at oracle.com  Thu Nov 14 15:55:34 2019
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Thu, 14 Nov 2019 07:55:34 -0800
Subject: RFR: 8233549: Thread interrupted state must only be accessed when
 not in a safepoint-safe state
In-Reply-To: <fe942c6d-4db1-2166-739f-702c1f6cb2ca@oracle.com>
References: <fe942c6d-4db1-2166-739f-702c1f6cb2ca@oracle.com>
Message-ID: <d6f497ea-81cc-54d9-d25b-22345040933b@oracle.com>

Hi David,

Just wanted to let you know I'm reviewing this.

Thanks,
Serguei


On 11/11/19 20:52, David Holmes wrote:
> webrev: http://cr.openjdk.java.net/~dholmes/8233549/webrev/
> bug: https://bugs.openjdk.java.net/browse/JDK-8233549
>
> In JDK-8229516 I moved the interrupted state of a thread from the 
> osThread in the VM to the java.lang.Thread instance. In doing that I 
> overlooked a critical aspect, which is that to access the field of a 
> Java object the JavaThread must not be in a safepoint-safe state** - 
> otherwise the oop, and anything referenced there from could be 
> relocated by the GC whilst the JavaThread is accessing it. This 
> manifested in a number of tests using JVM TI Agent threads and JVM TI 
> RawMonitors because the JavaThreads were marked _thread_blocked and
> hence safepoint-safe, and we read a non-zero value for the interrupted 
> field even though we had never been interrupted.
>
> This problem existed in all the code that checks for interruption when 
> "waiting":
>
> - Parker::park (the code underpinning 
> java.util.concurrent.LockSupport.park())
>
> To fix this code I simply deleted a late check of the interrupted 
> field. The check was not needed because if an interrupt has occurred 
> then we will find the ParkEvent in a signalled state.
>
> - ObjectMonitor::wait
>
> Here the late check of the interrupted state is essential as we reset 
> the ParkEvent after an earlier check of the interrupted state. But the 
> fix was simply achieved by moving the check slightly earlier before we 
> use ThreadBlockInVm to become _thread_blocked.
>
> - RawMonitor::wait
>
> This fix was much more involved. The RawMonitor code directly 
> transitions the JavaThread from _thread_in_Native to _thread_blocked. 
> This is safe from a safepoint perspective because they are equivalent 
> safepoint-safe states. To allow access to the interrupted field I have 
> to transition from native to _thread_in_vm, and that has to be done by 
> proper thread-state transitions to ensure correct access to the oop 
> and its fields. Having done that I can then use ThreadBlockInVM for 
> the transitions to blocked. However, as the old code noted it can't 
> use proper thread-state transitions as this will lead to deadlocks 
> with the VMThread that can also use RawMonitors when executing various 
> event callbacks. To deal with that we have to note that the real 
> constraint is that the JavaThread cannot block at a safepoint whilst 
> it holds the RawMonitor. Hence the fix was to push all the interrupt
> checking code and the thread-state transitions to the lowest level of 
> RawMonitorWait, around the final park() call, after we have enqueued 
> the waiter and released the monitor. That avoids any deadlock 
> possibility.
>
> I also added checks to is_interrupted/interrupted to ensure they are 
> only called by a thread in a suitable state. This should only be the 
> VMThread (as a consequence of the Thread.stop implementation occurring 
> at a safepoint and issuing a JavaThread::interrupt() call to unblock 
> the target); or a JavaThread that is not _thread_in_native or 
> _thread_blocked.
>
> Testing: (still finalizing)
>  - tiers 1 - 6 (Oracle platforms)
>  - Local Linux testing
>    - vmTestbase/nsk/monitoring/
>    - vmTestbase/nsk/jdwp
>    - vmTestbase/nsk/jdb/
>    - vmTestbase/nsk/jdi/
>    - vmTestbase/nsk/jvmti/
>    - serviceability/jvmti/
>    - serviceability/jdwp
>    - JDK: java/lang/management
>           com/sun/management
>
> ** Note that this applies to all accesses we make via code in 
> javaClasses.*. For this particular code I thought about adding a guard 
> in JavaThread::threadObj() but it turns out when we generate a crash 
> report we access the Thread's name() field and that can happen when in 
> any state, so we'd always trigger a secondary assertion failure during 
> error reporting if we did that. Note that accessing name() can still 
> easily lead to secondary assertion failures as I discovered when
> trying to debug this and print the thread name out - I would see an 
> is_instance assertion fail checking that the Thread name() is an 
> instance of java.lang.String!
>
> Thanks,
> David
> -----


From martin.doerr at sap.com  Thu Nov 14 16:55:21 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 14 Nov 2019 16:55:21 +0000
Subject: RFR(T): 8234188: AIX build broken after 8220310
Message-ID: <VI1PR0201MB24798B3FEB2C206978AA59409A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi,

can somebody please review this trivial AIX build fix?

http://cr.openjdk.java.net/~mdoerr/8234188_fix_aix_build/webrev.00/

Best regards,
Martin


From harold.seigel at oracle.com  Thu Nov 14 17:04:09 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Thu, 14 Nov 2019 12:04:09 -0500
Subject: RFR(T): 8234188: AIX build broken after 8220310
In-Reply-To: <VI1PR0201MB24798B3FEB2C206978AA59409A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24798B3FEB2C206978AA59409A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <9b637531-72f7-3ea8-d059-2a481b7b4afb@oracle.com>

Hi Martin,

The change looks good and trivial.

Thanks, Harold

On 11/14/2019 11:55 AM, Doerr, Martin wrote:
> Hi,
>
> can somebody please review this trivial AIX build fix?
>
> http://cr.openjdk.java.net/~mdoerr/8234188_fix_aix_build/webrev.00/
>
> Best regards,
> Martin
>

From mikhailo.seledtsov at oracle.com  Thu Nov 14 18:38:52 2019
From: mikhailo.seledtsov at oracle.com (Mikhailo Seledtsov)
Date: Thu, 14 Nov 2019 10:38:52 -0800
Subject: RFR(T): 8232244: [TESTBUG] Incorrect comment in
 TestClassUnloadEvent.java
Message-ID: <5DCD9F3C.5060809@oracle.com>

Please review this change that removes an incorrect comment. Both 
statements in the comment are incorrect, hence removing the comment.


JBS: https://bugs.openjdk.java.net/browse/JDK-8232244

Change:
   --- a/test/jdk/jdk/jfr/event/runtime/TestClassUnloadEvent.java
   +++ b/test/jdk/jdk/jfr/event/runtime/TestClassUnloadEvent.java
   @@ -47,12 +47,6 @@
   * @run main/othervm -Xlog:class+unload -Xlog:gc -Xmx16m jdk.jfr.event.runtime.TestClassUnloadEvent
   */

   -/**
   - * System.gc() will trigger class unloading if -XX:+ExplicitGCInvokesConcurrent is NOT set.
   - * If this flag is set G1 will never unload classes on System.gc().
   - * As far as the "jfr" key guarantees no VM flags are set from the outside
   - * it should be enough with System.gc().
   - */


Testing:
    1. Ran the updated test: PASS


Thank you,
Misha


From coleen.phillimore at oracle.com  Thu Nov 14 18:50:37 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Thu, 14 Nov 2019 13:50:37 -0500
Subject: RFR (XS): 8234127: BasicHashtable does not support small
 table_size
In-Reply-To: <CA+w6HxbgH12gwMmT_P9_f36JH=C8denWPPjanToirE=-SStWMw@mail.gmail.com>
References: <CA+w6HxbgH12gwMmT_P9_f36JH=C8denWPPjanToirE=-SStWMw@mail.gmail.com>
Message-ID: <375ff2a4-25b0-267c-4e48-f572347a72c0@oracle.com>


This fix seems fine, but having a hashtable with a starting length 1 
seems silly.  Unless I'm reading this wrong.  As Ioi wrote in his 
comment, there might be a better hashtable for your work.

Thanks,
Coleen

On 11/13/19 8:00 PM, Man Cao wrote:
> Hi all,
>
> Can I have reviews for this small bug fix?
> Webrev: https://cr.openjdk.java.net/~manc/8234127/webrev.00/
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234127
>
> I'm trying to make use of KVHashtable in JDK-8087198 and encountered this
> bug.
>
> -Man


From igor.ignatyev at oracle.com  Thu Nov 14 18:54:23 2019
From: igor.ignatyev at oracle.com (Igor Ignatev)
Date: Thu, 14 Nov 2019 10:54:23 -0800
Subject: RFR(T): 8232244: [TESTBUG] Incorrect comment in
 TestClassUnloadEvent.java
In-Reply-To: <5DCD9F3C.5060809@oracle.com>
References: <5DCD9F3C.5060809@oracle.com>
Message-ID: <24694DF5-9CB6-4A2B-9526-C7EDA69663A7@oracle.com>

LGTM

- Igor

> On Nov 14, 2019, at 10:36 AM, Mikhailo Seledtsov <mikhailo.seledtsov at oracle.com> wrote:
> 
> Please review this change that removes an incorrect comment. Both statements in the comment are incorrect, hence removing the comment.
> 
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8232244
> 
> Change:
>  --- a/test/jdk/jdk/jfr/event/runtime/TestClassUnloadEvent.java
>  +++ b/test/jdk/jdk/jfr/event/runtime/TestClassUnloadEvent.java
>  @@ -47,12 +47,6 @@
>  * @run main/othervm -Xlog:class+unload -Xlog:gc -Xmx16m jdk.jfr.event.runtime.TestClassUnloadEvent
>  */
> 
>  -/**
>  - * System.gc() will trigger class unloading if -XX:+ExplicitGCInvokesConcurrent is NOT set.
>  - * If this flag is set G1 will never unload classes on System.gc().
>  - * As far as the "jfr" key guarantees no VM flags are set from the outside
>  - * it should be enough with System.gc().
>  - */
> 
> 
> Testing:
>   1. Ran the updated test: PASS
> 
> 
> Thank you,
> Misha
> 


From mikhailo.seledtsov at oracle.com  Thu Nov 14 19:17:31 2019
From: mikhailo.seledtsov at oracle.com (Mikhailo Seledtsov)
Date: Thu, 14 Nov 2019 11:17:31 -0800
Subject: RFR(T): 8232244: [TESTBUG] Incorrect comment in
 TestClassUnloadEvent.java
In-Reply-To: <24694DF5-9CB6-4A2B-9526-C7EDA69663A7@oracle.com>
References: <5DCD9F3C.5060809@oracle.com>
 <24694DF5-9CB6-4A2B-9526-C7EDA69663A7@oracle.com>
Message-ID: <5DCDA84B.5060905@oracle.com>

Thanks Igor, pushed.

On 11/14/19, 10:54 AM, Igor Ignatev wrote:
> LGTM
>
> - Igor
>
>> On Nov 14, 2019, at 10:36 AM, Mikhailo Seledtsov<mikhailo.seledtsov at oracle.com>  wrote:
>>
>> Please review this change that removes an incorrect comment. Both statements in the comment are incorrect, hence removing the comment.
>>
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8232244
>>
>> Change:
>>   --- a/test/jdk/jdk/jfr/event/runtime/TestClassUnloadEvent.java
>>   +++ b/test/jdk/jdk/jfr/event/runtime/TestClassUnloadEvent.java
>>   @@ -47,12 +47,6 @@
>>   * @run main/othervm -Xlog:class+unload -Xlog:gc -Xmx16m jdk.jfr.event.runtime.TestClassUnloadEvent
>>   */
>>
>>   -/**
>>   - * System.gc() will trigger class unloading if -XX:+ExplicitGCInvokesConcurrent is NOT set.
>>   - * If this flag is set G1 will never unload classes on System.gc().
>>   - * As far as the "jfr" key guarantees no VM flags are set from the outside
>>   - * it should be enough with System.gc().
>>   - */
>>
>>
>> Testing:
>>    1. Ran the updated test: PASS
>>
>>
>> Thank you,
>> Misha
>>

From manc at google.com  Thu Nov 14 19:18:42 2019
From: manc at google.com (Man Cao)
Date: Thu, 14 Nov 2019 11:18:42 -0800
Subject: RFR (XS): 8234127: BasicHashtable does not support small
 table_size
In-Reply-To: <375ff2a4-25b0-267c-4e48-f572347a72c0@oracle.com>
References: <CA+w6HxbgH12gwMmT_P9_f36JH=C8denWPPjanToirE=-SStWMw@mail.gmail.com>
 <375ff2a4-25b0-267c-4e48-f572347a72c0@oracle.com>
Message-ID: <CA+w6HxbJUR5W-yA6Owr0iSV=fGxHLaMd0vWntLFX7nPCBocaaw@mail.gmail.com>

Thanks for the review. Yes, I will try using ResourceHashtable in new code.
The BasicHashtable does not work with size 2 and 3, either. In my use case,
the initial size is based on a JVM flag (G1UpdateBufferSize), so it is
dependent on user input.

-Man


On Thu, Nov 14, 2019 at 10:52 AM <coleen.phillimore at oracle.com> wrote:

>
> This fix seems fine, but having a hashtable with a starting length 1
> seems silly.   Unless I'm reading this wrong.   As Ioi wrote in his
> comment, there might be a better hashtable for your work.
>
> Thanks,
> Coleen
>
> On 11/13/19 8:00 PM, Man Cao wrote:
> > Hi all,
> >
> > Can I have reviews for this small bug fix?
> > Webrev: https://cr.openjdk.java.net/~manc/8234127/webrev.00/
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8234127
> >
> > I'm trying to make use of KVHashtable in JDK-8087198 and encountered this
> > bug.
> >
> > -Man
>
>

From jianglizhou at google.com  Thu Nov 14 21:20:08 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Thu, 14 Nov 2019 13:20:08 -0800
Subject: RFR (XS): 8234127: BasicHashtable does not support small
 table_size
In-Reply-To: <CA+w6HxbJUR5W-yA6Owr0iSV=fGxHLaMd0vWntLFX7nPCBocaaw@mail.gmail.com>
References: <CA+w6HxbgH12gwMmT_P9_f36JH=C8denWPPjanToirE=-SStWMw@mail.gmail.com>
 <375ff2a4-25b0-267c-4e48-f572347a72c0@oracle.com>
 <CA+w6HxbJUR5W-yA6Owr0iSV=fGxHLaMd0vWntLFX7nPCBocaaw@mail.gmail.com>
Message-ID: <CALrW1jxo-Bj3ZiQK4J6-c=zFHNsGAREX2=rpXyP53F_9HFEhug@mail.gmail.com>

Hi Man,

Just took a look. Looks fine to me as well.

It's not directly related, but I'm wondering why such a small hashtable
is needed in your use case.

Thanks,
Jiangli

On Thu, Nov 14, 2019 at 11:19 AM Man Cao <manc at google.com> wrote:
>
> Thanks for the review. Yes, I will try using ResourceHashtable in new code.
> The BasicHashtable does not work with size 2 and 3, either. In my use case,
> the initial size is based on a JVM flag (G1UpdateBufferSize), so it is
> dependent on user input.
>
> -Man
>
>
> On Thu, Nov 14, 2019 at 10:52 AM <coleen.phillimore at oracle.com> wrote:
>
> >
> > This fix seems fine, but having a hashtable with a starting length 1
> > seems silly.   Unless I'm reading this wrong.   As Ioi wrote in his
> > comment, there might be a better hashtable for your work.
> >
> > Thanks,
> > Coleen
> >
> > On 11/13/19 8:00 PM, Man Cao wrote:
> > > Hi all,
> > >
> > > Can I have reviews for this small bug fix?
> > > Webrev: https://cr.openjdk.java.net/~manc/8234127/webrev.00/
> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234127
> > >
> > > I'm trying to make use of KVHashtable in JDK-8087198 and encountered this
> > > bug.
> > >
> > > -Man
> >
> >

From manc at google.com  Thu Nov 14 21:47:52 2019
From: manc at google.com (Man Cao)
Date: Thu, 14 Nov 2019 13:47:52 -0800
Subject: RFR (XS): 8234127: BasicHashtable does not support small
 table_size
In-Reply-To: <CALrW1jxo-Bj3ZiQK4J6-c=zFHNsGAREX2=rpXyP53F_9HFEhug@mail.gmail.com>
References: <CA+w6HxbgH12gwMmT_P9_f36JH=C8denWPPjanToirE=-SStWMw@mail.gmail.com>
 <375ff2a4-25b0-267c-4e48-f572347a72c0@oracle.com>
 <CA+w6HxbJUR5W-yA6Owr0iSV=fGxHLaMd0vWntLFX7nPCBocaaw@mail.gmail.com>
 <CALrW1jxo-Bj3ZiQK4J6-c=zFHNsGAREX2=rpXyP53F_9HFEhug@mail.gmail.com>
Message-ID: <CA+w6HxY_VL+7NXV0RKJbcRqC0=8OyzAU9+4pP6ej_e0iCAYd_w@mail.gmail.com>

Thanks for the review.

> It's not directly related, I'm wondering why such small hashtable is
> needed in your use case.
The test G1AddMetaspaceDependency.java sets -XX:G1UpdateBufferSize=1, which
would create a size-one hashtable with my pending change for JDK-8087198.
Real users should probably not set G1UpdateBufferSize so small, unless they
want to stress G1's code for concurrent refinement.

-Man


On Thu, Nov 14, 2019 at 1:20 PM Jiangli Zhou <jianglizhou at google.com> wrote:

> Hi Man,
>
> Just took a look. Looks fine to me as well.
>
> It's not directly related, I'm wondering why such small hashtable is
> needed in your use case.
>
> Thanks,
> Jiangli
>
> On Thu, Nov 14, 2019 at 11:19 AM Man Cao <manc at google.com> wrote:
> >
> > Thanks for the review. Yes, I will try using ResourceHashtable in new
> code.
> > The BasicHashtable does not work with size 2 and 3, either. In my use
> case,
> > the initial size is based on a JVM flag (G1UpdateBufferSize), so it is
> > dependent on user input.
> >
> > -Man
> >
> >
> > On Thu, Nov 14, 2019 at 10:52 AM <coleen.phillimore at oracle.com> wrote:
> >
> > >
> > > This fix seems fine, but having a hashtable with a starting length 1
> > > seems silly.   Unless I'm reading this wrong.   As Ioi wrote in his
> > > comment, there might be a better hashtable for your work.
> > >
> > > Thanks,
> > > Coleen
> > >
> > > On 11/13/19 8:00 PM, Man Cao wrote:
> > > > Hi all,
> > > >
> > > > Can I have reviews for this small bug fix?
> > > > Webrev: https://cr.openjdk.java.net/~manc/8234127/webrev.00/
> > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234127
> > > >
> > > > I'm trying to make use of KVHashtable in JDK-8087198 and encountered
> this
> > > > bug.
> > > >
> > > > -Man
> > >
> > >
>

From ioi.lam at oracle.com  Thu Nov 14 21:53:38 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 14 Nov 2019 13:53:38 -0800
Subject: RFR(XS) 8234196 [TESTBUG] DynamicArchiveRelocationTest.java missing
 "ArchiveRelocationMode == 1 ...."
Message-ID: <86842b34-6934-443c-50bd-c5997ca0b508@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8234196
http://cr.openjdk.java.net/~iklam/jdk14/8234196-missing-relocation-mode-log.v01/

To ensure that archive relocation has happened, the test checks for the 
log message "ArchiveRelocationMode == 1: always map archive(s) at an 
alternative address". However, in debug builds, this message occurs ONLY 
if the archive has been successfully mapped at the desired location 
(after which the JVM will unmap the archive, and remap it at an 
alternative location). See

http://hg.openjdk.java.net/jdk/jdk/file/b987ea528c21/src/hotspot/share/memory/filemap.cpp#l1397

In this particular test run, the archive failed to be mapped at the 
desired location (this is quite common on Windows), so we never execute 
the line at filemap.cpp:1397. Hence the expected message was not printed.

Anyway, the test cases already check for the following messages:

        "runtime archive relocation start";
        "runtime archive relocation done"

so the check for "ArchiveRelocationMode == 1: ..." is redundant and can 
be removed.
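
To make the control flow concrete, here is a small, self-contained model of the behavior described above. It is only a sketch under stated assumptions: simulate_map_archive and contains are hypothetical helpers written for this mail, and the branching is a simplification of the description, not a copy of filemap.cpp.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model: returns the log lines produced while mapping.
// requested_address_available says whether the first mapping attempt at
// the desired address succeeds.
std::vector<std::string> simulate_map_archive(bool requested_address_available,
                                              int archive_relocation_mode) {
  std::vector<std::string> log;
  bool mapped_at_requested = requested_address_available;

  // The mode message is only logged after a successful mapping at the
  // desired address, which is then given up to force a relocation.
  if (mapped_at_requested && archive_relocation_mode == 1) {
    log.push_back("ArchiveRelocationMode == 1: always map archive(s) "
                  "at an alternative address");
    mapped_at_requested = false;  // unmap and remap elsewhere
  }

  // Any mapping at an alternative address goes through relocation.
  if (!mapped_at_requested) {
    log.push_back("runtime archive relocation start");
    log.push_back("runtime archive relocation done");
  }
  return log;
}

// Helper: true if the log contains exactly this line.
bool contains(const std::vector<std::string>& log, const std::string& msg) {
  return std::find(log.begin(), log.end(), msg) != log.end();
}
```

In this model, a failed first mapping still produces the two relocation messages but never the "ArchiveRelocationMode == 1" line, which matches what the failed test run showed.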

Thanks
- Ioi

From daniel.daugherty at oracle.com  Thu Nov 14 22:06:11 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 14 Nov 2019 17:06:11 -0500
Subject: RFR(XS) 8234196 [TESTBUG] DynamicArchiveRelocationTest.java
 missing "ArchiveRelocationMode == 1 ...."
In-Reply-To: <86842b34-6934-443c-50bd-c5997ca0b508@oracle.com>
References: <86842b34-6934-443c-50bd-c5997ca0b508@oracle.com>
Message-ID: <b58995c3-53e7-fdc9-4075-3402fbe4dd0f@oracle.com>

On 11/14/19 4:53 PM, Ioi Lam wrote:
> https://bugs.openjdk.java.net/browse/JDK-8234196
> http://cr.openjdk.java.net/~iklam/jdk14/8234196-missing-relocation-mode-log.v01/ 
>

test/hotspot/jtreg/runtime/cds/appcds/ArchiveRelocationTest.java
    No comments.

test/hotspot/jtreg/runtime/cds/appcds/dynamicArchive/DynamicArchiveRelocationTest.java
    No comments.

Thumbs up.

I presume you've done a sanity check run on the two tests locally.

Dan


>
> To ensure that archive relocation has happened, the test checks for 
> the log message "ArchiveRelocationMode == 1: always map archive(s) at 
> an alternative address". However, in debug builds, this message occurs 
> ONLY if the archive has been successfully mapped at the desired 
> location (after which the JVM will unmap the archive, and remap it at 
> an alternative location). See
>
> http://hg.openjdk.java.net/jdk/jdk/file/b987ea528c21/src/hotspot/share/memory/filemap.cpp#l1397 
>
>
> In this particular test run, the archive failed to be mapped at the 
> desired location (this is quite common on Windows), so we never 
> execute the line at filemap.cpp:1397. Hence the expected message was 
> not printed.
>
> Anyway, the test cases already check for the following messages:
>
>         "runtime archive relocation start";
>         "runtime archive relocation done"
>
> so the check for "ArchiveRelocationMode == 1: ..." is redundant and 
> can be removed.
>
> Thanks
> - Ioi
>


From ioi.lam at oracle.com  Thu Nov 14 22:13:53 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 14 Nov 2019 14:13:53 -0800
Subject: RFR(XS) 8234196 [TESTBUG] DynamicArchiveRelocationTest.java
 missing "ArchiveRelocationMode == 1 ...."
In-Reply-To: <b58995c3-53e7-fdc9-4075-3402fbe4dd0f@oracle.com>
References: <86842b34-6934-443c-50bd-c5997ca0b508@oracle.com>
 <b58995c3-53e7-fdc9-4075-3402fbe4dd0f@oracle.com>
Message-ID: <5ca682a4-a527-c92b-4b70-9293c479551b@oracle.com>


On 11/14/19 2:06 PM, Daniel D. Daugherty wrote:
> On 11/14/19 4:53 PM, Ioi Lam wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8234196
>> http://cr.openjdk.java.net/~iklam/jdk14/8234196-missing-relocation-mode-log.v01/ 
>>
>
> test/hotspot/jtreg/runtime/cds/appcds/ArchiveRelocationTest.java
>     No comments.
>
> test/hotspot/jtreg/runtime/cds/appcds/dynamicArchive/DynamicArchiveRelocationTest.java 
>
>     No comments.
>
> Thumbs up.
>
> I presume you've done a sanity check run on the two tests locally.
>
> Dan
>

Hi Dan,

Thanks for the review. Yes, I've done a sanity run locally. However, the 
original failure only happens infrequently on Windows and might be 
host-specific, so I wasn't able to reproduce the failure.


 From looking at the log file of the failed test run, I can see the 
"runtime archive relocation start" and "runtime archive relocation done" 
logs, so I think the fix should be correct (famous last words ...).

Thanks
- Ioi

>
>
>
>>
>> To ensure that archive relocation has happened, the test checks for 
>> the log message "ArchiveRelocationMode == 1: always map archive(s) at 
>> an alternative address". However, in debug builds, this message 
>> occurs ONLY if the archive has been successfully mapped at the 
>> desired location (after which the JVM will unmap the archive, and 
>> remap it at an alternative location). See
>>
>> http://hg.openjdk.java.net/jdk/jdk/file/b987ea528c21/src/hotspot/share/memory/filemap.cpp#l1397 
>>
>>
>> In this particular test run, the archive failed to be mapped at the 
>> desired location (this is quite common on Windows), so we never 
>> execute the line at filemap.cpp:1397. Hence the expected message was 
>> not printed.
>>
>> Anyway, the test cases already check for the following messages:
>>
>>         "runtime archive relocation start";
>>         "runtime archive relocation done"
>>
>> so the check for "ArchiveRelocationMode == 1: ..." is redundant and 
>> can be removed.
>>
>> Thanks
>> - Ioi
>>
>


From david.holmes at oracle.com  Thu Nov 14 22:21:39 2019
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 15 Nov 2019 08:21:39 +1000
Subject: RFR: 8233549: Thread interrupted state must only be accessed when
 not in a safepoint-safe state
In-Reply-To: <00254f6c-7532-a12d-9074-831bf3b69abd@oracle.com>
References: <fe942c6d-4db1-2166-739f-702c1f6cb2ca@oracle.com>
 <00254f6c-7532-a12d-9074-831bf3b69abd@oracle.com>
Message-ID: <452c2f0f-9e7c-d8cc-c185-1f3349d0c566@oracle.com>

Hi Serguei,

Thanks for taking a look.

On 15/11/2019 4:04 am, serguei.spitsyn at oracle.com wrote:
> Hi David,
> 
> It looks good to me.
> A couple of nits below.
> 
> http://cr.openjdk.java.net/~dholmes/8233549/webrev/src/hotspot/share/prims/jvmtiRawMonitor.cpp.frames.html
> 
> 236   if (self->is_Java_thread()) {
> 237     JavaThread* jt = (JavaThread*) self;
> 238     // Transition to VM so we can check interrupt state
> 239     ThreadInVMfromNative tivm(jt);
> 240     if (jt->is_interrupted(true)) {
> 241       ret = M_INTERRUPTED;
> 242     } else {
> 243       ThreadBlockInVM tbivm(jt);
> 244       jt->set_suspend_equivalent();
> 245       if (millis <= 0) {
> 246         self->_ParkEvent->park();
> 247       } else {
> 248         self->_ParkEvent->park(millis);
> 249       }
> 250     }
> 251     // Return to VM before post-check of interrupt state
> 252     if (jt->is_interrupted(true)) {
> 253       ret = M_INTERRUPTED;
> 254     }
> 255   } else {
> 
> 
> It seems the fragment at lines 251-254 needs to be before line 250.
> It will add more clarity to this code.

No, it has to be after line 250 as that is when we will hit the TBIVM 
destructor and so return to _thread_in_vm which is the state needed to 
read the interrupted field. Dan commented on the above and I changed it 
slightly by moving the comment:

 > 250   // Return to VM before post-check of interrupt state
 > 251 }
 > 252 if (jt->is_interrupted(true)) {
 > 253   ret = M_INTERRUPTED;
 > 254 }
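
The scoping argument can be shown with a small, self-contained sketch. The names below (DemoThread, ScopedBlock, wait_and_check) are hypothetical stand-ins for illustration only, not the HotSpot classes; the sketch just assumes a scoped object whose constructor enters the blocked state and whose destructor returns to the in-VM state, and a check that may only run in the in-VM state.

```cpp
#include <cassert>

enum DemoState { demo_in_vm, demo_blocked };

// Hypothetical thread model with a guarded interrupt check.
struct DemoThread {
  DemoState state = demo_in_vm;
  bool interrupted = false;

  // Models the added guard: the field may only be read while "in VM".
  bool is_interrupted() const {
    assert(state == demo_in_vm);
    return interrupted;
  }
};

// Hypothetical stand-in for a scoped transition object: blocked for the
// lifetime of the object, back to "in VM" when it goes out of scope.
class ScopedBlock {
  DemoThread* _t;
 public:
  explicit ScopedBlock(DemoThread* t) : _t(t) { _t->state = demo_blocked; }
  ~ScopedBlock() { _t->state = demo_in_vm; }
};

// Returns true if the wait ended because of an interrupt.
bool wait_and_check(DemoThread* t, bool interrupt_while_parked) {
  bool ret = false;
  if (t->is_interrupted()) {
    ret = true;
  } else {
    ScopedBlock sb(t);
    // A real implementation would park here; the sketch only simulates
    // an interrupt arriving while the thread is blocked.
    if (interrupt_while_parked) t->interrupted = true;
    // Calling t->is_interrupted() at this point would trip the assert,
    // because the thread is still in the blocked state.
  }  // ~ScopedBlock runs here and restores the "in VM" state
  if (t->is_interrupted()) {
    ret = true;
  }
  return ret;
}
```

The post-check sits after the closing brace precisely because that brace is where the destructor runs and the state needed for the check is restored.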


>   412   if (self->is_Java_thread()) {
>   413     JavaThread* jt = (JavaThread*)self;
>   414     jt->set_suspend_equivalent();
>   415     for (;;) {
>   416       if (!jt->handle_special_suspend_equivalent_condition()) {
>   417         break;
>   418       } else {
>   419         // We've been suspended whilst waiting and so we have to
>   420         // relinquish the raw monitor until we are resumed. Of course
>   421         // after reacquiring we have to re-check for suspension again.
>   422         // Suspension requires we are _thread_blocked, and we also have to
>   423         // recheck for being interrupted.
>   424         simple_exit(jt);
>   425         {
>   426           ThreadInVMfromNative tivm(jt);
>   427           {
>   428             ThreadBlockInVM tbivm(jt);
>   429             jt->java_suspend_self();
>   430           }
>   431           if (jt->is_interrupted(true)) {
>   432             ret = M_INTERRUPTED;
>   433           }
>   434         }
>   435         simple_enter(jt);
>   436         jt->set_suspend_equivalent();
>   437       }
>   ...
> 
> This code can be simplified a little bit.
> The line:
> 
> 414 jt->set_suspend_equivalent();
> 
> can be placed before line 416.
> Then this line can be removed:
> 
>   436         jt->set_suspend_equivalent();

Yes you're right. I was trying to preserve the original loop structure, 
but then had to add the additional set_suspend_equivalent for the first 
iteration. But I can instead just move the existing one to the top of 
the loop.

Webrev updated in place.

Thanks,
David
-----

> 
> Thanks,
> Serguei
> 
> 
> On 11/11/19 20:52, David Holmes wrote:
>> webrev: http://cr.openjdk.java.net/~dholmes/8233549/webrev/
>> bug: https://bugs.openjdk.java.net/browse/JDK-8233549
>>
>> In JDK-8229516 I moved the interrupted state of a thread from the 
>> osThread in the VM to the java.lang.Thread instance. In doing that I 
>> overlooked a critical aspect, which is that to access the field of a 
>> Java object the JavaThread must not be in a safepoint-safe state** - 
>> otherwise the oop, and anything referenced there from could be 
>> relocated by the GC whilst the JavaThread is accessing it. This 
>> manifested in a number of tests using JVM TI Agent threads and JVM TI 
>> RawMonitors because the JavaThreads were marked _thread_blocked and
>> hence safepoint-safe, and we read a non-zero value for the interrupted 
>> field even though we had never been interrupted.
>>
>> This problem existed in all the code that checks for interruption when 
>> "waiting":
>>
>> - Parker::park (the code underpinning 
>> java.util.concurrent.LockSupport.park())
>>
>> To fix this code I simply deleted a late check of the interrupted 
>> field. The check was not needed because if an interrupt has occurred 
>> then we will find the ParkEvent in a signalled state.
>>
>> - ObjectMonitor::wait
>>
>> Here the late check of the interrupted state is essential as we reset 
>> the ParkEvent after an earlier check of the interrupted state. But the 
>> fix was simply achieved by moving the check slightly earlier before we 
>> use ThreadBlockInVm to become _thread_blocked.
>>
>> - RawMonitor::wait
>>
>> This fix was much more involved. The RawMonitor code directly 
>> transitions the JavaThread from _thread_in_Native to _thread_blocked. 
>> This is safe from a safepoint perspective because they are equivalent 
>> safepoint-safe states. To allow access to the interrupted field I have 
>> to transition from native to _thread_in_vm, and that has to be done by 
>> proper thread-state transitions to ensure correct access to the oop 
>> and its fields. Having done that I can then use ThreadBlockInVM for 
>> the transitions to blocked. However, as the old code noted it can't 
>> use proper thread-state transitions as this will lead to deadlocks 
>> with the VMThread that can also use RawMonitors when executing various 
>> event callbacks. To deal with that we have to note that the real 
>> constraint is that the JavaThread cannot block at a safepoint whilst 
>> it holds the RawMonitor. Hence the fix was to push all the interrupt
>> checking code and the thread-state transitions to the lowest level of 
>> RawMonitorWait, around the final park() call, after we have enqueued 
>> the waiter and released the monitor. That avoids any deadlock 
>> possibility.
>>
>> I also added checks to is_interrupted/interrupted to ensure they are 
>> only called by a thread in a suitable state. This should only be the 
>> VMThread (as a consequence of the Thread.stop implementation occurring 
>> at a safepoint and issuing a JavaThread::interrupt() call to unblock 
>> the target); or a JavaThread that is not _thread_in_native or 
>> _thread_blocked.
>>
>> Testing: (still finalizing)
>>  - tiers 1 - 6 (Oracle platforms)
>>  - Local Linux testing
>>   - vmTestbase/nsk/monitoring/
>>   - vmTestbase/nsk/jdwp
>>   - vmTestbase/nsk/jdb/
>>   - vmTestbase/nsk/jdi/
>>   - vmTestbase/nsk/jvmti/
>>   - serviceability/jvmti/
>>   - serviceability/jdwp
>>   - JDK: java/lang/management
>>          com/sun/management
>>
>> ** Note that this applies to all accesses we make via code in 
>> javaClasses.*. For this particular code I thought about adding a guard 
>> in JavaThread::threadObj() but it turns out when we generate a crash 
>> report we access the Thread's name() field and that can happen when in 
>> any state, so we'd always trigger a secondary assertion failure during 
>> error reporting if we did that. Note that accessing name() can still 
>> easily lead to secondary assertion failures as I discovered when
>> trying to debug this and print the thread name out - I would see an 
>> is_instance assertion fail checking that the Thread name() is an 
>> instance of java.lang.String!
>>
>> Thanks,
>> David
>> -----
> 

From daniel.daugherty at oracle.com  Thu Nov 14 22:33:34 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 14 Nov 2019 17:33:34 -0500
Subject: RFR: 8233549: Thread interrupted state must only be accessed when
 not in a safepoint-safe state
In-Reply-To: <452c2f0f-9e7c-d8cc-c185-1f3349d0c566@oracle.com>
References: <fe942c6d-4db1-2166-739f-702c1f6cb2ca@oracle.com>
 <00254f6c-7532-a12d-9074-831bf3b69abd@oracle.com>
 <452c2f0f-9e7c-d8cc-c185-1f3349d0c566@oracle.com>
Message-ID: <adb3fa1c-032d-d903-31d7-fa38b7d84f20@oracle.com>

> Webrev updated in place. 

Thumbs up.

Dan


On 11/14/19 5:21 PM, David Holmes wrote:
> Hi Serguei,
>
> Thanks for taking a look.
>
> On 15/11/2019 4:04 am, serguei.spitsyn at oracle.com wrote:
>> Hi David,
>>
>> It looks good to me.
>> A couple of nits below.
>>
>> http://cr.openjdk.java.net/~dholmes/8233549/webrev/src/hotspot/share/prims/jvmtiRawMonitor.cpp.frames.html 
>>
>>
>> 236   if (self->is_Java_thread()) {
>> 237     JavaThread* jt = (JavaThread*) self;
>> 238     // Transition to VM so we can check interrupt state
>> 239     ThreadInVMfromNative tivm(jt);
>> 240     if (jt->is_interrupted(true)) {
>> 241       ret = M_INTERRUPTED;
>> 242     } else {
>> 243       ThreadBlockInVM tbivm(jt);
>> 244       jt->set_suspend_equivalent();
>> 245       if (millis <= 0) {
>> 246         self->_ParkEvent->park();
>> 247       } else {
>> 248         self->_ParkEvent->park(millis);
>> 249       }
>> 250     }
>> 251     // Return to VM before post-check of interrupt state
>> 252     if (jt->is_interrupted(true)) {
>> 253       ret = M_INTERRUPTED;
>> 254     }
>> 255   } else {
>>
>>
>> It seems the fragment at lines 251-254 needs to be before line 250.
>> It will add more clarity to this code.
>
> No, it has to be after line 250 as that is when we will hit the TBIVM 
> destructor and so return to _thread_in_vm which is the state needed to 
> read the interrupted field. Dan commented on the above and I changed 
> it slightly by moving the comment:
>
> > 250       // Return to VM before post-check of interrupt state
> > 251     }
> > 252     if (jt->is_interrupted(true)) {
> > 253       ret = M_INTERRUPTED;
> > 254     }
>
>
>> 412   if (self->is_Java_thread()) {
>> 413     JavaThread* jt = (JavaThread*)self;
>> 414     jt->set_suspend_equivalent();
>> 415     for (;;) {
>> 416       if (!jt->handle_special_suspend_equivalent_condition()) {
>> 417         break;
>> 418       } else {
>> 419         // We've been suspended whilst waiting and so we have to
>> 420         // relinquish the raw monitor until we are resumed. Of course
>> 421         // after reacquiring we have to re-check for suspension again.
>> 422         // Suspension requires we are _thread_blocked, and we also have to
>> 423         // recheck for being interrupted.
>> 424         simple_exit(jt);
>> 425         {
>> 426           ThreadInVMfromNative tivm(jt);
>> 427           {
>> 428             ThreadBlockInVM tbivm(jt);
>> 429             jt->java_suspend_self();
>> 430           }
>> 431           if (jt->is_interrupted(true)) {
>> 432             ret = M_INTERRUPTED;
>> 433           }
>> 434         }
>> 435         simple_enter(jt);
>> 436         jt->set_suspend_equivalent();
>> 437       }
>>   ...
>>
>> This code can be simplified a little bit.
>> The line:
>>
>> 414     jt->set_suspend_equivalent();
>>
>> can be placed before line 416.
>> Then this line can be removed:
>>
>> 436         jt->set_suspend_equivalent();
>
> Yes you're right. I was trying to preserve the original loop 
> structure, but then had to add the additional set_suspend_equivalent 
> for the first iteration. But I can instead just move the existing one 
> to the top of the loop.
>
> Webrev updated in place.
>
> Thanks,
> David
> -----
>
>>
>> Thanks,
>> Serguei
>>
>>
>> On 11/11/19 20:52, David Holmes wrote:
>>> webrev: http://cr.openjdk.java.net/~dholmes/8233549/webrev/
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8233549
>>>
>>> In JDK-8229516 I moved the interrupted state of a thread from the 
>>> osThread in the VM to the java.lang.Thread instance. In doing that I 
>>> overlooked a critical aspect, which is that to access the field of a 
>>> Java object the JavaThread must not be in a safepoint-safe state** - 
>>> otherwise the oop, and anything referenced there from could be 
>>> relocated by the GC whilst the JavaThread is accessing it. This 
>>> manifested in a number of tests using JVM TI Agent threads and JVM 
>>> TI RawMonitors because the JavaThreads were marked _thread_blocked
>>> and hence safepoint-safe, and we read a non-zero value for the 
>>> interrupted field even though we had never been interrupted.
>>>
>>> This problem existed in all the code that checks for interruption 
>>> when "waiting":
>>>
>>> - Parker::park (the code underpinning 
>>> java.util.concurrent.LockSupport.park())
>>>
>>> To fix this code I simply deleted a late check of the interrupted 
>>> field. The check was not needed because if an interrupt has occurred 
>>> then we will find the ParkEvent in a signalled state.
>>>
>>> - ObjectMonitor::wait
>>>
>>> Here the late check of the interrupted state is essential as we 
>>> reset the ParkEvent after an earlier check of the interrupted state. 
>>> But the fix was simply achieved by moving the check slightly earlier 
>>> before we use ThreadBlockInVM to become _thread_blocked.
>>>
>>> - RawMonitor::wait
>>>
>>> This fix was much more involved. The RawMonitor code directly 
>>> transitions the JavaThread from _thread_in_native to
>>> _thread_blocked. This is safe from a safepoint perspective because 
>>> they are equivalent safepoint-safe states. To allow access to the 
>>> interrupted field I have to transition from native to _thread_in_vm, 
>>> and that has to be done by proper thread-state transitions to ensure 
>>> correct access to the oop and its fields. Having done that I can 
>>> then use ThreadBlockInVM for the transitions to blocked. However, as 
>>> the old code noted it can't use proper thread-state transitions as 
>>> this will lead to deadlocks with the VMThread that can also use 
>>> RawMonitors when executing various event callbacks. To deal with 
>>> that we have to note that the real constraint is that the JavaThread 
>>> cannot block at a safepoint whilst it holds the RawMonitor. Hence 
>>> the fix was to push all the interrupt checking code and the
>>> thread-state transitions to the lowest level of RawMonitorWait, 
>>> around the final park() call, after we have enqueued the waiter and 
>>> released the monitor. That avoids any deadlock possibility.
>>>
>>> I also added checks to is_interrupted/interrupted to ensure they are 
>>> only called by a thread in a suitable state. This should only be the 
>>> VMThread (as a consequence of the Thread.stop implementation 
>>> occurring at a safepoint and issuing a JavaThread::interrupt() call 
>>> to unblock the target); or a JavaThread that is not 
>>> _thread_in_native or _thread_blocked.
>>>
>>> Testing: (still finalizing)
>>>  - tiers 1 - 6 (Oracle platforms)
>>>  - Local Linux testing
>>>   - vmTestbase/nsk/monitoring/
>>>   - vmTestbase/nsk/jdwp
>>>   - vmTestbase/nsk/jdb/
>>>   - vmTestbase/nsk/jdi/
>>>   - vmTestbase/nsk/jvmti/
>>>   - serviceability/jvmti/
>>>   - serviceability/jdwp
>>>   - JDK: java/lang/management
>>>          com/sun/management
>>>
>>> ** Note that this applies to all accesses we make via code in 
>>> javaClasses.*. For this particular code I thought about adding a 
>>> guard in JavaThread::threadObj() but it turns out when we generate a 
>>> crash report we access the Thread's name() field and that can happen 
>>> when in any state, so we'd always trigger a secondary assertion 
>>> failure during error reporting if we did that. Note that accessing 
>>> name() can still easily lead to secondary assertion failures as I
>>> discovered when trying to debug this and print the thread name out - 
>>> I would see an is_instance assertion fail checking that the Thread 
>>> name() is an instance of java.lang.String!
>>>
>>> Thanks,
>>> David
>>> -----
>>


From david.holmes at oracle.com  Thu Nov 14 22:40:18 2019
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 15 Nov 2019 08:40:18 +1000
Subject: RFR: 8233549: Thread interrupted state must only be accessed when
 not in a safepoint-safe state
In-Reply-To: <adb3fa1c-032d-d903-31d7-fa38b7d84f20@oracle.com>
References: <fe942c6d-4db1-2166-739f-702c1f6cb2ca@oracle.com>
 <00254f6c-7532-a12d-9074-831bf3b69abd@oracle.com>
 <452c2f0f-9e7c-d8cc-c185-1f3349d0c566@oracle.com>
 <adb3fa1c-032d-d903-31d7-fa38b7d84f20@oracle.com>
Message-ID: <37120bb3-a7e4-34c4-b0a0-ae1042045214@oracle.com>

Thanks Dan!

David

On 15/11/2019 8:33 am, Daniel D. Daugherty wrote:
>> Webrev updated in place. 
> 
> Thumbs up.
> 
> Dan
> 
> 
> 
> On 11/14/19 5:21 PM, David Holmes wrote:
>> Hi Serguei,
>>
>> Thanks for taking a look.
>>
>> On 15/11/2019 4:04 am, serguei.spitsyn at oracle.com wrote:
>>> Hi David,
>>>
>>> It looks good to me.
>>> A couple of nits below.
>>>
>>> http://cr.openjdk.java.net/~dholmes/8233549/webrev/src/hotspot/share/prims/jvmtiRawMonitor.cpp.frames.html 
>>>
>>>
>>> 236   if (self->is_Java_thread()) {
>>> 237     JavaThread* jt = (JavaThread*) self;
>>> 238     // Transition to VM so we can check interrupt state
>>> 239     ThreadInVMfromNative tivm(jt);
>>> 240     if (jt->is_interrupted(true)) {
>>> 241       ret = M_INTERRUPTED;
>>> 242     } else {
>>> 243       ThreadBlockInVM tbivm(jt);
>>> 244       jt->set_suspend_equivalent();
>>> 245       if (millis <= 0) {
>>> 246         self->_ParkEvent->park();
>>> 247       } else {
>>> 248         self->_ParkEvent->park(millis);
>>> 249       }
>>> 250     }
>>> 251     // Return to VM before post-check of interrupt state
>>> 252     if (jt->is_interrupted(true)) {
>>> 253       ret = M_INTERRUPTED;
>>> 254     }
>>> 255   } else {
>>>
>>>
>>> It seems the fragment at lines 251-254 needs to be before line 250.
>>> It will add more clarity to this code.
>>
>> No, it has to be after line 250 as that is when we will hit the TBIVM 
>> destructor and so return to _thread_in_vm which is the state needed to 
>> read the interrupted field. Dan commented on the above and I changed 
>> it slightly by moving the comment:
>>
>> > 250       // Return to VM before post-check of interrupt state
>> > 251     }
>> > 252     if (jt->is_interrupted(true)) {
>> > 253       ret = M_INTERRUPTED;
>> > 254     }
>>
>>
>>> 412   if (self->is_Java_thread()) {
>>> 413     JavaThread* jt = (JavaThread*)self;
>>> 414     jt->set_suspend_equivalent();
>>> 415     for (;;) {
>>> 416       if (!jt->handle_special_suspend_equivalent_condition()) {
>>> 417         break;
>>> 418       } else {
>>> 419         // We've been suspended whilst waiting and so we have to
>>> 420         // relinquish the raw monitor until we are resumed. Of course
>>> 421         // after reacquiring we have to re-check for suspension again.
>>> 422         // Suspension requires we are _thread_blocked, and we also have to
>>> 423         // recheck for being interrupted.
>>> 424         simple_exit(jt);
>>> 425         {
>>> 426           ThreadInVMfromNative tivm(jt);
>>> 427           {
>>> 428             ThreadBlockInVM tbivm(jt);
>>> 429             jt->java_suspend_self();
>>> 430           }
>>> 431           if (jt->is_interrupted(true)) {
>>> 432             ret = M_INTERRUPTED;
>>> 433           }
>>> 434         }
>>> 435         simple_enter(jt);
>>> 436         jt->set_suspend_equivalent();
>>> 437       }
>>>   ...
>>>
>>> This code can be simplified a little bit.
>>> The line:
>>>
>>> 414     jt->set_suspend_equivalent();
>>>
>>> can be placed before line 416.
>>> Then this line can be removed:
>>>
>>> 436         jt->set_suspend_equivalent();
>>
>> Yes you're right. I was trying to preserve the original loop 
>> structure, but then had to add the additional set_suspend_equivalent 
>> for the first iteration. But I can instead just move the existing one 
>> to the top of the loop.
>>
>> Webrev updated in place.
>>
>> Thanks,
>> David
>> -----
>>
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 11/11/19 20:52, David Holmes wrote:
>>>> webrev: http://cr.openjdk.java.net/~dholmes/8233549/webrev/
>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8233549
>>>>
>>>> In JDK-8229516 I moved the interrupted state of a thread from the 
>>>> osThread in the VM to the java.lang.Thread instance. In doing that I 
>>>> overlooked a critical aspect, which is that to access the field of a 
>>>> Java object the JavaThread must not be in a safepoint-safe state** - 
>>>> otherwise the oop, and anything referenced there from could be 
>>>> relocated by the GC whilst the JavaThread is accessing it. This 
>>>> manifested in a number of tests using JVM TI Agent threads and JVM 
>>>> TI RawMonitors because the JavaThreads were marked _thread_blocked
>>>> and hence safepoint-safe, and we read a non-zero value for the 
>>>> interrupted field even though we had never been interrupted.
>>>>
>>>> This problem existed in all the code that checks for interruption 
>>>> when "waiting":
>>>>
>>>> - Parker::park (the code underpinning 
>>>> java.util.concurrent.LockSupport.park())
>>>>
>>>> To fix this code I simply deleted a late check of the interrupted 
>>>> field. The check was not needed because if an interrupt has occurred 
>>>> then we will find the ParkEvent in a signalled state.
>>>>
>>>> - ObjectMonitor::wait
>>>>
>>>> Here the late check of the interrupted state is essential as we 
>>>> reset the ParkEvent after an earlier check of the interrupted state. 
>>>> But the fix was simply achieved by moving the check slightly earlier 
>>>> before we use ThreadBlockInVM to become _thread_blocked.
>>>>
>>>> - RawMonitor::wait
>>>>
>>>> This fix was much more involved. The RawMonitor code directly 
>>>> transitions the JavaThread from _thread_in_native to
>>>> _thread_blocked. This is safe from a safepoint perspective because 
>>>> they are equivalent safepoint-safe states. To allow access to the 
>>>> interrupted field I have to transition from native to _thread_in_vm, 
>>>> and that has to be done by proper thread-state transitions to ensure 
>>>> correct access to the oop and its fields. Having done that I can 
>>>> then use ThreadBlockInVM for the transitions to blocked. However, as 
>>>> the old code noted it can't use proper thread-state transitions as 
>>>> this will lead to deadlocks with the VMThread that can also use 
>>>> RawMonitors when executing various event callbacks. To deal with 
>>>> that we have to note that the real constraint is that the JavaThread 
>>>> cannot block at a safepoint whilst it holds the RawMonitor. Hence 
>>>> the fix was to push all the interrupt checking code and the
>>>> thread-state transitions to the lowest level of RawMonitorWait, 
>>>> around the final park() call, after we have enqueued the waiter and 
>>>> released the monitor. That avoids any deadlock possibility.
>>>>
>>>> I also added checks to is_interrupted/interrupted to ensure they are 
>>>> only called by a thread in a suitable state. This should only be the 
>>>> VMThread (as a consequence of the Thread.stop implementation 
>>>> occurring at a safepoint and issuing a JavaThread::interrupt() call 
>>>> to unblock the target); or a JavaThread that is not 
>>>> _thread_in_native or _thread_blocked.
>>>>
>>>> Testing: (still finalizing)
>>>>  - tiers 1 - 6 (Oracle platforms)
>>>>  - Local Linux testing
>>>>   - vmTestbase/nsk/monitoring/
>>>>   - vmTestbase/nsk/jdwp
>>>>   - vmTestbase/nsk/jdb/
>>>>   - vmTestbase/nsk/jdi/
>>>>   - vmTestbase/nsk/jvmti/
>>>>   - serviceability/jvmti/
>>>>   - serviceability/jdwp
>>>>   - JDK: java/lang/management
>>>>          com/sun/management
>>>>
>>>> ** Note that this applies to all accesses we make via code in 
>>>> javaClasses.*. For this particular code I thought about adding a 
>>>> guard in JavaThread::threadObj() but it turns out when we generate a 
>>>> crash report we access the Thread's name() field and that can happen 
>>>> when in any state, so we'd always trigger a secondary assertion 
>>>> failure during error reporting if we did that. Note that accessing 
>>>> name() can still easily lead to secondary assertion failures as I
>>>> discovered when trying to debug this and print the thread name out - 
>>>> I would see an is_instance assertion fail checking that the Thread 
>>>> name() is an instance of java.lang.String!
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>
> 

From ioi.lam at oracle.com  Fri Nov 15 00:28:01 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 14 Nov 2019 16:28:01 -0800
Subject: RFR (XS): 8234127: BasicHashtable does not support small
 table_size
In-Reply-To: <CA+w6HxY_VL+7NXV0RKJbcRqC0=8OyzAU9+4pP6ej_e0iCAYd_w@mail.gmail.com>
References: <CA+w6HxbgH12gwMmT_P9_f36JH=C8denWPPjanToirE=-SStWMw@mail.gmail.com>
 <375ff2a4-25b0-267c-4e48-f572347a72c0@oracle.com>
 <CA+w6HxbJUR5W-yA6Owr0iSV=fGxHLaMd0vWntLFX7nPCBocaaw@mail.gmail.com>
 <CALrW1jxo-Bj3ZiQK4J6-c=zFHNsGAREX2=rpXyP53F_9HFEhug@mail.gmail.com>
 <CA+w6HxY_VL+7NXV0RKJbcRqC0=8OyzAU9+4pP6ej_e0iCAYd_w@mail.gmail.com>
Message-ID: <bd2067f0-e758-46e5-0eff-6c57f135ff3c@oracle.com>


On 11/14/19 1:47 PM, Man Cao wrote:
> Thanks for the review.
>
>> It's not directly related, I'm wondering why such a small hashtable is
>> needed in your use case.
> The test G1AddMetaspaceDependency.java sets -XX:G1UpdateBufferSize=1, which
> would create a size-one hashtable with my pending change for JDK-8087198.
> Real users should probably not set G1UpdateBufferSize so small, unless they
> want to stress G1's code for concurrent refinement.
>
> -Man

If a minimum size of 1 doesn't make sense, maybe in your patch for 
JDK-8087198, you can update the allowable range in globals.hpp, or 
manually override it to a minimum size that makes sense? That can 
prevent incorrect settings that would cause unintended slow down.

  product(size_t, G1UpdateBufferSize, 256,                                  \
          "Size of an update buffer")                                       \
          range(1, NOT_LP64(32*M) LP64_ONLY(1*G))                           \
                                                                            \
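
As a rough illustration of the "manually override it" option, here is a
minimal stand-alone sketch. It is not HotSpot code: effective_table_size
and the minimum of 8 used in the usage note are hypothetical, and what
minimum actually makes sense for the table is still an open question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: derive a hashtable size from a user-supplied flag
// value, never going below a caller-chosen minimum. The names here are
// illustrative only and do not exist in HotSpot.
inline std::size_t effective_table_size(std::size_t flag_value,
                                        std::size_t minimum) {
  assert(minimum > 0 && "a zero-sized table is never useful");
  return flag_value < minimum ? minimum : flag_value;
}
```

With an assumed minimum of 8, a setting of 1 (as used by the test) would
yield a table of size 8, while the default of 256 is left untouched.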

Thanks
- Ioi

>
> On Thu, Nov 14, 2019 at 1:20 PM Jiangli Zhou <jianglizhou at google.com> wrote:
>
>> Hi Man,
>>
>> Just took a look. Looks fine to me as well.
>>
>> It's not directly related, I'm wondering why such a small hashtable is
>> needed in your use case.
>>
>> Thanks,
>> Jiangli
>>
>> On Thu, Nov 14, 2019 at 11:19 AM Man Cao <manc at google.com> wrote:
>>> Thanks for the review. Yes, I will try using ResourceHashtable in new
>> code.
>>> The BasicHashtable does not work with size 2 and 3, either. In my use
>> case,
>>> the initial size is based on a JVM flag (G1UpdateBufferSize), so it is
>>> dependent on user input.
>>>
>>> -Man
>>>
>>>
>>> On Thu, Nov 14, 2019 at 10:52 AM <coleen.phillimore at oracle.com> wrote:
>>>
>>>> This fix seems fine, but having a hashtable with a starting length 1
>>>> seems silly.   Unless I'm reading this wrong.   As Ioi wrote in his
>>>> comment, there might be a better hashtable for your work.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>> On 11/13/19 8:00 PM, Man Cao wrote:
>>>>> Hi all,
>>>>>
>>>>> Can I have reviews for this small bug fix?
>>>>> Webrev: https://cr.openjdk.java.net/~manc/8234127/webrev.00/
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234127
>>>>>
>>>>> I'm trying to make use of KVHashtable in JDK-8087198 and encountered
>> this
>>>>> bug.
>>>>>
>>>>> -Man
>>>>


From coleen.phillimore at oracle.com  Fri Nov 15 01:06:32 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Thu, 14 Nov 2019 20:06:32 -0500
Subject: RFR (XS): 8234127: BasicHashtable does not support small
 table_size
In-Reply-To: <bd2067f0-e758-46e5-0eff-6c57f135ff3c@oracle.com>
References: <CA+w6HxbgH12gwMmT_P9_f36JH=C8denWPPjanToirE=-SStWMw@mail.gmail.com>
 <375ff2a4-25b0-267c-4e48-f572347a72c0@oracle.com>
 <CA+w6HxbJUR5W-yA6Owr0iSV=fGxHLaMd0vWntLFX7nPCBocaaw@mail.gmail.com>
 <CALrW1jxo-Bj3ZiQK4J6-c=zFHNsGAREX2=rpXyP53F_9HFEhug@mail.gmail.com>
 <CA+w6HxY_VL+7NXV0RKJbcRqC0=8OyzAU9+4pP6ej_e0iCAYd_w@mail.gmail.com>
 <bd2067f0-e758-46e5-0eff-6c57f135ff3c@oracle.com>
Message-ID: <136f31e0-af26-2bd4-8e08-893ad5e13838@oracle.com>


On 11/14/19 7:28 PM, Ioi Lam wrote:
>
>
> On 11/14/19 1:47 PM, Man Cao wrote:
>> Thanks for the review.
>>
>>> It's not directly related, I'm wondering why such a small hashtable is
>>> needed in your use case.
>> The test G1AddMetaspaceDependency.java sets -XX:G1UpdateBufferSize=1, 
>> which
>> would create a size-one hashtable with my pending change for 
>> JDK-8087198.
>> Real users should probably not set G1UpdateBufferSize so small, 
>> unless they
>> want to stress G1's code for concurrent refinement.
>>
>> -Man
>
> If a minimum size of 1 doesn't make sense, maybe in your patch for 
> JDK-8087198, you can update the allowable range in globals.hpp, or 
> manually override it to a minimum size that makes sense? That can 
> prevent incorrect settings that would cause unintended slow down.
>
>   product(size_t, G1UpdateBufferSize, 256,                                  \
>           "Size of an update buffer")                                       \
>           range(1, NOT_LP64(32*M) LP64_ONLY(1*G))                           \
>                                                                             \
>

Yes, this should be done also.
Coleen

> Thanks
> - Ioi
>
>>
>> On Thu, Nov 14, 2019 at 1:20 PM Jiangli Zhou <jianglizhou at google.com> 
>> wrote:
>>
>>> Hi Man,
>>>
>>> Just took a look. Looks fine to me as well.
>>>
>>> It's not directly related, I'm wondering why such a small hashtable is
>>> needed in your use case.
>>>
>>> Thanks,
>>> Jiangli
>>>
>>> On Thu, Nov 14, 2019 at 11:19 AM Man Cao <manc at google.com> wrote:
>>>> Thanks for the review. Yes, I will try using ResourceHashtable in new
>>> code.
>>>> The BasicHashtable does not work with size 2 and 3, either. In my use
>>> case,
>>>> the initial size is based on a JVM flag (G1UpdateBufferSize), so it is
>>>> dependent on user input.
>>>>
>>>> -Man
>>>>
>>>>
>>>> On Thu, Nov 14, 2019 at 10:52 AM <coleen.phillimore at oracle.com> wrote:
>>>>
>>>>> This fix seems fine, but having a hashtable with a starting length 1
>>>>> seems silly. Unless I'm reading this wrong. As Ioi wrote in his
>>>>> comment, there might be a better hashtable for your work.
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>> On 11/13/19 8:00 PM, Man Cao wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Can I have reviews for this small bug fix?
>>>>>> Webrev: https://cr.openjdk.java.net/~manc/8234127/webrev.00/
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234127
>>>>>>
>>>>>> I'm trying to make use of KVHashtable in JDK-8087198 and encountered
>>> this
>>>>>> bug.
>>>>>>
>>>>>> -Man
>>>>>
>


From serguei.spitsyn at oracle.com  Fri Nov 15 02:14:03 2019
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Thu, 14 Nov 2019 18:14:03 -0800
Subject: RFR: 8233549: Thread interrupted state must only be accessed when
 not in a safepoint-safe state
In-Reply-To: <452c2f0f-9e7c-d8cc-c185-1f3349d0c566@oracle.com>
References: <fe942c6d-4db1-2166-739f-702c1f6cb2ca@oracle.com>
 <00254f6c-7532-a12d-9074-831bf3b69abd@oracle.com>
 <452c2f0f-9e7c-d8cc-c185-1f3349d0c566@oracle.com>
Message-ID: <c728708d-68b6-0cd5-7d5f-34005d8500b6@oracle.com>

Hi David,

Thank you for the update!
It looks good to me.

You are right about my first suggestion.
The lines need to stay where they are, or additional curly brackets
are needed to force the ThreadBlockInVM destructor earlier.
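
The scoping point can be seen in a small stand-alone sketch. Nothing
below is HotSpot code: State and ScopedState are hypothetical stand-ins
for the thread states and for a transition object such as
ThreadBlockInVM, assuming only that its destructor restores the previous
state when its enclosing scope closes.

```cpp
#include <cassert>

// Hypothetical thread states, loosely modelled on the names in this thread.
enum class State { in_native, in_vm, blocked };

// Hypothetical RAII helper: enters a state on construction and restores
// the previous state on destruction, like a transition object would.
class ScopedState {
 public:
  ScopedState(State& current, State next) : _current(current), _saved(current) {
    _current = next;
  }
  ~ScopedState() { _current = _saved; }
  ScopedState(const ScopedState&) = delete;
  ScopedState& operator=(const ScopedState&) = delete;

 private:
  State& _current;
  State _saved;
};

// State seen by a check placed after the closing brace of the inner scope.
State state_after_inner_scope() {
  State s = State::in_native;
  State observed = s;
  {
    ScopedState in_vm(s, State::in_vm);
    {
      ScopedState blocked(s, State::blocked);
      assert(s == State::blocked);  // park() would happen here
    }  // 'blocked' is destroyed here, so s is in_vm again
    observed = s;
  }
  return observed;
}

// State seen by a check placed before the closing brace of the inner scope.
State state_inside_inner_scope() {
  State s = State::in_native;
  State observed = s;
  {
    ScopedState in_vm(s, State::in_vm);
    {
      ScopedState blocked(s, State::blocked);
      observed = s;  // still blocked: the destructor has not run yet
    }
  }
  return observed;
}
```

In this sketch a check placed after the inner closing brace observes
in_vm, while the same check placed before it still observes blocked,
which is why the post-check either stays after the brace or needs an
extra pair of braces around the blocking part.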

Thanks,
Serguei


On 11/14/19 2:21 PM, David Holmes wrote:
> Hi Serguei,
>
> Thanks for taking a look.
>
> On 15/11/2019 4:04 am, serguei.spitsyn at oracle.com wrote:
>> Hi David,
>>
>> It looks good to me.
>> A couple of nits below.
>>
>> http://cr.openjdk.java.net/~dholmes/8233549/webrev/src/hotspot/share/prims/jvmtiRawMonitor.cpp.frames.html 
>>
>>
>> 236   if (self->is_Java_thread()) {
>> 237     JavaThread* jt = (JavaThread*) self;
>> 238     // Transition to VM so we can check interrupt state
>> 239     ThreadInVMfromNative tivm(jt);
>> 240     if (jt->is_interrupted(true)) {
>> 241       ret = M_INTERRUPTED;
>> 242     } else {
>> 243       ThreadBlockInVM tbivm(jt);
>> 244       jt->set_suspend_equivalent();
>> 245       if (millis <= 0) {
>> 246         self->_ParkEvent->park();
>> 247       } else {
>> 248         self->_ParkEvent->park(millis);
>> 249       }
>> 250     }
>> 251     // Return to VM before post-check of interrupt state
>> 252     if (jt->is_interrupted(true)) {
>> 253       ret = M_INTERRUPTED;
>> 254     }
>> 255   } else {
>>
>>
>> It seems the fragment at lines 251-254 needs to be before line 250.
>> It will add more clarity to this code.
>
> No, it has to be after line 250 as that is when we will hit the TBIVM 
> destructor and so return to _thread_in_vm which is the state needed to 
> read the interrupted field. Dan commented on the above and I changed 
> it slightly by moving the comment:
>
> > 250       // Return to VM before post-check of interrupt state
> > 251     }
> > 252     if (jt->is_interrupted(true)) {
> > 253       ret = M_INTERRUPTED;
> > 254     }
>
>
>> 412   if (self->is_Java_thread()) {
>> 413     JavaThread* jt = (JavaThread*)self;
>> 414     jt->set_suspend_equivalent();
>> 415     for (;;) {
>> 416       if (!jt->handle_special_suspend_equivalent_condition()) {
>> 417         break;
>> 418       } else {
>> 419         // We've been suspended whilst waiting and so we have to
>> 420         // relinquish the raw monitor until we are resumed. Of course
>> 421         // after reacquiring we have to re-check for suspension again.
>> 422         // Suspension requires we are _thread_blocked, and we also have to
>> 423         // recheck for being interrupted.
>> 424         simple_exit(jt);
>> 425         {
>> 426           ThreadInVMfromNative tivm(jt);
>> 427           {
>> 428             ThreadBlockInVM tbivm(jt);
>> 429             jt->java_suspend_self();
>> 430           }
>> 431           if (jt->is_interrupted(true)) {
>> 432             ret = M_INTERRUPTED;
>> 433           }
>> 434         }
>> 435         simple_enter(jt);
>> 436         jt->set_suspend_equivalent();
>> 437       }
>>   ...
>>
>> This code can be simplified a little bit.
>> The line:
>>
>> 414     jt->set_suspend_equivalent();
>>
>> can be placed before line 416.
>> Then this line can be removed:
>>
>> 436         jt->set_suspend_equivalent();
>
> Yes you're right. I was trying to preserve the original loop 
> structure, but then had to add the additional set_suspend_equivalent 
> for the first iteration. But I can instead just move the existing one 
> to the top of the loop.
>
> Webrev updated in place.
>
> Thanks,
> David
> -----
>
>>
>> Thanks,
>> Serguei
>>
>>
>> On 11/11/19 20:52, David Holmes wrote:
>>> webrev: http://cr.openjdk.java.net/~dholmes/8233549/webrev/
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8233549
>>>
>>> In JDK-8229516 I moved the interrupted state of a thread from the 
>>> osThread in the VM to the java.lang.Thread instance. In doing that I 
>>> overlooked a critical aspect, which is that to access the field of a 
>>> Java object the JavaThread must not be in a safepoint-safe state** - 
>>> otherwise the oop, and anything referenced there from could be 
>>> relocated by the GC whilst the JavaThread is accessing it. This 
>>> manifested in a number of tests using JVM TI Agent threads and JVM 
>>> TI RawMonitors because the JavaThreads were marked _thread_blocked
>>> and hence safepoint-safe, and we read a non-zero value for the 
>>> interrupted field even though we had never been interrupted.
>>>
>>> This problem existed in all the code that checks for interruption 
>>> when "waiting":
>>>
>>> - Parker::park (the code underpinning 
>>> java.util.concurrent.LockSupport.park())
>>>
>>> To fix this code I simply deleted a late check of the interrupted 
>>> field. The check was not needed because if an interrupt has occurred 
>>> then we will find the ParkEvent in a signalled state.
>>>
>>> - ObjectMonitor::wait
>>>
>>> Here the late check of the interrupted state is essential as we 
>>> reset the ParkEvent after an earlier check of the interrupted state. 
>>> But the fix was simply achieved by moving the check slightly earlier 
>>> before we use ThreadBlockInVM to become _thread_blocked.
>>>
>>> - RawMonitor::wait
>>>
>>> This fix was much more involved. The RawMonitor code directly 
>>> transitions the JavaThread from _thread_in_native to
>>> _thread_blocked. This is safe from a safepoint perspective because 
>>> they are equivalent safepoint-safe states. To allow access to the 
>>> interrupted field I have to transition from native to _thread_in_vm, 
>>> and that has to be done by proper thread-state transitions to ensure 
>>> correct access to the oop and its fields. Having done that I can 
>>> then use ThreadBlockInVM for the transitions to blocked. However, as 
>>> the old code noted it can't use proper thread-state transitions as 
>>> this will lead to deadlocks with the VMThread that can also use 
>>> RawMonitors when executing various event callbacks. To deal with 
>>> that we have to note that the real constraint is that the JavaThread 
>>> cannot block at a safepoint whilst it holds the RawMonitor. Hence 
>>> the fix was to push all the interrupt checking code and the
>>> thread-state transitions to the lowest level of RawMonitorWait, 
>>> around the final park() call, after we have enqueued the waiter and 
>>> released the monitor. That avoids any deadlock possibility.
>>>
>>> I also added checks to is_interrupted/interrupted to ensure they are 
>>> only called by a thread in a suitable state. This should only be the 
>>> VMThread (as a consequence of the Thread.stop implementation 
>>> occurring at a safepoint and issuing a JavaThread::interrupt() call 
>>> to unblock the target); or a JavaThread that is not 
>>> _thread_in_native or _thread_blocked.
>>>
>>> Testing: (still finalizing)
>>>  - tiers 1 - 6 (Oracle platforms)
>>>  - Local Linux testing
>>>   - vmTestbase/nsk/monitoring/
>>>   - vmTestbase/nsk/jdwp
>>>   - vmTestbase/nsk/jdb/
>>>   - vmTestbase/nsk/jdi/
>>>   - vmTestbase/nsk/jvmti/
>>>   - serviceability/jvmti/
>>>   - serviceability/jdwp
>>>   - JDK: java/lang/management
>>>          com/sun/management
>>>
>>> ** Note that this applies to all accesses we make via code in 
>>> javaClasses.*. For this particular code I thought about adding a 
>>> guard in JavaThread::threadObj() but it turns out when we generate a 
>>> crash report we access the Thread's name() field and that can happen 
>>> when in any state, so we'd always trigger a secondary assertion 
>>> failure during error reporting if we did that. Note that accessing 
>>> name() can still easily lead to secondary assertion failures as I
>>> discovered when trying to debug this and print the thread name out - 
>>> I would see an is_instance assertion fail checking that the Thread 
>>> name() is an instance of java.lang.String!
>>>
>>> Thanks,
>>> David
>>> -----
>>


From david.holmes at oracle.com  Fri Nov 15 02:32:04 2019
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 15 Nov 2019 12:32:04 +1000
Subject: RFR: 8233549: Thread interrupted state must only be accessed when
 not in a safepoint-safe state
In-Reply-To: <c728708d-68b6-0cd5-7d5f-34005d8500b6@oracle.com>
References: <fe942c6d-4db1-2166-739f-702c1f6cb2ca@oracle.com>
 <00254f6c-7532-a12d-9074-831bf3b69abd@oracle.com>
 <452c2f0f-9e7c-d8cc-c185-1f3349d0c566@oracle.com>
 <c728708d-68b6-0cd5-7d5f-34005d8500b6@oracle.com>
Message-ID: <aaa27416-e39a-81c7-9d85-979f5dacb0ac@oracle.com>

Thanks again Serguei.

David

On 15/11/2019 12:14 pm, serguei.spitsyn at oracle.com wrote:
> Hi David,
> 
> Thank you for the update!
> It looks good to me.
> 
> You are right about my first suggestion.
> The lines need to stay where they are, or additional curly brackets
> are needed to force the ThreadBlockInVM destructor earlier.
> 
> Thanks,
> Serguei
> 
> 
> On 11/14/19 2:21 PM, David Holmes wrote:
>> Hi Serguei,
>>
>> Thanks for taking a look.
>>
>> On 15/11/2019 4:04 am, serguei.spitsyn at oracle.com wrote:
>>> Hi David,
>>>
>>> It looks good to me.
>>> A couple of nits below.
>>>
>>> http://cr.openjdk.java.net/~dholmes/8233549/webrev/src/hotspot/share/prims/jvmtiRawMonitor.cpp.frames.html 
>>>
>>>
>>> 236   if (self->is_Java_thread()) {
>>> 237     JavaThread* jt = (JavaThread*) self;
>>> 238     // Transition to VM so we can check interrupt state
>>> 239     ThreadInVMfromNative tivm(jt);
>>> 240     if (jt->is_interrupted(true)) {
>>> 241       ret = M_INTERRUPTED;
>>> 242     } else {
>>> 243       ThreadBlockInVM tbivm(jt);
>>> 244       jt->set_suspend_equivalent();
>>> 245       if (millis <= 0) {
>>> 246         self->_ParkEvent->park();
>>> 247       } else {
>>> 248         self->_ParkEvent->park(millis);
>>> 249       }
>>> 250     }
>>> 251     // Return to VM before post-check of interrupt state
>>> 252     if (jt->is_interrupted(true)) {
>>> 253       ret = M_INTERRUPTED;
>>> 254     }
>>> 255   } else {
>>>
>>>
>>> It seems the fragment at lines 251-254 needs to be before line 250.
>>> It will add more clarity to this code.
>>
>> No, it has to be after line 250 as that is when we will hit the TBIVM 
>> destructor and so return to _thread_in_vm which is the state needed to 
>> read the interrupted field. Dan commented on the above and I changed 
>> it slightly by moving the comment:
>>
>> > 250   // Return to VM before post-check of interrupt state
>> > 251 }
>> > 252 if (jt->is_interrupted(true)) {
>> > 253   ret = M_INTERRUPTED;
>> > 254 }
>>
>>
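The scoping argument above (the post-check has to come after the closing brace, because that is where the ThreadBlockInVM destructor runs) can be illustrated with a toy model. This is only a sketch, not the HotSpot code: ToyThread, ToyState, ScopedState and wait_sketch are hypothetical names invented for the illustration, and the "interrupt" is simulated by a flag.

```cpp
#include <cassert>

// Toy thread states; hypothetical stand-ins for _thread_in_native,
// _thread_in_vm and _thread_blocked.
enum ToyState { IN_NATIVE, IN_VM, BLOCKED };

struct ToyThread {
  ToyState state = IN_NATIVE;
  bool interrupted = false;
};

// Toy scope guard: enters a state in the constructor and restores the
// previous state in the destructor, i.e. at the closing brace.
class ScopedState {
  ToyThread* _t;
  ToyState _saved;
 public:
  ScopedState(ToyThread* t, ToyState s) : _t(t), _saved(t->state) { t->state = s; }
  ~ScopedState() { _t->state = _saved; }
  ScopedState(const ScopedState&) = delete;
  ScopedState& operator=(const ScopedState&) = delete;
};

// In this toy model the flag may only be read while "in VM".
static bool is_interrupted(const ToyThread* t) {
  assert(t->state == IN_VM);
  return t->interrupted;
}

// Returns true if the wait was interrupted.
static bool wait_sketch(ToyThread* t, bool interrupt_while_parked) {
  ScopedState in_vm(t, IN_VM);
  if (is_interrupted(t)) {
    return true;
  }
  {
    ScopedState blocked(t, BLOCKED);
    // park() would go here; simulate an interrupt arriving meanwhile.
    if (interrupt_while_parked) {
      t->interrupted = true;
    }
  }  // 'blocked' is destroyed here, so the state is IN_VM again
  return is_interrupted(t);  // only legal after the brace above
}
```

Moving the final is_interrupted() call inside the inner braces would trip the assert in this model, which mirrors why the real check cannot move above line 250 without extra braces.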
>>>   412   if (self->is_Java_thread()) {
>>>   413     JavaThread* jt = (JavaThread*)self;
>>>   414     jt->set_suspend_equivalent();
>>>   415     for (;;) {
>>>   416       if (!jt->handle_special_suspend_equivalent_condition()) {
>>>   417         break;
>>>   418       } else {
>>>   419         // We've been suspended whilst waiting and so we have to
>>>   420         // relinquish the raw monitor until we are resumed. Of course
>>>   421         // after reacquiring we have to re-check for suspension again.
>>>   422         // Suspension requires we are _thread_blocked, and we also have to
>>>   423         // recheck for being interrupted.
>>>   424         simple_exit(jt);
>>>   425         {
>>>   426           ThreadInVMfromNative tivm(jt);
>>>   427           {
>>>   428             ThreadBlockInVM tbivm(jt);
>>>   429             jt->java_suspend_self();
>>>   430           }
>>>   431           if (jt->is_interrupted(true)) {
>>>   432             ret = M_INTERRUPTED;
>>>   433           }
>>>   434         }
>>>   435         simple_enter(jt);
>>>   436         jt->set_suspend_equivalent();
>>>   437       }
>>>   ...
>>>
>>> This code can be simplified a little bit.
>>> The line:
>>>
>>> 414 jt->set_suspend_equivalent();
>>>
>>> can be placed before line 416.
>>> Then this line can be removed:
>>>
>>>   436         jt->set_suspend_equivalent();
>>
>> Yes you're right. I was trying to preserve the original loop 
>> structure, but then had to add the additional set_suspend_equivalent 
>> for the first iteration. But I can instead just move the existing one 
>> to the top of the loop.
>>
>> Webrev updated in place.
>>
>> Thanks,
>> David
>> -----
>>
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 11/11/19 20:52, David Holmes wrote:
>>>> webrev: http://cr.openjdk.java.net/~dholmes/8233549/webrev/
>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8233549
>>>>
>>>> In JDK-8229516 I moved the interrupted state of a thread from the 
>>>> osThread in the VM to the java.lang.Thread instance. In doing that I 
>>>> overlooked a critical aspect, which is that to access the field of a 
>>>> Java object the JavaThread must not be in a safepoint-safe state** - 
>>>> otherwise the oop, and anything referenced there from could be 
>>>> relocated by the GC whilst the JavaThread is accessing it. This 
>>>> manifested in a number of tests using JVM TI Agent threads and JVM 
>>>> TI RawMonitors because the JavaThreads were marked _thread_blocked
>>>> and hence safepoint-safe, and we read a non-zero value for the 
>>>> interrupted field even though we had never been interrupted.
>>>>
>>>> This problem existed in all the code that checks for interruption 
>>>> when "waiting":
>>>>
>>>> - Parker::park (the code underpinning 
>>>> java.util.concurrent.LockSupport.park())
>>>>
>>>> To fix this code I simply deleted a late check of the interrupted 
>>>> field. The check was not needed because if an interrupt has occurred 
>>>> then we will find the ParkEvent in a signalled state.
>>>>
>>>> - ObjectMonitor::wait
>>>>
>>>> Here the late check of the interrupted state is essential as we 
>>>> reset the ParkEvent after an earlier check of the interrupted state. 
>>>> But the fix was simply achieved by moving the check slightly earlier 
>>>> before we use ThreadBlockInVM to become _thread_blocked.
>>>>
>>>> - RawMonitor::wait
>>>>
>>>> This fix was much more involved. The RawMonitor code directly 
>>>> transitions the JavaThread from _thread_in_native to
>>>> _thread_blocked. This is safe from a safepoint perspective because 
>>>> they are equivalent safepoint-safe states. To allow access to the 
>>>> interrupted field I have to transition from native to _thread_in_vm, 
>>>> and that has to be done by proper thread-state transitions to ensure 
>>>> correct access to the oop and its fields. Having done that I can 
>>>> then use ThreadBlockInVM for the transitions to blocked. However, as 
>>>> the old code noted it can't use proper thread-state transitions as 
>>>> this will lead to deadlocks with the VMThread that can also use 
>>>> RawMonitors when executing various event callbacks. To deal with 
>>>> that we have to note that the real constraint is that the JavaThread 
>>>> cannot block at a safepoint whilst it holds the RawMonitor. Hence 
>>>> the fix was to push all the interrupt checking code and the
>>>> thread-state transitions to the lowest level of RawMonitorWait, 
>>>> around the final park() call, after we have enqueued the waiter and 
>>>> released the monitor. That avoids any deadlock possibility.
>>>>
>>>> I also added checks to is_interrupted/interrupted to ensure they are 
>>>> only called by a thread in a suitable state. This should only be the 
>>>> VMThread (as a consequence of the Thread.stop implementation 
>>>> occurring at a safepoint and issuing a JavaThread::interrupt() call 
>>>> to unblock the target); or a JavaThread that is not 
>>>> _thread_in_native or _thread_blocked.
>>>>
>>>> Testing: (still finalizing)
>>>>  - tiers 1 - 6 (Oracle platforms)
>>>>  - Local Linux testing
>>>>    - vmTestbase/nsk/monitoring/
>>>>    - vmTestbase/nsk/jdwp
>>>>    - vmTestbase/nsk/jdb/
>>>>    - vmTestbase/nsk/jdi/
>>>>    - vmTestbase/nsk/jvmti/
>>>>    - serviceability/jvmti/
>>>>    - serviceability/jdwp
>>>>    - JDK: java/lang/management
>>>>           com/sun/management
>>>>
>>>> ** Note that this applies to all accesses we make via code in 
>>>> javaClasses.*. For this particular code I thought about adding a 
>>>> guard in JavaThread::threadObj() but it turns out when we generate a 
>>>> crash report we access the Thread's name() field and that can happen 
>>>> when in any state, so we'd always trigger a secondary assertion 
>>>> failure during error reporting if we did that. Note that accessing 
>>>> name() can still easily lead to secondary assertions failures as I 
>>>> discovered when trying to debug this and print the thread name out - 
>>>> I would see an is_instance assertion fail checking that the Thread 
>>>> name() is an instance of java.lang.String!
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>
> 

From manc at google.com  Fri Nov 15 02:34:14 2019
From: manc at google.com (Man Cao)
Date: Thu, 14 Nov 2019 18:34:14 -0800
Subject: RFR (XS): 8234127: BasicHashtable does not support small
 table_size
In-Reply-To: <136f31e0-af26-2bd4-8e08-893ad5e13838@oracle.com>
References: <CA+w6HxbgH12gwMmT_P9_f36JH=C8denWPPjanToirE=-SStWMw@mail.gmail.com>
 <375ff2a4-25b0-267c-4e48-f572347a72c0@oracle.com>
 <CA+w6HxbJUR5W-yA6Owr0iSV=fGxHLaMd0vWntLFX7nPCBocaaw@mail.gmail.com>
 <CALrW1jxo-Bj3ZiQK4J6-c=zFHNsGAREX2=rpXyP53F_9HFEhug@mail.gmail.com>
 <CA+w6HxY_VL+7NXV0RKJbcRqC0=8OyzAU9+4pP6ej_e0iCAYd_w@mail.gmail.com>
 <bd2067f0-e758-46e5-0eff-6c57f135ff3c@oracle.com>
 <136f31e0-af26-2bd4-8e08-893ad5e13838@oracle.com>
Message-ID: <CA+w6HxZwdi+wtgQDYCNUvR7Kh5XsvfKceKPKB7MZEF4sEdeZRg@mail.gmail.com>

Thanks for the suggestion.
My patch for JDK-8087198 only uses the hashtable in fastdebug builds for
correctness checking.
The patch is under review, though, and I'm not sure we will eventually keep
the hashtable in the codebase.
For G1UpdateBufferSize, I think a value of 1 is actually quite useful for GC
testing purposes.

-Man

From martin.doerr at sap.com  Fri Nov 15 10:12:11 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 15 Nov 2019 10:12:11 +0000
Subject: RFR(T): 8234188: AIX build broken after 8220310
In-Reply-To: <9b637531-72f7-3ea8-d059-2a481b7b4afb@oracle.com>
References: <VI1PR0201MB24798B3FEB2C206978AA59409A710@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <9b637531-72f7-3ea8-d059-2a481b7b4afb@oracle.com>
Message-ID: <VI1PR0201MB24793A8D09E497A9969357479A700@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi Harold,

thanks for the review. Pushed.

Best regards,
Martin


> -----Original Message-----
> From: hotspot-runtime-dev <hotspot-runtime-dev-
> bounces at openjdk.java.net> On Behalf Of Harold Seigel
> Sent: Donnerstag, 14. November 2019 18:04
> To: hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(T): 8234188: AIX build broken after 8220310
> 
> Hi Martin,
> 
> The change looks good and trivial.
> 
> Thanks, Harold
> 
> On 11/14/2019 11:55 AM, Doerr, Martin wrote:
> > Hi,
> >
> > can somebody please review this trivial AIX build fix?
> >
> > http://cr.openjdk.java.net/~mdoerr/8234188_fix_aix_build/webrev.00/
> >
> > Best regards,
> > Martin
> >

From christoph.goettschkes at microdoc.com  Fri Nov 15 13:49:36 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Fri, 15 Nov 2019 14:49:36 +0100
Subject: Build broken for ARM32 after 8231610: Relocate the CDS archive if it
 cannot be mapped to the requested address
Message-ID: <mailman.11.1573825878.19479.hotspot-runtime-dev@openjdk.java.net>

Hi,

I am no longer able to build for ARM32 after the commit for 8231610:
Relocate the CDS archive if it cannot be mapped to the requested address
[1]. I am using a Linaro toolchain with GCC version 4.9.4.

arm-linux-gnueabi-g++ (Linaro GCC 4.9-2017.01) 4.9.4
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR 
PURPOSE.

src/hotspot/share/memory/filemap.cpp:1569:21:
error: right shift count >= width of type [-Werror]
   assert((offset >> 32) == 0, "must be 32-bit only");

I guess the same check could be achieved without a right shift operation,
by casting the offset twice and comparing it?
   assert(offset == (size_t)(uint32_t)offset, "must be 32-bit only");
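
As a minimal sketch of that idea (the helper name fits_in_32_bits is hypothetical and not part of the webrev), the round trip through uint32_t can be wrapped in a small function. Unlike a shift by 32, it is well-defined whether size_t is 32 or 64 bits wide, because conversion to an unsigned type always reduces modulo 2^32:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if 'value' survives a round trip through
// uint32_t, i.e. its upper bits (if size_t has any) are all zero.
// On a 32-bit target this is trivially true and needs no shift.
static inline bool fits_in_32_bits(size_t value) {
  return value == static_cast<size_t>(static_cast<uint32_t>(value));
}
```

On a 32-bit build the compiler can fold the comparison to a constant, so the check should cost nothing there while still catching oversized offsets on 64-bit builds.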

Here is a webrev [2] for that particular fix (there are two instances
where size_t is right shifted by 32). Should I open a new bug for this,
or should this be discussed using the already existing bug 8231610?

-- Christoph

[1] https://bugs.openjdk.java.net/browse/JDK-8231610
[2] https://cr.openjdk.java.net/~cgo/8231610/webrev.00/


From lois.foltan at oracle.com  Fri Nov 15 14:49:31 2019
From: lois.foltan at oracle.com (Lois Foltan)
Date: Fri, 15 Nov 2019 09:49:31 -0500
Subject: RFR: 8233497: Optimize default method generation by data
 structure reuse
In-Reply-To: <5991863e-28cf-0daa-3549-905609ce94a9@oracle.com>
References: <5991863e-28cf-0daa-3549-905609ce94a9@oracle.com>
Message-ID: <a17b9595-c8d9-7e25-1ab0-4260dba6b76a@oracle.com>

On 11/8/2019 6:57 AM, Claes Redestad wrote:
> Hi,
>
> when loading classes with complex hierarchies and many default methods,
> we can end up spending significant time in
> DefaultMethods::generate_default_methods
>
> This optimization reduces work done and memory requirements by reusing
> allocated data structures. For example by maintaining free lists of
> allocated Node objects.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8233497
> Webrev: http://cr.openjdk.java.net/~redestad/8233497/open.00/
>
> Testing: Tier1-3, will make sure tier4-7 pass before push
>
> Performance notes: On one of our more complex startup tests we see a 3%
> improvement on the execution time total. Much less on simpler
> applications.
>
> I've not done a formal complexity analysis, but I think the memory
> complexity is now down from O(N*M) to O(N+M) where N is the number of
> classes and interfaces in the hierarchy and M the number of methods of
> interest in that hierarchy. Algorithmic complexity is probably O(N*M)
> still, but with much better constants.
>
> Special thanks to Lois for patience and persistence over several rounds
> of pre-review!
>
> Thanks!
>
> /Claes

Hi Claes,

I have reviewed your final webrev and I think this looks good. It
certainly is an area that can benefit from performance improvements, so
thank you for putting the time in on this. A couple of final comments:

- line #79: Can you update the comment? It indicates that new_node_data
takes an InstanceKlass* parameter, which has been removed.

- line #716: I would like an assert as part of the if statement in the
situation where _free_scopes is not empty and the StateRestorerScope node
is being popped instead of newly allocated. The assert would check
that that StateRestorerScope's _marks GrowableArray is empty, which it
should be. That would make me more comfortable that a
previous StateRestorerScope's _marks array's data is not getting mixed
with new data being established for a new Node in the hierarchy.
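
To make the second request concrete, here is a minimal, self-contained sketch of a free list that asserts a recycled scope is clean before handing it out. It is not the DefaultMethods code: Scope, ScopePool, acquire and release are hypothetical names, and std::vector stands in for GrowableArray.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a scope object that records "marks".
struct Scope {
  std::vector<int> marks;
};

// Hypothetical pool that recycles Scope objects through a free list.
class ScopePool {
  std::vector<Scope*> _free;
 public:
  ScopePool() = default;
  ScopePool(const ScopePool&) = delete;
  ScopePool& operator=(const ScopePool&) = delete;
  ~ScopePool() {
    for (Scope* s : _free) delete s;
  }

  Scope* acquire() {
    if (!_free.empty()) {
      Scope* s = _free.back();
      _free.pop_back();
      // A recycled scope must not carry marks from its previous use.
      assert(s->marks.empty() && "recycled scope's marks array not empty");
      return s;
    }
    return new Scope();
  }

  // Clears the scope and puts it back on the free list for reuse.
  void release(Scope* s) {
    s->marks.clear();
    _free.push_back(s);
  }
};
```

The assert costs nothing in product builds and documents the invariant that release() is expected to establish.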

Thanks,
Lois

From zgu at redhat.com  Fri Nov 15 15:07:37 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 15 Nov 2019 10:07:37 -0500
Subject: RFR 8204128: NMT might report incorrect numbers for Compiler area
Message-ID: <cbd88c3f-a4a3-92d5-9b98-f427b78d01ca@redhat.com>

I could not reproduce the problem stated in CR.

The theory is that, when releasing a 2GB+ arena,
Arena::set_size_in_bytes() passes a negative long integer to NMT. When
that value goes through a long -> int -> long conversion, it ends up as a
positive number.

This problem is illustrated in the new test. Without the fix, after
releasing a 2GB+ arena, NMT shows the arena size doubled instead of
going down.
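
A small stand-alone sketch of the suspected effect follows. It is not the NMT code: through_int and apply_delta are hypothetical helpers that only model a 64-bit delta being narrowed to 32 bits and widened again. It also assumes the usual wrap-around behavior for out-of-range narrowing, which is implementation-defined before C++20 but is what common compilers do.

```cpp
#include <cstdint>

// Hypothetical model of a size delta that is narrowed to 32 bits somewhere
// on its way to the accounting code and then widened again.
// Assumption: out-of-range narrowing wraps modulo 2^32.
static int64_t through_int(int64_t delta) {
  int32_t narrowed = static_cast<int32_t>(delta);
  return static_cast<int64_t>(narrowed);
}

// Applies a (possibly mangled) delta to a running total.
static int64_t apply_delta(int64_t total, int64_t delta) {
  return total + through_int(delta);
}
```

Under that assumption, releasing an arena of 2 GB + 4 KB sends a delta of -(2^31 + 4096), which comes back as +(2^31 - 4096). The recorded total then grows to 2^32 bytes, almost double the arena size, instead of dropping to zero. Small deltas such as -4096 pass through unchanged, which would explain why only very large arenas are affected.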

I am not completely sure this fixes the problem reported, but it is
worth cleaning up the inconsistent types in the NMT API.


Bug: https://bugs.openjdk.java.net/browse/JDK-8204128
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8204128/webrev.00/index.html

Test:
   hotspot_nmt + new test (fastdebug and release)
   on Linux x86_64

Thanks,

-Zhengyu


From claes.redestad at oracle.com  Fri Nov 15 15:20:16 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Fri, 15 Nov 2019 16:20:16 +0100
Subject: RFR: 8233497: Optimize default method generation by data
 structure reuse
In-Reply-To: <a17b9595-c8d9-7e25-1ab0-4260dba6b76a@oracle.com>
References: <5991863e-28cf-0daa-3549-905609ce94a9@oracle.com>
 <a17b9595-c8d9-7e25-1ab0-4260dba6b76a@oracle.com>
Message-ID: <6e27d457-fd7a-d1f3-9118-0cff84140df3@oracle.com>


On 2019-11-15 15:49, Lois Foltan wrote:
> On 11/8/2019 6:57 AM, Claes Redestad wrote:
>> Hi,
>>
>> when loading classes with complex hierarchies and many default methods,
>> we can end up spending significant time in
>> DefaultMethods::generate_default_methods
>>
>> This optimization reduces work done and memory requirements by reusing
>> allocated data structures. For example by maintaining free lists of
>> allocated Node objects.
>>
>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8233497
>> Webrev: http://cr.openjdk.java.net/~redestad/8233497/open.00/
>>
>> Testing: Tier1-3, will make sure tier4-7 pass before push
>>
>> Performance notes: On one of our more complex startup tests we see a 3%
>> improvement on the execution time total. Much less on simpler
>> applications.
>>
>> I've not done a formal complexity analysis, but I think the memory
>> complexity is now down from O(N*M) to O(N+M) where N is the number of
>> classes and interfaces in the hierarchy and M the number of methods of
>> interest in that hierarchy. Algorithmic complexity is probably O(N*M)
>> still, but with much better constants.
>>
>> Special thanks to Lois for patience and persistence over several rounds
>> of pre-review!
>>
>> Thanks!
>>
>> /Claes
> 
> Hi Claes,
> 
> I have reviewed your final webrev and I think this looks good. It
> certainly is an area that can benefit from performance improvements so
> thank you for putting the time in on this.

Lois, again thank you for reviewing and suggesting a lot of improvements
along the way!

> A couple of final comments:
> 
> - line #79: Can you update the comment, it indicates that new_node_data 
> takes an InstanceKlass* parameter which has been removed
> 
> - line #716: I would like an assert as part of the if statement in the
> situation where _free_scopes is not empty and the StateRestorerScope node
> is being popped instead of newly allocated. The assert would then check
> if that StateRestorerScope's _marks GrowableArray is empty, which it
> should be. That would make me more comfortable with the idea that a
> previous StateRestorerScope's _marks array's data is not getting mixed
> with new data being established for a new Node in the hierarchy.
> 

Done:

http://cr.openjdk.java.net/~redestad/8233497/open.01/

/Claes

From daniel.daugherty at oracle.com  Fri Nov 15 15:28:00 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 15 Nov 2019 10:28:00 -0500
Subject: Build broken for ARM32 after 8231610: Relocate the CDS archive if
 it cannot be mapped to the requested address
In-Reply-To: <20191115135123.92163D83FB@aojmv0009>
References: <20191115135123.92163D83FB@aojmv0009>
Message-ID: <81aaa857-fd74-9eac-c277-139b17a0ba4d@oracle.com>

> Should I open a new bug for this,
> or should this be discussed using the already existing bug 8231610?

Since a changeset for 8231610 has already been pushed, you'll need
a new bug for this.

Dan


On 11/15/19 8:49 AM, christoph.goettschkes at microdoc.com wrote:
> Hi,
>
> I am no longer able to build for ARM32 after the commit for 8231610:
> Relocate the CDS archive if it cannot be mapped to the requested address
> [1]. I am using a linaro toolchain with a GCC version 4.9.4.
>
> arm-linux-gnueabi-g++ (Linaro GCC 4.9-2017.01) 4.9.4
> Copyright (C) 2015 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
> PURPOSE.
>
> src/hotspot/share/memory/filemap.cpp:1569:21:
> error: right shift count >= width of type [-Werror]
>     assert((offset >> 32) == 0, "must be 32-bit only");
>
> I guess the same check could be achieved without a right shift operation,
> by casting the offset twice and comparing it?
>     assert(offset == (size_t)(uint32_t)offset, "must be 32-bit only");
>
> Here is a webrev [2] for that particular fix (there are two instances
> where size_t is right shifted by 32). Should I open a new bug for this,
> or should this be discussed using the already existing bug 8231610?
>
> -- Christoph
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8231610
> [2] https://cr.openjdk.java.net/~cgo/8231610/webrev.00/
>


From thomas.stuefe at gmail.com  Fri Nov 15 17:59:23 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Fri, 15 Nov 2019 18:59:23 +0100
Subject: RFR 8204128: NMT might report incorrect numbers for Compiler area
In-Reply-To: <cbd88c3f-a4a3-92d5-9b98-f427b78d01ca@redhat.com>
References: <cbd88c3f-a4a3-92d5-9b98-f427b78d01ca@redhat.com>
Message-ID: <CAA-vtUxqYQTZuw2DYB1kZbKanz=u2Cri9FEgtdHe5rxwgiV=PQ@mail.gmail.com>

Hi Zhengyu,

wouldn't ssize_t be a better choice?

Other than that, looks good.

..Thomas

On Fri, Nov 15, 2019 at 4:08 PM Zhengyu Gu <zgu at redhat.com> wrote:

> I could not reproduce the problem stated in CR.
>
> The theory is that, when releasing a 2GB+ arena,
> Arena::set_size_in_bytes() passes a negative long integer to NMT, when
> it goes through long -> int -> long conversion, at the end, it becomes a
> positive number.
>
> This problem is illustrated in new test. Without the fix, after
> releasing a 2GB+ arena, NMT shows the arena size doubled, instead of
> going down.
>
> I am not completely sure this fixes the problem reported, but it is
> worth to cleanup inconsistent types in NMT API.
>
>
> Bug: http://cr.openjdk.java.net/~zgu/JDK-8204128/webrev.00/index.html
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8204128/webrev.00/index.html
>
> Test:
>    hotspot_nmt + new test (fastdebug and release)
>    on Linux x86_64
>
> Thanks,
>
> -Zhengyu
>
>

From lois.foltan at oracle.com  Fri Nov 15 18:13:38 2019
From: lois.foltan at oracle.com (Lois Foltan)
Date: Fri, 15 Nov 2019 13:13:38 -0500
Subject: RFR: 8233497: Optimize default method generation by data
 structure reuse
In-Reply-To: <6e27d457-fd7a-d1f3-9118-0cff84140df3@oracle.com>
References: <5991863e-28cf-0daa-3549-905609ce94a9@oracle.com>
 <a17b9595-c8d9-7e25-1ab0-4260dba6b76a@oracle.com>
 <6e27d457-fd7a-d1f3-9118-0cff84140df3@oracle.com>
Message-ID: <5ca586cf-2d8a-8b3b-0499-fd1447dfa739@oracle.com>


On 11/15/2019 10:20 AM, Claes Redestad wrote:
>
>
> On 2019-11-15 15:49, Lois Foltan wrote:
>> On 11/8/2019 6:57 AM, Claes Redestad wrote:
>>> Hi,
>>>
>>> when loading classes with complex hierarchies and many default methods,
>>> we can end up spending significant time in
>>> DefaultMethods::generate_default_methods
>>>
>>> This optimization reduces work done and memory requirements by reusing
>>> allocated data structures. For example by maintaining free lists of
>>> allocated Node objects.
>>>
>>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8233497
>>> Webrev: http://cr.openjdk.java.net/~redestad/8233497/open.00/
>>>
>>> Testing: Tier1-3, will make sure tier4-7 pass before push
>>>
>>> Performance notes: On one of our more complex startup tests we see a 3%
>>> improvement on the execution time total. Much less on simpler
>>> applications.
>>>
>>> I've not done a formal complexity analysis, but I think the memory
>>> complexity is now down from O(N*M) to O(N+M) where N is the number of
>>> classes and interfaces in the hierarchy and M the number of methods of
>>> interest in that hierarchy. Algorithmic complexity is probably O(N*M)
>>> still, but with much better constants.
>>>
>>> Special thanks to Lois for patience and persistence over several rounds
>>> of pre-review!
>>>
>>> Thanks!
>>>
>>> /Claes
>>
>> Hi Claes,
>>
>> I have reviewed your final webrev and I think this looks good. It 
>> certainly is an area that can benefit from performance improvements 
>> so thank you for putting the time in on this. 
>
> Lois, again thank you for reviewing and suggesting a lot of improvements
> along the way!
>
>> A couple of final comments:
>>
>> - line #79: Can you update the comment, it indicates that 
>> new_node_data takes an InstanceKlass* parameter which has been removed
>>
>> - line #716: I would like an assert as part of the if statement in
>> the situation where _free_scopes is not empty and the
>> StateRestorerScope node is being popped instead of newly allocated.
>> The assert would then check if that StateRestorerScope's _marks
>> GrowableArray is empty, which it should be. That would make me more
>> comfortable with the idea that a previous StateRestorerScope's _marks
>> array's data is not getting mixed with new data being established for
>> a new Node in the hierarchy.
>>
>
> Done:
>
> http://cr.openjdk.java.net/~redestad/8233497/open.01/

Thanks for making that change! Please fix the comment at line #79 and add
a message to the assert at line #721, such as "StateRestorerScope's
_marks array not empty".

I don't need to see another webrev.

Lois

>
> /Claes


From zgu at redhat.com  Fri Nov 15 18:37:26 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 15 Nov 2019 13:37:26 -0500
Subject: RFR 8204128: NMT might report incorrect numbers for Compiler area
In-Reply-To: <CAA-vtUxqYQTZuw2DYB1kZbKanz=u2Cri9FEgtdHe5rxwgiV=PQ@mail.gmail.com>
References: <cbd88c3f-a4a3-92d5-9b98-f427b78d01ca@redhat.com>
 <CAA-vtUxqYQTZuw2DYB1kZbKanz=u2Cri9FEgtdHe5rxwgiV=PQ@mail.gmail.com>
Message-ID: <5c3bd780-5333-bbf9-ead6-b510d433538a@redhat.com>

Thanks, Thomas

On 11/15/19 12:59 PM, Thomas Stüfe wrote:
> Hi Zhengyu,
> 
> wouldn't ssize_t be a better choice?

You are right!

Updated: http://cr.openjdk.java.net/~zgu/JDK-8204128/webrev.01/index.html

Reran the tests.

-Zhengyu

> 
> Other than that, looks good.
> 
> ..Thomas
> 
> On Fri, Nov 15, 2019 at 4:08 PM Zhengyu Gu <zgu at redhat.com 
> <mailto:zgu at redhat.com>> wrote:
> 
>     I could not reproduce the problem stated in CR.
> 
>     The theory is that, when releasing a 2GB+ arena,
>     Arena::set_size_in_bytes() passes a negative long integer to NMT, when
>     it goes through long -> int -> long conversion, at the end, it
>     becomes a
>     positive number.
> 
>     This problem is illustrated in new test. Without the fix, after
>     releasing a 2GB+ arena, NMT shows the arena size doubled, instead of
>     going down.
> 
>     I am not completely sure this fixes the problem reported, but it is
>     worth to cleanup inconsistent types in NMT API.
> 
> 
>     Bug: http://cr.openjdk.java.net/~zgu/JDK-8204128/webrev.00/index.html
>     Webrev: http://cr.openjdk.java.net/~zgu/JDK-8204128/webrev.00/index.html
> 
>     Test:
>         hotspot_nmt + new test (fastdebug and release)
>         on Linux x86_64
> 
>     Thanks,
> 
>     -Zhengyu
> 


From ioi.lam at oracle.com  Fri Nov 15 18:46:04 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Fri, 15 Nov 2019 10:46:04 -0800
Subject: Build broken for ARM32 after 8231610: Relocate the CDS archive if
 it cannot be mapped to the requested address
In-Reply-To: <20191115135123.92163D83FB@aojmv0009>
References: <20191115135123.92163D83FB@aojmv0009>
Message-ID: <430c25bb-b2f7-8faa-a419-1a0ca5f39630@oracle.com>

Hi Christoph,

The changes look good to me. I tried them on Linux/x64 and the asserts are
triggered if I muck with the value:

      _mapping_offset = (size_t)CompressedOops::encode_not_null((oop)base);
      if (crc != 0) {
        _mapping_offset += 0x100000000;
      }
      assert(_mapping_offset == (size_t)(uint32_t)_mapping_offset,
             "must be 32-bit only");

We also have similar checks elsewhere in the VM:

./cpu/x86/nativeInst_x86.cpp:  guarantee(disp == (intptr_t)(int32_t)disp, "must be 32-bit offset");

Thanks
- Ioi


On 11/15/19 5:49 AM, christoph.goettschkes at microdoc.com wrote:
> Hi,
>
> I am no longer able to build for ARM32 after the commit for 8231610:
> Relocate the CDS archive if it cannot be mapped to the requested address
> [1]. I am using a linaro toolchain with a GCC version 4.9.4.
>
> arm-linux-gnueabi-g++ (Linaro GCC 4.9-2017.01) 4.9.4
> Copyright (C) 2015 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
> PURPOSE.
>
> src/hotspot/share/memory/filemap.cpp:1569:21:
> error: right shift count >= width of type [-Werror]
>     assert((offset >> 32) == 0, "must be 32-bit only");
>
> I guess the same check could be achieved without a right shift operation,
> by casting the offset twice and comparing it?
>     assert(offset == (size_t)(uint32_t)offset, "must be 32-bit only");
>
> Here is a webrev [2] for that particular fix (there are two instances
> where size_t is right shifted by 32). Should I open a new bug for this,
> or should this be discussed using the already existing bug 8231610?
>
> -- Christoph
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8231610
> [2] https://cr.openjdk.java.net/~cgo/8231610/webrev.00/
>


From thomas.schatzl at oracle.com  Fri Nov 15 19:41:37 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 15 Nov 2019 20:41:37 +0100
Subject: RFR 8204128: NMT might report incorrect numbers for Compiler area
In-Reply-To: <5c3bd780-5333-bbf9-ead6-b510d433538a@redhat.com>
References: <cbd88c3f-a4a3-92d5-9b98-f427b78d01ca@redhat.com>
 <CAA-vtUxqYQTZuw2DYB1kZbKanz=u2Cri9FEgtdHe5rxwgiV=PQ@mail.gmail.com>
 <5c3bd780-5333-bbf9-ead6-b510d433538a@redhat.com>
Message-ID: <a82808b1-4790-1943-d623-84c3a809f4e0@oracle.com>

Hi,

On 15.11.19 19:37, Zhengyu Gu wrote:
> Thanks, Thomas
> 
> On 11/15/19 12:59 PM, Thomas Stüfe wrote:
>> Hi Zhengyu,
>>
>> wouldn't ssize_t be a better choice?
> 
> You are right!
> 
> Updated: http://cr.openjdk.java.net/~zgu/JDK-8204128/webrev.01/index.html
> 
> Reran the tests.

   looks good.

Thomas

From zgu at redhat.com  Fri Nov 15 19:46:48 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 15 Nov 2019 14:46:48 -0500
Subject: RFR 8204128: NMT might report incorrect numbers for Compiler area
In-Reply-To: <a82808b1-4790-1943-d623-84c3a809f4e0@oracle.com>
References: <cbd88c3f-a4a3-92d5-9b98-f427b78d01ca@redhat.com>
 <CAA-vtUxqYQTZuw2DYB1kZbKanz=u2Cri9FEgtdHe5rxwgiV=PQ@mail.gmail.com>
 <5c3bd780-5333-bbf9-ead6-b510d433538a@redhat.com>
 <a82808b1-4790-1943-d623-84c3a809f4e0@oracle.com>
Message-ID: <21049b9e-1a03-d777-56a0-c815b97bc131@redhat.com>

Thanks, Thomas,

-Zhengyu

On 11/15/19 2:41 PM, Thomas Schatzl wrote:
> Hi,
> 
> On 15.11.19 19:37, Zhengyu Gu wrote:
>> Thanks, Thomas
>>
>> On 11/15/19 12:59 PM, Thomas Stüfe wrote:
>>> Hi Zhengyu,
>>>
>>> wouldn't ssize_t be a better choice?
>>
>> You are right!
>>
>> Updated: http://cr.openjdk.java.net/~zgu/JDK-8204128/webrev.01/index.html
>>
>> Reran the tests.
> 
>    looks good.
> 
> Thomas
> 


From thomas.stuefe at gmail.com  Fri Nov 15 20:09:19 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Fri, 15 Nov 2019 21:09:19 +0100
Subject: RFR 8204128: NMT might report incorrect numbers for Compiler area
In-Reply-To: <5c3bd780-5333-bbf9-ead6-b510d433538a@redhat.com>
References: <cbd88c3f-a4a3-92d5-9b98-f427b78d01ca@redhat.com>
 <CAA-vtUxqYQTZuw2DYB1kZbKanz=u2Cri9FEgtdHe5rxwgiV=PQ@mail.gmail.com>
 <5c3bd780-5333-bbf9-ead6-b510d433538a@redhat.com>
Message-ID: <CAA-vtUwU-ss0LEhNuqUnEPTih+rWyyMDK6vo7BD77vKN4Q6yWQ@mail.gmail.com>

All good now.

.. Thomas

On Fri, Nov 15, 2019, 19:37 Zhengyu Gu <zgu at redhat.com> wrote:

> Thanks, Thomas
>
> On 11/15/19 12:59 PM, Thomas Stüfe wrote:
> > Hi Zhengyu,
> >
> > wouldn't ssize_t be a better choice?
>
> You are right!
>
> Updated: http://cr.openjdk.java.net/~zgu/JDK-8204128/webrev.01/index.html
>
> Reran the tests.
>
> -Zhengyu
>
> >
> > Other than that, looks good.
> >
> > ..Thomas
> >
> > On Fri, Nov 15, 2019 at 4:08 PM Zhengyu Gu <zgu at redhat.com
> > <mailto:zgu at redhat.com>> wrote:
> >
> >     I could not reproduce the problem stated in CR.
> >
> >     The theory is that, when releasing a 2GB+ arena,
> >     Arena::set_size_in_bytes() passes a negative long integer to NMT, when
> >     it goes through long -> int -> long conversion, at the end, it
> >     becomes a positive number.
> >
> >     This problem is illustrated in new test. Without the fix, after
> >     releasing a 2GB+ arena, NMT shows the arena size doubled, instead of
> >     going down.
> >
> >     I am not completely sure this fixes the problem reported, but it is
> >     worth to cleanup inconsistent types in NMT API.
> >
> >
> >     Bug: https://bugs.openjdk.java.net/browse/JDK-8204128
> >     Webrev: http://cr.openjdk.java.net/~zgu/JDK-8204128/webrev.00/index.html
> >
> >     Test:
> >         hotspot_nmt + new test (fastdebug and release)
> >         on Linux x86_64
> >
> >     Thanks,
> >
> >     -Zhengyu
> >
>
>
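
A minimal sketch of the long -> int -> long theory quoted above, not the
actual NMT code: through_int32() and apply_narrowed_delta() are hypothetical
helpers, and the 2GB + 1MB arena size is an assumed example. The wrap-around
is computed by hand so the sketch does not depend on how a particular
compiler narrows out-of-range values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: models a 64-bit delta that is passed through a
// 32-bit int parameter and then widened back to 64 bits.
inline std::int64_t through_int32(std::int64_t value) {
  // Conversion to an unsigned type is well defined (modulo 2^32).
  const std::uint32_t low = static_cast<std::uint32_t>(value);
  const std::int64_t widened = static_cast<std::int64_t>(low);
  // Reinterpret the low 32 bits as a two's-complement signed value.
  return low >= 0x80000000u ? widened - (std::int64_t{1} << 32) : widened;
}

// Hypothetical counter update: applies a size delta that was narrowed
// on its way to the tracker.
inline std::int64_t apply_narrowed_delta(std::int64_t tracked, std::int64_t delta) {
  assert(tracked >= 0 && "tracked size starts out non-negative");
  return tracked + through_int32(delta);
}
```

Under these assumptions, releasing the 2GB + 1MB arena produces a delta of
+2146435072 instead of -2148532224, so the tracked size grows to 4GB (roughly
double) instead of dropping to zero. Small deltas such as -1024 survive the
round trip unchanged, which would explain why only huge arenas are affected.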

From zgu at redhat.com  Fri Nov 15 20:35:21 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 15 Nov 2019 15:35:21 -0500
Subject: RFR 8204128: NMT might report incorrect numbers for Compiler area
In-Reply-To: <CAA-vtUwU-ss0LEhNuqUnEPTih+rWyyMDK6vo7BD77vKN4Q6yWQ@mail.gmail.com>
References: <cbd88c3f-a4a3-92d5-9b98-f427b78d01ca@redhat.com>
 <CAA-vtUxqYQTZuw2DYB1kZbKanz=u2Cri9FEgtdHe5rxwgiV=PQ@mail.gmail.com>
 <5c3bd780-5333-bbf9-ead6-b510d433538a@redhat.com>
 <CAA-vtUwU-ss0LEhNuqUnEPTih+rWyyMDK6vo7BD77vKN4Q6yWQ@mail.gmail.com>
Message-ID: <fa9fb93c-6ffa-ff64-7f54-b87ac65d1515@redhat.com>


On 11/15/19 3:09 PM, Thomas Stüfe wrote:
> All good now.

Thank you, and pushed.

-Zhengyu

> 
> .. Thomas
> 
> On Fri, Nov 15, 2019, 19:37 Zhengyu Gu <zgu at redhat.com> wrote:
> 
>     Thanks, Thomas
> 
>     On 11/15/19 12:59 PM, Thomas Stüfe wrote:
>      > Hi Zhengyu,
>      >
>      > wouldn't ssize_t be a better choice?
> 
>     You are right!
> 
>     Updated:
>     http://cr.openjdk.java.net/~zgu/JDK-8204128/webrev.01/index.html
> 
>     Reran the tests.
> 
>     -Zhengyu
> 
>      >
>      > Other than that, looks good.
>      >
>      > ..Thomas
>      >
>      > On Fri, Nov 15, 2019 at 4:08 PM Zhengyu Gu <zgu at redhat.com> wrote:
>      >
>      >     I could not reproduce the problem stated in the CR.
>      >
>      >     The theory is that, when releasing a 2GB+ arena,
>      >     Arena::set_size_in_bytes() passes a negative long integer to NMT.
>      >     When it goes through a long -> int -> long conversion, it ends up
>      >     as a positive number.
>      >
>      >     This problem is illustrated in the new test. Without the fix, after
>      >     releasing a 2GB+ arena, NMT shows the arena size doubled instead of
>      >     going down.
>      >
>      >     I am not completely sure this fixes the problem reported, but it is
>      >     worth cleaning up the inconsistent types in the NMT API.
>      >
>      >
>      >     Bug: https://bugs.openjdk.java.net/browse/JDK-8204128
>      >     Webrev: http://cr.openjdk.java.net/~zgu/JDK-8204128/webrev.00/index.html
>      >
>      >     Test:
>      >         hotspot_nmt + new test (fastdebug and release)
>      >         on Linux x86_64
>      >
>      >     Thanks,
>      >
>      >     -Zhengyu
>      >
> 


From daniel.daugherty at oracle.com  Fri Nov 15 23:26:51 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 15 Nov 2019 18:26:51 -0500
Subject: RFR(T): 8234272 ProblemList runtime/NMT/HugeArenaTracking.java
Message-ID: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>

Greetings,

runtime/NMT/HugeArenaTracking.java is a new test added by the following fix:

    JDK-8204128 NMT might report incorrect numbers for Compiler area
    https://bugs.openjdk.java.net/browse/JDK-8204128

The test is failing in the JDK-14 CI on the Win* platforms. That failure is
tracked by:

    JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
    https://bugs.openjdk.java.net/browse/JDK-8234270

To keep the noise in the CI down over the weekend, I'm putting the test
on the ProblemList for Win* using this bug:

    JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
    https://bugs.openjdk.java.net/browse/JDK-8234272

Here's the trivial diff:

$ hg diff
diff -r 8e7f29b1ad4a test/hotspot/jtreg/ProblemList.txt
--- a/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 14:22:24 2019 -0800
+++ b/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 18:22:00 2019 -0500
@@ -90,6 +90,7 @@
 # :hotspot_runtime

 runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64
+runtime/nmt/HugeArenaTracking.java 8234270 windows-x64
 runtime/ReservedStack/ReservedStackTest.java 8231031 generic-all

 #############################################################################


Thanks, in advance, for any comments, questions or suggestions.

Dan


From igor.ignatyev at oracle.com  Fri Nov 15 23:29:51 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 15 Nov 2019 15:29:51 -0800
Subject: RFR(T): 8234272 ProblemList runtime/NMT/HugeArenaTracking.java
In-Reply-To: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>
References: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>
Message-ID: <F3DF0539-9A17-413F-9D75-038BD4DCB437@oracle.com>

LGTM

-- Igor

> On Nov 15, 2019, at 3:26 PM, Daniel D. Daugherty <daniel.daugherty at oracle.com> wrote:
> 
> Greetings,
> 
> runtime/NMT/HugeArenaTracking.java is a new test added by the following fix:
> 
>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>     https://bugs.openjdk.java.net/browse/JDK-8204128
> 
> The test is failing in the JDK-14 CI on the Win* platforms. That failure is
> tracked by:
> 
>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>     https://bugs.openjdk.java.net/browse/JDK-8234270
> 
> To keep the noise in the CI down over the weekend, I'm putting the test
> on the ProblemList for Win* using this bug:
> 
>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>     https://bugs.openjdk.java.net/browse/JDK-8234272
> 
> Here's the trivial diff:
> 
> $ hg diff
> diff -r 8e7f29b1ad4a test/hotspot/jtreg/ProblemList.txt
> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 14:22:24 2019 -0800
> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 18:22:00 2019 -0500
> @@ -90,6 +90,7 @@
>  # :hotspot_runtime
> 
>  runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64
> +runtime/nmt/HugeArenaTracking.java 8234270 windows-x64
>  runtime/ReservedStack/ReservedStackTest.java 8231031 generic-all
> 
>  #############################################################################
> 
> 
> Thanks, in advance, for any comments, questions or suggestions.
> 
> Dan
> 


From daniel.daugherty at oracle.com  Fri Nov 15 23:30:13 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 15 Nov 2019 18:30:13 -0500
Subject: RFR(T): 8234272 ProblemList runtime/NMT/HugeArenaTracking.java
In-Reply-To: <F3DF0539-9A17-413F-9D75-038BD4DCB437@oracle.com>
References: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>
 <F3DF0539-9A17-413F-9D75-038BD4DCB437@oracle.com>
Message-ID: <14f5278e-d4ef-5490-3093-647a5b98cc9d@oracle.com>

Thanks Igor!!

Dan


On 11/15/19 6:29 PM, Igor Ignatyev wrote:
> LGTM
>
> -- Igor
>
>> On Nov 15, 2019, at 3:26 PM, Daniel D. Daugherty <daniel.daugherty at oracle.com> wrote:
>>
>> Greetings,
>>
>> runtime/NMT/HugeArenaTracking.java is a new test added by the following fix:
>>
>>      JDK-8204128 NMT might report incorrect numbers for Compiler area
>>      https://bugs.openjdk.java.net/browse/JDK-8204128
>>
>> The test is failing in the JDK-14 CI on the Win* platforms. That failure is
>> tracked by:
>>
>>      JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>>      https://bugs.openjdk.java.net/browse/JDK-8234270
>>
>> To keep the noise in the CI down over the weekend, I'm putting the test
>> on the ProblemList for Win* using this bug:
>>
>>      JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>>      https://bugs.openjdk.java.net/browse/JDK-8234272
>>
>> Here's the trivial diff:
>>
>> $ hg diff
>> diff -r 8e7f29b1ad4a test/hotspot/jtreg/ProblemList.txt
>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 14:22:24 2019 -0800
>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 18:22:00 2019 -0500
>> @@ -90,6 +90,7 @@
>>   # :hotspot_runtime
>>
>>   runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64
>> +runtime/nmt/HugeArenaTracking.java 8234270 windows-x64
>>   runtime/ReservedStack/ReservedStackTest.java 8231031 generic-all
>>
>>   #############################################################################
>>
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>>
>> Dan
>>


From leonid.mesnik at oracle.com  Fri Nov 15 23:43:03 2019
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Fri, 15 Nov 2019 15:43:03 -0800
Subject: RFR(T): 8234272 ProblemList runtime/NMT/HugeArenaTracking.java
In-Reply-To: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>
References: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>
Message-ID: <2c087e61-2f10-8707-7b2f-4071b8b86582@oracle.com>

It would be better to back out the fix.

Other tests executed with NMT also triggered this assertion, so we are
going to have a lot of assertion failures.

Leonid

On 11/15/19 3:26 PM, Daniel D. Daugherty wrote:
> Greetings,
>
> runtime/NMT/HugeArenaTracking.java is a new test added by the 
> following fix:
>
>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>     https://bugs.openjdk.java.net/browse/JDK-8204128
>
> The test is failing in the JDK-14 CI on the Win* platforms. That failure is
> tracked by:
>
>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>     https://bugs.openjdk.java.net/browse/JDK-8234270
>
> To keep the noise in the CI down over the weekend, I'm putting the test
> on the ProblemList for Win* using this bug:
>
>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>     https://bugs.openjdk.java.net/browse/JDK-8234272
>
> Here's the trivial diff:
>
> $ hg diff
> diff -r 8e7f29b1ad4a test/hotspot/jtreg/ProblemList.txt
> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 14:22:24 2019 -0800
> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 18:22:00 2019 -0500
> @@ -90,6 +90,7 @@
>  # :hotspot_runtime
>
>  runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64
> +runtime/nmt/HugeArenaTracking.java 8234270 windows-x64
>  runtime/ReservedStack/ReservedStackTest.java 8231031 generic-all
>
>  #############################################################################
>
>
>
> Thanks, in advance, for any comments, questions or suggestions.
>
> Dan
>

From daniel.daugherty at oracle.com  Sat Nov 16 01:19:09 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 15 Nov 2019 20:19:09 -0500
Subject: RFR(T): 8234272 ProblemList runtime/NMT/HugeArenaTracking.java
In-Reply-To: <2c087e61-2f10-8707-7b2f-4071b8b86582@oracle.com>
References: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>
 <2c087e61-2f10-8707-7b2f-4071b8b86582@oracle.com>
Message-ID: <da9d1db1-378e-2347-2f5f-beab1d6ebee7@oracle.com>

Sorry I missed this. I took a (late) dinner break with the family.

Okay, so we're going to see more failures in the higher tiers due to this
fix. Based on your update to JDK-8234270, we'll see some Kitchensink
failures also, right? Possibly as early as Tier4, but definitely by Tier6...
sigh...

Dan


On 11/15/19 6:43 PM, Leonid Mesnik wrote:
> It would be better to back out the fix.
>
> Other tests executed with NMT also triggered this assertion, so we are
> going to have a lot of assertion failures.
>
> Leonid
>
> On 11/15/19 3:26 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> runtime/NMT/HugeArenaTracking.java is a new test added by the 
>> following fix:
>>
>>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>>     https://bugs.openjdk.java.net/browse/JDK-8204128
>>
>> The test is failing in the JDK-14 CI on the Win* platforms. That failure is
>> tracked by:
>>
>>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>>     https://bugs.openjdk.java.net/browse/JDK-8234270
>>
>> To keep the noise in the CI down over the weekend, I'm putting the test
>> on the ProblemList for Win* using this bug:
>>
>>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>>     https://bugs.openjdk.java.net/browse/JDK-8234272
>>
>> Here's the trivial diff:
>>
>> $ hg diff
>> diff -r 8e7f29b1ad4a test/hotspot/jtreg/ProblemList.txt
>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 14:22:24 2019 -0800
>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 18:22:00 2019 -0500
>> @@ -90,6 +90,7 @@
>>  # :hotspot_runtime
>>
>>  runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64
>> +runtime/nmt/HugeArenaTracking.java 8234270 windows-x64
>>  runtime/ReservedStack/ReservedStackTest.java 8231031 generic-all
>>
>>  #############################################################################
>>
>>
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>>
>> Dan
>>


From leonid.mesnik at oracle.com  Sat Nov 16 01:46:08 2019
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Fri, 15 Nov 2019 17:46:08 -0800
Subject: RFR(T): 8234272 ProblemList runtime/NMT/HugeArenaTracking.java
In-Reply-To: <da9d1db1-378e-2347-2f5f-beab1d6ebee7@oracle.com>
References: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>
 <2c087e61-2f10-8707-7b2f-4071b8b86582@oracle.com>
 <da9d1db1-378e-2347-2f5f-beab1d6ebee7@oracle.com>
Message-ID: <556159ae-c890-2f72-a4a2-56a1d8e9d408@oracle.com>

Yes, and the stress testing is going to be completely broken.

I think it would make sense to pre-test the fix with Kitchensink before
integration. I could help with this if needed.

Leonid

On 11/15/19 5:19 PM, Daniel D. Daugherty wrote:
> Sorry I missed this. I took a (late) dinner break with the family.
>
> Okay, so we're going to see more failures in the higher tiers due to this
> fix. Based on your update to JDK-8234270, we'll see some Kitchensink 
> failures
> also right? Possibly as early as Tier4, but definitely by Tier6... 
> sigh...
>
> Dan
>
>
> On 11/15/19 6:43 PM, Leonid Mesnik wrote:
>> It would be better to back out the fix.
>>
>> Other tests executed with NMT also triggered this assertion, so we
>> are going to have a lot of assertion failures.
>>
>> Leonid
>>
>> On 11/15/19 3:26 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> runtime/NMT/HugeArenaTracking.java is a new test added by the 
>>> following fix:
>>>
>>>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>>>     https://bugs.openjdk.java.net/browse/JDK-8204128
>>>
>>> The test is failing in the JDK-14 CI on the Win* platforms. That failure is
>>> tracked by:
>>>
>>>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>>>     https://bugs.openjdk.java.net/browse/JDK-8234270
>>>
>>> To keep the noise in the CI down over the weekend, I'm putting the test
>>> on the ProblemList for Win* using this bug:
>>>
>>>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>>>     https://bugs.openjdk.java.net/browse/JDK-8234272
>>>
>>> Here's the trivial diff:
>>>
>>> $ hg diff
>>> diff -r 8e7f29b1ad4a test/hotspot/jtreg/ProblemList.txt
>>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 14:22:24 2019 -0800
>>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 18:22:00 2019 -0500
>>> @@ -90,6 +90,7 @@
>>>  # :hotspot_runtime
>>>
>>>  runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64
>>> +runtime/nmt/HugeArenaTracking.java 8234270 windows-x64
>>>  runtime/ReservedStack/ReservedStackTest.java 8231031 generic-all
>>>
>>>  #############################################################################
>>>
>>>
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Dan
>>>
>

From daniel.daugherty at oracle.com  Sat Nov 16 01:47:19 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 15 Nov 2019 20:47:19 -0500
Subject: RFR(T): 8234272 ProblemList runtime/NMT/HugeArenaTracking.java
In-Reply-To: <556159ae-c890-2f72-a4a2-56a1d8e9d408@oracle.com>
References: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>
 <2c087e61-2f10-8707-7b2f-4071b8b86582@oracle.com>
 <da9d1db1-378e-2347-2f5f-beab1d6ebee7@oracle.com>
 <556159ae-c890-2f72-a4a2-56a1d8e9d408@oracle.com>
Message-ID: <791ef29d-6e30-3724-463b-627f071f763e@oracle.com>

On 11/15/19 8:46 PM, Leonid Mesnik wrote:
> Yes, and the stress testing is going to be completely broken.

That would be a Bad Thing (TM); especially on a weekend. I'll get a new
subtask going for backing out JDK-8204128.

Dan


> I think it would make sense to pre-test the fix with Kitchensink
> before integration. I could help with this if needed.
>
> Leonid
>
> On 11/15/19 5:19 PM, Daniel D. Daugherty wrote:
>> Sorry I missed this. I took a (late) dinner break with the family.
>>
>> Okay, so we're going to see more failures in the higher tiers due to 
>> this
>> fix. Based on your update to JDK-8234270, we'll see some Kitchensink 
>> failures
>> also right? Possibly as early as Tier4, but definitely by Tier6... 
>> sigh...
>>
>> Dan
>>
>>
>> On 11/15/19 6:43 PM, Leonid Mesnik wrote:
>>> It would be better to back out the fix.
>>>
>>> Other tests executed with NMT also triggered this assertion, so we
>>> are going to have a lot of assertion failures.
>>>
>>> Leonid
>>>
>>> On 11/15/19 3:26 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> runtime/NMT/HugeArenaTracking.java is a new test added by the 
>>>> following fix:
>>>>
>>>>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>>>>     https://bugs.openjdk.java.net/browse/JDK-8204128
>>>>
>>>> The test is failing in the JDK-14 CI on the Win* platforms. That failure is
>>>> tracked by:
>>>>
>>>>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>>>>     https://bugs.openjdk.java.net/browse/JDK-8234270
>>>>
>>>> To keep the noise in the CI down over the weekend, I'm putting the test
>>>> on the ProblemList for Win* using this bug:
>>>>
>>>>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>>>>     https://bugs.openjdk.java.net/browse/JDK-8234272
>>>>
>>>> Here's the trivial diff:
>>>>
>>>> $ hg diff
>>>> diff -r 8e7f29b1ad4a test/hotspot/jtreg/ProblemList.txt
>>>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 14:22:24 2019 -0800
>>>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 18:22:00 2019 -0500
>>>> @@ -90,6 +90,7 @@
>>>>  # :hotspot_runtime
>>>>
>>>>  runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64
>>>> +runtime/nmt/HugeArenaTracking.java 8234270 windows-x64
>>>>  runtime/ReservedStack/ReservedStackTest.java 8231031 generic-all
>>>>
>>>>  #############################################################################
>>>>
>>>>
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>>>
>>


From daniel.daugherty at oracle.com  Sat Nov 16 02:08:35 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 15 Nov 2019 21:08:35 -0500
Subject: RFR(T): 8234274 [BACKOUT] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
Message-ID: <fb8bbbaf-49fd-92ce-599c-af45d0172382@oracle.com>

Greetings,

The following fix needs to be backed out:

    JDK-8204128 NMT might report incorrect numbers for Compiler area
    https://bugs.openjdk.java.net/browse/JDK-8204128

The fix is causing failures in the stress tiers of the JDK14 CI. Please
see the info added to the following bug by Leonid:

    JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
    https://bugs.openjdk.java.net/browse/JDK-8234270

To keep the noise in the CI down over the weekend, I'm backing out the
fix using the following bug:

    JDK-8234274 [BACKOUT] JDK-8204128 NMT might report incorrect numbers for Compiler area
    https://bugs.openjdk.java.net/browse/JDK-8234274

I'm also backing out the following fix:

    JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java

Doesn't make sense to ProblemList a test that has been deleted.

Here's the webrev for the backout:

http://cr.openjdk.java.net/~dcubed/8234274-webrev/0-for-jdk14/

I have manually verified the backout against the changeset that precedes
the changeset for JDK-8204128.

Thanks, in advance, for any comments, questions or suggestions.

Dan


From zgu at redhat.com  Sat Nov 16 02:10:22 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 15 Nov 2019 21:10:22 -0500
Subject: RFR(T): 8234274 [BACKOUT] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <fb8bbbaf-49fd-92ce-599c-af45d0172382@oracle.com>
References: <fb8bbbaf-49fd-92ce-599c-af45d0172382@oracle.com>
Message-ID: <98486d27-32c5-0a6d-db6d-bd4ca1440028@redhat.com>

Thanks, Dan.

-Zhengyu

On 11/15/19 9:08 PM, Daniel D. Daugherty wrote:
> Greetings,
> 
> The following fix needs to be backed out:
> 
>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>     https://bugs.openjdk.java.net/browse/JDK-8204128
>
> The fix is causing failures in the stress tiers of the JDK14 CI. Please
> see the info added to the following bug by Leonid:
>
>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>     https://bugs.openjdk.java.net/browse/JDK-8234270
>
> To keep the noise in the CI down over the weekend, I'm backing out the
> fix using the following bug:
>
>     JDK-8234274 [BACKOUT] JDK-8204128 NMT might report incorrect numbers for Compiler area
>     https://bugs.openjdk.java.net/browse/JDK-8234274
>
> I'm also backing out the following fix:
>
>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
> 
> Doesn't make sense to ProblemList a test that has been deleted.
> 
> Here's the webrev for the backout:
> 
> http://cr.openjdk.java.net/~dcubed/8234274-webrev/0-for-jdk14/
> 
> I have manually verified the backout against the changeset that precedes
> the changeset for JDK-8204128.
> 
> Thanks, in advance, for any comments, questions or suggestions.
> 
> Dan
> 
> 
> 


From daniel.daugherty at oracle.com  Sat Nov 16 02:12:14 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 15 Nov 2019 21:12:14 -0500
Subject: RFR(T): 8234274 [BACKOUT] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <98486d27-32c5-0a6d-db6d-bd4ca1440028@redhat.com>
References: <fb8bbbaf-49fd-92ce-599c-af45d0172382@oracle.com>
 <98486d27-32c5-0a6d-db6d-bd4ca1440028@redhat.com>
Message-ID: <3bf037cb-03bf-bdd3-349b-5b860e49ac39@oracle.com>

No problem. Please confirm you are good with the backout.

Leonid says he can help with testing the [REDO]. I'll morph
https://bugs.openjdk.java.net/browse/JDK-8234270 into the REDO bug in a 
minute.

Dan

On 11/15/19 9:10 PM, Zhengyu Gu wrote:
> Thanks, Dan.
>
> -Zhengyu
>
> On 11/15/19 9:08 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> The following fix needs to be backed out:
>>
>>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>>     https://bugs.openjdk.java.net/browse/JDK-8204128
>>
>> The fix is causing failures in the stress tiers of the JDK14 CI. Please
>> see the info added to the following bug by Leonid:
>>
>>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>>     https://bugs.openjdk.java.net/browse/JDK-8234270
>>
>> To keep the noise in the CI down over the weekend, I'm backing out the
>> fix using the following bug:
>>
>>     JDK-8234274 [BACKOUT] JDK-8204128 NMT might report incorrect numbers for Compiler area
>>     https://bugs.openjdk.java.net/browse/JDK-8234274
>>
>> I'm also backing out the following fix:
>>
>>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>>
>> Doesn't make sense to ProblemList a test that has been deleted.
>>
>> Here's the webrev for the backout:
>>
>> http://cr.openjdk.java.net/~dcubed/8234274-webrev/0-for-jdk14/
>>
>> I have manually verified the backout against the changeset that precedes
>> the changeset for JDK-8204128.
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>>
>> Dan
>>
>>
>>
>


From patricio.chilano.mateo at oracle.com  Sat Nov 16 02:15:55 2019
From: patricio.chilano.mateo at oracle.com (Patricio Chilano)
Date: Fri, 15 Nov 2019 23:15:55 -0300
Subject: RFR 8231264: Disable biased-locking and deprecate all flags related
 to biased-locking
Message-ID: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>

Hi all,

Could you review the following patch?

JBS: https://bugs.openjdk.java.net/browse/JDK-8231264
Webrev: http://cr.openjdk.java.net/~pchilanomate/8231264/v01/webrev

Biased locking will be disabled by default and all related flags will be
deprecated. Performance gains seen when the feature was introduced in
the VM are less clear today with modern Java code/processors. Detailed
rationale behind the change is included in the description of the bug.

I modified test gtest/oops/test_markWord.cpp so that it still exercises 
other cases of markword printing.

Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows and 
Solaris).

Thanks,
Patricio


From zgu at redhat.com  Sat Nov 16 02:17:23 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 15 Nov 2019 21:17:23 -0500
Subject: RFR(T): 8234274 [BACKOUT] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <3bf037cb-03bf-bdd3-349b-5b860e49ac39@oracle.com>
References: <fb8bbbaf-49fd-92ce-599c-af45d0172382@oracle.com>
 <98486d27-32c5-0a6d-db6d-bd4ca1440028@redhat.com>
 <3bf037cb-03bf-bdd3-349b-5b860e49ac39@oracle.com>
Message-ID: <86b228df-afac-c4e0-43be-cacd06909ee0@redhat.com>


On 11/15/19 9:12 PM, Daniel D. Daugherty wrote:
> No problem. Please confirm you are good with the backout.
Yes, I am good with backing out.

> 
> Leonid says he can help with testing the [REDO]. I'll morph
> https://bugs.openjdk.java.net/browse/JDK-8234270 into the REDO bug in a 
> minute.

I think that the new assertion may actually catch something about the
original bug. It would be great if we can verify that.

Thanks,

-Zhengyu

> 
> Dan
> 
> On 11/15/19 9:10 PM, Zhengyu Gu wrote:
>> Thanks, Dan.
>>
>> -Zhengyu
>>
>> On 11/15/19 9:08 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> The following fix needs to be backed out:
>>>
>>>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>>>     https://bugs.openjdk.java.net/browse/JDK-8204128
>>>
>>> The fix is causing failures in the stress tiers of the JDK14 CI. Please
>>> see the info added to the following bug by Leonid:
>>>
>>>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>>>     https://bugs.openjdk.java.net/browse/JDK-8234270
>>>
>>> To keep the noise in the CI down over the weekend, I'm backing out the
>>> fix using the following bug:
>>>
>>>     JDK-8234274 [BACKOUT] JDK-8204128 NMT might report incorrect numbers for Compiler area
>>>     https://bugs.openjdk.java.net/browse/JDK-8234274
>>>
>>> I'm also backing out the following fix:
>>>
>>>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>>>
>>> Doesn't make sense to ProblemList a test that has been deleted.
>>>
>>> Here's the webrev for the backout:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8234274-webrev/0-for-jdk14/
>>>
>>> I have manually verified the backout against the changeset that precedes
>>> the changeset for JDK-8204128.
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Dan
>>>
>>>
>>>
>>
> 


From daniel.daugherty at oracle.com  Sat Nov 16 02:19:31 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 15 Nov 2019 21:19:31 -0500
Subject: RFR(T): 8234274 [BACKOUT] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <86b228df-afac-c4e0-43be-cacd06909ee0@redhat.com>
References: <fb8bbbaf-49fd-92ce-599c-af45d0172382@oracle.com>
 <98486d27-32c5-0a6d-db6d-bd4ca1440028@redhat.com>
 <3bf037cb-03bf-bdd3-349b-5b860e49ac39@oracle.com>
 <86b228df-afac-c4e0-43be-cacd06909ee0@redhat.com>
Message-ID: <e0c3bb69-72e3-cc99-6c73-4b8b6d33445e@oracle.com>

Thanks for the confirmation.

Dan


On 11/15/19 9:17 PM, Zhengyu Gu wrote:
>
>
> On 11/15/19 9:12 PM, Daniel D. Daugherty wrote:
>> No problem. Please confirm you are good with the backout.
> Yes, I am good with backing out.
>
>>
>> Leonid says he can help with testing the [REDO]. I'll morph
>> https://bugs.openjdk.java.net/browse/JDK-8234270 into the REDO bug in 
>> a minute.
>
> I think that the new assertion may actually catch something about the
> original bug. It would be great if we can verify that.
>
> Thanks,
>
> -Zhengyu
>
>>
>> Dan
>>
>> On 11/15/19 9:10 PM, Zhengyu Gu wrote:
>>> Thanks, Dan.
>>>
>>> -Zhengyu
>>>
>>> On 11/15/19 9:08 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> The following fix needs to be backed out:
>>>>
>>>>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>>>>     https://bugs.openjdk.java.net/browse/JDK-8204128
>>>>
>>>> The fix is causing failures in the stress tiers of the JDK14 CI. Please
>>>> see the info added to the following bug by Leonid:
>>>>
>>>>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>>>>     https://bugs.openjdk.java.net/browse/JDK-8234270
>>>>
>>>> To keep the noise in the CI down over the weekend, I'm backing out the
>>>> fix using the following bug:
>>>>
>>>>     JDK-8234274 [BACKOUT] JDK-8204128 NMT might report incorrect numbers for Compiler area
>>>>     https://bugs.openjdk.java.net/browse/JDK-8234274
>>>>
>>>> I'm also backing out the following fix:
>>>>
>>>>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>>>>
>>>> Doesn't make sense to ProblemList a test that has been deleted.
>>>>
>>>> Here's the webrev for the backout:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8234274-webrev/0-for-jdk14/
>>>>
>>>> I have manually verified the backout against the changeset that 
>>>> precedes
>>>> the changeset for JDK-8204128.
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>>
>>>
>>
>
>


From david.holmes at oracle.com  Sat Nov 16 02:20:35 2019
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 16 Nov 2019 12:20:35 +1000
Subject: RFR(T): 8234274 [BACKOUT] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <fb8bbbaf-49fd-92ce-599c-af45d0172382@oracle.com>
References: <fb8bbbaf-49fd-92ce-599c-af45d0172382@oracle.com>
Message-ID: <32b106ec-e565-6bb8-e19b-fdbdc091a101@oracle.com>

Hi Dan,

Looks like an accurate backout to me.

Thanks,
David

On 16/11/2019 12:08 pm, Daniel D. Daugherty wrote:
> Greetings,
> 
> The following fix needs to be backed out:
> 
>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>     https://bugs.openjdk.java.net/browse/JDK-8204128
>
> The fix is causing failures in the stress tiers of the JDK14 CI. Please
> see the info added to the following bug by Leonid:
>
>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>     https://bugs.openjdk.java.net/browse/JDK-8234270
>
> To keep the noise in the CI down over the weekend, I'm backing out the
> fix using the following bug:
>
>     JDK-8234274 [BACKOUT] JDK-8204128 NMT might report incorrect numbers for Compiler area
>     https://bugs.openjdk.java.net/browse/JDK-8234274
>
> I'm also backing out the following fix:
>
>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
> 
> Doesn't make sense to ProblemList a test that has been deleted.
> 
> Here's the webrev for the backout:
> 
> http://cr.openjdk.java.net/~dcubed/8234274-webrev/0-for-jdk14/
> 
> I have manually verified the backout against the changeset that precedes
> the changeset for JDK-8204128.
> 
> Thanks, in advance, for any comments, questions or suggestions.
> 
> Dan
> 
> 
> 

From daniel.daugherty at oracle.com  Sat Nov 16 02:21:29 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 15 Nov 2019 21:21:29 -0500
Subject: RFR(T): 8234274 [BACKOUT] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <32b106ec-e565-6bb8-e19b-fdbdc091a101@oracle.com>
References: <fb8bbbaf-49fd-92ce-599c-af45d0172382@oracle.com>
 <32b106ec-e565-6bb8-e19b-fdbdc091a101@oracle.com>
Message-ID: <db62a4cf-bdd4-3647-38e7-f1054cb8756b@oracle.com>

Thanks for the confirmation. I literally just pushed it with zgu's review...

Dan


On 11/15/19 9:20 PM, David Holmes wrote:
> Hi Dan,
>
> Looks like an accurate backout to me.
>
> Thanks,
> David
>
> On 16/11/2019 12:08 pm, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> The following fix needs to be backed out:
>>
>>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>>     https://bugs.openjdk.java.net/browse/JDK-8204128
>>
>> The fix is causing failures in the stress tiers of the JDK14 CI. Please
>> see the info added to the following bug by Leonid:
>>
>>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>>     https://bugs.openjdk.java.net/browse/JDK-8234270
>>
>> To keep the noise in the CI down over the weekend, I'm backing out the
>> fix using the following bug:
>>
>>     JDK-8234274 [BACKOUT] JDK-8204128 NMT might report incorrect numbers for Compiler area
>>     https://bugs.openjdk.java.net/browse/JDK-8234274
>>
>> I'm also backing out the following fix:
>>
>>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>>
>> Doesn't make sense to ProblemList a test that has been deleted.
>>
>> Here's the webrev for the backout:
>>
>> http://cr.openjdk.java.net/~dcubed/8234274-webrev/0-for-jdk14/
>>
>> I have manually verified the backout against the changeset that precedes
>> the changeset for JDK-8204128.
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>>
>> Dan
>>
>>
>>


From david.holmes at oracle.com  Sat Nov 16 03:46:50 2019
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 16 Nov 2019 13:46:50 +1000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
Message-ID: <5e088c4b-2a0d-715e-047d-39578ed18270@oracle.com>

Hi Patricio,

On 16/11/2019 12:15 pm, Patricio Chilano wrote:
> Hi all,
> 
> Could you review the following patch?
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8231264
> Webrev: http://cr.openjdk.java.net/~pchilanomate/8231264/v01/webrev
> 
> Biased locking will be disabled by default and all related flags will be 
> deprecated. The performance gains seen when the feature was introduced in 
> the VM are less clear today with modern Java code and processors. The detailed 
> rationale behind the change is included in the description of the bug.

That all seems fine.

> I modified test gtest/oops/test_markWord.cpp so that it still exercises 
> other cases of markword printing.

There seems to be some value in testing things in the non-biased-locking 
case - though I wonder if there is not another gtest that does this?

The only criticism I have is that when BL is disabled the later comments 
in the test are incorrect:

115   // Same thread tries to lock it again.

Without BL, this is the first time the thread locks it.

121   // This is no longer biased, because ObjectLocker revokes the bias.

Without BL, it was not biased to begin with.

Thanks,
David

> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows and 
> Solaris).
> 
> Thanks,
> Patricio
> 
> 

From zgu at redhat.com  Sat Nov 16 14:04:40 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Sat, 16 Nov 2019 09:04:40 -0500
Subject: RFR(T): 8234272 ProblemList runtime/NMT/HugeArenaTracking.java
In-Reply-To: <2c087e61-2f10-8707-7b2f-4071b8b86582@oracle.com>
References: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>
 <2c087e61-2f10-8707-7b2f-4071b8b86582@oracle.com>
Message-ID: <66dde824-a453-f3de-a9f6-0781065f882b@redhat.com>

Hi Leonid,

On 11/15/19 6:43 PM, Leonid Mesnik wrote:
> It would be better to back out the fix.
> 
> Other tests executed with NMT triggered this assertion also. So we are 
> going to have a lot of assertions.

Are any of these "other tests" publicly available? If so, could you 
point me to them?

Thanks,

-Zhengyu

> 
> Leonid
> 
> On 11/15/19 3:26 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> runtime/NMT/HugeArenaTracking.java is a new test added by the following fix:
>>
>>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>>     https://bugs.openjdk.java.net/browse/JDK-8204128
>>
>> The test is failing in the JDK-14 CI on the Win* platforms. That failure is
>> tracked by:
>>
>>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>>     https://bugs.openjdk.java.net/browse/JDK-8234270
>>
>> To keep the noise in the CI down over the weekend, I'm putting the test
>> on the ProblemList for Win* using this bug:
>>
>>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>>     https://bugs.openjdk.java.net/browse/JDK-8234272
>>
>> Here's the trivial diff:
>>
>> $ hg diff
>> diff -r 8e7f29b1ad4a test/hotspot/jtreg/ProblemList.txt
>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 14:22:24 2019 -0800
>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 18:22:00 2019 -0500
>> @@ -90,6 +90,7 @@
>>  # :hotspot_runtime
>>
>>  runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64
>> +runtime/nmt/HugeArenaTracking.java 8234270 windows-x64
>>  runtime/ReservedStack/ReservedStackTest.java 8231031 generic-all
>>
>>  #############################################################################
>>
>>
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>>
>> Dan
>>
> 


From leonid.mesnik at oracle.com  Sat Nov 16 18:42:48 2019
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Sat, 16 Nov 2019 10:42:48 -0800
Subject: RFR(T): 8234272 ProblemList runtime/NMT/HugeArenaTracking.java
In-Reply-To: <66dde824-a453-f3de-a9f6-0781065f882b@redhat.com>
References: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>
 <2c087e61-2f10-8707-7b2f-4071b8b86582@oracle.com>
 <66dde824-a453-f3de-a9f6-0781065f882b@redhat.com>
Message-ID: <667049ec-9c07-996a-cdd7-7af1cbbbfe57@oracle.com>

Hi

Unfortunately, I don't know of any publicly available test that could 
reproduce this issue right now.

We also run some jdk/hotspot regression tests with NMT enabled; however, 
I'm not sure whether they could be used as reproducers.

Leonid

On 11/16/19 6:04 AM, Zhengyu Gu wrote:
> Hi Leonid,
>
> On 11/15/19 6:43 PM, Leonid Mesnik wrote:
>> It would be better to back out the fix.
>>
>> Other tests executed with NMT triggered this assertion also. So we 
>> are going to have a lot of assertions.
>
> Are any of these "other tests" publicly available? If so, could you 
> point me to them?
>
> Thanks,
>
> -Zhengyu
>
>>
>> Leonid
>>
>> On 11/15/19 3:26 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> runtime/NMT/HugeArenaTracking.java is a new test added by the following fix:
>>>
>>>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>>>     https://bugs.openjdk.java.net/browse/JDK-8204128
>>>
>>> The test is failing in the JDK-14 CI on the Win* platforms. That failure is
>>> tracked by:
>>>
>>>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>>>     https://bugs.openjdk.java.net/browse/JDK-8234270
>>>
>>> To keep the noise in the CI down over the weekend, I'm putting the test
>>> on the ProblemList for Win* using this bug:
>>>
>>>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>>>     https://bugs.openjdk.java.net/browse/JDK-8234272
>>>
>>> Here's the trivial diff:
>>>
>>> $ hg diff
>>> diff -r 8e7f29b1ad4a test/hotspot/jtreg/ProblemList.txt
>>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 14:22:24 2019 -0800
>>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 18:22:00 2019 -0500
>>> @@ -90,6 +90,7 @@
>>>  # :hotspot_runtime
>>>
>>>  runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64
>>> +runtime/nmt/HugeArenaTracking.java 8234270 windows-x64
>>>  runtime/ReservedStack/ReservedStackTest.java 8231031 generic-all
>>>
>>>  #############################################################################
>>>
>>>
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Dan
>>>
>>
>

From zgu at redhat.com  Sun Nov 17 15:31:37 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Sun, 17 Nov 2019 10:31:37 -0500
Subject: RFR(T): 8234272 ProblemList runtime/NMT/HugeArenaTracking.java
In-Reply-To: <667049ec-9c07-996a-cdd7-7af1cbbbfe57@oracle.com>
References: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>
 <2c087e61-2f10-8707-7b2f-4071b8b86582@oracle.com>
 <66dde824-a453-f3de-a9f6-0781065f882b@redhat.com>
 <667049ec-9c07-996a-cdd7-7af1cbbbfe57@oracle.com>
Message-ID: <cb5a98f0-82fe-9306-38f4-ffbc7b81a732@redhat.com>

Hi,

Could you test this patch? 
http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/index.html


I was able to reproduce the original bug with 
compiler/codecache/stress/UnexpectedDeoptimizationTest.java + NMT; the newly 
added assertion caught the problem.

I also fixed the new test failure on Windows. The JDK-8204128 patch missed 
one long -> ssize_t change. long is 4 bytes on Windows vs. 8 bytes on 
other 64-bit platforms, which explains why it only fails on Windows.
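
To illustrate why a leftover long matters only on Windows, here is a minimal sketch. It is not the NMT code: int32_t merely stands in for the 32-bit Windows long and int64_t for ssize_t on a 64-bit platform, and every name in it is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: int32_t models `long` on 64-bit Windows (LLP64), and int64_t
// models ssize_t on a 64-bit platform. These aliases are illustrative only.
using win_long = std::int32_t;
using wide_ssize = std::int64_t;

// Hypothetical helper: true if `bytes` can be stored in a 32-bit long unchanged.
inline bool fits_in_win_long(wide_ssize bytes) {
  return bytes >= std::numeric_limits<win_long>::min() &&
         bytes <= std::numeric_limits<win_long>::max();
}

// Hypothetical helper: the value that comes out after `bytes` is squeezed
// through a 32-bit two's-complement long (only the low 32 bits survive).
inline wide_ssize through_win_long(wide_ssize bytes) {
  const std::uint32_t low = static_cast<std::uint32_t>(bytes);  // modulo 2^32
  wide_ssize result = static_cast<wide_ssize>(low);
  if (low > 0x7FFFFFFFu) {
    result -= (wide_ssize{1} << 32);  // reinterpret the top bit as the sign
  }
  return result;
}

// Small self-check showing where the two types diverge.
inline void check_huge_arena_sizes() {
  const wide_ssize one_gb = wide_ssize{1} << 30;
  assert(through_win_long(one_gb) == one_gb);       // small sizes survive
  assert(!fits_in_win_long(3 * one_gb));            // 3 GB does not fit
  assert(through_win_long(3 * one_gb) == -one_gb);  // and comes out negative
}
```

A size delta that changes value or sign like this would be enough to upset accounting that compares deallocated against allocated bytes, which is consistent with the failure showing up only where long is 4 bytes.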

This patch passed the submit test.

[Mach5] mach5-one-zgu-JDK-8234270-1-20191117-1425-6784823: PASSED

Thanks,

-Zhengyu


On 11/16/19 1:42 PM, Leonid Mesnik wrote:
> Hi
> 
> Unfortunately, I don't know of any publicly available test that could 
> reproduce this issue right now.
> 
> We also run some jdk/hotspot regression tests with NMT enabled; however, 
> I'm not sure whether they could be used as reproducers.
> 
> Leonid
> 
> On 11/16/19 6:04 AM, Zhengyu Gu wrote:
>> Hi Leonid,
>>
>> On 11/15/19 6:43 PM, Leonid Mesnik wrote:
>>> It would be better to back out the fix.
>>>
>>> Other tests executed with NMT triggered this assertion also. So we 
>>> are going to have a lot of assertions.
>>
>> Are any of these "other tests" publicly available? If so, could you 
>> point me to them?
>>
>> Thanks,
>>
>> -Zhengyu
>>
>>>
>>> Leonid
>>>
>>> On 11/15/19 3:26 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> runtime/NMT/HugeArenaTracking.java is a new test added by the following fix:
>>>>
>>>>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>>>>     https://bugs.openjdk.java.net/browse/JDK-8204128
>>>>
>>>> The test is failing in the JDK-14 CI on the Win* platforms. That failure is
>>>> tracked by:
>>>>
>>>>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) failed: deallocation > allocated
>>>>     https://bugs.openjdk.java.net/browse/JDK-8234270
>>>>
>>>> To keep the noise in the CI down over the weekend, I'm putting the test
>>>> on the ProblemList for Win* using this bug:
>>>>
>>>>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>>>>     https://bugs.openjdk.java.net/browse/JDK-8234272
>>>>
>>>> Here's the trivial diff:
>>>>
>>>> $ hg diff
>>>> diff -r 8e7f29b1ad4a test/hotspot/jtreg/ProblemList.txt
>>>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 14:22:24 2019 -0800
>>>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 18:22:00 2019 -0500
>>>> @@ -90,6 +90,7 @@
>>>>  # :hotspot_runtime
>>>>
>>>>  runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64
>>>> +runtime/nmt/HugeArenaTracking.java 8234270 windows-x64
>>>>  runtime/ReservedStack/ReservedStackTest.java 8231031 generic-all
>>>>
>>>>  #############################################################################
>>>>
>>>>
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>>>
>>>
>>
> 


From david.holmes at oracle.com  Mon Nov 18 02:30:48 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 18 Nov 2019 12:30:48 +1000
Subject: RFR: 8215355: Object monitor deadlock with no threads holding the
 monitor (using jemalloc 5.1)
Message-ID: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>

Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/

This was a very difficult bug to track down and I want to publicly 
acknowledge and thank the jemalloc folk (users and developers) for 
continuing to investigate this issue from their side. Without their 
persistence this issue would have languished.

The thread stack_base() is the first address above the thread's stack. 
However, the "in stack" checks performed by Thread::on_local_stack and 
Thread::is_in_stack allowed the checked address to be equal to the 
stack_base() - which is not correct. Here's how this manifests as the bug:

- Let a JavaThread instance, T2, be allocated at the end of thread T1's 
stack i.e. at T1->stack_base()
   [This seems to be why this only reproduced with jemalloc.]
- Let T2 lock an inflated monitor
- Let T1 try to lock the same monitor
   - T1 would consider the _owner field value (T2) as being in its stack 
and so consider the monitor stack-locked by T1
   - And so both T1 and T2 would have ownership of the monitor allowing 
the monitor state (and application state) to be corrupted. This results 
in a range of hangs and crashes depending on the exact interleaving.

Interestingly Thread::is_in_usable_stack does not have this bug.
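
The off-by-one described above can be sketched as follows. This is only an illustration of the boundary condition, not the HotSpot source: StackRange and both predicates are hypothetical names, and the inclusive lower bound is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative model of a thread stack that grows downward.
// `base` is the first address ABOVE the stack, as described in the email.
struct StackRange {
  std::uintptr_t base;
  std::size_t size;
  std::uintptr_t end() const { return base - size; }  // lowest stack address
};

// Inclusive upper bound: wrongly accepts addr == base.
inline bool in_stack_inclusive(const StackRange& s, std::uintptr_t addr) {
  return addr >= s.end() && addr <= s.base;
}

// Half-open range [end, base): base itself is not part of the stack.
inline bool in_stack_half_open(const StackRange& s, std::uintptr_t addr) {
  return addr >= s.end() && addr < s.base;
}

// Replays the scenario: an object for T2 sits exactly at T1's stack base.
inline void check_boundary_case() {
  const StackRange t1_stack{0x8000, 0x1000};
  const std::uintptr_t t2_object = t1_stack.base;
  assert(in_stack_inclusive(t1_stack, t2_object));          // misread as T1's own stack
  assert(!in_stack_half_open(t1_stack, t2_object));         // correctly rejected
  assert(in_stack_half_open(t1_stack, t1_stack.base - 1));  // top stack byte still accepted
}
```

With the inclusive check, an _owner value equal to T1's stack base looks like a stack-lock held by T1, which is the double-ownership case listed above; the half-open check rules it out.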

The bug can be tracked way back to JDK-6699669 as explained in the bug 
report. That issue also showed that the same bug existed in the SA 
implementations of these "on stack" checks.

Testing:
   - The reproducer from the bug report, using jemalloc, ran over 5000 
times without failing in any way.
   - tiers 1-3 on all Oracle platforms
   - serviceability/sa tests

Thanks,
David
-----

From david.holmes at oracle.com  Mon Nov 18 04:00:24 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 18 Nov 2019 14:00:24 +1000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
Message-ID: <c6540c8a-a74e-3bff-a212-04162fdf1f63@oracle.com>

Hi Christoph,

This all seems fine to me. One clarification:

- /* The appropriate location of getPrefixed() should be io_util_md.c, but
-    java.lang.instrument package has hardwired canonicalize_md.c into their
-    dll, to avoid complicate solution such as including io_util_md.c into
-    that package, as a workaround we put this method here.
-  */

I assume this hardwired usage was removed some time ago?

Thanks,
David

On 15/11/2019 1:37 am, Langer, Christoph wrote:
> Hi,
> 
> please review this cleanup change regarding function "canonicalize" of libjava.
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
> 
> 
> The goal is to clean up how this function is defined and used. First, there was an unnecessary wrapper function "Canonicalize" in jni_util.c that merely wrapped the call to "canonicalize"; we can get rid of this wrapper. Unfortunately, it is not possible to simply export "canonicalize", since this would conflict with a function signature from the math library, at least on modern Linuxes. So I decided to name the function JDK_Canonicalize and declare it properly in jdk_util.h, which can be included everywhere.
> 
> 
> 
> Hotspot's classloader.cpp resolves this function dynamically, so I added a local declaration of the function pointer there.
> 
> 
> 
> This change is intended as a predecessor of JDK-8223261, for which a review was already started here: https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-November/063398.html
> 
> Thanks
> Christoph
> 

From david.holmes at oracle.com  Mon Nov 18 04:13:55 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 18 Nov 2019 14:13:55 +1000
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
Message-ID: <f9439547-1266-ea5e-8346-4e0e0b25223d@oracle.com>

Hi Dan,

No further comments from me at this stage.

Thanks,
David

On 5/11/2019 7:03 am, Daniel D. Daugherty wrote:
> Greetings,
> 
> I have made changes to the Async Monitor Deflation code in response to
> the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., Robbin
> and Erik O. for their comments!
> 
> JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async Monitor
> Deflation project needs to push before Nov. 12, 2019 in order to allow
> for sufficient bake time for such a big change. Nov. 12 is _next_ Tuesday
> so we have 8 days from today to finish this code review cycle and push
> this code for JDK14.
> 
> Carsten and Roman! Time for you guys to chime in again on the code reviews.
> 
> I have attached the change list from CR7 to CR8 instead of putting it in
> the body of this email. I've also added a link to the CR7-to-CR8-changes
> file to the webrevs so it should be easy to find.
> 
> Main bug URL:
> 
>     JDK-8153224 Monitor deflation prolong safepoints
>     https://bugs.openjdk.java.net/browse/JDK-8153224
> 
> The project is currently baselined on jdk-14+21.
> 
> Here's the full webrev URL for those folks that want to see all of the
> current Async Monitor Deflation code in one go (v2.08 full):
> 
> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full
> 
> Some folks might want to see just what has changed since the last review
> cycle so here's a webrev for that (v2.08 inc):
> 
> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/
> 
> The OpenJDK wiki did not need any changes for this round:
> 
> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
> 
> The jdk-14+21 based v2.08 version of the patch has been thru Mach5 tier[1-8]
> testing on Oracle's usual set of platforms. It has also been through my usual
> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
> my lab. Some testing is still running, but so far there are no new regressions.
> 
> I have not yet done a SPECjbb2015 round on the CR8/v2.08/11-for-jdk14 bits.
> 
> Thanks, in advance, for any questions, comments or suggestions.
> 
> Dan
> 
> 
> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> The Async Monitor Deflation project is reaching the end game. I have no
>> changes planned for the project at this time so all that is left is code
>> review and any changes that result from those reviews.
>>
>> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>>
>> I have attached the list of fixes from CR6 to CR7 instead of putting it
>> in the main body of this email.
>>
>> Main bug URL:
>>
>>     JDK-8153224 Monitor deflation prolong safepoints
>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>
>> The project is currently baselined on jdk-14+19.
>>
>> Here's the full webrev URL for those folks that want to see all of the
>> current Async Monitor Deflation code in one go (v2.07 full):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full
>>
>> Some folks might want to see just what has changed since the last review
>> cycle so here's a webrev for that (v2.07 inc):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/
>>
>> The OpenJDK wiki has been updated to match the CR7/v2.07/10-for-jdk14 changes:
>>
>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>
>> The jdk-14+18 based v2.07 version of the patch has been thru Mach5 tier[1-8]
>> testing on Oracle's usual set of platforms. It has also been through my usual
>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
>> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
>> my lab.
>>
>> The jdk-14+19 based v2.07 version of the patch has been thru Mach5 tier[1-3]
>> testing on Oracle's usual set of platforms. Mach5 tier[4-8] are in process.
>>
>> I did another round of SPECjbb2015 testing in Oracle's Aurora Performance lab
>> using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>
>>     - "base" is jdk-14+18
>>     - "v2.07" is the latest version and includes C2 inc_om_ref_count() support
>>       on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors option
>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors specified
>>
>>          hbIR           hbIR
>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>     ---------------  ---------  --------  -------------  -------
>>            34282.00   30635.90  28831.30       20969.20  3841.30 base
>>            34282.00   30973.00  29345.80       21025.20  3964.10 v2.07
>>            34282.00   31105.60  29174.30       21074.00  3931.30 v2.07_handshake
>>            34282.00   30789.70  27151.60       19839.10  3850.20 v2.07_off
>>
>>     - The Aurora Perf comparison tool reports:
>>
>>         Comparison              max-jOPS              critical-jOPS
>>         ----------------------  --------------------  --------------------
>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>>
>>         (s) - significant  (ns) - not-significant
>>
>>     - For historical comparison, the Aurora Perf comparison tool
>>         reported for v2.06 with a baseline of jdk-13+31:
>>
>>         Comparison              max-jOPS              critical-jOPS
>>         ----------------------  --------------------  --------------------
>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>>
>>         (s) - significant  (ns) - not-significant
>>
>> Thanks, in advance, for any questions, comments or suggestions.
>>
>> Dan
>>
>>
>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> The Async Monitor Deflation project has rebased to JDK14 so it's time
>>> for our first code review in that new context!!
>>>
>>> I've been focused on changing the monitor list management code to be
>>> lock-free in order to make SPECjbb2015 happier. Of course with a change
>>> like that, it takes a while to chase down all the new and wonderful
>>> races. At this point, I have the code back to the same stability that
>>> I had with CR5/v2.05/8-for-jdk13.
>>>
>>> To lay the ground work for this round of review, I pushed the following
>>> two fixes to jdk/jdk earlier today:
>>>
>>>     JDK-8230184 rename, whitespace, indent and comments changes in preparation
>>>                 for lock free Monitor lists
>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>
>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails after 8230184
>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>
>>> I have attached the list of fixes from CR5 to CR6 instead of putting it
>>> in the main body of this email.
>>>
>>> Main bug URL:
>>>
>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>
>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>> JDK-8230184 and JDK-8230317.
>>>
>>> Here's the full webrev URL for those folks that want to see all of the
>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/ 
>>>
>>>
>>>
>>> The primary focus of this review cycle is on the lock-free Monitor List
>>> management changes so here's a webrev for just that patch (v2.06c):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/ 
>>>
>>>
>>> The secondary focus of this review cycle is on the bug fixes that have
>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for just that
>>> patch (v2.06b):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/ 
>>>
>>>
>>> The third and final bucket for this review cycle is the rename, whitespace,
>>> indent and comments changes made in preparation for lock free Monitor list
>>> management. Almost all of that was extracted into JDK-8230184 for the
>>> baseline so this bucket now has just a few comment changes relative to
>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/ 
>>>
>>>
>>>
>>> Some folks might want to see just what has changed since the last review
>>> cycle so here's a webrev for that (v2.06 inc):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/
>>>
>>>
>>> Last, but not least, some folks might want to see the code before the
>>> addition of lock-free Monitor List management so here's a webrev for
>>> that (v2.00 -> v2.05):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/
>>>
>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>
>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>
>>> but that should only be changes to describe per-thread list async monitor
>>> deflation being done by the ServiceThread.
>>>
>>> (I did update the OpenJDK wiki for the CR5 changes back on 2019.08.14)
>>>
>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>> Oracle's usual set of platforms. It has also been through my usual set
>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>
>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>> using their tuned SPECjbb2015 Linux-X64 G1 configs. This was using
>>> this patch baselined on jdk-13+31 (for stability):
>>>
>>>           hbIR           hbIR
>>>      (max attempted)  (settled)  max-jOPS  critical-jOPS runtime
>>>      ---------------  ---------  --------  ------------- -------
>>>             34282.00   28837.20  27905.20       19817.40 3658.10 base
>>>             34965.70   29798.80  27814.90       19959.00 3514.60 v2.06d
>>>             34282.00   29100.70  28042.50       19577.00 3701.90 v2.06d_off
>>>             34282.00   29218.50  27562.80       19397.30 3657.60 v2.06d_ocache
>>>             34965.70   29838.30  26512.40       19170.60 3569.90 v2.05
>>>             34282.00   28926.10  27734.00       19835.10 3588.40 v2.05_off
>>>
>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors specified and
>>> the "ocache" config is with 128 byte cache line sizes instead of 64 byte
>>> cache line sizes. "v2.06d" is the last set of changes that I made before
>>> those changes were distributed into the "v2.06a", "v2.06b" and "v2.06c"
>>> buckets for this review cycle.
>>>
>>>
>>> Thanks, in advance, for any questions, comments or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I've been focused on chasing down and fixing test failures that only
>>>> pop up rarely. So this round is primarily fixes for races
>>>> with a few additional fixes that came from Karen's review of CR4.
>>>> Thanks Karen!
>>>>
>>>> I have attached the list of fixes from CR4 to CR5 instead of putting it
>>>> in the main body of this email.
>>>>
>>>> Main bug URL:
>>>>
>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>
>>>> The project is currently baselined on jdk-13+29. This will likely be
>>>> the last JDK13 baseline for this project and I'll roll to the JDK14
>>>> (jdk/jdk) repo soon...
>>>>
>>>> Here's the full webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/
>>>>
>>>> Here's the incremental webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>
>>>> I have not yet checked the OpenJDK wiki to see if it needs any updates
>>>> to match the CR5 changes:
>>>>
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>
>>>> (I did update the OpenJDK wiki for the CR4 changes back on 2019.06.26)
>>>>
>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become available.
>>>> Since I haven't made any performance changes in this round, I'll only
>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>
>>>> Next up:
>>>>
>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>   Linux-X64 and we've determined that some of that comes from
>>>>   contention on the gListLock. So I'm going to investigate removing
>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>> - Of course, going lock free often causes new races and new failures
>>>>   so that's a good reason for keeping those changes isolated in their
>>>>   own round (and not holding up CR5/v2.05/8-for-jdk13 anymore).
>>>> - I finally have a potential fix for the Win* failure with
>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>   but I haven't run it through Mach5 yet so it'll be in the next round.
>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>>   monitor related failures there. I suspect that I need to go take a
>>>>   look at the C2 RTM macro assembler code and look for things that might
>>>>   conflict with Async Monitor Deflation. If you're interested in that kind
>>>>   of issue, then see the macroAssembler_x86.cpp sanity check that I
>>>>   added in this round!
>>>>
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I have a fix for an issue that came up during performance testing.
>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>> experiments.
>>>>>
>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>> verbose due to the complexity of the issue, but the changes
>>>>> themselves are not that big.
>>>>>
>>>>> Functional:
>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>         safepoints that simply request async deflation which
>>>>>         keeps the async JavaThreads from making progress on
>>>>>         their async deflation work.
>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>     - is_async_deflation_needed() returns true when
>>>>>       is_async_cleanup_requested() is true or when
>>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>>       AsyncDeflationInterval).
>>>>>     - if AsyncDeflateIdleMonitors Service_lock->wait() now waits for
>>>>>       at most GuaranteedSafepointInterval millis:
>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>         (default is 1000 millis/1 second)
>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>>         This is to prevent async deflation from swamping the
>>>>>         ServiceThread.
>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>>     of their in-use lists.
>>>>>     - The ServiceThread will check for async deflation work every
>>>>>       GuaranteedSafepointInterval.
>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>       async deflation work via is_async_deflation_requested.
>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>     flag to reduce the in-use monitor population that is reported by
>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
>>>>>
>>>>> Test update:
>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>>     AsyncDeflateIdleMonitors.
>>>>>
>>>>> Collateral:
>>>>>   - Add/clarify/update some logging messages.
>>>>>
>>>>> Cleanup:
>>>>>   - Updated comments based on Karen's code review.
>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>     'async cleanup' -> 'async deflation'.
>>>>>     - comment and function name changes
>>>>>   - Clarify MonitorUsedDeflationThreshold description.
>>>>>
>>>>>
>>>>> Main bug URL:
>>>>>
>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> The project is currently baselined on jdk-13+22.
>>>>>
>>>>> Here's the full webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>>>>
>>>>> Here's the incremental webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>>>>
>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> The wiki doesn't say a whole lot about the async deflation invocation
>>>>> mechanism so I have to figure out how to add that content.
>>>>>
>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>> running now. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have to run
>>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I had some discussions with Karen about a race that was in the
>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>>>>>> theoretical and I had no test failures due to it. The fix is pretty
>>>>>> simple: remove the special case code for async deflation in the
>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>> for ObjectMonitor::enter() protection.
>>>>>>
>>>>>> During those discussions Karen also floated the idea of using the
>>>>>> ref_count field instead of the contentions field for the Async
>>>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>>>> change and I have run it through the usual stress and Mach5 testing
>>>>>> with no issues. It's also known as v2.03 (for those with the
>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>> Sorry for all the names...
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>
>>>>>> Here's the full webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>>>>>
>>>>>> Here's the incremental webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>>>>
>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>>
>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>> Solaris-X64 release had the usual "Too large time diff" complaints.
>>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>> stress kit is running right now.
>>>>>>
>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>>>> the results and analyze them.
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I have a small but important bug fix for the Async Monitor Deflation
>>>>>>> project ready to go. It's also known as v2.02 (for those with the
>>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs).
>>>>>>> Sorry for all the names...
>>>>>>>
>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline
>>>>>>> patch is out of our hair.
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>
>>>>>>> Here's the full webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/
>>>>>>>
>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>>>>>
>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is running
>>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on product,
>>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>>>>>> I'll start my stress kit on Linux-X64 sometime on Sunday (after
>>>>>>> my jdk-13+18 stress run is done).
>>>>>>>
>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>> testing is done.
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I finally have CR1 for the Async Monitor Deflation project ready to
>>>>>>>> go. It's also known as v2.01 (for those with the patches) and as
>>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>>>>>> names...
>>>>>>>>
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> Baseline bug fixes URL:
>>>>>>>>
>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>
>>>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295 
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/
>>>>>>>>
>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/
>>>>>>>>
>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the latest version
>>>>>>>> of JDK-8153224...
>>>>>>>>
>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>>>>> Mach5 tier[78] will be run later today. My stress kit on Solaris-X64
>>>>>>>> is running now. Linux-X64 stress testing will start on Sunday. I'm
>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's work on:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>>>
>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>
>>>>>>>>> Earlier versions of this patch have been through several rounds of
>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>>>> their own environments (including specJBB2015).
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run Kitchensink
>>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>>>> fastdebug and slowdebug).
>>>>>>>>>
>>>>>>>>> All of the testing done on earlier versions will be redone on the
>>>>>>>>> latest version of the patch.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> P.S.
>>>>>>>>> One subtest in gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been trying
>>>>>>>>> to characterize/analyze this failure for more than a week now. At
>>>>>>>>> this point I'm convinced that Async Monitor Deflation is aggravating
>>>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
> 
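
For readers skimming the CR4 notes quoted above: the rule for
is_async_deflation_needed() (true on an explicit request, or when the
monitor threshold is exceeded but no more often than
AsyncDeflationInterval) can be modelled roughly as below. This is only a
sketch under my own assumptions, not the code in the webrev; the struct,
its fields and the explicit now_ms parameter are hypothetical stand-ins.

```cpp
#include <cstdint>

// Hypothetical model of the policy described in the CR4 notes.
// None of these names are HotSpot declarations.
struct AsyncDeflationPolicy {
  int64_t interval_ms;        // stands in for AsyncDeflationInterval (0 is off)
  int64_t last_deflation_ms;  // last time the threshold path returned true
  bool    requested;          // stands in for an explicit async deflation request

  explicit AsyncDeflationPolicy(int64_t interval)
      : interval_ms(interval), last_deflation_ms(0), requested(false) {}

  // above_threshold stands in for monitors_used_above_threshold().
  bool is_async_deflation_needed(bool above_threshold, int64_t now_ms) {
    if (requested) {
      return true;  // an explicit request always wins
    }
    if (interval_ms > 0 && above_threshold &&
        now_ms - last_deflation_ms >= interval_ms) {
      // Remember when we fired so that we cannot fire again until a
      // full interval has passed; this is the rate limit.
      last_deflation_ms = now_ms;
      return true;
    }
    return false;
  }
};
```

The point of the rate limit is the one given in the notes: a threshold
that stays exceeded should not swamp the thread doing the deflation work.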

From robbin.ehn at oracle.com  Mon Nov 18 08:33:04 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 18 Nov 2019 09:33:04 +0100
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
Message-ID: <a19921e3-3ac2-b884-a061-226e38bfccc8@oracle.com>

Looks good, thanks!

/Robbin

On 11/16/19 3:15 AM, Patricio Chilano wrote:
> Hi all,
> 
> Could you review the following patch?
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8231264
> Webrev: http://cr.openjdk.java.net/~pchilanomate/8231264/v01/webrev
> 
> Biased locking will be disabled by default and all related flags will be 
> deprecated. Performance gains seen when the feature was introduced in the VM are 
> less clear today with modern Java code/processors. Detailed rationale behind the 
> change is included on the description of the bug.
> 
> I modified test gtest/oops/test_markWord.cpp so that it still exercises other 
> cases of markword printing.
> 
> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows and Solaris).
> 
> Thanks,
> Patricio
> 
> 

From christoph.goettschkes at microdoc.com  Mon Nov 18 08:36:39 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Mon, 18 Nov 2019 09:36:39 +0100
Subject: Build broken for ARM32 after 8231610: Relocate the CDS archive if
 it cannot be mapped to the requested address
In-Reply-To: <430c25bb-b2f7-8faa-a419-1a0ca5f39630@oracle.com>
References: <20191115135123.92163D83FB@aojmv0009>
 <430c25bb-b2f7-8faa-a419-1a0ca5f39630@oracle.com>
Message-ID: <mailman.12.1574066299.19479.hotspot-runtime-dev@openjdk.java.net>

Thanks for looking into this. I will then create a new issue in the JBS 
and make an RFR for it.

-- Christoph

"hotspot-runtime-dev" <hotspot-runtime-dev-bounces at openjdk.java.net> wrote 
on 2019-11-15 19:46:04:

> From: Ioi Lam <ioi.lam at oracle.com>
> To: hotspot-runtime-dev at openjdk.java.net
> Date: 2019-11-15 19:46
> Subject: Re: Build broken for ARM32 after 8231610: Relocate the CDS 
> archive if it cannot be mapped to the requested address
> Sent by: "hotspot-runtime-dev" <hotspot-runtime-dev-bounces at openjdk.java.net>
> 
> Hi Christoph,
> 
> The changes look good to me. I tried them on Linux/x64 and they will be 
> triggered if I muck with the value:
> 
>        _mapping_offset = (size_t)CompressedOops::encode_not_null((oop)base);
>        if (crc != 0) {
>          _mapping_offset += 0x100000000;
>        }
>        assert(_mapping_offset == (size_t)(uint32_t)_mapping_offset,
>               "must be 32-bit only");
> 
> We also have similar checks elsewhere in the VM:
> 
> ./cpu/x86/nativeInst_x86.cpp:  guarantee(disp == (intptr_t)(int32_t)disp, "must be 32-bit offset");
> 
> Thanks
> - Ioi
> 
> 
> On 11/15/19 5:49 AM, christoph.goettschkes at microdoc.com wrote:
> > Hi,
> >
> > I am no longer able to build for ARM32 after the commit for 8231610:
> > Relocate the CDS archive if it cannot be mapped to the requested address
> > [1]. I am using a linaro toolchain with a GCC version 4.9.4.
> >
> > arm-linux-gnueabi-g++ (Linaro GCC 4.9-2017.01) 4.9.4
> > Copyright (C) 2015 Free Software Foundation, Inc.
> > This is free software; see the source for copying conditions.  There is NO
> > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
> > PURPOSE.
> >
> > src/hotspot/share/memory/filemap.cpp:1569:21:
> > error: right shift count >= width of type [-Werror]
> >     assert((offset >> 32) == 0, "must be 32-bit only");
> >
> > I guess the same check could be achieved without a right shift operation,
> > by casting the offset twice and comparing it?
> >     assert(offset == (size_t)(uint32_t)offset, "must be 32-bit only");
> >
> > Here is a webrev [2] for that particular fix (there are two instances
> > where size_t is right shifted by 32). Should I open a new bug for this,
> > or should this be discussed using the already existing bug 8231610?
> >
> > -- Christoph
> >
> > [1] https://bugs.openjdk.java.net/browse/JDK-8231610
> > [2] https://cr.openjdk.java.net/~cgo/8231610/webrev.00/
> >
> 
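
To illustrate the idea discussed above: where size_t is 32 bits wide,
(offset >> 32) shifts by the full width of the type, which gcc rejects
under -Werror, whereas a round trip through uint32_t is well defined for
any unsigned width. A minimal sketch, assuming nothing about the actual
webrev; fits_in_32_bits is a hypothetical helper name, not a HotSpot
function, and it is only meant for unsigned integer types.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of HotSpot): returns true if 'value'
// has no bits set above bit 31. It narrows to uint32_t and widens back,
// so it never shifts by the width of T and compiles cleanly whether T
// is 32 or 64 bits wide. Intended for unsigned integer types only.
template <typename T>
inline bool fits_in_32_bits(T value) {
  return value == static_cast<T>(static_cast<uint32_t>(value));
}

// The same check spelled for size_t, mirroring the proposed assert:
//   assert(offset == (size_t)(uint32_t)offset, "must be 32-bit only");
inline bool offset_is_32_bit_only(std::size_t offset) {
  return fits_in_32_bits<std::size_t>(offset);
}
```

On a 32-bit target the check is trivially true for size_t, which is the
desired behaviour; on a 64-bit target it rejects anything at or above
0x100000000.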


From christoph.goettschkes at microdoc.com  Mon Nov 18 08:41:39 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Mon, 18 Nov 2019 09:41:39 +0100
Subject: RFR: 8234324: ARM32 build broken after 8231610
Message-ID: <mailman.13.1574066598.19479.hotspot-runtime-dev@openjdk.java.net>

Hi,

please review the following change, which fixes a compilation issue 
introduced by 8231610.

I already posted the patch on the mailing list, since I wasn't sure if I 
should create a new bug for this or not. Ioi already looked at it [1] and 
pointed out that we are already using similar checks in HotSpot [2].

Bug: https://bugs.openjdk.java.net/browse/JDK-8234324
Webrev: https://cr.openjdk.java.net/~cgo/8234324/webrev.00/

-- Christoph

[1]
https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/037008.html
[2]
https://hg.openjdk.java.net/jdk/jdk/file/7bdc4f073c7f/src/hotspot/cpu/x86/nativeInst_x86.cpp#l446


From christoph.langer at sap.com  Mon Nov 18 09:00:32 2019
From: christoph.langer at sap.com (Langer, Christoph)
Date: Mon, 18 Nov 2019 09:00:32 +0000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <c6540c8a-a74e-3bff-a212-04162fdf1f63@oracle.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <c6540c8a-a74e-3bff-a212-04162fdf1f63@oracle.com>
Message-ID: <AM6PR02MB48010760659F645D806A359C8A4D0@AM6PR02MB4801.eurprd02.prod.outlook.com>

Hi David,

> This all seems fine to me. One clarification:

Thanks for the review.

> 
> - /* The appropriate location of getPrefixed() should be io_util_md.c, but
> -    java.lang.instrument package has hardwired canonicalize_md.c into their
> -    dll, to avoid complicate solution such as including io_util_md.c into
> -    that package, as a workaround we put this method here.
> -  */
> 
> I assume this hardwired usage was removed some time ago?

AFAICS, yes. libinstrument builds/links against libjava. I cannot find any duplicates of canonicalize* in there.

Any other reviews (e.g. Gerard?)

Thanks & Best regards
Christoph

> 
> Thanks,
> David
> 
> On 15/11/2019 1:37 am, Langer, Christoph wrote:
> > Hi,
> >
> > please review this cleanup change regarding function "canonicalize" of
> > libjava.
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
> > Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
> >
> >
> > The goal is to clean up how this function is defined and used. One thing is
> > that there was an unnecessary wrapper function "Canonicalize" in jni_util.c.
> > It wrapped the call to "canonicalize". We can get rid of this wrapper.
> > Unfortunately, it is not possible to just export "canonicalize" since this will
> > conflict with a method signature from the math library, at least on modern
> > Linuxes. So I decided to call the method JDK_Canonicalize and will correctly
> > define it in jdk_util.h which can be included everywhere.
> >
> >
> >
> > Hotspot's classloader.cpp will dynamically resolve this method, so I add a
> > local declaration of the function pointer in there.
> >
> >
> >
> > This change shall be the predecessor of JDK-8223261, where a review was
> > already started here:
> > https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-November/063398.html
> >
> > Thanks
> > Christoph
> >

From Alan.Bateman at oracle.com  Mon Nov 18 09:09:03 2019
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Mon, 18 Nov 2019 09:09:03 +0000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <AM6PR02MB48010760659F645D806A359C8A4D0@AM6PR02MB4801.eurprd02.prod.outlook.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <c6540c8a-a74e-3bff-a212-04162fdf1f63@oracle.com>
 <AM6PR02MB48010760659F645D806A359C8A4D0@AM6PR02MB4801.eurprd02.prod.outlook.com>
Message-ID: <b7d17237-f7ff-baf7-9392-eadff354feed@oracle.com>

On 18/11/2019 09:00, Langer, Christoph wrote:
> :
> Any other reviews (e.g. Gerard?)
>
I plan to review this change. We also need to figure out how to remove 
the dependency on this function from the JPLIS agent as that should not 
be there.

-Alan

From christoph.langer at sap.com  Mon Nov 18 09:21:31 2019
From: christoph.langer at sap.com (Langer, Christoph)
Date: Mon, 18 Nov 2019 09:21:31 +0000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <b7d17237-f7ff-baf7-9392-eadff354feed@oracle.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <c6540c8a-a74e-3bff-a212-04162fdf1f63@oracle.com>
 <AM6PR02MB48010760659F645D806A359C8A4D0@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <b7d17237-f7ff-baf7-9392-eadff354feed@oracle.com>
Message-ID: <AM6PR02MB480186B3CAF9A6927C27A08D8A4D0@AM6PR02MB4801.eurprd02.prod.outlook.com>

> I plan to review this change. We also need to figure out how to remove
> the dependency on this function from the JPLIS agent as that should not
> be there.

Agree. I'd hope, however, that this can be done with a different change (unless you have an idea for a very simple, straightforward way that could fit under the umbrella of JDK-8234185).

/Christoph

From aph at redhat.com  Mon Nov 18 10:03:27 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 18 Nov 2019 10:03:27 +0000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
Message-ID: <6b913f48-b4f8-5069-e10f-fb966432942b@redhat.com>

On 11/16/19 2:15 AM, Patricio Chilano wrote:
> Biased locking will be disabled by default and all related flags will be 
> deprecated. Performance gains seen when the feature was introduced in 
> the VM are less clear today with modern Java code/processors. Detailed 
> rationale behind the change is included on the description of the bug.

IMO detailed rationale on its own isn't going to do it.  I would
expect to see detailed measurements to justify such an important
change rather than mere assertions. What do your numbers look like?

This paragraph is rather incredible: "The performance gains that were
seen in the past are far less evident today. The cost of executing
atomic instructions has decreased on modern processors since the
introduction of biased locking into the VM"

This test, with no contention:

    @Benchmark
    public void lock(BenchmarkState state) {
        synchronized(state) {
            state.n++;
        }
    }

Benchmark   Mode  Cnt   Score   Error  Units

-XX:+UseBiasedLocking:
Dummy.lock  avgt    3   2.063 ± 0.215  ns/op

-XX:-UseBiasedLocking:
Dummy.lock  avgt    3  14.991 ± 0.365  ns/op

Threadripper 2950X, 3.5 GHz.

I believe that the uncontended case for synchronized blocks is still
important.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From aph at redhat.com  Mon Nov 18 10:27:38 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 18 Nov 2019 10:27:38 +0000
Subject: RFR: 8234324: ARM32 build broken after 8231610
In-Reply-To: <20191118084321.90FAA101E33@aojmv0009>
References: <20191118084321.90FAA101E33@aojmv0009>
Message-ID: <ed601d47-af72-ac97-6e73-86caee2f4041@redhat.com>

On 11/18/19 8:41 AM, christoph.goettschkes at microdoc.com wrote:
> introduced by 8231610.
> 
> I already posted the patch on the mailing list, since I wasn't sure if I 
> should create a new bug for this or not. Ioi already looked at it [1] and 
> pointed out that we are already using similar checks in HotSpot [2].
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234324
> Webrev: https://cr.openjdk.java.net/~cgo/8234324/webrev.00/

OK.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From robbin.ehn at oracle.com  Mon Nov 18 11:07:37 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 18 Nov 2019 12:07:37 +0100
Subject: RFR: 8215355: Object monitor deadlock with no threads holding the
 monitor (using jemalloc 5.1)
In-Reply-To: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
References: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
Message-ID: <831a0a84-b6a5-34d2-f6bd-4bacc2fa812c@oracle.com>

Looks good, thanks David!

/Robbin

On 11/18/19 3:30 AM, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
> webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
> 
> This was a very difficult bug to track down and I want to publicly acknowledge 
> and thank the jemalloc folk (users and developers) for continuing to investigate 
> this issue from their side. Without their persistence this issue would have 
> languished.
> 
> The thread stack_base() is the first address above the thread's stack. However, 
> the "in stack" checks performed by Thread::on_local_stack and 
> Thread::is_in_stack allowed the checked address to be equal to the stack_base() 
> - which is not correct. Here's how this manifests as the bug:
> 
> - Let a JavaThread instance, T2, be allocated at the end of thread T1's stack
>   i.e. at T1->stack_base()
>   [This seems to be why this only reproduced with jemalloc.]
> - Let T2 lock an inflated monitor
> - Let T1 try to lock the same monitor
>   - T1 would consider the _owner field value (T2) as being in its stack and so
>     consider the monitor stack-locked by T1
>   - And so both T1 and T2 would have ownership of the monitor allowing the
>     monitor state (and application state) to be corrupted. This results in a
>     range of hangs and crashes depending on the exact interleaving.
> 
> Interestingly Thread::is_in_usable_stack does not have this bug.
> 
> The bug can be tracked way back to JDK-6699669 as explained in the bug report. 
> That issue also showed that the same bug existed in the SA implementations of 
> these "on stack" checks.
> 
> Testing:
>   - The reproducer from the bug report, using jemalloc, ran over 5000 times
>     without failing in any way.
>   - tiers 1-3 on all Oracle platforms
>   - serviceability/sa tests
> 
> Thanks,
> David
> -----
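
As an aside for anyone reading the description above without the webrev
open: the off-by-one comes down to treating the stack as a closed range
that includes stack_base() instead of a half-open one that stops just
below it. A minimal sketch, assuming a downward-growing stack;
StackBounds and its members are hypothetical stand-ins, not the HotSpot
Thread class or the actual patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for a thread's stack bounds (not HotSpot code).
// 'base' is the first address ABOVE the stack; the stack occupies
// [base - size, base) and grows downward.
struct StackBounds {
  uintptr_t base;
  uintptr_t size;

  uintptr_t end() const { return base - size; }  // lowest stack address

  // Form with the bug described above: it accepts addr == base, so an
  // object allocated immediately above the stack looks "on stack".
  bool contains_inclusive(uintptr_t addr) const {
    return addr <= base && addr >= end();
  }

  // Half-open form: base itself is not part of the stack.
  bool contains(uintptr_t addr) const {
    return addr < base && addr >= end();
  }
};
```

With the inclusive form, a JavaThread object placed exactly at
T1's stack base is mistaken for a stack address of T1, which is the
scenario in the description; the half-open form rejects it.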

From rkennke at redhat.com  Mon Nov 18 11:20:27 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 18 Nov 2019 12:20:27 +0100
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <5e088c4b-2a0d-715e-047d-39578ed18270@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <5e088c4b-2a0d-715e-047d-39578ed18270@oracle.com>
Message-ID: <1c2adc12-b599-9890-9950-964b12f21207@redhat.com>

>> Could you review the following patch?
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8231264
>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8231264/v01/webrev
>>
>> Biased locking will be disabled by default and all related flags will
>> be deprecated. Performance gains seen when the feature was introduced
>> in the VM are less clear today with modern Java code/processors.
>> Detailed rationale behind the change is included on the description of
>> the bug.
> 
> That all seems fine.


I feel like I must have missed some parts of the conversation here.

Roman


From david.holmes at oracle.com  Mon Nov 18 11:27:41 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 18 Nov 2019 21:27:41 +1000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <6b913f48-b4f8-5069-e10f-fb966432942b@redhat.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <6b913f48-b4f8-5069-e10f-fb966432942b@redhat.com>
Message-ID: <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com>

Hi Andrew,

On 18/11/2019 8:03 pm, Andrew Haley wrote:
> On 11/16/19 2:15 AM, Patricio Chilano wrote:
>> Biased locking will be disabled by default and all related flags will be
>> deprecated. Performance gains seen when the feature was introduced in
>> the VM are less clear today with modern Java code/processors. Detailed
>> rationale behind the change is included on the description of the bug.
> 
> IMO detailed rationale on its own isn't going to do it.  I would
> expect to see detailed measurements to justify such an important
> change rather than mere assertions. What do your numbers look like?
> 
> This paragraph is rather incredible: "The performance gains that were
> seen in the past are far less evident today. The cost of executing
> atomic instructions has decreased on modern processors since the
> introduction of biased locking into the VM"
> 
> This test, with no contention:
> 
>      @Benchmark
>      public void lock(BenchmarkState state) {
>          synchronized(state) {
>              state.n++;
>          }
>      }
> 
> Benchmark   Mode  Cnt   Score   Error  Units
> 
> -XX:+UseBiasedLocking:
> Dummy.lock  avgt    3   2.063 ± 0.215  ns/op
> 
> -XX:-UseBiasedLocking:
> Dummy.lock  avgt    3  14.991 ± 0.365  ns/op
> 
> Threadripper 2950X, 3.5 GHz.
> 
> I believe that the uncontended case for synchronized blocks is still
> important.

For a micro-benchmark like that, sure. But is that at all representative 
of real modern code? We know some of the really old benchmarks used 
synchronized collections and StringBuffer extensively and so they also 
benefit from biased-locking. But more modern benchmarks are not showing 
any benefit.

We'd like to know the impact on real applications but we have no way to 
know that a priori. So we're either stuck with the burden of supporting 
biased-locking forever, or we flip the switch to turn it off and see if 
it causes too many issues. Unless you see another way to determine this?

Cheers,
David
-----

From david.holmes at oracle.com  Mon Nov 18 11:32:15 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 18 Nov 2019 21:32:15 +1000
Subject: RFR: 8215355: Object monitor deadlock with no threads holding the
 monitor (using jemalloc 5.1)
In-Reply-To: <831a0a84-b6a5-34d2-f6bd-4bacc2fa812c@oracle.com>
References: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
 <831a0a84-b6a5-34d2-f6bd-4bacc2fa812c@oracle.com>
Message-ID: <a758ef7f-843e-e163-5592-aad3a4735053@oracle.com>

Thanks Robbin!

David

On 18/11/2019 9:07 pm, Robbin Ehn wrote:
> Looks good, thanks David!
> 
> /Robbin
> 
> On 11/18/19 3:30 AM, David Holmes wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
>> webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
>>
>> This was a very difficult bug to track down and I want to publicly 
>> acknowledge and thank the jemalloc folk (users and developers) for 
>> continuing to investigate this issue from their side. Without their 
>> persistence this issue would have languished.
>>
>> The thread stack_base() is the first address above the thread's stack. 
>> However, the "in stack" checks performed by Thread::on_local_stack and 
>> Thread::is_in_stack allowed the checked address to be equal to the 
>> stack_base() - which is not correct. Here's how this manifests as the 
>> bug:
>>
>> - Let a JavaThread instance, T2, be allocated at the end of thread 
>> T1's stack i.e. at T1->stack_base()
>>    [This seems to be why this only reproduced with jemalloc.]
>> - Let T2 lock an inflated monitor
>> - Let T1 try to lock the same monitor
>>    - T1 would consider the _owner field value (T2) as being in its 
>> stack and so consider the monitor stack-locked by T1
>>    - And so both T1 and T2 would have ownership of the monitor 
>> allowing the monitor state (and application state) to be corrupted. 
>> This results in a range of hangs and crashes depending on the exact 
>> interleaving.
>>
>> Interestingly Thread::is_in_usable_stack does not have this bug.
>>
>> The bug can be tracked way back to JDK-6699669 as explained in the bug 
>> report. That issue also showed that the same bug existed in the SA 
>> implementations of these "on stack" checks.
>>
>> Testing:
>>    - The reproducer from the bug report, using jemalloc, ran over 5000 
>> times without failing in any way.
>>    - tiers 1-3 on all Oracle platforms
>>    - serviceability/sa tests
>>
>> Thanks,
>> David
>> -----

From thomas.stuefe at gmail.com  Mon Nov 18 11:58:59 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Mon, 18 Nov 2019 12:58:59 +0100
Subject: RFR: 8215355: Object monitor deadlock with no threads holding the
 monitor (using jemalloc 5.1)
In-Reply-To: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
References: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
Message-ID: <CAA-vtUx1qAZpx4ehQPgats3QYEgUcPLJ0eiLvEmD9Ke7YBintQ@mail.gmail.com>

This is evil :)

There might be more cases like this, e.g.

frame_x86.cpp  frame::is_interpreted_frame_valid():

if (locals > thread->stack_base() || locals < (address) fp()) return false;

Also, I would have thought the little alloca() dance we do at the start
of thread_native_entry() would push the first real frame down the stack.

The fix looks good.

Cheers, Thomas



From adinn at redhat.com  Mon Nov 18 12:16:56 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 18 Nov 2019 12:16:56 +0000
Subject: RFR: 8215355: Object monitor deadlock with no threads holding the
 monitor (using jemalloc 5.1)
In-Reply-To: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
References: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
Message-ID: <709d92c8-64ed-a12d-46e9-054b203f0f72@redhat.com>

On 18/11/2019 02:30, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
> webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
> 
> This was a very difficult bug to track down and I want to publicly
> acknowledge and thank the jemalloc folk (users and developers) for
> continuing to investigate this issue from their side. Without their
> persistence this issue would have languished.
> . . .

Wow, nice work tracking that one down!

regards,


Andrew Dinn
-----------


From christoph.goettschkes at microdoc.com  Mon Nov 18 12:27:12 2019
From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com)
Date: Mon, 18 Nov 2019 13:27:12 +0100
Subject: RFR: 8234324: ARM32 build broken after 8231610
In-Reply-To: <ed601d47-af72-ac97-6e73-86caee2f4041@redhat.com>
References: <20191118084321.90FAA101E33@aojmv0009>
 <ed601d47-af72-ac97-6e73-86caee2f4041@redhat.com>
Message-ID: <mailman.14.1574080142.19479.hotspot-runtime-dev@openjdk.java.net>

Thanks for the additional review.
I updated the webrev with the reviewed-by line. Could you sponsor the 
changeset and commit it into the repository for me?

https://cr.openjdk.java.net/~cgo/8234324/webrev.01/

Thanks,
Christoph

Andrew Haley <aph at redhat.com> wrote on 2019-11-18 11:27:38:

> From: Andrew Haley <aph at redhat.com>
> To: christoph.goettschkes at microdoc.com, hotspot-runtime-dev at openjdk.java.net
> Date: 2019-11-18 11:27
> Subject: Re: RFR: 8234324: ARM32 build broken after 8231610
> 
> On 11/18/19 8:41 AM, christoph.goettschkes at microdoc.com wrote:
> > introduce with 8231610.
> > 
> > I already posted the patch on the mailing list, since I wasn't sure if I
> > should create a new bug for this or not. Ioi already looked at it [1] and
> > pointed out, that we are already using similar checks in HotSpot [2].
> > 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8234324
> > Webrev: https://cr.openjdk.java.net/~cgo/8234324/webrev.00/
> 
> OK.
> 
> -- 
> Andrew Haley  (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
> 


From shade at redhat.com  Mon Nov 18 12:57:04 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 18 Nov 2019 13:57:04 +0100
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <6b913f48-b4f8-5069-e10f-fb966432942b@redhat.com>
 <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com>
Message-ID: <b4eb9ac9-330c-f1f2-fa99-54b5879b0f14@redhat.com>

(lurking off the parental leave to point out a few things) (...and resending with proper email)

There are way too few details in the RFE report to make an informed decision. Also, removing it when
thread-local handshakes are finally there to make unbiases/rebiases much less painful for
performance is quite odd to see.

On 11/18/19 12:27 PM, David Holmes wrote:
> For a micro-benchmark like that sure. But is that at all representative of real modern code? We know
> some of the really old benchmarks used synchronized collections and StringBuffer extensively and so
> they also benefit from biased-locking. But more modern benchmarks are not showing any benefit.

If you want to say that SPECjbb2015 does not show improvement, then that is because we (well, me
myself!) specifically argued during its development that the lock usages there should explore
something beyond biased locking. Which is why locking paths there are more or less contended, so
that locks get out of their biased state. Therefore, arguing that biased locking is not needed
because SPECjbb2015 does not show the benefit of having it enabled -- is circular.

> We'd like to know the impact on real applications but we have no way to know that a-priori. So we're
> either stuck with the burden of supporting biased-locking forever, or we flip the switch to turn it
> off and see if it causes too many issues. Unless you see another way to determine this?

We (in Shenandoah) went back and forth on heuristically enabling/disabling UseBiasedLocking. When we
did disable it by default, we had users complain about performance penalties against other
collectors. Which led us to reinstate it, and it was also visible on some SPECjvm2008
benchmarks:
  http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-November/004333.html

At the very least, deprecating the flag is unwarranted at this point, until we are totally sure it is
not needed. You could make it disabled by default and collect complaints, though. That would take a
few short-term releases for most interested parties to catch up with this.

Over and out. Don't rush this, please.

-- 
Thanks,
-Aleksey


From christoph.langer at sap.com  Mon Nov 18 13:22:11 2019
From: christoph.langer at sap.com (Langer, Christoph)
Date: Mon, 18 Nov 2019 13:22:11 +0000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava,  hotspot and libinstrument
In-Reply-To: <489372066.20191118140919@am-soft.de>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <489372066.20191118140919@am-soft.de>
Message-ID: <AM6PR02MB48012B1619D64259A579AC418A4D0@AM6PR02MB4801.eurprd02.prod.outlook.com>

Hi Thorsten,

I saw your other mail already but didn't find time to reply.

I'm actually not convinced that it is a good idea to add ERROR_NO_MORE_FILES to lastErrorReportable. The error codes listed there are conditions on which canonicalization of a path is stopped but the result is deemed correct. E.g. if the path only exists up to a certain directory, one can assume the rest of the path is canonical. Likewise, with conditions such as network errors or access denied, further canonicalization isn't possible either.

However, your case, the sporadic ERROR_NO_MORE_FILES, needs to be understood first. I rather think if this happens, there's a real condition for an IOException. It should definitely be analyzed and understood what the reason is for ERROR_NO_MORE_FILES. Are you aware of other reports of this issue? Was this already analyzed by some Windows experts, e.g. Microsoft support?

Best regards
Christoph


> -----Original Message-----
> From: Thorsten Schöning <tschoening at am-soft.de>
> Sent: Monday, 18 November 2019 14:09
> To: core-libs-dev at openjdk.java.net
> Cc: Langer, Christoph <christoph.langer at sap.com>; hotspot-runtime-
> dev at openjdk.java.net; gerard ziemski <gerard.ziemski at oracle.com>
> Subject: Re: RFR: 8234185: Cleanup usage of canonicalize function between
> libjava, hotspot and libinstrument
> 
> Hello Langer, Christoph,
> on Thursday, 14 November 2019 at 16:37 you wrote:
> 
> > please review this cleanup change regarding function "canonicalize" of
> libjava.
> [...]
> > The goal is to cleanup how this function is defined and used.[...]
> 
> If you are already changing "lastErrorReportable" for Windows, how
> about adding ERROR_NO_MORE_FILES there as well to not run into
> unnecessary exceptions under some circumstances?
> 
> https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-November/063437.html
> https://stackoverflow.com/questions/58825588/does-java-need-to-support-error-no-more-files-when-canonicalizing-paths-on-windo
> https://stackoverflow.com/questions/58825963/when-does-findfirstfilew-set-last-error-to-be-error-no-more-files-instead-of-err?noredirect=1&lq=1
> 
> Kind regards,
> 
> Thorsten Schöning
> 
> --
> Thorsten Schöning       E-Mail: Thorsten.Schoening at AM-SoFT.de
> AM-SoFT IT-Systeme      http://www.AM-SoFT.de/
> 
> Phone.............05151-  9468- 55
> Fax...............05151-  9468- 88
> Mobile.............0178-8 9468- 04
> 
> AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
> AG Hannover HRB 207 694 - Managing Director: Andreas Muchow


From david.holmes at oracle.com  Mon Nov 18 13:25:48 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 18 Nov 2019 23:25:48 +1000
Subject: RFR: 8215355: Object monitor deadlock with no threads holding the
 monitor (using jemalloc 5.1)
In-Reply-To: <CAA-vtUx1qAZpx4ehQPgats3QYEgUcPLJ0eiLvEmD9Ke7YBintQ@mail.gmail.com>
References: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
 <CAA-vtUx1qAZpx4ehQPgats3QYEgUcPLJ0eiLvEmD9Ke7YBintQ@mail.gmail.com>
Message-ID: <d73b7b8d-3726-b9f6-05c4-179f2fa93fbc@oracle.com>

Hi Thomas,

Thanks for taking a look.

On 18/11/2019 9:58 pm, Thomas Stüfe wrote:
> This is evil :)
> 
> There might be more cases like this, e.g.
> 
> frame_x86.cpp  frame::is_interpreted_frame_valid():
> 
> if (locals > thread->stack_base() || locals < (address) fp()) return false;

Yes that might be a case where >= should be in use. I'll file another 
bug to check uses of stack_base().
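
To make the boundary condition concrete, here is a minimal sketch of the
two forms of the check. StackBounds and its members are hypothetical
names used only for illustration; this is not the actual Thread code,
and it assumes a downward-growing stack where base is the first address
above the stack.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a thread's stack bounds (not the HotSpot Thread class).
struct StackBounds {
  std::uintptr_t base;  // first address ABOVE the stack (stack grows down)
  std::size_t    size;  // stack size in bytes

  // Lowest address that belongs to the stack.
  std::uintptr_t end() const { return base - size; }

  // Inclusive form: wrongly accepts an address equal to base.
  bool contains_inclusive(std::uintptr_t adr) const {
    return adr <= base && adr >= end();
  }

  // Half-open form [end, base): an address equal to base is rejected.
  bool contains(std::uintptr_t adr) const {
    return adr < base && adr >= end();
  }
};
```

With base = 0x8000 and size = 0x1000, the inclusive form reports 0x8000
as "in stack" while the half-open form does not; both agree on every
address from 0x7000 up to 0x7fff.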

> Also, I would have thought the little alloca() dance we do at the start 
> of thread_native_entry() would push the first real frame down the stack.

I know nothing of that code. :)

> The fix looks good.

Thanks!

David
-----


From thomas.stuefe at gmail.com  Mon Nov 18 13:31:13 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Mon, 18 Nov 2019 14:31:13 +0100
Subject: RFR: 8215355: Object monitor deadlock with no threads holding the
 monitor (using jemalloc 5.1)
In-Reply-To: <d73b7b8d-3726-b9f6-05c4-179f2fa93fbc@oracle.com>
References: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
 <CAA-vtUx1qAZpx4ehQPgats3QYEgUcPLJ0eiLvEmD9Ke7YBintQ@mail.gmail.com>
 <d73b7b8d-3726-b9f6-05c4-179f2fa93fbc@oracle.com>
Message-ID: <CAA-vtUynvsE+GCS5+gb-8Fw9U78yP+jQrT4WH=DZXxbyVkyA3A@mail.gmail.com>

Hi David,

On Mon, Nov 18, 2019 at 2:26 PM David Holmes <david.holmes at oracle.com>
wrote:

> Hi Thomas,
>
> Thanks for taking a look.
>
> On 18/11/2019 9:58 pm, Thomas Stüfe wrote:
> > This is evil :)
> >
> > There might be more cases like this, e.g.
> >
> > frame_x86.cpp  frame::is_interpreted_frame_valid():
> >
> > if (locals > thread->stack_base() || locals < (address) fp()) return false;
>
> Yes that might be a case where >= should be in use. I'll file another
> bug to check uses of stack_base().
>
>
Many of them could use Thread::is_in_usable_stack(), I assume.


> > Also, I would have thought the little alloca() dance we do at the start
> > of thread_native_entry() would push the first real frame down the stack.
>
> I know nothing of that code. :)
>

See os_linux.cpp:
...
  // Try to randomize the cache line index of hot stack frames.
  // This helps when threads of the same stack traces evict each other's
  // cache lines. The threads can be either from the same JVM instance, or
  // from different JVM instances. The benefit is especially true for
  // processors with hyperthreading technology.
  static int counter = 0;
  int pid = os::current_process_id();
  alloca(((pid ^ counter++) & 7) * 128);
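
For illustration only, the size passed to alloca() by that expression
can be written as a standalone helper. stack_padding is a made-up name,
not a HotSpot function, and the counter is passed in rather than
incremented as in the original:

```cpp
#include <cstddef>

// Padding size, in bytes, that the expression above would hand to alloca()
// for a given process id and thread counter value.
constexpr std::size_t stack_padding(int pid, int counter) {
  // Only the low three bits survive the mask, so there are eight possible
  // values, each a multiple of 128.
  return static_cast<std::size_t>((pid ^ counter) & 7) * 128;
}
```

So the padding is always one of 0, 128, ..., 896 bytes, i.e. the first
real frame moves down by less than 1 KB, and for some pid/counter
combinations not at all.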


> > The fix looks good.
>
> Thanks!
>
> David
> -----
>
>
Cheers, Thomas



From david.holmes at oracle.com  Mon Nov 18 13:50:22 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 18 Nov 2019 23:50:22 +1000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <428b5bd0-3519-9ec6-1be1-d965f1b14e1e@redhat.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <6b913f48-b4f8-5069-e10f-fb966432942b@redhat.com>
 <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com>
 <428b5bd0-3519-9ec6-1be1-d965f1b14e1e@redhat.com>
Message-ID: <4d2be622-d6af-cc89-159a-45c92c25d9ba@oracle.com>

Hi Aleksey,

On 18/11/2019 10:55 pm, Aleksey Shipilev wrote:
> (lurking off the parental leave to point out a few things)
> 
> There are way too few details in the RFE report to make an informed decision. Also, removing it when
> thread-local handshakes are finally there to make unbiases/rebiases much less painful for
> performance is quite odd to see.

I'm not sure where TLH stands in relation to biased-locking at this 
stage. I think it is partly the work on trying to use TLH that 
motivated seeing if we can just remove biased-locking altogether.

> On 11/18/19 12:27 PM, David Holmes wrote:
>> For a micro-benchmark like that sure. But is that at all representative of real modern code? We know
>> some of the really old benchmarks used synchronized collections and StringBuffer extensively and so
>> they also benefit from biased-locking. But more modern benchmarks are not showing any benefit.
> 
> If you want to say that SPECjbb2015 does not show improvement, then that is because we (well, me
> myself!) specifically argued during its development that the lock usages there should explore
> something beyond biased locking. Which is why locking paths there are more or less contended, so
> that locks get out of their biased state. Therefore, arguing that biased locking is not needed
> because SPECjbb2015 does not show the benefit of having it enabled -- is circular.

As I have no knowledge of how any of these benchmarks are designed or 
what they are intended to try and demonstrate I can't really comment. I 
personally loathe the silly benchmark dances that we do trying to 
convince people that if benchmark X runs well then so will some set of 
real applications. But if the current benchmark people are caring about 
is specjbb2015 then it apparently runs better without biased-locking.

>> We'd like to know the impact on real applications but we have no way to know that a-priori. So we're
>> either stuck with the burden of supporting biased-locking forever, or we flip the switch to turn it
>> off and see if it causes too many issues. Unless you see another way to determine this?
> 
> We (in Shenandoah) went back and forth on heuristically enabling/disabling UseBiasedLocking. When we
> did disable it by default, we had users complain about performance penalties against other
> collectors. Which led us to reinstate it, and it was also visible on some SPECjvm2008
> benchmarks:
>    http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-November/004333.html

The discussion is very similar to the to-and-fro we have also had in 
Oracle over the past couple of years. Some benchmarks show benefit from 
biased-locking because they (over?) use uncontended locking which is 
exactly what biased-locking was designed to cheapen. But do we care 
about such benchmarks? Do they tell us anything interesting about the 
impact on real applications?

Can you share which users complained, and for what applications?

> At the very least, deprecating the flag is unwarranted at this point, until we are totally sure it is
> not needed. You could make it disabled by default and collect complaints, though. That would take a
> few short-term releases for most interested parties to catch up with this.
> 
> Over and out. Don't rush this, please.

Well as I said we've already been sitting on this for quite a while. :)

Thanks,
David
-----


> -Aleksey
> 

From daniel.daugherty at oracle.com  Mon Nov 18 13:50:50 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Mon, 18 Nov 2019 08:50:50 -0500
Subject: RFR(L) 8153224 Monitor deflation prolong safepoints
 (CR8/v2.08/11-for-jdk14)
In-Reply-To: <f9439547-1266-ea5e-8346-4e0e0b25223d@oracle.com>
References: <62729044-8a22-0e20-0eda-04d47c9ea23c@oracle.com>
 <d39a21e5-2375-a530-cf61-af3f84a067a6@oracle.com>
 <313e51c8-b672-bb1c-577a-49868f09e6c1@oracle.com>
 <1fd54b23-bd35-9b8f-e6f3-6000440d8770@oracle.com>
 <bfc1b0d2-6813-53e5-7255-ffb06156daeb@oracle.com>
 <3fb1a2b5-5aad-eab3-09ba-6f64a2242d30@oracle.com>
 <e3ff1c7f-9220-a1a6-e3e7-00dcc24a9bcf@oracle.com>
 <38e2d441-11c9-8342-37d5-8030dd06f2f4@oracle.com>
 <e431113c-b56e-1f43-cf0f-ce76cb58548d@oracle.com>
 <f9439547-1266-ea5e-8346-4e0e0b25223d@oracle.com>
Message-ID: <7c51cdd8-f2d2-7f5a-5813-43e78116ff9f@oracle.com>

Thanks.

Dan


On 11/17/19 11:13 PM, David Holmes wrote:
> Hi Dan,
>
> No further comments from me at this stage.
>
> Thanks,
> David
>
> On 5/11/2019 7:03 am, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have made changes to the Async Monitor Deflation code in response to
>> the CR7/v2.07/10-for-jdk14 code review cycle. Thanks to David H., Robbin
>> and Erik O. for their comments!
>>
>> JDK14 Rampdown phase one is coming on Dec. 12, 2019 and the Async Monitor
>> Deflation project needs to push before Nov. 12, 2019 in order to allow
>> for sufficient bake time for such a big change. Nov. 12 is _next_ Tuesday
>> so we have 8 days from today to finish this code review cycle and push
>> this code for JDK14.
>>
>> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>>
>> I have attached the change list from CR7 to CR8 instead of putting it in
>> the body of this email. I've also added a link to the CR7-to-CR8-changes
>> file to the webrevs so it should be easy to find.
>>
>> Main bug URL:
>>
>>     JDK-8153224 Monitor deflation prolong safepoints
>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>
>> The project is currently baselined on jdk-14+21.
>>
>> Here's the full webrev URL for those folks that want to see all of the
>> current Async Monitor Deflation code in one go (v2.08 full):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.full 
>>
>>
>> Some folks might want to see just what has changed since the last review
>> cycle so here's a webrev for that (v2.08 inc):
>>
>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/11-for-jdk14.v2.08.inc/ 
>>
>>
>> The OpenJDK wiki did not need any changes for this round:
>>
>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>
>> The jdk-14+21 based v2.08 version of the patch has been thru Mach5 tier[1-8]
>> testing on Oracle's usual set of platforms. It has also been through my usual
>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
>> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
>> my lab. Some testing is still running, but so far there are no new regressions.
>>
>> I have not yet done a SPECjbb2015 round on the CR8/v2.08/11-for-jdk14 bits.
>>
>> Thanks, in advance, for any questions, comments or suggestions.
>>
>> Dan
>>
>>
>> On 10/17/19 5:50 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> The Async Monitor Deflation project is reaching the end game. I have no
>>> changes planned for the project at this time so all that is left is code
>>> review and any changes that result from those reviews.
>>>
>>> Carsten and Roman! Time for you guys to chime in again on the code reviews.
>>>
>>> I have attached the list of fixes from CR6 to CR7 instead of putting it
>>> in the main body of this email.
>>>
>>> Main bug URL:
>>>
>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>
>>> The project is currently baselined on jdk-14+19.
>>>
>>> Here's the full webrev URL for those folks that want to see all of the
>>> current Async Monitor Deflation code in one go (v2.07 full):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full 
>>>
>>>
>>> Some folks might want to see just what has changed since the last review
>>> cycle so here's a webrev for that (v2.07 inc):
>>>
>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/ 
>>>
>>>
>>> The OpenJDK wiki has been updated to match the CR7/v2.07/10-for-jdk14 changes:
>>>
>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>
>>> The jdk-14+18 based v2.07 version of the patch has been thru Mach5 tier[1-8]
>>> testing on Oracle's usual set of platforms. It has also been through my usual
>>> set of stress testing on Linux-X64, macOSX and Solaris-X64 with the addition
>>> of Robbin's "MoCrazy 1024" test running in parallel with the other tests in
>>> my lab.
>>>
>>> The jdk-14+19 based v2.07 version of the patch has been thru Mach5 tier[1-3]
>>> test on Oracle's usual set of platforms. Mach5 tier[4-8] are in process.
>>>
>>> I did another round of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>> using their tuned SPECjbb2015 Linux-X64 G1 configs:
>>>
>>>     - "base" is jdk-14+18
>>>     - "v2.07" is the latest version and includes C2 inc_om_ref_count() support
>>>       on LP64 X64 and the new HandshakeAfterDeflateIdleMonitors option
>>>     - "off" is with -XX:-AsyncDeflateIdleMonitors specified
>>>     - "handshake" is with -XX:+HandshakeAfterDeflateIdleMonitors specified
>>>
>>>          hbIR          hbIR
>>>     (max attempted)  (settled)  max-jOPS  critical-jOPS  runtime
>>>     ---------------  ---------  --------  -------------  -------
>>>            34282.00   30635.90  28831.30       20969.20  3841.30  base
>>>            34282.00   30973.00  29345.80       21025.20  3964.10  v2.07
>>>            34282.00   31105.60  29174.30       21074.00  3931.30  v2.07_handshake
>>>            34282.00   30789.70  27151.60       19839.10  3850.20  v2.07_off
>>>
>>>     - The Aurora Perf comparison tool reports:
>>>
>>>         Comparison              max-jOPS              critical-jOPS
>>>         ----------------------  --------------------  --------------------
>>>         base vs 2.07            +1.78% (s, p=0.000)   +0.27% (ns, p=0.790)
>>>         base vs 2.07_handshake  +1.19% (s, p=0.007)   +0.58% (ns, p=0.536)
>>>         base vs 2.07_off        -5.83% (ns, p=0.394)  -5.39% (ns, p=0.347)
>>>
>>>         (s) - significant  (ns) - not-significant
>>>
>>>     - For historical comparison, the Aurora Perf comparison tool
>>>         reported for v2.06 with a baseline of jdk-13+31:
>>>
>>>         Comparison              max-jOPS              critical-jOPS
>>>         ----------------------  --------------------  --------------------
>>>         base vs 2.06            -0.32% (ns, p=0.345)  +0.71% (ns, p=0.646)
>>>         base vs 2.06_off        +0.49% (ns, p=0.292)  -1.21% (ns, p=0.481)
>>>
>>>         (s) - significant  (ns) - not-significant
>>>
>>> Thanks, in advance, for any questions, comments or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 8/28/19 5:02 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> The Async Monitor Deflation project has rebased to JDK14 so it's time
>>>> for our first code review in that new context!!
>>>>
>>>> I've been focused on changing the monitor list management code to be
>>>> lock-free in order to make SPECjbb2015 happier. Of course with a change
>>>> like that, it takes a while to chase down all the new and wonderful
>>>> races. At this point, I have the code back to the same stability that
>>>> I had with CR5/v2.05/8-for-jdk13.
>>>>
>>>> To lay the ground work for this round of review, I pushed the following
>>>> two fixes to jdk/jdk earlier today:
>>>>
>>>>     JDK-8230184 rename, whitespace, indent and comments changes in preparation
>>>>                 for lock free Monitor lists
>>>>     https://bugs.openjdk.java.net/browse/JDK-8230184
>>>>
>>>>     JDK-8230317 serviceability/sa/ClhsdbPrintStatics.java fails after 8230184
>>>>     https://bugs.openjdk.java.net/browse/JDK-8230317
>>>>
>>>> I have attached the list of fixes from CR5 to CR6 instead of putting
>>>> in the main body of this email.
>>>>
>>>> Main bug URL:
>>>>
>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>
>>>> The project is currently baselined on jdk-14+11 plus the fixes for
>>>> JDK-8230184 and JDK-8230317.
>>>>
>>>> Here's the full webrev URL for those folks that want to see all of the
>>>> current Async Monitor Deflation code in one go (v2.06 full):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/ 
>>>>
>>>>
>>>>
>>>> The primary focus of this review cycle is on the lock-free Monitor 
>>>> List
>>>> management changes so here's a webrev for just that patch (v2.06c):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06c.inc/ 
>>>>
>>>>
>>>> The secondary focus of this review cycle is on the bug fixes that have
>>>> been made since CR5/v2.05/8-for-jdk13 so here's a webrev for just that
>>>> patch (v2.06b):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06b.inc/ 
>>>>
>>>>
>>>> The third and final bucket for this review cycle is the rename, 
>>>> whitespace,
>>>> indent and comments changes made in preparation for lock free 
>>>> Monitor list
>>>> management. Almost all of that was extracted into JDK-8230184 for the
>>>> baseline so this bucket now has just a few comment changes relative to
>>>> CR5/v2.05/8-for-jdk13. Here's a webrev for the remainder (v2.06a):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06a.inc/ 
>>>>
>>>>
>>>>
>>>> Some folks might want to see just what has changed since the last 
>>>> review
>>>> cycle so here's a webrev for that (v2.06 inc):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/ 
>>>>
>>>>
>>>>
>>>> Last, but not least, some folks might want to see the code before the
>>>> addition of lock-free Monitor List management so here's a webrev for
>>>> that (v2.00 -> v2.05):
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.05.inc/ 
>>>>
>>>>
>>>> The OpenJDK wiki will need minor updates to match the CR6 changes:
>>>>
>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>
>>>> but that should only be changes to describe per-thread list async 
>>>> monitor
>>>> deflation being done by the ServiceThread.
>>>>
>>>> (I did update the OpenJDK wiki for the CR5 changes back on 2019.08.14)
>>>>
>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>> Oracle's usual set of platforms. It has also been through my usual set
>>>> of stress testing on Linux-X64, macOSX and Solaris-X64.
>>>>
>>>> I did a bunch of SPECjbb2015 testing in Oracle's Aurora Performance lab
>>>> using their tuned SPECjbb2015 Linux-X64 G1 configs. This was using
>>>> this patch baselined on jdk-13+31 (for stability):
>>>>
>>>>           hbIR           hbIR
>>>>      (max attempted)  (settled)  max-jOPS  critical-jOPS runtime
>>>>      ---------------  ---------  --------  ------------- -------
>>>>             34282.00   28837.20  27905.20       19817.40 3658.10 base
>>>>             34965.70   29798.80  27814.90       19959.00 3514.60 v2.06d
>>>>             34282.00   29100.70  28042.50       19577.00 3701.90 v2.06d_off
>>>>             34282.00   29218.50  27562.80       19397.30 3657.60 v2.06d_ocache
>>>>             34965.70   29838.30  26512.40       19170.60 3569.90 v2.05
>>>>             34282.00   28926.10  27734.00       19835.10 3588.40 v2.05_off
>>>>
>>>> The "off" configs are with -XX:-AsyncDeflateIdleMonitors specified and
>>>> the "ocache" config is with 128 byte cache line sizes instead of 64 byte
>>>> cache line sizes. "v2.06d" is the last set of changes that I made before
>>>> those changes were distributed into the "v2.06a", "v2.06b" and "v2.06c"
>>>> buckets for this review cycle.
>>>>
>>>>
>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 7/11/19 3:49 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I've been focused on chasing down and fixing the test failures
>>>>> that only pop up rarely. So this round is primarily fixes for races
>>>>> with a few additional fixes that came from Karen's review of CR4.
>>>>> Thanks Karen!
>>>>>
>>>>> I have attached the list of fixes from CR4 to CR5 instead of putting
>>>>> in the main body of this email.
>>>>>
>>>>> Main bug URL:
>>>>>
>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>
>>>>> The project is currently baselined on jdk-13+29. This will likely be
>>>>> the last JDK13 baseline for this project and I'll roll to the JDK14
>>>>> (jdk/jdk) repo soon...
>>>>>
>>>>> Here's the full webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.full/
>>>>>
>>>>> Here's the incremental webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/8-for-jdk13.inc/
>>>>>
>>>>> I have not yet checked the OpenJDK wiki to see if it needs any 
>>>>> updates
>>>>> to match the CR5 changes:
>>>>>
>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
>>>>>
>>>>> (I did update the OpenJDK wiki for the CR4 changes back on 
>>>>> 2019.06.26)
>>>>>
>>>>> This version of the patch has been thru Mach5 tier[1-3] testing on
>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running now and
>>>>> Mach5 tier[78] will follow. I'll kick off the usual stress testing
>>>>> on Linux-X64, macOSX and Solaris-X64 as those machines become 
>>>>> available.
>>>>> Since I haven't made any performance changes in this round, I'll only
>>>>> be running SPECjbb2015 to gather the latest monitorinflation logs.
>>>>>
>>>>> Next up:
>>>>>
>>>>> - We're still seeing 4-5% lower performance with SPECjbb2015 on
>>>>>   Linux-X64 and we've determined that some of that comes from
>>>>>   contention on the gListLock. So I'm going to investigate removing
>>>>>   the gListLock. Yes, another lock free set of changes is coming!
>>>>> - Of course, going lock free often causes new races and new failures
>>>>>   so that's a good reason to make those changes isolated in their
>>>>>   own round (and not holding up CR5/v2.05/8-for-jdk13 anymore).
>>>>> - I finally have a potential fix for the Win* failure with
>>>>>     gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>   but I haven't run it through Mach5 yet so it'll be in the next round.
>>>>> - Some RTM tests were recently re-enabled in Mach5 and I'm seeing some
>>>>>   monitor related failures there. I suspect that I need to go take a
>>>>>   look at the C2 RTM macro assembler code and look for things that might
>>>>>   conflict with Async Monitor Deflation. If you're interested in that kind
>>>>>   of issue, then see the macroAssembler_x86.cpp sanity check that I
>>>>>   added in this round!
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 5/26/19 8:30 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I have a fix for an issue that came up during performance testing.
>>>>>> Many thanks to Robbin for diagnosing the issue in his SPECjbb2015
>>>>>> experiments.
>>>>>>
>>>>>> Here's the list of changes from CR3 to CR4. The list is a bit
>>>>>> verbose due to the complexity of the issue, but the changes
>>>>>> themselves are not that big.
>>>>>>
>>>>>> Functional:
>>>>>>   - Change SafepointSynchronize::is_cleanup_needed() from calling
>>>>>>     ObjectSynchronizer::is_cleanup_needed() to calling
>>>>>>     ObjectSynchronizer::is_safepoint_deflation_needed():
>>>>>>     - is_safepoint_deflation_needed() returns the result of
>>>>>>       monitors_used_above_threshold() for safepoint based
>>>>>>       monitor deflation (!AsyncDeflateIdleMonitors).
>>>>>>     - For AsyncDeflateIdleMonitors, it only returns true if
>>>>>>       there is a special deflation request, e.g., System.gc()
>>>>>>       - This solves a bug where there are a bunch of Cleanup
>>>>>>         safepoints that simply request async deflation which
>>>>>>         keeps the async JavaThreads from making progress on
>>>>>>         their async deflation work.
>>>>>>   - Add AsyncDeflationInterval diagnostic option. Description:
>>>>>>       Async deflate idle monitors every so many milliseconds when
>>>>>>       MonitorUsedDeflationThreshold is exceeded (0 is off).
>>>>>>   - Replace ObjectSynchronizer::gOmShouldDeflateIdleMonitors() with
>>>>>>     ObjectSynchronizer::is_async_deflation_needed():
>>>>>>     - is_async_deflation_needed() returns true when
>>>>>>       is_async_cleanup_requested() is true or when
>>>>>>       monitors_used_above_threshold() is true (but no more often than
>>>>>>       AsyncDeflationInterval).
>>>>>>     - if AsyncDeflateIdleMonitors Service_lock->wait() now waits for
>>>>>>       at most GuaranteedSafepointInterval millis:
>>>>>>       - This allows is_async_deflation_needed() to be checked at
>>>>>>         the same interval as GuaranteedSafepointInterval.
>>>>>>         (default is 1000 millis/1 second)
>>>>>>       - Once is_async_deflation_needed() has returned true, it
>>>>>>         generally cannot return true for AsyncDeflationInterval.
>>>>>>         This is to prevent async deflation from swamping the
>>>>>>         ServiceThread.
>>>>>>   - The ServiceThread still handles async deflation of the global
>>>>>>     in-use list and now it also marks JavaThreads for async deflation
>>>>>>     of their in-use lists.
>>>>>>     - The ServiceThread will check for async deflation work every
>>>>>>       GuaranteedSafepointInterval.
>>>>>>     - A safepoint can still cause the ServiceThread to check for
>>>>>>       async deflation work via is_async_deflation_requested.
>>>>>>   - Refactor code from ObjectSynchronizer::is_cleanup_needed() into
>>>>>>     monitors_used_above_threshold() and remove is_cleanup_needed().
>>>>>>   - In addition to System.gc(), the VM_Exit VM op and the final
>>>>>>     VMThread safepoint now set the is_special_deflation_requested
>>>>>>     flag to reduce the in-use monitor population that is reported by
>>>>>>     ObjectSynchronizer::log_in_use_monitor_details() at VM exit.
>>>>>>
>>>>>> Test update:
>>>>>>   - test/hotspot/gtest/oops/test_markOop.cpp is updated to work with
>>>>>>     AsyncDeflateIdleMonitors.
>>>>>>
>>>>>> Collateral:
>>>>>>   - Add/clarify/update some logging messages.
>>>>>>
>>>>>> Cleanup:
>>>>>>   - Updated comments based on Karen's code review.
>>>>>>   - Change 'special cleanup' -> 'special deflation' and
>>>>>>     'async cleanup' -> 'async deflation'.
>>>>>>     - comment and function name changes
>>>>>>   - Clarify MonitorUsedDeflationThreshold description;
>>>>>>
>>>>>>
>>>>>> Main bug URL:
>>>>>>
>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>
>>>>>> The project is currently baselined on jdk-13+22.
>>>>>>
>>>>>> Here's the full webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.full/
>>>>>>
>>>>>> Here's the incremental webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/7-for-jdk13.inc/
>>>>>>
>>>>>> I have not updated the OpenJDK wiki to reflect the CR4 changes:
>>>>>>
>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>
>>>>>>
>>>>>> The wiki doesn't say a whole lot about the async deflation 
>>>>>> invocation
>>>>>> mechanism so I have to figure out how to add that content.
>>>>>>
>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run is
>>>>>> running now. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>> are running on Linux-X64, MacOSX and Solaris-X64. I still have to 
>>>>>> run
>>>>>> my stress kit on Linux-X64. I still have to run the SPECjbb2015
>>>>>> baseline and CR4 runs on Linux-X64, MacOSX and Solaris-X64.
>>>>>>
>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> On 5/6/19 11:52 AM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I had some discussions with Karen about a race that was in the
>>>>>>> ObjectMonitor::enter() code in CR2/v2.02/5-for-jdk13. This race was
>>>>>>> theoretical and I had no test failures due to it. The fix is pretty
>>>>>>> simple: remove the special case code for async deflation in the
>>>>>>> ObjectMonitor::enter() function and rely solely on the ref_count
>>>>>>> for ObjectMonitor::enter() protection.
>>>>>>>
>>>>>>> During those discussions Karen also floated the idea of using the
>>>>>>> ref_count field instead of the contentions field for the Async
>>>>>>> Monitor Deflation protocol. I decided to go ahead and code up that
>>>>>>> change and I have run it through the usual stress and Mach5 testing
>>>>>>> with no issues. It's also known as v2.03 (for those with the
>>>>>>> patches) and as webrev/6-for-jdk13 (for those with webrev URLs).
>>>>>>> Sorry for all the names...
>>>>>>>
>>>>>>> Main bug URL:
>>>>>>>
>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>
>>>>>>> The project is currently baselined on jdk-13+18.
>>>>>>>
>>>>>>> Here's the full webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.full/
>>>>>>>
>>>>>>> Here's the incremental webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/6-for-jdk13.inc/
>>>>>>>
>>>>>>> I have also updated the OpenJDK wiki to reflect the CR3 changes:
>>>>>>>
>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>
>>>>>>>
>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>> Oracle's usual set of platforms. My Solaris-X64 stress kit run had
>>>>>>> no issues. Kitchensink8H on product, fastdebug, and slowdebug bits
>>>>>>> had no failures on Linux-X64; MacOSX fastdebug and slowdebug and
>>>>>>> Solaris-X64 release had the usual "Too large time diff" complaints.
>>>>>>> 12 hour Inflate2 runs on product, fastdebug and slowdebug bits on
>>>>>>> Linux-X64, MacOSX and Solaris-X64 had no failures. My Linux-X64
>>>>>>> stress kit is running right now.
>>>>>>>
>>>>>>> I've done the SPECjbb2015 baseline and CR3 runs. I need to gather
>>>>>>> the results and analyze them.
>>>>>>>
>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 4/25/19 12:38 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I have a small but important bug fix for the Async Monitor Deflation
>>>>>>>> project ready to go. It's also known as v2.02 (for those with the
>>>>>>>> patches) and as webrev/5-for-jdk13 (for those with webrev URLs). Sorry
>>>>>>>> for all the names...
>>>>>>>>
>>>>>>>> JDK-8222295 was pushed to jdk/jdk two days ago so that baseline 
>>>>>>>> patch
>>>>>>>> is out of our hair.
>>>>>>>>
>>>>>>>> Main bug URL:
>>>>>>>>
>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>
>>>>>>>> The project is currently baselined on jdk-13+17.
>>>>>>>>
>>>>>>>> Here's the full webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.full/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/5-for-jdk13.inc/
>>>>>>>>
>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR2 
>>>>>>>> changes:
>>>>>>>>
>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>
>>>>>>>>
>>>>>>>> This version of the patch has been thru Mach5 tier[1-6] testing on
>>>>>>>> Oracle's usual set of platforms. Mach5 tier[7-8] is running now.
>>>>>>>> My stress kit is running on Solaris-X64 now. Kitchensink8H is running
>>>>>>>> now on product, fastdebug, and slowdebug bits on Linux-X64, MacOSX
>>>>>>>> and Solaris-X64. 12 hour Inflate2 runs are running now on product,
>>>>>>>> fastdebug and slowdebug bits on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>> I'll start my stress kit on Linux-X64 sometime on Sunday (after
>>>>>>>> my jdk-13+18 stress run is done).
>>>>>>>>
>>>>>>>> I'll do SPECjbb2015 baseline and CR2 runs after all the stress
>>>>>>>> testing is done.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/19/19 11:58 AM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I finally have CR1 for the Async Monitor Deflation project ready to
>>>>>>>>> go. It's also known as v2.01 (for those with the patches) and as
>>>>>>>>> webrev/4-for-jdk13 (for those with webrev URLs). Sorry for all the
>>>>>>>>> names...
>>>>>>>>>
>>>>>>>>> Main bug URL:
>>>>>>>>>
>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>
>>>>>>>>> Baseline bug fixes URL:
>>>>>>>>>
>>>>>>>>>     JDK-8222295 more baseline cleanups from Async Monitor Deflation project
>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222295
>>>>>>>>>
>>>>>>>>> The project is currently baselined on jdk-13+15.
>>>>>>>>>
>>>>>>>>> Here's the webrev for the latest baseline changes (JDK-8222295):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.8222295 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the full webrev URL (JDK-8153224 only):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.full/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's the incremental webrev URL (JDK-8153224):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/4-for-jdk13.inc/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So I'm looking for reviews for both JDK-8222295 and the latest 
>>>>>>>>> version
>>>>>>>>> of JDK-8153224...
>>>>>>>>>
>>>>>>>>> I still have to update the OpenJDK wiki to reflect the CR 
>>>>>>>>> changes:
>>>>>>>>>
>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This version of the patch has been thru Mach5 tier[1-3] 
>>>>>>>>> testing on
>>>>>>>>> Oracle's usual set of platforms. Mach5 tier[4-6] is running 
>>>>>>>>> now and
>>>>>>>>> Mach5 tier[78] will be run later today. My stress kit on 
>>>>>>>>> Solaris-X64
>>>>>>>>> is running now. Linux-X64 stress testing will start on Sunday. 
>>>>>>>>> I'm
>>>>>>>>> planning to do Kitchensink runs, SPECjbb2015 runs and my monitor
>>>>>>>>> inflation stress tests on Linux-X64, MacOSX and Solaris-X64.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/24/19 9:57 AM, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> Welcome to the OpenJDK review thread for my port of Carsten's 
>>>>>>>>>> work on:
>>>>>>>>>>
>>>>>>>>>>     JDK-8153224 Monitor deflation prolong safepoints
>>>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8153224
>>>>>>>>>>
>>>>>>>>>> Here's a link to the OpenJDK wiki that describes my port:
>>>>>>>>>>
>>>>>>>>>> https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here's the webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13/
>>>>>>>>>>
>>>>>>>>>> Here's a link to Carsten's original webrev:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
>>>>>>>>>>
>>>>>>>>>> Earlier versions of this patch have been through several 
>>>>>>>>>> rounds of
>>>>>>>>>> preliminary review. Many thanks to Carsten, Coleen, Robbin, and
>>>>>>>>>> Roman for their preliminary code review comments. A very special
>>>>>>>>>> thanks to Robbin and Roman for building and testing the patch in
>>>>>>>>>> their own environments (including specJBB2015).
>>>>>>>>>>
>>>>>>>>>> This version of the patch has been thru Mach5 tier[1-8] testing on
>>>>>>>>>> Oracle's usual set of platforms. Earlier versions have been run
>>>>>>>>>> through my stress kit on my Linux-X64 and Solaris-X64 servers
>>>>>>>>>> (product, fastdebug, slowdebug). Earlier versions have run Kitchensink
>>>>>>>>>> for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product, fastdebug
>>>>>>>>>> and slowdebug). Earlier versions have run my monitor inflation stress
>>>>>>>>>> tests for 12 hours on MacOSX, Linux-X64 and Solaris-X64 (product,
>>>>>>>>>> fastdebug and slowdebug).
>>>>>>>>>>
>>>>>>>>>> All of the testing done on earlier versions will be redone on 
>>>>>>>>>> the
>>>>>>>>>> latest version of the patch.
>>>>>>>>>>
>>>>>>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>> P.S.
>>>>>>>>>> One subtest in 
>>>>>>>>>> gc/g1/humongousObjects/TestHumongousClassLoader.java
>>>>>>>>>> is currently failing in -Xcomp mode on Win* only. I've been 
>>>>>>>>>> trying
>>>>>>>>>> to characterize/analyze this failure for more than a week 
>>>>>>>>>> now. At
>>>>>>>>>> this point I'm convinced that Async Monitor Deflation is 
>>>>>>>>>> aggravating
>>>>>>>>>> an existing bug. However, I plan to have a better handle on that
>>>>>>>>>> failure before these bits are pushed to the jdk/jdk repo.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>


From boris.ulasevich at bell-sw.com  Mon Nov 18 14:35:04 2019
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Mon, 18 Nov 2019 17:35:04 +0300
Subject: RFR(S): 8233113: ARM32: assert on UnsafeJlong mutex rank check
In-Reply-To: <df1adebe-31d5-8cc2-d8c7-a22e2397c7b9@oracle.com>
References: <5124def3-3bf7-8425-557d-c6cba6192927@bell-sw.com>
 <df1adebe-31d5-8cc2-d8c7-a22e2397c7b9@oracle.com>
Message-ID: <5d0e1ad7-0ffa-54a1-fbc7-23aecc367c0c@bell-sw.com>

David, thank you!

Dear all,

   Can anybody else take a look at the review please?
   Or should I consider the change trivial?

thanks,
Boris

On 11.11.2019 13:56, David Holmes wrote:
> Hi Boris,
> 
> This seems fine to me.
> 
> Thanks,
> David
> 
> On 8/11/2019 11:28 pm, Boris Ulasevich wrote:
>> Hi,
>>
>> The recent JDK-8184732 change adds an assertion that fires on the
>> UnsafeJlong mutex rank check on platforms without 64-bit atomic
>> compare-and-exchange support. In preliminary review (thanks to Coleen
>> and David!) it was suggested to remove the assertion and the
>> corresponding test code.
>>
>> http://bugs.openjdk.java.net/browse/JDK-8233113
>> http://cr.openjdk.java.net/~bulasevich/8233113/webrev.01
>>
>> Thanks,
>> Boris

From adinn at redhat.com  Mon Nov 18 15:33:16 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 18 Nov 2019 15:33:16 +0000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <b4eb9ac9-330c-f1f2-fa99-54b5879b0f14@redhat.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <6b913f48-b4f8-5069-e10f-fb966432942b@redhat.com>
 <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com>
 <b4eb9ac9-330c-f1f2-fa99-54b5879b0f14@redhat.com>
Message-ID: <97886def-9bac-ec7f-8ece-d58cdaeb146c@redhat.com>

On 18/11/2019 12:57, Aleksey Shipilev wrote:
> If you want to say that SPECjbb2015 does not show improvement, then that is because we (well, me
> myself!) specifically argued during its development that the lock usages there should explore
> something beyond biased locking. Which is why locking paths there are more or less contended, so
> that locks get out of their biased state. Therefore, arguing that biased locking is not needed
> because SPECjbb2015 does not show the benefit of having it enabled -- is circular.

I think that comment definitely constitutes shooting someone's fox
(i.e. the excu^H^H^H^H rationale offered for this action has just been
erased).

>> We'd like to know the impact on real applications but we have no way to know that a-priori. So we're
>> either stuck with the burden of supporting biased-locking forever, or we flip the switch to turn it
>> off and see if it causes too many issues. Unless you see another way to determine this?
> . . .
> At very least, deprecating the flag is unwarranted at this point, until we are totally sure it is
> not needed. You could make it disabled by default and collect complaints, though. That would take a
> few short-term releases for most interested parties to catch up with this.

That was my immediate thought in response to the above. Why would you
rush to deprecate (and eventually /remove/) something you still don't
know the true merits of? That doesn't engender confidence that this has
been thought through properly or decided rationally. Combined with
Aleksey's revelation above I think the justification for adopting
this 'fix' requires a reboot, preferably from cold.

> Over and out. Don't rush this, please.
Agreed.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill


From shade at redhat.com  Mon Nov 18 15:53:42 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 18 Nov 2019 16:53:42 +0100
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <4d2be622-d6af-cc89-159a-45c92c25d9ba@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <6b913f48-b4f8-5069-e10f-fb966432942b@redhat.com>
 <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com>
 <428b5bd0-3519-9ec6-1be1-d965f1b14e1e@redhat.com>
 <4d2be622-d6af-cc89-159a-45c92c25d9ba@oracle.com>
Message-ID: <b560a602-656f-f3fa-0ed2-c2db38ff9b40@redhat.com>

On 11/18/19 2:50 PM, David Holmes wrote:
>>    http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-November/004333.html
> 
> The discussion is very similar to the to-and-fro we have also had in Oracle over the past couple of
> years. Some benchmarks show benefit from biased-locking because they (over?) use uncontended locking
> which is exactly what biased-locking was designed to cheapen. But do we care about such benchmarks?
> Do they tell us anything interesting about the impact on real applications?

For the benchmarks that exercise library code, I believe they serve as data points against the idea
that "real code" is already (re)written in such a way that effectively unsynchronized accesses never
acquire (biased) locks. Also partially explains why users see regressions on their stacks with BL
disabled. For example, I suspect the regressions on Xml* workloads are something about what old^W
stable Xalan/Xerces code does (maybe self-written synchronized collections?).

> Can you share which users complained, and for what applications?

Alas, assorted private communications, so I am not at liberty to share the details. But some of
those are proverbial "real applications". This is why I say that disabling BL is questionable from
the performance standpoint, and deprecating the flag is dubious today.

Note that Shenandoah basically did the limited form of the experiment you seem to suggest (disable
BL by default and see if there are many complaints) -- and there were substantial throughput losses
which made us bite the latency bullet and enable BL back. This tells me the larger experiment would
yield even more regression reports, hopefully public ones!

> Well as I said we've already been sitting on this for quite a while. :)

It does not matter how much time anyone spent internally reminiscing about this. What is relevant is
how much time the external users got to notice the regressions and provide their input before the
feature is deprecated/purged. Based on what we had with Shenandoah, I would say it would take a few
short-term-support releases to gather that data. I believe deprecation should be off the table until
that happens.

My bottom line: disabling biased locking experimentally is maybe okay, if you need more data to make
the decision, especially if that default is contained in a short-term release; deprecating BL is not
okay at current juncture.

From another perspective, if the intent here is to ignore all performance regressions for the sake
of better maintainability, I get that part! But then we should be upfront about this cost, instead
of arguing that No True Scotsman^W^W Real Application is affected. A JEP is probably in order, that
would capture risks, assumptions, and the mitigations that library and application writers can
deploy to recuperate the throughput losses (like "don't use synchronized on paths that are provably
never contended; avoid StringBuffer, Vector, Collections.synchronized*(); rewrite synchronized
to atomics/stamped locks; etc.").
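
To make the "rewrite synchronized to atomics/stamped locks" mitigation concrete,
here is a minimal sketch. The class and method names (SyncRewriteSketch,
SynchronizedCounter, AtomicCounter, StampedPair) are hypothetical and invented
for illustration only; nothing here was measured, so whether such a rewrite
actually recovers throughput for a given application remains to be shown.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.StampedLock;

public class SyncRewriteSketch {

    // Before: every call enters the object's monitor, contended or not.
    public static final class SynchronizedCounter {
        private long value;
        public synchronized long increment() { return ++value; }
        public synchronized long get() { return value; }
    }

    // After: a CAS-based atomic; no monitor is ever acquired.
    public static final class AtomicCounter {
        private final AtomicLong value = new AtomicLong();
        public long increment() { return value.incrementAndGet(); }
        public long get() { return value.get(); }
    }

    // A stamped lock lets readers attempt an optimistic read first and
    // only take the real read lock if a writer got in between.
    public static final class StampedPair {
        private final StampedLock lock = new StampedLock();
        private long a;
        private long b;

        public void set(long newA, long newB) {
            long stamp = lock.writeLock();
            try {
                a = newA;
                b = newB;
            } finally {
                lock.unlockWrite(stamp);
            }
        }

        public long sum() {
            long stamp = lock.tryOptimisticRead();
            long curA = a;
            long curB = b;
            if (!lock.validate(stamp)) {
                // A writer intervened; fall back to a blocking read lock.
                stamp = lock.readLock();
                try {
                    curA = a;
                    curB = b;
                } finally {
                    lock.unlockRead(stamp);
                }
            }
            return curA + curB;
        }
    }

    public static void main(String[] args) {
        SynchronizedCounter before = new SynchronizedCounter();
        AtomicCounter after = new AtomicCounter();
        for (int i = 0; i < 1000; i++) {
            before.increment();
            after.increment();
        }
        StampedPair pair = new StampedPair();
        pair.set(40, 2);
        System.out.println(before.get() + " " + after.get() + " " + pair.sum());
    }
}
```

Both counters give the same single-threaded results; the difference is only in
which locking path the JVM has to take on each call.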

-- 
Thanks,
-Aleksey


From aph at redhat.com  Mon Nov 18 16:26:58 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 18 Nov 2019 16:26:58 +0000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <6b913f48-b4f8-5069-e10f-fb966432942b@redhat.com>
 <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com>
Message-ID: <2273f036-893c-2ccb-c240-d832e7bcd717@redhat.com>

On 11/18/19 11:27 AM, David Holmes wrote:
> For a micro-benchmark like that sure. But is that at all representative 
> of real modern code? We know some of the really old benchmarks used 
> synchronized collections and StringBuffer extensively and so they also 
> benefit from biased-locking. But more modern benchmarks are not showing 
> any benefit.

I'm not really surprised. That's probably because modern benchmarks
are using ReentrantLocks, which are just as slow (or fast, choose your
poison) as monitors, but with no way to use biasing for uncontended
cases. So of course such benchmarks don't benefit from biased locking,
more's the pity.
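
For readers less familiar with the distinction, a minimal sketch of the two
locking styles follows. The class name LockStyleSketch and its methods are
hypothetical, made up for illustration. The synchronized block uses the
object's monitor, which is what biased locking applies to; ReentrantLock is
implemented in java.util.concurrent rather than on object monitors, so the
BiasedLocking flags have no effect on it.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockStyleSketch {
    private final Object monitor = new Object();
    private final ReentrantLock lock = new ReentrantLock();
    private long monitorCount;
    private long lockCount;

    // Monitor-based critical section: eligible for biased locking.
    public long incrementWithMonitor() {
        synchronized (monitor) {
            return ++monitorCount;
        }
    }

    // ReentrantLock-based critical section: never uses the object monitor,
    // so it cannot be biased toward a thread.
    public long incrementWithLock() {
        lock.lock();
        try {
            return ++lockCount;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockStyleSketch s = new LockStyleSketch();
        for (int i = 0; i < 5; i++) {
            s.incrementWithMonitor();
            s.incrementWithLock();
        }
        System.out.println(s.incrementWithMonitor() + " " + s.incrementWithLock());
    }
}
```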

> We'd like to know the impact on real applications but we have no way
> to know that a-priori. So we're either stuck with the burden of
> supporting biased-locking forever, or we flip the switch to turn it
> off and see if it causes too many issues. Unless you see another way
> to determine this?

Here's my completely unbiased (:-) take on it:

We're looking at a performance regression with nothing in return
beyond our own convenience. Which is nice (hey, I don't like that
biased locking code, either, I had to write the stubs for AArch64 and they
gave me a headache) but I'm wondering who we should prioritize, us or
our users. And that is the choice, isn't it?

Sometimes I think we don't realize what we've got.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From daniel.daugherty at oracle.com  Mon Nov 18 17:36:46 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Mon, 18 Nov 2019 12:36:46 -0500
Subject: RFR: 8215355: Object monitor deadlock with no threads holding the
 monitor (using jemalloc 5.1)
In-Reply-To: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
References: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
Message-ID: <cd392333-5bff-fe94-fb25-03af1e88dc5c@oracle.com>

Hi David,


On 11/17/19 9:30 PM, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
> webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/

src/hotspot/share/runtime/thread.hpp
    Nice catch!

src/hotspot/share/runtime/thread.cpp
    Nice catch!

    Not your issue, but these two lines feel strange/wrong:

        L1008:   // Allow non Java threads to call this without stack_base
        L1009:   if (_stack_base == NULL) return true;

    When _stack_base is NULL, any 'adr' is in the caller's stack? The
    comment is not helping understand why this is so...

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/JavaThread.java
    Nice catch!

    Again, not your issue, but these four lines are questionable:

        L383     Address sp        = lastSPDbg();
        L384     Address stackBase = getStackBase();
        L385     // Be robust
        L386     if (sp == null) return false;

    I can see why a NULL sp would cause a "false" return since obviously
    something is amiss in the frame. However, the C++ code doesn't make
    this check so why does the SA code?

    And this code doesn't check stackBase == NULL so it's not matching
    the C++ code either.


Thumbs up on the change itself. My queries above and below might warrant
new bugs or RFEs to be filed.

>
> This was a very difficult bug to track down and I want to publicly 
> acknowledge and thank the jemalloc folk (users and developers) for 
> continuing to investigate this issue from their side. Without their 
> persistence this issue would have languished.

You also deserve thanks for sticking with this bug: Thanks David!!


> The thread stack_base() is the first address above the thread's stack. 
> However, the "in stack" checks performed by Thread::on_local_stack and 
> Thread::is_in_stack allowed the checked address to be equal to the 
> stack_base() - which is not correct. Here's how this manifests as the 
> bug:
>
> - Let a JavaThread instance, T2, be allocated at the end of thread
>   T1's stack i.e. at T1->stack_base()
>   [This seems to be why this only reproduced with jemalloc.]
> - Let T2 lock an inflated monitor
> - Let T1 try to lock the same monitor
>   - T1 would consider the _owner field value (T2) as being in its
>     stack and so consider the monitor stack-locked by T1
>   - And so both T1 and T2 would have ownership of the monitor allowing
>     the monitor state (and application state) to be corrupted. This
>     results in a range of hangs and crashes depending on the exact
>     interleaving.

Ouch!
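
To make the off-by-one concrete, here is a minimal sketch of the two range
checks. It assumes a downward-growing stack; the StackBounds type and its
member names are hypothetical and are not the HotSpot Thread code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a thread's stack bounds (not HotSpot's Thread).
struct StackBounds {
  uintptr_t base;  // first address ABOVE the stack (the stack grows down)
  size_t    size;  // stack size in bytes

  // Lowest address that belongs to the stack.
  uintptr_t end() const { return base - size; }

  // Buggy form: accepts adr == base, which lies just outside the stack.
  bool contains_inclusive(uintptr_t adr) const {
    return adr >= end() && adr <= base;
  }

  // Fixed form: the stack is the half-open range [end, base).
  bool contains(uintptr_t adr) const {
    return adr >= end() && adr < base;
  }
};
```

With base = 0x200000 and size = 0x100000, contains_inclusive(0x200000) is
true while contains(0x200000) is false, which is the T1/T2 confusion
described in the quoted text.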

So I was wondering how this bug could happen with the thread alignment
logic that we have in place... search for the _real_malloc_address stuff...

And then I noticed that the logic only kicks in when
UseBiasedLocking == true (and this bug says it doesn't happen with
-XX:-UseBiasedLocking):

src/hotspot/share/runtime/thread.cpp:

// ======= Thread ========
// Support for forcing alignment of thread objects for biased locking
void* Thread::allocate(size_t size, bool throw_excpt, MEMFLAGS flags) {
  if (UseBiasedLocking) {
    const size_t alignment = markWord::biased_lock_alignment;
    size_t aligned_size = size + (alignment - sizeof(intptr_t));
    void* real_malloc_addr = throw_excpt? AllocateHeap(aligned_size, flags, CURRENT_PC)
                                          : AllocateHeap(aligned_size, flags, CURRENT_PC,
                                                         AllocFailStrategy::RETURN_NULL);
    void* aligned_addr     = align_up(real_malloc_addr, alignment);
    assert(((uintptr_t) aligned_addr + (uintptr_t) size) <=
           ((uintptr_t) real_malloc_addr + (uintptr_t) aligned_size),
           "JavaThread alignment code overflowed allocated storage");
    if (aligned_addr != real_malloc_addr) {
      log_info(biasedlocking)("Aligned thread " INTPTR_FORMAT " to " INTPTR_FORMAT,
                              p2i(real_malloc_addr),
                              p2i(aligned_addr));
    }
    ((Thread*) aligned_addr)->_real_malloc_address = real_malloc_addr;
    return aligned_addr;
  } else {
    return throw_excpt? AllocateHeap(size, flags, CURRENT_PC)
                       : AllocateHeap(size, flags, CURRENT_PC,
                                      AllocFailStrategy::RETURN_NULL);
  }
}


The logging logic above:

    if (aligned_addr != real_malloc_addr) {
      log_info(biasedlocking)("Aligned thread " INTPTR_FORMAT " to " INTPTR_FORMAT,
                              p2i(real_malloc_addr),
                              p2i(aligned_addr));
    }

allows for real_malloc_addr to be the same as aligned_addr sometimes
(and no log message is issued), but I'm not sure from spelunking in
code whether it's really possible for:

    void* aligned_addr     = align_up(real_malloc_addr, alignment);

to return aligned_addr == real_malloc_addr. In other words, if
real_malloc_addr is already aligned perfectly, does align_up() still
change that value?

If it is possible for (aligned_addr == real_malloc_addr), then it is
possible for this bug to happen without jemalloc.

I've convinced myself that this is possible because of this line:

    size_t aligned_size = size + (alignment - sizeof(intptr_t));

If real_malloc_addr is already aligned perfectly and align_up()
always changed the input address, then the aligned_size would be
too small by sizeof(intptr_t) and we would have seen a buffer
overwrite like that over the many, many years.

So my conclusion is that it should be possible for this bug to
happen without jemalloc, but it would have to be rare.
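
The arithmetic behind that conclusion can be checked with a small sketch.
It assumes the usual power-of-two rounding idiom for align_up, and the
256-byte alignment and 8-byte word used in the note below are only
illustrative values; none of the function names here are HotSpot's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round addr up to a multiple of alignment (which must be a power of two).
// An address that is already aligned comes back unchanged.
inline uintptr_t align_up_sketch(uintptr_t addr, size_t alignment) {
  const uintptr_t mask = alignment - 1;
  return (addr + mask) & ~mask;
}

// Hypothetical variant that ALWAYS moves to the next boundary, even when
// addr is already aligned. Used only to show why that cannot be the rule.
inline uintptr_t always_bump(uintptr_t addr, size_t alignment) {
  return align_up_sketch(addr + 1, alignment);
}

// Does an object of 'size' bytes placed at 'aligned' stay inside a buffer of
// size + (alignment - word) bytes that starts at 'raw'?
inline bool fits(uintptr_t raw, uintptr_t aligned, size_t size,
                 size_t alignment, size_t word) {
  const size_t aligned_size = size + (alignment - word);
  return aligned + size <= raw + aligned_size;
}
```

For a word-aligned raw address such as 0x1008, the worst-case padding is
alignment - word bytes and the object exactly fits. For an already aligned
raw address such as 0x1000, align_up_sketch() returns 0x1000 and the object
fits, whereas always_bump() returns 0x1100 and overruns the buffer by one
word, which is the overwrite that has never been observed.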


> Interestingly Thread::is_in_usable_stack does not have this bug.

So we have Thread::is_in_usable_stack(), Thread::on_local_stack() and
Thread::is_in_stack()? I haven't compared all three side by side, but
there might be some cleanup work that can be done here (in a different
bug).


>
> The bug can be tracked way back to JDK-6699669 as explained in the bug 
> report. That issue also showed that the same bug existed in the SA 
> implementations of these "on stack" checks.

Ouch! JDK-6699669 was fixed in jdk7-B56 and looks like it was pushed
to the jdk6u train... so this bug goes back quite a ways...

Outstanding hunt David!

Dan


>
> Testing:
>   - The reproducer from the bug report, using jemalloc, ran over 5000
>     times without failing in any way.
>   - tiers 1-3 on all Oracle platforms
>   - serviceability/sa tests
>
> Thanks,
> David
> -----


From jianglizhou at google.com  Mon Nov 18 19:01:17 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Mon, 18 Nov 2019 11:01:17 -0800
Subject: RFR: JDK-8230413: Support Pre JDK 6 class with CDS
In-Reply-To: <CALrW1jx1uv=PU3Y2bfhOLM3ibCcQQ4pt1z=TiC45S2Mprcr0gg@mail.gmail.com>
References: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>
 <c69bf94f-e52d-f7cd-e9f7-252a71204f7b@oracle.com>
 <CALrW1jyog+Q5H-BXQxQEkPdtLhr_aB7ekOVzuq9Qi4qLQ-jqOg@mail.gmail.com>
 <f2b33c85-e811-48f5-ed82-6ee398ea1935@oracle.com>
 <c3a4ff42-e3b5-98a9-138c-358d14ac1ada@oracle.com>
 <c322d943-8c6a-5c6e-345f-af4c902d4a57@oracle.com>
 <CALrW1jyqUuZNjmS0UZyiR-ZQfZVVy38opvSCaMotwTzY0i+S1A@mail.gmail.com>
 <fd3d57f2-6f1b-7b3d-0601-6f2c36182559@oracle.com>
 <CALrW1jx1uv=PU3Y2bfhOLM3ibCcQQ4pt1z=TiC45S2Mprcr0gg@mail.gmail.com>
Message-ID: <CALrW1jx08tzZ6PBv_zsqLTcO8JSh0Piu1Zsz36jiACF6QwFmaw@mail.gmail.com>

After off-mailing list discussions regarding the support for archiving
pre-JDK-6 classes, we would like to seek additional input and feedback
on this topic. Please reply to this email thread if the general
support for archiving pre-JDK-6 classes is a requirement in your use
case, both with verification enabled and disabled. Thanks!

Best,

Jiangli

On Wed, Nov 13, 2019 at 9:23 AM Jiangli Zhou <jianglizhou at google.com> wrote:
>
> Hi David,
>
> On Tue, Nov 12, 2019 at 10:00 PM David Holmes <david.holmes at oracle.com> wrote:
> >
> > Hi Jiangli,
> >
> > On 13/11/2019 12:20 pm, Jiangli Zhou wrote:
> > > Hi Harold and Ioi,
> > >
> > > Thanks a lot for the additional feedback.
> > >
> > > I did some quick research today about -Xverify:none usages. My finding
> > > showed that the use of -Xverify:none is not very uncommon in some
> > > cases. Here are some of the usages:
> > >
> > > - trusted tools
> >
> > But what is the context? Is it:
> >
> > "I trust this tool, and all other classes, so I'll optimize by disabling
> > verification,"; or
>
> This is the case. For a tool that's developed by a user and properly
> compiled by javac, user may want to disable class verification when
> running the tool.
>
> >
> > "This tool produces non-verifiable classfiles, but I trust the tool and
> > so will disable verification" (which implicitly means all
> > classes/libraries have to be fully trusted)
> >
> > ?
> >
> > I'm not sure you can use any existing uses of -Xverify:none to infer the
> > applicability or not to what is being proposed here for CDS.
>
> In the above example, CDS dump time forces verification of the tool's
> classes as long as they are placed on the -cp path. Without CDS involved,
> the user's choice is honored. I feel this usage may be a lurking issue when
> more users start to use CDS/AppCDS.
>
> Harold, Ioi and I have had a discussion about pre-JDK-6 verification off
> the mailing list, since verification is security related and may be
> sensitive. I'll loop you in. It's possible we may be able to separate
> the pre-JDK-6 class problem from the general CDS -Xverify:none topics.
>
> >
> > > - some limited testing environment
> > >
> > > CDS (particularly with dynamic archiving capability) may help avoid
> > > runtime verification overhead by verifying classes at dump time and
> > > reduce the need for -Xverify:none. It would be good to have
> > > strategies for the following scenarios as well when removing
> > > -Xverify:none:
> > >
> > > 1) In cases when shared archive is disabled at runtime (I hope it's
> > > not common cases)
> >
> > I'm not quite sure what you are saying here. If a pre-verified archive
> > can't be used at runtime then normal verification should occur as
> > classes are not being loaded from a known pre-verified location.
>
> CDS/AppCDS are still not widely adopted yet. When users start to learn
> more about CDS/AppCDS capability, they may still choose to not use the
> feature based on their specific requirements. For example, a user may
> choose to not use AppCDS and also turn off the default CDS.
>
> >
> > > 2) When users want to reduce the overhead caused by verification
> > > during archiving dump time
> >
> > I would not expect dumping to be such a time critical activity that
> > users would care about the "overhead" of verification.
>
> With dynamic archiving, dump time performance can be more important to users.
>
> Best,
>
> Jiangli
>
> >
> > Cheers,
> > David
> >
> > > Thoughts?
> > >
> > > Best,
> > > Jiangli
> > >
> > > On Tue, Nov 12, 2019 at 4:16 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> > >>
> > >> I am also a little worried that this might send the wrong message -- "if
> > >> you want to archive pre-JDK6 classes, you need to disable verification
> > >> altogether for all classes in your entire app".
> > >>
> > >> Thanks
> > >> - Ioi
> > >>
> > >> On 11/12/19 12:40 PM, Harold Seigel wrote:
> > >>> Hi Jiangli,
> > >>>
> > >>> I think this change is going in the wrong direction.  We are trying to
> > >>> discourage disabling verification, not encourage it.  We also do not
> > >>> want to create more use-cases for preserving -Xverify:none.
> > >>>
> > >>> It looks like your change would allow archiving of unverified pre-JDK6
> > >>> classes, but not allow archiving of verified pre-JDK6 classes.  If so,
> > >>> that seems backward.
> > >>>
> > >>> Thanks, Harold
> > >>>
> > >>> On 11/11/2019 11:53 PM, Ioi Lam wrote:
> > >>>> I wonder if there's a safer alternative. Are there tools that can add
> > >>>> stackmaps to pre-JDK6 classes? That way they can be verified with the
> > >>>> split verifier during CDS dump time.
> > >>>>
> > >>>> Thanks
> > >>>> - Ioi
> > >>>>
> > >>>> On 11/11/19 4:25 PM, Jiangli Zhou wrote:
> > >>>>> Hi David,
> > >>>>>
> > >>>>> Thanks for quick response!
> > >>>>>
> > >>>>> On Mon, Nov 11, 2019 at 3:12 PM David Holmes
> > >>>>> <david.holmes at oracle.com> wrote:
> > >>>>>> Hi Jiangli,
> > >>>>>>
> > >>>>>> On 12/11/2019 8:12 am, Jiangli Zhou wrote:
> > >>>>>>> Please review the following change that allows archiving
> > >>>>>>> pre-JAVA_6_VERSION classes with -Xverify:none.
> > >>>>>>>
> > >>>>>>> webrev: http://cr.openjdk.java.net/~jiangli/8230413/webrev.00/
> > >>>>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8230413
> > >>>>>>>
> > >>>>>>> Currently there are still a large number of existing (pre-built)
> > >>>>>>> classes with older class versions (< 50) in real world
> > >>>>>>> applications. Those classes are missing the benefit of archiving.
> > >>>>>>> Particularly, in some use cases, class verification can be safely
> > >>>>>>> disabled. For those use cases, supporting archiving of pre-JDK-6
> > >>>>>>> classes shows a good performance benefit. We can re-evaluate this
> > >>>>>>> support when -Xverify:none is removed in the future; hopefully the
> > >>>>>>> need to support class versions < 50 is no longer significant at
> > >>>>>>> that time.
> > >>>>>>>
> > >>>>>>> This change brings back the pre-JDK-8198849 behavior. The runtime
> > >>>>>>> makes sure the dump-time verification mode is the same as or
> > >>>>>>> stronger than the current mode.
> > >>>>>>>
> > >>>>>>> A CSR may be needed for the change. Any thoughts on that?
> > >>>>>> A CSR request is definitely required given that you are proposing to
> > >>>>>> undo a change that was itself put in place via a CSR request! And
> > >>>>>> given
> > >>>>>> this is relaxing a "defense-in-depth" check which will result in
> > >>>>>> increasing exploitability, I think you will need a very strong
> > >>>>>> argument
> > >>>>>> to justify this.
> > >>>>> Thanks for confirming this! Will do.
> > >>>>>
> > >>>>>> Further this not only undoes JDK-8197972 but it also invalidates
> > >>>>>> JDK-8155671 being closed as a duplicate of JDK-8197972. JDK-8155671
> > >>>>>> requested a way to know if verification had been disabled, to help
> > >>>>>> with
> > >>>>>> analyzing crash reports, but instead we decided to not allow
> > >>>>>> verification to be disabled.
> > >>>>> I had some concerns about JDK-8155671 initially before making the
> > >>>>> change, as it's a closed bug and my memory about the specific issue
> > >>>>> was flushed out. I brought up the question in the bug. My take on
> > >>>>> Ioi's response to my query about JDK-8155671 was that the
> > >>>>> pre-JDK-8197972 behavior would not cause any security hole.
> > >>>>>
> > >>>>> Re-evaluating this particular behavior, I think the pre-JDK-8155671
> > >>>>> behavior would actually match user intention better. If the user
> > >>>>> decides to turn off verification in safe use cases, it seems to be a
> > >>>>> good idea to honor that. With the new dynamic archiving capability,
> > >>>>> an archive could be created the first time a particular application
> > >>>>> is run. Not forcing verification when the user has decided to disable
> > >>>>> it can avoid unnecessary/unwanted overhead.
> > >>>>>
> > >>>>> If verification is turned off at dump time for application classes,
> > >>>>> runtime does not allow execution without also turning off
> > >>>>> verification. We can determine a crash is not caused by relaxed dump
> > >>>>> time verification.
> > >>>>>
> > >>>>> Regards,
> > >>>>> Jiangli
> > >>>>>
> > >>>>>> David
> > >>>>>> -----
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>> Tested with jtreg appcds tests.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Jiangli
> > >>>>>>>
> > >>>>
> > >>
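
The "same or stronger" rule discussed in the quoted proposal can be
pictured with a small sketch. The VerifyMode enum, its ordering and the
archive_usable() helper are hypothetical names made up for illustration;
they are not the CDS implementation.

```cpp
#include <cassert>

// Hypothetical verification strengths, weakest first.
enum class VerifyMode { None = 0, Remote = 1, All = 2 };

// An archive is only usable if it was dumped with a verification mode that
// is the same as, or stronger than, the mode requested at run time.
inline bool archive_usable(VerifyMode dump_time, VerifyMode run_time) {
  return static_cast<int>(dump_time) >= static_cast<int>(run_time);
}
```

Under this rule an archive dumped with verification off is usable only when
verification is also off at run time, so a run with verification enabled
never executes classes that skipped verification at dump time.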

From igor.ignatyev at oracle.com  Mon Nov 18 19:58:00 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Mon, 18 Nov 2019 11:58:00 -0800
Subject: RFR(S) : 8234304 : runtime/cds/appcds/javaldr tests should use driver
 mode
Message-ID: <A304F4E9-F73E-4DF1-979A-F9671FA15573@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8234304/webrev.00/index.html
> 30 lines changed: 0 ins; 5 del; 25 mod;

Hi all,

could you please review this small patch which replaces 'main/othervm' w/ 'driver' in runtime/cds/appcds/javaldr tests?
from JBS:
> runtime/cds/appcds/javaldr tests do actual testing in child processes, the main test class makes preparations, starts JDK under test, and verifies its results. so there is no need for the main test class to be run in JDK under test w/ all external flags.
the patch also replaces "X" w/ X.class.getName() in all the places these tests refer to a class, which made explicit @build action unnecessary. 

webrev: http://cr.openjdk.java.net/~iignatyev//8234304/webrev.00/index.html
JBS: https://bugs.openjdk.java.net/browse/JDK-8234304
testing: 
 - runtime/cds/appcds/javaldr on macosx-x64 w/ all flags used in Oracle CI
 - runtime/cds/appcds/javaldr on linux-x64,windows-x64,macosx-x64,solaris-sparcv9 in OOB
 - runtime/cds/appcds/javaldr one by one on macosx-x64

Thanks,
-- Igor

From patricio.chilano.mateo at oracle.com  Mon Nov 18 20:29:12 2019
From: patricio.chilano.mateo at oracle.com (Patricio Chilano)
Date: Mon, 18 Nov 2019 17:29:12 -0300
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <428b5bd0-3519-9ec6-1be1-d965f1b14e1e@redhat.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <6b913f48-b4f8-5069-e10f-fb966432942b@redhat.com>
 <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com>
 <428b5bd0-3519-9ec6-1be1-d965f1b14e1e@redhat.com>
Message-ID: <3d6c8ef9-ffe8-e1be-52ce-6c867d28658f@oracle.com>

Thanks all for chiming in to this one. I knew this was going to be a 
controversial patch and I really appreciate your feedback. The idea was 
also to get enough feedback to make the best decision and not just to 
push it without general consent.

@Aleksey,

On 11/18/19 7:55 AM, Aleksey Shipilev wrote:
> (lurking off the parental leave to point out a few things)
>
> There are way too few details in the RFE report to make an informed decision. Also, removing it when
> thread-local handshakes are finally there to make unbiases/rebiases much less painful for
> performance is quite odd to see.
Yes, now we have TLH for bias revocations, which allows us to avoid
safepoints (except for the bulk ones). I ran benchmarks (Specjbb2005,
Specjbb2015, and Specjvm2008 on Linux and Windows), though, and saw
neither regressions nor improvements compared to when we used safepoints.
With handshakes we don't have to stop all JavaThreads, but the revocation
itself is still not cheap (I've measured cases where the actual time to
revoke with a handshake was more than with a safepoint). And today we
still have to go through the VMThread, which can be a bottleneck,
although I'm working on 8230594 to allow direct handshakes between
JavaThreads.

> On 11/18/19 12:27 PM, David Holmes wrote:
>> For a micro-benchmark like that sure. But is that at all representative of real modern code? We know
>> some of the really old benchmarks used synchronized collections and StringBuffer extensively and so
>> they also benefit from biased-locking. But more modern benchmarks are not showing any benefit.
> If you want to say that SPECjbb2015 does not show improvement, then that is because we (well, me
> myself!) specifically argued during its development that the lock usages there should explore
> something beyond biased locking. Which is why locking paths there are more or less contended, so
> that locks get out of their biased state. Therefore, arguing that biased locking is not needed
> because SPECjbb2015 does not show the benefit of having it enabled -- is circular.
Ok, I didn't really know the specifics of how SPECjbb2015 was designed.
Yes, the measurements show no significant changes on that one. Here are
some results from running on an Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
processor, for example:

                         criticaljOPS    maxjOPS
Oracle Linux 6.4
-XX:+UseBiasedLocking    22635.00        31012.00
-XX:-UseBiasedLocking    22885.00        31196.67
GNU/Linux 4.1.12
-XX:+UseBiasedLocking    24277.00        41571.00
-XX:-UseBiasedLocking    24583.00        41500.00

The benchmark was actually run several times with different tuning
flags but the results are similar (thanks to Thomas Schatzl for running
them!)

I did see a 2-3% regression on Specjbb2005 and some Specjvm2008 as you 
mentioned. But whenever there is a new patch we always seem to just rely 
on how Specjbb2015 performs, so that's why I assumed that maybe those 
benchmarks weren't that representative today.

>> We'd like to know the impact on real applications but we have no way to know that a-priori. So we're
>> either stuck with the burden of supporting biased-locking forever, or we flip the switch to turn it
>> off and see if it causes too many issues. Unless you see another way to determine this?
> We (in Shenandoah) were back and forth on heuristically enabling/disabling UseBiasedLocking. When we
> did disable it by default, we had users complain about performance penalties against other
> collectors. Which led us to reinstate it, and it was also visible on some SPECjvm2008
> benchmarks:
>    http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-November/004333.html
>
> At very least, deprecating the flag is unwarranted at this point, until we are totally sure it is
> not needed. You could make it disabled by default and collect complaints, though. That would take a
> few short-term releases for most interested parties to catch up with this.
>
> Over and out. Don't rush this, please.
Ok, your concerns and those of others seem reasonable. I think I got
enough feedback from this RFR. If we actually want to continue with this,
I'll start a JEP then.

Thanks!

Patricio
> -Aleksey
>


From ioi.lam at oracle.com  Mon Nov 18 20:50:16 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Mon, 18 Nov 2019 12:50:16 -0800
Subject: RFR(S) : 8234304 : runtime/cds/appcds/javaldr tests should use
 driver mode
In-Reply-To: <A304F4E9-F73E-4DF1-979A-F9671FA15573@oracle.com>
References: <A304F4E9-F73E-4DF1-979A-F9671FA15573@oracle.com>
Message-ID: <625948c7-8d44-f958-59fa-52ddde06fba1@oracle.com>

Looks good to me. Thanks for fixing this.

- Ioi

On 11/18/19 11:58 AM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8234304/webrev.00/index.html
>> 30 lines changed: 0 ins; 5 del; 25 mod;
> Hi all,
>
> could you please review this small patch which replaces 'main/othervm' w/ 'driver' in runtime/cds/appcds/javaldr tests?
> from JBS:
>> runtime/cds/appcds/javaldr tests do actual testing in child processes, the main test class makes preparations, starts JDK under test, and verifies its results. so there is no need for the main test class to be run in JDK under test w/ all external flags.
> the patch also replaces "X" w/ X.class.getName() in all the places these tests refer to a class, which made explicit @build action unnecessary.
>
> webrev: http://cr.openjdk.java.net/~iignatyev//8234304/webrev.00/index.html
> JBS: https://bugs.openjdk.java.net/browse/JDK-8234304
> testing:
>   - runtime/cds/appcds/javaldr on macosx-x64 w/ all flags used in Oracle CI
>   - runtime/cds/appcds/javaldr on linux-x64,windows-x64,macosx-x64,solaris-sparcv9 in OOB
>   - runtime/cds/appcds/javaldr one by one on macosx-x64
>
> Thanks,
> -- Igor


From ioi.lam at oracle.com  Mon Nov 18 21:21:47 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Mon, 18 Nov 2019 13:21:47 -0800
Subject: Should Java support ERROR_NO_MORE_FILES when canonicalizing paths
 on Windows?
In-Reply-To: <346034750.20191114204642@am-soft.de>
References: <346034750.20191114204642@am-soft.de>
Message-ID: <feced9f8-a16a-b01c-4c77-5c6259b49c92@oracle.com>

Hi Thorsten, since you have problems filing bugs on JBS, I filed the 
following bug on your behalf.

https://bugs.openjdk.java.net/browse/JDK-8234363

I have not investigated the issue in detail yet. How often do you see 
ERROR_NO_MORE_FILES happening? Have you checked if your process 
(apache?) has too many open files such that FindFirstFileW is not able 
to open the directory to get a file listing?

If that is indeed the case, I am not sure what the best way of
handling it is. If resources (file descriptors) are running out, perhaps
the current behavior of throwing an exception in
WinNTFileSystem.canonicalize0() would be better than just ignoring it
and returning an incorrect result. But I'll defer to the folks on the
core-libs team.

Thanks
- Ioi


On 11/14/19 11:46 AM, Thorsten Schöning wrote:
> Hi all,
>
> while the details can be read on SO[1][2], the bottom line is that I
> have a setup in which Windows sometimes sets the error code
> ERROR_NO_MORE_FILES during calls to FindFirstFileW. In theory that
> shouldn't happen, but it simply does once in a while and makes my Java
> daemon run into exceptions, because that error code isn't expected.
>
> The following lists all expected error codes from the function
> "lastErrorReportable", which is used during canonicalizing paths:
>
>>     if ((errval == ERROR_FILE_NOT_FOUND)
>>         || (errval == ERROR_DIRECTORY)
>>         || (errval == ERROR_PATH_NOT_FOUND)
>>         || (errval == ERROR_BAD_NETPATH)
>>         || (errval == ERROR_BAD_NET_NAME)
>>         || (errval == ERROR_ACCESS_DENIED)
>>         || (errval == ERROR_NETWORK_UNREACHABLE)
>>         || (errval == ERROR_NETWORK_ACCESS_DENIED)) {
>>         return 0;
>>     }
> https://github.com/openjdk/jdk/blob/master/src/java.base/windows/native/libjava/canonicalize_md.c#L131
>
> Obviously ERROR_NO_MORE_FILES is missing, but it's pretty interesting
> as well that Java supports other error codes already. According to the
> docs, FindFirstFileW should only return ERROR_FILE_NOT_FOUND, but most
> likely people have already run into other error codes that Windows uses
> in various circumstances. All of those totally make sense.
>
> So, how about adding ERROR_NO_MORE_FILES there as well?
>
> As can be read in my SO questions, I didn't notice any other real
> technical problem like permission issues, timeouts in the network or
> the like. It's only that Windows sometimes decides to use that error code
> for some unknown reason. Adding it would only be one line of text and
> would increase compatibility with setups like mine. I guess problems like
> these were the reason to add other error codes in the past as well.
>
> Thanks!
>
> [1]: https://stackoverflow.com/questions/58825588/does-java-need-to-support-error-no-more-files-when-canonicalizing-paths-on-windo
> [2]: https://stackoverflow.com/questions/58825963/when-does-findfirstfilew-set-last-error-to-be-error-no-more-files-instead-of-err?noredirect=1&lq=1
>
> Best regards,
>
> Thorsten Schöning
>
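
For reference, the one-line addition proposed in the quoted message might
look roughly like the sketch below. It is not the JDK source: the function
name and the constants are local stand-ins so that the sketch compiles off
Windows, with values that mirror the corresponding winerror.h definitions.
Real code would include <windows.h> and use the ERROR_* macros directly.

```cpp
#include <cassert>

namespace sketch {

// Local copies of Win32 error codes (assumed to match winerror.h).
constexpr unsigned long kErrorFileNotFound        = 2;
constexpr unsigned long kErrorPathNotFound        = 3;
constexpr unsigned long kErrorAccessDenied        = 5;
constexpr unsigned long kErrorNoMoreFiles         = 18;
constexpr unsigned long kErrorBadNetpath          = 53;
constexpr unsigned long kErrorNetworkAccessDenied = 65;
constexpr unsigned long kErrorBadNetName          = 67;
constexpr unsigned long kErrorDirectory           = 267;
constexpr unsigned long kErrorNetworkUnreachable  = 1231;

// Returns 0 for "expected" lookup failures that canonicalization may
// tolerate, and 1 for errors that should be reported to the caller.
inline int last_error_reportable(unsigned long errval) {
  switch (errval) {
    case kErrorFileNotFound:
    case kErrorDirectory:
    case kErrorPathNotFound:
    case kErrorBadNetpath:
    case kErrorBadNetName:
    case kErrorAccessDenied:
    case kErrorNetworkUnreachable:
    case kErrorNetworkAccessDenied:
    case kErrorNoMoreFiles:  // the proposed addition
      return 0;
    default:
      return 1;
  }
}

}  // namespace sketch
```

Whether tolerating this code is the right call is exactly the open question
in the thread: if it signals resource exhaustion rather than a missing
file, reporting it may still be the safer behavior.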


From mikhailo.seledtsov at oracle.com  Mon Nov 18 22:11:29 2019
From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com)
Date: Mon, 18 Nov 2019 14:11:29 -0800
Subject: RFR(S) : 8234304 : runtime/cds/appcds/javaldr tests should use
 driver mode
In-Reply-To: <625948c7-8d44-f958-59fa-52ddde06fba1@oracle.com>
References: <A304F4E9-F73E-4DF1-979A-F9671FA15573@oracle.com>
 <625948c7-8d44-f958-59fa-52ddde06fba1@oracle.com>
Message-ID: <1bab40c0-94da-1e49-f44a-0a6980421e80@oracle.com>

+1

On 11/18/19 12:50 PM, Ioi Lam wrote:
> Looks good to me. Thanks for fixing this.
>
> - Ioi
>
> On 11/18/19 11:58 AM, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev//8234304/webrev.00/index.html
>>> 30 lines changed: 0 ins; 5 del; 25 mod;
>> Hi all,
>>
>> could you please review this small patch which replaces 
>> 'main/othervm' w/ 'driver' in runtime/cds/appcds/javaldr tests?
>> from JBS:
>>> runtime/cds/appcds/javaldr tests do actual testing in child 
>>> processes, the main test class makes preparations, starts JDK under 
>>> test, and verifies its results. so there is no need for the main 
>>> test class to be run in JDK under test w/ all external flags.
>> the patch also replaces "X" w/ X.class.getName() in all the places 
>> these tests refer to a class, which made explicit @build action 
>> unnecessary.
>>
>> webrev: 
>> http://cr.openjdk.java.net/~iignatyev//8234304/webrev.00/index.html
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234304
>> testing:
>>   - runtime/cds/appcds/javaldr on macosx-x64 w/ all flags used in
>>     Oracle CI
>>   - runtime/cds/appcds/javaldr on
>>     linux-x64,windows-x64,macosx-x64,solaris-sparcv9 in OOB
>>   - runtime/cds/appcds/javaldr one by one on macosx-x64
>>
>> Thanks,
>> -- Igor
>

From igor.ignatyev at oracle.com  Mon Nov 18 22:14:13 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Mon, 18 Nov 2019 14:14:13 -0800
Subject: RFR(S) : 8234304 : runtime/cds/appcds/javaldr tests should use
 driver mode
In-Reply-To: <1bab40c0-94da-1e49-f44a-0a6980421e80@oracle.com>
References: <A304F4E9-F73E-4DF1-979A-F9671FA15573@oracle.com>
 <625948c7-8d44-f958-59fa-52ddde06fba1@oracle.com>
 <1bab40c0-94da-1e49-f44a-0a6980421e80@oracle.com>
Message-ID: <64E60E47-8181-4511-9D1D-4152C9F00BD1@oracle.com>

Ioi, Misha,

thanks for your review, pushed.

-- Igor

> On Nov 18, 2019, at 2:11 PM, mikhailo.seledtsov at oracle.com wrote:
> 
> +1
> 
> On 11/18/19 12:50 PM, Ioi Lam wrote:
>> Looks good to me. Thanks for fixing this.
>> 
>> - Ioi
>> 
>> On 11/18/19 11:58 AM, Igor Ignatyev wrote:
>>> http://cr.openjdk.java.net/~iignatyev//8234304/webrev.00/index.html
>>>> 30 lines changed: 0 ins; 5 del; 25 mod;
>>> Hi all,
>>> 
>>> could you please review this small patch which replaces 'main/othervm' w/ 'driver' in runtime/cds/appcds/javaldr tests?
>>> from JBS:
>>>> runtime/cds/appcds/javaldr tests do actual testing in child processes, the main test class makes preparations, starts JDK under test, and verifies its results. so there is no need for the main test class to be run in JDK under test w/ all external flags.
>>> the patch also replaces "X" w/ X.class.getName() in all the places these tests refer to a class, which made explicit @build action unnecessary.
>>> 
>>> webrev: http://cr.openjdk.java.net/~iignatyev//8234304/webrev.00/index.html
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8234304
>>> testing:
>>>   - runtime/cds/appcds/javaldr on macosx-x64 w/ all flags used in Oracle CI
>>>   - runtime/cds/appcds/javaldr on linux-x64,windows-x64,macosx-x64,solaris-sparcv9 in OOB
>>>   - runtime/cds/appcds/javaldr one by one on macosx-x64
>>> 
>>> Thanks,
>>> -- Igor
>> 


From daniel.daugherty at oracle.com  Mon Nov 18 22:23:59 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Mon, 18 Nov 2019 17:23:59 -0500
Subject: RFR(S/T): 8230876: baseline cleanups from Async Monitor Deflation
 v2.0[789]
Message-ID: <420b2d84-82a2-f782-7dcc-175472bf6a5a@oracle.com>

Greetings,

I have another round of baseline cleanup changes from the Async Monitor
Deflation project (8153224). Like previous sub-tasks of 8153224, these
changes are small and/or trivial. These changes have previously been
reviewed as a (very) small part of 8153224 (CR8/v2.08/11-for-jdk14).

Vladimir K., if you could sanity check the cleanups in
macroAssembler_x86.cpp that would be appreciated (only comments were
changed). I recommend the Udiff view...

Please see the bug for details about the changes in this webrev:

    JDK-8230876 baseline cleanups from Async Monitor Deflation v2.0[789]
    https://bugs.openjdk.java.net/browse/JDK-8230876

Here's the webrev URL:

    http://cr.openjdk.java.net/~dcubed/8230876-webrev/0-for-jdk14/

These changes have been included in my recent rounds of Mach5 Tier[1-8]
and other associated stress and/or performance testing. I have also done
a Mach5 Tier[1-3] run with just this patch to make sure that I got all
the pieces that are needed.

Thanks, in advance, for any comments, questions or suggestions.

Dan

From daniel.daugherty at oracle.com  Mon Nov 18 23:06:17 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Mon, 18 Nov 2019 18:06:17 -0500
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
Message-ID: <fe549cc9-fba7-9a15-eed6-832717acdee0@oracle.com>

Hi Patricio,

On 11/15/19 9:15 PM, Patricio Chilano wrote:
> Hi all,
>
> Could you review the following patch?
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8231264
> Webrev: http://cr.openjdk.java.net/~pchilanomate/8231264/v01/webrev

src/hotspot/share/runtime/arguments.cpp
    Is it too early to specify the obsolete_in and expired_in values?
    They could be JDK_Version::undefined() so that all you are doing
    is deprecation in this changeset.

src/hotspot/share/runtime/globals.hpp
    No comments.

test/hotspot/gtest/oops/test_markWord.cpp
    L96:     // Can't test this with biased locking disabled.
        Perhaps (since the comment is inside the if-statement):
             // This sub-test requires biased locking to be enabled.

    L11[135] - Why indent the pre-processor controls? Left most
        column is generally the style used.

    L115:   // Same thread tries to lock it again.
        This comment needs a rewrite. Perhaps:
            // Lock the object using an ObjectLocker helper which
            // will revoke the bias if we happened to use that
            // mechanism above.

    L121:   // This is no longer biased, because ObjectLocker revokes the bias.
        This comment needs a rewrite. Perhaps:
            // The object should be unlocked with no hashCode at
            // this point (ObjectLocker dtr has run).

test/jdk/jdk/jfr/event/runtime/TestBiasedLockRevocationEvents.java
    No comments.

Thumbs up! My comments are mostly nits so I don't need to see a new
webrev if you decide to make changes based on my suggestions.

As for the whole "too soon to deprecate" discussion: Deprecation is not
making the code obsolete so this changeset is not taking anything away
other than changing the default of UseBiasedLocking from true to false.
There are things that have been deprecated since JDK8 and they still
have not yet been made obsolete.

Deprecating biased locking is the proper way of saying that we (Oracle)
and/or others think that biased locking should/will go away in a future
release. Yes, there are locking experts outside of Oracle that have said
that biased locking should go away, but I haven't gotten permission to
quote the folks (yet)...

Deprecation is not final. Features can be un-deprecated if some
relevant facts and/or info changes the previous conclusion.

Dan


>
> Biased locking will be disabled by default and all related flags will 
> be deprecated. Performance gains seen when the feature was introduced 
> in the VM are less clear today with modern Java code/processors. 
> Detailed rationale behind the change is included on the description of 
> the bug.
>
> I modified test gtest/oops/test_markWord.cpp so that it still 
> exercises other cases of markword printing.
>
> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows 
> and Solaris).
>
> Thanks,
> Patricio
>
>


From david.holmes at oracle.com  Tue Nov 19 01:59:38 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 19 Nov 2019 11:59:38 +1000
Subject: RFR: 8215355: Object monitor deadlock with no threads holding the
 monitor (using jemalloc 5.1)
In-Reply-To: <cd392333-5bff-fe94-fb25-03af1e88dc5c@oracle.com>
References: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
 <cd392333-5bff-fe94-fb25-03af1e88dc5c@oracle.com>
Message-ID: <9188fde8-734c-965a-f392-fad3bf04204c@oracle.com>

Hi Dan,

Thanks for taking a look at this.

On 19/11/2019 3:36 am, Daniel D. Daugherty wrote:
> Hi David,
> 
> 
> On 11/17/19 9:30 PM, David Holmes wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
>> webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
> 
> src/hotspot/share/runtime/thread.hpp
>     Nice catch!
>
> src/hotspot/share/runtime/thread.cpp
>     Nice catch!
>
>     Not your issue, but these two lines feel strange/wrong:
>
>         L1008:   // Allow non Java threads to call this without stack_base
>         L1009:   if (_stack_base == NULL) return true;
>
>     When _stack_base is NULL, any 'adr' is in the caller's stack? The
>     comment is not helping understand why this is so...
>
> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/JavaThread.java
>     Nice catch!
>
>     Again, not your issue, but these four lines are questionable:
>
>         L383     Address sp        = lastSPDbg();
>         L384     Address stackBase = getStackBase();
>         L385     // Be robust
>         L386     if (sp == null) return false;
>
>     I can see why a NULL sp would cause a "false" return since obviously
>     something is amiss in the frame. However, the C++ code doesn't make
>     this check so why does the SA code?
>
>     And this code doesn't check stackBase == NULL so it's not matching
>     the C++ code either.
> 
> 
> Thumbs up on the change itself. My queries above and below might warrant
> new bugs or RFEs to be filed.

I have filed a bug to examine this and the issue Thomas flagged:

https://bugs.openjdk.java.net/browse/JDK-8234372

"Investigate use of Thread::stack_base() and queries for "in stack""

>>
>> This was a very difficult bug to track down and I want to publicly 
>> acknowledge and thank the jemalloc folk (users and developers) for 
>> continuing to investigate this issue from their side. Without their 
>> persistence this issue would have languished.
> 
> You also deserve thanks for sticking with this bug: Thanks David!!

Thanks, but I had written this off as a jemalloc issue until they 
provided the additional data.

>> The thread stack_base() is the first address above the thread's stack. 
>> However, the "in stack" checks performed by Thread::on_local_stack and 
>> Thread::is_in_stack allowed the checked address to be equal to the 
>> stack_base() - which is not correct. Here's how this manifests as the 
>> bug:
>>
>> - Let a JavaThread instance, T2, be allocated at the end of thread
>>   T1's stack i.e. at T1->stack_base()
>>   [This seems to be why this only reproduced with jemalloc.]
>> - Let T2 lock an inflated monitor
>> - Let T1 try to lock the same monitor
>>   - T1 would consider the _owner field value (T2) as being in its
>>     stack and so consider the monitor stack-locked by T1
>>   - And so both T1 and T2 would have ownership of the monitor allowing
>>     the monitor state (and application state) to be corrupted. This
>>     results in a range of hangs and crashes depending on the exact
>>     interleaving.
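
The off-by-one described above can be illustrated with a minimal standalone sketch. This is not the HotSpot code: the StackRange type and both member function names are hypothetical, and the sketch assumes a downward-growing stack whose valid addresses form the half-open range [base - size, base).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a thread's stack bounds; not a HotSpot type.
struct StackRange {
  uintptr_t base;   // first address above the stack (like stack_base())
  size_t    size;   // stack size in bytes

  // Lowest address that belongs to the stack.
  uintptr_t end() const { return base - size; }

  // Buggy variant: accepts adr == base, which lies just outside the stack.
  bool contains_inclusive(uintptr_t adr) const {
    return adr <= base && adr >= end();
  }

  // Corrected variant: the stack is the half-open range [end, base).
  bool contains(uintptr_t adr) const {
    return adr < base && adr >= end();
  }
};
```

With the inclusive check, an object allocated exactly at T1's base address (the T2 case above) is reported as being on T1's stack; the half-open check rejects it.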
> 
> Ouch!
> 
> So I was wondering how this bug could happen with the thread alignment
> logic that we have in place... search for the _real_malloc_address stuff...
> 
> And then I noticed that the logic only kicks in when
> UseBiasedLocking == true (and this bug says it doesn't happen with
> -XX:-UseBiasedLocking):

Actually that is a false claim. As per my comment on "2019-10-09 14:09" 
it does reproduce with biased-locking disabled but much more rarely.

> src/hotspot/share/runtime/thread.cpp:
> 
> // ======= Thread ========
> // Support for forcing alignment of thread objects for biased locking
> void* Thread::allocate(size_t size, bool throw_excpt, MEMFLAGS flags) {
>   if (UseBiasedLocking) {
>     const size_t alignment = markWord::biased_lock_alignment;
>     size_t aligned_size = size + (alignment - sizeof(intptr_t));
>     void* real_malloc_addr = throw_excpt? AllocateHeap(aligned_size, flags, CURRENT_PC)
>                                         : AllocateHeap(aligned_size, flags, CURRENT_PC,
>                                                        AllocFailStrategy::RETURN_NULL);
>     void* aligned_addr     = align_up(real_malloc_addr, alignment);
>     assert(((uintptr_t) aligned_addr + (uintptr_t) size) <=
>            ((uintptr_t) real_malloc_addr + (uintptr_t) aligned_size),
>            "JavaThread alignment code overflowed allocated storage");
>     if (aligned_addr != real_malloc_addr) {
>       log_info(biasedlocking)("Aligned thread " INTPTR_FORMAT " to " INTPTR_FORMAT,
>                               p2i(real_malloc_addr),
>                               p2i(aligned_addr));
>     }
>     ((Thread*) aligned_addr)->_real_malloc_address = real_malloc_addr;
>     return aligned_addr;
>   } else {
>     return throw_excpt? AllocateHeap(size, flags, CURRENT_PC)
>                       : AllocateHeap(size, flags, CURRENT_PC,
>                                      AllocFailStrategy::RETURN_NULL);
>   }
> }
> 
> 
> The logging logic above:
> 
>     if (aligned_addr != real_malloc_addr) {
>       log_info(biasedlocking)("Aligned thread " INTPTR_FORMAT " to " INTPTR_FORMAT,
>                               p2i(real_malloc_addr),
>                               p2i(aligned_addr));
>     }
> 
> allows for real_malloc_addr to be the same as aligned_addr sometimes
> (and no log message is issued), but I'm not sure from spelunking in
> code whether it's really possible for:
> 
>     void* aligned_addr     = align_up(real_malloc_addr, alignment);
> 
> to return aligned_addr == real_malloc_addr. In other words, if
> real_malloc_addr is already aligned perfectly, does align_up() still
> change that value?
> 
> If it is possible for (aligned_addr == real_malloc_addr), then it is
> possible for this bug to happen without jemalloc.
> 
> I've convinced myself that this is possible because of this line:
> 
>     size_t aligned_size = size + (alignment - sizeof(intptr_t));
> 
> If real_malloc_addr is already aligned perfectly and align_up()
> always changed the input address, then the aligned_size would be
> too small by sizeof(intptr_t) and we would have seen a buffer
> overwrite like that over the many, many years.
> 
> So my conclusion is that it should be possible for this bug to
> happen without jemalloc, but it would have to be rare.

I'm a little surprised that we specialize this way as I thought the 
128/256 byte alignment was necessary regardless of biased-locking. 
Further even if running without biased-locking we still have alignment 
requirements for the lock-bits, age-bits etc, that do not seem to be 
captured by the above code unless AllocateHeap somehow already provides 
such alignment by default. (I'm also unclear why this doesn't fail in 
debug builds but just assume the allocation patterns are different.)

Anyway, if the allocator already returns a suitably aligned block of 
memory then I am assuming the above code doesn't actually need to do 
anything.

So theoretically, without having advance knowledge of the details of the 
allocator, yes this bug could happen for any allocator.
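
A minimal sketch of that assumption, using a hypothetical align_up_sketch helper for a power-of-two alignment (it is not the HotSpot align_up itself, and 256 and the 8-byte allocation granularity are example values only): an address that is already aligned comes back unchanged, and the largest adjustment for an 8-byte-aligned address is alignment - 8, which matches the padding added to aligned_size.

```cpp
#include <cstdint>

// Hypothetical helper: round addr up to the next multiple of alignment.
// Assumes alignment is a power of two.
constexpr uintptr_t align_up_sketch(uintptr_t addr, uintptr_t alignment) {
  return (addr + alignment - 1) & ~(alignment - 1);
}

// An already aligned address is returned unchanged, so
// aligned_addr == real_malloc_addr is possible.
static_assert(align_up_sketch(0x1000, 256) == 0x1000, "aligned input is unchanged");

// An unaligned address is rounded up to the next boundary.
static_assert(align_up_sketch(0x1001, 256) == 0x1100, "unaligned input is rounded up");

// Worst case for an 8-byte-aligned input: the address moves by 256 - 8 bytes.
static_assert(align_up_sketch(0x1008, 256) - 0x1008 == 256 - 8, "maximum adjustment");
```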

Thanks,
David
-----

> 
>> Interestingly Thread::is_in_usable_stack does not have this bug.
> 
> So we have Thread::is_in_usable_stack(), Thread::on_local_stack() and
> Thread::is_in_stack()? I haven't compared all three side by side, but
> there might be some cleanup work that can be done here (in a different
> bug).
> 
> 
>>
>> The bug can be tracked way back to JDK-6699669 as explained in the bug 
>> report. That issue also showed that the same bug existed in the SA 
>> implementations of these "on stack" checks.
> 
> Ouch! JDK-6699669 was fixed in jdk7-B56 and looks like it was pushed
> to the jdk6u train... so this bug goes back quite a ways...
> 
> Outstanding hunt David!
> 
> Dan
> 
> 
>>
>> Testing:
>>   - The reproducer from the bug report, using jemalloc, ran over 5000
>>     times without failing in any way.
>>   - tiers 1-3 on all Oracle platforms
>>   - serviceability/sa tests
>>
>> Thanks,
>> David
>> -----
> 

From david.holmes at oracle.com  Tue Nov 19 02:23:59 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 19 Nov 2019 12:23:59 +1000
Subject: RFR(S/T): 8230876: baseline cleanups from Async Monitor Deflation
 v2.0[789]
In-Reply-To: <420b2d84-82a2-f782-7dcc-175472bf6a5a@oracle.com>
References: <420b2d84-82a2-f782-7dcc-175472bf6a5a@oracle.com>
Message-ID: <44efa18f-5524-8465-fb0c-cab41a1569af@oracle.com>

Hi Dan,

Given:

  volatile intptr_t _recursions;

The change to the print statements to use INTX_FORMAT instead of the 
existing INTPTR_FORMAT seems a little odd - but obviously you don't want 
it printed in hex. That seems fine, but can we then make the simple 
change to redefine _recursions as intx as well - which is a semantic 
no-op given:

typedef intptr_t  intx;
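
For illustration only, a small standalone sketch of the decimal versus hex difference. It uses the standard <cinttypes> macros rather than HotSpot's INTX_FORMAT and INTPTR_FORMAT (whose definitions are not shown in this thread), and the two function names are hypothetical.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format an intptr_t as a signed decimal number,
// which is the natural form for a recursion count.
std::string format_decimal(intptr_t v) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%" PRIdPTR, v);
  return buf;
}

// Hypothetical helper: format the same value as hex, the natural form
// for something that is really a pointer.
std::string format_hex(intptr_t v) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "0x%" PRIxPTR, static_cast<uintptr_t>(v));
  return buf;
}
```

The underlying type is the same in both cases; only the conversion specifier differs.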

Otherwise all seems okay.

Thanks,
David

On 19/11/2019 8:23 am, Daniel D. Daugherty wrote:
> Greetings,
> 
> I have another round of baseline cleanup changes from the Async Monitor
> Deflation project (8153224). Like previous sub-tasks of 8153224, these
> changes are small and/or trivial. These changes have previously been
> reviewed as a (very) small part of 8153224 (CR8/v2.08/11-for-jdk14).
> 
> Vladimir K., if you could sanity check the cleanups in 
> macroAssembler_x86.cpp
> that would be appreciated (only comments were changed). I recommend the
> Udiff view...
> 
> Please see the bug for details about the changes in this webrev:
> 
>     JDK-8230876 baseline cleanups from Async Monitor Deflation v2.0[789]
>     https://bugs.openjdk.java.net/browse/JDK-8230876
> 
> Here's the webrev URL:
> 
>     http://cr.openjdk.java.net/~dcubed/8230876-webrev/0-for-jdk14/
> 
> These changes have been included in my recent rounds of Mach5 Tier[1-8]
> and other associated stress and/or performance testing. I have also done
> a Mach5 Tier[1-3] run with just this patch to make sure that I got all
> the pieces that are needed.
> 
> Thanks, in advance, for any comments, questions or suggestions.
> 
> Dan

From v.plizga at cft.ru  Tue Nov 19 03:19:24 2019
From: v.plizga at cft.ru (Plizga Vladimir)
Date: Tue, 19 Nov 2019 03:19:24 +0000
Subject: [***DKIM violation***]Re: RFR: JDK-8230413: Support Pre JDK 6
 class with CDS
In-Reply-To: <CALrW1jx08tzZ6PBv_zsqLTcO8JSh0Piu1Zsz36jiACF6QwFmaw@mail.gmail.com>
References: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>
 <c69bf94f-e52d-f7cd-e9f7-252a71204f7b@oracle.com>
 <CALrW1jyog+Q5H-BXQxQEkPdtLhr_aB7ekOVzuq9Qi4qLQ-jqOg@mail.gmail.com>
 <f2b33c85-e811-48f5-ed82-6ee398ea1935@oracle.com>
 <c3a4ff42-e3b5-98a9-138c-358d14ac1ada@oracle.com>
 <c322d943-8c6a-5c6e-345f-af4c902d4a57@oracle.com>
 <CALrW1jyqUuZNjmS0UZyiR-ZQfZVVy38opvSCaMotwTzY0i+S1A@mail.gmail.com>
 <fd3d57f2-6f1b-7b3d-0601-6f2c36182559@oracle.com>
 <CALrW1jx1uv=PU3Y2bfhOLM3ibCcQQ4pt1z=TiC45S2Mprcr0gg@mail.gmail.com>
 <CALrW1jx08tzZ6PBv_zsqLTcO8JSh0Piu1Zsz36jiACF6QwFmaw@mail.gmail.com>
Message-ID: <cf4817495ceb4e5eb1c425ce2cb21678@nut-mbx-2.win.ftc.ru>

Hi Jiangli,

Thank you for proposing this feature!

I've tried to use AppCDS for a web application (microservice) based on the Spring Boot framework. The "-Xverify:none" option is used neither at dump time nor at run time.
There are ~10 000 classes in the application at runtime (simply counted with 'grep' over the JVM class loading log). During archive creation ~750 classes are skipped with warnings like this:
> Pre JDK 6 class not supported by CDS: 49.0 org/jrobin/core/RrdUpdater

All the skipped classes are from 3rd party libraries. These libraries are usually pulled in as transitive dependencies of other libraries and frameworks. AFAIU, there are 2 reasons why the libraries have byte code of such an old version:
1. The libraries that use them didn't upgrade their dependencies for some reason.
2. They keep their byte code old intentionally to stay compatible with most runtime environments (for example the slf4j logging façade).
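
For anyone who wants to count such classes without running a CDS dump, here is a minimal sketch of reading the version from a class file header. It relies only on the documented class file layout (u4 magic 0xCAFEBABE, then u2 minor_version and u2 major_version, all big-endian); the ClassVersion type and both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch.
struct ClassVersion {
  bool     valid;          // true if the header starts with 0xCAFEBABE
  uint16_t major_version;
  uint16_t minor_version;
};

// Read the version from the first 8 bytes of a class file.
ClassVersion read_class_version(const unsigned char* data, size_t len) {
  ClassVersion v = {false, 0, 0};
  if (len < 8) return v;
  if (data[0] != 0xCA || data[1] != 0xFE ||
      data[2] != 0xBA || data[3] != 0xBE) {
    return v;  // not a class file
  }
  v.minor_version = static_cast<uint16_t>((data[4] << 8) | data[5]);
  v.major_version = static_cast<uint16_t>((data[6] << 8) | data[7]);
  v.valid = true;
  return v;
}

// Major version 50 corresponds to JDK 6, so anything below it is "pre JDK 6",
// like the 49.0 class in the warning above.
bool is_pre_jdk6(const ClassVersion& v) {
  return v.valid && v.major_version < 50;
}
```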

Of course, the skipped classes are not a very significant (and not the largest) part of all the classes in this case. But we have other similar applications with a larger share of old transitive dependencies (about 10%).
Hope this helps.

Cheers,
Vladimir Plizga


-----Original Message-----
From: hotspot-runtime-dev <hotspot-runtime-dev-bounces at openjdk.java.net> On Behalf Of Jiangli Zhou
Sent: Tuesday, November 19, 2019 2:01 AM
To: hotspot-runtime-dev <hotspot-runtime-dev at openjdk.java.net>
Subject: [***DKIM violation***]Re: RFR: JDK-8230413: Support Pre JDK 6 class with CDS

After off-mailing list discussions regarding the support for archiving
pre-JDK-6 classes, we would like to seek additional input and feedback on this topic. Please reply to this email thread if the general support for archiving pre-JDK-6 class is a requirement in your use case, both with verification enabled and disabled. Thanks!

Best,

Jiangli

On Wed, Nov 13, 2019 at 9:23 AM Jiangli Zhou <jianglizhou at google.com> wrote:
>
> Hi David,
>
> On Tue, Nov 12, 2019 at 10:00 PM David Holmes <david.holmes at oracle.com> wrote:
> >
> > Hi Jiangli,
> >
> > On 13/11/2019 12:20 pm, Jiangli Zhou wrote:
> > > Hi Harold and Ioi,
> > >
> > > Thanks a lot for the additional feedback.
> > >
> > > I did some quick research today about -Xverify:none usages. My 
> > > finding showed that the use of -Xverify:none is not very uncommon 
> > > in some cases. Here are some of the usages:
> > >
> > > - trusted tools
> >
> > But what is the context? Is it:
> >
> > "I trust this tool, and all other classes, so I'll optimize by 
> > disabling verification,"; or
>
> This is the case. For a tool that's developed by a user and properly 
> compiled by javac, user may want to disable class verification when 
> running the tool.
>
> >
> > "This tool produces non-verifiable classfiles, but I trust the tool 
> > and so will disable verification" (which implicitly means all 
> > classes/libraries have to be fully trusted)
> >
> > ?
> >
> > I'm not sure you can use any existing uses of -Xverify:none to infer 
> > the applicability or not to what is being proposed here for CDS.
>
> In above example, CDS dump time forces verification for the tool's 
> classes as long as they are placed in -cp path. Without CDS involved, 
> users choice is honored. I feel this usage may be a lurking issue when 
> more users start to use CDS/AppCDS.
>
> Harold, Ioi and I have a discussion for pre-jdk-6 verification off the 
> mailing list, since verification is security related and may be 
> sensitive. I'll loop you in. It's possible we may be able to separate 
> the pre-jdk-6 class problem from the general CDS -Xverify:none topics.
>
> >
> > > - some limited testing environment
> > >
> > > CDS (particularly with dynamic archiving capability) may help 
> > > avoid runtime verification overhead by verifying classes at dump 
> > > time and reduce the needs for -Xverify:none. It would be good to 
> > > have strategies for the following scenarios as well when removing
> > > -Xverify:none:
> > >
> > > 1) In cases when shared archive is disabled at runtime (I hope 
> > > it's not common cases)
> >
> > I'm not quite sure what you are saying here. If a pre-verified 
> > archive can't be used at runtime then normal verification should 
> > occur as classes are not being loaded from a known pre-verified location.
>
> CDS/AppCDS are still not widely adopted yet. When users start to learn 
> more about CDS/AppCDS capability, they may still choose to not use the 
> feature based on their specific requirements. For example, a user may 
> choose to not use AppCDS and also turn off the default CDS.
>
> >
> > > 2) When users want to reduce the overhead caused by verification 
> > > during archiving dump time
> >
> > I would not expect dumping to be such a time critical activity that 
> > users would care about the "overhead" of verification.
>
> With dynamic archiving, dump time performance can be more important to users.
>
> Best,
>
> Jiangli
>
> >
> > Cheers,
> > David
> >
> > > Thoughts?
> > >
> > > Best,
> > > Jiangli
> > >
> > > On Tue, Nov 12, 2019 at 4:16 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> > >>
> > >> I am also a little worried that this might send the wrong message 
> > >> -- "if you want to archive pre-JDK6 classes, you need to disable 
> > >> verification altogether for all classes in your entire app".
> > >>
> > >> Thanks
> > >> - Ioi
> > >>
> > >> On 11/12/19 12:40 PM, Harold Seigel wrote:
> > >>> Hi Jiangli,
> > >>>
> > >>> I think this change is going in the wrong direction.  We are 
> > >>> trying to discourage disabling verification, not encourage it.  
> > >>> We also do not want to create more use-cases for preserving -Xverify:none.
> > >>>
> > >>> It looks like your change would allow archiving of unverified 
> > >>> pre-JDK6 classes, but not allow archiving of verified pre-JDK6 
> > >>> classes.  If so, that seems backward.
> > >>>
> > >>> Thanks, Harold
> > >>>
> > >>> On 11/11/2019 11:53 PM, Ioi Lam wrote:
> > >>>> I wonder if there's a safer alternative. Are there tools that 
> > >>>> can add stackmaps to pre-JDK6 classes? That way they can be 
> > >>>> verified with the split verifier during CDS dump time.
> > >>>>
> > >>>> Thanks
> > >>>> - Ioi
> > >>>>
> > >>>> On 11/11/19 4:25 PM, Jiangli Zhou wrote:
> > >>>>> Hi David,
> > >>>>>
> > >>>>> Thanks for quick response!
> > >>>>>
> > >>>>> On Mon, Nov 11, 2019 at 3:12 PM David Holmes 
> > >>>>> <david.holmes at oracle.com> wrote:
> > >>>>>> Hi Jiangli,
> > >>>>>>
> > >>>>>> On 12/11/2019 8:12 am, Jiangli Zhou wrote:
> > >>>>>>> Please review the following change that allows archiving 
> > >>>>>>> pre-JAVA_6_VERSION classes with -Xverify:none.
> > >>>>>>>
> > >>>>>>> webrev: 
> > >>>>>>> http://cr.openjdk.java.net/~jiangli/8230413/webrev.00/
> > >>>>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8230413
> > >>>>>>>
> > >>>>>>> Currently there are still large number of existing classes
> > >>>>>>> (pre-built)
> > >>>>>>> with older class versions (< 50) in real world applications. 
> > >>>>>>> Those classes are missing the benefit of archiving. 
> > >>>>>>> Particularly, in some use cases, class verification can be 
> > >>>>>>> safely disabled. For those use cases, supporting archiving 
> > >>>>>>> pre JDK 6 classes shows good performance benefit. We can 
> > >>>>>>> re-evaluate this support when -Xverify:none is removed in 
> > >>>>>>> the future, hopefully the needs for supporting class version 
> > >>>>>>> < 50 is no longer significant at that time.
> > >>>>>>>
> > >>>>>>> This change brings back the pre-JDK-8198849 behavior. 
> > >>>>>>> Runtime makes sure the dump-time verification mode must be 
> > >>>>>>> the same or stronger than the current mode.
> > >>>>>>>
> > >>>>>>> A CSR may be needed for the change. Any thoughts on that?
> > >>>>>> A CSR request is definitely required given that you are 
> > >>>>>> proposing to undo a change that was itself put in place via a 
> > >>>>>> CSR request! And given this is relaxing a "defense-in-depth" 
> > >>>>>> check which will result in increasing exploitability, I think 
> > >>>>>> you will need a very strong argument to justify this.
> > >>>>> Thanks for confirming this! Will do.
> > >>>>>
> > >>>>>> Further this not only undoes JDK-8197972 but it also 
> > >>>>>> invalidates
> > >>>>>> JDK-8155671 being closed as a duplicate of JDK-8197972. 
> > >>>>>> JDK-8155671 requested a way to know if verification had been 
> > >>>>>> disabled, to help with analyzing crash reports, but instead 
> > >>>>>> we decided to not allow verification to be disabled.
> > >>>>> I had some concerns about JDK-8155671 initially before making 
> > >>>>> the change, as it's a closed bug and my memory about the 
> > >>>>> specific issue was flushed out. I brought up the question in 
> > >>>>> the bug. My take on Ioi's response to my query about 
> > >>>>> JDK-8155671 was that the
> > >>>>> pre-JDK-8197972 behavior would not cause any security hole.
> > >>>>>
> > >>>>> Re-evaluating this particular behavior, I think the 
> > >>>>> pre-JDK-8155671 would actually matches user intention better. 
> > >>>>> If user decides to turn off verification in safe use cases, it 
> > >>>>> seems to be a good idea to honor that. With the new dynamic 
> > >>>>> archiving capability, archive could be created at the first time when running a particular application.
> > >>>>> Not forcing verification when user decides to can avoid 
> > >>>>> unnecessary/unwanted overhead.
> > >>>>>
> > >>>>> If verification is turned off at dump time for application 
> > >>>>> classes, runtime does not allow execution without also turning 
> > >>>>> off verification. We can determine a crash is not caused by 
> > >>>>> relaxed dump time verification.
> > >>>>>
> > >>>>> Regards,
> > >>>>> Jiangli
> > >>>>>
> > >>>>>> David
> > >>>>>> -----
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>> Tested with jtreg appcds tests.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Jiangli
> > >>>>>>>
> > >>>>
> > >>

From jianglizhou at google.com  Tue Nov 19 03:40:00 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Mon, 18 Nov 2019 19:40:00 -0800
Subject: [***DKIM violation***]Re: RFR: JDK-8230413: Support Pre JDK 6
 class with CDS
In-Reply-To: <cf4817495ceb4e5eb1c425ce2cb21678@nut-mbx-2.win.ftc.ru>
References: <CALrW1jzdNjQbZG=+2qdNfV2jXuAi88ybh=KYs-mwLX+tn750BA@mail.gmail.com>
 <c69bf94f-e52d-f7cd-e9f7-252a71204f7b@oracle.com>
 <CALrW1jyog+Q5H-BXQxQEkPdtLhr_aB7ekOVzuq9Qi4qLQ-jqOg@mail.gmail.com>
 <f2b33c85-e811-48f5-ed82-6ee398ea1935@oracle.com>
 <c3a4ff42-e3b5-98a9-138c-358d14ac1ada@oracle.com>
 <c322d943-8c6a-5c6e-345f-af4c902d4a57@oracle.com>
 <CALrW1jyqUuZNjmS0UZyiR-ZQfZVVy38opvSCaMotwTzY0i+S1A@mail.gmail.com>
 <fd3d57f2-6f1b-7b3d-0601-6f2c36182559@oracle.com>
 <CALrW1jx1uv=PU3Y2bfhOLM3ibCcQQ4pt1z=TiC45S2Mprcr0gg@mail.gmail.com>
 <CALrW1jx08tzZ6PBv_zsqLTcO8JSh0Piu1Zsz36jiACF6QwFmaw@mail.gmail.com>
 <cf4817495ceb4e5eb1c425ce2cb21678@nut-mbx-2.win.ftc.ru>
Message-ID: <CALrW1jxVHyXCL-Aj+zAV2Uyx0UX8VU8Y=UYUZ7qXMM0o6p=f9g@mail.gmail.com>

Hi Plizga,

Thanks for the feedback! Inputs like yours are most helpful for deciding
what is the right approach. Really appreciate it!

Best,
Jiangli

On Mon, Nov 18, 2019, 7:21 PM Plizga Vladimir <v.plizga at cft.ru> wrote:

> Hi Jiangli,
>
> Thank you for proposal of this feature!
>
> I've tried to use AppCDS for a web application (microservice) based on
> Spring Boot framework. No ''-Xverify:none" option is used neither at dump
> time nor at run time.
> There are ~10 000 classes in application at runtime (simply counted with
> 'grep' of JVM class loading log). During archive creation ~750 classes are
> skipped with warnings like this:
> > Pre JDK 6 class not supported by CDS: 49.0 org/jrobin/core/RrdUpdater
>
> All the skipped classes are from 3rd party libraries. These libraries are
> usually involved as transitive dependencies of other libraries and
> frameworks. AFAIU, there are 2 reasons why the libraries have their byte
> code of so old version:
> 1. The libraries that use them didn't upgrade their dependencies for some
> reasons.
> 2. They keep their byte code old intentionally to stay compatible with
> most runtime environments (for example slf4j logging façade).
>
> Of course this is not very significant (and for not the largest) part of
> all the skipped classes in this case. But we have other similar
> applications with a bigger number of old transitive dependencies (about
> 10%).
> Hope this would help.
>
> Cheers,
> Vladimir Plizga
>
>
> -----Original Message-----
> From: hotspot-runtime-dev <hotspot-runtime-dev-bounces at openjdk.java.net>
> On Behalf Of Jiangli Zhou
> Sent: Tuesday, November 19, 2019 2:01 AM
> To: hotspot-runtime-dev <hotspot-runtime-dev at openjdk.java.net>
> Subject: [***DKIM violation***]Re: RFR: JDK-8230413: Support Pre JDK 6
> class with CDS
>
> After off-mailing list discussions regarding the support for archiving
> pre-JDK-6 classes, we would like to seek additional input and feedback on
> this topic. Please reply to this email thread if the general support for
> archiving pre-JDK-6 class is a requirement in your use case, both with
> verification enabled and disabled. Thanks!
>
> Best,
>
> Jiangli
>
> On Wed, Nov 13, 2019 at 9:23 AM Jiangli Zhou <jianglizhou at google.com>
> wrote:
> >
> > Hi David,
> >
> > On Tue, Nov 12, 2019 at 10:00 PM David Holmes <david.holmes at oracle.com>
> wrote:
> > >
> > > Hi Jiangli,
> > >
> > > On 13/11/2019 12:20 pm, Jiangli Zhou wrote:
> > > > Hi Harold and Ioi,
> > > >
> > > > Thanks a lot for the additional feedback.
> > > >
> > > > I did some quick research today about -Xverify:none usages. My
> > > > finding showed that the use of -Xverify:none is not very uncommon
> > > > in some cases. Here are some of the usages:
> > > >
> > > > - trusted tools
> > >
> > > But what is the context? Is it:
> > >
> > > "I trust this tool, and all other classes, so I'll optimize by
> > > disabling verification,"; or
> >
> > This is the case. For a tool that's developed by a user and properly
> > compiled by javac, user may want to disable class verification when
> > running the tool.
> >
> > >
> > > "This tool produces non-verifiable classfiles, but I trust the tool
> > > and so will disable verification" (which implicitly means all
> > > classes/libraries have to be fully trusted)
> > >
> > > ?
> > >
> > > I'm not sure you can use any existing uses of -Xverify:none to infer
> > > the applicability or not to what is being proposed here for CDS.
> >
> > In above example, CDS dump time forces verification for the tool's
> > classes as long as they are placed in -cp path. Without CDS involved,
> > users choice is honored. I feel this usage may be a lurking issue when
> > more users start to use CDS/AppCDS.
> >
> > Harold, Ioi and I have a discussion for pre-jdk-6 verification off the
> > mailing list, since verification is security related and may be
> > sensitive. I'll loop you in. It's possible we may be able to separate
> > the pre-jdk-6 class problem from the general CDS -Xverify:none topics.
> >
> > >
> > > > - some limited testing environment
> > > >
> > > > CDS (particularly with dynamic archiving capability) may help
> > > > avoid runtime verification overhead by verifying classes at dump
> > > > time and reduce the needs for -Xverify:none. It would be good to
> > > > have strategies for the following scenarios as well when removing
> > > > -Xverify:none:
> > > >
> > > > 1) In cases when shared archive is disabled at runtime (I hope
> > > > it's not common cases)
> > >
> > > I'm not quite sure what you are saying here. If a pre-verified
> > > archive can't be used at runtime then normal verification should
> > > occur as classes are not being loaded from a known pre-verified
> location.
> >
> > CDS/AppCDS are still not widely adopted yet. When users start to learn
> > more about CDS/AppCDS capability, they may still choose to not use the
> > feature based on their specific requirements. For example, a user may
> > choose to not use AppCDS and also turn off the default CDS.
> >
> > >
> > > > 2) When users want to reduce the overhead caused by verification
> > > > during archiving dump time
> > >
> > > I would not expect dumping to be such a time critical activity that
> > > users would care about the "overhead" of verification.
> >
> > With dynamic archiving, dump time performance can be more important to
> users.
> >
> > Best,
> >
> > Jiangli
> >
> > >
> > > Cheers,
> > > David
> > >
> > > > Thoughts?
> > > >
> > > > Best,
> > > > Jiangli
> > > >
> > > > On Tue, Nov 12, 2019 at 4:16 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> > > >>
> > > >> I am also a little worried that this might send the wrong message
> > > >> -- "if you want to archive pre-JDK6 classes, you need to disable
> > > >> verification altogether for all classes in your entire app".
> > > >>
> > > >> Thanks
> > > >> - Ioi
> > > >>
> > > >> On 11/12/19 12:40 PM, Harold Seigel wrote:
> > > >>> Hi Jiangli,
> > > >>>
> > > >>> I think this change is going in the wrong direction.  We are
> > > >>> trying to discourage disabling verification, not encourage it.
> > > > >>> We also do not want to create more use-cases for preserving
> > > > >>> -Xverify:none.
> > > >>>
> > > >>> It looks like your change would allow archiving of unverified
> > > >>> pre-JDK6 classes, but not allow archiving of verified pre-JDK6
> > > >>> classes.  If so, that seems backward.
> > > >>>
> > > >>> Thanks, Harold
> > > >>>
> > > >>> On 11/11/2019 11:53 PM, Ioi Lam wrote:
> > > >>>> I wonder if there's a safer alternative. Are there tools that
> > > >>>> can add stackmaps to pre-JDK6 classes? That way they can be
> > > >>>> verified with the split verifier during CDS dump time.
> > > >>>>
> > > >>>> Thanks
> > > >>>> - Ioi
> > > >>>>
> > > >>>> On 11/11/19 4:25 PM, Jiangli Zhou wrote:
> > > >>>>> Hi David,
> > > >>>>>
> > > >>>>> Thanks for quick response!
> > > >>>>>
> > > >>>>> On Mon, Nov 11, 2019 at 3:12 PM David Holmes
> > > >>>>> <david.holmes at oracle.com> wrote:
> > > >>>>>> Hi Jiangli,
> > > >>>>>>
> > > >>>>>> On 12/11/2019 8:12 am, Jiangli Zhou wrote:
> > > >>>>>>> Please review the following change that allows archiving
> > > >>>>>>> pre-JAVA_6_VERSION classes with -Xverify:none.
> > > >>>>>>>
> > > >>>>>>> webrev:
> > > >>>>>>> http://cr.openjdk.java.net/~jiangli/8230413/webrev.00/
> > > >>>>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8230413
> > > >>>>>>>
> > > > >>>>>>> Currently there is still a large number of existing
> > > > >>>>>>> (pre-built) classes
> > > > >>>>>>> with older class versions (< 50) in real-world applications.
> > > > >>>>>>> Those classes are missing the benefit of archiving.
> > > > >>>>>>> Particularly, in some use cases, class verification can be
> > > > >>>>>>> safely disabled. For those use cases, supporting archiving
> > > > >>>>>>> of pre-JDK 6 classes shows a good performance benefit. We can
> > > > >>>>>>> re-evaluate this support when -Xverify:none is removed in
> > > > >>>>>>> the future; hopefully the need to support class versions
> > > > >>>>>>> < 50 will no longer be significant at that time.
> > > >>>>>>>
> > > >>>>>>> This change brings back the pre-JDK-8198849 behavior.
> > > > >>>>>>> The runtime makes sure the dump-time verification mode is
> > > > >>>>>>> the same as or stronger than the current mode.
> > > >>>>>>>
> > > >>>>>>> A CSR may be needed for the change. Any thoughts on that?
> > > >>>>>> A CSR request is definitely required given that you are
> > > >>>>>> proposing to undo a change that was itself put in place via a
> > > >>>>>> CSR request! And given this is relaxing a "defense-in-depth"
> > > >>>>>> check which will result in increasing exploitability, I think
> > > >>>>>> you will need a very strong argument to justify this.
> > > >>>>> Thanks for confirming this! Will do.
> > > >>>>>
> > > >>>>>> Further this not only undoes JDK-8197972 but it also
> > > >>>>>> invalidates
> > > >>>>>> JDK-8155671 being closed as a duplicate of JDK-8197972.
> > > >>>>>> JDK-8155671 requested a way to know if verification had been
> > > >>>>>> disabled, to help with analyzing crash reports, but instead
> > > >>>>>> we decided to not allow verification to be disabled.
> > > >>>>> I had some concerns about JDK-8155671 initially before making
> > > >>>>> the change, as it's a closed bug and my memory about the
> > > >>>>> specific issue was flushed out. I brought up the question in
> > > >>>>> the bug. My take on Ioi's response to my query about
> > > >>>>> JDK-8155671 was that the
> > > >>>>> pre-JDK-8197972 behavior would not cause any security hole.
> > > >>>>>
> > > > >>>>> Re-evaluating this particular behavior, I think the
> > > > >>>>> pre-JDK-8155671 behavior would actually match user intention
> > > > >>>>> better. If the user decides to turn off verification in safe
> > > > >>>>> use cases, it seems to be a good idea to honor that. With the
> > > > >>>>> new dynamic archiving capability, an archive could be created
> > > > >>>>> the first time a particular application is run.
> > > > >>>>> Not forcing verification when the user decides to turn it off
> > > > >>>>> can avoid unnecessary/unwanted overhead.
> > > >>>>>
> > > >>>>> If verification is turned off at dump time for application
> > > >>>>> classes, runtime does not allow execution without also turning
> > > >>>>> off verification. We can determine a crash is not caused by
> > > >>>>> relaxed dump time verification.
> > > >>>>>
> > > >>>>> Regards,
> > > >>>>> Jiangli
> > > >>>>>
> > > >>>>>> David
> > > >>>>>> -----
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> Tested with jtreg appcds tests.
> > > >>>>>>>
> > > >>>>>>> Best,
> > > >>>>>>> Jiangli
> > > >>>>>>>
> > > >>>>
> > > >>
>

From coleen.phillimore at oracle.com  Tue Nov 19 03:42:51 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Mon, 18 Nov 2019 22:42:51 -0500
Subject: RFR(S): 8233113: ARM32: assert on UnsafeJlong mutex rank check
In-Reply-To: <5d0e1ad7-0ffa-54a1-fbc7-23aecc367c0c@bell-sw.com>
References: <5124def3-3bf7-8425-557d-c6cba6192927@bell-sw.com>
 <df1adebe-31d5-8cc2-d8c7-a22e2397c7b9@oracle.com>
 <5d0e1ad7-0ffa-54a1-fbc7-23aecc367c0c@bell-sw.com>
Message-ID: <77cf0a9b-2fa2-db56-5db2-bc998373d052@oracle.com>


This looks good!  Thank you for fixing it.
Coleen

On 11/18/19 9:35 AM, Boris Ulasevich wrote:
> David, thank you!
>
> Dear all,
>
>   Can anybody else take a look at the review please?
>   Or should I consider the change trivial?
>
> thanks,
> Boris
>
> On 11.11.2019 13:56, David Holmes wrote:
>> Hi Boris,
>>
>> This seems fine to me.
>>
>> Thanks,
>> David
>>
>> On 8/11/2019 11:28 pm, Boris Ulasevich wrote:
>>> Hi,
>>>
>>> Recent JDK-8184732 change adds the assertion that fires on 
>>> UnsafeJlong mutex rank check, on platforms without 64 bit atomics 
>>> compare-and-exchange support. On preliminary review (thanks to 
>>> Coleen and David!) it is suggested to remove the assertion and 
>>> corresponding test codes.
>>>
>>> http://bugs.openjdk.java.net/browse/JDK-8233113
>>> http://cr.openjdk.java.net/~bulasevich/8233113/webrev.01
>>>
>>> Thanks,
>>> Boris


From serguei.spitsyn at oracle.com  Tue Nov 19 04:34:39 2019
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 18 Nov 2019 20:34:39 -0800
Subject: RFR: 8215355: Object monitor deadlock with no threads holding the
 monitor (using jemalloc 5.1)
In-Reply-To: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
References: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
Message-ID: <6fe10178-cb74-aa9f-7052-5c577e6e10bf@oracle.com>

Hi David,

The fix looks good.
This is in addition to the platform-dependent code that Thomas flagged.

There can be similar broken code on other platforms.
For instance, there is a suspicious spot in cpu/ppc/frame_ppc.cpp:

     // sender_fp must be within the stack and above (but not
     // equal) current frame's fp.
     if (sender_fp > thread->stack_base() || sender_fp <= fp) {
         return false;
     }

Thanks,
Serguei


On 11/17/19 18:30, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
> webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
>
> This was a very difficult bug to track down and I want to publicly 
> acknowledge and thank the jemalloc folk (users and developers) for 
> continuing to investigate this issue from their side. Without their 
> persistence this issue would have languished.
>
> The thread stack_base() is the first address above the thread's stack. 
> However, the "in stack" checks performed by Thread::on_local_stack and 
> Thread::is_in_stack allowed the checked address to be equal to the 
> stack_base() - which is not correct. Here's how this manifests as the 
> bug:
>
> - Let a JavaThread instance, T2, be allocated at the end of thread 
> T1's stack i.e. at T1->stack_base()
>   [This seems to be why this only reproduced with jemalloc.]
> - Let T2 lock an inflated monitor
> - Let T1 try to lock the same monitor
>   - T1 would consider the _owner field value (T2) as being in its 
> stack and so consider the monitor stack-locked by T1
>   - And so both T1 and T2 would have ownership of the monitor allowing 
> the monitor state (and application state) to be corrupted. This 
> results in a range of hangs and crashes depending on the exact 
> interleaving.
>
> Interestingly Thread::is_in_usable_stack does not have this bug.
>
> The bug can be tracked way back to JDK-6699669 as explained in the bug 
> report. That issue also showed that the same bug existed in the SA 
> implementations of these "on stack" checks.
>
> Testing:
>   - The reproducer from the bug report, using jemalloc, ran over 5000 
> times without failing in any way.
>   - tiers 1-3 on all Oracle platforms
>   - serviceability/sa tests
>
> Thanks,
> David
> -----
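
The off-by-one described above can be illustrated with a minimal sketch. This is not the HotSpot code: StackBounds and its members are hypothetical names, and the sketch only assumes what the message states, namely that stack_base() is the first address above a downward-growing stack, so the valid range is half-open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a thread's stack bounds; not HotSpot's Thread class.
struct StackBounds {
  std::uintptr_t base;  // first address ABOVE the stack (stack grows down)
  std::size_t    size;  // stack size in bytes

  std::uintptr_t end() const { return base - size; }  // lowest stack address

  // Buggy variant: accepts addr == base, which is outside the stack.
  bool is_in_stack_inclusive(std::uintptr_t addr) const {
    return addr >= end() && addr <= base;
  }

  // Corrected variant: the stack is the half-open range [end, base).
  bool is_in_stack(std::uintptr_t addr) const {
    return addr >= end() && addr < base;
  }
};

// Models an object (T2) being allocated exactly at T1's stack base.
inline void check_examples() {
  StackBounds t1{0x10000, 0x1000};
  std::uintptr_t t2_address = t1.base;
  assert(t1.is_in_stack_inclusive(t2_address));  // the bug: T2 looks "on stack"
  assert(!t1.is_in_stack(t2_address));           // fixed: base is excluded
  assert(t1.is_in_stack(t1.base - 1));           // highest valid stack address
  assert(t1.is_in_stack(t1.end()));              // lowest valid stack address
}
```

With the inclusive check, an owner pointer equal to the base is misread as a stack address, which is the condition that let two threads believe they owned the same monitor.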


From david.holmes at oracle.com  Tue Nov 19 04:37:24 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 19 Nov 2019 14:37:24 +1000
Subject: RFR: 8215355: Object monitor deadlock with no threads holding the
 monitor (using jemalloc 5.1)
In-Reply-To: <6fe10178-cb74-aa9f-7052-5c577e6e10bf@oracle.com>
References: <d66fdf17-0114-8fd5-cc6b-b2fc6fa5c896@oracle.com>
 <6fe10178-cb74-aa9f-7052-5c577e6e10bf@oracle.com>
Message-ID: <aba036e4-6959-e87f-e2f6-2e56a33153b0@oracle.com>

Hi Serguei,

On 19/11/2019 2:34 pm, serguei.spitsyn at oracle.com wrote:
> Hi David,
> 
> The fix looks good.

Thanks for taking a look!

> This is in addition to the platform-dependent code that Thomas flagged.
> 
> There can be similar broken code on other platforms.
> For instance, there is a suspicious spot in cpu/ppc/frame_ppc.cpp:
> 
>      // sender_fp must be within the stack and above (but not
>      // equal) current frame's fp.
>      if (sender_fp > thread->stack_base() || sender_fp <= fp) {
>          return false;
>      }

I have filed:

https://bugs.openjdk.java.net/browse/JDK-8234372

"Investigate use of Thread::stack_base() and queries for "in stack""

to look at all uses of stack_base().

Thanks,
David

> Thanks,
> Serguei
> 
> 
> On 11/17/19 18:30, David Holmes wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
>> webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
>>
>> This was a very difficult bug to track down and I want to publicly 
>> acknowledge and thank the jemalloc folk (users and developers) for 
>> continuing to investigate this issue from their side. Without their 
>> persistence this issue would have languished.
>>
>> The thread stack_base() is the first address above the thread's stack. 
>> However, the "in stack" checks performed by Thread::on_local_stack and 
>> Thread::is_in_stack allowed the checked address to be equal to the 
>> stack_base() - which is not correct. Here's how this manifests as the 
>> bug:
>>
>> - Let a JavaThread instance, T2, be allocated at the end of thread 
>> T1's stack i.e. at T1->stack_base()
>>   [This seems to be why this only reproduced with jemalloc.]
>> - Let T2 lock an inflated monitor
>> - Let T1 try to lock the same monitor
>>   - T1 would consider the _owner field value (T2) as being in its 
>> stack and so consider the monitor stack-locked by T1
>>   - And so both T1 and T2 would have ownership of the monitor allowing 
>> the monitor state (and application state) to be corrupted. This 
>> results in a range of hangs and crashes depending on the exact 
>> interleaving.
>>
>> Interestingly Thread::is_in_usable_stack does not have this bug.
>>
>> The bug can be tracked way back to JDK-6699669 as explained in the bug 
>> report. That issue also showed that the same bug existed in the SA 
>> implementations of these "on stack" checks.
>>
>> Testing:
>>   - The reproducer from the bug report, using jemalloc, ran over 5000 
>> times without failing in any way.
>>   - tiers 1-3 on all Oracle platforms
>>   - serviceability/sa tests
>>
>> Thanks,
>> David
>> -----
> 

From robbin.ehn at oracle.com  Tue Nov 19 11:05:08 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Tue, 19 Nov 2019 12:05:08 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
Message-ID: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>

Hi all, please review.

CMS was the last real user of the more advanced features of VM operation.
VM operation can be simplified to always be a stack object and thus either be
of safepoint or no safepoint type.

VM_EnableBiasedLocking is executed once by watcher thread, if needed (default 
not used). Making it synchronous doesn't matter.
VM_ThreadStop is executed by a JavaThread, that thread should stop for the 
safepoint anyways, no real point in not stopping directly.
VM_ScavengeMonitors is only used to trigger a safepoint cleanup, the VM op is 
not needed. Arguably this thread should actually stop here, since we are about 
to safepoint.

There is also a small cleanup in vmThread.cpp where an unused method is removed.
And the extra safepoint is removed:
"// We want to make sure that we get to a safepoint regularly"
No we don't :)

Issue:
https://bugs.openjdk.java.net/browse/JDK-8234086
Change-set:
http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html

Tested scavenge manually, passes t1-2.

Thanks, Robbin
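
A rough sketch of the shape being proposed, under stated assumptions: VMOperationSketch, CountingOp and execute_blocking are hypothetical names, and execute_blocking simply calls doit() where the real VM would hand the operation to the VM thread and block. The point is only that once every operation is blocking, it can live on the caller's stack and needs just two modes.

```cpp
#include <cassert>

// Hypothetical simplification, not the actual HotSpot VM_Operation class.
class VMOperationSketch {
 public:
  enum Mode {
    _safepoint,    // blocking, executed at a safepoint
    _no_safepoint  // blocking, executed without a safepoint
  };

  virtual ~VMOperationSketch() = default;
  virtual Mode evaluation_mode() const { return _safepoint; }
  virtual void doit() = 0;

  // Non-virtual: with only two modes this is a simple comparison.
  bool evaluate_at_safepoint() const { return evaluation_mode() == _safepoint; }
};

class CountingOp final : public VMOperationSketch {
 public:
  int runs = 0;
  Mode evaluation_mode() const override { return _no_safepoint; }
  void doit() override { ++runs; }
};

// Stand-in for handing the operation to the VM thread and blocking until done.
inline void execute_blocking(VMOperationSketch& op) { op.doit(); }

inline int run_example() {
  CountingOp op;         // lives on the caller's stack
  execute_blocking(op);  // caller blocks, so 'op' outlives the execution
  return op.runs;
}
```

Because the caller always waits, there is no need for the executor to take ownership of the operation or free it afterwards.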

From david.holmes at oracle.com  Tue Nov 19 13:27:33 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 19 Nov 2019 23:27:33 +1000
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
Message-ID: <dfff2121-2f6b-7407-812e-7608e145af8c@oracle.com>

Hi Robbin,

On 19/11/2019 9:05 pm, Robbin Ehn wrote:
> Hi all, please review.

Preliminary comments ... so ...

+class VM_Operation : public StackObj {
   public:
    enum Mode {
      _safepoint,       // blocking,        safepoint, vm_op C-heap allocated
-    _no_safepoint,    // blocking,     no safepoint, vm_op C-Heap allocated
-    _concurrent,      // non-blocking, no safepoint, vm_op C-Heap allocated
-    _async_safepoint  // non-blocking,    safepoint, vm_op C-Heap allocated
+    _no_safepoint    // blocking,     no safepoint, vm_op C-Heap allocated
    };


You are basically getting rid of concurrent and async_safepoint VM op 
capability. Okay. But you're also making all VM ops StackObj so all 
those "VM op C-heap allocated" comments are no longer correct. Also many 
of the comments around the VM ops you have changed from async to 
sync, and from C-heap to stackobj, are also no longer correct.

Please update.

Thanks,
David
-----

> CMS was the last real user of the more advanced features of VM operation.
> VM operation can be simplified to always be a stack object and thus 
> either be
> of safepoint or no safepoint type.
> 
> VM_EnableBiasedLocking is executed once by watcher thread, if needed 
> (default not used). Making it synchronous doesn't matter.
> VM_ThreadStop is executed by a JavaThread, that thread should stop for 
> the safepoint anyways, no real point in not stopping directly.
> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, the VM 
> op is not needed. Arguably this thread should actually stop here, since 
> we are about to safepoint.
> 
> There is also a small cleanup in vmThread.cpp where an unused method is 
> removed.
> And the extra safepoint is removed:
> "// We want to make sure that we get to a safepoint regularly"
> No we don't :)
> 
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8234086
> Change-set:
> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
> 
> Tested scavenge manually, passes t1-2.
> 
> Thanks, Robbin

From robbin.ehn at oracle.com  Tue Nov 19 13:51:54 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Tue, 19 Nov 2019 14:51:54 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <dfff2121-2f6b-7407-812e-7608e145af8c@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <dfff2121-2f6b-7407-812e-7608e145af8c@oracle.com>
Message-ID: <d263d1ec-ee83-c1bf-c945-76089e5ca865@oracle.com>

Hi David, thanks for having a look.

On 11/19/19 2:27 PM, David Holmes wrote:
> Hi Robbin,
> 
> On 19/11/2019 9:05 pm, Robbin Ehn wrote:
>> Hi all, please review.
> 
> Preliminary comments ... so ...
> 
> +class VM_Operation : public StackObj {
>    public:
>     enum Mode {
>       _safepoint,       // blocking,        safepoint, vm_op C-heap allocated
> -    _no_safepoint,    // blocking,     no safepoint, vm_op C-Heap allocated
> -    _concurrent,      // non-blocking, no safepoint, vm_op C-Heap allocated
> -    _async_safepoint  // non-blocking,    safepoint, vm_op C-Heap allocated
> +    _no_safepoint    // blocking,     no safepoint, vm_op C-Heap allocated
>     };
> 
> 
> You are basically getting rid of concurrent and async_safepoint VM op 
> capability. Okay. But you're also making all VM ops StackObj so all those "VM op 
> C-heap allocated" comments are no longer correct. Also many of the comments 
> around the VM ops you have changed from async to sync, and from C-heap to 
> stackobj, are also no longer correct.
> 
> Please update.

Not sure if I found all:
http://cr.openjdk.java.net/~rehn/8234086/v2/inc/webrev/index.html
http://cr.openjdk.java.net/~rehn/8234086/v2/full/webrev/index.html

Thanks, Robbin

> 
> Thanks,
> David
> -----
> 
>> CMS was the last real user of the more advanced features of VM operation.
>> VM operation can be simplified to always be a stack object and thus either be
>> of safepoint or no safepoint type.
>>
>> VM_EnableBiasedLocking is executed once by watcher thread, if needed (default 
>> not used). Making it synchronous doesn't matter.
>> VM_ThreadStop is executed by a JavaThread, that thread should stop for the 
>> safepoint anyways, no real point in not stopping directly.
>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, the VM op is 
>> not needed. Arguably this thread should actually stop here, since we are about 
>> to safepoint.
>>
>> There is also a small cleanup in vmThread.cpp where an unused method is removed.
>> And the extra safepoint is removed:
>> "// We want to make sure that we get to a safepoint regularly"
>> No we don't :)
>>
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8234086
>> Change-set:
>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>
>> Tested scavenge manually, passes t1-2.
>>
>> Thanks, Robbin

From markus.gronlund at oracle.com  Tue Nov 19 14:38:26 2019
From: markus.gronlund at oracle.com (Markus Gronlund)
Date: Tue, 19 Nov 2019 06:38:26 -0800 (PST)
Subject: 8233197(S): Invert JvmtiExport::post_vm_initialized() and
 Jfr:on_vm_start() start-up order for correct option parsing
Message-ID: <b2bf81c0-80fa-49e4-ac09-8fa6589b1e80@default>

Greetings,

(apologies for the wide distribution)

Kindly asking for reviews for the following changeset:

Bug: https://bugs.openjdk.java.net/browse/JDK-8233197 
Webrev: http://cr.openjdk.java.net/~mgronlun/8233197/webrev01/
Testing: serviceability/jvmti, jdk_jfr, tier1-5
Summary: please see bug for description.

For Runtime / Serviceability folks:
This change slightly modifies the relative order in Threads::create_vm(); please see threads.cpp.
There is an upcall as part of Jfr::on_vm_start() that delivers global JFR command-line options to Java (only if set).
The behavioral change amounts to a few classes loaded as part of establishing this upcall (all internal JFR classes and/or java.base classes, loaded by the bootloader) no longer being visible to the ClassFileLoadHook's of agents. These classes are visible to agents that work with "early_start" JVMTI environments however.

The major part of JFR startup with associated class loading still happens as part of Jfr::on_vm_live() with no behavioral change in relation to agents.

Thank you
Markus

From daniel.daugherty at oracle.com  Tue Nov 19 14:41:36 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 19 Nov 2019 09:41:36 -0500
Subject: RFR(S/T): 8230876: baseline cleanups from Async Monitor Deflation
 v2.0[789]
In-Reply-To: <44efa18f-5524-8465-fb0c-cab41a1569af@oracle.com>
References: <420b2d84-82a2-f782-7dcc-175472bf6a5a@oracle.com>
 <44efa18f-5524-8465-fb0c-cab41a1569af@oracle.com>
Message-ID: <ec005696-f447-9fc8-4ffd-5957b27fc40f@oracle.com>

Hi David,

Thanks for the quick review!

On 11/18/19 9:23 PM, David Holmes wrote:
> Hi Dan,
>
> Given:
>
>  volatile intptr_t _recursions;
>
> The change to the print statements to use INTX_FORMAT instead of the 
> existing INTPTR_FORMAT seems a little odd - but obviously you don't 
> want it printed in hex.

Yup. I first noticed it when I wrote and tested my initial version of
ObjectMonitor::print_debug_style_on() and then I realized that it was all
over with other prints of that field.


> That seems fine, but can we then make the simple change to redefine 
> _recursions as intx as well - which is a semantic no-op given:
>
> typedef intptr_t  intx;

No problem. I'll make that change before I push. Not sure why that
didn't occur to me before.
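
For readers unfamiliar with the distinction, here is a small sketch of printing the same pointer-sized integer in decimal and in hex. It uses the standard PRIdPTR and PRIxPTR macros rather than HotSpot's INTX_FORMAT and INTPTR_FORMAT, and the helper names format_decimal and format_hex are made up for illustration.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Decimal rendering, the natural choice for a counter such as a recursion count.
inline std::string format_decimal(std::intptr_t v) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%" PRIdPTR, v);
  return buf;
}

// Hex rendering, the natural choice when the value is really an address.
inline std::string format_hex(std::intptr_t v) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "0x%" PRIxPTR, static_cast<std::uintptr_t>(v));
  return buf;
}
```

The underlying type is the same in both cases; only the format specifier decides how the value reads in a log line.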


> Otherwise all seems okay.

Thanks! Now, if we can just get a quick review of macroAssembler_x86.cpp
from Vladimir K... :-)

Dan


>
> Thanks,
> David
>
> On 19/11/2019 8:23 am, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have another round of baseline cleanup changes from the Async Monitor
>> Deflation project (8153224). Like previous sub-tasks of 8153224, these
>> changes are small and/or trivial. These changes have previously been
>> reviewed as a (very) small part of 8153224 (CR8/v2.08/11-for-jdk14).
>>
>> Vladimir K., if you could sanity check the cleanups in 
>> macroAssembler_x86.cpp
>> that would be appreciated (only comments were changed). I recommend the
>> Udiff view...
>>
>> Please see the bug for details about the changes in this webrev:
>>
>>      JDK-8230876 baseline cleanups from Async Monitor Deflation 
>> v2.0[789]
>>      https://bugs.openjdk.java.net/browse/JDK-8230876
>>
>> Here's the webrev URL:
>>
>> http://cr.openjdk.java.net/~dcubed/8230876-webrev/0-for-jdk14/
>>
>> These changes have been included in my recent rounds of Mach5 Tier[1-8]
>> and other associated stress and/or performance testing. I have also done
>> a Mach5 Tier[1-3] run with just this patch to make sure that I got all
>> the pieces that are needed.
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>>
>> Dan


From kim.barrett at oracle.com  Tue Nov 19 15:55:04 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 19 Nov 2019 10:55:04 -0500
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <d263d1ec-ee83-c1bf-c945-76089e5ca865@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <dfff2121-2f6b-7407-812e-7608e145af8c@oracle.com>
 <d263d1ec-ee83-c1bf-c945-76089e5ca865@oracle.com>
Message-ID: <29CF8C53-D058-4527-8866-1B1D4DB9998A@oracle.com>

> On Nov 19, 2019, at 8:51 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
> 
> Hi David, thanks for having a look.
> 
> On 11/19/19 2:27 PM, David Holmes wrote:
>> Hi Robbin,
>> On 19/11/2019 9:05 pm, Robbin Ehn wrote:
>>> Hi all, please review.
>> Preliminary comments ... so ...
>> +class VM_Operation : public StackObj {
>>   public:
>>    enum Mode {
>>      _safepoint,       // blocking,        safepoint, vm_op C-heap allocated
>> -    _no_safepoint,    // blocking,     no safepoint, vm_op C-Heap allocated
>> -    _concurrent,      // non-blocking, no safepoint, vm_op C-Heap allocated
>> -    _async_safepoint  // non-blocking,    safepoint, vm_op C-Heap allocated
>> +    _no_safepoint    // blocking,     no safepoint, vm_op C-Heap allocated
>>    };
>> You are basically getting rid of concurrent and async_safepoint VM op capability. Okay. But you're also making all VM ops StackObj so all those "VM op C-heap allocated" comments are no longer correct. Also many of the comments around the VM ops you have changed from async to sync, and from C-heap to stackobj, are also no longer correct.
>> Please update.
> 
> Not sure if I found all:
> http://cr.openjdk.java.net/~rehn/8234086/v2/inc/webrev/index.html
> http://cr.openjdk.java.net/~rehn/8234086/v2/full/webrev/index.html

------------------------------------------------------------------------------
src/hotspot/share/runtime/biasedLocking.cpp
  83     // Use async VM operation to avoid blocking the Watcher thread.
  84     // VM Thread will free C heap storage.

These comments appear to be obsolete.

------------------------------------------------------------------------------
src/hotspot/share/runtime/safepoint.cpp
 535 bool SafepointSynchronize::is_forced_cleanup_needed() {
 536   return ObjectSynchronizer::force_monitor_scavage();
 537 }

src/hotspot/share/runtime/synchronizer.cpp 
 917   return force_monitor_scavage();

src/hotspot/share/runtime/synchronizer.cpp 
 920 bool ObjectSynchronizer::force_monitor_scavage() {

force_monitor_scavage typo in "scavage".

And that name seems like it has side effects (forces something), not a
predicate. It's testing whether ForceMonitorScavenge is true, but I
think a better name is needed.

------------------------------------------------------------------------------
src/hotspot/share/runtime/synchronizer.cpp
1006   if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, &ForceMonitorScavenge) == 0) {
1007     VMThread::check_cleanup();

I was going to make a couple nit-pick comments here, but this seems to
be soon to be dead code (only reached when deprecated MonitorBound has
a non-default value).

Not sure it's worth changing this and adding check_cleanup just for
use here though, given this code is going away soon. Are there planned
future uses of check_cleanup?
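
The load-then-exchange idiom in the quoted line can be sketched with std::atomic. This is an illustration only: force_flag, cleanup_requests and request_cleanup_once are hypothetical names, and the counter increment stands in for the call to VMThread::check_cleanup().

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for ForceMonitorScavenge; not HotSpot's Atomic API.
static std::atomic<int> force_flag{0};
static std::atomic<int> cleanup_requests{0};  // how often cleanup was triggered

// Returns true if this call was the one that set the flag.
inline bool request_cleanup_once() {
  // Cheap read first, so callers that see the flag already set skip the
  // read-modify-write; exchange() then guarantees a single winner.
  if (force_flag.load() == 0 && force_flag.exchange(1) == 0) {
    cleanup_requests.fetch_add(1);  // stands in for the cleanup request
    return true;
  }
  return false;
}
```

Only the caller that observes the previous value 0 from exchange() proceeds, so repeated or concurrent requests collapse into one until the flag is reset.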

------------------------------------------------------------------------------
src/hotspot/share/runtime/vmOperations.hpp  
Removed:
 156   virtual ~VM_Operation() {}

Simple removal may cause warnings with some flags that have been
discussed being enabled. This change violates the "usual rule" of
public and virtual or non-public for base class destructors. Of
course, that rule (and any warnings that might try to check for it)
doesn't account for non-heap-allocatable classes. And HotSpot is full
of violations of that rule, so we're probably a long way from being
able to turn on some of those kinds of warnings.

OTOH, leaving the destructor alone has very little cost.
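
As an illustration of the "usual rule" mentioned above, here is a sketch of the non-public, non-virtual alternative for a base class whose instances are only ever stack-allocated. StackOnlyBase and AddOne are hypothetical names, not HotSpot classes.

```cpp
#include <cassert>

// Hypothetical base class for operations that only ever live on the stack.
class StackOnlyBase {
 public:
  virtual int run() = 0;

 protected:
  // Non-public and non-virtual: 'delete base_ptr' outside the hierarchy
  // does not compile, so the missing virtual dispatch cannot bite.
  ~StackOnlyBase() = default;
};

class AddOne final : public StackOnlyBase {
 public:
  explicit AddOne(int v) : _v(v) {}
  int run() override { return _v + 1; }

 private:
  int _v;
};

inline int run_on_stack(int v) {
  AddOne op(v);              // automatic storage; destroyed as AddOne
  StackOnlyBase& base = op;  // polymorphic use through a reference is fine
  return base.run();
}
```

Keeping a public virtual destructor instead is the other way to satisfy the rule, at the cost noted above of one extra virtual function.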

------------------------------------------------------------------------------
src/hotspot/share/runtime/vmOperations.hpp 
 190   bool evaluate_at_safepoint() const {

Nice to see this no longer being virtual, and the preceding big block
caution comment being gone.

------------------------------------------------------------------------------


From martin.doerr at sap.com  Tue Nov 19 16:51:46 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 19 Nov 2019 16:51:46 +0000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <fe549cc9-fba7-9a15-eed6-832717acdee0@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <fe549cc9-fba7-9a15-eed6-832717acdee0@oracle.com>
Message-ID: <HE1PR0201MB24756A4232EFC1AFAA26BD7D9A4C0@HE1PR0201MB2475.eurprd02.prod.outlook.com>

Hi Dan,

> As for the whole "too soon to deprecate" discussion: Deprecation is not
> making the code obsolete so this changeset is not taking anything away
> other than changing the default of UseBiasedLocking from true to false.
> There are things that have been deprecated since JDK8 and they still
> have not yet been made obsolete.

I think deprecating before publishing an evaluation or at least having a discussion is not appropriate.

> Deprecating biased locking is the proper way of saying that we (Oracle)
> and/or others think that biased locking should/will go away in a future
> release. Yes, there are locking experts outside of Oracle that have said
> that biased locking should go away, but I haven't gotten permission to
> quote the folks (yet)...

There should be consensus on the direction of possibly removing it before communicating it the hard way.
However, switching it off for evaluation sounds feasible to me.
Seems like we have some homework, too.

Thanks, Patricio, for going the JEP way. I think changes with less impact have already been handled as JEPs.

Best regards,
Martin


> -----Original Message-----
> From: hotspot-runtime-dev <hotspot-runtime-dev-
> bounces at openjdk.java.net> On Behalf Of Daniel D. Daugherty
> Sent: Dienstag, 19. November 2019 00:06
> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; hotspot-runtime-
> dev at openjdk.java.net
> Subject: Re: RFR 8231264: Disable biased-locking and deprecate all flags
> related to biased-locking
> 
> Hi Patricio,
> 
> On 11/15/19 9:15 PM, Patricio Chilano wrote:
> > Hi all,
> >
> > Could you review the following patch?
> >
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8231264
> > Webrev: http://cr.openjdk.java.net/~pchilanomate/8231264/v01/webrev
> 
> src/hotspot/share/runtime/arguments.cpp
>      Is it too early to specify the obsolete_in and expired_in values?
>      They could be JDK_Version::undefined() so that all you are doing
>      is deprecation in this changeset.
> 
> src/hotspot/share/runtime/globals.hpp
>      No comments.
> 
> test/hotspot/gtest/oops/test_markWord.cpp
>      L96:     // Can't test this with biased locking disabled.
>          Perhaps (since the comment is inside the if-statement):
>               // This sub-test requires biased locking to be enabled.
> 
>      L11[135] - Why indent the pre-processor controls? Left most
>          column is generally the style used.
> 
>      L115:   // Same thread tries to lock it again.
>          This comment needs a rewrite. Perhaps:
>              // Lock the object using an ObjectLocker helper which
>              // will revoke the bias if we happened to use that
>              // mechanism above.
> 
>      L121:   // This is no longer biased, because ObjectLocker revokes the bias.
>          This comment needs a rewrite. Perhaps:
>              // The object should be unlocked with no hashCode at
>              // this point (ObjectLocker dtr has run).
> 
> test/jdk/jdk/jfr/event/runtime/TestBiasedLockRevocationEvents.java
>      No comments.
> 
> Thumbs up! My comments are mostly nits so I don't need to see a new
> webrev if you decide to make changes based on my suggestions.
> 
> As for the whole "too soon to deprecate" discussion: Deprecation is not
> making the code obsolete so this changeset is not taking anything away
> other than changing the default of UseBiasedLocking from true to false.
> There are things that have been deprecated since JDK8 and they still
> have not yet been made obsolete.
> 
> Deprecating biased locking is the proper way of saying that we (Oracle)
> and/or others think that biased locking should/will go away in a future
> release. Yes, there are locking experts outside of Oracle that have said
> that biased locking should go away, but I haven't gotten permission to
> quote the folks (yet)...
> 
> Deprecation is not final. Features can be un-deprecated if some
> relevant facts and/or info changes the previous conclusion.
> 
> Dan
> 
> 
> 
> >
> > Biased locking will be disabled by default and all related flags will
> > be deprecated. Performance gains seen when the feature was introduced
> > in the VM are less clear today with modern Java code/processors.
> > Detailed rationale behind the change is included on the description of
> > the bug.
> >
> > I modified test gtest/oops/test_markWord.cpp so that it still
> > exercises other cases of markword printing.
> >
> > Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows
> > and Solaris).
> >
> > Thanks,
> > Patricio
> >
> >


From vladimir.kozlov at oracle.com  Tue Nov 19 19:38:19 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 19 Nov 2019 11:38:19 -0800
Subject: RFR(S/T): 8230876: baseline cleanups from Async Monitor Deflation
 v2.0[789]
In-Reply-To: <ec005696-f447-9fc8-4ffd-5957b27fc40f@oracle.com>
References: <420b2d84-82a2-f782-7dcc-175472bf6a5a@oracle.com>
 <44efa18f-5524-8465-fb0c-cab41a1569af@oracle.com>
 <ec005696-f447-9fc8-4ffd-5957b27fc40f@oracle.com>
Message-ID: <53168bec-ca88-e203-4c3d-60d729e23fce@oracle.com>

macroAssembler_x86.cpp changes are only in comments, which look good to me.

Fast_Lock/Fast_Unlock were names before I moved code from .ad files to macroAssembler_x86.cpp (8033805).

About "Without cast to int32_t this style of movptr will destroy r10 which is typically obj."
The 64-bit movptr() has two overloads for the second argument, intptr_t and int32_t:

http://hg.openjdk.java.net/jdk/jdk/file/faac483dfb30/src/hotspot/cpu/x86/macroAssembler_x86.cpp#l728

The intptr_t version uses rscratch1 (R10), which is most likely in use and holds a valid value (in C2 generated code):

http://hg.openjdk.java.net/jdk/jdk/file/faac483dfb30/src/hotspot/cpu/x86/x86_64.ad#l134
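
To illustrate the overload point above, here is a minimal sketch of how a cast to int32_t changes which overload the compiler picks. ToyAssembler, its one-argument movptr() and which_movptr() are hypothetical stand-ins, not the HotSpot MacroAssembler API; the sketch assumes a 64-bit target where intptr_t and int32_t are distinct types.

```cpp
#include <cstdint>
#include <string>

// Hypothetical toy assembler: it only records which overload was selected.
// Assumes a 64-bit target, where intptr_t and int32_t are different types.
struct ToyAssembler {
  std::string last;

  // 32-bit immediate: modeled as a direct store, no scratch register needed.
  void movptr(int32_t /*imm*/) { last = "store imm32 directly"; }

  // Pointer-sized immediate: modeled as going through a scratch register,
  // which overwrites whatever the scratch register held before.
  void movptr(intptr_t /*imm*/) { last = "load imm64 into scratch register, then store"; }
};

// Returns a description of the path taken with or without the cast.
std::string which_movptr(bool cast_to_int32) {
  ToyAssembler masm;
  intptr_t value = 3;
  if (cast_to_int32) {
    masm.movptr(static_cast<int32_t>(value));  // selects the int32_t overload
  } else {
    masm.movptr(value);                        // selects the intptr_t overload
  }
  return masm.last;
}
```

In this toy model, dropping the cast silently selects the scratch-register path, which is the hazard the quoted comment warns about.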

Vladimir

On 11/19/19 6:41 AM, Daniel D. Daugherty wrote:
> Hi David,
> 
> Thanks for the quick review!
> 
> On 11/18/19 9:23 PM, David Holmes wrote:
>> Hi Dan,
>>
>> Given:
>>
>>  volatile intptr_t _recursions;
>>
>> The change to the print statements to use INTX_FORMAT instead of the existing INTPTR_FORMAT seems a little odd - but 
>> obviously you don't want it printed in hex.
> 
> Yup. I first noticed it when I wrote and tested my initial version of
> ObjectMonitor::print_debug_style_on() and then I realized that it was all
> over with other prints of that field.
> 
> 
>> That seems fine, but can we then make the simple change to redefine _recursions as intx as well - which is a semantic 
>> no-op given:
>>
>> typedef intptr_t  intx;
> 
> No problem. I'll make that change before I push. Not sure why that
> didn't occur to me before.
> 
> 
>> Otherwise all seems okay.
> 
> Thanks! Now, if we can just get a quick review of macroAssembler_x86.cpp
> from Vladimir K... :-)
> 
> Dan
> 
> 
>>
>> Thanks,
>> David
>>
>> On 19/11/2019 8:23 am, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I have another round of baseline cleanup changes from the Async Monitor
>>> Deflation project (8153224). Like previous sub-tasks of 8153224, these
>>> changes are small and/or trivial. These changes have previously been
>>> reviewed as a (very) small part of 8153224 (CR8/v2.08/11-for-jdk14).
>>>
>>> Vladimir K., if you could sanity check the cleanups in macroAssembler_x86.cpp
>>> that would be appreciated (only comments were changed). I recommend the
>>> Udiff view...
>>>
>>> Please see the bug for details about the changes in this webrev:
>>>
>>>      JDK-8230876 baseline cleanups from Async Monitor Deflation v2.0[789]
>>>      https://bugs.openjdk.java.net/browse/JDK-8230876
>>>
>>> Here's the webrev URL:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8230876-webrev/0-for-jdk14/
>>>
>>> These changes have been included in my recent rounds of Mach5 Tier[1-8]
>>> and other associated stress and/or performance testing. I have also done
>>> a Mach5 Tier[1-3] run with just this patch to make sure that I got all
>>> the pieces that are needed.
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Dan
> 

From daniel.daugherty at oracle.com  Tue Nov 19 20:11:50 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 19 Nov 2019 15:11:50 -0500
Subject: RFR(S/T): 8230876: baseline cleanups from Async Monitor Deflation
 v2.0[789]
In-Reply-To: <53168bec-ca88-e203-4c3d-60d729e23fce@oracle.com>
References: <420b2d84-82a2-f782-7dcc-175472bf6a5a@oracle.com>
 <44efa18f-5524-8465-fb0c-cab41a1569af@oracle.com>
 <ec005696-f447-9fc8-4ffd-5957b27fc40f@oracle.com>
 <53168bec-ca88-e203-4c3d-60d729e23fce@oracle.com>
Message-ID: <e41f2b6e-ba40-23b9-b5ae-001bad9a8727@oracle.com>

Vladimir,

Thanks for the review!


On 11/19/19 2:38 PM, Vladimir Kozlov wrote:
> macroAssembler_x86.cpp changes are only in comments, which look good 
> to me.

Thanks.


> Fast_Lock/Fast_Unlock were names before I moved code from .ad files to 
> macroAssembler_x86.cpp (8033805).

Yup. I figured we should match the current names.


> About "Without cast to int32_t this style of movptr will destroy r10 
> which is typically obj."
> The 64-bit movptr() has two overloads for the second argument, intptr_t 
> and int32_t:
>
> http://hg.openjdk.java.net/jdk/jdk/file/faac483dfb30/src/hotspot/cpu/x86/macroAssembler_x86.cpp#l728 
>
>
> The intptr_t version uses rscratch1 (R10), which is most likely in use and 
> holds a valid value (in C2 generated code):
>
> http://hg.openjdk.java.net/jdk/jdk/file/faac483dfb30/src/hotspot/cpu/x86/x86_64.ad#l134 
>

Thanks for confirmation that the comment is still valid.

Again, thanks for the review!

Dan


>
> Vladimir
>
> On 11/19/19 6:41 AM, Daniel D. Daugherty wrote:
>> Hi David,
>>
>> Thanks for the quick review!
>>
>> On 11/18/19 9:23 PM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> Given:
>>>
>>>  volatile intptr_t _recursions;
>>>
>>> The change to the print statements to use INTX_FORMAT instead of the 
>>> existing INTPTR_FORMAT seems a little odd - but obviously you don't 
>>> want it printed in hex.
>>
>> Yup. I first noticed it when I wrote and tested my initial version of
>> ObjectMonitor::print_debug_style_on() and then I realized that it was 
>> all
>> over with other prints of that field.
>>
>>
>>> That seems fine, but can we then make the simple change to redefine 
>>> _recursions as intx as well - which is a semantic no-op given:
>>>
>>> typedef intptr_t  intx;
>>
>> No problem. I'll make that change before I push. Not sure why that
>> didn't occur to me before.
>>
>>
>>> Otherwise all seems okay.
>>
>> Thanks! Now, if we can just get a quick review of macroAssembler_x86.cpp
>> from Vladimir K... :-)
>>
>> Dan
>>
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 19/11/2019 8:23 am, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I have another round of baseline cleanup changes from the Async 
>>>> Monitor
>>>> Deflation project (8153224). Like previous sub-tasks of 8153224, these
>>>> changes are small and/or trivial. These changes have previously been
>>>> reviewed as a (very) small part of 8153224 (CR8/v2.08/11-for-jdk14).
>>>>
>>>> Vladimir K., if you could sanity check the cleanups in 
>>>> macroAssembler_x86.cpp
>>>> that would be appreciated (only comments were changed). I recommend 
>>>> the
>>>> Udiff view...
>>>>
>>>> Please see the bug for details about the changes in this webrev:
>>>>
>>>>      JDK-8230876 baseline cleanups from Async Monitor Deflation 
>>>>      v2.0[789]
>>>>      https://bugs.openjdk.java.net/browse/JDK-8230876
>>>>
>>>> Here's the webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8230876-webrev/0-for-jdk14/
>>>>
>>>> These changes have been included in my recent rounds of Mach5 
>>>> Tier[1-8]
>>>> and other associated stress and/or performance testing. I have also 
>>>> done
>>>> a Mach5 Tier[1-3] run with just this patch to make sure that I got all
>>>> the pieces that are needed.
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>


From leonid.mesnik at oracle.com  Tue Nov 19 21:30:32 2019
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Tue, 19 Nov 2019 13:30:32 -0800
Subject: RFR(T): 8234272 ProblemList runtime/NMT/HugeArenaTracking.java
In-Reply-To: <cb5a98f0-82fe-9306-38f4-ffbc7b81a732@redhat.com>
References: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>
 <2c087e61-2f10-8707-7b2f-4071b8b86582@oracle.com>
 <66dde824-a453-f3de-a9f6-0781065f882b@redhat.com>
 <667049ec-9c07-996a-cdd7-7af1cbbbfe57@oracle.com>
 <cb5a98f0-82fe-9306-38f4-ffbc7b81a732@redhat.com>
Message-ID: <898546a1-bec9-b852-2c73-26e741465bf2@oracle.com>

Hi

New patch works fine.

Leonid

On 11/17/19 7:31 AM, Zhengyu Gu wrote:
> Hi,
>
> Could you test this patch? 
> http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/index.html
>
>
> I was able to reproduce original bug with 
> compiler/codecache/stress/UnexpectedDeoptimizationTest.java + NMT, 
> newly added assertion caught the problem.
>
> I also fixed the new test failure on Windows. JDK-8204128 patch missed 
> one long -> ssize_t change. long on Windows is 4-bytes vs 8-bytes on 
> other 64-bits platforms, that explains why it only fails on Windows.
>
> This patch passed submit test.
>
> [Mach5] mach5-one-zgu-JDK-8234270-1-20191117-1425-6784823: PASSED
>
> Thanks,
>
> -Zhengyu
>
>
> On 11/16/19 1:42 PM, Leonid Mesnik wrote:
>> Hi
>>
>> Unfortunately, I don't know any publicly available test which could 
>> reproduce this issue right now.
>>
>> We also run some jdk/hotspot regression tests with NMT enabled, 
>> however I am not sure if they could be used as reproducers.
>>
>> Leonid
>>
>> On 11/16/19 6:04 AM, Zhengyu Gu wrote:
>>> Hi Leonid,
>>>
>>> On 11/15/19 6:43 PM, Leonid Mesnik wrote:
>>>> It would be better to backout fix.
>>>>
>>>> Other tests executed with NMT triggered this assertion also. So we 
>>>> are going to have a lot of assertions.
>>>
>>> Are any of these "other tests" publicly available? If so, could you 
>>> point me what are they?
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>>
>>>>
>>>> Leonid
>>>>
>>>> On 11/15/19 3:26 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> runtime/NMT/HugeArenaTracking.java is a new test added by the 
>>>>> following fix:
>>>>>
>>>>>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8204128
>>>>>
>>>>> The test is failing in the JDK-14 CI on the Win* platforms. That 
>>>>> failure is
>>>>> tracked by:
>>>>>
>>>>>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) 
>>>>>     failed: deallocation > allocated
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8234270
>>>>>
>>>>> To keep the noise in the CI down over the weekend, I'm putting the 
>>>>> test
>>>>> on the ProblemList for Win* using this bug:
>>>>>
>>>>>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8234272
>>>>>
>>>>> Here's the trivial diff:
>>>>>
>>>>> $ hg diff
>>>>> diff -r 8e7f29b1ad4a test/hotspot/jtreg/ProblemList.txt
>>>>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 14:22:24 2019 -0800
>>>>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 18:22:00 2019 -0500
>>>>> @@ -90,6 +90,7 @@
>>>>>  # :hotspot_runtime
>>>>>
>>>>>  runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64
>>>>> +runtime/nmt/HugeArenaTracking.java 8234270 windows-x64
>>>>>  runtime/ReservedStack/ReservedStackTest.java 8231031 generic-all
>>>>>
>>>>>  #############################################################################
>>>>>
>>>>>
>>>>>
>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>
>>>
>>
>

From zgu at redhat.com  Tue Nov 19 21:40:23 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 19 Nov 2019 16:40:23 -0500
Subject: RFR(T): 8234272 ProblemList runtime/NMT/HugeArenaTracking.java
In-Reply-To: <898546a1-bec9-b852-2c73-26e741465bf2@oracle.com>
References: <115bf1e2-8f57-ebd3-de5d-c891d20ac19c@oracle.com>
 <2c087e61-2f10-8707-7b2f-4071b8b86582@oracle.com>
 <66dde824-a453-f3de-a9f6-0781065f882b@redhat.com>
 <667049ec-9c07-996a-cdd7-7af1cbbbfe57@oracle.com>
 <cb5a98f0-82fe-9306-38f4-ffbc7b81a732@redhat.com>
 <898546a1-bec9-b852-2c73-26e741465bf2@oracle.com>
Message-ID: <3250ea78-ccfe-7f03-140d-a9de5445d29d@redhat.com>

Thanks for verifying, Leonid.

-Zhengyu
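
As background for the Windows-only failure described in the quoted patch note (a `long` that should have been `ssize_t`), here is a minimal sketch of why a 4-byte signed type behaves differently from an 8-byte one. The helper delta_fits() and the 3 GB figure are illustrative assumptions, not taken from the patch or the test.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: can a size delta of `bytes` be stored in the signed
// type Delta without overflow? int32_t models `long` on 64-bit Windows,
// int64_t models `long` (and ssize_t) on other 64-bit platforms.
template <typename Delta>
bool delta_fits(std::uint64_t bytes) {
  return bytes <= static_cast<std::uint64_t>(std::numeric_limits<Delta>::max());
}

// Illustrative size only: anything above 2^31 - 1 bytes shows the difference.
const std::uint64_t kThreeGigabytes = 3ULL * 1024 * 1024 * 1024;
```

With these assumptions, delta_fits<std::int32_t>(kThreeGigabytes) is false while delta_fits<std::int64_t>(kThreeGigabytes) is true, which matches the observation that only the platform with a 4-byte `long` fails.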

On 11/19/19 4:30 PM, Leonid Mesnik wrote:
> Hi
> 
> New patch works fine.
> 
> Leonid
> 
> On 11/17/19 7:31 AM, Zhengyu Gu wrote:
>> Hi,
>>
>> Could you test this patch? 
>> http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/index.html
>>
>>
>> I was able to reproduce original bug with 
>> compiler/codecache/stress/UnexpectedDeoptimizationTest.java + NMT, 
>> newly added assertion caught the problem.
>>
>> I also fixed the new test failure on Windows. JDK-8204128 patch missed 
>> one long -> ssize_t change. long on Windows is 4-bytes vs 8-bytes on 
>> other 64-bits platforms, that explains why it only fails on Windows.
>>
>> This patch passed submit test.
>>
>> [Mach5] mach5-one-zgu-JDK-8234270-1-20191117-1425-6784823: PASSED
>>
>> Thanks,
>>
>> -Zhengyu
>>
>>
>> On 11/16/19 1:42 PM, Leonid Mesnik wrote:
>>> Hi
>>>
>>> Unfortunately, I don't know any publicly available test which could 
>>> reproduce this issue right now.
>>>
>>> We also run some jdk/hotspot regression tests with NMT enabled, 
>>> however I am not sure if they could be used as reproducers.
>>>
>>> Leonid
>>>
>>> On 11/16/19 6:04 AM, Zhengyu Gu wrote:
>>>> Hi Leonid,
>>>>
>>>> On 11/15/19 6:43 PM, Leonid Mesnik wrote:
>>>>> It would be better to backout fix.
>>>>>
>>>>> Other tests executed with NMT triggered this assertion also. So we 
>>>>> are going to have a lot of assertions.
>>>>
>>>> Are any of these "other tests" publicly available? If so, could you 
>>>> point me what are they?
>>>>
>>>> Thanks,
>>>>
>>>> -Zhengyu
>>>>
>>>>>
>>>>> Leonid
>>>>>
>>>>> On 11/15/19 3:26 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> runtime/NMT/HugeArenaTracking.java is a new test added by the 
>>>>>> following fix:
>>>>>>
>>>>>>     JDK-8204128 NMT might report incorrect numbers for Compiler area
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8204128
>>>>>>
>>>>>> The test is failing in the JDK-14 CI on the Win* platforms. That 
>>>>>> failure is
>>>>>> tracked by:
>>>>>>
>>>>>>     JDK-8234270 HugeArenaTracking.java failed assert(_size >= sz) 
>>>>>>     failed: deallocation > allocated
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8234270
>>>>>>
>>>>>> To keep the noise in the CI down over the weekend, I'm putting the 
>>>>>> test
>>>>>> on the ProblemList for Win* using this bug:
>>>>>>
>>>>>>     JDK-8234272 ProblemList runtime/NMT/HugeArenaTracking.java
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8234272
>>>>>>
>>>>>> Here's the trivial diff:
>>>>>>
>>>>>> $ hg diff
>>>>>> diff -r 8e7f29b1ad4a test/hotspot/jtreg/ProblemList.txt
>>>>>> --- a/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 14:22:24 2019 -0800
>>>>>> +++ b/test/hotspot/jtreg/ProblemList.txt    Fri Nov 15 18:22:00 2019 -0500
>>>>>> @@ -90,6 +90,7 @@
>>>>>>  # :hotspot_runtime
>>>>>>
>>>>>>  runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64
>>>>>> +runtime/nmt/HugeArenaTracking.java 8234270 windows-x64
>>>>>>  runtime/ReservedStack/ReservedStackTest.java 8231031 generic-all
>>>>>>
>>>>>>  #############################################################################
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>
>>>>
>>>
>>
> 


From daniel.daugherty at oracle.com  Tue Nov 19 23:21:45 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 19 Nov 2019 18:21:45 -0500
Subject: RFR(S/T): 8230876: baseline cleanups from Async Monitor Deflation
 v2.0[789]
In-Reply-To: <ec005696-f447-9fc8-4ffd-5957b27fc40f@oracle.com>
References: <420b2d84-82a2-f782-7dcc-175472bf6a5a@oracle.com>
 <44efa18f-5524-8465-fb0c-cab41a1569af@oracle.com>
 <ec005696-f447-9fc8-4ffd-5957b27fc40f@oracle.com>
Message-ID: <bcb08366-a17c-2c82-785e-886015c27228@oracle.com>

So I changed the _recursions field from 'intptr_t' -> 'intx' and then
I grepped around for uses of the field. One thread on the sweater led
to another... It's still a pretty trivial change... especially if you
look at the Udiff links...
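
For readers following the format discussion quoted below, here is a minimal sketch of the decimal versus hex printing difference and of why 'intptr_t' -> 'intx' is a semantic no-op. DEMO_INTX_FORMAT, DEMO_INTPTR_FORMAT and format_recursions() are hypothetical stand-ins; the real INTX_FORMAT and INTPTR_FORMAT macros are defined in HotSpot's own headers and may differ in detail.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

typedef intptr_t intx;  // the typedef quoted in the thread

// Hypothetical stand-ins for the HotSpot format macros.
#define DEMO_INTX_FORMAT   "%" PRIdPTR      // decimal, suits a counter
#define DEMO_INTPTR_FORMAT "0x%" PRIxPTR    // hex, suits an address

// Formats a recursion count either as decimal or as hex.
std::string format_recursions(intx recursions, bool hex) {
  char buf[64];
  if (hex) {
    std::snprintf(buf, sizeof(buf), DEMO_INTPTR_FORMAT,
                  static_cast<uintptr_t>(recursions));
  } else {
    std::snprintf(buf, sizeof(buf), DEMO_INTX_FORMAT, recursions);
  }
  return std::string(buf);
}
```

A count of 26 reads naturally as "26" in the decimal form and as "0x1a" in the hex form, which is why a counter field is better served by the decimal macro.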

Incremental webrev:

http://cr.openjdk.java.net/~dcubed/8230876-webrev/1-for-jdk14.inc/

Full webrev:

http://cr.openjdk.java.net/~dcubed/8230876-webrev/1-for-jdk14.full/

I'm rerunning Mach5 Tier[1-3] just to be sure...

Dan


On 11/19/19 9:41 AM, Daniel D. Daugherty wrote:
> Hi David,
>
> Thanks for the quick review!
>
> On 11/18/19 9:23 PM, David Holmes wrote:
>> Hi Dan,
>>
>> Given:
>>
>>  volatile intptr_t _recursions;
>>
>> The change to the print statements to use INTX_FORMAT instead of the 
>> existing INTPTR_FORMAT seems a little odd - but obviously you don't 
>> want it printed in hex.
>
> Yup. I first noticed it when I wrote and tested my initial version of
> ObjectMonitor::print_debug_style_on() and then I realized that it was all
> over with other prints of that field.
>
>
>> That seems fine, but can we then make the simple change to redefine 
>> _recursions as intx as well - which is a semantic no-op given:
>>
>> typedef intptr_t  intx;
>
> No problem. I'll make that change before I push. Not sure why that
> didn't occur to me before.
>
>
>> Otherwise all seems okay.
>
> Thanks! Now, if we can just get a quick review of macroAssembler_x86.cpp
> from Vladimir K... :-)
>
> Dan
>
>
>>
>> Thanks,
>> David
>>
>> On 19/11/2019 8:23 am, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I have another round of baseline cleanup changes from the Async Monitor
>>> Deflation project (8153224). Like previous sub-tasks of 8153224, these
>>> changes are small and/or trivial. These changes have previously been
>>> reviewed as a (very) small part of 8153224 (CR8/v2.08/11-for-jdk14).
>>>
>>> Vladimir K., if you could sanity check the cleanups in 
>>> macroAssembler_x86.cpp
>>> that would be appreciated (only comments were changed). I recommend the
>>> Udiff view...
>>>
>>> Please see the bug for details about the changes in this webrev:
>>>
>>>      JDK-8230876 baseline cleanups from Async Monitor Deflation 
>>>      v2.0[789]
>>>      https://bugs.openjdk.java.net/browse/JDK-8230876
>>>
>>> Here's the webrev URL:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8230876-webrev/0-for-jdk14/
>>>
>>> These changes have been included in my recent rounds of Mach5 Tier[1-8]
>>> and other associated stress and/or performance testing. I have also 
>>> done
>>> a Mach5 Tier[1-3] run with just this patch to make sure that I got all
>>> the pieces that are needed.
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Dan
>
>


From david.holmes at oracle.com  Tue Nov 19 23:23:58 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 20 Nov 2019 09:23:58 +1000
Subject: RFR(S/T): 8230876: baseline cleanups from Async Monitor Deflation
 v2.0[789]
In-Reply-To: <bcb08366-a17c-2c82-785e-886015c27228@oracle.com>
References: <420b2d84-82a2-f782-7dcc-175472bf6a5a@oracle.com>
 <44efa18f-5524-8465-fb0c-cab41a1569af@oracle.com>
 <ec005696-f447-9fc8-4ffd-5957b27fc40f@oracle.com>
 <bcb08366-a17c-2c82-785e-886015c27228@oracle.com>
Message-ID: <055fd45b-727b-0ade-89c7-3a6f5b253f86@oracle.com>

On 20/11/2019 9:21 am, Daniel D. Daugherty wrote:
> So I changed the _recursions field from 'intptr_t' -> 'intx' and then
> I grepped around for uses of the field. One thread on the sweater led
> to another... It's still a pretty trivial change... especially if you
> look at the Udiff links...
> 
> Incremental webrev:
> 
> http://cr.openjdk.java.net/~dcubed/8230876-webrev/1-for-jdk14.inc/

That all looks fine to me.

Thanks,
David
-----

> Full webrev:
> 
> http://cr.openjdk.java.net/~dcubed/8230876-webrev/1-for-jdk14.full/
> 
> I'm rerunning Mach5 Tier[1-3] just to be sure...
> 
> Dan
> 
> 
> On 11/19/19 9:41 AM, Daniel D. Daugherty wrote:
>> Hi David,
>>
>> Thanks for the quick review!
>>
>> On 11/18/19 9:23 PM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> Given:
>>>
>>>  volatile intptr_t _recursions;
>>>
>>> The change to the print statements to use INTX_FORMAT instead of the 
>>> existing INTPTR_FORMAT seems a little odd - but obviously you don't 
>>> want it printed in hex.
>>
>> Yup. I first noticed it when I wrote and tested my initial version of
>> ObjectMonitor::print_debug_style_on() and then I realized that it was all
>> over with other prints of that field.
>>
>>
>>> That seems fine, but can we then make the simple change to redefine 
>>> _recursions as intx as well - which is a semantic no-op given:
>>>
>>> typedef intptr_t  intx;
>>
>> No problem. I'll make that change before I push. Not sure why that
>> didn't occur to me before.
>>
>>
>>> Otherwise all seems okay.
>>
>> Thanks! Now, if we can just get a quick review of macroAssembler_x86.cpp
>> from Vladimir K... :-)
>>
>> Dan
>>
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 19/11/2019 8:23 am, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I have another round of baseline cleanup changes from the Async Monitor
>>>> Deflation project (8153224). Like previous sub-tasks of 8153224, these
>>>> changes are small and/or trivial. These changes have previously been
>>>> reviewed as a (very) small part of 8153224 (CR8/v2.08/11-for-jdk14).
>>>>
>>>> Vladimir K., if you could sanity check the cleanups in 
>>>> macroAssembler_x86.cpp
>>>> that would be appreciated (only comments were changed). I recommend the
>>>> Udiff view...
>>>>
>>>> Please see the bug for details about the changes in this webrev:
>>>>
>>>>      JDK-8230876 baseline cleanups from Async Monitor Deflation 
>>>>      v2.0[789]
>>>>      https://bugs.openjdk.java.net/browse/JDK-8230876
>>>>
>>>> Here's the webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8230876-webrev/0-for-jdk14/
>>>>
>>>> These changes have been included in my recent rounds of Mach5 Tier[1-8]
>>>> and other associated stress and/or performance testing. I have also 
>>>> done
>>>> a Mach5 Tier[1-3] run with just this patch to make sure that I got all
>>>> the pieces that are needed.
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>
>>
> 

From daniel.daugherty at oracle.com  Tue Nov 19 23:25:34 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 19 Nov 2019 18:25:34 -0500
Subject: RFR(S/T): 8230876: baseline cleanups from Async Monitor Deflation
 v2.0[789]
In-Reply-To: <055fd45b-727b-0ade-89c7-3a6f5b253f86@oracle.com>
References: <420b2d84-82a2-f782-7dcc-175472bf6a5a@oracle.com>
 <44efa18f-5524-8465-fb0c-cab41a1569af@oracle.com>
 <ec005696-f447-9fc8-4ffd-5957b27fc40f@oracle.com>
 <bcb08366-a17c-2c82-785e-886015c27228@oracle.com>
 <055fd45b-727b-0ade-89c7-3a6f5b253f86@oracle.com>
Message-ID: <9074fd3f-1273-38a4-119c-3b66d736cda4@oracle.com>

On 11/19/19 6:23 PM, David Holmes wrote:
> On 20/11/2019 9:21 am, Daniel D. Daugherty wrote:
>> So I changed the _recursions field from 'intptr_t' -> 'intx' and then
>> I grepped around for uses of the field. One thread on the sweater led
>> to another... It's still a pretty trivial change... especially if you
>> look at the Udiff links...
>>
>> Incremental webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8230876-webrev/1-for-jdk14.inc/
>
> That all looks fine to me.

Thanks for the very fast re-review!

Dan


>
> Thanks,
> David
> -----
>
>> Full webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8230876-webrev/1-for-jdk14.full/
>>
>> I'm rerunning Mach5 Tier[1-3] just to be sure...
>>
>> Dan
>>
>>
>> On 11/19/19 9:41 AM, Daniel D. Daugherty wrote:
>>> Hi David,
>>>
>>> Thanks for the quick review!
>>>
>>> On 11/18/19 9:23 PM, David Holmes wrote:
>>>> Hi Dan,
>>>>
>>>> Given:
>>>>
>>>>  volatile intptr_t _recursions;
>>>>
>>>> The change to the print statements to use INTX_FORMAT instead of 
>>>> the existing INTPTR_FORMAT seems a little odd - but obviously you 
>>>> don't want it printed in hex.
>>>
>>> Yup. I first noticed it when I wrote and tested my initial version of
>>> ObjectMonitor::print_debug_style_on() and then I realized that it 
>>> was all
>>> over with other prints of that field.
>>>
>>>
>>>> That seems fine, but can we then make the simple change to redefine 
>>>> _recursions as intx as well - which is a semantic no-op given:
>>>>
>>>> typedef intptr_t  intx;
>>>
>>> No problem. I'll make that change before I push. Not sure why that
>>> didn't occur to me before.
>>>
>>>
>>>> Otherwise all seems okay.
>>>
>>> Thanks! Now, if we can just get a quick review of 
>>> macroAssembler_x86.cpp
>>> from Vladimir K... :-)
>>>
>>> Dan
>>>
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 19/11/2019 8:23 am, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I have another round of baseline cleanup changes from the Async 
>>>>> Monitor
>>>>> Deflation project (8153224). Like previous sub-tasks of 8153224, 
>>>>> these
>>>>> changes are small and/or trivial. These changes have previously been
>>>>> reviewed as a (very) small part of 8153224 (CR8/v2.08/11-for-jdk14).
>>>>>
>>>>> Vladimir K., if you could sanity check the cleanups in 
>>>>> macroAssembler_x86.cpp
>>>>> that would be appreciated (only comments were changed). I 
>>>>> recommend the
>>>>> Udiff view...
>>>>>
>>>>> Please see the bug for details about the changes in this webrev:
>>>>>
>>>>>      JDK-8230876 baseline cleanups from Async Monitor Deflation 
>>>>>      v2.0[789]
>>>>>      https://bugs.openjdk.java.net/browse/JDK-8230876
>>>>>
>>>>> Here's the webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8230876-webrev/0-for-jdk14/
>>>>>
>>>>> These changes have been included in my recent rounds of Mach5 
>>>>> Tier[1-8]
>>>>> and other associated stress and/or performance testing. I have 
>>>>> also done
>>>>> a Mach5 Tier[1-3] run with just this patch to make sure that I got 
>>>>> all
>>>>> the pieces that are needed.
>>>>>
>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>
>>>>> Dan
>>>
>>>
>>


From serguei.spitsyn at oracle.com  Wed Nov 20 02:20:32 2019
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 19 Nov 2019 18:20:32 -0800
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
Message-ID: <b814ad82-508b-8eec-6b6e-3792e23dd72c@oracle.com>

Hi Christoph,

The fix looks good to me.
I'd recommend running the jdk_instrument and vmTestbase_nsk_jvmti tests.
To be safe, you could also run:
   open/test/hotspot/jtreg/serviceability/jvmti
   open/test/hotspot/jtreg/runtime/cds/appcds
   open/test/hotspot/jtreg/runtime/BootClassAppendProp

Thanks,
Serguei

On 11/14/19 07:37, Langer, Christoph wrote:
> Hi,
>
> please review this cleanup change regarding function "canonicalize" of libjava.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
>
>
> The goal is to cleanup how this function is defined and used. One thing is, that there was an unnecessary wrapper function "Canonicalize" in jni_util.c. It wrapped the call to "canonicalize". We can get rid of this wrapper. Unfortunately, it is not possible to just export "canonicalize" since this will conflict with a method signature from the math library, at least on modern Linuxes. So I decided to call the method JDK_Canonicalize and will correctly define it in jdk_util.h which can be included everywhere.
>
>
>
> Hotspot's classloader.cpp will dynamically resolve this method, so I add a local declaration of the function pointer in there.
>
>
>
> This change shall be predecessor of JDK-8223261, where a review was already started here: https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-November/063398.html
>
> Thanks
> Christoph
>


From david.holmes at oracle.com  Wed Nov 20 03:55:14 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 20 Nov 2019 13:55:14 +1000
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <d263d1ec-ee83-c1bf-c945-76089e5ca865@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <dfff2121-2f6b-7407-812e-7608e145af8c@oracle.com>
 <d263d1ec-ee83-c1bf-c945-76089e5ca865@oracle.com>
Message-ID: <9b2c1c62-0c8c-e603-361d-1f57c681f308@oracle.com>

Hi Robbin,

Meta-comment: I found it difficult to identify exactly what was being 
removed/simplified. Please update the bug report with basic information 
on what you've actually done here: always stack allocated; no concurrent 
ops; no async ops.

Meta-comment2: I would have thought that concurrent handshake ops might 
be useful/desirable - though I assume your aim is to make such ops 
actually bypass the VMThread altogether in the future?

On 19/11/2019 11:51 pm, Robbin Ehn wrote:
> Hi David, thanks for having a look.
> 
> On 11/19/19 2:27 PM, David Holmes wrote:
>> Hi Robbin,
>>
>> On 19/11/2019 9:05 pm, Robbin Ehn wrote:
>>> Hi all, please review.
>>
>> Preliminary comments ... so ...
>>
>> +class VM_Operation : public StackObj {
>>    public:
>>     enum Mode {
>>       _safepoint,       // blocking,        safepoint, vm_op C-heap allocated
>> -    _no_safepoint,    // blocking,     no safepoint, vm_op C-Heap allocated
>> -    _concurrent,      // non-blocking, no safepoint, vm_op C-Heap allocated
>> -    _async_safepoint  // non-blocking,    safepoint, vm_op C-Heap allocated
>> +    _no_safepoint    // blocking,     no safepoint, vm_op C-Heap allocated
>>     };
>>
>>
>> You are basically getting rid of concurrent and async_safepoint VM op 
>> capability. Okay. But you're also making all VM ops StackObj so all 
>> those "VM op C-heap allocated" comments are no longer correct. Also 
>> many of the comments around the VM ops you have changed from async to 
>> sync, and from C-heap to stackobj, are also no longer correct.
>>
>> Please update.
> 
> Not sure if I found all:
> http://cr.openjdk.java.net/~rehn/8234086/v2/inc/webrev/index.html
> http://cr.openjdk.java.net/~rehn/8234086/v2/full/webrev/index.html

As per Kim's email, no, you didn't spot them all :)

Plus:

src/hotspot/share/runtime/thread.cpp

  // Enqueue a VM_Operation to do the job for us - sometime later

The "sometime later" is no longer applicable as it isn't async.

---

I'm really not seeing what the 
ObjectSynchronizer::force_monitor_scavage() related changes are all 
about. They don't seem to be part of the simplification but seem to be a 
separate issue. ??

---

src/hotspot/share/runtime/vmOperations.cpp/hpp

If there are only safepoint and no-safepoint ops now then we don't need 
a Mode enum just a virtual evaluate_at_safepoint() query.
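
The suggestion above could be sketched roughly as follows. DemoOperation and its subclasses are hypothetical, simplified classes used only to show a virtual query replacing a two-valued enum; they are not the actual HotSpot declarations.

```cpp
// Hypothetical, simplified operation hierarchy (not the HotSpot classes).
class DemoOperation {
 public:
  virtual ~DemoOperation() {}
  // With only two kinds of operation left, one virtual query is enough;
  // the default here is "run at a safepoint".
  virtual bool evaluate_at_safepoint() const { return true; }
};

// Inherits the default: evaluated at a safepoint.
class DemoSafepointOp : public DemoOperation {};

// Overrides the query: evaluated without a safepoint.
class DemoNoSafepointOp : public DemoOperation {
 public:
  bool evaluate_at_safepoint() const override { return false; }
};

// What a dispatcher would ask instead of switching on a Mode value.
bool needs_safepoint(const DemoOperation& op) {
  return op.evaluate_at_safepoint();
}
```

The design point is that each operation type states its own requirement, so the caller has no enum to keep in sync with the set of operations.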

Not sure how adding VM_QueueHead simplifies anything ??

The real simplification with this would come with getting rid of 
priorities :)

Minor nit: If there is no queue_add_front then queue_add_back could just 
be queue_add.

-         if (timedout) {
+         {

Not at all clear why the forced safepoint code is now unconditional. I 
see we check the interval more explicitly in VMThread::no_op_safepoint, 
but it isn't clear that we always want to waste time unlocking and 
relocking the lock unnecessarily.

The timedout local variable is also now unused.

  648     // only blocking VM operations need to verify the caller's 
safepoint state:

All ops are now blocking.

Thanks,
David


> Thanks, Robbin
> 
>>
>> Thanks,
>> David
>> -----
>>
>>> CMS was the last real user of the more advanced features of VM 
>>> operation.
>>> VM operation can be simplified to always be a stack object and thus 
>>> either be
>>> of safepoint or no safepoint type.
>>>
>>> VM_EnableBiasedLocking is executed once by watcher thread, if needed 
>>> (default not used). Making it synchronous doesn't matter.
>>> VM_ThreadStop is executed by a JavaThread, that thread should stop 
>>> for the safepoint anyways, no real point in not stopping directly.
>>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, the 
>>> VM op is not needed. Arguably this thread should actually stop here, 
>>> since we are about to safepoint.
>>>
>>> There is also a small cleanup in vmThread.cpp where an unused method 
>>> is removed.
>>> And the extra safepoint is removed:
>>> "// We want to make sure that we get to a safepoint regularly"
>>> No we don't :)
>>>
>>> Issue:
>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>> Change-set:
>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>
>>> Tested scavenge manually, passes t1-2.
>>>
>>> Thanks, Robbin

From boris.ulasevich at bell-sw.com  Wed Nov 20 06:32:49 2019
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Wed, 20 Nov 2019 09:32:49 +0300
Subject: RFR(S): 8233113: ARM32: assert on UnsafeJlong mutex rank check
In-Reply-To: <77cf0a9b-2fa2-db56-5db2-bc998373d052@oracle.com>
References: <5124def3-3bf7-8425-557d-c6cba6192927@bell-sw.com>
 <df1adebe-31d5-8cc2-d8c7-a22e2397c7b9@oracle.com>
 <5d0e1ad7-0ffa-54a1-fbc7-23aecc367c0c@bell-sw.com>
 <77cf0a9b-2fa2-db56-5db2-bc998373d052@oracle.com>
Message-ID: <c3a431ff-469f-246b-b789-420708f54f57@bell-sw.com>

Thank you!

On 19.11.2019 6:42, coleen.phillimore at oracle.com wrote:
> 
> This looks good! Thank you for fixing it.
> Coleen
> 
> On 11/18/19 9:35 AM, Boris Ulasevich wrote:
>> David, thank you!
>>
>> Dear all,
>>
>>   Can anybody else take a look at the review please?
>>   Or should I consider the change trivial?
>>
>> thanks,
>> Boris
>>
>> On 11.11.2019 13:56, David Holmes wrote:
>>> Hi Boris,
>>>
>>> This seems fine to me.
>>>
>>> Thanks,
>>> David
>>>
>>> On 8/11/2019 11:28 pm, Boris Ulasevich wrote:
>>>> Hi,
>>>>
>>>> Recent JDK-8184732 change adds the assertion that fires on 
>>>> UnsafeJlong mutex rank check, on platforms without 64 bit atomics 
>>>> compare-and-exchange support. On preliminary review (thanks to 
>>>> Coleen and David!) it is suggested to remove the assertion and 
>>>> corresponding test codes.
>>>>
>>>> http://bugs.openjdk.java.net/browse/JDK-8233113
>>>> http://cr.openjdk.java.net/~bulasevich/8233113/webrev.01
>>>>
>>>> Thanks,
>>>> Boris
> 

From christoph.langer at sap.com  Wed Nov 20 07:47:36 2019
From: christoph.langer at sap.com (Langer, Christoph)
Date: Wed, 20 Nov 2019 07:47:36 +0000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <b814ad82-508b-8eec-6b6e-3792e23dd72c@oracle.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <b814ad82-508b-8eec-6b6e-3792e23dd72c@oracle.com>
Message-ID: <AM6PR02MB4801096CFF1BB307533C11F78A4F0@AM6PR02MB4801.eurprd02.prod.outlook.com>

Hi Serguei,

thanks for the review.

The patch successfully ran through our nightly test system which covers all these tests on several platforms.

Best regards
Christoph

> -----Original Message-----
> From: serguei.spitsyn at oracle.com <serguei.spitsyn at oracle.com>
> Sent: Mittwoch, 20. November 2019 03:21
> To: Langer, Christoph <christoph.langer at sap.com>; core-libs-
> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; gerard
> ziemski <gerard.ziemski at oracle.com>
> Subject: Re: RFR: 8234185: Cleanup usage of canonicalize function between
> libjava, hotspot and libinstrument
> 
> Hi Christoph,
> 
> The fix looks good to me.
> I'd recommend to run the jdk_instrument and vmTestbase_nsk_jvmti tests.
> Also, it can be safe to run:
>    open/test/hotspot/jtreg/serviceability/jvmti
>    open/test/hotspot/jtreg/runtime/cds/appcds
>    open/test/hotspot/jtreg/runtime/BootClassAppendProp
> 
> Thanks,
> Serguei
> 
> On 11/14/19 07:37, Langer, Christoph wrote:
> > Hi,
> >
> > please review this cleanup change regarding function "canonicalize" of
> libjava.
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
> > Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
> >
> >
> > The goal is to cleanup how this function is defined and used. One thing is,
> that there was an unnecessary wrapper function "Canonicalize" in jni_util.c.
> It wrapped the call to "canonicalize". We can get rid of this wrapper.
> Unfortunately, it is not possible to just export "canonicalize" since this will
> conflict with a method signature from the math library, at least on modern
> Linuxes. So I decided to call the method JDK_Canonicalize and will correctly
> define it in jdk_util.h which can be included everywhere.
> >
> >
> >
> > Hotspot's classloader.cpp will dynamically resolve this method, so I add a
> local declaration of the function pointer in there.
> >
> >
> >
> > This change shall be predecessor of JDK-8223261, where a review was
> already started here: https://mail.openjdk.java.net/pipermail/core-libs-
> dev/2019-November/063398.html
> >
> > Thanks
> > Christoph
> >


From serguei.spitsyn at oracle.com  Wed Nov 20 07:58:21 2019
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 19 Nov 2019 23:58:21 -0800
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <AM6PR02MB4801096CFF1BB307533C11F78A4F0@AM6PR02MB4801.eurprd02.prod.outlook.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <b814ad82-508b-8eec-6b6e-3792e23dd72c@oracle.com>
 <AM6PR02MB4801096CFF1BB307533C11F78A4F0@AM6PR02MB4801.eurprd02.prod.outlook.com>
Message-ID: <17b627f8-c1ff-34c0-d21a-ef1cf2cede87@oracle.com>

Thanks, Christoph!
I forgot to mention that my recommendation is for the serviceability 
(j.l.instrument) coverage only.

Thanks,
Serguei

On 11/19/19 23:47, Langer, Christoph wrote:
> Hi Serguei,
>
> thanks for the review.
>
> The patch successfully ran through our nightly test system which covers all these tests on several platforms.
>
> Best regards
> Christoph
>
>> -----Original Message-----
>> From: serguei.spitsyn at oracle.com <serguei.spitsyn at oracle.com>
>> Sent: Mittwoch, 20. November 2019 03:21
>> To: Langer, Christoph <christoph.langer at sap.com>; core-libs-
>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; gerard
>> ziemski <gerard.ziemski at oracle.com>
>> Subject: Re: RFR: 8234185: Cleanup usage of canonicalize function between
>> libjava, hotspot and libinstrument
>>
>> Hi Christoph,
>>
>> The fix looks good to me.
>> I'd recommend to run the jdk_instrument and vmTestbase_nsk_jvmti tests.
>> Also, it can be safe to run:
>>     open/test/hotspot/jtreg/serviceability/jvmti
>>     open/test/hotspot/jtreg/runtime/cds/appcds
>>     open/test/hotspot/jtreg/runtime/BootClassAppendProp
>>
>> Thanks,
>> Serguei
>>
>> On 11/14/19 07:37, Langer, Christoph wrote:
>>> Hi,
>>>
>>> please review this cleanup change regarding function "canonicalize" of
>> libjava.
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
>>> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
>>>
>>>
>>> The goal is to cleanup how this function is defined and used. One thing is,
>> that there was an unnecessary wrapper function "Canonicalize" in jni_util.c.
>> It wrapped the call to "canonicalize". We can get rid of this wrapper.
>> Unfortunately, it is not possible to just export "canonicalize" since this will
>> conflict with a method signature from the math library, at least on modern
>> Linuxes. So I decided to call the method JDK_Canonicalize and will correctly
>> define it in jdk_util.h which can be included everywhere.
>>>
>>>
>>> Hotspot's classloader.cpp will dynamically resolve this method, so I add a
>> local declaration of the function pointer in there.
>>>
>>>
>>> This change shall be predecessor of JDK-8223261, where a review was
>> already started here: https://mail.openjdk.java.net/pipermail/core-libs-
>> dev/2019-November/063398.html
>>> Thanks
>>> Christoph
>>>


From robbin.ehn at oracle.com  Wed Nov 20 09:38:50 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 20 Nov 2019 10:38:50 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <29CF8C53-D058-4527-8866-1B1D4DB9998A@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <dfff2121-2f6b-7407-812e-7608e145af8c@oracle.com>
 <d263d1ec-ee83-c1bf-c945-76089e5ca865@oracle.com>
 <29CF8C53-D058-4527-8866-1B1D4DB9998A@oracle.com>
Message-ID: <307b86df-eef7-66ba-993c-be96c8eb90ec@oracle.com>

Hi Kim, thanks for looking at this.

On 11/19/19 4:55 PM, Kim Barrett wrote:
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/biasedLocking.cpp
>    83     // Use async VM operation to avoid blocking the Watcher thread.
>    84     // VM Thread will free C heap storage.
> 
> These comments appear to be obsolete.

Thanks, removed.

> 
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/safepoint.cpp
>   535 bool SafepointSynchronize::is_forced_cleanup_needed() {
>   536   return ObjectSynchronizer::force_monitor_scavage();
>   537 }
> 
> src/hotspot/share/runtime/synchronizer.cpp
>   917   return force_monitor_scavage();
> 
> src/hotspot/share/runtime/synchronizer.cpp
>   920 bool ObjectSynchronizer::force_monitor_scavage() {
> 
> force_monitor_scavage typo in "scavage".
> 
> And that name seems like it has side effects (forces something), not a
> predicate. It's testing whether ForceMonitorScavenge is true, but I
> think a better name is needed.

Changed name.

> 
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/synchronizer.cpp
> 1006   if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, &ForceMonitorScavenge) == 0) {
> 1007     VMThread::check_cleanup();
> 
> I was going to make a couple nit-pick comments here, but this seems to
> be soon to be dead code (only reached when deprecated MonitorBound has
> a non-default value).
> 
> Not sure it's worth changing this and adding check_cleanup just for
> use here though, given this code is going away soon. Are there planned
> future uses of check_cleanup?

No future plans, no.
Are you suggesting doing a synchronous safepoint here instead?
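
For readers following the quoted line 1006: it is a test-and-test-and-set.
A plain load filters out the common case, and only a thread that still sees 0
attempts the exchange, so at most one caller triggers the cleanup per cycle.
A minimal sketch with std::atomic rather than HotSpot's Atomic class;
cleanup_flag, request_cleanup() and cleanup_done() are hypothetical names,
and the point at which the real flag is reset is an assumption here:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for ForceMonitorScavenge (0 = idle, 1 = requested).
static std::atomic<int> cleanup_flag(0);
// Counts how many callers actually won the right to trigger the cleanup.
static std::atomic<int> cleanup_triggers(0);

// Returns true only for the one caller that flips the flag from 0 to 1.
inline bool request_cleanup() {
  // Cheap read first; the exchange is attempted only if the flag looks clear.
  if (cleanup_flag.load() == 0 && cleanup_flag.exchange(1) == 0) {
    ++cleanup_triggers;   // the real code would notify the VM thread here
    return true;
  }
  return false;
}

// Assumed to be called once the cleanup has run, re-arming the flag.
inline void cleanup_done() {
  cleanup_flag.store(0);
}
```

Callers that lose the race return false without blocking, which matches the
stated intent of not stopping the requesting thread.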

> 
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/vmOperations.hpp
> Removed:
>   156   virtual ~VM_Operation() {}
> 
> Simple removal may cause warnings with some flags that have been
> discussed being enabled. This change violates the "usual rule" of
> public and virtual or non-public for base class destructors. Of
> course, that rule (and any warnings that might try to check for it)
> doesn't account for non-heap-allocatable classes. And HotSpot is full
> of violations of that rule, so we're probably a long way from being
> able to turn on some of those kinds of warnings.
> 
> OTOH, leaving the destructor alone has very little cost.

Added back.

> 
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/vmOperations.hpp
>   190   bool evaluate_at_safepoint() const {
> 
> Nice to see this no longer being virtual, and the preceding big block
> caution comment being gone.
> 
> ------------------------------------------------------------------------------
> 

(sending v3 after I fixed David's comments)

Thanks, Robbin

From robbin.ehn at oracle.com  Wed Nov 20 12:57:21 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 20 Nov 2019 13:57:21 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <9b2c1c62-0c8c-e603-361d-1f57c681f308@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <dfff2121-2f6b-7407-812e-7608e145af8c@oracle.com>
 <d263d1ec-ee83-c1bf-c945-76089e5ca865@oracle.com>
 <9b2c1c62-0c8c-e603-361d-1f57c681f308@oracle.com>
Message-ID: <9dd6c64a-e9d3-3677-1722-339bf092d0f1@oracle.com>

Hi David,

On 11/20/19 4:55 AM, David Holmes wrote:
> Hi Robbin,
> 
> Meta-comment: I found it difficult to identify exactly what was being 
> removed/simplified. Please update the bug report with basic information on what 
> you've actually done here: always stack allocated; no concurrent ops; no async ops.
> 
> Meta-comment2: I would have thought that concurrent handshake ops might be 
> useful/desirable - though I assume your aim is to make such ops actually bypass 
> the VMThread altogether in the future?

Yes, exactly.

> 
> On 19/11/2019 11:51 pm, Robbin Ehn wrote:
>> Hi David, thanks for having a look.
>>
>> On 11/19/19 2:27 PM, David Holmes wrote:
>>> Hi Robbin,
>>>
>>> On 19/11/2019 9:05 pm, Robbin Ehn wrote:
>>>> Hi all, please review.
>>>
>>> Preliminary comments ... so ...
>>>
>>> +class VM_Operation : public StackObj {
>>>    public:
>>>     enum Mode {
>>>       _safepoint,       // blocking,        safepoint, vm_op C-heap allocated
>>> -     _no_safepoint,    // blocking,     no safepoint, vm_op C-Heap allocated
>>> -     _concurrent,      // non-blocking, no safepoint, vm_op C-Heap allocated
>>> -     _async_safepoint  // non-blocking,    safepoint, vm_op C-Heap allocated
>>> +     _no_safepoint     // blocking,     no safepoint, vm_op C-Heap allocated
>>>     };
>>>
>>>
>>> You are basically getting rid of concurrent and async_safepoint VM op 
>>> capability. Okay. But you're also making all VM ops StackObj so all those "VM 
>>> op C-heap allocated" comments are no longer correct. Also many of the 
>>> comments around the VM ops you have changed from async to sync, and from 
>>> C-heap to stackobj, are also no longer correct.
>>>
>>> Please update.
>>
>> Not sure if I found all:
>> http://cr.openjdk.java.net/~rehn/8234086/v2/inc/webrev/index.html
>> http://cr.openjdk.java.net/~rehn/8234086/v2/full/webrev/index.html
> 
> As per Kim's email no you didn't spot them all :)
> 
> Plus:
> 
> src/hotspot/share/runtime/thread.cpp
> 
>   // Enqueue a VM_Operation to do the job for us - sometime later
> 
> The "sometime later" is no longer applicable as it isn't async.
> 

Thanks, removed!

> ---
> 
> I'm really not seeing what the ObjectSynchronizer::force_monitor_scavage() 
> related changes are all about. They don't seem to be part of the simplification 
> but seem to be a separate issue. ??

To keep the same behavior as an asynch safepoint I did this.
Kim also didn't like this, suggestions?
(I did not want to block the thread here)

> 
> ---
> 
> src/hotspot/share/runtime/vmOperations.cpp/hpp
> 
> If there are only safepoint and no-safepoint ops now then we don't need a Mode 
> enum just a virtual evaluate_at_safepoint() query.
> 
> Not sure how adding VM_QueueHead simplifies anything ??

The head was heap allocated.
By having the heads static instead we don't need to allocate them.
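
A minimal sketch of that idea, under stated assumptions: a circular doubly
linked queue whose head is a sentinel node stored by value, so creating the
queue allocates nothing. QueueNode and OpQueue are hypothetical names, not
the types from the webrev:

```cpp
#include <cassert>

// Hypothetical intrusive queue node; a real VM operation would embed these links.
struct QueueNode {
  QueueNode* next;
  QueueNode* prev;
  int id;
  explicit QueueNode(int i = -1) : next(this), prev(this), id(i) {}
};

class OpQueue {
  QueueNode _head;  // sentinel stored by value: never heap allocated, never evaluated
 public:
  OpQueue() : _head() {}
  OpQueue(const OpQueue&) = delete;             // copying would break the self links
  OpQueue& operator=(const OpQueue&) = delete;

  bool is_empty() const { return _head.next == &_head; }

  // Links n in just before the sentinel, i.e. at the back of the queue.
  void add(QueueNode* n) {
    n->prev = _head.prev;
    n->next = &_head;
    _head.prev->next = n;
    _head.prev = n;
  }

  // Unlinks and returns the oldest node, or nullptr if the queue is empty.
  QueueNode* remove_front() {
    if (is_empty()) return nullptr;
    QueueNode* n = _head.next;
    _head.next = n->next;
    n->next->prev = &_head;
    n->next = n->prev = n;
    return n;
  }
};

// Enqueues two stack-allocated nodes on a static queue and drains them in order.
inline int queue_order_demo() {
  static OpQueue queue;         // static storage, as suggested for the heads
  QueueNode a(1), b(2);
  queue.add(&a);
  queue.add(&b);
  int first = queue.remove_front()->id;
  int second = queue.remove_front()->id;
  return first * 10 + second;
}
```

Because the sentinel points at itself when the queue is empty, add and
remove need no null checks on the head.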

> 
> The real simplification with this would come with getting rid of priorities :)

I have a patch for that, but it is in a bigger change-set which also removes
oops_do, and thus the drain queue is not needed. But to actually do that change
sanely, I had to refactor a lot, which makes it a large change.
So I'm saving that for JDK 15...

> 
> Minor nit: If there is no queue_add_front then queue_add_back could just be 
> queue_add.

Fixed.

> 
> -         if (timedout) {
> +         {
> 
> Not at all clear why the forced safepoint code is now unconditional. I see we 
> check the interval more explicitly in VMThread::no_op_safepoint, but it isn't 
> clear that we always want to waste time unlocking and relocking the lock 
> unnecessarily.

Yes, thanks, this is not the code I intended.
Fixing...

> 
> The timedout local variable is also now unused.
> 
>   648     // only blocking VM operations need to verify the caller's safepoint 
> state:
> 
> All ops are now blocking.
> 

Thanks, David!

(sending v3 after testing to RFR mail)

/Robbin

> Thanks,
> David
> 
> 
>> Thanks, Robbin
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> CMS was the last real user of the more advanced features of VM operation.
>>>> VM operation can be simplified to always be a stack object and thus either be
>>>> of safepoint or no safepoint type.
>>>>
>>>> VM_EnableBiasedLocking is executed once by watcher thread, if needed 
>>>> (default not used). Making it synchronous doesn't matter.
>>>> VM_ThreadStop is executed by a JavaThread, that thread should stop for the 
>>>> safepoint anyways, no real point in not stopping directly.
>>>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, the VM op 
>>>> is not needed. Arguably this thread should actually stop here, since we are 
>>>> about to safepoint.
>>>>
>>>> There is also a small cleanup in vmThread.cpp where an unused method is 
>>>> removed.
>>>> And the extra safepoint is removed:
>>>> "// We want to make sure that we get to a safepoint regularly"
>>>> No we don't :)
>>>>
>>>> Issue:
>>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>>> Change-set:
>>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>>
>>>> Tested scavenge manually, passes t1-2.
>>>>
>>>> Thanks, Robbin

From zgu at redhat.com  Wed Nov 20 13:48:14 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 20 Nov 2019 08:48:14 -0500
Subject: RFR 8234270: [REDO] JDK-8204128 NMT might report incorrect numbers
 for Compiler area
Message-ID: <7b82c213-35e9-1aef-c3c4-06fa8dec0d13@redhat.com>

JDK-8204128 did not fix the original bug. But the new assertion helped to 
catch the problem, as it consistently failed in Oracle internal tests.

The root cause is that, when NMT biases a resource area to compiler, it 
did not adjust tracking data to reflect that. When the biased resource 
area is released, there is a possibility that its size is greater than 
the total size recorded, which underflows a size_t counter.

The JDK-8204128 patch also missed a long to ssize_t parameter type change, 
which resulted in a new test failure on Windows, because long is 4 bytes on 
Windows.
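
The underflow described above can be sketched as follows. This is only an
illustration under stated assumptions, not the NMT code: AreaCounter and its
methods are hypothetical names, and move_to() stands in for "adjust the
tracking data when an area is biased to another category".

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical per-category byte counter; not the HotSpot NMT class.
class AreaCounter {
  size_t _size;
 public:
  AreaCounter() : _size(0) {}
  size_t size() const { return _size; }
  void allocate(size_t sz) { _size += sz; }

  // Unchecked release: if sz > _size the unsigned subtraction wraps around.
  void release_unchecked(size_t sz) { _size -= sz; }

  // Checked release: asserts the invariant the unchecked version silently breaks.
  void release(size_t sz) {
    assert(sz <= _size && "released more than was recorded");
    _size -= sz;
  }

  // Moves sz already-recorded bytes to another category (the "bias" step).
  void move_to(AreaCounter& to, size_t sz) {
    release(sz);
    to.allocate(sz);
  }
};

// Releases more than was recorded and returns the wrapped counter value.
inline size_t underflow_demo() {
  AreaCounter compiler;
  compiler.allocate(1024);
  compiler.release_unchecked(4096);   // area was biased here without its bytes
  return compiler.size();
}

// Biases an area and moves its recorded bytes along; both counters end at 0.
inline bool bias_with_adjustment_demo() {
  AreaCounter other, compiler;
  other.allocate(4096);               // area first recorded under another category
  other.move_to(compiler, 4096);      // bias to compiler and adjust the tracking data
  compiler.release(4096);             // release now matches what was recorded
  return other.size() == 0 && compiler.size() == 0;
}
```

With a signed delta parameter the same care applies to its width: a type
that is 4 bytes on one platform cannot carry every size_t difference.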

Many thanks to Leonid Mesnik, who helped to run this patch through 
Oracle's internal stress tests.

Bug: https://bugs.openjdk.java.net/browse/JDK-8234270
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/index.html


Test:
   hotspot_nmt
   Submit test
   Oracle internal stress tests.


Thanks,

-Zhengyu


From erik.osterlund at oracle.com  Wed Nov 20 16:22:38 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Wed, 20 Nov 2019 17:22:38 +0100
Subject: RFR: 8234531: Remove CMS code from CLDG and safepoint cleanup
Message-ID: <ee17bf4c-710a-9948-81d0-b7861017ad45@oracle.com>

Hi,

There is some CMS code left in CLDG and safepoint cleanup. It should be 
removed.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8234531

Webrev:
http://cr.openjdk.java.net/~eosterlund/8234531/webrev.00/

Thanks,
/Erik

From david.buck at oracle.com  Wed Nov 20 16:51:35 2019
From: david.buck at oracle.com (David Buck)
Date: Thu, 21 Nov 2019 01:51:35 +0900
Subject: RFR (S): 8230611: infinite loop in
 LogOutputList::wait_until_no_readers()
Message-ID: <567531cb-6171-d750-2789-7f31e6a01ed6@oracle.com>

Hi!

May I please get a review of this small fix:

bug report: https://bugs.openjdk.java.net/browse/JDK-8230611
proposed fix: http://cr.openjdk.java.net/~dbuck/8230611_ver00/

Cheers,
-Buck

From coleen.phillimore at oracle.com  Wed Nov 20 18:39:40 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Wed, 20 Nov 2019 13:39:40 -0500
Subject: RFR: 8234531: Remove CMS code from CLDG and safepoint cleanup
In-Reply-To: <ee17bf4c-710a-9948-81d0-b7861017ad45@oracle.com>
References: <ee17bf4c-710a-9948-81d0-b7861017ad45@oracle.com>
Message-ID: <259021b5-f8d9-ffda-a4cd-d333ed56309f@oracle.com>

Looks good!
Coleen

On 11/20/19 11:22 AM, erik.osterlund at oracle.com wrote:
> Hi,
>
> There is some CMS code left in CLDG and safepoint cleanup. It should 
> be removed.
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8234531
>
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8234531/webrev.00/
>
> Thanks,
> /Erik


From coleen.phillimore at oracle.com  Wed Nov 20 18:42:45 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Wed, 20 Nov 2019 13:42:45 -0500
Subject: RFR: 8234531: Remove CMS code from CLDG and safepoint cleanup
In-Reply-To: <259021b5-f8d9-ffda-a4cd-d333ed56309f@oracle.com>
References: <ee17bf4c-710a-9948-81d0-b7861017ad45@oracle.com>
 <259021b5-f8d9-ffda-a4cd-d333ed56309f@oracle.com>
Message-ID: <9738fd66-52fd-105f-ec6e-5f86dcb97310@oracle.com>


I thought I filed this under GC.  Your bug should be a duplicate of 
that, and coordinate with Leo who is assigned to it.

https://bugs.openjdk.java.net/browse/JDK-8233214

Coleen

On 11/20/19 1:39 PM, coleen.phillimore at oracle.com wrote:
> Looks good!
> Coleen
>
> On 11/20/19 11:22 AM, erik.osterlund at oracle.com wrote:
>> Hi,
>>
>> There is some CMS code left in CLDG and safepoint cleanup. It should 
>> be removed.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8234531
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8234531/webrev.00/
>>
>> Thanks,
>> /Erik
>


From zgu at redhat.com  Wed Nov 20 19:32:51 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 20 Nov 2019 14:32:51 -0500
Subject: RFR: 8234531: Remove CMS code from CLDG and safepoint cleanup
In-Reply-To: <ee17bf4c-710a-9948-81d0-b7861017ad45@oracle.com>
References: <ee17bf4c-710a-9948-81d0-b7861017ad45@oracle.com>
Message-ID: <81e1eed9-e42d-8ac1-2084-3b300a477f86@redhat.com>

There are still some comments in classLoaderDataGraph.cpp referring to CMS.

For example:

380  // (CMS doesn't purge right away).

Otherwise, looks good to me.

-Zhengyu

On 11/20/19 11:22 AM, erik.osterlund at oracle.com wrote:
> Hi,
> 
> There is some CMS code left in CLDG and safepoint cleanup. It should be 
> removed.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8234531
> 
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8234531/webrev.00/
> 
> Thanks,
> /Erik
> 


From kim.barrett at oracle.com  Wed Nov 20 20:08:31 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 20 Nov 2019 15:08:31 -0500
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <307b86df-eef7-66ba-993c-be96c8eb90ec@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <dfff2121-2f6b-7407-812e-7608e145af8c@oracle.com>
 <d263d1ec-ee83-c1bf-c945-76089e5ca865@oracle.com>
 <29CF8C53-D058-4527-8866-1B1D4DB9998A@oracle.com>
 <307b86df-eef7-66ba-993c-be96c8eb90ec@oracle.com>
Message-ID: <9907AB77-F0CF-4497-9258-D386DE6DAFCB@oracle.com>

> On Nov 20, 2019, at 4:38 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
> On 11/19/19 4:55 PM, Kim Barrett wrote:
>> src/hotspot/share/runtime/synchronizer.cpp
>> 1006   if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, &ForceMonitorScavenge) == 0) {
>> 1007     VMThread::check_cleanup();
>> I was going to make a couple nit-pick comments here, but this seems to
>> be soon to be dead code (only reached when deprecated MonitorBound has
>> a non-default value).
>> Not sure it's worth changing this and adding check_cleanup just for
>> use here though, given this code is going away soon. Are there planned
>> future uses of check_cleanup?
> 
> No future plans, no.
> Are you suggesting doing a synchronous safepoint here instead?

I think I (and maybe David too) need to better understand what is being done here and why.

>> src/hotspot/share/runtime/vmOperations.hpp
>> Removed:
>>  156   virtual ~VM_Operation() {}
>> Simple removal may cause warnings with some flags that have been
>> discussed being enabled. This change violates the "usual rule" of
>> public and virtual or non-public for base class destructors. Of
>> course, that rule (and any warnings that might try to check for it)
>> doesn't account for non-heap-allocatable classes. And HotSpot is full
>> of violations of that rule, so we're probably a long way from being
>> able to turn on some of those kinds of warnings.
>> OTOH, leaving the destructor alone has very little cost.
> 
> Added back.

Robbin let me know offline that just adding back the virtual
destructor doesn't work.

I had forgotten that a virtual destructor doesn't work for classes
derived from StackObj, running afoul of the deleting destructor.

Since deriving from StackObj prevents the problems the various
(currently not enabled) warning options are trying to prevent, no
current harm in using the public default destructor. We'll deal with
the possible future warnings problem in the possible future.

So go ahead with your original change of using the default destructor.
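
One way the conflict can arise, sketched under assumptions: StackOnly below
is a simplified stand-in for a stack-only base class with private allocation
functions, not HotSpot's StackObj, and Operation and AddOne are hypothetical.
A virtual destructor requires an accessible operator delete at its point of
definition, so the commented-out line would be rejected, while the implicit
non-virtual destructor is fine for objects that only ever live on the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Simplified stand-in for a stack-only base class.
class StackOnly {
 private:
  // Declared private and never defined: heap allocation is not allowed.
  void* operator new(std::size_t);
  void operator delete(void*);
};

// Base class using the implicit public, non-virtual destructor.
class Operation : public StackOnly {
 public:
  virtual int evaluate() { return 0; }
  // virtual ~Operation() {}  // ill-formed here: the deleting variant of a virtual
  //                          // destructor needs an accessible operator delete
};

class AddOne : public Operation {
  int _value;
 public:
  explicit AddOne(int v) : _value(v) {}
  int evaluate() override { return _value + 1; }
};

// Dispatches through the base-class interface.
inline int run(Operation& op) { return op.evaluate(); }

// The object has automatic storage, so it is destroyed with its static type
// at scope exit and never through a base-class pointer.
inline int stack_only_demo() {
  AddOne op(41);
  return run(op);
}
```

Since such objects cannot be deleted through a base pointer, the usual
reason for a public virtual destructor does not apply to them.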


From markus.gronlund at oracle.com  Wed Nov 20 20:54:05 2019
From: markus.gronlund at oracle.com (Markus Gronlund)
Date: Wed, 20 Nov 2019 12:54:05 -0800 (PST)
Subject: 8233197(S): Invert JvmtiExport::post_vm_initialized() and
 Jfr:on_vm_start() start-up order for correct option parsing
In-Reply-To: <1ca7ae34-41fe-fad1-4bd2-57cdf9667bd9@oracle.com>
References: <b2bf81c0-80fa-49e4-ac09-8fa6589b1e80@default>
 <1ca7ae34-41fe-fad1-4bd2-57cdf9667bd9@oracle.com>
Message-ID: <62407a3d-f6a2-400b-9311-9ab7e32d85f7@default>

Hi Serguei,

thanks for taking a look.

"It does not look as a good idea to change the JVMTI phase like above.

  If you need the ONLOAD phase just to enable capabilities then it is better to do it in the real ONLOAD phase.

  Do I miss anything important here?

  Please, ask questions if you have any problems with it."

Yes, so the reason for the phase transition is not so much to do with capabilities, but that an agent can only register, i.e. call GetEnv(), in phases JVMTI_PHASE_ONLOAD and JVMTI_PHASE_LIVE.

create_vm_init_agents() is where the (temporary) JVMTI_PHASE_PRIMORDIAL to JVMTI_PHASE_ONLOAD happens during the callouts to Agent_OnLoad(), and then the state is returned to JVMTI_PHASE_PRIMORDIAL. It is hard to find an unconditional hook point there since create_vm_init_agents() is made conditional on Arguments::init_agents_at_startup(), with a listing populated from "real agents" (on command-line).

The JFR JVMTI agent itself is also conditional, installed only if JFR is actively started (i.e. when starting a recording). Hence, the phase transition mechanism merely replicates the state changes in create_vm_init_agents() to have the agent register properly. This is a moot point now however as I have taken another pass. I now found a way to only have the agent register during the JVMTI_PHASE_LIVE phase, so the phase transition mechanism is not needed.

"The Jfr::on_vm_init() is confusing as there is a mismatch with the JVMTI phases order.

  It feels like it means JFR init event (not VM init) or something like this.

  Or maybe it denotes the VM initialization start. :)

  I'll be happy if you could explain it a little bit."

Yes, this is confusing, I agree. Of course, JFR has a tight relation to the JVMTI phases, but only in so far as to coordinate agent registration. The JFR calls are not intended to reflect the JVMTI phases per se but a more general initialization order state description, like you say "VM initialization start and completion". However, it is very hard to encode proper semantics into the JFR calls in Threads::create_vm() to reflect the concepts of "stages"; they are simply not well-defined. In addition, there are so many of them :). For example, I always get confused that VM initialization is reflected in JVMTI by the VMStart event and the completion by the VMInit event (representing VM initialization complete). At the same time, the DTRACE macros have both HOTSPOT_VM_INIT_BEGIN() and HOTSPOT_VM_INIT_END() placed before both...

I abandoned the attempt to encode anything meaningful into the JFR calls trying to represent a certain "VM initialization stage".

Instead, I will just have syntactic JFR calls reflecting some relative order (on_create_vm_1(), on_create_vm_2(),.. _3()) etc. Looks like there are precedents of this style. 

"Not sure, if your agent needs to enable these capabilities (introduced in JDK 9 with modules):
  can_generate_early_vmstart
  can_generate_early_class_hook_events"

Thanks for the suggestion Serguei, but these capabilities are not yet needed.

Here is the updated webrev: http://cr.openjdk.java.net/~mgronlun/8233197/webrev02/

Thanks again

Markus

From: Serguei Spitsyn 
Sent: den 20 november 2019 04:10
To: Markus Gronlund <markus.gronlund at oracle.com>; hotspot-jfr-dev <hotspot-jfr-dev at openjdk.java.net>; hotspot-runtime-dev at openjdk.java.net; serviceability-dev at openjdk.java.net
Subject: Re: 8233197(S): Invert JvmtiExport::post_vm_initialized() and Jfr:on_vm_start() start-up order for correct option parsing


Hi Marcus,

It looks good in general.

A couple of comments though.

http://cr.openjdk.java.net/~mgronlun/8233197/webrev01/src/hotspot/share/jfr/instrumentation/jfrJvmtiAgent.cpp.frames.html

 258 class JvmtiPhaseTransition {
 259  private:
 260   bool _transition;
 261  public:
 262   JvmtiPhaseTransition() : _transition(JvmtiEnvBase::get_phase() == JVMTI_PHASE_PRIMORDIAL) {
 263     if (_transition) {
 264       JvmtiEnvBase::set_phase(JVMTI_PHASE_ONLOAD);
 265     }
 266   }
 267   ~JvmtiPhaseTransition() {
 268     if (_transition) {
 269       assert(JvmtiEnvBase::get_phase() == JVMTI_PHASE_ONLOAD, "invariant");
 270       JvmtiEnvBase::set_phase(JVMTI_PHASE_PRIMORDIAL);
 271     }
 272   }
 273 };
 274 
 275 static bool initialize() {
 276   JavaThread* const jt = current_java_thread();
 277   assert(jt != NULL, "invariant");
 278   assert(jt->thread_state() == _thread_in_vm, "invariant");
 279   DEBUG_ONLY(JfrJavaSupport::check_java_thread_in_vm(jt));
 280   JvmtiPhaseTransition jvmti_phase_transition;
 281   ThreadToNativeFromVM transition(jt);
 282   if (create_jvmti_env(jt) != JNI_OK) {
 283     assert(jfr_jvmti_env == NULL, "invariant");
 284     return false;
 285   }
 286   assert(jfr_jvmti_env != NULL, "invariant");
 287   if (!register_capabilities(jt)) {
 288     return false;
 289   }
 290   if (!register_callbacks(jt)) {
 291     return false;
 292   }
 293   return update_class_file_load_hook_event(JVMTI_ENABLE);
 294 }


It does not look as a good idea to change the JVMTI phase like above.
If you need the ONLOAD phase just to enable capabilities then it is better to do it in the real ONLOAD phase.
Do I miss anything important here?
Please, ask questions if you have any problems with it.

The Jfr::on_vm_init() is confusing as there is a mismatch with the JVMTI phases order.
It feels like it means JFR init event (not VM init) or something like this.
Or maybe it denotes the VM initialization start. :)
I'll be happy if you could explain it a little bit.

Not sure, if your agent needs to enable these capabilities (introduced in JDK 9 with modules):
  can_generate_early_vmstart
  can_generate_early_class_hook_events

Thanks,
Serguei


On 11/19/19 06:38, Markus Gronlund wrote:

Greetings,

(apologies for the wide distribution)

Kindly asking for reviews for the following changeset:

Bug: https://bugs.openjdk.java.net/browse/JDK-8233197
Webrev: http://cr.openjdk.java.net/~mgronlun/8233197/webrev01/
Testing: serviceability/jvmti, jdk_jfr, tier1-5
Summary: please see bug for description.

For Runtime / Serviceability folks:
This change slightly modifies the relative order in Threads::create_vm(); please see threads.cpp.
There is an upcall as part of Jfr::on_vm_start() that delivers global JFR command-line options to Java (only if set).
The behavioral change amounts to a few classes loaded as part of establishing this upcall (all internal JFR classes and/or java.base classes, loaded by the bootloader) no longer being visible to the ClassFileLoadHook's of agents. These classes are visible to agents that work with "early_start" JVMTI environments however.

The major part of JFR startup with associated class loading still happens as part of Jfr::on_vm_live() with no behavioral change in relation to agents.

Thank you
Markus

From erik.osterlund at oracle.com  Wed Nov 20 21:38:24 2019
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Wed, 20 Nov 2019 22:38:24 +0100
Subject: RFR: 8234531: Remove CMS code from CLDG and safepoint cleanup
In-Reply-To: <259021b5-f8d9-ffda-a4cd-d333ed56309f@oracle.com>
References: <ee17bf4c-710a-9948-81d0-b7861017ad45@oracle.com>
 <259021b5-f8d9-ffda-a4cd-d333ed56309f@oracle.com>
Message-ID: <492c8e8c-b99b-843f-02a1-bf088e39da5b@oracle.com>

Hi Coleen,

Thanks for the review.

/Erik

On 2019-11-20 19:39, coleen.phillimore at oracle.com wrote:
> Looks good!
> Coleen
>
> On 11/20/19 11:22 AM, erik.osterlund at oracle.com wrote:
>> Hi,
>>
>> There is some CMS code left in CLDG and safepoint cleanup. It should 
>> be removed.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8234531
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8234531/webrev.00/
>>
>> Thanks,
>> /Erik
>


From erik.osterlund at oracle.com  Wed Nov 20 21:39:57 2019
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Wed, 20 Nov 2019 22:39:57 +0100
Subject: RFR: 8234531: Remove CMS code from CLDG and safepoint cleanup
In-Reply-To: <81e1eed9-e42d-8ac1-2084-3b300a477f86@redhat.com>
References: <ee17bf4c-710a-9948-81d0-b7861017ad45@oracle.com>
 <81e1eed9-e42d-8ac1-2084-3b300a477f86@redhat.com>
Message-ID: <25acfa8d-6641-dc98-8788-b08e874e6b84@oracle.com>

Hi Zhengyu,

Thanks for the review. Will remove comments before pushing.

/Erik

On 2019-11-20 20:32, Zhengyu Gu wrote:
> There are still some comments in classLoaderDataGraph.cpp referring CMS.
>
> For example:
>
> 380  // (CMS doesn't purge right away).
>
> Otherwise, looks good to me.
>
> -Zhengyu
>
> On 11/20/19 11:22 AM, erik.osterlund at oracle.com wrote:
>> Hi,
>>
>> There is some CMS code left in CLDG and safepoint cleanup. It should 
>> be removed.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8234531
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8234531/webrev.00/
>>
>> Thanks,
>> /Erik
>>
>


From ioi.lam at oracle.com  Wed Nov 20 22:28:41 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 20 Nov 2019 14:28:41 -0800
Subject: RFR(S) 8234429: appcds/dynamicArchive tests crashing with Graal
Message-ID: <2ca6a977-b4d7-31d8-cba0-e4c9d9822f65@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8234429
http://cr.openjdk.java.net/~iklam/jdk14/8234429-dynamic-cds-graal-crash.v01/

In JDK-8231610, the implementation of DynamicArchive::is_mapped() is 
changed to

    static bool is_mapped() { return FileMapInfo::dynamic_info() != NULL; }

During dynamic dumping, we temporarily (inside a safepoint) allocate a 
dynamic FileMapInfo, which makes it appear as if the dynamic archive has 
been mapped.

When Graal is enabled, the VM actually continues to run for a little 
(compiling Java methods) after dynamic dumping has finished. During this 
time, when JVMCI tries to resolve a class, it might try to look it up in 
the dynamic archive, which will fail as the dynamic archive isn't really 
mapped.

The fix is to free the temporarily allocated FileMapInfo when dynamic 
dumping is finished.
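
The failure mode and the fix can be sketched in isolation. This is a
minimal stand-alone model, not the HotSpot code: FileMapInfoModel,
dump_dynamic_archive() and the free_after_dump flag are hypothetical names
used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for FileMapInfo; it only tracks the "dynamic info" pointer.
struct FileMapInfoModel {
  static FileMapInfoModel* _dynamic_info;
  static FileMapInfoModel* dynamic_info() { return _dynamic_info; }
};
FileMapInfoModel* FileMapInfoModel::_dynamic_info = NULL;

// Same shape as the is_mapped() definition quoted above.
bool is_mapped() { return FileMapInfoModel::dynamic_info() != NULL; }

// Models dynamic dumping: a temporary info object is allocated for the dump.
// With free_after_dump == false the pointer stays set, so is_mapped() keeps
// returning true although nothing was ever mapped.
void dump_dynamic_archive(bool free_after_dump) {
  FileMapInfoModel::_dynamic_info = new FileMapInfoModel();
  // ... the archive would be written here ...
  if (free_after_dump) {
    delete FileMapInfoModel::_dynamic_info;
    FileMapInfoModel::_dynamic_info = NULL;
  }
}
```

In this model, any code that runs after the dump and consults is_mapped()
only sees a consistent answer when the temporary object has been freed.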

Thanks
- Ioi

From serguei.spitsyn at oracle.com  Thu Nov 21 00:52:41 2019
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Wed, 20 Nov 2019 16:52:41 -0800
Subject: 8233197(S): Invert JvmtiExport::post_vm_initialized() and
 Jfr:on_vm_start() start-up order for correct option parsing
In-Reply-To: <62407a3d-f6a2-400b-9311-9ab7e32d85f7@default>
References: <b2bf81c0-80fa-49e4-ac09-8fa6589b1e80@default>
 <1ca7ae34-41fe-fad1-4bd2-57cdf9667bd9@oracle.com>
 <62407a3d-f6a2-400b-9311-9ab7e32d85f7@default>
Message-ID: <2ae4f9d7-7415-8c99-874c-97b6612ac272@oracle.com>

Hi Markus,

Thank you for the answers!
The update looks good to me.

A couple of minor comments.

http://cr.openjdk.java.net/~mgronlun/8233197/webrev02/src/hotspot/share/jfr/instrumentation/jfrJvmtiAgent.cpp.frames.html

57 static bool set_event_notification_mode(jvmtiEventMode mode,
   58                                               jvmtiEvent event,
   59                                               jthread event_thread,
   60                                               ...) {


   You may want to align arguments.

126 size_t length = sizeof base_error_msg ; // includes terminating null


   Unneeded space before ';'.
   Would it be better to use this form: sizeof(base_error_msg)?
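
For reference, a small stand-alone sketch of what the two sizeof forms
yield for a character array. example_msg is a hypothetical message
standing in for base_error_msg; its contents are not taken from the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical message; the real base_error_msg text is not reproduced here.
const char example_msg[] = "error";

// sizeof applied to the array counts the terminating null character.
const size_t with_parens = sizeof(example_msg);
// Without parentheses the operand is an expression; the value is the same.
const size_t without_parens = sizeof example_msg;

// strlen, by contrast, stops before the terminating null.
size_t text_length() { return strlen(example_msg); }
```

Both forms give the same value for an expression operand, so the choice is
purely a matter of style.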


No need for another webrev.

Thanks,
Serguei


On 11/20/19 12:54 PM, Markus Gronlund wrote:
>
> Hi Serguei,
>
> thanks for taking a look.
>
> "It does not look as a good idea to change the JVMTI phase like above.
>
>   If you need the ONLOAD phase just to enable capabilities then it is 
> better to do it in the real ONLOAD phase.
>
>   Do I miss anything important here?
>
>   Please, ask questions if you have any problems with it."
>
> Yes, so the reason for the phase transition is not so much to do with 
> capabilities, but that an agent can only register, i.e. call GetEnv(), 
> in phases JVMTI_PHASE_ONLOAD and JVMTI_PHASE_LIVE.
>
> create_vm_init_agents() is where the (temporary) 
> JVMTI_PHASE_PRIMORDIAL to JVMTI_PHASE_ONLOAD happens during the 
> callouts to Agent_OnLoad(), and then the state is returned to 
> JVMTI_PHASE_PRIMORDIAL. It is hard to find an unconditional hook point 
> there since create_vm_init_agents() is made conditional on 
> Arguments::init_agents_at_startup(), with a listing populated from 
> "real agents" (on command-line).
>
> The JFR JVMTI agent itself is also conditional, installed only if JFR 
> is actively started (i.e. starting a recording). Hence, the phase 
> transition mechanism merely replicates the state changes in 
> create_vm_init_agents() to have the agent register properly. This is a 
> moot point now however as I have taken another pass. I now found a way 
> to only have the agent register during the JVMTI_PHASE_LIVE phase, so 
> the phase transition mechanism is not needed.
>
> "The Jfr::on_vm_init() is confusing as there is a mismatch with the 
> JVMTI phases order.
>
>   It feels like it means JFR init event (not VM init) or something 
> like this.
>
>   Or maybe it denotes the VM initialization start. :)
>
>   I'll be happy if you could explain it a little bit."
>
> Yes, this is confusing, I agree. Of course, JFR has a tight relation 
> to the JVMTI phases, but only in so far as to coordinate agent 
> registration. The JFR calls are not intended to reflect the JVMTI 
> phases per se but a more general initialization order state 
> description, like you say "VM initialization start and completion". 
> However, it is very hard to encode proper semantics into the JFR calls 
> in Threads::create_vm() to reflect the concepts of "stages"; they are 
> simply not well-defined. In addition, there are so many of them :). For 
> example, I always get confused that VM initialization is reflected in 
> JVMTI by the VMStart event and the completion by the VMInit event 
> (representing VM initialization complete). At the same time, the 
> DTRACE macros have both HOTSPOT_VM_INIT_BEGIN() HOTSPOT_VM_INIT_END() 
> placed before both...
>
> I abandoned the attempt to encode anything meaningful into the JFR 
> calls trying to represent a certain "VM initialization stage".
>
> Instead, I will just have syntactic JFR calls reflecting some relative 
> order (on_create_vm_1(), on_create_vm_2(),.. _3()) etc. Looks like 
> there are precedents of this style.
>
> "Not sure, if your agent needs to enable these capabilities 
> (introduced in JDK 9 with modules):
>   can_generate_early_vmstart
>   can_generate_early_class_hook_events"
>
> Thanks for the suggestion Serguei, but these capabilities are not yet 
> needed.
>
> Here is the updated webrev: 
> http://cr.openjdk.java.net/~mgronlun/8233197/webrev02/
>
> Thanks again
>
> Markus
>
> *From:*Serguei Spitsyn
> *Sent:* den 20 november 2019 04:10
> *To:* Markus Gronlund <markus.gronlund at oracle.com>; hotspot-jfr-dev 
> <hotspot-jfr-dev at openjdk.java.net>; 
> hotspot-runtime-dev at openjdk.java.net; serviceability-dev at openjdk.java.net
> *Subject:* Re: 8233197(S): Invert JvmtiExport::post_vm_initialized() 
> and Jfr:on_vm_start() start-up order for correct option parsing
>
> Hi Markus,
>
> It looks good in general.
>
> A couple of comments though.
>
> http://cr.openjdk.java.net/~mgronlun/8233197/webrev01/src/hotspot/share/jfr/instrumentation/jfrJvmtiAgent.cpp.frames.html
>
> 258 class JvmtiPhaseTransition {
> 259  private:
> 260   bool _transition;
> 261  public:
> 262   JvmtiPhaseTransition() : _transition(JvmtiEnvBase::get_phase() == JVMTI_PHASE_PRIMORDIAL) {
> 263     if (_transition) {
> 264       JvmtiEnvBase::set_phase(JVMTI_PHASE_ONLOAD);
> 265     }
> 266   }
> 267   ~JvmtiPhaseTransition() {
> 268     if (_transition) {
> 269       assert(JvmtiEnvBase::get_phase() == JVMTI_PHASE_ONLOAD, "invariant");
> 270       JvmtiEnvBase::set_phase(JVMTI_PHASE_PRIMORDIAL);
> 271     }
> 272   }
> 273 };
> 274
> 275 static bool initialize() {
> 276   JavaThread* const jt = current_java_thread();
> 277   assert(jt != NULL, "invariant");
> 278   assert(jt->thread_state() == _thread_in_vm, "invariant");
> 279   DEBUG_ONLY(JfrJavaSupport::check_java_thread_in_vm(jt));
> *280   JvmtiPhaseTransition jvmti_phase_transition;*
> 281   ThreadToNativeFromVM transition(jt);
> 282   if (create_jvmti_env(jt) != JNI_OK) {
> 283     assert(jfr_jvmti_env == NULL, "invariant");
> 284     return false;
> 285   }
> 286   assert(jfr_jvmti_env != NULL, "invariant");
> 287   if (!register_capabilities(jt)) {
> 288     return false;
> 289   }
> 290   if (!register_callbacks(jt)) {
> 291     return false;
> 292   }
> 293   return update_class_file_load_hook_event(JVMTI_ENABLE);
> 294 }
>
>
> It does not look as a good idea to change the JVMTI phase like above.
> If you need the ONLOAD phase just to enable capabilities then it is 
> better to do it in the real ONLOAD phase.
> Do I miss anything important here?
> Please, ask questions if you have any problems with it.
>
> The Jfr::on_vm_init() is confusing as there is a mismatch with the 
> JVMTI phases order.
> It feels like it means JFR init event (not VM init) or something like 
> this.
> Or maybe it denotes the VM initialization start. :)
> I'll be happy if you could explain it a little bit.
>
> Not sure, if your agent needs to enable these capabilities (introduced 
> in JDK 9 with modules):
>   can_generate_early_vmstart
>   can_generate_early_class_hook_events
>
> Thanks,
> Serguei
>
>
> On 11/19/19 06:38, Markus Gronlund wrote:
>
>     Greetings,
>
>     (apologies for the wide distribution)
>
>     Kindly asking for reviews for the following changeset:
>
>     Bug:https://bugs.openjdk.java.net/browse/JDK-8233197  
>
>     Webrev:http://cr.openjdk.java.net/~mgronlun/8233197/webrev01/
>
>     Testing: serviceability/jvmti, jdk_jfr, tier1-5
>
>     Summary: please see bug for description.
>
>     For Runtime / Serviceability folks:
>
>     This change slightly modifies the relative order in Threads::create_vm(); please see threads.cpp.
>
>     There is an upcall as part of Jfr::on_vm_start() that delivers global JFR command-line options to Java (only if set).
>
>     The behavioral change amounts to a few classes loaded as part of establishing this upcall (all internal JFR classes and/or java.base classes, loaded by the bootloader) no longer being visible to the ClassFileLoadHook's of agents. These classes are visible to agents that work with "early_start" JVMTI environments however.
>
>     The major part of JFR startup with associated class loading still happens as part of Jfr::on_vm_live() with no behavioral change in relation to agents.
>
>     Thank you
>
>     Markus
>


From kim.barrett at oracle.com  Thu Nov 21 07:13:13 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 21 Nov 2019 02:13:13 -0500
Subject: RFR (S): 8230611: infinite loop in
 LogOutputList::wait_until_no_readers()
In-Reply-To: <567531cb-6171-d750-2789-7f31e6a01ed6@oracle.com>
References: <567531cb-6171-d750-2789-7f31e6a01ed6@oracle.com>
Message-ID: <E5C3888C-D95C-48EE-94E3-2AA9C5402730@oracle.com>

> On Nov 20, 2019, at 11:51 AM, David Buck <david.buck at oracle.com> wrote:
> 
> Hi!
> 
> May I please get a review of this small fix:
> 
> bug report: https://bugs.openjdk.java.net/browse/JDK-8230611
> proposed fix: http://cr.openjdk.java.net/~dbuck/8230611_ver00/
> 
> Cheers,
> -Buck

Note that RVO becomes mandatory in C++17. Not that this helps us
today. Though I'm somewhat surprised that any even vaguely recent
compiler would fail to do it.

(Side comment on this code: yet another SMR mechanism.)

The proposed operator= has some problems:

(1) Self-assignment bug. Self-assignment is rarely an issue in
practice, but if it costs nothing to handle, then one should do so.
The problem is that decreasing the counter before increasing it could
allow an existing waiter to proceed. Just do the increase first.

(2) operator= should return *this unless there's a good reason not to.

(3) There's no reason for the rhs argument to be non-const.

(4) The indentation is not HotSpot's normal 2-space.

So a better definition would be (1)

Iterator& operator=(const Iterator& rhs) {
  _current = rhs._current; 
  rhs._list->increase_readers();
  _list->decrease_readers();
  _list = rhs._list;
  return *this;
}

If assignments where the _list is the same for both is a common case,
it might be worthwhile to avoid the counter manipulation altogether in
that case, e.g. (2)

Iterator& operator=(const Iterator& rhs) {
  _current = rhs._current; 
  if (_list != rhs._list) {
    rhs._list->increase_readers();
    _list->decrease_readers();
    _list = rhs._list;
  }
  return *this;
}

I think the copy-swap idiom probably isn't the way to go here.
Compared to (2) it has fewer counter mods for the rvalue with
different lists, but more for the lvalue with the same list. It is
always at least as good as (1) though. But to really minimize counter
manipulations one wants move-construct and move-assign to handle the
rvalue case, and we don't have those yet.

https://stackoverflow.com/questions/3279543/what-is-the-copy-and-swap-idiom/3279550#3279550

I think the move of the call to increase_readers() from
LogOutputList::iterator to the Iterator constructor is potentially a
mistake. I think the counter protection should be increased before the
node is obtained from the _level_start array. It might be that there
are reasons why the proposed change will work, but I think the
original code is safer and easier to analyze.
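
The ordering argument in point (1) can be checked with a small
single-threaded model. This is only a sketch: ListModel and IteratorModel
are hypothetical stand-ins for LogOutputList and its Iterator, the reader
count is a plain int rather than an atomic, and the ordering of the
increment relative to the node lookup is not modelled.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for LogOutputList; only the reader count is modelled.
struct ListModel {
  int readers;
  int min_seen;  // lowest reader count observed right after a decrease
  ListModel() : readers(0), min_seen(INT_MAX) {}
  void increase_readers() { ++readers; }
  void decrease_readers() {
    --readers;
    if (readers < min_seen) min_seen = readers;
  }
};

// Hypothetical stand-in for the Iterator; _current replaces the node pointer.
struct IteratorModel {
  ListModel* _list;
  int _current;

  IteratorModel(ListModel* list, int start) : _list(list), _current(start) {
    _list->increase_readers();
  }
  IteratorModel(const IteratorModel& rhs) : _list(rhs._list), _current(rhs._current) {
    _list->increase_readers();
  }
  ~IteratorModel() { _list->decrease_readers(); }

  // Definition (1): increase on the new list before decreasing on the old one.
  IteratorModel& operator=(const IteratorModel& rhs) {
    _current = rhs._current;
    rhs._list->increase_readers();
    _list->decrease_readers();
    _list = rhs._list;
    return *this;
  }
};

// Returns the lowest reader count seen while an iterator is assigned to itself.
int min_readers_during_self_assignment() {
  ListModel list;
  IteratorModel it(&list, 0);
  list.min_seen = INT_MAX;    // only observe the assignment itself
  IteratorModel& alias = it;  // alias avoids a self-assignment compiler warning
  it = alias;
  return list.min_seen;
}
```

With the increase done first, the count never drops below 1 during
self-assignment in this model; swapping the two calls would let it touch 0,
which is the window in which a waiter could proceed.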


From robbin.ehn at oracle.com  Thu Nov 21 08:00:31 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 21 Nov 2019 09:00:31 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <9907AB77-F0CF-4497-9258-D386DE6DAFCB@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <dfff2121-2f6b-7407-812e-7608e145af8c@oracle.com>
 <d263d1ec-ee83-c1bf-c945-76089e5ca865@oracle.com>
 <29CF8C53-D058-4527-8866-1B1D4DB9998A@oracle.com>
 <307b86df-eef7-66ba-993c-be96c8eb90ec@oracle.com>
 <9907AB77-F0CF-4497-9258-D386DE6DAFCB@oracle.com>
Message-ID: <42c8c4ea-518a-b1ab-d1f5-e7c8ec92d1ce@oracle.com>

Hi Kim,

On 2019-11-20 21:08, Kim Barrett wrote:
>> On Nov 20, 2019, at 4:38 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>> On 11/19/19 4:55 PM, Kim Barrett wrote:
>>> src/hotspot/share/runtime/synchronizer.cpp
>>> 1006   if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, &ForceMonitorScavenge) == 0) {
>>> 1007     VMThread::check_cleanup();
>>> I was going to make a couple nit-pick comments here, but this seems to
>>> be soon to be dead code (only reached when deprecated MonitorBound has
>>> a non-default value).
>>> Not sure it's worth changing this and adding check_cleanup just for
>>> use here though, given this code is going away soon. Are there planned
>>> future uses of check_cleanup?
>>
>> No future plans, no.
>> Are you suggesting doing a synchronous safepoint here instead?
> 
> I think I (and maybe David too) need to better understand what is being done here and why.

Sorry for not making that clear, we have this comment explaining the situation:

       const int mx = MonitorBound;
       if (mx > 0 && (g_om_population-g_om_free_count) > mx) {
         // Not enough ObjectMonitors on the global free list.
         // We can't safely induce a STW safepoint from om_alloc() as our thread
         // state may not be appropriate for such activities and callers may hold
         // naked oops, so instead we defer the action.
         InduceScavenge(self, "om_alloc");
       }

I'm not sure why we only do this if we try to take a monitor from the global free 
list. If the free list is empty we just allocate instead.

By setting ForceMonitorScavenge and poking the VM thread via 
VMOperationQueue_lock, we still adhere to the comment.

This is not used in "8153224 : Monitor deflation prolong safepoints" patch with 
+AsyncDeflateIdleMonitors. The plan for flag AsyncDeflateIdleMonitors is to 
deprecate it in the release after, so this code is going away.
And thus 'force clean up' can be removed at the same time.

Found some more comments to fix :)

> 
> So go ahead with your original change of using the default destructor.
> 

Ok!

Thanks, Robbin (v3 against RFR mail)

From Alan.Bateman at oracle.com  Thu Nov 21 08:50:31 2019
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Thu, 21 Nov 2019 08:50:31 +0000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
Message-ID: <6209301e-ae85-2a91-7d9e-c9096581365d@oracle.com>

On 14/11/2019 15:37, Langer, Christoph wrote:
> Hi,
>
> please review this cleanup change regarding function "canonicalize" of libjava.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
>
>
> The goal is to cleanup how this function is defined and used. One thing is, that there was an unnecessary wrapper function "Canonicalize" in jni_util.c. It wrapped the call to "canonicalize". We can get rid of this wrapper. Unfortunately, it is not possible to just export "canonicalize" since this will conflict with a method signature from the math library, at least on modern Linuxes. So I decided to call the method JDK_Canonicalize and will correctly define it in jdk_util.h which can be included everywhere.
>
I think this change is okay. My main concern when initially seeing this 
go by was that it would leak the \\?\ or \\?\UNC\ prefix into the 
canonical File when it wasn't there previously, this would of course 
have several implications. But I think you have it right and this is, as 
you position, just refactoring/cleanup.

-Alan

From erik.osterlund at oracle.com  Thu Nov 21 09:15:27 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Thu, 21 Nov 2019 10:15:27 +0100
Subject: RFR: 8234509: Race in macOS os::processor_id()
Message-ID: <5f7f7af2-6031-46c7-6155-2026802431cc@oracle.com>

Hi,

In os::processor_id() on macOS, a CAS protects initialization of 
processor ID number given an APIC id. However, the CAS is not converted 
properly to a boolean determining whether it succeeded or failed. The 
implicit boolean conversion is wrong, allowing multiple threads to 
increment the next processor ID. The result is that if another thread 
runs on a different APIC id, we can get IDs that are higher than the 
number of processors.
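
A minimal sketch of this class of bug, assuming a CAS that returns the
previously stored value. cmpxchg_old_value() is a hypothetical helper built
on std::atomic, not the HotSpot Atomic API, and the sentinel values passed
in by callers are likewise assumptions for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: a CAS that returns the value previously stored in dest.
int cmpxchg_old_value(std::atomic<int>& dest, int compare_value, int exchange_value) {
  int expected = compare_value;
  dest.compare_exchange_strong(expected, exchange_value);
  // On success expected is still compare_value; on failure it holds the
  // value that was actually found in dest.
  return expected;
}

// Buggy check: the returned old value is implicitly converted to bool, so
// any nonzero old value counts as "success", whether or not the CAS won.
bool claimed_buggy(std::atomic<int>& slot, int unassigned, int claiming) {
  return cmpxchg_old_value(slot, unassigned, claiming);
}

// Correct check: the CAS succeeded only if the old value equals the
// value it was compared against.
bool claimed_correct(std::atomic<int>& slot, int unassigned, int claiming) {
  return cmpxchg_old_value(slot, unassigned, claiming) == unassigned;
}
```

With nonzero sentinels, the buggy check lets every caller believe it won
the race, so each of them goes on to take a new ID.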

Bug:
https://bugs.openjdk.java.net/browse/JDK-8234509

Webrev:
http://cr.openjdk.java.net/~eosterlund/8234509/webrev.00/

Thanks,
/Erik

From david.holmes at oracle.com  Thu Nov 21 09:23:22 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 21 Nov 2019 19:23:22 +1000
Subject: RFR: 8234509: Race in macOS os::processor_id()
In-Reply-To: <5f7f7af2-6031-46c7-6155-2026802431cc@oracle.com>
References: <5f7f7af2-6031-46c7-6155-2026802431cc@oracle.com>
Message-ID: <c589131e-4d44-f473-d990-c624f823620e@oracle.com>

Hi Erik,

Looks good. Good catch!

Thanks,
David

On 21/11/2019 7:15 pm, erik.osterlund at oracle.com wrote:
> Hi,
> 
> In os::processor_id() on macOS, a CAS protects initialization of 
> processor ID number given an APIC id. However, the CAS is not converted 
> properly to a boolean determining whether it succeeded or failed. The 
> implicit boolean conversion is wrong, allowing multiple threads to 
> increment the next processor ID. The result is that if another thread 
> runs on a different APIC id, we can get IDs that are higher than the 
> number of processors.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8234509
> 
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8234509/webrev.00/
> 
> Thanks,
> /Erik

From robbin.ehn at oracle.com  Thu Nov 21 09:23:41 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 21 Nov 2019 10:23:41 +0100
Subject: RFR: 8234509: Race in macOS os::processor_id()
In-Reply-To: <5f7f7af2-6031-46c7-6155-2026802431cc@oracle.com>
References: <5f7f7af2-6031-46c7-6155-2026802431cc@oracle.com>
Message-ID: <bce09511-6296-210e-da39-86486aa17924@oracle.com>

Looks good, thanks, Robbin

On 2019-11-21 10:15, erik.osterlund at oracle.com wrote:
> Hi,
> 
> In os::processor_id() on macOS, a CAS protects initialization of processor ID 
> number given an APIC id. However, the CAS is not converted properly to a boolean 
> determining whether it succeeded or failed. The implicit boolean conversion is 
> wrong, allowing multiple threads to increment the next processor ID. The result 
> is that if another thread runs on a different APIC id, we can get IDs that are 
> higher than the number of processors.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8234509
> 
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8234509/webrev.00/
> 
> Thanks,
> /Erik

From erik.osterlund at oracle.com  Thu Nov 21 09:31:44 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Thu, 21 Nov 2019 10:31:44 +0100
Subject: RFR: 8234509: Race in macOS os::processor_id()
In-Reply-To: <bce09511-6296-210e-da39-86486aa17924@oracle.com>
References: <5f7f7af2-6031-46c7-6155-2026802431cc@oracle.com>
 <bce09511-6296-210e-da39-86486aa17924@oracle.com>
Message-ID: <f7d40111-c18c-79e1-7584-87e99c0cf9e5@oracle.com>

Hi Robbin,

Thanks for the review!

/Erik

On 11/21/19 10:23 AM, Robbin Ehn wrote:
> Looks good, thanks, Robbin
>
> On 2019-11-21 10:15, erik.osterlund at oracle.com wrote:
>> Hi,
>>
>> In os::processor_id() on macOS, a CAS protects initialization of 
>> processor ID number given an APIC id. However, the CAS is not 
>> converted properly to a boolean determining whether it succeeded or 
>> failed. The implicit boolean conversion is wrong, allowing multiple 
>> threads to increment the next processor ID. The result is that if 
>> another thread runs on a different APIC id, we can get IDs that are 
>> higher than the number of processors.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8234509
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8234509/webrev.00/
>>
>> Thanks,
>> /Erik


From erik.osterlund at oracle.com  Thu Nov 21 09:31:28 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Thu, 21 Nov 2019 10:31:28 +0100
Subject: RFR: 8234509: Race in macOS os::processor_id()
In-Reply-To: <c589131e-4d44-f473-d990-c624f823620e@oracle.com>
References: <5f7f7af2-6031-46c7-6155-2026802431cc@oracle.com>
 <c589131e-4d44-f473-d990-c624f823620e@oracle.com>
Message-ID: <177e70da-5f4b-d29f-6a9d-5ee422368525@oracle.com>

Hi David,

Thanks for the review!

/Erik

On 11/21/19 10:23 AM, David Holmes wrote:
> Hi Erik,
>
> Looks good. Good catch!
>
> Thanks,
> David
>
> On 21/11/2019 7:15 pm, erik.osterlund at oracle.com wrote:
>> Hi,
>>
>> In os::processor_id() on macOS, a CAS protects initialization of 
>> processor ID number given an APIC id. However, the CAS is not 
>> converted properly to a boolean determining whether it succeeded or 
>> failed. The implicit boolean conversion is wrong, allowing multiple 
>> threads to increment the next processor ID. The result is that if 
>> another thread runs on a different APIC id, we can get IDs that are 
>> higher than the number of processors.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8234509
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8234509/webrev.00/
>>
>> Thanks,
>> /Erik


From per.liden at oracle.com  Thu Nov 21 09:39:15 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 21 Nov 2019 10:39:15 +0100
Subject: RFR: 8234509: Race in macOS os::processor_id()
In-Reply-To: <5f7f7af2-6031-46c7-6155-2026802431cc@oracle.com>
References: <5f7f7af2-6031-46c7-6155-2026802431cc@oracle.com>
Message-ID: <e26a181c-82a4-fddc-d688-605d1f523ff5@oracle.com>

Looks good!

/Per

On 11/21/19 10:15 AM, erik.osterlund at oracle.com wrote:
> Hi,
> 
> In os::processor_id() on macOS, a CAS protects initialization of 
> processor ID number given an APIC id. However, the CAS is not converted 
> properly to a boolean determining whether it succeeded or failed. The 
> implicit boolean conversion is wrong, allowing multiple threads to 
> increment the next processor ID. The result is that if another thread 
> runs on a different APIC id, we can get IDs that are higher than the 
> number of processors.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8234509
> 
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8234509/webrev.00/
> 
> Thanks,
> /Erik

From erik.osterlund at oracle.com  Thu Nov 21 09:46:11 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Thu, 21 Nov 2019 10:46:11 +0100
Subject: RFR: 8234509: Race in macOS os::processor_id()
In-Reply-To: <e26a181c-82a4-fddc-d688-605d1f523ff5@oracle.com>
References: <5f7f7af2-6031-46c7-6155-2026802431cc@oracle.com>
 <e26a181c-82a4-fddc-d688-605d1f523ff5@oracle.com>
Message-ID: <05b7ad58-76a9-9db3-049d-b5b05bc2c8cf@oracle.com>

Hi Per,

Thanks for the review!

/Erik

On 11/21/19 10:39 AM, Per Liden wrote:
> Looks good!
>
> /Per
>
> On 11/21/19 10:15 AM, erik.osterlund at oracle.com wrote:
>> Hi,
>>
>> In os::processor_id() on macOS, a CAS protects initialization of 
>> processor ID number given an APIC id. However, the CAS is not 
>> converted properly to a boolean determining whether it succeeded or 
>> failed. The implicit boolean conversion is wrong, allowing multiple 
>> threads to increment the next processor ID. The result is that if 
>> another thread runs on a different APIC id, we can get IDs that are 
>> higher than the number of processors.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8234509
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8234509/webrev.00/
>>
>> Thanks,
>> /Erik


From david.buck at oracle.com  Thu Nov 21 11:34:18 2019
From: david.buck at oracle.com (David Buck)
Date: Thu, 21 Nov 2019 20:34:18 +0900
Subject: RFR (S): 8230611: infinite loop in
 LogOutputList::wait_until_no_readers()
In-Reply-To: <E5C3888C-D95C-48EE-94E3-2AA9C5402730@oracle.com>
References: <567531cb-6171-d750-2789-7f31e6a01ed6@oracle.com>
 <E5C3888C-D95C-48EE-94E3-2AA9C5402730@oracle.com>
Message-ID: <d9fd2eab-4592-6b75-d4ab-b722898c7093@oracle.com>

Hi Kim!

Thank you for taking a look at this!

I agree with you on all counts. I have addressed all of your concerns 
and have retested. Here is the new webrev:

http://cr.openjdk.java.net/~dbuck/8230611_ver01/

 > If assignments where the _list is the same for both is a common case,
 > it might be worthwhile to avoid the counter manipulation altogether in
 > that case, e.g. (2)

The copy assignment operator never seems to be used in our current 
logging code. But I suppose (2) seems like a better choice as I assume 
it is at least not *unlikely* that future code would do assignment 
between iterators of the same _list.

 > I think the move of the call to increase_readers() from
 > LogOutputList::iterator to the Iterator constructor is potentially a
 > mistake. I think the counter protection should be increased before the
 > node is obtained from the _level_start array. It might be that there
 > are reasons why the proposed change will work, but I think the
 > original code is safer and easier to analyze.

You were right to worry. Looking at the code now, it seems clear to me 
that the intent was to make sure that the reader count was incremented 
*before* any fields were initialized. My change breaks that and made a 
nasty race possible. I have returned the call to increase_readers() back 
to where it originally was. I have also modified the new copy ctor to 
make sure that the count is incremented before the 2 fields. Finally, as 
a small modification of your suggested copy assignment operator, I do 
not decrement the reader count for the lhs _list until its value has 
been overwritten. (I do understand that that last one may be a bit 
overkill...)

Please let me know what you think and thanks again for all the great 
feedback.

Cheers,
-Buck


On 2019/11/21 16:13, Kim Barrett wrote:
>> On Nov 20, 2019, at 11:51 AM, David Buck <david.buck at oracle.com> wrote:
>>
>> Hi!
>>
>> May I please get a review of this small fix:
>>
>> bug report: https://bugs.openjdk.java.net/browse/JDK-8230611
>> proposed fix: http://cr.openjdk.java.net/~dbuck/8230611_ver00/
>>
>> Cheers,
>> -Buck
> 
> Note that RVO becomes mandatory in C++17. Not that this helps us
> today. Though I'm somewhat surprised that any even vaguely recent
> compiler would fail to do it.
> 
> (Side comment on this code: yet another SMR mechanism.)
> 
> The proposed operator= has some problems:
> 
> (1) Self-assignment bug. Self-assignment is rarely an issue in
> practice, but if it costs nothing to handle, then one should do so.
> The problem is that decreasing the counter before increasing it could
> allow an existing waiter to proceed. Just do the increase first.
> 
> (2) operator= should return *this unless there's a good reason not to.
> 
> (3) There's no reason for the rhs argument to be non-const.
> 
> (4) The indentation is not HotSpot's normal 2-space.
> 
> So a better definition would be (1)
> 
> Iterator& operator=(const Iterator& rhs) {
>    _current = rhs._current;
>    rhs._list->increase_readers();
>    _list->decrease_readers();
>    _list = rhs._list;
>    return *this;
> }
> 
> If assignments where the _list is the same for both is a common case,
> it might be worthwhile to avoid the counter manipulation altogether in
> that case, e.g. (2)
> 
> Iterator& operator=(const Iterator& rhs) {
>    _current = rhs._current;
>    if (_list != rhs._list) {
>      rhs._list->increase_readers();
>      _list->decrease_readers();
>      _list = rhs._list;
>    }
>    return *this;
> }
> 
> I think the copy-swap idiom probably isn't the way to go here.
> Compared to (2) it has fewer counter mods for the rvalue with
> different lists, but more for the lvalue with the same list. It is
> always at least as good as (1) though. But to really minimize counter
> manipulations one wants move-construct and move-assign to handle the
> rvalue case, and we don't have those yet.
> 
> https://stackoverflow.com/questions/3279543/what-is-the-copy-and-swap-idiom/3279550#3279550
> 
> I think the move of the call to increase_readers() from
> LogOutputList::iterator to the Iterator constructor is potentially a
> mistake. I think the counter protection should be increased before the
> node is obtained from the _level_start array. It might be that there
> are reasons why the proposed change will work, but I think the
> original code is safer and easier to analyze.
> 

-- 
External Email Recipient Confirmation Process - Oracle Internal Only

From robbin.ehn at oracle.com  Thu Nov 21 11:50:21 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 21 Nov 2019 12:50:21 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
Message-ID: <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>

Hi,

Here is v3:

Full:
http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
Inc:
http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/

Tested t1-3

Thanks, Robbin

On 2019-11-19 12:05, Robbin Ehn wrote:
> Hi all, please review.
> 
> CMS was the last real user of the more advanced features of VM operation.
> VM operation can be simplified to always be a stack object and thus either be
> of safepoint or no safepoint type.
> 
> VM_EnableBiasedLocking is executed once by watcher thread, if needed (default 
> not used). Making it synchronous doesn't matter.
> VM_ThreadStop is executed by a JavaThread, that thread should stop for the 
> safepoint anyways, no real point in not stopping directly.
> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, the VM op is 
> not needed. Arguably this thread should actually stop here, since we are about 
> to safepoint.
> 
> There is also a small cleanup in vmThread.cpp where an unused method is removed.
> And the extra safepoint is removed:
> "// We want to make sure that we get to a safepoint regularly"
> No we don't :)
> 
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8234086
> Change-set:
> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
> 
> Tested scavenge manually, passes t1-2.
> 
> Thanks, Robbin

From christoph.langer at sap.com  Thu Nov 21 13:18:57 2019
From: christoph.langer at sap.com (Langer, Christoph)
Date: Thu, 21 Nov 2019 13:18:57 +0000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <6209301e-ae85-2a91-7d9e-c9096581365d@oracle.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <6209301e-ae85-2a91-7d9e-c9096581365d@oracle.com>
Message-ID: <AM6PR02MB48010AA9F7B5B16B24058B288A4E0@AM6PR02MB4801.eurprd02.prod.outlook.com>

Hi Alan,

thanks for the review. I'll push it then after running through jdk-submit.

/Christoph

> -----Original Message-----
> From: Alan Bateman <Alan.Bateman at oracle.com>
> Sent: Donnerstag, 21. November 2019 09:51
> To: Langer, Christoph <christoph.langer at sap.com>; core-libs-
> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR: 8234185: Cleanup usage of canonicalize function between
> libjava, hotspot and libinstrument
> 
> On 14/11/2019 15:37, Langer, Christoph wrote:
> > Hi,
> >
> > please review this cleanup change regarding function "canonicalize" of
> libjava.
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
> > Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
> >
> >
> > The goal is to cleanup how this function is defined and used. One thing is,
> that there was an unnecessary wrapper function "Canonicalize" in jni_util.c.
> It wrapped the call to "canonicalize". We can get rid of this wrapper.
> Unfortunately, it is not possible to just export "canonicalize" since this will
> conflict with a method signature from the math library, at least on modern
> Linuxes. So I decided to call the method JDK_Canonicalize and will correctly
> define it in jdk_util.h which can be included everywhere.
> >
> I think this change is okay. My main concern when initially seeing this
> go by was that it would leak the \\?\ or \\?\UNC\ prefix into the
> canonical File when it wasn't there previously, this would of course
> have several implications. But I think you have it right and this is, as
> you position, just refactoring/cleanup.
> 
> -Alan

From daniel.daugherty at oracle.com  Thu Nov 21 15:14:44 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 21 Nov 2019 10:14:44 -0500
Subject: RFR(T): 8234544: ObjectSynchronizer::FastHashCode() cleanups from
 Async Monitor Deflation project
Message-ID: <373c3a4d-6cf7-1c6c-7578-eec57efd704e@oracle.com>

Greetings,

I have another round of baseline cleanup changes from the Async Monitor
Deflation project (8153224). Only comments have been changed with this
one so I'm considering this changeset to be trivial.

Please see the bug for details about the changes in this webrev:

JDK-8234544 ObjectSynchronizer::FastHashCode() cleanups from Async
                 Monitor Deflation project
     https://bugs.openjdk.java.net/browse/JDK-8234544

Here's the webrev URL:

http://cr.openjdk.java.net/~dcubed/8234544-webrev/0-for-jdk14/

These changes have been tested with a Mach5 Tier[1-3] just because
I'm paranoid. :-)

Thanks, in advance, for any comments, questions or suggestions.

Dan

From yumin.qi at gmail.com  Thu Nov 21 17:12:28 2019
From: yumin.qi at gmail.com (yumin qi)
Date: Thu, 21 Nov 2019 09:12:28 -0800
Subject: RFR 8234270: [REDO] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <7b82c213-35e9-1aef-c3c4-06fa8dec0d13@redhat.com>
References: <7b82c213-35e9-1aef-c3c4-06fa8dec0d13@redhat.com>
Message-ID: <CAOEheN7K+mL96yyTb9nZ+zW+R0yUpuzHWq3bCcAeLpyTmjK06Q@mail.gmail.com>

Hi, Zhengyu

  The fix looks good to me.

Thanks
Yumin


On Wed, Nov 20, 2019 at 5:49 AM Zhengyu Gu <zgu at redhat.com> wrote:

> JDK-8204128 did not fix the original bug. But the new assertion helped to
> catch the problem, as it consistently failed in Oracle internal tests.
>
> The root cause is that, when NMT biases a resource area to compiler, it
> did not adjust tracking data to reflect that. When the biased resource
> area is released, there is a possibility that its size is greater than
> the total size recorded, which underflows a size_t counter.
>
> The JDK-8204128 patch also missed a long to ssize_t parameter type change,
> which resulted in a new test failure on Windows, because long is 4 bytes on
> Windows.
>
> Many thanks to Leonid Mesnik, who helped to run this patch through
> Oracle's internal stress tests.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234270
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/index.html
>
>
> Test:
>    hotspot_nmt
>    Submit test
>    Oracle internal stress tests.
>
>
> Thanks,
>
> -Zhengyu
>
>
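
The underflow described in the quoted text can be illustrated with a
minimal C++ sketch. Everything here is an assumption made for
illustration: the Tracker type, the category names and the 4096-byte
size are hypothetical and are not the NMT implementation. The sketch
only shows why releasing an area under a category that never recorded
it wraps an unsigned counter, and why a 4-byte long cannot carry every
signed size delta.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical per-category byte counters; NOT the real NMT data structures.
enum Category { kOther = 0, kCompiler = 1, kNumCategories = 2 };

struct Tracker {
  std::size_t bytes[kNumCategories] = {0, 0};

  void record_alloc(Category c, std::size_t n) { bytes[c] += n; }

  // Unsigned subtraction wraps around when n > bytes[c].
  void record_free(Category c, std::size_t n) { bytes[c] -= n; }

  // Moves already-recorded bytes when an area is re-labelled.
  void move(Category from, Category to, std::size_t n) {
    bytes[from] -= n;
    bytes[to] += n;
  }
};

// Area recorded under kOther, treated as kCompiler without moving the
// recorded bytes, then released under kCompiler: the counter wraps.
std::size_t release_without_adjust() {
  Tracker t;
  t.record_alloc(kOther, 4096);
  t.record_free(kCompiler, 4096);
  return t.bytes[kCompiler];
}

// Same sequence, but the recorded bytes follow the re-labelling.
std::size_t release_with_adjust() {
  Tracker t;
  t.record_alloc(kOther, 4096);
  t.move(kOther, kCompiler, 4096);
  t.record_free(kCompiler, 4096);
  return t.bytes[kCompiler];
}

// A 4-byte 'long' (as on 64-bit Windows) cannot hold every size delta
// that an 8-byte signed type can.
bool fits_in_4_byte_long(std::int64_t delta) {
  return delta >= std::numeric_limits<std::int32_t>::min() &&
         delta <= std::numeric_limits<std::int32_t>::max();
}
```

In this sketch release_without_adjust() leaves a huge wrapped value in
the compiler counter, while release_with_adjust() leaves it at zero.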

From zgu at redhat.com  Thu Nov 21 17:13:39 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 21 Nov 2019 12:13:39 -0500
Subject: RFR 8234270: [REDO] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <CAOEheN7K+mL96yyTb9nZ+zW+R0yUpuzHWq3bCcAeLpyTmjK06Q@mail.gmail.com>
References: <7b82c213-35e9-1aef-c3c4-06fa8dec0d13@redhat.com>
 <CAOEheN7K+mL96yyTb9nZ+zW+R0yUpuzHWq3bCcAeLpyTmjK06Q@mail.gmail.com>
Message-ID: <d0ed67ac-6000-c7c5-4b16-bcaec9a47c3e@redhat.com>

Thanks, Yumin

-Zhengyu

On 11/21/19 12:12 PM, yumin qi wrote:
> Hi, Zhengyu
> 
>    The fix looks good to me.
> 
> Thanks
> Yumin
> 
> 
> 
> On Wed, Nov 20, 2019 at 5:49 AM Zhengyu Gu <zgu at redhat.com 
> <mailto:zgu at redhat.com>> wrote:
> 
>     JDK-8204128 did not fix the original bug. But the new assertion helped to
>     catch the problem, as it consistently failed in Oracle internal tests.
> 
>     The root cause is that, when NMT biases a resource area to compiler, it
>     did not adjust tracking data to reflect that. When the biased resource
>     area is released, there is a possibility that its size is greater than
>     the total size recorded, which underflows a size_t counter.
> 
>     The JDK-8204128 patch also missed a long to ssize_t parameter type change,
>     which resulted in a new test failure on Windows, because long is 4 bytes on
>     Windows.
> 
>     Many thanks to Leonid Mesnik, who helped to run this patch through
>     Oracle's internal stress tests.
> 
>     Bug: https://bugs.openjdk.java.net/browse/JDK-8234270
>     Webrev: http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/index.html
> 
> 
>     Test:
>         hotspot_nmt
>         Submit test
>         Oracle internal stress tests.
> 
> 
>     Thanks,
> 
>     -Zhengyu
> 


From harold.seigel at oracle.com  Thu Nov 21 18:49:09 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Thu, 21 Nov 2019 13:49:09 -0500
Subject: RFR 8234058: runtime/CompressedOops/CompressedClassPointers.java
 fails with 'Narrow klass base: 0x0000000000000000' missing from stdout/stderr
Message-ID: <2fa6f501-4783-7ae7-4c0f-4e2cfc767576@oracle.com>

Hi,

Please review this trivial change to stop running test 
CompressedClassPointers.java on Windows.  The test checks that the base 
of the compressed class region is at a specific address. But, it fails 
intermittently because ASLR sometimes causes a different address to be 
returned.

Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8234058/webrev/

JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8234058

The change was tested by checking, in Mach5, that the test was not run 
on Windows but was run on other platforms.

Thanks, Harold


From coleen.phillimore at oracle.com  Thu Nov 21 18:50:11 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Thu, 21 Nov 2019 13:50:11 -0500
Subject: RFR 8234058: runtime/CompressedOops/CompressedClassPointers.java
 fails with 'Narrow klass base: 0x0000000000000000' missing from
 stdout/stderr
In-Reply-To: <2fa6f501-4783-7ae7-4c0f-4e2cfc767576@oracle.com>
References: <2fa6f501-4783-7ae7-4c0f-4e2cfc767576@oracle.com>
Message-ID: <c1c316f5-92c1-4781-2417-d06e137c6847@oracle.com>


I think this looks good.
Coleen

On 11/21/19 1:49 PM, Harold Seigel wrote:
> Hi,
>
> Please review this trivial change to stop running test 
> CompressedClassPointers.java on Windows.  The test checks that the 
> base of the compressed class region is at a specific address. But, it 
> fails intermittently because ASLR sometimes causes a different address 
> to be returned.
>
> Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8234058/webrev/
>
> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8234058
>
> The change was tested by checking, in Mach5, that the test was not run 
> on Windows but was run on other platforms.
>
> Thanks, Harold
>


From harold.seigel at oracle.com  Thu Nov 21 18:51:04 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Thu, 21 Nov 2019 13:51:04 -0500
Subject: RFR 8234058: runtime/CompressedOops/CompressedClassPointers.java
 fails with 'Narrow klass base: 0x0000000000000000' missing from
 stdout/stderr
In-Reply-To: <c1c316f5-92c1-4781-2417-d06e137c6847@oracle.com>
References: <2fa6f501-4783-7ae7-4c0f-4e2cfc767576@oracle.com>
 <c1c316f5-92c1-4781-2417-d06e137c6847@oracle.com>
Message-ID: <d2975750-0447-0c82-6067-30ad90fba2e0@oracle.com>

Thanks Coleen!

Harold

On 11/21/2019 1:50 PM, coleen.phillimore at oracle.com wrote:
>
> I think this looks good.
> Coleen
>
> On 11/21/19 1:49 PM, Harold Seigel wrote:
>> Hi,
>>
>> Please review this trivial change to stop running test 
>> CompressedClassPointers.java on Windows.  The test checks that the 
>> base of the compressed class region is at a specific address. But, it 
>> fails intermittently because ASLR sometimes causes a different 
>> address to be returned.
>>
>> Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8234058/webrev/
>>
>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8234058
>>
>> The change was tested by checking, in Mach5, that the test was not 
>> run on Windows but was run on other platforms.
>>
>> Thanks, Harold
>>
>

From kim.barrett at oracle.com  Thu Nov 21 19:08:39 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 21 Nov 2019 14:08:39 -0500
Subject: RFR (S): 8230611: infinite loop in
 LogOutputList::wait_until_no_readers()
In-Reply-To: <d9fd2eab-4592-6b75-d4ab-b722898c7093@oracle.com>
References: <567531cb-6171-d750-2789-7f31e6a01ed6@oracle.com>
 <E5C3888C-D95C-48EE-94E3-2AA9C5402730@oracle.com>
 <d9fd2eab-4592-6b75-d4ab-b722898c7093@oracle.com>
Message-ID: <1B26B3EF-7148-41FB-B26F-29EEA94A01BD@oracle.com>

> On Nov 21, 2019, at 6:34 AM, David Buck <david.buck at oracle.com> wrote:
> > I think the move of the call to increase_readers() from
> > LogOutputList::iterator to the Iterator constructor is potentially a
> > mistake. I think the counter protection should be increased before the
> > node is obtained from the _level_start array. It might be that there
> > are reasons why the proposed change will work, but I think the
> > original code is safer and easier to analyze.
> 
> You were right to worry. Looking at the code now, it seems clear to me that the intent was to make sure that the reader count was incremented *before* any fields were initialized. My change breaks that and made a nasty race possible.

I don't think it's quite that simple.  The invariant that must be maintained is that the
count can't drop to zero while there are still references (iterators) that might be used.

> I have returned the call to increase_readers() back to where it originally was.

Good.

> I have also modified the new copy ctor to make sure that the count is incremented before the 2 fields.

No, don't do that.  Your ver00 copy ctor was fine.  The counted reference from the itr
argument ensures the count can't drop to zero before the increment by the copy ctor.

> Finally, as a small modification of your suggested copy assignment operator, I do not decrement the reader count for the lhs _list until its value has been overwritten. (I do understand that that last one may be a bit overkill?)

It's nodes that are being protected, not lists.
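
To make the invariant concrete, here is a minimal C++ sketch of a
reader-counted iterator. It is an illustration under stated
assumptions, not the LogOutputList code: CountedList, Node, readers()
and counts_stay_balanced() are hypothetical names, and std::atomic
stands in for HotSpot's own Atomic class. The factory function takes
the count before reading the start node, the copy ctor uses an
initializer list and increments in its body (safe because the source
iterator already holds a counted reference), and assignment uses the
copy-and-swap idiom from the Stack Overflow link earlier in the thread.

```cpp
#include <atomic>
#include <utility>

// Hypothetical list node; stands in for whatever the iterator walks.
struct Node { int value; Node* next; };

class CountedList {
  std::atomic<unsigned> _readers{0};
  Node* _head;

public:
  explicit CountedList(Node* head) : _head(head) {}
  unsigned readers() const { return _readers.load(); }

  class Iterator {
    friend class CountedList;
    CountedList* _list;
    Node* _current;

    // Private: only CountedList::iterator() calls this, after it has
    // already taken the reader count.
    Iterator(CountedList* list, Node* start) : _list(list), _current(start) {}

  public:
    // 'other' holds a counted reference for as long as it is alive, so
    // the count cannot reach zero before the increment below.
    Iterator(const Iterator& other)
      : _list(other._list), _current(other._current) {
      _list->_readers.fetch_add(1);
    }

    // Copy-and-swap: the by-value parameter takes a new count, and its
    // destructor releases the count this iterator held before.
    Iterator& operator=(Iterator other) {
      std::swap(_list, other._list);
      std::swap(_current, other._current);
      return *this;
    }

    ~Iterator() { _list->_readers.fetch_sub(1); }

    Node* node() const { return _current; }
    void next() { _current = _current->next; }
  };

  Iterator iterator() {
    _readers.fetch_add(1);          // take the count first ...
    return Iterator(this, _head);   // ... then read the start node
  }
};

// Exercises construction, copy, assignment and destruction, and checks
// that the reader count matches the number of live iterators.
bool counts_stay_balanced() {
  Node second{2, nullptr};
  Node first{1, &second};
  CountedList list(&first);
  bool ok = true;
  {
    CountedList::Iterator a = list.iterator();
    ok = ok && list.readers() == 1;
    CountedList::Iterator b(a);
    ok = ok && list.readers() == 2;
    b.next();
    b = a;
    ok = ok && list.readers() == 2 && b.node() == &first;
  }
  return ok && list.readers() == 0;
}
```

The count in this sketch is attached to the list object only for
brevity; what it protects is the nodes reachable through live
iterators.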


From kim.barrett at oracle.com  Thu Nov 21 19:50:19 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 21 Nov 2019 14:50:19 -0500
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <42c8c4ea-518a-b1ab-d1f5-e7c8ec92d1ce@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <dfff2121-2f6b-7407-812e-7608e145af8c@oracle.com>
 <d263d1ec-ee83-c1bf-c945-76089e5ca865@oracle.com>
 <29CF8C53-D058-4527-8866-1B1D4DB9998A@oracle.com>
 <307b86df-eef7-66ba-993c-be96c8eb90ec@oracle.com>
 <9907AB77-F0CF-4497-9258-D386DE6DAFCB@oracle.com>
 <42c8c4ea-518a-b1ab-d1f5-e7c8ec92d1ce@oracle.com>
Message-ID: <4A43C3E0-E19B-451A-8AB8-1E3EEA2ECB83@oracle.com>

> On Nov 21, 2019, at 3:00 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
> On 2019-11-20 21:08, Kim Barrett wrote:
>>> On Nov 20, 2019, at 4:38 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>>> On 11/19/19 4:55 PM, Kim Barrett wrote:
>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>> 1006   if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, &ForceMonitorScavenge) == 0) {
>>>> 1007     VMThread::check_cleanup();
>>>> I was going to make a couple nit-pick comments here, but this seems to
>>>> be soon to be dead code (only reached when deprecated MonitorBound has
>>>> a non-default value).
>>>> Not sure it's worth changing this and adding check_cleanup just for
>>>> use here though, given this code is going away soon. Are there planned
>>>> future uses of check_cleanup?
>>> 
>>> No future plans, no.
>>> Are you suggesting doing a synchronous safepoint here instead?
>> I think I (and maybe David too) need to better understand what is being done here and why.
> 
> Sorry for not making that clear, we have this comment explaining the situation:
> 
>      const int mx = MonitorBound;
>      if (mx > 0 && (g_om_population-g_om_free_count) > mx) {
>        // Not enough ObjectMonitors on the global free list.
>        // We can't safely induce a STW safepoint from om_alloc() as our thread
>        // state may not be appropriate for such activities and callers may hold
>        // naked oops, so instead we defer the action.
>        InduceScavenge(self, "om_alloc");
>      }

I see; this is a place where we were using the async VMOp feature and
still (according to the comment) can't do a synchronous safepoint
here.  So a bit of somewhat new code to cope with this.  And since we
only get here when MonitorBound is exceeded, and that's already
deprecated (JDK-8230938) and slated for removal in jdk15, that new
code will be going away soonish.  But you want to make progress on
this change now, rather than waiting for that "soonish".  OK.

> I'm not sure why we only do this if we try to take a monitor from global free list. If free list is empty we just allocate instead.

No idea; seems like while the global free list is empty we can just continue
allocating, without regard to MonitorBound.  Though presumably we'll
eventually safepoint and check for cleanup using the other mechanism.
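
The quoted synchronizer.cpp line pairs a plain load with an atomic
exchange, so that only one thread ends up requesting the cleanup while
the flag is set. A minimal C++ sketch of that pattern follows. The
names CleanupRequest, try_request() and reset() are hypothetical,
std::atomic is used in place of HotSpot's Atomic class, and the reset
step is an assumption, since the thread does not show where the flag
is cleared.

```cpp
#include <atomic>

// Hypothetical stand-in for a flag such as ForceMonitorScavenge.
struct CleanupRequest {
  std::atomic<int> flag{0};

  // Cheap load first; only attempt the exchange if the flag looks clear.
  // Returns true for the one caller that changed the flag from 0 to 1.
  bool try_request() {
    return flag.load() == 0 && flag.exchange(1) == 0;
  }

  // Assumed to be called once the request has been serviced.
  void reset() { flag.store(0); }
};

// Counts how many of three attempts succeed when the flag is reset
// once, after the second attempt.
int successful_requests() {
  CleanupRequest r;
  int wins = 0;
  if (r.try_request()) ++wins;  // flag goes 0 -> 1
  if (r.try_request()) ++wins;  // flag already 1, no exchange attempted
  r.reset();
  if (r.try_request()) ++wins;  // flag goes 0 -> 1 again
  return wins;
}
```

The leading load keeps callers that lose the race from issuing the
exchange, which is a write to the flag, on every call.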


From kim.barrett at oracle.com  Thu Nov 21 19:53:45 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 21 Nov 2019 14:53:45 -0500
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
Message-ID: <12710202-8213-480F-A0EC-6CBEB13D7284@oracle.com>

> On Nov 21, 2019, at 6:50 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
> 
> Hi,
> 
> Here is v3:
> 
> Full:
> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
> Inc:
> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
> 
> Tested t1-3

Looks good to me.  But I'm not an expert in this area of the code and
it's pretty critical code, so you might want to seek at least a third
reviewer.  (Assuming David is also reviewing.)


From david.buck at oracle.com  Thu Nov 21 20:24:16 2019
From: david.buck at oracle.com (David Buck)
Date: Fri, 22 Nov 2019 05:24:16 +0900
Subject: RFR (S): 8230611: infinite loop in
 LogOutputList::wait_until_no_readers()
In-Reply-To: <1B26B3EF-7148-41FB-B26F-29EEA94A01BD@oracle.com>
References: <567531cb-6171-d750-2789-7f31e6a01ed6@oracle.com>
 <E5C3888C-D95C-48EE-94E3-2AA9C5402730@oracle.com>
 <d9fd2eab-4592-6b75-d4ab-b722898c7093@oracle.com>
 <1B26B3EF-7148-41FB-B26F-29EEA94A01BD@oracle.com>
Message-ID: <b9e782fa-41f6-3857-ceb9-5264bc394c40@oracle.com>

Hi Kim!

Thanks again for helping me to get this fix in shape.

Once again, I agree with everything you said. I have returned the copy 
ctor to my first version (using an initializer list), and have modified 
the copy assignment so that it now exactly matches your number 2 
suggestion from your previous response.

Here is a new (and hopefully final) webrev:

http://cr.openjdk.java.net/~dbuck/8230611_ver02/

I am still running the sanity check builds / tests in the background. 
But seeing how the copy ctor is simply a reversion back to the already 
thoroughly-tested version from earlier, and that the copy assignment 
override is dead code in our builds, I do not expect any surprises.

Does the latest webrev look good? Any other thoughts or feedback?

Cheers,
-Buck

On 2019/11/22 4:08, Kim Barrett wrote:
>> On Nov 21, 2019, at 6:34 AM, David Buck <david.buck at oracle.com> wrote:
>>> I think the move of the call to increase_readers() from
>>> LogOutputList::iterator to the Iterator constructor is potentially a
>>> mistake. I think the counter protection should be increased before the
>>> node is obtained from the _level_start array. It might be that there
>>> are reasons why the proposed change will work, but I think the
>>> original code is safer and easier to analyze.
>>
>> You were right to worry. Looking at the code now, it seems clear to me that the intent was to make sure that the reader count was incremented *before* any fields were initialized. My change breaks that and made a nasty race possible.
> 
> I don't think it's quite that simple.  The invariant that must be maintained is that the
> count can't drop to zero while there are still references (iterators) that might be used.
> 
>> I have returned the call to increase_readers() back to where it originally was.
> 
> Good.
> 
>> I have also modified the new copy ctor to make sure that the count is incremented before the 2 fields.
> 
> No, don't do that.  Your ver00 copy ctor was fine.  The counted reference from the itr
> argument ensures the count can't drop to zero before the increment by the copy ctor.
> 
>> Finally, as a small modification of your suggested copy assignment operator, I do not decrement the reader count for the lhs _list until its value has been overwritten. (I do understand that that last one may be a bit overkill?)
> 
> It's nodes that are being protected, not lists.
> 


From kim.barrett at oracle.com  Thu Nov 21 20:37:29 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 21 Nov 2019 15:37:29 -0500
Subject: RFR (S): 8230611: infinite loop in
 LogOutputList::wait_until_no_readers()
In-Reply-To: <b9e782fa-41f6-3857-ceb9-5264bc394c40@oracle.com>
References: <567531cb-6171-d750-2789-7f31e6a01ed6@oracle.com>
 <E5C3888C-D95C-48EE-94E3-2AA9C5402730@oracle.com>
 <d9fd2eab-4592-6b75-d4ab-b722898c7093@oracle.com>
 <1B26B3EF-7148-41FB-B26F-29EEA94A01BD@oracle.com>
 <b9e782fa-41f6-3857-ceb9-5264bc394c40@oracle.com>
Message-ID: <FD1B0A90-DE51-43B4-9CDB-2E28617AD28A@oracle.com>

> On Nov 21, 2019, at 3:24 PM, David Buck <david.buck at oracle.com> wrote:
> 
> Hi Kim!
> 
> Thanks again for helping me to get this fix in shape.
> 
> Once again, I agree with everything you said. I have returned the copy ctor to my first version (using an initializer list), and have modified the copy assignment so that it now exactly matches your number 2 suggestion from your previous response.
> 
> Here is a new (and hopefully final) webrev:
> 
> http://cr.openjdk.java.net/~dbuck/8230611_ver02/
> 
> I am still running the sanity check builds / tests in the background. But seeing how the copy ctor is simply a reversion back to the already thoroughly-tested version from earlier, and that the copy assignment override is dead code in our builds, I do not expect any surprises.
> 
> Does the latest webrev look good? Any other thoughts or feedback?

Looks good.


From david.holmes at oracle.com  Thu Nov 21 21:31:52 2019
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 22 Nov 2019 07:31:52 +1000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <HE1PR0201MB24756A4232EFC1AFAA26BD7D9A4C0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <fe549cc9-fba7-9a15-eed6-832717acdee0@oracle.com>
 <HE1PR0201MB24756A4232EFC1AFAA26BD7D9A4C0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
Message-ID: <34110470-9006-072b-d88c-22f145dee363@oracle.com>

On 20/11/2019 2:51 am, Doerr, Martin wrote:
> Hi Dan,
> 
>> As for the whole "too soon to deprecate" discussion: Deprecation is not
>> making the code obsolete so this changeset is not taking anything away
>> other than changing the default of UseBiasedLocking from true to false.
>> There are things that have been deprecated since JDK8 and they still
>> have not yet been made obsolete.
> 
> I think deprecating before publishing an evaluation or at least having a discussion is not appropriate.

Deprecation shows the intent that we (eventually) want to remove this 
and that people should try to avoid using it. If we don't actually 
deprecate it but just turn it off, then here is a likely scenario:

- we turn off BL in 14
- customer updates to 14 sees a performance issue, checks the release 
notes, sees BL is disabled and turns it back on.
- customer continues on their merry way and feels no need to report back 
to OpenJDK that they need BL (even if we ask them to via release notes)
- we get no feedback that BL is still useful and so we deprecate it in, 
say, 16
- customer updates to 16 and gets the deprecation warning and then 
reports back that they need BL

Alternatively we deprecate in 14 and customer lets us know straight away 
that it is still useful.

Cheers,
David
-----

>> Deprecating biased locking is the proper way of saying that we (Oracle)
>> and/or others think that biased locking should/will go away in a future
>> release. Yes, there are locking experts outside of Oracle that have said
>> that biased locking should go away, but I haven't gotten permission to
>> quote the folks (yet)...
> 
> There should be consent on the direction of possibly removing it before communicating it the hard way.
> However, switching it off for evaluation sounds feasible to me.
> Seems like we have some homework, too.
> 
> Thanks, Patricio, for going the JEP way. I think changes with less impact have already been handled as JEP.
> 
> Best regards,
> Martin
> 
> 
>> -----Original Message-----
>> From: hotspot-runtime-dev <hotspot-runtime-dev-
>> bounces at openjdk.java.net> On Behalf Of Daniel D. Daugherty
>> Sent: Dienstag, 19. November 2019 00:06
>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; hotspot-runtime-
>> dev at openjdk.java.net
>> Subject: Re: RFR 8231264: Disable biased-locking and deprecate all flags
>> related to biased-locking
>>
>> Hi Patricio,
>>
>> On 11/15/19 9:15 PM, Patricio Chilano wrote:
>>> Hi all,
>>>
>>> Could you review the following patch?
>>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8231264
>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8231264/v01/webrev
>>
>> src/hotspot/share/runtime/arguments.cpp
>>     Is it too early to specify the obsolete_in and expired_in values?
>>     They could be JDK_Version::undefined() so that all you are doing
>>     is deprecation in this changeset.
>>
>> src/hotspot/share/runtime/globals.hpp
>>     No comments.
>>
>> test/hotspot/gtest/oops/test_markWord.cpp
>>     L96:     // Can't test this with biased locking disabled.
>>         Perhaps (since the comment is inside the if-statement):
>>              // This sub-test requires biased locking to be enabled.
>>
>>     L11[135] - Why indent the pre-processor controls? Left most
>>         column is generally the style used.
>>
>>     L115:   // Same thread tries to lock it again.
>>         This comment needs a rewrite. Perhaps:
>>             // Lock the object using an ObjectLocker helper which
>>             // will revoke the bias if we happened to use that
>>             // mechanism above.
>>
>>     L121:   // This is no longer biased, because ObjectLocker revokes the bias.
>>         This comment needs a rewrite. Perhaps:
>>             // The object should be unlocked with no hashCode at
>>             // this point (ObjectLocker dtr has run).
>>
>> test/jdk/jdk/jfr/event/runtime/TestBiasedLockRevocationEvents.java
>>     No comments.
>>
>> Thumbs up! My comments are mostly nits so I don't need to see a new
>> webrev if you decide to make changes based on my suggestions.
>>
>> As for the whole "too soon to deprecate" discussion: Deprecation is not
>> making the code obsolete so this changeset is not taking anything away
>> other than changing the default of UseBiasedLocking from true to false.
>> There are things that have been deprecated since JDK8 and they still
>> have not yet been made obsolete.
>>
>> Deprecating biased locking is the proper way of saying that we (Oracle)
>> and/or others think that biased locking should/will go away in a future
>> release. Yes, there are locking experts outside of Oracle that have said
>> that biased locking should go away, but I haven't gotten permission to
>> quote the folks (yet)...
>>
>> Deprecation is not final. Features can be un-deprecated if some
>> relevant facts and/or info changes the previous conclusion.
>>
>> Dan
>>
>>
>>
>>>
>>> Biased locking will be disabled by default and all related flags will
>>> be deprecated. Performance gains seen when the feature was introduced
>>> in the VM are less clear today with modern Java code/processors.
>>> Detailed rationale behind the change is included on the description of
>>> the bug.
>>>
>>> I modified test gtest/oops/test_markWord.cpp so that it still
>>> exercises other cases of markword printing.
>>>
>>> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows
>>> and Solaris).
>>>
>>> Thanks,
>>> Patricio
>>>
>>>
> 

From david.holmes at oracle.com  Thu Nov 21 22:37:22 2019
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 22 Nov 2019 08:37:22 +1000
Subject: RFR (S): 8230611: infinite loop in
 LogOutputList::wait_until_no_readers()
In-Reply-To: <FD1B0A90-DE51-43B4-9CDB-2E28617AD28A@oracle.com>
References: <567531cb-6171-d750-2789-7f31e6a01ed6@oracle.com>
 <E5C3888C-D95C-48EE-94E3-2AA9C5402730@oracle.com>
 <d9fd2eab-4592-6b75-d4ab-b722898c7093@oracle.com>
 <1B26B3EF-7148-41FB-B26F-29EEA94A01BD@oracle.com>
 <b9e782fa-41f6-3857-ceb9-5264bc394c40@oracle.com>
 <FD1B0A90-DE51-43B4-9CDB-2E28617AD28A@oracle.com>
Message-ID: <c246a574-de84-c25d-7bc4-f1a10495aa58@oracle.com>

On 22/11/2019 6:37 am, Kim Barrett wrote:
>> On Nov 21, 2019, at 3:24 PM, David Buck <david.buck at oracle.com> wrote:
>>
>> Hi Kim!
>>
>> Thanks again for helping me to get this fix in shape.
>>
>> Once again, I agree with everything you said. I have returned the copy ctor to my first version (using an initializer list), and have modified the copy assignment so that it now exactly matches your number 2 suggestion from your previous response.
>>
>> Here is a new (and hopefully final) webrev:
>>
>> http://cr.openjdk.java.net/~dbuck/8230611_ver02/
>>
>> I am still running the sanity check builds / tests in the background. But seeing how the copy ctor is simply a reversion back to the already thoroughly-tested version from earlier, and that the copy assignment override is dead code in our builds, I do not expect any surprises.
>>
>> Does the latest webrev look good? Any other thoughts or feedback?
> 
> Looks good.

+1

Thanks Kim for resolving all the technical nuances.

Thanks Buck for fixing this issue.

Cheers,
David


From ioi.lam at oracle.com  Thu Nov 21 22:58:27 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 21 Nov 2019 14:58:27 -0800
Subject: RFR(XS) 8234539 ArchiveRelocationTest.java failed: Archive mapping
 should always succeed
Message-ID: <719e7512-cf84-072c-3ecf-9181f4c495dd@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8234539
http://cr.openjdk.java.net/~iklam/jdk14/8234539-mapping-should-always-succeed.v01/

This bug happens only on Windows. The fix is one-line -- in order to check
whether "This is the second time we try to map the archive(s)", instead of
using (addr_delta != 0), the correct condition is (rs.is_reserved()). Please
see the bug report for details.

I also improved the log messages for when an error happens.

Thanks
- Ioi

From coleen.phillimore at oracle.com  Thu Nov 21 23:04:36 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Thu, 21 Nov 2019 18:04:36 -0500
Subject: RFR (S): 8230611: infinite loop in
 LogOutputList::wait_until_no_readers()
In-Reply-To: <c246a574-de84-c25d-7bc4-f1a10495aa58@oracle.com>
References: <567531cb-6171-d750-2789-7f31e6a01ed6@oracle.com>
 <E5C3888C-D95C-48EE-94E3-2AA9C5402730@oracle.com>
 <d9fd2eab-4592-6b75-d4ab-b722898c7093@oracle.com>
 <1B26B3EF-7148-41FB-B26F-29EEA94A01BD@oracle.com>
 <b9e782fa-41f6-3857-ceb9-5264bc394c40@oracle.com>
 <FD1B0A90-DE51-43B4-9CDB-2E28617AD28A@oracle.com>
 <c246a574-de84-c25d-7bc4-f1a10495aa58@oracle.com>
Message-ID: <63aa589b-c428-a71b-9778-7130395ffed6@oracle.com>

Buck, thanks for fixing this and Kim for your comments, and David for 
reviewing.  It looks good to me also.
Coleen

On 11/21/19 5:37 PM, David Holmes wrote:
> On 22/11/2019 6:37 am, Kim Barrett wrote:
>>> On Nov 21, 2019, at 3:24 PM, David Buck <david.buck at oracle.com> wrote:
>>>
>>> Hi Kim!
>>>
>>> Thanks again for helping me to get this fix in shape.
>>>
>>> Once again, I agree with everything you said. I have returned the 
>>> copy ctor to my first version (using an initializer list), and have 
>>> modified the copy assignment so that it now exactly matches your 
>>> number 2 suggestion from your previous response.
>>>
>>> Here is a new (and hopefully final) webrev:
>>>
>>> http://cr.openjdk.java.net/~dbuck/8230611_ver02/
>>>
>>> I am still running the sanity check builds / tests in the 
>>> background. But seeing how the copy ctor is simply a reversion back 
>>> to the already thoroughly-tested version from earlier, and that the 
>>> copy assignment override is dead code in our builds, I do not expect 
>>> any surprises.
>>>
>>> Does the latest webrev look good? Any other thoughts or feedback?
>>
>> Looks good.
>
> +1
>
> Thanks Kim for resolving all the technical nuances.
>
> Thanks Buck for fixing this issue.
>
> Cheers,
> David
>
>
>


From david.holmes at oracle.com  Fri Nov 22 02:08:40 2019
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 22 Nov 2019 12:08:40 +1000
Subject: RFR(T): 8234544: ObjectSynchronizer::FastHashCode() cleanups from
 Async Monitor Deflation project
In-Reply-To: <373c3a4d-6cf7-1c6c-7578-eec57efd704e@oracle.com>
References: <373c3a4d-6cf7-1c6c-7578-eec57efd704e@oracle.com>
Message-ID: <eb996883-827c-29a2-a8c4-7ca5b3e683a8@oracle.com>

Hi Dan,

Seems fine and trivial.

Thanks,
David

On 22/11/2019 1:14 am, Daniel D. Daugherty wrote:
> Greetings,
> 
> I have another round of baseline cleanup changes from the Async Monitor
> Deflation project (8153224). Only comments have been changed with this
> one so I'm considering this changeset to be trivial.
> 
> Please see the bug for details about the changes in this webrev:
> 
> JDK-8234544 ObjectSynchronizer::FastHashCode() cleanups from Async
>             Monitor Deflation project
>     https://bugs.openjdk.java.net/browse/JDK-8234544
> 
> Here's the webrev URL:
> 
> http://cr.openjdk.java.net/~dcubed/8234544-webrev/0-for-jdk14/
> 
> These changes have been tested with a Mach5 Tier[1-3] just because
> I'm paranoid. :-)
> 
> Thanks, in advance, for any comments, questions or suggestions.
> 
> Dan

From daniel.daugherty at oracle.com  Fri Nov 22 03:26:44 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 21 Nov 2019 22:26:44 -0500
Subject: RFR(T): 8234544: ObjectSynchronizer::FastHashCode() cleanups from
 Async Monitor Deflation project
In-Reply-To: <eb996883-827c-29a2-a8c4-7ca5b3e683a8@oracle.com>
References: <373c3a4d-6cf7-1c6c-7578-eec57efd704e@oracle.com>
 <eb996883-827c-29a2-a8c4-7ca5b3e683a8@oracle.com>
Message-ID: <5efba476-aa6a-eea6-40f4-c72fce5c59c4@oracle.com>

Thanks for the review.

Dan


On 11/21/19 9:08 PM, David Holmes wrote:
> Hi Dan,
>
> Seems fine and trivial.
>
> Thanks,
> David
>
> On 22/11/2019 1:14 am, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have another round of baseline cleanup changes from the Async Monitor
>> Deflation project (8153224). Only comments have been changed with this
>> one so I'm considering this changeset to be trivial.
>>
>> Please see the bug for details about the changes in this webrev:
>>
>> JDK-8234544 ObjectSynchronizer::FastHashCode() cleanups from Async
>>             Monitor Deflation project
>>     https://bugs.openjdk.java.net/browse/JDK-8234544
>>
>> Here's the webrev URL:
>>
>> http://cr.openjdk.java.net/~dcubed/8234544-webrev/0-for-jdk14/
>>
>> These changes have been tested with a Mach5 Tier[1-3] just because
>> I'm paranoid. :-)
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>>
>> Dan


From ioi.lam at oracle.com  Fri Nov 22 05:52:25 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 21 Nov 2019 21:52:25 -0800
Subject: RFR(XS) 8233446: Improve error handling when specified dynamic
 archive doesn't exist
Message-ID: <1decfeb9-f4eb-a577-a68d-8f5dcde01d68@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8233446
http://cr.openjdk.java.net/~iklam/jdk14/8233446-error-handling-when-dyn-archive-not-found.v01/

With this patch, error handling in CDS is the same when the static archive
or the dynamic archive cannot be opened at runtime:

-Xshare:auto (default)
    VM will continue to execute without mapping the specified but
    unavailable archive. (Same as before this patch)

-Xshare:on (deprecated)
    VM will exit with an error message


So this patch only modifies the behavior of -Xshare:on, which is deprecated
anyway. However, the code is simplified and more consistent, and will
make it easier to remove support for -Xshare:on in the future.
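
The policy described above can be sketched as follows. This is a minimal
illustration only, not the code in the webrev: ShareMode, Outcome and
on_archive_open_failure are hypothetical names, and the error text is made up.

```cpp
#include <cstdio>
#include <string>

// Hypothetical names throughout; this is not HotSpot code.
enum class ShareMode { Auto, On };
enum class Outcome { ContinueWithoutArchive, ExitWithError };

// Decide what to do when an archive cannot be opened at runtime.
// The function takes no "static vs. dynamic" argument on purpose:
// both kinds of archive follow the same policy.
Outcome on_archive_open_failure(ShareMode mode, const std::string& archive_path) {
  if (mode == ShareMode::On) {
    // -Xshare:on: report the problem and ask the caller to exit.
    std::fprintf(stderr, "Error: unable to open shared archive %s\n",
                 archive_path.c_str());
    return Outcome::ExitWithError;
  }
  // -Xshare:auto: keep running without mapping the archive.
  return Outcome::ContinueWithoutArchive;
}
```

With this shape, removing -Xshare:on later would amount to deleting one branch.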

Tested with hs-tier1/hs-tier2

Thanks
- Ioi

From david.holmes at oracle.com  Fri Nov 22 06:13:24 2019
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 22 Nov 2019 16:13:24 +1000
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
Message-ID: <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>

Hi Robbin,

On 21/11/2019 9:50 pm, Robbin Ehn wrote:
> Hi,
> 
> Here is v3:
> 
> Full:
> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/

src/hotspot/share/runtime/synchronizer.cpp

Looking at the highly discussed:

if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, 
&ForceMonitorScavenge) == 0) {

why isn't that just:

if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {

??
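
For comparison, here is a minimal sketch of the two forms using std::atomic
as a stand-in for HotSpot's Atomic class. The function names are made up for
illustration. Both forms let exactly one caller observe the 0 -> 1
transition; the first form also returns without writing when the load
already sees the flag set.

```cpp
#include <atomic>

// Stand-in for the flag discussed above (assumption: plain int flag, 0 or 1).
static std::atomic<int> force_monitor_scavenge{0};

// Form 1: read first, then unconditionally exchange.
// A caller that loads a nonzero value returns without attempting a write.
bool try_claim_load_then_xchg() {
  return force_monitor_scavenge.load() == 0 &&
         force_monitor_scavenge.exchange(1) == 0;
}

// Form 2: a single compare-and-swap from 0 to 1.
// compare_exchange_strong returns true only for the caller that made the change.
bool try_claim_cmpxchg() {
  int expected = 0;
  return force_monitor_scavenge.compare_exchange_strong(expected, 1);
}
```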

Also while we are here can we clean this up further:

static volatile int ForceMonitorScavenge = 0;

becomes

static int _forceMonitorScavenge = 0;

so the variable doesn't look like it came from globals.hpp :)

Just to be clear, I understand the changes around monitor scavenging 
now, though I'm not sure getting rid of async VM ops and replacing with 
a new way to directly wakeup the VMThread really amounts to a 
simplification.

---

src/hotspot/share/runtime/vmOperations.hpp

I still think getting rid of Mode altogether would be a good 
simplification. :)

Thanks,
David
-----


> Inc:
> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
> 
> Tested t1-3
> 
> Thanks, Robbin
> 
> On 2019-11-19 12:05, Robbin Ehn wrote:
>> Hi all, please review.
>>
>> CMS was the last real user of the more advanced features of VM
>> operation.
>> VM operation can be simplified to always be a stack object and thus
>> either be of safepoint or no safepoint type.
>>
>> VM_EnableBiasedLocking is executed once by watcher thread, if needed
>> (default not used). Making it synchronous doesn't matter.
>> VM_ThreadStop is executed by a JavaThread, that thread should stop for
>> the safepoint anyways, no real point in not stopping directly.
>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, the 
>> VM op is not needed. Arguably this thread should actually stop here, 
>> since we are about to safepoint.
>>
>> There is also a small cleanup in vmThread.cpp where an unused method 
>> is removed.
>> And the extra safepoint is removed:
>> "// We want to make sure that we get to a safepoint regularly"
>> No we don't :)
>>
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8234086
>> Change-set:
>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>
>> Tested scavenge manually, passes t1-2.
>>
>> Thanks, Robbin

From kim.barrett at oracle.com  Fri Nov 22 07:23:43 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Fri, 22 Nov 2019 02:23:43 -0500
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
Message-ID: <15F8277A-54A3-400D-9D08-155A8986D3C9@oracle.com>

> On Nov 22, 2019, at 1:13 AM, David Holmes <david.holmes at oracle.com> wrote:
> 
> Hi Robbin,
> 
> On 21/11/2019 9:50 pm, Robbin Ehn wrote:
>> Hi,
>> Here is v3:
>> Full:
>> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
> 
> src/hotspot/share/runtime/synchronizer.cpp
> 
> Looking at the highly discussed:
> 
> if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, &ForceMonitorScavenge) == 0) {
> 
> why isn't that just:
> 
> if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {
> 
> ??
> 
> Also while we are here can we clean this up further:
> 
> static volatile int ForceMonitorScavenge = 0;
> 
> becomes
> 
> static int _forceMonitorScavenge = 0;
> 
> so the variable doesn't look like it came from globals.hpp :)

I was going to ask some similar questions, but decided not to bother, e.g.

> On Nov 19, 2019, at 10:55 AM, Kim Barrett <kim.barrett at oracle.com> wrote:
> I was going to make a couple nit-pick comments here, but this seems to
> be soon to be dead code (only reached when deprecated MonitorBound has
> a non-default value).


From adinn at redhat.com  Fri Nov 22 10:14:11 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 22 Nov 2019 10:14:11 +0000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <34110470-9006-072b-d88c-22f145dee363@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <fe549cc9-fba7-9a15-eed6-832717acdee0@oracle.com>
 <HE1PR0201MB24756A4232EFC1AFAA26BD7D9A4C0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <34110470-9006-072b-d88c-22f145dee363@oracle.com>
Message-ID: <249648ff-084b-00bc-4c70-14024471e082@redhat.com>

On 21/11/2019 21:31, David Holmes wrote:
> On 20/11/2019 2:51 am, Doerr, Martin wrote:
>> I think deprecating before publishing an evaluation or at least having
>> a discussion is not appropriate.
> 
> Deprecation shows the intent that we (eventually) want to remove this
> and that people should try to avoid using it. If we don't actually
> deprecate it but just turn off then here is a likely scenario:

Who is this we? The premise of your scenario has built in the conclusion
that some of /us/ are questioning and thereby excluded our critique from
any chance of qualifying the proposed action.

If the existence of such a consensus is not clear (and I suggest that
this thread makes that plain) and the evidence for arriving at such a
consensus is not compelling (ditto) and if the rest of the scenario will
likely play out as you suggest then that is a strong reason to
re-address the decision to switch the feature off, whether or not it is
deprecated at the same time.

> Alternatively we deprecate in 14 and customer lets us know straight away
> that it is still useful.

Alternatively, we come up with better evidence that it needs switching
off (and, possibly, deprecating).

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill


From erik.osterlund at oracle.com  Fri Nov 22 11:49:57 2019
From: erik.osterlund at oracle.com (=?utf-8?Q?Erik_=C3=96sterlund?=)
Date: Fri, 22 Nov 2019 12:49:57 +0100
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
Message-ID: <67F1BE19-5149-4760-A01F-31A3F87D979C@oracle.com>

Hi everyone,

Not sure which email to reply to.
Anyway, I thought somebody ought to highlight that biased locking is essentially a blocker for loom. If we want to keep biased locking, it has to be fundamentally redesigned. I can't see that anyone else has mentioned that in this thread, so there we go.

With that said (and it is unfortunate it has not been said yet), if people want to keep this thing, then said people probably should consider how they imagine this working with fibers.

The fundamental problem is: thread as ID is useless as everything runs on all threads suddenly, and we essentially have to revoke everything even when run from the same fiber. There have been some thoughts about putting some fiber oop in there instead or something, but that would make all GCs vomit all over the place, and it doesn't have the special alignment requirements that fat threads currently conform to when biased locking is enabled.
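
To make the identity problem concrete, here is a toy model. It is an assumption-laden sketch, not HotSpot or loom code: BiasedLock, Fiber and both functions are hypothetical, and the bias is keyed on a numeric carrier thread ID.

```cpp
#include <cstdint>

// Hypothetical model: a lock biased toward a thread ID (0 means "not biased").
struct BiasedLock {
  std::uint64_t bias_owner = 0;
};

// A fiber has its own identity but runs on whichever carrier thread is free.
struct Fiber {
  std::uint64_t fiber_id;
  std::uint64_t carrier_thread_id;
};

// Record the bias using the only identity the lock knows about: the carrier.
void bias_toward(BiasedLock& lock, const Fiber& f) {
  lock.bias_owner = f.carrier_thread_id;
}

// Fast-path ownership check keyed on the carrier thread.
bool looks_like_bias_owner(const BiasedLock& lock, const Fiber& f) {
  return lock.bias_owner == f.carrier_thread_id;
}
```

In this model a different fiber scheduled on the same carrier passes the check, while the original fiber fails it after moving to another carrier, so the thread ID says nothing reliable about which fiber holds the bias.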

So yeah, loom. There is that. That is what I wanted to add to this conversation.

Thanks,
/Erik

> On 16 Nov 2019, at 03:16, Patricio Chilano <patricio.chilano.mateo at oracle.com> wrote:
> 
> Hi all,
> 
> Could you review the following patch?
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8231264
> Webrev: http://cr.openjdk.java.net/~pchilanomate/8231264/v01/webrev
> 
> Biased locking will be disabled by default and all related flags will be deprecated. Performance gains seen when the feature was introduced in the VM are less clear today with modern Java code/processors. Detailed rationale behind the change is included on the description of the bug.
> 
> I modified test gtest/oops/test_markWord.cpp so that it still exercises other cases of markword printing.
> 
> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows and Solaris).
> 
> Thanks,
> Patricio
> 
> 


From lois.foltan at oracle.com  Fri Nov 22 12:26:32 2019
From: lois.foltan at oracle.com (Lois Foltan)
Date: Fri, 22 Nov 2019 07:26:32 -0500
Subject: RFR(XS) 8233446: Improve error handling when specified dynamic
 archive doesn't exist
In-Reply-To: <1decfeb9-f4eb-a577-a68d-8f5dcde01d68@oracle.com>
References: <1decfeb9-f4eb-a577-a68d-8f5dcde01d68@oracle.com>
Message-ID: <824ce887-2d1f-7254-4353-ca19aa10b021@oracle.com>

Looks good.
Lois

On 11/22/2019 12:52 AM, Ioi Lam wrote:
> https://bugs.openjdk.java.net/browse/JDK-8233446
> http://cr.openjdk.java.net/~iklam/jdk14/8233446-error-handling-when-dyn-archive-not-found.v01/ 
>
>
> With this patch, error handling in CDS is the same when the static 
> archive
> or the dynamic archive cannot be opened at runtime:
>
> -Xshare:auto (default)
>     VM will continue to execute without mapping the specified but
>     unavailable archive. (Same as before this patch)
>
> -Xshare:on (deprecated)
>     VM will exit with an error message
>
>
> So this patch only modifies the behavior of -Xshare:on, which is 
> deprecated
> anyway. However, the code is simplified and more consistent, and will 
> make it easier
> to remove support for -Xshare:on in the future.
>
> Tested with hs-tier1/hs-tier2
>
> Thanks
> - Ioi


From christoph.langer at sap.com  Fri Nov 22 14:04:38 2019
From: christoph.langer at sap.com (Langer, Christoph)
Date: Fri, 22 Nov 2019 14:04:38 +0000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <AM6PR02MB48010AA9F7B5B16B24058B288A4E0@AM6PR02MB4801.eurprd02.prod.outlook.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <6209301e-ae85-2a91-7d9e-c9096581365d@oracle.com>
 <AM6PR02MB48010AA9F7B5B16B24058B288A4E0@AM6PR02MB4801.eurprd02.prod.outlook.com>
Message-ID: <AM6PR02MB4801BE8770BB5A5FF05EAE248A490@AM6PR02MB4801.eurprd02.prod.outlook.com>

Hi,

I'd like to push this change. However, running it through jdk-submit shows reproducible errors:

Job: mach5-one-clanger-JDK-8234185-1-20191122-0927-6913189
BuildId: 2019-11-22-0926373.christoph.langer.source
No failed tests
Tasks Summary
- NA: 0
- NOTHING_TO_RUN: 0
- KILLED: 0
- PASSED: 76
- UNABLE_TO_RUN: 0
- EXECUTED_WITH_FAILURE: 1
- FAILED: 0
- HARNESS_ERROR: 0
Build
1 Executed with failure
o	windows-x64-install-windows-x64-build-19 error while building, return value: 2


Job: mach5-one-clanger-JDK-8234185-20191121-2313-6898791
BuildId: 2019-11-21-2311357.christoph.langer.source
No failed tests
Tasks Summary
- NA: 0
- NOTHING_TO_RUN: 0
- KILLED: 0
- PASSED: 76
- UNABLE_TO_RUN: 0
- EXECUTED_WITH_FAILURE: 1
- FAILED: 0
- HARNESS_ERROR: 0
Build
1 Executed with failure
o	windows-x64-install-windows-x64-build-19 error while building, return value: 2


David already had a look and let me know that the following was the reason:

t:/workspace/open/src/java.base/windows/native/libjava/canonicalize_md.c(41): fatal error C1083: Cannot open include file: 'jdk_util.h': No such file or directory

I can't explain this, as the change passes my local build and our nightly builds without problems. I also can't explain why jdk_util.h can't be opened at this place: it should be there and part of the include directories...

I'd appreciate any help...

Thanks
Christoph


> -----Original Message-----
> From: Langer, Christoph
> Sent: Donnerstag, 21. November 2019 14:19
> To: Alan Bateman <Alan.Bateman at oracle.com>; core-libs-
> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: RE: RFR: 8234185: Cleanup usage of canonicalize function between
> libjava, hotspot and libinstrument
> 
> Hi Alan,
> 
> thanks for the review. I'll push it then after running through jdk-submit.
> 
> /Christoph
> 
> > -----Original Message-----
> > From: Alan Bateman <Alan.Bateman at oracle.com>
> > Sent: Donnerstag, 21. November 2019 09:51
> > To: Langer, Christoph <christoph.langer at sap.com>; core-libs-
> > dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> > Subject: Re: RFR: 8234185: Cleanup usage of canonicalize function between
> > libjava, hotspot and libinstrument
> >
> > On 14/11/2019 15:37, Langer, Christoph wrote:
> > > Hi,
> > >
> > > please review this cleanup change regarding function "canonicalize" of
> > libjava.
> > >
> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
> > > Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
> > >
> > >
> > > The goal is to cleanup how this function is defined and used. One thing is,
> > that there was an unnecessary wrapper function "Canonicalize" in jni_util.c.
> > It wrapped the call to "canonicalize". We can get rid of this wrapper.
> > Unfortunately, it is not possible to just export "canonicalize" since this will
> > conflict with a method signature from the math library, at least on modern
> > Linuxes. So I decided to call the method JDK_Canonicalize and will correctly
> > define it in jdk_util.h which can be included everywhere.
> > >
> > I think this change is okay. My main concern when initially seeing this
> > go by was that it would leak the \\?\ or \\?\UNC\ prefix into the
> > canonical File when it wasn't there previously, this would of course
> > have several implications. But I think you have it right and this is, as
> > you position, just refactoring/cleanup.
> >
> > -Alan

From robbin.ehn at oracle.com  Fri Nov 22 14:39:46 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Fri, 22 Nov 2019 15:39:46 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
Message-ID: <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>

Hi David,

On 11/22/19 7:13 AM, David Holmes wrote:
> Hi Robbin,
> 
> On 21/11/2019 9:50 pm, Robbin Ehn wrote:
>> Hi,
>>
>> Here is v3:
>>
>> Full:
>> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
> 
> src/hotspot/share/runtime/synchronizer.cpp
> 
> Looking at the highly discussed:
> 
> if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, 
> &ForceMonitorScavenge) == 0) {
> 
> why isn't that just:
> 
> if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {
> 
> ??

I assumed someone had seen contention on ForceMonitorScavenge.
Many threads can enter and re-enter here.
I don't know if that's still the case.

Since we only hit this path when the deprecated MonitorBound is set, I think I 
can change it?

> 
> Also while we are here can we clean this up further:
> 
> static volatile int ForceMonitorScavenge = 0;
> 
> becomes
> 
> static int _forceMonitorScavenge = 0;
> 
> so the variable doesn't look like it came from globals.hpp :)
> 

Sure!

> Just to be clear, I understand the changes around monitor scavenging now, though 
> I'm not sure getting rid of async VM ops and replacing with a new way to 
> directly wakeup the VMThread really amounts to a simplification.
> 
> ---
> 
> src/hotspot/share/runtime/vmOperations.hpp
> 
> I still think getting rid of Mode altogether would be a good simplification. :)

Sure!

Here is v4, inc:
http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
Full:
http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html

Tested t1-3

Thanks, Robbin


> 
> Thanks,
> David
> -----
> 
> 
>> Inc:
>> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
>>
>> Tested t1-3
>>
>> Thanks, Robbin
>>
>> On 2019-11-19 12:05, Robbin Ehn wrote:
>>> Hi all, please review.
>>>
>>> CMS was the last real user of the more advanced features of VM operation.
>>> VM operation can be simplified to always be a stack object and thus either be
>>> of safepoint or no safepoint type.
>>>
>>> VM_EnableBiasedLocking is executed once by watcher thread, if needed (default
>>> not used). Making it synchronous doesn't matter.
>>> VM_ThreadStop is executed by a JavaThread, that thread should stop for the
>>> safepoint anyways, no real point in not stopping directly.
>>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, the VM op is 
>>> not needed. Arguably this thread should actually stop here, since we are 
>>> about to safepoint.
>>>
>>> There is also a small cleanup in vmThread.cpp where an unused method is removed.
>>> And the extra safepoint is removed:
>>> "// We want to make sure that we get to a safepoint regularly"
>>> No we don't :)
>>>
>>> Issue:
>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>> Change-set:
>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>
>>> Tested scavenge manually, passes t1-2.
>>>
>>> Thanks, Robbin

From robbin.ehn at oracle.com  Fri Nov 22 14:40:43 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Fri, 22 Nov 2019 15:40:43 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <15F8277A-54A3-400D-9D08-155A8986D3C9@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <15F8277A-54A3-400D-9D08-155A8986D3C9@oracle.com>
Message-ID: <0e689087-e56f-58a8-f152-1801167cc181@oracle.com>

Hi Kim,

>> so the variable doesn't look like it came from globals.hpp :)
> 
> I was going to ask some similar questions, but decided not to bother, e.g.

v4:
http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html

Thanks, Robbin

> 
>> On Nov 19, 2019, at 10:55 AM, Kim Barrett <kim.barrett at oracle.com> wrote:
>> I was going to make a couple nit-pick comments here, but this seems to
>> be soon to be dead code (only reached when deprecated MonitorBound has
>> a non-default value).
> 
> 

From daniel.daugherty at oracle.com  Fri Nov 22 15:06:19 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 22 Nov 2019 10:06:19 -0500
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <249648ff-084b-00bc-4c70-14024471e082@redhat.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <fe549cc9-fba7-9a15-eed6-832717acdee0@oracle.com>
 <HE1PR0201MB24756A4232EFC1AFAA26BD7D9A4C0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <34110470-9006-072b-d88c-22f145dee363@oracle.com>
 <249648ff-084b-00bc-4c70-14024471e082@redhat.com>
Message-ID: <a89922f0-2e53-f312-4458-9ad7bb8d4476@oracle.com>

On 11/22/19 5:14 AM, Andrew Dinn wrote:
> On 21/11/2019 21:31, David Holmes wrote:
>> On 20/11/2019 2:51 am, Doerr, Martin wrote:
>>> I think deprecating before publishing an evaluation or at least having
>>> a discussion is not appropriate.
>> Deprecation shows the intent that we (eventually) want to remove this
>> and that people should try to avoid using it. If we don't actually
>> deprecate it but just turn off then here is a likely scenario:
> Who is this we?

The way you edited the original thread makes it look like the "we" comes
out of nowhere. Okay. Not sure why you did that, but here's the complete
context:

I posted this comment:

On 11/18/19 6:06 PM, Daniel D. Daugherty wrote:
> As for the whole "too soon to deprecate" discussion: Deprecation is not
> making the code obsolete so this changeset is not taking anything away
> other than changing the default of UseBiasedLocking from true to false.
> There are things that have been deprecated since JDK8 and they still
> have not yet been made obsolete.
>
> Deprecating biased locking is the proper way of saying that we (Oracle)
> and/or others think that biased locking should/will go away in a future
> release. Yes, there are locking experts outside of Oracle that have said
> that biased locking should go away, but I haven't gotten permission to
> quote the folks (yet)...
>
> Deprecation is not final. Features can be un-deprecated if some
> relevant facts and/or info changes the previous conclusion. 

Martin posted this reply to my comment:
> On 11/19/19 11:51 AM, Doerr, Martin wrote:
>> Hi Dan,
>>
>>> As for the whole "too soon to deprecate" discussion: Deprecation is not
>>> making the code obsolete so this changeset is not taking anything away
>>> other than changing the default of UseBiasedLocking from true to false.
>>> There are things that have been deprecated since JDK8 and they still
>>> have not yet been made obsolete.
>> I think deprecating before publishing an evaluation or at least having a discussion is not appropriate.
>>
>>> Deprecating biased locking is the proper way of saying that we (Oracle)
>>> and/or others think that biased locking should/will go away in a future
>>> release. Yes, there are locking experts outside of Oracle that have said
>>> that biased locking should go away, but I haven't gotten permission to
>>> quote the folks (yet)...
>> There should be consent on the direction of possibly removing it before communicating it the hard way.
>> However, switching it off for evaluation sounds feasible to me.
>> Seems like we have some homework, too.

And David posted a reply to Martin's comment:

> On 11/21/19 4:31 PM, David Holmes wrote:
>> On 20/11/2019 2:51 am, Doerr, Martin wrote:
>>> Hi Dan,
>>>
>>>> As for the whole "too soon to deprecate" discussion: Deprecation is 
>>>> not
>>>> making the code obsolete so this changeset is not taking anything away
>>>> other than changing the default of UseBiasedLocking from true to 
>>>> false.
>>>> There are things that have been deprecated since JDK8 and they still
>>>> have not yet been made obsolete.
>>>
>>> I think deprecating before publishing an evaluation or at least 
>>> having a discussion is not appropriate.
>>
>> Deprecation shows the intent that we (eventually) want to remove this 
>> and that people should try to avoid using it. If we don't actually 
>> deprecate it but just turn off then here is a likely scenario:
>>
>> - we turn off BL in 14
>> - customer updates to 14 sees a performance issue, checks the release 
>> notes, sees BL is disabled and turns it back on.
>> - customer continues on their merry way and feels no need to report 
>> back to OpenJDK that they need BL (even if we ask them to via release 
>> notes)
>> - we get no feedback that BL is still useful and so we deprecate it 
>> in, say, 16
>> - customer updates to 16 and gets the deprecation warning and then 
>> reports back that they need BL
>>
>> Alternatively we deprecate in 14 and customer lets us know straight 
>> away that it is still useful. 

So now David's use of "we" should be more clear. I do have to point out
that my use of "we (Oracle)" was present in both Martin's reply and in
David's reply to Martin, but for some reason you chose to edit it out.
This makes your pushing back on David's use of an unqualified "we"
questionable. Are you trying to be intentionally confrontational?


> The premise of your scenario has built in the conclusion
> that some of /us/ are questioning and thereby excluded our critique from
> any chance of qualifying the proposed action.

Hmmm... I don't see anyone excluding critiques here, but maybe I've missed
something...


> If the existence of such a consensus is not clear (and I suggest that
> this thread makes that plain) and the evidence for arriving at such a
> consensus is not compelling (ditto) and if the rest of the scenario will
> likely play out as you suggest then that is a strong reason to
> re-address the decision to switch the feature off, whether or not it is
> deprecated at the same time.

I suspect that "compelling" is in the eye of the beholder.

Simply changing the default from true to false is pretty much a silent
change in behavior even if we put out a release note. By deprecating at
the same time, we'll have a visible diagnostic message if biased locking
is enabled. That's much more likely to lead to feedback than a silent
change in behavior.
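
As a rough sketch of the difference, assuming a made-up flag table and a
made-up warning text (neither is the actual HotSpot flag handling): a flag
left at its default prints nothing, so a changed default is silent, while a
deprecated flag named on the command line produces a visible message.

```cpp
#include <string>
#include <vector>

// Hypothetical flag model; names and message text are illustrative only.
struct Flag {
  std::string name;
  bool deprecated;
};

// Returns the warnings that would be printed for the given command line.
// Only flags the user explicitly specified are considered, so an unspecified
// flag whose default changed produces no output at all.
std::vector<std::string> warnings_for(const std::vector<Flag>& known,
                                      const std::vector<std::string>& specified) {
  std::vector<std::string> out;
  for (const Flag& f : known) {
    for (const std::string& s : specified) {
      if (f.deprecated && s == f.name) {
        out.push_back("warning: option " + f.name + " is deprecated");
      }
    }
  }
  return out;
}
```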


>> Alternatively we deprecate in 14 and customer lets us know straight away
>> that it is still useful.
> Alternatively, we come up with better evidence that it needs switching
> off (and, possibly, deprecating).

I wonder what would be considered acceptable "better evidence".

Dan

>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill
>
>


From martin.doerr at sap.com  Fri Nov 22 16:29:22 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 22 Nov 2019 16:29:22 +0000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <a89922f0-2e53-f312-4458-9ad7bb8d4476@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <fe549cc9-fba7-9a15-eed6-832717acdee0@oracle.com>
 <HE1PR0201MB24756A4232EFC1AFAA26BD7D9A4C0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <34110470-9006-072b-d88c-22f145dee363@oracle.com>
 <249648ff-084b-00bc-4c70-14024471e082@redhat.com>
 <a89922f0-2e53-f312-4458-9ad7bb8d4476@oracle.com>
Message-ID: <VI1PR0201MB24796A75CD45362D594678519A490@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi all,

the only feasible argument for considering biased locking removal I've read so far is the one from Erik.
(Thanks, Erik!)

This should have been on the table before proposing deprecation.

If the plan is to deprecate it and later obsolete it in the release in which project loom enters main line,
I guess we (SAP) could agree with it.
It would be really helpful to communicate such plans before proposing deprecation.

Best regards,
Martin 


> -----Original Message-----
> From: Daniel D. Daugherty <daniel.daugherty at oracle.com>
> Sent: Freitag, 22. November 2019 16:06
> To: Andrew Dinn <adinn at redhat.com>; David Holmes
> <david.holmes at oracle.com>; Doerr, Martin <martin.doerr at sap.com>;
> Patricio Chilano <patricio.chilano.mateo at oracle.com>; hotspot-runtime-
> dev at openjdk.java.net
> Subject: Re: RFR 8231264: Disable biased-locking and deprecate all flags
> related to biased-locking
> 
> On 11/22/19 5:14 AM, Andrew Dinn wrote:
> > On 21/11/2019 21:31, David Holmes wrote:
> >> On 20/11/2019 2:51 am, Doerr, Martin wrote:
> >>> I think deprecating before publishing an evaluation or at least having
> >>> a discussion is not appropriate.
> >> Deprecation shows the intent that we (eventually) want to remove this
> >> and that people should try to avoid using it. If we don't actually
> >> deprecate it but just turn off then here is a likely scenario:
> > Who is this we?
> 
> The way you edited the original thread makes it look like the "we" comes
> out of nowhere. Okay. Not sure why you did that, but here's the complete
> context:
> 
> I posted this comment:
> 
> On 11/18/19 6:06 PM, Daniel D. Daugherty wrote:
> > As for the whole "too soon to deprecate" discussion: Deprecation is not
> > making the code obsolete so this changeset is not taking anything away
> > other than changing the default of UseBiasedLocking from true to false.
> > There are things that have been deprecated since JDK8 and they still
> > have not yet been made obsolete.
> >
> > Deprecating biased locking is the proper way of saying that we (Oracle)
> > and/or others think that biased locking should/will go away in a future
> > release. Yes, there are locking experts outside of Oracle that have said
> > that biased locking should go away, but I haven't gotten permission to
> > quote the folks (yet)...
> >
> > Deprecation is not final. Features can be un-deprecated if some
> > relevant facts and/or info changes the previous conclusion.
> 
> Martin posted this reply to my comment:
> > On 11/19/19 11:51 AM, Doerr, Martin wrote:
> >> Hi Dan,
> >>
> >>> As for the whole "too soon to deprecate" discussion: Deprecation is not
> >>> making the code obsolete so this changeset is not taking anything away
> >>> other than changing the default of UseBiasedLocking from true to false.
> >>> There are things that have been deprecated since JDK8 and they still
> >>> have not yet been made obsolete.
> >> I think deprecating before publishing an evaluation or at least having a
> discussion is not appropriate.
> >>
> >>> Deprecating biased locking is the proper way of saying that we (Oracle)
> >>> and/or others think that biased locking should/will go away in a future
> >>> release. Yes, there are locking experts outside of Oracle that have said
> >>> that biased locking should go away, but I haven't gotten permission to
> >>> quote the folks (yet)...
> >> There should be consent on the direction of possibly removing it before
> >> communicating it the hard way.
> >> However, switching it off for evaluation sounds feasible to me.
> >> Seems like we have some homework, too.
> 
> And David posted a reply to Martin's comment:
> 
> > On 11/21/19 4:31 PM, David Holmes wrote:
> >> On 20/11/2019 2:51 am, Doerr, Martin wrote:
> >>> Hi Dan,
> >>>
> >>>> As for the whole "too soon to deprecate" discussion: Deprecation is
> >>>> not
> >>>> making the code obsolete so this changeset is not taking anything away
> >>>> other than changing the default of UseBiasedLocking from true to
> >>>> false.
> >>>> There are things that have been deprecated since JDK8 and they still
> >>>> have not yet been made obsolete.
> >>>
> >>> I think deprecating before publishing an evaluation or at least
> >>> having a discussion is not appropriate.
> >>
> >> Deprecation shows the intent that we (eventually) want to remove this
> >> and that people should try to avoid using it. If we don't actually
> >> deprecate it but just turn off then here is a likely scenario:
> >>
> >> - we turn off BL in 14
> >> - customer updates to 14 sees a performance issue, checks the release
> >> notes, sees BL is disabled and turns it back on.
> >> - customer continues on their merry way and feels no need to report
> >> back to OpenJDK that they need BL (even if we ask them to via release
> >> notes)
> >> - we get no feedback that BL is still useful and so we deprecate it
> >> in, say, 16
> >> - customer updates to 16 and gets the deprecation warning and then
> >> reports back that they need BL
> >>
> >> Alternatively we deprecate in 14 and customer lets us know straight
> >> away that it is still useful.
> 
> So now David's use of "we" should be more clear. I do have to point out
> that my use of "we (Oracle)" was present in both Martin's reply and in
> David's reply to Martin, but for some reason you chose to edit it out.
> This makes your pushing back on David's use of an unqualified "we"
> questionable. Are you trying to be intentionally confrontational?
> 
> 
> > The premise of your scenario has built in the conclusion
> > that some of /us/ are questioning and thereby excluded our critique from
> > any chance of qualifying the proposed action.
> 
> Hmmm... I don't see anyone excluding critiques here, but maybe I've missed
> something...
> 
> 
> > If the existence of such a consensus is not clear (and I suggest that
> > this thread makes that plain) and the evidence for arriving at such a
> > consensus is not compelling (ditto) and if the rest of the scenario will
> > likely play out as you suggest then that is a strong reason to
> > re-address the decision to switch the feature off, whether or not it is
> > deprecated at the same time.
> 
> I suspect that "compelling" is in the eye of the beholder.
> 
> Simply changing the default from true to false is pretty much a silent
> change in behavior even if we put out a release note. By deprecating at
> the same time, we'll have a visible diagnostic message if biased locking
> is enabled. That's much more likely to lead to feedback than a silent
> change in behavior.
> 
> 
> >> Alternatively we deprecate in 14 and customer lets us know straight away
> >> that it is still useful.
> > Alternatively, we come up with better evidence that it needs switching
> > off (and, possibly, deprecating).
> 
> I wonder what would be considered acceptable "better evidence".
> 
> Dan
> 
> >
> > regards,
> >
> >
> > Andrew Dinn
> > -----------
> > Senior Principal Software Engineer
> > Red Hat UK Ltd
> > Registered in England and Wales under Company Registration No. 03798903
> > Directors: Michael Cunningham, Michael ("Mike") O'Neill
> >
> >


From ioi.lam at oracle.com  Fri Nov 22 17:16:38 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Fri, 22 Nov 2019 09:16:38 -0800
Subject: RFR(S) 8234622 [TESTBUG] ArchivedModuleCompareTest.java fails with
 -vmoptions:-Xlog:cds
Message-ID: <13280c3b-0c54-9b40-ce59-6e9bf8f6a3af@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8234622
http://cr.openjdk.java.net/~iklam/jdk14/8234622-ArchivedModuleCompareTest-failed-with-logging.v01/

This test is confused when it encounters UL logging messages in the 
STDOUT of child processes. The fix is to filter out the UL logs using 
Regexp.
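
As a rough illustration of that filtering step: the real test is written in Java and its actual pattern is in the webrev, so the C++ below is only a sketch of the idea. UL lines are recognized by their leading decorations and dropped before the remaining output is compared. The function name strip_ul_lines and the pattern, which assumes the default [uptime][level][tags] decorations, are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Returns the child's stdout with every line that looks like a UL log
// message removed. The pattern is an assumption: it only matches lines that
// start with "[<uptime>s][<level>][<tags>]".
std::string strip_ul_lines(const std::string& out) {
  static const std::regex ul_line(R"(^\[[0-9.]+s\]\[[a-z ]+\]\[[a-z0-9,+ ]+\])");
  std::istringstream in(out);
  std::string line;
  std::string kept;
  while (std::getline(in, line)) {
    if (!std::regex_search(line, ul_line)) {
      kept += line;   // keep everything that is not a UL line
      kept += '\n';
    }
  }
  return kept;
}
```

With logging enabled, the two child outputs can then be compared on strip_ul_lines(stdout) instead of on the raw text.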

Thanks
- Ioi


From erik.osterlund at oracle.com  Fri Nov 22 17:22:13 2019
From: erik.osterlund at oracle.com (=?utf-8?Q?Erik_=C3=96sterlund?=)
Date: Fri, 22 Nov 2019 18:22:13 +0100
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <VI1PR0201MB24796A75CD45362D594678519A490@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24796A75CD45362D594678519A490@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <9F4A2CCB-532E-4738-8706-EB408048AECA@oracle.com>

Hi Martin,

Yeah, sorry it wasn't communicated from the start. When this was discussed internally there were a whole bunch of factors considered. But Loom interactions were really a primary concern in the discussions, and I was surprised to find they had yet to be mentioned in this thread!

Hope this makes a bit more sense now!

Thanks,
/Erik

> On 22 Nov 2019, at 17:29, Doerr, Martin <martin.doerr at sap.com> wrote:
> 
> Hi all,
> 
> the only feasible argument for considering biased locking removal I've read so far is the one from Erik.
> (Thanks, Erik!)
> 
> This should have been on the table before proposing deprecation.
> 
> If the plan is to deprecate it and later obsolete it in the release in which project loom enters main line,
> I guess we (SAP) could agree with it.
> It would be really helpful to communicate such plans before proposing deprecation.
> 
> Best regards,
> Martin 
> 
> 
>> -----Original Message-----
>> From: Daniel D. Daugherty <daniel.daugherty at oracle.com>
>> Sent: Freitag, 22. November 2019 16:06
>> To: Andrew Dinn <adinn at redhat.com>; David Holmes
>> <david.holmes at oracle.com>; Doerr, Martin <martin.doerr at sap.com>;
>> Patricio Chilano <patricio.chilano.mateo at oracle.com>; hotspot-runtime-
>> dev at openjdk.java.net
>> Subject: Re: RFR 8231264: Disable biased-locking and deprecate all flags
>> related to biased-locking
>> 
>>> On 11/22/19 5:14 AM, Andrew Dinn wrote:
>>> On 21/11/2019 21:31, David Holmes wrote:
>>>> On 20/11/2019 2:51 am, Doerr, Martin wrote:
>>>>> I think deprecating before publishing an evaluation or at least having
>>>>> a discussion is not appropriate.
>>>> Deprecation shows the intent that we (eventually) want to remove this
>>>> and that people should try to avoid using it. If we don't actually
>>>> deprecate it but just turn off then here is a likely scenario:
>>> Who is this we?
>> 
>> The way you edited the original thread makes it look like the "we" comes
>> out of nowhere. Okay. Not sure why you did that, but here's the complete
>> context:
>> 
>> I posted this comment:
>> 
>>> On 11/18/19 6:06 PM, Daniel D. Daugherty wrote:
>>> As for the whole "too soon to deprecate" discussion: Deprecation is not
>>> making the code obsolete so this changeset is not taking anything away
>>> other than changing the default of UseBiasedLocking from true to false.
>>> There are things that have been deprecated since JDK8 and they still
>>> have not yet been made obsolete.
>>> 
>>> Deprecating biased locking is the proper way of saying that we (Oracle)
>>> and/or others think that biased locking should/will go away in a future
>>> release. Yes, there are locking experts outside of Oracle that have said
>>> that biased locking should go away, but I haven't gotten permission to
>>> quote the folks (yet)...
>>> 
>>> Deprecation is not final. Features can be un-deprecated if some
>>> relevant facts and/or info changes the previous conclusion.
>> 
>> Martin posted this reply to my comment:
>>> On 11/19/19 11:51 AM, Doerr, Martin wrote:
>>>> Hi Dan,
>>>> 
>>>>> As for the whole "too soon to deprecate" discussion: Deprecation is not
>>>>> making the code obsolete so this changeset is not taking anything away
>>>>> other than changing the default of UseBiasedLocking from true to false.
>>>>> There are things that have been deprecated since JDK8 and they still
>>>>> have not yet been made obsolete.
>>>> I think deprecating before publishing an evaluation or at least having a
>>>> discussion is not appropriate.
>>>> 
>>>>> Deprecating biased locking is the proper way of saying that we (Oracle)
>>>>> and/or others think that biased locking should/will go away in a future
>>>>> release. Yes, there are locking experts outside of Oracle that have said
>>>>> that biased locking should go away, but I haven't gotten permission to
>>>>> quote the folks (yet)...
>>>> There should be consent on the direction of possibly removing it before
>>>> communicating it the hard way.
>>>> However, switching it off for evaluation sounds feasible to me.
>>>> Seems like we have some homework, too.
>> 
>> And David posted a reply to Martin's comment:
>> 
>>> On 11/21/19 4:31 PM, David Holmes wrote:
>>>> On 20/11/2019 2:51 am, Doerr, Martin wrote:
>>>>> Hi Dan,
>>>>> 
>>>>>> As for the whole "too soon to deprecate" discussion: Deprecation is
>>>>>> not
>>>>>> making the code obsolete so this changeset is not taking anything away
>>>>>> other than changing the default of UseBiasedLocking from true to
>>>>>> false.
>>>>>> There are things that have been deprecated since JDK8 and they still
>>>>>> have not yet been made obsolete.
>>>>> 
>>>>> I think deprecating before publishing an evaluation or at least
>>>>> having a discussion is not appropriate.
>>>> 
>>>> Deprecation shows the intent that we (eventually) want to remove this
>>>> and that people should try to avoid using it. If we don't actually
>>>> deprecate it but just turn off then here is a likely scenario:
>>>> 
>>>> - we turn off BL in 14
>>>> - customer updates to 14 sees a performance issue, checks the release
>>>> notes, sees BL is disabled and turns it back on.
>>>> - customer continues on their merry way and feels no need to report
>>>> back to OpenJDK that they need BL (even if we ask them to via release
>>>> notes)
>>>> - we get no feedback that BL is still useful and so we deprecate it
>>>> in, say, 16
>>>> - customer updates to 16 and gets the deprecation warning and then
>>>> reports back that they need BL
>>>> 
>>>> Alternatively we deprecate in 14 and customer lets us know straight
>>>> away that it is still useful.
>> 
>> So now David's use of "we" should be more clear. I do have to point out
>> that my use of "we (Oracle)" was present in both Martin's reply and in
>> David's reply to Martin, but for some reason you chose to edit it out.
>> This makes your pushing back on David's use of an unqualified "we"
>> questionable. Are you trying to be intentionally confrontational?
>> 
>> 
>>> The premise of your scenario has built in the conclusion
>>> that some of /us/ are questioning and thereby excluded our critique from
>>> any chance of qualifying the proposed action.
>> 
>> Hmmm... I don't see anyone excluding critiques here, but maybe I've missed
>> something...
>> 
>> 
>>> If the existence of such a consensus is not clear (and I suggest that
>>> this thread makes that plain) and the evidence for arriving at such a
>>> consensus is not compelling (ditto) and if the rest of the scenario will
>>> likely play out as you suggest then that is a strong reason to
>>> re-address the decision to switch the feature off, whether or not it is
>>> deprecated at the same time.
>> 
>> I suspect that "compelling" is in the eye of the beholder.
>> 
>> Simply changing the default from true to false is pretty much a silent
>> change in behavior even if we put out a release note. By deprecating at
>> the same time, we'll have a visible diagnostic message if biased locking
>> is enabled. That's much more likely to lead to feedback than a silent
>> change in behavior.
>> 
>> 
>>>> Alternatively we deprecate in 14 and customer lets us know straight away
>>>> that it is still useful.
>>> Alternatively, we come up with better evidence that it needs switching
>>> off (and, possibly, deprecating).
>> 
>> I wonder what would be considered acceptable "better evidence".
>> 
>> Dan
>> 
>>> 
>>> regards,
>>> 
>>> 
>>> Andrew Dinn
>>> -----------
>>> Senior Principal Software Engineer
>>> Red Hat UK Ltd
>>> Registered in England and Wales under Company Registration No. 03798903
>>> Directors: Michael Cunningham, Michael ("Mike") O'Neill
>>> 
>>> 
> 


From mikhailo.seledtsov at oracle.com  Fri Nov 22 17:26:29 2019
From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com)
Date: Fri, 22 Nov 2019 09:26:29 -0800
Subject: RFR(S) 8234622 [TESTBUG] ArchivedModuleCompareTest.java fails
 with -vmoptions:-Xlog:cds
In-Reply-To: <13280c3b-0c54-9b40-ce59-6e9bf8f6a3af@oracle.com>
References: <13280c3b-0c54-9b40-ce59-6e9bf8f6a3af@oracle.com>
Message-ID: <dfcf469d-1ffe-f559-b3ea-77b51f3f9820@oracle.com>

Looks good,

Misha

On 11/22/19 9:16 AM, Ioi Lam wrote:
> https://bugs.openjdk.java.net/browse/JDK-8234622
> http://cr.openjdk.java.net/~iklam/jdk14/8234622-ArchivedModuleCompareTest-failed-with-logging.v01/ 
>
>
> This test is confused when it encounters UL logging messages in the 
> STDOUT of child processes. The fix is to filter out the UL logs using 
> Regexp.
>
> Thanks
> - Ioi
>
>

From patricio.chilano.mateo at oracle.com  Fri Nov 22 18:25:02 2019
From: patricio.chilano.mateo at oracle.com (Patricio Chilano)
Date: Fri, 22 Nov 2019 15:25:02 -0300
Subject: RFR 8234613: JavaThread can escape back to Java from an ongoing
 handshake
Message-ID: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>

Hi,

This patch aims to address a current bug where, given the right 
combination of handshakes and external suspend/resume, a JavaThread can 
transition from a safe state back to Java without blocking for a 
still-in-progress handshake. In the description of the bug I added an 
example, tracing the state changes of the JavaThread as it goes through 
the different transitions until it escapes the handshake. Currently, the 
window of time for this issue to happen is so small that we do not see 
actual failures running tests. Running test SuspendAtExit.java and 
adding some small delay before restoring the JavaThread state in 
java_suspend_self_with_safepoint_check() can demonstrate the issue.
The proposed fix is to check again if we have a pending/in-progress 
handshake operation after executing ~ThreadInVMForHandshake().

Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows and 
Solaris).

Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/

Thanks,
Patricio

From calvin.cheung at oracle.com  Fri Nov 22 19:12:45 2019
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Fri, 22 Nov 2019 11:12:45 -0800
Subject: RFR(S) 8234429: appcds/dynamicArchive tests crashing with Graal
In-Reply-To: <2ca6a977-b4d7-31d8-cba0-e4c9d9822f65@oracle.com>
References: <2ca6a977-b4d7-31d8-cba0-e4c9d9822f65@oracle.com>
Message-ID: <96c3ed40-1d70-cdde-d247-91042785f760@oracle.com>

Hi Ioi,

The fix looks good.

thanks,

Calvin

On 11/20/19 2:28 PM, Ioi Lam wrote:
> https://bugs.openjdk.java.net/browse/JDK-8234429
> http://cr.openjdk.java.net/~iklam/jdk14/8234429-dynamic-cds-graal-crash.v01/ 
>
>
> In JDK-8231610, the implementation of DynamicArchive::is_mapped() is 
> changed to
>
>     static bool is_mapped() { return FileMapInfo::dynamic_info() != NULL; }
>
> During dynamic dumping, we temporarily (inside a safepoint) allocate a 
> dynamic FileMapInfo, which makes it appear as if the dynamic archive 
> has been mapped.
>
> When graal is enabled, the VM actually continues to run for a little 
> (compiling Java methods) after dynamic dumping has finished. During 
> this time, when JVMCI tries to resolve a class, it might try to look
> up from the dynamic archive, which will fail as the dynamic archive 
> isn't really mapped.
>
> The fix is to free the temporarily allocated FileMapInfo when dynamic 
> dumping is finished.
>
> Thanks
> - Ioi
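
The failure mode described in the quoted message can be sketched roughly as follows. This is not the HotSpot code: MapInfoSketch, DynamicArchiveSketch, dump() and release() are hypothetical stand-ins, and only the shape of is_mapped() is taken from the quoted text.

```cpp
#include <cassert>

// Hypothetical stand-in for FileMapInfo; it carries no real archive state.
struct MapInfoSketch {};

class DynamicArchiveSketch {
  static MapInfoSketch* _dynamic_info;

public:
  // Mirrors the quoted is_mapped(): "mapped" means the pointer is non-null.
  static bool is_mapped() { return _dynamic_info != nullptr; }

  // Simulates a dump. With free_after == false the temporary object is left
  // behind, so is_mapped() keeps reporting true after the dump has finished.
  static void dump(bool free_after) {
    release();                            // drop anything left by an earlier call
    _dynamic_info = new MapInfoSketch();  // temporary, needed only while dumping
    // ... the archive would be written here ...
    if (free_after) {
      release();                          // the fix: free it once dumping is done
    }
  }

  static void release() {
    delete _dynamic_info;
    _dynamic_info = nullptr;
  }
};

MapInfoSketch* DynamicArchiveSketch::_dynamic_info = nullptr;
```

In this sketch, any code that runs after dump(false) and consults is_mapped() would wrongly conclude that an archive is available, which is the situation the quoted message describes for JVMCI class resolution.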

From calvin.cheung at oracle.com  Fri Nov 22 19:23:41 2019
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Fri, 22 Nov 2019 11:23:41 -0800
Subject: RFR(XS) 8234539 ArchiveRelocationTest.java failed: Archive
 mapping should always succeed
In-Reply-To: <719e7512-cf84-072c-3ecf-9181f4c495dd@oracle.com>
References: <719e7512-cf84-072c-3ecf-9181f4c495dd@oracle.com>
Message-ID: <698fac17-b84e-a5ab-44c2-ccfb08bbfe27@oracle.com>

Hi Ioi,

The fix looks good.

thanks,

Calvin

On 11/21/19 2:58 PM, Ioi Lam wrote:
> https://bugs.openjdk.java.net/browse/JDK-8234539
> http://cr.openjdk.java.net/~iklam/jdk14/8234539-mapping-should-always-succeed.v01/ 
>
>
> This bug happens only on Windows. The fix is one-line -- in order to 
> check
> whether "This is the second time we try to map the archive(s)", 
> instead of
> using (addr_delta != 0), the correct condition is (rs.is_reserved()). 
> Please
> see the bug report for details.
>
> I also improve the log messages when an error happens.
>
> Thanks
> - Ioi

From daniel.daugherty at oracle.com  Fri Nov 22 21:50:38 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 22 Nov 2019 16:50:38 -0500
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
Message-ID: <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>

Hi Robbin,

Sorry I'm late to this review thread...

I'm adding Serguei to this email thread since I'm making comments
about the JVM/TI parts of this changeset...


> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html 


src/hotspot/share/runtime/vmOperations.hpp
    No comments.

src/hotspot/share/runtime/vmOperations.cpp
    No comments.

src/hotspot/share/runtime/vmThread.hpp
    L148:   // The ever running loop for the VMThread
    L149:   void loop();
    L150:   static void check_cleanup();
        nit - Feels like an odd place to add check_cleanup().

        Update: Now that I've seen what check_cleanup() does, it needs a
        better name. Perhaps check_for_forced_cleanup()? And since
        it is supposed to affect the running loop for the VMThread
        I'm okay with its location now.

src/hotspot/share/runtime/vmThread.cpp
    L382:   event->set_blocking(true);
        Probably have to keep the 'blocking' attribute in the event
        for backward compatibility in the JFR record format?

    L478:         // wait with a timeout to guarantee safepoints at regular intervals
        Is this comment true anymore (even before this changeset)?
        Adding this on the next line might help:

                  // (if there is cleanup work to do)

        since I _think_ that's how the policy has evolved...

    L479:         mu_queue.wait(GuaranteedSafepointInterval);
        Please prefix with "(void)" to make it clear you are
        intentionally ignoring the return value.

    old L627-634 (We want to make sure that we get to a safepoint regularly)
        I think this now old code is covered by your change above:

        L488:         // If the queue contains a safepoint VM op,
        L489:         // clean up will be done so we can skip this part.
        L490:         if (!_vm_queue->peek_at_safepoint_priority()) {

        Please confirm that our thinking is the same here.

    L661:     int ticket =  t->vm_operation_ticket();
        nit - extra space after '='

    Okay. Definitely simpler code.

src/hotspot/share/runtime/handshake.cpp
    No comments.

src/hotspot/share/runtime/safepoint.hpp
    No comments.

src/hotspot/share/runtime/safepoint.cpp
    Definitely got my attention with
    ObjectSynchronizer::needs_monitor_scavenge().

src/hotspot/share/runtime/synchronizer.hpp
    No comments.

src/hotspot/share/runtime/synchronizer.cpp
    L921:     log_info(monitorinflation)("Monitor scavenge needed, triggering safepoint cleanup.");
        Thanks for adding the logging line.

        Update: As Kim pointed out, this code goes away when
        MonitorBound is made obsolete (JDK-8230940). I'm looking
        forward to making that change.

    L1003:   if (Atomic::load(&_forceMonitorScavenge) == 0 && Atomic::xchg (1, &_forceMonitorScavenge) == 0) {
        nit - extra space between 'xchg' and '('

        Since InduceScavenge() is only called when the deprecated
        MonitorBound is specified, I think you could use cmpxchg()
        for clarity. Of course, you might be thinking that the
        pattern is a useful example for other folks to copy...
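
For readers comparing the two forms, here is a minimal sketch using std::atomic rather than HotSpot's Atomic class. The function names are hypothetical and the flag is a plain stand-in for _forceMonitorScavenge; it only shows that both forms let exactly one caller claim the flag.

```cpp
#include <atomic>
#include <cassert>

// Load-then-exchange, as in the reviewed line: a cheap read first, then an
// unconditional exchange. Callers that already see 1 never attempt a write.
bool claim_with_load_then_xchg(std::atomic<int>& flag) {
  return flag.load() == 0 && flag.exchange(1) == 0;
}

// Compare-and-swap alternative: a single operation that moves the flag from
// 0 to 1 and reports whether this caller was the one that did it.
bool claim_with_cmpxchg(std::atomic<int>& flag) {
  int expected = 0;
  return flag.compare_exchange_strong(expected, 1);
}
```

Both functions return true for the first caller and false afterwards. The practical difference is limited to how much traffic a contended, already-set flag may cause, which is presumably why the extra load was there in the first place.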

src/hotspot/share/runtime/thread.cpp
    old L527: // Enqueue a VM_Operation to do the job for us - sometime later
    L527: void Thread::send_async_exception(oop java_thread, oop java_throwable) {
    L528:   VM_ThreadStop vm_stop(java_thread, java_throwable);
    L529:   VMThread::execute(&vm_stop);
    L530: }
        Okay so you deleted the comment about the call being async and the
        VM op is no longer async, but does that break the expectation of
        any callers?

        Off the top of my head, I can't think of a way for a caller of
        Thread::send_async_exception() to determine that the call is now
        synchronous instead of asynchronous, but ...

        Update: Just took a look at JvmtiEnv::StopThread() which calls
        Thread::send_async_exception(). If JVM/TI StopThread() is being
        used to throw an exception at the calling thread, I suspect that
        in the baseline, the call would always return JVMTI_ERROR_NONE.
        With the exception throwing now being synchronous, would that
        affect the return value of the JVM/TI StopThread() call?

        Looks like the JVM/TI wrapper (see gensrc/jvmtifiles/jvmtiEnter.cpp
        in the build directory) uses ThreadInVMfromNative so the calling
        thread is in VM when it requests the now synchronous VM operation.
        When it requests the VM op, the calling thread will block which
        should allow the VM thread to execute the op. No worries there so
        far...

        It looks like the code also uses CautiouslyPreserveExceptionMark
        so I think if the exception is delivered to the calling thread
        it won't affect the return from jvmti_env->StopThread(), i.e., we
        will have our return value. The CautiouslyPreserveExceptionMark
        destructor won't kick in until we return from jvmti_StopThread()
        (the JVM/TI wrapper from the build).

        However, that might cause this assertion to fire:

        src/hotspot/share/utilities/preserveException.cpp:
        assert(!_thread->has_pending_exception(), "unexpected exception generated");

        because it is now detecting that an exception was thrown
        while executing a JVM/TI call. This is pure theory here.

src/hotspot/share/jfr/leakprofiler/utilities/vmOperation.hpp
    No comments.

src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp
    No comments.

src/hotspot/share/runtime/biasedLocking.cpp
    old L85:     // Use async VM operation to avoid blocking the Watcher thread.
        Again, you've deleted the comment, but are there going to
        be any unexpected side effects from the change? Looks like
        the work consists of:

        L70: ClassLoaderDataGraph::dictionary_classes_do(enable_biased_locking);

        Is that going to be a problem for the WatcherThread?

test/hotspot/gtest/threadHelper.inline.hpp
    No comments.

As David H. likes to say: the proof is in the building and testing.

Thumbs up on the overall idea and implementation. There might be an
issue lurking there in JVM/TI StopThread(), but that's just a theory
on my part...

Dan


On 11/22/19 9:39 AM, Robbin Ehn wrote:
> Hi David,
>
> On 11/22/19 7:13 AM, David Holmes wrote:
>> Hi Robbin,
>>
>> On 21/11/2019 9:50 pm, Robbin Ehn wrote:
>>> Hi,
>>>
>>> Here is v3:
>>>
>>> Full:
>>> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
>>
>> src/hotspot/share/runtime/synchronizer.cpp
>>
>> Looking at the highly discussed:
>>
> >> if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, &ForceMonitorScavenge) == 0) {
>>
>> why isn't that just:
>>
>> if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {
>>
>> ??
>
> I assumed someone had seen contention on ForceMonitorScavenge.
> Many threads can enter and re-enter here.
> I don't know if that's still the case.
>
> Since we only hit this path when the deprecated MonitorBound is set,
> I think I can change it?
>
>>
>> Also while we are here can we clean this up further:
>>
>> static volatile int ForceMonitorScavenge = 0;
>>
>> becomes
>>
>> static int _forceMonitorScavenge = 0;
>>
>> so the variable doesn't look like it came from globals.hpp :)
>>
>
> Sure!
>
>> Just to be clear, I understand the changes around monitor scavenging 
>> now, though I'm not sure getting rid of async VM ops and replacing 
>> with a new way to directly wakeup the VMThread really amounts to a 
>> simplification.
>>
>> ---
>>
>> src/hotspot/share/runtime/vmOperations.hpp
>>
>> I still think getting rid of Mode altogether would be a good 
>> simplification. :)
>
> Sure!
>
> Here is v4, inc:
> http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
> Full:
> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html
>
> Tested t1-3
>
> Thanks, Robbin
>
>
>>
>> Thanks,
>> David
>> -----
>>
>>
>>> Inc:
>>> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
>>>
>>> Tested t1-3
>>>
>>> Thanks, Robbin
>>>
>>> On 2019-11-19 12:05, Robbin Ehn wrote:
>>>> Hi all, please review.
>>>>
> >>>> CMS was the last real user of the more advanced features of VM
> >>>> operation.
> >>>> VM operation can be simplified to always be a stack object and
> >>>> thus be of either safepoint or no-safepoint type.
>>>>
> >>>> VM_EnableBiasedLocking is executed once by the watcher thread, if
> >>>> needed (not used by default). Making it synchronous doesn't matter.
> >>>> VM_ThreadStop is executed by a JavaThread; that thread should stop
> >>>> for the safepoint anyway, so there is no real point in not stopping directly.
>>>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, 
>>>> the VM op is not needed. Arguably this thread should actually stop 
>>>> here, since we are about to safepoint.
>>>>
>>>> There is also a small cleanup in vmThread.cpp where an unused 
>>>> method is removed.
>>>> And the extra safepoint is removed:
>>>> "// We want to make sure that we get to a safepoint regularly"
>>>> No we don't :)
>>>>
>>>> Issue:
>>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>>> Change-set:
>>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>>
>>>> Tested scavenge manually, passes t1-2.
>>>>
>>>> Thanks, Robbin


From daniel.daugherty at oracle.com  Fri Nov 22 21:58:44 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 22 Nov 2019 16:58:44 -0500
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
Message-ID: <092954ba-4eca-90bf-3d89-ddd24706f7a8@oracle.com>

Just to you...

I think you need to do at least one set of Mach5 runs that includes the
higher tiers. With this change, you really need JPDA (including JVM/TI)
which comes in at Tier5 and you need stress testing which comes in at
Tiers[678].

I tend to submit Tier[1-3], Tier[4-6], Tier7 and then Tier8. This gets
thru more quickly than a Tier[1-8] which has too many tasks...

Dan


On 11/22/19 4:50 PM, Daniel D. Daugherty wrote:
> Hi Robbin,
>
> Sorry I'm late to this review thread...
>
> I'm adding Serguei to this email thread since I'm making comments
> about the JVM/TI parts of this changeset...
>
>
>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html 
>
>
> src/hotspot/share/runtime/vmOperations.hpp
>     No comments.
>
> src/hotspot/share/runtime/vmOperations.cpp
>     No comments.
>
> src/hotspot/share/runtime/vmThread.hpp
>     L148:   // The ever running loop for the VMThread
>     L149:   void loop();
>     L150:   static void check_cleanup();
>         nit - Feels like an odd place to add check_cleanup().
>
>         Update: Now that I've seen what check_cleanup() does, it needs a
>         better name. Perhaps check_for_forced_cleanup()? And since
>         it is supposed to affect the running loop for the VMThread
>         I'm okay with its location now.
>
> src/hotspot/share/runtime/vmThread.cpp
>     L382:   event->set_blocking(true);
>         Probably have to keep the 'blocking' attribute in the event
>         for backward compatibility in the JFR record format?
>
>     L478:         // wait with a timeout to guarantee safepoints at regular intervals
>         Is this comment true anymore (even before this changeset)?
>         Adding this on the next line might help:
>
>                   // (if there is cleanup work to do)
>
>         since I _think_ that's how the policy has evolved...
>
>     L479:         mu_queue.wait(GuaranteedSafepointInterval);
>         Please prefix with "(void)" to make it clear you are
>         intentionally ignoring the return value.
>
>     old L627-634 (We want to make sure that we get to a safepoint regularly)
>         I think this now old code is covered by your change above:
>
>         L488:         // If the queue contains a safepoint VM op,
>         L489:         // clean up will be done so we can skip this part.
>         L490:         if (!_vm_queue->peek_at_safepoint_priority()) {
>
>         Please confirm that our thinking is the same here.
>
>     L661:     int ticket =  t->vm_operation_ticket();
>         nit - extra space after '='
>
>     Okay. Definitely simpler code.
>
> src/hotspot/share/runtime/handshake.cpp
>     No comments.
>
> src/hotspot/share/runtime/safepoint.hpp
>     No comments.
>
> src/hotspot/share/runtime/safepoint.cpp
>     Definitely got my attention with
>     ObjectSynchronizer::needs_monitor_scavenge().
>
> src/hotspot/share/runtime/synchronizer.hpp
>     No comments.
>
> src/hotspot/share/runtime/synchronizer.cpp
>     L921:     log_info(monitorinflation)("Monitor scavenge needed, triggering safepoint cleanup.");
>         Thanks for adding the logging line.
>
>         Update: As Kim pointed out, this code goes away when
>         MonitorBound is made obsolete (JDK-8230940). I'm looking
>         forward to making that change.
>
>     L1003:   if (Atomic::load(&_forceMonitorScavenge) == 0 && Atomic::xchg (1, &_forceMonitorScavenge) == 0) {
>         nit - extra space between 'xchg' and '('
>
>         Since InduceScavenge() is only called when the deprecated
>         MonitorBound is specified, I think you could use cmpxchg()
>         for clarity. Of course, you might be thinking that the
>         pattern is a useful example for other folks to copy...
>
> src/hotspot/share/runtime/thread.cpp
>     old L527: // Enqueue a VM_Operation to do the job for us - sometime later
>     L527: void Thread::send_async_exception(oop java_thread, oop java_throwable) {
>     L528:   VM_ThreadStop vm_stop(java_thread, java_throwable);
>     L529:   VMThread::execute(&vm_stop);
>     L530: }
>         Okay so you deleted the comment about the call being async and the
>         VM op is no longer async, but does that break the expectation of
>         any callers?
>
>         Off the top of my head, I can't think of a way for a caller of
>         Thread::send_async_exception() to determine that the call is now
>         synchronous instead of asynchronous, but ...
>
>         Update: Just took a look at JvmtiEnv::StopThread() which calls
>         Thread::send_async_exception(). If JVM/TI StopThread() is being
>         used to throw an exception at the calling thread, I suspect that
>         in the baseline, the call would always return JVMTI_ERROR_NONE.
>         With the exception throwing now being synchronous, would that
>         affect the return value of the JVM/TI StopThread() call?
>
>         Looks like the JVM/TI wrapper (see gensrc/jvmtifiles/jvmtiEnter.cpp
>         in the build directory) uses ThreadInVMfromNative so the calling
>         thread is in VM when it requests the now synchronous VM operation.
>         When it requests the VM op, the calling thread will block which
>         should allow the VM thread to execute the op. No worries there so
>         far...
>
>         It looks like the code also uses CautiouslyPreserveExceptionMark
>         so I think if the exception is delivered to the calling thread
>         it won't affect the return from jvmti_env->StopThread(), i.e., we
>         will have our return value. The CautiouslyPreserveExceptionMark
>         destructor won't kick in until we return from jvmti_StopThread()
>         (the JVM/TI wrapper from the build).
>
>         However, that might cause this assertion to fire:
>
>         src/hotspot/share/utilities/preserveException.cpp:
>         assert(!_thread->has_pending_exception(), "unexpected exception generated");
>
>         because it is now detecting that an exception was thrown
>         while executing a JVM/TI call. This is pure theory here.
>
> src/hotspot/share/jfr/leakprofiler/utilities/vmOperation.hpp
>     No comments.
>
> src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp
>     No comments.
>
> src/hotspot/share/runtime/biasedLocking.cpp
>     old L85:     // Use async VM operation to avoid blocking the 
> Watcher thread.
>         Again, you've deleted the comment, but is there going to
>         be any unexpected side effects from the change? Looks like
>         the work consists of:
>
>         L70: 
> ClassLoaderDataGraph::dictionary_classes_do(enable_biased_locking);
>
>         Is that going to be a problem for the WatcherThread?
>
> test/hotspot/gtest/threadHelper.inline.hpp
>     No comments.
>
> As David H. likes to say: the proof is in the building and testing.
>
> Thumbs up on the overall idea and implementation. There might be an
> issue lurking there in JVM/TI StopThread(), but that's just a theory
> on my part...
>
> Dan
>
>
>
> On 11/22/19 9:39 AM, Robbin Ehn wrote:
>> Hi David,
>>
>> On 11/22/19 7:13 AM, David Holmes wrote:
>>> Hi Robbin,
>>>
>>> On 21/11/2019 9:50 pm, Robbin Ehn wrote:
>>>> Hi,
>>>>
>>>> Here is v3:
>>>>
>>>> Full:
>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
>>>
>>> src/hotspot/share/runtime/synchronizer.cpp
>>>
>>> Looking at the highly discussed:
>>>
>>> if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, 
>>> &ForceMonitorScavenge) == 0) {
>>>
>>> why isn't that just:
>>>
>>> if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {
>>>
>>> ??
>>
>> I assumed someone had seen contention on ForceMonitorScavenge.
>> Many threads can enter and re-enter here.
>> I don't know if that's still the case.
>>
>> Since we only hit this path when the deprecated MonitorBound is set, 
>> I think I can change it?
>>
>>>
>>> Also while we are here can we clean this up further:
>>>
>>> static volatile int ForceMonitorScavenge = 0;
>>>
>>> becomes
>>>
>>> static int _forceMonitorScavenge = 0;
>>>
>>> so the variable doesn't look like it came from globals.hpp :)
>>>
>>
>> Sure!
>>
>>> Just to be clear, I understand the changes around monitor scavenging 
>>> now, though I'm not sure getting rid of async VM ops and replacing 
>>> with a new way to directly wakeup the VMThread really amounts to a 
>>> simplification.
>>>
>>> ---
>>>
>>> src/hotspot/share/runtime/vmOperations.hpp
>>>
>>> I still think getting rid of Mode altogether would be a good 
>>> simplification. :)
>>
>> Sure!
>>
>> Here is v4, inc:
>> http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
>> Full:
>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html
>>
>> Tested t1-3
>>
>> Thanks, Robbin
>>
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>
>>>> Inc:
>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
>>>>
>>>> Tested t1-3
>>>>
>>>> Thanks, Robbin
>>>>
>>>> On 2019-11-19 12:05, Robbin Ehn wrote:
>>>>> Hi all, please review.
>>>>>
>>>>> CMS was the last real user of the more advanced features of VM 
>>>>> operation.
>>>>> VM operation can be simplified to always be a stack object and 
>>>>> thus either be
>>>>> of safepoint or no safepoint type.
>>>>>
>>>>> VM_EnableBiasedLocking is executed once by watcher thread, if 
>>>>> needed (default not used). Making it synchronous doesn't matter.
>>>>> VM_ThreadStop is executed by a JavaThread, that thread should stop 
>>>>> for the safepoint anyways, no real point in not stopping directly.
>>>>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, 
>>>>> the VM op is not needed. Arguably this thread should actually stop 
>>>>> here, since we are about to safepoint.
>>>>>
>>>>> There is also a small cleanup in vmThread.cpp where an unused 
>>>>> method is removed.
>>>>> And the extra safepoint is removed:
>>>>> "// We want to make sure that we get to a safepoint regularly"
>>>>> No we don't :)
>>>>>
>>>>> Issue:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>>>> Change-set:
>>>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>>>
>>>>> Tested scavenge manually, passes t1-2.
>>>>>
>>>>> Thanks, Robbin
>
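
[Illustration] The load-then-xchg versus cmpxchg question raised in the
8234086 review above can be sketched in portable C++. This is only a
minimal sketch using std::atomic, not the HotSpot Atomic API; the flag
and function names here are hypothetical stand-ins chosen for the example.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the flag discussed above; not the HotSpot variable.
static std::atomic<int> force_scavenge{0};

// Load first, then exchange: once the flag is set, later callers only
// perform a plain read and skip the atomic read-modify-write entirely.
bool claim_with_load_then_xchg() {
  return force_scavenge.load(std::memory_order_relaxed) == 0 &&
         force_scavenge.exchange(1) == 0;
}

// Single compare-and-swap: shorter to read, but every caller performs an
// atomic read-modify-write attempt even when the flag is already set.
bool claim_with_cas() {
  int expected = 0;
  return force_scavenge.compare_exchange_strong(expected, 1);
}
```

Both functions return true for exactly one caller until the flag is
reset to 0; they differ only in how much work the losing callers do,
which matters only if the path is contended.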


From coleen.phillimore at oracle.com  Fri Nov 22 22:15:32 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Fri, 22 Nov 2019 17:15:32 -0500
Subject: RFR 8234613: JavaThread can escape back to Java from an ongoing
 handshake
In-Reply-To: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
References: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
Message-ID: <e588d805-fa9e-c051-0327-1137285d0329@oracle.com>


This fix looks good to me, from your explanations offline and in the bug 
report.
Coleen

On 11/22/19 1:25 PM, Patricio Chilano wrote:
> Hi,
>
> This patch aims to address a current bug where, given the right 
> combination of handshakes and external suspend/resume, a JavaThread 
> can transition from a safe state back to Java without blocking for a 
> still-in-progress handshake. In the description of the bug I added an 
> example, tracing the state changes of the JavaThread as it goes 
> through the different transitions until it escapes the handshake. 
> Currently, the window of time for this issue to happen is so small 
> that we do not see actual failures running tests. Running test 
> SuspendAtExit.java and adding some small delay before restoring the 
> JavaThread state in java_suspend_self_with_safepoint_check() can 
> demonstrate the issue.
> The proposed fix is to check again if we have a pending/in-progress 
> handshake operation after executing ~ThreadInVMForHandshake().
>
> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows 
> and Solaris).
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
> Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/
>
> Thanks,
> Patricio


From ioi.lam at oracle.com  Fri Nov 22 22:40:20 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Fri, 22 Nov 2019 14:40:20 -0800
Subject: RFR(S) 8234429: appcds/dynamicArchive tests crashing with Graal
In-Reply-To: <96c3ed40-1d70-cdde-d247-91042785f760@oracle.com>
References: <2ca6a977-b4d7-31d8-cba0-e4c9d9822f65@oracle.com>
 <96c3ed40-1d70-cdde-d247-91042785f760@oracle.com>
Message-ID: <2ff601c6-adb1-8c67-41ad-604c7513dc08@oracle.com>

Hi Calvin, Thanks for the review.

- Ioi

On 11/22/19 11:12 AM, Calvin Cheung wrote:
> Hi Ioi,
>
> The fix looks good.
>
> thanks,
>
> Calvin
>
> On 11/20/19 2:28 PM, Ioi Lam wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8234429
>> http://cr.openjdk.java.net/~iklam/jdk14/8234429-dynamic-cds-graal-crash.v01/ 
>>
>>
>> In JDK-8231610, the implementation of DynamicArchive::is_mapped() is 
>> changed to
>>
>>     static bool is_mapped() { return FileMapInfo::dynamic_info() != 
>> NULL; }
>>
>> During dynamic dumping, we temporarily (inside a safepoint) allocate 
>> a dynamic FileMapInfo, which makes it appear as if the dynamic 
>> archive has been mapped.
>>
>> When graal is enabled, the VM actually continues to run for a little 
>> (compiling Java methods) after dynamic dumping has finished. During 
>> this time, when JVMCI tries to resolve a class, it might try to look 
>> up from the dynamic archive, which will fail as the dynamic archive 
>> isn't really mapped.
>>
>> The fix is to free the temporarily allocated FileMapInfo when dynamic 
>> dumping is finished.
>>
>> Thanks
>> - Ioi
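
[Illustration] The failure mode described in the quoted 8234429 report
can be modelled in a few lines. This is a sketch under assumptions:
MapInfo, g_dynamic_info and the two dump functions are hypothetical
names for the example and are not the HotSpot classes or the actual patch.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the per-archive bookkeeping object.
struct MapInfo {};

// A non-null pointer is what the "is it mapped?" query keys on.
static std::unique_ptr<MapInfo> g_dynamic_info;

bool is_mapped() { return g_dynamic_info != nullptr; }

// Dump that leaves its temporary object behind: afterwards is_mapped()
// reports true although no archive was ever mapped.
void dump_leaving_info() {
  g_dynamic_info = std::make_unique<MapInfo>();
  // ... write the archive ...
}

// Dump that frees the temporary object when it is done, so later queries
// see the real state again.
void dump_and_free_info() {
  g_dynamic_info = std::make_unique<MapInfo>();
  // ... write the archive ...
  g_dynamic_info.reset();
}
```

Any code that runs after the first variant and trusts is_mapped() would
try to use an archive that is not there, which matches the shape of the
crash described in the report.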


From daniel.daugherty at oracle.com  Fri Nov 22 23:10:37 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 22 Nov 2019 18:10:37 -0500
Subject: RFR 8234613: JavaThread can escape back to Java from an ongoing
 handshake
In-Reply-To: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
References: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
Message-ID: <6365d352-d1ae-470f-6473-824677cf653a@oracle.com>

Hi Patricio,


On 11/22/19 1:25 PM, Patricio Chilano wrote:
> Hi,
>
> This patch aims to address a current bug where, given the right 
> combination of handshakes and external suspend/resume, a JavaThread 
> can transition from a safe state back to Java without blocking for a 
> still-in-progress handshake. In the description of the bug I added an 
> example, tracing the state changes of the JavaThread as it goes 
> through the different transitions until it escapes the handshake. 
> Currently, the window of time for this issue to happen is so small 
> that we do not see actual failures running tests. Running test 
> SuspendAtExit.java and adding some small delay before restoring the 
> JavaThread state in java_suspend_self_with_safepoint_check() can 
> demonstrate the issue.
> The proposed fix is to check again if we have a pending/in-progress 
> handshake operation after executing ~ThreadInVMForHandshake().
>
> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows 
> and Solaris).
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
> Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/

src/hotspot/share/runtime/handshake.cpp
     No comments.

Thumbs up!

Nice job on the write up in the bug.

I think I grok the fix. This is very much like suspend thread loops.
As we are coming out of our block after being resumed, we have to
check for another pending suspend request that was made after we
were resumed and while we were in the process of unblocking... These
async protocols are tricky.

One last question: If a delay is added to the existing baseline code,
would SuspendAtExit.java fail? I'm trying to figure out if this race
is possible without your pending work for JDK-8232733.

Dan


>
> Thanks,
> Patricio


From calvin.cheung at oracle.com  Fri Nov 22 23:11:44 2019
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Fri, 22 Nov 2019 15:11:44 -0800
Subject: RFR(XS) 8233446: Improve error handling when specified dynamic
 archive doesn't exist
In-Reply-To: <824ce887-2d1f-7254-4353-ca19aa10b021@oracle.com>
References: <1decfeb9-f4eb-a577-a68d-8f5dcde01d68@oracle.com>
 <824ce887-2d1f-7254-4353-ca19aa10b021@oracle.com>
Message-ID: <b465c075-7f9e-0d74-4149-6b03665e504f@oracle.com>

+1

thanks,

Calvin

On 11/22/19 4:26 AM, Lois Foltan wrote:
> Looks good.
> Lois
>
> On 11/22/2019 12:52 AM, Ioi Lam wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8233446
>> http://cr.openjdk.java.net/~iklam/jdk14/8233446-error-handling-when-dyn-archive-not-found.v01/ 
>>
>>
>> With this patch, error handling in CDS is the same when the static 
>> archive
>> or the dynamic archive cannot be opened at runtime:
>>
>> -Xshare:auto (default)
>>     VM will continue to execute without mapping the specified but
>>     unavailable archive. (Same as before this patch)
>>
>> -Xshare:on (deprecated)
>>     VM will exit with an error message
>>
>>
>> So this patch only modifies the behavior of -Xshare:on, which is 
>> deprecated
>> anyway. However, the code is simplified and more consistent, and will 
>> make it easier
>> to remove support for -Xshare:on in the future.
>>
>> Tested with hs-tier1/hs-tier2
>>
>> Thanks
>> - Ioi
>

From patricio.chilano.mateo at oracle.com  Sat Nov 23 00:45:35 2019
From: patricio.chilano.mateo at oracle.com (Patricio Chilano)
Date: Fri, 22 Nov 2019 21:45:35 -0300
Subject: RFR 8234613: JavaThread can escape back to Java from an ongoing
 handshake
In-Reply-To: <e588d805-fa9e-c051-0327-1137285d0329@oracle.com>
References: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
 <e588d805-fa9e-c051-0327-1137285d0329@oracle.com>
Message-ID: <19f1741e-a099-d3ec-6083-de24c20179e1@oracle.com>

Thanks Coleen!

Patricio

On 11/22/19 5:15 PM, coleen.phillimore at oracle.com wrote:
>
> This fix looks good to me, from your explanations offline and in the 
> bug report.
> Coleen
>
> On 11/22/19 1:25 PM, Patricio Chilano wrote:
>> Hi,
>>
>> This patch aims to address a current bug where, given the right 
>> combination of handshakes and external suspend/resume, a JavaThread 
>> can transition from a safe state back to Java without blocking for a 
>> still-in-progress handshake. In the description of the bug I added an 
>> example, tracing the state changes of the JavaThread as it goes 
>> through the different transitions until it escapes the handshake. 
>> Currently, the window of time for this issue to happen is so small 
>> that we do not see actual failures running tests. Running test 
>> SuspendAtExit.java and adding some small delay before restoring the 
>> JavaThread state in java_suspend_self_with_safepoint_check() can 
>> demonstrate the issue.
>> The proposed fix is to check again if we have a pending/in-progress 
>> handshake operation after executing ~ThreadInVMForHandshake().
>>
>> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows 
>> and Solaris).
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/
>>
>> Thanks,
>> Patricio
>


From kim.barrett at oracle.com  Sat Nov 23 00:48:41 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Fri, 22 Nov 2019 19:48:41 -0500
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <0e689087-e56f-58a8-f152-1801167cc181@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <15F8277A-54A3-400D-9D08-155A8986D3C9@oracle.com>
 <0e689087-e56f-58a8-f152-1801167cc181@oracle.com>
Message-ID: <49C7F9F1-0398-45DC-857F-A4F345DEA509@oracle.com>

> On Nov 22, 2019, at 9:40 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
> 
> Hi Kim,
> 
>>> so the variable doesn't look like it came from globals.hpp :)
>> I was going to ask some similar questions, but decided not to bother, e.g.
> 
> v4:
> http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html

Still looks good to me.


From patricio.chilano.mateo at oracle.com  Sat Nov 23 01:12:20 2019
From: patricio.chilano.mateo at oracle.com (Patricio Chilano)
Date: Fri, 22 Nov 2019 22:12:20 -0300
Subject: RFR 8234613: JavaThread can escape back to Java from an ongoing
 handshake
In-Reply-To: <6365d352-d1ae-470f-6473-824677cf653a@oracle.com>
References: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
 <6365d352-d1ae-470f-6473-824677cf653a@oracle.com>
Message-ID: <2bc75977-8cc8-0a0f-6cda-275173d1beee@oracle.com>

Hi Dan,

On 11/22/19 6:10 PM, Daniel D. Daugherty wrote:
> Hi Patricio,
>
>
> On 11/22/19 1:25 PM, Patricio Chilano wrote:
>> Hi,
>>
>> This patch aims to address a current bug where, given the right 
>> combination of handshakes and external suspend/resume, a JavaThread 
>> can transition from a safe state back to Java without blocking for a 
>> still-in-progress handshake. In the description of the bug I added an 
>> example, tracing the state changes of the JavaThread as it goes 
>> through the different transitions until it escapes the handshake. 
>> Currently, the window of time for this issue to happen is so small 
>> that we do not see actual failures running tests. Running test 
>> SuspendAtExit.java and adding some small delay before restoring the 
>> JavaThread state in java_suspend_self_with_safepoint_check() can 
>> demonstrate the issue.
>> The proposed fix is to check again if we have a pending/in-progress 
>> handshake operation after executing ~ThreadInVMForHandshake().
>>
>> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows 
>> and Solaris).
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/
>
> src/hotspot/share/runtime/handshake.cpp
>     No comments.
>
> Thumbs up!
>
> Nice job on the write up in the bug.
Thanks!  : )

> I think I grok the fix. This is very much like suspend thread loops.
> As we are coming out of our block after being resumed, we have to
> check for another pending suspend request that was made after we
> were resumed and while we were in the process of unblocking... These
> async protocols are tricky.
>
> One last question: If a delay is added to the existing baseline code,
> would SuspendAtExit.java fail? I'm trying to figure out if this race
> is possible without your pending work for JDK-8232733.
Yes, it can fail in the current baseline, it is just unlikely so you 
have to manually add a short sleep to see it. For example, if I just add 
to the current baseline the line os::naked_short_nanosleep(100000) 
before set_thread_state_fence(state) in 
JavaThread::java_suspend_self_with_safepoint_check(), that test crashes 
when running on my Mac after 1-2 attempts. I tried to play with the 
timing a little bit in Linux too but couldn't make it fail. 8232733 will 
just make this issue more visible, since the JavaThread that is being 
handshaked could be resumed at any time during the handshake. Today, a 
JavaThread can only be resumed either before or after the handshake. So 
the issue only appears for the "before" case, when the VMThread trying 
to process the handshake sees that the JavaThread is blocked right after 
it is resumed but before its original state is restored (combined with 
the fact that the JavaThread suspended itself while polling inside the 
~ThreadInVMForHandshake()).
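
[Illustration] The "check again after restoring state" idea can be shown
with a small single-threaded model. This is a sketch under assumptions,
not HotSpot code: ModelThread and its members are hypothetical, and the
late_requests counter merely simulates a handshake request that arrives
while the thread state is being restored.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of a thread leaving a blocked/safe state.
struct ModelThread {
  std::atomic<bool> handshake_pending{false};
  int handshakes_processed = 0;
  int late_requests = 0;  // requests that arrive during restore_state()

  void process_handshake() {
    handshake_pending.store(false);
    ++handshakes_processed;
  }

  // Restoring the original state is the window in which a new request
  // can slip in unnoticed.
  void restore_state() {
    if (late_requests > 0) {
      --late_requests;
      handshake_pending.store(true);
    }
  }

  // Check once, restore, return. A false result means the thread went
  // back to running with a handshake still pending.
  bool leave_blocked_once() {
    if (handshake_pending.load()) process_handshake();
    restore_state();
    return !handshake_pending.load();
  }

  // Check again after restoring, and keep going until nothing is pending.
  bool leave_blocked_with_recheck() {
    do {
      if (handshake_pending.load()) process_handshake();
      restore_state();
    } while (handshake_pending.load());
    return !handshake_pending.load();
  }
};
```

With one late request, the single-check version returns with the request
still pending, while the re-checking version processes it before
returning.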

Thanks for reviewing this Dan!

Patricio
> Dan
>
>
>>
>> Thanks,
>> Patricio
>


From ioi.lam at oracle.com  Sat Nov 23 01:46:59 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Fri, 22 Nov 2019 17:46:59 -0800
Subject: RFR(XS) 8234539 ArchiveRelocationTest.java failed: Archive
 mapping should always succeed
In-Reply-To: <698fac17-b84e-a5ab-44c2-ccfb08bbfe27@oracle.com>
References: <719e7512-cf84-072c-3ecf-9181f4c495dd@oracle.com>
 <698fac17-b84e-a5ab-44c2-ccfb08bbfe27@oracle.com>
Message-ID: <24e62e72-bc2f-3045-45d8-a778271d6c2d@oracle.com>

Hi Calvin,

Thanks for the review. It turned out that I needed to fix another 
(addr_delta == 0) bug in the code. I've also moved the handling of 
ArchiveRelocationMode==1 in debug builds to 
MetaspaceShared::map_archives(). This way, we can simulate the "mapping 
failure" after all archives have been mapped, and better test the code 
that unmaps the archives after the initial mapping failures.

Here's the updated patch.
http://cr.openjdk.java.net/~iklam/jdk14/8234539-mapping-should-always-succeed.v02/

I am running tier4-rt-cds-relocation multiple times to make sure 8234539 
is no longer triggered on Windows.

Thanks
- Ioi

On 11/22/2019 11:23 AM, Calvin Cheung wrote:
> Hi Ioi,
>
> The fix looks good.
>
> thanks,
>
> Calvin
>
> On 11/21/19 2:58 PM, Ioi Lam wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8234539
>> http://cr.openjdk.java.net/~iklam/jdk14/8234539-mapping-should-always-succeed.v01/ 
>>
>>
>> This bug happens only on Windows. The fix is one-line -- in order to 
>> check
>> whether "This is the second time we try to map the archive(s)", 
>> instead of
>> using (addr_delta != 0), the correct condition is (rs.is_reserved()). 
>> Please
>> see the bug report for details.
>>
>> I also improved the log messages for when an error happens.
>>
>> Thanks
>> - Ioi


From aph at redhat.com  Sun Nov 24 17:09:41 2019
From: aph at redhat.com (Andrew Haley)
Date: Sun, 24 Nov 2019 17:09:41 +0000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <67F1BE19-5149-4760-A01F-31A3F87D979C@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <67F1BE19-5149-4760-A01F-31A3F87D979C@oracle.com>
Message-ID: <f1744141-f2f1-b186-874a-44e3a580084b@redhat.com>

On 11/22/19 11:49 AM, Erik Österlund wrote:

> So yeah, loom. There is that. That is what I wanted to add to this
> conversation.

Hmm. I'm working on Loom, and the last thing I want is for the Loom
patch to be a performance regression for OpenJDK as a whole. That
would be a *terrible* message, apart from any other consideration.
Certainly there are problems when running Loom virtual threads with
biased locking enabled, but that is another matter.

Maybe there are other ways to solve the problem. We could arrange it
so that any attempt to create a virtual Thread caused a mass bias
revocation, for example. If we also set BiasedLockingStartupDelay to,
say 4000ms, any application which started a virtual Thread before that
time would effectively prevent biased locking from ever being used.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From david.holmes at oracle.com  Sun Nov 24 21:33:16 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 25 Nov 2019 07:33:16 +1000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <AM6PR02MB4801BE8770BB5A5FF05EAE248A490@AM6PR02MB4801.eurprd02.prod.outlook.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <6209301e-ae85-2a91-7d9e-c9096581365d@oracle.com>
 <AM6PR02MB48010AA9F7B5B16B24058B288A4E0@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <AM6PR02MB4801BE8770BB5A5FF05EAE248A490@AM6PR02MB4801.eurprd02.prod.outlook.com>
Message-ID: <21f83181-eea8-7b7f-9f5f-5f1a26413154@oracle.com>

Hi Christoph,

On 23/11/2019 12:04 am, Langer, Christoph wrote:
> Hi,
> 
> I'd like to push this change. However, running it through jdk-submit shows reproducible errors:
> 
> Job: mach5-one-clanger-JDK-8234185-1-20191122-0927-6913189
> BuildId: 2019-11-22-0926373.christoph.langer.source
> No failed tests
> Tasks Summary
> *	NA: 0
> *	NOTHING_TO_RUN: 0
> *	KILLED: 0
> *	PASSED: 76
> *	UNABLE_TO_RUN: 0
> *	EXECUTED_WITH_FAILURE: 1
> *	FAILED: 0
> *	HARNESS_ERROR: 0
> Build
> 1 Executed with failure
> o	windows-x64-install-windows-x64-build-19 error while building, return value: 2
> 
> 
> Job: mach5-one-clanger-JDK-8234185-20191121-2313-6898791
> BuildId: 2019-11-21-2311357.christoph.langer.source
> No failed tests
> Tasks Summary
> *	NA: 0
> *	NOTHING_TO_RUN: 0
> *	KILLED: 0
> *	PASSED: 76
> *	UNABLE_TO_RUN: 0
> *	EXECUTED_WITH_FAILURE: 1
> *	FAILED: 0
> *	HARNESS_ERROR: 0
> Build
> 1 Executed with failure
> o	windows-x64-install-windows-x64-build-19 error while building, return value: 2
> 
> 
> David already had a look and let me know that the following was the reason:
> 
> t:/workspace/open/src/java.base/windows/native/libjava/canonicalize_md.c(41): fatal error C1083: Cannot open include file: 'jdk_util.h': No such file or directory
> 
> This is not explainable to me as I see this running through my local build and our nightly builds without problems. I also can't explain why jdk_util.h can't be opened at this place - it should be there and part of the include directories...
> 
> I'd appreciate any help...

I just dug a little deeper and this is failing in part of our closed 
build for the install repo. There is a library there that is using 
canonicalize_md.c directly - i.e. it adds that file to its source files 
list. The build instructions don't include that directory on the include 
directory list - hence the failure. But it will also fail due to the 
name change you made.

Someone will need to work with you to make the necessary changes to our 
code.

David

> Thanks
> Christoph
> 
> 
>> -----Original Message-----
>> From: Langer, Christoph
>> Sent: Donnerstag, 21. November 2019 14:19
>> To: Alan Bateman <Alan.Bateman at oracle.com>; core-libs-
>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>> Subject: RE: RFR: 8234185: Cleanup usage of canonicalize function between
>> libjava, hotspot and libinstrument
>>
>> Hi Alan,
>>
>> thanks for the review. I'll push it then after running through jdk-submit.
>>
>> /Christoph
>>
>>> -----Original Message-----
>>> From: Alan Bateman <Alan.Bateman at oracle.com>
>>> Sent: Donnerstag, 21. November 2019 09:51
>>> To: Langer, Christoph <christoph.langer at sap.com>; core-libs-
>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>> Subject: Re: RFR: 8234185: Cleanup usage of canonicalize function between
>>> libjava, hotspot and libinstrument
>>>
>>> On 14/11/2019 15:37, Langer, Christoph wrote:
>>>> Hi,
>>>>
>>>> please review this cleanup change regarding function "canonicalize" of
>>> libjava.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
>>>> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
>>>>
>>>>
>>>> The goal is to cleanup how this function is defined and used. One thing is,
>>> that there was an unnecessary wrapper function "Canonicalize" in jni_util.c.
>>> It wrapped the call to "canonicalize". We can get rid of this wrapper.
>>> Unfortunately, it is not possible to just export "canonicalize" since this will
>>> conflict with a method signature from the math library, at least on modern
>>> Linuxes. So I decided to call the method JDK_Canonicalize and will correctly
>>> define it in jdk_util.h which can be included everywhere.
>>>>
>>> I think this change is okay. My main concern when initially seeing this
>>> go by was that it would leak the \\?\ or \\?\UNC\ prefix into the
>>> canonical File when it wasn't there previously, this would of course
>>> have several implications. But I think you have it right and this is, as
>>> you position, just refactoring/cleanup.
>>>
>>> -Alan

From david.holmes at oracle.com  Sun Nov 24 21:49:13 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 25 Nov 2019 07:49:13 +1000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <21f83181-eea8-7b7f-9f5f-5f1a26413154@oracle.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <6209301e-ae85-2a91-7d9e-c9096581365d@oracle.com>
 <AM6PR02MB48010AA9F7B5B16B24058B288A4E0@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <AM6PR02MB4801BE8770BB5A5FF05EAE248A490@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <21f83181-eea8-7b7f-9f5f-5f1a26413154@oracle.com>
Message-ID: <5b46607d-c84b-086d-6241-cf2eee95d0a6@oracle.com>

On 25/11/2019 7:33 am, David Holmes wrote:
> Hi Christoph,
> 
> On 23/11/2019 12:04 am, Langer, Christoph wrote:
>> Hi,
>>
>> I'd like to push this change. However, running it through jdk-submit 
>> shows reproducible errors:
>>
>> Job: mach5-one-clanger-JDK-8234185-1-20191122-0927-6913189
>> BuildId: 2019-11-22-0926373.christoph.langer.source
>> No failed tests
>> Tasks Summary
>>   * NA: 0
>>   * NOTHING_TO_RUN: 0
>>   * KILLED: 0
>>   * PASSED: 76
>>   * UNABLE_TO_RUN: 0
>>   * EXECUTED_WITH_FAILURE: 1
>>   * FAILED: 0
>>   * HARNESS_ERROR: 0
>> Build
>> 1 Executed with failure
>> o   windows-x64-install-windows-x64-build-19 error while building, 
>> return value: 2
>>
>>
>> Job: mach5-one-clanger-JDK-8234185-20191121-2313-6898791
>> BuildId: 2019-11-21-2311357.christoph.langer.source
>> No failed tests
>> Tasks Summary
>>   * NA: 0
>>   * NOTHING_TO_RUN: 0
>>   * KILLED: 0
>>   * PASSED: 76
>>   * UNABLE_TO_RUN: 0
>>   * EXECUTED_WITH_FAILURE: 1
>>   * FAILED: 0
>>   * HARNESS_ERROR: 0
>> Build
>> 1 Executed with failure
>> o   windows-x64-install-windows-x64-build-19 error while building, 
>> return value: 2
>>
>>
>> David already had a look and let me know that the following was the 
>> reason:
>>
>> t:/workspace/open/src/java.base/windows/native/libjava/canonicalize_md.c(41): 
>> fatal error C1083: Cannot open include file: 'jdk_util.h': No such 
>> file or directory
>>
>> This is not explainable to me as I see this running through my local 
>> build and our nightly builds without problems. I also can't explain 
>> why jdk_util.h can't be opened at this place - it should be there and 
>> part of the include directories...
>>
>> I'd appreciate any help...
> 
> I just dug a little deeper and this is failing in part of our closed 
> build for the install repo. There is a library there that is using 
> canonicalize_md.c directly - i.e. it adds that file to its source files 
> list. The build instructions don't include that directory on the include 
> directory list - hence the failure. But it will also fail due to the 
> name change you made.

Actually it appears that the other source code doesn't actually refer to 
the canonicalize function at all, so a simple fix may be possible at the 
build level on our side. I'm testing that now.

David
-----

> Someone will need to work with you to make the necessary changes to our 
> code.
> 
> David
> 
>> Thanks
>> Christoph
>>
>>
>>> -----Original Message-----
>>> From: Langer, Christoph
>>> Sent: Donnerstag, 21. November 2019 14:19
>>> To: Alan Bateman <Alan.Bateman at oracle.com>; core-libs-
>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>> Subject: RE: RFR: 8234185: Cleanup usage of canonicalize function 
>>> between
>>> libjava, hotspot and libinstrument
>>>
>>> Hi Alan,
>>>
>>> thanks for the review. I'll push it then after running through 
>>> jdk-submit.
>>>
>>> /Christoph
>>>
>>>> -----Original Message-----
>>>> From: Alan Bateman <Alan.Bateman at oracle.com>
>>>> Sent: Donnerstag, 21. November 2019 09:51
>>>> To: Langer, Christoph <christoph.langer at sap.com>; core-libs-
>>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR: 8234185: Cleanup usage of canonicalize function 
>>>> between
>>>> libjava, hotspot and libinstrument
>>>>
>>>> On 14/11/2019 15:37, Langer, Christoph wrote:
>>>>> Hi,
>>>>>
>>>>> please review this cleanup change regarding function "canonicalize" of
>>>> libjava.
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
>>>>> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
>>>>>
>>>>>
>>>>> The goal is to cleanup how this function is defined and used. One 
>>>>> thing is,
>>>> that there was an unnecessary wrapper function "Canonicalize" in 
>>>> jni_util.c.
>>>> It wrapped the call to "canonicalize". We can get rid of this wrapper.
>>>> Unfortunately, it is not possible to just export "canonicalize" 
>>>> since this will
>>>> conflict with a method signature from the math library, at least on 
>>>> modern
>>>> Linuxes. So I decided to call the method JDK_Canonicalize and will 
>>>> correctly
>>>> define it in jdk_util.h which can be included everywhere.
>>>>>
>>>> I think this change is okay. My main concern when initially seeing this
>>>> go by was that it would leak the \\?\ or \\?\UNC\ prefix into the
>>>> canonical File when it wasn't there previously, this would of course
>>>> have several implications. But I think you have it right and this 
>>>> is, as
>>>> you position, just refactoring/cleanup.
>>>>
>>>> -Alan

From david.holmes at oracle.com  Sun Nov 24 22:45:34 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 25 Nov 2019 08:45:34 +1000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <5b46607d-c84b-086d-6241-cf2eee95d0a6@oracle.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <6209301e-ae85-2a91-7d9e-c9096581365d@oracle.com>
 <AM6PR02MB48010AA9F7B5B16B24058B288A4E0@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <AM6PR02MB4801BE8770BB5A5FF05EAE248A490@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <21f83181-eea8-7b7f-9f5f-5f1a26413154@oracle.com>
 <5b46607d-c84b-086d-6241-cf2eee95d0a6@oracle.com>
Message-ID: <cf42ad0e-ed3c-2e25-9381-2edd13a0af73@oracle.com>

On 25/11/2019 7:49 am, David Holmes wrote:
> On 25/11/2019 7:33 am, David Holmes wrote:
>> Hi Christoph,
>>
>> On 23/11/2019 12:04 am, Langer, Christoph wrote:
>>> Hi,
>>>
>>> I'd like to push this change. However, running it through jdk-submit 
>>> shows reproducible errors:
>>>
>>> Job: mach5-one-clanger-JDK-8234185-1-20191122-0927-6913189
>>> BuildId: 2019-11-22-0926373.christoph.langer.source
>>> No failed tests
>>> Tasks Summary
>>>     NA: 0
>>>     NOTHING_TO_RUN: 0
>>>     KILLED: 0
>>>     PASSED: 76
>>>     UNABLE_TO_RUN: 0
>>>     EXECUTED_WITH_FAILURE: 1
>>>     FAILED: 0
>>>     HARNESS_ERROR: 0
>>> Build
>>> 1 Executed with failure
>>> o   windows-x64-install-windows-x64-build-19 error while building, 
>>> return value: 2
>>>
>>>
>>> Job: mach5-one-clanger-JDK-8234185-20191121-2313-6898791
>>> BuildId: 2019-11-21-2311357.christoph.langer.source
>>> No failed tests
>>> Tasks Summary
>>>     NA: 0
>>>     NOTHING_TO_RUN: 0
>>>     KILLED: 0
>>>     PASSED: 76
>>>     UNABLE_TO_RUN: 0
>>>     EXECUTED_WITH_FAILURE: 1
>>>     FAILED: 0
>>>     HARNESS_ERROR: 0
>>> Build
>>> 1 Executed with failure
>>> o   windows-x64-install-windows-x64-build-19 error while building, 
>>> return value: 2
>>>
>>>
>>> David already had a look and let me know that the following was the 
>>> reason:
>>>
>>> t:/workspace/open/src/java.base/windows/native/libjava/canonicalize_md.c(41): 
>>> fatal error C1083: Cannot open include file: 'jdk_util.h': No such 
>>> file or directory
>>>
>>> I cannot explain this, as the change runs through my local build 
>>> and our nightly builds without problems. I also can't explain why 
>>> jdk_util.h can't be opened at this place - it should be there and 
>>> part of the include directories...
>>>
>>> I'd appreciate any help...
>>
>> I just dug a little deeper and this is failing in part of our closed 
>> build for the install repo. There is a library there that is using 
>> canonicalize_md.c directly - i.e. it adds that file to its source 
>> files list. The build instructions don't include that directory on the 
>> include directory list - hence the failure. But it will also fail due 
>> to the name change you made.
> 
> Actually it appears that the other source code doesn't actually refer to 
> the canonicalize function at all, so a simple fix may be possible at the 
> build level on our side. I'm testing that now.

It isn't the canonicalize function that is used, it is getPrefixed, 
which has now been moved to the io_util_md.c file. So a fix will be a 
bit more involved.

David

> 
> David
> -----
> 
>> Someone will need to work with you to make the necessary changes to 
>> our code.
>>
>> David
>>
>>> Thanks
>>> Christoph
>>>
>>>
>>>> -----Original Message-----
>>>> From: Langer, Christoph
>>>> Sent: Donnerstag, 21. November 2019 14:19
>>>> To: Alan Bateman <Alan.Bateman at oracle.com>; core-libs-
>>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>> Subject: RE: RFR: 8234185: Cleanup usage of canonicalize function 
>>>> between
>>>> libjava, hotspot and libinstrument
>>>>
>>>> Hi Alan,
>>>>
>>>> thanks for the review. I'll push it then after running through 
>>>> jdk-submit.
>>>>
>>>> /Christoph
>>>>
>>>>> -----Original Message-----
>>>>> From: Alan Bateman <Alan.Bateman at oracle.com>
>>>>> Sent: Donnerstag, 21. November 2019 09:51
>>>>> To: Langer, Christoph <christoph.langer at sap.com>; core-libs-
>>>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: Re: RFR: 8234185: Cleanup usage of canonicalize function 
>>>>> between
>>>>> libjava, hotspot and libinstrument
>>>>>
>>>>> On 14/11/2019 15:37, Langer, Christoph wrote:
>>>>>> Hi,
>>>>>>
>>>>>> please review this cleanup change regarding function 
>>>>>> "canonicalize" of
>>>>> libjava.
>>>>>>
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
>>>>>> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
>>>>>>
>>>>>>
>>>>>> The goal is to cleanup how this function is defined and used. One 
>>>>>> thing is,
>>>>> that there was an unnecessary wrapper function "Canonicalize" in 
>>>>> jni_util.c.
>>>>> It wrapped the call to "canonicalize". We can get rid of this wrapper.
>>>>> Unfortunately, it is not possible to just export "canonicalize" 
>>>>> since this will
>>>>> conflict with a method signature from the math library, at least on 
>>>>> modern
>>>>> Linuxes. So I decided to call the method JDK_Canonicalize and will 
>>>>> correctly
>>>>> define it in jdk_util.h which can be included everywhere.
>>>>>>
>>>>> I think this change is okay. My main concern when initially seeing 
>>>>> this
>>>>> go by was that it would leak the \\?\ or \\?\UNC\ prefix into the
>>>>> canonical File when it wasn't there previously, this would of course
>>>>> have several implications. But I think you have it right and this 
>>>>> is, as
>>>>> you position, just refactoring/cleanup.
>>>>>
>>>>> -Alan

From david.holmes at oracle.com  Sun Nov 24 23:51:44 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 25 Nov 2019 09:51:44 +1000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <9F4A2CCB-532E-4738-8706-EB408048AECA@oracle.com>
References: <VI1PR0201MB24796A75CD45362D594678519A490@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <9F4A2CCB-532E-4738-8706-EB408048AECA@oracle.com>
Message-ID: <cdcff990-a5c0-5123-6308-d47ce5d28974@oracle.com>

On 23/11/2019 3:22 am, Erik Österlund wrote:
> Hi Martin,
> 
> Yeah, sorry it wasn't communicated from the start. When this was discussed internally there were a whole bunch of factors considered. But loom interactions was really a primary concern in the discussions, and I was surprised to find it has yet to be mentioned in this thread!

Loom is not the primary motivation for getting rid of the complex and 
invasive biased-locking code. Loom is a factor to consider as I've 
raised in a number of discussions, but it seems not in this public one - 
apologies on that - removing BL _might_ make it easier for Loom. The 
reality is that we don't know what impact BL has on Loom. BL is only one 
part of the native locking process that Loom currently cannot support. 
We won't know the impact of BL until we try to support native monitors. 
It may be that removing BL whilst retaining the rest of the existing 
native code for ObjectMonitor use makes a huge simplification to the 
effort; or it may be lost in the noise. Or if we remove all native code 
(seems highly unlikely) then whether or not BL was removed earlier is moot.

Biased-locking is a very old optimization for uncontended locking, based 
on a time when there was heavy single-threaded use of synchronized data 
structures, and where actual lock/unlock atomic operations were very 
expensive. It is very complex and highly intrusive code. Every time "we" 
have had to make changes to object monitor support, or safepoint 
support, we have had to deal with the added complexity that 
biased-locking introduced and "we" have asked ourselves many times 
whether "we" can just get rid of this old optimization. And as I posted 
in one of my earliest responses to Aleksey those discussions finally 
reached the point where "we" decided to propose performing this removal. 
But step one is to ascertain the impact so the initial proposal is to 
disable biased-locking by default and deprecate it so that the change is 
much more visible.

David
-----
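
To illustrate why deprecating the flag makes the change "much more
visible" than only flipping the default, here is a minimal sketch. It is
not HotSpot's argument-processing code: parse_flags, FlagResult and the
warning text are hypothetical names chosen for the example. The only
point is that a user who re-enables the feature gets a diagnostic, while
a user who relies on the new default sees nothing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result of flag parsing: the effective setting plus any
// diagnostics that would be printed at startup.
struct FlagResult {
  bool use_biased_locking;
  std::vector<std::string> warnings;
};

// Sketch only: the default is now "off", and any explicit use of the
// deprecated flag (in either direction) produces a visible warning.
FlagResult parse_flags(const std::vector<std::string>& args) {
  FlagResult r{false, {}};  // new default: biased locking disabled
  for (const auto& a : args) {
    if (a == "-XX:+UseBiasedLocking" || a == "-XX:-UseBiasedLocking") {
      r.use_biased_locking = (a[4] == '+');  // character after "-XX:"
      // Illustrative wording, not the VM's actual message.
      r.warnings.push_back(
          "warning: option UseBiasedLocking is deprecated");
    }
  }
  return r;
}
```

With no arguments the sketch returns the new default and no warnings;
passing -XX:+UseBiasedLocking turns the feature back on and records
exactly one warning, which is the feedback channel being argued for.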

> Hope this makes a bit more sense now!
> 
> Thanks,
> /Erik
> 
>> On 22 Nov 2019, at 17:29, Doerr, Martin <martin.doerr at sap.com> wrote:
>>
>> Hi all,
>>
>> the only feasible argument for considering biased locking removal I've read so far is the one from Erik.
>> (Thanks, Erik!)
>>
>> This should have been on the table before proposing deprecation.
>>
>> If the plan is to deprecate it and later obsolete it in the release in which project loom enters main line,
>> I guess we (SAP) could agree with it.
>> It would be really helpful to communicate such plans before proposing deprecation.
>>
>> Best regards,
>> Martin
>>
>>
>>> -----Original Message-----
>>> From: Daniel D. Daugherty <daniel.daugherty at oracle.com>
>>> Sent: Freitag, 22. November 2019 16:06
>>> To: Andrew Dinn <adinn at redhat.com>; David Holmes
>>> <david.holmes at oracle.com>; Doerr, Martin <martin.doerr at sap.com>;
>>> Patricio Chilano <patricio.chilano.mateo at oracle.com>; hotspot-runtime-
>>> dev at openjdk.java.net
>>> Subject: Re: RFR 8231264: Disable biased-locking and deprecate all flags
>>> related to biased-locking
>>>
>>>> On 11/22/19 5:14 AM, Andrew Dinn wrote:
>>>> On 21/11/2019 21:31, David Holmes wrote:
>>>>> On 20/11/2019 2:51 am, Doerr, Martin wrote:
>>>>>> I think deprecating before publishing an evaluation or at least having
>>>>>> a discussion is not appropriate.
>>>>> Deprecation shows the intent that we (eventually) want to remove this
>>>>> and that people should try to avoid using it. If we don't actually
>>>>> deprecate it but just turn it off, then here is a likely scenario:
>>>> Who is this we?
>>>
>>> The way you edited the original thread makes it look like the "we" comes
>>> out of nowhere. Okay. Not sure why you did that, but here's the complete
>>> context:
>>>
>>> I posted this comment:
>>>
>>>> On 11/18/19 6:06 PM, Daniel D. Daugherty wrote:
>>>> As for the whole "too soon to deprecate" discussion: Deprecation is not
>>>> making the code obsolete so this changeset is not taking anything away
>>>> other than changing the default of UseBiasedLocking from true to false.
>>>> There are things that have been deprecated since JDK8 and they still
>>>> have not yet been made obsolete.
>>>>
>>>> Deprecating biased locking is the proper way of saying that we (Oracle)
>>>> and/or others think that biased locking should/will go away in a future
>>>> release. Yes, there are locking experts outside of Oracle that have said
>>>> that biased locking should go away, but I haven't gotten permission to
>>>> quote the folks (yet)...
>>>>
>>>> Deprecation is not final. Features can be un-deprecated if some
>>>> relevant facts and/or info changes the previous conclusion.
>>>
>>> Martin posted this reply to my comment:
>>>> On 11/19/19 11:51 AM, Doerr, Martin wrote:
>>>>> Hi Dan,
>>>>>
>>>>>> As for the whole "too soon to deprecate" discussion: Deprecation is not
>>>>>> making the code obsolete so this changeset is not taking anything away
>>>>>> other than changing the default of UseBiasedLocking from true to false.
>>>>>> There are things that have been deprecated since JDK8 and they still
>>>>>> have not yet been made obsolete.
>>>>> I think deprecating before publishing an evaluation or at least having a
>>> discussion is not appropriate.
>>>>>
>>>>>> Deprecating biased locking is the proper way of saying that we (Oracle)
>>>>>> and/or others think that biased locking should/will go away in a future
>>>>>> release. Yes, there are locking experts outside of Oracle that have said
>>>>>> that biased locking should go away, but I haven't gotten permission to
>>>>>> quote the folks (yet)...
>>>>> There should be consent on the direction of possibly removing it before
>>> communicating it the hard way.
>>>>> However, switching it off for evaluation sounds feasible to me.
>>>>> Seems like we have some homework, too.
>>>
>>> And David posted a reply to Martin's comment:
>>>
>>>> On 11/21/19 4:31 PM, David Holmes wrote:
>>>>> On 20/11/2019 2:51 am, Doerr, Martin wrote:
>>>>>> Hi Dan,
>>>>>>
>>>>>>> As for the whole "too soon to deprecate" discussion: Deprecation is
>>>>>>> not
>>>>>>> making the code obsolete so this changeset is not taking anything away
>>>>>>> other than changing the default of UseBiasedLocking from true to
>>>>>>> false.
>>>>>>> There are things that have been deprecated since JDK8 and they still
>>>>>>> have not yet been made obsolete.
>>>>>>
>>>>>> I think deprecating before publishing an evaluation or at least
>>>>>> having a discussion is not appropriate.
>>>>>
>>>>> Deprecation shows the intent that we (eventually) want to remove this
>>>>> and that people should try to avoid using it. If we don't actually
>>>>> deprecate it but just turn it off, then here is a likely scenario:
>>>>>
>>>>> - we turn off BL in 14
>>>>> - customer updates to 14 sees a performance issue, checks the release
>>>>> notes, sees BL is disabled and turns it back on.
>>>>> - customer continues on their merry way and feels no need to report
>>>>> back to OpenJDK that they need BL (even if we ask them to via release
>>>>> notes)
>>>>> - we get no feedback that BL is still useful and so we deprecate it
>>>>> in, say, 16
>>>>> - customer updates to 16 and gets the deprecation warning and then
>>>>> reports back that they need BL
>>>>>
>>>>> Alternatively we deprecate in 14 and customer lets us know straight
>>>>> away that it is still useful.
>>>
>>> So now David's use of "we" should be more clear. I do have to point out
>>> that my use of "we (Oracle)" was present in both Martin's reply and in
>>> David's reply to Martin, but for some reason you chose to edit it out.
>>> This makes your pushing back on David's use of an unqualified "we"
>>> questionable. Are you trying to be intentionally confrontational?
>>>
>>>
>>>> The premise of your scenario has built in the conclusion
>>>> that some of /us/ are questioning and thereby excluded our critique from
>>>> any chance of qualifying the proposed action.
>>>
>>> Hmmm... I don't see anyone excluding critiques here, but maybe I've missed
>>> something...
>>>
>>>
>>>> If the existence of such a consensus is not clear (and I suggest that
>>>> this thread makes that plain) and the evidence for arriving at such a
>>>> consensus is not compelling (ditto) and if the rest of the scenario will
>>>> likely play out as you suggest then that is a strong reason to
>>>> re-address the decision to switch the feature off, whether or not it is
>>>> deprecated at the same time.
>>>
>>> I suspect that "compelling" is in the eye of the beholder.
>>>
>>> Simply changing the default from true to false is pretty much a silent
>>> change in behavior even if we put out a release note. By deprecating at
>>> the same time, we'll have a visible diagnostic message if biased locking
>>> is enabled. That's much more likely to lead to feedback than a silent
>>> change in behavior.
>>>
>>>
>>>>> Alternatively we deprecate in 14 and customer lets us know straight away
>>>>> that it is still useful.
>>>> Alternatively, we come up with better evidence that it needs switching
>>>> off (and, possibly, deprecating).
>>>
>>> I wonder what would be considered acceptable "better evidence".
>>>
>>> Dan
>>>
>>>>
>>>> regards,
>>>>
>>>>
>>>> Andrew Dinn
>>>> -----------
>>>> Senior Principal Software Engineer
>>>> Red Hat UK Ltd
>>>> Registered in England and Wales under Company Registration No. 03798903
>>>> Directors: Michael Cunningham, Michael ("Mike") O'Neill
>>>>
>>>>
>>
> 

From david.holmes at oracle.com  Mon Nov 25 00:02:13 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 25 Nov 2019 10:02:13 +1000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <cf42ad0e-ed3c-2e25-9381-2edd13a0af73@oracle.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <6209301e-ae85-2a91-7d9e-c9096581365d@oracle.com>
 <AM6PR02MB48010AA9F7B5B16B24058B288A4E0@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <AM6PR02MB4801BE8770BB5A5FF05EAE248A490@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <21f83181-eea8-7b7f-9f5f-5f1a26413154@oracle.com>
 <5b46607d-c84b-086d-6241-cf2eee95d0a6@oracle.com>
 <cf42ad0e-ed3c-2e25-9381-2edd13a0af73@oracle.com>
Message-ID: <1d4ed7c6-d626-053a-e077-da284a078082@oracle.com>


On 25/11/2019 8:45 am, David Holmes wrote:
> On 25/11/2019 7:49 am, David Holmes wrote:
>> On 25/11/2019 7:33 am, David Holmes wrote:
>>> Hi Christoph,
>>>
>>> On 23/11/2019 12:04 am, Langer, Christoph wrote:
>>>> Hi,
>>>>
>>>> I'd like to push this change. However, running it through jdk-submit 
>>>> shows reproducible errors:
>>>>
>>>> Job: mach5-one-clanger-JDK-8234185-1-20191122-0927-6913189
>>>> BuildId: 2019-11-22-0926373.christoph.langer.source
>>>> No failed tests
>>>> Tasks Summary
>>>>     NA: 0
>>>>     NOTHING_TO_RUN: 0
>>>>     KILLED: 0
>>>>     PASSED: 76
>>>>     UNABLE_TO_RUN: 0
>>>>     EXECUTED_WITH_FAILURE: 1
>>>>     FAILED: 0
>>>>     HARNESS_ERROR: 0
>>>> Build
>>>> 1 Executed with failure
>>>> o   windows-x64-install-windows-x64-build-19 error while building, 
>>>> return value: 2
>>>>
>>>>
>>>> Job: mach5-one-clanger-JDK-8234185-20191121-2313-6898791
>>>> BuildId: 2019-11-21-2311357.christoph.langer.source
>>>> No failed tests
>>>> Tasks Summary
>>>>     NA: 0
>>>>     NOTHING_TO_RUN: 0
>>>>     KILLED: 0
>>>>     PASSED: 76
>>>>     UNABLE_TO_RUN: 0
>>>>     EXECUTED_WITH_FAILURE: 1
>>>>     FAILED: 0
>>>>     HARNESS_ERROR: 0
>>>> Build
>>>> 1 Executed with failure
>>>> o   windows-x64-install-windows-x64-build-19 error while building, 
>>>> return value: 2
>>>>
>>>>
>>>> David already had a look and let me know that the following was the 
>>>> reason:
>>>>
>>>> t:/workspace/open/src/java.base/windows/native/libjava/canonicalize_md.c(41): 
>>>> fatal error C1083: Cannot open include file: 'jdk_util.h': No such 
>>>> file or directory
>>>>
>>>> I cannot explain this, as the change runs through my local build 
>>>> and our nightly builds without problems. I also can't explain why 
>>>> jdk_util.h can't be opened at this place - it should be there and 
>>>> part of the include directories...
>>>>
>>>> I'd appreciate any help...
>>>
>>> I just dug a little deeper and this is failing in part of our closed 
>>> build for the install repo. There is a library there that is using 
>>> canonicalize_md.c directly - i.e. it adds that file to its source 
>>> files list. The build instructions don't include that directory on 
>>> the include directory list - hence the failure. But it will also fail 
>>> due to the name change you made.
>>
>> Actually it appears that the other source code doesn't actually refer 
>> to the canonicalize function at all, so a simple fix may be possible 
>> at the build level on our side. I'm testing that now.
> 
> It isn't the canonicalize function that is used, it is getPrefixed, 
> which has now been moved to the io_util_md.c file. So a fix will be a 
> bit more involved.

I tried adding io_util_md.c to the library source list instead of 
canonicalize_md.c but that just caused a slew of other compilation 
failures, so I don't see any quick fix for us here.

David

> 
> David
> 
>>
>> David
>> -----
>>
>>> Someone will need to work with you to make the necessary changes to 
>>> our code.
>>>
>>> David
>>>
>>>> Thanks
>>>> Christoph
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Langer, Christoph
>>>>> Sent: Donnerstag, 21. November 2019 14:19
>>>>> To: Alan Bateman <Alan.Bateman at oracle.com>; core-libs-
>>>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: RE: RFR: 8234185: Cleanup usage of canonicalize function 
>>>>> between
>>>>> libjava, hotspot and libinstrument
>>>>>
>>>>> Hi Alan,
>>>>>
>>>>> thanks for the review. I'll push it then after running through 
>>>>> jdk-submit.
>>>>>
>>>>> /Christoph
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Alan Bateman <Alan.Bateman at oracle.com>
>>>>>> Sent: Donnerstag, 21. November 2019 09:51
>>>>>> To: Langer, Christoph <christoph.langer at sap.com>; core-libs-
>>>>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>>>> Subject: Re: RFR: 8234185: Cleanup usage of canonicalize function 
>>>>>> between
>>>>>> libjava, hotspot and libinstrument
>>>>>>
>>>>>> On 14/11/2019 15:37, Langer, Christoph wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> please review this cleanup change regarding function 
>>>>>>> "canonicalize" of
>>>>>> libjava.
>>>>>>>
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
>>>>>>> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
>>>>>>>
>>>>>>>
>>>>>>> The goal is to cleanup how this function is defined and used. One 
>>>>>>> thing is,
>>>>>> that there was an unnecessary wrapper function "Canonicalize" in 
>>>>>> jni_util.c.
>>>>>> It wrapped the call to "canonicalize". We can get rid of this 
>>>>>> wrapper.
>>>>>> Unfortunately, it is not possible to just export "canonicalize" 
>>>>>> since this will
>>>>>> conflict with a method signature from the math library, at least 
>>>>>> on modern
>>>>>> Linuxes. So I decided to call the method JDK_Canonicalize and will 
>>>>>> correctly
>>>>>> define it in jdk_util.h which can be included everywhere.
>>>>>>>
>>>>>> I think this change is okay. My main concern when initially seeing 
>>>>>> this
>>>>>> go by was that it would leak the \\?\ or \\?\UNC\ prefix into the
>>>>>> canonical File when it wasn't there previously, this would of course
>>>>>> have several implications. But I think you have it right and this 
>>>>>> is, as
>>>>>> you position, just refactoring/cleanup.
>>>>>>
>>>>>> -Alan

From david.holmes at oracle.com  Mon Nov 25 05:36:22 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 25 Nov 2019 15:36:22 +1000
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
Message-ID: <f4d66c80-3da3-0d80-fad2-e79ae3380bf3@oracle.com>

Hi Dan,

I support your analysis regarding JVM TI StopThread. It's very hard via 
code inspection to be 100% certain but I think Robbin's change will 
install the async-exception in the current thread in the context of the 
StopThread call, resulting in the CautiouslyPreserveExceptionMark 
asserting. Unfortunately JVM TI StopThread doesn't special-case the 
current thread the way JVM_StopThread does.

Your observations about the WatcherThread change in behaviour are also 
spot on. Potentially at least, forcing the WatcherThread to wait for the 
safepoint to be executed could interfere with executing other periodic 
tasks. By default the WatcherThread won't be executing this code as the 
BiasedLockingStartupDelay is zero. But potentially, if anyone has that 
delay enabled, this could cause an observable change in behaviour in 
relation to other PeriodicTasks.

Perhaps the ability to execute an async-safepoint VM operation needs to 
remain, for simplicity (compared to working around the issues).

David
-----
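
The StopThread concern above can be modelled in a few lines. This is a
heavily simplified sketch, not the HotSpot sources: ModelThread,
CautiousMarkModel and stop_self_trips_check are hypothetical names, the
pending exception is a plain string, and the destructor records the
condition in a flag where the real code asserts. It assumes, as
suspected above, that a synchronous stop aimed at the calling thread
installs the exception before the guard goes out of scope.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a thread's pending-exception slot.
struct ModelThread {
  std::string pending;  // empty means "no pending exception"
  bool has_pending_exception() const { return !pending.empty(); }
};

// Model of a "cautious" preserve-exception guard: it saves and clears
// any pending exception on entry, and on exit expects that nothing new
// was raised before restoring the saved one.
class CautiousMarkModel {
 public:
  CautiousMarkModel(ModelThread& t, bool& unexpected)
      : _thread(t), _saved(t.pending), _unexpected(unexpected) {
    _thread.pending.clear();
  }
  ~CautiousMarkModel() {
    // The real guard asserts here; the model only records the result.
    _unexpected = _thread.has_pending_exception();
    _thread.pending = _saved;
  }

 private:
  ModelThread& _thread;
  std::string _saved;
  bool& _unexpected;
};

// Models a synchronous stop request whose target is the caller itself.
bool stop_self_trips_check() {
  ModelThread current;
  bool unexpected = false;
  {
    CautiousMarkModel mark(current, unexpected);
    current.pending = "ThreadDeath";  // installed before the call returns
  }
  return unexpected;
}
```

In this model stop_self_trips_check() returns true, which corresponds to
the "unexpected exception generated" assertion firing. With an
asynchronous operation the exception would be installed after the guard
has been destroyed, so the check would pass.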

On 23/11/2019 7:50 am, Daniel D. Daugherty wrote:
> Hi Robbin,
> 
> Sorry I'm late to this review thread...
> 
> I'm adding Serguei to this email thread since I'm making comments
> about the JVM/TI parts of this changeset...
> 
> 
>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html 
> 
> 
> src/hotspot/share/runtime/vmOperations.hpp
>     No comments.
> 
> src/hotspot/share/runtime/vmOperations.cpp
>     No comments.
> 
> src/hotspot/share/runtime/vmThread.hpp
>     L148:   // The ever running loop for the VMThread
>     L149:   void loop();
>     L150:   static void check_cleanup();
>         nit - Feels like an odd place to add check_cleanup().
> 
>         Update: Now that I've seen what check_cleanup() does, it needs a
>         better name. Perhaps check_for_forced_cleanup()? And since
>         it is supposed to affect the running loop for the VMThread
>         I'm okay with its location now.
> 
> src/hotspot/share/runtime/vmThread.cpp
>     L382:   event->set_blocking(true);
>         Probably have to keep the 'blocking' attribute in the event
>         for backward compatibility in the JFR record format?
> 
>     L478:         // wait with a timeout to guarantee safepoints at regular intervals
>         Is this comment true anymore (even before this changeset)?
>         Adding this on the next line might help:
> 
>                   // (if there is cleanup work to do)
> 
>         since I _think_ that's how the policy has evolved...
> 
>     L479:         mu_queue.wait(GuaranteedSafepointInterval);
>         Please prefix with "(void)" to make it clear you are
>         intentionally ignoring the return value.
> 
>     old L627-634 (We want to make sure that we get to a safepoint regularly)
>         I think this now old code is covered by your change above:
> 
>         L488:         // If the queue contains a safepoint VM op,
>         L489:         // clean up will be done so we can skip this part.
>         L490:         if (!_vm_queue->peek_at_safepoint_priority()) {
> 
>         Please confirm that our thinking is the same here.
> 
>     L661:     int ticket =  t->vm_operation_ticket();
>         nit - extra space after '='
> 
>     Okay. Definitely simpler code.
> 
> src/hotspot/share/runtime/handshake.cpp
>     No comments.
> 
> src/hotspot/share/runtime/safepoint.hpp
>     No comments.
> 
> src/hotspot/share/runtime/safepoint.cpp
>     Definitely got my attention with
>     ObjectSynchronizer::needs_monitor_scavenge().
> 
> src/hotspot/share/runtime/synchronizer.hpp
>     No comments.
> 
> src/hotspot/share/runtime/synchronizer.cpp
>     L921:     log_info(monitorinflation)("Monitor scavenge needed, triggering safepoint cleanup.");
>         Thanks for adding the logging line.
> 
>         Update: As Kim pointed out, this code goes away when
>         MonitorBound is made obsolete (JDK-8230940). I'm looking
>         forward to making that change.
> 
>     L1003:   if (Atomic::load(&_forceMonitorScavenge) == 0 && Atomic::xchg (1, &_forceMonitorScavenge) == 0) {
>         nit - extra space between 'xchg ('
> 
>         Since InduceScavenge() is only called when the deprecated
>         MonitorBound is specified, I think you could use cmpxchg()
>         for clarity. Of course, you might be thinking that the
>         pattern is a useful example for other folks to copy...
> 
> src/hotspot/share/runtime/thread.cpp
>     old L527: // Enqueue a VM_Operation to do the job for us - sometime later
>     L527: void Thread::send_async_exception(oop java_thread, oop java_throwable) {
>     L528:   VM_ThreadStop vm_stop(java_thread, java_throwable);
>     L529:   VMThread::execute(&vm_stop);
>     L530: }
>         Okay so you deleted the comment about the call being async and the
>         VM op is no longer async, but does that break the expectation of
>         any callers?
> 
>         Off the top of my head, I can't think of a way for a caller of
>         Thread::send_async_exception() to determine that the call is now
>         synchronous instead of asynchronous, but ...
> 
>         Update: Just took a look at JvmtiEnv::StopThread() which calls
>         Thread::send_async_exception(). If JVM/TI StopThread() is being
>         used to throw an exception at the calling thread, I suspect that
>         in the baseline, the call would always return JVMTI_ERROR_NONE.
>         With the exception throwing now being synchronous, would that
>         affect the return value of the JVM/TI StopThread() call?
> 
>         Looks like the JVM/TI wrapper (see gensrc/jvmtifiles/jvmtiEnter.cpp
>         in the build directory) uses ThreadInVMfromNative so the calling
>         thread is in VM when it requests the now synchronous VM operation.
>         When it requests the VM op, the calling thread will block which
>         should allow the VM thread to execute the op. No worries there so
>         far...
> 
>         It looks like the code also uses CautiouslyPreserveExceptionMark
>         so I think if the exception is delivered to the calling thread
>         it won't affect the return from jvmti_env->StopThread(), i.e., we
>         will have our return value. The CautiouslyPreserveExceptionMark
>         destructor won't kick in until we return from jvmti_StopThread()
>         (the JVM/TI wrapper from the build).
> 
>         However, that might cause this assertion to fire:
> 
>         src/hotspot/share/utilities/preserveException.cpp:
>         assert(!_thread->has_pending_exception(), "unexpected exception generated");
> 
>         because it is now detecting that an exception was thrown
>         while executing a JVM/TI call. This is pure theory here.
> 
> src/hotspot/share/jfr/leakprofiler/utilities/vmOperation.hpp
>     No comments.
> 
> src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp
>     No comments.
> 
> src/hotspot/share/runtime/biasedLocking.cpp
>     old L85:     // Use async VM operation to avoid blocking the Watcher thread.
>         Again, you've deleted the comment, but are there going to
>         be any unexpected side effects from the change? Looks like
>         the work consists of:
> 
>         L70: ClassLoaderDataGraph::dictionary_classes_do(enable_biased_locking);
> 
>         Is that going to be a problem for the WatcherThread?
> 
> test/hotspot/gtest/threadHelper.inline.hpp
>     No comments.
> 
> As David H. likes to say: the proof is in the building and testing.
> 
> Thumbs up on the overall idea and implementation. There might be an
> issue lurking there in JVM/TI StopThread(), but that's just a theory
> on my part...
> 
> Dan
> 
> 
> 
> On 11/22/19 9:39 AM, Robbin Ehn wrote:
>> Hi David,
>>
>> On 11/22/19 7:13 AM, David Holmes wrote:
>>> Hi Robbin,
>>>
>>> On 21/11/2019 9:50 pm, Robbin Ehn wrote:
>>>> Hi,
>>>>
>>>> Here is v3:
>>>>
>>>> Full:
>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
>>>
>>> src/hotspot/share/runtime/synchronizer.cpp
>>>
>>> Looking at the highly discussed:
>>>
>>> if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, 
>>> &ForceMonitorScavenge) == 0) {
>>>
>>> why isn't that just:
>>>
>>> if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {
>>>
>>> ??
>>
>> I assumed someone had seen contention on ForceMonitorScavenge.
>> Many threads can enter and re-enter here.
>> I don't know if that's still the case.
>>
>> Since we only hit this path when the deprecated MonitorBound is set, 
>> I think I can change it?
>>
>>>
>>> Also while we are here can we clean this up further:
>>>
>>> static volatile int ForceMonitorScavenge = 0;
>>>
>>> becomes
>>>
>>> static int _forceMonitorScavenge = 0;
>>>
>>> so the variable doesn't look like it came from globals.hpp :)
>>>
>>
>> Sure!
>>
>>> Just to be clear, I understand the changes around monitor scavenging 
>>> now, though I'm not sure getting rid of async VM ops and replacing 
>>> with a new way to directly wakeup the VMThread really amounts to a 
>>> simplification.
>>>
>>> ---
>>>
>>> src/hotspot/share/runtime/vmOperations.hpp
>>>
>>> I still think getting rid of Mode altogether would be a good 
>>> simplification. :)
>>
>> Sure!
>>
>> Here is v4, inc:
>> http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
>> Full:
>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html
>>
>> Tested t1-3
>>
>> Thanks, Robbin
>>
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>
>>>> Inc:
>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
>>>>
>>>> Tested t1-3
>>>>
>>>> Thanks, Robbin
>>>>
>>>> On 2019-11-19 12:05, Robbin Ehn wrote:
>>>>> Hi all, please review.
>>>>>
>>>>> CMS was the last real user of the more advantage features of VM 
>>>>> operation.
>>>>> VM operation can be simplified to always be an stack object and 
>>>>> thus either be
>>>>> of safepoint or no safepoint type.
>>>>>
>>>>> VM_EnableBiasedLocking is executed once by watcher thread, if 
>>>>> needed (default not used). Making it synchrone doesn't matter.
>>>>> VM_ThreadStop is executed by a JavaThread, that thread should stop 
>>>>> for the safepoint anyways, no real point in not stopping direct.
>>>>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, 
>>>>> the VM op is not needed. Arguably this thread should actually stop 
>>>>> here, since we are about to safepoint.
>>>>>
>>>>> There is also a small cleanup in vmThread.cpp where an unused 
>>>>> method is removed.
>>>>> And the extra safepoint is removed:
>>>>> "// We want to make sure that we get to a safepoint regularly"
>>>>> No we don't :)
>>>>>
>>>>> Issue:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>>>> Change-set:
>>>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>>>
>>>>> Tested scavenge manually, passes t1-2.
>>>>>
>>>>> Thanks, Robbin
> 

From david.holmes at oracle.com  Mon Nov 25 06:46:10 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 25 Nov 2019 16:46:10 +1000
Subject: RFR 8234613: JavaThread can escape back to Java from an ongoing
 handshake
In-Reply-To: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
References: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
Message-ID: <61034f63-0843-a531-4da6-fe4064cdb357@oracle.com>

Hi Patricio,

On 23/11/2019 4:25 am, Patricio Chilano wrote:
> Hi,
> 
> This patch aims to address a current bug where, given the right 
> combination of handshakes and external suspend/resume, a JavaThread can 
> transition from a safe state back to Java without blocking for a 
> still-in-progress handshake. In the description of the bug I added an 
> example, tracing the state changes of the JavaThread as it goes through 
> the different transitions until it escapes the handshake. Currently, the 
> window of time for this issue to happen is so small that we do not see 
> actual failures running tests. Running test SuspendAtExit.java and 
> adding some small delay before restoring the JavaThread state in 
> java_suspend_self_with_safepoint_check() can demonstrate the issue.

Good catch. This highlights how difficult it is to see where all the 
thread-state-transitions are and reason about what can and can't happen 
in a given sequence of code.

> The proposed fix is to check again if we have a pending/in-progress 
> handshake operation after executing ~ThreadInVMForHandshake().

Minor nit: given that we only end up calling process_self_inner after it 
has been determined that the current _handshake has_operation(), shouldn't 
the while loop really be a do-while loop?
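
A minimal sketch of the nit above. FakeHandshake and
process_self_inner_sketch are hypothetical stand-ins, not the HotSpot code;
the sketch only assumes that the caller has already checked
has_operation() before calling in:

```cpp
#include <cassert>

// Hypothetical stand-in for the per-thread handshake state (not HotSpot code).
struct FakeHandshake {
  int pending;  // number of operations still waiting to be processed
  bool has_operation() const { return pending > 0; }
  void process_one() { --pending; }
};

// The caller has already established has_operation(), so the first test of a
// plain while loop would be redundant. A do-while runs the body once and only
// then re-checks whether another operation has shown up in the meantime.
int process_self_inner_sketch(FakeHandshake& hs) {
  assert(hs.has_operation());  // precondition guaranteed by the caller
  int processed = 0;
  do {
    hs.process_one();
    ++processed;
  } while (hs.has_operation());
  return processed;
}
```

Calling it on a FakeHandshake with three pending operations processes all
three and leaves has_operation() false.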

Thanks,
David
-----

> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows and 
> Solaris).
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
> Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/
> 
> Thanks,
> Patricio

From robbin.ehn at oracle.com  Mon Nov 25 08:06:50 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 25 Nov 2019 09:06:50 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <f4d66c80-3da3-0d80-fad2-e79ae3380bf3@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
 <f4d66c80-3da3-0d80-fad2-e79ae3380bf3@oracle.com>
Message-ID: <a77c1a73-af3d-daa4-4f0b-11d886e0909f@oracle.com>

Hi,

Starting with this email due to Thanksgiving.

On 2019-11-25 06:36, David Holmes wrote:
> Hi Dan,
> 
> I support your analysis regarding JVM TI StopThread. It's very hard via code 
> inspection to be 100% certain but I think Robbin's change will install the 
> async-exception in the current thread in the context of the StopThread call, 
> resulting in the CautiouslyPreserveExceptionMark asserting. Unfortunately JVM TI 
> StopThread doesn't special-case the current thread the way JVM_StopThread does.

It passes the jvmti/jdi tests.
Internally, the async exception is only delivered to a thread when it
transitions to Java, via the suspend flags. In the VM->native case it should
not be delivered.
I'll investigate and have a look.

I find it very unsettling that JVM TI StopThread is not deprecated.
It has exactly the same flaws as Thread.stop().
This means that even if we remove Thread.stop(), the VM still needs to support
this flawed stopping ability...

> 
> Your observations about the WatcherThread change in behaviour are also spot on. 
> Potentially at least, forcing the WatcherThread to wait for the safepoint to be 
> executed could interfere with executing other periodic tasks. By default the 
> WatcherThread won't be executing this code as the BiasedLockingStartupDelay is 
> zero. But potentially, if anyone has that delay enabled, this could cause an 
> observable change in behaviour in relation to other PeriodicTasks.

In the RFR I had this comment:
VM_EnableBiasedLocking is executed once by watcher thread, if needed (default
not used). Making it synchronous doesn't matter.

A periodic task has a minimum resolution of 10ms, while the safepoint for 
enabling biased locking takes <1ms under normal circumstances. On an
over-provisioned machine we see longer safepoints, but we also see scheduler
delays of up to 35-40ms.

I deemed it very unlikely that anyone could notice the difference.

> 
> Perhaps the ability to execute an async-safepoint VM operation needs to remain, 
> for simplicity (compared to working around the issues).

I'm hoping not :(

/Robbin

> 
> David
> -----
> 
> On 23/11/2019 7:50 am, Daniel D. Daugherty wrote:
>> Hi Robbin,
>>
>> Sorry I'm late to this review thread...
>>
>> I'm adding Serguei to this email thread since I'm making comments
>> about the JVM/TI parts of this changeset...
>>
>>
>>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html 
>>
>>
>> src/hotspot/share/runtime/vmOperations.hpp
>>      No comments.
>>
>> src/hotspot/share/runtime/vmOperations.cpp
>>      No comments.
>>
>> src/hotspot/share/runtime/vmThread.hpp
>>      L148:   // The ever running loop for the VMThread
>>      L149:   void loop();
>>      L150:   static void check_cleanup();
>>          nit - Feels like an odd place to add check_cleanup().
>>
>>          Update: Now that I've seen what clean_up(), it needs a
>>          better name. Perhaps check_for_forced_cleanup()? And since
>>          it is supposed to affect the running loop for the VMThread
>>          I'm okay with its location now.
>>
>> src/hotspot/share/runtime/vmThread.cpp
>>      L382:   event->set_blocking(true);
>>          Probably have to keep the 'blocking' attribute in the event
>>          for backward compatibility in the JFR record format?
>>
>>      L478:         // wait with a timeout to guarantee safepoints at regular intervals
>>          Is this comment true anymore (even before this changeset)?
>>          Adding this on the next line might help:
>>
>>                    // (if there is cleanup work to do)
>>
>>          since I _think_ that's how the policy has been evolved...
>>
>>      L479:         mu_queue.wait(GuaranteedSafepointInterval);
>>          Please prefix with "(void)" to make it clear you are
>>          intentionally ignoring the return value.
>>
>>      old L627-634 (We want to make sure that we get to a safepoint regularly)
>>          I think this now old code is covered by your change above:
>>
>>          L488:         // If the queue contains a safepoint VM op,
>>          L489:         // clean up will be done so we can skip this part.
>>          L490:         if (!_vm_queue->peek_at_safepoint_priority()) {
>>
>>          Please confirm that our thinking is the same here.
>>
>>      L661:     int ticket =  t->vm_operation_ticket();
>>          nit - extra space after '='
>>
>>      Okay. Definitely simpler code.
>>
>> src/hotspot/share/runtime/handshake.cpp
>>      No comments.
>>
>> src/hotspot/share/runtime/safepoint.hpp
>>      No comments.
>>
>> src/hotspot/share/runtime/safepoint.cpp
>>      Definitely got my attention with
>>      ObjectSynchronizer::needs_monitor_scavenge().
>>
>> src/hotspot/share/runtime/synchronizer.hpp
>>      No comments.
>>
>> src/hotspot/share/runtime/synchronizer.cpp
>>      L921:     log_info(monitorinflation)("Monitor scavenge needed, triggering safepoint cleanup.");
>>          Thanks for adding the logging line.
>>
>>          Update: As Kim pointed out, this code goes away when
>>          MonitorBound is made obsolete (JDK-8230940). I'm looking
>>          forward to making that change.
>>
>>      L1003:   if (Atomic::load(&_forceMonitorScavenge) == 0 && Atomic::xchg (1, &_forceMonitorScavenge) == 0) {
>>          nit - extra space between 'xchg ('
>>
>>          Since InduceScavenge() is only called when the deprecated
>>          MonitorBound is specified, I think you could use cmpxchg()
>>          for clarity. Of course, you might be thinking that the
>>          pattern is a useful example for other folks to copy...
>>
>> src/hotspot/share/runtime/thread.cpp
>>      old L527: // Enqueue a VM_Operation to do the job for us - sometime later
>>      L527: void Thread::send_async_exception(oop java_thread, oop java_throwable) {
>>      L528:   VM_ThreadStop vm_stop(java_thread, java_throwable);
>>      L529:   VMThread::execute(&vm_stop);
>>      L530: }
>>          Okay so you deleted the comment about the call being async and the
>>          VM op is no longer async, but does that break the expectation of
>>          any callers?
>>
>>          Off the top of head, I can't think of a way for a caller of
>>          Thread::send_async_exception() to determine that the call is now
>>          synchronous instead of asynchronous, but ...
>>
>>          Update: Just took a look at JvmtiEnv::StopThread() which calls
>>          Thread::send_async_exception(). If JVM/TI StopThread() is being
>>          used to throw an exception at the calling thread, I suspect that
>>          in the baseline, the call would always return JVMTI_ERROR_NONE.
>>          With the exception throwing now being synchronous, would that
>>          affect the return value of the JVM/TI StopThread() call?
>>
>>          Looks like the JVM/TI wrapper (see gensrc/jvmtifiles/jvmtiEnter.cpp
>>          in the build directory) uses ThreadInVMfromNative so the calling
>>          thread is in VM when it requests the now synchronous VM operation.
>>          When it requests the VM op, the calling thread will block which
>>          should allow the VM thread to execute the op. No worries there so
>>          far...
>>
>>          It looks like the code also uses CautiouslyPreserveExceptionMark
>>          so I think if the exception is delivered to the calling thread
>>          it won't affect the return from jvmti_env->StopThread(), i.e., we
>>          will have our return value. The CautiouslyPreserveExceptionMark
>>          destructor won't kick in until we return from jvmti_StopThread()
>>          (the JVM/TI wrapper from the build).
>>
>>          However, that might cause this assertion to fire:
>>
>>          src/hotspot/share/utilities/preserveException.cpp:
>>          assert(!_thread->has_pending_exception(), "unexpected exception generated");
>>
>>          because it is now detecting that an exception was thrown
>>          while executing a JVM/TI call. This is pure theory here.
>>
>> src/hotspot/share/jfr/leakprofiler/utilities/vmOperation.hpp
>>      No comments.
>>
>> src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp
>>      No comments.
>>
>> src/hotspot/share/runtime/biasedLocking.cpp
>>      old L85:     // Use async VM operation to avoid blocking the Watcher thread.
>>          Again, you've deleted the comment, but is there going to
>>          be any unexpected side effects from the change? Looks like
>>          the work consists of:
>>
>>          L70: ClassLoaderDataGraph::dictionary_classes_do(enable_biased_locking);
>>
>>          Is that going to be a problem for the WatcherThread?
>>
>> test/hotspot/gtest/threadHelper.inline.hpp
>>      No comments.
>>
>> As David H. likes to say: the proof is in the building and testing.
>>
>> Thumbs up on the overall idea and implementation. There might be an
>> issue lurking there in JVM/TI StopThread(), but that's just a theory
>> on my part...
>>
>> Dan
>>

From robbin.ehn at oracle.com  Mon Nov 25 08:30:31 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 25 Nov 2019 09:30:31 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
Message-ID: <d1a0f624-3277-e3f7-8c30-abcc6471750c@oracle.com>

Hi Dan,

On 2019-11-22 22:50, Daniel D. Daugherty wrote:
> Hi Robbin,
> 
> Sorry I'm late to this review thread...

No problem.

> 
> I'm adding Serguei to this email thread since I'm making comments
> about the JVM/TI parts of this changeset...
> 

Good, thanks!

> 
>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html 
> src/hotspot/share/runtime/vmThread.hpp
>      L148:   // The ever running loop for the VMThread
>      L149:   void loop();
>      L150:   static void check_cleanup();
>          nit - Feels like an odd place to add check_cleanup().
> 
>          Update: Now that I've seen what clean_up(), it needs a
>          better name. Perhaps check_for_forced_cleanup()? And since
>          it is supposed to affect the running loop for the VMThread
>          I'm okay with its location now.
> 

Fixed.

> src/hotspot/share/runtime/vmThread.cpp
>      L382:   event->set_blocking(true);
>          Probably have to keep the 'blocking' attribute in the event
>          for backward compatibility in the JFR record format?
> 
>      L478:         // wait with a timeout to guarantee safepoints at regular intervals
>          Is this comment true anymore (even before this changeset)?
>          Adding this on the next line might help:
> 
>                    // (if there is cleanup work to do)

Fixed.

> 
>          since I _think_ that's how the policy has been evolved...
> 
>      L479:         mu_queue.wait(GuaranteedSafepointInterval);
>          Please prefix with "(void)" to make it clear you are
>          intentionally ignoring the return value.
> 
>      old L627-634 (We want to make sure that we get to a safepoint regularly)
>          I think this now old code is covered by your change above:
> 
>          L488:         // If the queue contains a safepoint VM op,
>          L489:         // clean up will be done so we can skip this part.
>          L490:         if (!_vm_queue->peek_at_safepoint_priority()) {
> 
>          Please confirm that our thinking is the same here.

Yes, you are correct.

> 
>      L661:     int ticket =  t->vm_operation_ticket();
>          nit - extra space after '='
> 
>      Okay. Definitely simpler code.

Fixed, great.

> src/hotspot/share/runtime/synchronizer.cpp
>      L921:     log_info(monitorinflation)("Monitor scavenge needed, triggering safepoint cleanup.");
>          Thanks for adding the logging line.
> 
>          Update: As Kim pointed out, this code goes away when
>          MonitorBound is made obsolete (JDK-8230940). I'm looking
>          forward to making that change.

Yes!

> 
>      L1003:   if (Atomic::load(&_forceMonitorScavenge) == 0 && Atomic::xchg (1, &_forceMonitorScavenge) == 0) {
>          nit - extra space between 'xchg ('
> 
>          Since InduceScavenge() is only called when the deprecated
>          MonitorBound is specified, I think you could use cmpxchg()
>          for clarity. Of course, you might be thinking that the
>          pattern is a useful example for other folks to copy...

That predates me. Since everyone was in favor of using only xchg, fixed!
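
For reference, a sketch of the three variants discussed in this thread,
written with std::atomic instead of HotSpot's Atomic class. force_flag and
the claim_* functions are hypothetical names for illustration only; in each
variant exactly one caller observes the 0 -> 1 transition:

```cpp
#include <atomic>

// Hypothetical flag standing in for _forceMonitorScavenge.
static std::atomic<int> force_flag{0};

// Original pattern: a plain load first, then an exchange only if the flag
// looks clear. The load avoids a write to the cache line when the flag is
// already set, which only matters under contention.
bool claim_load_then_xchg() {
  return force_flag.load() == 0 && force_flag.exchange(1) == 0;
}

// Unconditional exchange: always writes 1, and the single caller that gets
// the old value 0 back is the one that claimed the flag.
bool claim_xchg() {
  return force_flag.exchange(1) == 0;
}

// Compare-and-swap 0 -> 1: succeeds only for the caller that saw 0.
bool claim_cmpxchg() {
  int expected = 0;
  return force_flag.compare_exchange_strong(expected, 1);
}
```

All three return true for the first caller and false for later callers
until the flag is reset to 0.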

> 
> src/hotspot/share/runtime/thread.cpp
>      old L527: // Enqueue a VM_Operation to do the job for us - sometime later
>      L527: void Thread::send_async_exception(oop java_thread, oop java_throwable) {
>      L528:   VM_ThreadStop vm_stop(java_thread, java_throwable);
>      L529:   VMThread::execute(&vm_stop);
>      L530: }
>          Okay so you deleted the comment about the call being async and the
>          VM op is no longer async, but does that break the expectation of
>          any callers?
> 
>          Off the top of head, I can't think of a way for a caller of
>          Thread::send_async_exception() to determine that the call is now
>          synchronous instead of asynchronous, but ...
> 
>          Update: Just took a look at JvmtiEnv::StopThread() which calls
>          Thread::send_async_exception(). If JVM/TI StopThread() is being
>          used to throw an exception at the calling thread, I suspect that
>          in the baseline, the call would always return JVMTI_ERROR_NONE.
>          With the exception throwing now being synchronous, would that
>          affect the return value of the JVM/TI StopThread() call?
> 
>          Looks like the JVM/TI wrapper (see gensrc/jvmtifiles/jvmtiEnter.cpp
>          in the build directory) uses ThreadInVMfromNative so the calling
>          thread is in VM when it requests the now synchronous VM operation.
>          When it requests the VM op, the calling thread will block which
>          should allow the VM thread to execute the op. No worries there so
>          far...

We might just as well be stopped in ~ThreadInVMfromNative when checking if we
should block. E.g. we tell the VM thread to stop us, then we check if the VM
thread is stopping us; if we are fast, the safepoint is not armed yet and we
may slip into native :)

> 
>          It looks like the code also uses CautiouslyPreserveExceptionMark
>          so I think if the exception is delivered to the calling thread
>          it won't affect the return from jvmti_env->StopThread(), i.e., we
>          will have our return value. The CautiouslyPreserveExceptionMark
>          destructor won't kick in until we return from jvmti_StopThread()
>          (the JVM/TI wrapper from the build).
> 
>          However, that might cause this assertion to fire:
> 
>          src/hotspot/share/utilities/preserveException.cpp:
>          assert(!_thread->has_pending_exception(), "unexpected exception generated");
> 
>          because it is now detecting that an exception was thrown
>          while executing a JVM/TI call. This is pure theory here.

This can happen today, but the async exception is not delivered when going to
native.

Thread -> asynch safepoint
VMThread -> arm
Thread -> blocks in ~ThreadInVMfromNative
VMThread -> install async

But we do not call check_and_handle_async_exceptions() when going to native.
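
A toy model of the argument above, not HotSpot code: ToyThread, State and
all member names are hypothetical. It only illustrates the claim that an
installed async exception stays undelivered on the VM->native transition
and becomes pending on the transition to Java:

```cpp
// Toy model only; none of these names are HotSpot APIs.
enum class State { InVM, InNative, InJava };

struct ToyThread {
  State state = State::InVM;
  bool async_installed = false;    // set by the (toy) VM thread
  bool exception_pending = false;  // what an exception-preserving mark would see

  // Models the VM thread installing the async exception in the target thread.
  void install_async() { async_installed = true; }

  // VM -> native: may block for a safepoint, but async exceptions are
  // deliberately not checked on this path.
  void to_native() { state = State::InNative; }

  // Transition to Java: the installed async exception is delivered here.
  void to_java() {
    state = State::InJava;
    if (async_installed) {
      async_installed = false;
      exception_pending = true;
    }
  }
};
```

Under these assumptions, a thread that has the exception installed while it
is blocked on the way to native returns from the call with no pending
exception; the exception shows up only on the later transition to Java.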

> 
> src/hotspot/share/jfr/leakprofiler/utilities/vmOperation.hpp
>      No comments.
> 
> src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp
>      No comments.
> 
> src/hotspot/share/runtime/biasedLocking.cpp
>      old L85:     // Use async VM operation to avoid blocking the Watcher thread.
>          Again, you've deleted the comment, but is there going to
>          be any unexpected side effects from the change? Looks like
>          the work consists of:
> 
>          L70: ClassLoaderDataGraph::dictionary_classes_do(enable_biased_locking);
> 
>          Is that going to be a problem for the WatcherThread?

~no :) Please see longer response to David's email :)

> 
> test/hotspot/gtest/threadHelper.inline.hpp
>      No comments.
> 
> As David H. likes to say: the proof is in the building and testing.
> 
> Thumbs up on the overall idea and implementation. There might be an
> issue lurking there in JVM/TI StopThread(), but that's just a theory
> on my part...

Thanks, Robbin

> 
> Dan
> 

From robbin.ehn at oracle.com  Mon Nov 25 08:33:07 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 25 Nov 2019 09:33:07 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <092954ba-4eca-90bf-3d89-ddd24706f7a8@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
 <092954ba-4eca-90bf-3d89-ddd24706f7a8@oracle.com>
Message-ID: <d7750b89-aa00-38ae-7edf-bca4c2e5e494@oracle.com>

Hi Dan,

On 2019-11-22 22:58, Daniel D. Daugherty wrote:
> Just to you...
> 
> I think you need to do at least one set of Mach5 runs that includes the
> higher tiers. With this change, you really need JPDA (including JVM/TI)
> which comes in at Tier5 and you need stress testing which comes in at
> Tiers[678].

This has been tested together with a bigger change-set, t1-5 and nsk jdi/jvmti.

> 
> I tend to submit Tier[1-3], Tier[4-6], Tier7 and then Tier8. This gets
> thru more quickly than a Tier[1-8] which has too many tasks...

I'll do that.

Thanks, Robbin

> 
> Dan
> 
> 
> On 11/22/19 4:50 PM, Daniel D. Daugherty wrote:
>> Hi Robbin,
>>
>> Sorry I'm late to this review thread...
>>
>> I'm adding Serguei to this email thread since I'm making comments
>> about the JVM/TI parts of this changeset...
>>
>>
>>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html 
>>
>>
>> src/hotspot/share/runtime/vmOperations.hpp
>>      No comments.
>>
>> src/hotspot/share/runtime/vmOperations.cpp
>>      No comments.
>>
>> src/hotspot/share/runtime/vmThread.hpp
>>      L148:   // The ever running loop for the VMThread
>>      L149:   void loop();
>>      L150:   static void check_cleanup();
>>          nit - Feels like an odd place to add check_cleanup().
>>
>>          Update: Now that I've seen what clean_up(), it needs a
>>          better name. Perhaps check_for_forced_cleanup()? And since
>>          it is supposed to affect the running loop for the VMThread
>>          I'm okay with its location now.
>>
>> src/hotspot/share/runtime/vmThread.cpp
>>      L382:   event->set_blocking(true);
>>          Probably have to keep the 'blocking' attribute in the event
>>          for backward compatibility in the JFR record format?
>>
>>      L478:         // wait with a timeout to guarantee safepoints at regular intervals
>>          Is this comment true anymore (even before this changeset)?
>>          Adding this on the next line might help:
>>
>>                    // (if there is cleanup work to do)
>>
>>          since I _think_ that's how the policy has been evolved...
>>
>>      L479:         mu_queue.wait(GuaranteedSafepointInterval);
>>          Please prefix with "(void)" to make it clear you are
>>          intentionally ignoring the return value.
>>
>>      old L627-634 (We want to make sure that we get to a safepoint regularly)
>>          I think this now old code is covered by your change above:
>>
>>          L488:         // If the queue contains a safepoint VM op,
>>          L489:         // clean up will be done so we can skip this part.
>>          L490:         if (!_vm_queue->peek_at_safepoint_priority()) {
>>
>>          Please confirm that our thinking is the same here.
>>
>>      L661:     int ticket =  t->vm_operation_ticket();
>>          nit - extra space after '='
>>
>>      Okay. Definitely simpler code.
>>
>> src/hotspot/share/runtime/handshake.cpp
>>      No comments.
>>
>> src/hotspot/share/runtime/safepoint.hpp
>>      No comments.
>>
>> src/hotspot/share/runtime/safepoint.cpp
>>      Definitely got my attention with
>>      ObjectSynchronizer::needs_monitor_scavenge().
>>
>> src/hotspot/share/runtime/synchronizer.hpp
>>      No comments.
>>
>> src/hotspot/share/runtime/synchronizer.cpp
>>      L921:     log_info(monitorinflation)("Monitor scavenge needed, triggering safepoint cleanup.");
>>          Thanks for adding the logging line.
>>
>>          Update: As Kim pointed out, this code goes away when
>>          MonitorBound is made obsolete (JDK-8230940). I'm looking
>>          forward to making that change.
>>
>>      L1003:   if (Atomic::load(&_forceMonitorScavenge) == 0 && Atomic::xchg (1, &_forceMonitorScavenge) == 0) {
>>          nit - extra space between 'xchg ('
>>
>>          Since InduceScavenge() is only called when the deprecated
>>          MonitorBound is specified, I think you could use cmpxchg()
>>          for clarity. Of course, you might be thinking that the
>>          pattern is a useful example for other folks to copy...
>>
>> src/hotspot/share/runtime/thread.cpp
>>     old L527: // Enqueue a VM_Operation to do the job for us - sometime later
>>     L527: void Thread::send_async_exception(oop java_thread, oop java_throwable) {
>>     L528:   VM_ThreadStop vm_stop(java_thread, java_throwable);
>>     L529:   VMThread::execute(&vm_stop);
>>     L530: }
>>        Okay so you deleted the comment about the call being async and the
>>        VM op is no longer async, but does that break the expectation of
>>        any callers?
>>
>>        Off the top of my head, I can't think of a way for a caller of
>>        Thread::send_async_exception() to determine that the call is now
>>        synchronous instead of asynchronous, but ...
>>
>>        Update: Just took a look at JvmtiEnv::StopThread() which calls
>>        Thread::send_async_exception(). If JVM/TI StopThread() is being
>>        used to throw an exception at the calling thread, I suspect that
>>        in the baseline, the call would always return JVMTI_ERROR_NONE.
>>        With the exception throwing now being synchronous, would that
>>        affect the return value of the JVM/TI StopThread() call?
>>
>>        Looks like the JVM/TI wrapper (see gensrc/jvmtifiles/jvmtiEnter.cpp
>>        in the build directory) uses ThreadInVMfromNative so the calling
>>        thread is in VM when it requests the now synchronous VM operation.
>>        When it requests the VM op, the calling thread will block which
>>        should allow the VM thread to execute the op. No worries there so
>>        far...
>>
>>        It looks like the code also uses CautiouslyPreserveExceptionMark
>>        so I think if the exception is delivered to the calling thread
>>        it won't affect the return from jvmti_env->StopThread(), i.e., we
>>        will have our return value. The CautiouslyPreserveExceptionMark
>>        destructor won't kick in until we return from jvmti_StopThread()
>>        (the JVM/TI wrapper from the build).
>>
>>        However, that might cause this assertion to fire:
>>
>>        src/hotspot/share/utilities/preserveException.cpp:
>>        assert(!_thread->has_pending_exception(), "unexpected exception generated");
>>
>>        because it is now detecting that an exception was thrown
>>        while executing a JVM/TI call. This is pure theory here.
>>
>> src/hotspot/share/jfr/leakprofiler/utilities/vmOperation.hpp
>>     No comments.
>>
>> src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp
>>     No comments.
>>
>> src/hotspot/share/runtime/biasedLocking.cpp
>>     old L85:     // Use async VM operation to avoid blocking the Watcher thread.
>>         Again, you've deleted the comment, but are there going to
>>         be any unexpected side effects from the change? Looks like
>>         the work consists of:
>>
>>         L70: ClassLoaderDataGraph::dictionary_classes_do(enable_biased_locking);
>>
>>         Is that going to be a problem for the WatcherThread?
>>
>> test/hotspot/gtest/threadHelper.inline.hpp
>>     No comments.
>>
>> As David H. likes to say: the proof is in the building and testing.
>>
>> Thumbs up on the overall idea and implementation. There might be an
>> issue lurking there in JVM/TI StopThread(), but that's just a theory
>> on my part...
>>
>> Dan
>>
>>
>>
>> On 11/22/19 9:39 AM, Robbin Ehn wrote:
>>> Hi David,
>>>
>>> On 11/22/19 7:13 AM, David Holmes wrote:
>>>> Hi Robbin,
>>>>
>>>> On 21/11/2019 9:50 pm, Robbin Ehn wrote:
>>>>> Hi,
>>>>>
>>>>> Here is v3:
>>>>>
>>>>> Full:
>>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
>>>>
>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>>
>>>> Looking at the highly discussed:
>>>>
>>>> if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, 
>>>> &ForceMonitorScavenge) == 0) {
>>>>
>>>> why isn't that just:
>>>>
>>>> if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {
>>>>
>>>> ??
>>>
>>> I assumed someone had seen contention on ForceMonitorScavenge.
>>> Many threads can enter and re-enter here.
>>> I don't know if that's still the case.
>>>
>>> Since we only hit this path when the deprecated MonitorBound is set, I think
>>> I can change it?
>>>
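For readers comparing the two forms discussed above, here is a minimal sketch. It uses std::atomic as a stand-in for HotSpot's Atomic class, which is an assumption; the function names are hypothetical and this is not the code in synchronizer.cpp. The load-then-xchg form reads the flag first, so callers that find it already set skip the read-modify-write, which may matter under contention. The cmpxchg form makes the same claim in a single operation.

```cpp
#include <atomic>
#include <cassert>

// Variant A: test, then test-and-set. The plain load lets callers that
// see the flag already set return without an atomic read-modify-write.
bool claim_with_load_then_xchg(std::atomic<int>& flag) {
  return flag.load() == 0 && flag.exchange(1) == 0;
}

// Variant B: a single compare-and-swap from 0 to 1.
bool claim_with_cmpxchg(std::atomic<int>& flag) {
  int expected = 0;
  return flag.compare_exchange_strong(expected, 1);
}

// In both variants exactly one caller wins until the flag is reset.
void demo_claims() {
  std::atomic<int> a{0};
  assert(claim_with_load_then_xchg(a));   // first caller wins
  assert(!claim_with_load_then_xchg(a));  // later callers lose

  std::atomic<int> b{0};
  assert(claim_with_cmpxchg(b));          // first caller wins
  assert(!claim_with_cmpxchg(b));         // later callers lose
}
```

Both variants give the same result to the caller; the difference is only in how much cache-line traffic a losing caller generates, so on a rarely taken path the single cmpxchg is arguably the clearer choice.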
>>>>
>>>> Also while we are here can we clean this up further:
>>>>
>>>> static volatile int ForceMonitorScavenge = 0;
>>>>
>>>> becomes
>>>>
>>>> static int _forceMonitorScavenge = 0;
>>>>
>>>> so the variable doesn't look like it came from globals.hpp :)
>>>>
>>>
>>> Sure!
>>>
>>>> Just to be clear, I understand the changes around monitor scavenging now, 
>>>> though I'm not sure getting rid of async VM ops and replacing with a new way 
>>>> to directly wakeup the VMThread really amounts to a simplification.
>>>>
>>>> ---
>>>>
>>>> src/hotspot/share/runtime/vmOperations.hpp
>>>>
>>>> I still think getting rid of Mode altogether would be a good simplification. :)
>>>
>>> Sure!
>>>
>>> Here is v4, inc:
>>> http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
>>> Full:
>>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html
>>>
>>> Tested t1-3
>>>
>>> Thanks, Robbin
>>>
>>>
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>
>>>>> Inc:
>>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
>>>>>
>>>>> Tested t1-3
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>> On 2019-11-19 12:05, Robbin Ehn wrote:
>>>>>> Hi all, please review.
>>>>>>
>>>>>> CMS was the last real user of the more advanced features of VM operation.
>>>>>> VM operation can be simplified to always be a stack object and thus
>>>>>> either be of safepoint or no safepoint type.
>>>>>>
>>>>>> VM_EnableBiasedLocking is executed once by watcher thread, if needed
>>>>>> (default not used). Making it synchronous doesn't matter.
>>>>>> VM_ThreadStop is executed by a JavaThread, that thread should stop for the
>>>>>> safepoint anyways, no real point in not stopping directly.
>>>>>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, the VM op 
>>>>>> is not needed. Arguably this thread should actually stop here, since we 
>>>>>> are about to safepoint.
>>>>>>
>>>>>> There is also a small cleanup in vmThread.cpp where an unused method is 
>>>>>> removed.
>>>>>> And the extra safepoint is removed:
>>>>>> "// We want to make sure that we get to a safepoint regularly"
>>>>>> No we don't :)
>>>>>>
>>>>>> Issue:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>>>>> Change-set:
>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>>>>
>>>>>> Tested scavenge manually, passes t1-2.
>>>>>>
>>>>>> Thanks, Robbin
>>
> 

From robbin.ehn at oracle.com  Mon Nov 25 09:14:51 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 25 Nov 2019 10:14:51 +0100
Subject: RFR 8234613: JavaThread can escape back to Java from an ongoing
 handshake
In-Reply-To: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
References: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
Message-ID: <1333677e-23ff-17fb-87df-055220efe850@oracle.com>

Hi Patricio,

On 2019-11-22 19:25, Patricio Chilano wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
> Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/

Thanks, I think this is good and easy to backport!
You might as well add native to the assert.

We should revisit this when we have time.
There are two polls and four transitions in this code, which is more complicated
than I'd like.

/Robbin

> 
> Thanks,
> Patricio

From felix.yang at huawei.com  Mon Nov 25 11:33:18 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Mon, 25 Nov 2019 11:33:18 +0000
Subject: RFR(XS): 8233466: aarch64: remove unnecessary load of mdo when
 profiling return and parameters type
Message-ID: <DA41BE1DDCA941489001C7FBD7A8820EDAA21F3E@dggeml527-mbx.china.huawei.com>

Ping?   Any comments?

Thanks,
Felix

From: Yangfei (Felix)
Sent: Thursday, November 7, 2019 9:17 AM
To: hotspot-runtime-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
Subject: RFR(XS): 8233466: aarch64: remove unnecessary load of mdo when profiling return and parameters type

Hi,

   Please review the following patch:

      Bug: https://bugs.openjdk.java.net/browse/JDK-8233466

Webrev: http://cr.openjdk.java.net/~fyang/8233466/webrev.00/


When profiling return and parameter types from the interpreter on the aarch64 platform, 'mdp' is loaded by test_method_data_pointer, which is called by profile_return_type and profile_parameters_type.

It's not necessary to load mdo before calling __ profile_return_type or __ profile_parameters_type.


Passed tier1-3 testing.


Thanks,

Felix

From serguei.spitsyn at oracle.com  Mon Nov 25 11:35:17 2019
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 25 Nov 2019 03:35:17 -0800
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
Message-ID: <dac8c89c-e40b-b6ac-c7e1-e1fd4b09adf3@oracle.com>

Hi Dan and Robbin,

I could be wrong or missing something, but it feels like there is no issue
for JVMTI with this fix.

 > Off the top of my head, I can't think of a way for a caller of
 > Thread::send_async_exception() to determine that the call is now
 > synchronous instead of asynchronous, but ...

There can be some confusion here about what the call is synchronous relative to.
I read it this way:
  It is synchronous for the current thread, which calls
  send_async_exception().
  However, it is asynchronous for the target thread that needs to be
  stopped.
  So the fix does not break the JVMTI spec requirements.

Please, let me know if you agree (or not) with this reading.
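
The reading above can be pictured with a single-threaded toy model. All names are hypothetical and nothing here is HotSpot code; it only assumes that the stop request marks the target and that the target notices the mark at its next poll.

```cpp
#include <cassert>

// Hypothetical target thread state; not a HotSpot type.
struct TargetThread {
  bool pending_async_exception = false;
  int steps_run = 0;
};

// Stand-in for the stop request. It has fully completed by the time it
// returns, which is the sense in which it is synchronous for the requester.
void request_stop(TargetThread& target) {
  target.pending_async_exception = true;
}

// One unit of work on the target. The pending exception is only noticed at
// this poll, which is the sense in which delivery is asynchronous for the
// target. Returns false once the exception has been observed.
bool run_one_step(TargetThread& target) {
  if (target.pending_async_exception) {
    return false;
  }
  ++target.steps_run;
  return true;
}

// Walks through the ordering described above.
void demo_stop_ordering() {
  TargetThread t;
  assert(run_one_step(t));    // target runs normally
  request_stop(t);            // requester returns with the mark already set
  assert(t.steps_run == 1);   // target has not reacted yet
  assert(!run_one_step(t));   // target reacts at its next poll
}
```

The model does not cover the case Dan raised, where the requester and the target are the same thread.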

Thanks,
Serguei


On 11/22/19 13:50, Daniel D. Daugherty wrote:
> Hi Robbin,
>
> Sorry I'm late to this review thread...
>
> I'm adding Serguei to this email thread since I'm making comments
> about the JVM/TI parts of this changeset...
>
>
>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html 
>
>
> src/hotspot/share/runtime/vmOperations.hpp
>     No comments.
>
> src/hotspot/share/runtime/vmOperations.cpp
>     No comments.
>
> src/hotspot/share/runtime/vmThread.hpp
>     L148:   // The ever running loop for the VMThread
>     L149:   void loop();
>     L150:   static void check_cleanup();
>         nit - Feels like an odd place to add check_cleanup().
>
>         Update: Now that I've seen what check_cleanup() does, it needs a
>         better name. Perhaps check_for_forced_cleanup()? And since
>         it is supposed to affect the running loop for the VMThread
>         I'm okay with its location now.
>
> src/hotspot/share/runtime/vmThread.cpp
>     L382:   event->set_blocking(true);
>         Probably have to keep the 'blocking' attribute in the event
>         for backward compatibility in the JFR record format?
>
>     L478:         // wait with a timeout to guarantee safepoints at regular intervals
>         Is this comment true anymore (even before this changeset)?
>         Adding this on the next line might help:
>
>                   // (if there is cleanup work to do)
>
>         since I _think_ that's how the policy has evolved...
>
>     L479:         mu_queue.wait(GuaranteedSafepointInterval);
>         Please prefix with "(void)" to make it clear you are
>         intentionally ignoring the return value.
>
>     old L627-634 (We want to make sure that we get to a safepoint regularly)
>         I think this now old code is covered by your change above:
>
>         L488:         // If the queue contains a safepoint VM op,
>         L489:         // clean up will be done so we can skip this part.
>         L490:         if (!_vm_queue->peek_at_safepoint_priority()) {
>
>         Please confirm that our thinking is the same here.
>
>     L661:     int ticket =  t->vm_operation_ticket();
>         nit - extra space after '='
>
>     Okay. Definitely simpler code.
>
> src/hotspot/share/runtime/handshake.cpp
>     No comments.
>
> src/hotspot/share/runtime/safepoint.hpp
>     No comments.
>
> src/hotspot/share/runtime/safepoint.cpp
>     Definitely got my attention with
>     ObjectSynchronizer::needs_monitor_scavenge().
>
> src/hotspot/share/runtime/synchronizer.hpp
>     No comments.
>
> src/hotspot/share/runtime/synchronizer.cpp
>     L921:     log_info(monitorinflation)("Monitor scavenge needed, triggering safepoint cleanup.");
>         Thanks for adding the logging line.
>
>         Update: As Kim pointed out, this code goes away when
>         MonitorBound is made obsolete (JDK-8230940). I'm looking
>         forward to making that change.
>
>     L1003:   if (Atomic::load(&_forceMonitorScavenge) == 0 && Atomic::xchg (1, &_forceMonitorScavenge) == 0) {
>         nit - extra space between 'xchg ('
>
>         Since InduceScavenge() is only called when the deprecated
>         MonitorBound is specified, I think you could use cmpxchg()
>         for clarity. Of course, you might be thinking that the
>         pattern is a useful example for other folks to copy...
>
> src/hotspot/share/runtime/thread.cpp
>     old L527: // Enqueue a VM_Operation to do the job for us - sometime later
>     L527: void Thread::send_async_exception(oop java_thread, oop java_throwable) {
>     L528:   VM_ThreadStop vm_stop(java_thread, java_throwable);
>     L529:   VMThread::execute(&vm_stop);
>     L530: }
>        Okay so you deleted the comment about the call being async and the
>        VM op is no longer async, but does that break the expectation of
>        any callers?
>
>        Off the top of my head, I can't think of a way for a caller of
>        Thread::send_async_exception() to determine that the call is now
>        synchronous instead of asynchronous, but ...
>
>        Update: Just took a look at JvmtiEnv::StopThread() which calls
>        Thread::send_async_exception(). If JVM/TI StopThread() is being
>        used to throw an exception at the calling thread, I suspect that
>        in the baseline, the call would always return JVMTI_ERROR_NONE.
>        With the exception throwing now being synchronous, would that
>        affect the return value of the JVM/TI StopThread() call?
>
>        Looks like the JVM/TI wrapper (see gensrc/jvmtifiles/jvmtiEnter.cpp
>        in the build directory) uses ThreadInVMfromNative so the calling
>        thread is in VM when it requests the now synchronous VM operation.
>        When it requests the VM op, the calling thread will block which
>        should allow the VM thread to execute the op. No worries there so
>        far...
>
>        It looks like the code also uses CautiouslyPreserveExceptionMark
>        so I think if the exception is delivered to the calling thread
>        it won't affect the return from jvmti_env->StopThread(), i.e., we
>        will have our return value. The CautiouslyPreserveExceptionMark
>        destructor won't kick in until we return from jvmti_StopThread()
>        (the JVM/TI wrapper from the build).
>
>        However, that might cause this assertion to fire:
>
>        src/hotspot/share/utilities/preserveException.cpp:
>        assert(!_thread->has_pending_exception(), "unexpected exception generated");
>
>        because it is now detecting that an exception was thrown
>        while executing a JVM/TI call. This is pure theory here.
>
> src/hotspot/share/jfr/leakprofiler/utilities/vmOperation.hpp
>     No comments.
>
> src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp
>     No comments.
>
> src/hotspot/share/runtime/biasedLocking.cpp
>     old L85:     // Use async VM operation to avoid blocking the Watcher thread.
>         Again, you've deleted the comment, but are there going to
>         be any unexpected side effects from the change? Looks like
>         the work consists of:
>
>         L70: ClassLoaderDataGraph::dictionary_classes_do(enable_biased_locking);
>
>         Is that going to be a problem for the WatcherThread?
>
> test/hotspot/gtest/threadHelper.inline.hpp
>     No comments.
>
> As David H. likes to say: the proof is in the building and testing.
>
> Thumbs up on the overall idea and implementation. There might be an
> issue lurking there in JVM/TI StopThread(), but that's just a theory
> on my part...
>
> Dan
>
>
>
> On 11/22/19 9:39 AM, Robbin Ehn wrote:
>> Hi David,
>>
>> On 11/22/19 7:13 AM, David Holmes wrote:
>>> Hi Robbin,
>>>
>>> On 21/11/2019 9:50 pm, Robbin Ehn wrote:
>>>> Hi,
>>>>
>>>> Here is v3:
>>>>
>>>> Full:
>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
>>>
>>> src/hotspot/share/runtime/synchronizer.cpp
>>>
>>> Looking at the highly discussed:
>>>
>>> if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, 
>>> &ForceMonitorScavenge) == 0) {
>>>
>>> why isn't that just:
>>>
>>> if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {
>>>
>>> ??
>>
>> I assumed someone had seen contention on ForceMonitorScavenge.
>> Many threads can enter and re-enter here.
>> I don't know if that's still the case.
>>
>> Since we only hit this path when the deprecated MonitorBound is set,
>> I think I can change it?
>>
>>>
>>> Also while we are here can we clean this up further:
>>>
>>> static volatile int ForceMonitorScavenge = 0;
>>>
>>> becomes
>>>
>>> static int _forceMonitorScavenge = 0;
>>>
>>> so the variable doesn't look like it came from globals.hpp :)
>>>
>>
>> Sure!
>>
>>> Just to be clear, I understand the changes around monitor scavenging 
>>> now, though I'm not sure getting rid of async VM ops and replacing 
>>> with a new way to directly wakeup the VMThread really amounts to a 
>>> simplification.
>>>
>>> ---
>>>
>>> src/hotspot/share/runtime/vmOperations.hpp
>>>
>>> I still think getting rid of Mode altogether would be a good 
>>> simplification. :)
>>
>> Sure!
>>
>> Here is v4, inc:
>> http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
>> Full:
>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html
>>
>> Tested t1-3
>>
>> Thanks, Robbin
>>
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>
>>>> Inc:
>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
>>>>
>>>> Tested t1-3
>>>>
>>>> Thanks, Robbin
>>>>
>>>> On 2019-11-19 12:05, Robbin Ehn wrote:
>>>>> Hi all, please review.
>>>>>
>>>>> CMS was the last real user of the more advanced features of VM
>>>>> operation.
>>>>> VM operation can be simplified to always be a stack object and
>>>>> thus either be of safepoint or no safepoint type.
>>>>>
>>>>> VM_EnableBiasedLocking is executed once by watcher thread, if
>>>>> needed (default not used). Making it synchronous doesn't matter.
>>>>> VM_ThreadStop is executed by a JavaThread, that thread should stop
>>>>> for the safepoint anyways, no real point in not stopping directly.
>>>>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, 
>>>>> the VM op is not needed. Arguably this thread should actually stop 
>>>>> here, since we are about to safepoint.
>>>>>
>>>>> There is also a small cleanup in vmThread.cpp where an unused 
>>>>> method is removed.
>>>>> And the extra safepoint is removed:
>>>>> "// We want to make sure that we get to a safepoint regularly"
>>>>> No we don't :)
>>>>>
>>>>> Issue:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>>>> Change-set:
>>>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>>>
>>>>> Tested scavenge manually, passes t1-2.
>>>>>
>>>>> Thanks, Robbin
>


From serguei.spitsyn at oracle.com  Mon Nov 25 11:45:52 2019
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Mon, 25 Nov 2019 03:45:52 -0800
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <dac8c89c-e40b-b6ac-c7e1-e1fd4b09adf3@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
 <dac8c89c-e40b-b6ac-c7e1-e1fd4b09adf3@oracle.com>
Message-ID: <9cdcb5e0-a517-2193-e77d-ad024ac1d11f@oracle.com>

Please, skip my reply below.
I need to read all emails carefully.

Thanks,
Serguei

On 11/25/19 03:35, serguei.spitsyn at oracle.com wrote:
> Hi Dan and Robbin,
>
> I could be wrong or missing something, but it feels like there is no
> issue for JVMTI with this fix.
>
> > Off the top of my head, I can't think of a way for a caller of
> > Thread::send_async_exception() to determine that the call is now
> > synchronous instead of asynchronous, but ...
>
> There can be some confusion here about what the call is synchronous relative to.
> I read it this way:
>   It is synchronous for the current thread, which calls
>   send_async_exception().
>   However, it is asynchronous for the target thread that needs to be
>   stopped.
>   So the fix does not break the JVMTI spec requirements.
>
> Please, let me know if you agree (or not) with this reading.
>
> Thanks,
> Serguei


From christoph.langer at sap.com  Mon Nov 25 12:38:15 2019
From: christoph.langer at sap.com (Langer, Christoph)
Date: Mon, 25 Nov 2019 12:38:15 +0000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <1d4ed7c6-d626-053a-e077-da284a078082@oracle.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <6209301e-ae85-2a91-7d9e-c9096581365d@oracle.com>
 <AM6PR02MB48010AA9F7B5B16B24058B288A4E0@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <AM6PR02MB4801BE8770BB5A5FF05EAE248A490@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <21f83181-eea8-7b7f-9f5f-5f1a26413154@oracle.com>
 <5b46607d-c84b-086d-6241-cf2eee95d0a6@oracle.com>
 <cf42ad0e-ed3c-2e25-9381-2edd13a0af73@oracle.com>
 <1d4ed7c6-d626-053a-e077-da284a078082@oracle.com>
Message-ID: <AM6PR02MB4801B759287D5F20639F1F4F8A4A0@AM6PR02MB4801.eurprd02.prod.outlook.com>

Hi David,

thanks for your investigation. I'll prepare a fix to move getPrefixed back into canonicalize_md.c. However, could you please still fix your internal build so that it has 'jdk_util.h' in the include path?

Thanks
Christoph

> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Montag, 25. November 2019 01:02
> To: Langer, Christoph <christoph.langer at sap.com>; Alan Bateman
> <Alan.Bateman at oracle.com>; gerard ziemski <gerard.ziemski at oracle.com>
> Cc: core-libs-dev at openjdk.java.net; hotspot-runtime-
> dev at openjdk.java.net
> Subject: Re: RFR: 8234185: Cleanup usage of canonicalize function between
> libjava, hotspot and libinstrument
> 
> 
> 
> On 25/11/2019 8:45 am, David Holmes wrote:
> > On 25/11/2019 7:49 am, David Holmes wrote:
> >> On 25/11/2019 7:33 am, David Holmes wrote:
> >>> Hi Christoph,
> >>>
> >>> On 23/11/2019 12:04 am, Langer, Christoph wrote:
> >>>> Hi,
> >>>>
> >>>> I'd like to push this change. However, running it through jdk-submit
> >>>> shows reproducible errors:
> >>>>
> >>>> Job: mach5-one-clanger-JDK-8234185-1-20191122-0927-6913189
> >>>> BuildId: 2019-11-22-0926373.christoph.langer.source
> >>>> No failed tests
> >>>> Tasks Summary
> >>>>      NA: 0
> >>>>      NOTHING_TO_RUN: 0
> >>>>      KILLED: 0
> >>>>      PASSED: 76
> >>>>      UNABLE_TO_RUN: 0
> >>>>      EXECUTED_WITH_FAILURE: 1
> >>>>      FAILED: 0
> >>>>      HARNESS_ERROR: 0
> >>>> Build
> >>>> 1 Executed with failure
> >>>> o    windows-x64-install-windows-x64-build-19 error while building,
> >>>> return value: 2
> >>>>
> >>>>
> >>>> Job: mach5-one-clanger-JDK-8234185-20191121-2313-6898791
> >>>> BuildId: 2019-11-21-2311357.christoph.langer.source
> >>>> No failed tests
> >>>> Tasks Summary
> >>>>      NA: 0
> >>>>      NOTHING_TO_RUN: 0
> >>>>      KILLED: 0
> >>>>      PASSED: 76
> >>>>      UNABLE_TO_RUN: 0
> >>>>      EXECUTED_WITH_FAILURE: 1
> >>>>      FAILED: 0
> >>>>      HARNESS_ERROR: 0
> >>>> Build
> >>>> 1 Executed with failure
> >>>> o    windows-x64-install-windows-x64-build-19 error while building,
> >>>> return value: 2
> >>>>
> >>>>
> >>>> David already had a look and let me know that the following was the
> >>>> reason:
> >>>>
> >>>>
> >>>> t:/workspace/open/src/java.base/windows/native/libjava/canonicalize_md.c(41):
> >>>> fatal error C1083: Cannot open include file: 'jdk_util.h': No such
> >>>> file or directory
> >>>>
> >>>> This is not explainable to me as I see this running through my local
> >>>> build and our nightly builds without problems. I also can't explain
> >>>> jdk_util.h can't be opened at this place - it should be there and
> >>>> part of the include directories...
> >>>>
> >>>> I'd appreciate any help...
> >>>
> >>> I just dug a little deeper and this is failing in part of our closed
> >>> build for the install repo. There is a library there that is using
> >>> canonicalize_md.c directly - i.e. it adds that file to its source
> >>> files list. The build instructions don't include that directory on
> >>> the include directory list - hence the failure. But it will also fail
> >>> due to the name change you made.
> >>
> >> Actually it appears that the other source code doesn't actually refer
> >> to the canonicalize function at all, so a simple fix may be possible
> >> at the build level on our side. I'm testing that now.
> >
> > It isn't the canonicalize function that is used, it is getPrefixed,
> > which has now been moved to the io_util_md.c file. So a fix will be a
> > bit more involved.
> 
> I tried adding io_util_md.c to the library source list instead of
> canonicalize_md.c but that just caused a slew of other compilation
> failures, so I don't see any quick fix for us here.
> 
> David
> 
> >
> > David
> >
> >>
> >> David
> >> -----
> >>
> >>> Someone will need to work with you to make the necessary changes to
> >>> our code.
> >>>
> >>> David
> >>>
> >>>> Thanks
> >>>> Christoph
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Langer, Christoph
> >>>>> Sent: Donnerstag, 21. November 2019 14:19
> >>>>> To: Alan Bateman <Alan.Bateman at oracle.com>; core-libs-
> >>>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> >>>>> Subject: RE: RFR: 8234185: Cleanup usage of canonicalize function
> >>>>> between
> >>>>> libjava, hotspot and libinstrument
> >>>>>
> >>>>> Hi Alan,
> >>>>>
> >>>>> thanks for the review. I'll push it then after running through
> >>>>> jdk-submit.
> >>>>>
> >>>>> /Christoph
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Alan Bateman <Alan.Bateman at oracle.com>
> >>>>>> Sent: Donnerstag, 21. November 2019 09:51
> >>>>>> To: Langer, Christoph <christoph.langer at sap.com>; core-libs-
> >>>>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> >>>>>> Subject: Re: RFR: 8234185: Cleanup usage of canonicalize function
> >>>>>> between
> >>>>>> libjava, hotspot and libinstrument
> >>>>>>
> >>>>>> On 14/11/2019 15:37, Langer, Christoph wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> please review this cleanup change regarding function
> >>>>>>> "canonicalize" of
> >>>>>> libjava.
> >>>>>>>
> >>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
> >>>>>>> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
> >>>>>>>
> >>>>>>>
> >>>>>>> The goal is to cleanup how this function is defined and used. One
> >>>>>>> thing is,
> >>>>>> that there was an unnecessary wrapper function "Canonicalize" in
> >>>>>> jni_util.c.
> >>>>>> It wrapped the call to "canonicalize". We can get rid of this
> >>>>>> wrapper.
> >>>>>> Unfortunately, it is not possible to just export "canonicalize"
> >>>>>> since this will
> >>>>>> conflict with a method signature from the math library, at least
> >>>>>> on modern
> >>>>>> Linuxes. So I decided to call the method JDK_Canonicalize and will
> >>>>>> correctly
> >>>>>> define it in jdk_util.h which can be included everywhere.
> >>>>>>>
> >>>>>> I think this change is okay. My main concern when initially seeing
> >>>>>> this
> >>>>>> go by was that it would leak the \\?\ or \\?\UNC\ prefix into the
> >>>>>> canonical File when it wasn't there previously, this would of course
> >>>>>> have several implications. But I think you have it right and this
> >>>>>> is, as
> >>>>>> you position, just refactoring/cleanup.
> >>>>>>
> >>>>>> -Alan

From david.holmes at oracle.com  Mon Nov 25 12:41:53 2019
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 25 Nov 2019 22:41:53 +1000
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <AM6PR02MB4801B759287D5F20639F1F4F8A4A0@AM6PR02MB4801.eurprd02.prod.outlook.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <6209301e-ae85-2a91-7d9e-c9096581365d@oracle.com>
 <AM6PR02MB48010AA9F7B5B16B24058B288A4E0@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <AM6PR02MB4801BE8770BB5A5FF05EAE248A490@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <21f83181-eea8-7b7f-9f5f-5f1a26413154@oracle.com>
 <5b46607d-c84b-086d-6241-cf2eee95d0a6@oracle.com>
 <cf42ad0e-ed3c-2e25-9381-2edd13a0af73@oracle.com>
 <1d4ed7c6-d626-053a-e077-da284a078082@oracle.com>
 <AM6PR02MB4801B759287D5F20639F1F4F8A4A0@AM6PR02MB4801.eurprd02.prod.outlook.com>
Message-ID: <1657461f-ea1f-cd0d-f842-e5a5404afb96@oracle.com>

Hi Christoph,

On 25/11/2019 10:38 pm, Langer, Christoph wrote:
> Hi David,
> 
> thanks for your investigation. I'll prepare a fix to move back getPrefixed into canonicalize_md.c. However, could you please still fix your internal build in terms that it would have 'jdk_util.h' in the include path?

That should be simple enough to do.

David

> Thanks
> Christoph
> 
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Montag, 25. November 2019 01:02
>> To: Langer, Christoph <christoph.langer at sap.com>; Alan Bateman
>> <Alan.Bateman at oracle.com>; gerard ziemski <gerard.ziemski at oracle.com>
>> Cc: core-libs-dev at openjdk.java.net; hotspot-runtime-
>> dev at openjdk.java.net
>> Subject: Re: RFR: 8234185: Cleanup usage of canonicalize function between
>> libjava, hotspot and libinstrument
>>
>>
>>
>> On 25/11/2019 8:45 am, David Holmes wrote:
>>> On 25/11/2019 7:49 am, David Holmes wrote:
>>>> On 25/11/2019 7:33 am, David Holmes wrote:
>>>>> Hi Christoph,
>>>>>
>>>>> On 23/11/2019 12:04 am, Langer, Christoph wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'd like to push this change. However, running it through jdk-submit
>>>>>> shows reproducible errors:
>>>>>>
>>>>>> Job: mach5-one-clanger-JDK-8234185-1-20191122-0927-6913189
>>>>>> BuildId: 2019-11-22-0926373.christoph.langer.source
>>>>>> No failed tests
>>>>>> Tasks Summary
>>>>>>      NA: 0
>>>>>>      NOTHING_TO_RUN: 0
>>>>>>      KILLED: 0
>>>>>>      PASSED: 76
>>>>>>      UNABLE_TO_RUN: 0
>>>>>>      EXECUTED_WITH_FAILURE: 1
>>>>>>      FAILED: 0
>>>>>>      HARNESS_ERROR: 0
>>>>>> Build
>>>>>> 1 Executed with failure
>>>>>> o    windows-x64-install-windows-x64-build-19 error while building,
>>>>>> return value: 2
>>>>>>
>>>>>>
>>>>>> Job: mach5-one-clanger-JDK-8234185-20191121-2313-6898791
>>>>>> BuildId: 2019-11-21-2311357.christoph.langer.source
>>>>>> No failed tests
>>>>>> Tasks Summary
>>>>>>      NA: 0
>>>>>>      NOTHING_TO_RUN: 0
>>>>>>      KILLED: 0
>>>>>>      PASSED: 76
>>>>>>      UNABLE_TO_RUN: 0
>>>>>>      EXECUTED_WITH_FAILURE: 1
>>>>>>      FAILED: 0
>>>>>>      HARNESS_ERROR: 0
>>>>>> Build
>>>>>> 1 Executed with failure
>>>>>> o    windows-x64-install-windows-x64-build-19 error while building,
>>>>>> return value: 2
>>>>>>
>>>>>>
>>>>>> David already had a look and let me know that the following was the
>>>>>> reason:
>>>>>>
>>>>>>
>>>>>> t:/workspace/open/src/java.base/windows/native/libjava/canonicalize_md.c(41):
>>>>>> fatal error C1083: Cannot open include file: 'jdk_util.h': No such
>>>>>> file or directory
>>>>>>
>>>>>> This is not explainable to me as I see this running through my local
>>>>>> build and our nightly builds without problems. I also can't explain
>>>>>> jdk_util.h can't be opened at this place - it should be there and
>>>>>> part of the include directories...
>>>>>>
>>>>>> I'd appreciate any help...
>>>>>
>>>>> I just dug a little deeper and this is failing in part of our closed
>>>>> build for the install repo. There is a library there that is using
>>>>> canonicalize_md.c directly - i.e. it adds that file to its source
>>>>> files list. The build instructions don't include that directory on
>>>>> the include directory list - hence the failure. But it will also fail
>>>>> due to the name change you made.
>>>>
>>>> Actually it appears that the other source code doesn't actually refer
>>>> to the canonicalize function at all, so a simple fix may be possible
>>>> at the build level on our side. I'm testing that now.
>>>
>>> It isn't the canonicalize function that is used, it is getPrefixed,
>>> which has now been moved to the io_util_md.c file. So a fix will be a
>>> bit more involved.
>>
>> I tried adding io_util_md.c to the library source list instead of
>> canonicalize_md.c but that just caused a slew of other compilation
>> failures, so I don't see any quick fix for us here.
>>
>> David
>>
>>>
>>> David
>>>
>>>>
>>>> David
>>>> -----
>>>>
>>>>> Someone will need to work with you to make the necessary changes to
>>>>> our code.
>>>>>
>>>>> David
>>>>>
>>>>>> Thanks
>>>>>> Christoph
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Langer, Christoph
>>>>>>> Sent: Donnerstag, 21. November 2019 14:19
>>>>>>> To: Alan Bateman <Alan.Bateman at oracle.com>; core-libs-
>>>>>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>>>>> Subject: RE: RFR: 8234185: Cleanup usage of canonicalize function
>>>>>>> between
>>>>>>> libjava, hotspot and libinstrument
>>>>>>>
>>>>>>> Hi Alan,
>>>>>>>
>>>>>>> thanks for the review. I'll push it then after running through
>>>>>>> jdk-submit.
>>>>>>>
>>>>>>> /Christoph
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Alan Bateman <Alan.Bateman at oracle.com>
>>>>>>>> Sent: Donnerstag, 21. November 2019 09:51
>>>>>>>> To: Langer, Christoph <christoph.langer at sap.com>; core-libs-
>>>>>>>> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>>>>>> Subject: Re: RFR: 8234185: Cleanup usage of canonicalize function
>>>>>>>> between
>>>>>>>> libjava, hotspot and libinstrument
>>>>>>>>
>>>>>>>> On 14/11/2019 15:37, Langer, Christoph wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> please review this cleanup change regarding function
>>>>>>>>> "canonicalize" of
>>>>>>>> libjava.
>>>>>>>>>
>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234185
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~clanger/webrevs/8234185.0/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The goal is to cleanup how this function is defined and used. One
>>>>>>>>> thing is,
>>>>>>>> that there was an unnecessary wrapper function "Canonicalize" in
>>>>>>>> jni_util.c.
>>>>>>>> It wrapped the call to "canonicalize". We can get rid of this
>>>>>>>> wrapper.
>>>>>>>> Unfortunately, it is not possible to just export "canonicalize"
>>>>>>>> since this will
>>>>>>>> conflict with a method signature from the math library, at least
>>>>>>>> on modern
>>>>>>>> Linuxes. So I decided to call the method JDK_Canonicalize and will
>>>>>>>> correctly
>>>>>>>> define it in jdk_util.h which can be included everywhere.
>>>>>>>>>
>>>>>>>> I think this change is okay. My main concern when initially seeing
>>>>>>>> this
>>>>>>>> go by was that it would leak the \\?\ or \\?\UNC\ prefix into the
>>>>>>>> canonical File when it wasn't there previously, this would of course
>>>>>>>> have several implications. But I think you have it right and this
>>>>>>>> is, as
>>>>>>>> you position, just refactoring/cleanup.
>>>>>>>>
>>>>>>>> -Alan

From robbin.ehn at oracle.com  Mon Nov 25 12:48:35 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 25 Nov 2019 13:48:35 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <9cdcb5e0-a517-2193-e77d-ad024ac1d11f@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
 <dac8c89c-e40b-b6ac-c7e1-e1fd4b09adf3@oracle.com>
 <9cdcb5e0-a517-2193-e77d-ad024ac1d11f@oracle.com>
Message-ID: <8271e3b6-0a94-4b6d-dfb8-3405f1534444@oracle.com>

Hi Serguei, thanks for having a look.

AFAIK:
Today one of these three can happen when returning to the agent (native):
- The target thread for the stop has not yet installed the async exception.
- The target thread has installed the async exception, but has not yet stopped.
- The target thread has installed the async exception and has already stopped.

An agent must handle all three possible scenarios.
This patch just removes the first scenario, and makes the installation part
synchronous.

Hope that helps!
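
As a rough illustration of the three scenarios above, here is a simplified model (not HotSpot code; the enum and the function name are hypothetical and exist only for this sketch). It lists which states an agent could observe on return, depending on whether the installation step is synchronous:

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of what an agent may observe when StopThread returns.
enum class StopState {
  NotYetInstalled,      // async exception not yet installed in the target
  InstalledNotStopped,  // installed, but the target has not stopped yet
  InstalledAndStopped   // installed, and the target has already stopped
};

// States an agent must be prepared for. With a synchronous installation
// the first state can no longer be observed on return.
std::vector<StopState> observable_states(bool synchronous_install) {
  std::vector<StopState> states;
  if (!synchronous_install) {
    states.push_back(StopState::NotYetInstalled);
  }
  states.push_back(StopState::InstalledNotStopped);
  states.push_back(StopState::InstalledAndStopped);
  return states;
}
```

An agent written against the three-state model keeps working under the two-state one, since the remaining states are a subset.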

Note, I don't see the method "StopThread" itself being documented as either
synchronous or asynchronous.
The exception is documented as being asynchronous.

And I don't think we follow the spec now, since we ignore the result of:
void VM_ThreadStop::doit()
One would expect e.g. JVMTI_ERROR_THREAD_NOT_ALIVE if we never deliver the async
exception. So making the async exception installation part synchronous would be a
step towards fixing that issue.

Thanks, Robbin

On 2019-11-25 12:45, serguei.spitsyn at oracle.com wrote:
> Please, skip my reply below.
> I need to read all emails carefully.
> 
> Thanks,
> Serguei
> 
> On 11/25/19 03:35, serguei.spitsyn at oracle.com wrote:
>> Hi Dan and Robbin,
>>
>> I can be wrong and missing something but it feels like there is no issue for 
>> JVMTI with this fix.
>>
>> > Off the top of head, I can't think of a way for a caller of
>> > Thread::send_async_exception() to determine that the call is now
>> > synchronous instead of asynchronous, but ...
>>
>> There can be some confusion here about what is synchronous relative to.
>> I read it this way:
>>  It is synchronous for the current thread which calls the send_async_exception().
>>  However, it is asynchronous for the target thread that needs to be stopped.
>>  So the fix does not break the JVMTI spec requirements.
>>
>> Please, let me know if you agree (or not) with this reading.
>>
>> Thanks,
>> Serguei
>>
>>
>> On 11/22/19 13:50, Daniel D. Daugherty wrote:
>>> Hi Robbin,
>>>
>>> Sorry I'm late to this review thread...
>>>
>>> I'm adding Serguei to this email thread since I'm making comments
>>> about the JVM/TI parts of this changeset...
>>>
>>>
>>>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html 
>>>
>>>
>>> src/hotspot/share/runtime/vmOperations.hpp
>>>     No comments.
>>>
>>> src/hotspot/share/runtime/vmOperations.cpp
>>>     No comments.
>>>
>>> src/hotspot/share/runtime/vmThread.hpp
>>>     L148:   // The ever running loop for the VMThread
>>>     L149:   void loop();
>>>     L150:   static void check_cleanup();
>>>         nit - Feels like an odd place to add check_cleanup().
>>>
>>>         Update: Now that I've seen what check_cleanup() does, it needs a
>>>         better name. Perhaps check_for_forced_cleanup()? And since
>>>         it is supposed to affect the running loop for the VMThread
>>>         I'm okay with its location now.
>>>
>>> src/hotspot/share/runtime/vmThread.cpp
>>>     L382:   event->set_blocking(true);
>>>         Probably have to keep the 'blocking' attribute in the event
>>>         for backward compatibility in the JFR record format?
>>>
>>>     L478:         // wait with a timeout to guarantee safepoints at regular intervals
>>>         Is this comment true anymore (even before this changeset)?
>>>         Adding this on the next line might help:
>>>
>>>                   // (if there is cleanup work to do)
>>>
>>>         since I _think_ that's how the policy has evolved...
>>>
>>>     L479:         mu_queue.wait(GuaranteedSafepointInterval);
>>>         Please prefix with "(void)" to make it clear you are
>>>         intentionally ignoring the return value.
>>>
>>>     old L627-634 (We want to make sure that we get to a safepoint regularly)
>>>         I think this now old code is covered by your change above:
>>>
>>>         L488:         // If the queue contains a safepoint VM op,
>>>         L489:         // clean up will be done so we can skip this part.
>>>         L490:         if (!_vm_queue->peek_at_safepoint_priority()) {
>>>
>>>         Please confirm that our thinking is the same here.
>>>
>>>     L661:     int ticket =  t->vm_operation_ticket();
>>>         nit - extra space after '='
>>>
>>>     Okay. Definitely simpler code.
>>>
>>> src/hotspot/share/runtime/handshake.cpp
>>>     No comments.
>>>
>>> src/hotspot/share/runtime/safepoint.hpp
>>>     No comments.
>>>
>>> src/hotspot/share/runtime/safepoint.cpp
>>>     Definitely got my attention with
>>>     ObjectSynchronizer::needs_monitor_scavenge().
>>>
>>> src/hotspot/share/runtime/synchronizer.hpp
>>>     No comments.
>>>
>>> src/hotspot/share/runtime/synchronizer.cpp
>>>     L921:     log_info(monitorinflation)("Monitor scavenge needed, triggering safepoint cleanup.");
>>>         Thanks for adding the logging line.
>>>
>>>         Update: As Kim pointed out, this code goes away when
>>>         MonitorBound is made obsolete (JDK-8230940). I'm looking
>>>         forward to making that change.
>>>
>>>     L1003:   if (Atomic::load(&_forceMonitorScavenge) == 0 && Atomic::xchg (1, &_forceMonitorScavenge) == 0) {
>>>         nit - extra space between 'xchg ('
>>>
>>>         Since InduceScavenge() is only called when the deprecated
>>>         MonitorBound is specified, I think you could use cmpxchg()
>>>         for clarity. Of course, you might be thinking that the
>>>         pattern is a useful example for other folks to copy...
>>>
>>> src/hotspot/share/runtime/thread.cpp
>>>     old L527: // Enqueue a VM_Operation to do the job for us - sometime later
>>>     L527: void Thread::send_async_exception(oop java_thread, oop java_throwable) {
>>>     L528:   VM_ThreadStop vm_stop(java_thread, java_throwable);
>>>     L529:   VMThread::execute(&vm_stop);
>>>     L530: }
>>>        Okay so you deleted the comment about the call being async and the
>>>        VM op is no longer async, but does that break the expectation of
>>>        any callers?
>>>
>>>        Off the top of my head, I can't think of a way for a caller of
>>>        Thread::send_async_exception() to determine that the call is now
>>>        synchronous instead of asynchronous, but ...
>>>
>>>        Update: Just took a look at JvmtiEnv::StopThread() which calls
>>>        Thread::send_async_exception(). If JVM/TI StopThread() is being
>>>        used to throw an exception at the calling thread, I suspect that
>>>        in the baseline, the call would always return JVMTI_ERROR_NONE.
>>>        With the exception throwing now being synchronous, would that
>>>        affect the return value of the JVM/TI StopThread() call?
>>>
>>>        Looks like the JVM/TI wrapper (see gensrc/jvmtifiles/jvmtiEnter.cpp
>>>        in the build directory) uses ThreadInVMfromNative so the calling
>>>        thread is in VM when it requests the now synchronous VM operation.
>>>        When it requests the VM op, the calling thread will block which
>>>        should allow the VM thread to execute the op. No worries there so
>>>        far...
>>>
>>>        It looks like the code also uses CautiouslyPreserveExceptionMark
>>>        so I think if the exception is delivered to the calling thread
>>>        it won't affect the return from jvmti_env->StopThread(), i.e., we
>>>        will have our return value. The CautiouslyPreserveExceptionMark
>>>        destructor won't kick in until we return from jvmti_StopThread()
>>>        (the JVM/TI wrapper from the build).
>>>
>>>        However, that might cause this assertion to fire:
>>>
>>>        src/hotspot/share/utilities/preserveException.cpp:
>>>        assert(!_thread->has_pending_exception(), "unexpected exception generated");
>>>
>>>        because it is now detecting that an exception was thrown
>>>        while executing a JVM/TI call. This is pure theory here.
>>>
>>> src/hotspot/share/jfr/leakprofiler/utilities/vmOperation.hpp
>>>     No comments.
>>>
>>> src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp
>>>     No comments.
>>>
>>> src/hotspot/share/runtime/biasedLocking.cpp
>>>     old L85:     // Use async VM operation to avoid blocking the Watcher thread.
>>>         Again, you've deleted the comment, but is there going to
>>>         be any unexpected side effects from the change? Looks like
>>>         the work consists of:
>>>
>>>         L70: ClassLoaderDataGraph::dictionary_classes_do(enable_biased_locking);
>>>
>>>         Is that going to be a problem for the WatcherThread?
>>>
>>> test/hotspot/gtest/threadHelper.inline.hpp
>>>     No comments.
>>>
>>> As David H. likes to say: the proof is in the building and testing.
>>>
>>> Thumbs up on the overall idea and implementation. There might be an
>>> issue lurking there in JVM/TI StopThread(), but that's just a theory
>>> on my part...
>>>
>>> Dan
>>>
>>>
>>>
>>> On 11/22/19 9:39 AM, Robbin Ehn wrote:
>>>> Hi David,
>>>>
>>>> On 11/22/19 7:13 AM, David Holmes wrote:
>>>>> Hi Robbin,
>>>>>
>>>>> On 21/11/2019 9:50 pm, Robbin Ehn wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Here is v3:
>>>>>>
>>>>>> Full:
>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
>>>>>
>>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>>>
>>>>> Looking at the highly discussed:
>>>>>
>>>>> if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, 
>>>>> &ForceMonitorScavenge) == 0) {
>>>>>
>>>>> why isn't that just:
>>>>>
>>>>> if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {
>>>>>
>>>>> ??
>>>>
>>>> I assumed someone had seen contention on ForceMonitorScavenge.
>>>> Many threads can be enter and re-enter here.
>>>> I don't know if that's still the case.
>>>>
>>>> Since we only hit this path when the deprecated MonitorsBound is set, I 
>>>> think I can change it?
>>>>
>>>>>
>>>>> Also while we are here can we clean this up further:
>>>>>
>>>>> static volatile int ForceMonitorScavenge = 0;
>>>>>
>>>>> becomes
>>>>>
>>>>> static int _forceMonitorScavenge = 0;
>>>>>
>>>>> so the variable doesn't look like it came from globals.hpp :)
>>>>>
>>>>
>>>> Sure!
>>>>
>>>>> Just to be clear, I understand the changes around monitor scavenging now, 
>>>>> though I'm not sure getting rid of async VM ops and replacing with a new 
>>>>> way to directly wakeup the VMThread really amounts to a simplification.
>>>>>
>>>>> ---
>>>>>
>>>>> src/hotspot/share/runtime/vmOperations.hpp
>>>>>
>>>>> I still think getting rid of Mode altogether would be a good 
>>>>> simplification. :)
>>>>
>>>> Sure!
>>>>
>>>> Here is v4, inc:
>>>> http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
>>>> Full:
>>>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html
>>>>
>>>> Tested t1-3
>>>>
>>>> Thanks, Robbin
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>
>>>>>> Inc:
>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
>>>>>>
>>>>>> Tested t1-3
>>>>>>
>>>>>> Thanks, Robbin
>>>>>>
>>>>>> On 2019-11-19 12:05, Robbin Ehn wrote:
>>>>>>> Hi all, please review.
>>>>>>>
>>>>>>> CMS was the last real user of the more advantage features of VM operation.
>>>>>>> VM operation can be simplified to always be an stack object and thus 
>>>>>>> either be
>>>>>>> of safepoint or no safepoint type.
>>>>>>>
>>>>>>> VM_EnableBiasedLocking is executed once by watcher thread, if needed 
>>>>>>> (default not used). Making it synchrone doesn't matter.
>>>>>>> VM_ThreadStop is executed by a JavaThread, that thread should stop for 
>>>>>>> the safepoint anyways, no real point in not stopping direct.
>>>>>>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, the VM 
>>>>>>> op is not needed. Arguably this thread should actually stop here, since 
>>>>>>> we are about to safepoint.
>>>>>>>
>>>>>>> There is also a small cleanup in vmThread.cpp where an unused method is 
>>>>>>> removed.
>>>>>>> And the extra safepoint is removed:
>>>>>>> "// We want to make sure that we get to a safepoint regularly"
>>>>>>> No we don't :)
>>>>>>>
>>>>>>> Issue:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>>>>>> Change-set:
>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>>>>>
>>>>>>> Tested scavenge manually, passes t1-2.
>>>>>>>
>>>>>>> Thanks, Robbin
>>>
>>
> 

From zgu at redhat.com  Mon Nov 25 13:30:35 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 25 Nov 2019 08:30:35 -0500
Subject: RFR 8234270: [REDO] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <CAOEheN7K+mL96yyTb9nZ+zW+R0yUpuzHWq3bCcAeLpyTmjK06Q@mail.gmail.com>
References: <7b82c213-35e9-1aef-c3c4-06fa8dec0d13@redhat.com>
 <CAOEheN7K+mL96yyTb9nZ+zW+R0yUpuzHWq3bCcAeLpyTmjK06Q@mail.gmail.com>
Message-ID: <64906b40-5040-df49-2c77-19f88f64a16c@redhat.com>

Ping ... May I get a second review?

Thanks,

-Zhengyu

On 11/21/19 12:12 PM, yumin qi wrote:
> Hi, Zhengyu
> 
>    The fix looks good to me.
> 
> Thanks
> Yumin
> 
> 
> 
> On Wed, Nov 20, 2019 at 5:49 AM Zhengyu Gu <zgu at redhat.com 
> <mailto:zgu at redhat.com>> wrote:
> 
>     JDK-8204128 did not fix the original bug. But new assertion helped to
>     catch the problem, as it consistently failed in Oracle internal tests.
> 
>     The root cause is that, when NMT biases a resource area to compiler, it
>     did not adjust tracking data to reflect that. When the biased resource
>     area is released, there is a possibility that its size is greater than
>     total size recorded, and underflow a size_t counter.
> 
>     JDK-8204128 patch also missed a long to ssize_t parameter type change,
>     that resulted new test failure on Windows, because long is 4-bytes on
>     Windows.
> 
>     Many thanks to Leonid Mesnik, who helped to run this patch through
>     Oracle's internal stress tests.
> 
>     Bug: https://bugs.openjdk.java.net/browse/JDK-8234270
>     Webrev: http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/index.html
> 
> 
>     Test:
>         hotspot_nmt
>         Submit test
>         Oracle internal stress tests.
> 
> 
>     Thanks,
> 
>     -Zhengyu
> 


From claes.redestad at oracle.com  Mon Nov 25 13:51:12 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Mon, 25 Nov 2019 14:51:12 +0100
Subject: RFR: 8234735: InstanceKlass:find_method_index regression after
 JDK-8231610
Message-ID: <10b66da5-1cc8-16bf-0d9b-f1e51665e191@oracle.com>

Hi,

JDK-8231610 refactored InstanceKlass::binary_search to ::quick_search,
which now has a linear slow-path version in case the VM is currently
dumping the dynamic CDS archive. This causes the method to grow enough
that ::quick_search is not inlined into find_method_index, which causes a
local regression.

Since the linear search is not as performance sensitive as the default
binary search, outlining the slow path to a different method makes
sense. Tests show that this allows the performance critical path to be
completely inlined again, recovering about half of the performance
regression (compared to pre-8231610, the fast path still has an added
branch to check whether we should do a linear search).
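
The outlining idea can be sketched as follows. This is a minimal standalone illustration, not the code in the webrev: the `NOINLINE` macro, the `g_use_linear_search` flag (a stand-in for "the VM is dumping the dynamic CDS archive") and the use of `std::vector<int>` are all assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE
#endif

// Hypothetical stand-in for the rare mode that requires a linear search.
static bool g_use_linear_search = false;

// Slow path, kept out of line so it does not bloat the caller.
NOINLINE int linear_search(const std::vector<int>& keys, int key) {
  for (int i = 0; i < static_cast<int>(keys.size()); i++) {
    if (keys[i] == key) return i;
  }
  return -1;
}

// Fast path: one extra branch for the rare mode, then a small binary
// search that stays cheap enough for the compiler to inline.
inline int quick_search(const std::vector<int>& keys, int key) {
  if (g_use_linear_search) {
    return linear_search(keys, key);
  }
  int lo = 0;
  int hi = static_cast<int>(keys.size()) - 1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (keys[mid] == key) return mid;
    if (keys[mid] < key) lo = mid + 1; else hi = mid - 1;
  }
  return -1;
}
```

The remaining cost on the fast path is the single flag check, which matches the "still adds a branch" caveat above.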

Bug:    https://bugs.openjdk.java.net/browse/JDK-8234735
Webrev: http://cr.openjdk.java.net/~redestad/8234735/open.00/

Testing: tier1-3

Thanks!

/Claes

From ralf.schmelter at sap.com  Mon Nov 25 14:41:03 2019
From: ralf.schmelter at sap.com (Schmelter, Ralf)
Date: Mon, 25 Nov 2019 14:41:03 +0000
Subject: RFR (M) 8234510: Remove file seeking requirement for writing a heap
 dump
Message-ID: <AM0PR02MB45008C66EC315E9836F7FF7A9F4A0@AM0PR02MB4500.eurprd02.prod.outlook.com>

Hello,

this change removes the need to use seek on the hprof file when creating a heap dump, thus making it possible to stream the dump. This enables us to dump to a socket or directly gzip the dump.

Instead of fixing up the heap dump segment size in the written file, the size of a heap dump segment is either fixed up in the buffer or, for entries too big to fit into the buffer fully, the entry gets its own segment with no need to fix up the segment size later.

To do this, we now need to know how large a heap dump segment entry is when starting to write the entry. This is either trivial (for the roots) or already known (for the instance and array dump entries). Just the class entry needed a little more code to track the size.

The change results in more heap dump segments in the written heap dump. But since the overhead per segment is 9 bytes, even for the smallest used buffer (64K) the overhead is less than 0.02%. Additionally the heap dump now expects to be able to allocate at least 64k for the buffer. The old code tried to run even with a buffer of 1 byte or no buffer at all.
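
A small sketch of the two points above: the per-segment overhead and fixing up the length inside the buffer before the bytes leave it. This is an illustration only, not the code in the webrev. It assumes the usual HPROF record header layout (1-byte tag, 4-byte timestamp, 4-byte length, 9 bytes in total) and the segment tag value 0x1C; `SegmentBuffer` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed header layout: 1-byte tag, 4-byte timestamp, 4-byte length.
constexpr std::size_t kSegmentHeaderSize = 9;
constexpr std::uint8_t kHeapDumpSegmentTag = 0x1C;  // assumed segment tag

// Relative overhead of one header per full buffer, in percent.
constexpr double overhead_percent(std::size_t buffer_size) {
  return 100.0 * kSegmentHeaderSize / buffer_size;
}

// Hypothetical buffer that starts a segment with a placeholder length and
// patches the length in memory, so the output never has to be seeked.
class SegmentBuffer {
 public:
  SegmentBuffer() { begin_segment(); }

  void append(const std::vector<std::uint8_t>& entry) {
    buf_.insert(buf_.end(), entry.begin(), entry.end());
  }

  // Fix up the big-endian length field in the buffer, then hand out the
  // finished segment and start a new one.
  std::vector<std::uint8_t> finish() {
    std::uint32_t len =
        static_cast<std::uint32_t>(buf_.size() - kSegmentHeaderSize);
    for (int i = 0; i < 4; i++) {
      buf_[5 + i] = static_cast<std::uint8_t>(len >> (8 * (3 - i)));
    }
    std::vector<std::uint8_t> out;
    out.swap(buf_);
    begin_segment();
    return out;
  }

 private:
  void begin_segment() {
    buf_.assign(kSegmentHeaderSize, 0);  // timestamp and length start as 0
    buf_[0] = kHeapDumpSegmentTag;
  }
  std::vector<std::uint8_t> buf_;
};
```

For a 64K buffer, `overhead_percent(64 * 1024)` is 9 / 65536, which stays below the 0.02% bound mentioned above. Because the length is patched before the segment is handed out, the sink can be a socket or a compressing stream.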

Bugreport: https://bugs.openjdk.java.net/browse/JDK-8234510 
Webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8234510/webrev.0/ 

Best regards,
Ralf

From erik.gahlin at oracle.com  Mon Nov 25 16:27:46 2019
From: erik.gahlin at oracle.com (Erik Gahlin)
Date: Mon, 25 Nov 2019 17:27:46 +0100
Subject: 8233197(S): Invert JvmtiExport::post_vm_initialized() and
 Jfr:on_vm_start() start-up order for correct option parsing
In-Reply-To: <62407a3d-f6a2-400b-9311-9ab7e32d85f7@default>
References: <b2bf81c0-80fa-49e4-ac09-8fa6589b1e80@default>
 <1ca7ae34-41fe-fad1-4bd2-57cdf9667bd9@oracle.com>
 <62407a3d-f6a2-400b-9311-9ab7e32d85f7@default>
Message-ID: <8da1145a-5e65-8db8-8a11-9bce1af22233@oracle.com>

Looks good.

Erik

On 2019-11-20 21:54, Markus Gronlund wrote:
>
> "It does not look as a good idea to change the JVMTI phase like above.
>
>   If you need the ONLOAD phase just to enable capabilities then it is
> better to do it in the real ONLOAD phase.
>
>   Do I miss anything important here?
>
>   Please, ask questions if you have any problems with it."
>
> Yes, so the reason for the phase transition is not so much to do with 
> capabilities, but that an agent can only register, i.e. call GetEnv(), 
> in phases JVMTI_PHASE_ONLOAD and JVMTI_PHASE_LIVE.
>
> create_vm_init_agents() is where the (temporary) 
> JVMTI_PHASE_PRIMORDIAL to JVMTI_PHASE_ONLOAD happens during the 
> callouts to Agent_OnLoad(), and then the state is returned to 
> JVMTI_PHASE_PRIMORDIAL. It is hard to find an unconditional hook point 
> there since create_vm_init_agents() is made conditional on 
> Arguments::init_agents_at_startup(), with a listing populated from 
> "real agents" (on command-line).
>
> The JFR JVMTI agent itself is also conditional, installed only if JFR 
> is actively started (i.e. starting a recording). Hence, the phase
> transition mechanism merely replicates the state changes in 
> create_vm_init_agents() to have the agent register properly. This is a 
> moot point now however as I have taken another pass. I now found a way 
> to only have the agent register during the JVMTI_PHASE_LIVE phase, so 
> the phase transition mechanism is not needed.
>
> "The Jfr::on_vm_init() is confusing as there is a mismatch with the 
> JVMTI phases order.
>
>   It feels like it means JFR init event (not VM init) or something
> like this.
>
>   Or maybe it denotes the VM initialization start. :)
>
>   I'll be happy if you could explain it a little bit."
>
> Yes, this is confusing, I agree. Of course, JFR has a tight relation 
> to the JVMTI phases, but only in so far as to coordinate agent 
> registration. The JFR calls are not intended to reflect the JVMTI 
> phases per se but a more general initialization order state 
> description, like you say "VM initialization start and completion". 
> However, it is very hard to encode proper semantics into the JFR calls 
> in Threads::create_vm() to reflect the concepts of "stages"; they are 
> simply not well-defined. In addition, there are so many of them :). For
> example, I always get confused that VM initialization is reflected in
> JVMTI by the VMStart event and the completion by the VMInit event
> (representing VM initialization complete). At the same time, the
> DTRACE macros have both HOTSPOT_VM_INIT_BEGIN() and HOTSPOT_VM_INIT_END()
> placed before both...
>
> I abandoned the attempt to encode anything meaningful into the JFR 
> calls trying to represent a certain "VM initialization stage".
>
> Instead, I will just have syntactic JFR calls reflecting some relative 
> order (on_create_vm_1(), on_create_vm_2(),.. _3()) etc. Looks like 
> there are precedents of this style.
>
> "Not sure, if your agent needs to enable these capabilities
> (introduced in JDK 9 with modules):
>   can_generate_early_vmstart
>   can_generate_early_class_hook_events"
>
> Thanks for the suggestion Serguei, but these capabilities are not yet 
> needed.
>
> Here is the updated webrev: 
> http://cr.openjdk.java.net/~mgronlun/8233197/webrev02/
>
> Thanks again
>
> Markus
>
> *From:*Serguei Spitsyn
> *Sent:* den 20 november 2019 04:10
> *To:* Markus Gronlund <markus.gronlund at oracle.com>; hotspot-jfr-dev 
> <hotspot-jfr-dev at openjdk.java.net>; 
> hotspot-runtime-dev at openjdk.java.net; serviceability-dev at openjdk.java.net
> *Subject:* Re: 8233197(S): Invert JvmtiExport::post_vm_initialized() 
> and Jfr:on_vm_start() start-up order for correct option parsing
>
> Hi Marcus,
>
> It looks good in general.
>
> A couple of comments though.
>
> http://cr.openjdk.java.net/~mgronlun/8233197/webrev01/src/hotspot/share/jfr/instrumentation/jfrJvmtiAgent.cpp.frames.html
>
> class JvmtiPhaseTransition {
>  private:
>   bool _transition;
>  public:
>   JvmtiPhaseTransition() : _transition(JvmtiEnvBase::get_phase() == JVMTI_PHASE_PRIMORDIAL) {
>     if (_transition) {
>       JvmtiEnvBase::set_phase(JVMTI_PHASE_ONLOAD);
>     }
>   }
>   ~JvmtiPhaseTransition() {
>     if (_transition) {
>       assert(JvmtiEnvBase::get_phase() == JVMTI_PHASE_ONLOAD, "invariant");
>       JvmtiEnvBase::set_phase(JVMTI_PHASE_PRIMORDIAL);
>     }
>   }
> };
>
> static bool initialize() {
>   JavaThread* const jt = current_java_thread();
>   assert(jt != NULL, "invariant");
>   assert(jt->thread_state() == _thread_in_vm, "invariant");
>   DEBUG_ONLY(JfrJavaSupport::check_java_thread_in_vm(jt));
>   JvmtiPhaseTransition jvmti_phase_transition;  // <-- highlighted line
>   ThreadToNativeFromVM transition(jt);
>   if (create_jvmti_env(jt) != JNI_OK) {
>     assert(jfr_jvmti_env == NULL, "invariant");
>     return false;
>   }
>   assert(jfr_jvmti_env != NULL, "invariant");
>   if (!register_capabilities(jt)) {
>     return false;
>   }
>   if (!register_callbacks(jt)) {
>     return false;
>   }
>   return update_class_file_load_hook_event(JVMTI_ENABLE);
> }
>
>
> It does not look like a good idea to change the JVMTI phase like above.
> If you need the ONLOAD phase just to enable capabilities then it is
> better to do it in the real ONLOAD phase.
> Am I missing anything important here?
> Please, ask questions if you have any problems with it.
>
> The Jfr::on_vm_init() is confusing as there is a mismatch with the
> JVMTI phases order.
> It feels like it means JFR init event (not VM init) or something like
> this.
> Or maybe it denotes the VM initialization start. :)
> I'll be happy if you could explain it a little bit.
>
> Not sure, if your agent needs to enable these capabilities (introduced
> in JDK 9 with modules):
>   can_generate_early_vmstart
>   can_generate_early_class_hook_events
>
> Thanks,
> Serguei
>
>
> On 11/19/19 06:38, Markus Gronlund wrote:
>
>     Greetings,
>
>     (apologies for the wide distribution)
>
>     Kindly asking for reviews for the following changeset:
>
>     Bug: https://bugs.openjdk.java.net/browse/JDK-8233197
>
>     Webrev: http://cr.openjdk.java.net/~mgronlun/8233197/webrev01/
>
>     Testing: serviceability/jvmti, jdk_jfr, tier1-5
>
>     Summary: please see bug for description.
>
>     For Runtime / Serviceability folks:
>
>     This change slightly modifies the relative order in Threads::create_vm(); please see threads.cpp.
>
>     There is an upcall as part of Jfr::on_vm_start() that delivers global JFR command-line options to Java (only if set).
>
>     The behavioral change amounts to a few classes loaded as part of establishing this upcall (all internal JFR classes and/or java.base classes, loaded by the bootloader) no longer being visible to the ClassFileLoadHooks of agents. These classes are visible to agents that work with "early_start" JVMTI environments however.
>
>     The major part of JFR startup with associated class loading still happens as part of Jfr::on_vm_live() with no behavioral change in relation to agents.
>
>     Thank you
>
>     Markus
>

From harold.seigel at oracle.com  Mon Nov 25 17:13:03 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Mon, 25 Nov 2019 12:13:03 -0500
Subject: RFR 8234656: Improve granularity of verifier logging
Message-ID: <599daef0-fa71-48da-49e5-343e04b6a1f3@oracle.com>

Hi,

Please review this small change to improve the granularity of verifier 
logging. This change provides brief output for log level info and 
detailed logging for log levels debug and trace. Additionally, it 
changes verifier test TraceClassRes.java to use the logging API command 
line options.

Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8234656/webrev/

JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8234656

The fix was regression tested by running Mach5 tiers 1 and 2 tests and 
builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 
tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64.

Thanks, Harold


From patricio.chilano.mateo at oracle.com  Mon Nov 25 18:48:04 2019
From: patricio.chilano.mateo at oracle.com (Patricio Chilano)
Date: Mon, 25 Nov 2019 15:48:04 -0300
Subject: RFR 8234613: JavaThread can escape back to Java from an ongoing
 handshake
In-Reply-To: <61034f63-0843-a531-4da6-fe4064cdb357@oracle.com>
References: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
 <61034f63-0843-a531-4da6-fe4064cdb357@oracle.com>
Message-ID: <0b24ca43-1f36-6435-f63a-f495998abf11@oracle.com>

Hi David,

On 11/25/19 1:46 AM, David Holmes wrote:
> Hi Patricio,
>
> On 23/11/2019 4:25 am, Patricio Chilano wrote:
>> Hi,
>>
>> This patch aims to address a current bug where, given the right 
>> combination of handshakes and external suspend/resume, a JavaThread 
>> can transition from a safe state back to Java without blocking for a 
>> still-in-progress handshake. In the description of the bug I added an 
>> example, tracing the state changes of the JavaThread as it goes 
>> through the different transitions until it escapes the handshake. 
>> Currently, the window of time for this issue to happen is so small 
>> that we do not see actual failures running tests. Running test 
>> SuspendAtExit.java and adding some small delay before restoring the 
>> JavaThread state in java_suspend_self_with_safepoint_check() can 
>> demonstrate the issue.
>
> Good catch. This highlights how difficult it is to see where all the 
> thread-state-transitions are and reason about what can and can't 
> happen in a given sequence of code.
>
>> The proposed fix is to check again if we have a pending/in-progress 
>> handshake operation after executing ~ThreadInVMForHandshake().
>
> Minor nit but given we end up calling process_self_inner only after it 
> was determined the current _handshake has_operation(), then the while 
> loop should really be a do-while loop ?
Right, changed!
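
To illustrate the shape of that change, here is a minimal, self-contained sketch. It is not the code from the webrev: FakeHandshakeState and process_self_inner_sketch are made-up stand-ins, and only the do-while structure is the point. Since the caller has already established has_operation(), the body runs once before the condition is checked again:

```cpp
// Hypothetical stand-in for the per-thread handshake state; not the HotSpot class.
struct FakeHandshakeState {
  int pending;    // operations still waiting to be processed
  int processed;  // operations handled so far

  explicit FakeHandshakeState(int n) : pending(n), processed(0) {}

  bool has_operation() const { return pending > 0; }
  void process_one() { --pending; ++processed; }
};

// Precondition (checked by the caller): s.has_operation() is true.
// Returns how many times the loop body ran.
inline int process_self_inner_sketch(FakeHandshakeState& s) {
  int iterations = 0;
  do {
    s.process_one();
    ++iterations;
    // Re-check afterwards: another operation may have been installed
    // while this one was being processed.
  } while (s.has_operation());
  return iterations;
}
```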

Here is v2:
http://cr.openjdk.java.net/~pchilanomate/8234613/v02/webrev/ 
<http://cr.openjdk.java.net/~pchilanomate/8234613/v02/webrev/src/hotspot/share/runtime/handshake.cpp.udiff.html>

Thanks for looking at this David!

Patricio
> Thanks,
> David
> -----
>
>> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows 
>> and Solaris).
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/
>>
>> Thanks,
>> Patricio


From ioi.lam at oracle.com  Mon Nov 25 18:50:24 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Mon, 25 Nov 2019 10:50:24 -0800
Subject: RFR: 8234735: InstanceKlass:find_method_index regression after
 JDK-8231610
In-Reply-To: <10b66da5-1cc8-16bf-0d9b-f1e51665e191@oracle.com>
References: <10b66da5-1cc8-16bf-0d9b-f1e51665e191@oracle.com>
Message-ID: <994ada01-e1ee-e7bd-6a3f-9c1a8136684f@oracle.com>

Hi Claes,

This change looks good to me.

Thanks!
- Ioi

On 11/25/19 5:51 AM, Claes Redestad wrote:
> Hi,
>
> JDK-8231610 refactored InstanceKlass::binary_search to ::quick_search,
> which now has a linear slow-path version in case the VM is currently
> dumping the dynamic CDS archive. This causes the methods to grow enough
> that ::quick_search is not inlined into find_method_index, which causes a
> local regression.
>
> Since the linear search is not as performance sensitive as the default
> binary search, outlining the slow-path to a different method makes
> sense. Tests show that this allows the performance critical path to be
> completely inlined again, recuperating about half of the performance
> regression (it still adds a branch to the fast path, compared to
> pre-8231610, to check whether we should do a linear search).
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234735
> Webrev: http://cr.openjdk.java.net/~redestad/8234735/open.00/
>
> Testing: tier1-3
>
> Thanks!
>
> /Claes


From ioi.lam at oracle.com  Mon Nov 25 18:57:12 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Mon, 25 Nov 2019 10:57:12 -0800
Subject: RFR 8234656: Improve granularity of verifier logging
In-Reply-To: <599daef0-fa71-48da-49e5-343e04b6a1f3@oracle.com>
References: <599daef0-fa71-48da-49e5-343e04b6a1f3@oracle.com>
Message-ID: <129f73f6-0d9b-ed5e-a24a-f95339f6a496@oracle.com>

Hi Harold,

The change looks good to me.

Thanks
- Ioi

On 11/25/19 9:13 AM, Harold Seigel wrote:
> Hi,
>
> Please review this small change to improve the granularity of verifier 
> logging. This change provides brief output for log level info and 
> detailed logging for log levels debug and trace. Additionally, it 
> changes verifier test TraceClassRes.java to use the logging API 
> command line options.
>
> Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8234656/webrev/
>
> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8234656
>
> The fix was regression tested by running Mach5 tiers 1 and 2 tests and 
> builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 
> tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64.
>
> Thanks, Harold
>


From patricio.chilano.mateo at oracle.com  Mon Nov 25 18:58:50 2019
From: patricio.chilano.mateo at oracle.com (Patricio Chilano)
Date: Mon, 25 Nov 2019 15:58:50 -0300
Subject: RFR 8234613: JavaThread can escape back to Java from an ongoing
 handshake
In-Reply-To: <1333677e-23ff-17fb-87df-055220efe850@oracle.com>
References: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
 <1333677e-23ff-17fb-87df-055220efe850@oracle.com>
Message-ID: <28ff192e-19b2-6380-30ba-e406cdf69a0b@oracle.com>

Hi Robbin,

On 11/25/19 4:14 AM, Robbin Ehn wrote:
> Hi Patricio,
>
> On 2019-11-22 19:25, Patricio Chilano wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/
>
> Thanks, I think this is good and easy to backport!
> You might as well add native to the assert.
Added!

> We should revisit this when we have time.
> There are two polls and four transitions in this code, which is more
> complicated than I like.
Agree, I think it could be simplified.

Here is v2:
http://cr.openjdk.java.net/~pchilanomate/8234613/v02/webrev/ 
<http://cr.openjdk.java.net/~pchilanomate/8234613/v02/webrev/src/hotspot/share/runtime/handshake.cpp.udiff.html>

Thanks for looking at this Robbin!

Patricio
> /Robbin
>
>>
>> Thanks,
>> Patricio


From harold.seigel at oracle.com  Mon Nov 25 19:04:44 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Mon, 25 Nov 2019 14:04:44 -0500
Subject: RFR 8234656: Improve granularity of verifier logging
In-Reply-To: <129f73f6-0d9b-ed5e-a24a-f95339f6a496@oracle.com>
References: <599daef0-fa71-48da-49e5-343e04b6a1f3@oracle.com>
 <129f73f6-0d9b-ed5e-a24a-f95339f6a496@oracle.com>
Message-ID: <bf992b3b-dff6-9037-a6b0-b4a1bdf77159@oracle.com>

Thanks Ioi!

Harold

On 11/25/2019 1:57 PM, Ioi Lam wrote:
> Hi Harold,
>
> The change looks good to me.
>
> Thanks
> - Ioi
>
> On 11/25/19 9:13 AM, Harold Seigel wrote:
>> Hi,
>>
>> Please review this small change to improve the granularity of 
>> verifier logging. This change provides brief output for log level 
>> info and detailed logging for log levels debug and trace. 
>> Additionally, it changes verifier test TraceClassRes.java to use the 
>> logging API command line options.
>>
>> Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8234656/webrev/
>>
>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8234656
>>
>> The fix was regression tested by running Mach5 tiers 1 and 2 tests 
>> and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running 
>> Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on 
>> Linux-x64.
>>
>> Thanks, Harold
>>
>

From lois.foltan at oracle.com  Mon Nov 25 19:35:14 2019
From: lois.foltan at oracle.com (Lois Foltan)
Date: Mon, 25 Nov 2019 14:35:14 -0500
Subject: RFR 8234656: Improve granularity of verifier logging
In-Reply-To: <599daef0-fa71-48da-49e5-343e04b6a1f3@oracle.com>
References: <599daef0-fa71-48da-49e5-343e04b6a1f3@oracle.com>
Message-ID: <0099f421-5998-a796-5046-037e394ae344@oracle.com>

Looks good Harold!
Lois

On 11/25/2019 12:13 PM, Harold Seigel wrote:
> Hi,
>
> Please review this small change to improve the granularity of verifier 
> logging. This change provides brief output for log level info and 
> detailed logging for log levels debug and trace. Additionally, it 
> changes verifier test TraceClassRes.java to use the logging API 
> command line options.
>
> Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8234656/webrev/
>
> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8234656
>
> The fix was regression tested by running Mach5 tiers 1 and 2 tests and 
> builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 
> tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64.
>
> Thanks, Harold
>


From harold.seigel at oracle.com  Mon Nov 25 19:38:47 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Mon, 25 Nov 2019 14:38:47 -0500
Subject: RFR 8234656: Improve granularity of verifier logging
In-Reply-To: <0099f421-5998-a796-5046-037e394ae344@oracle.com>
References: <599daef0-fa71-48da-49e5-343e04b6a1f3@oracle.com>
 <0099f421-5998-a796-5046-037e394ae344@oracle.com>
Message-ID: <8a7a21cb-8cc7-7473-cc9f-0186d27ca124@oracle.com>

Thanks Lois!

Harold

On 11/25/2019 2:35 PM, Lois Foltan wrote:
> Looks good Harold!
> Lois
>
> On 11/25/2019 12:13 PM, Harold Seigel wrote:
>> Hi,
>>
>> Please review this small change to improve the granularity of 
>> verifier logging. This change provides brief output for log level 
>> info and detailed logging for log levels debug and trace. 
>> Additionally, it changes verifier test TraceClassRes.java to use the 
>> logging API command line options.
>>
>> Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8234656/webrev/
>>
>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8234656
>>
>> The fix was regression tested by running Mach5 tiers 1 and 2 tests 
>> and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running 
>> Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on 
>> Linux-x64.
>>
>> Thanks, Harold
>>
>

From thomas.stuefe at gmail.com  Mon Nov 25 20:54:48 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Mon, 25 Nov 2019 21:54:48 +0100
Subject: RFR 8234270: [REDO] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <64906b40-5040-df49-2c77-19f88f64a16c@redhat.com>
References: <7b82c213-35e9-1aef-c3c4-06fa8dec0d13@redhat.com>
 <CAOEheN7K+mL96yyTb9nZ+zW+R0yUpuzHWq3bCcAeLpyTmjK06Q@mail.gmail.com>
 <64906b40-5040-df49-2c77-19f88f64a16c@redhat.com>
Message-ID: <CAA-vtUw5VMStBAQwA2tvsyyCPgz48g5s3hEDetebSNfMCBgnUA@mail.gmail.com>

Hi Zhengyu,

http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/src/hotspot/share/memory/arena.cpp.udiff.html

not sure I understand this change - why is it needed?:

@@ -360,11 +360,12 @@
   }
   if (k) k->set_next(_chunk);   // Append new chunk to end of linked list
   else _first = _chunk;
   _hwm  = _chunk->bottom();     // Save the cached hwm, max
   _max =  _chunk->top();
-  set_size_in_bytes(size_in_bytes() + len);
+  size_t new_size = size_in_bytes() + _chunk->length();
+  set_size_in_bytes(new_size);
   void* result = _hwm;
   _hwm += x;
   return result;
 }

http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/src/hotspot/share/services/mallocTracker.hpp.udiff.html

-  inline void resize(long sz) {
+  inline void resize(ssize_t sz) {
     if (sz != 0) {
+      assert(sz >= 0 || _size >= size_t(-sz), "Must be");
       Atomic::add(size_t(sz), &_size);
       DEBUG_ONLY(_peak_size = MAX2(_size, _peak_size);)
     }
   }

The assert looks fine, but the Atomic::add() took me by surprise: when the
size is reduced, we knowingly feed it a negative number that we then convert
to unsigned, relying on the wraparound? Just a nit, but would Atomic::sub()
not be clearer here?
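
To illustrate what I mean, here is a standalone sketch. It uses std::atomic and std::ptrdiff_t as stand-ins for HotSpot's Atomic and ssize_t, and the function names are hypothetical. Both variants end up with the same value, but the first one only works because unsigned arithmetic is modular:

```cpp
#include <atomic>
#include <cstddef>

// Mirrors the style of the quoted resize(): a negative delta is converted to
// a huge unsigned value, and the addition wraps around to the right result.
inline std::size_t resize_by_add(std::atomic<std::size_t>& size, std::ptrdiff_t delta) {
  size.fetch_add(static_cast<std::size_t>(delta));
  return size.load();
}

// The more explicit alternative: subtract the magnitude when shrinking.
// Assumes delta is not the most negative representable value.
inline std::size_t resize_by_sub(std::atomic<std::size_t>& size, std::ptrdiff_t delta) {
  if (delta >= 0) {
    size.fetch_add(static_cast<std::size_t>(delta));
  } else {
    size.fetch_sub(static_cast<std::size_t>(-delta));
  }
  return size.load();
}
```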

http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/test/hotspot/jtreg/runtime/NMT/HugeArenaTracking.java.html

The test is fine as it is. Some thoughts (feel free to ignore them):

- would be more interesting to shake the boat a little by varying increase
rate, e.g.:

  60     // Allocate 2GB+ from arena
  61     long total = 0;
  62     while (total < 2 * GB) {
           long increase = <random number between some bytes and, I don't know, 100M? Capped at 2 GB>;
  63       wb.NMTArenaMalloc(arena1, increase);
  64       total += increase;
  65     }

and maybe test jcmd VM.native_memory before this point? I know it's annoying
since the numbers are probably not exact, and parsing would be necessary.
Just an idea.

  66     wb.NMTFreeArena(arena1);

On Mon, Nov 25, 2019 at 2:30 PM Zhengyu Gu <zgu at redhat.com> wrote:

> Ping ... May I get a second review?
>
> Thanks,
>
> -Zhengyu
>
> On 11/21/19 12:12 PM, yumin qi wrote:
> > Hi, Zhengyu
> >
> >    The fix looks good to me.
> >
> > Thanks
> > Yumin
> >
> >
> >
> > On Wed, Nov 20, 2019 at 5:49 AM Zhengyu Gu <zgu at redhat.com
> > <mailto:zgu at redhat.com>> wrote:
> >
> >     JDK-8204128 did not fix the original bug. But the new assertion helped
> >     to catch the problem, as it consistently failed in Oracle internal
> >     tests.
> >
> >     The root cause is that, when NMT biases a resource area to compiler,
> >     it did not adjust tracking data to reflect that. When the biased
> >     resource area is released, there is a possibility that its size is
> >     greater than the total size recorded, which underflows a size_t
> >     counter.
> >
> >     The JDK-8204128 patch also missed a long to ssize_t parameter type
> >     change, which resulted in a new test failure on Windows, because
> >     long is 4 bytes on Windows.
> >
> >     Many thanks to Leonid Mesnik, who helped to run this patch through
> >     Oracle's internal stress tests.
> >
> >     Bug: https://bugs.openjdk.java.net/browse/JDK-8234270
> >     Webrev: http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/index.html
> >
> >
> >     Test:
> >         hotspot_nmt
> >         Submit test
> >         Oracle internal stress tests.
> >
> >
> >     Thanks,
> >
> >     -Zhengyu
> >
>
>

From david.holmes at oracle.com  Mon Nov 25 22:16:22 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 26 Nov 2019 08:16:22 +1000
Subject: RFR 8234613: JavaThread can escape back to Java from an ongoing
 handshake
In-Reply-To: <0b24ca43-1f36-6435-f63a-f495998abf11@oracle.com>
References: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
 <61034f63-0843-a531-4da6-fe4064cdb357@oracle.com>
 <0b24ca43-1f36-6435-f63a-f495998abf11@oracle.com>
Message-ID: <97a6fb3b-e572-fad3-a4ec-75548d1cabf6@oracle.com>

Hi Patricio,

v2 looks good!

Thanks,
David

On 26/11/2019 4:48 am, Patricio Chilano wrote:
> Hi David,
> 
> On 11/25/19 1:46 AM, David Holmes wrote:
>> Hi Patricio,
>>
>> On 23/11/2019 4:25 am, Patricio Chilano wrote:
>>> Hi,
>>>
>>> This patch aims to address a current bug where, given the right 
>>> combination of handshakes and external suspend/resume, a JavaThread 
>>> can transition from a safe state back to Java without blocking for a 
>>> still-in-progress handshake. In the description of the bug I added an 
>>> example, tracing the state changes of the JavaThread as it goes 
>>> through the different transitions until it escapes the handshake. 
>>> Currently, the window of time for this issue to happen is so small 
>>> that we do not see actual failures running tests. Running test 
>>> SuspendAtExit.java and adding some small delay before restoring the 
>>> JavaThread state in java_suspend_self_with_safepoint_check() can 
>>> demonstrate the issue.
>>
>> Good catch. This highlights how difficult it is to see where all the 
>> thread-state-transitions are and reason about what can and can't 
>> happen in a given sequence of code.
>>
>>> The proposed fix is to check again if we have a pending/in-progress 
>>> handshake operation after executing ~ThreadInVMForHandshake().
>>
>> Minor nit but given we end up calling process_self_inner only after it 
>> was determined the current _handshake has_operation(), then the while 
>> loop should really be a do-while loop ?
> Right, changed!
> 
> Here is v2:
> http://cr.openjdk.java.net/~pchilanomate/8234613/v02/webrev/ 
> <http://cr.openjdk.java.net/~pchilanomate/8234613/v02/webrev/src/hotspot/share/runtime/handshake.cpp.udiff.html>
> 
> Thanks for looking at this David!
> 
> Patricio
>> Thanks,
>> David
>> -----
>>
>>> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows 
>>> and Solaris).
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/
>>>
>>> Thanks,
>>> Patricio
> 

From david.holmes at oracle.com  Mon Nov 25 22:30:25 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 26 Nov 2019 08:30:25 +1000
Subject: RFR 8234656: Improve granularity of verifier logging
In-Reply-To: <599daef0-fa71-48da-49e5-343e04b6a1f3@oracle.com>
References: <599daef0-fa71-48da-49e5-343e04b6a1f3@oracle.com>
Message-ID: <e4b4bb6e-09d5-cec4-e9fa-62d939dd12ed@oracle.com>

Hi Harold,

On 26/11/2019 3:13 am, Harold Seigel wrote:
> Hi,
> 
> Please review this small change to improve the granularity of verifier 
> logging. This change provides brief output for log level info and 
> detailed logging for log levels debug and trace. Additionally, it 
> changes verifier test TraceClassRes.java to use the logging API command 
> line options.

Deciding what to log at what level is highly subjective :) This change 
seems okay though as anyone who wants the current output can enable 
"debug" logging for verification and won't then get a tonne of other 
stuff they didn't want.

The new test functionality could be added to the existing:

./hotspot/jtreg/runtime/logging/VerificationTest.java

Thanks,
David
-----

> Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8234656/webrev/
> 
> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8234656
> 
> The fix was regression tested by running Mach5 tiers 1 and 2 tests and 
> builds on Linux-x64, Solaris, Windows, and Mac OS X, by running Mach5 
> tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on Linux-x64.
> 
> Thanks, Harold
> 

From calvin.cheung at oracle.com  Mon Nov 25 23:22:51 2019
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Mon, 25 Nov 2019 15:22:51 -0800
Subject: RFR(XS) 8234539 ArchiveRelocationTest.java failed: Archive
 mapping should always succeed
In-Reply-To: <24e62e72-bc2f-3045-45d8-a778271d6c2d@oracle.com>
References: <719e7512-cf84-072c-3ecf-9181f4c495dd@oracle.com>
 <698fac17-b84e-a5ab-44c2-ccfb08bbfe27@oracle.com>
 <24e62e72-bc2f-3045-45d8-a778271d6c2d@oracle.com>
Message-ID: <a1132ee2-0fac-46e2-3172-4dd23027cde6@oracle.com>

Hi Ioi,

This seems good.

Just wondering are the following 'if' checks necessary in 
metaspaceShared.cpp?

      if (static_result == MAP_ARCHIVE_SUCCESS) {
        static_result = MAP_ARCHIVE_MMAP_FAILURE;
      }
      if (dynamic_result == MAP_ARCHIVE_SUCCESS) {
        dynamic_result = MAP_ARCHIVE_MMAP_FAILURE;
      }

The checks weren't there in filemap.cpp. Also, the caller won't try 
map_archives() again if the result is not MAP_ARCHIVE_MMAP_FAILURE.

thanks,

Calvin

On 11/22/19 5:46 PM, Ioi Lam wrote:
> Hi Calvin,
>
> Thanks for the review. It turned out that I needed to fix another 
> (addr_delta == 0) bug in the code. I've also moved the handling of 
> ArchiveRelocationMode==1 in debug builds to 
> MetaspaceShared::map_archives(). This way, we can simulate the 
> "mapping failure" after all archives have been mapped, so we
> can better test the code that unmaps the archives after the initial
> mapping failures.
>
> Here's the updated patch.
> http://cr.openjdk.java.net/~iklam/jdk14/8234539-mapping-should-always-succeed.v02/ 
>
>
> I am running tier4-rt-cds-relocation multiple times to make sure 
> 8234539 is no longer triggered on Windows.
>
> Thanks
> - Ioi
>
> On 11/22/2019 11:23 AM, Calvin Cheung wrote:
>> Hi Ioi,
>>
>> The fix looks good.
>>
>> thanks,
>>
>> Calvin
>>
>> On 11/21/19 2:58 PM, Ioi Lam wrote:
>>> https://bugs.openjdk.java.net/browse/JDK-8234539
>>> http://cr.openjdk.java.net/~iklam/jdk14/8234539-mapping-should-always-succeed.v01/ 
>>>
>>>
>>> This bug happens only on Windows. The fix is one-line -- in order to 
>>> check
>>> whether "This is the second time we try to map the archive(s)", 
>>> instead of
>>> using (addr_delta != 0), the correct condition is 
>>> (rs.is_reserved()). Please
>>> see the bug report for details.
>>>
>>> I also improved the log messages for when an error happens.
>>>
>>> Thanks
>>> - Ioi
>

From ioi.lam at oracle.com  Mon Nov 25 23:43:44 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Mon, 25 Nov 2019 15:43:44 -0800
Subject: RFR(XS) 8234539 ArchiveRelocationTest.java failed: Archive
 mapping should always succeed
In-Reply-To: <a1132ee2-0fac-46e2-3172-4dd23027cde6@oracle.com>
References: <719e7512-cf84-072c-3ecf-9181f4c495dd@oracle.com>
 <698fac17-b84e-a5ab-44c2-ccfb08bbfe27@oracle.com>
 <24e62e72-bc2f-3045-45d8-a778271d6c2d@oracle.com>
 <a1132ee2-0fac-46e2-3172-4dd23027cde6@oracle.com>
Message-ID: <631c5088-35f9-cfd8-4e4a-19683ba639d0@oracle.com>

Hi Calvin,

Thanks for the review.

On 11/25/19 3:22 PM, Calvin Cheung wrote:
> Hi Ioi,
>
> This seems good.
>
> Just wondering are the following 'if' checks necessary in 
> metaspaceShared.cpp?
>
>       if (static_result == MAP_ARCHIVE_SUCCESS) {
>         static_result = MAP_ARCHIVE_MMAP_FAILURE;
>       }
>       if (dynamic_result == MAP_ARCHIVE_SUCCESS) {
>         dynamic_result = MAP_ARCHIVE_MMAP_FAILURE;
>       }
>

The check for (static_result == MAP_ARCHIVE_SUCCESS) is to make sure we 
aren't in the MAP_ARCHIVE_OTHER_FAILURE state, which could happen if 
the archive CRC check failed, classpath validation failed, etc.


> The checks weren't there in filemap.cpp. 

The old code (removed by this patch) was at a point where no error 
had happened yet, so we were implicitly in the MAP_ARCHIVE_SUCCESS state.

> Also, the caller won't try map_archives() again if the result is not 
> MAP_ARCHIVE_MMAP_FAILURE.

That's the intended behavior. If CRC check has failed, for example, even 
if we retry mapping, we will get the same failure again.
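
Roughly, the logic can be sketched as follows. This is a simplified standalone illustration, not the actual metaspaceShared.cpp code: the enum values are the ones discussed above, while simulate_mmap_failure and should_retry_mapping are hypothetical helper names:

```cpp
enum MapArchiveResult {
  MAP_ARCHIVE_SUCCESS,
  MAP_ARCHIVE_MMAP_FAILURE,
  MAP_ARCHIVE_OTHER_FAILURE
};

// Simulated mapping failure (debug builds): only a successful result is
// downgraded, so an earlier OTHER_FAILURE (e.g. bad CRC) is never hidden.
inline MapArchiveResult simulate_mmap_failure(MapArchiveResult result) {
  return (result == MAP_ARCHIVE_SUCCESS) ? MAP_ARCHIVE_MMAP_FAILURE : result;
}

// The caller retries only when a different mapping address could help.
// A CRC or classpath failure would just fail again in the same way.
inline bool should_retry_mapping(MapArchiveResult result) {
  return result == MAP_ARCHIVE_MMAP_FAILURE;
}
```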

Thanks
- Ioi

>
> thanks,
>
> Calvin
>
> On 11/22/19 5:46 PM, Ioi Lam wrote:
>> Hi Calvin,
>>
>> Thanks for the review. It turned out that I needed to fix another 
>> (addr_delta == 0) bug in the code. I've also moved the handling of 
>> ArchiveRelocationMode==1 in debug builds to 
>> MetaspaceShared::map_archives(). This way, we can simulate the 
>> "mapping failure" after all archives have been mapped, so we
>> can better test the code that unmaps the archives after the initial
>> mapping failures.
>>
>> Here's the updated patch.
>> http://cr.openjdk.java.net/~iklam/jdk14/8234539-mapping-should-always-succeed.v02/ 
>>
>>
>> I am running tier4-rt-cds-relocation multiple times to make sure 
>> 8234539 is no longer triggered on Windows.
>>
>> Thanks
>> - Ioi
>>
>> On 11/22/2019 11:23 AM, Calvin Cheung wrote:
>>> Hi Ioi,
>>>
>>> The fix looks good.
>>>
>>> thanks,
>>>
>>> Calvin
>>>
>>> On 11/21/19 2:58 PM, Ioi Lam wrote:
>>>> https://bugs.openjdk.java.net/browse/JDK-8234539
>>>> http://cr.openjdk.java.net/~iklam/jdk14/8234539-mapping-should-always-succeed.v01/ 
>>>>
>>>>
>>>> This bug happens only on Windows. The fix is one-line -- in order 
>>>> to check
>>>> whether "This is the second time we try to map the archive(s)", 
>>>> instead of
>>>> using (addr_delta != 0), the correct condition is 
>>>> (rs.is_reserved()). Please
>>>> see the bug report for details.
>>>>
>>>> I also improved the log messages for when an error happens.
>>>>
>>>> Thanks
>>>> - Ioi
>>


From patricio.chilano.mateo at oracle.com  Tue Nov 26 00:53:30 2019
From: patricio.chilano.mateo at oracle.com (Patricio Chilano)
Date: Mon, 25 Nov 2019 21:53:30 -0300
Subject: RFR 8234613: JavaThread can escape back to Java from an ongoing
 handshake
In-Reply-To: <97a6fb3b-e572-fad3-a4ec-75548d1cabf6@oracle.com>
References: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
 <61034f63-0843-a531-4da6-fe4064cdb357@oracle.com>
 <0b24ca43-1f36-6435-f63a-f495998abf11@oracle.com>
 <97a6fb3b-e572-fad3-a4ec-75548d1cabf6@oracle.com>
Message-ID: <f5e79edc-b4e1-813f-1c17-3c646cc4f233@oracle.com>

Thanks David!

Patricio
On 11/25/19 5:16 PM, David Holmes wrote:
> Hi Patricio,
>
> v2 looks good!
>
> Thanks,
> David
>
> On 26/11/2019 4:48 am, Patricio Chilano wrote:
>> Hi David,
>>
>> On 11/25/19 1:46 AM, David Holmes wrote:
>>> Hi Patricio,
>>>
>>> On 23/11/2019 4:25 am, Patricio Chilano wrote:
>>>> Hi,
>>>>
>>>> This patch aims to address a current bug where, given the right 
>>>> combination of handshakes and external suspend/resume, a JavaThread 
>>>> can transition from a safe state back to Java without blocking for 
>>>> a still-in-progress handshake. In the description of the bug I 
>>>> added an example, tracing the state changes of the JavaThread as it 
>>>> goes through the different transitions until it escapes the 
>>>> handshake. Currently, the window of time for this issue to happen 
>>>> is so small that we do not see actual failures running tests. 
>>>> Running test SuspendAtExit.java and adding some small delay before 
>>>> restoring the JavaThread state in 
>>>> java_suspend_self_with_safepoint_check() can demonstrate the issue.
>>>
>>> Good catch. This highlights how difficult it is to see where all the 
>>> thread-state-transitions are and reason about what can and can't 
>>> happen in a given sequence of code.
>>>
>>>> The proposed fix is to check again if we have a pending/in-progress 
>>>> handshake operation after executing ~ThreadInVMForHandshake().
>>>
>>> Minor nit but given we end up calling process_self_inner only after 
>>> it was determined the current _handshake has_operation(), then the 
>>> while loop should really be a do-while loop ?
>> Right, changed!
>>
>> Here is v2:
>> http://cr.openjdk.java.net/~pchilanomate/8234613/v02/webrev/ 
>> <http://cr.openjdk.java.net/~pchilanomate/8234613/v02/webrev/src/hotspot/share/runtime/handshake.cpp.udiff.html>
>>
>> Thanks for looking at this David!
>>
>> Patricio
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows 
>>>> and Solaris).
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
>>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/
>>>>
>>>> Thanks,
>>>> Patricio
>>


From zgu at redhat.com  Tue Nov 26 01:20:24 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 25 Nov 2019 20:20:24 -0500
Subject: RFR 8234270: [REDO] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <CAA-vtUw5VMStBAQwA2tvsyyCPgz48g5s3hEDetebSNfMCBgnUA@mail.gmail.com>
References: <7b82c213-35e9-1aef-c3c4-06fa8dec0d13@redhat.com>
 <CAOEheN7K+mL96yyTb9nZ+zW+R0yUpuzHWq3bCcAeLpyTmjK06Q@mail.gmail.com>
 <64906b40-5040-df49-2c77-19f88f64a16c@redhat.com>
 <CAA-vtUw5VMStBAQwA2tvsyyCPgz48g5s3hEDetebSNfMCBgnUA@mail.gmail.com>
Message-ID: <c095dc25-faee-f605-b1f1-ebc8ec3cb7ff@redhat.com>

Hi Thomas,

Thanks for reviewing.

On 11/25/19 3:54 PM, Thomas Stüfe wrote:
> Hi Zhengyu,
> 
> Hi Zhengyu,
> 
> http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/src/hotspot/share/memory/arena.cpp.udiff.html
> 
> not sure I understand this change - why is it needed?:
> 
> @@ -360,11 +360,12 @@
>     }
>     if (k) k->set_next(_chunk);   // Append new chunk to end of linked list
>     else _first = _chunk;
>     _hwm  = _chunk->bottom();     // Save the cached hwm, max
>     _max =  _chunk->top();
> -  set_size_in_bytes(size_in_bytes() + len);
> +  size_t new_size = size_in_bytes() + _chunk->length();
> +  set_size_in_bytes(new_size);
>     void* result = _hwm;
>     _hwm += x;
>     return result;
>   }

Right, it is not needed. Removed.

> 
> http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/src/hotspot/share/services/mallocTracker.hpp.udiff.html
> 
> -  inline void resize(long sz) {
> +  inline void resize(ssize_t sz) {
>       if (sz != 0) {
> +      assert(sz >= 0 || _size >= size_t(-sz), "Must be");
>         Atomic::add(size_t(sz), &_size);
>         DEBUG_ONLY(_peak_size = MAX2(_size, _peak_size);)
>       }
>     }
> 
> assert looks fine but the Atomic::add() took me by surprise: when size
> is reduced, we knowingly feed it a negative number that we then convert
> to unsigned and rely on the overflow? Just a nit, but would
> Atomic::sub() not be clearer here?

Because Atomic::sub() is implemented with Atomic::add()

http://hg.openjdk.java.net/jdk/jdk/file/f34ad283fcd6/src/hotspot/share/runtime/atomic.hpp#l560


Atomic::sub(-sz, &_size) => Atomic::add(-(-sz), &_size) => 
Atomic::add(size_t(sz), &_size)
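
To illustrate why adding size_t(sz) with a negative sz behaves as a
subtraction, here is a minimal standalone sketch. It is not HotSpot code:
ToyCounter is a hypothetical class, std::atomic stands in for Atomic::add,
and std::ptrdiff_t stands in for ssize_t.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the NMT size counter; not HotSpot code.
class ToyCounter {
  std::atomic<std::size_t> _size{0};

 public:
  // delta may be negative; std::ptrdiff_t stands in for ssize_t here.
  void resize(std::ptrdiff_t delta) {
    if (delta != 0) {
      // Same idea as the assert in the patch: a shrink must not exceed
      // the currently recorded size.
      assert(delta >= 0 || _size.load() >= static_cast<std::size_t>(-delta));
      // Converting a negative delta to size_t yields 2^N + delta, so the
      // modular addition below subtracts |delta| from the counter.
      _size.fetch_add(static_cast<std::size_t>(delta));
    }
  }

  std::size_t size() const { return _size.load(); }
};
```

Growing such a counter by 100 and then resizing by -30 leaves it at 70,
because unsigned arithmetic is defined to wrap modulo 2^N.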

> 
> http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/test/hotspot/jtreg/runtime/NMT/HugeArenaTracking.java.html
> 
> The test is fine as it is. Some thoughts (feel free to ignore them):
> 
> - would be more interesting to shake the boat a little by varying 
> increase rate, e.g.:
> 
>    60     // Allocate 2GB+ from arena
>    61     long total = 0;
>    62     while (total < 2 * GB) {
>             long increase = <random_number_between_some_bytes_and_I dont
> know, 100M? Capped at 2 GB>
>    63       wb.NMTArenaMalloc(arena1, increase);
>    64       total += increase;
>    65     }
> 
> and maybe test jcmd VM.native_memory before this point? I know it's
> annoying since the numbers are probably not exact, and parsing would be
> necessary. Just an idea.

Right, we certainly can.

Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.01/

New patch also passed submit test:
[Mach5] mach5-one-zgu-JDK-8234270-2-20191126-0012-6997789: PASSED


-Zhengyu


> 
>    66     wb.NMTFreeArena(arena1);
> 
> On Mon, Nov 25, 2019 at 2:30 PM Zhengyu Gu <zgu at redhat.com> wrote:
> 
>     Ping ... May I get a second review?
> 
>     Thanks,
> 
>     -Zhengyu
> 
>     On 11/21/19 12:12 PM, yumin qi wrote:
>      > Hi, Zhengyu
>      >
>      >    The fix looks good to me.
>      >
>      > Thanks
>      > Yumin
>      >
>      >
>      >
>      > On Wed, Nov 20, 2019 at 5:49 AM Zhengyu Gu <zgu at redhat.com> wrote:
>      >
>      >     JDK-8204128 did not fix the original bug. But new assertion helped to
>      >     catch the problem, as it consistently failed in Oracle internal tests.
>      >
>      >     The root cause is that, when NMT biases a resource area to compiler, it
>      >     did not adjust tracking data to reflect that. When the biased resource
>      >     area is released, there is a possibility that its size is greater than
>      >     total size recorded, and underflow a size_t counter.
>      >
>      >     JDK-8204128 patch also missed a long to ssize_t parameter type change,
>      >     that resulted new test failure on Windows, because long is 4-bytes on
>      >     Windows.
>      >
>      >     Many thanks to Leonid Mesnik, who helped to run this patch through
>      >     Oracle's internal stress tests.
>      >
>      >     Bug: https://bugs.openjdk.java.net/browse/JDK-8234270
>      >     Webrev: http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/index.html
>      >
>      >
>      >     Test:
>      >         hotspot_nmt
>      >         Submit test
>      >         Oracle internal stress tests.
>      >
>      >
>      >     Thanks,
>      >
>      >     -Zhengyu
>      >
> 


From thomas.stuefe at gmail.com  Tue Nov 26 08:43:23 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Tue, 26 Nov 2019 09:43:23 +0100
Subject: RFR 8234270: [REDO] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <c095dc25-faee-f605-b1f1-ebc8ec3cb7ff@redhat.com>
References: <7b82c213-35e9-1aef-c3c4-06fa8dec0d13@redhat.com>
 <CAOEheN7K+mL96yyTb9nZ+zW+R0yUpuzHWq3bCcAeLpyTmjK06Q@mail.gmail.com>
 <64906b40-5040-df49-2c77-19f88f64a16c@redhat.com>
 <CAA-vtUw5VMStBAQwA2tvsyyCPgz48g5s3hEDetebSNfMCBgnUA@mail.gmail.com>
 <c095dc25-faee-f605-b1f1-ebc8ec3cb7ff@redhat.com>
Message-ID: <CAA-vtUyyezqx+X4yq4zLA7KDKNrQYnTOdvKUBFt3NSv=arZfKg@mail.gmail.com>

Hi Zhengyu,


On Tue, Nov 26, 2019 at 2:20 AM Zhengyu Gu <zgu at redhat.com> wrote:

> Hi Thomas,
>
> Thanks for reviewing.
>
> On 11/25/19 3:54 PM, Thomas Stüfe wrote:
> > Hi Zhengyu,
> >
> > Hi Zhengyu,
> >
> >
> http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/src/hotspot/share/memory/arena.cpp.udiff.html
> >
> > not sure I understand this change - why is it needed?:
> >
> > @@ -360,11 +360,12 @@
> >     }
> >     if (k) k->set_next(_chunk);   // Append new chunk to end of linked
> list
> >     else _first = _chunk;
> >     _hwm  = _chunk->bottom();     // Save the cached hwm, max
> >     _max =  _chunk->top();
> > -  set_size_in_bytes(size_in_bytes() + len);
> > +  size_t new_size = size_in_bytes() + _chunk->length();
> > +  set_size_in_bytes(new_size);
> >     void* result = _hwm;
> >     _hwm += x;
> >     return result;
> >   }
>
> Right, it is not needed. Removed.
>
> >
> >
> http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/src/hotspot/share/services/mallocTracker.hpp.udiff.html
> >
> > -  inline void resize(long sz) {
> > +  inline void resize(ssize_t sz) {
> >       if (sz != 0) {
> > +      assert(sz >= 0 || _size >= size_t(-sz), "Must be");
> >         Atomic::add(size_t(sz), &_size);
> >         DEBUG_ONLY(_peak_size = MAX2(_size, _peak_size);)
> >       }
> >     }
> >
> > assert looks fine but the Atomic::add() took me by surprise: when size
> > is reduced, we knowingly feed it a negative number that we then convert
> > to unsigned and rely on the overflow? Just a nit, but would
> > Atomic::sub() not be clearer here?
>
> Because Atomic::sub() is implemented with Atomic::add()
>
>
> http://hg.openjdk.java.net/jdk/jdk/file/f34ad283fcd6/src/hotspot/share/runtime/atomic.hpp#l560
>
>
> Atomic::sub(-sz, &_size) => Atomic::add(-(-sz), &_size) =>
> Atomic::add(size_t(sz), &_size)
>
>
Okay.


> >
> >
> http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/test/hotspot/jtreg/runtime/NMT/HugeArenaTracking.java.html
> >
> > The test is fine as it is. Some thoughts (feel free to ignore them):
> >
> > - would be more interesting to shake the boat a little by varying
> > increase rate, e.g.:
> >
> >    60     // Allocate 2GB+ from arena
> >    61     long total = 0;
> >    62     while (total < 2 * GB) {
> >             long increase = <random_number_between_some_bytes_and_I dont
> > know, 100M? Capped at 2 GB>
> >    63       wb.NMTArenaMalloc(arena1, increase);
> >    64       total += increase;
> >    65     }
> >
> > and maybe test jcmd VM.native_memory before this point? I know it's
> > annoying since the numbers are probably not exact, and parsing would be
> > necessary. Just an idea.
>
> Right, we certainly can.
>
> Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.01/
>
>
output.shouldContain("Test (reserved=2GB, committed=2GB)");

Does this work? Does the output round sufficiently to always show "2G" even
though the total can jitter by +- 10M?


>
> New patch also passed submit test:
> [Mach5] mach5-one-zgu-JDK-8234270-2-20191126-0012-6997789: PASSED
>
>
Some more remarks and questions:

http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.01/src/hotspot/share/memory/resourceArea.cpp.udiff.html
@@ -31,10 +31,13 @@

 void ResourceArea::bias_to(MEMFLAGS new_flags) {
   if (new_flags != _flags) {
     MemTracker::record_arena_free(_flags);
     MemTracker::record_new_arena(new_flags);
+    size_t size = size_in_bytes();
+    MemTracker::record_arena_size_change(-ssize_t(size), _flags);  (A)
+    MemTracker::record_arena_size_change(ssize_t(size), new_flags);
     _flags = new_flags;
   }

Just aesthetics, but the code would be easier to understand if you reordered
things:

   if (new_flags != _flags) {
+    size_t size = size_in_bytes();
+    MemTracker::record_arena_size_change(-ssize_t(size), _flags);  (A)
     MemTracker::record_arena_free(_flags);
     MemTracker::record_new_arena(new_flags);
+    MemTracker::record_arena_size_change(ssize_t(size), new_flags);
     _flags = new_flags;
   }

or, alternatively, if you extended record_arena_free/record_new_arena to take
the last/initial arena size too and pass that on to
MallocMemory::deallocate()/allocate().

But I leave it up to you whether you change this. If you just reorder the calls,
I do not need another Webrev.
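
To make the suggested ordering concrete, here is a small standalone toy
model. It is only a sketch: ToyTracker, ToyArena and the string category
names are hypothetical stand-ins for MemTracker, ResourceArea and MEMFLAGS,
and std::ptrdiff_t stands in for ssize_t. The point is that everything
belonging to the old category is retired first, then the new category is
set up.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical per-category bookkeeping; not the real NMT data structures.
struct CategoryStats {
  std::size_t arena_count = 0;
  std::size_t arena_bytes = 0;
};

class ToyTracker {
  std::map<std::string, CategoryStats> _stats;

 public:
  void record_new_arena(const std::string& cat) { _stats[cat].arena_count++; }

  void record_arena_free(const std::string& cat) {
    assert(_stats[cat].arena_count > 0);
    _stats[cat].arena_count--;
  }

  void record_arena_size_change(std::ptrdiff_t delta, const std::string& cat) {
    CategoryStats& s = _stats[cat];
    // A shrink must not exceed what was recorded for this category.
    assert(delta >= 0 || s.arena_bytes >= static_cast<std::size_t>(-delta));
    // Unsigned modular addition: a negative delta subtracts.
    s.arena_bytes += static_cast<std::size_t>(delta);
  }

  const CategoryStats& stats(const std::string& cat) { return _stats[cat]; }
};

struct ToyArena {
  std::string category;
  std::size_t bytes;

  void bias_to(ToyTracker& t, const std::string& new_cat) {
    if (new_cat != category) {
      // 1. Retire size and arena from the old category.
      t.record_arena_size_change(-static_cast<std::ptrdiff_t>(bytes), category);
      t.record_arena_free(category);
      // 2. Register arena and size under the new category.
      t.record_new_arena(new_cat);
      t.record_arena_size_change(static_cast<std::ptrdiff_t>(bytes), new_cat);
      category = new_cat;
    }
  }
};
```

After biasing a 4096-byte arena from one category to another, the old
category ends at zero arenas and zero bytes, and the new one holds both.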

...

Another unrelated question, what is the reason for the unusual creation of
MallocMemorySummarySnapshot with placement new? Why not just put it as a
member into MallocMemorySummary? I must be missing something.


> -Zhengyu
>
>
>
> >
> >    66     wb.NMTFreeArena(arena1);
> >
> > On Mon, Nov 25, 2019 at 2:30 PM Zhengyu Gu <zgu at redhat.com> wrote:
> >
> >     Ping ... May I get a second review?
> >
> >     Thanks,
> >
> >     -Zhengyu
> >
> >     On 11/21/19 12:12 PM, yumin qi wrote:
> >      > Hi, Zhengyu
> >      >
> >      >    The fix looks good to me.
> >      >
> >      > Thanks
> >      > Yumin
> >      >
> >      >
> >      >
> >      > On Wed, Nov 20, 2019 at 5:49 AM Zhengyu Gu <zgu at redhat.com> wrote:
> >      >
> >      >     JDK-8204128 did not fix the original bug. But new assertion helped to
> >      >     catch the problem, as it consistently failed in Oracle internal tests.
> >      >
> >      >     The root cause is that, when NMT biases a resource area to compiler, it
> >      >     did not adjust tracking data to reflect that. When the biased resource
> >      >     area is released, there is a possibility that its size is greater than
> >      >     total size recorded, and underflow a size_t counter.
> >      >
> >      >     JDK-8204128 patch also missed a long to ssize_t parameter type change,
> >      >     that resulted new test failure on Windows, because long is 4-bytes on
> >      >     Windows.
> >      >
> >      >     Many thanks to Leonid Mesnik, who helped to run this patch through
> >      >     Oracle's internal stress tests.
> >      >
> >      >     Bug: https://bugs.openjdk.java.net/browse/JDK-8234270
> >      >     Webrev: http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/index.html
> >      >
> >      >
> >      >     Test:
> >      >         hotspot_nmt
> >      >         Submit test
> >      >         Oracle internal stress tests.
> >      >
> >      >
> >      >     Thanks,
> >      >
> >      >     -Zhengyu
> >      >
> >
>
>

From ralf.schmelter at sap.com  Tue Nov 26 09:30:37 2019
From: ralf.schmelter at sap.com (Schmelter, Ralf)
Date: Tue, 26 Nov 2019 09:30:37 +0000
Subject: RFR (M) 8234510: Remove file seeking requirement for writing a
 heap dump
In-Reply-To: <866ba7da-c16f-223d-0fc4-64b7ab69f831@oracle.com>
References: <AM0PR02MB45008C66EC315E9836F7FF7A9F4A0@AM0PR02MB4500.eurprd02.prod.outlook.com>
 <866ba7da-c16f-223d-0fc4-64b7ab69f831@oracle.com>
Message-ID: <AM0PR02MB45007617CF26A9D65297BAD79F450@AM0PR02MB4500.eurprd02.prod.outlook.com>

Hi Larry,

there should be no compatibility impact. The hprof format stayed the same, just the heap dump segments we write are smaller on average and more frequent.

I tested the created heap dumps with the jtreg test (the former jhat code), memory analyzer from eclipse, heap hero (an online heap analyzer) and visual VM. All without problems.

Best regards,
Ralf

-----Original Message-----
From: Laurence Cable <larry.cable at oracle.com> 
Sent: Montag, 25. November 2019 18:11
To: Schmelter, Ralf <ralf.schmelter at sap.com>; OpenJDK Serviceability <serviceability-dev at openjdk.java.net>; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR (M) 8234510: Remove file seeking requirement for writing a heap dump

What (if any) is the compatibility impact of this change on tools 
consuming the heap dump format?

Thanks

- Larry

From claes.redestad at oracle.com  Tue Nov 26 09:44:36 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 26 Nov 2019 10:44:36 +0100
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
Message-ID: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>

Hi,

in various places in the hotspot we have custom code to calculate the
next power of two, some of which have potential to go into an infinite 
loop in case of an overflow.

This patch proposes adding next_power_of_two utility methods which
avoid infinite loops on overflow, while providing slightly more
efficient code in most cases.
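
As an illustration of the hazard (a standalone sketch, not the code in the
webrev): a naive doubling loop on a 32-bit value never terminates once the
input exceeds 2^31, because the candidate wraps to 0. A loop-free bit
smearing variant has no such problem. The function names and the choice of
"smallest power of two >= value" semantics are assumptions made for this
example only.

```cpp
#include <cassert>
#include <cstdint>

// Naive doubling loop of the kind described above. For value > 2^31 the
// candidate p wraps to 0 and `p >= value` never becomes true. The iteration
// cap exists only so that this sketch can report the problem and return.
inline bool naive_loop_terminates(uint32_t value) {
  uint32_t p = 1;
  for (int i = 0; i < 64; i++) {
    if (p >= value) return true;
    p <<= 1;  // unsigned wraparound: 2^31 doubled becomes 0
  }
  return false;
}

// Loop-free alternative: copy the highest set bit of (value - 1) into all
// lower bits, then add one. Returns the smallest power of two >= value for
// 1 <= value <= 2^31, and 0 when no such power fits in 32 bits.
inline uint32_t round_up_pow2_u32(uint32_t value) {
  assert(value != 0);
  uint32_t v = value - 1;
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  return v + 1;
}
```

A caller of the second function still has to decide what a result of 0
means, but it can no longer hang.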

Bug:    https://bugs.openjdk.java.net/browse/JDK-8234331
Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/

Testing: tier1-3

Thanks!

/Claes

From david.holmes at oracle.com  Tue Nov 26 09:50:22 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 26 Nov 2019 19:50:22 +1000
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
Message-ID: <d12ae70f-77ad-b2d5-8af6-87a9d13a725f@oracle.com>

Hi Claes,

Just some high-level comments

- should next_power_of_two be defined in globalDefinitions.hpp alongside
the related functionality, i.e. is_power_of_two?

- can next_power_of_two build on the existing log2_* functions (or vice 
versa)?

- do the existing ZUtils not cover the same general area?

./share/gc/z/zUtils.inline.hpp

inline size_t ZUtils::round_up_power_of_2(size_t value) {
   assert(value != 0, "Invalid value");

   if (is_power_of_2(value)) {
     return value;
   }

   return (size_t)1 << (log2_intptr(value) + 1);
}

inline size_t ZUtils::round_down_power_of_2(size_t value) {
   assert(value != 0, "Invalid value");
   return (size_t)1 << log2_intptr(value);
}


Cheers,
David

On 26/11/2019 7:44 pm, Claes Redestad wrote:
> Hi,
> 
> in various places in the hotspot we have custom code to calculate the
> next power of two, some of which have potential to go into an infinite 
> loop in case of an overflow.
> 
> This patch proposes adding next_power_of_two utility methods which
> avoid infinite loops on overflow, while providing slightly more
> efficient code in most cases.
> 
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234331
> Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/
> 
> Testing: tier1-3
> 
> Thanks!
> 
> /Claes

From claes.redestad at oracle.com  Tue Nov 26 10:06:29 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 26 Nov 2019 11:06:29 +0100
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <d12ae70f-77ad-b2d5-8af6-87a9d13a725f@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <d12ae70f-77ad-b2d5-8af6-87a9d13a725f@oracle.com>
Message-ID: <2686ec0d-3dac-a0c3-d2c3-0fa5211bc07b@oracle.com>


On 2019-11-26 10:50, David Holmes wrote:
> Hi Claes,
> 
> Just some high-level comments
> 
> - should next_power_of_two be defined in globalDefinitions.hpp along 
> side the related functionality ie is_power_of_two ?

I thought we are trying to move things _out_ of globalDefinitions. I
agree align.hpp might not be the best place, either, though..

> 
> - can next_power_of_two build on the existing log2_* functions (or vice 
> versa)?

Yes, log2_intptr et al could probably be tamed to do a single step
operation, although we'd need to add 64-bit implementations in
count_leading_zeros. At least these log2_* functions already deal with
overflows without looping forever.

> 
> - do the existing ZUtils not cover the same general area?
> 
> ./share/gc/z/zUtils.inline.hpp
> 
> inline size_t ZUtils::round_up_power_of_2(size_t value) {
>    assert(value != 0, "Invalid value");
> 
>    if (is_power_of_2(value)) {
>      return value;
>    }
> 
>    return (size_t)1 << (log2_intptr(value) + 1);
> }
> 
> inline size_t ZUtils::round_down_power_of_2(size_t value) {
>    assert(value != 0, "Invalid value");
>    return (size_t)1 << log2_intptr(value);
> }

round_up_power_of_2 is similar, but not identical (next_power_of_two 
doesn't care if the value is already a power of 2, nor should it).
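
A small standalone sketch of that difference, under the assumption that
"doesn't care" means the result is always strictly greater than the input.
The helper names are made up for this example, and floor_log2 is only a
simple stand-in for log2_intptr.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

inline bool is_pow2(std::size_t v) { return v != 0 && (v & (v - 1)) == 0; }

// floor(log2(v)) by repeated shifting; a simple stand-in for log2_intptr.
inline unsigned floor_log2(std::size_t v) {
  assert(v != 0);
  unsigned n = 0;
  while (v >>= 1) n++;
  return n;
}

// Same shape as the ZUtils::round_up_power_of_2 quoted above: a power of
// two is returned unchanged.
inline std::size_t round_up_pow2(std::size_t v) {
  assert(v != 0);
  if (is_pow2(v)) return v;
  assert(floor_log2(v) + 1 < sizeof(std::size_t) * CHAR_BIT);
  return std::size_t(1) << (floor_log2(v) + 1);
}

// Assumed reading of next_power_of_two: always strictly greater than v,
// whether or not v is itself a power of two.
inline std::size_t next_pow2_strict(std::size_t v) {
  assert(v != 0);
  // Shifting by the full bit width would be undefined, so reject it here.
  assert(floor_log2(v) + 1 < sizeof(std::size_t) * CHAR_BIT);
  return std::size_t(1) << (floor_log2(v) + 1);
}
```

For an input of 8 the first function returns 8 and the second returns 16;
for an input of 9 both return 16.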

/Claes

From ivan.gerasimov at oracle.com  Tue Nov 26 10:14:11 2019
From: ivan.gerasimov at oracle.com (Ivan Gerasimov)
Date: Tue, 26 Nov 2019 02:14:11 -0800
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
Message-ID: <603eb8ec-ea48-42ae-1c6e-92c2165133b7@oracle.com>

Hi Claes!

In the code in align.hpp it is assumed that (1U << 32) == 0, which is 
not guaranteed.

In fact, if the right argument of the shift operator is >= 32 (for 
32-bit left argument) then the behavior is undefined, and thus is 
compiler specific.
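
For illustration, two ways to get a well-defined result where a shift count
of 32 can occur. This is a standalone sketch with made-up helper names, not
a proposal for the exact code in align.hpp.

```cpp
#include <cassert>
#include <cstdint>

// 1 << shift for shift in [0, 32]. The result for shift == 32 is defined
// explicitly as 0 instead of relying on what the hardware happens to do.
inline uint32_t shl1_or_zero(unsigned shift) {
  assert(shift <= 32);
  return shift < 32 ? (uint32_t(1) << shift) : 0u;
}

// Alternative: shift in a 64-bit type, where a count of 32 is in range,
// then truncate back to 32 bits.
inline uint32_t shl1_via_u64(unsigned shift) {
  assert(shift <= 32);
  return static_cast<uint32_t>(uint64_t(1) << shift);
}
```

Both helpers return 0 for a shift count of 32 and agree with a plain shift
for smaller counts.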

With kind regards,

Ivan


On 11/26/19 1:44 AM, Claes Redestad wrote:
> Hi,
>
> in various places in the hotspot we have custom code to calculate the
> next power of two, some of which have potential to go into an infinite 
> loop in case of an overflow.
>
> This patch proposes adding next_power_of_two utility methods which
> avoid infinite loops on overflow, while providing slightly more
> efficient code in most cases.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234331
> Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/
>
> Testing: tier1-3
>
> Thanks!
>
> /Claes

-- 
With kind regards,
Ivan Gerasimov


From david.holmes at oracle.com  Tue Nov 26 10:23:12 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 26 Nov 2019 20:23:12 +1000
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <2686ec0d-3dac-a0c3-d2c3-0fa5211bc07b@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <d12ae70f-77ad-b2d5-8af6-87a9d13a725f@oracle.com>
 <2686ec0d-3dac-a0c3-d2c3-0fa5211bc07b@oracle.com>
Message-ID: <2eb6f996-cd85-00cb-b795-dd2eefabd10b@oracle.com>

On 26/11/2019 8:06 pm, Claes Redestad wrote:
> 
> 
> On 2019-11-26 10:50, David Holmes wrote:
>> Hi Claes,
>>
>> Just some high-level comments
>>
>> - should next_power_of_two be defined in globalDefinitions.hpp along 
>> side the related functionality ie is_power_of_two ?
> 
> I thought we are trying to move things _out_ of globalDefinitions. I

We are? I don't recall hearing that. But wherever these go, it seems they
all belong together.

> agree align.hpp might not be the best place, either, though..

I thought align.hpp was a strange place too. :)

>>
>> - can next_power_of_two build on the existing log2_* functions (or 
>> vice versa)?
> 
> Yes, log2_intptr et al could probably be tamed to do a single step
> operation, although we'd need to add 64-bit implementations in
> count_leading_zeros. At least these log2_* functions already deal with
> overflows without looping forever.
> 
>>
>> - do the existing ZUtils not cover the same general area?
>>
>> ./share/gc/z/zUtils.inline.hpp
>>
>> inline size_t ZUtils::round_up_power_of_2(size_t value) {
>>    assert(value != 0, "Invalid value");
>>
>>    if (is_power_of_2(value)) {
>>      return value;
>>    }
>>
>>    return (size_t)1 << (log2_intptr(value) + 1);
>> }
>>
>> inline size_t ZUtils::round_down_power_of_2(size_t value) {
>>    assert(value != 0, "Invalid value");
>>    return (size_t)1 << log2_intptr(value);
>> }
> 
> round_up_power_of_2 is similar, but not identical (next_power_of_two 
> doesn't care if the value is already a power of 2, nor should it).

Okay, but it seems these should perhaps also be moved out of ZUtils and
co-located with the other "power of two" functions.

Cheers,
David
-----

> /Claes

From david.holmes at oracle.com  Tue Nov 26 10:30:20 2019
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 26 Nov 2019 20:30:20 +1000
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <9c0d3053-e134-3225-c719-34db0cdf8fac@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
 <dac8c89c-e40b-b6ac-c7e1-e1fd4b09adf3@oracle.com>
 <9cdcb5e0-a517-2193-e77d-ad024ac1d11f@oracle.com>
 <8271e3b6-0a94-4b6d-dfb8-3405f1534444@oracle.com>
 <9c0d3053-e134-3225-c719-34db0cdf8fac@oracle.com>
Message-ID: <574c3950-96ee-84ca-3079-7094e4fda989@oracle.com>

Hi Serguei,

Note that has_pending_exception() is not the same as having a pending 
async exception. This is what Robbin was clarifying in his other mail. 
The VM_ThreadStop sets the _pending_async_exception field, but that 
exception only becomes the _pending_exception when we execute specific 
thread-state transitions that check for the pending async exception - 
and we apparently do not execute that kind of transition in this code. 
Hence the assertion would not fire.
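
A toy model of that distinction, as a standalone sketch only. ToyThread and
transition_with_async_check are hypothetical names, and the real JavaThread
logic is considerably more involved.

```cpp
#include <cassert>

// Toy model, not HotSpot code: two separate slots, as described above.
struct ToyThread {
  const void* pending_exception = nullptr;
  const void* pending_async_exception = nullptr;

  bool has_pending_exception() const { return pending_exception != nullptr; }

  // What the stop operation is described as doing: only the async slot
  // is filled in.
  void send_thread_stop(const void* throwable) {
    assert(throwable != nullptr);
    pending_async_exception = throwable;
  }

  // Stand-in for the specific thread-state transitions that check for a
  // pending async exception and promote it to the pending exception.
  void transition_with_async_check() {
    if (pending_async_exception != nullptr) {
      pending_exception = pending_async_exception;
      pending_async_exception = nullptr;
    }
  }
};
```

In this model has_pending_exception() stays false after send_thread_stop()
until a checking transition runs, which is why an assert on
has_pending_exception() placed before such a transition does not fire.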

Cheers,
David

On 26/11/2019 8:10 pm, serguei.spitsyn at oracle.com wrote:
> Hi Robbin, Dan and David,
> 
> Sorry for being slow with this reply.
> Probably, it'd be Okay to reply just to this latest email from Robbin.
> 
> I agree with Dan that we basically have the assert issue which Dan has 
> spotted.
> Calling the JVM TI StopThread on the target thread (current thread is 
> target thread)
> should cause the assertion to fire:
>         src/hotspot/share/utilities/preserveException.cpp:
>         assert(!_thread->has_pending_exception(), "unexpected exception generated");
> 
> Below is just to make sure Robbin gets everything right ...
> 
> ------------------------------------------------------------
> 
> At build time we generate the jvmtiEnter.cpp with the JVM TI function 
> wrappers
> that check arguments and do state transitions.
> 
> The location is like this:
> build/linux-x86_64-server-release/hotspot/variant-server/gensrc/jvmtifiles/jvmtiEnter.cpp
> 
> For the StopThread we have this wrapper:
> 
> static jvmtiError JNICALL
> jvmti_StopThread(jvmtiEnv* env,
>             jthread thread,
>             jobject exception) {
> 
> #if !INCLUDE_JVMTI
>   return JVMTI_ERROR_NOT_AVAILABLE;
> #else
>   if(!JvmtiEnv::is_vm_live()) {
>     return JVMTI_ERROR_WRONG_PHASE;
>   }
>   Thread* this_thread = Thread::current_or_null();
>   if (this_thread == NULL || !this_thread->is_Java_thread()) {
>     return JVMTI_ERROR_UNATTACHED_THREAD;
>   }
>   JavaThread* current_thread = (JavaThread*)this_thread;
>   ThreadInVMfromNative __tiv(current_thread);
>   VM_ENTRY_BASE(jvmtiError, jvmti_StopThread , current_thread)
>   debug_only(VMNativeEntryWrapper __vew;)
> * CautiouslyPreserveExceptionMark __em(this_thread);*
>   JvmtiEnv* jvmti_env = JvmtiEnv::JvmtiEnv_from_jvmti_env(env);
>   if (!jvmti_env->is_valid()) {
>     return JVMTI_ERROR_INVALID_ENVIRONMENT;
>   }
> 
>   if (jvmti_env->get_capabilities()->can_signal_thread == 0) {
>     return JVMTI_ERROR_MUST_POSSESS_CAPABILITY;
>   }
>   jvmtiError err;
>   JavaThread* java_thread = NULL;
>   ThreadsListHandle tlh(this_thread);
>     err = JvmtiExport::cv_external_thread_to_JavaThread(tlh.list(), thread, &java_thread, NULL);
>     if (err != JVMTI_ERROR_NONE) {
>       return err;
>     }
>   err = jvmti_env->StopThread(java_thread, exception);
> #endif // INCLUDE_JVMTI
> }
> 
> 
> The VM_ThreadStop::doit() calls this:
>       target->send_thread_stop(throwable());
> 
> which sets a pending async. exception in the target thread (which is 
> current thread in our case):
>       // Set async. pending exception in thread.
>       set_pending_async_exception(java_throwable);
> 
> 
> AFAIK the VM_ThreadStop is executed at a safepoint (not concurrently),
> so it should cause the assert below to fire in the
> CautiouslyPreserveExceptionMark destructor:
> 
> CautiouslyPreserveExceptionMark::~CautiouslyPreserveExceptionMark() {
> * assert(!_thread->has_pending_exception(), "unexpected exception generated");*
>   if (_thread->has_pending_exception()) {
>     _thread->clear_pending_exception();
>   }
>   if (_preserved_exception_oop() != NULL) {
>     _thread->set_pending_exception(_preserved_exception_oop(), _preserved_exception_file, _preserved_exception_line);
>   }
> }
> ------------------------------------------------------------------
> 
> 
> I'm not sure we have test coverage for this case.
> It looks like the JVM TI StopThread was not designed to stop the current
> thread.
> I think so because NULL is not accepted as the thread parameter to
> designate the current thread.
> The error JVMTI_ERROR_INVALID_THREAD has to be returned in such a case.
> But it is hard to say why this assumption was not spelled out clearly in
> the spec.
> 
> Overall, it does not look important to keep the StopThread correctly 
> working for target thread == current.
> It is because there is always an option to send a synchronous exception 
> to itself.
> 
> I don't know why this was not deprecated together with the Thread.stop().
> The JVM TI SuspendThread()/ResumeThread() also were not deprecated 
> together with Thread.suspend()/Thread.resume().
> I think, these functions are needed for debuggers.
> 
> Thanks,
> Serguei
> 
> 
> 
> 
> On 11/25/19 04:48, Robbin Ehn wrote:
>> Hi Serguei, thanks for having a look.
>>
>> AFAIK:
>> Today one of these three can happen when returning to agent (native).
>> - The target thread for stop has not yet installed the async exception.
>> - The target thread has installed the async exception, but not yet
>> stopped.
>> - The target thread has installed the async exception and already
>> stopped.
>>
>> An agent must handle all three possible scenarios.
>> This patch just removes the first scenario, and makes the installation
>> part synchronous.
>>
>> Hope that helps!
>>
>> Note, I don't see the method "StopThread" being documented as either
>> synchronous or asynchronous itself.
>> The exception is documented as being asynchronous.
>>
>> And I don't think we follow the specs now, since we ignore the result of:
>> void VM_ThreadStop::doit()
>> One would expect e.g. JVMTI_ERROR_THREAD_NOT_ALIVE if we never deliver
>> the async exception. So making the async exception installation part
>> synchronous would be a step to fix that issue.
>>
>> Thanks, Robbin
>>
>> On 2019-11-25 12:45, serguei.spitsyn at oracle.com wrote:
>>> Please, skip my reply below.
>>> I need to read all emails carefully.
>>>
>>> Thanks,
>>> Serguei
>>>
>>> On 11/25/19 03:35, serguei.spitsyn at oracle.com wrote:
>>>> Hi Dan and Robbin,
>>>>
>>>> I can be wrong and missing something but it feels like there is no 
>>>> issue for JVMTI with this fix.
>>>>
>>>> > Off the top of head, I can't think of a way for a caller of
>>>> > Thread::send_async_exception() to determine that the call is now
>>>> > synchronous instead of asynchronous, but ...
>>>>
>>>> There can be some confusion here about what it is synchronous relative to.
>>>> I read it this way:
>>>>  It is synchronous for the current thread which calls
>>>> send_async_exception().
>>>>  However, it is asynchronous for the target thread that needs to be
>>>> stopped.
>>>>  So the fix does not break the JVMTI spec requirements.
>>>>
>>>> Please, let me know if you agree (or not) with this reading.
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>
>>>> On 11/22/19 13:50, Daniel D. Daugherty wrote:
>>>>> Hi Robbin,
>>>>>
>>>>> Sorry I'm late to this review thread...
>>>>>
>>>>> I'm adding Serguei to this email thread since I'm making comments
>>>>> about the JVM/TI parts of this changeset...
>>>>>
>>>>>
>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html 
>>>>>
>>>>>
>>>>> src/hotspot/share/runtime/vmOperations.hpp
>>>>>     No comments.
>>>>>
>>>>> src/hotspot/share/runtime/vmOperations.cpp
>>>>>     No comments.
>>>>>
>>>>> src/hotspot/share/runtime/vmThread.hpp
>>>>>     L148:   // The ever running loop for the VMThread
>>>>>     L149:   void loop();
>>>>>     L150:   static void check_cleanup();
>>>>>         nit - Feels like an odd place to add check_cleanup().
>>>>>
>>>>>         Update: Now that I've seen what clean_up(), it needs a
>>>>>         better name. Perhaps check_for_forced_cleanup()? And since
>>>>>         it is supposed to affect the running loop for the VMThread
>>>>>         I'm okay with its location now.
>>>>>
>>>>> src/hotspot/share/runtime/vmThread.cpp
>>>>>     L382:   event->set_blocking(true);
>>>>>         Probably have to keep the 'blocking' attribute in the event
>>>>>         for backward compatibility in the JFR record format?
>>>>>
>>>>>     L478:         // wait with a timeout to guarantee safepoints at regular intervals
>>>>>         Is this comment true anymore (even before this changeset)?
>>>>>         Adding this on the next line might help:
>>>>>
>>>>>                   // (if there is cleanup work to do)
>>>>>
>>>>>         since I _think_ that's how the policy has been evolved...
>>>>>
>>>>>     L479: mu_queue.wait(GuaranteedSafepointInterval);
>>>>>         Please prefix with "(void)" to make it clear you are
>>>>>         intentionally ignoring the return value.
>>>>>
>>>>>     old L627-634 (We want to make sure that we get to a safepoint regularly)
>>>>>         I think this now old code is covered by your change above:
>>>>>
>>>>>         L488:         // If the queue contains a safepoint VM op,
>>>>>         L489:         // clean up will be done so we can skip this part.
>>>>>         L490:         if (!_vm_queue->peek_at_safepoint_priority()) {
>>>>>
>>>>>         Please confirm that our thinking is the same here.
>>>>>
>>>>>     L661:     int ticket =  t->vm_operation_ticket();
>>>>>         nit - extra space after '='
>>>>>
>>>>>     Okay. Definitely simpler code.
>>>>>
>>>>> src/hotspot/share/runtime/handshake.cpp
>>>>>     No comments.
>>>>>
>>>>> src/hotspot/share/runtime/safepoint.hpp
>>>>>     No comments.
>>>>>
>>>>> src/hotspot/share/runtime/safepoint.cpp
>>>>> ??? Definitely got my attention with
>>>>> ??? ObjectSynchronizer::needs_monitor_scavenge().
>>>>>
>>>>> src/hotspot/share/runtime/synchronizer.hpp
>>>>> ??? No comments.
>>>>>
>>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>>> ??? L921: ??? log_info(monitorinflation)("Monitor scavenge needed, 
>>>>> triggering safepoint cleanup.");
>>>>> ??????? Thanks for adding the logging line.
>>>>>
>>>>> ?? ? ?? Update: As Kim pointed out, this code goes away when
>>>>> ??????? MonitorBound is made obsolete (JDK-8230940). I'm looking
>>>>> ? ? ? ? forward to making that change.
>>>>>
>>>>> ??? L1003: ? if (Atomic::load(&_forceMonitorScavenge) == 0 && 
>>>>> Atomic::xchg (1, &_forceMonitorScavenge) == 0) {
>>>>> ??????? nit - extra space between 'xchg ('
>>>>>
>>>>> ??????? Since InduceScavenge() is only called when the deprecated
>>>>> ??????? MonitorBound is specified, I think you could use cmpxchg()
>>>>> ??????? for clarity. Of course, you might be thinking that the
>>>>> ??????? pattern is a useful example for other folks to copy...
>>>>>
>>>>> src/hotspot/share/runtime/thread.cpp
>>>>> ??? old L527: // Enqueue a VM_Operation to do the job for us - 
>>>>> sometime later
>>>>> ??? L527: void Thread::send_async_exception(oop java_thread, oop 
>>>>> java_throwable) {
>>>>> ??? L528: ? VM_ThreadStop vm_stop(java_thread, java_throwable);
>>>>> ??? L529: ? VMThread::execute(&vm_stop);
>>>>> ??? L530: }
>>>>> ?????? Okay so you deleted the comment about the call being async 
>>>>> and the
>>>>> ?????? VM op is no longer async, but does that break the 
>>>>> expectation of
>>>>> ?????? any callers?
>>>>>
>>>>> ?????? Off the top of head, I can't think of a way for a caller of
>>>>> ?????? Thread::send_async_exception() to determine that the call is 
>>>>> now
>>>>> ?????? synchronous instead of asynchronous, but ...
>>>>>
>>>>> ?????? Update: Just took a look at JvmtiEnv::StopThread() which calls
>>>>> ?????? Thread::send_async_exception(). If JVM/TI StopThread() is being
>>>>> ?????? used to throw an exception at the calling thread, I suspect 
>>>>> that
>>>>> ?????? in the baseline, the call would always return JVMTI_ERROR_NONE.
>>>>> ?????? With the exception throwing now being synchronous, would that
>>>>> ?????? affect the return value of the JVM/TI StopThread() call?
>>>>>
>>>>> ?????? Looks like the JVM/TI wrapper (see 
>>>>> gensrc/jvmtifiles/jvmtiEnter.cpp
>>>>> ?????? in the build directory) uses ThreadInVMfromNative so the 
>>>>> calling
>>>>> ?????? thread is in VM when it requests the now synchronous VM 
>>>>> operation.
>>>>> ?????? When it requests the VM op, the calling thread will block which
>>>>> ?????? should allow the VM thread to execute the op. No worries 
>>>>> there so
>>>>> ?????? far...
>>>>>
>>>>> ?????? It looks like the code also uses 
>>>>> CautiouslyPreserveExceptionMark
>>>>> ?????? so I think if the exception is delivered to the calling thread
>>>>> ?????? it won't affect the return from jvmti_env->StopThread(), 
>>>>> i.e., we
>>>>> ?????? will have our return value. The CautiouslyPreserveExceptionMark
>>>>> ?????? destructor won't kick in until we return from 
>>>>> jvmti_StopThread()
>>>>> ?????? (the JVM/TI wrapper from the build).
>>>>>
>>>>> ?????? However, that might cause this assertion to fire:
>>>>>
>>>>> ?????? src/hotspot/share/utilities/preserveException.cpp:
>>>>> ?????? assert(!_thread->has_pending_exception(), "unexpected 
>>>>> exception generated");
>>>>>
>>>>> ?????? because it is now detecting that an exception was thrown
>>>>> ?????? while executing a JVM/TI call. This is pure theory here.
>>>>>
>>>>> src/hotspot/share/jfr/leakprofiler/utilities/vmOperation.hpp
>>>>> ??? No comments.
>>>>>
>>>>> src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp
>>>>> ??? No comments.
>>>>>
>>>>> src/hotspot/share/runtime/biasedLocking.cpp
>>>>> ??? old L85: ??? // Use async VM operation to avoid blocking the 
>>>>> Watcher thread.
>>>>> ??????? Again, you've deleted the comment, but is there going to
>>>>> ??????? be any unexpected side effects from the change? Looks like
>>>>> ??????? the work consists of:
>>>>>
>>>>> ??????? L70: 
>>>>> ClassLoaderDataGraph::dictionary_classes_do(enable_biased_locking);
>>>>>
>>>>> ??????? Is that going to be a problem for the WatcherThread?
>>>>>
>>>>> test/hotspot/gtest/threadHelper.inline.hpp
>>>>> ??? No comments.
>>>>>
>>>>> As David H. likes to say: the proof is in the building and testing.
>>>>>
>>>>> Thumbs up on the overall idea and implementation. There might be an
>>>>> issue lurking there in JVM/TI StopThread(), but that's just a theory
>>>>> on my part...
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>
>>>>> On 11/22/19 9:39 AM, Robbin Ehn wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 11/22/19 7:13 AM, David Holmes wrote:
>>>>>>> Hi Robbin,
>>>>>>>
>>>>>>> On 21/11/2019 9:50 pm, Robbin Ehn wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Here is v3:
>>>>>>>>
>>>>>>>> Full:
>>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
>>>>>>>
>>>>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>>>>>
>>>>>>> Looking at the highly discussed:
>>>>>>>
>>>>>>> if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, 
>>>>>>> &ForceMonitorScavenge) == 0) {
>>>>>>>
>>>>>>> why isn't that just:
>>>>>>>
>>>>>>> if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {
>>>>>>>
>>>>>>> ??
>>>>>>
>>>>>> I assumed someone had seen contention on ForceMonitorScavenge.
>>>>>> Many threads can be enter and re-enter here.
>>>>>> I don't know if that's still the case.
>>>>>>
>>>>>> Since we only hit this path when the deprecated MonitorsBound is 
>>>>>> set, I think I can change it?
>>>>>>
>>>>>>>
>>>>>>> Also while we are here can we clean this up further:
>>>>>>>
>>>>>>> static volatile int ForceMonitorScavenge = 0;
>>>>>>>
>>>>>>> becomes
>>>>>>>
>>>>>>> static int _forceMonitorScavenge = 0;
>>>>>>>
>>>>>>> so the variable doesn't look like it came from globals.hpp :)
>>>>>>>
>>>>>>
>>>>>> Sure!
>>>>>>
>>>>>>> Just to be clear, I understand the changes around monitor 
>>>>>>> scavenging now, though I'm not sure getting rid of async VM ops 
>>>>>>> and replacing with a new way to directly wakeup the VMThread 
>>>>>>> really amounts to a simplification.
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> src/hotspot/share/runtime/vmOperations.hpp
>>>>>>>
>>>>>>> I still think getting rid of Mode altogether would be a good 
>>>>>>> simplification. :)
>>>>>>
>>>>>> Sure!
>>>>>>
>>>>>> Here is v4, inc:
>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
>>>>>> Full:
>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html
>>>>>>
>>>>>> Tested t1-3
>>>>>>
>>>>>> Thanks, Robbin
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>
>>>>>>>> Inc:
>>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
>>>>>>>>
>>>>>>>> Tested t1-3
>>>>>>>>
>>>>>>>> Thanks, Robbin
>>>>>>>>
>>>>>>>> On 2019-11-19 12:05, Robbin Ehn wrote:
>>>>>>>>> Hi all, please review.
>>>>>>>>>
>>>>>>>>> CMS was the last real user of the more advantage features of VM 
>>>>>>>>> operation.
>>>>>>>>> VM operation can be simplified to always be an stack object and 
>>>>>>>>> thus either be
>>>>>>>>> of safepoint or no safepoint type.
>>>>>>>>>
>>>>>>>>> VM_EnableBiasedLocking is executed once by watcher thread, if 
>>>>>>>>> needed (default not used). Making it synchrone doesn't matter.
>>>>>>>>> VM_ThreadStop is executed by a JavaThread, that thread should 
>>>>>>>>> stop for the safepoint anyways, no real point in not stopping 
>>>>>>>>> direct.
>>>>>>>>> VM_ScavengeMonitors is only used to trigger a safepoint 
>>>>>>>>> cleanup, the VM op is not needed. Arguably this thread should 
>>>>>>>>> actually stop here, since we are about to safepoint.
>>>>>>>>>
>>>>>>>>> There is also a small cleanup in vmThread.cpp where an unused 
>>>>>>>>> method is removed.
>>>>>>>>> And the extra safepoint is removed:
>>>>>>>>> "// We want to make sure that we get to a safepoint regularly"
>>>>>>>>> No we don't :)
>>>>>>>>>
>>>>>>>>> Issue:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>>>>>>>> Change-set:
>>>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>>>>>>>
>>>>>>>>> Tested scavenge manually, passes t1-2.
>>>>>>>>>
>>>>>>>>> Thanks, Robbin
>>>>>
>>>>
>>>
> 
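
For readers of the archive, the Atomic::load plus Atomic::xchg pattern and the
single Atomic::cmpxchg alternative debated in the quoted review can be sketched
in standard C++. This is only an illustration using std::atomic, not HotSpot's
Atomic API; force_scavenge and the function names are made up for the example.

```cpp
#include <atomic>

// Hypothetical stand-in for the _forceMonitorScavenge flag; not HotSpot code.
static std::atomic<int> force_scavenge{0};

// Pattern 1: plain load first, then an unconditional exchange.
// Threads that already observe 1 return early and never write to the flag.
bool claim_with_load_then_xchg() {
  return force_scavenge.load() == 0 && force_scavenge.exchange(1) == 0;
}

// Pattern 2: a single compare-and-swap from 0 to 1.
bool claim_with_cmpxchg() {
  int expected = 0;
  return force_scavenge.compare_exchange_strong(expected, 1);
}

// Helper for the example: clear the flag again.
void reset_flag() { force_scavenge.store(0); }
```

In both variants exactly one caller wins the 0 -> 1 transition. The load-first
form skips the read-modify-write when the flag is already set, which is the
contention concern Robbin mentions; the compare-and-swap form states the intent
in a single call.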

From robbin.ehn at oracle.com  Tue Nov 26 10:49:00 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Tue, 26 Nov 2019 11:49:00 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <574c3950-96ee-84ca-3079-7094e4fda989@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
 <dac8c89c-e40b-b6ac-c7e1-e1fd4b09adf3@oracle.com>
 <9cdcb5e0-a517-2193-e77d-ad024ac1d11f@oracle.com>
 <8271e3b6-0a94-4b6d-dfb8-3405f1534444@oracle.com>
 <9c0d3053-e134-3225-c719-34db0cdf8fac@oracle.com>
 <574c3950-96ee-84ca-3079-7094e4fda989@oracle.com>
Message-ID: <b0d5bb94-2cda-3752-e691-982ef0b60d7c@oracle.com>

Thanks for adding that explanation, David.

And if it could fire, it would do so even without these changes,
since this thread might receive a thread stop (async exception) during this VM op
(either from its own thread stop or a thread stop from another thread).

I have almost passed t8 now and I cannot find any issues with this:
Inc with Dan's comments:
http://cr.openjdk.java.net/~rehn/8234086/v5/inc/webrev/
Full:
http://cr.openjdk.java.net/~rehn/8234086/v5/full/webrev/

Kim is okay with these changes.
Let me know if there are still concerns!

Thanks, Robbin

On 11/26/19 11:30 AM, David Holmes wrote:
> Hi Serguei,
> 
> Note that has_pending_exception() is not the same as having a pending async 
> exception. This is what Robbin was clarifying in his other mail. The 
> VM_ThreadStop sets the _pending_async_exception field, but that exception only 
> becomes the _pending_exception when we execute specific thread-state transitions 
> that check for the pending async exception - and we apparently do not execute 
> that kind of transition in this code. Hence the assertion would not fire.
> 
> Cheers,
> David
> 
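
The distinction David describes above can be illustrated with a toy model. This
is a sketch under assumptions, not HotSpot code: ToyThread, ToyExceptionMark,
transition_with_async_check() and stop_self() are hypothetical names, and the
real promotion logic lives in HotSpot's thread-state transitions.

```cpp
#include <string>

// Toy model only: the two strings stand in for the _pending_exception and
// _pending_async_exception fields mentioned in the thread.
struct ToyThread {
  std::string pending_exception;
  std::string pending_async_exception;

  bool has_pending_exception() const { return !pending_exception.empty(); }
  bool has_pending_async_exception() const {
    return !pending_async_exception.empty();
  }

  // In this model the thread-stop operation only records the request.
  void set_pending_async_exception(const std::string& e) {
    pending_async_exception = e;
  }

  // Hypothetical stand-in for a transition that checks for an async
  // exception and promotes it to a regular pending exception.
  void transition_with_async_check() {
    if (has_pending_async_exception()) {
      pending_exception = pending_async_exception;
      pending_async_exception.clear();
    }
  }
};

// Scope guard in the spirit of CautiouslyPreserveExceptionMark: on scope
// exit it records whether a *regular* pending exception is present.
struct ToyExceptionMark {
  ToyThread& thread;
  bool& saw_unexpected;
  ToyExceptionMark(ToyThread& t, bool& flag) : thread(t), saw_unexpected(flag) {}
  ~ToyExceptionMark() { saw_unexpected = thread.has_pending_exception(); }
};

// Returns true if the mark's check would have tripped.
bool stop_self(ToyThread& t, bool run_checking_transition) {
  bool tripped = false;
  {
    ToyExceptionMark mark(t, tripped);
    t.set_pending_async_exception("ThreadDeath");
    if (run_checking_transition) {
      t.transition_with_async_check();
    }
  }
  return tripped;
}
```

In this model, stop_self(t, false) leaves only the async field set, so the
guard's check stays quiet; only when a checking transition runs inside the
guarded scope does the check trip.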
> On 26/11/2019 8:10 pm, serguei.spitsyn at oracle.com wrote:
>> Hi Robbin, Dan and David,
>>
>> Sorry for being slow with this reply.
>> Probably, it'd be Okay to reply just to this latest email from Robbin.
>>
>> I agree with Dan that we basically have the assert issue which Dan has spotted.
>> Calling the JVM TI StopThread on the target thread (current thread is target 
>> thread)
>> should cause the assertion to fire:
>>         src/hotspot/share/utilities/preserveException.cpp:
>>         assert(!_thread->has_pending_exception(), "unexpected exception generated");
>>
>> Below is just to make sure Robbin gets everything right ...
>>
>> ------------------------------------------------------------
>>
>> At build time we generate the jvmtiEnter.cpp with the JVM TI function wrappers
>> that check arguments and do state transitions.
>>
>> The location is like this:
>> build/linux-x86_64-server-release/hotspot/variant-server/gensrc/jvmtifiles/jvmtiEnter.cpp 
>>
>>
>> For the StopThread we have this wrapper:
>>
>> static jvmtiError JNICALL
>> jvmti_StopThread(jvmtiEnv* env,
>>                  jthread thread,
>>                  jobject exception) {
>>
>> #if !INCLUDE_JVMTI
>>   return JVMTI_ERROR_NOT_AVAILABLE;
>> #else
>>   if(!JvmtiEnv::is_vm_live()) {
>>     return JVMTI_ERROR_WRONG_PHASE;
>>   }
>>   Thread* this_thread = Thread::current_or_null();
>>   if (this_thread == NULL || !this_thread->is_Java_thread()) {
>>     return JVMTI_ERROR_UNATTACHED_THREAD;
>>   }
>>   JavaThread* current_thread = (JavaThread*)this_thread;
>>   ThreadInVMfromNative __tiv(current_thread);
>>   VM_ENTRY_BASE(jvmtiError, jvmti_StopThread , current_thread)
>>   debug_only(VMNativeEntryWrapper __vew;)
>> *  CautiouslyPreserveExceptionMark __em(this_thread);*
>>   JvmtiEnv* jvmti_env = JvmtiEnv::JvmtiEnv_from_jvmti_env(env);
>>   if (!jvmti_env->is_valid()) {
>>     return JVMTI_ERROR_INVALID_ENVIRONMENT;
>>   }
>>
>>   if (jvmti_env->get_capabilities()->can_signal_thread == 0) {
>>     return JVMTI_ERROR_MUST_POSSESS_CAPABILITY;
>>   }
>>   jvmtiError err;
>>   JavaThread* java_thread = NULL;
>>   ThreadsListHandle tlh(this_thread);
>>     err = JvmtiExport::cv_external_thread_to_JavaThread(tlh.list(), thread, &java_thread, NULL);
>>     if (err != JVMTI_ERROR_NONE) {
>>       return err;
>>     }
>>   err = jvmti_env->StopThread(java_thread, exception);
>>   return err;
>> #endif // INCLUDE_JVMTI
>> }
>>
>>
>> The VM_ThreadStop::doit() calls this:
>>       target->send_thread_stop(throwable());
>>
>> which sets a pending async. exception in the target thread (which is the
>> current thread in our case):
>>        // Set async. pending exception in thread.
>>        set_pending_async_exception(java_throwable);
>>
>>
>> AFAIK the VM_ThreadStop is executed at a safepoint (not concurrently), so
>> it should cause the assert below to fire in the
>> CautiouslyPreserveExceptionMark destructor:
>>
>> CautiouslyPreserveExceptionMark::~CautiouslyPreserveExceptionMark() {
>> *  assert(!_thread->has_pending_exception(), "unexpected exception generated");*
>>   if (_thread->has_pending_exception()) {
>>     _thread->clear_pending_exception();
>>   }
>>   if (_preserved_exception_oop() != NULL) {
>>     _thread->set_pending_exception(_preserved_exception_oop(), _preserved_exception_file, _preserved_exception_line);
>>   }
>> }
>> ------------------------------------------------------------------
>>
>>
>> I'm not sure we have a test coverage for this case.
>> It looks like the JVM TI StopThread was not designed to stop the current thread.
>> I think so because the NULL is not accepted as thread parameter to designate 
>> current thread.
>> The error JVMTI_ERROR_INVALID_THREAD has to be returned in such a case.
>> But it is hard to say why this assumption was not spelled clearly in the spec.
>>
>> Overall, it does not look important to keep the StopThread correctly working 
>> for target thread == current.
>> It is because there is always an option to send a synchronous exception to 
>> itself.
>>
>> I don't know why this was not deprecated together with the Thread.stop().
>> The JVM TI SuspendThread()/ResumeThread() also were not deprecated together 
>> with Thread.suspend()/Thread.resume().
>> I think, these functions are needed for debuggers.
>>
>> Thanks,
>> Serguei
>>
>>
>>
>>
>> On 11/25/19 04:48, Robbin Ehn wrote:
>>> Hi Serguei, thanks for having a look.
>>>
>>> AFAIK:
>>> Today one of these three can happen when returning to the agent (native):
>>> - The target thread for the stop has not yet installed the async exception.
>>> - The target thread has installed the async exception, but has not yet stopped.
>>> - The target thread has installed the async exception and has already stopped.
>>>
>>> An agent must handle all three possible scenarios.
>>> This patch just removes the first scenario, and makes the installation part 
>>> synchronous.
>>>
>>> Hope that helps!
>>>
>>> Note, I don't see the method "StopThread" being documented as either synchronous
>>> or asynchronous itself.
>>> The exception is documented as being asynchronous.
>>>
>>> And I don't think we follow the spec now, since we ignore the result of:
>>> void VM_ThreadStop::doit()
>>> One would expect e.g. JVMTI_ERROR_THREAD_NOT_ALIVE if we never deliver the async
>>> exception. So making the async exception installation part synchronous would be a
>>> step towards fixing that issue.
>>>
>>> Thanks, Robbin
>>>
>>> On 2019-11-25 12:45, serguei.spitsyn at oracle.com wrote:
>>>> Please, skip my reply below.
>>>> I need to read all emails carefully.
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>> On 11/25/19 03:35, serguei.spitsyn at oracle.com wrote:
>>>>> Hi Dan and Robbin,
>>>>>
>>>>> I can be wrong and missing something but it feels like there is no issue 
>>>>> for JVMTI with this fix.
>>>>>
>>>>> > Off the top of head, I can't think of a way for a caller of
>>>>> > Thread::send_async_exception() to determine that the call is now
>>>>> > synchronous instead of asynchronous, but ...
>>>>>
>>>>> There can be some confusion here about what "synchronous" is relative to.
>>>>> I read it this way:
>>>>>  It is synchronous for the current thread, which calls send_async_exception().
>>>>>  However, it is asynchronous for the target thread that needs to be stopped.
>>>>>  So the fix does not break the JVMTI spec requirements.
>>>>>
>>>>> Please, let me know if you agree (or not) with this reading.
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>> On 11/22/19 13:50, Daniel D. Daugherty wrote:
>>>>>> Hi Robbin,
>>>>>>
>>>>>> Sorry I'm late to this review thread...
>>>>>>
>>>>>> I'm adding Serguei to this email thread since I'm making comments
>>>>>> about the JVM/TI parts of this changeset...
>>>>>>
>>>>>>
>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html 
>>>>>>
>>>>>>
>>>>>> src/hotspot/share/runtime/vmOperations.hpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/vmOperations.cpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/vmThread.hpp
>>>>>>     L148:   // The ever running loop for the VMThread
>>>>>>     L149:   void loop();
>>>>>>     L150:   static void check_cleanup();
>>>>>>         nit - Feels like an odd place to add check_cleanup().
>>>>>>
>>>>>>         Update: Now that I've seen what check_cleanup() does, it needs a
>>>>>>         better name. Perhaps check_for_forced_cleanup()? And since
>>>>>>         it is supposed to affect the running loop for the VMThread
>>>>>>         I'm okay with its location now.
>>>>>>
>>>>>> src/hotspot/share/runtime/vmThread.cpp
>>>>>>     L382:   event->set_blocking(true);
>>>>>>         Probably have to keep the 'blocking' attribute in the event
>>>>>>         for backward compatibility in the JFR record format?
>>>>>>
>>>>>>     L478:         // wait with a timeout to guarantee safepoints at regular intervals
>>>>>>         Is this comment true anymore (even before this changeset)?
>>>>>>         Adding this on the next line might help:
>>>>>>
>>>>>>                   // (if there is cleanup work to do)
>>>>>>
>>>>>>         since I _think_ that's how the policy has evolved...
>>>>>>
>>>>>>     L479: mu_queue.wait(GuaranteedSafepointInterval);
>>>>>>         Please prefix with "(void)" to make it clear you are
>>>>>>         intentionally ignoring the return value.
>>>>>>
>>>>>>     old L627-634 (We want to make sure that we get to a safepoint regularly)
>>>>>>         I think this now old code is covered by your change above:
>>>>>>
>>>>>>         L488:         // If the queue contains a safepoint VM op,
>>>>>>         L489:         // clean up will be done so we can skip this part.
>>>>>>         L490:         if (!_vm_queue->peek_at_safepoint_priority()) {
>>>>>>
>>>>>>         Please confirm that our thinking is the same here.
>>>>>>
>>>>>>     L661:     int ticket =  t->vm_operation_ticket();
>>>>>>         nit - extra space after '='
>>>>>>
>>>>>>     Okay. Definitely simpler code.
>>>>>>
>>>>>> src/hotspot/share/runtime/handshake.cpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/safepoint.hpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/safepoint.cpp
>>>>>>     Definitely got my attention with
>>>>>>     ObjectSynchronizer::needs_monitor_scavenge().
>>>>>>
>>>>>> src/hotspot/share/runtime/synchronizer.hpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>>>>     L921:     log_info(monitorinflation)("Monitor scavenge needed, triggering safepoint cleanup.");
>>>>>>         Thanks for adding the logging line.
>>>>>>
>>>>>>         Update: As Kim pointed out, this code goes away when
>>>>>>         MonitorBound is made obsolete (JDK-8230940). I'm looking
>>>>>>         forward to making that change.
>>>>>>
>>>>>>     L1003:   if (Atomic::load(&_forceMonitorScavenge) == 0 && Atomic::xchg (1, &_forceMonitorScavenge) == 0) {
>>>>>>         nit - extra space between 'xchg ('
>>>>>>
>>>>>>         Since InduceScavenge() is only called when the deprecated
>>>>>>         MonitorBound is specified, I think you could use cmpxchg()
>>>>>>         for clarity. Of course, you might be thinking that the
>>>>>>         pattern is a useful example for other folks to copy...
>>>>>>
>>>>>> src/hotspot/share/runtime/thread.cpp
>>>>>>     old L527: // Enqueue a VM_Operation to do the job for us - sometime later
>>>>>>     L527: void Thread::send_async_exception(oop java_thread, oop java_throwable) {
>>>>>>     L528:   VM_ThreadStop vm_stop(java_thread, java_throwable);
>>>>>>     L529:   VMThread::execute(&vm_stop);
>>>>>>     L530: }
>>>>>>        Okay so you deleted the comment about the call being async and the
>>>>>>        VM op is no longer async, but does that break the expectation of
>>>>>>        any callers?
>>>>>>
>>>>>>        Off the top of head, I can't think of a way for a caller of
>>>>>>        Thread::send_async_exception() to determine that the call is now
>>>>>>        synchronous instead of asynchronous, but ...
>>>>>>
>>>>>>        Update: Just took a look at JvmtiEnv::StopThread() which calls
>>>>>>        Thread::send_async_exception(). If JVM/TI StopThread() is being
>>>>>>        used to throw an exception at the calling thread, I suspect that
>>>>>>        in the baseline, the call would always return JVMTI_ERROR_NONE.
>>>>>>        With the exception throwing now being synchronous, would that
>>>>>>        affect the return value of the JVM/TI StopThread() call?
>>>>>>
>>>>>>        Looks like the JVM/TI wrapper (see gensrc/jvmtifiles/jvmtiEnter.cpp
>>>>>>        in the build directory) uses ThreadInVMfromNative so the calling
>>>>>>        thread is in VM when it requests the now synchronous VM operation.
>>>>>>        When it requests the VM op, the calling thread will block which
>>>>>>        should allow the VM thread to execute the op. No worries there so
>>>>>>        far...
>>>>>>
>>>>>>        It looks like the code also uses CautiouslyPreserveExceptionMark
>>>>>>        so I think if the exception is delivered to the calling thread
>>>>>>        it won't affect the return from jvmti_env->StopThread(), i.e., we
>>>>>>        will have our return value. The CautiouslyPreserveExceptionMark
>>>>>>        destructor won't kick in until we return from jvmti_StopThread()
>>>>>>        (the JVM/TI wrapper from the build).
>>>>>>
>>>>>>        However, that might cause this assertion to fire:
>>>>>>
>>>>>>        src/hotspot/share/utilities/preserveException.cpp:
>>>>>>        assert(!_thread->has_pending_exception(), "unexpected exception generated");
>>>>>>
>>>>>>        because it is now detecting that an exception was thrown
>>>>>>        while executing a JVM/TI call. This is pure theory here.
>>>>>>
>>>>>> src/hotspot/share/jfr/leakprofiler/utilities/vmOperation.hpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/biasedLocking.cpp
>>>>>>     old L85:     // Use async VM operation to avoid blocking the Watcher thread.
>>>>>>         Again, you've deleted the comment, but are there going to
>>>>>>         be any unexpected side effects from the change? Looks like
>>>>>>         the work consists of:
>>>>>>
>>>>>>         L70: ClassLoaderDataGraph::dictionary_classes_do(enable_biased_locking);
>>>>>>
>>>>>>         Is that going to be a problem for the WatcherThread?
>>>>>>
>>>>>> test/hotspot/gtest/threadHelper.inline.hpp
>>>>>>     No comments.
>>>>>>
>>>>>> As David H. likes to say: the proof is in the building and testing.
>>>>>>
>>>>>> Thumbs up on the overall idea and implementation. There might be an
>>>>>> issue lurking there in JVM/TI StopThread(), but that's just a theory
>>>>>> on my part...
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/22/19 9:39 AM, Robbin Ehn wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> On 11/22/19 7:13 AM, David Holmes wrote:
>>>>>>>> Hi Robbin,
>>>>>>>>
>>>>>>>> On 21/11/2019 9:50 pm, Robbin Ehn wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Here is v3:
>>>>>>>>>
>>>>>>>>> Full:
>>>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>>>>>>
>>>>>>>> Looking at the highly discussed:
>>>>>>>>
>>>>>>>> if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, 
>>>>>>>> &ForceMonitorScavenge) == 0) {
>>>>>>>>
>>>>>>>> why isn't that just:
>>>>>>>>
>>>>>>>> if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {
>>>>>>>>
>>>>>>>> ??
>>>>>>>
>>>>>>> I assumed someone had seen contention on ForceMonitorScavenge.
>>>>>>> Many threads can enter and re-enter here.
>>>>>>> I don't know if that's still the case.
>>>>>>>
>>>>>>> Since we only hit this path when the deprecated MonitorBound is set, I 
>>>>>>> think I can change it?
>>>>>>>
>>>>>>>>
>>>>>>>> Also while we are here can we clean this up further:
>>>>>>>>
>>>>>>>> static volatile int ForceMonitorScavenge = 0;
>>>>>>>>
>>>>>>>> becomes
>>>>>>>>
>>>>>>>> static int _forceMonitorScavenge = 0;
>>>>>>>>
>>>>>>>> so the variable doesn't look like it came from globals.hpp :)
>>>>>>>>
>>>>>>>
>>>>>>> Sure!
>>>>>>>
>>>>>>>> Just to be clear, I understand the changes around monitor scavenging 
>>>>>>>> now, though I'm not sure getting rid of async VM ops and replacing with 
>>>>>>>> a new way to directly wakeup the VMThread really amounts to a 
>>>>>>>> simplification.
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/vmOperations.hpp
>>>>>>>>
>>>>>>>> I still think getting rid of Mode altogether would be a good 
>>>>>>>> simplification. :)
>>>>>>>
>>>>>>> Sure!
>>>>>>>
>>>>>>> Here is v4, inc:
>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
>>>>>>> Full:
>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html
>>>>>>>
>>>>>>> Tested t1-3
>>>>>>>
>>>>>>> Thanks, Robbin
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>
>>>>>>>>> Inc:
>>>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
>>>>>>>>>
>>>>>>>>> Tested t1-3
>>>>>>>>>
>>>>>>>>> Thanks, Robbin
>>>>>>>>>
>>>>>>>>> On 2019-11-19 12:05, Robbin Ehn wrote:
>>>>>>>>>> Hi all, please review.
>>>>>>>>>>
>>>>>>>>>> CMS was the last real user of the more advanced features of VM 
>>>>>>>>>> operations.
>>>>>>>>>> A VM operation can be simplified to always be a stack object and thus 
>>>>>>>>>> be of either safepoint or no-safepoint type.
>>>>>>>>>>
>>>>>>>>>> VM_EnableBiasedLocking is executed once by the watcher thread, if needed 
>>>>>>>>>> (not used by default). Making it synchronous doesn't matter.
>>>>>>>>>> VM_ThreadStop is executed by a JavaThread; that thread should stop for 
>>>>>>>>>> the safepoint anyway, so there is no real point in not stopping directly.
>>>>>>>>>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup; the 
>>>>>>>>>> VM op is not needed. Arguably this thread should actually stop here, 
>>>>>>>>>> since we are about to safepoint.
>>>>>>>>>>
>>>>>>>>>> There is also a small cleanup in vmThread.cpp where an unused method 
>>>>>>>>>> is removed.
>>>>>>>>>> And the extra safepoint is removed:
>>>>>>>>>> "// We want to make sure that we get to a safepoint regularly"
>>>>>>>>>> No we don't :)
>>>>>>>>>>
>>>>>>>>>> Issue:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>>>>>>>>> Change-set:
>>>>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>>>>>>>>
>>>>>>>>>> Tested scavenge manually, passes t1-2.
>>>>>>>>>>
>>>>>>>>>> Thanks, Robbin
>>>>>>
>>>>>
>>>>
>>
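
As a rough illustration of why a synchronous execute() lets the operation live
on the caller's stack, as in the send_async_exception() snippet quoted above,
here is a toy model in standard C++. It is a sketch under assumptions, not the
VMThread implementation: ToyOp, ToyVMThread and run_stack_allocated_op() are
hypothetical names, and safepoints are not modelled at all.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Toy operation: doubles its input when executed.
struct ToyOp {
  int input = 0;
  int result = 0;
  bool done = false;
  void doit() { result = input * 2; }
};

class ToyVMThread {
  std::mutex lock_;
  std::condition_variable cv_;
  std::deque<ToyOp*> queue_;
  bool shutdown_ = false;
  std::thread worker_;  // declared last so the members above exist first

  void loop() {
    std::unique_lock<std::mutex> ml(lock_);
    for (;;) {
      cv_.wait(ml, [this] { return shutdown_ || !queue_.empty(); });
      if (queue_.empty()) {
        return;  // shutdown requested and nothing left to run
      }
      ToyOp* op = queue_.front();
      queue_.pop_front();
      op->doit();        // run under the lock to keep the sketch simple
      op->done = true;
      cv_.notify_all();  // wake the requester blocked in execute()
    }
  }

 public:
  ToyVMThread() : worker_([this] { loop(); }) {}

  ~ToyVMThread() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      shutdown_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }

  // Synchronous: returns only after the worker has run op->doit(), so the
  // queue never holds a pointer to an operation whose stack frame is gone.
  void execute(ToyOp* op) {
    std::unique_lock<std::mutex> ml(lock_);
    queue_.push_back(op);
    cv_.notify_all();
    cv_.wait(ml, [op] { return op->done; });
  }
};

int run_stack_allocated_op(int input) {
  ToyVMThread vm_thread;
  ToyOp op;  // stack object, in the style of VM_ThreadStop vm_stop(...)
  op.input = input;
  vm_thread.execute(&op);
  return op.result;
}
```

With an asynchronous execute() the requester could return before the worker
touched the operation, which is why such operations would need to outlive the
caller's frame; the blocking variant sidesteps that lifetime question.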

From robbin.ehn at oracle.com  Tue Nov 26 11:01:49 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Tue, 26 Nov 2019 12:01:49 +0100
Subject: RFR 8234613: JavaThread can escape back to Java from an ongoing
 handshake
In-Reply-To: <28ff192e-19b2-6380-30ba-e406cdf69a0b@oracle.com>
References: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
 <1333677e-23ff-17fb-87df-055220efe850@oracle.com>
 <28ff192e-19b2-6380-30ba-e406cdf69a0b@oracle.com>
Message-ID: <a9f57340-4c3c-8c35-70cf-f4894f382ebc@oracle.com>

Thanks Patricio!

/Robbin

On 11/25/19 7:58 PM, Patricio Chilano wrote:
> Hi Robbin,
> 
> On 11/25/19 4:14 AM, Robbin Ehn wrote:
>> Hi Patricio,
>>
>> On 2019-11-22 19:25, Patricio Chilano wrote:
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/
>>
>> Thanks, I think this is good and easy to backport!
>> You might as well add native to the assert.
> Added!
> 
>> We should revisit this when we have time.
>> There are two polls and four transitions in this code, which is more complicated
>> than I like.
> Agree, I think it could be simplified.
> 
> Here is v2:
> http://cr.openjdk.java.net/~pchilanomate/8234613/v02/webrev/ 
> <http://cr.openjdk.java.net/~pchilanomate/8234613/v02/webrev/src/hotspot/share/runtime/handshake.cpp.udiff.html>
> 
> Thanks for looking at this Robbin!
> 
> Patricio
>> /Robbin
>>
>>>
>>> Thanks,
>>> Patricio
> 

From patricio.chilano.mateo at oracle.com  Tue Nov 26 12:47:00 2019
From: patricio.chilano.mateo at oracle.com (Patricio Chilano)
Date: Tue, 26 Nov 2019 09:47:00 -0300
Subject: RFR 8234613: JavaThread can escape back to Java from an ongoing
 handshake
In-Reply-To: <a9f57340-4c3c-8c35-70cf-f4894f382ebc@oracle.com>
References: <44a92e40-eb69-0fe2-fb64-c49d8e3af963@oracle.com>
 <1333677e-23ff-17fb-87df-055220efe850@oracle.com>
 <28ff192e-19b2-6380-30ba-e406cdf69a0b@oracle.com>
 <a9f57340-4c3c-8c35-70cf-f4894f382ebc@oracle.com>
Message-ID: <1bb5086d-5467-b1f7-b2f1-ea73445b6a1c@oracle.com>

Thanks Robbin!

Patricio
On 11/26/19 6:01 AM, Robbin Ehn wrote:
> Thanks Patricio!
>
> /Robbin
>
> On 11/25/19 7:58 PM, Patricio Chilano wrote:
>> Hi Robbin,
>>
>> On 11/25/19 4:14 AM, Robbin Ehn wrote:
>>> Hi Patricio,
>>>
>>> On 2019-11-22 19:25, Patricio Chilano wrote:
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
>>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/
>>>
>>> Thanks, I think this is good and easy to backport!
>>> You might as well add native to the assert.
>> Added!
>>
>>> We should revisit this when we have time.
>>> There are two polls and four transitions in this code, which is more
>>> complicated
>>> than I like.
>> Agree, I think it could be simplified.
>>
>> Here is v2:
>> http://cr.openjdk.java.net/~pchilanomate/8234613/v02/webrev/ 
>> <http://cr.openjdk.java.net/~pchilanomate/8234613/v02/webrev/src/hotspot/share/runtime/handshake.cpp.udiff.html>
>>
>> Thanks for looking at this Robbin!
>>
>> Patricio
>>> /Robbin
>>>
>>>>
>>>> Thanks,
>>>> Patricio
>>


From zgu at redhat.com  Tue Nov 26 12:59:55 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 26 Nov 2019 07:59:55 -0500
Subject: RFR 8234270: [REDO] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <CAA-vtUyyezqx+X4yq4zLA7KDKNrQYnTOdvKUBFt3NSv=arZfKg@mail.gmail.com>
References: <7b82c213-35e9-1aef-c3c4-06fa8dec0d13@redhat.com>
 <CAOEheN7K+mL96yyTb9nZ+zW+R0yUpuzHWq3bCcAeLpyTmjK06Q@mail.gmail.com>
 <64906b40-5040-df49-2c77-19f88f64a16c@redhat.com>
 <CAA-vtUw5VMStBAQwA2tvsyyCPgz48g5s3hEDetebSNfMCBgnUA@mail.gmail.com>
 <c095dc25-faee-f605-b1f1-ebc8ec3cb7ff@redhat.com>
 <CAA-vtUyyezqx+X4yq4zLA7KDKNrQYnTOdvKUBFt3NSv=arZfKg@mail.gmail.com>
Message-ID: <25702d6d-1829-9882-0980-73b5d54b7916@redhat.com>

Hi Thomas,

> 
>     Right, we certainly can.
> 
>     Updated webrev: http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.01/
> 
> 
> output.shouldContain("Test (reserved=2GB, committed=2GB)");
> 
> Does this work? Does the output round sufficiently to always show "2G" 
> even though the total can jitter by +- 10M?

NMT rounds it to nearest whole number, so 2GB +/- 10M won't make a dent.
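
The rounding argument can be checked with a minimal sketch. It assumes
round-to-nearest with ties rounding up; the helper name `amount_in_scale`
and the exact formula are assumptions for illustration, not NMT's actual
reporting code.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t K = 1024;
constexpr std::uint64_t M = K * K;
constexpr std::uint64_t G = M * K;

// Hypothetical helper: express a byte count in whole units of `scale`,
// rounding to the nearest unit (ties round up).
constexpr std::uint64_t amount_in_scale(std::uint64_t bytes, std::uint64_t scale) {
  return (bytes + scale / 2) / scale;
}

// 2GB with 10MB of jitter in either direction still reports as 2 units of GB.
static_assert(amount_in_scale(2 * G + 10 * M, G) == 2, "upper jitter");
static_assert(amount_in_scale(2 * G - 10 * M, G) == 2, "lower jitter");
```

Under this assumption the reported value only changes once the jitter
reaches half a gigabyte, so a +/- 10M drift cannot affect the expected
string in the test.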

> 
>     New patch also passed submit test:
>     [Mach5] mach5-one-zgu-JDK-8234270-2-20191126-0012-6997789: PASSED
> 
> 
> Some more remarks and questions:
> 
> http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.01/src/hotspot/share/memory/resourceArea.cpp.udiff.html
> @@ -31,10 +31,13 @@
> 
>   void ResourceArea::bias_to(MEMFLAGS new_flags) {
>     if (new_flags != _flags) {
>       MemTracker::record_arena_free(_flags);
>       MemTracker::record_new_arena(new_flags);
> +     size_t size = size_in_bytes();
> +     MemTracker::record_arena_size_change(-ssize_t(size), _flags);  (A)
> +     MemTracker::record_arena_size_change(ssize_t(size), new_flags);
>       _flags = new_flags;
>     }
> 
> Just aesthetics, but coding would be easier to understand if you 
> reordered things:
> 
>     if (new_flags != _flags) {
> +     size_t size = size_in_bytes();
> +     MemTracker::record_arena_size_change(-ssize_t(size), _flags);  (A)
>       MemTracker::record_arena_free(_flags);
>       MemTracker::record_new_arena(new_flags);
> +     MemTracker::record_arena_size_change(ssize_t(size), new_flags);
>       _flags = new_flags;
>     }
> 
> or, if you were extending record_arena_free/record_new_arena to take
> last/initial arena size too and pass that on to
> MallocMemory::deallocate()/allocate().
> 
> But I leave it up to you if you change this. If you just reorder the
> calls, I do not need another Webrev,

I will re-order the calls, as you suggested.
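
For readers following the accounting, here is a toy model of the
reordered sequence. All type and function names (`ToyTracker`,
`ToyArena`, `size_change`) are made up for illustration and are not the
MemTracker API; the point is only that the arena's bytes move together
with its arena count, so a later release cannot underflow the size
counter of the new category.

```cpp
#include <cassert>
#include <cstddef>

// Toy per-category counters; illustrative only, not HotSpot's data structures.
struct Counter {
  std::size_t arenas = 0;
  std::size_t bytes  = 0;
};

enum Category { kOther = 0, kCompiler = 1, kNumCategories = 2 };

struct ToyTracker {
  Counter counters[kNumCategories];

  void new_arena(Category c)  { counters[c].arenas++; }
  void free_arena(Category c) { counters[c].arenas--; }

  // Signed delta, in the spirit of the ssize_t parameter discussed above.
  // Unsigned addition of a converted negative delta subtracts modulo 2^N.
  void size_change(std::ptrdiff_t delta, Category c) {
    counters[c].bytes += static_cast<std::size_t>(delta);
  }
};

struct ToyArena {
  Category    category;
  std::size_t size;

  // Re-tag the arena: first take its bytes and count out of the old
  // category, then add them to the new one.
  void bias_to(ToyTracker& t, Category new_category) {
    if (new_category != category) {
      t.size_change(-static_cast<std::ptrdiff_t>(size), category);
      t.free_arena(category);
      t.new_arena(new_category);
      t.size_change(static_cast<std::ptrdiff_t>(size), new_category);
      category = new_category;
    }
  }
};
```

If the two `size_change` calls were missing, releasing the biased arena
would subtract its size from a category that never recorded it, which is
the underflow scenario described in the original bug report.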

Thanks,

-Zhengyu

> 
> ...
> 
> Another unrelated question, what is the reason for the unusual creation 
> of MallocMemorySummarySnapshot with placement new? Why not just put it 
> as a member into MallocMemorySummary? I must be missing something.
> 
> 
>     -Zhengyu
> 
> 
> 
>      >
>      >    66     wb.NMTFreeArena(arena1);
>      >
>      > On Mon, Nov 25, 2019 at 2:30 PM Zhengyu Gu <zgu at redhat.com> wrote:
>      >
>      >     Ping ... May I get a second review?
>      >
>      >     Thanks,
>      >
>      >     -Zhengyu
>      >
>      >     On 11/21/19 12:12 PM, yumin qi wrote:
>      >      > Hi, Zhengyu
>      >      >
>      >      >    The fix looks good to me.
>      >      >
>      >      > Thanks
>      >      > Yumin
>      >      >
>      >      > On Wed, Nov 20, 2019 at 5:49 AM Zhengyu Gu <zgu at redhat.com> wrote:
>      >      >
>      >      >     JDK-8204128 did not fix the original bug. But new assertion
>      >      >     helped to catch the problem, as it consistently failed in
>      >      >     Oracle internal tests.
>      >      >
>      >      >     The root cause is that, when NMT biases a resource area to
>      >      >     compiler, it did not adjust tracking data to reflect that.
>      >      >     When the biased resource area is released, there is a
>      >      >     possibility that its size is greater than total size
>      >      >     recorded, and underflow a size_t counter.
>      >      >
>      >      >     JDK-8204128 patch also missed a long to ssize_t parameter
>      >      >     type change, that resulted new test failure on Windows,
>      >      >     because long is 4-bytes on Windows.
>      >      >
>      >      >     Many thanks to Leonid Mesnik, who helped to run this patch
>      >      >     through Oracle's internal stress tests.
>      >      >
>      >      >     Bug: https://bugs.openjdk.java.net/browse/JDK-8234270
>      >      >     Webrev:
>      >      >     http://cr.openjdk.java.net/~zgu/JDK-8234270/webrev.00/index.html
>      >      >
>      >      >     Test:
>      >      >         hotspot_nmt
>      >      >         Submit test
>      >      >         Oracle internal stress tests.
>      >      >
>      >      >     Thanks,
>      >      >
>      >      >     -Zhengyu
>      >      >
>      >
> 


From tschoening at am-soft.de  Mon Nov 18 15:05:38 2019
From: tschoening at am-soft.de (Thorsten Schöning)
Date: Mon, 18 Nov 2019 16:05:38 +0100
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <AM6PR02MB48012B1619D64259A579AC418A4D0@AM6PR02MB4801.eurprd02.prod.outlook.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
 <489372066.20191118140919@am-soft.de>
 <AM6PR02MB48012B1619D64259A579AC418A4D0@AM6PR02MB4801.eurprd02.prod.outlook.com>
Message-ID: <1716757562.20191118160538@am-soft.de>

Hello Langer, Christoph,
on Monday, 18 November 2019 at 14:22 you wrote:

> I saw your other mail already but didn't find time to reply.

Thanks, I wasn't sure whether my mail had been received at all. Now that
I know, and because of your negative answer, I'm going to leave this
thread alone; whoever is interested in discussing my problem can do so
in the original thread:

https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-November/063437.html

I'm only going to answer your concrete questions...

> However, your case, the sporadic ERROR_NO_MORE_FILES, needs to be
> understood first. I rather think if this happens, there's a real
> condition for an IOException.

I don't see much difference to real problems like
ERROR_NETWORK_UNREACHABLE, in which case you don't even know if paths
exist already, of what type they are etc. Compared to that, my
ProcMon-logs looked pretty reasonable and "correct". The only
difference is the error code some rare times, without any changes in
the overall setup otherwise.

> It should definitely be analyzed and
> understood what the reason is for ERROR_NO_MORE_FILES. Are you aware
> of other reports of this issue?

No, I couldn't find anything of interest besides what I linked already
on SO. Additionally, while my software is used by multiple customers
with Windows environment, only one suffers from this problem and it
seems to occur only in the recent past after migrating to Windows
Server 2016, some new SAN etc. I'm somewhat sure that it didn't happen
before, this would have been noticed, like it was now.

> Was this already analyzed by some Windows experts, e.g. Microsoft
> support?

No and it's unlikely to happen, that customer is busy of course... I'm
going to see if I can push things in that direction anyway.

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail: Thorsten.Schoening at AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Phone.............05151-  9468- 55
Fax...............05151-  9468- 88
Mobile.............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow


From larry.cable at oracle.com  Mon Nov 25 17:10:37 2019
From: larry.cable at oracle.com (Laurence Cable)
Date: Mon, 25 Nov 2019 09:10:37 -0800 (PST)
Subject: RFR (M) 8234510: Remove file seeking requirement for writing a
 heap dump
In-Reply-To: <AM0PR02MB45008C66EC315E9836F7FF7A9F4A0@AM0PR02MB4500.eurprd02.prod.outlook.com>
References: <AM0PR02MB45008C66EC315E9836F7FF7A9F4A0@AM0PR02MB4500.eurprd02.prod.outlook.com>
Message-ID: <866ba7da-c16f-223d-0fc4-64b7ab69f831@oracle.com>

What (if any) is the compatibility impact of this change on tools 
consuming the heap dump format?

Thanks

- Larry


On 11/25/19 6:41 AM, Schmelter, Ralf wrote:
> Hello,
>
> this change removes the need to use seek on the hprof file when creating a heap dump, thus making it possible to stream the dump. This enables us to dump to a socket or directly gzip the dump.
>
> Instead of fixing the heap dump segments size on the written file, the size of the heap dump segments is either fixed up in the buffer instead or, for entries too big to fit into the buffer fully, the entry gets its own segment with no need to fix up the segment size later.
>
> To do this, we now need to know how large a heap dump segment entry is when starting to write the entry. This is either trivial (for the roots) or already known (for the instance and array dump entries). Just the class entry needed a little more code to track the size.
>
> The change results in more heap dump segments in the written heap dump. But since the overhead per segment is 9 bytes, even for the smallest used buffer (64K) the overhead is less than 0.02%. Additionally the heap dump now expects to be able to allocate at least 64k for the buffer. The old code tried to run even with a buffer of 1 byte or no buffer at all.
>
> Bugreport: https://bugs.openjdk.java.net/browse/JDK-8234510
> Webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8234510/webrev.0/
>
> Best regards,
> Ralf
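
As a rough illustration of the approach described in the quoted mail,
here is a minimal sketch of a writer that patches the segment length
inside its buffer before streaming it out, and gives oversized entries
their own segment. The 9-byte header (u1 tag, u4 timestamp, u4 length)
and the 0x1C segment tag follow the hprof record layout; the class name,
the method names, and the use of a `std::vector` as the output stream
are assumptions for illustration, not the code in the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class SegmentWriter {
 public:
  static constexpr std::size_t  kHeaderSize = 9;     // u1 tag + u4 time + u4 length
  static constexpr std::uint8_t kSegmentTag = 0x1C;  // heap dump segment record

  explicit SegmentWriter(std::size_t capacity) : capacity_(capacity) {
    assert(capacity_ > kHeaderSize);
    start_segment();
  }

  // The entry size must be known before the entry is written.
  void write_entry(const std::uint8_t* data, std::size_t len) {
    if (len > capacity_ - kHeaderSize) {
      // Too big for the buffer: emit it as its own segment. The length is
      // known up front, so nothing has to be fixed up later.
      flush();
      append_header(out_, static_cast<std::uint32_t>(len));
      out_.insert(out_.end(), data, data + len);
      ++segments_;
      return;
    }
    if (buf_.size() + len > capacity_) {
      flush();  // close the current segment before it would overflow
    }
    buf_.insert(buf_.end(), data, data + len);
  }

  // Patch the length field in the buffer, then stream the whole segment.
  void flush() {
    if (buf_.size() > kHeaderSize) {
      put_u4(buf_, 5, static_cast<std::uint32_t>(buf_.size() - kHeaderSize));
      out_.insert(out_.end(), buf_.begin(), buf_.end());
      ++segments_;
    }
    start_segment();
  }

  const std::vector<std::uint8_t>& output() const { return out_; }
  std::size_t segments() const { return segments_; }

 private:
  // Store a 32-bit value in big-endian order at position pos.
  static void put_u4(std::vector<std::uint8_t>& v, std::size_t pos, std::uint32_t x) {
    v[pos]     = static_cast<std::uint8_t>(x >> 24);
    v[pos + 1] = static_cast<std::uint8_t>(x >> 16);
    v[pos + 2] = static_cast<std::uint8_t>(x >> 8);
    v[pos + 3] = static_cast<std::uint8_t>(x);
  }
  static void append_header(std::vector<std::uint8_t>& v, std::uint32_t length) {
    std::size_t pos = v.size();
    v.resize(pos + kHeaderSize, 0);  // timestamp bytes stay zero in this sketch
    v[pos] = kSegmentTag;
    put_u4(v, pos + 5, length);
  }
  void start_segment() {
    buf_.clear();
    append_header(buf_, 0);  // length is a placeholder until flush()
  }

  std::size_t capacity_;
  std::size_t segments_ = 0;
  std::vector<std::uint8_t> buf_;  // current, not yet streamed segment
  std::vector<std::uint8_t> out_;  // stand-in for a socket or gzip stream
};

// Header overhead relative to one full segment, in percent.
inline double segment_overhead_percent(std::size_t segment_bytes) {
  return 100.0 * SegmentWriter::kHeaderSize / static_cast<double>(segment_bytes);
}
```

With a 64K buffer, `segment_overhead_percent(64 * 1024)` is roughly
0.014, which is consistent with the "less than 0.02%" figure quoted
above. Because the output is only ever appended to, the same writer
could feed a socket or a compressor.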


From tschoening at am-soft.de  Mon Nov 18 13:09:19 2019
From: tschoening at am-soft.de (Thorsten Schöning)
Date: Mon, 18 Nov 2019 14:09:19 +0100
Subject: RFR: 8234185: Cleanup usage of canonicalize function between
 libjava, hotspot and libinstrument
In-Reply-To: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
References: <AM6PR02MB4801E150D9CE4559B5A043098A710@AM6PR02MB4801.eurprd02.prod.outlook.com>
Message-ID: <489372066.20191118140919@am-soft.de>

Hello Langer, Christoph,
on Thursday, 14 November 2019 at 16:37 you wrote:

> please review this cleanup change regarding function "canonicalize" of libjava.
[...]
> The goal is to cleanup how this function is defined and used.[...]

If you are already changing "lastErrorReportable" for Windows, how
about adding ERROR_NO_MORE_FILES there as well to not run into
unnecessary exceptions under some circumstances?

https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-November/063437.html
https://stackoverflow.com/questions/58825588/does-java-need-to-support-error-no-more-files-when-canonicalizing-paths-on-windo
https://stackoverflow.com/questions/58825963/when-does-findfirstfilew-set-last-error-to-be-error-no-more-files-instead-of-err?noredirect=1&lq=1

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail: Thorsten.Schoening at AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Phone.............05151-  9468- 55
Fax...............05151-  9468- 88
Mobile.............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow


From tschoening at am-soft.de  Tue Nov 19 09:30:30 2019
From: tschoening at am-soft.de (Thorsten Schöning)
Date: Tue, 19 Nov 2019 10:30:30 +0100
Subject: Should Java support ERROR_NO_MORE_FILES when canonicalizing paths
 on Windows?
In-Reply-To: <feced9f8-a16a-b01c-4c77-5c6259b49c92@oracle.com>
References: <346034750.20191114204642@am-soft.de>
 <feced9f8-a16a-b01c-4c77-5c6259b49c92@oracle.com>
Message-ID: <1841161740.20191119103030@am-soft.de>

Hello Ioi Lam,
on Monday, 18 November 2019 at 22:21 you wrote:

> https://bugs.openjdk.java.net/browse/JDK-8234363

Thanks for doing that!

> I have not investigated the issue in detail yet. How often do you see 
> ERROR_NO_MORE_FILES happening?

It's difficult to say currently because my customer doesn't monitor
such things. So I don't know when it starts to happen, or whether it
then happens every time. From my tests, it seems to start at some point
and to happen occasionally afterwards, maybe even with increasing
frequency. But it still doesn't always happen: during my tests there
were some times at which copying the files succeeded.

I'm trying to get some more logs in the meantime to find a pattern.

> Have you checked if your process
> (apache?) has too many open files such that FindFirstFileW is not able
> to open the directory to get a file listing?

With Apache I meant the Java-lib providing some I/O-helpers[1], not
the web server or stuff. My daemon is a plain Java-process started
manually on the shell and I'm somewhat sure that I don't have a
handle-leak in that process because of the following reasons:

At the point where I copy things and ERROR_NO_MORE_FILES happens, I
don't have any open files myself anymore and, looking at SCM logs, the
code didn't change. Commons-IO hasn't been updated in years either, so
it is unlikely to have newly introduced leaks.

Besides that, what Process Monitor[2] logs during success/failure
looks exactly the same in case of error vs. success, the only
difference being ERROR_NO_SUCH_FILE vs. ERROR_NO_MORE_FILES. If there
would be some handle leak in the process resulting in the IOException,
keeping the process running would fail in former file-related
operations already, where I really read and create files on my own.
But that's not the case, all those operations always succeed, only
when it's about copying the created files into their target directory
things start failing at some point, but even then still succeed
sometimes.

But the most important thing in my opinion is that the error is
persistent during restarts of my daemon, which should clear all open
handles in theory. When the problem happens often, restarting my
daemon doesn't seem to change anything. What instantly solves the
problem is clearing the target directory of the copy operation,
either by renaming the old one and creating a new one or by simply
deleting what is present in that directory currently. ONLY if I do one
of those things the copy operations start to succeed reliably again,
regardless of whether the daemon is restarted or kept running, even after
failing before.

I don't care about formerly available contents in the target directory
myself, but am using files with timestamps and stuff like that. And
that's my point: While there surely is some problem somewhere, I think
it's most likely to be in the infrastructure of my customer, because
he has storage-related problems anyway. Things are too slow sometimes
and all that. While I don't see anything of those problems in ProcMon,
like timeouts, permissions problems or other real errors, when my
problem occurs, it might simply be that Windows internally behaves
undocumented for some currently unknown reason.

By allowing ERROR_NO_MORE_FILES it might be that whatever the problem
in Windows might be simply gets ignored up until a point where a real
problem happens. And if that doesn't occur in the end, one doesn't
need to care as well. Allowing ERROR_NO_MORE_FILES doesn't look that
different to e.g. ERROR_NETWORK_UNREACHABLE to me, because in my setup
the latter would be the even bigger problem, as I'm copying things on
network shares in the end.

> If that is indeed the case, I am not sure what's the best way of
> handling it. If resource (file descriptors) are running out, perhaps the
> current behavior of throwing an exception in 
> WinNTFileSystem.canonicalize0() would be better than just ignoring it 
> and return an incorrect result. But I'll defer to the folks on the 
> core-libs team.

Please notice that Windows has ERROR_TOO_MANY_OPEN_FILES for that, so
in my opinion this is another strong hint that ERROR_NO_MORE_FILES
really is some kind of success. Only undocumented/unexpected, but that
seems to be the case with many of the other error codes handled in
"lastErrorReportable" as well.
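
To make the proposal concrete, here is a minimal sketch of the kind of
check under discussion. It is not the JDK's "lastErrorReportable"; the
function name, the flag, and the set of codes treated as benign are
assumptions for illustration. The numeric values are the documented
Win32 error codes, spelled out so the sketch compiles without
<windows.h>.

```cpp
#include <cassert>
#include <cstdint>

namespace win32 {
constexpr std::uint32_t kFileNotFound       = 2;     // ERROR_FILE_NOT_FOUND
constexpr std::uint32_t kPathNotFound       = 3;     // ERROR_PATH_NOT_FOUND
constexpr std::uint32_t kTooManyOpenFiles   = 4;     // ERROR_TOO_MANY_OPEN_FILES
constexpr std::uint32_t kNoMoreFiles        = 18;    // ERROR_NO_MORE_FILES
constexpr std::uint32_t kNetworkUnreachable = 1231;  // ERROR_NETWORK_UNREACHABLE
}  // namespace win32

// Hypothetical stand-in for the reportability check: returns false for
// codes treated as "the path component just is not there", true for
// codes that should surface as an IOException.
// tolerate_no_more_files models the change proposed in this thread.
inline bool error_reportable(std::uint32_t err, bool tolerate_no_more_files) {
  switch (err) {
    case win32::kFileNotFound:
    case win32::kPathNotFound:
    case win32::kNetworkUnreachable:
      return false;
    case win32::kNoMoreFiles:
      return !tolerate_no_more_files;
    default:
      // Includes ERROR_TOO_MANY_OPEN_FILES: real resource exhaustion
      // would still be reported.
      return true;
  }
}
```

The design point is that tolerating ERROR_NO_MORE_FILES would not hide a
handle leak, because that condition has its own error code and stays
reportable.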

[1]: https://commons.apache.org/proper/commons-io/
[2]: https://docs.microsoft.com/en-us/sysinternals/downloads/procmon

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail: Thorsten.Schoening at AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Phone.............05151-  9468- 55
Fax...............05151-  9468- 88
Mobile.............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow


From tschoening at am-soft.de  Tue Nov 19 09:58:05 2019
From: tschoening at am-soft.de (Thorsten Schöning)
Date: Tue, 19 Nov 2019 10:58:05 +0100
Subject: Should Java support ERROR_NO_MORE_FILES when canonicalizing paths
 on Windows?
In-Reply-To: <1841161740.20191119103030@am-soft.de>
References: <346034750.20191114204642@am-soft.de> 
 <feced9f8-a16a-b01c-4c77-5c6259b49c92@oracle.com>
 <1841161740.20191119103030@am-soft.de>
Message-ID: <510285966.20191119105805@am-soft.de>

Hello Thorsten Schöning,
on Tuesday, 19 November 2019 at 10:30 you wrote:

> Please notice that Windows has ERROR_TOO_MANY_OPEN_FILES for that, so
> in my opinion this is another strong hint that ERROR_NO_MORE_FILES
> really is some kind of success. Only undocumented/unexpected, but that
> seems to be the case with many of the other error codes handled in
> "lastErrorReportable" as well.

I've searched the web again about people using ERROR_NO_MORE_FILES
strictly in context of FindFirstFile and at least found some:

https://www.tek-tips.com/viewthread.cfm?qid=876612
https://www.perlmonks.org/?node_id=1104873
ftp://kermit.columbia.edu/kermit/k95source/findfile.c
https://books.google.de/books?id=AF5Lr5HA5UEC&pg=PA68&lpg=PA68&dq=FindFirstFileW+ERROR_NO_MORE_FILES&source=bl&ots=Uvk3ZWJ3vo&sig=ACfU3U1XcESErD0A6I9zPfMjN9GG2pmJeQ&hl=de&sa=X&ved=2ahUKEwjc-dL1__XlAhWPKFAKHanrDi84FBDoATAJegQIChAB#v=onepage&q=FindFirstFileW%20ERROR_NO_MORE_FILES&f=false

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail: Thorsten.Schoening at AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Phone.............05151-  9468- 55
Fax...............05151-  9468- 88
Mobile.............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow


From xun.chen at ele.me  Tue Nov 26 07:29:59 2019
From: xun.chen at ele.me (xun.chen at ele.me)
Date: Tue, 26 Nov 2019 07:29:59 +0000
Subject: hotspot support dynamic expansion method stack
Message-ID: <9645B0EE-CEA6-4447-8574-6E75B5099695@ele.me>

Hi,

The Java virtual machine specification specifies two exception conditions for this area: if the stack depth requested by a thread is greater than the depth allowed by the virtual machine, a StackOverflowError is thrown. If the virtual machine fails to obtain enough memory, an OutOfMemoryError is thrown.

But does HotSpot support dynamic expansion of the method stack?

How can this situation be reproduced?


ELEME Inc.
email: xun.chen at ele.me | mobile: +86 15216614939
http://ele.me

From zgu at redhat.com  Tue Nov 26 14:29:40 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 26 Nov 2019 09:29:40 -0500
Subject: RFR 8234270: [REDO] JDK-8204128 NMT might report incorrect
 numbers for Compiler area
In-Reply-To: <25702d6d-1829-9882-0980-73b5d54b7916@redhat.com>
References: <7b82c213-35e9-1aef-c3c4-06fa8dec0d13@redhat.com>
 <CAOEheN7K+mL96yyTb9nZ+zW+R0yUpuzHWq3bCcAeLpyTmjK06Q@mail.gmail.com>
 <64906b40-5040-df49-2c77-19f88f64a16c@redhat.com>
 <CAA-vtUw5VMStBAQwA2tvsyyCPgz48g5s3hEDetebSNfMCBgnUA@mail.gmail.com>
 <c095dc25-faee-f605-b1f1-ebc8ec3cb7ff@redhat.com>
 <CAA-vtUyyezqx+X4yq4zLA7KDKNrQYnTOdvKUBFt3NSv=arZfKg@mail.gmail.com>
 <25702d6d-1829-9882-0980-73b5d54b7916@redhat.com>
Message-ID: <3a2c2045-717e-6282-dcdd-40342f091177@redhat.com>


>>
>> But I leave it up to you if you change this. If you just reorder the 
>> calls, I do not need another Webrev,
> 
> I will re-order the calls, as you suggested.

Passed submit tests and pushed.

Thanks, Thomas and Yumin

-Zhengyu

> 
> Thanks,
> 
> -Zhengyu
> 


From harold.seigel at oracle.com  Tue Nov 26 16:19:15 2019
From: harold.seigel at oracle.com (Harold Seigel)
Date: Tue, 26 Nov 2019 11:19:15 -0500
Subject: RFR 8234656: Improve granularity of verifier logging
In-Reply-To: <e4b4bb6e-09d5-cec4-e9fa-62d939dd12ed@oracle.com>
References: <599daef0-fa71-48da-49e5-343e04b6a1f3@oracle.com>
 <e4b4bb6e-09d5-cec4-e9fa-62d939dd12ed@oracle.com>
Message-ID: <b3a0309f-7812-5d54-6fbf-1b170932a49c@oracle.com>

Hi David,

Thanks for looking at this change. Please review this updated webrev 
that moves the new test functionality into the existing hotspot 
.../logging/VerificationTest.java test.

    http://cr.openjdk.java.net/~hseigel/bug_8234656.2/webrev/index.html

Thanks, Harold

On 11/25/2019 5:30 PM, David Holmes wrote:
> Hi Harold,
>
> On 26/11/2019 3:13 am, Harold Seigel wrote:
>> Hi,
>>
>> Please review this small change to improve the granularity of 
>> verifier logging. This change provides brief output for log level 
>> info and detailed logging for log levels debug and trace. 
>> Additionally, it changes verifier test TraceClassRes.java to use the 
>> logging API command line options.
>
> Deciding what to log at what level is highly subjective :) This change 
> seems okay though as anyone who wants the current output can enable 
> "debug" logging for verification and won't then get a tonne of other 
> stuff they didn't want.
>
> The new test functionality could be added to the existing:
>
> ./hotspot/jtreg/runtime/logging/VerificationTest.java
>
> Thanks,
> David
> -----
>
>> Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8234656/webrev/
>>
>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8234656
>>
>> The fix was regression tested by running Mach5 tiers 1 and 2 tests 
>> and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running 
>> Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on 
>> Linux-x64.
>>
>> Thanks, Harold
>>

From larry.cable at oracle.com  Tue Nov 26 16:39:05 2019
From: larry.cable at oracle.com (Laurence Cable)
Date: Tue, 26 Nov 2019 08:39:05 -0800
Subject: RFR (M) 8234510: Remove file seeking requirement for writing a
 heap dump
In-Reply-To: <AM0PR02MB45007617CF26A9D65297BAD79F450@AM0PR02MB4500.eurprd02.prod.outlook.com>
References: <AM0PR02MB45008C66EC315E9836F7FF7A9F4A0@AM0PR02MB4500.eurprd02.prod.outlook.com>
 <866ba7da-c16f-223d-0fc4-64b7ab69f831@oracle.com>
 <AM0PR02MB45007617CF26A9D65297BAD79F450@AM0PR02MB4500.eurprd02.prod.outlook.com>
Message-ID: <7b5ee1a9-6ed0-f897-9646-a2f6ee5e2742@oracle.com>

COOL! thx

- Larry


On 11/26/19 1:30 AM, Schmelter, Ralf wrote:
> Hi Larry,
>
> there should be no compatibility impact. The hprof format stayed the same, just the heap dump segments we write are smaller on average and more frequent.
>
> I tested the created heap dumps with the jtreg test (the former jhat code), memory analyzer from eclipse, heap hero (an online heap analyzer) and visual VM. All without problems.
>
> Best regards,
> Ralf
>
> -----Original Message-----
> From: Laurence Cable <larry.cable at oracle.com>
> Sent: Montag, 25. November 2019 18:11
> To: Schmelter, Ralf <ralf.schmelter at sap.com>; OpenJDK Serviceability <serviceability-dev at openjdk.java.net>; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR (M) 8234510: Remove file seeking requirement for writing a heap dump
>
> What (if any) is the compatibility impact of this change on tools
> consuming the heap dump format?
>
> Thanks
>
> - Larry


From martin.doerr at sap.com  Tue Nov 26 18:11:21 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 26 Nov 2019 18:11:21 +0000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <cdcff990-a5c0-5123-6308-d47ce5d28974@oracle.com>
References: <VI1PR0201MB24796A75CD45362D594678519A490@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <9F4A2CCB-532E-4738-8706-EB408048AECA@oracle.com>
 <cdcff990-a5c0-5123-6308-d47ce5d28974@oracle.com>
Message-ID: <VI1PR0201MB24796AAC15DD4289729AD4409A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>

Hi David,

> Biased-locking is a very old optimization for uncontended locking, based
> on a time when there was heavy single-threaded use of synchronized data
> structures, and where actual lock/unlock atomic operations were very
> expensive.

I don't understand what role the age of an optimization plays.
The only thing I can imagine is that old optimizations may be subject to reevaluation.
I'm ok with doing that.

I assume that the code which was written at that time still exists.
And I think it's a valid approach to use a library which was written for a MT usage for a single-threaded use case.
Not everyone wants to write an additional single-threaded version if BiasedLocking can do the job.

I don't think lock/unlock atomic operations have become cheap.


> It is very complex and highly intrusive code. Every time "we"
> have had to make changes to object monitor support, or safepoint
> support, we have had to deal with the added complexity that
> biased-locking introduced and "we" have asked ourselves many times
> whether "we" can just get rid of this old optimization.

I understand this (I also had to do some work for it). IMHO this is only a good argument if the benefit is small, which should be proven.


But I see that BiasedLocking is kind of disturbing for other features.
E.g. UseRTMLocking in addition to project loom.
Transactional memory fans would probably be happy about the deprecation because it can't be used together.
It'd be also interesting to try UseRTMLocking together with project loom.


I guess there will be more discussion when the JEP arrives.

Best regards,
Martin


From calvin.cheung at oracle.com  Tue Nov 26 19:12:02 2019
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Tue, 26 Nov 2019 11:12:02 -0800
Subject: RFR(XS) 8234539 ArchiveRelocationTest.java failed: Archive
 mapping should always succeed
In-Reply-To: <631c5088-35f9-cfd8-4e4a-19683ba639d0@oracle.com>
References: <719e7512-cf84-072c-3ecf-9181f4c495dd@oracle.com>
 <698fac17-b84e-a5ab-44c2-ccfb08bbfe27@oracle.com>
 <24e62e72-bc2f-3045-45d8-a778271d6c2d@oracle.com>
 <a1132ee2-0fac-46e2-3172-4dd23027cde6@oracle.com>
 <631c5088-35f9-cfd8-4e4a-19683ba639d0@oracle.com>
Message-ID: <e4743fc8-10b0-f7e8-c208-4d6e01084990@oracle.com>


On 11/25/19 3:43 PM, Ioi Lam wrote:
> Hi Calvin,
>
> Thanks for the review.
>
> On 11/25/19 3:22 PM, Calvin Cheung wrote:
>> Hi Ioi,
>>
>> This seems good.
>>
>> Just wondering are the following 'if' checks necessary in 
>> metaspaceShared.cpp?
>>
>>       if (static_result == MAP_ARCHIVE_SUCCESS) {
>>         static_result = MAP_ARCHIVE_MMAP_FAILURE;
>>       }
>>       if (dynamic_result == MAP_ARCHIVE_SUCCESS) {
>>         dynamic_result = MAP_ARCHIVE_MMAP_FAILURE;
>>       }
>>
>
> The checks for (static_result == MAP_ARCHIVE_SUCCESS) are to make sure 
> we aren't in the MAP_ARCHIVE_OTHER_FAILURE state, which could happen 
> if the archive CRC check failed, classpath validation failed, etc.
>
>
>> The checks weren't there in filemap.cpp. 
>
> The old code (removed by this patch) was at a point where no error 
> had happened yet, so we were implicitly in the MAP_ARCHIVE_SUCCESS state.
>
>> Also, the caller won't try map_archives() again if the result is not 
>> MAP_ARCHIVE_MMAP_FAILURE.
>
> That's the intended behavior. If CRC check has failed, for example, 
> even if we retry mapping, we will get the same failure again.
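
For readers following the exchange above, the state handling Ioi describes can be sketched roughly as below. This is a minimal illustration, not the HotSpot source: the three enumerator names are taken from the thread, while the enum type name and both helper functions are hypothetical and exist only for this example.

```cpp
// Illustrative sketch only; not the code in metaspaceShared.cpp.
// The enumerator names appear in the thread; everything else is hypothetical.
enum MapArchiveResult {
  MAP_ARCHIVE_SUCCESS,
  MAP_ARCHIVE_MMAP_FAILURE,
  MAP_ARCHIVE_OTHER_FAILURE
};

// Simulate an mmap failure, but only if nothing else has failed already.
// An OTHER_FAILURE (for example a failed CRC check) is preserved, because
// retrying the mapping could not fix it.
inline MapArchiveResult simulate_mmap_failure(MapArchiveResult result) {
  if (result == MAP_ARCHIVE_SUCCESS) {
    result = MAP_ARCHIVE_MMAP_FAILURE;
  }
  return result;
}

// The caller retries the mapping only after an mmap failure.
inline bool should_retry_mapping(MapArchiveResult result) {
  return result == MAP_ARCHIVE_MMAP_FAILURE;
}
```

Without the guard, a simulated failure would overwrite MAP_ARCHIVE_OTHER_FAILURE and the caller would retry a mapping that is bound to fail again.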

Thanks for the explanations.

Looks good.

thanks,

Calvin

>
> Thanks
> - Ioi
>
>>
>> thanks,
>>
>> Calvin
>>
>> On 11/22/19 5:46 PM, Ioi Lam wrote:
>>> Hi Calvin,
>>>
>>> Thanks for the review. It turned out that I needed to fix another 
>>> (addr_delta == 0) bug in the code. I've also moved the handling of 
>>> ArchiveRelocationMode==1 in debug builds to 
>>> MetaspaceShared::map_archives(). This way, we can simulate the 
>>> "mapping failure" after all archives have been mapped. This way, we 
>>> can better test the code that unmaps the archives after the initial 
>>> mapping failures.
>>>
>>> Here's the updated patch.
>>> http://cr.openjdk.java.net/~iklam/jdk14/8234539-mapping-should-always-succeed.v02/ 
>>>
>>>
>>> I am running tier4-rt-cds-relocation multiple times to make sure 
>>> 8234539 is no longer triggered on Windows.
>>>
>>> Thanks
>>> - Ioi
>>>
>>> On 11/22/2019 11:23 AM, Calvin Cheung wrote:
>>>> Hi Ioi,
>>>>
>>>> The fix looks good.
>>>>
>>>> thanks,
>>>>
>>>> Calvin
>>>>
>>>> On 11/21/19 2:58 PM, Ioi Lam wrote:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8234539
>>>>> http://cr.openjdk.java.net/~iklam/jdk14/8234539-mapping-should-always-succeed.v01/ 
>>>>>
>>>>>
>>>>> This bug happens only on Windows. The fix is one-line -- in order 
>>>>> to check
>>>>> whether "This is the second time we try to map the archive(s)", 
>>>>> instead of
>>>>> using (addr_delta != 0), the correct condition is 
>>>>> (rs.is_reserved()). Please
>>>>> see the bug report for details.
>>>>>
>>>>> I also improved the log messages for when an error happens.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>
>

From david.holmes at oracle.com  Tue Nov 26 23:57:50 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 27 Nov 2019 09:57:50 +1000
Subject: hotspot support dynamic expansion method stack
In-Reply-To: <9645B0EE-CEA6-4447-8574-6E75B5099695@ele.me>
References: <9645B0EE-CEA6-4447-8574-6E75B5099695@ele.me>
Message-ID: <7ddcb0af-877e-f842-5016-2226d1cb7b0f@oracle.com>

Hi,

On 26/11/2019 5:29 pm, xun.chen at ele.me wrote:
> hi,
> 
> The Java virtual machine specification specifies two exception conditions for this area: If the stack depth requested by the thread is greater than the depth allowed by the virtual machine, a StackOverflowError will be thrown. If the virtual machine fails to obtain enough memory, an OutOfMemoryError will be thrown.
> 
> But does hotspot support dynamic expansion of the method stack?

hotspot uses threads with fixed stack sizes.

> How to reproduce this phenomenon ?

You want to know how to generate a StackOverflowError? Just keep 
recursing into a function.

David
-----

> 
> 
> 
> (ace), ELEME Inc.
> email: xun.chen at ele.me<mailto:xun.chen at ele.me> | mobile:+86 15216614939
> http://ele.me
> 
> 
> 
> 

From david.holmes at oracle.com  Wed Nov 27 00:03:56 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 27 Nov 2019 10:03:56 +1000
Subject: RFR (M) 8212160: JVMTI agent crashes with "assert(_value != 0LL)
 failed: resolving NULL _value"
In-Reply-To: <886380dd-fa13-94e5-ba1d-fc4678a5f90c@oracle.com>
References: <886380dd-fa13-94e5-ba1d-fc4678a5f90c@oracle.com>
Message-ID: <32889ba8-b9e1-6a38-deaf-a16cb6d2a9c6@oracle.com>

(adding runtime as well)

Hi Coleen,

On 27/11/2019 12:22 am, coleen.phillimore at oracle.com wrote:
> Summary: Add local deferred event list to thread to post events outside 
> CodeCache_lock.
> 
> This patch builds on the patch for JDK-8173361.  With this patch, I made 
> the JvmtiDeferredEventQueue an instance class (not AllStatic) and have 
> one per thread.  The CodeBlob event that used to drop the CodeCache_lock 
> and raced with the sweeper thread, adds the events it wants to post to 
> its thread local list, and processes it outside the lock.  The list is 
> walked in GC and by the sweeper to keep the nmethods from being unloaded 
> and zombied, respectively.

Sorry I don't understand why we would want/need a deferred event queue 
for every JavaThread? Isn't this only relevant for non-JavaThreads that 
need to have the ServiceThread process the deferred event?

David

> Also, the jmethod_id field in nmethod was only used as a boolean so 
> don't create a jmethod_id until needed for post_compiled_method_unload.
> 
> Ran hs tier1-8 on linux-x64-debug and the stress test that crashed in 
> the original bug report.
> 
> open webrev at http://cr.openjdk.java.net/~coleenp/2019/8212160.01/webrev
> bug link https://bugs.openjdk.java.net/browse/JDK-8212160
> 
> Thanks,
> Coleen

From serguei.spitsyn at oracle.com  Wed Nov 27 00:10:37 2019
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 26 Nov 2019 16:10:37 -0800
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <574c3950-96ee-84ca-3079-7094e4fda989@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
 <dac8c89c-e40b-b6ac-c7e1-e1fd4b09adf3@oracle.com>
 <9cdcb5e0-a517-2193-e77d-ad024ac1d11f@oracle.com>
 <8271e3b6-0a94-4b6d-dfb8-3405f1534444@oracle.com>
 <9c0d3053-e134-3225-c719-34db0cdf8fac@oracle.com>
 <574c3950-96ee-84ca-3079-7094e4fda989@oracle.com>
Message-ID: <cad9e471-dd4e-9fa0-5f34-3eb1efac4776@oracle.com>

Hi David,

Oh, right.

Thank you for explaining this!
Serguei

On 11/26/19 02:30, David Holmes wrote:
> Hi Serguei,
>
> Note that has_pending_exception() is not the same as having a pending 
> async exception. This is what Robbin was clarifying in his other mail. 
> The VM_StopThread sets the _pending_async_exception field, but that 
> exception only becomes the _pending_exception when we execute specific 
> thread-state transitions that check for the pending async exception - 
> and we apparently do not execute that kind of transition in this code. 
> Hence the assertion would not fire.
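
The distinction David draws here can be pictured with a small toy model. It is only a sketch under stated assumptions: ToyThread, send_stop() and transition_with_async_check() are hypothetical names invented for this illustration and are not HotSpot APIs; only the idea of two separate fields (a pending async exception versus the pending exception that has_pending_exception() reports) comes from the thread.

```cpp
// Toy model, not HotSpot code: all types and functions here are hypothetical.
struct ToyThread {
  const void* pending_async_exception = nullptr;  // recorded by a stop request
  const void* pending_exception = nullptr;        // what has_pending_exception() reports

  bool has_pending_exception() const { return pending_exception != nullptr; }
};

// Models the stop request: it only records the async exception.
inline void send_stop(ToyThread& t, const void* throwable) {
  t.pending_async_exception = throwable;
}

// Models a thread-state transition that checks for async exceptions and,
// if one is waiting, installs it as the regular pending exception.
inline void transition_with_async_check(ToyThread& t) {
  if (t.pending_async_exception != nullptr) {
    t.pending_exception = t.pending_async_exception;
    t.pending_async_exception = nullptr;
  }
}
```

In this model an assert on !has_pending_exception() stays quiet after send_stop() until some transition_with_async_check() runs, which mirrors why the assertion in the wrapper is not expected to fire.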
>
> Cheers,
> David
>
> On 26/11/2019 8:10 pm, serguei.spitsyn at oracle.com wrote:
>> Hi Robbin, Dan and David,
>>
>> Sorry for being slow with this reply.
>> Probably, it'd be Okay to reply just to this latest email from Robbin.
>>
>> I agree with Dan that we basically have the assert issue which Dan 
>> has spotted.
>> Calling the JVM TI StopThread on the target thread (current thread is 
>> target thread)
>> should cause the assertion to fire:
>>         src/hotspot/share/utilities/preserveException.cpp:
>>         assert(!_thread->has_pending_exception(), "unexpected exception generated");
>>
>> Below is just to make sure Robbin gets everything right ...
>>
>> ------------------------------------------------------------
>>
>> At build time we generate the jvmtiEnter.cpp with the JVM TI function 
>> wrappers
>> that check arguments and do state transitions.
>>
>> The location is like this:
>> build/linux-x86_64-server-release/hotspot/variant-server/gensrc/jvmtifiles/jvmtiEnter.cpp 
>>
>>
>> For the StopThread we have this wrapper:
>>
>> static jvmtiError JNICALL
>> jvmti_StopThread(jvmtiEnv* env,
>>                  jthread thread,
>>                  jobject exception) {
>>
>> #if !INCLUDE_JVMTI
>>   return JVMTI_ERROR_NOT_AVAILABLE;
>> #else
>>   if(!JvmtiEnv::is_vm_live()) {
>>     return JVMTI_ERROR_WRONG_PHASE;
>>   }
>>   Thread* this_thread = Thread::current_or_null();
>>   if (this_thread == NULL || !this_thread->is_Java_thread()) {
>>     return JVMTI_ERROR_UNATTACHED_THREAD;
>>   }
>>   JavaThread* current_thread = (JavaThread*)this_thread;
>>   ThreadInVMfromNative __tiv(current_thread);
>>   VM_ENTRY_BASE(jvmtiError, jvmti_StopThread , current_thread)
>>   debug_only(VMNativeEntryWrapper __vew;)
>> *  CautiouslyPreserveExceptionMark __em(this_thread);*
>>   JvmtiEnv* jvmti_env = JvmtiEnv::JvmtiEnv_from_jvmti_env(env);
>>   if (!jvmti_env->is_valid()) {
>>     return JVMTI_ERROR_INVALID_ENVIRONMENT;
>>   }
>>
>>   if (jvmti_env->get_capabilities()->can_signal_thread == 0) {
>>     return JVMTI_ERROR_MUST_POSSESS_CAPABILITY;
>>   }
>>   jvmtiError err;
>>   JavaThread* java_thread = NULL;
>>   ThreadsListHandle tlh(this_thread);
>>     err = JvmtiExport::cv_external_thread_to_JavaThread(tlh.list(), thread, &java_thread, NULL);
>>     if (err != JVMTI_ERROR_NONE) {
>>       return err;
>>     }
>>   err = jvmti_env->StopThread(java_thread, exception);
>>   return err;
>> #endif // INCLUDE_JVMTI
>> }
>>
>>
>> The VM_ThreadStop::doit() calls this:
>>       target->send_thread_stop(throwable());
>>
>> which sets a pending async. exception in the target thread (which is 
>> current thread in our case):
>>       // Set async. pending exception in thread.
>>       set_pending_async_exception(java_throwable);
>>
>>
>> AFAIK the VM_ThreadStop is executed at a safepoint (not 
>> concurrently), so it should cause the assert below in the 
>> CautiouslyPreserveExceptionMark destructor to fire:
>>
>> CautiouslyPreserveExceptionMark::~CautiouslyPreserveExceptionMark() {
>> *  assert(!_thread->has_pending_exception(), "unexpected exception generated");*
>>   if (_thread->has_pending_exception()) {
>>     _thread->clear_pending_exception();
>>   }
>>   if (_preserved_exception_oop() != NULL) {
>>     _thread->set_pending_exception(_preserved_exception_oop(), _preserved_exception_file, _preserved_exception_line);
>>   }
>> }
>> }
>> ------------------------------------------------------------------
>>
>>
>> I'm not sure we have test coverage for this case.
>> It looks like the JVM TI StopThread was not designed to stop the 
>> current thread.
>> I think so because the NULL is not accepted as thread parameter to 
>> designate current thread.
>> The error JVMTI_ERROR_INVALID_THREAD has to be returned in such a case.
>> But it is hard to say why this assumption was not spelled clearly in 
>> the spec.
>>
>> Overall, it does not look important to keep the StopThread correctly 
>> working for target thread == current.
>> It is because there is always an option to send a synchronous 
>> exception to itself.
>>
>> I don't know why this was not deprecated together with the 
>> Thread.stop().
>> The JVM TI SuspendThread()/ResumeThread() also were not deprecated 
>> together with Thread.suspend()/Thread.resume().
>> I think, these functions are needed for debuggers.
>>
>> Thanks,
>> Serguei
>>
>>
>>
>>
>> On 11/25/19 04:48, Robbin Ehn wrote:
>>> Hi Serguei, thanks for having a look.
>>>
>>> AFAIK:
>>> Today one of these three can happen when returning to the agent (native).
>>> - The target thread for stop has not yet installed the async 
>>> exception.
>>> - The target thread has installed the async exception, but not yet 
>>> stopped.
>>> - The target thread has installed the async exception and already 
>>> stopped.
>>>
>>> An agent must handle all three possible scenarios.
>>> This patch just removes the first scenario, and makes the 
>>> installation part synchronous.
>>>
>>> Hope that helps!
>>>
>>> Note, I don't see the method "StopThread" being documented as either 
>>> synchronous or asynchronous itself.
>>> The exception is documented as being asynchronous.
>>>
>>> And I don't think we follow the spec now, since we ignore the 
>>> result of:
>>> void VM_ThreadStop::doit()
>>> One would expect e.g. JVMTI_ERROR_THREAD_NOT_ALIVE if we never 
>>> deliver the async exception. So making the async exception 
>>> installation part synchronous would be a step to fix that issue.
>>>
>>> Thanks, Robbin
>>>
>>> On 2019-11-25 12:45, serguei.spitsyn at oracle.com wrote:
>>>> Please, skip my reply below.
>>>> I need to read all emails carefully.
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>> On 11/25/19 03:35, serguei.spitsyn at oracle.com wrote:
>>>>> Hi Dan and Robbin,
>>>>>
>>>>> I can be wrong and missing something but it feels like there is no 
>>>>> issue for JVMTI with this fix.
>>>>>
>>>>> > Off the top of head, I can't think of a way for a caller of
>>>>> > Thread::send_async_exception() to determine that the call is now
>>>>> > synchronous instead of asynchronous, but ...
>>>>>
>>>>> There can be some confusion here about what it is synchronous 
>>>>> relative to.
>>>>> I read it this way:
>>>>>  It is synchronous for the current thread which calls 
>>>>> send_async_exception().
>>>>>  However, it is asynchronous for the target thread that needs to 
>>>>> be stopped.
>>>>>  So the fix does not break the JVMTI spec requirements.
>>>>>
>>>>> Please, let me know if you agree (or not) with this reading.
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>> On 11/22/19 13:50, Daniel D. Daugherty wrote:
>>>>>> Hi Robbin,
>>>>>>
>>>>>> Sorry I'm late to this review thread...
>>>>>>
>>>>>> I'm adding Serguei to this email thread since I'm making comments
>>>>>> about the JVM/TI parts of this changeset...
>>>>>>
>>>>>>
>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html 
>>>>>>
>>>>>>
>>>>>> src/hotspot/share/runtime/vmOperations.hpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/vmOperations.cpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/vmThread.hpp
>>>>>>     L148:   // The ever running loop for the VMThread
>>>>>>     L149:   void loop();
>>>>>>     L150:   static void check_cleanup();
>>>>>>         nit - Feels like an odd place to add check_cleanup().
>>>>>>
>>>>>>         Update: Now that I've seen what clean_up() does, it needs a
>>>>>>         better name. Perhaps check_for_forced_cleanup()? And since
>>>>>>         it is supposed to affect the running loop for the VMThread
>>>>>>         I'm okay with its location now.
>>>>>>
>>>>>> src/hotspot/share/runtime/vmThread.cpp
>>>>>>     L382:   event->set_blocking(true);
>>>>>>         Probably have to keep the 'blocking' attribute in the event
>>>>>>         for backward compatibility in the JFR record format?
>>>>>>
>>>>>>     L478:         // wait with a timeout to guarantee safepoints at regular intervals
>>>>>>         Is this comment true anymore (even before this changeset)?
>>>>>>         Adding this on the next line might help:
>>>>>>
>>>>>>                   // (if there is cleanup work to do)
>>>>>>
>>>>>>         since I _think_ that's how the policy has evolved...
>>>>>>
>>>>>>     L479: mu_queue.wait(GuaranteedSafepointInterval);
>>>>>>         Please prefix with "(void)" to make it clear you are
>>>>>>         intentionally ignoring the return value.
>>>>>>
>>>>>>     old L627-634 (We want to make sure that we get to a safepoint regularly)
>>>>>>         I think this now old code is covered by your change above:
>>>>>>
>>>>>>         L488:         // If the queue contains a safepoint VM op,
>>>>>>         L489:         // clean up will be done so we can skip this part.
>>>>>>         L490:         if (!_vm_queue->peek_at_safepoint_priority()) {
>>>>>>
>>>>>>         Please confirm that our thinking is the same here.
>>>>>>
>>>>>>     L661:     int ticket =  t->vm_operation_ticket();
>>>>>>         nit - extra space after '='
>>>>>>
>>>>>>     Okay. Definitely simpler code.
>>>>>>
>>>>>> src/hotspot/share/runtime/handshake.cpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/safepoint.hpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/safepoint.cpp
>>>>>>     Definitely got my attention with
>>>>>>     ObjectSynchronizer::needs_monitor_scavenge().
>>>>>>
>>>>>> src/hotspot/share/runtime/synchronizer.hpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>>>>     L921:     log_info(monitorinflation)("Monitor scavenge needed, triggering safepoint cleanup.");
>>>>>>         Thanks for adding the logging line.
>>>>>>
>>>>>>         Update: As Kim pointed out, this code goes away when
>>>>>>         MonitorBound is made obsolete (JDK-8230940). I'm looking
>>>>>>         forward to making that change.
>>>>>>
>>>>>>     L1003:   if (Atomic::load(&_forceMonitorScavenge) == 0 && Atomic::xchg (1, &_forceMonitorScavenge) == 0) {
>>>>>>         nit - extra space between 'xchg ('
>>>>>>
>>>>>>         Since InduceScavenge() is only called when the deprecated
>>>>>>         MonitorBound is specified, I think you could use cmpxchg()
>>>>>>         for clarity. Of course, you might be thinking that the
>>>>>>         pattern is a useful example for other folks to copy...
>>>>>>
>>>>>> src/hotspot/share/runtime/thread.cpp
>>>>>>     old L527: // Enqueue a VM_Operation to do the job for us - sometime later
>>>>>>     L527: void Thread::send_async_exception(oop java_thread, oop java_throwable) {
>>>>>>     L528:   VM_ThreadStop vm_stop(java_thread, java_throwable);
>>>>>>     L529:   VMThread::execute(&vm_stop);
>>>>>>     L530: }
>>>>>>        Okay so you deleted the comment about the call being async and the
>>>>>>        VM op is no longer async, but does that break the expectation of
>>>>>>        any callers?
>>>>>>
>>>>>>        Off the top of head, I can't think of a way for a caller of
>>>>>>        Thread::send_async_exception() to determine that the call is now
>>>>>>        synchronous instead of asynchronous, but ...
>>>>>>
>>>>>>        Update: Just took a look at JvmtiEnv::StopThread() which calls
>>>>>>        Thread::send_async_exception(). If JVM/TI StopThread() is being
>>>>>>        used to throw an exception at the calling thread, I suspect that
>>>>>>        in the baseline, the call would always return JVMTI_ERROR_NONE.
>>>>>>        With the exception throwing now being synchronous, would that
>>>>>>        affect the return value of the JVM/TI StopThread() call?
>>>>>>
>>>>>>        Looks like the JVM/TI wrapper (see gensrc/jvmtifiles/jvmtiEnter.cpp
>>>>>>        in the build directory) uses ThreadInVMfromNative so the calling
>>>>>>        thread is in VM when it requests the now synchronous VM operation.
>>>>>>        When it requests the VM op, the calling thread will block which
>>>>>>        should allow the VM thread to execute the op. No worries there so
>>>>>>        far...
>>>>>>
>>>>>>        It looks like the code also uses CautiouslyPreserveExceptionMark
>>>>>>        so I think if the exception is delivered to the calling thread
>>>>>>        it won't affect the return from jvmti_env->StopThread(), i.e., we
>>>>>>        will have our return value. The CautiouslyPreserveExceptionMark
>>>>>>        destructor won't kick in until we return from jvmti_StopThread()
>>>>>>        (the JVM/TI wrapper from the build).
>>>>>>
>>>>>>        However, that might cause this assertion to fire:
>>>>>>
>>>>>>        src/hotspot/share/utilities/preserveException.cpp:
>>>>>>        assert(!_thread->has_pending_exception(), "unexpected exception generated");
>>>>>>
>>>>>>        because it is now detecting that an exception was thrown
>>>>>>        while executing a JVM/TI call. This is pure theory here.
>>>>>>
>>>>>> src/hotspot/share/jfr/leakprofiler/utilities/vmOperation.hpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp
>>>>>>     No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/biasedLocking.cpp
>>>>>>     old L85:     // Use async VM operation to avoid blocking the Watcher thread.
>>>>>>         Again, you've deleted the comment, but is there going to
>>>>>>         be any unexpected side effects from the change? Looks like
>>>>>>         the work consists of:
>>>>>>
>>>>>>         L70: ClassLoaderDataGraph::dictionary_classes_do(enable_biased_locking);
>>>>>>
>>>>>>         Is that going to be a problem for the WatcherThread?
>>>>>>
>>>>>> test/hotspot/gtest/threadHelper.inline.hpp
>>>>>>     No comments.
>>>>>>
>>>>>> As David H. likes to say: the proof is in the building and testing.
>>>>>>
>>>>>> Thumbs up on the overall idea and implementation. There might be an
>>>>>> issue lurking there in JVM/TI StopThread(), but that's just a theory
>>>>>> on my part...
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/22/19 9:39 AM, Robbin Ehn wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> On 11/22/19 7:13 AM, David Holmes wrote:
>>>>>>>> Hi Robbin,
>>>>>>>>
>>>>>>>> On 21/11/2019 9:50 pm, Robbin Ehn wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Here is v3:
>>>>>>>>>
>>>>>>>>> Full:
>>>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>>>>>>
>>>>>>>> Looking at the highly discussed:
>>>>>>>>
>>>>>>>> if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg 
>>>>>>>> (1, &ForceMonitorScavenge) == 0) {
>>>>>>>>
>>>>>>>> why isn't that just:
>>>>>>>>
>>>>>>>> if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {
>>>>>>>>
>>>>>>>> ??
>>>>>>>
>>>>>>> I assumed someone had seen contention on ForceMonitorScavenge.
>>>>>>> Many threads can enter and re-enter here.
>>>>>>> I don't know if that's still the case.
>>>>>>>
>>>>>>> Since we only hit this path when the deprecated MonitorBound is 
>>>>>>> set, I think I can change it?
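
The two idioms being compared can be sketched in standard C++ as follows. This deliberately uses std::atomic rather than HotSpot's Atomic class, and the function names are hypothetical, so treat it as an analogy for the discussion rather than the code under review.

```cpp
#include <atomic>

// Variant 1: plain load first, exchange only if the flag looked clear.
// Returns true for the one caller that flips the flag from 0 to 1.
// Callers that already see 1 return without attempting a write.
inline bool claim_with_load_then_xchg(std::atomic<int>& flag) {
  return flag.load() == 0 && flag.exchange(1) == 0;
}

// Variant 2: a single compare-and-exchange from 0 to 1.
// Also returns true only for the caller that performs the 0 -> 1 change.
inline bool claim_with_cmpxchg(std::atomic<int>& flag) {
  int expected = 0;
  return flag.compare_exchange_strong(expected, 1);
}
```

Both variants let exactly one caller win. The load-first form may reduce cache-line traffic when many threads find the flag already set, which fits Robbin's guess about contention; the single compare-and-exchange is shorter and arguably clearer when contention is not a concern.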
>>>>>>>
>>>>>>>>
>>>>>>>> Also while we are here can we clean this up further:
>>>>>>>>
>>>>>>>> static volatile int ForceMonitorScavenge = 0;
>>>>>>>>
>>>>>>>> becomes
>>>>>>>>
>>>>>>>> static int _forceMonitorScavenge = 0;
>>>>>>>>
>>>>>>>> so the variable doesn't look like it came from globals.hpp :)
>>>>>>>>
>>>>>>>
>>>>>>> Sure!
>>>>>>>
>>>>>>>> Just to be clear, I understand the changes around monitor 
>>>>>>>> scavenging now, though I'm not sure getting rid of async VM ops 
>>>>>>>> and replacing with a new way to directly wakeup the VMThread 
>>>>>>>> really amounts to a simplification.
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/vmOperations.hpp
>>>>>>>>
>>>>>>>> I still think getting rid of Mode altogether would be a good 
>>>>>>>> simplification. :)
>>>>>>>
>>>>>>> Sure!
>>>>>>>
>>>>>>> Here is v4, inc:
>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
>>>>>>> Full:
>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html
>>>>>>>
>>>>>>> Tested t1-3
>>>>>>>
>>>>>>> Thanks, Robbin
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>
>>>>>>>>> Inc:
>>>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
>>>>>>>>>
>>>>>>>>> Tested t1-3
>>>>>>>>>
>>>>>>>>> Thanks, Robbin
>>>>>>>>>
>>>>>>>>> On 2019-11-19 12:05, Robbin Ehn wrote:
>>>>>>>>>> Hi all, please review.
>>>>>>>>>>
>>>>>>>>>> CMS was the last real user of the more advanced features of 
>>>>>>>>>> VM operation.
>>>>>>>>>> VM operation can be simplified to always be a stack object 
>>>>>>>>>> and thus either be
>>>>>>>>>> of safepoint or no safepoint type.
>>>>>>>>>>
>>>>>>>>>> VM_EnableBiasedLocking is executed once by watcher thread, if 
>>>>>>>>>> needed (default not used). Making it synchronous doesn't matter.
>>>>>>>>>> VM_ThreadStop is executed by a JavaThread, that thread should 
>>>>>>>>>> stop for the safepoint anyways, no real point in not stopping 
>>>>>>>>>> directly.
>>>>>>>>>> VM_ScavengeMonitors is only used to trigger a safepoint 
>>>>>>>>>> cleanup, the VM op is not needed. Arguably this thread should 
>>>>>>>>>> actually stop here, since we are about to safepoint.
>>>>>>>>>>
>>>>>>>>>> There is also a small cleanup in vmThread.cpp where an unused 
>>>>>>>>>> method is removed.
>>>>>>>>>> And the extra safepoint is removed:
>>>>>>>>>> "// We want to make sure that we get to a safepoint regularly"
>>>>>>>>>> No we don't :)
>>>>>>>>>>
>>>>>>>>>> Issue:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>>>>>>>>> Change-set:
>>>>>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>>>>>>>>
>>>>>>>>>> Tested scavenge manually, passes t1-2.
>>>>>>>>>>
>>>>>>>>>> Thanks, Robbin
>>>>>>
>>>>>
>>>>
>>


From david.holmes at oracle.com  Wed Nov 27 00:28:19 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 27 Nov 2019 10:28:19 +1000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <VI1PR0201MB24796AAC15DD4289729AD4409A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
References: <VI1PR0201MB24796A75CD45362D594678519A490@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <9F4A2CCB-532E-4738-8706-EB408048AECA@oracle.com>
 <cdcff990-a5c0-5123-6308-d47ce5d28974@oracle.com>
 <VI1PR0201MB24796AAC15DD4289729AD4409A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
Message-ID: <5f488e2b-c7d4-1efc-c1ec-2cb4251d8b93@oracle.com>

On 27/11/2019 4:11 am, Doerr, Martin wrote:
> Hi David,
> 
>> Biased-locking is a very old optimization for uncontended locking, based
>> on a time when there was heavy single-threaded use of synchronized data
>> structures, and where actual lock/unlock atomic operations were very
>> expensive.
> 
> I don't understand which role the age of an optimization plays.
> The only thing I can imagine is that old optimizations may be subject for reevaluation.
> I'm ok with doing that.

That's exactly what role the age of an optimization plays. We've had 
numerous optimizations put in place to deal with the behaviour of 
hardware/OS at the time, and which subsequently becomes of no value when 
hardware/OS changes. We are reevaluating these constantly.

> I assume that the code which was written at that time still exists.

I would expect some still exists simply because you can get stuff from 
20 years ago. But the vast majority of non-trivial code that may have 
had extensive use of Vector/StringBuffer/Hashtable would have been 
rewritten to use more efficient single-threaded data structures when 
MT-safety was not needed (which is the uncontended case for which BL 
applies).

> And I think it's a valid approach to use a library which was written for a MT usage for a single-threaded use case.

It may be functionally "valid" but not the most efficient/performant.

> Not everyone wants to write an additional single-threaded version if BiasedLocking can do the job.

That's really clutching at straws. Based on that we'd never have defined 
StringBuilder to replace StringBuffer.

> I don't think lock/unlock atomic operations have become cheap.

They have certainly become cheaper.

> 
>> It is very complex and highly intrusive code. Every time "we"
>> have had to make changes to object monitor support, or safepoint
>> support, we have had to deal with the added complexity that
>> biased-locking introduced and "we" have asked ourselves many times
>> whether "we" can just get rid of this old optimization.
> 
> I understand this (also had to do some work for it). IMHO this is only a good argument if the benefit is small which should get proven.

We have no practical means to a-priori establish the benefit of BL to 
real world, significant, applications, but we expect it to be of limited 
benefit for the reasons outlined. All we can do is turn it off (with a 
way to turn back on) and see if our expectations are borne out. If our 
expectations are way off then we would hope that early access adopters 
would reflect that very quickly.

> 
> But I see that BiasedLocking is kind of disturbing for other features.
> E.g. UseRTMLocking in addition to project loom.
> Transactional memory fans would probably be happy about the deprecation because it can't be used together.
> It'd be also interesting to try UseRTMLocking together with project loom.
> 
> 
> I guess there will be more discussion when the JEP arrives.

I think a JEP is complete overkill for this. I don't see anything to be 
added to the discussion that hasn't already been said. Additional 
developer opinions won't change anything. Only real world data will show 
whether BL is still of sufficient value to make it worth keeping.

Regards,
David
-----

> 
> Best regards,
> Martin
> 

From david.holmes at oracle.com  Wed Nov 27 00:46:39 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 27 Nov 2019 10:46:39 +1000
Subject: RFR 8234656: Improve granularity of verifier logging
In-Reply-To: <b3a0309f-7812-5d54-6fbf-1b170932a49c@oracle.com>
References: <599daef0-fa71-48da-49e5-343e04b6a1f3@oracle.com>
 <e4b4bb6e-09d5-cec4-e9fa-62d939dd12ed@oracle.com>
 <b3a0309f-7812-5d54-6fbf-1b170932a49c@oracle.com>
Message-ID: <2d130586-4c09-0f35-a1c5-7a406c11f626@oracle.com>

Hi Harold,

On 27/11/2019 2:19 am, Harold Seigel wrote:
> Hi David,
> 
> Thanks for looking at this change.  Please review this updated webrev 
> that moves the new test functionality into the existing hotspot 
> .../logging/VerificationTest.java test.
> 
>     http://cr.openjdk.java.net/~hseigel/bug_8234656.2/webrev/index.html

Thanks for doing that. You could integrate it a little more by having 
analyzeOutputOn take a parameter to indicate whether info or debug 
logging should be expected. But no big deal.

   66         pb = ProcessTools.createJavaProcessBuilder("-Xverify:all",
   67                                                    "-Xlog:verification=debug",
   68                                                    "-Xshare:off",
   69                                                    "-version");

why those flags versus just

pb = ProcessTools.createJavaProcessBuilder("-Xlog:verification=debug",
                                          InternalClass.class.getName());

?

Thanks,
David
> 
> Thanks, Harold
> 
> On 11/25/2019 5:30 PM, David Holmes wrote:
>> Hi Harold,
>>
>> On 26/11/2019 3:13 am, Harold Seigel wrote:
>>> Hi,
>>>
>>> Please review this small change to improve the granularity of 
>>> verifier logging.  This change provides brief output for log level 
>>> info and detailed logging for log levels debug and trace. 
>>> Additionally, it changes verifier test TraceClassRes.java to use the 
>>> logging API command line options.
>>
>> Deciding what to log at what level is highly subjective :) This change 
>> seems okay though as anyone who wants the current output can enable 
>> "debug" logging for verification and won't then get a tonne of other 
>> stuff they didn't want.
>>
>> The new test functionality could be added to the existing:
>>
>> ./hotspot/jtreg/runtime/logging/VerificationTest.java
>>
>> Thanks,
>> David
>> -----
>>
>>> Open Webrev: http://cr.openjdk.java.net/~hseigel/bug_8234656/webrev/
>>>
>>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8234656
>>>
>>> The fix was regression tested by running Mach5 tiers 1 and 2 tests 
>>> and builds on Linux-x64, Solaris, Windows, and Mac OS X, by running 
>>> Mach5 tiers 3-5 tests on Linux-x64, and JCK lang and VM tests on 
>>> Linux-x64.
>>>
>>> Thanks, Harold
>>>

From john.r.rose at oracle.com  Wed Nov 27 01:24:52 2019
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 26 Nov 2019 17:24:52 -0800
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <5f488e2b-c7d4-1efc-c1ec-2cb4251d8b93@oracle.com>
References: <VI1PR0201MB24796A75CD45362D594678519A490@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <9F4A2CCB-532E-4738-8706-EB408048AECA@oracle.com>
 <cdcff990-a5c0-5123-6308-d47ce5d28974@oracle.com>
 <VI1PR0201MB24796AAC15DD4289729AD4409A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5f488e2b-c7d4-1efc-c1ec-2cb4251d8b93@oracle.com>
Message-ID: <5C6C2950-8162-4BCC-9D0B-1F268F0D8010@oracle.com>

+1 on David?s points.  I?d like to add that, as workloads and hardware changes,
the JVM needs to keep up.  An obvious case of this is adding and tuning
optimizations, an activity generally regarded as safe, although one users'
optimization might be another?s performance pothole.  And sometimes
we need to delete an optimization that no longer pays for itself, like BL.
In both cases (adding and deleting) if somebody shows us a problem, we?ll
fix it.  But in both cases we need to make progress.

BL is more than just a sunk cost and an innocuous add-on.  It is an ongoing
maintenance burden which slows us down.  (Like CMS was, in fact.)
It has been costly through its entire run, more than many similar features.
It seems simple, but as a transparent implementation of Java's locking
semantics, it needs to handle every corner case, and that leads to nightmarish
complexity.

Something like BL would be nice to have, but it can't just piggy-back on
Java's current locking model, barring heroics we don't want to perform.
Something more explicit is required to make it feasible, an API or user model
that ties an object more firmly to the thread that needs to work on it.  We
are exploring this space in Panama under the rubric of "confinement", for
foreign resources.  I hope it will also bear fruit for the Java object model.

For now, I will be glad to see us prove (by experience) that BL is no longer
needed on today's workloads and hardware, and can be deleted.  Current
evidence indicates this is possible. Barring new evidence, we need to clear
the decks here for future work.

-- John

On Nov 26, 2019, at 4:28 PM, David Holmes <david.holmes at oracle.com> wrote:
> 
> On 27/11/2019 4:11 am, Doerr, Martin wrote:
>> Hi David,
>>> Biased-locking is a very old optimization for uncontended locking, based
>>> on a time when there was heavy single-threaded use of synchronized data
>>> structures, and where actual lock/unlock atomic operations were very
>>> expensive.
>> I don't understand which role the age of an optimization plays.
>> The only thing I can imagine is that old optimizations may be subject to reevaluation.
>> I'm ok with doing that.
> 
> That's exactly what role the age of an optimization plays. We've had numerous optimizations put in place to deal with the behaviour of hardware/OS at the time, and which subsequently become of no value when hardware/OS changes. We are reevaluating these constantly.
> 
>> I assume that the code which was written at that time still exists.
> 
> I would expect some still exists simply because you can get stuff from 20 years ago. But the vast majority of non-trivial code that may have had extensive use of Vector/StringBuffer/Hashtable would have been rewritten to use more efficient single-threaded data structures when MT-safety was not needed (which is the uncontended case for which BL applies).
> 
>> And I think it's a valid approach to use a library which was written for a MT usage for a single-threaded use case.
> 
> It may be functionally "valid" but not the most efficient/performant.
> 
>> Not everyone wants to write an additional single-threaded version if BiasedLocking can do the job.
> 
> That's really clutching at straws. Based on that we'd never have defined StringBuilder to replace StringBuffer.
> 
>> I don't think lock/unlock atomic operations have become cheap.
> 
> They have certainly become cheaper.
> 
>>> It is very complex and highly intrusive code. Every time "we"
>>> have had to make changes to object monitor support, or safepoint
>>> support, we have had to deal with the added complexity that
>>> biased-locking introduced and "we" have asked ourselves many times
>>> whether "we" can just get rid of this old optimization.
>> I understand this (also had to do some work for it). IMHO this is only a good argument if the benefit is small which should get proven.
> 
> We have no practical means to a-priori establish the benefit of BL to real world, significant, applications, but we expect it to be of limited benefit for the reasons outlined. All we can do is turn it off (with a way to turn back on) and see if our expectations are borne out. If our expectations are way off then we would hope that early access adopters would reflect that very quickly.
> 
>> But I see that BiasedLocking is kind of disturbing for other features.
>> E.g. UseRTMLocking in addition to project loom.
>> Transactional memory fans would probably be happy about the deprecation because it can't be used together.
>> It'd be also interesting to try UseRTMLocking together with project loom.
>> I guess there will be more discussion when the JEP arrives.
> 
> I think a JEP is complete overkill for this. I don't see anything to be added to the discussion that hasn't already been said. Additional developer opinions won't change anything. Only real world data will show whether BL is still of sufficient value to make it worth keeping.
> 
> Regards,
> David
> -----
> 
>> Best regards,
>> Martin


From david.holmes at oracle.com  Wed Nov 27 01:34:50 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 27 Nov 2019 11:34:50 +1000
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
Message-ID: <51c0e504-c822-78eb-2c46-a2190b1aeab0@oracle.com>

Hi Robbin,

Incremental v4 looks good.

Thanks,
David

On 23/11/2019 12:39 am, Robbin Ehn wrote:
> Hi David,
> 
> On 11/22/19 7:13 AM, David Holmes wrote:
>> Hi Robbin,
>>
>> On 21/11/2019 9:50 pm, Robbin Ehn wrote:
>>> Hi,
>>>
>>> Here is v3:
>>>
>>> Full:
>>> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
>>
>> src/hotspot/share/runtime/synchronizer.cpp
>>
>> Looking at the highly discussed:
>>
>> if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, 
>> &ForceMonitorScavenge) == 0) {
>>
>> why isn't that just:
>>
>> if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {
>>
>> ??
> 
> I assumed someone had seen contention on ForceMonitorScavenge.
> Many threads can enter and re-enter here.
> I don't know if that's still the case.
> 
> Since we only hit this path when the deprecated MonitorBound is set, I 
> think I can change it?
> 
>>
>> Also while we are here can we clean this up further:
>>
>> static volatile int ForceMonitorScavenge = 0;
>>
>> becomes
>>
>> static int _forceMonitorScavenge = 0;
>>
>> so the variable doesn't look like it came from globals.hpp :)
>>
> 
> Sure!
> 
>> Just to be clear, I understand the changes around monitor scavenging 
>> now, though I'm not sure getting rid of async VM ops and replacing 
>> with a new way to directly wakeup the VMThread really amounts to a 
>> simplification.
>>
>> ---
>>
>> src/hotspot/share/runtime/vmOperations.hpp
>>
>> I still think getting rid of Mode altogether would be a good 
>> simplification. :)
> 
> Sure!
> 
> Here is v4, inc:
> http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
> Full:
> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html
> 
> Tested t1-3
> 
> Thanks, Robbin
> 
> 
>>
>> Thanks,
>> David
>> -----
>>
>>
>>> Inc:
>>> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
>>>
>>> Tested t1-3
>>>
>>> Thanks, Robbin
>>>
>>> On 2019-11-19 12:05, Robbin Ehn wrote:
>>>> Hi all, please review.
>>>>
>>>> CMS was the last real user of the more advanced features of VM 
>>>> operation.
>>>> VM operation can be simplified to always be a stack object and thus 
>>>> either be
>>>> of safepoint or no safepoint type.
>>>>
>>>> VM_EnableBiasedLocking is executed once by watcher thread, if needed 
>>>> (default not used). Making it synchronous doesn't matter.
>>>> VM_ThreadStop is executed by a JavaThread, that thread should stop 
>>>> for the safepoint anyways, no real point in not stopping directly.
>>>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, the 
>>>> VM op is not needed. Arguably this thread should actually stop here, 
>>>> since we are about to safepoint.
>>>>
>>>> There is also a small cleanup in vmThread.cpp where an unused method 
>>>> is removed.
>>>> And the extra safepoint is removed:
>>>> "// We want to make sure that we get to a safepoint regularly"
>>>> No we don't :)
>>>>
>>>> Issue:
>>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>>> Change-set:
>>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>>
>>>> Tested scavenge manually, passes t1-2.
>>>>
>>>> Thanks, Robbin

From david.holmes at oracle.com  Wed Nov 27 01:49:22 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 27 Nov 2019 11:49:22 +1000
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <a77c1a73-af3d-daa4-4f0b-11d886e0909f@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <d48d6472-5cd2-fbb3-8df4-c71920372948@oracle.com>
 <f4d66c80-3da3-0d80-fad2-e79ae3380bf3@oracle.com>
 <a77c1a73-af3d-daa4-4f0b-11d886e0909f@oracle.com>
Message-ID: <9a79d547-8776-ac43-cadb-5aff142dc22e@oracle.com>

Hi Robbin,

On 25/11/2019 6:06 pm, Robbin Ehn wrote:
> Hi,
> 
> Starting with this email due to thanksgiving.
> 
> On 2019-11-25 06:36, David Holmes wrote:
>> Hi Dan,
>>
>> I support your analysis regarding JVM TI StopThread. It's very hard 
>> via code inspection to be 100% certain but I think Robbin's change 
>> will install the async-exception in the current thread in the context 
>> of the StopThread call, resulting in the 
>> CautiouslyPreserveExceptionMark asserting. Unfortunately JVM TI 
>> StopThread doesn't special-case the current thread the way 
>> JVM_StopThread does.
> 
> It passes jvmti/jdi tests.

We may not have any self-stopping tests.

> Internally, the async-exception is only delivered to the thread when 
> going to Java, via the suspend flags. In the case of going VM->native 
> it should not be delivered.
> I'll investigate and have a look.

Okay. Figuring out exactly which actions happen on which transitions is 
always a challenge for me. I can't say I have any intuition about why 
VM->native should not install an async exception, but if it doesn't (and 
there's no new transition introduced that does) then that is fine.

> I find it very unsettling that jvmti StopThread is not deprecated?
> This has exactly the same flaws as Thread.stop().
> Meaning even if we remove Thread.stop() the VM needs to support this flawed
> stopping ability...

Yes it is flawed, but Thread.stop is unlikely to ever actually be 
removed (it is deprecated but not deprecated-for-removal).

What should have happened is that JVM TI StopThread should have been 
modified to only allow ThreadDeath at the same time that 
java.lang.Thread.stop(Throwable t) was changed to throw 
UnsupportedOperationException. That is probably a change we should still 
make ... in fact I will file an RFE for that.

>>
>> Your observations about the WatcherThread change in behaviour are also 
>> spot on. Potentially at least, forcing the WatcherThread to wait for 
>> the safepoint to be executed could interfere with executing other 
>> periodic tasks. By default the WatcherThread won't be executing this 
>> code as the BiasedLockingStartupDelay is zero. But potentially, if 
>> anyone has that delay enabled, this could cause an observable change 
>> in behaviour in relation to other PeriodicTasks.
> 
> In RFR I had this comment:
> VM_EnableBiasedLocking is executed once by watcher thread, if needed 
> (default not used). Making it synchronous doesn't matter.
> 
> A periodic task has a minimum resolution of 10ms, while the safepoint 
> for enabling biased locking takes <1ms under normal circumstances. On an
> over-provisioned machine we see longer safepoints, but we also see 
> scheduler delays up to 35-40ms.
> 
> I deemed it very unlikely that it is possible to notice it.

Okay.

>> Perhaps the ability to execute an async-safepoint VM operation needs 
>> to remain, for simplicity (compared to working around the issues).
> 
> I'm hoping not :(

To be quite upfront I don't think support for async VM ops really adds 
any complexity, whereas ensuring the existing async ops can safely 
become sync ops has been a bit of effort :) But it is a one-time effort 
so ...

Thanks,
David

> /Robbin
> 
>>
>> David
>> -----
>>
>> On 23/11/2019 7:50 am, Daniel D. Daugherty wrote:
>>> Hi Robbin,
>>>
>>> Sorry I'm late to this review thread...
>>>
>>> I'm adding Serguei to this email thread since I'm making comments
>>> about the JVM/TI parts of this changeset...
>>>
>>>
>>>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html 
>>>
>>>
>>> src/hotspot/share/runtime/vmOperations.hpp
>>>      No comments.
>>>
>>> src/hotspot/share/runtime/vmOperations.cpp
>>>      No comments.
>>>
>>> src/hotspot/share/runtime/vmThread.hpp
>>>      L148:   // The ever running loop for the VMThread
>>>      L149:   void loop();
>>>      L150:   static void check_cleanup();
>>>          nit - Feels like an odd place to add check_cleanup().
>>>
>>>          Update: Now that I've seen what check_cleanup() does, it needs a
>>>          better name. Perhaps check_for_forced_cleanup()? And since
>>>          it is supposed to affect the running loop for the VMThread
>>>          I'm okay with its location now.
>>>
>>> src/hotspot/share/runtime/vmThread.cpp
>>>      L382:   event->set_blocking(true);
>>>          Probably have to keep the 'blocking' attribute in the event
>>>          for backward compatibility in the JFR record format?
>>>
>>>      L478:         // wait with a timeout to guarantee safepoints at regular intervals
>>>          Is this comment true anymore (even before this changeset)?
>>>          Adding this on the next line might help:
>>>
>>>                    // (if there is cleanup work to do)
>>>
>>>          since I _think_ that's how the policy has been evolved...
>>>
>>>      L479:         mu_queue.wait(GuaranteedSafepointInterval);
>>>          Please prefix with "(void)" to make it clear you are
>>>          intentionally ignoring the return value.
>>>
>>>      old L627-634 (We want to make sure that we get to a safepoint regularly)
>>>          I think this now old code is covered by your change above:
>>>
>>>          L488:         // If the queue contains a safepoint VM op,
>>>          L489:         // clean up will be done so we can skip this part.
>>>          L490:         if (!_vm_queue->peek_at_safepoint_priority()) {
>>>
>>>          Please confirm that our thinking is the same here.
>>>
>>>      L661:     int ticket =  t->vm_operation_ticket();
>>>          nit - extra space after '='
>>>
>>>      Okay. Definitely simpler code.
>>>
>>> src/hotspot/share/runtime/handshake.cpp
>>>      No comments.
>>>
>>> src/hotspot/share/runtime/safepoint.hpp
>>>      No comments.
>>>
>>> src/hotspot/share/runtime/safepoint.cpp
>>>      Definitely got my attention with
>>>      ObjectSynchronizer::needs_monitor_scavenge().
>>>
>>> src/hotspot/share/runtime/synchronizer.hpp
>>>      No comments.
>>>
>>> src/hotspot/share/runtime/synchronizer.cpp
>>>      L921:     log_info(monitorinflation)("Monitor scavenge needed, triggering safepoint cleanup.");
>>>          Thanks for adding the logging line.
>>>
>>>          Update: As Kim pointed out, this code goes away when
>>>          MonitorBound is made obsolete (JDK-8230940). I'm looking
>>>          forward to making that change.
>>>
>>>      L1003:   if (Atomic::load(&_forceMonitorScavenge) == 0 && Atomic::xchg (1, &_forceMonitorScavenge) == 0) {
>>>          nit - extra space between 'xchg ('
>>>
>>>          Since InduceScavenge() is only called when the deprecated
>>>          MonitorBound is specified, I think you could use cmpxchg()
>>>          for clarity. Of course, you might be thinking that the
>>>          pattern is a useful example for other folks to copy...
>>>
>>> src/hotspot/share/runtime/thread.cpp
>>>      old L527: // Enqueue a VM_Operation to do the job for us - sometime later
>>>      L527: void Thread::send_async_exception(oop java_thread, oop java_throwable) {
>>>      L528:   VM_ThreadStop vm_stop(java_thread, java_throwable);
>>>      L529:   VMThread::execute(&vm_stop);
>>>      L530: }
>>>         Okay so you deleted the comment about the call being async and the
>>>         VM op is no longer async, but does that break the expectation of
>>>         any callers?
>>>
>>>         Off the top of my head, I can't think of a way for a caller of
>>>         Thread::send_async_exception() to determine that the call is now
>>>         synchronous instead of asynchronous, but ...
>>>
>>>         Update: Just took a look at JvmtiEnv::StopThread() which calls
>>>         Thread::send_async_exception(). If JVM/TI StopThread() is being
>>>         used to throw an exception at the calling thread, I suspect that
>>>         in the baseline, the call would always return JVMTI_ERROR_NONE.
>>>         With the exception throwing now being synchronous, would that
>>>         affect the return value of the JVM/TI StopThread() call?
>>>
>>>         Looks like the JVM/TI wrapper (see gensrc/jvmtifiles/jvmtiEnter.cpp
>>>         in the build directory) uses ThreadInVMfromNative so the calling
>>>         thread is in VM when it requests the now synchronous VM operation.
>>>         When it requests the VM op, the calling thread will block which
>>>         should allow the VM thread to execute the op. No worries there so
>>>         far...
>>>
>>>         It looks like the code also uses CautiouslyPreserveExceptionMark
>>>         so I think if the exception is delivered to the calling thread
>>>         it won't affect the return from jvmti_env->StopThread(), i.e., we
>>>         will have our return value. The CautiouslyPreserveExceptionMark
>>>         destructor won't kick in until we return from jvmti_StopThread()
>>>         (the JVM/TI wrapper from the build).
>>>
>>>         However, that might cause this assertion to fire:
>>>
>>>         src/hotspot/share/utilities/preserveException.cpp:
>>>         assert(!_thread->has_pending_exception(), "unexpected exception generated");
>>>
>>>         because it is now detecting that an exception was thrown
>>>         while executing a JVM/TI call. This is pure theory here.
>>>
>>> src/hotspot/share/jfr/leakprofiler/utilities/vmOperation.hpp
>>>      No comments.
>>>
>>> src/hotspot/share/jfr/recorder/service/jfrRecorderService.cpp
>>>      No comments.
>>>
>>> src/hotspot/share/runtime/biasedLocking.cpp
>>>      old L85:     // Use async VM operation to avoid blocking the Watcher thread.
>>>          Again, you've deleted the comment, but is there going to
>>>          be any unexpected side effects from the change? Looks like
>>>          the work consists of:
>>>
>>>          L70: ClassLoaderDataGraph::dictionary_classes_do(enable_biased_locking);
>>>
>>>          Is that going to be a problem for the WatcherThread?
>>>
>>> test/hotspot/gtest/threadHelper.inline.hpp
>>>      No comments.
>>>
>>> As David H. likes to say: the proof is in the building and testing.
>>>
>>> Thumbs up on the overall idea and implementation. There might be an
>>> issue lurking there in JVM/TI StopThread(), but that's just a theory
>>> on my part...
>>>
>>> Dan
>>>

From david.holmes at oracle.com  Wed Nov 27 04:34:39 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 27 Nov 2019 14:34:39 +1000
Subject: hotspot support dynamic expansion method stack
In-Reply-To: <FF479F9D-636E-4678-9185-C2E51A2D46A0@ele.me>
References: <9645B0EE-CEA6-4447-8574-6E75B5099695@ele.me>
 <7ddcb0af-877e-f842-5016-2226d1cb7b0f@oracle.com>
 <FF479F9D-636E-4678-9185-C2E51A2D46A0@ele.me>
Message-ID: <f571e70d-6d71-252e-7bad-95748b58ce0c@oracle.com>

On 27/11/2019 2:13 pm, xun.chen at ele.me wrote:
> thanks, "hotspot uses threads with fixed stack sizes". Which jvm 
> vendors support dynamic stacks?

I'm not aware of any.

David
-----

> (ace), ELEME Inc.
> email: xun.chen at ele.me <mailto:xun.chen at ele.me> | mobile: +86 15216614939
> http://ele.me
> 
> 
> 
> 
>> On 27 Nov 2019, at 7:57 AM, David Holmes <david.holmes at oracle.com 
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>> Hi,
>>
>> On 26/11/2019 5:29 pm, xun.chen at ele.me <mailto:xun.chen at ele.me> wrote:
>>> hi,
>>> The Java virtual machine specification specifies two exception 
>>> conditions for this area: If the stack depth requested by the thread 
>>> is greater than the depth allowed by the virtual machine, a 
>>> StackOverflowError exception will be thrown. If the virtual machine 
>>> fails to apply for enough memory, it will throw an OutOfMemoryError exception.
>>> But does HotSpot support dynamic expansion of the method stack?
>>
>> hotspot uses threads with fixed stack sizes.
>>
>>> How to reproduce this phenomenon ?
>>
>> You want to know how to generate a StackOverflowError? Just keep 
>> recursing into a function.
>>
>> David
>> -----
>>
>>> (ace), ELEME Inc.
>>> email: xun.chen at ele.me <mailto:xun.chen at ele.me> | mobile: +86 15216614939
>>> http://ele.me
> 

From robbin.ehn at oracle.com  Wed Nov 27 10:09:13 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 27 Nov 2019 11:09:13 +0100
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <51c0e504-c822-78eb-2c46-a2190b1aeab0@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <51c0e504-c822-78eb-2c46-a2190b1aeab0@oracle.com>
Message-ID: <b1716d61-f7b4-c329-d049-abd0845c6115@oracle.com>

Thanks David!

I hope you noted that there is also a v5 with Dan's comments:
https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/037212.html
http://cr.openjdk.java.net/~rehn/8234086/v5/inc

Too many mails here :)

/Robbin

On 11/27/19 2:34 AM, David Holmes wrote:
> Hi Robbin,
> 
> Incremental v4 looks good.
> 
> Thanks,
> David
> 

From david.holmes at oracle.com  Wed Nov 27 10:33:52 2019
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 27 Nov 2019 20:33:52 +1000
Subject: RFR(s): 8234086: VM operation can be simplified
In-Reply-To: <b1716d61-f7b4-c329-d049-abd0845c6115@oracle.com>
References: <34c9d2f7-d6e3-f0df-6745-898d85c333ce@oracle.com>
 <327c2360-349e-8847-bed4-a03f9d9e6430@oracle.com>
 <c1aac43b-ab12-37a1-9b35-deef31b9b000@oracle.com>
 <f95b4ca4-f89b-4f7f-147c-9d251c207a60@oracle.com>
 <51c0e504-c822-78eb-2c46-a2190b1aeab0@oracle.com>
 <b1716d61-f7b4-c329-d049-abd0845c6115@oracle.com>
Message-ID: <8eadaa27-2d3e-b53b-abc9-5b0293de1d3f@oracle.com>

On 27/11/2019 8:09 pm, Robbin Ehn wrote:
> Thanks David!
> 
> I hope you noted that there is also a v5 with Dan's comments:
> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-November/037212.html 
> 
> http://cr.openjdk.java.net/~rehn/8234086/v5/inc

I hadn't noticed. Those changes are also fine. The use of xchg rather 
than cmpxchg is a good simplification given the value must be 0 or 1.
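
For readers following along, here is a minimal sketch of the three
variants discussed in this thread. It uses standard C++ std::atomic
rather than HotSpot's Atomic class, and the names
(force_monitor_scavenge, request_ttas, request_cas, request_xchg,
reset_flag, demo) are hypothetical, so treat it as an illustration of
the idea and not as the actual synchronizer.cpp code:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the flag: 0 = idle, 1 = scavenge requested.
static std::atomic<int> force_monitor_scavenge{0};

// Variant 1: load first, then exchange (the original pattern).
// The plain load lets late callers skip the read-modify-write
// once the flag is already set.
bool request_ttas() {
  return force_monitor_scavenge.load() == 0 &&
         force_monitor_scavenge.exchange(1) == 0;
}

// Variant 2: compare-and-swap from 0 to 1 (the cmpxchg suggestion).
bool request_cas() {
  int expected = 0;
  return force_monitor_scavenge.compare_exchange_strong(expected, 1);
}

// Variant 3: unconditional exchange. Because the flag only ever holds
// 0 or 1, storing 1 over an existing 1 changes nothing, and the old
// value tells the caller whether it was the one that set the flag.
bool request_xchg() {
  return force_monitor_scavenge.exchange(1) == 0;
}

// Clears the flag, as the consumer of the request would.
void reset_flag() {
  force_monitor_scavenge.store(0);
}

// Small self-check: after a reset only the first request wins,
// whichever variant is used.
void demo() {
  reset_flag();
  assert(request_xchg());
  assert(!request_cas());
  assert(!request_ttas());
}
```

With a flag restricted to 0 and 1, all three report success to exactly
one caller until the flag is reset. The load-first form only matters if
contention on the flag is a concern.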

> Too many mails here :)

:) Makes me wonder how things will work once we move to git ;-)

Thanks,
David

> 
> /Robbin
> 
> On 11/27/19 2:34 AM, David Holmes wrote:
>> Hi Robbin,
>>
>> Incremental v4 looks good.
>>
>> Thanks,
>> David
>>
>> On 23/11/2019 12:39 am, Robbin Ehn wrote:
>>> Hi David,
>>>
>>> On 11/22/19 7:13 AM, David Holmes wrote:
>>>> Hi Robbin,
>>>>
>>>> On 21/11/2019 9:50 pm, Robbin Ehn wrote:
>>>>> Hi,
>>>>>
>>>>> Here is v3:
>>>>>
>>>>> Full:
>>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/full/webrev/
>>>>
>>>> src/hotspot/share/runtime/synchronizer.cpp
>>>>
>>>> Looking at the highly discussed:
>>>>
>>>> if (Atomic::load(&ForceMonitorScavenge) == 0 && Atomic::xchg (1, 
>>>> &ForceMonitorScavenge) == 0) {
>>>>
>>>> why isn't that just:
>>>>
>>>> if (Atomic::cmpxchg(1, &ForceMonitorScavenge,0) == 0) {
>>>>
>>>> ??
>>>
>>> I assumed someone had seen contention on ForceMonitorScavenge.
>>> Many threads can be enter and re-enter here.
>>> I don't know if that's still the case.
>>>
>>> Since we only hit this path when the deprecated MonitorsBound is set, 
>>> I think I can change it?
>>>
>>>>
>>>> Also while we are here can we clean this up further:
>>>>
>>>> static volatile int ForceMonitorScavenge = 0;
>>>>
>>>> becomes
>>>>
>>>> static int _forceMonitorScavenge = 0;
>>>>
>>>> so the variable doesn't look like it came from globals.hpp :)
>>>>
>>>
>>> Sure!
>>>
>>>> Just to be clear, I understand the changes around monitor scavenging 
>>>> now, though I'm not sure getting rid of async VM ops and replacing 
>>>> with a new way to directly wakeup the VMThread really amounts to a 
>>>> simplification.
>>>>
>>>> ---
>>>>
>>>> src/hotspot/share/runtime/vmOperations.hpp
>>>>
>>>> I still think getting rid of Mode altogether would be a good 
>>>> simplification. :)
>>>
>>> Sure!
>>>
>>> Here is v4, inc:
>>> http://cr.openjdk.java.net/~rehn/8234086/v4/inc/webrev/index.html
>>> Full:
>>> http://cr.openjdk.java.net/~rehn/8234086/v4/full/webrev/index.html
>>>
>>> Tested t1-3
>>>
>>> Thanks, Robbin
>>>
>>>
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>
>>>>> Inc:
>>>>> http://cr.openjdk.java.net/~rehn/8234086/v3/inc/webrev/
>>>>>
>>>>> Tested t1-3
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>> On 2019-11-19 12:05, Robbin Ehn wrote:
>>>>>> Hi all, please review.
>>>>>>
>>>>>> CMS was the last real user of the more advantage features of VM 
>>>>>> operation.
>>>>>> VM operation can be simplified to always be an stack object and 
>>>>>> thus either be
>>>>>> of safepoint or no safepoint type.
>>>>>>
>>>>>> VM_EnableBiasedLocking is executed once by watcher thread, if 
>>>>>> needed (default not used). Making it synchronous doesn't matter.
>>>>>> VM_ThreadStop is executed by a JavaThread, that thread should stop 
>>>>>> for the safepoint anyways, no real point in not stopping directly.
>>>>>> VM_ScavengeMonitors is only used to trigger a safepoint cleanup, 
>>>>>> the VM op is not needed. Arguably this thread should actually stop 
>>>>>> here, since we are about to safepoint.
>>>>>>
>>>>>> There is also a small cleanup in vmThread.cpp where an unused 
>>>>>> method is removed.
>>>>>> And the extra safepoint is removed:
>>>>>> "// We want to make sure that we get to a safepoint regularly"
>>>>>> No we don't :)
>>>>>>
>>>>>> Issue:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8234086
>>>>>> Change-set:
>>>>>> http://cr.openjdk.java.net/~rehn/8234086/v1/webrev/index.html
>>>>>>
>>>>>> Tested scavenge manually, passes t1-2.
>>>>>>
>>>>>> Thanks, Robbin
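
For reference, a minimal sketch of the two forms discussed above for the
ForceMonitorScavenge check. It uses std::atomic rather than HotSpot's Atomic
class (an assumption, to keep the example self-contained), and the names are
hypothetical. The load-then-exchange form may avoid a write to the shared
cache line when the flag is already set; the compare-and-swap form is shorter
and does the check and the update in one step.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the HotSpot flag; not the actual HotSpot code.
static std::atomic<int> force_monitor_scavenge{0};

// Form 1: plain load first, then an unconditional exchange.
// A thread that sees the flag already set returns without writing.
inline bool try_claim_load_then_xchg() {
  return force_monitor_scavenge.load() == 0 &&
         force_monitor_scavenge.exchange(1) == 0;
}

// Form 2: a single compare-and-swap from 0 to 1.
// Returns true only for the thread that performed the 0 -> 1 transition.
inline bool try_claim_cmpxchg() {
  int expected = 0;
  return force_monitor_scavenge.compare_exchange_strong(expected, 1);
}
```

Both forms let exactly one caller win while the flag is 0; they differ only
in how much memory traffic the losing callers generate.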

From aph at redhat.com  Wed Nov 27 13:55:14 2019
From: aph at redhat.com (Andrew Haley)
Date: Wed, 27 Nov 2019 13:55:14 +0000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <5C6C2950-8162-4BCC-9D0B-1F268F0D8010@oracle.com>
References: <VI1PR0201MB24796A75CD45362D594678519A490@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <9F4A2CCB-532E-4738-8706-EB408048AECA@oracle.com>
 <cdcff990-a5c0-5123-6308-d47ce5d28974@oracle.com>
 <VI1PR0201MB24796AAC15DD4289729AD4409A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5f488e2b-c7d4-1efc-c1ec-2cb4251d8b93@oracle.com>
 <5C6C2950-8162-4BCC-9D0B-1F268F0D8010@oracle.com>
Message-ID: <eafe9e6f-5019-f9ca-1b38-a6bbdc66f786@redhat.com>

On 11/27/19 1:24 AM, John Rose wrote:
> For now, I will be glad to see us prove (by experience) that BL is no longer
> needed on today's workloads and hardware, and can be deleted.  Current
> evidence indicates this is possible. Barring new evidence, we need to clear
> the decks here for future work.

What evidence is that? We've already heard that one of the benchmarks
quoted (SPECjbb2015) was specifically written so that locks are more
or less contended, so that locks get out of their biased state.
According to the author of the benchmark, "therefore, arguing that
biased locking is not needed because SPECjbb2015 does not show the
benefit of having it enabled -- is circular."

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From ioi.lam at oracle.com  Wed Nov 27 18:29:13 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 27 Nov 2019 10:29:13 -0800
Subject: RFR(XXS) 8230385 [cds] No message is logged when shared image cannot
 be used due to mismatched configuration
Message-ID: <a757a55e-d294-c705-2357-f0e84ffc8171@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8230385
http://cr.openjdk.java.net/~iklam/jdk14/8230385-log-when-cds-is-disabled.v01/

Please review this one-liner addition that prints a log message when CDS 
cannot be used. Output:

$ java -XX:-UseCompressedOops -Xlog:cds -version
[0.001s][info][cds] Unable to use shared archive: UseCompressedOops and 
UseCompressedClassPointers must be on for UseSharedSpaces.
java version "14-internal" 2020-03-17
Java(TM) SE Runtime Environment (fastdebug build 
14-internal+0-adhoc.iklam.open)
Java HotSpot(TM) 64-Bit Server VM (fastdebug build 
14-internal+0-adhoc.iklam.open, mixed mode)

Thanks
- Ioi

From thomas.stuefe at gmail.com  Wed Nov 27 18:32:54 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Wed, 27 Nov 2019 19:32:54 +0100
Subject: RFR(XXS) 8230385 [cds] No message is logged when shared image
 cannot be used due to mismatched configuration
In-Reply-To: <a757a55e-d294-c705-2357-f0e84ffc8171@oracle.com>
References: <a757a55e-d294-c705-2357-f0e84ffc8171@oracle.com>
Message-ID: <CAA-vtUzcr6w3wOjsYv02ED_NT5wB39XhmVJRw1j5Efz+5cod9Q@mail.gmail.com>

Looks good Ioi. I'm surprised UL works this early.

Cheers, Thomas

On Wed, Nov 27, 2019 at 7:29 PM Ioi Lam <ioi.lam at oracle.com> wrote:

> https://bugs.openjdk.java.net/browse/JDK-8230385
>
> http://cr.openjdk.java.net/~iklam/jdk14/8230385-log-when-cds-is-disabled.v01/
>
> Please review this one-liner addition that prints a log message when CDS
> cannot be used. Output:
>
> $ java -XX:-UseCompressedOops -Xlog:cds -version
> [0.001s][info][cds] Unable to use shared archive: UseCompressedOops and
> UseCompressedClassPointers must be on for UseSharedSpaces.
> java version "14-internal" 2020-03-17
> Java(TM) SE Runtime Environment (fastdebug build
> 14-internal+0-adhoc.iklam.open)
> Java HotSpot(TM) 64-Bit Server VM (fastdebug build
> 14-internal+0-adhoc.iklam.open, mixed mode)
>
> Thanks
> - Ioi
>

From calvin.cheung at oracle.com  Wed Nov 27 21:04:08 2019
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Wed, 27 Nov 2019 13:04:08 -0800
Subject: RFR(XXS) 8230385 [cds] No message is logged when shared image
 cannot be used due to mismatched configuration
In-Reply-To: <a757a55e-d294-c705-2357-f0e84ffc8171@oracle.com>
References: <a757a55e-d294-c705-2357-f0e84ffc8171@oracle.com>
Message-ID: <3f5b2cec-8bde-0b38-43ec-e68fdef94d57@oracle.com>

Looks good.

thanks,

Calvin

On 11/27/19 10:29 AM, Ioi Lam wrote:
> https://bugs.openjdk.java.net/browse/JDK-8230385
> http://cr.openjdk.java.net/~iklam/jdk14/8230385-log-when-cds-is-disabled.v01/ 
>
>
> Please review this one-liner addition that prints a log message when 
> CDS cannot be used. Output:
>
> $ java -XX:-UseCompressedOops -Xlog:cds -version
> [0.001s][info][cds] Unable to use shared archive: UseCompressedOops 
> and UseCompressedClassPointers must be on for UseSharedSpaces.
> java version "14-internal" 2020-03-17
> Java(TM) SE Runtime Environment (fastdebug build 
> 14-internal+0-adhoc.iklam.open)
> Java HotSpot(TM) 64-Bit Server VM (fastdebug build 
> 14-internal+0-adhoc.iklam.open, mixed mode)
>
> Thanks
> - Ioi

From david.holmes at oracle.com  Wed Nov 27 21:08:31 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 28 Nov 2019 07:08:31 +1000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <eafe9e6f-5019-f9ca-1b38-a6bbdc66f786@redhat.com>
References: <VI1PR0201MB24796A75CD45362D594678519A490@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <9F4A2CCB-532E-4738-8706-EB408048AECA@oracle.com>
 <cdcff990-a5c0-5123-6308-d47ce5d28974@oracle.com>
 <VI1PR0201MB24796AAC15DD4289729AD4409A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5f488e2b-c7d4-1efc-c1ec-2cb4251d8b93@oracle.com>
 <5C6C2950-8162-4BCC-9D0B-1F268F0D8010@oracle.com>
 <eafe9e6f-5019-f9ca-1b38-a6bbdc66f786@redhat.com>
Message-ID: <085e21fb-31f9-48e3-1f06-294a3da079c3@oracle.com>

On 27/11/2019 11:55 pm, Andrew Haley wrote:
> On 11/27/19 1:24 AM, John Rose wrote:
>> For now, I will be glad to see us prove (by experience) that BL is no longer
>> needed on today's workloads and hardware, and can be deleted.  Current
>> evidence indicates this is possible. Barring new evidence, we need to clear
>> the decks here for future work.
> 
> What evidence is that? We've already heard that one of the benchmarks
> quoted (SPECjbb2015) was specifically written so that locks are more
> or less contended, so that locks get out of their biased state.
> According to the author of the benchmark, "therefore, arguing that
> biased locking is not needed because SPECjbb2015 does not show the
> benefit of having it enabled -- is circular."

SPECjbb2015 was used as supporting evidence that a modern benchmark that 
people like to measure and report is not impacted by this change. As I 
now know there is a good reason for that. But if SPECjbb2015 is somehow 
supposed to be representative of some class of real applications then 
this also bodes well for those applications.

David

From david.holmes at oracle.com  Wed Nov 27 21:39:51 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 28 Nov 2019 07:39:51 +1000
Subject: RFR(XXS) 8230385 [cds] No message is logged when shared image
 cannot be used due to mismatched configuration
In-Reply-To: <CAA-vtUzcr6w3wOjsYv02ED_NT5wB39XhmVJRw1j5Efz+5cod9Q@mail.gmail.com>
References: <a757a55e-d294-c705-2357-f0e84ffc8171@oracle.com>
 <CAA-vtUzcr6w3wOjsYv02ED_NT5wB39XhmVJRw1j5Efz+5cod9Q@mail.gmail.com>
Message-ID: <72d36881-b82c-10d4-1d5d-d6a2f07b677e@oracle.com>

On 28/11/2019 4:32 am, Thomas Stüfe wrote:
> Looks good Ioi. I'm surprised UL works this early.

I'm also surprised. no_shared_spaces() is called from three different 
paths - will they all work? It is hard to see by code inspection.

Cheers,
David

> Cheers, Thomas
> 
> On Wed, Nov 27, 2019 at 7:29 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> 
>> https://bugs.openjdk.java.net/browse/JDK-8230385
>>
>> http://cr.openjdk.java.net/~iklam/jdk14/8230385-log-when-cds-is-disabled.v01/
>>
>> Please review this one-liner addition that prints a log message when CDS
>> cannot be used. Output:
>>
>> $ java -XX:-UseCompressedOops -Xlog:cds -version
>> [0.001s][info][cds] Unable to use shared archive: UseCompressedOops and
>> UseCompressedClassPointers must be on for UseSharedSpaces.
>> java version "14-internal" 2020-03-17
>> Java(TM) SE Runtime Environment (fastdebug build
>> 14-internal+0-adhoc.iklam.open)
>> Java HotSpot(TM) 64-Bit Server VM (fastdebug build
>> 14-internal+0-adhoc.iklam.open, mixed mode)
>>
>> Thanks
>> - Ioi
>>

From fw at deneb.enyo.de  Wed Nov 27 22:30:37 2019
From: fw at deneb.enyo.de (Florian Weimer)
Date: Wed, 27 Nov 2019 23:30:37 +0100
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com> (David Holmes's
 message of "Mon, 18 Nov 2019 21:27:41 +1000")
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <6b913f48-b4f8-5069-e10f-fb966432942b@redhat.com>
 <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com>
Message-ID: <878so1lzoy.fsf@mid.deneb.enyo.de>

* David Holmes:

> For a micro-benchmark like that sure. But is that at all representative 
> of real modern code? We know some of the really old benchmarks used 
> synchronized collections and StringBuffer extensively and so they also 
> benefit from biased-locking. But more modern benchmarks are not showing 
> any benefit.
>
> We'd like to know the impact on real applications but we have no way to 
> know that a-priori. So we're either stuck with the burden of supporting 
> biased-locking forever, or we flip the switch to turn it off and see if 
> it causes too many issues. Unless you see another way to determine this?

How important is biased locking for JNI libraries which use
synchronization to prevent incorrect concurrent use of objects (with
JNI resources) from resulting in arbitrary memory corruption?

I've always thought that biased locking is really attractive for this
scenario because these locks are just overhead if the library is used
correctly.  One wouldn't want to use ReentrantLocks here precisely
because of the benefits of biased locking.

On the other hand, the SWIG binding generator does not systematically
emit such synchronization.  There is an attempt to deal with the
finalization race condition, but I have doubts regarding its
effectiveness because the SWIG-generated code does not ensure object
reachability after JNI calls (which are apparently implemented as
static native methods).

If biased locking were determined important to JNI code, it would be
interesting to consider how its performance gains relate to the
general JNI overhead.

From ioi.lam at oracle.com  Thu Nov 28 00:41:34 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 27 Nov 2019 16:41:34 -0800
Subject: RFR(XXS) 8230385 [cds] No message is logged when shared image
 cannot be used due to mismatched configuration
In-Reply-To: <72d36881-b82c-10d4-1d5d-d6a2f07b677e@oracle.com>
References: <a757a55e-d294-c705-2357-f0e84ffc8171@oracle.com>
 <CAA-vtUzcr6w3wOjsYv02ED_NT5wB39XhmVJRw1j5Efz+5cod9Q@mail.gmail.com>
 <72d36881-b82c-10d4-1d5d-d6a2f07b677e@oracle.com>
Message-ID: <b7286c1a-1249-e4c8-97ab-86912e9f8e3c@oracle.com>


On 11/27/19 1:39 PM, David Holmes wrote:
> On 28/11/2019 4:32 am, Thomas Stüfe wrote:
>> Looks good Ioi. I'm surprised UL works this early.
>
> I'm also surprised. no_shared_spaces() is called from three different 
> paths - will they all work? It is hard to see by code inspection.
>
The three paths are all called after the VM arguments have been parsed, 
at which point UL has already been set up. So it should be safe to call 
UL. E.g., if you tell UL to save to a file:

$ java -XX:-UseCompressedOops -Xlog:cds:file=foo.txt -version
java version "14-internal" 2020-03-17
Java(TM) SE Runtime Environment (fastdebug build 
14-internal+0-adhoc.iklam.open)
Java HotSpot(TM) 64-Bit Server VM (fastdebug build 
14-internal+0-adhoc.iklam.open, mixed mode)
$ cat foo.txt
[0.001s][info][cds] Unable to use shared archive: UseCompressedOops and 
UseCompressedClassPointers must be on for UseSharedSpaces.

In fact, all three call sites are preceded by something like:

    if (SomeJVMOption) { ....

So if this were done before all the VM arguments are parsed, we would 
already be doing something wrong ...

There are a bunch of log_debug/log_info calls in arguments.cpp. I think 
they all have the same assumption. This is kind of iffy, but that's 
beyond the scope of this patch :-)

Thanks
- Ioi


> Cheers,
> David
>
>> Cheers, Thomas
>>
>> On Wed, Nov 27, 2019 at 7:29 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>
>>> https://bugs.openjdk.java.net/browse/JDK-8230385
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk14/8230385-log-when-cds-is-disabled.v01/ 
>>>
>>>
>>> Please review this one-liner addition that prints a log message when 
>>> CDS
>>> cannot be used. Output:
>>>
>>> $ java -XX:-UseCompressedOops -Xlog:cds -version
>>> [0.001s][info][cds] Unable to use shared archive: UseCompressedOops and
>>> UseCompressedClassPointers must be on for UseSharedSpaces.
>>> java version "14-internal" 2020-03-17
>>> Java(TM) SE Runtime Environment (fastdebug build
>>> 14-internal+0-adhoc.iklam.open)
>>> Java HotSpot(TM) 64-Bit Server VM (fastdebug build
>>> 14-internal+0-adhoc.iklam.open, mixed mode)
>>>
>>> Thanks
>>> - Ioi
>>>


From david.holmes at oracle.com  Thu Nov 28 00:49:03 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 28 Nov 2019 10:49:03 +1000
Subject: RFR(XXS) 8230385 [cds] No message is logged when shared image
 cannot be used due to mismatched configuration
In-Reply-To: <b7286c1a-1249-e4c8-97ab-86912e9f8e3c@oracle.com>
References: <a757a55e-d294-c705-2357-f0e84ffc8171@oracle.com>
 <CAA-vtUzcr6w3wOjsYv02ED_NT5wB39XhmVJRw1j5Efz+5cod9Q@mail.gmail.com>
 <72d36881-b82c-10d4-1d5d-d6a2f07b677e@oracle.com>
 <b7286c1a-1249-e4c8-97ab-86912e9f8e3c@oracle.com>
Message-ID: <ab74c925-f532-9f0c-bbb9-7ab478f1bbcf@oracle.com>

On 28/11/2019 10:41 am, Ioi Lam wrote:
> 
> 
> On 11/27/19 1:39 PM, David Holmes wrote:
>> On 28/11/2019 4:32 am, Thomas Stüfe wrote:
>>> Looks good Ioi. I'm surprised UL works this early.
>>
>> I'm also surprised. no_shared_spaces() is called from three different 
>> paths - will they all work? It is hard to see by code inspection.
>>
> The three paths are all called after the VM arguments have been parsed, 
> at which point UL has already been set up. So it should be safe to call 
> UL. E.g., if you tell UL to save to a file:
> 
> $ java -XX:-UseCompressedOops -Xlog:cds:file=foo.txt -version
> java version "14-internal" 2020-03-17
> Java(TM) SE Runtime Environment (fastdebug build 
> 14-internal+0-adhoc.iklam.open)
> Java HotSpot(TM) 64-Bit Server VM (fastdebug build 
> 14-internal+0-adhoc.iklam.open, mixed mode)
> $ cat foo.txt
> [0.001s][info][cds] Unable to use shared archive: UseCompressedOops and 
> UseCompressedClassPointers must be on for UseSharedSpaces.
> 
> In fact, all three call sites are preceded by something like:
> 
>     if (SomeJVMOption) { ....
> 
> So if this were done before all the VM arguments are parsed, we would 
> already be doing something wrong ...
> 
> There are a bunch of log_debug/log_info calls in arguments.cpp. I think 
> they all have the same assumption. This is kind of iffy, but that's 
> beyond the scope of this patch :-)

Okay thanks for clarifying.

David

> Thanks
> - Ioi
> 
> 
>> Cheers,
>> David
>>
>>> Cheers, Thomas
>>>
>>> On Wed, Nov 27, 2019 at 7:29 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8230385
>>>>
>>>> http://cr.openjdk.java.net/~iklam/jdk14/8230385-log-when-cds-is-disabled.v01/ 
>>>>
>>>>
>>>> Please review this one-liner addition that prints a log message when 
>>>> CDS
>>>> cannot be used. Output:
>>>>
>>>> $ java -XX:-UseCompressedOops -Xlog:cds -version
>>>> [0.001s][info][cds] Unable to use shared archive: UseCompressedOops and
>>>> UseCompressedClassPointers must be on for UseSharedSpaces.
>>>> java version "14-internal" 2020-03-17
>>>> Java(TM) SE Runtime Environment (fastdebug build
>>>> 14-internal+0-adhoc.iklam.open)
>>>> Java HotSpot(TM) 64-Bit Server VM (fastdebug build
>>>> 14-internal+0-adhoc.iklam.open, mixed mode)
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
> 

From john.r.rose at oracle.com  Thu Nov 28 03:05:11 2019
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 27 Nov 2019 19:05:11 -0800
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <2eb6f996-cd85-00cb-b795-dd2eefabd10b@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <d12ae70f-77ad-b2d5-8af6-87a9d13a725f@oracle.com>
 <2686ec0d-3dac-a0c3-d2c3-0fa5211bc07b@oracle.com>
 <2eb6f996-cd85-00cb-b795-dd2eefabd10b@oracle.com>
Message-ID: <FB5656BF-068D-486F-BF4C-244A106BFCA5@oracle.com>

I too would expect these things to be placed in globalDefinitions.hpp
rather than align.hpp, for the reasons already given:  There are already
similar functions in there.

I'm not against cleaning up globalDefinitions.hpp, but until there's
a better proposal on the table, let's stay with it for functions like this.

-- John

On Nov 26, 2019, at 2:23 AM, David Holmes <david.holmes at oracle.com> wrote:
> 
> On 26/11/2019 8:06 pm, Claes Redestad wrote:
>> On 2019-11-26 10:50, David Holmes wrote:
>>> Hi Claes,
>>> 
>>> Just some high-level comments
>>> 
>>> - should next_power_of_two be defined in globalDefinitions.hpp alongside the related functionality, i.e. is_power_of_two?
>> I thought we are trying to move things _out_ of globalDefinitions. I
> 
> We are? I don't recall hearing that. But wherever these go seems they all belong together.
> 
>> agree align.hpp might not be the best place, either, though..
> 
> I thought align.hpp a strange place too. :)
> 
>>> 
>>> - can next_power_of_two build on the existing log2_* functions (or vice versa)?
>> Yes, log2_intptr et al could probably be tamed to do a single step
>> operation, although we'd need to add 64-bit implementations in
>> count_leading_zeros. At least these log2_* functions already deal with
>> overflows without looping forever.
>>> 
>>> - do the existing ZUtils not cover the same general area?
>>> 
>>> ./share/gc/z/zUtils.inline.hpp
>>> 
>>> inline size_t ZUtils::round_up_power_of_2(size_t value) {
>>>    assert(value != 0, "Invalid value");
>>> 
>>>    if (is_power_of_2(value)) {
>>>      return value;
>>>    }
>>> 
>>>    return (size_t)1 << (log2_intptr(value) + 1);
>>> }
>>> 
>>> inline size_t ZUtils::round_down_power_of_2(size_t value) {
>>>    assert(value != 0, "Invalid value");
>>>    return (size_t)1 << log2_intptr(value);
>>> }
>> round_up_power_of_2 is similar, but not identical (next_power_of_two doesn't care if the value is already a power of 2, nor should it).
> 
> Okay but seems perhaps these should also be moved out of ZUtils and co-located with the other "power of two" functions.
> 
> Cheers,
> David
> -----
> 
>> /Claes


From david.holmes at oracle.com  Thu Nov 28 04:16:30 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 28 Nov 2019 14:16:30 +1000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <878so1lzoy.fsf@mid.deneb.enyo.de>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <6b913f48-b4f8-5069-e10f-fb966432942b@redhat.com>
 <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com>
 <878so1lzoy.fsf@mid.deneb.enyo.de>
Message-ID: <19782729-f3ef-59b7-0830-b5ba2002746d@oracle.com>

Hi Florian,

On 28/11/2019 8:30 am, Florian Weimer wrote:
> * David Holmes:
> 
>> For a micro-benchmark like that sure. But is that at all representative
>> of real modern code? We know some of the really old benchmarks used
>> synchronized collections and StringBuffer extensively and so they also
>> benefit from biased-locking. But more modern benchmarks are not showing
>> any benefit.
>>
>> We'd like to know the impact on real applications but we have no way to
>> know that a-priori. So we're either stuck with the burden of supporting
>> biased-locking forever, or we flip the switch to turn it off and see if
>> it causes too many issues. Unless you see another way to determine this?
> 
> How important is biased locking for JNI libraries which use
> synchronization to prevent incorrect concurrent use of objects (with
> JNI resources) from resulting in arbitrary memory corruption?
> 
> I've always thought that biased locking is really attractive for this
> scenario because these locks are just overhead if the library is used
> correctly.  One wouldn't want to use ReentrantLocks here precisely
> because of the benefits of biased locking.

It depends on exactly what you mean. If you use the JNI MonitorEnter 
function it revokes any bias and inflates the monitor thus making 
BiasedLocking irrelevant to that scenario. If JNI calls a synchronized 
Java method it works the same as a non-JNI call of that method.

David

> On the other hand, the SWIG binding generator does not systematically
> emit such synchronization.  There is an attempt to deal with the
> finalization race condition, but I have doubts regarding its
> effectiveness because the SWIG-generated code does not ensure object
> reachability after JNI calls (which are apparently implemented as
> static native methods).
> 
> If biased locking were determined important to JNI code, it would be
> interesting to consider how its performance gains relate to the
> general JNI overhead.
> 

From vicente.romero at oracle.com  Thu Nov 28 04:37:21 2019
From: vicente.romero at oracle.com (Vicente Romero)
Date: Wed, 27 Nov 2019 23:37:21 -0500
Subject: RFR: JEP 359: Records (Preview) (full code)
Message-ID: <12069074-7830-8bf6-3818-1df7e2a29f18@oracle.com>

Hi,

Please review the code for the records feature at [1]. This webrev 
includes everything: APIs, runtime, compiler, serialization, javadoc, and 
more! Most of the code has been reviewed, but there have been some changes 
since reviewers saw it. Also, this is the first time an integral webrev 
is sent out for review. The latest changes since the last review 
iterations, off the top of my head:

On the compiler implementation:
- it has been adapted to the latest version of the language spec [2]; for 
reference, the JVM spec is at [3]. This implied some changes in 
determining if a user defined constructor is the canonical or not. Now 
if a constructor is override-equivalent to a signature derived from the 
record components, then it is considered the canonical constructor. And 
any canonical constructor should satisfy a set of restrictions, see 
section 8.10.4 Record Constructor Declarations of the specification.
- A check was also added to make sure that accessors are not generic.
- And that the canonical constructor, if user defined, is not explicitly 
invoking any other constructor.
- The list of forbidden record component names has also been updated.
- new error messages have been added

APIs:
- there has been some API editing in java.lang.Record, 
java.lang.runtime.ObjectMethods and java.lang.reflect.RecordComponent, 
java.io.ObjectInputStream, javax.lang.model (some visitors were added)

On the JVM implementation:
- some logging capabilities have been added to classFileParser.cpp to 
provide the reason for which the Record attribute has been ignored

Reflection:
- there are several new changes to the implementation of 
java.lang.reflect.RecordComponent apart from the spec changes mentioned 
before.

bug fixes in
- compiler
- serialization,
- JVM, etc

As a reference the last iteration of the previous reviews can be found 
at [4] under folders: compiler, hotspot_runtime, javadoc, reflection and 
serialization.

TIA,
Vicente

[1] http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.00/
[2] 
http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jls.html
[3] 
http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jvms.html
[4] http://cr.openjdk.java.net/~vromero/records.review/


From fw at deneb.enyo.de  Thu Nov 28 06:45:30 2019
From: fw at deneb.enyo.de (Florian Weimer)
Date: Thu, 28 Nov 2019 07:45:30 +0100
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <19782729-f3ef-59b7-0830-b5ba2002746d@oracle.com> (David Holmes's
 message of "Thu, 28 Nov 2019 14:16:30 +1000")
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <6b913f48-b4f8-5069-e10f-fb966432942b@redhat.com>
 <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com>
 <878so1lzoy.fsf@mid.deneb.enyo.de>
 <19782729-f3ef-59b7-0830-b5ba2002746d@oracle.com>
Message-ID: <87blswbit1.fsf@mid.deneb.enyo.de>

* David Holmes:

> Hi Florian,
>
> On 28/11/2019 8:30 am, Florian Weimer wrote:
>> * David Holmes:
>> 
>>> For a micro-benchmark like that sure. But is that at all representative
>>> of real modern code? We know some of the really old benchmarks used
>>> synchronized collections and StringBuffer extensively and so they also
>>> benefit from biased-locking. But more modern benchmarks are not showing
>>> any benefit.
>>>
>>> We'd like to know the impact on real applications but we have no way to
>>> know that a-priori. So we're either stuck with the burden of supporting
>>> biased-locking forever, or we flip the switch to turn it off and see if
>>> it causes too many issues. Unless you see another way to determine this?
>> 
>> How important is biased locking for JNI libraries which use
>> synchronization to prevent incorrect concurrent use of objects (with
>> JNI resources) from resulting in arbitrary memory corruption?
>> 
>> I've always thought that biased locking is really attractive for this
>> scenario because these locks are just overhead if the library is used
>> correctly.  One wouldn't want to use ReentrantLocks here precisely
>> because of the benefits of biased locking.
>
> It depends on exactly what you mean. If you use the JNI MonitorEnter 
> function it revokes any bias and inflates the monitor thus making 
> BiasedLocking irrelevant to that scenario. If JNI calls a synchronized 
> Java method it works the same as a non-JNI call of that method.

Sorry, I meant synchronized wrappers in Java code, calling JNI methods
which do no synchronization.  Kind of what SWIG does for delete(),
only consistent for everything.

From thomas.stuefe at gmail.com  Thu Nov 28 07:34:49 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Thu, 28 Nov 2019 08:34:49 +0100
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
Message-ID: <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>

Hi Claes,

I think this is useful. Why not a 64bit variant too? If you do not want to
go through the hassle of providing a count_leading_zeros(uint64_t), you
could call the 32bit variant twice and take care of endianness for the
caller.
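
A rough sketch of what I mean by calling the 32bit variant twice. The names
clz32, clz64 and next_power_of_two64 are hypothetical, the loop-based clz32 is
only a portable stand-in for the real count_leading_zeros, and the "strictly
greater" semantics are assumed from the rest of this thread. Splitting the
value with shifts operates on the value, not on memory, so byte order never
comes into play.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for a 32-bit count_leading_zeros (hypothetical; a real
// implementation would use a compiler intrinsic). Precondition: x != 0.
inline unsigned clz32(uint32_t x) {
  assert(x != 0);
  unsigned n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    n++;
  }
  return n;
}

// 64-bit variant built from two 32-bit calls: look at the high half first,
// fall back to the low half (plus 32) if the high half is all zeros.
inline unsigned clz64(uint64_t x) {
  assert(x != 0);
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  if (hi != 0) {
    return clz32(hi);
  }
  return 32 + clz32(static_cast<uint32_t>(x));
}

// Smallest power of two strictly greater than value (assumed semantics).
inline uint64_t next_power_of_two64(uint64_t value) {
  if (value == 0) {
    return 1;
  }
  unsigned lz = clz64(value);
  assert(lz > 0);  // top bit set: the result would not fit in 64 bits
  return uint64_t(1) << (64 - lz);
}
```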

--

In inline int32_t next_power_of_two(int32_t value) , should we weed out
negative input values right away instead of asserting at the end of the
function?

--

The functions will always return the next power of two, even if the input
is a power of two - e.g. "2" for "1". Is that intended? It would be nice to
have an API comment in the header describing these corner cases (what
happens for negative input, what happens if input is a power of 2).

--

The patch can cause subtle differences in some caller code, I think, if
input value is a power of 2 already. See e.g:

http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/libadt/dict.cpp.udiff.html

-  i=16;
-  while( i < size ) i <<= 1;
+  i = MAX2(16, (int)next_power_of_two(size));

If i == size == 16, old code would keep i==16, new code would come to
i==32, I think.

http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/opto/phaseX.cpp.udiff.html

 //------------------------------round_up---------------------------------------
 // Round up to nearest power of 2
-uint NodeHash::round_up( uint x ) {
-  x += (x>>2);                  // Add 25% slop
-  if( x <16 ) return 16;        // Small stuff
-  uint i=16;
-  while( i < x ) i <<= 1;       // Double to fit
-  return i;                     // Return hash table size
+uint NodeHash::round_up(uint x) {
+  x += (x >> 2);                  // Add 25% slop
+  return MAX2(16U, next_power_of_two(x));
 }

same here. If x == 16, before we'd return 16, now 32.
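
To make the difference concrete, here is a small sketch that puts the old and
new round_up bodies side by side. The next_power_of_two below is a loop-based
stand-in with the semantics I am assuming from the webrev (smallest power of
two strictly greater than the input), not the patch's implementation, and
round_up_old / round_up_new are hypothetical free-function names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed semantics: smallest power of two strictly greater than x.
// Loop-based stand-in; the precondition keeps the loop from overflowing.
inline uint32_t next_power_of_two(uint32_t x) {
  assert(x < 0x80000000u);
  uint32_t p = 1;
  while (p <= x) {
    p <<= 1;
  }
  return p;
}

// Old logic, following the removed lines of the diff.
inline uint32_t round_up_old(uint32_t x) {
  x += (x >> 2);              // Add 25% slop
  if (x < 16) return 16;      // Small stuff
  uint32_t i = 16;
  while (i < x) i <<= 1;      // Double to fit
  return i;
}

// New logic, following the added lines of the diff.
inline uint32_t round_up_new(uint32_t x) {
  x += (x >> 2);              // Add 25% slop
  return std::max<uint32_t>(16u, next_power_of_two(x));
}
```

With an input of 13 the slop brings x to exactly 16: the old code returns 16,
the new code 32. For inputs where x after slop is not a power of two, both
agree.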

---

http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/runtime/threadSMR.cpp.udiff.html

I admit I do not understand the current coding :) I do not believe it works
for all input values, e.g. were get_java_thread_list()->length()==1025,
we'd get 1861 - if I am not mistaken. Your code is definitely clearer but
not equivalent to the old one.

---

In the end, I wonder whether we should have two kinds of APIs, or a
parameter, distinguishing between "next power of 2" and "next power of 2
unless input value is already power of 2".
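
Something along these lines, as a sketch only. The function names and the
bit-smearing implementation are my assumptions, not what the webrev does; the
point is just that the second variant can be layered on the first, and that
preconditions are checked up front rather than at the end.

```cpp
#include <cassert>
#include <cstdint>

inline bool is_power_of_two(uint32_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// "Next power of 2": smallest power of two strictly greater than x.
inline uint32_t next_power_of_two(uint32_t x) {
  assert(x < 0x80000000u);  // otherwise the result does not fit in 32 bits
  // Copy the highest set bit into every lower position, then add one.
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  return x + 1;
}

// "Next power of 2 unless input is already a power of 2".
inline uint32_t round_up_power_of_two(uint32_t x) {
  assert(x != 0 && x <= 0x80000000u);
  return is_power_of_two(x) ? x : next_power_of_two(x);
}
```

Callers such as the dict.cpp and phaseX.cpp loops would then keep their old
results by using the second variant.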

Cheers, Thomas


On Tue, Nov 26, 2019 at 10:42 AM Claes Redestad <claes.redestad at oracle.com>
wrote:

> Hi,
>
> in various places in the hotspot we have custom code to calculate the
> next power of two, some of which have potential to go into an infinite
> loop in case of an overflow.
>
> This patch proposes adding next_power_of_two utility methods which
> avoid infinite loops on overflow, while providing slightly more
> efficient code in most cases.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234331
> Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/
>
> Testing: tier1-3
>
> Thanks!
>
> /Claes
>

From thomas.stuefe at gmail.com  Thu Nov 28 07:44:25 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Thu, 28 Nov 2019 08:44:25 +0100
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>
Message-ID: <CAA-vtUzPCHtQz3BVnLFSrW1+jUSbtwiJU9EJk5FKU2Q854O_qA@mail.gmail.com>

p.s. I think it would be good to have some gtests for these functions,
especially to test corner cases.

Cheers, Thomas

On Thu, Nov 28, 2019 at 8:34 AM Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:

> Hi Claes,
>
> I think this is useful. Why not a 64bit variant too? If you do not want to
> go through the hassle of providing a count_leading_zeros(uint64_t), you
> could call the 32bit variant twice and take care of endianness for the
> caller.
>
> --
>
> In inline int32_t next_power_of_two(int32_t value) , should we weed out
> negative input values right away instead of asserting at the end of the
> function?
>
> --
>
> The functions will always return the next power of two, even if the input
> is a power of two - e.g. "2" for "1". Is that intended? It would be nice to
> have an API comment in the header describing these corner cases (what
> happens for negative input, what happens if input is power 2).
>
> --
>
> The patch can cause subtle differences in some caller code, I think, if
> input value is a power of 2 already. See e.g:
>
>
> http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/libadt/dict.cpp.udiff.html
>
> -  i=16;
> -  while( i < size ) i <<= 1;
> +  i = MAX2(16, (int)next_power_of_two(size));
>
> If i == size == 16, old code would keep i==16, new code would come to
> i==32, I think.
>
>
> http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/opto/phaseX.cpp.udiff.html
>
>
>  //------------------------------round_up---------------------------------------
>  // Round up to nearest power of 2
> -uint NodeHash::round_up( uint x ) {
> -  x += (x>>2);                  // Add 25% slop
> -  if( x <16 ) return 16;        // Small stuff
> -  uint i=16;
> -  while( i < x ) i <<= 1;       // Double to fit
> -  return i;                     // Return hash table size
> +uint NodeHash::round_up(uint x) {
> +  x += (x >> 2);                  // Add 25% slop
> +  return MAX2(16U, next_power_of_two(x));
>  }
>
> same here. If x == 16, before we'd return 16, now 32.
>
> ---
>
>
> http://cr.openjdk.java.net/~redestad/8234331/open.01/src/hotspot/share/runtime/threadSMR.cpp.udiff.html
>
> I admit I do not understand the current coding :) I do not believe it
> works for all input values, e.g. were
> get_java_thread_list()->length()==1025, we'd get 1861 - if I am not
> mistaken. Your code is definitely clearer but not equivalent to the old one.
>
> ---
>
> In the end, I wonder whether we should have two kind of APIs, or a
> parameter, distinguishing between "next power of 2" and "next power of 2
> unless input value is already power of 2".
>
> Cheers, Thomas
>
>
>
>
>
> On Tue, Nov 26, 2019 at 10:42 AM Claes Redestad <claes.redestad at oracle.com>
> wrote:
>
>> Hi,
>>
>> in various places in the hotspot we have custom code to calculate the
>> next power of two, some of which have potential to go into an infinite
>> loop in case of an overflow.
>>
>> This patch proposes adding next_power_of_two utility methods which
>> avoid infinite loops on overflow, while providing slightly more
>> efficient code in most cases.
>>
>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8234331
>> Webrev: http://cr.openjdk.java.net/~redestad/8234331/open.01/
>>
>> Testing: tier1-3
>>
>> Thanks!
>>
>> /Claes
>>
>

From aph at redhat.com  Thu Nov 28 10:12:59 2019
From: aph at redhat.com (Andrew Haley)
Date: Thu, 28 Nov 2019 10:12:59 +0000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <878so1lzoy.fsf@mid.deneb.enyo.de>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <6b913f48-b4f8-5069-e10f-fb966432942b@redhat.com>
 <46bfbeb1-9a50-62e8-0a3a-1709356ed534@oracle.com>
 <878so1lzoy.fsf@mid.deneb.enyo.de>
Message-ID: <eef66216-c5d5-add9-0eb8-40a994d5109c@redhat.com>

On 11/27/19 10:30 PM, Florian Weimer wrote:
> If biased locking were determined important to JNI code, it would be
> interesting to consider how its performance gains relate to the
> general JNI overhead.

JNI overhead is significant, partly because of the need for the
VM-native state changes to be atomic. I don't think a couple of CAS
operations will make a large difference.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From aph at redhat.com  Thu Nov 28 10:27:11 2019
From: aph at redhat.com (Andrew Haley)
Date: Thu, 28 Nov 2019 10:27:11 +0000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <085e21fb-31f9-48e3-1f06-294a3da079c3@oracle.com>
References: <VI1PR0201MB24796A75CD45362D594678519A490@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <9F4A2CCB-532E-4738-8706-EB408048AECA@oracle.com>
 <cdcff990-a5c0-5123-6308-d47ce5d28974@oracle.com>
 <VI1PR0201MB24796AAC15DD4289729AD4409A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5f488e2b-c7d4-1efc-c1ec-2cb4251d8b93@oracle.com>
 <5C6C2950-8162-4BCC-9D0B-1F268F0D8010@oracle.com>
 <eafe9e6f-5019-f9ca-1b38-a6bbdc66f786@redhat.com>
 <085e21fb-31f9-48e3-1f06-294a3da079c3@oracle.com>
Message-ID: <c3e1ca3c-004b-328c-468b-d375a1dc684d@redhat.com>

On 11/27/19 9:08 PM, David Holmes wrote:
> SPECjbb2015 was used as supporting evidence that a modern benchmark
> that people like to measure and report is not impacted by this
> change. As I now know there is a good reason for that. But if
> SPECjbb2015 is somehow supposed to be representative of some class
> of real applications then this also bodes well for those
> applications.

I completely agree.


However, so far we know that:

When uncontended, biased locking is almost an order of magnitude
faster than un-biased locking.

In a couple of benchmarks which use uncontended biased locking, the
performance advantage it brings is 3% - 5%.

ReentrantLocks are about the same speed as un-biased synchronized blocks.

Another claim -- which I make without any evidence -- is that there is
a lot of old Java code out there.


Like many others I would love to be rid of biased locking, and I am
perpetually disappointed by how effective it is.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From david.holmes at oracle.com  Thu Nov 28 12:41:36 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 28 Nov 2019 22:41:36 +1000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <c3e1ca3c-004b-328c-468b-d375a1dc684d@redhat.com>
References: <VI1PR0201MB24796A75CD45362D594678519A490@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <9F4A2CCB-532E-4738-8706-EB408048AECA@oracle.com>
 <cdcff990-a5c0-5123-6308-d47ce5d28974@oracle.com>
 <VI1PR0201MB24796AAC15DD4289729AD4409A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5f488e2b-c7d4-1efc-c1ec-2cb4251d8b93@oracle.com>
 <5C6C2950-8162-4BCC-9D0B-1F268F0D8010@oracle.com>
 <eafe9e6f-5019-f9ca-1b38-a6bbdc66f786@redhat.com>
 <085e21fb-31f9-48e3-1f06-294a3da079c3@oracle.com>
 <c3e1ca3c-004b-328c-468b-d375a1dc684d@redhat.com>
Message-ID: <fc2d2f62-9930-4ab5-bdba-c4837301f99f@oracle.com>

On 28/11/2019 8:27 pm, Andrew Haley wrote:
> On 11/27/19 9:08 PM, David Holmes wrote:
>> SPECjbb2015 was used as supporting evidence that a modern benchmark
>> that people like to measure and report is not impacted by this
>> change. As I now know there is a good reason for that. But if
>> SPECjbb2015 is somehow supposed to be representative of some class
>> of real applications then this also bodes well for those
>> applications.
> 
> I completely agree.
> 
> 
> However, so far we know that:
> 
> When uncontended, biased locking is almost an order of magnitude
> faster than un-biased locking.

Okay, but let's be very clear about what "uncontended locking" is. We're
not talking about infrequent multi-threaded access to a shared object 
such that there is no contention on the lock. We are talking about 
performing locking on an object that is never actually shared. As soon 
as another thread locks an object the bias is revoked and that is the 
end of BL for that object (modulo the special bulk revocation case).

> In a couple of benchmarks which use uncontended biased locking, the
> performance advantage it brings is 3% - 5%.

Yes, and? Benchmark results are only of interest if they are 
representative of real world code.

> ReentrantLocks are about the same speed as un-biased synchronized blocks.
> 
> Another claim -- which I make without any evidence -- is that there is
> a lot of old Java code out there.

Perhaps - no way to know. And some fraction of that may benefit from 
biased-locking. And some fraction of that may benefit so much that it is 
noticeable. And some fraction of that is important enough to someone 
that they flag the issue.

> 
> Like many others I would love to be rid of biased locking, and I am
> perpetually disappointed by how effective it is.

Biased-locking is effective at what it does - that is not in dispute. 
What we are questioning is how much code really needs it today.

We will be stuck with it forever if we can't move forward somehow.

Cheers,
David


From adinn at redhat.com  Thu Nov 28 13:43:09 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 28 Nov 2019 13:43:09 +0000
Subject: RFR(XS): 8233466: aarch64: remove unnecessary load of mdo when
 profiling return and parameters type
In-Reply-To: <DA41BE1DDCA941489001C7FBD7A8820EDAA21F3E@dggeml527-mbx.china.huawei.com>
References: <DA41BE1DDCA941489001C7FBD7A8820EDAA21F3E@dggeml527-mbx.china.huawei.com>
Message-ID: <d5dd8664-5a6c-0517-f052-5eea02a06990@redhat.com>

Hi Felix,

On 25/11/2019 11:33, Yangfei (Felix) wrote:
> Ping?   Any comments?

Yes, that load into mdp is redundant. x86 omits the load and so should
AArch64. The patch is good.


regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill


From vicente.romero at oracle.com  Thu Nov 28 16:05:52 2019
From: vicente.romero at oracle.com (Vicente Romero)
Date: Thu, 28 Nov 2019 11:05:52 -0500
Subject: RFR: JEP 359: Records (Preview) (full code)
In-Reply-To: <12069074-7830-8bf6-3818-1df7e2a29f18@oracle.com>
References: <12069074-7830-8bf6-3818-1df7e2a29f18@oracle.com>
Message-ID: <32b8c703-523f-ae83-291d-4f1b28fa1d91@oracle.com>

Hi again,

Sorry but I realized that I forgot to remove some code on the compiler 
side. The code removed is small, before we were issuing an error if some 
serialization methods were declared as record members. That section was 
removed from the spec. I have prepared another iteration with this 
change at [1]

Thanks,
Vicente

[1] http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.01/

On 11/27/19 11:37 PM, Vicente Romero wrote:
> Hi,
>
> Please review the code for the records feature at [1]. This webrev 
> includes all: APIs, runtime, compiler, serialization, javadoc, and 
> more! Most of the code has been reviewed but there have been some 
> changes since reviewers saw it. Also this is the first time an 
> integral webrev is sent out for review. Last changes on top of my mind 
> since last review iterations:
>
> On the compiler implementation:
> - it has been adapted to the last version of the language spec [2], as 
> a reference the JVM spec is at [3]. This implied some changes in 
> determining if a user defined constructor is the canonical or not. Now 
> if a constructor is override-equivalent to a signature derived from 
> the record components, then it is considered the canonical 
> constructor. And any canonical constructor should satisfy a set of 
> restrictions, see section 8.10.4 Record Constructor Declarations of 
> the specification.
> - A check was also added to make sure that accessors are not generic.
> - And that the canonical constructor, if user defined, is not 
> explicitly invoking any other constructor.
> - The list of forbidden record component names has also been updated.
> - new error messages have been added
>
> APIs:
> - there have been some API editing in java.lang.Record, 
> java.lang.runtime.ObjectMethods and java.lang.reflect.RecordComponent, 
> java.io.ObjectInputStream, javax.lang.model (some visitors were added)
>
> On the JVM implementation:
> - some logging capabilities have been added to classFileParser.cpp to 
> provide the reason for which the Record attribute has been ignored
>
> Reflection:
> - there are several new changes to the implementation of 
> java.lang.reflect.RecordComponent apart from the spec changes 
> mentioned before.
>
> bug fixes in
> - compiler
> - serialization,
> - JVM, etc
>
> As a reference the last iteration of the previous reviews can be found 
> at [4] under folders: compiler, hotspot_runtime, javadoc, reflection 
> and serialization,
>
> TIA,
> Vicente
>
> [1] 
> http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.00/
> [2] 
> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jls.html
> [3] 
> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jvms.html
> [4] http://cr.openjdk.java.net/~vromero/records.review/
>


From thomas.stuefe at gmail.com  Thu Nov 28 17:03:26 2019
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Thu, 28 Nov 2019 18:03:26 +0100
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <fc2d2f62-9930-4ab5-bdba-c4837301f99f@oracle.com>
References: <VI1PR0201MB24796A75CD45362D594678519A490@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <9F4A2CCB-532E-4738-8706-EB408048AECA@oracle.com>
 <cdcff990-a5c0-5123-6308-d47ce5d28974@oracle.com>
 <VI1PR0201MB24796AAC15DD4289729AD4409A450@VI1PR0201MB2479.eurprd02.prod.outlook.com>
 <5f488e2b-c7d4-1efc-c1ec-2cb4251d8b93@oracle.com>
 <5C6C2950-8162-4BCC-9D0B-1F268F0D8010@oracle.com>
 <eafe9e6f-5019-f9ca-1b38-a6bbdc66f786@redhat.com>
 <085e21fb-31f9-48e3-1f06-294a3da079c3@oracle.com>
 <c3e1ca3c-004b-328c-468b-d375a1dc684d@redhat.com>
 <fc2d2f62-9930-4ab5-bdba-c4837301f99f@oracle.com>
Message-ID: <CAA-vtUxWRXROxS_c26vXwcX8OijM0eArOnbBDZO=oaLJw8t9AA@mail.gmail.com>

Hi David,

On Thu, Nov 28, 2019 at 1:41 PM David Holmes <david.holmes at oracle.com>
wrote:

> On 28/11/2019 8:27 pm, Andrew Haley wrote:
> > On 11/27/19 9:08 PM, David Holmes wrote:
> >> SPECjbb2015 was used as supporting evidence that a modern benchmark
> >> that people like to measure and report is not impacted by this
> >> change. As I now know there is a good reason for that. But if
> >> SPECjbb2015 is somehow supposed to be representative of some class
> >> of real applications then this also bodes well for those
> >> applications.
> >
> > I completely agree.
> >
> >
> > However, so far we know that:
> >
> > When uncontended, biased locking is almost an order of magnitude
> > faster than un-biased locking.
>
> Okay, but let's be very clear about what "uncontended locking" is. We're
> not talking about infrequent multi-threaded access to a shared object
> such that there is no contention on the lock. We are talking about
> performing locking on an object that is never actually shared. As soon
> as another thread locks an object the bias is revoked and that is the
> end of BL for that object (modulo the special bulk revocation case).
>
> > In a couple of benchmarks which use uncontended biased locking, the
> > performance advantage it brings is 3% - 5%.
>
> Yes, and? Benchmark results are only of interest if they are
> representative of real world code.
>
> > ReentrantLocks are about the same speed as un-biased synchronized blocks.
> >
> > Another claim -- which I make without any evidence -- is that there is
> > a lot of old Java code out there.
>
> Perhaps - no way to know. And some fraction of that may benefit from
> biased-locking. And some fraction of that may benefit so much that it is
> noticeable. And some fraction of that is important enough to someone
> that they flag the issue.
>
> >
> > Like many others I would love to be rid of biased locking, and I am
> > perpetually disappointed by how effective it is.
>
> Biased-locking is effective at what it does - that is not in dispute.
> What we are questioning is how much code really needs it today.
>
>
I'm curious, has Oracle done some sort of market research to see the actual
impact this change would have? Or will the deprecation and
disabling-by-default of BL be the market research? And if yes, how would
you even get the result? Customers may see a performance regression without
ever attributing it to BL or without ever telling anyone about it, they may
just sigh and go back to earlier releases. And I would like customers to
have motivation to go to new releases, not to stay on old ones.

My gut feeling is that the vast majority of code in the field is legacy
code, and IMHO a good VM is one which runs legacy code well. Similar to the
Linux kernel which aims to "never break user space".


> We will be stuck with it forever if we can't move forward somehow.
>
> Cheers,
> David
>
>
Cheers, Thomas

From adinn at redhat.com  Thu Nov 28 17:18:20 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 28 Nov 2019 17:18:20 +0000
Subject: RFR 8231264: Disable biased-locking and deprecate all flags
 related to biased-locking
In-Reply-To: <a89922f0-2e53-f312-4458-9ad7bb8d4476@oracle.com>
References: <a07b2b85-77b6-7441-ff78-49b15b2c6fbe@oracle.com>
 <fe549cc9-fba7-9a15-eed6-832717acdee0@oracle.com>
 <HE1PR0201MB24756A4232EFC1AFAA26BD7D9A4C0@HE1PR0201MB2475.eurprd02.prod.outlook.com>
 <34110470-9006-072b-d88c-22f145dee363@oracle.com>
 <249648ff-084b-00bc-4c70-14024471e082@redhat.com>
 <a89922f0-2e53-f312-4458-9ad7bb8d4476@oracle.com>
Message-ID: <c06d79c8-d82a-9c32-dc49-f352d3d43198@redhat.com>

Hi Daniel,

TL;DR Aargh, loom!

Otherwise see inline comments.

On 22/11/2019 15:06, Daniel D. Daugherty wrote:
> On 11/22/19 5:14 AM, Andrew Dinn wrote:
>> On 21/11/2019 21:31, David Holmes wrote:
>>> On 20/11/2019 2:51 am, Doerr, Martin wrote:
>>>> I think deprecating before publishing an evaluation or at least having
>>>> a discussion is not appropriate.
>>> Deprecation shows the intent that we (eventually) want to remove this
>>> and that people should try to avoid using it. If we don't actually
>>> deprecate it but just turn off then here is a likely scenario:
>> Who is this we?
> 
> The way you edited the original thread makes it look like the "we" comes
> out of nowhere. Okay. Not sure why you did that, but here's the complete
> context:

That's a fair point with the one proviso that /I/ didn't actually do any
of the editing . . .

> . . .
> So now David's use of "we" should be more clear. I do have to point out
> that my use of "we (Oracle)" was present in both Martin's reply and in
> David's reply to Martin, but for some reason you chose to edit it out.
> This makes your pushing back on David's use of an unqualified "we"
> questionable. Are you trying to be intentionally confrontational?

Thank you for the careful exegesis. A-and, no! Of course I'm not trying
to be intentionally confrontational. I merely made the mistake of
responding to David's somewhat elliptical reduction of your original words.

>> The premise of your scenario has built in the conclusion
>> that some of /us/ are questioning and thereby excluded our critique from
>> any chance of qualifying the proposed action.
> 
> Hmmm... I don't see anyone excluding critiques here, but maybe I've missed
> something...

Well, it seems I missed the ellipsis in David's statement -- where 'we'
was used in place of your original 'we (Oracle)'. Reading his statement
without awareness that several notes back 'we' was so qualified makes it
look as if he is committing the error of /petitio principii/ to bypass
the critique made in those intervening notes.

However, even granted the qualifier your textual spadework has restored
I still can't see how his use of 'we (Oracle)' justifies the notion that
'we (OpenJDK project)' should go ahead and remove biased locking
without (to quote Martin Doerr, the person he was replying to)
'publishing an evaluation or at least having a discussion'. David didn't
actually answer that point, he merely reiterated the far less
contentious claim that switching it off without deprecation would be bad.

>> If the existence of such a consensus is not clear (and I suggest that
>> this thread makes that plain) and the evidence for arriving at such a
>> consensus is not compelling (ditto) and if the rest of the scenario will
>> likely play out as you suggest then that is a strong reason to
>> re-address the decision to switch the feature off, whether or not it is
>> deprecated at the same time.
> 
> I suspect that "compelling" is in the eye of the beholder.
> 
> Simply changing the default from true to false is pretty much a silent
> change in behavior even if we put out a release note. By deprecating at
> the same time, we'll have a visible diagnostic message if biased locking
> is enabled. That's much more likely to lead to feedback than a silent
> change in behavior.

Of course, I agree that a silent change would not be good enough. The
issue still stands that little or no real evidence has (strictly, had --
see below) been presented for any change, /silent or not/. The only
clear reason offered at the point where I replied was the failure of
biased locking to improve performance in a specific benchmark. However
as Aleksey (one of the benchmark's developers) pointed out that
benchmark was designed explicitly to test cases where biased locking
would not help. I don't know what your eyesight is like but surely you
too need to squint to see that as sufficient?

>>> Alternatively we deprecate in 14 and customer lets us know straight away
>>> that it is still useful.
>> Alternatively, we come up with better evidence that it needs switching
>> off (and, possibly, deprecating).
> 
> I wonder what would be considered acceptable "better evidence".

Well, as Marlon Brando said: What have you got?

It's hardly appropriate to ask /me/ to specify in advance what reason
/you/ need to give. Contrariwise, a request for /you/ to provide the
reason may be rather boringly conventional but it is not exactly out of
place.

n.b. the small hand grenade spelled 'loom' that was lobbed by Erik
Osterlund into an adjacent note in this thread might be a good place to
start ;-). I'm more than happy to surrender biased locking in the face
of the nightmare problems that loom's fragmentation of the notion of
identity implies for an implementation.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill


From jianglizhou at google.com  Thu Nov 28 17:33:46 2019
From: jianglizhou at google.com (Jiangli Zhou)
Date: Thu, 28 Nov 2019 09:33:46 -0800
Subject: RFR JDK-8232222: Set state to 'linked' when an archived class is
 restored at runtime
Message-ID: <CALrW1jxBmX2QA+R4=v5S2QzDJrwVumRu0vqKvhPmf2JtME3t7w@mail.gmail.com>

Hi,

Please review the following optimization related to archived classes'
(from builtin loaders) runtime restoration and linking. It is not
intended for OpenJDK 14. After reviewers review the change, I'll wait
for 14 fork before pushing.

webrev: http://cr.openjdk.java.net/~jiangli/8232222/webrev.00/
RFE: https://bugs.openjdk.java.net/browse/JDK-8232222

Motivation and details of the change, which are duplicated in the RFE
=====================================================

When linking a class, InstanceKlass::link_class_impl() first links all
super classes and super interfaces of the current class. For the
current class, it then verifies and rewrites the bytecode, links
methods, initializes the itable and vtable, and sets the current class
to 'linked' state.

When loading an archived class at runtime,
SystemDictionary::load_shared_class makes sure the super types (all
super classes and super interfaces) in the class hierarchy are loaded
first. If not, the archived class is not used. The archived class is
restored when 'loading' from the archive. At the end of the
restoration, all methods are linked. As bytecode verification and
rewriting are done at CDS dump time, the runtime does not redo the
operations for an archived class.

If we make sure the itable and vtable are properly initialized (not
needed for classes loaded by the NULL class loader) and
SystemDictionaryShared::check_verification_constraints is performed
for an archived class during restoration, then the archived class
(from builtin loaders) is effectively in 'linked' state.

For all archived classes loaded by the builtin loaders, we can safely
set to 'linked' state at the end of restoration. As a result, we can
save the work for iterating the super types in
InstanceKlass::link_class_impl() at runtime.
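
As a rough illustration of the saving described above, here is a toy model
of the link walk. Every name in it (ToyKlass, ToyState, toy_link_class,
toy_restore_archived) is hypothetical and only mirrors the shape of the
description; it is not HotSpot's InstanceKlass code and it ignores
verification constraints, itables and vtables entirely.

```cpp
#include <cassert>
#include <vector>

enum class ToyState { loaded, linked };

struct ToyKlass {
  ToyState state = ToyState::loaded;
  std::vector<ToyKlass*> supers;  // super class and super interfaces
};

// Toy stand-in for the link step: returns how many classes were visited.
inline int toy_link_class(ToyKlass* k) {
  if (k->state == ToyState::linked) {
    return 1;  // already linked: no walk over the super types
  }
  int visited = 1;
  for (ToyKlass* s : k->supers) {
    visited += toy_link_class(s);  // super types are linked first
  }
  // (verification, rewriting, method linking, itable/vtable setup go here)
  k->state = ToyState::linked;
  return visited;
}

// Toy stand-in for restoring an archived class; the proposed change is
// modelled by the mark_linked flag.
inline void toy_restore_archived(ToyKlass* k, bool mark_linked) {
  if (mark_linked) {
    k->state = ToyState::linked;
  }
}
```

In this model, linking the bottom of a three-class chain visits all three
classes when none is marked, and only one class when restoration has
already marked it as linked.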

Performance results
================

With both JDK 11 and the latest jdk/jdk, the proposed change saves
~1.5M instruction execution when running HelloWorld with the default
CDS. Please see raw data in the RFE. For applications using more
archived classes at runtime, larger saving should be experienced.

Testing
======
Tested with all jtreg cds/* tests, which include all appcds tests.
Submit repo testing passed.

The change has also gone through internal testing with very large
number of tests (all with default CDS enabled) for more than a month.

Best,
Jiangli

From maurizio.cimadamore at oracle.com  Thu Nov 28 18:35:26 2019
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Thu, 28 Nov 2019 18:35:26 +0000
Subject: RFR: JEP 359: Records (Preview) (full code)
In-Reply-To: <32b8c703-523f-ae83-291d-4f1b28fa1d91@oracle.com>
References: <12069074-7830-8bf6-3818-1df7e2a29f18@oracle.com>
 <32b8c703-523f-ae83-291d-4f1b28fa1d91@oracle.com>
Message-ID: <ba1516ea-7bff-3b05-0c58-d2a9fc9b3d70@oracle.com>

Hi Vicente,
generally looks good - few comments below; I tried to focus on areas 
where the compiler code seemed to diverge from the spec, as well as on 
pieces of code which look like leftovers from previous spec rounds.

* canonical constructors *can* have return statements - compact 
constructors can't; the spec went a bit back and forth on this, but now 
it has settled. Since compact constructors are turned into ordinary 
canonical ones by the parser, I think you need to add an extra check for 
COMPACT_RECORD_CONSTRUCTOR; in other words, this is ok:

record Foo() {
    Foo() { return; } //canonical
}

this isn't

record Foo() {
    Foo { return; } //compact
}

but the compiler rejects both. This probably means tweaking the 
diagnostic a bit to say "a compact constructor must not contain |return| 
statements"

* in general, all diagnostics speak about 'canonical constructor' 
regardless of whether the constructor is canonical or compact; while I 
understand the reason behind what we get now, some of the error messages 
can be confusing, especially if you look at the spec, where canonical 
constructor and compact constructor are two different concepts. This 
should be fixed (even if not immediately, in which case I'd recommend to 
file a JBS issue to track that)

* static accessors are allowed - this means that I can do this:

record Foo(int x) {
    public static int x() { return 0; }

    public static void main(String[] args) {
        System.err.println(new Foo(42).x());
    }
}

This will compile and print 0. The classfile will contain the following 
members:

final class Foo extends java.lang.Record {
  public Foo(int);
  public static int x();
  public static void main(java.lang.String[]);
  public java.lang.String toString();
  public final int hashCode();
  public final boolean equals(java.lang.Object);
}

I believe this is an issue in the compiler, but also in the latest spec 
draft, if I'm not mistaken.

* [optional - style] the env.info.isSerializableLambda could become an 
enum { NONE, LAMBDA, SERIALIZABLE_LAMBDA } instead of two boolean parameters

* this code is still rejected with --enable-preview _disabled_:

class X {
    record R(int i) {
        return null;
    }
}
class record {}

This gives the error:

Error:
|  records are a preview feature and are disabled by default.
|    (use --enable-preview to enable records)
|  record R(int i) { return null } }
|  ^
|  Error:
|  illegal start of type
|  record R(int i) { return null } }
|                    ^

In other words, the parsing logic for members is too aggressive - we 
shouldn't call isRecordStart() in there. If this is not fixed in this 
round, we should keep track with a JBS issue.

* Are the changes in Tokens really needed?

* Check::checkUnique doesn't seem to use the added 'env' parameter - 
changes should be reverted

* Names.java - the logic for having forbiddenRecordComponentNames could 
use some refresh - in the latest spec we basically have to ban 
components that have same name as j.l.Object members - so I think we can 
implement the check more directly (e.g. w/o having a set of names). 
Also, the serialization names are not needed (although I guess they will 
come back at some point). And, not sure what "get" and "set" are needed for?


Maurizio

On 28/11/2019 16:05, Vicente Romero wrote:
> Hi again,
>
> Sorry but I realized that I forgot to remove some code on the compiler 
> side. The code removed is small, before we were issuing an error if 
> some serialization methods were declared as record members. That 
> section was removed from the spec. I have prepared another iteration 
> with this change at [1]
>
> Thanks,
> Vicente
>
> [1] 
> http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.01/
>
> On 11/27/19 11:37 PM, Vicente Romero wrote:
>> Hi,
>>
>> Please review the code for the records feature at [1]. This webrev 
>> includes all: APIs, runtime, compiler, serialization, javadoc, and 
>> more! Most of the code has been reviewed but there have been some 
>> changes since reviewers saw it. Also this is the first time an 
>> integral webrev is sent out for review. Last changes on top of my 
>> mind since last review iterations:
>>
>> On the compiler implementation:
>> - it has been adapted to the last version of the language spec [2], 
>> as a reference the JVM spec is at [3]. This implied some changes in 
>> determining if a user defined constructor is the canonical or not. 
>> Now if a constructor is override-equivalent to a signature derived 
>> from the record components, then it is considered the canonical 
>> constructor. And any canonical constructor should satisfy a set of 
>> restrictions, see section 8.10.4 Record Constructor Declarations of 
>> the specification.
>> - A check was also added to make sure that accessors are not generic.
>> - And that the canonical constructor, if user defined, is not 
>> explicitly invoking any other constructor.
>> - The list of forbidden record component names has also been updated.
>> - new error messages have been added
>>
>> APIs:
>> - there have been some API editing in java.lang.Record, 
>> java.lang.runtime.ObjectMethods and 
>> java.lang.reflect.RecordComponent, java.io.ObjectInputStream, 
>> javax.lang.model (some visitors were added)
>>
>> On the JVM implementation:
>> - some logging capabilities have been added to classFileParser.cpp to 
>> provide the reason for which the Record attribute has been ignored
>>
>> Reflection:
>> - there are several new changes to the implementation of 
>> java.lang.reflect.RecordComponent apart from the spec changes 
>> mentioned before.
>>
>> bug fixes in
>> - compiler
>> - serialization,
>> - JVM, etc
>>
>> As a reference the last iteration of the previous reviews can be 
>> found at [4] under folders: compiler, hotspot_runtime, javadoc, 
>> reflection and serialization,
>>
>> TIA,
>> Vicente
>>
>> [1] 
>> http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.00/
>> [2] 
>> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jls.html
>> [3] 
>> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jvms.html
>> [4] http://cr.openjdk.java.net/~vromero/records.review/
>>
>

From john.r.rose at oracle.com  Thu Nov 28 20:23:26 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 28 Nov 2019 12:23:26 -0800
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>
Message-ID: <C5533D91-B35E-4956-BD67-41FDEC505F74@oracle.com>

On Nov 27, 2019, at 11:34 PM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> 
> In the end, I wonder whether we should have two kind of APIs, or a
> parameter, distinguishing between "next power of 2" and "next power of 2
> unless input value is already power of 2".

Naming is important for clarity.  "Round up" means if it's already "rounded"
(whatever that means) the input is returned unchanged.

The other notion is a true "next up", because it always increases,
barring overflow.  The possibility of overflow makes the "next up" function
more bug-prone than the "round up" function.

The usual trick for deriving that second function is to add one to the argument
to the first function, ensuring that the result will always increase.

If (for clarity) we implement a "next power of two" function, rather than ask
coders to use the +1 trick, the second function should be implemented in terms of
the first function using the +1 trick, maybe with an assert added against overflow.
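
A minimal Java sketch of the two notions described above, for illustration
only. It is not the HotSpot utility under review; the class and method names
are hypothetical, and an exception stands in for the suggested assert against
overflow.

```java
final class PowerOfTwoSketch {
    // Largest power of two representable in a signed 32-bit int.
    static final int MAX_POWER_OF_TWO = 1 << 30;

    // "Round up": a value that is already a power of two is returned unchanged.
    static int roundUpPowerOfTwo(int value) {
        if (value <= 0 || value > MAX_POWER_OF_TWO) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        // For value > 1, highestOneBit(value - 1) << 1 is the smallest
        // power of two that is >= value.
        return value == 1 ? 1 : Integer.highestOneBit(value - 1) << 1;
    }

    // "Next up": always strictly increases; derived from the first
    // function with the +1 trick, guarded against overflow.
    static int nextPowerOfTwo(int value) {
        if (value < 0 || value >= MAX_POWER_OF_TWO) {
            throw new IllegalArgumentException("would overflow: " + value);
        }
        return roundUpPowerOfTwo(value + 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpPowerOfTwo(4)); // already rounded
        System.out.println(nextPowerOfTwo(4));    // strictly larger
    }
}
```

With these definitions, an input of 4 is left alone by the first function and
moved to 8 by the second.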

My $0.02.

- John


From claes.redestad at oracle.com  Thu Nov 28 21:02:56 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 28 Nov 2019 22:02:56 +0100
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <C5533D91-B35E-4956-BD67-41FDEC505F74@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>
 <C5533D91-B35E-4956-BD67-41FDEC505F74@oracle.com>
Message-ID: <7e25e62f-46c1-ec08-2b2f-60436251d12b@oracle.com>

I'm working on a new version, but I'm also out sick, so don't expect
anything soon.

I just want to point out that the "round up to power of 2"
implementations I've seen seem prone to the same kind of overflows as a
next up would, just not for exactly the same set of inputs.
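
To illustrate the point, here is a small Java sketch (not taken from any of
the implementations under review; names are hypothetical) of an unchecked
"round up" and an unchecked "next up" on 32-bit ints. Under these
assumptions, the round-up variant wraps for inputs above 2^30, while the
next-up variant already wraps at 2^30.

```java
final class PowerOfTwoOverflowSketch {
    // Unchecked "round up": no guard against results that do not fit in an int.
    static int naiveRoundUp(int value) {
        return value <= 1 ? 1 : Integer.highestOneBit(value - 1) << 1;
    }

    // Unchecked "next up", derived with the +1 trick.
    static int naiveNextUp(int value) {
        return naiveRoundUp(value + 1);
    }

    public static void main(String[] args) {
        int limit = 1 << 30; // largest power of two that fits in an int
        System.out.println(naiveRoundUp(limit));     // still representable
        System.out.println(naiveRoundUp(limit + 1)); // wraps to a negative value
        System.out.println(naiveNextUp(limit));      // wraps one input earlier
    }
}
```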

/Claes

On 2019-11-28 21:23, John Rose wrote:
> On Nov 27, 2019, at 11:34 PM, Thomas Stüfe <thomas.stuefe at gmail.com 
> <mailto:thomas.stuefe at gmail.com>> wrote:
>>
>> In the end, I wonder whether we should have two kind of APIs, or a
>> parameter, distinguishing between "next power of 2" and "next power of 2
>> unless input value is already power of 2".
> 
> Naming is important for clarity. "Round up" means if it's already "rounded"
> (whatever that means) the input is returned unchanged.
> 
> The other notion is a true "next up", because it always increases,
> barring overflow. The possibility of overflow makes the "next up" function
> more bug-prone than the "round up" function.
> 
> The usual trick for deriving that second function is to add one to the
> argument to the first function, ensuring that the result will always increase.
> 
> If (for clarity) we implement a "next power of two" function, rather than
> ask coders to use the +1 trick, the second function should be implemented in
> terms of the first function using the +1 trick, maybe with an assert added
> against overflow.
> 
> My $0.02.
> 
> - John
> 

From felix.yang at huawei.com  Fri Nov 29 02:41:27 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Fri, 29 Nov 2019 02:41:27 +0000
Subject: RFR(XS): 8233466: aarch64: remove unnecessary load of mdo when
 profiling return and parameters type
In-Reply-To: <d5dd8664-5a6c-0517-f052-5eea02a06990@redhat.com>
References: <DA41BE1DDCA941489001C7FBD7A8820EDAA21F3E@dggeml527-mbx.china.huawei.com>
 <d5dd8664-5a6c-0517-f052-5eea02a06990@redhat.com>
Message-ID: <DA41BE1DDCA941489001C7FBD7A8820EDAA24A9E@dggeml527-mbx.china.huawei.com>

> -----Original Message-----
> From: Andrew Dinn [mailto:adinn at redhat.com]
> Sent: Thursday, November 28, 2019 9:43 PM
> To: Yangfei (Felix) <felix.yang at huawei.com>;
> hotspot-runtime-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
> Subject: Re: RFR(XS): 8233466: aarch64: remove unnecessary load of mdo when
> profiling return and parameters type
> 
> Hi Felix,
> 
> On 25/11/2019 11:33, Yangfei (Felix) wrote:
> > Ping?   Any comments?
> 
> Yes, that load into mdp is redundant. x86 omits the load and so should AArch64.
> The patch is good.
> 
Hi Andrew, 

  Thanks for reviewing.  Pushed: http://hg.openjdk.java.net/jdk/jdk/rev/fc216dcef2bb

Felix

From Pengfei.Li at arm.com  Fri Nov 29 03:56:50 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Fri, 29 Nov 2019 03:56:50 +0000
Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64
Message-ID: <DB7PR08MB3115121B5DBB949E8A96A2FF96460@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi,

Please help review this small fix for 64-bit client build.

Webrev: http://cr.openjdk.java.net/~pli/rfr/8234791/webrev.00/
JBS: https://bugs.openjdk.java.net/browse/JDK-8234791

The current 64-bit Client VM build fails because errors occur while dumping
the CDS archive. In JDK 12, we enabled "Default CDS Archives"[1], which
runs "java -Xshare:dump" after linking the JDK image. But in a Client VM
build on 64-bit platforms, the ergonomic flag UseCompressedOops is not
set.[2] This causes the VM to exit while checking the flags for dumping the
shared archive.[3]

This change removes the "#if defined" macro so that the shared archive dump
succeeds in the 64-bit client build. Tracking the history of the macro,
I found that it was initially added as "#ifndef COMPILER1"[4] 10 years ago,
when C1 did not have good support for compressed oops, and was modified to
its current shape[5] in the implementation of tiered compilation. It should
be safe to remove today.

This patch also fixes another client build issue on AArch64.

[1] http://openjdk.java.net/jeps/341
[2] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l1694
[3] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l3551
[4] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/323bd24c6520#l11.7
[5] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/d5d065957597#l86.56

--
Thanks,
Pengfei


From adinn at redhat.com  Fri Nov 29 09:19:12 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 29 Nov 2019 09:19:12 +0000
Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64
In-Reply-To: <DB7PR08MB3115121B5DBB949E8A96A2FF96460@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115121B5DBB949E8A96A2FF96460@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <3e79b1e4-c548-dc3d-075f-e04e496d3863@redhat.com>

Hi Pengfei,

On 29/11/2019 03:56, Pengfei Li (Arm Technology China) wrote:

> Please help review this small fix for 64-bit client build.
> 
> Webrev: http://cr.openjdk.java.net/~pli/rfr/8234791/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8234791
> 
> Current 64-bit client VM build fails because errors occurred in dumping
> the CDS archive. In JDK 12, we enabled "Default CDS Archives"[1] which
> runs "java -Xshare:dump" after linking the JDK image. But for Client VM
> build on 64-bit platforms, the ergonomic flag UseCompressedOops is not
> set.[2] This leads to VM exits in checking the flags for dumping the
> shared archive.[3]
> 
> This change removes the "#if defined" macro to make shared archive dump
> successful in 64-bit client build. By tracking the history of the macro,
> I found it is initially added as "#ifndef COMPILER1"[4] 10 years ago
> when C1 did not have a good support of compressed oops and modified to
> current shape[5] in the implementation of tiered compilation. It should
> be safe to be removed today.
> 
> This patch also fixes another client build issue on AArch64.
> 
> [1] http://openjdk.java.net/jeps/341
> [2] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l1694
> [3] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l3551
> [4] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/323bd24c6520#l11.7
> [5] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/d5d065957597#l86.56
Your explanation sounds correct and the change to arguments.cpp looks good.

Can you explain why you have modified sharedRuntime_aarch64.cpp to
include nativeInst_aarch64.hpp? I don't see any other change in the
source file that would make this necessary.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill


From Pengfei.Li at arm.com  Fri Nov 29 10:01:37 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Fri, 29 Nov 2019 10:01:37 +0000
Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64
In-Reply-To: <3e79b1e4-c548-dc3d-075f-e04e496d3863@redhat.com>
References: <DB7PR08MB3115121B5DBB949E8A96A2FF96460@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <3e79b1e4-c548-dc3d-075f-e04e496d3863@redhat.com>
Message-ID: <DB7PR08MB311513D3C81A3BAB1ECABF0A96460@DB7PR08MB3115.eurprd08.prod.outlook.com>

Hi Andrew Dinn,

> Your explanation sounds correct and the change to arguments.cpp looks
> good.
> 
> Can you explain why you have modified sharedRuntime_aarch64.cpp to
> include nativeInst_aarch64.hpp? I don't see any other change in the source
> file that would make this necessary.

Thanks for the review. After I fixed arguments.cpp, there was another build error, shown below.

For target hotspot_variant-client_libjvm_objs_sharedRuntime_aarch64.o:
....../src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp:2836:22: error: 'NativeInstruction' has not been declared
__ add(r20, r20, NativeInstruction::instruction_size);

sharedRuntime_aarch64.cpp uses NativeInstruction but doesn't include nativeInst_aarch64.hpp.
There is no error in the Server VM build because the header file is included indirectly from some C2 file.
But in the Client VM build, where the C2 files are not compiled in, this error occurs.

--
Thanks,
Pengfei


From aph at redhat.com  Fri Nov 29 10:11:33 2019
From: aph at redhat.com (Andrew Haley)
Date: Fri, 29 Nov 2019 10:11:33 +0000
Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64
In-Reply-To: <DB7PR08MB311513D3C81A3BAB1ECABF0A96460@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115121B5DBB949E8A96A2FF96460@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <3e79b1e4-c548-dc3d-075f-e04e496d3863@redhat.com>
 <DB7PR08MB311513D3C81A3BAB1ECABF0A96460@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <6af3f9d1-f1e3-7f03-6056-fd0c36af65b7@redhat.com>

On 11/29/19 10:01 AM, Pengfei Li (Arm Technology China) wrote:
> Thanks for review. There is another build error below after I fixed arguments.cpp.
> 
> For target hotspot_variant-client_libjvm_objs_sharedRuntime_aarch64.o:
> ....../src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp:2836:22: error: 'NativeInstruction' has not been declared
> __ add(r20, r20, NativeInstruction::instruction_size);
> 
> We see that sharedRuntime_aarch64.cpp uses NativeInstruction but doesn't include nativeInst_aarch64.hpp.
> There is no error in Server VM build because the header file is included indirectly from some C2 file.
> But for Client VM build where C2 files are not in, this error occurs.

OK.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


From adinn at redhat.com  Fri Nov 29 10:20:40 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 29 Nov 2019 10:20:40 +0000
Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64
In-Reply-To: <DB7PR08MB311513D3C81A3BAB1ECABF0A96460@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115121B5DBB949E8A96A2FF96460@DB7PR08MB3115.eurprd08.prod.outlook.com>
 <3e79b1e4-c548-dc3d-075f-e04e496d3863@redhat.com>
 <DB7PR08MB311513D3C81A3BAB1ECABF0A96460@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <bcea3df2-8a97-ad13-d3d4-a208a50b6330@redhat.com>

Hi Pengfei,

On 29/11/2019 10:01, Pengfei Li (Arm Technology China) wrote:
>> Can you explain why you have modified sharedRuntime_aarch64.cpp to
>> include nativeInst_aarch64.hpp? I don't see any other change in the source
>> file that would make this necessary.
> 
> Thanks for review. There is another build error below after I fixed arguments.cpp.
> 
> For target hotspot_variant-client_libjvm_objs_sharedRuntime_aarch64.o:
> ....../src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp:2836:22: error: 'NativeInstruction' has not been declared
> __ add(r20, r20, NativeInstruction::instruction_size);
> 
> We see that sharedRuntime_aarch64.cpp uses NativeInstruction but doesn't include nativeInst_aarch64.hpp.
> There is no error in Server VM build because the header file is included indirectly from some C2 file.
> But for Client VM build where C2 files are not in, this error occurs.
Ok, in that case the patch is good to push.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill


From joe.darcy at oracle.com  Fri Nov 29 14:59:00 2019
From: joe.darcy at oracle.com (Joe Darcy)
Date: Fri, 29 Nov 2019 06:59:00 -0800
Subject: RFR: JEP 359: Records (Preview) (full code)
In-Reply-To: <32b8c703-523f-ae83-291d-4f1b28fa1d91@oracle.com>
References: <12069074-7830-8bf6-3818-1df7e2a29f18@oracle.com>
 <32b8c703-523f-ae83-291d-4f1b28fa1d91@oracle.com>
Message-ID: <04ab170a-72a3-81e0-c38b-79b9a2533cd1@oracle.com>

Hi Vicente,

Please change all uses of

    @compile --enable-preview -source 14

in jtreg tags to

    @compile --enable-preview -source ${jdk.version}

The former structure will spuriously fail when the JDK 14 -> 15 
transition occurs.
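
For illustration, a complete jtreg header in the suggested form might look
like the sketch below. The test name, summary and class body are hypothetical
placeholders; only the @compile line is taken from the suggestion above, and
the @run line assumes the test is run in a separate VM with preview features
enabled.

```java
/*
 * @test
 * @summary Hypothetical test showing a version-independent preview compile line
 * @compile --enable-preview -source ${jdk.version} PreviewTagSketch.java
 * @run main/othervm --enable-preview PreviewTagSketch
 */
final class PreviewTagSketch {
    // Placeholder body: a real test would exercise the preview feature here.
    static String describe() {
        return "running on feature release " + Runtime.version().feature();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Because the source level follows the JDK being tested, the header should not
need editing when the release number changes.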

Also, publishing delta webrevs between iterations in addition to the full 
webrev would help track the incremental changes.

Thanks,

-Joe

On 11/28/2019 8:05 AM, Vicente Romero wrote:
> Hi again,
>
> Sorry but I realized that I forgot to remove some code on the compiler 
> side. The code removed is small, before we were issuing an error if 
> some serialization methods were declared as record members. That 
> section was removed from the spec. I have prepared another iteration 
> with this change at [1]
>
> Thanks,
> Vicente
>
> [1] 
> http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.01/
>
> On 11/27/19 11:37 PM, Vicente Romero wrote:
>> Hi,
>>
>> Please review the code for the records feature at [1]. This webrev 
>> includes all: APIs, runtime, compiler, serialization, javadoc, and 
>> more! Most of the code has been reviewed but there have been some 
>> changes since reviewers saw it. Also this is the first time an 
>> integral webrev is sent out for review. Last changes on top of my 
>> mind since last review iterations:
>>
>> On the compiler implementation:
>> - it has been adapted to the last version of the language spec [2], 
>> as a reference the JVM spec is at [3]. This implied some changes in 
>> determining if a user defined constructor is the canonical or not. 
>> Now if a constructor is override-equivalent to a signature derived 
>> from the record components, then it is considered the canonical 
>> constructor. And any canonical constructor should satisfy a set of 
>> restrictions, see section 8.10.4 Record Constructor Declarations of 
>> the specification.
>> - A check was also added to make sure that accessors are not generic.
>> - And that the canonical constructor, if user defined, is not 
>> explicitly invoking any other constructor.
>> - The list of forbidden record component names has also been updated.
>> - new error messages have been added
>>
>> APIs:
>> - there has been some API editing in java.lang.Record, 
>> java.lang.runtime.ObjectMethods and 
>> java.lang.reflect.RecordComponent, java.io.ObjectInputStream, 
>> javax.lang.model (some visitors were added)
>>
>> On the JVM implementation:
>> - some logging capabilities have been added to classFileParser.cpp to 
>> provide the reason for which the Record attribute has been ignored
>>
>> Reflection:
>> - there are several new changes to the implementation of 
>> java.lang.reflect.RecordComponent apart from the spec changes 
>> mentioned before.
>>
>> bug fixes in
>> - compiler
>> - serialization,
>> - JVM, etc
>>
>> As a reference the last iteration of the previous reviews can be 
>> found at [4] under folders: compiler, hotspot_runtime, javadoc, 
>> reflection and serialization,
>>
>> TIA,
>> Vicente
>>
>> [1] 
>> http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.00/
>> [2] 
>> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jls.html
>> [3] 
>> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jvms.html
>> [4] http://cr.openjdk.java.net/~vromero/records.review/
>>
>

From jan.lahoda at oracle.com  Fri Nov 29 16:30:57 2019
From: jan.lahoda at oracle.com (Jan Lahoda)
Date: Fri, 29 Nov 2019 17:30:57 +0100
Subject: RFR: JEP 359: Records (Preview) (full code)
In-Reply-To: <12069074-7830-8bf6-3818-1df7e2a29f18@oracle.com>
References: <12069074-7830-8bf6-3818-1df7e2a29f18@oracle.com>
Message-ID: <ee8c50c4-60a7-9d86-6ac8-f394df5b3c9e@oracle.com>

Hi Vicente,

Overall, looks fine I think. A few comments on the javac implementation:
-in TypeEnter, I believe this:
memberEnter.memberEnter(tree.defs.diff(List.convert(JCTree.class, 
defsBeforeAddingNewMembers)), env);
is unnecessary (and fairly slow). defsBeforeAddingNewMembers is 
initialized to tree.defs a few lines above, and I don't think tree.defs 
is modified between the point defsBeforeAddingNewMembers and this line. 
I.e. the diff will be empty, but very slow to compute (basically n^2, 
unless I am mistaken). Not having this line would also help in not changing:
test/langtools/tools/javac/importscope/T8193717.java
(If tree.defs were changed, I would suggest keeping track of 
which members were added and avoiding List.diff: tree.defs can 
contain a lot of elements, and the diff is fairly slow.)

-in test/langtools/tools/javac/expswitch/ExpSwitchNestingTest.java, the 
--enable-preview should no longer be needed

-Flags.COMPACT_RECORD_CONSTRUCTOR says it is only for MethodSymbols, but 
is also for VarSymbols. Either the comment should be adjusted, or 
(possibly better), there could be a different/new flag for the fields. 
(That new flag can reuse the same bit position as 
COMPACT_RECORD_CONSTRUCTOR, of course.)

-nits:
--Lower.generateMandatedAccessors: could be rewritten using filter() to 
avoid having an if in the forEach, or changed even more by using 
filter-map-collect (and avoid having a manual ListBuffer). For your 
consideration.
--AttrContext.java: no need to move the isLambda initialization, 
correct? (i.e. the actual change is adding isSerializableLambda, but the 
diff says isLambda is removed and added in a different place).
--in src/jdk.compiler/share/classes/com/sun/source/doctree/DocTree.java 
and 
src/jdk.compiler/share/classes/com/sun/source/util/DocTreeFactory.java, 
there are no changes except for a change in the copyright years - these 
could be presumably stripped.
--there is a TODO in:
src/java.compiler/share/classes/javax/lang/model/SourceVersion.java
presumably, this could be resolved based on the current spec?
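
Regarding the TypeEnter comment above, here is a rough sketch of the "keep
track of which members were added" idea, written against plain java.util
lists rather than javac's own List and JCTree types. All names are
hypothetical; diffByScan only mimics a scan-based diff and is not javac's
List.diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MemberTrackingSketch {
    private final List<String> defs;                      // all members, in order
    private final List<String> added = new ArrayList<>(); // members added later

    MemberTrackingSketch(List<String> initialDefs) {
        this.defs = new ArrayList<>(initialDefs);
    }

    // Record each new member at the moment it is added: O(1) per member.
    void addMember(String def) {
        defs.add(def);
        added.add(def);
    }

    List<String> allMembers()   { return new ArrayList<>(defs); }
    List<String> addedMembers() { return new ArrayList<>(added); }

    // Scan-based diff: every element of 'after' is searched for linearly in
    // 'before', so the cost grows with after.size() * before.size().
    static List<String> diffByScan(List<String> after, List<String> before) {
        List<String> result = new ArrayList<>();
        for (String def : after) {
            if (!before.contains(def)) {
                result.add(def);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("x", "y");
        MemberTrackingSketch members = new MemberTrackingSketch(before);
        members.addMember("toString");
        // Both approaches yield the same members; only the cost differs.
        System.out.println(members.addedMembers());
        System.out.println(diffByScan(members.allMembers(), before));
    }
}
```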

Jan

On 28. 11. 19 5:37, Vicente Romero wrote:
> Hi,
> 
> Please review the code for the records feature at [1]. This webrev 
> includes all: APIs, runtime, compiler, serialization, javadoc, and more! 
> Most of the code has been reviewed but there have been some changes 
> since reviewers saw it. Also this is the first time an integral webrev 
> is sent out for review. Last changes on top of my mind since last review 
> iterations:
> 
> On the compiler implementation:
> - it has been adapted to the last version of the language spec [2], as a 
> reference the JVM spec is at [3]. This implied some changes in 
> determining if a user defined constructor is the canonical or not. Now 
> if a constructor is override-equivalent to a signature derived from the 
> record components, then it is considered the canonical constructor. And 
> any canonical constructor should satisfy a set of restrictions, see 
> section 8.10.4 Record Constructor Declarations of the specification.
> - A check was also added to make sure that accessors are not generic.
> - And that the canonical constructor, if user defined, is not explicitly 
> invoking any other constructor.
> - The list of forbidden record component names has also been updated.
> - new error messages have been added
> 
> APIs:
> - there has been some API editing in java.lang.Record, 
> java.lang.runtime.ObjectMethods and java.lang.reflect.RecordComponent, 
> java.io.ObjectInputStream, javax.lang.model (some visitors were added)
> 
> On the JVM implementation:
> - some logging capabilities have been added to classFileParser.cpp to 
> provide the reason for which the Record attribute has been ignored
> 
> Reflection:
> - there are several new changes to the implementation of 
> java.lang.reflect.RecordComponent apart from the spec changes mentioned 
> before.
> 
> bug fixes in
> - compiler
> - serialization,
> - JVM, etc
> 
> As a reference the last iteration of the previous reviews can be found 
> at [4] under folders: compiler, hotspot_runtime, javadoc, reflection and 
> serialization,
> 
> TIA,
> Vicente
> 
> [1] http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.00/
> [2] 
> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jls.html 
> 
> [3] 
> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jvms.html 
> 
> [4] http://cr.openjdk.java.net/~vromero/records.review/
> 

From vicente.romero at oracle.com  Fri Nov 29 23:12:30 2019
From: vicente.romero at oracle.com (Vicente Romero)
Date: Fri, 29 Nov 2019 18:12:30 -0500
Subject: RFR: JEP 359: Records (Preview) (full code)
In-Reply-To: <04ab170a-72a3-81e0-c38b-79b9a2533cd1@oracle.com>
References: <12069074-7830-8bf6-3818-1df7e2a29f18@oracle.com>
 <32b8c703-523f-ae83-291d-4f1b28fa1d91@oracle.com>
 <04ab170a-72a3-81e0-c38b-79b9a2533cd1@oracle.com>
Message-ID: <3805c12d-d476-0f09-3344-a0f7e215c34b@oracle.com>

Hi Joe,

All the tests that have an explicit -source 14 are that way because of, 
if I remember correctly, a bug in jtreg that fails to expand the 
${some.property} macro for those tests. I don't remember the details, though.

Thanks,
Vicente

On 11/29/19 9:59 AM, Joe Darcy wrote:
> Hi Vicente,
>
> Please change all uses of
>
>     @compile --enable-preview -source 14
>
> in jtreg tags to
>
>     @compile --enable-preview -source ${jdk.version}
>
> The former structure will spuriously fail when the JDK 14 -> 15 
> transition occurs.
>
> Also, publishing delta webrevs between iterations in addition to the 
> full webrev would help track the incremental changes.
>
> Thanks,
>
> -Joe
>
> On 11/28/2019 8:05 AM, Vicente Romero wrote:
>> Hi again,
>>
>> Sorry but I realized that I forgot to remove some code on the 
>> compiler side. The code removed is small, before we were issuing an 
>> error if some serialization methods were declared as record members. 
>> That section was removed from the spec. I have prepared another 
>> iteration with this change at [1]
>>
>> Thanks,
>> Vicente
>>
>> [1] 
>> http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.01/
>>
>> On 11/27/19 11:37 PM, Vicente Romero wrote:
>>> Hi,
>>>
>>> Please review the code for the records feature at [1]. This webrev 
>>> includes all: APIs, runtime, compiler, serialization, javadoc, and 
>>> more! Most of the code has been reviewed but there have been some 
>>> changes since reviewers saw it. Also this is the first time an 
>>> integral webrev is sent out for review. Last changes on top of my 
>>> mind since last review iterations:
>>>
>>> On the compiler implementation:
>>> - it has been adapted to the last version of the language spec [2], 
>>> as a reference the JVM spec is at [3]. This implied some changes in 
>>> determining if a user defined constructor is the canonical or not. 
>>> Now if a constructor is override-equivalent to a signature derived 
>>> from the record components, then it is considered the canonical 
>>> constructor. And any canonical constructor should satisfy a set of 
>>> restrictions, see section 8.10.4 Record Constructor Declarations of 
>>> the specification.
>>> - A check was also added to make sure that accessors are not 
>>> generic.
>>> - And that the canonical constructor, if user defined, is not 
>>> explicitly invoking any other constructor.
>>> - The list of forbidden record component names has also been updated.
>>> - new error messages have been added
>>>
>>> APIs:
>>> - there has been some API editing in java.lang.Record, 
>>> java.lang.runtime.ObjectMethods and 
>>> java.lang.reflect.RecordComponent, java.io.ObjectInputStream, 
>>> javax.lang.model (some visitors were added)
>>>
>>> On the JVM implementation:
>>> - some logging capabilities have been added to classFileParser.cpp 
>>> to provide the reason for which the Record attribute has been ignored
>>>
>>> Reflection:
>>> - there are several new changes to the implementation of 
>>> java.lang.reflect.RecordComponent apart from the spec changes 
>>> mentioned before.
>>>
>>> bug fixes in
>>> - compiler
>>> - serialization,
>>> - JVM, etc
>>>
>>> As a reference the last iteration of the previous reviews can be 
>>> found at [4] under folders: compiler, hotspot_runtime, javadoc, 
>>> reflection and serialization,
>>>
>>> TIA,
>>> Vicente
>>>
>>> [1] 
>>> http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.00/
>>> [2] 
>>> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jls.html
>>> [3] 
>>> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jvms.html
>>> [4] http://cr.openjdk.java.net/~vromero/records.review/
>>>
>>


From ioi.lam at oracle.com  Sat Nov 30 01:02:29 2019
From: ioi.lam at oracle.com (Ioi Lam)
Date: Fri, 29 Nov 2019 17:02:29 -0800
Subject: RFR(S): 8234791: Fix Client VM build for x86_64 and AArch64
In-Reply-To: <DB7PR08MB3115121B5DBB949E8A96A2FF96460@DB7PR08MB3115.eurprd08.prod.outlook.com>
References: <DB7PR08MB3115121B5DBB949E8A96A2FF96460@DB7PR08MB3115.eurprd08.prod.outlook.com>
Message-ID: <0b33fa95-ff15-b628-1891-f990f239e60f@oracle.com>

Hi Pengfei,

I have cc-ed hotspot-compiler-dev at openjdk.java.net.

Please do not push the patch until someone from hotspot-compiler-dev has 
looked at it.

Many people are away due to Thanksgiving in the US.

Thanks
- Ioi

On 11/28/19 7:56 PM, Pengfei Li (Arm Technology China) wrote:
> Hi,
>
> Please help review this small fix for 64-bit client build.
>
> Webrev: http://cr.openjdk.java.net/~pli/rfr/8234791/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8234791
>
> Current 64-bit client VM build fails because errors occurred in dumping
> the CDS archive. In JDK 12, we enabled "Default CDS Archives"[1] which
> runs "java -Xshare:dump" after linking the JDK image. But for Client VM
> build on 64-bit platforms, the ergonomic flag UseCompressedOops is not
> set.[2] This leads to VM exits in checking the flags for dumping the
> shared archive.[3]
>
> This change removes the "#if defined" macro to make shared archive dump
> successful in 64-bit client build. By tracking the history of the macro,
> I found it is initially added as "#ifndef COMPILER1"[4] 10 years ago
> when C1 did not have a good support of compressed oops and modified to
> current shape[5] in the implementation of tiered compilation. It should
> be safe to be removed today.
>
> This patch also fixes another client build issue on AArch64.
>
> [1] http://openjdk.java.net/jeps/341
> [2] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l1694
> [3] http://hg.openjdk.java.net/jdk/jdk/file/981a55672786/src/hotspot/share/runtime/arguments.cpp#l3551
> [4] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/323bd24c6520#l11.7
> [5] http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/d5d065957597#l86.56
>
> --
> Thanks,
> Pengfei
>


From john.r.rose at oracle.com  Sat Nov 30 07:02:33 2019
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 29 Nov 2019 23:02:33 -0800
Subject: RFR: 8234331: Add robust and optimized utility for rounding up to
 next power of two+
In-Reply-To: <7e25e62f-46c1-ec08-2b2f-60436251d12b@oracle.com>
References: <83f39858-2f88-dbab-d587-b4a383414cf5@oracle.com>
 <CAA-vtUy5gEDA_srTGgqjfHWQLtDKVaazsiv=AT4Bgc91Tn=uyw@mail.gmail.com>
 <C5533D91-B35E-4956-BD67-41FDEC505F74@oracle.com>
 <7e25e62f-46c1-ec08-2b2f-60436251d12b@oracle.com>
Message-ID: <84C1F5DD-E939-49A0-A82A-258E6E864B77@oracle.com>

On Nov 28, 2019, at 1:02 PM, Claes Redestad <claes.redestad at oracle.com> wrote:
> 
> I just want to point out that the "round up to power of 2"
> implementations I've seen seem prone to the same kind of overflows as a
> next up would, just not for exactly the same set of inputs.

Thanks; I stand corrected.


From vicente.romero at oracle.com  Sat Nov 30 18:29:23 2019
From: vicente.romero at oracle.com (Vicente Romero)
Date: Sat, 30 Nov 2019 13:29:23 -0500
Subject: RFR: JEP 359: Records (Preview) (full code)
In-Reply-To: <ba1516ea-7bff-3b05-0c58-d2a9fc9b3d70@oracle.com>
References: <12069074-7830-8bf6-3818-1df7e2a29f18@oracle.com>
 <32b8c703-523f-ae83-291d-4f1b28fa1d91@oracle.com>
 <ba1516ea-7bff-3b05-0c58-d2a9fc9b3d70@oracle.com>
Message-ID: <5cc9da98-3dae-b6c6-7acb-c9a4c3484a3b@oracle.com>

Hi,

I have created another iteration at [1]; this is just a delta from the last 
iteration [2]. Some comments below.

[1] 
http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.01_delta.01/
[2] http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.01

On 11/28/19 1:35 PM, Maurizio Cimadamore wrote:
>
> Hi Vicente,
> generally looks good - few comments below; I tried to focus on areas 
> where the compiler code seemed to diverge from the spec, as well on 
> pieces of code which look a leftover from previous spec rounds.
>
> * canonical constructors *can* have return statements - compact 
> constructors can't; the spec went a bit back and forth on this, but now 
> it has settled. Since compact constructors are turned into ordinary 
> canonical ones by the parser, I think you need to add an extra check 
> for COMPACT_RECORD_CONSTRUCTOR; in other words, this is ok:
>
> record Foo() {
>    Foo() { return; } //canonical
> }
>
> this isn't
>
> record Foo() {
>    Foo { return; } //compact
> }
>
> but the compiler rejects both. This probably means tweaking the 
> diagnostic a bit to say "a compact constructor must not contain 
> |return| statements"
>

Yes, I have modified the code so that the error messages show 
"canonical" or "compact" depending on the case.

> * in general, all diagnostics speak about 'canonical constructor' 
> regardless of whether the constructor is canonical or compact; while I 
> understand the reason behind what we get now, some of the error 
> messages can be confusing, especially if you look at the spec, where 
> canonical constructor and compact constructor are two different 
> concepts. This should be fixed (even if not immediately, in which case 
> I'd recommend to file a JBS issue to track that)
>
> * static accessors are allowed - this means that I can do this:
>
> record Foo(int x) {
> public static int x() {return 0; }
>
> public static void main(String[] args) {
>    System.err.println(new Foo(42).x());
> }
> }
>
> This will compile and print 0. The classfile will contain the 
> following members:
>
> final class Foo extends java.lang.Record {
>   public Foo(int);
>   public static int x();
>   public static void main(java.lang.String[]);
>   public java.lang.String toString();
>   public final int hashCode();
>   public final boolean equals(java.lang.Object);
> }
>
> I believe this is an issue in the compiler, but also in the latest 
> spec draft, if I'm not mistaken.
>

Yes, this is a bug; we are considering updating both the spec and the 
compiler. I will submit another iteration as soon as this change is 
reflected in both.

> * [optional - style] the env.info.isSerializableLambda could become an 
> enum { NONE, LAMBDA, SERIALIZABLE_LAMBDA } instead of two boolean 
> parameters
>

will do that later, I remember that there were some interactions between 
these flags, they are not exclusive
>
> * this code is still rejected with --enable-preview _disabled_:
>
> class X {
>     record R(int i) {
>         return null;
>     }
> }
> class record {}
>
> This gives the error:
>
> Error:
> |  records are a preview feature and are disabled by default.
> |    (use --enable-preview to enable records)
> |  record R(int i) { return null } }
> |  ^
> |  Error:
> |  illegal start of type
> |  record R(int i) { return null } }
> |                    ^
>
> In other words, the parsing logic for members is too aggressive - we 
> shouldn't call isRecordStart() in there. If this is not fixed in this 
> round, we should keep track with a JBS issue.
>

I have created a follow up issue: 
https://bugs.openjdk.java.net/browse/JDK-8235149

> * Are the changes in Tokens really needed?
>
no removed
>
> * Check::checkUnique doesn't seem to use the added 'env' parameter - 
> changes should be reverted
>
yep done
>
> * Names.java - the logic for having forbiddenRecordComponentNames 
> could use some refresh - in the latest spec we basically have to ban 
> components that have same name as j.l.Object members - so I think we 
> can implement the check more directly (e.g. w/o having a set of names).
>

right, done. I have created a method that checks whether a record 
component name matches a parameterless method in Object
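
For readers following along, here is a minimal sketch of that kind of
check. It uses core reflection on java.lang.Object; the class and method
names (ComponentNameCheck, clashesWithObjectMethod) are hypothetical and
are not the ones in the webrev, where javac would presumably consult its
own symbol table instead.

```java
import java.lang.reflect.Method;

// Hypothetical helper for illustration only; not the javac implementation.
final class ComponentNameCheck {

    // Returns true if `name` is the name of a method declared in
    // java.lang.Object that takes no parameters (hashCode, toString, ...).
    static boolean clashesWithObjectMethod(String name) {
        for (Method m : Object.class.getDeclaredMethods()) {
            if (m.getParameterCount() == 0 && m.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // equals(Object) takes a parameter, so it is not matched here.
        for (String n : new String[] {"hashCode", "toString", "equals", "x"}) {
            System.out.println(n + " -> " + clashesWithObjectMethod(n));
        }
    }
}
```

The point of checking directly against Object is that no hand-maintained
set of names is needed.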

> Also, the serialization names are not needed (although I guess they 
> will come back at some point).
>

yes they will, I can remove them now but I guess we will need them once 
we implement a lint warning

> And, not sure what "get" and "set" are needed for?
>

removed
>
>
> Maurizio
>

thanks,
Vicente
>
> On 28/11/2019 16:05, Vicente Romero wrote:
>> Hi again,
>>
>> Sorry but I realized that I forgot to remove some code on the 
>> compiler side. The code removed is small, before we were issuing an 
>> error if some serialization methods were declared as record members. 
>> That section was removed from the spec. I have prepared another 
>> iteration with this change at [1]
>>
>> Thanks,
>> Vicente
>>
>> [1] 
>> http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.01/
>>
>> On 11/27/19 11:37 PM, Vicente Romero wrote:
>>> Hi,
>>>
>>> Please review the code for the records feature at [1]. This webrev 
>>> includes all: APIs, runtime, compiler, serialization, javadoc, and 
>>> more! Most of the code has been reviewed but there have been some 
>>> changes since reviewers saw it. Also this is the first time an 
>>> integral webrev is sent out for review. Last changes on top of my 
>>> mind since last review iterations:
>>>
>>> On the compiler implementation:
>>> - it has been adapted to the last version of the language spec [2], 
>>> as a reference the JVM spec is at [3]. This implied some changes in 
>>> determining if a user defined constructor is the canonical or not. 
>>> Now if a constructor is override-equivalent to a signature derived 
>>> from the record components, then it is considered the canonical 
>>> constructor. And any canonical constructor should satisfy a set of 
>>> restrictions, see section 8.10.4 Record Constructor Declarations of 
>>> the specification.
>>> - It was also added a check to make sure that accessors are not 
>>> generic.
>>> - And that the canonical constructor, if user defined, is not 
>>> explicitly invoking any other constructor.
>>> - The list of forbidden record component names has also been updated.
>>> - new error messages have been added
>>>
>>> APIs:
>>> - there have been some API editing in java.lang.Record, 
>>> java.lang.runtime.ObjectMethods and 
>>> java.lang.reflect.RecordComponent, java.io.ObjectInputStream, 
>>> javax.lang.model (some visitors were added)
>>>
>>> On the JVM implementation:
>>> - some logging capabilities have been added to classFileParser.cpp 
>>> to provide the reason for which the Record attribute has been ignored
>>>
>>> Reflection:
>>> - there are several new changes to the implementation of 
>>> java.lang.reflect.RecordComponent apart from the spec changes 
>>> mentioned before.
>>>
>>> bug fixes in
>>> - compiler
>>> - serialization,
>>> - JVM, etc
>>>
>>> As a reference the last iteration of the previous reviews can be 
>>> found at [4] under folders: compiler, hotspot_runtime, javadoc, 
>>> reflection and serialization,
>>>
>>> TIA,
>>> Vicente
>>>
>>> [1] 
>>> http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.00/
>>> [2] 
>>> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jls.html
>>> [3] 
>>> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jvms.html
>>> [4] http://cr.openjdk.java.net/~vromero/records.review/
>>>
>>


From vicente.romero at oracle.com  Sat Nov 30 18:31:57 2019
From: vicente.romero at oracle.com (Vicente Romero)
Date: Sat, 30 Nov 2019 13:31:57 -0500
Subject: RFR: JEP 359: Records (Preview) (full code)
In-Reply-To: <ee8c50c4-60a7-9d86-6ac8-f394df5b3c9e@oracle.com>
References: <12069074-7830-8bf6-3818-1df7e2a29f18@oracle.com>
 <ee8c50c4-60a7-9d86-6ac8-f394df5b3c9e@oracle.com>
Message-ID: <fd94b269-9288-bb38-c369-492fe6d8c106@oracle.com>

Hi Jan,

I have addressed your comments on delta iteration [1]

[1] 
http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.01_delta.01/ 
Some comments below.

On 11/29/19 11:30 AM, Jan Lahoda wrote:
> Hi Vicente,
>
> Overall, looks fine I think. A few comments on the javac implementation:
> -in TypeEnter, I believe this:
> memberEnter.memberEnter(tree.defs.diff(List.convert(JCTree.class, 
> defsBeforeAddingNewMembers)), env);
> is unnecessary

yep removed

> (and fairly slow). defsBeforeAddingNewMembers is initialized to 
> tree.defs a few lines above, and I don't think tree.defs is modified 
> between the point defsBeforeAddingNewMembers and this line. I.e. the 
> diff will be empty, but very slow to compute (basically n^2, unless I 
> am mistaken). Not having this line would also help in not changing:
> test/langtools/tools/javac/importscope/T8193717.java
> (If the tree.defs would be changed, I would suggest to keep track of 
> which members were added, and avoid the List.diff - tree.defs can 
> contain a lot of elements, and the diff is fairly slow.)
>
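
To illustrate the suggestion about tracking added members instead of
diffing, here is a rough sketch using plain java.util collections. The
TrackedDefs class and its methods are hypothetical stand-ins, not javac
code; javac uses its own List type and JCTree nodes rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a class tree's member list (tree.defs).
final class TrackedDefs {
    private final List<String> defs;
    private final List<String> added = new ArrayList<>();

    TrackedDefs(List<String> initial) {
        this.defs = new ArrayList<>(initial);
    }

    // Record every addition as it happens, so no diff is needed later.
    void add(String member) {
        defs.add(member);
        added.add(member);
    }

    // Returns the members added since the last call and resets the log.
    List<String> takeAdded() {
        List<String> result = new ArrayList<>(added);
        added.clear();
        return result;
    }

    // The alternative being avoided: each contains() call scans `before`,
    // so the cost grows with after.size() * before.size().
    static List<String> diff(List<String> after, List<String> before) {
        List<String> out = new ArrayList<>();
        for (String s : after) {
            if (!before.contains(s)) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TrackedDefs t = new TrackedDefs(Arrays.asList("a", "b"));
        t.add("c");
        System.out.println(t.takeAdded());
    }
}
```

Both approaches give the same answer; the tracked version just does the
bookkeeping at insertion time.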
> -in test/langtools/tools/javac/expswitch/ExpSwitchNestingTest.java, 
> the --enable-preview should no longer be needed

done

>
> -Flags.COMPACT_RECORD_CONSTRUCTOR says it is only for MethodSymbols, 
> but is also for VarSymbols. Either the comment should be adjusted, or 
> (possibly better), there could be a different/new flag for the fields. 
> (That new flag can reuse the same bit position as 
> COMPACT_RECORD_CONSTRUCTOR, of course.)

done

>
> -nits:
> --Lower.generateMandatedAccessors: could be rewritten using filter() 
> to avoid having an if in the forEach, or changed even more by using 
> filter-map-collect (and avoid having a manual ListBuffer). For your 
> consideration

yep done
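
As a sketch of the filter-map-collect shape mentioned above (assuming a
simplified Component type and string results in place of javac's symbols
and trees; all names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class AccessorSketch {

    // Hypothetical stand-in for a record component.
    static final class Component {
        final String name;
        final boolean hasExplicitAccessor;

        Component(String name, boolean hasExplicitAccessor) {
            this.name = name;
            this.hasExplicitAccessor = hasExplicitAccessor;
        }
    }

    // Before: an if inside forEach, filling a buffer by hand.
    static List<String> withLoop(List<Component> comps) {
        List<String> out = new ArrayList<>();
        comps.forEach(c -> {
            if (!c.hasExplicitAccessor) {
                out.add("accessor for " + c.name);
            }
        });
        return out;
    }

    // After: the same selection and transformation as a stream pipeline.
    static List<String> withPipeline(List<Component> comps) {
        return comps.stream()
                .filter(c -> !c.hasExplicitAccessor)
                .map(c -> "accessor for " + c.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Component> comps = Arrays.asList(
                new Component("x", false), new Component("y", true));
        System.out.println(withPipeline(comps));
    }
}
```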

> --AttrContext.java: no need to move the isLambda initialization, 
> correct? (i.e. the actual change is adding isSerializableLambda, but 
> the diff says isLambda is removed and added on a different place).
done
> --in 
> src/jdk.compiler/share/classes/com/sun/source/doctree/DocTree.java and 
> src/jdk.compiler/share/classes/com/sun/source/util/DocTreeFactory.java, 
> there are no changes except for a change in the copyright years - 
> these could be presumably stripped.
yep done
> --there is a TODO in:
> src/java.compiler/share/classes/javax/lang/model/SourceVersion.java
> presumably, this could be resolved based on the current spec?

I think that this comment doesn't apply as there is no need for any 
additional change. I have removed it
>
> Jan

Thanks,
Vicente

>
> On 28. 11. 19 5:37, Vicente Romero wrote:
>> Hi,
>>
>> Please review the code for the records feature at [1]. This webrev 
>> includes all: APIs, runtime, compiler, serialization, javadoc, and 
>> more! Most of the code has been reviewed but there have been some 
>> changes since reviewers saw it. Also this is the first time an 
>> integral webrev is sent out for review. Last changes on top of my 
>> mind since last review iterations:
>>
>> On the compiler implementation:
>> - it has been adapted to the last version of the language spec [2], 
>> as a reference the JVM spec is at [3]. This implied some changes in 
>> determining if a user defined constructor is the canonical or not. 
>> Now if a constructor is override-equivalent to a signature derived 
>> from the record components, then it is considered the canonical 
>> constructor. And any canonical constructor should satisfy a set of 
>> restrictions, see section 8.10.4 Record Constructor Declarations of 
>> the specification.
>> - It was also added a check to make sure that accessors are not generic.
>> - And that the canonical constructor, if user defined, is not 
>> explicitly invoking any other constructor.
>> - The list of forbidden record component names has also been updated.
>> - new error messages have been added
>>
>> APIs:
>> - there have been some API editing in java.lang.Record, 
>> java.lang.runtime.ObjectMethods and 
>> java.lang.reflect.RecordComponent, java.io.ObjectInputStream, 
>> javax.lang.model (some visitors were added)
>>
>> On the JVM implementation:
>> - some logging capabilities have been added to classFileParser.cpp to 
>> provide the reason for which the Record attribute has been ignored
>>
>> Reflection:
>> - there are several new changes to the implementation of 
>> java.lang.reflect.RecordComponent apart from the spec changes 
>> mentioned before.
>>
>> bug fixes in
>> - compiler
>> - serialization,
>> - JVM, etc
>>
>> As a reference the last iteration of the previous reviews can be 
>> found at [4] under folders: compiler, hotspot_runtime, javadoc, 
>> reflection and serialization,
>>
>> TIA,
>> Vicente
>>
>> [1] 
>> http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.00/
>> [2] 
>> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jls.html 
>>
>> [3] 
>> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jvms.html 
>>
>> [4] http://cr.openjdk.java.net/~vromero/records.review/
>>


From maurizio.cimadamore at oracle.com  Sat Nov 30 22:51:25 2019
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Sat, 30 Nov 2019 22:51:25 +0000
Subject: RFR: JEP 359: Records (Preview) (full code)
In-Reply-To: <5cc9da98-3dae-b6c6-7acb-c9a4c3484a3b@oracle.com>
References: <12069074-7830-8bf6-3818-1df7e2a29f18@oracle.com>
 <32b8c703-523f-ae83-291d-4f1b28fa1d91@oracle.com>
 <ba1516ea-7bff-3b05-0c58-d2a9fc9b3d70@oracle.com>
 <5cc9da98-3dae-b6c6-7acb-c9a4c3484a3b@oracle.com>
Message-ID: <1c81d93e-4fcc-68b5-7e1b-a6b8880a2964@oracle.com>

Looks good - but you need diagnostic fragments for the "canonical", 
"compact" diagnostic bits (no need for re-review).

Maurizio

On 30/11/2019 18:29, Vicente Romero wrote:
> Hi,
>
> I have created another iteration at [1], this is just a delta from 
> last iteration [2]. Some comments below
>
> [1] 
> http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.01_delta.01/
> [2] http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.01
>
> On 11/28/19 1:35 PM, Maurizio Cimadamore wrote:
>>
>> Hi Vicente,
>> generally looks good - few comments below; I tried to focus on areas 
>> where the compiler code seemed to diverge from the spec, as well on 
>> pieces of code which look a leftover from previous spec rounds.
>>
>> * canonical constructors *can* have return statements - compact 
>> constructors can't; the spec went a bit back and forth on this, but 
>> now it has settled. Since compact constructors are turned into 
>> ordinary canonical ones by the parser, I think you need to add an 
>> extra check for COMPACT_RECORD_CONSTRUCTOR; in other words, this is ok:
>>
>> record Foo() {
>>    Foo() { return; } //canonical
>> }
>>
>> this isn't
>>
>> record Foo() {
>>    Foo { return; } //compact
>> }
>>
>> but the compiler rejects both. This probably means tweaking the 
>> diagnostic a bit to say "a compact constructor must not contain 
>> |return| statements"
>>
>
> yes I have modified the code so that the error messages can show 
> "canonical" or "compact" depending on the case
>
>> * in general, all diagnostics speak about 'canonical constructor' 
>> regardless of whether the constructor is canonical or compact; while 
>> I understand the reason behind what we get now, some of the error 
>> messages can be confusing, especially if you look at the spec, where 
>> canonical constructor and compact constructor are two different 
>> concepts. This should be fixed (even if not immediately, in which 
>> case I'd recommend to file a JBS issue to track that)
>>
>> * static accessors are allowed - this means that I can do this:
>>
>> record Foo(int x) {
>> public static int x() {return 0; }
>>
>> public static void main(String[] args) {
>>    System.err.println(new Foo(42).x());
>> }
>> }
>>
>> This will compile and print 0. The classfile will contain the 
>> following members:
>>
>> final class Foo extends java.lang.Record {
>>   public Foo(int);
>>   public static int x();
>>   public static void main(java.lang.String[]);
>>   public java.lang.String toString();
>>   public final int hashCode();
>>   public final boolean equals(java.lang.Object);
>> }
>>
>> I believe this is an issue in the compiler, but also in the latest 
>> spec draft, if I'm not mistaken.
>>
>
> yes this is a bug, we are considering updating both the spec and the 
> compiler. I will submit another iteration as soon as this change is 
> reflected in both
>
>> * [optional - style] the env.info.isSerializableLambda could become 
>> an enum { NONE, LAMBDA, SERIALIZABLE_LAMBDA } instead of two boolean 
>> parameters
>>
>
> will do that later, I remember that there were some interactions 
> between these flags, they are not exclusive
>>
>> * this code is still rejected with --enable-preview _disabled_:
>>
>> class X {
>>     record R(int i) {
>>         return null;
>>     }
>> }
>> class record {}
>>
>> This gives the error:
>>
>> Error:
>> |  records are a preview feature and are disabled by default.
>> |    (use --enable-preview to enable records)
>> |  record R(int i) { return null } }
>> |  ^
>> |  Error:
>> |  illegal start of type
>> |  record R(int i) { return null } }
>> |                    ^
>>
>> In other words, the parsing logic for members is too aggressive - we 
>> shouldn't call isRecordStart() in there. If this is not fixed in this 
>> round, we should keep track with a JBS issue.
>>
>
> I have created a follow up issue: 
> https://bugs.openjdk.java.net/browse/JDK-8235149
>
>> * Are the changes in Tokens really needed?
>>
> no removed
>>
>> * Check::checkUnique doesn't seem to use the added 'env' parameter - 
>> changes should be reverted
>>
> yep done
>>
>> * Names.java - the logic for having forbiddenRecordComponentNames 
>> could use some refresh - in the latest spec we basically have to ban 
>> components that have same name as j.l.Object members - so I think we 
>> can implement the check more directly (e.g. w/o having a set of names).
>>
>
> right, done. I have created a method that checks whether a record 
> component name matches a parameterless method in Object
>
>> Also, the serialization names are not needed (although I guess they 
>> will come back at some point).
>>
>
> yes they will, I can remove them now but I guess we will need them 
> once we implement a lint warning
>
>> And, not sure what "get" and "set" are needed for?
>>
>
> removed
>>
>>
>> Maurizio
>>
>
> thanks,
> Vicente
>>
>> On 28/11/2019 16:05, Vicente Romero wrote:
>>> Hi again,
>>>
>>> Sorry but I realized that I forgot to remove some code on the 
>>> compiler side. The code removed is small, before we were issuing an 
>>> error if some serialization methods were declared as record members. 
>>> That section was removed from the spec. I have prepared another 
>>> iteration with this change at [1]
>>>
>>> Thanks,
>>> Vicente
>>>
>>> [1] 
>>> http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.01/
>>>
>>> On 11/27/19 11:37 PM, Vicente Romero wrote:
>>>> Hi,
>>>>
>>>> Please review the code for the records feature at [1]. This webrev 
>>>> includes all: APIs, runtime, compiler, serialization, javadoc, and 
>>>> more! Most of the code has been reviewed but there have been some 
>>>> changes since reviewers saw it. Also this is the first time an 
>>>> integral webrev is sent out for review. Last changes on top of my 
>>>> mind since last review iterations:
>>>>
>>>> On the compiler implementation:
>>>> - it has been adapted to the last version of the language spec [2], 
>>>> as a reference the JVM spec is at [3]. This implied some changes in 
>>>> determining if a user defined constructor is the canonical or not. 
>>>> Now if a constructor is override-equivalent to a signature derived 
>>>> from the record components, then it is considered the canonical 
>>>> constructor. And any canonical constructor should satisfy a set of 
>>>> restrictions, see section 8.10.4 Record Constructor Declarations of 
>>>> the specification.
>>>> - It was also added a check to make sure that accessors are not 
>>>> generic.
>>>> - And that the canonical constructor, if user defined, is not 
>>>> explicitly invoking any other constructor.
>>>> - The list of forbidden record component names has also been updated.
>>>> - new error messages have been added
>>>>
>>>> APIs:
>>>> - there have been some API editing in java.lang.Record, 
>>>> java.lang.runtime.ObjectMethods and 
>>>> java.lang.reflect.RecordComponent, java.io.ObjectInputStream, 
>>>> javax.lang.model (some visitors were added)
>>>>
>>>> On the JVM implementation:
>>>> - some logging capabilities have been added to classFileParser.cpp 
>>>> to provide the reason for which the Record attribute has been ignored
>>>>
>>>> Reflection:
>>>> - there are several new changes to the implementation of 
>>>> java.lang.reflect.RecordComponent apart from the spec changes 
>>>> mentioned before.
>>>>
>>>> bug fixes in
>>>> - compiler
>>>> - serialization,
>>>> - JVM, etc
>>>>
>>>> As a reference the last iteration of the previous reviews can be 
>>>> found at [4] under folders: compiler, hotspot_runtime, javadoc, 
>>>> reflection and serialization,
>>>>
>>>> TIA,
>>>> Vicente
>>>>
>>>> [1] 
>>>> http://cr.openjdk.java.net/~vromero/records.review/all_code/webrev.00/
>>>> [2] 
>>>> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jls.html
>>>> [3] 
>>>> http://cr.openjdk.java.net/~gbierman/jep359/jep359-20191125/specs/records-jvms.html
>>>> [4] http://cr.openjdk.java.net/~vromero/records.review/
>>>>
>>>
>

From xun.chen at ele.me  Wed Nov 27 04:13:29 2019
From: xun.chen at ele.me (xun.chen at ele.me)
Date: Wed, 27 Nov 2019 04:13:29 +0000
Subject: hotspot support dynamic expansion method stack
In-Reply-To: <7ddcb0af-877e-f842-5016-2226d1cb7b0f@oracle.com>
References: <9645B0EE-CEA6-4447-8574-6E75B5099695@ele.me>
 <7ddcb0af-877e-f842-5016-2226d1cb7b0f@oracle.com>
Message-ID: <FF479F9D-636E-4678-9185-C2E51A2D46A0@ele.me>

Thanks. HotSpot uses threads with fixed stack sizes? Which JVM vendors support dynamic stacks?

(ace) ELEME Inc.
email: xun.chen at ele.me | mobile: +86 15216614939
http://ele.me


On 27 Nov 2019, at 7:57, David Holmes <david.holmes at oracle.com> wrote:

Hi,

On 26/11/2019 5:29 pm, xun.chen at ele.me wrote:
Hi,
The Java Virtual Machine Specification specifies two exception conditions for this area: if the stack depth requested by a thread is greater than the depth allowed by the virtual machine, a StackOverflowError is thrown; if the virtual machine cannot obtain enough memory, an OutOfMemoryError is thrown.
But does HotSpot support dynamic expansion of the method stack?

hotspot uses threads with fixed stack sizes.

How to reproduce this phenomenon ?

You want to know how to generate a StackOverflowError? Just keep recursing into a function.
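
A minimal sketch of that, assuming nothing beyond the standard library
(the class name StackDepthProbe is just for illustration). The depth
reached depends on the thread's stack size, which for HotSpot can be set
with the -Xss option, so no particular number should be expected:

```java
final class StackDepthProbe {
    private static int depth;

    // Unbounded recursion: each call adds a frame until the stack is full.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until StackOverflowError is thrown and returns the number
    // of calls made before that point.
    static int probe() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Expected: the fixed-size thread stack has been exhausted.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("StackOverflowError after " + probe() + " calls");
    }
}
```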

David
-----
