RFR(XXS): 8188109 JVM should print a warning message that -Xshare:on may cause VM to abort start-up

Ioi Lam ioi.lam at oracle.com
Fri Jun 1 05:40:05 UTC 2018



On 5/30/18 11:33 PM, Thomas Stüfe wrote:
> On Thu, May 31, 2018 at 8:09 AM, Ioi Lam <ioi.lam at oracle.com> wrote:
>>
>> On 5/30/18 10:57 PM, Thomas Stüfe wrote:
>>> On Thu, May 31, 2018 at 7:50 AM, Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>
>>>> On 5/30/18 9:53 PM, David Holmes wrote:
>>>>> On 31/05/2018 2:24 PM, Ioi Lam wrote:
>>>>>>
>>>>>>
>>>>>> On 5/30/18 9:13 PM, David Holmes wrote:
>>>>>>> Hi Ioi,
>>>>>>>
>>>>>>> On 31/05/2018 2:01 PM, Ioi Lam wrote:
>>>>>>>> On 5/30/18 6:47 PM, David Holmes wrote:
>>>>>>>>> Hi Ioi,
>>>>>>>>>
>>>>>>>>> Sorry but this troubles me ...
>>>>>>>>>
>>>>>>>>> On 31/05/2018 9:39 AM, Ioi Lam wrote:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8188109
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk11/8188109-xshare-on-print-warning.v01/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Please review this one-liner patch.
>>>>>>>>>>
>>>>>>>>>> -Xshare:on may cause infrequent/intermittent start-up failure due
>>>>>>>>>> to
>>>>>>>>>> the presence of Address Space Layout Randomization (ASLR). This
>>>>>>>>>> option is
>>>>>>>>>> intended for testing (the internals of CDS) only and should not be
>>>>>>>>>> used in
>>>>>>>>>> production environments.
>>>>>>>>>>
>>>>>>>>>> With this patch, the following warning message is printed when
>>>>>>>>>> -Xshare:on is specified:
>>>>>>>>>>
>>>>>>>>>> $ java -Xshare:on -version
>>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM warning: -Xshare:on is for
>>>>>>>>>> testing
>>>>>>>>>> purpose only and may cause JVM start-up failure. Use -Xshare:auto
>>>>>>>>>> instead.
>>>>>>>>>> java version "11-internal" 2018-09-25
>>>>>>>>>> Java(TM) SE Runtime Environment 18.9 (fastdebug build
>>>>>>>>>> 11-internal+0-adhoc.iklam.open)
>>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM 18.9 (fastdebug build
>>>>>>>>>> 11-internal+0-adhoc.iklam.open, mixed mode, sharing)
>>>>>>>>>
>>>>>>>>> So should this warning only be enabled in product builds?
>>>>>>>>>
>>>>>>>>> Even then it may be annoying for anyone who runs with -Xshare:on as
>>>>>>>>> they've set up CDS as documented [1][2] and they know their
>>>>>>>>> environment
>>>>>>>>> works ok - now they get a warning.
>>>>>>>>>
>>>>>>>>> Also I'm unclear how "on" fails due to ASLR but "auto" keeps going?
>>>>>>>>>
>>>>>>>> The documentation [1] says:
>>>>>>>>
>>>>>>>> -Xshare:on
>>>>>>>> To enable class data sharing. If class data sharing can't be enabled,
>>>>>>>> print an error message and exit.
>>>>>>>>
>>>>>>>> -Xshare:auto
>>>>>>>> To enable class data sharing by default. Enable class data sharing
>>>>>>>> whenever possible.
>>>>>>>>
>>>>>>>> So if mapping fails due to ASLR, "on" will exit and "auto" will
>>>>>>>> disable
>>>>>>>> CDS and continue.
>>>>>>>
>>>>>>> Ah! In the bug you state "-Xshare:auto continue to execute (with CDS
>>>>>>> enabled)" - so that should be disabled.
>>>>>>>
>>>>>>>> The documentation in [2] is wrong.  It says "Ensure that you have
>>>>>>>> specified the option -Xshare:on or -Xshare:auto.", but -Xshare:on
>>>>>>>> should not
>>>>>>>> be used in production environments. I have filed a doc bug for this
>>>>>>>> (https://bugs.openjdk.java.net/browse/JDK-8204141).
>>>>>>>>
>>>>>>>> The main reason for doing this RFR is -- if people have been
>>>>>>>> following
>>>>>>>> [2] and using -Xshare:on, their setup is NOT OK. ASLR may happen just
>>>>>>>> very
>>>>>>>> rarely, but you don't want your program suddenly failing (e.g., if
>>>>>>>> some
>>>>>>>> admin has turned on more aggressive ASLR settings).
>>>>>>>
>>>>>>> But conversely you don't want your application to suddenly and
>>>>>>> silently
>>>>>>> stop using CDS because of ASLR and you've masked that by using "auto"!
>>>>>>>
>>>>>>>> As more people are moving their Java workload to micro-services type
>>>>>>>> of
>>>>>>>> environments, JVM launches will happen more, and there will be more
>>>>>>>> chances
>>>>>>>> of running into the ASLR issue. Therefore, we should fix the docs,
>>>>>>>> and warn
>>>>>>>> people that they should switch to -Xshare:auto.
>>>>>>>
>>>>>>> That's only a partial solution. If ASLR is a problem then that needs
>>>>>>> to
>>>>>>> be known and addressed.
>>>>>>>
>>>>>>>>> Maybe only if the archive mapping fails and "on" was used then give
>>>>>>>>> a
>>>>>>>>> warning? Or just improve the message given when the VM aborts?
>>>>>>>>>
>>>>>>>> That's already too late, especially for people running critical
>>>>>>>> services.
>>>>>>>>
>>>>>>>> We want people to see this warning and actively fix their scripts to
>>>>>>>> get rid of -Xshare:on.
>>>>>>>
>>>>>>> I think what we want/need people to realize is that ASLR can seriously
>>>>>>> impact the ability to use CDS and if you are trying to use CDS for
>>>>>>> startup
>>>>>>> or footprint reasons then you're going to have a major problem if CDS
>>>>>>> is
>>>>>>> silently disabled!
>>>>>>>
>>>>>>> I think "on" and "auto" are both just as valid. "on" is for people who
>>>>>>> need CDS reliably and want to fail fast if it's not working. "auto" is
>>>>>>> for
>>>>>>> people who would like CDS but can live without it.
>>>>>>>
>>>>>> I think -Xshare:on has the potential to do much more harm than good.
>>>>>>
>>>>>> People have lived with the fact that optimizations in the JVM are not
>>>>>> always deterministic. They want their programs to run regardless. CDS
>>>>>> is the
>>>>>> only optimization that has an option to say "let the program fail if
>>>>>> the
>>>>>> optimization is not available".
>>>>>
>>>>> I don't agree with that characterization - I think it oversimplifies the
>>>>> situation. If you want to push this analogy then the right analogy would
>>>>> be
>>>>> allowing "-server" to silently run the interpreter instead because there
>>>>> was
>>>>> some error configuring the JIT! I wonder how many users would be happy
>>>>> with
>>>>> that! But the bulk of optimizations are not things that can fail as such
>>>>> so
>>>>> I don't think the comparison holds.
>>>>>
>>>>> I think this is purely a documentation and education issue. Particularly
>>>>> because this is not something new with JDK 11 - this is an issue that
>>>>> exists
>>>>> with CDS in all releases. So you want to get the message out to all
>>>>> users
>>>>> that "auto" may be preferable to "on".
>>>>>
>>>>> Until something actually fails I doubt anyone would notice your warning
>>>>> anyway.
>>>>>
>>>>> Maybe we need -Xshare:on_and_I_really_mean_it ;-)
>>>>>
>>>> -Xshare:on was an ill-conceived and dangerous option. It was designed
>>>> when
>>>> ASLR wasn't common. With ASLR more in common use, and short-lived JVMs
>>>> becoming more common, the danger is getting bigger and bigger.
>>>>
>>> Just out of curiosity (I never paid close attention to CDS), how do
>>> you communicate the mapping address the first process establishes to
>>> the subsequent processes attaching? Or do you just have a fixed well
>>> known address value baked into the jvm?
>>>
>>> I'm just trying to understand how probable a failed mapping could be in a
>>> 64bit address space.
>>
>> The CDS archive is created at a fixed address. The default value is in the
>> SharedBaseAddress option (0x800000000 on 64-bit). This is usually a good
>> range on Linux, as ASLR (usually) does not place shared library segments in
>> this range.
>>
>> However, we have analyzed our distributed test runs and found a small
>> percentage (<5%?? can't remember the exact number) of cases where the
>> mapping would fail. So overall you get the benefit of CDS, but not every
>> single time.
>>
>> We have been thinking about making the CDS archive relocatable, but we're
>> not there yet :-(
>>
>> In the short term, we can relocate by patching all the pointers. We just
>> added the ability to iterate over all metaspace pointers in JDK 10 (see
>> metaspaceClosure.hpp) so we can map to an alternative address and relocate.
>> We can probably do part of the relocation incrementally as the classes are
>> being loaded.
>>
> Okay... so, am I understanding this right, you fix up pointers in the
> java heap or in other non-shared process local memory sections,
> pointing to the metaspace? E.g. the object Klass* pointers?

We also fix up pointers inside the MetaspaceObjs. For example, 
InstanceKlass::_super points to another InstanceKlass. During 
relocation, _super will be modified to point to the new location.
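
To make the relocation step concrete, here is a minimal sketch (not the HotSpot code). FakeKlass, make_dumped_region and relocate are hypothetical names invented for this example. It assumes every embedded pointer was written for one fixed dump-time base address and shifts each of them by the distance to the address where the region actually ended up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a metadata object; not the real InstanceKlass.
struct FakeKlass {
    FakeKlass* super;  // plays the role of InstanceKlass::_super
    int id;
};

// Build a region whose embedded pointers are only valid if the region
// lives at dumped_base (as they would be in an archive dumped for a
// fixed address). Element i points to element i - 1 as its "super".
inline std::vector<FakeKlass> make_dumped_region(std::uintptr_t dumped_base, int count) {
    std::vector<FakeKlass> region(count);
    for (int i = 0; i < count; ++i) {
        region[i].id = i;
        region[i].super = (i == 0) ? nullptr
            : reinterpret_cast<FakeKlass*>(dumped_base + (i - 1) * sizeof(FakeKlass));
    }
    return region;
}

// Patch every embedded pointer by the difference between the address the
// region was dumped for and the address where it is actually located.
inline void relocate(std::vector<FakeKlass>& region, std::uintptr_t dumped_base) {
    assert(dumped_base % alignof(FakeKlass) == 0);
    const std::uintptr_t mapped_base = reinterpret_cast<std::uintptr_t>(region.data());
    // Unsigned arithmetic wraps around, so this works in both directions.
    const std::uintptr_t delta = mapped_base - dumped_base;
    for (FakeKlass& k : region) {
        if (k.super != nullptr) {
            k.super = reinterpret_cast<FakeKlass*>(
                reinterpret_cast<std::uintptr_t>(k.super) + delta);
        }
    }
}
```

After relocate() runs, each super pointer refers to the element in the region's real location. The cost is that every patched page becomes private to the process, which is part of why this is only a short-term idea.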

> What do you do about metaspace internal pointers, e.g. pointers from
> one Metachunk to another? You cannot fix them, right, since you share
> the memory with other processes which may have mapped it at different
> bases?

We actually don't have any Metachunks inside the CDS regions. There are 
2 main regions (RW and RO) where we allocate objects like 
InstanceKlasses, Methods, ConstantPools, etc. The allocation is done 
during CDS dump time. Each region is a contiguous block of memory. At 
runtime, we don't free any of these objects, and you can't 
allocate new objects from these regions, either.
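
As a rough illustration of that allocation model (a sketch only, not the actual CDS implementation), a region can be thought of as a bump allocator that is sealed once dumping is finished. SketchRegion and its member names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one contiguous region (think RW or RO).
class SketchRegion {
    std::vector<char> buf_;   // one contiguous block of memory
    std::size_t top_ = 0;     // offset of the next free byte
    bool sealed_ = false;     // true once "dump time" is over
public:
    explicit SketchRegion(std::size_t capacity) : buf_(capacity) {}

    // Bump-pointer allocation: round the offset up to the requested
    // alignment and advance. There is no free(); objects are never released.
    void* allocate(std::size_t bytes, std::size_t align) {
        assert(align != 0);
        if (sealed_) return nullptr;  // no allocation at "runtime"
        const std::size_t start = (top_ + align - 1) / align * align;
        if (start + bytes > buf_.size()) return nullptr;  // region is full
        top_ = start + bytes;
        return buf_.data() + start;
    }

    void seal() { sealed_ = true; }
    std::size_t used() const { return top_; }
};
```

Because nothing is freed and nothing is added after seal(), the region needs no chunk headers or free lists, which matches the point above about there being no Metachunks inside it.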

>> In the long term, we probably should make the metadata position independent,
>> so it can be mapped at any address. Not quite sure how to do that yet ...
>>
> This sounds (to my very uninformed mind) more promising - by using
> indexes instead of pointers into the metaspace. E.g. to get a Klass*,
> add base + index. Like we do already with compressed class pointers.
>
> For many things one could even use 32bit indices, at least for Klass*
> pointers, since the compressed class space cannot exceed 32bit anyway.

One issue with the (base + offset) approach is how to do it quickly. We need to 
tweak the C++ compilers so it can be as fast as loading a direct pointer :-)
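
For illustration, the (base + index) idea could look roughly like the sketch below. IndexRef, make_ref and Meta are hypothetical names for this example and nothing here is existing HotSpot code. A reference stores a 32-bit index instead of an address, so the same bits stay valid wherever the block is mapped, at the price of an extra add on every access.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical metadata record used only for this example.
struct Meta {
    int value;
};

// Position-independent reference: an index relative to the start of a
// contiguous block of T, instead of an absolute pointer.
template <typename T>
struct IndexRef {
    std::uint32_t index;

    // Every access pays for "base + index" here; a direct pointer would not.
    T* resolve(T* base) const { return base + index; }
};

// Encode a pointer into the block as an index from its base.
template <typename T>
IndexRef<T> make_ref(const T* base, const T* target) {
    const std::ptrdiff_t d = target - base;
    assert(d >= 0);
    assert(static_cast<std::uint64_t>(d) <= std::numeric_limits<std::uint32_t>::max());
    return IndexRef<T>{static_cast<std::uint32_t>(d)};
}
```

A reference created against one copy of the block resolves correctly against any other copy, as long as the caller supplies that copy's base.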

> I recently thought about a similar thing myself, at a much smaller
> scale: I wondered whether it would be worth replacing all linking
> pointers in Metachunk with 32bit (or even 16bit) indices to
> shrink the Metachunk header.

I think that would be good. I assume we don't walk these pointers that 
often (at a much lower frequency than reading the *real* metadata 
pointers such as Klass::_name, InstanceKlass::_constants, etc.).

Thanks
- Ioi
> One may have to get rid of the notion of linking Metaspace nodes
> together (VirtualSpaceList) in favour of one continuous memory block -
> but that may have other advantages too (make metaspace coding simpler,
> allow for page-wise de-commit to shrink process memory, reduce number
> of memory mappings per process...)
>
> Thanks Thomas
>
>> Thanks
>> - Ioi
>>
>>>> Just because a bad option has existed for a long time does not mean we
>>>> should not fix it.
>>>>
>>>> The only reason for this option to exist is for diagnostic purposes. It
>>>> can
>>>> check for
>>>>
>>>>    * Did my archive fail to map due to ASLR?
>>>>    * Did I specify a bad path to the archive?
>>>>    * Did I specify a bad archive file (e.g., one created by a different
>>>>      JDK version)?
>>>>
>>>> So Thomas's suggestion of providing this functionality as a diagnostic
>>>> flag
>>>> makes sense.
>>>>
>>>> In any case, I think this particular RFR is not a good way of handling
>>>> the
>>>> problem, so I am withdrawing it.
>>>>
>>>> I'll file a separate CSR to actually change how -Xshare:on works, after
>>>> more
>>>> discussion on how best to change it.
>>> I agree with all your points.
>>>
>>> ..Thomas
>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>
>>>>> Cheers,
>>>>> David
>>>>>
>>>>>> There are diagnostic options to find out if CDS is enabled. If you run
>>>>>> with -showversion, it will tell you if sharing is enabled. People don't
>>>>>> need
>>>>>> their program to die just to find this out.
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> Thanks
>>>>>>>> - Ioi
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>>
>>>>>>>>> https://docs.oracle.com/javase/10/vm/class-data-sharing.htm#JSJVM-GUID-0260F857-A70E-4399-A1DF-A5766BE33285
>>>>>>>>> [2]
>>>>>>>>>
>>>>>>>>> https://docs.oracle.com/javase/10/tools/java.htm#JSWOR-GUID-31503FCE-93D0-4175-9B4F-F6A738B2F4C4
>>>>>>>>>
>>>>>>>>>>       --- vs ---
>>>>>>>>>>
>>>>>>>>>> $ java -Xshare:auto -version
>>>>>>>>>> java version "11-internal" 2018-09-25
>>>>>>>>>> Java(TM) SE Runtime Environment 18.9 (fastdebug build
>>>>>>>>>> 11-internal+0-adhoc.iklam.open)
>>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM 18.9 (fastdebug build
>>>>>>>>>> 11-internal+0-adhoc.iklam.open, mixed mode, sharing)
>>>>>>>>>>
>>>>>>>>>> I am testing with HotSpot tiers 1-3 to make sure the tests don't
>>>>>>>>>> get
>>>>>>>>>> tripped by this new warning message.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> - Ioi
>>>>>>>>


