RFR(XXS): 8188109 JVM should print a warning message that -Xshare:on may cause VM to abort start-up

Ioi Lam ioi.lam at oracle.com
Thu May 31 06:09:25 UTC 2018



On 5/30/18 10:57 PM, Thomas Stüfe wrote:
> On Thu, May 31, 2018 at 7:50 AM, Ioi Lam <ioi.lam at oracle.com> wrote:
>>
>> On 5/30/18 9:53 PM, David Holmes wrote:
>>> On 31/05/2018 2:24 PM, Ioi Lam wrote:
>>>>
>>>>
>>>> On 5/30/18 9:13 PM, David Holmes wrote:
>>>>> Hi Ioi,
>>>>>
>>>>> On 31/05/2018 2:01 PM, Ioi Lam wrote:
>>>>>> On 5/30/18 6:47 PM, David Holmes wrote:
>>>>>>> Hi Ioi,
>>>>>>>
>>>>>>> Sorry but this troubles me ...
>>>>>>>
>>>>>>> On 31/05/2018 9:39 AM, Ioi Lam wrote:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8188109
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk11/8188109-xshare-on-print-warning.v01/
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Please review this one-liner patch.
>>>>>>>>
>>>>>>>> -Xshare:on may cause infrequent/intermittent start-up failure due to
>>>>>>>> the presence of Address Space Layout Randomization (ASLR). This option is
>>>>>>>> intended for testing (the internals of CDS) only and should not be used in
>>>>>>>> production environments.
>>>>>>>>
>>>>>>>> With this patch, the following warning message is printed when
>>>>>>>> -Xshare:on is specified:
>>>>>>>>
>>>>>>>> $ java -Xshare:on -version
>>>>>>>> Java HotSpot(TM) 64-Bit Server VM warning: -Xshare:on is for testing
>>>>>>>> purpose only and may cause JVM start-up failure. Use -Xshare:auto instead.
>>>>>>>> java version "11-internal" 2018-09-25
>>>>>>>> Java(TM) SE Runtime Environment 18.9 (fastdebug build
>>>>>>>> 11-internal+0-adhoc.iklam.open)
>>>>>>>> Java HotSpot(TM) 64-Bit Server VM 18.9 (fastdebug build
>>>>>>>> 11-internal+0-adhoc.iklam.open, mixed mode, sharing)
>>>>>>>
>>>>>>> So should this warning only be enabled in product builds?
>>>>>>>
>>>>>>> Even then it may be annoying for anyone who runs with -Xshare:on as
>>>>>>> they've set up CDS as documented [1][2] and they know their environment
>>>>>>> works ok - now they get a warning.
>>>>>>>
>>>>>>> Also I'm unclear how "on" fails due to ASLR but "auto" keeps going?
>>>>>>>
>>>>>> The documentation [1] says:
>>>>>>
>>>>>> -Xshare:on
>>>>>> To enable class data sharing. If class data sharing can't be enabled,
>>>>>> print an error message and exit.
>>>>>>
>>>>>> -Xshare:auto
>>>>>> To enable class data sharing by default. Enable class data sharing
>>>>>> whenever possible.
>>>>>>
>>>>>> So if mapping fails due to ASLR, "on" will exit and "auto" will disable
>>>>>> CDS and continue.
>>>>>
>>>>> Ah! In the bug you state "-Xshare:auto continue to execute (with CDS
>>>>> enabled)" - so that should be disabled.
>>>>>
>>>>>> The documentation in [2] is wrong.  It says "Ensure that you have
>>>>>> specified the option -Xshare:on or -Xshare:auto.", but -Xshare:on should not
>>>>>> be used in production environments. I have filed a doc bug for this
>>>>>> (https://bugs.openjdk.java.net/browse/JDK-8204141).
>>>>>>
>>>>>> The main reason for doing this RFR is -- if people have been following
>>>>>> [2] and using -Xshare:on, their setup is NOT OK. ASLR may happen just very
>>>>>> rarely, but you don't want your program suddenly failing (e.g., if some
>>>>>> admin has turned on more aggressive ASLR settings).
>>>>>
>>>>> But conversely you don't want your application to suddenly and silently
>>>>> stop using CDS because of ASLR and you've masked that by using "auto"!
>>>>>
>>>>>> As more people are moving their Java workload to micro-services type of
>>>>>> environments, JVM launches will happen more, and there will be more chances
>>>>>> of running into the ASLR issue. Therefore, we should fix the docs, and warn
>>>>>> people that they should switch to -Xshare:auto.
>>>>>
>>>>> That's only a partial solution. If ASLR is a problem then that needs to
>>>>> be known and addressed.
>>>>>
>>>>>>> Maybe only if the archive mapping fails and "on" was used then give a
>>>>>>> warning? Or just improve the message given when the VM aborts?
>>>>>>>
>>>>>> That's already too late, especially for people running critical
>>>>>> services.
>>>>>>
>>>>>> We want people to see this warning and actively fix their scripts to
>>>>>> get rid of -Xshare:on.
>>>>>
>>>>> I think what we want/need people to realize is that ASLR can seriously
>>>>> impact the ability to use CDS and if you are trying to use CDS for startup
>>>>> or footprint reasons then you're going to have a major problem if CDS is
>>>>> silently disabled!
>>>>>
>>>>> I think "on" and "auto" are both just as valid. "on" is for people who
>>>>> need CDS reliably and want to fail fast if it's not working. "auto" is for
>>>>> people who would like CDS but can live without it.
>>>>>
>>>> I think -Xshare:on has the potential to do much more harm than good.
>>>>
>>>> People have lived with the fact that optimizations in the JVM are not
>>>> always deterministic. They want their programs to run regardless. CDS is the
>>>> only optimization that has an option to say "let the program fail if the
>>>> optimization is not available".
>>>
>>> I don't agree with that characterization - I think it oversimplifies the
>>> situation. If you want to push this analogy then the right analogy would be
>>> allowing "-server" to silently run the interpreter instead because there was
>>> some error configuring the JIT! I wonder how many users would be happy with
>>> that! But the bulk of optimizations are not things that can fail as such so
>>> I don't think the comparison holds.
>>>
>>> I think this is purely a documentation and education issue. Particularly
>>> because this is not something new with JDK 11 - this is an issue that exists
>>> with CDS in all releases. So you want to get the message out to all users
>>> that "auto" may be preferable to "on".
>>>
>>> Until something actually fails I doubt anyone would notice your warning
>>> anyway.
>>>
>>> Maybe we need -Xshare:on_and_I_really_mean_it ;-)
>>>
>> -Xshare:on was an ill-conceived and dangerous option. It was designed when
>> ASLR wasn't common. With ASLR more in common use, and short-lived JVMs
>> becoming more common, the danger is getting bigger and bigger.
>>
> Just out of curiosity (I never paid close attention to CDS), how do
> you communicate the mapping address the first process establishes to
> the subsequent processes attaching? Or do you just have a fixed well
> known address value baked into the jvm?
>
> I am just trying to understand how probable a failed mapping could be in a
> 64bit address space.

The CDS archive is created at a fixed address. The default value is in 
the SharedBaseAddress option (0x800000000 on 64-bit). This is usually a 
good range on Linux, as ASLR (usually) does not place shared library 
segments in this range.

However, we have analyzed our distributed test runs and found a small
percentage of cases (less than 5%, I think; I can't remember the exact
number) where the mapping would fail. So overall you get the benefit of
CDS, but not every single time.
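
For anyone who wants to measure this on their own machines without
letting the JVM die, here is a rough Java sketch (not part of the patch).
It launches a child JVM with -Xshare:auto and looks for the "sharing"
token that appears in the -version banner quoted earlier in this thread.
The class name SharingProbe, the method names, and the assumption that
the banner ends with ", sharing)" are mine; other JDK builds may format
the banner differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK or of this patch.
final class SharingProbe {

    // Assumption: a JVM that mapped the CDS archive ends its version banner
    // with "..., sharing)", as in the sample output in this thread.
    static boolean reportsSharing(String versionBanner) {
        // Collapse line breaks so a wrapped banner is still matched.
        String flat = versionBanner.replaceAll("\\s+", " ").trim();
        return flat.contains(", sharing)");
    }

    // Runs "java -Xshare:auto -version" (java is assumed to be on the PATH)
    // and returns the combined stdout/stderr text.
    static String runVersion() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("java", "-Xshare:auto", "-version")
                .redirectErrorStream(true)   // -version prints to stderr
                .start();
        String out = new String(p.getInputStream().readAllBytes(),
                                StandardCharsets.UTF_8);
        p.waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(reportsSharing(runVersion())
                ? "CDS archive was mapped"
                : "CDS was disabled for this launch");
    }
}
```

Running this in a loop on a given host would give a local estimate of
how often the mapping fails there.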

We have been thinking about making the CDS archive relocatable, but 
we're not there yet :-(

In the short term, we can relocate by patching all the pointers. We just 
added the ability to iterate over all metaspace pointers in JDK 10 (see 
metaspaceClosure.hpp) so we can map to an alternative address and 
relocate. We can probably do part of the relocation incrementally as the 
classes are being loaded.
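
To make the pointer-patching idea concrete, here is a toy Java model. It
is only a sketch and not HotSpot code: the "archive" is a long[] whose
pointer slots hold absolute addresses computed for the dump-time base,
and the list of pointer slots stands in for what a walk over all
metaspace pointers would produce. RelocationSketch, relocate and
pointerSlots are made-up names.

```java
import java.util.Arrays;

// Toy model, not HotSpot code.
final class RelocationSketch {

    // Dump-time base; the value is the 64-bit SharedBaseAddress default
    // mentioned above.
    static final long DUMP_BASE = 0x800000000L;

    /**
     * Returns a copy of the archive words in which every pointer slot is
     * shifted by (actualBase - DUMP_BASE). Slots not listed in pointerSlots
     * are plain data and are left unchanged.
     */
    static long[] relocate(long[] words, int[] pointerSlots, long actualBase) {
        long delta = actualBase - DUMP_BASE;
        long[] patched = Arrays.copyOf(words, words.length);
        for (int slot : pointerSlots) {
            if (patched[slot] != 0) {    // leave null pointers alone
                patched[slot] += delta;
            }
        }
        return patched;
    }

    public static void main(String[] args) {
        long[] words = { DUMP_BASE + 0x10, 42L, 0L, DUMP_BASE + 0x28 };
        int[] pointerSlots = { 0, 2, 3 };  // slot 1 is data, slot 2 is null
        long[] patched = relocate(words, pointerSlots, 0x900000000L);
        for (long w : patched) {
            System.out.println("0x" + Long.toHexString(w));
        }
    }
}
```

The incremental variant would apply the same delta lazily, patching only
the slots that belong to a class at the time that class is loaded.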

In the long term, we probably should make the metadata position 
independent, so it can be mapped at any address. Not quite sure how to 
do that yet ...


Thanks
- Ioi
>> Just because a bad option has existed for a long time does not mean we
>> should not fix it.
>>
>> The only reason for this option to exist is for diagnostic purposes. It can
>> check for
>>
>>   * Did my archive fail to map due to ASLR?
>>   * Did I specify a bad path to the archive?
>>   * Did I specify a bad archive file (e.g., one created by a different
>>     JDK version)?
>>
>> So Thomas's suggestion of providing this functionality as a diagnostic flag
>> makes sense.
>>
>> In any case, I think this particular RFR is not a good way of handling the
>> problem, so I am withdrawing it.
>>
>> I'll file a separate CSR to actually change how -Xshare:on works, after more
>> discussion on how best to change it.
> I agree with all your points.
>
> ..Thomas
>
>> Thanks
>> - Ioi
>>
>>
>>> Cheers,
>>> David
>>>
>>>> There are diagnostic options to find out if CDS is enabled. If you run
>>>> with -showversion, it will tell you if sharing is enabled. People don't need
>>>> their program to die just to find this out.
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>> [1]
>>>>>>> https://docs.oracle.com/javase/10/vm/class-data-sharing.htm#JSJVM-GUID-0260F857-A70E-4399-A1DF-A5766BE33285
>>>>>>> [2]
>>>>>>> https://docs.oracle.com/javase/10/tools/java.htm#JSWOR-GUID-31503FCE-93D0-4175-9B4F-F6A738B2F4C4
>>>>>>>
>>>>>>>>      --- vs ---
>>>>>>>>
>>>>>>>> $ java -Xshare:auto -version
>>>>>>>> java version "11-internal" 2018-09-25
>>>>>>>> Java(TM) SE Runtime Environment 18.9 (fastdebug build
>>>>>>>> 11-internal+0-adhoc.iklam.open)
>>>>>>>> Java HotSpot(TM) 64-Bit Server VM 18.9 (fastdebug build
>>>>>>>> 11-internal+0-adhoc.iklam.open, mixed mode, sharing)
>>>>>>>>
>>>>>>>> I am testing with HotSpot tiers 1-3 to make sure the tests don't get
>>>>>>>> tripped by this new warning message.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> - Ioi
>>>>>>


