RFR (xs) 8075967: Zero interpreter asserts for SafeFetch<32, N> calls in ObjectMonitor

Fri Mar 27 06:46:34 UTC 2015

Colleen, David,

can we all agree an Colleens proposal:

inline bool CanUseSafeFetch32() {
  return NOT_ZERO(StubRoutines::SafeFetch32_stub() != NULL)
ZERO_ONLY(false);
}

?

If that is ok, I make a patch for that.

As for objectMonitor:

>> So I thought of changing the objectMonitor code to do
>>    if (CanUseSafeFetchN) {
>>      SafeFetchN(...)
>>    } else {
>>       just fetch unsafely
>>    }
>>
>
> This serves no purpose and adds overheads in the monitor code. We don't
> need the "have the stubs been generated" check for this code. And the "can
> I use ZERO" check would be pointless: if safefetch on zero doesn't work
> then SafeFetch may crash; if you take the else then it may also crash -
> same result both code paths, so no point.
>
>
But this documents the intent clearly. We want to access the memory safely,
unless we cannot do SafeFetch, then we risk a direct load.

That said, I have not many emotions here. I think in the long run SafeFetch
can/will be implemented on zero.

Kind Regards, Thomas

David
>
>  There's also a SafeFetch32 call around that same place that would also
>> have to be changed this way.
>>
>> But I thought it was messy to add this conditional code since the result
>> for Zero would be the exactly the same in this place.  So I chose
>> CanUseSafeFetch to mean SafeFetch isn't safe and protected code the test
>> code that way and not the ObjectMonitor code.
>>
>> I don't mind if this is reversed as long as it's not messy and is tested.
>>
>> Coleen
>>
>> On 3/26/15, 9:46 AM, Thomas Stüfe wrote:
>>
>>> Hi David,
>>>
>>> you are right, it was also broken before. I missed that Coleen just
>>> reintroduced an earlier version of the dummy SafeFetch()
>>> implementation, sorry for that.
>>>
>>> I would be happy with a CanUseSafeFetch() which returns always false
>>> on zero and false on other platforms before stub generation. That
>>> would be always safe to call.
>>>
>>> Kind Regards, Thomas
>>>
>>>
>>> On Thu, Mar 26, 2015 at 1:38 PM, David Holmes <david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>> wrote:
>>>
>>>     On 26/03/2015 10:24 PM, Thomas Stüfe wrote:
>>>
>>>         Hi David,
>>>
>>>         On Thu, Mar 26, 2015 at 11:42 AM, David Holmes
>>>         <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>>>         <mailto:david.holmes at oracle.com
>>>
>>>         <mailto:david.holmes at oracle.com>>> wrote:
>>>
>>>             Hi Thomas,
>>>
>>>             On 26/03/2015 6:35 PM, Thomas Stüfe wrote:
>>>
>>>                 Hi,
>>>
>>>                 I agree with Goetz.
>>>
>>>                 SafeFetch() was meant to be guarded with
>>>         CanUseSafeFetch(), this
>>>                 is part
>>>                 of the contract, so using it without this guard is
>>>         actually
>>>                 incorrect
>>>                 and would crash were the calling code ever called
>>>         before Stub
>>>                 routine
>>>                 initialization.
>>>
>>>
>>>             That contract was already implicit - this API is for use
>>>         after the
>>>             stubs have been generated. There's no reason to expand the
>>>         use of
>>>             this API to situations before stubs are generated. So the
>>>         assert
>>>             captures that and CanUseSafeFetch is really unnecessary.
>>>
>>>
>>>         We had to introduce CanUseSafeFetch() exactly because we have
>>>         situations
>>>         where we may call SafeFetch before stub routine generation:
>>>
>>>         1) In os_aix.cpp, we use it to probe protected memory after doing
>>>         mprotect(). This is a long story, but we had to add this
>>>         workaround
>>>         because on shortcomings in AIX's mprotect(). This code will
>>>         get called
>>>         during initialization before stub routines are set up.
>>>         Note that right now, in the OpenJDK code base, the code still
>>>         directly
>>>         checks whether the stubroutine is NULL, because that mprotect
>>>         workaround
>>>         is older than 8074552. But I planned to change this to
>>>         CanUseSafeFetch(). But with the new CanUseSafeFetch, this
>>>         would just
>>>         break AIX.
>>>
>>>         2) In SAPs code base, we also use it to harden error log
>>>         writing (e.g.
>>>         to make callstack dumping more reliable). This may can be
>>>         invoked very
>>>         early, if a crash happens very early, before stub routines are
>>>         available. We planned to contribute those improvements piece
>>>         by piece to
>>>         the OpenJDK - my first change for 8074552 was actually supposed
>>> to
>>>         prepare that. Now, however, this makes no sense, because using
>>>         SafeFetch() to harden error log writing would actually be
>>>         worse for
>>>         error logs for early crashes.
>>>
>>>         SafeFetch is a very basic function which could be useful in
>>>         many more
>>>         contexts, but only if you can really be sure it is safe. The
>>>         contexts it
>>>         is used in may be very low level where you cannot be sure at
>>>         which point
>>>         in VM initialization you are.
>>>
>>>         When I introduced CanUseSafeFetch(), it was to safeguard the
>>>         use of
>>>         SafeFetch() for situations where SafeFetch was not safe to
>>>         call. Exactly
>>>         so that I would have not to think about "is this too early to use
>>>         SafeFetch()?". This is broken now - instead of guarding me
>>>         against using
>>>         SafeFetch() in situations where it is not safe to use, it will
>>>         assert(debug) or potentially lie(release), in which case
>>> SafeFetch
>>>         itself may crash.
>>>
>>>
>>>     Okay I see a case for adding a check for "can I use SafeFetch at
>>>     this point in time" (though if what you are fetching is
>>>     potentially not-safe then you are in trouble!).
>>>
>>>                 So now
>>>                 - CanUseSafeFetch really now means "SafeFetch will
>>>         behave as
>>>                 excpected"
>>>                 - SafeFetch is softened to "will safely fetch content
>>>         of address
>>>                 unless on zero, there it may crash the VM".
>>>
>>>                 I think this is more confusing than before. In
>>>         objectMonitor,
>>>                 the logic
>>>                 we want is "use SafeFetch wherever possible but if
>>>         not, just
>>>                 access the
>>>                 memory directly". This logic now is hidden behind a
>>>         seemingly
>>>                 incorrect
>>>                 use of SafeFetch without CanUseSafeFetch, and I think
>>>         Goetz proposal
>>>                 would have been far more clearer.
>>>
>>>
>>>             No the problem is that SafeFetch doesn't work on ZERO - if
>>>         you give
>>>             it a bad address it won't recover it will crash. The whole
>>>         point is
>>>             for it not to crash. The objectMonitor code is saying
>>>         "here is a
>>>             load that we know may be invalid, but if it is then we
>>>         don't want to
>>>             crash, so we use SafeFetch".
>>>
>>>
>>>         Yes! But the way it is coded, it does not. Because SafeFetch
>>>         is now
>>>         implemented on zero with a simple load, it is not safe at all.
>>>         The code
>>>         in objectMonitor.cpp appears to be safe, but it is not. And
>>>         SafeFetch()
>>>         promises to be safe, but it is not, and you would not know
>>>         unless you
>>>         know the zero implementation.
>>>
>>>
>>>     Sorry did I miss something - has Zero ever implemented SafeFetch
>>>     as other than a direct simple load? ie hasn't Zero always been
>>>     broken wrt SafeFetch and that is the basic problem ?
>>>
>>>         It seems to be a bit inconsequent. Either one is really sure
>>>         the pointer
>>>         is accessible - in which case why use SafeFetch? - or it may
>>>         be invalid,
>>>         even if this is unusual - in which case why is the resulting
>>>         crash not a
>>>         problem for zero?
>>>         In other words, if the original programmer thought it worth to
>>>         guard the
>>>         memory access with SafeFetch(), why is this not a concern for
>>>         zero?
>>>
>>>
>>>     It is - which is why Zero is broken.
>>>
>>>             The expectation is that SafeFetch actually is safe no
>>>         matter what is
>>>             fetched.
>>>
>>>
>>>         Yes, and that expectation is broken now.
>>>
>>>
>>>     When was it not broken?
>>>
>>>         Before, if you used SafeFetch() and guarded it with
>>>         CanUseSafeFetch(),
>>>         you could be sure to be safe and not to break anything -
>>>         regardless when
>>>         in VM life you used the function or on which platform.
>>>
>>>
>>>     No you only knew that a function called SafeFetch could be called
>>>     - you didn't know it was actually implemented to actually always
>>>     do a "safe fetch".
>>>
>>>         Both SafeFetch and CanUseSafeFetch made promises:
>>>         - CanUseSafeFetch promised to tell me if it is too early to call
>>>         SafeFetch. Of course, it itself could be called at all times,
>>>         that was
>>>         the whole point of CanUseSafeFetch.
>>>         - SafeFetch promised to be able to access a potentially
>>>         invalid pointer
>>>         and not to crash - if CanUseSafeFetch returned true.
>>>
>>>
>>>     SafeFetch (if it exists) promises to be safe. The ZERO
>>>     implementation breaks that.
>>>
>>>         Both promises are broken now. CanUseSafeFetch is not safe to
>>>         call if it
>>>         is too early - ironically a condition it was designed to tell me.
>>>
>>>
>>>     Okay we will re-fix that.
>>>
>>>         SafeFetch is not safe to call because it lies about being safe
>>>         for zero.
>>>
>>>
>>>     It never was safe to call as far as I can see, it was just never
>>>     called with an invalid value. The new test exposed this.
>>>
>>>     David
>>>     -----
>>>
>>>         The result is that either if we continue to use SafeFetch, we
>>> risk
>>>         crashes on zero and crashes if invoked too early. This makes
>>>         the API
>>>         useless for most cases.
>>>
>>>         I am open to any other suggestions - renaming the APIs, maybe
>>>         reshaping
>>>         them - but I regret seeing SafeFetch being back to being quite
>>>         unsafe again.
>>>
>>>         Kind Regards, Thomas
>>>
>>>
>>>             The ZERO implementation just does a load and hopes it
>>>         isn't asked
>>>             for anything bad. The objectMonitor code will rarely actually
>>>             trigger a bad load. But the newly introduced test does.
>>>
>>>             So in my opinion to return to the status-quo where ZERO
>>> really
>>>             doesn't implement SafeFetch safely but it "works okay in
>>>         practice",
>>>             we should disable those explicit tests when running on ZERO.
>>>
>>>             Whether CanUseSafeFetch (aka is SafeFetchStubAvailable) is
>>>             implemented explicitly or implicitly as an assert is a
>>>         side-issue.
>>>
>>>             David
>>>
>>>                 Kind Regards, Thomas
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>                 On Thu, Mar 26, 2015 at 8:40 AM, Lindenmaier, Goetz
>>>                 <goetz.lindenmaier at sap.com
>>>         <mailto:goetz.lindenmaier at sap.com>
>>>         <mailto:goetz.lindenmaier at sap.com
>>>         <mailto:goetz.lindenmaier at sap.com>>
>>>                 <mailto:goetz.lindenmaier at sap.
>>>         <mailto:goetz.lindenmaier at sap.>__com
>>>                 <mailto:goetz.lindenmaier at sap.com
>>>
>>>         <mailto:goetz.lindenmaier at sap.com>>>> wrote:
>>>
>>>                      Hi,
>>>
>>>                      sorry I only comment on this now ...
>>>                      I think it's rather strange that you can not use
>>>         SafeFetch
>>>                 on zero,
>>>                      but it's there,
>>>                      and it's actually used ... which will be the
>>>         situation
>>>                 after this
>>>                      change.
>>>
>>>                      Why not just change objectMonitor as the
>>>         SafeFetch api is
>>>                 meant to
>>>                      be used?
>>>                      Obviously zero takes care that this access always
>>>         succeeds:
>>>
>>>                      diff -r a35854c84d9d
>>>         src/share/vm/runtime/__objectMonitor.cpp
>>>                      --- a/src/share/vm/runtime/__objectMonitor.cpp
>>>         Wed Mar
>>>                 25 12:36:28
>>>                      2015 +0100
>>>                      +++ b/src/share/vm/runtime/__objectMonitor.cpp
>>>         Thu Mar
>>>
>>>                 26 08:19:53
>>>                      2015 +0100
>>>                      @@ -2241,7 +2241,12 @@
>>>                          }
>>>
>>>                          assert(sizeof(((JavaThread
>>>         *)ox)->_thread_state ==
>>>                 sizeof(int)),
>>>                      "invariant");
>>>                      -  int jst = SafeFetch32((int *) &((JavaThread *)
>>>                      ox)->_thread_state, -1);;
>>>                      +  int jst;
>>>                      +  if (CanUseSafeFetch32()) {
>>>                      +    jst = SafeFetch32((int *) &((JavaThread *)
>>>                 ox)->_thread_state, -1);
>>>                      +  } else {
>>>                      +    jst = ((JavaThread *)ox)->_thread_state;
>>>                      +  }
>>>                          // consider also: jst != _thread_in_Java --
>>>         but that's
>>>                 overspecific.
>>>                          return jst == _thread_blocked || jst ==
>>>         _thread_in_native;
>>>                        }
>>>
>>>                      It's handled similarly in vmError.cpp. And no
>>>         #ifdefs ;)
>>>
>>>                      But I'm also fine with the other change ... to
>>>         get zero
>>>                 working again.
>>>
>>>                      Best regards,
>>>                         Goetz.
>>>
>>>
>>>
>>>                      -----Original Message-----
>>>                      From: hotspot-dev
>>>                 [mailto:hotspot-dev-bounces@
>>>         <mailto:hotspot-dev-bounces@>__openjdk.java.net
>>>         <http://openjdk.java.net>
>>>                 <mailto:hotspot-dev-bounces at openjdk.java.net
>>>         <mailto:hotspot-dev-bounces at openjdk.java.net>>
>>>                      <mailto:hotspot-dev-bounces@
>>>         <mailto:hotspot-dev-bounces@>__openjdk.java.net
>>>         <http://openjdk.java.net>
>>>                 <mailto:hotspot-dev-bounces at openjdk.java.net
>>>
>>>         <mailto:hotspot-dev-bounces at openjdk.java.net>>>] On Behalf Of
>>>         Coleen
>>>                      Phillimore
>>>                      Sent: Thursday, March 26, 2015 3:30 AM
>>>                      To: David Holmes; hotspot-dev developers
>>>                      Subject: Re: RFR (xs) 8075967: Zero interpreter
>>>         asserts for
>>>                      SafeFetch<32, N> calls in ObjectMonitor
>>>
>>>
>>>                      On 3/25/15, 10:15 PM, David Holmes wrote:
>>>                       > On 26/03/2015 11:00 AM, Coleen Phillimore wrote:
>>>                       >>
>>>                       >> On 3/25/15, 8:21 PM, David Holmes wrote:
>>>                       >>> On 26/03/2015 10:04 AM, Coleen Phillimore
>>> wrote:
>>>                       >>>>
>>>                       >>>> On 3/25/15, 7:59 PM, David Holmes wrote:
>>>                       >>>>> Hi Coleen,
>>>                       >>>>>
>>>                       >>>>> Why generate stubs that can't be used and
>>>         then add
>>>                 zero-specific
>>>                       >>>>> logic
>>>                       >>>>> to the shared CanUseSafeFetchN instead of not
>>>                 generating the
>>>                      stubs so
>>>                       >>>>> that CanUseSafeFetchN will return false
>>>         anyway ??
>>>                       >>>>
>>>                       >>>> Because there is platform independent code in
>>>                      objectMonitor.cpp that
>>>                       >>>> uses SafeFetchX (both).  I'd rather not
>>>         burden the
>>>                 caller of
>>>                      this with
>>>                       >>>> testing CanUseSafeFetchX for each call.
>>>          This code
>>>                 existed before
>>>                       >>>> SafeFetch was implemented, so I'm restoring
>>>         previous
>>>                 behavior for
>>>                       >>>> Zero.
>>>                       >>>
>>>                       >>> Sorry Coleen I thought this was deal to with
>>>         8075533,
>>>                 I hadn't
>>>                       >>> realized 8075533 broke the objectMonitor
>>>         code and this
>>>                 was a
>>>                      follow up.
>>>                       >>>
>>>                       >>> What a mess. :(
>>>                       >>
>>>                       >> Yes, it is.  Is this a code Review?
>>>                       >
>>>                       > Let me work through this :) The original
>>>         change for 8074552:
>>>                       >
>>>                       >
>>>         http://hg.openjdk.java.net/__jdk9/hs-rt/hotspot/rev/__
>>> 3eb61269f421
>>>
>>>         <http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/3eb61269f421>
>>>                       >
>>>                       > added the extra error handling stuff related to
>>>                 SafeFetch and a new
>>>                       > ErrorHandler test that was predicated on
>>>                 CanUseSafeFetch32 which
>>>                       > returns true if the stubs exist. It also added
>>>         testing of
>>>                      SafeFetch in
>>>                       > StubRoutines::initialize2 which was excluded for
>>>                 windows-32-bit and
>>>                       > which did not check CanUseSafeFetch32.
>>>                       >
>>>                       > That change broke zero as reported in 8075533
>>>         because
>>>                 you can't
>>>                       > actually use the SafeFetch routines with an
>>> unsafe
>>>                 address on Zero,
>>>                       > and the solution:
>>>                       >
>>>                       >
>>>         http://hg.openjdk.java.net/__jdk9/hs-rt/hotspot/rev/__
>>> 5c2bc6278fc4
>>>
>>>
>>>
>>>         <http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/5c2bc6278fc4>
>>>                       >
>>>                       > was to remove the stubs so that
>>>         CanUseSafeFetch32 would
>>>                 return false,
>>>                       > and to include a check of CanUseSafeFetch32 in
>>>         the tests
>>>                 used with
>>>                       > StubRoutines::initialize2.
>>>                       >
>>>                       > And now we find that the SafeFetch routines
>>>         are used
>>>                 unconditionally
>>>                       > by the objectMonitor code, so we need to
>>>         restore the
>>>                 stubs that were
>>>                       > previously removed, but force CanUseSafeFetch32
>>> to
>>>                 return false
>>>                      on zero.
>>>
>>>                      Yes, exactly.
>>>                       >
>>>                       > Ok. Only nit is that:
>>>                       >
>>>                       > return NOT_ZERO(true) ZERO_ONLY(false);
>>>
>>>                      Yes, you're right.  I'll make that change.
>>>
>>>                      Thanks!
>>>                      Coleen
>>>
>>>                       >
>>>                       > would be more readable than the ifdefs.
>>>                       >
>>>                       > Thanks,
>>>                       > David
>>>                       >
>>>                       >> thanks!
>>>                       >> Coleen
>>>                       >>>
>>>                       >>> Thanks,
>>>                       >>> David
>>>                       >>>
>>>                       >>>
>>>                       >>>
>>>                       >>>> We could file an RFE to either implement
>>>         SafeFetch
>>>                 for Zero or
>>>                      rewrite
>>>                       >>>> this objectMonitor code to not need
>>>         SafeFetch.  I
>>>                 didn't want
>>>                      to do
>>>                       >>>> either of these things with this patch.
>>>                       >>>>
>>>                       >>>> Coleen
>>>                       >>>>
>>>                       >>>>>
>>>                       >>>>> Thanks,
>>>                       >>>>> David
>>>                       >>>>>
>>>                       >>>>> On 26/03/2015 3:53 AM, Coleen Phillimore
>>>         wrote:
>>>                       >>>>>> Summary: Implement SafeFetchX unsafely
>>>         and make
>>>                 CanUseSafeFetchX
>>>                       >>>>>> false
>>>                       >>>>>> for Zero
>>>                       >>>>>>
>>>                       >>>>>> Also, this fixes a couple other minor
>>> issues.
>>>                       >>>>>>
>>>                       >>>>>> Ran jdk9 jck tests with one timeout.
>>>                 hotspot/runtime jtreg
>>>                      tests
>>>                       >>>>>> don't
>>>                       >>>>>> run because Zero doesn't support
>>>         UseCompressedOops
>>>                 (not sure
>>>                      why)
>>>                       >>>>>> and
>>>                       >>>>>> CDS (know why).
>>>                       >>>>>>
>>>                       >>>>>> open webrev at
>>>         http://cr.openjdk.java.net/~__coleenp/8075967/
>>>         <http://cr.openjdk.java.net/%7E__coleenp/8075967/>
>>>                 <http://cr.openjdk.java.net/~coleenp/8075967/
>>>         <http://cr.openjdk.java.net/%7Ecoleenp/8075967/>>
>>>                       >>>>>> bug link
>>>         https://bugs.openjdk.java.net/__browse/JDK-8075967
>>>                 <https://bugs.openjdk.java.net/browse/JDK-8075967>
>>>                       >>>>>>
>>>                       >>>>>> Thanks,
>>>                       >>>>>> Coleen
>>>                       >>>>
>>>                       >>
>>>
>>>
>>>
>>>
>>>
>>