RFR(M): 8210754: print_location is not reliable enough (printing register info)

Thomas Stüfe thomas.stuefe at gmail.com
Tue Sep 25 06:31:24 UTC 2018


Hi Martin,

On Mon, Sep 24, 2018 at 4:26 PM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> Hi Martin,
>
> Thank you, that looks like a lot of work!
>
> Some remarks:
>
> http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/classfile/classLoaderData.cpp.udiff.html
>
> Can we put all that analysis stuff - e.g.
> ClassLoaderDataGraph::is_valid - into DEBUG_ONLY ?
>
> +  if (loader_data == ClassLoaderData::the_null_class_loader_data()) {
> +    return true;
> +  }
>
> Init dependency here. the_null_class_loader_data() can be NULL early
> on, so better test for NULL too.
>
> ==========
>
>
> http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/memory/metaspace.cpp.udiff.html
> http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/memory/metaspace.hpp.udiff.html
>
>
> I feel these functions could be better put into an own file. We
> slimmed down metaspace.cpp a lot and removed most into own files.
> Either metaDebug.hpp/cpp or a new file under metaspace/...
>
> --
>
> I am not happy with exposing VirtualSpaceNode via metaspace.hpp. I
> feel that is not necessary to solve to problem you have. All which is
> needed, from the outside, are:
>
> MetaspaceUtils::is_valid(ptr)
> MetaspaceUtils::is_valid_range(from, to)
>
> I even would avoid "_in_committed" as a wording, just use "_is_valid",
> since this presumes less of the underlying implementation.
>
> I also would make those two functions public, but DEBUG_ONLY, and
> remove the vmErrorhelper friendship, but that is just my personal
> taste.
>
> --
>
> In all case you could weed out obvious bogus values before actually
> iterating the CLDG:
>
> MetaspaceUtils::is_in_committed(void* p)
>
> if (p != NULL)
> or even
> if (p < (reasonable low value)
>
> may catch a whole range of cases where we have invalid pointers.
>
> In fact, a function like this:
>
> bool looks_like_a_real_pointer(void* p) {
>
> }
>
> could already weed out 80% of all cases, by checking reasonable low
> and high limits and checking for alignment.
>
> ---
>
> Do we really need the range check? How probable is it that
> MetaspaceUtils::is_in_committed(ptr1) and
> MetaspaceUtils::is_in_committed(ptr2) are ok but the range is invalid?
> (It can happen, sure, but how probable is this?)
>
> ---
>
> I cannot comment on the CDS sections. Looks fine to me, but I'm no expert here.
>
> ---
>
> http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/memory/metaspace/virtualSpaceList.hpp.udiff.html
>
> I would not expose a "find_enclosing_space" but instead one - possibly
> two - functions like this:
>
> VistualSpaceList::is_valid(ptr)
> (if we need range checking): VistualSpaceList::is_valid_range(from, to)
>
> And all the way down to VistualSpaceNode.
>
> Reason is that I would not expose functionality to the outside which
> is not needed.
>
> ==========
>
> http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/utilities/vmErrorHelper.hpp.html
>
> http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/utilities/vmErrorHelper.cpp.html
>
>
> Style nit: vmErrorHelper -> VMErrorHelper. Or maybe VMErrorUtils?
>
> ---
>
> Style: for brevity, I wonder whether we can do away with all the
> "_in_committed" and just assume that is implied?
>
> e.g.:
>
> is_range_in_committed_java_heap -> is_range_in_java_heap
>
> or even shorter "is_in_java_heap" since two argument pointers imply we
> do range checking.
>
> Same argument goes for the metaspace functions too.
>
> --
>
>   46 bool vmErrorHelper::is_range_in_committed_java_heap(const void*
> from, const void* to) {
>   47   for (uintptr_t i = (uintptr_t)from; i != (uintptr_t)to; ++i) {
>   48     if (!Universe::heap()->is_in((const void*) i)) {
>   49       return false;
>   50     }
>   51   }
>   52
>   53   return true;
>   54 }
>
> Why the cast to uintptr_t?
> Also, why bytewise iteration? Why not pagewise like in
> vmErrorHelper::is_readable_range?
>
> ---
>
> vmErrorHelper::is_range_in_committed_c_heap()
>
> This function name is a blatant lie :-) - to the intent is to know if
> a region is readable but not one of the known big JVM regions. I would
> split this function in two more basic operations:
>
> vmErrorHelper::is_known_range() - which would return true if a given
> range is either heap, metaspace, ...
>
> and the already existing is_readable_range().
>
> Also, I see we use this only in is_valid_symbol(). Is it really
> necessary to exclude the remote possibility that a given range is
> valid but points to java heap or metaspace?
>
> I would change is_valid_symbol() to:
>
> - is symbol obvious garbage pointer?
> - is [symbol, symbol+sizeof(Symbol)) readable?
> - extract length. Is length obvious garbage? <0 or very large?
> - is byte array readable?
> - and if you are really thorough, check symbol byte array for
> non-printable chars or zeros.
>
> I would omit a metaspace-contains check or a java-heap-contains check
> just to exclude the remote possibility that the range points there.
>
> When in doubt, we may add a "safe_print" function to print out a
> suspect char array like the one we extract from a suspect Symbol*,
> where we could check each char for "is_ascii" before printing. But
> this may be overdoing it.
>
> ---
>
>   71 void* vmErrorHelper::load_klass_raw(oop obj) {
>
> Name is misleading. How about "get_klass_pointer()" os similar?
>
> Just dereferencing the given oop without prior checks?
>
> ---
>
>   81 void* vmErrorHelper::load_oop_raw(oop obj, int offset) {
>
> same remarks here.
>
> ---
>
> bool vmErrorHelper::is_valid_klass(Klass* k) {
>
> - better entry checks.
> - alignment checks for sizeof(MetaWord), not for intptr_t
> - You can loose the "is_valid_symbol". Symbols live in C heap anyway.
> - I feel that the CLDG walk is overdoing it a bit. If the given
> pointer points to metaspace, I would say chance is high that thing is
> valid.
>
> ---
>
>  133 bool vmErrorHelper::is_valid_oop(oop obj) {
>
> - why not using is_valid_klass? to check the extracted klass pointer?
>
> ---
>
> oop vmErrorHelper::oop_or_null(address addr) {
>   HeapWord* p = Universe::heap()->block_start(addr);
>   // If we couldn't find it it just may mean that heap wasn't parsable
>   // See if we were just given an oop directly
>   if (p != NULL && Universe::heap()->block_is_obj(p)) {
>     return oop(p);
>   } else if (p == NULL && is_valid_oop(oop(addr))) {
>     return oop(addr);
>   }
>   return NULL;
> }
>
> Can you move the comment into the else branch please?
>
> Also, a short comment would be helpful describing what this function
> really does (true for the other functions too).
>
> e.g.:
>
> "Given an address which may point at or into a java object, returns
> the oop for the java object. NULL otherwise."
>
> Would it make sense to revert the order of the check? First
> is_valid_oop, then heap()->block_start()? Since more frequently one
> has an oop, not an address into an object?
>
> ---
>
>  166 void vmErrorHelper::dump_code_blob(address addr, CodeBlob* b,
> outputStream* st, bool verbose) {
>
> I really do not like the use of ResourceMarks here! This assumes a
> whole lot of things work which may be broken already.
> Thread::current(), arena allocation etc.
>
> ===
>
> http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/runtime/os.cpp.udiff.html
>
> I do not understand why the compressed-oop and non-compressed-oop case
> are so different?
>
> In the latter case we do vmErrorHelper::oop_or_null(addr); so we
> handle pointer into objects. In the former case we dont. Why?
>
> +      st->print(UINT32_FORMAT " is a compressed pointer to object: ",
> narrow_oop);
>
> I would probably print out the decompressed oop too, before calling
> oop->print_on.
>
> 1127   if (UseCompressedClassPointers && ((uintptr_t)addr &~
> (uintptr_t)max_juint) == 0) {
>
> We print klass in case the addr is a compressed klass pointer, but I
> did not find a corresponding function if addr is just an uncompressed
> klass pointer?
>
>
> ========================================
>
> Bottomline, this all feels a bit on the heavy side for just debug
> printing. I would generally prefer all that code being DEBUG_ONLY, if
> only for readibility. Also, I think in many cases we could do less
> in-depth checking (e.g. walking the CLDG) and instead do some generic
> sanity checks for a given value which claims to be a valid
> pointer/oop/MetaWord* etc.

Thinking this thru again, I realize I made a thinking error. The point
of the is_valid_<xxx>() functions is not to establish that a value
which should be an <xxx> can be in fact safely accessed. The point is
to determine whether a completely unknown random number plucked from a
register is a <xxx>.

So, this means e.g. for Metadata, you need to actually iterate the
metadata nodes. So forget my paragraph above, it wont work.

>
> E.g. just a simple
>
> vmErrorHelper::looks_like_a_reasonable_(heap|metaspace|whatever)_pointer(addr) {
>  - check for NULL
>  - check sensible lower and upper limits
>  - check valid alignment
> }
>
> One even could maintain outer limits in the components: e.g. it would
> not be hard to keep track of the highest and lowest ever-allocated
> ClassLoaderData*. That way, is_valid_classloaderdata() could just do a
> range check between these areas and probably catch 98% of all cases
> with garbage input. That way, we would not have to walk data
> structures in error handling. (Still I would combine this always with
> "is_readable_range").
>
> We do not aim for 100%, just reasonable enough heuristics. Most of the
> invalid-data cases are not almost-correct-pointers but complete
> garbage, like NULL or very small or large values.
>
> Just my 5 cents
>
> Cheers,
>
> Thomas
>
>
> On Fri, Sep 14, 2018 at 4:03 PM, Doerr, Martin <martin.doerr at sap.com> wrote:
>> Hi,
>>
>> I'd like to make os::print_location more reliable which is used in error reporting step "printing register info". Oops and Klasses should get inspected more carefully.
>> I have seen errors like "[error occurred during error reporting (printing register info), id 0xe0000000, Internal Error (/usr/work/d056149/openjdk/jdk/src/hotspot/share/oops/klass.inline.hpp:63)]" in many hs_err files.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8210754
>>
>> Webrev:
>> http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/
>>
>> Sometimes, I get such errors when using -XX:+CrashGCForDumpingJavaThread, sometimes when injecting crashing code into compiled methods which I did by the following code:
>> http://cr.openjdk.java.net/~mdoerr/crash_C2_method/webrev.00/
>> I can also contribute this if it's desired. Automatic tests would certainly be nice to have. Maybe I can find some time for that.
>>
>> Please review.
>>
>> Best regards,
>> Martin
>>

..

..Thomas


More information about the hotspot-runtime-dev mailing list