RFR(M): 8210754: print_location is not reliable enough (printing register info)
Thomas Stüfe
thomas.stuefe at gmail.com
Mon Sep 24 14:26:54 UTC 2018
Hi Martin,
Thank you, that looks like a lot of work!
Some remarks:
http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/classfile/classLoaderData.cpp.udiff.html
Can we put all that analysis stuff - e.g.
ClassLoaderDataGraph::is_valid - into DEBUG_ONLY ?
+ if (loader_data == ClassLoaderData::the_null_class_loader_data()) {
+ return true;
+ }
Init dependency here. the_null_class_loader_data() can be NULL early
on, so better test for NULL too.
==========
http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/memory/metaspace.cpp.udiff.html
http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/memory/metaspace.hpp.udiff.html
I feel these functions could be better put into an own file. We
slimmed down metaspace.cpp a lot and removed most into own files.
Either metaDebug.hpp/cpp or a new file under metaspace/...
--
I am not happy with exposing VirtualSpaceNode via metaspace.hpp. I
feel that is not necessary to solve to problem you have. All which is
needed, from the outside, are:
MetaspaceUtils::is_valid(ptr)
MetaspaceUtils::is_valid_range(from, to)
I even would avoid "_in_committed" as a wording, just use "_is_valid",
since this presumes less of the underlying implementation.
I also would make those two functions public, but DEBUG_ONLY, and
remove the vmErrorhelper friendship, but that is just my personal
taste.
--
In all case you could weed out obvious bogus values before actually
iterating the CLDG:
MetaspaceUtils::is_in_committed(void* p)
if (p != NULL)
or even
if (p < (reasonable low value)
may catch a whole range of cases where we have invalid pointers.
In fact, a function like this:
bool looks_like_a_real_pointer(void* p) {
}
could already weed out 80% of all cases, by checking reasonable low
and high limits and checking for alignment.
---
Do we really need the range check? How probable is it that
MetaspaceUtils::is_in_committed(ptr1) and
MetaspaceUtils::is_in_committed(ptr2) are ok but the range is invalid?
(It can happen, sure, but how probable is this?)
---
I cannot comment on the CDS sections. Looks fine to me, but I'm no expert here.
---
http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/memory/metaspace/virtualSpaceList.hpp.udiff.html
I would not expose a "find_enclosing_space" but instead one - possibly
two - functions like this:
VistualSpaceList::is_valid(ptr)
(if we need range checking): VistualSpaceList::is_valid_range(from, to)
And all the way down to VistualSpaceNode.
Reason is that I would not expose functionality to the outside which
is not needed.
==========
http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/utilities/vmErrorHelper.hpp.html
http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/utilities/vmErrorHelper.cpp.html
Style nit: vmErrorHelper -> VMErrorHelper. Or maybe VMErrorUtils?
---
Style: for brevity, I wonder whether we can do away with all the
"_in_committed" and just assume that is implied?
e.g.:
is_range_in_committed_java_heap -> is_range_in_java_heap
or even shorter "is_in_java_heap" since two argument pointers imply we
do range checking.
Same argument goes for the metaspace functions too.
--
46 bool vmErrorHelper::is_range_in_committed_java_heap(const void*
from, const void* to) {
47 for (uintptr_t i = (uintptr_t)from; i != (uintptr_t)to; ++i) {
48 if (!Universe::heap()->is_in((const void*) i)) {
49 return false;
50 }
51 }
52
53 return true;
54 }
Why the cast to uintptr_t?
Also, why bytewise iteration? Why not pagewise like in
vmErrorHelper::is_readable_range?
---
vmErrorHelper::is_range_in_committed_c_heap()
This function name is a blatant lie :-) - to the intent is to know if
a region is readable but not one of the known big JVM regions. I would
split this function in two more basic operations:
vmErrorHelper::is_known_range() - which would return true if a given
range is either heap, metaspace, ...
and the already existing is_readable_range().
Also, I see we use this only in is_valid_symbol(). Is it really
necessary to exclude the remote possibility that a given range is
valid but points to java heap or metaspace?
I would change is_valid_symbol() to:
- is symbol obvious garbage pointer?
- is [symbol, symbol+sizeof(Symbol)) readable?
- extract length. Is length obvious garbage? <0 or very large?
- is byte array readable?
- and if you are really thorough, check symbol byte array for
non-printable chars or zeros.
I would omit a metaspace-contains check or a java-heap-contains check
just to exclude the remote possibility that the range points there.
When in doubt, we may add a "safe_print" function to print out a
suspect char array like the one we extract from a suspect Symbol*,
where we could check each char for "is_ascii" before printing. But
this may be overdoing it.
---
71 void* vmErrorHelper::load_klass_raw(oop obj) {
Name is misleading. How about "get_klass_pointer()" os similar?
Just dereferencing the given oop without prior checks?
---
81 void* vmErrorHelper::load_oop_raw(oop obj, int offset) {
same remarks here.
---
bool vmErrorHelper::is_valid_klass(Klass* k) {
- better entry checks.
- alignment checks for sizeof(MetaWord), not for intptr_t
- You can loose the "is_valid_symbol". Symbols live in C heap anyway.
- I feel that the CLDG walk is overdoing it a bit. If the given
pointer points to metaspace, I would say chance is high that thing is
valid.
---
133 bool vmErrorHelper::is_valid_oop(oop obj) {
- why not using is_valid_klass? to check the extracted klass pointer?
---
oop vmErrorHelper::oop_or_null(address addr) {
HeapWord* p = Universe::heap()->block_start(addr);
// If we couldn't find it it just may mean that heap wasn't parsable
// See if we were just given an oop directly
if (p != NULL && Universe::heap()->block_is_obj(p)) {
return oop(p);
} else if (p == NULL && is_valid_oop(oop(addr))) {
return oop(addr);
}
return NULL;
}
Can you move the comment into the else branch please?
Also, a short comment would be helpful describing what this function
really does (true for the other functions too).
e.g.:
"Given an address which may point at or into a java object, returns
the oop for the java object. NULL otherwise."
Would it make sense to revert the order of the check? First
is_valid_oop, then heap()->block_start()? Since more frequently one
has an oop, not an address into an object?
---
166 void vmErrorHelper::dump_code_blob(address addr, CodeBlob* b,
outputStream* st, bool verbose) {
I really do not like the use of ResourceMarks here! This assumes a
whole lot of things work which may be broken already.
Thread::current(), arena allocation etc.
===
http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/src/hotspot/share/runtime/os.cpp.udiff.html
I do not understand why the compressed-oop and non-compressed-oop case
are so different?
In the latter case we do vmErrorHelper::oop_or_null(addr); so we
handle pointer into objects. In the former case we dont. Why?
+ st->print(UINT32_FORMAT " is a compressed pointer to object: ",
narrow_oop);
I would probably print out the decompressed oop too, before calling
oop->print_on.
1127 if (UseCompressedClassPointers && ((uintptr_t)addr &~
(uintptr_t)max_juint) == 0) {
We print klass in case the addr is a compressed klass pointer, but I
did not find a corresponding function if addr is just an uncompressed
klass pointer?
========================================
Bottomline, this all feels a bit on the heavy side for just debug
printing. I would generally prefer all that code being DEBUG_ONLY, if
only for readibility. Also, I think in many cases we could do less
in-depth checking (e.g. walking the CLDG) and instead do some generic
sanity checks for a given value which claims to be a valid
pointer/oop/MetaWord* etc.
E.g. just a simple
vmErrorHelper::looks_like_a_reasonable_(heap|metaspace|whatever)_pointer(addr) {
- check for NULL
- check sensible lower and upper limits
- check valid alignment
}
One even could maintain outer limits in the components: e.g. it would
not be hard to keep track of the highest and lowest ever-allocated
ClassLoaderData*. That way, is_valid_classloaderdata() could just do a
range check between these areas and probably catch 98% of all cases
with garbage input. That way, we would not have to walk data
structures in error handling. (Still I would combine this always with
"is_readable_range").
We do not aim for 100%, just reasonable enough heuristics. Most of the
invalid-data cases are not almost-correct-pointers but complete
garbage, like NULL or very small or large values.
Just my 5 cents
Cheers,
Thomas
On Fri, Sep 14, 2018 at 4:03 PM, Doerr, Martin <martin.doerr at sap.com> wrote:
> Hi,
>
> I'd like to make os::print_location more reliable which is used in error reporting step "printing register info". Oops and Klasses should get inspected more carefully.
> I have seen errors like "[error occurred during error reporting (printing register info), id 0xe0000000, Internal Error (/usr/work/d056149/openjdk/jdk/src/hotspot/share/oops/klass.inline.hpp:63)]" in many hs_err files.
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8210754
>
> Webrev:
> http://cr.openjdk.java.net/~mdoerr/8210754_print_location/webrev.00/
>
> Sometimes, I get such errors when using -XX:+CrashGCForDumpingJavaThread, sometimes when injecting crashing code into compiled methods which I did by the following code:
> http://cr.openjdk.java.net/~mdoerr/crash_C2_method/webrev.00/
> I can also contribute this if it's desired. Automatic tests would certainly be nice to have. Maybe I can find some time for that.
>
> Please review.
>
> Best regards,
> Martin
>
More information about the hotspot-runtime-dev
mailing list