RFR (M): 8129419: heapDumper.cpp: assert(length_in_bytes > 0) failed: nothing to copy

Mon Jan 18 12:58:08 UTC 2016

Andreas,

On 2016-01-18 15:26, Andreas Eriksson wrote:
> I believe I then have to cast it to size_t at 496, 497 and 498?

I hope not.

> I'm not sure how write works in regards to returning ssize_t vs a size_t
> length.

It's a tricky part.

The maximum number of bytes that could be actually written by write() in
one shot is implementation dependent and limited by fs (or VFS) driver.
Typically it is less than MAX_INT.

If len overflow it, write just return the number of bytes actually
written without setting an error.

> Maybe I should change it back to checking for return value < 0 (instead
> of only checking for OS_ERR), and then keep 'n' as ssize_t?

I would prefer this way.

-Dmitry

> 
> - Andreas
> 
> On 2016-01-18 13:11, Dmitry Samersoff wrote:
>> Андреас,
>>
>> 481: It's better to declare n as ssize_t and remove cast at 488
>>
>> Looks good for me!
>>
>> -Dmitry
>>
>>
>> On 2016-01-18 14:42, Andreas Eriksson wrote:
>>> Hi,
>>>
>>> New webrev:
>>> http://cr.openjdk.java.net/~aeriksso/8129419/webrev.02/hotspot/
>>>
>>> Write_internal handles writes less than len, and EINTR.
>>>
>>> EINTR is taken care of by os::write, but to use it I had to remove an
>>> assert from the solaris os::write, which checked that a JavaThread was
>>> in_native. Since heap dumping is done from the vm_thread (not a
>>> JavaThread) the assert crashed the VM.
>>>
>>> Regards,
>>> Andreas
>>>
>>> On 2016-01-13 13:39, Andreas Eriksson wrote:
>>>> Hi,
>>>>
>>>> On 2015-12-29 21:27, Dmitry Samersoff wrote:
>>>>> Andreas,
>>>>>
>>>>> Great work. All but write_internal looks good for me (see below).
>>>>>
>>>>>
>>>>> HprofReader.java:
>>>>>
>>>>> Nit -  length -= skipped; should be after if skipped == 0
>>>>>
>>>>> heapDumper.cpp:480
>>>>>
>>>>> 1. For windows you can do :
>>>>>       assert(len < (size_t)UINT_MAX, ... );
>>>>>       ::write( ..., (uint) (len & UINT_MAX));
>>>>>
>>>>> 2. You should check for n < len and if it is true,
>>>>> deal with errno. n = 0 is legitimate case,
>>>>> so assert is not necessary.
>>>> I only think n = 0 is valid if write is called with length 0, which
>>>> write_internal will never do.
>>>> Otherwise, according to posix, n will be -1 if an error occurred, or
>>>> greater than zero if some bytes were written. [1]
>>>> If some bytes but not all were written then the while(len > 0) loop
>>>> will make sure we try to write the rest of the bytes.
>>>> It looks like windows write should never return 0 when len != 0
>>>> either. [2]
>>>>
>>>> I should however handle the -1 result with errno EINTR, working on a
>>>> new webrev.
>>>>
>>>> Thanks,
>>>> Andreas
>>>>
>>>> [1]
>>>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
>>>> If write() is interrupted by a signal before it writes any data, it
>>>> shall return -1 with errno set to [EINTR].
>>>> If write() is interrupted by a signal after it successfully writes
>>>> some data, it shall return the number of bytes written.
>>>>
>>>> [2] https://msdn.microsoft.com/en-us/library/1570wh78(v=vs.80).aspx
>>>>
>>>>> -Dmitry
>>>>>
>>>>> On 2015-12-28 17:02, Andreas Eriksson wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Here is the webrev for type changes.
>>>>>> Top-repo:
>>>>>> http://cr.openjdk.java.net/~aeriksso/8129419/webrev.01/jdk9-hs-rt/
>>>>>> Hotspot:
>>>>>> http://cr.openjdk.java.net/~aeriksso/8129419/webrev.01/hotspot/
>>>>>>
>>>>>> The windows_x64 native write function uses a length of type uint, not
>>>>>> size_t. I added an ifdef for that case and handled it, but better
>>>>>> suggestions are welcome.
>>>>>>
>>>>>> Also found and fixed another problem in the hprof parser when
>>>>>> skipping
>>>>>> over bytes.
>>>>>>
>>>>>> I've not included the tests, they have OOM issues on several JPRT
>>>>>> hosts,
>>>>>> and until I've figured out a way to fix that I wont be pushing them.
>>>>>>
>>>>>> Thanks,
>>>>>> Andreas
>>>>>>
>>>>>> On 2015-12-14 19:34, Andreas Eriksson wrote:
>>>>>>> Hi Dmitry, thanks for looking at this.
>>>>>>>
>>>>>>>
>>>>>>> On 2015-12-14 18:30, Dmitry Samersoff wrote:
>>>>>>>> Andreas,
>>>>>>>>
>>>>>>>> Nice cleanup.
>>>>>>>>
>>>>>>>> Some generic comments below.
>>>>>>>>
>>>>>>>> 1. It would be excellent if you can split webrev into two parts,
>>>>>>>> one
>>>>>>>> part with types cleanup and other part with array truncation
>>>>>>>> related
>>>>>>>> changes or ever push these changes separately as it address
>>>>>>>> different
>>>>>>>> problems.
>>>>>>>>
>>>>>>>> Type cleanup could be reviewed quickly, but review of array
>>>>>>>> truncation
>>>>>>>> requires some time.
>>>>>>>>
>>>>>>>> (We already have two different CRs: JDK-8129419 for type cleanup
>>>>>>>> and
>>>>>>>> JDK-8144732 for array truncation)
>>>>>>> Yes, that's a good point.
>>>>>>>
>>>>>>>> 2.
>>>>>>>> it might be better to use size_t and jlong (or julong) across all
>>>>>>>> file
>>>>>>>> and get rid of all other types with a few exclusion.
>>>>>>> I'll see what I can do, and we can look at it closer once I've split
>>>>>>> the type changes into a separate webrev.
>>>>>>>
>>>>>>>> 3.
>>>>>>>> Each assert costs some time in nightly testing, so we probably
>>>>>>>> don't
>>>>>>>> need as much asserts.
>>>>>>> Alright.
>>>>>>>
>>>>>>>> 4. write_internal:
>>>>>>>>
>>>>>>>>      There are couple of cases where ::write writes less bytes than
>>>>>>>> requested and doesn't return -1.
>>>>>>>>      So we should check if ::write returns a value less that len
>>>>>>>> and if it
>>>>>>>> happens deal with errno - either repeat entire write
>>>>>>>> attempt(EINTR/EAGAIN) or abort writing.
>>>>>>> Actually, I meant to ask about that, but forgot:
>>>>>>> I tried using os::write, which handles EINTR/EAGAIN if necessary
>>>>>>> (depending on platform), but Solaris has an assert that fails:
>>>>>>> http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/file/5a42c1dde332/src/os/solaris/vm/os_solaris.cpp#l5692
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> It checks that os::write is called by a JavaThread that is
>>>>>>> in_native,
>>>>>>> which we are not, since we are in the VM_thread doing a
>>>>>>> safepoint_op.
>>>>>>> Does anyone know if that assert is really needed? It's only done for
>>>>>>> Solaris.
>>>>>>>
>>>>>>>> 5. we should handle zero-length array in
>>>>>>>> calculate_array_max_length -
>>>>>>>> it's a legitimate case
>>>>>>> Not sure what you mean here, I believe it does handle zero-length
>>>>>>> arrays.
>>>>>>> It should just fall through without taking any of the if-clauses.
>>>>>>> (Or do you mean that I should add a return immediately at the top if
>>>>>>> length is zero?)
>>>>>>>
>>>>>>>> 6. max_juint is less than SegmentedHeapDumpThreshold so
>>>>>>>> non-segmented
>>>>>>>> heapdump can't contain huge array and it's better to check for
>>>>>>>> segmented
>>>>>>>> dump before any other checks.
>>>>>>> Correct me if I'm wrong here, but SegmentedHeapDumpThreshold
>>>>>>> could in
>>>>>>> theory be set so that we have a non-segmented heap dump while still
>>>>>>> having huge arrays.
>>>>>>> Not sure if this is worth taking into account, since it most likely
>>>>>>> would lead to other problems as well, and the flag is develop
>>>>>>> only, so
>>>>>>> it can't happen in product builds.
>>>>>>> What do you think would be best to do here?
>>>>>>>
>>>>>>> - Andreas
>>>>>>>
>>>>>>>> -Dmitry
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2015-12-14 18:38, Andreas Eriksson wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Please review this fix for dumping of long arrays, and general
>>>>>>>>> cleanup
>>>>>>>>> of types in heapDumper.cpp.
>>>>>>>>>
>>>>>>>>> Problem:
>>>>>>>>> At several places in heapDumper.cpp overflows could happen when
>>>>>>>>> dumping
>>>>>>>>> long arrays.
>>>>>>>>> Also the hprof format uses an u4 as a record length field, but
>>>>>>>>> arrays
>>>>>>>>> can be longer than that (counted in bytes).
>>>>>>>>>
>>>>>>>>> Fix:
>>>>>>>>> Many types that were previously signed are changed to equivalent
>>>>>>>>> unsigned types and/or to a larger type to prevent overflow.
>>>>>>>>> The bulk of the change though is the addition of
>>>>>>>>> calculate_array_max_length, which for a given array returns the
>>>>>>>>> number
>>>>>>>>> of elements we can dump. That length is then used to truncate
>>>>>>>>> arrays
>>>>>>>>> that are too long.
>>>>>>>>> Whenever an array is truncated a warning is printed:
>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM warning: cannot dump array of
>>>>>>>>> type
>>>>>>>>> object[] with length 1,073,741,823; truncating to length
>>>>>>>>> 536,870,908
>>>>>>>>>
>>>>>>>>> Much of the rest of the change is moving functions needed by
>>>>>>>>> calculate_array_max_length to the DumpWriter or DumperSupport
>>>>>>>>> class so
>>>>>>>>> that they can be accessed.
>>>>>>>>>
>>>>>>>>> Added a test that relies on the hprof parser, which also had a
>>>>>>>>> couple of
>>>>>>>>> overflow problems (top repo changes).
>>>>>>>>> I've also run this change against a couple of other tests, but
>>>>>>>>> they are
>>>>>>>>> problematic in JPRT because they are using large heaps and lots of
>>>>>>>>> disk.
>>>>>>>>>
>>>>>>>>> Bug:
>>>>>>>>> 8129419: heapDumper.cpp: assert(length_in_bytes > 0) failed:
>>>>>>>>> nothing to
>>>>>>>>> copy
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8129419
>>>>>>>>>
>>>>>>>>> Also fixed in this change is the problems seen in these two bugs:
>>>>>>>>> 8133317: Integer overflow in heapDumper.cpp leads to corrupt HPROF
>>>>>>>>> dumps
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8133317
>>>>>>>>>
>>>>>>>>> 8144732: VM_HeapDumper hits assert with bad dump_len
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8144732
>>>>>>>>>
>>>>>>>>> Webrev:
>>>>>>>>> Top repo:
>>>>>>>>> http://cr.openjdk.java.net/~aeriksso/8129419/webrev.00/jdk9-hs-rt/
>>>>>>>>> Hotspot:
>>>>>>>>> http://cr.openjdk.java.net/~aeriksso/8129419/webrev.00/hotspot/
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Andreas
>>
> 

-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.