RFR(xs): (aix but affects shared code too) 8186665: buffer overflow in Java_java_nio_MappedByteBuffer_isLoaded0
Thomas Stüfe
thomas.stuefe at gmail.com
Thu Oct 19 13:21:58 UTC 2017
Hi Peter, Christoph,
if you have no objections, I'd like to push this change. As I explained in
my earlier mail, I'd prefer not to change MappedByteBuffer::load(); if you
are fine with the change in its current form
(http://cr.openjdk.java.net/~stuefe/webrevs/8186665-buffer-overflow-in-mincore/webrev.02/webrev/),
I will go ahead and push it.
Thanks, Thomas
On Wed, Oct 18, 2017 at 12:24 PM, Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:
> Hi Peter, Christoph,
>
> Thank you both for reviewing.
>
> New webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8186665-buffer-overflow-in-mincore/webrev.02/webrev/
>
> @Peter:
>
> >Shouldn't the following line:
> >
> > 47 return len2 + pagesize - 1 / pagesize;
> >
> >..be written as:
> >
> > return (len2 + pagesize - 1) / pagesize;
>
> You are right. The missing parentheses did not cause an error, but they made
> the mincore output buffer unnecessarily large. Thanks for catching this.
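>
> (Just to spell out the precedence issue - a tiny standalone example, not the
> webrev code; the names len2 and pagesize are borrowed from the excerpt above,
> the values are made up:)
>
>     #include <stdio.h>
>
>     int main(void) {
>         size_t len2 = 1024 * 1024;     /* length of the range, in bytes */
>         size_t pagesize = 4096;
>
>         /* "1 / pagesize" is integer division and evaluates to 0, so the
>          * unparenthesized form degenerates to len2 + pagesize - a byte
>          * count, not a page count. */
>         size_t wrong = len2 + pagesize - 1 / pagesize;
>
>         /* With the parentheses it is the intended round-up division. */
>         size_t right = (len2 + pagesize - 1) / pagesize;
>
>         printf("wrong: %zu, right: %zu\n", wrong, right);  /* 1052672 vs. 256 */
>         return 0;
>     }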
>
> As for your other concern:
>
>
> On Wed, Oct 18, 2017 at 10:32 AM, Peter Levart <peter.levart at gmail.com>
> wrote:
>
>> --
>> In Java_java_nio_MappedByteBuffer_isLoaded0, we call mincore(2) to
>> retrieve the paging status of an address range.
>>
>> The size of the output buffer for mincore(2) depends on the number of pages
>> - in units of the *system page size* - in the given memory range (this is
>> spelled out more or less explicitly on AIX and Linux; nothing is said for
>> BSD/OSX, but I assume the same). The number of pages in the memory range is
>> calculated by
>> MappedByteBuffer.isLoaded() and handed down to
>> Java_java_nio_MappedByteBuffer_isLoaded0() together with the memory
>> range to test.
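>>
>> (For illustration, a minimal standalone sketch of that sizing rule - this is
>> not the JDK or webrev code, and the function name count_resident_pages is
>> made up. The point on AIX is that pagesize here must be the 4K system page
>> size, not the 64K value the VM reports elsewhere.)
>>
>>     #include <stdint.h>
>>     #include <stdlib.h>
>>     #include <unistd.h>
>>     #include <sys/mman.h>
>>
>>     /* Returns the number of resident pages in [address, address + len),
>>      * or -1 on error. The mincore(2) output vector holds one byte per page
>>      * of the *system* page size, and the start address is aligned down to
>>      * a page boundary first, as mincore() requires. */
>>     long count_resident_pages(void *address, size_t len) {
>>         size_t pagesize   = (size_t) sysconf(_SC_PAGESIZE);
>>         uintptr_t addr    = (uintptr_t) address;
>>         uintptr_t aligned = addr & ~(pagesize - 1);
>>         size_t len2       = len + (addr - aligned);
>>         size_t numpages   = (len2 + pagesize - 1) / pagesize;
>>
>>         /* The vector element type differs between platforms (unsigned char
>>          * on Linux, char on others), hence the void pointer. */
>>         void *vec = calloc(numpages, 1);
>>         if (vec == NULL)
>>             return -1;
>>         long resident = -1;
>>         if (mincore((void *) aligned, len2, vec) == 0) {
>>             resident = 0;
>>             for (size_t i = 0; i < numpages; i++)
>>                 resident += ((unsigned char *) vec)[i] & 0x1;
>>         }
>>         free(vec);
>>         return resident;
>>     }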
>>
>> MappedByteBuffer.isLoaded() calculates this number of pages based on
>> jdk.internal.misc.Unsafe.pageSize(), which ultimately comes down to
>> os::vm_page_size().
>>
>> For AIX, os::vm_page_size() may return a page size larger than the system
>> page size of 4K. The reason is that on AIX memory can be backed by
>> different page sizes, usually either 4K or 64K - e.g. POSIX thread stacks
>> may use 4K pages, the Java heap (System V shared memory) 64K pages - but
>> mmap memory is always backed by 4K pages...
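>>
>> (To put numbers on the resulting mismatch - illustrative values only, the
>> variable names are made up:)
>>
>>     #include <stdio.h>
>>
>>     int main(void) {
>>         size_t length   = 1024 * 1024; /* a 1 MB mapped region            */
>>         size_t reported = 64 * 1024;   /* page size the VM reports on AIX */
>>         size_t real     = 4 * 1024;    /* page size backing mmap memory   */
>>
>>         /* Vector entries sized from the page count the Java side hands
>>          * down (based on the reported 64K page size): */
>>         size_t allocated = (length + reported - 1) / reported;  /*  16 */
>>
>>         /* Bytes mincore(2) actually writes, one per real 4K page: */
>>         size_t written   = (length + real - 1) / real;          /* 256 */
>>
>>         printf("allocated %zu entries, kernel writes %zu bytes\n",
>>                allocated, written);
>>         return 0;
>>     }
>>
>> So the kernel writes 16 times more bytes than the vector holds - the buffer
>> overflow from the subject line.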
>>
>>
>> If this is true and Unsafe.pageSize() returns a value > 4K, then perhaps
>> also the MappedByteBuffer.load() method is wrong for AIX?
>>
>>     public final MappedByteBuffer load() {
>>         checkMapped();
>>         if ((address == 0) || (capacity() == 0))
>>             return this;
>>         long offset = mappingOffset();
>>         long length = mappingLength(offset);
>>         load0(mappingAddress(offset), length);
>>
>>         // Read a byte from each page to bring it into memory. A checksum
>>         // is computed as we go along to prevent the compiler from otherwise
>>         // considering the loop as dead code.
>>         Unsafe unsafe = Unsafe.getUnsafe();
>>         int ps = Bits.pageSize();              // << LOOK HERE
>>         int count = Bits.pageCount(length);
>>         long a = mappingAddress(offset);
>>         byte x = 0;
>>         for (int i=0; i<count; i++) {
>>             x ^= unsafe.getByte(a);
>>             a += ps;                           // << AND HERE
>>         }
>>         if (unused != 0)
>>             unused = x;
>>
>>         return this;
>>     }
>>
>> ...this loop reads a byte from the start of each block in lumps of
>> Bits.pageSize(). Should it always read in lumps of 4K for AIX? Do we rather
>> need a special Unsafe.mmappedPageSize() method instead of just a hack in
>> isLoaded0?
>>
>>
> Yes, I considered this too. In effect, on AIX we only touch every 16th page
> (64K reported page size / 4K real page size), thereby reducing
> MappedByteBuffer::load() to something like load_every_16th_page... However,
> this bug is very old (even our 1.4 VM already did this, back when the
> touching was still implemented in MappedByteBuffer.c) and it has not caused
> any problems AFAIK.
>
> The story behind this is long and quite boring :) Basically, 64K pages are
> used for the Java heap and give a large performance bonus over a 4K-paged
> Java heap. But we cannot switch all memory regions to 64K pages, so we live
> with multiple page sizes, and above us there is a ton of code which assumes
> one consistent page size for everything. So we lie about the page size to
> everyone - claiming the system page size to be 64K - and, except for very
> rare cases like this one, we get away with it.
>
> I would like to keep lying consistently. There is no hard reason for it; I
> am just afraid that starting to publish a different page size to parts of
> the VM will confuse things and may introduce errors further down the line.
>
> I think a proper solution would be to keep the page size on a per-ByteBuffer
> basis, because ByteBuffers may be allocated in different memory regions -
> today they are allocated with mmap() in system page size, but that may
> change in the future. That is assuming that one byte buffer cannot span
> areas of multiple page sizes, which would complicate matters further.
>
> Btw, I also wondered whether other platforms could have a clash between the
> real memory page size and MappedByteBuffer's notion of that size - e.g.
> whether it is possible to have MappedByteBuffers backed by huge pages on
> Linux. But in all cases I could think of, the page size the JDK would assume
> is smaller than the actual page size, which is not a problem for either
> mincore or load/touch. In the above example (huge pages on Linux), the pages
> would be pinned anyway, so load() and isLoaded() would be no-ops.
>
>
> @Christoph:
>
> > apart from the point that Peter found, I'd also think it would look
> > better if the typedef section (lines 51-56) were placed before the
> > AIX-only function (lines 41-49).
>
>
> Sure. I moved the section up, below the includes.
>
> Kind Regards, Thomas
>
>