Re: RFR(xs): (aix but affects shared code too) 8186665: buffer overflow in Java_java_nio_MappedByteBuffer_isLoaded0
Hi all,

I am wrapping up fixes which did not make it into the repo before the consolidation. Alan already reviewed the last webrev, but I need a second reviewer.

Issue: https://bugs.openjdk.java.net/browse/JDK-8186665
Last webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8186665-buffer-overflow-in-mincore/webrev.01/webrev

Issue text for your convenience:

--
In Java_java_nio_MappedByteBuffer_isLoaded0, we call mincore(2) to retrieve the paging status of an address range. The size of the output buffer for mincore(2) depends on the number of *system-page-size* pages in the given memory range (this is spelled out more or less explicitly for AIX and Linux; nothing is said for BSD/OSX, but I assume the same). The number of pages in the memory range is calculated by MappedByteBuffer.isLoaded() and handed down to Java_java_nio_MappedByteBuffer_isLoaded0() together with the memory range to test.

MappedByteBuffer.isLoaded() calculates this number of pages based on jdk.internal.misc.Unsafe.pageSize(), which ultimately comes down to os::vm_page_size().

On AIX, os::vm_page_size() may return a page size larger than the 4K system page size. The reason is that on AIX, memory can be backed by different page sizes, usually either 4K or 64K - e.g. POSIX thread stacks may have 4K pages and the Java heap (System V shared memory) 64K pages, but mmap memory is always backed by 4K pages.

Since the OpenJDK code base generally assumes one homogeneous page size for everything - usually synonymous with os::vm_page_size() - a decision had to be made which page size to report as the global system page size, and this may be larger than the 4K system page size mincore(2) assumes.

This is usually no problem, but with mincore(2) it is: since the size of the output buffer depends on the number of pages, calculating with a too-large page size makes the output buffer too small, and hence the buffer overflows. The solution must be to base the size of the mincore output buffer on the system page size.
--

Thanks and Kind Regards,
Thomas

On Thu, Aug 31, 2017 at 9:39 PM, Alan Bateman <Alan.Bateman@oracle.com> wrote:
On 31/08/2017 19:01, Thomas Stüfe wrote:
Hi Alan,
thank you for your review!
Updated webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8186665-buffer-overflow-in-mincore/webrev.01/webrev/
I fixed the indentation and embellished the comments around the sentinel at the end of the buffer somewhat.
This looks okay to me.
-Alan
Hi Thomas,

Shouldn't the following line:

47 return len2 + pagesize - 1 / pagesize;

..be written as:

return (len2 + pagesize - 1) / pagesize;

Regards,
Peter

On 10/18/2017 09:44 AM, Thomas Stüfe wrote:
[...]
On 10/18/2017 10:14 AM, Peter Levart wrote:
Hi Thomas,
Shouldn't the following line:
47 return len2 + pagesize - 1 / pagesize;
..be written as:
return (len2 + pagesize - 1) / pagesize;
...or better yet, as:

return numPages;

(you already calculate it correctly in numPages on the previous line, but then return a separately computed - and wrong - expression).

Regards,
Peter
[...]
Hi Thomas,

apart from the point that Peter found, I'd also think it would look better if the typedef section (line 51-56) would be placed before the AIX only function (line 41-49).

Other than that, it looks good to me.

Best regards
Christoph

From: nio-dev [mailto:nio-dev-bounces@openjdk.java.net] On Behalf Of Peter Levart
Sent: Mittwoch, 18. Oktober 2017 10:17
To: Thomas Stüfe <thomas.stuefe@gmail.com>; Alan Bateman <Alan.Bateman@oracle.com>
Cc: nio-dev@openjdk.java.net; ppc-aix-port-dev@openjdk.java.net; Java Core Libs <core-libs-dev@openjdk.java.net>
Subject: Re: RFR(xs): (aix but affects shared code too) 8186665: buffer overflow in Java_java_nio_MappedByteBuffer_isLoaded0

[...]
Hi Thomas,

Just one more concern...

On 10/18/2017 09:44 AM, Thomas Stüfe wrote:
[...]
For AIX, os::vm_page_size() may return a page size larger than the system page size of 4K. The reason for this is that on AIX, memory can be backed by different page sizes, usually either 4K or 64K - e.g. posix thread stacks may have 4K pages, java heap (system V shared memory) with 64K pages, but mmap memory is always 4K page backed...
If this is true and Unsafe.pageSize() returns a value > 4K, then perhaps also the MappedByteBuffer.load() method is wrong for AIX?

    public final MappedByteBuffer load() {
        checkMapped();
        if ((address == 0) || (capacity() == 0))
            return this;
        long offset = mappingOffset();
        long length = mappingLength(offset);
        load0(mappingAddress(offset), length);

        // Read a byte from each page to bring it into memory. A checksum
        // is computed as we go along to prevent the compiler from otherwise
        // considering the loop as dead code.
        Unsafe unsafe = Unsafe.getUnsafe();
        int ps = Bits.pageSize();            // << LOOK HERE
        int count = Bits.pageCount(length);
        long a = mappingAddress(offset);
        byte x = 0;
        for (int i=0; i<count; i++) {
            x ^= unsafe.getByte(a);
            a += ps;                         // << AND HERE
        }
        if (unused != 0)
            unused = x;
        return this;
    }

...this loop reads a byte from the start of each block in lumps of Bits.pageSize(). Should it always read in lumps of 4K for AIX? Do we rather need a special Unsafe.mmappedPageSize() method instead of just a hack in isLoaded0?

Regards,
Peter
Hi Peter, Christoph,

Thank you both for reviewing.

New webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8186665-buffer-overflow-in-mincor...

@Peter:
Shouldn't the following line:
47 return len2 + pagesize - 1 / pagesize;
..be written as:
return (len2 + pagesize - 1) / pagesize;
You are right. The error was benign - since 1 / pagesize is 0 in integer arithmetic, it only made the mincore output buffer unnecessarily large - but thanks for catching this.

As for your other concern:

On Wed, Oct 18, 2017 at 10:32 AM, Peter Levart <peter.levart@gmail.com> wrote:
[...]
If this is true and Unsafe.pagesize() returns a value > 4K, then perhaps also the MappedByteBuffer.load() method is wrong for AIX?
[...]
...this loop reads a byte from the start of each block in lumps of Bits.pageSize(). Should it always read in lumps of 4K for AIX? Do we rather need a special Unsafe.mmappedPageSize() method instead of just a hack in isLoaded0 ?
Yes, I considered this too. In effect, on AIX, we only touch every 16th page, thereby reducing MappedByteBuffer::load() to something like load_every_16th_page... However, this bug is very old (even our 1.4 VM already did this, back when the touching was still implemented in MappedByteBuffer.c) and has not caused any problems AFAIK.

The story behind this is long and quite boring :) Basically, 64K pages are used for the Java heap and give a large performance bonus over a 4K-paged Java heap. But we cannot switch all memory regions to 64K pages, so we live with multiple page sizes, and above us we have a ton of code which assumes one consistent page size for everything. So we lie about the page size to everyone - claiming the system page size to be 64K - and, except for very rare cases like this one, we get away with it.

I would like to keep lying consistently. There is no hard reason for it, just that I am afraid that starting to publish a different page size to parts of the VM would confuse things and might introduce errors further down the line.

I think a proper solution would be to keep the page size on a per-ByteBuffer basis, because ByteBuffers may be allocated in different memory regions - they are now allocated with mmap() in system page size, but that may change in the future. That assumes that one byte buffer cannot span areas of multiple page sizes, which would complicate matters further.

Btw, I also wondered whether other platforms could have a clash between the real memory page size and MappedByteBuffer's notion of that size - e.g. whether it is possible to have MappedByteBuffers with huge pages on Linux. But all cases I could think of are cases where the page size the JDK would assume is smaller than the actual page size, which would not be a problem for either mincore or load/touch. In the above example (huge pages on Linux), pages would be pinned anyway, so load() and isLoaded() would be no-ops.

@Christoph:
apart from the point that Peter found, I’d also think it would look better if the typedef section (line 51-56) would be placed before the AIX only function (line 41-49).
Sure. I moved the section up, below the includes.

Kind Regards,
Thomas
Hi Peter, Christoph,

if you have no objections, I'd like to push this change. As I explained in my earlier mail, I'd prefer not to change MappedByteBuffer::load(), and if you are fine with the change in its current form (http://cr.openjdk.java.net/~stuefe/webrevs/8186665-buffer-overflow-in-mincore/webrev.02/webrev/), I'd like to push it.

Thanks, Thomas

On Wed, Oct 18, 2017 at 12:24 PM, Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
[...]
Hi Thomas,

ok from my end.

Best regards
Christoph

From: Thomas Stüfe [mailto:thomas.stuefe@gmail.com]
Sent: Donnerstag, 19. Oktober 2017 15:22
To: Peter Levart <peter.levart@gmail.com>; Langer, Christoph <christoph.langer@sap.com>
Cc: Alan Bateman <Alan.Bateman@oracle.com>; nio-dev@openjdk.java.net; ppc-aix-port-dev@openjdk.java.net; Java Core Libs <core-libs-dev@openjdk.java.net>
Subject: Re: RFR(xs): (aix but affects shared code too) 8186665: buffer overflow in Java_java_nio_MappedByteBuffer_isLoaded0

[...]
Thanks Christoph!

On Thu, Oct 19, 2017 at 3:22 PM, Langer, Christoph <christoph.langer@sap.com> wrote:
Hi Thomas,
ok from my end.
Best regards
Christoph
From: Thomas Stüfe [mailto:thomas.stuefe@gmail.com]
Sent: Thursday, October 19, 2017 15:22
To: Peter Levart <peter.levart@gmail.com>; Langer, Christoph <christoph.langer@sap.com>
Cc: Alan Bateman <Alan.Bateman@oracle.com>; nio-dev@openjdk.java.net; ppc-aix-port-dev@openjdk.java.net; Java Core Libs <core-libs-dev@openjdk.java.net>
Subject: Re: RFR(xs): (aix but affects shared code too) 8186665: buffer overflow in Java_java_nio_MappedByteBuffer_isLoaded0
Hi Peter, Christoph,
if you have no objections, I'd like to push this change. As I explained in my earlier mail, I'd prefer not to change MappedByteBuffer::load(), and if you are fine with the change in its current form (http://cr.openjdk.java.net/~stuefe/webrevs/8186665-buffer-overflow-in-mincore/webrev.02/webrev/), I'd like to push it.
Thanks, Thomas
On Wed, Oct 18, 2017 at 12:24 PM, Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi Peter, Christoph,
Thank you both for reviewing.
New webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8186665-buffer-overflow-in-mincore/webrev.02/webrev/
@Peter:
Shouldn't the following line:
47 return len2 + pagesize - 1 / pagesize;
..be written as:
return (len2 + pagesize - 1) / pagesize;
You are right. The missing parentheses did not cause an overflow, but they made the mincore output buffer unnecessarily large. Thanks for catching this.
As for your other concern:
On Wed, Oct 18, 2017 at 10:32 AM, Peter Levart <peter.levart@gmail.com> wrote:
-- In Java_java_nio_MappedByteBuffer_isLoaded0, we call mincore(2) to retrieve the paging status of an address range.
The size of the output buffer for mincore(2) depends on the number of pages in *system page size* in the given memory range (this is spelled out more or less explicitly on AIX and Linux, nothing is said on BSD/OSX, but I assume the same). The number of pages in the memory range is calculated by MappedByteBuffer.isLoaded() and handed down to Java_java_nio_MappedByteBuffer_isLoaded0() together with the memory range to test.
MappedByteBuffer.isLoaded() calculates this number of pages based on jdk.internal.misc.Unsafe.pageSize(), which ultimately comes down to os::vm_page_size().
For AIX, os::vm_page_size() may return a page size larger than the system page size of 4K. The reason for this is that on AIX, memory can be backed by different page sizes, usually either 4K or 64K - e.g. posix thread stacks may have 4K pages, java heap (system V shared memory) with 64K pages, but mmap memory is always 4K page backed...
If this is true and Unsafe.pagesize() returns a value > 4K, then perhaps also the MappedByteBuffer.load() method is wrong for AIX?
public final MappedByteBuffer load() {
    checkMapped();
    if ((address == 0) || (capacity() == 0))
        return this;
    long offset = mappingOffset();
    long length = mappingLength(offset);
    load0(mappingAddress(offset), length);

    // Read a byte from each page to bring it into memory. A checksum
    // is computed as we go along to prevent the compiler from otherwise
    // considering the loop as dead code.
    Unsafe unsafe = Unsafe.getUnsafe();
    int ps = Bits.pageSize();              // << LOOK HERE
    int count = Bits.pageCount(length);
    long a = mappingAddress(offset);
    byte x = 0;
    for (int i = 0; i < count; i++) {
        x ^= unsafe.getByte(a);
        a += ps;                           // << AND HERE
    }
    if (unused != 0)
        unused = x;

    return this;
}
...this loop reads a byte from the start of each block in lumps of Bits.pageSize(). Should it always read in lumps of 4K on AIX? Or do we rather need a special Unsafe.mmappedPageSize() method instead of just a hack in isLoaded0?
Yes, I considered this too. In effect, on AIX, we only touch every 16th page, thereby reducing MappedByteBuffer::load() to something like load_every_16th_page... However, this bug is very old (even our 1.4 VM already did this, back when the touching was still implemented in MappedByteBuffer.c) and it did not cause any problems, AFAIK.
The story behind this is long and quite boring :) Basically, 64K pages are used for the Java heap and give a large performance bonus over a 4K-paged Java heap. But we cannot switch all memory regions to 64K pages, so we live with multiple page sizes, and above us we have a ton of code which assumes one consistent page size for everything. So we lie about the page size to everyone - claiming the system page size to be 64K - and, except for very rare cases like this one, we get away with it.
I would like to keep lying consistently. There is no hard reason for it; I am just afraid that starting to publish a different page size to parts of the VM would confuse things and might introduce errors further down the line.
I think a proper solution would be to keep the page size on a per-ByteBuffer basis, because ByteBuffers may be allocated in different memory regions - today they are allocated with mmap() in system page size, but that may change in the future. That assumes that one byte buffer cannot span areas of multiple page sizes, which would complicate matters further.
Btw, I also wondered whether other platforms could have a clash between the real memory page size and MappedByteBuffer's notion of that size - e.g. whether it is possible to have MappedByteBuffers with huge pages on Linux. But all cases I could think of are cases where the page size the JDK would assume is smaller than the actual page size, which would not be a problem for either mincore or load/touch. In the above example (huge pages on Linux), pages would be pinned anyway, so load() and isLoaded() would be no-ops.
@Christoph:
apart from the point that Peter found, I’d also think it would look better if the typedef section (line 51-56) would be placed before the AIX only function (line 41-49).
Sure. I moved the section up, below the includes.
Kind Regards, Thomas
Hi Thomas,

Right. I can understand the situation and the potential problems of introducing an API for mmappedPageSize. So let's keep pretending that the page size is uniform for all memory regions used by the VM and see where this brings us. The change fixes the problem, although in a hack-ish way. It will be good for now.

Regards, Peter
Thank you, Peter.

..Thomas
participants (3)
- Langer, Christoph
- Peter Levart
- Thomas Stüfe