RFR (L) 8028541: Native Memory Tracking enhancement

Mon Jun 16 22:29:51 UTC 2014

Zhengyu,   Here is a patch with some of my comments about the code 
inline because I thought that was easier.  There are some suggestions, 
some requests for better comments and some proposed comments.  The code 
looks really  good.  I ran out of steam on the MallocSiteTable which is 
somewhat more complicated so hopefully others can also help review this.

Coleen


On 6/13/14, 10:02 AM, Zhengyu Gu wrote:
> Actually, I am working on this.
>
> It was caught in between outputStream's format check and Mikael's 
> macros, I need to merge up the repo for this.
>
> There are a lot more instances in memReporter.cpp.
>
> Thanks,
>
> -Zhengyu
>
> On 6/12/2014 10:43 PM, Coleen Phillimore wrote:
>>
>> Zhengyu,
>>
>> In nativeCallStack.cpp why do you need 
>> PRAGMA_FORMAT_MUTE_WARNINGS_FOR_GCC?  Can you change the print 
>> formats to keep gcc from complaining?
>>
>> Coleen
>>
>> On 6/12/14, 8:34 AM, Zhengyu Gu wrote:
>>> Coleen,
>>>
>>> Thanks for the review.
>>>
>>> On 6/11/2014 7:21 PM, Coleen Phillimore wrote:
>>>>
>>>> Hi Zhengyu,
>>>>
>>>> I have early feedback but I haven't looked at the new files yet. 
>>>> Nothing major.  I like the new code and I'm glad you could add NMT 
>>>> tracking to vmError reporting.
>>>>
>>>> http://cr.openjdk.java.net/~zgu/8028541/webrev.00/src/os/solaris/vm/os_solaris.cpp.udiff.html 
>>>>
>>>>
>>>> + bool os::unsetenv(const char* name) {
>>>> +  assert(name != NULL, "Null pointer");
>>>> +  return (::unsetenv(name) == 0);
>>>> + }
>>>> +
>>>>
>>>> Can you add this to os_posix.cpp instead?  Someday this duplicate 
>>>> code will be consolidated so it can start in posix even though it's 
>>>> not posix.
>>>>
>>> OK.  I assume that aix can also use os_posix functions, right?
>>>
>>>> http://cr.openjdk.java.net/~zgu/8028541/webrev.00/src/os/windows/vm/os_windows.cpp.udiff.html 
>>>>
>>>>
>>>> I thought the original comment was more descriptive.  Is it still a 
>>>> workaround?
>>>>
>>> I think we still need this workaround. Auto-merge overwrote 
>>> Christian's comment, I will restore.
>>>
>>>> http://cr.openjdk.java.net/~zgu/8028541/webrev.00/src/share/vm/runtime/arguments.cpp.html 
>>>>
>>>>
>>>> Can you factor lines 3596-3623 into a function like 
>>>> verify_nmt_flag?  Is it that you already know what nmt level is 
>>>> used from the launcher and are just checking consistency and 
>>>> initializing here?  These cascading if() code checking options gets 
>>>> too long if you're not careful.
>>>
>>> Will do.
>>>>
>>>> In memReporter.hpp
>>>>
>>>> A short comment before each class what they are for and how you 
>>>> would get to this code would be nice.  I think they correspond to 
>>>> NMT settings (summary vs. detail). Mem*DiffReporter reports the 
>>>> difference between the current memory tracked and a previous 
>>>> snapshot it appears.
>>> Will do.
>>>>
>>>>
>>>> In memReporter.cpp
>>>>
>>>> Lines 302-311, you could have two functions that does this same 
>>>> calculation for these values so that it doesn't get broken.  The 
>>>> same calculation appears 61 and 64.
>>>>
>>>> If you use size_t can you avoid eagerly scaling the amounts? 
>>>> amount_in_current_scale appears 66 times which is a lot. Can you 
>>>> only call the scaling when you print the amounts?
>>> I think that is what I did.  All calculations are done in byte 
>>> scale, only converts to current scale when checks amount > 0 in 
>>> current scale, and print the number.
>>>
>>>> In MemTracker.hpp
>>>>
>>>> Line 109, I think you can assert that the JVM is single threaded.
>>>>
>>> What's the reliable method to check JVM single thread mode? I tried 
>>> in early NMT implementation, but it was not reliable.
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>>
>>>> That's as far as I got today.  This looks good!
>>>>
>>>> Coleen
>>>>
>>>> On 5/22/14, 3:19 PM, Zhengyu Gu wrote:
>>>>> This is significant rework of native memory tracking introduced in 
>>>>> earlier releases.
>>>>>
>>>>> The goal of this enhancement is to improve scalability, from both 
>>>>> tracking memory and CPU usage perspectives, so it can scale well 
>>>>> with increased memory allocation in large applications.
>>>>>
>>>>> The enhancement is mainly focused on malloc memory tracking, whose 
>>>>> activities are several magnitude higher than virtual memory, and 
>>>>> was the main bottleneck in early implementation.
>>>>>
>>>>> Instead of using book keeping records for tracking malloc 
>>>>> activities, new implementation co-locates tracking data along side 
>>>>> with user data by using a prefixed header. The header size is 8 
>>>>> bytes on 32-bit systems and 16 bytes on 64-bit systems, which 
>>>>> ensure that user data also align properly.
>>>>>
>>>>> Virtual memory tracking still uses book keeping records, and 
>>>>> ThreadCritical lock is always acquired to alter the records and 
>>>>> related data structures.
>>>>>
>>>>> Summary tracking data is maintained in static data structures, via 
>>>>> atomic operations. Malloc detail tracking call stacks are 
>>>>> maintained in a lock free hashtable.
>>>>>
>>>>> The key improvements:
>>>>>   1. Up-to-date tracking report.
>>>>>   2. Detail tracking now shows multiple call frames. Number of 
>>>>> frames is compilation time decision, currently default to 4.
>>>>>   3. Malloc tracking is lock free.
>>>>>   4. Tracking summary is reported in hs_err file when native 
>>>>> memory tracking is enabled.
>>>>>   5. Query is faster, uses little memory and need a very little 
>>>>> process.
>>>>>
>>>>> The drawback is that, malloc tracking header is always needed if 
>>>>> native memory tracking has ever been enabled, even after tracking 
>>>>> is shutdown.
>>>>>
>>>>> Impacts:
>>>>>   The most noticeable impact for JVM developers, is that Arena now 
>>>>> also take memory type as constructor parameter, besides the new 
>>>>> operators.
>>>>>      Arena* a = new (mtCode) Arena()  => Arena* a = new (mtCode) 
>>>>> Arena(mtCode)
>>>>>
>>>>> The webrev shows modification of about 60 files, but most of them 
>>>>> are due to tracking API changes, mainly due to tracking stack, 
>>>>> now, is an object, vs. a single pc.
>>>>>
>>>>> The most important files for this implementations are:
>>>>>
>>>>> memTracker.hpp/cpp
>>>>> mallocTracker.hpp/cpp and mallocTracker.inline.hpp
>>>>> virtualMemoryTracker.hpp/cpp
>>>>> mallocSiteTable.hpp/cpp
>>>>>
>>>>> allocationSite.hpp
>>>>> nativeCallStack.hpp/cpp
>>>>> linkedlist.hpp
>>>>>
>>>>>
>>>>> Tests:
>>>>>   - JPRT
>>>>>   - NMT test suite
>>>>>   - vm.quick.testlist
>>>>>   - Kitchensink stability test for 16+ days
>>>>>   - FMW
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8028541
>>>>> Webrev: http://cr.openjdk.java.net/~zgu/8028541/webrev.00/
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Zhengyu
>>>>>
>>>>
>>>
>>
>

-------------- next part --------------

diff --git a/src/share/vm/memory/allocation.hpp b/src/share/vm/memory/allocation.hpp
--- a/src/share/vm/memory/allocation.hpp
+++ b/src/share/vm/memory/allocation.hpp
@@ -159,6 +159,10 @@
                                  // is not included as validate type)
 };
 
+// Need a comment here that if you add an enum type,you have to add it to nmtCommon.cpp
+// memory_type_names.
+
+// Why isn't MEMFLAGS type MemoryType ?
 typedef size_t MEMFLAGS;
 
 
diff --git a/src/share/vm/services/mallocSiteTable.hpp b/src/share/vm/services/mallocSiteTable.hpp
--- a/src/share/vm/services/mallocSiteTable.hpp
+++ b/src/share/vm/services/mallocSiteTable.hpp
@@ -171,6 +171,9 @@
   // since malloc call can come from C runtime linker.
   static MallocSiteHashtableEntry*   _table[table_size];
 
+  // Here's this number again.  Or is it the same as the other one I saw twice?
+  // Can you make this const DescriptiveName = all of this and reuse? or enum.
+
   // The memory for hashtable entry allocation stack object
   static size_t _hash_entry_allocation_stack[(sizeof(NativeCallStack)
                                             + sizeof(size_t)/2) / sizeof(size_t)];
diff --git a/src/share/vm/services/mallocTracker.hpp b/src/share/vm/services/mallocTracker.hpp
--- a/src/share/vm/services/mallocTracker.hpp
+++ b/src/share/vm/services/mallocTracker.hpp
@@ -264,6 +264,7 @@
 #define MAX_MALLOCSITE_TABLE_SIZE  ((size_t)(1 << 16))
 #define MAX_BUCKET_LENGTH          ((size_t)(1 << 8))
 #define MAX_MALLOC_SIZE            ((size_t)(K * K * K) - 1)
+   // This doesn't seem that large.  Can Arena chunks be this large?
 #endif  // _LP64
 
  public:
diff --git a/src/share/vm/services/memBaseline.cpp b/src/share/vm/services/memBaseline.cpp
--- a/src/share/vm/services/memBaseline.cpp
+++ b/src/share/vm/services/memBaseline.cpp
@@ -246,6 +246,7 @@
   }
 }
 
+// See below....
 void MemBaseline::malloc_sites_to_site_order() {
   if (_malloc_sites_order != by_site) {
     SortedLinkedList<MallocSite, compare_malloc_site, ResourceObj::ARENA>
@@ -278,6 +279,7 @@
   }
 }
 
+// Why is this twice?  Oh gosh, I see it.  There are 2 characters different.  Please refactor this. :)  Christian's suggestion sounds good.
 void MemBaseline::virtual_memory_sites_to_site_order() {
   if (_virtual_memory_sites_order != by_size) {
     SortedLinkedList<VirtualMemoryAllocationSite, compare_virtual_memory_site, ResourceObj::ARENA>
diff --git a/src/share/vm/services/memTracker.hpp b/src/share/vm/services/memTracker.hpp
--- a/src/share/vm/services/memTracker.hpp
+++ b/src/share/vm/services/memTracker.hpp
@@ -86,6 +86,8 @@
 class MemBaseline;
 class Mutex;
 
+// Tracker was some sort of special MemTracker class iirc.  Why would you use Tracker
+// and not MemTracker?
 class Tracker : public StackObj {
  public:
   enum TrackerType {
@@ -98,6 +100,7 @@
   void record(address addr, size_t size);
  private:
   enum TrackerType  _type;
+  // Some explanation of why you are taking ThreadCritical would be good here.
   ThreadCritical    _tc;
 };
 
@@ -163,6 +166,9 @@
     MEMFLAGS flag = mtNone) {
     if (tracking_level() < NMT_summary) return;
     if (addr != NULL) {
+      // Thread critical lock is used because this code can be called by the compiler
+      // thread and in other native jvm code which does not respect safepoints
+      // Can you really do all of this while holding TC lock?
       ThreadCritical tc;
       VirtualMemoryTracker::add_reserved_region((address)addr, size, stack, flag);
     }
@@ -224,6 +230,8 @@
     }
   }
 
+  // What do you lock with query_lock now?  It's only in NMT dcmd now. I think you
+  // should move this lock to that class since that's the remaining use.
   static inline Mutex* query_lock() { return _query_lock; }
 
   // Make a final report and shutdown.
diff --git a/src/share/vm/services/virtualMemoryTracker.cpp b/src/share/vm/services/virtualMemoryTracker.cpp
--- a/src/share/vm/services/virtualMemoryTracker.cpp
+++ b/src/share/vm/services/virtualMemoryTracker.cpp
@@ -26,13 +26,17 @@
 #include "runtime/threadCritical.hpp"
 #include "services/virtualMemoryTracker.hpp"
 
+// Holds latest snapshot of virtual memory SortedLinkedList.
+// How do we know this size?  You have this twice!!
 size_t VirtualMemorySummary::_snapshot[(sizeof(VirtualMemorySnapshot) + sizeof(size_t) / 2) / sizeof(size_t)];
 
 void VirtualMemorySummary::initialize() {
   assert(sizeof(_snapshot) >= sizeof(VirtualMemorySnapshot), "Sanity Check");
+  // Use placement global operator new to static data area.
   ::new ((void*)_snapshot) VirtualMemorySnapshot();
 }
 
+
 SortedLinkedList<ReservedMemoryRegion, compare_reserved_region_base> VirtualMemoryTracker::_reserved_regions;
 
 int compare_committed_region(const CommittedMemoryRegion& r1, const CommittedMemoryRegion& r2) {
@@ -398,13 +402,16 @@
   return true;
 }
 
-
+// When do you transition?
+// ie. Using the dcmd interface, user can transition from more or to less tracking.
 bool VirtualMemoryTracker::transition(NMT_TrackingLevel from, NMT_TrackingLevel to) {
   if (from == NMT_minimal) {
     assert(to == NMT_summary || to == NMT_detail, "Just check");
     VirtualMemorySummary::reset();
   } else if (to == NMT_minimal) {
     assert(from == NMT_summary || from == NMT_detail, "Just check");
+    // Do you reset or clear virtual memory summary too?
+    // Is this only at safepoint?  Can you be adding a reserved region while this is being cleared?
     ThreadCritical tc;
     _reserved_regions.clear();
   }
diff --git a/src/share/vm/services/virtualMemoryTracker.hpp b/src/share/vm/services/virtualMemoryTracker.hpp
--- a/src/share/vm/services/virtualMemoryTracker.hpp
+++ b/src/share/vm/services/virtualMemoryTracker.hpp
@@ -36,7 +36,7 @@
 
 
 /*
- * Virtual memory counter
+ * Virtual memory counter  (rename this VirtualMemoryCounter then)
  */
 class VirtualMemory VALUE_OBJ_CLASS_SPEC {
  private:
@@ -71,7 +71,7 @@
   inline size_t committed() const { return _committed; }
 };
 
-// Virtual memory allocation site, where the virtual memory is reserved.
+// Virtual memory allocation site, keeps track where the virtual memory is reserved.
 class VirtualMemoryAllocationSite : public AllocationSite<VirtualMemory> {
  public:
   VirtualMemoryAllocationSite(const NativeCallStack& stack) :
@@ -87,6 +87,10 @@
 
 class VirtualMemorySummary;
 
+// This class represents a snapshot of virtual memory at a given time.
+// The latest snapshot is saved in a static area and the current snapshot is allocated
+// in a resource area for printing the comparison.
+
 class VirtualMemorySnapshot : public ResourceObj {
   friend class VirtualMemorySummary;
  public:
@@ -147,6 +151,7 @@
     as_snapshot()->by_type(flag)->release_memory(size);
   }
 
+  // Why do you move the reserved and committed memory???
   static inline void move_reserved_memory(MEMFLAGS from, MEMFLAGS to, size_t size) {
     as_snapshot()->by_type(from)->release_memory(size);
     as_snapshot()->by_type(to)->reserve_memory(size);
@@ -170,6 +175,7 @@
   }
 
  private:
+  // What is this calculation?  It appears twice!
   static size_t _snapshot[(sizeof(VirtualMemorySnapshot) + sizeof(size_t) / 2) / sizeof(size_t)];
 };
 
@@ -295,7 +301,6 @@
 
   NativeCallStack        _stack;
   MEMFLAGS         _flag;
-
   bool             _all_committed;
  public:
   ReservedMemoryRegion(address base, size_t size, const NativeCallStack& stack,
@@ -304,6 +309,7 @@
     _all_committed(false) { }
 
 
+  // Should this constructor initialize _all_committed?
   ReservedMemoryRegion(address base, size_t size) :
     VirtualMemoryRegion(base, size), _stack(emptyStack), _flag(mtNone) { }
 
@@ -336,8 +342,8 @@
 
   size_t  committed_size() const;
 
-  // move committed regions that higher than specified address to
-  // the new region
+  // move committed regions that higher are than specified address to
+  // the new region (why? what is this for?)
   void    move_committed_regions(address addr, ReservedMemoryRegion& rgn);
 
   inline bool all_committed() const { return _all_committed; }
@@ -362,7 +368,7 @@
   }
 
  private:
-  // The committed region contains the uncommitted region, substract the uncommitted
+  // The committed region contains the uncommitted region, subtract the uncommitted
   // region from this committed region
   bool remove_uncommitted_region(LinkedListNode<CommittedMemoryRegion>* node,
     address addr, size_t sz);
@@ -381,6 +387,8 @@
 };
 
 
+// Main class called from MemTracker to track virtual memory allocations, commits and
+// deallocations.
 class VirtualMemoryTracker : AllStatic {
  public:
   static bool initialize(NMT_TrackingLevel level);
@@ -393,6 +401,7 @@
   static bool remove_released_region    (address base_addr, size_t size);
   static void set_reserved_region_type  (address addr, MEMFLAGS flag);
 
+  // called by MemBaseline to create the snapshot?
   static bool walk_virtual_memory(VirtualMemoryWalker* walker);
 
   static bool transition(NMT_TrackingLevel from, NMT_TrackingLevel to);