RFR: 8212879: Make JVMTI TagMap table concurrent [v4]

Thu Nov 5 12:30:01 UTC 2020

On Wed, 4 Nov 2020 10:05:29 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> The comment is trying to describe the situation like:
>> 1. mark-end pause (WeakHandle.peek() returns NULL because object A is unmarked)
>> 2. safepoint for heap walk
>>    2a. Need to post ObjectFree event for object A before the heap walk doesn't find object A.
>> 3. gc_notification - would have posted an ObjectFree event for object A if the heapwalk hadn't intervened
>> 
>> The check_hashmap() function also checks whether the hash table needs to be rehashed before the next operation that uses the hashtable.
>> 
>> Both operations require the table to be locked.
>> 
>> The unlink and post needs to be in a GC pause for reasons that I stated above.  The unlink and post were done in a GC pause so this isn't worse for any GCs.  The lock can be held for concurrent GC while the number of entries are processed and this would be a delay for some applications that have requested a lot of tags, but these applications have asked for this and it's not worse than what we had with GC walking this table in safepoints.
>
> For the GCs that call the num_dead notification in a pause it is much worse than what we had. As I pointed out elsewhere, it used to be that tagmap processing was all-in-one, as a single serial subtask taken by the first thread that reached it in WeakProcessor processing. Other threads would find that subtask taken and move on to processing oopstores in parallel with the tagmap processing. Now everything except the oopstorage-based clearing of dead entries is a single threaded serial task done by the VMThread, after all the parallel WeakProcessor work is done, because that's where the num-dead callbacks are invoked. WeakProcessor's parallel oopstorage processing doesn't have a way to do the num-dead callbacks by the last thread out of each parallel oopstorage processing. Instead it's left to the end, on the assumption that the callbacks are relatively cheap.  But that could still be much worse than the old code, since the tagmap oopstorage could be late in the order of processing, 
 and so still effectively be a serial subtask after all the parallel subtasks are done or mostly done.

Yes, you are right that the processing will be done serially and not by a parallel worker thread.  This is could spawn a new GC worker thread to process the posts, as you suggest.  We could do that if we find a customer that has a complaint about the pause time of this processing.

>> The JVMTI code expects the posting to be done quite eagerly presumably during GC, before it has a chance to disable the event or some other such operation.  So the posting is done during the notification because it's as soon as possible.  Deferring to the ServiceThread had two problems.  1. the event came later than the caller is expecting it, and in at least one test the event was disabled before posting, and 2. there's a comment in the code why we can't post events with a JavaThread.  We'd have to transition into native while holding a no safepoint lock (or else deadlock).  The point of making this change was so that the JVMTI table does not need GC code to serially process the table.
>
> I know of nothing that leads to "presumably during GC" being a requirement. Having all pending events of some type occur before that type of event is disabled seems like a reasonable requirement, but just means that event disabling also requires the table to be "up to date", in the sense that any GC-cleared entries need to be dealt with. That can be handled just like other operations that use the table contents, rather than during the GC.  That is, use post_dead_object_on_vm_thread if there are or might be any pending dead objects, before disabling the event.

Ok, so there were many test failures with other approaches.  Having GC trigger the posting was the most reliable way to post the events when the tests (and presumably the jvmti customers) expected the events to be posted.  We could revisit during event disabling if a customer complains about GC pause times.

-------------

PR: https://git.openjdk.java.net/jdk/pull/967