RFR 8222671 : thread_large/thread_large.java times out on MacOSX

Thu Aug 27 02:08:48 UTC 2020

Hi Gerard,

I would have preferred the @bug to remain for traceability purposes 
(without having to check repo history).

I would also have kept the main/othervm as this is something of a stress 
test and I would be concerned that the hosting VM might be de-stabilised 
in some way when a huge number of threads are created.

But I don't insist on any further changes and this test has already 
consumed far too many resources IMO so feel free to push as-is.

Thanks,
David

On 20/08/2020 5:55 am, gerard ziemski wrote:
> hi Igor,
> 
> I have responses to your questions inline:
> 
> bug link at https://bugs.openjdk.java.net/browse/JDK-8222671
> open webrev at http://cr.openjdk.java.net/~gziemski/8222671_rev5
> testing: Mach5 hs-tier1,2,3,4,5 in progress...
> 
> 
> On 8/19/20 12:25 PM, Igor Ignatyev wrote:
>> Hi Gerard,
>>
>> please see my comments inline.
>>
>> -- Igor
>>
>>> On Aug 19, 2020, at 9:53 AM, gerard ziemski 
>>> <gerard.ziemski at oracle.com> wrote:
>>>
>>> hi Igor,
>>>
>>> Thank you for the review. Here is the webrev, but please see my 
>>> comments inline for more:
>>>
>>> bug link at https://bugs.openjdk.java.net/browse/JDK-8222671
>>> open webrev at http://cr.openjdk.java.net/~gziemski/8222671_rev4
>>> testing: Mach5 hs-tier1,2,3,4,5 in progress...
>>>
>>>
>>>
>>> On 8/18/20 12:18 PM, Igor Ignatyev wrote:
>>>> Hi Gerard,
>>>>
>>>> looks be
>>>> 1.
>>>>>    26 * @bug 8222671
>>>> @bug tag should refer to a product bug which can be reproduced w/ 
>>>> this test, not the bug id used to push a test.
>>> The bug 8222671 is the issue here, i.e. timeouts in the existing 
>>> test, which is what we are trying to fix, so we have a new test that 
>>> replaces the old and less robust one. I did try to find an issue that 
>>> originated the old test and I couldn’t, but if you know how to find, 
>>> I can add a reference to it in the new test. To me, the old test 
>>> looks more like a general functionality test though.
>> I'd just remove @bug tag, b/c from my point of view listing all test 
>> bugs which caused modification of the test in @bug is just useless and 
>> pollute metadata. others have other opinion though, so I don't insist 
>> on removal.
> You are the expert here, removed.
> 
> 
>>>> 2.
>>>>>    28 * @run main/othervm ThreadCountLimit
>>>> what's the reason to run in othervm mode?
>>> Oh, I just copied that from the existing test, without giving it any 
>>> thought - I will remove it.
>> in hotspot tests, we tend not to relay on implicit @run (as it's error 
>> prone) and always have explicit `@run`, that's to say, please add 
>> `@run main ThreadCountLimit`
> Is there a template for a new test, or a wiki page that explains all 
> this? I learned quite a bit writing this test, but it would be nice to 
> have some documentation I could use as a reference in the future.
> 
> 
>>>> 3.
>>>>>    34 public class ThreadCountLimit extends Object {
>>>> you don't need to explicitly extend Object.
>>> True, but I like the code explicit like that if that’s OK.
>> fine w/ me, but it's kinda against regular java code guidelines and 
>> doesn't bring any information.
> Fixed.
> 
> 
>>>> 4. in ThreadCountLimit::run()
>>>>>    67       } catch (InterruptedException ex) {
>>>>>    68       }
>>>> shouldn't this mark the test as failed?
>>> Good question. In a perfect world we would want to know if a thread 
>>> got unexpectedly interrupted, but realistically a process might 
>>> (will?) exhibit spurious interruptions from the OS sooner or later, 
>>> which are not the focus of this test, so instead of marking it a 
>>> failure and making someone look into it, only later to determine that 
>>> it was just a spurious interrupt, we simply ignore it and focus on 
>>> the core goal of this test instead.
>> I'm not sure what you are refer here by 'spurious interruptions', java 
>> threads must be interrupted only if someone interrupted them, all 
>> other behaviors are JVM bugs and are to be analyzed. given the goal of 
>> this test is to check how JVM behaves when lots of threads have been 
>> created and started, I see catching unexpected InterruptedException as 
>> a valid failure mode of this test.
> I think I might be mixing up native wait/notify on mutexes here, where a 
> wait on a native mutex can be spuriously awaken.
> 
> According to 
> https://www.ibm.com/developerworks/library/j-jtp05236/index.html Thread 
> interruption is a cooperative mechanism, and since we are not 
> interrupting any of the blocking methods we use in the test ourselves, 
> it would be really curious if we ever did get InterruptedException, so 
> it might be worth it to report it.
> 
> I tightened up the blocking code and its try/catch code block, to make 
> it to clear what is the subject to interrupts and we will now throw an 
> Error if we ever catch one (never?) Have we ever seen one of those 
> unexpected interrupts in our testing so far?
> 
> 
> Thank you for the feedback.
> 
> 
> cheers
> 
>>>
>>> I do see, however, that in one place I report it as Error and ignore 
>>> it in the other instance, so I fixed that inconsistency. I added 
>>> comments as well.
>>>
>>>> Thanks,
>>>> -- Igor
>>>>
>>>>> On Aug 18, 2020, at 7:38 AM, gerard ziemski 
>>>>> <gerard.ziemski at oracle.com> wrote:
>>>>>
>>>>> Thank you David for the review!
>>>>>
>>>>> I incorporated all your feedback.
>>>>>
>>>>> I'm surprised that "assert" is not available by default in our 
>>>>> tests though - what is a reason for that? Of all the places where 
>>>>> asserts should be available, a testing suite would seem to be it to 
>>>>> me.
>>>>>
>>>>> bug link at https://bugs.openjdk.java.net/browse/JDK-8222671
>>>>> open webrev at http://cr.openjdk.java.net/~gziemski/8222671_rev3
>>>>> testing: passes locally, Mach5 hs-tier1,2,3,4,5 in progress...
>>>>>
>>>>>
>>>>> cheers
>>>>>
>>>>> On 8/18/20 12:15 AM, David Holmes wrote:
>>>>>> Hi Gerard,
>>>>>>
>>>>>> On 18/08/2020 1:28 am, gerard ziemski wrote:
>>>>>>> Thank you David and Igor for the feedback!
>>>>>>>
>>>>>>> I rewrote the test again, this time basing it off 
>>>>>>> test/hotspot/jtreg/runtime/Thread/* tests.
>>>>>>>
>>>>>>> For macOS and Linux we create threads until we hit a process 
>>>>>>> limit thread count. Windows, however, allows unlimited (?) number 
>>>>>>> of threads, though the process gradually slows down as the count 
>>>>>>> grows, so I had to add time limit and I set it to 5 sec. I chose 
>>>>>>> 5 sec because the value is well below jtreg time out (i.e. 120s), 
>>>>>>> but still quite above how long my laptop needs to hit the process 
>>>>>>> thread count limit (i.e. my Mac laptop needs about 1 sec to hit 
>>>>>>> that limit).
>>>>>>>
>>>>>>> Otherwise there are no other hardwired values or assumptions (I 
>>>>>>> hope). What do you think of the test now?
>>>>>> Not unreasonable :) It could still have tried for an upper limit 
>>>>>> of 10000 instead of using the time-limit, but it makes no real 
>>>>>> difference.
>>>>>>
>>>>>> A few nits/suggestions:
>>>>>>
>>>>>> Please add an @bug line to the test comment.
>>>>>>
>>>>>>   49         if ((this.index % 250) == 0) {
>>>>>>
>>>>>> Outside of the constructor it is poor style to use this.x to refer 
>>>>>> to fields methods (unless needed due to local variable shadowing). 
>>>>>> This applies throughout the test.
>>>>>>
>>>>>>   54         doWork();
>>>>>>
>>>>>> Not sure why you factored things out the way you did - everything 
>>>>>> could go in run().
>>>>>>
>>>>>>   61       setName(""+this.index);
>>>>>>
>>>>>> This could go in constructor, but I presume it adds something to 
>>>>>> the workload? Is the use of ""+index deliberate in that regard 
>>>>>> (generate string concatenation work), rather than using 
>>>>>> String.valueOf(index)?
>>>>>>
>>>>>>   65       assert(this.index == Integer.parseInt(getName()));
>>>>>>
>>>>>> We never run tests with Java asserts enabled so this is "dead" code.
>>>>>>
>>>>>>   87           // Windows path
>>>>>>
>>>>>> Windows path or system with very large ulimit?
>>>>>>
>>>>>>    92     } catch (OutOfMemoryError e) {
>>>>>>
>>>>>> I suggest checking this is actually the "unable to create native 
>>>>>> thread" OOME rather than a general Java OOME (if run with a small 
>>>>>> heap for example you might get OOME growing the ArrayList).
>>>>>>
>>>>>>   110     String cmd = System.getProperty("sun.java.command");
>>>>>>   111     if (cmd != null && 
>>>>>> !cmd.startsWith("com.sun.javatest.regtest.agent.MainWrapper")) {
>>>>>>   112       // Exit with success in a non-JavaTest environment:
>>>>>>   113       System.exit(0);
>>>>>>   114     }
>>>>>>
>>>>>> I never noticed when this was used in the little set of 
>>>>>> runtime/Thread tests! I don't know what the rationale was but it 
>>>>>> is completely unnecessary as a normal return from main would be 
>>>>>> the equivalent of exit(0) - so the tests work fine without 
>>>>>> System.exit when run standalone. Please don't emulate this part of 
>>>>>> the existing tests. I'll file a RFE to remove it from the other 
>>>>>> tests.
>>>>>>
>>>>>> I won't be around for any further follow up for a week, so feel 
>>>>>> free to proceed once Igor is okay with things.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> bug link at https://bugs.openjdk.java.net/browse/JDK-8222671
>>>>>>> open webrev at http://cr.openjdk.java.net/~gziemski/8222671_rev2
>>>>>>> testing: Mach5 hs-tier1,2,3,4,5 in progress...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 8/7/20 2:41 PM, Igor Ignatyev wrote:
>>>>>>>>> On Aug 6, 2020, at 12:33 AM, David Holmes 
>>>>>>>>> <david.holmes at oracle.com <mailto:david.holmes at oracle.com>> wrote:
>>>>>>>>>
>>>>>>>>> Basing the new test around the existing thread stress tests was 
>>>>>>>>> not unreasonable but I don't think we're actively adding to 
>>>>>>>>> those tests. I would check with Igor whether it is better to 
>>>>>>>>> just add this under runtime/thread rather than vmTestBase.
>>>>>>>> Hi Gerard,
>>>>>>>>
>>>>>>>> David is right, we are trying to get rid of vmTestbase directory 
>>>>>>>> and move all tests into corresponding component directories 
>>>>>>>> cleaning them up at the same time. so runtime/thread directory 
>>>>>>>> sound like a much better location for a new test.
>>>>>>>>
>>>>>>>> I don't think you need to cary all this baggage of 'VM testbase 
>>>>>>>> keywords', 'converted from ...' and the rest, I'd suggest you 
>>>>>>>> just use the text from 'DESCRIPTION' as @summary. I'd also 
>>>>>>>> remove System.exit call and would just make thread009::run to 
>>>>>>>> return bool. it's also unclear to me if someone is ever going to 
>>>>>>>> change THREADS_EXPECTED , TIMEOUT, and DEBUG_MODE values, I'd 
>>>>>>>> just hardcode them in the test and get rid of parsing. it seems 
>>>>>>>> that debug output won't clutter output, so you might want to 
>>>>>>>> always print them and remove DEBUG_MODE field.
>>>>>>>>
>>>>>>>> using `assert` in test code isn't a good idea as it will work 
>>>>>>>> only if someone runs test w/ `-ea` which not all people do. it's 
>>>>>>>> better to replace w/ `if (p) throw new Error(...)`  or use 
>>>>>>>> jdk.test.lib.Asserts class.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> -- Igor
>