RFR(S): 8228960: [TESTBUG] containers/docker/TestJcmdWithSideCar.java: jcmd reports main class as 'Unknown'

Bob Vandette bob.vandette at oracle.com
Thu Aug 29 17:55:20 UTC 2019


Misha,

Looks good.  A couple of nits.

1. You might want to remove hotspot_containers section of ProblemList.txt since there are no bugs listed.

2. Can you make this timeout a constant just like TIME_TO_RUN_MAIN_PROCESS

  73             mainContainer.waitForMainMethodStart(5*1000);

3. assertIsAlive() is not used except for the commented out tests.  Do you think you’ll ultimately use
this method or is this left over from previous attempts?

222         public void assertIsAlive() throws Exception {


Bob.


> On Aug 29, 2019, at 11:41 AM, mikhailo.seledtsov at oracle.com wrote:
> 
> I believe I need a second reviewer for this change. Could someone, please, review this change version 2 ? (David already reviewed it).
> 
> http://cr.openjdk.java.net/~mseledtsov/8228960.02/
> 
> 
> Thank you in advance,
> 
> Misha
> 
> 
> On 8/26/19 12:32 PM, mikhailo.seledtsov at oracle.com wrote:
>> Hi David,
>> 
>>   Thank you for review.
>> 
>> On 8/26/19 12:57 AM, David Holmes wrote:
>>> Hi Misha,
>>> 
>>> On 24/08/2019 3:21 am, mikhailo.seledtsov at oracle.com wrote:
>>>> Finally got some time to work on this issue.
>>>> Since I have encountered problem using files for passing messages between a container and a test driver (due to permissions), I looked for alternative solutions. I am using the output of a container process to signal when the main method has started, and it works. This simplifies things quite a bit as well.
>>>> 
>>>> Normally, we use OutputAnalyzer test utility to collect the whole output once the process has completed, and then analyze the resulting output for "contains some string", match, etc. However, testutils/ProcessTools provides an API to consume the output as it is produced. I am using this API to detect when the main() method of the container has started.
>>> 
>>> That seems reasonable. Do we want to make the following change to minimise unneeded output processing:
>>> 
>>>          private Consumer<String> outputConsumer = s -> {
>>> !            if (!mainMethodStarted && s.contains(EventGeneratorLoop.MAIN_METHOD_STARTED)) {
>>>                  System.out.println("MainContainer: setting mainMethodStarted");
>>>                  mainMethodStarted = true;
>>>              }
>>>          };
>> Thank you for the suggestion. I will update the code accordingly.
>>> 
>>>> Updated webrev:
>>>>      http://cr.openjdk.java.net/~mseledtsov/8228960.02/
>>> 
>>> Otherwise looks okay. Hopefully those other test cases will be enabled in the not too distant future.
>> 
>> I hope so as well.
>> 
>> 
>> Thank you,
>> 
>> Misha
>> 
>>> 
>>> Thanks,
>>> David
>>> -----
>>> 
>>>> 
>>>> Testing:
>>>> 
>>>>    Ran the test on Linux-x64, various multiple nodes in a test cluster 50 times - All PASS
>>>> 
>>>> 
>>>> Thank you,
>>>> 
>>>> Misha
>>>> 
>>>> On 8/13/19 2:05 PM, Bob Vandette wrote:
>>>>> 
>>>>>> On Aug 13, 2019, at 3:28 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>> 
>>>>>> 
>>>>>> On 8/13/19 12:06 PM, Bob Vandette wrote:
>>>>>>>> On Aug 13, 2019, at 2:57 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>> 
>>>>>>>> Hi Bob,
>>>>>>>> 
>>>>>>>>    The workdir (JTwork/scratch) is created with the "test user" permissions. Let me try to place the "signal" file in /tmp instead, since /tmp should normally have a 777 permission on Linux.
>>>>>>> Aren’t you creating a file inside a docker container and then checking for its existence outside of the container?
>>>>>> Correct
>>>>>>> Isn’t the root user running inside the container?
>>>>>> By default it is. But it still fails to create a file, for some reason. Can be related to selinux settings (for instance, see this article: https://stackoverflow.com/questions/24288616/permission-denied-on-accessing-host-directory-in-docker/31334443), I can not change those.
>>>>> Is your JTWork/scratch on an NFS mounted file system?  If this is the case then the problem is that root is equivalent to nobody on
>>>>> mounted file systems and can’t create files unless the directory has 777 permissions.  I just confirmed this. You’d have to either run
>>>>> the container test as test-user or change the scratch directory permission.
>>>>> 
>>>>> Bob.
>>>>> 
>>>>>> My hope is that /tmp is configured to be accessed by a container engine as a general purpose directory, hence I was thinking to try it out.
>>>>>> 
>>>>>>> Both processes don’t see the same /tmp right?   So that shouldn’t help.
>>>>>> In my next experiment, I will map a /tmp from host to be a /host-tmp inside the container (--volume /tmp:/host-tmp), then write a signal file to /host-tmp.
>>>>>>> If scratch has 777 permissions, anyone can create a file.
>>>>>> scratch has  "rwxr-xr-x"
>>>>>>> You have to be careful that you can clean up the
>>>>>>> file from outside the container.  I’d make sure to create it with 777.
>>>>>> I do use deleteOnExit(), so it should work (unless the JVM crashes). I guess I could add extra layer of safety here, and set the permissions to 777. Thank you for advice.
>>>>>> 
>>>>>> 
>>>>>> Thank you,
>>>>>> 
>>>>>> Misha
>>>>>> 
>>>>>>> Bob.
>>>>>>> 
>>>>>>>> If this works, I will have to add some unique number to the file name, perhaps a PID of a child process.
>>>>>>>> 
>>>>>>>> I will try this, and let you know how it works.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thank you,
>>>>>>>> 
>>>>>>>> Misha
>>>>>>>> 
>>>>>>>> On 8/13/19 6:34 AM, Bob Vandette wrote:
>>>>>>>>> Sorry, I just looked at the webrev and you are trying the approach I suggested. I thought you
>>>>>>>>> were trying to use file change notification.
>>>>>>>>> 
>>>>>>>>> Where does the workdir get created?  Does it have 777 permissions?
>>>>>>>>> 
>>>>>>>>> Bob.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Aug 13, 2019, at 9:29 AM, Bob Vandette <bob.vandette at oracle.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> What if you just poll for the creation of the file waiting some small amount of time between polling with a maximum timeout.
>>>>>>>>>> 
>>>>>>>>>> Bob.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Aug 12, 2019, at 8:22 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Unfortunately, this approach does not seem to work on many of our test cluster machines. The creation of a "signal" file results in "PermissionDenied".
>>>>>>>>>>> 
>>>>>>>>>>> The possible reason is the selinux configuration, or some other permission related stuff. The container tries to create a new file on a mounted volume on a host system, but host system denies it. I will look a bit deeper into this, but I think this type of issue can be encountered on any automated test system. Hence, we may have to abandon this approach.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Misha
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 8/12/19 3:59 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>> Here is an updated webrev: http://cr.openjdk.java.net/~mseledtsov/8228960.01/
>>>>>>>>>>>> 
>>>>>>>>>>>> I am using a simple file-based mechanism to communicate between the processes. The "EventGeneratorLoop" process creates a specific "signal" file on a shared mounted volume, while the main test process waits  for the file to exist before running the test cases.
>>>>>>>>>>>> 
>>>>>>>>>>>> Passes on Linux-x64 Docker-enabled host. Testing in the test cluster is in progress.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> 
>>>>>>>>>>>> Misha
>>>>>>>>>>>> 
>>>>>>>>>>>> On 8/7/19 5:11 PM, David Holmes wrote:
>>>>>>>>>>>>> On 8/08/2019 9:04 am, Mikhailo Seledtsov wrote:
>>>>>>>>>>>>>> Hi Severin, Bob,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>    Thank you for reviewing the code.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 8/7/19, 11:38 AM, Bob Vandette wrote:
>>>>>>>>>>>>>>> Can’t you come up with a better way of synchronizing the test by possibly writing a
>>>>>>>>>>>>>>> file and waiting for it to exist with a timeout?
>>>>>>>>>>>>>> I will try out this approach.
>>>>>>>>>>>>> This seems like a fundamental problem with jcmd - so cc'ing serviceability-dev.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> But I'm pretty sure they recently addressed a similar issue with the premature sending of the attach signal?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> David
>>>>>>>>>>>>> -----
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Misha
>>>>>>>>>>>>>>> Isn’t there a shared volume between the two
>>>>>>>>>>>>>>> processes?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We’ve been fighting test reliability for a while now.  I can only hope we’re getting
>>>>>>>>>>>>>>> to the end.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Bob.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Aug 7, 2019, at 2:18 PM, Severin Gehwolf<sgehwolf at redhat.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Misha,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, 2019-08-06 at 20:17 -0700, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>>>>>>> Please review this change that fixes a container test TestJcmdWithSideCar.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> My investigation indicated that a root cause for this failure is:
>>>>>>>>>>>>>>>>> JCMD -l shows 'Unknown' for class name because the main class has not
>>>>>>>>>>>>>>>>> been loaded yet.
>>>>>>>>>>>>>>>>> The target test JVM has started, it is initializing, but has not loaded
>>>>>>>>>>>>>>>>> the main test class.
>>>>>>>>>>>>>>>> That's what I've found too.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The proposed solution is to try 'jcmd -l' several times, with a short
>>>>>>>>>>>>>>>>> sleep in between.
>>>>>>>>>>>>>>>> Thread.sleep() isn't great, but I'm not sure there is an alternative.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Also I have commented out the testCase02() due to another bug:
>>>>>>>>>>>>>>>>> "JDK-8228850: jhsdb jinfo fails with ClassCastException:
>>>>>>>>>>>>>>>>> s.j.h.oops.TypeArray cannot be cast to s.j.h.oops.Instance",
>>>>>>>>>>>>>>>>> which is not a test bug. IMO, it is better to run the test and skip a
>>>>>>>>>>>>>>>>> sub-case than to skip the entire test.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8228960
>>>>>>>>>>>>>>>>>      Webrev: http://cr.openjdk.java.net/~mseledtsov/8228960.00/
>>>>>>>>>>>>>>>> Looks OK to me.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Severin
>>>>>>>>>>>>>>>> 



More information about the serviceability-dev mailing list