RFR(S): 8228960: [TESTBUG] containers/docker/TestJcmdWithSideCar.java: jcmd reports main class as 'Unknown'

mikhailo.seledtsov at oracle.com mikhailo.seledtsov at oracle.com
Fri Aug 23 17:21:55 UTC 2019


Finally got some time to work on this issue.
Since I have encountered problem using files for passing messages 
between a container and a test driver (due to permissions), I looked for 
alternative solutions. I am using the output of a container process to 
signal when the main method has started, and it works. This simplifies 
things quite a bit as well.

Normally, we use OutputAnalyzer test utility to collect the whole output 
once the process has completed, and then analyze the resulting output 
for "contains some string", match, etc. However, testutils/ProcessTools 
provides an API to consume the output as it is produced. I am using this 
API to detect when the main() method of the container has started.

Updated webrev:
     http://cr.openjdk.java.net/~mseledtsov/8228960.02/


Testing:

   Ran the test on Linux-x64, various multiple nodes in a test cluster 
50 times - All PASS


Thank you,

Misha

On 8/13/19 2:05 PM, Bob Vandette wrote:
>
>> On Aug 13, 2019, at 3:28 PM, mikhailo.seledtsov at oracle.com wrote:
>>
>>
>> On 8/13/19 12:06 PM, Bob Vandette wrote:
>>>> On Aug 13, 2019, at 2:57 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>
>>>> Hi Bob,
>>>>
>>>>    The workdir (JTwork/scratch) is created with the "test user" permissions. Let me try to place the "signal" file in /tmp instead, since /tmp should normally have a 777 permission on Linux.
>>> Aren’t you creating a file inside a docker container and then checking for its existence outside of the container?
>> Correct
>>> Isn’t the root user running inside the container?
>> By default it is. But it still fails to create a file, for some reason. Can be related to selinux settings (for instance, see this article: https://stackoverflow.com/questions/24288616/permission-denied-on-accessing-host-directory-in-docker/31334443), I can not change those.
> Is your JTWork/scratch on an NFS mounted file system?  If this is the case then the problem is that root is equivalent to nobody on
> mounted file systems and can’t create files unless the directory has 777 permissions.  I just confirmed this.  You’d have to either run
> the container test as test-user or change the scratch directory permission.
>
> Bob.
>
>> My hope is that /tmp is configured to be accessed by a container engine as a general purpose directory, hence I was thinking to try it out.
>>
>>> Both processes don’t see the same /tmp right?   So that shouldn’t help.
>> In my next experiment, I will map a /tmp from host to be a /host-tmp inside the container (--volume /tmp:/host-tmp), then write a signal file to /host-tmp.
>>> If scratch has 777 permissions, anyone can create a file.
>> scratch has  "rwxr-xr-x"
>>> You have to be careful that you can clean up the
>>> file from outside the container.  I’d make sure to create it with 777.
>> I do use deleteOnExit(), so it should work (unless the JVM crashes). I guess I could add extra layer of safety here, and set the permissions to 777. Thank you for advice.
>>
>>
>> Thank you,
>>
>> Misha
>>
>>> Bob.
>>>
>>>> If this works, I will have to add some unique number to the file name, perhaps a PID of a child process.
>>>>
>>>> I will try this, and let you know how it works.
>>>>
>>>>
>>>> Thank you,
>>>>
>>>> Misha
>>>>
>>>> On 8/13/19 6:34 AM, Bob Vandette wrote:
>>>>> Sorry, I just looked at the webrev and you are trying the approach I suggested.  I thought you
>>>>> were trying to use file change notification.
>>>>>
>>>>> Where does the workdir get created?  Does it have 777 permissions?
>>>>>
>>>>> Bob.
>>>>>
>>>>>
>>>>>> On Aug 13, 2019, at 9:29 AM, Bob Vandette <bob.vandette at oracle.com> wrote:
>>>>>>
>>>>>> What if you just poll for the creation of the file waiting some small amount of time between polling with a maximum timeout.
>>>>>>
>>>>>> Bob.
>>>>>>
>>>>>>
>>>>>>> On Aug 12, 2019, at 8:22 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>
>>>>>>> Unfortunately, this approach does not seem to work on many of our test cluster machines. The creation of a "signal" file results in "PermissionDenied".
>>>>>>>
>>>>>>> The possible reason is the selinux configuration, or some other permission related stuff. The container tries to create a new file on a mounted volume on a host system, but host system denies it. I will look a bit deeper into this, but I think this type of issue can be encountered on any automated test system. Hence, we may have to abandon this approach.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Misha
>>>>>>>
>>>>>>>
>>>>>>> On 8/12/19 3:59 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>> Here is an updated webrev: http://cr.openjdk.java.net/~mseledtsov/8228960.01/
>>>>>>>>
>>>>>>>> I am using a simple file-based mechanism to communicate between the processes. The "EventGeneratorLoop" process creates a specific "signal" file on a shared mounted volume, while the main test process waits  for the file to exist before running the test cases.
>>>>>>>>
>>>>>>>> Passes on Linux-x64 Docker-enabled host. Testing in the test cluster is in progress.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>>
>>>>>>>> Misha
>>>>>>>>
>>>>>>>> On 8/7/19 5:11 PM, David Holmes wrote:
>>>>>>>>> On 8/08/2019 9:04 am, Mikhailo Seledtsov wrote:
>>>>>>>>>> Hi Severin, Bob,
>>>>>>>>>>
>>>>>>>>>>    Thank you for reviewing the code.
>>>>>>>>>>
>>>>>>>>>> On 8/7/19, 11:38 AM, Bob Vandette wrote:
>>>>>>>>>>> Can’t you come up with a better way of synchronizing the test by possibly writing a
>>>>>>>>>>> file and waiting for it to exist with a timeout?
>>>>>>>>>> I will try out this approach.
>>>>>>>>> This seems like a fundamental problem with jcmd - so cc'ing serviceability-dev.
>>>>>>>>>
>>>>>>>>> But I'm pretty sure they recently addressed a similar issue with the premature sending of the attach signal?
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Misha
>>>>>>>>>>> Isn’t there a shared volume between the two
>>>>>>>>>>> processes?
>>>>>>>>>>>
>>>>>>>>>>> We’ve been fighting test reliability for a while now.  I can only hope we’re getting
>>>>>>>>>>> to the end.
>>>>>>>>>>>
>>>>>>>>>>> Bob.
>>>>>>>>>>>
>>>>>>>>>>>> On Aug 7, 2019, at 2:18 PM, Severin Gehwolf<sgehwolf at redhat.com>  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Misha,
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, 2019-08-06 at 20:17 -0700, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>>> Please review this change that fixes a container test TestJcmdWithSideCar.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My investigation indicated that a root cause for this failure is:
>>>>>>>>>>>>> JCMD -l shows 'Unknown' for class name because the main class has not
>>>>>>>>>>>>> been loaded yet.
>>>>>>>>>>>>> The target test JVM has started, it is initializing, but has not loaded
>>>>>>>>>>>>> the main test class.
>>>>>>>>>>>> That's what I've found too.
>>>>>>>>>>>>
>>>>>>>>>>>>> The proposed solution is to try 'jcmd -l' several times, with a short
>>>>>>>>>>>>> sleep in between.
>>>>>>>>>>>> Thread.sleep() isn't great, but I'm not sure there is an alternative.
>>>>>>>>>>>>
>>>>>>>>>>>>> Also I have commented out the testCase02() due to another bug:
>>>>>>>>>>>>> "JDK-8228850: jhsdb jinfo fails with ClassCastException:
>>>>>>>>>>>>> s.j.h.oops.TypeArray cannot be cast to s.j.h.oops.Instance",
>>>>>>>>>>>>> which is not a test bug. IMO, it is better to run the test and skip a
>>>>>>>>>>>>> sub-case than to skip the entire test.
>>>>>>>>>>>>>
>>>>>>>>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8228960
>>>>>>>>>>>>>      Webrev: http://cr.openjdk.java.net/~mseledtsov/8228960.00/
>>>>>>>>>>>> Looks OK to me.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Severin
>>>>>>>>>>>>


More information about the hotspot-runtime-dev mailing list