RFR(S): 8228960: [TESTBUG] containers/docker/TestJcmdWithSideCar.java: jcmd reports main class as 'Unknown'

mikhailo.seledtsov at oracle.com mikhailo.seledtsov at oracle.com
Mon Aug 26 19:32:36 UTC 2019


Hi David,

   Thank you for the review.

On 8/26/19 12:57 AM, David Holmes wrote:
> Hi Misha,
>
> On 24/08/2019 3:21 am, mikhailo.seledtsov at oracle.com wrote:
>> I finally got some time to work on this issue.
>> Since I encountered problems using files to pass messages
>> between a container and the test driver (due to permissions), I looked
>> for alternative solutions. I am using the output of the container
>> process to signal when the main method has started, and it works.
>> This simplifies things quite a bit as well.
>>
>> Normally, we use the OutputAnalyzer test utility to collect the whole
>> output once the process has completed, and then check the resulting
>> output (contains a given string, matches a pattern, etc.). However,
>> testutils/ProcessTools provides an API to consume the output as it is
>> produced. I am using this API to detect when the main() method of the
>> process running in the container has started.
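A minimal sketch of consuming a child process's output as it is produced, using plain java.lang.Process streams rather than the test library's ProcessTools (whose exact API is not reproduced here). The marker string and the names OutputWatcher, pump and awaitMarker are hypothetical. A daemon thread hands each line to a Consumer<String>, and the driver waits on a CountDownLatch with a timeout instead of waiting for the process to exit.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class OutputWatcher {
    // Hypothetical marker line printed by the containerized main().
    static final String MARKER = "MAIN_METHOD_STARTED";

    /** Starts a daemon thread that passes each line of 'in' to 'consumer' as soon as it is read. */
    static Thread pump(InputStream in, Consumer<String> consumer) {
        Thread t = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    consumer.accept(line);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, "output-pump");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Returns true once a line containing MARKER is seen, false if the timeout expires first. */
    static boolean awaitMarker(InputStream in, long timeoutMillis) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        // countDown() on a latch that is already at zero does nothing, so repeated matches are harmless.
        pump(in, line -> {
            if (line.contains(MARKER)) {
                started.countDown();
            }
        });
        return started.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for process.getInputStream() of the container process.
        InputStream fake = new ByteArrayInputStream(
                ("starting\n" + MARKER + "\n").getBytes(StandardCharsets.UTF_8));
        System.out.println(awaitMarker(fake, 5_000));
    }
}
```

With a real process, pass process.getInputStream() (after redirectErrorStream(true) if stderr should be scanned too). Because the pump thread keeps draining the stream after the marker is seen, the child cannot stall on a full pipe.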
>
> That seems reasonable. Do we want to make the following change to 
> minimise unneeded output processing:
>
>          private Consumer<String> outputConsumer = s -> {
> !            if (!mainMethodStarted && s.contains(EventGeneratorLoop.MAIN_METHOD_STARTED)) {
>                  System.out.println("MainContainer: setting mainMethodStarted");
>                  mainMethodStarted = true;
>              }
>          };
Thank you for the suggestion. I will update the code accordingly.
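The suggested guard can be sketched as a self-contained class. Assumptions: the flag is declared volatile because the consumer runs on the output-reading thread while the test thread reads the flag (the original field declaration is not shown), and MAIN_METHOD_STARTED here is a placeholder for EventGeneratorLoop.MAIN_METHOD_STARTED. MainMethodWatcher and isMainMethodStarted() are hypothetical names.

```java
import java.util.function.Consumer;

final class MainMethodWatcher {
    // Placeholder for EventGeneratorLoop.MAIN_METHOD_STARTED.
    static final String MAIN_METHOD_STARTED = "MAIN_METHOD_STARTED";

    // volatile: set on the output-reading thread, read on the test thread.
    private volatile boolean mainMethodStarted = false;

    // The cheap flag test comes first, so once the marker has been seen
    // the && short-circuits and contains() is no longer evaluated.
    final Consumer<String> outputConsumer = s -> {
        if (!mainMethodStarted && s.contains(MAIN_METHOD_STARTED)) {
            System.out.println("MainContainer: setting mainMethodStarted");
            mainMethodStarted = true;
        }
    };

    boolean isMainMethodStarted() {
        return mainMethodStarted;
    }
}
```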
>
>> Updated webrev:
>>      http://cr.openjdk.java.net/~mseledtsov/8228960.02/
>
> Otherwise looks okay. Hopefully those other test cases will be enabled 
> in the not too distant future.

I hope so as well.


Thank you,

Misha

>
> Thanks,
> David
> -----
>
>>
>> Testing:
>>
>>    Ran the test 50 times on Linux-x64, across multiple nodes in a test
>> cluster: all PASS.
>>
>>
>> Thank you,
>>
>> Misha
>>
>> On 8/13/19 2:05 PM, Bob Vandette wrote:
>>>
>>>> On Aug 13, 2019, at 3:28 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>
>>>>
>>>> On 8/13/19 12:06 PM, Bob Vandette wrote:
>>>>>> On Aug 13, 2019, at 2:57 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>
>>>>>> Hi Bob,
>>>>>>
>>>>>>    The workdir (JTwork/scratch) is created with the "test user"
>>>>>> permissions. Let me try to place the "signal" file in /tmp
>>>>>> instead, since /tmp is normally world-writable (mode 1777) on Linux.
>>>>> Aren’t you creating a file inside a docker container and then 
>>>>> checking for its existence outside of the container?
>>>> Correct
>>>>> Isn’t the root user running inside the container?
>>>> By default it is, but it still fails to create the file for some
>>>> reason. It may be related to the SELinux settings (for instance, see this
>>>> article:
>>>> https://stackoverflow.com/questions/24288616/permission-denied-on-accessing-host-directory-in-docker/31334443),
>>>> which I cannot change.
>>> Is your JTwork/scratch on an NFS-mounted file system? If so, the
>>> problem is that root is mapped to nobody on NFS-mounted file systems
>>> (root squashing) and can’t create files unless the directory has
>>> 777 permissions. I just confirmed this. You’d have to either run
>>> the container test as the test user or change the scratch directory
>>> permissions.
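To check the situation described above, a quick local diagnostic can report whether a directory lets "others" create files, which is what a root user mapped to nobody would need (write plus search permission). This is a sketch under stated assumptions: ScratchDirCheck is a hypothetical name, it only inspects the client-side mode bits, and it cannot see the NFS server's export options such as root_squash.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

final class ScratchDirCheck {
    /** True if "others" have write and search (execute) permission on 'dir'. */
    static boolean isWorldWritable(Path dir) throws IOException {
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(dir);
        return perms.contains(PosixFilePermission.OTHERS_WRITE)
                && perms.contains(PosixFilePermission.OTHERS_EXECUTE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        String mode = PosixFilePermissions.toString(Files.getPosixFilePermissions(dir));
        System.out.println(dir + ": " + mode
                + (isWorldWritable(dir) ? " (world-writable)" : " (not world-writable)"));
    }
}
```

For a scratch directory with mode rwxr-xr-x, as reported in this thread, the check returns false.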
>>>
>>> Bob.
>>>
>>>> My hope is that /tmp is configured as a general-purpose directory
>>>> that the container engine can access, which is why I wanted to try
>>>> it out.
>>>>
>>>>> The two processes don’t see the same /tmp, right? So that shouldn’t
>>>>> help.
>>>> In my next experiment, I will map the host's /tmp to
>>>> /host-tmp inside the container (--volume /tmp:/host-tmp), then
>>>> write a signal file to /host-tmp.
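The volume mapping in this experiment corresponds to a docker run command like the one built below as a ProcessBuilder argument list. The image name and the command run inside the container are hypothetical placeholders, and DockerCommand.runWithVolume is an illustrative helper, not the test library's API; --rm and --volume HOST:CONTAINER are standard docker run options.

```java
import java.util.ArrayList;
import java.util.List;

final class DockerCommand {
    /** Builds "docker run --rm --volume HOST:CONTAINER IMAGE COMMAND..." as an argument list. */
    static List<String> runWithVolume(String image, String hostDir, String containerDir,
                                      String... command) {
        List<String> args = new ArrayList<>(List.of(
                "docker", "run", "--rm",
                "--volume", hostDir + ":" + containerDir, // e.g. /tmp:/host-tmp
                image));
        args.addAll(List.of(command));
        return args;
    }
}
```

For example, new ProcessBuilder(DockerCommand.runWithVolume("test-image", "/tmp", "/host-tmp", "java", "EventGeneratorLoop")).start() would launch the container with the host's /tmp visible as /host-tmp.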
>>>>> If scratch has 777 permissions, anyone can create a file.
>>>> scratch has "rwxr-xr-x".
>>>>> You have to be careful that you can clean up the
>>>>> file from outside the container.  I’d make sure to create it with 
>>>>> 777.
>>>> I do use deleteOnExit(), so it should work (unless the JVM
>>>> crashes). I guess I could add an extra layer of safety here and set
>>>> the permissions to 777. Thank you for the advice.
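Creating the signal file with 777 permissions and deleteOnExit() could look like the sketch below, assuming a POSIX file system; SignalFile.create is a hypothetical helper. The mode is set after creation because a permission attribute passed to Files.createFile() is normally still masked by the process umask. Note also that on POSIX systems removing a file is governed by write permission on the containing directory (and by the sticky bit, as on /tmp), so the file's own mode mainly matters for reading or rewriting it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

final class SignalFile {
    /** Creates 'file', opens it up to everyone (777) and schedules deletion at normal JVM exit. */
    static Path create(Path file) throws IOException {
        Files.createFile(file); // throws FileAlreadyExistsException if it already exists
        // chmod after creation, since the mode requested at creation time is masked by the umask.
        Files.setPosixFilePermissions(file, PosixFilePermissions.fromString("rwxrwxrwx"));
        // Runs only on normal JVM shutdown; a crash or SIGKILL leaves the file behind.
        file.toFile().deleteOnExit();
        return file;
    }
}
```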
>>>>
>>>>
>>>> Thank you,
>>>>
>>>> Misha
>>>>
>>>>> Bob.
>>>>>
>>>>>> If this works, I will have to add a unique number to the file
>>>>>> name, perhaps the PID of the child process.
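One way to get a per-run file name is sketched below; SignalNames.uniqueName is a hypothetical helper. The assumption here is that the name is generated by the test driver on the host and passed to the container (for example as a program argument), because a JVM started as a container's entry point often runs as PID 1 inside its PID namespace, so a PID taken there would not tell runs apart. The random UUID guards against collisions if several hosts share the directory.

```java
import java.util.UUID;

final class SignalNames {
    /**
     * Returns "PREFIX-PID-UUID": the host-side driver's PID (handy when
     * debugging) plus a random UUID so concurrent runs do not collide.
     */
    static String uniqueName(String prefix) {
        long pid = ProcessHandle.current().pid(); // PID of the host-side test driver
        return prefix + "-" + pid + "-" + UUID.randomUUID();
    }
}
```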
>>>>>>
>>>>>> I will try this, and let you know how it works.
>>>>>>
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Misha
>>>>>>
>>>>>> On 8/13/19 6:34 AM, Bob Vandette wrote:
>>>>>>> Sorry, I just looked at the webrev and you are trying the 
>>>>>>> approach I suggested.  I thought you
>>>>>>> were trying to use file change notification.
>>>>>>>
>>>>>>> Where does the workdir get created?  Does it have 777 permissions?
>>>>>>>
>>>>>>> Bob.
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 13, 2019, at 9:29 AM, Bob Vandette 
>>>>>>>> <bob.vandette at oracle.com> wrote:
>>>>>>>>
>>>>>>>> What if you just poll for the creation of the file, waiting a
>>>>>>>> short time between polls, with a maximum timeout?
>>>>>>>>
>>>>>>>> Bob.
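The polling suggestion can be sketched as a bounded wait: check for the file, sleep a short interval, and give up after a maximum timeout. FilePoller.waitForFile and its parameters are hypothetical names. System.nanoTime() is used for the deadline so that wall-clock adjustments do not affect it; the wait may overshoot the timeout by at most one interval.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

final class FilePoller {
    /** Returns true as soon as 'file' exists, or false once 'timeout' has elapsed. */
    static boolean waitForFile(Path file, Duration timeout, Duration interval)
            throws InterruptedException {
        // nanoTime is monotonic, so clock changes cannot stretch or cut the wait.
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (Files.exists(file)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```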
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Aug 12, 2019, at 8:22 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>
>>>>>>>>> Unfortunately, this approach does not seem to work on many of 
>>>>>>>>> our test cluster machines. The creation of a "signal" file 
>>>>>>>>> results in "PermissionDenied".
>>>>>>>>>
>>>>>>>>> A possible reason is the SELinux configuration, or some
>>>>>>>>> other permission-related setting. The container tries to create
>>>>>>>>> a new file on a volume mounted from the host system, but the host
>>>>>>>>> system denies it. I will look a bit deeper into this, but I
>>>>>>>>> think this type of issue could be encountered on any automated
>>>>>>>>> test system. Hence, we may have to abandon this approach.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Misha
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8/12/19 3:59 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>> Here is an updated webrev: 
>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.01/
>>>>>>>>>>
>>>>>>>>>> I am using a simple file-based mechanism to communicate 
>>>>>>>>>> between the processes. The "EventGeneratorLoop" process 
>>>>>>>>>> creates a specific "signal" file on a shared mounted volume, 
>>>>>>>>>> while the main test process waits for the file to exist
>>>>>>>>>> before running the test cases.
>>>>>>>>>>
>>>>>>>>>> Passes on a Linux-x64 Docker-enabled host. Testing in the test
>>>>>>>>>> cluster is in progress.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>>
>>>>>>>>>> Misha
>>>>>>>>>>
>>>>>>>>>> On 8/7/19 5:11 PM, David Holmes wrote:
>>>>>>>>>>> On 8/08/2019 9:04 am, Mikhailo Seledtsov wrote:
>>>>>>>>>>>> Hi Severin, Bob,
>>>>>>>>>>>>
>>>>>>>>>>>>    Thank you for reviewing the code.
>>>>>>>>>>>>
>>>>>>>>>>>> On 8/7/19, 11:38 AM, Bob Vandette wrote:
>>>>>>>>>>>>> Can’t you come up with a better way of synchronizing the 
>>>>>>>>>>>>> test by possibly writing a
>>>>>>>>>>>>> file and waiting for it to exist with a timeout?
>>>>>>>>>>>> I will try out this approach.
>>>>>>>>>>> This seems like a fundamental problem with jcmd - so cc'ing 
>>>>>>>>>>> serviceability-dev.
>>>>>>>>>>>
>>>>>>>>>>> But I'm pretty sure they recently addressed a similar issue 
>>>>>>>>>>> with the premature sending of the attach signal?
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Misha
>>>>>>>>>>>>> Isn’t there a shared volume between the two
>>>>>>>>>>>>> processes?
>>>>>>>>>>>>>
>>>>>>>>>>>>> We’ve been fighting test reliability for a while now.  I 
>>>>>>>>>>>>> can only hope we’re getting
>>>>>>>>>>>>> to the end.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bob.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Aug 7, 2019, at 2:18 PM, Severin 
>>>>>>>>>>>>>> Gehwolf<sgehwolf at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Misha,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 2019-08-06 at 20:17 -0700, 
>>>>>>>>>>>>>> mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>>>>> Please review this change, which fixes the container test
>>>>>>>>>>>>>>> TestJcmdWithSideCar.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My investigation indicated that the root cause of this
>>>>>>>>>>>>>>> failure is that 'jcmd -l' shows 'Unknown' for the class name
>>>>>>>>>>>>>>> because the main class has not been loaded yet.
>>>>>>>>>>>>>>> The target test JVM has started and is initializing, but has
>>>>>>>>>>>>>>> not yet loaded the main test class.
>>>>>>>>>>>>>> That's what I've found too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The proposed solution is to try 'jcmd -l' several times, 
>>>>>>>>>>>>>>> with a short
>>>>>>>>>>>>>>> sleep in between.
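The retry approach proposed here can be sketched generically: run an action, stop as soon as its output contains the expected main class name, otherwise sleep and try again, up to a fixed number of attempts. Retry.untilContains is a hypothetical helper. The real test runs jcmd from a sidecar container, so the local jcmdList() helper below is only illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

final class Retry {
    /**
     * Calls 'action' up to maxAttempts times, sleeping sleepMillis between attempts.
     * Returns the first result that contains 'expected', or null if none did.
     */
    static String untilContains(Supplier<String> action, String expected,
                                int maxAttempts, long sleepMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String out = action.get();
            if (out.contains(expected)) {
                return out;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis); // give the target JVM time to load its main class
            }
        }
        return null;
    }

    /** Illustrative only: runs a local 'jcmd -l' and returns its combined output. */
    static String jcmdList() {
        try {
            Process p = new ProcessBuilder("jcmd", "-l").redirectErrorStream(true).start();
            String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            p.waitFor();
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

For example, Retry.untilContains(Retry::jcmdList, "EventGeneratorLoop", 5, 1_000) returns the first listing that names the main class, or null after five attempts.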
>>>>>>>>>>>>>> Thread.sleep() isn't great, but I'm not sure there is an 
>>>>>>>>>>>>>> alternative.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, I have commented out testCase02() due to another bug,
>>>>>>>>>>>>>>> "JDK-8228850: jhsdb jinfo fails with ClassCastException:
>>>>>>>>>>>>>>> s.j.h.oops.TypeArray cannot be cast to s.j.h.oops.Instance",
>>>>>>>>>>>>>>> which is not a test bug. IMO, it is better to run the test and
>>>>>>>>>>>>>>> skip a sub-case than to skip the entire test.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8228960
>>>>>>>>>>>>>>>      Webrev: 
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.00/
>>>>>>>>>>>>>> Looks OK to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Severin
>>>>>>>>>>>>>>


More information about the serviceability-dev mailing list