RFR(S): 8228960: [TESTBUG] containers/docker/TestJcmdWithSideCar.java: jcmd reports main class as 'Unknown'

David Holmes david.holmes at oracle.com
Mon Aug 26 07:57:46 UTC 2019


Hi Misha,

On 24/08/2019 3:21 am, mikhailo.seledtsov at oracle.com wrote:
> Finally got some time to work on this issue.
> Since I have encountered problem using files for passing messages 
> between a container and a test driver (due to permissions), I looked for 
> alternative solutions. I am using the output of a container process to 
> signal when the main method has started, and it works. This simplifies 
> things quite a bit as well.
> 
> Normally, we use OutputAnalyzer test utility to collect the whole output 
> once the process has completed, and then analyze the resulting output 
> for "contains some string", match, etc. However, testutils/ProcessTools 
> provides an API to consume the output as it is produced. I am using this 
> API to detect when the main() method of the container has started.

That seems reasonable. Do we want to make the following change to 
minimise unneeded output processing:

          private Consumer<String> outputConsumer = s -> {
!            if (!mainMethodStarted && 
s.contains(EventGeneratorLoop.MAIN_METHOD_STARTED)) {
                  System.out.println("MainContainer: setting 
mainMethodStarted");
                  mainMethodStarted = true;
              }
          };

> Updated webrev:
>      http://cr.openjdk.java.net/~mseledtsov/8228960.02/

Otherwise looks okay. Hopefully those other test cases will be enabled 
in the not too distant future.

Thanks,
David
-----

> 
> Testing:
> 
>    Ran the test on Linux-x64, various multiple nodes in a test cluster 
> 50 times - All PASS
> 
> 
> Thank you,
> 
> Misha
> 
> On 8/13/19 2:05 PM, Bob Vandette wrote:
>>
>>> On Aug 13, 2019, at 3:28 PM, mikhailo.seledtsov at oracle.com wrote:
>>>
>>>
>>> On 8/13/19 12:06 PM, Bob Vandette wrote:
>>>>> On Aug 13, 2019, at 2:57 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>
>>>>> Hi Bob,
>>>>>
>>>>>    The workdir (JTwork/scratch) is created with the "test user" 
>>>>> permissions. Let me try to place the "signal" file in /tmp instead, 
>>>>> since /tmp should normally have a 777 permission on Linux.
>>>> Aren’t you creating a file inside a docker container and then 
>>>> checking for its existence outside of the container?
>>> Correct
>>>> Isn’t the root user running inside the container?
>>> By default it is. But it still fails to create a file, for some 
>>> reason. Can be related to selinux settings (for instance, see this 
>>> article: 
>>> https://stackoverflow.com/questions/24288616/permission-denied-on-accessing-host-directory-in-docker/31334443), 
>>> I can not change those.
>> Is your JTWork/scratch on an NFS mounted file system?  If this is the 
>> case then the problem is that root is equivalent to nobody on
>> mounted file systems and can’t create files unless the directory has 
>> 777 permissions.  I just confirmed this.  You’d have to either run
>> the container test as test-user or change the scratch directory 
>> permission.
>>
>> Bob.
>>
>>> My hope is that /tmp is configured to be accessed by a container 
>>> engine as a general purpose directory, hence I was thinking to try it 
>>> out.
>>>
>>>> Both processes don’t see the same /tmp right?   So that shouldn’t help.
>>> In my next experiment, I will map a /tmp from host to be a /host-tmp 
>>> inside the container (--volume /tmp:/host-tmp), then write a signal 
>>> file to /host-tmp.
>>>> If scratch has 777 permissions, anyone can create a file.
>>> scratch has  "rwxr-xr-x"
>>>> You have to be careful that you can clean up the
>>>> file from outside the container.  I’d make sure to create it with 777.
>>> I do use deleteOnExit(), so it should work (unless the JVM crashes). 
>>> I guess I could add extra layer of safety here, and set the 
>>> permissions to 777. Thank you for advice.
>>>
>>>
>>> Thank you,
>>>
>>> Misha
>>>
>>>> Bob.
>>>>
>>>>> If this works, I will have to add some unique number to the file 
>>>>> name, perhaps a PID of a child process.
>>>>>
>>>>> I will try this, and let you know how it works.
>>>>>
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Misha
>>>>>
>>>>> On 8/13/19 6:34 AM, Bob Vandette wrote:
>>>>>> Sorry, I just looked at the webrev and you are trying the approach 
>>>>>> I suggested.  I thought you
>>>>>> were trying to use file change notification.
>>>>>>
>>>>>> Where does the workdir get created?  Does it have 777 permissions?
>>>>>>
>>>>>> Bob.
>>>>>>
>>>>>>
>>>>>>> On Aug 13, 2019, at 9:29 AM, Bob Vandette 
>>>>>>> <bob.vandette at oracle.com> wrote:
>>>>>>>
>>>>>>> What if you just poll for the creation of the file waiting some 
>>>>>>> small amount of time between polling with a maximum timeout.
>>>>>>>
>>>>>>> Bob.
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 12, 2019, at 8:22 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>
>>>>>>>> Unfortunately, this approach does not seem to work on many of 
>>>>>>>> our test cluster machines. The creation of a "signal" file 
>>>>>>>> results in "PermissionDenied".
>>>>>>>>
>>>>>>>> The possible reason is the selinux configuration, or some other 
>>>>>>>> permission related stuff. The container tries to create a new 
>>>>>>>> file on a mounted volume on a host system, but host system 
>>>>>>>> denies it. I will look a bit deeper into this, but I think this 
>>>>>>>> type of issue can be encountered on any automated test system. 
>>>>>>>> Hence, we may have to abandon this approach.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Misha
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/12/19 3:59 PM, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>> Here is an updated webrev: 
>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.01/
>>>>>>>>>
>>>>>>>>> I am using a simple file-based mechanism to communicate between 
>>>>>>>>> the processes. The "EventGeneratorLoop" process creates a 
>>>>>>>>> specific "signal" file on a shared mounted volume, while the 
>>>>>>>>> main test process waits  for the file to exist before running 
>>>>>>>>> the test cases.
>>>>>>>>>
>>>>>>>>> Passes on Linux-x64 Docker-enabled host. Testing in the test 
>>>>>>>>> cluster is in progress.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Misha
>>>>>>>>>
>>>>>>>>> On 8/7/19 5:11 PM, David Holmes wrote:
>>>>>>>>>> On 8/08/2019 9:04 am, Mikhailo Seledtsov wrote:
>>>>>>>>>>> Hi Severin, Bob,
>>>>>>>>>>>
>>>>>>>>>>>    Thank you for reviewing the code.
>>>>>>>>>>>
>>>>>>>>>>> On 8/7/19, 11:38 AM, Bob Vandette wrote:
>>>>>>>>>>>> Can’t you come up with a better way of synchronizing the 
>>>>>>>>>>>> test by possibly writing a
>>>>>>>>>>>> file and waiting for it to exist with a timeout?
>>>>>>>>>>> I will try out this approach.
>>>>>>>>>> This seems like a fundamental problem with jcmd - so cc'ing 
>>>>>>>>>> serviceability-dev.
>>>>>>>>>>
>>>>>>>>>> But I'm pretty sure they recently addressed a similar issue 
>>>>>>>>>> with the premature sending of the attach signal?
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Misha
>>>>>>>>>>>> Isn’t there a shared volume between the two
>>>>>>>>>>>> processes?
>>>>>>>>>>>>
>>>>>>>>>>>> We’ve been fighting test reliability for a while now.  I can 
>>>>>>>>>>>> only hope we’re getting
>>>>>>>>>>>> to the end.
>>>>>>>>>>>>
>>>>>>>>>>>> Bob.
>>>>>>>>>>>>
>>>>>>>>>>>>> On Aug 7, 2019, at 2:18 PM, Severin 
>>>>>>>>>>>>> Gehwolf<sgehwolf at redhat.com>  wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Misha,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, 2019-08-06 at 20:17 -0700, 
>>>>>>>>>>>>> mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>>>> Please review this change that fixes a container test 
>>>>>>>>>>>>>> TestJcmdWithSideCar.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My investigation indicated that a root cause for this 
>>>>>>>>>>>>>> failure is:
>>>>>>>>>>>>>> JCMD -l shows 'Unknown' for class name because the main 
>>>>>>>>>>>>>> class has not
>>>>>>>>>>>>>> been loaded yet.
>>>>>>>>>>>>>> The target test JVM has started, it is initializing, but 
>>>>>>>>>>>>>> has not loaded
>>>>>>>>>>>>>> the main test class.
>>>>>>>>>>>>> That's what I've found too.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The proposed solution is to try 'jcmd -l' several times, 
>>>>>>>>>>>>>> with a short
>>>>>>>>>>>>>> sleep in between.
>>>>>>>>>>>>> Thread.sleep() isn't great, but I'm not sure there is an 
>>>>>>>>>>>>> alternative.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also I have commented out the testCase02() due to another 
>>>>>>>>>>>>>> bug:
>>>>>>>>>>>>>> "JDK-8228850: jhsdb jinfo fails with ClassCastException:
>>>>>>>>>>>>>> s.j.h.oops.TypeArray cannot be cast to s.j.h.oops.Instance",
>>>>>>>>>>>>>> which is not a test bug. IMO, it is better to run the test 
>>>>>>>>>>>>>> and skip a
>>>>>>>>>>>>>> sub-case than to skip the entire test.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8228960
>>>>>>>>>>>>>>      Webrev: 
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.00/
>>>>>>>>>>>>> Looks OK to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Severin
>>>>>>>>>>>>>


More information about the serviceability-dev mailing list