RFR: 8193710 - jcmd -l and jps commands do not list Java processes running in Docker containers
Bob Vandette
bob.vandette at oracle.com
Fri Jan 12 15:02:28 UTC 2018
Looping in core-libs.
The following fix alters the module-info for java.base. Can someone from the core-libs comment
on these changes?
I’d also like to remove the two PerfDataFile.getFile methods. Since these are in jdk.internal.jvmstat, can
I just pull them, or do they need to go through deprecation/CSR?
Thanks,
Bob.
> On Jan 12, 2018, at 8:59 AM, Bob Vandette <bob.vandette at oracle.com> wrote:
>
>
>
>> On Jan 11, 2018, at 9:44 PM, David Holmes <david.holmes at oracle.com> wrote:
>>
>> Hi Bob,
>>
>> Some initial discussion and commentary. I'm a bit concerned by how much accommodating docker penalises code that is not using containers.
>
> True but querying the number of Java processes is at least not in a performance critical area.
>
>>
>> On 12/01/2018 4:41 AM, Bob Vandette wrote:
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8193710
>>> Webrev:
>>> http://cr.openjdk.java.net/~bobv/8193710/webrev.00/
>>> Summary:
>>> This changeset enables the ability to use jcmd and jps running on a Host to
>>> list the java processes that are running in docker (cgroup based) containers.
>>
>> Can you clarify this for me please. I assume we are still only listing processes for the current user - correct?
>
> No, it will report any Java process that the current user has permission to see (i.e. any process whose hsperfdata* file is readable).
> If you run as root, you see all processes.
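The visibility rule Bob describes above can be sketched roughly as follows. This is a simplification for illustration, not the actual jps implementation: a JVM is discoverable when its perf data file under /tmp/hsperfdata_<user>/ is readable by the current user.

```java
import java.io.File;

// Rough sketch (not the real jps code): enumerate JVMs visible to the
// current user by scanning /tmp/hsperfdata_<user> directories and
// keeping only the perf data files we can actually read.
public class VisibleVms {
    static boolean isPerfDataDir(String name) {
        return name.startsWith("hsperfdata_");
    }

    public static void main(String[] args) {
        File[] dirs = new File("/tmp").listFiles(
            f -> f.isDirectory() && isPerfDataDir(f.getName()));
        if (dirs == null) return;
        for (File dir : dirs) {
            File[] vms = dir.listFiles(File::canRead);
            if (vms == null) continue;   // directory itself not readable by us
            for (File vm : vms) {
                // file name is the pid (local VM id) of the JVM
                System.out.println(vm.getName() + "  (" + dir.getName() + ")");
            }
        }
    }
}
```

Running as root, every hsperfdata directory is readable, which is why root sees all processes.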
>
>> So the issue is if I am running multiple JVMs and also running a Docker process that itself contains JVMs then jps outside of Docker won't show the JVMs that are inside Docker.
>
> Yes, this is what is being fixed.
>
>> And presumably the opposite is also true?
>>
>
> Yes, this is also true but is not being fixed by this change. I am only fixing the case where you run jps on a host and
> want to see all java processes running (in and out of containers).
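The host-side case being fixed can be illustrated with a small sketch. The helper below is hypothetical (the names are illustrative, not taken from the webrev): from the host, a container JVM's perf data file is reachable through /proc/<hostPid>/root, which exposes that process's mount namespace, including its private /tmp; the file itself is named after the namespace-local pid.

```java
import java.io.File;

public class NamespacedPerfData {
    // Hypothetical helper: build the host-visible path to a container
    // JVM's hsperfdata file. /proc/<hostPid>/root is a view into the
    // target's mount namespace, so its /tmp appears under that prefix.
    static File perfDataFile(int hostPid, String user, int nsPid) {
        return new File("/proc/" + hostPid + "/root/tmp/hsperfdata_" + user,
                        Integer.toString(nsPid));
    }

    public static void main(String[] args) {
        // -> /proc/12345/root/tmp/hsperfdata_bob/1
        System.out.println(perfDataFile(12345, "bob", 1));
    }
}
```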
>
>
>> If we change jps to find all JVMs on the host for the current user, whether in a container or not (and whether jps is run from in a container or not?) are the process id's that get returned directly useable from where jps was run?
>
> Given the above constraints, yes.
>
> The results from jps can be passed to jcmd to query things like VM.version.
>
>>
>>> I’ve tested this change by examining processes running as root on both host and in
>>> docker containers as well as under my userid using “jps and jcmd -l”.
>>> I’ve also tested the getFile functions with a small example program that I wrote.
>>> From what I can tell, these two getFile functions are not used in the JDK sources
>>> and in fact the PerfDataFile.getFile(int lvmid) method doesn’t appear to have ever worked!
>>> I’d really like to remove these two methods.
>>
>> If they are not used by anything then by all means drop them.
> I saw some Javadocs on the web documenting these methods, leading me to believe that we would need
> to deprecate these even though they are internal APIs. I’ll ask core-libs.
>
> http://openjdk.java.net/groups/serviceability/jvmstat/sun/jvmstat/perfdata/monitor/protocol/local/PerfDataFile.html#getFile(int)
>
>>
>>> 140 candidate = new File(f.getName(), name); <<——— This line drops the /tmp directory from the path.
>>> Here are some implementation details that I’ve added to one of the Unix
>>> specific source files (src/java.base/unix/classes/jdk/internal/vm/VMSupportImpl.java)
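The File behavior behind the bug Bob flags on line 140 can be demonstrated in isolation (the directory name here is just an example):

```java
import java.io.File;

public class GetNameVsGetPath {
    public static void main(String[] args) {
        File f = new File("/tmp/hsperfdata_bob");
        // File.getName() returns only the last path component, so using
        // it as a parent directory silently drops /tmp:
        System.out.println(new File(f.getName(), "1234")); // hsperfdata_bob/1234
        // File.getPath() keeps the whole path:
        System.out.println(new File(f.getPath(), "1234")); // /tmp/hsperfdata_bob/1234
    }
}
```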
>>
>> I have an issue with this. This is not "unix" (ie anything other than windows) support code, it is Linux support code. I'm also unclear why this needs to be lifted out of the PerfMemory subsystem.
> You are correct, I’ll move this into a linux specific directory. I factored these functions in order to isolate the cgroup specific functionality
> in an OS specific tree.
>
>>
>> And if you're going to mess with the jdk.internal module then I think core-libs folk will want to know before hand.
>
> Will do.
>
>>
>> Other comments ...
>>
>> src/hotspot/os/linux/perfMemory_linux.cpp
>>
>> I'm concerned about the MAXPATHLEN stack buffers and the extra overhead being added to the normal non-container path.
>
> I believe the impacted paths are only used when attaching to the VM and should not impact normal startup.
>
>> We've had issues in the past where a newly added stack buffer caused StackOverflowError. We don't need a 4K buffer because os::get_temp_directory() just returns "/tmp" and we're dealing with all Linux code here. I don't like making assumptions like this but ...
>>
>> #define TMP_BUFFER_LEN (4 + 21)
>> static char* get_user_tmp_dir(const char* user, int vmid, int nspid) {
>>   char buffer[TMP_BUFFER_LEN];
>>   const char* tmpdir = os::get_temp_directory();
>>   assert(strlen(tmpdir) == 4, "No longer using /tmp - update buffer size");
>>   if (nspid != -1) {
>>     jio_snprintf(buffer, TMP_BUFFER_LEN, "/proc/%d/root%s", vmid, tmpdir);
>>     tmpdir = buffer;
>>   }
>>   ...
>>
>>
>> 657 char fname[64];
>>
>> Maximum length of "/proc/%d/status" is 23.
>
> Ok, I’ll reduce the buffer sizes I use.
>
> Thanks,
> Bob.
>
>
>>
>> Thanks,
>> David
>