Deterministic naming of subclasses of `java/lang/reflect/Proxy`

Chen Liang liangchenblue at gmail.com
Wed May 22 19:37:16 UTC 2024


Hi Aman,
Even though the specification says "not in any particular order,"
`getInterfaces` and `getMethods` actually return ordered arrays, following
the order in which the interfaces and methods are declared in their class
files.

I believe you are decompiling the proxy classes generated by an older
version of the JDK; for example, back in JDK 8, the proxy methods were not
ordered because they were tracked in a HashMap:
https://github.com/openjdk/jdk8u/blob/6b53212ef78ad50f9eede829c5ff87cadcdb434b/jdk/src/share/classes/sun/misc/ProxyGenerator.java#L405
This is no longer the case:
https://github.com/openjdk/jdk/blob/d59c12fe1041a1f61f68408241a9aa4d96ac4fd2/src/java.base/share/classes/java/lang/reflect/ProxyGenerator.java#L241
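
To check this on a given JDK, a small experiment along the following lines
may help. It is only a sketch: the `Command` interface is a hypothetical
stand-in (not picocli's), and the class and method names are made up for
illustration. It prints the order reported by `getMethods` and the `Method`
fields of the generated proxy class, so the output of two runs can be
compared.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class ProxyOrderDemo {
    /** Hypothetical interface, used only for this illustration. */
    interface Command {
        String name();
        boolean helpCommand();
        String[] aliases();
    }

    /** Method names in the order getMethods() reports them on this JVM. */
    static List<String> reportedMethodOrder(Class<?> iface) {
        List<String> names = new ArrayList<>();
        for (Method m : iface.getMethods()) {
            names.add(m.getName());
        }
        return names;
    }

    /** Creates a proxy for the interface and returns the generated class. */
    static Class<?> proxyClassFor(Class<?> iface) {
        InvocationHandler handler = (proxy, method, args) -> null; // never invoked here
        Object instance = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return instance.getClass();
    }

    /** Names of the Method-typed fields declared by the proxy class. */
    static List<String> methodFieldNames(Class<?> proxyClass) {
        List<String> names = new ArrayList<>();
        for (Field f : proxyClass.getDeclaredFields()) {
            if (f.getType() == Method.class) {
                names.add(f.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("getMethods order: " + reportedMethodOrder(Command.class));
        Class<?> proxyClass = proxyClassFor(Command.class);
        System.out.println(proxyClass.getName() + " fields: " + methodFieldNames(proxyClass));
    }
}
```

Running this twice on the same JDK and diffing the output shows whether the
field assignment is stable there. Which field maps to which method still
requires decompiling the class, as in your example.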

- Chen

On Wed, May 22, 2024 at 1:19 PM Aman Sharma <amansha at kth.se> wrote:

> Hi,
>
>
> Another thing I wanted to look into in this thread was the order of fields
> in the generated Proxy classes. Their names are also based on a number. The
> same proxy class can have a different order of `Method` fields across
> executions, and the methods can be mapped to different field names.
>
>
> For example, consider the proxy class based on `picocli.CommandLine
> <https://github.com/remkop/picocli/blob/da98db63d1b516141b7485881b0dcddfd082dbc8/src/main/java/picocli/CommandLine.java#L4541>`
> in two different executions.
>
> // fields and methods are truncated for brevity
> public final class $Proxy9 extends Proxy implements CommandLine.Command {
>     private static Method m1;
>     private static Method m32;
>     private static Method m21;
>     private static Method m43;
>     private static Method m36;
>     private static Method m27;
>
>     public final boolean helpCommand() throws  {
>         try {
>             return (Boolean)super.h.invoke(this, m32, (Object[])null);
>         } catch (RuntimeException | Error var2) {
>             throw var2;
>         } catch (Throwable var3) {
>             throw new UndeclaredThrowableException(var3);
>         }
>      }
>
> // fields and methods are truncated for brevity
> public final class $Proxy13 extends Proxy implements CommandLine.Command {
>     private static Method m1;
>     private static Method m29;
>     private static Method m16;
>     private static Method m40;
>     private static Method m38;
>     private static Method m12;
>
>     public final boolean helpCommand() throws  {
>         try {
>             return (Boolean)super.h.invoke(this, m29, (Object[])null);
>         } catch (RuntimeException | Error var2) {
>             throw var2;
>         } catch (Throwable var3) {
>             throw new UndeclaredThrowableException(var3);
>         }
>     }
>
>
> Notice that the order of the fields differs and that the `helpCommand`
> method is mapped to a different field name in each class. This happens
> because the method array returned by `getMethods` is not sorted in any
> particular order
> <https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Class.java#L2178>
> when generating a proxy class. What dictates this order? And why is it
> not deterministic?
>
>
> Regards,
> Aman Sharma
>
> PhD Student
> KTH Royal Institute of Technology
> School of Electrical Engineering and Computer Science (EECS)
> Department of Theoretical Computer Science (TCS)
> <http://www.kth.se> <https://www.kth.se/profile/amansha>
> https://algomaster99.github.io/
> ------------------------------
> *From:* Aman Sharma
> *Sent:* Wednesday, May 22, 2024 4:12:19 PM
> *To:* Chen Liang
> *Cc:* David Holmes; core-libs-dev at openjdk.org; leyden-dev at openjdk.org
> *Subject:* Re: Deterministic naming of subclasses of
> `java/lang/reflect/Proxy`
>
>
> Hi Chen,
>
>
> That's clear. Thanks for letting me know. I guess then Project Leyden is
> working on naming the hidden classes deterministically to achieve their
> goals <https://openjdk.org/projects/leyden/notes/01-beginnings>.
>
>
> Regards,
> Aman Sharma
>
> ------------------------------
> *From:* Chen Liang <liangchenblue at gmail.com>
> *Sent:* Wednesday, May 22, 2024 1:35:46 PM
> *To:* Aman Sharma
> *Cc:* David Holmes; core-libs-dev at openjdk.org; leyden-dev at openjdk.org
> *Subject:* Re: Deterministic naming of subclasses of
> `java/lang/reflect/Proxy`
>
> Hi Aman,
> We have tried defining Proxy as hidden classes; a previous attempt was on
> hold because of issues with serialization. Otherwise, Proxies work great as
> hidden classes.
>
> Chen
>
> On Mon, May 20, 2024 at 7:56 AM Aman Sharma <amansha at kth.se> wrote:
>
>> Hi David,
>>
>>
>> > I would not expect any class load events.
>>
>>
>> I understand. I also haven't tried to intercept them, but I see only one
>> approach right now to include them in an allowlist: 1) statically look for
>> invocations of "Lookup::defineHiddenClass", and 2) instrument them so that
>> the first argument, "bytes", can be inspected. I haven't looked into
>> it much because I did not know much about it, and the fact that they are
>> hidden made it worse. 😅 Thanks for sharing the JEP!
>>
>>
>> > java.lang.reflect.Proxy could define hidden classes to act as the proxy
>> > classes which implement proxy interfaces; from JEP 371
>>
>>
>> It says that Proxy classes will also become hidden classes. Is it
>> underway? Right now one can intercept them, transform them, and include
>> them in an allowlist. What do you think of naming them independently of
>> AtomicLong so that a proxy class generated at runtime is easy to look up
>> in the allowlist?
>>
>>
>>
>> Regards,
>> Aman Sharma
>>
>> ------------------------------
>> *From:* David Holmes <david.holmes at oracle.com>
>> *Sent:* Monday, May 20, 2024 2:30:37 PM
>> *To:* Aman Sharma; liangchenblue at gmail.com
>> *Cc:* core-libs-dev at openjdk.org; leyden-dev at openjdk.org
>> *Subject:* Re: Deterministic naming of subclasses of
>> `java/lang/reflect/Proxy`
>>
>> On 20/05/2024 10:12 pm, Aman Sharma wrote:
>> > Hi David,
>> >
>> >
>> >  > How did you try to intercept them? Hidden classes are not "loaded" in
>> > the normal sense so won't trigger class load events.
>> >
>> >
>> > I could not intercept them. I only see them when I pass `-verbose:class`
>> > in the Java CLI.
>>
>> Yes that is why I asked how you tried to intercept them.
>>
>> >
>> > I also couldn't intercept them using the JVMTI Class File Load Hook
>> > <https://docs.oracle.com/en/java/javase/21/docs/specs/jvmti.html#ClassFileLoadHook>
>> > event. However, JEP 371 suggests that it should be possible to intercept
>> > them using the JVMTI Class Load
>> > <https://docs.oracle.com/en/java/javase/21/docs/specs/jvmti.html#ClassLoad>
>> > event, but I won't have the bytecode at this stage. So is there no way to
>> > get its bytecode before it is linked and initialized in the JVM?
>>
>> Hidden classes are not loaded so I would not expect any class load
>> events. However the exact nature of the JVMTI class load event is
>> unclear as it talks about "class or interface creation" which is neither
>> loading nor defining per se. But a class prepare event sounds like it
>> should be issued. However neither gives you access to the bytecode of the
>> class AFAICS.
>>
>> David
>> -----
>>
>>
>> >
>> > Regards,
>> > Aman Sharma
>> >
>> > ------------------------------------------------------------------------
>> > *From:* David Holmes <david.holmes at oracle.com>
>> > *Sent:* Monday, May 20, 2024 2:59:17 AM
>> > *To:* Aman Sharma; liangchenblue at gmail.com
>> > *Cc:* core-libs-dev at openjdk.org; leyden-dev at openjdk.org
>> > *Subject:* Re: Deterministic naming of subclasses of
>> > `java/lang/reflect/Proxy`
>> > On 17/05/2024 9:43 pm, Aman Sharma wrote:
>> >> Hi Chen,
>> >>
>> >>  > java.lang.invoke.LambdaForm$MH/0x00000200cc000400
>> >>
>> >> I do see this as output when I pass -verbose:class. However, based on my
>> >> experiments, I have seen that neither an agent passed via 'javaagent'
>> >> nor an agent passed via 'agentpath' is able to intercept this hidden
>> >> class.
>> >
>> > How did you try to intercept them? Hidden classes are not "loaded" in
>> > the normal sense so won't trigger class load events.
>> >
>> >> Also, I was a bit confused since I saw somewhere that the names of
>> >> hidden classes are null. But thanks for clarifying here.
>> >
>> > The JEP clearly defines the name format for hidden classes - though the
>> > final component is VM specific (and typically a hashcode).
>> >
>> > https://openjdk.org/jeps/371
>> >
>> > Cheers,
>> > David
>> > -----
>> >
>> >>  > avoid dynamic class loading
>> >>
>> >> I don't see dynamic class loading as a problem. I only mind some
>> >> unstable generation aspects of them which make it hard to verify them
>> >> based on an allowlist.
>> >>
>> >> For example, if this hidden class is generated with the exact same name
>> >> and the exact same bytecode during runtime as well, it would be easy to
>> >> verify it. However, I do see the names are based on some sort of memory
>> >> address, and I don't know what bytecode it has, so I don't have
>> >> suggestions to make them stable as of now. For Proxy classes, I feel it
>> >> can be addressed unless you disagree or someone involved in Project
>> >> Leyden does. :) Thank you for forwarding my mail there.
>> >>
>> >> Regards,
>> >> Aman Sharma
>> >>
>> >>
>> >>
>> ------------------------------------------------------------------------
>> >> *From:* liangchenblue at gmail.com <liangchenblue at gmail.com>
>> >> *Sent:* Friday, May 17, 2024 1:23:58 pm
>> >> *To:* Aman Sharma <amansha at kth.se>
>> >> *Cc:* core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>;
>> >> leyden-dev at openjdk.org <leyden-dev at openjdk.org>
>> >> *Subject:* Re: Deterministic naming of subclasses of
>> >> `java/lang/reflect/Proxy`
>> >>
>> >> Hi Aman,
>> >> For `-verbose:class`, it's a JVM argument instead of a program argument;
>> >> so when you run a java program like `java Main`, you should call it as
>> >> `java -verbose:class Main`.
>> >> When done correctly, you should see hidden class outputs like:
>> >> [0.032s][info][class,load] java.lang.invoke.LambdaForm$MH/0x00000200cc000400 source: __JVM_LookupDefineClass__
>> >> The loading of java.lang.invoke hidden classes requires your program to
>> >> use MethodHandle features, like a lambda.
>> >>
>> >> I think the problem you are exploring, avoiding dynamic class loading
>> >> and effectively closing the Java Platform for security, is also being
>> >> addressed by Project Leyden (as I shared initially). Thus, I am
>> >> forwarding this to leyden-dev instead, so you can see what approach
>> >> Leyden uses to accomplish the same goal as yours.
>> >>
>> >> Regards, Chen Liang
>> >>
>> >> On Fri, May 17, 2024 at 4:40 AM Aman Sharma <amansha at kth.se> wrote:
>> >>
>> >>     Hi Roger,
>> >>
>> >>
>> >>     Do you have ideas on how to intercept them? Neither my javaagent nor
>> >>     a JVMTI agent passed using the `agentpath` option is able to. They
>> >>     also do not seem to show up in logs when I pass `-verbose:class`.
>> >>
>> >>
>> >>     Also, what do you think of renaming the proxy classes as suggested
>> >>     below?
>> >>
>> >>
>> >>     Regards,
>> >>     Aman Sharma
>> >>
>> >>
>> >>
>> ------------------------------------------------------------------------
>> >>     *From:* core-libs-dev <core-libs-dev-retn at openjdk.org> on behalf
>> >>     of Roger Riggs <roger.riggs at oracle.com>
>> >>     *Sent:* Friday, May 17, 2024 4:57:46 AM
>> >>     *To:* core-libs-dev at openjdk.org
>> >>     *Subject:* Re: Deterministic naming of subclasses of
>> >>     `java/lang/reflect/Proxy`
>> >>     Hi Aman,
>> >>
>> >>     You may also run into hidden classes (JEP 371: Hidden Classes) that
>> >>     allow classes to be defined, at runtime, without names.
>> >>     It has been proposed to use them for generated proxies but that
>> >>     hasn't been implemented yet.
>> >>     There are benefits to having nameless classes: because they can't be
>> >>     referenced by name, only as a capability, they can be better
>> >>     encapsulated.
>> >>
>> >>     fyi, Roger Riggs
>> >>
>> >>
>> >>     On 5/16/24 8:11 AM, Aman Sharma wrote:
>> >>>
>> >>>     Hi,
>> >>>
>> >>>
>> >>>     Thanks for your response, Liang!
>> >>>
>> >>>
>> >>>     > I think you meant CVE-2021-42392 instead of 2022.
>> >>>
>> >>>
>> >>>     Sorry for the error. I indeed meant CVE-2021-42392
>> >>>     <https://nvd.nist.gov/vuln/detail/cve-2021-42392>.
>> >>>
>> >>>
>> >>>     > Leyden mainly avoids this unstable generation by performing a
>> >>>     training run to collect classes loaded
>> >>>
>> >>>
>> >>>     Would love to know the details of Project Leyden and how they
>> >>>     worked so far to focus on this goal. In our case, the training run
>> >>>     is the test suite.
>> >>>
>> >>>
>> >>>     > GeneratedConstructorAccessor is already retired by JEP 416 [2]
>> >>>     in Java 18
>> >>>
>> >>>
>> >>>     I did see them not appearing in my allowlist when I ran my study
>> >>>     subject (Apache PDFBox) with Java 21. Thanks for letting me know
>> >>>     about this JEP. I see they are re-implemented with method handles.
>> >>>
>> >>>
>> >>>     > How are you checking the classes?
>> >>>
>> >>>
>> >>>     To detect runtime-generated code, we have a javaagent that is hooked
>> >>>     statically to the test suite execution. It gives us all classes
>> >>>     that are loaded after the JVM and the javaagent are loaded. So
>> >>>     we only check the classes loaded for the purpose of running the
>> >>>     application. This is also why we did not choose -agentlib, as it
>> >>>     would also give the classes loaded while setting up the JVM and the
>> >>>     javaagent, and the user of our tool should only have to check the
>> >>>     classes they load.
>> >>>
>> >>>
>> >>>     Next, we have a `ClassFileTransformer` hook in the agent where we
>> >>>     produce the checksum using the bytecode, and we compare the
>> >>>     checksum with the one existing in the allowlist. The checksum
>> >>>     computation algorithm is the same for both steps. Let me describe
>> >>>     how I compute the checksum.
>> >>>
>> >>>
>> >>>      1. I get the CONSTANT_Class_info
>> >>>         <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.1>
>> >>>         entry corresponding to `this_class` and rewrite the
>> >>>         corresponding CONSTANT_Utf8_info
>> >>>         <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.7>
>> >>>         to a fixed String constant, say "foo".
>> >>>      2. Since the name of the class is used to refer to its
>> >>>         members (fields/methods), I get all CONSTANT_Fieldref_info
>> >>>         <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.2>
>> >>>         entries, and if the `class_index` of an entry corresponds to
>> >>>         the old `this_class`, we rewrite the UTF8 value of class_index
>> >>>         to the same constant "foo".
>> >>>      3. Next, since the names of the fields in Proxy classes are
>> >>>         also suffixed by numbers, for example, `private static Method
>> >>>         m4`, we rewrite the UTF8 value of name in the
>> >>>         CONSTANT_NameAndType_info
>> >>>         <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.6>.
>> >>>      4. These fields can also have a random order so we simply sort
>> >>>         the entire byte code using `Arrays.sort(byte[])` to eliminate
>> >>>         any differences due to ordering of fields/methods.
>> >>>      5. Simply sorting the byte array still had minute differences. I
>> >>>         could not understand why they existed even though values in
>> >>>         constant pool of the bytecode in allowlist and at runtime were
>> >>>         exactly the same after rewriting. The differences existed in
>> >>>         the bytes of the Code attribute of methods. I concluded that
>> >>>         the bytes stored some position information. To avoid this, I
>> >>>         created a subarray where I considered the bytes corresponding
>> >>>         to `CONSTANT_Utf8_info.bytes` only. Computing a checksum for
>> >>>         it resulted in the same checksums for both classfiles.
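
The steps quoted above could be approximated roughly as follows. This is a
sketch under assumptions, not the implementation used in sbom.exe: the class
name `Utf8Checksum`, the `m\d+` pattern for proxy field names, and the
replacement value "m" are all made up here, and it sorts the Utf8 strings
rather than the raw bytes. It parses only the constant pool, replaces the
class's own name with "foo", and hashes the sorted Utf8 entries with SHA-256.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Utf8Checksum {
    /** Fixed replacement for the class's own name, as in step 1. */
    private static final String FIXED_NAME = "foo";

    /** Returns a hex SHA-256 over the normalized, sorted CONSTANT_Utf8 entries. */
    static String checksum(byte[] classFile) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile));
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();             // minor_version
            in.readUnsignedShort();             // major_version
            int count = in.readUnsignedShort(); // constant_pool_count
            String[] utf8 = new String[count];
            int[] classNameIndex = new int[count];
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                    // Utf8
                    case 7: classNameIndex[i] = in.readUnsignedShort(); break; // Class
                    case 8: case 16: case 19: case 20: in.skipBytes(2); break;
                    case 15: in.skipBytes(3); break;                          // MethodHandle
                    case 3: case 4: case 9: case 10: case 11: case 12:
                    case 17: case 18: in.skipBytes(4); break;
                    case 5: case 6: in.skipBytes(8); i++; break;              // two slots
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            in.readUnsignedShort();             // access_flags
            String thisName = utf8[classNameIndex[in.readUnsignedShort()]];

            List<String> normalized = new ArrayList<>();
            for (String s : utf8) {
                if (s == null) {
                    continue;                   // slot does not hold a Utf8 entry
                }
                if (s.equals(thisName)) {
                    s = FIXED_NAME;             // step 1: hide the generated class name
                } else if (s.matches("m\\d+")) {
                    s = "m";                    // step 3 (assumed pattern): hide field numbering
                }
                normalized.add(s);
            }
            Collections.sort(normalized);       // make the result order-insensitive

            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            for (String s : normalized) {
                sha.update(s.getBytes(StandardCharsets.UTF_8));
                sha.update((byte) 0);           // separator between entries
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because only Utf8 entries are hashed, two classes whose `Code` attributes
differ but whose constants match would get the same checksum; that trade-off
is inherent in step 5.
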
>> >>>
>> >>>
>> >>>     Let's understand the whole approach with an example of a Proxy
>> >>>     class.
>> >>>
>> >>>     `
>> >>>     public final class $Proxy42 extends Proxy implements
>> >>>     org.apache.logging.log4j.core.config.plugins.Plugin {
>> >>>     `
>> >>>
>> >>>     This will go in the allowlist as "Proxy_Plugin: <SHA256 checksum>".
>> >>>
>> >>>     When the same class is intercepted at runtime, say "$Proxy10", we
>> >>>     look for "Proxy_Plugin" in the allowlist and since the checksum
>> >>>     algorithm is same in both cases, we get a match and let the class
>> >>>     load.
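
A key such as "Proxy_Plugin" could be derived from the proxy class itself,
independent of its numbered name. The sketch below is an assumption about how
that might look; `AllowlistKey` and `forProxyClass` are hypothetical names,
and simple interface names are used only to match the example above (fully
qualified names would avoid collisions).

```java
import java.lang.reflect.Proxy;
import java.util.StringJoiner;

final class AllowlistKey {
    /** Builds "Proxy_<Interface>_<Interface>..." from the proxy's interfaces. */
    static String forProxyClass(Class<?> proxyClass) {
        if (!Proxy.isProxyClass(proxyClass)) {
            throw new IllegalArgumentException(proxyClass + " is not a proxy class");
        }
        StringJoiner key = new StringJoiner("_", "Proxy_", "");
        // For proxy classes, getInterfaces() keeps the order given at creation.
        for (Class<?> iface : proxyClass.getInterfaces()) {
            key.add(iface.getSimpleName());
        }
        return key.toString();
    }

    public static void main(String[] args) {
        Object proxy = Proxy.newProxyInstance(
                AllowlistKey.class.getClassLoader(),
                new Class<?>[] {Runnable.class},
                (instance, method, params) -> null);
        // The runtime name varies (e.g. $Proxy0, $Proxy1); the key does not.
        System.out.println(proxy.getClass().getName() + " -> " + forProxyClass(proxy.getClass()));
    }
}
```
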
>> >>>
>> >>>     This approach has seemed to work well for Proxy classes and
>> >>>     GeneratedConstructorAccessor (which is removed as you said). I also looked
>> >>>     at the species generated by method handles. I did not notice any
>> >>>     modification in them. Their name generation seemed okay to me. If
>> >>>     some new Species are generated, it is of course detected since it
>> >>>     is not in the allowlist.
>> >>>
>> >>>     I have not looked into LambdaMetafactory because I did not
>> >>>     encounter it as a problem so far, but I am aware its name
>> >>>     generation is also unstable. I have run my approach on only a few
>> >>>     projects. And for hidden classes, I assume the agent
>> >>>     won't be able to intercept them, so detecting them would be really
>> >>>     hard.
>> >>>
>> >>>
>> >>>     Regards,
>> >>>     Aman Sharma
>> >>>
>> >>>
>> ------------------------------------------------------------------------
>> >>>     *From:* liangchenblue at gmail.com <liangchenblue at gmail.com>
>> >>>     *Sent:* Thursday, May 16, 2024 5:52:03 AM
>> >>>     *To:* Aman Sharma; core-libs-dev
>> >>>     *Cc:* Martin Monperrus
>> >>>     *Subject:* Re: Deterministic naming of subclasses of
>> >>>     `java/lang/reflect/Proxy`
>> >>>     Hi Aman,
>> >>>     I think you meant CVE-2021-42392 instead of 2022.
>> >>>
>> >>>     For your approach of an "allowlist" for Java runtime, project
>> >>>     Leyden is looking to generate a static image [1], that
>> >>>     > At run time it cannot load classes from outside the image, nor
>> >>>     can it create classes dynamically.
>> >>>     Leyden mainly avoids this unstable generation by performing a
>> >>>     training run to collect classes loaded and even object graphs; I
>> >>>     am not familiar with the details unfortunately.
>> >>>
>> >>>     Otherwise, the Proxy discussion belongs better to core-libs-dev,
>> >>>     as java.lang.reflect.Proxy is part of Java's core libraries. I am
>> >>>     sending this reply to core-libs-dev.
>> >>>
>> >>>     For your perceived problem that classes don't have unique names,
>> >>>     your description sounds dubious: GeneratedConstructorAccessor is
>> >>>     already retired by JEP 416 [2] in Java 18, and there are many
>> >>>     other cases in which JDK generates classes without stable names,
>> >>>     notoriously LambdaMetafactory (Gradle wished for cacheable
>> >>>     Lambdas); the same applies for the generated classes for
>> >>>     MethodHandle's LambdaForms (which carries implementation code for
>> >>>     LambdaForm). How are you checking the classes? It seems you are
>> >>>     not checking hidden classes. Proxy and Lambda classes are defined
>> >>>     by the caller's class loader, while LambdaForms are under JDK's
>> >>>     system class loader I think. We need to ensure you are correctly
>> >>>     finding all unstable classes before we can proceed.
>> >>>
>> >>>     [1]: https://openjdk.org/projects/leyden/notes/01-beginnings
>> >>>     [2]: https://openjdk.org/jeps/416
>> >>>
>> >>>     On Wed, May 15, 2024 at 7:00 PM Aman Sharma <amansha at kth.se> wrote:
>> >>>
>> >>>         Hi,
>> >>>
>> >>>
>> >>>         My name is Aman and I am a PhD student at KTH Royal Institute
>> >>>         of Technology, Stockholm, Sweden. I do research as part of the
>> >>>         CHAINS <https://chains.proj.kth.se/> project to strengthen the
>> >>>         software supply chain of multiple ecosystems. I particularly
>> >>>         focus on runtime integrity in Java. In this email, I want to
>> >>>         write about an issue I have discovered with /dynamic
>> >>>         generation of `java.lang.reflect.Proxy` classes/. I will
>> >>>         propose a solution and would love to hear feedback from
>> >>>         the community. Let me know if this is the correct mailing list
>> >>>         for such discussions. It seemed the most relevant from this
>> >>>         list <https://mail.openjdk.org/mailman/listinfo>.
>> >>>
>> >>>
>> >>>         *My research*
>> >>>
>> >>>
>> >>>         Java has features to load classes on the fly - it can either
>> >>>         download or generate a class at runtime. These features are
>> >>>         useful for the inner workings of the JDK. For example, implementing
>> >>>         annotations, reflective access, etc. However, these features
>> >>>         have also contributed to critical vulnerabilities in the past
>> >>>         - CVE-2021-44228  (log4shell), CVE-2022-33980, CVE-2022-42392.
>> >>>         All of these vulnerabilities have one thing in common - /a
>> >>>         class that was not known during build time was
>> >>>         downloaded/generated at runtime and loaded into JVM./
>> >>>
>> >>>
>> >>>         To defend against such vulnerabilities, we propose a solution
>> >>>         to /allowlist classes for runtime/. This allowlist will
>> >>>         contain an exhaustive list of classes that can be loaded by
>> >>>         the JVM and it will be enforced at runtime. We build this
>> >>>         allowlist from three sources:
>> >>>
>> >>>          1. All classes of all modules provided by the Java Standard
>> >>>             Library. We use ClassGraph
>> >>>             <https://github.com/classgraph/classgraph> to scan the JDK.
>> >>>          2. We can take the source code and all dependencies of an
>> >>>             application. We use a software bill of materials to get
>> >>>             all the data.
>> >>>          3. Finally, we run the test suite to include any runtime
>> >>>             downloaded/generated classes.
>> >>>
>> >>>         Such a list is able to prevent the above 3 CVEs because it
>> >>>         does not let the "unknown" bytecode to be loaded.
>> >>>
>> >>>         *Problem with generating such an allowlist*
>> >>>         The first two parts of the allowlist are easy to get. The
>> >>>         problem is with the third step, where we want to allowlist all
>> >>>         the classes that could be downloaded or generated. Upon
>> >>>         running the test suite and hooking into the classes it loads, we
>> >>>         observe that the list contains classes that are called
>> >>>         "com/sun/proxy/$Proxy2",
>> >>>         "jdk/internal/reflect/GeneratedConstructorAccessor3", among
>> >>>         many more. The purpose of these classes can be identified. The
>> >>>         proxy class is created to implement an annotation. The
>> >>>         accessor gives the JVM access to the constructor of a class.
>> >>>
>> >>>         When enforcing this allowlist at runtime, we see that the
>> >>>         bytecode content for "com/sun/proxy/$Proxy2" differs in the
>> >>>         allowlist and at runtime. In our case, we are experimenting
>> >>>         with pdfbox <https://github.com/apache/pdfbox>, so we created
>> >>>         the allowlist using its test suite. Then we enforced this
>> >>>         allowlist while running some of its subcommands. However,
>> >>>         there was some other proxy class, say "com/sun/proxy/$Proxy5",
>> >>>         at runtime that implemented the same interfaces and had the
>> >>>         same methods as "com/sun/proxy/$Proxy2" in the allowlist. They
>> >>>         only differed in the name of the class, the order of fields, and
>> >>>         the types for field references. This could happen because the
>> >>>         order in which classes are loaded is workload dependent, but it
>> >>>         causes problems when generating such an allowlist.
>> >>>
>> >>>         *Solution*
>> >>>
>> >>>
>> >>>         We propose that naming of subclasses of
>> >>>         "java/lang/reflect/Proxy" should not be dependent upon the
>> >>>         order of loading. In order to do so, two issues can be fixed:
>> >>>
>> >>>          1. The naming of the class should not be based on AtomicLong
>> >>>             <https://github.com/openjdk/jdk/blob/b687aa550837830b38f0f0faa69c353b1e85219c/src/java.base/share/classes/java/lang/reflect/Proxy.java#L531>.
>> >>>             Rather, it could be named based on the interfaces it
>> >>>             implements. I also wonder why AtomicLong was chosen in the
>> >>>             first place.
>> >>>          2. Methods of the interfaces must be in a particular order.
>> >>>             Right now, they are not sorted in any particular order
>> >>>             <https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Class.java#L2178>.
>> >>>
>> >>>
>> >>>         These fixes will make proxy class generation deterministic
>> >>>         with respect to order of loading and won't be flagged at
>> >>>         runtime since the test suite would already detect them.
>> >>>
>> >>>         I would love to hear from the community about these ideas. If
>> >>>         in agreement, I would be happy to produce a patch. I have
>> >>>         discovered this issue with subclasses of
>> >>>         GeneratedConstructorAccessor
>> >>>         <https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/reflect/ConstructorAccessor.java>
>> >>>         as well, and I imagine it will also apply to some other
>> >>>         runtime-generated classes. If you disagree, please let me know
>> >>>         as well. It helps with my research.
>> >>>
>> >>>         I also have PoCs for the above CVEs
>> >>>         <https://github.com/chains-project/exploits-for-sbom.exe> and
>> >>>         a proof-of-concept tool is being developed under the name
>> >>>         sbom.exe <https://github.com/chains-project/sbom.exe> in case
>> >>>         anyone wonders about the implementation. I would also be
>> >>>         happy to explain more.
>> >>>
>> >>>         Regards,
>> >>>         Aman Sharma
>> >>>
>> >>>
>> >>
>> >>
>>
>