Largest files in the JDK repo

Magnus Ihse Bursie magnus.ihse.bursie at oracle.com
Mon Oct 28 16:12:57 UTC 2024


On 2024-10-25 00:29, Jorn Vernee wrote:

> > You indicated that 1 is true. Is that the case for 2 and 3 as well?
>
> Yes, both 2 and 3 are true. (3 is in fact required, because the test 
> code uses the same stream of 'shapes' to do the actual calls).
>
> The program that generates this code is 
> test/jdk/java/foreign/CallGeneratorHelper.java [1]. It generates 5 
> files in total. It's a bit entangled with the current test code, but 
> nothing we can't separate out I think. I suppose the trickiest part is 
> that the actual test also needs access to the code when running.
>
Thanks for the information. I created 
https://bugs.openjdk.org/browse/JDK-8343155 for an effort to convert 
this to gensrc. I put the issue on infrastructure/build for now, to 
enable gensrc for native tests, but at some point involvement of someone 
who knows the FFM tests will be necessary.

/Magnus


> FWIW, we have other examples of test code that is generated by scripts 
> as well, such as 
> test/jdk/java/lang/invoke/VarHandles/generate-vh-tests.sh and various 
> scripts under test/jdk/java/nio/Buffer which invoke SPP. There are 
> probably more cases like that.
>
> Jorn
>
> [1]: 
> https://github.com/openjdk/jdk/blob/master/test/jdk/java/foreign/CallGeneratorHelper.java
>
> On 24-10-2024 22:28, Magnus Ihse Bursie wrote:
>>
>> On 2024-10-24 20:47, Jorn Vernee wrote:
>>
>>> WRT the two biggest files:
>>>
>>> 6.8M ./test/jdk/java/foreign/libTestUpcallStack.c
>>> 3.5M ./test/jdk/java/foreign/libTestDowncallStack.c
>>>
>>> These are mechanically generated C libraries featuring a lot of 
>>> different function shapes, for testing of FFM downcalls. The Java 
>>> code that is used to generate these C files could theoretically run 
>>> as part of the test as well, but the problem is that we would then 
>>> need to compile the generated sources into a native library.
>>>
>>> Currently the JDK build system will find and build all native 
>>> libraries needed for tests before any of the tests run, but maybe 
>>> it's possible to create a way for a test to request that a native 
>>> library be built on demand. Then we wouldn't need to pre-generate 
>>> these files and include them in the repo, and could instead generate 
>>> + compile them when the test runs. (This might also help cut down on 
>>> the build time of the test image, since you'd only need to compile 
>>> test libraries for the tests that actually run).
>>>
>> Unfortunately, requesting compilation of native code at test time is 
>> not trivial to support, and I don't think we even want to try 
>> doing that, for various reasons.
>>
>> However, generating source code just in time for compilation is 
>> commonplace in the JDK; we call it "gensrc" in the build system. We 
>> have not done so for tests so far, but it would not be horribly hard 
>> to add gensrc functionality to native tests as well.
>>
>> I'd say that there are three criteria that indicate we should start 
>> using a gensrc system for these tests:
>>
>> 1) They are generated by a Java tool
>>
>> 2) That tool runs rather quickly
>>
>> 3) Changing that tool, rather than changing the individual files, is 
>> the preferred way of updating this source code
>>
>> You indicated that 1 is true. Is that the case for 2 and 3 as well?
>>
>> /Magnus
>>
>>
>>> Jorn
>>>
>>> On 24-10-2024 13:04, Magnus Ihse Bursie wrote:
>>>>
>>>> I got intrigued by how https://bugs.openjdk.org/browse/JDK-8339507 
>>>> could integrate a 7 MB file without anybody noticing, so I 
>>>> started wondering how many other huge text files there are in our 
>>>> repo. (We are much more restrictive with binary files, even if they 
>>>> are small...)
>>>>
>>>> So I compiled a top 100 list, which basically ended up being all 
>>>> files larger than 400 kB. In total, these 100 files account for ca. 
>>>> 82 MB of data. I'm not saying that any of these files are wrong per 
>>>> se, but maybe some of the files on this list could provide a bit 
>>>> of food for thought. Further down is the complete top list, but it 
>>>> is a bit hard to get a grip on, so I sorted and grouped the result, 
>>>> since the large files are not randomly sprinkled throughout the 
>>>> code base. The grouped list does not contain test files. The huge 
>>>> test files are more numerous, but there are also (imho) more 
>>>> compelling reasons in general to allow for bigger files in testing. 
>>>> With that said, even some of the test files seem a bit excessive. 
>>>> (And one cannot help but wonder what kind of file 
>>>> src/java.base/share/data/unicodedata/NormalizationTest.txt really is.)
>>>>
>>>> Character sets and localization:
>>>> * make/data/charsetmapping
>>>> * make/data/cldr
>>>> * src/java.base/share/data/lsrdata/
>>>> * src/java.base/share/data/unicodedata
>>>> * src/java.base/share/classes/java/lang/Character.java
>>>> * src/java.base/share/classes/sun/nio/cs/GB18030.java
>>>> * src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java
>>>> * src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template
>>>> 3rd party source:
>>>> * src/jdk.incubator.vector/*/native/libjsvml/*.S
>>>> * src/java.base/share/native/libzip/zlib/crc32.h
>>>> * src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
>>>> Symbols from previous JDKs:
>>>> * src/jdk.compiler/share/data/symbols
>>>> Huge Hotspot files:
>>>> * src/hotspot/cpu/*/*.ad
>>>> * src/hotspot/cpu/x86/assembler_x86.cpp
>>>> * src/hotspot/share/prims/jvmti.xml
>>>> Other:
>>>> * src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
>>>> * src/java.base/share/classes/java/lang/invoke/MethodHandles.java
>>>> * src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java
>>>> And a binary file:
>>>> * src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
>>>>
>>>> And here is the complete top list:
>>>>
>>>> 6.8M ./test/jdk/java/foreign/libTestUpcallStack.c
>>>> 3.5M ./test/jdk/java/foreign/libTestDowncallStack.c
>>>> 2.7M ./test/jdk/com/sun/net/httpserver/docs/test1/largefile.txt
>>>> 2.6M ./src/java.base/share/data/unicodedata/NormalizationTest.txt
>>>> 2.3M ./test/jdk/sun/nio/cs/EUC_TW_OLD.java
>>>> 2.1M ./src/jdk.compiler/share/data/symbols/java.desktop-8.sym.txt
>>>> 2.0M ./src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
>>>> 2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.Corrigendum4.txt
>>>> 2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.txt
>>>> 1.9M ./src/java.base/share/data/unicodedata/UnicodeData.txt
>>>> 1.6M ./test/hotspot/jtreg/gc/TestBigObj.java
>>>> 1.5M ./test/jdk/java/foreign/libTestUpcall.c
>>>> 1.4M ./src/jdk.compiler/share/data/symbols/java.base-8.sym.txt
>>>> 1.2M ./test/jdk/java/lang/String/concat/ImplicitStringConcatShapes.java
>>>> 1.1M ./src/java.base/share/data/unicodedata/DerivedCoreProperties.txt
>>>> 952K ./test/hotspot/jtreg/compiler/c2/stemmer/words
>>>> 941K ./src/jdk.compiler/share/data/symbols/java.base-M.sym.txt
>>>> 928K ./make/data/charsetmapping/EUC_TW.map
>>>> 927K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/INDIFY_Test.java
>>>> 912K ./make/data/cldr/common/supplemental/likelySubtags.xml
>>>> 898K ./make/data/charsetmapping/MS936.map
>>>> 865K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template
>>>> 857K ./test/jdk/java/foreign/libTestDowncall.c
>>>> 843K ./test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java
>>>> 830K ./src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
>>>> 794K ./make/data/cldr/common/main/ru.xml
>>>> 774K ./test/jdk/sun/nio/cs/mapping/GB18030_2000.b2c
>>>> 774K ./test/jdk/sun/nio/cs/mapping/GB18030.b2c
>>>> 767K ./test/jdk/jdk/internal/math/ToDecimal/java.base/jdk/internal/math/DoubleToDecimalChecker.java
>>>> 752K ./make/data/cldr/common/main/uk.xml
>>>> 742K ./make/data/charsetmapping/Johab.map
>>>> 741K ./test/jdk/sun/nio/cs/mapping/Johab.b2c
>>>> 739K ./src/java.base/share/classes/sun/nio/cs/GB18030.java
>>>> 733K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_tan_linux_x86.S
>>>> 731K ./make/data/charsetmapping/MS950.map
>>>> 727K ./test/jdk/sun/nio/cs/mapping/MS950.b2c
>>>> 709K ./src/java.base/share/data/lsrdata/language-subtag-registry.txt
>>>> 698K ./make/data/charsetmapping/MS949.map
>>>> 695K ./test/jdk/sun/nio/cs/mapping/MS949.b2c
>>>> 655K ./src/hotspot/cpu/x86/assembler_x86.cpp
>>>> 647K ./test/jdk/java/lang/instrument/BigClass.java
>>>> 634K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_sin_linux_x86.S
>>>> 628K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_cos_linux_x86.S
>>>> 616K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_tan_windows_x86.S
>>>> 601K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java
>>>> 597K ./src/hotspot/share/prims/jvmti.xml
>>>> 597K ./test/jdk/sun/security/ec/SigGen-1.txt
>>>> 593K ./make/data/cldr/common/main/lt.xml
>>>> 582K ./make/data/cldr/common/main/cs.xml
>>>> 579K ./src/java.base/share/native/libzip/zlib/crc32.h
>>>> 577K ./make/data/cldr/common/main/sk.xml
>>>> 577K ./src/jdk.compiler/share/data/symbols/java.desktop-9.sym.txt
>>>> 572K ./test/jdk/javax/swing/text/html/parser/Parser/8078268/slowparse.html
>>>> 567K ./test/jdk/sun/nio/cs/OLD/IBM933_OLD.java
>>>> 539K ./test/jdk/sun/nio/cs/mapping/untested/gb18030_1.b2c
>>>> 536K ./test/micro/org/openjdk/bench/vm/gc/RawAllocationRate.java
>>>> 534K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_sin_windows_x86.S
>>>> 532K ./make/data/cldr/common/main/ff_Adlm.xml
>>>> 531K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_cos_windows_x86.S
>>>> 526K ./test/jdk/sun/nio/cs/mapping/EUC_TW.b2c
>>>> 524K ./src/jdk.compiler/share/data/symbols/java.desktop-B.sym.txt
>>>> 523K ./make/data/cldr/common/main/pl.xml
>>>> 520K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/stress/java/loopsAndThreads/INDIFY_Test.java
>>>> 518K ./make/data/cldr/common/main/sl.xml
>>>> 510K ./test/jdk/sun/nio/cs/OLD/IBM950_OLD.java
>>>> 509K ./make/data/cldr/common/main/mr.xml
>>>> 507K ./make/data/cldr/common/main/kn.xml
>>>> 505K ./test/jdk/sun/nio/cs/OLD/IBM948_OLD.java
>>>> 504K ./make/data/cldr/common/main/sr.xml
>>>> 503K ./test/jdk/sun/nio/cs/OLD/IBM937_OLD.java
>>>> 502K ./test/jdk/sun/net/www/protocol/jar/foo1.jar
>>>> 501K ./make/data/cldr/common/main/ta.xml
>>>> 496K ./test/jdk/sun/nio/cs/OLD/Johab_OLD.java
>>>> 490K ./test/jdk/sun/nio/cs/OLD/MS949_OLD.java
>>>> 489K ./test/hotspot/jtreg/vmTestbase/vm/jit/LongTransitions/LTTest.java
>>>> 485K ./test/hotspot/jtreg/vmTestbase/jit/FloatingPoint/FPCompare/TestFPBinop/TestFPBinop.gold
>>>> 485K ./src/hotspot/cpu/aarch64/aarch64.ad
>>>> 478K ./src/hotspot/cpu/ppc/ppc.ad
>>>> 467K ./make/data/cldr/common/main/gd.xml
>>>> 466K ./src/java.base/share/classes/java/lang/Character.java
>>>> 453K ./make/data/cldr/common/main/ar.xml
>>>> 452K ./test/jdk/sun/nio/cs/OLD/IBM949_OLD.java
>>>> 446K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_pow_linux_x86.S
>>>> 445K ./make/data/cldr/common/main/cy.xml
>>>> 443K ./make/data/cldr/common/main/ml.xml
>>>> 442K ./make/data/cldr/common/main/br.xml
>>>> 442K ./test/jdk/sun/nio/cs/OLD/MS950_OLD.java
>>>> 442K ./make/data/cldr/common/main/hr.xml
>>>> 441K ./src/hotspot/cpu/x86/x86_32.ad
>>>> 438K ./src/java.base/share/classes/java/lang/invoke/MethodHandles.java
>>>> 436K ./test/jaxp/javax/xml/jaxp/unittest/transform/msgAttach.xml
>>>> 433K ./make/data/cldr/common/main/el.xml
>>>> 432K ./src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java
>>>> 429K ./make/data/cldr/common/main/lv.xml
>>>> 428K ./make/data/cldr/common/main/fi.xml
>>>> 427K ./test/jdk/sun/nio/cs/OLD/GBK_OLD.java
>>>> 421K ./src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
>>>> 419K ./make/data/cldr/common/main/en.xml
>>>> 418K ./src/hotspot/cpu/x86/x86.ad
>>>> 416K ./make/data/cldr/common/main/sr_Latn.xml
>>>>
>>>> The list was compiled by running:
>>>>
>>>> find . -path ./.git -prune -o -type f -printf '%s %p\n' | sort -nr | numfmt --field=1 --to=iec | head -n 100
>>>>
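The find pipeline quoted above relies on GNU extensions (find's -printf, and numfmt from coreutils), so it may not run as-is on macOS or other non-GNU systems. A rough, portable approximation could look like the Python sketch below. It is an illustration only, not how the list in this thread was compiled. The names largest_files and human are hypothetical, the rounding only approximates numfmt --to=iec, and unlike the original command it skips directories named .git at any depth.

```python
import math
import os


def human(size):
    """Format a byte count roughly like `numfmt --to=iec` (e.g. 2048 -> '2.0K')."""
    units = ["", "K", "M", "G", "T"]
    value = float(size)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024
    if unit == "":
        return str(size)
    # Assumption: round up, one decimal below 10 and none above,
    # which is close to (but not guaranteed identical to) numfmt's output.
    if value < 10:
        return f"{math.ceil(value * 10) / 10:.1f}{unit}"
    return f"{math.ceil(value)}{unit}"


def largest_files(root, limit=100, skip_dirs=(".git",)):
    """Return up to `limit` (size, path) pairs under `root`, largest first."""
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Regular files only, like `find -type f` (symlinks are ignored).
            if os.path.isfile(path) and not os.path.islink(path):
                sizes.append((os.path.getsize(path), path))
    sizes.sort(reverse=True)
    return sizes[:limit]


def print_report(root="."):
    """Print the top list plus the combined size of the listed files."""
    top = largest_files(root)
    for size, path in top:
        print(human(size), path)
    print("total:", human(sum(size for size, _ in top)))
```

Calling print_report() from the root of a checkout prints one "size path" line per file, in the same shape as the list above, followed by the combined size of the listed files.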