Largest files in the JDK repo

Magnus Ihse Bursie magnus.ihse.bursie at oracle.com
Thu Oct 24 20:28:43 UTC 2024


On 2024-10-24 20:47, Jorn Vernee wrote:

> WRT the two biggest files:
>
> 6.8M ./test/jdk/java/foreign/libTestUpcallStack.c
> 3.5M ./test/jdk/java/foreign/libTestDowncallStack.c
>
> These are mechanically generated C libraries featuring a lot of 
> different function shapes, for testing of FFM downcalls. The Java code 
> that is used to generate these C files could theoretically run as part 
> of the test as well, but the problem is that we then need to 
> compile the generated sources into a native library.
>
> Currently the JDK build system will find and build all native 
> libraries needed for tests before any of the tests run, but maybe it's 
> possible to create a way for a test to request that a native library 
> be built on demand. Then we wouldn't need to pre-generate these files 
> and include them in the repo, and could instead generate + compile 
> them when the test runs. (This might also help cut down on the build 
> time of the test image, since you'd only need to compile test 
> libraries for the tests that actually run).
>
Unfortunately, requesting compilation of native code at test time is not 
trivial to support, and I don't think we want to even try doing 
that, for various reasons.

However, generating source code just in time for compilation is 
commonplace in the JDK; we call it "gensrc" in the build system. We have 
not done so for tests so far, but it would not be horribly hard to 
add gensrc functionality to native tests as well.

I'd say that there are three criteria that indicate we should start 
using a gensrc system for these tests:

1) They are generated by a Java tool

2) That tool runs rather quickly

3) Changing that tool, rather than changing the individual files, is the 
preferred way of updating this source code

You indicated that 1 is true. Is that the case for 2 and 3 as well?
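
To make the three criteria concrete, here is a minimal sketch of what a
gensrc-style Java tool could look like. It is not the actual generator
behind libTestDowncallStack.c; the class name ShapeGenSketch, the TYPES
table and the f<index> naming scheme are all assumptions made up for
illustration. It only shows the general idea: a small, fast Java
program that emits one C function per parameter-type combination, so
that the tool, rather than the generated file, is what gets edited.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical gensrc-style tool, for illustration only.
// The types and the function naming scheme are assumptions.
final class ShapeGenSketch {
    static final String[] TYPES = {"int", "float", "double"};

    /** Emits one C function that takes the given parameters and returns the first one. */
    static String function(int index, List<String> paramTypes) {
        StringBuilder sb = new StringBuilder();
        sb.append(paramTypes.get(0)).append(" f").append(index).append('(');
        for (int i = 0; i < paramTypes.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(paramTypes.get(i)).append(" p").append(i);
        }
        sb.append(") { return p0; }\n");
        return sb.toString();
    }

    /** Generates C source containing one function per combination of `arity` types. */
    static String generate(int arity) {
        if (arity < 1) {
            throw new IllegalArgumentException("arity must be at least 1");
        }
        // Build every type combination of the requested length.
        List<List<String>> shapes = new ArrayList<>();
        shapes.add(new ArrayList<>());
        for (int i = 0; i < arity; i++) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : shapes) {
                for (String type : TYPES) {
                    List<String> shape = new ArrayList<>(prefix);
                    shape.add(type);
                    next.add(shape);
                }
            }
            shapes = next;
        }
        StringBuilder out = new StringBuilder("/* Generated file, do not edit. */\n");
        for (int i = 0; i < shapes.size(); i++) {
            out.append(function(i, shapes.get(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A build step would redirect this output into a .c file before compilation.
        System.out.print(generate(2));
    }
}
```

With a tool of this kind, the number of generated functions grows
exponentially with the arity, which is how a short generator can
produce a multi-megabyte C file.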

/Magnus


> Jorn
>
> On 24-10-2024 13:04, Magnus Ihse Bursie wrote:
>>
>> I got intrigued by how https://bugs.openjdk.org/browse/JDK-8339507 
>> could integrate a 7 MB large file without anybody noticing, so I 
>> started wondering how many other huge text files there are in our 
>> repo. (We are much more restrictive with binary files, even if they 
>> are small...)
>>
>> So I compiled a top 100 list, which basically ended up being all 
>> files larger than 400 kB. In total, these 100 files account for ca 
>> 82 MB of data. I'm not saying that any of these files are wrong per 
>> se, but maybe some of the files on this list could provide a bit of food 
>> for thought. Further down is the complete top-list, but it is a bit 
>> hard to get a grip on. I sorted and grouped the result, since the 
>> large files are not randomly sprinkled throughout the code base. The 
>> grouped list does not contain test files. The huge test files are more 
>> numerous, but there are also (imho) more compelling reasons in 
>> general to allow for bigger files in testing. With that said, even 
>> some of the test files seem a bit excessive. (And one cannot help 
>> but wonder what kind of file 
>> src/java.base/share/data/unicodedata/NormalizationTest.txt really is.)
>>
>> Character sets and localization:
>> * make/data/charsetmapping
>> * make/data/cldr
>> * src/java.base/share/data/lsrdata/
>> * src/java.base/share/data/unicodedata
>> * src/java.base/share/classes/java/lang/Character.java
>> * src/java.base/share/classes/sun/nio/cs/GB18030.java
>> * src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java
>> * src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template
>> 3rd party source:
>> * src/jdk.incubator.vector/*/native/libjsvml/*.S
>> * src/java.base/share/native/libzip/zlib/crc32.h
>> * src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
>> Symbols from previous JDKs:
>> * src/jdk.compiler/share/data/symbols
>> Huge Hotspot files:
>> * src/hotspot/cpu/*/*.ad
>> * src/hotspot/cpu/x86/assembler_x86.cpp
>> * src/hotspot/share/prims/jvmti.xml
>> Other:
>> * src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
>> * src/java.base/share/classes/java/lang/invoke/MethodHandles.java
>> * src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java
>> And a binary file:
>> * src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
>>
>> And here is the complete top list:
>>
>> 6.8M ./test/jdk/java/foreign/libTestUpcallStack.c
>> 3.5M ./test/jdk/java/foreign/libTestDowncallStack.c
>> 2.7M ./test/jdk/com/sun/net/httpserver/docs/test1/largefile.txt
>> 2.6M ./src/java.base/share/data/unicodedata/NormalizationTest.txt
>> 2.3M ./test/jdk/sun/nio/cs/EUC_TW_OLD.java
>> 2.1M ./src/jdk.compiler/share/data/symbols/java.desktop-8.sym.txt
>> 2.0M ./src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
>> 2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.Corrigendum4.txt
>> 2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.txt
>> 1.9M ./src/java.base/share/data/unicodedata/UnicodeData.txt
>> 1.6M ./test/hotspot/jtreg/gc/TestBigObj.java
>> 1.5M ./test/jdk/java/foreign/libTestUpcall.c
>> 1.4M ./src/jdk.compiler/share/data/symbols/java.base-8.sym.txt
>> 1.2M ./test/jdk/java/lang/String/concat/ImplicitStringConcatShapes.java
>> 1.1M ./src/java.base/share/data/unicodedata/DerivedCoreProperties.txt
>> 952K ./test/hotspot/jtreg/compiler/c2/stemmer/words
>> 941K ./src/jdk.compiler/share/data/symbols/java.base-M.sym.txt
>> 928K ./make/data/charsetmapping/EUC_TW.map
>> 927K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/INDIFY_Test.java
>> 912K ./make/data/cldr/common/supplemental/likelySubtags.xml
>> 898K ./make/data/charsetmapping/MS936.map
>> 865K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template
>> 857K ./test/jdk/java/foreign/libTestDowncall.c
>> 843K ./test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java
>> 830K ./src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
>> 794K ./make/data/cldr/common/main/ru.xml
>> 774K ./test/jdk/sun/nio/cs/mapping/GB18030_2000.b2c
>> 774K ./test/jdk/sun/nio/cs/mapping/GB18030.b2c
>> 767K ./test/jdk/jdk/internal/math/ToDecimal/java.base/jdk/internal/math/DoubleToDecimalChecker.java
>> 752K ./make/data/cldr/common/main/uk.xml
>> 742K ./make/data/charsetmapping/Johab.map
>> 741K ./test/jdk/sun/nio/cs/mapping/Johab.b2c
>> 739K ./src/java.base/share/classes/sun/nio/cs/GB18030.java
>> 733K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_tan_linux_x86.S
>> 731K ./make/data/charsetmapping/MS950.map
>> 727K ./test/jdk/sun/nio/cs/mapping/MS950.b2c
>> 709K ./src/java.base/share/data/lsrdata/language-subtag-registry.txt
>> 698K ./make/data/charsetmapping/MS949.map
>> 695K ./test/jdk/sun/nio/cs/mapping/MS949.b2c
>> 655K ./src/hotspot/cpu/x86/assembler_x86.cpp
>> 647K ./test/jdk/java/lang/instrument/BigClass.java
>> 634K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_sin_linux_x86.S
>> 628K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_cos_linux_x86.S
>> 616K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_tan_windows_x86.S
>> 601K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java
>> 597K ./src/hotspot/share/prims/jvmti.xml
>> 597K ./test/jdk/sun/security/ec/SigGen-1.txt
>> 593K ./make/data/cldr/common/main/lt.xml
>> 582K ./make/data/cldr/common/main/cs.xml
>> 579K ./src/java.base/share/native/libzip/zlib/crc32.h
>> 577K ./make/data/cldr/common/main/sk.xml
>> 577K ./src/jdk.compiler/share/data/symbols/java.desktop-9.sym.txt
>> 572K ./test/jdk/javax/swing/text/html/parser/Parser/8078268/slowparse.html
>> 567K ./test/jdk/sun/nio/cs/OLD/IBM933_OLD.java
>> 539K ./test/jdk/sun/nio/cs/mapping/untested/gb18030_1.b2c
>> 536K ./test/micro/org/openjdk/bench/vm/gc/RawAllocationRate.java
>> 534K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_sin_windows_x86.S
>> 532K ./make/data/cldr/common/main/ff_Adlm.xml
>> 531K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_cos_windows_x86.S
>> 526K ./test/jdk/sun/nio/cs/mapping/EUC_TW.b2c
>> 524K ./src/jdk.compiler/share/data/symbols/java.desktop-B.sym.txt
>> 523K ./make/data/cldr/common/main/pl.xml
>> 520K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/stress/java/loopsAndThreads/INDIFY_Test.java
>> 518K ./make/data/cldr/common/main/sl.xml
>> 510K ./test/jdk/sun/nio/cs/OLD/IBM950_OLD.java
>> 509K ./make/data/cldr/common/main/mr.xml
>> 507K ./make/data/cldr/common/main/kn.xml
>> 505K ./test/jdk/sun/nio/cs/OLD/IBM948_OLD.java
>> 504K ./make/data/cldr/common/main/sr.xml
>> 503K ./test/jdk/sun/nio/cs/OLD/IBM937_OLD.java
>> 502K ./test/jdk/sun/net/www/protocol/jar/foo1.jar
>> 501K ./make/data/cldr/common/main/ta.xml
>> 496K ./test/jdk/sun/nio/cs/OLD/Johab_OLD.java
>> 490K ./test/jdk/sun/nio/cs/OLD/MS949_OLD.java
>> 489K ./test/hotspot/jtreg/vmTestbase/vm/jit/LongTransitions/LTTest.java
>> 485K ./test/hotspot/jtreg/vmTestbase/jit/FloatingPoint/FPCompare/TestFPBinop/TestFPBinop.gold
>> 485K ./src/hotspot/cpu/aarch64/aarch64.ad
>> 478K ./src/hotspot/cpu/ppc/ppc.ad
>> 467K ./make/data/cldr/common/main/gd.xml
>> 466K ./src/java.base/share/classes/java/lang/Character.java
>> 453K ./make/data/cldr/common/main/ar.xml
>> 452K ./test/jdk/sun/nio/cs/OLD/IBM949_OLD.java
>> 446K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_pow_linux_x86.S
>> 445K ./make/data/cldr/common/main/cy.xml
>> 443K ./make/data/cldr/common/main/ml.xml
>> 442K ./make/data/cldr/common/main/br.xml
>> 442K ./test/jdk/sun/nio/cs/OLD/MS950_OLD.java
>> 442K ./make/data/cldr/common/main/hr.xml
>> 441K ./src/hotspot/cpu/x86/x86_32.ad
>> 438K ./src/java.base/share/classes/java/lang/invoke/MethodHandles.java
>> 436K ./test/jaxp/javax/xml/jaxp/unittest/transform/msgAttach.xml
>> 433K ./make/data/cldr/common/main/el.xml
>> 432K ./src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java
>> 429K ./make/data/cldr/common/main/lv.xml
>> 428K ./make/data/cldr/common/main/fi.xml
>> 427K ./test/jdk/sun/nio/cs/OLD/GBK_OLD.java
>> 421K ./src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
>> 419K ./make/data/cldr/common/main/en.xml
>> 418K ./src/hotspot/cpu/x86/x86.ad
>> 416K ./make/data/cldr/common/main/sr_Latn.xml
>>
>> The list was compiled by running:
>>
>> find . -path ./.git -prune -o -type f -printf '%s %p\n' | sort -nr | 
>> numfmt --field=1 --to=iec | head -n 100
>>
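
The find pipeline quoted above relies on GNU extensions (-printf in
find, and numfmt from coreutils), so it may not run unchanged on every
platform. As a rough, portable alternative, the same listing can be
sketched in Java. The class name LargestFilesSketch and its helper
methods are hypothetical, and the sketch prints raw byte counts rather
than the human-readable sizes that numfmt produces.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only.
final class LargestFilesSketch {

    /** Returns the `limit` largest regular files under `root`, ignoring anything below .git. */
    static List<Path> largest(Path root, int limit) throws IOException {
        Path git = root.resolve(".git");
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                // Unlike find's -prune, this still visits .git and then filters it out.
                .filter(p -> !p.startsWith(git))
                .filter(p -> Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
                .sorted(Comparator.<Path>comparingLong(LargestFilesSketch::size).reversed())
                .limit(limit)
                .collect(Collectors.toList());
        }
    }

    /** Returns the file size in bytes, wrapping I/O failures so it can be used in a stream. */
    static long size(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        for (Path p : largest(root, 100)) {
            System.out.println(size(p) + " " + p);
        }
    }
}
```

Run from the top of a checkout, this should list the same kind of top
100 as the shell pipeline, with sizes in bytes.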