Largest files in the JDK repo

Magnus Ihse Bursie magnus.ihse.bursie at
Mon Oct 28 16:12:57 UTC 2024

On 2024-10-25 00:29, Jorn Vernee wrote:

> > You indicated that 1 is true. Is that the case for 2 and 3 as well?
> Yes, both 2 and 3 are true. (3 is in fact required, because the test 
> code uses the same stream of 'shapes' to do the actual calls).
> The program that generates this code is 
> test/jdk/java/foreign/ [1] It generates 5 
> files in total. It's a bit entangled with the current test code, but 
> nothing we can't separate out I think. I suppose the trickiest part is 
> that the actual test also needs access to the code when running.
Thanks for the information. I created an issue to track an effort to 
convert this to gensrc. I put it on infrastructure/build for now, to 
enable gensrc for native tests, but at some point the involvement of 
someone who knows the FFM tests will be necessary.
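
To make the gensrc idea a bit more concrete, here is a minimal sketch of what a deterministic generator could look like. Everything in it is an assumption for illustration: the class name ShapeGen, the f<index> naming scheme and the three parameter types are made up, and the real generator under test/jdk/java/foreign/ is considerably more involved. The point is only that one shapes() enumeration can feed both the C emitter at build time and the Java test at run time, which is the property Jorn mentions for point 3.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; NOT the actual generator in test/jdk/java/foreign/.
final class ShapeGen {
    // Illustrative subset of C parameter types (assumption).
    static final String[] TYPES = {"int", "double", "float"};

    // Deterministically enumerate every parameter list of length 1..maxParams.
    // The order is fixed, so a build step and a test see the same stream.
    static List<List<String>> shapes(int maxParams) {
        List<List<String>> result = new ArrayList<>();
        List<List<String>> previous = new ArrayList<>();
        previous.add(new ArrayList<>());
        for (int n = 1; n <= maxParams; n++) {
            List<List<String>> current = new ArrayList<>();
            for (List<String> prefix : previous) {
                for (String type : TYPES) {
                    List<String> shape = new ArrayList<>(prefix);
                    shape.add(type);
                    current.add(shape);
                }
            }
            result.addAll(current);
            previous = current;
        }
        return result;
    }

    // Emit one C function that simply returns its first argument.
    static String emit(int index, List<String> shape) {
        StringBuilder params = new StringBuilder();
        for (int i = 0; i < shape.size(); i++) {
            if (i > 0) {
                params.append(", ");
            }
            params.append(shape.get(i)).append(" p").append(i);
        }
        return shape.get(0) + " f" + index + "(" + params + ") { return p0; }\n";
    }

    // Concatenate the C source for all shapes, in enumeration order.
    static String generate(int maxParams) {
        StringBuilder out = new StringBuilder();
        List<List<String>> all = shapes(maxParams);
        for (int i = 0; i < all.size(); i++) {
            out.append(emit(i, all.get(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate(2));
    }
}
```

With maxParams = 2 this enumerates 3 one-parameter and 9 two-parameter shapes, so the output is 12 small C functions. A gensrc step would write that output to a file in the build directory instead of checking it into the repo.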


> FWIW, we have other examples of test code that is generated by scripts 
> as well, such as 
> test/jdk/java/lang/invoke/VarHandles/ and various 
> scripts under test/jdk/java/nio/Buffer which invoke SPP. There are 
> probably more cases like that.
> Jorn
> [1]: 
> On 24-10-2024 22:28, Magnus Ihse Bursie wrote:
>> On 2024-10-24 20:47, Jorn Vernee wrote:
>>> WRT the two biggest files:
>>> 6.8M ./test/jdk/java/foreign/libTestUpcallStack.c
>>> 3.5M ./test/jdk/java/foreign/libTestDowncallStack.c
>>> These are mechanically generated C libraries featuring a lot of 
>>> different function shapes, for testing of FFM downcalls. The Java 
>>> code that is used to generate these C files could theoretically run 
>>> as part of the test as well, but the problem is that we would then 
>>> need to compile the generated sources into a native library.
>>> Currently the JDK build system will find and build all native 
>>> libraries needed for tests before any of the tests run, but maybe 
>>> it's possible to create a way for a test to request that a native 
>>> library be built on demand. Then we wouldn't need to pre-generate 
>>> these files and include them in the repo, and could instead generate 
>>> + compile them when the test runs. (This might also help cut down on 
>>> the build time of the test image, since you'd only need to compile 
>>> test libraries for the tests that actually run).
>> Unfortunately, requesting compilation of native code at test time is 
>> not trivial to support, and I don't think we want to even try 
>> doing that, for various reasons.
>> However, generating source code just in time for compilation is 
>> commonplace in the JDK; we call it "gensrc" in the build system. We 
>> have not done so for tests so far, but it would not be horribly hard 
>> to add gensrc functionality to native tests as well.
>> I'd say that there are three criteria that indicate we should start 
>> using a gensrc system for these tests:
>> 1) They are generated by a Java tool
>> 2) That tool runs rather quickly
>> 3) Changing that tool, rather than changing the individual files, is 
>> the preferred way of updating this source code
>> You indicated that 1 is true. Is that the case for 2 and 3 as well?
>> /Magnus
>>> Jorn
>>> On 24-10-2024 13:04, Magnus Ihse Bursie wrote:
>>>> I got intrigued by how a change could integrate a 7 MB file 
>>>> without anybody noticing, so I started wondering how many other 
>>>> huge text files there are in our repo. (We are much more 
>>>> restrictive with binary files, even if they are small...)
>>>> So I compiled a top 100 list, which basically ended up being all 
>>>> files larger than 400 kB. In total, these 100 files account for ca. 
>>>> 82 MB of data. I'm not saying that any of these files are wrong per 
>>>> se, but maybe some of the files on this list could provide a bit 
>>>> of food for thought. Further down is the complete top-list, but it is 
>>>> a bit hard to get a grip on. I sorted and grouped the result, since 
>>>> the large files are not randomly sprinkled throughout the code 
>>>> base. This list does not contain test files. The huge test files 
>>>> are more numerous, but there are also (imho) more compelling 
>>>> reasons in general to allow for bigger files in testing. With that 
>>>> said, even some of the test files seem a bit excessive. (And one 
>>>> cannot help but wonder what kind of file 
>>>> src/java.base/share/data/unicodedata/NormalizationTest.txt really is.)
>>>> Character sets and localization:
>>>> * make/data/charsetmapping
>>>> * make/data/cldr
>>>> * src/java.base/share/data/lsrdata/
>>>> * src/java.base/share/data/unicodedata
>>>> * src/java.base/share/classes/java/lang/
>>>> * src/java.base/share/classes/sun/nio/cs/
>>>> * src/jdk.charsets/share/classes/sun/nio/cs/ext/
>>>> * src/jdk.charsets/share/classes/sun/nio/cs/ext/
>>>> 3rd party source:
>>>> * src/jdk.incubator.vector/*/native/libjsvml/*.S
>>>> * src/java.base/share/native/libzip/zlib/crc32.h
>>>> * src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
>>>> Symbols from previous JDKs:
>>>> * src/jdk.compiler/share/data/symbols
>>>> Huge Hotspot files:
>>>> * src/hotspot/cpu/*/*.ad
>>>> * src/hotspot/cpu/x86/assembler_x86.cpp
>>>> * src/hotspot/share/prims/jvmti.xml
>>>> Other:
>>>> * src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
>>>> * src/java.base/share/classes/java/lang/invoke/
>>>> * src/java.sql.rowset/share/classes/com/sun/rowset/
>>>> And a binary file:
>>>> * src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
>>>> And here is the complete top list:
>>>> 6.8M ./test/jdk/java/foreign/libTestUpcallStack.c
>>>> 3.5M ./test/jdk/java/foreign/libTestDowncallStack.c
>>>> 2.7M ./test/jdk/com/sun/net/httpserver/docs/test1/largefile.txt
>>>> 2.6M ./src/java.base/share/data/unicodedata/NormalizationTest.txt
>>>> 2.3M ./test/jdk/sun/nio/cs/
>>>> 2.1M ./src/jdk.compiler/share/data/symbols/java.desktop-8.sym.txt
>>>> 2.0M ./src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
>>>> 2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.Corrigendum4.txt
>>>> 2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.txt
>>>> 1.9M ./src/java.base/share/data/unicodedata/UnicodeData.txt
>>>> 1.6M ./test/hotspot/jtreg/gc/
>>>> 1.5M ./test/jdk/java/foreign/libTestUpcall.c
>>>> 1.4M ./src/jdk.compiler/share/data/symbols/java.base-8.sym.txt
>>>> 1.2M ./test/jdk/java/lang/String/concat/
>>>> 1.1M ./src/java.base/share/data/unicodedata/DerivedCoreProperties.txt
>>>> 952K ./test/hotspot/jtreg/compiler/c2/stemmer/words
>>>> 941K ./src/jdk.compiler/share/data/symbols/java.base-M.sym.txt
>>>> 928K ./make/data/charsetmapping/
>>>> 927K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/
>>>> 912K ./make/data/cldr/common/supplemental/likelySubtags.xml
>>>> 898K ./make/data/charsetmapping/
>>>> 865K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/
>>>> 857K ./test/jdk/java/foreign/libTestDowncall.c
>>>> 843K ./test/hotspot/jtreg/compiler/loopopts/superword/
>>>> 830K ./src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
>>>> 794K ./make/data/cldr/common/main/ru.xml
>>>> 774K ./test/jdk/sun/nio/cs/mapping/GB18030_2000.b2c
>>>> 774K ./test/jdk/sun/nio/cs/mapping/GB18030.b2c
>>>> 767K ./test/jdk/jdk/internal/math/ToDecimal/java.base/jdk/internal/math/
>>>> 752K ./make/data/cldr/common/main/uk.xml
>>>> 742K ./make/data/charsetmapping/
>>>> 741K ./test/jdk/sun/nio/cs/mapping/Johab.b2c
>>>> 739K ./src/java.base/share/classes/sun/nio/cs/
>>>> 733K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_tan_linux_x86.S
>>>> 731K ./make/data/charsetmapping/
>>>> 727K ./test/jdk/sun/nio/cs/mapping/MS950.b2c
>>>> 709K ./src/java.base/share/data/lsrdata/language-subtag-registry.txt
>>>> 698K ./make/data/charsetmapping/
>>>> 695K ./test/jdk/sun/nio/cs/mapping/MS949.b2c
>>>> 655K ./src/hotspot/cpu/x86/assembler_x86.cpp
>>>> 647K ./test/jdk/java/lang/instrument/
>>>> 634K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_sin_linux_x86.S
>>>> 628K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_cos_linux_x86.S
>>>> 616K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_tan_windows_x86.S
>>>> 601K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/
>>>> 597K ./src/hotspot/share/prims/jvmti.xml
>>>> 597K ./test/jdk/sun/security/ec/SigGen-1.txt
>>>> 593K ./make/data/cldr/common/main/lt.xml
>>>> 582K ./make/data/cldr/common/main/cs.xml
>>>> 579K ./src/java.base/share/native/libzip/zlib/crc32.h
>>>> 577K ./make/data/cldr/common/main/sk.xml
>>>> 577K ./src/jdk.compiler/share/data/symbols/java.desktop-9.sym.txt
>>>> 572K ./test/jdk/javax/swing/text/html/parser/Parser/8078268/slowparse.html
>>>> 567K ./test/jdk/sun/nio/cs/OLD/
>>>> 539K ./test/jdk/sun/nio/cs/mapping/untested/gb18030_1.b2c
>>>> 536K ./test/micro/org/openjdk/bench/vm/gc/
>>>> 534K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_sin_windows_x86.S
>>>> 532K ./make/data/cldr/common/main/ff_Adlm.xml
>>>> 531K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_cos_windows_x86.S
>>>> 526K ./test/jdk/sun/nio/cs/mapping/EUC_TW.b2c
>>>> 524K ./src/jdk.compiler/share/data/symbols/java.desktop-B.sym.txt
>>>> 523K ./make/data/cldr/common/main/pl.xml
>>>> 520K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/stress/java/loopsAndThreads/
>>>> 518K ./make/data/cldr/common/main/sl.xml
>>>> 510K ./test/jdk/sun/nio/cs/OLD/
>>>> 509K ./make/data/cldr/common/main/mr.xml
>>>> 507K ./make/data/cldr/common/main/kn.xml
>>>> 505K ./test/jdk/sun/nio/cs/OLD/
>>>> 504K ./make/data/cldr/common/main/sr.xml
>>>> 503K ./test/jdk/sun/nio/cs/OLD/
>>>> 502K ./test/jdk/sun/net/www/protocol/jar/foo1.jar
>>>> 501K ./make/data/cldr/common/main/ta.xml
>>>> 496K ./test/jdk/sun/nio/cs/OLD/
>>>> 490K ./test/jdk/sun/nio/cs/OLD/
>>>> 489K ./test/hotspot/jtreg/vmTestbase/vm/jit/LongTransitions/
>>>> 485K ./test/hotspot/jtreg/vmTestbase/jit/FloatingPoint/FPCompare/TestFPBinop/
>>>> 485K ./src/hotspot/cpu/aarch64/
>>>> 478K ./src/hotspot/cpu/ppc/
>>>> 467K ./make/data/cldr/common/main/gd.xml
>>>> 466K ./src/java.base/share/classes/java/lang/
>>>> 453K ./make/data/cldr/common/main/ar.xml
>>>> 452K ./test/jdk/sun/nio/cs/OLD/
>>>> 446K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_pow_linux_x86.S
>>>> 445K ./make/data/cldr/common/main/cy.xml
>>>> 443K ./make/data/cldr/common/main/ml.xml
>>>> 442K ./make/data/cldr/common/main/br.xml
>>>> 442K ./test/jdk/sun/nio/cs/OLD/
>>>> 442K ./make/data/cldr/common/main/hr.xml
>>>> 441K ./src/hotspot/cpu/x86/
>>>> 438K ./src/java.base/share/classes/java/lang/invoke/
>>>> 436K ./test/jaxp/javax/xml/jaxp/unittest/transform/msgAttach.xml
>>>> 433K ./make/data/cldr/common/main/el.xml
>>>> 432K ./src/java.sql.rowset/share/classes/com/sun/rowset/
>>>> 429K ./make/data/cldr/common/main/lv.xml
>>>> 428K ./make/data/cldr/common/main/fi.xml
>>>> 427K ./test/jdk/sun/nio/cs/OLD/
>>>> 421K ./src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
>>>> 419K ./make/data/cldr/common/main/en.xml
>>>> 418K ./src/hotspot/cpu/x86/
>>>> 416K ./make/data/cldr/common/main/sr_Latn.xml
>>>> The list was compiled by running:
>>>> find . -path ./.git -prune -o -type f -printf '%s %p\n' | sort -nr | numfmt --field=1 --to=iec | head -n 100
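
Note that -printf is a GNU find extension and numfmt is part of GNU coreutils, so the pipeline above assumes a GNU userland. For platforms without those tools, a roughly equivalent listing could be sketched in Java. The class and method names below are hypothetical, and the sketch prints raw byte counts instead of the IEC-formatted sizes that numfmt produces.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a sketch of the find/sort/head pipeline, not a JDK tool.
final class LargestFiles {
    record Entry(long size, Path path) {}

    // Return up to 'limit' regular files under 'root', largest first.
    static List<Entry> largest(Path root, int limit) throws IOException {
        List<Entry> entries = new ArrayList<>();
        Path gitDir = root.resolve(".git");
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // Mirrors "-path ./.git -prune": skip only the top-level .git directory.
                return dir.equals(gitDir)
                        ? FileVisitResult.SKIP_SUBTREE
                        : FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // Mirrors "-type f": regular files only (symlinks are not followed).
                if (attrs.isRegularFile()) {
                    entries.add(new Entry(attrs.size(), file));
                }
                return FileVisitResult.CONTINUE;
            }
        });
        // Mirrors "sort -nr": descending by size.
        entries.sort((a, b) -> Long.compare(b.size(), a.size()));
        // Mirrors "head -n 100" when limit is 100.
        return new ArrayList<>(entries.subList(0, Math.min(limit, entries.size())));
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        for (Entry e : largest(root, 100)) {
            System.out.println(e.size() + " " + e.path());
        }
    }
}
```

Run from the repo root, for example with `java LargestFiles.java`, this should list the same files as the pipeline, in the same order up to ties between equally sized files.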

More information about the jdk-dev mailing list