Largest files in the JDK repo
Jorn Vernee
jorn.vernee at oracle.com
Thu Oct 24 22:29:40 UTC 2024
> You indicated that 1 is true. Is that the case for 2 and 3 as well?
Yes, both 2 and 3 are true. (3 is in fact required, because the test
code uses the same stream of 'shapes' to do the actual calls).
The program that generates this code is
test/jdk/java/foreign/CallGeneratorHelper.java [1]. It generates 5 files
in total. It's a bit entangled with the current test code, but nothing
we can't separate out I think. I suppose the trickiest part is that the
actual test also needs access to the code when running.
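
To illustrate why the generator and the test have to share one stream of
shapes, here is a toy sketch. It is not the real CallGeneratorHelper: the
ShapeSketch class, the ParamType values and the f<N> naming are all made up
for this example. The point is only that a deterministic shapes() method can
feed both the C source generation and the test that performs the calls.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical illustration only; names and shapes are NOT those used by
// test/jdk/java/foreign/CallGeneratorHelper.java.
class ShapeSketch {
    // A few C parameter types, each with the spelling used in generated C code.
    enum ParamType {
        INT("int"), FLOAT("float"), DOUBLE("double"), POINTER("void*");

        final String cName;

        ParamType(String cName) {
            this.cName = cName;
        }
    }

    // One function shape: a name plus its parameter types.
    record Shape(String name, List<ParamType> params) {}

    // Deterministic stream of all two-parameter shapes. Both the generator
    // and the test would iterate this same stream, in the same order.
    static Stream<Shape> shapes() {
        ParamType[] types = ParamType.values();
        int n = types.length;
        return IntStream.range(0, n * n)
                .mapToObj(i -> new Shape("f" + i, List.of(types[i / n], types[i % n])));
    }

    // Emits C source for one shape: a function returning its first argument.
    static String toC(Shape shape) {
        List<ParamType> params = shape.params();
        String args = IntStream.range(0, params.size())
                .mapToObj(i -> params.get(i).cName + " p" + i)
                .collect(Collectors.joining(", "));
        return params.get(0).cName + " " + shape.name() + "(" + args + ") { return p0; }";
    }

    public static void main(String[] args) {
        // Generator side: print the C library source, one function per shape.
        shapes().map(ShapeSketch::toC).forEach(System.out::println);
    }
}
```

Because shapes() is the single source of truth, editing an individual
generated function by hand would make the C library disagree with what the
test expects to call, which is why changing the tool is the only sensible
way to update the output.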
FWIW, we have other examples of test code that is generated by scripts
as well, such as
test/jdk/java/lang/invoke/VarHandles/generate-vh-tests.sh and various
scripts under test/jdk/java/nio/Buffer which invoke SPP. There are
probably more cases like that.
Jorn
[1]:
https://github.com/openjdk/jdk/blob/master/test/jdk/java/foreign/CallGeneratorHelper.java
On 24-10-2024 22:28, Magnus Ihse Bursie wrote:
>
> On 2024-10-24 20:47, Jorn Vernee wrote:
>
>> WRT the two biggest files:
>>
>> 6.8M ./test/jdk/java/foreign/libTestUpcallStack.c
>> 3.5M ./test/jdk/java/foreign/libTestDowncallStack.c
>>
>> These are mechanically generated C libraries featuring a lot of
>> different function shapes, for testing of FFM downcalls. The Java
>> code that is used to generate these C files could theoretically run
>> as part of the test as well, but the problem is that we then
>> need to compile the generated sources into a native library.
>>
>> Currently the JDK build system will find and build all native
>> libraries needed for tests before any of the tests run, but maybe
>> it's possible to create a way for a test to request that a native
>> library be built on demand. Then we wouldn't need to pre-generate
>> these files and include them in the repo, and could instead generate
>> + compile them when the test runs. (This might also help cut down on
>> the build time of the test image, since you'd only need to compile
>> test libraries for the tests that actually run).
>>
> Unfortunately, requesting compilation of native code at test time is
> not trivial to support, and I don't think we even want to try
> doing that, for various reasons.
>
> However, generating source code just in time for compilation is
> commonplace in the JDK; we call it "gensrc" in the build system. We
> have not done so for tests so far, but it would not be horribly hard
> to add gensrc functionality to native tests as well.
>
> I'd say that there are three criteria that indicate we should start
> using a gensrc system for these tests:
>
> 1) They are generated by a Java tool
>
> 2) That tool runs rather quickly
>
> 3) Changing that tool, rather than changing the individual files, is
> the preferred way of updating this source code
>
> You indicated that 1 is true. Is that the case for 2 and 3 as well?
>
> /Magnus
>
>
>> Jorn
>>
>> On 24-10-2024 13:04, Magnus Ihse Bursie wrote:
>>>
>>> I got intrigued at how https://bugs.openjdk.org/browse/JDK-8339507
>>> could integrate a 7 MB large file without anybody noticing, so I
>>> started wondering how many other huge text files there are in our
>>> repo. (We are much more restrictive with binary files, even if they
>>> are small...)
>>>
>>> So I compiled a top 100 list, which basically ended up being all
>>> files larger than 400 kB. In total, these 100 files account for ca
>>> 82 MB of data. I'm not saying that any of these files are wrong per
>>> se, but maybe some of the files on this list could provide a bit
>>> of food for thought. Further down is the complete top-list, but it is a
>>> bit hard to get a grip on. I sorted and grouped the result, since
>>> the large files are not randomly sprinkled throughout the code base.
>>> This list does not contain test files. The huge test files are more
>>> numerous, but there are also (imho) more compelling reasons in
>>> general to allow for bigger files in testing. With that said, even
>>> some of the test files seem a bit excessive. (And one cannot help
>>> but wonder what kind of file
>>> src/java.base/share/data/unicodedata/NormalizationTest.txt really is.)
>>>
>>> Character sets and localization:
>>> * make/data/charsetmapping
>>> * make/data/cldr
>>> * src/java.base/share/data/lsrdata/
>>> * src/java.base/share/data/unicodedata
>>> * src/java.base/share/classes/java/lang/Character.java
>>> * src/java.base/share/classes/sun/nio/cs/GB18030.java
>>> * src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java
>>> * src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template
>>> 3rd party source:
>>> * src/jdk.incubator.vector/*/native/libjsvml/*.S
>>> * src/java.base/share/native/libzip/zlib/crc32.h
>>> * src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
>>> Symbols from previous JDKs:
>>> * src/jdk.compiler/share/data/symbols
>>> Huge Hotspot files:
>>> * src/hotspot/cpu/*/*.ad
>>> * src/hotspot/cpu/x86/assembler_x86.cpp
>>> * src/hotspot/share/prims/jvmti.xml
>>> Other:
>>> * src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
>>> * src/java.base/share/classes/java/lang/invoke/MethodHandles.java
>>> * src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java
>>> And a binary file:
>>> * src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
>>>
>>> And here is the complete top list:
>>>
>>> 6.8M ./test/jdk/java/foreign/libTestUpcallStack.c
>>> 3.5M ./test/jdk/java/foreign/libTestDowncallStack.c
>>> 2.7M ./test/jdk/com/sun/net/httpserver/docs/test1/largefile.txt
>>> 2.6M ./src/java.base/share/data/unicodedata/NormalizationTest.txt
>>> 2.3M ./test/jdk/sun/nio/cs/EUC_TW_OLD.java
>>> 2.1M ./src/jdk.compiler/share/data/symbols/java.desktop-8.sym.txt
>>> 2.0M ./src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
>>> 2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.Corrigendum4.txt
>>> 2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.txt
>>> 1.9M ./src/java.base/share/data/unicodedata/UnicodeData.txt
>>> 1.6M ./test/hotspot/jtreg/gc/TestBigObj.java
>>> 1.5M ./test/jdk/java/foreign/libTestUpcall.c
>>> 1.4M ./src/jdk.compiler/share/data/symbols/java.base-8.sym.txt
>>> 1.2M ./test/jdk/java/lang/String/concat/ImplicitStringConcatShapes.java
>>> 1.1M ./src/java.base/share/data/unicodedata/DerivedCoreProperties.txt
>>> 952K ./test/hotspot/jtreg/compiler/c2/stemmer/words
>>> 941K ./src/jdk.compiler/share/data/symbols/java.base-M.sym.txt
>>> 928K ./make/data/charsetmapping/EUC_TW.map
>>> 927K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/INDIFY_Test.java
>>> 912K ./make/data/cldr/common/supplemental/likelySubtags.xml
>>> 898K ./make/data/charsetmapping/MS936.map
>>> 865K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template
>>> 857K ./test/jdk/java/foreign/libTestDowncall.c
>>> 843K ./test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java
>>> 830K ./src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
>>> 794K ./make/data/cldr/common/main/ru.xml
>>> 774K ./test/jdk/sun/nio/cs/mapping/GB18030_2000.b2c
>>> 774K ./test/jdk/sun/nio/cs/mapping/GB18030.b2c
>>> 767K ./test/jdk/jdk/internal/math/ToDecimal/java.base/jdk/internal/math/DoubleToDecimalChecker.java
>>> 752K ./make/data/cldr/common/main/uk.xml
>>> 742K ./make/data/charsetmapping/Johab.map
>>> 741K ./test/jdk/sun/nio/cs/mapping/Johab.b2c
>>> 739K ./src/java.base/share/classes/sun/nio/cs/GB18030.java
>>> 733K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_tan_linux_x86.S
>>> 731K ./make/data/charsetmapping/MS950.map
>>> 727K ./test/jdk/sun/nio/cs/mapping/MS950.b2c
>>> 709K ./src/java.base/share/data/lsrdata/language-subtag-registry.txt
>>> 698K ./make/data/charsetmapping/MS949.map
>>> 695K ./test/jdk/sun/nio/cs/mapping/MS949.b2c
>>> 655K ./src/hotspot/cpu/x86/assembler_x86.cpp
>>> 647K ./test/jdk/java/lang/instrument/BigClass.java
>>> 634K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_sin_linux_x86.S
>>> 628K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_cos_linux_x86.S
>>> 616K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_tan_windows_x86.S
>>> 601K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java
>>> 597K ./src/hotspot/share/prims/jvmti.xml
>>> 597K ./test/jdk/sun/security/ec/SigGen-1.txt
>>> 593K ./make/data/cldr/common/main/lt.xml
>>> 582K ./make/data/cldr/common/main/cs.xml
>>> 579K ./src/java.base/share/native/libzip/zlib/crc32.h
>>> 577K ./make/data/cldr/common/main/sk.xml
>>> 577K ./src/jdk.compiler/share/data/symbols/java.desktop-9.sym.txt
>>> 572K ./test/jdk/javax/swing/text/html/parser/Parser/8078268/slowparse.html
>>> 567K ./test/jdk/sun/nio/cs/OLD/IBM933_OLD.java
>>> 539K ./test/jdk/sun/nio/cs/mapping/untested/gb18030_1.b2c
>>> 536K ./test/micro/org/openjdk/bench/vm/gc/RawAllocationRate.java
>>> 534K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_sin_windows_x86.S
>>> 532K ./make/data/cldr/common/main/ff_Adlm.xml
>>> 531K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_cos_windows_x86.S
>>> 526K ./test/jdk/sun/nio/cs/mapping/EUC_TW.b2c
>>> 524K ./src/jdk.compiler/share/data/symbols/java.desktop-B.sym.txt
>>> 523K ./make/data/cldr/common/main/pl.xml
>>> 520K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/stress/java/loopsAndThreads/INDIFY_Test.java
>>> 518K ./make/data/cldr/common/main/sl.xml
>>> 510K ./test/jdk/sun/nio/cs/OLD/IBM950_OLD.java
>>> 509K ./make/data/cldr/common/main/mr.xml
>>> 507K ./make/data/cldr/common/main/kn.xml
>>> 505K ./test/jdk/sun/nio/cs/OLD/IBM948_OLD.java
>>> 504K ./make/data/cldr/common/main/sr.xml
>>> 503K ./test/jdk/sun/nio/cs/OLD/IBM937_OLD.java
>>> 502K ./test/jdk/sun/net/www/protocol/jar/foo1.jar
>>> 501K ./make/data/cldr/common/main/ta.xml
>>> 496K ./test/jdk/sun/nio/cs/OLD/Johab_OLD.java
>>> 490K ./test/jdk/sun/nio/cs/OLD/MS949_OLD.java
>>> 489K ./test/hotspot/jtreg/vmTestbase/vm/jit/LongTransitions/LTTest.java
>>> 485K ./test/hotspot/jtreg/vmTestbase/jit/FloatingPoint/FPCompare/TestFPBinop/TestFPBinop.gold
>>> 485K ./src/hotspot/cpu/aarch64/aarch64.ad
>>> 478K ./src/hotspot/cpu/ppc/ppc.ad
>>> 467K ./make/data/cldr/common/main/gd.xml
>>> 466K ./src/java.base/share/classes/java/lang/Character.java
>>> 453K ./make/data/cldr/common/main/ar.xml
>>> 452K ./test/jdk/sun/nio/cs/OLD/IBM949_OLD.java
>>> 446K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_pow_linux_x86.S
>>> 445K ./make/data/cldr/common/main/cy.xml
>>> 443K ./make/data/cldr/common/main/ml.xml
>>> 442K ./make/data/cldr/common/main/br.xml
>>> 442K ./test/jdk/sun/nio/cs/OLD/MS950_OLD.java
>>> 442K ./make/data/cldr/common/main/hr.xml
>>> 441K ./src/hotspot/cpu/x86/x86_32.ad
>>> 438K ./src/java.base/share/classes/java/lang/invoke/MethodHandles.java
>>> 436K ./test/jaxp/javax/xml/jaxp/unittest/transform/msgAttach.xml
>>> 433K ./make/data/cldr/common/main/el.xml
>>> 432K ./src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java
>>> 429K ./make/data/cldr/common/main/lv.xml
>>> 428K ./make/data/cldr/common/main/fi.xml
>>> 427K ./test/jdk/sun/nio/cs/OLD/GBK_OLD.java
>>> 421K ./src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
>>> 419K ./make/data/cldr/common/main/en.xml
>>> 418K ./src/hotspot/cpu/x86/x86.ad
>>> 416K ./make/data/cldr/common/main/sr_Latn.xml
>>>
>>> The list was compiled by running:
>>>
>>> find . -path ./.git -prune -o -type f -printf '%s %p\n' | sort -nr |
>>> numfmt --field=1 --to=iec | head -n 100
>>>
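
The find command above relies on GNU extensions (-printf, numfmt), so it
will not run as-is with e.g. BSD find. For anyone who wants to reproduce the
list elsewhere, here is a rough Java equivalent. It is only a sketch: the
LargestFiles class and its method names are made up for this example, and
the size formatting is a simple approximation that does not round the same
way numfmt --to=iec does.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not part of the JDK repo or its build system.
class LargestFiles {
    record Entry(long size, Path path) {}

    // Returns up to 'limit' regular files under 'root', largest first,
    // skipping the top-level .git directory (like '-path ./.git -prune').
    static List<Entry> largest(Path root, int limit) throws IOException {
        List<Entry> entries = new ArrayList<>();
        Path gitDir = root.resolve(".git");
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                return dir.equals(gitDir) ? FileVisitResult.SKIP_SUBTREE : FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) { // like '-type f'
                    entries.add(new Entry(attrs.size(), file));
                }
                return FileVisitResult.CONTINUE;
            }
        });
        entries.sort(Comparator.comparingLong(Entry::size).reversed());
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    // Approximate IEC-style size (powers of 1024) with one decimal place.
    static String humanSize(long bytes) {
        String[] units = {"", "K", "M", "G", "T"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return unit == 0
                ? Long.toString(bytes)
                : String.format(Locale.ROOT, "%.1f%s", value, units[unit]);
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        for (Entry e : largest(root, 100)) {
            System.out.println(humanSize(e.size()) + " " + e.path());
        }
    }
}
```

With a recent JDK this can be run from the repo root in source-file mode,
e.g. "java LargestFiles.java ." .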