Largest files in the JDK repo
Magnus Ihse Bursie
magnus.ihse.bursie at oracle.com
Thu Oct 24 20:23:09 UTC 2024
On 2024-10-24 18:38, Naoto Sato wrote:
> Thanks Magnus for food for thought.
>
> Character sets and localizations are inherently huge, as they cover
> each Unicode code point and locales.
I understand. I guess we are not trying to use any kind of compact
representation in the source code either, assuming it is better to have
something that is readable but large rather than e.g. a compressed
binary file.
> But I see some possibility of reducing, or removing ones that are for
> compatibility reasons, such as old GB18030 or *_OLD mappings.
That'd certainly be nice!
> > excessive. (And one can not help but wonder what kind of file
> > src/java.base/share/data/unicodedata/NormalizationTest.txt really is.)
>
> This file is a golden file from the Unicode Consortium, used to check
> conformance to their normalization spec.
Is it used in production code? Otherwise maybe we should move it to the
test directory hierarchy.
/Magnus
>
> Naoto
>
> On 10/24/24 4:04 AM, Magnus Ihse Bursie wrote:
>> I got intrigued by how https://bugs.openjdk.org/browse/JDK-8339507
>> could integrate a 7 MB file without anybody noticing, so I
>> started wondering how many other huge text files there are in our
>> repo. (We are much more restrictive with binary files, even if they
>> are small...)
>>
>> So I compiled a top 100 list, which basically ended up being all
>> files larger than 400 kB. In total, these 100 files account for ca.
>> 82 MB of data. I'm not saying that any of these files are wrong per
>> se, but maybe some of the files on this list could provide a bit of
>> food for thought. Further down is the complete top list, but it is a
>> bit hard to get a grip on. I sorted and grouped the result, since the
>> large files are not randomly sprinkled throughout the code base. The
>> grouped list does not contain test files. The huge test files are
>> more numerous, but there are also (imho) more compelling reasons in
>> general to allow for bigger files in testing. With that said, even
>> some of the test files seem a bit excessive. (And one can not help
>> but wonder what kind of file
>> src/java.base/share/data/unicodedata/NormalizationTest.txt really is.)
>>
>> Character sets and localization:
>> * make/data/charsetmapping
>> * make/data/cldr
>> * src/java.base/share/data/lsrdata/
>> * src/java.base/share/data/unicodedata
>> * src/java.base/share/classes/java/lang/Character.java
>> * src/java.base/share/classes/sun/nio/cs/GB18030.java
>> * src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java
>> * src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template
>> 3rd party source:
>> * src/jdk.incubator.vector/*/native/libjsvml/*.S
>> * src/java.base/share/native/libzip/zlib/crc32.h
>> * src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
>> Symbols from previous JDKs:
>> * src/jdk.compiler/share/data/symbols
>> Huge Hotspot files:
>> * src/hotspot/cpu/*/*.ad
>> * src/hotspot/cpu/x86/assembler_x86.cpp
>> * src/hotspot/share/prims/jvmti.xml
>> Other:
>> * src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
>> * src/java.base/share/classes/java/lang/invoke/MethodHandles.java
>> * src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java
>> And a binary file:
>> * src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
>>
>> And here is the complete top list:
>>
>> 6.8M ./test/jdk/java/foreign/libTestUpcallStack.c
>> 3.5M ./test/jdk/java/foreign/libTestDowncallStack.c
>> 2.7M ./test/jdk/com/sun/net/httpserver/docs/test1/largefile.txt
>> 2.6M ./src/java.base/share/data/unicodedata/NormalizationTest.txt
>> 2.3M ./test/jdk/sun/nio/cs/EUC_TW_OLD.java
>> 2.1M ./src/jdk.compiler/share/data/symbols/java.desktop-8.sym.txt
>> 2.0M ./src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
>> 2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.Corrigendum4.txt
>> 2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.txt
>> 1.9M ./src/java.base/share/data/unicodedata/UnicodeData.txt
>> 1.6M ./test/hotspot/jtreg/gc/TestBigObj.java
>> 1.5M ./test/jdk/java/foreign/libTestUpcall.c
>> 1.4M ./src/jdk.compiler/share/data/symbols/java.base-8.sym.txt
>> 1.2M ./test/jdk/java/lang/String/concat/ImplicitStringConcatShapes.java
>> 1.1M ./src/java.base/share/data/unicodedata/DerivedCoreProperties.txt
>> 952K ./test/hotspot/jtreg/compiler/c2/stemmer/words
>> 941K ./src/jdk.compiler/share/data/symbols/java.base-M.sym.txt
>> 928K ./make/data/charsetmapping/EUC_TW.map
>> 927K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/INDIFY_Test.java
>> 912K ./make/data/cldr/common/supplemental/likelySubtags.xml
>> 898K ./make/data/charsetmapping/MS936.map
>> 865K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template
>> 857K ./test/jdk/java/foreign/libTestDowncall.c
>> 843K ./test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java
>> 830K ./src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
>> 794K ./make/data/cldr/common/main/ru.xml
>> 774K ./test/jdk/sun/nio/cs/mapping/GB18030_2000.b2c
>> 774K ./test/jdk/sun/nio/cs/mapping/GB18030.b2c
>> 767K ./test/jdk/jdk/internal/math/ToDecimal/java.base/jdk/internal/math/DoubleToDecimalChecker.java
>> 752K ./make/data/cldr/common/main/uk.xml
>> 742K ./make/data/charsetmapping/Johab.map
>> 741K ./test/jdk/sun/nio/cs/mapping/Johab.b2c
>> 739K ./src/java.base/share/classes/sun/nio/cs/GB18030.java
>> 733K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_tan_linux_x86.S
>> 731K ./make/data/charsetmapping/MS950.map
>> 727K ./test/jdk/sun/nio/cs/mapping/MS950.b2c
>> 709K ./src/java.base/share/data/lsrdata/language-subtag-registry.txt
>> 698K ./make/data/charsetmapping/MS949.map
>> 695K ./test/jdk/sun/nio/cs/mapping/MS949.b2c
>> 655K ./src/hotspot/cpu/x86/assembler_x86.cpp
>> 647K ./test/jdk/java/lang/instrument/BigClass.java
>> 634K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_sin_linux_x86.S
>> 628K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_cos_linux_x86.S
>> 616K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_tan_windows_x86.S
>> 601K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java
>> 597K ./src/hotspot/share/prims/jvmti.xml
>> 597K ./test/jdk/sun/security/ec/SigGen-1.txt
>> 593K ./make/data/cldr/common/main/lt.xml
>> 582K ./make/data/cldr/common/main/cs.xml
>> 579K ./src/java.base/share/native/libzip/zlib/crc32.h
>> 577K ./make/data/cldr/common/main/sk.xml
>> 577K ./src/jdk.compiler/share/data/symbols/java.desktop-9.sym.txt
>> 572K ./test/jdk/javax/swing/text/html/parser/Parser/8078268/slowparse.html
>> 567K ./test/jdk/sun/nio/cs/OLD/IBM933_OLD.java
>> 539K ./test/jdk/sun/nio/cs/mapping/untested/gb18030_1.b2c
>> 536K ./test/micro/org/openjdk/bench/vm/gc/RawAllocationRate.java
>> 534K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_sin_windows_x86.S
>> 532K ./make/data/cldr/common/main/ff_Adlm.xml
>> 531K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_cos_windows_x86.S
>> 526K ./test/jdk/sun/nio/cs/mapping/EUC_TW.b2c
>> 524K ./src/jdk.compiler/share/data/symbols/java.desktop-B.sym.txt
>> 523K ./make/data/cldr/common/main/pl.xml
>> 520K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/stress/java/loopsAndThreads/INDIFY_Test.java
>> 518K ./make/data/cldr/common/main/sl.xml
>> 510K ./test/jdk/sun/nio/cs/OLD/IBM950_OLD.java
>> 509K ./make/data/cldr/common/main/mr.xml
>> 507K ./make/data/cldr/common/main/kn.xml
>> 505K ./test/jdk/sun/nio/cs/OLD/IBM948_OLD.java
>> 504K ./make/data/cldr/common/main/sr.xml
>> 503K ./test/jdk/sun/nio/cs/OLD/IBM937_OLD.java
>> 502K ./test/jdk/sun/net/www/protocol/jar/foo1.jar
>> 501K ./make/data/cldr/common/main/ta.xml
>> 496K ./test/jdk/sun/nio/cs/OLD/Johab_OLD.java
>> 490K ./test/jdk/sun/nio/cs/OLD/MS949_OLD.java
>> 489K ./test/hotspot/jtreg/vmTestbase/vm/jit/LongTransitions/LTTest.java
>> 485K ./test/hotspot/jtreg/vmTestbase/jit/FloatingPoint/FPCompare/TestFPBinop/TestFPBinop.gold
>> 485K ./src/hotspot/cpu/aarch64/aarch64.ad
>> 478K ./src/hotspot/cpu/ppc/ppc.ad
>> 467K ./make/data/cldr/common/main/gd.xml
>> 466K ./src/java.base/share/classes/java/lang/Character.java
>> 453K ./make/data/cldr/common/main/ar.xml
>> 452K ./test/jdk/sun/nio/cs/OLD/IBM949_OLD.java
>> 446K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_pow_linux_x86.S
>> 445K ./make/data/cldr/common/main/cy.xml
>> 443K ./make/data/cldr/common/main/ml.xml
>> 442K ./make/data/cldr/common/main/br.xml
>> 442K ./test/jdk/sun/nio/cs/OLD/MS950_OLD.java
>> 442K ./make/data/cldr/common/main/hr.xml
>> 441K ./src/hotspot/cpu/x86/x86_32.ad
>> 438K ./src/java.base/share/classes/java/lang/invoke/MethodHandles.java
>> 436K ./test/jaxp/javax/xml/jaxp/unittest/transform/msgAttach.xml
>> 433K ./make/data/cldr/common/main/el.xml
>> 432K ./src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java
>> 429K ./make/data/cldr/common/main/lv.xml
>> 428K ./make/data/cldr/common/main/fi.xml
>> 427K ./test/jdk/sun/nio/cs/OLD/GBK_OLD.java
>> 421K ./src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
>> 419K ./make/data/cldr/common/main/en.xml
>> 418K ./src/hotspot/cpu/x86/x86.ad
>> 416K ./make/data/cldr/common/main/sr_Latn.xml
>>
>> The list was compiled by running:
>>
>> find . -path ./.git -prune -o -type f -printf '%s %p\n' | sort -nr |
>> numfmt --field=1 --to=iec | head -n 100
>>
>
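For anyone who wants to reproduce the total size or a rough grouping,
here is a small sketch built on the quoted one-liner. Like the original
command it assumes GNU find (for -printf). The function names
(largest_files, largest_total, largest_by_top_dir) are made up for this
example, and grouping by the first path component is only a coarse
approximation of the hand-made grouping in the quoted mail.

```shell
#!/bin/sh
# Sketch only: assumes GNU find (-printf), as the quoted one-liner does.
# All function names are hypothetical.

# Print the N largest files under DIR as "<bytes> <path>", skipping DIR/.git.
largest_files() {
    dir=${1:-.}
    n=${2:-100}
    find "$dir" -path "$dir/.git" -prune -o -type f -printf '%s %p\n' |
        sort -nr | head -n "$n"
}

# Sum the sizes of those N files and print the total in bytes.
largest_total() {
    largest_files "$@" | awk '{ sum += $1 } END { print sum + 0 }'
}

# Group those N files by their first path component below DIR
# (src, test, make, ...) and print "<bytes> <component>", largest first.
largest_by_top_dir() {
    dir=${1:-.}
    largest_files "$@" |
        awk -v prefix="$dir/" '{
            path = substr($0, index($0, " ") + 1)   # everything after the size
            if (index(path, prefix) == 1)           # strip the leading DIR/
                path = substr(path, length(prefix) + 1)
            split(path, parts, "/")
            total[parts[1]] += $1
        } END {
            for (d in total) print total[d], d
        }' | sort -nr
}
```

Run from the repository root, "largest_total . 100" prints the combined
size in bytes of the 100 largest files, and "largest_by_top_dir . 100"
shows how that splits across the top-level directories. Piping any of
the outputs through "numfmt --field=1 --to=iec" (GNU coreutils) gives
the same human-readable sizes as in the list above.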