Largest files in the JDK repo
Naoto Sato
naoto.sato at oracle.com
Thu Oct 24 16:38:16 UTC 2024
Thanks, Magnus, for the food for thought.
Character set and localization data is inherently large, since it covers
every Unicode code point and every locale. But I do see some possibility of
reducing it, or of removing the parts that exist only for compatibility
reasons, such as the old GB18030 or *_OLD mappings.
> excessive. (And one cannot help but wonder what kind of file
> src/java.base/share/data/unicodedata/NormalizationTest.txt really is.)
This file is a golden file from the Unicode Consortium, used to check
conformance to their normalization spec.
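
As a rough illustration of what checking against such a golden file can look like, here is a minimal sketch that tests a single line using java.text.Normalizer. It assumes the five-column, semicolon-separated layout (c1 through c5, hex code points, trailing # comment) and the invariants described in the header of NormalizationTest.txt. The class and method names are made up for this example, and this is not the JDK's actual test code.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class NormalizationLineCheck {

    // Decode one column such as "0044 0307" into a Java string.
    static String decode(String column) {
        StringBuilder sb = new StringBuilder();
        for (String hex : column.trim().split("\\s+")) {
            sb.appendCodePoint(Integer.parseInt(hex, 16));
        }
        return sb.toString();
    }

    // Check the invariants for one data line (assumed layout: c1;c2;c3;c4;c5; # comment):
    //   NFC:  c2 == NFC(c1..c3),  c4 == NFC(c4..c5)
    //   NFD:  c3 == NFD(c1..c3),  c5 == NFD(c4..c5)
    //   NFKC: c4 == NFKC(c1..c5)
    //   NFKD: c5 == NFKD(c1..c5)
    static boolean conforms(String line) {
        int hash = line.indexOf('#');
        String data = hash >= 0 ? line.substring(0, hash) : line;
        String[] cols = data.split(";");
        if (cols.length < 5) {
            throw new IllegalArgumentException("expected 5 columns: " + line);
        }
        String[] c = new String[5];
        for (int i = 0; i < 5; i++) {
            c[i] = decode(cols[i]);
        }
        for (int i = 0; i < 5; i++) {
            String expectedNfc = i < 3 ? c[1] : c[3];
            String expectedNfd = i < 3 ? c[2] : c[4];
            if (!Normalizer.normalize(c[i], Form.NFC).equals(expectedNfc)) return false;
            if (!Normalizer.normalize(c[i], Form.NFD).equals(expectedNfd)) return false;
            if (!Normalizer.normalize(c[i], Form.NFKC).equals(c[3])) return false;
            if (!Normalizer.normalize(c[i], Form.NFKD).equals(c[4])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative line for U+1E0A LATIN CAPITAL LETTER D WITH DOT ABOVE.
        String line = "1E0A;1E0A;0044 0307;1E0A;0044 0307; # D WITH DOT ABOVE";
        System.out.println(conforms(line));
    }
}
```

A real conformance run would also need to skip the @Part headers and comment-only lines, which this sketch does not handle.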
Naoto
On 10/24/24 4:04 AM, Magnus Ihse Bursie wrote:
> I was intrigued by how https://bugs.openjdk.org/browse/JDK-8339507 could
> integrate a 7 MB file without anybody noticing, so I started
> wondering how many other huge text files there are in our repo. (We are
> much more restrictive with binary files, even if they are small...)
>
> So I compiled a top 100 list, which basically ended up being all files
> larger than 400 kB. In total, these 100 files account for ca. 82 MB of
> data. I'm not saying that any of these files are wrong per se, but maybe
> some of the files on this list could provide a bit of food for thought.
> Further down is the complete top-list, but it is a bit hard to get a
> grip on. I sorted and grouped the result, since the large files are not
> randomly sprinkled throughout the code base. This list does not contain
> test files. The huge test files are more numerous, but there are also
> (imho) more compelling reasons in general to allow for bigger files in
> testing. With that said, even some of the test files seem a bit
> excessive. (And one cannot help but wonder what kind of file
> src/java.base/share/data/unicodedata/NormalizationTest.txt really is.)
>
> Character sets and localization:
> * make/data/charsetmapping
> * make/data/cldr
> * src/java.base/share/data/lsrdata/
> * src/java.base/share/data/unicodedata
> * src/java.base/share/classes/java/lang/Character.java
> * src/java.base/share/classes/sun/nio/cs/GB18030.java
> * src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java
> * src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template
> 3rd party source:
> * src/jdk.incubator.vector/*/native/libjsvml/*.S
> * src/java.base/share/native/libzip/zlib/crc32.h
> * src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
> Symbols from previous JDKs:
> * src/jdk.compiler/share/data/symbols
> Huge Hotspot files:
> * src/hotspot/cpu/*/*.ad
> * src/hotspot/cpu/x86/assembler_x86.cpp
> * src/hotspot/share/prims/jvmti.xml
> Other:
> * src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
> * src/java.base/share/classes/java/lang/invoke/MethodHandles.java
> * src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java
> And a binary file:
> * src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
>
> And here is the complete top list:
>
> 6.8M ./test/jdk/java/foreign/libTestUpcallStack.c
> 3.5M ./test/jdk/java/foreign/libTestDowncallStack.c
> 2.7M ./test/jdk/com/sun/net/httpserver/docs/test1/largefile.txt
> 2.6M ./src/java.base/share/data/unicodedata/NormalizationTest.txt
> 2.3M ./test/jdk/sun/nio/cs/EUC_TW_OLD.java
> 2.1M ./src/jdk.compiler/share/data/symbols/java.desktop-8.sym.txt
> 2.0M ./src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
> 2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.Corrigendum4.txt
> 2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.txt
> 1.9M ./src/java.base/share/data/unicodedata/UnicodeData.txt
> 1.6M ./test/hotspot/jtreg/gc/TestBigObj.java
> 1.5M ./test/jdk/java/foreign/libTestUpcall.c
> 1.4M ./src/jdk.compiler/share/data/symbols/java.base-8.sym.txt
> 1.2M ./test/jdk/java/lang/String/concat/ImplicitStringConcatShapes.java
> 1.1M ./src/java.base/share/data/unicodedata/DerivedCoreProperties.txt
> 952K ./test/hotspot/jtreg/compiler/c2/stemmer/words
> 941K ./src/jdk.compiler/share/data/symbols/java.base-M.sym.txt
> 928K ./make/data/charsetmapping/EUC_TW.map
> 927K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/INDIFY_Test.java
> 912K ./make/data/cldr/common/supplemental/likelySubtags.xml
> 898K ./make/data/charsetmapping/MS936.map
> 865K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template
> 857K ./test/jdk/java/foreign/libTestDowncall.c
> 843K ./test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java
> 830K ./src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
> 794K ./make/data/cldr/common/main/ru.xml
> 774K ./test/jdk/sun/nio/cs/mapping/GB18030_2000.b2c
> 774K ./test/jdk/sun/nio/cs/mapping/GB18030.b2c
> 767K ./test/jdk/jdk/internal/math/ToDecimal/java.base/jdk/internal/math/DoubleToDecimalChecker.java
> 752K ./make/data/cldr/common/main/uk.xml
> 742K ./make/data/charsetmapping/Johab.map
> 741K ./test/jdk/sun/nio/cs/mapping/Johab.b2c
> 739K ./src/java.base/share/classes/sun/nio/cs/GB18030.java
> 733K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_tan_linux_x86.S
> 731K ./make/data/charsetmapping/MS950.map
> 727K ./test/jdk/sun/nio/cs/mapping/MS950.b2c
> 709K ./src/java.base/share/data/lsrdata/language-subtag-registry.txt
> 698K ./make/data/charsetmapping/MS949.map
> 695K ./test/jdk/sun/nio/cs/mapping/MS949.b2c
> 655K ./src/hotspot/cpu/x86/assembler_x86.cpp
> 647K ./test/jdk/java/lang/instrument/BigClass.java
> 634K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_sin_linux_x86.S
> 628K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_cos_linux_x86.S
> 616K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_tan_windows_x86.S
> 601K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java
> 597K ./src/hotspot/share/prims/jvmti.xml
> 597K ./test/jdk/sun/security/ec/SigGen-1.txt
> 593K ./make/data/cldr/common/main/lt.xml
> 582K ./make/data/cldr/common/main/cs.xml
> 579K ./src/java.base/share/native/libzip/zlib/crc32.h
> 577K ./make/data/cldr/common/main/sk.xml
> 577K ./src/jdk.compiler/share/data/symbols/java.desktop-9.sym.txt
> 572K ./test/jdk/javax/swing/text/html/parser/Parser/8078268/slowparse.html
> 567K ./test/jdk/sun/nio/cs/OLD/IBM933_OLD.java
> 539K ./test/jdk/sun/nio/cs/mapping/untested/gb18030_1.b2c
> 536K ./test/micro/org/openjdk/bench/vm/gc/RawAllocationRate.java
> 534K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_sin_windows_x86.S
> 532K ./make/data/cldr/common/main/ff_Adlm.xml
> 531K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_cos_windows_x86.S
> 526K ./test/jdk/sun/nio/cs/mapping/EUC_TW.b2c
> 524K ./src/jdk.compiler/share/data/symbols/java.desktop-B.sym.txt
> 523K ./make/data/cldr/common/main/pl.xml
> 520K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/stress/java/loopsAndThreads/INDIFY_Test.java
> 518K ./make/data/cldr/common/main/sl.xml
> 510K ./test/jdk/sun/nio/cs/OLD/IBM950_OLD.java
> 509K ./make/data/cldr/common/main/mr.xml
> 507K ./make/data/cldr/common/main/kn.xml
> 505K ./test/jdk/sun/nio/cs/OLD/IBM948_OLD.java
> 504K ./make/data/cldr/common/main/sr.xml
> 503K ./test/jdk/sun/nio/cs/OLD/IBM937_OLD.java
> 502K ./test/jdk/sun/net/www/protocol/jar/foo1.jar
> 501K ./make/data/cldr/common/main/ta.xml
> 496K ./test/jdk/sun/nio/cs/OLD/Johab_OLD.java
> 490K ./test/jdk/sun/nio/cs/OLD/MS949_OLD.java
> 489K ./test/hotspot/jtreg/vmTestbase/vm/jit/LongTransitions/LTTest.java
> 485K ./test/hotspot/jtreg/vmTestbase/jit/FloatingPoint/FPCompare/TestFPBinop/TestFPBinop.gold
> 485K ./src/hotspot/cpu/aarch64/aarch64.ad
> 478K ./src/hotspot/cpu/ppc/ppc.ad
> 467K ./make/data/cldr/common/main/gd.xml
> 466K ./src/java.base/share/classes/java/lang/Character.java
> 453K ./make/data/cldr/common/main/ar.xml
> 452K ./test/jdk/sun/nio/cs/OLD/IBM949_OLD.java
> 446K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_pow_linux_x86.S
> 445K ./make/data/cldr/common/main/cy.xml
> 443K ./make/data/cldr/common/main/ml.xml
> 442K ./make/data/cldr/common/main/br.xml
> 442K ./test/jdk/sun/nio/cs/OLD/MS950_OLD.java
> 442K ./make/data/cldr/common/main/hr.xml
> 441K ./src/hotspot/cpu/x86/x86_32.ad
> 438K ./src/java.base/share/classes/java/lang/invoke/MethodHandles.java
> 436K ./test/jaxp/javax/xml/jaxp/unittest/transform/msgAttach.xml
> 433K ./make/data/cldr/common/main/el.xml
> 432K ./src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java
> 429K ./make/data/cldr/common/main/lv.xml
> 428K ./make/data/cldr/common/main/fi.xml
> 427K ./test/jdk/sun/nio/cs/OLD/GBK_OLD.java
> 421K ./src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
> 419K ./make/data/cldr/common/main/en.xml
> 418K ./src/hotspot/cpu/x86/x86.ad
> 416K ./make/data/cldr/common/main/sr_Latn.xml
>
> The list was compiled by running:
>
> find . -path ./.git -prune -o -type f -printf '%s %p\n' | sort -nr |
> numfmt --field=1 --to=iec | head -n 100
>
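
For anyone without GNU find and numfmt at hand (for example on Windows), roughly the same list can be produced in Java. The following is only a sketch of the same idea as the pipeline above: walk the tree, skip .git, keep regular files, sort by size, and print the top 100. The class name, the Entry record, and the iec helper are made up for this example. The size formatting only approximates numfmt --to=iec, so values may differ slightly from the list above. It needs JDK 16 or later because of the record.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class LargestFiles {

    record Entry(long size, Path path) {}

    // Collect regular files under root (pruning root/.git), largest first.
    static List<Entry> largest(Path root, int limit) throws IOException {
        List<Entry> entries = new ArrayList<>();
        Path git = root.resolve(".git");
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // Like "-path ./.git -prune": do not descend into the repository metadata.
                return dir.equals(git) ? FileVisitResult.SKIP_SUBTREE : FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // Like "-type f": symbolic links and special files are ignored.
                if (attrs.isRegularFile()) {
                    entries.add(new Entry(attrs.size(), file));
                }
                return FileVisitResult.CONTINUE;
            }
        });
        entries.sort(Comparator.comparingLong(Entry::size).reversed());
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    // Approximate IEC formatting: one decimal below 10 units, whole units otherwise, rounded up.
    static String iec(long bytes) {
        String units = "KMGTPE";
        double value = bytes;
        int unit = -1;
        while (value >= 1024 && unit < units.length() - 1) {
            value /= 1024;
            unit++;
        }
        if (unit < 0) {
            return Long.toString(bytes);
        }
        double oneDecimal = Math.ceil(value * 10) / 10;
        return oneDecimal < 10
                ? String.format(Locale.ROOT, "%.1f%c", oneDecimal, units.charAt(unit))
                : String.format(Locale.ROOT, "%.0f%c", Math.ceil(value), units.charAt(unit));
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        for (Entry e : largest(root, 100)) {
            System.out.println(iec(e.size()) + " " + e.path());
        }
    }
}
```

Run it from the top of a checkout, for example with "java LargestFiles.java .", to get output in the same "size path" shape as the list above.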