<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>I got intrigued by how
<a class="moz-txt-link-freetext" href="https://bugs.openjdk.org/browse/JDK-8339507">https://bugs.openjdk.org/browse/JDK-8339507</a> could integrate a 7 MB
file without anybody noticing, so I started wondering how
many other huge text files there are in our repo. (We are much more
restrictive with binary files, even if they are small...)</p>
<p>So I compiled a top 100 list, which basically ended up being all
files larger than 400 kB. In total, these 100 files account for
about 82 MB of data. I'm not saying that any of these files are wrong
per se, but maybe some of the files on this list could provide a
bit of food for thought. The complete top list is further down, but
it is a bit hard to get a grip on, so I sorted and grouped the
result first; the large files are not randomly sprinkled
throughout the code base. The grouped list does not contain test files.
The huge test files are more numerous, but there are also (imho)
more compelling reasons in general to allow for bigger files in
testing. With that said, even some of the test files seem a bit
excessive. (And one cannot help but wonder what kind of file
src/java.base/share/data/unicodedata/NormalizationTest.txt really
is.)
</p>
<div style="color: #000000;background-color: #ffffff;font-family: Hack,'Droid Sans Mono', 'monospace', monospace, Menlo, Monaco, 'Courier New', monospace;font-weight: normal;font-size: 12px;line-height: 18px;white-space: pre;"><div><span style="color: #000000;">Character sets and localization:</span></div>
<div><span style="color: #000000;">* make/data/charsetmapping</span></div><div><span style="color: #000000;">* make/data/cldr</span></div><div><span style="color: #000000;">* src/java.base/share/data/lsrdata/</span></div><div><span style="color: #000000;">* src/java.base/share/data/unicodedata</span></div><div><span style="color: #000000;">* src/java.base/share/classes/java/lang/Character.java</span></div><div><span style="color: #000000;">* src/java.base/share/classes/sun/nio/cs/GB18030.java</span></div><div><span style="color: #000000;">* src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java</span></div><div><span style="color: #000000;">* src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template</span></div>
<div><span style="color: #000000;">3rd party source:</span></div>
<div><span style="color: #000000;">* src/jdk.incubator.vector/*/native/libjsvml/*.S</span></div><div><span style="color: #000000;">* src/java.base/share/native/libzip/zlib/crc32.h</span></div><div><span style="color: #000000;">* src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h</span></div>
<div><span style="color: #000000;">Symbols from previous JDKs:</span></div><div><span style="color: #000000;">* src/jdk.compiler/share/data/symbols</span></div>
<div><span style="color: #000000;">Huge HotSpot files:</span></div><div><span style="color: #000000;">* src/hotspot/cpu/*/*.ad</span></div><div><span style="color: #000000;">* src/hotspot/cpu/x86/assembler_x86.cpp</span></div><div><span style="color: #000000;">* src/hotspot/share/prims/jvmti.xml</span></div>
<div><span style="color: #000000;">Other:</span></div><div><span style="color: #000000;">* src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf</span></div><div><span style="color: #000000;">* src/java.base/share/classes/java/lang/invoke/MethodHandles.java</span></div><div><span style="color: #000000;">* src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java</span></div>
<div><span style="color: #000000;">And a binary file:</span></div><div><span style="color: #000000;">* src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg</span></div>
</div>
<p></p>
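<p>For what it's worth, a rough first pass at the grouping above can be automated. This is only a sketch: the helper name <code>group_by_dir</code> is made up for illustration, it expects "size path" lines like those produced by <code>find -printf '%s %p\n'</code>, and it groups strictly by parent directory, whereas the grouping above was done by hand.</p>

```shell
# Hypothetical helper: read "SIZE PATH" lines on stdin and print the
# combined size per parent directory, largest first.
group_by_dir() {
    awk '{
        size = $1
        path = $0
        sub("^[0-9]+ ", "", path)   # strip the size column, keep the path
        sub("/[^/]*$", "", path)    # drop the file name, keep the directory
        total[path] += size
    }
    END { for (d in total) printf "%.0f %s\n", total[d], d }' | sort -nr
}
```

<p>Feeding it the raw byte counts (before any <code>numfmt</code> step) keeps the sums exact; the result can be made human readable afterwards.</p>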
<p>And here is the complete top list:<br>
</p>
<pre>6.8M ./test/jdk/java/foreign/libTestUpcallStack.c
3.5M ./test/jdk/java/foreign/libTestDowncallStack.c
2.7M ./test/jdk/com/sun/net/httpserver/docs/test1/largefile.txt
2.6M ./src/java.base/share/data/unicodedata/NormalizationTest.txt
2.3M ./test/jdk/sun/nio/cs/EUC_TW_OLD.java
2.1M ./src/jdk.compiler/share/data/symbols/java.desktop-8.sym.txt
2.0M ./src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.Corrigendum4.txt
2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.txt
1.9M ./src/java.base/share/data/unicodedata/UnicodeData.txt
1.6M ./test/hotspot/jtreg/gc/TestBigObj.java
1.5M ./test/jdk/java/foreign/libTestUpcall.c
1.4M ./src/jdk.compiler/share/data/symbols/java.base-8.sym.txt
1.2M ./test/jdk/java/lang/String/concat/ImplicitStringConcatShapes.java
1.1M ./src/java.base/share/data/unicodedata/DerivedCoreProperties.txt
952K ./test/hotspot/jtreg/compiler/c2/stemmer/words
941K ./src/jdk.compiler/share/data/symbols/java.base-M.sym.txt
928K ./make/data/charsetmapping/EUC_TW.map
927K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/INDIFY_Test.java
912K ./make/data/cldr/common/supplemental/likelySubtags.xml
898K ./make/data/charsetmapping/MS936.map
865K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template
857K ./test/jdk/java/foreign/libTestDowncall.c
843K ./test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java
830K ./src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
794K ./make/data/cldr/common/main/ru.xml
774K ./test/jdk/sun/nio/cs/mapping/GB18030_2000.b2c
774K ./test/jdk/sun/nio/cs/mapping/GB18030.b2c
767K ./test/jdk/jdk/internal/math/ToDecimal/java.base/jdk/internal/math/DoubleToDecimalChecker.java
752K ./make/data/cldr/common/main/uk.xml
742K ./make/data/charsetmapping/Johab.map
741K ./test/jdk/sun/nio/cs/mapping/Johab.b2c
739K ./src/java.base/share/classes/sun/nio/cs/GB18030.java
733K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_tan_linux_x86.S
731K ./make/data/charsetmapping/MS950.map
727K ./test/jdk/sun/nio/cs/mapping/MS950.b2c
709K ./src/java.base/share/data/lsrdata/language-subtag-registry.txt
698K ./make/data/charsetmapping/MS949.map
695K ./test/jdk/sun/nio/cs/mapping/MS949.b2c
655K ./src/hotspot/cpu/x86/assembler_x86.cpp
647K ./test/jdk/java/lang/instrument/BigClass.java
634K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_sin_linux_x86.S
628K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_cos_linux_x86.S
616K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_tan_windows_x86.S
601K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java
597K ./src/hotspot/share/prims/jvmti.xml
597K ./test/jdk/sun/security/ec/SigGen-1.txt
593K ./make/data/cldr/common/main/lt.xml
582K ./make/data/cldr/common/main/cs.xml
579K ./src/java.base/share/native/libzip/zlib/crc32.h
577K ./make/data/cldr/common/main/sk.xml
577K ./src/jdk.compiler/share/data/symbols/java.desktop-9.sym.txt
572K ./test/jdk/javax/swing/text/html/parser/Parser/8078268/slowparse.html
567K ./test/jdk/sun/nio/cs/OLD/IBM933_OLD.java
539K ./test/jdk/sun/nio/cs/mapping/untested/gb18030_1.b2c
536K ./test/micro/org/openjdk/bench/vm/gc/RawAllocationRate.java
534K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_sin_windows_x86.S
532K ./make/data/cldr/common/main/ff_Adlm.xml
531K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_cos_windows_x86.S
526K ./test/jdk/sun/nio/cs/mapping/EUC_TW.b2c
524K ./src/jdk.compiler/share/data/symbols/java.desktop-B.sym.txt
523K ./make/data/cldr/common/main/pl.xml
520K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/stress/java/loopsAndThreads/INDIFY_Test.java
518K ./make/data/cldr/common/main/sl.xml
510K ./test/jdk/sun/nio/cs/OLD/IBM950_OLD.java
509K ./make/data/cldr/common/main/mr.xml
507K ./make/data/cldr/common/main/kn.xml
505K ./test/jdk/sun/nio/cs/OLD/IBM948_OLD.java
504K ./make/data/cldr/common/main/sr.xml
503K ./test/jdk/sun/nio/cs/OLD/IBM937_OLD.java
502K ./test/jdk/sun/net/www/protocol/jar/foo1.jar
501K ./make/data/cldr/common/main/ta.xml
496K ./test/jdk/sun/nio/cs/OLD/Johab_OLD.java
490K ./test/jdk/sun/nio/cs/OLD/MS949_OLD.java
489K ./test/hotspot/jtreg/vmTestbase/vm/jit/LongTransitions/LTTest.java
485K ./test/hotspot/jtreg/vmTestbase/jit/FloatingPoint/FPCompare/TestFPBinop/TestFPBinop.gold
485K ./src/hotspot/cpu/aarch64/aarch64.ad
478K ./src/hotspot/cpu/ppc/ppc.ad
467K ./make/data/cldr/common/main/gd.xml
466K ./src/java.base/share/classes/java/lang/Character.java
453K ./make/data/cldr/common/main/ar.xml
452K ./test/jdk/sun/nio/cs/OLD/IBM949_OLD.java
446K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_pow_linux_x86.S
445K ./make/data/cldr/common/main/cy.xml
443K ./make/data/cldr/common/main/ml.xml
442K ./make/data/cldr/common/main/br.xml
442K ./test/jdk/sun/nio/cs/OLD/MS950_OLD.java
442K ./make/data/cldr/common/main/hr.xml
441K ./src/hotspot/cpu/x86/x86_32.ad
438K ./src/java.base/share/classes/java/lang/invoke/MethodHandles.java
436K ./test/jaxp/javax/xml/jaxp/unittest/transform/msgAttach.xml
433K ./make/data/cldr/common/main/el.xml
432K ./src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java
429K ./make/data/cldr/common/main/lv.xml
428K ./make/data/cldr/common/main/fi.xml
427K ./test/jdk/sun/nio/cs/OLD/GBK_OLD.java
421K ./src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
419K ./make/data/cldr/common/main/en.xml
418K ./src/hotspot/cpu/x86/x86.ad
416K ./make/data/cldr/common/main/sr_Latn.xml
</pre>
<p>The list was compiled by running:</p>
<pre>find . -path ./.git -prune -o -type f -printf '%s %p\n' | sort -nr | numfmt --field=1 --to=iec | head -n 100</pre>
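<p>A variant of the same pipeline could skip the test directory and add up the listed sizes in one go. This is a sketch, not what was actually run: the function name <code>largest_src_files</code> and its two arguments are assumptions for illustration, and like the original command it relies on GNU find and GNU coreutils.</p>

```shell
# Hypothetical helper: list the N largest files outside ./.git and ./test,
# followed by a TOTAL line with their combined size.
# $1: repository root (default "."), $2: number of files (default 100)
largest_src_files() (
    root="${1:-.}"
    count="${2:-100}"
    cd "$root" || exit 1
    # Prune both directories, then print "SIZE PATH" for every regular file.
    find . \( -path ./.git -o -path ./test \) -prune -o -type f -printf '%s %p\n' \
        | sort -nr \
        | head -n "$count" \
        | awk '{ total += $1; print } END { printf "%.0f TOTAL\n", total }' \
        | numfmt --field=1 --to=iec
)
```

<p>The <code>head</code> step is moved before <code>numfmt</code> here so that the sum is computed on raw byte counts rather than on rounded values.</p>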
</body>
</html>