<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>I got intrigued by how
      <a class="moz-txt-link-freetext" href="https://bugs.openjdk.org/browse/JDK-8339507">https://bugs.openjdk.org/browse/JDK-8339507</a> could integrate a 7 MB
      file without anybody noticing, so I started wondering how
      many other huge text files there are in our repo. (We are much more
      restrictive with binary files, even if they are small...)</p>
    <p>So I compiled a top 100 list, which basically ended up being all
      files larger than 400 kB. In total, these 100 files account for
      about 82 MB of data. I'm not saying that any of these files are wrong
      per se, but maybe some of the files on this list could provide a
      bit of food for thought. The complete top list is further down, but
      it is a bit hard to get a grip on, so I sorted and grouped the
      result first; the large files are not randomly sprinkled
      throughout the code base. The grouped list does not contain test files.
      The huge test files are more numerous, but there are also (imho)
      more compelling reasons in general to allow for bigger files in
      testing. With that said, even some of the test files seem a bit
      excessive. (And one cannot help but wonder what kind of file
      src/java.base/share/data/unicodedata/NormalizationTest.txt really
      is.)<br>
    </p>
    <div style="color: #000000;background-color: #ffffff;font-family: Hack,'Droid Sans Mono', 'monospace', monospace, Menlo, Monaco, 'Courier New', monospace;font-weight: normal;font-size: 12px;line-height: 18px;white-space: pre;"><div><span style="color: #000000;">Character sets and localization:</span></div>
<div><span style="color: #000000;">* make/data/charsetmapping</span></div><div><span style="color: #000000;">* make/data/cldr</span></div><div><span style="color: #000000;">* src/java.base/share/data/lsrdata/</span></div><div><span style="color: #000000;">* src/java.base/share/data/unicodedata</span></div><div><span style="color: #000000;">* src/java.base/share/classes/java/lang/Character.java</span></div><div><span style="color: #000000;">* src/java.base/share/classes/sun/nio/cs/GB18030.java</span></div><div><span style="color: #000000;">* src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java</span></div><div><span style="color: #000000;">* src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template</span></div>
<div><span style="color: #000000;">3rd party source:</span></div>
<div><span style="color: #000000;">* src/jdk.incubator.vector/*/native/libjsvml/*.S</span></div><div><span style="color: #000000;">* src/java.base/share/native/libzip/zlib/crc32.h</span></div><div><span style="color: #000000;">* src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h</span></div>
<div><span style="color: #000000;">Symbols from previous JDKs:</span></div><div><span style="color: #000000;">* src/jdk.compiler/share/data/symbols</span></div>
<div><span style="color: #000000;">Huge HotSpot files:</span></div><div><span style="color: #000000;">* src/hotspot/cpu/*/*.ad</span></div><div><span style="color: #000000;">* src/hotspot/cpu/x86/assembler_x86.cpp</span></div><div><span style="color: #000000;">* src/hotspot/share/prims/jvmti.xml</span></div>
<div><span style="color: #000000;">Other:</span></div><div><span style="color: #000000;">* src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf</span></div><div><span style="color: #000000;">* src/java.base/share/classes/java/lang/invoke/MethodHandles.java</span></div><div><span style="color: #000000;">* src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java</span></div>
<div><span style="color: #000000;">And a binary file:</span></div><div><span style="color: #000000;">* src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg</span></div>
</div>
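    <p>For anyone who wants to redo the grouping, here is a rough Python
      sketch of one way to do it. It is not the method used for the list
      above, which was grouped by hand. It assumes input lines in the
      "size path" form of the complete top list, skips everything under
      ./test/, and sums the sizes per leading path prefix. The names
      parse_size and group_non_test and the depth parameter are made up for
      this illustration, and the byte counts are only approximate since
      they are recovered from rounded values.</p>
<pre>
```python
from collections import defaultdict

# Multipliers for the suffixes that `numfmt --to=iec` emits.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Turn a string such as '6.8M' or '952K' into an approximate byte count."""
    if text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def group_non_test(lines, depth=2):
    """Sum sizes per leading path prefix, ignoring everything under ./test/."""
    totals = defaultdict(int)
    for line in lines:
        # Split only once so that paths containing spaces stay intact.
        size, path = line.split(maxsplit=1)
        path = path.strip()
        if path.startswith("./test/"):
            continue
        parts = path.split("/")[1:]  # drop the leading "."
        prefix = "/".join(parts[:depth])
        totals[prefix] += parse_size(size)
    return dict(totals)
```
</pre>
    <p>Feeding it the lines of the complete top list would give one total
      per prefix such as make/data or src/hotspot; a larger depth gives a
      finer grouping.</p>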
    <p></p>
    <p>And here is the complete top list:<br>
    </p>
    <pre>6.8M ./test/jdk/java/foreign/libTestUpcallStack.c
3.5M ./test/jdk/java/foreign/libTestDowncallStack.c
2.7M ./test/jdk/com/sun/net/httpserver/docs/test1/largefile.txt
2.6M ./src/java.base/share/data/unicodedata/NormalizationTest.txt
2.3M ./test/jdk/sun/nio/cs/EUC_TW_OLD.java
2.1M ./src/jdk.compiler/share/data/symbols/java.desktop-8.sym.txt
2.0M ./src/java.desktop/share/classes/javax/swing/plaf/nimbus/skin.laf
2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.Corrigendum4.txt
2.0M ./test/jdk/java/text/Normalizer/NormalizationTest-3.2.0.txt
1.9M ./src/java.base/share/data/unicodedata/UnicodeData.txt
1.6M ./test/hotspot/jtreg/gc/TestBigObj.java
1.5M ./test/jdk/java/foreign/libTestUpcall.c
1.4M ./src/jdk.compiler/share/data/symbols/java.base-8.sym.txt
1.2M ./test/jdk/java/lang/String/concat/ImplicitStringConcatShapes.java
1.1M ./src/java.base/share/data/unicodedata/DerivedCoreProperties.txt
952K ./test/hotspot/jtreg/compiler/c2/stemmer/words
941K ./src/jdk.compiler/share/data/symbols/java.base-M.sym.txt
928K ./make/data/charsetmapping/EUC_TW.map
927K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/INDIFY_Test.java
912K ./make/data/cldr/common/supplemental/likelySubtags.xml
898K ./make/data/charsetmapping/MS936.map
865K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM964.java.template
857K ./test/jdk/java/foreign/libTestDowncall.c
843K ./test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java
830K ./src/java.desktop/share/native/common/java2d/opengl/J2D_GL/glext.h
794K ./make/data/cldr/common/main/ru.xml
774K ./test/jdk/sun/nio/cs/mapping/GB18030_2000.b2c
774K ./test/jdk/sun/nio/cs/mapping/GB18030.b2c
767K ./test/jdk/jdk/internal/math/ToDecimal/java.base/jdk/internal/math/DoubleToDecimalChecker.java
752K ./make/data/cldr/common/main/uk.xml
742K ./make/data/charsetmapping/Johab.map
741K ./test/jdk/sun/nio/cs/mapping/Johab.b2c
739K ./src/java.base/share/classes/sun/nio/cs/GB18030.java
733K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_tan_linux_x86.S
731K ./make/data/charsetmapping/MS950.map
727K ./test/jdk/sun/nio/cs/mapping/MS950.b2c
709K ./src/java.base/share/data/lsrdata/language-subtag-registry.txt
698K ./make/data/charsetmapping/MS949.map
695K ./test/jdk/sun/nio/cs/mapping/MS949.b2c
655K ./src/hotspot/cpu/x86/assembler_x86.cpp
647K ./test/jdk/java/lang/instrument/BigClass.java
634K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_sin_linux_x86.S
628K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_cos_linux_x86.S
616K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_tan_windows_x86.S
601K ./src/jdk.charsets/share/classes/sun/nio/cs/ext/IBM33722.java
597K ./src/hotspot/share/prims/jvmti.xml
597K ./test/jdk/sun/security/ec/SigGen-1.txt
593K ./make/data/cldr/common/main/lt.xml
582K ./make/data/cldr/common/main/cs.xml
579K ./src/java.base/share/native/libzip/zlib/crc32.h
577K ./make/data/cldr/common/main/sk.xml
577K ./src/jdk.compiler/share/data/symbols/java.desktop-9.sym.txt
572K ./test/jdk/javax/swing/text/html/parser/Parser/8078268/slowparse.html
567K ./test/jdk/sun/nio/cs/OLD/IBM933_OLD.java
539K ./test/jdk/sun/nio/cs/mapping/untested/gb18030_1.b2c
536K ./test/micro/org/openjdk/bench/vm/gc/RawAllocationRate.java
534K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_sin_windows_x86.S
532K ./make/data/cldr/common/main/ff_Adlm.xml
531K ./src/jdk.incubator.vector/windows/native/libjsvml/jsvml_d_cos_windows_x86.S
526K ./test/jdk/sun/nio/cs/mapping/EUC_TW.b2c
524K ./src/jdk.compiler/share/data/symbols/java.desktop-B.sym.txt
523K ./make/data/cldr/common/main/pl.xml
520K ./test/hotspot/jtreg/vmTestbase/vm/mlvm/indy/stress/java/loopsAndThreads/INDIFY_Test.java
518K ./make/data/cldr/common/main/sl.xml
510K ./test/jdk/sun/nio/cs/OLD/IBM950_OLD.java
509K ./make/data/cldr/common/main/mr.xml
507K ./make/data/cldr/common/main/kn.xml
505K ./test/jdk/sun/nio/cs/OLD/IBM948_OLD.java
504K ./make/data/cldr/common/main/sr.xml
503K ./test/jdk/sun/nio/cs/OLD/IBM937_OLD.java
502K ./test/jdk/sun/net/www/protocol/jar/foo1.jar
501K ./make/data/cldr/common/main/ta.xml
496K ./test/jdk/sun/nio/cs/OLD/Johab_OLD.java
490K ./test/jdk/sun/nio/cs/OLD/MS949_OLD.java
489K ./test/hotspot/jtreg/vmTestbase/vm/jit/LongTransitions/LTTest.java
485K ./test/hotspot/jtreg/vmTestbase/jit/FloatingPoint/FPCompare/TestFPBinop/TestFPBinop.gold
485K ./src/hotspot/cpu/aarch64/aarch64.ad
478K ./src/hotspot/cpu/ppc/ppc.ad
467K ./make/data/cldr/common/main/gd.xml
466K ./src/java.base/share/classes/java/lang/Character.java
453K ./make/data/cldr/common/main/ar.xml
452K ./test/jdk/sun/nio/cs/OLD/IBM949_OLD.java
446K ./src/jdk.incubator.vector/linux/native/libjsvml/jsvml_d_pow_linux_x86.S
445K ./make/data/cldr/common/main/cy.xml
443K ./make/data/cldr/common/main/ml.xml
442K ./make/data/cldr/common/main/br.xml
442K ./test/jdk/sun/nio/cs/OLD/MS950_OLD.java
442K ./make/data/cldr/common/main/hr.xml
441K ./src/hotspot/cpu/x86/x86_32.ad
438K ./src/java.base/share/classes/java/lang/invoke/MethodHandles.java
436K ./test/jaxp/javax/xml/jaxp/unittest/transform/msgAttach.xml
433K ./make/data/cldr/common/main/el.xml
432K ./src/java.sql.rowset/share/classes/com/sun/rowset/CachedRowSetImpl.java
429K ./make/data/cldr/common/main/lv.xml
428K ./make/data/cldr/common/main/fi.xml
427K ./test/jdk/sun/nio/cs/OLD/GBK_OLD.java
421K ./src/demo/share/java2d/J2DBench/resources/cmm_images/img_icc_large.jpg
419K ./make/data/cldr/common/main/en.xml
418K ./src/hotspot/cpu/x86/x86.ad
416K ./make/data/cldr/common/main/sr_Latn.xml
</pre>
    <p>The list was compiled by running:</p>
    <pre>find . -path ./.git -prune -o -type f -printf '%s %p\n' | sort -nr | numfmt --field=1 --to=iec | head -n 100
</pre>
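    <p>The -printf option is a GNU find extension and numfmt comes from
      GNU coreutils, so the command may not work as-is on every platform.
      Below is a minimal Python sketch of roughly the same idea, in case
      it is useful. The function names largest_files and human are
      invented for this example, the size formatting only approximates
      what numfmt prints (the rounding differs), and unlike the command
      above it skips a directory named .git at any depth, not only at the
      top level.</p>
<pre>
```python
import heapq
import os


def largest_files(root, limit=100, skip_dirs=(".git",)):
    """Return up to `limit` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Count regular files only, like `-type f`; symlinks are skipped.
            if os.path.isfile(path) and not os.path.islink(path):
                sizes.append((os.path.getsize(path), path))
    return heapq.nlargest(limit, sizes)


def human(size):
    """Format a byte count with a binary suffix and one decimal."""
    units = ["", "K", "M", "G", "T"]
    value = float(size)
    index = 0
    while value >= 1024 and index != len(units) - 1:
        value /= 1024
        index += 1
    if index == 0:
        return str(size)
    return f"{value:.1f}{units[index]}"


if __name__ == "__main__":
    for size, path in largest_files("."):
        print(human(size), path)
```
</pre>
    <p>Run from the root of a checkout, this prints one "size path" line
      per file, largest first, in the same general shape as the list
      above.</p>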
  </body>
</html>