RFR: 8356893: Use "stdin.encoding" for reading System.in with InputStreamReader/Scanner

Alan Bateman alanb at openjdk.org
Thu May 22 05:54:52 UTC 2025


On Wed, 21 May 2025 20:56:31 GMT, Volkan Yazici <vyazici at openjdk.org> wrote:

> There are several locations in the JDK source where `System.in` and `FileDescriptor.in` are read with `InputStreamReader` and `Scanner` using the default charset. As recommended by the recently merged [JDK-8356420](https://bugs.openjdk.org/browse/JDK-8356420), this PR replaces the default charset with the one provided by the `stdin.encoding` system property.
> 
> ### Fixing strategy
> 
> * Where it is obvious that `System.in` is passed to `InputStreamReader`/`Scanner` ctors, `stdin.encoding` is employed.
> * Where it is difficult to determine whether the `InputStream` passed to `InputStreamReader`/`Scanner` ctors can ever be `System.in`, `assert` expressions are placed.
> * Where the odds of receiving `System.in` are low, yet it is technically possible (e.g., `Process::getInputStream`, `URL::openConnection`, `Class::getResourceAsStream`), nothing is done.
> 
> @naotoj was kind enough to guide me in this PR, and stated that the `assert` expressions can be skipped, since there are many ways to circumvent those checks: wrapping `System.in`, usage of `System::setIn`, etc. Yet we decided to leave them as is to collect feedback from other reviewers too.
> 
> ### Scanning strategy
> 
> The following ~alien technology~ advanced static analysis tools are used to scan the code for potentially affected places:
> 
> ```
> # Perl is used for multi-line matching
> find . -name "*.java" -exec perl -0777 -ne 'my $r = (/(InputStreamReader|Scanner)\(\s*System\.in\)/) ? 0 : 1; exit $r' {} \; -print
> git grep -H 'FileDescriptor.in' "*.java"
> ```
> 
> All calls to `InputStreamReader::new` and `Scanner::new` are checked too.
> 
> ### Problems encountered
> 
> 1. Due to an irregular or non-existent license header, the copyright year could not be updated for the following classes:
> 
>     ```
>     DOMImplementationRegistry 
>     InputRC 
>     ListingErrorHandler 
>     PandocFilter 
>     ```
> 2. Could not employ `stdin.encoding` in `PandocFilter`, since the bootstrap VM running that class returns empty for that system property.
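
For reference, the first fixing strategy in the quoted description can be sketched roughly as below. This is only an illustration, not code from the PR: `StdinCharset` and `resolve` are hypothetical names, and the fallback to `Charset.defaultCharset()` when the property is missing, empty (as reported for the bootstrap VM), or not a usable charset name is an assumption about what a caller might want.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of the JDK or of the PR.
class StdinCharset {

    // Maps the raw property value to a Charset. Falls back to the default
    // charset (an assumption) when the value is null, empty, or unusable.
    static Charset resolve(String propertyValue) {
        if (propertyValue == null || propertyValue.isEmpty()) {
            return Charset.defaultCharset();
        }
        try {
            return Charset.forName(propertyValue);
        } catch (IllegalArgumentException e) {
            // Covers both IllegalCharsetNameException and UnsupportedCharsetException.
            return Charset.defaultCharset();
        }
    }

    public static void main(String[] args) throws IOException {
        // getProperty returns null on runtimes that do not define the property.
        Charset cs = resolve(System.getProperty("stdin.encoding"));
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in, cs));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
    }
}
```

Keeping the lookup in one small method makes the fallback behaviour explicit and easy to test without touching `System.in`.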

javax.swing.text.DefaultEditorKit.read(InputStream ...) is one example changed in the PR. If that is changed to special case System.in then it will require a spec change. It also means that `read(System.in)` will behave differently from, say, `read(new BufferedInputStream(System.in))`. From a quick scan, I suspect changes that impact the APIs will need to be dropped from the PR, maybe replaced with spec clarification to document the charset that is actually used. In the DefaultEditorKit.read example it might direct folks to the read(Reader ..) method so that code can control which charset to use.
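
A minimal sketch of both points, under stated assumptions: `ReaderOverloadSketch`, `isSystemIn` and `load` are hypothetical names used only for illustration, and the identity comparison is just one way a `System.in` special case might be written. The first helper shows why a wrapped stream would not be recognised; the second shows a caller choosing the charset via the existing `read(Reader, Document, int)` method.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.swing.text.DefaultEditorKit;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;

// Hypothetical illustration; not code from the PR.
class ReaderOverloadSketch {

    // An identity check of the kind a System.in special case would rely on.
    // Any wrapper around System.in is a different object and is not detected.
    static boolean isSystemIn(InputStream in) {
        return in == System.in;
    }

    // The caller picks the charset by wrapping the stream in a Reader and
    // using read(Reader, Document, int) instead of the InputStream overload.
    static String load(InputStream in, Charset cs) throws Exception {
        Document doc = new PlainDocument();
        Reader reader = new InputStreamReader(in, cs);
        new DefaultEditorKit().read(reader, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isSystemIn(System.in));
        System.out.println(isSystemIn(new BufferedInputStream(System.in)));

        // Stand-in for stdin so the example runs without interactive input.
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(load(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8));
    }
}
```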

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25368#issuecomment-2899998753

