Inconsistencies when creating a Reader from a Path

Norbert Kiesel nkiesel at gmail.com
Fri Feb 28 18:22:26 UTC 2020


[Note: I originally sent the mail below to jdk-dev, but Alan Bateman
suggested re-posting it here.]

The following 2 ways to construct a `Reader` for a `Path` look very
similar (with a slight edge for the first one because it is shorter):

```java
Reader reader1 = Files.newBufferedReader(path, StandardCharsets.UTF_8);
Reader reader2 = new BufferedReader(new InputStreamReader(
        Files.newInputStream(path), StandardCharsets.UTF_8));
```
Both readers ultimately create a `StreamDecoder` that is used to read
and decode the input.  However, they use different constructors:

 - `reader1` calls the constructor with a `CharsetDecoder` created from the `Charset` using `newDecoder()` (which keeps the default `CodingErrorAction.REPORT` actions)
 - `reader2` calls the constructor with the `Charset`, which `StreamDecoder` then converts into a `CharsetDecoder` using `newDecoder().onMalformedInput(CodingErrorAction.REPLACE).onUnmappableCharacter(CodingErrorAction.REPLACE)`
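To see the two decoder configurations in isolation, here is a minimal sketch that feeds the same malformed UTF-8 bytes to `CharsetDecoder.decode(ByteBuffer)` with each one. The byte values (`'a'`, a lone `0xC3` lead byte, then `'('`) and the class and method names are my own choices for illustration; the decoders are built the same way as described above for `reader1` and `reader2`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecoderActions {
    // 'a', then 0xC3 (a two-byte UTF-8 lead byte) followed by '(' (0x28),
    // which is not a valid continuation byte: malformed UTF-8.
    static final byte[] INVALID = {(byte) 'a', (byte) 0xC3, (byte) '('};

    // Like the decoder reader1 gets: newDecoder() keeps the default REPORT actions.
    static CharsetDecoder strictUtf8() {
        return StandardCharsets.UTF_8.newDecoder();
    }

    // Like the decoder StreamDecoder builds from a Charset for reader2.
    static CharsetDecoder lenientUtf8() {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    // Decodes INVALID in one shot; returns the text or the exception's class name.
    static String decode(CharsetDecoder decoder) {
        try {
            return decoder.decode(ByteBuffer.wrap(INVALID)).toString();
        } catch (CharacterCodingException e) {
            return "error: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(strictUtf8()));  // error: MalformedInputException
        // U+FFFD is made visible so the output does not depend on the console encoding
        System.out.println(decode(lenientUtf8()).replace("\uFFFD", "<U+FFFD>"));  // a<U+FFFD>(
    }
}
```

The strict decoder reports the bad `0xC3` byte; the lenient one substitutes a single replacement character for it and then decodes `'('` normally.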

The result is that `reader1` will throw an exception (a `MalformedInputException`,
which is an `IOException`) when facing invalid input, while `reader2` will silently
"fix" that invalid input by substituting the replacement character (U+FFFD).
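An end-to-end sketch of the same difference using the two constructions from the top of this mail: it writes a few malformed UTF-8 bytes to a temporary file and reads it back with `reader1` and `reader2`. The temp-file setup, the helper names `readAll` and `compare`, and the sample bytes are assumptions made for the demo.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReaderComparison {
    // Reads the reader to the end and closes it; returns the text or the exception's class name.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = reader) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            return "error: " + e.getClass().getSimpleName();
        }
    }

    // Writes malformed UTF-8 to a temp file and reads it back both ways.
    static String[] compare() {
        try {
            Path path = Files.createTempFile("invalid-utf8", ".txt");
            try {
                // 'a', a 0xC3 lead byte without a valid continuation byte, then '('
                Files.write(path, new byte[] {(byte) 'a', (byte) 0xC3, (byte) '('});
                Reader reader1 = Files.newBufferedReader(path, StandardCharsets.UTF_8);
                String result1 = readAll(reader1);
                Reader reader2 = new BufferedReader(new InputStreamReader(
                        Files.newInputStream(path), StandardCharsets.UTF_8));
                String result2 = readAll(reader2);
                return new String[] {result1, result2};
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] results = compare();
        System.out.println("reader1: " + results[0]);  // reader1: error: MalformedInputException
        System.out.println("reader2: " + results[1].replace("\uFFFD", "<U+FFFD>"));  // reader2: a<U+FFFD>(
    }
}
```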

Which brings me to my questions:

 1. Shouldn't the two approaches behave identically by default?
 2. Is there a way to use the first approach but end up with the same
    error behavior as the second approach? One possible way would be to
    add an overloaded `Files.newBufferedReader` that takes a
    `CharsetDecoder` as its second parameter. Or perhaps there could be a
    way to create a modified `Charset` and pass that to
    `Files.newBufferedReader`?
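Until something like that exists, the overload from question 2 can be approximated with `InputStreamReader(InputStream, CharsetDecoder)`. The sketch below is a hypothetical helper, not part of `java.nio.file.Files`; the class name `Readers` and the methods `newBufferedReader`, `lenientUtf8` and `firstLineLenient` are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.OpenOption;
import java.nio.file.Path;

public final class Readers {
    private Readers() {}

    // HYPOTHETICAL: not part of java.nio.file.Files. Same shape as
    // Files.newBufferedReader(Path, Charset), but the caller supplies the
    // CharsetDecoder and therefore chooses the error actions.
    static BufferedReader newBufferedReader(Path path, CharsetDecoder decoder,
                                            OpenOption... options) throws IOException {
        return new BufferedReader(
                new InputStreamReader(Files.newInputStream(path, options), decoder));
    }

    // A UTF-8 decoder with REPLACE actions, matching what
    // InputStreamReader(InputStream, Charset) ends up using.
    static CharsetDecoder lenientUtf8() {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    // Writes the bytes to a temp file and returns its first line, read leniently.
    static String firstLineLenient(byte[] bytes) {
        try {
            Path path = Files.createTempFile("lenient", ".txt");
            try {
                Files.write(path, bytes);
                try (BufferedReader reader = newBufferedReader(path, lenientUtf8())) {
                    return reader.readLine();
                }
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = firstLineLenient(new byte[] {(byte) 'a', (byte) 0xC3, (byte) '('});
        System.out.println(line.replace("\uFFFD", "<U+FFFD>"));  // a<U+FFFD>(
    }
}
```

Passing `StandardCharsets.UTF_8.newDecoder()` instead gives the strict behavior of `Files.newBufferedReader`, so the same helper covers both directions. Its body is a single expression, so such an overload would mostly be a convenience.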

Alan's answer was:

The java.io APIs mostly date from early JDK releases. Changing
InputStreamReader to report errors with malformed input or unmappable
characters would be an incompatible change. Yes, replacing erroneous
input with a replacement value can be surprising in some usages. I think
we should at least improve the documentation of the APIs that bridge
between byte and character streams. The Files.newBufferedReader factory
method was intended to avoid surprises. It uses UTF-8 by default (not
the default charset) and is specified to return a Reader that throws
when malformed input or unmappable characters are encountered. Yes, a variant that
takes a CharsetDecoder could be added but that is an advanced API so the
method would only save one line of code for someone who has a
CharsetDecoder. Best to follow up on core-libs-dev, I think we should at
least improve the javadoc in a number of places.

-Alan


More information about the core-libs-dev mailing list