Insufficiencies in JEP: 400: UTF-8 by Default

Roger Riggs roger.riggs at oracle.com
Wed Mar 31 17:03:54 UTC 2021


Hi Anthony,

A draft of updates to the Process API is in the works; it covers improving
ease of use and providing Readers and Writers.  Note that if process output
is redirected to a file, the Process API does not interpose on the byte
streams and is not in a position to affect the character set used by the
child process.
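
For illustration only (this is not the draft API mentioned above): a minimal
sketch of the use case, in which a child process writes to a redirected file
and the parent later decodes that file with an explicitly chosen charset. The
class and method names are hypothetical. It assumes the child writes in the
platform's native encoding, and it relies on the `native.encoding` system
property, which exists only on JDK 17 and later; on older releases the sketch
falls back to `Charset.defaultCharset()`.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, for illustration only.
class RedirectedOutputReader {

    // Best guess at the encoding a child process uses for stdout.
    // Assumption: the child writes in the platform's native encoding.
    public static Charset nativeCharset() {
        // "native.encoding" is set on JDK 17+; it is null on older releases.
        String name = System.getProperty("native.encoding");
        try {
            if (name != null && Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        } catch (IllegalArgumentException e) {
            // Malformed charset name: fall through to the default below.
        }
        return Charset.defaultCharset();
    }

    // Decode a file that a child process wrote via output redirection.
    public static String readRedirectedOutput(Path file, Charset charset)
            throws IOException {
        return Files.readString(file, charset);
    }

    public static void main(String[] args)
            throws IOException, InterruptedException {
        Path log = Files.createTempFile("child-output", ".log");
        // Launch the same JVM binary that runs this program.
        String java = Path.of(System.getProperty("java.home"), "bin", "java")
                .toString();
        Process child = new ProcessBuilder(java, "-version")
                .redirectErrorStream(true)      // merge stderr into stdout
                .redirectOutput(log.toFile())   // bytes go straight to the file
                .start();
        child.waitFor();
        // The parent must supply the charset itself when reading the file.
        System.out.print(readRedirectedOutput(log, nativeCharset()));
    }
}
```

Because the bytes never pass through the parent JVM, the only place a charset
can be applied is at read time, which is why the sketch takes it as an
explicit parameter.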

Regards, Roger


On 3/30/21 1:03 PM, Anthony Vanelverdinghe wrote:
> Hi Alan
>
> As Marco mentioned, another use case is sub-process stdin/stdout/stderr. In my particular instance, I'm starting a Process which has its output redirected to a file. It uses the platform's default encoding for writing to stdout. So when I want to read its output from the file at some later point, I need to supply that encoding to the Files API.
> One way to accommodate this use case is a method that retrieves the platform's default encoding, for example a method `platformEncoding` in Charset or Process, or the `Console::charset` method you mentioned. Another option would be to enhance the Process API by adding methods to Process which return appropriate Readers/Writers and adding methods of the form `redirectX(File file, Charset encoding)` to ProcessBuilder. But this seems like a lot of additional API surface, just to avoid surfacing the platform's default encoding itself.
> So I think the JEP should specify how it'll address use cases w.r.t. the Process API, shouldn't it?
>
> Kind regards,
> Anthony
>   
> On Sunday, March 14, 2021 13:01 CET, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>   
>> On 14/03/2021 11:00, Marco wrote:
>>> :
>>>
>>> IMO Charset should provide standardized getters for the OS charset and the
>>> console charset. The latter being different has been a long-standing issue on
>>> Windows where the codepage differs between its CLI and regular environments.
>>> OpenJDK has the necessary data already available in its custom system
>>> properties.
>>>
>>> The console charset is currently hidden behind PrintStream, which neither
>>> exposes the underlying OutputStreamWriter nor offers getEncoding() itself.
>>> The OS charset would be hidden in the future by the specification change
>>> to Charset.defaultCharset() in JEP 400.
>> The intention is that there will be little or no impact to the console
>> streams. This means that java.io.Console reader/writer methods should
>> continue to return a Reader/PrintWriter that uses the platform encoding
>> (or code page on Windows). Same thing for the System.out/System.err
>> print streams. We need to make this clearer in the JEP.
>>
>> There has been discussion on this mailing list about adding a
>> Console::charset method but it didn't come to a consensus. Naoto Sato
>> and I have been chatting about it again recently as there may be a need
>> to add an API in advance of proposing to target the JEP.
>>
>> One case that we are still mulling over is code that creates an
>> InputStreamReader on System.in without specifying the charset. This
>> might be older code that pre-dates java.io.Console or maybe code that
>> wasn't tested on a wide range of platforms. Options range from a spec
>> change to doing nothing (the latter meaning running with "COMPAT" or
>> migrating the code to use the 2-arg constructor, as the default charset
>> is not the right choice).
>>
>> -Alan
>>
>>
>>


