[RFC] netx: added encoding support for UTF-16 and UTF-32
Man Wong
mwong at redhat.com
Thu Jul 8 11:30:45 PDT 2010
----- "Deepak Bhole" <dbhole at redhat.com> wrote:
> * Man Wong <mwong at redhat.com> [2010-07-06 12:06]:
> > Hi,
> >
> > This patch adds UTF-16 and UTF-32 encoding support to netx, allowing
> > jnlp files saved in those encodings to launch in netx [1]. Previously,
> > when a jnlp file encoded in UTF-16 or UTF-32 was passed in, netx threw
> > an exception even though the file was valid. It would be greatly
> > appreciated if someone could look over the code, make sure it is ok,
> > and see whether additional comments are needed to make the code
> > easier to understand.
> >
> > Thanks,
> > Man Lung Wong
> >
> > [1] http://icedtea.classpath.org/~mwong/webstart/HelloWorld/Test.jnlp
> > (a simple Hello World applet I created to test this patch)
>
> InputStreamReader has a getEncoding() method[1]. Can that not be used
> instead of defining our own?
>
> 1:
> http://download.oracle.com/docs/cd/E17476_01/javase/1.4.2/docs/api/java/io/InputStreamReader.html#getEncoding%28%29
>
> Deepak
>
Their getEncoding() method only reports the encoding that the InputStreamReader was initialized with; it
does not inspect the stream itself. The getEncoding() method I defined supplies exactly that missing piece:
the charset to pass in when initializing InputStreamReader. If we don't pass in any charset, as before, the
reader falls back to the platform default charset (UTF-8 here), which was the cause of the problem.
Is there anything else?
Thanks,
Man Lung Wong
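
To illustrate the point about InputStreamReader.getEncoding() above, here is a minimal sketch. It is not part of the patch; the class and method names (ReaderEncodingDemo, reportedCharset, decode) are made up for the example. It only shows that the reader reports and uses whatever charset it was constructed with, regardless of what the bytes actually contain.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical demo class, not part of netx.
final class ReaderEncodingDemo {

    // Canonical name of the charset the reader says it is using.
    static String reportedCharset(byte[] data, Charset cs) {
        try (InputStreamReader r =
                 new InputStreamReader(new ByteArrayInputStream(data), cs)) {
            // getEncoding() may return a historical alias such as "UTF8";
            // Charset.forName() maps it back to the canonical name.
            return Charset.forName(r.getEncoding()).name();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes all bytes with the given charset, with no sniffing of any kind.
    static String decode(byte[] data, Charset cs) {
        StringBuilder out = new StringBuilder();
        try (InputStreamReader r =
                 new InputStreamReader(new ByteArrayInputStream(data), cs)) {
            int c;
            while ((c = r.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Feeding UTF-16LE bytes to a reader constructed with UTF-8 still reports UTF-8 and yields text with embedded NUL characters, which is why the charset has to be determined before the reader is created.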
> > diff -r 5c61be3119bb netx/net/sourceforge/jnlp/Parser.java
> > --- a/netx/net/sourceforge/jnlp/Parser.java Mon Jul 05 17:31:35 2010 +0100
> > +++ b/netx/net/sourceforge/jnlp/Parser.java Mon Jul 05 17:41:10 2010 -0400
> > @@ -1168,12 +1168,16 @@
> > Node document = new Node(TinyParser.parseXML(input));
> > Node jnlpNode = getChildNode(document, "jnlp"); // skip comments
> > */
> > +
> > + //A BufferedInputStream is used to allow marking and resetting
> > + //of a stream.
> > + BufferedInputStream bs = new BufferedInputStream(input);
> >
> > /* NANO */
> > final XMLElement xml = new XMLElement();
> > final PipedInputStream pin = new PipedInputStream();
> > - final PipedOutputStream pout = new PipedOutputStream(pin);
> > - final InputStreamReader isr = new InputStreamReader(input);
> > + final PipedOutputStream pout = new PipedOutputStream(pin);
> > + final InputStreamReader isr = new InputStreamReader(bs, getEncoding(bs));
> > // Clean the jnlp xml file of all comments before passing
> > // it to the parser.
> > new Thread(
> > @@ -1196,7 +1200,69 @@
> > throw new ParseException(R("PBadXML"), ex);
> > }
> > }
> > +
> > + /**
> > + * Returns the name of the encoding used in this InputStream.
> > + *
> > + * @param input the InputStream
> > + * @return a String representation of encoding
> > + */
> > + private static String getEncoding(InputStream input) throws IOException{
> > + //Fixme: This only recognizes UTF-8, UTF-16, and
> > + //UTF-32, which is enough to parse the prolog portion of xml to
> > + //find out the exact encoding (if it exists). The reason being
> > + //there could be other encodings, such as ISO 8859 which is 8-bits
> > + //but it supports latin characters.
> > + //So what needs to be done is to parse the prolog and retrieve
> > + //the exact encoding from it.
> >
> > + int[] s = new int[4];
> > + String encoding = "UTF-8";
> > +
> > + //Determine what the first four bytes are and store
> > + //them into an int array.
> > + input.mark(4);
> > + for (int i = 0; i < 4; i++) {
> > + s[i] = input.read();
> > + }
> > + input.reset();
> > +
> > + //Set the encoding based on what the first four bytes of the
> > + //inputstream turn out to be (following the information from
> > + //www.w3.org/TR/REC-xml/#sec-guessing).
> > + if (s[0] == 255) {
> > + if (s[1] == 254) {
> > + if (s[2] != 0 || s[3] != 0) {
> > + encoding = "UnicodeLittle";
> > + } else {
> > + encoding = "X-UTF-32LE-BOM";
> > + }
> > + }
> > + } else if (s[0] == 254 && s[1] == 255 && (s[2] != 0 ||
> > + s[3] != 0)) {
> > + encoding = "UTF-16";
> > +
> > + } else if (s[0] == 0 && s[1] == 0 && s[2] == 254 &&
> > + s[3] == 255) {
> > + encoding = "X-UTF-32BE-BOM";
> > +
> > + } else if (s[0] == 0 && s[1] == 0 && s[2] == 0 &&
> > + s[3] == 60) {
> > + encoding = "UTF-32BE";
> > +
> > + } else if (s[0] == 60 && s[1] == 0 && s[2] == 0 &&
> > + s[3] == 0) {
> > + encoding = "UTF-32LE";
> > +
> > + } else if (s[0] == 0 && s[1] == 60 && s[2] == 0 &&
> > + s[3] == 63) {
> > + encoding = "UTF-16BE";
> > + } else if (s[0] == 60 && s[1] == 0 && s[2] == 63 &&
> > + s[3] == 0) {
> > + encoding = "UTF-16LE";
> > + }
> > +
> > + return encoding;
> > + }
> > }
> >
> > -
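
For readers who want to try the detection logic outside of Parser.java, the following is a rough standalone sketch of the same idea: peek at the first four bytes with mark()/reset(), then map them to a charset name. It is not the patch itself. The class and method names (EncodingSniffer, peek4, guessEncoding) are invented for the example, and it deliberately returns the generic names "UTF-16" and "UTF-32" when a byte order mark is present instead of the "UnicodeLittle" and "X-UTF-32LE-BOM" names used in the patch. "UTF-32" is assumed to be available in the running JVM; it is not one of the charsets every Java platform is required to support.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of netx.
final class EncodingSniffer {

    // Reads the first four bytes of a mark-capable stream, then rewinds it.
    static int[] peek4(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        int[] b = new int[4];
        try {
            in.mark(4);
            for (int i = 0; i < 4; i++) {
                b[i] = in.read(); // -1 once the stream is exhausted
            }
            in.reset();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return b;
    }

    // Guesses a charset name from a byte order mark or from the encoded
    // form of "<?" / "<", loosely following www.w3.org/TR/REC-xml/#sec-guessing.
    static String guessEncoding(InputStream in) {
        int[] b = peek4(in);

        // Byte order mark present: let the BOM-aware decoders pick endianness.
        if (b[0] == 0xFF && b[1] == 0xFE) {
            return (b[2] == 0 && b[3] == 0) ? "UTF-32" : "UTF-16";
        }
        if (b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF) {
            return "UTF-32";
        }
        if (b[0] == 0xFE && b[1] == 0xFF) {
            return "UTF-16";
        }

        // No byte order mark: look at how '<' (0x3C) and '?' (0x3F) are laid out.
        if (b[0] == 0x00 && b[1] == 0x00 && b[2] == 0x00 && b[3] == 0x3C) {
            return "UTF-32BE";
        }
        if (b[0] == 0x3C && b[1] == 0x00 && b[2] == 0x00 && b[3] == 0x00) {
            return "UTF-32LE";
        }
        if (b[0] == 0x00 && b[1] == 0x3C && b[2] == 0x00 && b[3] == 0x3F) {
            return "UTF-16BE";
        }
        if (b[0] == 0x3C && b[1] == 0x00 && b[2] == 0x3F && b[3] == 0x00) {
            return "UTF-16LE";
        }

        // Anything else is treated as UTF-8, as in the patch.
        return "UTF-8";
    }
}
```

As in the patch, the caller would wrap the raw stream in a BufferedInputStream first, call guessEncoding(bs), and then hand both to new InputStreamReader(bs, name). The sketch shares the limitation noted in the Fixme comment: it does not read the encoding declared in the XML prolog.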