This is the mail archive of the mauve-discuss@sources.redhat.com mailing list for the Mauve project.


Re: new test cases (long)

From: Mark Wielaard <mark at klomp dot org>
To: raif at fl dot net dot au
Cc: Mauve <mauve-discuss at sources dot redhat dot com>
Date: 16 Feb 2003 18:21:04 +0100
Subject: Re: new test cases (long)
References: <200302090318.02317.raif@fl.net.au> <200302151101.21183.raif@fl.net.au> <1045396437.30179.522.camel@elsschot> <200302170348.00588.raif@fl.net.au>

Hi Raif,

On Sun, 2003-02-16 at 17:47, Raif S. Naffah wrote:
> i agree it's somewhat confusing but we can reduce this by sticking to 
> the documentation and the behaviour of the implementation (sun's jdk 
> that is).

I am not convinced that what the documentation says is always precisely
what the Sun implementation does and/or that what the Sun implementation
does is what the documentation (should) say...

> > Also note that the 1.4 docs and 1.4.1 encoding docs actually list
> > different canonical names... Duh...
> 
> where exactly does the 1.4 and the 1.4.1 differ?

I didn't scan the documents very carefully, but I immediately noticed that
http://java.sun.com/j2se/1.4/docs/guide/intl/encoding.doc.html
says that the canonical name for what is called "Windows Latin-1" is
Cp1252 for the java.nio API, but that
http://java.sun.com/j2se/1.4.1/docs/guide/intl/encoding.doc.html
says it is "windows-1252" for java.nio, but Cp1252 for java.lang/io.
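
A quick way to see which name each API reports on a given runtime is a
small probe like the one below. It is only a sketch: the class and method
names (EncodingNameProbe, nioName, ioName) are made up for illustration,
and the list of names in main is just the pair discussed above plus one
more example. What it prints depends on the class library being run.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper, not part of Mauve: reports the name each API
// uses for a requested encoding name, or null if it is not supported.
class EncodingNameProbe {

    // Canonical name according to java.nio.charset.Charset.
    static String nioName(String name) {
        try {
            return Charset.forName(name).name();
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return null;
        }
    }

    // Name reported by java.io (InputStreamReader.getEncoding()).
    static String ioName(String name) {
        try (InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), name)) {
            return reader.getEncoding();
        } catch (IOException e) {
            // UnsupportedEncodingException is an IOException.
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = {"Cp1252", "windows-1252", "ISO8859_1", "ISO-8859-1"};
        for (String n : names) {
            System.out.println(n + ": java.nio=" + nioName(n)
                    + ", java.io=" + ioName(n));
        }
    }
}
```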

> another alternative is to stick to the distinction the javadocs makes 
> wrt. to the following aspects:
> 
> * specific packages use specific, albeit sometimes, different 
> encoding/charset names;
> * some names are "canonical" others are "aliases,"
> * some names are a MUST (Basic), others (the international version of 
> the JDK) are a MAY (Extended).
> 
> this way, gnu.testlet.java.lang.String.getBytes can be the test point 
> for java.lang.* API encoding names, and something like (a new) 
> gnu.testlet.java.nio.charset.Charset.isSupported test would emulate the 
> same for the java.nio.* API encoding names.
> [...]
> if my comments above are acceptable, i can revise the getBytes classes 
> to handle distinctly the last 2 points (canonical v/s alias, and basic 
> v/s extended), and write a new test case for java.nio.* API 
> conformance.  the pass/fail requirements can then be controlled with an 
> 'xfails' file.

Sure. Having more tests so that one can test how/what character set
names are actually supported by the class library implementation would
be very welcome. But I don't know if following the Sun (canonical)
naming convention (especially differences between java.lang/io and
java.nio names) makes much sense here since it looks very confusing for
users.
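
For the java.nio side, a check along these lines could separate names
that must be there from names that merely may be there. This is a rough
standalone sketch, not a Mauve testlet: the class name and the
unsupported() helper are invented here, the REQUIRED list is the set of
six charsets the java.nio.charset.Charset documentation requires of every
implementation, and the OPTIONAL list is just an example taken from this
thread.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: required names count as failures when missing,
// optional names are only reported.
class CharsetSupportSketch {

    // Charsets every java.nio implementation is documented to support.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // Example of a name an implementation may or may not provide.
    static final String[] OPTIONAL = {"windows-1252"};

    // Returns the subset of names that Charset.isSupported rejects.
    static List<String> unsupported(String[] names) {
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            boolean ok;
            try {
                ok = Charset.isSupported(name);
            } catch (IllegalArgumentException e) {
                ok = false; // syntactically illegal charset name
            }
            if (!ok) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("FAIL (required, missing): " + unsupported(REQUIRED));
        System.out.println("NOTE (optional, missing): " + unsupported(OPTIONAL));
    }
}
```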

One of the documents above points to the IANA Charset Registry
http://www.iana.org/assignments/character-sets (rfc2278)
It defines official names and aliases (and it looks automatically
parsable, which is a plus). I would take this document to create some
automatically generated tests (having a script so that new revisions of
the registry can be used to regenerate the tests).
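
To give an idea of what such a script might do, here is a minimal sketch
of the parsing step. It assumes the registry is plain text made of
"Name:" lines followed by "Alias:" lines, with "Alias: None" meaning no
alias and trailing remarks after the first token; that layout should be
checked against the actual file. The SAMPLE text is a hand-written
stand-in, not a copy of the registry, and the class and method names are
invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collect official names and their aliases from
// registry-style text so tests can be generated from the result.
class IanaRegistrySketch {

    // Hand-written sample in the assumed layout (not real registry data).
    static final String SAMPLE =
          "Name: ANSI_X3.4-1968            [some reference]\n"
        + "MIBenum: 3\n"
        + "Alias: iso-ir-6\n"
        + "Alias: US-ASCII (preferred MIME name)\n"
        + "\n"
        + "Name: UTF-8                     [some reference]\n"
        + "MIBenum: 106\n"
        + "Alias: None\n";

    // Maps each "Name:" entry to the list of its "Alias:" entries.
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : text.split("\\r?\\n")) {
            if (line.startsWith("Name:")) {
                current = new ArrayList<>();
                result.put(firstToken(line.substring(5)), current);
            } else if (line.startsWith("Alias:") && current != null) {
                String alias = firstToken(line.substring(6));
                if (!alias.isEmpty() && !alias.equals("None")) {
                    current.add(alias);
                }
            }
        }
        return result;
    }

    // First whitespace-delimited token, dropping trailing remarks.
    static String firstToken(String s) {
        String t = s.trim();
        int end = 0;
        while (end < t.length() && !Character.isWhitespace(t.charAt(end))) {
            end++;
        }
        return t.substring(0, end);
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<String>> e : parse(SAMPLE).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Each name/alias pair in the resulting map could then be turned into one
generated check.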

It still makes sense to look at the (historical) character set names
that Sun defines but which aren't in the IANA Charset Registry and
create tests that at least make sure that an implementation can alias
those names to something in the official IANA Charset Registry (and the
names tested now in getBytes13 and getBytes14 look like they must be
supported by all platforms).
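
Such an alias test could be as simple as checking that the historical
name and the IANA name resolve to the same Charset. The sketch below
assumes that approach; the class name, the resolvesTo() helper and the
PAIRS table are illustrative only and are not taken from getBytes13 or
getBytes14.

```java
import java.nio.charset.Charset;

// Hypothetical sketch: does a historical name alias to an IANA name?
class HistoricalAliasSketch {

    // {historical java.lang/io name, IANA registry name}; example pairs.
    static final String[][] PAIRS = {
        {"ASCII", "US-ASCII"},
        {"ISO8859_1", "ISO-8859-1"},
        {"UTF8", "UTF-8"},
        {"Cp1252", "windows-1252"},
    };

    // True if both names are supported and name the same charset.
    static boolean resolvesTo(String historical, String iana) {
        try {
            return Charset.forName(historical).equals(Charset.forName(iana));
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported name on this implementation.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String[] pair : PAIRS) {
            System.out.println(pair[0] + " -> " + pair[1] + ": "
                    + resolvesTo(pair[0], pair[1]));
        }
    }
}
```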

Cheers,

Mark

References:
- new test cases (long)
  - From: Raif S. Naffah
- Re: new test cases (long)
  - From: Raif S. Naffah
- Re: new test cases (long)
  - From: Mark Wielaard
- Re: new test cases (long)
  - From: Raif S. Naffah
