Problem with join and accented characters

Eric Blake ebb9@byu.net
Thu Mar 31 17:39:00 GMT 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Boris New on 3/31/2005 5:30 AM:
> Hi,
> 
> Join in coreutils 5.3.03 gives incomplete results when the two files
> include French accented characters (for instance
> é|è|â|ï|ü|ê|ç|î|ô|û|ü|ë|à|ù).
> Results are okay when only one of the text files has accented characters.

I'll need more details on what you think is broken (hint: post two short
files that you actually tried to join, along with the results you got and
the results you expected).  Also, join in coreutils-5.3.0-3 is unmodified
from upstream sources, so you may want to ask this question on the upstream
list (bug-coreutils@gnu.org).  But it may have something to do with file
encodings; if your two inputs use different encodings, the same accented
character is not necessarily stored as the same underlying bytes, and that
can keep join from matching the fields.  Also, join requires both files to
be sorted on the join fields; if they are not, the results are unpredictable.
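
To make the two points above concrete, here is a rough sketch of how one
might test them.  The file names and sample words are made up for
illustration and are not your data; the accented bytes are written as octal
escapes so that the script does not depend on the encoding of your terminal.
It assumes that forcing LC_ALL=C for both sort and join is acceptable, which
gives a plain byte-wise ordering that the two tools agree on.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory (file names are illustrative).
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# "é" is the bytes C3 A9 (octal 303 251) in UTF-8, but E9 (octal 351) in ISO-8859-1.
printf '\303\251t\303\251 summer\nzoo animals\nchat cat\n' > a.txt   # UTF-8 keys
printf 'chat 1\n\303\251t\303\251 2\nzoo 3\n' > b.txt                # UTF-8 keys
printf '\351t\351 2\n' > c.latin1                                    # ISO-8859-1 key

# Sort both inputs on the join field, using the same collation that join will use.
LC_ALL=C sort -k1,1 a.txt > a.sorted
LC_ALL=C sort -k1,1 b.txt > b.sorted

# Same encoding, same sort order: all three keys pair up.
result=$(LC_ALL=C join a.sorted b.sorted)
printf '%s\n' "$result"

# Different encodings: the key looks identical on screen, but the bytes differ,
# so join finds no match and prints nothing.
mixed=$(LC_ALL=C join a.sorted c.latin1)
printf 'mixed-encoding matches: [%s]\n' "$mixed"
```

If your real files behave like the second case, converting one of them so
that both use the same encoding (for example with iconv) and re-sorting
before the join would be the first thing to try.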

- --
Life is short - so eat dessert first!

Eric Blake             ebb9@byu.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCTAFF84KuGfSFAYARAjX0AKCy83MHsdGJFx0kvsexYBPV6CnR2QCgrbfV
IVM1USqaQS3U8bdr1vV0Kck=
=D383
-----END PGP SIGNATURE-----


