This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: Problem with join and accentuated characters

According to Boris New on 3/31/2005 1:54 PM:
> Hi,
> I send you the zip file with the two files. I tested a lot of windows
> port and all have this problem. I thought it was perhaps due to locale
> on windows.
> The format is the same and files are sorted. Everything is ok if I
> remove accentuated words from rand.txt.

Contrary to your assertion, your files were not sorted.  Or put another
way, they weren't sorted by the rules that join expects.  Some locales
treat é and e as the same collating character, but the C locale, which is
the default on cygwin, is not one of them: it compares raw byte values, so
é sorts after every unaccented ASCII letter.  Hence, join gave up at the
first line where the order failed to match its expectations.

Run the following on each of your two files (shown here for rand.txt) to
see this:
$ sort < rand.txt > randsort.txt
$ diff rand.txt randsort.txt

join will work the way you want only if diff reports no differences for
both files, in the locale you are using.
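
As a rough sketch of the whole workflow, assuming UTF-8 input and two
made-up sample files (a.txt and b.txt are hypothetical names, not the files
from your zip), you can sort both inputs under the same locale that join
runs in, confirm the order with sort -c, and then join them:

```shell
# Hypothetical sample data; \303\251 is the UTF-8 byte sequence for é.
tmp=$(mktemp -d)
printf 'zebra 1\n\303\251t\303\251 2\napple 3\n' > "$tmp/a.txt"
printf 'apple x\nzebra y\n\303\251t\303\251 z\n' > "$tmp/b.txt"

# Sort both inputs under the same locale that join will use.
LC_ALL=C sort "$tmp/a.txt" > "$tmp/a.sorted"
LC_ALL=C sort "$tmp/b.txt" > "$tmp/b.sorted"

# sort -c exits non-zero if a file is not in sorted order for this locale.
LC_ALL=C sort -c "$tmp/a.sorted" && LC_ALL=C sort -c "$tmp/b.sorted"

# Join on the first field, again under the same locale.
result=$(LC_ALL=C join "$tmp/a.sorted" "$tmp/b.sorted")
printf '%s\n' "$result"
```

The point is only that sort and join must agree on the locale.  If you
would rather work in another locale, set the same LC_ALL value for both
commands instead of C.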

--
Life is short - so eat dessert first!

Eric Blake   

