Problem with Bash regex test case sensitivity
Lee
ler762@gmail.com
Sat Dec 4 21:08:00 GMT 2010
On 12/4/10, Corinna Vinschen <corinna-cygwin > wrote:
> On Dec 4 10:05, Lee wrote:
>> On 12/3/10, Eric Blake <eblake@ > wrote:
>> > Read the FAQ. http://www.faqs.org/faqs/unix-faq/shell/bash/, E9.
>>
>> Which says the en_US locale collates the upper and lower case letters like
>> this:
>> AaBb...Zz
>>
>> I got that much :) What I don't get is why someone would _want_ the
>> collating sequence to be AaBb... or why that sequence was picked for
>> en_US instead of using the natural order of A-Za-z.
>
> It's not the "natural" order, it's an arbitrary order which was
> chosen back in 1963 when the ASCII code was defined. It's not used
> as "natural" order outside of computer systems and it's not even the
> natural order on some computer systems (See EBCDIC).

My idea of "natural order" is treating each character as an unsigned
integer. So even though ASCII has a different collating sequence than
EBCDIC, the characters are still treated as unsigned integers when
sorting them. Setting LANG to something other than C seems to break
that model.
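
A minimal sketch of that "unsigned integer" view, assuming a POSIX shell
and plain ASCII input. The helper name char_code is made up for
illustration; it relies on the POSIX printf rule that an argument with a
leading quote is converted to the numeric value of the character after it.

```shell
# Print the numeric code of a single ASCII character.
# A leading single quote in a printf numeric argument yields the
# code of the character that follows it (POSIX printf behaviour).
char_code() {
    printf '%d\n' "'$1"
}

# Show where the two ends of each alphabet sit in ASCII.
for c in A Z a z; do
    printf '%s = %s\n' "$c" "$(char_code "$c")"
done
# prints, one per line: A = 65, Z = 90, a = 97, z = 122
```

Because every upper case code is smaller than every lower case code, a
comparison by code value gives A-Z followed by a-z.
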
> If you take a look at a hardcopy encyclopedia written in English,
> you'll probably be quite comfortable that the words are ordered
> lexicographically rather than by ASCII code.

I never paid all that much attention to how the words were ordered,
but now that I have... they're backwards! "god" comes before "God",
"hopper" before "Hopper", etc.
> Needless to say, ordering
> criteria for non-English languages may contain more characters in the
> sequence, in German for instance
>
> "AaäBb...Ooö...Ssß...Uuü...Zz"
>
> So, let's reiterate:
>
> - If I need the order for the computer language, I say so:
>
> LC_COLLATE=C.UTF-8
>
> - Otherwise, if I need the order for the natural language, I say so:
>
> LC_COLLATE=en_US.UTF-8
> LC_COLLATE=de_DE.UTF-8
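
The choice above can be tried out per command. This is only a sketch,
assuming a POSIX shell and a sort that honours LC_COLLATE; the function
name sort_with_collation is hypothetical. LC_ALL is unset inside the
subshell because, if it is set, it overrides LC_COLLATE.

```shell
# Sort stdin using the collation rules of the locale named in $1.
# The function body is a subshell, so the caller's environment is untouched.
sort_with_collation() (
    unset LC_ALL            # LC_ALL would take precedence over LC_COLLATE
    LC_COLLATE=$1
    export LC_COLLATE
    sort
)

# "Computer" order: compare by byte value.
printf '%s\n' god God hopper Hopper | sort_with_collation C
# prints, one per line: God, Hopper, god, hopper
```

Passing en_US.UTF-8 or de_DE.UTF-8 instead of C should give the natural
language order, but only if that locale is installed, and the exact
result depends on the system's locale data.
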

You're quite good at explaining this... I think I'm actually beginning
to understand it :)
So... setting LANG is just a shorthand for setting all the LC_xxx
environment variables?
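
As far as I understand the POSIX rules, LANG is the default for every
category that has no LC_xxx variable of its own, and LC_ALL overrides
both. A rough sketch of that lookup, with a made-up helper name
effective_locale; it only inspects the variables and does not check
whether the named locale is installed. The final fallback to C is an
assumption, since POSIX leaves that default up to the implementation.

```shell
# Resolve which locale name applies to one category, for example LC_COLLATE.
# Order: LC_ALL, then the category variable, then LANG, else "C" (assumed).
effective_locale() {
    category=$1
    eval "specific=\${$category:-}"     # value of the variable named by $1
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

# LANG alone supplies the value for the category.
( unset LC_ALL LC_COLLATE; LANG=en_US.UTF-8; effective_locale LC_COLLATE )
# prints en_US.UTF-8

# An explicit LC_COLLATE wins over LANG for that one category.
( unset LC_ALL; LANG=en_US.UTF-8; LC_COLLATE=C; effective_locale LC_COLLATE )
# prints C
```

On a real system the locale command reports the same information for
all categories at once.
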
Thanks,
Lee