grep treating my text files as binary!

Bengt Larsson lists.cygwin4@bengtl.net
Sat Dec 27 10:07:00 GMT 2014


Warren Young wrote:
>On Dec 25, 2014, at 11:41 AM, Thomas Wolff <towo@towo.net> wrote:
>
>> In any case the argument is quite artificial since the new behaviour
>> hits many files that are in fact text files.
>
>Please define the term “text file” in a way that allows a C programmer
>to write a program that automatically does the correct thing for all
>members of the class “text file” without involving locales, or an
>equivalent mechanism.
...
>If grep runs into a byte sequence that makes it think it is not legal
>for your current locale, it must treat the file as raw bytes, unless you
>give it -a.
>
>If you don’t like this behavior, say “alias grep=grep -a” in your
>~/.bashrc, and forget the change ever happened.  It’ll be on you when
>some non-text file gets treated as text and grep spams your terminal
>with binary garbage, though.

It's better to use the "alias grep='LC_ALL=C grep'" method. It keeps the
old way of detecting binaries (for example, it still detects an .EXE as
binary) while letting you match mostly-ASCII files that contain a few
characters not valid in the current locale. The definition you ask for is
already in the code. For us non-English speakers, treating "mostly ASCII"
as text is mostly right, at least for interactive use.
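
A rough sketch of the difference, for anyone who wants to try it. The
file name and its contents are made up for illustration, and what the
first grep prints depends on your grep version and current locale:

```shell
# Hypothetical sample: one line containing the byte 0xE9, which is an
# e-acute in CP1252 but an invalid sequence on its own in UTF-8.
printf 'docs/caf\351\n' > dirlist.txt

# Uses whatever locale is currently set. In a UTF-8 locale, newer grep
# versions may report "Binary file dirlist.txt matches" instead of
# printing the matching line.
grep 'docs' dirlist.txt

# With the C locale forced for this one command, the line is expected to
# be printed as before; the 0xE9 byte is passed through unchanged.
LC_ALL=C grep 'docs' dirlist.txt

# For interactive use, the same prefix can go into ~/.bashrc as an alias:
#   alias grep='LC_ALL=C grep'
```

The alias only changes the locale that grep itself sees; the rest of
the shell session keeps its normal locale settings.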

I ran into this myself, actually. I keep a list of my directories, and it
is in CP1252 so that it can interface with CMD.EXE. Suddenly grep couldn't
match it. I figured something was up, set my locale to CP1252, and then
it worked.
