This is the mail archive of the cygwin mailing list for the Cygwin project.

RE: gawk 4.1.4: CR separate char for CRLF files

Hi Roger,

On Wed, 9 Aug 2017 07:03:24 +0000, Roger Krebs wrote:
> I've added a BEGIN section at the beginning of the awk script file setting
> the separator explicitly for the input file (RS) as well as for the output
> file (ORS):
> BEGIN {
>         RS="\r\n"
>         ORS="\r\n"
> }
> {
>    ... your script
> }
> Especially the RS parameter wasn't necessary in the past but now it is.

Which is pretty much a pain when no easy fallback solution is provided
after a major change has been applied. E.g. for sed - if I understand the
reference to sed correctly - a separate switch '-b' was added. For the
latest gawk version I cannot see anything like that, which means that all of
our awk scripts run against cygwin's gawk break unless they are tweaked,
unless I am missing something here.

This is - to say the least - unpleasant in the light of what Cygwin claims
to be, namely 'a large collection of GNU and Open Source tools which provide
functionality similar to a Linux distribution on Windows' (from the top of
the start website). Again, admittedly I did not dive into the discussion and
the substance of the reasoning behind making this move for gawk | sed | grep.

Now I can see the following *easy* solutions to the very situation here
(input only for now):

1 - Inserting the BEGIN section as you suggested into more than 1k scripts
(not feasible due to additional regression test workload) 

2 - Calling "gawk -v 'RS=\r\n' -v 'ORS=\r\n'" instead of 'gawk' (hack to undo
the additional complexity of the latest gawk, wrapper needed)

3 - Wrapping a d2u/u2d pipe solution (additional app and wrapper needed)

4 - Using another compiled version of gawk which does *not* disable the
out-of-the-box gawk feature to swallow CRs, i.e. without the artificial
obstacle of now having to know the EOL type of the input file ahead of
running gawk.
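
For illustration, option 2 could be sketched as a small shell function. This
is only a sketch under assumptions: the name gawk_crlf and the AWK override
variable are made up here and are not part of gawk or Cygwin. It presets RS
and ORS to CRLF and passes everything else through unchanged.

```shell
# Hypothetical wrapper for option 2: preset CRLF record separators for
# input (RS) and output (ORS), then pass all remaining arguments through.
# The AWK variable is an assumed override hook; it defaults to gawk.
gawk_crlf() {
    "${AWK:-gawk}" -v 'RS=\r\n' -v 'ORS=\r\n' "$@"
}

# Usage: existing scripts stay untouched, only the call changes, e.g.
#   gawk_crlf -f script.awk input.txt
#   printf 'a b\r\nc d\r\n' | gawk_crlf '{ print $2 }'
```

The single quotes keep the shell from eating the backslashes, so it is awk
that turns \r\n into CR LF. A script that sets RS or ORS itself in a BEGIN
section would still override these presets.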

> It works in all my cases. The only disadvantage: you have to know what kind

... plus the disadvantage of having to systematically amend all the scripts
instead of having an external solution

> of files you want to handle in the awk script. The same awk script will
> work for DOS files as well as for linux files.

... another issue caused by the change, one which didn't exist before.
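
A possible way around having to know the EOL type in advance, sketched here
under the assumption that treating RS as a regular expression is acceptable
(gawk does this for any RS longer than one character): let RS match an
optional CR followed by LF. The function name gawk_anyeol and the AWK
override variable are hypothetical.

```shell
# Hypothetical wrapper: accept both CRLF and LF input by making the CR
# optional in the record separator regex. Output uses the default ORS (LF).
# The AWK variable is an assumed override hook; it defaults to gawk.
gawk_anyeol() {
    "${AWK:-gawk}" -v 'RS=\r?\n' "$@"
}

# Usage: the same call works for DOS and Unix files alike, e.g.
#   gawk_anyeol '{ print $2 }' dos.txt unix.txt
```

Note that this only covers the input side; whether the output should carry
CRLF again is a separate decision.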

> Best
> Roger

Please don't get me wrong, but this raises a real issue here and I am not
sure which rationale other than 'let's get more of the Linux-feel' drove the

All the best,

> -----Original Message-----
> From: [] On
> Behalf Of Jannick
> Sent: Wednesday, 9 August 2017 02:48
> To:
> Subject: RE: gawk 4.1.4: CR separate char for CRLF files
> On Tue, 08 Aug 2017 16:23:40 -0700 (PDT), Steven Penny wrote:
> > On Wed, 9 Aug 2017 01:15:08, "Jannick" wrote:
> > > the current version 4.1.4 of gawk appears to unpleasantly treat CR
> > > for CRLF files, i.e. CR is not gracefully swallowed, but is a
> > > separate character.
> > >
> > > This makes some, if not all, of the scripts we are working with here
> > > useless, unless the input files are converted to LF which certainly
> > > is not feasible. IIRC the issue did not show up some versions back.
> > >
> > > Is this a bug - or am I missing something here?
> >
> > Learn to read:
> >
> >
> Thanks - quickly done.
> The link reveals that CRLF/LF conversion is now mandatory to work with
> cygwin's gawk on DOS machines. As far as I can see there is no legacy
> option like, e.g., sed's -b switch to provide an easy solution for this,
> especially when invoking gawk from makefiles (piping).
> I consider this bad news while admittedly not fully understanding the
> background of the move, which is not necessary for now.
