This is the mail archive of the cygwin mailing list for the Cygwin project.


gawk Regression: CR characters are not stripped on Windows


Cross-posting per Eli Zaretskii's request.

gawk used to strip CR characters automatically on Windows (MSYS2 and
Cygwin environments). This no longer happens in 4.2.0.

Minimal example:
echo -en "foo\r\n\r\nbar\r\n" > foo.txt
# Worked with 4.1.4, does not work with 4.2.0:
awk '/^$/ { print "found" }' foo.txt
# Works with 4.2.0, does not work with 4.1.4:
awk '/^\r$/ { print "found" }' foo.txt

Bisected to commit 5db38f775d9ba239e125d81dff2010a2ddacb48e:
(* gawkmisc.c (cygwin_premain0, cygwin_premain2): Remove.
No longer needed).

Apparently it's still needed...

This issue was reported in

Proposed patch is attached.

As Eli said, this change was deliberate, but it has several drawbacks.

1. The gawk info page states that:

> Under MS-Windows, 'gawk' (and many other text programs) silently
> translates end-of-line '\r\n' to '\n' on input and '\n' to '\r\n' on
> output.

and on Feb 8 the following section was added:

> Recent versions of Cygwin open all files in binary mode.  This means
> that you should use 'RS = "\r?\n"' in order to be able to handle
> standard MS-Windows text files with carriage-return plus line-feed line
> endings.

This breaks compatibility between gawk versions. What were the
reasons for this change in Cygwin, and why was it pushed upstream?

2. Git and other tools automatically convert text files to CRLF on
Windows. This means that any awk script that has to run on both Windows
and Unix must use RS = "\r?\n". One example broken by this behavior
change is Gerrit's commit-msg hook[1], which scans for empty lines with
the /^$/ regexp.
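For scripts that have to cope with both behaviors in the meantime, one possible workaround is sketched below. It strips a trailing CR from each record before matching, instead of setting RS = "\r?\n", so it does not depend on awk treating RS as a regular expression. The function name count_empty_lines is hypothetical and only serves the illustration; this is not the change made by the attached patch.

```shell
# Hypothetical helper: count empty lines in a file, whether it uses
# LF or CRLF line endings.
count_empty_lines() {
    # The first rule removes a trailing CR, if any, from every record.
    # The second rule then sees the stripped record, so /^$/ matches
    # empty lines in both kinds of file. "n + 0" prints 0 when no
    # empty line was found.
    awk '{ sub(/\r$/, "") } /^$/ { n++ } END { print n + 0 }' "$1"
}
```

Running count_empty_lines foo.txt on the file from the minimal example should give the same count with either gawk version, at the cost of touching every script that relies on /^$/.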

Please consider reverting this change. Patch attached.


- Orgad

Attachment: 0001-Revert-default-mode-on-Cygwin-from-binary-back-to-te.patch
Description: Binary data
