Is this correct behaviour for 'rev'?

Thomas Wolff towo@towo.net
Thu Oct 24 08:37:39 GMT 2024


On 24.10.2024 at 07:01, Mark Geisert via Cygwin wrote:
> Replying to myself, I continue...
>
> On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
>> On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
>>> It appears that 'rev' is choking on any character \x80 or higher, but
>>> is OK with those \x1f or smaller. It doesn't give an error or ignore
>>> it, it just stops.
>>>
>>> I don't have access to a Linux box so I can't see if this happens
>>> there and nothing in the documentation suggests that this is the
>>> correct functionality.
>>>
>>> Test case:
>>> printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80< here\nLine 4\n'|rev|rev
>>>
>>> This is for "rev from util-linux 2.33.1"
>>>
>>> I don't have the current version of 'rev' on my system due to not
>>> having updated in a while. I accidentally screwed up my installation
>>> and have been reluctant to wipe it and start over.
>>>
>>> So, is this the expected behaviour for the current version of 'rev'
>>> under Cygwin and/or Linux?
>>
>> The current Cygwin util-linux 2.39.3-2 rev behaves in the same,
>> broken way.  It looks like line-ending char(s) are not being handled
>> correctly.   Don't know yet if it's rev itself or fgetws() being used
>> by rev that's busted.  I'll investigate further.  Thanks for the report!
>
> This is a locale issue.  In the default Cygwin locale, rev mishandles
> the \x80 byte and instead of stopping with an error message it enters
> an infinite loop.  I'll probably report this upstream instead of
> working out a local fix.
>
> There is a work-around: change to the "C" locale just to run rev.
>     LC_ALL=C rev zzz
> where zzz is a file containing your four lines.  You can also run your
> original testcase with "rev" replaced by "LC_ALL=C rev" in both places.
Sorry, this is not a good workaround, as it corrupts all (proper)
non-ASCII characters.
You could instead do, e.g.:
    grep . | rev
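
To illustrate the corruption: a minimal sketch, assuming that rev in the
"C" locale reverses each line byte by byte (an assumption, not verified
here against the rev source). It uses only printf, od, tr and iconv, and
writes bytes in octal so that it also works in shells whose printf lacks
\x escapes. The character chosen, U+00E9, is just an example.

```shell
# U+00E9 (e with acute accent) is the two-byte UTF-8 sequence C3 A9
# (octal 303 251).
valid=$(printf '\303\251' | od -An -tx1 | tr -d ' \n')
echo "UTF-8 bytes: $valid"

# A byte-wise reversal turns the pair into A9 C3. A9 is a continuation
# byte and cannot start a character, so iconv rejects the sequence
# with a non-zero exit status.
if printf '\251\303\n' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    reversed_ok=yes
else
    reversed_ok=no
fi
echo "reversed sequence is valid UTF-8: $reversed_ok"
```

So any multi-byte character that is reversed bytewise no longer decodes
as the same character, even though plain ASCII text comes out fine.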

> HTH,
>
> ..mark
>
> P.S. ASCII runs from \x00 to \x7F, so your \x80 is non-ASCII FWIW ;-)
>


