Is this correct behaviour for 'rev'?
Thomas Wolff
towo@towo.net
Mon Nov 4 12:31:49 GMT 2024
On 04.11.2024 at 12:10, Backwoods BC via Cygwin wrote:
> On Sun, Nov 3, 2024 at 11:42 PM Thomas Wolff via Cygwin
> <cygwin@cygwin.com> wrote:
>> On 04.11.2024 at 05:56, Backwoods BC via Cygwin wrote:
>>> On Sun, Nov 3, 2024 at 1:49 AM Mark Geisert via Cygwin
>>> <cygwin@cygwin.com> wrote:
>>>> Continuing my monologue, with due consideration of comments posted, ...
>>>>
>>>> On 10/23/2024 10:01 PM, Mark Geisert via Cygwin wrote:
>>>>> Replying to myself, I continue...
>>>>>
>>>>> On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
>>>>>> On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
>>>>>>> It appears that 'rev' is choking on any character \x80 or higher, but
>>>>>>> is OK with those \x1f or smaller. It doesn't give an error or ignore
>>>>>>> it, it just stops.
>>>>>>>
>>>>>>> I don't have access to a Linux box so I can't see if this happens
>>>>>>> there and nothing in the documentation suggests that this is the
>>>>>>> correct functionality.
>>>>>>>
>>>>>>> Test case:
>>>>>>> printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80< here\nLine 4\n'|rev|rev
>>>>>>>
>>>>>>> This is for "rev from util-linux 2.33.1"
>>>>>>>
>>>>>>> I don't have the current version of 'rev' on my system due to not
>>>>>>> having updated in a while. I accidentally screwed up my installation
>>>>>>> and have been reluctant to wipe it and start over.
>>>>>>>
>>>>>>> So, is this the expected behaviour for the current version of 'rev'
>>>>>>> under Cygwin and/or Linux?
>>>>>> The current Cygwin util-linux 2.39.3-2 rev behaves in the same, broken
>>>>>> way. It looks like line-ending char(s) are not being handled
>>>>>> correctly. Don't know yet if it's rev itself or fgetws() being used
>>>>>> by rev that's busted. I'll investigate further. Thanks for the report!
>>>>> This is a locale issue. In the default Cygwin locale, rev mishandles
>>>>> the \x80 byte and instead of stopping with an error message it enters an
>>>>> infinite loop. I'll probably report this upstream instead of working
>>>>> out a local fix.
>>>> Upstream util-linux 2.40.2 has an updated 'rev' that stops with an error
>>>> message when the OP's testcase is tried. I'm testing the full 2.40.2,
>>>> aiming for a Cygwin release before too long.
>>>>
>>>>> There is a work-around: change to the "C" locale just to run rev.
>>>>> LC_ALL=C rev zzz
>>>>> where zzz is a file containing your four lines. You can also run your
>>>>> original testcase with "rev" replaced by "LC_ALL=C rev" in both places.
>>>> Implicit in that suggestion is that the OP seemed to be uninterested in
>>>> any form of multi-byte characters.. just straightforward operation on
>>>> bytes, even if they have the high bit set.
>>>>
>>>> That said, I appreciate the follow-up comments that dealt with the
>>>> general problem.
>>>> Thanks all,
>>>>
>>>> ..mark
>>> Sorry for dropping out of the thread. I lost interest in pursuing the
>>> issue once I learned that 'rev' would balk at any character it didn't
>>> like instead of just passing it through, and found a workaround for my
>>> case. What I really wanted is something that would do a byte-by-byte
>>> reversal working backwards from a LF character.
>>>
>>> My use for 'rev' is to allow sorting based on field position from the
>>> *end* of the line. 'sort' won't do this itself, as far as I can tell.
>>> My method follows:
>>> printf -v mySep '\xff'
>>> cat fileOfFullPathNames | rev | sed -r -e "s/\./$mySep/" | rev | sort -t "$mySep" --key=2.1 | tr "$mySep" '.'
>>>
>>> This particular pipe is to sort fileOfFullPathNames by file extension.
>>> As mentioned, this stops abruptly when it encounters my inserted field
>>> separator of \xff. I found that it would do what I wanted if I used
>>> \x1f as mySep instead.
>>>
>>> To be honest, in far too many years of using *nix as a user (not a
>>> developer), doing this kind of thing is the only use I've ever had for
>>> 'rev'. I probably used a different separator before (likely \x09)
>>> which is why I haven't encountered an issue.
>>>
>>> What I appear to really need is "rev --binary" that just reverses
>>> everything regardless of what it is until it finds a LF. I may get
>>> motivated to write it for myself if I run into situations where I
>>> can't work around the restrictions in 'rev'.
>> As noted before in this thread, "rev --binary" is "LC_ALL=C rev".
> When 'rev' gets fixed, I'll try that. Until then, I'll just work
> around it as "LC_ALL=C rev" still dies when it encounters any byte
> >=\x80.
Well, it doesn't for me:
$ printf a'\x80'b | LC_ALL=C rev | od -t x1
0000000 62 80 61
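
If an installed rev does stop at such bytes, the byte-by-byte reversal
asked for earlier in the thread can be sketched without rev at all. This
is only a sketch under assumptions: rev_bytes is a made-up name, not an
existing tool, and it relies on awk treating input as single bytes when
run with LC_ALL=C, which I believe holds for gawk and mawk. The octal
escape \200 is the same byte as \x80 and is used only because not every
printf understands \x.

```shell
# rev_bytes: hypothetical helper that reverses each input line byte by byte.
# Assumption: awk handles input as single bytes under LC_ALL=C.
rev_bytes() {
    LC_ALL=C awk '{
        out = ""
        # Walk the line from its last byte to its first.
        for (i = length($0); i > 0; i--)
            out = out substr($0, i, 1)
        print out
    }' "$@"
}

# Same kind of check as above, with a trailing newline on the input.
printf 'a\200b\n' | rev_bytes | od -t x1
```

Note that awk's print always ends the line with a newline, so input
whose last line lacks one gains one on output.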