sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

Mon Mar 30 15:23:00 GMT 2009

Corinna Vinschen wrote:
> On Mar 30 13:48, Michael Moser wrote:
>> I need to mangle a file containing "8-bit ASCII" characters (i.e. the
>> file also contains characters in the upper 8-bit range, namely a few
>> umlauts as well as some French accented characters).
>>
>> Strangely enough, the sed version that came as part of Cygwin emits the
>> result of the mangling using 16-bit characters (I believe those are
>> UTF-16 characters, but I'm not sure; the hex editor shows every second
>> byte as 00, except for the first two bytes, which read FF FE).
> 
> This is very likely not Cygwin's sed.  Do you have another sed in $PATH
> by any chance?  I tried with input files containing German umlauts, and
> sed neither converts to wide characters nor writes a BOM at the start
> of the file.
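
  To check for a second sed, something along these lines should do.  It is
only a sketch: find_all is a made-up helper name, not a Cygwin tool, and it
assumes a POSIX shell and that no $PATH entry contains glob characters.  It
prints every executable of the given name in $PATH lookup order, so the first
line is the one the shell actually runs.

```shell
# Print every executable named $1 found in $PATH, in lookup order.
# find_all is a hypothetical helper for illustration only.
find_all() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty $PATH entry means the current directory.
        [ -x "${dir:-.}/$1" ] && printf '%s\n' "${dir:-.}/$1"
    done
    IFS=$old_ifs
}

find_all sed
```

If more than one path comes back and the first is not /usr/bin/sed, a
non-Cygwin sed is probably being picked up.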

  Another possibility is that WordPad or Notepad tried to be clever and
unexpectedly saved the original source file as UTF-16.  Did you check the
original source file in a hex editor too, Michael?
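
  If a hex editor is not at hand, the same check can be done from the shell.
This is a rough sketch assuming a POSIX od; has_utf16le_bom is a made-up
function name and input.txt is just a placeholder for the real file.  It
reads the first two bytes and succeeds only if they are FF FE, the UTF-16
little-endian byte order mark.

```shell
# Succeed if the file named by $1 starts with the bytes FF FE (UTF-16LE BOM).
# has_utf16le_bom is a hypothetical helper for illustration only.
has_utf16le_bom() {
    # od prints the first two bytes as hex; tr strips spaces and the newline.
    first=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$first" = "fffe" ]
}

# Usage (input.txt is a placeholder file name):
#   has_utf16le_bom input.txt && echo "input is already UTF-16"
```

Running it on both the input and the output of sed shows which side of the
pipeline introduced the 16-bit characters.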

    cheers,
      DaveK
