readdir() returns inaccessible name if file was created with invalid UTF-8

Christian Franke Christian.Franke@t-online.de
Sun Sep 15 20:48:53 GMT 2024


Thomas Wolff via Cygwin wrote:
>
> On 15.09.2024 at 20:15, Thomas Wolff via Cygwin wrote:
>> On 15.09.2024 at 19:47, Christian Franke via Cygwin wrote:
>>> If a file name contains an invalid (truncated) UTF-8 sequence, open()
>>> does not refuse to create the file. Later, readdir() returns a
>>> different name that cannot be used to access the file.
>>>
>>> Testcase with U+1F321 (Thermometer):
>>>
>>> $ uname -r
>>> 3.5.4-1.x86_64
>>>
>>> $ printf $'\U0001F321' | od -A none -t x1
>>>  f0 9f 8c a1
>>>
>>> $ touch 'file1-'$'\xf0\x9f\x8c\xa1''.ext'
>>>
>>> $ touch 'file2-'$'\xf0\x9f\x8c''.ext'
>>>
>>> $ touch 'file3-'$'\xf0\x9f\x8c'
>>>
>>> $ ls -1
>>> ls: cannot access 'file2-.?ext': No such file or directory
>>> ls: cannot access 'file3-': No such file or directory
>>> 'file1-'$'\360\237\214\241''.ext'
>>> file2-.?ext
>>> file3-
>> I can't reproduce this.

Yes, sorry, the 'ls' above was actually aliased to 'ls --color=auto',
which needs to call stat() on the names it lists. Plain 'ls' does not,
so the errors do not occur with it.
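
The same effect can be checked without 'ls'. The following is only a
minimal sketch: list_inaccessible is a hypothetical helper name, not part
of Cygwin or coreutils. It assumes that expanding '*' only reads the
directory, while the '-e' and '-L' tests have to look each name up again:

```shell
# list_inaccessible DIR: print every name that the listing of DIR returns
# but that cannot be looked up again afterwards.
# Hypothetical helper for illustration only.
list_inaccessible() (
    cd -- "$1" || exit 1
    for name in *; do
        # In an empty directory the pattern stays a literal '*'; skip it.
        # (A file that is really named '*' is skipped as well.)
        [ "$name" = '*' ] && continue
        # -e follows symlinks; -L additionally accepts dangling symlinks.
        if [ ! -e "$name" ] && [ ! -L "$name" ]; then
            printf '%s\n' "$name"
        fi
    done
)
```

Run against the test directory above, this would be expected to print the
file2 and file3 names that 'ls --color=auto' complained about. On a system
where the names round-trip, it prints nothing.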


>>
>> While the file name gets mangled, all resulting file names are valid and
>> listed:
>> In file2 the sequence is turned into U+17B3 but exchanged with the dot.
>> In file3 the same sequence is just dropped.
>> $ ls -1|cat
>> file1-🌡.ext
>> file2-.ឳext
>> file3-
>>
>> However, ls file2* fails, as does ls *.
> On the other hand, ls file3- fails too, so some mapping error occurs
> internally.
> Also, the files cannot be deleted from cygwin (need to use cmd).

'rm' with the original name works for file2-..., but not for file3-...:

$ rm -v 'file2-'$'\xf0\x9f\x8c''.ext'
removed 'file2-'$'\360\237\214''.ext'

$ rm -v 'file3-'$'\xf0\x9f\x8c'
rm: cannot remove 'file3-'$'\360\237\214': No such file or directory
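
Names like these can be detected before a file is created. A rough
sketch, assuming iconv is installed and rejects invalid or truncated
input with a non-zero exit status; is_valid_utf8 is a hypothetical helper
name:

```shell
# is_valid_utf8 NAME: succeed only if NAME is a complete, valid UTF-8
# string. Hypothetical helper; it relies on iconv failing on invalid or
# truncated sequences when converting UTF-8 to UTF-8.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}
```

With this, the truncated names used for file2 and file3 should be
rejected, while the complete 4-byte sequence used for file1 should pass.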


