dd and binary mode

Eric Blake ebb9@byu.net
Tue May 17 14:59:00 GMT 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Paul Eggert on 5/16/2005 6:14 PM:
> This isn't just dd; it's cat, md5sum, split, etc.  And I don't really
> understand how it works, or why some programs use binary modes and not
> others.  For example, POSIX says that the input to "head" must be a
> text file, so why does GNU "head" set binary mode?  Why does (say)
> "unexpand" use binary mode, but "uniq" uses text mode?  Why does
> md5sum invoke setmode (..., O_TEXT) on a file that has just been
> fopened with "r" (doesn't that mean text?).  None of this stuff really
> makes sense to me, and this makes the code hard to maintain.

I agree that a lot of programs have become ad hoc in deciding when text
vs. binary vs. default mode is needed, often as bugs are reported that the
behavior wasn't intuitive (such as the recent report on dd).

First, some background: on Cygwin, POSIX fopen(,"r") and open(,,<neither
O_BINARY nor O_TEXT>) default to the mode of the underlying mount point.
Cygwin recommends binary mount points, since POSIX requires "r" and "rb"
to be identical, but some users have text mount points for better
interoperability with Windows programs on text files; text mode mounts
make the most sense when all files in that mount point are text files.
POSIX fopen(,"rb") and the extension open(,,O_BINARY) force binary mode,
and the extensions to POSIX fopen(,"rt") and open(,,O_TEXT) force text
mode.  Furthermore, terminals behave strangely: forcing binary or text
mode on a terminal is almost always the wrong thing to do, hence
coreutils' SET_BINARY macro, which changes the mode only on a
non-terminal.
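
To make the terminal rule concrete, here is a minimal sketch of a helper in the spirit of SET_BINARY.  It is not the coreutils macro itself: the name set_binary_unless_tty and its return convention are made up for illustration, and it assumes that hosts without distinct modes simply lack O_BINARY.

```c
#include <fcntl.h>
#include <unistd.h>

/* Assumption: hosts without distinct text/binary modes do not define
   O_BINARY, so treat it as 0 there.  */
#ifndef O_BINARY
# define O_BINARY 0
#endif

#if O_BINARY
# include <io.h>                /* declares setmode on DOS-like hosts */
#endif

/* Hypothetical helper, not the coreutils SET_BINARY macro.
   Returns 1 if FD was switched to binary mode, 0 if it was left
   alone, and -1 if the mode change failed.  */
static int
set_binary_unless_tty (int fd)
{
  if (isatty (fd))
    return 0;                   /* never force a mode on a terminal */
#if O_BINARY
  return setmode (fd, O_BINARY) < 0 ? -1 : 1;
#else
  return 0;                     /* single-mode host: nothing to change */
#endif
}
```

A utility that must handle arbitrary data would call this once on each descriptor it reads or writes, for example set_binary_unless_tty (STDIN_FILENO), before doing any I/O.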

As to POSIX requirements, you bring up a valid point on utilities that are
required to operate only on text files, and that a script that tries to
run that utility on a non-text file is non-portable.  I claim that
logically, programs that need to operate only on text files should defer to
the mount point mode, and programs that must operate on any file type
should always default to binary.  Forcing text mode without a user option
is almost always wrong.  Output from many POSIX programs is human-readable
text, but since the utility inherits rather than opens stdout, it
shouldn't be changing the mode of stdout in that case.  Helper files (such
as uptime opening /proc/uptime under the hood) are not user-specified
stdin or filenames on the command line, and as such, should probably be
opened in binary mode.  I welcome feedback on whether the following list
of desired behavior sounds correct, before I check whether coreutils
actually behaves that way:

[ - doesn't open files
basename - doesn't open files
cat - POSIX requires binary input and output, and it already has the -B
option to fine-tune the mode
chgrp - doesn't open files
chmod - doesn't open files
chown - doesn't open files
chroot - doesn't open files
cksum - POSIX requires binary input
comm - POSIX requires text input
cp - POSIX requires binary input and output
csplit - POSIX requires text input
cut - POSIX requires text input
date - doesn't open files
dd - POSIX requires binary input and output, and the [io]flag=text option
was just added
df - doesn't open files
dir - doesn't open files
dircolors - non-standard, but operates on text input
dirname - doesn't open files
du - doesn't open files
echo - doesn't open files
env - doesn't open files
expand - POSIX requires text input
expr - doesn't open files
factor - doesn't open files
false - doesn't open files
fmt - non-standard, but operates on text input
fold - POSIX requires text input
groups - doesn't open files
head - POSIX requires text input, but compare to tail -c on binary input
hostid - doesn't open files
hostname - doesn't open files
id - doesn't open files
install - non-standard, but operates on binary input and output
kill - doesn't open files
link - doesn't open files
ln - doesn't open files
logname - doesn't open files
ls - doesn't open files
md5sum - non-standard, but like cksum needs binary input
mkdir - doesn't open files
mkfifo - doesn't open files
mknod - doesn't open files
mv - POSIX requires binary input and output
nice - doesn't open files
nl - POSIX requires text input
nohup - POSIX requires that stdout from utility may go to nohup.out, so
nohup.out should probably be opened in the same mode as nohup's stdout (if
it exists)
od - POSIX requires binary input; but as this is a formatter, we probably
want options to fine-tune the mode
paste - POSIX requires text input and output
pathchk - doesn't open files
pinky - doesn't open files
pr - POSIX requires text input
printenv - doesn't open files
printf - doesn't open files
ptx - non-standard, but operates on text input
pwd - doesn't open files
readlink - doesn't open files
rm - doesn't open files
rmdir - doesn't open files
seq - doesn't open files
sha1sum - non-standard, but like cksum needs binary input
shred - non-standard, but needs binary input if it is going to affect the
same number of bytes on disk as it erases
sleep - doesn't open files
sort - POSIX requires text input
split - POSIX requires binary input
stat - doesn't open files
stty - doesn't open files
su - doesn't open files (cygwin doesn't support su; and the question was
raised earlier whether coreutils should drop su or add newgrp)
sum - non-standard, but like cksum needs binary input
sync - doesn't open files
tac - non-standard, but like cat operates on binary input
tail - POSIX requires that -c operates on binary input, otherwise on text
input
tee - POSIX requires binary input and output
test - doesn't open files
touch - doesn't open files
tr - POSIX requires binary input
true - doesn't open files
tsort - POSIX requires text input
tty - doesn't open files
uname - doesn't open files
unexpand - POSIX requires text input
uniq - POSIX requires text input
unlink - doesn't open files
uptime - doesn't open user-specified files, but the /proc/uptime helper
file should probably be opened in binary mode
users - non-standard, but /var/run/utmp should probably be opened in
binary mode
vdir - doesn't open files
wc - POSIX requires binary input
who - doesn't open files
whoami - doesn't open files
yes - doesn't open files
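
The rule proposed before the list (text-only utilities defer to the mount point, utilities that accept any file force binary) can be sketched as a small lookup.  The enum and the function name read_mode_for are hypothetical and exist only to illustrate the policy; nothing like this is in coreutils today.

```c
#include <stddef.h>

/* Hypothetical classification of a utility's input, following the
   three kinds of entries in the list.  */
enum input_kind
{
  INPUT_NONE,                   /* opens no user-specified files */
  INPUT_TEXT,                   /* text input only: defer to the mount mode */
  INPUT_BINARY                  /* any file type: force binary */
};

/* Return the fopen mode string to use for reading, or NULL if the
   utility opens nothing.  */
static const char *
read_mode_for (enum input_kind kind)
{
  switch (kind)
    {
    case INPUT_TEXT:
      return "r";               /* the mount point decides */
    case INPUT_BINARY:
      return "rb";              /* always binary */
    default:
      return NULL;
    }
}
```

Under this sketch, uniq would use read_mode_for (INPUT_TEXT) and cksum would use read_mode_for (INPUT_BINARY); forcing "rt" never appears, matching the claim that forcing text mode without a user option is almost always wrong.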

> 
> Is there some way that we can simplify this by using wrapper functions
> on DOS-like hosts?  I'd rather get rid of the SETMODE and SET_BINARY
> macros entirely.  If Cygwin open or fcntl doesn't do the obvious thing
> with O_TEXT and O_BINARY, let's define a wrapper function, used only
> on cygwin, that does the right thing.

open does the right thing.  The problem is that
fcntl(fd,F_SETFL,fcntl(fd,F_GETFL)|O_BINARY) will not work, since O_BINARY
is not an additive property, but a mutually exclusive property with
O_TEXT.  Whether you used O_BINARY, O_TEXT, or nothing with the original
open(), fcntl(F_GETFL) will always return O_BINARY or O_TEXT in its list
of flags.  And even if cygwin is patched to let fcntl(F_SETFL,O_BINARY)
change the mode to binary, it will have to reject
fcntl(F_SETFL,O_BINARY|O_TEXT).  Hence the current use of setmode(fd, mode),
which fails with EINVAL unless mode is exactly O_BINARY, O_TEXT, or 0
(meaning no change).  I agree that a wrapper might help, but the wrapper
would need slightly different semantics than how fcntl(F_SETFL) is used in
dd.c, because of the mutually exclusive nature of O_BINARY and O_TEXT.
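
To illustrate why OR-ing a mode bit into the existing flags is not enough, here is a small sketch of the replace-not-add semantics such a wrapper would need.  The DEMO_ constants are made-up values for illustration only (real values come from <fcntl.h> on hosts that define O_BINARY and O_TEXT), and demo_change_mode is a hypothetical name, not an existing Cygwin or coreutils function.

```c
#include <errno.h>

/* Hypothetical flag values, for illustration only.  */
enum
{
  DEMO_O_TEXT = 0x1,
  DEMO_O_BINARY = 0x2,
  DEMO_MODE_MASK = DEMO_O_TEXT | DEMO_O_BINARY
};

/* Compute the flag word that results from requesting a mode.
   Returns 0 on success, or -1 with errno set to EINVAL when the
   request names both modes at once.  */
static int
demo_change_mode (int old_flags, int request, int *new_flags)
{
  int mode = request & DEMO_MODE_MASK;

  if (mode == DEMO_MODE_MASK)
    {
      errno = EINVAL;           /* binary and text are mutually exclusive */
      return -1;
    }
  if (mode == 0)
    *new_flags = old_flags;     /* 0 means no change */
  else
    /* Clear the old mode bit before setting the new one; a plain
       "old_flags | mode" would leave both bits set.  */
    *new_flags = (old_flags & ~DEMO_MODE_MASK) | mode;
  return 0;
}
```

The point of the sketch is the masking step: since a descriptor always reports one of the two modes, switching modes has to clear the old bit, which is exactly what the fcntl(fd,F_SETFL,fcntl(fd,F_GETFL)|O_BINARY) idiom fails to do.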

> Your patch assumes that (O_BINARY != 0 && O_TEXT != 0); is this really
> true on all platforms?  It seems to me that one could be zero.

system.h keys solely off of O_BINARY - if O_BINARY is non-zero, then
O_TEXT is required to also exist (and it is probably also non-zero).  If
O_BINARY doesn't exist or is 0, then system.h makes both O_BINARY and
O_TEXT be 0, to avoid later #ifdef'ery.  My patch always treated the
combination of (O_BINARY|O_TEXT), which should easily be optimized out as
0 on platforms without O_BINARY; and should work fine even if there is a
platform with non-zero O_BINARY but zero O_TEXT.
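
As a rough sketch of that fallback (written from the description above, not copied from system.h), the preprocessor logic and the resulting mask might look like this.  The helper name mode_bits is hypothetical.

```c
#include <fcntl.h>

/* Sketch of the fallback described above: if O_BINARY is missing or
   zero, make both O_BINARY and O_TEXT zero so that later code needs
   no #ifdef.  This is an approximation, not the actual system.h text.  */
#if !defined O_BINARY || !O_BINARY
# undef O_BINARY
# undef O_TEXT
# define O_BINARY 0
# define O_TEXT 0
#endif

/* Hypothetical helper: extract the text/binary bits from a flag word.
   On hosts without O_BINARY the mask is the constant 0, so a compiler
   can reduce this function to "return 0".  */
static int
mode_bits (int flags)
{
  return flags & (O_BINARY | O_TEXT);
}
```

With this in place, a test such as "if (mode_bits (flags))" costs nothing on single-mode hosts and still works if some host defines a non-zero O_BINARY alongside a zero O_TEXT.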

- --
Life is short - so eat dessert first!

Eric Blake             ebb9@byu.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCifIF84KuGfSFAYARArZZAKCDotpvCtmF64M5CSizVfBCTWuycwCeK559
OxEtCVcmNQIVzS+dc9DmvBg=
=drvs
-----END PGP SIGNATURE-----
