This is the mail archive of the guile@sourceware.cygnus.com mailing list for the Guile project.
Re: binary-io, opposable-thumb, pack/unpack (was Re: binary-io (was Re: rfc 2045 base64 encoding/decoding module))
> From: Per Bothner <per@bothner.com>
> Date: 16 Feb 2000 14:04:54 -0800
>
> > I'm not sure I understand this proposal completely, since I don't see
> > what you gain by using two ports.
>
> No, two (rather four) port *types*.
Do you think combined input/output ports are more trouble than they're
worth?
> > Wouldn't it be confusing to work
> > with, e.g., if you were reading a stream of arbitrary data, would you
> > read from one port some of the time to unpack bytes into Scheme and
> > then from the other whenever you expected a character?
>
> I don't think you can meaningfully or reliably do that. You either
> process a sequence of bytes or you process a sequence of characters.
It seems a bit restrictive to allow only meaningful and reliable
formats. Examples would be things like reading a binary database
record with string fields, or decoding network protocols (I'm not sure
which ones offhand; doesn't HTTP start with an ASCII header and then
switch to a character set specified in that header?)
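To make that HTTP-style example concrete: the pattern is to read the header as raw bytes (ASCII being a safe subset of most encodings) and only then pick a decoder for the body. A sketch in Python rather than Scheme, purely for illustration; the stream contents are invented:

```python
import io

raw = io.BytesIO(
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"\r\n"
    b"caf\xe9"                      # body: "cafe" with e-acute, in Latin-1
)

# Phase 1: scan the header as raw bytes.
charset = "ascii"
for line in iter(raw.readline, b"\r\n"):
    name, _, value = line.rstrip(b"\r\n").partition(b": ")
    if name.lower() == b"content-type" and b"charset=" in value:
        charset = value.split(b"charset=")[1].decode("ascii")

# Phase 2: wrap the remaining bytes in a text decoder for the body.
body = io.TextIOWrapper(raw, encoding=charset).read()
```

With a single byte-oriented port type this needs no mode switch at all; with separate binary and character port types you would have to hand the binary port off to a character-port constructor at the blank line.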
> > It seems to me easier to consider an input port to be a source of
> > bytes, with read-char a procedure for unpacking bytes into characters.
>
> How about peek-char, read, read-line, etc? What about display, write,
> format?
Your system could make read and read-line simpler or more efficient, I
think, by allowing them to scan the buffer without needing to decode
the bytes.
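That works because in ASCII-compatible encodings like UTF-8, the newline byte never occurs inside a multi-byte character, so the line scan can stay at the byte level. A quick Python illustration (not Guile code):

```python
# In UTF-8, bytes 0x00-0x7F never appear inside a multi-byte sequence,
# so line boundaries can be found in the raw buffer and each line
# decoded only afterwards.
buf = "première\nligne два\n".encode("utf-8")
lines = [chunk.decode("utf-8") for chunk in buf.split(b"\n")[:-1]]
```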
> Basically, all standard Scheme procedures work with characters,
> not bytes, so an input port *is* a sequence of characters.
> You can add extra procedures that read the underlying bytes
> but you will find that buffering and character conversion
> make that problematical.
Only once you place decoded characters in the buffers instead of
bytes, though? If the buffers hold raw bytes, that problem doesn't
arise.
> > To support multiple encodings, the port could have a "current
> > encoding" which could be changed at will (actually this is just to
> > avoid adding an extra incompatible argument to read-char.
>
> Can't do that in general. Some encodings are "stateful". I guess
> you can reset the decoding state when you switch encodings. If you
> do that for output, you'll produce a meaningless document.
Maybe not in general, but it would be up to the user not to mess it up.
Banning it completely seems like overkill.
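The statefulness Per mentions is easy to demonstrate with ISO-2022-JP, which switches character sets via escape sequences. A Python illustration (again, not the proposed Guile interface):

```python
import codecs

data = "漢字".encode("iso-2022-jp")
# The stream begins with ESC $ B (shift into JIS X 0208) and ends with
# ESC ( B (shift back to ASCII): a byte's meaning depends on that state.
assert data.startswith(b"\x1b$B") and data.endswith(b"\x1b(B")

# An incremental decoder keeps the shift state across calls, so a
# split feed still decodes correctly...
dec = codecs.getincrementaldecoder("iso-2022-jp")()
out = dec.decode(data[:4]) + dec.decode(data[4:], final=True)

# ...but resetting the state mid-stream, as switching encodings would,
# silently turns the remaining JIS bytes into ASCII garbage.
dec2 = codecs.getincrementaldecoder("iso-2022-jp")()
dec2.decode(data[:4])
dec2.reset()
garbled = dec2.decode(data[4:], final=True)
```

So resetting on an encoding switch is safe only at points where the old encoding has already returned to its initial state; anywhere else, the data on one side of the switch gets corrupted.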
> > An alternative would be to let read-char default to a global locale
> > setting and add read-char/charset or something to specify variations.)
>
> Yes, that is the C approach. It is of course the wrong way to do it.
> (It doesn't work with threads - or clean programming practices.)
>
> > Individual characters are only part of the problem anyway: there's
> > also the custom of treating strings as byte arrays that would break.
>
> Assuming the size of character remains at least 8 bits (i.e.
> integer->char and char->integer are well defined for at least
> the range 0 .. 255), I don't see where the breakage would come in.
I was thinking of places where strings are passed to various system-call
and gh_ interfaces: reading a string (of arbitrary bytes) with read-line
and then writing it to such an interface would end up modifying the bytes.
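Concretely, the hazard is that a decode/encode round trip through the locale's character set need not be the identity on arbitrary bytes. In Python terms (illustrative only):

```python
data = bytes([0x41, 0xFF, 0x0A])        # 'A', a stray 0xFF, newline

# Under a UTF-8 "locale", the invalid byte is replaced on input, so
# writing the string back out changes the data...
text = data.decode("utf-8", errors="replace")
changed = text.encode("utf-8") != data

# ...whereas a byte-transparent encoding such as Latin-1 round-trips,
# which is what code treating strings as byte arrays relies on.
roundtrips = data.decode("latin-1").encode("latin-1") == data
```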