This is the mail archive of the guile@sourceware.cygnus.com mailing list for the Guile project.


Re: binary-io, opposable-thumb, pack/unpack (was Re: binary-io (was Re: rfc 2045 base64 encoding/decoding module))


> From: Per Bothner <per@bothner.com>
> Date: 16 Feb 2000 14:04:54 -0800
> 
> > I'm not sure I understand this proposal completely, since I don't see
> > what you gain by using two ports.
> 
> No, two (rather four) port *types*.

Do you think combined input/output ports are more trouble than they're
worth?

> > Wouldn't it be confusing to work
> > with, e.g., if you were reading a stream of arbitrary data, would you
> > read from one port some of the time to unpack bytes into Scheme and
> > then from the other whenever you expected a character?
> 
> I don't think you can meaningfully or reliably do that.  You either
> process a sequence of bytes or you process a sequence of characters.

It seems a bit restrictive to allow only meaningful and reliable
formats.  Examples of mixed formats would be things like reading a
binary database record with string fields or decoding network protocols
(I'm not sure which ones offhand.  Doesn't HTTP start with an ASCII
header and switch to a character set specified in the header?)
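
To make the mixed case concrete, here is a rough sketch in Python
(used purely for illustration; the record layout, the "charset=NAME"
header line, and the function names are all made up, not any real
format or Guile interface).  One byte source is read partly as packed
binary and partly as text:

```python
import io
import struct

def read_record(port):
    """Hypothetical record: 4-byte big-endian id, 1-byte name length,
    then that many bytes of Latin-1 text."""
    record_id, name_len = struct.unpack(">IB", port.read(5))
    name = port.read(name_len).decode("latin-1")
    return record_id, name

def read_message(port):
    """Hypothetical message: one ASCII header line 'charset=NAME',
    then a body encoded in the named character set."""
    header = port.readline().decode("ascii").strip()
    charset = header.split("=", 1)[1]
    body = port.read().decode(charset)
    return charset, body

# A binary record with a string field.
record = read_record(io.BytesIO(struct.pack(">IB", 42, 3) + b"abc"))

# An ASCII header followed by a body in the charset it names.
message = read_message(
    io.BytesIO(b"charset=utf-8\n" + "caf\u00e9".encode("utf-8")))
```

In both cases the reader has to decide, field by field, whether the
next bytes are data to unpack or text to decode.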

> > It seems to me easier to consider an input port to be a source of
> > bytes, with read-char a procedure for unpacking bytes into characters.
> 
> How about peek-char, read, read-line, etc?  What about display, write,
> format?

Your system could make read and read-line simpler or more efficient, I
think, by allowing them to scan the buffer without needing to decode
the bytes.

> Basically, all standard Scheme procedures work with characters,
> not bytes, so an input port *is* a sequence of characters.
> You can add extra procedures that read the underlying bytes
> but you will find that buffering and character conversion
> make that problematical.

You mean once you place decoded characters in the buffers instead of
bytes?

> > To support multiple encodings, the port could have a "current
> > encoding" which could be changed at will (actually this is just to
> > avoid adding an extra incompatible argument to read-char.
> 
> Can't do that in general.  Some encodings are "stateful".  I guess
> you can reset the decoding state when you switch encodings.  If you
> do that for output, you'll produce a meaningless document.

Maybe not in general; it would be up to the user not to mess it up.
Banning it completely seems like overkill.
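
A minimal sketch of the "current encoding" idea, again in Python only
for illustration (BytePort, set_encoding, read_byte and read_char are
hypothetical names, not an existing interface).  It assumes an
unbuffered byte source and resets the decoder state on every switch,
which is exactly the part the user would have to get right:

```python
import codecs
import io

class BytePort:
    """Hypothetical input port: a byte source plus a current encoding."""

    def __init__(self, raw, encoding="ascii"):
        self.raw = raw
        self.set_encoding(encoding)

    def set_encoding(self, encoding):
        # A fresh decoder drops any state left over from the old encoding.
        self.decoder = codecs.getincrementaldecoder(encoding)()

    def read_byte(self):
        b = self.raw.read(1)
        return b[0] if b else None

    def read_char(self):
        # Feed one byte at a time until the decoder yields a character.
        while True:
            b = self.raw.read(1)
            if not b:
                return None  # end of input
            ch = self.decoder.decode(b)
            if ch:
                return ch

port = BytePort(io.BytesIO(b"A" + "\u00e9".encode("utf-8") + b"\x07"))
first = port.read_char()      # one byte, decoded as ASCII
port.set_encoding("utf-8")
second = port.read_char()     # two bytes consumed for one character
third = port.read_byte()      # the next raw byte, undecoded
```

Reading a byte at a time sidesteps the buffering problem here; with a
buffer of already-decoded characters the switch would be much harder
to do safely.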

> > An alternative would be to let read-char default to a global locale
> > setting and add read-char/charset or something to specify variations.)
> 
> Yes, that is the C approach.  It is of course the wrong way to do it.
> (It doesn't work with threads - or clean programming practices.)
> 
> > Individual characters are only part of the problem anyway: there's
> > also the custom of treating strings as byte arrays that would break.
> 
> Assuming the size of character remains at least 8 bits (i.e.
> integer->char and char->integer are well defined for at least
> the range 0 .. 255), I don't see where the breakage would come in.

I was thinking of where strings are passed to various system call and
gh_ interfaces, so reading a string (of arbitrary bytes) with read-line
and writing it to the interface would end up modifying the bytes.
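
The kind of breakage I have in mind can be shown with a small Python
sketch (illustration only; the byte values are arbitrary).  A one-byte
encoding such as Latin-1 round-trips arbitrary bytes, but decoding
with a multi-byte encoding and writing the result back out does not:

```python
# Arbitrary bytes, not valid UTF-8.
data = bytes([0x41, 0xE9, 0xFF, 0x0A])

# One byte per character: the round trip gives back the same bytes.
round_trip_latin1 = data.decode("latin-1").encode("latin-1")

# Invalid sequences are replaced on input, so the output bytes differ.
round_trip_utf8 = data.decode("utf-8", errors="replace").encode("utf-8")

unchanged = round_trip_latin1 == data
modified = round_trip_utf8 != data
```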