This is the mail archive of the guile@sourceware.cygnus.com mailing list for the Guile project.



Re: binary-io (was Re: rfc 2045 base64 encoding/decoding module)


> Date: Thu, 10 Feb 2000 18:39:15 -0600
> From: "C. Ray C." <crayc@206.31.63.15>
> 
> On Thu, Feb 10, 2000 at 11:52:24PM -0000, Gary Houston wrote:
> > It's not a good idea to use read-char and write-char to process
> > arbitrary bytes.  It would probably break if multibyte character
> > support was implemented (or alternatively, make multibyte character
> > set support harder to implement and less elegant: just take a look at
> > C.)
> > 
> > Unfortunately the alternatives aren't very good at present.  R5RS
> > doesn't provide anything.
> 
> R5RS specifies as little as possible (to the point of not giving a way to
> embed newlines in strings!), and Guile implies that strings and characters
> should be used to store binary data. E.g. the docs for gh_scm2newstr()
> say, "Note that Scheme strings may contain arbitrary data, including
> null characters".

This is just saying that certain standard C facilities like strcpy
don't work correctly on data extracted from a Scheme string, which may
contain the NUL character.  However, where Guile does encourage the use
of Scheme characters to represent arbitrary binary data, I think it
needs to be revised to discourage that, but not until we've got an
equally convenient alternative.
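
To illustrate the strcpy point, here is a minimal C sketch. The helper
names copy_with_strcpy and copy_with_length are hypothetical, not part
of Guile; the only assumption is that the data arrives as a buffer plus
a separately known length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src the C-string way and returns how many bytes arrived.
   strcpy stops at the first NUL, so embedded NULs cut the data short. */
size_t copy_with_strcpy(char *dst, size_t dst_cap, const char *src)
{
    assert(strlen(src) < dst_cap);   /* room for the bytes plus the NUL */
    strcpy(dst, src);
    return strlen(dst);
}

/* Copies exactly len bytes, NULs included, using an explicit length. */
size_t copy_with_length(void *dst, size_t dst_cap,
                        const void *src, size_t len)
{
    assert(len <= dst_cap);
    memcpy(dst, src, len);
    return len;
}
```

For the five bytes 'a' 'b' NUL 'c' 'd', the first helper delivers two
bytes and the second delivers all five.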

> Because it's the only way to deal with binary data, at the moment I
> use characters and strings for precisely that purpose. Like you say,
> this may collide with multibyte characters in the future.

I've been doing that too.

> To implement multibyte character support, we either need a new multibyte
> character type, or we need a new single-byte type. A new single-byte type
> certainly makes more sense just from a vocabulary standpoint (characters
> would be characters, and bytes would be bytes).

R5RS has managed to avoid letting its character type degrade to a byte
type, by the simple device of never introducing the concept of a byte.
RXRS has no need to go down the route of defining a new type for
larger character sets, since its existing character type can be as
wide as you want anyway.  Instead it's up to anyone who wants to go
beyond R5RS and process foreign data to introduce any new types
required.

> > Do not write procedures that pack/unpack data directly from a port,
> > since they would also be useful as part of a foreign function interface
> > and probably for things like mmap.  Constructing a port has a certain
> > overhead and imposes a serial interface.
> 
> I can't see the utility of having a non-serial interface. The only
> things you need this for are accessing foreign data, which come from
> files and network connections, no? Foreign function interfaces are
> generally written in C, and mmap in Scheme... well...

Allowing foreign function interfaces to be written in Scheme is a
valid technique, since there are already Guile modules that allow it:
Marius Vollmer's guile-ffi module for Anthony Green's libffi library
and Clark McGrew's guile-foreign module for Bruno Haible's ffcall
library. Foreign functions can return pointers to foreign data.  I
guess this is true even if less hairy G-wrap-like techniques are used.

mmap is just another conceivable way of getting a handle to foreign
data: I don't know if anyone will want to use it.
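
As a rough sketch of why pack/unpack routines are better written
against a block of bytes than against a port: the C functions below
take only a pointer, a length and an offset, so the same code would
serve whether the bytes came from a file read, a foreign function or
mmap.  The names pack_u32_be and unpack_u32_be are made up for this
example, and big-endian byte order is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores v as four big-endian bytes at buf[off..off+3]. */
void pack_u32_be(unsigned char *buf, size_t len, size_t off, uint32_t v)
{
    assert(off <= len && len - off >= 4);   /* bounds check */
    buf[off]     = (unsigned char)(v >> 24);
    buf[off + 1] = (unsigned char)(v >> 16);
    buf[off + 2] = (unsigned char)(v >> 8);
    buf[off + 3] = (unsigned char)v;
}

/* Reads four big-endian bytes at buf[off..off+3] as an unsigned value. */
uint32_t unpack_u32_be(const unsigned char *buf, size_t len, size_t off)
{
    assert(off <= len && len - off >= 4);   /* bounds check */
    return ((uint32_t)buf[off] << 24)
         | ((uint32_t)buf[off + 1] << 16)
         | ((uint32_t)buf[off + 2] << 8)
         |  (uint32_t)buf[off + 3];
}
```

Neither function knows or cares where buf came from, and access is by
offset rather than serial.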

> > Define a new "byte-vector" type.
> > [...deleted discussion of new types...]
> > One could also address the entire memory by creating a shared
> > byte-vector at address 0 and length equal to the address space, thus
> > giving Guile a "peek" and "poke" facility.  Hurrah!
> > 
> > It may also be useful to make port buffers visible as shared
> > byte-vectors.
> > 
> > Then write unpack/pack routines which operate on byte-vectors, including
> > conversion to and from Scheme chars and strings.
> 
> This strikes me as needlessly complicated. The only purpose it would
> serve is to support things like mmap (maybe for instance with video
> frame buffers?). It just seems to me that it gives up one of the major
> benefits of Scheme -- the ability to not worry about the details of
> memory management or variable types.
> 
> I would prefer simply a new type, called "byte", an 8-bit numeric type,
> and then "read-byte", "write-byte", "byte->integer", "integer->byte"
> and "byte?" procedures. This is sufficient to do anything I can think
> of with external data.

How about reading a UDP packet from the network?  The interface has
the interesting property that you have to read the whole thing at
once: anything you don't read is then lost.  Likewise in reverse for
generating a packet to send.
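
The datagram behaviour can be seen in a small C sketch.  It assumes a
POSIX system and uses an AF_UNIX datagram socket pair as a stand-in for
UDP, purely so that the example needs no network setup; the function
name datagram_demo is hypothetical.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends the datagrams "ABCDEFGH" and "xyz", then reads twice.
   The first read asks for only 4 bytes.  Returns 0 on success. */
int datagram_demo(char out1[5], char out2[9])
{
    int sv[2];
    int ok = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;

    if (send(sv[0], "ABCDEFGH", 8, 0) == 8 &&
        send(sv[0], "xyz", 3, 0) == 3) {
        /* Reads 4 of the 8 bytes; the rest of that datagram is dropped. */
        ssize_t n1 = recv(sv[1], out1, 4, 0);
        /* The next read starts at the next datagram, not at "EFGH". */
        ssize_t n2 = recv(sv[1], out2, 8, 0);
        if (n1 >= 0 && n2 >= 0) {
            out1[n1] = '\0';
            out2[n2] = '\0';
            ok = 0;
        }
    }
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

A byte-at-a-time "read-byte" on such a socket would run into the same
problem, which is why the whole packet has to land in a buffer of some
kind in a single read.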
