This is the mail archive of the guile@cygnus.com mailing list for the guile project.

[PATCH] solaris socket communications bug workaround


We have found that guile provokes a problem when communicating over
sockets, which we believe is caused by a bug in solaris libc. Below
is a patch that implements a workaround.

The problem has been seen on both solaris 2.5.1 (with and without
patches) and solaris 2.6. We are working on the x86 platform. We have
not been able to test this on sparc or solaris 2.4.

The problem also appears in the latest snapshot (19980804).

We have been unable to devise an example involving only guile, so the
example given below uses communication between a guile process and a
tcl8.0 process. However, the underlying problem has also bitten us when
communicating with a locally developed server written in C, so it
does not appear to be connected to tcl8.0 in any way.

The following (somewhat scary) shell command will exhibit the
problem. It starts a guile process which sets up a listening socket,
waits for a connection, reads a Scheme datum from it and writes that
datum back into the socket. It is the write step that fails. After
starting guile, the command starts a tclsh, which connects to the
socket, sends a string and reads a line back from the socket. The
result is garbled.

The command is:

    guile -c '\
    (begin \
	   (define lsocket (socket AF_INET SOCK_STREAM 0))\
	   (define portno 8765)\
	   (setsockopt lsocket SOL_SOCKET SO_REUSEADDR 1)\
	   (bind lsocket AF_INET INADDR_ANY portno)\
	   (listen lsocket 5)\
	   (define newconnection #f)\
	   (display "Waiting for connections...\n")\
	   (set! newconnection (accept lsocket))\
	   (let* ((handle newconnection) (thissock (car handle)))\
		 (display "Waiting for input...\n")\
		 (let ((input (read thissock)))\
		      (display "Has read: ")\
		      (write input)\
		      (newline)\
		      (write input thissock)\
		      (newline thissock)))\
		      )' & ;\
    sleep 5 ;\
    echo 'puts "setting up...";\
	  set s [socket dur 8765];\
	  fconfigure $s -buffering line ;\
	  set line "(1)";\
	  puts "sending $line..." ;\
	  puts $s $line ;\
	  gets $s line;\
	  puts "Has received: $line"'\
    |tclsh8.0

which on our system produces the following output:

    Waiting for connections...
    setting up...
    sending (1)...
    Waiting for input...
    Has read: (1)
    Has received: ()1)

An extra ')' character has been inserted: the received string `()1)'
should have read `(1)'.

My colleague Peder Chr. Nørgaard <pcn@tbit.dk>, who debugged the
problem, has the following theory about what guile does to provoke
it.

When guile writes out the string "(1)" it uses the following sequence
of library calls:

    - fputs(s,fp) to write "("
    - fwrite(ptr, 1, 1, fp) to write "1"
    - fputc(c,fp) to write ")"

It is the `fwrite' call that puts ")1" on the socket where it should
have produced only "1".

The problem is most likely related to buffering: `fp' is unbuffered,
and this is one of the major differences between writing to a socket
and writing to stdout.

The fix is to avoid the library `fwrite'. We have found that using a
replacement like the one intended for VMS works well. The patch is
given below.

Index: libguile/fports.c
===================================================================
RCS file: /nmc/Repository/tools/guile/guile-core/libguile/fports.c,v
retrieving revision 1.1.1.2
retrieving revision 1.1.1.2.4.1
diff -u -r1.1.1.2 -r1.1.1.2.4.1
--- fports.c	1997/12/20 13:52:09	1.1.1.2
+++ fports.c	1998/06/25 08:45:08	1.1.1.2.4.1
@@ -375,6 +375,28 @@
 }
 
 #define ffwrite pwrite
+#elif defined(sun)
+
+static scm_sizet ffwrite SCM_P ((char *ptr, scm_sizet size, scm_sizet nitems, FILE *port));
+
+static scm_sizet 
+ffwrite (ptr, size, nitems, port)
+     char *ptr;
+     scm_sizet size, nitems;
+     FILE *port;
+{
+  scm_sizet len = size * nitems;
+  if (port->_flag & _IONBF) {
+    return write (port->_file, ptr, len);
+  } else {
+    scm_sizet i = 0;
+    for (; i < len; i++)
+      putc (ptr[i], port);
+    return len;
+  }
+}
+
+
 #else
 #define ffwrite fwrite
 #endif



---------------------------+--------------------------------------------------
Christian Lynbech          | Telebit Communications A/S                       
Fax:   +45 8628 8186       | Fabrik 11, DK-8260 Viby J
Phone: +45 8628 8177 + 28  | email: chl@tbit.dk --- URL: http://www.telebit.dk
---------------------------+--------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - petonic@hal.com (Michael A. Petonic)