This is the mail archive of the guile@sourceware.cygnus.com mailing list for the Guile project.



Re: base64.scm version 0.1



sen_ml@eccosys.com writes:

 > below is version 0.1 of base64.scm, an implementation of rfc 2045
 > base64 encoding and decoding in guile using ports.

<snip>

 > thanks to everyone for their feedback up to now.

<snip>

 >		(if (eof-object? current-char)
 >		    ;; following this branch should mean exiting the while
 >		    (begin
 >		      ;; nothing to do actually, we're done
 >		      (sleep 0))
 >		    (begin
 >		      (set! byte (char->integer current-char))
 >		      (set! table-index
 >			    ;; rshift top 6 bits by 2 (2^2)
 >			    (/ (logand byte #b11111100) 4))
 >		      (set! top-bits
 >			    ;; lshift bottom 2 bits by 4 (2^4)
 >			    (* (logand byte #b00000011) 16))
 >		      (write-char (hashv-ref base64-encode-table table-index)
 >				  base64-output-port)))))

(begin (sleep 0)) isn't such a good idea.  You could just make it #f,
or, better yet, switch the sense of the if:

   (if (not (eof-object? current-char))
       (begin
          (set! byte (char->integer current-char))
          (set! table-index
	        ;; rshift top 6 bits by 2 (2^2)
	        (/ (logand byte #b11111100) 4))
          (set! top-bits ...)
          ...))

BTW, when I find myself doing (if test (begin ...)), I typically
switch it to (cond (test ...)).  I don't know how others feel about
that.
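
To show what I mean, here is a small sketch of the two forms side by side. The names seen, note!, visit/if and visit/cond are made up for the illustration and have nothing to do with base64.scm; the point is only that a cond clause body is an implicit begin.

```scheme
;; Hypothetical helpers, for illustration only.
(define seen '())
(define (note! x) (set! seen (cons x seen)))

;; (if test (begin ...)) form: needs an explicit begin.
(define (visit/if x)
  (if (not (null? x))
      (begin
        (note! 'start)
        (note! x))))

;; (cond (test ...)) form: the clause body is an implicit begin.
(define (visit/cond x)
  (cond ((not (null? x))
         (note! 'start)
         (note! x))))

(visit/if 1)
(visit/cond 2)
(visit/cond '())   ; test fails, nothing is recorded
```

Both procedures record the same two items when the test succeeds and do nothing when it fails.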

I guess you're using a case instead of the big cond now, but in the
old code, since current-state is an integer, you can use = instead of
eqv?.

Also, since you need to work with the bytes as integers (as opposed to
characters), why use a hash table?  You might as well make the encode
table a vector.

More generally, I'd think of this as:

  - Blocking the input stream into blocks of three 8-bit bytes each.
  - Reblocking the blocks into 6-bit bytes.
  - Encoding the bits.

Along these lines, I'd start off with something like:

(define base64-encode-bytelet
  (let ((encode-vector #(#\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M #\N #\O #\P #\Q #\R #\S #\T #\U #\V #\W #\X #\Y #\Z
                           #\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l #\m #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z
                           #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\+ #\/)))  ; digits must be characters, since the result goes to write-char
    (lambda (bytelet)
      (vector-ref encode-vector bytelet))))

(define (base64-encode-block b)
   (map base64-encode-bytelet (reblock b 8 6)))   ; reblock takes startbits and endbits


(define (two-to-the n)                  ; Replace this with a vector lookup if it's too slow.
  (expt 2 n))

(define (shift byte i)			; positive is shift right.
  (cond ((< i 0)
         (* byte (two-to-the (- i))))
        ((> i 0)
         (quotient byte (two-to-the i)))
        (else
         byte)))

(define (mask byte i)                   ; Leave least significant i bits.
  (logand byte (- (two-to-the i) 1)))   ; 2^i - 1 has exactly i low bits set.

(define (reblock block startbits endbits)
  (let loop ((block block)
             (previous #f)              ; Previous byte
             (remaining 0)              ; Remaining endbits from previous
             (result '()))
    (cond ((and (null? block)
                (= 0 remaining))
           (reverse result))
          ((= remaining 0)
           (loop (cdr block)
                 (car block)
                 (- startbits endbits)
                 (cons (shift (car block) (- startbits endbits))
                       result)))
          ((= remaining endbits)
           (loop block
                 #f
                 0
                 (cons (mask previous endbits)
                       result)))
          (else
           (loop (cdr block)
                 (car block)
                 (- startbits (- endbits remaining))
                 (cons (+ (shift (mask previous remaining) (- remaining endbits)) ; Take remaining endbits & shift left.
                          (shift (car block) (- startbits (- endbits remaining)))) ; Take top endbits from new byte.
                       result))))))
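
As a sanity check on the arithmetic, here is what reblocking one full block from 8 bits to 6 bits should produce, unrolled by hand without the loop. The name three-bytes->four-sextets is made up for this sketch; it only uses quotient, logand and multiplication.

```scheme
;; Hand-unrolled 8-bit -> 6-bit reblocking of one 3-byte block.
;; Illustrative only; not part of base64.scm.
(define (three-bytes->four-sextets b0 b1 b2)
  (list (quotient b0 4)                                 ; top 6 bits of b0
        (+ (* (logand b0 #b11) 16) (quotient b1 16))    ; low 2 of b0, top 4 of b1
        (+ (* (logand b1 #b1111) 4) (quotient b2 64))   ; low 4 of b1, top 2 of b2
        (logand b2 #b111111)))                          ; low 6 bits of b2

;; "Man" is the bytes 77 97 110.
(three-bytes->four-sextets 77 97 110)   ; => (19 22 5 46)
```

Indices 19, 22, 5 and 46 in the base64 alphabet are T, W, F and u, so "Man" encodes as "TWFu".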


Then I'd do something like:

(define (read-block port length)
   "Read LENGTH bytes from PORT.  Return a list of the bytes read.
   Return eof if nothing's left"
   ...)
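
One possible body for read-block, as a rough sketch only. It assumes the port is read with read-char and that the list holds characters (the caller below maps char->integer over it); the loop variable names are mine.

```scheme
;; Sketch: read up to LENGTH characters from PORT.
;; Returns a list of the characters read (possibly shorter than LENGTH
;; at end of input), or the eof object if nothing at all was left.
(define (read-block port length)
  (let loop ((n length) (acc '()))
    (if (= n 0)
        (reverse acc)
        (let ((c (read-char port)))
          (cond ((not (eof-object? c))
                 (loop (- n 1) (cons c acc)))
                ((null? acc) c)           ; nothing read: hand back eof
                (else (reverse acc)))))))

;; Usage on a string port:
(define p (open-input-string "abcd"))
(define first-block  (read-block p 3))   ; (#\a #\b #\c)
(define second-block (read-block p 3))   ; (#\d), a short final block
(define third-block  (read-block p 3))   ; the eof object
```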

(define (base64-encode port)
   (let loop ((block (read-block port 3)))
      (cond ((eof-object? block)
             #f)
            ((< (length block) 3)
             (for-each write-char (base64-encode-subblock (map char->integer block))))
            (else
             (for-each write-char (base64-encode-block (map char->integer block)))
             (loop (read-block port 3))))))

If you fix up reblock a little (so that endbits can be larger than
startbits) then you can use the same function for decoding.

Better would be:
(define (general-encode-port port blocking block-encoder subblock-encoder)
  ...)

(define (base64-encode port)
   (general-encode-port port 3 base64-encode-block base64-encode-subblock))

(Or maybe use a base64-encode-block-or-subblock & put the switching lower.)

Because then it'd presumably be easy to plug in a uuencode as well.

Alternatively, one could think of it as a combination of:

 - port->bitstream
   - Convert port from a stream of bytes to a stream of bits.
 - bitstream->integerstream <# of bits>
   - Convert stream of bits to a stream of integers of given size.
 - encode-integerstream

This might lead to a cleaner solution in that one wouldn't have to
deal with input byte sizes & output byte sizes at the same time.  In
fact, I bet this would be much simpler...
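
A rough sketch of that idea, using plain lists in place of real streams to keep it short. All four procedure names are made up here, and padding a short final group with zero bits is my assumption about how the tail would be handled.

```scheme
;; The lowest WIDTH bits of N as a list, most significant bit first.
(define (integer->bits n width)
  (let loop ((n n) (i 0) (bits '()))
    (if (= i width)
        bits
        (loop (quotient n 2) (+ i 1) (cons (remainder n 2) bits)))))

;; A list of bits (most significant first) back to an integer.
(define (bits->integer bits)
  (let loop ((bits bits) (n 0))
    (if (null? bits)
        n
        (loop (cdr bits) (+ (* 2 n) (car bits))))))

;; Flatten a list of 8-bit bytes into one list of bits.
(define (bytes->bits bytes)
  (apply append (map (lambda (b) (integer->bits b 8)) bytes)))

;; Cut a bit list into integers of SIZE bits each.  A short final
;; group is padded on the right with zero bits (an assumption).
(define (bits->integers bits size)
  (let loop ((bits bits) (group '()) (count 0) (result '()))
    (cond ((= count size)
           (loop bits '() 0 (cons (bits->integer (reverse group)) result)))
          ((null? bits)
           (if (= count 0)
               (reverse result)
               (loop bits (cons 0 group) (+ count 1) result)))
          (else
           (loop (cdr bits) (cons (car bits) group) (+ count 1) result)))))

(bits->integers (bytes->bits '(77 97 110)) 6)   ; => (19 22 5 46)
```

The same bits->integers works in the other direction: feed it the 6-bit values expanded with integer->bits and ask for groups of 8.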

-- 
Harvey Stein
Bloomberg LP
hjstein@bfr.co.il
