This is the mail archive of the
guile@sourceware.cygnus.com
mailing list for the Guile project.
Re: base64.scm version 0.1
- To: sen_ml at eccosys dot com
- Subject: Re: base64.scm version 0.1
- From: "Harvey J. Stein" <hjstein at bfr dot co dot il>
- Date: Tue, 1 Feb 2000 18:02:08 +0200
- CC: hjstein at bfr dot co dot il, guile at sourceware dot cygnus dot com
sen_ml@eccosys.com writes:
> below is version 0.1 of base64.scm, an implementation of rfc 2045
> base64 encoding and decoding in guile using ports.
<snip>
> thanks to everyone for their feedback up to now.
<snip>
> (if (eof-object? current-char)
> ;; following this branch should mean exiting the while
> (begin
> ;; nothing to do actually, we're done
> (sleep 0))
> (begin
> (set! byte (char->integer current-char))
> (set! table-index
> ;; rshift top 6 bits by 2 (2^2)
> (/ (logand byte #b11111100) 4))
> (set! top-bits
> ;; lshift bottom 2 bits by 4 (2^4)
> (* (logand byte #b00000011) 16))
> (write-char (hashv-ref base64-encode-table table-index)
> base64-output-port)))))
(begin (sleep 0)) isn't such a good idea. You could just make it #f,
or, better yet, switch the sense of the if:
(if (not (eof-object? current-char))
    (begin
      (set! byte (char->integer current-char))
      (set! table-index
            ;; rshift top 6 bits by 2 (2^2)
            (/ (logand byte #b11111100) 4))
      (set! top-bits ...)
      ...))
BTW, when I find myself doing (if test (begin ...)), I typically
switch it to (cond (test ...)). I don't know how others feel about
that.
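To make the comparison concrete, here is a minimal sketch of the two
forms side by side. The procedure names are made up for illustration;
both print the same thing for a positive argument and nothing otherwise.

```scheme
;; Hypothetical example procedures, not taken from base64.scm.
(define (describe-if n)
  ;; One-armed if needs a begin to hold several expressions.
  (if (> n 0)
      (begin
        (display "positive")
        (newline))))

(define (describe-cond n)
  ;; A cond clause body is already an implicit sequence.
  (cond ((> n 0)
         (display "positive")
         (newline))))
```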
I guess you're using a case instead of the big cond now, but in the
old code, since current-state is an integer, you can use = instead of
eqv?.
Also, since you need to work with the bytes as integers (as opposed to
characters), why use a hash table? You might as well make the encode
table a vector.
More generally, I'd think of this as:
- Blocking the input stream into blocks of three 8-bit bytes each.
- Reblocking each block into four 6-bit bytes.
- Encoding each 6-bit byte as a character.
Along these lines, I'd start off with something like:
(define base64-encode-bytelet
  (let ((encode-vector
         #(#\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M
           #\N #\O #\P #\Q #\R #\S #\T #\U #\V #\W #\X #\Y #\Z
           #\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l #\m
           #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z
           #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\+ #\/)))
    (lambda (bytelet)
      (vector-ref encode-vector bytelet))))
(define (base64-encode-block b)
  (map base64-encode-bytelet (reblock b 8 6)))
(define (two-to-the n) ; Replace this with a vector lookup if it's too slow.
  (expt 2 n))
(define (shift byte i) ; positive is shift right.
  (cond ((< i 0)
         (* byte (two-to-the (- i))))
        ((> i 0)
         (quotient byte (two-to-the i)))
        (else
         byte)))
(define (mask byte i) ; Leave least significant i bits.
  (logand byte (- (two-to-the i) 1)))
(define (reblock block startbits endbits)
  (let loop ((block block)
             (previous #f)   ; Previous byte
             (remaining 0)   ; Remaining endbits from previous
             (result '()))
    (cond ((and (null? block)
                (= 0 remaining))
           (reverse result))
          ((= remaining 0)
           (loop (cdr block)
                 (car block)
                 (- startbits endbits)
                 (cons (shift (car block) (- startbits endbits))
                       result)))
          ((= remaining endbits)
           (loop block
                 #f
                 0
                 (cons (mask previous endbits)
                       result)))
          (else
           (loop (cdr block)
                 (car block)
                 (- startbits (- endbits remaining))
                 (cons (+ (shift (mask previous remaining) (- remaining endbits)) ; Take remaining endbits & shift left.
                          (shift (car block) (- startbits (- endbits remaining)))) ; Take top endbits from new byte.
                       result))))))
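As a sanity check on the 8-bit to 6-bit reblocking, here is a small
worked example for one full block. It is only a sketch: split-24 is a
hypothetical helper (not part of the code above) that packs the three
bytes into one 24-bit integer and slices it with Guile's ash and logand.

```scheme
;; Hypothetical helper: split three 8-bit bytes into four 6-bit values.
(define (split-24 b0 b1 b2)
  (let ((n (+ (ash b0 16) (ash b1 8) b2)))   ; pack into one 24-bit integer
    (list (logand (ash n -18) 63)            ; bits 23..18
          (logand (ash n -12) 63)            ; bits 17..12
          (logand (ash n -6) 63)             ; bits 11..6
          (logand n 63))))                   ; bits 5..0

;; The bytes of "Man" are 77 97 110, i.e. 01001101 01100001 01101110.
;; Regrouped in sixes: 010011 010110 000101 101110.
(split-24 77 97 110)   ; => (19 22 5 46), which encodes as "TWFu"
```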
Then I'd do something like:
(define (read-block port length)
  "Read LENGTH bytes from PORT. Return a list of the bytes read.
Return eof if nothing's left."
  ...)
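One possible shape for such a reader, purely as an illustration: the
name read-block-sketch is made up, and it reads characters with
read-char (matching the char->integer calls in the surrounding code)
rather than raw bytes. It returns a short list when the port runs dry
mid-block and the eof object when nothing at all could be read.

```scheme
;; Hypothetical sketch, assuming character-at-a-time input via read-char.
(define (read-block-sketch port length)
  (let loop ((n 0) (chars '()))
    (if (= n length)
        (reverse chars)
        (let ((c (read-char port)))
          (cond ((not (eof-object? c))
                 (loop (+ n 1) (cons c chars)))
                ((null? chars) c)            ; nothing read: return the eof object
                (else (reverse chars)))))))  ; partial block at end of input
```

With a string port over "abcd" and a length of 3, successive calls give
a full block, then a one-character block, then the eof object.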
(define (base64-encode port)
  (let loop ((block (read-block port 3)))
    (cond ((eof-object? block)
           #f)
          ((< (length block) 3)
           (for-each write-char (base64-encode-subblock (map char->integer block))))
          (else
           (for-each write-char (base64-encode-block (map char->integer block)))
           (loop (read-block port 3))))))
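base64-encode-subblock isn't defined above. A minimal sketch of what it
could look like follows, assuming the usual RFC 2045 rule that a final
block of one or two bytes is zero-filled on the right and the output is
padded with "=" to four characters. The alphabet string is introduced
here only to keep the sketch self-contained.

```scheme
;; Assumed helper data: the 64-character base64 alphabet as a string.
(define base64-alphabet
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/")

;; Sketch only. BYTES is a list of one or two integers in 0..255.
(define (base64-encode-subblock bytes)
  (let* ((b0 (car bytes))
         (b1 (if (null? (cdr bytes)) 0 (cadr bytes)))
         (n  (+ (ash b0 16) (ash b1 8)))     ; missing bytes count as zero
         (c0 (string-ref base64-alphabet (logand (ash n -18) 63)))
         (c1 (string-ref base64-alphabet (logand (ash n -12) 63)))
         (c2 (string-ref base64-alphabet (logand (ash n -6) 63))))
    (if (null? (cdr bytes))
        (list c0 c1 #\= #\=)                 ; one input byte: two pad chars
        (list c0 c1 c2 #\=))))               ; two input bytes: one pad char

(list->string (base64-encode-subblock '(77)))      ; => "TQ=="
(list->string (base64-encode-subblock '(77 97)))   ; => "TWE="
```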
If you fix up reblock a little (so that endbits can be larger than
startbits) then you can use the same function for decoding.
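For reference, the decoding direction on one full block is just the
inverse regrouping, four 6-bit values back into three 8-bit bytes. This
is a sketch with a hypothetical helper name (join-24), not the
generalized reblock itself.

```scheme
;; Hypothetical helper: join four 6-bit values into three 8-bit bytes.
(define (join-24 s0 s1 s2 s3)
  (let ((n (+ (ash s0 18) (ash s1 12) (ash s2 6) s3)))  ; 24-bit integer
    (list (logand (ash n -16) 255)
          (logand (ash n -8) 255)
          (logand n 255))))

(join-24 19 22 5 46)   ; => (77 97 110), the bytes of "Man"
```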
Better would be
(define (general-encode-port port blocking block-encoder subblock-encoder)
  ...)
(define (base64-encode port)
  (general-encode-port port 3 base64-encode-block base64-encode-subblock))
(Or maybe use a base64-encode-block-or-subblock & put the switching lower.)
Because then it'd presumably be easy to plug in a uuencode as well.
Alternatively, one could think of it as a combination of:
- port->bitstream
  - Convert port from a stream of bytes to a stream of bits.
- bitstream->integerstream <# of bits>
  - Convert stream of bits to a stream of integers of given size.
- encode-integerstream
This might lead to a cleaner solution in that one wouldn't have to
deal with input byte sizes & output byte sizes at the same time. In
fact, I bet this would be much simpler...
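A rough sketch of that idea, using plain lists in place of real streams
to keep it short. All three procedure names are made up for the
example, and padding a short final group with zero bits is an
assumption borrowed from how base64 treats its last block.

```scheme
;; Hypothetical sketch: lists stand in for streams.
(define (byte->bits byte)
  ;; Return the 8 bits of BYTE, most significant bit first.
  (let loop ((i 0) (b byte) (bits '()))
    (if (= i 8)
        bits
        (loop (+ i 1) (quotient b 2) (cons (remainder b 2) bits)))))

(define (bytes->bits bytes)
  (apply append (map byte->bits bytes)))

(define (bits->integers bits size)
  ;; Group BITS into integers of SIZE bits each. A short final group
  ;; is padded with zero bits on the right.
  (let loop ((bits bits) (count 0) (acc 0) (result '()))
    (cond ((= count size)
           (loop bits 0 0 (cons acc result)))
          ((null? bits)
           (reverse (if (= count 0)
                        result
                        (cons (* acc (expt 2 (- size count))) result))))
          (else
           (loop (cdr bits) (+ count 1) (+ (* 2 acc) (car bits)) result)))))

(bits->integers (bytes->bits '(77 97 110)) 6)   ; => (19 22 5 46)
```

The same bits->integers call with a size of 8 regroups 6-bit values
(fed in as six bits each) back into bytes, which is what makes this
view attractive for decoding as well.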
--
Harvey Stein
Bloomberg LP
hjstein@bfr.co.il