This is the mail archive of the guile@cygnus.com mailing list for the guile project.
I really hate this whole regular expression thing. Every program seems to have different escaping conventions, different extra features, etc. Trying to port my URL parsing code from stk to scm to guile was driving me insane. Moving regexp code from stk to elisp was equally painful. Fixing buggy regular expressions is a nightmare.

Regular expression strings are the worst thing to happen to programming - they're powerful and completely unreadable. In scheme one can do much better.

When I was moving my URL parsing code around I finally wrote a function (list->regexp-string lst) which takes a regular expression such as

  '((set "^a-zA-Z_$")
    (group "read" or "readv" or "readln" or "write" or "writev"
           or "writeln" or "reset" or "extend" or "rewrite" or "close")
    (zero-or-more whitespace)
    "(")

and converts it to

  "[^a-zA-Z_$]\\(read\\|readv\\|readln\\|write\\|writev\\|writeln\\|reset\\|extend\\|rewrite\\|close\\)\\s-*("

(at least this is what my elisp version creates).

I think the list form is much more readable, hackable from within code, etc. My only regrets are that a) I didn't follow bigloo's notation (so as to be compatible with pre-existing work - my notation isn't any better than bigloo's notation, so there's no reason to invent a new one), b) I didn't do it sooner, and c) I didn't do a full and complete job of it.

If I recall correctly, the author of scsh posted to the scsh mailing list over a year ago about doing something along these lines.

Here's my elisp version. I don't have the scheme version handy, and I wrote this version without trying to do something especially general, but it basically works.

;; list2regexp - convert a readable regexp to a string regexp.
;; Copyright (c) 1997, Harvey J. Stein, hjstein@bfr.co.il, all rights reserved
;; This code is licensed for use under the GNU LGPL.

;; A readable regexp looks like:
;; regexp :
;;   string                    - Match this string exactly.
;;   whitespace                - Match whitespace
;;   char                      - Match any character
;;   (regexp1 regexp2 ...)     - Match regexp1 followed by regexp2 ...
;;   (or regexp1 regexp2 ...)  - Match regexp1 or regexp2 or ...
;;   (group regexp1 regexp2)   - Match regexp1 followed by regexp2,
;;                               but group results.
;;   (member string)           - Match any character in string.
;;   (not-member string)       - Match any character not in string.
;;   (one-or-more regexp)      - Match regexp 1 or more times.
;;   (zero-or-more regexp)     - Match regexp 0 or more times.
;;   (zero-or-one regexp)      - Match regexp 0 or 1 time.
;;
;; Note that the function below does not implement this grammar exactly:
;; it spells `char' as `any-char', handles (set string) rather than
;; member/not-member (put "^" first in the string to negate the set),
;; and treats `or' as an infix separator, as in ("read" or "write").

(require 'cl-lib)                       ; provides cl-case

;;; Set these up for your particular scheme regexp package...
(defvar regexp-start-group "\\(")
(defvar regexp-end-group "\\)")
(defvar regexp-start-set "[")
(defvar regexp-end-set "]")
(defvar regexp-one-or-more "+")
(defvar regexp-zero-or-more "*")
(defvar regexp-zero-or-one "\\?")
(defvar regexp-or "\\|")
(defvar regexp-begin "^")
(defvar regexp-end "$")
(defvar regexp-any-char ".")
(defvar regexp-word-char "\\w")
(defvar regexp-not-word "\\W")
(defvar regexp-word-start "\\<")
(defvar regexp-word-end "\\>")
(defvar regexp-whitespace "\\s-")
(defvar regexp-open-parenthesis "\\s(")
(defvar regexp-close-parenthesis "\\s)")
(defvar regexp-symbol-char "\\s_")
(defvar regexp-punctuation "\\s.")
(defvar regexp-string-quote "\\s\"")
(defvar regexp-escape "\\s\\")
(defvar regexp-char-quote "\\s/")
(defvar regexp-paired-delimiter "\\s$")
(defvar regexp-expression-prefix "\\s'")
(defvar regexp-comment-starter "\\s<")
(defvar regexp-comment-ender "\\s>")

(defun list->regexp-string (l &optional quote)
  ;; L is a readable regexp.  When QUOTE is non-nil, literal strings
  ;; are passed through regexp-quote.
  (cond ((null l) "")
        ;; A list headed by a symbol: emit that operator, then recurse
        ;; on the rest of the list.
        ((and (listp l) (symbolp (car l)))
         (cl-case (car l)
           ((group)
            (concat regexp-start-group
                    (list->regexp-string (cdr l) quote)
                    regexp-end-group))
           ((set)
            (concat regexp-start-set
                    (list->regexp-string (cdr l) quote)
                    regexp-end-set))
           ((one-or-more)
            (concat (list->regexp-string (cdr l) quote) regexp-one-or-more))
           ((zero-or-more)
            (concat (list->regexp-string (cdr l) quote) regexp-zero-or-more))
           ((zero-or-one)
            (concat (list->regexp-string (cdr l) quote) regexp-zero-or-one))
           ((begin)
            (concat regexp-begin (list->regexp-string (cdr l) quote)))
           ((end)
            (concat regexp-end (list->regexp-string (cdr l) quote)))
           ((any-char)
            (concat regexp-any-char (list->regexp-string (cdr l) quote)))
           ((whitespace)
            (concat regexp-whitespace (list->regexp-string (cdr l) quote)))
           ((symbol)
            (concat regexp-symbol-char (list->regexp-string (cdr l) quote)))
           ((word-start)
            (concat regexp-word-start (list->regexp-string (cdr l) quote)))
           ((word-end)
            (concat regexp-word-end (list->regexp-string (cdr l) quote)))
           ((word)
            (concat regexp-word-char (list->regexp-string (cdr l) quote)))
           ((not-word)
            (concat regexp-not-word (list->regexp-string (cdr l) quote)))
           ((or)
            (concat regexp-or (list->regexp-string (cdr l) quote)))
           ((token)
            (list->regexp-string
             (cons 'word-start
                   (cons '(one-or-more (set "-a-zA-Z0-9_$"))
                         (cons 'word-end (cdr l))))
             quote))
           ((escape) (list->regexp-string (cdr l) t))
           ((unescape) (list->regexp-string (cdr l) nil))))
        ;; Any other list: translate the head, then the tail.
        ((listp l)
         (concat (list->regexp-string (car l) quote)
                 (list->regexp-string (cdr l) quote)))
        ;; `quote' here is the optional argument, used as the cond test.
        (quote (regexp-quote l))
        (t l)))
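To see what the generated string does in practice, here is a small sketch that runs it through string-match. The pattern is copied by hand from the output quoted earlier in the message, so the snippet does not depend on list->regexp-string being loaded. The names demo-call-regexp and demo-find-call are made up for this illustration and are not part of the original code. It also assumes the current syntax table treats a space as whitespace, which is the default.

```elisp
;; Hypothetical names: demo-call-regexp and demo-find-call are
;; illustrative only.  The pattern is the string the message reports
;; as the translator's output for the sample list.
(defvar demo-call-regexp
  (concat "[^a-zA-Z_$]"
          "\\(read\\|readv\\|readln\\|write\\|writev\\|writeln"
          "\\|reset\\|extend\\|rewrite\\|close\\)"
          "\\s-*(")
  "Regexp matching a call to one of the listed procedure names.")

(defun demo-find-call (text)
  "Return the procedure name matched by `demo-call-regexp' in TEXT, or nil."
  (when (string-match demo-call-regexp text)
    ;; Group 1 is the \\( ... \\) alternation, i.e. the procedure name.
    (match-string 1 text)))

(demo-find-call "  x := readln (f);")   ; => "readln"
(demo-find-call "myread(x)")            ; => nil
```

The second call returns nil because the leading set requires a character that cannot be part of an identifier just before the name, so "myread" is not mistaken for "read". For the same reason a call at the very start of the string is not matched.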