This is the mail archive of the xsl-list@mulberrytech.com mailing list .


RE: Regular expression functions (Was: Re: comments on December F&O draft)

From: "Steven Noels" <stevenn at outerthought dot org>
To: <xsl-list at lists dot mulberrytech dot com>
Date: Fri, 11 Jan 2002 09:09:37 +0100
Subject: RE: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)
Reply-to: xsl-list at lists dot mulberrytech dot com

Jeni, Marc,

> > 2.      <... select-group=":fancy-number:2" >
> > </matcher>
> >
> > could be challenging to implement (spontaneous idea of using the
> > indexes as offsets in counting parentheses)
>
> I like this method better than the Omnimark method of assigning the
> names within the regular expression itself, because it doesn't clutter
> the regular expression (if anything it makes it more readable) and it
> allows regular expressions to be reused.

doing it the Omnimark way, however, coincides with the idea of putting
everything you need when writing or editing a transformation sheet in
the same place

while explicitly declaring the named pattern subgroups in some location
has some appeal for people used to finding their way around classes or
(shudder) Schemas, I personally believe that it is the kind of niftiness
that hinders broad acceptance

that being said, I will be perfectly happy with *any* progress in that
area - if the feature is there, people will use it regardless of the
syntax

> There are a couple of issues that would need to be worked out with it,
> though. What happens if you have a regular expression that involves
> two instances of the named subexpression at the same level:
>
>   <matcher name="two-numbers" regexp=":fancy-number:\w:fancy-number:">
>     ...
>   </matcher>
>
> You need to have separate indexes to indicate which one you're talking
> about, plus some kind of syntax to pull out submatches within the
> named subexpression. Borrowing from XPath syntax (which might be a bad
> idea), you might have:
>
>   fancy-number[2]/*[2]

XPath would be nice, yes

> to indicate the second subexpression of the second fancy-number
> subexpression in the matched string.
>
> Actually, that syntax isn't all that bad - you can imagine the matcher
> actually builds up a tree structure based on the subexpression
> matches (you need 'anonymous' elements for unnamed subexpressions, but
> you should be able to get away with that using elements in some
> restricted namespace or something)...

couldn't we use * for that?

reading the subsequent message on this thread, I'm getting rather
enthusiastic about using XPath (or rather, intermediate nested data
structures (!)) to store & access (& later on perhaps manipulate) the
matched regions of a regex... and perhaps we could use Jaxen or SAXPath
to build that...?
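
to make the path idea a bit more concrete, here is a minimal sketch in
Python (not XSLT, and not a proposal for actual syntax). It expands
:name: references into capturing groups and resolves a path such as
fancy-number[2]/*[2] by counting parentheses, as suggested earlier in
the thread. The definition of fancy-number, the helper names expand and
select, and the restriction to a single path step are all assumptions
made for illustration.

```python
import re

# Hypothetical named subexpression with two unnamed groups
# (integer part, fraction part); not taken from the thread.
NAMED = {"fancy-number": r"(\d+)\.(\d+)"}

# A :name: reference inside a pattern.
REF = re.compile(r":([A-Za-z][\w-]*):")
# A path of the form name[i] or name[i]/*[j].
STEP = re.compile(r"([A-Za-z][\w-]*)\[(\d+)\](?:/\*\[(\d+)\])?")


def expand(pattern, named):
    """Expand :name: references into capturing groups.

    Returns the expanded regex and a dict mapping (name, occurrence)
    to the number of the group that wraps that occurrence.
    Assumes references sit at the top level of the pattern, so the
    literal text between them is itself a valid regex.
    """
    parts, index, counts = [], {}, {}
    next_group = 1
    pos = 0
    for ref in REF.finditer(pattern):
        literal = pattern[pos:ref.start()]
        # Skip over any groups opened by the literal text.
        next_group += re.compile(literal).groups
        parts.append(literal)
        name = ref.group(1)
        counts[name] = counts.get(name, 0) + 1
        index[(name, counts[name])] = next_group
        parts.append("(" + named[name] + ")")
        # One wrapping group plus the groups inside the subexpression.
        next_group += 1 + re.compile(named[name]).groups
        pos = ref.end()
    parts.append(pattern[pos:])
    return "".join(parts), index


def select(match, index, path):
    """Resolve name[i] or name[i]/*[j] against a match object.

    The /*[j] offset counts groups in order of their opening
    parenthesis, so it assumes the named subexpression has no
    nested groups.
    """
    step = STEP.fullmatch(path)
    if step is None:
        raise ValueError("unsupported path: " + path)
    name, occurrence, sub = step.group(1), int(step.group(2)), step.group(3)
    offset = int(sub) if sub else 0
    return match.group(index[(name, occurrence)] + offset)


regex, index = expand(r":fancy-number:\w:fancy-number:", NAMED)
m = re.fullmatch(regex, "1.5x2.75")
print(select(m, index, "fancy-number[2]/*[2]"))  # prints 75
```

a real tree of matches would handle nested and anonymous groups more
cleanly; the flat offset table above only shows that the indexes are
enough to tell two instances of the same named subexpression apart.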

> > this also makes me think about your earlier mention of dynamic
> > regexes: you probably expect anything that qualifies as a
> > text-representing xsl parameter to be possibly carrying part of the
> > regex to be executed...
>
> I think that if you could build the named regular expressions
> dynamically, then this idea would work fine. Going back to the keyword
> example that I used on an earlier mail, you could do:
>
> <xsl:regexp name="keyword-as-word"
>             select="concat('\W', $keyword, '\W')" />
>
> If named regular expressions were like variables, you could assign
> them values at the global or local level...

such a substitution or parametrisation mechanism comparable to XSLT's,
offering you the ability to pass portions of regexes, or complete ones,
to the transformation from the environment, does indeed seem to be a
nice-to-have
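
as a rough illustration of what such parametrisation amounts to, here
is the keyword-as-word example from the quoted text sketched in Python
rather than in the proposed xsl:regexp syntax. The function name and
the sample strings are made up, and escaping the keyword with re.escape
is an added assumption: the quoted concat() would splice the parameter
in as-is, so a keyword containing regex metacharacters would change the
pattern.

```python
import re


def keyword_as_word(keyword):
    """Build the regex \\W<keyword>\\W from a runtime parameter.

    re.escape treats the keyword as literal text; drop it if the
    parameter is meant to carry regex syntax of its own.
    """
    return re.compile(r"\W" + re.escape(keyword) + r"\W")


matcher = keyword_as_word("cat")
print(bool(matcher.search("the cat sat")))  # True
print(bool(matcher.search("concatenate")))  # False
```

note that \W on both sides means the keyword is not found at the very
start or end of the string, which is a property of the pattern as
quoted, not of the sketch.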

as a use case, however, it remains to be seen whether the data that
gets massaged by regexslt will be worth the effort of developing a
really nifty transformation using all these interesting language design
constructs

in the end, people might be more interested in pure performance or
scalability (as was the case with Omnimark), or syntax conciseness
rather than having more-than-one-way-to-do-it ;-)

cheers,

</Steven>


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

References:
- Re: Regular expression functions (Was: Re: comments on December F&O draft)
  - From: Jeni Tennison
