This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Cygwin bash regexp matching doesn't treat "\b" properly


Dave Korn <dave.korn.cygwin <at> googlemail.com> writes:

> 
> $ [[ "foo" =~ [[:\<:]]foo[[:\>:]] ]]; echo $?
> 0
> 
>   (Note that I had to backslash-escape the < and > there.  In other contexts
> that might not be needed.)

But here's something weird with how bash manages quoting inside [[ ]].  If you 
add a subexpression, you no longer need to quote < or >:

$ [[ foo =~ ([[:<:]]foo[[:>:]]) ]]; echo $?
0
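
The quoting oddity comes from < and > being operators inside [[ ]]. A common way to sidestep it is to keep the regex in a variable and expand it unquoted on the right-hand side of =~. The sketch below is only an illustration: it deliberately uses a pattern with literal angle brackets rather than [[:<:]], so that it also runs where regex(3) lacks that extension, and the names re and matches_tag are made up for the example.

```shell
#!/bin/bash
# Hypothetical pattern: a lowercase name wrapped in literal angle brackets.
# Single quotes keep the shell from treating < and > as operators.
re='^<[a-z]+>$'

# Hypothetical helper: succeed if $1 matches the pattern stored in $re.
matches_tag() {
    # $re must stay unquoted here; quoting it would force a literal
    # string comparison instead of a regex match.
    [[ $1 =~ $re ]]
}

matches_tag '<foo>'; echo $?
matches_tag 'foo'; echo $?
```

The two calls print 0 and then 1, and no backslashes are needed anywhere in the pattern.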

With further experimentation, it turns out that cygwin's regex(3) does not 
understand [[:<:][:>:]] as a single bracket expression that accepts either 
direction of word boundary (for shame).  So, modulo the extra subexpression 
it introduces, the closest representation of \b becomes:

([[:<:]]|[[:>:]])

and an expression to match words that either end with a or begin with b would be:

$ [[ ' b ' =~ ([a ]([[:<:]]|[[:>:]])[b ]) ]]; echo $?
0
$ [[ ' ab '  =~ ([a ]([[:<:]]|[[:>:]])[b ]) ]]; echo $?
1

which would look so much shorter as ([a ]\b[b ]), if only \b were supported.
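
For scripts that also have to run where regex(3) lacks [[:<:]] and [[:>:]], a rough substitute is to spell the boundary out in plain POSIX ERE as "start of string or a non-word character" and "end of string or a non-word character". This is a different technique from the one above, not an equivalent: it consumes the neighbouring character instead of matching an empty position. The helper name has_word is hypothetical, and the sketch assumes the word contains no regex metacharacters.

```shell
#!/bin/bash
# Hypothetical helper: succeed if word $2 occurs in text $1 as a whole word.
# Assumption: $2 contains no regex metacharacters.
has_word() {
    local text=$1 word=$2
    # Word characters are taken to be alphanumerics plus underscore.
    # The pattern lives in a variable so [[ ]] does not re-parse it.
    local re="(^|[^[:alnum:]_])${word}(\$|[^[:alnum:]_])"
    [[ $text =~ $re ]]
}

has_word 'a foo b' foo; echo $?
has_word 'foobar' foo; echo $?
```

The first call prints 0 and the second prints 1, since the b following foo in foobar is a word character.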

-- 
Eric Blake




