[ANNOUNCEMENT] Updated: dash 0.5.12-2

Corinna Vinschen corinna-cygwin@cygwin.com
Wed Feb 15 13:52:23 GMT 2023


Hi Brian,

On Feb 13 20:37, Corinna Vinschen via Cygwin wrote:
> On Feb 13 12:03, Brian Inglis via Cygwin wrote:
> > On 2023-02-13 10:43, ASSI via Cygwin wrote:
> > > Corinna Vinschen via Cygwin writes:
> > > > Can you give me an example?  I'm a bit puzzled because fnmatch as well
> > > > as glob in Cygwin support native characters.
> > 
> > But not locale dependent named character classes like regexp in paths.
> 
> I checked the dash code in current dash git, and while its internal glob
> implementation supports character classes, they are not localized, using
> standard single-byte functions isalnum, isalpha, etc. under the hood.
> 
> So, yeah, what you say further down this mail... looks like dash
> supports locale dependent character classes only with glibc.
> [...]
> Either way, I don't care much for what a certain application provides by
> itself.  I'm talking about our libc, that is Cygwin, and what it
> provides to processes calling its implementations of regcomp/regexec,
> glob and fnmatch.
> 
> All these functions have been taken from FreeBSD and all three suffer
> shortcomings:
> 
> - regcomp/regexec supports POSIX named character classes, collating
>   symbols, and equivalence class expressions, but all of them only work
>   for ASCII chars.
> 
> - fnmatch and glob support none of named character classes,
>   collating symbols, and equivalence class expressions.
> 
> I checked the upstream code in FreeBSD, OpenBSD and NetBSD and none of
> these functions are improved to support locales (regcomp) or any of
> the character classes stuff (fnmatch/glob).
> 
> So, if we want to add this support to Cygwin (and thus, to all
> applications calling the libc implementation of these functions),
> quite a bit of work is required.
> 
> Being able to fetch the implementation from some other source
> would reduce the effort enormously :}

I took the liberty of adding [:<class>:] support to Cygwin's fnmatch(3) and
glob(3) functions.  They also recognize collating symbols [.<coll>.] and
equivalence class expressions [=<equiv>=].  But the latter two are not
implemented yet and fnmatch/glob simply skip them in the pattern.
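
For illustration, here is a minimal sketch of how an application might
exercise named character classes through fnmatch(3).  The helper names
class_match and show_class_matches and the macro CLASS_DEMO_MAIN are
hypothetical, and the sketch only calls the libc interface; it says
nothing about how the Cygwin implementation works internally.

```c
#include <fnmatch.h>
#include <stdio.h>

/* Hypothetical helper: 1 if string matches pattern per fnmatch(3), else 0. */
int class_match(const char *pattern, const char *string)
{
    return fnmatch(pattern, string, 0) == 0;
}

/* Hypothetical helper: print the outcome for a few named-class patterns. */
void show_class_matches(void)
{
    static const char *const cases[][2] = {
        { "[[:digit:]]*",            "3rd.txt"  },
        { "[[:upper:]][[:lower:]]*", "Makefile" },
        { "[[:alpha:]]*",            "42.log"   },
    };
    size_t i;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++)
        printf("%-26s %-10s %s\n", cases[i][0], cases[i][1],
               class_match(cases[i][0], cases[i][1]) ? "match" : "no match");
}

#ifdef CLASS_DEMO_MAIN
/* Build with -DCLASS_DEMO_MAIN to get a standalone program. */
int main(void)
{
    show_class_matches();
    return 0;
}
#endif
```

On a libc whose fnmatch handles [:<class>:], the first two cases should
match and the third should not.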

Given that glob and fnmatch use wide characters internally, the support
for character classes is internationalized by default, albeit in a
slightly different way than in glibc.  The classes a Unicode character
belongs to are not locale dependent in Cygwin/newlib.  All characters
have their classes assigned all the time, so, for instance, the German
character 'ä' is lower and alpha even in the en_US.utf8 locale.
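
One way to probe this from user code is to ask the libc wide-character
classification functions about 'ä' under several locales.  This is only
a sketch: the helper names classify_in_locale and show_classification and
the macro CLASSIFY_DEMO_MAIN are hypothetical, it assumes wchar_t holds
Unicode code points (so 'ä' is 0x00E4), and it does not claim that
fnmatch/glob call these functions internally.  Results will differ
between libcs, which is the point of the comparison.

```c
#include <locale.h>
#include <stdio.h>
#include <wctype.h>

/* Hypothetical helper: classify wc under the named locale.
   Returns a bitmask (bit 0 = alpha, bit 1 = lower),
   or -1 if the locale cannot be selected. */
int classify_in_locale(const char *locale_name, wint_t wc)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;
    return (iswalpha(wc) ? 1 : 0) | (iswlower(wc) ? 2 : 0);
}

/* Hypothetical helper: report how U+00E4 is classified in a few locales. */
void show_classification(void)
{
    static const char *const locales[] = { "C", "en_US.utf8", "de_DE.utf8" };
    const wint_t a_umlaut = 0x00E4;  /* assumes wchar_t is Unicode */
    size_t i;

    for (i = 0; i < sizeof locales / sizeof locales[0]; i++) {
        int r = classify_in_locale(locales[i], a_umlaut);

        if (r < 0)
            printf("%-12s locale not available\n", locales[i]);
        else
            printf("%-12s alpha=%d lower=%d\n", locales[i],
                   (r & 1) != 0, (r & 2) != 0);
    }
}

#ifdef CLASSIFY_DEMO_MAIN
/* Build with -DCLASSIFY_DEMO_MAIN to get a standalone program. */
int main(void)
{
    show_classification();
    return 0;
}
#endif
```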

The currently building Cygwin test release 3.5.0-0.174.gd6d4436145b8
contains the new code.  Would you mind building a dash for testing so we
can see if and how it works?


Thanks,
Corinna

