R 2.15.1-1 sub() function produces unexpected output

Warren Young warren@etr-usa.com
Fri Oct 12 22:56:00 GMT 2012

On 10/12/2012 9:18 AM, Toby Johnson wrote:
> The sub() function in R 2.15.1-1 produces unexpected output.  Here is
> a minimal piece of R code:

I see the same result here, but I tested R 2.15.1 in both its native 
Windows and Cygwin builds, whereas your comparison used a different R 
version on Linux.  That rules out both a 2.14 -> 2.15 difference and 
an OS difference as the cause.

I believe sub() is using PCRE for this, per http://goo.gl/XxDyB

I ldd'd the Cygwin R binary[*] and it's linking to cygpcre-1.dll, which 
is PCRE 8.31 if your Cygwin is up to date.  Since the PCRE packaged with 
the R sources is 8.30 (per http://goo.gl/O2UMk) you would think this is 
fine.  If anything, this setup should result in *fewer* bugs, not more.

That said, I don't see any better idea than rebuilding Cygwin R with 
different regex libraries.  If I were to go about it, I'd first try 
building R with its packaged PCRE instead of the platform PCRE.  If 
that gave the same result, I'd then build without PCRE entirely.

[*] You'd guess "ldd `which R`" but you'd be wrong.  The binary is 
hidden behind two layers of indirection:

     ldd /usr/lib/R/bin/exec/R.exe
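
To pick the PCRE line out of that ldd output, a small filter helps. 
This is only a sketch, assuming a POSIX shell with awk.  The name 
pcre_libs is a made-up helper, not something shipped with R or Cygwin:

```shell
# Print the PCRE libraries named in ldd-style output read from stdin.
# pcre_libs is a hypothetical helper name, not part of R or Cygwin.
pcre_libs() {
    # ldd lines look like "<name> => <path> (<address>)".  Keep the
    # name and path of any line that mentions pcre, case-insensitively.
    awk 'tolower($0) ~ /pcre/ { print $1, $3 }'
}

# Usage on Cygwin, with the real binary path given above:
#   ldd /usr/lib/R/bin/exec/R.exe | pcre_libs
```

If the filter prints nothing, that binary is not dynamically linked 
against a PCRE DLL.  That would be consistent with R using its 
packaged copy, though it does not prove it.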

