App runs 8x slower on dual core machine (with test case to replicate issue)

Hi all,

While porting my multithreaded Linux app to Windows by compiling it
under Cygwin, I discovered that it runs about 8x slower than it does
under Linux on a machine with the same specs. I then triaged the issue
and found that if I set the CPU affinity of the process to 1 (i.e. a
single core), it speeds up to almost the speed I get under Linux
(also restricted to a single core).
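
For anyone who wants to repeat the single-core runs on the Linux side, here is a minimal sketch of pinning from inside the program. It assumes Linux with glibc, where sched_setaffinity and the CPU_* macros are available; the helper names first_allowed_cpu, pin_to_cpu and allowed_cpu_count are made up for illustration and are not part of the test case. An affinity mask of 1 corresponds to CPU index 0 here.

```cpp
// Linux/glibc-only sketch: restrict the caller to a single CPU.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for the CPU_* macros when not using g++ defaults
#endif
#include <sched.h>

// Lowest CPU index the caller is currently allowed to run on, or -1 on error.
int first_allowed_cpu()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

// Pin the calling thread to one CPU. Threads created afterwards inherit
// the mask, so call this before any worker threads are started.
bool pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Number of CPUs the caller may currently run on, or -1 on error.
int allowed_cpu_count()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

Calling pin_to_cpu(first_allowed_cpu()) as the first statement in main() should give a single-core run comparable to the one described above.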

I dug deeper and suspected a Cygwin bug, specifically a problem in its
thread signalling (condvar). To test my hypothesis, I created a minimal
test case to showcase the issue. The minimal test case also compiles on
MSVC++, and the difference is staggering. What you'll find is that if
you start the process with CPU Aff = 1, it runs >8x as fast as with
the default affinity.
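
To look at condvar signalling on its own, without Boost.Threadpool in the picture, something like the following ping-pong could be timed with and without the affinity restriction. This is only a sketch under my own assumptions: it uses plain pthreads, the names PingPong, worker and run_ping_pong are hypothetical, and it has not been shown to reproduce the slowdown.

```cpp
#include <pthread.h>
#include <time.h>

// Shared state for two threads handing a "turn" back and forth.
struct PingPong {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int turn;    // 0: main thread's turn, 1: worker's turn
    int rounds;  // number of handoffs in each direction
};

static void* worker(void* arg)
{
    PingPong* pp = static_cast<PingPong*>(arg);
    pthread_mutex_lock(&pp->mutex);
    for (int i = 0; i < pp->rounds; ++i) {
        while (pp->turn != 1)                    // guard against spurious wakeups
            pthread_cond_wait(&pp->cond, &pp->mutex);
        pp->turn = 0;                            // hand the turn back
        pthread_cond_signal(&pp->cond);
    }
    pthread_mutex_unlock(&pp->mutex);
    return 0;
}

// Runs `rounds` round trips through the condvar and returns the elapsed
// time in milliseconds, or -1 if the worker thread could not be created.
long long run_ping_pong(int rounds)
{
    PingPong pp;
    pthread_mutex_init(&pp.mutex, 0);
    pthread_cond_init(&pp.cond, 0);
    pp.turn = 0;
    pp.rounds = rounds;

    long long elapsed_ms = -1;
    pthread_t thread;
    if (pthread_create(&thread, 0, worker, &pp) == 0) {
        timespec begin, end;
        clock_gettime(CLOCK_MONOTONIC, &begin);

        pthread_mutex_lock(&pp.mutex);
        for (int i = 0; i < rounds; ++i) {
            pp.turn = 1;                         // give the turn to the worker
            pthread_cond_signal(&pp.cond);
            while (pp.turn != 0)
                pthread_cond_wait(&pp.cond, &pp.mutex);
        }
        pthread_mutex_unlock(&pp.mutex);
        pthread_join(thread, 0);

        clock_gettime(CLOCK_MONOTONIC, &end);
        elapsed_ms = ((end.tv_sec - begin.tv_sec) * 1000000000LL
                      + (end.tv_nsec - begin.tv_nsec)) / 1000000;
    }

    pthread_cond_destroy(&pp.cond);
    pthread_mutex_destroy(&pp.mutex);
    return elapsed_ms;
}
```

If run_ping_pong(100000) shows the same gap between the default affinity and a single core, that would point at the condvar path rather than at the thread pool; if it does not, the hypothesis needs another look.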

On my machine, it takes 4300 ms to run in dual-core mode and 460 ms
to run with CPU Aff = 1.

Core2Duo 2.33GHz
All firewalls disabled
Virus / Malware scanners disabled
Boost 1.48.0 (as per official Cygwin installation)
[Only for test case app] Boost.Threadpool -

Code to replicate the issue (get Boost.Threadpool from above):

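#include <boost/bind.hpp>
#include <boost/threadpool.hpp>
#include <iostream>
#include <time.h>

// Monotonic time in milliseconds. Returns long long so that
// tv_sec * 1000 cannot overflow on machines with long uptimes.
inline long long GetTickCount()
{
    timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return (static_cast<long long>(t.tv_sec) * 1000) + (t.tv_nsec / 1000000);
}

// The tasks are intentionally empty, so the run time is dominated by
// scheduling and thread signalling overhead.
class Test
{
public:
    void Add() {}
    void Delete() {}
};

int main(int argc, char** argv)
{
    long long start;
    {
        boost::threadpool::pool tp(50);
        Test test;

        start = GetTickCount();
        for (int i=0; i<100000; i++)
        {
            tp.schedule(boost::bind(&Test::Add, &test));
            tp.schedule(boost::bind(&Test::Delete, &test));
        }
    }   // leaving this scope destroys the pool before the clock is read again

    long long elapsed = GetTickCount() - start;
    std::cout << elapsed << std::endl;

    return 0;
}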


