cygwin qsort erratic isolated

Mon Sep 21 20:01:25 GMT 2020

On 2020-09-21 12:34, Kurt Carlson via Cygwin wrote:
> The attached source replicates the erratic qsort problem. This compiles
> both cygwin and RHEL. In the following, the first descending dpcsort has
> erroneous results:
> 
> kc: uname -a
> CYGWIN_NT-10.0 kacds 3.1.7(0.340/5/3) 2020-08-22 17:48 x86_64 Cygwin
> kc: cc c19.c -o c19
> kc: ./c19
> # consort qsort struct descending -c 300000
> # consort:0:
> # dpcsort qsort divide descending -c 300000
> # dpcsort:0
> #  confirmed deaths dpc      location:
> 239   663437  29259   4.4102 Peru
> 240   633339  20348   3.2128 Colombia
> 195   610957  65816  10.7726 Mexico
> 145   488513  29234   5.9843 Spain
> 137   338676  41514  12.2577 United Kingdom
> 190   378752  21797   5.7550 Iran
> 000 26068854 863584   3.3127 World
> 237  3997865 123780   3.0962 Brazil
> 196  6114406 185744   3.0378 United States
> 246   414739  11344   2.7352 Chile
> 039   630595  14389   2.2818 South Africa
> 245   428226   8971   2.0949 Argentina
> 085  3853406  67376   1.7485 India
> 169  1005000  17414   1.7327 Russia
> 084   317528   4351   1.3703 Bangladesh
> 186   317486   3956   1.2460 Saudi Arabia
> # dpcsort qsort divide ascending -c 300000
> # dpcsort:1
> # dpcsort qsort divide descending -c 300000
> # dpcsort:0
> #  confirmed deaths dpc      location:
> 137   338676  41514  12.2577 United Kingdom
> 195   610957  65816  10.7726 Mexico
> 145   488513  29234   5.9843 Spain
> 190   378752  21797   5.7550 Iran
> 239   663437  29259   4.4102 Peru
> 000 26068854 863584   3.3127 World
> 240   633339  20348   3.2128 Colombia
> 237  3997865 123780   3.0962 Brazil
> 196  6114406 185744   3.0378 United States
> 246   414739  11344   2.7352 Chile
> 039   630595  14389   2.2818 South Africa
> 245   428226   8971   2.0949 Argentina
> 085  3853406  67376   1.7485 India
> 169  1005000  17414   1.7327 Russia
> 084   317528   4351   1.3703 Bangladesh
> 186   317486   3956   1.2460 Saudi Arabia
> 
> In researching this I looked at source for eight qsort implementations:
> 
> 1a. gnu (RHEL 7): Always works.
>         ./c19
> 
> 1b. RHEL/gnu compiled under cygwin: Always works.
>         Not proviced in c19.c
>     Had to incorporate both qsort.c and msort.c glibc stdlib.
> 
> 2a. cygwin unmodified: fails (above example)
>         ./c19
>     Both insertion sorts 'goto pop' vs. return.
> 
> 2b. cygwin modified: mostly works
>         ./c19 -q qcygw -r
>     Commented out secondary insertion sort (per Dennis de Champeaux).
> 
> 2c. cygwin compiled under RHEL 7: fails (as under cygwin)
>         ./c19 -q qcygw
> 
> 2d. cygwin modified under RHEL 7 (like 2b): mostly works
>         ./c19 -q qcygw -r
> 
> 3.  netbsd: mostly works
>         Not provided in c19.c
>     The only insertion sort returns.
> 
> 4.  openbsd: mosly works
>         Not provided in c19.c
>     Both insertion sorts return.
> 
> 5.  freebsd: did not attempt
>     Both insertion sorts return.
> 
> 6.  MacOS: did not attempt
>     Both insertion sorts return.
> 
> 7.  Android: did not attempt (older version of openbsd)
>     Both insertion sorts return.
> 
> 8.  Dennis de Chameaux 'bentley': mostly works
>         Not provided with c19
>     Besides commenting out secondary insertion sort, this code
>     has return vs. goto and he made corrections to reduce recursion.
> 
> Types 2-8 diverged (greatly) from the same base. The cygwin version is a
> bit puzzling using goto vs. return when there is a general concern for
> stack usage with the recursive calls. All versions of qsort are cryptic
> (stated kindly). I believe gnu did the right thing in re-writing, but it
> still is not a pleasant read.
> 
> The failures are triggered with values from a float divide by zero, either
> in the compare routine or imbedded in the structure being sorted. While
> divide by zero behaviour is "undefined" that really should not extend to
> qsort erraticism. The "mostly works" gets the numeric values in the correct
> order, but the divide by zero values are mis-placed.
> 
> For a better representation use either cygwin or RHEL:
>         ./c19 -c 0 -z -q qcygw -r
> 
> If you add the --verbose option, values are displayed in the compare
> routine. The divide by zero float is always 0xffc00000. Any subtract
> operation with that value always results (int) as 0x80000000,
> pathologically negative.

GIGO!

That value is a floating point NaN generated by divide by zero.
Any floating point operation with NaN will propagate a NaN.
All comparisons with NaN result in false.

Conversions of NaN results in undefined values, but for integer conversions
Intel always returns MIN_INT; AMD, ARM, and others may generate other values.

See
"What Every Computer Scientist Should Know About Floating Point Arithmetic",
David Goldberg, Xerox PARC, ACM Computing Surveys, Vol 23, No 1, March 1991:

http://pages.cs.wisc.edu/~david/courses/cs552/S12/handouts/goldberg-floating-point.pdf

also
"Floating point exception tracking and NAN propagation", Agner Fog, DTU, 2018-2020:

	https://www.agner.org/optimize/nan_propagation.pdf

> All four bsd variants I attempted, both under cygwin and rhel, do not fully
> cope with the 0xffc00000, cygwin most poorly. I contacted Dennis de
> Champeaux, this problem is unique from what he previously addressed
> although his fix improves cygwin's qsort up to the other bsd variants. The
> obvious circumvention is to avoid divide by zero. That all bsd variants
> have similar issues, and the crypticness of all the variants, it is less
> than optimal to attempt any kind of fix. E.g., call it not "broken" but a
> "feature" of undefined divide by zero even though it is well removed from
> the operation and a non-issue for RHEL. Be prepared to advise anybody who
> experiences erratic qsort behaviour of this "feature".

Erratic qsort behaviour from the results of the user program and user provided
floating point comparison routine that ignores floating point exceptions and
resulting undefined values.

> Please advise what, if any, assistance I can provide at this point.

You can write a better program by:

* avoiding or properly handling divide by zero (compare dividend == 0, or
better, +/-FLT_MIN, a very small epsilon that will check when your results are
subnormal, and subsequent operations are likely to produce 0),

and/or

* using a better floating point comparison routine that takes care of NaN
generated by divide by zero (use isnan()), also Inf (use isinf()) generated by
other invalid floating point operations, or isunordered(), and produces
consistent results from comparisons of undefined floating point values.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]