This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Improving libm-test.inc structure and maintenance


On Thu, 9 May 2013, Ondrej Bilka wrote:

> On Sun, May 05, 2013 at 08:23:12PM +0000, Joseph S. Myers wrote:
> > On Sun, 5 May 2013, Ondrej Bilka wrote:
> > 
> > > You do not have to review if you do the following: 
> > 
> > Tools may be able to use various heuristics to reduce the number of cases 
> > presented for human review.  That human review is still needed to ensure 
> > good, valid bug reports.  (Note that Jakub found various bugs in MPFR in 
> > his random fma testing.  You need to decide what component the bug is in 
> > before reporting it.)
> 
> Depends on what is found. If it finds only 10 cases in a year then
> filtering is not necessary. My main concern is that when testing finds
> a new bug (which can be a needle in a haystack of existing bugs), everybody 
> has forgotten that the testing took place and nobody reads the logs. Some
> notification system is necessary.

Frankly, we have more need right now - much more need - for people working 
on fixing bugs than for systems detecting and filing new bugs that have 
not affected any human enough for them to file the bugs.  I'd urge working 
on fixes for existing bugs in libm or any other part of glibc over new 
bug-finding systems, until the number of open bugs is much smaller than at 
present.

Few people have been interested in joining me in the patch-a-day goal, 
with a reasonable proportion of those patches being bug fixes, for 
improving glibc and dealing with the backlog of known issues.  Recruit ten 
more people who actively and accurately triage new bugs on a day-by-day 
basis and work daily on fixing bugs, and your approach of more automatic 
reporting to glibc Bugzilla may become more feasible.  Without those 
people, it's likely to be harmful rather than helpful to glibc development 
- even if the new bugs are in fact valid and not duplicates.

Given the extremely limited resources presently spent on bug fixing and 
triage, it's important to ensure new bugs reported are of high quality so 
those resources are productively spent improving glibc rather than dealing 
with poor-quality, incorrect or duplicative bug reports.

> Bugzilla is the best place for notification. The second alternative is to
> send mail, which has a higher probability of being ignored.

Any automatic tester should notify *the person running the tester*.  That 
person should then take responsibility for understanding the notifications 
and producing human-written reports in glibc Bugzilla where there 
are genuinely new bugs.  It's the responsibility of the person running the 
tester to deal with notifications or to find someone to do so, rather than 
dumping them directly into Bugzilla without human review.  If you don't 
have the human resources to review the output of your system and produce 
good human bug reports from it, then at most put information on an 
external site and a link on the wiki to where people can find those 
external reports if they wish to look for new glibc bugs among them - but 
it will probably be largely ignored because there are too many *human* bug 
reports for the present level of work on bug fixing, even without new 
sources of potential bugs.

> > I'm thinking more on the lines of John Regehr's testing of compilers with 
> > Csmith.  Reporting one bug doesn't wait on other bugs being fixed if it 
> > looks to a human that they are different.  Failures appearing in different 
> > functions may have the same underlying cause, while failures in the same 
> > function may have different causes - that's something a human can judge.
> > 
> In libm, functions are mostly standalone; the same underlying cause can
> occur only through a pattern that is repeated in the code. Then having a
> list of the functions affected is handy.
> 
> I do not quite follow how you use testing with Csmith. Generate random
> expressions and look at how functions behave?

See the bugs he's reported to GCC Bugzilla over the years - human bug 
reports, with reduced testcases - and his blog, and the papers he's 
published about finding bugs through random testing.

Before working on finding glibc bugs through such random testing, it would 
be a very good idea to (a) study the existing literature in the area - 
such work should be considered as much a piece of potentially publishable 
research, as a direct contribution to glibc, and should be approached 
accordingly - and (b) pay close attention to what the people who are 
actually fixing such bugs as you might hope to find say they find is 
useful regarding reporting them, rather than starting from external 
assumptions about how you would like to handle reporting bugs, just as 
John Regehr has paid attention to reporting bugs in ways that are useful 
to the projects to which he reports them (rather than just dumping the 
original large, unreduced and unreviewed tests into Bugzilla, for 
example).

> > I think automatic bug filing is always a bad idea - an automatic process 
> > may produce a list of *candidate* issues, tracked however is convenient, 
> > but the human should be in the loop before any such candidate issue 
> > becomes an actual bug report in glibc Bugzilla, not just after.
> > 
> What about adding a separate state, for example GENERATED, that will not
> show unless asked?

In the absence of more bug triagers and fixers, a completely separate 
tracking system should be used for automatically-generated candidate 
issues like this, not glibc Bugzilla until a human has reviewed them and 
decided they are genuine and new glibc bugs.  Again, get more people 
working on bug fixing and triage, and the appropriate approaches may 
change, but get the extra people contributing *first* before dumping 
lower-quality bugs in Bugzilla.
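
A minimal sketch of that separation, assuming a random tester that compares 
a function against a high-precision reference.  Everything named here 
(Candidate, collect_candidates, ready_to_file, sloppy_exp, the 2 ULP 
threshold) is hypothetical and for illustration only; it is not glibc code 
or any existing tool.  The tester only appends to its own candidate list, 
and a candidate becomes eligible for a bug report only after a human has 
signed off on it.

```python
import math
import random
from dataclasses import dataclass
from decimal import Decimal, getcontext
from typing import Optional

getcontext().prec = 50  # reference precision well beyond double


@dataclass
class Candidate:
    """One automatically generated candidate issue, not yet a bug report."""
    function: str
    argument: float
    ulp_error: float
    reviewed_by: Optional[str] = None  # set only by a human reviewer


def ulp_error(func, x):
    """Error of func(x), in ULPs, against a high-precision exp reference."""
    reference = Decimal(x).exp()
    ulp = Decimal(math.ulp(float(reference)))
    return float(abs(Decimal(func(x)) - reference) / ulp)


def collect_candidates(name, func, inputs, max_ulps=2.0):
    """Return candidates for human review; nothing here files a bug."""
    found = []
    for x in inputs:
        err = ulp_error(func, x)
        if err > max_ulps:
            found.append(Candidate(name, x, err))
    return found


def ready_to_file(candidates):
    """Only candidates a human has signed off on may become bug reports."""
    return [c for c in candidates if c.reviewed_by is not None]


def sloppy_exp(x):
    """Hypothetical buggy implementation: a five-term Taylor series."""
    return sum(x ** k / math.factorial(k) for k in range(5))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run can be repeated
    inputs = [rng.uniform(-10.0, 10.0) for _ in range(100)]
    candidates = collect_candidates("sloppy_exp", sloppy_exp, inputs)
    print(len(candidates), "candidates,", len(ready_to_file(candidates)),
          "ready to file")
```

Deciding which component a candidate belongs to (the function under test 
or the reference) is exactly the step that the reviewed_by field leaves to 
a person.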

> > Automatic closing of bugs is also a bad idea; a human needs to judge 
> > whether the whole issue is genuinely fixed or whether the commit only 
> > fixes particular cases and other parts of the same issue remain to fix.
> > 
> A test that tests only particular cases is an inadequate test. You cannot
> decide if an issue is fixed with tests that are green before and green
> after. You also do not reliably know if a regression happened. Closing a
> bug is a good way to fix it and make human add additional necessary data.

Automatic systems are there as the servant of humans, not their master.  
"make human add" is fundamentally the wrong idea.  If no-one is paying 
attention on a particular day when a computer detects that an issue might 
be fixed (given that the issue was reported / reviewed as valid by a human 
in the first place), the issue should remain open until someone is looking 
at it and can review the notification; it should not be quietly closed 
without that review.  With extra bug reviewers, waiting for human review 
is not a burden here.  Without extra bug reviewers to notice errors, 
closing a bug when it may not be properly fixed is actively destructive 
and harmful to glibc.

It's impossible in advance to write a test that covers all cases, because 
until the issue has been analyzed and fixed you don't know how many 
instances of the issue appear in different places in the code, but it is 
possible to write one that covers at least one failing case, with the 
understanding that a human will need to check when it starts to pass and 
decide if the issue is really fully fixed.
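
As a rough illustration of such a test, here is a sketch assuming a single 
known failing input.  The names (Status, check_known_failure, buggy_hypot) 
and the overflow example are hypothetical and do not describe an actual 
glibc bug.  The check has only two outcomes, and neither of them closes 
anything: when the case starts to pass, the result is a request for human 
review.

```python
import math
from enum import Enum


class Status(Enum):
    STILL_FAILING = "still failing"
    NEEDS_HUMAN_REVIEW = "passes now; a human must decide if the bug is fixed"


def check_known_failure(func, args, expected, tolerance):
    """Run one known failing case; never closes the bug by itself."""
    passed = abs(func(*args) - expected) <= tolerance
    return Status.NEEDS_HUMAN_REVIEW if passed else Status.STILL_FAILING


def buggy_hypot(x, y):
    """Hypothetical buggy implementation: x*x overflows for large inputs."""
    return math.sqrt(x * x + y * y)


# One failing case taken from the (hypothetical) bug report.
CASE = ((1e200, 1e200), 1.4142135623730951e200, 1e190)

if __name__ == "__main__":
    print(check_known_failure(buggy_hypot, *CASE).value)
    print(check_known_failure(math.hypot, *CASE).value)
```

A passing result says nothing about other inputs or other functions that 
share the same pattern, which is why the sketch has no "fixed" status.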

> I plan to write something like this but currently do not have that much time.
> I added it to my TODO list and will probably look at it during the freeze.
> 
> Everybody would be welcome to join. What are the options for where to host it?

I suggest Savannah for GNU-related free software projects.  But as above, 
I advise (a) fixing existing bugs as higher priority than systems to find 
new ones; (b) understanding what people who have gone through and fixed 
hundreds of bugs in Bugzilla actually find useful and working based on 
that experience to optimize things for the people who fix bugs rather than 
optimizing for the person running an automatic system to find them; (c) 
understanding the existing literature and experiences with random testing, 
with a view to possibly making a publishable contribution to that 
literature.

If you do not have that much time, any one bug fix is a valuable 
contribution to glibc and likely to be much more practical than starting a 
substantial research project on random testing.  So is triage of existing 
bugs to identify if they are valid, non-duplicative and still applicable 
to current glibc.

-- 
Joseph S. Myers
joseph@codesourcery.com

