
Re: [PATCH] Fixes tree-loop-distribute-patterns issues


On Fri, Jun 21, 2013 at 12:44:03PM +0200, Torvald Riegel wrote:
> > On Fri, 2013-06-21 at 13:24 +0200, Ondřej Bílka wrote:
> > On Fri, Jun 21, 2013 at 10:07:08AM +0200, Torvald Riegel wrote:
> > > > On Fri, 2013-06-21 at 04:00 +0200, Ondřej Bílka wrote:
> > > > I chose -O0 as the lesser evil, rather than having the reference
> > > > implementation run twice as fast depending on which compiler you
> > > > use.
> > > > 
> > > > One solution is to mandate that benchmarks be run with a fixed
> > > > version of gcc and fixed flags.
> > > > 
> > > > A second variant would be to check in the generated assembly along
> > > > with a regeneration script that is run with a specific gcc.
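
To make that second variant concrete, the regeneration script could be
as small as the sketch below (the pinned compiler, flags, and paths are
made up for illustration, not an existing glibc tool):

# Rebuild the checked-in assembly with one pinned gcc and fixed flags,
# so the reference .s files do not drift with the builder's compiler.
import glob, subprocess

PINNED_GCC = "gcc-4.8"          # hypothetical pinned compiler
FLAGS = ["-O2", "-S"]           # fixed flags; -S emits assembly

for src in glob.glob("benchtests/*.c"):
    subprocess.check_call([PINNED_GCC, *FLAGS, src,
                           "-o", src[:-2] + ".s"])
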
> > > 
> > > Yes, you can try to find a niche where you hope you can compare
> > > stuff.  But you can just as well get all the measurements you can
> > > from people out there -- with whatever version of gcc is available --
> > > and take this into account when drawing conclusions from the data.
> > > That is, you'd set up your machine learning in such a way that it
> > > looks at the data and checks whether there is high confidence for a
> > > certain conclusion (e.g., whether a new version of the code is faster
> > > or not).  Confidence will be lower if, for example, we see
> > > performance vary a lot with different versions of gcc but remain more
> > > or less unchanged when gcc versions don't differ; but if performance
> > > varies independently of the gcc version, that is also useful to know,
> > > because it means we can draw our conclusion from a wider set of
> > > tests.  Likewise for other properties of the test environment, such
> > > as the CPU.
> > >
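
To make that concrete, such a confidence check might be sketched like
this (a rough sketch; the sample timings below are invented, not real
measurements):

# Group benchmark timings by gcc version and estimate how much of the
# variance the gcc version explains; all numbers below are made up.
from statistics import mean, pvariance

runs = [                      # (gcc_version, seconds) pairs
    ("4.7", 1.02), ("4.7", 1.05), ("4.7", 0.99),
    ("4.8", 0.81), ("4.8", 0.84), ("4.8", 0.79),
]

by_version = {}
for ver, t in runs:
    by_version.setdefault(ver, []).append(t)

within = mean(pvariance(ts) for ts in by_version.values())
total = pvariance([t for _, t in runs])
print("share of variance tied to gcc version: %.0f%%"
      % (100 * (1 - within / total)))
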
> > And what will we do with this data?
> >
Please answer this question.  When you make vague proposals, you risk
people arguing that your business plan is to build a $10000 piece of
machinery for a task that can easily be solved with a $5 screwdriver.

I sent a simple measurement.  Please state what additional information
you want.

You might mine gold there, but trying it on a byte-by-byte version over
an artificial set of data is one of the least likely places to find any.

> > You typically use machine learning to learn trivial facts from data
> > sets that are too vast to browse manually.
> 
> Is your average web search just about trivial facts?
>
Actually, yes.  Most of my searches are copy-pasted error messages.
Over time Google's results have become worse and worse, as more
sophisticated techniques only add noise here.  The other case is
searching for projects named by an acronym, where again you would get
much better results if Google did not ignore casing, did not treat it
as a misspelling, and so on.

 
> Seriously, if all that machine learning and "big data" gave you were
> trivial facts, do you think that people would invest as much into this
> as they do?
>
I said trivial, not worthless.  It is simple to find out whether
somebody is a man or a woman, but advertisers pay a lot for that
information.

When you do machine learning, the scope is broad.  The number of
possible facts increases exponentially with the size of the data, so
you start having problems: most of them are duplicates, most are not
relevant, and a large portion of them arose by chance.  (You are in a
situation where you have 1000000 possible facts, each of which is a
false positive with probability 1/1000000.)

Filtering is needed to reduce the breadth, and the more you restrict
yourself to simpler and simpler things, the easier that filtering is
to do.

> > It is faster to just browse the results,
> > and you will train your intuition on them.
> 
> Manual inspection just doesn't scale to the scope we need it to scale
> to.  We know that there are *lots* of parameters that can influence
> performance, we cannot control all of them, and we likely don't even
> know all of them.

Please be specific.  The most difficult part is having enough data for
the causes to manifest as signal rather than as random noise.  For
example, if you do not track when the benchmarks were run, you could
miss that the runs done under Leo and Virgo were slower.
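
A minimal sketch of the kind of tracking I mean (the field names and
log format are just an illustration, not an existing glibc tool):

# Append one JSON record per benchmark run so later analysis can split
# results by date, compiler, or machine instead of guessing.
import json, platform, subprocess, time

def record_run(benchmark, seconds, logfile="bench-log.jsonl"):
    entry = {
        "benchmark": benchmark,
        "seconds": seconds,
        "timestamp": time.time(),
        "cpu": platform.processor(),
        "gcc": subprocess.check_output(["gcc", "--version"],
                                       text=True).splitlines()[0],
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(entry) + "\n")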

The second difficult step, once you have the data, is to act on it.
You need to form a model that explains what was happening, so that you
can modify your code accordingly.

