This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH][BZ #12100] strstr with unaligned loads.

From: OndÅej BÃlka <neleai at seznam dot cz>
To: Rich Felker <dalias at aerifal dot cx>
Cc: libc-alpha at sourceware dot org
Date: Tue, 27 Aug 2013 04:04:14 +0200
Subject: Re: [PATCH][BZ #12100] strstr with unaligned loads.
Authentication-results: sourceware.org; auth=none
References: <20130825090758 dot GA23686 at domone dot kolej dot mff dot cuni dot cz> <20130825155406 dot GV20515 at brightrain dot aerifal dot cx> <20130826070925 dot GA4292 at domone dot kolej dot mff dot cuni dot cz> <20130826162446 dot GZ20515 at brightrain dot aerifal dot cx>

On Mon, Aug 26, 2013 at 12:24:46PM -0400, Rich Felker wrote:
> On Mon, Aug 26, 2013 at 09:09:25AM +0200, OndÅej BÃlka wrote:
> > On Sun, Aug 25, 2013 at 11:54:06AM -0400, Rich Felker wrote:
> > > On Sun, Aug 25, 2013 at 11:07:58AM +0200, OndÅej BÃlka wrote:
> > > > Hi, 
> > > > 
> > > > This patch continues to remove ineffective sse4.2 functions. Now we
> > > > focus on strstr.
> > > > 
> > > > I use a techique of doing vectorized search for first pair of characters
> > > > which is fastest in practice.
> > > 
> > > What do you do after finding the first pair? As far as I can tell,
> > 
> > It enumerates all occurences of first pair. 
> > 
> > > this approach is only useful at the start of the string. Once you've
> > > gotten into two-way, searching for the next occurrence of the first
> > > two characters is not a useful operation. Of course it looks to me
> > 
> > Not very useful? A two way algorithm spend most time in ineffective
> > looking for these (or calculating critical factorization which is also
> > slow.). Actual code is following
> 
> No, before you make any such claims, you should read up on and
> understand the algorithm. You may be able to apply the optimized "find
> first character" or maybe even "find first pair" to one case here, but
> it is not searching for the beginning of the needle. The code you
> quoted:
>
I did not say which pair to search, from code and ommited comment it was clear
that it is a pair after critical position.

> > And most time first character will not match and in cases when it will
> > second won't most of time. 
 
> > 
> >           phaystack = &haystack[i + j];
> >           while (i < needle_len && (CANON_ELEMENT (*pneedle++)
> >                                     == CANON_ELEMENT (*phaystack++)))
> >             ++i;
> >           if (needle_len <= i)
> > 	    snip
> >           else
> >             {
> >               j += i - suffix + 1;
> >               memory = 0;
> >             }
> 
> is checking how many characters of the second factor of the needle
> match at the current point, and if it's not all of them, advancing by
> that many characters plus 1, and repeating.
> 
As I said it is mostly shift by one character until pair is found.

> Note that the above-quoted code is only for the short-needle case
> without the bad character table, which I also claim should be removed
> (empirically, glibc is much slower than musl on short needles,

False, on random testcases sse4.2 often beats musl by 50% factor. I also
tested musl on noise and it is 20% slower due of factorization overhead.

> presumably because musl always uses the bad-character table). There
> may be additional logic considerations to integrating any optimized
> "next pair" search with the bad character table; I have not checked
> this.
> 
Bad character is also heuristic that is not needed when pair heuristic
is used as it would only slow it down. 

There is a lower bound that you need to inspect each character to check
if it is zero. Provided that pairs are unlikely I could get within
factor of 2 of that lower bound. I traded here worse benchmark
performance for better real-time performance. For benchmark-oriented I
would seek for triples which will cut most of misprediction.


> > And most time first character will not match and in cases when it will
> > second won't most of time. 
> 
> The case you have to worry about is when they do match every single
> time, and whether this increases the runtime from linear to quadratic.
> 
Please read mails before replying to them, from original mail.

> On Sun, Aug 25, 2013 at 11:07:58AM +0200, OndÅej BÃlka wrote:
> > This could be solved by standard buy or rent techique which I employ.
> > For any position p there could be only p+64n+512 comparisons until I
> > switch to two way algorithm which gives linear complexity.

Follow-Ups:
- Re: [PATCH][BZ #12100] strstr with unaligned loads.
  - From: Rich Felker

References:
- [PATCH][BZ #12100] strstr with unaligned loads.
  - From: OndÅej BÃlka
- Re: [PATCH][BZ #12100] strstr with unaligned loads.
  - From: Rich Felker
- Re: [PATCH][BZ #12100] strstr with unaligned loads.
  - From: OndÅej BÃlka
- Re: [PATCH][BZ #12100] strstr with unaligned loads.
  - From: Rich Felker

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]