This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH][BZ #12100] strstr with unaligned loads.
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: Rich Felker <dalias at aerifal dot cx>
- Cc: libc-alpha at sourceware dot org
- Date: Tue, 27 Aug 2013 04:04:14 +0200
- Subject: Re: [PATCH][BZ #12100] strstr with unaligned loads.
- Authentication-results: sourceware.org; auth=none
- References: <20130825090758 dot GA23686 at domone dot kolej dot mff dot cuni dot cz> <20130825155406 dot GV20515 at brightrain dot aerifal dot cx> <20130826070925 dot GA4292 at domone dot kolej dot mff dot cuni dot cz> <20130826162446 dot GZ20515 at brightrain dot aerifal dot cx>
On Mon, Aug 26, 2013 at 12:24:46PM -0400, Rich Felker wrote:
> On Mon, Aug 26, 2013 at 09:09:25AM +0200, OndÅej BÃlka wrote:
> > On Sun, Aug 25, 2013 at 11:54:06AM -0400, Rich Felker wrote:
> > > On Sun, Aug 25, 2013 at 11:07:58AM +0200, OndÅej BÃlka wrote:
> > > > Hi,
> > > >
> > > > This patch continues to remove ineffective sse4.2 functions. Now we
> > > > focus on strstr.
> > > >
> > > > I use a techique of doing vectorized search for first pair of characters
> > > > which is fastest in practice.
> > >
> > > What do you do after finding the first pair? As far as I can tell,
> >
> > It enumerates all occurences of first pair.
> >
> > > this approach is only useful at the start of the string. Once you've
> > > gotten into two-way, searching for the next occurrence of the first
> > > two characters is not a useful operation. Of course it looks to me
> >
> > Not very useful? A two way algorithm spend most time in ineffective
> > looking for these (or calculating critical factorization which is also
> > slow.). Actual code is following
>
> No, before you make any such claims, you should read up on and
> understand the algorithm. You may be able to apply the optimized "find
> first character" or maybe even "find first pair" to one case here, but
> it is not searching for the beginning of the needle. The code you
> quoted:
>
I did not say which pair to search, from code and ommited comment it was clear
that it is a pair after critical position.
> > And most time first character will not match and in cases when it will
> > second won't most of time.
> >
> > phaystack = &haystack[i + j];
> > while (i < needle_len && (CANON_ELEMENT (*pneedle++)
> > == CANON_ELEMENT (*phaystack++)))
> > ++i;
> > if (needle_len <= i)
> > snip
> > else
> > {
> > j += i - suffix + 1;
> > memory = 0;
> > }
>
> is checking how many characters of the second factor of the needle
> match at the current point, and if it's not all of them, advancing by
> that many characters plus 1, and repeating.
>
As I said it is mostly shift by one character until pair is found.
> Note that the above-quoted code is only for the short-needle case
> without the bad character table, which I also claim should be removed
> (empirically, glibc is much slower than musl on short needles,
False, on random testcases sse4.2 often beats musl by 50% factor. I also
tested musl on noise and it is 20% slower due of factorization overhead.
> presumably because musl always uses the bad-character table). There
> may be additional logic considerations to integrating any optimized
> "next pair" search with the bad character table; I have not checked
> this.
>
Bad character is also heuristic that is not needed when pair heuristic
is used as it would only slow it down.
There is a lower bound that you need to inspect each character to check
if it is zero. Provided that pairs are unlikely I could get within
factor of 2 of that lower bound. I traded here worse benchmark
performance for better real-time performance. For benchmark-oriented I
would seek for triples which will cut most of misprediction.
> > And most time first character will not match and in cases when it will
> > second won't most of time.
>
> The case you have to worry about is when they do match every single
> time, and whether this increases the runtime from linear to quadratic.
>
Please read mails before replying to them, from original mail.
> On Sun, Aug 25, 2013 at 11:07:58AM +0200, OndÅej BÃlka wrote:
> > This could be solved by standard buy or rent techique which I employ.
> > For any position p there could be only p+64n+512 comparisons until I
> > switch to two way algorithm which gives linear complexity.