This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Optimize strstr, strcasestr and memmem


On 12/31/1969 05:00 PM,  wrote:

>> I use only SSE2 instructions and my algorithm is faster than using SSE4 instructions.
>> With trivial modifications I could also use plain arithmetic/AVX2

I knew that when I first posted the twoway strstr implementation that it
was not optimal for short searches, and I totally welcome these attempts
to approve things.  My biggest concern is the avoidance of quadratic
effects (strstr should always be linear speed, as proven by the twoway
algorithm, and memmem should be sublinear with large enough needles, by
skipping over portions of the haystack).

> The biggest problem I have with reviewing *any* of this code is that
> performance is relative to benchmarks.
> 
> As a community we have no baseline benchmark numbers.

Precisely - the very reason that I have not tried to further optimize
anything in glibc beyond my twoway string, and the reason the SSE4
quadratic pessimization even got in to glibc in the first place, is
because we don't have a good benchmark.

> 
> I would like to see someone from the community come forward to help
> sort this out.
> 
> I think at a minimum we need:
> 
> * A list of baseline benchmarks we are going to use to evaluate
> performance related submissions.
> * Collect data on the runs.
> * Record everything in the wiki including how to run the benchmarks
> and the collection of growing results.

Among other things, the benchmark must cover:

short needle, short haystack, at all possible starting alignments of
both strings, as well as alignment of the match (when present) as well
as time to detect a miss

short needle, long haystack where the match or miss is late in the haystack

long needle short haystack must exit early rather than wasting time
computing the end of the needle

long needle long haystack must not be quadratic, even if the needle is
highly repetitive

> 
> Are either of you interested in helping the community develop some
> metrics for acceptance of performance related patches?

While I can't promise to spend time writing such metrics from scratch, I
will gladly help review them, as I'm interested in backporting whatever
speedups we have here back into gnulib.  I still think that there are
some arch-specific assembly code tweaks that can always beat a generic C
implementation, but having a fast and reliable generic C implementation
portable to all platforms is a worthy goal for gnulib.

The other thing I would really like to see is a way to pre-compile a
string search.  That is, the regcomp()/regexec()/regfree() model was
invented precisely because it is very common to search for a common
needle across multiple haystacks.  strstr() and memmem() must remain for
the ease of use on a one-shot search, but they have the overhead of
recomputing the setup on every invocation.  I think it would be worth
adding additional functions to glibc to make repeated searches faster
(for all but the case of long needle short haystack, because the
compilation phase doesn't know to abort early).  Maybe:

struct strmem_search_t { /* contents should be treated as opaque */ };

/* Populate psrch with compilation of a string needle.  Possible to fail
with errno set to ENOMEM.  Needle must not be altered until a call to
strmem_free. */
int strstr_comp(struct strmem_search_t *psrch, const char *needle);

/* Populate psrch with compilation of a memory needle.  Possible to fail
with errno set to ENOMEM.  Needle must not be altered until a call to
strmem_free.  */
int memmem_comp(struct strmem_search_t *psrch, const void *needle,
size_t len);

/* Use a compiled representation to search a string haystack.  */
char *strstr_exec(const char *haystack, const struct strmem_search_t
*psrch);

/* Use a compiled representation to search a memory haystack.  */
void *memmem_exec(const void *haystack, size_t len, const struct
strmem_search_t *psrch);

/* Free a compiled needle */
void strmem_free(struct strmem_search_t *psrch);

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]