This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.
Re: Generic strlen
> On 10/29/2010 11:49 AM, David A. Ramos wrote:
>> Hi newlib maintainers,
>>
>> Our checking tool (KLEE) keeps complaining about newlib's generic strlen. It looks like it was patched back in May 2008 to include a speed hack that violates ISO C: it first word-aligns the pointer and then reads a word at a time to check for a NUL byte:
>>
>> libc/string/strlen.c:
>>   /* If the string is word-aligned, we can check for the presence of
>>      a null in each word-sized block.  */
>>   aligned_addr = (unsigned long *)str;
>>   while (!DETECTNULL (*aligned_addr))
>>     aligned_addr++;
>>
>> Obviously, this can read out of bounds whenever the memory allocated to the string ends before the end of the word that contains the terminator (for example, a string shorter than a word). While on most architectures this wouldn't actually cause a segfault, I don't think that's a safe assumption for the generic version of a libc routine. The same patch included an i386 target containing the same algorithm, which may be perfectly acceptable.
>>
>> Thoughts?
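For readers who have not seen the trick, here is a minimal sketch of a word-at-a-time scan like the loop quoted above. It is not newlib's code: REPEAT_BYTE, word_has_nul, and words_strlen are hypothetical names, the bit test is the commonly used zero-byte check (assuming 8-bit bytes), and the buffer is assumed to be padded to a whole number of words so that the sketch itself stays in bounds. The bytes_read output shows how many bytes the scan loads, which can exceed strlen + 1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: repeat one byte value across an unsigned long.
   Assumes 8-bit bytes. */
#define REPEAT_BYTE(b) ((unsigned long)-1 / 0xFF * (unsigned long)(b))

/* Nonzero if any byte of w is zero (the usual zero-byte bit test). */
static int word_has_nul(unsigned long w)
{
    return ((w - REPEAT_BYTE(0x01)) & ~w & REPEAT_BYTE(0x80)) != 0;
}

/* Word-at-a-time length scan. The caller must guarantee that buf holds
   a NUL and is padded to a multiple of sizeof(unsigned long); that
   padding is the assumption a real string does not give us.
   *bytes_read receives the total number of bytes the scan loaded. */
static size_t words_strlen(const char *buf, size_t *bytes_read)
{
    size_t off = 0;
    unsigned long w;

    for (;;) {
        memcpy(&w, buf + off, sizeof w);  /* alignment-safe word load */
        if (word_has_nul(w))
            break;
        off += sizeof w;
    }
    *bytes_read = off + sizeof w;

    while (buf[off] != '\0')              /* find the exact NUL byte */
        off++;
    return off;
}
```

For the string "hi" stored in a zero-padded buffer, the scan returns 2 but loads a full word, so it touches at least one byte beyond the terminator.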
On Oct 29, 2010, at 11:02 AM, Eric Blake wrote:
> As long as reading beyond the end of a string does not fault, you can't
> detect the violation of the standard, so the as-if rule applies. Prove
> to me that there is an architecture that can fault on anything less than
> a word boundary, and then we'll talk about changing the code. Until
> then, this implementation may violate strict C89, but it is by all means
> portable to all possible platforms that newlib will ever target.
Take a look at the February 2008 edition of the Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3B: System Programming Guide, Part 2, Section 18.2: Debug Registers:
"For each breakpoint, the following information can be specified:
- The linear address where the breakpoint is to occur.
- The length of the breakpoint location (1, 2, or 4 bytes)."
"When the DE flag is set, the processor interprets bits as follows:
11 - Break on data reads or writes but not instruction fetches."
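Using this version of strlen precludes a developer from setting a watchpoint on a byte within the same word as the end of a string. The word-sized read would trigger the watchpoint even though the program never legitimately accesses that byte, producing a spurious debug exception that makes debugging difficult.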
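To make the watchpoint case concrete, here is a rough sketch. It only models range overlap and does not program any debug registers; read_hits_watch and bytewise_strlen are hypothetical names, and it assumes a data breakpoint fires when any byte of an access falls inside the watched range. For contrast, the byte-at-a-time loop never reads past the terminator.

```c
#include <stddef.h>

/* Hypothetical model: does a read of read_len bytes at offset read_off
   overlap a watched range of watch_len bytes at offset watch_off?
   Assumes a watchpoint triggers on any overlapping byte. */
static int read_hits_watch(size_t read_off, size_t read_len,
                           size_t watch_off, size_t watch_len)
{
    return read_off < watch_off + watch_len &&
           watch_off < read_off + read_len;
}

/* Byte-at-a-time length: the last byte read is the terminator itself,
   so no byte outside the string is ever touched. */
static size_t bytewise_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Take "hi" at offsets 0 to 2 with a 1-byte watchpoint on offset 3: a 4-byte read starting at offset 0 overlaps the watched byte, while the three 1-byte reads of the byte-wise loop do not.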