This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Re: [GOLD] add new method for computing a build ID
- From: Cary Coutant <ccoutant at google dot com>
- To: gpike at chromium dot org
- Cc: binutils at sourceware dot org
- Date: Wed, 3 Oct 2012 15:36:35 -0700
- Subject: Re: [GOLD] add new method for computing a build ID
- References: <20121003192017.342EF1E0A04@geoffp.mtv.corp.google.com>
> The patch adds a new hash function for computing the build ID, in
> addition to the two that are available now (SHA-1 and MD5). The new
> function does MD5 on chunks of the output file and then does SHA-1 on
> the MD5 hashes of the chunks. This is easy to parallelize.
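The scheme quoted above can be sketched in a few lines of Python. This is only an illustration of the chunk-then-combine idea, not the patch itself: the function name and the 1 MiB chunk size are made up for the example, and the real patch works inside gold rather than over an in-memory buffer.

```python
import hashlib

CHUNK_SIZE = 1 << 20  # hypothetical chunk size; the patch's actual size may differ


def chunked_build_id(data: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """MD5 each fixed-size chunk, then SHA-1 the concatenated MD5 digests.

    Each chunk's MD5 is independent of the others, so the first phase
    parallelizes trivially; only the short final SHA-1 pass over the
    digests is serial.
    """
    outer = hashlib.sha1()
    for off in range(0, len(data), chunk_size):
        outer.update(hashlib.md5(data[off:off + chunk_size]).digest())
    return outer.digest()
```

Note that the resulting ID depends on the chunk size as well as the data, so all consumers must agree on the chunking for the build ID to be reproducible.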
Why use SHA-1 to combine the MD5 hashes? Why not just use MD5
throughout? Or SHA-1 throughout? Is it the case that feeding MD5 into
itself is known to be weaker than one MD5 pass? If the benefit is from
parallelization, I don't really see why you'd need to switch from
SHA-1 to MD5 -- couldn't you just add your approach on top of whatever
hash function is selected?
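To make the last point concrete, here is a hedged sketch of what "your approach on top of whatever hash function is selected" could look like: the same chunk-then-combine structure, but parameterized by a single algorithm name instead of hard-wiring MD5 inside and SHA-1 outside. The function name and signature are invented for this example.

```python
import hashlib


def chunked_digest(data: bytes, algo: str, chunk_size: int = 1 << 20) -> bytes:
    """Chunk-then-combine using one algorithm throughout.

    Whatever hash the user selected (e.g. "sha1" or "md5") is applied
    both to each chunk and to the concatenation of the chunk digests,
    so parallelization does not force a change of hash function.
    """
    outer = hashlib.new(algo)
    for off in range(0, len(data), chunk_size):
        inner = hashlib.new(algo)
        inner.update(data[off:off + chunk_size])
        outer.update(inner.digest())
    return outer.digest()
```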
I've got an incremental linker patch (haven't posted it yet because I
haven't finished writing the test cases) that recomputes the build id
for an incremental link by saving the context structure and streaming
just the new data into it. At the time I was implementing that, I was
thinking about rewriting the regular hash so that it would compute the
hashes of chunks in each Relocate_task, then combine the resulting
chunks at the end (adding in a few pieces not covered by the relocate
tasks). The difference is that each chunk would be the set of
contributions from an individual .o file, rather than a fixed-size
chunk of the output file. This approach would have the advantage,
though, of exploiting cache locality while we're writing the data to
the output file, rather than starting up a whole new set of tasks to
go back over the data.
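The per-task variant described above might look roughly like the following sketch. Everything here is hypothetical shorthand for the idea, not gold code: `relocate_task_hash` stands in for work done inside each Relocate_task while its output is still cache-hot, and `combine` stands in for the final pass that folds in the regions no relocate task covers (headers, section tables, and so on). The MD5/SHA-1 split mirrors the quoted patch but is equally illustrative.

```python
import hashlib


def relocate_task_hash(contribution: bytes) -> bytes:
    """Hash one input file's contribution right after the task writes it,
    while the data is still in cache."""
    return hashlib.md5(contribution).digest()


def combine(task_digests: list[bytes], uncovered: bytes) -> bytes:
    """Combine the per-task digests in a fixed order, then fold in the
    output regions not covered by any relocate task."""
    final = hashlib.sha1()
    for d in task_digests:
        final.update(d)
    final.update(hashlib.md5(uncovered).digest())
    return final.digest()
```

Because the chunks are variable-sized per-input contributions, the combined digests must be fed in a deterministic order (e.g. input-file order) for the build ID to be stable across runs.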
-cary