This is the mail archive of the cygwin-apps mailing list for the Cygwin project.
Re: Dedup x86/x86_64 --> noarch
- From: Jon Turney <jon dot turney at dronecode dot org dot uk>
- To: cygwin-apps at cygwin dot com
- Date: Sat, 23 Apr 2016 11:51:09 +0100
- Subject: Re: Dedup x86/x86_64 --> noarch
- References: <87zistg99v dot fsf at Rainer dot invalid>
On 16/04/2016 11:03, Achim Gratz wrote:
> After a discussion on IRC about de-duping the noarch content out of
> package files (where I was told this would be too difficult), I've just
I think it was more along the lines of 'not yet' :)
In any case, we need noarch support in calm, before it's useful to have
dedup of arch packages to noarch.
I think I have implemented the changes to calm to support all-or-nothing
noarch (i.e. where all packages produced from a source package must be
noarch), so if you can nominate a suitable, unimportant perl package, we
can test it with that, initially.
(This wasn't quite as straightforward as just looking in another
directory for packages, as the upload validation becomes more complex:
we must check that consistent package sets result for both x86 and
x86_64 before we can move noarch packages)
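As a rough illustration of that validation step, here is a minimal sketch. It is not calm's actual code: the function name, the `ARCHES` tuple and the shape of the `uploads` dictionary are all hypothetical, chosen only to show the all-or-nothing rule plus the cross-arch consistency check.

```python
# Hypothetical sketch; none of these names come from calm itself.
ARCHES = ("x86", "x86_64")


def can_move_to_noarch(uploads):
    """Decide whether one source package's uploads may go to noarch/.

    `uploads` is assumed to look like
    {arch: {package_name: {"version": str, "noarch": bool}}}.
    """
    # Both arches must have uploaded something to compare.
    if any(arch not in uploads for arch in ARCHES):
        return False

    # All-or-nothing: every package built from the source must be noarch.
    for arch in ARCHES:
        if not all(pkg["noarch"] for pkg in uploads[arch].values()):
            return False

    # The (name, version) sets must be identical for every arch.
    def package_set(arch):
        return {(name, pkg["version"]) for name, pkg in uploads[arch].items()}

    reference = package_set(ARCHES[0])
    return all(package_set(arch) == reference for arch in ARCHES[1:])
```

Only when this returns true for a source package would its packages be moved; anything else stays where it was uploaded.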
To make full use of this, cygport upload will need a feature to upload
noarch packages from dist/ to noarch/ rather than <arch>/.
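The routing decision itself is small. The sketch below is written in Python purely for illustration (cygport is a shell tool, and this is not its code); `upload_destination` and its parameters are hypothetical names.

```python
from pathlib import PurePosixPath


def upload_destination(package_arch, target_arch, relpath):
    """Pick the upload directory for one file found under dist/.

    package_arch: the package's ARCH value, e.g. "noarch" or "x86_64"
    target_arch:  the arch currently being built and uploaded
    relpath:      path of the file relative to dist/
    """
    # noarch packages go to the shared noarch/ tree, everything else
    # to the tree of the arch it was built for.
    top = "noarch" if package_arch == "noarch" else target_arch
    return PurePosixPath(top) / relpath
```

For example, `upload_destination("noarch", "x86_64", "perl-Foo/perl-Foo-1.0-1.tar.xz")` yields `noarch/perl-Foo/perl-Foo-1.0-1.tar.xz`, where `perl-Foo` is a made-up package name.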
On 18/04/2016 20:44, Achim Gratz wrote:
> Looking at the current repo content we'd save about 30GB from the dedup
> of the src and doc packages alone and probably about 20GB from dedup in
> the remaining packages.
>
> I've implemented some POC code and deduped my Cygwin mirror (it is
> missing most of KDE and the cross-Cygwin compilation toolchains). This
> took a solid 12 hours of flat out 400% CPU load on my SandyBridge laptop
> and ballooned the page file to 21GiB. But it also removed almost
> exactly a third from the repo's size (going from 81.2GiB to 51.4GiB), so
> projected to the full repo it's slightly more than my original estimate.
Thanks. It's very useful to have some numbers.
I don't think this distinguishes between packages which are (or should
be) marked ARCH="noarch" in the cygport, and those where the build
products happen to be identical and can be deduped?
I would guess that this saving is dominated by some very large,
data-only noarch packages, but who knows?
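To make the distinction concrete: a purely content-based dedup might look something like the sketch below. This is an assumption about the general approach, not Achim's POC code; the function names and the choice of SHA-512 are illustrative. It finds files that are byte-identical in both arch trees, but it cannot tell whether a package is identical because it is marked ARCH="noarch" or only by coincidence.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-512 hex digest of a file, read in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedup_candidates(tree_a, tree_b):
    """List relative paths that exist in both trees with identical content."""
    result = []
    for root, _dirs, files in os.walk(tree_a):
        for name in files:
            path_a = os.path.join(root, name)
            rel = os.path.relpath(path_a, tree_a)
            path_b = os.path.join(tree_b, rel)
            if not os.path.isfile(path_b):
                continue
            # Cheap size comparison first, so most differing files
            # are rejected without being hashed.
            if os.path.getsize(path_a) != os.path.getsize(path_b):
                continue
            if file_digest(path_a) == file_digest(path_b):
                result.append(rel)
    return sorted(result)
```

Comparing the candidate list against the ARCH setting in each cygport would separate the two cases.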
(Also, looking forward, perhaps cygport needs a separate command to
build the source package, rather than building it for each arch and then
deduping it?)