This is the mail archive of the cygwin-apps mailing list for the Cygwin project.



Re: Dedup x86/x86_64 --> noarch


On 16/04/2016 11:03, Achim Gratz wrote:
> After a discussion on IRC about de-duping the noarch content out of
> package files (where I was told this would be too difficult), I've just

I think it was more along the lines of 'not yet' :)

In any case, we need noarch support in calm before it's useful to dedup arch packages to noarch.

I think I have implemented the changes to calm to support all-or-nothing noarch (i.e. where all packages produced from a source package must be noarch), so if you can nominate a suitable, unimportant perl package, we can initially test with that.

(This wasn't quite as straightforward as just looking in another directory for packages, because the upload validation becomes more complex: we must check that consistent package sets result for both x86 and x86_64 before we can move noarch packages.)
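The all-or-nothing rule and the cross-arch consistency check can be illustrated with a minimal Python sketch. The data model (for each arch, a set of (binary package name, arch tag) pairs produced from one source package) and the function name `noarch_set_ok` are hypothetical illustrations only; this is not calm's actual code or data structure.

```python
from typing import Dict, Set, Tuple

# Hypothetical model: for each arch, the set of (binary package name, arch tag)
# pairs built from a single source package.
PackageSet = Set[Tuple[str, str]]


def noarch_set_ok(per_arch: Dict[str, PackageSet]) -> bool:
    """Return True if this source package's output may be moved to noarch/.

    Assumed rules (illustrative only):
      * both x86 and x86_64 uploads are present and non-empty,
      * every binary package for every arch is tagged 'noarch' (all-or-nothing),
      * x86 and x86_64 produced exactly the same package set.
    """
    required = ("x86", "x86_64")
    if any(not per_arch.get(arch) for arch in required):
        return False
    for pkgs in per_arch.values():
        if any(tag != "noarch" for _name, tag in pkgs):
            return False
    return per_arch["x86"] == per_arch["x86_64"]


if __name__ == "__main__":
    # 'perl-Foo' is a made-up package name.
    ok = {"x86": {("perl-Foo", "noarch")}, "x86_64": {("perl-Foo", "noarch")}}
    print(noarch_set_ok(ok))
```

Rejecting the whole source package when any single output is arch-specific keeps the check simple, at the cost of not handling mixed arch/noarch package sets.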

To make full use of this, cygport upload will need a feature to upload noarch packages from dist/ to noarch/ rather than <arch>/.

On 18/04/2016 20:44, Achim Gratz wrote:
> Looking at the current repo content we'd save about 30GB from the dedup
> of the src and doc packages alone and probably about 20GB from dedup in
> the remaining packages.
>
> I've implemented some POC code and deduped my Cygwin mirror (it is
> missing most of KDE and the cross-Cygwin compilation toolchains).  This
> took a solid 12 hours of flat-out 400% CPU load on my Sandy Bridge laptop
> and ballooned the page file to 21GiB.  But it also removed almost
> exactly a third from the repo's size (going from 81.2GiB to 51.4GiB), so
> projected to the full repo it's slightly more than my original estimate.

Thanks.  It's very useful to have some numbers.

I don't think this distinguishes between packages which are (or should be) marked ARCH="noarch" in the cygport, and those where the build products happen to be identical and can be deduped?
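To separate files that merely happen to be identical across arches from packages explicitly marked ARCH="noarch", one could compare content hashes of the unpacked x86 and x86_64 trees. This is a minimal Python sketch assuming both package trees have already been unpacked into local directories. The function names and directory layout are hypothetical, and this is not the POC code referred to above.

```python
import hashlib
import os
from typing import Dict, List


def file_digests(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large data files don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def identical_files(x86_root: str, x86_64_root: str) -> List[str]:
    """Relative paths present in both trees with byte-identical contents."""
    a = file_digests(x86_root)
    b = file_digests(x86_64_root)
    return sorted(p for p in a.keys() & b.keys() if a[p] == b[p])
```

A package whose files all appear in this list is a dedup candidate whether or not its cygport declares it noarch. Checking these results against the ARCH markings would show how much of the saving comes from each case.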

I would guess that this saving is dominated by some very large, data-only noarch packages, but who knows?

(Also, looking forward, perhaps cygport needs a separate command to build the source package, rather than building it for each arch and then deduping it?)

