This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
RE: New feature "source-id"
- From: Bruce Dawson <bruced at valvesoftware dot com>
- To: 'Gerhard Gappmeier' <gerhard dot gappmeier at ascolab dot com>, "gdb-patches at sourceware dot org" <gdb-patches at sourceware dot org>
- Date: Tue, 18 Mar 2014 17:56:14 +0000
- Subject: RE: New feature "source-id"
- Authentication-results: sourceware.org; auth=none
- References: <7365721 dot BnaR1nHazz at lt-gergap> <1905500 dot YOUlx3S3mT at ws-gergap> <1395154991 dot 27876 dot 65 dot camel at bordewijk dot wildebeest dot org> <1825450 dot WDlHRVxHcI at ws-gergap>
I understand that some Linux distributions already make source packages for each package that they distribute, and this technique offers some unique advantages.
However, this is orthogonal to the source-id proposal. Source-ids offer different, complementary value.
Our build system spits out dozens of builds a day. Some of these are run by developers, others by testers, and others by customers. Any one of them might crash. I might end up debugging (live or from a core file) any one of these builds, perhaps weeks after it was created. Because we have the source-id system set up, I know that I can walk up and down the stack and have the source files show up automatically, with *zero* effort on my part. I don't have to install source packages, and I can have multiple core files from multiple versions loaded simultaneously. Only the source files that I need are downloaded, so it is *extremely* efficient: retrieving them is essentially instantaneous.
A source package to go with every package has some advantages, such as getting all of the generated files. However it is a very heavyweight solution because it requires retrieving thousands of files from dozens of shared objects in order to look at a crash. In comparison, the source-id solution requires retrieving exactly the set of files that are needed to view the functions on the back-trace so it is extremely lightweight.
Another advantage to the source-id solution is it actually tells you the versions of the files. When I get a crash I can look at the source-id information and see that, for instance, it was built with foo.cpp#17 (version 17 of foo.cpp). That information is literally embedded in the source-id section. I can then look at that file in my VCS client and see if there is a newer version, perhaps with a fix. Doing this with a source package is more cumbersome because (if I understand correctly) you are getting a copy of the source file rather than a reference to a particular version.
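To illustrate the kind of check this enables, here is a minimal Python sketch that splits a reference such as foo.cpp#17 into a file name and a revision and compares it with the newest revision known to the VCS. The path#revision form is taken from the example above; the helper names and the idea that the section stores plain strings in this form are assumptions made only for illustration, not a description of the actual source-id layout.

```python
from typing import NamedTuple


class SourceRef(NamedTuple):
    path: str       # file name as recorded at build time
    revision: int   # VCS revision the build used


def parse_source_ref(ref: str) -> SourceRef:
    """Split 'foo.cpp#17' into ('foo.cpp', 17). Hypothetical helper."""
    # Split on the last '#', so a '#' inside the path itself is preserved.
    path, sep, rev = ref.rpartition("#")
    if not sep or not path or not rev.isdecimal():
        raise ValueError(f"not a path#revision reference: {ref!r}")
    return SourceRef(path, int(rev))


def is_outdated(ref: SourceRef, head_revision: int) -> bool:
    """True if the VCS holds a newer revision than the one that was built."""
    return head_revision > ref.revision
```

With the crashing build at foo.cpp#17 and the VCS at revision 18, is_outdated() would report that a newer version, perhaps containing a fix, exists.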
A source package is like copying the files, whereas source-id is like having a symbolic link to the files.
I'm not suggesting that the source-id solution is better than a source package; I'm just saying that they are orthogonal. They support different workflows and can co-exist perfectly.
-----Original Message-----
From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Gerhard Gappmeier
Sent: Tuesday, March 18, 2014 9:41 AM
To: gdb-patches@sourceware.org
Subject: Re: New feature "source-id"
On Tuesday, March 18, 2014 04:03:11 PM you wrote:
> On Tue, 2014-03-18 at 15:00 +0100, Gerhard Gappmeier wrote:
> > On Tuesday, March 18, 2014 02:22:04 PM you wrote:
> > > On Sat, 2014-03-15 at 11:49 +0100, Gerhard Gappmeier wrote:
> > > That way you can use the build-id from the ELF note section to
> > > retrieve both the separate .debug files and the corresponding
> > > source files. And on my distro gdb even helpfully suggests how to do this:
> > > Missing separate debuginfos, use: debuginfo-install at-3.1.13-14.fc20.x86_64
> > > Which will then fetch the debuginfo package and all dependencies so
> > > gdb can find the .debug files and the corresponding source code
> > > those .debug files refer to. I don't know if the debuginfo-install
> > > suggestion is upstream or only in the distro package of gdb.
> >
> > If I understood this right, this means whenever a software is built
> > the sources get archived with the debug symbols in an debuginfo RPM file.
> > This way the build-id is all you need to get the correct sources and
> > debug symbols.
>
> Indeed. Just turn the build-id into the package, either through
> something like https://darkserver.fedoraproject.org/ or through
> yum install /usr/lib/debug/.build-id/b7/07011ecdbd5bcb1fad73cdc9b4433c791d8328.debug
> or just through debuginfo-install and you get both the .debug files
> and all source files that .debug file refers to.
> > However my idea is somewhat different and a little bit smarter IMO:
> > * The SHA1 id of a git repo gets stored in the source-id meta info
> > when building.
> > * There is no need of archiving the source files in RPM, deb, tar.gz
> > or zip files. We have them already in the version control system and
> > we don't want to duplicate the data
> > * This solution is independent from any package format.
> > * You can analyze coredumps of executables that you don't have on
> > your system. There is no need to install any RPM package for that.
> > This way you can analyze e.g. a crash within a Ubuntu package on a Fedora system.
> > * The fetch-script fetches only the sources required by GDB, not the
> > complete project.
>
> Some of those features are already possible with the way distros
> package the debuginfo files. But your way might indeed be more
> flexible. I am mostly wondering how to take advantage of the way
> distros do it currently in your scheme. How do you describe the
> default distro setup and how do you make sure not to duplicate the storage of source files?
>
> One difference with your scheme is that the distros packages the
> post-processed source files. That means they are the actual files,
> however generated, that the compiler compiled to object code. Not
> necessarily the pristine source files. That is so in a debugger you
> can step through the source file as seen by the compiler (e.g. it will
> include source files generated by configure or the lex and yacc
> generated files that the compiler builds).
Having generated files (that are not in a VCS) available is indeed an advantage of this concept.
However, this is very focused on sources that get packaged by a Linux distribution.
But there is also the use case of proprietary software that is not bundled with your distribution. Vendors create their own installers or simply ship a tar.gz file which gets installed in /opt/somewhere.
So there is no debuginfo package available in the package manager.
Companies could recreate this concept of creating debug packages, but I really prefer to just fetch the sources from git.
That's the way I work today:
* Get a problem report from a customer
* Hope that the customer reports the correct version
* Search for a version tag in git that matches the reported version
* Check out that version using git
The source-id simplifies that process. Just open the crashdump -> fetch separate debug info using build-id -> fetch source file that should be displayed in GDB using the fetch-script from an internal cgit web interface.
Our build server copies the separate debug info to an NFS share which I have mounted via /etc/fstab. So fetching symbols just works.
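As a small illustration of why this "just works", the following Python sketch maps a build-id to the file name GDB looks for under a debug directory: the first two hex digits become a subdirectory of .build-id and the remaining digits plus ".debug" become the file name. The function name is made up for this example, and the default root is simply the conventional /usr/lib/debug; an NFS mount point would be passed in instead.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def debug_file_for_build_id(build_id: str, debug_root: str = "/usr/lib/debug") -> Path:
    """Return <debug_root>/.build-id/xx/yyyy....debug for a hex build-id.

    Hypothetical helper; it only computes the path and does not check
    that the file exists.
    """
    build_id = build_id.strip().lower()
    if len(build_id) < 3 or not set(build_id) <= HEX_DIGITS:
        raise ValueError(f"not a hex build-id: {build_id!r}")
    # First byte (two hex digits) names the directory, the rest the file.
    return Path(debug_root) / ".build-id" / build_id[:2] / (build_id[2:] + ".debug")
```

For the build-id b707011ecdbd5bcb1fad73cdc9b4433c791d8328 this yields the same /usr/lib/debug/.build-id/b7/07011e... path quoted earlier in the thread.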
The sources are already in git and available via cgit web interface.
The only missing part is this "source-id" feature; with it, everything works out of the box.
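A rough Python sketch of the fetch step, under stated assumptions: it only builds the URL of a single file at a given commit, using the <repo>/plain/<path>?id=<commit> layout that cgit installations commonly expose for raw files. The function name, the example host and the repository name are hypothetical, and this is not the fetch-script from the patch itself.

```python
from urllib.parse import quote, urlencode


def cgit_plain_url(base_url: str, repo: str, path: str, commit: str) -> str:
    """Build a cgit 'plain' URL for one file at one commit.

    Assumes the common cgit layout <base>/<repo>/plain/<path>?id=<commit>;
    a particular installation may be configured differently.
    """
    return "{}/{}/plain/{}?{}".format(
        base_url.rstrip("/"),
        quote(repo.strip("/")),      # '/' is kept, other unsafe chars escaped
        quote(path.lstrip("/")),
        urlencode({"id": commit}),   # the commit SHA1 stored as source-id
    )


if __name__ == "__main__":
    # Hypothetical host, repository and commit, for illustration only.
    print(cgit_plain_url("https://git.example.com/cgit/", "project.git",
                         "src/foo.c", "abc123"))
```

Because only the one file GDB asks for is requested, nothing like a full checkout of the project is needed.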
>
> > > > * We need to make the new section ".note.gnu.source-id"
> > > > official. I don't know who maintains this and this needs to be
> > > > registered somewhere.
> > > > [...]
> > > > * adding file hashes (SHA1) for each source file to the debug info.
> > > > This way we can completely remove the mtime check and replace it
> > > > with a check of the SHA1 sum. Then we can replace the existing
> > > > warning with a message like "The source file does not match the
> > > > executable."
> > >
> > > For DWARF5 there is a proposal to add the MD5 digest to debug-line
> > > file table: http://dwarfstd.org/ShowIssue.php?issue=130701.1
> > >
> > > Would that be a good alternative location to store the hash of the
> > > source file?
> >
> > That's exactly what I proposed. Only that I proposed SHA1 instead of
> > MD5, but this doesn't matter.
> > If this is already in the DWARF standard we should use this feature
> > and not reinvent the wheel.
>
> It is currently just a proposal for DWARF5. The proposal deadline is
> end of this month. I just reviewed that proposal and saw that it is
> not very extensible, so I suggested some additions. See the discussion here:
> http://thread.gmane.org/gmane.comp.standards.dwarf/100
This feature really makes sense independently of the source-id feature.
I'm really looking forward to seeing it accepted.
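For illustration, the check that would replace the mtime comparison could look roughly like this Python sketch. It assumes the digest recorded in the debug info is available as a hex string; the function names are made up, and SHA1 is used here only because it was the original suggestion (the DWARF5 proposal uses MD5, for which hashlib.md5 would be swapped in).

```python
import hashlib


def file_sha1(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def source_matches(path: str, recorded_sha1: str) -> bool:
    """True if the file on disk has the digest recorded at build time.

    Hypothetical helper: how the recorded digest is read out of the
    debug info is outside the scope of this sketch.
    """
    return file_sha1(path) == recorded_sha1.strip().lower()
```

A debugger could then print "The source file does not match the executable." whenever source_matches() returns False, regardless of file timestamps.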
Cheers,
Gerhard
>
> Cheers,
>
> Mark