This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



RE: New feature "source-id"


I understand that some Linux distributions already make source packages for each package that they distribute, and this technique offers some unique advantages.

However, this is orthogonal to the source-id proposal. Source-ids offer a different, complementary kind of value.

Our build system spits out dozens of builds a day. Some of these are run by developers, others by testers, and others by customers. Any one of them might crash. I might end up debugging any one of these builds (live, or from a core file), perhaps weeks after it was created. Because we have the source-id system set up, I know that I can walk up and down the stack and have the source files show up automatically, with *zero* effort on my part. I don't have to install source packages, and I can have multiple core files from multiple versions loaded simultaneously. Only the source files that I need are downloaded, so it is *extremely* efficient: retrieving them is essentially instantaneous.

A source package to go with every package has some advantages, such as getting all of the generated files. However, it is a very heavyweight solution because it requires retrieving thousands of files from dozens of shared objects in order to look at a crash. In comparison, the source-id solution retrieves exactly the set of files needed to view the functions on the backtrace, so it is extremely lightweight.

Another advantage of the source-id solution is that it actually tells you the versions of the files. When I get a crash I can look at the source-id information and see that, for instance, it was built with foo.cpp#17 (version 17 of foo.cpp). That information is literally embedded in the source-id section. I can then look at that file in my VCS client and see if there is a newer version, perhaps with a fix. Doing this with a source package is more cumbersome because (if I understand correctly) you are getting a copy of the source file rather than a reference to a particular version.
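
To illustrate the point about embedded versions, here is a minimal Python sketch. It assumes, purely for illustration, that the source-id information has been dumped as text with one "path#revision" entry per line, matching the foo.cpp#17 notation above. The real section layout may differ, and the names SourceRef, parse_source_refs and is_outdated are hypothetical.

```python
from typing import List, NamedTuple


class SourceRef(NamedTuple):
    """One file reference, e.g. foo.cpp at revision 17."""
    path: str
    revision: int


def parse_source_refs(text: str) -> List[SourceRef]:
    """Parse lines of the assumed form 'path#revision' into SourceRef tuples."""
    refs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last '#', so a '#' inside the path is tolerated.
        path, sep, rev = line.rpartition("#")
        if not sep or not path or not rev.isdigit():
            raise ValueError(f"not a path#revision entry: {line!r}")
        refs.append(SourceRef(path, int(rev)))
    return refs


def is_outdated(ref: SourceRef, head_revision: int) -> bool:
    """True if the VCS has a newer revision than the one the binary was built from."""
    return head_revision > ref.revision
```

With entries like these, checking whether a newer revision (perhaps with a fix) exists is a single comparison against whatever the VCS reports as the head revision.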

A source package is like copying the files, whereas source-id is like having a symbolic link to the files.

I'm not suggesting that the source-id solution is better than a source package. I'm just saying that they are orthogonal: they support different workflows, and they can coexist perfectly.

-----Original Message-----
From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Gerhard Gappmeier
Sent: Tuesday, March 18, 2014 9:41 AM
To: gdb-patches@sourceware.org
Subject: Re: New feature "source-id"

On Tuesday, March 18, 2014 04:03:11 PM you wrote:
> On Tue, 2014-03-18 at 15:00 +0100, Gerhard Gappmeier wrote:
> > On Tuesday, March 18, 2014 02:22:04 PM you wrote:
> > > On Sat, 2014-03-15 at 11:49 +0100, Gerhard Gappmeier wrote:
> > > That way you can use the build-id from the ELF note section to
> > > retrieve both the separate .debug files and the corresponding
> > > source files. And on my distro gdb even helpfully suggests how to
> > > do this:
> > >   Missing separate debuginfos, use: debuginfo-install at-3.1.13-14.fc20.x86_64
> > > Which will then fetch the debuginfo package and all dependencies so
> > > gdb can find the .debug files and the corresponding source code
> > > those .debug files refer to. I don't know if the debuginfo-install
> > > suggestion is upstream or only in the distro package of gdb.
> > 
> > If I understood this right, this means whenever software is built
> > the sources get archived with the debug symbols in a debuginfo RPM file.
> > This way the build-id is all you need to get the correct sources and
> > debug symbols.
> 
> Indeed. Just turn the build-id into the package, either through
> something like https://darkserver.fedoraproject.org/ or through
> yum install /usr/lib/debug/.build-id/b7/07011ecdbd5bcb1fad73cdc9b4433c791d8328.debug
> or just through debuginfo-install and you get both the .debug files
> and all source files that .debug file refers to.
> > However my idea is somewhat different and a little bit smarter IMO:
> > * The SHA1 id of a git repo gets stored in the source-id meta info 
> > when building.
> > * There is no need to archive the source files in RPM, deb, tar.gz
> > or zip files. We have them already in the version control system and
> > we don't want to duplicate the data.
> > * This solution is independent of any package format.
> > * You can analyze coredumps of executables that you don't have on
> > your system. There is no need to install any RPM package for that.
> > This way you can analyze e.g. a crash within an Ubuntu package on a
> > Fedora system.
> > * The fetch-script fetches only the sources required by GDB, not the 
> > complete project.
> 
> Some of those features are already possible with the way distros 
> package the debuginfo files. But your way might indeed be more 
> flexible. I am mostly wondering how to take advantage of the way 
> distros do it currently in your scheme. How do you describe the 
> default distro setup and how do you make sure not to duplicate the storage of source files?
> 
> One difference with your scheme is that the distros package the 
> post-processed source files. That means they are the actual files, 
> however generated, that the compiler compiled to object code. Not 
> necessarily the pristine source files. That is so in a debugger you 
> can step through the source file as seen by the compiler (e.g. it will 
> include source files generated by configure or the lex and yacc 
> generated files that the compiler builds).
Having generated files (that are not in a VCS) available is indeed an advantage of this concept.
However, it is very focused on sources that get packaged by a Linux distribution.

But there is also the use case of proprietary software that is not bundled with your distribution. Vendors create their own installers or simply a tar.gz file which gets installed in /opt/somewhere.
So there is no debuginfo package available in the package manager.
Companies could recreate this concept of creating debug packages, but I really prefer to just fetch the sources from git.
That's the way I work today:
* Get a problem report from a customer
* Hope that the customer reports the correct version
* Search for a version tag in git which matches the reported version
* Check out that version using git
The source-id simplifies that process. Just open the crash dump -> fetch the separate debug info using the build-id -> fetch the source file that should be displayed in GDB, using the fetch-script, from an internal cgit web interface.

Our build server copies the separate debug info to an NFS share which I have mounted via /etc/fstab. So fetching symbols just works.
The sources are already in git and available via the cgit web interface.
The missing part is just this "source-id" feature; then everything works out of the box.
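
A rough Python sketch of the two lookups described above, under stated assumptions. The .build-id/xx/rest.debug layout is the one quoted earlier in this thread. The cgit URL shape (repo/plain/path?id=commit) is how cgit's plain view is commonly addressed, but check it against your own installation. The function names, the example host and the repository name are all hypothetical, and this is not the proposed fetch-script itself.

```python
from pathlib import PurePosixPath
from urllib.parse import quote, urlencode


def debug_file_path(build_id: str, debug_root: str = "/usr/lib/debug") -> PurePosixPath:
    """Map a hex build-id to <debug_root>/.build-id/<first 2 hex chars>/<rest>.debug."""
    build_id = build_id.lower()
    if len(build_id) < 3 or any(c not in "0123456789abcdef" for c in build_id):
        raise ValueError(f"not a hex build-id: {build_id!r}")
    return PurePosixPath(debug_root) / ".build-id" / build_id[:2] / (build_id[2:] + ".debug")


def cgit_plain_url(base_url: str, repo: str, commit: str, path: str) -> str:
    """Build a URL for one file at one commit (assumed cgit 'plain' view layout)."""
    query = urlencode({"id": commit})
    # quote() keeps '/' by default, so nested paths stay readable.
    return f"{base_url.rstrip('/')}/{quote(repo)}/plain/{quote(path)}?{query}"


# Example with the build-id quoted in this thread; debug_root could equally
# be the mount point of an NFS share holding the separate debug info.
print(debug_file_path("b707011ecdbd5bcb1fad73cdc9b4433c791d8328"))
# Hypothetical host, repository and commit id, for illustration only.
print(cgit_plain_url("https://git.example.com/cgit/", "proj.git", "abc123", "src/foo.cpp"))
```

Only the one file GDB asks for is fetched, which is the point of the scheme: no checkout and no package install.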
> 
> > > > * We need to make the new section ".note.gnu.source-id" 
> > > > official. I don't know who maintains this and this needs to be 
> > > > registered somewhere.
> > > > [...]
> > > > * adding file hashes (SHA1) for each source file to the debug info.
> > > > This way we can completely remove the mtime check and replace it
> > > > with a check of the SHA1 sum. Then we can replace the existing
> > > > warning with a message like "The source file does not match the
> > > > executable."
> > > 
> > > For DWARF5 there is a proposal to add the MD5 digest to debug-line
> > > file table: http://dwarfstd.org/ShowIssue.php?issue=130701.1
> > > 
> > > Would that be a good alternative location to store the hash of the 
> > > source file?
> > 
> > That's exactly what I proposed. Only that I proposed SHA1 instead of
> > MD5, but this doesn't matter.
> > If this is already in the DWARF standard we should use this feature
> > and not reinvent the wheel.
> 
> It is currently just a proposal for DWARF5. The proposal deadline is 
> end of this month. I just reviewed that proposal and saw that it is 
> not very extensible, so I suggested some additions. See the discussion here:
> http://thread.gmane.org/gmane.comp.standards.dwarf/100
This feature really makes sense independently of the source-id feature.
I'm really looking forward to seeing it accepted.
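
For illustration, the hash check discussed above could look roughly like this in Python. This is a sketch only: it assumes the expected digest is available as a hex string, however it ends up being stored in the debug info, and the function names are hypothetical. The algorithm is a parameter because the thread mentions both SHA1 and MD5.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha1", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def source_matches(path: str, expected_hex: str, algorithm: str = "sha1") -> bool:
    """Compare file content against the recorded digest, instead of using mtime."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


def check_source(path: str, expected_hex: str, algorithm: str = "sha1") -> str:
    """Return a warning string if the file differs, or an empty string if it matches."""
    if source_matches(path, expected_hex, algorithm):
        return ""
    return "The source file does not match the executable."
```

Unlike the mtime check, this does not give a false warning when an unchanged file is simply checked out again later.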

Cheers,
Gerhard
> 
> Cheers,
> 
> Mark

